Pieter Van Schalkwyk
CEO at XMPRO
At the beginning of this year, I made an observation that irritated some people. I suggested that most predictions about AI agents were "professionally curated hallucinations with a dash of wishful thinking."
Twelve months later, I stand by that assessment. But something more interesting happened than the predictions simply being wrong: the entire problem was reframed.
We started 2025 thinking the challenge was building smarter agents. We ended 2025 realizing the challenge is building trustworthy operational infrastructure. That's not a small adjustment. That's a fundamental rethinking of what we're actually building.
The Reframe Nobody Expected
Niels Erik Andersen at LNS Research published a finding this year that changed how I think about adoption: ninety-six percent of organizations want AI agents, but sixty-seven percent refuse to give them full control.
Most people read that as a change management problem. Hesitant organizations that need convincing. Training programs and pilot projects to build confidence.
I read it differently. Sixty-seven percent of organizations are right to refuse full control. Not because they're behind the curve, but because they understand something the AI industry hasn't fully grasped: trust in industrial operations isn't a sentiment to be managed. It's an architectural property to be built.
I've spent this year watching the gap between what AI vendors promise and what operations teams will accept. The vendors talk about model capabilities, reasoning benchmarks, and autonomous potential. The operations teams ask one question: "How do I know it won't make things worse?"
That question isn't resistance to overcome. It's the design requirement nobody's addressing.
The Inverted Flywheel
Jaya Gupta and Animesh Koratana described an elegant flywheel for AI agents: deploy agents, agents solve problems, their reasoning becomes decision traces, traces accumulate into context graphs, richer context makes agents smarter, deploy more agents.
For enterprise software, this works. Deploy a sales agent that handles discount approvals. If it makes mistakes, you lose some margin. Iterate. Improve. Scale.
Industrial operations breaks this flywheel entirely.
You don't iterate on a compressor failure. You don't "learn from" a safety incident and try again. The cost of agent mistakes isn't lost margin. It's destroyed equipment, environmental releases, injured workers.
This means the flywheel must run in reverse:
- You don't deploy agents and earn trust through demonstrated performance
- You earn trust first, through architectural transparency
- Only then can you deploy agents and expand their authority as the trust compounds
Build the decision trace infrastructure. Prove that every agent recommendation can be audited. Demonstrate that constraints are enforced regardless of what agents suggest. Show operators exactly why an agent reached its conclusion. Then, once trust is architectural rather than aspirational, expand agent authority.
Enterprise AI earns trust through iteration. Industrial AI earns trust through transparency.
This isn't a nuance. It changes architecture, go-to-market strategy, implementation approach, and business model. Most AI vendors are building for the enterprise flywheel. Industrial operations needs the inverse.
The Industry Is Investing in the Wrong Ten Percent
Our experience building and deploying XMPro Multi Agent Generative Systems (MAGS) over the past year and a half taught us something important: true multi-agent systems are approximately ninety percent business process intelligence and ten percent language model capability.
The ninety percent includes:
- Coordination frameworks that enable agents to work together without conflicts
- Governance structures that enforce boundaries and capture approvals
- Human collaboration protocols that handle handoffs and escalations
- Decision trace infrastructure that preserves reasoning for audit and learning
- Semantic understanding of what operational data actually means
The ten percent is the language model. The part that generates text and reasons through prompts.
Now look at where the industry is investing. The venture capital, the research papers, the conference talks, the technical benchmarks. Almost all of it focuses on the ten percent. Better models. Larger context windows. More sophisticated reasoning. Improved benchmarks.
The industry is optimizing the part that matters least while ignoring the part that matters most.
BCG published research this year concluding that "the limiting factors for agents aren't LLMs, but legacy systems and processes." They're right, but I'd go further. The limiting factor isn't legacy systems. It's that we're not building the infrastructure to capture reasoning.
Our historians capture what happened. Our CMMS captures what work was done. Our control systems capture what actions were taken. Nothing captures why. The reasoning that connects observation to decision to action has never been treated as data worth preserving.
Experienced operators carry decades of decision traces in their heads. When they see a particular vibration pattern, they don't just recognize it. They remember the last three times they saw something similar, what they tried, what worked, what failed. They remember the shift supervisor who taught them that "when this pump sounds like that, you've got about six hours." They remember the exception they made in 2019 that turned out to be wrong.
That reasoning walks out the door every time someone retires. And we're spending billions on better language models instead of capturing it.
Trust Is Architectural, Not Persuasive
The standard approach to AI adoption treats trust as a change management challenge. Run pilots to demonstrate value. Train users on the new tools. Communicate success stories. Address concerns through engagement.
This works when the worst case is inefficiency. It fails completely when the worst case is catastrophic.
I watch operations teams in demos. They see an agent diagnose a complex equipment issue in seconds. They're impressed. Then they ask their question: "How do I know it won't make things worse?" And nobody has a good answer.
The honest answer is: you don't know. The agent might be right. It might be confidently wrong in ways you won't discover until something breaks. And the agent can't tell you which situation you're in.
You cannot persuade your way past this problem. You have to architect your way past it.
Building architectural trust requires three things:
- Visible reasoning. Every recommendation comes with the evidence chain that produced it. Operators see what the agent observed, what patterns it recognized, what precedents it retrieved, what tradeoffs it weighed. Not a summary. The actual reasoning.
- Separation of cognition and control. Let agents reason freely and recommend confidently. But route all actions through a control layer that enforces constraints regardless of what agents suggest. Dangerous actions get blocked. Period. The agent decides, but the system enforces.
- Human overrides as learning signals. When an operator rejects an agent recommendation, that rejection contains information. What did the operator see that the agent missed? Capture that systematically and the system improves. Ignore it and you've added a veto without adding intelligence.
Trust becomes architectural when operators don't need to trust the agent. They trust the system that makes agent reasoning transparent and enforces boundaries regardless of what agents recommend.
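To make the second and third requirements concrete, here is a minimal sketch of a control layer that enforces hard constraints regardless of what an agent recommends, and that records operator overrides as learning signals. The class names, fields, and limit values are hypothetical, chosen purely for illustration; this is not the XMPro MAGS API.

```python
from dataclasses import dataclass, field

# Hypothetical names for illustration only; not the XMPro MAGS API.

@dataclass
class Recommendation:
    action: str                 # e.g. "increase_pump_speed"
    target: str                 # e.g. asset tag "P-101"
    value: float
    reasoning: list[str]        # the visible evidence chain shown to operators

@dataclass
class ControlLayer:
    """Enforces hard constraints no matter what the agent recommends."""
    limits: dict[str, tuple[float, float]]      # asset -> (min, max) allowed setpoint
    overrides: list[dict] = field(default_factory=list)

    def evaluate(self, rec: Recommendation) -> str:
        lo, hi = self.limits.get(rec.target, (float("-inf"), float("inf")))
        if not (lo <= rec.value <= hi):
            return "BLOCKED"            # dangerous actions are blocked, period
        return "PENDING_APPROVAL"       # safe actions still route to a human at low autonomy

    def record_override(self, rec: Recommendation, operator: str, reason: str) -> None:
        # A rejection is a learning signal: what did the operator see that the agent missed?
        self.overrides.append({"action": rec.action, "target": rec.target,
                               "operator": operator, "reason": reason})

# Usage: the agent reasons freely, but the control layer decides what is allowed.
control = ControlLayer(limits={"P-101": (0.0, 1800.0)})
rec = Recommendation("increase_pump_speed", "P-101", 2100.0,
                     reasoning=["vibration trending up", "similar precedent last quarter"])
print(control.evaluate(rec))   # BLOCKED: exceeds the hypothetical 1800 hard limit
```

The property that matters is that the limit check lives outside the agent: the agent can be confidently wrong, and the system still refuses to act.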
A New Data Infrastructure Paradigm
Looking across everything I've written and thought about this year, I see a single thread connecting it all. We need infrastructure that captures reasoning, not just state.
Our current data infrastructure assumes the valuable thing is what happened:
- Sensor readings at points in time
- Transaction records
- Event logs
- State snapshots
Agentic operations reveals that the valuable thing is why it happened:
- What observations triggered evaluation
- What context was gathered
- What rules were considered and where they conflicted
- What exceptions were weighed and what precedent supported them
- What approval was obtained and from whom
- What outcome resulted and how it compared to prediction
Jaya Gupta and Ashu Garg call this a "context graph." I've started calling it a DecisionGraph rather than my original BrainGraph, because the name matters. It's not a knowledge graph (static relationships). It's not accumulated context. It's a graph of decisions with complete reasoning chains. It is part of the XMPro MAGS solution, enabling cognitive agents to understand decision traces.
This isn't a feature to add to existing systems. It's a different way of thinking about what operational data infrastructure should capture. In the past, we've built infrastructure optimized for answering "what." We need infrastructure optimized for answering "why."
Zero-copy architecture matters here. You can't wait for ETL pipelines to move data into analytical systems before reasoning about it. The reasoning has to happen where the data lives, in real time. Semantic understanding matters too. Raw tag values don't carry meaning. Agents need to understand that a tag like PI-2047 reading 347.2, in the context of current operating mode and recent history, means something specific about equipment health.
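To make this concrete, here is a minimal sketch of what a single node in such a graph might capture, following the elements listed above. The field names and example values are illustrative assumptions, not the actual XMPro DecisionGraph schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Illustrative field names only; not the actual XMPro DecisionGraph schema.

@dataclass
class DecisionTrace:
    """One node in a decision graph: a decision with its complete reasoning chain."""
    decision_id: str
    timestamp: datetime
    observations: list[str]          # what triggered evaluation, with semantic context
    context: dict[str, str]          # operating mode, recent history, related assets
    rules_considered: list[str]      # which rules applied, and where they conflicted
    exceptions_weighed: list[str]    # exceptions considered and their precedents
    approval: str | None             # who approved, or None if autonomous
    recommendation: str
    predicted_outcome: str | None = None
    outcome: str | None = None       # filled in later, linked back to the reasoning
    parents: list[str] = field(default_factory=list)   # earlier decisions this one builds on

trace = DecisionTrace(
    decision_id="D-0042",
    timestamp=datetime(2025, 11, 18, 6, 42),
    observations=["PI-2047 at 347.2, above the normal band for the current operating mode"],
    context={"mode": "ramp-up", "recent_history": "two similar excursions this week"},
    rules_considered=["max discharge pressure rule", "ramp-up exception (conflicts)"],
    exceptions_weighed=["earlier ramp-up exception, later judged incorrect"],
    approval="shift supervisor",
    recommendation="throttle back and schedule inspection within six hours",
    predicted_outcome="pressure returns to band within 30 minutes",
)
```

The point of the structure is that the outcome is linked back to the reasoning that produced it, so every trace can later validate or invalidate the patterns behind it.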
The organizations that build this infrastructure will have something competitors can't easily replicate: accumulated operational intelligence that compounds over time. Every decision captured. Every outcome linked back to the reasoning that produced it. Every human override preserved as a learning signal. (Contact me or Gavin Green if you want to understand how we do it.)
What 2026 Demands
Three shifts will define industrial AI and Agentic Operations in 2026.
The measurement conversation changes. Organizations will stop accepting model accuracy and inference speed as evidence of value. They'll demand evidence that agents improve operational outcomes: uptime, throughput, quality, cost. Technology metrics become irrelevant. Business outcomes become everything.
I see elite customers like BHP approach this with the discipline it deserves. They don't ask "what can AI do?" They ask "which value bucket does this address?" If you can't name the operational improvement and quantify the impact, you don't get past the first conversation. This rigor will spread.
Governance becomes architecture. The gap between adoption and autonomy (ninety-six percent want agents, sixty-seven percent refuse full control) forces a resolution. The answer isn't convincing the sixty-seven percent to accept more risk. It's building governance into the platform so that autonomy can expand as trust is earned.
What I call Human Agency Controls will become standard. Not as features but as architectural assumptions. Organizations will progress through levels of autonomy on single platforms that maintain operational coherence:
- HAS 1-2 (Decision Support): "Tell me what is happening" through real-time monitoring and alerting
- HAS 3-4 (Decision Augmentation): "Advise me what to do" through AI-powered recommendations with human-in-the-loop decision making
- HAS 5 (Decision Automation): "Do it for me autonomously" through multi-agent coordination with human-on-the-loop oversight for exceptions
The same system that captures manual decisions at low autonomy will learn from human-approved recommendations at medium autonomy and audit autonomous decisions at high autonomy.
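As a minimal sketch of what that looks like in practice, the routing below sends the same recommendation down different paths depending on the autonomy level, assuming the HAS levels described above. The function and level names are illustrative only, not the XMPro implementation.

```python
from enum import IntEnum

# Hypothetical routing logic for illustration; not the XMPro implementation of HAS levels.

class HASLevel(IntEnum):
    DECISION_SUPPORT = 2       # HAS 1-2: tell me what is happening
    DECISION_AUGMENTATION = 4  # HAS 3-4: advise me what to do, human-in-the-loop
    DECISION_AUTOMATION = 5    # HAS 5: do it for me, human-on-the-loop for exceptions

def route(recommendation: str, level: HASLevel, is_exception: bool = False) -> str:
    """The same recommendation flows through one platform; only the authority changes."""
    if level <= HASLevel.DECISION_SUPPORT:
        return f"ALERT operator: {recommendation}"                    # monitoring and alerting only
    if level <= HASLevel.DECISION_AUGMENTATION:
        return f"RECOMMEND and wait for approval: {recommendation}"   # human-in-the-loop
    if is_exception:
        return f"ESCALATE to human-on-the-loop: {recommendation}"     # exceptions still surface
    return f"EXECUTE autonomously and record trace: {recommendation}"

print(route("throttle back pump P-101", HASLevel.DECISION_AUGMENTATION))
```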
Causal Agents emerge as a category. We already use Composite AI (Causal AI, Predictive AI, Generative AI, First Principles Models, and Symbolic AI working together) in production, in projects where MAGS agents run safe operations in control rooms. These are agents that understand not just patterns but mechanisms. Agents that reason about counterfactuals: what would have happened if we'd acted differently? Agents that test their conclusions against physical constraints before recommending action.
In 2026, the broader industry will start to understand how this works. As organizations see Causal Agents operating transparently, they'll gain trust and adopt them more extensively.
This is what separates genuine operational intelligence from sophisticated pattern matching. Judea Pearl's Ladder of Causation provides the framework: moving from association (seeing patterns) through intervention (understanding effects) to counterfactuals (reasoning about alternatives). The agents that operate across all three levels will outperform those stuck at pattern recognition.
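A toy example helps show the difference between the rungs. The variables and mechanisms below are invented purely for illustration; they are not a real equipment model or anything from our production systems.

```python
# Toy structural causal model illustrating Pearl's three rungs.

def vibration(bearing_wear: float, load: float) -> float:
    return 2.0 * bearing_wear + 0.5 * load        # assumed mechanism, for illustration

def failure_risk(vib: float) -> float:
    return min(1.0, vib / 10.0)                   # assumed mechanism, for illustration

# Rung 1 (association): we simply observe that vibration of 8.0 goes with high risk.
vib_obs, load_obs = 8.0, 4.0
print("observed risk:", failure_risk(vib_obs))                        # 0.8

# Rung 2 (intervention): if we *set* the load to 2.0 on a unit with wear 3.0,
# the mechanism tells us the effect of the action, not just the correlation.
print("risk under do(load=2):", failure_risk(vibration(3.0, 2.0)))    # 0.7

# Rung 3 (counterfactual): given what we actually observed, infer the hidden wear
# (abduction), then replay the *same* unit under the action we did not take.
wear_inferred = (vib_obs - 0.5 * load_obs) / 2.0                      # 3.0
print("risk had we reduced load:", failure_risk(vibration(wear_inferred, 2.0)))
```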
The Divide Ahead
David Shapiro uses a framework I find clarifying. He maps transformative technologies through orders of consequence, from first-order (direct applications) through second-order (systems built on the capability) to third-order (the point of no return where society becomes dependent).
For electricity, the third-order consequence was radio and motors. After that point, there was no going back. The technology became infrastructure.
For AI, we're somewhere between second and third order. Chatbots were first-order. Agents are second-order. Networks of agents coordinating across organizational boundaries will be third-order. We're approaching the point of no return.
The competitive divide that's forming isn't about who has the smartest agents. It's about who builds the infrastructure that makes agent-driven operations trustworthy.
The organizations investing in decision trace infrastructure now will accumulate operational intelligence that compounds. Each day of operation adds more reasoning to query. Each human override improves the system. Each outcome validates or invalidates the patterns that produced it.
Those who wait for the technology to mature will find themselves trying to catch up against competitors who've been accumulating this intelligence for years. The gap won't close. It will widen.
This isn't about adopting new technology. It's about building organizational capabilities for a world where operational decisions happen at machine speed with human oversight rather than human control.
The time to start is now. Not because the technology is ready (it mostly is) but because the accumulated learning takes time. You can't buy three years of decision traces. You have to generate them through operation.
Those who understand this are already building. Those who don't will discover, too late, that their competitors' agents have learned things theirs never will. That will be 2026 for Agentic Operations in Industrial Enterprises.
References
External Sources
Michael Carroll, "2025: The Year of Agent Washing" — on Arthur Kordon's axioms for industrial AI https://www.linkedin.com/pulse/2025-year-agent-washing-michael-carroll-fsyze/
Jaya Gupta and Ashu Garg, "AI's trillion-dollar opportunity: Context graphs" — Foundation Capital https://www.linkedin.com/pulse/ais-trillion-dollar-opportunity-context-graphs-jaya-gupta-cobue/
Jaya Gupta and Animesh Koratana, "How do you build a context graph?" https://www.linkedin.com/pulse/how-do-you-build-context-graph-jaya-gupta-xicwe/
Niels Erik Andersen, "Autonomous Operations: AI with Guardrails" — LNS Research https://blog.lnsresearch.com/autonomous-operations-ai-with-guardrails
BCG AI Platforms Group, "Building Effective Enterprise Agents" https://www.bcg.com/assets/2025/building-effective-enterprise-agents.pdf
David Shapiro, "ChatGPT was just the 'Lightbulb' Moment for AI" https://youtu.be/btaLViZ_bOE
Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (Basic Books, 2018)
Related Articles from The Digital Engineer
Decision Traces for Agentic Operations: Why Agents Need Operational Memory https://www.linkedin.com/pulse/decision-traces-agentic-operations-why-agents-need-van-schalkwyk-vhqmc
Context Graphs in Industrial Operations: Different Stakes, Different Architecture https://www.linkedin.com/pulse/context-graphs-industrial-operations-different-stakes-van-schalkwyk-mgysc/
The Carroll Industrial AI Agent Framework: Evaluating True AI Agency https://www.linkedin.com/pulse/carroll-industrial-ai-agent-framework-evaluating-true-van-schalkwyk-n5duc/
Human Agency Controls: Why 96% of Organizations Need Dynamic Authority Over AI Agents https://www.linkedin.com/pulse/human-agency-controls-why-96-organizations-need-over-ai-pieter-tfwdc/
Enterprise Agents Don't Need Better Models. They Need Better Plumbing. https://www.linkedin.com/pulse/enterprise-agents-dont-need-better-models-plumbing-van-schalkwyk-5k7jc/
Stop Building Smart Agents. Start Building Connected Ones. https://www.linkedin.com/pulse/stop-building-smart-agents-start-connected-ones-pieter-van-schalkwyk-7ot9c/
Beyond the Hype: Why Most 'AI Agents' Are Just Workflows in Disguise https://www.linkedin.com/pulse/beyond-hype-why-most-ai-agents-just-workflows-pieter-van-schalkwyk-2fndc/
Pieter van Schalkwyk is the CEO of XMPro, specializing in industrial AI agent orchestration and governance. XMPro MAGS with APEX provides cognitive architecture and DecisionGraph capabilities for agent networks operating on existing industrial systems.
Our GitHub repo has more technical information. You can also contact me or Gavin Green to learn more.
Read more on MAGS at The Digital Engineer
