
When Your Senior Engineer Asks: “Can We Build This in Claude Code?”


Pieter van Schalkwyk

CEO at XMPRO

This article originally appeared on the XMPro CEO's LinkedIn blog, The Digital Engineer

A senior technical team member asked me last week if we could build something like XMPro MAGS using Claude Code. It's a smart question. Claude Code can orchestrate workflows, coordinate tasks, and manage complex operations. On the surface, it looks like it could replace a multi-agent runtime.

Then Anthropic published an article about NASA using Claude to plan the Mars Perseverance Rover's path across the Martian surface. For the first time, AI helped plan a route for a drive on another planet. Engineers at JPL used Claude to analyze overhead images, understand rover constraints, and plan waypoints. The system cut route planning time in half.


The question isn't whether Claude Code is capable. The question is whether session-based execution can do what production runtimes do. To answer this properly, you need to go back to first principles. What are the fundamental requirements that determine whether AI operates as a tool or as an autonomous runtime?

NASA's Mars rover achievement shows exactly when session-based AI works well. It also reveals the first principles that make runtime-based architecture essential for different operational contexts.

Two Different Execution Models

Session-based execution operates in discrete episodes. A human (or external script) initiates a session, provides a task, the AI executes that task, and the session ends. Between sessions, the AI is dormant. Think of it as a tool you pick up when needed, use for a specific purpose, then put down. Claude Code exemplifies this model: engineers start a session, Claude analyzes and plans, engineers validate the results, and the session concludes.

Runtime-based execution operates continuously, 24/7. The system starts when deployed and runs until deliberately shut down. AI agents continuously observe their environment, make decisions, and take actions automatically without human initiation of each task. Think of it as an operational system that never stops, like manufacturing equipment control or continuous quality monitoring. MAGS exemplifies this model: agents run persistent cycles, making thousands of decisions daily without human intervention for routine operations.

The distinction seems simple. The implications are profound. Seven first principles reveal why these aren't just different implementation choices, but fundamentally different architectures suited to fundamentally different operational needs.

First Principle 1: Temporal Continuity

The most fundamental question is simple: when does the AI operate?

Session-Based: NASA's Rover

NASA's rover drives once per Martian day. Each drive is a discrete event. Engineers initiate a Claude session for sol 1707, provide overhead images and constraints, iterate on the route plan, and validate the result. When the plan is ready, they convert it to rover commands. The session ends.

Between drives, Claude isn't running. It's not monitoring the rover, not analyzing telemetry, not planning future routes. It exists in a state of potential, waiting to be activated. This discrete execution model matches NASA's operational cadence perfectly. Rover drives are infrequent, carefully planned, and benefit from fresh human judgment.

While Claude Code can be wired into longer-running loops or background tasks via external schedulers, the fundamental model remains session-based: discrete, human-initiated executions rather than continuous autonomous operation.

Runtime-Based: Industrial Operations

Industrial equipment runs continuously. A bearing shows wear patterns at 3 AM on Sunday. Vibration increases at 11 PM on Friday. Quality measurements drift during the night shift. These situations don't wait for someone to initiate a monitoring session.

Runtime systems operate in continuous cycles. Every 15 to 60 seconds, agents observe their environment, reflect on observations, plan actions, execute decisions, and communicate with team members. This happens automatically, 24/7, without human initiation. The system starts when deployed and runs until deliberately shut down.
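The observe-reflect-plan-execute-communicate cycle described above can be sketched as a plain loop. This is an illustrative Python sketch under stated assumptions, not MAGS code: the `DemoAgent`, its methods, and the shortened cycle time are all hypothetical.

```python
import threading
import time

CYCLE_SECONDS = 0.01  # illustrative; the article cites 15-60 second cycles in production

class DemoAgent:
    """Minimal stand-in agent; a real agent would read telemetry, plan, and act."""
    def __init__(self):
        self.cycles = 0
    def observe(self):
        return {"vibration": 0.4}          # e.g. a sensor reading
    def reflect(self, obs):
        return obs                          # update internal state/memory
    def plan(self, insights):
        return "no_action" if insights["vibration"] < 0.5 else "diagnose"
    def execute(self, plan):
        self.cycles += 1                    # act (here: just count cycles)
    def communicate(self, plan):
        pass                                # share with team members

def run_agent(agent, stop_flag):
    # Continuous runtime loop: the agent acts without human initiation,
    # repeating observe -> reflect -> plan -> execute -> communicate
    # until deliberately shut down.
    while not stop_flag.is_set():
        obs = agent.observe()
        insights = agent.reflect(obs)
        plan = agent.plan(insights)
        agent.execute(plan)
        agent.communicate(plan)
        time.sleep(CYCLE_SECONDS)

stop = threading.Event()
agent = DemoAgent()
t = threading.Thread(target=run_agent, args=(agent, stop))
t.start()
time.sleep(0.2)            # let it run for a while
stop.set()                 # "deliberately shut down"
t.join()
print(agent.cycles > 1)    # True: many cycles ran without re-initiation
```

The key property is in the loop itself: nothing outside the runtime triggers each cycle, which is the temporal-continuity contrast with a human-initiated session.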

The Implication

Temporal continuity determines autonomy over timing. Session-based systems give humans control over when AI operates. Runtime-based systems give AI control over timing within defined parameters.

For NASA's rover: human control over timing is optimal. For industrial equipment: continuous operation is essential.

First Principle 2: Locus of Control

Who decides what tasks the AI executes?

Session-Based: Human Task Selection

In session-based systems, humans identify needs and initiate tasks. NASA engineers decided route planning was needed for sol 1707. They initiated Claude Code, provided specific parameters, and Claude executed that task. Claude didn't decide "route planning is the highest priority right now" or "I should also analyze rover health." Engineers controlled scope, timing, and objectives.

This human-controlled task selection is optimal for NASA's context. Engineers have broader mission context, understand competing priorities, and make strategic decisions about how to use limited rover time. Claude augments their decision-making without replacing their judgment about what needs to be done.

Runtime-Based: Autonomous Task Selection

In runtime systems, AI agents decide what tasks to execute based on observations and objectives. Humans set high-level objectives like "maximize equipment uptime while minimizing maintenance cost" and define constraints like "safety score must stay above 0.90." Within these bounds, agents autonomously decide what actions to take and when.

A maintenance agent continuously observes equipment, decides when analysis is needed, determines which diagnostics to run, evaluates maintenance options, and executes decisions without human initiation of each task. If vibration increases, the agent decides to run diagnostics. If diagnostics indicate bearing wear, the agent decides to schedule maintenance. If timing conflicts with production schedules, the agent coordinates with other agents.
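The decision chain above (vibration rises, run diagnostics, wear confirmed, schedule maintenance) can be sketched as a small selection function. The thresholds and task names are hypothetical, chosen only to illustrate autonomous task selection within human-set bounds.

```python
# Hypothetical decision chain for an autonomous maintenance agent.
# Thresholds and task names are illustrative, not MAGS internals.

VIBRATION_ALERT = 0.7   # mm/s RMS above which diagnostics are triggered
WEAR_THRESHOLD = 0.6    # diagnostic wear score that warrants maintenance

def decide(vibration, wear_score=None):
    """Return the task the agent selects for itself, without human initiation."""
    if vibration < VIBRATION_ALERT:
        return "continue_monitoring"
    if wear_score is None:
        return "run_diagnostics"        # agent decides analysis is needed
    if wear_score >= WEAR_THRESHOLD:
        return "schedule_maintenance"   # coordinate timing with other agents
    return "continue_monitoring"

print(decide(0.5))          # continue_monitoring
print(decide(0.9))          # run_diagnostics
print(decide(0.9, 0.75))    # schedule_maintenance
```

Humans set the thresholds and objectives; the agent decides, every cycle, which of the permitted tasks the current observations call for.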

The Implication

Locus of control determines who has agency over task selection. Session-based systems optimize for human judgment. Runtime systems optimize for autonomous response.

First Principle 3: Operational Scale

How many decisions does the system need to make?

Session-Based: Low Decision Volume

NASA's rover makes one drive decision per sol. Maybe two or three if conditions require mid-course adjustments. Over a year, that's 365 to 1,000 decisions. Each decision is high-stakes, novel, and benefits from human analysis and validation. Session-based execution handles this volume easily.

Runtime-Based: High Decision Volume

Industrial operations make thousands of decisions daily. Equipment monitoring checks run every minute across hundreds of assets. Quality measurements happen continuously on multiple production lines. Resource allocation adjusts throughout each shift. Maintenance scheduling coordinates across facilities. Production optimization responds to changing conditions.

A single plant might require 10,000 autonomous decisions per day. You cannot initiate Claude sessions for every vibration check, every quality measurement, every resource allocation decision. The volume alone makes it impractical. More fundamentally, these decisions happen when humans aren't available to initiate sessions.

The Implication

Operational scale determines whether human-triggered sessions are feasible. Low decision frequency works with session-based execution. High decision frequency requires autonomous runtime execution.

First Principle 4: State Persistence

What does the system remember across time?

Session-Based: Discrete Memory

Session-based systems reset between tasks. Each Claude Code session starts fresh. You can maintain some context through external memory or project files, but the fundamental session-based nature remains. The AI operates when triggered, accumulates context during the session, then that session ends.

For NASA's rover, session-based memory works fine. Route planning doesn't require accumulated knowledge of previous drives. Each planning task is relatively independent. Engineers provide relevant context for each session.

Runtime-Based: Continuous Memory

Runtime systems maintain continuous state. Agents accumulate observations over time, build models of equipment behavior, learn from outcomes, and improve decision-making. Every cycle adds to persistent memory. The system doesn't forget what it learned yesterday when a new decision is needed today.

For industrial operations, continuous memory is essential. Equipment failure patterns emerge over weeks. Quality trends develop over shifts. Maintenance effectiveness is measured over months. Agents need persistent memory of what they've observed, what actions they've taken, and what outcomes resulted.

The Implication

State persistence determines whether the system can learn from experience over time or must be taught anew each session.

First Principle 5: Decision Authority and Execution

Can the AI act independently or only recommend?

Session-Based: Advisory Authority

Session-based systems typically have recommendation authority but not execution authority. They suggest actions, humans decide whether to execute. NASA's use of Claude exemplifies this: Claude planned waypoints, engineers reviewed the plan, made adjustments based on ground-level camera images, ran simulations with 500,000 variables, and then decided to transmit commands to Mars. Claude had zero execution authority.

This human-in-the-loop validation is optimal for NASA's context. The stakes are enormous (a stuck rover ends the mission), the environment is unforgiving (no second chances on Mars), and human expertise adds critical value. Every decision deserves human validation.

The limitation is scalability. Human validation works when decisions are infrequent (one route per sol) and stakes are high (mission-critical). It doesn't scale to thousands of decisions daily.

Runtime-Based: Bounded Execution Authority

Runtime systems have bounded execution authority. They can make and execute decisions independently within defined constraints, escalating to humans only when approaching boundaries.

A MAGS agent monitoring equipment temperature can autonomously adjust setpoints within safe ranges (180-190°C). If temperature is 185°C and trending up, the agent decides "reduce to 183°C" and executes that action via XMPro DataStream. No human approval required. The decision is logged with reasoning, confidence score, and outcome, but execution is immediate and autonomous.

However, if temperature approaches critical thresholds (195°C, near the 200°C safety limit), the agent escalates: "Temperature approaching critical limit, recommend immediate shutdown, awaiting approval." Bounded autonomy means acting independently within safe ranges, requiring approval for critical decisions.
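The temperature example can be reduced to a few lines of logic. This is a sketch using the article's illustrative values (a 180-190°C operating band, escalation near the 200°C safety limit); the function and its return shape are hypothetical.

```python
# Sketch of bounded execution authority. Band and limits are the article's
# illustrative values; the action names are hypothetical.

SAFE_LOW, SAFE_HIGH = 180.0, 190.0   # agent may act autonomously in this band
ESCALATE_AT = 195.0                  # approaching the 200 C safety limit

def act_on_temperature(temp_c, trending_up):
    """Return (action, requires_approval) under bounded autonomy."""
    if temp_c >= ESCALATE_AT:
        # Critical: recommend, don't execute -- human approval required
        return ("recommend_shutdown", True)
    if SAFE_LOW <= temp_c <= SAFE_HIGH and trending_up:
        # Routine: execute immediately, log reasoning and outcome
        return (f"reduce_setpoint_to_{temp_c - 2:.0f}", False)
    return ("no_action", False)

print(act_on_temperature(185.0, True))    # ('reduce_setpoint_to_183', False)
print(act_on_temperature(196.0, False))   # ('recommend_shutdown', True)
```

The boundary is encoded in the code path, not in the agent's goodwill: inside the band, execution is immediate; near the limit, the only available action is a recommendation.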

This bounded execution authority is essential for industrial operations with high decision volume. Equipment adjustments, resource allocations, quality checks, and routine optimizations happen thousands of times daily. Human approval for each is impractical.

The Implication

Decision authority determines operational autonomy. Session-based systems position AI as advisory, optimal when every decision deserves human validation. Runtime systems position AI as operational, essential when decision volume requires autonomous execution within governance.

First Principle 6: Coordination and Multi-Agent Collaboration

How do multiple AI instances work together?

Session-Based: Human-Coordinated Tasks

Session-based systems typically center on one AI instance per session, with humans or external scripts coordinating across sessions when multiple analyses are needed. In NASA's Mars route-planning workflow with Claude Code, engineers used a bounded planning process where Claude generated waypoints and commands, and humans validated them in simulation before execution.

When additional analyses are required (route planning, power checks, science target evaluation), they are run as separate tasks or sessions, with engineers integrating the results. While multiple Claude Code sessions can run in parallel (multiple terminals or CLI sessions), the coordination between them requires human oversight or custom orchestration code.

This human-coordinated approach works well when the number of concurrent tasks is manageable, human judgment adds mission value, and strict real-time coordination between AI processes is not required.

The limitation is that, by default, AI instances do not maintain an autonomous coordination or consensus layer. If one analysis recommends one path and another suggests a different option, humans or custom orchestration code must reconcile the conflict. There is no built-in AI-to-AI negotiation or distributed consensus mechanism. Any such behavior must be explicitly engineered on top of the session model.

Runtime-Based: Autonomous Multi-Agent Coordination

Runtime systems support true multi-agent coordination where independent agents work collaboratively. Multiple agents run simultaneously, communicate asynchronously through message brokers, negotiate conflicts, and reach consensus without human coordination.

Consider a manufacturing scenario: Process Engineer Agent detects efficiency drop and proposes increasing throughput. Maintenance Agent detects equipment wear and proposes reducing load. Safety Agent monitors both and enforces safety constraints. Their objectives conflict.

In MAGS, agents initiate distributed consensus. Each proposes a plan independently, the system detects resource conflicts and objective function disagreements, agents iteratively adjust plans across multiple rounds (collaborative iteration), and consensus is reached on a coordinated solution that balances all objectives. This happens automatically, through distributed consensus protocols implemented via asynchronous MQTT messaging.
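A toy sketch can show the shape of collaborative iteration. Real MAGS consensus runs over asynchronous MQTT messaging between distributed agents; here three in-process "agents" converge on a shared load setting, and all numbers, names, and the averaging rule are illustrative assumptions.

```python
# Toy sketch of collaborative iteration toward consensus. All values are
# illustrative; the real protocol is distributed and message-based.

proposals = {
    "process_engineer": 110.0,   # wants higher throughput (% of nominal load)
    "maintenance":       85.0,   # wants reduced load due to equipment wear
}
SAFETY_MAX = 100.0               # safety agent's hard constraint

for round_num in range(5):       # bounded number of negotiation rounds
    target = sum(proposals.values()) / len(proposals)
    target = min(target, SAFETY_MAX)   # safety constraint always enforced
    # each agent moves part-way toward the shared target (collaborative iteration)
    proposals = {k: v + 0.5 * (target - v) for k, v in proposals.items()}

consensus = min(sum(proposals.values()) / len(proposals), SAFETY_MAX)
print(round(consensus, 1))   # 97.5 -- a load that balances both objectives
```

The point is not the averaging rule (a real objective-function negotiation is richer) but that conflict resolution happens agent-to-agent, inside the runtime, with the safety constraint enforced on every round.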

This multi-agent coordination is essential for complex industrial operations where multiple objectives must be balanced, resources are shared, and decisions are interdependent. Human coordination of multiple AI sessions doesn't scale to dozens of agents making thousands of coordinated decisions daily.

The Implication

Coordination model determines complexity handling. Session-based systems rely on human coordination or external orchestration between AI instances, optimal for scenarios with manageable concurrent tasks. Runtime systems enable autonomous AI-to-AI coordination built into the architecture, essential for complex scenarios with many interdependent decisions requiring real-time consensus.

First Principle 7: Optimization Approach

How does the AI evaluate options and make selections?

Session-Based: Heuristic LLM Reasoning

Session-based systems typically use LLM reasoning for optimization: heuristic, qualitative evaluation based on the model's training and provided context. Claude's route planning used LLM reasoning, analyzing images, identifying obstacles, evaluating path options, and selecting routes based on learned patterns and provided constraints.

This heuristic optimization is powerful and flexible. Claude can handle novel terrain, adapt to unexpected obstacles, and apply general reasoning to specific situations. The engineers noted that Claude's plans "held up well" with only minor adjustments needed based on ground-level camera images.

The characteristics of LLM-based optimization include non-determinism (running the same route planning task multiple times might produce different routes due to LLM sampling) and qualitative evaluation (routes that "look good" based on reasoning rather than mathematical proof of optimality). For NASA's use case, this is entirely appropriate: human validation catches issues, and "reasonable" routes are sufficient.

Runtime-Based: Mathematical Optimization

Runtime systems use mathematical optimization through utility and objective functions: deterministic, quantitative evaluation that produces consistent results.

A MAGS agent evaluating maintenance options calculates utility for each:

  • Option A (defer 1 week): Utility = 0.65 (risk increasing, cost savings moderate)
  • Option B (maintain now): Utility = 0.82 (risk minimized, cost acceptable)
  • Option C (emergency maintenance): Utility = 0.58 (risk eliminated but cost excessive)

The agent selects Option B because it mathematically maximizes the objective function (0.4×uptime + 0.3×cost + 0.3×safety). This decision is deterministic. The same situation always produces the same decision. It's mathematically optimal within the defined objective function. And it's explainable: the agent can show exactly why Option B scored highest.
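The calculation above can be made concrete. The weights are the article's (0.4 uptime, 0.3 cost, 0.3 safety); the per-option component scores below are illustrative back-fits chosen to reproduce the article's utilities, not real model outputs.

```python
# Worked version of the weighted objective function from the article:
# utility = 0.4*uptime + 0.3*cost + 0.3*safety, each score in [0, 1].
# Component scores are hypothetical back-fits to the article's utilities.

WEIGHTS = {"uptime": 0.4, "cost": 0.3, "safety": 0.3}

def utility(scores):
    """Deterministic, explainable evaluation: same inputs, same result."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

options = {
    "A_defer_1_week": {"uptime": 0.80, "cost": 0.85, "safety": 0.25},
    "B_maintain_now": {"uptime": 0.85, "cost": 0.65, "safety": 0.95},
    "C_emergency":    {"uptime": 0.55, "cost": 0.20, "safety": 1.00},
}

scored = {name: round(utility(s), 2) for name, s in options.items()}
best = max(scored, key=scored.get)
print(scored)   # {'A_defer_1_week': 0.65, 'B_maintain_now': 0.82, 'C_emergency': 0.58}
print(best)     # B_maintain_now -- mathematically maximizes the objective
```

Because the evaluation is a fixed weighted sum, the agent can show exactly which component scores drove the ranking, which is the auditability property the article emphasizes.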

MAGS also incorporates LLM reasoning for natural language understanding and complex analysis (approximately 10% of the codebase), but uses mathematical optimization for decision evaluation to ensure consistency and auditability.

This mathematical optimization is essential for industrial operations requiring consistency, auditability, and precision. When optimizing production schedules, resource allocations, or quality parameters, deterministic mathematical optimization provides repeatability and explainability that heuristic LLM reasoning alone cannot match.

The Implication

Optimization approach determines precision and consistency. Session-based LLM reasoning excels at novel situations requiring flexible judgment, optimal when variability is acceptable and human validation is available. Runtime mathematical optimization excels at repeatable decisions requiring precision, essential when consistency and explainable optimality are required.

What These Principles Reveal

The seven first principles lead to precise architectural definitions that capture the fundamental differences:

Claude Code is a session-based, human-directed agentic workflow environment where AI operates in discrete, human-initiated sessions, executes specific tasks via configurable tools and workflows, and produces outputs for human review, validation, and follow-up action.
MAGS is a continuous, goal-directed agentic runtime where autonomous AI agents collaborate in teams to make and execute operational decisions 24/7 within clearly bounded autonomy, combining mathematical optimization with generative AI reasoning.

These aren't marketing distinctions. They're architectural properties that emerge directly from the first principles. Gartner's recent framework for Agentic Orchestration Platforms validates this distinction, specifically identifying "Agentic Runtime Fabric" as the execution engine that "carries out plans with speed, accuracy, and financial discipline" - fundamentally different from control planes where work is merely defined.

[Image: XMPro MAGS Team Running Safe Operations in Process Plant]

[Image: XMPro Agents 24/7 Team in Agentic Runtime working to Objective Function]

Claude Code optimizes for:

  • Temporal discreteness (sessions triggered by humans)
  • Human control over task selection
  • Low decision volume (tens of decisions requiring deep analysis)
  • Session-based memory (fresh context each time)
  • Advisory authority (humans validate and execute)
  • Human or external orchestration of tasks (coordination requires custom engineering)
  • Heuristic LLM reasoning (flexible, qualitative evaluation)

MAGS optimizes for:

  • Temporal continuity (always-on operation)
  • Autonomous control within bounds (agents decide what needs attention)
  • High decision volume (thousands of routine decisions daily)
  • Persistent memory (continuous learning from outcomes)
  • Bounded execution authority (autonomous action within governance)
  • Built-in multi-agent coordination (distributed consensus without external orchestration)
  • Mathematical optimization (deterministic, quantitative evaluation)

Claude Code is remarkably capable. But capability isn't the same as architectural fit.

These different optimizations are reflected in how each system is built. XMPro MAGS is 90% business process and logic runtime and 10% LLM text processing. The majority of the codebase handles memory management, consensus protocols, planning algorithms, self-healing, data persistence, and observability. LLMs provide text generation and response parsing.

Claude Code inverts this ratio. It's primarily an LLM reasoning engine with workflow orchestration built on top. You can write custom skills to add business logic, but the core architecture is session-based LLM execution.

This isn't a criticism. It's a design choice that fits Claude Code's purpose: a development tool for complex, human-guided tasks that benefit from sophisticated reasoning.

But it means Claude Code can't replace what production runtimes do. Not because it lacks intelligence, but because the first principles are different. Session-based execution cannot provide temporal continuity, autonomous task selection, operational scale, persistent state accumulation, bounded execution authority, built-in multi-agent coordination, or mathematical optimization that industrial operations require.

The Right Integration Pattern

The first principles analysis points to the right architecture. During our team discussion, someone suggested MAGS agents could call Claude Code as a tool or "contractor" when they encounter situations requiring complex analysis.

This is exactly how XMPro MAGS works in practice. MAGS agents can use session-based agents as "contractors" for specific tasks that benefit from sophisticated LLM reasoning. The runtime system maintains continuous operation and orchestrates the overall mission, while session-based agents provide specialized intelligence on demand.

For NASA, this could mean a MAGS-based mission team with clear objectives (optimize rover positioning for science targets while maintaining power budget and safety margins). When the team needs route planning, a MAGS agent initiates a Claude Code session, provides mission context and constraints, receives the planned route, validates it against team objectives, and incorporates it into the coordinated mission plan. The runtime team maintains continuous operation. The session-based agent provides specialized planning intelligence.

This respects the first principles of both systems:

MAGS provides:

  • Temporal continuity (always-on operation)
  • Autonomous task selection (agents decide what needs attention)
  • Operational scale (thousands of decisions daily)
  • Persistent state (continuous memory accumulation)
  • Bounded execution authority (autonomous action within governance)
  • Built-in multi-agent coordination (distributed consensus protocols)
  • Mathematical optimization (deterministic objective functions)

Claude Code provides:

  • Sophisticated reasoning for complex situations
  • Deep analysis that benefits from flexible LLM capabilities
  • Strategic planning requiring novel problem-solving
  • Human-AI collaboration on high-stakes decisions

The MAGS runtime maintains control and enforces governance frameworks, coordinates multi-agent consensus, ensures bounded autonomy, and provides audit trails. Claude Code provides intelligence for specific situations that require more sophisticated reasoning than routine operations need.

In the NASA mission team example: MAGS agents handle continuous operations (power management, thermal control, health monitoring). When complex route planning is needed, they contract Claude Code sessions for specialized analysis. The runtime orchestrates the mission. The session-based agent provides route planning intelligence.

The runtime handles what needs to happen continuously. The session-based system handles what needs sophisticated reasoning.

Governance and Security

The runtime approach provides advantages beyond operational continuity. This becomes clear when you examine how each architecture enforces boundaries.

File-based systems (session-based architecture):

  • Expose attack surfaces through file manipulation
  • Allow arbitrary command execution
  • Store credentials in configuration files
  • Rely on developers to maintain boundaries (honor system)
  • Enable attackers to modify agent behavior or inject malicious code

Runtime systems (database-driven architecture):

  • Eliminate file system access entirely
  • Store all configuration in databases accessed through authenticated APIs
  • Validate all agent actions against policy constraints at execution time
  • Require authorization for changes with complete audit trails
  • Use atomic database transactions to enforce boundaries

This matters in industrial operations. Equipment damage, safety incidents, environmental releases, production losses. These consequences are physical. You can't roll them back like you can digital state.

Runtime validation ensures agents can't bypass governance frameworks, even accidentally. The architecture enforces boundaries rather than relying on files to maintain them.
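Execution-time policy validation can be sketched in a few lines. The policy shape, action names, and bounds below are hypothetical; in the architecture described above, policies would live in a database behind authenticated APIs rather than in code.

```python
# Minimal sketch of execution-time policy validation. Policy contents are
# illustrative; a real runtime would load them from an authenticated store
# and append an audit-trail record on every check.

POLICY = {
    "set_temperature": {"min": 180.0, "max": 190.0},  # allowed setpoint band
    "schedule_maintenance": {},                        # allowed, unparameterized
}

def validate(action, value=None):
    """Every agent action is checked against policy before execution."""
    if action not in POLICY:
        raise PermissionError(f"action '{action}' not permitted by policy")
    bounds = POLICY[action]
    if "min" in bounds and not (bounds["min"] <= value <= bounds["max"]):
        raise PermissionError(f"{action}={value} outside policy bounds")
    return True

print(validate("set_temperature", 185.0))   # True
try:
    validate("set_temperature", 199.0)      # blocked: outside the 180-190 band
except PermissionError as e:
    print("blocked:", e)
```

Because the check runs at execution time inside the runtime, an agent cannot skip it the way a script can skip a convention documented in a file.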

Understanding the Distinction

It's easy to look at capable AI tools and assume they can replace specialized infrastructure. The capabilities look similar from the outside.

But first principles matter. Temporal continuity, locus of control, operational scale, state persistence, decision authority, coordination models, and optimization approaches. These aren't features you can add to session-based systems. They're architectural properties that emerge from how the system is designed to operate.

NASA's achievement with Claude Code is the right tool for the right job. Discrete route planning tasks that benefit from sophisticated reasoning and human validation. Session-based execution at its best.

Industrial operations running 24/7 with thousands of autonomous decisions daily require runtime execution. Not because session-based systems aren't intelligent enough, but because the first principles are fundamentally different.

What This Means

If you're evaluating AI for industrial operations, start with first principles. Ask:

  • When does the AI need to operate? Continuously or in discrete sessions?
  • Who decides what tasks to execute? Humans or autonomous agents within bounds?
  • How many decisions does the system make? Tens or thousands daily?
  • What state needs to persist? Session-based context or continuous memory?
  • Can the AI execute independently? Advisory recommendations or bounded autonomous action?
  • How do multiple AI instances coordinate? Human orchestration or autonomous consensus?
  • How are options evaluated? Heuristic LLM reasoning or mathematical optimization?

The answers determine which architecture fits your operational reality.

Session-based AI works well for complex, infrequent decisions that benefit from human collaboration and sophisticated reasoning. Runtime-based AI is essential for continuous operations requiring thousands of autonomous decisions within defined boundaries.

The future isn't session-based or runtime-based. It's both, each applied where first principles indicate it should be. Runtime systems provide the foundation for autonomous operations. Session-based systems provide deep intelligence for complex situations.

Together, they provide the complete solution.

My team member's question forced me to think through the first principles that distinguish these architectures. NASA's Mars rover achievement shows when session-based execution works well. It also shows why runtime execution is essential for different operational contexts.

Understanding what each approach does well, and why, helps us use both more effectively. That's what first principles thinking provides: clarity about which tool fits which job, based on operational requirements rather than surface capabilities.

If you're building industrial AI systems, I'd be interested in your perspective. What operational challenges are you facing that require continuous autonomous operation versus discrete intelligent assistance?


Pieter van Schalkwyk is the CEO of XMPro, specializing in industrial AI agent orchestration and governance. XMPro MAGS with APEX provides cognitive architecture and DecisionGraph capabilities for agent networks operating on existing industrial systems.

Our GitHub repo has more technical information. You can also contact me or Gavin Green for more information.

Read more on MAGS at The Digital Engineer