Perspective · AI Strategy

The Governed System of Action

Why enterprise AI value will come from systems of action, not unbounded autonomy.

Executive summary

There is a great deal of excitement today around agentic AI. Some of it is justified. Much of it is premature.

The mistake I see in many boardrooms and executive teams is simple. Leaders think the question is whether AI can act autonomously. That is not the real question. The real question is whether intelligence can be embedded into execution without losing control, traceability, and accountability.

That distinction matters because enterprises do not run on demos. They run on permissions, controls, approvals, exception handling, audit trails, and named responsibility. They run on systems that know who did what, when, on which data, under which authority. That is what serious business processes were built to protect.

The economics of AI are real. Databricks, citing IDC, reports organizations are seeing an average 3.7x return on generative AI investment, with top performers reaching 10.3x. But the World Economic Forum makes the more important point: AI does not scale on legacy operating models, and layering it onto linear workflows and static roles limits impact. Structural redesign is the real bottleneck.

That is why I do not believe the future belongs either to rigid old workflows or to free autonomous agents. It belongs to something more serious: governed AI systems, built on top of enduring systems of record, where intelligence helps execution but does not escape oversight.

That is also where I believe the real software frontier now sits.

Where this argument comes from

A formative part of my career was spent in enterprise software and process architecture, long before AI became the fashionable answer to every business question. I worked in the Business Process Management (BPM) world, on the BPMN and BPEL standards, which I helped design and implement, and on process orchestration. Even then the ambition felt important: make execution legible, reduce friction, connect business intent with operational reality, and give organizations a more reliable way to run.

That problem never disappeared.

What BPM was really trying to solve was not a glamorous problem. It was organizational entropy. Too many handoffs. Too many undocumented approvals. Too much logic buried in applications, emails, spreadsheets, and local habits. A business process, at its best, is a discipline against drift. It makes the path visible. It clarifies exceptions. It assigns ownership. It turns recurring work into something the organization can trust. Some of the largest processes we implemented orchestrated thousands of tasks across year-long deployments.

That history matters because it helps explain why the current AI debate is often framed badly. The issue is not whether AI is powerful. It is. The issue is what happens when we confuse apparent intelligence with operational reliability.

The argument that matters now

I do not think the strongest argument today is that agents are bad and workflows are good. That is too blunt, and no longer accurate enough.

The better argument is that unbounded autonomy is overhyped, while governed execution is underrated.

McKinsey's 2025 global survey shows how real the tension is. AI use has broadened significantly: 88% of respondents say their organizations regularly use AI in at least one business function. But only about one-third say they are scaling AI across the organization. Twenty-three percent say they are scaling an agentic AI system somewhere in the enterprise, while 39% say they have started experimenting with one. And when you look at financial impact, only 39% report any level of enterprise-wide EBIT effect — with most of those saying AI still accounts for less than 5% of EBIT. Gartner's June 2025 forecast is even more sobering: it predicts that more than 40% of agentic AI projects will be canceled by the end of 2027 because of escalating costs, unclear business value, or inadequate risk controls.

Those numbers do not support a simplistic anti-agent conclusion. But they do support a serious one. The market is moving toward AI systems that act. It is not yet proving that unconstrained autonomy is a durable enterprise model.

Systems of record are not going away

One reason this matters is that the software layer underneath most companies is not about to vanish.

A recent a16z essay on SAP is useful here because it says bluntly what many people in enterprise software have always known. Systems like SAP, Salesforce, and ServiceNow persist not because they are elegant, but because they encode the canonical data model of the business. They hold the tables, roles, approvals, posting logic, controls, workflows, and exception handling that make large organizations operable. They also carry years of customization, often poorly documented, which makes migration painful, expensive, and risky. a16z notes that moving off these systems can take years and cost hundreds of millions of dollars, and that the real opportunity now is not to rip them out, but to build a new interface, automation, and extension layer on top of them.

This is a crucial architectural point.

The future enterprise stack is not likely to be a clean replacement of systems of record. It is more likely to be a governance layer sitting above them — a system of action where intent is translated into safe, inspectable execution. The underlying systems remain the place where truth, controls, and transaction integrity live. The new layer becomes the place where judgment is applied and work actually gets done.

That is a much stronger thesis than "agents replace workflows." It is also much closer to what serious enterprise buyers actually need.
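To make the layering concrete, here is a minimal Python sketch under stated assumptions. `SystemOfRecord` and `ActionLayer` are hypothetical names, the closed-period rule stands in for whatever controls a real ERP enforces, and "policy AP-7" is an invented authority label; no vendor API is implied. What it illustrates is the division of labor described above: the record keeps its own controls and remains the source of truth, while the layer above acts only through the record's interface and writes down who acted, under which authority, and with what result.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class ControlViolation(Exception):
    """Raised when the system of record's own controls reject a change."""


@dataclass
class SystemOfRecord:
    # Hypothetical stand-in for an ERP: it owns the data and enforces its own controls.
    closed_periods: set[str]
    postings: list[dict] = field(default_factory=list)

    def post(self, period: str, account: str, amount: float) -> int:
        if period in self.closed_periods:
            raise ControlViolation(f"period {period} is closed")
        self.postings.append({"period": period, "account": account, "amount": amount})
        return len(self.postings) - 1  # posting id


@dataclass
class ActionLayer:
    # Hypothetical layer above the record: it acts only through the record's
    # interface and keeps a trail of who acted, under which authority, with what result.
    record: SystemOfRecord
    permissions: dict[str, set[str]]  # actor -> allowed action names
    trail: list[dict] = field(default_factory=list)

    def post(self, actor: str, authority: str, period: str, account: str, amount: float) -> str:
        entry = {
            "actor": actor,
            "authority": authority,
            "action": "post",
            "args": (period, account, amount),
            "at": datetime.now(timezone.utc).isoformat(),
        }
        if "post" not in self.permissions.get(actor, set()):
            entry["outcome"] = "denied: no permission"
        else:
            try:
                entry["outcome"] = f"posted #{self.record.post(period, account, amount)}"
            except ControlViolation as exc:
                entry["outcome"] = f"rejected by record: {exc}"
        self.trail.append(entry)
        return entry["outcome"]


sor = SystemOfRecord(closed_periods={"2025-12"})
layer = ActionLayer(sor, permissions={"agent:ap-bot": {"post"}})
print(layer.post("agent:ap-bot", "policy AP-7", "2026-01", "4000", 120.0))  # posted #0
print(layer.post("agent:ap-bot", "policy AP-7", "2025-12", "4000", 50.0))   # rejected by record: period 2025-12 is closed
print(layer.post("agent:unknown", "none", "2026-01", "4000", 10.0))         # denied: no permission
```

The design choice worth noticing is that the action layer never works around the record's control: a rejected posting is captured as an outcome in the trail, not retried through a side door.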

The real bottleneck is the operating model

The problem is not a lack of models. It is that most firms are still trying to add AI onto operating models designed for a pre-AI world.

The World Economic Forum's February 2026 piece says it clearly: layering AI onto linear workflows and static roles limits impact, and structural redesign is the real bottleneck. AI-first organizations redesign work around human-AI collaboration, outcome-driven workflows, clear data definitions, real-time evaluation, and adaptive systems rather than treating AI as a simple add-on. Databricks makes the same point from a different angle: firms that capture value do not merely deploy AI as a tool — they rethink the operating model around it.

This is exactly why the old "workflow versus agent" debate is too narrow.

The real issue is whether the operating model changes. If it does not, AI becomes another thin layer on top of broken handoffs and unclear ownership. If it does, the company begins to redesign how decisions are made, how actions are triggered, how exceptions are handled, and how value is measured.

Why autonomy alone breaks in practice

The reason I remain skeptical of naive agentic narratives is not philosophical first. It is operational first.

Real environments are messy. Data is incomplete. Naming is inconsistent. APIs fail. Permissions conflict. Teams work around systems. Documents arrive in bad formats. Exceptions multiply. Legacy logic surfaces at the worst moment. The happy path is not the job. The exceptions are the job.

This is why a workflow still matters. A workflow is not exciting, but it is explicit. It tells you what happens next, under what conditions, and where control sits. An agent, by contrast, is attractive precisely because it can improvise. That can be useful. But once the system begins selecting tools, changing its route, or reasoning through ambiguity on its own, the burden of trust moves from process design to system governance.

In other words, the question is not whether the model can produce action. The question is whether the action can be bounded, inspected, interrupted, and attributed.
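One way to read "bounded, inspected, interrupted, and attributed" as engineering requirements is sketched below in Python. It is an illustration under assumptions, not a reference design: `BoundedRun` is a hypothetical wrapper, and the `plan` callable stands in for whatever model or agent proposes the next tool call. Bounded means an allow-list of tools and a hard step budget; inspected means every step is recorded; interrupted means a stop signal is checked before each step; attributed means each step carries the run and agent identity.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class Step:
    run_id: str
    agent_id: str  # attributed: every step names the acting agent and run
    tool: str
    arg: str
    result: str


# A planner looks at the history so far and proposes (tool, argument), or None to finish.
Planner = Callable[[list[Step]], Optional[Tuple[str, str]]]


@dataclass
class BoundedRun:
    run_id: str
    agent_id: str
    tools: dict[str, Callable[[str], str]]  # bounded: nothing outside this allow-list can run
    max_steps: int                           # bounded: hard budget on the number of actions
    stop: threading.Event = field(default_factory=threading.Event)  # interruptible
    steps: list[Step] = field(default_factory=list)                 # inspectable record

    def run(self, plan: Planner) -> str:
        for _ in range(self.max_steps):
            if self.stop.is_set():  # a human or supervisor can halt between steps
                return "interrupted"
            proposal = plan(self.steps)
            if proposal is None:
                return "done"
            tool, arg = proposal
            if tool not in self.tools:
                self.steps.append(Step(self.run_id, self.agent_id, tool, arg, "refused: tool not allowed"))
                return "halted: out-of-bounds tool"
            result = self.tools[tool](arg)
            self.steps.append(Step(self.run_id, self.agent_id, tool, arg, result))
        return "halted: step budget exhausted"


def scripted_plan(history: list[Step]) -> Optional[Tuple[str, str]]:
    # Hypothetical stand-in for a model: look up an invoice, then reach for a tool it was never granted.
    script = [("lookup_invoice", "INV-1"), ("send_payment", "INV-1")]
    return script[len(history)] if len(history) < len(script) else None


run = BoundedRun("run-1", "agent:ap-bot", tools={"lookup_invoice": lambda x: f"{x}: open"}, max_steps=5)
print(run.run(scripted_plan))  # halted: out-of-bounds tool
for step in run.steps:
    print(step)
```

Note that the refused call is still recorded: an out-of-bounds attempt is itself evidence an operator may need to inspect.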

What the market is quietly admitting

One of the clearest signals today is that the vendors pushing agents hardest are also rebuilding governance around them.

Microsoft now presents Agent 365 as a control plane for agents, designed to help organizations observe, secure, and govern every agent across the enterprise, with telemetry, dashboards, and alerts. ServiceNow's AI Control Tower is framed similarly: a centralized command center to govern, manage, secure, and realize value from any AI agent, model, and workflow, while allowing organizations to assign human managers to oversee their work.

This is not a side detail. It is the signal.

If the market truly believed autonomy was enough, it would not be rebuilding observability, approval logic, control planes, management consoles, and human oversight around these systems. What the market is really saying is that AI can act, but not outside governance.

That is precisely the thesis enterprise architects should now take seriously.

Governance is the system, not a feature of it

This is also where the regulatory direction is important.

For high-risk AI systems, the EU AI Act requires effective human oversight and automatic logging designed to ensure traceability. Article 14 says high-risk systems must be designed so natural persons can oversee them and prevent or minimize risks. Article 12 requires automatic recording of events over the lifetime of the system so operation can be traced and monitored appropriately.
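Article 12 does not prescribe an implementation. As a minimal sketch of what automatic, traceable recording could look like in code, the Python below wraps a function so every call is logged without the caller opting in, and chains each entry to the hash of the previous one so later edits are detectable. The hash chaining, the `logged` decorator, and the `credit-scoring-v2` system name are illustrative assumptions, not requirements drawn from the Act.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True, default=str).encode()).hexdigest()


class EventLog:
    """Append-only event log; each entry embeds the previous entry's hash (tamper-evident)."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"at": datetime.now(timezone.utc).isoformat(), "prev": prev, **event}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        prev = "genesis"
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def logged(log: EventLog, system: str):
    """Decorator: every call is recorded automatically, including calls that fail."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            event = {"system": system, "op": fn.__name__, "args": args, "kwargs": kwargs}
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                log.append({**event, "error": repr(exc)})
                raise
            log.append({**event, "result": result})
            return result
        return inner
    return wrap


log = EventLog()


@logged(log, system="credit-scoring-v2")
def score(applicant_id: str) -> float:
    return 0.72  # placeholder for a real model output


score("A-100")
print(log.verify())              # True
log.entries[0]["result"] = 0.10  # someone edits a recorded event after the fact
print(log.verify())              # False
```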

The important lesson is broader than compliance.

"Human in the loop" is too vague to be useful. The real design question is where the intervention rights are. What actions require approval? What thresholds trigger escalation? What output can be overridden? What gets logged? Who is accountable for a reversal? How is an exception turned into a future guardrail?

That is not decorative governance added at the end of the project. That is the architecture.
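As a hedged illustration of intervention rights expressed as architecture rather than as a slogan, the Python sketch below answers three of the questions above for a hypothetical payment flow: which actions need approval, which thresholds trigger escalation, and who is accountable for each route. It also shows one simple way an exception can become a guardrail. Every name, role, and threshold here is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InterventionPolicy:
    # Hypothetical roles and thresholds for a payment flow; not a standard.
    approve_above: float   # above this, a named approver must sign off
    escalate_above: float  # above this, the case goes to a senior owner
    approver: str
    escalation_owner: str
    execution_owner: str   # accountable for auto-executed actions and their reversal
    blocked_counterparties: set[str] = field(default_factory=set)

    def route(self, amount: float, counterparty: str) -> tuple[str, str]:
        """Return (decision, accountable party) for a proposed payment."""
        if counterparty in self.blocked_counterparties:
            return ("block_and_review", self.escalation_owner)
        if amount > self.escalate_above:
            return ("escalate", self.escalation_owner)
        if amount > self.approve_above:
            return ("await_approval", self.approver)
        return ("auto_execute", self.execution_owner)

    def add_guardrail(self, counterparty: str) -> None:
        """Turn a resolved exception into a standing rule for every future run."""
        self.blocked_counterparties.add(counterparty)


policy = InterventionPolicy(
    approve_above=10_000,
    escalate_above=250_000,
    approver="ap-manager",
    escalation_owner="finance-controller",
    execution_owner="ap-lead",
)
print(policy.route(4_000, "Acme"))    # ('auto_execute', 'ap-lead')
print(policy.route(40_000, "Acme"))   # ('await_approval', 'ap-manager')
print(policy.route(400_000, "Acme"))  # ('escalate', 'finance-controller')

# An investigated exception (say, a suspicious vendor) becomes a guardrail.
policy.add_guardrail("ShellCo")
print(policy.route(100, "ShellCo"))   # ('block_and_review', 'finance-controller')
```

Every route returns a named owner, so no decision path, including the automatic one, is left without someone accountable for it.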

Hallucinations: a model problem or a system problem?

The same principle applies to model reliability.

The concern around hallucinations is entirely legitimate. OpenAI's April 2025 system card showed that on PersonQA, o3 had a hallucination rate of 0.33, o4-mini 0.48, and o1 0.16. Those are serious numbers, and they justify caution in high-stakes settings.

But the conclusion should be more careful than many critics make it. OpenAI's later ChatGPT agent system card, evaluating with browsing enabled, reported much lower PersonQA hallucination rates than those above: 0.043 for ChatGPT agent and 0.024 for o3 with browsing, alongside SimpleQA rates of 0.079 and 0.046 respectively. That does not remove the problem. It does show that model behavior changes materially when the surrounding architecture changes: grounding, retrieval, tooling, and evaluation all matter.

So the right enterprise conclusion is not that reasoning models are inherently unusable, nor that browsing magically solves hallucinations. It is that reliability is a system property, not just a model property.

That is why I keep returning to the same point: what matters is not cleverness in isolation. It is governed execution.

The commercial logic is changing too

This shift is not only technical. It is economic.

BCG argues that AI agents are accelerating the move away from traditional seat-based software pricing toward models tied more closely to usage and delivered value. That matters because once software begins to act rather than merely inform, the monetizable unit changes. You are no longer paying only for access. You are paying for actions completed, delays avoided, outcomes produced, and risks reduced.

This has a direct implication for the thesis of this paper. In an AI-native software world, governance stops looking like overhead. It becomes part of the product itself. Trust, auditability, policy enforcement, and controlled execution are not merely compliance layers. They are part of what the buyer is paying for.

That is one more reason the winning layer is unlikely to be a free-floating agent. It is far more likely to be a governed operating layer that makes action safe enough to be valuable.

Why this matters to me

This matters to me because I have spent enough time around enterprise systems to know that organizations do not fail because they lack ideas. They fail because they do not control execution well enough.

A fluent interface can hide fragility. A convincing answer can hide weak evidence. A system that looks autonomous can still fail badly when it hits the wrong data, the wrong threshold, the wrong exception, or the wrong incentive.

That is why I care less about whether a system appears intelligent than whether it is inspectable, reproducible, and governable.

It is also why I find the current agentic narrative both exciting and incomplete. The promise is real. The architecture is often not.

Why we're building Vela Intelligence

In high-stakes environments, the problem is rarely the absence of information. It is the weakness of the decision foundation built on top of too much unstructured information: filings, contracts, diligence reports, disclosures, annexes, side letters, board packs, memos, and fragmented evidence spread across formats and teams.

Putting a loosely governed agent on top of that does not solve the problem. It can make it worse.

Vela Intelligence is therefore not built as another AI layer designed to generate plausible commentary around documents. It is built as decision intelligence infrastructure for environments where evidence matters, where conclusions must be linked back to source material, where outputs need to be inspectable, and where human judgment must be strengthened rather than theatrically bypassed.

The ambition is not maximum autonomy. It is trustworthy intelligence inside a governed system of action.
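As a generic illustration of what "conclusions linked back to source material" can mean mechanically, and not a description of how Vela Intelligence is implemented, the Python sketch below flags any claim that has no citation, cites an unknown document, or quotes text that cannot be found verbatim in the named source. The document id `spa-2024` and its contents are invented, and exact substring matching is a deliberately naive assumption; real evidence linking would need to handle formatting, OCR noise, and paraphrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doc_id: str
    quote: str  # the exact span of source text the claim relies on


@dataclass(frozen=True)
class Claim:
    text: str
    citations: tuple[Citation, ...]


def check_claims(claims: list[Claim], corpus: dict[str, str]) -> list[tuple[str, str]]:
    """Return (claim, problem) pairs; an empty list means every claim is traceable to a source."""
    problems = []
    for claim in claims:
        if not claim.citations:
            problems.append((claim.text, "no supporting citation"))
            continue
        for cite in claim.citations:
            source = corpus.get(cite.doc_id)
            if source is None:
                problems.append((claim.text, f"unknown document {cite.doc_id}"))
            elif cite.quote not in source:  # naive verbatim match, by assumption
                problems.append((claim.text, f"quote not found in {cite.doc_id}"))
    return problems


corpus = {"spa-2024": "The seller shall indemnify the buyer for losses up to EUR 5m."}
claims = [
    Claim("Indemnity is capped at EUR 5m.", (Citation("spa-2024", "losses up to EUR 5m"),)),
    Claim("There is no change-of-control clause.", ()),
    Claim("Escrow is 10% of the price.", (Citation("spa-2024", "escrow of 10%"),)),
]
for text, problem in check_claims(claims, corpus):
    print(f"{problem}: {text}")
```

The check does not judge whether a conclusion is right; it only guarantees that a reviewer can follow every conclusion back to the evidence it claims to rest on.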

Final thought

I do not think the future belongs to organizations that let AI improvise its way through critical processes.

I also do not think it belongs to organizations that cling to rigid workflows as if intelligence changes nothing.

The future belongs to those that understand the difference between intelligence and control, and refuse to sacrifice one for the other.

Systems of record will stay. What will change is the layer above them. The real software frontier is the governed system of action: the place where intent becomes execution, where agents and workflows are combined rather than opposed, where approvals and permissions remain intact, where evidence and outputs remain linked, and where humans stay responsible for consequences.

And I believe it is where the next durable enterprise value will be created.

Notes and sources

McKinsey & Company. The State of AI in 2025: Agents, Innovation, and Transformation. Used for AI adoption breadth, scaling levels, agentic deployment, and reported EBIT impact.

Gartner. Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027, press release, 25 June 2025. Used for the cancellation forecast tied to escalating costs, unclear business value, and inadequate risk controls.

Databricks. AI Transformation: A Complete Strategy Guide for 2025. Used for the IDC-cited 3.7x average ROI and 10.3x top-performer ROI, and the transformation-not-tooling framing.

World Economic Forum. How AI-first operating models unlock scalable value, 12 February 2026. Used for the argument that layering AI onto linear workflows and static roles limits impact, and that structural redesign is the bottleneck.

Andreessen Horowitz. Why the World Still Runs on SAP, 16 March 2026. Used for the persistence of systems of record, canonical business data models, switching costs, custom process logic, and the rise of the interface/automation/extension layer as the new frontier.

Microsoft. Agent 365 product page and Microsoft 365 blog, November 2025. Used for the control-plane framing and the emphasis on observability, governance, and security for enterprise agents.

ServiceNow. AI Control Tower product and launch materials, May 2025. Used for the centralized governance, management, security, and human-manager framing around agents, models, and workflows.

European Commission AI Act Service Desk. Reference materials on Articles 12 and 14 of the EU AI Act. Used for the requirements around automatic logging, traceability, and effective human oversight for high-risk AI systems.

OpenAI. o3 and o4-mini System Card, April 2025. Used for the PersonQA and SimpleQA hallucination benchmarks showing that stronger reasoning does not automatically eliminate factual unreliability.

OpenAI. ChatGPT Agent System Card, July 2025. Used for the browsing-enabled hallucination results showing that system design materially affects reliability outcomes.

Boston Consulting Group. Rethinking B2B Software Pricing in the Agentic AI Era, August 2025. Used for the shift away from seat-based pricing toward usage- and value-linked models in agentic software.

Vela Intelligence builds decision intelligence infrastructure for regulated, high-stakes environments. We transform fragmented, unstructured evidence into inspectable, decision-ready insights, with traceability, governance, and human judgment at the core. For strategic conversations, write to contact@velaintelligence.com.
