DMAIC Is Not a Governance Framework: Why Process Engineering Tools Fail AI Design
The conclusion is simple: organisations using DMAIC to govern AI are not being rigorous. They are being rigorously wrong.
Simon leads HR Technology Strategy at BHP - one of the world's most complex enterprises - where he sits at the intersection of enterprise transformation and workforce technology. He is a Fellow and Certified HR Practitioner of AHRI (and a council member), holds an MBA, and serves on executive advisory boards at SAP and ServiceNow. He writes to close the gap between strategic intent and what HR technology can actually do today.
When a CHRO or CPO is asked to “put a governance framework around AI,” the instinct is to reach for familiar tools: Six Sigma, DMAIC, lean process management. These tools have worked. They have delivered quality, reduced defects, and disciplined entire industries. The instinct is reasonable.
The instinct is also a category error.
DMAIC was built for systems that are knowable, repeatable, and stable. AI systems are none of these things. Applying DMAIC to AI governance does not produce rigour - it produces the appearance of rigour while leaving actual risk unaddressed. Understanding why requires a short excursion into philosophy - Ryle, Hume, Aristotle, and Mill - because the problems with DMAIC are not primarily technical. They are epistemological.
The Ghost in the Machine - and the Machine in the Machine
Gilbert Ryle coined “the ghost in the machine” in The Concept of Mind (1949) to name Descartes’ category mistake: treating the mind as a separate substance inhabiting the body, like a ghost in a house. Ryle’s point was not that the mind doesn’t exist; it was that questions about mind cannot be answered using the conceptual vocabulary appropriate to mechanisms.
The same error structure applies here.
DMAIC - Define, Measure, Analyse, Improve, Control - was developed for mechanical problems. A bolt must be 2 cm. Variation from that specification is a defect. Improvement means reducing that variation. The conceptual vocabulary of DMAIC is built entirely around this assumption: processes are knowable, bounded, and stable.
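To see how little judgement the Control phase requires, here is a minimal sketch of its core logic - a Shewhart-style control check in Python, with invented measurements standing in for the bolt:

```python
# A Shewhart-style control check for a machined part.
# Illustrative only: the target and sample values are invented.
from statistics import mean, stdev

def control_limits(samples, sigmas=3.0):
    """Return (lower, upper) control limits from historical samples."""
    centre, spread = mean(samples), stdev(samples)
    return centre - sigmas * spread, centre + sigmas * spread

history = [2.001, 1.998, 2.002, 2.000, 1.999, 2.001, 2.000]  # bolt lengths, cm
lcl, ucl = control_limits(history)

for measurement in [2.001, 1.999, 2.014]:
    status = "in control" if lcl <= measurement <= ucl else "defect"
    print(f"{measurement:.3f} cm: {status}")
```

Every number the chart needs is knowable in advance. That is the assumption doing all the work.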
An AI system is not a mechanism in this sense. It is a probabilistic model producing outputs shaped by context, prompt construction, model version, and emergent behaviours that cannot be fully enumerated in advance. Asking DMAIC to govern it is like asking a bolt-torque specification to govern a jazz improvisation. The vocabulary does not extend to the domain.
Here is the practical consequence: organisations end up measuring the measurable rather than managing the meaningful. Model latency, error rates, and uptime are easy to instrument. Bias drift, contextual appropriateness, value alignment, and trust erosion are not. DMAIC’s Measure phase gravitates toward the former by design - and governance frameworks built on DMAIC inherit that gravitational pull.
The ghost in the machine, here, is the organisation’s assumption that because it has a number, it has oversight.
The Chicken and the Illusion of Control
The “C” in DMAIC - Control - is where the framework’s limitations become most acute.
David Hume’s problem of induction is one of philosophy’s most consequential unsolved problems. Simply put: we cannot rationally justify the belief that the future will resemble the past. No accumulation of past observations guarantees a future outcome. Bertrand Russell illustrated this with his famous chicken: an animal fed reliably every morning for a year, reaching the entirely reasonable, entirely wrong conclusion that it will always be fed - until, one day, its carer wrings its neck instead.
DMAIC’s Control phase rests on what Hume called the Uniformity Principle: control the process, control the output. In a stable manufacturing environment with fixed inputs and specifications, this assumption is defensible. Past performance genuinely predicts future output.
AI systems violate this assumption structurally. Consider a model that is “in control” today by every available metric: low error rate, consistent output distribution, stable latency. That same model may behave very differently tomorrow. The operating environment changes. The user population changes. The model itself may be updated or fine-tuned. The statistical distribution of inputs in production may diverge from the training distribution in ways that are invisible until they produce a consequential failure.
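Detecting that divergence is possible, but it requires an instrument DMAIC does not carry: a comparison of the production input distribution against the training distribution. Here is a minimal sketch using the Population Stability Index; the feature, the data, and the 0.25 threshold are all illustrative assumptions:

```python
# Population Stability Index (PSI): a common drift check that compares the
# input distribution a model sees in production against its training data.
import math
import random

def psi(expected, actual, bins=10):
    """PSI between two samples of one continuous feature."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")  # catch out-of-range values
    total = 0.0
    for i in range(bins):
        e = sum(edges[i] <= x < edges[i + 1] for x in expected) / len(expected)
        a = sum(edges[i] <= x < edges[i + 1] for x in actual) / len(actual)
        e, a = max(e, 1e-4), max(a, 1e-4)  # guard against log(0)
        total += (a - e) * math.log(a / e)
    return total

random.seed(0)
training = [random.gauss(40, 8) for _ in range(5000)]    # e.g. applicant age at training time
production = [random.gauss(31, 6) for _ in range(5000)]  # the applicant population has shifted

print(f"PSI = {psi(training, production):.3f}")  # rule of thumb: > 0.25 flags material drift
```

Note what the check can and cannot do: it can flag that the world has moved; it cannot say whether the model’s behaviour in the new world is acceptable. That remains a judgement.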
This is not a failure of implementation. It is a failure of epistemology.
The DMAIC framework treats AI governance as a variance reduction problem: find the target state, reduce deviation from it, declare the process controlled. But AI risk is not primarily a variance problem. It is an adaptation problem. The relevant question is not “how far is the current output from the specification?” It is “how will this system behave as the world it operates in continues to change?” These are categorically different questions demanding categorically different tools.
A control chart cannot tell you that your hiring algorithm systematically disadvantages candidates from particular postcodes - because the historical data it was trained on encoded that pattern silently, and the deviation will only surface when an auditor goes looking for it. That is a risk adaptation problem. It requires ongoing human judgement, not a control phase.
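A toy illustration of that blind spot, with invented numbers: the aggregate shortlist rate sits comfortably inside its control limits, while the selection-rate ratio between hypothetical postcode groups fails the widely used four-fifths rule - a check no control chart performs:

```python
# Invented numbers: an aggregate metric that is "in control" can coexist with
# a subgroup disparity the control chart was never designed to detect.
weekly_shortlist_rate = [0.21, 0.20, 0.22, 0.21, 0.20, 0.21]  # stable - no alarm

# The same pipeline, sliced by a hypothetical postcode grouping.
selected = {"postcode_A": 180, "postcode_B": 45}
applied = {"postcode_A": 600, "postcode_B": 400}

rates = {group: selected[group] / applied[group] for group in selected}
impact_ratio = min(rates.values()) / max(rates.values())

print(rates)                                 # {'postcode_A': 0.3, 'postcode_B': 0.1125}
print(f"impact ratio = {impact_ratio:.2f}")  # four-fifths rule: < 0.80 flags disparity
```

The slice by postcode is the judgement call. No phase of DMAIC tells you to make it.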
Phronesis: The Leader as Judge, Not Inspector
This is where Aristotle becomes directly useful - not as decoration, but as a precision instrument.
Aristotle distinguished three forms of intellectual virtue in the Nicomachean Ethics: episteme (scientific knowledge of necessary truths), techne (technical craft knowledge), and phronesis (practical wisdom). Episteme concerns what cannot be otherwise: the laws of mathematics and logic. Techne concerns how to make things reliably. Phronesis concerns how to act well in conditions of genuine uncertainty - when the rules do not determine the answer, and experience and judgement must fill the gap that procedure cannot.
DMAIC is a technology of techne. It encodes craft knowledge with the implicit assumption that the situation is stable enough for craft rules to apply. Appropriate when the object is a manufactured part, or a repeatable payroll process fed by inputs such as time records. Inappropriate when the object is an AI system making consequential decisions about people.
AI governance requires phronesis. Not because process knowledge is irrelevant - a phronetic leader still understands the system’s technical characteristics - but because the central governance act is judgement, not inspection. The inspector asks: “does this output deviate from the specification?” The judge asks: “is this system, in this context, serving the people it is meant to serve - and what will it mean if we continue to deploy it as the world changes around it?” Only a judge can answer that question. An inspector, however diligent, cannot.
In short: a phronetic governance architecture does not look like a Six Sigma project. It looks more like a judicial system: independent review, deliberation about cases at the margin, precedent-setting, and recalibration of principles in light of experience. It asks who has standing to raise concerns, what the burden of proof is for deployment decisions, how decisions are recorded and revisited, and what happens when the AI produces an outcome that no specification anticipated. These are jurisprudential questions, not engineering questions.
The musician analogy is instructive. Techne allows a guitarist to execute a scale at 120 bpm without error. Phronesis allows a musician to read the room during a live performance, to sense that this moment calls for restraint rather than virtuosity, that the arrangement needs to breathe, that the audience is already ahead of them. DMAIC tries to turn the live performance into a quantised MIDI file: perfectly specified, perfectly reproducible, and entirely without judgement.
The jazz ensemble consequently sounds technically correct and artistically dead.
Mill, Bentham, and the Metric That Misses the Point
The final philosophical failure of DMAIC in AI governance concerns measurement, and here John Stuart Mill’s correction of Jeremy Bentham becomes the operative lens.
Bentham’s utilitarianism reduced moral calculation to a “felicific calculus”: sum the pleasure produced by an action, subtract the pain, maximise the net. It is an elegant, entirely quantitative framework. Mill argued it was also categorically inadequate, because it cannot distinguish between qualitatively different goods. A world of mild, universal satisfaction is not equivalent to one in which people experience the deeper pleasures of meaningful work, deep connection, or moral dignity - even if fewer of them are satisfied. As Mill put it in Utilitarianism (1861): “better to be Socrates dissatisfied than a fool satisfied.”
DMAIC defaults to Benthamite measurement because it has no other choice. The Measure phase requires quantifiable outputs: cycle time, defect rate, throughput, variance. These are legitimate objects of process engineering.
But here is the problem: when applied to AI governance, this quantitative default produces what the historian Jerry Muller calls “metric fixation” (The Tyranny of Metrics, 2018): the systematic displacement of meaningful outcomes by measurable proxies for them.
Consider a hiring AI evaluated on DMAIC principles. Its performance registers as: time-to-shortlist, candidate pool size, CV screening accuracy against historical hiring decisions, and interviewer satisfaction scores. By these metrics, the model is in control, improving, and delivering value. What the metrics cannot capture: that those historical hiring decisions systematically favoured candidates from certain backgrounds, and that measuring “accuracy” against them reproduces, rather than corrects, that bias. The candidates who were never shortlisted generate no data point in the control chart. They are, in the system’s accounting, invisible.
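A sketch of that failure mode, with invented candidates: a screen that has simply learnt the historical pattern scores well on “accuracy”, because the benchmark is the bias:

```python
# Invented data: measuring "accuracy" against biased historical decisions
# rewards a model for reproducing the bias.
candidates = [
    # (postcode_group, historically_shortlisted)
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", False), ("B", True),
]

def screen(postcode_group):
    """A model that has learnt the historical pattern: favour group A."""
    return postcode_group == "A"

correct = sum(screen(group) == label for group, label in candidates)
print(f"accuracy = {correct / len(candidates):.0%}")  # 75% - looks respectable

# What the accuracy metric never surfaces: group B is screened out entirely,
# and the rejected candidates generate no outcome data to correct the model.
shortlisted = {g: sum(screen(x) for x, _ in candidates if x == g) for g in ("A", "B")}
print(shortlisted)  # {'A': 4, 'B': 0}
```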
The hiring process is efficient. The people inside it are not flourishing. Bentham would be satisfied. Mill would not be.
A governance framework adequate to AI requires Mill’s qualitative distinction: not just “are we producing more of the measurable good?” but “is this a good worth producing, and at what cost to whom?” This is not a soft question. It is the governance question. It is what CHROs and CPOs are ultimately accountable for - whether or not their frameworks are designed to ask it.
The shift from technical compliance to sociotechnical flourishing is not a rebrand of the same function. It is a different epistemological commitment: from treating the organisation as a machine to be optimised, to treating it as a community of people whose flourishing is the point of the enterprise. DMAIC can optimise a machine. Only phronetic judgement can govern a community.
The Architecture That Actually Works
None of this means process rigour is irrelevant to AI. It means process rigour occupies a different layer of the governance stack from the one organisations typically assign it to.
Technical controls - model validation, bias testing, performance monitoring, audit logging - are necessary infrastructure. They are to AI governance what financial controls are to financial governance. But nobody argues that a well-run accounts payable function constitutes a strategy. The technical layer is the floor. It is not the governance.
What sits above it is the layer DMAIC cannot reach: the ongoing exercise of judgement about whether the system serves the organisation’s values, who bears the cost of its errors, what the right response is when it produces an unanticipated outcome, and how governance must adapt as the system - and the world - continues to evolve.
This layer needs different structures:
Ethics review panels with genuine authority, not advisory roles that produce recommendations no one is obliged to follow.
Escalation pathways that give frontline practitioners standing to raise concerns without career risk.
Red-team exercises designed to expose value misalignment rather than technical failure.
Leadership accountability for outcomes, not just process compliance.
In short: the institutional embodiment of phronesis. Practical wisdom exercised by people with genuine authority, genuine accountability, and genuine insight into what the technology is doing to the people it touches.
The Stakes
The organisations that will govern AI well are not the ones with the most sophisticated process frameworks. They are the ones that recognise AI governance as a genuinely new kind of problem - one requiring a different epistemology, a different institutional architecture, and a different conception of what it means to be responsible for a technology.
Ryle taught us that category errors look like rigour from the inside. The organisation that has wrapped DMAIC around its AI deployment feels like it has done the governance work. The bolt-torque specification is in place. The control charts are running. The variance is low.
And somewhere in the system, an AI is making consistent, low-variance decisions that are quietly compounding a harm that no control chart was designed to detect.
The ghost is in the machine. The machine just doesn’t know it.