Trust is the bottleneck for public-sector AI

If you have sat through enough public-sector AI procurement conversations this year, you have heard the same loop. The agency wants to deploy. The model performs. The vendor presents. And then six months pass while a steering committee debates governance.

The committee is right to debate. Models do hallucinate. Decisions do affect citizens. But the framework most agencies are reaching for — copy-pasted from corporate AI risk registers — was designed for very different stakes. Adapting an enterprise framework to a public-sector decision-making problem is like wearing a life-jacket to a building fire. It is the wrong tool, even though it is a tool of safety.

What public-sector governance actually requires

Public-sector AI governance has four hard requirements that corporate frameworks rarely address head-on:

Decision auditability. Every model output must be traceable to its inputs, the model version, and the prompt template. Not after the fact via reconstruction — at the moment of the decision, by the agency itself, without depending on the vendor.
Citizen redress. There must be a clear, navigable path for an affected citizen to challenge an automated decision, escalate to human review, and obtain a written explanation. "Contact the call centre" is not a path.
Sovereign compute. Sensitive workloads (anything touching identity, benefits, or law enforcement) must run on infrastructure under the agency's national jurisdiction, with keys held in that jurisdiction. Cross-border AI APIs are convenient but inappropriate for this work.
Continuous evaluation. A one-time approval is not governance. Live evaluation of accuracy, fairness across protected attributes, and drift, reported on a public dashboard, is.

How to build the four in

Auditability is a data-model problem. If your inference pipeline writes one row per decision to a log table that the agency owns, recording the prompt, retrieved context, model version, output, confidence score, and on-call operator, you have it. If it does not, you do not have it, and no amount of "explainable AI" will fix that retroactively.
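To make the row-per-decision idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name, column names, and helper functions are illustrative assumptions, not a prescribed schema; a real deployment would write to whatever database the agency operates and controls.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per decision, with the fields named in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS decision_log (
    decision_id       TEXT PRIMARY KEY,
    decided_at        TEXT NOT NULL,
    model_version     TEXT NOT NULL,
    prompt            TEXT NOT NULL,
    retrieved_context TEXT NOT NULL,
    output            TEXT NOT NULL,
    confidence        REAL NOT NULL,
    operator_on_call  TEXT NOT NULL
)
"""


def open_log(path):
    """Open (or create) the agency-owned log database."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    conn.execute(SCHEMA)
    return conn


def log_decision(conn, decision_id, model_version, prompt,
                 retrieved_context, output, confidence, operator_on_call):
    """Write the audit row at the moment of the decision."""
    decided_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO decision_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (decision_id, decided_at, model_version, prompt,
         json.dumps(retrieved_context), output, confidence, operator_on_call),
    )
    conn.commit()


def fetch_decision(conn, decision_id):
    """Return the full audit record for one decision, or None if absent."""
    row = conn.execute(
        "SELECT * FROM decision_log WHERE decision_id = ?", (decision_id,)
    ).fetchone()
    return dict(row) if row else None
```

The point of the sketch is that the lookup needs nothing from the vendor: the agency can answer "what did the model see and say?" with a single query against its own table.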

Redress is a process problem. The first version is a form: receipt, ticket number, target SLA, escalation path, written explanation. It does not need to be sophisticated. It does need to exist before deployment, not after the first complaint.
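As a rough illustration of how little the first version needs, the form's fields can be sketched as a single Python record. Everything here is an assumption for the sake of the example: the class name, the 20-day default SLA, the "RD-" ticket format, and the three escalation levels are hypothetical placeholders, and the real values belong to the agency's own policy.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical escalation path; a real one comes from agency policy.
ESCALATION_PATH = ("caseworker", "senior_reviewer", "independent_ombudsman")


@dataclass
class RedressTicket:
    decision_id: str            # the automated decision being challenged
    complaint: str
    received_at: datetime       # doubles as the receipt timestamp
    sla_days: int = 20          # hypothetical target SLA
    ticket_number: str = field(
        default_factory=lambda: f"RD-{uuid.uuid4().hex[:8].upper()}"
    )
    escalation_level: int = 0
    written_explanation: Optional[str] = None

    @property
    def due_by(self):
        """Date by which the citizen is owed a written explanation."""
        return self.received_at + timedelta(days=self.sla_days)

    @property
    def current_reviewer(self):
        return ESCALATION_PATH[self.escalation_level]

    def escalate(self):
        """Move the ticket to the next human reviewer on the path."""
        if self.escalation_level >= len(ESCALATION_PATH) - 1:
            raise ValueError("already at the final escalation level")
        self.escalation_level += 1

    def resolve(self, explanation):
        """Close the ticket by recording the written explanation."""
        self.written_explanation = explanation

    def is_overdue(self, now):
        """True if the SLA has passed and no explanation has been issued."""
        return self.written_explanation is None and now > self.due_by
```

A ticket like this can back a paper form, a web form, or a call-centre script. What matters is that the ticket number, the deadline, and the next reviewer exist from day one.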

Sovereign compute is increasingly a procurement problem you can solve. Anthropic, OpenAI, and the major hyperscalers all have regional or air-gapped offerings now; the question is which of them the regulator considers sovereign enough. The right answer is whichever one the regulator approves in writing, not whichever one the engineering team prefers.
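One way to keep "approved in writing" from becoming a slogan is to encode it as a gate in the routing layer. The sketch below is hypothetical throughout: the Endpoint and Approval types, the select_endpoint function, and the placeholder jurisdiction codes are assumptions made for illustration and do not correspond to any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    jurisdiction: str   # where the compute and the keys reside


@dataclass(frozen=True)
class Approval:
    endpoint_name: str
    reference: str      # identifier of the regulator's written approval


def select_endpoint(endpoints, approvals, home_jurisdiction, sensitive):
    """Return the first endpoint the workload is allowed to use.

    Every endpoint needs a written approval on file. Sensitive
    workloads must additionally stay in the home jurisdiction.
    """
    approved = {a.endpoint_name for a in approvals}
    for ep in endpoints:
        if ep.name not in approved:
            continue
        if sensitive and ep.jurisdiction != home_jurisdiction:
            continue
        return ep
    raise LookupError("no regulator-approved endpoint for this workload")
```

Failing closed is the design choice that matters here: when no approved endpoint exists, the request is refused instead of quietly falling back to whichever API is most convenient.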

Continuous evaluation is a culture problem dressed up as a metrics problem. The technical part — emit metrics, build a dashboard, alert on drift — takes a quarter. The cultural part — the agency lead actually looks at the dashboard once a week, takes action when an indicator goes red, and is not punished for it — takes years.
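The technical part can start very small. The following Python sketch computes overall accuracy, accuracy per group, and two alerts from a batch of reviewed decisions. The function names and thresholds are assumptions chosen for illustration. It treats drift simply as a drop in accuracy against a baseline, and fairness as the accuracy gap between groups, which is only one of several possible fairness measures.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


def evaluate(records, baseline_accuracy, max_drift=0.05, max_group_gap=0.10):
    """Summarise one evaluation window and list any red indicators.

    max_drift and max_group_gap are hypothetical thresholds; real
    values should be agreed with the agency and the regulator.
    """
    records = list(records)
    if not records:
        raise ValueError("no reviewed decisions in this window")

    per_group = accuracy_by_group(records)
    overall = sum(p == a for _, p, a in records) / len(records)
    group_gap = max(per_group.values()) - min(per_group.values())

    alerts = []
    if baseline_accuracy - overall > max_drift:
        alerts.append("accuracy_drift")
    if group_gap > max_group_gap:
        alerts.append("fairness_gap")

    return {
        "overall_accuracy": overall,
        "accuracy_by_group": per_group,
        "group_gap": group_gap,
        "alerts": alerts,
    }
```

A report like this is what feeds the dashboard. Whether anyone acts on a non-empty alerts list is the cultural part, and no function can supply that.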

What we do at 888

Every AI feature we ship into a public-sector environment is built with these four requirements as the inner loop. Decisions are logged in the customer's database with the customer's keys. Redress UI ships before model deployment, not after. Sovereign deployment is the default; cross-border inference is opt-in with explicit regulator sign-off. Live evaluation dashboards are the first thing we ask the customer to look at every Monday.

We do this because the alternative — a brittle promise of accuracy combined with an opaque escalation path — is the kind of system that causes a single bad decision to become a national news story, and then a moratorium, and then a five-year setback for everyone trying to do AI in this sector. We have all watched it happen elsewhere. We do not need to repeat it here.
