Paweł Kubisiak · 2026-06-01 · 7 min read

# How Internal Audit Should Test AI Controls

In many organizations, internal audit has received a new mandate: assess whether AI controls are truly effective, not only formally documented. This challenge is qualitatively different from classic IT audits. AI systems involve model variability, dependency on data and vendors, drift risk, and interpretation gaps between policy and operations. Standard compliance checklists are therefore not enough.

The IIA Global Internal Audit Standards (2024), the COSO Internal Control - Integrated Framework (2013), and NIST AI RMF 1.0 (2023) all point to evaluating controls in the context of risk, business objective, and operating effectiveness. In AI, this means shifting from document audits to audits of operational evidence: logs, decisions, exceptions, incidents, data quality, and escalation paths.

This playbook shows how internal audit can test AI controls in a way that helps management decide on scale, remediation, and risk acceptance. The goal is not to slow transformation, but to make it more controllable.

## What Distinguishes AI Control Audit from Traditional IT Audit

In a traditional systems audit, a control is often treated as binary: it either exists or it does not. In AI, that logic is oversimplified. A control can formally exist yet operate ineffectively in specific data segments, user roles, or exception scenarios.

Key differences:

- control effectiveness depends on input data quality and usage context
- some risks materialize post-deployment (drift, user behavior change)
- model decisions can be probabilistic and require acceptable-threshold assessment
- accountability is distributed across business, IT, data, risk, and vendors

This means audit must examine not only control design, but also control performance over time.

### Bad -> Good Example

Bad: audit is limited to checking whether AI policy, model registry, and signed accountability roles exist. The report ends with "controls designed," even though operational exceptions and quality escalations are rising.

Good: audit tests not only control existence but effectiveness on samples of real decisions and incidents: whether exceptions were handled according to procedure, whether escalations met SLA, and whether post-incident recommendations were implemented. The report includes residual-risk assessment and concrete corrective actions with owners and deadlines.

## Three AI Audit Objectives the Audit Committee Should Approve

Before testing starts, audit should align three objectives with the audit committee:

1. **Adequacy:** does the control fit the risk profile of the AI use case.
2. **Operating effectiveness:** does the control run consistently and detect deviations.
3. **Enforceability of accountability:** is there a clear chain of owners, decisions, and remediation.

Without this alignment, audit is easily reduced to document review.

## AIC-T6 Testing Model

A practical approach to auditing AI controls is the AIC-T6 model (AI Control Testing, 6 steps), usable for cross-functional and thematic audits.

### T1 Scope by risk

Audit scope should be defined by use case criticality, not organizational structure. Priority goes to systems with high impact on customers, finances, compliance, or safety.

### T2 Control inventory and ownership

Audit builds a control map: preventive, detective, corrective, and governance controls. Every control must have an owner, operating frequency, and expected evidence.
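A control map of this kind can be kept as structured data so that missing attributes surface automatically. The sketch below is illustrative only: the `Control` record layout, the control IDs, the owner titles, and the `find_incomplete` helper are all assumptions made for the example, not part of AIC-T6 or of any standard.

```python
from dataclasses import dataclass, field

# Control types named in T2; the record layout itself is an assumption.
CONTROL_TYPES = {"preventive", "detective", "corrective", "governance"}


@dataclass
class Control:
    control_id: str
    control_type: str
    owner: str = ""
    frequency: str = ""  # e.g. "per decision", "weekly", "monthly"
    expected_evidence: list[str] = field(default_factory=list)


def find_incomplete(controls: list[Control]) -> dict[str, list[str]]:
    """Return, per control ID, the attributes T2 requires but that are missing."""
    gaps: dict[str, list[str]] = {}
    for c in controls:
        missing = []
        if c.control_type not in CONTROL_TYPES:
            missing.append("control_type")
        if not c.owner.strip():
            missing.append("owner")
        if not c.frequency.strip():
            missing.append("frequency")
        if not c.expected_evidence:
            missing.append("expected_evidence")
        if missing:
            gaps[c.control_id] = missing
    return gaps


# Hypothetical inventory for a single AI use case.
inventory = [
    Control("CTL-01", "preventive", "Head of Claims", "per decision", ["approval log"]),
    Control("CTL-02", "detective", "", "weekly", ["drift report"]),
    Control("CTL-03", "corrective", "Model Risk Lead", "", []),
]
incomplete = find_incomplete(inventory)
```

Controls that appear in `incomplete` are natural candidates for early findings, since a control without an owner or expected evidence cannot be tested for operating effectiveness later.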

### T3 Design adequacy testing

At this stage, audit evaluates whether control design is logically sufficient. For example: do error-escalation thresholds exist, does human-in-the-loop (HITL) review actually take place, and are production gate criteria measurable?

### T4 Operating effectiveness testing

The most important phase: test control performance on samples of actual decisions and exceptions. This includes analysis of logs, incident registers, remediation tickets, and closure times.
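As a minimal sketch of such a test, the snippet below checks a sample of incidents against an escalation SLA and summarizes closure times. The four-hour SLA, the incident IDs, and the tuple layout are assumptions chosen for illustration; a real test would read the organization's own incident register and SLA definitions.

```python
from datetime import datetime, timedelta
from statistics import median

# Assumed SLA: escalations must be acknowledged within 4 hours.
ESCALATION_SLA = timedelta(hours=4)

# Hypothetical sample drawn from an incident register:
# (incident id, raised, escalation acknowledged, remediation closed or None)
incidents = [
    ("INC-101", datetime(2026, 3, 2, 9, 0), datetime(2026, 3, 2, 10, 30), datetime(2026, 3, 5, 9, 0)),
    ("INC-102", datetime(2026, 3, 4, 14, 0), datetime(2026, 3, 5, 8, 0), datetime(2026, 3, 20, 14, 0)),
    ("INC-103", datetime(2026, 3, 9, 11, 0), datetime(2026, 3, 9, 12, 0), None),
]

# Escalations acknowledged later than the SLA allows.
sla_breaches = [
    inc_id for inc_id, raised, acked, _ in incidents
    if acked - raised > ESCALATION_SLA
]

# Incidents whose remediation has not been closed yet.
still_open = [inc_id for inc_id, _, _, closed in incidents if closed is None]

# Closure time in whole days, for closed incidents only.
closure_days = [
    (closed - raised).days for _, raised, _, closed in incidents if closed is not None
]

sla_compliance = 1 - len(sla_breaches) / len(incidents)
median_closure_days = median(closure_days)
```

Reporting the breached and still-open incident IDs alongside the aggregate figures keeps the finding traceable to individual records.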

### T5 Outcome and residual risk assessment

Audit assesses not only compliance but residual risk. A control can be formally operating yet still leave unacceptable business risk.

### T6 Remediation and governance follow-through

Audit output should end with a concrete remediation plan including deadline, owner, and closure criteria. Audit monitors remediation progress and reports delays to the audit committee.

The AIC-T6 model structures audit work and makes it easier to compare outcomes across units.

## How to Build an Audit Sample in AI Environments

Sampling is critical because AI risks are often unevenly distributed. Pure random sampling may miss high-risk segments. Audit should therefore combine:

- random sampling (overall stability assessment)
- targeted sampling (hard scenarios, exceptions, escalations)
- time-based sampling (periods of model change or volume spikes)
- entity-based sampling (different user roles and markets/jurisdictions)

This approach increases the chance of finding gaps that materially matter for the business.
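One way to combine the four sampling modes is sketched below. The record keys (`id`, `date`, `escalated`, `is_exception`, `role`, `market`), the `build_audit_sample` function, and the example data are hypothetical; the point is only that each selected decision carries the reason it was selected, so the sample can be explained in the working papers.

```python
import random
from datetime import date


def build_audit_sample(decisions, n_random, change_windows, seed=0):
    """Return {decision id: set of reasons it was selected}."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    selected = {}

    # Random sampling: overall stability.
    for d in rng.sample(decisions, min(n_random, len(decisions))):
        selected.setdefault(d["id"], set()).add("random")

    for d in decisions:
        # Targeted sampling: every exception or escalation.
        if d["escalated"] or d["is_exception"]:
            selected.setdefault(d["id"], set()).add("targeted")
        # Time-based sampling: decisions inside a model-change window.
        if any(start <= d["date"] <= end for start, end in change_windows):
            selected.setdefault(d["id"], set()).add("time-based")

    # Entity-based sampling: each (role, market) pair appears at least once.
    covered = {(d["role"], d["market"]) for d in decisions if d["id"] in selected}
    for d in decisions:
        key = (d["role"], d["market"])
        if key not in covered:
            selected.setdefault(d["id"], set()).add("entity-based")
            covered.add(key)
    return selected


# Hypothetical decision log.
decisions = [
    {"id": 1, "date": date(2026, 1, 10), "escalated": False, "is_exception": False, "role": "adjuster", "market": "PL"},
    {"id": 2, "date": date(2026, 1, 12), "escalated": True, "is_exception": False, "role": "adjuster", "market": "PL"},
    {"id": 3, "date": date(2026, 2, 3), "escalated": False, "is_exception": False, "role": "adjuster", "market": "DE"},
    {"id": 4, "date": date(2026, 2, 4), "escalated": False, "is_exception": True, "role": "supervisor", "market": "DE"},
    {"id": 5, "date": date(2026, 3, 1), "escalated": False, "is_exception": False, "role": "supervisor", "market": "PL"},
]
change_windows = [(date(2026, 2, 1), date(2026, 2, 7))]  # assumed model release week
sample = build_audit_sample(decisions, n_random=2, change_windows=change_windows)
```

In a real population the targeted and time-based strata would usually be capped or sub-sampled rather than taken in full; the sketch takes all of them to keep the logic visible.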

## What Evidence Audit Should Accept

In AI audit, verbal declarations and static policies are insufficient. A strong evidence package includes:

- model decision logs and human approval paths
- history of prompt, rule, configuration, and model-version changes
- exception, incident, and corrective-action registers
- quality and drift metrics with alert thresholds
- evidence of user training and confirmation of accountability roles
- records of pre-production tests and gate criteria

In practice, use this rule: every critical control should be auditable end-to-end, from definition to operational trace.

## Typical Gaps Found by Internal Audit

1. Controls exist on paper but lack assigned operational owners.
2. Human-in-the-loop is formal, while approvals are mass-accepted without real review.
3. Thresholds are missing that should automatically trigger escalation or process stop.
4. The incident register is disconnected from remediation and control updates.
5. Model or vendor changes do not trigger renewed risk assessment.
6. Quality metrics are collected but do not drive decisions.

Audit that identifies these gaps early reduces the cost of later incidents.
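The second gap in the list, mass-accepted approvals, is one that can often be screened for directly in approval logs. The sketch below flags reviewers who approve almost everything and spend very little time per case. The thresholds, reviewer names, and log layout are assumptions for illustration and would need calibration per use case.

```python
from collections import defaultdict
from statistics import median

# Assumed thresholds; an audit team would calibrate these per use case.
MIN_MEDIAN_REVIEW_SECONDS = 30
MAX_APPROVAL_RATE = 0.98
MIN_REVIEWS = 5  # ignore reviewers with too few reviews to judge


def flag_rubber_stamping(reviews):
    """reviews: iterable of (reviewer, approved, review_seconds)."""
    by_reviewer = defaultdict(list)
    for reviewer, approved, seconds in reviews:
        by_reviewer[reviewer].append((approved, seconds))

    flagged = {}
    for reviewer, rows in by_reviewer.items():
        if len(rows) < MIN_REVIEWS:
            continue
        approval_rate = sum(approved for approved, _ in rows) / len(rows)
        median_seconds = median(seconds for _, seconds in rows)
        # Both conditions must hold: near-total approval and very short reviews.
        if approval_rate > MAX_APPROVAL_RATE and median_seconds < MIN_MEDIAN_REVIEW_SECONDS:
            flagged[reviewer] = (approval_rate, median_seconds)
    return flagged


# Hypothetical approval log.
reviews = (
    [("rev_a", True, 8.0)] * 6
    + [("rev_b", True, 240.0)] * 4
    + [("rev_b", False, 410.0)] * 2
)
flagged = flag_rubber_stamping(reviews)
```

A flag here is a prompt for inquiry rather than a finding in itself: short review times can be legitimate for simple cases, so flagged reviewers should feed the targeted sample.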

## Scenario: Formal Control, Real Risk

An insurance company deployed an AI system supporting claims assessment. Control documentation was complete: policies, roles, checklists, and monthly reviews. Yet an AIC-T6 audit found that a key decision-quality control performed poorly in one high-complexity segment.

The issue did not come from a missing policy but from operational practice. The escalation threshold was set too high, and the review team lacked sufficient capacity. As a result, some cases passed without proper oversight despite a formal human-in-the-loop model.

Audit recommended threshold recalibration, broader review sampling, and automatic alerts for the elevated-risk segment. After two monitoring cycles, after-the-fact corrections dropped and exception closure time improved.

This scenario shows that AI audit’s main value is detecting gaps between control design and real-world operation.
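A gap like this tends to stay hidden when control performance is measured only in aggregate. The sketch below computes, per segment, the share of decisions that passed without human review and were later corrected. The figures are hypothetical and are not taken from the scenario; the 5% tolerance and the segment names are likewise assumptions.

```python
from collections import Counter

# Assumed tolerance for the share of unreviewed decisions later corrected.
MAX_CORRECTION_RATE = 0.05


def correction_rate_by_segment(cases):
    """cases: iterable of (segment, reviewed_by_human, later_corrected)."""
    total, corrected = Counter(), Counter()
    for segment, reviewed, later_corrected in cases:
        if reviewed:
            continue  # only decisions that passed without human review
        total[segment] += 1
        corrected[segment] += int(later_corrected)
    return {seg: corrected[seg] / total[seg] for seg in total}


# Hypothetical figures for illustration only.
cases = (
    [("standard", False, False)] * 198
    + [("standard", False, True)] * 2
    + [("high-complexity", False, False)] * 16
    + [("high-complexity", False, True)] * 4
    + [("high-complexity", True, False)] * 10
)

rates = correction_rate_by_segment(cases)

# Blended rate across all unreviewed decisions, ignoring segments.
unreviewed = [c for c in cases if not c[1]]
overall = sum(c[2] for c in unreviewed) / len(unreviewed)

failing = [seg for seg, rate in rates.items() if rate > MAX_CORRECTION_RATE]
```

With these illustrative numbers the blended rate stays under the tolerance while the high-complexity segment exceeds it, which is the pattern segment-level testing is meant to expose.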

## How to Report Results to Management and the Audit Committee

AI audit reports should be decision-oriented, not purely descriptive. A strong structure includes:

- residual-risk level for each critical use case
- status of key controls (green/amber/red) with rationale
- impact of gaps on business objectives and compliance
- remediation plan: owner, deadline, closure criteria
- areas requiring management decisions (for example risk acceptance or extra investment)

This format makes it easier to connect audit with governance and investment planning.
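To keep green/amber/red ratings consistent across units, the rating rule can be written down explicitly. The sketch below is one possible rule, not a prescribed one: the `ControlResult` fields, the 95% and 80% pass-rate thresholds, and the treatment of overdue remediation are all assumptions that each audit function would set for itself.

```python
from dataclasses import dataclass


@dataclass
class ControlResult:
    control_id: str
    design_adequate: bool
    tests_passed: int
    tests_run: int
    overdue_remediations: int


def rag_status(r: ControlResult, green_at: float = 0.95, amber_at: float = 0.80):
    """Return (status, rationale). The thresholds are illustrative assumptions."""
    pass_rate = r.tests_passed / r.tests_run if r.tests_run else 0.0
    if not r.design_adequate:
        return "red", "design gap: control cannot meet its objective as designed"
    if pass_rate < amber_at:
        return "red", f"operating pass rate {pass_rate:.0%} is below {amber_at:.0%}"
    if pass_rate < green_at or r.overdue_remediations > 0:
        return "amber", (
            f"pass rate {pass_rate:.0%}, "
            f"{r.overdue_remediations} overdue remediation(s)"
        )
    return "green", f"pass rate {pass_rate:.0%}, no overdue remediation"


# Hypothetical test results for three controls.
results = [
    ControlResult("CTL-01", True, 40, 40, 0),
    ControlResult("CTL-02", True, 35, 40, 1),
    ControlResult("CTL-03", True, 28, 40, 0),
]
report = {r.control_id: rag_status(r) for r in results}
```

Returning the rationale together with the status means every color in the report can be traced back to a test result or an open remediation item.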

## 12-Week Audit Action Plan

### Weeks 1-2: Risk and scope calibration

Select critical use cases, align audit objectives, and map stakeholders.

### Weeks 3-5: Control inventory and design testing

Build control map, owners, and expected evidence; identify design gaps.

### Weeks 6-9: Operating-effectiveness testing

Analyze random and targeted samples, test exceptions, review incidents and remediation timing.

### Weeks 10-11: Residual-risk assessment and recommendations

Categorize findings by impact, prepare remediation plan and management decisions.

### Week 12: Report and follow-up plan

Present outcomes to the audit committee, agree milestones, and schedule retesting.

## What Distinguishes Strong AI Audit from Cosmetic Audit

Strong AI audit shows where controls fail in practice and what must change to make risk acceptable. Cosmetic audit confirms documents exist but cannot assess operating effectiveness.

Strong audit combines technical, process, and organizational evidence. Cosmetic audit separates these worlds and loses the full picture.

Strong audit leads to remediation with owner and timeline. Cosmetic audit ends with observations and no enforcement.

In AI environments, this difference determines whether governance protects the company or only looks convincing.

## Executive Takeaway

**What changed?** Internal audit in AI must evaluate operating effectiveness of controls over time, not only formal documentation compliance.

**Why does it matter?** AI controls may exist on paper while still allowing critical risk due to wrong thresholds, weak review, and ineffective remediation.

**What should leaders do?** Implement AIC-T6, base testing on end-to-end evidence, combine random and targeted sampling, and report residual risk with enforceable remediation plans.

Paweł Kubisiak

Partner at AI&Scale and Editor in Chief, responsible for editorial quality and direction across AI transformation, governance and scaling coverage.