An LLM governance framework for mid-market companies

Mohakdeep Singh | May 6, 2026 | 11 min read
LLM governance at mid-market scale (50-500 employees) is the small set of policies, processes, and reviews that prevent AI use from causing harm without blocking AI use from creating value. The enterprise governance frameworks (NIST AI RMF, EU AI Act compliance regimes, ISO 42001) are designed for organizations large enough to staff dedicated governance functions. Mid-market needs a focused subset that fits in 5-10% of one person's time.

Why mid-market needs a different posture on governance

Three properties define the mid-market governance position:

  1. You have legal exposure. Customer data flows through AI applications; the contracts you signed with customers say things about how that data is used. A governance failure has real cost.
  2. You don't have a Chief AI Ethics Officer or a dedicated AI governance team. The work happens as part of someone's other job.
  3. The AI surface area is changing fast. Every quarter brings new model capabilities (longer context, agents, multi-modal), new vendor relationships, new use cases. Governance built for last quarter's surface area becomes brittle.

These properties argue for a governance approach that is principle-based rather than checklist-based, ownership-clear rather than committee-decided, and integrated into existing engineering practices rather than parallel to them.

The competitive landscape: Big-4 consultancies and law firms publish enterprise governance frameworks. These are appropriate for organizations that are genuinely enterprise-scale. NIST's AI Risk Management Framework is the most useful public reference for any organization; we route readers there for the foundational concepts and build the mid-market-specific layer on top.

This piece describes the framework we use across mid-market AI engagements. It is the governance companion to our AI enablement roadmap pillar and our AI Center of Excellence structure piece.

The four principles

Most governance failures we see at mid-market trace back to a violation of one of four principles:

  1. Data flowing into AI applications must respect the customer contract that brought it in.
  2. Decisions made by AI must be attributable, reviewable, and reversible.
  3. AI vendors must be evaluated for security, data handling, and contract terms before adoption.
  4. AI incidents must have an owner and a response process.

Everything in the framework derives from these four. The policies, processes, and reviews are the operationalization. When in doubt about whether a specific situation needs governance attention, ask which principle is at risk.

The four documents

The minimum governance documentation set for mid-market is four documents, kept short and reviewed quarterly.

Document 1: Acceptable Use Policy (AUP)

What it covers: what data can go into which AI tools, what use cases are permitted, what's prohibited.

Structure (one page):

  • Permitted use. AI tools may be used for [list of categories: code generation, content drafting, data analysis on permitted data, etc.]
  • Restricted data. The following data must not be input into AI tools without explicit approval: customer PII, financial data, regulated data (HIPAA, etc.), competitive intelligence, IP under embargo
  • Approved tools. The following AI tools are approved for company use: [list with version constraints if relevant]
  • Provisional tools. The following are under evaluation; production use requires sign-off
  • Prohibited tools. The following are not approved: [explicit list]
  • How to request changes. Process for requesting a new tool or use case to be evaluated

The AUP is for everyone in the company, not just engineering. The language reflects this — non-technical readers should be able to understand it. Length: under 1000 words.

The mistake we see: 30-page AUPs that try to anticipate every scenario. They become unread; people make decisions without reference to them; the practical effect is no AUP. Short and used beats long and ignored.

Document 2: Vendor Evaluation Criteria

What it covers: what questions get asked before a new AI vendor is approved, what evidence is required, what the approval workflow is.

Structure (one page):

  • Trigger. When does this evaluation run? (New vendor, vendor adding AI features to existing product, etc.)
  • Required information. What the vendor must provide:
  • Evaluation owner. Who reviews and recommends approval/rejection
  • Approval threshold. Who has approval authority by spend tier
  • Documentation. Where the evaluation outcome is recorded

The standard mid-market mistake: every team evaluates vendors independently with different rigor. The result: some vendors get rigorous evaluation, others slip through with minimal review. A consistent process — even a lightweight one — closes that gap.

Document 3: Incident Response Process

What it covers: what happens when an AI application produces a wrong answer that has consequences, when a vendor has an outage, when a security issue surfaces.

Structure (one page):

  • Severity classification. What constitutes a P1/P2/P3 AI incident. Examples for each tier specific to your business.
  • First responder. Who gets paged when an AI incident is reported, by severity
  • Triage process. What the first responder does in the first 30 minutes
  • Escalation paths. Who gets notified, when, by severity
  • Communication. Internal and external (customers, regulators if applicable) communication templates and timing
  • Post-incident review. Process for capturing lessons; cadence for reviewing patterns

The incident response document is most useful when it has been used. Run a tabletop exercise quarterly even if no real incident has occurred. The exercise surfaces gaps in the document and trains the responders.

Document 4: Risk Register

What it covers: a running list of identified AI-related risks, their assessment, and the current mitigation status.

Structure (a maintained spreadsheet or simple database, reviewed quarterly):

| Risk | Likelihood (L/M/H) | Impact (L/M/H) | Mitigation | Owner | Last Review |
|---|---|---|---|---|---|
| Customer PII inadvertently shared with LLM | M | H | AUP enforcement; engineering review of data flows | CTO | Q2 2026 |
| Vendor X stops offering API used by Product Y | L | M | Multi-vendor abstraction in code; contract review | Eng VP | Q2 2026 |
| Model version regression breaks Application Z | M | M | Pinned model versions; regression evaluation suite | AI Lead | Q1 2026 |

The risk register is not a one-time output; it's a living document. Risks get added as new applications launch; risks get removed as mitigations land. The quarterly review surfaces what's changed.

The mistake: building a comprehensive risk register at AI program kickoff and then never updating it. The static document is worse than no document because people assume it's current.
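One way to keep the register from going stale is to make overdue reviews visible. The following is a minimal sketch, assuming the register is exported from the spreadsheet into records shaped like the example rows above; the `Risk` class and the `quarter_index` and `stale_risks` helpers are hypothetical names introduced for illustration, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: str   # "L", "M", or "H"
    impact: str       # "L", "M", or "H"
    mitigation: str
    owner: str
    last_review: str  # quarter label such as "Q2 2026"


def quarter_index(label: str) -> int:
    """Turn 'Q2 2026' into a sortable integer (year * 4 + quarter - 1)."""
    quarter, year = label.split()
    return int(year) * 4 + int(quarter[1:]) - 1


def stale_risks(register: list[Risk], current_quarter: str) -> list[Risk]:
    """Return risks whose last review is older than the current quarter."""
    now = quarter_index(current_quarter)
    return [r for r in register if quarter_index(r.last_review) < now]


# The three example rows from the register above.
register = [
    Risk("Customer PII inadvertently shared with LLM", "M", "H",
         "AUP enforcement; engineering review of data flows", "CTO", "Q2 2026"),
    Risk("Vendor X stops offering API used by Product Y", "L", "M",
         "Multi-vendor abstraction in code; contract review", "Eng VP", "Q2 2026"),
    Risk("Model version regression breaks Application Z", "M", "M",
         "Pinned model versions; regression evaluation suite", "AI Lead", "Q1 2026"),
]

for risk in stale_risks(register, "Q2 2026"):
    print(f"Overdue for review: {risk.description} (owner: {risk.owner})")
```

Run against the example rows with Q2 2026 as the current quarter, this flags only the model version regression risk, which was last reviewed in Q1 2026.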

The processes

Three recurring processes that operationalize the documents:

Process 1: New use case review

When a team proposes a new AI application:

  1. Submission. Team writes a one-page brief: what's the use case, what data flows through it, which vendor(s) are involved, what's the user-facing impact.
  2. Review. AI lead (or governance owner) assesses against the AUP and risk register. Returns one of: approved, approved with conditions, more information needed, rejected with reasoning.
  3. Documentation. Approved use cases land in a registry of active AI applications. Conditions are tracked.
  4. Cadence. Reviews happen on demand, target turnaround 5 business days.

The process is intentionally lightweight. The rigor is in clarity of the AUP and the risk register, not in lengthy review.
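Because the rigor lives in the AUP, a first-pass check of a brief against the AUP lists can be mechanical. The sketch below is illustrative only: the tool names, data category labels, `UseCaseBrief` fields, and `triage` function are all assumptions, and the returned outcome is a suggestion for the reviewer, not a decision.

```python
from dataclasses import dataclass, field

# Hypothetical AUP excerpts; replace with the lists from your own policy.
RESTRICTED_DATA = {"customer_pii", "financial", "regulated",
                   "competitive_intel", "embargoed_ip"}
APPROVED_TOOLS = {"tool_a", "tool_b"}
PROVISIONAL_TOOLS = {"tool_c"}


@dataclass
class UseCaseBrief:
    name: str
    data_categories: set[str]
    tools: set[str]
    user_facing_impact: str  # free text; read by the reviewer, not by triage()
    notes: list[str] = field(default_factory=list)


def triage(brief: UseCaseBrief) -> str:
    """Suggest a preliminary outcome; the reviewer makes the actual decision."""
    unknown = brief.tools - APPROVED_TOOLS - PROVISIONAL_TOOLS
    if unknown:
        # Tools that appear on no AUP list need a vendor evaluation first.
        brief.notes.append(f"Tools not on the AUP lists: {sorted(unknown)}")
        return "more information needed"
    conditions = []
    if brief.data_categories & RESTRICTED_DATA:
        conditions.append("restricted data requires explicit approval")
    if brief.tools & PROVISIONAL_TOOLS:
        conditions.append("provisional tool requires sign-off for production")
    brief.notes.extend(conditions)
    return "approved with conditions" if conditions else "approved"


brief = UseCaseBrief(
    name="Support ticket summarizer",
    data_categories={"customer_pii"},
    tools={"tool_a"},
    user_facing_impact="Support agents see AI-written summaries",
)
print(triage(brief), brief.notes)
```

The "rejected with reasoning" outcome is deliberately absent from the sketch: a rejection is a judgment call that belongs to the AI lead or governance owner.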

Process 2: Vendor evaluation

When a new AI vendor enters consideration:

  1. Trigger. Anyone in the company can flag a vendor for evaluation by submitting a brief request.
  2. Vendor information collection. The vendor evaluation owner sends the standard information request to the vendor.
  3. Review. Information is reviewed against the vendor evaluation criteria. Outcome is one of: approve, approve with conditions (specific use cases only), reject.
  4. Documentation. Outcome lands in the approved/restricted/prohibited tool list in the AUP. Re-evaluation cadence is set (typically annual; sooner if material vendor change).

A typical mid-market company has 5-15 AI vendors under evaluation in a given year. The process scales fine at this volume.

Process 3: Quarterly governance review

Once per quarter, the AI lead or governance owner runs a 60-minute governance review:

  • Document refresh. Are the four documents still current? What needs updating based on the last quarter's events?
  • Risk register update. New risks identified, mitigations completed, risks closed.
  • Vendor list refresh. Renewal cycles approaching, vendor changes affecting status, new vendors added.
  • Incident pattern review. Were there incidents in the last quarter? What patterns are emerging?
  • Forward look. What governance work is needed next quarter to stay ahead of the AI roadmap?

The output: a one-page governance update shared with the executive sponsor and the engineering leadership. Surface what needs decisions; resolve what doesn't.

The reviews

Two specific reviews that catch the most common governance failures:

Review 1: Data flow review

For each AI application, document:

  • What data flows in
  • Where that data came from (which customer, which contract, which classification)
  • Where the data goes after processing
  • What the AI vendor's contract permits regarding the data
  • Whether anything in the data flow violates the customer contract that brought the data in

This is the single most important review. The most common AI governance failure at mid-market is customer data flowing into AI vendors in ways that violate the data processing terms in the customer contract. Catching this requires the engineering team to understand both the AI vendor's terms and the customer-facing contract; the data flow review is the artifact that makes that comprehensible.

For each new AI application, complete this review before production launch. For existing applications, run it once as a backfill; refresh quarterly.
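The review can be captured as one record per application, with the contract comparison made explicit. This is a rough sketch under strong simplifying assumptions: real contract terms rarely reduce to a boolean and a retention period, and the `DataFlow` fields and `contract_conflicts` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFlow:
    application: str
    data_in: str                      # what data flows in
    source_contract: str              # which customer contract brought it in
    destination: str                  # where the data goes after processing
    vendor_may_train_on_data: bool    # from the AI vendor's terms
    vendor_retention_days: int        # from the AI vendor's terms
    contract_allows_training: bool    # from the customer contract
    contract_max_retention_days: int  # from the customer contract


def contract_conflicts(flow: DataFlow) -> list[str]:
    """List the ways this data flow conflicts with the customer contract."""
    conflicts = []
    if flow.vendor_may_train_on_data and not flow.contract_allows_training:
        conflicts.append("vendor may train on data; customer contract forbids it")
    if flow.vendor_retention_days > flow.contract_max_retention_days:
        conflicts.append("vendor retains data longer than the customer contract allows")
    return conflicts


flow = DataFlow(
    application="Support ticket summarizer",
    data_in="support tickets containing customer PII",
    source_contract="Customer A master services agreement",
    destination="summaries stored in the ticketing system",
    vendor_may_train_on_data=True,
    vendor_retention_days=30,
    contract_allows_training=False,
    contract_max_retention_days=90,
)

for conflict in contract_conflicts(flow):
    print(f"{flow.application}: {conflict}")
```

An empty result does not prove compliance; it only means the two checks encoded here found nothing. The record still needs a human read of both contracts.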

Review 2: Behavior evaluation

For AI applications that produce outputs visible to users (customers or internal):

  • What's the expected behavior?
  • How is it measured?
  • What's the acceptable error rate?
  • What happens when the AI is wrong?

This is harder than the data flow review because "expected behavior" is fuzzy for many AI applications. The discipline is to write down what success and failure look like before launch, build the evaluation infrastructure to measure them, and make the metrics visible to the team owning the application.
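As a minimal illustration of writing down success and failure before launch, the sketch below measures an error rate against an agreed budget. It assumes exact-match scoring, which suits classification-style outputs and is too strict for free-form text; `stub_classifier` stands in for the real application, and the 10% budget is an arbitrary example value.

```python
from typing import Callable

# Each case pairs an input with the output the team agreed counts as correct.
EvalCase = tuple[str, str]


def error_rate(app: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the application's output differs from the expected one."""
    if not cases:
        raise ValueError("evaluation set is empty")
    failures = sum(1 for prompt, expected in cases if app(prompt).strip() != expected)
    return failures / len(cases)


def within_budget(rate: float, acceptable_error_rate: float) -> bool:
    """True when the measured error rate is at or below the agreed budget."""
    return rate <= acceptable_error_rate


def stub_classifier(text: str) -> str:
    """Stand-in for the real AI application (hypothetical)."""
    return "refund" if "refund" in text.lower() else "other"


cases = [
    ("I want a refund", "refund"),
    ("Refund my order please", "refund"),
    ("Where is my package?", "other"),
    ("Money back now", "refund"),  # the stub misses this one
]

rate = error_rate(stub_classifier, cases)
print(f"error rate: {rate:.2f}, within budget: {within_budget(rate, 0.10)}")
```

The stub misses one of four cases, so the measured error rate is 0.25 and the 10% budget is exceeded. The point of the exercise is that the budget and the cases exist in writing before launch.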

We have written about evaluation methodology elsewhere; the governance angle is that the application owner is accountable for knowing the application's behavior characteristics, not just for shipping it.

What this framework intentionally leaves out

The mid-market framework does not include:

  • Formal AI ethics committee. Useful at enterprise scale; overhead exceeds benefit at mid-market unless your industry requires it.
  • Algorithmic impact assessments per Canadian/EU regulatory frameworks. Required by specific regulations for specific applications; we route readers to specialist legal counsel for those.
  • Detailed model cards for every application. Useful at platform-engineering scale where you maintain models; less useful when you're consuming vendor APIs.
  • Multi-stakeholder governance councils. Common at enterprise scale; at mid-market, a single governance owner working with a small team can operate more efficiently than a council can.
  • Prescriptive metrics for fairness, bias, etc. These metrics are real and important, but one-size-fits-all metrics rarely apply meaningfully across use cases. The framework requires that applications producing user-facing decisions have evaluation infrastructure; the specific metrics are use-case-specific.

These omissions are deliberate. The framework targets what produces governance value at mid-market; we strip out what doesn't pay back.

Common governance failures at mid-market

Patterns we see frequently:

Customer data in vendor training pipelines. A team adopts an AI vendor without checking whether the vendor's terms say "we may use customer data to improve our service." Some customer contracts explicitly forbid this, and in that case the AI use creates a contract breach. Catching this requires the data flow review; it is not something that gets caught reactively.

Model behavior drift not detected. An application launches working well; the underlying model gets updated by the vendor; behavior changes; nobody notices because there's no evaluation infrastructure. The result is sometimes minor (slight quality regression) and sometimes significant (model now refuses queries it used to handle). Pinning model versions and running regression evaluations catches this.
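A pinned version plus a regression gate might look like the sketch below. The model identifiers are made up, `fake_call_model` stands in for a real vendor API call, and the rule (the candidate must not score below the pinned model by more than `max_drop`) is one possible policy, not a standard.

```python
from typing import Callable

PINNED_MODEL = "example-model-2026-01"     # hypothetical identifier
CANDIDATE_MODEL = "example-model-2026-04"  # hypothetical identifier

Cases = list[tuple[str, str]]
ModelCall = Callable[[str, str], str]  # (model, prompt) -> output


def pass_rate(call_model: ModelCall, model: str, cases: Cases) -> float:
    """Fraction of regression cases the given model version passes."""
    passed = sum(1 for prompt, expected in cases if call_model(model, prompt) == expected)
    return passed / len(cases)


def safe_to_upgrade(call_model: ModelCall, cases: Cases, max_drop: float = 0.0) -> bool:
    """Allow moving the pin only if the candidate does not regress beyond max_drop."""
    baseline = pass_rate(call_model, PINNED_MODEL, cases)
    candidate = pass_rate(call_model, CANDIDATE_MODEL, cases)
    return candidate >= baseline - max_drop


def fake_call_model(model: str, prompt: str) -> str:
    """Stand-in for a vendor API call; the candidate refuses one query."""
    if model == CANDIDATE_MODEL and "contract" in prompt:
        return "REFUSED"
    return "ok"


cases = [("summarize this ticket", "ok"), ("summarize this contract", "ok")]
print(safe_to_upgrade(fake_call_model, cases))
```

Here the candidate refuses a query the pinned version handles, so the gate returns `False` and the pin stays where it is.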

Vendor lock-in not anticipated. An application is built around a specific vendor's specific features. The vendor changes pricing, deprecates the feature, or has a security incident. Switching is expensive because nothing was built with vendor abstraction in mind. The mitigation is in the architecture and the evaluation; governance flags the risk.

Acceptable Use violated by employees who never read the AUP. A long AUP that nobody read. The violation happens; the policy is "in place"; the practical control isn't. Short policies plus active communication (mentions in onboarding, periodic reminders) work better than long policies in shared drives.

Incident occurs; nobody knows who owns response. The incident response document doesn't exist or isn't current. Time-to-response stretches; the incident escalates; trust erodes. The fix is the document plus the tabletop exercise.

How to assess where you are today

Score 0-3 on each component:

  • 0: Not in place
  • 1: Drafted, not in active use
  • 2: Active and current
  • 3: Active, current, and reviewed on schedule

| Component | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Acceptable Use Policy | None | Drafted but not communicated | Communicated and accessible | + reviewed quarterly |
| Vendor Evaluation Criteria | Per-team, no standard | Standard drafted | Applied to all new vendors | + annual reviews of approved vendors |
| Incident Response Process | None | Drafted | Used for at least one incident | + tabletop exercise quarterly |
| Risk Register | None | Initial assessment | Maintained and reviewed | + quarterly executive review |

Score interpretation:

  • 0-4: Pre-governance. Highest priority is the AUP and the vendor evaluation criteria. The other two follow naturally.
  • 5-8: Functional. Likely missing the active practice (regular reviews, tabletop exercises). Schedule them.
  • 9-12: Mature for mid-market. Sustaining work; consider whether scale or regulatory environment justifies the additional rigor of an enterprise framework.
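
The scoring arithmetic is simple: four components scored 0-3 each give a total between 0 and 12, which maps to one of the three bands above. A small sketch, with hypothetical function names and an example set of scores:

```python
COMPONENTS = (
    "Acceptable Use Policy",
    "Vendor Evaluation Criteria",
    "Incident Response Process",
    "Risk Register",
)


def total_score(scores: dict[str, int]) -> int:
    """Sum the four component scores after checking each is in the 0-3 range."""
    for component in COMPONENTS:
        value = scores[component]  # raises KeyError if a component is missing
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{component}: score must be 0-3, got {value}")
    return sum(scores[c] for c in COMPONENTS)


def interpret(total: int) -> str:
    """Map a 0-12 total to the band named in the score interpretation."""
    if total <= 4:
        return "Pre-governance"
    if total <= 8:
        return "Functional"
    return "Mature for mid-market"


# Example self-assessment (illustrative values only).
scores = {
    "Acceptable Use Policy": 2,
    "Vendor Evaluation Criteria": 1,
    "Incident Response Process": 1,
    "Risk Register": 0,
}
total = total_score(scores)
print(total, interpret(total))
```

With these example values the total is 4, which lands in the pre-governance band even though the AUP itself is active and current.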

FAQ

Q: Do we need a Chief AI Officer or AI Ethics Officer? At mid-market, almost never. The roles exist at enterprise scale because of the coordination cost; mid-market has lighter coordination needs. The functional ownership (named individual responsible for the four documents and the three processes) is more important than the title.

Q: How does this interact with regulatory requirements like the EU AI Act? The framework above is the foundation; specific regulations layer on top. For EU AI Act high-risk applications, additional documentation (impact assessments, human oversight design, technical documentation packages) is required. We work with specialist legal counsel for the regulatory layer; the framework above creates the operational foundation that makes the regulatory layer manageable.

Q: We're a startup that grew quickly. Do we need to do all of this? Start with the AUP and vendor evaluation criteria; those are the highest-leverage. Build the others as use cases emerge. The risk register can begin as a list of three risks; expand from there. Don't skip the foundation, but don't try to ship everything at once either.

Q: How do we get employees to actually read the AUP? Keep the policy short, and communicate it beyond publishing the document: discuss it in onboarding, mention it in engineering all-hands when AI topics come up, and refer to it in code review when relevant. The policy is a tool; if it's not in active circulation, it doesn't function.

Q: What's the right reporting structure for AI governance? At mid-market, governance ownership often sits with the same person who leads AI capability (CTO, VP of Engineering, AI Lead). At enterprise scale, governance gets a separate function. The mid-market combination works because the governance load is small enough that the same person can hold it.

Q: Should the AUP be public? Usually it is internal-facing. A customer-facing version (what customer data may be processed by AI) is often required by enterprise customer contracts and usually appears in your trust center or DPA. These are two documents serving different audiences.

For broader AI enablement context, see our pillar on the AI enablement roadmap for mid-market. For the organizational structures that own this governance, see our AI Center of Excellence structure piece.

Mohakdeep Singh

Principal Consultant

Specializes in AI/ML Engineering, Cloud-Native Architecture, and Intelligent Automation. Designs and builds production-grade AI systems including retrieval-augmented generation (RAG) pipelines, conversational agents, and document intelligence platforms that transform how enterprises access and act on information.
