
The UK SME Buyer’s Playbook for AI Contracts (2025)

Buying AI in 2025 is not like buying ordinary software. Costs scale with usage, models change under the hood, and quality can vary with your data. This playbook gives UK SME and charity leaders a practical way to run an AI procurement in weeks, not months: what to ask vendors, how to compare offers, which guardrails to insist on, and a 90‑day rollout that avoids surprises.

Why this matters now:

  • AI assurance has matured. ISO/IEC 42001 sets out how organisations manage AI responsibly and is available for certification via BSI and others. Ask vendors how they align. iso.org
  • Cost governance has caught up. The FinOps Foundation’s 2025 framework explicitly covers non‑cloud spend like SaaS and AI, helping teams budget, allocate and optimise costs. finops.org
  • Public buyers must document algorithmic tools via the UK’s Algorithmic Transparency Recording Standard (ATRS). Even if you’re private sector, the template is a useful accountability checklist to require from suppliers. gov.uk
  • For SaaS, UK guidance continues to reference the NCSC’s cloud security principles when choosing tools—make sure your vendor can answer against them. gov.uk

A simple decision tree before you go to market

  1. Is there a proven, configurable product? If yes, start with SaaS. If no, consider custom build or a hybrid.
  2. Will personal or sensitive data be processed? If yes, tighten vendor due diligence, insist on robust transparency and security answers, and complete a DPIA with your DPO.
  3. Is quality measurable in your real use case? If not, run a limited data trial first; don’t buy on demo quality alone.
  4. Are the unit economics viable? If you can’t estimate a cost per task (email answered, lead qualified, page summarised), you can’t control spend. Use FinOps-style allocation and quotas from day one (a minimal sketch of this gate follows the list). finops.org
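
A minimal sketch of that gate, if you want to make the four questions explicit before any vendor conversation. The field names, function and recommendations below are invented for illustration; they are not part of any framework or product.

```python
# Illustrative only: a pre-market gate mirroring the four questions above.
# Field names and the recommendations returned are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class OpportunityAssessment:
    proven_product_exists: bool       # Q1: is there a configurable product already?
    processes_personal_data: bool     # Q2: personal or sensitive data involved?
    quality_measurable: bool          # Q3: can quality be measured on your real tasks?
    estimated_cost_per_task: float | None  # Q4: e.g. cost per email answered (GBP)

def pre_market_check(a: OpportunityAssessment) -> list[str]:
    """Return the actions this playbook suggests before going to market."""
    actions = ["Start with SaaS" if a.proven_product_exists
               else "Consider custom build or hybrid"]
    if a.processes_personal_data:
        actions.append("Tighten due diligence; plan a DPIA with your DPO")
    if not a.quality_measurable:
        actions.append("Run a limited data trial before buying")
    if a.estimated_cost_per_task is None:
        actions.append("Hold budget approval until cost per task is estimated")
    return actions

print(pre_market_check(OpportunityAssessment(True, True, False, None)))
```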

The 30 vendor questions that prevent regrets

1) Outcomes, success metrics and accountability

  • Which outcomes will you commit to in writing (e.g. reduction in handling time, response coverage, time‑to‑first‑answer)? What’s excluded?
  • What offline and live metrics do you track (accuracy, relevance, harmful content rate, latency)? How often are they reported to us?
  • Can we run a controlled pilot with our own data and define exit criteria?

2) Data protection and security baseline

  • Map our data flows: what leaves our tenant, where is it stored, and for how long? In which jurisdictions?
  • Demonstrate how you meet the NCSC Cloud Security Principles (data in transit, tenant separation, audit, supply chain). Provide a filled checklist, not just a marketing claim. gov.uk
  • How do you secure admin access, logging, and incident response for your SaaS? Share your playbook summary and last test date. gov.uk

3) Model choices and quality

  • Which foundation models power the service? How often do you swap or upgrade models and how are changes communicated?
  • Do you offer retrieval‑augmented generation (RAG) or structured prompts with guardrails? How do you control hallucinations and unsafe outputs?
  • What is your evaluation approach on our content? Share sampling method, test set size and pass/fail thresholds.

4) Cost guardrails and scalability

  • What pricing model applies (per user, per message, per thousand tokens, per document)? Where are the breakpoints and overage rates?
  • What budget controls exist (quotas, per‑team limits, throttling)? Can we set hard caps and alerts? FinOps working groups recommend limits and throttling to prevent runaway costs; do you support them natively? (A minimal sketch of such controls follows this list.) finops.org
  • Do you provide monthly cost allocation by team or project, aligned to FinOps practices? What’s the percentage of spend you can attribute automatically? finops.org
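
To make the caps‑and‑alerts question concrete, here is a minimal, illustrative sketch of the behaviour you are asking the vendor to support natively. The team names, caps and thresholds are invented; in practice these controls should sit in the vendor’s platform, not in your own scripts.

```python
# Minimal sketch of budget guardrails: per-team monthly caps, an alert
# threshold, and a throttle decision. All figures are illustrative.
MONTHLY_CAPS_GBP = {"support": 400.0, "sales": 250.0, "marketing": 150.0}
ALERT_AT = 0.8   # alert budget owners at 80% of cap
HARD_STOP = 1.0  # throttle non-critical use at 100% of cap

def check_team_spend(team: str, month_to_date_gbp: float) -> str:
    cap = MONTHLY_CAPS_GBP[team]
    used = month_to_date_gbp / cap
    if used >= HARD_STOP:
        return f"{team}: cap reached ({month_to_date_gbp:.2f}/{cap:.2f} GBP), throttle non-critical use"
    if used >= ALERT_AT:
        return f"{team}: {used:.0%} of cap used, alert budget owner"
    return f"{team}: {used:.0%} of cap used, OK"

for team, spend in {"support": 410.0, "sales": 210.0, "marketing": 60.0}.items():
    print(check_team_spend(team, spend))
```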

5) Delivery, support and change

  • Who provides implementation and ongoing optimisation? What’s included in the fee vs chargeable?
  • What SLAs do you offer for availability, response times and quality incidents? What are the service credits?
  • How do you notify and control model or feature changes that could affect outcomes?

6) Transparency, ethics and assurance

  • Do you align with ISO/IEC 42001 (AI management systems)? If not certified, can you provide evidence of conformance activities and scope? iso.org
  • Can you produce an ATRS‑style record describing purpose, data, safeguards and human oversight for our deployment? gov.uk
  • Are any sub‑processors used for data processing or model operations? How are they vetted and monitored?

7) Exit, portability and lock‑in

  • What is our data export format and how quickly can we extract everything (including prompts, configurations and embeddings)?
  • What help do you provide to port to another supplier or in‑house solution if needed?
  • What happens to our data, logs and model derivatives at contract end?

How to score competing proposals

Dimension | Weight | What good looks like
Business outcome fit | 25% | Clear, quantified outcomes linked to your process and KPIs.
Quality & safety | 20% | Evidence‑based evaluation on your data; low harmful/irrelevant output rate; human review options.
Cost & controls | 20% | Transparent pricing; quotas and alerts; monthly cost allocation by team/project. finops.org
Security & assurance | 15% | NCSC‑aligned answers; ATRS‑style transparency; ISO/IEC 42001 alignment. gov.uk
Delivery & support | 10% | Named team; clear SLA; optimisation cadence.
Portability & exit | 10% | Full data export, config portability, decommission plan.

Keep scoring visible. If a late discount changes the decision, document why the business outcome still holds.
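
As a worked example of how the weights combine, the sketch below scores two hypothetical proposals on a 1–5 scale per dimension. The vendor names and scores are invented; only the weights come from the table above.

```python
# Worked example of the weighted scorecard. Scores are 1-5 per dimension;
# the vendors and their scores are invented for illustration.
WEIGHTS = {
    "Business outcome fit": 0.25,
    "Quality & safety": 0.20,
    "Cost & controls": 0.20,
    "Security & assurance": 0.15,
    "Delivery & support": 0.10,
    "Portability & exit": 0.10,
}

proposals = {
    "Vendor A": {"Business outcome fit": 4, "Quality & safety": 3, "Cost & controls": 4,
                 "Security & assurance": 5, "Delivery & support": 3, "Portability & exit": 2},
    "Vendor B": {"Business outcome fit": 3, "Quality & safety": 4, "Cost & controls": 3,
                 "Security & assurance": 4, "Delivery & support": 4, "Portability & exit": 4},
}

for name, scores in proposals.items():
    weighted = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
    print(f"{name}: {weighted:.2f} / 5.00")
```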

Costing AI the pragmatic way

AI costs are usage‑driven. Build a simple unit model so finance and delivery can “see” spend before it happens:

  1. Define the unit. One answer generated, one email drafted, one page summarised, one lead qualified.
  2. Estimate volume. Use last three months of operational data to derive a realistic daily/weekly figure.
  3. Model price tiers. Convert vendor pricing into “cost per unit” at low, medium and high volumes; a worked sketch follows this list.
  4. Add guardrails. Apply per‑user and per‑team quotas, and throttle non‑critical use when spend exceeds forecast. This mirrors FinOps for AI recommendations. finops.org
  5. Allocate and review monthly. Attribute spend to teams and products; aim for 80–90% allocable costs as maturity improves. finops.org
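
Here is a minimal sketch of that unit model for one task type, using an invented per‑token tariff and token counts purely to show the arithmetic; substitute your vendor’s actual pricing (per user, per message or per document works the same way) and your own measured volumes.

```python
# Sketch of a cost-per-unit model for one task type ("email answered").
# All prices and token counts are illustrative assumptions, not real tariffs.
PRICE_PER_1K_INPUT_TOKENS_GBP = 0.0025   # hypothetical tariff
PRICE_PER_1K_OUTPUT_TOKENS_GBP = 0.0100  # hypothetical tariff

def cost_per_unit(input_tokens: int, output_tokens: int) -> float:
    """Step 1: define the unit and price it from its token footprint."""
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS_GBP
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS_GBP)

unit_cost = cost_per_unit(input_tokens=1200, output_tokens=400)  # per email answered

# Steps 2-3: monthly cost at low / medium / high volumes (emails per month).
for volume in (2_000, 10_000, 50_000):
    print(f"{volume:>6} emails/month -> ~GBP {volume * unit_cost:,.0f}")
```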

Tip: treat prompt, context and output sizes as levers. Trimming boilerplate can halve a prompt’s token count without hurting quality; in the illustrative sketch above, halving the input from 1,200 to 600 tokens cuts the unit cost from £0.007 to roughly £0.0055. Test these levers during evaluation, not after go‑live.

Risk and cost table for directors

Risk | Typical cost impact | Mitigation
Runaway usage | High (budget breach) | Set quotas and alerts; restrict preview features; monthly allocation reports to owners. finops.org
Quality drift after vendor model change | Medium (rework, complaints) | Change notices in contract; re‑test gates; fall‑back responses; human review for sensitive workflows.
Weak SaaS security | High (breach, downtime) | NCSC principle mapping in due diligence; right to audit; incident response SLAs. gov.uk
Opaque AI governance | Medium (trust, adoption) | Request ATRS‑style record and ISO/IEC 42001 alignment statement. gov.uk

The 90‑day rollout plan

Days 0–30: Evaluate and de‑risk

  • Run a structured evaluation on your content with 30–50 real tasks; baseline quality, latency and cost per unit.
  • Complete a lightweight security and data review; get DPO sign‑off on the proposed data flows.
  • Decide a go/no‑go using scorecard thresholds, not gut feel. If you need a recipe, use our 5‑Day AI Evaluation Sprint.

Days 31–60: Pilot in production, quietly

  • Enable for a small user group; keep a human‑in‑the‑loop for sensitive steps.
  • Set hard usage caps; weekly cost and quality reviews with the vendor.
  • Follow a change‑managed approach similar to our Quiet Cutover plan to avoid big‑bang risk.

Days 61–90: Scale and integrate

  • Expand to the next team; automate reporting; refine prompts for cost/quality balance.
  • Agree quarterly optimisation and model‑change review with the vendor.
  • Align next increments with our 12‑week launch plan and your 2025 practical AI stack.

Contract clauses that save you later

  • Change control for models. Advance notice for swaps/upgrades; right to freeze version for critical workflows; re‑test before full rollout.
  • Quality SLA. Define measurable quality thresholds and a remedy mechanism (credits, re‑work) for persistent failure.
  • Security annex. Map to NCSC principles and your controls; include incident reporting timelines and third‑party sub‑processor disclosure. gov.uk
  • Cost caps. Monthly spend caps with automatic throttle; obligation to expose per‑team usage and cost allocation. finops.org
  • Exit and data portability. Export of data, logs, prompts and configs; secure deletion certificate; handover assistance.
  • Transparency record. Require an ATRS‑style artefact describing purpose, data, oversight and limitations for your deployment. gov.uk
  • Assurance. Statement of conformity or certification path for ISO/IEC 42001 over the contract term. iso.org

Procurement pitfalls to avoid

  • Buying the demo. Demos are curated. Always test on your content and edge cases.
  • Forgetting the unit cost. A flat licence can obscure the usage‑driven costs underneath it. Convert to cost per task before approving budget.
  • Underestimating adoption. No training, no change plan, no benefit. Bake in enablement and process tweaks.
  • Undefined ownership. Give a named product owner P&L responsibility for the AI service.
  • Security by assertion. Ask for completed checklists and evidence, not just badges. gov.uk

What good looks like in practice

In a recent SME rollout, the buyer used a one‑page scorecard, capped spend per team, and held fortnightly quality reviews with the vendor. Within six weeks they halved their “first‑draft” turnaround time and kept costs under forecast by introducing team‑level quotas and a narrower prompt style for routine tasks—simple levers, applied consistently.