If you do only three things this week, make them these: ask vendors to show proof, score what you see, and cap your spend while you test. This article gives UK SME and charity leaders a practical, non‑technical pack to run AI vendor due diligence without drowning in jargon or paperwork.
We’ll use established guidance to keep it simple: the NIST AI Risk Management Framework (including its Generative AI Profile) and ISO/IEC 42001 for AI management systems. You don’t need to certify; you just need vendors to show how they align. That way you anchor conversations in recognised frameworks, not marketing slides. nist.gov
What’s different about AI procurement?
- Quality varies with data and prompts. You must see task‑level results on your content and cases, not just a generic demo.
- Costs are usage‑driven and can spike. You need guardrails before pilots, not after. See our cost sections below.
- Supply chain risk is real. Many AI tools chain together third‑party models, vector databases and plugins. UK guidance recommends structured supplier assurance, with clear questions and ongoing checks. security.gov.uk
- Deployment pattern matters. SaaS versus private deployment drives data location, access, logging and exit options. The NCSC’s cloud security principles are a helpful lens when probing vendors’ answers. cloud.service.gov.uk
The 1‑hour triage: five quick filters
Use this to decide which vendors deserve a slot in your 10‑day bake‑off.
- Problem fit: Can the vendor restate your use case in plain English and suggest a measurable outcome (e.g. “cut triage time by 30% in Customer Care”)?
- Deployment pattern: SaaS, private SaaS, or your cloud? Ask for a one‑page data flow showing where data is stored, processed and logged. Cross‑check answers against the NCSC cloud security principles you care about most (data protection, separation between users, audit information, and secure service administration). cloud.service.gov.uk
- Security posture: the minimum bar is Cyber Essentials (ideally Plus) or ISO 27001; bonus points for an AI‑specific management approach aligned to ISO/IEC 42001. gov.uk
- Commercial clarity: Is pricing transparent for both platform fees and usage (tokens, requests, storage, seats)? Do they publish rate cards and throttling/overflow policies?
- Switching and exit: Can you export your data, prompts and evaluation sets in human‑readable formats? Do they document an exit plan with timelines and support?
Your 10‑day bake‑off plan
Keep it short, scored and real. Invite 2–4 vendors. Give them the same sample data, tasks and targets. Share your scoring sheet in advance.
Days 1–2: scope and pack
- Define two business tasks you’ll test (e.g. draft a response to a customer email; summarise a 20‑page case file).
- Prepare a small, legally shareable dataset (50–200 items) with ground truth answers.
- Set success thresholds (e.g. quality ≥ 85/100; time saved ≥ 30%).
Days 3–4: assurance kick‑off
- Send vendors your Due Diligence Pack: questions below, your data sample, test tasks, and the scoring matrix.
- Ask for five proofs (listed later) before their demo slot.
Days 5–7: show, don’t tell
- Run vendor demos on your tasks and data, not canned decks.
- Capture scores in your matrix while stakeholders watch. Record timings and failure modes.
Days 8–9: hands‑on evaluation
- Give shortlisted vendors 48 hours of sandbox access with usage caps and dummy data.
- Measure quality, speed, human effort saved and error types using your ground truth.
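To make “measure quality and time saved” concrete, here is a minimal Python sketch of the Day 8–9 scoring. The labels, timings and simple exact‑match check are illustrative assumptions; substitute your own rubric (for example a 0–100 quality score per item) and the thresholds you agreed on Days 1–2.

```python
# Minimal evaluation sketch: score vendor outputs against ground truth
# and check the thresholds agreed on Days 1-2. Records are illustrative;
# swap in your own rubric and data.

items = [
    # (ground_truth_label, vendor_label, minutes_manual, minutes_with_ai)
    ("refund", "refund", 12, 4),
    ("complaint", "complaint", 15, 6),
    ("refund", "complaint", 12, 5),  # a miss: counts against quality
]

correct = sum(1 for truth, pred, _, _ in items if truth == pred)
quality = 100 * correct / len(items)

manual_minutes = sum(m for _, _, m, _ in items)
ai_minutes = sum(a for _, _, _, a in items)
time_saved = 100 * (manual_minutes - ai_minutes) / manual_minutes

print(f"Quality: {quality:.0f}/100 (target >= 85)")
print(f"Time saved: {time_saved:.0f}% (target >= 30%)")
print("PASS" if quality >= 85 and time_saved >= 30 else "FAIL")
```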
Day 10: decision and next steps
- Agree a ranked shortlist, redlines, and a pilot plan with KPIs and kill‑switches (see below).
- Set contract heads of terms and an exit test you’ll run before go‑live.
Government security guidance highlights the importance of ongoing contract management and KPIs for higher‑risk suppliers; apply the same rigour to AI vendors throughout the contract, not just at initial selection. security.gov.uk
20 questions to ask every AI vendor
Group your asks and score each answer 0–5; anything below 3 requires a mitigation or a “no”. (A minimal weighted‑scoring sketch follows the questions.)
Data, privacy and access
- Where is customer data stored and processed? Can we choose UK or EU regions?
- Do you use our data to train your models by default? How can we opt out?
- How long do you retain logs and prompts? Can we set retention to 30 days or less?
- Is role‑based access enforced with SSO and MFA? Who at your company can access our data for support?
Security and operations
- Which baseline do you meet: Cyber Essentials (ideally Plus) or ISO 27001? Do you map to the NCSC cloud principles? gov.uk
- Do you conduct annual penetration tests and share an executive summary? How are findings tracked to closure?
- What’s your incident response SLA and our notification window? Do you run joint exercises with customers?
- Which sub‑processors do you use and how do you assure them? Do you flow down our contractual requirements? UK guidance encourages supplier assurance and flow‑down. security.gov.uk
Model and quality
- Which models are used (foundation and fine‑tuned)? Can we bring our own model?
- Show evaluation results on our tasks: accuracy, factuality, redaction success, harmful output rate.
- How do you mitigate prompt injection and data leakage in retrieval‑augmented generation?
- How do you version prompts, evaluation sets and outputs over time?
Cost and limits
- What are the unit drivers (requests, tokens, storage, vector reads, seats)? Provide a worked example for our workload (an illustrative back‑of‑envelope sketch follows this list).
- What controls prevent runaway spend (rate limits, concurrency caps, per‑project budgets)?
- When do you throttle or degrade responses? What happens at quota limits?
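Whatever worked example the vendor provides, it pays to run your own back‑of‑envelope arithmetic. This sketch shows the shape of the calculation; every rate and volume in it is an assumption for illustration, not any vendor’s actual pricing.

```python
# Back-of-envelope monthly cost sketch. All figures are illustrative
# assumptions; substitute the vendor's published rate card.

requests_per_month = 5_000      # e.g. customer emails triaged
tokens_per_request = 3_000      # prompt + response, averaged
price_per_1k_tokens = 0.002     # GBP, assumed rate

usage_cost = requests_per_month * tokens_per_request / 1_000 * price_per_1k_tokens
platform_fee = 200.0            # assumed flat monthly fee, GBP
total = usage_cost + platform_fee

print(f"Usage £{usage_cost:.2f} + platform £{platform_fee:.2f} = £{total:.2f}/month")
print(f"Cost per request: £{total / requests_per_month:.4f}")
```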
Support, continuity and exit
- Do you offer uptime and response SLAs? What credits apply?
- Can we export all content, prompts, evaluations and metadata in open formats on request?
- What’s your business continuity plan and RPO/RTO targets? Have you tested them?
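To turn the 0–5 scores into a comparable number per vendor, a simple weighted average works. The group names, weights and scores below are illustrative assumptions; use the weightings from your own scoring matrix.

```python
# Weighted 0-5 vendor scoring sketch. Groups, weights and scores are
# illustrative; use the weightings from your Due Diligence Pack.

weights = {"data_privacy": 0.3, "security_ops": 0.3,
           "model_quality": 0.25, "cost_limits": 0.15}

vendor_scores = {
    "data_privacy":  [5, 4, 3, 4],
    "security_ops":  [4, 4, 2, 5],   # the 2 triggers a mitigation or a "no"
    "model_quality": [3, 4, 4, 3],
    "cost_limits":   [4, 3, 5],
}

flags = [(g, s) for g, scores in vendor_scores.items() for s in scores if s < 3]
weighted_total = sum(
    weights[g] * (sum(scores) / len(scores)) for g, scores in vendor_scores.items()
)

print(f"Weighted score: {weighted_total:.2f} / 5")
for group, score in flags:
    print(f"Mitigation needed: {group} scored {score}")
```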
The five proofs to demand before you shortlist
- Security certificate or attestation: Cyber Essentials (Plus) and/or ISO 27001. For AI‑heavy vendors, ask how they align with ISO/IEC 42001; you’re checking that AI risks are managed systematically. gov.uk
- Pen test summary: An independent test from the last 12 months, with risk‑rated findings and closure dates.
- Data flow diagram: A one‑pager showing where data is stored, processed and logged, mapped to the NCSC cloud principles most relevant to you. cloud.service.gov.uk
- Model evaluation note: A short report explaining the models used and evaluation results on tasks similar to yours, tied back to the NIST AI RMF or its Generative AI profile. nist.gov
- Support & continuity evidence: Incident response procedure, status page link, RPO/RTO targets, and last exercise date. UK supply‑chain guidance stresses proactive reporting and regular testing through the contract lifecycle. security.gov.uk
Cost governance that scales
AI bills grow with usage. Put guardrails in before pilots, then scale with confidence.
Unit metrics to track from day one
- Cost per resolved case or cost per drafted document.
- Average tokens per task and per user session.
- Cache or retrieval hit‑rate (for RAG), which reduces model calls.
- Human time saved per task (minutes) and deflection rate (%).
Guardrails
- Per‑project monthly budget caps and hard kill‑switches.
- Rate limits per user and per workflow to prevent spikes.
- Feature flags to enable/disable expensive options for small cohorts first.
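To illustrate what a hard cap and kill‑switch mean in practice, here is a toy budget guard. The class, cap and warning threshold are assumptions for illustration; in production the controls should live in the vendor platform or your cloud billing alerts, with application‑level checks as a backstop.

```python
# Toy per-project budget guard illustrating a hard cap and kill-switch.
# In production, prefer platform-level caps and billing alerts; this
# sketch only shows the control logic.

class BudgetGuard:
    def __init__(self, monthly_cap_gbp: float, warn_at: float = 0.8):
        self.cap = monthly_cap_gbp
        self.warn_at = warn_at
        self.spent = 0.0
        self.killed = False

    def record(self, cost_gbp: float) -> None:
        if self.killed:
            raise RuntimeError("Budget cap reached: AI feature disabled")
        self.spent += cost_gbp
        if self.spent >= self.cap:
            self.killed = True          # hard kill-switch
        elif self.spent >= self.cap * self.warn_at:
            print(f"Warning: {self.spent / self.cap:.0%} of budget used")

guard = BudgetGuard(monthly_cap_gbp=250.0)
guard.record(180.0)   # 72% of cap: no warning yet
guard.record(30.0)    # 84%: prints a warning
```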
Pilot KPIs and a go‑live gate
Before a pilot, agree KPIs and a single‑page “go/no‑go” test (a minimal threshold check is sketched after the list below). Track:
- Quality: accuracy or agreement with ground truth, harmful output rate, refusal rate.
- Speed: end‑to‑end time and human time saved.
- Adoption: % of eligible tasks using the AI feature; CSAT from users.
- Cost: cost per task, total spend vs cap.
- Operations: incidents, mean time to detect/respond, SLO breaches.
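The “go/no‑go” test can be checked mechanically. The metric names and thresholds in this sketch are illustrative assumptions; use the KPIs you agreed before the pilot.

```python
# Go/no-go gate sketch: every threshold must pass before go-live.
# Metric names and values are illustrative; use your agreed KPIs.

thresholds = {
    "quality_score":     (85.0, ">="),  # agreement with ground truth
    "time_saved_pct":    (30.0, ">="),
    "adoption_pct":      (50.0, ">="),
    "cost_per_task_gbp": (0.10, "<="),
    "slo_breaches":      (0,    "<="),
}

pilot_results = {
    "quality_score": 88.0,
    "time_saved_pct": 34.0,
    "adoption_pct": 61.0,
    "cost_per_task_gbp": 0.07,
    "slo_breaches": 0,
}

def passes(value, target, op):
    return value >= target if op == ">=" else value <= target

checks = {m: passes(pilot_results[m], t, op) for m, (t, op) in thresholds.items()}
for metric, ok in checks.items():
    print(f"{metric}: {'PASS' if ok else 'FAIL'}")
print("Decision:", "GO" if all(checks.values()) else "NO-GO")
```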
At the gate, require: KPI thresholds met, incident drill completed, exit test passed (full export works), and a named risk owner signed off. For a template, see The Go‑Live Gate for AI.
If your pilots touch essential services or sensitive operations, consider aligning your checks with the NIST AI RMF and your existing cyber framework to keep language consistent across the organisation. nist.gov
Contract essentials in plain English
Keep contracts proportionate and focused on outcomes. Build in these clauses:
- Data handling: processing locations; training opt‑out; retention and deletion timelines; sub‑processor list and flow‑down; audit rights for serious incidents. UK supplier‑assurance guidance encourages clarity on subcontractors and audit rights. security.gov.uk
- Security warranties: maintain baseline controls (e.g. Cyber Essentials/ISO 27001); notify incidents within X hours; annual pen test with exec summary; timely patching of critical vulnerabilities. gov.uk
- Service levels: uptime/response SLAs, fair service credits, and an RPO/RTO that matches your risk appetite.
- Cost controls: hard caps, rate limits, and price‑change notice periods; your right to disable costly features.
- Exit plan: assisted export in open formats; secure deletion certification; handover support for 30–60 days.
For organisations new to supplier risk management, NPSA’s supply‑chain guidance is a helpful companion, especially on stress‑testing and incident coordination with suppliers. npsa.gov.uk
Your Due Diligence Pack (copy‑paste this list)
Send the same pack to each vendor to keep the process fair and fast:
- Two tasks and a small, shareable dataset with ground truth answers.
- Scoring matrix (quality, speed, cost, security, usability) with weightings.
- The 20 questions and five proofs from this article.
- Usage caps and sandbox dates for the hands‑on test.
- Heads of terms for key clauses (data, security, SLAs, exit, price caps).
- Timeline: demo window and Day‑10 decision.
Where this fits with your wider AI programme
Vendor selection is one piece of the puzzle. Combine it with a focused buyer’s week and a clear UX brief. If you’re building a shortlist now, pair this article with our AI Copilot Buyer’s Guide for a fast, confident decision.
References and useful lenses
- NIST AI Risk Management Framework and Generative AI Profile (for structuring evaluation and risk language). nist.gov
- ISO/IEC 42001 overview (for AI management systems and assurance expectations). bsigroup.com
- UK Government Security guidance on supply‑chain assurance, KPIs and subcontractors. security.gov.uk
- Cyber Essentials scheme (a practical baseline many SMEs already use). gov.uk
- NCSC cloud security principles (to probe SaaS security answers). cloud.service.gov.uk
- NPSA supply‑chain guidance (stress‑testing supplier arrangements). npsa.gov.uk