“AI copilot” has become the label on everything from smart search to ticket triage and sales enablement. The risk for UK SMEs and charities is buying a glossy demo that turns into shelfware or spiralling costs. This guide gives non‑technical leaders a structured, one‑week process to move from noise to a credible shortlist — with a clear demo script, pricing checks, and evidence requests that vendors can’t dodge.
We draw on practical assurance guidance from the UK’s Centre for Data Ethics and Innovation (CDEI) and NIST’s AI Risk Management Framework (AI RMF) to focus on evidence rather than promises, and on operational fit rather than generic feature lists. See the CDEI’s roadmap for building an AI assurance ecosystem, and the AI RMF’s four functions (govern, map, measure, manage) for a common‑sense lens on AI risk and quality.
Your 1‑week plan (Mon–Sun)
Day 1 — Define value and red lines
- Pick one job to be done (e.g. reduce support backlog; qualify inbound leads; speed up casework drafting).
- Quantify today’s baseline and desired outcome (e.g. 20% email deflection; 2 hours saved per case).
- Set non‑negotiables: data residency, SSO, audit logs, human override, no training on your data by default. Example vendor commitments worth requesting in writing: OpenAI Enterprise privacy; Microsoft on enterprise AI data protection.
Day 2 — Build a long‑list, then a shortlist
- Start with 8–10 options (product vendors and platforms). Trim to 4 on fit, mandatory controls, and pricing transparency.
- Ask each vendor for two documents up front: security assurance evidence (an ISO/IEC 27001 certificate and a SOC 2 Type II report) and a 2‑page architecture and data‑flow overview.
Day 3 — Script the demo and send a data sample pack
- Provide 10–20 de‑identified examples that reflect reality (messy spellings, attachments, or complex intents).
- Define success criteria you can measure live (first‑pass accuracy, handoff quality, latency, explainability).
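If you want the Day 4–5 scoring to be consistent, prepare the grading logic before the demos. Below is a minimal sketch in Python, assuming your sample pack is a list of input/expected pairs and that you can capture each vendor’s answer and response time by hand or via an API during the session; `get_answer` and all field names are illustrative placeholders, not any vendor’s real interface.

```python
# Minimal demo-scoring sketch. All names are illustrative placeholders:
# wire `get_answer` up to however each vendor exposes results in the demo.
from statistics import mean, median

sample_pack = [
    {"input": "Re: invoice querry for order #1182 (attachment)", "expected": "billing"},
    {"input": "Cant log in after password reset on mobile", "expected": "account_access"},
    # ... 10-20 de-identified, realistically messy examples
]

def score_vendor(get_answer):
    """get_answer(text) -> (label, latency_seconds) for one vendor."""
    hits, latencies = [], []
    for case in sample_pack:
        label, latency = get_answer(case["input"])
        hits.append(label == case["expected"])  # first pass only: no retries, no coaching
        latencies.append(latency)
    return {"first_pass_accuracy": mean(hits), "median_latency_s": median(latencies)}
```

Running the same function against every vendor keeps the Day 4–5 scores directly comparable.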
Day 4–5 — Demos you control, not vendor theatre
- Run the same script with all vendors. Include adversarial prompts to smoke‑test prompt injection and data exfiltration resilience — areas highlighted by UK government and NCSC‑linked guidance. See the AI Cyber Security Code of Practice.
- Score immediately with a 10‑point grid (below) and capture evidence links/screens.
Day 6 — Pricing sanity checks
- Translate pricing to “cost per successful outcome” (ticket deflected, call summarised, case drafted) rather than per seat or per 1,000 tokens; a worked sketch follows this list.
- Pressure test peak scenarios and rate limits; model a worst case and a cap. If helpful, use the board‑ready templates in The AI unit economics board pack.
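To make “cost per successful outcome” concrete, here is a worked sketch; every figure is hypothetical and should be replaced with the vendor’s actual quote and your own volumes.

```python
# Hypothetical figures for illustration only - substitute the vendor's real quote.
platform_fee = 500.00           # GBP per month, fixed
usage_fee_per_1k_tokens = 0.80  # GBP
tokens_per_interaction = 2_500
interactions_per_month = 4_000
success_rate = 0.70             # share of interactions that actually deflect a ticket

monthly_cost = platform_fee + (
    usage_fee_per_1k_tokens * tokens_per_interaction / 1_000
) * interactions_per_month
successful_outcomes = interactions_per_month * success_rate

print(f"Monthly cost: £{monthly_cost:,.2f}")                                    # £8,500.00
print(f"Cost per ticket deflected: £{monthly_cost / successful_outcomes:.2f}")  # £3.04
```

At these illustrative numbers, a quote of “£0.80 per 1,000 tokens” becomes roughly £3 per deflected ticket, which is the figure worth comparing across vendors.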
Day 7 — Shortlist with confidence
- Pick 2 finalists. Move into a bounded, 30‑day paid pilot with explicit KPIs, exit clauses, and a rollback plan. For contracting tips, see The UK SME buyer’s playbook for AI contracts.
The 10‑point demo scorecard
| Dimension | What good looks like | Score (1–10) |
|---|---|---|
| Task success | ≥80% first‑pass success on your sample pack; clear reasons when unsure. | |
| Safety & controls | Resists prompt injection and data leakage; safe fallbacks documented. Refer to UK Code of Practice. | |
| Auditability | Every action traceable with timestamps, prompts/parameters, model versions and sources. | |
| Data handling | No training on your business data by default, with contract wording; documented retention controls. See OpenAI Enterprise and Microsoft. | |
| Latency | < 3s for typical turns; graceful degradation under load. | |
| Human‑in‑the‑loop | Easy escalation to a person; approvals/logging for sensitive actions. | |
| Admin & SSO | SAML/SSO, role‑based access and per‑feature controls. | |
| Portability | Export of prompts, configs and knowledge bases; clean exit plan. | |
| Unit economics | Transparent pricing; caps and alerts; predictable cost per successful outcome. | |
| Delivery fit | Clear pilot plan, named team, and support hours aligned to UK time. |
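To turn the grid into one comparable number per vendor, a weighted average is usually enough; the weights below are purely illustrative, so set your own to reflect the Day 1 red lines.

```python
# Illustrative weights only - adjust to your Day 1 priorities. Scores are 1-10.
WEIGHTS = {
    "task_success": 2.0, "safety_controls": 2.0, "data_handling": 1.5,
    "auditability": 1.0, "human_in_the_loop": 1.0, "unit_economics": 1.0,
    "delivery_fit": 1.0, "latency": 0.5, "admin_sso": 0.5, "portability": 0.5,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Weighted average of the 10 scorecard dimensions, on the same 1-10 scale."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS) / sum(WEIGHTS.values())

vendor_a = {"task_success": 8, "safety_controls": 7, "data_handling": 9,
            "auditability": 6, "human_in_the_loop": 7, "unit_economics": 6,
            "delivery_fit": 8, "latency": 8, "admin_sso": 9, "portability": 5}
print(f"Vendor A: {weighted_score(vendor_a):.1f}/10")  # -> 7.4/10
```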
Buyer questions vendors should welcome
Value & fit
- Which three outcomes will you improve in our first 30 days, and by how much?
- Show recent ROI from similar UK organisations (sector, size, data constraints).
- What changes in our current process are required to realise value?
Security, privacy & data handling
- Confirm in the contract: no training on our data by default; provide the policy URL and exact clause wording. See examples from OpenAI and Azure OpenAI.
- What certifications and attestations do you hold today? Provide current evidence: an ISO/IEC 27001:2022 certificate and a SOC 2 Type II report. ISO 27001 is the widely‑used benchmark for information security management; UKAS‑accredited bodies like BSI certify against the 2022 revision.
- Where is data processed and stored (regions)? What is the default retention and how do we set it to 0–30 days?
- How do you defend against prompt injection and data exfiltration? Map your controls to the UK AI Cyber Security Code of Practice.
Quality & assurance
- Which metrics do you track (accuracy, refusal rate, escalation rate, hallucination rate)? How often are models and guardrails updated?
- Do you support evaluation on our data prior to contract? How do you log prompts, parameters and model versions for audit?
- Can you provide a short “model card” or equivalent risk summary? If not, share a written description of typical failure modes and mitigations.
Commercials & cost governance
- Break down pricing by fixed fee, variable usage, add‑ons, and services. What happens at peak season? Is there a usage cap with graceful throttling?
- What’s the price per successful outcome and payback at our volumes? See unit‑economics prompts in this board pack.
- What’s included in support, response times, and uptime SLAs? Remedies beyond credits?
Delivery & change
- What’s your 30‑day pilot plan? Who are the named UK‑hours contacts? What’s needed from us?
- How will end‑users be onboarded? What training content, nudge tips, and change comms do you provide?
- How do we configure safety thresholds and approvals for sensitive actions?
Exit & portability
- How do we export prompts, knowledge, and conversation logs in an open format?
- What’s the data deletion process and evidencing? Timeline from request to completion?
- What are your standard off‑boarding services and rates? Any termination fees?
Background reading: the CDEI’s work on the UK AI assurance ecosystem, and the NIST AI RMF for practical risk functions.
What to verify in security paperwork
- ISO/IEC 27001:2022 certificate from a UKAS‑accredited body (e.g. BSI). The 2022 revision modernises controls; many organisations are still transitioning, so check dates. See BSI’s update on 27001:2022 certification and transition timelines.
- SOC 2 Type II report (period of at least 6 months). SOC 2 covers controls across security, availability, processing integrity, confidentiality and privacy — a useful complement to ISO 27001. See AICPA’s overview of SOC 2 and Trust Services Criteria.
- Data privacy stance: explicit statement that your data isn’t used to train foundation models by default for enterprise products (e.g. OpenAI Enterprise; Microsoft enterprise AI).
Useful references: BSI on ISO/IEC 27001:2022; AICPA & CIMA SOC 2 explainer.
Pricing that won’t bite later
- Translate to outcomes: price per ticket deflected, per qualified lead, or per case drafted. Tie bonuses/credits to outcome SLAs, not usage only.
- Cap and alert: monthly hard caps, anomaly alerts, and weekly cost reporting. Insist on admin‑side rate limits and per‑feature toggles.
- Seasonality: confirm peak throughput, queueing, and fair‑use policies. What happens if you exceed rate limits? A worst‑case sketch follows below.
- Services creep: separate platform subscription from change requests and data wrangling. Fixed‑price bundles for pilot.
For a deeper budgeting approach, see Beating AI bill shock.
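A quick way to pressure‑test seasonality and caps is to model the worst month you can plausibly hit against the negotiated hard cap before you sign. All figures below are hypothetical.

```python
# Worst-case month sketch - all figures hypothetical; use your own quote and volumes.
baseline_interactions = 4_000
peak_multiplier = 3            # e.g. a seasonal giving campaign or product launch
cost_per_interaction = 2.00    # GBP, derived from the vendor's usage pricing
monthly_cap = 15_000.00        # negotiated hard cap, GBP

worst_case_cost = baseline_interactions * peak_multiplier * cost_per_interaction
print(f"Worst-case usage cost: £{worst_case_cost:,.2f}")  # £24,000.00 here

if worst_case_cost > monthly_cap:
    affordable = monthly_cap / cost_per_interaction
    throttled = baseline_interactions * peak_multiplier - affordable
    print(f"Cap hit: ~{throttled:,.0f} interactions would be throttled or queued")
```

If the cap bites in a month you can plausibly hit, ask the vendor what “graceful throttling” actually looks like before you sign.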
KPIs for a 30‑day pilot
- Adoption: weekly active users; % of target team using feature 3+ times/week.
- Quality: first‑pass accuracy; escalation rate; user‑rated usefulness; “unsafe output” flags per 1,000 interactions.
- Speed: median and 95th percentile response time (computed in the sketch below).
- Cost: cost per successful outcome; % within budget cap.
- Operations: time‑to‑rollback; mean‑time‑to‑resolution for incidents.
If you need a framework to set thresholds and acceptance criteria, borrow from The AI Quality Scoreboard.
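If the pilot logs one record per interaction, the quality and speed KPIs reduce to a few lines of arithmetic. The field names below are assumptions about your own pilot logging, not any vendor’s schema.

```python
# Pilot KPI sketch. Field names are assumptions about your own pilot logging.
from statistics import mean, median, quantiles

interactions = [
    {"latency_s": 1.8, "first_pass_ok": True, "unsafe_flag": False},
    {"latency_s": 4.2, "first_pass_ok": False, "unsafe_flag": False},
    # ... one record per pilot interaction
]

latencies = [i["latency_s"] for i in interactions]
print(f"Median latency: {median(latencies):.1f}s")
print(f"p95 latency: {quantiles(latencies, n=100)[94]:.1f}s")  # 95th percentile
print(f"First-pass accuracy: {mean(i['first_pass_ok'] for i in interactions):.0%}")

flags_per_1k = 1_000 * sum(i["unsafe_flag"] for i in interactions) / len(interactions)
print(f"Unsafe-output flags per 1,000 interactions: {flags_per_1k:.1f}")
```

Cost per successful outcome then reuses the Day 6 calculation with pilot actuals in place of estimates.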
Red flags (politely walk away)
- Unwilling to run your demo script or to test on your sample data.
- No written commitment that enterprise data is excluded from model training by default.
- Vague or expired certifications; no stated SOC 2 Type II reporting period, or unwillingness to share reports under NDA.
- No audit trail of prompts, parameters, or model versions.
- Opaque pricing, no caps, or “unlimited” promises with tiny fair‑use footnotes.
The simple, evidence‑first shortlist pack
Send this one‑pager to your execs/trustees:
- 1‑line problem and target outcome (e.g. “Reduce email backlog by 25% in 30 days”).
- Scorecard (top two vendors, average score out of 10 with notes).
- Security (certificates, data policy links, regions, retention).
- Commercials (pilot price, cap, success fee, exit terms).
- Risks and mitigations (top 3, owner, timeline).
Then proceed to a bounded pilot with clear exit ramps — avoiding the pitfalls in Nine AI procurement traps UK SMEs can avoid.
Appendix: references you can cite in procurement
- CDEI — Roadmap to an effective AI assurance ecosystem (assurance market and roles).
- NIST AI Risk Management Framework (govern, map, measure, manage functions).
- UK AI Cyber Security Code of Practice (AI‑specific security expectations such as prompt injection and asset inventories).
- AICPA & CIMA — SOC 2 (Trust Services Criteria overview).
- OpenAI Enterprise privacy and Microsoft enterprise AI data protection (example vendor commitments on training and privacy for enterprise products).