The 2026 AI Vendor Scorecard for UK SMEs: Compare Platforms in One Afternoon

If you’ve ever sat through three glossy demos and still couldn’t tell which AI platform is the best fit, this one-page scorecard is for you. It’s designed for non‑technical leaders to compare options quickly and defensibly on the things that actually matter: outcomes, cost, reliability, data custody, safety, and exit terms. Use it to shortlist vendors for a bake‑off this month, then negotiate from a position of clarity.

This article includes a ready-to-run scoring method, procurement questions, and red‑flag checks. Where helpful, we reference independent guidance such as the UK National Cyber Security Centre’s secure AI development principles, the NIST AI Risk Management Framework, the World Economic Forum’s AI procurement guidelines, and current cloud data transfer policies that affect exit costs.

Your one-page AI vendor scorecard (10 criteria, 10 points each)

Score each vendor from 0–10 per criterion. Weight as shown, or adjust to your priorities. A quick rule: only one vendor per row may score 9 or 10—force trade‑offs.

Criterion | Weight | What “good” looks like | Quick red flags
Business outcomes & fit | 20% | Clear, quantified KPIs tied to your use cases (e.g., 20% email deflection, 30% faster case handling) with a plan to measure. | Feature tour without a measurement plan; vague “productivity uplift”.
Total cost of ownership | 15% | Transparent pricing by seat and usage; clear storage, vector, fine‑tuning, and data transfer charges; cost guardrails in the product. | “All‑inclusive” bundles that hide usage tiers; no way to cap spend.
Reliability & support | 15% | Published SLAs and customer‑facing SLOs; real‑time status; rollback playbook; UK business‑hours support plus out‑of‑hours incident route. | No uptime track record; “best efforts” support only.
Data custody & residency | 10% | UK/EU hosting option; clear data‑processing terms; no training on your business data by default; documented data deletion. | Ambiguous data usage; unclear where data is stored or processed.
Security posture | 10% | Aligned to NCSC secure AI guidance; OWASP LLM Top‑10 aware; independent assurance (ISO 27001 or SOC 2; working towards or certified against ISO/IEC 42001 where proportionate). | No security lead; no third‑party assurance; default‑open permissions.
Safety & evaluation | 10% | Built‑in safety filters; bias/abuse testing; traceability of sources; ability to run your own evaluations on representative content. | “Trust us, it’s safe”; no way to test on your data before signing.
Integration & workflow fit | 8% | Connectors for M365/SharePoint, Google Drive, Slack, Salesforce, Zendesk, etc.; audit logs; single sign‑on. | Manual CSV exports; no SSO or audit trail.
Change control | 5% | Version pinning; canaries; rollbacks; release notes that show impact. | Breaking changes pushed silently; no rollback path.
Exit & portability | 5% | Export of prompts, fine‑tunes, embeddings, and knowledge bases; documented data egress process and costs; 30–60 day assisted handover. | “We can export PDFs”; punitive egress or proprietary formats only.
References & momentum | 2% | Relevant UK references; an active roadmap aligned to your needs. | All reference calls outside your sector; roadmap locked behind NDA.

Add up each vendor’s weighted total—then take the top two into a short, time‑boxed trial.
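
If you want to sanity‑check the arithmetic, here is a minimal scoring sketch. The weights mirror the table above; the vendor names and 0–10 scores are placeholders for your own review panel’s marks.

```python
# Minimal weighted-scorecard sketch. Weights mirror the table above; the
# vendor scores are placeholders for your review panel's 0-10 marks.
WEIGHTS = [
    ("Business outcomes & fit", 0.20),
    ("Total cost of ownership", 0.15),
    ("Reliability & support", 0.15),
    ("Data custody & residency", 0.10),
    ("Security posture", 0.10),
    ("Safety & evaluation", 0.10),
    ("Integration & workflow fit", 0.08),
    ("Change control", 0.05),
    ("Exit & portability", 0.05),
    ("References & momentum", 0.02),
]

# Scores listed in the same order as WEIGHTS (hypothetical numbers).
vendor_scores = {
    "Vendor A": [8, 6, 7, 9, 7, 6, 8, 5, 6, 7],
    "Vendor B": [7, 8, 6, 6, 8, 7, 5, 7, 8, 6],
}

for vendor, marks in vendor_scores.items():
    total = sum(weight * mark for (_, weight), mark in zip(WEIGHTS, marks))
    print(f"{vendor}: weighted total {total:.1f} / 10")
```

A spreadsheet does the same job; the point is that the weights and scores are explicit, auditable, and agreed before the demos start.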

What to ask vendors (and why it matters)

1) Data usage and residency

  • “Do you train any models on our prompts, outputs, or files by default? If not, where is this written?” Enterprise positions vary. For example, OpenAI states it does not train on business data by default for enterprise and API products, with details in its Enterprise data‑usage commitments. Microsoft provides similar commitments for Azure OpenAI and Copilot services, outlined in its enterprise privacy statement.
  • “Can you guarantee UK or EU processing for both storage and runtime?” Ask for the specific Azure/AWS/GCP regions used and where logs and backups live.
  • “What’s the deletion path and timeline for user content, logs and embeddings?” You’ll want this in writing.

2) Security and assurance

  • “Which independent assurance can you share: ISO 27001, SOC 2, or progress towards ISO/IEC 42001 where proportionate?” Ask to see the current certificate or report, not a marketing summary.
  • “How does your development align with the NCSC’s secure AI development guidelines and the OWASP LLM Top‑10?” Look for named mitigations and a named security lead, not a policy statement.
  • “What permissions does a new workspace get by default, and who is accountable for security?” No security lead, no third‑party assurance, and default‑open permissions are the red flags on the scorecard above.

3) Reliability and service levels

  • “What SLA do you offer and how do you measure it?” Ask for user‑centred SLOs as well, not just backend uptime. Google’s SRE materials explain how SLIs/SLOs help align reliability to user outcomes—useful when agreeing measures of success (overview).
  • “What happens if you change models under the hood?” You need version pinning, canaries and rollbacks so you can ship changes safely.
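
To make “version pinning, canaries and rollbacks” concrete, here is an illustrative change‑control policy you could ask a vendor to commit to in writing. The field names are hypothetical, not any supplier’s actual API; the point is what you should be able to pin down contractually.

```python
# Illustrative change-control policy for a vendor-hosted model. Field names
# are hypothetical; the point is what you should be able to agree in writing.
release_policy = {
    "pinned_model": "provider-model-2026-01-15",   # an exact version, never "latest"
    "canary": {
        "traffic_share": 0.05,                     # 5% of queries try the new version first
        "min_observation_days": 5,
        "promote_only_if": {
            "accuracy_drop_max": 0.02,             # vs your curated test set
            "p95_latency_seconds_max": 3.0,
        },
    },
    "rollback": {
        "trigger": "any canary guardrail breached",
        "target": "previous pinned_model",
        "max_minutes_to_complete": 30,
    },
    "release_notes_required": True,                # impact summary before promotion
}
```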

4) Costs you can actually control

  • “How are we billed?” You’ll typically see a hybrid of seats and usage (tokens/calls), plus storage for knowledge bases and vector databases.
  • “What levers keep costs predictable?” Look for quotas, per‑team budgets, and alerts; see our 90‑day cost guardrail playbook.
  • “What are exit costs?” Cloud providers now publicise data egress policies. Azure documents bandwidth and egress pricing and credits on its pricing page. AWS announced in 2024 it would remove network data transfer fees for customers moving data to other cloud providers, which reduces lock‑in at exit—useful leverage in negotiations (Reuters coverage).
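
A back‑of‑envelope model helps you compare quotes on the same basis. Every figure below is a placeholder; substitute each vendor’s actual seat, usage, storage and egress prices and your own volume estimates.

```python
# Back-of-envelope monthly cost sketch. All prices and volumes are
# placeholders -- swap in the figures from each vendor's quote.
seats = {"admin": 3, "editor": 10, "viewer": 40}
seat_price_gbp = {"admin": 60.0, "editor": 35.0, "viewer": 8.0}   # per seat per month

monthly_queries = 25_000
avg_tokens_per_query = 2_500            # prompt + response combined
price_per_1k_tokens_gbp = 0.004         # hypothetical blended rate

knowledge_base_gb = 40
storage_price_per_gb_gbp = 0.20

exit_egress_gb = 60
egress_price_per_gb_gbp = 0.07          # check your cloud provider's live pricing page

seat_cost = sum(seats[role] * seat_price_gbp[role] for role in seats)
usage_cost = monthly_queries * avg_tokens_per_query / 1_000 * price_per_1k_tokens_gbp
storage_cost = knowledge_base_gb * storage_price_per_gb_gbp

print(f"Seats:   £{seat_cost:,.0f} per month")
print(f"Usage:   £{usage_cost:,.0f} per month")
print(f"Storage: £{storage_cost:,.0f} per month")
print(f"Worst-case exit egress (one-off): £{exit_egress_gb * egress_price_per_gb_gbp:,.0f}")
```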

How to run a fair, short vendor comparison (10 working days)

  1. Define the three outcomes you’re buying: for example, “reduce inbound email by 20%”, “cut time‑to‑first‑answer from 11 to 5 minutes”, “raise self‑serve satisfaction to 4.4/5”.
  2. Assemble a 30‑document pack representative of your real world (policies, FAQs, tricky edge cases), and a 100‑question test set covering common, rare and high‑risk queries.
  3. Run “blind” evaluations using the same content and questions across vendors. This can be as simple as measuring accuracy and helpfulness with a two‑person review panel (a minimal tally sketch follows this list). The NIST AI Risk Management Framework and its 2024 GenAI profile offer practical dimensions you can adapt.
  4. Score using the one‑page matrix above. Keep notes on any “manual magic” (e.g., a vendor quietly tweaks prompts mid‑trial).
  5. Invite the top two to a 2‑week bake‑off with a fixed scope, success criteria and a capped support allowance. For a ready‑made structure, see our two‑week vendor bake‑off.
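
For step 3, here is a minimal tally sketch. It assumes your two reviewers mark each test question 0–5 for accuracy and helpfulness, with vendors hidden behind codes until scoring is done; the file name and column names are illustrative.

```python
# Minimal blind-evaluation tally (step 3 above). Assumes a CSV with columns:
# vendor_code, question_id, reviewer, accuracy, helpfulness (marks 0-5).
import csv
from collections import defaultdict
from statistics import mean

with open("blind_reviews.csv", encoding="utf-8", newline="") as f:   # illustrative file name
    rows = list(csv.DictReader(f))

by_vendor = defaultdict(lambda: {"accuracy": [], "helpfulness": []})
for row in rows:
    by_vendor[row["vendor_code"]]["accuracy"].append(float(row["accuracy"]))
    by_vendor[row["vendor_code"]]["helpfulness"].append(float(row["helpfulness"]))

for code, marks in sorted(by_vendor.items()):
    print(
        f"{code}: accuracy {mean(marks['accuracy']):.2f}/5, "
        f"helpfulness {mean(marks['helpfulness']):.2f}/5 "
        f"({len(marks['accuracy'])} marks)"
    )
```

Only reveal which code belongs to which vendor after the totals are agreed; it keeps the panel honest and the results defensible.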

Costs and traps to surface early

Cost line item | Why it matters | Ask vendors | Watch‑outs
Seats and roles | Human access drives recurring spend. | “Viewer/Editor/Admin pricing? Community or frontline licences?” | All‑or‑nothing seat pricing; no light‑use role.
Usage (tokens/calls) | Spiky volumes can blow budgets. | “Throttle per team? Caps and alerts?” | No hard limits; no per‑dept budgeting.
Knowledge base storage | Documents, embeddings, and history can grow fast. | “Per‑GB and per‑embedding pricing?” | Bundled tiers that jump sharply.
Data transfer/egress | Matters at exit and for analytics pipelines. | “What are egress fees today? Any exit credits?” | Opaque or “contact sales”; no written policy.
Support | Who answers when things break? | “UK hours, severity SLAs, and escalation path?” | Email‑only; no on‑call for P1 incidents.
Customisation | Fine‑tunes and private connectors can be pricey. | “Fixed‑price packages? Who owns the artefacts?” | Work‑for‑hire without IP clarity.

Cross‑check any egress or inter‑region transfer assumptions against your cloud provider’s live pages (e.g., Azure bandwidth pricing) and supplier announcements (e.g., AWS’ 2024 commitment to waive network fees for switching).

Safety and misuse: minimum viable checks

  • Prompt‑injection resilience. Vendors should demonstrate practical mitigations for OWASP LLM Top‑10 risks such as prompt injection (LLM01) and insecure output handling, not just prompts that “say no”.
  • Content provenance and source citations. Can the tool show where an answer came from? This reduces hallucinations and speeds reviews.
  • Human‑in‑the‑loop for risky actions. Anything that triggers payments, sends emails at scale, or changes records should require explicit human approval.
  • Evaluation harness. Can you run your own weekly tests on fresh content and see trends? This is where the NIST AI RMF language helps: treat it like quality management for AI.
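
If the vendor can’t offer an evaluation harness, a homegrown one can start very small: re‑run the same curated question set each week and watch the trend. A minimal sketch, with hypothetical weekly figures:

```python
# Minimal weekly-trend sketch for an evaluation harness. Assumes you re-run
# the same curated question set each week and log one accuracy figure per run
# (the numbers here are hypothetical).
weekly_accuracy = {
    "2026-W06": 0.84,
    "2026-W07": 0.86,
    "2026-W08": 0.81,   # fresh content added this week
    "2026-W09": 0.83,
}

weeks = sorted(weekly_accuracy)
for previous, current in zip(weeks, weeks[1:]):
    change = weekly_accuracy[current] - weekly_accuracy[previous]
    flag = "  <-- investigate with the vendor" if change < -0.03 else ""
    print(f"{current}: {weekly_accuracy[current]:.0%} ({change:+.0%} vs {previous}){flag}")
```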

Model choice without the hype

You don’t need the “best” model—you need the best model for your tasks at your price point. Ask vendors:

  • “Can we pin the model version and switch between ‘good/fast/cheap’ tiers per workflow?”
  • “What is your fall‑back plan if an upstream model is degraded?”
  • “Can we bring our own model later?” Even if you won’t, the option disciplines pricing.

Agree user‑centred SLOs for key flows (e.g., “answer accuracy ≥80% on our curated set; 95th‑percentile response time ≤3s; weekly regression ≤5%”). The SRE SLO approach is a helpful shared language for both business and supplier.
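
Here is a minimal sketch of that SLO check, using the example targets above. The measured values are placeholders you would pull from your weekly evaluation run and the vendor’s latency reporting.

```python
# Sketch of an SLO gate using the example targets from the paragraph above.
# Measured values are placeholders from your own monitoring.
slos = {
    "answer_accuracy_min": 0.80,          # on your curated question set
    "p95_latency_seconds_max": 3.0,
    "weekly_regression_max": 0.05,        # allowed week-on-week drop
}

measured = {
    "answer_accuracy": 0.83,
    "p95_latency_seconds": 2.4,
    "weekly_regression": 0.02,
}

checks = {
    "Answer accuracy": measured["answer_accuracy"] >= slos["answer_accuracy_min"],
    "p95 latency": measured["p95_latency_seconds"] <= slos["p95_latency_seconds_max"],
    "Weekly regression": measured["weekly_regression"] <= slos["weekly_regression_max"],
}

for name, within in checks.items():
    print(f"{name}: {'within SLO' if within else 'breach - raise with the vendor'}")
```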

Make the contract work for you

Keep legal lean but purposeful. At minimum: data usage and deletion; residency; security commitments; change control; evaluation rights; exit and assistance. If you need a deeper starting point, our AI contract addendum lays out focused clauses to prevent surprises.

For charities or suppliers working adjacent to the public sector, the World Economic Forum’s AI Procurement in a Box offers a simple, principle‑based checklist you can adapt without getting bogged down in regulation.

KPIs you can evidence in month one

  • Service KPIs: first‑response time, resolution time, deflection rate, satisfaction score, answer accuracy on a curated set.
  • Reliability KPIs: availability, 95th‑percentile latency, weekly regression against last month’s test set.
  • Cost KPIs: cost per answer, cost per resolved case, spend vs budget by team, egress spend.
  • Safety KPIs: flagged content rate, prompt‑injection detection rate, false‑positive moderation rate.

Agree baselines and publish a simple monthly dashboard. If the vendor can’t instrument these, that’s an early signal.
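
To show how simple the formulas are, here is a sketch of three of the KPIs above. All inputs are placeholders pulled from your helpdesk and billing exports.

```python
# Month-one KPI sketch. All inputs are placeholders from your helpdesk and
# billing exports; the formulas are the point.
inbound_emails = 4_200
deflected_by_self_serve = 900           # resolved without creating a ticket
ai_answers_served = 3_100
resolved_cases_with_ai = 780
monthly_ai_spend_gbp = 1_450.0

deflection_rate = deflected_by_self_serve / inbound_emails
cost_per_answer = monthly_ai_spend_gbp / ai_answers_served
cost_per_resolved_case = monthly_ai_spend_gbp / resolved_cases_with_ai

print(f"Deflection rate:        {deflection_rate:.1%}")
print(f"Cost per answer:        £{cost_per_answer:.2f}")
print(f"Cost per resolved case: £{cost_per_resolved_case:.2f}")
```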

Red flags (walk away when you see these)

  • “We can’t share the model version or change log.”
  • “We use your data to improve our models by default; opt‑out is manual per user.”
  • “Export is screenshots or PDFs only.”
  • No acknowledgement of LLM‑specific risks like prompt injection or output handling issues.
  • No UK/EU region option or unclear sub‑processor list.

If in doubt, rerun a small, week‑long reliability sprint on your content to verify claims. We show how to do that safely in this 7‑day reliability sprint.

Where this leaves you

Procurement doesn’t need to be an epic. A one‑page scorecard, a curated content pack, and two weeks of structured bake‑off will tell you more than hours of sales theatre. Use cloud egress policies and portability commitments to keep power balanced; use NCSC and NIST guidance as a neutral anchor; and insist on user‑centred SLOs so everyone knows what “good” means in production. When you’re ready to operationalise, our change‑safety playbook will help you keep improvements shipping without surprises.

References you can cite to your board