Many UK SMEs and charities still handle customer, donor or supplier emails in a shared inbox. It works—until it doesn’t. During peak weeks, response times slip, messages are missed, and your best people spend their day copying and pasting. In this case study, we show how a 90‑person UK services SME shipped AI email triage in 14 days, without adding headcount or ripping out existing tools.
We’ll cover the scope we chose, the steps we took, and the KPIs and guardrails that made the rollout safe and measurable. Where helpful, we point to credible external guidance—for example, the government’s recently published Code of Practice for the Cyber Security of AI, which reinforces practical controls like asset inventories and protecting test data. gov.uk
What “AI email triage” actually means (for non‑technical leaders)
- Classification: New emails are labelled into a small set of categories you agree—e.g. billing, complaints, quotes, appointment changes, job applications.
- Routing: Based on the label and simple rules, emails are routed to the right team, queue or ticketing tool. In Microsoft 365, that often starts with a shared mailbox; in Google Workspace, domain‑level Gmail routing keeps headers intact for audit. support.microsoft.com
- Assistive replies: For common scenarios, the system drafts a response for a human to approve or tweak.
- Lookup and attach: It can fetch relevant snippets (pricing policy, returns window, opening hours) from your knowledge base or website and include them in the draft reply.
- Escalate with context: When the AI isn’t confident, it escalates to a human with a short summary and suggested next steps.
Crucially, we didn’t aim for full automation. The goal was to reduce the cognitive load on staff and improve first reply time and time to resolution, two widely used service metrics. zendesk.co.uk
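To make the workflow above concrete, here is a minimal sketch of the per-email decision in Python. The label set, the keyword stand-in for a real classifier and the 0.8 confidence threshold are illustrative assumptions, not any particular vendor's behaviour.

```python
# Minimal sketch of the per-email triage decision. The label set, the keyword
# stand-in classifier and the 0.8 threshold are illustrative assumptions.

LABELS = ["billing", "complaints", "quotes", "appointment_changes", "job_applications"]
CONFIDENCE_THRESHOLD = 0.8  # below this, escalate with a summary instead of drafting

def classify_email(subject: str, body: str) -> tuple[str, float]:
    """Stand-in classifier. In practice this calls your approved model or service."""
    text = f"{subject} {body}".lower()
    keywords = {
        "billing": ["invoice", "payment", "refund"],
        "quotes": ["quote", "pricing", "estimate"],
        "complaints": ["complaint", "unhappy", "disappointed"],
    }
    for label, words in keywords.items():
        if any(word in text for word in words):
            return label, 0.9
    return "other", 0.3

def triage(subject: str, body: str) -> dict:
    label, confidence = classify_email(subject, body)
    if confidence >= CONFIDENCE_THRESHOLD and label in LABELS:
        # High confidence: route to the right queue and draft a reply for human approval
        return {"action": "route_and_draft", "label": label, "confidence": confidence}
    # Low confidence: no draft - escalate with a short summary and suggested next steps
    return {"action": "escalate_with_summary", "label": label, "confidence": confidence}
```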
Before you start: three leadership decisions
- Define “safe data”. Decide what content is allowed into the AI workflow. UK government guidance is clear: don’t input sensitive or personal data into public tools; if you use AI, use approved, controlled routes. Apply that principle to your set‑up. gov.uk
- Pick a narrow label set (6–10 max). Fewer, clearer labels improve accuracy and usability. You can expand once you’re measuring quality.
- Set measurable success targets. For example: reduce first reply time by 35% on “Quotes” emails; keep misroutes under 3%; reach 75% human‑accepted draft replies by week four. Align targets to your SLAs, not vanity metrics.
The 14‑day rollout plan (that actually sticks)
Days 1–3: Scope, data, baseline
- Choose one inbox (e.g. enquiries@) and 2–3 email categories with the biggest impact—typically billing and quotes.
- Export the last 400–600 emails and hand‑label 150 of them. This seed set powers your first quality measurements.
- Capture your baseline: first reply time, time to resolution, percent of messages needing escalation, and weekend backlog. Use your helpdesk/CRM reports or mailbox analytics. zendesk.co.uk
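If your helpdesk or mailbox tooling can export received, first-reply and resolved timestamps to CSV, a short script is enough to capture the baseline medians. The column names and timestamp format below are assumptions; match them to your own export.

```python
# Sketch: baseline first reply time (FRT) and time to resolution from a CSV export.
# Column names and timestamp format are assumptions - adjust to your own export.
import csv
from datetime import datetime
from statistics import median

def hours_between(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M") -> float:
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600

frt_hours, ttr_hours = [], []
with open("enquiries_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row.get("first_reply_at"):
            frt_hours.append(hours_between(row["received_at"], row["first_reply_at"]))
        if row.get("resolved_at"):
            ttr_hours.append(hours_between(row["received_at"], row["resolved_at"]))

print(f"Median first reply time: {median(frt_hours):.1f} hours over {len(frt_hours)} emails")
print(f"Median time to resolution: {median(ttr_hours):.1f} hours over {len(ttr_hours)} emails")
```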
Days 4–6: Light plumbing and guardrails
- In Microsoft 365, keep the shared mailbox as the anchor; in Google Workspace, configure domain‑level routing so copies are delivered to your triage tool while preserving headers and audit trails. support.microsoft.com
- Apply minimal security hygiene: asset inventory, change control for prompts and label sets, and clear rollback. These echo the UK Code of Practice for AI cyber security. gov.uk
- Decide what cannot be processed (e.g. CVs, medical details). Filter those to manual handling.
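That filter can start as a simple keyword and attachment check that runs before anything reaches the AI and diverts flagged messages to manual handling. The terms below are illustrative; agree the real list with whoever owns data protection.

```python
# Sketch: divert sensitive emails to manual handling before any AI processing.
# The blocked terms and the attachment check are illustrative assumptions.

BLOCKED_TERMS = ["cv attached", "curriculum vitae", "medical", "diagnosis", "national insurance"]

def must_handle_manually(subject: str, body: str, attachment_names: list[str]) -> bool:
    text = f"{subject} {body}".lower()
    if any(term in text for term in BLOCKED_TERMS):
        return True
    # Treat CV-like attachments as out of scope for AI processing
    return any("cv" in name.lower() or "resume" in name.lower() for name in attachment_names)
```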
Days 7–10: Pilot with humans‑in‑the‑loop
- Switch on triage for 25–30% of inbound volume during business hours only; staff can accept, edit, or reject AI drafts.
- Track quality: misroute rate, draft accept rate, “needs more info” loops, and any off‑brand tone. Keep a daily scoreboard.
- Hold 15‑minute stand‑ups with the team: which labels are fuzzy, which templates need fixing, and where costs have spiked.
Days 11–14: Scale, train, sign‑off
- Expand to 70–80% of volume; keep a manual queue for edge cases.
- Run a simple user acceptance test (UAT) plus a go/no‑go review with SLAs and cost checks before full roll‑out.
- Update your playbook, add on‑call escalation and a backout plan. UK and international secure‑by‑design guidance emphasises having a known good state to restore. gov.uk
Architecture at a glance (plain English)
- Ingress: Email lands in a Microsoft 365 shared mailbox or a Google Workspace address. Domain‑level routing copies the message to the triage service while preserving headers for audit. support.microsoft.com
- Triage service: Extracts sender, subject and body; classifies against your label set; checks confidence.
- Knowledge look‑ups: Pulls snippets from approved sources (FAQs, price list, terms) to reduce back‑and‑forth.
- Draft reply: Generates a suggested response and suggested next actions. If confidence is low, only a summary is created.
- Routing: Sends to the right team queue, with the label and summary in the subject line for quick scanning.
- Human approval: Agent reviews, edits, and sends; their action becomes training feedback for improvements.
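A sketch of those last two steps: the label and a one‑line summary go into the subject so agents can scan the queue, and each accept/edit/reject decision is logged as feedback. The subject format, file name and log fields are illustrative assumptions.

```python
# Sketch: enrich the routed email and log the agent's decision as feedback.
# Subject format, file name and field order are illustrative assumptions.
import csv
from datetime import datetime, timezone

def routed_subject(label: str, summary: str, original_subject: str) -> str:
    """Put the label and a one-line summary up front for quick scanning."""
    return f"[{label.upper()}] {summary} | {original_subject}"

def record_feedback(message_id: str, label: str, agent_action: str,
                    path: str = "triage_feedback.csv") -> None:
    """agent_action is 'accepted', 'edited' or 'rejected'; the log drives later improvements."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([datetime.now(timezone.utc).isoformat(), message_id, label, agent_action])
```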
Quality and risk scoreboard you can run weekly
| Metric | Target | What good looks like | Action if off‑track |
|---|---|---|---|
| First reply time (business hours) | Down 25–40% | Median FRT falls in week 2; staff use drafts without over‑editing. zendesk.co.uk | Reduce label count; adjust prompts; expand templates for top 10 intents. |
| Time to resolution | Down 15–25% | Fewer back‑and‑forth loops for common questions. zendesk.co.uk | Add knowledge snippets; require order/ID capture in first reply. |
| Misroute rate | < 3% | Most emails land in the correct queue first time. | Clarify label definitions; add negative examples to the seed set. |
| Draft reply acceptance | ≥ 70% | Agents lightly edit and send; tone is on‑brand. | Refresh tone rules; add 5–7 high‑quality examples per label. |
| Security hygiene | 100% controls applied | Asset inventory, change log, rollback tested. gov.uk | Pause changes; restore previous config; fix gaps before resuming. |
For a deeper framework to set SLAs, KPIs and go/no‑go gates, see our AI Quality Scoreboard.
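If your helpdesk doesn't report these figures directly, the weekly check can be a few lines against the targets in the table. The numbers below are placeholders to show the shape; substitute the week's real counts from your helpdesk and feedback log.

```python
# Sketch: weekly check against the scoreboard targets above.
# The figures are placeholders - substitute your own weekly counts.

week = {"emails_triaged": 420, "misroutes": 9, "drafts_offered": 300, "drafts_accepted": 225}

misroute_rate = week["misroutes"] / week["emails_triaged"]
acceptance_rate = week["drafts_accepted"] / week["drafts_offered"]

print(f"Misroute rate: {misroute_rate:.1%} (target < 3%)")
print(f"Draft acceptance: {acceptance_rate:.1%} (target >= 70%)")

if misroute_rate > 0.03:
    print("Off-track: clarify label definitions and add negative examples to the seed set")
if acceptance_rate < 0.70:
    print("Off-track: refresh tone rules and add more high-quality examples per label")
```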
Cost guardrails that prevent bill shock
- Traffic caps: Limit the triage service to business hours for week one. Consider excluding large attachments from AI processing (see the sketch after this list).
- Confidence thresholds: Only generate full drafts above an agreed confidence level; otherwise produce summaries only.
- Template‑first: Encourage re‑use of approved snippets to reduce token usage.
- Weekly variance check: Compare usage and cost to ticket volume; investigate spikes before scaling.
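To make the first two guardrails concrete, here is a minimal sketch of the checks a triage service might run before generating a full draft. The hours, token budget and size limit are illustrative assumptions to adapt to your own volumes.

```python
# Sketch: cost guardrails - business-hours processing, a daily token budget,
# and skipping large attachments. All thresholds are illustrative assumptions.
from datetime import datetime

BUSINESS_HOURS = range(9, 18)        # 09:00-17:59, Monday to Friday
DAILY_TOKEN_BUDGET = 200_000         # stop full drafting once exceeded; summaries only
MAX_ATTACHMENT_BYTES = 2_000_000     # exclude large attachments from AI processing

def within_business_hours(now: datetime | None = None) -> bool:
    now = now or datetime.now()
    return now.weekday() < 5 and now.hour in BUSINESS_HOURS

def allow_full_draft(tokens_used_today: int, attachment_bytes: int) -> bool:
    return (
        within_business_hours()
        and tokens_used_today < DAILY_TOKEN_BUDGET
        and attachment_bytes <= MAX_ATTACHMENT_BYTES
    )
```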
If you’ve not set up guardrails before, borrow the ideas in Beating AI Bill Shock.
Safety and privacy: simple, non‑negotiable controls
- Keep sensitive data out of public tools. Use approved, managed routes. This echoes UK government guidance to staff and is a good north star for SMEs. gov.uk
- Maintain an AI asset inventory. Track prompts, templates, label sets, model versions, and who changed what, when—explicitly recommended in the UK AI cyber security Code of Practice. gov.uk
- Test rollback. You should be able to restore a known good state quickly if quality drifts or a change misroutes mail. gov.uk
- Phishing awareness. Expect more convincing scam emails in 2025; ensure your triage never auto‑replies to suspicious emails, and build an “escalate to security” route. theguardian.com
- Secure‑by‑design mindset. International guidance co‑published by the UK NCSC and CISA underlines building in security from design to operations—relevant even to “simple” email triage projects. cisa.gov
Change management that frontline teams accept
- Involve agents early. Ask them to pick the first label set and review the first 150 labelled messages.
- Make the AI optional at first. Let staff accept, edit or discard drafts; measure acceptance rather than force adoption.
- Short training, high repetition. 25‑minute demos, twice a week for the first fortnight; record 5 “golden examples” per label.
- Celebrate time savings, not headcount cuts. Reinvest the time saved into proactive outreach and complex cases.
If your team is building new skills, this AI skills matrix can help you spread knowledge beyond one or two “AI champions”.
Procurement questions that separate promises from delivery
- Routing: Do you support Microsoft 365 shared mailboxes and Google Workspace domain routing without per‑user rules? How do you preserve original headers for audit? support.microsoft.com
- Security & data: Where is data processed and stored? Can we opt‑out of training? Provide your change log and rollback procedure aligned to the UK Code of Practice. gov.uk
- Quality: How do we measure misroutes and draft acceptance? What’s your recommended label set size at go‑live?
- Cost: What controls exist for rate limiting, business‑hours processing, and attachment handling?
- Support: What is your on‑call escalation? Do you provide a named success manager for the first 30 days?
When you’re ready to sign off, adapt the five‑day approach in our UAT for AI features post to this workflow.
Seven implementation pitfalls (and how we avoided them)
- Too many labels. Start with 6–10; more labels mean more confusion and weaker accuracy.
- DIY forwarding rules. Personal mail rules break audit trails; use shared mailboxes or admin‑level routing. support.microsoft.com
- No baseline. If you don’t measure before, you can’t prove the win after. Capture FRT, time to resolution, and backlog. zendesk.co.uk
- Unclear “do not process” list. Define sensitive content to exclude up‑front in line with UK guidance. gov.uk
- Weak rollback. Test restore to a previous config before go‑live—documented in secure‑by‑design guidance. gov.uk
- Ignoring phishing risk. Never auto‑reply to suspicious emails; route to security review. theguardian.com
- No incident drill. Run a 30‑minute “what if” exercise so everyone knows how to pause or revert. Use our AI Incident Drill as a template.
Results to expect in month one
Every organisation is different, but with a focused label set and disciplined routing you should see:
- Faster first replies because the AI prepares the first draft and captures key info up front. zendesk.co.uk
- Shorter resolution times thanks to fewer back‑and‑forth emails on FAQs. zendesk.co.uk
- Happier agents who spend less time classifying and more time solving actual problems.
- Predictable costs if you apply the guardrails outlined above and review weekly.
When you’re ready to scale beyond email, our posts on taking AI pilots live and evaluation tests can help you plan the next phase.
Quick checklist for directors and ops managers
- Agree the first 6–10 labels and “do not process” list.
- Choose one inbox and 2–3 categories with the highest time cost.
- Label 150 historical emails for a seed quality set.
- Enable shared mailbox or domain routing centrally; avoid personal rules. support.microsoft.com
- Implement change logging, rollback, and a weekly quality review. gov.uk
- Set targets for FRT, resolution time, misroutes, and draft acceptance. zendesk.co.uk