
Chat isn’t always the answer: a UK SME playbook for shipping AI features users actually adopt

Many UK SMEs and charities added a chat widget this year—and found adoption stalling after the first week. The reason usually isn’t “the model is bad”; it’s a product and UX gap: unclear capabilities, weak guardrails, no human escape hatch, and KPIs that reward demos rather than outcomes. This playbook shows non‑technical leaders how to decide when chat is right, which interaction patterns increase trust, and how to measure success without runaway costs.

Two well‑established sources set the tone. First, government service guidance: only use AI where there is a proven user need, tell people when AI is used, and provide a route to a human. Second, industry‑tested design rules that keep AI predictable and recoverable. We reference both throughout so you can reuse proven patterns rather than reinvent them. ([gov.uk](https://www.gov.uk/service-manual/technology/using-artificial-intelligence-ai-in-services))

When is chat the right UI? A 7‑question decision checklist

Use this to avoid “chat by default”. It complements guidance that not every problem needs conversational UI. ([uxlift.org](https://www.uxlift.org/articles/ai-chat-is-not-always-the-answer/))

  1. Is the task exploratory or ambiguous? Chat helps with brainstorming, clarifying documents, or navigating edge cases. If the task is a fixed transaction (for example, booking a slot), prefer forms.
  2. Will natural language make the task faster than your current UI? If not, reconsider.
  3. Can the assistant act—or only explain? If it cannot take constrained actions with approvals, chat risks becoming a slower help page.
  4. Do you have safe fallbacks? Users must be able to switch to a person, phone or email within two clicks. ([gov.uk](https://www.gov.uk/guidance/using-chatbots-and-webchat-tools))
  5. Can you show sources or confidence? If you cannot ground answers with references or a “check answers” step, restrict the scope.
  6. Is the cost per successful task within budget? Set a target, for example “under £0.12 per resolved query”, and design to hit it (see the worked example below).
  7. Will you measure outcomes (tasks completed, errors avoided) not just usage? Tie KPIs to business value.

If you answered “no” to any of 3–7, favour a structured assistant (buttons, forms, summaries, approvals) over an open chat field. Google’s People + AI Guidebook also suggests aligning the “reward function” with user goals early—decide what “good” looks like before you ship. ([pair.withgoogle.com](https://pair.withgoogle.com/old-gb/))
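
To make checklist item 6 concrete, here is a minimal sketch of the cost‑per‑resolved‑query calculation, assuming you can export total spend and resolution counts from your platform. The figures and field names are illustrative, not vendor pricing.

```ts
// Minimal sketch: cost per resolved query (checklist item 6).
// All figures and names are illustrative assumptions, not vendor pricing.

interface UsagePeriod {
  modelSpendGBP: number;     // total model/API spend for the period
  platformSpendGBP: number;  // hosting and vendor fees attributable to the assistant
  totalQueries: number;      // conversations started
  resolvedQueries: number;   // conversations that completed the task without escalation
}

function costPerResolvedQuery(u: UsagePeriod): number {
  if (u.resolvedQueries === 0) return Infinity; // nothing resolved, so the target is unmet by definition
  return (u.modelSpendGBP + u.platformSpendGBP) / u.resolvedQueries;
}

// Example: £38 model spend + £25 platform fees over 600 resolved queries is roughly £0.105,
// inside a £0.12 target. A low resolution rate pushes this up even if spend is flat.
const september: UsagePeriod = {
  modelSpendGBP: 38, platformSpendGBP: 25, totalQueries: 900, resolvedQueries: 600,
};
console.log(costPerResolvedQuery(september).toFixed(3)); // "0.105"
```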

12 interface patterns that increase trust and reduce risk

These patterns map to well‑validated Human‑AI guidelines—set expectations, recover gracefully when wrong, and give users control. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

1) Capability banner and limits

On first launch, state clearly what the assistant can and cannot do, plus typical accuracy. Example: “This assistant summarises PDFs and drafts emails. It cannot give legal advice.” This aligns to “Make clear what the system can do” and “how well it can do it”. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

2) Confidence and sources by default

For factual answers, display source links and a short “why this answer” note. Where you cannot show sources, show a gentle confidence cue and prompt users to verify before acting. Google PAIR’s Explainability + Trust guidance is useful here. ([pair.withgoogle.com](https://pair.withgoogle.com/old-gb/))

3) Guardrails with quick actions

Use buttons like “Summarise this”, “Extract key dates”, or “Propose 3 options” to keep users on the happy path and reduce prompt errors. Always provide an escape to a human. GOV.UK’s guidance explicitly recommends alternatives to the tool to prevent loops. ([gov.uk](https://www.gov.uk/guidance/using-chatbots-and-webchat-tools))

4) “Check answers” step before action

Before sending anything externally—emails, letters, data updates—insert a short review step so users can edit. This mirrors the government pattern that reduces errors and improves confidence. ([design-system.service.gov.uk](https://design-system.service.gov.uk/patterns/check-answers/))

5) Human handoff

Offer handoff when confidence is low, the user asks twice, or keywords indicate distress. Make the route obvious: live chat, a named inbox, or a call‑back form. ([gov.uk](https://www.gov.uk/guidance/using-chatbots-and-webchat-tools))
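
A minimal sketch of how such an escalation rule might look, assuming your platform exposes a confidence score and turn history. The threshold and keyword list are illustrative assumptions, not recommended values.

```ts
// Minimal sketch of an escalation rule. Thresholds and keywords are illustrative only.

interface TurnContext {
  confidence: number;        // 0 to 1 score from the model or retrieval layer
  repeatedQuestion: boolean; // the user has asked the same thing twice
  userMessage: string;
}

const DISTRESS_KEYWORDS = ["complaint", "urgent", "vulnerable", "distressed"];

function shouldHandOff(ctx: TurnContext): boolean {
  const lowConfidence = ctx.confidence < 0.6;
  const distress = DISTRESS_KEYWORDS.some((k) =>
    ctx.userMessage.toLowerCase().includes(k)
  );
  return lowConfidence || ctx.repeatedQuestion || distress;
}

// When this returns true, show the human route (live chat, named inbox, call-back form)
// rather than another AI reply.
```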

6) Scope when uncertain

If the system is unsure, ask a clarifying question or narrow the task, rather than guessing. This reflects “Scope services when in doubt”. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

7) Memory that’s easy to reset

Remember recent context (“the same supplier as before”), but show what’s remembered and provide “Start fresh”. That balances convenience with predictability. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))
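
A minimal sketch of remembered context that stays visible and can be cleared in one step; the shape and field names are assumptions for illustration.

```ts
// Minimal sketch: remembered context that is always visible and easy to reset.

interface RememberedContext {
  label: string;        // shown to the user, e.g. "Supplier: the same supplier as before"
  value: string;
  rememberedAt: string; // ISO date, so users can judge how stale it is
}

let remembered: RememberedContext[] = [];

function showRemembered(): string[] {
  // Render this next to the input, not hidden in settings.
  return remembered.map((item) => `${item.label} (remembered ${item.rememberedAt})`);
}

function startFresh(): void {
  // Wire this to a visible "Start fresh" button; nothing is kept after it runs.
  remembered = [];
}
```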

8) Event log and costs panel

Show a simple activity log (“Draft created → You edited → Sent”) and a per‑conversation cost estimate so teams can govern spend.
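
A minimal sketch of what the log and cost estimate could look like behind the scenes, assuming your vendor reports token usage per step. The price used is a placeholder, not a real rate.

```ts
// Minimal sketch of an activity log with a per-conversation cost estimate.
// Token price and field names are placeholders, not real vendor rates.

interface LogEvent {
  at: string;                      // ISO timestamp
  actor: "assistant" | "user";
  action: string;                  // e.g. "Draft created", "You edited", "Sent"
  tokensUsed?: number;             // only present for assistant steps
}

const ASSUMED_PRICE_PER_1K_TOKENS_GBP = 0.004; // placeholder: check your contract

function conversationCostGBP(events: LogEvent[]): number {
  const tokens = events.reduce((sum, e) => sum + (e.tokensUsed ?? 0), 0);
  return (tokens / 1000) * ASSUMED_PRICE_PER_1K_TOKENS_GBP;
}

function renderLog(events: LogEvent[]): string {
  // Produces e.g. "Draft created → You edited → Sent (est. £0.01)"
  const trail = events.map((e) => e.action).join(" → ");
  return `${trail} (est. £${conversationCostGBP(events).toFixed(2)})`;
}
```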

9) Safe defaults for sensitive data

Mask personal data by default, ask before sending externally, and provide templated responses for regulated scenarios. Keep legal review offline; the assistant can prepare, not approve.
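
A minimal sketch of default masking before anything is sent externally. The patterns are illustrative only; they will miss many real‑world formats and are no substitute for a proper redaction or DLP tool.

```ts
// Minimal sketch: mask obvious personal data before anything leaves your systems.
// Illustrative regexes only; use a proper DLP/redaction tool for anything regulated.

const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const UK_PHONE = /\b(?:\+44\s?|0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}\b/g;

function maskPersonalData(text: string): string {
  return text.replace(EMAIL, "[email removed]").replace(UK_PHONE, "[phone removed]");
}

// Pair this with an explicit confirmation step ("This draft will be sent to an
// external supplier. Continue?") rather than sending silently.
```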

10) Draft‑then‑edit, not auto‑send

Default to “Create a draft” that the user edits. It preserves control and makes quality visible before impact.

11) Explain actions and consequences

When a user clicks “Learn my preferences”, say what changes and how to undo it. This follows “Convey the consequences of user actions”. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

12) Version notice

Notify users when behaviour changes (“Updated writing style on 18 Oct”). Provide a “What’s new” link for transparency. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

A 2‑week validation plan: prove UX fit before you scale

Week 1 — Understand tasks and set a baseline

  • Top‑tasks inventory: list the 5 most common user tasks and their value (time saved, cash impact).
  • Benchmark the current journey: measure time‑to‑complete and error rate on today’s process with 5–7 users.
  • Define “good”: write a one‑line outcome per task, for example “Produce a 150‑word supplier email in under 90 seconds with zero confidential data leaks.” Google’s Guidebook calls this aligning the reward function to user goals. ([pair.withgoogle.com](https://pair.withgoogle.com/old-gb/))
  • Choose UI mode per task: chat, structured assistant, or no AI (yet).

Week 2 — Prototype, guardrail, and test

  • Prototype the smallest version that can succeed: 3 quick actions, draft‑then‑edit, a review step, and human handoff.
  • Run usability tests with 5–7 target users. Measure completion, edits, and escalations.
  • Decide go/no‑go for a limited pilot. If you lack sourcing or evaluation capacity, try our 5‑Day AI Evaluation Sprint to de‑risk choices.

Throughout, apply two simple rules from Human‑AI guidelines: be clear up front about capability and quality, and make it easy to correct or dismiss when the system is wrong. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

Cost and risk: choose the right pattern for each task

| Pattern | Best for | Typical risk | Unit cost mindset | Notes |
| --- | --- | --- | --- | --- |
| Open chat | Exploration, Q&A on documents, ideation | Medium: off‑topic, over‑confident answers | Cap per message or per session; display estimate to the user | Requires strong sourcing, confidence cues, and easy handoff. ([gov.uk](https://www.gov.uk/guidance/using-chatbots-and-webchat-tools)) |
| Structured assistant (quick actions + drafts) | Repeatable tasks (summarise, extract fields, draft reply) | Lower: constrained actions and review step | Bundle tasks into small “jobs”; alert if over budget | Often outperforms chat for throughput and quality. |
| Inline suggestions in existing form | Speeding up a known flow (e.g., writing a reason) | Lower: user remains in control | Included in page render; cache where safe | Great first step if you’re new to AI UX. |
| No AI (yet) | Deterministic, high‑risk or short tasks | Lowest | None | Matches GOV.UK advice to use established tech when better. ([gov.uk](https://www.gov.uk/service-manual/technology/using-artificial-intelligence-ai-in-services)) |

Procurement questions for vendors (copy/paste)

  1. Which interaction modes do you support out of the box (chat, quick actions, draft‑then‑edit, review step)? Show screenshots.
  2. How do you implement capability and quality “up‑front disclosures” for users?
  3. What controls exist for human handoff and escalation rules? Can we route by topic, risk, or confidence? ([gov.uk](https://www.gov.uk/guidance/using-chatbots-and-webchat-tools))
  4. Can users see and edit the sources used to answer a question?
  5. How do you prevent “silent failures” (for example, timeouts, tool errors)? What does the user see?
  6. What’s the per‑task cost at our scale? How can we cap or alert on spend in real time?
  7. Which guidelines do you design against (for example, Microsoft’s Human‑AI Interaction)? Give concrete examples. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))
  8. What A/B or usability testing is built in? Can we export raw UX metrics?
  9. How are “What’s new” notices and version changes shown to end users? ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))
  10. What’s the fallback if AI is unavailable? Show the non‑AI path.

KPIs that predict adoption (and those that don’t)

Leading indicators (ship and iterate against these)

  • Task completion rate with AI vs. without AI (target: +20% within 4 weeks).
  • Median time‑to‑complete per task (target: −30%).
  • Edit ratio on AI drafts (healthy range: 20–60% of drafts edited; below 20% may mean drafts are being accepted without review, an over‑automation risk).
  • Escalation rate to human (target: stable or down; spikes indicate unclear scope or low confidence thresholds).
  • Per‑task cost (keep within your agreed envelope; show users a running estimate to nudge efficient use); see the calculation sketch after this list.
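
A minimal sketch of how these leading indicators might be computed from raw pilot data; the field names are assumptions, so map them to whatever your analytics or vendor exports.

```ts
// Minimal sketch of the leading indicators above, computed from raw pilot data.
// Field names are assumptions; plug in whatever your analytics or vendor exports.

interface PilotData {
  tasksAttempted: number;
  tasksCompleted: number;
  draftsProduced: number;
  draftsEdited: number;
  escalationsToHuman: number;
  totalSpendGBP: number;
}

function leadingIndicators(d: PilotData) {
  return {
    completionRate: d.tasksCompleted / d.tasksAttempted,     // compare with the non-AI baseline
    editRatio: d.draftsEdited / d.draftsProduced,            // healthy range roughly 0.2 to 0.6
    escalationRate: d.escalationsToHuman / d.tasksAttempted, // spikes suggest unclear scope
    costPerTaskGBP: d.totalSpendGBP / Math.max(d.tasksCompleted, 1),
  };
}

// Example: 120 attempted, 96 completed, 80 drafts with 36 edited, 9 escalations, £11 spend
// gives 80% completion, 0.45 edit ratio, 7.5% escalation rate, about £0.11 per completed task.
console.log(leadingIndicators({
  tasksAttempted: 120, tasksCompleted: 96, draftsProduced: 80,
  draftsEdited: 36, escalationsToHuman: 9, totalSpendGBP: 11,
}));
```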

Lagging indicators (track, but don’t optimise week‑to‑week)

  • Errors caught at “check answers” stage vs. post‑send. Aim to shift errors upstream. ([design-system.service.gov.uk](https://design-system.service.gov.uk/patterns/check-answers/))
  • Customer satisfaction on AI‑assisted tasks.
  • Time saved by team per month (hours × blended hourly rate).

Avoid vanity metrics like “messages sent”. They rarely correlate with outcomes. Instead, tie each KPI to a specific task and user goal. Google’s People + AI Guidebook recommends making success criteria explicit at design time. ([pair.withgoogle.com](https://pair.withgoogle.com/old-gb/))

Rollout: from UX fit to production

Once the UX works in small pilots, move to a time‑boxed rollout. Keep the guardrails and KPIs in place as you scale usage. For a structured programme, see our 12‑week launch plan (From Pilot to Production in 12 Weeks) and, if you’re grounding answers in your own knowledge, the 6‑Week RAG Blueprint.

Still choosing a model or testing quality? Use our 5‑Day AI Evaluation Sprint. For stack choices and hosting options, see Practical AI Stack 2025.

Common pitfalls (and quick fixes)

  • Problem: Users trust confident wrong answers. Fix: show sources and add a “Verify before you send” nudge on high‑impact tasks; prefer drafts over auto‑send. Industry research warns against over‑reliance on confident UIs. ([uxlift.org](https://www.uxlift.org/articles/ai-chat-is-not-always-the-answer/))
  • Problem: “Blinking cursor” paralysis in chat. Fix: start with 3–5 context‑aware quick actions and an example input; provide a visible route to a human. ([gov.uk](https://www.gov.uk/guidance/using-chatbots-and-webchat-tools))
  • Problem: Unclear scope—assistant attempts everything. Fix: constrain to high‑value tasks and use clarifying questions when uncertain. ([microsoft.com](https://www.microsoft.com/en-us/research/blog/guidelines-for-human-ai-interaction-design/))

What to do this week

  1. Run a 30‑minute task trawl (the top‑tasks inventory from Week 1) and pick one task where AI can genuinely help.
  2. Prototype a structured assistant: 3 quick actions, draft‑then‑edit, “check answers”, human handoff.
  3. Test with 5 users; ship to 10–20 staff; review the KPIs above after one week.
  4. Decide whether to expand to chat for exploratory tasks—or double down on structured flows.