
If you bid for public sector work, you already know the pain: multi‑hundred‑page PQQ/ITT packs, multiple clarifications, and a watchful eye on deadlines. In September, a 120‑person construction SME in the Midlands asked us to cut the time senior staff spent skimming tender documents. In three weeks we piloted and shipped an AI summariser that turns 300–500 page packs into concise, role‑specific briefs and a checklist of must‑dos, directly in Microsoft 365 and Slack.
This article walks through the exact playbook we used—what we built, how we evaluated quality, the governance we put in place, and where we drew the line and said “no AI here”. It’s deliberately practical and designed for non‑technical leaders who want measurable outcomes, not a science project.
The problem in numbers
- Average tender pack: 420 pages across 11 PDFs and 2 spreadsheets.
- People involved: 1 bid manager, 2 SMEs (technical and commercial), 1 director for sign‑off.
- Baseline effort: ~18 staff hours per opportunity to produce a first‑pass brief and checklist.
- Quality risks: missed mandatory criteria, outdated templates, and drift between requirements and the internal “house view”.
Our goal was simple: deliver a first‑pass brief to the bid team within 30 minutes of uploading documents, cut re‑reading time by half, and reduce missed mandatory requirements to near‑zero.
What we built (in plain English)
The summariser is a lightweight retrieval‑augmented system: it indexes the tender pack, pulls back the most relevant sections to answer a prompt like “What are the mandatory pass/fail requirements?”, and drafts a role‑specific summary. Think of it as a clever search‑plus‑summarise layer that stays grounded in the documents the team provided.
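For readers who want to see under the bonnet, the core loop is short. The sketch below is illustrative rather than our production code: the chunking, the keyword-based retrieval, and the prompt wording are simplified stand-ins, and the final call to a language model is deliberately left out.

```python
from dataclasses import dataclass

# Illustrative retrieve-then-summarise flow (not production code).
# Chunking, retrieval, and the LLM call are all simplified placeholders.

@dataclass
class Chunk:
    doc_name: str
    page: int
    text: str

def chunk_documents(pages: dict[str, list[str]]) -> list[Chunk]:
    """Split each document into page-sized chunks, keeping the page number for citations."""
    return [
        Chunk(doc_name=name, page=i + 1, text=page_text)
        for name, page_texts in pages.items()
        for i, page_text in enumerate(page_texts)
    ]

def retrieve(chunks: list[Chunk], question: str, top_k: int = 5) -> list[Chunk]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    terms = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(terms & set(c.text.lower().split())))
    return scored[:top_k]

def build_prompt(question: str, evidence: list[Chunk]) -> str:
    """Ground the model in retrieved text and demand page citations."""
    context = "\n\n".join(f"[{c.doc_name}, p.{c.page}] {c.text}" for c in evidence)
    return (
        "Answer using ONLY the extracts below. Cite the document and page for every claim.\n\n"
        f"Extracts:\n{context}\n\nQuestion: {question}"
    )

# Usage (sketch): chunks = chunk_documents(parsed_pack)
#                 prompt = build_prompt("What are the mandatory pass/fail requirements?",
#                                       retrieve(chunks, "mandatory pass fail requirements"))
# The prompt then goes to whichever model the team has approved.
```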
Inputs
- PDFs, DOCX, XLSX tender files uploaded to a secure folder.
- Optional context: our client’s standard policies and boilerplate answers in a read‑only library.
Outputs
- Three summaries: director brief (1 page), bid manager drill‑down (3–4 pages), SME checklist (action items with page citations).
- Risk flags: any “must include” items; deadlines; format and submission quirks.
- Traceability: every statement links back to the original document page number.
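Traceability is easiest to enforce when citations are part of the data shape rather than an afterthought. A minimal sketch of how the outputs could be structured, with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    document: str   # e.g. "ITT_Volume_2.pdf"
    page: int

@dataclass
class Finding:
    statement: str                 # one claim in the brief
    citations: list[Citation]      # must be non-empty before release
    mandatory: bool = False        # pass/fail items surface as risk flags

@dataclass
class TenderBrief:
    director_summary: list[Finding] = field(default_factory=list)
    bid_manager_detail: list[Finding] = field(default_factory=list)
    sme_checklist: list[Finding] = field(default_factory=list)

    def uncited(self) -> list[Finding]:
        """Anything without a citation gets blocked from the outputs."""
        all_findings = self.director_summary + self.bid_manager_detail + self.sme_checklist
        return [f for f in all_findings if not f.citations]
```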
The 3‑week rollout plan (and the gates to go‑live)
Week 1 — Discovery and a tight demo
- Pick two recent, fully‑scored tenders as “golden packs” with known answers.
- Define success: time‑to‑first‑brief under 30 minutes; coverage of mandatory criteria ≥ 95%; zero uncited claims.
- Agree guardrails: no personal data; no uploading client‑confidential information beyond the tender; read‑only access to policy library.
We anchored these decisions to the UK’s practical guidance on agile delivery and measuring progress, which favours small, testable increments and clear gates between discovery, alpha, and beta. ([gov.uk](https://www.gov.uk/service-manual/agile-delivery))
Week 2 — Pilot with evaluation baked in
- Create a small test set: 40 questions the bid team always asks (deadlines, mandatory forms, insurance levels, selection vs award, etc.).
- Automate checks: for each answer, require a citation and highlight where evidence is missing (a minimal scoring harness is sketched after this list).
- Run side‑by‑side: human baseline vs. AI output; track minutes saved and corrections needed.
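The automated check can be as small as this. The test-case format and thresholds below are assumptions for illustration; the point is that every answer is scored both for having a citation and for pointing at the right pages of the pack.

```python
# Minimal scoring harness for the golden-pack questions (illustrative only).
# Each test case pairs a question with the evidence a correct answer must cite.

from dataclasses import dataclass

@dataclass
class TestCase:
    question: str
    expected_pages: set[int]   # pages a correct answer should cite

@dataclass
class Answer:
    text: str
    cited_pages: set[int]

def score(cases: list[TestCase], answers: dict[str, Answer]) -> dict[str, float]:
    covered = cited = 0
    for case in cases:
        answer = answers.get(case.question)
        if answer is None:
            continue
        if answer.cited_pages & case.expected_pages:
            covered += 1   # the answer points at the right part of the pack
        if answer.cited_pages:
            cited += 1     # the answer carries at least one citation
    n = len(cases)
    return {"coverage": covered / n, "citation_rate": cited / n}

# Gate (sketch): fail the run if coverage < 0.95 or citation_rate < 1.0.
```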
Week 3 — Production hardening and handover
- Set up least‑privilege access, monitoring, and a simple approval workflow before sending outputs beyond the bid team (a minimal audit‑trail sketch follows this list).
- Agree a runbook: who owns the tool, how to add new document types, and when to switch it off.
- Hold a go/no‑go review with sample outputs and KPI trends.
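The approval workflow needs nothing more elaborate than an append-only log and an explicit sign-off check. A minimal sketch, assuming a JSON-lines audit file and illustrative action labels:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")

def record_event(brief_id: str, actor: str, action: str) -> None:
    """Append-only audit trail: who generated, reviewed, or shared a brief, and when."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "brief_id": brief_id,
        "actor": actor,
        "action": action,   # e.g. "generated", "reviewed", "approved_for_partner_share"
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")

def can_share_externally(brief_id: str) -> bool:
    """A brief leaves the bid team only if a human approval event exists for it."""
    if not AUDIT_LOG.exists():
        return False
    for line in AUDIT_LOG.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        if event["brief_id"] == brief_id and event["action"] == "approved_for_partner_share":
            return True
    return False
```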
Security hygiene followed mainstream recommendations now referenced by UK bodies and partners: restrict inputs, validate outputs, and log usage. If you remember nothing else, remember to guard against prompt injection, over‑reliance, and excessive agency—the top risks called out by OWASP for LLM applications. ([owasp.org](https://owasp.org/www-project-top-10-for-large-language-model-applications/))
Quality you can measure: the evaluation checklist
- Coverage: Did we capture all pass/fail criteria? Target ≥ 95% on the two golden packs.
- Faithfulness: Are all claims supported by a citation to the tender pack? Target 100% cited.
- Clarity: Can a director read the 1‑page brief in under five minutes and understand the bid/no‑bid decision?
- Time‑to‑first‑brief: Under 30 minutes from upload to draft; stretch goal 15 minutes.
- Red flags: Automatic detection of deadlines, submission portals, file naming rules, and mandatory forms (a pattern‑matching sketch follows this checklist).
- Safety: No personal data in prompts; no uncited assumptions; outputs carry a “human review required” label by default.
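Red-flag detection can start as plain pattern matching before anything cleverer, because a human reviews every flag anyway. The patterns below are illustrative and deliberately over-inclusive:

```python
import re

# Over-inclusive patterns: better to flag too much and let the reviewer dismiss it.
RED_FLAG_PATTERNS = {
    "deadline": re.compile(r"\b(deadline|no later than|submission date|closing date)\b", re.I),
    "mandatory": re.compile(r"\b(mandatory|pass/fail|must (?:be |include |submit ))", re.I),
    "portal": re.compile(r"\b(portal|e-?tendering|upload via)\b", re.I),
    "file_naming": re.compile(r"\b(file nam(?:e|ing)|naming convention|zip file)\b", re.I),
}

def flag_page(page_number: int, text: str) -> list[dict]:
    """Return one flag per matching category so the checklist can cite the page."""
    return [
        {"page": page_number, "category": category, "snippet": match.group(0)}
        for category, pattern in RED_FLAG_PATTERNS.items()
        if (match := pattern.search(text))
    ]
```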
| KPI | Baseline | Target | Result after 4 weeks |
|---|---|---|---|
| Time‑to‑first‑brief | ~180 mins | ≤ 30 mins | 22 mins (median) |
| Coverage of mandatory criteria | Not measured | ≥ 95% | 96–98% on test packs |
| Uncited claims | Not tracked | 0 | 0 |
| Staff hours saved per tender | — | ≥ 8 hours | 9.3 hours (est.) |
For leaders, this keeps the conversation grounded in outcomes, not model names. If you want a primer on retrieval‑based approaches, our 6‑week RAG blueprint explains the moving parts and how to budget for them.
Security and assurance without the jargon
We mapped risks to well‑known checklists so the board and the DPO had confidence without drowning in acronyms:
- Secure by design: Follow the joint guidelines on secure AI system development supported by the UK’s NCSC and international partners. In practice this means documenting threat models, limiting data exposure, and building in logging from day one. ([cisa.gov](https://www.cisa.gov/news-events/alerts/2023/11/26/cisa-and-uk-ncsc-unveil-joint-guidelines-secure-ai-system-development))
- Operational risks: Tackle the OWASP LLM Top 10 first—especially prompt injection, insecure output handling, and over‑reliance. We used allow‑lists for file types, escaped outputs, and required human review for any externally‑shared summary. ([owasp.org](https://owasp.org/www-project-top-10-for-large-language-model-applications/))
- Baseline controls: The UK’s AI Cyber Security Code of Practice translates nicely into SME actions: least‑privilege access, dependency checks, and incident response plans that include your AI components. ([gov.uk](https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice))
- Assurance language: When the board asked, “How do we know it’s good enough?”, we referenced the AI Standards Hub and the government’s portfolio of AI assurance techniques for non‑technical framing. ([aistandardshub.org](https://aistandardshub.org/guidance/introduction-to-ai-assurance/))
Cost governance that holds up in finance committee
Instead of arguing about per‑token prices, we agreed a simple, auditable budget model:
- Unit of work: One “tender brief” covers up to 600 pages; anything larger splits into batches.
- Hard limits: Cap pages and re‑runs per tender; block uploads over a threshold; alert on unusual usage (a minimal enforcement check is sketched after this list).
- Three tiers of service:
  - Bronze: Internal use only; no external sharing; manual review required.
  - Silver: Can share with partners; dual review required; stricter citations.
  - Gold: Executive‑ready; runs extra checks and watermarking; monthly evaluation report.
- Fixed price per brief for Bronze/Silver; usage‑based for Gold with a monthly cap.
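These caps and tiers translate into a small, reviewable policy table that the tool checks before each run. The page cap mirrors the 600‑page unit of work above; the re‑run limits, review counts, and field names are otherwise placeholders:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierPolicy:
    name: str
    max_pages_per_brief: int
    max_reruns_per_tender: int
    external_sharing: bool
    reviews_required: int

# Illustrative policy table mirroring the Bronze/Silver/Gold tiers above.
TIERS = {
    "bronze": TierPolicy("bronze", max_pages_per_brief=600, max_reruns_per_tender=3,
                         external_sharing=False, reviews_required=1),
    "silver": TierPolicy("silver", max_pages_per_brief=600, max_reruns_per_tender=3,
                         external_sharing=True, reviews_required=2),
    "gold":   TierPolicy("gold", max_pages_per_brief=600, max_reruns_per_tender=5,
                         external_sharing=True, reviews_required=2),
}

def check_run(tier: str, pages: int, previous_runs: int) -> None:
    """Refuse to start a run that would breach the agreed caps."""
    policy = TIERS[tier]
    if pages > policy.max_pages_per_brief:
        raise ValueError(
            f"{pages} pages exceeds the {policy.max_pages_per_brief}-page cap; split into batches."
        )
    if previous_runs >= policy.max_reruns_per_tender:
        raise ValueError("Re-run limit reached for this tender; escalate to the budget owner.")
```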
This mirrors good practice from government digital delivery—spend controls, small increments, and measurable benefits—without forcing the company into a big‑bang programme. ([gov.uk](https://www.gov.uk/service-manual/agile-delivery))
Procurement questions that saved us weeks
When shortlisting vendors or integrators, ask these and insist on evidence, not promises:
- Show three real outputs on redacted tenders similar to ours, with citations.
- What’s your approach to the OWASP LLM Top 10? Which risks are in scope day one? ([owasp.org](https://owasp.org/www-project-top-10-for-large-language-model-applications/))
- How do you implement secure‑by‑design guidance from UK and allied cyber agencies? Provide your runbook. ([cisa.gov](https://www.cisa.gov/news-events/alerts/2023/11/26/cisa-and-uk-ncsc-unveil-joint-guidelines-secure-ai-system-development))
- Where is data stored and for how long? Can we set retention to zero for raw prompts?
- How do we export our content and logs if we stop using you?
- What is the monthly cap and what happens at the cap (degrade gracefully or block)?
- Do you provide an assurance summary aligned to UK AI assurance guidance? ([aistandardshub.org](https://aistandardshub.org/guidance/introduction-to-ai-assurance/))
For a broader operating model, see our Practical AI stack for UK SMEs (2025), which covers vendor selection, security, and change management.
Where we did not use AI (and why)
- Final scoring and compliance sign‑off: Always human‑owned. The tool proposes, people dispose.
- Creating new claims: We summarise and extract only; no invention of experience or capabilities.
- Uploading partner or client personal data: Out of scope for this tool; different governance needed.
This separation keeps the system well within comfort for most UK SMEs and charities while still delivering time savings.
Results after four weeks in the wild
- Median time‑to‑first‑brief: 22 minutes.
- Coverage of mandatory criteria on golden packs: 96–98%.
- Missed “must‑include” items: 0 in the pilot cohort.
- Staff time freed per tender: ~9 hours, mostly senior SME reading time.
Operationally, the team now reviews tender packs first thing Monday with a single link and spends the next hour assigning actions rather than hunting for requirements.
Risks and mitigations you should actually track
- Prompt injection and context poisoning: Treat all input text as untrusted; restrict file types; escape outputs; never let the system execute actions or browse the web. OWASP lists these among the highest priority risks (a minimal allow‑list sketch follows this list). ([owasp.org](https://owasp.org/www-project-top-10-for-large-language-model-applications/))
- Over‑reliance: Force human review before external sharing and watermark drafts as such. Also an OWASP risk area. ([owasp.org](https://owasp.org/www-project-top-10-for-large-language-model-applications/))
- Supply chain drift: Pin model versions, log when providers change defaults, and re‑run your evaluation set monthly. The UK AI Cyber Security Code of Practice recommends robust dependency management and monitoring. ([gov.uk](https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice))
- Threat environment: The NCSC and partners continue to warn that AI can amplify known threats (like phishing and ransomware). Do the basics—MFA, patching, backups—and don’t let an AI pilot distract from hygiene. ([cisa.gov](https://www.cisa.gov/news-events/alerts/2024/01/23/cisa-joins-acsc-led-guidance-how-use-ai-systems-securely))
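The first two mitigations can be enforced mechanically. The sketch below shows a file-type allow-list and output escaping; it is a starting point under our assumptions (output rendered in a web view or email, no tool calling), not a complete defence against prompt injection:

```python
import html
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".xlsx"}   # tender pack formats only

def accept_upload(path: Path) -> Path:
    """Reject anything outside the agreed file types before it reaches the model."""
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File type {path.suffix!r} is not allowed for tender packs.")
    return path

def render_model_output(text: str) -> str:
    """Treat model output as untrusted text: escape it before it lands in a web view
    or email, and never pass it to a shell, template engine, or tool-calling step."""
    return html.escape(text)
```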
Playbook you can reuse next week
- Pick the use case that is document‑heavy and low‑stakes for external audiences (internal briefs, not public claims).
- Assemble a 2‑hour decision group: bid lead, SME, director, data owner.
- Write acceptance tests as 30–50 questions you can repeatedly score.
- Pilot with two golden packs, measure coverage and time‑to‑first‑brief.
- Decide your caps (pages per job, runs per pack) and your monthly spend limit.
- Choose a service tier (Bronze/Silver/Gold) and stick to it for a month.
- Report weekly on the four KPIs in the table above and kill anything that doesn’t move the needle.
If you’re earlier in your journey, our AI readiness audit and board‑level AI governance guide explain how to set scope and guardrails before you pilot.
FAQs leaders asked (and the short answers)
“Can we share drafts externally?”
Only after human review and only from the Silver tier upwards. Drafts remain clearly labelled. This reduces over‑reliance risk and aligns with secure‑by‑design guidance. ([cisa.gov](https://www.cisa.gov/news-events/alerts/2023/11/26/cisa-and-uk-ncsc-unveil-joint-guidelines-secure-ai-system-development))
“Do we need an audit?”
Not for an internal summariser, but do maintain an assurance note describing purpose, scope, risks, controls, and results. Use the AI Standards Hub material to structure it. ([aistandardshub.org](https://aistandardshub.org/guidance/introduction-to-ai-assurance/))
“What about cyber risk?”
Treat it like any SaaS: least privilege, no secrets in prompts, log access, and rehearse incident response. The UK’s AI Cyber Security Code of Practice gives a pragmatic set of controls to adopt. ([gov.uk](https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice))