
From SharePoint to solid answers: the hybrid retrieval, freshness and evaluation playbook for UK SMEs

Most UK SMEs already have the right information somewhere — in SharePoint/OneDrive, Google Drive, Confluence, or email. What’s missing is a way for AI to reliably find the right, up‑to‑date passage and show its working. That comes down to three things: the retrieval pattern, how quickly new changes are reflected (“freshness”), and simple evaluation that non‑technical leaders can trust.

This playbook explains the minimum you need to ship in December to make your internal AI assistant a source of truth rather than a source of arguments.

1) Choose the right retrieval pattern (and why “hybrid” wins)

Two common retrieval approaches are:

  • Keyword (lexical) search — matches exact terms and filters. Great for product codes, policy IDs, names and dates.
  • Vector (semantic) search — finds conceptually similar passages even when the words differ.

Hybrid search combines both. Modern platforms merge the two result sets using a method called Reciprocal Rank Fusion (RRF), which consistently improves relevance compared with either method alone. Microsoft describes hybrid search and RRF in Azure AI Search, including how it merges BM25 keyword results with vector results, and how to weight each side. See Microsoft Learn’s overviews and ranking notes for details. Hybrid search overview (Azure AI Search), RRF ranking, vector weighting.
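If your team wants to see what RRF actually does, here is a minimal sketch in Python; the document IDs are illustrative, and the k=60 constant follows the default used in the original paper.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs using Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, so items ranked highly by either retriever rise to the top
    of the combined list.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical example: IDs returned by a keyword (BM25) query and a vector query.
keyword_hits = ["policy-042", "invoice-guide", "hr-handbook"]
vector_hits = ["hr-handbook", "policy-042", "expenses-faq"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# -> ['policy-042', 'hr-handbook', 'invoice-guide', 'expenses-faq']
```

The question to put to any vendor is the same one the sketch answers: does a document ranked highly by either retriever reliably surface near the top of the fused list, and can you tune how the two sides are weighted?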

Elastic and OpenSearch also document hybrid retrieval patterns if you prefer that stack. Elastic: Hybrid search, OpenSearch: concepts.

Why this matters to leaders: hybrid retrieval is a risk reducer. It catches exact references (policies, numbers) while still handling vague staff questions. If your vendor cannot explain how they fuse lexical and vector results — and how you can tune it — keep looking. The RRF method is well‑established in the information retrieval community. Original RRF paper (SIGIR 2009).

2) Freshness first: make your AI reflect file changes within hours, not weeks

Staff stop trusting AI the moment it quotes an old version of a policy. You need change‑tracking so updated files are re‑indexed quickly.

For Microsoft 365 (SharePoint/OneDrive)

  • Use Microsoft Graph delta queries to track created/updated/deleted items without re‑scanning entire libraries. This reduces costs and throttling risk; see the sketch after this list. Microsoft Graph: Delta queries overview.
  • Complement with Graph change notifications (webhooks) where available, so your indexer is nudged when changes happen; then use delta to fetch the details.
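Here is a minimal polling sketch against the Graph delta endpoint using Python and the requests library; the access token, drive ID and the index_item/remove_item callbacks are placeholders you would wire into your own auth and indexing pipeline.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def sync_drive_changes(drive_id, access_token, delta_link=None,
                       index_item=print, remove_item=print):
    """Fetch only the items that changed since the last sync using Graph delta queries.

    On the first call (no delta_link) Graph enumerates the whole drive; on later
    calls it returns only created/updated/deleted items.
    """
    url = delta_link or f"{GRAPH}/drives/{drive_id}/root/delta"
    headers = {"Authorization": f"Bearer {access_token}"}

    while url:
        page = requests.get(url, headers=headers, timeout=30).json()
        for item in page.get("value", []):
            if "deleted" in item:
                remove_item(item["id"])   # drop this item from the search index
            else:
                index_item(item)          # re-chunk and re-embed this item
        # Follow paging until Graph hands back a delta link for the next run.
        url = page.get("@odata.nextLink")
        delta_link = page.get("@odata.deltaLink", delta_link)

    return delta_link  # persist this; pass it back next time to stay incremental
```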

For Google Drive

  • Use the Drive API changes feed (changes.getStartPageToken, then changes.list) to track created, updated and deleted files without re‑scanning entire folders.
  • Where push notifications (changes.watch) are available, use them to nudge your indexer when something changes, then call the changes feed to fetch the details.

Set a “staleness budget” per content type

  • Finance and HR policies: target re‑index within 15–60 minutes of change.
  • Project docs and meeting notes: within 4–8 hours.
  • Archive or low‑value content: 24+ hours is fine.

Track and report these as service levels (see KPIs below).
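If you want those service levels checked automatically rather than by eye, a small sketch like this can flag breaches; the content types, budget values and the shape of the indexed‑document records are illustrative assumptions, not a fixed schema.

```python
from datetime import datetime, timezone

# Illustrative staleness budgets (minutes) per content type.
STALENESS_BUDGET_MIN = {
    "finance_policy": 60,
    "hr_policy": 60,
    "project_doc": 8 * 60,
    "archive": 24 * 60,
}

def freshness_breaches(indexed_docs, now=None):
    """Return docs whose index copy is older than their staleness budget.

    Each record is assumed to carry the source file's last-modified time and
    the time its current index entry was built. A doc breaches its budget if
    it changed after it was last indexed and the gap exceeds the budget.
    """
    now = now or datetime.now(timezone.utc)
    breaches = []
    for doc in indexed_docs:
        if doc["indexed_at"] >= doc["modified_at"]:
            continue  # the index already reflects the latest change
        lag_min = (now - doc["modified_at"]).total_seconds() / 60
        budget = STALENESS_BUDGET_MIN.get(doc["content_type"], 24 * 60)
        if lag_min > budget:
            breaches.append((doc["id"], round(lag_min), budget))
    return breaches
```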

3) A one‑week metadata tune‑up that pays for itself

Without basic metadata, search turns into guesswork. Ask content owners to add or confirm the following fields and make sure your search index ingests them (a sample schema sketch follows the list):

  • Owner (person or team) and contact channel.
  • Status (draft, approved, superseded) and Effective from date.
  • Review by date (default 12 months unless your domain needs shorter).
  • Audience (e.g., Customer Support, Fundraising, Trustees) for filtering.
  • System of record (where the truth lives, e.g., HRIS vs. PDF copy).
  • Sensitivity (internal, confidential) so your AI respects access controls.
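To make the hand‑off to whoever configures the index unambiguous, the sketch below expresses these fields as a simple record; the field names and the Python dataclass are illustrative rather than a requirement of any particular search platform.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DocumentMetadata:
    """Minimum metadata to ingest alongside each document's text chunks."""
    doc_id: str
    title: str
    owner: str                     # person or team, plus a contact channel
    status: str                    # "draft" | "approved" | "superseded"
    effective_from: date
    review_by: date                # default: 12 months after approval
    audience: list[str] = field(default_factory=list)   # e.g. ["Customer Support"]
    system_of_record: Optional[str] = None               # e.g. "HRIS", not the PDF copy
    sensitivity: str = "internal"                         # "internal" | "confidential"
```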

For retention and disposal, UK public bodies publish clear schedules; adopt the spirit of this guidance to keep your index lean. See examples from the Home Office and ONS. Home Office: retention schedules, ONS: data archiving policy, ONS: retention & disposal policy.

4) Preparing documents so retrieval actually works

  • Prefer “sectioned” documents with meaningful headings and short paragraphs. Your AI retrieves chunks, and headings become valuable signposts (see the chunking sketch after this list).
  • Use canonical pages for policies and price lists rather than emailing attachments. Email threads create duplicates that confuse retrieval.
  • Make tables machine‑readable (avoid screenshots) and keep critical numbers near their headings. This improves both keyword and semantic recall.
  • PDFs: ensure the text is selectable (run OCR if needed) so it can be indexed.
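To illustrate why headings carry so much weight, here is a minimal heading‑aware chunking sketch; it assumes documents have already been converted to Markdown‑style text, and a real pipeline would also cap chunk length and handle tables.

```python
def chunk_by_heading(markdown_text):
    """Split text into one chunk per heading, keeping the heading inside the chunk.

    Keeping the heading attached to its paragraphs gives both keyword and
    vector search a strong signpost for what the passage is about.
    """
    chunks, current = [], []
    for line in markdown_text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```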

5) Evaluation your board will understand (no code needed)

Don’t launch on gut feel. Build a 50‑question test set from real queries (service desk tickets, finance ops, HR) and decide up‑front what counts as “good”. Modern platforms include simple metrics you can adopt:

  • Groundedness: is the answer supported by the retrieved context? Available in Microsoft’s evaluators and Google’s Vertex AI templates. Microsoft: RAG evaluators, Google: metric prompt templates.
  • Relevance: did the answer address the question?
  • Retrieval recall@k: did the correct source appear in the top k passages?

These ideas are rooted in decades of information‑retrieval practice (for example, NIST’s TREC programme develops evaluation methods and datasets). NIST: Overview of TREC 2023, TREC Deep Learning Track.

How to run it monthly

  1. Keep your 50 questions in a spreadsheet with expected sources.
  2. Run them against the assistant, record groundedness (1–5), relevance (1–5), and whether the correct source appeared in the top 5 passages.
  3. Set pass criteria: e.g., groundedness ≥4 on 80% of queries, relevance ≥4 on 80%, recall@5 ≥85% (a small scoring sketch follows this list).
  4. Share a one‑page summary with trend lines and actions taken.
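If the spreadsheet is exported as a CSV, a short sketch like this can compute the pass criteria for you; the column names (groundedness, relevance, correct_source_in_top5) and the filename are assumptions about how you lay out the sheet, so adjust them to match.

```python
import csv

def evaluation_summary(path):
    """Roll up the monthly 50-question results into the three pass criteria."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    n = len(rows)
    grounded = sum(int(r["groundedness"]) >= 4 for r in rows) / n
    relevant = sum(int(r["relevance"]) >= 4 for r in rows) / n
    recall5 = sum(r["correct_source_in_top5"].strip().lower() == "yes" for r in rows) / n
    return {
        "groundedness": grounded,   # target: >=4/5 on 80% of queries
        "relevance": relevant,      # target: >=4/5 on 80% of queries
        "recall_at_5": recall5,     # target: correct source in top 5 for 85%
        "pass": grounded >= 0.80 and relevant >= 0.80 and recall5 >= 0.85,
    }

# Placeholder filename for your exported sheet.
print(evaluation_summary("test_set_results.csv"))
```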

If you use Microsoft Fabric or Azure tooling, you can lean on built‑in RAG evaluation examples. Fabric: evaluate RAG performance, Azure Architecture Center: evaluation phase.

6) Security and governance basics for directors

Your AI assistant inherits your cloud posture. When assessing a vendor or building in‑house, align to recognised UK guidance such as the NCSC’s Cloud Security Principles, focusing on supply‑chain security, separation between customers, and secure admin. NCSC: Cloud Security Principles.

Board‑level questions to ask:

  • Where is data processed and stored? Can we keep it in the UK?
  • How are access controls enforced so the assistant only retrieves what a user can already see?
  • Is there an audit trail of queries, retrieved sources and citations?

7) Risks and costs you can control in week one

For each risk or cost, here is why it happens, how to mitigate it, and what to track:

  • Ungrounded answers. Why it happens: weak retrieval, old versions, no citations. Mitigation: hybrid retrieval with RRF; show the top 3 sources; fail closed if there is no good context. What to track: groundedness ≥4/5; % of answers with citations.
  • Stale content. Why it happens: the indexer rescans too slowly. Mitigation: use delta/changes APIs; set staleness budgets per content type. What to track: time‑to‑freshness (P95) vs. SLA.
  • Search drift. Why it happens: new jargon; missing synonyms. Mitigation: monthly query review; add synonyms and acronyms. What to track: relevance ≥4/5 trend.
  • Index bloat driving cost. Why it happens: duplicates; attachments in email. Mitigation: retention schedules; archive superseded docs; prioritise canonical sources. What to track: index size growth; cost per 1,000 queries.
  • Spend predictability. Why it happens: embedding and re‑index churn. Mitigation: batch updates; only re‑embed changed chunks; schedule off‑peak. What to track: monthly £ per 1,000 queries within budget.

8) Procurement questions (copy/paste into your RFP)

  • Retrieval: Do you support hybrid retrieval (lexical + vector)? How are results fused (e.g., RRF)? Can we tune weighting and filters?
  • Freshness: What is your typical re‑index lag for SharePoint/OneDrive and Google Drive? Do you use Microsoft Graph delta queries and Google Drive Changes to avoid full rescans?
  • Security: How do you enforce document‑level permissions at query time? Where is data processed and stored?
  • Observability: Can we export per‑query logs showing the prompt, retrieved sources, latency, and costs?
  • Portability: Can we export our embeddings and metadata if we change vendor? What are the exit costs?
  • Cost guardrails: Can we set caps per user or per day? What happens at the cap?

For a structured buying process, see our two‑week AI vendor bake‑off and the 90‑day AI cost guardrail.

9) KPIs to report monthly

  • Groundedness (median, P90) across your test set.
  • Relevance and retrieval recall@5 on the same test set.
  • Time‑to‑freshness P95 for key libraries.
  • Deflection rate: % of queries resolved without human hand‑off.
  • Latency: P50/P95 time to first answer.
  • Cost: £ per 1,000 queries, and re‑indexing cost per 1,000 changed documents.
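Most of these numbers fall straight out of per‑query logs; the sketch below shows the percentile and cost arithmetic, with the log field names (latency_ms, cost_gbp, resolved_without_handoff) as assumptions about what your platform exports.

```python
import statistics

def monthly_kpis(query_logs):
    """Compute latency, deflection and cost KPIs from per-query log records."""
    n = len(query_logs)
    latencies = sorted(q["latency_ms"] for q in query_logs)
    cuts = statistics.quantiles(latencies, n=100)   # 99 percentile cut points
    total_cost = sum(q["cost_gbp"] for q in query_logs)
    deflected = sum(1 for q in query_logs if q["resolved_without_handoff"])
    return {
        "latency_p50_ms": cuts[49],                 # 50th percentile
        "latency_p95_ms": cuts[94],                 # 95th percentile
        "deflection_rate": deflected / n,
        "cost_per_1000_queries_gbp": 1000 * total_cost / n,
    }
```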

10) A practical 14‑day plan (start Monday)

Week 1 — Foundations

  1. Day 1–2: Inventory the top 5 libraries/folders staff actually use. Identify owners and agree freshness SLAs.
  2. Day 3–4: Connect SharePoint/OneDrive or Google Drive with change‑tracking (Graph delta or Drive Changes).
  3. Day 5: Turn on hybrid retrieval with filters for audience/status. Validate access controls.
  4. Day 6–7: Build your 50‑question test set and baseline metrics.

Week 2 — Tune and evidence

  1. Day 8–9: Improve metadata (owners, status, review by). Remove obvious duplicates/superseded docs.
  2. Day 10–11: Tune hybrid weighting and synonyms for low‑scoring queries.
  3. Day 12: Re‑run evaluation. Capture groundedness and recall improvements with examples.
  4. Day 13–14: Draft a one‑page report for execs with KPIs, cost per 1,000 queries, and next actions.

If you need deeper knowledge management prep, see RAG‑ready in 30 days and our AI content quality tests. To avoid being boxed in later, review Avoid AI lock‑in.

What “good” looks like by end of January

  • Hybrid search live for at least two core libraries with change‑tracking enabled.
  • Median groundedness ≥4, recall@5 ≥85% on your 50‑question test set.
  • Time‑to‑freshness P95 within SLA for each content type.
  • Monthly report shows cost per 1,000 queries and at least two improvements shipped.