From 22 December to 2 January, many UK teams run on skeleton staff while customer demand spikes in retail, hospitality, advice services and fundraising. That combination makes a strong case for a short, deliberate change freeze for AI systems: your chatbots, search, triage assistants and internal copilots. This playbook gives SME and charity leaders a pragmatic 14‑day freeze plan: what to pause, what’s safe to ship, how to cap spend, and how to stay resilient without asking engineers to work through Christmas.
We’ve kept it non‑technical: simple rules, checklists, and KPIs you can review in your weekly ops meeting. Where useful, we link to trusted UK guidance and cloud vendor docs.
Why a change freeze for AI is different
- AI behaviour can change without you changing code (model updates, embeddings refreshes, prompt edits). Freezing brings predictability when staffing is thin.
- Vendors enforce quotas and rate limits. If your seasonal demand surges, you need headroom and throttles ready in advance. Microsoft’s Azure OpenAI quotas and OpenAI’s own rate limits are tiered and can constrain bursts unless pre‑approved. learn.microsoft.com
- Good monitoring and incident routines prevent small issues from becoming outages—this is long‑standing UK government and NCSC advice. security.gov.uk
- Progressive rollouts and canaries reduce blast radius compared with “big bang” changes, a core lesson from site reliability engineering. cloud.google.com
The freeze window: 14 days that keep the lights on
Recommended dates: 20 December 2025 to 2 January 2026, 14 days inclusive (adjust to your sector). Set the freeze dates, tell vendors and partners, and add a banner to your internal change board. The goal is not “no change ever”, but “only low‑risk, pre‑approved changes with instant rollback”.
What is frozen
- New prompts, tools or function‑calling behaviour in production assistants.
- Retrieval changes: re‑indexing, chunking logic, new sources, embeddings model upgrades.
- Switching model versions or providers; latency or temperature tweaks that affect outputs.
- Workflow automation that touches payments, case updates, or customer emails.
Safe exceptions (pre‑approved and reversible)
- Critical security patches and renewals of expiring certificates.
- Configuration toggles that disable features or reduce load (feature flags) with a one‑click rollback.
- Quota increases and capacity reservations with vendor support tickets already open.
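For teams that want the rule in their change tooling, the window and the exception list above could be sketched as a small gate. This is a minimal illustration only: the category labels and function names are hypothetical, and anything not on the pre-approved list is treated as blocked pending leader approval.

```python
from datetime import date

# Freeze window from the text (inclusive); adjust to your sector.
FREEZE_START = date(2025, 12, 20)
FREEZE_END = date(2026, 1, 2)

# Hypothetical labels for the pre-approved exceptions listed above.
SAFE_EXCEPTIONS = {
    "security_patch",
    "certificate_renewal",
    "feature_flag_off",
    "quota_increase",
}


def in_freeze(day: date) -> bool:
    """True if the given day falls inside the freeze window."""
    return FREEZE_START <= day <= FREEZE_END


def change_allowed(day: date, category: str, has_rollback: bool) -> bool:
    """Outside the window anything goes; inside it, only reversible exceptions."""
    if not in_freeze(day):
        return True
    return category in SAFE_EXCEPTIONS and has_rollback


# Length of the window in days, counting both end dates.
freeze_days = (FREEZE_END - FREEZE_START).days + 1
```

A prompt edit on 24 December is blocked by this gate, while a security patch with a tested rollback passes.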
Five guardrails to set before the freeze
1) Capacity headroom
Confirm your AI vendor quotas exceed expected peak by at least 30%. For Azure OpenAI, check tokens‑per‑minute and requests‑per‑minute limits per model and tier; request upgrades if needed. Capture proofs (ticket numbers, approved tiers) in your ops wiki. learn.microsoft.com
If you also run through the OpenAI API, check your account page and response headers for current limits to avoid surprise throttling. platform.openai.com
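The headroom rule is simple arithmetic, and a rough sketch may help whoever checks the numbers. The function names are illustrative. The two header names are the per-minute request headers OpenAI documents at the time of writing; treat them as an assumption and confirm against your own responses.

```python
def required_quota(expected_peak: int, headroom_pct: int = 30) -> float:
    """Quota needed to cover the expected peak plus the headroom percentage."""
    return expected_peak * (100 + headroom_pct) / 100


def has_headroom(quota: int, expected_peak: int, headroom_pct: int = 30) -> bool:
    """True if the vendor quota covers peak demand plus headroom."""
    return quota >= required_quota(expected_peak, headroom_pct)


def remaining_fraction(headers: dict) -> float:
    """Share of the request limit still unused, read from response headers."""
    limit = int(headers["x-ratelimit-limit-requests"])
    remaining = int(headers["x-ratelimit-remaining-requests"])
    return remaining / limit
```

For example, an expected peak of 100,000 tokens per minute needs a quota of at least 130,000 to meet the 30% rule.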
2) Cost caps
Set daily spend alerts and automatic caps at 120% of normal. Align with cloud cost practices: expenditure awareness, demand management and “optimise over time” from AWS’s Well‑Architected cost pillar translate well to AI usage. docs.aws.amazon.com
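As a sketch of how the cap might be checked, the function below compares today's spend with a normal day. It assumes an early alert once spend passes 100% of normal; that threshold is our assumption rather than part of the rule above, and the function name is illustrative.

```python
def spend_status(spend_today: float, normal_daily: float, cap_pct: int = 120) -> str:
    """Classify today's spend against the normal daily figure."""
    cap = normal_daily * cap_pct / 100
    if spend_today >= cap:
        return "cap_reached"  # automatic cap: stop or degrade non-critical usage
    if spend_today >= normal_daily:
        return "alert"        # assumed early warning at 100% of normal
    return "ok"
```

With a normal day of £100, spend of £110 raises an alert and £120 reaches the cap.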
3) Throttling and graceful degradation
Agree how you’ll protect the service if demand spikes: queue requests, shed non‑critical traffic, or temporarily serve faster, cheaper summaries. SRE guidance is clear: use progressive controls to avoid cascading failures. sre.google
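One way to write that agreement down is as a small decision rule keyed on current load as a fraction of quota. This is a sketch under assumed thresholds (0.8 and 1.0 are placeholders, not recommendations), with hypothetical action names.

```python
def degrade_action(load: float, critical: bool,
                   soft: float = 0.8, hard: float = 1.0) -> str:
    """Pick an action from current load (usage divided by quota)."""
    if load < soft:
        return "serve"                           # normal operation
    if load < hard:
        return "serve" if critical else "queue"  # protect critical traffic
    return "queue" if critical else "shed"       # at or over quota
```

Here "shed" would map to the "sorry, we're busy" message, and "queue" to delayed processing once load drops.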
4) Monitoring and alerts
Minimum signals: success rate, average cost per task, median and p95 latency, vendor 4xx/5xx, and deflection to fallback flows. Ensure logs and alerts reach a monitored inbox or on‑call rota; this mirrors UK government monitoring principles. security.gov.uk
5) Incident “battle rhythm”
Define your holiday cadence now: who leads, when updates happen, what the thresholds are to escalate and de‑escalate. Government incident plans call this a battle rhythm—regular, pre‑agreed meetings and situation reports. Keep it lightweight for the freeze. gov.uk
The freeze rulebook (one page you can copy)
Decisions leaders must approve
- Any production change that alters model, prompt, retrieval, or automation behaviour.
- Spend cap increases above the pre‑authorised limit.
- Suspending the freeze due to a critical incident.
Operational checklists
- Freeze banner visible on the change board; vendor tickets for quota increases filed and acknowledged. learn.microsoft.com
- Alerts route to a named person each day; incident WhatsApp/Teams room prepared with a short contact list. UK guidance for cloud tools reminds teams to use corporate SSO and 2FA. gov.uk
- Fallbacks tested: cached answers for frequent FAQs; “sorry, we’re busy” message and email handover for non‑urgent queries.
- Rollbacks rehearsed for the last two changes shipped pre‑freeze.
KPIs to watch daily (15 minutes)
- Reliability: success rate ≥ 98%; vendor error rate steady or falling.
- Cost: cost per resolved task within 15% of last week; spend under the daily cap. docs.aws.amazon.com
- Speed: p95 response time within 20% of baseline.
- Human handover: escalations to staff stay at or below the agreed threshold.
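If your metrics already land in a spreadsheet or dashboard export, the daily review can be approximated with a few comparisons. The sketch below assumes hypothetical field names, and it treats a cost swing of more than 15% in either direction as worth a look.

```python
def kpi_breaches(today: dict, baseline: dict) -> list:
    """Return the names of KPIs outside the thresholds listed above."""
    breaches = []
    if today["success_rate"] < 0.98:
        breaches.append("reliability")
    # Cost per resolved task should stay within 15% of last week's figure.
    cost_delta = abs(today["cost_per_task"] - baseline["cost_per_task"])
    if cost_delta > 0.15 * baseline["cost_per_task"]:
        breaches.append("cost")
    # p95 latency should stay within 20% of baseline.
    if today["p95_latency_s"] > 1.20 * baseline["p95_latency_s"]:
        breaches.append("speed")
    if today["escalations"] > baseline["max_escalations"]:
        breaches.append("handover")
    return breaches
```

An empty list means nothing to discuss; anything else goes on the agenda for the 15-minute review.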
Who does what over the holidays
| Role | Responsibilities (freeze period) | When to escalate |
|---|---|---|
| Service owner | Owns the freeze; approves exceptions; daily KPI review; liaises with vendors for quotas and incidents. | Any KPI breach for 2 consecutive days; quota tickets rejected. learn.microsoft.com |
| On‑call lead | Responds to alerts; triggers runbook; posts situational updates in the agreed battle rhythm. gov.uk | Multiple alerts per shift or unresolved incident after 60 minutes. SRE guidance suggests keeping incident load low to maintain quality of response. sre.google |
| Support/ops | Handles customer updates; switches to fallbacks; coordinates temporary throttling or queueing. sre.google | Backlog over threshold or customer‑visible degradation. |
| Legal/DPO or trustee (charities) | On standby for serious incident decisions and external notifications; align with Charity Commission guidance for reporting. gov.uk | Confirmed data exfiltration or suspected fraud. |
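Two of the escalation triggers in the table are mechanical enough to sketch: a KPI breached on 2 consecutive days, and an incident still unresolved after 60 minutes. The function names and inputs are illustrative.

```python
def escalate_kpi(daily_breached: list, days: int = 2) -> bool:
    """True if the most recent `days` daily reviews all recorded a breach."""
    return len(daily_breached) >= days and all(daily_breached[-days:])


def escalate_incident(minutes_open: int, resolved: bool, limit: int = 60) -> bool:
    """True if an incident is still open once the time limit has passed."""
    return not resolved and minutes_open >= limit
```

The list passed to `escalate_kpi` holds one True or False per day, oldest first.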
What you can still improve during a freeze
You don’t have to stop progress; just favour low‑risk improvements that reduce toil or cost and are easy to undo.
- Improve answer quality by curating safe, static snippets for the top 20 FAQs rather than changing retrieval or prompts.
- Reduce cost by routing non‑urgent jobs to batch at off‑peak hours and using cheaper models for low‑stakes tasks, within your existing quotas. docs.aws.amazon.com
- Practise incident table‑top exercises for one hour: a throttle drill, a vendor‑quota rejection scenario, and a “roll back now” drill. This is standard SRE advice. sre.google
If you’ve not yet formalised runbooks and SLOs, make that a January task; our earlier posts walk through the details and templates.
Incident playbook: keep it boring
If something does go wrong, your aim is to keep incidents small and boring.
Step‑by‑step
- Page on‑call lead; acknowledge within 5 minutes.
- Switch to safe mode: throttle, queue or fall back to cached answers; inform customer support of expected impact and timeframe. sre.google
- Post a short situation report in the incident room: what we see, who’s on it, next update time. Government guidance recommends a consistent cadence for updates during incidents. gov.uk
- If a serious incident involves a charity, follow the Charity Commission’s guidance on reporting and triage; keep an audit trail. gov.uk
- Stabilise first, root cause later. SRE practice encourages automating mitigations to reduce user impact before deep diagnosis. sre.google
- Close with a brief review and a dated action list for January.
Vendor realities over Christmas
- Quota increases can take time. File requests now, include expected tokens‑per‑minute and requests‑per‑minute by model, and ask for temporary holiday uplift. Azure OpenAI documents the tiers and token limits clearly—know which you’re on. learn.microsoft.com
- If you use multiple providers, avoid global switches during the freeze. Prefer canary routing and ensure cost caps apply on both sides. OpenAI’s own guidance notes limits and reset timing in headers—monitor them. platform.openai.com
- Keep at least one region or deployment untouched so you have a known‑good fallback, mirroring cloud change‑isolation practices. docs.cloud.google.com
Security basics you should not pause
Security doesn’t take holidays. Even with a freeze, continue:
- Applying critical patches, enforcing SSO and 2FA, and using corporate accounts for SaaS administration—UK government advice for safe cloud tool use. gov.uk
- Monitoring and log review at least once per day—this underpins detection and response. security.gov.uk
- For charities, refreshing staff and volunteer awareness of phishing and fraud reporting flows in case of seasonal scams. gov.uk
January reset: how to restart change safely
- Unfreeze with intent: hold a one‑hour review of the period; confirm KPIs and any debt you took on.
- Post‑mortems first: write up any incidents and the decisions taken. Assign owners and dates.
- Batch small changes: release in small slices with canaries; avoid global changes until the team is fully back. cloud.google.com
- Re‑baseline costs: compare holiday spend vs. forecast; address any cap breaches using the Well‑Architected cost lens. docs.aws.amazon.com
- Capacity housekeeping: remove temporary uplifts you no longer need; document what stayed helpful. learn.microsoft.com
When you’re ready to accelerate again, consider our playbooks on cost‑of‑serve dashboards and AI load testing and capacity planning to keep momentum without surprises.
Procurement and governance questions to ask this week
- What model versions are we on, and are any due to auto‑update during the freeze?
- What are our current quotas by model and region? Are uplift tickets filed and confirmed? learn.microsoft.com
- Where are our spend caps and alerts set? Who gets the notifications? docs.aws.amazon.com
- Do we have a daily incident battle rhythm and a one‑page runbook accessible to support staff? gov.uk
- For charities: who will report serious incidents if needed, and how quickly? gov.uk