Cloudflare had two major outages back-to-back in late 2025. November, then December. Their response was something they'd never done before: a company-wide "Code Orange." All non-critical changes stopped. Every engineering team shifted to resilience work until the root causes were gone.
They're not the only ones. If you look at how the most reliable companies handle incidents, you'll find the same instinct everywhere: stop shipping until you understand what's broken.
GitHub froze production deployments for three days during their February 2021 service disruptions. Three days. Google's SRE team gates what can ship and when during launches and critical periods. Roblox froze everything during their 73-hour outage — a cascading failure that started with a routine change. Datadog gives incident commanders explicit authority to halt deploys as part of their incident process.
None of this is controversial. When production is on fire, you don't want someone merging a "quick fix" that introduces a second problem.
And yet, most teams still handle this the same way: someone posts in Slack. "Hey, we're having an incident — please don't merge anything." Then everyone hopes the right people see it.
If you've been on-call, you know how this goes. The engineer in a different timezone merges something ten minutes after the message. The new hire doesn't know the protocol. CI pipelines keep running because nobody told the robots. And when the postmortem happens, nobody can actually tell you what got merged during the incident window because there's no record.
The worst part is the first 15 minutes. The on-call is still figuring out what's happening. Leadership might not even know yet. And during that window, other engineers — who have no idea anything is wrong — are merging PRs and deploying changes. By the time someone remembers to post in Slack, the blast radius may already be bigger than it needed to be.
The fix is pretty simple in concept: connect your incident management to your merge workflow. PagerDuty fires a P1, your repos get locked automatically. No Slack message, no human in the loop, no gap.
That's what we built Frost to do. The flow looks like this:

1. PagerDuty triggers a P1 incident and fires a webhook.
2. Frost receives it and flips a required status check to failing on your protected repos.
3. GitHub's merge button locks on every open PR.
4. When the incident resolves, the check goes back to passing and merges resume.
The whole thing takes seconds. No one needs to remember to post in Slack. No one needs to be awake.
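The core of this flow is a small mapping: incident webhook in, commit status out. Here's a minimal sketch of that mapping in Python. The payload shapes are based on PagerDuty's v3 webhook format, and the `frost/incident-lock` context name is illustrative, not Frost's actual API:

```python
def status_for_incident(pagerduty_event: dict) -> dict:
    """Map a PagerDuty webhook event to the GitHub commit status
    an incident-lock tool would set on protected repos.

    Illustrative sketch: the context name and the decision to key on
    P1 priority are assumptions, not Frost's real behavior.
    """
    event = pagerduty_event.get("event", {})
    incident = event.get("data", {})
    triggered = event.get("event_type") == "incident.triggered"
    is_p1 = incident.get("priority", {}).get("summary") == "P1"

    if triggered and is_p1:
        return {
            # A failing *required* status check disables GitHub's merge button.
            "state": "failure",
            "context": "frost/incident-lock",
            "description": f"Merges frozen: {incident.get('title', 'active incident')}",
        }
    return {
        "state": "success",
        "context": "frost/incident-lock",
        "description": "No active incident",
    }
```

Because the lock rides on a required status check rather than a deploy script, it composes with whatever branch protection rules you already have.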
Incident protection is the most time-sensitive use case, but once you have the merge-blocking mechanism in place, other things become easy too. We use the same status check approach for daily protection windows (block merges overnight when nobody's around to deal with breakages), scheduled protection (holidays, launches, on-call transitions), and override labels for when something genuinely needs to go out during a protected period. The override gets logged, so your postmortems actually have a paper trail.
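Those other use cases all reduce to the same yes/no decision the status check answers. A sketch of that decision logic, where the window boundaries, the `frost-override` label name, and the function itself are assumptions for illustration rather than Frost's real configuration schema:

```python
from datetime import datetime, time

# Illustrative values, not Frost's actual defaults.
FREEZE_START = time(22, 0)        # overnight window opens at 22:00...
FREEZE_END = time(7, 0)           # ...and closes at 07:00 the next morning
OVERRIDE_LABEL = "frost-override" # hypothetical escape-hatch label

def merge_allowed(now: datetime, pr_labels: set[str], incident_active: bool) -> bool:
    """Decide whether the required status check should pass for a PR."""
    if incident_active:
        # The override label still works mid-incident, but its use is logged.
        return OVERRIDE_LABEL in pr_labels
    t = now.time()
    in_overnight_window = t >= FREEZE_START or t < FREEZE_END  # window wraps midnight
    if in_overnight_window:
        return OVERRIDE_LABEL in pr_labels
    return True
```

Centralizing the decision in one function is what makes the audit trail cheap: every call is a loggable event, so the postmortem can list exactly which PRs were overridden and when.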
Cloudflare, Google, and GitHub all do some version of this. The difference is they use Slack messages, manual deploy locks, and company-wide memos. Frost does it with a GitHub status check and a PagerDuty webhook.
It takes a few minutes to set up and works with your existing GitHub merge rules. Free for public repos.
Install Frost on GitHub