Building Ominvo · Day 61

Day 61: Why your auto-rollback should ignore Stripe outages

June 25, 2026 · 4 min read

Yesterday we wired up Sentry and a health check endpoint. Today we made the site auto-heal when our code breaks. Took three versions to get right — and the bug we almost shipped was the kind that gets you a 3 AM PagerDuty page.

Here's what we built and the lesson it took us three tries to learn.

What we shipped

A GitHub Actions cron job that hits a health endpoint every 5 minutes. If three consecutive checks fail, it rolls back to the previous deployment automatically. Self-healing infrastructure without a $25/month monitoring SaaS, because we're pre-launch and every dollar matters.
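The three-strikes rule can be sketched as a small pure function. This is a minimal sketch, not our actual workflow: the names MonitorState, recordCheck, and FAILURE_THRESHOLD are made up for illustration, and how the count is stored between cron runs is left out.

```typescript
// Hypothetical sketch: track consecutive failed checks across cron runs.
// The threshold of 3 comes from the post; every name here is an assumption.
const FAILURE_THRESHOLD = 3;

interface MonitorState {
  consecutiveFailures: number;
}

interface CheckOutcome {
  state: MonitorState;
  shouldRollBack: boolean;
}

// Fold one check result into the running state.
// A single success resets the streak, so only back-to-back failures count.
function recordCheck(state: MonitorState, checkPassed: boolean): CheckOutcome {
  const consecutiveFailures = checkPassed ? 0 : state.consecutiveFailures + 1;
  return {
    state: { consecutiveFailures },
    shouldRollBack: consecutiveFailures >= FAILURE_THRESHOLD,
  };
}
```

Keeping the rule pure makes it easy to test apart from the part that actually triggers a rollback.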

If we push a bad deploy at 2 AM, by 2:06 AM the site is back on yesterday's working code. We sleep. Nobody on a customer team gets a Monday morning surprise.

Version 1 — the obvious version that would have broken us

First pass was the version every tutorial shows you. Health check fails, roll back. Done.

Then I sat with it for an hour and thought through what actually happens during a third-party outage.

Stripe has an outage. They do — they had one in March 2026. Our /api/health endpoint checks Stripe as part of its routine. During the outage, our health check returns 503. Auto-rollback fires. We roll back to yesterday's deployment.

But yesterday's deployment also calls Stripe. It's also broken. We've just replaced the new version of our app with an old one, for a problem the rollback can't fix. When Stripe eventually recovers, we're running stale code we never meant to go back to. And we have no automated way to roll forward.
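A toy model makes the failure mode concrete. It assumes a deep health check passes only when both our code and Stripe are healthy. The Deployment type and deepHealthPasses function are hypothetical, not our real endpoint.

```typescript
// Hypothetical model: a deep check passes only if our code AND Stripe are healthy.
interface Deployment {
  name: string;
  codeIsHealthy: boolean;
}

function deepHealthPasses(deployment: Deployment, stripeIsUp: boolean): boolean {
  return deployment.codeIsHealthy && stripeIsUp;
}

const today: Deployment = { name: "today", codeIsHealthy: true };
const yesterday: Deployment = { name: "yesterday", codeIsHealthy: true };

// During a Stripe outage both deployments fail the same check,
// so swapping one for the other cannot make the check pass.
const stripeIsUp = false;
console.log(deepHealthPasses(today, stripeIsUp)); // false
console.log(deepHealthPasses(yesterday, stripeIsUp)); // false
```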

This is the third-party dependency chain problem every SaaS has had to learn the hard way since the 2025 AWS and Cloudflare outages. Auto-rollback that doesn't know the difference between our bug and the internet's bug makes outages worse, not better.

Version 2 — three-layer confirmation

Second version added confirmation. Don't roll back on the first 503. Wait 5 minutes, check again. Wait one more minute, check a third time. Only roll back if all three checks fail.

This cut false alarms from transient blips. But the actual problem stayed. A sustained Stripe outage of 7+ minutes still fails all three checks and still rolls back to yesterday's equally-broken code.
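The timing can be checked with a little arithmetic. This sketch assumes the three checks land 0, 5, and 6 minutes after the first failure, following the waits described above. The function name and the half-open outage window are illustrative choices.

```typescript
// Offsets (in minutes) of the three checks, measured from the first failed check.
// 0, 5, and 6 follow the "wait 5 minutes, then one more minute" schedule.
const CHECK_OFFSETS_MIN = [0, 5, 6];

// True if an outage spanning [outageStartMin, outageEndMin) covers all three checks.
function outageTriggersRollback(
  outageStartMin: number,
  outageEndMin: number,
  firstCheckMin: number,
): boolean {
  return CHECK_OFFSETS_MIN.every((offset) => {
    const checkTime = firstCheckMin + offset;
    return checkTime >= outageStartMin && checkTime < outageEndMin;
  });
}

// A 2-minute blip: the second and third checks land after recovery.
console.log(outageTriggersRollback(0, 2, 1)); // false
// A 7-minute outage: checks at 0.5, 5.5, and 6.5 all land inside it.
console.log(outageTriggersRollback(0, 7, 0.5)); // true
```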

The misfire was rarer, not gone.

Version 3 — shallow versus deep

The fix wasn't more retries. It was splitting health checks into two separate things.

/api/ping does one thing — confirms our Next.js app is alive. No database call. No Stripe. No Anthropic. Just returns 200 if our code is responding.
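A shallow endpoint like this can be very small. The sketch below assumes the Next.js App Router. The file path, the response body, and the no-store header are illustrative choices, not a copy of our route.

```typescript
// app/api/ping/route.ts (hypothetical path, assuming the Next.js App Router)

// Ask Next.js to run the handler on every request instead of caching the result.
export const dynamic = "force-dynamic";

// Shallow check: no database, no Stripe, no Anthropic.
// If this handler runs at all, the app is up.
export function GET(): Response {
  return new Response(JSON.stringify({ status: "ok" }), {
    status: 200,
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": "no-store",
    },
  });
}
```

Disabling caching matters here: a cached 200 would keep reporting "alive" after the app has stopped responding.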

/api/health keeps doing the full check across every service.
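The deep check boils down to "probe everything, report anything that failed." Here is a minimal sketch of the reporting half, assuming each probe has already produced a healthy/unhealthy result. DependencyResult, HealthReport, and summarizeHealth are hypothetical names, and the probes themselves are left out.

```typescript
// Result of probing one dependency; the probes themselves are out of scope here.
interface DependencyResult {
  name: string; // e.g. "database", "stripe", "anthropic"
  healthy: boolean;
}

interface HealthReport {
  httpStatus: 200 | 503;
  failing: string[];
}

// Any single failing dependency turns the whole report into a 503.
function summarizeHealth(results: DependencyResult[]): HealthReport {
  const failing = results.filter((r) => !r.healthy).map((r) => r.name);
  return { httpStatus: failing.length === 0 ? 200 : 503, failing };
}
```

Listing the failing names in the response makes the alert say which dependency degraded, not just that something did.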

Auto-rollback only watches /api/ping. If our app is alive, no rollback fires, even if every third party in the world is down. /api/health keeps its job too — it feeds Sentry and UptimeRobot, so we still get an email if anything degrades. We just don't let a Stripe outage trigger destructive action on our own deploy.
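The split comes down to one decision rule. A minimal sketch, assuming the two signals have already been collected; the Signals shape and the action names are made up for illustration.

```typescript
type Action = "roll_back" | "alert_only" | "none";

interface Signals {
  pingConfirmedDown: boolean; // /api/ping failed every confirmation check
  healthDegraded: boolean; // /api/health returned a non-200 status
}

// Only a dead app justifies a destructive action.
// A degraded dependency is worth an alert, never a rollback.
function chooseAction(signals: Signals): Action {
  if (signals.pingConfirmedDown) return "roll_back";
  if (signals.healthDegraded) return "alert_only";
  return "none";
}

// Stripe is down but our app is responding: alert, don't roll back.
console.log(chooseAction({ pingConfirmedDown: false, healthDegraded: true })); // "alert_only"
```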

This is the shallow versus deep health check pattern. It's what many SaaS teams converge on after their first bad outage. We just built it before having one, because being woken up at 3 AM by a broken auto-healing system would be a worse story than the original outage.

Why this matters at zero customers

Building reliability infrastructure when you have no customers feels like overkill. It's not.

When a real customer hits a broken deploy, the trust you lose is permanent. The cost of building this now is my own evening. The cost of building it after the first outage is somebody's first impression of Ominvo.

That trade is easy.

Day 61 of building Ominvo. Auto-rollback live. Sleep slightly better tonight.

Full changelog entry on the changelog. Yesterday's Sentry work here. The dashboard everyone's reliability is protecting here.

Tagged

#reliability #building in public #monitoring

Written by

The founder of Ominvo

Building review management for single-location small businesses.