Coinbase reviews its May outage: AWS cascading failure exposes architectural risks

By: rootdata|2026/06/01 21:42:00

Coinbase has released a retrospective report on the large-scale service outage of May 7, 2026.

The outage lasted approximately 8 hours, with full recovery taking about 12 hours. During this time, trading, deposits, withdrawals, and most other core services were unavailable or severely degraded. According to Coinbase, multiple cooling units failed simultaneously at a data center in one availability zone (use1-az4) of the AWS us-east-1 region. The failure triggered thermal-protection shutdowns of server racks, which took EC2 instances and EBS volumes offline and affected multiple internet services.

During recovery, Coinbase's trade matching engine lost quorum: its cluster was deployed in a single AWS data center, and most of the nodes went down. Restoring operation required emergency code changes and building a new node group, after which market trading was restarted gradually.
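To illustrate why losing most nodes halts a quorum-based cluster, here is a minimal Python sketch. It assumes a simple majority rule of the kind used by consensus protocols such as Raft. The report as summarized here does not name the protocol or the cluster size, so the five-node figure and the function names are hypothetical.

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, healthy_nodes: int) -> bool:
    """True if enough nodes are healthy to keep committing decisions."""
    return healthy_nodes >= majority_quorum(cluster_size)


# Hypothetical 5-node cluster (an assumption, not a figure from the report).
CLUSTER_SIZE = 5
for healthy in range(CLUSTER_SIZE, -1, -1):
    state = "quorum" if has_quorum(CLUSTER_SIZE, healthy) else "NO quorum"
    print(f"{healthy}/{CLUSTER_SIZE} healthy -> {state}")
```

Under this rule, a cluster whose nodes all sit in one data center loses quorum as soon as that site takes down more than half of them, which is consistent with the failure described above.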

In addition, the control plane of AWS's managed Kafka service (MSK) failed, which prevented partition leaders from being re-elected automatically. This further blocked quotes, fees, and some settlement and data-flow systems, widening the overall impact.

The systems gradually returned to normal after partitions were migrated manually in collaboration with the AWS engineering team. Coinbase said the incident exposed shortcomings in its automatic failover across availability zones and in its disaster recovery for managed middleware. The company plans to upgrade its cross-region hot-standby architecture, run regular failure drills more rigorously, and migrate its Kafka deployment from two availability zones to three, while working with AWS on root-cause fixes and improvements.
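The following Python sketch gives a rough sense of why spreading Kafka replicas over three availability zones instead of two can help. It checks whether a partition would still have enough replicas to accept writes after any single zone is lost. The zone names, the replication factor of 3, and the `min.insync.replicas` value of 2 are assumptions for illustration and are not Coinbase's actual settings. The sketch also does not model the control-plane failure described above.

```python
from collections import Counter


def survives_any_single_az_loss(replica_azs, min_insync_replicas):
    """Check whether enough replicas remain after losing any one zone.

    replica_azs lists the zone of each replica of one partition.
    """
    per_az = Counter(replica_azs)
    total = len(replica_azs)
    # Losing a zone removes every replica placed in it.
    return all(total - lost >= min_insync_replicas for lost in per_az.values())


# Hypothetical placements for a partition with three replicas.
two_az = ["az-a", "az-a", "az-b"]
three_az = ["az-a", "az-b", "az-c"]

print(survives_any_single_az_loss(two_az, 2))    # False: losing az-a leaves 1 replica
print(survives_any_single_az_loss(three_az, 2))  # True: any loss leaves 2 replicas
```

With only two zones, at least one zone must hold two of the three replicas, so losing it drops the partition below the assumed minimum. With three zones, each zone can hold exactly one replica.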
