Intermediate | 9 min read | Updated 2026-06-08

Circuit Breaker Pattern

Without circuit breakers, a failing downstream service can cascade failures throughout your system.

The Core Idea

The circuit breaker pattern prevents a service from repeatedly calling a failing dependency. Like an electrical circuit breaker, it 'trips' open after detecting failures, rejecting requests immediately instead of wasting time and resources on calls that will fail.

Step-by-Step Walkthrough

Service A calls Service B through a circuit breaker. Initially, the circuit is Closed — all requests pass through. If 5 out of 10 requests to Service B fail, the circuit Opens. All subsequent requests to B fail immediately (no network call). After 30 seconds, the circuit moves to Half-Open — the next request passes through. If it succeeds, the circuit Closes. If it fails, it Opens again.
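
The walkthrough above can be sketched as a small state machine. This is a minimal, single-threaded illustration rather than a production implementation: the class and method names are hypothetical, the "5 out of 10" rule is read as "5 failures among the last 10 calls", and the clock is injectable so the 30-second wait can be simulated.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the dependency."""


class CircuitBreaker:
    # All names and defaults here are illustrative assumptions.
    def __init__(self, window=10, max_failures=5, open_seconds=30.0,
                 clock=time.monotonic):
        self.results = deque(maxlen=window)  # True = failed call, newest last
        self.max_failures = max_failures
        self.open_seconds = open_seconds
        self.clock = clock
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("circuit is open")  # fail fast, no network call
            self.state = "half_open"  # timeout elapsed: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        if self.state == "half_open":
            self.state = "closed"  # trial call succeeded: resume normal traffic
            self.results.clear()
        else:
            self.results.append(False)

    def _on_failure(self):
        if self.state == "half_open":
            self._trip()  # trial call failed: open again
            return
        self.results.append(True)
        if sum(self.results) >= self.max_failures:
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.results.clear()
```

A caller would wrap each request to Service B as `breaker.call(fetch_from_b, request)`, where `fetch_from_b` stands in for whatever client function is in use. A real implementation would also need locking so that concurrent callers do not all slip through in the Half-Open state.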

Why This Approach Wins

Three states: Closed (normal — requests pass through), Open (tripped — requests fail immediately), Half-Open (testing — a few requests pass to check if the dependency recovered).
Failure threshold: The circuit opens after N consecutive failures or a failure rate exceeding X% within a time window.
Timeout: Open circuits automatically transition to Half-Open after a timeout (e.g., 30 seconds), allowing test requests.
Fallback: When the circuit is open, return a fallback response (cached data, a default value, or a clear error message) instead of letting the raw failure reach the user.
Per-dependency: Each downstream service should have its own circuit breaker. A failing payment service should not affect the search service.
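
The fallback and per-dependency points in the list above can be illustrated together. The sketch below keeps one breaker per dependency name and returns a fallback whenever a call is rejected or fails. `SimpleBreaker`, `breakers`, and `call_with_fallback` are hypothetical names, and the breaker is deliberately reduced to a consecutive-failure counter with no recovery timer.

```python
from collections import defaultdict


class SimpleBreaker:
    """Opens after `threshold` consecutive failures (no recovery timer here)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    def record(self, success):
        # A success resets the streak; a failure extends it.
        self.failures = 0 if success else self.failures + 1


# One independent breaker per dependency, created on first use.
breakers = defaultdict(SimpleBreaker)


def call_with_fallback(dependency, func, fallback):
    breaker = breakers[dependency]
    if breaker.is_open:
        return fallback()  # rejected immediately, dependency is not called
    try:
        result = func()
    except Exception:
        breaker.record(success=False)
        return fallback()
    breaker.record(success=True)
    return result
```

Because the registry is keyed by dependency name, three failures recorded under `"payment"` leave the breaker stored under `"search"` untouched.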

In Production

Netflix Hystrix (now in maintenance mode) popularized circuit breakers in microservices, protecting against cascading failures across hundreds of services.

Resilience4j is the modern Java circuit breaker library, used in Spring Boot applications.

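Envoy proxy implements circuit breaking at the service mesh level: per-cluster limits on connections, pending requests, and retries cap the load sent to an upstream service, with no changes to application code.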

Tradeoffs and Limitations

Protection vs Availability: Opening the circuit protects the system but makes the dependency completely unavailable (even for requests that might succeed).
Threshold sensitivity: Too sensitive = false trips on transient errors. Too lenient = slow to protect against real failures.
Fallback quality: A good fallback (cached data) maintains user experience. A bad fallback (empty response) confuses users.

Production Gotchas

Not implementing circuit breakers at all — cascading failures bring down the entire system
Using a single circuit breaker for all dependencies — one failing service trips the breaker for healthy ones
Not providing a useful fallback — the circuit opens and users see cryptic errors

The Interview Angle

What is the circuit breaker pattern?
What are the three states of a circuit breaker?
How does a circuit breaker prevent cascading failures?
What fallback strategies can you use when the circuit is open?

The Real-World Incident That Made This Famous

Netflix's creation of Hystrix in 2011 was born from a production nightmare. During a holiday traffic spike, Netflix's recommendation service experienced elevated latency (from 50ms to 5 seconds). Every microservice that called the recommendation service had threads waiting for responses. Those threads were tied up for seconds instead of milliseconds, and the thread pools quickly exhausted. Since the same servers handled other requests too, the latency cascaded: the browsing service slowed down, then the search service, then the homepage service. Within minutes, the entire Netflix platform was degraded — not because the recommendation service was down, but because it was slow.

This is the insidious nature of cascading failures: a slow dependency is often worse than a dead one. A dead service usually fails fast (connection refused or an immediate error). A slow service ties up resources (threads, connections, memory) while callers wait. It is like a traffic accident that does not block the road completely but reduces it to one lane: traffic backs up for miles.
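
A back-of-the-envelope calculation shows why slowness exhausts thread pools. By Little's law, the average number of in-flight requests equals the arrival rate multiplied by the time each request takes. The request rate and pool size below are assumed figures for illustration only; the 50 ms and 5 s latencies come from the story above.

```python
def threads_in_use(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average in-flight requests = arrival rate x time per request."""
    return requests_per_second * latency_seconds


POOL_SIZE = 200  # assumed thread pool size
RATE = 100.0     # assumed requests per second to the slow dependency

healthy = threads_in_use(RATE, 0.050)  # about 5 threads busy at 50 ms latency
slow = threads_in_use(RATE, 5.0)       # 500 threads needed at 5 s latency

print(f"healthy: {healthy:.0f} threads, slow: {slow:.0f} threads, pool: {POOL_SIZE}")
```

Under these assumptions the same traffic that needed about 5 threads now needs 500, well beyond a 200-thread pool, so requests unrelated to the slow dependency start queueing as well.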

Netflix's solution was Hystrix, a circuit breaker library. When the recommendation service error rate exceeded 50% over a 10-second window, Hystrix "opened" the circuit. All subsequent calls to the recommendation service were immediately rejected (fail fast) without even trying. Instead, a fallback was served: generic recommendations based on overall popularity instead of personalized ones. After 30 seconds, Hystrix would allow one test request through ("half-open" state). If it succeeded, the circuit closed and normal traffic resumed. If it failed, the circuit stayed open for another 30 seconds.
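
The "error rate over a time window" rule described above can be sketched as follows. This is not Hystrix's code, only an illustration of the idea with hypothetical names. The `min_calls` guard is an added assumption that is not in the text: it keeps a handful of early failures from tripping the circuit when there is too little traffic to judge.

```python
import time
from collections import deque


class RollingFailureRate:
    def __init__(self, window_seconds=10.0, threshold=0.5, min_calls=20,
                 clock=time.monotonic):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_calls = min_calls  # assumption: minimum sample size before tripping
        self.clock = clock
        self.events = deque()       # (timestamp, failed) pairs, oldest first

    def record(self, failed: bool) -> None:
        self.events.append((self.clock(), failed))

    def _evict_old(self) -> None:
        cutoff = self.clock() - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def should_open(self) -> bool:
        self._evict_old()
        total = len(self.events)
        if total < self.min_calls:
            return False
        failures = sum(1 for _, failed in self.events if failed)
        return failures / total > self.threshold  # "exceeded 50%": strictly greater
```

A breaker would call `record` after every request and consult `should_open` to decide whether to trip.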

Hystrix was so successful that Netflix open-sourced it, and it became a widely used reference implementation of the pattern. Hystrix itself is now in maintenance mode, and its README points new projects to Resilience4j, but the pattern it popularized is available in most service meshes and modern microservices frameworks.

How Senior Engineers Think About This

Think of a circuit breaker like an electrical circuit breaker in your house. When there is a power surge, the breaker trips to protect your appliances. Without it, the surge would fry everything. In software, the "surge" is a failing downstream dependency, and the "appliances" are the threads, connections, and resources in your service.

The three states are simple: Closed (normal operation, requests pass through), Open (dependency is failing, requests are immediately rejected with a fallback), and Half-Open (testing if the dependency has recovered by allowing a small number of requests through).

Senior engineers configure three key parameters. Failure threshold: what percentage of failures triggers the circuit to open (typically 50%). Window size: over what time period you measure failures (typically 10-30 seconds). Recovery timeout: how long the circuit stays open before testing with a half-open request (typically 30-60 seconds). Getting these right requires tuning in production — too sensitive and the circuit opens on normal latency spikes, too insensitive and cascading failures spread before the circuit trips.

The most important design decision is the fallback strategy. When the circuit is open, what do you return? Options include: cached data (serve the last known good response), default values (show generic recommendations), a degraded experience (show the page without the failing component), or an error message. The best fallback is invisible to the user — they get a slightly less personalized experience but do not see an error page.
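
One way to order these fallback options is "last known good value first, default second". The sketch below assumes `fetch` is a hypothetical function that raises when the call fails or when a circuit breaker rejects it; the in-memory dictionary stands in for a real cache.

```python
_last_good = {}  # stand-in for a real cache such as Redis


def fetch_with_fallback(key, fetch, default):
    try:
        value = fetch(key)
    except Exception:
        # Covers both a failed call and a call rejected by an open circuit.
        # Prefer the last good response; fall back to the default otherwise.
        return _last_good.get(key, default)
    _last_good[key] = value  # remember the latest successful response
    return value
```

For the recommendations example, `default` could be a popularity-based list, so a user with no cached entry still sees something reasonable.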

Common Interview Mistakes

Mistake 1: Not explaining the three states. Always describe Closed, Open, and Half-Open. Many candidates just say "it stops calling the failing service" without explaining the recovery mechanism.

Mistake 2: Confusing circuit breaker with retry. Retries try again after a failure. Circuit breakers stop trying when failures are systemic. They complement each other: retry for transient failures, circuit breaker for sustained failures.
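
A small sketch of how the two can be combined, under the assumption that retries run inside the breaker and only the final outcome counts against it. The names `ConsecutiveFailureBreaker` and `call_with_retry` are hypothetical, and the breaker omits the recovery timer to keep the focus on the composition.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call outright."""


class ConsecutiveFailureBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def allow(self):
        return self.failures < self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1


def call_with_retry(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    if not breaker.allow():
        # Sustained failure: do not retry at all.
        raise CircuitOpenError("dependency marked unhealthy")
    for attempt in range(attempts):
        try:
            result = func()
        except Exception:
            if attempt == attempts - 1:
                breaker.record(success=False)  # retries exhausted: one failure
                raise
            sleep(base_delay * 2 ** attempt)   # back off before the next try
        else:
            breaker.record(success=True)
            return result
```

A transient blip is absorbed by the retry loop and never reaches the breaker, while repeated exhausted retries open the circuit and stop further attempts.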

Mistake 3: Not discussing fallback strategies. Opening the circuit is only half the solution. The other half is what you serve instead. Always have a plan for degraded operation.

Mistake 4: Forgetting about cascading circuit breakers. If Service A calls Service B calls Service C, and C fails, both A and B need circuit breakers. Discuss how to propagate failure information up the call chain.

Mistake 5: Not mentioning bulkheads. Circuit breakers are often paired with the bulkhead pattern: isolating dependencies into separate thread pools so that a slow dependency cannot exhaust all threads. Mentioning bulkheads shows depth.
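
A bulkhead can be sketched with a semaphore instead of a dedicated thread pool. That is a lighter variant of the isolation described above, not the same mechanism, but the effect is similar: each dependency gets a fixed number of concurrent slots, and callers are rejected rather than queued when the slots are gone. `Bulkhead` and `BulkheadFullError` are hypothetical names.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when all slots for a dependency are already in use."""


class Bulkhead:
    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: reject immediately instead of tying up the caller.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot for this dependency")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

Giving the recommendation client its own `Bulkhead(20)`, for example, would cap it at 20 concurrent calls no matter how slow it becomes, leaving the remaining threads free for other work.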

Production Checklist

Implement circuit breakers on every outbound call to external services or databases
Configure failure thresholds based on observed error rates — start with 50% failures over a 10-second window
Define meaningful fallback responses for every circuit: cached data, default values, or graceful degradation
Monitor circuit breaker state changes and alert on open circuits — an open circuit means a dependency is unhealthy
Pair circuit breakers with bulkheads (isolated thread pools) to prevent one slow dependency from consuming all resources
Implement circuit breakers at the client library level so all callers benefit, not just one endpoint
Use exponential backoff for the recovery timeout so you do not hammer a recovering service
Log all circuit state transitions with the failure reason for post-incident analysis
Test circuit breaker behavior with chaos engineering: inject latency or errors into a dependency and verify the circuit opens
Set the half-open request count to 1-3 so recovery testing does not overwhelm a healing service
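
The exponential backoff item in the checklist above could look like the following. The function name, the 30-second base, and the 5-minute cap are assumptions chosen for illustration: each consecutive failed recovery attempt doubles the time the circuit stays open, up to the cap.

```python
def recovery_timeout(consecutive_trips: int,
                     base_seconds: float = 30.0,
                     max_seconds: float = 300.0) -> float:
    """Seconds to stay open after the Nth consecutive trip (N starts at 1)."""
    if consecutive_trips < 1:
        raise ValueError("consecutive_trips must be at least 1")
    return min(max_seconds, base_seconds * 2 ** (consecutive_trips - 1))
```

The trip counter would be reset to zero once a half-open test request succeeds and the circuit closes.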

Read the original source | Content from System-Design-Overview

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.
