How to Design Reliable APIs
An API is a contract between systems. When that contract fails — due to network issues, overloaded servers, or bugs — the consequences range from degraded user experience to financial losses. Reliable API design is not about preventing all failures; it is about ensuring the system behaves predictably when failures occur.
Idempotency: Safe Retries
Network failures are inevitable. When a client sends a request and the connection drops, the client does not know if the server processed it. The client must retry, but retrying a non-idempotent operation (like charging a credit card) could execute it twice.
Solution: Make mutating endpoints idempotent by requiring an idempotency key (a client-generated UUID) with every request. The server stores the key and result. If the same key is received again, the server returns the stored result without re-executing.
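Under HTTP semantics, GET, PUT, and DELETE are defined as idempotent, although your handlers must actually honor that guarantee. POST (and PATCH) are not. Requiring an idempotency key on POST requests makes them effectively idempotent.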
Implementation: Store idempotency keys in a database or Redis with a TTL (e.g., 24 hours). Use the key as a lock to prevent concurrent execution of the same request.
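A minimal sketch of the key store, assuming a single process: an in-memory dict stands in for the database or Redis table, and `IdempotencyStore` and its `execute` method are hypothetical names. It keeps each result for a TTL and holds a per-key lock, so concurrent retries carrying the same key run the handler only once.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a Redis or database table of idempotency keys."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._results: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, result)
        self._key_locks: Dict[str, threading.Lock] = {}  # key -> lock (never pruned in this sketch)
        self._guard = threading.Lock()  # protects both dicts

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def execute(self, key: str, handler: Callable[[], Any]) -> Any:
        """Run handler once per key per TTL; later calls replay the stored result."""
        # Concurrent requests with the same key wait here instead of running twice.
        with self._lock_for(key):
            now = self._clock()
            with self._guard:
                entry = self._results.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]  # duplicate: return the stored result, no side effects
            # First attempt (or the old entry expired). If handler raises,
            # nothing is stored and the client may retry with the same key.
            result = handler()
            with self._guard:
                self._results[key] = (now + self._ttl, result)
            return result
```

A production version would typically also store a hash of the request body and reject a reused key that arrives with a different payload, since that usually signals a client bug rather than a retry.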
Versioning: Evolving Without Breaking
APIs must evolve, but breaking existing clients is unacceptable for public APIs and expensive for internal ones.
URL-based versioning (/v1/users, /v2/users) is simple but forces clients to migrate to a new URL, and maintaining multiple code paths is expensive.
Header-based versioning (e.g., Accept: application/vnd.example+json; version=2) keeps URLs clean but is less discoverable: the version does not appear in links and is harder to exercise from a browser.
Stripe's approach: Pin each client to the version that existed when they integrated. Maintain transformation functions between versions so the server only runs the latest code internally. This is the gold standard for public APIs but requires significant investment.
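The compatibility layer can be sketched as a chain of downgrade functions. This is a minimal illustration, not Stripe's implementation: the version dates, field names, and the helpers `_split_name`, `_rename_created`, and `render_for_version` are all hypothetical. The server always builds the latest response shape, then undoes, newest first, every change introduced after the client's pinned version.

```python
from typing import Callable, Dict, List, Tuple

Response = Dict[str, object]


def _split_name(resp: Response) -> Response:
    """Undo hypothetical change 2024-06-01, which merged first/last name into full_name."""
    out = dict(resp)  # copy so the latest-shape response is never mutated
    first, _, last = str(out.pop("full_name")).partition(" ")
    out["first_name"], out["last_name"] = first, last
    return out


def _rename_created(resp: Response) -> Response:
    """Undo hypothetical change 2025-01-15, which renamed "created" to "created_at"."""
    out = dict(resp)
    out["created"] = out.pop("created_at")
    return out


# Change log, oldest first: (version that introduced the change, downgrade function).
VERSION_CHANGES: List[Tuple[str, Callable[[Response], Response]]] = [
    ("2024-06-01", _split_name),
    ("2025-01-15", _rename_created),
]


def render_for_version(latest: Response, pinned_version: str) -> Response:
    """Convert a response built by the latest code into the client's pinned version."""
    resp = latest
    # Walk newest to oldest, undoing every change introduced after the pin.
    # ISO-8601 date strings compare correctly as plain strings.
    for introduced_in, downgrade in reversed(VERSION_CHANGES):
        if introduced_in > pinned_version:
            resp = downgrade(resp)
    return resp
```

Each breaking change ships with exactly one small downgrade function, so supporting an old version does not require a parallel code path.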
Rule of thumb: For internal APIs, URL versioning is fine. For public APIs, invest in a compatibility layer.
Circuit Breakers: Failing Fast
When a downstream service is failing, continuing to send requests wastes resources and increases latency. A circuit breaker monitors failure rates and, when they exceed a threshold, short-circuits requests — returning an error immediately without calling the failing service.
States: CLOSED (normal operation; requests pass through), OPEN (the service is failing; requests fail immediately), and HALF-OPEN (after a timeout, a few trial requests are allowed through: if they succeed the breaker closes, and if they fail it opens again).
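Implementation: Track failures over a sliding window and open the circuit when the failure rate exceeds the threshold (e.g., 50% of requests in the last 30 seconds). Require a minimum number of requests in the window first, so that one failure out of two requests does not trip the breaker. After a cooldown (e.g., 60 seconds), move to the half-open state.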
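A single-threaded sketch of this state machine, assuming an injectable clock for testing. `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical; the defaults mirror the example figures (50%, 30-second window, 60-second cooldown), and `min_requests=10` is an assumed value. For brevity, half-open closes after the first successful trial rather than a batch of several; a production breaker would also cap how many trial calls run concurrently.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Tuple


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: float = 0.5, window_seconds: float = 30.0,
                 cooldown_seconds: float = 60.0, min_requests: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.min_requests = min_requests
        self.clock = clock
        self.state = "CLOSED"
        self.opened_at = 0.0
        self.outcomes: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown_seconds:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = "HALF_OPEN"  # cooldown over: allow a trial request
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _record(self, ok: bool) -> None:
        now = self.clock()
        self.outcomes.append((now, ok))
        # Slide the window: forget outcomes older than window_seconds.
        while self.outcomes and self.outcomes[0][0] < now - self.window_seconds:
            self.outcomes.popleft()

    def _on_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # trial failed: the service has not recovered
            return
        self._record(False)
        failures = sum(1 for _, ok in self.outcomes if not ok)
        if (len(self.outcomes) >= self.min_requests
                and failures / len(self.outcomes) > self.failure_threshold):
            self._trip()

    def _on_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.state = "CLOSED"  # trial succeeded: resume normal operation
            self.outcomes.clear()
        else:
            self._record(True)

    def _trip(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.outcomes.clear()
```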
Rate Limiting: Protecting Resources
Without rate limiting, a single misbehaving client can overwhelm your service. Rate limiting protects both your servers and other clients.
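Common algorithms: token bucket (simple, allows bursts up to the bucket size), sliding window (smoother, more accurate), and fixed window (simplest, but vulnerable to bursts at window boundaries: a client can spend its full quota at the end of one window and again at the start of the next, sending up to twice the limit in a short interval).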
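To make the token bucket concrete, here is a minimal single-process sketch; the class and parameter names are illustrative, and the clock is injectable for testing. The bucket refills continuously at `refill_rate` tokens per second up to `capacity`, which is what permits bursts: an idle client accumulates up to `capacity` tokens and may spend them all at once.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity  # start full, so a new client may burst immediately
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def seconds_until_available(self, cost: float = 1.0) -> float:
        """How long until `cost` tokens exist (useful for a Retry-After value)."""
        self._refill()
        missing = max(0.0, cost - self.tokens)
        return missing / self.refill_rate
```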
Implementation: Return 429 (Too Many Requests) with a Retry-After header. Also include the widely used (though non-standard) rate limit headers X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset so clients can self-regulate.
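A sketch of how a handler might assemble these headers, assuming a fixed-window counter and Unix timestamps. `rate_limit_decision` is a hypothetical helper. `X-RateLimit-Reset` is emitted here as epoch seconds (GitHub's convention), but some APIs send seconds-until-reset instead, so document whichever you choose.

```python
import math
from typing import Dict, Tuple


def rate_limit_decision(limit: int, used: int, reset_at: float,
                        now: float) -> Tuple[bool, Dict[str, str]]:
    """Decide one request against a fixed window and build the rate limit headers.

    `used` is the number of requests already counted in the current window;
    `reset_at` and `now` are Unix timestamps in seconds.
    """
    allowed = used < limit
    # An allowed request consumes one unit of quota.
    remaining = limit - used - 1 if allowed else 0
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(math.ceil(reset_at)),  # epoch seconds
    }
    if not allowed:
        # The caller responds 429; Retry-After uses the delay-in-seconds form.
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
    return allowed, headers
```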
Scope: Rate limit per API key, per IP, or per user, depending on your threat model. Apply different limits to different endpoints, since reads are usually cheaper to serve than writes.
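One way to scope limits, sketched with a fixed-window counter keyed by API key and endpoint class. The class name, the `read`/`write` classes, and the limits are hypothetical. Counters for old windows are never purged in this sketch; a Redis-backed version would typically let each counter key expire with a TTL.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple

# Hypothetical per-minute limits: writes are budgeted more tightly than reads.
LIMITS_PER_MINUTE = {"read": 600, "write": 60}


class ScopedFixedWindowLimiter:
    """Fixed-window request counters keyed by (api_key, endpoint_class, window)."""

    def __init__(self, limits: Dict[str, int], window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.limits = limits
        self.window_seconds = window_seconds
        self.clock = clock
        self.counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def allow(self, api_key: str, endpoint_class: str) -> bool:
        window = int(self.clock() // self.window_seconds)  # index of the current window
        bucket = (api_key, endpoint_class, window)
        if self.counts[bucket] >= self.limits[endpoint_class]:
            return False
        self.counts[bucket] += 1
        return True


limiter = ScopedFixedWindowLimiter(LIMITS_PER_MINUTE)
```

The same counter also shows the window-boundary weakness of fixed windows: with a read limit of 3, a client can make 3 requests at t = 59.9 s and 3 more at t = 60.0 s.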
Error Contracts: Predictable Failures
Clients need to programmatically handle errors. A bare 500 status code with no body is useless.
Structured error responses: Return a consistent error format with a machine-readable error code, a human-readable message, and optionally a documentation link.
Example structure:
{
  "error": {
    "type": "invalid_request_error",
    "code": "parameter_missing",
    "message": "The 'email' parameter is required.",
    "param": "email",
    "doc_url": "https://api.example.com/docs/errors#parameter_missing"
  }
}
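Use specific HTTP status codes: 400 for malformed requests (bad input), 401 for missing or invalid credentials, 403 when an authenticated client lacks permission, 404 for missing resources, 409 for conflicts with the current resource state, 422 for well-formed requests that fail validation, 429 for rate limiting, 500 for unexpected server errors, and 503 when the service is temporarily unavailable.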
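A small helper keeps every error in the same envelope. This is a sketch: `error_response` and `DOCS_BASE` are hypothetical names, and the docs URL is the placeholder from the example structure. Taking an `HTTPStatus` rather than a bare integer keeps the status code explicit at each call site.

```python
import json
from http import HTTPStatus
from typing import Optional, Tuple

# Placeholder documentation base URL, taken from the example structure.
DOCS_BASE = "https://api.example.com/docs/errors"


def error_response(
    status: HTTPStatus,
    error_type: str,
    code: str,
    message: str,
    param: Optional[str] = None,
) -> Tuple[int, str]:
    """Return (status_code, JSON body) in a consistent error envelope."""
    error = {"type": error_type, "code": code, "message": message}
    if param is not None:
        # Only include "param" when a specific input field caused the error.
        error["param"] = param
    # The machine-readable code doubles as the anchor into the error docs.
    error["doc_url"] = f"{DOCS_BASE}#{code}"
    return int(status), json.dumps({"error": error})


# Example: a missing required field.
status, body = error_response(
    HTTPStatus.BAD_REQUEST,
    "invalid_request_error",
    "parameter_missing",
    "The 'email' parameter is required.",
    param="email",
)
```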
Timeouts and Retries
Every outgoing HTTP call should have a timeout. Without one, a slow downstream service can exhaust your connection pool and trigger cascading failures.
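Set aggressive timeouts: If your downstream typically responds in 50 ms, set the timeout at 200-500 ms, not 30 seconds. Derive the figure from the dependency's tail latency (p99), not its average. Slow responses are often a sign of an overloaded service, and waiting longer only ties up your own resources and makes things worse.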
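Retry with exponential backoff and jitter: When retrying, wait progressively longer (100 ms, 200 ms, 400 ms) and add random jitter to prevent a thundering herd of synchronized retries. Cap the number of attempts, and retry only transient failures (timeouts, 429, 503) on requests that are idempotent or protected by an idempotency key.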
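A sketch of timeouts and retries together, assuming the wrapped call enforces its own per-attempt timeout and signals expiry by raising `TimeoutError`. `retry_with_backoff` and `TransientError` are hypothetical names. The delays follow the 100 ms, 200 ms, 400 ms schedule with "equal jitter" (half of each delay fixed, half random), which is one of several jitter strategies. `sleep` and `rng` are injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures, e.g. an HTTP 429 or 503 reply."""


def retry_with_backoff(
    fn: Callable[[float], T],
    max_attempts: int = 4,
    attempt_timeout: float = 0.5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn(attempt_timeout), retrying transient failures with backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = random.Random() if rng is None else rng
    for attempt in range(max_attempts):
        try:
            # fn must bound its own work by attempt_timeout and raise
            # TimeoutError (or TransientError) when it gives up.
            return fn(attempt_timeout)
        except (TimeoutError, TransientError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential schedule: 0.1 s, 0.2 s, 0.4 s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Equal jitter: keep half of the delay and randomize the other half,
            # so clients that failed together do not retry in lockstep.
            sleep(delay / 2 + rng.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Any other exception (a bug, a 400 response) propagates immediately, since retrying it would only repeat the failure.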
Summary
Build idempotent endpoints so retries are safe. Version your API so evolution does not break clients. Use circuit breakers to fail fast when dependencies are down. Rate limit to protect against abuse. Return structured errors so clients can handle failures programmatically. Set timeouts on every outgoing call. These patterns are table stakes for any production API.
Practical Implementation for .NET Developers
In a .NET application, you would typically implement these patterns with the following building blocks:
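ASP.NET Core setup: Encapsulate each concern in a service class, register it with dependency injection, and inject it into your controllers or minimal API endpoints; the built-in DI container manages object lifetimes. ASP.NET Core (.NET 7 and later) includes rate limiting middleware and ProblemDetails support for structured error responses, and Polly or Microsoft.Extensions.Http.Resilience adds retries, timeouts, and circuit breakers to outgoing HttpClient calls.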
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging practical:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
What Most Articles Get Wrong
Many articles about reliable API design present an oversimplified view that misses the operational reality. In production, the theoretical best practices often collide with constraints like legacy systems, team expertise, budget limitations, and compliance requirements. The engineers who successfully implement these patterns at scale are the ones who understand not just the "what" but the "when" and "when not to."
The nuance that matters: context determines everything. A pattern that works at Netflix's scale (200M users, 1000+ engineers) is overkill for a startup with 10,000 users and 3 engineers. Always match the solution complexity to the problem complexity.
The Numbers That Matter
- Latency percentiles matter more than averages: p99 latency often reveals problems that p50 hides
- Error budgets quantify acceptable risk: with a 99.95% availability target (SLO), you have about 21.9 minutes of downtime per month (0.05% of the 43,830 minutes in an average month) to spend on deployments and experiments
- Cost per request at scale determines architecture: a $0.001 cost difference per request becomes $1M per year at 1 billion requests/year
- Team cognitive load is the hidden constraint: a system your team cannot understand is a system your team cannot operate safely
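The first two figures are easy to compute. A minimal sketch, assuming an average month of 365.25 / 12 days and the nearest-rank definition of a percentile (other definitions interpolate and give slightly different values); the function names and sample latencies are illustrative.

```python
import math
from typing import Sequence

# 365.25-day year / 12 = 30.4375 days = 43,830 minutes.
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12


def error_budget_minutes(availability_target: float,
                         period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Downtime allowed by an availability target such as 0.9995."""
    return (1.0 - availability_target) * period_minutes


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Illustrative data: 98 fast requests and 2 very slow ones.
latencies_ms = [50.0] * 98 + [2000.0] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)
```

Here the mean is 89 ms and the p50 is 50 ms, but the p99 is 2,000 ms: only the tail percentile reveals that some users wait two seconds.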