Consistency vs Availability
The CAP theorem forces distributed systems to choose between consistency and availability during network partitions. This tradeoff shapes the design of nearly every replicated database and distributed service in production.
Which Should You Pick?
Favor Consistency (CP) if:
- Stale data causes financial loss or safety risk (banking, inventory, medical records)
- Users expect their writes to be immediately visible to all readers
- You can tolerate brief periods of unavailability during partitions
- Your system handles transactions that must be atomic
Favor Availability (AP) if:
- Users prefer a potentially stale response over no response
- The system serves read-heavy workloads where freshness is less critical
- You need 99.99% uptime across geographic regions
- Business logic can handle and reconcile conflicting versions
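To make the 99.99% figure concrete, the arithmetic below converts an availability target into a yearly downtime budget. It is a back-of-the-envelope sketch: the function name is illustrative and the year is assumed to have 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes per year a service may be down and still meet the target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {downtime_budget_minutes(target):.1f} min/year of downtime")
```

At 99.99% the budget is roughly 52.6 minutes per year, which leaves little room for refusing requests every time a cross-region link misbehaves.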
Understanding Consistency
In a consistent system, every read returns the most recent write. If you update your email address, the very next read — from any node, in any data center — returns the new email.
Achieving strong consistency in a distributed system requires coordination. Before a write is acknowledged, it must be replicated to a quorum of nodes. Before a read is served, it must verify it has the latest data.
The cost: Coordination takes time. A write to a strongly consistent system must wait for acknowledgment from multiple nodes, which may be in different data centers with 50-100ms network latency between them. During a network partition, nodes on the minority side of the partition cannot serve reads or accept writes — the system is unavailable for those clients.
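The quorum rule described above can be sketched as a small in-memory model. This is a simplification under stated assumptions: `Replica` and `quorum_write` are hypothetical names, reachability is a flag as seen by the coordinator, and real systems use a consensus protocol rather than a simple reachability check.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    reachable: bool = True  # as seen from the coordinator handling the write
    data: dict = field(default_factory=dict)


def quorum_write(replicas, key, value):
    """Apply the write only if a majority of replicas can acknowledge it."""
    majority = len(replicas) // 2 + 1
    reachable = [r for r in replicas if r.reachable]
    if len(reachable) < majority:
        return False  # no quorum: refuse the write instead of risking divergence
    for replica in reachable:
        replica.data[key] = value
    return True


# A coordinator on the majority side of a partition still reaches 2 of 3 replicas.
cluster = [Replica("a"), Replica("b"), Replica("c", reachable=False)]
print(quorum_write(cluster, "email", "new@example.com"))  # True

# A coordinator on the minority side reaches only 1 of 3 and must refuse.
cluster[1].reachable = False
print(quorum_write(cluster, "email", "other@example.com"))  # False
```

The second call is the unavailability the paragraph describes: the data stays correct because the minority side declines to act.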
Google Spanner is the canonical CP system. It uses synchronized clocks (TrueTime), Paxos replication within each shard, and two-phase commit across shards to provide globally consistent reads. The tradeoff is that writes are slower (cross-region consensus), and during partitions replicas cut off from a majority become unavailable. Google accepts this tradeoff because financial and advertising data requires correctness.
PostgreSQL with synchronous replication is another CP choice. A commit is not acknowledged until the synchronous standby confirms it. If the standby is unreachable, commits block until it returns or replication is reconfigured.
Understanding Availability
In a highly available system, every request that reaches a working node receives a non-error response, even during network partitions. The system never refuses to serve a client, but the data returned might be stale.
The cost: During a partition, different nodes may have different versions of the data. If node A accepted a write that has not yet propagated to node B, a read from node B returns stale data. After the partition heals, the system must reconcile divergent versions — which may require conflict resolution.
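The stale-read scenario above can be reproduced with two in-memory nodes. This is a toy model, not any particular database: `Node` and `replicate` are hypothetical names, replication is one-directional, and conflicting writes on both sides are deliberately left out.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.outbox = []  # writes accepted locally but not yet sent to the peer

    def write(self, key, value):
        # Always accept the write locally, whether or not the peer is reachable.
        self.store[key] = value
        self.outbox.append((key, value))
        return "ok"

    def read(self, key):
        return self.store.get(key)


def replicate(source, target):
    """Deliver queued writes once the network path is available again."""
    for key, value in source.outbox:
        target.store[key] = value
    source.outbox.clear()


node_a, node_b = Node("A"), Node("B")
node_a.write("email", "old@example.com")
replicate(node_a, node_b)                  # both nodes agree

node_a.write("email", "new@example.com")   # partition: B has not heard about this
print(node_b.read("email"))                # still the old address (stale)

replicate(node_a, node_b)                  # partition heals
print(node_b.read("email"))                # now the new address
```

Both nodes answered every request, which is the availability guarantee; the price is the window in which node B served the old value.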
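Amazon's Dynamo, the internal key-value store described in Amazon's 2007 paper, is the canonical AP design. During a partition, reachable replicas continue accepting writes, and conflicting versions are tracked with vector clocks and reconciled by the application on read. Amazon's shopping cart system was built on this principle: it is better to show a slightly outdated cart than to show an error page during a network issue. Amazon DynamoDB, the managed service that followed, defaults to eventually consistent reads but routes the writes for each partition through a leader replica; it applies last-writer-wins only to cross-region conflicts in global tables.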
Cassandra is another AP system by default. With a replication factor of 3 and consistency level of ONE, reads and writes succeed as long as a single replica is reachable. You can tune Cassandra toward consistency by using QUORUM reads/writes, but this reduces availability.
The Spectrum Between C and A
The CAP theorem presents a binary choice, but real systems operate on a spectrum. Most production systems use tunable consistency:
DynamoDB lets you choose per read request: eventually consistent reads (the default, with lower latency and lower cost) or strongly consistent reads (higher latency, and unavailable if the partition's leader cannot be reached). Write durability is not tunable in the same way; DynamoDB manages replication of writes itself.
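As an illustration of the per-request read choice, the sketch below builds the arguments for a DynamoDB `GetItem` call. `TableName`, `Key`, and `ConsistentRead` are the parameter names used by the low-level boto3 client; the helper function, the `Users` table, and the `user_id` attribute are assumptions made for the example. Nothing is sent over the network.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call."""
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) requests an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical table and key, for illustration only.
profile_key = {"user_id": {"S": "42"}}
fast_read = build_get_item_request("Users", profile_key)
fresh_read = build_get_item_request("Users", profile_key, strongly_consistent=True)

# With boto3 installed and AWS credentials configured, the call would be:
#   boto3.client("dynamodb").get_item(**fresh_read)
```

Keeping the flag as an explicit argument makes the consistency decision visible at each call site rather than hiding it in a global default.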
Cassandra lets you set consistency levels per query: ONE, QUORUM, ALL. With ONE, you get availability. With ALL, you get consistency, but a single unreachable replica fails the operation. With QUORUM (majority), you get a practical middle ground.
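A commonly cited rule for quorum systems is that a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The sketch below checks that inequality for the three levels named above. The function names are illustrative, and the check ignores complications such as clock skew and failed partial writes.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when R + W > RF, so every read set shares a replica with every write set."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    overlap = read_overlaps_latest_write(read_level, write_level, rf=3)
    print(f"read={read_level:<6} write={write_level:<6} overlap={overlap}")
```

With a replication factor of 3, QUORUM reads and writes each touch 2 replicas, so any read set shares at least one replica with the most recent write set.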
CockroachDB is strongly consistent by default but allows stale reads via follower reads when you want lower latency at the cost of potential staleness.
Real-World Decision Patterns
Banking and payments: Strong consistency. A double-charge or lost transaction is unacceptable. Payment processors and traditional banks typically keep financial records in strongly consistent databases such as PostgreSQL or Spanner. They accept slightly higher latency and brief unavailability during partitions because correctness is non-negotiable.
Social media feeds: Eventual consistency. If a user posts a photo and their friend sees it 2 seconds later instead of instantly, nobody notices. Facebook uses an AP approach for the news feed with async replication across data centers. The engineering cost of global strong consistency for billions of daily posts would be enormous with minimal user-visible benefit.
Inventory systems: Depends on the cost of overselling. A common pattern for large retailers is to combine both: optimistic checks with eventual consistency for browsing ("In Stock" labels) but strongly consistent checks at checkout to prevent overselling high-value items. Low-value items might accept occasional overselling and resolve it after the fact.
DNS: Extremely available, eventually consistent. Resolvers cache each record for its time-to-live (TTL), so a stale answer can persist for minutes to hours after a change. When you update a DNS record, the new value reaches clients only as their cached copies expire. This is acceptable because DNS changes are infrequent and the system must never go down.
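The DNS behavior comes down to caches that trust an answer until its time-to-live expires. The sketch below is a minimal TTL cache with an injectable clock so the expiry can be demonstrated deterministically. `TtlCache` is a hypothetical class, not how any real resolver is implemented, and the addresses are documentation-range placeholders.

```python
class TtlCache:
    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._entries = {}    # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must ask upstream again
            return None
        return value


now = [0]  # fake clock, advanced by hand
cache = TtlCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.10", ttl_seconds=3600)

now[0] = 1800   # the authoritative record may have changed, but the cache cannot know
print(cache.get("example.com"))   # 192.0.2.10 (possibly stale)

now[0] = 3600   # TTL elapsed
print(cache.get("example.com"))   # None, forcing a fresh lookup
```

Lowering the TTL before a planned change shortens the stale window at the cost of more upstream queries.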
Conflict Resolution Strategies
When you choose availability, conflicting writes will happen. You need a strategy to resolve them:
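Last-writer-wins (LWW): Use timestamps to pick the most recent write. Simple but lossy: when writes are concurrent, all but one are silently discarded, and clock skew can make the "wrong" one win. Cassandra resolves conflicts this way per column, and DynamoDB global tables use it for cross-region conflicts.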
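A last-writer-wins merge fits in a few lines, which is much of its appeal. The sketch below is illustrative only: `Versioned` and `lww_merge` are hypothetical names, and the node ID is used as a tie-breaker so that both replicas pick the same winner when timestamps are equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time assigned by the writing node
    node_id: str      # deterministic tie-breaker for equal timestamps


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the version with the highest (timestamp, node_id); drop the other."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


from_node_1 = Versioned("alice@work.example", timestamp=100.0, node_id="n1")
from_node_2 = Versioned("alice@home.example", timestamp=100.5, node_id="n2")

winner = lww_merge(from_node_1, from_node_2)
print(winner.value)  # alice@home.example; the other write is lost without any error
```

The merge is commutative, so replicas converge regardless of the order in which they exchange versions, but the losing write disappears without notifying anyone.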
Application-level resolution: Present all conflicting versions to the application and let it merge them. Amazon's shopping cart uses union merge: conflicting cart versions are combined by taking the union of items. This ensures no item is lost, though a deleted item might reappear.
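The union merge, including its reappearing-item side effect, can be shown with plain sets. This is a sketch of the idea rather than Amazon's implementation; the function name and cart contents are made up for the example.

```python
def merge_carts(*versions):
    """Combine conflicting cart versions by taking the union of their items."""
    merged = set()
    for cart in versions:
        merged |= set(cart)
    return merged


# Both replicas start from the same cart: {"book", "lamp"}.
replica_a = {"book"}                 # the user removed "lamp" via replica A
replica_b = {"book", "lamp", "pen"}  # the user added "pen" via replica B

print(sorted(merge_carts(replica_a, replica_b)))  # ['book', 'lamp', 'pen']
```

The added pen survives, which is the goal, but the removed lamp is back because a plain union cannot distinguish "never added here" from "deleted here". Tracking deletions explicitly (for example with tombstones) is the usual way to close that gap.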
CRDTs (Conflict-free Replicated Data Types): Data structures designed to merge automatically without conflicts. Counters, sets, and maps have CRDT variants. Riak and Redis Enterprise (in Active-Active databases, also called CRDBs) support CRDTs.
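The simplest CRDT is the grow-only counter (G-Counter): each node increments only its own slot, and merging takes the per-node maximum. The sketch below is a minimal version written for illustration; it is not the API of Riak or Redis.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> increments observed from that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so merges can
        # arrive in any order, any number of times, and replicas still converge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


likes_a, likes_b = GCounter("A"), GCounter("B")
likes_a.increment(3)   # three likes recorded on node A during a partition
likes_b.increment(2)   # two likes recorded on node B

likes_a.merge(likes_b)
likes_b.merge(likes_a)
print(likes_a.value(), likes_b.value())  # 5 5
```

Unlike last-writer-wins, no increment is lost: each node's contribution is kept in its own slot and summed on read.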
Side-by-Side Comparison
| Dimension | Consistency (CP) | Availability (AP) |
|---|---|---|
| During Partition | Refuses some requests | Serves all requests |
| Data Freshness | Always current | May be stale |
| Write Latency | Higher (quorum required) | Lower (single node ack) |
| Conflict Handling | Prevention (coordination, locking) | Resolution (merge) |
| Use Cases | Finance, inventory, auth | Social, analytics, caching |
| Examples | Spanner, PostgreSQL, ZooKeeper | DynamoDB, Cassandra, DNS |
In practice, most systems do not make a single global choice. They use strong consistency for the data that requires it (account balances, authentication tokens) and eventual consistency for everything else (recommendations, activity feeds, analytics). The skill is knowing which data falls into which category.
Practical Implementation for .NET Developers
In a .NET application, these consistency decisions typically surface in the following parts of the stack:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
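Azure integration: If deploying to Azure, leverage managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and provide built-in monitoring through Application Insights. Cosmos DB is the most relevant to this tradeoff because it exposes five consistency levels (strong, bounded staleness, session, consistent prefix, and eventual) as a configurable setting.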
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.