What are the key takeaways from How to Scale Reads and Writes?

How to Scale Reads and Writes provides practical insights for building production-grade distributed systems. The main lessons apply to system design interviews and real-world architecture decisions.

How to Scale Reads and Writes

2025-01-158 min read

How to Scale Reads and Writes

Most applications start with a single database handling both reads and writes. This works until it does not. Understanding which bottleneck you are hitting — reads or writes — determines the right scaling strategy.

Scaling Reads

Read bottlenecks are the most common scaling problem. A typical web application has a 10:1 or even 100:1 read-to-write ratio. Three primary strategies address this.

How to Scale Reads and Writes system architecture with service components and data flow — System architecture for How to Scale Reads and Writes

Read Replicas

Create copies of your primary database that serve read queries. The primary handles writes and replicates changes to replicas asynchronously. This scales read throughput linearly: add more replicas, serve more reads.

Tradeoff: Replication lag means replicas may serve slightly stale data. For most applications (displaying profiles, listing products), this is acceptable. For operations that must read what was just written (e.g., showing a user their own post immediately after creating it), route those reads to the primary ("read-your-writes" consistency).

Caching

Step-by-step diagram showing how How to Scale Reads and Writes works in practice — How How to Scale Reads and Writes works step by step

Place a cache (Redis, Memcached) between the application and the database. Cache frequently accessed data with a TTL (time-to-live). Cache hit ratios of 90%+ are common, meaning 90% of reads never reach the database.

Cache-aside pattern: Application checks cache first. On a miss, it queries the database, stores the result in cache, and returns it. On write, invalidate the cache entry.

Tradeoff: Stale data (cache may serve outdated values until TTL expires or explicit invalidation). Cache invalidation is one of the two hard problems in computer science. Thundering herd: when a popular cache entry expires, hundreds of concurrent requests hit the database simultaneously.

Denormalization

Comparison table for How to Scale Reads and Writes showing key metrics and tradeoffs — Comparing key aspects of How to Scale Reads and Writes

Store precomputed results or redundant copies of data to avoid expensive joins or aggregations at read time. For example, store a user's follower count directly on the user record rather than counting the followers table on every request.

Tradeoff: Write complexity increases (you must update the denormalized data whenever the source data changes) and storage increases, but reads become trivially fast.

Scaling Writes

Write bottlenecks are harder to solve because writes must (usually) go to a single authoritative location to maintain consistency.

Data flow diagram for How to Scale Reads and Writes showing request and response paths — Data flow through How to Scale Reads and Writes

Sharding (Horizontal Partitioning)

Split your data across multiple database instances based on a shard key (e.g., user_id modulo N). Each shard handles a fraction of the total write load. This scales write throughput linearly.

Tradeoff: Cross-shard queries become expensive or impossible. Choosing the wrong shard key leads to hotspots (one shard receiving disproportionate load). Re-sharding (changing the number of shards) is operationally painful. Consistent hashing can mitigate re-sharding cost.

Asynchronous Processing

Key components of How to Scale Reads and Writes with roles and responsibilities — Key components of How to Scale Reads and Writes

Not all writes need to be processed synchronously. Place writes into a message queue (Kafka, SQS, RabbitMQ) and process them asynchronously. The API returns immediately after enqueuing, and workers process the writes at their own pace.

Tradeoff: The user does not see the result immediately (eventual consistency). Suitable for non-critical writes like analytics events, email sends, or feed updates. Not suitable for writes where the user needs immediate confirmation (e.g., payment processing).

Write-Behind Caching

Buffer writes in a fast store (Redis) and periodically flush them to the database in batches. This absorbs write spikes and reduces the number of database writes.

Interview tips for How to Scale Reads and Writes system design questions — Interview tips for How to Scale Reads and Writes

Tradeoff: Risk of data loss if the buffer fails before flushing. Use a WAL or persistent queue for durability.

CQRS: Command Query Responsibility Segregation

When read and write patterns diverge significantly, separate them into different models. The write model (commands) uses a normalized, write-optimized store. The read model (queries) uses a denormalized, read-optimized store (or cache, or search index). An event bus synchronizes the two.

Example: An e-commerce system writes orders to a normalized PostgreSQL database. A separate service consumes order events and materializes a denormalized view in Elasticsearch for the product search page.

Tradeoff: Increased complexity (two data models, eventual consistency between them, event infrastructure). Only justified when the read/write asymmetry is extreme or when read and write models have fundamentally different shapes.

Decision Guide

Symptom	Strategy
Database CPU saturated by SELECT queries	Read replicas
Same queries executed repeatedly	Caching
Complex joins on every read	Denormalization
Write throughput limited by single node	Sharding
Non-critical writes causing backpressure	Async processing
Reads and writes have very different shapes	CQRS

Summary

Scale reads with replicas, caching, and denormalization. Scale writes with sharding and async processing. When the asymmetry is extreme, consider CQRS. Always start with the simplest strategy that solves your bottleneck, and measure before adding complexity.

Pros and cons analysis of How to Scale Reads and Writes for system design decisions — Advantages and disadvantages of How to Scale Reads and Writes

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Real-world companies using How to Scale Reads and Writes in production systems — Real-world examples of How to Scale Reads and Writes

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

What Most Articles Get Wrong

Many articles about How To Scale Reads And Writes present an oversimplified view that misses the operational reality. In production, the theoretical best practices often collide with constraints like legacy systems, team expertise, budget limitations, and compliance requirements. The engineers who successfully implement these patterns at scale are the ones who understand not just the "what" but the "when" and "when not to."

The nuance that matters: context determines everything. A pattern that works at Netflix's scale (200M users, 1000+ engineers) is overkill for a startup with 10,000 users and 3 engineers. Always match the solution complexity to the problem complexity.

The Numbers That Matter

Latency percentiles matter more than averages: p99 latency often reveals problems that p50 hides
Error budgets quantify acceptable risk: if your SLA is 99.95%, you have 21.9 minutes of downtime per month to spend on deployments and experiments
Cost per request at scale determines architecture: a $0.001 cost difference per request becomes $1M per year at 1 billion requests/year
Team cognitive load is the hidden constraint: a system your team cannot understand is a system your team cannot operate safely

Sources

Designing Data-Intensive Applications — Martin Kleppmann