Latency vs Throughput

Optimizing for low latency (fast individual requests) often conflicts with optimizing for high throughput (maximum requests per second).

The Tradeoff

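Low latency and high throughput pull against each other because the techniques that raise one tend to hurt the other. Batching improves throughput by amortizing fixed per-operation costs, but each request must wait for its batch, so latency rises. Pipelining overlaps the stages of consecutive requests and can raise throughput with little or no added latency per request, but it adds complexity.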

Optimize for Latency: A Closer Look

Process each request as fast as possible. No batching, no queuing.

The Good

  • Fast, predictable response times
  • Better user experience
  • Required for real-time systems

The Bad

  • Lower throughput
  • More expensive per request
  • Cannot leverage batching efficiencies

Optimize for Throughput: A Closer Look

Maximize the number of requests processed per second. Batch operations, queue processing.

The Good

  • Higher resource utilization
  • Lower cost per operation
  • Better for bulk processing

The Bad

  • Higher individual request latency
  • Buffering adds delay
  • Not suitable for real-time

Quick Comparison

         | Optimize for Latency      | Optimize for Throughput
Best for | User-facing API responses | Data pipelines and ETL

Real-World Examples

Kafka optimizes for throughput by batching writes — millions of messages per second, but individual message latency is higher than direct delivery.

Redis optimizes for latency — single-threaded event loop processes commands in microseconds.

Google BigQuery optimizes for throughput — queries scan terabytes of data but have seconds of startup latency.

Interview Advice

Always clarify latency requirements in the interview. 'Is this a user-facing request that needs <200ms response, or a background job where 5-second latency is acceptable?' This drives the entire architecture.


Source | System-Design-Overview

Practical Implementation for .NET Developers

In a .NET application, tuning for latency or throughput usually touches the following areas:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Real-World Decision Framework

Latency is the time it takes to complete a single operation. Throughput is the number of operations completed per unit of time. Optimizing for one often comes at the expense of the other.

When to Optimize for Latency

Low latency matters when individual request speed directly affects user experience or business outcomes:

  • Search results: Google is often cited as targeting under 200ms for search. A widely quoted Amazon finding is that every 100ms of added latency cost about 1% in sales.
  • Trading systems: High-frequency trading firms pay millions for microseconds of advantage. A 1ms delay can cost a trade.
  • Gaming: Fast-paced online multiplayer games generally aim for under 50ms of network latency to feel responsive.
  • Interactive APIs: A REST endpoint serving a mobile app should respond in under 100ms for a snappy feel.

Techniques: In-memory caching, read replicas near users, CDNs, connection pooling, pre-computed results.
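
As a rough illustration of the first technique, the sketch below shows a cache-aside read in C#. It is a minimal example under stated assumptions: LoadProductFromDatabase is a hypothetical stand-in for a slow lookup, the 50 ms delay is invented, and a plain ConcurrentDictionary is used instead of a real cache with expiry.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

var cache = new ConcurrentDictionary<int, string>();
int backendCalls = 0;

// Hypothetical slow lookup standing in for a database or remote service call.
string LoadProductFromDatabase(int id)
{
    Interlocked.Increment(ref backendCalls);
    Thread.Sleep(50); // simulated 50 ms round trip (assumed figure)
    return $"product-{id}";
}

// Cache-aside read: serve from memory when possible, otherwise load and remember.
string GetProduct(int id) => cache.GetOrAdd(id, LoadProductFromDatabase);

var stopwatch = Stopwatch.StartNew();
GetProduct(42);                      // miss: pays the simulated round trip
long missMs = stopwatch.ElapsedMilliseconds;

stopwatch.Restart();
GetProduct(42);                      // hit: served from memory
long hitMs = stopwatch.ElapsedMilliseconds;

Console.WriteLine($"miss={missMs} ms, hit={hitMs} ms, backend calls={backendCalls}");
```

The second read skips the backend entirely, which is where the latency win comes from. This sketch never evicts or refreshes entries, and under concurrent misses GetOrAdd may run the loader more than once for the same key, so a production cache would need expiry and invalidation on top.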

When to Optimize for Throughput

High throughput matters when you need to process large volumes efficiently:

  • Data pipelines: Ingesting millions of events per second into a data warehouse.
  • Batch processing: Processing a day's worth of transactions for billing.
  • File uploads: Handling thousands of concurrent image uploads for a social media platform.
  • Log aggregation: Collecting and indexing logs from thousands of servers.

Techniques: Batching requests, async processing, parallel workers, message queues, write-ahead logs.
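
The parallel-workers technique can be sketched with Parallel.ForEachAsync (available from .NET 6). The job count, the 100 ms delay, and the worker limit of 8 are all made-up values for illustration; the Task.Delay call stands in for real I/O-bound work such as an upload.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int completed = 0;
var jobs = Enumerable.Range(1, 40);

// Cap concurrency so the workers do not overwhelm a downstream dependency.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

var stopwatch = Stopwatch.StartNew();
await Parallel.ForEachAsync(jobs, options, async (jobId, ct) =>
{
    await Task.Delay(100, ct);   // hypothetical I/O-bound work, about 100 ms per job
    Interlocked.Increment(ref completed);
});
stopwatch.Stop();

Console.WriteLine($"{completed} jobs in {stopwatch.ElapsedMilliseconds} ms");
```

With eight workers, the 40 jobs should finish in roughly five rounds of 100 ms rather than the four seconds a sequential loop would need. Each individual job still takes about 100 ms, so throughput improves while per-job latency stays the same.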

The Tradeoff in Practice

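Batching is the most common tradeoff. Instead of writing each database row in its own transaction (low latency per write, low throughput), you group 1,000 writes into a single transaction. Each row now waits for the batch to fill and commit, so per-write latency rises, but the fixed cost of a round trip and commit is paid once per 1,000 rows, so throughput is much higher. Kafka producers do the same thing: they batch messages (controlled by settings such as batch.size and linger.ms) for higher throughput at the cost of slightly higher per-message latency.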
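
To put rough numbers on this, the following sketch uses a simple cost model: every transaction pays a fixed overhead plus a small per-row cost. Both constants are assumptions chosen for illustration, not measurements of any real database.

```csharp
using System;

// Hypothetical cost model; the numbers are illustrative, not measured.
const double OverheadMs = 5.0;   // assumed fixed cost per transaction (round trip + commit)
const double PerRowMs = 0.05;    // assumed marginal cost per row

// Time to commit one batch; no row in it can be acknowledged before this elapses.
double BatchCommitMs(int batchSize) => OverheadMs + batchSize * PerRowMs;

// Rows per second when writing back-to-back batches of the given size.
double ThroughputPerSecond(int batchSize) => batchSize / BatchCommitMs(batchSize) * 1000.0;

foreach (int size in new[] { 1, 10, 100, 1000 })
{
    Console.WriteLine(
        $"batch={size,5}  commit={BatchCommitMs(size),6:F2} ms  throughput={ThroughputPerSecond(size),8:F0} rows/s");
}
```

Under these assumed costs, a batch of 1 commits in about 5 ms but manages only about 200 rows per second, while a batch of 1,000 takes 55 ms to commit and reaches roughly 18,000 rows per second. The model leaves out the time a row spends waiting for its batch to fill, which adds further latency in practice.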

Queuing trades latency for throughput stability. A message queue absorbs traffic spikes, maintaining consistent throughput but adding queue wait time to individual message latency.
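
A small simulation makes the effect visible. The arrival pattern and service rate below are invented for illustration: a three-second burst of 300 messages per second hits a consumer that processes a steady 100 messages per second.

```csharp
using System;
using System.Collections.Generic;

// Illustrative numbers only: a 3-second burst followed by quiet seconds.
int[] arrivalsPerSecond = { 300, 300, 300, 0, 0, 0, 0, 0, 0 };
const int ServiceRatePerSecond = 100;

var queue = new Queue<int>();   // stores the arrival second of each message
int maxWaitSeconds = 0;

for (int second = 0; second < arrivalsPerSecond.Length; second++)
{
    // Producers enqueue as fast as they like; the queue absorbs the spike.
    for (int i = 0; i < arrivalsPerSecond[second]; i++)
        queue.Enqueue(second);

    // The consumer drains at its fixed rate, so its throughput stays flat.
    int processed = 0;
    while (processed < ServiceRatePerSecond && queue.Count > 0)
    {
        int waited = second - queue.Dequeue();
        maxWaitSeconds = Math.Max(maxWaitSeconds, waited);
        processed++;
    }

    Console.WriteLine($"t={second}s processed={processed} backlog={queue.Count}");
}

Console.WriteLine($"max queue wait: {maxWaitSeconds}s");
```

The consumer processes exactly 100 messages in every second of the run, but the last messages of the burst sit in the queue for 6 seconds before they are handled. That waiting time is the latency cost of the stable throughput.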

Interview Tip

When discussing latency vs throughput, mention specific numbers. "We need P99 latency under 100ms" or "The system must handle 50,000 requests per second." Quantifying requirements shows maturity. Also mention that you would measure both in production using percentile metrics (P50, P95, P99) rather than averages.
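
The point about percentiles versus averages can be shown with a short calculation. This sketch uses the nearest-rank definition of a percentile (one of several common definitions) and a made-up sample of 100 latencies.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Expects ascending input.
double Percentile(double[] sortedSamples, double p)
{
    if (sortedSamples.Length == 0)
        throw new ArgumentException("At least one sample is required.", nameof(sortedSamples));
    int rank = (int)Math.Ceiling(p * sortedSamples.Length / 100.0);
    return sortedSamples[Math.Max(rank, 1) - 1];
}

// Made-up latencies in milliseconds: 98 fast requests and 2 slow outliers.
double[] latenciesMs = Enumerable.Repeat(20.0, 98)
    .Concat(new[] { 900.0, 1500.0 })
    .OrderBy(x => x)
    .ToArray();

Console.WriteLine($"mean = {latenciesMs.Average():F1} ms");
Console.WriteLine($"P50  = {Percentile(latenciesMs, 50)} ms");
Console.WriteLine($"P95  = {Percentile(latenciesMs, 95)} ms");
Console.WriteLine($"P99  = {Percentile(latenciesMs, 99)} ms");
```

For this sample the mean is 43.6 ms, which describes no real request: half the requests finish in 20 ms, while P99 is 900 ms. Reporting P50 alongside P99 shows both the typical case and the tail that the average hides.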

.NET Implementation

Latency: Use IMemoryCache and output caching in ASP.NET Core. Throughput: Use Channel<T> for batching, System.Threading.Tasks.Dataflow for parallel pipelines, and Azure Service Bus for queue-based load leveling.
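
As one possible shape for the Channel<T> batching mentioned above, the sketch below has a consumer drain whatever is currently available, up to a maximum batch size, before flushing. The capacity of 1,000, the batch limit of 100, and the "flush" step are assumptions; a real consumer would replace the flush with a bulk operation such as a multi-row insert.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// A bounded channel applies backpressure: writers wait once 1,000 items are pending.
var channel = Channel.CreateBounded<int>(1000);
var batchSizes = new List<int>();
int totalItems = 0;

// Consumer: drains what is available, up to maxBatch items, then "flushes".
async Task ConsumeAsync(ChannelReader<int> reader, int maxBatch)
{
    var batch = new List<int>(maxBatch);
    while (await reader.WaitToReadAsync())
    {
        while (batch.Count < maxBatch && reader.TryRead(out int item))
            batch.Add(item);

        // Placeholder for a real bulk operation such as a multi-row INSERT.
        batchSizes.Add(batch.Count);
        totalItems += batch.Count;
        batch.Clear();
    }
}

Task consumer = ConsumeAsync(channel.Reader, maxBatch: 100);

// Producer: writes 250 items, then signals that no more will arrive.
for (int i = 0; i < 250; i++)
    await channel.Writer.WriteAsync(i);
channel.Writer.Complete();

await consumer;
Console.WriteLine($"flushed {totalItems} items in {batchSizes.Count} batches");
```

Because the consumer never waits for a batch to fill, an idle system flushes single items with little added latency, while a busy one naturally forms larger batches. The number of batches therefore varies from run to run. Adding a maximum wait time before flushing would trade a little latency for more consistently full batches.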