API Gateway vs Load Balancer vs Reverse Proxy

Clear comparison of API gateways, load balancers, and reverse proxies — when to use each, how they overlap, and how to combine them.

What Each One Actually Does

These three components sit between clients and backend services, but they solve fundamentally different problems. The confusion comes from the fact that modern implementations blur the lines — but understanding the core responsibility of each is what separates a strong system design answer from a weak one.

Load Balancer

A load balancer distributes incoming traffic across multiple instances of the same service. Its primary job is availability and horizontal scaling. It monitors the health of backend servers, removes unhealthy ones from the pool, and uses algorithms like round-robin, least connections, or consistent hashing to spread requests. Load balancers operate at either Layer 4 (TCP/UDP — just forwarding packets) or Layer 7 (HTTP — inspecting headers and URLs). They don't care about what your API looks like. They care about making sure no single server gets overwhelmed.
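As a rough illustration of the selection step, here is a minimal Python sketch of round-robin and least-connections picking over a pool that skips unhealthy servers. The `Backend` class and its fields are hypothetical names used only for this example; a real load balancer would update `healthy` from active or passive health checks.

```python
import itertools


class Backend:
    """One instance of a service (hypothetical model for illustration)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True          # a health checker would flip this
        self.active_connections = 0  # a proxy would track this per request


class RoundRobinBalancer:
    def __init__(self, backends):
        self.backends = list(backends)
        self._counter = itertools.count()

    def pick(self):
        # Unhealthy servers are removed from the candidate pool.
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        return healthy[next(self._counter) % len(healthy)]


def pick_least_connections(backends):
    """Choose the healthy backend with the fewest in-flight connections."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends")
    return min(healthy, key=lambda b: b.active_connections)
```

Note that neither function looks at the request itself, which is the point: the balancer only needs to know which servers exist and which are healthy.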

Reverse Proxy

A reverse proxy sits in front of your backend servers and intercepts all incoming client requests. Its primary job is shielding your origin servers from the outside world. It handles TLS termination, caching static content, compressing responses, and hiding your internal network topology. Clients never talk directly to your backend — they talk to the proxy, which forwards the request. A reverse proxy can serve a single backend server; you don't need multiple instances for it to be useful. That's the key difference from a load balancer.
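A toy model can make the caching and compression roles concrete. The sketch below assumes the backend is just a Python callable (`origin`, a hypothetical stand-in for a real server) and ignores cache expiry and `Cache-Control` headers, which a real proxy would honor.

```python
import gzip


class ReverseProxy:
    """Toy reverse proxy in front of a single origin (illustrative only)."""

    def __init__(self, origin):
        self.origin = origin  # callable: path -> bytes, stands in for the backend
        self.cache = {}

    def handle(self, path, accept_gzip=False):
        # Serve from cache when possible so the origin is not contacted again.
        if path not in self.cache:
            self.cache[path] = self.origin(path)
        body = self.cache[path]
        # Compress on the way out if the client says it accepts gzip.
        if accept_gzip:
            return {"Content-Encoding": "gzip"}, gzip.compress(body)
        return {}, body


if __name__ == "__main__":
    proxy = ReverseProxy(lambda path: b"body { color: black; } " * 50)
    headers, body = proxy.handle("/static/app.css", accept_gzip=True)
    print(headers, len(body))
```

Everything here works with exactly one origin, which matches the point above: a reverse proxy is useful even when there is nothing to balance.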

API Gateway

An API gateway is an application-layer entry point that manages, secures, and routes API traffic. Its primary job is API lifecycle management. It handles authentication, rate limiting, request/response transformation, API versioning, protocol translation (REST to gRPC, for example), and aggregating responses from multiple downstream services into a single response. An API gateway understands your API contract. It knows about routes, consumers, quotas, and policies. A load balancer and reverse proxy don't.
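To show what "understands your API contract" might look like, here is a minimal sketch of a gateway request pipeline: identify the consumer, match a route, then enforce a per-consumer quota. The API keys, route table, and quota numbers are all invented for the example, and the fixed counter stands in for a real rate limiter.

```python
# Hypothetical configuration, for illustration only.
API_KEYS = {"key-free-123": "alice", "key-paid-456": "bob"}
ROUTES = {"/orders": "order-service", "/users": "user-service"}
QUOTA = {"alice": 2, "bob": 100}  # max requests per consumer per window


class Gateway:
    def __init__(self):
        self.used = {}  # consumer -> requests served in the current window

    def handle(self, path, api_key):
        # 1. Authentication: who is calling?
        consumer = API_KEYS.get(api_key)
        if consumer is None:
            return 401, "unknown API key"
        # 2. Routing: which backend service owns this path?
        service = next(
            (svc for prefix, svc in ROUTES.items() if path.startswith(prefix)),
            None,
        )
        if service is None:
            return 404, "no route"
        # 3. Quota: has this consumer used up its allowance?
        count = self.used.get(consumer, 0)
        if count >= QUOTA[consumer]:
            return 429, "quota exceeded"
        self.used[consumer] = count + 1
        return 200, f"forwarded to {service} for {consumer}"
```

Each step depends on knowing something about the API (consumers, routes, plans), which is exactly the knowledge a plain load balancer or reverse proxy lacks.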

Feature Comparison Table

Capability	Load Balancer	Reverse Proxy	API Gateway
Traffic distribution	Primary function	Basic (usually single upstream)	Routes to different services by path/header
Health checks	Yes — active and passive	Sometimes	Depends on implementation
TLS termination (SSL offloading)	Yes at Layer 7; some Layer 4 LBs also offer TLS listeners	Yes (core feature)	Yes
Caching	No	Yes — core feature	Sometimes (edge caching)
Compression	No	Yes (gzip, brotli)	Sometimes
Authentication	No	No	Yes — core feature (JWT, OAuth, API keys)
Rate limiting	Basic (connection limits)	Basic (IP-based)	Yes — per-consumer, per-route, per-plan
Request transformation	No	Minimal (header rewriting)	Yes — body transformation, enrichment
Protocol translation	No	No	Yes (REST to gRPC, HTTP to WebSocket)
API versioning	No	No	Yes
Response aggregation	No	No	Yes (BFF pattern)
Circuit breaking	Passive (remove unhealthy)	No (typically)	Yes
Observability/analytics	Connection metrics	Access logs	Per-API, per-consumer analytics
Operates at	Layer 4 or Layer 7	Layer 7	Layer 7

The pattern is clear: load balancers focus on distribution, reverse proxies focus on connection-level and delivery concerns (TLS, caching, compression), and API gateways focus on application-level API logic.

When They Overlap

Here's the truth that makes interviews tricky: in practice, most tools do more than one of these jobs.

Nginx is the canonical example. It started as a reverse proxy and web server, but it can also load balance across upstreams, terminate TLS, cache responses, rate limit requests, and enforce HTTP basic authentication, all in the open-source version. NGINX Plus adds features such as active health checks and JWT validation, and OpenResty adds Lua scripting for custom logic. Is it a reverse proxy? A load balancer? Both? The answer is both, and that's fine.

AWS Application Load Balancer (ALB) is a Layer 7 load balancer that also does path-based routing, host-based routing, TLS termination, and authentication via Cognito or an OIDC-compliant identity provider. It has significant overlap with what a reverse proxy does, and some overlap with an API gateway.

Envoy was built as a service proxy for cloud-native architectures. It's a reverse proxy, a load balancer, and when deployed as a sidecar in a service mesh (Istio), it handles circuit breaking, retries, rate limiting, and observability — overlapping heavily with API gateway functionality.

HAProxy is primarily a load balancer, but it does TLS termination, HTTP header manipulation, rate limiting, and health checking — covering reverse proxy territory.

The confusion between these components is justified. When someone asks you to distinguish them in an interview, they're testing whether you understand the core abstraction each one represents, not whether you can draw clean boxes around specific products.

When to Use Each

Use a Load Balancer When...

You have multiple instances of the same service and need to distribute traffic across them.
You need high availability — automatic failover when a server goes down.
You're scaling horizontally and need to add/remove instances without client changes.
You need Layer 4 load balancing for non-HTTP protocols (TCP, UDP, database connections).
Performance is critical and you need minimal latency overhead — L4 load balancers add almost nothing.

The classic use case: you have 10 instances of your order service behind an internal load balancer. Every service that calls the order service goes through the LB. No authentication, no transformation — just distribution.
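If calls for the same customer should keep landing on the same order service instance (for example, to reuse a local cache), consistent hashing is the usual choice among the algorithms mentioned earlier. The following is a minimal sketch of a hash ring with virtual nodes; the instance names, the `vnodes` count, and the use of SHA-256 are assumptions for the example, not a description of any particular load balancer.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node appears at many points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for scaling: when an instance is removed, only the keys that were mapped to that instance move. Keys on the other instances stay where they were.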

Use a Reverse Proxy When...

You need TLS termination (SSL offloading) so your backend services don't manage certificates or spend CPU on encryption.
You want to cache static assets (images, CSS, JS) close to the client.
You need to compress responses before sending them over the wire.
You want to hide your internal infrastructure: clients should never see internal IPs or ports.
You're serving a single backend and don't need load balancing, but want the security and performance benefits.

The classic use case: Nginx in front of a single application server, terminating TLS, caching static files, and compressing responses. Your app server only handles dynamic requests over plain HTTP.

Use an API Gateway When...

You have multiple backend services that external clients need to access through a unified API.
You need authentication and authorization enforced consistently across all APIs.
You need rate limiting per API consumer — different limits for free vs. paid tiers.
You need request/response transformation — reshaping payloads, adding headers, translating protocols.
You're implementing the Backend for Frontend (BFF) pattern — aggregating multiple service calls into one response.
You need API versioning — routing /v1/users and /v2/users to different services.
You need developer portal features — API key management, usage analytics, documentation.

The classic use case: Kong or AWS API Gateway in front of your microservices. External mobile and web clients hit the gateway, which authenticates them, applies rate limits, routes to the correct service, and potentially transforms the response.
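Per-consumer rate limiting with different limits for free and paid tiers is often implemented with a token bucket. The sketch below is one way to do it, not how Kong or AWS API Gateway implement it; the plan names and numbers are made up, and time is passed in explicitly so the behavior is easy to follow (real code would typically read `time.monotonic()`).

```python
class TokenBucket:
    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical plans: (requests per second, burst size).
PLANS = {"free": (1, 2), "paid": (10, 20)}


class PlanLimiter:
    """Keeps one bucket per consumer, sized by that consumer's plan."""

    def __init__(self):
        self.buckets = {}

    def allow(self, consumer, plan, now):
        if consumer not in self.buckets:
            rate, burst = PLANS[plan]
            self.buckets[consumer] = TokenBucket(rate, burst, now)
        return self.buckets[consumer].allow(now)
```

The key design point is that the limiter is keyed by consumer identity, which the gateway only knows because it authenticated the request first.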

Real-World Architecture Patterns

Pattern 1: Simple — Nginx as Everything

Client → Nginx (TLS + caching + load balancing) → App Server 1
                                                 → App Server 2
                                                 → App Server 3

Nginx terminates TLS, caches static content, and round-robins dynamic requests across three app servers. This is the right choice for small-to-medium applications. One component, one configuration file, minimal operational overhead. You don't need an API gateway if you have one API, one consumer, and no complex routing or authentication at the edge.

Pattern 2: Medium — ALB + API Gateway

Client → CloudFront (CDN/caching)
       → API Gateway (auth, rate limiting, routing)
       → ALB (distribution)
       → ECS/EKS Service Instances

AWS API Gateway handles authentication (Cognito or Lambda authorizers), rate limiting, and API key management. The ALB behind it distributes traffic across container instances. CloudFront at the edge caches responses and terminates client TLS. This is a common pattern for production AWS microservices. The API Gateway adds latency, often cited as roughly 10-30ms per request, but gives you per-consumer throttling, request validation, and usage plans.

Pattern 3: Large — Service Mesh + API Gateway + Load Balancers

Client → Edge Proxy (Envoy/Cloudflare)
       → API Gateway (Kong/Ambassador)
       → Internal LB → Service A (with Envoy sidecar)
       → Internal LB → Service B (with Envoy sidecar)
       → Internal LB → Service C (with Envoy sidecar)

At this scale, each layer has a distinct job. The edge proxy handles global TLS termination, DDoS protection, and geographic routing. The API gateway handles authentication, rate limiting, and external API routing. Internal load balancers distribute traffic within each service cluster. Envoy sidecars (service mesh) handle inter-service communication — retries, circuit breaking, mutual TLS, and distributed tracing. This is what you describe in interviews for "design a system at Netflix scale."
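Circuit breaking is the least intuitive of the sidecar responsibilities, so here is a minimal sketch of the idea in Python. It is a simplified model, not Envoy's implementation: the threshold, the reset window, and the explicit `now` argument are assumptions chosen to keep the example small and deterministic.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, now):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a sick service.
                raise CircuitOpenError("failing fast")
            # Otherwise half-open: let this one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In a mesh, this logic lives in the sidecar, so application code makes a plain call and the proxy decides whether it reaches the remote service.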

How Companies Combine Them

Netflix has used Zuul as its API gateway at the edge, handling authentication, routing, and canary deployments. Behind Zuul, Eureka provides service discovery and Ribbon handles client-side load balancing: each service looks up other services in Eureka and balances across their instances with Ribbon. The gateway handles external concerns; discovery plus client-side load balancing handles internal ones, filling the role a service mesh plays in newer architectures.
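Client-side load balancing means there is no separate balancer in the request path: the caller asks a registry for the current instances and picks one itself. The sketch below illustrates that flow with hypothetical `ServiceRegistry` and `ClientSideBalancer` classes; it does not reproduce the Eureka or Ribbon APIs.

```python
import itertools


class ServiceRegistry:
    """Stand-in for a discovery service (not Eureka's actual API)."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service, address):
        self._instances[service].remove(address)

    def lookup(self, service):
        return list(self._instances.get(service, []))


class ClientSideBalancer:
    """Lives inside the calling service and round-robins over instances."""

    def __init__(self, registry):
        self.registry = registry
        self._counters = {}

    def choose(self, service):
        # Ask the registry on every call so new and removed instances show up.
        instances = self.registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        counter = self._counters.setdefault(service, itertools.count())
        return instances[next(counter) % len(instances)]
```

A production client would cache the lookup and refresh it periodically rather than querying the registry per request; that is omitted here for brevity.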

Cloudflare operates as a massive reverse proxy at the edge. When you put your site behind Cloudflare, their edge network terminates TLS, caches content, applies WAF rules, and does rate limiting. Then it load-balances traffic across your origin servers using their Load Balancing product. Cloudflare effectively plays all three roles but at different points in the request path — reverse proxy at the edge, API gateway (via Workers and Rules), and load balancer to origins.

AWS stacks them explicitly. CloudFront (reverse proxy/CDN) sits at the edge, caching and compressing. API Gateway sits behind it, handling auth and rate limiting. ALB sits behind that, distributing across ECS tasks or EC2 instances. Each component adds latency but also adds a specific capability. For internal service-to-service calls, services skip the API Gateway and go directly through internal ALBs or use AWS App Mesh (Envoy-based service mesh).

Common Interview Questions

"Where would you put the API gateway in this architecture?"

The API gateway sits at the boundary between external clients and your internal services. It's the single entry point for all external traffic. Place it after your CDN/edge proxy (if you have one) and before your internal load balancers and services. For internal service-to-service communication, you typically bypass the API gateway entirely — services talk to each other through internal load balancers or a service mesh. Routing internal calls through the API gateway adds unnecessary latency and creates a single point of failure for internal traffic.

"Why not just use a load balancer for everything?"

Because a load balancer doesn't understand your API semantics. It can distribute traffic, but it can't enforce authentication, manage API keys, apply per-consumer rate limits, transform request bodies, aggregate responses from multiple services, or handle API versioning. You could add all of this logic to your application code, but then every service duplicates the same cross-cutting concerns. The API gateway centralizes these, which is the whole point. That said, if you have a monolith with no external API consumers and just need high availability — a load balancer is exactly the right answer. Don't introduce an API gateway for internal-only traffic with no authentication requirements.
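Response aggregation is a good example of something a load balancer cannot do. A minimal sketch using Python's `asyncio`: the gateway fans out to two backends concurrently and returns one combined payload. The `fetch_user` and `fetch_orders` functions are placeholders that simulate network calls; the data they return is invented.

```python
import asyncio


async def fetch_user(user_id):
    await asyncio.sleep(0.01)  # simulated call to a user service
    return {"id": user_id, "name": "Ada"}


async def fetch_orders(user_id):
    await asyncio.sleep(0.01)  # simulated call to an order service
    return [{"order_id": 1, "total": 42}]


async def profile_page(user_id):
    # Both calls run concurrently; the client sees a single response.
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {"user": user, "orders": orders}


if __name__ == "__main__":
    print(asyncio.run(profile_page(7)))
```

Without the gateway, a mobile client would make both calls itself over a slower network, and every client platform would repeat the same stitching logic.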

"How is an API gateway different from a reverse proxy?"

A reverse proxy handles connection-level and delivery concerns: TLS, caching, compression, and forwarding. It works with HTTP, but it doesn't understand your application's API contract. An API gateway operates at the application level: it understands routes, consumers, quotas, authentication schemes, and request/response schemas. A reverse proxy asks "where should this HTTP request go?" An API gateway asks "is this consumer allowed to call this API, at this rate, with this payload, and should I transform the response before returning it?" In practice, tools like Nginx Plus and Envoy blur this line by adding application-level features to what started as reverse proxies.

Decision Framework for System Design Interviews

Use this mental model when deciding what to include in your architecture:

Step 1: Start with a load balancer. If you have multiple instances of any service, you need traffic distribution. This is almost always true in a system design interview. Use an internal load balancer for each service that runs multiple replicas.

Step 2: Add a reverse proxy for transport concerns. If clients connect over the internet, you need TLS termination. If you serve static content, you need caching. If you want to hide your internal topology, you need a reverse proxy. In most interview scenarios, a CDN or edge proxy fills this role (CloudFront, Cloudflare).

Step 3: Add an API gateway for application concerns. If external clients access multiple services through a unified API, add an API gateway. If you need authentication, rate limiting, or request routing based on API paths — that's the gateway's job. If you're designing an internal-only system with no external consumers, you may not need one.

Step 4: Consider a service mesh for internal communication. At large scale with many microservices, a service mesh (Envoy sidecars) handles inter-service retries, circuit breaking, mutual TLS, and observability without burdening application code.

The key interviewer insight: Don't over-engineer. A startup with one API and three servers doesn't need Kong, Envoy, and a CDN. Nginx as a reverse proxy and load balancer is the right answer. A company with 50 microservices, external API consumers, and multiple client platforms needs distinct layers. Match the complexity of your infrastructure to the complexity of your problem.
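The four steps can be written down as a small function, which is a handy way to check your own reasoning. This is only a sketch of the mental model above: the parameter names are invented, and `mesh_threshold=20` is an arbitrary cutoff for "many microservices", not a number from any standard.

```python
def recommend_layers(replicas, internet_facing, external_consumers, services,
                     mesh_threshold=20):
    """Return the infrastructure layers suggested by the four-step model."""
    layers = []
    if replicas > 1:                          # Step 1: multiple instances
        layers.append("load balancer")
    if internet_facing:                       # Step 2: TLS, caching, hiding topology
        layers.append("reverse proxy")
    if external_consumers and services > 1:   # Step 3: unified external API
        layers.append("api gateway")
    if services >= mesh_threshold:            # Step 4: many internal services
        layers.append("service mesh")
    return layers
```

For the startup with one API and three servers, `recommend_layers(3, True, False, 1)` yields a load balancer and a reverse proxy, which a single Nginx instance can cover. For the company with 50 microservices and external consumers, all four layers come back.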

Practical Implementation for .NET Developers

The services that sit behind these layers still need to be built and operated. In a .NET application, that typically looks like this:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.