Microservices Communication Patterns: REST vs gRPC vs Events
The moment you split a monolith into services, communication becomes the critical design decision. The protocol you choose between services determines latency, coupling, error handling complexity, and how painful debugging will be at 3 AM.
There is no single correct answer. Most production systems use all three patterns, choosing the right protocol for each interaction. The goal is to understand when each shines and when each causes pain.
REST: The Universal Default
REST over HTTP is the most widely used communication pattern. JSON payloads, standard HTTP methods (GET, POST, PUT, DELETE), and ubiquitous tooling make it the default choice for most service-to-service communication.
When REST works well:
Public APIs. Stripe, Twilio, GitHub, and virtually every SaaS company exposes a REST API. The reasons are practical: every programming language has an HTTP client, JSON is human-readable, and developers already know how REST works, so the learning curve is close to zero.
CRUD operations with moderate throughput. A user service that handles profile reads and updates at 1,000 requests per second is perfectly served by REST. The overhead of JSON serialization and HTTP headers is negligible at this scale.
Cross-team and cross-company boundaries. When the calling team does not control the called service (or vice versa), REST's simplicity and self-documenting nature reduce integration friction. A REST endpoint with clear URL paths and standard status codes is understandable without reading implementation code.
When REST hurts:
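High-throughput internal communication. JSON serialization is CPU-intensive compared with binary formats. HTTP/1.1 keeps connections alive by default, but each connection carries only one outstanding request at a time (pipelining is rarely used in practice), so clients either open many parallel connections or queue requests behind slow ones (head-of-line blocking). At 100,000+ RPS between two internal services, these overheads matter.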
Strongly typed contracts. REST relies on documentation and conventions for the request/response schema. There is no compiler-enforced contract. A field rename in the response breaks the caller silently: no compile error, just a runtime null. OpenAPI/Swagger specifications help, especially when clients are generated from them, but on their own they are documentation, not enforcement.
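A small sketch of that failure mode, assuming a hypothetical user service that renamed `full_name` to `display_name`: a lenient `dict.get` lookup returns `None` without raising, while an explicit check at the client boundary turns the rename into a loud error. The field names and the `require_fields` helper are illustrative, not part of any library.

```python
import json

# Hypothetical response after the server renamed "full_name" to "display_name".
response_body = '{"id": 42, "display_name": "Ada Lovelace"}'
user = json.loads(response_body)

# Lenient access: no exception, just a silent None flowing into the caller.
name = user.get("full_name")
print(name)  # None


def require_fields(payload: dict, *fields: str) -> dict:
    """Fail fast if the response is missing fields the caller depends on."""
    missing = [f for f in fields if f not in payload]
    if missing:
        raise ValueError(f"response missing expected fields: {missing}")
    return payload


try:
    require_fields(user, "id", "full_name")
except ValueError as exc:
    print(exc)  # response missing expected fields: ['full_name']
```

A validator generated from an OpenAPI spec does the same job more systematically, but either way the check happens at runtime, not at compile time.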
Streaming data. REST is request-response by nature. Streaming real-time updates (live scores, stock tickers, chat messages) requires workarounds such as long polling, Server-Sent Events, or WebSockets, all of which step outside the plain request-response model.
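Server-Sent Events are the lightest of these workarounds: the server keeps one HTTP response open and writes text frames in the `text/event-stream` format. A minimal frame encoder following the SSE field rules (optional `id:` and `event:` lines, one or more `data:` lines, a blank line ending each event) might look like this; `format_sse` is a hypothetical helper name.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes several data: lines; the client rejoins them with "\n".
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse('{"home": 2, "away": 1}', event="score")
print(repr(frame))  # 'event: score\ndata: {"home": 2, "away": 1}\n\n'
```

A server would write frames like this to a response with `Content-Type: text/event-stream`, flushing after each one; browsers consume them through the `EventSource` API.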
gRPC: When Performance and Contracts Matter
gRPC is a high-performance RPC framework developed by Google. It uses Protocol Buffers (protobuf) for serialization and HTTP/2 for transport. It was designed specifically for service-to-service communication in microservices architectures.
When gRPC works well:
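High-throughput internal services. Protobuf binary serialization is typically several times faster than JSON serialization and produces noticeably smaller payloads, though the exact ratios depend on message shape and on which libraries are compared. HTTP/2 multiplexing lets many concurrent RPCs share a single TCP connection without HTTP-level head-of-line blocking (packet loss can still stall every stream at the TCP layer). gRPC grew out of Stubby, the general-purpose RPC system Google has long used to connect the microservices in its data centers.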
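Part of the size difference comes from the wire format. JSON repeats every field name as text, while protobuf writes a small tag (field number plus wire type) followed by a varint or a length-prefixed value. The hand-rolled encoder below covers only two wire types (varint and length-delimited) and exists to show the layout; real services should use generated protobuf code. The message shape, `id = 1` and `name = 2`, is assumed for illustration.

```python
import json


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    # A tag is (field_number << 3) | wire_type, itself encoded as a varint.
    return encode_varint((field_number << 3) | wire_type)


def encode_user(user_id: int, name: str) -> bytes:
    """Encode a message shaped like: message User { int32 id = 1; string name = 2; }"""
    name_bytes = name.encode("utf-8")
    return (
        encode_tag(1, 0) + encode_varint(user_id)  # wire type 0: varint
        + encode_tag(2, 2) + encode_varint(len(name_bytes)) + name_bytes  # wire type 2: length-delimited
    )


proto_bytes = encode_user(150, "testing")
json_bytes = json.dumps({"id": 150, "name": "testing"}, separators=(",", ":")).encode()

print(proto_bytes.hex())  # 089601120774657374696e67
print(len(proto_bytes), len(json_bytes))  # 12 27
```

Neither field name appears in the protobuf bytes; the receiver recovers them from its own copy of the `.proto` schema, which is also why both sides must share it.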
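Strongly typed service contracts. The .proto file defines the service interface, request, and response types. Code generation produces client and server stubs in many languages. If a field's type changes in the .proto, client code regenerated from it fails to compile wherever it uses that field, so the breaking change surfaces at build time rather than in production, provided both sides generate code from the same versioned .proto. Field renames, by contrast, are safe on the binary wire format because protobuf identifies fields by number, not by name.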
Bi-directional streaming. gRPC natively supports four communication patterns: unary (request-response), server streaming (server sends a stream of responses), client streaming (client sends a stream of requests), and bi-directional streaming (both sides stream simultaneously). This makes gRPC ideal for real-time data feeds, file uploads, and chat-like interactions.
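The four call shapes differ only in which side sends a stream. The in-process sketch below models them with plain Python iterators to make the signatures concrete; the function names and the doubling/summing logic are hypothetical. A real service gets these shapes from the `stream` keyword on the request, the response, or both in a `.proto` rpc definition, and the gRPC code generator produces the matching stubs.

```python
from typing import Iterable, Iterator


def unary(request: int) -> int:
    # One request, one response.
    return request * 2


def server_streaming(request: int) -> Iterator[int]:
    # One request, a stream of responses (e.g. a live feed).
    for i in range(request):
        yield i


def client_streaming(requests: Iterable[int]) -> int:
    # A stream of requests, one response (e.g. an upload that returns a summary).
    return sum(requests)


def bidi_streaming(requests: Iterable[int]) -> Iterator[int]:
    # Both sides stream. Here each incoming message gets an immediate reply;
    # in real gRPC the two streams are independent and need not alternate.
    running = 0
    for r in requests:
        running += r
        yield running


print(unary(21))                        # 42
print(list(server_streaming(3)))        # [0, 1, 2]
print(client_streaming([1, 2, 3]))      # 6
print(list(bidi_streaming([1, 2, 3])))  # [1, 3, 6]
```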
When gRPC hurts:
Browser clients. Browser APIs do not give JavaScript the low-level HTTP/2 control that gRPC relies on, such as reading response trailers. gRPC-Web is a workaround that usually adds a translating proxy (such as Envoy) and limits functionality: it supports unary and server-streaming calls, but not client streaming or bi-directional streaming. For browser-facing APIs, REST or GraphQL remains the practical choice.
Debugging and inspection. Protobuf messages are binary and not human-readable. You cannot curl a gRPC endpoint and read the response. Debugging requires tools like grpcurl, Postman with gRPC support, or custom logging that deserializes protobuf messages. This is a real friction point during development.
Ecosystem maturity. REST has decades of tooling: API gateways, documentation generators, testing tools, monitoring dashboards. gRPC tooling is improving but still less mature. Rate limiting, authentication, and request logging often require additional middleware.
Event-Driven: When Decoupling Is the Priority
Event-driven communication uses a message broker (Kafka, RabbitMQ, SQS, Google Cloud Pub/Sub) to decouple producers from consumers. The producer emits an event, and any number of consumers process it independently.
When events work well:
Fan-out operations. When one action triggers work in multiple services, events spare the producer from knowing about or calling every consumer. An "OrderPlaced" event might be consumed by the inventory service, the email service, the analytics pipeline, the fraud detection system, and the loyalty points service. Adding a sixth consumer requires zero changes to the order service.
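The decoupling comes from the producer addressing a topic rather than a list of services. A minimal in-memory sketch of that shape follows, with a hypothetical `EventBus` standing in for Kafka or Pub/Sub (no persistence, no retries, no network, and delivery is synchronous).

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a broker topic: publishers never see subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
processed: list = []

# Each consumer registers itself; place_order below never changes.
bus.subscribe("OrderPlaced", lambda e: processed.append(("inventory", e["order_id"])))
bus.subscribe("OrderPlaced", lambda e: processed.append(("email", e["order_id"])))
bus.subscribe("OrderPlaced", lambda e: processed.append(("loyalty", e["order_id"])))


def place_order(order_id: str) -> None:
    # The producer only knows the event name, not who consumes it.
    bus.publish("OrderPlaced", {"order_id": order_id})


place_order("o-1001")
print(processed)
# [('inventory', 'o-1001'), ('email', 'o-1001'), ('loyalty', 'o-1001')]
```

Adding a fraud-detection consumer is one more `subscribe` call, with `place_order` untouched. Unlike this sketch, a real broker delivers to each consumer group independently and asynchronously, so one slow consumer does not hold up the others.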
Temporal decoupling. If a downstream service is temporarily unavailable, events queue up in the broker. When the service recovers, it processes the backlog. As long as the outage is shorter than the broker's retention period, no data is lost, and the producer is never blocked. Synchronous REST or gRPC calls cannot offer this without retries and a buffer of their own.
Load leveling. During traffic spikes, the broker absorbs the burst. Consumers process events at a sustainable rate. LinkedIn uses Kafka to absorb activity event bursts during peak hours — producers write millions of events per second, and consumers process them at a steady rate.
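Load leveling is easiest to see as a backlog over time. The deterministic simulation below assumes a burst of 10 events in the first tick, then a trickle of one event per tick, and a consumer that can process at most 3 events per tick; the numbers are arbitrary. The backlog grows during the burst and drains afterwards, while the producer never waits.

```python
from collections import deque


def simulate(arrivals: list[int], capacity_per_tick: int) -> list[int]:
    """Return the broker backlog after each tick."""
    backlog: deque = deque()
    history = []
    next_id = 0
    for arriving in arrivals:
        # Producer side: append and move on; it is never blocked by the consumer.
        for _ in range(arriving):
            backlog.append(next_id)
            next_id += 1
        # Consumer side: drain at a fixed, sustainable rate.
        for _ in range(min(capacity_per_tick, len(backlog))):
            backlog.popleft()
        history.append(len(backlog))
    return history


print(simulate([10, 1, 1, 1, 1, 0], capacity_per_tick=3))  # [7, 5, 3, 1, 0, 0]
```

A consumer outage is the same picture with capacity temporarily at zero: events wait in the broker and are processed after recovery, provided retention outlasts the outage.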
When events hurt:
Request-response interactions. When the caller needs an immediate answer ("is this credit card valid?"), event-driven communication adds unnecessary complexity. You would need to emit a "ValidateCard" event, wait for a "CardValidated" response event, and handle timeouts. This is the wrong pattern for synchronous operations.
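To see why, here is roughly what a caller has to build to ask a question over events: a correlation ID on the request, a table of pending requests, a reply handler that matches responses back to callers, and a timeout for replies that never come. The sketch uses `asyncio` with a fake in-memory broker; the event names come from the paragraph above, and everything else (`CardValidator`, the fake responder and its toy validity rule, the 0.5 s timeout) is hypothetical.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict

SendEvent = Callable[[str, dict], Awaitable[None]]


class CardValidator:
    """Request-response layered on top of fire-and-forget events (hypothetical)."""

    def __init__(self, send_event: SendEvent) -> None:
        self._send_event = send_event
        self._pending: Dict[str, asyncio.Future] = {}

    async def validate(self, card_number: str, timeout: float = 0.5) -> bool:
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = future
        try:
            await self._send_event("ValidateCard", {"correlation_id": correlation_id, "card": card_number})
            # Raises asyncio.TimeoutError if no CardValidated event arrives in time.
            return await asyncio.wait_for(future, timeout)
        finally:
            # Forget the request whether it succeeded, timed out, or failed to send.
            self._pending.pop(correlation_id, None)

    def on_card_validated(self, event: dict) -> None:
        """Handler for CardValidated events; late or unknown replies are ignored."""
        future = self._pending.get(event["correlation_id"])
        if future is not None and not future.done():
            future.set_result(event["valid"])


async def main() -> bool:
    background = set()

    async def fake_broker(event_type: str, payload: dict) -> None:
        # Pretend a payment service consumed ValidateCard and replied a little later.
        async def reply() -> None:
            await asyncio.sleep(0.01)
            # Toy rule for the demo only: numbers starting with "4" are valid.
            validator.on_card_validated(
                {"correlation_id": payload["correlation_id"], "valid": payload["card"].startswith("4")}
            )

        task = asyncio.create_task(reply())
        background.add(task)  # keep a reference until the task finishes
        task.add_done_callback(background.discard)

    validator = CardValidator(fake_broker)
    return await validator.validate("4111111111111111")


print(asyncio.run(main()))  # True
```

A synchronous REST or gRPC call gives you all of this (correlation, matching, timeout) for free, which is the point of the paragraph above.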
Ordering and exactly-once processing. Events can arrive out of order (across partitions in Kafka) or be delivered more than once (at-least-once delivery). Consumers must be idempotent and may need to handle reordering. This complexity is invisible in the happy path but causes subtle bugs in production.
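One common defensive shape is a consumer that remembers which event IDs it has already applied (to absorb redelivery) and tracks a per-entity version (to ignore stale, out-of-order updates). The sketch assumes each event carries a unique `event_id` and a monotonically increasing `version` per `account_id`; those field names are hypothetical, and in production the seen-set and versions would typically need to live in a durable store updated together with the side effect.

```python
class BalanceProjection:
    """Applies balance-update events idempotently and ignores stale versions."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.versions: dict[str, int] = {}
        self.seen_event_ids: set[str] = set()

    def handle(self, event: dict) -> bool:
        """Return True if the event changed state, False if it was skipped."""
        if event["event_id"] in self.seen_event_ids:
            return False  # duplicate delivery (at-least-once)
        self.seen_event_ids.add(event["event_id"])
        account = event["account_id"]
        if event["version"] <= self.versions.get(account, 0):
            return False  # older than what was already applied (reordering)
        self.versions[account] = event["version"]
        self.balances[account] = event["balance"]
        return True


projection = BalanceProjection()
events = [
    {"event_id": "e1", "account_id": "a", "version": 1, "balance": 100},
    {"event_id": "e3", "account_id": "a", "version": 3, "balance": 70},
    {"event_id": "e2", "account_id": "a", "version": 2, "balance": 120},  # arrives late
    {"event_id": "e3", "account_id": "a", "version": 3, "balance": 70},   # redelivered
]
print([projection.handle(e) for e in events])  # [True, True, False, False]
print(projection.balances)  # {'a': 70}
```

The events carry absolute balances rather than deltas, which is what makes skipping a stale version safe. Delta events would instead have to be applied strictly in order.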
Debugging and observability. A single user action might generate a chain of events across six services. When something goes wrong, reconstructing the flow requires correlation IDs, distributed tracing, and centralized log aggregation. The debugging experience is significantly worse than a synchronous call chain.
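The core habit behind correlation IDs is small: every event a handler emits copies the correlation ID from the event that triggered it, so one ID ties the whole chain together in the logs. Below is a sketch with hypothetical event names and an in-memory log list; in practice this is usually delegated to a tracing library such as OpenTelemetry, which propagates trace context in message headers.

```python
import uuid
from typing import Optional

log: list = []  # (correlation_id, event_type) pairs


def emit(event_type: str, parent: Optional[dict] = None) -> dict:
    """Create an event, inheriting the correlation ID from its parent if any."""
    correlation_id = parent["correlation_id"] if parent else str(uuid.uuid4())
    event = {"type": event_type, "correlation_id": correlation_id}
    log.append((correlation_id, event_type))
    return event


order = emit("OrderPlaced")
payment = emit("PaymentCaptured", parent=order)
emit("ReceiptEmailed", parent=payment)
emit("OrderPlaced")  # an unrelated order with its own ID

# Reconstruct one user action's chain by filtering on its ID.
chain = [etype for cid, etype in log if cid == order["correlation_id"]]
print(chain)  # ['OrderPlaced', 'PaymentCaptured', 'ReceiptEmailed']
```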
How Companies Combine Them
Google's internal services have long communicated over Stubby, the RPC system gRPC grew out of. Google exposes its public APIs (Google Maps, Gmail, Cloud APIs) over REST, with many Cloud APIs also offering gRPC endpoints, and uses Pub/Sub for event-driven pipelines (data processing, notifications).
Netflix uses gRPC between internal Java services and Kafka for event-driven data pipelines (viewing history, recommendation model training, content delivery decisions). Its device-facing API, consumed by mobile and TV clients, has moved over the years from REST-style endpoints to Falcor and, more recently, GraphQL.
Uber uses gRPC for real-time operations (ride matching, ETA calculation — where latency matters), REST for external partner APIs (restaurants, drivers, merchants), and Kafka for post-trip processing (payments, analytics, fraud detection, receipts).
Slack uses an HTTP/JSON Web API for third-party integrations (RPC-style methods such as chat.postMessage rather than strict REST), gRPC for internal service communication (message routing, presence updates), and Kafka for activity events (message storage, search indexing, notification delivery).
A Decision Framework
| Need | Best Pattern |
|---|---|
| Public API for external developers | REST |
| High-throughput internal service calls | gRPC |
| Real-time streaming between services | gRPC streaming |
| One event triggers multiple independent actions | Event-driven (Kafka/Pub-Sub) |
| Resilience against downstream failures | Event-driven |
| Synchronous request with immediate response | REST or gRPC |
| Cross-language internal communication | gRPC (code generation) |
| Human-readable debugging during development | REST |
| Financial or audit data with strict ordering | Event-driven (Kafka, with related events keyed to the same partition) |
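Kafka guarantees ordering only within a partition, so strict ordering for one account or ledger depends on routing all of its events to the same partition by key. The sketch shows the idea with CRC32 as a stand-in hash; Kafka's default partitioner uses murmur2, so the partition numbers here will not match a real cluster. The account IDs are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's keyed partitioning (which uses murmur2, not CRC32).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [("acct-1", "debit 10"), ("acct-2", "credit 5"), ("acct-1", "credit 3"), ("acct-1", "debit 1")]
partitions = defaultdict(list)
for key, change in events:
    partitions[partition_for(key, num_partitions=6)].append((key, change))

# Every acct-1 event lands in one partition, in the order it was produced.
p = partition_for("acct-1", 6)
print([c for k, c in partitions[p] if k == "acct-1"])  # ['debit 10', 'credit 3', 'debit 1']
```

Adding partitions later changes the key-to-partition mapping for existing keys, which is one reason partition counts for ordered topics are usually chosen up front.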
Migration Path
If you are starting with a monolith and extracting services, begin with REST for the first extracted service. REST has the lowest integration overhead and the most familiar debugging experience. As internal traffic between services grows and latency budgets tighten, migrate the highest-traffic internal paths to gRPC. Introduce events for the specific workflows that benefit from decoupling — background processing, multi-consumer fan-out, and temporal buffering.
A common and costly mistake is forcing a single protocol onto all communication. Each pattern exists because it solves a specific problem well. The skill is matching the pattern to the interaction.
Practical Implementation for .NET Developers
In a .NET application, you would typically implement these patterns with the following building blocks:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints for REST, or into gRPC service classes hosted through the Grpc.AspNetCore package. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services: Azure Cache for Redis, Azure SQL, Azure Service Bus (a managed broker for the event-driven paths), and Azure Cosmos DB. These reduce operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog so that production issues can be traced through searchable fields rather than free text:

```csharp
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
```

This gives you searchable, structured logs in Azure Monitor or Seq.
What Most Articles Get Wrong
Many articles about Microservices Communication Patterns present an oversimplified view that misses the operational reality. In production, the theoretical best practices often collide with constraints like legacy systems, team expertise, budget limitations, and compliance requirements. The engineers who successfully implement these patterns at scale are the ones who understand not just the "what" but the "when" and "when not to."
The nuance that matters: context determines everything. A pattern that works at Netflix's scale (200M+ subscribers, 1,000+ engineers) is overkill for a startup with 10,000 users and 3 engineers. Always match the solution complexity to the problem complexity.
The Numbers That Matter
- Latency percentiles matter more than averages: p99 latency often reveals problems that p50 hides
- Error budgets quantify acceptable risk: with a 99.95% availability SLO, you have about 21.9 minutes of downtime per month (averaged over a year) to spend on deployments and experiments
- Cost per request at scale determines architecture: a $0.001 cost difference per request becomes $1M per year at 1 billion requests/year
- Team cognitive load is the hidden constraint: a system your team cannot understand is a system your team cannot operate safely
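The arithmetic behind these bullets fits in a few lines. The sketch uses nearest-rank percentiles on a made-up latency sample (98 fast requests and two slow ones), computes the error budget over an average month of 365.25 / 12 = 30.4375 days (which is where 21.9 minutes comes from; a 30-day month gives 21.6), and multiplies out the per-request cost example.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in 0-100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def error_budget_minutes(slo: float, days: float = 365.25 / 12) -> float:
    """Allowed downtime in minutes for an availability SLO over `days`."""
    return (1 - slo) * days * 24 * 60


# Made-up sample: 98 requests at 20 ms, two at 900 ms.
latencies_ms = [20.0] * 98 + [900.0] * 2
mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f} p50={percentile(latencies_ms, 50)} p99={percentile(latencies_ms, 99)}")
# mean=37.6 p50=20.0 p99=900.0

print(f"{error_budget_minutes(0.9995):.1f} minutes/month")  # 21.9 minutes/month

cost_delta_per_request = 0.001  # dollars
requests_per_year = 1_000_000_000
print(f"${cost_delta_per_request * requests_per_year:,.0f} per year")  # $1,000,000 per year
```

The mean (37.6 ms) looks healthy and the median (20 ms) looks great, yet one request in fifty takes 900 ms; only the p99 exposes it.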