SDMastery
Intermediate · 6 min read · Updated 2026-06-03

Service Discovery

In microservices architectures with dynamic scaling (containers, Kubernetes), services come and go constantly.

[Figure: High-level overview of Service Discovery]

The Core Idea

Service discovery is the mechanism by which services in a distributed system find and communicate with each other. Instead of hardcoding IP addresses, services register themselves with a discovery service and query it to find other services.

Step-by-Step Walkthrough

[Figure: System architecture for Service Discovery]

When Service A starts, it registers itself with the service registry (e.g., Consul): 'I am ServiceA, running at 10.0.1.5:8080.' A second instance registers the same way at 10.0.1.6:8080. When Service B needs to call Service A, it queries the registry: 'Where is ServiceA?' The registry returns every registered instance: [10.0.1.5:8080, 10.0.1.6:8080]. Service B picks one and makes the call.
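
The register, query, and pick steps can be sketched with a toy in-memory registry. This is a minimal illustration, not how Consul or any real registry is implemented; the `ServiceRegistry` class and `pick_instance` helper are hypothetical names introduced here, and random selection is just one possible way for the caller to choose an instance.

```python
import random
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry (hypothetical): service name -> known instance addresses."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, name, address):
        # Called by an instance at startup: "I am <name>, running at <address>."
        self._instances[name].add(address)

    def deregister(self, name, address):
        # Called on graceful shutdown so callers stop receiving this address.
        self._instances[name].discard(address)

    def lookup(self, name):
        # Answers "Where is <name>?" with every registered address, sorted for stable output.
        return sorted(self._instances.get(name, ()))


def pick_instance(registry, name, rng=random):
    """Caller side: query the registry, then pick one instance at random."""
    candidates = registry.lookup(name)
    if not candidates:
        raise LookupError(f"no instances registered for {name!r}")
    return rng.choice(candidates)


registry = ServiceRegistry()
registry.register("ServiceA", "10.0.1.5:8080")
registry.register("ServiceA", "10.0.1.6:8080")
print(registry.lookup("ServiceA"))  # ['10.0.1.5:8080', '10.0.1.6:8080']
print(pick_instance(registry, "ServiceA"))  # one of the two addresses
```

A real registry would also persist and replicate this state and expire instances that stop responding.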

In Kubernetes, this is handled automatically: creating a Service object creates a DNS entry and a virtual IP (ClusterIP) that load-balances across the matching pods.
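
As a rough sketch of what a caller sees, a Service is conventionally reachable in-cluster under the DNS name `<service>.<namespace>.svc.<cluster-domain>`, where the cluster domain defaults to `cluster.local`. The service name `service-a`, the namespace `payments`, and the helper names below are assumptions made up for illustration.

```python
import socket


def k8s_service_dns(service, namespace="default", cluster_domain="cluster.local"):
    """Build the conventional in-cluster DNS name for a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(host, port):
    """Resolve a host name to a sorted list of unique IP addresses via the system resolver."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


name = k8s_service_dns("service-a", namespace="payments")  # hypothetical service
print(name)  # service-a.payments.svc.cluster.local
# Inside a cluster, resolve(name, 8080) would typically return the Service's virtual IP.
```

The caller never sees individual pod addresses here; it connects to the one stable name and the platform spreads connections across pods.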

Why This Approach Wins

  • Service registry: A central database of service instances and their network locations (Consul, etcd, ZooKeeper, Eureka).
  • Client-side discovery: The client queries the registry and load balances across instances (Netflix Eureka + Ribbon).
  • Server-side discovery: A load balancer queries the registry and routes requests (AWS ALB, Kubernetes Service).
  • DNS-based discovery: Services register DNS records. Clients resolve DNS to find instances (Consul DNS, Kubernetes CoreDNS).
  • Health checking: The registry removes unhealthy instances so clients do not route to dead services.
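
Health checking can be sketched in several ways; one common approach is a heartbeat with a time-to-live (TTL), where an instance that stops renewing its entry is dropped. The sketch below assumes that model. `HeartbeatRegistry` and the 10-second TTL are hypothetical choices for illustration, and the clock is injectable so the example runs without real waiting.

```python
import time


class HeartbeatRegistry:
    """Toy registry that treats an instance as healthy only while heartbeats keep arriving."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # (service name, address) -> time of last heartbeat

    def heartbeat(self, name, address):
        # Registration and renewal are the same operation: record "seen now".
        self._last_seen[(name, address)] = self._clock()

    def healthy_instances(self, name):
        now = self._clock()
        # Evict entries whose last heartbeat is older than the TTL.
        expired = [key for key, seen in self._last_seen.items() if now - seen > self._ttl]
        for key in expired:
            del self._last_seen[key]
        return sorted(addr for (svc, addr) in self._last_seen if svc == name)


fake_now = [0.0]  # fake clock so the example is deterministic
registry = HeartbeatRegistry(ttl_seconds=10.0, clock=lambda: fake_now[0])
registry.heartbeat("ServiceA", "10.0.1.5:8080")
registry.heartbeat("ServiceA", "10.0.1.6:8080")
fake_now[0] = 8.0
registry.heartbeat("ServiceA", "10.0.1.5:8080")  # only this instance renews
fake_now[0] = 12.0
print(registry.healthy_instances("ServiceA"))  # ['10.0.1.5:8080']
```

The TTL sets the tradeoff: a short TTL removes dead instances quickly but risks evicting healthy ones during brief network hiccups.
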

[Figure: How Service Discovery works step by step]

In Production

Netflix uses Eureka for service discovery across thousands of microservices in AWS.

Kubernetes stores Service and endpoint objects in etcd (through the API server), with CoreDNS providing DNS-based discovery and kube-proxy load balancing across pods.

HashiCorp Consul provides service discovery with health checking, used by companies like Stripe and Twitch.

[Figure: Comparing key aspects of Service Discovery]

Tradeoffs and Limitations

  • Client-side vs Server-side: Client-side is more flexible but couples clients to the registry. Server-side is simpler for clients.
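  • Consistency vs Availability: The registry must be highly available; if it goes down, services cannot find each other. Registries built on consensus (ZooKeeper, etcd, Consul) favor consistent data and can refuse requests when quorum is lost, while Eureka favors availability and may serve stale entries.
  • Push vs Pull: The registry can push updates to clients (faster propagation, but it needs long-lived connections or watches), or clients can poll (simpler, but changes are visible only after the next poll).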
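
The push vs pull difference can be made concrete with a small sketch. All class names here (`WatchableRegistry`, `PollingClient`, `PushClient`) are hypothetical; real registries implement push with mechanisms such as watches or long-lived connections rather than in-process callbacks.

```python
class WatchableRegistry:
    """Toy registry supporting both pull (lookup) and push (watch callbacks)."""

    def __init__(self):
        self._instances = {}
        self._watchers = {}

    def lookup(self, name):
        return list(self._instances.get(name, []))

    def watch(self, name, callback):
        self._watchers.setdefault(name, []).append(callback)

    def set_instances(self, name, addresses):
        self._instances[name] = list(addresses)
        # Push: notify every watcher as soon as the instance list changes.
        for callback in self._watchers.get(name, []):
            callback(list(addresses))


class PollingClient:
    """Pull model: the local view changes only when poll() is called."""

    def __init__(self, registry, name):
        self._registry = registry
        self._name = name
        self.instances = []

    def poll(self):
        self.instances = self._registry.lookup(self._name)


class PushClient:
    """Push model: the registry calls back whenever the instance list changes."""

    def __init__(self, registry, name):
        self.instances = registry.lookup(name)
        registry.watch(name, self._on_change)

    def _on_change(self, addresses):
        self.instances = addresses


registry = WatchableRegistry()
poller = PollingClient(registry, "ServiceA")
pusher = PushClient(registry, "ServiceA")
registry.set_instances("ServiceA", ["10.0.1.5:8080"])
print(pusher.instances)  # ['10.0.1.5:8080'] (updated immediately)
print(poller.instances)  # [] (stale until the next poll)
poller.poll()
print(poller.instances)  # ['10.0.1.5:8080']
```

With polling, the worst-case staleness equals the poll interval, which is the delay the bullet above refers to.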

Production Gotchas

  1. Making the service registry a single point of failure
  2. Not implementing health checks — dead instances remain in the registry
  3. Hardcoding service addresses instead of using discovery
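
One way to soften the first gotcha, alongside running the registry as a replicated cluster, is for clients to keep the last successful answer and fall back to it when the registry is unreachable. The sketch below assumes a `lookup` callable that raises `ConnectionError` on failure; `CachingDiscoveryClient` and `flaky_lookup` are hypothetical names introduced for illustration.

```python
class CachingDiscoveryClient:
    """Serves the last known-good instance list when the registry cannot be reached."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable(name) -> list of addresses; may raise ConnectionError
        self._cache = {}

    def resolve(self, name):
        try:
            addresses = list(self._lookup(name))
        except ConnectionError:
            if name in self._cache:
                # Registry is down: fall back to possibly stale data instead of failing.
                return list(self._cache[name])
            raise  # nothing cached yet, so the failure must surface
        self._cache[name] = addresses  # remember the latest successful answer
        return list(addresses)


state = {"up": True}


def flaky_lookup(name):
    # Stand-in for a network call to the registry.
    if not state["up"]:
        raise ConnectionError("registry unreachable")
    return ["10.0.1.5:8080", "10.0.1.6:8080"]


client = CachingDiscoveryClient(flaky_lookup)
print(client.resolve("ServiceA"))  # fresh answer from the registry
state["up"] = False
print(client.resolve("ServiceA"))  # same list, now served from the cache
```

Cached entries can go stale, so this works best combined with client-side timeouts and retries against another instance.
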

[Figure: Data flow through Service Discovery]

The Interview Angle

  1. How does service discovery work?
  2. What is the difference between client-side and server-side discovery?
  3. How does Kubernetes handle service discovery?
  4. What happens when a service instance fails?

[Figure: Key components of Service Discovery]

The Real-World Incident That Made This Famous

Understanding Service Discovery became critical after multiple high-profile production incidents at major tech companies. At the scale of millions of users, a discovery failure rarely stays local: when callers cannot resolve a dependency, or keep routing to instances that are already gone, errors cascade to every upstream service, costing revenue and user trust. Companies like Netflix, Google, Amazon, and Meta have all invested heavily in Service Discovery because they learned the hard way that ignoring it leads to outages.

The key lesson from these incidents: Service Discovery is not just a theoretical concept. It is a practical skill that separates engineers who build resilient systems from those who build fragile ones.

[Figure: Interview tips for Service Discovery]

How Senior Engineers Think About This

Senior engineers approach Service Discovery differently from textbook definitions. Instead of memorizing rules, they build mental models. They ask: "What problem does Service Discovery solve? When does it fail? What are the alternatives?" This problem-first thinking leads to better design decisions because every system has unique constraints.

When evaluating Service Discovery in a system design context, experienced engineers consider the failure modes first. What happens when this component goes down? How does the system degrade? Is the degradation graceful or catastrophic? These questions reveal more about your understanding than any textbook definition.

Common Interview Mistakes

[Figure: When to use Service Discovery]

Mistake 1: Giving a textbook definition without context. Interviewers want to see you connect Service Discovery to real systems and real problems.

Mistake 2: Not discussing trade-offs. Every design decision involving Service Discovery has trade-offs. Discuss what you gain and what you give up.

Mistake 3: Overcomplicating the solution. Start with the simplest approach to Service Discovery that meets the requirements, then add complexity only when justified.

Production Checklist

[Figure: Advantages and disadvantages of Service Discovery]

  • Define clear metrics for measuring the effectiveness of your Service Discovery implementation
  • Set up monitoring and alerting that specifically tracks Service Discovery-related failures
  • Document your Service Discovery design decisions in Architecture Decision Records (ADRs)
  • Test failure scenarios related to Service Discovery in staging before production deployment
  • Review and update your Service Discovery implementation quarterly as system requirements evolve
  • Train new team members on the specific Service Discovery patterns used in your system

Read the original source | Content from System-Design-Overview

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

[Figure: Real-world examples of Service Discovery]

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

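```csharp
// Serilog message template: OrderId and CustomerId are captured as structured, searchable properties.
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
```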

This gives you searchable, structured logs in Azure Monitor or Seq.
