SDMastery
Advanced | 7 min read | Updated 2026-06-03

Change Data Capture (CDC)

CDC enables real-time data synchronization between databases, caches, search indexes, and analytics systems without tight coupling.

[Figure: High-level overview of Change Data Capture (CDC), showing key components and metrics]

The Problem Change Data Capture (CDC) Solves

Most systems keep copies of the same data in several places: a primary database, a cache, a search index, an analytics store. CDC keeps those copies synchronized in near real time without coupling every writer to every copy. It is how many modern systems keep Elasticsearch in sync with PostgreSQL, or update a cache when the database changes.

How It Works Under the Hood

[Figure: System architecture for Change Data Capture (CDC), with service components and data flow]

Change Data Capture (CDC) is a pattern that tracks changes (inserts, updates, deletes) in a database and streams them as events to other systems. In its most common, log-based form, CDC does not poll the tables for changes; it reads the database's transaction log (WAL/binlog) and publishes events in near real time.

With PostgreSQL, for example, Debezium reads the WAL through a logical replication slot. When a row is inserted, updated, or deleted, Debezium captures the change and publishes it to a Kafka topic named after the table (e.g., dbserver1.public.orders). Downstream consumers (an Elasticsearch indexer, a cache updater, an analytics loader) subscribe to the topic and update their own stores.
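As a rough illustration of the connector setup described above, the sketch below builds the JSON payload that would typically be sent to the Kafka Connect REST API to register a Debezium PostgreSQL connector. The property names follow the Debezium 2.x PostgreSQL connector as commonly documented (older versions use different names for some of them), so check them against the version you run. The host, user, password, and slot name are hypothetical placeholders.

```python
import json


def build_postgres_connector(name, host, dbname, tables, topic_prefix):
    """Build a registration payload for a Debezium PostgreSQL connector.

    All concrete values (user, password, slot name) are placeholders
    chosen for illustration only.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",          # logical decoding plugin built into PostgreSQL 10+
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "cdc_user",        # hypothetical account with replication rights
            "database.password": "change-me",   # placeholder, never hard-code real secrets
            "database.dbname": dbname,
            "slot.name": "debezium_orders",     # hypothetical replication slot name
            "topic.prefix": topic_prefix,
            "table.include.list": ",".join(tables),
        },
    }


def topic_for(topic_prefix, qualified_table):
    """Topic naming convention: <prefix>.<schema>.<table>."""
    return f"{topic_prefix}.{qualified_table}"


payload = build_postgres_connector(
    name="orders-connector",
    host="db.internal",
    dbname="shop",
    tables=["public.orders"],
    topic_prefix="dbserver1",
)
print(json.dumps(payload, indent=2))
print(topic_for("dbserver1", "public.orders"))
```

With the prefix dbserver1 and the table public.orders, the resulting topic name matches the example used in the text.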

This creates a real-time data pipeline: Database change → WAL → CDC connector → Kafka → Consumers.
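The consumer end of that pipeline can be sketched in a few lines. The example below assumes events shaped like Debezium's change envelope (an op code plus before and after row images) and uses a plain dict as a stand-in for the downstream store, so no Kafka client is involved. The function name apply_change and the sample rows are illustrative, not part of any library.

```python
def apply_change(index, event, key="id"):
    """Apply one Debezium-style change event to a dict acting as the downstream store.

    op codes: "c" = create, "r" = snapshot read, "u" = update, "d" = delete.
    Other op codes are rejected in this sketch.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]          # new state of the row
        index[row[key]] = row
    elif op == "d":
        row = event["before"]         # last known state; only the key is needed
        index.pop(row[key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return index


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

index = {}
for event in events:
    apply_change(index, event)

print(index)  # {1: {'id': 1, 'status': 'paid'}}
```

In a real deployment the loop would read from the Kafka topic and write to Elasticsearch or a cache, but the per-event logic stays the same: upsert on create or update, remove on delete.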

The Mental Model

[Figure: How Change Data Capture (CDC) works step by step]
  • Log-based CDC: Reads the database's write-ahead log (PostgreSQL WAL, MySQL binlog). The most reliable approach: it captures every committed change, including deletes, and adds little load to the database because it does not query the tables.
  • Trigger-based CDC: Database triggers fire on each change and write a row to a change table. Simpler to set up, but every write to a tracked table now pays for an extra write.
  • Polling-based CDC: Periodically queries the database for recent changes (WHERE updated_at > last_poll). Simple, but not real-time, and it cannot see hard deletes or intermediate states between polls.
  • Debezium: A widely used open-source CDC tool. Reads MySQL/PostgreSQL/MongoDB logs and streams the changes to Kafka.
  • Event sourcing vs CDC: Event sourcing means designing your system around events from the start. CDC retrofits event streaming onto an existing database.
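The gap between polling and trigger-based capture can be made concrete with a small sketch. It uses SQLite from the Python standard library as a stand-in for a production database, integer logical timestamps instead of wall-clock time, and hypothetical table names (orders, orders_changes). It is meant only to show which changes each approach can observe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER);

-- Change table filled by triggers (trigger-based CDC).
CREATE TABLE orders_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, order_id INTEGER, status TEXT
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changes (op, order_id, status) VALUES ('c', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changes (op, order_id, status) VALUES ('u', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changes (op, order_id, status) VALUES ('d', OLD.id, OLD.status);
END;
""")


def poll(conn, last_poll):
    """Polling-based CDC: return rows modified after the previous poll."""
    return conn.execute(
        "SELECT id, status FROM orders WHERE updated_at > ? ORDER BY id",
        (last_poll,),
    ).fetchall()


# Four changes happen between two polls.
conn.execute("INSERT INTO orders VALUES (1, 'new', 10)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 11)")
conn.execute("UPDATE orders SET status = 'paid', updated_at = 12 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

polled = poll(conn, last_poll=0)
captured = conn.execute(
    "SELECT op, order_id, status FROM orders_changes ORDER BY seq"
).fetchall()

print(polled)    # [(1, 'paid')]
print(captured)  # [('c', 1, 'new'), ('c', 2, 'new'), ('u', 1, 'paid'), ('d', 2, 'new')]
```

The poll sees only the final state of order 1 and has no trace of order 2, while the change table records all four operations at the cost of one extra write per change.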

Real Systems That Depend on This

LinkedIn pioneered CDC at scale, using Databus to stream database changes to search indexes and caches.

Uber uses Debezium + Kafka for real-time data synchronization across hundreds of microservices.

[Figure: Comparing key aspects of Change Data Capture (CDC), with key metrics and tradeoffs]

Airbnb uses CDC to keep their search index (Elasticsearch) in sync with their primary database (MySQL).

Where This Shows Up in Interviews

  1. What is CDC and when would you use it?
  2. How does log-based CDC work?
  3. What are the advantages of CDC over polling?
  4. How do you handle schema changes with CDC?

Tradeoffs

[Figure: Data flow through Change Data Capture (CDC)]
  • Complexity: CDC adds a data pipeline to manage (Debezium, Kafka, consumers).
  • Latency: Log-based CDC typically delivers changes with sub-second latency; polling lags by up to a full polling interval, which can mean minutes of delay.
  • Schema evolution: Database schema changes can break CDC consumers if not handled carefully.
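One way to soften the schema-evolution tradeoff is to have each consumer read only the fields it needs and tolerate renamed or newly added columns. The sketch below is an assumption about how that could look, not a Debezium feature: FIELD_ALIASES, DEFAULTS, and the column names are all hypothetical.

```python
# Hypothetical mapping: consumer field -> source column names it accepts,
# newest name first. A rename is handled by adding the new name here
# before the database migration is rolled out.
FIELD_ALIASES = {
    "order_id": ["id"],
    "status": ["order_status", "status"],
    "total_cents": ["total_cents"],
}

# Fields that may be absent in older events, with a fallback value.
DEFAULTS = {"total_cents": 0}


def project(row):
    """Extract the consumer's view of a row, ignoring unknown columns."""
    out = {}
    for field, candidates in FIELD_ALIASES.items():
        for column in candidates:
            if column in row:
                out[field] = row[column]
                break
        else:
            # No candidate column present: use a default or fail loudly.
            if field not in DEFAULTS:
                raise KeyError(f"required field {field!r} missing from row {sorted(row)}")
            out[field] = DEFAULTS[field]
    return out


old_row = {"id": 7, "status": "paid"}
new_row = {"id": 7, "order_status": "paid", "total_cents": 1250, "coupon": "X"}

print(project(old_row))  # {'order_id': 7, 'status': 'paid', 'total_cents': 0}
print(project(new_row))  # {'order_id': 7, 'status': 'paid', 'total_cents': 1250}
```

Both the pre-migration and post-migration row shapes map to the same consumer view, and a row missing a required field fails immediately instead of silently writing incomplete data downstream.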

Watch Out For

  1. Using polling when log-based CDC is available: polling misses deletes and any intermediate states between polls
  2. Not handling schema evolution: a column rename can break every consumer that still reads the old name
  3. Not monitoring CDC lag: downstream systems can fall behind without anyone noticing
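For the third pitfall, a minimal lag check could look like the sketch below. It assumes Debezium-style events, where source.ts_ms carries the time the change was made in the database, and compares that with the time the consumer processes the event. The 5000 ms threshold and the function names are arbitrary choices for illustration.

```python
import time


def lag_ms(event, now_ms=None):
    """Milliseconds between the database change and the consumer seeing it."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(0, now_ms - event["source"]["ts_ms"])


def check_lag(event, threshold_ms=5000, now_ms=None):
    """Return ("OK" | "ALERT", lag in ms) for a single event."""
    lag = lag_ms(event, now_ms)
    status = "ALERT" if lag > threshold_ms else "OK"
    return status, lag


event = {
    "op": "u",
    "source": {"ts_ms": 1_700_000_000_000},  # when the row changed in the database
    "ts_ms": 1_700_000_000_150,              # when the connector processed the change
}

print(check_lag(event, now_ms=1_700_000_000_400))  # ('OK', 400)
print(check_lag(event, now_ms=1_700_000_009_000))  # ('ALERT', 9000)
```

In practice the lag value would be exported as a metric and alerted on over a time window, since a single slow event is less interesting than a lag that keeps growing.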

Go Deeper

[Figure: Key components of Change Data Capture (CDC), with roles and responsibilities]

The Real-World Incident That Made This Famous

Understanding Change Data Capture became critical after multiple high-profile production incidents at major tech companies. When a system serves millions of users, a stalled connector, unnoticed lag, or an unhandled schema change leaves caches and search indexes serving stale data, and failures of that kind can cascade into lost revenue and eroded user trust. Companies such as Netflix, Google, Amazon, and Meta have invested heavily in Change Data Capture for that reason.

[Figure: Interview tips for Change Data Capture (CDC) system design questions]

The key lesson from these incidents: Change Data Capture is not just a theoretical concept — it is a practical skill that separates engineers who build resilient systems from those who build fragile ones.

How Senior Engineers Think About This

Senior engineers approach Change Data Capture differently from textbook definitions. Instead of memorizing rules, they build mental models. They ask: "What problem does Change Data Capture solve? When does it fail? What are the alternatives?" This problem-first thinking leads to better design decisions because every system has unique constraints.

When evaluating Change Data Capture in a system design context, experienced engineers consider the failure modes first. What happens when this component goes down? How does the system degrade? Is the degradation graceful or catastrophic? These questions reveal more about your understanding than any textbook definition.
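One concrete failure mode worth reasoning through is redelivery: after a connector or consumer restart, the same change events may arrive again, because delivery in this kind of pipeline is generally at-least-once. The sketch below shows one way a consumer could stay correct under replay by remembering the last log position applied per key. It assumes PostgreSQL-style events that carry a monotonically increasing source.lsn; the function and variable names are hypothetical.

```python
def apply_once(store, positions, event, key="id"):
    """Apply an event only if it is newer than what was already applied for its key.

    Returns True if the event changed the store, False if it was skipped
    as a duplicate or stale replay.
    """
    row = event["after"] if event["after"] is not None else event["before"]
    k = row[key]
    lsn = event["source"]["lsn"]
    if positions.get(k, -1) >= lsn:
        return False                  # already applied this change or a later one
    if event["op"] == "d":
        store.pop(k, None)
    else:
        store[k] = event["after"]
    positions[k] = lsn
    return True


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}, "source": {"lsn": 100}},
    {"op": "u", "before": None, "after": {"id": 1, "status": "paid"}, "source": {"lsn": 101}},
]

store, positions = {}, {}
first_pass = [apply_once(store, positions, e) for e in events]
# Simulate a restart that replays both events from the beginning.
replay = [apply_once(store, positions, e) for e in events]

print(first_pass)  # [True, True]
print(replay)      # [False, False]
print(store)       # {1: {'id': 1, 'status': 'paid'}}
```

Without the position check, the replayed create event would briefly roll order 1 back to "new", which is exactly the kind of non-graceful degradation the questions above are meant to surface.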

[Figure: When to use Change Data Capture (CDC) and when to avoid it]

Common Interview Mistakes

Mistake 1: Giving a textbook definition without context. Interviewers want to see you connect Change Data Capture to real systems and real problems.

Mistake 2: Not discussing trade-offs. Every design decision involving Change Data Capture has trade-offs. Discuss what you gain and what you give up.

Mistake 3: Overcomplicating the solution. Start with the simplest approach to Change Data Capture that meets the requirements, then add complexity only when justified.

[Figure: Advantages and disadvantages of Change Data Capture (CDC)]

Production Checklist

  • Define clear metrics for measuring the effectiveness of your Change Data Capture implementation
  • Set up monitoring and alerting that specifically tracks Change Data Capture-related failures
  • Document your Change Data Capture design decisions in Architecture Decision Records (ADRs)
  • Test failure scenarios related to Change Data Capture in staging before production deployment
  • Review and update your Change Data Capture implementation quarterly as system requirements evolve
  • Train new team members on the specific Change Data Capture patterns used in your system

Read the original source | Content from System-Design-Overview

Practical Implementation for .NET Developers

[Figure: Real-world examples of Change Data Capture (CDC) in production systems]

In a .NET application, you would typically build the consuming side of a CDC pipeline using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

```csharp
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
```

This gives you searchable, structured logs in Azure Monitor or Seq.
