Discord Message Storage: From MongoDB to ScyllaDB

Discord's journey from MongoDB to Cassandra to ScyllaDB — how they scaled message storage for trillions of messages across millions of channels.

Company Context

Discord serves over 150 million monthly active users exchanging billions of messages daily. Messages must be stored durably, served with low latency, and queried efficiently — primarily by channel, with the most recent messages displayed first. The message storage system is one of Discord's most critical infrastructure components.

The Problem at Scale

System architecture diagram for Discord Message Storage: From MongoDB to ScyllaDB showing how services, databases, and caches connect — System architecture for Discord Message Storage: From MongoDB to ScyllaDB

Discord's first message store was MongoDB, which worked well initially but could not handle the growing data volume. They migrated to Cassandra, choosing it for its write throughput and linear scalability. Cassandra worked for years, but as Discord grew to trillions of stored messages, operational pain increased. Cassandra's JVM-based architecture caused unpredictable GC pauses that spiked tail latencies. Compaction on large datasets consumed enormous I/O. Maintenance operations like repair and anti-entropy became prohibitively expensive. Read latencies on cold data were unacceptable for Discord's real-time experience.

Architecture Solution

Discord migrated from Cassandra to ScyllaDB, a C++ reimplementation of Cassandra's protocol that eliminates JVM garbage collection and uses a shard-per-core architecture for predictable performance.

Step-by-step diagram showing how Discord Message Storage: From MongoDB to ScyllaDB processes a request from start to finish — How Discord Message Storage: From MongoDB to ScyllaDB works step by step

The data model partitions messages by (channel_id, bucket), where a bucket is a time window (roughly 10 days). This ensures each partition stays bounded in size — a critical design for LSM-tree databases where large partitions cause compaction problems. The primary key within each partition is the message snowflake ID (a time-based unique identifier), which naturally sorts messages chronologically.

A data services layer sits between the application servers and ScyllaDB. This layer handles request coalescing (if 100 users open the same channel simultaneously, only one database query is issued), consistent routing, and write-behind caching. This service eliminated hot-partition thundering herd problems that previously overwhelmed the database during popular events.

The migration itself was performed with zero downtime using a dual-write strategy: writes went to both Cassandra and ScyllaDB, then reads were gradually shifted over after data validation confirmed consistency.

Comparison table for Discord Message Storage: From MongoDB to ScyllaDB contrasting approaches, tradeoffs, and when to use each — Comparing key aspects of Discord Message Storage: From MongoDB to ScyllaDB

Key Techniques Used

Partition key design: channel_id + time bucket keeps partitions bounded
Snowflake IDs: Time-sortable unique identifiers eliminate the need for a separate sort key
Request coalescing: Data services layer deduplicates identical concurrent queries
Shard-per-core architecture: ScyllaDB assigns each CPU core its own data shard, eliminating lock contention
Dual-write migration: Zero-downtime migration by writing to both databases simultaneously
Write-behind caching: Reduces read load on the database for recently accessed channels

Lessons for System Design Interviews

Data flow diagram for Discord Message Storage: From MongoDB to ScyllaDB showing how requests and responses move through the system — Data flow through Discord Message Storage: From MongoDB to ScyllaDB

This case study is a strong reference for "design a chat system" questions. Demonstrate awareness that partition key design is the most impactful decision in wide-column databases. Show that you understand why time-bucketing prevents unbounded partition growth. Mention the data services layer as an example of application-level caching and request deduplication, which is often more effective than database-level optimization.

Lessons for Production

Database technology choices are not permanent — Discord migrated twice as their scale changed. The data services layer was arguably more impactful than the database migration itself, showing that a well-designed application layer can compensate for database limitations. Time-bucketed partition keys are a proven pattern for time-series-like data in wide-column stores.

Component diagram for Discord Message Storage: From MongoDB to ScyllaDB showing each building block and its responsibility — Key components of Discord Message Storage: From MongoDB to ScyllaDB

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Interview preparation checklist for Discord Message Storage: From MongoDB to ScyllaDB with key points to mention and mistakes to avoid — Interview tips for Discord Message Storage: From MongoDB to ScyllaDB

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Decision guide for when to choose Discord Message Storage: From MongoDB to ScyllaDB and when alternative approaches are better — When to use Discord Message Storage: From MongoDB to ScyllaDB

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Tradeoff analysis for Discord Message Storage: From MongoDB to ScyllaDB listing advantages, disadvantages, and real-world considerations — Advantages and disadvantages of Discord Message Storage: From MongoDB to ScyllaDB

Key Takeaways for Interviews

Understand the core problem this resource addresses and be able to explain it in 2-3 sentences without jargon
Know the key trade-offs: what does this approach optimize for, and what does it sacrifice?
Be ready to compare this with alternative approaches and explain when each is appropriate
Connect the concepts to real-world systems you have worked with or studied
Demonstrate depth by discussing failure modes and how they are handled

How This Applies to Modern .NET Systems

Production deployment examples of Discord Message Storage: From MongoDB to ScyllaDB at companies like Netflix, Google, and Amazon — Real-world examples of Discord Message Storage: From MongoDB to ScyllaDB

The concepts from this resource translate to .NET through several established libraries and patterns:

Azure managed services often abstract away the underlying distributed systems complexity, but understanding the fundamentals helps you configure them correctly, debug issues, and make informed architectural decisions.

NuGet packages in the .NET ecosystem provide production-ready implementations of many patterns described in this resource. Before building custom solutions, check if a well-maintained package already exists.

ASP.NET Core middleware pipeline is where many of these patterns are implemented in practice: caching, rate limiting, health checks, and circuit breaking all fit naturally into the middleware model.

Sources

How Discord Stores Trillions of Messages — Discord Engineering, 2023

Sources

Original Sourcearticle