How Discord Stores Trillions of Messages
How Discord Stores Trillions of Messages
Discord handles over 4 billion messages per day across millions of servers (guilds). Their messaging infrastructure has evolved through three database generations, each solving problems the previous one could not handle.
The Journey
Phase 1: MongoDB (2015-2017)
Discord started with MongoDB. It was fast to develop with and handled early scale. But as Discord grew, they hit problems:
- Data exceeded RAM: MongoDB performance degrades when the working set does not fit in memory
- Unpredictable latency: Garbage collection pauses caused latency spikes
- Difficult to scale: Sharding MongoDB required significant application changes
Phase 2: Cassandra (2017-2022)
Discord migrated to Apache Cassandra — a distributed wide-column store designed for high write throughput.
Data model:
- Partition key: channel_id + message_bucket (bucket = time window, e.g., 10 days)
- Clustering key: message_id (Snowflake ID — sortable by time)
- Each partition holds 10 days of messages for one channel
Why buckets? Without bucketing, a popular channel's partition would grow unbounded (millions of messages). Buckets cap partition size, keeping reads efficient.
What worked:
- Linear write scaling (add nodes = more throughput)
- Predictable latency for recent messages (single partition read)
- Automatic replication across data centers
What went wrong at extreme scale:
- Hot partitions: Popular channels (100M+ member servers) created uneven load
- GC pressure: Cassandra's JVM-based architecture caused latency spikes during compaction and garbage collection
- Read amplification: Reading old messages required scanning multiple SSTables
Phase 3: ScyllaDB (2022-present)
Discord migrated to ScyllaDB — a Cassandra-compatible database written in C++ with a shard-per-core architecture. Zero garbage collection. Predictable microsecond latencies.
Key improvements:
- p99 latency dropped from 40-125ms to 15ms
- Same data model (Cassandra CQL compatible — minimal application changes)
- Fewer nodes: ScyllaDB's efficiency means fewer servers for the same workload
- No GC pauses: C++ runtime eliminates the JVM garbage collection problem entirely
Architecture Insights
Message ID: Snowflake
Discord uses Snowflake IDs — 64-bit integers encoding:
- Timestamp (42 bits) — messages are inherently time-sorted
- Worker ID (5 bits) — which server generated the ID
- Process ID (5 bits)
- Sequence (12 bits) — 4096 IDs per millisecond per worker
This means: no coordination needed for ID generation, IDs are sortable by time, and they are compact (8 bytes vs 36 bytes for UUID).
Data Services Layer
Discord does not query ScyllaDB directly from application code. A data services layer sits between the application and database:
- Coalesces duplicate requests (100 users requesting the same channel's messages = 1 DB query)
- Provides request routing and caching
- Abstracts the database, enabling migration without touching application code
Key Takeaways
- Choose databases for your access pattern: Cassandra/ScyllaDB excels at time-series data partitioned by entity
- Bucket your partitions: Unbounded partition growth kills performance
- JVM GC matters at scale: For latency-sensitive workloads, consider non-JVM databases
- Add a data services layer: Abstracts the database and enables optimizations like request coalescing
Sources
Putting This Into Practice
Understanding the theory is only half the battle. Here is how to apply these concepts in your daily work:
Start small. Pick one project or one component of your current system and apply the ideas from this article. Do not try to redesign everything at once.
Document your decisions. When you make an architectural choice, write a short ADR (Architecture Decision Record) explaining what you chose, why, and what alternatives you considered. Future you will thank present you.
Talk to your team. System design is a team sport. Share what you learn, discuss tradeoffs openly, and build shared understanding. The best architectures come from teams that communicate well, not from lone geniuses.
Key Takeaways
- Every design decision involves tradeoffs — there is no perfect solution
- Start simple and evolve as requirements grow
- Measure before optimizing — premature optimization wastes engineering time
- Learn from production incidents — they teach you more than any textbook
- Practice explaining your reasoning — this is what interviews test
Why Cassandra Was Not Enough
Discord ran on Cassandra for five years before the problems became critical. The JVM-based runtime caused unpredictable garbage collection pauses — sometimes 200ms or more — that directly affected message delivery latency. During these pauses, message delivery would stall for all users on the affected node.
Compaction was another issue. Cassandra periodically merges SSTables (sorted string tables) to reclaim space and improve read performance. But compaction is I/O intensive, and during compaction runs, read latency spiked dramatically. At Discord's scale (trillions of messages), compaction ran almost continuously.
The final straw was operational complexity. Repairing inconsistencies between replicas required running nodetool repair, which at Discord's data volume took days to complete and consumed significant cluster resources during the process.
The ScyllaDB Migration
ScyllaDB provided the same Cassandra CQL interface (so application code barely changed) but with a C++ runtime that eliminated garbage collection entirely. The shard-per-core architecture assigns each CPU core its own portion of data, eliminating lock contention.
The migration took several months. Discord ran both databases in parallel, writing to both and reading from Cassandra while validating ScyllaDB responses. Once they confirmed consistency, they cut reads over to ScyllaDB. The result: p99 latency dropped from 40-125ms to a consistent 15ms.
Lessons for Your Systems
The Discord story teaches three things: First, your initial database choice does not need to be perfect — you can migrate later if you plan for it. Second, JVM-based databases have inherent latency variability from garbage collection that C/C++ alternatives avoid. Third, the data services abstraction layer that sits between your application and database is not just good architecture — it is what makes migration possible without rewriting your entire application.
Practical Implementation for .NET Developers
In a .NET application, you would typically implement this pattern using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Explore More
What Most Articles Get Wrong
Many articles about How Discord Stores Trillions Of Messages present an oversimplified view that misses the operational reality. In production, the theoretical best practices often collide with constraints like legacy systems, team expertise, budget limitations, and compliance requirements. The engineers who successfully implement these patterns at scale are the ones who understand not just the "what" but the "when" and "when not to."
The nuance that matters: context determines everything. A pattern that works at Netflix's scale (200M users, 1000+ engineers) is overkill for a startup with 10,000 users and 3 engineers. Always match the solution complexity to the problem complexity.
The Numbers That Matter
- Latency percentiles matter more than averages: p99 latency often reveals problems that p50 hides
- Error budgets quantify acceptable risk: if your SLA is 99.95%, you have 21.9 minutes of downtime per month to spend on deployments and experiments
- Cost per request at scale determines architecture: a $0.001 cost difference per request becomes $1M per year at 1 billion requests/year
- Team cognitive load is the hidden constraint: a system your team cannot understand is a system your team cannot operate safely