How Discord Stores Trillions of Messages

2025-02-209 min read

How Discord Stores Trillions of Messages

Discord handles over 4 billion messages per day across millions of servers (guilds). Their messaging infrastructure has evolved through three database generations, each solving problems the previous one could not handle.

The Journey

System architecture diagram for How Discord Stores Trillions of Messages showing how services, databases, and caches connect — System architecture for How Discord Stores Trillions of Messages

Phase 1: MongoDB (2015-2017)

Discord started with MongoDB. It was fast to develop with and handled early scale. But as Discord grew, they hit problems:

Data exceeded RAM: MongoDB performance degrades when the working set does not fit in memory
Unpredictable latency: Garbage collection pauses caused latency spikes
Difficult to scale: Sharding MongoDB required significant application changes

Phase 2: Cassandra (2017-2022)

Discord migrated to Apache Cassandra — a distributed wide-column store designed for high write throughput.

Data model:

Partition key: channel_id + message_bucket (bucket = time window, e.g., 10 days)
Clustering key: message_id (Snowflake ID — sortable by time)
Each partition holds 10 days of messages for one channel

Why buckets? Without bucketing, a popular channel's partition would grow unbounded (millions of messages). Buckets cap partition size, keeping reads efficient.

What worked:

Linear write scaling (add nodes = more throughput)
Predictable latency for recent messages (single partition read)
Automatic replication across data centers

What went wrong at extreme scale:

Hot partitions: Popular channels (100M+ member servers) created uneven load
GC pressure: Cassandra's JVM-based architecture caused latency spikes during compaction and garbage collection
Read amplification: Reading old messages required scanning multiple SSTables

Phase 3: ScyllaDB (2022-present)

Discord migrated to ScyllaDB — a Cassandra-compatible database written in C++ with a shard-per-core architecture. Zero garbage collection. Predictable microsecond latencies.

Key improvements:

p99 latency dropped from 40-125ms to 15ms
Same data model (Cassandra CQL compatible — minimal application changes)
Fewer nodes: ScyllaDB's efficiency means fewer servers for the same workload
No GC pauses: C++ runtime eliminates the JVM garbage collection problem entirely

Architecture Insights

Step-by-step diagram showing how How Discord Stores Trillions of Messages processes a request from start to finish — How How Discord Stores Trillions of Messages works step by step

Message ID: Snowflake

Discord uses Snowflake IDs — 64-bit integers encoding:

Timestamp (42 bits) — messages are inherently time-sorted
Worker ID (5 bits) — which server generated the ID
Process ID (5 bits)
Sequence (12 bits) — 4096 IDs per millisecond per worker

This means: no coordination needed for ID generation, IDs are sortable by time, and they are compact (8 bytes vs 36 bytes for UUID).

Data Services Layer

Discord does not query ScyllaDB directly from application code. A data services layer sits between the application and database:

Coalesces duplicate requests (100 users requesting the same channel's messages = 1 DB query)
Provides request routing and caching
Abstracts the database, enabling migration without touching application code

Key Takeaways

Choose databases for your access pattern: Cassandra/ScyllaDB excels at time-series data partitioned by entity
Bucket your partitions: Unbounded partition growth kills performance
JVM GC matters at scale: For latency-sensitive workloads, consider non-JVM databases
Add a data services layer: Abstracts the database and enables optimizations like request coalescing

How Discord Stores Trillions of Messages article overview — How Discord Stores Trillions of Messages — Hero

Comparison table for How Discord Stores Trillions of Messages contrasting approaches, tradeoffs, and when to use each — Comparing key metrics for How Discord Stores Trillions of Messages

How Discord Stores Trillions of Messages key takeaways and lessons learned — How Discord Stores Trillions of Messages — Takeaways

Sources

Data flow diagram for How Discord Stores Trillions of Messages showing how requests and responses move through the system — Data flow through How Discord Stores Trillions of Messages

Putting This Into Practice

Component diagram for How Discord Stores Trillions of Messages showing each building block and its responsibility — Key components of How Discord Stores Trillions of Messages

Understanding the theory is only half the battle. Here is how to apply these concepts in your daily work:

Start small. Pick one project or one component of your current system and apply the ideas from this article. Do not try to redesign everything at once.

Document your decisions. When you make an architectural choice, write a short ADR (Architecture Decision Record) explaining what you chose, why, and what alternatives you considered. Future you will thank present you.

Talk to your team. System design is a team sport. Share what you learn, discuss tradeoffs openly, and build shared understanding. The best architectures come from teams that communicate well, not from lone geniuses.

Key Takeaways

Interview preparation checklist for How Discord Stores Trillions of Messages with key points to mention and mistakes to avoid — Interview tips for How Discord Stores Trillions of Messages

Every design decision involves tradeoffs — there is no perfect solution
Start simple and evolve as requirements grow
Measure before optimizing — premature optimization wastes engineering time
Learn from production incidents — they teach you more than any textbook
Practice explaining your reasoning — this is what interviews test

Why Cassandra Was Not Enough

Discord ran on Cassandra for five years before the problems became critical. The JVM-based runtime caused unpredictable garbage collection pauses — sometimes 200ms or more — that directly affected message delivery latency. During these pauses, message delivery would stall for all users on the affected node.

Compaction was another issue. Cassandra periodically merges SSTables (sorted string tables) to reclaim space and improve read performance. But compaction is I/O intensive, and during compaction runs, read latency spiked dramatically. At Discord's scale (trillions of messages), compaction ran almost continuously.

The final straw was operational complexity. Repairing inconsistencies between replicas required running nodetool repair, which at Discord's data volume took days to complete and consumed significant cluster resources during the process.

The ScyllaDB Migration

ScyllaDB provided the same Cassandra CQL interface (so application code barely changed) but with a C++ runtime that eliminated garbage collection entirely. The shard-per-core architecture assigns each CPU core its own portion of data, eliminating lock contention.

The migration took several months. Discord ran both databases in parallel, writing to both and reading from Cassandra while validating ScyllaDB responses. Once they confirmed consistency, they cut reads over to ScyllaDB. The result: p99 latency dropped from 40-125ms to a consistent 15ms.

Lessons for Your Systems

The Discord story teaches three things: First, your initial database choice does not need to be perfect — you can migrate later if you plan for it. Second, JVM-based databases have inherent latency variability from garbage collection that C/C++ alternatives avoid. Third, the data services abstraction layer that sits between your application and database is not just good architecture — it is what makes migration possible without rewriting your entire application.

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Explore More

What Most Articles Get Wrong

Many articles about How Discord Stores Trillions Of Messages present an oversimplified view that misses the operational reality. In production, the theoretical best practices often collide with constraints like legacy systems, team expertise, budget limitations, and compliance requirements. The engineers who successfully implement these patterns at scale are the ones who understand not just the "what" but the "when" and "when not to."

The nuance that matters: context determines everything. A pattern that works at Netflix's scale (200M users, 1000+ engineers) is overkill for a startup with 10,000 users and 3 engineers. Always match the solution complexity to the problem complexity.

The Numbers That Matter

Latency percentiles matter more than averages: p99 latency often reveals problems that p50 hides
Error budgets quantify acceptable risk: if your SLA is 99.95%, you have 21.9 minutes of downtime per month to spend on deployments and experiments
Cost per request at scale determines architecture: a $0.001 cost difference per request becomes $1M per year at 1 billion requests/year
Team cognitive load is the hidden constraint: a system your team cannot understand is a system your team cannot operate safely