Real-Time Messaging Architecture at Slack
How Slack delivers real-time messages to millions of concurrent users using WebSocket connections, message fanout, and channel-based routing.
Company Context
Slack is a workplace messaging platform serving millions of organizations with real-time communication. Users expect messages to appear instantly (within 100-200ms), conversations to maintain perfect ordering, and the system to handle users who are members of hundreds or thousands of channels simultaneously. Slack manages millions of concurrent WebSocket connections across its infrastructure.
The Problem at Scale
Real-time messaging has deceptively simple requirements — send a message, everyone in the channel sees it — but the implementation challenges are enormous. Each online user maintains a persistent WebSocket connection to Slack's servers. When a message is posted to a channel with 10,000 members, the system must determine which of those members are currently online, find the servers holding their WebSocket connections, and deliver the message — all within milliseconds. This is the fanout problem: one write (a message) must be delivered to many readers (channel members) with low latency.
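The fanout steps described above (filter to online members, find the servers holding their connections, deliver) can be sketched as a small grouping function. This is an illustration under stated assumptions, not Slack's implementation: the names `plan_fanout`, `channel_members`, and `connections` are hypothetical, and a real system would not hold these maps in one process.

```python
from collections import defaultdict

def plan_fanout(channel_members, connections):
    """Group a channel's online members by the gateway holding their connection.

    channel_members: iterable of user ids in the channel (hypothetical input).
    connections: dict of user id -> gateway id, containing online users only
                 (an assumption made for this sketch).
    Returns a dict of gateway id -> sorted list of user ids to deliver to.
    """
    by_gateway = defaultdict(list)
    for user in channel_members:
        gateway = connections.get(user)  # None means the user is offline
        if gateway is not None:
            by_gateway[gateway].append(user)
    return {gateway: sorted(users) for gateway, users in by_gateway.items()}

members = ["ana", "bo", "cy", "di"]
online = {"ana": "gw-1", "cy": "gw-2", "di": "gw-1"}  # "bo" is offline
plan = plan_fanout(members, online)
```

Here one posted message becomes one delivery task per gateway rather than one per member, which is why the grouping step matters for large channels.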
Additionally, Slack must handle presence (knowing who is online), typing indicators (ephemeral, high-frequency events), message ordering (messages must appear in the same order for all users), and reliable delivery (if a user briefly disconnects, they must receive missed messages on reconnection).
Architecture Solution
Slack's architecture separates the concerns of connection management, message routing, and message storage.
Connection gateways are the edge layer that terminates WebSocket connections. Each gateway server handles tens of thousands of concurrent connections. Gateways hold no business logic and no durable state: they keep only the open connections and the routing information needed to forward messages.
When a message is posted, it flows through a message server that persists it to the database, assigns it a monotonically increasing sequence number within the channel, and publishes it to an internal message bus (similar to a pub/sub system). The message bus routes the message to every gateway server that has online members of that channel.
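A minimal sketch of the message server's three steps (persist, assign a per-channel sequence number, publish) might look like the following. The class name `MessageServer`, the in-memory list standing in for the database, and the `publish` callback standing in for the message bus are all assumptions made for illustration.

```python
from collections import defaultdict

class MessageServer:
    """Persists messages, numbers them per channel, and publishes them."""

    def __init__(self, publish):
        self._last_seq = defaultdict(int)  # channel -> last assigned sequence number
        self._log = defaultdict(list)      # channel -> stored messages (stand-in for a database)
        self._publish = publish            # callable(channel, message), stand-in for the message bus

    def post(self, channel, sender, text):
        # Sequence numbers increase by one within each channel, independently of other channels.
        self._last_seq[channel] += 1
        message = {
            "channel": channel,
            "seq": self._last_seq[channel],
            "sender": sender,
            "text": text,
        }
        self._log[channel].append(message)  # persist before publishing
        self._publish(channel, message)
        return message

    def history(self, channel):
        return list(self._log[channel])

published = []
server = MessageServer(publish=lambda channel, message: published.append(message))
first = server.post("general", "ana", "hello")
second = server.post("general", "bo", "hi")
other = server.post("dev", "ana", "deploying")
```

Persisting before publishing means that anything a client sees in real time can also be recovered later from storage.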
Each gateway server maintains an in-memory map of which channels each connected user belongs to. When a message arrives from the message bus for a particular channel, the gateway identifies all local connections subscribed to that channel and pushes the message down their WebSocket connections.
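The gateway's local delivery step can be sketched as below. The sketch keeps the user-to-channels map described above plus an inverted channel-to-users index so that lookup by channel is direct; that inverted index, the `Gateway` class name, and the `sent` dictionary standing in for actual WebSocket writes are assumptions for illustration.

```python
from collections import defaultdict

class Gateway:
    """Holds local connections and pushes bus messages to subscribed ones."""

    def __init__(self):
        self._channels_by_user = {}                 # user id -> set of channel ids
        self._users_by_channel = defaultdict(set)   # channel id -> locally connected user ids
        self.sent = defaultdict(list)               # user id -> messages pushed (stand-in for a WebSocket send)

    def connect(self, user, channels):
        self.disconnect(user)  # drop any stale subscriptions for this user first
        self._channels_by_user[user] = set(channels)
        for channel in channels:
            self._users_by_channel[channel].add(user)

    def disconnect(self, user):
        for channel in self._channels_by_user.pop(user, set()):
            self._users_by_channel[channel].discard(user)

    def on_bus_message(self, channel, message):
        # Deliver only to users connected to this gateway and subscribed to the channel.
        recipients = sorted(self._users_by_channel.get(channel, ()))
        for user in recipients:
            self.sent[user].append(message)
        return recipients

gateway = Gateway()
gateway.connect("ana", ["general", "dev"])
gateway.connect("bo", ["general"])
general_recipients = gateway.on_bus_message("general", {"seq": 1, "text": "hi"})
dev_recipients = gateway.on_bus_message("dev", {"seq": 1, "text": "build passed"})
```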
Presence is tracked separately: when a user connects or disconnects, a presence event is published. Slack uses a heartbeat mechanism where clients periodically confirm they are active, and the system aggregates presence state.
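A heartbeat-based presence check could be sketched as follows. The 30-second timeout, the `PresenceTracker` name, and passing the current time in explicitly (to keep the example deterministic) are assumptions; the source does not state Slack's actual intervals or data structures.

```python
class PresenceTracker:
    """Treats a user as online if a heartbeat arrived within the timeout window."""

    def __init__(self, timeout_seconds=30.0):
        self._timeout = timeout_seconds  # assumed value, chosen for illustration
        self._last_seen = {}             # user id -> time of last heartbeat

    def heartbeat(self, user, now):
        self._last_seen[user] = now

    def disconnect(self, user):
        # An explicit disconnect marks the user offline immediately.
        self._last_seen.pop(user, None)

    def is_online(self, user, now):
        last = self._last_seen.get(user)
        return last is not None and now - last <= self._timeout

presence = PresenceTracker(timeout_seconds=30.0)
presence.heartbeat("ana", now=100.0)
```

With this design a client that vanishes without a clean disconnect is still marked offline once its heartbeats stop.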
For reconnection, each client tracks the last sequence number it received per channel. On reconnect, the client sends these sequence numbers, and the server sends any messages with higher sequence numbers — a simple gap-fill protocol.
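The server side of this gap-fill exchange can be sketched as a single function. The function name `gap_fill` and the dictionary shapes are hypothetical; a real server would query storage rather than scan in-memory lists.

```python
def gap_fill(channel_logs, last_seen):
    """Return the messages a reconnecting client missed, grouped by channel.

    channel_logs: dict of channel -> list of messages ordered by "seq" (stand-in for storage).
    last_seen: dict of channel -> last sequence number the client received.
    """
    missed = {}
    for channel, last_seq in last_seen.items():
        newer = [m for m in channel_logs.get(channel, []) if m["seq"] > last_seq]
        if newer:
            missed[channel] = newer
    return missed

logs = {
    "general": [{"seq": 1, "text": "a"}, {"seq": 2, "text": "b"}, {"seq": 3, "text": "c"}],
    "dev": [{"seq": 1, "text": "x"}],
}
# The client saw message 1 in "general" and message 1 in "dev" before disconnecting.
missed = gap_fill(logs, {"general": 1, "dev": 1})
```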
Key Techniques Used
- WebSocket connections: Persistent bidirectional connections for real-time delivery
- Connection gateways: Stateless edge servers that hold connections and forward messages
- Channel-based message bus: Pub/sub routing from message servers to gateway servers
- Sequence numbers: Monotonically increasing per-channel IDs for ordering and gap detection
- Fanout at the gateway: Each gateway delivers messages to locally connected channel members
- Gap-fill on reconnection: Client reports last seen sequence; server sends missed messages
- Separate presence tracking: Heartbeat-based online/offline detection, decoupled from messaging
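On the client side, sequence numbers make gap detection a simple comparison. The sketch below assumes sequence numbers are contiguous within a channel (each one is exactly one greater than the previous); the source only says they are monotonically increasing, so this is an assumption of the example, as is the `ChannelCursor` name.

```python
class ChannelCursor:
    """Tracks the last sequence number seen for one channel."""

    def __init__(self, last_seq=0):
        self.last_seq = last_seq

    def receive(self, seq):
        """Classify an incoming sequence number as 'duplicate', 'in_order', or 'gap'."""
        if seq <= self.last_seq:
            return "duplicate"   # already seen, safe to drop
        if seq == self.last_seq + 1:
            self.last_seq = seq  # assumes contiguous numbering
            return "in_order"
        return "gap"             # caller should request gap-fill starting after last_seq

cursor = ChannelCursor()
results = [cursor.receive(seq) for seq in (1, 2, 2, 5)]
```

The cursor does not advance on a gap, so the value the client reports on its next gap-fill request still points at the last message it truly received.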
Lessons for System Design Interviews
This architecture is a standard reference point for "design a real-time chat system" questions. Key points: separate connection management from message routing; use a pub/sub message bus for fanout; assign sequence numbers for ordering; handle reconnection via gap-fill. Discuss the tradeoff between push (WebSocket) and pull (polling) models. Know that the fanout problem (one message to many recipients) is the core scalability challenge.
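One way to make the push versus pull tradeoff concrete is a back-of-the-envelope calculation of what polling would cost. The user count and poll interval below are made-up inputs, and the average-delay figure assumes messages arrive uniformly at random between polls.

```python
def polling_cost(online_users, poll_interval_s):
    """Estimate request load and added delivery delay for a polling design."""
    requests_per_second = online_users / poll_interval_s
    average_delay_s = poll_interval_s / 2  # a message waits half an interval on average
    return requests_per_second, average_delay_s

# Hypothetical numbers: one million online users polling every two seconds.
rps, delay = polling_cost(1_000_000, 2.0)
```

Under these assumptions polling generates 500,000 requests per second, most returning nothing, and still adds about a second of delay on average, which is far from the 100-200ms target that persistent connections make possible.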
Lessons for Production
WebSocket connection management is its own infrastructure problem — gateway servers must handle graceful connection migration during deploys. Sequence numbers per channel are critical for correctness; without them, clients cannot detect missed messages. Presence is surprisingly expensive at scale and should be approximated (e.g., "active in the last 5 minutes") rather than tracked exactly. Typing indicators and other ephemeral events should be treated differently from messages — they do not need persistence or ordering guarantees.
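The different treatment of ephemeral events can be illustrated with typing indicators that simply expire instead of being stored. The five-second lifetime, the `TypingIndicators` name, and the explicit `now` parameter are assumptions for this sketch.

```python
class TypingIndicators:
    """Keeps typing state in memory only; entries expire instead of being persisted."""

    def __init__(self, ttl_seconds=5.0):
        self._ttl = ttl_seconds  # assumed lifetime of one typing event
        self._expires = {}       # (channel, user) -> expiry time

    def typing(self, channel, user, now):
        # A newer event for the same user just pushes the expiry forward; no history is kept.
        self._expires[(channel, user)] = now + self._ttl

    def who_is_typing(self, channel, now):
        # Drop expired entries, then report the users still typing in this channel.
        self._expires = {key: exp for key, exp in self._expires.items() if exp > now}
        return sorted(user for (ch, user) in self._expires if ch == channel)

indicators = TypingIndicators(ttl_seconds=5.0)
indicators.typing("general", "ana", now=0.0)
indicators.typing("general", "bo", now=3.0)
```

Losing one of these events is harmless because the next keystroke replaces it, so no sequence number or gap-fill is needed.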
Practical Implementation for .NET Developers
In a .NET application, you would typically implement this pattern using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Key Takeaways for Interviews
- Understand the core problem this resource addresses and be able to explain it in 2-3 sentences without jargon
- Know the key trade-offs: what does this approach optimize for, and what does it sacrifice?
- Be ready to compare this with alternative approaches and explain when each is appropriate
- Connect the concepts to real-world systems you have worked with or studied
- Demonstrate depth by discussing failure modes and how they are handled
How This Applies to Modern .NET Systems
The concepts from this resource translate to .NET through several established libraries and patterns:
Azure managed services often abstract away the underlying distributed systems complexity, but understanding the fundamentals helps you configure them correctly, debug issues, and make informed architectural decisions.
NuGet packages in the .NET ecosystem provide production-ready implementations of many patterns described in this resource. Before building custom solutions, check if a well-maintained package already exists.
ASP.NET Core middleware pipeline is where many of these patterns are implemented in practice: caching, rate limiting, health checks, and circuit breaking all fit naturally into the middleware model.