Real-Time Messaging Architecture at Slack

How Slack delivers real-time messages to millions of concurrent users using WebSocket connections, message fanout, and channel-based routing.

Company Context

Slack is a workplace messaging platform serving millions of organizations with real-time communication. Users expect messages to appear almost instantly (within 100-200ms), every participant to see a conversation in the same order, and the system to cope with users who belong to hundreds or thousands of channels at once. Slack manages millions of concurrent WebSocket connections across its infrastructure.

The Problem at Scale

[Figure: System architecture for real-time messaging at Slack, showing how services, databases, and caches connect]

Real-time messaging has deceptively simple requirements — send a message, everyone in the channel sees it — but the implementation challenges are enormous. Each online user maintains a persistent WebSocket connection to Slack's servers. When a message is posted to a channel with 10,000 members, the system must determine which of those members are currently online, find the servers holding their WebSocket connections, and deliver the message — all within milliseconds. This is the fanout problem: one write (a message) must be delivered to many readers (channel members) with low latency.
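
The lookup described above (channel members, then online members, then the servers holding their connections) can be sketched as a grouping step. This is a minimal illustration, not Slack's implementation: the two dictionaries, the `plan_fanout` function, and the user and gateway names are all hypothetical stand-ins for whatever shared store a real system would use.

```python
from collections import defaultdict

# Hypothetical lookup tables; a real system would keep these in a shared store.
channel_members = {"C1": {"alice", "bob", "carol", "dave"}}
# Only online users appear here, mapped to the server holding their connection.
user_gateway = {"alice": "gw-1", "bob": "gw-2", "carol": "gw-1"}


def plan_fanout(channel_id, members_by_channel, gateway_by_user):
    """Group a channel's online members by the server holding their connection."""
    plan = defaultdict(set)
    for user in members_by_channel.get(channel_id, ()):
        gateway = gateway_by_user.get(user)  # None means the user is offline
        if gateway is not None:
            plan[gateway].add(user)
    return dict(plan)


plan = plan_fanout("C1", channel_members, user_gateway)
```

Here `dave` is offline and is skipped, while `alice` and `carol` share one server. The point of the grouping is that the number of network deliveries scales with the number of servers involved, not with the number of channel members.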

Additionally, Slack must handle presence (knowing who is online), typing indicators (ephemeral, high-frequency events), message ordering (messages must appear in the same order for all users), and reliable delivery (if a user briefly disconnects, they must receive missed messages on reconnection).

Architecture Solution

[Figure: How real-time messaging at Slack processes a request, step by step]

Slack's architecture separates the concerns of connection management, message routing, and message storage.

Connection gateways are the edge layer that terminates WebSocket connections. Each gateway server handles tens of thousands of concurrent connections. Gateways are stateless in terms of business logic — they simply hold connections and forward messages.

When a message is posted, it flows through a message server that persists it to the database, assigns it a monotonically increasing sequence number within the channel, and publishes it to an internal message bus (similar to a pub/sub system). The message bus routes the message to every gateway server that has online members of that channel.
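
A minimal sketch of that write path, assuming an in-process stand-in for the message bus and a dictionary in place of the database. The class names `MessageBus` and `MessageServer`, the message fields, and the choice to start sequence numbers at 1 are illustrative assumptions, not details from Slack.

```python
import itertools
from collections import defaultdict


class MessageBus:
    """Toy in-process pub/sub: subscribers register per channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel_id -> list of callbacks

    def subscribe(self, channel_id, callback):
        self._subscribers[channel_id].append(callback)

    def publish(self, channel_id, message):
        for callback in self._subscribers.get(channel_id, ()):
            callback(message)


class MessageServer:
    """Assigns a per-channel sequence number, persists, then publishes."""

    def __init__(self, bus):
        self._bus = bus
        self._log = defaultdict(list)  # stand-in for the database
        # One independent counter per channel, starting at 1 (an assumption).
        self._counters = defaultdict(lambda: itertools.count(1))

    def post(self, channel_id, sender, text):
        seq = next(self._counters[channel_id])
        message = {"channel": channel_id, "seq": seq, "sender": sender, "text": text}
        self._log[channel_id].append(message)   # persist before publishing
        self._bus.publish(channel_id, message)  # route to interested subscribers
        return message


bus = MessageBus()
received = []
bus.subscribe("C1", received.append)
server = MessageServer(bus)
server.post("C1", "alice", "hello")
server.post("C1", "bob", "hi")
other = server.post("C2", "alice", "different channel")
```

Persisting before publishing means a subscriber never sees a message that could not later be served from storage on reconnect. Because counters are per channel, the first message in `C2` gets sequence 1 regardless of activity in `C1`.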

[Figure: Comparison of approaches and tradeoffs in real-time messaging at Slack]

Each gateway server maintains an in-memory map of which channels each connected user belongs to. When a message arrives from the message bus for a particular channel, the gateway identifies all local connections subscribed to that channel and pushes the message down their WebSocket connections.
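
The gateway-side fanout can be sketched as follows. This is an illustrative model only: the `Gateway` class, its method names, and the use of a plain callable in place of a WebSocket are assumptions. The reverse index from channel to local users is one possible way to make the per-message lookup cheap.

```python
from collections import defaultdict


class Gateway:
    """Holds connections plus an index of channel -> locally connected users."""

    def __init__(self):
        self._connections = {}                  # user_id -> send callable (WebSocket stand-in)
        self._user_channels = {}                # user_id -> set of channel ids
        self._channel_users = defaultdict(set)  # reverse index used for fanout

    def connect(self, user_id, channels, send):
        self.disconnect(user_id)  # drop any stale registration first
        self._connections[user_id] = send
        self._user_channels[user_id] = set(channels)
        for channel_id in channels:
            self._channel_users[channel_id].add(user_id)

    def disconnect(self, user_id):
        self._connections.pop(user_id, None)
        for channel_id in self._user_channels.pop(user_id, set()):
            self._channel_users[channel_id].discard(user_id)

    def on_bus_message(self, message):
        """Push a message to every local connection subscribed to its channel."""
        delivered = 0
        for user_id in self._channel_users.get(message["channel"], ()):
            self._connections[user_id](message)
            delivered += 1
        return delivered


inbox = defaultdict(list)
gw = Gateway()
gw.connect("alice", ["C1", "C2"], inbox["alice"].append)
gw.connect("bob", ["C2"], inbox["bob"].append)
count = gw.on_bus_message({"channel": "C2", "seq": 1, "text": "hi"})
gw.disconnect("bob")
count_after = gw.on_bus_message({"channel": "C2", "seq": 2, "text": "again"})
```

The first message reaches both users; after `bob` disconnects, the second reaches only `alice`.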

Presence is tracked separately: when a user connects or disconnects, a presence event is published. Slack uses a heartbeat mechanism where clients periodically confirm they are active, and the system aggregates presence state.
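
One way to model heartbeat-based presence is to record the time of each user's last heartbeat and treat anyone silent for longer than a timeout as offline. The sketch below assumes a 30-second timeout and a `PresenceTracker` class, both invented for illustration; the source does not state Slack's actual intervals or data structures.

```python
import time


class PresenceTracker:
    """Marks a user online if a heartbeat arrived within the timeout window."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self._timeout = timeout_seconds  # assumed value, not from the source
        self._clock = clock              # injectable so tests can control time
        self._last_seen = {}             # user_id -> time of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def disconnect(self, user_id):
        self._last_seen.pop(user_id, None)

    def is_online(self, user_id):
        last = self._last_seen.get(user_id)
        return last is not None and self._clock() - last <= self._timeout

    def online_users(self):
        return {user for user in self._last_seen if self.is_online(user)}


now = [0.0]  # fake clock so the example is deterministic
presence = PresenceTracker(timeout_seconds=30.0, clock=lambda: now[0])
presence.heartbeat("alice")
presence.heartbeat("bob")
now[0] = 20.0
presence.heartbeat("bob")  # bob confirms activity, alice stays silent
now[0] = 45.0
online = presence.online_users()
```

At the final clock value, `alice` has been silent for 45 seconds and counts as offline, while `bob` was last seen 25 seconds ago and remains online.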

For reconnection, each client tracks the last sequence number it received per channel. On reconnect, the client sends these sequence numbers, and the server sends any messages with higher sequence numbers — a simple gap-fill protocol.
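
The gap-fill exchange can be sketched with a per-channel log kept in sequence order. The `ChannelLog` class and `gap_fill` function are hypothetical names, and an in-memory list stands in for the message store. Binary search via `bisect_right` is an implementation choice made here, not something the source describes.

```python
from bisect import bisect_right


class ChannelLog:
    """Per-channel messages stored in ascending sequence order."""

    def __init__(self):
        self._seqs = []
        self._messages = []

    def append(self, seq, text):
        if self._seqs and seq <= self._seqs[-1]:
            raise ValueError("sequence numbers must increase")
        self._seqs.append(seq)
        self._messages.append({"seq": seq, "text": text})

    def after(self, last_seen_seq):
        """Return all messages with a sequence number above last_seen_seq."""
        start = bisect_right(self._seqs, last_seen_seq)
        return self._messages[start:]


def gap_fill(logs, last_seen_by_channel):
    """Return, per channel, the messages the reconnecting client missed."""
    return {
        channel_id: logs[channel_id].after(last_seen)
        for channel_id, last_seen in last_seen_by_channel.items()
        if channel_id in logs
    }


logs = {"C1": ChannelLog(), "C2": ChannelLog()}
for seq in (1, 2, 3, 4):
    logs["C1"].append(seq, f"c1-{seq}")
logs["C2"].append(1, "c2-1")

# The client reconnects, reporting the last sequence it saw in each channel.
missed = gap_fill(logs, {"C1": 2, "C2": 1})
```

The client missed sequences 3 and 4 in `C1` and nothing in `C2`, so the reply contains two messages for the first channel and an empty list for the second.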

[Figure: Data flow through real-time messaging at Slack]

Key Techniques Used

WebSocket connections: Persistent bidirectional connections for real-time delivery
Connection gateways: Stateless edge servers that hold connections and forward messages
Channel-based message bus: Pub/sub routing from message servers to gateway servers
Sequence numbers: Monotonically increasing per-channel IDs for ordering and gap detection
Fanout at the gateway: Each gateway delivers messages to locally connected channel members
Gap-fill on reconnection: Client reports last seen sequence; server sends missed messages
Separate presence tracking: Heartbeat-based online/offline detection, decoupled from messaging

Lessons for System Design Interviews

[Figure: Key components of real-time messaging at Slack and their responsibilities]

This architecture is a common reference point for the "design a real-time chat system" interview question. Key points: separate connection management from message routing; use a pub/sub message bus for fanout; assign per-channel sequence numbers for ordering; handle reconnection via gap-fill. Be ready to discuss the tradeoff between push (WebSocket) and pull (polling) models, and to explain why the fanout problem (one message to many recipients) is the core scalability challenge.

Lessons for Production

WebSocket connection management is its own infrastructure problem — gateway servers must handle graceful connection migration during deploys. Sequence numbers per channel are critical for correctness; without them, clients cannot detect missed messages. Presence is surprisingly expensive at scale and should be approximated (e.g., "active in the last 5 minutes") rather than tracked exactly. Typing indicators and other ephemeral events should be treated differently from messages — they do not need persistence or ordering guarantees.
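
To illustrate treating ephemeral events differently, the sketch below handles typing indicators on a path with no persistence and no sequence numbers, and rate-limits them per user and channel. The throttling itself, the 3-second interval, and the `TypingThrottle` class are assumptions added for illustration; the source only says such events need no persistence or ordering guarantees.

```python
import time


class TypingThrottle:
    """Forward at most one typing event per (user, channel) per interval."""

    def __init__(self, min_interval_seconds=3.0, clock=time.monotonic):
        self._min_interval = min_interval_seconds  # assumed value
        self._clock = clock                        # injectable for testing
        self._last_sent = {}                       # (user_id, channel_id) -> time

    def should_forward(self, user_id, channel_id):
        key = (user_id, channel_id)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval:
            return False  # dropped: nothing is stored and nothing is retried
        self._last_sent[key] = now
        return True


now = [0.0]  # fake clock so the example is deterministic
throttle = TypingThrottle(min_interval_seconds=3.0, clock=lambda: now[0])
first = throttle.should_forward("alice", "C1")
now[0] = 1.0
second = throttle.should_forward("alice", "C1")
now[0] = 3.5
third = throttle.should_forward("alice", "C1")
other_channel = throttle.should_forward("alice", "C2")
```

Dropping an event is acceptable here because a lost typing indicator has no lasting effect, which is exactly why it should not share the durable, ordered path used for messages.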

[Figure: Interview tips for real-time messaging at Slack, with key points to mention and mistakes to avoid]

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

[Figure: When to use this real-time messaging architecture and when alternative approaches are better]

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

[Figure: Advantages and disadvantages of this real-time messaging architecture]

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

[Figure: Real-world examples of Real-Time Messaging Architecture at Slack]

Key Takeaways for Interviews

Understand the core problem this resource addresses and be able to explain it in 2-3 sentences without jargon
Know the key trade-offs: what does this approach optimize for, and what does it sacrifice?
Be ready to compare this with alternative approaches and explain when each is appropriate
Connect the concepts to real-world systems you have worked with or studied
Demonstrate depth by discussing failure modes and how they are handled

How This Applies to Modern .NET Systems

The concepts from this resource translate to .NET through several established libraries and patterns:

Azure managed services often abstract away the underlying distributed systems complexity, but understanding the fundamentals helps you configure them correctly, debug issues, and make informed architectural decisions.

NuGet packages in the .NET ecosystem provide production-ready implementations of many patterns described in this resource. Before building custom solutions, check if a well-maintained package already exists.

ASP.NET Core middleware pipeline is where many of these patterns are implemented in practice: caching, rate limiting, health checks, and circuit breaking all fit naturally into the middleware model.

Sources

Real-Time Messaging at Slack — Slack Engineering
