Difficulty: hard | 8 min read | Updated 2026-06-08

Design Zoom

Design Zoom with WebRTC, SFU architecture, screen sharing, recording, and NAT traversal.

Problem Statement

Design a video conferencing platform like Zoom supporting 1-on-1 calls, group meetings (up to 1000 participants), screen sharing, meeting recording, and virtual backgrounds. The system must handle real-time audio/video with <200ms end-to-end latency, adapt to varying network conditions, and traverse NATs/firewalls.

Requirements

[Figure: System architecture for Design Zoom]

Functional

Create/join meetings with a meeting ID; host controls (mute, remove, breakout rooms)
Real-time audio and video streaming between all participants
Screen sharing: one participant shares their screen, visible to all
Meeting recording: save audio/video/screen share to cloud storage for later playback

Non-Functional

Latency: <150ms glass-to-glass for audio, <200ms for video (same continent)
Scale: 300M daily meeting participants, 100K concurrent meetings, up to 1000 participants per meeting
Reliability: Graceful degradation -- reduce video quality before dropping participants
NAT traversal: Works behind corporate firewalls and symmetric NATs

Core Architecture

[Figure: How Design Zoom works step by step]

Signaling Server -- WebSocket-based server that handles meeting creation, participant join/leave, and SDP (Session Description Protocol) exchange for WebRTC handshake. Does not carry media -- only control messages. Coordinates which SFU node each participant should connect to.
SFU (Selective Forwarding Unit) -- The media server. Each participant sends one audio and one video stream to the SFU. The SFU selectively forwards streams to other participants without transcoding (unlike an MCU). For large meetings (>50 participants), the SFU only forwards the active speaker's video and thumbnail-quality streams for the rest. Deployed in multiple regions; participants connect to the nearest SFU.
TURN/STUN Servers for NAT Traversal -- STUN servers help clients discover their public IP and port (works for ~80% of NATs). For symmetric NATs and firewalls that block UDP, TURN servers relay media traffic through a publicly reachable server. TURN is expensive (all media flows through it), so it is used only as a fallback.
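
The "direct path first, relay as fallback" preference described above is what ICE candidate priorities encode. The following is a minimal sketch, assuming the priority formula and recommended type preferences from RFC 8445. The `Candidate` class and the addresses are made up for illustration, and a real WebRTC stack computes this internally.

```python
from dataclasses import dataclass

# Recommended type preferences from RFC 8445: direct paths rank above relayed ones.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str            # "host", "srflx" (discovered via STUN), or "relay" (via TURN)
    address: str         # illustrative address only
    local_pref: int = 65535
    component: int = 1   # 1 = RTP

    @property
    def priority(self) -> int:
        # priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
        type_pref = TYPE_PREFERENCE[self.kind]
        return (type_pref << 24) + (self.local_pref << 8) + (256 - self.component)


def order_candidates(candidates):
    """Return candidates from most preferred to least preferred."""
    return sorted(candidates, key=lambda c: c.priority, reverse=True)


candidates = [
    Candidate("relay", "203.0.113.10:3478"),
    Candidate("srflx", "198.51.100.7:54321"),
    Candidate("host", "192.168.1.20:54321"),
]
ordered = order_candidates(candidates)
print([c.kind for c in ordered])
```

Because the relay type preference is 0, a TURN candidate is only selected when connectivity checks on every higher-priority candidate have failed, which keeps relay cost limited to the clients that need it.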

[Figure: Data flow through Design Zoom]

Recording Service -- A headless participant joins the meeting on a server, receives all streams via the SFU, and composites them into a single video file (speaker view or gallery view). Audio streams are mixed server-side. The recording is encoded to H.264 and uploaded to S3 in chunks. Available for download/playback within minutes of meeting end.
Adaptive Bitrate Controller -- Each client monitors network conditions (packet loss, RTT, available bandwidth) using RTCP feedback. When bandwidth drops, the client: (1) reduces video resolution (1080p -> 720p -> 360p -> audio-only), (2) drops non-speaker video streams, (3) increases audio FEC (Forward Error Correction) to protect speech quality. This happens within 2 seconds of detecting congestion.
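
The degradation steps above can be sketched as a simple policy function. This is an illustration only. The bandwidth thresholds, the 2% loss cutoff, and the names `NetworkStats` and `adapt` are assumptions, not values from the design. The sketch also applies the three adjustments as independent thresholds rather than as a strictly ordered sequence.

```python
from dataclasses import dataclass

# Hypothetical ladder: (minimum available kbps, rung). Thresholds are illustrative.
LADDER = [
    (3000, "1080p"),
    (1500, "720p"),
    (500, "360p"),
    (0, "audio-only"),
]


@dataclass
class NetworkStats:
    available_kbps: float  # bandwidth estimate derived from RTCP feedback
    packet_loss: float     # fraction of packets lost, 0.0 to 1.0
    rtt_ms: float          # round-trip time (carried for completeness, unused here)


def choose_video_rung(stats: NetworkStats) -> str:
    """Pick the highest rung whose bandwidth floor the estimate satisfies."""
    for floor_kbps, rung in LADDER:
        if stats.available_kbps >= floor_kbps:
            return rung
    return "audio-only"


def adapt(stats: NetworkStats, non_speaker_floor_kbps: float = 1000) -> dict:
    """Return the settings a client would apply for the current conditions."""
    return {
        "video": choose_video_rung(stats),
        # Drop non-speaker video once bandwidth falls below an assumed floor.
        "forward_non_speakers": stats.available_kbps >= non_speaker_floor_kbps,
        # Raise audio FEC once loss passes an assumed 2% threshold.
        "audio_fec": stats.packet_loss > 0.02,
    }


print(adapt(NetworkStats(available_kbps=800, packet_loss=0.05, rtt_ms=200)))
```

In practice the controller would also add hysteresis, so that a client does not oscillate between rungs when the estimate hovers near a threshold.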

Database Choice

[Figure: Interview tips for Design Zoom]

PostgreSQL for user accounts, meeting metadata (scheduled meetings, participants, settings), and recording metadata. Redis for real-time meeting state: active participants, mute status, speaker detection, and room assignments. S3 for recorded meeting files. The media pipeline itself uses no database -- media streams are forwarded in real-time through the SFU without storage (except for recording).
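
To make the Redis portion concrete, here is one possible shape for the live meeting state. The key names are hypothetical, and the `MeetingState` class is an in-memory stand-in that mirrors the sets, hash, and string a Redis deployment might use. It does not talk to Redis.

```python
from dataclasses import dataclass, field
from typing import Optional


def meeting_keys(meeting_id: str) -> dict:
    """Hypothetical Redis key names for one meeting's live state."""
    base = f"meeting:{meeting_id}"
    return {
        "participants": f"{base}:participants",  # set of user ids
        "muted": f"{base}:muted",                # set of muted user ids
        "rooms": f"{base}:rooms",                # hash: user id -> breakout room
        "speaker": f"{base}:speaker",            # string: current active speaker
    }


@dataclass
class MeetingState:
    """In-memory stand-in with the same shape as the structures above."""
    participants: set = field(default_factory=set)
    muted: set = field(default_factory=set)
    rooms: dict = field(default_factory=dict)
    speaker: Optional[str] = None

    def join(self, user_id: str, room: str = "main") -> None:
        self.participants.add(user_id)
        self.rooms[user_id] = room

    def set_muted(self, user_id: str, muted: bool) -> None:
        if muted:
            self.muted.add(user_id)
        else:
            self.muted.discard(user_id)

    def leave(self, user_id: str) -> None:
        # Remove every trace of the participant so stale state cannot linger.
        self.participants.discard(user_id)
        self.muted.discard(user_id)
        self.rooms.pop(user_id, None)
        if self.speaker == user_id:
            self.speaker = None


state = MeetingState()
state.join("U1")
state.join("U2", room="breakout-1")
state.set_muted("U2", True)
state.speaker = "U1"
state.leave("U1")
print(state)
```

All of this state is ephemeral: it can be rebuilt from signaling events if lost, which is why it sits in a cache rather than in PostgreSQL.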

Key API Endpoints

text

POST /api/v1/meetings
  -> Body: { host_id: "U1", scheduled_time: "...", settings: { max_participants: 100, recording: true } }
  -> Returns: { meeting_id: "M-123456", join_url: "https://meet.example.com/M-123456" }

WebSocket /ws/signaling/{meeting_id}
  -> Client sends: { type: "join", sdp_offer: "v=0..." }
  -> Server responds: { type: "answer", sdp_answer: "v=0...", ice_candidates: [...] }
  -> Server pushes: { type: "participant_joined", user_id: "U2", stream_id: "S2" }

POST /api/v1/meetings/{meeting_id}/recordings
  -> Returns: { recording_url: "https://s3.../M-123456.mp4", duration_min: 45, size_mb: 680 }
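
The signaling exchange above can be sketched as a small message handler. This is a transport-agnostic illustration: the `SignalingRoom` class, the `negotiate` callback (standing in for the SFU that would produce a real SDP answer), and the `S-<user_id>` stream id scheme are all assumptions made for the example.

```python
import json


class SignalingRoom:
    """Tracks who is in one meeting and builds the control messages to send."""

    def __init__(self, meeting_id, negotiate):
        self.meeting_id = meeting_id
        self.negotiate = negotiate   # callable: sdp_offer -> sdp_answer
        self.participants = {}       # user_id -> stream_id

    def handle(self, user_id, raw_message):
        """Return (reply_to_sender, broadcast_to_others) as JSON strings."""
        msg = json.loads(raw_message)
        if msg.get("type") != "join":
            error = {"type": "error", "reason": "unsupported message type"}
            return json.dumps(error), None

        stream_id = f"S-{user_id}"   # hypothetical id scheme
        self.participants[user_id] = stream_id

        reply = {
            "type": "answer",
            "sdp_answer": self.negotiate(msg["sdp_offer"]),
            "ice_candidates": [],    # would be filled in by the media server
        }
        broadcast = {
            "type": "participant_joined",
            "user_id": user_id,
            "stream_id": stream_id,
        }
        return json.dumps(reply), json.dumps(broadcast)


def fake_negotiate(sdp_offer):
    # Placeholder: a real SFU would generate a proper SDP answer here.
    return "v=0 (placeholder answer)"


room = SignalingRoom("M-123456", fake_negotiate)
reply, broadcast = room.handle("U2", json.dumps({"type": "join", "sdp_offer": "v=0..."}))
print(reply)
print(broadcast)
```

Only control messages pass through this handler. Media never does, which is what lets the signaling tier scale independently of the SFUs.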

Scaling Insight

[Figure: When to use Design Zoom]

The SFU architecture is what makes large meetings possible. In a peer-to-peer mesh, each participant sends their stream to every other participant, for N(N-1) one-way streams in total, which grows as O(N^2). In a meeting of 100 people, that is 9,900 streams, and each client would have to upload 99 copies of its own video -- impractical. With an SFU, each participant sends 1 stream and receives up to N-1 streams from the SFU. For large meetings (>50 participants), the SFU sends only the active speaker at full quality and the others at thumbnail quality. Taking 49 remote streams as a round example, that cuts each client's downstream from 49 HD streams to 1 HD + 48 thumbnail streams (roughly a 90% bandwidth reduction, depending on the bitrates chosen).
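
The arithmetic behind these figures can be checked with a short calculation. The bitrates below (1500 kbps for HD, 100 kbps for a thumbnail) are assumptions chosen for illustration. The stream counts follow directly from the text.

```python
HD_KBPS = 1500         # assumed bitrate for a full-quality stream
THUMBNAIL_KBPS = 100   # assumed bitrate for a thumbnail stream


def mesh_streams(n: int) -> int:
    """Total one-way streams in a full mesh: each of n peers sends to n-1 others."""
    return n * (n - 1)


def uploads_per_client(n: int, use_sfu: bool) -> int:
    """Streams each client must upload: 1 with an SFU, n-1 in a mesh."""
    return 1 if use_sfu else n - 1


def downstream_kbps(n: int, speaker_only_hd: bool) -> int:
    """Per-client downstream bandwidth for a meeting of n participants."""
    others = n - 1
    if others <= 0:
        return 0
    if speaker_only_hd:
        # One active speaker at full quality, everyone else as thumbnails.
        return HD_KBPS + (others - 1) * THUMBNAIL_KBPS
    return others * HD_KBPS


def reduction(n: int) -> float:
    """Fraction of downstream bandwidth saved by forwarding only the speaker in HD."""
    full = downstream_kbps(n, speaker_only_hd=False)
    if full == 0:
        return 0.0
    return 1 - downstream_kbps(n, speaker_only_hd=True) / full


print(mesh_streams(100))
print(uploads_per_client(100, use_sfu=False), uploads_per_client(100, use_sfu=True))
print(downstream_kbps(50, False), downstream_kbps(50, True), round(reduction(50), 2))
```

Under these assumed bitrates, 49 HD streams cost 73,500 kbps while 1 HD + 48 thumbnails cost 6,300 kbps, a saving of about 91%. Different bitrate choices shift the exact percentage.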

Key Tradeoffs

Decision	Option A	Option B	Chosen
Media server	MCU (mix all streams)	SFU (forward selectively)	SFU -- no transcoding cost, lower latency, scales to 1000 participants
Transport	TCP (reliable)	UDP (low latency)	UDP (via WebRTC) -- lost packets are acceptable for real-time media; retransmission causes stutter
NAT traversal	TURN always (reliable)	STUN first, TURN fallback	STUN first -- 80% of clients connect directly, TURN only for the 20% behind strict NATs

Practical Implementation for .NET Developers

[Figure: Advantages and disadvantages of Design Zoom]

In a .NET application, the non-media parts of this design (meeting API, signaling, and metadata storage) would typically be implemented as follows:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

[Figure: Real-world examples of Design Zoom]

Azure integration: If deploying to Azure, leverage managed services where they map onto this design, for example Azure Cache for Redis for live meeting state, Azure Database for PostgreSQL for metadata, and Azure Blob Storage for recordings. These reduce operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Participant {UserId} joined meeting {MeetingId}", userId, meetingId);

[Figure: Comparing key aspects of Design Zoom]

This gives you searchable, structured logs in Azure Monitor or Seq.

System-Specific Clarifying Questions

[Figure: Key components of Design Zoom]

Before designing Zoom, ask questions specific to THIS system:

Who are the primary users? Understanding the user base shapes every technical decision -- consumer apps have different requirements than enterprise B2B systems.
What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Zoom should be designed around the specific access patterns of the system. Do not apply generic templates -- every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Zoom, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources

Design Zoom -- Reference
Source: System-Design-Overview
