Difficulty: medium | 9 min read | Updated 2026-06-08

Design YouTube

Design YouTube with video upload/transcoding pipeline, adaptive streaming, view counting, recommendations, and a global video CDN.

Problem Statement

Design a video sharing platform like YouTube that supports video upload, transcoding to multiple resolutions and formats, adaptive bitrate streaming, accurate view counting at massive scale, comments, likes, and a recommendation system. The system must handle 500 hours of video uploaded per minute and 1B hours watched daily.

Requirements

Functional

Upload videos (up to 12 hours); transcode to multiple resolutions (144p to 4K) and formats (H.264, VP9, AV1)
Adaptive bitrate streaming: client switches resolution seamlessly based on bandwidth
Accurate view counting with deduplication (no bots, no double-counts)
Personalized video recommendations on the home page and "Up Next" sidebar

[Figure: System architecture for Design YouTube, showing how services, databases, and caches connect]

Non-Functional

Latency: Video playback starts within 2 seconds; uploads processed within 30 minutes for most videos
Scale: 2B MAU, 500 hours of video uploaded/minute, 1B hours watched/day
Storage: ~1 exabyte of video content
Availability: 99.99% for video playback

Core Architecture

Upload and Transcoding Pipeline -- User uploads original video to a staging bucket. A Kafka event triggers a DAG of transcoding jobs: split video into segments, transcode each segment in parallel across multiple resolutions/codecs (H.264 for compatibility, VP9/AV1 for efficiency), generate thumbnails, extract audio tracks, create DASH/HLS manifests. Completed segments are written to the video CDN origin (S3/GCS).
Video CDN and Streaming -- Videos are served as DASH/HLS adaptive streams. The CDN caches popular videos at edge nodes. The video player client monitors buffer health and switches between resolution renditions mid-stream. For less popular videos, CDN pull-through fetches segments from origin on demand.
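
The rendition switching described above can be sketched as a simple throughput-based rule. This is a minimal illustration, not the algorithm any particular player uses: the bitrate ladder, the 0.8 safety factor, and the 5-second low-buffer threshold are all assumed values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder, sorted from lowest to highest bitrate.
LADDER = [
    Rendition("144p", 100),
    Rendition("360p", 700),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
    Rendition("2160p", 16000),
]


def pick_rendition(throughput_kbps: float, buffer_seconds: float,
                   safety_factor: float = 0.8,
                   low_buffer_seconds: float = 5.0) -> Rendition:
    """Pick the highest rendition that fits the usable bandwidth."""
    if buffer_seconds < low_buffer_seconds:
        # Buffer is nearly empty: fall back to the lowest rendition to avoid a stall.
        return LADDER[0]
    budget = throughput_kbps * safety_factor
    best = LADDER[0]  # always playable, even if nothing fits the budget
    for rendition in LADDER:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best
```

With 4,000 kbps of measured throughput and a healthy buffer, the usable budget is 3,200 kbps, so this rule selects 720p. The player would re-run the selection before requesting each new segment.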

[Figure: How Design YouTube works step by step, from the start of a request to its end]

View Count Service -- Views are recorded in Kafka with deduplication: same user + same video within 30 seconds = 1 view. A Flink streaming job aggregates view counts in real time. Approximate counts (HyperLogLog for unique viewers) are shown immediately; exact counts are reconciled hourly via batch. This is critical for monetization accuracy.
Recommendation Engine -- Two-stage system: (1) Candidate Generation retrieves ~1000 videos from the user's watch history, subscriptions, and collaborative filtering embeddings. (2) Ranking model scores candidates using features like watch time prediction, click-through rate, and diversity. Served via a low-latency inference service (<50ms per request).
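
The view-count deduplication rule above (same user and same video within 30 seconds counts once) can be sketched in a few lines. This is an in-memory illustration only; the class name and method are hypothetical, and a streaming job would more plausibly hold this state in keyed state with a TTL rather than an unbounded dictionary.

```python
from typing import Dict, Tuple

DEDUP_WINDOW_SECONDS = 30  # window taken from the rule described above


class ViewDeduplicator:
    """Counts at most one view per (user, video) pair within the window."""

    def __init__(self, window_seconds: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window = window_seconds
        # Timestamp of the last view that was actually counted, per pair.
        self._last_counted: Dict[Tuple[str, str], float] = {}

    def should_count(self, user_id: str, video_id: str, ts: float) -> bool:
        key = (user_id, video_id)
        last = self._last_counted.get(key)
        if last is not None and ts - last < self.window:
            return False  # duplicate inside the window, drop it
        self._last_counted[key] = ts
        return True
```

Here the window is anchored at the last counted view, so events at t=0, t=10, and t=30 for the same pair yield two counted views (t=0 and t=30). Anchoring at the last seen event instead would be an equally valid reading of the rule and would suppress more views.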

Database Choice

Vitess (sharded MySQL) for video metadata, channels, and user data -- Vitess was originally built at YouTube to scale its MySQL tier.
Bigtable/Cassandra for view counts and watch history -- write-heavy, wide-column, partitioned by video_id or user_id.
S3/GCS for video segment storage (exabyte scale).
Redis for real-time view count approximations and session data.
Elasticsearch for video search (title, description, captions).

[Figure: Data flow through Design YouTube, showing how requests and responses move through the system]

Key API Endpoints

POST /api/v1/videos/upload (resumable upload)
  -> Headers: Content-Range, Upload-ID
  -> Returns: { video_id: "V-123", status: "PROCESSING", estimated_time_min: 15 }

GET /api/v1/videos/{video_id}/manifest.mpd
  -> Returns: DASH manifest with available resolution renditions

GET /api/v1/recommendations?context=home&limit=20
  -> Returns: { videos: [{ video_id, title, thumbnail_url, channel, view_count, duration }] }
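
To make the resumable upload concrete, the sketch below generates the Content-Range value a client would send with each chunk. The `bytes start-end/total` syntax is the standard HTTP form; the function name and the chunk size are assumptions for illustration, and how the server tracks Upload-ID is not shown.

```python
from typing import Iterator, Tuple


def content_ranges(total_bytes: int, chunk_bytes: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, header_value) for each chunk of a resumable upload."""
    if total_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes and chunk_bytes must be positive")
    start = 0
    while start < total_bytes:
        # Content-Range uses inclusive byte offsets.
        end = min(start + chunk_bytes, total_bytes) - 1
        yield start, end, f"bytes {start}-{end}/{total_bytes}"
        start = end + 1


# Example: a 10-byte file uploaded in 4-byte chunks.
headers = [value for _, _, value in content_ranges(10, 4)]
```

After an interruption, the client would ask the server for the last byte it received and restart the generator logic from the next offset instead of re-sending the whole file.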

Scaling Insight

Segment-level parallel transcoding is what makes YouTube's upload pipeline feasible. A 1-hour video is split into 10-second segments (360 segments). Each segment is transcoded independently across 6 resolutions and 3 codecs = 6,480 parallel tasks. This is embarrassingly parallel and can be distributed across thousands of worker nodes. A 1-hour video that would take 6+ hours to transcode sequentially finishes in ~10 minutes with 600 parallel workers.
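
The fan-out arithmetic above can be checked with a short calculation. The function names are hypothetical, and the ideal wall-clock figure assumes perfectly even work distribution with no overhead for splitting, scheduling, or stitching.

```python
import math
from typing import Tuple


def fan_out(duration_s: int, segment_s: int, resolutions: int, codecs: int) -> Tuple[int, int]:
    """Return (segment count, total transcoding tasks) for one video."""
    segments = math.ceil(duration_s / segment_s)
    return segments, segments * resolutions * codecs


def ideal_wall_clock_minutes(sequential_hours: float, workers: int) -> float:
    """Lower bound on elapsed time if the work divides evenly across workers."""
    return sequential_hours * 60 / workers


segments, tasks = fan_out(duration_s=3600, segment_s=10, resolutions=6, codecs=3)
ideal_minutes = ideal_wall_clock_minutes(sequential_hours=6, workers=600)
```

This gives 360 segments and 6,480 tasks, matching the text. The ideal bound for 6 hours of sequential work on 600 workers is 0.6 minutes, so the quoted ~10 minutes presumably leaves room for splitting the source, queueing, stragglers, and assembling the manifests.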

Key Tradeoffs

Decision	Option A	Option B	Chosen
Video codec	H.264 only (universal)	Multi-codec (H.264 + VP9 + AV1)	Multi-codec -- AV1 saves 30% bandwidth for modern browsers, H.264 as fallback
View counting	Exact real-time	Approximate real-time + hourly exact reconciliation	Hybrid -- fast approximate for display, exact for monetization reconciliation
Transcoding	Full video sequential	Segment-parallel	Segment-parallel -- reduces processing time from hours to minutes
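
The multi-codec choice implies a selection step at playback time. A minimal sketch, assuming the client reports the codecs it can decode and that the preference order is AV1, then VP9, then H.264 (the position of VP9 is an assumption; the table only states AV1 for modern browsers and H.264 as the fallback):

```python
from typing import Iterable

# Assumed preference order, most bandwidth-efficient first.
CODEC_PREFERENCE = ("av1", "vp9", "h264")


def pick_codec(client_supported: Iterable[str]) -> str:
    """Return the most efficient codec the client can decode."""
    supported = {codec.lower() for codec in client_supported}
    for codec in CODEC_PREFERENCE:
        if codec in supported:
            return codec
    # Nothing recognized: serve H.264, the universal fallback.
    return "h264"
```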

[Figure: Interview tips for Design YouTube, with key points to mention and mistakes to avoid]

Practical Implementation for .NET Developers

In a .NET application, the services in this design would typically be implemented as follows:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

[Figure: When to use Design YouTube, a decision guide covering when alternative approaches are better]

Azure integration: If deploying to Azure, use managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

[Figure: Advantages and disadvantages of Design YouTube, with real-world considerations]

Deep-Dive: Clarifying Questions for YouTube

What is the upload volume? 500 hours of video are uploaded to YouTube every minute. That is 720,000 hours per day.
What is the watch volume? Users watch over 1 billion hours of video per day across 2 billion logged-in users per month.
How does the transcoding pipeline work? Each uploaded video must be transcoded into multiple resolutions (144p to 8K), multiple codecs (H.264, VP9, AV1), and multiple container formats. A single upload can generate 100+ output files.
How does the recommendation engine work? YouTube's recommendation engine drives 70% of all watch time. It uses deep learning models trained on billions of watch sessions.
Do we need live streaming? Live streams have fundamentally different requirements: low latency (under 5 seconds glass-to-glass), real-time transcoding, and live chat.
Content moderation? YouTube reviews millions of videos per day for policy violations using a combination of ML models and human reviewers.

Specific Functional Requirements

[Figure: Real-world examples of Design YouTube at companies like Netflix, Google, and Amazon]

Video Upload: Upload videos up to 12 hours long and 256 GB in size, with automatic transcoding to multiple formats
Video Playback: Stream video with adaptive bitrate, supporting quality switching from 144p to 8K
Search: Search across billions of videos by title, description, tags, and spoken content (auto-generated captions)
Recommendations: Personalized video suggestions on the homepage and "Up Next" sidebar
Comments and Engagement: Like, dislike, comment, share, save to playlist
Channels and Subscriptions: Subscribe to channels, notification bell for new uploads
Live Streaming: Real-time video broadcasting with live chat

Specific API Endpoints

POST /api/v1/videos/upload
  Body: multipart (video file, title, description, tags, thumbnail)
  Response: { "video_id": "abc123", "status": "processing", "processing_eta_minutes": 30 }

GET /api/v1/videos/:video_id/watch
  Response: { "manifest_url": "https://cdn.youtube.com/abc123/manifest.mpd", "metadata": { "title": "...", "views": 1234567, "likes": 45678 }, "recommendations": [...] }

GET /api/v1/feed/home?page_token=xyz
  Response: { "videos": [...], "next_page_token": "..." }

GET /api/v1/search?q=system+design&order=relevance&page_token=xyz
  Response: { "results": [...], "total_results": 500000, "next_page_token": "..." }

POST /api/v1/videos/:video_id/comment
  Body: { "text": "Great video!" }
  Response: { "comment_id": "c123", "text": "...", "created_at": "..." }

Specific Data Model

Videos (Bigtable/Spanner)

Field	Type	Notes
video_id	VARCHAR(11)	Base64-like ID (YouTube's actual format)
channel_id	VARCHAR	Owner channel
title	TEXT	Searchable, multi-language
description	TEXT	Up to 5,000 characters
upload_timestamp	TIMESTAMP
duration_seconds	INT
view_count	BIGINT	Denormalized, updated asynchronously
encoding_status	ENUM	processing, ready, failed
manifest_url	VARCHAR	DASH/HLS manifest location
thumbnail_urls	JSON	Multiple resolution thumbnails
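
The 11-character video_id in the table is consistent with 64 bits encoded in URL-safe base64: 8 bytes encode to 12 characters including one padding character, which leaves 11 once the padding is stripped. The sketch below generates an ID of that shape. Whether YouTube produces its IDs this way is an assumption; the function name is hypothetical.

```python
import base64
import secrets
import string

# Characters used by URL-safe base64 (without padding).
ID_ALPHABET = set(string.ascii_letters + string.digits + "-_")


def new_video_id() -> str:
    """Return an 11-character URL-safe ID built from 64 random bits."""
    raw = secrets.token_bytes(8)  # 8 bytes = 64 bits
    # urlsafe_b64encode yields 12 bytes ending in '='; strip the padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


video_id = new_video_id()
```

A real service would still enforce uniqueness when inserting the row, since random IDs can collide, however rarely.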

[Figure: Comparing key aspects of Design YouTube, contrasting approaches, tradeoffs, and when to use each]

Video Chunks (Google Cloud Storage/GFS): Each video split into 2-5 second segments, each segment encoded in multiple qualities. Total storage: estimated at 1 exabyte+.

View Counts (Memcache + async flush): Real-time view counting uses in-memory counters that flush to the database periodically. "301+ views" used to appear because YouTube had an anti-fraud threshold before committing view counts.
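
The in-memory counter with asynchronous flush can be sketched as follows. The class and parameter names are hypothetical. The text describes a periodic (time-based) flush; this example flushes on a pending-count threshold instead, purely to keep the sketch deterministic, and a timer would call the same `flush` method.

```python
from collections import Counter
from typing import Callable, Dict


class BufferedViewCounter:
    """Accumulates view increments in memory and writes them out in batches."""

    def __init__(self, flush_fn: Callable[[Dict[str, int]], None],
                 flush_threshold: int = 1000) -> None:
        self._flush_fn = flush_fn            # e.g. a batched database write
        self._flush_threshold = flush_threshold
        self._pending: Counter = Counter()   # video_id -> unflushed views
        self._pending_total = 0

    def record(self, video_id: str) -> None:
        self._pending[video_id] += 1
        self._pending_total += 1
        if self._pending_total >= self._flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch = dict(self._pending)
        self._pending.clear()
        self._pending_total = 0
        self._flush_fn(batch)  # one write per batch instead of one per view
```

The tradeoff is durability: views recorded since the last flush are lost if the process crashes, which is one reason an exact count is usually reconciled from a durable event log.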

Recommendation Features (ML pipeline): User watch history, watch duration ratios (watched 90% = strong signal), click-through rates, co-watch patterns. All fed into deep learning models that run in batch (hourly) and real-time (per-request) pipelines.
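
A toy version of the two-stage flow (candidate generation, then ranking) shows how these features might be combined. Everything here is illustrative: the function names, the feature names `watch_ratio` and `ctr`, and the 0.7/0.3 weights are assumptions, and a hand-weighted sum stands in for the deep learning ranking model.

```python
from typing import Callable, Dict, Iterable, List


def recommend(sources: Iterable[Iterable[str]],
              score: Callable[[str], float],
              limit: int = 20) -> List[str]:
    """Merge candidates from several sources, then rank them by score."""
    # Stage 1: candidate generation. Union of all sources, de-duplicated
    # while preserving first-seen order.
    candidates = list(dict.fromkeys(v for src in sources for v in src))
    # Stage 2: ranking. Highest score first.
    return sorted(candidates, key=score, reverse=True)[:limit]


def toy_score(features: Dict[str, Dict[str, float]]) -> Callable[[str], float]:
    """Build a scorer from per-video features (stand-in for an ML model)."""
    def score(video_id: str) -> float:
        f = features.get(video_id, {})
        return 0.7 * f.get("watch_ratio", 0.0) + 0.3 * f.get("ctr", 0.0)
    return score


features = {
    "a": {"watch_ratio": 0.9, "ctr": 0.1},
    "b": {"watch_ratio": 0.2, "ctr": 0.5},
    "c": {"watch_ratio": 0.5, "ctr": 0.5},
}
top = recommend([["a", "b"], ["b", "c"]], toy_score(features), limit=2)
```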

Specific Back-of-the-Envelope Numbers

[Figure: Key components of Design YouTube, showing each building block and its responsibility]

Upload pipeline:

500 hours of video uploaded per minute = 30,000 hours/hour
Average video: 10 minutes at 1080p = ~1.5 GB raw
Each video transcoded into ~100 variants (resolutions * codecs)
Transcoding compute: 500 hours/min * 100 variants * 5x real-time processing = 250,000 hours of compute per minute

Storage:

New content: 500 hours/min * 60 * 24 = 720,000 hours/day; at ~10 GB per uploaded hour (across all variants), that is 7.2 PB/day of new encoded video
Total library: estimated at 800M+ videos, 1+ exabyte of storage
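
The upload and storage estimates above can be reproduced with a few lines of arithmetic. The constants are the article's own inputs; the derived worker count assumes one worker delivers one compute-minute per minute of wall-clock time.

```python
UPLOAD_HOURS_PER_MIN = 500     # hours of video uploaded per minute
VARIANTS_PER_VIDEO = 100       # resolutions * codecs
PROCESSING_FACTOR = 5          # transcoding takes 5x real time
GB_PER_UPLOADED_HOUR = 10      # encoded size across all variants

# Compute-hours of transcoding work created every minute.
compute_hours_per_min = UPLOAD_HOURS_PER_MIN * VARIANTS_PER_VIDEO * PROCESSING_FACTOR

# Workers that must run continuously to keep up with that arrival rate.
busy_workers = compute_hours_per_min * 60

uploaded_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24
new_storage_pb_per_day = uploaded_hours_per_day * GB_PER_UPLOADED_HOUR / 1_000_000
```

This yields 250,000 compute-hours per minute, 720,000 uploaded hours per day, and 7.2 PB of new encoded video per day. Under these inputs, keeping up would take about 15 million continuously busy workers, which suggests the 100-variant and 5x figures are deliberately pessimistic upper bounds.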

Streaming traffic:

1 billion hours watched/day at an average bitrate of 5 Mbps = ~2.25 GB per viewing hour, or ~2.25 EB delivered per day (~208 Tbps averaged over 24 hours)
Peak concurrent viewers: estimated 100M+ streams simultaneously
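
The streaming figures can be cross-checked from the two inputs given (1 billion hours watched per day at an average of 5 Mbps). This is a back-of-the-envelope sketch using decimal units; the variable names are illustrative.

```python
HOURS_WATCHED_PER_DAY = 1_000_000_000
AVG_BITRATE_BPS = 5_000_000  # 5 Mbps

# Bytes delivered for one hour of viewing: bits/s * 3600 s / 8 bits per byte.
bytes_per_viewing_hour = AVG_BITRATE_BPS * 3600 // 8

# Total delivered per day, in exabytes (1 EB = 10**18 bytes).
exabytes_per_day = bytes_per_viewing_hour * HOURS_WATCHED_PER_DAY / 10**18

# Average number of simultaneous streams over a 24-hour day.
avg_concurrent_streams = HOURS_WATCHED_PER_DAY / 24

# Average egress in terabits per second.
avg_egress_tbps = avg_concurrent_streams * AVG_BITRATE_BPS / 10**12
```

The result is about 2.25 GB per viewing hour, 2.25 EB per day, roughly 41.7 million streams on average, and about 208 Tbps of average egress. A peak of 100M+ concurrent streams is therefore around 2.4 times the daily average.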

CDN requirements:

Google's global CDN (Google Edge) serves YouTube content from 90+ countries
Cache hit ratio for popular videos: 95%+
Long-tail videos (under 100 views) served from origin storage

Sources

Design YouTube -- Reference
Source: System-Design-Overview
