Design Netflix
A system design interview walkthrough for designing Netflix, covering requirements, API design, data model, architecture, scaling strategy, and tradeoffs.
Problem Statement
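Design a video streaming service similar to Netflix: users browse a catalog, receive personalized recommendations, and stream video on demand. The system must serve millions of concurrent viewers reliably and scale with demand.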
Step 1: Clarifying Questions
Before diving into the design, ask these clarifying questions:
- What is the expected scale (users, requests per second)?
- What are the most critical features to support?
- What are the latency requirements?
- Do we need to support real-time features?
- What consistency guarantees are needed?
Step 2: Functional Requirements
- Core feature set: browse the catalog, stream video, and resume playback across devices
- User-facing APIs and interactions
- Data storage and retrieval
- Search and discovery (if applicable)
- Notifications (if applicable)
Step 3: Non-Functional Requirements
- Scalability: Handle millions of concurrent users
- Availability: 99.99% uptime (four nines)
- Latency: Sub-200ms for read operations
- Consistency: Eventually consistent where acceptable, strongly consistent for critical paths
- Durability: No data loss
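To make the availability target concrete, the following minimal sketch converts an availability percentage into a yearly downtime budget. The function name `downtime_budget_minutes` is an illustrative assumption, not part of any library.

```python
def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Return the minutes of downtime allowed in the period for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


# Four nines leaves roughly 52.6 minutes of downtime per year.
four_nines = downtime_budget_minutes(0.9999)
three_nines = downtime_budget_minutes(0.999)
```

Each additional nine cuts the budget by a factor of ten, which is a useful way to argue in an interview about how much redundancy a target really requires.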
Step 4: Back-of-the-Envelope Estimation
The figures in this table are generic placeholders for a first pass; the Netflix-scale numbers are worked out under Specific Back-of-the-Envelope Numbers.
| Metric | Estimate |
|---|---|
| Daily Active Users | 10M |
| Read:Write Ratio | 10:1 |
| Average Request Size | 1 KB |
| Storage per year | ~10 TB |
| Peak QPS | 100K |
Step 5: API Design
POST /api/v1/resource
GET /api/v1/resource/{id}
PUT /api/v1/resource/{id}
DELETE /api/v1/resource/{id}
Step 6: Data Model
Define the core entities and their relationships. Consider the access patterns when choosing between SQL and NoSQL.
Step 7: High-Level Architecture
The system consists of these major components:
- Client Layer — Web/mobile clients
- API Gateway — Rate limiting, authentication, routing
- Application Servers — Business logic
- Database Layer — Primary storage
- Cache Layer — Redis/Memcached for hot data
- Message Queue — Async processing
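The rate limiting done at the API gateway is often implemented as a token bucket. The sketch below is one possible in-process version; the class name `TokenBucket` and the injectable `now` clock are assumptions made for illustration, and a real gateway would typically keep this state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._now = now              # injectable clock, convenient for tests
        self._tokens = capacity      # start with a full bucket
        self._last = now()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits in the budget."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A gateway would keep one bucket per client or API key and respond with HTTP 429 when `allow()` returns False.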
Step 8: Detailed Component Design
Write Path
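A write passes from the client through the API gateway to an application server, which validates the request, persists it to the primary database, and hands any follow-up work to the message queue for asynchronous processing.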
Read Path
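On a read, the application server checks the cache first. On a miss it loads the data from the database (or a read replica) and populates the cache before responding.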
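The read path is commonly implemented with the cache-aside pattern. The following is a minimal sketch under stated assumptions: a plain dictionary stands in for Redis or Memcached, and `CacheAsideReader` and `load_from_db` are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAsideReader:
    """Cache-aside reads: check the cache, fall back to the database, then populate."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[dict]],
        ttl_seconds: float = 60.0,
        now: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._now = now
        # key -> (expiry timestamp, value); the dict stands in for a real cache
        self._cache: Dict[str, Tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self._now():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._load(key)  # cache miss or expired entry: go to the database
        if value is not None:
            self._cache[key] = (self._now() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._cache.pop(key, None)
```

The `hits` and `misses` counters feed directly into a cache hit ratio metric, and `invalidate` is the hook the write path would call after updating the database.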
Step 9: Scaling Strategy
- Horizontal scaling of application servers behind a load balancer
- Database sharding by user ID or geographic region
- Read replicas for read-heavy workloads
- CDN for static content delivery
- Auto-scaling based on traffic patterns
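Sharding by user ID needs a stable mapping from a user to a shard. A minimal sketch, assuming simple hash-modulo routing (the function name `shard_for_user` is illustrative):

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index using a stable hash.

    Python's built-in hash() is randomized per process, so a
    cryptographic digest is used to keep routing consistent across servers.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Hash-modulo routing remaps most keys when `num_shards` changes. If shards are added often, consistent hashing or a lookup table of virtual shards is usually a better fit.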
Step 10: Reliability and Fault Tolerance
- Data replication across availability zones
- Circuit breakers for dependent services
- Graceful degradation under high load
- Health checks and automated failover
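A circuit breaker stops calling a dependency that keeps failing and retries only after a cool-down. The sketch below is a simplified, single-threaded version; `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are assumptions chosen for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._now = now
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._now() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open_failed = self._opened_at is not None
            if half_open_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._now()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

When the breaker raises `CircuitOpenError`, the caller can degrade gracefully, for example by serving a cached or default homepage row instead of a personalized one.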
Step 11: Monitoring and Observability
- Request latency (p50, p95, p99)
- Error rates by endpoint
- Database query performance
- Cache hit/miss ratios
- Queue depth and processing lag
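Latency percentiles such as p50, p95, and p99 can be computed from raw samples with the nearest-rank method. This is a small sketch for intuition; production systems usually rely on histograms or sketches maintained by the metrics backend rather than sorting raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 17, 950]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

The gap between p50 and p99 in the sample data illustrates why averages hide tail latency: a few slow requests dominate the high percentiles.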
Key Tradeoffs
| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Database | SQL | NoSQL | Depends on access patterns |
| Consistency | Strong | Eventual | Eventual for most reads |
| Communication | Sync | Async | Async for non-critical paths |
How to Present This in an Interview
- Start with clarifying questions (2 min)
- Define requirements (3 min)
- Do estimation (2 min)
- Design API and data model (5 min)
- Draw high-level architecture (10 min)
- Deep dive into critical components (10 min)
- Discuss tradeoffs and bottlenecks (5 min)
Practical Implementation for .NET Developers
In a .NET application, the components described above typically map onto the following stack:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and integrate with Application Insights for monitoring.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Deep-Dive: Clarifying Questions for Netflix
- What is the streaming volume? Netflix serves 200M+ subscribers watching roughly 1 billion hours of video per week. During peak hours (8-10 PM local time), Netflix has been estimated to account for around 15% of downstream internet traffic.
- How does content delivery work? Netflix uses its own CDN (Open Connect) with servers placed inside ISP data centers. Do we design our own CDN or use a third-party one?
- What video quality levels are required? Netflix encodes each title in 10+ quality levels (240p to 4K HDR) across multiple codecs (H.264, VP9, AV1). Adaptive bitrate streaming switches quality based on bandwidth.
- How important is the recommendation engine? 80% of content watched on Netflix comes from recommendations, not search. The recommendation system is arguably the most important technical component.
- Do we need to handle global distribution? Netflix operates in 190+ countries with content licensing that varies by region. Some content is available in the US but not in Europe.
- What about the video transcoding pipeline? When Netflix acquires a new title, it must be transcoded into 1,000+ versions (quality * codec * resolution combinations). How do we handle this at scale?
Specific Functional Requirements
- Video Streaming: Stream video content with adaptive bitrate switching based on network conditions — quality changes should be invisible to the viewer
- Content Catalog: Browse and search a catalog of 15,000+ titles with metadata in 30+ languages
- Personalized Recommendations: Show each user a unique homepage ranked by predicted viewing interest using collaborative and content-based filtering
- User Profiles: Support up to 5 profiles per account with independent viewing histories and recommendations
- Continue Watching: Track playback position to the second so users can resume from any device
- Content Delivery: Serve video from edge servers (CDN) close to the viewer for minimal buffering
- Video Transcoding: Process new content into multiple quality levels and codecs within hours of acquisition
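Adaptive bitrate switching can be sketched as a throughput-based selection rule: pick the highest quality level whose bitrate fits within a safety margin of the measured bandwidth. The bitrate ladder and the `safety_factor` value below are illustrative assumptions, not Netflix's actual encoding ladder, and real players also take buffer level into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical ladder for illustration only.
LADDER: List[Rendition] = [
    Rendition("240p", 300),
    Rendition("480p", 1200),
    Rendition("720p", 3000),
    Rendition("1080p", 5000),
    Rendition("4K", 16000),
]


def choose_rendition(ladder: List[Rendition], measured_kbps: float,
                     safety_factor: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits within the bandwidth budget."""
    budget = measured_kbps * safety_factor
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    best = ordered[0]  # always fall back to the lowest quality rather than stalling
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best
```

The player would re-run this decision before each chunk request, so a drop in measured bandwidth results in the next chunk being fetched at a lower quality.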
Specific API Endpoints
GET /api/v1/catalog/browse?profile_id=abc&genre=action&page=1
Response: { "rows": [{ "title": "Trending Now", "items": [...] }, { "title": "Because You Watched...", "items": [...] }] }
GET /api/v1/playback/start?title_id=12345&profile_id=abc
Response: { "manifest_url": "https://cdn.example.com/12345/manifest.mpd", "resume_position": 1847, "subtitles": [...], "audio_tracks": [...] }
POST /api/v1/playback/heartbeat
Body: { "title_id": 12345, "profile_id": "abc", "position": 1920, "quality": "1080p", "buffer_health": 30 }
(Sent every 10 seconds during playback)
GET /api/v1/search?q=stranger+things&profile_id=abc
Response: { "results": [...], "suggestions": ["Stranger Things", "Strange Planet"] }
POST /api/v1/ratings
Body: { "title_id": 12345, "profile_id": "abc", "rating": "thumbs_up" }
Specific Data Model
Content Metadata (PostgreSQL/Cassandra)
| Field | Type | Notes |
|---|---|---|
| title_id | BIGINT | Primary key |
| title | VARCHAR | Localized per region |
| type | ENUM | movie, series, episode |
| genres | ARRAY | ["action", "sci-fi"] |
| maturity_rating | VARCHAR | PG, PG-13, R, etc. |
| duration_seconds | INT | Total runtime |
| available_regions | ARRAY | ["US", "UK", "DE"] |
| encoding_profiles | JSON | Available quality levels and codecs |
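The metadata table can be mirrored in code as a simple record type. The sketch below also shows how `available_regions` might be used to filter the catalog per region. The `Title` class and `browse` function are hypothetical names, and the `encoding_profiles` field is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Title:
    title_id: int
    title: str
    type: str                    # "movie", "series", or "episode"
    genres: List[str]
    maturity_rating: str
    duration_seconds: int
    available_regions: List[str]


def browse(catalog: List[Title], region: str, genre: Optional[str] = None) -> List[Title]:
    """Return the titles licensed in `region`, optionally restricted to one genre."""
    return [
        t for t in catalog
        if region in t.available_regions and (genre is None or genre in t.genres)
    ]
```

In a real system this filter would run in the database or search index rather than in application memory, but the licensing rule is the same: a title that is not available in the viewer's region never reaches the response.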
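Viewing History (Cassandra): Partitioned by user_id for fast "resume watching" queries. Because each profile keeps an independent history, the partition key would in practice identify the profile rather than the account.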
- (user_id, title_id) -> { position_seconds, last_watched, completed, device }
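The viewing history mapping can be sketched as an in-memory store that playback heartbeats update. This is an illustration under stated assumptions: nested dictionaries stand in for Cassandra partitions, writes are last-write-wins, and the 95% completion threshold is an arbitrary choice for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WatchState:
    title_id: int
    position_seconds: int
    last_watched: float
    completed: bool
    device: str


class ViewingHistory:
    def __init__(self) -> None:
        # user_id -> title_id -> state, mirroring a partition per user
        self._by_user: Dict[str, Dict[int, WatchState]] = {}

    def record_heartbeat(self, user_id: str, title_id: int, position_seconds: int,
                         duration_seconds: int, device: str, now: float) -> None:
        """Overwrite the stored state with the latest heartbeat (last write wins)."""
        completed = position_seconds >= 0.95 * duration_seconds  # assumed threshold
        self._by_user.setdefault(user_id, {})[title_id] = WatchState(
            title_id, position_seconds, now, completed, device
        )

    def resume_position(self, user_id: str, title_id: int) -> int:
        state = self._by_user.get(user_id, {}).get(title_id)
        if state is None or state.completed:
            return 0
        return state.position_seconds

    def continue_watching(self, user_id: str, limit: int = 10) -> List[WatchState]:
        """Unfinished titles, most recently watched first."""
        states = [s for s in self._by_user.get(user_id, {}).values() if not s.completed]
        return sorted(states, key=lambda s: s.last_watched, reverse=True)[:limit]
```

Because all of a user's records live in one partition, both the resume lookup and the "continue watching" row are single-partition reads.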
Recommendation Model Outputs (Redis/Cassandra): Pre-computed ranked lists per user.
- user_id -> [title_id_1, title_id_2, ...] with scores, refreshed every few hours by ML batch pipeline
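CDN Manifest (Edge Cache): DASH/HLS manifests tell the player where to fetch each 4-second video chunk at each quality level. Manifests are cached at edge servers with a short TTL (5 min) so that changes to chunk locations propagate quickly. Quality adaptation itself happens in the player, which picks a quality level for each chunk based on measured bandwidth.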
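The relationship between playback position, chunk index, and chunk URL can be sketched with a few helpers. The URL layout and the `.m4s` file naming are hypothetical; a real manifest defines its own segment template.

```python
import math


def chunk_count(duration_seconds: int, chunk_seconds: int = 4) -> int:
    """Number of chunks needed to cover the full runtime (the last chunk may be shorter)."""
    return math.ceil(duration_seconds / chunk_seconds)


def chunk_index_for_position(position_seconds: int, chunk_seconds: int = 4) -> int:
    """Zero-based index of the chunk that contains the given playback position."""
    return position_seconds // chunk_seconds


def chunk_url(base_url: str, title_id: int, quality: str, index: int) -> str:
    """Build a chunk URL using an assumed path layout: base/title/quality/chunk_NNNNN.m4s."""
    return f"{base_url}/{title_id}/{quality}/chunk_{index:05d}.m4s"


# Resuming at second 1847 of a 2-hour title starts from chunk 461 of 1800.
resume_index = chunk_index_for_position(1847)
resume_url = chunk_url("https://cdn.example.com", 12345, "1080p", resume_index)
```

Since every quality level uses the same chunk boundaries, the player can switch quality between chunks by changing only the quality segment of the URL.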
Specific Back-of-the-Envelope Numbers
Traffic:
- 200M subscribers; assume ~100M concurrent streams during global peak (a deliberately aggressive upper bound for sizing)
- Each stream: one manifest request at start, then one chunk request every 4 seconds. 100M streams / 4 s = 25M chunk requests/second at peak
- Recommendation page loads: 200M sessions/day * 10 page loads = 2B recommendation requests/day
Storage:
- Content library: 15,000 titles * 1,000 encoding profiles * average 1 GB per profile = 15 PB of encoded video
- Viewing history: 200M users * average 500 titles watched * 50 bytes = 5 TB
- Recommendation data: 200M users * 2 KB pre-computed rankings = 400 GB (fits in Redis cluster)
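The storage arithmetic above can be checked with a few lines of code. This sketch uses decimal units (1 GB = 10^9 bytes), and the function names are illustrative.

```python
GB = 10**9
TB = 10**12
PB = 10**15


def library_bytes(titles: int, profiles_per_title: int, avg_profile_bytes: int) -> int:
    """Total encoded video: every title stored once per encoding profile."""
    return titles * profiles_per_title * avg_profile_bytes


def viewing_history_bytes(users: int, titles_per_user: int, bytes_per_record: int) -> int:
    """One small record per (user, title) pair."""
    return users * titles_per_user * bytes_per_record


def recommendation_bytes(users: int, bytes_per_user: int) -> int:
    """One pre-computed ranked list per user."""
    return users * bytes_per_user


library = library_bytes(15_000, 1_000, 1 * GB)           # 15 PB
history = viewing_history_bytes(200_000_000, 500, 50)    # 5 TB
recommendations = recommendation_bytes(200_000_000, 2_000)  # 400 GB
```

The three results differ by orders of magnitude, which is the main takeaway: video dominates storage, while user state is small enough for conventional databases and caches.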
Bandwidth:
- Average stream: 5 Mbps (1080p) = 625 KB/second
- 100M concurrent streams * 625 KB/s = 62.5 TB/second = 500 Tbps, which is why Netflix built its own CDN
- Netflix's Open Connect serves 95%+ of this from ISP-embedded servers
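The traffic and bandwidth figures can be reproduced with simple helpers, which makes it easy to rerun the estimate under different assumptions. The function names are illustrative, and the average recommendation QPS is derived here from the daily total rather than stated in the estimates above.

```python
def chunk_requests_per_second(concurrent_streams: int, chunk_seconds: int) -> float:
    """Each stream requests one chunk per chunk interval."""
    return concurrent_streams / chunk_seconds


def stream_bytes_per_second(bitrate_mbps: float) -> float:
    """Convert a stream bitrate in megabits/s to bytes/s."""
    return bitrate_mbps * 1_000_000 / 8


def aggregate_tbps(concurrent_streams: int, bitrate_mbps: float) -> float:
    """Total egress in terabits/s (1 Tbps = 1,000,000 Mbps)."""
    return concurrent_streams * bitrate_mbps / 1_000_000


def average_qps(requests_per_day: int) -> float:
    """Average requests per second over a 24-hour day; peaks will be higher."""
    return requests_per_day / 86_400


peak_chunk_rps = chunk_requests_per_second(100_000_000, 4)  # 25,000,000
per_stream = stream_bytes_per_second(5)                     # 625,000 bytes/s
total_egress = aggregate_tbps(100_000_000, 5)               # 500 Tbps
recommendation_qps = average_qps(2_000_000_000)             # about 23,000
```

Halving the assumed concurrency or the average bitrate halves the egress figure, so these two inputs are the ones worth challenging first in an interview.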
Video transcoding:
- New title: 2 hours of 4K source = ~100 GB (a compressed mezzanine file; uncompressed 4K video would run to several terabytes)
- Encode into 1,000+ profiles: ~100 hours of compute time
- Use a massive parallel encoding farm (AWS EC2 spot instances or dedicated encoding hardware)
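One way to parallelize encoding is to split the source into chunks and treat every (chunk, profile) pair as an independent task. The sketch below shows that fan-out with a thread pool. Chunk-level splitting is an assumption made here for illustration, `encode_chunk` is a placeholder that does no real encoding, and a production farm would use a distributed job queue rather than threads in one process.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Tuple


def encode_chunk(task: Tuple[int, str]) -> Tuple[int, str, str]:
    """Placeholder for a real encoder invocation; returns the output file name."""
    chunk_index, profile = task
    return chunk_index, profile, f"chunk{chunk_index:05d}_{profile}.bin"


def transcode_title(num_chunks: int, profiles: List[str],
                    workers: int = 8) -> Dict[str, List[str]]:
    """Fan out one task per (chunk, profile) pair and group outputs by profile."""
    tasks = list(product(range(num_chunks), profiles))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(encode_chunk, tasks))  # map preserves task order
    outputs: Dict[str, List[str]] = {p: [] for p in profiles}
    for _, profile, path in results:
        outputs[profile].append(path)
    return outputs


def ideal_wall_clock_hours(compute_hours: float, workers: int) -> float:
    """Lower bound on elapsed time, ignoring scheduling and I/O overhead."""
    return compute_hours / workers
```

Under these assumptions, the ~100 hours of compute for one title would finish in about half an hour on 200 workers, which is how a farm can meet a target of hours rather than days.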