Design Twitter
A system design interview solution for "Design Twitter", covering requirements, API design, data model, architecture, scaling strategy, and tradeoffs.
Problem Statement
Design a system similar to Twitter. The system should handle millions of users and provide a reliable, scalable experience.
Step 1: Clarifying Questions
Before diving into the design, ask these clarifying questions:
- What is the expected scale (users, requests per second)?
- What are the most critical features to support?
- What are the latency requirements?
- Do we need to support real-time features?
- What consistency guarantees are needed?
Step 2: Functional Requirements
- Core feature set for Twitter
- User-facing APIs and interactions
- Data storage and retrieval
- Search and discovery (if applicable)
- Notifications (if applicable)
Step 3: Non-Functional Requirements
- Scalability: Handle millions of concurrent users
- Availability: 99.99% uptime (four nines)
- Latency: Sub-200ms for read operations
- Consistency: Eventually consistent where acceptable, strongly consistent for critical paths
- Durability: No data loss
Step 4: Back-of-the-Envelope Estimation
| Metric | Estimate |
|---|---|
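| Daily Active Users | 300M |
| Read:Write Ratio | ~6:1 (3B timeline reads vs. 500M tweets per day) |
| Average Request Size | 1 KB |
| Storage per year | ~50 TB tweet text, ~36 PB media |
| Average QPS | ~35K timeline reads + ~6K tweet writes |
| Peak QPS | 100K |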
Step 5: API Design
POST /api/v1/resource
GET /api/v1/resource/{id}
PUT /api/v1/resource/{id}
DELETE /api/v1/resource/{id}
Step 6: Data Model
Define the core entities and their relationships. Consider the access patterns when choosing between SQL and NoSQL.
Step 7: High-Level Architecture
The system consists of these major components:
- Client Layer — Web/mobile clients
- API Gateway — Rate limiting, authentication, routing
- Application Servers — Business logic
- Database Layer — Primary storage
- Cache Layer — Redis/Memcached for hot data
- Message Queue — Async processing
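The API Gateway's rate limiting is commonly implemented with a token bucket per client. Below is a minimal single-process sketch, assuming hypothetical `capacity` (burst size) and `rate` (tokens per second) parameters; a real gateway fleet would keep bucket state in a shared store such as Redis so that every gateway node sees the same counts.

```python
import time


class TokenBucket:
    """Hypothetical per-client rate limiter: `capacity` burst, `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity          # start full so an idle client can burst
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: fake_now[0])
    print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 1.0                           # one second later: one token refilled
    print(bucket.allow())                       # True
```

Injecting `clock` keeps the limiter deterministic in tests; the demo simulates one second passing instead of sleeping.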
Step 8: Detailed Component Design
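Write Path
How data flows from client to persistent storage: the client sends the tweet through the API Gateway to the application servers, which assign a Snowflake ID, persist the tweet to the database, and publish a fan-out job to the message queue. Workers consume the job and push the tweet ID into each follower's timeline cache, so the client gets its response without waiting for fan-out to finish.
Read Path
How data is retrieved, including cache interactions: the application servers read the user's pre-computed list of tweet IDs from the Redis timeline cache, merge in recent tweets from any celebrity accounts the user follows, and hydrate the IDs into full tweets from the cache layer, falling back to the database on a cache miss.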
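The read path's cache interaction is often the cache-aside (lazy loading) pattern: check the cache, fall back to the database on a miss, then populate the cache. This is a minimal sketch that uses an in-process dict as a stand-in for Redis/Memcached; the `load` function, the TTL value, and the class name are hypothetical. The hit/miss counters map directly onto the cache hit/miss ratio listed under monitoring.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside over a slower store; a stand-in for Redis in front of the database."""

    def __init__(self, load: Callable[[int], Optional[dict]], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.load = load                  # hypothetical database lookup by tweet ID
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[int, Tuple[float, dict]] = {}   # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: int) -> Optional[dict]:
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.load(key)            # miss or expired: read from the database
        if value is not None:
            self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: int) -> None:
        # Call after a write (e.g. a tweet is deleted) so readers don't see stale data.
        self.entries.pop(key, None)
```

Writes should call `invalidate` (or overwrite the entry) so that a deleted tweet does not linger until its TTL expires.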
Step 9: Scaling Strategy
- Horizontal scaling of application servers behind a load balancer
- Database sharding by user ID or geographic region
- Read replicas for read-heavy workloads
- CDN for static content delivery
- Auto-scaling based on traffic patterns
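Sharding by user ID needs a function from user ID to shard. A plain `user_id % N` works until N changes, at which point most users move to a different shard. One common alternative, swapped in here as an illustration rather than as this design's prescribed method, is a consistent hash ring with virtual nodes; the shard names and the `vnodes` count are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process for strings).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps user IDs to shards; adding a shard moves only a fraction of users."""

    def __init__(self, shards, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []                    # sorted list of (hash, shard) points
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, user_id: int) -> str:
        h = _hash(str(user_id))
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]
```

Adding a fourth shard to a three-shard ring should move roughly a quarter of users, all of them onto the new shard; everyone else keeps their placement.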
Step 10: Reliability and Fault Tolerance
- Data replication across availability zones
- Circuit breakers for dependent services
- Graceful degradation under high load
- Health checks and automated failover
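A circuit breaker stops the application servers from repeatedly calling a dependency that is already failing. The sketch below is a minimal, single-threaded version with hypothetical thresholds: after `failure_threshold` consecutive failures it fails fast for `reset_timeout` seconds, then lets one trial call through (half-open) and closes again if that call succeeds. It is an illustration, not a production library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open once reset_timeout elapses."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None   # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Timeout elapsed: half-open, so this trial request is allowed through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()    # open (or re-open after a failed trial)
            raise
        self.failures = 0
        self.opened_at = None                    # success closes the circuit
        return result
```

Callers can catch `CircuitOpenError` to degrade gracefully, for example by skipping a non-critical call such as sending a notification.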
Step 11: Monitoring and Observability
- Request latency (p50, p95, p99)
- Error rates by endpoint
- Database query performance
- Cache hit/miss ratios
- Queue depth and processing lag
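The p50/p95/p99 latency figures are percentiles over request-latency samples. This sketch uses the nearest-rank definition on a raw list of samples; the function names are hypothetical, and production monitoring systems typically approximate percentiles with histograms or streaming sketches rather than sorting every sample.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms: List[float]) -> Dict[str, float]:
    # The three percentiles listed above, keyed as "p50", "p95", "p99".
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```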
Key Tradeoffs
| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Database | SQL | NoSQL | Depends on access patterns |
| Consistency | Strong | Eventual | Eventual for most reads |
| Communication | Sync | Async | Async for non-critical paths |
How to Present This in an Interview
- Start with clarifying questions (2 min)
- Define requirements (3 min)
- Do estimation (2 min)
- Design API and data model (5 min)
- Draw high-level architecture (10 min)
- Deep dive into critical components (10 min)
- Discuss tradeoffs and bottlenecks (5 min)
Practical Implementation for .NET Developers
In a .NET application, you would typically implement the services in this design using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Deep-Dive: Clarifying Questions for Twitter
- What is the tweet volume? Twitter processes approximately 500 million tweets per day (about 6,000 tweets/second average, spiking to 20,000+ during major events).
- How should the home timeline be generated? This is the core design question. Fan-out on write (pre-compute timelines) vs. fan-out on read (compute on demand)? The celebrity problem (users with 50M+ followers) makes this critical.
- Do we need real-time delivery? Should tweets appear in followers' timelines within seconds or is a delay of 30-60 seconds acceptable?
- What media types do we support? Text only (140-280 chars), images (up to 4 per tweet), videos (up to 2:20), GIFs, polls? Media significantly changes storage and bandwidth requirements.
- Do we need search? Real-time search of all tweets is a massive infrastructure challenge requiring an inverted index updated in real-time.
- What about the follow graph? The average user follows 400 accounts, but some follow 5,000+. The follow graph is the foundation of timeline generation.
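Real-time search rests on an inverted index that maps each term to the tweets containing it. The toy in-memory version below, with hypothetical names throughout, tokenizes text (keeping `#` and `@` prefixes so hashtags and mentions are searchable), answers multi-term queries by intersecting posting sets, and orders results newest first by assuming time-ordered Snowflake-style IDs. Relevance ranking, sharding, and persistence are out of scope.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[#@]?\w+")


def tokenize(text: str) -> List[str]:
    # Lowercase words, keeping a leading # or @ so hashtags and mentions stay distinct terms.
    return [t.lower() for t in TOKEN.findall(text)]


class InvertedIndex:
    """term -> set of tweet IDs; a query is an AND of its terms, newest (highest ID) first."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, tweet_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(tweet_id)

    def search(self, query: str, limit: int = 20) -> List[int]:
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids, reverse=True)[:limit]
```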
Specific Functional Requirements
- Post Tweet: Users can post text tweets of up to 280 characters, optionally with images (up to 4), videos, or polls
- Home Timeline: Display a feed of tweets from accounts the user follows, ordered by relevance/recency
- Follow/Unfollow: Users can follow other users, with follow counts displayed on profiles
- Like, Retweet, Reply: Standard engagement actions with real-time count updates
- Search: Real-time search across all public tweets with relevance ranking
- Notifications: Real-time notifications for mentions, likes, retweets, and new followers
- Trending Topics: Identify and display currently trending hashtags and topics
Specific API Endpoints
POST /api/v1/tweets
Body: { "text": "Hello world!", "media_ids": ["abc123"], "reply_to": null }
Response: { "id": "1234567890", "text": "...", "created_at": "...", "user": {...} }
GET /api/v1/timeline/home?cursor=abc&limit=20
Response: { "tweets": [...], "next_cursor": "def" }
POST /api/v1/users/{user_id}/follow
Response: { "following": true, "follower_count": 1234 }
GET /api/v1/search?q=system+design&type=recent&cursor=abc
Response: { "tweets": [...], "next_cursor": "def" }
GET /api/v1/trends?location=US
Response: { "trends": [{ "name": "#SystemDesign", "tweet_count": 50000 }] }
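The `cursor` parameter on the timeline and search endpoints can be an opaque encoding of the last tweet ID the client has seen. Unlike offset pagination, new tweets arriving at the top of the timeline do not shift later pages, so clients see no duplicates or gaps. A minimal sketch, assuming plain base64 of the ID (a real API might sign or encrypt it) and time-ordered IDs; the function names are hypothetical.

```python
import base64
from typing import List, Optional, Tuple


def encode_cursor(tweet_id: int) -> str:
    # Opaque to clients; here just the last-seen tweet ID in URL-safe base64.
    return base64.urlsafe_b64encode(str(tweet_id).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def page(timeline_ids: List[int], cursor: Optional[str],
         limit: int = 20) -> Tuple[List[int], Optional[str]]:
    """Return up to `limit` IDs older than the cursor, newest first, plus the next cursor."""
    newest_first = sorted(timeline_ids, reverse=True)
    if cursor is not None:
        max_id = decode_cursor(cursor)
        newest_first = [t for t in newest_first if t < max_id]
    items = newest_first[:limit]
    # Only hand out a next cursor when more items remain beyond this page.
    next_cursor = encode_cursor(items[-1]) if len(newest_first) > limit else None
    return items, next_cursor
```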
Specific Data Model
Users: id (BIGINT), username (VARCHAR), display_name, bio, follower_count, following_count, created_at
Tweets: id (BIGINT, Snowflake ID), user_id, text (VARCHAR 280), media_urls (JSON), reply_to_id, retweet_of_id, like_count, retweet_count, created_at
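Snowflake IDs are 64-bit integers that sort by creation time, so ordering by ID approximates ordering by `created_at` without a separate index. The sketch below follows the bit layout Twitter published for Snowflake (41-bit millisecond timestamp since a custom epoch, 10-bit machine ID, 12-bit per-millisecond sequence); how machine IDs are assigned is assumed to be handled elsewhere, and the class name is hypothetical.

```python
import threading
import time


class Snowflake:
    """64-bit, roughly time-ordered IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    EPOCH_MS = 1288834974657            # Twitter's Snowflake epoch (2010-11-04 UTC)
    WORKER_BITS, SEQUENCE_BITS = 10, 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:          # 4096 IDs used this ms: wait for the next one
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self.sequence)
```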
Follows: follower_id (BIGINT), followee_id (BIGINT), created_at — Sharded by followee_id for efficient "get all followers" queries
Timeline Cache (Redis): Key = user_id, Value = sorted set of tweet_ids with timestamp scores. Each user's timeline cache holds the most recent 800 tweet IDs. This is the pre-computed timeline for fan-out-on-write.
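The timeline cache can be modeled with a Redis sorted set per user: `ZADD` to insert a tweet ID scored by timestamp, `ZREMRANGEBYRANK` to trim to the newest 800, and `ZREVRANGE` to read. So that the example runs without a Redis server, the sketch below is an in-memory stand-in that mirrors those commands in comments; the key format `timeline:{user_id}` and the class name are hypothetical.

```python
from typing import Dict, List


class TimelineCache:
    """In-memory stand-in for one Redis sorted set per user (score = timestamp).

    Roughly equivalent Redis commands for push():
        ZADD timeline:{user_id} <timestamp> <tweet_id>
        ZREMRANGEBYRANK timeline:{user_id} 0 -801    # keep only the newest 800
    """

    MAX_LEN = 800

    def __init__(self) -> None:
        self.timelines: Dict[int, Dict[int, float]] = {}   # user_id -> {tweet_id: score}

    def push(self, user_id: int, tweet_id: int, timestamp: float) -> None:
        entries = self.timelines.setdefault(user_id, {})
        entries[tweet_id] = timestamp
        if len(entries) > self.MAX_LEN:
            # A single push adds at most one entry, so dropping the oldest one restores the cap.
            oldest = min(entries, key=lambda t: (entries[t], t))
            del entries[oldest]

    def read(self, user_id: int, limit: int = 20) -> List[int]:
        # Roughly: ZREVRANGE timeline:{user_id} 0 <limit - 1>
        entries = self.timelines.get(user_id, {})
        return sorted(entries, key=lambda t: (entries[t], t), reverse=True)[:limit]
```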
The Celebrity Problem: For users above a follower threshold (10,000 in this design; the exact cutoff is a tuning parameter), do NOT fan out on write. Instead, merge their tweets into the timeline at read time. Twitter has publicly described using this kind of hybrid approach: fan-out-on-write for normal users, fan-out-on-read for celebrities.
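The hybrid strategy can be sketched with in-memory structures standing in for the follow graph, the tweet store, and the Redis timelines (all names hypothetical). Normal users' tweets are pushed into each follower's timeline at write time; tweets from accounts above the 10,000-follower threshold are only stored, then merged into the reader's timeline at read time with `heapq.merge`. The sketch assumes tweet IDs increase over time and fans out synchronously, where a production system would hand the loop to queue workers.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set

CELEBRITY_THRESHOLD = 10_000   # follower count above which fan-out-on-write is skipped
TIMELINE_LEN = 800


class HybridTimelines:
    """Fan-out-on-write for normal users, fan-out-on-read for celebrities."""

    def __init__(self) -> None:
        self.followers: Dict[int, Set[int]] = defaultdict(set)      # followee -> followers
        self.following: Dict[int, Set[int]] = defaultdict(set)      # follower -> followees
        self.user_tweets: Dict[int, List[int]] = defaultdict(list)  # author -> tweets, oldest first
        self.home: Dict[int, List[int]] = defaultdict(list)         # precomputed, oldest first

    def follow(self, follower: int, followee: int) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user_id: int) -> bool:
        return len(self.followers[user_id]) > CELEBRITY_THRESHOLD

    def post(self, author: int, tweet_id: int) -> None:
        self.user_tweets[author].append(tweet_id)
        if self.is_celebrity(author):
            return                                   # no fan-out; merged at read time
        for follower in self.followers[author]:      # in production: async queue workers
            timeline = self.home[follower]
            timeline.append(tweet_id)
            del timeline[:-TIMELINE_LEN]             # keep only the newest entries

    def read(self, user_id: int, limit: int = 20) -> List[int]:
        sources = [reversed(self.home[user_id])]     # every source iterates newest first
        for followee in self.following[user_id]:
            if self.is_celebrity(followee):
                sources.append(reversed(self.user_tweets[followee]))
        result: List[int] = []
        seen: Set[int] = set()
        for tweet_id in heapq.merge(*sources, reverse=True):
            if tweet_id not in seen:                 # skip duplicates across sources
                seen.add(tweet_id)
                result.append(tweet_id)
                if len(result) == limit:
                    break
        return result
```

The de-duplication step guards against a tweet appearing twice when an account crosses the threshold after some of its tweets were already fanned out.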
Specific Back-of-the-Envelope Numbers
Traffic:
- 300M daily active users (DAU), 500M tweets/day
- Average user checks timeline 10 times/day = 3B timeline reads/day = ~35,000 reads/second
- Tweet writes: 500M/day = ~6,000 writes/second
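- Average user has 400 followers (across the follow graph, the average follower count equals the average following count, since each follow has exactly one follower and one followee): each tweet fans out to ~400 timeline caches on average
Fan-out calculation:
- 500M tweets * 400 avg followers = 200 billion timeline cache updates/day (~2.3 million/second)
- Celebrity optimization: exclude users with 10K+ followers from fan-out, reducing this by an estimated ~30% (the actual saving depends on how followers are distributed across accounts)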
Storage:
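- Tweet text: 280 bytes * 500M/day * 365 = ~50 TB/year for text alone (280 bytes per tweet is a conservative upper bound; most tweets are shorter)
- Media: 1MB avg image * 100M images/day * 365 = ~36 PB/year for media
- Timeline cache (Redis): 300M users * 800 tweet IDs * 8 bytes = ~1.9 TB of Redis, counting only the raw IDs; the timestamp scores and Redis's per-entry sorted-set overhead make the real footprint several times larger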
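The traffic, fan-out, and storage arithmetic can be re-run as a short script, which makes it easy to try other assumptions. It uses decimal units (1 TB = 10^12 bytes), and the variable names are illustrative; it also derives the per-second fan-out rate from the daily figure.

```python
SECONDS_PER_DAY = 86_400

dau = 300e6
tweets_per_day = 500e6
timeline_reads_per_day = dau * 10            # each user checks the timeline 10 times/day
avg_followers = 400

estimates = {
    "timeline_reads_per_sec": timeline_reads_per_day / SECONDS_PER_DAY,
    "tweet_writes_per_sec": tweets_per_day / SECONDS_PER_DAY,
    "fanout_updates_per_day": tweets_per_day * avg_followers,
    "fanout_updates_per_sec": tweets_per_day * avg_followers / SECONDS_PER_DAY,
    "text_tb_per_year": 280 * tweets_per_day * 365 / 1e12,     # 280-byte upper bound
    "media_pb_per_year": 1e6 * 100e6 * 365 / 1e15,             # 1 MB x 100M images/day
    "timeline_cache_tb": dau * 800 * 8 / 1e12,                 # raw 8-byte IDs only
}

for name, value in estimates.items():
    print(f"{name:>24}: {value:,.1f}")
```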
Latency targets:
- Timeline load: under 200ms (pre-computed timeline from Redis)
- Tweet posting: under 500ms (write tweet, start async fan-out)
- Search: under 300ms (inverted index lookup)