Design Spotify
Design Spotify with audio streaming, playlist management, recommendation engine, offline downloads, and a global audio CDN.
Problem Statement
Design a music streaming platform like Spotify that allows users to search and stream songs, create and share playlists, receive personalized recommendations, and download music for offline listening. The system must stream audio with adaptive bitrate to handle varying network conditions and serve 500M+ users globally.
Requirements
Functional
- Search songs by title, artist, album, or genre with sub-200ms results
- Stream audio with adaptive bitrate (96/160/320 kbps) based on network quality
- Create, edit, and share playlists; support collaborative playlists with multiple editors
- Download songs for offline playback with DRM encryption
Non-Functional
- Latency: Song playback starts within 200ms of pressing play (buffer first 5 seconds)
- Availability: 99.99% for streaming -- degraded recommendations acceptable
- Scale: 500M users, 100M DAU, 80M songs, 4B streams/day
- Storage: ~80M songs at average 8 MB = 640 TB of audio files
Core Architecture
- Audio CDN -- Songs are stored in S3 as the origin, pre-encoded in multiple bitrates (96/160/320 kbps OGG Vorbis). CDN edge nodes (200+ PoPs) cache popular songs. The client prefetches the next song in the queue, and dynamically switches bitrate mid-stream based on buffer health.
- Playlist Service -- Manages CRUD operations on playlists backed by PostgreSQL. Collaborative playlists use Operational Transformation (OT) for conflict resolution when multiple users edit simultaneously. Playlists are cached in Redis with invalidation on write.
- Recommendation Engine -- Combines collaborative filtering (users who liked X also liked Y) with content-based features (audio fingerprint analysis, tempo, key, energy). Runs on a Spark cluster nightly to generate personalized daily mixes. Real-time session-based recommendations use a lightweight model at the edge.
- Offline Download Manager -- Client-side component that downloads encrypted song files (AES-256) with a license key tied to the user's subscription. License keys expire every 30 days, requiring a check-in with the server. Downloaded songs are stored in an encrypted local database.
- Search Service -- Elasticsearch cluster indexing song metadata (title, artist, album, genre, lyrics). Supports fuzzy matching, autocomplete, and typo tolerance. Index is updated in near real-time as new songs are ingested via a Kafka pipeline.
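The client-side bitrate switch described for the Audio CDN can be sketched as follows. This is a minimal illustration, not Spotify's actual algorithm: the function name `choose_bitrate`, the 5-second low-buffer threshold, and the 0.8 safety factor are all assumptions chosen for the example.

```python
BITRATES_KBPS = (96, 160, 320)  # encoding tiers listed in the requirements


def choose_bitrate(throughput_kbps: float, buffer_s: float,
                   low_buffer_s: float = 5.0, safety: float = 0.8) -> int:
    """Pick an encoding tier from measured throughput and buffer health.

    low_buffer_s and safety are hypothetical tuning knobs for this sketch.
    """
    # A nearly empty buffer means a stall is imminent: drop to the lowest tier.
    if buffer_s < low_buffer_s:
        return BITRATES_KBPS[0]
    # Otherwise take the highest tier that fits within a safety margin
    # of the measured throughput.
    budget = throughput_kbps * safety
    candidates = [b for b in BITRATES_KBPS if b <= budget]
    return max(candidates) if candidates else BITRATES_KBPS[0]
```

A real player would also add hysteresis so the tier does not flap when throughput hovers near a boundary.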
Database Choice
PostgreSQL for user profiles, playlists, and subscription data -- relational integrity between users, playlists, and songs. Cassandra for the play history and activity feed -- write-heavy (4B events/day), time-series access pattern, partitioned by user_id. Elasticsearch for search. S3 for audio file storage (origin). Redis for session data, now-playing state, and playlist cache.
Key API Endpoints
GET /api/v1/stream/{song_id}?bitrate=320
-> Returns: Chunked audio stream (HTTP 206 Partial Content with Range headers)
POST /api/v1/playlists
-> Body: { name: "Road Trip", song_ids: ["s1", "s2"], collaborative: true }
-> Returns: { playlist_id: "PL-123", share_url: "..." }
GET /api/v1/recommendations/daily-mix
-> Returns: { mixes: [{ name: "Daily Mix 1", songs: [...] }] }
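The stream endpoint answers with HTTP 206 and byte ranges, so the server has to turn a `Range` header into concrete offsets. The sketch below shows one way to do that for a single range; `parse_range` and `content_range` are hypothetical helper names, and multi-range requests are deliberately rejected to keep the example short.

```python
def parse_range(header: str, size: int) -> tuple[int, int]:
    """Return inclusive (start, end) byte offsets for a single-range header."""
    unit, _, spec = header.partition("=")
    spec = spec.strip()
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        raise ValueError("expected a single 'bytes=first-last' range")
    first, _, last = spec.partition("-")
    if first == "":
        # Suffix form 'bytes=-N': the final N bytes of the file.
        suffix_len = int(last)
        start, end = max(size - suffix_len, 0), size - 1
    else:
        start = int(first)
        # An open-ended or oversized range is clamped to the last byte.
        end = min(int(last), size - 1) if last else size - 1
    if start > end:
        # A real server would answer 416 Range Not Satisfiable here.
        raise ValueError("range not satisfiable")
    return start, end


def content_range(start: int, end: int, size: int) -> str:
    """Build the Content-Range header value for a 206 response."""
    return f"bytes {start}-{end}/{size}"
```

For example, a client seeking into the middle of a track would send `bytes=100-199` and receive those 100 bytes along with the matching `Content-Range` value.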
Scaling Insight
The audio CDN with adaptive bitrate is the critical scaling lever. 80M songs sounds like a lot, but streams follow a heavily skewed, Pareto-like distribution: assume roughly 1% of songs (800K) account for 80% of streams. These hot songs are cached at every CDN edge. The remaining 79.2M songs are served from regional caches or origin, with the CDN pull-through model filling caches on first access. Prefetching the next song in the queue hides most cache-miss latency from the user; only the first song of a session cannot be prefetched.
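The pull-through behaviour can be illustrated with a small LRU cache that fetches from origin on a miss. This is a toy model under stated assumptions: `EdgeCache` and `fetch_origin` are hypothetical names, capacity is counted in songs rather than bytes, and real CDNs use more elaborate admission and eviction policies.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy pull-through LRU cache standing in for one CDN edge node."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity      # max number of cached songs (assumption)
        self._items = OrderedDict()   # song_id -> audio bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, song_id: str, fetch_origin: Callable[[str], bytes]) -> bytes:
        if song_id in self._items:
            # Hit: mark as most recently used and serve from the edge.
            self._items.move_to_end(song_id)
            self.hits += 1
            return self._items[song_id]
        # Miss: pull from origin, fill the cache, evict the LRU entry if full.
        self.misses += 1
        data = fetch_origin(song_id)
        self._items[song_id] = data
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
        return data
```

With a skewed request distribution, the hot songs stay near the "recent" end of the cache and are rarely evicted, which is what keeps origin traffic low.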
Key Tradeoffs
| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Audio codec | MP3 (universal) | OGG Vorbis (better quality/size) | OGG Vorbis -- 20% smaller at same quality, Spotify controls both client and server |
| Recommendations | Real-time only | Batch + real-time hybrid | Hybrid -- batch for deep personalization, real-time for session context |
| Offline DRM | No DRM (trust users) | AES encryption + license | AES + license -- required by music labels, 30-day check-in balances UX and protection |
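The 30-day check-in from the Offline DRM row reduces to a time comparison on the client. The sketch below assumes the client records the time of its last successful server check-in; `can_play_offline` and `LICENSE_TTL` are hypothetical names, and a production client would also need tamper-resistant time and key storage.

```python
from datetime import datetime, timedelta, timezone

LICENSE_TTL = timedelta(days=30)  # check-in window from the design above


def can_play_offline(last_checkin: datetime, now: datetime) -> bool:
    """Allow offline playback only within 30 days of the last check-in."""
    elapsed = now - last_checkin
    # A negative elapsed time suggests the device clock was rolled back,
    # so this sketch treats it as an expired license.
    return timedelta(0) <= elapsed <= LICENSE_TTL
```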
Practical Implementation for .NET Developers
In a .NET application, you would typically implement the services described above using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Starting stream {TrackId} for {UserId}", trackId, userId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Deep-Dive: Clarifying Questions for Spotify
- What is the streaming volume? Spotify has 600M+ users (200M+ paid subscribers) streaming billions of songs per day. Average session is about 30 minutes.
- How does Discover Weekly work? Spotify's most famous feature generates a personalized 30-song playlist every Monday for each user, using collaborative filtering and audio analysis.
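- What audio quality levels? Free tier: 160 kbps (Ogg Vorbis), Premium: up to 320 kbps, HiFi: lossless (FLAC; 1,411 kbps is the uncompressed CD-quality rate, and FLAC files are typically smaller). Bandwidth and storage differ by up to roughly 9x between the lowest and highest tiers.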
- How does offline playback work? Premium users can download songs for offline listening. How do we handle DRM, storage management, and license expiration?
- What about podcasts? Podcasts have different characteristics: much longer content (30 min to 3 hours), less re-listening, RSS-based ingestion from publishers.
- How does the royalty system work? Every stream must be tracked and attributed to the correct artist/label for royalty payments. This is a financial audit trail requirement.
Specific Functional Requirements
- Music Streaming: Stream songs with adaptive quality based on network conditions and user subscription tier
- Search and Discovery: Search across 100M+ tracks by title, artist, album, genre, and lyrics
- Personalized Playlists: Generate Discover Weekly, Daily Mix, and Release Radar playlists using collaborative and content-based filtering
- User Library: Save songs, albums, and playlists to personal library with offline download for premium users
- Social Features: Follow friends, share songs/playlists, collaborative playlists, see what friends are listening to
- Podcast Playback: Stream and download podcasts with resume position tracking across devices
- Royalty Tracking: Record every stream for accurate royalty distribution to artists and labels
Specific API Endpoints
GET /api/v1/tracks/:track_id/stream
Headers: { "Range": "bytes=0-" }
Response: Audio stream (Ogg Vorbis 160/320 kbps based on subscription)
Note: Uses CDN with token-authenticated URLs
GET /api/v1/browse/discover-weekly
Response: { "playlist": { "id": "dw_123", "tracks": [...30 tracks...], "generated_at": "2025-01-20" } }
GET /api/v1/search?q=bohemian+rhapsody&type=track,artist,album&limit=20
Response: { "tracks": [...], "artists": [...], "albums": [...] }
PUT /api/v1/player/play
Body: { "track_id": "t_abc", "context": "playlist:p_456", "position_ms": 0, "device_id": "d_789" }
Response: { "status": "playing" }
GET /api/v1/me/top-tracks?time_range=medium_term&limit=50
Response: { "items": [...50 tracks ranked by play count...] }
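The note about token-authenticated CDN URLs can be made concrete with an HMAC-signed URL that expires. This is a sketch of the general technique rather than Spotify's scheme: the `expires` and `token` query parameters, the function names, and the shared secret between the API and the CDN are all assumptions.

```python
import hashlib
import hmac


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    # Sign the path together with its expiry so neither can be altered.
    msg = f"{path}:{expires_at}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes) -> str:
    """Return a CDN URL carrying an expiry timestamp and an HMAC token."""
    token = _signature(path, expires_at, secret)
    return f"{path}?expires={expires_at}&token={token}"


def verify(path: str, expires_at: int, token: str,
           secret: bytes, now: int) -> bool:
    """Edge-side check: reject expired or tampered URLs."""
    if now > expires_at:
        return False
    expected = _signature(path, expires_at, secret)
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(expected, token)
```

The API server would call `sign_url` when the client requests playback, and the CDN edge would run `verify` before serving any bytes.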
Specific Data Model
Tracks (PostgreSQL/Cassandra)
| Column | Type | Notes |
|---|---|---|
| track_id | VARCHAR | Spotify URI format |
| title | VARCHAR | |
| artist_ids | ARRAY | Can have multiple artists |
| album_id | VARCHAR | |
| duration_ms | INT | |
| audio_features | JSONB | Danceability, energy, tempo, key, valence (0-1 scores) |
| file_references | JSONB | CDN paths for each quality level |
| popularity | INT | 0-100, updated daily |
Play History (Cassandra): Append-only log of every stream, partitioned by user_id.
- (user_id, played_at) -> { track_id, duration_ms, context_type, context_id, shuffle, device }
- Used for: recommendations, Wrapped, royalty calculation
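To show how the append-only play log feeds royalty calculation, the sketch below models one log row and counts royalty-eligible streams per track. The `PlayEvent` fields mirror a subset of the row above; the 30-second minimum listen time is an assumption made for this example, as is the function name `countable_streams`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PlayEvent:
    user_id: str      # partition key
    played_at: int    # epoch seconds; orders rows within a user's partition
    track_id: str
    duration_ms: int  # how long the user actually listened


def countable_streams(events: Iterable[PlayEvent],
                      min_ms: int = 30_000) -> Counter:
    """Count plays per track, ignoring listens shorter than min_ms."""
    return Counter(e.track_id for e in events if e.duration_ms >= min_ms)
```

Because the log is immutable, the same aggregation can be re-run for any past period, which is what makes it usable as an audit trail.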
Discover Weekly Pipeline:
- Build user taste profile from play history (weighted by recency and completion rate)
- Find similar users via collaborative filtering (users who listen to similar artists)
- Surface tracks that similar users love but the target user has not heard
- Filter by audio features to match the user's listening patterns (tempo, energy, mood)
- Generate 30-track playlist, cached until next Monday
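The collaborative-filtering steps above can be sketched with a simple user-to-user approach. This is a deliberately small stand-in, not Spotify's production model: it compares users by track rather than by artist, takes the weighted play scores (recency and completion rate) as given inputs, and omits the audio-feature filter. The names `cosine` and `recommend` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse track-weight profiles."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target: str, plays: dict[str, dict[str, float]],
              k: int = 2, n: int = 30) -> list[str]:
    """Return up to n unheard tracks favoured by the k most similar users."""
    me = plays[target]
    neighbours = sorted(
        ((cosine(me, other), user)
         for user, other in plays.items() if user != target),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, user in neighbours:
        if sim <= 0:
            continue  # users with no overlap carry no signal
        for track, weight in plays[user].items():
            if track not in me:
                scores[track] += sim * weight
    return [track for track, _ in scores.most_common(n)]
```

At the scale discussed here, the pairwise comparison would be replaced by matrix factorization or approximate nearest-neighbour search, but the shape of the computation is the same.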
Audio Feature Extraction: Every uploaded track is analyzed by ML models that extract: tempo (BPM), key, time signature, danceability, energy, speechiness, acousticness, instrumentalness, liveness, and valence (musical positivity).
Specific Back-of-the-Envelope Numbers
Streaming traffic:
- 600M users, 200M daily active, average 30 minutes/day
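- Average song: 3.5 minutes, so 30 / 3.5 = ~8.5 songs per user per day
- 200M * 8.5 = 1.7 billion song plays/day = ~20,000 stream starts/second
- Average bitrate: 160 kbps (free) to 320 kbps (premium) = ~200 kbps average
- Concurrent streams: 20K starts/sec * 210 seconds per song = ~4.2M streams in flight on average
- Bandwidth: 4.2M concurrent streams * 200 kbps = ~840 Gbps sustained average (peak hours are higher)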
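Sustained bandwidth depends on the number of streams in flight, which is the start rate multiplied by the average song length (Little's law), not on the start rate alone. The following sketch reproduces that arithmetic from the inputs above; the function name and dictionary keys are made up for the example, and results differ slightly from hand-rounded figures because nothing is rounded along the way.

```python
SECONDS_PER_DAY = 86_400


def streaming_load(dau: float, minutes_per_user: float,
                   song_minutes: float, avg_kbps: float) -> dict[str, float]:
    """Estimate plays/day, starts/sec, concurrent streams, and Gbps."""
    plays_per_day = dau * minutes_per_user / song_minutes
    starts_per_sec = plays_per_day / SECONDS_PER_DAY
    # Little's law: concurrency = arrival rate * time spent in the system.
    concurrent = starts_per_sec * song_minutes * 60
    gbps = concurrent * avg_kbps / 1_000_000  # kbps -> Gbps
    return {
        "plays_per_day": plays_per_day,
        "starts_per_sec": starts_per_sec,
        "concurrent_streams": concurrent,
        "bandwidth_gbps": gbps,
    }


load = streaming_load(dau=200e6, minutes_per_user=30,
                      song_minutes=3.5, avg_kbps=200)
```

With these inputs the estimate lands at roughly 4.2 million concurrent streams and a little over 0.8 Tbps on average, before accounting for peak-hour multipliers.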
Storage:
- Music library: 100M+ tracks * average 8.4 MB (320 kbps * 210 s / 8 bits per byte) = ~840 TB (single quality)
- All quality levels: 840 TB * 3 tiers = ~2.5 PB as an upper bound for the music catalog (lower-bitrate encodings are proportionally smaller)
- Play history: 1.7B plays/day * 100 bytes = 170 GB/day = 62 TB/year
Recommendation compute:
- Discover Weekly: generate for 200M users every Monday
- Time window: ~48 hours to generate all playlists (Saturday-Sunday batch processing)
- ~1,160 playlists/second sustained generation rate (200M playlists / 172,800 seconds)
- Each playlist requires: user profile computation + collaborative filtering + audio matching