Medium · 9 min read · Updated 2026-06-08

Design Instagram

Design Instagram with photo upload pipeline, fan-out news feed, stories, and explore page. Covers image processing, CDN delivery, and feed ranking.

Problem Statement

Design a photo-sharing social network like Instagram. Users upload photos, follow other users, see a personalized feed of posts from people they follow, share temporary stories that expire in 24 hours, and discover content via an Explore page. The system handles 2B+ monthly users with 100M+ photos uploaded daily.

Requirements

Functional

Upload photos with captions, filters, and location tags
Generate a personalized feed ranked by relevance from followed accounts
Stories: ephemeral posts that auto-delete after 24 hours
Explore page: surface trending and personalized content from non-followed accounts

[Figure: System architecture for Design Instagram]

Non-Functional

Latency: Feed loads in <500ms, photo upload confirmation in <3 seconds
Availability: 99.99% for feed reads, eventual consistency acceptable (1-2 second delay for new posts)
Scale: 2B monthly users, 500M DAU, 100M photos/day, 1M likes/second

Core Architecture

Photo Upload Pipeline -- Client uploads image to a pre-signed S3 URL. An async worker (triggered via SQS) generates multiple resolutions (thumbnail 150px, medium 640px, full 1080px), applies filters server-side if requested, strips EXIF data, and stores all variants in S3 with CDN distribution.
Feed Generation Service -- Hybrid push/pull model. For accounts below the celebrity threshold (<500K followers, about 99% of users), new posts are fan-out-on-write: each post is pushed to every follower's precomputed feed in Redis. For celebrity accounts (>500K followers), the feed is fan-out-on-read: a follower's feed is assembled at read time by merging their precomputed feed with recent celebrity posts.

[Figure: How Design Instagram works step by step]

Stories Service -- Stories are stored with a TTL of 24 hours in a dedicated Cassandra table partitioned by user_id. A separate ring buffer per user holds story IDs. The stories tray (top of the feed) is precomputed for each user, ordered by recency and engagement probability.
Explore/Discovery Engine -- Kafka streams feed engagement events (likes, saves, shares) into a real-time trending detection pipeline. An embedding-based model matches user interests to content, combining collaborative filtering with image understanding (ResNet features). Candidate generation produces 10K posts, then a ranking model selects the top 50.
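
The two-stage Explore funnel described above (a broad candidate set, then a ranker that keeps the top 50) can be sketched as follows. This is a minimal illustration under stated assumptions, not Instagram's pipeline: the dot-product score stands in for the ranking model, and `Candidate`, `rank_candidates`, the embedding vectors, and `trending_boost` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: str
    embedding: tuple[float, ...]   # hypothetical content embedding
    trending_boost: float = 0.0    # hypothetical signal from the trending pipeline


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(user_embedding, candidates, top_k=50):
    """Score every candidate and keep only the top_k highest-scoring posts."""
    def score(c):
        return dot(user_embedding, c.embedding) + c.trending_boost

    # nlargest avoids fully sorting the (large) candidate set.
    return heapq.nlargest(top_k, candidates, key=score)


user = (1.0, 0.0)
pool = [
    Candidate("a", (0.9, 0.1)),
    Candidate("b", (0.1, 0.9)),
    Candidate("c", (0.5, 0.5), trending_boost=0.6),
]
top = rank_candidates(user, pool, top_k=2)
```

In a real system the candidate set would come from an approximate nearest-neighbor index rather than a Python list; the point here is only the shape of the funnel.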

Database Choice

PostgreSQL -- users, relationships (follows), and post metadata; strong relational queries for follower graphs.
Cassandra -- feed timeline and stories; write-heavy, time-sorted, partitioned by user_id.
S3 + CloudFront CDN -- photo storage and delivery.
Redis -- precomputed feeds (sorted sets by timestamp) and user session cache.
Elasticsearch -- hashtag and user search.

[Figure: Comparing key aspects of Design Instagram]

Key API Endpoints

POST /api/v1/posts
  -> Body (multipart): { image: <file>, caption: "Sunset", location: "Bali", tags: ["travel"] }
  -> Returns: { post_id: "P-456", image_urls: { thumb: "...", full: "..." } }

GET /api/v1/feed?cursor={timestamp}&limit=20
  -> Returns: { posts: [...], next_cursor: "..." }

GET /api/v1/stories/tray
  -> Returns: { trays: [{ user_id: "U1", stories: [{ id: "S1", media_url: "...", expires_at: "..." }] }] }
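
The cursor parameter on the feed endpoint can be illustrated with a small sketch. It assumes the cursor is the `created_at` timestamp of the last post on the previous page and that the feed is already sorted newest first; `get_feed_page` and the dict shape of a post are hypothetical, chosen only for illustration.

```python
from typing import Optional


def get_feed_page(posts, cursor: Optional[int] = None, limit: int = 20):
    """Return one page of a newest-first feed.

    posts:  list of dicts with an integer 'created_at', sorted newest first.
    cursor: the previous page's next_cursor, or None for the first page.
    """
    if cursor is not None:
        # Only posts strictly older than the cursor belong to later pages.
        posts = [p for p in posts if p["created_at"] < cursor]
    page = posts[:limit]
    has_more = len(posts) > limit
    next_cursor = page[-1]["created_at"] if has_more else None
    return {"posts": page, "next_cursor": next_cursor}


feed = [{"post_id": i, "created_at": 100 - i} for i in range(5)]
page1 = get_feed_page(feed, limit=2)
page2 = get_feed_page(feed, cursor=page1["next_cursor"], limit=2)
page3 = get_feed_page(feed, cursor=page2["next_cursor"], limit=2)
```

A bare timestamp cursor can skip posts that share the same timestamp across a page boundary; a composite cursor such as (timestamp, post_id) is one common way to avoid that.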

Scaling Insight

The hybrid fan-out model is essential. Pure fan-out-on-write breaks for celebrities -- a single post by someone with 100M followers would generate 100M Redis writes. Pure fan-out-on-read is too slow for normal users, because every feed load would have to query and merge recent posts from every followed account. The hybrid approach: fan-out-on-write for 99% of users (fast reads, manageable writes), and fan-out-on-read only for the top 0.1% of accounts, the celebrities, whose posts are merged at read time from a small set of celebrity timelines.
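
A minimal in-memory sketch of the hybrid model follows. It is an illustration under assumptions, not a production design: plain dicts stand in for Redis and the follower store, the `HybridFeed` class and its method names are hypothetical, and the 500K threshold is the celebrity cutoff quoted in the architecture section.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 500_000  # followers above this are served by fan-out-on-read


class HybridFeed:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # author -> follower ids
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> [(ts, post_id)], stands in for Redis
        self.outbox = defaultdict(list)    # author -> [(ts, post_id)] of their own posts

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.threshold

    def publish(self, author, post_id, ts):
        self.outbox[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Fan-out-on-write: one inbox write per follower.
            for follower in self.followers[author]:
                self.inbox[follower].append((ts, post_id))

    def read_feed(self, user, limit=20):
        # A set removes duplicates if an author crossed the threshold
        # after some posts had already been pushed.
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out-on-read: pull celebrity posts at request time.
                entries.update(self.outbox[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]


feed = HybridFeed(threshold=1)   # tiny threshold so the example stays small
feed.follow("u1", "celeb")
feed.follow("u2", "celeb")
feed.follow("u1", "friend")
feed.publish("friend", "p1", ts=1)  # pushed to u1's inbox
feed.publish("celeb", "p2", ts=2)   # stored only in celeb's outbox
```

The write cost of `publish` is proportional to follower count only for non-celebrity authors, while `read_feed` pays a small merge cost per followed celebrity.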

Key Tradeoffs

Decision	Option A	Option B	Chosen
Feed model	Fan-out-on-write (push)	Fan-out-on-read (pull)	Hybrid -- push for normal users, pull for celebrities
Image storage	Store original only, resize on-the-fly	Pre-generate all sizes	Pre-generate -- amortizes CPU cost, CDN-friendly, predictable latency
Feed ranking	Chronological	ML-ranked by engagement	ML-ranked -- higher engagement, but offer chronological toggle for transparency

[Figure: Interview tips for Design Instagram]

Practical Implementation for .NET Developers

In a .NET application, a design like this is typically implemented as follows:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

[Figure: When to use Design Instagram]

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

[Figure: Advantages and disadvantages of Design Instagram]

Deep-Dive: Clarifying Questions for Instagram

What is the posting volume? Instagram users share approximately 95 million photos and videos per day. With 2B+ monthly active users, the system handles massive read-heavy traffic.
How does the feed algorithm work? Instagram switched from chronological to algorithmic feed in 2016. The ranking considers: relationship (how often you interact with the poster), interest (predicted engagement based on content), timeliness, and session frequency.
What media processing is needed? Every uploaded photo needs: multiple resolution variants (thumbnail, standard, full), format conversion (HEIC to JPEG), face detection for tagging, content moderation (NSFW detection), and EXIF data stripping for privacy.
How does Explore work? The Explore page recommends content from accounts you do not follow, using collaborative filtering on engagement signals. It serves billions of impressions per day.
Do we need Stories and Reels? Stories (24-hour ephemeral content) and Reels (short-form video) have different storage, delivery, and ranking requirements than feed posts.
What about the social graph? The follow graph (who follows whom) is the foundation of feed generation and notifications.

Specific Functional Requirements

[Figure: Real-world examples of Design Instagram]

Photo/Video Upload: Upload images (up to 10 per post) and videos with filters, captions, hashtags, and location tags
Feed Generation: Algorithmic feed showing posts from followed accounts, ranked by predicted engagement
Explore Page: Discover content from accounts you do not follow, personalized by viewing and engagement history
Stories: Ephemeral photo/video content that disappears after 24 hours, with viewer lists
Direct Messages: Private messaging with text, images, and shared posts
Like, Comment, Share: Engagement features with real-time count updates
Search: Search by username, hashtag, and location

Specific API Endpoints

POST /api/v1/media/upload
  Body: multipart form (image files, caption, hashtags, location)
  Response: { "media_id": "abc123", "status": "processing", "url": "..." }

GET /api/v1/feed?cursor=xyz&limit=20
  Response: { "items": [{ "media_id": "...", "user": {...}, "images": { "thumbnail": "...", "standard": "...", "full": "..." }, "likes_count": 1234, "caption": "..." }], "next_cursor": "..." }

GET /api/v1/explore?cursor=xyz&limit=30
  Response: { "items": [...], "next_cursor": "..." }

POST /api/v1/media/:media_id/like
  Response: { "liked": true, "likes_count": 1235 }

GET /api/v1/users/:user_id/stories
  Response: { "stories": [{ "id": "...", "media_url": "...", "created_at": "...", "expires_at": "...", "viewers_count": 456 }] }
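
The `created_at` and `expires_at` fields in the stories response follow directly from the 24-hour lifetime. The sketch below shows one way to derive them and to filter expired stories at read time; `expires_at`, `active_stories`, and `to_response` are hypothetical helpers, and a store with native TTL support would normally handle the actual deletion.

```python
from datetime import datetime, timedelta, timezone

STORY_TTL = timedelta(hours=24)


def expires_at(created_at: datetime) -> datetime:
    """A story expires exactly 24 hours after it was created."""
    return created_at + STORY_TTL


def active_stories(stories, now: datetime):
    """Keep only stories whose expiry is still in the future."""
    return [s for s in stories if expires_at(s["created_at"]) > now]


def to_response(stories, now: datetime):
    """Shape active stories like the endpoint's response body."""
    return {
        "stories": [
            {
                "id": s["id"],
                "created_at": s["created_at"].isoformat(),
                "expires_at": expires_at(s["created_at"]).isoformat(),
            }
            for s in active_stories(stories, now)
        ]
    }


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
stories = [
    {"id": "S1", "created_at": datetime(2026, 1, 1, 11, 0, tzinfo=timezone.utc)},  # expired
    {"id": "S2", "created_at": datetime(2026, 1, 1, 13, 0, tzinfo=timezone.utc)},  # still live
]
body = to_response(stories, now)
```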

Specific Data Model

Media (PostgreSQL, sharded by user_id)

Column	Type	Notes
media_id	BIGINT	Instagram Snowflake-style ID (timestamp + shard + sequence)
user_id	BIGINT	Shard key
media_type	ENUM	photo, video, carousel, reel, story
caption	TEXT	Up to 2,200 characters
location_id	BIGINT	Nullable
s3_key	VARCHAR	Path to original in S3
image_versions	JSON	URLs for thumbnail, standard, full resolution
created_at	TIMESTAMP
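
The "timestamp + shard + sequence" layout of `media_id` can be sketched as a 64-bit integer. The 41/13/10 bit split below follows the layout Instagram's engineering team has described publicly; the epoch value and the function names `make_id` and `parse_id` are assumptions made for this example.

```python
EPOCH_MS = 1_293_840_000_000  # assumed custom epoch: 2011-01-01 00:00:00 UTC, in ms
TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10


def make_id(timestamp_ms: int, shard_id: int, sequence: int) -> int:
    """Pack (time, shard, sequence) into one integer, time in the high bits."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the representable range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    seq = sequence % (1 << SEQ_BITS)  # the per-shard counter wraps at 1024
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def parse_id(media_id: int):
    """Recover (timestamp_ms, shard_id, sequence) from an ID."""
    seq = media_id & ((1 << SEQ_BITS) - 1)
    shard = (media_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = media_id >> (SHARD_BITS + SEQ_BITS)
    return elapsed + EPOCH_MS, shard, seq


ts = EPOCH_MS + 1000
media_id = make_id(ts, shard_id=5, sequence=7)
```

Because the timestamp occupies the high bits, IDs sort by creation time, and the shard can be read back from the ID without a lookup.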

[Figure: Data flow through Design Instagram]

Feed Cache (Redis): user_id -> sorted set of media_ids with ranking scores. Pre-computed by feed ranking pipeline. Each user's feed cache holds ~500 ranked post IDs.

Social Graph (Cassandra/TAO): Optimized for "get followers of X" and "does A follow B" queries.

Following: (user_id, followed_id, created_at)
Followers: (user_id, follower_id, created_at)
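
Keeping both a Following and a Followers table means every follow is written twice so that each of the two queries is a single-partition read. The in-memory sketch below illustrates that idea; the `SocialGraph` class and its methods are hypothetical and say nothing about how Cassandra or TAO implement it.

```python
from collections import defaultdict


class SocialGraph:
    def __init__(self):
        self.following = defaultdict(dict)  # user_id -> {followed_id: created_at}
        self.followers = defaultdict(dict)  # user_id -> {follower_id: created_at}

    def follow(self, follower_id, followed_id, created_at):
        # One logical edge, two physical rows (one per lookup direction).
        self.following[follower_id][followed_id] = created_at
        self.followers[followed_id][follower_id] = created_at

    def unfollow(self, follower_id, followed_id):
        self.following[follower_id].pop(followed_id, None)
        self.followers[followed_id].pop(follower_id, None)

    def does_follow(self, a, b):
        """'Does A follow B': a point lookup in A's Following rows."""
        return b in self.following.get(a, {})

    def followers_of(self, user_id):
        """'Get followers of X': a scan of X's Followers rows."""
        return sorted(self.followers.get(user_id, {}))


graph = SocialGraph()
graph.follow(1, 2, created_at=100)
graph.follow(3, 2, created_at=101)
```

The cost of the double write is that the two tables can briefly disagree, which fits the eventual-consistency allowance in the non-functional requirements.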

Image Storage (S3): Each upload generates 4-5 variants stored in S3. Total photo storage: 95M uploads/day * 5 variants * 500KB average = ~240 TB/day of new image data.
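
How the variants for one upload might be planned can be sketched without any imaging library. The widths below are the ones named in the upload pipeline (150, 640, 1080 px); treating them as target widths, preserving aspect ratio, and never upscaling are assumptions of this sketch, and `plan_variants` is a hypothetical helper.

```python
VARIANT_WIDTHS = {"thumbnail": 150, "medium": 640, "full": 1080}


def plan_variants(width: int, height: int):
    """Return {variant_name: (w, h)} for an original of the given size.

    Each variant is scaled to its target width with the aspect ratio kept.
    Originals narrower than a target are not upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    plans = {}
    for name, target in VARIANT_WIDTHS.items():
        w = min(target, width)
        h = max(1, round(height * w / width))
        plans[name] = (w, h)
    return plans


large = plan_variants(2160, 1440)  # a 3:2 landscape photo
small = plan_variants(500, 500)    # smaller than the medium and full targets
```

A worker would pass each planned size to its image library of choice and write the result to S3 under a per-variant key.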

Specific Back-of-the-Envelope Numbers

[Figure: Key components of Design Instagram]

Traffic:

2B+ MAU, 500M+ DAU
95M photos/videos uploaded per day = ~1,100 uploads/second
Feed reads: 500M users * 15 feed loads/day = 7.5B feed reads/day = ~87,000 reads/second
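Like actions: ~4 billion likes/day = ~46,000 likes/second on average, well below the 1M likes/second stated in the non-functional requirements, which is best read as a peak target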

Storage:

Photo storage: 95M * 5 variants * 500KB = ~240 TB/day = ~87 PB/year
Database (metadata): each post ~1KB * 95M/day = 95 GB/day = ~35 TB/year
Feed cache (Redis): 500M users * 500 post IDs * 8 bytes = 2 TB

Image processing pipeline:

1,100 uploads/second, each generating 5 resized variants + content moderation
Processing time per image: ~2 seconds
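Images in flight at average load: 1,100 uploads/second * 2 seconds = ~2,200; provisioning ~5,500 concurrent image processing workers leaves roughly 2.5x headroom for peaks and retries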

CDN:

Average user views ~50 images per session * 200KB per image = 10MB per session
500M sessions/day * 10MB = 5 PB/day of CDN bandwidth
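
The traffic, storage, and CDN figures above can be reproduced with a few lines of arithmetic. The sketch below uses decimal units (1 TB = 10^12 bytes) and the inputs stated in this section; the `estimate` function and its parameter names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units


def estimate(
    uploads_per_day=95_000_000,
    dau=500_000_000,
    feed_loads_per_user=15,
    likes_per_day=4_000_000_000,
    variants=5,
    avg_variant_bytes=500 * KB,
    metadata_bytes=1 * KB,
    cached_ids_per_user=500,
    id_bytes=8,
    sessions_per_day=500_000_000,
    images_per_session=50,
    image_bytes=200 * KB,
):
    photo_bytes_per_day = uploads_per_day * variants * avg_variant_bytes
    metadata_bytes_per_day = uploads_per_day * metadata_bytes
    return {
        "uploads_per_sec": uploads_per_day / SECONDS_PER_DAY,
        "feed_reads_per_sec": dau * feed_loads_per_user / SECONDS_PER_DAY,
        "likes_per_sec": likes_per_day / SECONDS_PER_DAY,
        "photo_tb_per_day": photo_bytes_per_day / TB,
        "photo_pb_per_year": photo_bytes_per_day * 365 / PB,
        "metadata_gb_per_day": metadata_bytes_per_day / GB,
        "metadata_tb_per_year": metadata_bytes_per_day * 365 / TB,
        "feed_cache_tb": dau * cached_ids_per_user * id_bytes / TB,
        "cdn_pb_per_day": sessions_per_day * images_per_session * image_bytes / PB,
    }


numbers = estimate()
```

The exact results (for example 237.5 TB/day of photos) are rounded in the text; changing any input, such as the average variant size, shows how sensitive the totals are to that assumption.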

Sources

Design Instagram -- Reference
Source: System-Design-Overview