Difficulty: medium | 7 min read | Updated 2026-06-08

Design Reddit

Design Reddit with subreddits, voting (hot/top/controversial algorithms), threaded comment trees, and karma.

Problem Statement

Design a community-driven content platform like Reddit with subreddits, post/comment voting, ranking algorithms (hot, top, new, controversial), nested comment threads, and a karma system. The system must handle viral posts with millions of votes while keeping ranking scores fresh.

Requirements

Functional

Create subreddits with rules and moderators; subscribe/unsubscribe
Submit posts (text, link, image) to subreddits; upvote/downvote posts and comments
Rank posts by Hot (time-decayed score), Top (by time range), New, and Controversial
Display comment threads as nested trees with collapsible replies

Non-Functional

Latency: Front page and subreddit feed loads in <500ms
Scale: 50M DAU, 100K new posts/day, 10M comments/day, 500M votes/day
Consistency: Vote counts eventually consistent (1-2s delay acceptable); comment tree structure strongly consistent
Availability: 99.95% -- read-heavy system (100:1 read-write ratio)
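
To put these daily volumes in per-second terms, a quick back-of-the-envelope conversion helps. The sketch below only divides the stated figures by the number of seconds in a day. The read estimate assumes the 100:1 ratio applies to all writes combined, which is an interpretation and not something the requirements spell out.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


# Daily volumes taken from the Scale requirement above.
daily_volumes = {
    "posts": 100_000,
    "comments": 10_000_000,
    "votes": 500_000_000,
}

average_rates = {name: per_second(count) for name, count in daily_volumes.items()}

# Assumption: the 100:1 read-write ratio covers posts, comments, and votes together.
total_writes_per_second = sum(average_rates.values())
estimated_reads_per_second = 100 * total_writes_per_second

for name, rate in average_rates.items():
    print(f"{name}: ~{rate:,.1f}/s on average")
print(f"reads: ~{estimated_reads_per_second:,.0f}/s on average (estimate)")
```

Votes dominate the write load at roughly 5,800 per second on average, which is why the vote path gets its own pipeline. These are averages; a viral post can push short-term peaks well above them.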

Core Architecture

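Ranking Engine -- Computes Hot score using Reddit's formula: score = sign(ups - downs) * log10(max(|ups - downs|, 1)) + (post_timestamp - epoch) / 45000. The time term gives newer posts a boost (45,000 seconds, or 12.5 hours, of recency is worth a 10x larger net vote total), while heavily upvoted posts stay visible longer. Because the time term depends only on submission time, a post's score changes only when its votes change, so scores are recomputed asynchronously on vote events and cached in Redis sorted sets per subreddit.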
Vote Processing Pipeline -- Votes are written to Kafka, deduplicated (one vote per user per item), and aggregated. A vote flipping from up to down is a delta of -2. Aggregated counts are flushed to the database every 5 seconds. Redis holds the real-time approximate count for display.
Comment Tree Service -- Comments are stored with a materialized path (e.g., "c1/c2/c5") enabling efficient subtree queries. Each comment has parent_id and a depth field. Tree rendering sorts sibling comments by score under each parent. Deep threads (>5 levels) show a "Continue this thread" link and load lazily.

Content Moderation Pipeline -- Combines automated filters (spam detection, banned words, URL blacklist) with human moderator actions. AutoModerator rules are evaluated on post/comment creation. Reported content enters a moderation queue. ML models flag potential policy violations for human review.
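
The Hot ranking described above can be sketched in a few lines. This follows the shape of Reddit's open-sourced ranking code, where the sign multiplies the logarithmic vote term and the time term is added unscaled. The function name `hot_score` and the epoch constant used here are assumptions for illustration; any fixed reference time would work as long as it is applied consistently.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Assumed reference epoch (2005-12-08 07:46:43 UTC); any fixed instant works.
ASSUMED_EPOCH_SECONDS = 1134028003


def hot_score(ups: int, downs: int, posted_at: datetime,
              epoch_seconds: int = ASSUMED_EPOCH_SECONDS) -> float:
    """Time-decayed Hot score: log-scaled net votes plus a recency term."""
    net = ups - downs
    magnitude = log10(max(abs(net), 1))  # 10x more net votes adds 1.0
    sign = (net > 0) - (net < 0)         # +1, 0, or -1
    recency = (posted_at.timestamp() - epoch_seconds) / 45000
    return round(sign * magnitude + recency, 7)


posted = datetime(2026, 1, 1, tzinfo=timezone.utc)

# A post with 100 net upvotes ties with a post that has only 10 net upvotes
# but was submitted 45,000 seconds (12.5 hours) later.
older_popular = hot_score(100, 0, posted)
newer_modest = hot_score(10, 0, posted + timedelta(seconds=45000))
```

Since `posted_at` never changes, the score only needs recomputing when the vote totals change, which fits the event-driven recomputation described above.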

Database Choice

PostgreSQL for posts, comments, subreddits, and user profiles -- relational queries for comment trees (materialized path with LIKE prefix queries) and subreddit membership. Redis sorted sets for ranked feeds per subreddit (ZADD with Hot score, ZREVRANGE for feed). Cassandra for vote records -- write-heavy (500M/day), partitioned by item_id, simple lookup pattern. Elasticsearch for post and subreddit search.
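
A minimal sketch of the materialized-path subtree query, using Python's built-in `sqlite3` as a stand-in for PostgreSQL so the example runs on its own. The table layout and the helper name `subtree_ids` are assumptions. The detail worth noticing is the `/` appended before the `%` wildcard: without it, the prefix "c1/c2" would also match an unrelated sibling such as "c1/c21".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    " id TEXT PRIMARY KEY, parent_id TEXT,"
    " path TEXT NOT NULL, depth INTEGER NOT NULL, score INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", None, "c1", 0, 40),
        ("c2", "c1", "c1/c2", 1, 12),
        ("c3", "c1", "c1/c3", 1, 30),
        ("c5", "c2", "c1/c2/c5", 2, 7),
        ("c21", "c1", "c1/c21", 1, 3),  # shares the textual prefix "c1/c2"
        ("c20", None, "c20", 0, 5),     # separate top-level thread
    ],
)


def subtree_ids(conn: sqlite3.Connection, root_path: str) -> list[str]:
    """Return the comment at root_path plus all of its descendants."""
    cursor = conn.execute(
        "SELECT id FROM comments WHERE path = ? OR path LIKE ? ORDER BY path",
        (root_path, root_path + "/%"),
    )
    return [row[0] for row in cursor]


thread_under_c2 = subtree_ids(conn, "c1/c2")
whole_thread = subtree_ids(conn, "c1")
```

One query returns the whole subtree, which is the advantage over walking `parent_id` links level by level. This sketch assumes comment IDs never contain `%` or `_`, since both are wildcards in `LIKE` patterns.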

Key API Endpoints

GET /api/v1/r/{subreddit}/{sort}?cursor={score}&limit=25
  -> Returns: { posts: [{ id, title, author, score, num_comments, created_at }], next_cursor }

POST /api/v1/vote
  -> Body: { item_id: "P-123", item_type: "post", direction: "up" }

GET /api/v1/posts/{post_id}/comments?sort=top&depth=5
  -> Returns: { comments: [{ id, body, score, replies: [...recursive...] }] }
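
The recursive `replies` shape returned by the comments endpoint can be assembled from flat rows roughly as follows. This is a sketch under assumptions: the row fields mirror the response above plus `parent_id`, the function name `build_comment_tree` is hypothetical, and `max_depth` plays the role of the `depth` query parameter by cutting off levels that would be loaded lazily.

```python
from collections import defaultdict


def build_comment_tree(rows: list[dict], max_depth: int = 5) -> list[dict]:
    """Nest flat comment rows into a tree, siblings sorted by score (highest first)."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)

    def render(parent_id, depth):
        if depth >= max_depth:
            return []  # deeper replies are left for a lazy follow-up request
        siblings = sorted(children.get(parent_id, []),
                          key=lambda r: r["score"], reverse=True)
        return [
            {
                "id": r["id"],
                "body": r["body"],
                "score": r["score"],
                "replies": render(r["id"], depth + 1),
            }
            for r in siblings
        ]

    return render(None, 0)


rows = [
    {"id": "c1", "parent_id": None, "body": "First", "score": 10},
    {"id": "c2", "parent_id": None, "body": "Second", "score": 25},
    {"id": "c3", "parent_id": "c1", "body": "Reply", "score": 3},
    {"id": "c4", "parent_id": "c1", "body": "Better reply", "score": 8},
    {"id": "c5", "parent_id": "c4", "body": "Deep reply", "score": 1},
]

shallow = build_comment_tree(rows, max_depth=2)  # top level plus direct replies
full = build_comment_tree(rows)                  # default depth of 5
```

Sorting happens per parent, so a high-scoring reply never jumps out from under its own thread.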

Scaling Insight

Approximate vote counts are the key to handling Reddit's vote volume. Instead of incrementing a counter in the database for every vote, votes are buffered in Kafka and flushed in batches every 5 seconds. The displayed count is the last flushed value + an in-memory delta from Redis. Users see a near-instant response to their vote (optimistic UI update), while the backend processes votes asynchronously. This reduces database write pressure from 500M individual updates to ~10M batched updates per day.
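
The deduplicate-then-batch idea can be illustrated with an in-memory stand-in. In the design above, Kafka carries the vote events and Redis holds the pending delta; here plain dictionaries play both roles so the sketch runs on its own. The class name `VoteAggregator` and the "none" direction (for retracting a vote) are assumptions and not part of the API shown earlier.

```python
from collections import defaultdict

DIRECTION_VALUES = {"up": 1, "down": -1, "none": 0}  # "none" is hypothetical


class VoteAggregator:
    """Keeps one vote per (user, item) and accumulates deltas between flushes."""

    def __init__(self):
        self._current = {}                # (user_id, item_id) -> -1, 0, or +1
        self._pending = defaultdict(int)  # item_id -> net delta since last flush

    def record(self, user_id: str, item_id: str, direction: str) -> int:
        """Apply a vote and return the change it makes to the item's score."""
        value = DIRECTION_VALUES[direction]
        key = (user_id, item_id)
        delta = value - self._current.get(key, 0)  # up -> down yields -2
        self._current[key] = value
        if delta:
            self._pending[item_id] += delta
        return delta

    def flush(self) -> dict:
        """Return one net delta per item and reset the buffer (run every 5 seconds)."""
        batch = {item: d for item, d in self._pending.items() if d != 0}
        self._pending.clear()
        return batch


agg = VoteAggregator()
first = agg.record("u1", "P-123", "up")     # new vote
repeat = agg.record("u1", "P-123", "up")    # duplicate, no change
agg.record("u2", "P-123", "up")
flipped = agg.record("u1", "P-123", "down") # flip from up to down
agg.record("u3", "P-123", "up")
batch = agg.flush()
```

However many votes arrive for an item within a flush window, the database sees at most one update for that item, which is where the reduction in write volume comes from.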

Key Tradeoffs

| Decision | Option A | Option B | Chosen |
| --- | --- | --- | --- |
| Comment storage | Adjacency list (parent_id) | Materialized path ("c1/c2/c5") | Materialized path -- faster subtree queries, single query loads a full thread |
| Vote counting | Exact real-time count | Approximate batched count | Approximate -- 5s delay invisible to users, 50x reduction in DB writes |
| Ranking | Precomputed scores in cache | Computed on read | Precomputed -- amortizes calculation cost, sub-ms feed reads from Redis sorted sets |

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

System-Specific Clarifying Questions

Before designing Reddit, ask questions specific to THIS system:

Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Reddit should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Reddit, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources

Design Reddit -- Reference
Source: System-Design-Overview
