SDMastery
medium | 7 min read | Updated 2026-06-03

Design Reddit

Design Reddit with subreddits, voting (hot/top/controversial algorithms), threaded comment trees, and karma.

Figure: High-level overview of Design Reddit

Problem Statement

Design a community-driven content platform like Reddit with subreddits, post/comment voting, ranking algorithms (hot, top, new, controversial), nested comment threads, and a karma system. The system must handle viral posts with millions of votes while keeping ranking scores fresh.

Requirements

Figure: System architecture for Design Reddit

Functional

  • Create subreddits with rules and moderators; subscribe/unsubscribe
  • Submit posts (text, link, image) to subreddits; upvote/downvote posts and comments
  • Rank posts by Hot (time-decayed score), Top (by time range), New, and Controversial
  • Display comment threads as nested trees with collapsible replies

Non-Functional

  • Latency: Front page and subreddit feed loads in <500ms
  • Scale: 50M DAU, 100K new posts/day, 10M comments/day, 500M votes/day
  • Consistency: Vote counts eventually consistent (1-2s delay acceptable); comment tree structure strongly consistent
  • Availability: 99.95% -- read-heavy system (100:1 read-write ratio)
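
The daily volumes in the Scale requirement can be turned into average per-second rates with a quick back-of-envelope calculation. The sketch below assumes traffic is spread evenly across the day, which real traffic is not; peak rates for a viral post would be higher. The names `DAILY_VOLUME` and `average_rate_per_second` are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes taken from the Scale requirement above.
DAILY_VOLUME = {
    "posts": 100_000,
    "comments": 10_000_000,
    "votes": 500_000_000,
}


def average_rate_per_second(per_day: int) -> float:
    """Average events per second, assuming an even spread over the day."""
    return per_day / SECONDS_PER_DAY


for name, per_day in DAILY_VOLUME.items():
    print(f"{name}: {average_rate_per_second(per_day):,.1f}/s average")
```

Votes dominate at roughly 5,800 per second on average, which is why the vote path is the part of the write side that gets buffering and batching.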

Core Architecture

Figure: How Design Reddit works step by step

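  1. Ranking Engine -- Computes the Hot score using the formula from Reddit's open-sourced ranking code: score = sign(ups - downs) * log10(max(|ups - downs|, 1)) + (post_timestamp - epoch) / 45000. The sign applies to the logarithmic vote term, not to the time term. Votes count on a log scale (the first 10 net votes weigh as much as the next 90), while the time term grows by 1 for every 45,000 seconds (12.5 hours) of posting time, so a post needs 10x more net upvotes to match one submitted 12.5 hours later. This gives newer posts a time-based boost while heavily-upvoted posts stay visible longer. Scores are recomputed asynchronously on vote events and cached in Redis sorted sets per subreddit.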

  2. Vote Processing Pipeline -- Votes are written to Kafka, deduplicated (one vote per user per item), and aggregated. A vote flipping from up to down is a delta of -2. Aggregated counts are flushed to the database every 5 seconds. Redis holds the real-time approximate count for display.

  3. Comment Tree Service -- Comments are stored with a materialized path (e.g., "c1/c2/c5") enabling efficient subtree queries. Each comment has parent_id and a depth field. Tree rendering sorts by score within each depth level. Deep threads (>5 levels) show a "Continue this thread" link and load lazily.

  4. Content Moderation Pipeline -- Combines automated filters (spam detection, banned words, URL blacklist) with human moderator actions. AutoModerator rules are evaluated on post/comment creation. Reported content enters a moderation queue. ML models flag potential policy violations for human review.

Figure: Data flow through Design Reddit
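
The Hot score used by the Ranking Engine can be sketched in Python as follows. This is a minimal sketch modeled on the ranking function in Reddit's open-sourced code, in which the sign multiplies the logarithmic vote term and the time term is always added. The constant 1134028003 is the epoch used in that code and is an assumption here, since the text above only says "epoch"; the function name `hot_score` and its parameters are illustrative.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Assumption: reference epoch from Reddit's open-sourced ranking code
# (2005-12-08 07:46:43 UTC). The text above does not name a value.
REDDIT_EPOCH_SECONDS = 1134028003


def hot_score(ups: int, downs: int, created_at: datetime) -> float:
    """Time-decayed Hot score. `created_at` must be timezone-aware."""
    net = ups - downs
    order = log10(max(abs(net), 1))       # votes count on a log scale
    sign = (net > 0) - (net < 0)          # 1, 0, or -1
    seconds = created_at.timestamp() - REDDIT_EPOCH_SECONDS
    return round(sign * order + seconds / 45000, 7)


# A post with 100 net upvotes ties with a post that has 10 net upvotes
# but was submitted 45,000 seconds (12.5 hours) later.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
earlier = now - timedelta(seconds=45000)
print(hot_score(100, 0, earlier), hot_score(10, 0, now))
```

Because the score depends only on the vote totals and the creation time, it does not need periodic recomputation as the post ages; it only changes when a vote arrives, which fits the asynchronous recompute-on-vote approach described above.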

Database Choice

PostgreSQL for posts, comments, subreddits, and user profiles -- relational queries for comment trees (materialized path with LIKE prefix queries) and subreddit membership. Redis sorted sets for ranked feeds per subreddit (ZADD with Hot score, ZREVRANGE for feed). Cassandra for vote records -- write-heavy (500M/day), partitioned by item_id, simple lookup pattern. Elasticsearch for post and subreddit search.
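
The materialized-path subtree lookup can be illustrated with a small runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server; the `comments` table, its columns, and the `subtree` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE comments (
           id TEXT PRIMARY KEY,
           parent_id TEXT,
           path TEXT NOT NULL,      -- materialized path, e.g. 'c1/c2/c5'
           depth INTEGER NOT NULL,
           score INTEGER NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", None, "c1", 0, 40),
        ("c2", "c1", "c1/c2", 1, 12),
        ("c3", "c1", "c1/c3", 1, 30),
        ("c5", "c2", "c1/c2/c5", 2, 7),
        ("c9", None, "c9", 0, 5),
    ],
)


def subtree(conn: sqlite3.Connection, root_path: str) -> list[str]:
    """Return ids of the comment at root_path and all of its descendants."""
    cur = conn.execute(
        "SELECT id FROM comments WHERE path = ? OR path LIKE ? ORDER BY path",
        # The trailing '/' keeps 'c1' from also matching 'c10', 'c11', ...
        (root_path, root_path + "/%"),
    )
    return [row[0] for row in cur]


print(subtree(conn, "c1/c2"))  # ['c2', 'c5']
```

One caveat with this approach: `_` and `%` are wildcards in LIKE patterns, so comment ids containing those characters would need escaping before being used in a prefix query.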

Figure: Interview tips for Design Reddit

Key API Endpoints

GET /api/v1/r/{subreddit}/{sort}?cursor={score}&limit=25
  -> Returns: { posts: [{ id, title, author, score, num_comments, created_at }], next_cursor }

POST /api/v1/vote
  -> Body: { item_id: "P-123", item_type: "post", direction: "up" }

GET /api/v1/posts/{post_id}/comments?sort=top&depth=5
  -> Returns: { comments: [{ id, body, score, replies: [...recursive...] }] }
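
The recursive `replies` structure returned by the comments endpoint can be assembled from flat comment rows. The following is a minimal sketch, assuming rows arrive as dictionaries with `id`, `parent_id`, `body`, and `score`; the `build_tree` function and the `has_more` flag (standing in for the "Continue this thread" link) are hypothetical and not part of the API above.

```python
from collections import defaultdict


def build_tree(rows: list[dict], max_depth: int = 5) -> list[dict]:
    """Nest flat comment rows into a replies tree, best-scored siblings first.

    Only `max_depth` levels are rendered. Comments on the last rendered level
    get an empty `replies` list plus a `has_more` flag so the client can load
    deeper replies lazily.
    """
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)

    def render(parent_id, depth):
        nodes = []
        siblings = sorted(children[parent_id], key=lambda r: r["score"], reverse=True)
        for row in siblings:
            node = {"id": row["id"], "body": row["body"], "score": row["score"]}
            if depth + 1 < max_depth:
                node["replies"] = render(row["id"], depth + 1)
            else:
                node["replies"] = []
                node["has_more"] = bool(children[row["id"]])
            nodes.append(node)
        return nodes

    return render(None, 0)


rows = [
    {"id": "c1", "parent_id": None, "body": "first", "score": 10},
    {"id": "c2", "parent_id": "c1", "body": "reply", "score": 3},
    {"id": "c3", "parent_id": "c1", "body": "better reply", "score": 8},
    {"id": "c5", "parent_id": "c2", "body": "deep reply", "score": 1},
    {"id": "c9", "parent_id": None, "body": "top comment", "score": 20},
]
tree = build_tree(rows, max_depth=2)
print([node["id"] for node in tree])  # ['c9', 'c1']
```

Grouping rows by `parent_id` first keeps the assembly linear in the number of comments apart from the per-sibling sort, and the depth cap bounds the recursion.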

Scaling Insight

Approximate vote counts are the key to handling Reddit's vote volume. Instead of incrementing a counter in the database for every vote, votes are buffered in Kafka and flushed in batches every 5 seconds. The displayed count is the last flushed value + an in-memory delta from Redis. Users see a near-instant response to their vote (optimistic UI update), while the backend processes votes asynchronously. This reduces database write pressure from 500M individual updates to ~10M batched updates per day.
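
The buffering idea can be sketched with an in-memory aggregator. This is a simplified stand-in for the Kafka consumer and Redis delta described above, not a production design: the `VoteBuffer` class, its method names, and the "none" direction (for retracting a vote) are assumptions made for illustration.

```python
from collections import defaultdict

# Numeric value of each vote direction. "none" (a retracted vote) is an
# assumption; the API example above only shows "up".
VOTE_VALUE = {"up": 1, "down": -1, "none": 0}


class VoteBuffer:
    """Deduplicates votes per (user, item) and accumulates net deltas per item."""

    def __init__(self):
        self.current = {}                # (user_id, item_id) -> last direction
        self.pending = defaultdict(int)  # item_id -> net delta not yet flushed

    def record(self, user_id: str, item_id: str, direction: str) -> int:
        """Apply one vote event and return the score delta it caused."""
        key = (user_id, item_id)
        old = VOTE_VALUE[self.current.get(key, "none")]
        delta = VOTE_VALUE[direction] - old   # up -> down gives -2
        if delta:
            self.current[key] = direction
            self.pending[item_id] += delta
        return delta

    def flush(self) -> dict[str, int]:
        """Return one net delta per changed item and reset the buffer."""
        batch = {item: d for item, d in self.pending.items() if d != 0}
        self.pending.clear()
        return batch


buf = VoteBuffer()
buf.record("u1", "P-123", "up")
buf.record("u1", "P-123", "up")    # duplicate, ignored
buf.record("u2", "P-123", "up")
print(buf.flush())                 # {'P-123': 2}
```

However many votes an item receives between flushes, the database sees a single update for it per flush, which is where the reduction in write volume comes from.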

Figure: When to use Design Reddit

Key Tradeoffs

| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Comment storage | Adjacency list (parent_id) | Materialized path ("c1/c2/c5") | Materialized path -- faster subtree queries, single query loads a full thread |
| Vote counting | Exact real-time count | Approximate batched count | Approximate -- 5s delay invisible to users, 50x reduction in DB writes |
| Ranking | Precomputed scores in cache | Computed on read | Precomputed -- amortizes calculation cost, sub-ms feed reads from Redis sorted sets |
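
To make the precomputed-ranking choice concrete, the sketch below models one subreddit's ranked feed in memory. It is a stand-in for a Redis sorted set so that it runs without a server; `RankedFeed`, `upsert`, and `page` are hypothetical names. It also assumes the cursor is a (score, post_id) pair rather than a bare score, so that posts with equal scores are not skipped at a page boundary.

```python
import bisect


class RankedFeed:
    """In-memory stand-in for one per-subreddit sorted set of post scores."""

    def __init__(self):
        self._scores = {}   # post_id -> current score
        self._sorted = []   # ascending list of (score, post_id)

    def upsert(self, post_id: str, score: float) -> None:
        """Insert a post or move it to its new rank (the role ZADD plays)."""
        old = self._scores.get(post_id)
        if old is not None:
            self._sorted.pop(bisect.bisect_left(self._sorted, (old, post_id)))
        self._scores[post_id] = score
        bisect.insort(self._sorted, (score, post_id))

    def page(self, cursor=None, limit: int = 25):
        """Return (items, next_cursor), highest scores first.

        `cursor` is the (score, post_id) pair of the last item already seen;
        only entries ranked strictly below it are returned.
        """
        end = len(self._sorted) if cursor is None else bisect.bisect_left(self._sorted, cursor)
        start = max(end - limit, 0)
        items = self._sorted[start:end][::-1]
        next_cursor = items[-1] if start > 0 else None
        return [(post_id, score) for score, post_id in items], next_cursor


feed = RankedFeed()
for post_id, score in [("a", 3.0), ("b", 1.0), ("c", 2.0), ("d", 2.0)]:
    feed.upsert(post_id, score)

first, cursor = feed.page(limit=2)
print(first, cursor)  # [('a', 3.0), ('d', 2.0)] (2.0, 'd')
```

Reads never compute a score; they only slice an already ordered structure, so the ranking cost is paid once per vote batch rather than once per feed request.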

Practical Implementation for .NET Developers

Figure: Advantages and disadvantages of Design Reddit

In a .NET application, this design is typically implemented with the following pieces:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Figure: Real-world examples of Design Reddit

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Figure: Comparing key aspects of Design Reddit

System-Specific Clarifying Questions

Figure: Key components of Design Reddit

Before designing Reddit, ask questions specific to this system:
  1. Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
  2. What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
  3. What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
  4. What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
  5. What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Reddit should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Reddit, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources