Skip to main content
SDMastery
medium7 min readUpdated 2026-06-03

Design an Analytics Platform

Design an analytics platform with event ingestion, real-time aggregation, dashboards, and ad-hoc queries. Covers Lambda architecture and columnar storage.

Design an Analytics Platform system design overview showing key components and metrics
High-level overview of Design an Analytics Platform

Problem Statement

Design an analytics platform (like Google Analytics or Amplitude) that ingests billions of events per day from web/mobile apps, processes them in real time for live dashboards, stores them for historical ad-hoc queries, and supports custom metrics, funnels, and cohort analysis. Must handle bursty ingestion and provide query results in seconds.

Requirements

Design an Analytics Platform system architecture with service components and data flow
System architecture for Design an Analytics Platform

Functional

  • Ingest events (page_view, click, purchase, custom) with arbitrary properties via a lightweight SDK
  • Real-time dashboards showing active users, event counts, and conversion rates updated every 5 seconds
  • Ad-hoc queries: filter/group events by any property, time range, and user segments
  • Funnel analysis: define multi-step funnels and compute conversion rates

Non-Functional

  • Ingestion: 10B events/day, peak 500K events/second, <1 second end-to-end to dashboard
  • Query latency: Real-time dashboards <2 seconds, ad-hoc queries <10 seconds for 90-day range
  • Retention: Raw events stored for 2 years, aggregates forever
  • Scale: 100K customer apps, each sending millions of events/day

Core Architecture

Step-by-step diagram showing how Design an Analytics Platform works in practice
How Design an Analytics Platform works step by step
  1. Event Ingestion Gateway -- Lightweight HTTP endpoint that accepts event batches from client SDKs. Validates schema, enriches with server timestamp and geo-IP, and publishes to Kafka. The gateway is stateless and horizontally scaled behind a load balancer. Client SDKs batch events locally (every 10 seconds or 20 events) to reduce HTTP overhead.

  2. Stream Processing Layer (Flink) -- Consumes events from Kafka in real time. Computes pre-aggregated metrics: event counts, unique users (HyperLogLog), and active session counts per 5-second window per customer app. Outputs aggregates to Redis for live dashboards and to the raw event store for historical queries.

  3. Columnar Event Store (ClickHouse) -- Stores raw events in a columnar format partitioned by date and customer_id. Columnar storage enables fast aggregation queries (scanning only relevant columns). Compression ratios of 10-20x reduce storage costs. Supports SQL-like queries with sub-second response for most analytical patterns.

Data flow diagram for Design an Analytics Platform showing request and response paths
Data flow through Design an Analytics Platform
  1. Query Engine -- Translates user-defined dashboards and ad-hoc queries into optimized SQL against ClickHouse. For funnel analysis, uses window functions to detect sequential event patterns per user. For real-time metrics, reads pre-aggregated data from Redis. Caches frequently-run dashboard queries with a 5-second TTL.

Database Choice

ClickHouse for raw event storage and ad-hoc queries -- columnar format ideal for analytical workloads, handles 10B rows/day insertion with excellent compression. Redis for real-time pre-aggregated metrics (counters, HyperLogLog sketches for unique users, sorted sets for top events). Kafka as the event bus between ingestion and processing. PostgreSQL for customer accounts, dashboard definitions, and funnel configurations.

Interview tips for Design an Analytics Platform system design questions
Interview tips for Design an Analytics Platform

Key API Endpoints

text
POST /api/v1/events/batch
  -> Body: \{ app_id: "APP-1", events: [\{ name: "page_view", properties: \{ url: "/home" \}, user_id: "U1", timestamp: ... \}] \}

GET /api/v1/analytics/\{app_id\}/query
  -> Body: \{ metric: "unique_users", filters: [\{ property: "country", op: "eq", value: "US" \}], group_by: "device_type", range: "7d" \}
  -> Returns: \{ results: [\{ device_type: "mobile", unique_users: 142000 \}, ...], query_time_ms: 850 \}

POST /api/v1/analytics/\{app_id\}/funnel
  -> Body: \{ steps: ["page_view", "add_to_cart", "purchase"], range: "30d" \}
  -> Returns: \{ funnel: [\{ step: "page_view", users: 100000 \}, \{ step: "add_to_cart", users: 25000, conversion: 0.25 \}, \{ step: "purchase", users: 5000, conversion: 0.20 \}] \}

Scaling Insight

Pre-aggregation at the stream processing layer is the secret to real-time dashboards at scale. Instead of querying 10B raw events for a simple "active users in the last hour" metric, Flink maintains a rolling HyperLogLog sketch per customer app per 5-second window. The dashboard reads these pre-computed sketches from Redis, merging windows as needed. This turns a billion-row scan into a sub-millisecond Redis read, making real-time dashboards feasible regardless of event volume.

Decision guide showing when to use Design an Analytics Platform and when to avoid
When to use Design an Analytics Platform

Key Tradeoffs

DecisionOption AOption BChosen
Event storeRow-oriented (PostgreSQL)Columnar (ClickHouse)Columnar -- 10-100x faster for analytical aggregation queries, 10x better compression
Real-time metricsQuery raw events on every dashboard loadPre-aggregate in stream processorPre-aggregate -- O(1) dashboard reads instead of scanning billions of rows
IngestionSynchronous write to databaseKafka buffer + async writeKafka buffer -- absorbs traffic spikes, decouples ingestion from storage speed

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

Pros and cons analysis of Design an Analytics Platform for system design decisions
Advantages and disadvantages of Design an Analytics Platform

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Real-world companies using Design an Analytics Platform in production systems
Real-world examples of Design an Analytics Platform

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Comparison table for Design an Analytics Platform showing key metrics and tradeoffs
Comparing key aspects of Design an Analytics Platform

System-Specific Clarifying Questions

Before designing Analytics Platform, ask questions specific to THIS system:

Key components of Design an Analytics Platform with roles and responsibilities
Key components of Design an Analytics Platform
  1. Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
  2. What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
  3. What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
  4. What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
  5. What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Analytics Platform should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Analytics Platform, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources