Medium · 9 min read · Updated 2026-06-08

Design Notification Service

System design interview solution for Design Notification Service. Includes requirements, API design, data model, architecture, and scaling strategy.

Problem Statement

Design a notification service that delivers messages to users over multiple channels such as push, email, SMS, and in-app. The system should handle millions of users and provide a reliable, scalable experience.

Step 1: Clarifying Questions

Before diving into the design, ask these clarifying questions:

What is the expected scale (users, requests per second)?
What are the most critical features to support?
What are the latency requirements?
Do we need to support real-time features?
What consistency guarantees are needed?

Step 2: Functional Requirements

Figure: System architecture for Design Notification Service, showing how services, databases, and caches connect.

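Deliver notifications over push, email, SMS, in-app, and webhook channels
User-facing APIs for sending notifications and listing a user's notifications
Storage and retrieval of notification history and user preferences
Per-user preferences and rate limits
Delivery tracking and retries for failed sends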

Step 3: Non-Functional Requirements

Scalability: Handle millions of concurrent users
Availability: 99.99% uptime (four nines)
Latency: Sub-200ms for read operations
Consistency: Eventually consistent where acceptable, strongly consistent for critical paths
Durability: No data loss

Step 4: Back-of-the-Envelope Estimation

Metric	Estimate
Daily Active Users	10M
Read:Write Ratio	10:1
Average Request Size	1 KB
Storage per year	~10 TB
Peak QPS	100K

Step 5: API Design

POST /api/v1/resource
GET  /api/v1/resource/{id}
PUT  /api/v1/resource/{id}
DELETE /api/v1/resource/{id}

Figure: How Design Notification Service processes a request from start to finish, step by step.

Step 6: Data Model

Define the core entities and their relationships. Consider the access patterns when choosing between SQL and NoSQL.

Step 7: High-Level Architecture

The system consists of these major components:

Client Layer — Web/mobile clients
API Gateway — Rate limiting, authentication, routing
Application Servers — Business logic
Database Layer — Primary storage
Cache Layer — Redis/Memcached for hot data
Message Queue — Async processing

Step 8: Detailed Component Design

Figure: Data flow through Design Notification Service, showing how requests and responses move through the system.

Write Path

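A producer calls POST /api/v1/notifications/send. The API gateway authenticates and rate-limits the request, and an application server validates it, publishes it to the message queue, and returns a queued status to the caller. Workers then consume from the queue, check user preferences and per-user rate limits, render the template, hand the message to the channel provider, and record the outcome in the notification log.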

Read Path

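A client calls GET /api/v1/users/:user_id/notifications. The application server first checks the cache for the user's recent notifications and unread count. On a miss it reads from the notification log, which is partitioned by user ID, and populates the cache before responding.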
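The cache interaction on the read path can be sketched as a cache-aside lookup. This is a minimal in-memory illustration under stated assumptions, not a production design: `CacheAsideReader`, the dict standing in for Redis, and the `load_from_store` callable standing in for the database query are all hypothetical names.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CacheAsideReader:
    """Cache-aside read: try the cache, fall back to the store, then populate the cache."""

    def __init__(self, load_from_store: Callable[[str], List[dict]], ttl_seconds: float = 30.0):
        self._load = load_from_store  # hypothetical stand-in for the database query
        self._ttl = ttl_seconds
        # user_id -> (expiry time, rows); a real deployment would use Redis with a TTL.
        self._cache: Dict[str, Tuple[float, List[dict]]] = {}
        self.store_reads = 0  # counts cache misses, useful for hit-ratio metrics

    def get_notifications(self, user_id: str, now: Optional[float] = None) -> List[dict]:
        now = time.monotonic() if now is None else now
        entry = self._cache.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired
        self.store_reads += 1
        rows = self._load(user_id)  # cache miss: read from the store
        self._cache[user_id] = (now + self._ttl, rows)
        return rows
```

A short TTL bounds how stale the unread list can be. Invalidating the entry when a new notification is written would tighten that further, at the cost of extra write-path work.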

Step 9: Scaling Strategy

Horizontal scaling of application servers behind a load balancer
Database sharding by user ID or geographic region
Read replicas for read-heavy workloads
CDN for static content delivery
Auto-scaling based on traffic patterns
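Sharding by user ID needs a stable mapping from user to shard. The sketch below assumes simple hash-modulo placement; the function name `shard_for_user` and the choice of SHA-256 are illustrative, not something the design above prescribes.

```python
import hashlib


def shard_for_user(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index using a hash that is stable across processes."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count
```

Hash-modulo placement moves most keys when `shard_count` changes. If shards are added often, consistent hashing or a lookup table of virtual shards limits that movement.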

Step 10: Reliability and Fault Tolerance

Figure: Interview tips for Design Notification Service, with key points to mention and mistakes to avoid.

Data replication across availability zones
Circuit breakers for dependent services
Graceful degradation under high load
Health checks and automated failover
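A circuit breaker around a dependent service, such as a push or SMS provider, might look like the following. It is a minimal single-threaded sketch: the class names, the threshold, and the reset window are assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Reset window elapsed: half-open, so let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # a success closes the circuit
        return result
```

While the circuit is open, callers fail fast instead of waiting on timeouts, which is what allows graceful degradation such as falling back to another channel.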

Step 11: Monitoring and Observability

Request latency (p50, p95, p99)
Error rates by endpoint
Database query performance
Cache hit/miss ratios
Queue depth and processing lag
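To make the latency percentiles concrete, here is a small sketch that computes p50, p95, and p99 from a batch of samples using the nearest-rank method. Production systems usually rely on histogram-based metrics rather than sorting raw samples; the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def latency_percentiles(samples_ms: Sequence[float],
                        percentiles: Sequence[int] = (50, 95, 99)) -> Dict[str, float]:
    """Nearest-rank percentiles over a batch of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    result = {}
    for p in percentiles:
        # 1-based nearest rank: the smallest rank covering at least p percent of samples.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        result[f"p{p}"] = ordered[rank - 1]
    return result
```

Tail percentiles matter more than the average here: a healthy p50 can hide a p99 that is dominated by slow provider calls.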

Key Tradeoffs

Decision	Option A	Option B	Chosen
Database	SQL	NoSQL	Depends on access patterns
Consistency	Strong	Eventual	Eventual for most reads
Communication	Sync	Async	Async for non-critical paths

How to Present This in an Interview

Figure: When to use Design Notification Service, and when alternative approaches are better.

Start with clarifying questions (2 min)
Define requirements (3 min)
Do estimation (2 min)
Design API and data model (5 min)
Draw high-level architecture (10 min)
Deep dive into critical components (10 min)
Discuss tradeoffs and bottlenecks (5 min)

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Figure: Advantages and disadvantages of Design Notification Service, with real-world considerations.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Figure: Real-world examples of Design Notification Service at companies like Netflix, Google, and Amazon.

Deep-Dive: Clarifying Questions for Notification Service

What notification channels? Push notifications (iOS APNs, Android FCM), email (SMTP), SMS (Twilio/SNS), in-app notifications, and webhooks. Each channel has different delivery guarantees and costs.
What is the volume? A large platform sends 1-10 billion notifications per day. Most are push notifications; email and SMS are an order of magnitude less.
Do we need priority levels? A security alert (password changed) is high priority and must be delivered immediately. A "someone liked your photo" notification can be batched and delayed.
Do we need rate limiting per user? Users should not receive more than N notifications per hour to avoid notification fatigue and uninstalls.
How do we handle user preferences? Users should be able to opt out of specific notification types per channel (e.g., receive likes via push but not email).
Do we need notification templates? Localized templates with variable substitution for consistent messaging across channels.
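The per-user rate limit raised in the questions above can be sketched as a fixed-window counter. This version keeps counts in process memory purely for illustration; the class name is hypothetical, and a shared Redis counter with a TTL would play the same role across workers.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` notifications per user in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int = 3600):
        self.limit = limit
        self.window_seconds = window_seconds
        # (user_id, window index) -> count. Old windows are never evicted in this sketch.
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, now_seconds: float) -> bool:
        window = int(now_seconds // self.window_seconds)
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False  # over the limit for this window
        self._counts[key] += 1
        return True
```

A fixed window permits a burst of up to twice the limit across a window boundary. A sliding window or token bucket smooths that out if it matters. High-priority notifications such as security alerts would normally bypass this check.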

Specific Functional Requirements

Multi-Channel Delivery: Send notifications via push (iOS/Android), email, SMS, in-app, and webhooks
Priority Queuing: High-priority notifications (security alerts, OTPs) are delivered immediately; low-priority ones can be batched
User Preferences: Per-user settings for which notification types they receive on which channels
Rate Limiting: Limit notifications per user per time window to prevent notification fatigue
Template Engine: Localized notification templates with variable substitution
Delivery Tracking: Track sent, delivered, opened, and clicked status per notification
Retry and Failover: Retry failed deliveries with exponential backoff; fall back to alternate channels

Specific API Endpoints

POST /api/v1/notifications/send
  Body: {
    "user_id": "user_123",
    "type": "like_photo",
    "priority": "low",
    "data": { "liker_name": "Alice", "photo_id": "p456" },
    "channels": ["push", "in_app"]
  }
  Response: { "notification_id": "n789", "status": "queued" }

POST /api/v1/notifications/send-bulk
  Body: { "user_ids": ["user_1", "user_2", ...], "type": "new_feature", "data": {...} }
  Response: { "batch_id": "batch_abc", "queued_count": 50000 }

GET /api/v1/users/:user_id/notifications?unread=true&limit=20
  Response: { "notifications": [...], "unread_count": 5 }

PUT /api/v1/users/:user_id/notification-preferences
  Body: { "like_photo": { "push": true, "email": false, "sms": false }, "security_alert": { "push": true, "email": true, "sms": true } }
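The preferences body and the `channels` field of a send request combine naturally: the service keeps only the requested channels the user has enabled for that notification type. The sketch below assumes that a channel missing from the preferences map is enabled by default, which the API above does not specify; `allowed_channels` is a hypothetical helper.

```python
from typing import Dict, List

# notification_type -> channel -> enabled, mirroring the preferences request body
Preferences = Dict[str, Dict[str, bool]]


def allowed_channels(preferences: Preferences, notification_type: str,
                     requested: List[str], default_enabled: bool = True) -> List[str]:
    """Keep only the requested channels the user has not disabled for this type."""
    per_type = preferences.get(notification_type, {})
    # Assumption: channels absent from the map fall back to `default_enabled`.
    return [channel for channel in requested if per_type.get(channel, default_enabled)]
```

An opt-in default (`default_enabled=False`) is the safer choice for costly or intrusive channels such as SMS.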

Figure: Comparing key aspects of Design Notification Service: approaches, tradeoffs, and when to use each.

Specific Data Model

Notification Queue (Kafka): Topics partitioned by priority (high, medium, low). High-priority topic has more partitions and dedicated consumers for faster processing.
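Routing a notification to a priority topic can be sketched as a lookup plus a keyed partition choice. The topic names and partition counts below are invented for illustration, and CRC32 stands in for the partitioner; a real Kafka producer applies its own hash when given a message key.

```python
import zlib
from typing import Tuple

# Hypothetical topic names and partition counts. The only property taken from the
# design is that the high-priority topic has more partitions.
TOPICS = {
    "high": ("notifications.high", 48),
    "medium": ("notifications.medium", 24),
    "low": ("notifications.low", 12),
}


def route(priority: str, user_id: str) -> Tuple[str, int]:
    """Return (topic, partition) for a notification."""
    try:
        topic, partitions = TOPICS[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None
    # Keying by user keeps one user's notifications ordered within a single partition.
    partition = zlib.crc32(user_id.encode("utf-8")) % partitions
    return topic, partition
```

Separate topics per priority let the high-priority consumers scale independently, so a backlog of low-priority messages cannot delay a security alert.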

Notification Log (Cassandra)

Column	Type	Notes
user_id	BIGINT	Partition key
notification_id	TIMEUUID	Clustering key
type	VARCHAR	like_photo, comment, follow, security_alert
channel	VARCHAR	push, email, sms, in_app
status	VARCHAR	queued, sent, delivered, opened, failed
data	TEXT	Template variables, serialized as JSON (CQL has no native JSON column type)
created_at	TIMESTAMP
delivered_at	TIMESTAMP	Nullable

User Preferences (PostgreSQL)

Column	Type	Notes
user_id	BIGINT	Primary key
preferences	JSONB	Map of notification_type -> channel -> enabled
quiet_hours_start	TIME	Do not disturb start
quiet_hours_end	TIME	Do not disturb end
timezone	VARCHAR	For quiet hours calculation
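The quiet-hours columns imply a check in the user's local time, including windows that cross midnight. Here is a minimal sketch; `in_quiet_hours` is a hypothetical helper, and treating equal start and end times as "quiet hours off" is an assumption.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def in_quiet_hours(now_utc: datetime, start: time, end: time, tz: tzinfo) -> bool:
    """True if the user's local time is inside [start, end). `now_utc` must be timezone-aware."""
    local = now_utc.astimezone(tz).time()
    if start == end:
        return False  # assumption: equal bounds mean quiet hours are disabled
    if start < end:
        return start <= local < end  # same-day window, e.g. 13:00 to 15:00
    return local >= start or local < end  # overnight window, e.g. 22:00 to 07:00


# Example: a user at a fixed UTC-5 offset with quiet hours from 22:00 to 07:00.
utc_minus_5 = timezone(timedelta(hours=-5))
is_quiet = in_quiet_hours(
    datetime(2026, 1, 10, 4, 30, tzinfo=timezone.utc),  # 23:30 local time
    time(22, 0), time(7, 0), utc_minus_5,
)
```

With the stored `timezone` string, `zoneinfo.ZoneInfo(name)` from the standard library yields a `tzinfo` that also handles daylight saving time. Low-priority notifications that fall inside the window can be deferred until it ends, while high-priority ones are delivered regardless.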

Device Registry (Redis/PostgreSQL): Maps user_id to device tokens for push notifications. Users may have multiple devices.

Figure: Key components of Design Notification Service, showing each building block and its responsibility.

Specific Back-of-the-Envelope Numbers

Traffic:

500M DAU generating ~10 notification-triggering events each = 5 billion notification requests/day
After preference filtering and deduplication: ~2 billion actual deliveries/day
Push notifications: ~1.4B/day (70%), in-app: ~400M (20%), email: ~150M (7.5%), SMS: ~50M (2.5%)
Average: ~23,000 notifications/second, peak: ~70,000/second

Processing:

Each notification: check preferences (Redis lookup), apply rate limit (Redis counter), render template, route to channel
Processing time per notification: ~5ms
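Need ~350 worker instances at peak (70,000/second × 5 ms = 350 notifications in flight, assuming each worker handles one at a time), plus headroom to keep queue latency under 1 second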
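The traffic and worker figures follow from simple arithmetic, reproduced below as a sketch. The `utilization` parameter is an added assumption for headroom; at 1.0 it reproduces the estimate above, in which each worker processes one notification at a time and is fully busy.

```python
import math

SECONDS_PER_DAY = 86_400


def workers_needed(rate_per_second: float, processing_ms: float, utilization: float = 1.0) -> int:
    """Workers required so that each one runs at or below the target utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Little's law: work in flight = arrival rate * time per item.
    busy = rate_per_second * processing_ms / 1000
    return math.ceil(busy / utilization)


deliveries_per_day = 2_000_000_000
average_rate = deliveries_per_day / SECONDS_PER_DAY  # about 23,000 per second
peak_workers = workers_needed(70_000, 5)             # 350 fully busy workers
```

Running workers at 100% utilization leaves no slack for bursts, so queue latency grows as soon as traffic exceeds the estimate. Sizing for a lower target, for example `workers_needed(70_000, 5, utilization=0.5)`, trades cost for a shorter queue.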

Storage:

Notification log: 2B/day * 200 bytes = 400 GB/day, retained for 90 days = 36 TB
User preferences: 500M users * 1 KB = 500 GB (fits in a single PostgreSQL instance with read replicas)
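The storage figures can be checked the same way. This sketch uses decimal units (1 KB = 1,000 bytes) to match the round numbers above, and the replication factor of 3 at the end is an assumption, since the estimate does not state one.

```python
GB = 10**9
TB = 10**12


def total_bytes(items: int, bytes_per_item: int, retention_days: int = 1) -> int:
    """Raw storage for `items` records per day kept for `retention_days`."""
    return items * bytes_per_item * retention_days


log_per_day = total_bytes(2_000_000_000, 200)        # 400 GB per day
log_retained = total_bytes(2_000_000_000, 200, 90)   # 36 TB over 90 days
preferences = total_bytes(500_000_000, 1_000)        # 500 GB

# Assumption: with a replication factor of 3, the log's disk footprint triples.
log_replicated = log_retained * 3                    # 108 TB
```

These are raw payload sizes. Indexes, compaction overhead, and replication all add to the provisioned capacity.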

External provider limits:

APNs (Apple): effectively unlimited but throttles per device token
FCM (Google): about 240 messages/minute per device; a multicast request accepts up to 500 device tokens
Email (SES): 50,000 emails/day on standard, need dedicated IPs for higher volumes
SMS: $0.0075 per message (US) — expensive at scale, use sparingly
