SDMastery
Medium | 9 min read | Updated 2026-06-03

Design Notification Service

System design interview solution for a notification service. Includes requirements, API design, data model, architecture, and scaling strategy.

[Figure: High-level overview of the notification service, showing key components and metrics]

Problem Statement

Design a notification service that delivers messages to millions of users across push, email, SMS, in-app, and webhook channels, reliably and at scale.

Step 1: Clarifying Questions

Before diving into the design, ask these clarifying questions:

  • What is the expected scale (users, requests per second)?
  • What are the most critical features to support?
  • What are the latency requirements?
  • Do we need to support real-time features?
  • What consistency guarantees are needed?

Step 2: Functional Requirements

[Figure: System architecture of the notification service, with service components and data flow]
  1. Core feature set: accept notification requests and deliver them on the requested channels
  2. User-facing APIs and interactions: send, bulk send, list notifications, update preferences
  3. Data storage and retrieval: notification history, user preferences, device tokens
  4. Search and discovery: limited to listing and filtering a user's own notifications (for example, unread only)

Step 3: Non-Functional Requirements

  • Scalability: Handle millions of concurrent users
  • Availability: 99.99% uptime (four nines)
  • Latency: Sub-200ms for read operations
  • Consistency: Eventually consistent where acceptable, strongly consistent for critical paths
  • Durability: No loss of notification requests once they are accepted (acknowledge only after they are persisted)

Step 4: Back-of-the-Envelope Estimation

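Generic baseline estimates (the notification-specific numbers in Specific Back-of-the-Envelope Numbers supersede these):

| Metric | Estimate |
|---|---|
| Daily Active Users | 10M |
| Read:Write Ratio | 10:1 |
| Average Request Size | 1 KB |
| Storage per year | ~10 TB |
| Peak QPS | 100K |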

Step 5: API Design

```text
POST   /api/v1/resource
GET    /api/v1/resource/{id}
PUT    /api/v1/resource/{id}
DELETE /api/v1/resource/{id}
```
[Figure: How the notification service works, step by step]

Step 6: Data Model

Define the core entities and their relationships. Consider the access patterns when choosing between SQL and NoSQL.

Step 7: High-Level Architecture

The system consists of these major components:

  1. Client Layer — Web/mobile clients
  2. API Gateway — Rate limiting, authentication, routing
  3. Application Servers — Business logic
  4. Database Layer — Primary storage
  5. Cache Layer — Redis/Memcached for hot data
  6. Message Queue — Async processing

Step 8: Detailed Component Design

[Figure: Data flow through the notification service, showing request and response paths]

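Write Path

A send request passes through the API gateway (authentication, rate limiting) to an application server, which validates it and publishes it to the Kafka topic for its priority before returning "queued". Channel workers consume the message, check user preferences and per-user rate limits, render the template, hand the message to the channel provider (APNs, FCM, email, Twilio/SNS), and record the result in the notification log.

Read Path

A request for a user's notifications checks the cache first (for example, the recent list and unread count held in Redis). On a miss, the application server reads that user's partition of the notification log, ordered by notification_id, and repopulates the cache.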
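The read path above is the classic cache-aside pattern. The sketch below is a minimal, hypothetical version: an in-memory dict with a TTL stands in for Redis, and `load_from_store` stands in for the notification-log query. The 30-second TTL and invalidate-on-write policy are assumptions, not part of the original design.

```python
import time
from typing import Callable, Dict, List, Tuple


class CacheAsideReader:
    """Cache-aside read path: check the cache, fall back to the store, then populate."""

    def __init__(self, load_from_store: Callable[[str], List[dict]], ttl_seconds: float = 30.0):
        self._load = load_from_store  # hypothetical stand-in for the notification-log query
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, List[dict]]] = {}  # user_id -> (expires_at, rows)
        self.hits = 0
        self.misses = 0

    def recent_notifications(self, user_id: str) -> List[dict]:
        now = time.monotonic()
        entry = self._cache.get(user_id)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the store and repopulate the cache.
        self.misses += 1
        rows = self._load(user_id)
        self._cache[user_id] = (now + self._ttl, rows)
        return rows

    def invalidate(self, user_id: str) -> None:
        # Called from the write path when a new notification lands for this user.
        self._cache.pop(user_id, None)
```

Invalidating on write keeps the unread list fresh at the cost of an extra store read after each new notification; a short TTL bounds staleness if an invalidation is missed.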

Step 9: Scaling Strategy

  • Horizontal scaling of application servers behind a load balancer
  • Database sharding by user ID or geographic region
  • Read replicas for read-heavy workloads
  • CDN for static content delivery
  • Auto-scaling based on traffic patterns
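Sharding by user ID needs a hash that gives the same answer on every server. A minimal sketch, assuming simple modulo placement over a fixed shard count; the function name and the choice of MD5 as the stable hash are illustrative.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index with a stable hash.

    Python's built-in hash() is salted per process for str, so a digest
    is used instead to get the same shard on every application server.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo placement remaps most users when `num_shards` changes; consistent hashing or a fixed set of virtual shards mapped to physical nodes avoids that mass migration.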

Step 10: Reliability and Fault Tolerance

[Figure: Interview tips for the notification service design question]
  • Data replication across availability zones
  • Circuit breakers for dependent services
  • Graceful degradation under high load
  • Health checks and automated failover
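A circuit breaker in front of a dependency such as a push provider stops workers from piling up on a provider that is already failing. This is a minimal single-threaded sketch, assuming a consecutive-failure threshold and a fixed cooldown; the class and parameter names are hypothetical, and a production breaker would also limit how many trial calls run while half-open.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open trial after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call re-opens immediately; otherwise open at the threshold.
            if self.state == "half_open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers can degrade gracefully, for example by leaving the message on the queue or falling back to another channel.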

Step 11: Monitoring and Observability

  • Request latency (p50, p95, p99)
  • Error rates by endpoint
  • Database query performance
  • Cache hit/miss ratios
  • Queue depth and processing lag
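The p50/p95/p99 latency figures above are percentiles over request samples. A minimal sketch using the nearest-rank definition; real monitoring backends usually compute these from histograms rather than sorting raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For example, `percentile([120, 15, 30, 45], 50)` returns 30, while the p99 of the same samples is the 120 ms outlier, which is why tail percentiles are tracked alongside the median.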

Key Tradeoffs

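| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Database | SQL | NoSQL | Both, by access pattern: Cassandra for the notification log, PostgreSQL for preferences |
| Consistency | Strong | Eventual | Eventual for most reads |
| Communication | Sync | Async | Async for non-critical paths |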

How to Present This in an Interview

[Figure: Decision guide: when to use this design and when to avoid it]
  1. Start with clarifying questions (2 min)
  2. Define requirements (3 min)
  3. Do estimation (2 min)
  4. Design API and data model (5 min)
  5. Draw high-level architecture (10 min)
  6. Deep dive into critical components (10 min)
  7. Discuss tradeoffs and bottlenecks (5 min)

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this design as follows:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

[Figure: Advantages and disadvantages of this design]

Azure integration: If deploying to Azure, use managed services such as Azure Cache for Redis, Azure SQL, Azure Service Bus, and Azure Cosmos DB. These reduce operational overhead and integrate with Azure Monitor and Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog so production issues can be searched and debugged:

```csharp
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
```

This gives you searchable, structured logs in Azure Monitor or Seq.

[Figure: Real-world examples of notification services in production]

Deep-Dive: Clarifying Questions for Notification Service

  1. What notification channels? Push notifications (iOS APNs, Android FCM), email (SMTP), SMS (Twilio/SNS), in-app notifications, and webhooks. Each channel has different delivery guarantees and costs.
  2. What is the volume? A large platform sends 1-10 billion notifications per day. Most are push notifications; email and SMS are an order of magnitude less.
  3. Do we need priority levels? A security alert (password changed) is high priority and must be delivered immediately. A "someone liked your photo" notification can be batched and delayed.
  4. Do we need rate limiting per user? Users should not receive more than N notifications per hour to avoid notification fatigue and uninstalls.
  5. How do we handle user preferences? Users should be able to opt out of specific notification types per channel (e.g., receive likes via push but not email).
  6. Do we need notification templates? Localized templates with variable substitution for consistent messaging across channels.
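Localized templates with variable substitution can be sketched with the standard library's `string.Template`. The template store, its keys, and the fallback-to-English rule below are hypothetical; the point is that a missing variable fails loudly instead of sending a half-rendered message.

```python
from string import Template
from typing import Dict, Mapping, Tuple

# Hypothetical template store keyed by (notification type, locale).
TEMPLATES: Dict[Tuple[str, str], str] = {
    ("like_photo", "en"): "$liker_name liked your photo",
    ("like_photo", "es"): "A $liker_name le gustó tu foto",
    ("security_alert", "en"): "Your password was changed. If this wasn't you, secure your account.",
}


def render(notification_type: str, locale: str, data: Mapping[str, str],
           default_locale: str = "en") -> str:
    """Pick the localized template (falling back to default_locale) and fill in variables."""
    template = (TEMPLATES.get((notification_type, locale))
                or TEMPLATES.get((notification_type, default_locale)))
    if template is None:
        raise KeyError(f"no template for {notification_type!r}")
    # substitute() raises KeyError for a missing variable; extra keys in data are ignored.
    return Template(template).substitute(data)
```

With the request data from the send API, `render("like_photo", "en", {"liker_name": "Alice", "photo_id": "p456"})` yields "Alice liked your photo".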

Specific Functional Requirements

  1. Multi-Channel Delivery: Send notifications via push (iOS/Android), email, SMS, in-app, and webhooks
  2. Priority Queuing: High-priority notifications (security alerts, OTPs) are delivered immediately; low-priority ones can be batched
  3. User Preferences: Per-user settings for which notification types they receive on which channels
  4. Rate Limiting: Limit notifications per user per time window to prevent notification fatigue
  5. Template Engine: Localized notification templates with variable substitution
  6. Delivery Tracking: Track sent, delivered, opened, and clicked status per notification
  7. Retry and Failover: Retry failed deliveries with exponential backoff; fall back to alternate channels
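Requirement 7 combines two mechanisms: retries with exponential backoff on one channel, then fallback to the next channel. A minimal sketch, assuming senders are zero-argument callables that raise a hypothetical `DeliveryError` on provider failure, and using "full jitter" backoff (a random delay up to the exponential bound). The attempt count, base delay, and cap are illustrative.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class DeliveryError(Exception):
    """Hypothetical error raised by a channel sender when the provider call fails."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    return source.uniform(0, min(cap, base * (2 ** attempt)))


def deliver_with_fallback(channels: List[str],
                          senders: Dict[str, Callable[[], None]],
                          max_attempts: int = 3,
                          sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Try each channel in order with retries; return the channel that succeeded, or None."""
    for channel in channels:
        for attempt in range(max_attempts):
            try:
                senders[channel]()
                return channel
            except DeliveryError:
                if attempt < max_attempts - 1:
                    sleep(backoff_delay(attempt))
        # Every attempt on this channel failed: fall back to the next channel.
    return None
```

Jitter spreads retries out so thousands of workers do not hit a recovering provider at the same instant. In a queue-based worker, long delays would normally be implemented by re-enqueueing with a delay rather than sleeping.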

Specific API Endpoints

```text
POST /api/v1/notifications/send
  Body: {
    "user_id": "user_123",
    "type": "like_photo",
    "priority": "low",
    "data": { "liker_name": "Alice", "photo_id": "p456" },
    "channels": ["push", "in_app"]
  }
  Response: { "notification_id": "n789", "status": "queued" }

POST /api/v1/notifications/send-bulk
  Body: { "user_ids": ["user_1", "user_2", ...], "type": "new_feature", "data": {...} }
  Response: { "batch_id": "batch_abc", "queued_count": 50000 }

GET /api/v1/users/:user_id/notifications?unread=true&limit=20
  Response: { "notifications": [...], "unread_count": 5 }

PUT /api/v1/users/:user_id/notification-preferences
  Body: { "like_photo": { "push": true, "email": false, "sms": false }, "security_alert": { "push": true, "email": true, "sms": true } }
```
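The send endpoint returns "queued" rather than "delivered", which implies the handler only validates and enqueues. A minimal sketch of that handler, assuming `enqueue` is a durable producer call (for example, a Kafka publish that waits for acknowledgement). The default priority, default channel, and ID scheme are assumptions.

```python
import uuid
from typing import Any, Callable, Dict

PRIORITIES = {"high", "medium", "low"}
CHANNELS = {"push", "email", "sms", "in_app", "webhook"}


def handle_send(body: Dict[str, Any],
                enqueue: Callable[[Dict[str, Any]], None]) -> Dict[str, str]:
    """Validate a send request, hand it to the queue, then acknowledge it as queued."""
    for key in ("user_id", "type"):
        if not isinstance(body.get(key), str) or not body[key]:
            raise ValueError(f"missing or invalid field: {key}")
    priority = body.get("priority", "low")  # assumed default
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    channels = body.get("channels", ["push"])  # assumed default
    if not channels or not set(channels) <= CHANNELS:
        raise ValueError(f"invalid channels: {channels!r}")
    message = {
        "notification_id": uuid.uuid4().hex,  # hypothetical ID scheme
        "user_id": body["user_id"],
        "type": body["type"],
        "priority": priority,
        "channels": list(channels),
        "data": body.get("data", {}),
    }
    # Respond only after enqueue() returns, so the client never sees "queued"
    # for a message the (assumed durable) queue did not accept.
    enqueue(message)
    return {"notification_id": message["notification_id"], "status": "queued"}
```

Invalid requests fail before anything is enqueued, so the client gets a 4xx-style error instead of a queued message that workers would later drop.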
[Figure: Comparison of key metrics and tradeoffs]

Specific Data Model

Notification Queue (Kafka): One topic per priority level (high, medium, low). The high-priority topic has more partitions and dedicated consumers so it is processed faster.
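Routing a message means picking the topic for its priority and a partition within it. The sketch below makes that mapping explicit; the topic names and partition counts are hypothetical. Kafka producers already hash the message key to a partition, so in practice you would set the key to `user_id` and let the client choose. The explicit hash here only shows why keying by user keeps one user's notifications in order within a topic.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical topic names and partition counts; the high-priority topic gets more partitions.
TOPICS: Dict[str, Tuple[str, int]] = {
    "high": ("notifications.high", 64),
    "medium": ("notifications.medium", 32),
    "low": ("notifications.low", 32),
}


def route(priority: str, user_id: str) -> Tuple[str, int]:
    """Return (topic, partition) for a message; same user and priority -> same partition."""
    topic, partitions = TOPICS[priority]
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return topic, int.from_bytes(digest[:4], "big") % partitions
```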

Notification Log (Cassandra)

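| Column | Type | Notes |
|---|---|---|
| user_id | BIGINT | Partition key |
| notification_id | TIMEUUID | Clustering key |
| type | VARCHAR | like_photo, comment, follow, security_alert |
| channel | VARCHAR | push, email, sms, in_app |
| status | VARCHAR | queued, sent, delivered, opened, clicked, failed |
| data | TEXT | Template variables, serialized as JSON (CQL has no JSON column type) |
| created_at | TIMESTAMP | |
| delivered_at | TIMESTAMP | Nullable |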

User Preferences (PostgreSQL)

| Column | Type | Notes |
|---|---|---|
| user_id | BIGINT | Primary key |
| preferences | JSONB | Map of notification_type -> channel -> enabled |
| quiet_hours_start | TIME | Do-not-disturb start |
| quiet_hours_end | TIME | Do-not-disturb end |
| timezone | VARCHAR | For quiet hours calculation |
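The preferences row drives a per-channel decision: drop if the user opted out, defer if it is quiet hours in the user's timezone, otherwise send. A minimal sketch mirroring the table above, with these assumptions labeled: missing preference entries default to enabled, high-priority notifications ignore quiet hours, and the caller builds `tz` from the `timezone` column (for example with `zoneinfo.ZoneInfo`).

```python
from dataclasses import dataclass, field
from datetime import datetime, time, tzinfo
from typing import Dict, Optional


@dataclass
class UserPreferences:
    """Mirrors a user preferences row; tz is built by the caller from the timezone column."""
    preferences: Dict[str, Dict[str, bool]] = field(default_factory=dict)
    quiet_hours_start: Optional[time] = None
    quiet_hours_end: Optional[time] = None
    tz: Optional[tzinfo] = None

    def channel_enabled(self, notification_type: str, channel: str) -> bool:
        # Assumption: a missing entry means enabled (opt-out model).
        return self.preferences.get(notification_type, {}).get(channel, True)

    def in_quiet_hours(self, now_utc: datetime) -> bool:
        if self.quiet_hours_start is None or self.quiet_hours_end is None or self.tz is None:
            return False
        local = now_utc.astimezone(self.tz).time()
        start, end = self.quiet_hours_start, self.quiet_hours_end
        if start <= end:
            return start <= local < end
        return local >= start or local < end  # window wraps past midnight, e.g. 22:00-07:00


def decide(prefs: UserPreferences, notification_type: str, channel: str,
           priority: str, now_utc: datetime) -> str:
    """Return 'drop', 'defer', or 'send' for one notification on one channel."""
    if not prefs.channel_enabled(notification_type, channel):
        return "drop"
    # Assumption: high priority (e.g. security alerts) bypasses quiet hours.
    if priority != "high" and prefs.in_quiet_hours(now_utc):
        return "defer"
    return "send"
```

Deferred notifications can be re-enqueued for the end of the quiet window, which also gives a natural point to batch low-priority ones.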

Device Registry (Redis/PostgreSQL): Maps user_id to device tokens for push notifications. Users may have multiple devices.

[Figure: Key components of the notification service, with roles and responsibilities]

Specific Back-of-the-Envelope Numbers

Traffic:

  • 500M DAU generating ~10 notification-triggering events each = 5 billion notification requests/day
  • After preference filtering and deduplication: ~2 billion actual deliveries/day
  • Push notifications: ~1.4B/day (70%), in-app: 400M (20%), email: 150M (~7.5%), SMS: 50M (~2.5%)
  • Average: ~23,000 notifications/second, peak: ~70,000/second

Processing:

  • Each notification: check preferences (Redis lookup), apply rate limit (Redis counter), render template, route to channel
  • Processing time per notification: ~5ms
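  • Peak load: 70,000/second × 5 ms = 350 worker-seconds of work per second, so ~350 single-threaded worker instances are needed just to keep pace; provision headroom above that to keep queue latency under 1 second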
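The "Redis counter" rate-limit step is usually a fixed-window counter: increment a per-user key for the current window and set it to expire at the end of the window. A minimal in-memory sketch of the same logic; the class name, key shape, and limits are hypothetical, and unlike Redis `EXPIRE`, this dict never evicts old windows.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Per-user fixed-window counter (in Redis: INCR a per-window key, EXPIRE on first hit)."""

    def __init__(self, limit: int, window_seconds: int, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._counts: Dict[Tuple[str, int], int] = {}  # (user_id, window_start) -> count

    def allow(self, user_id: str) -> bool:
        window_start = int(self._clock()) // self.window * self.window
        key = (user_id, window_start)
        count = self._counts.get(key, 0) + 1
        self._counts[key] = count
        return count <= self.limit
```

A fixed window allows up to twice the limit across a window boundary; a sliding-window log or token bucket smooths that out at the cost of more state per user.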

Storage:

  • Notification log: 2B/day * 200 bytes = 400 GB/day, retained for 90 days = 36 TB before replication
  • User preferences: 500M users * 1 KB = 500 GB (fits in a single PostgreSQL instance with read replicas)
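The traffic, processing, and storage numbers above can be reproduced with a few lines of arithmetic, which makes it easy to rerun them with different inputs in an interview. A minimal sketch; the peak factor of 3 is an assumption inferred from the ~23,000/s average and ~70,000/s peak quoted above.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau: int, events_per_user: int, deliveries_per_day: int,
             peak_factor: float, ms_per_notification: float,
             bytes_per_log_row: int, retention_days: int) -> dict:
    """Back-of-the-envelope traffic, worker, and storage estimates."""
    avg_per_sec = deliveries_per_day / SECONDS_PER_DAY
    peak_per_sec = avg_per_sec * peak_factor
    # Work arriving per second (in seconds of CPU) = concurrent single-threaded workers needed.
    workers_at_peak = peak_per_sec * ms_per_notification / 1000
    log_bytes_per_day = deliveries_per_day * bytes_per_log_row
    return {
        "requests_per_day": dau * events_per_user,
        "avg_per_sec": avg_per_sec,
        "peak_per_sec": peak_per_sec,
        "workers_at_peak": workers_at_peak,
        "log_gb_per_day": log_bytes_per_day / 1e9,
        "log_tb_retained": log_bytes_per_day * retention_days / 1e12,
    }
```

With the article's inputs, `estimate(500_000_000, 10, 2_000_000_000, 3, 5, 200, 90)` gives about 23,148 notifications/second on average, about 69,444/second at peak, about 347 workers, 400 GB/day of log, and 36 TB retained.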

External provider limits:

  • APNs (Apple): no published global rate limit, but Apple throttles excessive pushes to the same device token
  • FCM (Google): up to 240 messages/minute and 5,000 messages/hour to a single Android device; a single app instance can subscribe to at most 2,000 topics
  • Email (SES): ~50,000 emails/day default quota after leaving the sandbox, raised on request; dedicated IPs help protect sender reputation at higher volumes
  • SMS: $0.0075 per message (US); at 50M messages/day that is about $375,000/day, so use SMS sparingly (for example, OTPs and security alerts)
