Design Payment System
System design interview solution for a payment system. Covers requirements, API design, data model, architecture, scaling strategy, and tradeoffs.
Problem Statement
Design a payment system. It should serve millions of users reliably and scale with transaction volume.
Step 1: Clarifying Questions
Before diving into the design, ask these clarifying questions:
- What is the expected scale (users, requests per second)?
- What are the most critical features to support?
- What are the latency requirements?
- Do we need to support real-time features?
- What consistency guarantees are needed?
Step 2: Functional Requirements
- Core feature set for Payment System
- User-facing APIs and interactions
- Data storage and retrieval
- Search and discovery (if applicable)
- Notifications (if applicable)
Step 3: Non-Functional Requirements
- Scalability: Handle millions of concurrent users
- Availability: 99.99% uptime (four nines)
- Latency: Sub-200ms for read operations
- Consistency: Eventually consistent where acceptable, strongly consistent for critical paths
- Durability: No data loss
Step 4: Back-of-the-Envelope Estimation
| Metric | Estimate |
|---|---|
| Daily Active Users | 10M |
| Read:Write Ratio | 10:1 |
| Average Request Size | 1 KB |
| Storage per year | ~10 TB |
| Peak QPS | 100K |
Step 5: API Design
POST /api/v1/resource
GET /api/v1/resource/{id}
PUT /api/v1/resource/{id}
DELETE /api/v1/resource/{id}
Step 6: Data Model
Define the core entities and their relationships. Consider the access patterns when choosing between SQL and NoSQL.
Step 7: High-Level Architecture
The system consists of these major components:
- Client Layer — Web/mobile clients
- API Gateway — Rate limiting, authentication, routing
- Application Servers — Business logic
- Database Layer — Primary storage
- Cache Layer — Redis/Memcached for hot data
- Message Queue — Async processing
Step 8: Detailed Component Design
Write Path
How data flows from client to persistent storage.
Read Path
How data is retrieved, including cache interactions.
Step 9: Scaling Strategy
- Horizontal scaling of application servers behind a load balancer
- Database sharding by user ID or geographic region
- Read replicas for read-heavy workloads
- CDN for static content delivery
- Auto-scaling based on traffic patterns
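As an illustration of sharding by user ID, the sketch below maps an ID to a shard index with a stable hash. The shard count and the function name are assumptions made for this example, not part of the design above.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, chosen only for illustration


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Python's built-in `hash()` is randomized per process for strings, so a cryptographic digest is used here to keep routing stable. Plain modulo routing moves most keys when the shard count changes; consistent hashing is a common alternative when resharding is expected.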
Step 10: Reliability and Fault Tolerance
- Data replication across availability zones
- Circuit breakers for dependent services
- Graceful degradation under high load
- Health checks and automated failover
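The circuit breaker item can be sketched as a small wrapper around calls to a dependency. This is a minimal, single-threaded sketch; the class name, thresholds, and injectable clock are assumptions for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a dependency that is known to be failing.
                raise CircuitOpenError("dependency temporarily disabled")
            # Timeout elapsed: half-open state, allow one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(send_to_processor, request)`, where `send_to_processor` is a hypothetical function, and treat `CircuitOpenError` as a signal to degrade gracefully.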
Step 11: Monitoring and Observability
- Request latency (p50, p95, p99)
- Error rates by endpoint
- Database query performance
- Cache hit/miss ratios
- Queue depth and processing lag
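For the latency percentiles listed above, one simple definition is the nearest-rank method, sketched here. Production monitoring systems usually compute percentiles from histograms or sketches rather than raw samples, so treat this only as an illustration of what p50, p95, and p99 mean.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # synthetic data: 1 ms .. 100 ms
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```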
Key Tradeoffs
| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Database | SQL | NoSQL | Depends on access patterns |
| Consistency | Strong | Eventual | Eventual for most reads |
| Communication | Sync | Async | Async for non-critical paths |
How to Present This in an Interview
- Start with clarifying questions (2 min)
- Define requirements (3 min)
- Do estimation (2 min)
- Design API and data model (5 min)
- Draw high-level architecture (10 min)
- Deep dive into critical components (10 min)
- Discuss tradeoffs and bottlenecks (5 min)
Practical Implementation for .NET Developers
In a .NET application, you would typically implement this pattern using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Deep-Dive: Clarifying Questions for Payment System
- What payment methods? Credit/debit cards (Visa, Mastercard), bank transfers (ACH), digital wallets (Apple Pay, Google Pay), cryptocurrency? Each has different processing pipelines and settlement times.
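- What is the transaction volume? Stripe has reported roughly $1 trillion in annual payment volume. PayPal handles on the order of 40 million transactions per day. Visa states that its network can handle more than 65,000 transaction messages per second, which is a capacity figure rather than its typical load.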
- How do we prevent double-charging? Idempotency is non-negotiable. If a client retries a payment request (due to timeout), the system must guarantee the payment is processed exactly once.
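- How do we handle refunds? Partial refunds, full refunds, and time-limited refund windows. Chargebacks are a separate, issuer-initiated flow whose dispute windows are set by the card networks (often around 120 days, depending on network and reason code).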
- What compliance requirements? PCI DSS for card data, SOX for financial reporting, PSD2/SCA for European payments. These are not optional: PCI DSS is imposed contractually by the card networks, while SOX and PSD2 are law in their respective jurisdictions.
- How do we handle currency? Multi-currency support with real-time exchange rates. Never use floating-point for money — use integers (cents) or decimal types.
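The warning about floating-point money can be made concrete with a short sketch. The helper name `to_minor_units` and its `exponent` parameter are assumptions for this example; the exponent is the number of decimal places the currency uses (2 for USD, 0 for JPY).

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent most decimal fractions exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)  # False


def to_minor_units(amount: str, exponent: int = 2) -> int:
    """Convert a decimal string such as "29.99" to integer minor units (2999 cents)."""
    scaled = Decimal(amount).scaleb(exponent)  # shift the decimal point right
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Integer arithmetic on minor units is exact.
cents_sum_is_exact = (
    to_minor_units("0.10") + to_minor_units("0.20") == to_minor_units("0.30")
)  # True
```

Parsing from a string rather than a float matters here: `Decimal(29.99)` would inherit the float's representation error, while `Decimal("29.99")` is exact.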
Specific Functional Requirements
- Payment Processing: Accept payments via multiple methods (card, bank transfer, wallet) with real-time authorization
- Idempotent Transactions: Every payment request includes an idempotency key to prevent duplicate charges
- Refunds: Support full and partial refunds with audit trail
- Ledger: Double-entry bookkeeping ledger recording every financial movement (debits and credits must always balance)
- Fraud Detection: Real-time scoring of transactions using ML models (velocity checks, geographic anomalies, device fingerprinting)
- Settlement: Batch settlement with payment processors and bank accounts on a daily/weekly schedule
- Webhooks: Notify merchants of payment events (succeeded, failed, refunded, disputed)
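Of the fraud signals named above, a velocity check is the simplest to sketch: count recent attempts per card within a sliding window. The class name, thresholds, and the choice of card token as the key are assumptions for this example; a real system would feed this signal into a scoring model alongside the others.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Flag a card that makes more than max_attempts payment attempts within window_seconds."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # card token -> timestamps of recent attempts

    def record_and_check(self, card_token, now):
        window = self._attempts[card_token]
        # Drop attempts that have aged out of the sliding window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        window.append(now)
        return len(window) > self.max_attempts  # True means "suspicious"
```

Passing `now` explicitly keeps the sketch deterministic and easy to test; in a service it would come from a clock, and the per-card state would live in a shared store rather than process memory.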
Specific API Endpoints
POST /api/v1/payments
Headers: { "Idempotency-Key": "unique_request_id_abc123" }
Body: { "amount": 2999, "currency": "usd", "payment_method_id": "pm_card_visa", "merchant_id": "m_456", "metadata": { "order_id": "ord_789" } }
Response: { "payment_id": "pay_abc", "status": "succeeded", "amount": 2999, "currency": "usd", "created_at": "..." }
POST /api/v1/refunds
Headers: { "Idempotency-Key": "refund_req_xyz" }
Body: { "payment_id": "pay_abc", "amount": 1500, "reason": "customer_request" }
Response: { "refund_id": "ref_def", "status": "pending", "amount": 1500 }
GET /api/v1/payments/:payment_id
Response: { "payment_id": "pay_abc", "status": "succeeded", "amount": 2999, "refunds": [...], "timeline": [...] }
GET /api/v1/ledger/balance?merchant_id=m_456&currency=usd
Response: { "available": 150000, "pending": 25000, "currency": "usd" }
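One rule the refund endpoint needs to enforce is that the total refunded never exceeds the amount captured. The sketch below uses the example amounts from the requests above (a 2999 payment and a 1500 refund); the function name and error messages are assumptions for illustration.

```python
def remaining_after_refund(captured, prior_refunds, requested):
    """Return the refundable amount left after this refund, or raise if it is not allowed.

    All amounts are integers in the smallest currency unit.
    """
    if requested <= 0:
        raise ValueError("refund amount must be positive")
    remaining = captured - sum(prior_refunds)
    if requested > remaining:
        raise ValueError(f"refund of {requested} exceeds refundable amount {remaining}")
    return remaining - requested


# First partial refund of 1500 against a 2999 payment leaves 1499 refundable.
left = remaining_after_refund(captured=2999, prior_refunds=[], requested=1500)
```

A second refund of 1500 against the same payment would be rejected, because only 1499 remains. In a real service this check and the insert of the refund row would need to happen in one transaction so concurrent refunds cannot both pass.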
Specific Data Model
Payments (PostgreSQL — strong consistency required)
| Column | Type | Notes |
|---|---|---|
| payment_id | UUID | Primary key |
| idempotency_key | VARCHAR(255) | UNIQUE — prevents duplicate processing |
| merchant_id | VARCHAR | |
| amount | BIGINT | In smallest currency unit (cents) — NEVER use float |
| currency | VARCHAR(3) | ISO 4217 code |
| status | ENUM | pending, authorized, captured, failed, refunded |
| payment_method | JSONB | Tokenized card/bank details (never store raw card numbers) |
| created_at | TIMESTAMP | |
| updated_at | TIMESTAMP | |
Ledger Entries (PostgreSQL — append-only, double-entry)
| Column | Type | Notes |
|---|---|---|
| entry_id | UUID | Primary key |
| payment_id | UUID | Reference |
| account_id | VARCHAR | Which account (merchant, platform, reserve) |
| type | ENUM | debit, credit |
| amount | BIGINT | Always positive; direction determined by type |
| balance_after | BIGINT | Running balance for reconciliation |
| created_at | TIMESTAMP | Immutable — ledger entries are never updated or deleted |
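The double-entry rule behind this table can be sketched in memory: a group of entries is accepted only if its debits equal its credits, and entries are appended but never modified. The account names, the fee split, and the balance convention (credits minus debits) are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class LedgerEntry:
    payment_id: str
    account_id: str
    type: str    # "debit" or "credit"
    amount: int  # always positive, in the smallest currency unit


class Ledger:
    def __init__(self):
        self._entries = []  # append-only list

    def post(self, entries):
        """Append a group of entries, rejecting any group whose debits and credits differ."""
        entries = list(entries)
        if any(e.type not in ("debit", "credit") or e.amount <= 0 for e in entries):
            raise ValueError("invalid entry")
        debits = sum(e.amount for e in entries if e.type == "debit")
        credits = sum(e.amount for e in entries if e.type == "credit")
        if debits != credits:
            raise ValueError(f"unbalanced: debits={debits}, credits={credits}")
        self._entries.extend(entries)

    def balance(self, account_id):
        """Credits minus debits for one account (a convention assumed for this sketch)."""
        return sum(
            e.amount if e.type == "credit" else -e.amount
            for e in self._entries
            if e.account_id == account_id
        )


ledger = Ledger()
ledger.post([
    LedgerEntry("pay_abc", "customer_funding", "debit", 2999),
    LedgerEntry("pay_abc", "merchant:m_456", "credit", 2912),
    LedgerEntry("pay_abc", "platform_fees", "credit", 87),  # hypothetical fee
])
```

In PostgreSQL the same guarantee would come from writing all entries of a payment in one transaction and never issuing UPDATE or DELETE against the table.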
Idempotency Store (Redis + PostgreSQL): Redis for fast lookup (TTL: 24 hours), PostgreSQL for persistence. Key = idempotency_key, Value = payment_id + response. If a duplicate request arrives, return the stored response without reprocessing.
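The lookup-then-process flow just described can be sketched with an in-memory dictionary standing in for Redis and PostgreSQL. The class and function names, and the `charge` callback, are assumptions made for this example.

```python
import time


class IdempotencyStore:
    """In-memory stand-in for the idempotency store; maps key -> stored response with a TTL."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._records = {}  # idempotency key -> (expires_at, response)

    def get(self, key):
        record = self._records.get(key)
        if record is None:
            return None
        expires_at, response = record
        if self._clock() >= expires_at:
            del self._records[key]  # expired, treat as unseen
            return None
        return response

    def put(self, key, response):
        self._records[key] = (self._clock() + self._ttl, response)


def process_payment(store, idempotency_key, request, charge):
    """Return the stored response for a repeated key; otherwise charge once and store the result."""
    cached = store.get(idempotency_key)
    if cached is not None:
        return cached
    response = charge(request)
    store.put(idempotency_key, response)
    return response
```

This check-then-act sequence is not safe when two requests with the same key arrive concurrently. A real implementation would reserve the key atomically before charging, for instance with the UNIQUE constraint on `idempotency_key` or a Redis `SET` with the `NX` option.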
Specific Back-of-the-Envelope Numbers
Traffic (Stripe-scale):
- ~100M transactions/day = ~1,200 transactions/second average, 5,000/second peak
- Each transaction: 1 write (payment record) + 2 ledger entries (debit + credit) + 1 webhook = 4 DB operations
- Database write rate: ~20,000 writes/second at peak (5,000 transactions/second * 4 operations each)
Storage:
- Payment records: 100M/day * 500 bytes = 50 GB/day = 18 TB/year
- Ledger entries: 200M/day * 200 bytes = 40 GB/day = 14.5 TB/year
- Must retain financial records for 7+ years (regulatory) = roughly 230 TB of historical data at this rate (7 * ~32.5 TB/year)
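The traffic and storage arithmetic above can be reproduced in a few lines. The inputs are the figures stated in this section; decimal units (1 GB = 10^9 bytes, 1 TB = 1,000 GB) are assumed.

```python
SECONDS_PER_DAY = 86_400
TX_PER_DAY = 100_000_000

avg_tps = TX_PER_DAY / SECONDS_PER_DAY           # about 1,157 transactions/second

payments_gb_day = TX_PER_DAY * 500 / 1e9         # 500 bytes per payment record
ledger_gb_day = 2 * TX_PER_DAY * 200 / 1e9       # 2 ledger entries of 200 bytes each

payments_tb_year = payments_gb_day * 365 / 1000
ledger_tb_year = ledger_gb_day * 365 / 1000

# Seven years of retention at a constant daily volume.
seven_year_tb = 7 * (payments_tb_year + ledger_tb_year)
```

Seven years at a constant 100M transactions/day comes to roughly 230 TB before indexes, replicas, and backups, each of which multiplies the raw figure.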
Latency targets:
- Payment authorization: under 2 seconds end-to-end (includes network call to card network)
- Idempotency check: under 5ms (Redis lookup)
- Webhook delivery: within 30 seconds of event
Availability:
- Payment systems target 99.999% availability (5.26 min downtime/year)
- Every second of downtime = lost revenue for merchants
- Requires active-active deployment across multiple regions with automatic failover
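The downtime budget implied by an availability target is a one-line calculation, shown here as a quick check of the 5.26 minutes/year figure. The function name is an assumption for this example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = downtime_minutes_per_year(99.999)  # about 5.26 minutes
four_nines = downtime_minutes_per_year(99.99)   # about 52.6 minutes
```

Each additional nine cuts the budget by a factor of ten, which is why five nines generally rules out any failover step that needs a human in the loop.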