System Design Concepts
60 concepts across 8 categories. Master the fundamentals before tackling interview problems.
Core Concepts
Scalability
Every production system eventually faces growth. If your architecture cannot scale, you will hit a wall: either the system crashes under load, or you...
Availability
Users and businesses depend on systems being available. A payment system that goes down for 1 hour can cost millions of dollars.
Reliability
A system can be available (running) but unreliable (returning wrong results). A payment system that double-charges customers is available but unreliable.
Single Point of Failure (SPOF)
Identifying and eliminating SPOFs is one of the first things an interviewer expects in a system design discussion.
Latency vs Throughput vs Bandwidth
Confusing latency and throughput is a common interview mistake. A system can have high throughput but high latency (batch processing), or low latency but...
Consistent Hashing
Consistent hashing is the backbone of distributed caching (Memcached), distributed databases (DynamoDB, Cassandra), load balancing, and CDNs.
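As a rough illustration of the idea, here is a minimal hash-ring sketch in Python. The class name `HashRing`, the `vnodes` parameter, and the use of MD5 for ring placement are assumptions made for this example, not how any of the systems named above implement it.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 used for spread, not security)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at many positions to even out the key distribution.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken; skip this virtual node
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property this sketch is meant to show: when a node joins, the only keys that change owner are the ones the new node takes over, so most cached data stays where it was.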
CAP Theorem
The CAP theorem is one of the most frequently asked theoretical concepts in system design interviews. It defines a fundamental constraint of distributed systems.
Failover
Without failover, any single component failure can bring down your entire system. Failover is how you achieve high availability in practice: it is the...
Fault Tolerance
In large-scale systems, component failures are not exceptions — they are the norm.
System Design Fundamentals
A comprehensive overview of what system design is, why it matters for every software engineer, and the foundational building blocks that every production...
Networking
OSI Model
The OSI model helps you understand where different technologies operate (TCP at Layer 4, HTTP at Layer 7, load balancers at Layer 4 or 7).
IP Addresses
Understanding IP addressing is essential for designing networked systems. Configuring load balancers, VPCs, subnets, and security groups all require IP...
Domain Name System (DNS)
DNS is the first step of every web request. It affects latency, reliability, and can be used for load balancing (DNS-based routing).
Proxy vs Reverse Proxy
Reverse proxies are used in virtually every production system. They handle TLS termination, load balancing, caching, rate limiting, and DDoS protection.
HTTP and HTTPS
Every web API uses HTTP. Understanding HTTP methods, status codes, headers, and connection management is essential for API design.
TCP vs UDP
Choosing between TCP and UDP affects your system's performance and reliability. Real-time systems (video calls, gaming) often prefer UDP because TCP's retransmissions and ordering guarantees add latency.
Load Balancing
Load balancing is used in virtually every production system. It is one of the first things you add when scaling beyond a single server.
Checksums
Checksums protect data integrity in distributed systems. When transferring files across networks, replicating databases, or storing data on disk, you need...
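A small sketch of what checking integrity can look like, using Python's standard `zlib` and `hashlib` modules. The helper names `crc32_of`, `sha256_of`, and `verify` are made up for this example.

```python
import hashlib
import zlib


def crc32_of(data: bytes) -> int:
    """Fast 32-bit checksum; catches accidental corruption, not tampering."""
    return zlib.crc32(data) & 0xFFFFFFFF


def sha256_of(data: bytes) -> str:
    """Cryptographic digest; suitable when deliberate modification matters."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_sha256: str) -> bool:
    """Return True if the received bytes match the digest sent alongside them."""
    return sha256_of(data) == expected_sha256
```

In this sketch the sender would compute `sha256_of(payload)` and ship it with the data; the receiver calls `verify` and asks for a retransmission or reads another replica when it returns False.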
APIs
What is an API
API design is a core skill for backend engineers and a key topic in system design interviews.
API Gateway
In a microservices architecture, clients should not need to know about individual service addresses.
REST vs GraphQL
Choosing between REST and GraphQL is a common API design decision and interview question.
WebSockets
WebSockets power real-time features: chat applications, live notifications, stock tickers, collaborative editing, and online gaming.
Webhooks
Webhooks enable event-driven integrations without continuous polling. They are used by virtually every SaaS platform (Stripe, GitHub, Slack, Twilio) to...
Idempotency
Network failures are inevitable. Clients will retry requests. Without idempotency, retries can cause catastrophic bugs: a payment system that charges...
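One common way to make retries safe is an idempotency key supplied by the client. The sketch below is a simplified in-memory version; `PaymentService`, its `charge` signature, and the response fields are hypothetical, and a real service would need a durable store with an atomic check-and-set.

```python
class PaymentService:
    """Hypothetical service that deduplicates charges by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> response already returned
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key: str, customer: str, amount_cents: int) -> dict:
        # A retry with the same key gets the stored response; nothing is re-executed.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        self.charges.append((customer, amount_cents))
        response = {
            "status": "charged",
            "customer": customer,
            "amount_cents": amount_cents,
            "charge_id": len(self.charges),
        }
        self._results[idempotency_key] = response
        return response
```

With this shape, a client that times out and resends the same request with the same key sees the original response, and the customer is charged once.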
Rate Limiting
Without rate limiting, a single client can overwhelm your service (intentionally via a denial-of-service attack or unintentionally via a bug).
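Token bucket is one widely used rate-limiting algorithm. The following is a minimal single-process sketch; the class name `TokenBucket` and the injectable `clock` parameter are choices made for this example, and a production limiter would typically keep per-client state in a shared store.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so short bursts are allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Add the tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would usually respond with HTTP 429
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained request rate.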
API Design Best Practices
APIs are contracts — once published, they are hard to change without breaking clients. Good design from the start saves years of technical debt.
Databases
ACID Transactions
Understanding ACID is essential for choosing between SQL and NoSQL databases. Financial systems require ACID. Social media feeds may not.
SQL vs NoSQL
Choosing the right database is one of the most impactful decisions in system design. The wrong choice leads to painful migrations.
Database Indexes
Indexes are the single most impactful performance optimization for databases. A query that takes 30 seconds without an index can take 1 millisecond with...
Database Sharding
When a single database server cannot handle the data volume or query load, sharding is the solution.
Data Replication
Nearly every production database uses replication. Without it, a single server failure means data loss and downtime.
Database Scaling
The database is almost always the first bottleneck in a growing system. Knowing the scaling playbook (and the order in which to apply techniques) is...
Database Types
Choosing the right database for each component of your system is a core design skill.
Bloom Filters
Bloom filters save expensive disk/network lookups. Before querying a database or cache, check the Bloom filter.
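To make the "check the filter first" step concrete, here is a small Bloom filter sketch. The class name, the default sizes, and the double-hashing scheme derived from SHA-256 are assumptions for illustration only.

```python
import hashlib


class BloomFilter:
    """Hypothetical Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A caller would skip the database lookup whenever `might_contain` returns False, and fall through to the real lookup when it returns True.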
Database Architectures
Understanding database architectures helps you design systems that meet availability, consistency, and performance requirements.
NoSQL Data Modeling
How to model data in NoSQL databases using denormalization, access-pattern-driven design, and practical patterns for document, wide-column, and key-value...
Caching
Caching 101
Caching reduces latency (memory access: ~100ns vs disk: ~10ms), reduces database load (cache absorbs 80-95% of reads), and reduces costs (fewer database...
Caching Strategies
Choosing the wrong caching strategy leads to stale data, cache misses, or unnecessary database load.
Cache Eviction Policies
The eviction policy directly affects cache hit rate. A poor policy evicts useful data, causing more cache misses and higher latency.
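Least recently used (LRU) is one of the most common eviction policies. A minimal sketch using Python's `collections.OrderedDict` follows; the `LRUCache` class and its method names are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default  # cache miss
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
```

Whether LRU is the right policy depends on the access pattern; a scan over many cold keys, for instance, can push out hot entries.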
Distributed Caching
A single Redis server can only hold as much data as its RAM allows. Distributed caching (Redis Cluster, Memcached with consistent hashing) scales...
Content Delivery Network (CDN)
CDNs are essential for any user-facing application. They reduce latency, reduce origin server load, protect against DDoS attacks, and handle traffic...
Cache Warming
Pre-populating caches before traffic hits to avoid cold-start latency spikes, covering warming strategies, real-world implementations, and when warming is...
Async Communication
Publish-Subscribe Pattern
Pub/Sub is the foundation of event-driven architectures. It enables microservices to communicate asynchronously, decouples producers from consumers, and...
Message Queues
Message queues decouple producers from consumers, handle traffic spikes by buffering, enable retry logic, and improve system reliability.
Change Data Capture (CDC)
CDC enables real-time data synchronization between databases, caches, search indexes, and analytics systems without tight coupling.
Distributed Systems
Heartbeats in Distributed Systems
Failure detection is the foundation of fault tolerance. Without heartbeats, you cannot know when a server has crashed, and failover cannot begin.
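A timeout-based detector is the simplest form of heartbeat monitoring. The sketch below assumes a hypothetical `HeartbeatMonitor` class with a fixed timeout; real systems often use adaptive timeouts to cope with variable network delay.

```python
import time


class HeartbeatMonitor:
    """Hypothetical detector: a node is suspected once it is silent past the timeout."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = {}  # node name -> time of last heartbeat

    def beat(self, node: str) -> None:
        """Record a heartbeat received from `node`."""
        self._last_seen[node] = self._clock()

    def suspected(self) -> list:
        """Return nodes whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(n for n, seen in self._last_seen.items() if now - seen > self.timeout_s)
```

A failover controller could poll `suspected()` and start promoting a replica for any node that appears in the result.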
Service Discovery
In microservices architectures with dynamic scaling (containers, Kubernetes), services come and go constantly.
Consensus Algorithms
Without consensus, distributed systems cannot reliably replicate data, elect leaders, or coordinate actions.
Distributed Locking
Without distributed locks, concurrent processes can cause data corruption, double-spending, overselling inventory, or duplicate processing.
Gossip Protocol
Gossip protocols enable decentralized failure detection, membership management, and data dissemination without a central coordinator.
Circuit Breaker Pattern
Without circuit breakers, a failing downstream service can cascade failures throughout your system.
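The pattern can be sketched as a small wrapper that fails fast after repeated errors. The names `CircuitBreaker` and `CircuitOpenError`, the threshold of three failures, and the 30-second reset timeout are all assumptions for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open after N failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

While the circuit is open, callers get an immediate error instead of waiting on a struggling dependency, which gives the downstream service room to recover.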
Disaster Recovery
Disasters happen: AWS us-east-1 has had multi-hour outages, entire data centers have lost power, and ransomware attacks have encrypted production...
Distributed Tracing
In a microservices system, a single user request may pass through 10+ services. When something is slow or fails, you need to see the entire chain to find...
Leader Election
How distributed systems elect a single leader to coordinate work, covering Raft, Bully, and Ring algorithms, along with real-world implementations in...
Architecture Patterns
Client-Server Architecture
Client-server is the most fundamental architectural pattern. Understanding it is the starting point for all system design discussions.
Microservices Architecture
Microservices enable large engineering teams to work independently, deploy frequently, scale individual components, and use the best technology for each...
Serverless Architecture
Serverless eliminates operational overhead for many use cases. There are no servers to patch, no capacity to plan, and no idle resources to pay for.
Event-Driven Architecture
EDA enables loose coupling, scalability, and real-time processing. It is the architecture behind real-time analytics, IoT systems, and modern...
Peer-to-Peer Architecture
P2P eliminates the need for central servers, making systems more resilient and cost-effective for certain use cases.
Monolith vs Microservices
When to use a monolithic architecture versus microservices, the real tradeoffs involved, and practical migration strategies used by companies like...