System Design Concepts

110 concepts across 8 categories. Master the fundamentals before tackling interview problems.

Core Concepts

Beginner · Core Concepts

Scalability

Every production system eventually faces growth. If your architecture cannot scale, you will hit a wall: either the system crashes under load, or you...

Beginner · Core Concepts

Availability

Users and businesses depend on systems being available. A payment system that goes down for 1 hour can cost millions of dollars.
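
To put a number on downtime, the sketch below converts an availability target into a yearly downtime budget. The function name `downtime_hours_per_year` and the 365-day year are assumptions chosen for illustration, not part of the original text.

```python
def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime allowed per 365-day year at the given availability."""
    hours_per_year = 365 * 24  # 8760 hours
    return hours_per_year * (1 - availability_percent / 100)


# Print the budget for a few common availability targets.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% available -> {downtime_hours_per_year(target):.2f} hours down per year")
```

At 99.9% the budget is roughly 8.76 hours per year, so a single one-hour outage consumes a noticeable share of it.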

Beginner · Core Concepts

Reliability

A system can be available (running) but unreliable (returning wrong results). A payment system that double-charges customers is available but unreliable.

Beginner · Core Concepts

Single Point of Failure (SPOF)

Identifying and eliminating SPOFs is one of the first things an interviewer expects in a system design discussion.

Beginner · Core Concepts

Latency vs Throughput vs Bandwidth

Confusing latency and throughput is a common interview mistake. A system can have high throughput but high latency (batch processing), or low latency but...

Intermediate · Core Concepts

Consistent Hashing

Consistent hashing is the backbone of distributed caching (Memcached), distributed databases (DynamoDB, Cassandra), load balancing, and CDNs.
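
A minimal sketch of a consistent hash ring with virtual nodes may help make the idea concrete. The class name `HashRing`, the node names, the choice of SHA-256, and the default of 100 virtual nodes per server are all illustrative assumptions, not a description of how any particular product implements it.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a position on the ring (SHA-256 is an arbitrary choice here)."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed at many positions to spread load more evenly.
        for i in range(self.vnodes):
            point = _position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # Walk clockwise to the first node position after the key, wrapping around.
        index = bisect.bisect_right(self._points, _position(key)) % len(self._points)
        return self._owner[self._points[index]]
```

The property worth noticing is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes, which is what keeps cache misses bounded during a resize.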

Intermediate · Core Concepts

CAP Theorem

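The CAP theorem is one of the most frequently asked theoretical concepts in system design interviews. It describes a fundamental constraint of distributed systems: during a network partition, a system must trade consistency against availability.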

Beginner · Core Concepts

Failover

Without failover, any single component failure can bring down your entire system. Failover is how you achieve high availability in practice; it is the...

Intermediate · Core Concepts

Fault Tolerance

In large-scale systems, component failures are not exceptions — they are the norm.

Beginner · Core Concepts

System Design Fundamentals

A comprehensive overview of what system design is, why it matters for every software engineer, and the foundational building blocks that every production...

Intermediate · Core Concepts

Capacity Planning

Capacity planning estimates future resource needs (CPU, memory, storage, bandwidth) based on traffic projections, ensuring the system can handle growth.

Intermediate · Core Concepts

Autoscaling

Autoscaling automatically adjusts compute resources based on real-time demand, adding servers during traffic spikes and removing them during lulls to...

Networking

Beginner · Networking

OSI Model

The OSI model helps you understand where different technologies operate (TCP at Layer 4, HTTP at Layer 7, load balancers at Layer 4 or 7).

Beginner · Networking

IP Addresses

Understanding IP addressing is essential for designing networked systems: configuring load balancers, VPCs, subnets, and security groups all require IP...

Beginner · Networking

Domain Name System (DNS)

DNS is the first step of every web request. It affects latency, reliability, and can be used for load balancing (DNS-based routing).

Beginner · Networking

Proxy vs Reverse Proxy

Reverse proxies are used in virtually every production system. They handle TLS termination, load balancing, caching, rate limiting, and DDoS protection.

Beginner · Networking

HTTP and HTTPS

Every web API uses HTTP. Understanding HTTP methods, status codes, headers, and connection management is essential for API design.

Beginner · Networking

TCP vs UDP

Choosing between TCP and UDP affects your system's performance and reliability. Real-time systems (video calls, gaming) often cannot afford the latency added by TCP's retransmissions and in-order delivery.

Intermediate · Networking

Load Balancing

Load balancing is used in virtually every production system. It is one of the first things you add when scaling beyond a single server.

Beginner · Networking

Checksums

Checksums protect data integrity in distributed systems. When transferring files across networks, replicating databases, or storing data on disk, you need...
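
As a small illustration of the idea, the sketch below appends a CRC-32 checksum to a payload and verifies it on receipt. The function names `add_checksum` and `verify_checksum` and the 4-byte trailer layout are assumptions made for this example; `zlib.crc32` is from the Python standard library.

```python
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 as a 4-byte big-endian trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_checksum(frame: bytes) -> bytes:
    """Return the payload if the trailer matches, otherwise raise ValueError."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a checksum")
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: data was corrupted")
    return payload
```

CRC-32 catches accidental corruption such as flipped bits; it is not designed to resist deliberate tampering, for which a cryptographic hash or MAC would be the usual choice.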

Intermediate · Networking

Data Compression

Data compression reduces payload sizes for faster network transfer and lower storage costs.

Intermediate · Networking

Serialization

Serialization converts in-memory objects to a transferable format (JSON, Protobuf, Avro, MessagePack).

Intermediate · Networking

Encryption

Encryption protects data confidentiality at rest (AES-256), in transit (TLS), and end-to-end (E2E).

APIs

What is an API

API design is a core skill for backend engineers and a key topic in system design interviews.

Intermediate · APIs

API Gateway

In a microservices architecture, clients should not need to know about individual service addresses.

Intermediate · APIs

REST vs GraphQL

Choosing between REST and GraphQL is a common API design decision and interview question.

Intermediate · APIs

WebSockets

WebSockets power real-time features: chat applications, live notifications, stock tickers, collaborative editing, and online gaming.

Webhooks

Webhooks enable event-driven integrations without continuous polling. They are used by virtually every SaaS platform (Stripe, GitHub, Slack, Twilio) to...

Intermediate · APIs

Idempotency

Network failures are inevitable. Clients will retry requests. Without idempotency, retries can cause catastrophic bugs: a payment system that charges...
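
One common way to make retries safe is an idempotency key supplied by the client. The sketch below is a simplified in-memory illustration; `PaymentService`, its `charge` method, and the response shape are hypothetical names, and a real service would persist the keys in durable storage.

```python
class PaymentService:
    def __init__(self):
        self._responses = {}  # idempotency key -> response already returned
        self.charges = []     # side effects actually performed

    def charge(self, idempotency_key: str, customer: str, amount_cents: int) -> dict:
        # A retried request carries the same key, so return the stored response
        # instead of charging the customer a second time.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges.append((customer, amount_cents))
        response = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response
```

With this in place, a client that times out and resends the same request with the same key gets the original result back, and only one charge is recorded.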

Intermediate · APIs

Rate Limiting

Without rate limiting, a single client can overwhelm your service (intentionally via a denial-of-service attack or unintentionally via a bug).
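
A token bucket is one widely used way to enforce a per-client limit. The following is a minimal single-process sketch; the class name `TokenBucket`, its parameters, and the injectable `clock` (added so the behavior can be tested without sleeping) are assumptions made for this example.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits in the budget."""
        now = self._clock()
        # Refill according to elapsed time, never exceeding the capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A service would typically keep one bucket per client identifier and reject requests (for example with HTTP 429) when `allow()` returns False.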

Intermediate · APIs

API Design Best Practices

APIs are contracts — once published, they are hard to change without breaking clients. Good design from the start saves years of technical debt.

Intermediate · APIs

gRPC

gRPC is a high-performance RPC framework by Google using Protocol Buffers and HTTP/2 for efficient, typed service-to-service communication in microservice...

Authentication

Authentication verifies who a user is. In distributed systems, stateless token-based auth (JWT, OAuth 2.0) replaces server-side sessions for horizontal...

Intermediate · APIs

Authorization

Authorization controls what authenticated users can do. RBAC, ABAC, and policy engines enforce permissions across distributed systems without hardcoding...

WebRTC

WebRTC enables peer-to-peer real-time audio, video, and data communication directly between browsers without plugins or intermediate servers for media.

Intermediate · APIs

RBAC

Role-Based Access Control assigns permissions to roles, not individual users. Users inherit permissions through role membership, simplifying access...
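
The role-to-permission indirection can be sketched in a few lines. The role names, permission strings, users, and the `is_allowed` helper below are all hypothetical and exist only to illustrate the lookup.

```python
# Permissions are attached to roles, never directly to users.
ROLE_PERMISSIONS = {
    "viewer": {"report:read"},
    "editor": {"report:read", "report:write"},
    "admin": {"report:read", "report:write", "user:manage"},
}

# Users only hold role memberships.
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"viewer"},
    "carol": {"viewer", "editor"},
}


def permissions_for(user: str) -> set:
    """Union of the permissions granted by every role the user holds."""
    granted = set()
    for role in USER_ROLES.get(user, ()):
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def is_allowed(user: str, permission: str) -> bool:
    return permission in permissions_for(user)
```

Changing what editors may do means editing one entry in `ROLE_PERMISSIONS` rather than touching every user record.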

Intermediate · APIs

SSO

Single Sign-On lets users authenticate once and access multiple applications without re-entering credentials, using protocols like SAML 2.0 and OpenID.

Databases

Intermediate · Databases

ACID Transactions

Understanding ACID is essential for choosing between SQL and NoSQL databases. Financial systems require ACID. Social media feeds may not.

Intermediate · Databases

SQL vs NoSQL

Choosing the right database is one of the most impactful decisions in system design. The wrong choice leads to painful migrations.

Intermediate · Databases

Database Indexes

Indexes are often the single most impactful performance optimization for databases. A query that takes 30 seconds without an index can take 1 millisecond with...

Advanced · Databases

Database Sharding

When a single database server cannot handle the data volume or query load, sharding is the solution.

Intermediate · Databases

Data Replication

Nearly every production database uses replication. Without it, a single server failure means downtime and possible data loss.

Intermediate · Databases

Database Scaling

The database is almost always the first bottleneck in a growing system. Knowing the scaling playbook, and the order in which to apply techniques, is...

Intermediate · Databases

Database Types

Choosing the right database for each component of your system is a core design skill.

Advanced · Databases

Bloom Filters

Bloom filters save expensive disk/network lookups. Before querying a database or cache, check the Bloom filter.
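
A toy Bloom filter shows why the pre-check is cheap and safe. The class name `BloomFilter`, the default sizes, and the way hash positions are derived from SHA-256 are assumptions for illustration; production filters size the bit array from the expected item count and target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

Only when `might_contain` returns True does the caller need to pay for the real database or cache lookup; a False answer is never wrong, while a True answer can occasionally be a false positive.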

Advanced · Databases

Database Architectures

Understanding database architectures helps you design systems that meet availability, consistency, and performance requirements.

Intermediate · Databases

NoSQL Data Modeling

How to model data in NoSQL databases using denormalization, access-pattern-driven design, and practical patterns for document, wide-column, and key-value.

Intermediate · Databases

BASE Properties

BASE (Basically Available, Soft state, Eventually consistent) is an alternative to ACID that relaxes consistency guarantees in favor of availability and...

Intermediate · Databases

Full-Text Search

Full-text search enables fast, relevance-ranked querying of unstructured text data using inverted indexes, tokenization, and scoring algorithms like...

Intermediate · Databases

Materialized Views

Materialized views are precomputed query results stored as physical tables, trading storage space and write overhead for dramatically faster read...

Intermediate · Databases

Query Optimization

Query optimization is the process of analyzing and restructuring database queries, indexes, and execution plans to minimize response time and resource...

Intermediate · Databases

Connection Pooling

Connection pooling reuses a pool of pre-established database connections instead of creating new ones per request, dramatically reducing latency and...

Advanced · Databases

LSM Trees

LSM Trees (Log-Structured Merge Trees) are write-optimized data structures that buffer writes in memory and flush sorted runs to disk, powering databases...

Advanced · Databases

B-Trees

B-Trees are self-balancing tree data structures that maintain sorted data in pages optimized for disk I/O, forming the backbone of indexes in PostgreSQL, ...

Advanced · Databases

HyperLogLog

HyperLogLog is a probabilistic data structure that estimates the cardinality (count of distinct elements) of massive datasets using only a few kilobytes.

Intermediate · Databases

Time Series Databases

Time series databases are purpose-built for storing, querying, and analyzing time-stamped data, with optimizations for high-ingestion rates, time-range...

Intermediate · Databases

Vector Databases

Vector databases store and query high-dimensional vector embeddings using approximate nearest neighbor (ANN) search, enabling semantic similarity search.

Intermediate · Databases

ETL Pipelines

ETL (Extract, Transform, Load) pipelines extract data from source systems, transform it into a consistent format, and load it into a target data store.

Intermediate · Databases

Data Pipelines

Data pipelines are automated systems that move and transform data from sources to destinations, encompassing both batch (ETL/ELT) and streaming (real-time...

Intermediate · Databases

Data Lakes

Data lakes are centralized storage repositories that hold vast amounts of raw data in its native format (structured, semi-structured, and unstructured)...

Intermediate · Databases

Data Warehouses

Data warehouses are centralized, schema-on-write analytical databases optimized for complex queries across large volumes of structured, historical data, ...

Caching

Beginner · Caching

Caching 101

Caching reduces latency (memory access: ~100ns vs disk: ~10ms), reduces database load (cache absorbs 80-95% of reads), and reduces costs (fewer database...

Intermediate · Caching

Caching Strategies

Choosing the wrong caching strategy leads to stale data, cache misses, or unnecessary database load.

Intermediate · Caching

Cache Eviction Policies

The eviction policy directly affects cache hit rate. A poor policy evicts useful data, causing more cache misses and higher latency.
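
Least Recently Used (LRU) is one of the most common eviction policies, and a minimal sketch fits in a few lines using `collections.OrderedDict`. The class name `LRUCache` and its `get`/`put` interface are assumptions for this example.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read marks the key as recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used key
```

Whether LRU is a good fit depends on the access pattern: a large one-off scan can push hot keys out, which is one reason other policies such as LFU exist.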

Intermediate · Caching

Distributed Caching

A single Redis server can only hold as much data as its RAM allows. Distributed caching (Redis Cluster, Memcached with consistent hashing) scales...

Beginner · Caching

Content Delivery Network (CDN)

CDNs are essential for any user-facing application. They reduce latency, reduce origin server load, protect against DDoS attacks, and handle traffic.

Intermediate · Caching

Cache Warming

Pre-populating caches before traffic hits to avoid cold-start latency spikes, covering warming strategies, real-world implementations, and when warming is...

Intermediate · Caching

Cache Stampede

A cache stampede (thundering herd) occurs when many requests simultaneously miss the cache and hit the database, causing a load spike that can bring down...

Async Communication

Intermediate · Async Communication

Publish-Subscribe Pattern

Pub/Sub is the foundation of event-driven architectures. It enables microservices to communicate asynchronously, decouples producers from consumers, and...

Intermediate · Async Communication

Message Queues

Message queues decouple producers from consumers, handle traffic spikes by buffering, enable retry logic, and improve system reliability.

Advanced · Async Communication

Change Data Capture (CDC)

CDC enables real-time data synchronization between databases, caches, search indexes, and analytics systems without tight coupling.

Intermediate · Async Communication

Backpressure

Backpressure is a flow control mechanism where a slow consumer signals upstream producers to slow down, preventing memory exhaustion and cascading...

Distributed Systems

Intermediate · Distributed Systems

Heartbeats in Distributed Systems

Failure detection is the foundation of fault tolerance. Without heartbeats, you cannot know when a server has crashed, and failover cannot begin.
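
A timeout-based failure detector is the simplest form of this idea. In the sketch below, `HeartbeatMonitor`, its method names, and the injectable `clock` are illustrative assumptions; real systems also have to account for network delay and clock behavior when choosing the timeout.

```python
import time


class HeartbeatMonitor:
    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # node id -> time of the most recent heartbeat

    def record(self, node_id: str) -> None:
        """Called whenever a heartbeat arrives from a node."""
        self._last_seen[node_id] = self._clock()

    def suspected_down(self) -> list:
        """Nodes whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self.timeout
        )
```

A node in the `suspected_down()` list has only missed heartbeats; it may be slow or partitioned rather than crashed, which is why failover logic usually treats the result as a suspicion.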

Intermediate · Distributed Systems

Service Discovery

In microservices architectures with dynamic scaling (containers, Kubernetes), services come and go constantly.

Advanced · Distributed Systems

Consensus Algorithms

Without consensus, distributed systems cannot reliably replicate data, elect leaders, or coordinate actions.

Advanced · Distributed Systems

Distributed Locking

Without distributed locks, concurrent processes can cause data corruption, double-spending, overselling inventory, or duplicate processing.

Advanced · Distributed Systems

Gossip Protocol

Gossip protocols enable decentralized failure detection, membership management, and data dissemination without a central coordinator.

Intermediate · Distributed Systems

Circuit Breaker Pattern

Without circuit breakers, a failing downstream service can cascade failures throughout your system.
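
The sketch below shows the usual closed, open, and half-open behavior in a deliberately simplified, single-threaded form. `CircuitBreaker`, `CircuitOpenError`, the default thresholds, and the injectable `clock` are assumptions made for this example rather than any library's API.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of tying up threads and connections waiting on a dependency that is already struggling.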

Intermediate · Distributed Systems

Disaster Recovery

Disasters happen: AWS us-east-1 has had multi-hour outages, entire data centers have lost power, and ransomware attacks have encrypted production...

Intermediate · Distributed Systems

Bulkhead Pattern

Learn the Bulkhead Pattern for isolating failures in distributed systems: prevent cascading outages by partitioning resources into independent...

Intermediate · Distributed Systems

Distributed Tracing

In a microservices system, a single user request may pass through 10+ services. When something is slow or fails, you need to see the entire chain to find...

Advanced · Distributed Systems

Leader Election

How distributed systems elect a single leader to coordinate work, covering Raft, Bully, and Ring algorithms, along with real-world implementations in...

Intermediate · Distributed Systems

Retry Patterns

Learn Retry Patterns, including exponential backoff with jitter: handle transient failures gracefully in distributed systems without overwhelming...
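
Exponential backoff with jitter can be sketched as a small helper. This version uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential cap; the function name `retry_with_backoff`, its defaults, and the injectable `sleep` and `rng` parameters are assumptions chosen for illustration and testability.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call func until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the cap so clients spread out.
            sleep(rng() * cap)
```

In practice only errors known to be transient should be retried, and the operation being retried should be idempotent; this sketch catches every `Exception` purely to stay short.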

Intermediate · Distributed Systems

Timeout Patterns

Learn Timeout Patterns for distributed systems: configure connect, read, and write timeouts to prevent hung requests from consuming resources and...

Advanced · Distributed Systems

Load Shedding

Learn Load Shedding in distributed systems: intentionally dropping excess requests to protect system stability and maintain quality of service for...

Intermediate · Distributed Systems

Observability

Learn Observability in distributed systems: understand how logs, metrics, and traces work together to provide deep insight into system behavior and...

Beginner · Distributed Systems

Logging

Learn structured logging and log levels for distributed systems: capture meaningful context, correlate events across services, and build queryable...

Intermediate · Distributed Systems

Metrics

Learn about metrics in distributed systems: counters, gauges, and histograms that enable real-time dashboards, alerting, and capacity planning for...

Intermediate · Distributed Systems

Correlation IDs

Learn Correlation IDs for request tracing across distributed services — attach unique identifiers to requests so logs, metrics, and traces can be linked.

Beginner · Distributed Systems

Monitoring

Learn Monitoring for distributed systems: build dashboards, set SLOs, configure alerts, and establish processes to detect, diagnose, and respond to...

Intermediate · Distributed Systems

Alerting

Learn Alerting for distributed systems: design on-call rotations, configure escalation policies, set meaningful thresholds, and reduce alert fatigue with...

Advanced · Distributed Systems

Service Mesh

Learn Service Mesh architecture with Istio, Linkerd, and sidecar proxies — handle service-to-service communication, security, observability, and traffic.

Intermediate · Distributed Systems

Sidecar Pattern

Learn the Sidecar Pattern for distributed systems: deploy companion containers alongside application services to handle cross-cutting concerns like...

Advanced · Distributed Systems

Merkle Trees

Learn Merkle Trees: hash-based tree structures that enable efficient data verification, tamper detection, and synchronization in distributed systems and...

Advanced · Distributed Systems

MapReduce

Learn MapReduce — the parallel data processing paradigm that splits computation across distributed nodes using map and reduce phases, pioneered by Google.

Intermediate · Distributed Systems

Secrets Management

Learn Secrets Management for distributed systems: securely store, distribute, and rotate credentials, API keys, and certificates using tools like...

Advanced · Distributed Systems

Erasure Coding

Learn Erasure Coding for distributed storage: achieve fault tolerance with less storage overhead than replication by encoding data into fragments that...

Architecture Patterns

Advanced · Architecture Patterns

CQRS

CQRS separates read and write models so each can be optimized independently — write to a normalized database, read from a denormalized projection.

Beginner · Architecture Patterns

Client-Server Architecture

Client-server is the most fundamental architectural pattern. Understanding it is the starting point for all system design discussions.

Advanced · Architecture Patterns

Event Sourcing

Event Sourcing stores every state change as an immutable event. The current state is derived by replaying events, providing a complete audit trail and...
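
Deriving state by replaying events can be illustrated with a tiny bank-account example. The `Event` type, the event kinds, and the `replay` helper are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)  # frozen: events are immutable once recorded
class Event:
    kind: str    # "deposited" or "withdrawn"
    amount: int  # in cents


def apply_event(balance: int, event: Event) -> int:
    """Pure function: previous state plus one event gives the next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial: int = 0) -> int:
    """Fold the event log, oldest first, to rebuild the current state."""
    return reduce(apply_event, events, initial)


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
```

Because the log is never rewritten, replaying a prefix such as `log[:2]` reconstructs the balance as it stood at that point in history.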

Intermediate · Architecture Patterns

BFF Pattern

Backend for Frontend (BFF) creates dedicated backend services for each frontend type (web, mobile, TV), tailoring API responses to each client's specific...

Intermediate · Architecture Patterns

Microservices Architecture

Microservices enable large engineering teams to work independently, deploy frequently, scale individual components, and use the best technology for each...

Intermediate · Architecture Patterns

Serverless Architecture

Serverless eliminates operational overhead for many use cases. There are no servers to patch, no capacity to plan, and no idle resources to pay for.

Intermediate · Architecture Patterns

Strangler Fig Pattern

The Strangler Fig Pattern incrementally migrates a legacy monolith to microservices by routing traffic to new services one feature at a time, avoiding...

Intermediate · Architecture Patterns

Blue-Green Deployment

Blue-green deployment maintains two identical production environments. Traffic switches from blue (current) to green (new) instantly, enabling...

Intermediate · Architecture Patterns

Event-Driven Architecture

EDA enables loose coupling, scalability, and real-time processing. It is the architecture behind real-time analytics, IoT systems, and modern...

Intermediate · Architecture Patterns

Canary Release

Canary release gradually rolls out a new version to a small percentage of users first, monitoring for issues before expanding to 100%, reducing the blast...
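
One simple way to pick the "small percentage of users" is to hash each user ID into a stable bucket from 0 to 99. The sketch below assumes this hashing approach; the function names `bucket_for` and `route` and the version labels are hypothetical.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send users whose bucket falls below the rollout percentage to the canary."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"
```

Because the bucket depends only on the user ID, a given user sees the same version on every request, and raising `canary_percent` only adds users to the canary group without moving anyone back.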

Intermediate · Architecture Patterns

Peer-to-Peer Architecture

P2P eliminates the need for central servers, making systems more resilient and cost-effective for certain use cases.

Beginner · Architecture Patterns

Feature Flags

Feature flags toggle functionality on or off at runtime without code deployment. They enable trunk-based development, A/B testing, gradual rollouts, and...

Intermediate · Architecture Patterns

Monolith vs Microservices

When to use a monolithic architecture versus microservices, the real tradeoffs involved, and practical migration strategies used by companies like...