Difficulty: hard | 10 min read | Updated 2026-06-08

Design Dropbox

Design Dropbox with file sync, chunking, deduplication, conflict resolution, and delta sync. Covers the sync protocol and metadata management.

Problem Statement

Design a cloud file storage and synchronization service like Dropbox. Users store files in the cloud, sync them across multiple devices in near real-time, share files/folders with others, and access file version history. The system must handle large files efficiently using chunking, minimize bandwidth with delta sync, and resolve conflicts when the same file is edited on two devices offline.

Requirements

Functional

Upload/download files; sync file changes across all connected devices in near real-time
Chunk large files (4 MB chunks) for resumable uploads and efficient delta sync
Content-based deduplication: identical file chunks stored only once across all users
Conflict resolution: when the same file is edited on two offline devices, create a conflict copy


Non-Functional

Sync latency: Changes propagated to online devices within 5 seconds
Storage efficiency: Deduplication reduces storage by 50%+ across all users
Scale: 700M users, 500B files, 1.2 exabytes of data
Reliability: No data loss -- files replicated across 3+ data centers

Core Architecture

Chunking Engine (Client-side) -- Splits files into chunks averaging about 4 MB using content-defined chunking (Rabin fingerprinting), so chunk boundaries follow the content rather than fixed offsets. Inserting 1 byte at the start of a 1 GB file therefore changes only 1-2 chunks, not all of them. Each chunk is hashed (SHA-256). Before uploading, the client sends the chunk hashes to the server, and only the chunks the server does not already have are uploaded (delta sync).
Metadata Service -- Stores the file tree: files, folders, versions, and the mapping from files to ordered lists of chunk hashes. Uses PostgreSQL sharded by user_id. Each file edit creates a new version entry pointing to the new set of chunk hashes. Provides the "diff" API: given a client's known version, return all changes since.


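Block Storage Service -- Stores raw chunk data in S3/GCS, keyed by chunk hash (SHA-256). Content-addressable storage means identical chunks are naturally deduplicated -- if 1M users have the same PDF, it is stored once. Chunks are encrypted at rest (AES-256) with keys managed by a KMS. Cross-user deduplication only works if identical chunks produce identical stored objects, so strictly per-user keys would defeat it; the usual options are service-managed keys or convergent encryption (a key derived from the chunk content).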
Sync Service -- Maintains a long-polling or WebSocket connection per online client. When one device uploads changes, the metadata service publishes an event. The sync service notifies all other devices of the same user to pull the updated file metadata and download any new chunks.
Conflict Resolver -- If two devices edit the same file while offline, both upload their changes with the same parent version. The server accepts the first upload as the primary version, detects that the second names a parent that is no longer the latest, and saves it as "filename (conflicted copy - Device - Date)". The user then resolves the conflict manually.
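The parent-version check can be sketched in a few lines. This is a minimal in-memory illustration, not Dropbox's implementation: `FileHistory`, `commit`, and the device and date values are hypothetical names chosen for the example, and the conflict-copy name follows the pattern quoted above.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    """Hypothetical per-file version list kept by the metadata service."""
    path: str
    versions: list = field(default_factory=list)  # versions[n - 1] holds version n's chunk hashes

    @property
    def head(self) -> int:
        """Latest committed version number (0 means the file does not exist yet)."""
        return len(self.versions)


def commit(history, parent_version, chunk_hashes, device, date):
    """Apply an upload and return the path it was saved under."""
    if parent_version == history.head:
        # The client edited the latest version: accept it as the new head.
        history.versions.append(list(chunk_hashes))
        return history.path
    # Stale parent: another device synced first, so keep this upload as a separate copy.
    stem, ext = posixpath.splitext(history.path)
    return f"{stem} (conflicted copy - {device} - {date}){ext}"


report = FileHistory("/docs/report.pdf")
commit(report, 0, ["h1"], "Laptop", "2026-06-08")           # initial upload becomes version 1
first = commit(report, 1, ["h2"], "Laptop", "2026-06-09")   # edits version 1 and syncs first
second = commit(report, 1, ["h3"], "Phone", "2026-06-09")   # also edited version 1 while offline
print(first)
print(second)
```

The second upload names version 1 as its parent after the head has already moved to version 2, so it is stored next to the original rather than overwriting it.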

Database Choice


PostgreSQL (sharded by user_id): file metadata, folder structure, sharing permissions, and version history. Sharding by user_id keeps all of a user's files on the same shard for fast tree queries.
S3/GCS: chunk blob storage, content-addressed by SHA-256 hash.
Redis: online presence (which devices are connected) and change-notification pub/sub.
Kafka: change events between the metadata service and the sync service.
Cassandra: the per-namespace sync journal described under Specific Data Model.

Key API Endpoints


POST /api/v1/files/upload_session
  -> Body: { path: "/docs/report.pdf", chunk_hashes: ["abc123...", "def456..."] }
  -> Returns: { upload_id: "UP-789", chunks_needed: ["def456..."] } (only missing chunks)

PUT /api/v1/files/upload_session/{upload_id}/chunk
  -> Body: <binary chunk data>
  -> Headers: X-Chunk-Hash: def456...

GET /api/v1/files/changes?cursor={version_id}
  -> Returns: { changes: [{ path: "/docs/report.pdf", action: "modified", version: 42, chunk_hashes: [...] }], new_cursor: "..." }
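One way the server side of the upload-session handshake could behave is sketched below. It assumes an in-memory dictionary in place of S3/GCS; `ChunkStore`, `start_upload_session`, and the generated `upload_id` format are hypothetical names for illustration only.

```python
import hashlib
import uuid


class ChunkStore:
    """In-memory stand-in for the content-addressed block store."""

    def __init__(self):
        self._blobs = {}  # sha256 hex digest -> chunk bytes

    def missing(self, hashes):
        """Return the hashes not stored yet, in request order, without repeats."""
        return [h for h in dict.fromkeys(hashes) if h not in self._blobs]

    def put(self, claimed_hash, data):
        """Store a chunk after checking it against the X-Chunk-Hash value."""
        if hashlib.sha256(data).hexdigest() != claimed_hash:
            raise ValueError("chunk does not match X-Chunk-Hash")
        self._blobs.setdefault(claimed_hash, data)


def start_upload_session(store, path, chunk_hashes):
    """Handle POST /files/upload_session: tell the client which chunks to send."""
    return {"upload_id": f"UP-{uuid.uuid4().hex[:8]}",
            "chunks_needed": store.missing(chunk_hashes)}


store = ChunkStore()
intro, body = b"shared intro", b"new body text"
h_intro, h_body = (hashlib.sha256(c).hexdigest() for c in (intro, body))
store.put(h_intro, intro)  # an earlier upload already stored this chunk

session = start_upload_session(store, "/docs/report.pdf", [h_intro, h_body])
for h in session["chunks_needed"]:  # the client uploads only the body chunk
    store.put(h, body)
```

Verifying the hash on every PUT matters in a content-addressed store: a chunk saved under the wrong key would corrupt every file that references that hash.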

Scaling Insight

Content-defined chunking with deduplication is the most impactful optimization. Rabin fingerprinting sets chunk boundaries based on content (not fixed offsets), so inserting data at the start of a file only affects the first 1-2 chunks. Combined with content-addressed storage (SHA-256 hash as key), identical chunks across all users are stored once. As an illustrative estimate at the scale assumed above, this reduces physical storage from 1.2 EB to ~500 PB (about 58% less), which at typical object-storage list prices could save well over a hundred million dollars per year.
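The effect of content-defined boundaries can be demonstrated with a toy chunker. The sketch below uses a Rabin-Karp style polynomial rolling hash as a stand-in for true Rabin fingerprinting, cuts on average every 64 bytes instead of every 4 MB, and omits the minimum and maximum chunk sizes a production chunker would enforce. All constants and function names are assumptions made for the example.

```python
import hashlib
import random

WINDOW = 16                        # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1                # Mersenne prime modulus
MASK = (1 << 6) - 1                # cut when the low 6 bits are zero (~64-byte average)
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window


def split_chunks(data: bytes) -> list:
    """Cut `data` wherever the hash of the last WINDOW bytes matches the mask."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing bytes after the last cut
    return chunks


def chunk_hashes(data: bytes) -> list:
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data)]


rng = random.Random(7)
original = bytes(rng.getrandbits(8) for _ in range(8192))
edited = b"!" + original            # insert one byte at the very start

old = chunk_hashes(original)
new = chunk_hashes(edited)
known = set(old)
to_upload = [h for h in new if h not in known]
print(f"{len(old)} chunks originally, {len(to_upload)} to upload after the edit")
```

Because each cut depends only on the preceding 16 bytes, every boundary after the first window reappears one byte later in the edited file, so at most two chunks differ. With fixed-offset chunking the same one-byte insert would shift every chunk.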


Key Tradeoffs

Decision	Option A	Option B	Chosen
Chunking	Fixed-size (simple)	Content-defined (Rabin fingerprint)	Content-defined -- minimizes re-upload on edits, better deduplication across files
Sync protocol	Poll for changes (simple)	Long-poll / WebSocket (real-time)	Long-poll -- near-instant sync, lower server load than polling
Conflict resolution	Last-write-wins (data loss risk)	Conflict copy (user decides)	Conflict copy -- preserves both versions, no data loss

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.


Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:


Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);


This gives you searchable, structured logs in Azure Monitor or Seq.

Deep-Dive: Clarifying Questions for Dropbox

How does file sync work? When a user modifies a file on one device, how quickly does it appear on other devices? Dropbox targets under 5 seconds for small files.
Do we need file chunking? Large files (1 GB+) should be split into chunks (typically 4 MB) so that modifying one part of a large file only uploads the changed chunks, not the entire file.
How do we handle sync conflicts? Two devices (or two users of a shared folder) may edit the same file before either change has synced. Dropbox creates a "conflicted copy" rather than silently overwriting.
Do we need deduplication? If 1,000 users upload the same 1 GB file, we should store it once. Content-addressable storage (hash the file content, use the hash as the key) enables this.
What about bandwidth optimization? Users on slow connections should still be able to sync. Delta sync (uploading only the chunks that changed, not the whole file) is critical.
How do we handle versioning? Keep previous versions of files so users can restore accidentally deleted or overwritten content.


Specific Functional Requirements

File Upload and Download: Upload files up to 50 GB with resumable uploads for large files
Real-Time Sync: Changes on one device appear on all linked devices within seconds
File Chunking: Split files into 4 MB chunks for efficient delta sync — only upload changed chunks
Deduplication: Content-addressable storage using SHA-256 hashes to eliminate duplicate file storage
Conflict Resolution: Detect concurrent edits and create "conflicted copies" with clear naming
Version History: Keep 30-180 days of file versions depending on plan, with ability to restore any version
Sharing: Share files and folders via links or direct sharing with permission levels (view, edit)

Specific API Endpoints


POST /api/v2/files/upload_session/start
  Response: { "session_id": "sess_abc123" }

PUT /api/v2/files/upload_session/append
  Headers: { "Dropbox-API-Arg": { "session_id": "sess_abc123", "offset": 0 } }
  Body: [4 MB chunk binary data]
  Response: HTTP 200

POST /api/v2/files/upload_session/finish
  Body: { "session_id": "sess_abc123", "commit": { "path": "/documents/report.pdf", "mode": "update", "content_hash": "sha256:abc..." } }
  Response: { "id": "id:abc123", "path": "/documents/report.pdf", "size": 15728640, "content_hash": "abc..." }

POST /api/v2/files/list_folder/longpoll
  Body: { "cursor": "cursor_abc" }
  Response: { "changes": true }  (long-poll returns when changes are available)

POST /api/v2/files/list_folder/continue
  Body: { "cursor": "cursor_abc" }
  Response: { "entries": [{ "tag": "file", "name": "report.pdf", "path": "...", "content_hash": "..." }], "cursor": "new_cursor", "has_more": false }
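A client might combine the last two endpoints into a single sync step: wait on the long-poll, then page through `list_folder/continue` until `has_more` is false. The sketch below assumes a hypothetical `post(endpoint, body)` callable that performs the HTTP request and returns the decoded JSON; `fake_post` and `PAGES` simulate the server so the flow can run without a network.

```python
def sync_once(post, cursor):
    """Wait for a change signal, then fetch every page of entries.

    Returns (entries, new_cursor); the cursor is unchanged if nothing happened.
    """
    signal = post("/api/v2/files/list_folder/longpoll", {"cursor": cursor})
    if not signal.get("changes"):
        return [], cursor
    entries = []
    while True:
        page = post("/api/v2/files/list_folder/continue", {"cursor": cursor})
        entries.extend(page["entries"])
        cursor = page["cursor"]
        if not page["has_more"]:
            return entries, cursor


# Simulated server state: each cursor maps to the next page of changes.
PAGES = {
    "cursor_abc": {"entries": [{"name": "report.pdf"}], "cursor": "cursor_def", "has_more": True},
    "cursor_def": {"entries": [{"name": "notes.txt"}], "cursor": "cursor_xyz", "has_more": False},
}


def fake_post(endpoint, body):
    if endpoint.endswith("/longpoll"):
        return {"changes": body["cursor"] in PAGES}
    return PAGES[body["cursor"]]


entries, cursor = sync_once(fake_post, "cursor_abc")
idle_entries, idle_cursor = sync_once(fake_post, cursor)
```

Keeping the long-poll response to a bare change signal lets the notification path stay cheap, while the heavier metadata fetch happens only when there is something to fetch.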

Specific Data Model


File Metadata (PostgreSQL, sharded by user_id)

Column	Type	Notes
file_id	UUID	Primary key
user_id	BIGINT	Shard key
path	VARCHAR	Full file path within user's Dropbox
content_hash	VARCHAR(64)	SHA-256 of file content, used for dedup
size_bytes	BIGINT
version	INT	Incremented on each edit
is_deleted	BOOLEAN	Soft delete for version history
modified_at	TIMESTAMP	Client-side modification time
server_modified_at	TIMESTAMP	When the server received the change

Chunk Store (Object Storage — S3/GCS)

Key: SHA-256 hash of chunk content (content-addressable)
Value: 4 MB chunk data (encrypted at rest)
Deduplication is automatic: identical chunks across all users share one physical copy

File-to-Chunks Mapping (PostgreSQL)

Column	Type	Notes
file_id	UUID
version	INT
chunk_index	INT	Position in file
chunk_hash	VARCHAR(64)	Reference to chunk in object storage
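To show how the mapping table and the chunk store fit together, the sketch below writes two versions of a file and reads each back by joining its chunks in `chunk_index` order. It uses SQLite from the Python standard library as a stand-in for PostgreSQL and a dictionary as a stand-in for object storage; the table name `file_chunks` and the helper functions are assumptions for the example.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE file_chunks (file_id TEXT, version INT, chunk_index INT, chunk_hash TEXT)"
)
blobs = {}  # stand-in for the object store: chunk_hash -> chunk bytes


def write_version(file_id, version, chunks):
    """Record one file version as an ordered list of chunk hashes."""
    for index, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        blobs.setdefault(digest, chunk)  # identical chunks are stored once
        db.execute(
            "INSERT INTO file_chunks VALUES (?, ?, ?, ?)",
            (file_id, version, index, digest),
        )


def read_version(file_id, version):
    """Reassemble a file version by fetching its chunks in order."""
    rows = db.execute(
        "SELECT chunk_hash FROM file_chunks "
        "WHERE file_id = ? AND version = ? ORDER BY chunk_index",
        (file_id, version),
    )
    return b"".join(blobs[digest] for (digest,) in rows)


write_version("f1", 1, [b"hello ", b"world"])
write_version("f1", 2, [b"hello ", b"there"])  # only the second chunk changed
print(read_version("f1", 1), read_version("f1", 2), len(blobs))
```

Both versions remain readable, yet only three chunks are stored because the unchanged first chunk is shared. This is what makes version history cheap in a chunked design.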

Sync Journal (Cassandra): Ordered log of all changes per namespace (user or shared folder). Clients maintain a cursor and poll for changes since their last sync point.
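The cursor semantics of such a journal can be sketched with an append-only list per namespace. This is an in-memory illustration only: `SyncJournal`, its method names, and the `limit` parameter are hypothetical, and a real deployment would keep the log in the database named above.

```python
class SyncJournal:
    """Append-only change log per namespace; a cursor is a position in that log."""

    def __init__(self):
        self._log = {}  # namespace -> list of change records

    def append(self, namespace, change):
        """Record a change and return the cursor that points just past it."""
        entries = self._log.setdefault(namespace, [])
        entries.append(change)
        return len(entries)

    def changes_since(self, namespace, cursor, limit=100):
        """Return (page, new_cursor, has_more) for everything after `cursor`."""
        entries = self._log.get(namespace, [])
        page = entries[cursor:cursor + limit]
        new_cursor = cursor + len(page)
        return page, new_cursor, new_cursor < len(entries)


journal = SyncJournal()
for path in ("/a.txt", "/b.txt", "/c.txt"):
    journal.append("user:42", {"path": path, "action": "modified"})

page1, cursor1, more1 = journal.changes_since("user:42", 0, limit=2)
page2, cursor2, more2 = journal.changes_since("user:42", cursor1, limit=2)
page3, cursor3, more3 = journal.changes_since("user:42", cursor2)
```

A client that has been offline simply resumes from its stored cursor, which is why the journal has to be strictly ordered within a namespace.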


Specific Back-of-the-Envelope Numbers

Traffic:

700M+ registered users, ~15M paying users
Average user syncs ~2 GB/month of changed data
Assume 50M daily active users, each syncing ~100 MB/day on average
File operations: 50M users * 20 file operations/day = 1 billion file ops/day = ~12,000 ops/second

Storage:

Total stored data: estimated at 1+ exabyte across all users
Deduplication saves ~30-60% of raw storage (many users store common files: OS installers, popular downloads)
Chunk store: average 4 MB chunks, 250 billion+ chunks stored

Sync performance:

Small file change (under 4 MB): single chunk upload, under 5 seconds sync time
Large file change (100 MB file, 1 MB changed): only 1 chunk re-uploaded out of 25, 96% bandwidth saved
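Delta sync for a 1 GB file with 1% change: upload about 10 MB instead of 1 GB if the change is contiguous (3-4 chunks, so 12-16 MB at 4 MB granularity); edits scattered across the file touch more chunks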
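The sync-performance figures follow from counting how many chunks a changed byte range overlaps. The sketch below assumes fixed 4 MB chunk boundaries purely to keep the arithmetic simple (content-defined chunks vary in size around that average); the function names are illustrative.

```python
MB = 1024 * 1024
CHUNK = 4 * MB


def chunks_touched(offset, length, chunk_size=CHUNK):
    """Number of chunks overlapped by a contiguous change of `length` bytes at `offset`."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1


def percent_saved(file_size, offset, length, chunk_size=CHUNK):
    """Whole-percent bandwidth saved versus re-uploading the full file."""
    uploaded = chunks_touched(offset, length, chunk_size) * chunk_size
    return 100 - (100 * uploaded) // file_size


# 100 MB file, 1 MB changed inside one chunk: 1 of 25 chunks is re-uploaded.
one_chunk = chunks_touched(10 * MB, 1 * MB)
saved = percent_saved(100 * MB, 10 * MB, 1 * MB)

# The same 1 MB change straddling a chunk boundary touches two chunks.
straddling = chunks_touched(3 * MB + MB // 2, 1 * MB)

# 10 MB contiguous change (1% of 1 GB): 3 chunks if aligned, 4 if not.
aligned = chunks_touched(0, 10 * MB)
unaligned = chunks_touched(3 * MB, 10 * MB)
```

The upload cost is always rounded up to whole chunks, so the real saving depends on where the change falls as well as how large it is.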

Bandwidth:

50M users * 100 MB/day = 5 PB/day of upload bandwidth
Download traffic is typically 2-3x upload (multiple devices syncing) = 10-15 PB/day
Peak: 3-5x average during business hours in each timezone
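The traffic, storage, and bandwidth figures above can be checked with a few lines of integer arithmetic. The inputs are the assumptions stated in this section; decimal units (1 PB = 10^15 bytes) are used, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
MB, PB, EB = 10**6, 10**15, 10**18

daily_active_users = 50_000_000

# Traffic: 20 file operations per user per day.
ops_per_day = daily_active_users * 20
ops_per_second = ops_per_day // SECONDS_PER_DAY

# Storage: 1 EB held in chunks averaging 4 MB.
chunk_count = EB // (4 * MB)

# Bandwidth: 100 MB uploaded per user per day, downloads at 2-3x upload.
upload_per_day = daily_active_users * 100 * MB
download_per_day = (2 * upload_per_day, 3 * upload_per_day)
avg_upload_gbit_s = upload_per_day * 8 // SECONDS_PER_DAY // 10**9
peak_upload_gbit_s = (3 * avg_upload_gbit_s, 5 * avg_upload_gbit_s)

print(f"{ops_per_day:,} ops/day, about {ops_per_second:,} ops/second")
print(f"{chunk_count:,} chunks")
print(f"{upload_per_day // PB} PB/day up, "
      f"{download_per_day[0] // PB}-{download_per_day[1] // PB} PB/day down")
print(f"average upload about {avg_upload_gbit_s} Gbit/s")
```

Converting the daily totals to a sustained rate (several hundred Gbit/s of upload on average, more at peak) is often the more useful number when sizing network capacity.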

Sources

Design Dropbox -- Reference
Source: System-Design-Overview
