
System Design: Dropbox

Design a file synchronization and cloud storage service like Dropbox that syncs files across devices in near real time with efficient delta transfers. Covers block-level deduplication, CRDT-based conflict resolution, and sync protocols.

15 min read · Updated Jan 15, 2025
Tags: system-design, dropbox, file-sync, deduplication, delta-sync, cloud-storage

Requirements

Functional Requirements:

  • Sync files across multiple devices automatically when changes are detected
  • Share files and folders with other users
  • Version history — restore any version of a file from the last 30 days
  • Support files up to 100 GB
  • Offline access — files modified offline sync when reconnected
  • Collaborative editing support with conflict detection and resolution

Non-Functional Requirements:

  • Sync latency under 30 seconds for small file changes
  • Handle 100 million active users
  • 100 million GB (100 PB) of total stored data
  • Efficient bandwidth usage — only transfer changed blocks
  • 99.99% availability
  • End-to-end encryption option

Scale Estimation

With 100 million users averaging 1 GB of active data each, total logical data is 100 PB. With block deduplication (typical dedup ratio 3:1), physical storage is ~33 PB. Daily active users: 50 million, each modifying an average of 5 files/day, generating 250 million file events/day (~3,000/sec). With a 4 MB block size, a 100 GB file splits into ~25,600 blocks. Deduplication matters because many common files (standard OS libraries, shared templates) appear across millions of accounts; content-defined chunking (Rabin fingerprinting) combined with content-addressed storage ensures each unique block is stored only once.
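These estimates can be sanity-checked with quick arithmetic (all constants are the section's own assumptions):

```python
# Back-of-envelope check of the section's estimates (illustrative constants).
USERS = 100_000_000
GB_PER_USER = 1
DEDUP_RATIO = 3
DAU = 50_000_000
FILES_PER_DAU_PER_DAY = 5
BLOCK_MB = 4

logical_pb = USERS * GB_PER_USER / 1_000_000           # GB -> PB (decimal)
physical_pb = logical_pb / DEDUP_RATIO                 # after 3:1 dedup
events_per_sec = DAU * FILES_PER_DAU_PER_DAY / 86_400  # file events per second
blocks_per_100gb = 100 * 1024 // BLOCK_MB              # chunks in a 100 GB file

print(f"{logical_pb:.0f} PB logical, ~{physical_pb:.0f} PB physical")
print(f"~{events_per_sec:.0f} events/sec, {blocks_per_100gb} blocks per 100 GB file")
```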

High-Level Architecture

The system has four major components: the desktop/mobile client, the metadata service, the block storage service, and the notification service. The client watches the local filesystem for changes. Changed files are chunked into variable-length blocks using content-defined chunking (Rabin fingerprinting). Each block is hashed (SHA-256). The client asks the metadata service which blocks are new (not yet stored) and uploads only those blocks to the block storage service. A commit operation atomically records the new file version in the metadata service.

The desktop client uses OS filesystem watches (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows) to detect file modifications. When a change is detected, the client reads the file, splits it into variable-size chunks using Rabin fingerprinting (target chunk size 4 MB), computes SHA-256 for each chunk, and queries the metadata service for the file's previous block list. New block hashes (not in the previous version) are uploaded to the block store. This delta-transfer approach is critical: modifying the middle of a large file only uploads the changed blocks (~1–2 chunks), not the full file.
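A minimal sketch of the client's delta detection. Fixed-size chunking stands in for content-defined chunking to keep the example short; function names are illustrative, not Dropbox's actual client API:

```python
import hashlib

def block_list(data: bytes, size: int = 4 * 1024 * 1024) -> list[str]:
    # SHA-256 per chunk; fixed-size chunking stands in for Rabin chunking.
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def blocks_to_upload(new_data: bytes, previous_hashes: set[str],
                     size: int = 4 * 1024 * 1024) -> list[str]:
    # Only hashes absent from the file's previous version need uploading.
    return [h for h in block_list(new_data, size) if h not in previous_hashes]
```

With the 4 MB default, editing one region of a large file typically yields a one- or two-element upload list.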

Synchronization across devices uses a persistent-connection notification service. When device A commits a file change, the metadata service publishes a change event to a notification bus. All other devices belonging to the same account receive the change event, download the new block list, identify blocks not yet in their local cache, fetch those blocks from the block store, reconstruct the file on disk, and confirm sync completion. The notification service maintains persistent connections (long polling, WebSockets, or HTTP/2 server-sent events) from all active clients.
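The receiving device's reconstruction step can be sketched as follows; `local_cache` and `fetch_block` are hypothetical stand-ins for the client's block cache and the block-store download call:

```python
def apply_remote_change(block_hashes: list[str], local_cache: dict,
                        fetch_block) -> bytes:
    # Fetch only the blocks missing from the local cache, then reassemble
    # the file contents in block-list order.
    parts = []
    for h in block_hashes:
        if h not in local_cache:
            local_cache[h] = fetch_block(h)  # network round trip to block store
        parts.append(local_cache[h])
    return b"".join(parts)
```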

Core Components

Content-Defined Chunking (Rabin Fingerprinting)

Rabin fingerprinting uses a rolling hash over a sliding window of bytes. When the hash value hits a specific bit pattern (e.g., the lowest k bits are zero), a chunk boundary is declared; each byte position matches with probability 2^-k, so the average chunk size is roughly 2^k bytes, and k ≈ 22 yields the 4 MB target. This produces variable-length chunks. The key advantage over fixed-size chunking: inserting bytes at the start of a file shifts all fixed-size chunk boundaries, creating spurious changes for every downstream block. With Rabin chunking, inserting bytes only changes the affected chunks plus one boundary adjustment; all other chunks remain identical because boundaries depend only on local content. This maximizes deduplication across file versions and similar files.
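A runnable sketch of content-defined chunking using a simple polynomial rolling hash (a Karp–Rabin-style stand-in for true Rabin fingerprinting over GF(2)); the small default parameters are for demonstration, not the production 4 MB target:

```python
def cdc_chunks(data: bytes, mask_bits: int = 12, window: int = 48,
               min_size: int = 64) -> list[bytes]:
    # Boundary when the lowest `mask_bits` bits of the rolling hash are
    # zero; expected chunk size is about 2**mask_bits + min_size bytes.
    MOD = 1 << 32
    BASE = 257
    mask = (1 << mask_bits) - 1
    pow_out = pow(BASE, window - 1, MOD)

    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        if i - start >= window:                      # slide: drop oldest byte
            h = (h - data[i - window] * pow_out) % MOD
        h = (h * BASE + data[i]) % MOD               # slide: add newest byte
        if i + 1 - start >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0                      # restart hash at boundary
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because boundaries depend only on the bytes inside the window, a stream shifted by an insertion resynchronizes after a few chunks, which is exactly the property that preserves deduplication.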

Block Deduplication Store

The block store is keyed by SHA-256 hash (content-addressable storage). Before uploading a block, the client checks whether the server already has a block with this hash (via a /check_blocks API that accepts a list of hashes). If the block exists, the commit operation simply references the existing block — zero bytes transferred. This global deduplication means that if 10 million users all have the same macOS system file in their Dropbox, it is stored only once. Block reference counts are maintained by the metadata service. When a reference count drops to zero (all file versions referencing the block are deleted), the block is scheduled for garbage collection.
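The upload path can be sketched as follows; `have_blocks`, `upload_block`, and `server_commit` are hypothetical stand-ins for the /check_blocks call, the block upload, and the metadata commit:

```python
def commit_file(path: str, blocks: dict[str, bytes],
                have_blocks, upload_block, server_commit) -> int:
    # `blocks` maps sha256 hex digest -> block bytes for the new version.
    missing = have_blocks(list(blocks))   # ask server which hashes it lacks
    for h in missing:
        upload_block(h, blocks[h])        # transfer only unknown blocks
    server_commit(path, list(blocks))     # commit references blocks by hash
    return len(missing)                   # number of blocks actually transferred
```

If every hash already exists server-side (a fully deduplicated file), `missing` is empty and the commit transfers zero block bytes.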

Conflict Resolution

When two devices modify the same file offline and both sync, a conflict is detected when the metadata service receives two commits for the same file with divergent parent version IDs. Dropbox's strategy for most file types is "first-writer-wins with conflict copy": the first commit to reach the server is accepted as the current version, and the later conflicting commit is saved as a new file named "filename (conflicted copy from Device, date).ext". For structured file types (Dropbox Paper, Google Docs-style documents), CRDT-based merging allows both edits to be preserved. The client is notified of the conflict and displays it in the Dropbox UI for user resolution.
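A sketch of the detection rule and the conflicted-copy naming, assuming version IDs are opaque strings and using the naming format shown above:

```python
def detect_conflict(server_head: str, incoming_parent: str) -> bool:
    # A commit is clean only if it builds on the server's current head;
    # a divergent parent means another device committed first.
    return incoming_parent != server_head

def conflicted_name(filename: str, device: str, date: str) -> str:
    # "report.txt" -> "report (conflicted copy from Laptop, 2025-01-15).txt"
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension
        return f"{filename} (conflicted copy from {device}, {date})"
    return f"{stem} (conflicted copy from {device}, {date}).{ext}"
```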

Database Design

The metadata service stores three core tables:

  • File tree: (user_id, path, file_id, parent_id, is_folder, created_at, modified_at) — relational DB (MySQL or PostgreSQL) sharded by user_id
  • File versions: (file_id, version_id, parent_version_id, block_list: [sha256], total_size, modified_by, committed_at) — Cassandra (immutable, append-only, partitioned by file_id)
  • Block registry: (sha256, storage_path, size, reference_count, created_at) — distributed key-value store (DynamoDB) keyed by SHA-256 hash

A separate access control table (sharer_id, sharee_id, folder_id, permission_level) manages sharing permissions.

API Design
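A minimal endpoint sketch covering the flows described in this design; paths and payload shapes are illustrative, not Dropbox's actual API:

```
POST /check_blocks       body: [sha256, ...]            -> hashes the server lacks
PUT  /blocks/{sha256}    body: raw block bytes          -> 200 on store
GET  /blocks/{sha256}                                   -> block bytes (CDN-cacheable)
POST /commit             body: {path, parent_version_id, block_list}
                                                        -> new version_id, or 409 on conflict
GET  /delta?cursor=...   long-poll                      -> change events since cursor
```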

Scaling & Bottlenecks

The metadata service is the consistency-critical bottleneck. It must serialize concurrent commits to the same file (two devices saving simultaneously). A distributed lock per file_id (Redis SETNX or Zookeeper ephemeral nodes) prevents concurrent commit races. At 3,000 commits/sec with a 1-second lock timeout, 3,000 concurrent locks must be manageable by the lock service. Redis can handle 100,000 SET ops/sec, so this is well within budget. Lock contention is low in practice: concurrent edits to the same file are rare unless users deliberately share high-frequency-update files.

Block storage uses S3 (or a similar object store) as the backend, providing automatic scaling to exabytes. Block uploads are distributed across regions: blocks are stored in the region closest to the uploading client. Cross-region replication (3-way, across different continents) provides geo-redundancy. Download acceleration uses CDN edge caching for popular blocks (company-shared files, template files accessed by many users). Bandwidth costs are a significant operational expense; the block deduplication and delta-transfer design directly reduce bandwidth consumption by 70–90% compared to full-file uploads.

Key Trade-offs

  • Variable vs. fixed chunk size: Variable (Rabin) chunking maximizes deduplication but requires storing chunk boundary metadata; fixed chunking is simpler but wastes bandwidth on shifted content
  • Client-side vs. server-side deduplication: Client-side (checking hashes before upload) minimizes bandwidth but requires trust in client hash computation; server-side verification is more secure
  • Sync latency vs. battery life: Immediate sync on file change provides the lowest latency but drains laptop battery and generates wake-lock contention; batched sync (every 60 seconds) is more efficient but increases latency
  • Eventual consistency vs. strong consistency for metadata: Strong consistency prevents lost updates but limits write throughput; eventual consistency with last-write-wins is simpler but creates subtle conflict scenarios
