Database Design at Scale
Sharding, replication, SQL vs NoSQL, and data partitioning
Database Design at Scale
At scale, a single database becomes a bottleneck. Understanding replication (copies of data), sharding (splitting data), and choosing between SQL and NoSQL is essential for system design interviews.
Replication
Master-Slave Replication
Sharding Strategies
text
1. Range-Based Sharding
├── Shard 1: Users A-H
├── Shard 2: Users I-P
└── Shard 3: Users Q-Z
✅ Simple, range queries possible
❌ Uneven distribution (hotspots)
2. Hash-Based Sharding
├── shard = hash(user_id) % num_shards
├── Evenly distributes data
└── Uses consistent hashing for rebalancing
✅ Even distribution
❌ No range queries, resharding is expensive
3. Geographic Sharding
├── Shard US: Users in Americas
├── Shard EU: Users in Europe
└── Shard APAC: Users in Asia-Pacific
✅ Low latency for regional users
❌ Cross-region queries are expensive
4. Directory-Based Sharding
├── Lookup service maps key → shard
├── Most flexible approach
└── Lookup service is single point of failure
✅ Flexible mapping
❌ Lookup overhead, single point of failureSQL vs NoSQL Decision Framework
- Choose SQL when: you need ACID transactions, complex joins, structured data with relationships, data integrity is critical (banking, e-commerce)
- Choose NoSQL when: you need flexible schema, horizontal scalability, high write throughput, hierarchical data (MongoDB), key-value access patterns (Redis), wide-column (Cassandra), graph relationships (Neo4j)
- Common pattern: Use SQL for transactional data + NoSQL for caches, sessions, analytics, and search