Sharding & Partitioning

Distributing data across multiple databases

Database Sharding

Sharding splits data across multiple database instances. Each shard holds a subset of data. This enables horizontal scaling of storage and throughput, but adds complexity around cross-shard queries, rebalancing, and transactions.

Sharding Strategies

Sharding Strategies

  • Range-based: Shard by ranges (A-H, I-P, Q-Z) — simple but can create hotspots
  • Hash-based: Hash the shard key → distribute evenly — good distribution but range queries are hard
  • Directory-based: Lookup table maps data to shards — flexible but lookup is a bottleneck
  • Geographic: Shard by region — reduces latency for geo-specific data
⚠️

Avoid sharding as long as possible. First try: read replicas, caching, query optimization, and vertical scaling. Sharding adds significant operational complexity.