~/posts/databases/b-tree-vs-lsm-tree.md

B-Tree vs LSM-Tree: The Index Trade-off

PostgreSQL uses B-Trees. Cassandra and RocksDB use LSM-Trees. Why? The answer comes down to whether your workload is read-heavy or write-heavy.

7 min read by admin b-tree databases lsm-tree performance storage-engines
~/posts/databases/b-tree-vs-lsm-tree.md $ cat b-tree-vs-lsm-tree.md

B-Trees

B-Trees keep data sorted on disk. Reads are O(log n) with excellent cache behaviour. But every write is a random I/O to update an in-place page — painful for write-heavy workloads on spinning disks.

LSM-Trees

Log-Structured Merge trees write sequentially — always appending to a memtable that periodically flushes to immutable SSTables. Writes are fast. Reads require checking multiple SSTables (partially mitigated by bloom filters).

The Rule of Thumb

  • Read-heavy, point lookups → B-Tree (PostgreSQL, MySQL InnoDB)
  • Write-heavy, time-series, wide rows → LSM-Tree (Cassandra, RocksDB, LevelDB)