This post is part of a newsletter that I run, “Scamming The Coding Interview”, which is geared towards helping people ace their coding interviews. We send a coding question on weekdays along with a system design article like this one on weekends. Do subscribe if you find this article valuable.
What would be the simplest implementation of a database that we can think of?
A simple append-only file, where each new entry is appended to the end of the file, seems to be a viable design.
In fact, maintaining immutable append-only data structures is a really powerful idea in the database world.
This idea of storing data in append-only logs is the basis of data structures like SSTables and LSM Trees, which power many databases such as Cassandra and RocksDB.
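To make the append-only idea concrete, here is a minimal sketch of such a "database" in Python. The class name `AppendOnlyLog`, the tab-separated line format, and the file name are all assumptions made for illustration; no real database stores data exactly like this.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy key-value store: every write is appended, reads scan the whole file."""

    def __init__(self, path):
        self.path = path
        # Create the file if it does not exist yet.
        open(self.path, "a", encoding="utf-8").close()

    def put(self, key, value):
        # Assumption: keys and values contain no tabs or newlines.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key}\t{value}\n")

    def get(self, key):
        result = None
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                k, v = line.rstrip("\n").split("\t", 1)
                if k == key:
                    result = v  # later entries override earlier ones
        return result


path = os.path.join(tempfile.mkdtemp(), "db.log")
db = AppendOnlyLog(path)
db.put("user:1", "alice")
db.put("user:2", "bob")
db.put("user:1", "carol")  # an update is just another append
```

Writes are cheap because they only ever append, but every read has to scan the whole file. That read cost is what the sorted segments and indexes described later are meant to reduce.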
What is an LSM Tree?
If we try to define an LSM Tree, we can say something like:
An LSM Tree or “Log Structured Merge Tree” is a data structure with performance characteristics that make it very attractive to store data with high insert and update rates.
An LSM tree comprises two or more levels of tree-like data structures. The simplest versions consist of just two levels, C0 and C1.
C0 is called the memtable and resides completely in memory. The memtable is generally implemented using a balanced tree data structure such as an AVL tree, and it stores key-value pairs sorted by key. All inserts and updates are written to C0 first.
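A memtable can be sketched in a few lines of Python. Python has no balanced tree in its standard library, so this sketch swaps in a sorted list of keys maintained with `bisect` plus a dict for the values. That is an assumption made for brevity, not how a production memtable is built, and the class name `Memtable` is hypothetical.

```python
import bisect


class Memtable:
    """In-memory key-value store that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> latest value

    def put(self, key, value):
        # Only new keys need a slot in the sorted list; updates overwrite in place.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def items(self):
        # Key-value pairs in key order, ready to be written out as a segment.
        return [(k, self._values[k]) for k in self._keys]

    def __len__(self):
        return len(self._keys)


memtable = Memtable()
memtable.put("b", 1)
memtable.put("a", 2)
memtable.put("b", 3)  # update of an existing key
```

Inserting into a Python list is O(n), whereas a balanced tree gives O(log n) inserts. The property the sketch shares with a tree is the one that matters here: iteration always yields keys in sorted order.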
C1 is meant to be very large and is stored on disk. C1 comprises many immutable log segments accompanied by indexes, such as hash indexes, for quick key lookups.
C1 is generally implemented using SSTables.
What are SSTables and how do they fit in the LSM tree?
In log-based data structures, inserts and updates are appended to log files. The main idea is that appending to a file is efficient and quick, which lets the database support a heavy write workload.
As these log files start to grow, it makes sense to segment them. This is where SSTables or “Sorted String Tables” come into play.
SSTables store key-value pairs in immutable log segments, sorted by key.
This provides many key advantages:
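- Lookups can be very efficient even without a full hash index: because the keys are sorted, an index over only some of the keys is enough to narrow the search to a small range.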
- Range queries are possible.
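Both advantages follow from the keys being sorted, and a small Python sketch can show why. Here a segment is modelled as an in-memory list of sorted pairs rather than a file, and the helper names (`build_sparse_index`, `lookup`, `range_query`) and the `every=3` sampling rate are assumptions made for illustration.

```python
import bisect

# A segment modelled as an in-memory list of (key, value) pairs sorted by key.
segment = [
    ("apple", 1), ("banana", 2), ("cherry", 3),
    ("date", 4), ("fig", 5), ("grape", 6),
]


def build_sparse_index(segment, every=3):
    """Record the position of every `every`-th key only."""
    return [(segment[i][0], i) for i in range(0, len(segment), every)]


def lookup(segment, index, key):
    index_keys = [k for k, _ in index]
    # Find the last indexed key that is <= the target key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first entry in the segment
    start = index[slot][1]
    end = index[slot + 1][1] if slot + 1 < len(index) else len(segment)
    # Scan only the small block between two indexed positions.
    for k, v in segment[start:end]:
        if k == key:
            return v
    return None


def range_query(segment, low, high):
    """Return all pairs with low <= key <= high."""
    keys = [k for k, _ in segment]
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return segment[lo:hi]


index = build_sparse_index(segment)
```

The index holds only two of the six keys, yet any lookup scans at most three entries. With an unsorted log, neither the partial index nor the range query would work without reading the whole segment.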
When the size of the in-memory C0 exceeds a certain threshold (typically a few megabytes), it is flushed to disk as a new SSTable segment, sorted by key.
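The flush step might look roughly like the sketch below. Several things here are assumptions for illustration: the threshold counts entries instead of bytes, the memtable is a plain dict that gets sorted at flush time, and the segment file naming and tab-separated format are made up.

```python
import os
import tempfile


class FlushingMemtable:
    """Memtable that writes itself out as a sorted segment file when full."""

    def __init__(self, directory, max_entries=3):
        self.directory = directory
        self.max_entries = max_entries  # stand-in for a size threshold in bytes
        self.data = {}
        self.segments = []  # paths of flushed segments, oldest first

    def put(self, key, value):
        self.data[key] = value
        if len(self.data) >= self.max_entries:
            self.flush()

    def flush(self):
        if not self.data:
            return
        name = f"segment-{len(self.segments):04d}.sst"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as f:
            # Write entries in key order so the segment is sorted on disk.
            for key in sorted(self.data):
                f.write(f"{key}\t{self.data[key]}\n")
        self.segments.append(path)
        self.data = {}  # start a fresh, empty memtable


def read_segment(path):
    with open(path, "r", encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in f]


table = FlushingMemtable(tempfile.mkdtemp(), max_entries=3)
for key, value in [("c", "3"), ("a", "1"), ("b", "2"), ("d", "4")]:
    table.put(key, value)
```

After the third write the memtable is flushed as one sorted segment, and the fourth write lands in a fresh memtable. Once written, the segment file is never modified.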
Overview of LSM Tree’s working:
Now that we have gone through all the concepts involved in the LSM Tree, we can cover its working from a bird’s eye view to fit all the pieces into the bigger picture.
Inserting data into LSM Tree:
- When a write arrives, it is inserted into the memory-resident memtable.
- When the size of the memtable exceeds a certain threshold, it’s flushed to the disk.
- As memtable is already sorted, creating a new SSTable segment from it is efficient enough.
- Old segments are periodically merged together to save disk space and reduce fragmentation of data.
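The merge in the last step works much like the merge phase of merge sort, because every segment is already sorted. Below is a minimal sketch using Python's `heapq.merge`. The function name `compact` and the in-memory list representation of segments are assumptions for illustration.

```python
import heapq


def compact(segments):
    """Merge sorted segments (ordered oldest to newest) into one sorted segment.

    When a key appears in more than one segment, the newest value wins.
    """
    # Tag each entry with its segment's age so duplicate keys sort oldest first.
    tagged = [
        [(key, age, value) for key, value in seg]
        for age, seg in enumerate(segments)
    ]
    merged = []
    for key, _age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value)  # newer entry replaces the older one
        else:
            merged.append((key, value))
    return merged


old = [("a", 1), ("c", 3), ("e", 5)]
new = [("b", 20), ("c", 30)]
compacted = compact([old, new])
```

The stale value for `"c"` disappears in the merged output, which is how compaction reclaims disk space. The inputs are only read, never modified, so the old segments can keep serving reads until the merged segment is ready.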
Reading data from LSM Tree:
- A given key is first looked up in the memtable.
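- If it is not found there, it is searched for in one or more on-disk segments, starting with the most recent and using each segment's index. How many segments must be checked depends on how far compaction has progressed.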
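The read path can be sketched as follows. Each on-disk segment is modelled as a plain dict, standing in for a segment file plus its hash index, and the function name `lsm_get` is hypothetical.

```python
def lsm_get(key, memtable, segments):
    """Look up a key in the memtable first, then in segments, newest first.

    `segments` is ordered oldest to newest; each one is a dict standing in
    for an on-disk segment together with its hash index.
    """
    if key in memtable:
        return memtable[key]
    for segment in reversed(segments):
        if key in segment:
            return segment[key]  # the newest segment containing the key wins
    return None  # every segment was checked and none had the key


memtable = {"b": "from-memtable"}
segments = [
    {"a": "old-a", "b": "old-b"},  # oldest segment
    {"a": "new-a"},                # newest segment
]
```

A key in the memtable is answered without touching any segment, while a missing key forces a check of every segment. The fewer segments compaction leaves behind, the cheaper that worst case becomes.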
Sample use case in a system
As the design of the LSM Tree indicates, it is a really good candidate for processing and storing write-heavy workloads like transaction logs, user events, or any other stream of data.
For example, LSM trees can be leveraged to store events generated by an application.
Advantages of LSM tree
- LSM Trees can handle very high write throughput.
- LSM Trees can be compressed better, which results in smaller log segment files.
Disadvantages of LSM Tree
- The compaction process sometimes interferes with the performance of ongoing reads and writes.
- Disk read/write throughput can be consumed by compaction, further degrading performance.
- Each key can exist in multiple places, so checking that a key doesn’t exist requires all segments to be searched.
LSM trees are based on the simple yet powerful idea of append-only logs. They work well with very high write throughput, where traditional databases can become a bottleneck.
We also covered an overview of how LSM Trees work and where they can be used.