<p align="center">
<a href="https://dragonflydb.io">
<img src="/.github/images/logo-full.svg"
width="284" border="0" alt="Dragonfly">
</a>
</p>
[![ci-tests](https://github.com/dragonflydb/dragonfly/actions/workflows/ci.yml/badge.svg)](https://github.com/dragonflydb/dragonfly/actions/workflows/ci.yml) [![Twitter URL](https://img.shields.io/twitter/follow/romanger?style=social)](https://twitter.com/romanger)
### Probably, the fastest in-memory store in the universe!
Dragonfly is a modern in-memory datastore, fully compatible with the Redis and Memcached APIs. Dragonfly implements novel algorithms and data structures on top of a multi-threaded, shared-nothing architecture. As a result, Dragonfly reaches 25x the performance of Redis and supports millions of QPS on a single instance.
Dragonfly's core properties make it a cost-effective, high-performing, and easy-to-use Redis replacement.
## Benchmarks
<img src="doc/throughput.svg" width="80%" border="0"/>
Dragonfly exceeds 3.8M QPS on a c6gn.16xlarge instance, a 25x increase in throughput compared to Redis.
99th percentile latency of Dragonfly at its peak throughput:

| op    | r6g   | c6gn  | c7g   |
|-------|-------|-------|-------|
| set   | 0.8ms | 1ms   | 1ms   |
| get   | 0.9ms | 0.9ms | 0.8ms |
| setex | 0.9ms | 1.1ms | 1.3ms |

*All benchmarks were performed using `memtier_benchmark` (see below), with the number of threads tuned per server and instance type. `memtier_benchmark` ran on a separate c6gn.16xlarge machine. For the setex benchmark we used an expiry range of 500, so that keys would survive until the end of the test.*
```bash
memtier_benchmark --ratio ... -t <threads> -c 30 -n 200000 --distinct-client-seed -d 256 \
--expiry-range=...
```
When running in pipeline mode (`--pipeline=30`), Dragonfly reaches **10M QPS** for SET and **15M QPS** for GET operations.
### Memory efficiency
In the following test, we filled Dragonfly and Redis with ~5GB of data
using the `debug populate 5000000 key 1024` command. We then sent update traffic with `memtier_benchmark` and started a snapshot with the
`bgsave` command. The following figure shows how both servers behave in terms of memory efficiency.
<img src="doc/bgsave_memusage.svg" width="70%" border="0"/>
Dragonfly was 30% more memory efficient than Redis in the idle state.
It also showed no visible increase in memory use during the snapshot phase.
Meanwhile, Redis's memory use peaked at almost 3x that of Dragonfly.
Dragonfly also finished the snapshot much faster, just a few seconds after it started.
For more information about memory efficiency in Dragonfly, see the [dashtable doc](./doc/dashtable.md).
## Running the server
Dragonfly runs on Linux. It uses the relatively new, Linux-specific [io-uring API](https://github.com/axboe/liburing)
for I/O, hence it requires Linux kernel version 5.10 or later.
Debian Bullseye and Ubuntu 20.04.4 or later meet these requirements.
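To check the kernel requirement above before starting the server, something like the following sketch may help. The `kernel_at_least` helper is hypothetical (it is not part of Dragonfly) and compares only the leading `major.minor` part of the release string reported by `uname -r`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a kernel release string is at least
# the given major.minor version. Only "major.minor" is compared.
kernel_at_least() {
  local release="$1" want_major="$2" want_minor="$3"
  local major minor
  major="${release%%.*}"        # text before the first dot
  minor="${release#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep only the leading digits
  if (( major > want_major )); then return 0; fi
  (( major == want_major && minor >= want_minor ))
}

if kernel_at_least "$(uname -r)" 5 10; then
  echo "kernel $(uname -r): meets the 5.10 requirement"
else
  echo "kernel $(uname -r): older than 5.10" >&2
fi
```

The check looks only at the version number; it does not verify that io_uring is actually enabled on the host.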
### With Docker
```bash
docker pull docker.dragonflydb.io/dragonflydb/dragonfly && \
docker tag docker.dragonflydb.io/dragonflydb/dragonfly dragonfly
docker run --network=host --ulimit memlock=-1 --rm dragonfly
redis-cli PING # redis-cli can be installed with "apt install -y redis-tools"
```
*You need `--ulimit memlock=-1` because some Linux distros configure the default memlock limit for containers as 64m and Dragonfly requires more.*
### Releases
We maintain [binary releases](https://github.com/dragonflydb/dragonfly/releases) for x86 and arm64 architectures. You will need to install the `libunwind8` library to run the binaries.
### Building from source
You need to install dependencies in order to build on Ubuntu 20.04 or later:
```bash
git clone --recursive https://github.com/dragonflydb/dragonfly && cd dragonfly
# Install dependencies
sudo apt install ninja-build libunwind-dev libboost-fiber-dev libssl-dev \
autoconf-archive libtool cmake g++
# Configure the build
./helio/blaze.sh -release
# Build
cd build-opt && ninja dragonfly
# Run
./dragonfly --alsologtostderr
```
## Configuration
Dragonfly supports common Redis arguments where applicable.
For example, you can run: `dragonfly --requirepass=foo --bind localhost`.

Dragonfly currently supports the following Redis-specific arguments:
* `port`
* `bind`
* `requirepass`
* `maxmemory`
* `dir` - by default, the Dragonfly Docker image uses the `/data` folder for snapshotting.
  You can use Docker's `-v` option to map it to a folder on your host.
* `dbfilename`

In addition, Dragonfly has its own specific arguments:
* `memcache_port` - enables the Memcached-compatible API on this port. Disabled by default.
* `keys_output_limit` - the maximum number of keys returned by the `keys` command. Default is 8192.
  `keys` is a dangerous command, so we truncate its result to avoid a blowup in memory when fetching too many keys.
* `dbnum` - the maximum number of databases supported by `select`.
* `cache_mode` - see the [Cache](#novel-cache-design) section below.

For more options, such as log management or TLS support, run `dragonfly --help`.
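As a rough illustration of how the arguments listed above combine, the sketch below assembles a command line in a bash array and prints it instead of running it. All values (ports, password, limit) are illustrative assumptions, not recommended settings; check `dragonfly --help` for the exact syntax your build accepts.

```bash
#!/usr/bin/env bash
# Assemble a Dragonfly command line from flags named in this README.
# Every value below is an assumption chosen for illustration.
args=(
  --port=6380               # Redis-compatible port
  --bind=localhost
  --requirepass=foo
  --memcache_port=11211     # turns on the Memcached-compatible API
  --keys_output_limit=1000  # lower the cap on KEYS results
  --cache_mode=true
)

# Print the command rather than executing it, so the sketch works
# without a dragonfly binary on the PATH.
echo dragonfly "${args[@]}"
```

Keeping the flags in an array makes it easy to comment each one and to append more conditionally before launching the server.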
## Roadmap and status
Currently, Dragonfly supports ~130 Redis commands and all Memcached commands besides `cas`.
We are almost on par with the Redis 2.8 API. Our first milestone is to stabilize basic
functionality and reach API parity with the Redis 2.8 and Memcached APIs.
If a command you need is not implemented yet, please open an issue.

The next milestone will be implementing H/A with `redis -> dragonfly` and
`dragonfly <-> dragonfly` replication.

For Dragonfly-native replication, we are planning to design a distributed log format that will
support order-of-magnitude higher speeds when replicating.

After the replication and failover features, we will continue with other Redis commands from
APIs 3, 4 and 5.

### Initial release
API 1.0

- [X] String family
  - [X] SET
  - [ ] SETNX
  - [X] GET
  - [X] DECR
  - [X] INCR
  - [X] DECRBY
  - [X] GETSET
  - [X] INCRBY
  - [X] MGET
  - [X] MSET
  - [X] MSETNX
  - [X] SUBSTR
- [x] Generic family
  - [X] DEL
  - [X] ECHO
  - [X] EXISTS
  - [X] EXPIRE
  - [X] EXPIREAT
  - [X] KEYS
  - [X] PING
  - [X] RENAME
  - [X] RENAMENX
  - [X] SELECT
  - [X] TTL
  - [X] TYPE
  - [ ] SORT
- [X] Server Family
  - [X] AUTH
  - [X] QUIT
  - [X] DBSIZE
  - [ ] BGSAVE
  - [X] SAVE
  - [X] DEBUG
  - [X] EXEC
  - [X] FLUSHALL
  - [X] FLUSHDB
  - [X] INFO
  - [X] MULTI
  - [X] SHUTDOWN
  - [X] LASTSAVE
  - [X] SLAVEOF/REPLICAOF
  - [ ] SYNC
- [X] Set Family
  - [x] SADD
  - [x] SCARD
  - [X] SDIFF
  - [X] SDIFFSTORE
  - [X] SINTER
  - [X] SINTERSTORE
  - [X] SISMEMBER
  - [X] SMOVE
  - [X] SPOP
  - [ ] SRANDMEMBER
  - [X] SREM
  - [X] SMEMBERS
  - [X] SUNION
  - [X] SUNIONSTORE
- [X] List Family
  - [X] LINDEX
  - [X] LLEN
  - [X] LPOP
  - [X] LPUSH
  - [X] LRANGE
  - [X] LREM
  - [X] LSET
  - [X] LTRIM
  - [X] RPOP
  - [X] RPOPLPUSH
  - [X] RPUSH
- [X] SortedSet Family
  - [X] ZADD
  - [X] ZCARD
  - [X] ZINCRBY
  - [X] ZRANGE
  - [X] ZRANGEBYSCORE
  - [X] ZREM
  - [X] ZREMRANGEBYSCORE
  - [X] ZREVRANGE
  - [X] ZSCORE
- [ ] Other
  - [ ] BGREWRITEAOF
  - [ ] MONITOR
  - [ ] RANDOMKEY
  - [ ] MOVE

API 2.0

- [X] List Family
  - [X] BLPOP
  - [X] BRPOP
  - [ ] BRPOPLPUSH
  - [X] LINSERT
  - [X] LPUSHX
  - [X] RPUSHX
- [X] String Family
  - [X] SETEX
  - [X] APPEND
  - [X] PREPEND (dragonfly specific)
  - [ ] BITCOUNT
  - [ ] BITFIELD
  - [ ] BITOP
  - [ ] BITPOS
  - [ ] GETBIT
  - [X] GETRANGE
  - [X] INCRBYFLOAT
  - [X] PSETEX
  - [ ] SETBIT
  - [X] SETRANGE
  - [X] STRLEN
- [X] HashSet Family
  - [X] HSET
  - [X] HMSET
  - [X] HDEL
  - [X] HEXISTS
  - [X] HGET
  - [X] HMGET
  - [X] HLEN
  - [X] HINCRBY
  - [X] HINCRBYFLOAT
  - [X] HGETALL
  - [X] HKEYS
  - [X] HSETNX
  - [X] HVALS
  - [X] HSCAN
- [X] PubSub family
  - [X] PUBLISH
  - [ ] PUBSUB
  - [ ] PUBSUB CHANNELS
  - [X] SUBSCRIBE
  - [X] UNSUBSCRIBE
  - [X] PSUBSCRIBE
  - [X] PUNSUBSCRIBE
- [X] Server Family
  - [ ] WATCH
  - [ ] UNWATCH
  - [X] DISCARD
  - [X] CLIENT LIST/SETNAME
  - [ ] CLIENT KILL/UNPAUSE/PAUSE/GETNAME/REPLY/TRACKINGINFO
  - [X] COMMAND
  - [X] COMMAND COUNT
  - [ ] COMMAND GETKEYS/INFO
  - [ ] CONFIG GET/REWRITE/SET/RESETSTAT
  - [ ] MIGRATE
  - [ ] ROLE
  - [ ] SLOWLOG
  - [ ] PSYNC
  - [ ] TIME
  - [ ] LATENCY...
- [X] Generic Family
  - [X] SCAN
  - [X] PEXPIREAT
  - [ ] PEXPIRE
  - [ ] DUMP
  - [X] EVAL
  - [X] EVALSHA
  - [ ] OBJECT
  - [ ] PERSIST
  - [X] PTTL
  - [ ] RESTORE
  - [X] SCRIPT LOAD/EXISTS
  - [ ] SCRIPT DEBUG/KILL/FLUSH
- [X] Set Family
  - [X] SSCAN
- [X] Sorted Set Family
  - [X] ZCOUNT
  - [X] ZINTERSTORE
  - [X] ZLEXCOUNT
  - [X] ZRANGEBYLEX
  - [X] ZRANK
  - [X] ZREMRANGEBYLEX
  - [X] ZREMRANGEBYRANK
  - [X] ZREVRANGEBYSCORE
  - [X] ZREVRANK
  - [X] ZUNIONSTORE
  - [X] ZSCAN
- [ ] HYPERLOGLOG Family
  - [ ] PFADD
  - [ ] PFCOUNT
  - [ ] PFMERGE

Memcache API

- [X] set
- [X] get
- [X] replace
- [X] add
- [X] stats (partial)
- [x] append
- [x] prepend
- [x] delete
- [x] flush_all
- [x] incr
- [x] decr
- [x] version
- [x] quit

Some commands were implemented as decorators along the way:

- [X] ROLE (2.8) decorator as master.
- [X] UNLINK (4.0) decorator for DEL command
- [X] BGSAVE (decorator for save)
- [X] FUNCTION FLUSH (does nothing)

### Milestone - H/A
Implement leader/follower replication (PSYNC/REPLICAOF/...).
### Milestone - "Maturity"
APIs 3, 4 and 5 without cluster support, modules, memory introspection commands,
geo commands, keyspace notifications, or streams.
Probably design config support. Overall, a few dozen commands.
Probably implement cluster-API decorators to allow cluster-configured clients to connect to a
single instance.
### Next milestones will be determined along the way.
## Design decisions
### Novel cache design
Dragonfly has a single, unified, adaptive caching algorithm that is very simple and memory efficient.
You can enable caching mode by passing the `--cache_mode=true` flag. Once this mode
is on, Dragonfly will evict the items least likely to be accessed in the future, but only when
it is near the `maxmemory` limit.
### Expiration deadlines with relative accuracy
Expiration ranges are limited to ~4 years. Moreover, expiration deadlines
with millisecond precision (PEXPIRE/PSETEX etc.) will be rounded to the closest second
**for deadlines greater than 134217727ms (approximately 37 hours)**.
Such rounding has less than 0.001% error, which I hope is acceptable for large ranges.
If it breaks your use case, talk to me or open an issue and explain your case.

For more detailed differences between this and the Redis implementation, [see here](doc/differences.md).
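The rounding rule for millisecond deadlines can be illustrated with a little arithmetic. The sketch below is not Dragonfly's implementation: `round_deadline_ms` is a hypothetical helper, and rounding ties upward is an assumption. It only shows why the error stays under 0.001%: the threshold is 2^27 - 1 ms, and rounding to the closest second is off by at most 500ms.

```bash
#!/usr/bin/env bash
# Threshold from the text: deadlines above this many milliseconds
# are kept with one-second precision. It equals 2^27 - 1.
THRESHOLD_MS=134217727

# Hypothetical helper: round a relative deadline (ms) to the closest
# second when it exceeds the threshold; smaller deadlines are unchanged.
# Assumption: exact ties (x.500s) round up.
round_deadline_ms() {
  local ms="$1"
  if (( ms > THRESHOLD_MS )); then
    echo $(( (ms + 500) / 1000 * 1000 ))
  else
    echo "$ms"
  fi
}

round_deadline_ms 90000123    # below the threshold: prints 90000123
round_deadline_ms 200000123   # above the threshold: prints 200000000

# Worst-case relative error just above the threshold, in percent.
awk -v t="$THRESHOLD_MS" 'BEGIN { printf "%.5f%%\n", 500 / t * 100 }'
```

The last line prints `0.00037%`, and the relative error only shrinks as deadlines grow.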
### Native HTTP console and Prometheus-compatible metrics
By default, Dragonfly allows HTTP access via its main TCP port (6379). That's right, you
can connect to Dragonfly via the Redis protocol and via the HTTP protocol: the server recognizes
the protocol automatically during connection initiation. Go ahead and try it with your browser.
Right now the console does not show much, but in the future we are planning to add useful
debugging and management information there. If you go to the `:6379/metrics` URL, you will see some
Prometheus-compatible metrics.

Important! The HTTP console is meant to be accessed within a safe network.
If you expose Dragonfly's TCP port externally, it is advised to disable the console
with `--http_admin_console=false` or `--nohttp_admin_console`.
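The `/metrics` endpoint mentioned above can also be fetched from the command line. The following is a minimal sketch that assumes a Dragonfly instance on `localhost:6379` with the console enabled; `metrics_url` is a hypothetical helper, and the script prints a notice instead of failing when no server is reachable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the metrics URL for a Dragonfly instance.
# Host and port default to a local server on the main port (6379).
metrics_url() {
  local host="${1:-localhost}" port="${2:-6379}"
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Fetch the metrics only if curl is installed; --max-time keeps the
# script from hanging when nothing answers on the port.
if command -v curl >/dev/null 2>&1; then
  curl --silent --max-time 2 "$(metrics_url)" \
    || echo "no server reachable at $(metrics_url)"
fi
```

A Prometheus scrape job would point at the same host, port, and path.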
## Background
Dragonfly started as an experiment to see what an in-memory datastore could look like if it were designed in 2022. Based on lessons learned from our experience as users of memory stores and as engineers who worked for cloud companies, we knew that we needed to preserve two key properties for Dragonfly: a) to provide atomicity guarantees for all its operations, and b) to guarantee low, sub-millisecond latency over very high throughput.
Our first challenge was how to fully utilize CPU, memory, and I/O resources using servers that are available today in public clouds. To solve this, we used a [shared-nothing architecture](https://en.wikipedia.org/wiki/Shared-nothing_architecture), which allows us to partition the keyspace of the memory store between threads, so that each thread manages its own slice of dictionary data. We call these slices shards. The library that powers thread and I/O management for the shared-nothing architecture is open-sourced [here](https://github.com/romange/helio).
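The idea of partitioning the keyspace between threads can be sketched as a deterministic key-to-shard mapping. This is only an illustration: `shard_for_key` is a hypothetical helper, and the POSIX `cksum` CRC stands in for whatever hash function Dragonfly actually uses, which this README does not describe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a key to one of N shards.
# The CRC from `cksum` is a stand-in hash chosen for illustration.
shard_for_key() {
  local key="$1" num_shards="$2" crc
  crc="$(printf '%s' "$key" | cksum | cut -d' ' -f1)"
  echo $(( crc % num_shards ))
}

# A key always maps to the same shard, so exactly one thread owns it
# and no locking is needed for single-key operations.
for key in user:1 user:2 session:abc; do
  echo "$key -> shard $(shard_for_key "$key" 4)"
done
```

Because the mapping depends only on the key and the shard count, any thread can compute which shard owns a key without consulting shared state.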
To provide atomicity guarantees for multi-key operations, we drew on recent academic research. We chose the paper ["VLL: a lock manager redesign for main memory database systems"](https://www.cs.umd.edu/~abadi/papers/vldbj-vll.pdf) to develop the transactional framework for Dragonfly. The choice of shared-nothing architecture and VLL allowed us to compose atomic multi-key operations without using mutexes or spinlocks. This was a major milestone for our PoC, and its performance stood out from other commercial and open-source solutions.
Our second challenge was to engineer more efficient data structures for the new store. To achieve this goal, we based our core hashtable structure on the paper ["Dash: Scalable Hashing on Persistent Memory"](https://arxiv.org/pdf/2003.07302.pdf). The paper itself is centered around the persistent memory domain and is not directly related to main-memory stores.
Nevertheless, it is very much applicable to our problem. It suggested a hashtable design that allowed us to maintain two special properties that are present in the Redis dictionary: a) its incremental hashing ability during datastore growth, and b) its ability to traverse the dictionary under changes using a stateless scan operation. Beyond these two properties,
Dash is much more efficient in its use of CPU and memory. By leveraging Dash's design, we were able to innovate further with the following features:
* Efficient record expiry for TTL records.
* A novel cache eviction algorithm that achieves higher hit rates than other caching strategies like LRU and LFU with **zero memory overhead**.
* A novel **fork-less** snapshotting algorithm.
After we built the foundation for Dragonfly and [were happy with its performance](#benchmarks),
we went on to implement the Redis and Memcached functionality. By now, we have implemented ~130 Redis commands (equivalent to v2.8) and 13 Memcached commands.
And finally, <br>
<em>Our mission is to build a well-designed, ultra-fast, cost-efficient in-memory datastore for cloud workloads that takes advantage of the latest hardware advancements. We intend to address the pain points of current solutions while preserving their product APIs and propositions.
</em>