# Dragonfly Architecture

Dragonfly is a modern replacement for memory stores like Redis and Memcached. It scales vertically on a single instance to support millions of requests per second. It is more memory efficient, has been designed with reliability in mind, and includes a better caching design.

## Threading model

Dragonfly runs as a single process with a multi-threaded architecture. Each Dragonfly thread is indirectly assigned several responsibilities via fibers.

One such responsibility is handling incoming connections. Once a socket listener accepts a client connection, the connection spends its entire lifetime bound to a single thread inside a fiber. Dragonfly is written to be 100% non-blocking; it uses fibers to provide asynchronicity in each thread. One of the essential properties of asynchronicity is that a thread is never blocked as long as it has pending CPU tasks. Dragonfly preserves this property by wrapping each unit of execution that can potentially block on I/O in a fiber. For example, a connection loop runs within a fiber; a function that writes a snapshot runs inside a fiber, and so on.
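To make this concrete, here is a minimal sketch of the idea, not Dragonfly's actual code: two potentially blocking units of work, each wrapped in its own Boost.Fibers fiber on the same thread.

```cpp
#include <boost/fiber/all.hpp>

// Minimal sketch, not Dragonfly's actual code: while one fiber waits
// on I/O, the thread's scheduler switches to another ready fiber.
int main() {
  boost::fibers::fiber connection_loop([] {
    // read requests, dispatch them, write replies
  });
  boost::fibers::fiber snapshot_writer([] {
    // serialize a point-in-time snapshot to disk
  });
  connection_loop.join();
  snapshot_writer.join();
  return 0;
}
```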

As a side comment: asynchronicity and parallelism are different concepts. Node.js, for example, provides asynchronous execution but is single-threaded. Similarly, each Dragonfly thread is asynchronous on its own; therefore, Dragonfly stays responsive to incoming events even when it handles long-running commands like saving to disk or running Lua scripts.

## Thread actors in DF

The DF in-memory database is sharded into N parts, where N is less than or equal to the number of threads in the system. Each database shard is owned and accessed by a single thread. The same thread can handle TCP connections and simultaneously host a database shard. See the diagram below.


Here, our DF process spawns 4 threads, where threads 1 through 3 handle I/O (i.e., manage client connections) and threads 2 through 4 manage DB shards. Thread 2, for example, divides its CPU time between handling incoming requests and processing DB operations on the shard it owns.

So when we say that thread 1 is an I/O thread, we mean that Dragonfly can pin fibers that manage client connections to thread 1. In general, any thread can have many responsibilities that require CPU time; database management and connection handling are only two of those responsibilities.

## Fibers

I suggest reading my intro post about Boost.Fibers to learn more about fibers.

By the way, I want to compliment the Boost.Fibers library; it has been exceptionally well designed: it's unintrusive, lightweight, and efficient. Moreover, its default scheduler can be overridden. In the case of helio, the I/O library that powers Dragonfly, we overrode the Boost.Fibers scheduler to support a shared-nothing architecture and integrate it with the I/O polling loop.
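For illustration, Boost.Fibers exposes this extension point via `boost::fibers::use_scheduling_algorithm`. The snippet below installs a stock algorithm as a stand-in for helio's custom, I/O-aware scheduler:

```cpp
#include <boost/fiber/all.hpp>

int main() {
  // Replace this thread's default round-robin fiber scheduler.
  // helio installs its own algorithm that also drives the I/O polling
  // loop; the stock shared_work algorithm here is only a stand-in.
  boost::fibers::use_scheduling_algorithm<boost::fibers::algo::shared_work>();

  // ... launch fibers as usual; the new scheduler now manages them.
  return 0;
}
```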

Importantly, fibers require bottom-up support in the application layer to preserve their asynchronicity. For example, in the snippet below, a blocking write into fd won't magically allow a fiber to preempt and switch to another fiber. No, the whole thread will be blocked.

```cpp
...
write(fd, buf, 1000000);   // blocking syscall: stalls the whole thread

...
pthread_mutex_lock(...);   // contended lock: also stalls the whole thread
```

Similarly, a pthread_mutex_lock call might block the whole thread, wasting precious CPU time. Therefore, the Dragonfly code uses fiber-friendly primitives for I/O, communication, and coordination. These primitives are supplied by the helio and Boost.Fibers libraries.
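As one example of such a primitive, Boost.Fibers ships a fiber-aware mutex that suspends only the calling fiber on contention. A brief, illustrative sketch (not Dragonfly code):

```cpp
#include <boost/fiber/all.hpp>
#include <mutex>

// Illustrative sketch: boost::fibers::mutex suspends only the calling
// fiber when contended, so the thread keeps running other fibers.
boost::fibers::mutex mu;

void CriticalSection() {
  std::unique_lock<boost::fibers::mutex> lock(mu);
  // ... protected work; on contention this fiber yields, and the
  // thread's scheduler resumes another ready fiber in the meantime.
}
```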

## Life of a command request

This section explains how Dragonfly handles a command within its shared-nothing architecture. Most multi-threaded servers today protect their data structures with mutex locks, but Dragonfly does not. How is that possible?

Inter-thread interactions in Dragonfly occur only via passing messages from thread to thread. For example, consider the following sequence diagram of handling a SET request:

```plantuml
@startuml

actor       User        as A1
boundary    connection  as B1
entity      "Shard K"   as E1
A1 -> B1 : SET KEY VAL
B1 -> E1 : SET KEY VAL / k = HASH(KEY) % N
E1 -> B1 : OK
B1 -> A1 : Response

@enduml
```

Here, the connection fiber resides in a different thread from the one that owns the KEY entity. We use hashing to decide which shard owns which key.
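As a sketch of the k = HASH(KEY) % N step from the diagram (std::hash stands in here; Dragonfly's actual hash function differs):

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Sketch of the k = HASH(KEY) % N step from the diagram above.
// std::hash stands in for Dragonfly's actual hash function.
size_t ShardId(std::string_view key, size_t num_shards) {
  return std::hash<std::string_view>{}(key) % num_shards;
}
```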

Another way to think of this flow is that the connection fiber serves as a coordinator that issues transactional commands to other threads. In this simple example, the external SET command requires a single message passed from the coordinator to the destination shard thread. When thinking of the Dragonfly model in the context of a single command request, I prefer the following diagram to the one above.


Here, a coordinator (or connection fiber) might even reside on one of the threads that coincidentally owns one of the shards. However, it is easier to think of it as a separate entity that never directly accesses any shard data.
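To illustrate this message-passing hop, here is a hedged sketch with invented names (SetMsg, shard_inbox, ShardLoop, CoordinatorSet); Dragonfly's internal API differs:

```cpp
#include <boost/fiber/all.hpp>
#include <string>

// Invented names for illustration; Dragonfly's internal API differs.
// The coordinator fiber sends a message to the shard thread and waits
// for the result on a fiber-friendly future, suspending only itself.
struct SetMsg {
  std::string key, value;
  boost::fibers::promise<bool> done;
};

// One inbox per shard; the capacity must be a power of two.
boost::fibers::buffered_channel<SetMsg*> shard_inbox{64};

// Runs in a fiber pinned to the shard's owning thread.
void ShardLoop() {
  SetMsg* msg;
  while (shard_inbox.pop(msg) == boost::fibers::channel_op_status::success) {
    // Apply SET to the shard-local table. No locks are needed:
    // only this thread ever touches this shard's data.
    msg->done.set_value(true);
  }
}

// Runs in the connection (coordinator) fiber, possibly on another thread.
bool CoordinatorSet(std::string key, std::string value) {
  SetMsg msg{std::move(key), std::move(value), {}};
  auto reply = msg.done.get_future();
  shard_inbox.push(&msg);
  return reply.get();  // suspends this fiber only, not the thread
}
```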

The coordinator serves as a virtualization layer that hides all the complexity of talking to multiple shards. It employs state-of-the-art algorithms to provide atomicity (and strict serializability) semantics for multi-key commands like MSET, MGET, and BLPOP. It also offers strict serializability for Lua scripts and multi-command transactions.

Hiding such complexity is valuable to the end customer, but it comes with some CPU and latency costs. We believe the trade-off is worthwhile given the value that Dragonfly provides.

If you want to dive deep into the Dragonfly architecture without the complexities of its transactional code, it's worth checking out midi-redis, which implements a toy backend supporting the PING, SET, and GET commands.

In fact, Dragonfly grew from that project; they share a common commit history.

By the way, to learn how to build even simpler TCP backends than midi-redis, the helio library provides sample backends like echo_server and ping_iouring_server.cc. These backends reach millions of QPS on multi-core servers, much like Dragonfly and midi-redis do.