RFC-5: BeyondFS Design (#5)

* Add proposal beyond fs design

Signed-off-by: Xuanwo <github@xuanwo.io>

* Assign number

Signed-off-by: Xuanwo <github@xuanwo.io>

* Rename

Signed-off-by: Xuanwo <github@xuanwo.io>

* Update rfc

Signed-off-by: Xuanwo <github@xuanwo.io>

* Update implement

Signed-off-by: Xuanwo <github@xuanwo.io>

* Update rationale

Signed-off-by: Xuanwo <github@xuanwo.io>

* Update tracking issue

Signed-off-by: Xuanwo <github@xuanwo.io>

* code format

Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo 2021-07-14 15:51:36 +08:00, committed by GitHub
parent 7a3b6151b6
commit 236c009d20
5 changed files with 163 additions and 10 deletions


```diff
@@ -20,9 +20,9 @@ vet:
 	@go vet ./...
 	@echo "ok"
 
-build: tidy check
+build: tidy format check
 	@echo "build storage"
-	@go build -o bin/aofs ./cmd/aofs
+	@go build -o bin/beyondfs ./cmd/beyondfs
 	@echo "ok"
 
 test:
@@ -34,8 +34,3 @@ test:
 tidy:
 	@go mod tidy
 	@go mod verify
-
-clean:
-	@echo "clean generated files"
-	@find . -type f -name 'generated.go' -delete
-	@echo "Done"
```

docs/rfcs/0-example.md (new file, 41 lines)

- Author: (fill me in with `name <mail>`, e.g., Xuanwo <github@xuanwo.io>)
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: [beyondstorage/beyond-fs#0](https://github.com/beyondstorage/beyond-fs/issues/0)
- Tracking Issue: [beyondstorage/beyond-fs#0](https://github.com/beyondstorage/beyond-fs/issues/0)

# RFC-0: <proposal name>

- Updates: (delete this part if not applicable)
  - [GSP-20](./20-abc): Deletes something
- Updated By: (delete this part if not applicable)
  - [GSP-10](./10-do-be-do-be-do): Adds something
  - [GSP-1000](./1000-lalala): Deprecates this RFC

## Background

Explain why we are doing this.

Related issues and early discussions can be linked, but the RFC should try to be self-contained if possible.

## Proposal

<proposal's content>

## Rationale

<proposal's rationale content, other implementations>

Possible content:

- Design Principles
- Drawbacks
- Alternative implementations and comparison
- Possible Q&As

## Compatibility

<proposal's compatibility statement>

## Implementation

Explain what steps should be done to implement this proposal.

(new file, 117 lines)

- Author: Xuanwo <github@xuanwo.io>
- Start Date: 2021-07-13
- RFC PR: [beyondstorage/beyond-fs#5](https://github.com/beyondstorage/beyond-fs/issues/5)
- Tracking Issue: [beyondstorage/beyond-fs#6](https://github.com/beyondstorage/beyond-fs/issues/6)

# RFC-5: BeyondFS Design

## Background

[FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) (Filesystem in Userspace) has a wide range of applications, from lightweight data viewing to big data analytics over massive amounts of data. FUSE is a bridge that lets users mount a storage service as a local file system, so applications can use it without refactoring their code.

Many filesystems have already been implemented on FUSE:

- [s3fs-fuse](https://github.com/s3fs-fuse/s3fs-fuse): allows mounting an S3 bucket.
- [goofys](https://github.com/kahing/goofys): a high-performance, POSIX-ish Amazon S3 file system.
- [gcsfuse](https://github.com/GoogleCloudPlatform/gcsfuse/): a user-space file system for interacting with Google Cloud Storage.
- [juicefs](https://github.com/juicedata/juicefs): a distributed POSIX file system built on top of Redis and S3.
- [glusterfs](https://glusterdocs-beta.readthedocs.io/en/latest/overview-concepts/fuse.html): a userspace filesystem; the GlusterFS developers chose this route because getting modules into the Linux kernel is a long and difficult process.
- [ntfs-3g](https://github.com/tuxera/ntfs-3g): provides NTFS support via FUSE.
- [Lustre](https://git.whamcloud.com/fs/lustre-release.git): a parallel distributed file system, generally used for large-scale cluster computing.
- [moosefs](https://github.com/moosefs/moosefs): an open-source, petabyte-scale, fault-tolerant, highly performing, scalable network distributed file system (software-defined storage).
- [sshfs](https://github.com/libfuse/sshfs): a network filesystem client for connecting to SSH servers.
- [hdfs](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html): the Hadoop Distributed File System (HDFS), a distributed file system designed to run on commodity hardware.
- ...

These file systems are designed for different purposes. Here we focus on FUSE services that mount remote storage services locally.

To design a FUSE filesystem, the following decisions have to be made:

### Metadata maintenance

There are mainly two kinds of design:

**No metadata maintenance**

Only metadata that is used locally is cached; all data is stored in the underlying storage services.

This design is adopted by s3fs and goofys.

Under this design, the user cannot do the following things (a sketch of the pattern follows this list):

- Write the same path from different nodes.
- Read data that has been written by another node.
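
To make the trade-off concrete, here is a minimal, self-contained Go sketch of the stat-on-miss pattern such filesystems use. All names here (`Attr`, `localFS`, `lookup`) are hypothetical illustrations, not s3fs or goofys internals:

```go
package main

import (
	"fmt"
	"time"
)

// Attr is a hypothetical stand-in for the metadata one FUSE lookup needs.
type Attr struct {
	Size  int64
	Mtime time.Time
}

// statFn models a Stat call against the underlying service (e.g. an
// S3 HeadObject); in a real filesystem this is a network round trip.
type statFn func(path string) (Attr, error)

// localFS keeps metadata only in a process-local map: a cache, never
// a source of truth. Another node's writes stay invisible until the
// cached entry is evicted, which is exactly the limitation above.
type localFS struct {
	stat  statFn
	cache map[string]Attr
}

func (fs *localFS) lookup(path string) (Attr, error) {
	if a, ok := fs.cache[path]; ok {
		return a, nil // served locally, no round trip
	}
	a, err := fs.stat(path)
	if err != nil {
		return Attr{}, err
	}
	fs.cache[path] = a
	return a, nil
}

func main() {
	fs := &localFS{
		stat: func(path string) (Attr, error) {
			fmt.Println("stat against remote service:", path)
			return Attr{Size: 42, Mtime: time.Now()}, nil
		},
		cache: map[string]Attr{},
	}
	fs.lookup("a.txt") // misses: one remote round trip
	fs.lookup("a.txt") // hits: served from the local cache
}
```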

**Standalone metadata**

Maintain metadata in a separate service, and use the underlying storage services only for data storage.

This design is adopted by juicefs.

Under this design, all file metadata is stored in a separate metadata service, which makes metadata operations far faster than storing metadata in the underlying storage service.

But if the metadata service is down or corrupted, users may fail to read their data, or even lose it.

### POSIX Compatibility

POSIX (the Portable Operating System Interface) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. For a FUSE filesystem, we only care about the APIs for files and directories; more specifically, the FUSE API defined in [libfuse](https://github.com/libfuse/libfuse) / [macFUSE](https://osxfuse.github.io/). Most FUSE filesystems implement only part of the interface: for example, HDFS only supports append writes, and s3fs doesn't support atomic renames of files or directories.
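
For a sense of what this API surface looks like from Go, below is a minimal read-only filesystem sketched against the bazil.org/fuse bindings (one of several Go FUSE libraries; nothing here implies BeyondFS will use this particular binding). It serves a single static file and implements only a handful of the many FUSE operations:

```go
package main

import (
	"context"
	"log"
	"os"

	"bazil.org/fuse"
	"bazil.org/fuse/fs"
)

// helloFS is a minimal read-only filesystem: one directory
// containing a single static file.
type helloFS struct{}

func (helloFS) Root() (fs.Node, error) { return dir{}, nil }

type dir struct{}

func (dir) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 1
	a.Mode = os.ModeDir | 0o555
	return nil
}

func (dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
	if name == "hello" {
		return file{}, nil
	}
	return nil, fuse.ENOENT
}

func (dir) ReadDirAll(ctx context.Context) ([]fuse.Dirent, error) {
	return []fuse.Dirent{{Inode: 2, Name: "hello", Type: fuse.DT_File}}, nil
}

type file struct{}

const content = "hello, world\n"

func (file) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 2
	a.Mode = 0o444
	a.Size = uint64(len(content))
	return nil
}

func (file) ReadAll(ctx context.Context) ([]byte, error) {
	return []byte(content), nil
}

func main() {
	c, err := fuse.Mount("/tmp/mnt") // mountpoint chosen arbitrarily
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	if err := fs.Serve(c, helloFS{}); err != nil {
		log.Fatal(err)
	}
}
```

Even this toy already has to answer `Attr`, `Lookup`, `ReadDirAll`, and `ReadAll`; a POSIX-ish filesystem has dozens more operations to decide on, which is why partial implementations are the norm.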

### Underlying Storage Service

Most FUSE filesystems choose to use existing underlying storage services instead of writing their own. The [BeyondStorage](https://beyondstorage.io/) community has built a vendor-neutral storage library, [go-storage](https://github.com/beyondstorage/go-storage), which allows operating on data across various storage services, from S3, GCS, and OSS to FTP, Google Drive, and IPFS.
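
As a rough illustration of that vendor-neutral interface, the sketch below writes and reads one object through go-storage. The connection-string syntax and call signatures follow go-storage v4 as best I recall them, so treat them as assumptions to verify against the project's docs:

```go
package main

import (
	"bytes"
	"log"

	// The s3 driver registers itself on import; other services
	// (gcs, oss, ftp, ...) work the same way.
	_ "github.com/beyondstorage/go-service-s3/v2"
	"github.com/beyondstorage/go-storage/v4/services"
)

func main() {
	// Build a vendor-neutral Storager from a connection string; switching
	// the scheme (and credentials) is the only change needed to target
	// another service. The exact string syntax is an assumption here.
	store, err := services.NewStoragerFromString(
		"s3://bucket_name/prefix/?credential=hmac:access_key:secret_key&location=us-east-1")
	if err != nil {
		log.Fatal(err)
	}

	// Write an object, then read it back, through the same interface
	// regardless of which backend is behind it.
	data := []byte("hello, beyondfs")
	if _, err := store.Write("path/to/file", bytes.NewReader(data), int64(len(data))); err != nil {
		log.Fatal(err)
	}

	var buf bytes.Buffer
	if _, err := store.Read("path/to/file", &buf); err != nil {
		log.Fatal(err)
	}
	log.Println(buf.String())
}
```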

## Proposal

I propose to build a POSIX-ish file system, called BeyondFS, based on [go-storage](https://github.com/beyondstorage/go-storage).

**BeyondFS only caches metadata**

BeyondFS will not maintain metadata; all metadata will eventually be persisted in the underlying storage services.

**BeyondFS is sharable**

BeyondFS only caches metadata, but the cache can be stored in a distributed key/value store and shared between different nodes, as sketched below.
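
Here is a minimal sketch of how such a pluggable metadata cache could look. The `Cache` interface, the `Entry` fields, and `MemCache` are hypothetical names for illustration, not BeyondFS's actual API; a distributed implementation would satisfy the same interface with calls to a shared key/value store:

```go
package meta

import (
	"errors"
	"sync"
)

// ErrNotFound is returned when a path has no cached entry.
var ErrNotFound = errors.New("meta: entry not found")

// Entry is a hypothetical cached metadata record for one path.
type Entry struct {
	Path  string
	Size  int64
	Mode  uint32
	Mtime int64 // unix seconds
}

// Cache is the pluggable abstraction: the same interface can be backed
// by process-local memory on a single node, or by a shared key/value
// store (e.g. etcd or Redis) when several nodes mount the same bucket.
type Cache interface {
	Get(path string) (Entry, error)
	Set(path string, e Entry) error
	Delete(path string) error
}

// MemCache is the zero-dependency single-node implementation.
type MemCache struct {
	mu sync.RWMutex
	m  map[string]Entry
}

func NewMemCache() *MemCache {
	return &MemCache{m: make(map[string]Entry)}
}

func (c *MemCache) Get(path string) (Entry, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.m[path]
	if !ok {
		return Entry{}, ErrNotFound
	}
	return e, nil
}

func (c *MemCache) Set(path string, e Entry) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[path] = e
	return nil
}

func (c *MemCache) Delete(path string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, path)
	return nil
}
```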

**BeyondFS is POSIX-ish**

BeyondFS will try its best to satisfy POSIX requirements but won't commit to being fully POSIX-compatible. That means BeyondFS will give up some hard-to-implement features and won't implement features that depend on capabilities the underlying storage service lacks.

BeyondFS is designed for these scenarios:

- Write Once Read Many: data is written once and read by other clients many times.
- Rarely Random Write: data is written in one pass instead of through random writes.
- Small Amount List: applications maintain their own index and don't depend on `readdir` to fetch file lists.

## Rationale

### Why not maintain metadata?

Maintaining metadata requires an extra metadata service, which could go down or become corrupted.

It also forces users to read and write data through that service: data stored in the underlying storage services is no longer directly readable.

[BeyondStorage](https://beyondstorage.io/) focuses on providing cross-cloud data services and intends to build a world where data flows freely. Maintaining metadata in an extra service is another form of vendor lock-in, so it doesn't fit our direction.

### Why not focus on a standalone machine?

Making BeyondFS sharable doesn't mean we will sacrifice users who only have a single node. Distributed caching is a natural step forward for BeyondFS.

Users who don't care about a distributed cache can use the local memory mode without extra cost. For example, users may run BeyondFS on thousands of nodes where each node only writes to its own unique path and never reads other nodes' data.

### Why not fully POSIX-compatible?

It's impossible under our current design.

For example, without a separate metadata service and data slicing, it's impossible to implement random writes on top of S3: an object can only be replaced wholesale, not patched in place.

We choose to forgo these capabilities in exchange for higher throughput and concurrency.

## Compatibility

BeyondFS is a new project, so this proposal introduces no compatibility concerns.

## Implementation

First, we will implement a POSIX-ish file system that only caches metadata locally.

Then, we will focus on performance improvements, including prefetch and cache logic; a toy sketch of the prefetch idea follows.

Finally, we will extend our metadata cache logic to other key/value systems.
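
As a toy illustration of the prefetch work in the second step, the sketch below keeps one chunk in flight while the previous one is consumed. The chunk size, the one-slot pipeline, and all names are illustrative assumptions, not the planned design:

```go
package main

import "fmt"

// fetch models one ranged read against the underlying service
// (think of an S3 ranged GET); the chunking scheme is hypothetical.
func fetch(chunk int) []byte {
	fmt.Println("remote fetch of chunk", chunk)
	return make([]byte, 4<<20) // 4 MiB per chunk, chosen arbitrarily
}

// prefetchReader keeps exactly one chunk in flight while the caller
// consumes the current one, so a sequential scan overlaps network
// latency with reading instead of paying it once per chunk.
type prefetchReader struct {
	next     int         // index of the chunk currently in flight
	inFlight chan []byte // delivers the background fetch's result
}

func newPrefetchReader() *prefetchReader {
	r := &prefetchReader{inFlight: make(chan []byte, 1)}
	go func() { r.inFlight <- fetch(0) }() // start fetching immediately
	return r
}

func (r *prefetchReader) ReadChunk() []byte {
	buf := <-r.inFlight // wait for the chunk already in flight
	r.next++
	c := r.next
	go func() { r.inFlight <- fetch(c) }() // kick off the next one
	return buf
}

func main() {
	r := newPrefetchReader()
	for i := 0; i < 3; i++ {
		_ = r.ReadChunk() // chunks 0, 1, 2; the next chunk is always in flight
	}
}
```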

go.mod (4 lines changed)

```diff
@@ -1,3 +1,3 @@
-module github.com/aos-dev/go-fs
+module github.com/beyondstorage/beyond-fs
 
-go 1.16
+go 1.15
```