SeaweedFS: A Fast Distributed Storage System for Blobs, Objects, Files

SeaweedFS

SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible entirely thanks to the support of these awesome backers. If you'd like to grow SeaweedFS even stronger, please consider joining our sponsors on Patreon.


Quick Start for S3 API on Docker

docker run -p 8333:8333 chrislusf/seaweedfs server -s3
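
Once the container is running, any S3-compatible client can talk to port 8333. For example, with the AWS CLI (a minimal sketch; the bucket name is arbitrary, and if you have not configured S3 credentials in SeaweedFS, dummy AWS credentials or --no-sign-request will do):

aws --endpoint-url http://localhost:8333 s3 mb s3://newbucket
aws --endpoint-url http://localhost:8333 s3 cp ./myphoto.jpg s3://newbucket/
aws --endpoint-url http://localhost:8333 s3 ls s3://newbucket/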

Quick Start with Single Binary

  • Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip it to get a single binary file weed or weed.exe
  • Run weed server -dir=/some/data/dir -s3 to start one master, one volume server, one filer, and one S3 gateway.

Also, to increase capacity, just add more volume servers by running weed volume -dir="/some/data/dir2" -mserver="<master_host>:9333" -port=8081 locally, or on a different machine, or on thousands of machines. That is it!
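
To verify that new volume servers have registered, you can ask the master for its current topology via its status endpoint (it returns JSON; the ?pretty=y parameter just formats the output):

curl "http://localhost:9333/dir/status?pretty=y"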

Quick Start SeaweedFS S3 on AWS

Introduction

SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:

  1. to store billions of files!
  2. to serve the files fast!

SeaweedFS started as an Object Store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation).

There are only 40 bytes of disk storage overhead for each file's metadata. It is so simple, with O(1) disk reads, that you are welcome to challenge the performance with your actual use cases.

SeaweedFS started by implementing Facebook's Haystack design paper. Also, SeaweedFS implements erasure coding with ideas from f4: Facebook’s Warm BLOB Storage System, and has a lot of similarities with Facebook’s Tectonic Filesystem.

On top of the object store, the optional Filer can support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySql, Postgres, Redis, Cassandra, HBase, Mongodb, Elastic Search, LevelDB, RocksDB, Sqlite, MemSql, TiDB, Etcd, CockroachDB, YDB, etc.
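
For example, a filer plus a FUSE mount on top of it can be started roughly like this (a minimal sketch assuming a master on the default port 9333, the filer's default port 8888, and FUSE available on the host; flag details may vary by version):

 weed filer -master=localhost:9333
 weed mount -filer=localhost:8888 -dir=/mnt/seaweedfs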

For any distributed key-value store, large values can be offloaded to SeaweedFS. With its fast access speed and linearly scalable capacity, SeaweedFS can work as a distributed Key-Large-Value store.

SeaweedFS can transparently integrate with the cloud. With hot data on local cluster, and warm data on the cloud with O(1) access time, SeaweedFS can achieve both fast local access time and elastic cloud storage capacity. What's more, the cloud storage access API cost is minimized. Faster and Cheaper than direct cloud storage!

Back to TOC

Additional Features

  • Can choose no replication or different replication levels, rack and data center aware.
  • Automatic master servers failover - no single point of failure (SPOF).
  • Automatic Gzip compression depending on file MIME type.
  • Automatic compaction to reclaim disk space after deletion or update.
  • Automatic entry TTL expiration.
  • Any server with some disk space can be added to the total storage space.
  • Adding/Removing servers does not cause any data re-balancing unless triggered by admin commands.
  • Optional picture resizing.
  • Support ETag, Accept-Range, Last-Modified, etc.
  • Support in-memory/leveldb/readonly mode tuning for memory/performance balance.
  • Support rebalancing the writable and readonly volumes.
  • Customizable Multiple Storage Tiers: Customizable storage disk types to balance performance and cost.
  • Transparent cloud integration: unlimited capacity via tiered cloud storage for warm data.
  • Erasure Coding for warm storage: Rack-aware 10.4 erasure coding reduces storage cost and increases availability.

Back to TOC

Filer Features

Kubernetes

Back to TOC

Example: Using Seaweed Object Store

By default, the master node runs on port 9333, and the volume nodes run on port 8080. Let's start one master node, and two volume nodes on port 8080 and 8081. Ideally, they should be started from different machines. We'll use localhost as an example.

SeaweedFS uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.

Start Master Server

> ./weed master

Start Volume Servers

> weed volume -dir="/tmp/data1" -max=5  -mserver="localhost:9333" -port=8080 &
> weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &

Write File

To upload a file: first, send an HTTP POST, PUT, or GET request to /dir/assign to get an fid and a volume server URL:

> curl http://localhost:9333/dir/assign
{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}

Second, to store the file content, send an HTTP multipart POST request to url + '/' + fid from the response:

> curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"name":"myphoto.jpg","size":43234,"eTag":"1cc0118e"}

To update, send another POST request with updated file content.

For deletion, send an HTTP DELETE request to the same url + '/' + fid:

> curl -X DELETE http://127.0.0.1:8080/3,01637037d6

Save File Id

Now, you can save the fid, 3,01637037d6 in this case, to a database field.

The number 3 at the start represents a volume id. After the comma, it's one file key, 01, and a file cookie, 637037d6.

The volume id is an unsigned 32-bit integer. The file key is an unsigned 64-bit integer. The file cookie is an unsigned 32-bit integer, used to prevent URL guessing.

The file key and file cookie are both coded in hex. You can store the <volume id, file key, file cookie> tuple in your own format, or simply store the fid as a string.

If stored as a string, in theory, you would need 8+1+16+8=33 bytes. A char(33) would be enough, if not more than enough, since most uses will not need 2^32 volumes.

If space is really a concern, you can store the file id in your own format. You would need one 4-byte integer for volume id, 8-byte long number for file key, and a 4-byte integer for the file cookie. So 16 bytes are more than enough.

Read File

Here is an example of how to render the URL.

First look up the volume server's URLs by the file's volumeId:

> curl http://localhost:9333/dir/lookup?volumeId=3
{"volumeId":"3","locations":[{"publicUrl":"localhost:8080","url":"localhost:8080"}]}

Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.

Now you can take the public URL, render the URL or directly read from the volume server via URL:

 http://localhost:8080/3,01637037d6.jpg

Notice we add a file extension ".jpg" here. It's optional and just one way for the client to specify the file content type.

If you want a nicer URL, you can use one of these alternative URL formats:

 http://localhost:8080/3/01637037d6/my_preferred_name.jpg
 http://localhost:8080/3/01637037d6.jpg
 http://localhost:8080/3,01637037d6.jpg
 http://localhost:8080/3/01637037d6
 http://localhost:8080/3,01637037d6

If you want to get a scaled version of an image, you can add some params:

http://localhost:8080/3/01637037d6.jpg?height=200&width=200
http://localhost:8080/3/01637037d6.jpg?height=200&width=200&mode=fit
http://localhost:8080/3/01637037d6.jpg?height=200&width=200&mode=fill

Rack-Aware and Data Center-Aware Replication

SeaweedFS applies the replication strategy at a volume level. So, when you are getting a file id, you can specify the replication strategy. For example:

curl http://localhost:9333/dir/assign?replication=001

The replication parameter options are:

000: no replication
001: replicate once on the same rack
010: replicate once on a different rack, but same data center
100: replicate once on a different data center
200: replicate twice on two different data centers
110: replicate once on a different rack, and once on a different data center

More details about replication can be found on the wiki.

You can also set the default replication strategy when starting the master server.
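
For example, to make 001 the cluster-wide default (assuming the current -defaultReplication flag name; check weed master -h for your version):

 weed master -defaultReplication=001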

Allocate File Key on Specific Data Center

Volume servers can be started with a specific data center name:

 weed volume -dir=/tmp/1 -port=8080 -dataCenter=dc1
 weed volume -dir=/tmp/2 -port=8081 -dataCenter=dc2

When requesting a file key, an optional "dataCenter" parameter can limit the assigned volume to the specific data center. For example, this specifies that the assigned volume should be limited to 'dc1':

 http://localhost:9333/dir/assign?dataCenter=dc1

Other Features

Back to TOC

Object Store Architecture

Usually distributed file systems split each file into chunks; a central master keeps a mapping from filenames and chunk indices to chunk handles, and also tracks which chunks each chunk server has.

The main drawback is that the central master can't handle many small files efficiently, and since all read requests need to go through the chunk master, it might not scale well for many concurrent users.

Instead of managing chunks, SeaweedFS manages data volumes in the master server. Each data volume is 32GB in size, and can hold a lot of files. And each storage node can have many data volumes. So the master node only needs to store the metadata about the volumes, which is a fairly small amount of data and is generally stable.

The actual file metadata is stored in each volume on volume servers. Since each volume server only manages metadata of files on its own disk, with only 16 bytes for each file, all file access can read file metadata just from memory and only needs one disk operation to actually read file data.

For comparison, consider that an xfs inode structure in Linux is 536 bytes.

Master Server and Volume Server

The architecture is fairly simple. The actual data is stored in volumes on storage nodes. One volume server can have multiple volumes, and can support both read and write access with basic authentication.

All volumes are managed by a master server. The master server contains the volume id to volume server mapping. This is fairly static information, and can be easily cached.

On each write request, the master server also generates a file key, which is a growing 64-bit unsigned integer. Since write requests are not generally as frequent as read requests, one master server should be able to handle the concurrency well.

Write and Read files

When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node URL) for the file. The client then contacts the volume node and POSTs the file content.

When a client needs to read a file based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node URL, volume node public URL), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.

Please see the example for details on the write-read process.

Storage Size

In the current implementation, each volume can hold 32 gibibytes (32GiB or 8x2^32 bytes). This is because we align content to 8 bytes. We can easily increase this to 64GiB, or 128GiB, or more, by changing 2 lines of code, at the cost of some wasted padding space due to alignment.

There can be 2^32 (about 4 billion) volumes. So the total system size is 2^32 volumes x 32GiB per volume, which is 128 exbibytes (128EiB or 2^67 bytes).

Each individual file size is limited to the volume size.

Saving memory

All file metadata stored on a volume server is readable from memory without disk access. Each file takes just a 16-byte map entry of <64bit key, 32bit offset, 32bit size>. Of course, each map entry has its own space cost for the map. But usually the disk space runs out before the memory does.

Tiered Storage to the cloud

The local volume servers are much faster, while cloud storage has elastic capacity and is actually more cost-efficient if not accessed often (usually free to upload, but relatively costly to access). With the append-only structure and O(1) access time, SeaweedFS can take advantage of both local and cloud storage by offloading the warm data to the cloud.

Usually hot data is fresh and warm data is old. SeaweedFS puts newly created volumes on local servers, and optionally uploads the older volumes to the cloud. If the older data is accessed less often, this literally gives you unlimited capacity with a limited number of local servers, while staying fast for new data.

With the O(1) access time, the network latency cost is kept at minimum.

If the hot/warm data is split as 20/80, with 20 servers, you can achieve storage capacity of 100 servers. That's a cost saving of 80%! Or you can repurpose the 80 servers to store new data also, and get 5X storage throughput.

Back to TOC

Compared to Other File Systems

Most other distributed file systems seem more complicated than necessary.

SeaweedFS is meant to be fast and simple, in both setup and operation. If you do not understand how it works when you reach here, we've failed! Please raise an issue with any questions or update this file with clarifications.

SeaweedFS is constantly moving forward. Same with other systems. These comparisons can be outdated quickly. Please help to keep them updated.

Back to TOC

Compared to HDFS

HDFS uses the chunk approach for each file, and is ideal for storing large files.

SeaweedFS is ideal for serving relatively smaller files quickly and concurrently.

SeaweedFS can also store extra large files by splitting them into manageable data chunks, and storing the file ids of the data chunks in a meta chunk. This is managed by the "weed upload/download" tool, and the weed master and volume servers are agnostic about it.

Back to TOC

Compared to GlusterFS, Ceph

The architectures are mostly the same. SeaweedFS aims to store and read files fast, with a simple and flat architecture. The main differences are

  • SeaweedFS optimizes for small files, ensuring O(1) disk seek operation, and can also handle large files.
  • SeaweedFS statically assigns a volume id for a file. Locating file content becomes just a lookup of the volume id, which can be easily cached.
  • SeaweedFS Filer metadata store can be any well-known and proven data store, e.g., Redis, Cassandra, HBase, Mongodb, Elastic Search, MySql, Postgres, Sqlite, MemSql, TiDB, CockroachDB, Etcd, YDB etc, and is easy to customize.
  • SeaweedFS Volume server also communicates directly with clients via HTTP, supporting range queries, direct uploads, etc.
| System | File Metadata | File Content Read | POSIX | REST API | Optimized for large number of small files |
| --- | --- | --- | --- | --- | --- |
| SeaweedFS | lookup volume id, cacheable | O(1) disk seek | | Yes | Yes |
| SeaweedFS Filer | Linearly Scalable, Customizable | O(1) disk seek | FUSE | Yes | Yes |
| GlusterFS | hashing | | FUSE, NFS | | |
| Ceph | hashing + rules | | FUSE | Yes | |
| MooseFS | in memory | | FUSE | | No |
| MinIO | separate meta file for each file | | | Yes | No |

Back to TOC

Compared to GlusterFS

GlusterFS stores files, both directories and content, in configurable volumes called "bricks".

GlusterFS hashes the path and filename into ids, assigns them to virtual volumes, which are then mapped to "bricks".

Back to TOC

Compared to MooseFS

MooseFS chooses to neglect the small file issue. From the moosefs 3.0 manual, "even a small file will occupy 64KiB plus additionally 4KiB of checksums and 1KiB for the header", because it "was initially designed for keeping large amounts (like several thousands) of very big files".

The MooseFS Master Server keeps all metadata in memory, which is the same issue as the HDFS namenode.

Back to TOC

Compared to Ceph

Ceph can be set up similarly to SeaweedFS as a key->blob store. It is much more complicated, with the need to support layers on top of it. Here is a more detailed comparison.

SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage.

Like SeaweedFS, Ceph is also layered on top of an object store, RADOS in Ceph's case. Ceph is rather complicated, with mixed reviews.

Ceph uses CRUSH hashing to automatically manage data placement, which is efficient for locating the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration can cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning it to any writable volume. If writing to one volume fails, just pick another writable volume. Adding more volumes is also as simple as it can be.

SeaweedFS is optimized for small files. Small files are stored as one continuous block of content, with at most 8 unused bytes between files. Small file access is O(1) disk read.

SeaweedFS Filer uses off-the-shelf stores, such as MySql, Postgres, Sqlite, Mongodb, Redis, Elastic Search, Cassandra, HBase, MemSql, TiDB, CockroachDB, Etcd, YDB, to manage file directories. These stores are proven, scalable, and easier to manage.

| SeaweedFS | comparable to Ceph | advantage |
| --- | --- | --- |
| Master | MDS | simpler |
| Volume | OSD | optimized for small files |
| Filer | Ceph FS | linearly scalable, customizable, O(1) or O(logN) |

Back to TOC

Compared to MinIO

MinIO follows AWS S3 closely and is ideal for testing the S3 API. It has a good UI, policies, versioning, etc. SeaweedFS is trying to catch up here. It is also possible to put MinIO as a gateway in front of SeaweedFS later.

MinIO metadata is stored in simple files. Each file write incurs extra writes to the corresponding meta file.

MinIO has no optimization for lots of small files. Files are simply stored as-is on local disks. Combined with the extra meta file and the shards for erasure coding, this only amplifies the LOSF problem.

MinIO requires multiple disk reads to read one file. SeaweedFS has O(1) disk reads, even for erasure coded files.

MinIO has full-time erasure coding. SeaweedFS uses replication on hot data for faster speed and optionally applies erasure coding on warm data.

MinIO does not have POSIX-like API support.

MinIO has specific requirements on storage layout. Adjusting capacity is not flexible. In SeaweedFS, just start one volume server pointing to the master. That's all.

Dev Plan

  • More tools and documentation, on how to manage and scale the system.
  • Read and write stream data.
  • Support structured data.

This is a super exciting project! And we need helpers and support!

Back to TOC

Installation Guide

Installation guide for users who are not familiar with golang

Step 1: install Go on your machine and set up the environment by following the instructions at:

https://golang.org/doc/install

make sure to define your $GOPATH

Step 2: checkout this repo:

git clone https://github.com/seaweedfs/seaweedfs.git

Step 3: download, compile, and install the project by executing the following command

cd seaweedfs/weed && make install

Once this is done, you will find the executable "weed" in your $GOPATH/bin directory.
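
You can verify the build (assuming $GOPATH/bin is on your PATH):

weed version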

Back to TOC

Disk Related Topics

Hard Drive Performance

When testing read performance on SeaweedFS, it basically becomes a performance test of your hard drive's random read speed. Hard drives usually get 100MB/s~200MB/s.

Solid State Disk

To modify or delete small files, an SSD must erase a whole block at a time and move the content in existing blocks to a new block. SSDs are fast when brand new, but get fragmented over time and need garbage collection, compacting blocks. SeaweedFS is friendly to SSDs since it is append-only. Deletion and compaction are done at the volume level in the background, neither slowing reads nor causing fragmentation.

Back to TOC

Benchmark

My own unscientific single-machine results on a MacBook with a solid state disk, CPU: 1 Intel Core i7 2.6GHz.
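
These numbers come from the bundled benchmark tool, which with its defaults writes and then randomly reads roughly 1 million 1KB files against a master on localhost:9333 (a hedged invocation; see weed benchmark -h for the concurrency and file-count flags):

> ./weed benchmark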

Write 1 million 1KB files:

Concurrency Level:      16
Time taken for tests:   66.753 seconds
Completed requests:      1048576
Failed requests:        0
Total transferred:      1106789009 bytes
Requests per second:    15708.23 [#/sec]
Transfer rate:          16191.69 [Kbytes/sec]

Connection Times (ms)
              min      avg        max      std
Total:        0.3      1.0       84.3      0.9

Percentage of the requests served within a certain time (ms)
   50%      0.8 ms
   66%      1.0 ms
   75%      1.1 ms
   80%      1.2 ms
   90%      1.4 ms
   95%      1.7 ms
   98%      2.1 ms
   99%      2.6 ms
  100%     84.3 ms

Randomly read 1 million files:

Concurrency Level:      16
Time taken for tests:   22.301 seconds
Completed requests:      1048576
Failed requests:        0
Total transferred:      1106812873 bytes
Requests per second:    47019.38 [#/sec]
Transfer rate:          48467.57 [Kbytes/sec]

Connection Times (ms)
              min      avg        max      std
Total:        0.0      0.3       54.1      0.2

Percentage of the requests served within a certain time (ms)
   50%      0.3 ms
   90%      0.4 ms
   98%      0.6 ms
   99%      0.7 ms
  100%     54.1 ms

Back to TOC

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The text of this page is available for modification and reuse under the terms of the Creative Commons Attribution-Sharealike 3.0 Unported License and the GNU Free Documentation License (unversioned, with no invariant sections, front-cover texts, or back-cover texts).

Back to TOC

Stargazers over time




Sponsor SeaweedFS via Patreon


Download Details:

Author: Seaweedfs
Source Code: https://github.com/seaweedfs/seaweedfs 
License: Apache-2.0 license

#go #golang #kubernetes #s3 

SeaweedFS: A Fast Distributed Storage System for Blobs, Objects, Files

A High-performance, POSIX-ish Amazon S3 File System Written in Go

Goofys is a high-performance, POSIX-ish Amazon S3 file system written in Go

Overview

Goofys allows you to mount an S3 bucket as a filey system.

It's a Filey System instead of a File System because goofys strives for performance first and POSIX second. In particular, things that are difficult to support on S3 or would translate into more than one round-trip either fail (random writes) or are faked (no per-file permissions). Goofys does not have an on-disk data cache (check out catfs), and the consistency model is close-to-open.

Installation

On Linux, install via pre-built binaries. You may also need to install fuse-utils first.
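
One possible way to grab the pre-built binary (assuming the Linux release asset is simply named goofys; check the releases page for the exact asset name):

$ wget https://github.com/kahing/goofys/releases/latest/download/goofys
$ chmod +x goofys
$ sudo mv goofys /usr/local/bin/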

On macOS, install via Homebrew:

$ brew cask install osxfuse
$ brew install goofys
  • Or build from source with Go 1.10 or later:
$ export GOPATH=$HOME/work
$ go get github.com/kahing/goofys
$ go install github.com/kahing/goofys

Usage

$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKID1234567890
aws_secret_access_key = MY-SECRET-KEY
$ $GOPATH/bin/goofys <bucket> <mountpoint>
$ $GOPATH/bin/goofys <bucket:prefix> <mountpoint> # if you only want to mount objects under a prefix

Users can also configure credentials via the AWS CLI or the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

To mount an S3 bucket on startup, make sure the credentials are configured for root, and then add this to /etc/fstab:

goofys#bucket   /mnt/mountpoint        fuse     _netdev,allow_other,--file-mode=0666,--dir-mode=0777    0       0

See also: Instruction for Azure Blob Storage, Azure Data Lake Gen1, and Azure Data Lake Gen2.

Got more questions? Check out questions other people asked

Benchmark

Using --stat-cache-ttl 1s --type-cache-ttl 1s for goofys and -ostat_cache_expire=1 for s3fs to simulate cold runs. Details for the benchmark can be found in bench.sh. Raw data is available as well. The test was run on an EC2 m5.4xlarge in us-west-2a connected to a bucket in us-west-2. Units are seconds.

Benchmark result

To run the benchmark, configure EC2's instance role to be able to write to $TESTBUCKET, and then do:

$ sudo docker run -e BUCKET=$TESTBUCKET -e CACHE=false --rm --privileged --net=host -v /tmp/cache:/tmp/cache kahing/goofys-bench
# result will be written to $TESTBUCKET

See also: cached benchmark result and result on Azure.

Current Status

goofys has been tested under Linux and macOS.

List of non-POSIX behaviors/limitations:

  • only sequential writes supported
  • does not store file mode/owner/group
    • use --(dir|file)-mode or --(uid|gid) options
  • does not support symlink or hardlink
  • ctime and atime are always the same as mtime
  • cannot rename directories with more than 1000 children
  • unlink returns success even if file is not present
  • fsync is ignored, files are only flushed on close

In addition to the items above, the following are supportable but not yet implemented:

  • creating files larger than 1TB

Compatibility with non-AWS S3

goofys has been tested with the following non-AWS S3 providers:

  • Amplidata / WD ActiveScale
  • Ceph (ex: Digital Ocean Spaces, DreamObjects, gridscale)
  • EdgeFS
  • EMC Atmos
  • Google Cloud Storage
  • Minio (limited)
  • OpenStack Swift
  • S3Proxy
  • Scaleway
  • Wasabi

Additionally, goofys also works with the following non-S3 object stores:

  • Azure Blob Storage
  • Azure Data Lake Gen1
  • Azure Data Lake Gen2

References

Download Details:

Author: Kahing
Source Code: https://github.com/kahing/goofys 
License: Apache-2.0 license

#go #golang #filesystem #aws #s3

A High-performance, POSIX-ish Amazon S3 File System Written in Go

JuiceFS: A Distributed POSIX File System Built on top Of Redis and S3

JuiceFS is a high-performance POSIX file system released under Apache License 2.0, particularly designed for the cloud-native environment. The data, stored via JuiceFS, will be persisted in object storage (e.g. Amazon S3), and the corresponding metadata can be persisted in various database engines such as Redis, MySQL, and TiKV based on the scenarios and requirements.

With JuiceFS, massive cloud storage can be directly connected to big data, machine learning, artificial intelligence, and various application platforms in production environments. Without modifying code, the massive cloud storage can be used as efficiently as local storage.

Highlighted Features

  1. Fully POSIX-compatible: Use as a local file system, seamlessly docking with existing applications without breaking business workflow.
  2. Fully Hadoop-compatible: JuiceFS' Hadoop Java SDK is compatible with Hadoop 2.x and Hadoop 3.x as well as a variety of components in the Hadoop ecosystems.
  3. S3-compatible: JuiceFS' S3 Gateway provides an S3-compatible interface.
  4. Cloud Native: A Kubernetes CSI Driver is provided for easily using JuiceFS in Kubernetes.
  5. Shareable: JuiceFS is a shared file storage that can be read and written by thousands of clients.
  6. Strong Consistency: The confirmed modification will be immediately visible on all the servers mounted with the same file system.
  7. Outstanding Performance: The latency can be as low as a few milliseconds, and the throughput can be expanded nearly unlimitedly (depending on the size of the object storage). Test results
  8. Data Encryption: Supports data encryption in transit and at rest (please refer to the guide for more information).
  9. Global File Locks: JuiceFS supports both BSD locks (flock) and POSIX record locks (fcntl).
  10. Data Compression: JuiceFS supports LZ4 or Zstandard to compress all your data.

Architecture

JuiceFS consists of three parts:

  1. JuiceFS Client: Coordinates object storage and metadata storage engine as well as implementation of file system interfaces such as POSIX, Hadoop, Kubernetes, and S3 gateway.
  2. Data Storage: Stores the data, with support for a variety of data storage media, e.g., local disk, public or private cloud object storage, and HDFS.
  3. Metadata Engine: Stores the corresponding metadata that contains information such as file name, file size, permission group, creation and modification time, and directory structure, with support for different metadata engines, e.g., Redis, MySQL, SQLite and TiKV.

JuiceFS Architecture

JuiceFS can store the metadata of file system on Redis, which is a fast, open-source, in-memory key-value data storage, particularly suitable for storing metadata; meanwhile, all the data will be stored in object storage through JuiceFS client. Learn more

JuiceFS Storage Format

Each file stored in JuiceFS is split into "Chunks" at a fixed size with a default upper limit of 64 MiB. Each Chunk is composed of one or more "Slices", and the length of a Slice varies depending on how the file is written. Each Slice is composed of fixed-size "Blocks", which are 4 MiB by default. These blocks are stored in object storage in the end; at the same time, the metadata of the file and its Chunks, Slices, and Blocks is stored in metadata engines via JuiceFS. Learn more

How JuiceFS stores your files

When using JuiceFS, files will eventually be split into Chunks, Slices and Blocks and stored in object storage. Therefore, the source files stored in JuiceFS cannot be found in the file browser of the object storage platform; instead, there are only a chunks directory and a bunch of digitally numbered directories and files in the bucket. Don't panic! This is just the secret of the high-performance operation of JuiceFS!

Getting Started

Before you begin, make sure you have:

  1. Redis database for metadata storage
  2. Object storage for storing data blocks
  3. JuiceFS Client downloaded and installed

Please refer to Quick Start Guide to start using JuiceFS right away!
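
As a rough sketch of the flow (the bucket URL, access keys, Redis address, and the file system name myjfs below are placeholders; the Quick Start Guide is the authoritative reference):

$ juicefs format --storage s3 \
    --bucket https://mybucket.s3.us-east-1.amazonaws.com \
    --access-key <ACCESS_KEY> --secret-key <SECRET_KEY> \
    redis://127.0.0.1:6379/1 myjfs
$ juicefs mount -d redis://127.0.0.1:6379/1 /mnt/jfs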

Command Reference

Check out all the command line options in command reference.

Kubernetes

It is also very easy to use JuiceFS on Kubernetes. Please find more information here.

Hadoop Java SDK

If you want to use JuiceFS in Hadoop, check the Hadoop Java SDK.

Advanced Topics

Please refer to JuiceFS Document Center for more information.

POSIX Compatibility

JuiceFS has passed all of the compatibility tests (8813 in total) in the latest pjdfstest.

All tests successful.

Test Summary Report
-------------------
/root/soft/pjdfstest/tests/chown/00.t          (Wstat: 0 Tests: 1323 Failed: 0)
  TODO passed:   693, 697, 708-709, 714-715, 729, 733
Files=235, Tests=8813, 233 wallclock secs ( 2.77 usr  0.38 sys +  2.57 cusr  3.93 csys =  9.65 CPU)
Result: PASS

Aside from the POSIX features covered by pjdfstest, JuiceFS also provides:

  • Close-to-open consistency. Once a file is written and closed, the written data is guaranteed to be visible in subsequent opens and reads. Within the same mount point, all the written data can be read immediately.
  • Rename and all other metadata operations are atomic, which are guaranteed by Redis transaction.
  • Opened files remain accessible after unlink from the same mount point.
  • Mmap (tested with FSx).
  • Fallocate with punch hole support.
  • Extended attributes (xattr).
  • BSD locks (flock).
  • POSIX record locks (fcntl).

Performance Benchmark

Basic benchmark

JuiceFS provides a subcommand that can run a few basic benchmarks to help you understand how it works in your environment:

JuiceFS Bench
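
For example, it can be run against an existing mount point (assuming JuiceFS is mounted at /mnt/jfs):

$ juicefs bench /mnt/jfs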

Throughput

A sequential read/write benchmark has also been performed on JuiceFS, EFS and S3FS by fio.

Sequential Read Write Benchmark

The result figure above shows that JuiceFS can provide 10X more throughput than the other two (see more details).

Metadata IOPS

A simple mdtest benchmark has been performed on JuiceFS, EFS and S3FS by mdtest.

Metadata Benchmark

The result shows that JuiceFS can provide significantly more metadata IOPS than the other two (see more details).

Analyze performance

There is a virtual file called .accesslog in the root of JuiceFS to show all the details of file system operations and the time they take, for example:

$ cat /jfs/.accesslog
2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
2021.01.15 08:26:11.003473 [uid:0,gid:0,pid:4403] write (17675,198,997439): OK <0.000014>
2021.01.15 08:26:11.003616 [uid:0,gid:0,pid:4403] write (17666,390,951582): OK <0.000006>

The last number on each line is the time (in seconds) that the current operation takes. You can directly use this to debug and analyze performance issues, or try ./juicefs profile /jfs to monitor real time statistics. Please run ./juicefs profile -h or refer to here to learn more about this subcommand.

Supported Object Storage

  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage
  • Alibaba Cloud Object Storage Service (OSS)
  • Tencent Cloud Object Storage (COS)
  • QingStor Object Storage
  • Ceph RGW
  • MinIO
  • Local disk
  • Redis

JuiceFS supports almost all object storage services. Learn more.

Who is using

JuiceFS is production ready and used by thousands of machines in production. A list of users has been assembled and documented here. In addition JuiceFS has several collaborative projects that integrate with other open source projects, which we have documented here. If you are also using JuiceFS, please feel free to let us know, and you are welcome to share your specific experience with everyone.

The storage format is stable and will be supported by all future releases.

Roadmap

  • Support FoundationDB as metadata engine
  • Directory quotas
  • User and group quotas
  • Snapshot
  • Write once read many (WORM)

Reporting Issues

We use GitHub Issues to track community reported issues. You can also contact the community for any questions.

Contributing

Thank you for your contribution! Please refer to the CONTRIBUTING.md for more information.

Community

Welcome to join the Discussions and the Slack channel to connect with JuiceFS team members and other users.

Usage Tracking

JuiceFS collects anonymous usage data by default to help us better understand how the community is using JuiceFS. Only core metrics (e.g. version number) will be reported, and user data and any other sensitive data will not be included. The related code can be viewed here.

You could also disable reporting easily by command line option --no-usage-report:

juicefs mount --no-usage-report

Credits

The design of JuiceFS was inspired by Google File System, HDFS and MooseFS. Thanks for their great work!

FAQ

Why doesn't JuiceFS support XXX object storage?

JuiceFS supports many object storage services. Please check out this list first. If the object storage you want to use is compatible with S3, you can treat it as S3. Otherwise, try reporting an issue.

Can I use Redis Cluster as metadata engine?

Yes. Since v1.0.0 Beta3 JuiceFS supports the use of Redis Cluster as the metadata engine, but it should be noted that Redis Cluster requires that the keys of all operations in a transaction must be in the same hash slot, so a JuiceFS file system can only use one hash slot.

See "Redis Best Practices" for more information.

What's the difference between JuiceFS and XXX?

See "Comparison with Others" for more information.

For more FAQs, please see the full list.

Stargazers over time


📺 Video: What is JuiceFS?

📖 Document: Quick Start Guide

Download Details:

Author: juicedata
Source Code: https://github.com/juicedata/juicefs 
License: Apache-2.0 license

#go #golang #redis #s3 

JuiceFS: A Distributed POSIX File System Built on top Of Redis and S3

S3leveldown: An Implementation Of LevelDOWN That Uses Amazon S3

S3LevelDOWN

An abstract-leveldown compliant implementation of LevelDOWN that uses Amazon S3 as a backing store. S3 is actually a giant key-value store on the cloud, even though it is marketed as a file store. Use this database with the LevelUP API.

To use this optimally, please read "Performance considerations" and "Warning about concurrency" sections below.

You could also use this as an alternative API to read/write S3. The API is simpler to use compared to the AWS SDK!

Installation

Install s3leveldown and peer dependencies levelup and aws-sdk with yarn or npm.

$ npm install s3leveldown aws-sdk levelup

Documentation

See the LevelUP API for high level usage.

s3leveldown(location [, s3])

Constructor of s3leveldown backing store. Use with levelup.

Arguments:

  • location name of the S3 bucket with optional sub-folder. Example mybucket or mybucket/folder.
  • s3 Optional S3 client from aws-sdk. A default client will be used if not specified.

Example

Please refer to the AWS SDK docs to set up your API credentials before using.

Using Promises

(async () => {
  // create DB
  const db = levelup(s3leveldown('mybucket'));

  // put items
  await db.batch()
    .put('name', 'Pikachu')
    .put('dob', 'February 27, 1996')
    .put('occupation', 'Pokemon')
    .write();
  
  // read items
  await db.createReadStream()
    .on('data', data => { console.log('data', `${data.key.toString()}=${data.value.toString()}`); })
    .on('close', () => { console.log('done!') });
})();

Using Callbacks

const levelup = require('levelup');
const s3leveldown = require('s3leveldown');

const db = levelup(s3leveldown('my_bucket'));

db.batch()
  .put('name', 'Pikachu')
  .put('dob', 'February 27, 1996')
  .put('occupation', 'Pokemon')
  .write(function () { 
    db.readStream()
      .on('data', console.log)
      .on('close', function () { console.log('Pika pi!') })
  });

Example with min.io

You could also use s3leveldown with S3 compatible servers such as MinIO.

const levelup = require('levelup');
const s3leveldown = require('s3leveldown');
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  apiVersion: '2006-03-01',
  accessKeyId: 'YOUR-ACCESSKEYID',
  secretAccessKey: 'YOUR-SECRETACCESSKEY',
  endpoint: 'http://127.0.0.1:9000',
  s3ForcePathStyle: true,
  signatureVersion: 'v4'
});

const db = levelup(s3leveldown('my_bucket', s3));

Example with PouchDB

Sub folders

You can create your Level DB in a sub-folder in your S3 bucket, just use my_bucket/sub_folder when passing the location.

Performance considerations

There are a few performance caveats due to the limited API provided by the AWS S3 API:

When iterating, getting values is expensive. A separate S3 API call is made to get the value of each key. If you don't need the values, pass { values: false } in the options. Each S3 API call can return 1000 keys, so if there are 3000 results, 3 calls are made to list the keys, and if getting values as well, another 3000 API calls are made.

Avoid iterating large datasets when passing { reverse: true }. Since the S3 API calls do not allow retrieving keys in reverse order, the entire result set needs to be stored in memory and reversed. If your database is large (>5k keys), be sure to provide start (gt, gte) and end (lt, lte) bounds, or the entire database will need to be fetched.

By default when iterating, 1000 keys will be returned. If you only want 10 keys for example, set { limit: 10 } and the S3 API call will only request 10 keys. Note that if you have { reverse: true }, this optimisation does not apply as we need to fetch everything from start to end and reverse it in memory. To override the default number of keys to return in a single API call, you can set the s3ListObjectMaxKeys option when creating the iterator. The maximum accepted by the S3 API is 1000.

Specify the AWS region of the bucket to improve performance, by calling AWS.config.update({ region: 'ap-southeast-2' }); replace ap-southeast-2 with your region.

Warning about concurrency

Individual operations (put get del) are atomic as guaranteed by S3, but the implementation of batch is not atomic. Two concurrent batch calls will have their operations interwoven. Don't use any plugins which require this to be atomic or you will end up with your database corrupted! However, if you can guarantee that only one process will write the S3 bucket at a time, then this should not be an issue. Ideally, you want to avoid race conditions where two processes are writing to the same key at the same time. In those cases the last write wins.

Iterator snapshots are not supported. When iterating through a list of keys and values, you may get the changes, similar to dirty reads.

Tests and debug

S3LevelDOWN uses debug. To see debug messages, set the environment variable DEBUG=S3LevelDOWN.

To run the test suite, you need to set an S3 bucket name in the environment variable S3_TEST_BUCKET. Also be sure to set your AWS credentials.

$ S3_TEST_BUCKET=my-test-bucket npm run test

Author: loune
Source Code: https://github.com/loune/s3leveldown 
License: MIT license

#javascript #s3 

S3leveldown: An Implementation Of LevelDOWN That Uses Amazon S3

Serverless Sthree Env

STHREE ENV PLUGIN

This plugin is used to get config values from a JSON-formatted file in S3 and copy them to environment variables.

HOW TO USE

  • Just include the 'serverless-sthree-env' plugin in your serverless.yml file.

For now it is quite simple; the bucket and config key names are predefined as:

Bucket: (service name)-config-(stage) Key: config.json

Everything in that key will be copied over to your environment variables.

E.g., if the service name is my-apps and you are using the dev stage, create a bucket in the same region named my-apps-config-dev with a config.json file inside it, like below:

{
  "KEY": "VALUE" 
}

FUTURE IMPROVEMENT

  • Create serverless command to create a config bucket
  • Allow selection of config bucket and key
  • Whitelist or blacklist env variable to be copied

Author: StyleTributeIT 
Source Code: https://github.com/StyleTributeIT/serverless-sthree-env 
License: 

#serverless #env #s3 

Serverless Sthree Env

Serverless Static Plugin

📦 ✨ Serverless Static Plugin

Note

Deploy functionality is in active development; it will be available soon.

1. Install the plugin

First, add Serverless Static to your project. Be sure that the serverless-offline plugin is already installed:

$ npm install serverless-static --save-dev

or, if serverless-offline is not already installed

$ npm install serverless-static serverless-offline --save-dev

2. Add it to your serverless.yml file

Then inside your project's serverless.yml file add the following entry to the plugins section: serverless-static. If there is no plugins section, you will need to add it to the file.

It should look something like this:

plugins:
  - serverless-offline
  - serverless-static 

3. Customize behavior (optional)

custom:
  static:
    path: ./public # select the folder you want to serve
    port: 8000 # select a specific port 

# this will override the default behavior
# it will serve the folder ./public
# it will serve it through localhost:8000

Author: iliasbhal
Source Code: https://github.com/iliasbhal/serverless-static 
License: 

#serverless #static #s3 #lambda 

Serverless Static Plugin

Serverless Plugin for S3 Sync

⚡️ Serverless Plugin for S3 Sync   

With this plugin for serverless, you can sync local folders to S3 buckets after your service is deployed.

Usage

Add the NPM package to your project:

# Via yarn
$ yarn add serverless-s3bucket-sync

# Via npm
$ npm install serverless-s3bucket-sync

Add the plugin to your serverless.yml:

plugins:
  - serverless-s3bucket-sync

Configuration

Configure the S3 bucket syncing in serverless.yml with references to your local folder and the name of the S3 bucket.

custom:
  s3-sync:
    - folder: relative/folder
      bucket: bucket-name

That's it! With the next deployment, serverless will sync your local folder relative/folder with the S3 bucket named bucket-name.

Sync

You can use sls sync to synchronize all buckets without deploying your serverless stack.

Contribution

You are welcome to contribute to this project! 😘

To make sure you have a pleasant experience, please read the code of conduct. It outlines core values and beliefs and will make working together a happier experience.

Author: sbstjn
Source Code: https://github.com/sbstjn/serverless-s3bucket-sync 
License: MIT license

#serverless #s3 #sync #aws 

Serverless Plugin for S3 Sync

How to Synchronize Local Folders and S3 Prefixes for Serverless Framework

Serverless S3 Sync

A plugin to sync local directories and S3 prefixes for Serverless Framework ⚡ .

Use Case

  • Static Website ( serverless-s3-sync ) & Contact form backend ( serverless ) .
  • SPA ( serverless ) & assets ( serverless-s3-sync ) .

Install

Run npm install in your Serverless project.

$ npm install --save serverless-s3-sync

Add the plugin to your serverless.yml file

plugins:
  - serverless-s3-sync

Compatibility with Serverless Framework

Version 2.0.0 is compatible with Serverless Framework v3, but it uses the legacy logging interface. Version 3.0.0 and later uses the new logging interface.

| serverless-s3-sync | Serverless Framework |
| --- | --- |
| v1.x | v1.x, v2.x |
| v2.0.0 | v1.x, v2.x, v3.x |
| ≥ v3.0.0 | v3.x |

Setup

custom:
  s3Sync:
    # A simple configuration for copying static assets
    - bucketName: my-static-site-assets # required
      bucketPrefix: assets/ # optional
      localDir: dist/assets # required

    # An example of possible configuration options
    - bucketName: my-other-site
      localDir: path/to/other-site
      deleteRemoved: true # optional, indicates whether sync deletes files no longer present in localDir. Defaults to 'true'
      acl: public-read # optional
      followSymlinks: true # optional
      defaultContentType: text/html # optional
      params: # optional
        - index.html:
            CacheControl: 'no-cache'
        - "*.js":
            CacheControl: 'public, max-age=31536000'
      bucketTags: # optional, these are appended to existing S3 bucket tags (overwriting tags with the same key)
        tagKey1: tagValue1
        tagKey2: tagValue2

    # This references bucket name from the output of the current stack
    - bucketNameKey: AnotherBucketNameOutputKey
      localDir: path/to/another

    # ... but can also reference it from the output of another stack,
    # see https://www.serverless.com/framework/docs/providers/aws/guide/variables#reference-cloudformation-outputs
    - bucketName: ${cf:another-cf-stack-name.ExternalBucketOutputKey}
      localDir: path

resources:
  Resources:
    AssetsBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: my-static-site-assets
    OtherSiteBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: my-other-site
        AccessControl: PublicRead
        WebsiteConfiguration:
          IndexDocument: index.html
          ErrorDocument: error.html
    AnotherBucket:
      Type: AWS::S3::Bucket
  Outputs:
    AnotherBucketNameOutputKey:
      Value: !Ref AnotherBucket

Usage

Run sls deploy, local directories and S3 prefixes are synced.

Run sls remove, S3 objects in S3 prefixes are removed.

Run sls deploy --nos3sync, deploy your serverless stack without syncing local directories and S3 prefixes.

Run sls remove --nos3sync, remove your serverless stack without removing S3 objects from the target S3 buckets.

sls s3sync

Sync local directories and S3 prefixes.

Offline usage

If also using the plugins serverless-offline and serverless-s3-local, sync can be supported during development by placing the bucket configuration(s) into the buckets object and specifying the alternate endpoint (see below).

custom:
  s3Sync:
    # an alternate s3 endpoint
    endpoint: http://localhost:4569
    buckets:
    # A simple configuration for copying static assets
    - bucketName: my-static-site-assets # required
      bucketPrefix: assets/ # optional
      localDir: dist/assets # required
# ...

As per serverless-s3-local's instructions, once a local credentials profile is configured, run sls offline start --aws-profile s3local to sync to the local S3 bucket instead of Amazon AWS S3.

bucketNameKey will not work in offline mode and can only be used in conjunction with valid AWS credentials, use bucketName instead.

run sls deploy for normal deployment

Always disable auto sync

custom:
  s3Sync:
    # Disable sync when sls deploy and sls remove
    noSync: true
    buckets:
    # A simple configuration for copying static assets
    - bucketName: my-static-site-assets # required
      bucketPrefix: assets/ # optional
      localDir: dist/assets # required
# ...

Author: k1LoW
Source Code: https://github.com/k1LoW/serverless-s3-sync 
License: 

#serverless #s3 #sync 

How to Synchronize Local Folders and S3 Prefixes for Serverless Framework

Serverless S3 Remover

serverless-s3-remover

Plugin for serverless to make buckets empty before removal.

Usage

Run the following command:

$ npm install serverless-s3-remover

Add to your serverless.yml

plugins:
  - serverless-s3-remover

custom:
  remover:
     buckets:
       - my-bucket-1
       - my-bucket-2

You can specify any number of buckets that you want.

Now you can make all buckets empty by running:

$ sls s3remove

When removing

When removing the serverless stack, this plugin automatically makes the buckets empty before removing the stack.

$ sls remove

Using Prompt

You can enable a prompt before deleting buckets.

custom:
  remover:
    prompt: true # default value is `false`
    buckets:
      - remover-bucket-a
      - remover-bucket-b


Populating the configuration object before using it

custom:
  boolean:
    true: true
    false: false
  remover:
    prompt: ${self:custom.boolean.${opt:s3-remover-prompt, 'true'}}

You can then use the command line argument --s3-remover-prompt false to disable the prompt feature.

Author: Sinofseven
Source Code: https://github.com/sinofseven/serverless-s3-remover 
License: MIT license

#serverless #s3 #remove #plugin 

Serverless S3 Remover

Serverless S3 Local

serverless-s3-local

serverless-s3-local is a Serverless plugin that runs an S3 clone locally. It is aimed at accelerating development of AWS Lambda functions through local testing. It works well together with serverless-offline.

Installation

Use npm

npm install serverless-s3-local --save-dev

Use serverless plugin install

sls plugin install --name serverless-s3-local

Example

serverless.yaml

service: serverless-s3-local-example
provider:
  name: aws
  runtime: nodejs12.x
plugins:
  - serverless-s3-local
  - serverless-offline
custom:
# Uncomment only if you want to collaborate with serverless-plugin-additional-stacks
# additionalStacks:
#    permanent:
#      Resources:
#        S3BucketData:
#            Type: AWS::S3::Bucket
#            Properties:
#                BucketName: ${self:service}-data
  s3:
    host: localhost
    directory: /tmp
resources:
  Resources:
    NewResource:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: local-bucket
functions:
  webhook:
    handler: handler.webhook
    events:
      - http:
        method: GET
        path: /
  s3hook:
    handler: handler.s3hook
    events:
      - s3: local-bucket
        event: s3:*

handler.js (AWS SDK v2)

const AWS = require("aws-sdk");

module.exports.webhook = (event, context, callback) => {
  const S3 = new AWS.S3({
    s3ForcePathStyle: true,
    accessKeyId: "S3RVER", // This specific key is required when working offline
    secretAccessKey: "S3RVER",
    endpoint: new AWS.Endpoint("http://localhost:4569"),
  });
  S3.putObject({
    Bucket: "local-bucket",
    Key: "1234",
    Body: new Buffer("abcd")
  }, () => callback(null, "ok"));
};

module.exports.s3hook = (event, context) => {
  console.log(JSON.stringify(event));
  console.log(JSON.stringify(context));
  console.log(JSON.stringify(process.env));
};

handler.js (AWS SDK v3)

const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

module.exports.webhook = (event, context, callback) => {
  const client = new S3Client({
    forcePathStyle: true,
    credentials: {
      accessKeyId: "S3RVER", // This specific key is required when working offline
      secretAccessKey: "S3RVER",
    },
    endpoint: "http://localhost:4569",
  });
  client
    .send(
      new PutObjectCommand({
        Bucket: "local-bucket",
        Key: "1234",
        Body: Buffer.from("abcd"),
      })
    )
    .then(() => callback(null, "ok"));
};

module.exports.s3hook = (event, context) => {
  console.log(JSON.stringify(event));
  console.log(JSON.stringify(context));
  console.log(JSON.stringify(process.env));
};

Configuration options

Configuration options can be defined in multiple ways. They will be parsed with the following priority:

  • custom.s3 in serverless.yml
  • custom.serverless-offline in serverless.yml
  • Default values (see table below)
| Option | Description | Type | Default value |
| --- | --- | --- | --- |
| address | The host/IP to bind the S3 server to | string | 'localhost' |
| host | The host where internal S3 calls are made. Should be the same as address | string | |
| port | The port that the S3 server will listen to | number | 4569 |
| directory | The location where the S3 files will be created. The directory must exist, it won't be created | string | './buckets' |
| accessKeyId | The Access Key Id to authenticate requests | string | 'S3RVER' |
| secretAccessKey | The Secret Access Key to authenticate requests | string | 'S3RVER' |
| cors | The S3 CORS configuration XML. See AWS docs | string or Buffer | |
| website | The S3 Website configuration XML. See AWS docs | string or Buffer | |
| noStart | Set to true if you already have an S3rver instance running | boolean | false |
| allowMismatchedSignatures | Prevent SignatureDoesNotMatch errors for all well-formed signatures | boolean | false |
| silent | Suppress S3rver log messages | boolean | false |
| serviceEndpoint | Override the AWS service root for subdomain-style access | string | amazonaws.com |
| httpsProtocol | To enable HTTPS, specify directory (relative to your cwd, typically your project dir) for both cert.pem and key.pem files | string | |
| vhostBuckets | Disable vhost-style access for all buckets | boolean | true |
| buckets | Extra bucket names will be created after starting S3 local | string | |

Feature

  • Start local S3 server with specified root directory and port.
  • Create buckets at launching.
  • Support serverless-plugin-additional-stacks
  • Support serverless-webpack
  • Support serverless-plugin-existing-s3
  • Support S3 events.

Working with IaC tools

If you want to work with IaC tools such as Terraform, you have to manage the bucket creation process yourself. In this case, please follow the steps below.

  1. Comment out the S3 bucket configuration in the resources section of serverless.yml.
#resources:
#  Resources:
#    NewResource:
#      Type: AWS::S3::Bucket
#      Properties:
#        BucketName: local-bucket
  2. Create the bucket directory in the s3rver working directory.
$ mkdir /tmp/local-bucket

Triggering AWS Events offline

This plugin will create a temporary directory to store mock S3 info. You must use the AWS CLI to trigger events locally. First, set up a new profile using aws configure, i.e. aws configure --profile s3local. The default credentials are:

aws_access_key_id = S3RVER
aws_secret_access_key = S3RVER

You can now use this profile to trigger events. e.g. to trigger a put-object on a file at ~/tmp/userdata.csv in a local bucket run: aws --endpoint http://localhost:4569 s3 cp ~/tmp/data.csv s3://local-bucket/userdata.csv --profile s3local

You should see the event trigger in the serverless offline console: info: PUT /local-bucket/user-data.csv 200 16ms 0b and a new object with metadata will appear in your local bucket.

See also

Author: ar90n
Source Code: https://github.com/ar90n/serverless-s3-local 
License: MIT license

#serverless #s3 #lambda #local 

Serverless S3 Local

Serverless S3 Encryption

Serverless-s3-encryption

Set or remove the encryption settings on the S3 buckets in your serverless stack.

This plugin runs on the after:deploy hook, but you can also run it manually with: sls s3-encryption update

Install

npm install --save-dev serverless-s3-encryption

Usage

See the example below for how to modify your serverless.yml

# serverless.yml

plugins:
  # ...
  - serverless-s3-encryption

custom:
  # ...
  s3-encryption:
    buckets:
      MyEncryptedBucket:
        # see: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putBucketEncryption-property
        # accepted values: none, AES256, aws:kms
        SSEAlgorithm: AES256
        # only if SSEAlgorithm is aws:kms
        KMSMasterKeyID: STRING_VALUE 

resources:
  Resources:
    MyEncryptedBucket:
      Type: "AWS::S3::Bucket"
      Description: my encrypted bucket
      DeletionPolicy: Retain

Author: Tradle
Source Code: https://github.com/tradle/serverless-s3-encryption 
License: 

#serverless #s3 #encryption 


Serverless S3 Deploy

serverless-s3-deploy

Plugin for serverless to deploy files to a variety of S3 Buckets

Note: This project is currently not maintained.

Installation

 npm install --save-dev serverless-s3-deploy

Usage

Add to your serverless.yml:

  plugins:
    - serverless-s3-deploy

  custom:
    assets:
      targets:
       - bucket: my-bucket
         files:
          - source: ../assets/
            globs: '**/*.css'
          - source: ../app/
            globs:
              - '**/*.js'
              - '**/*.map'
       - bucket: my-other-bucket
         empty: true
         prefix: subdir
         files:
          - source: ../email-templates/
            globs: '**/*.html'

You can specify any number of targets that you want. Each target has a bucket and a prefix.

bucket is either the name of your S3 bucket or a reference to a CloudFormation resource created in the same serverless configuration file. See below for additional details.

You can specify source relative to the current directory.

Each source has its own list of globs, which can be either a single glob, or a list of globs.

Setting empty to true will delete all files inside the bucket before uploading the new content to the S3 bucket. The prefix value is respected, so files outside the prefix will not be deleted.

Now you can upload all of these assets to your bucket by running:

$ sls s3deploy

If you have defined multiple buckets, you can limit your deployment to a single bucket with the --bucket option:

$ sls s3deploy --bucket my-bucket

ACL

You can optionally specify an ACL for the uploaded files on a per-target basis:

  custom:
    assets:
      targets:
        - bucket: my-bucket
          acl: private
          files:

The default value is private. The accepted values are the canned ACLs defined in the AWS S3 documentation.

Content Type

The appropriate Content-Type for each file is determined automatically using mime-types. If one can't be determined, the default fallback 'application/octet-stream' is used.

You can override this fallback per-source by setting defaultContentType.

  custom:
    assets:
      targets:
        - bucket: my-bucket
          files:
            - source: html/
              defaultContentType: text/html
              ...

Other Headers

Additional headers can be included per target by providing a headers object.

See http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html for more details.

  custom:
    assets:
      targets:
        - bucket: my-bucket
          files:
            - source: html/
              headers:
                CacheControl: max-age=31104000 # 1 year

Resolving References

A common use case is to create the S3 buckets in the resources section of your serverless configuration and then reference them in your S3 plugin settings:

  custom:
    assets:
      targets:
        - bucket:
            Ref: MyBucket
          files:
            - source: html/

  resources:
    # AWS CloudFormation Template
    Resources:
      MyBucket:
        Type: AWS::S3::Bucket
        Properties:
          AccessControl: PublicRead
          WebsiteConfiguration:
            IndexDocument: index.html
            ErrorDocument: index.html

You can disable the resolving with the following flag:

  custom:
    assets:
      resolveReferences: false

Auto-deploy

If you want s3deploy to run automatically after a deploy, set the auto flag:

  custom:
    assets:
      auto: true

IAM Configuration

You're going to need an IAM policy that supports this deployment. This might be a good starting point:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${bucket}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${bucket}/*"
            ]
        }
    ]
}

Upload concurrency

If you want to tweak the upload concurrency, change the uploadConcurrency config:

custom:
  assets:
    # defaults to 3
    uploadConcurrency: 1

Verbosity

Verbosity can be enabled using either of these methods:

Configuration:

  custom:
    assets:
      verbose: true

CLI:

  sls s3deploy -v

Author: Funkybob
Source Code: https://github.com/funkybob/serverless-s3-deploy 
License: MIT license

#serverless #deploy #s3 #plugin 


Serverless Offline S3

serverless-offline-s3

This Serverless Offline plugin emulates AWS Lambda and S3 on your local machine. To do so, it listens for S3 bucket events and invokes your handlers.


Installation

First, add serverless-offline-s3 to your project:

npm install serverless-offline-s3

Then, inside your project's serverless.yml file, add the following entry to the plugins section: serverless-offline-s3. Place it before serverless-offline (and after serverless-webpack, if present).

plugins:
  - serverless-webpack
  - serverless-offline-s3
  - serverless-offline

See example

How does it work?

To emulate an AWS S3 bucket on a local machine, some object storage service actually has to be running. One existing implementation suitable for the task is Minio.

Minio is a High Performance Object Storage released under Apache License v2.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads. See example s3 service setup.

We also need to set up the actual buckets on the Minio server; we can use the AWS CLI tools for that. In the example, we spin up another container with the AWS CLI pre-installed and run an initialization script against the Minio server running in a separate container.
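As a rough sketch of that setup (this is not the repository's docker-compose example; the image, port, credentials, and bucket name are common Minio defaults and purely illustrative):

# start a local Minio server (default credentials: minioadmin / minioadmin)
docker run -d -p 9000:9000 minio/minio server /data

# create the bucket your function's s3 event points at (bucket names must be lowercase)
AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin \
  aws --endpoint-url http://localhost:9000 --region eu-west-1 s3 mb s3://my-local-bucket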

Once Minio is running and initialized, we can proceed with the configuration of the plugin.

Note that starting from version v3.1 of the plugin.

Configure

Functions

The configuration of function of the plugin follows the serverless documentation.

functions:
  myS3Handler:
    handler: handler.compute
    events:
      - s3:
          bucket: myBucket
          event: s3:ObjectCreated:Put

S3

The plugin's aws.S3 client is configured by defining a custom: serverless-offline-s3 object in your serverless.yml with your specific configuration.

For example, to point at the local Minio server:

custom:
  serverless-offline-s3:
    endpoint: http://0.0.0.0:9000
    region: eu-west-1
    accessKey: minioadmin
    secretKey: minioadmin

Author: CoorpAcademy
Source Code: https://github.com/CoorpAcademy/serverless-plugins 
License: 

#serverless #s3 #aws 


How to Upload files to AWS S3 Using .Net 6 WebAPI - Step by Step

In this video we will go through Web API and AWS S3 integration. We will learn about S3, how to use it with our .NET Web API, and how we can upload files to it.

So what we will cover today:  

00:00 intro
00:51 Agenda
01:26 What is AWS S3
02:54 What is an S3 Bucket
06:22 S3 Characteristics - Benefits
09:26 Securing S3 Bucket
10:22 S3 Encryption
14:22 S3 Class Types
21:37 Ingredients (dev requirements)
22:13 Code time
22:37 Create an IAM User
23:29 Create an S3 bucket
27:03 Create Web API and Classlib
31:37 Setup the S3 Classlib
32:02 Create S3 Models (DTOs)
37:17 Create the interface and service
52:04 Creating the controller
59:53 Injecting the Services
01:01:10 Testing the application

Source code:
https://github.com/mohamadlawand087/NET6-S3

DotNet SDK:
https://dotnet.microsoft.com/download

Visual Studio Code:
https://code.visualstudio.com/

#net6 #s3 #aws #dotnet #asp.net


S3 Vs EBS Vs HDFS Vs EFS - Use Cases and Interview Tips and AWS Storage

In this AWS Storage video, we will understand the differences between object, block, file, and distributed file system storage. Then we compare S3, EBS, HDFS, and EFS. We will look into some use cases and finally share some interview tips.

#aws #s3 #ebs #hdfs #efs
