
Yjs: Shared Data Types for Building Collaborative Software

Yjs

A CRDT framework with a powerful abstraction of shared data

Yjs is a CRDT implementation that exposes its internal data structure as shared types. Shared types are common data types like Map or Array with superpowers: changes are automatically distributed to other peers and merged without merge conflicts.

Overview

This repository contains a collection of shared types that can be observed for changes and manipulated concurrently. Network functionality and two-way-bindings are implemented in separate modules.

Bindings

  • ProseMirror: y-prosemirror binding (demo)
  • Quill: y-quill binding (demo)
  • CodeMirror: y-codemirror binding (demo)
  • Monaco: y-monaco binding (demo)

Providers

Setting up the communication between clients, managing awareness information, and storing shared data for offline usage is quite a hassle. Providers manage all that for you and are the perfect starting point for your collaborative app.

y-webrtc

Propagates document updates peer-to-peer using WebRTC. The peers exchange signaling data over signaling servers. Public signaling servers are available. Communication over the signaling servers can be encrypted by providing a shared secret, keeping the connection information and the shared document private.

y-websocket

A module that contains a simple websocket backend and a websocket client that connects to that backend. The backend can be extended to persist updates in a leveldb database.

y-indexeddb

Efficiently persists document updates to the browser's IndexedDB database. The document is immediately available and only diffs need to be synced through the network provider.

y-dat

[WIP] Writes document updates efficiently to the dat network using multifeed. Each client has an append-only log of CRDT local updates (hypercore). Multifeed manages and syncs the hypercores, and y-dat listens to changes and applies them to the Yjs document.

Getting Started

Install Yjs and a provider with your favorite package manager:

npm i yjs y-websocket

Start the y-websocket server:

PORT=1234 node ./node_modules/y-websocket/bin/server.js
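
A minimal client sketch (assuming the server started above runs locally on port 1234; the room name 'my-roomname' is arbitrary):

import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'

const doc = new Y.Doc()
// All clients that connect with the same room name sync the same document.
const wsProvider = new WebsocketProvider('ws://localhost:1234', 'my-roomname', doc)
wsProvider.on('status', event => {
  console.log(event.status) // logs "connected" or "disconnected"
})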

Example: Observe types

const doc = new Y.Doc()
const yarray = doc.getArray('my-array')
yarray.observe(event => {
  console.log('yarray was modified')
})
// every time a local or remote client modifies yarray, the observer is called
yarray.insert(0, ['val']) // => "yarray was modified"

Example: Nest types

Remember, shared types are just plain old data types. The only limitation is that a shared type must exist only once in the shared document.

const ymap = doc.getMap('map')
const foodArray = new Y.Array()
foodArray.insert(0, ['apple', 'banana'])
ymap.set('food', foodArray)
ymap.get('food') === foodArray // => true
ymap.set('fruit', foodArray) // => Error! foodArray is already defined

Now you understand how types are defined on a shared document. Next you can jump to the demo repository or continue reading the API docs.

Example: Using and combining providers

Any of the Yjs providers can be combined with each other. So you can sync data over different network technologies.

In most cases you want to use a network provider (like y-websocket or y-webrtc) in combination with a persistence provider (y-indexeddb in the browser). Persistence allows you to load the document faster and to persist data that is created while offline.

For the sake of this demo we combine two different network providers with a persistence provider.

import * as Y from 'yjs'
import { WebrtcProvider } from 'y-webrtc'
import { WebsocketProvider } from 'y-websocket'
import { IndexeddbPersistence } from 'y-indexeddb'

const ydoc = new Y.Doc()

// this allows you to instantly get the (cached) document's data
const indexeddbProvider = new IndexeddbPersistence('count-demo', ydoc)
indexeddbProvider.whenSynced.then(() => {
  console.log('loaded data from indexed db')
})

// Sync clients with the y-webrtc provider.
const webrtcProvider = new WebrtcProvider('count-demo', ydoc)

// Sync clients with the y-websocket provider
const websocketProvider = new WebsocketProvider(
  'wss://demos.yjs.dev', 'count-demo', ydoc
)

// array of numbers which produce a sum
const yarray = ydoc.getArray('count')

// observe changes of the sum
yarray.observe(event => {
  // print updates when the data changes
  console.log('new sum: ' + yarray.toArray().reduce((a,b) => a + b))
})

// add 1 to the sum
yarray.push([1]) // => "new sum: 1"

API

import * as Y from 'yjs'

Shared Types

Y.Array
 

A shareable Array-like type that supports efficient insert/delete of elements at any position. Internally it uses a linked list of Arrays that is split when necessary.

const yarray = new Y.Array()

insert(index:number, content:Array<object|boolean|Array|string|number|Uint8Array|Y.Type>)

Insert content at index. Note that content is an array of elements. I.e. array.insert(0, [1]) splices the list and inserts 1 at position 0.

push(Array<Object|boolean|Array|string|number|Uint8Array|Y.Type>)

unshift(Array<Object|boolean|Array|string|number|Uint8Array|Y.Type>)

delete(index:number, length:number)

get(index:number)

length:number

forEach(function(value:object|boolean|Array|string|number|Uint8Array|Y.Type, index:number, array: Y.Array))

map(function(T, number, YArray):M):Array<M>

toArray():Array<object|boolean|Array|string|number|Uint8Array|Y.Type>

Copies the content of this YArray to a new Array.

toJSON():Array<Object|boolean|Array|string|number>

Copies the content of this YArray to a new Array. It transforms all child types to JSON using their toJSON method.
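
For example (a brief sketch; the nested Y.Map is only for illustration):

const doc = new Y.Doc()
const yarray = doc.getArray('example')
const nested = new Y.Map()
nested.set('key', 'value')
yarray.insert(0, [1, nested])
yarray.toArray() // => [1, nested] (child types are returned as-is)
yarray.toJSON()  // => [1, { key: 'value' }] (child types converted via their toJSON method)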

[Symbol.Iterator]

Returns a YArray Iterator that contains the values for each index in the array.

for (let value of yarray) { .. }

observe(function(YArrayEvent, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns.

unobserve(function(YArrayEvent, Transaction):void)

Removes an observe event listener from this type.

observeDeep(function(Array<YEvent>, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type or any of its children is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns. The event listener receives all Events created by itself or any of its children.

unobserveDeep(function(Array<YEvent>, Transaction):void)

Removes an observeDeep event listener from this type.

Y.Map
 

A shareable Map type.

const ymap = new Y.Map()

get(key:string):object|boolean|string|number|Uint8Array|Y.Type

set(key:string, value:object|boolean|string|number|Uint8Array|Y.Type)

delete(key:string)

has(key:string):boolean

toJSON():Object<string, Object|boolean|Array|string|number|Uint8Array>

Copies the [key,value] pairs of this YMap to a new Object. It transforms all child types to JSON using their toJSON method.

forEach(function(value:object|boolean|Array|string|number|Uint8Array|Y.Type, key:string, map: Y.Map))

Execute the provided function once for every key-value pair.

[Symbol.Iterator]

Returns an Iterator of [key, value] pairs.

for (let [key, value] of ymap) { .. }

entries()

Returns an Iterator of [key, value] pairs.

values()

Returns an Iterator of all values.

keys()

Returns an Iterator of all keys.

observe(function(YMapEvent, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns.

unobserve(function(YMapEvent, Transaction):void)

Removes an observe event listener from this type.

observeDeep(function(Array<YEvent>, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type or any of its children is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns. The event listener receives all Events created by itself or any of its children.

unobserveDeep(function(Array<YEvent>, Transaction):void)

Removes an observeDeep event listener from this type.

Y.Text
 

A shareable type that is optimized for shared editing on text. It allows you to assign properties to ranges in the text. This makes it possible to implement rich-text bindings to this type.

This type can also be transformed to the delta format. Similarly the YTextEvents compute changes as deltas.

const ytext = new Y.Text()

insert(index:number, content:string, [formattingAttributes:Object<string,string>])

Insert a string at index and assign formatting attributes to it.

ytext.insert(0, 'bold text', { bold: true })

delete(index:number, length:number)

format(index:number, length:number, formattingAttributes:Object<string,string>)

Assign formatting attributes to a range in the text
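
For example (a brief sketch):

ytext.insert(0, 'bold text')
ytext.format(0, 4, { bold: true }) // the range "bold" is now formatted as bold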

applyDelta(delta, opts:Object<string,any>)

See Quill Delta. The options object can prevent the removal of trailing newlines (sanitize defaults to true).

ytext.applyDelta(delta, { sanitize: false })
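
A short sketch of applying and reading back a delta:

ytext.applyDelta([{ insert: 'Hello ' }, { insert: 'world', attributes: { bold: true } }])
ytext.toDelta() // => [{ insert: 'Hello ' }, { insert: 'world', attributes: { bold: true } }]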

length:number

toString():string

Transforms this type, without formatting options, into a string.

toJSON():string

See toString

toDelta():Delta

Transforms this type to a Quill Delta

observe(function(YTextEvent, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns.

unobserve(function(YTextEvent, Transaction):void)

Removes an observe event listener from this type.

observeDeep(function(Array<YEvent>, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type or any of its children is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns. The event listener receives all Events created by itself or any of its children.

unobserveDeep(function(Array<YEvent>, Transaction):void)

Removes an observeDeep event listener from this type.

Y.XmlFragment
 

A container that holds an Array of Y.XmlElements.

const yxml = new Y.XmlFragment()

insert(index:number, content:Array<Y.XmlElement|Y.XmlText>)

delete(index:number, length:number)

get(index:number)

length:number

toArray():Array<Y.XmlElement|Y.XmlText>

Copies the children to a new Array.

toDOM():DocumentFragment

Transforms this type and all children to new DOM elements.

toString():string

Get the XML serialization of all descendants.

toJSON():string

See toString.

observe(function(YXmlEvent, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns.

unobserve(function(YXmlEvent, Transaction):void)

Removes an observe event listener from this type.

observeDeep(function(Array<YEvent>, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type or any of its children is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns. The event listener receives all Events created by itself or any of its children.

unobserveDeep(function(Array<YEvent>, Transaction):void)

Removes an observeDeep event listener from this type.

Y.XmlElement
 

A shareable type that represents an XML Element. It has a nodeName, attributes, and a list of children. But it makes no effort to validate its content and be actually XML compliant.

const yxml = new Y.XmlElement()

insert(index:number, content:Array<Y.XmlElement|Y.XmlText>)

delete(index:number, length:number)

get(index:number)

length:number

setAttribute(attributeName:string, attributeValue:string)

removeAttribute(attributeName:string)

getAttribute(attributeName:string):string

getAttributes():Object<string,string>

toArray():Array<Y.XmlElement|Y.XmlText>

Copies the children to a new Array.

toDOM():Element

Transforms this type and all children to a new DOM element.

toString():string

Get the XML serialization of all descendants.

toJSON():string

See toString.
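
A brief sketch of building and serializing a small element (the 'p' node name, attribute, and fragment name are only for illustration):

const doc = new Y.Doc()
const fragment = doc.getXmlFragment('my-xml')
const paragraph = new Y.XmlElement('p')
fragment.insert(0, [paragraph])
paragraph.setAttribute('class', 'intro')
paragraph.insert(0, [new Y.XmlText('Hello world')])
paragraph.toString() // => something like '<p class="intro">Hello world</p>'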

observe(function(YXmlEvent, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns.

unobserve(function(YXmlEvent, Transaction):void)

Removes an observe event listener from this type.

observeDeep(function(Array<YEvent>, Transaction):void)

Adds an event listener to this type that will be called synchronously every time this type or any of its children is modified. In the case this type is modified in the event listener, the event listener will be called again after the current event listener returns. The event listener receives all Events created by itself or any of its children.

unobserveDeep(function(Array<YEvent>, Transaction):void)

Removes an observeDeep event listener from this type.

Y.Doc

const doc = new Y.Doc()

clientID

A unique id that identifies this client. (readonly)

gc

Whether garbage collection is enabled on this doc instance. Set `doc.gc = false` in order to disable gc and be able to restore old content. See https://github.com/yjs/yjs#yjs-crdt-algorithm for more information about gc in Yjs.

transact(function(Transaction):void [, origin:any])

Every change on the shared document happens in a transaction. Observer calls and the update event are called after each transaction. You should bundle changes into a single transaction to reduce the number of event calls. I.e. doc.transact(() => { yarray.insert(..); ymap.set(..) }) triggers a single change event.
You can specify an optional origin parameter that is stored on transaction.origin and on('update', (update, origin) => ..).
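
For example (a minimal sketch, assuming yarray and ymap are shared types on doc):

doc.transact(() => {
  ymap.set('title', 'My Document')
  yarray.insert(0, ['first entry'])
}, 'my-origin')
// observers fire once; transaction.origin === 'my-origin'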

get(string, Y.[TypeClass]):[Type]

Define a shared type.

getArray(string):Y.Array

Define a shared Y.Array type. Is equivalent to y.get(string, Y.Array).

getMap(string):Y.Map

Define a shared Y.Map type. Is equivalent to y.get(string, Y.Map).

getXmlFragment(string):Y.XmlFragment

Define a shared Y.XmlFragment type. Is equivalent to y.get(string, Y.XmlFragment).

on(string, function)

Register an event listener on the shared type

off(string, function)

Unregister an event listener from the shared type

Y.Doc Events

on('update', function(updateMessage:Uint8Array, origin:any, Y.Doc):void)

Listen to document updates. Document updates must be transmitted to all other peers. You can apply document updates in any order and multiple times.

on('beforeTransaction', function(Y.Transaction, Y.Doc):void)

Emitted before each transaction.

on('afterTransaction', function(Y.Transaction, Y.Doc):void)

Emitted after each transaction.

Document Updates

Changes on the shared document are encoded into document updates. Document updates are commutative and idempotent. This means that they can be applied in any order and multiple times.

Example: Listen to update events and apply them on remote client

const doc1 = new Y.Doc()
const doc2 = new Y.Doc()

doc1.on('update', update => {
  Y.applyUpdate(doc2, update)
})

doc2.on('update', update => {
  Y.applyUpdate(doc1, update)
})

// All changes are also applied to the other document
doc1.getArray('myarray').insert(0, ['Hello doc2, you got this?'])
doc2.getArray('myarray').get(0) // => 'Hello doc2, you got this?'

Yjs internally maintains a state vector that denotes the next expected clock from each client. In a different interpretation it holds the number of structs created by each client. When two clients sync, you can either exchange the complete document structure or only the differences by sending the state vector to compute the differences.

Example: Sync two clients by exchanging the complete document structure

const state1 = Y.encodeStateAsUpdate(ydoc1)
const state2 = Y.encodeStateAsUpdate(ydoc2)
Y.applyUpdate(ydoc1, state2)
Y.applyUpdate(ydoc2, state1)

Example: Sync two clients by computing the differences

This example shows how to sync two clients with the minimal amount of exchanged data by computing only the differences using the state vector of the remote client. Syncing clients using the state vector requires another roundtrip, but can save a lot of bandwidth.

const stateVector1 = Y.encodeStateVector(ydoc1)
const stateVector2 = Y.encodeStateVector(ydoc2)
const diff1 = Y.encodeStateAsUpdate(ydoc1, stateVector2)
const diff2 = Y.encodeStateAsUpdate(ydoc2, stateVector1)
Y.applyUpdate(ydoc1, diff2)
Y.applyUpdate(ydoc2, diff1)

Y.applyUpdate(Y.Doc, update:Uint8Array, [transactionOrigin:any])

Apply a document update on the shared document. Optionally you can specify transactionOrigin that will be stored on transaction.origin and ydoc.on('update', (update, origin) => ..).

Y.encodeStateAsUpdate(Y.Doc, [encodedTargetStateVector:Uint8Array]):Uint8Array

Encode the document state as a single update message that can be applied on the remote document. Optionally specify the target state vector to only write the differences to the update message.

Y.encodeStateVector(Y.Doc):Uint8Array

Computes the state vector and encodes it into a Uint8Array.

Relative Positions

This API is not stable yet

This feature is intended for managing selections / cursors. When working with other users who manipulate the shared document, you can't trust that an index position (an integer) will stay at the intended location. A relative position is anchored to an element in the shared document and is not affected by remote changes. I.e. given the document "a|c", the relative position is attached to c. When a remote user modifies the document by inserting a character before the cursor, the cursor stays attached to the character c: insert(1, 'x')("a|c") = "ax|c". When the relative position is set to the end of the document, it stays attached to the end of the document.

Example: Transform to RelativePosition and back

const relPos = Y.createRelativePositionFromTypeIndex(ytext, 2)
const pos = Y.createAbsolutePositionFromRelativePosition(relPos, doc)
pos.type === ytext // => true
pos.index === 2 // => true

Example: Send relative position to remote client (json)

const relPos = Y.createRelativePositionFromTypeIndex(ytext, 2)
const encodedRelPos = JSON.stringify(relPos)
// send encodedRelPos to remote client..
const parsedRelPos = JSON.parse(encodedRelPos)
const pos = Y.createAbsolutePositionFromRelativePosition(parsedRelPos, remoteDoc)
pos.type === remoteytext // => true
pos.index === 2 // => true

Example: Send relative position to remote client (Uint8Array)

const relPos = Y.createRelativePositionFromTypeIndex(ytext, 2)
const encodedRelPos = Y.encodeRelativePosition(relPos)
// send encodedRelPos to remote client..
const parsedRelPos = Y.decodeRelativePosition(encodedRelPos)
const pos = Y.createAbsolutePositionFromRelativePosition(parsedRelPos, remoteDoc)
pos.type === remoteytext // => true
pos.index === 2 // => true

Y.createRelativePositionFromTypeIndex(Uint8Array|Y.Type, number)

Y.createAbsolutePositionFromRelativePosition(RelativePosition, Y.Doc)

Y.encodeRelativePosition(RelativePosition):Uint8Array

Y.decodeRelativePosition(Uint8Array):RelativePosition

Y.UndoManager

Yjs ships with an Undo/Redo manager for selective undo/redo of changes on a Yjs type. The changes can optionally be scoped to transaction origins.

const ytext = doc.getText('text')
const undoManager = new Y.UndoManager(ytext)

ytext.insert(0, 'abc')
undoManager.undo()
ytext.toString() // => ''
undoManager.redo()
ytext.toString() // => 'abc'

constructor(scope:Y.AbstractType|Array<Y.AbstractType> [, {captureTimeout:number,trackedOrigins:Set<any>,deleteFilter:function(item):boolean}])

Accepts either a single type as scope or an array of types.
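
For example, to track changes on several types with a custom capture timeout (a brief sketch):

const undoManager = new Y.UndoManager([doc.getText('text'), doc.getMap('meta')], {
  captureTimeout: 500 // merge edits that happen within 500ms into a single stack item
})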

undo()

redo()

stopCapturing()

on('stack-item-added', { stackItem: { meta: Map<any,any> }, type: 'undo' | 'redo' })

Register an event that is called when a StackItem is added to the undo- or the redo-stack.

on('stack-item-popped', { stackItem: { meta: Map<any,any> }, type: 'undo' | 'redo' }) 

Register an event that is called when a StackItem is popped from the undo- or the redo-stack.

Example: Stop Capturing

UndoManager merges Undo-StackItems if they are created within a time gap smaller than options.captureTimeout. Call um.stopCapturing() so that the next StackItem won't be merged.

// without stopCapturing
ytext.insert(0, 'a')
ytext.insert(1, 'b')
undoManager.undo()
ytext.toString() // => '' (note that 'ab' was removed)
// with stopCapturing
ytext.insert(0, 'a')
undoManager.stopCapturing()
ytext.insert(0, 'b')
undoManager.undo()
ytext.toString() // => 'a' (note that only 'b' was removed)

Example: Specify tracked origins

Every change on the shared document has an origin. If no origin was specified, it defaults to null. By specifying trackedOrigins you can select which changes should be tracked by UndoManager. The UndoManager instance is always added to trackedOrigins.

class CustomBinding {}

const ytext = doc.getText('text')
const undoManager = new Y.UndoManager(ytext, {
  trackedOrigins: new Set([42, CustomBinding])
})

ytext.insert(0, 'abc')
undoManager.undo()
ytext.toString() // => 'abc' (not tracked because the origin `null` is not part
                 //           of `trackedOrigins`)
ytext.delete(0, 3) // revert change

doc.transact(() => {
  ytext.insert(0, 'abc')
}, 42)
undoManager.undo()
ytext.toString() // => '' (tracked because the origin 42 is part of `trackedOrigins`)

doc.transact(() => {
  ytext.insert(0, 'abc')
}, 41)
undoManager.undo()
ytext.toString() // => 'abc' (not tracked because 41 is not part of
                 //           `trackedOrigins`)
ytext.delete(0, 3) // revert change

doc.transact(() => {
  ytext.insert(0, 'abc')
}, new CustomBinding())
undoManager.undo()
ytext.toString() // => '' (tracked because the origin is a `CustomBinding` instance and
                 //        `CustomBinding` is part of `trackedOrigins`)

Example: Add additional information to the StackItems

When undoing or redoing a previous action, it is often expected to restore additional meta information like the cursor location or the view on the document. You can assign meta-information to Undo-/Redo-StackItems.

const ytext = doc.getText('text')
const undoManager = new Y.UndoManager(ytext, {
  trackedOrigins: new Set([42, CustomBinding])
})

undoManager.on('stack-item-added', event => {
  // save the current cursor location on the stack-item
  event.stackItem.meta.set('cursor-location', getRelativeCursorLocation())
})

undoManager.on('stack-item-popped', event => {
  // restore the current cursor location on the stack-item
  restoreCursorLocation(event.stackItem.meta.get('cursor-location'))
})

Miscellaneous

Typescript Declarations

Yjs has type descriptions. But until this ticket is fixed, this is how you can make use of Yjs type declarations.

{
  "compilerOptions": {
    "allowJs": true,
    "checkJs": true,
    "maxNodeModuleJsDepth": 5
  }
}

Yjs CRDT Algorithm

Conflict-free replicated data types (CRDT) for collaborative editing are an alternative approach to operational transformation (OT). A very simple differentiation between the two approaches is that OT attempts to transform index positions to ensure convergence (all clients end up with the same content), while CRDTs use mathematical models that usually do not involve index transformations, like linked lists. OT is currently the de-facto standard for shared editing on text. OT approaches that support shared editing without a central source of truth (a central server) require too much bookkeeping to be viable in practice. CRDTs are better suited for distributed systems, provide additional guarantees that the document can be synced with remote clients, and do not require a central source of truth.

Yjs implements a modified version of the algorithm described in this paper. I will eventually publish a paper that describes why this approach works so well in practice. Note: Since operations make up the document structure, we prefer the term struct now.

CRDTs suitable for shared text editing suffer from the fact that they only grow in size. There are CRDTs that do not grow in size, but they do not have the characteristics that are beneficial for shared text editing (like intention preservation). Yjs implements many improvements to the original algorithm that diminish the trade-off that the document only grows in size. We can't garbage collect deleted structs (tombstones) while ensuring a unique order of the structs. But we can (1) merge preceding structs into a single struct to reduce the amount of meta information, (2) delete content from a struct when it is deleted, and (3) garbage collect tombstones if we don't care about the order of the structs anymore (e.g. if the parent was deleted).

Examples:

  1. If a user inserts elements in sequence, the structs will be merged into a single struct. E.g. array.insert(0, ['a']), array.insert(0, ['b']); is first represented as two structs ([{id: {client, clock: 0}, content: 'a'}, {id: {client, clock: 1}, content: 'b'}]) and then merged into a single struct: [{id: {client, clock: 0}, content: 'ab'}].
  2. When a struct that contains content (e.g. ItemString) is deleted, the struct will be replaced with an ItemDeleted that does not contain content anymore.
  3. When a type is deleted, all child elements are transformed to GC structs. A GC struct only denotes the existence of a struct and that it is deleted. GC structs can always be merged with other GC structs if the id's are adjacent.

Especially when working on structured content (e.g. shared editing on ProseMirror), these improvements yield very good results when benchmarking random document edits. In practice they show even better results, because users usually edit text in sequence, resulting in structs that can easily be merged. The benchmarks show that even in the worst case scenario that a user edits text from right to left, Yjs achieves good performance even for huge documents.

State Vector

Yjs has the ability to exchange only the differences when syncing two clients. We use Lamport timestamps to identify structs and to track the order in which a client created them. Each struct has a struct.id = { client: number, clock: number } that uniquely identifies it. We define the next expected clock from each client as the state vector. This data structure is similar to the version vector data structure. But we use state vectors only to describe the state of the local document, so we can compute the missing structs for the remote client. We do not use it to track causality.


Yjs is network agnostic (p2p!), supports many existing rich text editors, offline editing, version snapshots, undo/redo and shared cursors. It scales well with an unlimited number of users and is well suited for even large documents.

👷‍♀️ If you are looking for professional support, please consider supporting this project via a "support contract" on GitHub Sponsors. I will attend to your issues more quickly and we can discuss questions and problems in regular video conferences. Otherwise you can find help on our community discussion board.

Who is using Yjs

  • Relm A collaborative gameworld for teamwork and community. :star2:
  • Input A collaborative note taking app. :star2:
  • Room.sh A meeting application with integrated collaborative drawing, editing, and coding tools. :star:
  • http://coronavirustechhandbook.com/ A collaborative wiki that is edited by thousands of different people to work on a rapid and sophisticated response to the coronavirus outbreak and subsequent impacts. :star:
  • Nimbus Note A note-taking app designed by Nimbus Web.
  • JoeDocs An open collaborative wiki.
  • Pluxbox RadioManager A web-based app to collaboratively organize radio broadcasts.
  • Cattaz A wiki that can run custom applications in the wiki pages.

Download Details:

Author: yjs
Source Code: https://github.com/yjs/yjs 
License: View license


s3git: git for Cloud Storage (or Version Control for Data)

s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git!

s3git is a simple CLI tool that allows you to create a distributed, decentralized and versioned repository. It scales limitlessly to 100s of millions of files and PBs of storage and stores your data safely in S3. Yet huge repos can be cloned on the SSD of your laptop for making local changes, committing and pushing back.

Exactly like git, s3git does not require any server-side components, just download and run the executable. It imports the golang package s3git-go that can be used from other applications as well. Or see the Python module or Ruby gem.

Use cases for s3git

  • Build and Release Management (see example with all Kubernetes releases).
  • DevOps Scenarios
  • Data Consolidation
  • Analytics
  • Photo and Video storage

See use cases for a detailed description of these use cases.

Download binaries

DISCLAIMER: These are PRE-RELEASE binaries -- use at your own peril for now

OSX

Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-darwin-amd64

$ mkdir s3git && cd s3git
$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-darwin-amd64
$ chmod +x s3git
$ export PATH=$PATH:${PWD}   # Add current dir where s3git has been downloaded to
$ s3git

Linux

Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-linux-amd64

$ mkdir s3git && cd s3git
$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-linux-amd64
$ chmod +x s3git
$ export PATH=$PATH:${PWD}   # Add current dir where s3git has been downloaded to
$ s3git

Windows

Download s3git.exe from https://github.com/s3git/s3git/releases/download/v0.9.1/s3git.exe

C:\Users\Username\Downloads> s3git.exe

Building from source

Build instructions are as follows (see install golang for setting up a working golang environment):

$ go get -d github.com/s3git/s3git
$ cd $GOPATH/src/github.com/s3git/s3git 
$ go install
$ s3git

BLAKE2 Tree Hashing and Storage Format

Read here how s3git uses the BLAKE2 Tree hashing mode for both deduplicated and hydrated storage (and here for info for BLAKE2 at scale).

Example workflow

Here is a simple workflow to create a new repository and populate it with some data:

$ mkdir s3git-repo && cd s3git-repo
$ s3git init
Initialized empty s3git repository in ...
$ # Just stream in some text
$ echo "hello s3git" | s3git add
Added: 18e622875a89cede0d7019b2c8afecf8928c21eac18ec51e38a8e6b829b82c3ef306dec34227929fa77b1c7c329b3d4e50ed9e72dc4dc885be0932d3f28d7053
$ # Add some more files
$ s3git add "*.mp4"
$ # Commit and log
$ s3git commit -m "My first commit"
$ s3git log --pretty

Push to cloud storage

$ # Add remote back end and push to it
$ s3git remote add "primary" -r s3://s3git-playground -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
$ s3git push
$ # Read back content
$ s3git cat 18e6
hello s3git

Note: Do not store any important info in the s3git-playground bucket. It will be auto-deleted within 24 hours.

Directory versioning

You can also use s3git for directory versioning. This allows you to 'capture' changes coherently all the way down from a directory and subsequently go back to previous versions of the full state of the directory (and not just any file). Think of it as a Time Machine for directories instead of individual files.

So instead of 'saving' a directory by making a full copy into 'MyFolder-v2' (and 'MyFolder-v3', etc.) you capture the state of a directory and give it a meaningful message ("Changed color to red") as version so it is always easy to go back to the version you are looking for.

In addition you can discard any uncommitted changes that you made and go back to the last version that you have captured, which basically means you can (after committing) mess around in a directory and then rest assured that you can always go back to its original state.

If you push your repository into the cloud then you will have an automatic backup and additionally you can easily collaborate with other people.

Lastly, it of course works with huge binary data too, not just with text files as in the following 'demo' example:

$ mkdir dir-versioning && cd dir-versioning
$ s3git init .
$ # Just create a single file
$ echo "First line" > text.txt && ls -l
-rw-rw-r-- 1 ec2-user ec2-user 11 May 25 09:06 text.txt
$ #
$ # Create initial snapshot
$ s3git snapshot create -m "Initial snapshot" .
$ # Add new line to initial file and create another file
$ echo "Second line" >> text.txt && echo "Another file" > text2.txt && ls -l
-rw-rw-r-- 1 ec2-user ec2-user 23 May 25 09:08 text.txt
-rw-rw-r-- 1 ec2-user ec2-user 13 May 25 09:08 text2.txt
$ s3git snapshot status .
     New: /home/ec2-user/dir-versioning/text2.txt
Modified: /home/ec2-user/dir-versioning/text.txt
$ #
$ # Create second snapshot
$ s3git snapshot create -m "Second snapshot" .
$ s3git log --pretty
3a4c3466264904fed3d52a1744fb1865b21beae1a79e374660aa231e889de41191009afb4795b61fdba9c156 Second snapshot
77a8e169853a7480c9a738c293478c9923532f56fcd02e3276142a1a29ac7f0006b5dff65d5ca245255f09fa Initial snapshot
$ more text.txt
First line
Second line
$ more text2.txt
Another file
$ #
$ # Go back one version in time
$ s3git snapshot checkout . HEAD^
$ more text.txt
First line
$ more text2.txt
text2.txt: No such file or directory
$ #
$ # Switch back to latest revision
$ s3git snapshot checkout .
$ more text2.txt
Another file

Note that snapshotting works for all files in the directory including any subdirectories. Click the following link for a more elaborate repository that includes all releases of the Kubernetes project.

Clone the YFCC100M dataset

Clone a large repo with 100 million files totaling 11.5 TB in size (Multimedia Commons), yet requiring only 7 GB local disk space.

(Note that this takes about 7 minutes on an SSD-equipped MacBook Pro with a 500 Mbit/s download connection, so for less powerful hardware you may want to skip to the next section (or, if you lack 7 GB of local disk space, try a df -h . first). Then again, it is quite a few files...)

$ s3git clone s3://s3git-100m -a "AKIAI26TSIF6JIMMDSPQ" -s "5NvshAhI0KMz5Gbqkp7WNqXYlnjBjkf9IaJD75x7"
Cloning into ...
Done. Totaling 97,974,749 objects.
$ cd s3git-100m
$ # List all files starting with '123456'
$ s3git ls 123456
12345649755b9f489df2470838a76c9df1d4ee85e864b15cf328441bd12fdfc23d5b95f8abffb9406f4cdf05306b082d3773f0f05090766272e2e8c8b8df5997
123456629a711c83c28dc63f0bc77ca597c695a19e498334a68e4236db18df84a2cdd964180ab2fcf04cbacd0f26eb345e09e6f9c6957a8fb069d558cadf287e
123456675eaecb4a2984f2849d3b8c53e55dd76102a2093cbca3e61668a3dd4e8f148a32c41235ab01e70003d4262ead484d9158803a1f8d74e6acad37a7a296
123456e6c21c054744742d482960353f586e16d33384f7c42373b908f7a7bd08b18768d429e01a0070fadc2c037ef83eef27453fc96d1625e704dd62931be2d1
$ s3git cat cafebad > olympic.jpg
$ # List and count total nr of files
$ s3git ls | wc -l
97974749

Fork that repo

Below is an example for alice and bob working together on a repository.

$ mkdir alice && cd alice
alice $ s3git clone s3://s3git-spoon-knife -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
Cloning into .../alice/s3git-spoon-knife
Done. Totaling 0 objects.
alice $ cd s3git-spoon-knife
alice $ # add a file filled with zeros
alice $ dd if=/dev/zero count=1 | s3git add
Added: 3ad6df690177a56092cb1ac7e9690dcabcac23cf10fee594030c7075ccd9c5e38adbaf58103cf573b156d114452b94aa79b980d9413331e22a8c95aa6fb60f4e
alice $ # add 9 more files (with random content)
alice $ for n in {1..9}; do dd if=/dev/urandom count=1 | s3git add; done
alice $ # commit
alice $ s3git commit -m "Commit from alice"
alice $ # and push
alice $ s3git push

Clone it again as bob on a different computer/different directory/different universe:

$ mkdir bob && cd bob
bob $ s3git clone s3://s3git-spoon-knife -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
Cloning into .../bob/s3git-spoon-knife
Done. Totaling 10 objects.
bob $ cd s3git-spoon-knife
bob $ # Check if we can access our empty file
bob $ s3git cat 3ad6 | hexdump
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00000200
bob $ # add another 10 files
bob $ for n in {1..10}; do dd if=/dev/urandom count=1 | s3git add; done
bob $ # commit
bob $ s3git commit -m "Commit from bob"
bob $ # and push back
bob $ s3git push

Switch back to alice again to pull the new content:

alice $ s3git pull
Done. Totaling 20 objects.
alice $ s3git log --pretty
3f67a4789e2a820546745c6fa40307aa490b7167f7de770f118900a28e6afe8d3c3ec8d170a19977cf415d6b6c5acb78d7595c825b39f7c8b20b471a84cfbee0 Commit from bob
a48cf36af2211e350ec2b05c98e9e3e63439acd1e9e01a8cb2b46e0e0d65f1625239bd1f89ab33771c485f3e6f1d67f119566523a1034e06adc89408a74c4bb3 Commit from alice

Note: Do not store any important info in the s3git-spoon-knife bucket. It will be auto-deleted within 24 hours.

Here is a nice screen recording:

asciicast

Happy forking!

You may be wondering about concurrent behaviour from

Integration with Minio

Instead of S3 you can happily use the Minio server, for example the public server at https://play.minio.io:9000. Just make sure you have a bucket created using mc (example below uses s3git-test):

$ mkdir minio-test && cd minio-test
$ s3git init 
$ s3git remote add "primary" -r s3://s3git-test -a "Q3AM3UQ867SPQQA43P2F" -s "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG" -e "https://play.minio.io:9000"
$ echo "hello minio" | s3git add
Added: c7bb516db796df8dcc824aec05db911031ab3ac1e5ff847838065eeeb52d4410b4d57f8df2e55d14af0b7b1d28362de1176cd51892d7cbcaaefb2cd3f616342f
$ s3git commit -m "Commit for minio test"
$ s3git push
Pushing 1 / 1 [==============================================================================================================================] 100.00 % 0

and clone it

$ s3git clone s3://s3git-test -a "Q3AM3UQ867SPQQA43P2F" -s "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG" -e "https://play.minio.io:9000"
Cloning into .../s3git-test
Done. Totaling 1 object.
$ cd s3git-test/
$ s3git ls
c7bb516db796df8dcc824aec05db911031ab3ac1e5ff847838065eeeb52d4410b4d57f8df2e55d14af0b7b1d28362de1176cd51892d7cbcaaefb2cd3f616342f
$ s3git cat c7bb
hello minio
$ s3git log --pretty
6eb708ec7dfd75d9d6a063e2febf16bab3c7a163e203fc677c8a9178889bac012d6b3fcda56b1eb160b1be7fa56eb08985422ed879f220d42a0e6ec80c5735ea Commit for minio test

Contributions

Contributions are welcome! Please see CONTRIBUTING.md.

Key features

Easy: Use a workflow and syntax that you already know and love

Fast: Lightning fast operation, especially on large files and huge repositories

Infinite scalability: Stop worrying about maximum repository sizes and have the ability to grow indefinitely

Work from local SSD: Make a huge cloud disk appear like a local drive

Instant sync: Push local changes and pull down instantly on other clones

Versioning: Keep previous versions safe and have the ability to undo or go back in time

Forking: Ability to make many variants by forking

Verifiable: Be sure that you have everything and be tamper-proof (“data has not been messed with”)

Deduplication: Do not store the same data twice

Simplicity: Simple by design and provide one way to accomplish tasks

Command Line Help

$ s3git help
s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git.

s3git is a simple CLI tool that allows you to create a distributed, decentralized and versioned repository.
It scales limitlessly to 100s of millions of files and PBs of storage and stores your data safely in S3.
Yet huge repos can be cloned on the SSD of your laptop for making local changes, committing and pushing back.

Usage:
  s3git [command]

Available Commands:
  add         Add stream or file(s) to the repository
  cat         Read a file from the repository
  clone       Clone a repository into a new directory
  commit      Commit the changes in the repository
  init        Create an empty repository
  log         Show commit log
  ls          List files in the repository
  pull        Update local repository
  push        Update remote repositories
  remote      Manage remote repositories
  snapshot    Manage snapshots
  status      Show changes in repository

Flags:
  -h, --help[=false]: help for s3git

Use "s3git [command] --help" for more information about a command.

FAQ

Q Is s3git compatible with git at the binary level?
A No. git is optimized for text content with very nice and powerful diffing and compressed storage, whereas s3git is more focused on large repos with primarily non-text blobs backed by cloud storage like S3.
Q Do you support encryption?
A No. However it is trivial to encrypt data before streaming into s3git add, e.g. pipe it through openssl enc or similar.
Q Do you support zipping?
A No. Again it is trivial to zip it before streaming into s3git add, e.g. pipe it through zip -r - . or similar.
Q Why don't you provide a FUSE interface?
A Supporting FUSE would mean introducing a lot of complexity related to POSIX which we would rather avoid.

Download Details:

Author: s3git
Source Code: https://github.com/s3git/s3git 
License: Apache-2.0 license


Gaia: A decentralized high-performance storage system

Overview

Gaia works by hosting data in one or more existing storage systems of the user's choice. These storage systems are typically cloud storage systems. We currently have driver support for S3 and Azure Blob Storage, but the driver model allows for other backend support as well. The point is, the user gets to choose where their data lives, and Gaia enables applications to access it via a uniform API.

Blockstack applications use the Gaia storage system to store data on behalf of a user. When the user logs in to an application, the authentication process gives the application the URL of a Gaia hub, which performs writes on behalf of that user. The Gaia hub authenticates writes to a location by requiring a valid authentication token, generated by a private key authorized to write at that location.

User Control: How is Gaia Decentralized?

Gaia's approach to decentralization focuses on user-control of data and storage. If a user can choose which gaia hub and which backend provider to store data with, then that is all the decentralization required to enable user-controlled applications.

In Gaia, the control of user data lies in the way that user data is accessed. When an application fetches a file data.txt for a given user alice.id, the lookup will follow these steps:

  1. Fetch the zonefile for alice.id, and read her profile URL from that zonefile
  2. Fetch Alice's profile and verify that it is signed by alice.id's key
  3. Read the application root URL (e.g. https://gaia.alice.org/) out of the profile
  4. Fetch file from https://gaia.alice.org/data.txt

Because alice.id controls her zonefile, she can change where her profile is stored, if the current storage of the profile is compromised. Similarly, if Alice wishes to change her gaia provider, or run her own gaia node, she can change the entry in her profile.

For applications writing directly on behalf of Alice, they do not need to perform this lookup. Instead, the stack.js authentication flow provides Alice's chosen application root URL to the application. This authentication flow is also within Alice's control, because the authentication response must be generated by Alice's browser.

While it is true that many Gaia hubs will use backend providers like AWS or Azure, allowing users to easily operate their own hubs (which may select different backend providers; and we'd like to implement more backend drivers) enables truly user-controlled data, while still providing high performance and high availability for data reads and writes.

Write-to and Read-from URL Guarantees

A performance and simplicity oriented guarantee of the Gaia specification is that when an application submits a write to a URL https://myhub.service.org/store/foo/bar, the application is guaranteed to be able to read from a URL https://myreads.com/foo/bar. While the prefix of the read-from URL may change between the two, the suffix must be the same as the write-to URL.

This allows an application to know exactly where a written file can be read from, given the read prefix. To obtain that read prefix, the Gaia service defines an endpoint:

GET /hub_info/

which returns a JSON object with a read_url_prefix.

For example, if my service returns:

{ ...,
  "read_url_prefix": "https://myservice.org/read/"
}

I know that if I submit a write request to:

https://myservice.org/store/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json

That I will be able to read that file from:

https://myservice.org/read/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json

Address-based Access-Control

Access control in a gaia storage hub is performed on a per-address basis. Writes to URLs /store/<address>/<file> are only allowed if the writer can demonstrate that they control that address. This is achieved via an authentication token, which is a message signed by the private-key associated with that address. The message itself is a challenge-text, returned via the /hub_info/ endpoint.

V1 Authentication Scheme

The V1 authentication scheme uses a JWT, prefixed with v1: as a bearer token in the HTTP authorization field. The expected JWT payload structure is:

{
 'type': 'object',
 'properties': {
   'iss': { 'type': 'string' },
   'exp': { 'type': 'IntDate' },
   'iat': { 'type': 'IntDate' },
   'gaiaChallenge': { 'type': 'string' },
   'associationToken': { 'type': 'string' },
   'salt': { 'type': 'string' }
 },
 'required': [ 'iss', 'gaiaChallenge' ]
}

In addition to iss, exp, and gaiaChallenge claims, clients may add other properties (e.g., a salt field) to the payload, and they will not affect the validity of the JWT. Rather, the validity of the JWT is checked by ensuring:

  1. That the JWT is signed correctly by verifying with the pubkey hex provided as iss
  2. That iss matches the address associated with the bucket.
  3. That gaiaChallenge is equal to the server's challenge text.
  4. That the epoch time exp is greater than the server's current epoch time.
  5. That the epoch time iat (issued-at date) is greater than the bucket's revocation date (only if such a date has been set by the bucket owner).

Association Tokens

The association token specification is considered private, as it is mostly used for internal Gaia use cases. This means that this specification can change or become deprecated in the future.

Oftentimes, a single user will use many different keys to store data. These keys may be generated on-the-fly. Instead of requiring the user to explicitly whitelist each key, the v1 authentication scheme allows the user to bind a key to an already-whitelisted key via an association token.

An association token is a JWT signed by a whitelisted key that, in turn, contains the public key that signs the authentication JWT that contains it. Put another way, the Gaia hub will accept a v1 authentication JWT if it contains an associationToken JWT that (1) was signed by a whitelisted address, and (2) identifies the signer of the authentication JWT.

The association token JWT has the following structure in its payload:

{
  'type': 'object',
  'properties': {
    'iss': { 'type': 'string' },
    'exp': { 'type': 'IntDate' },
    'iat': { 'type': 'IntDate' },
    'childToAssociate': { 'type': 'string' },
    'salt': { 'type': 'string' },
  },
  'required': [ 'iss', 'exp', 'childToAssociate' ]
}

Here, the iss field should be the public key of a whitelisted address. The childToAssociate should be equal to the iss field of the authentication JWT. Note that the exp field is required in association tokens.

Legacy authentication scheme

In more detail, this signed message is:

BASE64({ "signature" : ECDSA_SIGN(SHA256(challenge-text)),
         "publickey" : PUBLICKEY_HEX })

Currently, challenge-text must match the known challenge-text on the gaia storage hub. However, as future work enables more extensible forms of authentication, we could extend this to allow the auth token to include the challenge-text as well, which the gaia storage hub would then need to also validate.
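
A rough sketch of constructing such a token with Node's crypto module (the key variables are hypothetical; an ECDSA private key in PEM form and the matching public key as a hex string are assumed):

const crypto = require('crypto')

// privateKeyPem: an ECDSA private key in PEM format (assumed)
// publicKeyHex: the matching public key as a hex string (assumed)
function makeLegacyToken (challengeText, privateKeyPem, publicKeyHex) {
  const sign = crypto.createSign('SHA256')
  sign.update(challengeText)
  const signature = sign.sign(privateKeyPem, 'hex')
  return Buffer.from(JSON.stringify({ signature, publickey: publicKeyHex })).toString('base64')
}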

Data storage format

A gaia storage hub will store the written data exactly as given. This means that the storage hub does not provide many different kinds of guarantees about the data. It does not ensure that data is validly formatted, contains valid signatures, or is encrypted. Rather, the design philosophy is that these concerns are client-side concerns. Client libraries (such as stacks.js) are capable of providing these guarantees, and we use a liberal definition of the end-to-end principle to guide this design decision.

Operation of a Gaia Hub

Configuration files

A configuration JSON file should be stored either in the top-level directory of the hub server, or a file location may be specified in the environment variable CONFIG_PATH.

An example configuration file is provided in ./hub/config.sample.json. You can specify the logging level, the number of social proofs required for addresses to write to the system, the backend driver, the credentials for that backend driver, and the readURL for the storage provider.

Private hubs

A private hub services requests for a single user. This is controlled via whitelisting the addresses allowed to write files. In order to support application storage, because each application uses a different app- and user-specific address, each application you wish to use must be added to the whitelist separately.

Alternatively, the user's client must use the v1 authentication scheme and generate an association token for each app. The user should whitelist her address, and use her associated private key to sign each app's association token. This removes the need to whitelist each application, but with the caveat that the user needs to take care that her association tokens do not get misused.

Open-membership hubs

An open-membership hub will allow writes to any address's top-level directory; each request will still be validated so that write requests must provide valid authentication tokens for that address. Operating in this mode is recommended for service and identity providers who wish to support many different users.

In order to limit the users that may interact with such a hub to users who provide social proofs of identity, we support an execution mode where the hub checks that a user's profile.json object contains social proofs in order to be able to write to other locations. This can be configured via the config.json.

Driver model

Gaia hub drivers are fairly simple. The biggest requirement is the ability to fulfill the write-to/read-from URL guarantee.

A driver can expect that two modification operations to the same path will be mutually exclusive. No writes, renames, or deletes to the same path will be concurrent.

As currently implemented a gaia hub driver must implement the following functions:

interface DriverModel {

  /**
   * Return the prefix for reading files from.
   *  a write to the path `foo` should be readable from
   *  `${getReadURLPrefix()}foo`
   * @returns the read url prefix.
   */
  getReadURLPrefix(): string;

  /**
   * Performs the actual write of a file to `path`
   *   the file must be readable at `${getReadURLPrefix()}/${storageToplevel}/${path}`
   *
   * @param options.path - path of the file.
   * @param options.storageToplevel - the top level directory to store the file in
   * @param options.contentType - the HTTP content-type of the file
   * @param options.stream - the data to be stored at `path`
   * @param options.contentLength - the bytes of content in the stream
   * @param options.ifMatch - optional etag value to be used for optimistic concurrency control
   * @param options.ifNoneMatch - used with the `*` value to save a file that is not known to
   * exist, guaranteeing that another upload didn't happen first (whose data would otherwise be lost)
   * @returns Promise that resolves to an object containing a public-readable URL of the stored content and the objects etag value
   */
  performWrite(options: {
    path: string;
    storageTopLevel: string;
    stream: Readable;
    contentLength: number;
    contentType: string;
    ifMatch?: string;
    ifNoneMatch?: string;
  }): Promise<{
    publicURL: string,
    etag: string
  }>;

  /**
   * Deletes a file. Throws a `DoesNotExist` if the file does not exist. 
   * @param options.path - path of the file
   * @param options.storageTopLevel - the top level directory
   * @param  options.contentType - the HTTP content-type of the file
   */
  performDelete(options: {
    path: string;
    storageTopLevel: string;
  }): Promise<void>;

  /**
   * Renames a file given a path. Some implementations do not support
   * a first class move operation and this can be implemented as a copy and delete. 
   * @param options.path - path of the original file
   * @param options.storageTopLevel - the top level directory for the original file
   * @param options.newPath - new path for the file
   */
  performRename(options: {
    path: string;
    storageTopLevel: string;
    newPath: string;
  }): Promise<void>;

  /**
   * Retrieves metadata for a given file.
   * @param options.path - path of the file
   * @param options.storageTopLevel - the top level directory
   */
  performStat(options: {
    path: string;
    storageTopLevel: string;
  }): Promise<{
    exists: boolean;
    lastModifiedDate: number;
    contentLength: number;
    contentType: string;
    etag: string;
  }>;

  /**
   * Returns an object with a NodeJS stream.Readable for the file content
   * and metadata about the file.
   * @param options.path - path of the file
   * @param options.storageTopLevel - the top level directory
   */
  performRead(options: {
    path: string;
    storageTopLevel: string;
  }): Promise<{
    data: Readable;
    lastModifiedDate: number;
    contentLength: number;
    contentType: string;
    etag: string;
  }>;

  /**
   * Return a list of files beginning with the given prefix,
   * as well as a driver-specific page identifier for requesting
   * the next page of entries.  The return structure should
   * take the form { "entries": [string], "page"?: string }
   * @returns {Promise} the list of files and a possible page identifier.
   */
  listFiles(options: {
    pathPrefix: string;
    page?: string;
  }): Promise<{
    entries: string[];
    page?: string;
  }>;

  /**
   * Return a list of files beginning with the given prefix,
   * as well as file metadata, and a driver-specific page identifier
   * for requesting the next page of entries.
   */
  listFilesStat(options: {
    pathPrefix: string;
    page?: string;
  }): Promise<{
    entries: {
        name: string;
        lastModifiedDate: number;
        contentLength: number;
        etag: string;
    }[];
    page?: string;
  }>;
  
}

HTTP API

The Gaia storage API defines the following endpoints:


GET ${read-url-prefix}/${address}/${path}

This returns the data stored by the gaia hub at ${path}. The response headers include Content-Type and ETag, along with the required CORS headers Access-Control-Allow-Origin and Access-Control-Allow-Methods.
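
For example, a minimal read sketch with fetch (the prefix, address, and file name are placeholders):

const readUrlPrefix = 'https://myservice.org/read/' // obtained from GET /hub_info/
const response = await fetch(`${readUrlPrefix}${address}/data.txt`)
const etag = response.headers.get('ETag')
const content = await response.text()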


HEAD ${read-url-prefix}/${address}/${path}

Returns the same headers as the corresponding GET request. HEAD requests do not return a response body.


POST ${hubUrl}/store/${address}/${path}

This performs a write to the gaia hub at ${path}.

On success, it returns a 202 status, and a JSON object:

{
 "publicURL": "${read-url-prefix}/${address}/${path}",
 "etag": "version-identifier"
}

The POST must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.

Additionally, file ETags and conditional request headers are used as a concurrency control mechanism. All requests to this endpoint should contain either an If-Match header or an If-None-Match header. The three request types are as follows:

Update an existing file: this request must specify an If-Match header containing the most up-to-date ETag. If the file has been updated elsewhere and the ETag supplied in the If-Match header doesn't match that of the file in gaia, a 412 Precondition Failed error will be returned.

Create a new file: this request must specify the If-None-Match: * header. If a file already exists at the given path, a 412 Precondition Failed error will be returned.

Overwrite a file: this request must specify the If-Match: * header. Note that this bypasses concurrency control and should be used with caution. Improper use can cause bugs such as unintended data loss.

The file ETag is returned in the response body of the store POST request, in the response headers of GET and HEAD requests, and in the entries returned by list-files requests.

Additionally, a request to a file path that still has a previous request processing will return a 409 Conflict error. This can be handled with a retry.
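
As a non-normative sketch, a client call covering the three write modes might look like this; storeFile and its parameters are placeholders, and the bearer token must be generated as described in the access control section:

async function storeFile(
  hubUrl: string,
  address: string,
  path: string,
  token: string,
  body: string,
  mode: { ifMatch?: string; createOnly?: boolean } = {}
): Promise<{ publicURL: string; etag: string }> {
  const headers: Record<string, string> = {
    Authorization: `bearer ${token}`,
    'Content-Type': 'text/plain'
  }
  if (mode.createOnly) {
    headers['If-None-Match'] = '*'      // create: fail with 412 if the file already exists
  } else if (mode.ifMatch) {
    headers['If-Match'] = mode.ifMatch  // update: fail with 412 unless the ETag still matches ('*' overwrites unconditionally)
  }
  const res = await fetch(`${hubUrl}/store/${address}/${path}`, { method: 'POST', headers, body })
  if (res.status === 412) throw new Error('Precondition Failed: ETag mismatch or file already exists')
  if (res.status === 409) throw new Error('Conflict: a previous write to this path is still processing; retry')
  return res.json()
}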


DELETE ${hubUrl}/delete/${address}/${path}

This performs a deletion of a file in the gaia hub at ${path}.

On success, it returns a 202 status. It returns a 404 if the path does not exist and a 400 if the path is invalid.

The DELETE must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.
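
A corresponding deletion sketch, with the same placeholder caveats, might look like this:

async function deleteFile(hubUrl: string, address: string, path: string, token: string): Promise<void> {
  const res = await fetch(`${hubUrl}/delete/${address}/${path}`, {
    method: 'DELETE',
    headers: { Authorization: `bearer ${token}` }
  })
  if (res.status === 404) throw new Error('Path does not exist')
  if (res.status === 400) throw new Error('Invalid path')
}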


GET ${hubUrl}/hub_info/

Returns a JSON object:

{
 "challenge_text": "text-which-must-be-signed-to-validate-requests",
 "read_url_prefix": "${read-url-prefix}"
 "latest_auth_version": "v1"
}

The latest_auth_version field lets the client determine which authentication versions the gaia hub supports.
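
A minimal sketch of fetching this object (getHubInfo and hubUrl are placeholders):

async function getHubInfo(hubUrl: string): Promise<{
  challenge_text: string;
  read_url_prefix: string;
  latest_auth_version: string;
}> {
  const res = await fetch(`${hubUrl}/hub_info/`)
  return res.json()
}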


POST ${hubUrl}/revoke-all/${address}

The post body must be a JSON object with the following field:

{ "oldestValidTimestamp": "${timestamp}" }

The timestamp is an epoch time in seconds. It is written to a bucket-specific file (/${address}-auth) and becomes the oldest valid iat timestamp for authentication tokens that write to the /${address}/ bucket.

On success, it returns a 202 status, and a JSON object:

{ "status": "success" }

The POST must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.
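
A non-normative sketch of a revocation call, using the current time as the cutoff (revokeAll and its parameters are placeholders):

async function revokeAll(hubUrl: string, address: string, token: string): Promise<void> {
  const res = await fetch(`${hubUrl}/revoke-all/${address}`, {
    method: 'POST',
    headers: { Authorization: `bearer ${token}`, 'Content-Type': 'application/json' },
    // oldestValidTimestamp is an epoch time in seconds; tokens issued before it stop working
    body: JSON.stringify({ oldestValidTimestamp: Math.floor(Date.now() / 1000) })
  })
  if (!res.ok) throw new Error(`revoke-all failed with status ${res.status}`)
}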


POST ${hubUrl}/list-files/${address}

The post body can contain a page field with the pagination identifier from a previous request:

{ "page": "${lastListFilesResult.page}" }

If the post body contains a stat: true field then the returned JSON includes file metadata:

{
  "entries": [
    { "name": "string", "lastModifiedDate": "number", "contentLength": "number", "etag": "string" },
    { "name": "string", "lastModifiedDate": "number", "contentLength": "number", "etag": "string" },
    // ...
  ],
  "page": "string" // possible pagination marker
}

If the post body does not contain a stat: true field then the returned JSON entries will only be file name strings:

{
  "entries": [
    "fileNameExample1",
    "fileNameExample2",
    // ...
  ],
  "page": "string" // possible pagination marker
}

The POST must contain an authentication header with a bearer token. The bearer token's content and generation is described in the access control section of this document.
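
A client can keep requesting pages until no page marker is returned. The following is a non-normative sketch (listAllFiles and its parameters are placeholders) that collects plain file names, i.e. without stat: true:

async function listAllFiles(hubUrl: string, address: string, token: string): Promise<string[]> {
  const names: string[] = []
  let page: string | undefined
  do {
    const res = await fetch(`${hubUrl}/list-files/${address}`, {
      method: 'POST',
      headers: { Authorization: `bearer ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify(page ? { page } : {})
    })
    const body: { entries: string[]; page?: string } = await res.json()
    names.push(...body.entries)    // without stat: true, entries are plain file name strings
    page = body.page || undefined  // repeat while the hub returns a pagination marker
  } while (page)
  return names
}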


Future Design Goals

Dependency on DNS

The gaia specification requires that a gaia hub return a URL that a user's client will be able to fetch. In practice, most gaia hubs will use URLs with DNS entries for hostnames (though URLs with IP addresses would work as well). However, even though the spec uses URLs, that doesn't necessarily make an opinionated claim on underlying mechanisms for that URL. If a browser supported new URL schemes which enabled lookups without traditional DNS (for example, with the Blockstack Name System instead), then gaia hubs could return URLs implementing that scheme. As the Blockstack ecosystem develops and supports these kinds of features, we expect users will deploy gaia hubs that take advantage of them.

Extensibly limiting membership sets

Some service providers may wish to provide hub services to a limited set of different users, with a provider-specific method of authenticating that a user or address is within that set. In order to provide that functionality, our hub implementation would need to be extensible enough to allow plugging in different authentication models.

A .storage Namespace

Gaia nodes can request data from other Gaia nodes, and can store data to other Gaia nodes. In effect, Gaia nodes can be "chained together" in arbitrarily complex ways. This creates an opportunity to create a decentralized storage marketplace.

Example

For example, Alice can make her Gaia node public and program it to store data to her Amazon S3 bucket and her Dropbox account. Bob can then POST data to Alice's node, causing her node to replicate data to both providers. Later, Charlie can read Bob's data from Alice's node, causing Alice's node to fetch and serve back the data from her cloud storage. Neither Bob nor Charlie has to set up accounts on Amazon S3 or Dropbox this way, since Alice's node serves as an intermediary between them and the storage providers.

Since Alice is on the read/write path between Bob and Charlie and cloud storage, she has the opportunity to make optimizations. First, she can program her Gaia node to synchronously write data to local disk and asynchronously back it up to S3 and Dropbox. This would speed up Bob's writes, but at the cost of durability (i.e. Alice's node could crash before replicating to the cloud).

In addition, Alice can program her Gaia node to service all reads from disk. This would speed up Charlie's reads, since he'll get the latest data without having to hit back-end cloud storage providers.

Service Description

Since Alice is providing a service to Bob and Charlie, she will want compensation. This can be achieved by having both of them send her money via the underlying blockchain.

To do so, she would register her node's IP address in a .storage namespace in Blockstack, and post her per-gigabyte rates and payment address in her node's profile. Once Bob and Charlie send her payment, her node would begin accepting reads and writes from them up to the capacity purchased. They would continue sending payments as long as Alice provides them with service.

Other experienced Gaia node operators would register their nodes in .storage, and compete for users by offering better durability, availability, performance, extra storage features, and so on.

Notes on our deployed service

Our deployed service places some modest limits on file uploads and request rates. Currently, the service allows up to 20 write requests per second and a maximum file size of 5 MB. These limits apply only to our service; if you deploy your own Gaia hub, they are not necessary.

Project Comparison

Here's how Gaia stacks up against other decentralized storage systems. Features that are common to all storage systems are omitted for brevity.

| Features | Gaia | Sia | Storj | IPFS | DAT | SSB |
| --- | --- | --- | --- | --- | --- | --- |
| User controls where data is hosted | X |  |  |  |  |  |
| Data can be viewed in a normal Web browser | X |  |  | X |  |  |
| Data is read/write | X |  |  |  | X | X |
| Data can be deleted | X |  |  |  | X | X |
| Data can be listed | X | X | X |  | X | X |
| Deleted data space is reclaimed | X | X | X | X |  |  |
| Data lookups have predictable performance | X |  | X |  |  |  |
| Writes permission can be delegated | X |  |  |  |  |  |
| Listing permission can be delegated | X |  |  |  |  |  |
| Supports multiple backends natively | X |  | X |  |  |  |
| Data is globally addressable | X | X | X | X | X |  |
| Needs a cryptocurrency to work |  | X | X |  |  |  |
| Data is content-addressed |  | X | X | X | X | X |

This document describes the high-level design and implementation of the Gaia storage system, also briefly explained on docs.stacks.co. It includes specifications for backend storage drivers and interactions between developer APIs and the Gaia service.

Developers who wish to use the Gaia storage system should see the stacks.js documentation here and in particular the storage package here.

Instructions on setting up, configuring and testing a Gaia Hub can be found here and here.


Author: Stacks-network
Source Code: https://github.com/stacks-network/gaia 
License: MIT license

#javascript #typescript #decentralized 

Gaia: A Decentralized High-performance Storage System

Sourcify: A Solidity Source Code and Metadata Verification Tool

Sourcify (sourcify.dev) is a Solidity source code and metadata verification tool.

The project aims to serve as an infrastructure for other tools with an open repository of verified contracts as well as an API and other services. The goal is to make contract interactions on the blockchain safer and more user-friendly by making open-source contract code, the contract ABI, and NatSpec comments available via the contract metadata.

ℹ️ This monorepo contains the main services and the verification UI. The sourcifyeth GitHub organization contains all other auxiliary services and components.

Documentation

For more details refer to docs.sourcify.dev

Questions?

🔍 Check out the F.A.Q. in the docs and use the docs search.

💬 Chat with us on Gitter or Matrix chat

🐦 Follow us and help us spread the word on Twitter.

Adding a new chain

If you'd like to add a new chain support to Sourcify please follow the chain support instructions in docs.

Download Details:
Author: ethereum
Source Code: https://github.com/ethereum/sourcify
License: MIT License

#blockchain  #solidity  #ethereum  #smartcontract #decentralized 

Sourcify: A Solidity Source Code and Metadata Verification Tool
Callum  Owen

Callum Owen

1649939700

What is a DEX (Decentralized Exchange) and How It Works?

In this video, we explain what a DEX, or Decentralized Exchange is and how it works using a neat and simple animation for you to understand. Decentralized exchanges (DEX) are a type of cryptocurrency exchange which allows for direct peer-to-peer cryptocurrency transactions to take place online securely and without the need for an intermediary.

#decentralized #blockchain #decentralizedexchange #dex 

What is a DEX (Decentralized Exchange) and How It Works?
Callum  Owen

Callum Owen

1648120860

What Are DApps? and 12 Decentralized Application Examples

dApps, or Decentralized Applications, are apps that run using blockchain technology. They are permissionless, open-source, and allow the use of cryptocurrency for a wide variety of purposes.

#dapp #blockchain #decentralized 

What Are DApps? and 12 Decentralized Application Examples
Hans  Marvin

Hans Marvin

1648069500

What is DeFi (Decentralized Finance) ? With Examples

You might be searching the web trying to figure out "What is DeFi?". Well, in this video we explain exactly what DeFi (or decentralized finance) is and cover the 5 main pillars holding it up, including examples to help illustrate our points.

1) Stablecoins
2) Lending and Borrowing
3) Decentralized Exchanges
4) Insurance
5) Margin Trading

#defi #blockchain #decentralized #DecentralizedFinance

What is DeFi (Decentralized Finance) ? With Examples
Callum  Owen

Callum Owen

1648055220

What is a DAO in Crypto? (Decentralized Autonomous Organization)

A DAO, or a Decentralized Autonomous Organization, is a company set up to run by code on the blockchain. The people who own tokens associated with the DAO are responsible for voting on changes and proposing new ideas to keep the DAO up and running and improving.

#dao #blockchain #decentralized 

What is a DAO in Crypto? (Decentralized Autonomous Organization)
Hans  Marvin

Hans Marvin

1648007640

What is Uniswap Decentralized Exchange?

In this video, we explain what the Uniswap Decentralized Exchange is, including the purpose of it and the reason for the UNI token. There are 2 main uses for Uniswap.

So you could be a token swapper and change your ETH for BAT for a fee of only 0.03%, which is useful to many traders. In fact, Uniswap currently sees over one billion dollars in trading volume each day.

Another main use is you could be a liquidity provider, and earn a return on your investment by lending it to the pool. Right now, there are 9 billion dollars in all of the pools on the Uniswap exchange.

#uniswap #decentralized #UNIToken

What is Uniswap Decentralized Exchange?
Callum  Owen

Callum Owen

1648000020

What is Sushiswap? (Animated) Sushi Token + Kashi + Miso Explained

What is Sushiswap? Sushiswap is a decentralized exchange that was founded by copying Uniswap's code and offering better rates. The history of the Sushi token is quite unique, including how Chef Nomi rug pulled, and then gave back to the community and apologized.  They offer a few other interesting DeFi tools like Kashi and Miso and we explain them in this video.

#sushiswap #blockchain #ethereum #decentralized 

What is Sushiswap? (Animated) Sushi Token + Kashi + Miso Explained
Callum  Owen

Callum Owen

1647948780

What Is Yield Farming? What You Need To Know

If you're sticking your toes into the world of Decentralized Finance (DeFi), you may want to learn about Yield Farming. So what is Yield Farming? In the simplest terms, Yield Farming is finding the best places to put your crypto so that you can earn more free crypto. In this video, we go over 4 popular examples of common yield farming methods.

#defi #decentralized #blockchain #decentralizedfinance #farm 

What Is Yield Farming? What You Need To Know
Nikita  Koelpin

Nikita Koelpin

1647419331

How to Use Radicle CLI To Host Web3 Git

In this video I'll show how to use Radicle - both the Radicle CLI as well as the network - to host your app source code on a decentralized network instead of relying on a centralized service like GitHub or Bitbucket.

https://github.com/radicle-dev/radicle-cli

#web3 #git #Radicle #cli #decentralized 

How to Use Radicle CLI To Host Web3 Git

Smart Contract Development Guide for Business Owners

https://www.blog.duomly.com/smart-contract-development-guide/

As a business owner, you may have heard of the term “smart contract” and wondered what it is and how it could benefit your business. Smart contracts are digital contracts that use blockchain technology to automate the negotiation and execution of transactions.

If you’d like to learn more about smart contract development, please read our guide on how to get started. In it, we’ll teach you the basics of smart contract development and how to apply it to your business.

#blockchain #hyperledger #softwaredevelopment #developer #development #decentralized #blockchaindevelopment #smartcontract

Smart Contract Development Guide for Business Owners

Blockchain Development Guide for Business Owners

https://www.blog.duomly.com/blockchain-development-guide/

This article introduces you to blockchain development basics, how much it costs, finding developers, and where to look for them. 

#blockchain #hyperledger #software-development #developer #development #decentralized 

Blockchain Development Guide for Business Owners

NFT Development Guide for Business Owners

https://www.blog.duomly.com/nft-development-guide/

If you’re a business owner, you know that staying ahead of the competition is key to success. And to stay ahead, you need to be constantly innovating and evolving your business model. But how do you do that? How can you create something new when everything around you seems so familiar?

One way to develop new ideas is to explore the world of NFT development. NFTs are a relatively new technology, and there are still many possibilities for what they can be used for. So if you’re looking for ways to take your business to the next level, then NFT development may be just what you need.

#blockchain #hyperledger #web3 #nft #business #businesses #token #tokenization #tokens #decentralized #p2p #entrepreneur #entrepreneurs #startup 

NFT Development Guide for Business Owners