Royce Reinger

Sacred: A Tool to Help You Configure, Organize, Log

Sacred

Every experiment is sacred

Every experiment is great

If an experiment is wasted

God gets quite irate

Sacred is a tool to help you configure, organize, log and reproduce experiments. It is designed to do all the tedious overhead work that you need to do around your actual experiment in order to:

  • keep track of all the parameters of your experiment
  • easily run your experiment for different settings
  • save configurations for individual runs in a database
  • reproduce your results

Sacred achieves this through the following main mechanisms:

  • Config Scopes: A very convenient way of using the local variables of a function to define the parameters your experiment uses.
  • Config Injection: You can access all parameters of your configuration from every function. They are automatically injected by name.
  • Command-line interface: You get a powerful command-line interface for each experiment that you can use to change parameters and run different variants.
  • Observers: Sacred provides Observers that log all kinds of information about your experiment, its dependencies, the configuration you used, the machine it is run on, and of course the result. These can be saved to a MongoDB for easy access later (see the sketch below).
  • Automatic seeding helps control the randomness in your experiments, so that the results remain reproducible.
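
For example, a MongoDB observer can be attached with a single append call. A minimal sketch, assuming a MongoDB instance is reachable on localhost and using an illustrative experiment name (on older Sacred versions you may need MongoObserver.create(...) instead):

from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('my_experiment')
# store run information (config, host info, results, ...) in MongoDB
ex.observers.append(MongoObserver(url='localhost:27017', db_name='sacred'))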

Example

Script to train an SVM on the iris dataset:

from numpy.random import permutation
from sklearn import svm, datasets

C = 1.0
gamma = 0.7

iris = datasets.load_iris()
perm = permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]
clf = svm.SVC(C=C, kernel='rbf',
        gamma=gamma)
clf.fit(iris.data[:90],
        iris.target[:90])
print(clf.score(iris.data[90:],
                iris.target[90:]))
The same script as a Sacred experiment:

from numpy.random import permutation
from sklearn import svm, datasets
from sacred import Experiment
ex = Experiment('iris_rbf_svm')

@ex.config
def cfg():
  C = 1.0
  gamma = 0.7

@ex.automain
def run(C, gamma):
  iris = datasets.load_iris()
  per = permutation(iris.target.size)
  iris.data = iris.data[per]
  iris.target = iris.target[per]
  clf = svm.SVC(C=C, kernel='rbf',
          gamma=gamma)
  clf.fit(iris.data[:90],
          iris.target[:90])
  return clf.score(iris.data[90:],
                   iris.target[90:])
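
Because the configuration is defined through a config scope, every parameter can be overridden from the command line without touching the code. A few illustrative invocations, assuming the script above is saved as iris_rbf_svm.py:

# run with the default configuration
python iris_rbf_svm.py

# override parameters for a single run
python iris_rbf_svm.py with C=2.0 gamma=0.1

# show the resulting configuration without running the experiment
python iris_rbf_svm.py print_config with C=2.0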

Documentation

The documentation is hosted at ReadTheDocs.

Installing

You can directly install it from the Python Package Index with pip:

pip install sacred

Or, if you want to do it manually, you can check out the current version from git and install it yourself:

git clone https://github.com/IDSIA/sacred.git

cd sacred

python setup.py install

You might also want to install the numpy and pymongo packages. They are optional dependencies, but they offer some cool features:

pip install numpy pymongo

Tests

The tests for sacred use the pytest package. You can execute them by running pytest in the sacred directory like this:

pytest

There is also a config file for tox, so you can automatically run the tests for various Python versions like this:

tox

Update pytest version

If you update or change the pytest version, the following files need to be changed:

  • dev-requirements.txt
  • tox.ini
  • test/test_utils.py
  • setup.py

Contributing

If you find a bug, have a feature request, or want to discuss something general, you are welcome to open an issue. If you have a specific question related to the usage of Sacred, please ask it on StackOverflow under the python-sacred tag. We value documentation a lot. If you find something that should be included in the documentation, please document it or let us know what's missing. If you are using Sacred in one of your projects and want to share your code with others, add your repo to the Projects using Sacred list (docs/projects_using_sacred.rst). Pull requests are highly welcome!

Frontends

At this point there are several frontends to the database entries created by Sacred (that I'm aware of). They are developed externally as separate projects.

Omniboard

docs/images/omniboard-table.png

docs/images/omniboard-metric-graphs.png

Omniboard is a web dashboard that helps in visualizing the experiments and metrics / logs collected by sacred. Omniboard is written with React, Node.js, Express and Bootstrap.

Incense

docs/images/incense-artifact.png

docs/images/incense-metric.png

Incense is a Python library to retrieve runs stored in a MongoDB and interactively display metrics and artifacts in Jupyter notebooks.

Sacredboard

docs/images/sacredboard.png

Sacredboard is a web-based dashboard interface to the sacred runs stored in a MongoDB.

Neptune

docs/images/neptune-compare.png

docs/images/neptune-collaboration.png

Neptune is a metadata store for MLOps, built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata via an API available for both the Python and R programming languages:

docs/images/neptune-query-api.png

In order to log your sacred experiments to Neptune, all you need to do is add an observer:

from neptune.new.integrations.sacred import NeptuneObserver
ex.observers.append(NeptuneObserver(api_token='<YOUR_API_TOKEN>',
                                    project='<YOUR_WORKSPACE/YOUR_PROJECT>'))

For more info, check the Neptune + Sacred integration guide.

SacredBrowser

docs/images/sacred_browser.png

SacredBrowser is a PyQt4 application to browse the MongoDB entries created by Sacred experiments. Features include custom queries, sorting of the results, access to the stored source code, and more. No installation is required, and it can connect to a local database or over the network.

Prophet

Prophet is an early, now discontinued prototype of a web interface to the MongoDB entries created by Sacred experiments. It requires you to run RestHeart to access the database.

Related Projects

Sumatra

Sumatra is a tool for managing and tracking projects based on numerical simulation and/or analysis, with the aim of supporting reproducible research. It can be thought of as an automated electronic lab notebook for computational projects.

Sumatra takes a different approach by providing command-line tools to initialize a project and then run arbitrary code (not just Python). It tracks information about all runs in a SQL database and even provides a nice browser tool. It integrates less tightly with the code to be run, which makes it easily applicable to non-Python experiments. But that also means it requires more setup for each experiment, and configuration needs to be done using files. Use this project if you need to run non-Python experiments, or if you are okay with the additional setup/configuration overhead.

Future Gadget Laboratory

FGLab is a machine learning dashboard, designed to make prototyping experiments easier. Experiment details and results are sent to a database, which allows analytics to be performed after their completion. The server is FGLab, and the clients are FGMachines.

Similar to Sumatra, FGLab is an external tool that can keep track of runs from any program. Projects are configured via a JSON schema and the program needs to accept these configurations via command-line options. FGLab also takes the role of a basic scheduler by distributing runs over several machines.


Download Details:

Author: IDSIA
Source Code: https://github.com/IDSIA/sacred 
License: MIT license

#machinelearning #python #infrastructure #mongodb 

Royce Reinger

Catalyst: Accelerated deep learning R&D

Catalyst

Accelerated Deep Learning R&D

Catalyst is a PyTorch framework for Deep Learning Research and Development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop. 
Break the cycle – use the Catalyst!

Catalyst at PyTorch Ecosystem Day 2021

 

Catalyst poster

Catalyst at PyTorch Developer Day 2021

Catalyst poster


Getting started

pip install -U catalyst
import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.02)
loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False), batch_size=32),
}

runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)

# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk=(1, 3, 5)),
        dl.PrecisionRecallF1SupportCallback(input_key="logits", target_key="targets"),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
)

# model evaluation
metrics = runner.evaluate_loader(
    loader=loaders["valid"],
    callbacks=[dl.AccuracyCallback(input_key="logits", target_key="targets", topk=(1, 3, 5))],
)

# model inference
for prediction in runner.predict_loader(loader=loaders["valid"]):
    assert prediction["logits"].detach().cpu().numpy().shape[-1] == 10

# model post-processing
model = runner.model.cpu()
batch = next(iter(loaders["valid"]))[0]
utils.trace_model(model=model, batch=batch)
utils.quantize_model(model=model)
utils.prune_model(model=model, pruning_fn="l1_unstructured", amount=0.8)
utils.onnx_export(model=model, batch=batch, file="./logs/mnist.onnx", verbose=True)

Step-by-step Guide

  1. Start with Catalyst — A PyTorch Framework for Accelerated Deep Learning R&D introduction.
  2. Try the notebook tutorials or check the minimal examples for a first deep dive.
  3. Read blog posts with use-cases and guides.
  4. Learn machine learning with our "Deep Learning with Catalyst" course.
  5. And finally, join our Slack if you want to chat with the team and contributors.

Overview

Catalyst helps you implement compact but full-featured Deep Learning pipelines with just a few lines of code. You get a training loop with metrics, early-stopping, model checkpointing, and other features without the boilerplate.
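
For instance, early stopping is just another callback, and checkpointing comes with the logdir. A minimal sketch that extends the Getting started snippet above (same model, criterion, optimizer, loaders, and runner; the exact callback arguments may vary slightly between Catalyst versions):

runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=10,
    logdir="./logs",            # checkpoints and logs are written here
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    callbacks=[
        # stop training early if the validation loss does not improve for 3 epochs
        dl.EarlyStoppingCallback(loader_key="valid", metric_key="loss", minimize=True, patience=3),
    ],
    verbose=True,
)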

Installation

Generic installation:

pip install -U catalyst

Specialized versions (extra requirements might apply):

 

pip install catalyst[ml]         # installs ML-based Catalyst
pip install catalyst[cv]         # installs CV-based Catalyst
# master version installation
pip install git+https://github.com/catalyst-team/catalyst@master --upgrade
# all available extensions are listed here:
# https://github.com/catalyst-team/catalyst/blob/master/setup.py

 

Catalyst is compatible with Python 3.7+ and PyTorch 1.4+.
Tested on Ubuntu 16.04/18.04/20.04, macOS 10.15, Windows 10, and Windows Subsystem for Linux.

Documentation

  • master
  • 22.02
  • 2021 edition

 

Minimal Examples

CustomRunner – PyTorch for-loop decomposition

import os
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.Adam(model.parameters(), lr=0.02)

train_data = MNIST(os.getcwd(), train=True)
valid_data = MNIST(os.getcwd(), train=False)
loaders = {
    "train": DataLoader(train_data, batch_size=32),
    "valid": DataLoader(valid_data, batch_size=32),
}

class CustomRunner(dl.Runner):
    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.engine.device))

    def on_loader_start(self, runner):
        super().on_loader_start(runner)
        self.meters = {
            key: metrics.AdditiveMetric(compute_on_call=False)
            for key in ["loss", "accuracy01", "accuracy03"]
        }

    def handle_batch(self, batch):
        # model train/valid step
        # unpack the batch
        x, y = batch
        # run model forward pass
        logits = self.model(x)
        # compute the loss
        loss = F.cross_entropy(logits, y)
        # compute the metrics
        accuracy01, accuracy03 = metrics.accuracy(logits, y, topk=(1, 3))
        # log metrics
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.meters[key].update(self.batch_metrics[key].item(), self.batch_size)
        # run model backward pass
        if self.is_train_loader:
            self.engine.backward(loss)
            self.optimizer.step()
            self.optimizer.zero_grad()

    def on_loader_end(self, runner):
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.loader_metrics[key] = self.meters[key].compute()[0]
        super().on_loader_end(runner)

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
)
# model inference
for logits in runner.predict_loader(loader=loaders["valid"]):
    assert logits.detach().cpu().numpy().shape[-1] == 10

ML - linear regression

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    num_epochs=8,
    verbose=True,
)

ML - multiclass classification

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples,) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy03",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", num_classes=num_classes),
        # uncomment for extra metrics:
        # dl.PrecisionRecallF1SupportCallback(
        #     input_key="logits", target_key="targets", num_classes=num_classes
        # ),
        # dl.AUCCallback(input_key="logits", target_key="targets"),
        # catalyst[ml] required ``pip install catalyst[ml]``
        # dl.ConfusionMatrixCallback(
        #     input_key="logits", target_key="targets", num_classes=num_classes
        # ),
    ],
)

ML - multilabel classification

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, num_classes) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy01",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.AUCCallback(input_key="scores", target_key="targets"),
        # uncomment for extra metrics:
        # dl.MultilabelAccuracyCallback(input_key="scores", target_key="targets", threshold=0.5),
        # dl.MultilabelPrecisionRecallF1SupportCallback(
        #     input_key="scores", target_key="targets", threshold=0.5
        # ),
    ]
)

ML - multihead classification

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes1, num_classes2 = int(1e4), int(1e1), 4, 10
X = torch.rand(num_samples, num_features)
y1 = (torch.rand(num_samples,) * num_classes1).to(torch.int64)
y2 = (torch.rand(num_samples,) * num_classes2).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y1, y2)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

class CustomModule(nn.Module):
    def __init__(self, in_features: int, out_features1: int, out_features2: int):
        super().__init__()
        self.shared = nn.Linear(in_features, 128)
        self.head1 = nn.Linear(128, out_features1)
        self.head2 = nn.Linear(128, out_features2)

    def forward(self, x):
        x = self.shared(x)
        y1 = self.head1(x)
        y2 = self.head2(x)
        return y1, y2

# model, criterion, optimizer, scheduler
model = CustomModule(num_features, num_classes1, num_classes2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, [2])

class CustomRunner(dl.Runner):
    def handle_batch(self, batch):
        x, y1, y2 = batch
        y1_hat, y2_hat = self.model(x)
        self.batch = {
            "features": x,
            "logits1": y1_hat,
            "logits2": y2_hat,
            "targets1": y1,
            "targets2": y2,
        }

# model training
runner = CustomRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=3,
    verbose=True,
    callbacks=[
        dl.CriterionCallback(metric_key="loss1", input_key="logits1", target_key="targets1"),
        dl.CriterionCallback(metric_key="loss2", input_key="logits2", target_key="targets2"),
        dl.MetricAggregationCallback(metric_key="loss", metrics=["loss1", "loss2"], mode="mean"),
        dl.BackwardCallback(metric_key="loss"),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        dl.AccuracyCallback(
            input_key="logits1", target_key="targets1", num_classes=num_classes1, prefix="one_"
        ),
        dl.AccuracyCallback(
            input_key="logits2", target_key="targets2", num_classes=num_classes2, prefix="two_"
        ),
        # catalyst[ml] required ``pip install catalyst[ml]``
        # dl.ConfusionMatrixCallback(
        #     input_key="logits1", target_key="targets1", num_classes=num_classes1, prefix="one_cm"
        # ),
        # dl.ConfusionMatrixCallback(
        #     input_key="logits2", target_key="targets2", num_classes=num_classes2, prefix="two_cm"
        # ),
        dl.CheckpointCallback(
            logdir="./logs/one",
            loader_key="valid", metric_key="one_accuracy01", minimize=False, topk=1
        ),
        dl.CheckpointCallback(
            logdir="./logs/two",
            loader_key="valid", metric_key="two_accuracy03", minimize=False, topk=3
        ),
    ],
    loggers={"console": dl.ConsoleLogger(), "tb": dl.TensorboardLogger("./logs/tb")},
)

 

ML – RecSys

 

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_users, num_features, num_items = int(1e4), int(1e1), 10
X = torch.rand(num_users, num_features)
y = (torch.rand(num_users, num_items) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_items)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=3,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.CriterionCallback(input_key="logits", target_key="targets", metric_key="loss"),
        # uncomment for extra metrics:
        # dl.AUCCallback(input_key="scores", target_key="targets"),
        # dl.HitrateCallback(input_key="scores", target_key="targets", topk=(1, 3, 5)),
        # dl.MRRCallback(input_key="scores", target_key="targets", topk=(1, 3, 5)),
        # dl.MAPCallback(input_key="scores", target_key="targets", topk=(1, 3, 5)),
        # dl.NDCGCallback(input_key="scores", target_key="targets", topk=(1, 3, 5)),
        dl.BackwardCallback(metric_key="loss"),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        dl.CheckpointCallback(
            logdir="./logs", loader_key="valid", metric_key="loss", minimize=True
        ),
    ]
)

CV - MNIST classification

import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.02)

train_data = MNIST(os.getcwd(), train=True)
valid_data = MNIST(os.getcwd(), train=False)
loaders = {
    "train": DataLoader(train_data, batch_size=32),
    "valid": DataLoader(valid_data, batch_size=32),
}

runner = dl.SupervisedRunner()
# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
# uncomment for extra metrics:
#     callbacks=[
#         dl.AccuracyCallback(input_key="logits", target_key="targets", num_classes=10),
#         dl.PrecisionRecallF1SupportCallback(
#             input_key="logits", target_key="targets", num_classes=10
#         ),
#         dl.AUCCallback(input_key="logits", target_key="targets"),
#         # catalyst[ml] required ``pip install catalyst[ml]``
#         dl.ConfusionMatrixCallback(
#             input_key="logits", target_key="targets", num_classes=10
#         ),
#     ]
)

CV - MNIST segmentation

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.losses import IoULoss


model = nn.Sequential(
    nn.Conv2d(1, 1, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(1, 1, 3, 1, 1), nn.Sigmoid(),
)
criterion = IoULoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

train_data = MNIST(os.getcwd(), train=True)
valid_data = MNIST(os.getcwd(), train=False)
loaders = {
    "train": DataLoader(train_data, batch_size=32),
    "valid": DataLoader(valid_data, batch_size=32),
}

class CustomRunner(dl.SupervisedRunner):
    def handle_batch(self, batch):
        x = batch[self._input_key]
        x_noise = (x + torch.rand_like(x)).clamp_(0, 1)
        x_ = self.model(x_noise)
        self.batch = {self._input_key: x, self._output_key: x_, self._target_key: x}

runner = CustomRunner(
    input_key="features", output_key="scores", target_key="targets", loss_key="loss"
)
# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.IOUCallback(input_key="scores", target_key="targets"),
        dl.DiceCallback(input_key="scores", target_key="targets"),
        dl.TrevskyCallback(input_key="scores", target_key="targets", alpha=0.2),
    ],
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
)

CV - MNIST metric learning

import os
from torch.optim import Adam
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.data import HardTripletsSampler
from catalyst.contrib.datasets import MnistMLDataset, MnistQGDataset
from catalyst.contrib.losses import TripletMarginLossWithSampler
from catalyst.contrib.models import MnistSimpleNet
from catalyst.data.sampler import BatchBalanceClassSampler


# 1. train and valid loaders
train_dataset = MnistMLDataset(root=os.getcwd())
sampler = BatchBalanceClassSampler(
    labels=train_dataset.get_labels(), num_classes=5, num_samples=10, num_batches=10
)
train_loader = DataLoader(dataset=train_dataset, batch_sampler=sampler)

valid_dataset = MnistQGDataset(root=os.getcwd(), gallery_fraq=0.2)
valid_loader = DataLoader(dataset=valid_dataset, batch_size=1024)

# 2. model and optimizer
model = MnistSimpleNet(out_features=16)
optimizer = Adam(model.parameters(), lr=0.001)

# 3. criterion with triplets sampling
sampler_inbatch = HardTripletsSampler(norm_required=False)
criterion = TripletMarginLossWithSampler(margin=0.5, sampler_inbatch=sampler_inbatch)

# 4. training with catalyst Runner
class CustomRunner(dl.SupervisedRunner):
    def handle_batch(self, batch) -> None:
        if self.is_train_loader:
            images, targets = batch["features"].float(), batch["targets"].long()
            features = self.model(images)
            self.batch = {"embeddings": features, "targets": targets,}
        else:
            images, targets, is_query = \
                batch["features"].float(), batch["targets"].long(), batch["is_query"].bool()
            features = self.model(images)
            self.batch = {"embeddings": features, "targets": targets, "is_query": is_query}

callbacks = [
    dl.ControlFlowCallbackWrapper(
        dl.CriterionCallback(input_key="embeddings", target_key="targets", metric_key="loss"),
        loaders="train",
    ),
    dl.ControlFlowCallbackWrapper(
        dl.CMCScoreCallback(
            embeddings_key="embeddings",
            labels_key="targets",
            is_query_key="is_query",
            topk=[1],
        ),
        loaders="valid",
    ),
    dl.PeriodicLoaderCallback(
        valid_loader_key="valid", valid_metric_key="cmc01", minimize=False, valid=2
    ),
]

runner = CustomRunner(input_key="features", output_key="embeddings")
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    callbacks=callbacks,
    loaders={"train": train_loader, "valid": valid_loader},
    verbose=False,
    logdir="./logs",
    valid_loader="valid",
    valid_metric="cmc01",
    minimize_valid_metric=False,
    num_epochs=10,
)

CV - MNIST GAN

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.layers import GlobalMaxPool2d, Lambda

latent_dim = 128
generator = nn.Sequential(
    # We want to generate 128 coefficients to reshape into a 7x7x128 map
    nn.Linear(128, 128 * 7 * 7),
    nn.LeakyReLU(0.2, inplace=True),
    Lambda(lambda x: x.view(x.size(0), 128, 7, 7)),
    nn.ConvTranspose2d(128, 128, (4, 4), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.ConvTranspose2d(128, 128, (4, 4), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 1, (7, 7), padding=3),
    nn.Sigmoid(),
)
discriminator = nn.Sequential(
    nn.Conv2d(1, 64, (3, 3), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, (3, 3), stride=(2, 2), padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    GlobalMaxPool2d(),
    nn.Flatten(),
    nn.Linear(128, 1),
)

model = nn.ModuleDict({"generator": generator, "discriminator": discriminator})
criterion = {"generator": nn.BCEWithLogitsLoss(), "discriminator": nn.BCEWithLogitsLoss()}
optimizer = {
    "generator": torch.optim.Adam(generator.parameters(), lr=0.0003, betas=(0.5, 0.999)),
    "discriminator": torch.optim.Adam(discriminator.parameters(), lr=0.0003, betas=(0.5, 0.999)),
}
train_data = MNIST(os.getcwd(), train=False)
loaders = {"train": DataLoader(train_data, batch_size=32)}

class CustomRunner(dl.Runner):
    def predict_batch(self, batch):
        batch_size = 1
        # Sample random points in the latent space
        random_latent_vectors = torch.randn(batch_size, latent_dim).to(self.engine.device)
        # Decode them to fake images
        generated_images = self.model["generator"](random_latent_vectors).detach()
        return generated_images

    def handle_batch(self, batch):
        real_images, _ = batch
        batch_size = real_images.shape[0]

        # Sample random points in the latent space
        random_latent_vectors = torch.randn(batch_size, latent_dim).to(self.engine.device)

        # Decode them to fake images
        generated_images = self.model["generator"](random_latent_vectors).detach()
        # Combine them with real images
        combined_images = torch.cat([generated_images, real_images])

        # Assemble labels discriminating real from fake images
        labels = \
            torch.cat([torch.ones((batch_size, 1)), torch.zeros((batch_size, 1))]).to(self.engine.device)
        # Add random noise to the labels - important trick!
        labels += 0.05 * torch.rand(labels.shape).to(self.engine.device)

        # Discriminator forward
        combined_predictions = self.model["discriminator"](combined_images)

        # Sample random points in the latent space
        random_latent_vectors = torch.randn(batch_size, latent_dim).to(self.engine.device)
        # Assemble labels that say "all real images"
        misleading_labels = torch.zeros((batch_size, 1)).to(self.engine.device)

        # Generator forward
        generated_images = self.model["generator"](random_latent_vectors)
        generated_predictions = self.model["discriminator"](generated_images)

        self.batch = {
            "combined_predictions": combined_predictions,
            "labels": labels,
            "generated_predictions": generated_predictions,
            "misleading_labels": misleading_labels,
        }


runner = CustomRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    callbacks=[
        dl.CriterionCallback(
            input_key="combined_predictions",
            target_key="labels",
            metric_key="loss_discriminator",
            criterion_key="discriminator",
        ),
        dl.BackwardCallback(metric_key="loss_discriminator"),
        dl.OptimizerCallback(
            optimizer_key="discriminator",
            metric_key="loss_discriminator",
        ),
        dl.CriterionCallback(
            input_key="generated_predictions",
            target_key="misleading_labels",
            metric_key="loss_generator",
            criterion_key="generator",
        ),
        dl.BackwardCallback(metric_key="loss_generator"),
        dl.OptimizerCallback(
            optimizer_key="generator",
            metric_key="loss_generator",
        ),
    ],
    valid_loader="train",
    valid_metric="loss_generator",
    minimize_valid_metric=True,
    num_epochs=20,
    verbose=True,
    logdir="./logs_gan",
)

# visualization (matplotlib required):
# import matplotlib.pyplot as plt
# %matplotlib inline
# plt.imshow(runner.predict_batch(None)[0, 0].cpu().numpy())

CV - MNIST VAE

import os
import torch
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.contrib.datasets import MNIST

LOG_SCALE_MAX = 2
LOG_SCALE_MIN = -10

def normal_sample(loc, log_scale):
    scale = torch.exp(0.5 * log_scale)
    return loc + scale * torch.randn_like(scale)

class VAE(nn.Module):
    def __init__(self, in_features, hid_features):
        super().__init__()
        self.hid_features = hid_features
        self.encoder = nn.Linear(in_features, hid_features * 2)
        self.decoder = nn.Sequential(nn.Linear(hid_features, in_features), nn.Sigmoid())

    def forward(self, x, deterministic=False):
        z = self.encoder(x)
        bs, z_dim = z.shape

        loc, log_scale = z[:, : z_dim // 2], z[:, z_dim // 2 :]
        log_scale = torch.clamp(log_scale, LOG_SCALE_MIN, LOG_SCALE_MAX)

        z_ = loc if deterministic else normal_sample(loc, log_scale)
        z_ = z_.view(bs, -1)
        x_ = self.decoder(z_)

        return x_, loc, log_scale

class CustomRunner(dl.IRunner):
    def __init__(self, hid_features, logdir, engine):
        super().__init__()
        self.hid_features = hid_features
        self._logdir = logdir
        self._engine = engine

    def get_engine(self):
        return self._engine

    def get_loggers(self):
        return {
            "console": dl.ConsoleLogger(),
            "csv": dl.CSVLogger(logdir=self._logdir),
            "tensorboard": dl.TensorboardLogger(logdir=self._logdir),
        }

    @property
    def num_epochs(self) -> int:
        return 1

    def get_loaders(self):
        loaders = {
            "train": DataLoader(MNIST(os.getcwd(), train=False), batch_size=32),
            "valid": DataLoader(MNIST(os.getcwd(), train=False), batch_size=32),
        }
        return loaders

    def get_model(self):
        model = self.model if self.model is not None else VAE(28 * 28, self.hid_features)
        return model

    def get_optimizer(self, model):
        return optim.Adam(model.parameters(), lr=0.02)

    def get_callbacks(self):
        return {
            "backward": dl.BackwardCallback(metric_key="loss"),
            "optimizer": dl.OptimizerCallback(metric_key="loss"),
            "checkpoint": dl.CheckpointCallback(
                self._logdir,
                loader_key="valid",
                metric_key="loss",
                minimize=True,
                topk=3,
            ),
        }

    def on_loader_start(self, runner):
        super().on_loader_start(runner)
        self.meters = {
            key: metrics.AdditiveMetric(compute_on_call=False)
            for key in ["loss_ae", "loss_kld", "loss"]
        }

    def handle_batch(self, batch):
        x, _ = batch
        x = x.view(x.size(0), -1)
        x_, loc, log_scale = self.model(x, deterministic=not self.is_train_loader)

        loss_ae = F.mse_loss(x_, x)
        loss_kld = (
            -0.5 * torch.sum(1 + log_scale - loc.pow(2) - log_scale.exp(), dim=1)
        ).mean()
        loss = loss_ae + loss_kld * 0.01

        self.batch_metrics = {"loss_ae": loss_ae, "loss_kld": loss_kld, "loss": loss}
        for key in ["loss_ae", "loss_kld", "loss"]:
            self.meters[key].update(self.batch_metrics[key].item(), self.batch_size)

    def on_loader_end(self, runner):
        for key in ["loss_ae", "loss_kld", "loss"]:
            self.loader_metrics[key] = self.meters[key].compute()[0]
        super().on_loader_end(runner)

    def predict_batch(self, batch):
        random_latent_vectors = torch.randn(1, self.hid_features).to(self.engine.device)
        generated_images = self.model.decoder(random_latent_vectors).detach()
        return generated_images

runner = CustomRunner(128, "./logs", dl.CPUEngine())
runner.run()
# visualization (matplotlib required):
# import matplotlib.pyplot as plt
# %matplotlib inline
# plt.imshow(runner.predict_batch(None)[0].cpu().numpy().reshape(28, 28))

AutoML - hyperparameters optimization with Optuna

import os
import optuna
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.contrib.datasets import MNIST


def objective(trial):
    lr = trial.suggest_loguniform("lr", 1e-3, 1e-1)
    num_hidden = int(trial.suggest_loguniform("num_hidden", 32, 128))

    train_data = MNIST(os.getcwd(), train=True)
    valid_data = MNIST(os.getcwd(), train=False)
    loaders = {
        "train": DataLoader(train_data, batch_size=32),
        "valid": DataLoader(valid_data, batch_size=32),
    }
    model = nn.Sequential(
        nn.Flatten(), nn.Linear(784, num_hidden), nn.ReLU(), nn.Linear(num_hidden, 10)
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    runner = dl.SupervisedRunner(input_key="features", output_key="logits", target_key="targets")
    runner.train(
        model=model,
        criterion=criterion,
        optimizer=optimizer,
        loaders=loaders,
        callbacks={
            "accuracy": dl.AccuracyCallback(
                input_key="logits", target_key="targets", num_classes=10
            ),
            # catalyst[optuna] required ``pip install catalyst[optuna]``
            "optuna": dl.OptunaPruningCallback(
                loader_key="valid", metric_key="accuracy01", minimize=False, trial=trial
            ),
        },
        num_epochs=3,
    )
    score = trial.best_score
    return score

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(
        n_startup_trials=1, n_warmup_steps=0, interval_steps=1
    ),
)
study.optimize(objective, n_trials=3, timeout=300)
print(study.best_value, study.best_params)

Config API - minimal example

runner:
  _target_: catalyst.runners.SupervisedRunner
  model:
    _var_: model
    _target_: torch.nn.Sequential
    args:
      - _target_: torch.nn.Flatten
      - _target_: torch.nn.Linear
        in_features: 784  # 28 * 28
        out_features: 10
  input_key: features
  output_key: &output_key logits
  target_key: &target_key targets
  loss_key: &loss_key loss

run:
  # ≈ stage 1
  - _call_: train  # runner.train(...)

    criterion:
      _target_: torch.nn.CrossEntropyLoss

    optimizer:
      _target_: torch.optim.Adam
      params:  # model.parameters()
        _var_: model.parameters
      lr: 0.02

    loaders:
      train:
        _target_: torch.utils.data.DataLoader
        dataset:
          _target_: catalyst.contrib.datasets.MNIST
          root: data
          train: y
        batch_size: 32

      &valid_loader_key valid:
        &valid_loader
        _target_: torch.utils.data.DataLoader
        dataset:
          _target_: catalyst.contrib.datasets.MNIST
          root: data
          train: n
        batch_size: 32

    callbacks:
      - &accuracy_metric
        _target_: catalyst.callbacks.AccuracyCallback
        input_key: *output_key
        target_key: *target_key
        topk: [1,3,5]
      - _target_: catalyst.callbacks.PrecisionRecallF1SupportCallback
        input_key: *output_key
        target_key: *target_key

    num_epochs: 1
    logdir: logs
    valid_loader: *valid_loader_key
    valid_metric: *loss_key
    minimize_valid_metric: y
    verbose: y

  # ≈ stage 2
  - _call_: evaluate_loader  # runner.evaluate_loader(...)
    loader: *valid_loader
    callbacks:
      - *accuracy_metric

catalyst-run --config example.yaml

Tests

All Catalyst code, features, and pipelines are fully tested. We also have our own catalyst-codestyle and a corresponding pre-commit hook. During testing, we train a variety of different models: image classification, image segmentation, text classification, GANs, and much more. We then compare their convergence metrics in order to verify the correctness of the training procedure and its reproducibility. As a result, Catalyst provides fully tested and reproducible best practices for your deep learning research and development.
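
To run the same style checks locally before sending a PR, the standard pre-commit workflow applies (assuming the repository ships a pre-commit configuration, as mentioned above):

pip install pre-commit
pre-commit install            # register the git hook in your clone
pre-commit run --all-files    # run all configured checks once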

Blog Posts

Talks

Community

Accelerated with Catalyst

Research Papers

Blog Posts

Competitions

Toolkits

Other

See other projects at the GitHub dependency graph.

If your project implements a paper, a notable use-case/tutorial, or a Kaggle competition solution, or if your code simply presents interesting results and uses Catalyst, we would be happy to add your project to the list above! Do not hesitate to send us a PR with a brief description of the project similar to the above.

Contribution Guide

We appreciate all contributions. If you are planning to contribute back bug-fixes, there is no need to run that by us; just send a PR. If you plan to contribute new features, new utility functions, or extensions, please open an issue first and discuss it with us.

User Feedback

We've created feedback@catalyst-team.com as an additional channel for user feedback.

  • If you like the project and want to thank us, this is the right place.
  • If you would like to start a collaboration between your team and Catalyst team to improve Deep Learning R&D, you are always welcome.
  • If you don't like Github Issues and prefer email, feel free to email us.
  • Finally, if you do not like something, please, share it with us, and we can see how to improve it.

We appreciate any type of feedback. Thank you!

Acknowledgments

Since the beginning of Catalyst's development, a lot of people have influenced it in a lot of different ways.

Catalyst.Team

Catalyst.Contributors

Trusted by

Citation

Please use this bibtex if you want to cite this repository in your publications:

@misc{catalyst,
    author = {Kolesnikov, Sergey},
    title = {Catalyst - Accelerated deep learning R&D},
    year = {2018},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/catalyst-team/catalyst}},
}

Download Details:

Author: Catalyst-team
Source Code: https://github.com/catalyst-team/catalyst 
License: Apache-2.0 license

#machinelearning #python #infrastructure #naturallanguageprocessing 

Hermann Frami

Pipecd: The one CD for All {applications, Platforms, Operations}

Pipecd

A GitOps-style continuous delivery platform that provides a consistent deployment and operations experience for any application

Overview

PipeCD provides a unified continuous delivery solution for multiple application kinds across multiple clouds, empowering engineers to deploy faster and with more confidence. It is a GitOps tool that enables deployment operations through pull requests on Git.

Visibility

  • The deployment pipeline UI clearly shows what is happening
  • Separate log viewer for each individual deployment
  • Real-time visualization of application state
  • Deployment notifications to Slack and webhook endpoints
  • Insights show metrics like lead time, deployment frequency, MTTR, and change failure rate to measure delivery performance

Automation

  • Automated deployment analysis to measure deployment impact based on metrics, logs, emitted requests
  • Automatically roll back to the previous state as soon as analysis or a pipeline stage fails
  • Automatically detect configuration drift to notify and render the changes
  • Automatically trigger a new deployment when a defined event has occurred (e.g. container image pushed, helm chart published, etc)

Safety and Security

  • Support single sign-on and role-based access control
  • Credentials are not exposed outside the cluster and not saved in the control-plane
  • Piped makes only outbound requests and can run inside a restricted network
  • Built-in secrets management

Multi-provider & Multi-Tenancy

  • Support multiple application kinds on multi-cloud including Kubernetes, Terraform, Cloud Run, AWS Lambda
  • Support multiple analysis providers including Prometheus, Datadog, Stackdriver, and more
  • Easy to operate multi-cluster, multi-tenancy by separating control-plane and piped

Explore PipeCD docs » Play with live demo »

Download Details:

Author: Pipe-cd
Source Code: https://github.com/pipe-cd/pipecd 
License: Apache-2.0 license

#serverless #kubernetes #infrastructure #lambda 

Nigel Uys

Ansible Best Practises

Ansible Best Practises

If infrastructure is to be treated as code, then the projects that manage it must be treated as software projects. As your infrastructure code gets bigger and bigger, you have more problems to deal with: code layout, variable precedence, small hacks here and there. Therefore, the organization of your code is very important, and in this repository you can find some of the best practices (in our opinion) for managing your infrastructure code. The problems that are addressed are:

  • Overall organization
  • How to manage external roles
  • Usage of variables
  • Naming
  • Staging
  • Complexity of plays
  • Encryption of data (e.g. passwords, certificates)
  • Installation of ansible and module dependencies

TL;DR

  • Do not manage external roles in your repository manually; use ansible-galaxy
  • Do not use pre_tasks, tasks, or post_tasks in your play; use roles to reuse the code
  • Keep all your variables in one place, if possible
  • Do not use variables in your play
  • Use variables in the roles instead of hard-coding
  • Keep the names consistent between groups, plays, variables, and roles
  • Different environments (development, test, production) must be as close as possible, if not identical
  • Do not put your passwords or certificates as plain text in your git repo; use ansible-vault to encrypt them
  • Use tags in your play
  • Keep all your ansible dependencies in a single place and make the installation dead simple

1. Directory Layout

This is the directory layout of this repository with an explanation.

production.ini            # inventory file for production stage
development.ini           # inventory file for development stage
test.ini                  # inventory file for test stage
vpass                     # ansible-vault password file
                          # This file should not be committed into the repository,
                          # therefore it is ignored by git
group_vars/
    all/                  # variables under this directory belongs all the groups
        apt.yml           # ansible-apt role variable file for all groups
    webservers/           # here we assign variables to webservers groups
        apt.yml           # Each file will correspond to a role i.e. apt.yml
        nginx.yml         # ""
    postgresql/           # here we assign variables to postgresql groups
        postgresql.yml    # Each file will correspond to a role i.e. postgresql
        postgresql-password.yml   # Encrypted password file
plays/
    ansible.cfg           # Ansible.cfg file that holds all ansible config
    webservers.yml        # playbook for webserver tier
    postgresql.yml        # playbook for postgresql tier

roles/
    roles_requirements.yml  # All the information about the roles
    external/             # All the roles that are in git or ansible galaxy
                          # Roles that are in roles_requirements.yml file will be downloaded into this directory
    internal/             # All the roles that are not public

extensions/
    setup/                # All the setup files for updating roles and ansible dependencies

2. How to Manage Roles

It is a bad habit to manually manage roles that are developed by other developers in your git repository. It is also important to separate them, so that you can distinguish the ones that are external and can be updated from the ones that are internal. Therefore, you can use ansible-galaxy to install the roles you need, at the location you need, by simply defining them in roles_requirements.yml:

---
- src: ANXS.build-essential
  version: "v1.0.1"

Roles can be downloaded/updated with this command:

./extensions/setup/role_update.sh

This command will delete all external roles and download everything from scratch. This is good practice, as it prevents you from making local changes to the external roles.
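
Under the hood, such an update script can be as simple as wiping the external directory and reinstalling from the requirements file. A rough sketch of the idea (paths follow the layout above; the exact flags in the repository's script may differ):

#!/bin/bash
set -e
cd "$(dirname "$0")/../../roles"
# remove previously downloaded external roles
rm -rf external/*
# re-download everything defined in roles_requirements.yml into external/
ansible-galaxy install -r roles_requirements.yml -p external/ --force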

3. Keep your plays simple

If you want to take advantage of roles, you have to keep your plays simple. Therefore, do not add any tasks to your main play; your play should consist only of the list of roles it depends on. Here is an example:

---

- name: postgresql.yml | All roles
  hosts: postgresql
  sudo: True

  roles:
    - { role: common,                   tags: ["common"] }
    - { role: ANXS.postgresql,          tags: ["postgresql"] }

As you can see, there are also no variables in this play. You can use variables in many different ways in Ansible, but to keep things simple and easier to maintain, do not use variables in plays. Furthermore, use tags; they give you fine-grained control over role execution, as shown below.
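
Tags then let you run only part of a play. For example, with the play above (inventory paths follow this repository's layout):

# run only the postgresql role
ansible-playbook -i ../development.ini postgresql.yml --tags "postgresql"

# run everything except the common role
ansible-playbook -i ../development.ini postgresql.yml --skip-tags "common"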

4. Stages

Most likely you will need different stages (e.g. test, development, production) for the product you are developing or helping to develop. A good way to manage different stages is to have multiple inventory files; as you can see in this repository, there are three. Each stage should be as identical as possible, which also means you should use as few host variables as possible. It is best not to use them at all.

5. Variables

Variables are wonderful: they allow you to reuse all this existing code by just setting some values. Ansible offers many different ways to use variables. However, as soon as your project starts to get bigger, the more you spread variables here and there, the more problems you will encounter. Therefore, it is good practice to keep all your variables in one place, and this place happens to be group_vars. They are not host dependent, so this will also help you to have a better staging environment. Furthermore, if you have internal roles that you have developed, keep the variables out of them as well, so you can reuse them easily.

6. Name consistency

If you want to maintain your code, keep the names consistent between your plays, inventories, roles, and group variables. Use the name of the role to separate the different variables in each group. For instance, if you are using the role nginx under the webservers play, the variables that belong to nginx should be located under group_vars/webservers/nginx.yml. What this effectively means is that group_vars supports directories, and every file inside the group's directory will be loaded. You can, of course, put all of them in a single file as well, but this gets messy, therefore don't do it.
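
For example, the nginx variables for the webservers group would live in a file like this (the variable names below are purely illustrative):

# group_vars/webservers/nginx.yml
nginx_worker_processes: 2
nginx_keepalive_timeout: 65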

7. Encrypting Passwords and Certificates

It is very likely that you will have passwords or certificates in your repository. It is not good practice to put them in a repository as plain text. You can use ansible-vault to encrypt sensitive data. You can refer to postgresql-password.yml in the group variables to see the encrypted file, and to postgresql-password-plain.yml to see the plain-text file, commented out. To decrypt the file, you need the vault password, which you can place in your root directory, but it MUST NOT be committed to your git repository. You should share the password with your coworkers via some method other than committing it to the git repo.
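
The typical ansible-vault workflow with a password file looks like this (paths follow the layout above):

# encrypt the sensitive variable file using the vault password stored in ./vpass
ansible-vault encrypt group_vars/postgresql/postgresql-password.yml --vault-password-file vpass

# view or edit it later
ansible-vault view group_vars/postgresql/postgresql-password.yml --vault-password-file vpass
ansible-vault edit group_vars/postgresql/postgresql-password.yml --vault-password-file vpass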

There is also git-crypt, which allows you to work with a key or GPG. It is more transparent in daily work than ansible-vault.

8. Project Setup

Since it should be very easy to set up the working environment, all the required packages that Ansible needs, as well as Ansible itself, should be installable very easily. This allows newcomers or developers to start using the Ansible project quickly and easily. Therefore, a python_requirements.txt file is located at:

extensions/setup/python_requirements.txt

This structure helps you keep your dependencies in a single place and makes it easier to install everything, including Ansible. All you have to do is execute the setup file:

./extensions/setup/setup.sh

Running the Code

Code in this repo is functional and tested. To run it, you need to install Ansible and all the dependencies. You can do this simply by executing:

./extensions/setup/setup.sh

  • If you already have Ansible and do not want to go through the installation, simply create a vpass text file in the root directory and add the secret code (123456)
  • To install the roles, execute role_update.sh, which will download all the roles:

./extensions/setup/role_update.sh

  • Go to the plays directory and then execute the following command (do not forget to change the host address in development.ini):

ansible-playbook -i ../development.ini webservers.yml

Download Details:

Author: Enginyoyen
Source Code: https://github.com/enginyoyen/ansible-best-practises 
License: MIT license

#ansible #infrastructure #automate 

Nigel Uys

Ansible-role-docker: Ansible Role - Docker

Ansible Role: Docker

An Ansible Role that installs Docker on Linux.

Requirements

None.

Role Variables

Available variables are listed below, along with default values (see defaults/main.yml):

# Edition can be one of: 'ce' (Community Edition) or 'ee' (Enterprise Edition).
docker_edition: 'ce'
docker_packages:
    - "docker-{{ docker_edition }}"
    - "docker-{{ docker_edition }}-cli"
    - "docker-{{ docker_edition }}-rootless-extras"
docker_packages_state: present

The docker_edition should be either ce (Community Edition) or ee (Enterprise Edition). You can also specify a specific version of Docker to install, using the distribution-specific format (note that you have to apply it to all packages):

  • Red Hat/CentOS: docker-{{ docker_edition }}-<VERSION>
  • Debian/Ubuntu: docker-{{ docker_edition }}=<VERSION>

You can control whether the packages are installed, uninstalled, or at the latest version by setting docker_packages_state to present, absent, or latest, respectively. Note that the Docker daemon will be automatically restarted if the Docker package is updated. This is a side effect of flushing all handlers (running any of the handlers that have been notified by this and any other role up to this point in the play).
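
For example, pinning a specific Docker version on Debian/Ubuntu while keeping the packages installed might look like this (the version string below is illustrative):

docker_packages:
  - "docker-{{ docker_edition }}=5:20.10.21~3-0~ubuntu-jammy"
  - "docker-{{ docker_edition }}-cli=5:20.10.21~3-0~ubuntu-jammy"
  - "docker-{{ docker_edition }}-rootless-extras=5:20.10.21~3-0~ubuntu-jammy"
docker_packages_state: present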

docker_service_manage: true
docker_service_state: started
docker_service_enabled: true
docker_restart_handler_state: restarted

Variables to control the state of the docker service, and whether it should start on boot. If you're installing Docker inside a Docker container without systemd or sysvinit, you should set docker_service_manage to false.

docker_install_compose_plugin: false
docker_compose_package: docker-compose-plugin
docker_compose_package_state: present

Docker Compose Plugin installation options. These differ from the variables below in that docker-compose is installed as a Docker plugin (and used with docker compose) instead of as a standalone binary.

docker_install_compose: true
docker_compose_version: "1.26.0"
docker_compose_arch: "{{ ansible_architecture }}"
docker_compose_path: /usr/local/bin/docker-compose

Docker Compose installation options.

docker_repo_url: https://download.docker.com/linux

The main Docker repo URL, common between Debian and RHEL systems.

docker_apt_release_channel: stable
docker_apt_arch: "{{ 'arm64' if ansible_architecture == 'aarch64' else 'amd64' }}"
docker_apt_repository: "deb [arch={{ docker_apt_arch }}] {{ docker_repo_url }}/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} {{ docker_apt_release_channel }}"
docker_apt_ignore_key_error: True
docker_apt_gpg_key: "{{ docker_repo_url }}/{{ ansible_distribution | lower }}/gpg"

(Used only for Debian/Ubuntu.) You can switch the channel to nightly if you want to use the Nightly release.

You can change docker_apt_gpg_key to a different url if you are behind a firewall or provide a trustworthy mirror. Usually in combination with changing docker_apt_repository as well.

docker_yum_repo_url: "{{ docker_repo_url }}/{{ (ansible_distribution == 'Fedora') | ternary('fedora','centos') }}/docker-{{ docker_edition }}.repo"
docker_yum_repo_enable_nightly: '0'
docker_yum_repo_enable_test: '0'
docker_yum_gpg_key: "{{ docker_repo_url }}/centos/gpg"

(Used only for RedHat/CentOS.) You can enable the Nightly or Test repo by setting the respective vars to 1.

You can change docker_yum_gpg_key to a different url if you are behind a firewall or provide a trustworthy mirror. Usually in combination with changing docker_yum_repo_url as well.

docker_users:
  - user1
  - user2

A list of system users to be added to the docker group (so they can use Docker on the server).

docker_daemon_options:
  storage-driver: "devicemapper"
  log-opts:
    max-size: "100m"

Custom dockerd options can be configured through this dictionary representing the json file /etc/docker/daemon.json.

Use with Ansible (and docker Python library)

Many users of this role wish to also use Ansible to then build Docker images and manage Docker containers on the server where Docker is installed. In this case, you can easily add in the docker Python library using the geerlingguy.pip role:

- hosts: all

  vars:
    pip_install_packages:
      - name: docker

  roles:
    - geerlingguy.pip
    - geerlingguy.docker

Dependencies

None.

Example Playbook

- hosts: all
  roles:
    - geerlingguy.docker

Sponsors

  • We Manage: Helping start-ups and grown-ups scaling their infrastructure in a sustainable way.

The above sponsor(s) are supporting Jeff Geerling on GitHub Sponsors. You can sponsor Jeff's work too, to help him continue improving these Ansible open source projects!

Download Details:

Author: Geerlingguy
Source Code: https://github.com/geerlingguy/ansible-role-docker 
License: MIT license

#ansible #docker #infrastructure #debian #ubuntu 

Ansible-role-docker: Ansible Role - Docker

Dharavi Redevelopment Plan

As plans to redevelop Dharavi, one of Asia's largest slum clusters, emerge, innovators and technologists see an unprecedented opportunity to go beyond simple infrastructure development and create a technology-centric environment that can boost and fortify the socio-economic status of the region.

While India is ready for the vast investments and the opportunities for development that come along with it, how can the country leverage its current digital vision to deliver accelerated development plans for its underprivileged communities? 

Get the full scoop in our blog: https://bit.ly/3j3iPEM

#dharavi #adanigroup #dharaviredevelopment #digitalecosystem #digitaltransformation #infrastructure #technology

Dharavi Redevelopment Plan
Royce  Reinger

Royce Reinger

1668061440

Cortex: Production infrastructure for Machine Learning At Scale

Cortex

Production infrastructure for machine learning at scale

Deploy, manage, and scale machine learning models in production.

Serverless workloads

Realtime - respond to requests in real-time and autoscale based on in-flight request volumes.

Async - process requests asynchronously and autoscale based on request queue length.

Batch - run distributed and fault-tolerant batch processing jobs on-demand.

Automated cluster management

Autoscaling - elastically scale clusters with CPU and GPU instances.

Spot instances - run workloads on spot instances with automated on-demand backups.

Environments - create multiple clusters with different configurations.

CI/CD and observability integrations

Provisioning - provision clusters with declarative configuration or a Terraform provider.

Metrics - send metrics to any monitoring tool or use pre-built Grafana dashboards.

Logs - stream logs to any log management tool or use the pre-built CloudWatch integration.

Built for AWS

EKS - Cortex runs on top of EKS to scale workloads reliably and cost-effectively.

VPC - deploy clusters into a VPC on your AWS account to keep your data private.

IAM - integrate with IAM for authentication and authorization workflows.

Download Details:

Author: Cortexlabs
Source Code: https://github.com/cortexlabs/cortex 
License: Apache-2.0 license

#machinelearning #infrastructure #scale 

Cortex: Production infrastructure for Machine Learning At Scale
Dexter  Goodwin

Dexter Goodwin

1667378100

AWS-cdk: AWS Cloud Development Kit (AWS CDK)

AWS Cloud Development Kit (AWS CDK)

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define cloud infrastructure in code and provision it through AWS CloudFormation.

It offers a high-level object-oriented abstraction to define AWS resources imperatively using the power of modern programming languages. Using the CDK’s library of infrastructure constructs, you can easily encapsulate AWS best practices in your infrastructure definition and share it without worrying about boilerplate logic.

The CDK is available in the following languages:


Jump To: Developer Guide | API Reference | Getting Started | Getting Help | Contributing | RFCs | Roadmap | More Resources


Developers use the CDK framework in one of the supported programming languages to define reusable cloud components called constructs, which are composed together into stacks, forming a "CDK app".

They then use the AWS CDK CLI to interact with their CDK app. The CLI allows developers to synthesize artifacts such as AWS CloudFormation Templates, deploy stacks to development AWS accounts and "diff" against a deployed stack to understand the impact of a code change.

The AWS Construct Library includes a module for each AWS service with constructs that offer rich APIs that encapsulate the details of how to use AWS. The AWS Construct Library aims to reduce the complexity and glue-logic required when integrating various AWS services to achieve your goals on AWS.

Modules in the AWS Construct Library are designated Experimental while we build them; experimental modules may have breaking API changes in any release. After a module is designated Stable, it adheres to semantic versioning, and only major releases can have breaking changes. Each module's stability designation is available on its Overview page in the AWS CDK API Reference. For more information, see Versioning in the CDK Developer Guide.

Getting Started

For a detailed walkthrough, see the tutorial in the AWS CDK Developer Guide.

At a glance

Install or update the AWS CDK CLI from npm (requires Node.js ≥ 14.15.0). We recommend using a version in Active LTS.

npm i -g aws-cdk

(See Manual Installation for installing the CDK from a signed .zip file).

Initialize a project:

mkdir hello-cdk
cd hello-cdk
cdk init sample-app --language=typescript

This creates a sample project looking like this:

export class HelloCdkStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const queue = new sqs.Queue(this, 'HelloCdkQueue', {
      visibilityTimeout: cdk.Duration.seconds(300)
    });

    const topic = new sns.Topic(this, 'HelloCdkTopic');

    topic.addSubscription(new subs.SqsSubscription(queue));
  }
}

Deploy this to your account:

cdk deploy

Use the cdk command-line toolkit to interact with your project:

  • cdk deploy: deploys your app into an AWS account
  • cdk synth: synthesizes an AWS CloudFormation template for your app
  • cdk diff: compares your app with the deployed stack

Getting Help

The best way to interact with our team is through GitHub. You can open an issue and choose from one of our templates for bug reports, feature requests, documentation issues, or guidance.

If you have a support plan with AWS Support, you can also create a new support case.

You may also find help on these community resources:

Roadmap

The AWS CDK Roadmap project board lets developers know about our upcoming features and priorities to help them plan how to best leverage the CDK and identify opportunities to contribute to the project. See ROADMAP.md for more information and FAQs.

Contributing

We welcome community contributions and pull requests. See CONTRIBUTING.md for information on how to set up a development environment and submit code.

Metrics collection

This solution collects anonymous operational metrics to help AWS improve the quality and features of the CDK. For more information, including how to disable this capability, please see the developer guide.

More Resources

Download Details:

Author: aws
Source Code: https://github.com/aws/aws-cdk 
License: Apache-2.0 license

#typescript #aws #cloud #infrastructure 

AWS-cdk: AWS Cloud Development Kit (AWS CDK)
Dexter  Goodwin

Dexter Goodwin

1659395160

SitePen/dstore: A Data infrastructure Framework

dstore

The dstore package is a data infrastructure framework, providing the tools for modelling and interacting with data collections and objects. dstore is designed to work with a variety of data storage mediums, and provide a consistent interface for accessing data across different user interface components. There are several key entities within dstore:

Collection

A Collection is the interface for a collection of items, which can be filtered or sorted to create new collections. When implementing this interface, every method and property is optional, and is only needed if the functionality it provides is required. However, all the included collections implement every method. Note that the objects in the collection might not be immediately retrieved from the underlying data storage until they are actually accessed through forEach(), fetch(), or fetchRange(). These fetch methods return a snapshot of the data, and if the data has changed, these methods can later be used to retrieve the latest data.

Querying

Several methods are available for querying collections. These methods allow you to define a query through several steps. Normally, stores are queried first by calling filter() to specify which objects should be included, if filtering is needed. Next, if an order needs to be specified, the sort() method is called to ensure the results will be sorted. A typical query from a store would look like:

store.filter({priority: 'high'}).sort('dueDate').forEach(function (object) {
    // called for each item in the final result set
});

In addition, the track() method may be used to track store changes, ensuring notifications include index information about object changes, and keeping result sets up-to-date after a query. The fetch() method is an alternate way to retrieve results, providing a promise to an array for accessing query results. The sections below describe each of these methods and how to use them.

Filtering

Filtering is used to specify a subset of objects to be returned in a filtered collection. The simplest use of the filter() method is to call it with a plain object as the argument that specifies name-value pairs the returned objects must match. Alternatively, a filter builder can be used to construct more sophisticated filter conditions. To use the filter builder, first construct a new filter object from the Filter constructor on the collection you would be querying:

var filter = new store.Filter();

We now have a filter object that represents a filter without any operators applied yet. We can create new filter objects by calling the operator methods on the filter object. The operator methods will return new filter objects that hold the operator condition. For example, to specify that we want to retrieve objects with a priority property with a value of "high", and a stars property with a value greater than 5, we could write:

var highPriorityFiveStarFilter = filter.eq('priority', 'high').gt('stars', 5);

This filter object can then be passed as the argument to the filter() method on a collection/store:

var highPriorityFiveStarCollection = store.filter(highPriorityFiveStarFilter);

The following methods are available on the filter objects. First are the property filtering methods, which each take a property name as the first argument, and a property value to compare for the second argument:

  • eq: Property values must equal the filter value argument.
  • ne: Property values must not equal the filter value argument.
  • lt: Property values must be less than the filter value argument.
  • lte: Property values must be less than or equal to the filter value argument.
  • gt: Property values must be greater than the filter value argument.
  • gte: Property values must be greater than or equal to the filter value argument.
  • in: An array should be passed in as the second argument, and property values must be equal to one of the values in the array.
  • match: Property values must match the provided regular expression.
  • contains: Filters for objects where the specified property's value is an array and the array contains any value that equals the provided value or satisfies the provided expression.

The following are combinatorial methods:

  • and: This takes two arguments that are other filter objects, that both must be true.
  • or: This takes two arguments that are other filter objects, where one of the two must be true.
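
For example, reusing the properties from the filter above, two conditions could be combined with or like this (a sketch):

var filter = new store.Filter();
// match objects that are either high priority or have more than 5 stars
var highPriorityOrFiveStars = filter.or(
    filter.eq('priority', 'high'),
    filter.gt('stars', 5)
);
var results = store.filter(highPriorityOrFiveStars);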

Nesting

A few of the filters can also be built upon with other collections (potentially from other stores). In particular, you can provide a collection as the argument for the in or contains filter. This provides functionality similar to nested queries or joins. This generally needs to be combined with a select to return the correct values for matching. For example, suppose we want to find all the tasks in high-priority projects, where the task store has a projectId property/column that is a foreign key referencing objects in a project store. We can perform our nested query:

var tasksOfHighPriorityProjects = taskStore.filter(
    new Filter().in('projectId',
        projectStore.filter({ priority: 'high' }).select('id')
    )
);

Implementations

Different stores may implement filtering in different ways. The dstore/Memory will perform filtering in memory. The dstore/Request/dstore/Rest stores will translate the filters into URL query strings to send to the server. Simple queries will be in standard URL-encoded query format and complex queries will conform to RQL syntax (which is a superset of standard query format).

New filter methods can be created by subclassing dstore/Filter and adding new methods. New methods can be created by calling Filter.filterCreator and by providing the name of the new method. If you will be using new methods with stores that mix in SimpleQuery like memory stores, you can also add filter comparators by overriding the _getFilterComparator method, returning comparators for the additional types, and delegating to this.inherited for the rest.
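
A very rough sketch of what this could look like (the startsWith operator, the comparator's argument order, and attaching the custom Filter through the store's Filter property are assumptions for illustration, not documented API):

var MyFilter = declare(Filter, {
    // assume Filter.filterCreator(name) produces the new operator method
    startsWith: Filter.filterCreator('startsWith')
});

var MyMemory = declare(Memory, {
    Filter: MyFilter,
    _getFilterComparator: function (type) {
        if (type === 'startsWith') {
            // comparator for the new operator (argument order assumed)
            return function (value, propertyValue) {
                return String(propertyValue).indexOf(value) === 0;
            };
        }
        // delegate to the built-in comparators (eq, gt, lt, ...)
        return this.inherited(arguments);
    }
});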

dstore/SimpleQuery provides a simple shorthand for nested property queries - a side-effect of this is that property names that contain the period character are not supported. Example nested property query:

store.filter({ 'name.last': 'Smith' })

This would match the object:

{
	name: {
		first: 'John',
		last: 'Smith'
	}
}

For the dstore/Request/dstore/Rest stores, you can define alternate serializations of filters to URL queries for existing or new methods by overriding the _renderFilterParams. This method is called with a filter object (and by default is recursively called by combinatorial operators), and should return a string serialization of the filter, that will be inserted into the query string of the URL sent to the server.

The filter objects themselves consist of tree structures. Each filter object has two properties, the operator type, which corresponds to whichever operator was used (like eq or and), and the args, which is an array of values provided to the operator. With and and or operators, the arguments are other filter objects, forming a hierarchy. When filter operators are chained together (through sequential calls), they are combined with the and operator (each operator defined in a sub-filter object).
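
For instance, the chained filter built earlier, filter.eq('priority', 'high').gt('stars', 5), conceptually corresponds to a tree along these lines (an illustration of the structure described above, not literal output):

var conceptualFilterTree = {
    type: 'and',
    args: [
        { type: 'eq', args: ['priority', 'high'] },
        { type: 'gt', args: ['stars', 5] }
    ]
};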

Collection API

The following property and methods are available on dstore collections:

Property Summary

  • Model: This constructor represents the data model class to use for the objects returned from the store. All objects returned from the store should have their prototype set to the prototype property of the model, such that objects from this store should return true from object instanceof collection.Model.

Method Summary

filter(query)

This filters the collection, returning a new subset collection. The query can be an object, or a filter object, with the properties defining the constraints on matching objects. Some stores, like server or RQL stores, may accept string-based queries. Stores with in-memory capabilities (like dstore/Memory) may accept a function for filtering as well, but using the filter builder will ensure the greatest cross-store compatibility.

matchesFilter(item)

This tests the provided item to see if it matches the current filter or not.

sort(property, [descending])

This sorts the collection, returning a new ordered collection. Note that if sort is called multiple times, previous sort calls may be ignored by the store (it is up to the store implementation how to handle that). If a multiple sort order is desired, use the array form of sort orders described below.

sort([highestSortOrder, nextSortOrder...])

This also sorts the collection, but can be called to define multiple sort orders by priority. Each argument is an object with a property property and an optional descending property (defaults to ascending, if not set), to define the order. For example:

collection.sort([
	{ property: 'lastName' },
	{ property: 'firstName' }
])

would result in a new collection sorted by lastName, with firstName used to sort identical lastName values.

select([property, ...])

This selects specific properties that should be included in the returned objects.

select(property)

This indicates that the returned results will consist of the values of the given property of the queried objects. For example, this would return a collection of name values, pulled from the original collection of objects:

collection.select('name');

forEach(callback, thisObject)

This iterates over the query results. Note that this may be executed asynchronously and the callback may be called after this function returns. This will return a promise to indicate the completion of the iteration. This method forces a fetch of the data.

fetch()

Normally collections may defer the execution (like making an HTTP request) required to retrieve the results until they are actually accessed. Calling fetch() will force the data to be retrieved, returning a promise to an array.
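
For example (the callback body is illustrative):

collection.fetch().then(function (items) {
    // items is a plain array of the objects in the collection
    console.log('fetched ' + items.length + ' items');
});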

fetchRange({start: start, end: end})

This fetches a range of objects from the collection, returning a promise to an array. The returned (and resolved) promise should have a totalLength property with a promise that resolves to a number indicating the total number of objects available in the collection.
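
A sketch of fetching the first 25 objects and reading the total length:

var results = collection.fetchRange({ start: 0, end: 25 });
results.then(function (items) {
    // items contains at most 25 objects from the collection
});
results.totalLength.then(function (total) {
    // total is the number of objects available in the whole collection
});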

on(type, listener, filterEvents)

This allows you to define a listener for events that take place on the collection or parent store. When an event takes place, the listener will be called with an event object as the single argument. The following event types are defined:

  • add: This indicates that a new object was added to the store. The new object is available on the target property.
  • update: This indicates that an object in the store was updated. The updated object is available on the target property.
  • delete: This indicates that an object in the store was removed. The id of the object is available on the id property.

Setting filterEvents to true indicates the listener will be called only when the emitted event references an item (event.target) that satisfies the collection's current filter query. Note: if filterEvents is set to true for type update, the listener will be called only when the item passed to put matches the collection's query. The original item will not be evaluated. For example, a store contains items marked "to-do" and items marked "done", and one collection uses a query looking for "to-do" items while another looks for "done" items. Both collections are listening for "update" events. If an item is updated from "to-do" to "done", only the "done" collection will be notified of the update.

If detecting when an item is removed from a collection due to an update is desired, set filterEvents to false and use the matchesFilter(item) method to test if each item updated is currently in the collection.

There is also a corresponding emit(type, event) method (from the Store interface) that can be used to emit events when objects have changed.
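
For example, a store could announce a change that happened elsewhere (the object here is hypothetical):

// hypothetical object that was changed on the server by another user
var updatedObject = { id: 3, priority: 'high' };
// notify listeners; the event shape matches the table above
store.emit('update', { target: updatedObject });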

track()

This method will create a new collection that will be tracked and updated as the parent collection changes. This will cause the events sent through the resulting collection to include an index and previousIndex property to indicate the position of the change in the collection. This is an optional method, and is usually provided by dstore/Trackable. For example, you can create an observable store class, by using dstore/Trackable as a mixin:

var TrackableMemory = declare([Memory, Trackable]);

Trackable requires client side querying functionality. Client side querying functionality is available in dstore/SimpleQuery (and inherited by dstore/Memory). If you are using a Request, Rest, or other server side store, you will need to implement client-side query functionality (by implementing querier methods), or mixin SimpleQuery:

var TrackableRest = declare([Rest, SimpleQuery, Trackable]);

Once we have created a new instance from this store, we can track a collection, which could be the top level store itself, or a downstream filtered or sorted collection:

var store = new TrackableMemory({ data: [...] });
var filteredSorted = store.filter({ inStock: true }).sort('price');
var tracked = filteredSorted.track();

Once we have a tracked collection, we can listen for notifications:

tracked.on('add, update, delete', function (event) {
    var newIndex = event.index;
    var oldIndex = event.previousIndex;
    var object = event.target;
});

Trackable requires fetched data to determine the position of modified objects and can work with either full or partial data. We can do a fetch() or forEach() to access all the items in the filtered collection:

tracked.fetch();

Or we can do a fetchRange() to make individual range requests for items in the collection:

tracked.fetchRange({ start: 0, end: 10 });

Trackable will keep track of each page of data, and send out notifications based on the data it has available, along with index information, indicating the new and old position of the object that was modified. Regardless of whether full or partial data is fetched, tracked events and the indices they report are relative to the entire collection, not relative to individual fetched ranges. Tracked events also include a totalLength property indicating the total length of the collection.

If an object is added or updated and falls outside of all of the fetched ranges, the index will be undefined. However, if the object falls between fetched ranges (but not within one), there will also be a beforeIndex that indicates the index of the first object that the new or updated object comes before.

Custom Querying

Custom query methods can be created using the dstore/QueryMethod module. We can define our own query method, by extending a store, and defining a method with the QueryMethod. The QueryMethod constructor should be passed an object with the following possible properties:

  • type - This is a string, identifying the query method type.
  • normalizeArguments - This can be a function that takes the arguments passed to the method, and normalizes them for later execution.
  • applyQuery - This is an optional function that can be called on the resulting collection that is returned from the generated query method.
  • querierFactory - This is an optional function that can be used to define the computation of the set of objects returned from a query, on client-side or in-memory stores. It is called with the normalized arguments, and then returns a new function that will be called with an array, and is expected to return a new array.

For example, we could create a getChildren method that queries for children objects, by simply returning the children property array from a parent:

declare([Memory], {
    getChildren: new QueryMethod({
        type: 'children',
        querierFactory: function (parent) {
            var parentId = this.getIdentity(parent);

            return function (data) {
                // note: in this case, the input data is ignored as this querier
                // returns an object's array of children instead

                // return the children of the parent
                // or an empty array if the parent no longer exists
                var parent = this.getSync(parentId);
                return parent ? parent.children : [];
            };
        }
    })
});

Store

A store is an extension of a collection and is an entity that not only contains a set of objects, but also provides an interface for identifying, adding, modifying, removing, and querying data. Below is the definition of the store interface. Every method and property is optional, and is only needed if the functionality it provides is required (although the provided full stores (Rest and Memory) implement all the methods except transaction() and getChildren()). Every method returns a promise for the specified return value, unless otherwise noted.

In addition to the methods and properties inherited from Collections, the Store API also exposes the following properties and methods.

Property Summary

  • idProperty: If the store has a single primary key, this indicates the property to use as the identity property. The values of this property should be unique. This defaults to "id".
  • Model: This is the model class to use for all the data objects that originate from this store. By default this will be set to null, so that all objects will be plain objects, but this property can be set to the class from dmodel/Model or any other model constructor. You can create your own model classes (and schemas) and assign them to a store. All objects that come from the store will have their prototype set such that they will be instances of the model. The default value of null will disable any prototype modifications and leave data as plain objects.
  • defaultNewToStart: If a new object is added to a store, this indicates whether it should go to the start or the end. By default, it will be placed at the end.

Method Summary

  • get(id): This retrieves an object by its identity. This returns a promise for the object. If no object was found, the resolved value should be undefined.
  • getIdentity(object): This returns an object's identity (note: this should always execute synchronously).
  • put(object, [directives]): This stores an object. It can be used to update or create an object. This returns a promise that may resolve to the object after it has been saved.
  • add(object, [directives]): This creates an object, and throws an error if the object already exists. This should return a promise for the newly created object.
  • remove(id): This deletes an object, using the identity to indicate which object to delete. This returns a promise that resolves to a boolean value indicating whether the object was successfully removed.
  • transaction(): Starts a transaction and returns a transaction object. The transaction object should include a commit() and abort() method to commit and abort transactions, respectively. Note that a store user might not call transaction() prior to using put, delete, etc., in which case these operations effectively could be thought of as "auto-commit" style actions.
  • create(properties): Creates and returns a new instance of the data model. The returned object will not be stored in the object store until its save() method is called, or the store's add() is called with this object. This should always execute synchronously.
  • getChildren(parent): This retrieves the children of the provided parent object. This should return a new collection representing the children.
  • mayHaveChildren(parent): This should return true or false indicating whether or not a parent might have children. This should always return synchronously, as a way of checking if children might exist before actually retrieving all the children.
  • getRootCollection(): This should return a collection of the top level objects in a hierarchical store.
  • emit(type, event): This can be used to dispatch event notifications, indicating changes to the objects in the collection. This should be called by put, add, and remove methods if the autoEmit property is false. This can also be used to notify stores if objects have changed from other sources (e.g. if a change has occurred on the server, from another user). There is a corresponding on method on collections for listening to data change events. Also, the Trackable mixin can be used to add index/position information to the events.

Synchronous Methods

Stores that can perform synchronous operations may provide analogous methods for get, put, add, and remove that end with Sync to provide synchronous support. For example getSync(id) will directly return an object instead of a promise. The dstore/Memory store provides Sync methods in addition to the promise-based methods. This behavior has been separated into distinct methods to provide consistent return types.

It is generally advisable to always use the asynchronous methods so that client code does not have to be updated in case the store is changed. However, if you have very performance intensive store accesses, the synchronous methods can be used to avoid the minor overhead imposed by promises.

  • getSync(id): This retrieves an object by its identity. If no object was found, the returned value should be undefined.
  • putSync(object, [directives]): This stores an object. It can be used to update or create an object. This returns the object after it has been saved.
  • addSync(object, [directives]): This creates an object, and throws an error if the object already exists. This should return the newly created object.
  • removeSync(id): This deletes an object, using the identity to indicate which object to delete. This returns a boolean value indicating whether the object was successfully removed.
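
A small sketch contrasting the two styles on a Memory store (the data is made up):

var store = new Memory({ data: [ { id: 1, name: 'one' } ] });

// asynchronous style: get() always returns a promise
store.get(1).then(function (object) {
    console.log(object.name); // 'one'
});

// synchronous style: getSync() returns the object directly
var object = store.getSync(1);
console.log(object.name); // 'one'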

Included Stores

The dstore package includes several store implementations that can be used for the needs of different applications. These include:

  • Memory - This is a simple memory-based store that takes an array and provides access to the objects in the array through the store interface.
  • Request - This is a simple server-based collection that sends HTTP requests following REST conventions to access and modify data requested through the store interface.
  • Rest - This is a store built on Request that implements add, remove, and update operations using HTTP requests following REST conventions.
  • RequestMemory - This is a Memory-based store that will retrieve its contents from a server/URL.
  • LocalDB - This is a store based on the browser's local database/storage capabilities. Data stored in this store will be persisted in the local browser.
  • Cache - This is a store mixin that combines a master and caching store to provide caching functionality.
  • Trackable - This is a store mixin that adds index information to add, update, and remove events of tracked store instances. This adds a track() method for tracking stores.
  • Tree - This is a store mixin that provides hierarchical querying functionality, defining a parent/child relationships for the display of data in a tree.
  • SimpleQuery - This is a mixin with basic querying functionality, which is extended by the Memory store, and can be used to add client side querying functionality to the Request/Rest store.
  • Store - This is a base store, with the base methods that are used by all other stores.

Constructing Stores

All the stores can be instantiated with an options argument to the constructor, to provide properties to be copied to the store. This can include methods to be added to the new store.

Stores can also be constructed by combining a base store with mixins. The various store mixins are designed to be combined through dojo declare to create a class to instantiate a store. For example, if we wish to add tracking and tree functionality to a Memory store, we could combine these:

// create the class based on the Memory store with added functionality
var TrackedTreeMemoryStore = declare([Memory, Trackable, Tree]);
// now create an instance
var myStore = new TrackedTreeMemoryStore({ data: [...] });

The store mixins can only be used as mixins, but stores can be combined with other stores as well. For example, if we wanted to add the Rest functionality to the RequestMemory store (so the entire store data was retrieved from the server on construction, but data changes are sent to the server), we could write:

var RestMemoryStore = declare([Rest, RequestMemory]);
// now create an instance
var myStore = new RestMemoryStore({ target: '/data-source/' });

Another common case is needing to add tracking to the dstore/Rest store, which requires client side querying, which can be provided by dstore/SimpleQuery:

var TrackedRestStore = declare([Rest, SimpleQuery, Trackable]);

Memory

The Memory store is a basic client-side in-memory store that can be created from a simple JavaScript array. When creating a memory store, the data (which should be an array of objects) can be provided in the data property to the constructor. The data should be an array of objects, and all the objects are considered to be existing objects and must have identities (this is not "creating" new objects, no events are fired for the objects that are provided, nor are identities assigned).

For example:

myStore = new Memory({
    data: [{
        id: 1,
        aProperty: ...,
        ...
    }]
});

The array supplied as the data property will not be copied; it will be used as-is as the store's data. It can be changed at run-time with the setData method.
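
For example (a sketch; the replacement data is arbitrary):

// replace the store's contents wholesale with a new array
myStore.setData([
    { id: 2, aProperty: 'new value' }
]);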

Methods

The Memory store provides synchronous equivalents of standard asynchronous store methods that directly return objects or results, without a promise.

  • getSync(id): Retrieve an object by its identity. If no object is found, the returned value is undefined.
  • addSync(object, options): Create an object (throws an error if the object already exists). Returns the newly created object.
  • putSync(object, options): Store an object. Can be used to update or create an object. Returns the object after it has been saved.
  • removeSync(id): Delete an object, using the identity to indicate which object to delete. Returns a boolean value indicating whether the object was successfully removed.
  • setData(data): Set the store's data to the specified array.

Request

This is a simple collection for accessing data by retrieval from a server (typically through XHR). The target URL path to use for requests can be defined with the target property. A request for data will be sent to the server when a fetch occurs (due to a call to fetch(), fetchRange(), or forEach()). Request supports several properties for defining the generation of query strings:

  • sortParam - This will specify the query parameter to use for specifying the sort order. This will default to sort(<properties>) in the query string.
  • selectParam - This will specify the query parameter to use for specifying the select properties. This will default to select(<properties>) in the query string.
  • rangeStartParam and rangeCountParam - These will specify the query parameters to use for specifying the range. This will default to limit(<count>,<start>) in the query string.
    • e.g. limit(50,200) will request items 200-249
  • useRangeHeaders - This will specify that range information should be specified in the Range (or X-Range) header.
    • e.g. Range: items 200-249 will request items 200-249
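
As a sketch, these properties can be passed to the constructor like any other store option (the target path and parameter name below are placeholders):

var catalog = new Request({
    target: '/catalog/items/',
    // use ?sortBy=... instead of the default sort(...) syntax
    sortParam: 'sortBy',
    // send range information in the Range/X-Range header for fetchRange()
    useRangeHeaders: true
});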

Server considerations for a Request/Rest store

The response should be in JSON format. It should include the data and a number indicating the total number of items:

  • data: the response can either be a JSON array containing the items or a JSON object with an items property that is an array containing the items
  • total: if the response is an array then the total should be specified in the Content-Range header, e.g.:
    • Content-Range: items 0-24/500
    • If the response is an object then the total should be specified on the total property of the object, e.g.:
{
    "total": 500,
    "items": [ /* ...items */ ]
}

Rest

This store extends the Request store, to add functionality for adding, updating, and removing objects. All modifications trigger HTTP requests to the server using the corresponding RESTful HTTP methods. A get() triggers a GET, remove() triggers a DELETE, and add() and put() will trigger a PUT if an id is available or provided, and a POST will be used to create new objects with server provided ids.

For example:

myStore = new Rest({
    target: '/PathToData/'
});

All modification or retrieval methods (except getIdentity()) on Request and Rest execute asynchronously, returning a promise.

The server must respond to GET requests for an item by ID with an object representing the item (not an array).
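
A sketch of typical usage (the object properties are made up; each call returns a promise as described above):

// no id provided, so the store will POST to create the object
myStore.add({ name: 'new item' }).then(function (created) {
    // the object now has an id, so put() will issue a PUT
    created.name = 'renamed item';
    return myStore.put(created);
}).then(function (saved) {
    // remove by identity, which issues a DELETE
    return myStore.remove(myStore.getIdentity(saved));
});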

Store

This is the base class used for all stores, providing basic functionality for tracking collection states and converting objects to be model instances. This (or any of the other classes above) can be extended for creating custom stores.

RequestMemory

This store provides client-side querying functionality, but will load its data from the server up-front, using the provided URL. This is an asynchronous store since queries and data retrieval may be made before the data has been retrieved from the server.

RequestMemory accepts the same target option for its URL as Request and Rest. Additionally, it supports a refresh method which can be called (and optionally passed a new target URL) to reload data from the server endpoint.
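
For example (a sketch; the URLs are placeholders):

var cached = new RequestMemory({ target: '/data/items.json' });
// later, reload the data, optionally from a different endpoint
cached.refresh('/data/items-updated.json');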

LocalDB

This is a store based on the browser's local database/storage capabilities. Data stored in this store will be persisted in the local browser. The LocalDB store will automatically load the best storage implementation based on the browser's capabilities. These storage implementations follow the same interface. LocalDB will attempt to load one of these stores (highest priority first; these can also be used directly if you do not want automatic selection):

  • dstore/db/IndexedDB - This uses the IndexedDB API. This is available on the latest version of all major browsers (introduced in IE 10 and Safari 7.1/8, but with some serious bugs).
  • dstore/db/SQL - This uses the WebSQL API. This is available on Safari and Chrome.
  • dstore/db/LocalStorage - This uses the localStorage API. This is available on all major browsers, going back to IE8. The localStorage API does not provide any indexed querying, so this loads the entire database in memory. This can be very expensive for large datasets, so this store is generally avoided, except to provide functionality on old versions of IE.
  • dstore/db/has - This is not a store, but provides has feature tests for indexeddb and sql.

The LocalDB store requires a few extra parameters not needed by other stores. First, it needs a database configuration object. A database configuration object defines all the stores or tables that are used by the stores, and which properties to index. There should be a single database configuration object for the entire application, and it should be passed to all the store instances. The configuration object should include a version (which should be incremented whenever the configuration is changed) and a set of stores in the stores object. Within the stores object, each store that will be used should be defined, and within each store, each property that will be used should be defined. Each property value should have a property configuration object with the following optional properties:

  • preference - This defines the priority of using this property for index-based querying. This should be a larger number for more unique properties. A boolean property would generally have a preference of 1, and a completely unique property should be 100.
  • indexed - This is a boolean indicating if a property should be indexed. This defaults to true.
  • multiEntry - This indicates the property will have an array of values, and should be indexed correspondingly. Internet Explorer's implementation of IndexedDB does not currently support multiEntry.
  • autoIncrement - This indicates if a property should automatically increment.

Alternately a number can be provided as a property configuration, and will be used as the preference.

An example database configuration object is:

var dbConfig = {
    version: 5,
    stores: {
        posts: {
            name: 10,
            id: {
                autoIncrement: true,
                preference: 100
            },
            tags: {
                multiEntry: true,
                preference: 5
            },
            content: {
                indexed: false
            }
        },
        comments: {
            author: {},
            content: {
                indexed: false
            }
        }
    }
};

In addition, each store should define a storeName property to identify which database store corresponds to the store instance. For example:

var postsStore = new LocalDB({ dbConfig: dbConfig, storeName: 'posts' });
var commentsStore = new LocalDB({ dbConfig: dbConfig, storeName: 'comments' });

Once created, these stores can be used like any other store.

Cache

This is a mixin that can be used to add caching functionality to a store. This can also be used to wrap an existing store, by using the static create function:

var cachedStore = Cache.create(existingStore, {
    cachingStore: new Memory()
});

This store has the following properties and methods:

  • cachingStore: This can be used to define the store to be used for caching the data. By default a Memory store will be used.
  • isValidFetchCache: This is a flag that indicates if the data fetched for a collection/store can be cached to fulfill subsequent fetches. This is false by default, and the value will be inherited by downstream collections. It is important to note that only full fetch() requests will fill the cache for subsequent fetch() requests. fetchRange() requests will not fulfill a collection, and subsequent fetchRange() requests will not go to the cache unless the collection has been fully loaded through a fetch() request.
  • allLoaded: This is a flag indicating that the given collection/store has its data loaded. This can be useful if you want to provide a caching store prepopulated with data for a given collection. If you are setting this to true, make sure you set isValidFetchCache to true as well to indicate that the data is available for fetching.
  • canCacheQuery(method, args): This can be a boolean or a method that will indicate if a collection can be cached (if it should have isValidFetchCache set to true), based on the query method and arguments used to derive the collection.
  • isLoaded(object): This can be defined to indicate if a given object in a query can be cached (by default, objects are cached).

Tree

This is a mixin that provides basic support for hierarchical data. This implements several methods that can then be used by hierarchical UI components (like dgrid with a tree column). This mixin uses a parent-based approach to finding children, retrieving the children of an object by querying for objects that have a parent property with the id of the parent object. In addition, objects may have a hasChildren property to indicate if they have children (if the property is absent, it is assumed that they may have children). This mixin implements the following methods:

  • getChildren(parent) - This returns a collection representing the children of the provided parent object. This is produced by filtering for objects that have a parent property with the id of the parent object.
  • mayHaveChildren(parent) - This synchronously returns a boolean indicating whether or not the parent object might have children (the actual children may need to be retrieved asynchronously).
  • getRootCollection() - This returns the root collection: the collection of objects with a parent property that is null.

The Tree mixin may serve as an example for alternate hierarchical implementations. By implementing these methods as they are in dstore/Tree, one could change the property names for data that uses different parent references or indications of children. Another option would be to define the children of an object as direct references from the parent object. In this case, you would define getChildren to associate the parent object with the returned collection and override fetch and fetchRange to return a promise to the array of the children of the parent.
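
A minimal sketch of using the mixin with a Memory store (the data and ids are made up):

var TreeMemory = declare([Memory, Tree]);
var treeStore = new TreeMemory({
    data: [
        { id: 1, name: 'root', parent: null },
        { id: 2, name: 'child', parent: 1, hasChildren: false }
    ]
});

// children of the root object (objects whose parent property is 1)
treeStore.getChildren(treeStore.getSync(1)).forEach(function (child) {
    console.log(child.name); // 'child'
});

// top-level objects (parent property of null)
treeStore.getRootCollection().forEach(function (root) {
    console.log(root.name); // 'root'
});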

Trackable

The Trackable mixin adds functionality for tracking the index positions of objects as they are added, updated, or deleted. The Trackable mixin adds a track() method to create a new tracked collection. When events are fired (from modification operations, or other sources), the tracked collection can match the changes from the events to any cached data in the collection (which may be ordered by sorting, or filtered), and decorates the events with index positions. More information about tracked collections and events can be found in the collections documentation.

Resource Query Language

Resource Query Language (RQL) is a query language specifically designed to be easily embedded in URLs (it is a compatible superset of standard encoded query parameters), as well as easily interpreted within JavaScript for client-side querying. RQL is therefore a query language suitable for consistent client- and server-delegated queries. The dstore package serializes complex filters/queries into RQL (RQL supersets standard query parameters, so simple queries are serialized as standard query parameters).

dstore also includes support for using RQL as the query language for filtering. This can be enabled by mixing dstore/extensions/RqlQuery into your collection type:

require([
    'dojo/_base/declare',
    'dstore/Memory',
    'dstore/extensions/RqlQuery'
], function (declare, Memory, RqlQuery) {
    var RqlStore = declare([ Memory, RqlQuery ]);
    var rqlStore = new RqlStore({
        ...
    });

    rqlStore.filter('price<10|rating>3').forEach(function (product) {
        // return each product that has a price less than 10 or a rating greater than 3
    });
});

Make sure you have installed/included the rql package if you are using the RQL query engine.

Collection

A Collection is the interface for a collection of items, which can be filtered or sorted to create new collections. When implementing this interface, every method and property is optional, and is only needed if the functionality it provides is required. However, all the included collections implement every method. Note that the objects in the collection might not be immediately retrieved from the underlying data storage until they are actually accessed through forEach(), fetch(), or fetchRange(). These fetch methods return a snapshot of the data, and if the data has changed, these methods can later be used to retrieve the latest data.

Querying

Several methods are available for querying collections. These methods allow you to define a query through several steps. Normally, stores are queried first by calling filter() to specify which objects to be included, if the filtering is needed. Next, if an order needs to be specified, the sort() method is called to ensure the results will be sorted. A typical query from a store would look like:

store.filter({priority: 'high'}).sort('dueDate').forEach(function (object) {
    // called for each item in the final result set
});

In addition, the track() method may be used to track store changes, ensuring notifications include index information about object changes, and keeping result sets up-to-date after a query. The fetch() method is an alternate way to retrieve results, providing a promise to an array for accessing query results. The sections below describes each of these methods and how to use them.

Filtering

Filtering is used to specify a subset of objects to be returned in a filtered collection. The simplest use of the filter() method is to call it with a plain object as the argument, that specifies name-value pairs that the returned objects must match. Or a filter builder can be used to construct more sophisticated filter conditions. To use the filter builder, first construct a new filter object from the Filter constructor on the collection you would be querying:

var filter = new store.Filter();

We now have a filter object, that represents a filter, without any operators applied yet. We can create new filter objects by calling the operator methods on the filter object. The operator methods will return new filter objects that hold the operator condition. For example, to specify that we want to retrieve objects with a priority property with a value of "high", and stars property with a value greater than 5, we could write:

var highPriorityFiveStarFilter = filter.eq('priority', 'high').gt('stars', 5);

This filter object can then be passed as the argument to the filter() method on a collection/store:

var highPriorityFiveStarCollection = store.filter(highPriorityFiveStarFilter);

The following methods are available on the filter objects. First are the property filtering methods, which each take a property name as the first argument, and a property value to compare for the second argument:

  • eq: Property values must equal the filter value argument.
  • ne: Property values must not equal the filter value argument.
  • lt: Property values must be less than the filter value argument.
  • lte: Property values must be less than or equal to the filter value argument.
  • gt: Property values must be greater than the filter value argument.
  • gte: Property values must be greater than or equal to the filter value argument.
  • in: An array should be passed in as the second argument, and property values must be equal to one of the values in the array.
  • match: Property values must match the provided regular expression.
  • contains: Filters for objects where the specified property's value is an array and the array contains any value that equals the provided value or satisfies the provided expression.

The following are combinatorial methods:

  • and: This takes two arguments that are other filter objects, that both must be true.
  • or: This takes two arguments that are other filter objects, where one of the two must be true.

Nesting

A few of the filters can also be built upon with other collections (potentially from other stores). In particular, you can provide a collection as the argument for the in or contains filter. This provides functionality similar to nested queries or joins. This generally will need to be combined with a select to return the correct values for matching. For example, if we wanted to find all the tasks in high priority projects, where the task store has a projectId property/column that is a foreign key, referencing objects in a project store. We can perform our nested query:

var tasksOfHighPriorityProjects = taskStore.filter(
    new Filter().in('projectId',
        projectStore.filter({ priority: 'high' }).select('id')
    )
);

Implementations

Different stores may implement filtering in different ways. The dstore/Memory will perform filtering in memory. The dstore/Request/dstore/Rest stores will translate the filters into URL query strings to send to the server. Simple queries will be in standard URL-encoded query format and complex queries will conform to RQL syntax (which is a superset of standard query format).

New filter methods can be created by subclassing dstore/Filter and adding new methods. New methods can be created by calling Filter.filterCreator and by providing the name of the new method. If you will be using new methods with stores that mix in SimpleQuery like memory stores, you can also add filter comparators by overriding the _getFilterComparator method, returning comparators for the additional types, and delegating to this.inherited for the rest.

dstore/SimpleQuery provides a simple shorthand for nested property queries - a side-effect of this is that property names that contain the period character are not supported. Example nested property query:

store.filter({ 'name.last': 'Smith' })

This would match the object:

{
	name: {
		first: 'John',
		last: 'Smith'
	}
}

For the dstore/Request/dstore/Rest stores, you can define alternate serializations of filters to URL queries for existing or new methods by overriding the _renderFilterParams. This method is called with a filter object (and by default is recursively called by combinatorial operators), and should return a string serialization of the filter, that will be inserted into the query string of the URL sent to the server.

The filter objects themselves consist of tree structures. Each filter object has two properties, the operator type, which corresponds to whichever operator was used (like eq or and), and the args, which is an array of values provided to the operator. With and and or operators, the arguments are other filter objects, forming a hierarchy. When filter operators are chained together (through sequential calls), they are combined with the and operator (each operator defined in a sub-filter object).

Collection API

The following property and methods are available on dstore collections:

Property Summary

PropertyDescription
ModelThis constructor represents the data model class to use for the objects returned from the store. All objects returned from the store should have their prototype set to the prototype property of the model, such that objects from this store should return true from object instanceof collection.Model.

Method Summary

filter(query)

This filters the collection, returning a new subset collection. The query can be an object, or a filter object, with the properties defining the constraints on matching objects. Some stores, like server or RQL stores, may accept string-based queries. Stores with in-memory capabilities (like dstore/Memory) may accept a function for filtering as well, but using the filter builder will ensure the greatest cross-store compatibility.

matchesFilter(item)

This tests the provided item to see if it matches the current filter or not.

sort(property, [descending])

This sorts the collection, returning a new ordered collection. Note that if sort is called multiple times, previous sort calls may be ignored by the store (it is up to store implementation how to handle that). If a multiple sort order is desired, use the array of sort orders defined by below.

sort([highestSortOrder, nextSortOrder...])

This also sorts the collection, but can be called to define multiple sort orders by priority. Each argument is an object with a property property and an optional descending property (defaults to ascending, if not set), to define the order. For example:

collection.sort([
	{ property: 'lastName' },
	{ property: 'firstName' }
])

would result in a new collection sorted by lastName, with firstName used to sort identical lastName values.

select([property, ...])

This selects specific properties that should be included in the returned objects.

select(property)

This will indicate that the return results will consist of the values of the given property of the queried objects. For example, this would return a collection of name values, pulled from the original collection of objects:

collection.select('name');

forEach(callback, thisObject)

This iterates over the query results. Note that this may be executed asynchronously and the callback may be called after this function returns. This will return a promise to indicate the completion of the iteration. This method forces a fetch of the data.

fetch()

Normally collections may defer the execution (like making an HTTP request) required to retrieve the results until they are actually accessed. Calling fetch() will force the data to be retrieved, returning a promise to an array.

fetchRange({start: start, end: end})

This fetches a range of objects from the collection, returning a promise to an array. The returned (and resolved) promise should have a totalLength property with a promise that resolves to a number indicating the total number of objects available in the collection.

on(type, listener, filterEvents)

This allows you to define a listener for events that take place on the collection or parent store. When an event takes place, the listener will be called with an event object as the single argument. The following event types are defined:

  • add - This indicates that a new object was added to the store. The new object is available on the target property.
  • update - This indicates that an object in the store was updated. The updated object is available on the target property.
  • delete - This indicates that an object in the store was removed. The id of the object is available on the id property.

Setting filterEvents to true indicates the listener will be called only when the emitted event references an item (event.target) that satisfies the collection's current filter query. Note: if filterEvents is set to true for the update type, the listener will be called only when the item passed to put matches the collection's query; the original item will not be evaluated. For example, suppose a store contains items marked "to-do" and items marked "done", and two collections are created, one filtered for "to-do" items and one for "done" items, both listening for "update" events. If an item is updated from "to-do" to "done", only the "done" collection will be notified of the update.

To detect when an update removes an item from a collection, set filterEvents to false and use the matchesFilter(item) method to test whether each updated item is currently in the collection.
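Continuing the to-do example, a sketch of both approaches might look like this (the status property and values are hypothetical):

var toDo = store.filter({ status: 'to-do' });

// only called when the updated item still matches this collection's filter
toDo.on('update', function (event) {
    console.log('still to-do:', event.target);
}, true); // filterEvents

// to catch items leaving the collection, listen unfiltered and test manually
store.on('update', function (event) {
    if (!toDo.matchesFilter(event.target)) {
        console.log('no longer to-do:', event.target);
    }
});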

There is also a corresponding emit(type, event) method (from the Store interface) that can be used to emit events when objects have changed.

track()

This method will create a new collection that will be tracked and updated as the parent collection changes. This will cause the events sent through the resulting collection to include an index and previousIndex property to indicate the position of the change in the collection. This is an optional method, and is usually provided by dstore/Trackable. For example, you can create an observable store class, by using dstore/Trackable as a mixin:

var TrackableMemory = declare([Memory, Trackable]);

Trackable requires client-side querying functionality, which is available in dstore/SimpleQuery (and inherited by dstore/Memory). If you are using Request, Rest, or another server-side store, you will need to implement client-side querying (by implementing querier methods) or mix in SimpleQuery:

var TrackableRest = declare([Rest, SimpleQuery, Trackable]);

Once we have created a new instance from this store, we can track a collection, which could be the top level store itself, or a downstream filtered or sorted collection:

var store = new TrackableMemory({ data: [...] });
var filteredSorted = store.filter({ inStock: true }).sort('price');
var tracked = filteredSorted.track();

Once we have a tracked collection, we can listen for notifications:

tracked.on('add, update, delete', function (event) {
    var newIndex = event.index;
    var oldIndex = event.previousIndex;
    var object = event.target;
});

Trackable requires fetched data to determine the position of modified objects and can work with either full or partial data. We can do a fetch() or forEach() to access all the items in the filtered collection:

tracked.fetch();

Or we can do a fetchRange() to make individual range requests for items in the collection:

tracked.fetchRange({ start: 0, end: 10 });

Trackable will keep track of each page of data, and send out notifications based on the data it has available, along with index information, indicating the new and old position of the object that was modified. Regardless of whether full or partial data is fetched, tracked events and the indices they report are relative to the entire collection, not relative to individual fetched ranges. Tracked events also include a totalLength property indicating the total length of the collection.

If an object is added or updated and falls outside of all of the fetched ranges, the index will be undefined. However, if the object falls between fetched ranges (but not within one), the event will also include a beforeIndex that indicates the index of the first fetched object that the new or updated object comes before.

Custom Querying

Custom query methods can be created using the dstore/QueryMethod module. We can define our own query method by extending a store and defining a method created with the QueryMethod constructor. The QueryMethod constructor should be passed an object with the following possible properties:

  • type - This is a string, identifying the query method type.
  • normalizeArguments - This can be a function that takes the arguments passed to the method, and normalizes them for later execution.
  • applyQuery - This is an optional function that can be called on the resulting collection that is returned from the generated query method.
  • querierFactory - This is an optional function that can be used to define the computation of the set of objects returned from a query, on client-side or in-memory stores. It is called with the normalized arguments, and then returns a new function that will be called with an array, and is expected to return a new array.

For example, we could create a getChildren method that queries for child objects by simply returning the children property array from a parent:

declare([Memory], {
    getChildren: new QueryMethod({
        type: 'children',
        querierFactory: function (parent) {
            var parentId = this.getIdentity(parent);

            return function (data) {
                // note: in this case, the input data is ignored as this querier
                // returns an object's array of children instead

                // return the children of the parent
                // or an empty array if the parent no longer exists
                var parent = this.getSync(parentId);
                return parent ? parent.children : [];
            };
        }
    })
});

Store

A store is an extension of a collection: an entity that not only contains a set of objects, but also provides an interface for identifying, adding, modifying, removing, and querying data. Below is the definition of the store interface. Every method and property is optional, and is only needed if the functionality it provides is required (although the provided full stores, Rest and Memory, implement all of the methods except transaction() and getChildren()). Every method returns a promise for the specified return value, unless otherwise noted.

In addition to the methods and properties inherited from Collections, the Store API also exposes the following properties and methods.

Property Summary

  • idProperty - If the store has a single primary key, this indicates the property to use as the identity property. The values of this property should be unique. This defaults to "id".
  • Model - This is the model class to use for all the data objects that originate from this store. By default this is null, so all objects are plain objects, but this property can be set to the class from dmodel/Model or any other model constructor. You can create your own model classes (and schemas) and assign them to a store. All objects that come from the store will have their prototype set such that they will be instances of the model. The default value of null disables any prototype modification and leaves the data as plain objects.
  • defaultNewToStart - If a new object is added to the store, this indicates whether it should go to the start or the end. By default, it will be placed at the end.

Method Summary

  • get(id) - This retrieves an object by its identity. This returns a promise for the object. If no object was found, the resolved value should be undefined.
  • getIdentity(object) - This returns an object's identity (note: this should always execute synchronously).
  • put(object, [directives]) - This stores an object. It can be used to update or create an object. This returns a promise that may resolve to the object after it has been saved.
  • add(object, [directives]) - This creates an object, and throws an error if the object already exists. This should return a promise for the newly created object.
  • remove(id) - This deletes an object, using the identity to indicate which object to delete. This returns a promise that resolves to a boolean value indicating whether the object was successfully removed.
  • transaction() - This starts a transaction and returns a transaction object. The transaction object should include commit() and abort() methods to commit and abort the transaction, respectively. Note that a store user might not call transaction() prior to using put, delete, etc., in which case these operations effectively behave as "auto-commit" actions.
  • create(properties) - This creates and returns a new instance of the data model. The returned object will not be stored in the object store until its save() method is called, or the store's add() is called with this object. This should always execute synchronously.
  • getChildren(parent) - This retrieves the children of the provided parent object. This should return a new collection representing the children.
  • mayHaveChildren(parent) - This should return true or false indicating whether or not a parent might have children. This should always execute synchronously, as a way of checking whether children might exist before actually retrieving them all.
  • getRootCollection() - This should return a collection of the top-level objects in a hierarchical store.
  • emit(type, event) - This can be used to dispatch event notifications, indicating changes to the objects in the collection. This should be called by the put, add, and remove methods if the autoEmit property is false. This can also be used to notify stores when objects have changed from other sources (for example, if a change has occurred on the server or from another user). There is a corresponding on method on collections for listening to data change events. The Trackable mixin can also be used to add index/position information to the events.
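As a quick sketch of the promise-based flow (using dstore/Memory with made-up data, and assuming Memory has already been loaded):

var store = new Memory({ data: [ { id: 1, name: 'one' } ] });

store.add({ id: 2, name: 'two' }).then(function () {
    return store.get(2);
}).then(function (object) {
    object.name = 'TWO';
    return store.put(object);
}).then(function () {
    return store.remove(1);
}).then(function (removed) {
    console.log('object 1 removed?', removed);
});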

Synchronous Methods

Stores that can perform synchronous operations may provide analogous methods for get, put, add, and remove that end with Sync to provide synchronous support. For example getSync(id) will directly return an object instead of a promise. The dstore/Memory store provides Sync methods in addition to the promise-based methods. This behavior has been separated into distinct methods to provide consistent return types.

It is generally advisable to use the asynchronous methods so that client code does not have to be updated if the store is changed. However, for very performance-intensive store access, the synchronous methods can be used to avoid the minor overhead imposed by promises.

  • getSync(id) - This retrieves an object by its identity. If no object was found, the returned value should be undefined.
  • putSync(object, [directives]) - This stores an object. It can be used to update or create an object. This returns the object after it has been saved.
  • addSync(object, [directives]) - This creates an object, and throws an error if the object already exists. This returns the newly created object.
  • removeSync(id) - This deletes an object, using the identity to indicate which object to delete. This returns a boolean value indicating whether the object was successfully removed.

Promise-based API and Synchronous Operations

All CRUD methods, such as get, put, remove, and fetch, return promises. However, stores and collections may provide synchronous versions of those methods with a "Sync" suffix (e.g., Memory#fetchSync to fetch synchronously from a Memory store).
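For example (a small sketch against a Memory store with made-up data):

var store = new Memory({ data: [ { id: 1, name: 'one' } ] });

var item = store.getSync(1);        // the object itself, not a promise
store.putSync({ id: 1, name: 'ONE' });
var all = store.fetchSync();        // the store's items, available immediately
var removed = store.removeSync(1);  // a boolean result

// the asynchronous equivalents return promises instead
store.fetch().then(function (items) {
    console.log(items.length, 'items remaining');
});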

Data Modelling

In addition to handling collections of items, dstore works with the dmodel package to provide robust data modeling capabilities for managing individual objects. dmodel provides a data model class that includes multiple methods on data objects, for saving, validating, and monitoring objects for changes. By setting a model on stores, all objects returned from a store, whether a single object returned from a get() or an array of objects returned from a fetch(), will be an instance of the store's data model.

For more information, please see the dmodel project.

Adapters

Adapters make it possible to work with legacy Dojo object stores and widgets that expect Dojo object stores. dstore also includes an adapter for using a store with charts. See the Adapters section for more information.

Testing

dstore uses Intern as its test runner. A full description of how to set up testing is available here. Tests can be run either in the browser or using Sauce Labs. More information on writing your own tests with Intern can be found in the Intern wiki.

Dependencies

dstore's only required dependency is Dojo version 1.8 or higher. Running the unit tests requires the intern-geezer package (see the testing docs for more information). The extensions/RqlQuery module can leverage the rql package, but the rql package is only needed if you use this extension.

Contributing

We welcome contributions, but please read the contributing documentation so that we can effectively receive your contributions and pull requests.

Download Details: 

Author: SitePen
Source Code: https://github.com/SitePen/dstore 
License: View license

#javascript #framework #data #infrastructure 

SitePen/dstore: A Data infrastructure Framework
Vinnie  Erdman

Vinnie Erdman

1627165620

Development for Infrastructure Engineers and Sysadmins (dev theory)

This video kicks off the OFFICIAL Development for Infrastructure series. The series is all about how YOU can learn development and coding for a cloud or infrastructure developer path.

Let’s face it, coding is the way of the future. It’s not that you have to build the next Twitter, but you do need to know how to automate and script.

Part 1: Developer theory and a hands-on demo to put the theory to practice.

#development #infrastructure

Development for Infrastructure Engineers and Sysadmins (dev theory)

Taking KubeMQ Build & Deploy for a Test Drive: My Thoughts and Impressions

Introduction

As a full-stack developer who often takes on DevOps and infrastructure responsibilities, the following happens all too frequently.

Problem: I need to set up a backend server for my app!

Solution: Google it!

Google result #1:

  • How to set up <tech 1>, <tech 2>, and <tech 3> in just five minutes, for free!
  • <skip the prereqs because who needs those, follow the instructions, hit an error>

Hmm, let’s try that again…

  • <install all the prereqs, follow the instructions, hit another error>

Okay, maybe it’s just this article. Let’s try a different one.

Google result #2:

  • Make your own <tech 2> and <tech 3> server using Docker and Kubernetes.
  • <follow the instructions, hit an error>

Well…maybe the third time’s the charm?

Guess I’ll try again in the morning…

While every tool promises to be simple to set up and use, the reality is that setting up infrastructure can be complex and unforgiving without the necessary experience. As such, I tend to shy away from instructions that consist of long lists of command-line operations, as more often than not they don’t work for me.

I’ve recently been exploring KubeMQ, a Kubernetes-native message queue. They recently released a new web-based configuration tool called Build and Deploy, which promises to make infrastructure setup as simple as filling in a form.

In this article, I’ll cover what KubeMQ is, what Build and Deploy adds, and we’ll run through a test scenario with an API gateway and a Redis backend.

#infrastructure #devops #kubernetes #cicd

Taking KubeMQ Build & Deploy for a Test Drive: My Thoughts and Impressions
Nels  Franecki

Nels Franecki

1625816820

Kubernetes Essential Tools 2021

Introduction

In this article I will try to summarize my favorite tools for Kubernetes with special emphasis on the newest and lesser known tools which I think will become very popular.

This is just my personal list based on my experience, but in order to avoid bias, I will also try to mention alternatives to each tool so you can compare and decide based on your needs. I will keep this article as short as I can and try to provide links so you can explore more on your own. My goal is to answer the question "How can I do X in Kubernetes?" by describing tools for different software development tasks.

K3D

K3D is my favorite way to run Kubernetes (K8s) clusters on my laptop. It is extremely lightweight and very fast. It is a wrapper around K3s that runs in Docker, so you only need Docker to run it, and it has very low resource usage. The only problem is that it is not fully K8s compliant, but this shouldn't be an issue for local development; for test environments you can use other solutions. K3D is faster than Kind, but Kind is fully compliant.

Alternatives

Krew

Krew is an essential tool for managing kubectl plugins and a must-have for any K8s user. I won't go into the details of the more than 145 plugins available, but at the very least install kubens and kubectx.

Lens

Lens is an IDE for K8s, aimed at SREs, operators, and developers. It works with any Kubernetes distribution, on-premises or in the cloud. It is fast, easy to use, and provides real-time observability. With Lens it is very easy to manage many clusters. This is a must-have if you are a cluster operator.

#ci-cd-pipeline #kubernetes #devops #infrastructure #k8s

Kubernetes Essential Tools 2021

Use AWS Copilot CLI to Deploy Containers on an Existing Infrastructure 

This tutorial shows how you can easily deploy containers on your existing VPC with AWS Copilot CLI.

Introduction

When you start a project where you’re containerizing your applications, it’s pretty easy to get started locally and develop your container images. But when you plan to deploy these containers in the Cloud it can be quite a challenge and a lot of work to set up the required infrastructure to run these containers.

That’s where AWS Copilot CLI comes in. This tool allows developers to build, release, and operate production-ready containerized applications on Amazon ECS and AWS Fargate. Essentially, it does the heavy lifting for you and creates the required infrastructure using CloudFormation. That means that in a matter of minutes you can deploy your containers in your AWS account, where it would normally take at least a day to set up the required orchestration with CloudFormation yourself.

One of the most useful features of AWS Copilot that I discovered (released in v0.3) is that you can configure AWS Copilot to use a pre-existing VPC, subnets, and CIDR ranges. This is a huge advantage when you're in the middle of the development phase of a project.

You might ask, why is that the case? Because you can use your existing Infra as Code tool to deploy resources such as RDS, ElastiCache, and EFS and connect them to your AWS Copilot containers. That way you can quickly set up a working container environment. In the meantime, you or your teammates can work on replicating what you've deployed with AWS Copilot in your preferred Infra as Code tool, e.g. CloudFormation, CDK, or Terraform.

So in this post, I'll show you how you can deploy containers on your existing VPC with AWS Copilot. As a reference case, we'll deploy the following example Django app and combine it with cloud-native services like ElastiCache (Redis), RDS (Postgres), and Secrets Manager.

#aws-copilot #aws #infrastructure

Use AWS Copilot CLI to Deploy Containers on an Existing Infrastructure 
Zachariah  Wiza

Zachariah Wiza

1625282220

Pulumi - Infrastructure as Code (IaC) Using Programming Languages

Pulumi allows us to manage infrastructure as code (IaC) using familiar programming languages. It is like Terraform, but without HCL.

Please watch “Terraform vs. Pulumi vs. Crossplane” (https://youtu.be/RaoKcJGchKM) if you’d like to see a comparison with other IaC tools.

#pulumi #infrastructure #iac

Timecodes ⏱:
00:00 Intro
01:40 Creating a new project
06:39 Creating a cluster
15:46 Final thoughts

➡ Gist with the commands: https://gist.github.com/bcc3f1bb064f7d7d12cf46aebce81edb

🔗 Pulumi: https://www.pulumi.com/

🔗 Model script and command provisioning issue: https://github.com/pulumi/pulumi/issues/99

📚 DevOps Catalog, Patterns, And Blueprints: https://www.devopstoolkitseries.com/posts/catalog/

📚 Books and courses: https://www.devopstoolkitseries.com

🎤 Podcast: https://www.devopsparadox.com/

💬 Live streams: https://www.youtube.com/c/DevOpsParadox

➡ Follow me on Twitter: https://twitter.com/vfarcic

➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

#infrastructure #pulumi #devops

Pulumi - Infrastructure as Code (IaC) Using Programming Languages
Reid  Rohan

Reid Rohan

1624639380

How to Define and Deploy Infrastructure with Code

Pulumi, an alternative to Terraform and CloudFormation, allows you to define AWS services (and those of other clouds, like Azure and GCP) using your preferred coding language.

An Introduction to Pulumi

Pulumi allows you to build cloud applications and infrastructure by combining the safety and reliability of IaC (Infrastructure as Code) with the power of programming languages you are already familiar with. Rather than defining infrastructure with JSON, YAML, or a custom-built configuration language, you can define infrastructure in the programming language of your choice, use loops, conditionals, and functions, and reuse these design patterns across different projects. For example, consider how we could create an EC2 instance with an assigned security group using the Terraform HashiCorp Configuration Language (HCL).

#infrastructure #aws #ec2 #javascript

How to Define and Deploy Infrastructure with Code