
Reaflow: Node-based Visualizations for React

🕸 reaflow


Node-based Visualizations for React 


REAFLOW is a modular diagram engine for building static or interactive editors. The library is feature-rich and modular, allowing you to display complex visualizations with total customizability.

If you are looking for network graphs, check out reagraph.

✨ Features

  • Complex automatic layout leveraging ELKJS
  • Easy Node/Edge/Port customizations
  • Zooming / Panning / Centering controls
  • Drag and drop Node/Port connecting and rearranging
  • Nesting of Nodes/Edges
  • Proximity based Node linking helper
  • Node/Edge selection helper
  • Undo/Redo helper

📦 Usage

Install the package via NPM:

npm i reaflow --save

Install the package via Yarn:

yarn add reaflow

Import the component into your app and add some nodes and edges:

import React from 'react';
import { Canvas } from 'reaflow';

export default () => (
  <Canvas
    maxWidth={800}
    maxHeight={600}
    nodes={[
      {
        id: '1',
        text: '1'
      },
      {
        id: '2',
        text: '2'
      }
    ]}
    edges={[
      {
        id: '1-2',
        from: '1',
        to: '2'
      }
    ]}
  />
);
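
In a real application the nodes and edges typically live in React state so the graph can be updated; here is a minimal sketch of that pattern (it assumes only the Canvas, nodes, and edges props shown above):

import React, { useState } from 'react';
import { Canvas } from 'reaflow';

export default () => {
  // Keep the graph in state so nodes and edges can be added or removed
  const [nodes, setNodes] = useState([
    { id: '1', text: '1' },
    { id: '2', text: '2' }
  ]);
  const [edges, setEdges] = useState([{ id: '1-2', from: '1', to: '2' }]);

  const addNode = () => {
    const id = String(nodes.length + 1);
    setNodes([...nodes, { id, text: id }]);
    setEdges([...edges, { id: `1-${id}`, from: '1', to: id }]);
  };

  return (
    <>
      <button onClick={addNode}>Add node</button>
      <Canvas maxWidth={800} maxHeight={600} nodes={nodes} edges={edges} />
    </>
  );
};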

🔭 Development

If you want to run reaflow locally, it's super easy!

  • Clone the repo
  • yarn install
  • yarn start
  • Browser opens to Storybook page

Download Details:

Author: Reaviz
Source Code: https://github.com/reaviz/reaflow 
License: Apache-2.0 license

#javascript #react #workflow


Lint-staged: Run Linters on Git Staged Files

🚫💩 lint-staged

Run linters against staged git files and don't let 💩 slip into your code base!

npm install --save-dev lint-staged # requires further setup
$ git commit

✔ Preparing lint-staged...
❯ Running tasks for staged files...
  ❯ packages/frontend/.lintstagedrc.json — 1 file
    ↓ *.js — no files [SKIPPED]
    ❯ *.{json,md} — 1 file
      ⠹ prettier --write
  ↓ packages/backend/.lintstagedrc.json — 2 files
    ❯ *.js — 2 files
      ⠼ eslint --fix
    ↓ *.{json,md} — no files [SKIPPED]
◼ Applying modifications from tasks...
◼ Cleaning up temporary files...


Why

Linting makes more sense when run before committing your code. By doing so you can ensure no errors go into the repository and enforce code style. But running a lint process on a whole project is slow, and linting results can be irrelevant. Ultimately you only want to lint files that will be committed.

This project contains a script that will run arbitrary shell tasks with a list of staged files as an argument, filtered by a specified glob pattern.

Related blog posts and talks

If you've written one, please submit a PR with the link to it!

Installation and setup

To install lint-staged in the recommended way, you need to:

  1. Install lint-staged itself:
    • npm install --save-dev lint-staged
  2. Set up the pre-commit git hook to run lint-staged
    • Husky is a popular choice for configuring git hooks (see the example after this list)
    • Read more about git hooks here
  3. Install some linters, like ESLint or Prettier
  4. Configure lint-staged to run linters and other tasks:
    • for example: { "*.js": "eslint" } to run ESLint for all staged JS files
    • See Configuration for more info
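
For step 2, a typical husky-based setup looks like this (a sketch assuming husky v8; check the husky documentation for the commands matching your version):

npm install --save-dev husky
npx husky install
npx husky add .husky/pre-commit "npx lint-staged"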

Don't forget to commit changes to package.json and .husky to share this setup with your team!

Now change a few files, git add or git add --patch some of them to your commit, and try to git commit them.

See examples and configuration for more information.

Changelog

See Releases.

Migration

v13

  • Since v13.0.0 lint-staged no longer supports Node.js 12. Please upgrade your Node.js version to at least 14.13.1, or 16.0.0 onward.

v12

  • Since v12.0.0 lint-staged is a pure ESM module, so make sure your Node.js version is at least 12.20.0, 14.13.1, or 16.0.0. Read more about ESM modules from the official Node.js Documentation site here.

v10

  • From v10.0.0 onwards any new modifications to originally staged files will be automatically added to the commit. If your task previously contained a git add step, please remove this. The automatic behaviour ensures there are fewer race conditions, since trying to run multiple git operations at the same time usually results in an error.
  • From v10.0.0 onwards, lint-staged uses git stashes to improve speed and provide backups while running. Since git stashes require at least an initial commit, you shouldn't run lint-staged in an empty repo.
  • From v10.0.0 onwards, lint-staged requires Node.js version 10.13.0 or later.
  • From v10.0.0 onwards, lint-staged will abort the commit if linter tasks undo all staged changes. To allow creating an empty commit, please use the --allow-empty option.

Command line flags

❯ npx lint-staged --help
Usage: lint-staged [options]

Options:
  -V, --version                      output the version number
  --allow-empty                      allow empty commits when tasks revert all staged changes (default: false)
  -p, --concurrent <number|boolean>  the number of tasks to run concurrently, or false for serial (default: true)
  -c, --config [path]                path to configuration file, or - to read from stdin
  --cwd [path]                       run all tasks in specific directory, instead of the current
  -d, --debug                        print additional debug information (default: false)
  --diff [string]                    override the default "--staged" flag of "git diff" to get list of files. Implies
                                     "--no-stash".
  --diff-filter [string]             override the default "--diff-filter=ACMR" flag of "git diff" to get list of files
  --max-arg-length [number]          maximum length of the command-line argument string (default: 0)
  --no-stash                         disable the backup stash, and do not revert in case of errors
  -q, --quiet                        disable lint-staged’s own console output (default: false)
  -r, --relative                     pass relative filepaths to tasks (default: false)
  -x, --shell [path]                 skip parsing of tasks for better shell support (default: false)
  -v, --verbose                      show task output even when tasks succeed; by default only failed output is shown
                                     (default: false)
  -h, --help                         display help for command
  • --allow-empty: By default, when linter tasks undo all staged changes, lint-staged will exit with an error and abort the commit. Use this flag to allow creating empty git commits.
  • --concurrent [number|boolean]: Controls the concurrency of tasks being run by lint-staged. NOTE: This does NOT affect the concurrency of subtasks (they will always be run sequentially). Possible values are:
    • false: Run all tasks serially
    • true (default) : Infinite concurrency. Runs as many tasks in parallel as possible.
    • {number}: Run the specified number of tasks in parallel, where 1 is equivalent to false.
  • --config [path]: Manually specify a path to a config file or npm package name. Note: when used, lint-staged won't perform the config file search and will print an error if the specified file cannot be found. If '-' is provided as the filename then the config will be read from stdin, allowing piping in the config like cat my-config.json | npx lint-staged --config -.
  • --cwd [path]: By default tasks run in the current working directory. Use --cwd some/directory to override this. The path can be absolute or relative to the current working directory.
  • --debug: Run in debug mode. When set, it does the following:
    • uses debug internally to log additional information about staged files, commands being executed, location of binaries, etc. Debug logs, which are automatically enabled by passing the flag, can also be enabled by setting the environment variable $DEBUG to lint-staged*.
    • uses verbose renderer for listr; this causes serial, uncoloured output to the terminal, instead of the default (beautified, dynamic) output.
  • --diff: By default linters are filtered against all files staged in git, generated from git diff --staged. This option allows you to override the --staged flag with arbitrary revisions. For example, to get a list of changed files between two branches, use --diff="branch1...branch2". You can also read more about git diff and gitrevisions.
  • --diff-filter: By default only files that are added, copied, modified, or renamed are included. Use this flag to override the default ACMR value with something else: added (A), copied (C), deleted (D), modified (M), renamed (R), type changed (T), unmerged (U), unknown (X), or pairing broken (B). See also the git diff docs for --diff-filter.
  • --max-arg-length: Long commands (with many files) are automatically split into multiple chunks when lint-staged detects that the current shell cannot handle them. Use this flag to override the maximum length of the generated command string.
  • --no-stash: By default a backup stash will be created before running the tasks, and all task modifications will be reverted in case of an error. This option will disable creating the stash, and instead leave all modifications in the index when aborting the commit.
  • --quiet: Suppress all CLI output, except from tasks.
  • --relative: Pass filepaths relative to process.cwd() (where lint-staged runs) to tasks. Default is false.
  • --shell: By default linter commands will be parsed for speed and security. This has the side-effect that regular shell scripts might not work as expected. You can skip parsing of commands with this option. To use a specific shell, use a path like --shell "/bin/bash".
  • --verbose: Show task output even when tasks succeed. By default only failed output is shown.

Configuration

Lint-staged can be configured in many ways:

  • lint-staged object in your package.json
  • .lintstagedrc file in JSON or YML format, or you can be explicit with the file extension:
    • .lintstagedrc.json
    • .lintstagedrc.yaml
    • .lintstagedrc.yml
  • .lintstagedrc.mjs or lint-staged.config.mjs file in ESM format
    • the default export value should be a configuration: export default { ... }
  • .lintstagedrc.cjs or lint-staged.config.cjs file in CommonJS format
    • the exports value should be a configuration: module.exports = { ... }
  • lint-staged.config.js or .lintstagedrc.js in either ESM or CommonJS format, depending on whether your project's package.json contains the "type": "module" option or not.
  • Pass a configuration file using the --config or -c flag

Configuration should be an object where each value is a command to run and its key is a glob pattern to use for this command. This package uses micromatch for glob patterns. JavaScript files can also export advanced configuration as a function. See Using JS configuration files for more info.

You can also place multiple configuration files in different directories inside a project. For a given staged file, the closest configuration file will always be used. See "How to use lint-staged in a multi-package monorepo?" for more info and an example.

package.json example:

{
  "lint-staged": {
    "*": "your-cmd"
  }
}

.lintstagedrc example

{
  "*": "your-cmd"
}

This config will execute your-cmd with the list of currently staged files passed as arguments.

So, considering you did git add file1.ext file2.ext, lint-staged will run the following command:

your-cmd file1.ext file2.ext

Task concurrency

By default lint-staged will run configured tasks concurrently. This means that for every glob, all the commands will be started at the same time. With the following config, both eslint and prettier will run at the same time:

{
  "*.ts": "eslint",
  "*.md": "prettier --list-different"
}

This is typically not a problem since the globs do not overlap, and the commands do not make changes to the files, but only report possible errors (aborting the git commit). If you want to run multiple commands for the same set of files, you can use the array syntax to make sure commands are run in order. In the following example, prettier will run for both globs, and in addition eslint will run for *.ts files after it. Both sets of commands (for each glob) are still started at the same time (but do not overlap).

{
  "*.ts": ["prettier --list-different", "eslint"],
  "*.md": "prettier --list-different"
}

Pay extra attention when the configured globs overlap, and tasks make edits to files. For example, in this configuration prettier and eslint might try to make changes to the same *.ts file at the same time, causing a race condition:

{
  "*": "prettier --write",
  "*.ts": "eslint --fix"
}

If necessary, you can limit the concurrency using --concurrent <number> or disable it entirely with --concurrent false.

Filtering files

Linter commands work on a subset of all staged files, defined by a glob pattern. lint-staged uses micromatch for matching files with the following rules:

  • If the glob pattern contains no slashes (/), micromatch's matchBase option will be enabled, so globs match a file's basename regardless of directory:
    • "*.js" will match all JS files, like /test.js and /foo/bar/test.js
    • "!(*test).js" will match all JS files, except those ending in test.js, so foo.js but not foo.test.js
  • If the glob pattern does contain a slash (/), it will match for paths as well:
    • "./*.js" will match all JS files in the git repo root, so /test.js but not /foo/bar/test.js
    • "foo/**/*.js" will match all JS files inside the /foo directory, so /foo/bar/test.js but not /test.js

When matching, lint-staged will do the following:

  • Resolve the git root automatically, no configuration needed.
  • Pick the staged files which are present inside the project directory.
  • Filter them using the specified glob patterns.
  • Pass absolute paths to the linters as arguments.

NOTE: lint-staged will pass absolute paths to the linters to avoid any confusion in case they're executed in a different working directory (i.e. when your .git directory isn't the same as your package.json directory).

Also see How to use lint-staged in a multi-package monorepo?

Ignoring files

The concept of lint-staged is to run configured linter tasks (or other tasks) on files that are staged in git. lint-staged will always pass a list of all staged files to the task, and ignoring any files should be configured in the task itself.

Consider a project that uses prettier to keep code format consistent across all files. The project also stores minified 3rd-party vendor libraries in the vendor/ directory. To keep prettier from throwing errors on these files, the vendor directory should be added to prettier's ignore configuration, the .prettierignore file. Running npx prettier . will ignore the entire vendor directory, throwing no errors. When lint-staged is added to the project and configured to run prettier, all modified and staged files in the vendor directory will be ignored by prettier, even though it receives them as input.

In advanced scenarios, where it is impossible to configure the linter task itself to ignore files, but some staged files should still be ignored by lint-staged, it is possible to filter filepaths before passing them to tasks by using the function syntax. See Example: Ignore files from match.

What commands are supported?

Supported are any executables installed locally or globally via npm as well as any executable from your $PATH.

Using globally installed scripts is discouraged, since lint-staged may not work for someone who doesn't have it installed.

lint-staged uses execa to locate locally installed scripts. So in your .lintstagedrc you can write:

{
  "*.js": "eslint --fix"
}

Pass arguments to your commands separated by space as you would do in the shell. See examples below.

Running multiple commands in a sequence

You can run multiple commands in a sequence on every glob. To do so, pass an array of commands instead of a single one. This is useful for running autoformatting tools like eslint --fix or stylefmt but can be used for any arbitrary sequences.

For example:

{
  "*.js": ["eslint", "prettier --write"]
}

This is going to execute eslint first, and if it exits with code 0, it will execute prettier --write on all staged *.js files.

Using JS configuration files

Writing the configuration file in JavaScript is the most powerful way to configure lint-staged (lint-staged.config.js, similar, or passed via --config). From the configuration file, you can export either a single function or an object.

If the exports value is a function, it will receive an array of all staged filenames. You can then build your own matchers for the files and return a command string or an array of command strings. These strings are considered complete and should include the filename arguments, if wanted.

If the exports value is an object, its keys should be glob matches (like in the normal non-js config format). The values can either be like in the normal config or individual functions like described above. Instead of receiving all matched files, the functions in the exported object will only receive the staged files matching the corresponding glob key.

Function signature

The function can also be async:

(filenames: string[]) => string | string[] | Promise<string | string[]>

Example: Export a function to build your own matchers

Click to expand

// lint-staged.config.js
import micromatch from 'micromatch'

export default (allStagedFiles) => {
  const shFiles = micromatch(allStagedFiles, ['**/src/**/*.sh'])
  if (shFiles.length) {
    return `printf '%s\n' "Script files aren't allowed in src directory" >&2`
  }
  const codeFiles = micromatch(allStagedFiles, ['**/*.js', '**/*.ts'])
  const docFiles = micromatch(allStagedFiles, ['**/*.md'])
  return [`eslint ${codeFiles.join(' ')}`, `mdl ${docFiles.join(' ')}`]
}

Example: Wrap filenames in single quotes and run once per file

Click to expand

// .lintstagedrc.js
export default {
  '**/*.js?(x)': (filenames) => filenames.map((filename) => `prettier --write '${filename}'`),
}

Example: Run tsc on changes to TypeScript files, but do not pass any filename arguments

Click to expand

// lint-staged.config.js
export default {
  '**/*.ts?(x)': () => 'tsc -p tsconfig.json --noEmit',
}

Example: Run ESLint on entire repo if more than 10 staged files

Click to expand

// .lintstagedrc.js
export default {
  '**/*.js?(x)': (filenames) =>
    filenames.length > 10 ? 'eslint .' : `eslint ${filenames.join(' ')}`,
}

Example: Use your own globs

Click to expand

If this is your use case, it's better to use the function-based configuration (seen above).

// lint-staged.config.js
import micromatch from 'micromatch'

export default {
  '*': (allFiles) => {
    const codeFiles = micromatch(allFiles, ['**/*.js', '**/*.ts'])
    const docFiles = micromatch(allFiles, ['**/*.md'])
    return [`eslint ${codeFiles.join(' ')}`, `mdl ${docFiles.join(' ')}`]
  },
}

Example: Ignore files from match

Click to expand

If for some reason you want to ignore files from the glob match, you can use micromatch.not():

// lint-staged.config.js
import micromatch from 'micromatch'

export default {
  '*.js': (files) => {
    // from `files` filter those _NOT_ matching `*test.js`
    const match = micromatch.not(files, '*test.js')
    return `eslint ${match.join(' ')}`
  },
}

Please note that for most cases, globs can achieve the same effect. For the above example, a matching glob would be !(*test).js.

Example: Use relative paths for commands

Click to expand

import path from 'path'

export default {
  '*.ts': (absolutePaths) => {
    const cwd = process.cwd()
    const relativePaths = absolutePaths.map((file) => path.relative(cwd, file))
    return `ng lint myProjectName --files ${relativePaths.join(' ')}`
  },
}

Reformatting the code

Tools like Prettier, ESLint/TSLint, or stylelint can reformat your code according to an appropriate config by running prettier --write/eslint --fix/tslint --fix/stylelint --fix. Lint-staged will automatically add any modifications to the commit as long as there are no errors.

{
  "*.js": "prettier --write"
}

Prior to version 10, tasks had to manually include git add as the final step. This behavior has been integrated into lint-staged itself in order to prevent race conditions with multiple tasks editing the same files. If lint-staged detects git add in task configurations, it will show a warning in the console. Please remove git add from your configuration after upgrading.

Examples

All examples assume you've already set up lint-staged in the package.json file and husky in its own config file.

{
  "name": "My project",
  "version": "0.1.0",
  "scripts": {
    "my-custom-script": "linter --arg1 --arg2"
  },
  "lint-staged": {}
}

In .husky/pre-commit

#!/usr/bin/env sh
. "$(dirname "$0")/_/husky.sh"

npx lint-staged

Note: we don't pass a path as an argument for the runners. This is important since lint-staged will do this for you.

ESLint with default parameters for *.js and *.jsx running as a pre-commit hook

Click to expand

{
  "*.{js,jsx}": "eslint"
}

Automatically fix code style with --fix and add to commit

Click to expand

{
  "*.js": "eslint --fix"
}

This will run eslint --fix and automatically add changes to the commit.

Reuse npm script

Click to expand

If you wish to reuse an npm script defined in your package.json:

{
  "*.js": "npm run my-custom-script --"
}

The following is equivalent:

{
  "*.js": "linter --arg1 --arg2"
}

Use environment variables with linting commands

Click to expand

Linting commands do not support the shell convention of expanding environment variables. To enable the convention yourself, use a tool like cross-env.

For example, here is jest running on all .js files with the NODE_ENV variable being set to "test":

{
  "*.js": ["cross-env NODE_ENV=test jest --bail --findRelatedTests"]
}

Automatically fix code style with prettier for any format Prettier supports

Click to expand

{
  "*": "prettier --ignore-unknown --write"
}

Automatically fix code style with prettier for JavaScript, TypeScript, Markdown, HTML, or CSS

Click to expand

{
  "*.{js,jsx,ts,tsx,md,html,css}": "prettier --write"
}

Stylelint for CSS with defaults and for SCSS with SCSS syntax

Click to expand

{
  "*.css": "stylelint",
  "*.scss": "stylelint --syntax=scss"
}

Run PostCSS sorting and Stylelint to check

Click to expand

{
  "*.scss": ["postcss --config path/to/your/config --replace", "stylelint"]
}

Minify the images

Click to expand

{
  "*.{png,jpeg,jpg,gif,svg}": "imagemin-lint-staged"
}

More about imagemin-lint-staged

imagemin-lint-staged is a CLI tool designed for lint-staged usage with sensible defaults.

See this blog post for more on the benefits of this approach.

Typecheck your staged files with flow

Click to expand

{
  "*.{js,jsx}": "flow focus-check"
}

Frequently Asked Questions

Can I use lint-staged via node?

Click to expand

Yes!

import lintStaged from 'lint-staged'

try {
  const success = await lintStaged()
  console.log(success ? 'Linting was successful!' : 'Linting failed!')
} catch (e) {
  // Failed to load configuration
  console.error(e)
}

Parameters to lintStaged are equivalent to their CLI counterparts:

const success = await lintStaged({
  allowEmpty: false,
  concurrent: true,
  configPath: './path/to/configuration/file',
  cwd: process.cwd(),
  debug: false,
  maxArgLength: null,
  quiet: false,
  relative: false,
  shell: false,
  stash: true,
  verbose: false,
})

You can also pass config directly with config option:

const success = await lintStaged({
  allowEmpty: false,
  concurrent: true,
  config: { '*.js': 'eslint --fix' },
  cwd: process.cwd(),
  debug: false,
  maxArgLength: null,
  quiet: false,
  relative: false,
  shell: false,
  stash: true,
  verbose: false,
})

The maxArgLength option configures chunking of tasks into multiple parts that are run one after the other. This is to avoid issues on Windows platforms where the maximum length of the command-line argument string is limited to 8192 characters. Lint-staged might generate a very long argument string when there are many staged files. This option is set automatically from the CLI, but not via the Node.js API by default.
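
If you call lint-staged through the Node.js API and want the same behaviour, set the option explicitly (the value 8192 here simply mirrors the Windows limit mentioned above):

const success = await lintStaged({
  config: { '*.js': 'eslint --fix' },
  maxArgLength: 8192,
})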

Using with JetBrains IDEs (WebStorm, PyCharm, IntelliJ IDEA, RubyMine, etc.)

Click to expand

Update: the latest versions of JetBrains IDEs now support running hooks as you would expect.

When using the IDE's GUI to commit changes with the precommit hook, you might see inconsistencies between the IDE and the command line. This is a known issue at JetBrains, so if you want this fixed, please vote for it on YouTrack.

Until the issue is resolved in the IDE, you can use the following config to work around it:

husky v1.x

{
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged",
      "post-commit": "git update-index --again"
    }
  }
}

husky v0.x

{
  "scripts": {
    "precommit": "lint-staged",
    "postcommit": "git update-index --again"
  }
}

Thanks to this comment for the fix!

How to use lint-staged in a multi-package monorepo?

Click to expand

Install lint-staged on the monorepo root level, and add separate configuration files in each package. When running, lint-staged will always use the configuration closest to a staged file, so having separate configuration files makes sure linters do not "leak" into other packages.

For example, in a monorepo with packages/frontend/.lintstagedrc.json and packages/backend/.lintstagedrc.json, a staged file inside packages/frontend/ will only match that configuration, and not the one in packages/backend/.

Note: lint-staged discovers the closest configuration to each staged file, even if that configuration doesn't include any matching globs. Given these example configurations:

// ./.lintstagedrc.json
{ "*.md": "prettier --write" }
// ./packages/frontend/.lintstagedrc.json
{ "*.js": "eslint --fix" }

When committing ./packages/frontend/README.md, it will not run prettier, because the configuration in the frontend/ directory is closer to the file and doesn't include it. You should treat all lint-staged configuration files as isolated and separated from each other. You can always use JS files to "extend" configurations, for example:

import baseConfig from '../.lintstagedrc.js'

export default {
  ...baseConfig,
  '*.js': 'eslint --fix',
}

Can I lint files outside of the current project folder?

Click to expand

tl;dr: Yes, but the pattern should start with ../.

By default, lint-staged executes linters only on the files present inside the project folder (where lint-staged is installed and run from). So this question is relevant only when the project folder is a child folder inside the git repo. In certain project setups, it might be desirable to bypass this restriction. See #425 and #487 for more context.

lint-staged provides an escape hatch for this (>= v7.3.0). For patterns that start with ../, all the staged files are allowed to match against the pattern. Note that patterns like *.js, **/*.js will still only match the project files and not any of the files in parent or sibling directories.

Example repo: sudo-suhas/lint-staged-django-react-demo.

Can I run lint-staged in CI, or when there are no staged files?

Click to expand

Lint-staged will by default run against files staged in git, and should be run during the git pre-commit hook, for example. It's also possible to override this default behaviour and run against files in a specific diff, for example all changed files between two different branches. If you want to run lint-staged in the CI, maybe you can set it up to compare the branch in a Pull Request/Merge Request to the target branch.

Try out the git diff command until you are satisfied with the result, for example:

git diff --diff-filter=ACMR --name-only master...my-branch

This will print a list of added, copied, modified, and renamed files between master and my-branch.

You can then run lint-staged against the same files with:

npx lint-staged --diff="master...my-branch"

Can I use lint-staged with ng lint

Click to expand

You should not use ng lint through lint-staged, because it's designed to lint an entire project. Instead, you can add ng lint to your git pre-commit hook the same way as you would run lint-staged.

See issue #951 for more details and possible workarounds.

How can I ignore files from .eslintignore?

Click to expand

ESLint emits the warning "File ignored because of a matching ignore pattern. Use --no-ignore to override", which breaks the linting process if you use --max-warnings=0 (recommended).

ESLint < 7

Click to expand

Based on the discussion from this issue, it was decided that using the outlined script is the best route to fix this.

So you can set up a .lintstagedrc.js config file to do this:

import { CLIEngine } from 'eslint'

export default {
  '*.js': (files) => {
    const cli = new CLIEngine({})
    return 'eslint --max-warnings=0 ' + files.filter((file) => !cli.isPathIgnored(file)).join(' ')
  },
}

ESLint >= 7

Click to expand

In ESLint 7 and later, isPathIgnored is an async function and returns a promise. The code below can be used to reinstate the above functionality.

Since 10.5.3, any errors due to a bad ESLint config will come through to the console.

import { ESLint } from 'eslint'

const removeIgnoredFiles = async (files) => {
  const eslint = new ESLint()
  const isIgnored = await Promise.all(
    files.map((file) => {
      return eslint.isPathIgnored(file)
    })
  )
  const filteredFiles = files.filter((_, i) => !isIgnored[i])
  return filteredFiles.join(' ')
}

export default {
  '**/*.{ts,tsx,js,jsx}': async (files) => {
    const filesToLint = await removeIgnoredFiles(files)
    return [`eslint --max-warnings=0 ${filesToLint}`]
  },
}

Download Details:

Author: okonet
Source Code: https://github.com/okonet/lint-staged 
License: MIT license

#javascript #git #workflow #eslint 


A Shiny Framework for Workflow Management & Data Visualization

systemPipeShiny 

systemPipeShiny (SPS) is a Shiny-based R/Bioconductor package that extends the widely used systemPipeR workflow environment with data visualization and a versatile graphical user interface. SPS can also work as a general framework for building custom web apps for data analysis and visualization. In addition, SPS provides many developer tools that are distributed as separate packages.


Demos

SPS provides a variety of options to change how it works. Here are some examples.

Type and link | Option changed | Notes
Default full installation | See installation | Full app; may take longer (~15s) to load
Minimum installation | See installation | No modules installed
Login enabled | login_screen = TRUE; login_theme = "empty" | No modules installed
Login and login themes | login_screen = TRUE; login_theme = "random" | No modules installed
App admin page | admin_page = TRUE | Use the link or simply add "?admin" to the end of the URL of any demo

For the demos that require login, the app account name is "user" and the password is "user".

For the admin login, the account name is "admin" and the password is "admin".

Please DO NOT delete accounts or change passwords while trying the admin features. Although shinyapps.io resets the app once in a while, such changes will affect other people who are viewing the demo at the same time.

RStudio Cloud

There is an RStudio Cloud project instance that you can also play with. You need to create a free account. Two Bioconductor-related modules, workflow & RNAseq, are not installed; they require more than 1GB of RAM to install and run, which is beyond the limit of a free account.

Documents

To see all the details of SPS, read the user manual on our website.

Installation

SPS is under heavy development. We recommend installing the development version (the most recent) for the latest features.

Full

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("systemPipeShiny", dependencies=TRUE)

This will install all required packages, including the suggested packages that are required by the core modules. Be aware that it will take quite some time if you are installing on Linux, where only source installation is available. Windows and Mac binary installations will be much faster.

Minimum

To install the package, please use the BiocManager::install command:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("systemPipeShiny")

With the minimum installation, none of the three core modules is installed. You can still start the app; when you click on one of these modules, it will tell you how to enable it, which packages to install, and which commands you need to run. Just follow the instructions and install what you need.

Most recent

To obtain the most recent updates immediately, you can install it directly from GitHub as follows:

if (!requireNamespace("remotes", quietly=TRUE))
    install.packages("remotes")
remotes::install_github("systemPipeR/systemPipeShiny", dependencies=TRUE)

Similarly, use remotes::install_github("systemPipeR/systemPipeShiny") for the minimum development version.

Linux

If you are on Linux, you may also need the following system dependencies before installing SPS. Different distributions may have different commands, but the following commands are examples for Ubuntu:

sudo apt-get install -y libicu-dev
sudo apt-get install -y pandoc
sudo apt-get install -y zlib1g-dev
sudo apt-get install -y libcurl4-openssl-dev
sudo apt-get install -y libssl-dev      
sudo apt-get install -y make

Quick start

This is a basic example which shows how to use the systemPipeShiny package:

## Imports the library
library(systemPipeShiny)
## Creates the project directory
spsInit()

By default, a project folder is created and named SPS_ followed by the date. This project folder provides all the necessary files to launch the application. If you are using RStudio, the global.R file will be opened automatically; this is the only file you may need to customize, if at all.

Click the green "Run App" button in RStudio if you have global.R open, or run the following in the console to start the app.

## Launching the interface
shiny::runApp()

options

Changing some of the options listed in global.R changes how the app behaves: for example, which modules to load, the title, the logo, whether to apply user authentication, and more.
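
For example, the demo configurations in the table above map to options like the following (a sketch only: spsOption is assumed here from the spsUtil package, and the option names should be checked against your SPS version):

## In global.R: enable the login screen, pick a login theme, and turn on the admin page
spsOption("login_screen", TRUE)
spsOption("login_theme", "random")
spsOption("admin_page", TRUE)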

Other packages in systemPipeShiny

Package | Description | Documents | Function reference | Demo
systemPipeShiny | SPS main package | website | link | demo
spsComps | SPS UI and server components | website | link | demo
drawer | SPS interactive image editing tool | website | link | demo
spsUtil | SPS utility functions | website | link | NA

Screenshots of SPS

Full app

Loading screens

Workflow module

Workflow Execution

RNASeq module

Canvas

Admin

Debugging

Contact & contributions

Please use https://github.com/systemPipeR/systemPipeShiny/issues for reporting bugs, issues or for suggesting new features to be implemented.

We'd love to hear from all users and developers. Submit your pull request if you have new thoughts or improvements.

Download Details:

Author: systemPipeR
Source Code: https://github.com/systemPipeR/systemPipeShiny 

#r #datavisualization #workflow 


How to Create Workflows That Pause and Wait for Events

In Workflows, it's easy to chain various services together into an automated workflow. For some use cases, you might need to pause workflow execution and wait for some input. This input could be a human approval or an external service calling back with data needed to complete the workflow.

With Workflows callbacks, a workflow can create an HTTP endpoint and pause execution until it receives an HTTP callback to that endpoint. This is very useful for creating human-in-the-middle type workflows. In a previous blog post, Guillaume Laforge showed how to build an automated translation workflow with human validation using callbacks.

Callbacks are great, but someone (or some service) has to know the callback endpoint and make an HTTP call to that endpoint. And many services send or generate events instead of calling HTTP endpoints. Wouldn't it be nice if you could pause a workflow execution and resume when a specific event is received? For example, you could use this capability to create workflows that pause and wait for a new message from a Pub/Sub topic or a new file to be created in a Cloud Storage bucket.

Event callbacks

While Workflows does not provide event callbacks out of the box, it is possible to have a workflow execution wait for an event by using callbacks, Firestore, and Eventarc.

The idea is to use Firestore to store callback details from your original workflow, use Eventarc to listen for events from various event sources, and use a second, event-triggered workflow to receive events from Eventarc and call back to the callback endpoints in your original workflow.
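
As a sketch, the waiting side of the original workflow might look like this, using the Workflows callback functions events.create_callback_endpoint and events.await_callback (step names are illustrative, and storing the callback URL in Firestore is only indicated by a comment):

main:
  steps:
    - create_callback:
        call: events.create_callback_endpoint
        args:
          http_callback_method: "POST"
        result: callback_details
    # ... store callback_details.url in a Firestore document keyed by the
    # event source, so the listener workflow can look it up later ...
    - await_event:
        call: events.await_callback
        args:
          callback: ${callback_details}
          timeout: 3600
        result: callback_request
    - process_event:
        return: ${callback_request.http_request.body}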

Architecture

Here is the setup in more detail:

  1. The callback-event-sample workflow (yaml) creates a callback for an event source from which it wants to wait for events.
  2. It stores the callback information for the event source in a document in Firestore.
  3. It continues with its workflow and, at some point, starts waiting for an event.
  4. In the meantime, callback-event-listener (yaml) is ready to be triggered by events from a Pub/Sub topic and a Cloud Storage bucket via Eventarc.
  5. At some point, Eventarc receives an event and triggers the event listener.
  6. The event listener finds the document for the event source in Firestore.
  7. The event listener calls back all the callback URLs registered with that event source. The callback-event-sample workflow receives the event through its callback endpoint and stops waiting.
  8. It deletes the callback URL from Firestore and continues its execution.

Detailed setup instructions, along with the YAML workflow definitions, are on GitHub.

Testing

To test the event callbacks, first execute the sample workflow:

gcloud workflows run callback-event-sample

Once it starts, the workflow pauses and waits; you can confirm this from its running state in the Google Cloud Console:

To test Pub/Sub callbacks, send a Pub/Sub message:

TOPIC=topic-callback
gcloud pubsub topics publish $TOPIC --message="Hello World"

In the workflow logs, you should see that the workflow started and stopped waiting for the event:

Started waiting for an event from source topic-callback
Stopped waiting for an event from source topic-callback

To test Cloud Storage events, upload a new file to Cloud Storage:

BUCKET=$PROJECT_ID-bucket-callback
echo "Hello World" > random.txt
gsutil cp random.txt gs://$BUCKET/random.txt

You should see that the workflow started and stopped waiting for the event:

Started waiting for an event from source $PROJECT_ID-bucket-callback
Stopped waiting for an event from source $PROJECT_ID-bucket-callback

At this point, the workflow should stop executing, and you should also see the received Pub/Sub and Cloud Storage events in the output:

This blog post covered how to create workflows that wait for Pub/Sub and Cloud Storage events using callbacks, Firestore, and Eventarc. You can extend the sample to listen for any events that Eventarc supports. Feel free to reach out to me on Twitter @meteatamel with any questions or feedback.

Link: https://medium.com/google-cloud/creating-workflows-that-pause-and-wait-for-events-4da201741f2a

#workflow


Weave.jl: Scientific Reports/literate Programming for Julia

Weave  

Weave is a scientific report generator/literate programming tool for the Julia programming language. It resembles Pweave, knitr, R Markdown, and Sweave.

You can write your documentation and code in an input document using Markdown, Noweb, or ordinary Julia script syntax, and then use the weave function to execute the code and generate an output document while capturing results and figures.
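
For example, a minimal Markdown input document (say, example.jmd; the filename and content are illustrative) interleaves prose with executable code chunks, and weaving it runs the chunks and embeds their results:

## A tiny report

Some explanatory text in Markdown.

```julia
x = randn(100)
sum(x) / length(x)
```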

Current features

  • Publish markdown directly to HTML and PDF using Julia or Pandoc
  • Execute code as in the terminal, or in units of code chunks
  • Capture Plots.jl or Gadfly.jl figures
  • Supports various input formats: Markdown, Noweb, Jupyter Notebook, and ordinary Julia script
  • Conversions between those input formats
  • Supports various output document formats: HTML, PDF, GitHub markdown, Jupyter Notebook, MultiMarkdown, Asciidoc and reStructuredText
  • Simple caching of results (see the example below)
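
For instance, the output format and caching can be chosen per weave call (a sketch; doctype and cache are weave keyword arguments, but check the Weave documentation for the full list of cache modes):

using Weave
weave("report.jmd", doctype = "md2html", cache = :all)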

Citing Weave: Pastell, Matti. 2017. Weave.jl: Scientific Reports Using Julia. The Journal of Open Source Software. http://dx.doi.org/10.21105/joss.00204

Weave in Juno demo

Installation

You can install the latest release using Julia package manager:

using Pkg
Pkg.add("Weave")

Usage

using Weave

# add dependencies for the example
using Pkg; Pkg.add(["Plots", "DSP"])

filename = normpath(Weave.EXAMPLE_FOLDER, "FIR_design.jmd")
weave(filename, out_path = :pwd)

If you have LaTeX installed you can also weave directly to pdf.

filename = normpath(Weave.EXAMPLE_FOLDER, "FIR_design.jmd")
weave(filename, out_path = :pwd, doctype = "md2pdf")

NOTE: Weave.EXAMPLE_FOLDER just points to examples directory.

Documentation

Documentation is generated with Documenter.jl and MkDocs.

Editor support

Install language-weave to add Weave support to Juno. It allows running code from Weave documents with the usual keybindings and supports previewing HTML and PDF output.

The Julia extension for Visual Studio Code adds Weave support to Visual Studio Code.

Contributing

You can contribute to this package by opening issues on GitHub or implementing things yourself and making a pull request. We'd also appreciate more example documents written using Weave.

Contributors

You can see the list of contributors on GitHub: https://github.com/JunoLab/Weave.jl/graphs/contributors . Thanks for the important additions, fixes and comments.

Example projects using Weave

Download Details:

Author: JunoLab
Source Code: https://github.com/JunoLab/Weave.jl 
License: MIT license

#julia #programming #workflow 


Pipelines.jl: A Lightweight Julia Package for Computational Pipelines

Pipelines.jl

A lightweight Julia package for computational pipelines.

Building reusable pipelines and workflows is easier than you have ever thought.

Package Features

  • Easy to build both simple and complex tasks.
  • Supports external command lines and pure Julia functions.
  • Supports resuming interrupted tasks, retrying failed tasks, and skipping finished tasks.
  • Supports dependency checks.
  • Supports validation of inputs and outputs, and more.
  • Supports program queuing and workload management with JobSchedulers.jl.

Installation

Pipelines.jl can be installed using the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run

pkg> add Pipelines

To use the package, type

using Pipelines
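
As a quick sketch of the idea, a task can wrap an external command with named placeholders; treat the exact keyword API below as an assumption and check the package documentation:

using Pipelines

# Wrap an external command; IN and OUT are placeholders that are
# replaced with the actual arguments at run time.
prog = CmdProgram(
    inputs = ["IN"],
    outputs = ["OUT"],
    cmd = `echo IN OUT`
)

# Running a program substitutes the arguments and, per the features above,
# can skip the task if it has already finished.
run(prog; inputs = Dict("IN" => "in.txt"), outputs = Dict("OUT" => "out.txt"))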

Documentation

  • STABLE — documentation of the most recently tagged version.
  • DEVEL — documentation of the in-development version.

Download Details:

Author: cihga39871
Source Code: https://github.com/cihga39871/Pipelines.jl 
License: MIT license

#julia #workflow 

Pipelines.jl: A Lightweight Julia Package for Computational Pipelines
Nat  Grady

Nat Grady

1660179960

Drake: An R-focused Pipeline toolkit for Reproducibility

Drake is superseded. Consider targets instead.

As of 2021-01-21, drake is superseded. The targets R package is the long-term successor of drake, and it is more robust and easier to use. Please visit https://books.ropensci.org/targets/drake.html for full context and advice on transitioning.

The drake R package

Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. How much of that valuable output can you keep, and how much do you need to update? How much runtime must you endure all over again?

For projects in R, the drake package can help. It analyzes your workflow, skips steps with up-to-date results, and orchestrates the rest with optional distributed computing. At the end, drake provides evidence that your results match the underlying code and data, which increases your ability to trust your research.

Video

That Feeling of Workflowing (Miles McBain)

(By Miles McBain; venue, resources)

rOpenSci Community Call

(resources)

What gets done stays done.

Too many data science projects follow a Sisyphean loop:

  1. Launch the code.
  2. Wait while it runs.
  3. Discover an issue.
  4. Rerun from scratch.

For projects with long runtimes, this process gets tedious. But with drake, you can automatically

  1. Launch the parts that changed since last time.
  2. Skip the rest.

How it works

To set up a project, load your packages,

library(drake)
library(dplyr)
library(ggplot2)
library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following objects are masked from 'package:drake':
#> 
#>     expand, gather

load your custom functions,

create_plot <- function(data) {
  ggplot(data) +
    geom_histogram(aes(x = Ozone)) +
    theme_gray(24)
}

check any supporting files (optional),

# Get the files with drake_example("main").
file.exists("raw_data.xlsx")
#> [1] TRUE
file.exists("report.Rmd")
#> [1] TRUE

and plan what you are going to do.

plan <- drake_plan(
  raw_data = readxl::read_excel(file_in("raw_data.xlsx")),
  data = raw_data %>%
    mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TRUE))),
  hist = create_plot(data),
  fit = lm(Ozone ~ Wind + Temp, data),
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)

plan
#> # A tibble: 5 x 2
#>   target   command                                                              
#>   <chr>    <expr_lst>                                                           
#> 1 raw_data readxl::read_excel(file_in("raw_data.xlsx"))                        …
#> 2 data     raw_data %>% mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TR…
#> 3 hist     create_plot(data)                                                   …
#> 4 fit      lm(Ozone ~ Wind + Temp, data)                                       …
#> 5 report   rmarkdown::render(knitr_in("report.Rmd"), output_file = file_out("re…

So far, we have just been setting the stage. Use make() or r_make() to do the real work. Targets are built in the correct order regardless of the row order of plan.

make(plan) # See also r_make().
#> ▶ target raw_data
#> ▶ target data
#> ▶ target fit
#> ▶ target hist
#> ▶ target report

Except for files like report.html, your output is stored in a hidden .drake/ folder. Reading it back is easy.

readd(data) # See also loadd().
#> # A tibble: 153 x 6
#>    Ozone Solar.R  Wind  Temp Month   Day
#>    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  41       190   7.4    67     5     1
#>  2  36       118   8      72     5     2
#>  3  12       149  12.6    74     5     3
#>  4  18       313  11.5    62     5     4
#>  5  42.1      NA  14.3    56     5     5
#>  6  28        NA  14.9    66     5     6
#>  7  23       299   8.6    65     5     7
#>  8  19        99  13.8    59     5     8
#>  9   8        19  20.1    61     5     9
#> 10  42.1     194   8.6    69     5    10
#> # … with 143 more rows

You may look back on your work and see room for improvement, but it’s all good! The whole point of drake is to help you go back and change things quickly and painlessly. For example, we forgot to give our histogram a bin width.

readd(hist)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

So let’s fix the plotting function.

create_plot <- function(data) {
  ggplot(data) +
    geom_histogram(aes(x = Ozone), binwidth = 10) +
    theme_gray(24)
}

drake knows which results are affected.

vis_drake_graph(plan) # See also r_vis_drake_graph().

hist1

The next make() just builds hist and report.html. No point in wasting time on the data or model.

make(plan) # See also r_make().
#> ▶ target hist
#> ▶ target report
loadd(hist)
hist

Reproducibility with confidence

The R community emphasizes reproducibility. Traditional themes include scientific replicability, literate programming with knitr, and version control with git. But internal consistency is important too. Reproducibility carries the promise that your output matches the code and data you say you used. With the exception of non-default triggers and hasty mode, drake strives to keep this promise.

Evidence

Suppose you are reviewing someone else’s data analysis project for reproducibility. You scrutinize it carefully, checking that the datasets are available and the documentation is thorough. But could you re-create the results without the help of the original author? With drake, it is quick and easy to find out.

make(plan) # See also r_make().
#> ℹ unloading 1 targets from environment
#> ✓ All targets are already up to date.

outdated(plan) # See also r_outdated().
#> character(0)

With everything already up to date, you have tangible evidence of reproducibility. Even though you did not re-create the results, you know the results are recreatable. They faithfully show what the code is producing. Given the right package environment and system configuration, you have everything you need to reproduce all the output by yourself.

Ease

When it comes time to actually rerun the entire project, you have much more confidence. Starting over from scratch is trivially easy.

clean()    # Remove the original author's results.
make(plan) # Independently re-create the results from the code and input data.
#> ▶ target raw_data
#> ▶ target data
#> ▶ target fit
#> ▶ target hist
#> ▶ target report

Big data efficiency

Select specialized data formats to increase speed and reduce memory consumption. In version 7.5.2.9000 and above, the available formats are “fst” for data frames (example below) and “keras” for Keras models (example here).

library(drake)
n <- 1e8 # Each target is 1.6 GB in memory.
plan <- drake_plan(
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  data_old = data.frame(x = runif(n), y = runif(n))
)
make(plan)
#> target data_fst
#> target data_old
build_times(type = "build")
#> # A tibble: 2 x 4
#>   target   elapsed              user                 system    
#>   <chr>    <Duration>           <Duration>           <Duration>
#> 1 data_fst 13.93s               37.562s              7.954s    
#> 2 data_old 184s (~3.07 minutes) 177s (~2.95 minutes) 4.157s

History and provenance

As of version 7.5.2, drake tracks the history and provenance of your targets: what you built, when you built it, how you built it, the arguments you used in your function calls, and how to get the data back. (Disable with make(history = FALSE))

history <- drake_history(analyze = TRUE)
history
#> # A tibble: 12 x 11
#>    target current built exists hash  command   seed runtime na.rm quiet
#>    <chr>  <lgl>   <chr> <lgl>  <chr> <chr>    <int>   <dbl> <lgl> <lgl>
#>  1 data   TRUE    2020… TRUE   11e2… "raw_d… 1.29e9 0.011   TRUE  NA   
#>  2 data   TRUE    2020… TRUE   11e2… "raw_d… 1.29e9 0.00400 TRUE  NA   
#>  3 fit    TRUE    2020… TRUE   3c87… "lm(Oz… 1.11e9 0.006   NA    NA   
#>  4 fit    TRUE    2020… TRUE   3c87… "lm(Oz… 1.11e9 0.002   NA    NA   
#>  5 hist   FALSE   2020… TRUE   88ae… "creat… 2.10e8 0.011   NA    NA   
#>  6 hist   TRUE    2020… TRUE   0304… "creat… 2.10e8 0.003   NA    NA   
#>  7 hist   TRUE    2020… TRUE   0304… "creat… 2.10e8 0.009   NA    NA   
#>  8 raw_d… TRUE    2020… TRUE   855d… "readx… 1.20e9 0.02    NA    NA   
#>  9 raw_d… TRUE    2020… TRUE   855d… "readx… 1.20e9 0.0330  NA    NA   
#> 10 report TRUE    2020… TRUE   5504… "rmark… 1.30e9 1.31    NA    TRUE 
#> 11 report TRUE    2020… TRUE   5504… "rmark… 1.30e9 0.413   NA    TRUE 
#> 12 report TRUE    2020… TRUE   5504… "rmark… 1.30e9 0.475   NA    TRUE 
#> # … with 1 more variable: output_file <chr>

Remarks:

  • The quiet column appears above because one of the drake_plan() commands calls rmarkdown::render() with quiet = TRUE.
  • The hash column identifies all the previous versions of your targets. As long as exists is TRUE, you can recover old data.
  • Advanced: if you use make(cache_log_file = TRUE) and put the cache log file under version control, you can match the hashes from drake_history() with the git commit history of your code.

Let’s use the history to recover the oldest histogram.

hash <- history %>%
  filter(target == "hist") %>%
  pull(hash) %>%
  head(n = 1)
cache <- drake_cache()
cache$get_value(hash)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Independent replication

With even more evidence and confidence, you can invest the time to independently replicate the original code base if necessary. Up until this point, you relied on basic drake functions such as make(), so you may not have needed to peek at any substantive author-defined code in advance. In that case, you can stay usefully ignorant as you reimplement the original author’s methodology. In other words, drake could potentially improve the integrity of independent replication.

Readability and transparency

Ideally, independent observers should be able to read your code and understand it. drake helps in several ways.

  • The drake plan explicitly outlines the steps of the analysis, and vis_drake_graph() visualizes how those steps depend on each other.
  • drake takes care of the parallel scheduling and high-performance computing (HPC) for you. That means the HPC code is no longer tangled up with the code that actually expresses your ideas.
  • You can generate large collections of targets without necessarily changing your code base of imported functions, another nice separation between the concepts and the execution of your workflow.

Scale up and out.

Not every project can complete in a single R session on your laptop. Some projects need more speed or computing power. Some require a few local processor cores, and some need large high-performance computing systems. But parallel computing is hard. Your tables and figures depend on your analysis results, and your analyses depend on your datasets, so some tasks must finish before others even begin. drake knows what to do. Parallelism is implicit and automatic. See the high-performance computing guide for all the details.

# Use the spare cores on your local machine.
make(plan, jobs = 4)

# Or scale up to a supercomputer.
drake_hpc_template_file("slurm_clustermq.tmpl") # https://slurm.schedmd.com/
options(
  clustermq.scheduler = "slurm",
  clustermq.template = "slurm_clustermq.tmpl"
)
make(plan, parallelism = "clustermq", jobs = 4)

With Docker

drake and Docker are compatible and complementary. Here are some examples that run drake inside a Docker image.

Alternatively, it is possible to run drake outside Docker and use the future package to send targets to a Docker image. drake’s Docker-psock example demonstrates how. Download the code with drake_example("Docker-psock").

Installation

You can choose among different versions of drake. The CRAN release often lags behind the online manual but may have fewer bugs.

# Install the latest stable release from CRAN.
install.packages("drake")

# Alternatively, install the development version from GitHub.
install.packages("devtools")
library(devtools)
install_github("ropensci/drake")

Function reference

The reference section lists all the available functions. Here are the most important ones.

  • drake_plan(): create a workflow data frame (like my_plan).
  • make(): build your project.
  • drake_history(): show what you built, when you built it, and the function arguments you used.
  • r_make(): launch a fresh callr::r() process to build your project. Called from an interactive R session, r_make() is more reproducible than make().
  • loadd(): load one or more built targets into your R session.
  • readd(): read and return a built target.
  • vis_drake_graph(): show an interactive visual network representation of your workflow.
  • recoverable(): list the targets you can salvage with make(recover = TRUE) (experimental).
  • outdated(): see which targets will be built in the next make().
  • deps_code(): check the dependencies of a command or function.
  • drake_failed(): list the targets that failed to build in the last make().
  • diagnose(): return the full context of a build, including errors, warnings, and messages.

Documentation

Core concepts

The following resources explain what drake can do and how it works. The learndrake workshop devotes particular attention to drake’s mental model.

In practice

  • Miles McBain’s excellent blog post explains the motivating factors and practical issues {drake} solves for most projects, how to set up a project as quickly and painlessly as possible, and how to overcome common obstacles.
  • Miles’ dflow package generates the file structure for a boilerplate drake project. It is a more thorough alternative to drake::use_drake().
  • drake is heavily function-oriented by design, and Miles’ fnmate package automatically generates boilerplate code and docstrings for functions you mention in drake plans.

Reference

Use cases

The official rOpenSci use cases and associated discussion threads describe applications of drake in the real world. Many of these use cases are linked from the drake tag on the rOpenSci discussion forum.

Here are some additional applications of drake in real-world projects.

drake projects as R packages

Some folks like to structure their drake workflows as R packages. Examples are below. In your own analysis packages, be sure to call drake::expose_imports(yourPackage) so drake can watch your package's functions for changes and rebuild downstream targets accordingly.

Help and troubleshooting

The following resources document many known issues and challenges.

If you are still having trouble, please submit a new issue with a bug report or feature request, along with a minimal reproducible example where appropriate.

The GitHub issue tracker is mainly intended for bug reports and feature requests. While questions about usage etc. are also highly encouraged, you may alternatively wish to post to Stack Overflow and use the drake-r-package tag.

Contributing

Development is a community effort, and we encourage participation. Please read CONTRIBUTING.md for details.

Similar work

drake enhances reproducibility and high-performance computing, but not in all respects. Literate programming, local library managers, containerization, and strict session managers offer more robust solutions in their respective domains. And for the problems drake does solve, it stands on the shoulders of the giants that came before.

Pipeline tools

GNU Make

The original idea of a time-saving reproducible build system extends back at least as far as GNU Make, which still aids the work of data scientists as well as its original user base of compiled-language programmers. In fact, the name “drake” stands for “Data Frames in R for Make”. Make is used widely in reproducible research. Below are some examples from Karl Broman’s website.

Whereas GNU Make is language-agnostic, drake is fundamentally designed for R.

  • Instead of a Makefile, drake supports an R-friendly domain-specific language for declaring targets.
  • Targets in GNU Make are files, whereas targets in drake are arbitrary variables in memory. (drake does have opt-in support for files via file_out(), file_in(), and knitr_in().) drake caches these objects in its own storage system so R users rarely have to think about output files.

Remake

remake itself is no longer maintained, but its founding design goals and principles live on through drake. In fact, drake is a direct re-imagining of remake with enhanced scalability, reproducibility, high-performance computing, visualization, and documentation.

Factual’s Drake

Factual’s Drake is similar in concept, but the development effort is completely unrelated to the drake R package.

Other pipeline tools

There are countless other successful pipeline toolkits. The drake package distinguishes itself with its R-focused approach, Tidyverse-friendly interface, and a thorough selection of parallel computing technologies and scheduling algorithms.

Memoization

Memoization is the strategic caching of the return values of functions. It is a lightweight approach to the core problem that drake and other pipeline tools are trying to solve. Every time a memoized function is called with a new set of arguments, the return value is saved for future use. Later, whenever the same function is called with the same arguments, the previous return value is salvaged, and the function call is skipped to save time. The memoise package is the primary implementation of memoization in R.
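
To make the mechanism concrete, here is a minimal sketch using the memoise package (function names are illustrative):

library(memoise)
slow_square <- function(x) {
  Sys.sleep(1) # simulate an expensive computation
  x^2
}
fast_square <- memoise(slow_square)
fast_square(4) # first call: runs the function (about 1 second)
fast_square(4) # same arguments: returns the cached value instantly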

Memoization saves time for small projects, but it arguably does not go far enough for large reproducible pipelines. In reality, the return value of a function depends not only on the function body and the arguments, but also on any nested functions and global variables, the dependencies of those dependencies, and so on upstream. drake tracks this deeper context, while memoise does not.

Literate programming

Literate programming is the practice of narrating code in plain vernacular. The goal is to communicate the research process clearly, transparently, and reproducibly. Whereas commented code is still mostly code, literate knitr / R Markdown reports can become websites, presentation slides, lecture notes, serious scientific manuscripts, and even books.

knitr and R Markdown

drake and knitr are symbiotic. drake’s job is to manage large computation and orchestrate the demanding tasks of a complex data analysis pipeline. knitr’s job is to communicate those expensive results after drake computes them. knitr / R Markdown reports are small pieces of an overarching drake pipeline. They should focus on communication, and they should do as little computation as possible.

To insert a knitr report in a drake pipeline, use the knitr_in() function inside your drake plan, and use loadd() and readd() to refer to targets in the report itself. See an example here.

Version control

drake is not a version control tool. However, it is fully compatible with git, svn, and similar software. In fact, it is good practice to use git alongside drake for reproducible workflows.

However, data poses a challenge. The datasets created by make() can get large and numerous, and it is not recommended to put the .drake/ cache or the .drake_history/ logs under version control. Instead, it is recommended to use a data storage solution such as Dropbox or OSF.

Containerization and R package environments

drake does not track R packages or system dependencies for changes. Instead, it defers to tools like Docker, Singularity, renv, and packrat, which create self-contained portable environments to reproducibly isolate and ship data analysis projects. drake is fully compatible with these tools.

workflowr

The workflowr package is a project manager that focuses on literate programming, sharing over the web, file organization, and version control. Its brand of reproducibility is all about transparency, communication, and discoverability. For an example of workflowr and drake working together, see this machine learning project by Patrick Schratz.

Citation

citation("drake")
#> 
#> To cite drake in publications use:
#> 
#>   William Michael Landau, (2018). The drake R package: a pipeline
#>   toolkit for reproducibility and high-performance computing. Journal
#>   of Open Source Software, 3(21), 550,
#>   https://doi.org/10.21105/joss.00550
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {The drake R package: a pipeline toolkit for reproducibility and high-performance computing},
#>     author = {William Michael Landau},
#>     journal = {Journal of Open Source Software},
#>     year = {2018},
#>     volume = {3},
#>     number = {21},
#>     url = {https://doi.org/10.21105/joss.00550},
#>   }

Acknowledgements

Special thanks to Jarad Niemi, my advisor from graduate school, for first introducing me to the idea of Makefiles for research. He originally set me down the path that led to drake.

Many thanks to Julia Lowndes, Ben Marwick, and Peter Slaughter for reviewing drake for rOpenSci, and to Maëlle Salmon for such active involvement as the editor. Thanks also to the following people for contributing early in development.

Credit for images is attributed here.


Download Details:

Author: Ropensci
Source Code: https://github.com/ropensci/drake 
License: GPL-3.0 license

#r #workflow #datascience 

Drake: An R-focused Pipeline toolkit for Reproducibility
Monty  Boehm

Monty Boehm

1658607480

PredictMD.jl: Uniform interface for Machine Learning In Julia

PredictMD - Uniform interface for machine learning in Julia 

PredictMD is a free and open-source Julia package that provides a uniform interface for machine learning.

PredictMD makes it easy to automate machine learning workflows and create reproducible machine learning pipelines.

It is the official machine learning framework of the Brown Center for Biomedical Informatics.

Installation

PredictMD is registered in the Julia General registry. Therefore, to install PredictMD, simply open Julia and run the following four lines:

import Pkg
Pkg.activate("PredictMDEnvironment"; shared = true)
Pkg.add("PredictMDFull")
import PredictMDFull

Run the test suite after installing

After you install PredictMD, you should run the test suite to make sure that everything is working. You can run the test suite by running the following five lines in Julia:

import Pkg
Pkg.activate("PredictMDEnvironment"; shared = true)
Pkg.test("PredictMDExtra")
Pkg.test("PredictMDFull")
Pkg.test("PredictMD")

Citing

If you use PredictMD in research, please cite the software using the following DOI:

Docker image

Alternatively, you can use the PredictMD Docker image for easy installation. Download and start the container by running the following line:

docker run --name predictmd -it dilumaluthge/predictmd /bin/bash

Once you are inside the container, you can start Julia by running the following line:

julia

In Julia, run the following line to load PredictMD:

import PredictMDFull

You can run the test suite by running the following four lines in Julia:

import Pkg
Pkg.test("PredictMDExtra")
Pkg.test("PredictMDFull")
Pkg.test("PredictMD")

After you have exited the container, you can return to it by running the following line:

docker start -ai predictmd

To delete your container, run the following line:

docker container rm -f predictmd

To also delete the downloaded image, run the following line:

docker image rm -f dilumaluthge/predictmd

Documentation

The PredictMD documentation contains useful information, including instructions for use, example code, and a description of PredictMD's internals.

Related Repositories

  • BCBIRegistry - Julia package registry for the Brown Center for Biomedical Informatics (BCBI)
  • ClassImbalance.jl - Sampling-based methods for correcting for class imbalance in two-category classification problems
  • OfflineRegistry - Generate a custom Julia package registry, mirror, and depot for use on workstations without internet access
  • PredictMD-docker - Docker and Singularity images for PredictMD
  • PredictMD-roadmap - Roadmap for the PredictMD machine learning pipeline
  • PredictMD.jl - Uniform interface for machine learning in Julia
  • PredictMDAPI.jl - Provides the abstract types and generic functions that define the PredictMD application programming interface (API)
  • PredictMDExtra.jl - Install all of the dependencies of PredictMD (but not PredictMD itself)
  • PredictMDFull.jl - Install PredictMD and all of its dependencies
  • PredictMDSanitizer.jl - Remove potentially sensitive data from trained machine learning models

Contributing

If you would like to contribute to the PredictMD source code, please read the instructions in CONTRIBUTING.md.

Acknowledgements

  • This work was supported in part by National Institutes of Health grants U54GM115677, R01LM011963, and R25MH116440. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
  • PredictMD was created by Dilum P. Aluthge and Ishan Sinha.

Author: bcbi
Source Code: https://github.com/bcbi/PredictMD.jl 
License: MIT license

#julia #machinelearning #workflow 

PredictMD.jl: Uniform interface for Machine Learning In Julia
Nigel  Uys

Nigel Uys

1654866660

µTask, The Lightweight Automation Engine

µTask, the Lightweight Automation Engine      

µTask is an automation engine built for the cloud. It is:

  • simple to operate: only a postgres DB is required
  • secure: all data is encrypted, only visible to authorized users
  • extensible: you can develop custom actions in golang

µTask allows you to model business processes in a declarative yaml format. Describe a set of inputs and a graph of actions and their inter-dependencies: µTask will asynchronously handle the execution of each action, working its way around transient errors and keeping an encrypted, auditable trace of all intermediary states until completion.

Real-world examples

Here are a few real-world examples that can be implemented with µTask:

Kubernetes ingress TLS certificate provisioning

A new ingress is created on the production kubernetes cluster. A hook triggers a µTask template that:

  • generates a private key
  • requests a new certificate
  • meets the certificate issuer's challenges
  • commits the resulting certificate back to the cluster

New team member bootstrap

A new member joins the team. The team leader starts a task specifying the new member's name, that:

  • asks the new team member to generate an SSH key pair and copy the public key in a µTask-generated form
  • registers the public SSH key centrally
  • creates accounts on internal services (code repository, CI/CD, internal PaaS, ...) for the new team member
  • triggers another task to spawn a development VM
  • sends a welcome email full of GIFs

Payments API asynchronous processing

The payments API receives a request that requires an asynchronous antifraud check. It spawns a task on its companion µTask instance that:

  • calls a first risk-assessing API which returns a number
  • if the risk is low, the task succeeds immediately
  • otherwise it calls a SaaS antifraud solution API which returns a score
  • if the score is good, the task succeeds
  • if the score is very bad, the task fails
  • if it is in between, it triggers a human investigation step where an operator can enter a score in a µTask-generated form
  • when it is done, the task sends an event to the payments API to notify of the result

The payments API keeps a reference to the running workflow via its task ID. Operators of the payments API can follow the state of current tasks by requesting the µTask instance directly. Depending on the payments API implementation, it may allow its callers to follow a task's state.

Quick start

Running with docker-compose

Download our latest install script, setup your environment and launch your own local instance of µTask.

mkdir utask && cd utask
wget https://github.com/ovh/utask/releases/latest/download/install-utask.sh
sh install-utask.sh
docker-compose up

All the configuration for the application is found in the environment variables in docker-compose.yaml. You'll see that basic auth is set up for user admin with password 1234. Try logging in with this user on the graphical dashboard: http://localhost:8081/ui/dashboard.

You can also explore the API schema: http://localhost:8081/unsecured/spec.json.

From the dashboard you can request a new task, get an overview of all tasks, see a detailed view of a running task, and browse the available task templates.

Running with your own postgres service

Alternatively, you can clone this repository and build the µTask binary:

make all

Operating in production

The folder you created in the previous step is meant to become a git repo where you version your own task templates and plugins. Re-download and run the latest install script to bump your version of µTask.

You'll deploy your version of µTask by building a docker image based on the official µTask image, which will include your extensions. See the Dockerfile generated during installation.

Architecture

µTask is designed to run a task scheduler and perform the task workloads within a single runtime: work is not delegated to external agents. Multiple instances of the application will coordinate around a single postgres database: each will be able to determine independently which tasks are available. When an instance of µTask decides to execute a task, it will take hold of that task to avoid collisions, then release it at the end of an execution cycle.

A task will keep running as long as its steps are successfully executed. If a task's execution is interrupted before completion, it will become available to be re-collected by one of the active instances of µTask. That means that execution might start in one instance and resume on a different one.

Maintenance procedures

Key rotation

  1. Generate a new key with symmecrypt, with the 'storage' label.
  2. Add it to your configuration items. The library will take all keys into account and use the latest possible key, falling back to older keys when finding older data.
  3. Set your API in maintenance mode (env var or command line arg, see config below): all write actions will be refused when you reboot the API.
  4. Reboot API.
  5. Make a POST request on the /key-rotate endpoint of the API.
  6. All data will be encrypted with the latest key; you can then delete older keys.
  7. De-activate maintenance mode.
  8. Reboot API.

Dependencies

The only dependency for µTask is a Postgres database server, version 9.5 or above.

Configuration 🔨

Command line args

The µTask binary accepts the following arguments, as command-line args or environment variables. All are optional and have a default value:

  • init-path: the directory from where initialization plugins (see "Developing plugins") are loaded in *.so form (default: ./init)
  • plugins-path: the directory from where action plugins (see "Developing plugins") are loaded in *.so form (default: ./plugins)
  • templates-path: the directory where yaml-formatted task templates are loaded from (default: ./templates)
  • functions-path: the directory where yaml-formatted functions templates are loaded from (default: ./functions)
  • region: an arbitrary identifier, to aggregate a running group of µTask instances (commonly containers), and differentiate them from another group, in a separate region (default: default)
  • http-port: the port on which the HTTP API listens (default: 8081)
  • debug: a boolean flag to activate verbose logs (default: false)
  • maintenance-mode: a boolean to switch API to maintenance mode (default: false)

Config keys and files

Checkout the µTask config keys and files README.

Authentication

The vanilla version of µTask doesn't handle authentication by itself; it is meant to be placed behind a reverse proxy that provides a username through the "x-remote-user" http header. A username found there will be trusted as-is, and used for authorization purposes (admin actions, task resolution, etc.).

For development purposes, an optional basic-auth configstore item can be provided to define a mapping of usernames and passwords. This is not meant for use in production.

Extending this basic authentication mechanism is possible by developing an "init" plugin, as described below.

Notification

Every task state change can be notified to a notification backend. µTask implements three different notification backends: Slack, TaT, and generic webhooks.

The default payloads sent to generic webhooks are:

task_state_update notifications:

{
    "message": "string",
    "notification_type": "task_state_update",
    "task_id": "public_task_uuid",
    "title": "task title string",
    "state": "current task state",
    "template": "template_name",
    "requester": "optional",
    "resolver": "optional",
    "steps": "14/20",
    "potential_resolvers": "user1,user2,admin",
    "resolution_id": "optional,public_resolution_uuid",
    "tags": "{\"tag1\":\"value1\"}"
}

task_step_update notifications:

{
    "message": "string",
    "notification_type": "task_step_update",
    "task_id": "public_task_uuid",
    "title": "task title string",
    "state": "current task state",
    "template": "template_name",
    "step_name": "string",
    "step_state": "string",
    "requester": "string",
    "resolver": "string",
    "steps": "14/20",
    "resolution_id": "public_resolution_uuid",
    "tags": "{\"tag1\":\"value1\"}"
}

task_validation notifications:

{
    "message": "string",
    "notification_type": "task_validation",
    "task_id": "public_task_uuid",
    "title": "task title string",
    "state": "TODO",
    "template": "template_name",
    "requester": "optional",
    "potential_resolvers": "user1,user2,admin",
    "tags": "{\"tag1\":\"value1\"}"
}

Notification backends can be configured in the global µTask configuration, as described here.

Authoring Task Templates

Checkout the µTask examples directory.

A process that can be executed by µTask is modelled as a task template: it is written in yaml format and describes a sequence of steps, their interdependencies, and additional conditions and constraints to control the flow of execution.

The user that creates a task is called requester, and the user that executes it is called resolver. Both can be the same user in some scenarios.

A user can be allowed to resolve a task in three ways:

  • the user is included in the global configuration's list of admin_usernames
  • the user is included in the task's template list of allowed_resolver_usernames
  • the user is included in the task resolver_usernames list
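
As a minimal illustration, here is a sketch of a complete task template (the fields are documented in the sections below; the step uses the builtin echo action, and all names are illustrative):

name: hello-world
description: Greet a user
title_format: Say hello to {{.input.name}}
inputs:
- name: name
  description: Name of the person to greet
steps:
  sayHello:
    description: Print a greeting built from the requester's input
    action:
      type: echo
      configuration:
        output:
          greeting: Hello {{.input.name}}!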

Value Templating

µTask uses the go templating engine in order to introduce dynamic values during a task's execution. As you'll see in the example template below, template handles can be used to access values from different sources. Here's a summary of how you can access values through template handles (a short example follows the list):

  • .input.[INPUT_NAME]: the value of an input provided by the task's requester
  • .resolver_input.[INPUT_NAME]: the value of an input provided by the task's resolver
  • .step.[STEP_NAME].output.foo: field foo from the output of a named step
  • .step.[STEP_NAME].metadata.HTTPStatus: field HTTPStatus from the metadata of a named step
  • .step.[STEP_NAME].children: the collection of results from a 'foreach' step
  • .step.[STEP_NAME].error: error message from a failed step
  • .step.[STEP_NAME].state: current state of the given step
  • .config.[CONFIG_ITEM].bar: field bar from a config item (configstore, see above)
  • .iterator.foo: field foo from the iterator in a loop (see foreach steps below)
  • .pre_hook.output.foo: field foo from the output of the step's preHook (see preHooks below)
  • .pre_hook.metadata.HTTPStatus: field HTTPStatus from the metadata of the step's preHook (see preHooks below)
  • .function_args.[ARG_NAME]: argument that needs to be given in the configuration section of the action when calling the function (see functions below)
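
For instance, a single step can combine several of these handles. A short sketch, assuming an input named id and a previous step named getUser:

steps:
  report:
    description: Summarize the result of a previous step
    dependencies: [getUser]
    action:
      type: echo
      configuration:
        output:
          summary: 'User {{.input.id}} fetched with HTTP status {{.step.getUser.metadata.HTTPStatus}}'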

The following templating functions are available:

  • Golang: builtin functions from Go's text/template package (see its documentation)
  • Sprig: extended set of functions from the Sprig project (see its documentation)
  • field: equivalent to the dot notation, for entries with forbidden characters. Example: {{field `config` `foo.bar`}}
  • fieldFrom: equivalent to the dot notation, for entries with forbidden characters; it takes the previous template expression as the source for the templating values. Example: {{ `{"foo.foo":"bar"}` | fromJson | fieldFrom `foo.foo` }}
  • eval: evaluates the value of a template variable. Example: {{eval `var1`}}
  • evalCache: evaluates the value of a template variable, and caches it for future usage (to avoid further computation). Example: {{evalCache `var1`}}
  • fromJson: decodes a JSON document into a structure; if the input cannot be decoded as JSON, the function returns an empty string. Example: {{fromJson `{"a":"b"}`}}
  • mustFromJson: similar to fromJson, but returns an error if the JSON is invalid. A common use case is returning a JSON-stringified data structure from a JavaScript expression (object, array) and using one of its members in the template, e.g. {{(eval `myExpression` | fromJson).myArr}} or {{(eval `myExpression` | fromJson).myObj}}. Example: {{mustFromJson `{"a":"b"}`}}

Basic properties

  • name: a short unique human-readable identifier
  • description: sentence-long description of intent
  • long_description: paragraph-long basic documentation
  • doc_link: URL for external documentation about the task
  • title_format: templateable text, generates a title for a task based on this template
  • result_format: templateable map, used to generate a final result object from data collected during execution

Advanced properties

  • allowed_resolver_usernames: a list of usernames with the right to resolve a task based on this template
  • allow_all_resolver_usernames: boolean (default: false): when true, any user can execute a task based on this template
  • auto_runnable: boolean (default: false): when true, the task will be executed directly after being created, if the requester is an accepted resolver or allow_all_resolver_usernames is true
  • blocked: boolean (default: false): no tasks can be created from this template
  • hidden: boolean (default: false): the template is not listed on the API, it is concealed to regular users
  • retry_max: int (default: 100): maximum amount of consecutive executions of a task based on this template, before being blocked for manual review

Inputs

When creating a new task, a requester needs to provide parameters described as a list of objects under the inputs property of a template. Additional parameters can be requested from a task's resolver user: those are represented under the resolver_inputs property of a template.

An input's definition allows defining validation constraints on the values provided for that input. See the example template above.

Input properties

  • name: unique name, used to access the value provided by the task's requester
  • description: human readable description of the input, meant to give context to the task's requester
  • regex: (optional) a regular expression that the provided value must match
  • legal_values: (optional) a list of possible values accepted for this input
  • collection: boolean (default: false) a list of values is accepted, instead of a single value
  • type: (string|number|bool) (default: string) the type of data accepted
  • optional: boolean (default: false) the input can be left empty
  • default: (optional) a value assigned to the input if left empty
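
Putting these properties together, an inputs section might look like the following sketch (names and values are illustrative):

inputs:
- name: environment
  description: Target deployment environment
  legal_values: [staging, production]
- name: retries
  description: Number of retry attempts
  type: number
  optional: true
  default: 3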

Variables

A template variable is a named holder of either:

  • a fixed value
  • a JavaScript expression evaluated on the fly.

See the example template above to see variables in action. The expression in a variable can contain template handles to introduce values dynamically (from executed steps, for instance), like a step's configuration.

The JavaScript evaluation is done using otto.
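
As a sketch of the two kinds of variables (the value/expression field names are an assumption based on the µTask examples directory; values are retrieved elsewhere with {{eval `greeting`}}):

variables:
- name: greeting
  value: hello {{.input.name}}
- name: upperName
  expression: >-
    '{{.input.name}}'.toUpperCase();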

Steps

A step is the smallest unit of work that can be performed within a task. At its heart, a step defines an action: several types of actions are available, and each type requires a different configuration, provided as part of the step definition. The state of a step will change during a task's resolution process, and determines which steps become eligible for execution. Custom states can be defined for a step, to fine-tune execution flow (see below).

A sequence of ordered steps constitutes the entire workload of a task. Steps are ordered by declaring dependencies between each other. A step declares its dependencies as a list of step names on which it waits, meaning that a step's execution will be on hold until its dependencies have been resolved. More details about dependencies.

The flow of this sequence can further be controlled with conditions on the steps: a condition is a clause that can be run before or after the step's action. A condition can either be used:

  • to skip a step altogether
  • to analyze its outcome and override the engine's default behaviour

Several conditions can be specified; the first one to evaluate as true is applied. A condition is composed of:

  • a type (skip or check)
  • a list of if assertions (value, operator, expected) which all have to be true (AND on the collection),
  • a then object to impact the state of steps (this refers to the current step)
  • an optional message to convey the intention of the condition, making it easier to inspect tasks

Here's an example of a skip condition. The value of an input is evaluated to determine the result: if the value of runType is dry, the createUser step will not be executed, its state will be set directly to DONE.

inputs:
- name: runType
  description: Run this task with/without side effects
  legal_values: [dry, wet]
steps:
  createUser:
    description: Create new user
    action:
      ... etc...
    conditions:
    - type: skip
      if:
      - value: '{{.input.runType}}'
        operator: EQ
        expected: dry
      then:
        this: DONE
      message: Dry run, skip user creation

Here's an example of a check condition. Here the return of an http call is inspected: a 404 status will put the step in a custom NOT_FOUND state. The default behavior would be to consider any 4xx status as a client error, which blocks execution of the task. The check condition allows you to consider this situation as normal, and proceed with other steps that take the NOT_FOUND state into account (creating the missing resource, for instance).

steps:
  getUser:
    description: Get user
    custom_states: [NOT_FOUND]
    action:
      type: http
      configuration:
        url: http://example.org/user/{{.input.id}}
        method: GET
    conditions:
    - type: check
      if:
      - value: '{{.step.getUser.metadata.HTTPStatus}}'
        operator: EQ
        expected: '404'
      then:
        this: NOT_FOUND
      message: User {{.input.id}} not found
  createUser:
    description: Create the user
    dependencies: ["getUser:NOT_FOUND"]
    action:
      type: http
      configuration:
        url: http://example.org/user
        method: POST
        body: |-
          {"user_id":"{{.input.id}}"}

Condition Operators

A condition can use one of the following operators:

  • EQ: equal
  • NE: not equal
  • GT: greater than
  • LT: less than
  • GE: greater or equal
  • LE: less than or equal
  • REGEXP: match a regexp
  • NOTREGEXP: doesn't match a regexp
  • IN: found in a list of values
  • NOTIN: not found in a list of values

Note that the operators IN and NOTIN expect a list of acceptable values in the field value, instead of a single one. You can specify the separator character to use to split the values of the list using the field list_separator (default: ,). Each value of the list will be trimmed of its leading and trailing white spaces before comparison.

Basic Step Properties

  • name: a unique identifier
  • description: a human readable sentence to convey the step's intent
  • action: the actual task the step executes
  • pre_hook: an action that can be executed before the actual action of the step
  • dependencies: a list of step names on which this step waits before running
  • custom_states: a list of custom allowed states for this step (a condition can assign one of these states to the step)
  • retry_pattern: (seconds, minutes, hours) defines the temporal order of magnitude over which re-runs of this step are spread (default: seconds)
  • resources: a list of resources that will be used during the step execution, to control and limit the concurrent execution of the step (more information in the resources section).

Action

The action field of a step defines the actual workload to be performed. It consists of at least a type chosen among the registered action plugins, and a configuration fitting that plugin. See below for a detailed description of builtin plugins. For information on how to develop your own action plugins, refer to this section.

When an action's configuration is repeated across several steps, it can be factored by defining base_configurations at the root of the template. For example:

base_configurations:
  postMessage:
    method: POST
    url: http://message.board/new

This base configuration can then be leveraged by any step wanting to post a message, with different bodies:

steps:
  sayHello:
    description: Say hello on the message board
    action:
      type: http
      base_configuration: postMessage
      configuration:
        body: Hello
  sayGoodbye:
    description: Say goodbye on the message board
    dependencies: [sayHello]
    action:
      type: http
      base_configuration: postMessage
      configuration:
        body: Goodbye

These two step definitions are the equivalent of:

steps:
  sayHello:
    description: Say hello on the message board
    action:
      type: http
      configuration:
        body: Hello
        method: POST
        url: http://message.board/new
  sayGoodbye:
    description: Say goodbye on the message board
    dependencies: [sayHello]
    action:
      type: http
      configuration:
        body: Goodbye
        method: POST
        url: http://message.board/new

The output of an action can be enriched by means of an output property. For example, consider a template with an input field named id, value 1234, and a call to a service which returns the following payload:

{
  "name": "username"
}

The following action definition:

steps:
  getUser:
    description: Prefix an ID received as input, return both
    action:
      type: http
      output:
        strategy: merge
        format:
          id: "{{.input.id}}"
      configuration:
        method: GET
        url: http://directory/user/{{.input.id}}

Will render the following output, a combination of the action's raw output and the output property:

{
  "id": "1234",
  "name": "username"
}

All the strategies available are:

  • merge: data in format must be a dict and will be merged with the output of the action (as in the example above)
  • template: the action will return exactly the data in format that can be templated (see Value Templating)

Builtin actions

Browse builtin actions

  • echo: print out a pre-determined result
  • http: make an http request
  • subtask: spawn a new task on µTask
  • notify: dispatch a notification over a registered channel
  • apiovh: make a signed call on OVH's public API (requires credentials retrieved from configstore, containing the fields endpoint, appKey, appSecret, consumerKey; more info here)
  • ssh: connect to a remote system and run commands on it
  • email: send an email
  • ping: send a ping to a hostname (warning: this plugin will keep running until the count is done)
  • script: execute a script under the scripts folder

PreHooks

The pre_hook field of a step can be set to define an action that is executed before the step's action. This field supports all the same fields as the action. It aims to fetch data for the execution of the action that can change over time and needs to be fetched at every retry, such as OTPs. All the result values of the preHook are available under the templating variable .pre_hook.

doSomeAuthPost:
  pre_hook:
    type: http
    configuration:
      method: "GET"
      url: "https://example.org/otp"
  action:
    type: http
    configuration:
      method: "POST"
      url: "https://example.org/doSomePost"
      headers:
        X-Otp: "{{ .pre_hook.output }}"

Functions

Functions are abstractions over actions, defining behavior that can be re-used in templates. They act like plugins but are fully declared in the dedicated functions directory. They can take arguments, which must be given in the configuration section of the action and can be accessed in the function declaration via the templating variables under .function_args.

name: ovh::request
description: Execute a call to the ovh API
pre_hook:
  type: http
  configuration:
    method: "GET"
    url: https://api.ovh.com/1.0/auth/time
action:
  type: http
  configuration:
    headers:
    - name: X-Ovh-Signature
      value: '{{ printf "%s+%s+%s+%s%s+%s+%v" .config.apiovh.applicationSecret .config.apiovh.consumerKey .function_args.method .config.apiovh.basePath .function_args.path .function_args.body .pre_hook.output | sha1sum | printf "$1$%s"}}'
    - name: X-Ovh-Timestamp
      value: "{{ .pre_hook.output }}"
    - name: X-Ovh-Consumer
      value: "{{ .config.apiovh.consumerKey }}"
    - name: X-Ovh-Application
      value: "{{ .config.apiovh.applicationKey }}"
    method: "{{ .function_args.method }}"
    url: "{{.config.apiovh.basePath}}{{ .function_args.path }}"
    body: "{{ .function_args.body }}"

This function can be used in a template like this:

steps:
  getService:
    description: Get Service
    action:
      type: ovh::request
      configuration:
        path: "{{.input.path}}"
        method: GET
        body: ""

Dependencies

Dependencies can be declared on a step, to indicate what requirements should be met before the step can actually run. A step can have multiple dependencies, which will all have to be met before the step can start running.

A dependency can be qualified with a step's state (stepX:stateY, it depends on stepX, finishing in stateY). If omitted, then DONE is assumed.

There are two different kinds of states: builtin and custom. Builtin states are provided by uTask and include: TODO, RUNNING, DONE, CLIENT_ERROR, SERVER_ERROR, FATAL_ERROR, CRASHED, PRUNE, TO_RETRY, AFTERRUN_ERROR. Additionally, a step can define custom states via its custom_states field. These custom states provide a way for the step to express that it ran successfully, but the result may be different from the normal expected case (e.g. a custom state NOT_FOUND would let the rest of the workflow proceed, but may trigger additional provisioning steps).

A dependency (stepX:stateY) can be on any of stepX's custom states, along with DONE (builtin). These are all considered final (uTask will not touch that step anymore, it has been run to completion). Conversely, other builtin states (CLIENT_ERROR, ...) may not be used in a dependency, since those imply a transient state and the uTask engine still has work to do on these.

If you wish to declare a dependency on something normally considered as a CLIENT_ERROR (e.g. GET HTTP returns a 404), you can write a check condition to inspect your step result, and change it to a custom state instead (meaning an alternative termination, see the NOT_FOUND example)

It is possible that a dependency will never match the expected state. For example, step1 is in DONE state, and step2 has a dependency declared as step1:NOT_FOUND: it means that step2 requires that step1 finishes its execution with state NOT_FOUND. In that case, step2 will never be allowed to run, as step1 finished with state DONE. To remedy this, uTask will remove step2 from the workflow by setting its state to the special state PRUNE. Any further step depending on step2 will also be pruned, removing entire alternative execution branches. This allows crossroads patterns, where a step may be followed by two mutually exclusive branches (one for DONE, one for ALTERNATE_STATE_XXX). (Note: PRUNE may also be used in conditions to manually eliminate entire branches of execution)

A special qualifier that can be used as a dependency state is ANY (stepX:ANY). ANY matches all custom states and DONE, and it also does not get PRUNE'd recursively if stepX is set to PRUNE. This is used mostly for sequencing, either when the actual result of the step does not matter, but its timing does; or to reconcile mutually exclusive branches in a diamond pattern (using e.g. the coalesce templating function to mix optional step results).

For example, step2 can declare a dependency on step1 in the following ways:

  • step1: wait for step1 to be in state DONE (could also be written as step1:DONE)
  • step1:DONE,ALREADY_EXISTS: wait for step1 to be either in state DONE or ALREADY_EXISTS
  • step1:ANY: wait for step1 to be in any "final" state, ie. it cannot keep running
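
Expressed in a template, the second form looks like this sketch (step names are illustrative; ALREADY_EXISTS must be declared in step1's custom_states):

steps:
  step1:
    description: Create a resource, or detect that it already exists
    custom_states: [ALREADY_EXISTS]
    action:
      ... etc...
  step2:
    description: Runs once step1 ends in DONE or ALREADY_EXISTS
    dependencies: ["step1:DONE,ALREADY_EXISTS"]
    action:
      ... etc...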

Loops

A step can be configured to take a json-formatted collection as input, in its foreach property. It will be executed once for each element in the collection, and its result will be a collection of each iteration. This scheme makes it possible to chain several steps with the foreach property.

For the following step definition (note the JSON format of foreach):

steps:
  prefixStrings:
    description: Process a collection of strings, adding a prefix
    foreach: '[{"id":"a"},{"id":"b"},{"id":"c"}]'
    action:
      type: echo
      configuration:
        output:
          prefixed: pre-{{.iterator.id}}

The following output can be expected to be accessible at {{.step.prefixStrings.children}}:

[
    {
        "output": {
            "prefixed": "pre-a"
        },
        "metadata": {},
        "state": "DONE"
    },
    {
        "output": {
            "prefixed": "pre-b"
        },
        "metadata": {},
        "state": "DONE"
    },
    {
        "output": {
            "prefixed": "pre-c"
        },
        "metadata": {},
        "state": "DONE"
    }
]

It contains all the output, metadata and state of the different iterations, coming from the foreach loop.

This output can then be passed to another step in json format:

foreach: '{{.step.prefixStrings.children | toJson}}'

It's possible to configure the strategy used to run the elements. The default strategy is parallel: all elements run in parallel to maximize throughput. The sequence strategy runs each element only once the previous one is done, to enforce ordering between elements. It can be declared in the template as follows:

foreach_strategy: "sequence"

Resources

Resources are a way to restrict the concurrency factor of certain operations, to control throughput and avoid dangerous behavior, e.g. flooding the targets.

High level view:

  • For each action to execute, a list of target resources is determined (see below).
  • In the µTask configuration, numerical limits can be set for each resource label. This acts as a semaphore, allowing a certain number of concurrent slots for the given resource label. If no limit is set for a resource label, the previously mentioned target resources have no effect. Limits are declared in the resource_limits property.

The target resources for a step can be defined in its YAML definition, using the resources property.

steps:
  foobar:
    description: A dummy step, that should not execute in parallel
    resources: ["myLimitedResource"]
    action:
      type: echo
      configuration:
        output:
          foobar: fuzz

Alternatively, some target resources are determined automatically by µTask Engine:

  • When a task is run, the resource template:my-template-name is used automatically.
  • When a step is run, the plugin in charge of the execution automatically generates a list of resources. This includes generic resources such as socket, url:www.example.org, fork... allowing the µTask administrator to set up generic limits such as "socket": 48 or "url:www.example.org": 1.

Each builtin plugin declares resources, which can be discovered in the plugin's README (see the http plugin for an example).

Declared resource_limits must be positive integers. When a step is executed and the number of concurrent executions has been reached, the µTask Engine waits for a slot to be released. If a resource limit is set to 0, the step is not executed and is moved to the TO_RETRY state; it will run once the instance allows the execution of its resources again. By default, the µTask Engine waits up to 1 minute for a resource to become available; this can be configured using the resource_acquire_timeout property.
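
For instance, the relevant fragment of the µTask configuration could look like the following sketch (a minimal example assuming a JSON configuration; the limit values and the timeout are illustrative):

{
    "resource_limits": {
        "myLimitedResource": 1,
        "socket": 48,
        "url:www.example.org": 1
    },
    "resource_acquire_timeout": "2m"
}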

Task templates validation

JSON-schema files are available to validate the syntax of task templates and functions: hack/template-schema.json and hack/function-schema.json.

Validation can be performed at writing time if you are using a modern IDE or editor.

Validation with Visual Studio Code

  • Install YAML extension from RedHat.
    • Ctrl+P, then type ext install redhat.vscode-yaml
  • Edit your workspace configuration (settings.json file) to add:
{
  "yaml.schemas": {
      ".vscode/template-schema.json": [
          "/templates*/*.yaml"
      ],
      ".vscode/function-schema.json": [
          "/functions/*.yaml"
      ]
  }
}
  • Every template will be validated in real time while editing.

Task template snippets with Visual Studio Code

Code snippets are available in this repository to be used for task template editing: hack/templates.code-snippets

To use them inside your repository, copy the templates.code-snippets file into your .vscode workspace folder.
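
For example, from the root of your own repository (assuming the utask sources are checked out at ../utask):

$ cp ../utask/hack/templates.code-snippets .vscode/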

Available snippets:

  • template
  • variable
  • input
  • step

(Demo: vscode_code_snippets_templates.gif)

Extending µTask with plugins

µTask is extensible with golang plugins compiled in *.so format. Two kinds of plugins exist:

  • action plugins, that you can re-use in your task templates to implement steps
  • init plugins, a way to customize the authentication mechanism of the API, and to draw data from different providers of the configstore library

The installation script for utask creates a folder structure that will automatically package and build your code in a docker image, with your plugins ready to be loaded by the main binary at boot time. Create a separate folder for each of your plugins, within either the plugins or the init folders.

Action Plugins

Action plugins allow you to extend the kind of work that can be performed during a task. An action plugin has a name, which will be referred to as the action type in a template. It declares a configuration structure, a validation function for the data received from the template as configuration, and an execution function which performs an action based on a valid configuration.

Create a new folder within the plugins folder of your utask repo. There, develop a main package that exposes a Plugin variable implementing the TaskPlugin interface defined in the plugins package:

type TaskPlugin interface {
    ValidConfig(baseConfig json.RawMessage, config json.RawMessage) error
    Exec(stepName string, baseConfig json.RawMessage, config json.RawMessage, ctx interface{}) (interface{}, interface{}, error)
    Context(stepName string) interface{}
    PluginName() string
    PluginVersion() string
    MetadataSchema() json.RawMessage
}

The taskplugin package provides helper functions to build a Plugin:

package main

import (
    "errors"

    "github.com/ovh/utask/pkg/plugins/taskplugin"
)

var (
    Plugin = taskplugin.New("my-plugin", "v0.1", exec,
        taskplugin.WithConfig(validConfig, Config{}))
)

// Config holds the step configuration received from the template.
// The Message field is illustrative: declare whatever your plugin needs.
type Config struct {
    Message string `json:"message"`
}

func validConfig(config interface{}) error {
    cfg := config.(*Config)
    if cfg.Message == "" {
        return errors.New("missing configuration field: message")
    }
    return nil
}

func exec(stepName string, config interface{}, ctx interface{}) (output interface{}, metadata interface{}, err error) {
    cfg := config.(*Config)
    // Return plain maps for output and metadata (see the warning below).
    return map[string]interface{}{"echo": cfg.Message}, map[string]interface{}{}, nil
}

The Exec function returns 3 values:

  • output: an object representing the output of the plugin, that will be usable as {{.step.xxx.output}} in the templating engine.
  • metadata: an object representing the metadata of the plugin, that will be usable as {{.step.xxx.metadata}} in the templating engine.
  • err: an error if the execution of the plugin failed. µTask relies on the github.com/juju/errors package to determine whether the returned error is a CLIENT_ERROR or a SERVER_ERROR.
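
A minimal sketch of the distinction, assuming that errors recognized as "bad request" by github.com/juju/errors are classified as CLIENT_ERROR and anything else as SERVER_ERROR (the classify helper and its messages are hypothetical):

package main

import (
    "github.com/juju/errors"
)

// classify returns errors with different semantics for µTask.
func classify(target string) error {
    if target == "" {
        // the caller's input is wrong: a "bad request" error
        return errors.BadRequestf("missing target")
    }
    // anything else: a generic error, treated as a server-side failure
    return errors.Errorf("upstream unavailable for %q", target)
}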

Warning: output and metadata should not be named structures but plain maps. Otherwise, you might encounter inconsistencies in templating, as keys can differ before and after marshalling in the database.

Init Plugins

Init plugins allow you to customize your instance of µTask by giving you access to its underlying configuration store and its API server.

Create a new folder within the init folder of your utask repo. There, develop a main package that exposes a Plugin variable implementing the InitializerPlugin interface defined in the plugins package:

type Service struct {
    Store  *configstore.Store
    Server *api.Server
}

type InitializerPlugin interface {
    Init(service *Service) error // access configstore and server to customize µTask
    Description() string         // describe what the initialization plugin does
}

As of version v1.0.0, this is meant to give you access to two features, sketched in the example below:

  • service.Store exposes the RegisterProvider(name string, f configstore.Provider) method, which allows you to plug in different data sources for your configuration that are not available by default in the main runtime
  • service.Server exposes the WithAuth(authProvider func(*http.Request) (string, error)) method, through which you can provide a custom source of authentication and authorization based on the incoming http requests
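
A minimal sketch of an init plugin exercising both features, assuming the Service and InitializerPlugin types shown above live in github.com/ovh/utask/pkg/plugins and that configstore.Provider has the signature func() (configstore.ItemList, error); the provider name, header name and logic are hypothetical:

package main

import (
    "net/http"

    "github.com/ovh/configstore"
    "github.com/ovh/utask/pkg/plugins"
)

type myInit struct{}

// Init registers an extra (empty) configuration provider and a custom
// authentication function based on a header set by a fronting proxy.
func (myInit) Init(service *plugins.Service) error {
    service.Store.RegisterProvider("my-extra-provider", func() (configstore.ItemList, error) {
        // hypothetical provider: contributes no extra configuration items
        return configstore.ItemList{}, nil
    })
    service.Server.WithAuth(func(r *http.Request) (string, error) {
        // hypothetical auth: trust the X-Remote-User header
        return r.Header.Get("X-Remote-User"), nil
    })
    return nil
}

func (myInit) Description() string {
    return "registers an extra config provider and header-based authentication"
}

var Plugin = myInit{}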

If you develop more than one initialization plugin, they will all be loaded in alphabetical order. You might want to provide a default initialization, plus more specific behaviour under certain scenarios.

Contributing

Backend

In order to iterate on feature development, run the utask server plus a backing postgres DB by invoking make run-test-stack-docker in a terminal. Use SIGINT (Ctrl+C) to reboot the server, and SIGQUIT (Ctrl+4) to tear down the server and its DB.

In a separate terminal, rebuild (make re) each time you want to iterate on a code patch, then reboot the server in the terminal where it is running.

To visualize API routes, a swagger-ui interface is available with the docker image, accessible through your browser at http://hostname.example/ui/swagger/.

Frontend

µTask serves two graphical interfaces: one for general use of the tool (dashboard), the other for task template authoring (editor). They're found in the ui folder and each has its own Makefile for development purposes.

Run make dev to launch live reloading on your machine. The editor is a standalone GUI, while the dashboard needs a backing µTask API (see above to run a server).

Run the tests

Run all test suites against an ephemeral postgres DB:

$ make test-docker

Get in touch

You've developed a cool new feature? Fixed an annoying bug? We'd be happy to hear from you! Take a look at CONTRIBUTING.md.

Download Details:

Author: ovh
Source Code: https://github.com/ovh/utask 
License: BSD-3-Clause license

#go #golang #workflow #automation 

µTask, The Lightweight Automation Engine
Diego  Elizondo

Diego Elizondo

1654277100

4 Google Chrome Extensions That Improve My Coding Workflow

If you're like me, you probably have a collection of favorite Google Chrome coding extensions that help you do your job better. Over time, I've found these 4 Chrome extensions to be especially useful for my coding workflow. They will help you become more efficient, save time, and simply give you a better overall coding experience!

I encourage you to check them out, and be sure to comment with your favorite extensions.

1. YouCode (You.com)

YouCode is a new kind of search engine built for developers. It lets you copy coding snippets from StackOverflow, W3 Schools, or Geeks for Geeks straight from the search results page and view the content in side panels, which has been an absolute game changer for me. YouCode supports more than 20 popular coding websites (including Github and Towards Data Science) and also has a built-in JSON validator and color picker. This extension is an absolute time saver for me.

Link to the Chrome extension:

https://chrome.google.com/webstore/detail/youcom/afiglppdonkdbkkaghbnpklddbemkbpj

2. Web Developer

This extension adds a whole toolbar with various web development tools to your browser and is a must-have for anyone doing web development. Showing div dimensions, viewing JavaScript, disabling CSS styles, showing alt attributes, or viewing meta tag information are just a small fraction of what this tool can do.

Link to the extension:

https://chrome.google.com/webstore/detail/web-developer/bfbameneiokkgbdmiekhjnmfkcnldhhm

3. Window Resizer for Developers

To make sure your work looks good at every screen size, this extension lets you quickly switch between different screen resolutions and check whether your media breakpoints behave as expected.

Link to the extension:

https://chrome.google.com/webstore/detail/window-resizer/kkelicaakdanhinjdeammmilcgefonfh/related

4. Wappalyzer

Wappalyzer makes it easy to find out which CRM, frameworks, e-commerce platforms, Javascript libraries, server software, analytics tools, payment processors, marketing tools, and more websites are built with. Better understand how websites work and which technologies might also be useful for your own work.

Link to the extension:

https://chrome.google.com/webstore/detail/wappalyzer-technology-pro/gppongmhjkpfnbhagpmjfkannfbllamg/related?hl=en

I can highly recommend checking out these extensions and hope you'll find value in using them. I'd love to hear your thoughts, and be sure to share your favorite extensions in the comments as well!

This story was originally published at https://hackernoon.com/4-google-chrome-extensions-that-improve-my-coding-workflow

#google-chrome #extensions #coding #workflow 

4 Google Chrome Extensions That Improve My Coding Workflow
