Level-schedule: Durable Job Scheduler Based on LevelDB

level-schedule

Durable job scheduler based on LevelDB. If the process is restarted, it will transparently resume where it stopped.

Usage

Print some JSON after 5s, even if the process restarts.

var levelup = require('levelup');
var Schedule = require('level-schedule');

var db = levelup('./db');

Schedule(db)
  .job('print', function (payload) {
    console.log(payload);
  })
  .run('print', { some : 'json' }, Date.now() + 5000);

// => { some : 'json' }

Periodic tasks

Run a task every 100ms. Always call this.run at the end of the task; otherwise, if the process crashes while your task is running, the task will be scheduled one time too many on restart.

Schedule(db)
  .job('periodic', function () {
    console.log('doing some stuff...');
    this.run('periodic', Date.now() + 100);
  })
  .run('periodic', Date.now() + 100);

Jobs

A job can be synchronous or asynchronous, just use the done argument when defining an asynchronous job.

Schedule(db)
  .job('sync', function (payload) {
    if (somethingBadHappened) {
      throw new Error();
    }
  })
  .job('async', function (payload, done) {
    // notice the 2nd argument
    doSomething(function (err) {
      done(err);
    });
  })
  .on('error', console.error)
  .run('sync')
  .run('async');

API

Schedule(db)

Setup level-schedule to use db.

Schedule#job(name, fn)

Register a job with name and fn.

fn is called with 2 arguments:

  • payload : The payload
  • done : If the job performs async operations, call done() when done. If an error occurred, pass it to done as an argument.

Schedule#run(job[, payload], timestamp)

Run job with payload at timestamp.

Schedule#on('error', fn)

Call fn whenever an Error occurs. When no error listener has been registered, errors will be thrown.
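
For example, a minimal sketch of a failing job with an error listener attached (the job name and payload are illustrative):

Schedule(db)
  .job('fragile', function () {
    throw new Error('boom');
  })
  .on('error', function (err) {
    console.error('job failed:', err);
  })
  .run('fragile', Date.now());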

Installation

With npm do

$ npm install level-schedule

Author: juliangruber
Source Code: https://github.com/juliangruber/level-schedule 
License: MIT

#javascript #scheduling #leveldb #node 


Awesome Python: Libraries for Scheduling Jobs

Job Scheduler

Libraries for scheduling jobs.

  • Airflow - Airflow is a platform to programmatically author, schedule and monitor workflows.
  • APScheduler - A light but powerful in-process task scheduler that lets you schedule functions.
  • django-schedule - A calendaring app for Django.
  • doit - A task runner and build tool.
  • gunnery - Multipurpose task execution tool for distributed systems with web-based interface.
  • Joblib - A set of tools to provide lightweight pipelining in Python.
  • Plan - Writing crontab file in Python like a charm.
  • Prefect - A modern workflow orchestration framework that makes it easy to build, schedule and monitor robust data pipelines.
  • schedule - Python job scheduling for humans.
  • Spiff - A powerful workflow engine implemented in pure Python.
  • TaskFlow - A Python library that helps to make task execution easy, consistent and reliable.

Author: vinta
Source Code: https://github.com/vinta/awesome-python
License: View license

#python #scheduling 


Schedule: Python Job Scheduling for Humans

schedule

Python job scheduling for humans. Run Python functions (or any other callable) periodically using a friendly syntax.

  • A simple to use API for scheduling jobs, made for humans.
  • In-process scheduler for periodic jobs. No extra processes needed!
  • Very lightweight and no external dependencies.
  • Excellent test coverage.
  • Tested on Python 3.6, 3.7, 3.8, and 3.9

Usage

$ pip install schedule

import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
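
do() returns the scheduled job, so you can remove it later with schedule.cancel_job (part of schedule's public API; the greet function is illustrative):

import schedule

def greet():
    print("Hello")

job = schedule.every(10).seconds.do(greet)
schedule.cancel_job(job)  # the job will no longer run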

Documentation

Schedule's documentation lives at schedule.readthedocs.io.

Meta

Daniel Bader - @dbader_org - mail@dbader.org

Inspired by Adam Wiggins' article "Rethinking Cron" and the clockwork Ruby module.

Distributed under the MIT license. See LICENSE.txt for more information.


Author: dbader
Source Code: https://github.com/dbader/schedule
License: MIT License

#python #scheduling 


You had one job, or more than one, which can be done in steps

Leprechaun

Leprechaun is a tool for scheduling recurring tasks to be performed over and over.

In Leprechaun, tasks are recipes. Let's look at a simple recipe file, written in YAML syntax.

Recipe files live in the recipes directory, which can be specified in the configs.ini configuration file. For all possible settings, see the project's configuration documentation.

There are three types of recipes: ones that are scheduled, ones that are hooked, and ones that use a cron pattern for scheduling jobs. They are similar in their steps but differ somewhat in their definitions.

First, let's look at scheduled recipes, which are defined like this:

name: job1 # name of the recipe
definition: schedule # recipe type
schedule:
    min: 0 # every minute
    hour: 0 # every hour
    day: 0 # every day
steps: # steps are done from first to last
    - touch ./test.txt
    - echo "Is this working?" > ./test.txt
    - mv ./test.txt ./imwondering.txt

If we set something like this:

schedule:
    min: 10
    hour: 2
    day: 2

The task will run every 2 days, 2 hours, and 10 minutes. If we set day to 0, it will run every 2 hours and 10 minutes.

name: job2 # name of the recipe
definition: hook # recipe type
id: 45DE2239F # id used to look up the recipe
steps:
    - echo "Hooked!" > ./hook.txt

A hooked recipe is run by sending a request to {host}:{port}/hook?id={id_of_recipe}, where the Leprechaun server is listening, for example localhost:11400/hook?id=45DE2239F.
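
For example, with the server listening on the host and port above:

$ curl "http://localhost:11400/hook?id=45DE2239F"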

Recipes that use a cron pattern to schedule tasks look like this:

name: job3 # name of the recipe
definition: cron # recipe type
pattern: "* * * * *"
steps: # steps are done from first to last
    - touch ./test.txt
    - echo "Is this working?" > ./test.txt
    - mv ./test.txt ./imwondering.txt

Steps also support variables, using the $variable syntax. These are environment variables; $LOGNAME, for example, is available in steps as $LOGNAME. We can now rewrite our job file to look something like this:

name: job1 # name of the recipe
definition: schedule
schedule:
    min: 0 # every minute
    hour: 0 # every hour
    day: 0 # every day
steps: # steps are done from first to last
    - echo "Is this working?" > $LOGNAME

Usage is straightforward: start the client and it will run the recipes you defined.

Steps can also be defined as sync or async tasks; an async step is marked with ->. Keep in mind that steps in a recipe are performed in a linear path, so a step that is not async can block the steps after it. Consider this example:

steps:
- -> ping google.com
- echo "I will not wait above task to perform, he is async so i will start immidiatelly"

But in this case the first step blocks everything after it, and all the other steps hang waiting for it to finish:

steps:
- ping google.com
- -> echo "I have to wait for the step above to finish before I can do my stuff"

Step Pipe

Output from one step can be piped to the input of the next step:

name: job1 # name of the recipe
definition: schedule
schedule:
    min: 0 # every minute
    hour: 0 # every hour
    day: 0 # every day
steps: # steps are done from first to last
    - echo "Pipe this to next step" }>
    - cat > piped.txt

As you can see, the first step ends with the }> syntax, which tells Leprechaun to pass this command's output to the next command's input. You can chain steps like this as much as you want.

Step Failure

Since steps are executed linearly, workers don't care if some commands fail; they continue with execution, though you get notifications if you have set up the relevant configuration. If you want workers to stop executing subsequent steps when a command fails, mark the step with !, as in this example:

name: job1 # name of the recipe
definition: schedule
schedule:
    min: 0 # every minute
    hour: 0 # every hour
    day: 0 # every day
steps: # steps are done from first to last
    - ! echo "Pipe this to next step" }>
    - cat > piped.txt

If the first step fails, the recipe fails and none of the remaining steps will be executed.

Remote step execution

Steps are normally executed on your local machine. If you need a specific step to be executed on a remote machine, you can specify that in the step, as in the example below. The syntax is rmt:some_host; Leprechaun will contact the remote service configured on that host and run the command there.

name: job1 # name of the recipe
definition: schedule
schedule:
    min: 0 # every minute
    hour: 0 # every hour
    day: 0 # every day
steps: # steps are done from first to last
    - rmt:some_host echo "Pipe this to next step"

Note that, like a regular step, a remote step can pipe its output to the next step, so something like this is also possible:

steps: # steps are done from first to last
    - rmt:some_host echo "Pipe this to next step" }>
    - rmt:some_other_host grep -a "Pipe" }>
    - cat > stored.txt

Installation

Go to leprechaun directory and run make install, you will need sudo privileges for this. This will install scheduler, cron, and webhook services.

To install remote service run make install-remote-service, this will create leprechaunrmt binary.

Build

Go to leprechaun directory and run make build. This will build scheduler, cron, and webhook services.

To build remote service run make build-remote-service, this will create leprechaunrmt binary.

Starting/Stopping services

To start leprechaun just simply run it in background like this : leprechaun &

For more available commands run leprechaun --help

Lepretools

For CLI tools, take a look at the Lepretools documentation.

Testing

To run tests with coverage, run make test. To run tests and generate reports, run make test-with-report; files will be generated in the coverprofile dir. To test a specific package, run make test-package package=[name].

Author: Kilgaloon
Source Code: https://github.com/kilgaloon/leprechaun 
License: MIT License

#go #golang #cron #scheduling 


Gron: Cron Jobs in Go

gron

Gron provides a clear syntax for writing and deploying cron jobs.

Goals

  • Minimalist APIs for scheduling jobs.
  • Thread safety.
  • Customizable Job Type.
  • Customizable Schedule.

Installation

$ go get github.com/roylee0704/gron

Usage

Create schedule.go

package main

import (
    "fmt"
    "time"
    "github.com/roylee0704/gron"
)

func main() {
    c := gron.New()
    c.AddFunc(gron.Every(1*time.Hour), func() {
        fmt.Println("runs every hour.")
    })
    c.Start()
}

Schedule Parameters

All scheduling is done in the machine's local time zone (as provided by the Go time package).

Setup basic periodic schedule with gron.Every().

gron.Every(1*time.Second)
gron.Every(1*time.Minute)
gron.Every(1*time.Hour)

Day and Week are also supported, by importing gron/xtime:

import "github.com/roylee0704/gron/xtime"

gron.Every(1 * xtime.Day)
gron.Every(1 * xtime.Week)

Schedule to run at a specific time with .At("hh:mm"):

gron.Every(30 * xtime.Day).At("00:00")
gron.Every(1 * xtime.Week).At("23:59")

Custom Job Type

You may define custom job types by implementing the gron.Job interface: Run().

For example:

type Reminder struct {
    Msg string
}

func (r Reminder) Run() {
  fmt.Println(r.Msg)
}

After the job is defined, instantiate it and schedule it to run in gron.

c := gron.New()
r := Reminder{ "Feed the baby!" }
c.Add(gron.Every(8*time.Hour), r)
c.Start()

Custom Job Func

You may register Funcs to be executed on a given schedule. Gron will run them in their own goroutines, asynchronously.

c := gron.New()
c.AddFunc(gron.Every(1*time.Second), func() {
    fmt.Println("runs every second")
})
c.Start()

Custom Schedule

Schedule is the interface that wraps the basic Next method: Next(p time.Duration) time.Time

In gron, the interface value Schedule has the following concrete types:

  • periodicSchedule: adds the time instant t to the underlying period p.
  • atSchedule: recurs every period p, at the given time of day (hh:mm).

For more info, check out schedule.go.

Full Example

package main

import (
    "fmt"
    "github.com/roylee0704/gron"
    "github.com/roylee0704/gron/xtime"
)

type PrintJob struct{ Msg string }

func (p PrintJob) Run() {
    fmt.Println(p.Msg)
}

func main() {

    var (
        // schedules
        daily     = gron.Every(1 * xtime.Day)
        weekly    = gron.Every(1 * xtime.Week)
        monthly   = gron.Every(30 * xtime.Day)
        yearly    = gron.Every(365 * xtime.Day)

        // contrived jobs
        purgeTask = func() { fmt.Println("purge aged records") }
        printFoo  = PrintJob{"Foo"}
        printBar  = PrintJob{"Bar"}
    )

    c := gron.New()

    c.Add(daily.At("12:30"), printFoo)
    c.AddFunc(weekly, func() { fmt.Println("Every week") })
    c.Start()

    // Jobs may also be added to a running Gron
    c.Add(monthly, printBar)
    c.AddFunc(yearly, purgeTask)

    // Stop Gron (running jobs are not halted).
    c.Stop()
}

Author: Roylee0704
Source Code: https://github.com/roylee0704/gron 
License: MIT License

#go #golang #scheduling #cron


Goflow: Web UI-based Workflow Orchestrator for Rapid Prototyping

Goflow

A workflow/DAG orchestrator written in Go for rapid prototyping of ETL/ML/AI pipelines. Goflow comes complete with a web UI for inspecting and triggering jobs.

Quick start

With Docker

docker run -p 8181:8181 ghcr.io/fieldryand/goflow-example:master

Browse to localhost:8181 to explore the UI.

goflow-demo

Without Docker

In a fresh project directory:

go mod init # create a new module
go get github.com/fieldryand/goflow # install dependencies

Create a file main.go with contents:

package main

import "github.com/fieldryand/goflow"

func main() {
        options := goflow.Options{
                AssetBasePath: "assets/",
                StreamJobRuns: true,
                ShowExamples:  true,
        }
        gf := goflow.New(options)
        gf.Use(goflow.DefaultLogger())
        gf.Run(":8181")
}

Download the front-end from the release page, untar it, and move it to the location specified in goflow.Options.AssetBasePath. Now run the application with go run main.go and see it in the browser at localhost:8181.

Use case

Goflow was built as a simple replacement for Apache Airflow to manage some small data pipeline projects. Airflow started to feel too heavyweight for these projects where all the computation was offloaded to independent services, but there was still a need for basic orchestration, concurrency, retries, visibility etc.

Goflow prioritizes ease of deployment over features and scalability. If you need distributed workers, backfilling over time slices, a durable database of job runs, etc, then Goflow is not for you. On the other hand, if you want to rapidly prototype some pipelines, then Goflow might be a good fit.

Concepts and features

  • Job: A Goflow workflow is called a Job. Jobs can be scheduled using cron syntax.
  • Task: Each job consists of one or more tasks organized into a dependency graph. A task can be run under certain conditions; by default, a task runs when all of its dependencies finish successfully.
  • Concurrency: Jobs and tasks execute concurrently.
  • Operator: An Operator defines the work done by a Task. Goflow comes with a handful of basic operators, and implementing your own Operator is straightforward.
  • Retries: You can allow a Task a given number of retry attempts. Goflow comes with two retry strategies, ConstantDelay and ExponentialBackoff.
  • Database: Goflow supports two database types, in-memory and BoltDB. BoltDB will persist your history of job runs, whereas in-memory means the history will be lost each time the Goflow server is stopped. The default is BoltDB.
  • Streaming: Goflow uses server-sent events to stream the status of jobs and tasks to the UI in real time.

Jobs and tasks

Let's start by creating a function that returns a job called myJob. There is a single task in this job that sleeps for one second.

package main

import (
	"errors"

	"github.com/fieldryand/goflow"
)

func myJob() *goflow.Job {
	j := &goflow.Job{Name: "myJob", Schedule: "* * * * *", Active: true}
	j.Add(&goflow.Task{
		Name:     "sleepForOneSecond",
		Operator: goflow.Command{Cmd: "sleep", Args: []string{"1"}},
	})
	return j
}

By setting Active: true, we are telling Goflow to apply the provided cron schedule for this job when the application starts. Job scheduling can be activated and deactivated from the UI.

Custom operators

A custom Operator needs to implement the Run method. Here's an example of an operator that adds two positive numbers.

type PositiveAddition struct{ a, b int }

func (o PositiveAddition) Run() (interface{}, error) {
	if o.a < 0 || o.b < 0 {
		return 0, errors.New("Can't add negative numbers")
	}
	result := o.a + o.b
	return result, nil
}

Retries

Let's add a retry strategy to the sleepForOneSecond task:

func myJob() *goflow.Job {
	j := &goflow.Job{Name: "myJob", Schedule: "* * * * *"}
	j.Add(&goflow.Task{
		Name:       "sleepForOneSecond",
		Operator:   goflow.Command{Cmd: "sleep", Args: []string{"1"}},
		Retries:    5,
		RetryDelay: goflow.ConstantDelay{Period: 1},
	})
	return j
}

Instead of ConstantDelay, we could also use ExponentialBackoff (see https://en.wikipedia.org/wiki/Exponential_backoff).

Task dependencies

A job can define a directed acyclic graph (DAG) of independent and dependent tasks. Let's use the SetDownstream method to define two tasks that are dependent on sleepForOneSecond. The tasks will use the PositiveAddition operator we defined earlier, as well as a new operator provided by Goflow, Get.

func myJob() *goflow.Job {
	j := &goflow.Job{Name: "myJob", Schedule: "* * * * *"}
	j.Add(&goflow.Task{
		Name:       "sleepForOneSecond",
		Operator:   goflow.Command{Cmd: "sleep", Args: []string{"1"}},
		Retries:    5,
		RetryDelay: goflow.ConstantDelay{Period: 1},
	})
	j.Add(&goflow.Task{
		Name:       "getGoogle",
		Operator:   goflow.Get{Client: &http.Client{}, URL: "https://www.google.com"},
	})
	j.Add(&goflow.Task{
		Name:       "AddTwoPlusThree",
		Operator:   PositiveAddition{a: 2, b: 3},
	})
	j.SetDownstream(j.Task("sleepForOneSecond"), j.Task("getGoogle"))
	j.SetDownstream(j.Task("sleepForOneSecond"), j.Task("AddTwoPlusThree"))
	return j
}

Trigger rules

By default, a task has the trigger rule allSuccessful, meaning the task starts executing when all the tasks directly upstream exit successfully. If any dependency exits with an error, all downstream tasks are skipped, and the job exits with an error.

Sometimes you want a downstream task to execute even if there are upstream failures. Often these are situations where you want to perform some cleanup task, such as shutting down a server. In such cases, you can give a task the trigger rule allDone.

Let's modify sleepForOneSecond to have the trigger rule allDone.

func myJob() *goflow.Job {
	// other stuff
	j.Add(&goflow.Task{
		Name:        "sleepForOneSecond",
		Operator:    goflow.Command{Cmd: "sleep", Args: []string{"1"}},
		Retries:     5,
		RetryDelay:  goflow.ConstantDelay{Period: 1},
		TriggerRule: "allDone",
	})
	// other stuff
}

The Goflow Engine

Finally, let's create a Goflow engine, register our job, attach a logger, and run the application.

func main() {
	gf := goflow.New(goflow.Options{StreamJobRuns: true})
	gf.AddJob(myJob)
	gf.Use(goflow.DefaultLogger())
	gf.Run(":8181")
}

You can pass different options to the engine. Options currently supported:

  • AssetBasePath: The path containing the UI assets, usually assets/.
  • DBType: boltdb (default) or memory
  • BoltDBPath: This will be the filepath of the Bolt database on disk.
  • StreamJobRuns: Whether to stream updates to the UI.
  • ShowExamples: Whether to show the example jobs.

Goflow is built on the Gin framework, so you can pass any Gin handler to Use.
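
For example, a minimal sketch attaching Gin's built-in recovery middleware (any gin.HandlerFunc should work the same way):

package main

import (
	"github.com/fieldryand/goflow"
	"github.com/gin-gonic/gin"
)

func main() {
	gf := goflow.New(goflow.Options{})
	gf.Use(gin.Recovery()) // Gin middleware: recover from panics and return a 500
	gf.Run(":8181")
}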

Author: Fieldryand
Source Code: https://github.com/fieldryand/goflow 
License: MIT License

#go #golang #scheduling 


Gocron: Easy and fluent Go cron scheduling

gocron: A Golang Job Scheduling Package.

gocron is a job scheduling package which lets you run Go functions at pre-determined intervals using a simple, human-friendly syntax.

gocron is a Golang scheduler implementation similar to the Ruby module clockwork and the Python job scheduling package schedule.


If you want to chat, you can find us on Slack!

Concepts

  • Scheduler: The scheduler tracks all the jobs assigned to it and makes sure they are passed to the executor when ready to be run. The scheduler is able to manage overall aspects of job behavior like limiting how many jobs are running at one time.
  • Job: The job is simply aware of the task (go function) it's provided and is therefore only able to perform actions related to that task, like preventing itself from overrunning a previous task that is taking a long time.
  • Executor: The executor, as its name suggests, is simply responsible for calling the task (go function) that the job hands to it when sent by the scheduler.

Examples

s := gocron.NewScheduler(time.UTC)

s.Every(5).Seconds().Do(func(){ ... })

// strings parse to duration
s.Every("5m").Do(func(){ ... })

s.Every(5).Days().Do(func(){ ... })

s.Every(1).Month(1, 2, 3).Do(func(){ ... })

// set time
s.Every(1).Day().At("10:30").Do(func(){ ... })

// set multiple times
s.Every(1).Day().At("10:30;08:00").Do(func(){ ... })

s.Every(1).Day().At("10:30").At("08:00").Do(func(){ ... })

// Schedule each last day of the month
s.Every(1).MonthLastDay().Do(func(){ ... })

// Or each last day of every other month
s.Every(2).MonthLastDay().Do(func(){ ... })

// cron expressions supported
s.Cron("*/1 * * * *").Do(task) // every minute

// you can start running the scheduler in two different ways:
// starts the scheduler asynchronously
s.StartAsync()
// starts the scheduler and blocks current execution path
s.StartBlocking()

For more examples, take a look at our go docs.

Options

| Interval | Supported schedule options |
|----------|----------------------------|
| sub-second | StartAt() |
| milliseconds | StartAt() |
| seconds | StartAt() |
| minutes | StartAt() |
| hours | StartAt() |
| days | StartAt(), At() |
| weeks | StartAt(), At(), Weekday() (and all week day named functions) |
| months | StartAt(), At() |

There are several options available to restrict how jobs run:

| Mode | Function | Behavior |
|------|----------|----------|
| Default | | jobs are rescheduled at every interval |
| Job singleton | SingletonMode() | a long running job will not be rescheduled until the current run is completed |
| Scheduler limit | SetMaxConcurrentJobs() | set a collective maximum number of concurrent jobs running across the scheduler |
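
A minimal sketch of both modes, using the functions from the table (the gocron.RescheduleMode limit-mode constant is an assumption; check the go docs for the available modes):

s := gocron.NewScheduler(time.UTC)

// at most 2 jobs running at once across the whole scheduler
s.SetMaxConcurrentJobs(2, gocron.RescheduleMode)

// a long-running job that is not rescheduled while a run is still in progress
_, _ = s.Every(1).Second().SingletonMode().Do(func() { /* long task */ })

s.StartAsync()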

Tags

Jobs may have arbitrary tags added, which can be useful when tracking many jobs. The scheduler supports both enforcing tags to be unique and, when they are not unique, running all jobs with a given tag.

s := gocron.NewScheduler(time.UTC)
s.TagsUnique()

_, _ = s.Every(1).Week().Tag("foo").Do(task)
_, err := s.Every(1).Week().Tag("foo").Do(task)
// error!!!

s := gocron.NewScheduler(time.UTC)

s.Every(2).Day().Tag("tag").At("10:00").Do(task)
s.Every(1).Minute().Tag("tag").Do(task)
s.RunByTag("tag")
// both jobs will run

FAQ

Q: I'm running multiple pods in a distributed environment. How can I prevent a job from running once per pod, causing duplication?

  • A: We recommend using your own lock solution within the jobs themselves (you could use Redis, for example)

Q: I've removed my job from the scheduler, but how can I stop a long-running job that has already been triggered?

  • A: We recommend using a means of canceling your job, e.g. a context.WithCancel().
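
A minimal sketch of that pattern (names are illustrative): the task checks a cancellable context and returns early once cancel() has been called.

ctx, cancel := context.WithCancel(context.Background())

_, _ = s.Every(1).Minute().Do(func() {
	// a long-running task should check the context regularly
	select {
	case <-ctx.Done():
		return // stop early once cancel() has been called
	default:
	}
	// ... do a chunk of work ...
})

// after removing the job from the scheduler, stop any in-flight run:
cancel()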

Looking to contribute? Try to follow these guidelines:

  • Use issues for everything
  • For a small change, just send a PR!
  • For bigger changes, please open an issue for discussion before sending a PR.
  • PRs should have: tests, documentation and examples (if it makes sense)
  • You can also contribute by:
    • Reporting issues
    • Suggesting new features or enhancements
    • Improving/fixing documentation

Design

design-diagram

JetBrains supports this project with GoLand licenses. We appreciate their support for free and open source software!

Author: Go-co-op
Source Code: https://github.com/go-co-op/gocron 
License: MIT License

#go #golang #scheduling 


Cdule: Golang Based Scheduler Library with Database Support

cdule (pronounced as schedule)


Golang-based scheduler library with database support. Users can use any database supported by gorm.io.

To Download the cdule library

go get github.com/deepaksinghvi/cdule

Usage Instruction

In order to schedule jobs with cdule, a user needs to

  1. Configure persistence,
  2. Implement the cdule.Job interface, and
  3. Schedule the job with the required cron expression.

The job is persisted in the jobs table.
The next execution is persisted in the schedules table.
Job history is maintained in the job_histories table.

Configuration

A user needs to create resources/config.yml in their project home directory with the following keys:

  • cduletype
  • dburl
  • cduleconsistency

cduletype specifies whether the configuration is in-memory or database-based; possible values are DATABASE and MEMORY. dburl is the database connection URL. cduleconsistency is reserved for future use.

config.yml for a postgresql-based configuration

cduletype: DATABASE
dburl: postgres://cduleuser:cdulepassword@localhost:5432/cdule?sslmode=disable
cduleconsistency: AT_MOST_ONCE

config.yml for a sqlite-based in-memory configuration

cduletype: MEMORY
dburl: /Users/dsinghvi/sqlite.db
cduleconsistency: AT_MOST_ONCE

Job Interface Implementation

var testJobData map[string]string

type TestJob struct {
	Job cdule.Job
}

func (m TestJob) Execute(jobData map[string]string) {
	log.Info("In TestJob")
	for k, v := range jobData {
		valNum, err := strconv.Atoi(v)
		if nil == err {
			jobData[k] = strconv.Itoa(valNum + 1)
		} else {
			log.Error(err)
		}

	}
	testJobData = jobData
}

func (m TestJob) JobName() string {
	return "job.TestJob"
}

func (m TestJob) GetJobData() map[string]string {
	return testJobData
}

Schedule a Job

The test job is expected to execute five times, once every minute, and then the program exits. The TestJob jobData map holds data as map[string]string; the data is stored on every execution and updated with the next counter value on each Execute() call.

cdule := cdule.Cdule{}
cdule.NewCdule()
testJob := TestJob{}
jobData := make(map[string]string)
jobData["one"] = "1"
jobData["two"] = "2"
jobData["three"] = "3"
cdule.NewJob(&testJob, jobData).Build(utils.EveryMinute)

time.Sleep(5 * time.Minute)
cdule.StopWatcher()

Demo Project

This demo describes how the cdule library can be used.

https://github.com/deepaksinghvi/cduledemo

Database Schema

For the sample app, postgresql is used but users can use any db which is supported by gorm.io.

DB Tables

  • jobs : To store unique jobs.
  • job_histories : To store job history with status as result.
  • schedules : To store schedule for every next run.
  • workers : To store the worker nodes and their health check.

dbschema.png

Sample Cron

Users can use the pre-defined crons below or supply their own standard cron expressions:

EveryMinute              = "0 * * ? * *"
EveryEvenMinute          = "0 */2 * ? * *"
EveryUnEvenMinute        = "0 1/2 * ? * *"
EveryTwoMinutes          = "0 */2 * ? * *"
EveryHourAtMin153045     = "0 15,30,45 * ? * *"
EveryHour                = "0 0 * ? * *"
EveryEvenHour            = "0 0 0/2 ? * *"
EveryUnEvenHour          = "0 0 1/2 ? * *"
EveryThreeHours          = "0 0 */3 ? * *"
EveryTwelveHours         = "0 0 */12 ? * *"
EveryDayAtMidNight       = "0 0 0 * * ?"
EveryDayAtOneAM          = "0 0 1 * * ?"
EveryDayAtSixAM          = "0 0 6 * * ?"
EverySundayAtNoon        = "0 0 12 ? * SUN"
EveryMondayAtNoon        = "0 0 12 ? * MON"
EveryWeekDayAtNoon       = "0 0 12 ? * MON-FRI"
EveryWeekEndAtNoon       = "0 0 12 ? * SUN,SAT"
EveryMonthOnFirstAtNoon  = "0 0 12 1 * ?"
EveryMonthOnSecondAtNoon = "0 0 12 2 * ?"
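
A sketch of scheduling with one of these expressions instead of a utils constant (building on the earlier example, and assuming Build accepts any standard cron string, as the constants suggest):

// run TestJob at noon on weekdays
cdule.NewJob(&testJob, jobData).Build("0 0 12 ? * MON-FRI")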


Other Reports

pkg.go.dev: https://pkg.go.dev/github.com/deepaksinghvi/cdule

goreportcard.com: https://goreportcard.com/report/github.com/deepaksinghvi/cdule

coverage service link: https://app.codecov.io/gh/deepaksinghvi/cdule

Author: Deepaksinghvi
Source Code: https://github.com/deepaksinghvi/cdule 
License: MIT License

#go #golang #scheduling 


Bottleneck: Rate Limiter That Makes Throttling Easy

bottleneck

Bottleneck is a lightweight and zero-dependency Task Scheduler and Rate Limiter for Node.js and the browser.

Bottleneck is an easy solution, as it adds very little complexity to your code. It is battle-hardened, reliable, and production-ready, and is used at scale in private companies and open source software.

It supports Clustering: it can rate limit jobs across multiple Node.js instances. It uses Redis and strictly atomic operations to stay reliable in the presence of unreliable clients and networks. It also supports Redis Cluster and Redis Sentinel.

Upgrading from version 1?

Install

npm install --save bottleneck
import Bottleneck from "bottleneck";

// Note: To support older browsers and Node <6.0, you must import the ES5 bundle instead.
var Bottleneck = require("bottleneck/es5");

Quick Start

Step 1 of 3

Most APIs have a rate limit. For example, to execute 3 requests per second:

const limiter = new Bottleneck({
  minTime: 333
});

If there's a chance some requests might take longer than 333ms and you want to prevent more than 1 request from running at a time, add maxConcurrent: 1:

const limiter = new Bottleneck({
  maxConcurrent: 1,
  minTime: 333
});

minTime and maxConcurrent are enough for the majority of use cases. They work well together to ensure a smooth rate of requests. If your use case requires executing requests in bursts or every time a quota resets, look into Reservoir Intervals.

Step 2 of 3

➤ Using promises?

Instead of this:

myFunction(arg1, arg2)
.then((result) => {
  /* handle result */
});

Do this:

limiter.schedule(() => myFunction(arg1, arg2))
.then((result) => {
  /* handle result */
});

Or this:

const wrapped = limiter.wrap(myFunction);

wrapped(arg1, arg2)
.then((result) => {
  /* handle result */
});

➤ Using async/await?

Instead of this:

const result = await myFunction(arg1, arg2);

Do this:

const result = await limiter.schedule(() => myFunction(arg1, arg2));

Or this:

const wrapped = limiter.wrap(myFunction);

const result = await wrapped(arg1, arg2);

➤ Using callbacks?

Instead of this:

someAsyncCall(arg1, arg2, callback);

Do this:

limiter.submit(someAsyncCall, arg1, arg2, callback);

Step 3 of 3

Remember...

Bottleneck builds a queue of jobs and executes them as soon as possible. By default, the jobs will be executed in the order they were received.

Read the 'Gotchas' and you're good to go. Or keep reading to learn about all the fine tuning and advanced options available. If your rate limits need to be enforced across a cluster of computers, read the Clustering docs.

Need help debugging your application?

Instead of throttling maybe you want to batch up requests into fewer calls?

Gotchas & Common Mistakes

  • Make sure the function you pass to schedule() or wrap() only returns once all the work it does has completed.

Instead of this:

limiter.schedule(() => {
  tasksArray.forEach(x => processTask(x));
  // BAD, we return before our processTask() functions are finished processing!
});

Do this:

limiter.schedule(() => {
  const allTasks = tasksArray.map(x => processTask(x));
  // GOOD, we wait until all tasks are done.
  return Promise.all(allTasks);
});

  • If you're passing an object's method as a job, you'll probably need to bind() the object:

// instead of this:
limiter.schedule(object.doSomething);
// do this:
limiter.schedule(object.doSomething.bind(object));
// or, wrap it in an arrow function instead:
limiter.schedule(() => object.doSomething());

Bottleneck requires Node 6+ to function. However, an ES5 build is included: var Bottleneck = require("bottleneck/es5");.

Make sure you're catching "error" events emitted by your limiters!

Consider setting a maxConcurrent value instead of leaving it null. This can help your application's performance, especially if you think the limiter's queue might become very long.

If you plan on using priorities, make sure to set a maxConcurrent value.

When using submit(), if a callback isn't necessary, you must pass null or an empty function instead. It will not work otherwise.

When using submit(), make sure all the jobs will eventually complete by calling their callback, or set an expiration. Even if you submitted your job with a null callback, it still needs to call its callback. This is particularly important if you are using a maxConcurrent value that isn't null (unlimited); otherwise those uncompleted jobs will clog up the limiter and no new jobs will be allowed to run. It's safe to call the callback more than once; subsequent calls are ignored.

Using tools like mockdate in your tests to change time in JavaScript will likely result in undefined behavior from Bottleneck.

Docs

Constructor

const limiter = new Bottleneck({/* options */});

Basic options:

| Option | Default | Description |
|--------|---------|-------------|
| maxConcurrent | null (unlimited) | How many jobs can be executing at the same time. Consider setting a value instead of leaving it null; it can help your application's performance, especially if you think the limiter's queue might get very long. |
| minTime | 0 ms | How long to wait after launching a job before launching another one. |
| highWater | null (unlimited) | How long can the queue be? When the queue length exceeds that value, the selected strategy is executed to shed the load. |
| strategy | Bottleneck.strategy.LEAK | Which strategy to use when the queue gets longer than the high water mark. Read about strategies. Strategies are never executed if highWater is null. |
| penalty | 15 * minTime, or 5000 when minTime is 0 | The penalty value used by the BLOCK strategy. |
| reservoir | null (unlimited) | How many jobs can be executed before the limiter stops executing jobs. If reservoir reaches 0, no jobs will be executed until it is no longer 0. New jobs will still be queued up. |
| reservoirRefreshInterval | null (disabled) | Every reservoirRefreshInterval milliseconds, the reservoir value will be automatically updated to the value of reservoirRefreshAmount. The reservoirRefreshInterval value should be a multiple of 250 (5000 for Clustering). |
| reservoirRefreshAmount | null (disabled) | The value to set reservoir to when reservoirRefreshInterval is in use. |
| reservoirIncreaseInterval | null (disabled) | Every reservoirIncreaseInterval milliseconds, the reservoir value will be automatically incremented by reservoirIncreaseAmount. The reservoirIncreaseInterval value should be a multiple of 250 (5000 for Clustering). |
| reservoirIncreaseAmount | null (disabled) | The increment applied to reservoir when reservoirIncreaseInterval is in use. |
| reservoirIncreaseMaximum | null (disabled) | The maximum value that reservoir can reach when reservoirIncreaseInterval is in use. |
| Promise | Promise (built-in) | This lets you override the Promise library used by Bottleneck. |

Reservoir Intervals

Reservoir Intervals let you execute requests in bursts, by automatically controlling the limiter's reservoir value. The reservoir is simply the number of jobs the limiter is allowed to execute. Once the value reaches 0, it stops starting new jobs.

There are 2 types of Reservoir Intervals: Refresh Intervals and Increase Intervals.

Refresh Interval

In this example, we throttle to 100 requests every 60 seconds:

const limiter = new Bottleneck({
  reservoir: 100, // initial value
  reservoirRefreshAmount: 100,
  reservoirRefreshInterval: 60 * 1000, // must be divisible by 250

  // also use maxConcurrent and/or minTime for safety
  maxConcurrent: 1,
  minTime: 333 // pick a value that makes sense for your use case
});

reservoir is a counter decremented every time a job is launched; we set its initial value to 100. Then, every reservoirRefreshInterval (60000 ms), reservoir is automatically updated to be equal to the reservoirRefreshAmount (100).

Increase Interval

In this example, we throttle jobs to meet the Shopify API Rate Limits. Users are allowed to send 40 requests initially, then every second grants 2 more requests up to a maximum of 40.

const limiter = new Bottleneck({
  reservoir: 40, // initial value
  reservoirIncreaseAmount: 2,
  reservoirIncreaseInterval: 1000, // must be divisible by 250
  reservoirIncreaseMaximum: 40,

  // also use maxConcurrent and/or minTime for safety
  maxConcurrent: 5,
  minTime: 250 // pick a value that makes sense for your use case
});

Warnings

Reservoir Intervals are an advanced feature, please take the time to read and understand the following warnings.

Reservoir Intervals are not a replacement for minTime and maxConcurrent. It's strongly recommended to also use minTime and/or maxConcurrent to spread out the load. For example, suppose a lot of jobs are queued up because the reservoir is 0. Every time the Refresh Interval is triggered, a number of jobs equal to reservoirRefreshAmount will automatically be launched, all at the same time! To prevent this flooding effect and keep your application running smoothly, use minTime and maxConcurrent to stagger the jobs.

The Reservoir Interval starts from the moment the limiter is created. Let's suppose we're using reservoirRefreshAmount: 5. If you happen to add 10 jobs just 1ms before the refresh is triggered, the first 5 will run immediately, then 1ms later it will refresh the reservoir value and that will make the last 5 also run right away. It will have run 10 jobs in just over 1ms no matter what your reservoir interval was!

Reservoir Intervals prevent a limiter from being garbage collected. Call limiter.disconnect() to clear the interval and allow the memory to be freed. However, it's not necessary to call .disconnect() to allow the Node.js process to exit.

submit()

Adds a job to the queue. This is the callback version of schedule().

limiter.submit(someAsyncCall, arg1, arg2, callback);

You can pass null instead of an empty function if there is no callback, but someAsyncCall still needs to call its callback to let the limiter know it has completed its work.

submit() can also accept advanced options.

schedule()

Adds a job to the queue. This is the Promise and async/await version of submit().

const fn = function(arg1, arg2) {
  return httpGet(arg1, arg2); // Here httpGet() returns a promise
};

limiter.schedule(fn, arg1, arg2)
.then((result) => {
  /* ... */
});

In other words, schedule() takes a function fn and a list of arguments. schedule() returns a promise that will be executed according to the rate limits.

schedule() can also accept advanced options.

Here's another example:

// suppose that `client.get(url)` returns a promise

const url = "https://wikipedia.org";

limiter.schedule(() => client.get(url))
.then(response => console.log(response.body));

wrap()

Takes a function that returns a promise. Returns a function identical to the original, but rate limited.

const wrapped = limiter.wrap(fn);

wrapped()
.then(function (result) {
  /* ... */
})
.catch(function (error) {
  // Bottleneck might need to fail the job even if the original function can never fail.
  // For example, your job is taking longer than the `expiration` time you've set.
});

Job Options

submit(), schedule(), and wrap() all accept advanced options.

// Submit
limiter.submit({/* options */}, someAsyncCall, arg1, arg2, callback);

// Schedule
limiter.schedule({/* options */}, fn, arg1, arg2);

// Wrap
const wrapped = limiter.wrap(fn);
wrapped.withOptions({/* options */}, arg1, arg2);

| Option | Default | Description |
|--------|---------|-------------|
| priority | 5 | A priority between 0 and 9. A job with a priority of 4 will be queued ahead of a job with a priority of 5. Important: You must set a low maxConcurrent value for priorities to work, otherwise there is nothing to queue because jobs will be scheduled immediately! |
| weight | 1 | Must be an integer equal to or higher than 0. The weight is what increases the number of running jobs (up to maxConcurrent) and decreases the reservoir value. |
| expiration | null (unlimited) | The number of milliseconds a job is given to complete. Jobs that execute for longer than expiration ms will be failed with a BottleneckError. |
| id | <no-id> | You should give an ID to your jobs; it helps with debugging. |

Strategies

A strategy is a simple algorithm that is executed every time adding a job would cause the number of queued jobs to exceed highWater. Strategies are never executed if highWater is null.

Bottleneck.strategy.LEAK

When adding a new job to a limiter, if the queue length reaches highWater, drop the oldest job with the lowest priority. This is useful when jobs that have been waiting for too long are not important anymore. If all the queued jobs are more important (based on their priority value) than the one being added, it will not be added.

Bottleneck.strategy.OVERFLOW_PRIORITY

Same as LEAK, except it will only drop jobs that are less important than the one being added. If all the queued jobs are as or more important than the new one, it will not be added.

Bottleneck.strategy.OVERFLOW

When adding a new job to a limiter, if the queue length reaches highWater, do not add the new job. This strategy totally ignores priority levels.

Bottleneck.strategy.BLOCK

When adding a new job to a limiter, if the queue length reaches highWater, the limiter falls into "blocked mode". All queued jobs are dropped and no new jobs will be accepted until the limiter unblocks. It will unblock after penalty milliseconds have passed without receiving a new job. penalty is equal to 15 * minTime (or 5000 if minTime is 0) by default. This strategy is ideal when bruteforce attacks are to be expected. This strategy totally ignores priority levels.

Jobs lifecycle

  1. Received. Your new job has been added to the limiter. Bottleneck needs to check whether it can be accepted into the queue.
  2. Queued. Bottleneck has accepted your job, but it cannot tell yet at what exact timestamp it will run, because that depends on previous jobs.
  3. Running. Your job is not in the queue anymore, it will be executed after a delay that was computed according to your minTime setting.
  4. Executing. Your job is executing its code.
  5. Done. Your job has completed.

Note: By default, Bottleneck does not keep track of DONE jobs, to save memory. You can enable this feature by passing trackDoneStatus: true as an option when creating a limiter.
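
For example, to enable tracking of DONE jobs:

const limiter = new Bottleneck({
  trackDoneStatus: true
});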

counts()

const counts = limiter.counts();

console.log(counts);
/*
{
  RECEIVED: 0,
  QUEUED: 0,
  RUNNING: 0,
  EXECUTING: 0,
  DONE: 0
}
*/

Returns an object with the current number of jobs per status in the limiter.

jobStatus()

console.log(limiter.jobStatus("some-job-id"));
// Example: QUEUED

Returns the status of the job with the provided job id in the limiter. Returns null if no job with that id exists.

jobs()

console.log(limiter.jobs("RUNNING"));
// Example: ['id1', 'id2']

Returns an array of all the job ids with the specified status in the limiter. Not passing a status string returns all the known ids.

queued()

const count = limiter.queued(priority);

console.log(count);

priority is optional. Returns the number of QUEUED jobs with the given priority level. Omitting the priority argument returns the total number of queued jobs in the limiter.

clusterQueued()

const count = await limiter.clusterQueued();

console.log(count);

Returns the number of QUEUED jobs in the Cluster.

empty()

if (limiter.empty()) {
  // do something...
}

Returns a boolean which indicates whether there are any RECEIVED or QUEUED jobs in the limiter.

running()

limiter.running()
.then((count) => console.log(count));

Returns a promise that returns the total weight of the RUNNING and EXECUTING jobs in the Cluster.

done()

limiter.done()
.then((count) => console.log(count));

Returns a promise that returns the total weight of DONE jobs in the Cluster. Does not require passing the trackDoneStatus: true option.

check()

limiter.check()
.then((wouldRunNow) => console.log(wouldRunNow));

Checks if a new job would be executed immediately if it was submitted now. Returns a promise that returns a boolean.

Events

'error'

limiter.on("error", function (error) {
  /* handle errors here */
});

The two main causes of error events are: uncaught exceptions in your event handlers, and network errors when Clustering is enabled.

'failed'

limiter.on("failed", function (error, jobInfo) {
  // This will be called every time a job fails.
});

'retry'

See Retries to learn how to automatically retry jobs.

limiter.on("retry", function (message, jobInfo) {
  // This will be called every time a job is retried.
});

'empty'

limiter.on("empty", function () {
  // This will be called when `limiter.empty()` becomes true.
});

'idle'

limiter.on("idle", function () {
  // This will be called when `limiter.empty()` is `true` and `limiter.running()` is `0`.
});

'dropped'

limiter.on("dropped", function (dropped) {
  // This will be called when a strategy was triggered.
  // The dropped request is passed to this event listener.
});

'depleted'

limiter.on("depleted", function (empty) {
  // This will be called every time the reservoir drops to 0.
  // The `empty` (boolean) argument indicates whether `limiter.empty()` is currently true.
});

'debug'

limiter.on("debug", function (message, data) {
  // Useful to figure out what the limiter is doing in real time
  // and to help debug your application
});

'received' 'queued' 'scheduled' 'executing' 'done'

limiter.on("queued", function (info) {
  // This event is triggered when a job transitions from one Lifecycle stage to another
});

See Jobs Lifecycle for more information.

These Lifecycle events are not triggered for jobs located on another limiter in a Cluster, for performance reasons.

Other event methods

Use removeAllListeners() with an optional event name as first argument to remove listeners.

Use .once() instead of .on() to only receive a single event.
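
A quick sketch of both, using event names from the sections above:

// fire at most once, then detach automatically
limiter.once("empty", function () {
  console.log("queue drained");
});

// remove all "debug" listeners
limiter.removeAllListeners("debug");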

Retries

The following example:

const limiter = new Bottleneck();

// Listen to the "failed" event
limiter.on("failed", async (error, jobInfo) => {
  const id = jobInfo.options.id;
  console.warn(`Job ${id} failed: ${error}`);

  if (jobInfo.retryCount === 0) { // Here we only retry once
    console.log(`Retrying job ${id} in 25ms!`);
    return 25;
  }
});

// Listen to the "retry" event
limiter.on("retry", (error, jobInfo) => console.log(`Now retrying ${jobInfo.options.id}`));

const main = async function () {
  let executions = 0;

  // Schedule one job
  const result = await limiter.schedule({ id: 'ABC123' }, async () => {
    executions++;
    if (executions === 1) {
      throw new Error("Boom!");
    } else {
      return "Success!";
    }
  });

  console.log(`Result: ${result}`);
}

main();

will output

Job ABC123 failed: Error: Boom!
Retrying job ABC123 in 25ms!
Now retrying ABC123
Result: Success!

To re-run your job, simply return an integer from the 'failed' event handler. The number returned is how many milliseconds to wait before retrying it. Return 0 to retry it immediately.

IMPORTANT: When you ask the limiter to retry a job it will not send it back into the queue. It will stay in the EXECUTING state until it succeeds or until you stop retrying it. This means that it counts as a concurrent job for maxConcurrent even while it's just waiting to be retried. The number of milliseconds to wait ignores your minTime settings.

updateSettings()

limiter.updateSettings(options);

The options are the same as the limiter constructor.

Note: Changes don't affect SCHEDULED jobs.

incrementReservoir()

limiter.incrementReservoir(incrementBy);

Returns a promise that returns the new reservoir value.

currentReservoir()

limiter.currentReservoir()
.then((reservoir) => console.log(reservoir));

Returns a promise that returns the current reservoir value.

stop()

The stop() method is used to safely shutdown a limiter. It prevents any new jobs from being added to the limiter and waits for all EXECUTING jobs to complete.

limiter.stop(options)
.then(() => {
  console.log("Shutdown completed!")
});

stop() returns a promise that resolves once all the EXECUTING jobs have completed and, if desired, once all non-EXECUTING jobs have been dropped.

| Option | Default | Description |
|--------|---------|-------------|
| dropWaitingJobs | true | When true, drop all the RECEIVED, QUEUED and RUNNING jobs. When false, allow those jobs to complete before resolving the Promise returned by this method. |
| dropErrorMessage | "This limiter has been stopped." | The error message used to drop jobs when dropWaitingJobs is true. |
| enqueueErrorMessage | "This limiter has been stopped and cannot accept new jobs." | The error message used to reject a job added to the limiter after stop() has been called. |

chain()

Chaining routes jobs through another limiter: tasks that are ready to be executed are added to the chained limiter. Suppose you have 2 types of tasks, A and B. They both have their own limiter with their own settings, but both must also follow a global limiter G:

const limiterA = new Bottleneck( /* some settings */ );
const limiterB = new Bottleneck( /* some different settings */ );
const limiterG = new Bottleneck( /* some global settings */ );

limiterA.chain(limiterG);
limiterB.chain(limiterG);

// Requests added to limiterA must follow the A and G rate limits.
// Requests added to limiterB must follow the B and G rate limits.
// Requests added to limiterG must follow the G rate limits.

To unchain, call limiter.chain(null);.

Group

The Group feature of Bottleneck manages many limiters automatically for you. It creates limiters dynamically and transparently.

Let's take a DNS server as an example of how Bottleneck can be used. It's a service that sees a lot of abuse and where incoming DNS requests need to be rate limited. Bottleneck is so tiny, it's acceptable to create one limiter for each origin IP, even if it means creating thousands of limiters. The Group feature is perfect for this use case. Create one Group and use the origin IP to rate limit each IP independently. Each call with the same key (IP) will be routed to the same underlying limiter. A Group is created like a limiter:

const group = new Bottleneck.Group(options);

The options object will be used for every limiter created by the Group.

The Group is then used with the .key(str) method:

// In this example, the key is an IP
group.key("77.66.54.32").schedule(() => {
  /* process the request */
});

key()

  • str : The key to use. All jobs added with the same key will use the same underlying limiter. Default: ""

The return value of .key(str) is a limiter. If it doesn't already exist, it is generated for you. Calling key() is how limiters are created inside a Group.

Limiters that have been idle for longer than 5 minutes are deleted to avoid memory leaks; this value can be changed by passing a different timeout option, in milliseconds.
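
For example, to keep idle per-key limiters for 10 minutes instead of the default 5:

const group = new Bottleneck.Group({
  maxConcurrent: 1,
  timeout: 10 * 60 * 1000 // delete a key's limiter after 10 minutes of inactivity
});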

on("created")

group.on("created", (limiter, key) => {
  console.log("A new limiter was created for key: " + key)

  // Prepare the limiter, for example we'll want to listen to its "error" events!
  limiter.on("error", (err) => {
    // Handle errors here
  })
});

Listening for the "created" event is the recommended way to set up a new limiter. Your event handler is executed before key() returns the newly created limiter.

updateSettings()

const group = new Bottleneck.Group({ maxConcurrent: 2, minTime: 250 });
group.updateSettings({ minTime: 500 });

After executing the above commands, new limiters will be created with { maxConcurrent: 2, minTime: 500 }.

deleteKey()

  • str: The key for the limiter to delete.

Manually deletes the limiter at the specified key. When using Clustering, the Redis data is immediately deleted and the other Groups in the Cluster will eventually delete their local key automatically, unless it is still being used.
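
For example, continuing the IP-keyed Group from above:

group.deleteKey("77.66.54.32");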

keys()

Returns an array containing all the keys in the Group.

clusterKeys()

Same as group.keys(), but returns all keys in this Group ID across the Cluster.

limiters()

const limiters = group.limiters();

console.log(limiters);
// [ { key: "some key", limiter: <limiter> }, { key: "some other key", limiter: <some other limiter> } ]

Batching

Some APIs can accept multiple operations in a single call. Bottleneck's Batching feature helps you take advantage of those APIs:

const batcher = new Bottleneck.Batcher({
  maxTime: 1000,
  maxSize: 10
});

batcher.on("batch", (batch) => {
  console.log(batch); // ["some-data", "some-other-data"]

  // Handle batch here
});

batcher.add("some-data");
batcher.add("some-other-data");

batcher.add() returns a Promise that resolves once the request has been flushed to a "batch" event.

| Option | Default | Description |
|--------|---------|-------------|
| maxTime | null (unlimited) | Maximum acceptable time (in milliseconds) a request can have to wait before being flushed to the "batch" event. |
| maxSize | null (unlimited) | Maximum number of requests in a batch. |

Batching doesn't throttle requests, it only groups them up optimally according to your maxTime and maxSize settings.

Clustering

Clustering lets many limiters access the same shared state, stored in Redis. Changes to the state are Atomic, Consistent and Isolated (and fully ACID with the right Durability configuration), to eliminate any chances of race conditions or state corruption. Your settings, such as maxConcurrent, minTime, etc., are shared across the whole cluster, which means —for example— that { maxConcurrent: 5 } guarantees no more than 5 jobs can ever run at a time in the entire cluster of limiters. 100% of Bottleneck's features are supported in Clustering mode. Enabling Clustering is as simple as changing a few settings. It's also a convenient way to store or export state for later use.

Bottleneck will attempt to spread load evenly across limiters.

Enabling Clustering

First, add redis or ioredis to your application's dependencies:

# NodeRedis (https://github.com/NodeRedis/node_redis)
npm install --save redis

# or ioredis (https://github.com/luin/ioredis)
npm install --save ioredis

Then create a limiter or a Group:

const limiter = new Bottleneck({
  /* Some basic options */
  maxConcurrent: 5,
  minTime: 500,
  id: "my-super-app", // All limiters with the same id will be clustered together

  /* Clustering options */
  datastore: "redis", // or "ioredis"
  clearDatastore: false,
  clientOptions: {
    host: "127.0.0.1",
    port: 6379

    // Redis client options
    // Using NodeRedis? See https://github.com/NodeRedis/node_redis#options-object-properties
    // Using ioredis? See https://github.com/luin/ioredis/blob/master/API.md#new-redisport-host-options
  }
});

| Option | Default | Description |
|--------|---------|-------------|
| datastore | "local" | Where the limiter stores its internal state. The default ("local") keeps the state in the limiter itself. Set it to "redis" or "ioredis" to enable Clustering. |
| clearDatastore | false | When set to true, on initial startup, the limiter will wipe any existing Bottleneck state data on the Redis db. |
| clientOptions | {} | This object is passed directly to the redis client library you've selected. |
| clusterNodes | null | ioredis only. When clusterNodes is not null, the client will be instantiated by calling new Redis.Cluster(clusterNodes, clientOptions) instead of new Redis(clientOptions). |
| timeout | null (no TTL) | The Redis TTL in milliseconds for the keys created by the limiter. When timeout is set, the limiter's state will be automatically removed from Redis after timeout milliseconds of inactivity. |
| Redis | null | Overrides the import/require of the redis/ioredis library. You shouldn't need to set this option unless your application is failing to start due to a failure to require/import the client library. |

Note: When using Groups, the timeout option has a default of 300000 milliseconds and the generated limiters automatically receive an id with the pattern ${group.id}-${KEY}.

Note: If you are seeing a runtime error due to the require() function not being able to load redis/ioredis, then directly pass the module as the Redis option. Example:

import Redis from "ioredis"

const limiter = new Bottleneck({
  id: "my-super-app",
  datastore: "ioredis",
  clientOptions: { host: '12.34.56.78', port: 6379 },
  Redis
});

Unfortunately, this is a side effect of having to disable inlining, which is necessary to make Bottleneck easy to use in the browser.

Important considerations when Clustering

The first limiter connecting to Redis will store its constructor options on Redis and all subsequent limiters will be using those settings. You can alter the constructor options used by all the connected limiters by calling updateSettings(). The clearDatastore option instructs a new limiter to wipe any previous Bottleneck data (for that id), including previously stored settings.
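
For example, the following sketch raises maxConcurrent for every limiter connected under this id; updateSettings() returns a promise (see the v2 changes below):

limiter.updateSettings({ maxConcurrent: 10 })
.then(() => {
  // All limiters in the cluster now allow up to 10 concurrent jobs
});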

Queued jobs are NOT stored on Redis. They are local to each limiter. Exiting the Node.js process will lose those jobs. This is because Bottleneck has no way to propagate the JS code to run a job across a different Node.js process than the one it originated on. Bottleneck doesn't keep track of the queue contents of the limiters on a cluster for performance and reliability reasons. You can use something like BeeQueue in addition to Bottleneck to get around this limitation.

Due to the above, functionality relying on the queue length happens purely locally:

  • Priorities are local. A higher priority job will run before a lower priority job on the same limiter. Another limiter on the cluster might run a lower priority job before our higher priority one.
  • Assuming constant priority levels, Bottleneck guarantees that jobs will be run in the order they were received on the same limiter. Another limiter on the cluster might run a job received later before ours runs.
  • highWater and load shedding (strategies) are per limiter. However, one limiter entering Blocked mode will put the entire cluster in Blocked mode until penalty milliseconds have passed. See Strategies.
  • The "empty" event is triggered when the (local) queue is empty.
  • The "idle" event is triggered when the (local) queue is empty and no jobs are currently running anywhere in the cluster.

You must work around these limitations in your application code if they are an issue for you. The publish() method could be useful here.

The current design guarantees reliability, is highly performant and lets limiters come and go. Your application can scale up or down, and clients can be disconnected at any time without issues.

It is strongly recommended that you give an id to every limiter and Group since it is used to build the name of your limiter's Redis keys! Limiters with the same id inside the same Redis db will be sharing the same datastore.

It is strongly recommended that you set an expiration (See Job Options) on every job, since that lets the cluster recover from crashed or disconnected clients. Otherwise, a client crashing while executing a job would not be able to tell the cluster to decrease its number of "running" jobs. By using expirations, those lost jobs are automatically cleared after the specified time has passed. Using expirations is essential to keeping a cluster reliable in the face of unpredictable application bugs, network hiccups, and so on.
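
A minimal sketch, where callAPI() is a hypothetical function of your own; the expiration job option caps how long the job may run before the cluster considers it lost:

limiter.schedule({ expiration: 60000, id: "my-job" }, () => callAPI())
.catch((error) => {
  // Jobs that run past their expiration fail with a BottleneckError
});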

Network latency between Node.js and Redis is not taken into account when calculating timings (such as minTime). To minimize the impact of latency, Bottleneck only performs a single Redis call per lifecycle transition. Keeping the Redis server close to your limiters will help you get a more consistent experience. Keeping the system time consistent across all clients will also help.

It is strongly recommended to set up an "error" listener on all your limiters and on your Groups.

Clustering Methods

The ready(), publish() and clients() methods also exist when using the local datastore, for code compatibility reasons: code written for redis/ioredis won't break with local.

ready()

This method returns a promise that resolves once the limiter is connected to Redis.

As of v2.9.0, it's no longer necessary to wait for .ready() to resolve before issuing commands to a limiter. The commands will be queued until the limiter successfully connects. Make sure to listen to the "error" event to handle connection errors.

const limiter = new Bottleneck({/* options */});

limiter.on("error", (err) => {
  // handle network errors
});

limiter.ready()
.then(() => {
  // The limiter is ready
});

publish(message)

This method broadcasts the message string to every limiter in the Cluster. It returns a promise.

const limiter = new Bottleneck({/* options */});

limiter.on("message", (msg) => {
  console.log(msg); // prints "this is a string"
});

limiter.publish("this is a string");

To send objects, stringify them first:

limiter.on("message", (msg) => {
  console.log(JSON.parse(msg).hello) // prints "world"
});

limiter.publish(JSON.stringify({ hello: "world" }));

clients()

If you need direct access to the redis clients, use .clients():

console.log(limiter.clients());
// { client: <Redis Client>, subscriber: <Redis Client> }

Additional Clustering information

  • Bottleneck is compatible with Redis Clusters, but you must use the ioredis datastore and the clusterNodes option (see the sketch after this list).
  • Bottleneck is compatible with Redis Sentinel, but you must use the ioredis datastore.
  • Bottleneck's data is stored in Redis keys starting with b_. It also uses pubsub channels starting with b_. It will not interfere with any other data stored on the server.
  • Bottleneck loads a few Lua scripts on the Redis server using the SCRIPT LOAD command. These scripts only take up a few KB of memory. Running the SCRIPT FLUSH command will cause any connected limiters to experience critical errors until a new limiter connects to Redis and loads the scripts again.
  • The Lua scripts are highly optimized and designed to use as few resources as possible.
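
A minimal sketch of connecting to a Redis Cluster, using a placeholder node address:

const limiter = new Bottleneck({
  id: "my-super-app",
  datastore: "ioredis", // required for Redis Cluster support
  clusterNodes: [
    { host: "127.0.0.1", port: 6379 }
  ]
});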

Managing Redis Connections

Bottleneck needs to create 2 Redis Clients to function, one for normal operations and one for pubsub subscriptions. These 2 clients are kept in a Bottleneck.RedisConnection (NodeRedis) or a Bottleneck.IORedisConnection (ioredis) object, referred to as the Connection object.

By default, every Group and every standalone limiter (a limiter not created by a Group) will create their own Connection object, but it is possible to manually control this behavior. In this example, every Group and limiter is sharing the same Connection object and therefore the same 2 clients:

const connection = new Bottleneck.RedisConnection({
  clientOptions: {/* NodeRedis/ioredis options */}
  // ioredis also accepts `clusterNodes` here
});


const limiter = new Bottleneck({ connection: connection });
const group = new Bottleneck.Group({ connection: connection });

You can access and reuse the Connection object of any Group or limiter:

const group = new Bottleneck.Group({ connection: limiter.connection });

When a Connection object is created manually, the connectivity "error" events are emitted on the Connection itself.

connection.on("error", (err) => { /* handle connectivity errors here */ });

If you already have a NodeRedis/ioredis client, you can ask Bottleneck to reuse it, although currently the Connection object will still create a second client for pubsub operations:

import Redis from "redis";
const client = Redis.createClient({/* options */});

const connection = new Bottleneck.RedisConnection({
  // `clientOptions` and `clusterNodes` will be ignored since we're passing a raw client
  client: client
});

const limiter = new Bottleneck({ connection: connection });
const group = new Bottleneck.Group({ connection: connection });

Depending on your application, using more clients can improve performance.

Use the disconnect(flush) method to close the Redis clients.

limiter.disconnect();
group.disconnect();

If you created the Connection object manually, you need to call connection.disconnect() instead, for safety reasons.

Debugging your application

Debugging complex scheduling logic can be difficult, especially when priorities, weights, and network latency all interact with one another.

If your application is not behaving as expected, start by making sure you're catching "error" events emitted by your limiters and your Groups. Those errors are most likely uncaught exceptions from your application code.

Make sure you've read the 'Gotchas' section.

To see exactly what a limiter is doing in real time, listen to the "debug" event. It contains detailed information about how the limiter is executing your code. Adding job IDs to all your jobs makes the debug output more readable.
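
A minimal sketch of a "debug" listener:

limiter.on("debug", (message, data) => {
  console.log(message, data);
});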

When Bottleneck has to fail one of your jobs, it does so by using BottleneckError objects. This lets you tell those errors apart from your own code's errors:

limiter.schedule(fn)
.then((result) => { /* ... */ } )
.catch((error) => {
  if (error instanceof Bottleneck.BottleneckError) {
    /* ... */
  }
});

Upgrading to v2

The internal algorithms essentially haven't changed from v1, but many small changes to the interface were made to introduce new features.

All the breaking changes:

  • Bottleneck v2 requires Node 6+ or a modern browser. Use require("bottleneck/es5") if you need ES5 support in v2. Bottleneck v1 will continue to use ES5 only.
  • The Bottleneck constructor now takes an options object. See Constructor.
  • The Cluster feature is now called Group. This is to distinguish it from the new v2 Clustering feature.
  • The Group constructor takes an options object to match the limiter constructor.
  • Jobs take an optional options object. See Job options.
  • Removed submitPriority(), use submit() with an options object instead (see the sketch after this list).
  • Removed schedulePriority(), use schedule() with an options object instead.
  • The rejectOnDrop option is now true by default. It can be set to false if you wish to retain v1 behavior. However, this option is left undocumented, as disabling it is considered poor practice.
  • Use null instead of 0 to indicate an unlimited maxConcurrent value.
  • Use null instead of -1 to indicate an unlimited highWater value.
  • Renamed changeSettings() to updateSettings(); it now returns a promise to indicate completion. It takes the same options object as the constructor.
  • Renamed nbQueued() to queued().
  • Renamed nbRunning() to running(); it now returns its result using a promise.
  • Removed isBlocked().
  • Changing the Promise library is now done through the options object like any other limiter setting.
  • Removed changePenalty(), it is now done through the options object like any other limiter setting.
  • Removed changeReservoir(), it is now done through the options object like any other limiter setting.
  • Removed stopAll(). Use the new stop() method.
  • check() now accepts an optional weight argument, and returns its result using a promise.
  • Removed the Group changeTimeout() method. Instead, pass a timeout option when creating a Group.
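
As an example of the new job options object replacing the old priority methods, a before-and-after sketch, where someFn and callback stand in for your own function and callback:

// v1
// limiter.submitPriority(1, someFn, callback);

// v2
limiter.submit({ priority: 1 }, someFn, callback);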

Version 2 is more user-friendly and powerful.

After upgrading your code, please take a minute to read the Debugging your application chapter.

Contributing

This README is always in need of improvements. If wording can be clearer and simpler, please consider forking this repo and submitting a Pull Request, or simply opening an issue.

Suggestions and bug reports are also welcome.

To work on the Bottleneck code, simply clone the repo, make your changes to the files located in src/ only, then run ./scripts/build.sh && npm test to ensure that everything is set up correctly.

To speed up compilation time during development, run ./scripts/build.sh dev instead. Make sure to build and test without dev before submitting a PR.

The tests must also pass in Clustering mode and using the ES5 bundle. You'll need a Redis server running locally (latency needs to be minimal to run the tests). If the server isn't using the default hostname and port, you can set those in the .env file. Then run ./scripts/build.sh && npm run test-all.

All contributions are appreciated and will be considered.

Author: SGrondin
Source Code: https://github.com/SGrondin/bottleneck 
License: MIT License

#node #clustering #scheduling 

The Data Ways

Automate Birthday Email To Customer Every Year with Google Sheets in 5 minutes

A simple 5-minute setup to automate birthday greeting emails to your customers every year, without missing a single one

#googlesheets #automation #coding #scheduling #email

Monty Boehm

Scheduling: setTimeout and setInterval

We may decide to execute a function not right now, but at a later time. That’s called “scheduling a call”.

There are two methods for it:

  • setTimeout allows us to run a function once after the interval of time.
  • setInterval allows us to run a function repeatedly, starting after the interval of time, then repeating continuously at that interval.

These methods are not a part of JavaScript specification. But most environments have an internal scheduler and provide these methods. In particular, they are supported in all browsers and Node.js.
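
As a quick sketch of setInterval, the following prints a message every second and cancels itself after five ticks:

let count = 0;
let intervalId = setInterval(() => {
  console.log("tick", ++count);
  if (count === 5) clearInterval(intervalId);
}, 1000);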

setTimeout

The syntax:

let timerId = setTimeout(func|code, [delay], [arg1], [arg2], ...)

Parameters:

func|code: Function or a string of code to execute.

Usually, that’s a function. For historical reasons, a string of code can be passed, but that’s not recommended.

delay: The delay before running, in milliseconds (1000 ms = 1 second), by default 0.

arg1, arg2, …: Arguments for the function (not supported in IE9-)
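
Putting the parameters together, a minimal sketch that passes arguments to the scheduled function:

function greet(phrase, who) {
  console.log(phrase + ", " + who);
}

// Calls greet("Hello", "John") after 1000 ms
let timerId = setTimeout(greet, 1000, "Hello", "John");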

#javascript #programming #scheduling


Advanced Kubernetes Scheduling: Part 1

Introduction

The built-in Kubernetes scheduler assigns workloads based on a multitude of factors, such as resource needs and quality of service, which can be provided to the Kubernetes scheduler as flags. In addition to these, as a user, you can use certain techniques to affect scheduling decisions. In real-world workloads, there are needs such as:

  • Run a set of pods only on certain nodes, for example running pods with ML workloads on nodes with GPU attached.

  • Always run a set of pods on the same nodes, for example pods that communicate with each other frequently.

  • Never run two particular pods together, and so on.

#kubernetes #scheduling


Kubernetes Scheduling: Node Affinity

This article continues from Part 1 of advanced Kubernetes scheduling. In Part 1, we discussed taints and tolerations. In this article, we will take a look at other scheduling mechanisms provided by Kubernetes that can help us direct workloads to a particular node or schedule pods together.

#kubernetes #scheduling #node

Waylon Bruen

How to Schedule Tasks using Chrono in Golang

In this article, you'll learn how to schedule tasks and cancel scheduled tasks using Chrono, a scheduler library that lets you run your tasks and code periodically.

Scheduling tasks in Golang is quite easy with the help of the Chrono library.

Let’s now jump into the examples and see how easy it is.

Scheduling a One-Shot Task

The Schedule method helps us schedule a task to run once at the specified time. In the following example, the task will first be executed 1 second after the current time.

The WithStartTime option is used to specify the execution time.

package main

import (
	"context"
	"log"
	"time"

	"github.com/procyon-projects/chrono"
)

func main() {
	taskScheduler := chrono.NewDefaultTaskScheduler()
	now := time.Now()

	// Runs once, 1 second from now. The returned task handle (ignored
	// here) can cancel the task via its Cancel() method.
	_, err := taskScheduler.Schedule(func(ctx context.Context) {
		log.Print("One-Shot Task")
	}, chrono.WithStartTime(now.Year(), now.Month(), now.Day(), now.Hour(), now.Minute(), now.Second()+1))
	if err == nil {
		log.Print("Task has been scheduled successfully.")
	}

	time.Sleep(2 * time.Second) // keep the process alive so the task can fire
}

#libraries #go #scheduling #golang
