Gordon Matlala

1626454860

Detecting Locked Bicycle Stations: An AWS Serverless Story (Part 1)

This series of articles is about me spending way too much time trying to solve a niche problem while learning how to use the AWS Serverless stack (and Kafka, initially).

It’s not supposed to be a tutorial or an exhaustive presentation, but more a story of how I built and hosted a “real” application and the challenges I faced. Hopefully, you will end up with a better understanding of the AWS Serverless stack and what it means to develop with it. Enjoy!

Lockdown and dead stations

Paris, May 2020, end of the first lockdown. I decide to join some of my friends for a picnic. So, like many Parisians, I’m headed toward the Bois de Vincennes, Paris’ largest public park, on the eastern edge of the city.

The weather is perfect for a short bike ride, and I quickly rent a Velib, one of the many public bikes that can be rented and returned at any of the 1,500 stations scattered across the city. Arriving near the park, I look at the official app to find the nearest station with available slots to return my bike. As I approach, something seems off. I see several people, puzzled, trying to rent or return a bike. I sigh. The station is locked. Dead.

Velib was created in 2007 and quickly proved successful, becoming one of the most used bike-sharing platforms in the world. In 2017, at the end of the initial 10-year contract, the city chose a new operator to manage and develop the network. And let’s just say the change was not smooth. Three years later, the main problems have been solved, but Velib users still face two common hurdles. The first is broken bikes that are not removed from the stations; disappointed cyclists turn their saddles backwards to kindly warn the next user. The second, rarer one is stations that are “locked” for a few hours without any notice on the official app or website.

After finding another station, I join my friends and wash away my misadventure with a cold beer. But I can’t stop thinking about it. It’s not the first time I’ve had to find another station because one was unusable. Why didn’t the official app show the station as dead? Isn’t it easy to detect a station that suddenly stopped renting or returning bikes? How hard could it be with the proper data?

The Velib API

Luckily, Velib exposes a public API. The same data can also be found in the Paris OpenData project, which incorporates and exposes a lot of public data about Paris (if you have ever looked for the dataset of every one of the 200,000 trees in Paris, it’s there!).

Veronica Roob

1653475560

A Pure PHP Implementation Of The MessagePack Serialization Format

msgpack.php

A pure PHP implementation of the MessagePack serialization format.

Features

Installation

The recommended way to install the library is through Composer:

composer require rybakit/msgpack

Usage

Packing

To pack values you can either use an instance of a Packer:

$packer = new Packer();
$packed = $packer->pack($value);

or call a static method on the MessagePack class:

$packed = MessagePack::pack($value);

In the examples above, the pack method automatically packs a value depending on its type. However, not all PHP types can be uniquely translated to MessagePack types. For example, the MessagePack format defines map and array types, which are both represented by the single array type in PHP. By default, the packer packs a PHP array as a MessagePack array if it has sequential numeric keys starting from 0, and as a MessagePack map otherwise:

$mpArr1 = $packer->pack([1, 2]);               // MP array [1, 2]
$mpArr2 = $packer->pack([0 => 1, 1 => 2]);     // MP array [1, 2]
$mpMap1 = $packer->pack([0 => 1, 2 => 3]);     // MP map {0: 1, 2: 3}
$mpMap2 = $packer->pack([1 => 2, 2 => 3]);     // MP map {1: 2, 2: 3}
$mpMap3 = $packer->pack(['a' => 1, 'b' => 2]); // MP map {a: 1, b: 2}
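The default array/map rule above can be sketched in plain PHP. This is a minimal illustration, not the library's code. isSequential() and fixContainerHeader() are hypothetical helper names. Only the small "fix" containers (at most 15 elements) are handled, whose MessagePack header byte is 0x90 | n for an array and 0x80 | n for a map.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true when the keys are exactly 0, 1, 2, ... in order,
// i.e. when the default rule would pack the value as a MessagePack array.
function isSequential(array $value): bool
{
    $expected = 0;
    foreach ($value as $key => $_) {
        if ($key !== $expected++) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: header byte for containers with at most 15 elements.
// fixarray = 0x90 | n, fixmap = 0x80 | n (MessagePack specification).
function fixContainerHeader(array $value): string
{
    $n = count($value);
    if ($n > 15) {
        throw new LengthException('only fixarray/fixmap (<= 15 elements) handled here');
    }
    return chr((isSequential($value) ? 0x90 : 0x80) | $n);
}
```

On PHP 8.1+, the built-in array_is_list() performs the same key check as isSequential().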

However, sometimes you need to pack a sequential array as a MessagePack map. To do this, use the packMap method:

$mpMap = $packer->packMap([1, 2]); // {0: 1, 1: 2}

Here is a list of type-specific packing methods:

$packer->packNil();           // MP nil
$packer->packBool(true);      // MP bool
$packer->packInt(42);         // MP int
$packer->packFloat(M_PI);     // MP float (32 or 64)
$packer->packFloat32(M_PI);   // MP float 32
$packer->packFloat64(M_PI);   // MP float 64
$packer->packStr('foo');      // MP str
$packer->packBin("\x80");     // MP bin
$packer->packArray([1, 2]);   // MP array
$packer->packMap(['a' => 1]); // MP map
$packer->packExt(1, "\xaa");  // MP ext

See the "Custom types" section below for how to pack custom types.

Packing options

The Packer object supports a number of bitmask-based options for fine-tuning the packing process (defaults are marked "(default)"):

Name            Description
FORCE_STR       Forces PHP strings to be packed as MessagePack UTF-8 strings
FORCE_BIN       Forces PHP strings to be packed as MessagePack binary data
DETECT_STR_BIN  Detects MessagePack str/bin type automatically (default)

FORCE_ARR       Forces PHP arrays to be packed as MessagePack arrays
FORCE_MAP       Forces PHP arrays to be packed as MessagePack maps
DETECT_ARR_MAP  Detects MessagePack array/map type automatically (default)

FORCE_FLOAT32   Forces PHP floats to be packed as 32-bit MessagePack floats
FORCE_FLOAT64   Forces PHP floats to be packed as 64-bit MessagePack floats (default)

The type detection mode (DETECT_STR_BIN/DETECT_ARR_MAP) adds some overhead, which becomes noticeable when you pack large (16- and 32-bit) arrays or strings. However, you may know the value type in advance (for example, you only work with UTF-8 strings and/or associative arrays). In that case you can eliminate this overhead by forcing the packer to use the appropriate type, which saves it from running the auto-detection routine. Another option is to explicitly specify the value type. The library provides two auxiliary classes for this, Map and Bin. Check the "Custom types" section below for details.
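To make the str/bin choice concrete, here is a sketch that picks the MessagePack type by checking whether the PHP string is valid UTF-8. It is an assumption that the library's detection works this way, and packStringOrBinary() is a hypothetical name. If the check is anything like this, it scans the entire string, which is why its cost grows with value size. The header bytes follow the MessagePack specification.

```php
<?php
declare(strict_types=1);

// Assumed detection strategy: preg_match() with the /u modifier returns
// false when the subject is not valid UTF-8, and 1 otherwise.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: valid UTF-8 -> str family, anything else -> bin family.
// fixstr = 0xa0 | len (len <= 31); str 8/16/32 = 0xd9/0xda/0xdb;
// bin 8/16/32 = 0xc4/0xc5/0xc6. Lengths are big-endian.
function packStringOrBinary(string $s): string
{
    $len = strlen($s);
    if (isValidUtf8($s)) {
        if ($len <= 31) {
            $header = chr(0xa0 | $len);
        } elseif ($len <= 0xff) {
            $header = "\xd9" . chr($len);
        } elseif ($len <= 0xffff) {
            $header = "\xda" . pack('n', $len);
        } else {
            $header = "\xdb" . pack('N', $len);
        }
    } elseif ($len <= 0xff) {
        $header = "\xc4" . chr($len);
    } elseif ($len <= 0xffff) {
        $header = "\xc5" . pack('n', $len);
    } else {
        $header = "\xc6" . pack('N', $len);
    }
    return $header . $s;
}
```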

Examples:

// detect str/bin type and pack PHP 64-bit floats (doubles) to MP 32-bit floats
$packer = new Packer(PackOptions::DETECT_STR_BIN | PackOptions::FORCE_FLOAT32);

// these will throw MessagePack\Exception\InvalidOptionException
$packer = new Packer(PackOptions::FORCE_STR | PackOptions::FORCE_BIN);
$packer = new Packer(PackOptions::FORCE_FLOAT32 | PackOptions::FORCE_FLOAT64);
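The exceptions above come from combining mutually exclusive flags. The following sketch models that check with bitmasks. DemoOptions and its bit values are hypothetical; the library's real constants live in PackOptions and may have different values. Each group may have at most one bit set. The test `$set & ($set - 1)` clears the lowest set bit, so a non-zero result means two or more flags from the same group were combined.

```php
<?php
declare(strict_types=1);

// Hypothetical option flags for illustration only (not PackOptions' values).
final class DemoOptions
{
    public const FORCE_STR      = 1 << 0;
    public const FORCE_BIN      = 1 << 1;
    public const DETECT_STR_BIN = 1 << 2;
    public const FORCE_ARR      = 1 << 3;
    public const FORCE_MAP      = 1 << 4;
    public const DETECT_ARR_MAP = 1 << 5;
    public const FORCE_FLOAT32  = 1 << 6;
    public const FORCE_FLOAT64  = 1 << 7;

    // Each entry groups flags that are mutually exclusive.
    private const GROUPS = [
        self::FORCE_STR | self::FORCE_BIN | self::DETECT_STR_BIN,
        self::FORCE_ARR | self::FORCE_MAP | self::DETECT_ARR_MAP,
        self::FORCE_FLOAT32 | self::FORCE_FLOAT64,
    ];

    public static function validate(int $options): void
    {
        foreach (self::GROUPS as $group) {
            $set = $options & $group;
            // Non-zero after clearing the lowest set bit => 2+ flags in the group.
            if (($set & ($set - 1)) !== 0) {
                throw new InvalidArgumentException('mutually exclusive options combined');
            }
        }
    }
}
```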

Unpacking

To unpack data you can either use an instance of a BufferUnpacker:

$unpacker = new BufferUnpacker();

$unpacker->reset($packed);
$value = $unpacker->unpack();

or call a static method on the MessagePack class:

$value = MessagePack::unpack($packed);

If the packed data is received in chunks (e.g. when reading from a stream), use the tryUnpack method, which attempts to unpack data and returns an array of unpacked messages (if any) instead of throwing an InsufficientDataException:

while ($chunk = ...) {
    $unpacker->append($chunk);
    if ($messages = $unpacker->tryUnpack()) {
        return $messages;
    }
}
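The append()/tryUnpack() pattern can be illustrated with a toy decoder. ToyStreamUnpacker is hypothetical and is not the library's BufferUnpacker. It only understands positive fixint (0x00-0x7f) and fixstr (0xa0-0xbf) values. The key idea is that complete messages are returned and the bytes of a partially received message stay buffered until the next chunk arrives. It assumes PHP 7.4+ for the typed property.

```php
<?php
declare(strict_types=1);

// Hypothetical toy decoder: positive fixint and fixstr only.
final class ToyStreamUnpacker
{
    private string $buffer = '';

    public function append(string $chunk): void
    {
        $this->buffer .= $chunk;
    }

    /** Returns every complete message currently in the buffer (possibly none). */
    public function tryUnpack(): array
    {
        $messages = [];
        $offset = 0;
        $size = strlen($this->buffer);
        while ($offset < $size) {
            $byte = ord($this->buffer[$offset]);
            if ($byte <= 0x7f) {                          // positive fixint
                $messages[] = $byte;
                $offset += 1;
            } elseif ($byte >= 0xa0 && $byte <= 0xbf) {   // fixstr
                $len = $byte & 0x1f;
                if ($offset + 1 + $len > $size) {
                    break;                                // body not fully received yet
                }
                $messages[] = substr($this->buffer, $offset + 1, $len);
                $offset += 1 + $len;
            } else {
                throw new UnexpectedValueException(sprintf('unsupported type byte 0x%02x', $byte));
            }
        }
        // Drop consumed bytes; keep a partial message for the next call.
        $this->buffer = substr($this->buffer, $offset);
        return $messages;
    }
}
```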

If you want to unpack from a specific position in a buffer, use seek:

$unpacker->seek(42); // set position equal to 42 bytes
$unpacker->seek(-8); // set position to 8 bytes before the end of the buffer

To skip bytes from the current position, use skip:

$unpacker->skip(10); // set position to 10 bytes ahead of the current position

To get the number of remaining (unread) bytes in the buffer:

$unreadBytesCount = $unpacker->getRemainingCount();

To check whether the buffer has unread data:

$hasUnreadBytes = $unpacker->hasRemaining();

If needed, you can remove already read data from the buffer by calling:

$releasedBytesCount = $unpacker->release();

With the read method you can read raw (packed) data:

$packedData = $unpacker->read(2); // read 2 bytes
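The offset arithmetic behind seek(), skip(), read() and release() can be modeled with a small class. ByteCursor is hypothetical and only mirrors the behaviour described above. Negative seek offsets count from the end of the buffer. release() discards the bytes already read and reports how many were dropped. The library's own bounds handling and exceptions may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical model of the documented buffer navigation semantics.
final class ByteCursor
{
    private string $buffer;
    private int $offset = 0;

    public function __construct(string $buffer)
    {
        $this->buffer = $buffer;
    }

    public function seek(int $offset): void
    {
        // Negative offsets are relative to the end of the buffer.
        $position = $offset < 0 ? strlen($this->buffer) + $offset : $offset;
        if ($position < 0 || $position > strlen($this->buffer)) {
            throw new OutOfBoundsException("invalid offset $offset");
        }
        $this->offset = $position;
    }

    public function skip(int $length): void
    {
        if ($length < 0 || $this->offset + $length > strlen($this->buffer)) {
            throw new OutOfBoundsException("cannot skip $length bytes");
        }
        $this->offset += $length;
    }

    public function read(int $length): string
    {
        $data = substr($this->buffer, $this->offset, $length);
        if (strlen($data) !== $length) {
            throw new OutOfBoundsException('not enough data');
        }
        $this->offset += $length;
        return $data;
    }

    public function getRemainingCount(): int
    {
        return strlen($this->buffer) - $this->offset;
    }

    public function hasRemaining(): bool
    {
        return $this->offset < strlen($this->buffer);
    }

    public function release(): int
    {
        // Remove already-read bytes; reading continues from the new start.
        $released = $this->offset;
        $this->buffer = substr($this->buffer, $released);
        $this->offset = 0;
        return $released;
    }
}
```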

Besides the above methods, BufferUnpacker provides type-specific unpacking methods, namely:

$unpacker->unpackNil();   // PHP null
$unpacker->unpackBool();  // PHP bool
$unpacker->unpackInt();   // PHP int
$unpacker->unpackFloat(); // PHP float
$unpacker->unpackStr();   // PHP UTF-8 string
$unpacker->unpackBin();   // PHP binary string
$unpacker->unpackArray(); // PHP sequential array
$unpacker->unpackMap();   // PHP associative array
$unpacker->unpackExt();   // PHP MessagePack\Type\Ext object

Unpacking options

The BufferUnpacker object supports a number of bitmask-based options for fine-tuning the unpacking process (defaults are marked "(default)"):

Name           Description
BIGINT_AS_STR  Converts overflowed integers to strings [1] (default)
BIGINT_AS_GMP  Converts overflowed integers to GMP objects [2]
BIGINT_AS_DEC  Converts overflowed integers to Decimal\Decimal objects [3]

1. The largest integer type in the MessagePack format is unsigned 64-bit, but PHP integers are signed (at most 64-bit), so values above PHP_INT_MAX overflow during unpacking.

2. Make sure the GMP extension is enabled.

3. Make sure the Decimal extension is enabled.

Examples:

$packedUint64 = "\xcf"."\xff\xff\xff\xff"."\xff\xff\xff\xff";

$unpacker = new BufferUnpacker($packedUint64);
var_dump($unpacker->unpack()); // string(20) "18446744073709551615"

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_GMP);
var_dump($unpacker->unpack()); // object(GMP) {...}

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_DEC);
var_dump($unpacker->unpack()); // object(Decimal\Decimal) {...}
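The overflow is easy to reproduce without the library. This sketch assumes PHP 8.0+ on a 64-bit build, and decodeUint64() is a hypothetical name. It reads a MessagePack uint 64 value (0xcf followed by 8 big-endian bytes) with unpack('J'). Because PHP integers are signed, values above PHP_INT_MAX come back negative. sprintf('%u') reinterprets the same 64 bits as an unsigned decimal string, which matches the string result of the default mode shown above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: decode "\xcf" + 8 bytes; return int when it fits,
// otherwise the unsigned value as a decimal string.
function decodeUint64(string $packed): int|string
{
    if (strlen($packed) !== 9 || $packed[0] !== "\xcf") {
        throw new InvalidArgumentException('expected 0xcf followed by 8 bytes');
    }
    // 'J' = unsigned 64-bit big-endian, but the PHP int holding it is signed.
    $value = unpack('J', $packed, 1)[1];
    // A negative result means the top bit was set: print the bits as unsigned.
    return $value < 0 ? sprintf('%u', $value) : $value;
}
```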

Custom types

In addition to the basic types, the library provides functionality to serialize and deserialize arbitrary types. This can be done in several ways, depending on your use case. Let's take a look at them.

Type objects

If you need to serialize an instance of one of your classes into one of the basic MessagePack types, the best way to do this is to implement the CanBePacked interface in the class. A good example of such a class is the Map type class that comes with the library. This type is useful when you want to explicitly specify that a given PHP array should be packed as a MessagePack map without triggering an automatic type detection routine:

$packer = new Packer();

$packedMap = $packer->pack(new Map([1, 2, 3]));
$packedArray = $packer->pack([1, 2, 3]);

More type examples can be found in the src/Type directory.

Type transformers

As with type objects, type transformers are only responsible for serializing values. They should be used when you need to serialize a value that does not implement the CanBePacked interface. Examples of such values could be instances of built-in or third-party classes that you don't own, or non-objects such as resources.

A transformer class must implement the CanPack interface. To use a transformer, it must first be registered in the packer. Here is an example of how to serialize PHP streams into the MessagePack bin format type using one of the supplied transformers, StreamTransformer:

$packer = new Packer(null, [new StreamTransformer()]);

$packedBin = $packer->pack(fopen('/path/to/file', 'r+'));

More type transformer examples can be found in the src/TypeTransformer directory.

Extensions

In contrast to the cases described above, extensions are intended to handle extension types and are responsible for both serialization and deserialization of values (types).

An extension class must implement the Extension interface. To use an extension, it must first be registered in the packer and the unpacker.

The MessagePack specification divides extension types into two groups: predefined and application-specific. Currently, there is only one predefined type in the specification, Timestamp.

Timestamp

The Timestamp extension type is a predefined type. The library supports it through the TimestampExtension class. This class handles Timestamp objects, which represent a number of seconds and an optional adjustment in nanoseconds:

$timestampExtension = new TimestampExtension();

$packer = new Packer();
$packer = $packer->extendWith($timestampExtension);

$unpacker = new BufferUnpacker();
$unpacker = $unpacker->extendWith($timestampExtension);

$packedTimestamp = $packer->pack(Timestamp::now());
$timestamp = $unpacker->reset($packedTimestamp)->unpack();

$seconds = $timestamp->getSeconds();
$nanoseconds = $timestamp->getNanoseconds();

When using the MessagePack class, the Timestamp extension is already registered:

$packedTimestamp = MessagePack::pack(Timestamp::now());
$timestamp = MessagePack::unpack($packedTimestamp);
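For reference, the MessagePack specification defines three wire forms for Timestamp (extension type -1). The sketch below applies the spec's selection rule. encodeTimestamp() is a hypothetical name, not part of the library, and it assumes a 64-bit PHP build.
- timestamp 32 is used when there are no nanoseconds and the seconds fit in 32 unsigned bits.
- timestamp 64 is used when the seconds fit in 34 bits; the nanoseconds go in the upper 30 bits.
- timestamp 96 is used for everything else, including negative seconds.

```php
<?php
declare(strict_types=1);

// Hypothetical encoder following the MessagePack Timestamp spec.
function encodeTimestamp(int $sec, int $nsec): string
{
    if ($nsec < 0 || $nsec > 999999999) {
        throw new InvalidArgumentException('nanoseconds must be in [0, 999999999]');
    }
    if (($sec >> 34) === 0) {                 // 0 <= sec < 2^34
        if ($nsec === 0 && ($sec >> 32) === 0) {
            // timestamp 32: fixext 4 (0xd6), type -1 (0xff), uint32 seconds
            return "\xd6\xff" . pack('N', $sec);
        }
        // timestamp 64: fixext 8 (0xd7), type -1, 30-bit nsec | 34-bit sec
        return "\xd7\xff" . pack('J', ($nsec << 34) | $sec);
    }
    // timestamp 96: ext 8 (0xc7), length 12, type -1, uint32 nsec, int64 sec
    return "\xc7\x0c\xff" . pack('N', $nsec) . pack('J', $sec);
}
```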

Application-specific extensions

In addition, the format can be extended with your own types. For example, to make the built-in PHP DateTime objects first-class citizens in your code, you can create a corresponding extension, as shown in the example. Please note that custom extensions have to be registered with a unique extension ID (an integer from 0 to 127).

More extension examples can be found in the examples/MessagePack directory.
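Whatever payload an application-specific extension produces, it travels inside an ext frame on the wire. The sketch below wraps a payload in the smallest ext header the MessagePack specification allows. Payloads of exactly 1, 2, 4, 8 or 16 bytes use a fixext header; other lengths use ext 8/16/32 with an explicit length. packExtFrame() is a hypothetical helper, not the library's Extension interface. Its output for type 1 and payload "\xaa" matches what the packExt(1, "\xaa") call shown earlier is expected to produce.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: frame an application-specific payload as a MessagePack ext.
function packExtFrame(int $type, string $data): string
{
    if ($type < 0 || $type > 127) {
        // Negative IDs are reserved for predefined types such as Timestamp (-1).
        throw new InvalidArgumentException('application ext type must be 0..127');
    }
    $typeByte = chr($type);
    $len = strlen($data);
    // fixext 1/2/4/8/16 = 0xd4..0xd8 (no length field)
    $fixext = [1 => "\xd4", 2 => "\xd5", 4 => "\xd6", 8 => "\xd7", 16 => "\xd8"];
    if (isset($fixext[$len])) {
        return $fixext[$len] . $typeByte . $data;
    }
    if ($len <= 0xff) {
        return "\xc7" . chr($len) . $typeByte . $data;        // ext 8
    }
    if ($len <= 0xffff) {
        return "\xc8" . pack('n', $len) . $typeByte . $data;  // ext 16
    }
    return "\xc9" . pack('N', $len) . $typeByte . $data;      // ext 32
}
```

For instance, a DateTime extension could choose a hypothetical ID such as 1 and a payload such as `$dt->format(DATE_ATOM)`. The ID and the payload format are application decisions, not something the specification fixes.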

To learn more about how extension types can be useful, check out this article.

Exceptions

If an error occurs during packing/unpacking, a PackingFailedException or an UnpackingFailedException will be thrown, respectively. In addition, an InsufficientDataException can be thrown during unpacking.

An InvalidOptionException will be thrown if an invalid option (or a combination of mutually exclusive options) is used.

Tests

Run tests as follows:

vendor/bin/phpunit

Also, if you already have Docker installed, you can run the tests in a Docker container. First, build an image:

./dockerfile.sh | docker build -t msgpack -

The command above will build an image named msgpack with the PHP 8.1 runtime. You may change the default runtime by defining the PHP_IMAGE environment variable:

PHP_IMAGE='php:8.0-cli' ./dockerfile.sh | docker build -t msgpack -

See a list of various images here.

Then run the unit tests:

docker run --rm -v $PWD:/msgpack -w /msgpack msgpack

Fuzzing

To ensure that unpacking works correctly with malformed or semi-malformed data, you can use a testing technique called fuzzing. The library ships with a fuzzing target file for PHP-Fuzzer, which can be used as follows:

php-fuzzer fuzz tests/fuzz_buffer_unpacker.php

Performance

To check performance, run:

php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target            Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0030 ........ 0.0139
false ................ 0.0037 ........ 0.0144
true ................. 0.0040 ........ 0.0137
7-bit uint #1 ........ 0.0052 ........ 0.0120
7-bit uint #2 ........ 0.0059 ........ 0.0114
7-bit uint #3 ........ 0.0061 ........ 0.0119
5-bit sint #1 ........ 0.0067 ........ 0.0126
5-bit sint #2 ........ 0.0064 ........ 0.0132
5-bit sint #3 ........ 0.0066 ........ 0.0135
8-bit uint #1 ........ 0.0078 ........ 0.0200
8-bit uint #2 ........ 0.0077 ........ 0.0212
8-bit uint #3 ........ 0.0086 ........ 0.0203
16-bit uint #1 ....... 0.0111 ........ 0.0271
16-bit uint #2 ....... 0.0115 ........ 0.0260
16-bit uint #3 ....... 0.0103 ........ 0.0273
32-bit uint #1 ....... 0.0116 ........ 0.0326
32-bit uint #2 ....... 0.0118 ........ 0.0332
32-bit uint #3 ....... 0.0127 ........ 0.0325
64-bit uint #1 ....... 0.0140 ........ 0.0277
64-bit uint #2 ....... 0.0134 ........ 0.0294
64-bit uint #3 ....... 0.0134 ........ 0.0281
8-bit int #1 ......... 0.0086 ........ 0.0241
8-bit int #2 ......... 0.0089 ........ 0.0225
8-bit int #3 ......... 0.0085 ........ 0.0229
16-bit int #1 ........ 0.0118 ........ 0.0280
16-bit int #2 ........ 0.0121 ........ 0.0270
16-bit int #3 ........ 0.0109 ........ 0.0274
32-bit int #1 ........ 0.0128 ........ 0.0346
32-bit int #2 ........ 0.0118 ........ 0.0339
32-bit int #3 ........ 0.0135 ........ 0.0368
64-bit int #1 ........ 0.0138 ........ 0.0276
64-bit int #2 ........ 0.0132 ........ 0.0286
64-bit int #3 ........ 0.0137 ........ 0.0274
64-bit int #4 ........ 0.0180 ........ 0.0285
64-bit float #1 ...... 0.0134 ........ 0.0284
64-bit float #2 ...... 0.0125 ........ 0.0275
64-bit float #3 ...... 0.0126 ........ 0.0283
fix string #1 ........ 0.0035 ........ 0.0133
fix string #2 ........ 0.0094 ........ 0.0216
fix string #3 ........ 0.0094 ........ 0.0222
fix string #4 ........ 0.0091 ........ 0.0241
8-bit string #1 ...... 0.0122 ........ 0.0301
8-bit string #2 ...... 0.0118 ........ 0.0304
8-bit string #3 ...... 0.0119 ........ 0.0315
16-bit string #1 ..... 0.0150 ........ 0.0388
16-bit string #2 ..... 0.1545 ........ 0.1665
32-bit string ........ 0.1570 ........ 0.1756
wide char string #1 .. 0.0091 ........ 0.0236
wide char string #2 .. 0.0122 ........ 0.0313
8-bit binary #1 ...... 0.0100 ........ 0.0302
8-bit binary #2 ...... 0.0123 ........ 0.0324
8-bit binary #3 ...... 0.0126 ........ 0.0327
16-bit binary ........ 0.0168 ........ 0.0372
32-bit binary ........ 0.1588 ........ 0.1754
fix array #1 ......... 0.0042 ........ 0.0131
fix array #2 ......... 0.0294 ........ 0.0367
fix array #3 ......... 0.0412 ........ 0.0472
16-bit array #1 ...... 0.1378 ........ 0.1596
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.1865 ........ 0.2283
fix map #1 ........... 0.0725 ........ 0.1048
fix map #2 ........... 0.0319 ........ 0.0405
fix map #3 ........... 0.0356 ........ 0.0665
fix map #4 ........... 0.0465 ........ 0.0497
16-bit map #1 ........ 0.2540 ........ 0.3028
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.2372 ........ 0.2710
fixext 1 ............. 0.0283 ........ 0.0358
fixext 2 ............. 0.0291 ........ 0.0371
fixext 4 ............. 0.0302 ........ 0.0355
fixext 8 ............. 0.0288 ........ 0.0384
fixext 16 ............ 0.0293 ........ 0.0359
8-bit ext ............ 0.0302 ........ 0.0439
16-bit ext ........... 0.0334 ........ 0.0499
32-bit ext ........... 0.1845 ........ 0.1888
32-bit timestamp #1 .. 0.0337 ........ 0.0547
32-bit timestamp #2 .. 0.0335 ........ 0.0560
64-bit timestamp #1 .. 0.0371 ........ 0.0575
64-bit timestamp #2 .. 0.0374 ........ 0.0542
64-bit timestamp #3 .. 0.0356 ........ 0.0533
96-bit timestamp #1 .. 0.0362 ........ 0.0699
96-bit timestamp #2 .. 0.0381 ........ 0.0701
96-bit timestamp #3 .. 0.0367 ........ 0.0687
=============================================
Total                  2.7618          4.0820
Skipped                     4               4
Failed                      0               0
Ignored                     0               0

With JIT:

php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target            Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0005 ........ 0.0054
false ................ 0.0004 ........ 0.0059
true ................. 0.0004 ........ 0.0059
7-bit uint #1 ........ 0.0010 ........ 0.0047
7-bit uint #2 ........ 0.0010 ........ 0.0046
7-bit uint #3 ........ 0.0010 ........ 0.0046
5-bit sint #1 ........ 0.0025 ........ 0.0046
5-bit sint #2 ........ 0.0023 ........ 0.0046
5-bit sint #3 ........ 0.0024 ........ 0.0045
8-bit uint #1 ........ 0.0043 ........ 0.0081
8-bit uint #2 ........ 0.0043 ........ 0.0079
8-bit uint #3 ........ 0.0041 ........ 0.0080
16-bit uint #1 ....... 0.0064 ........ 0.0095
16-bit uint #2 ....... 0.0064 ........ 0.0091
16-bit uint #3 ....... 0.0064 ........ 0.0094
32-bit uint #1 ....... 0.0085 ........ 0.0114
32-bit uint #2 ....... 0.0077 ........ 0.0122
32-bit uint #3 ....... 0.0077 ........ 0.0120
64-bit uint #1 ....... 0.0085 ........ 0.0159
64-bit uint #2 ....... 0.0086 ........ 0.0157
64-bit uint #3 ....... 0.0086 ........ 0.0158
8-bit int #1 ......... 0.0042 ........ 0.0080
8-bit int #2 ......... 0.0042 ........ 0.0080
8-bit int #3 ......... 0.0042 ........ 0.0081
16-bit int #1 ........ 0.0065 ........ 0.0095
16-bit int #2 ........ 0.0065 ........ 0.0090
16-bit int #3 ........ 0.0056 ........ 0.0085
32-bit int #1 ........ 0.0067 ........ 0.0107
32-bit int #2 ........ 0.0066 ........ 0.0106
32-bit int #3 ........ 0.0063 ........ 0.0104
64-bit int #1 ........ 0.0072 ........ 0.0162
64-bit int #2 ........ 0.0073 ........ 0.0174
64-bit int #3 ........ 0.0072 ........ 0.0164
64-bit int #4 ........ 0.0077 ........ 0.0161
64-bit float #1 ...... 0.0053 ........ 0.0135
64-bit float #2 ...... 0.0053 ........ 0.0135
64-bit float #3 ...... 0.0052 ........ 0.0135
fix string #1 ....... -0.0002 ........ 0.0044
fix string #2 ........ 0.0035 ........ 0.0067
fix string #3 ........ 0.0035 ........ 0.0077
fix string #4 ........ 0.0033 ........ 0.0078
8-bit string #1 ...... 0.0059 ........ 0.0110
8-bit string #2 ...... 0.0063 ........ 0.0121
8-bit string #3 ...... 0.0064 ........ 0.0124
16-bit string #1 ..... 0.0099 ........ 0.0146
16-bit string #2 ..... 0.1522 ........ 0.1474
32-bit string ........ 0.1511 ........ 0.1483
wide char string #1 .. 0.0039 ........ 0.0084
wide char string #2 .. 0.0073 ........ 0.0123
8-bit binary #1 ...... 0.0040 ........ 0.0112
8-bit binary #2 ...... 0.0075 ........ 0.0123
8-bit binary #3 ...... 0.0077 ........ 0.0129
16-bit binary ........ 0.0096 ........ 0.0145
32-bit binary ........ 0.1535 ........ 0.1479
fix array #1 ......... 0.0008 ........ 0.0061
fix array #2 ......... 0.0121 ........ 0.0165
fix array #3 ......... 0.0193 ........ 0.0222
16-bit array #1 ...... 0.0607 ........ 0.0479
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.0749 ........ 0.0824
fix map #1 ........... 0.0329 ........ 0.0431
fix map #2 ........... 0.0161 ........ 0.0189
fix map #3 ........... 0.0205 ........ 0.0262
fix map #4 ........... 0.0252 ........ 0.0205
16-bit map #1 ........ 0.1016 ........ 0.0927
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.1096 ........ 0.1030
fixext 1 ............. 0.0157 ........ 0.0161
fixext 2 ............. 0.0175 ........ 0.0183
fixext 4 ............. 0.0156 ........ 0.0185
fixext 8 ............. 0.0163 ........ 0.0184
fixext 16 ............ 0.0164 ........ 0.0182
8-bit ext ............ 0.0158 ........ 0.0207
16-bit ext ........... 0.0203 ........ 0.0219
32-bit ext ........... 0.1614 ........ 0.1539
32-bit timestamp #1 .. 0.0195 ........ 0.0249
32-bit timestamp #2 .. 0.0188 ........ 0.0260
64-bit timestamp #1 .. 0.0207 ........ 0.0281
64-bit timestamp #2 .. 0.0212 ........ 0.0291
64-bit timestamp #3 .. 0.0207 ........ 0.0295
96-bit timestamp #1 .. 0.0222 ........ 0.0358
96-bit timestamp #2 .. 0.0228 ........ 0.0353
96-bit timestamp #3 .. 0.0210 ........ 0.0319
=============================================
Total                  1.6432          1.9674
Skipped                     4               4
Failed                      0               0
Ignored                     0               0

You may change default benchmark settings by defining the following environment variables:

Name                  Default
MP_BENCH_TARGETS      pure_p,pure_u (see a list of available targets)
MP_BENCH_ITERATIONS   100_000
MP_BENCH_DURATION     not set
MP_BENCH_ROUNDS       3
MP_BENCH_TESTS        -@slow (see a list of available tests)

For example:

export MP_BENCH_TARGETS=pure_p
export MP_BENCH_ITERATIONS=1000000
export MP_BENCH_ROUNDS=5
# a comma separated list of test names
export MP_BENCH_TESTS='complex array, complex map'
# or a group name
# export MP_BENCH_TESTS='-@slow'  # or '@pecl_comp'
# or a regexp
# export MP_BENCH_TESTS='/complex (array|map)/'

Another example, benchmarking both the library and the PECL extension:

MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target            Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0031 ........ 0.0141 ...... 0.0055 ........ 0.0064
false ................ 0.0039 ........ 0.0154 ...... 0.0056 ........ 0.0053
true ................. 0.0038 ........ 0.0139 ...... 0.0056 ........ 0.0044
7-bit uint #1 ........ 0.0061 ........ 0.0110 ...... 0.0059 ........ 0.0046
7-bit uint #2 ........ 0.0065 ........ 0.0119 ...... 0.0042 ........ 0.0029
7-bit uint #3 ........ 0.0054 ........ 0.0117 ...... 0.0045 ........ 0.0025
5-bit sint #1 ........ 0.0047 ........ 0.0103 ...... 0.0038 ........ 0.0022
5-bit sint #2 ........ 0.0048 ........ 0.0117 ...... 0.0038 ........ 0.0022
5-bit sint #3 ........ 0.0046 ........ 0.0102 ...... 0.0038 ........ 0.0023
8-bit uint #1 ........ 0.0063 ........ 0.0174 ...... 0.0039 ........ 0.0031
8-bit uint #2 ........ 0.0063 ........ 0.0167 ...... 0.0040 ........ 0.0029
8-bit uint #3 ........ 0.0063 ........ 0.0168 ...... 0.0039 ........ 0.0030
16-bit uint #1 ....... 0.0092 ........ 0.0222 ...... 0.0049 ........ 0.0030
16-bit uint #2 ....... 0.0096 ........ 0.0227 ...... 0.0042 ........ 0.0046
16-bit uint #3 ....... 0.0123 ........ 0.0274 ...... 0.0059 ........ 0.0051
32-bit uint #1 ....... 0.0136 ........ 0.0331 ...... 0.0060 ........ 0.0048
32-bit uint #2 ....... 0.0130 ........ 0.0336 ...... 0.0070 ........ 0.0048
32-bit uint #3 ....... 0.0127 ........ 0.0329 ...... 0.0051 ........ 0.0048
64-bit uint #1 ....... 0.0126 ........ 0.0268 ...... 0.0055 ........ 0.0049
64-bit uint #2 ....... 0.0135 ........ 0.0281 ...... 0.0052 ........ 0.0046
64-bit uint #3 ....... 0.0131 ........ 0.0274 ...... 0.0069 ........ 0.0044
8-bit int #1 ......... 0.0077 ........ 0.0236 ...... 0.0058 ........ 0.0044
8-bit int #2 ......... 0.0087 ........ 0.0244 ...... 0.0058 ........ 0.0048
8-bit int #3 ......... 0.0084 ........ 0.0241 ...... 0.0055 ........ 0.0049
16-bit int #1 ........ 0.0112 ........ 0.0271 ...... 0.0048 ........ 0.0045
16-bit int #2 ........ 0.0124 ........ 0.0292 ...... 0.0057 ........ 0.0049
16-bit int #3 ........ 0.0118 ........ 0.0270 ...... 0.0058 ........ 0.0050
32-bit int #1 ........ 0.0137 ........ 0.0366 ...... 0.0058 ........ 0.0051
32-bit int #2 ........ 0.0133 ........ 0.0366 ...... 0.0056 ........ 0.0049
32-bit int #3 ........ 0.0129 ........ 0.0350 ...... 0.0052 ........ 0.0048
64-bit int #1 ........ 0.0145 ........ 0.0254 ...... 0.0034 ........ 0.0025
64-bit int #2 ........ 0.0097 ........ 0.0214 ...... 0.0034 ........ 0.0025
64-bit int #3 ........ 0.0096 ........ 0.0287 ...... 0.0059 ........ 0.0050
64-bit int #4 ........ 0.0143 ........ 0.0277 ...... 0.0059 ........ 0.0046
64-bit float #1 ...... 0.0134 ........ 0.0281 ...... 0.0057 ........ 0.0052
64-bit float #2 ...... 0.0141 ........ 0.0281 ...... 0.0057 ........ 0.0050
64-bit float #3 ...... 0.0144 ........ 0.0282 ...... 0.0057 ........ 0.0050
fix string #1 ........ 0.0036 ........ 0.0143 ...... 0.0066 ........ 0.0053
fix string #2 ........ 0.0107 ........ 0.0222 ...... 0.0065 ........ 0.0068
fix string #3 ........ 0.0116 ........ 0.0245 ...... 0.0063 ........ 0.0069
fix string #4 ........ 0.0105 ........ 0.0253 ...... 0.0083 ........ 0.0077
8-bit string #1 ...... 0.0126 ........ 0.0318 ...... 0.0075 ........ 0.0088
8-bit string #2 ...... 0.0121 ........ 0.0295 ...... 0.0076 ........ 0.0086
8-bit string #3 ...... 0.0125 ........ 0.0293 ...... 0.0130 ........ 0.0093
16-bit string #1 ..... 0.0159 ........ 0.0368 ...... 0.0117 ........ 0.0086
16-bit string #2 ..... 0.1547 ........ 0.1686 ...... 0.1516 ........ 0.1373
32-bit string ........ 0.1558 ........ 0.1729 ...... 0.1511 ........ 0.1396
wide char string #1 .. 0.0098 ........ 0.0237 ...... 0.0066 ........ 0.0065
wide char string #2 .. 0.0128 ........ 0.0291 ...... 0.0061 ........ 0.0082
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0040 ........ 0.0129 ...... 0.0120 ........ 0.0058
fix array #2 ......... 0.0279 ........ 0.0390 ...... 0.0143 ........ 0.0165
fix array #3 ......... 0.0415 ........ 0.0463 ...... 0.0162 ........ 0.0187
16-bit array #1 ...... 0.1349 ........ 0.1628 ...... 0.0334 ........ 0.0341
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0345 ........ 0.0391 ...... 0.0143 ........ 0.0168
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0459 ........ 0.0473 ...... 0.0151 ........ 0.0163
16-bit map #1 ........ 0.2518 ........ 0.2962 ...... 0.0400 ........ 0.0490
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.2380 ........ 0.2682 ...... 0.0545 ........ 0.0579
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  1.5625          2.3866        0.7735          0.7243
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7

With JIT:

MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target            Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0001 ........ 0.0052 ...... 0.0053 ........ 0.0042
false ................ 0.0007 ........ 0.0060 ...... 0.0057 ........ 0.0043
true ................. 0.0008 ........ 0.0060 ...... 0.0056 ........ 0.0041
7-bit uint #1 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0041
7-bit uint #2 ........ 0.0021 ........ 0.0043 ...... 0.0062 ........ 0.0041
7-bit uint #3 ........ 0.0022 ........ 0.0044 ...... 0.0061 ........ 0.0040
5-bit sint #1 ........ 0.0030 ........ 0.0048 ...... 0.0062 ........ 0.0040
5-bit sint #2 ........ 0.0032 ........ 0.0046 ...... 0.0062 ........ 0.0040
5-bit sint #3 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0040
8-bit uint #1 ........ 0.0054 ........ 0.0079 ...... 0.0062 ........ 0.0050
8-bit uint #2 ........ 0.0051 ........ 0.0079 ...... 0.0064 ........ 0.0044
8-bit uint #3 ........ 0.0051 ........ 0.0082 ...... 0.0062 ........ 0.0044
16-bit uint #1 ....... 0.0077 ........ 0.0094 ...... 0.0065 ........ 0.0045
16-bit uint #2 ....... 0.0077 ........ 0.0094 ...... 0.0063 ........ 0.0045
16-bit uint #3 ....... 0.0077 ........ 0.0095 ...... 0.0064 ........ 0.0047
32-bit uint #1 ....... 0.0088 ........ 0.0119 ...... 0.0063 ........ 0.0043
32-bit uint #2 ....... 0.0089 ........ 0.0117 ...... 0.0062 ........ 0.0039
32-bit uint #3 ....... 0.0089 ........ 0.0118 ...... 0.0063 ........ 0.0044
64-bit uint #1 ....... 0.0097 ........ 0.0155 ...... 0.0063 ........ 0.0045
64-bit uint #2 ....... 0.0095 ........ 0.0153 ...... 0.0061 ........ 0.0045
64-bit uint #3 ....... 0.0096 ........ 0.0156 ...... 0.0063 ........ 0.0047
8-bit int #1 ......... 0.0053 ........ 0.0083 ...... 0.0062 ........ 0.0044
8-bit int #2 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0044
8-bit int #3 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0043
16-bit int #1 ........ 0.0089 ........ 0.0097 ...... 0.0069 ........ 0.0046
16-bit int #2 ........ 0.0075 ........ 0.0093 ...... 0.0063 ........ 0.0043
16-bit int #3 ........ 0.0075 ........ 0.0094 ...... 0.0062 ........ 0.0046
32-bit int #1 ........ 0.0086 ........ 0.0122 ...... 0.0063 ........ 0.0044
32-bit int #2 ........ 0.0087 ........ 0.0120 ...... 0.0066 ........ 0.0046
32-bit int #3 ........ 0.0086 ........ 0.0121 ...... 0.0060 ........ 0.0044
64-bit int #1 ........ 0.0096 ........ 0.0149 ...... 0.0060 ........ 0.0045
64-bit int #2 ........ 0.0096 ........ 0.0157 ...... 0.0062 ........ 0.0044
64-bit int #3 ........ 0.0096 ........ 0.0160 ...... 0.0063 ........ 0.0046
64-bit int #4 ........ 0.0097 ........ 0.0157 ...... 0.0061 ........ 0.0044
64-bit float #1 ...... 0.0079 ........ 0.0153 ...... 0.0056 ........ 0.0044
64-bit float #2 ...... 0.0079 ........ 0.0152 ...... 0.0057 ........ 0.0045
64-bit float #3 ...... 0.0079 ........ 0.0155 ...... 0.0057 ........ 0.0044
fix string #1 ........ 0.0010 ........ 0.0045 ...... 0.0071 ........ 0.0044
fix string #2 ........ 0.0048 ........ 0.0075 ...... 0.0070 ........ 0.0060
fix string #3 ........ 0.0048 ........ 0.0086 ...... 0.0068 ........ 0.0060
fix string #4 ........ 0.0050 ........ 0.0088 ...... 0.0070 ........ 0.0059
8-bit string #1 ...... 0.0081 ........ 0.0129 ...... 0.0069 ........ 0.0062
8-bit string #2 ...... 0.0086 ........ 0.0128 ...... 0.0069 ........ 0.0065
8-bit string #3 ...... 0.0086 ........ 0.0126 ...... 0.0115 ........ 0.0065
16-bit string #1 ..... 0.0105 ........ 0.0137 ...... 0.0128 ........ 0.0068
16-bit string #2 ..... 0.1510 ........ 0.1486 ...... 0.1526 ........ 0.1391
32-bit string ........ 0.1517 ........ 0.1475 ...... 0.1504 ........ 0.1370
wide char string #1 .. 0.0044 ........ 0.0085 ...... 0.0067 ........ 0.0057
wide char string #2 .. 0.0081 ........ 0.0125 ...... 0.0069 ........ 0.0063
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0014 ........ 0.0059 ...... 0.0132 ........ 0.0055
fix array #2 ......... 0.0146 ........ 0.0156 ...... 0.0155 ........ 0.0148
fix array #3 ......... 0.0211 ........ 0.0229 ...... 0.0179 ........ 0.0180
16-bit array #1 ...... 0.0673 ........ 0.0498 ...... 0.0343 ........ 0.0388
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0148 ........ 0.0180 ...... 0.0156 ........ 0.0179
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0252 ........ 0.0201 ...... 0.0214 ........ 0.0167
16-bit map #1 ........ 0.1027 ........ 0.0836 ...... 0.0388 ........ 0.0510
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.1104 ........ 0.1010 ...... 0.0556 ........ 0.0602
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  0.9642          1.0909        0.8224          0.7213
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7

Note that the msgpack extension (v2.1.2) doesn't support ext, bin and UTF-8 str types.
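The integer rows in these tables (7-bit uint, 5-bit sint, 8- to 64-bit uint and int) appear to map onto MessagePack's integer format families, where a packer picks the smallest encoding that fits the value. The following is an illustration only: a JavaScript sketch based on the MessagePack specification, not msgpack.php's Packer. packInt and withView are hypothetical names.

```javascript
// Sketch: choose the smallest MessagePack integer encoding for a JS number.
// Markers follow the MessagePack spec; multi-byte payloads are big-endian.
function packInt(n) {
  if (!Number.isSafeInteger(n)) throw new RangeError("expected a safe integer");
  if (n >= 0) {
    if (n <= 0x7f) return Uint8Array.of(n);                    // positive fixint ("7-bit uint")
    if (n <= 0xff) return Uint8Array.of(0xcc, n);              // uint 8
    if (n <= 0xffff) return withView(0xcd, 2, (v) => v.setUint16(1, n));
    if (n <= 0xffffffff) return withView(0xce, 4, (v) => v.setUint32(1, n));
    return withView(0xcf, 8, (v) => v.setBigUint64(1, BigInt(n)));
  }
  if (n >= -32) return Uint8Array.of(n & 0xff);                // negative fixint ("5-bit sint")
  if (n >= -0x80) return withView(0xd0, 1, (v) => v.setInt8(1, n));
  if (n >= -0x8000) return withView(0xd1, 2, (v) => v.setInt16(1, n));
  if (n >= -0x80000000) return withView(0xd2, 4, (v) => v.setInt32(1, n));
  return withView(0xd3, 8, (v) => v.setBigInt64(1, BigInt(n)));
}

// Allocate a marker byte plus payload; DataView setters default to big-endian.
function withView(marker, size, write) {
  const bytes = new Uint8Array(1 + size);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, marker);
  write(view);
  return bytes;
}
```

For example, 200 packs to the two bytes cc c8, while -1 packs to the single negative-fixint byte ff.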

License

The library is released under the MIT License. See the bundled LICENSE file for details.

Author: rybakit
Source Code: https://github.com/rybakit/msgpack.php
License: MIT License


Treebender: A Symbolic Natural Language Parsing Library for Rust

Treebender

A symbolic natural language parsing library for Rust, inspired by HPSG (Head-driven Phrase Structure Grammar).

What is this?

This is a library for parsing natural or constructed languages into syntax trees and feature structures. There are no machine learning or probabilistic models; everything is hand-crafted and deterministic.

You can find out more about the motivations of this project in this blog post.

But what are you using it for?

I'm using this to parse a constructed language for my upcoming xenolinguistics game, Themengi.

Motivation

Using a simple 80-line grammar, introduced in the tutorial below, we can parse a simple subset of English, checking reflexive pronoun binding, case, and number agreement.

$ cargo run --bin cli examples/reflexives.fgr
> she likes himself
Parsed 0 trees

> her likes herself
Parsed 0 trees

> she like herself
Parsed 0 trees

> she likes herself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: she))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: herself)))
[
  child-2: [
    case: acc
    pron: ref
    needs_pron: #0 she
    num: sg
    child-0: [ word: herself ]
  ]
  child-1: [
    tense: nonpast
    child-0: [ word: likes ]
    num: #1 sg
  ]
  child-0: [
    child-0: [ word: she ]
    case: nom
    pron: #0
    num: #1
  ]
]

Low resource language? Low problem! No need to train on gigabytes of text, just write a grammar using your brain. Let's hypothesize that in American Sign Language, topicalized nouns (expressed with raised eyebrows) must appear first in the sentence. We can write a small grammar (18 lines), and plug in some sentences:

$ cargo run --bin cli examples/asl-wordorder.fgr -n
> boy sit
Parsed 1 tree
(0..2: S
  (0..1: NP ((0..1: N (0..1: boy))))
  (1..2: IV (1..2: sit)))

> boy throw ball
Parsed 1 tree
(0..3: S
  (0..1: NP ((0..1: N (0..1: boy))))
  (1..2: TV (1..2: throw))
  (2..3: NP ((2..3: N (2..3: ball)))))

> ball nm-raised-eyebrows boy throw
Parsed 1 tree
(0..4: S
  (0..2: NP
    (0..1: N (0..1: ball))
    (1..2: Topic (1..2: nm-raised-eyebrows)))
  (2..3: NP ((2..3: N (2..3: boy))))
  (3..4: TV (3..4: throw)))

> boy throw ball nm-raised-eyebrows
Parsed 0 trees

Tutorial

As an example, let's say we want to build a parser for English reflexive pronouns (himself, herself, themselves, themself, itself). We'll also support number ("He likes X" vs. "They like X") and simple embedded clauses ("He said that they like X").

Grammar files are written in a custom language, similar to BNF, called Feature GRammar (.fgr). There's a VSCode syntax highlighting extension for these files available as fgr-syntax.

We'll start by defining our lexicon. The lexicon is the set of terminal symbols (symbols in the actual input) that the grammar will match. Terminal symbols must start with a lowercase letter, and non-terminal symbols must start with an uppercase letter.

// pronouns
N -> he
N -> him
N -> himself
N -> she
N -> her
N -> herself
N -> they
N -> them
N -> themselves
N -> themself

// names, lowercase as they are terminals
N -> mary
N -> sue
N -> takeshi
N -> robert

// complementizer
Comp -> that

// verbs -- intransitive, transitive, and clausal
IV -> falls
IV -> fall
IV -> fell

TV -> likes
TV -> like
TV -> liked

CV -> says
CV -> say
CV -> said

Next, we can add our sentence rules (they must be added at the top, as the first rule in the file is assumed to be the top-level rule):

// sentence rules
S -> N IV
S -> N TV N
S -> N CV Comp S

// ... previous lexicon ...

Assuming this file is saved as examples/no-features.fgr (which it is), we can test it with the built-in CLI:

$ cargo run --bin cli examples/no-features.fgr
> he falls
Parsed 1 tree
(0..2: S
  (0..1: N (0..1: he))
  (1..2: IV (1..2: falls)))
[
  child-1: [ child-0: [ word: falls ] ]
  child-0: [ child-0: [ word: he ] ]
]

> he falls her
Parsed 0 trees

> he likes her
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: her)))
[
  child-2: [ child-0: [ word: her ] ]
  child-1: [ child-0: [ word: likes ] ]
  child-0: [ child-0: [ word: he ] ]
]

> he likes
Parsed 0 trees

> he said that he likes her
Parsed 1 tree
(0..6: S
  (0..1: N (0..1: he))
  (1..2: CV (1..2: said))
  (2..3: Comp (2..3: that))
  (3..6: S
    (3..4: N (3..4: he))
    (4..5: TV (4..5: likes))
    (5..6: N (5..6: her))))
[
  child-0: [ child-0: [ word: he ] ]
  child-2: [ child-0: [ word: that ] ]
  child-1: [ child-0: [ word: said ] ]
  child-3: [
    child-2: [ child-0: [ word: her ] ]
    child-1: [ child-0: [ word: likes ] ]
    child-0: [ child-0: [ word: he ] ]
  ]
]

> he said that he
Parsed 0 trees

This grammar already parses some correct sentences, and blocks some trivially incorrect ones. However, it doesn't care about number, case, or reflexives right now:

> she likes himself  // unbound reflexive pronoun
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: she))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: himself)))
[
  child-0: [ child-0: [ word: she ] ]
  child-2: [ child-0: [ word: himself ] ]
  child-1: [ child-0: [ word: likes ] ]
]

> him like her  // incorrect case on the subject pronoun, should be nominative
                // (he) instead of accusative (him)
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: him))
  (1..2: TV (1..2: like))
  (2..3: N (2..3: her)))
[
  child-0: [ child-0: [ word: him ] ]
  child-1: [ child-0: [ word: like ] ]
  child-2: [ child-0: [ word: her ] ]
]

> he like her  // incorrect verb number agreement
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: like))
  (2..3: N (2..3: her)))
[
  child-2: [ child-0: [ word: her ] ]
  child-1: [ child-0: [ word: like ] ]
  child-0: [ child-0: [ word: he ] ]
]

To fix this, we need to add features to our lexicon, and restrict the sentence rules based on features.

Features are added with square brackets, and are key: value pairs separated by commas. **top** is a special feature value, which basically means "unspecified" -- we'll come back to it later. Features that are unspecified are also assumed to have a **top** value, but sometimes explicitly stating top is clearer.

/// Pronouns
// The added features are:
// * num: sg or pl, whether this noun wants a singular verb (likes) or
//   a plural verb (like). note this is grammatical number, so for example
//   singular they takes plural agreement ("they like X", not *"they likes X")
// * case: nom or acc, whether this noun is nominative or accusative case.
//   nominative case goes in the subject, and accusative in the object.
//   e.g., "he fell" and "she likes him", not *"him fell" and *"her likes he"
// * pron: he, she, they, or ref -- what type of pronoun this is
// * needs_pron: whether this is a reflexive that needs to bind to another
//   pronoun.
N[ num: sg, case: nom, pron: he ]                    -> he
N[ num: sg, case: acc, pron: he ]                    -> him
N[ num: sg, case: acc, pron: ref, needs_pron: he ]   -> himself
N[ num: sg, case: nom, pron: she ]                   -> she
N[ num: sg, case: acc, pron: she ]                   -> her
N[ num: sg, case: acc, pron: ref, needs_pron: she]   -> herself
N[ num: pl, case: nom, pron: they ]                  -> they
N[ num: pl, case: acc, pron: they ]                  -> them
N[ num: pl, case: acc, pron: ref, needs_pron: they ] -> themselves
N[ num: sg, case: acc, pron: ref, needs_pron: they ] -> themself

// Names
// The added features are:
// * num: sg, as people are singular ("mary likes her" / *"mary like her")
// * case: **top**, as names can be both subjects and objects
//   ("mary likes her" / "she likes mary")
// * pron: whichever pronoun the person uses for reflexive agreement
//   mary    pron: she  => mary likes herself
//   sue     pron: they => sue likes themself
//   takeshi pron: he   => takeshi likes himself
N[ num: sg, case: **top**, pron: she ]  -> mary
N[ num: sg, case: **top**, pron: they ] -> sue
N[ num: sg, case: **top**, pron: he ]   -> takeshi
N[ num: sg, case: **top**, pron: he ]   -> robert

// Complementizer doesn't need features
Comp -> that

// Verbs -- intransitive, transitive, and clausal
// The added features are:
// * num: sg, pl, or **top** -- to match the noun numbers.
//   **top** will match either sg or pl, as past-tense verbs in English
//   don't agree in number: "he fell" and "they fell" are both fine
// * tense: past or nonpast -- this won't be used for agreement, but will be
//   copied into the final feature structure, and the client code could do
//   something with it
IV[ num:      sg, tense: nonpast ] -> falls
IV[ num:      pl, tense: nonpast ] -> fall
IV[ num: **top**, tense: past ]    -> fell

TV[ num:      sg, tense: nonpast ] -> likes
TV[ num:      pl, tense: nonpast ] -> like
TV[ num: **top**, tense: past ]    -> liked

CV[ num:      sg, tense: nonpast ] -> says
CV[ num:      pl, tense: nonpast ] -> say
CV[ num: **top**, tense: past ]    -> said

Now that our lexicon is updated with features, we can update our sentence rules to constrain parsing based on those features. This uses two new mechanisms: tags and unification. Tags associate features between nodes in a rule, and unification determines whether those features are compatible. The rules for unification are:

  1. A string feature can unify with a string feature with the same value
  2. A top feature can unify with anything, and the nodes are merged
  3. A complex feature ([ ... ] structure) is recursively unified with another complex feature.

If unification fails anywhere, the parse is aborted and the tree is discarded. This is how the grammar rejects trees whose features don't match.
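To make the three rules concrete, here is a minimal JavaScript sketch of unification over plain objects. It is not treebender's implementation (which works on a shared feature DAG in Rust): the TOP marker, the unify name and the object representation are assumptions, and tags are deliberately left out, so values are copied rather than shared.

```javascript
// Sketch only: TOP marks an unspecified value, strings are atoms,
// plain objects are complex features. Tags (shared values) are not modeled.
const TOP = Symbol("top");

// Returns the unification of a and b, or null when they clash.
function unify(a, b) {
  // Rule 2: top (or a missing feature) unifies with anything.
  if (a === TOP || a === undefined) return b === undefined ? TOP : b;
  if (b === TOP || b === undefined) return a;
  // Rule 1: an atom unifies only with an identical atom.
  if (typeof a === "string" || typeof b === "string") return a === b ? a : null;
  // Rule 3: complex features unify feature by feature, recursively.
  const result = {};
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const merged = unify(a[key], b[key]);
    if (merged === null) return null;
    result[key] = merged;
  }
  return result;
}
```

For instance, unifying he's features { num: "sg", case: "nom", pron: "he" } with { case: "acc" } returns null, which corresponds to a discarded tree.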

// Sentence rules
// Intransitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
S -> N[ case: nom, num: #1 ] IV[ num: #1 ]
// Transitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #2)
// * If there's a reflexive in the object position, make sure its `needs_pron`
//   feature matches the subject's `pron` feature. If the object isn't a
//   reflexive, then its `needs_pron` feature will implicitly be `**top**`, so
//   will unify with anything.
S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
// Clausal verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
// * Reflexives can't cross clause boundaries (*"He said that she likes himself"),
//   so we can ignore reflexives and delegate to inner clause rule
S -> N[ case: nom, num: #1 ] CV[ num: #1 ] Comp S
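A rough way to picture what the tags do in these rules: each #n acts like a variable that must take a single value across the whole rule, with a missing feature treated as top. The JavaScript sketch below checks only the transitive rule against a hand-copied slice of the lexicon. matchRule, LEX, TRANSITIVE and the cat feature (standing in for the N/TV categories) are hypothetical, and unlike real unification it never narrows a top value, so mary's case simply stays unspecified.

```javascript
// Sketch: tags (#1, #2) are variables shared across the rule's children.
// A missing feature is treated as top and matches anything.
function matchRule(pattern, nodes) {
  if (pattern.length !== nodes.length || nodes.includes(undefined)) return false;
  const tags = new Map(); // tag -> value bound so far (undefined means still top)
  for (let i = 0; i < pattern.length; i++) {
    for (const [feature, expected] of Object.entries(pattern[i])) {
      const actual = nodes[i][feature];
      if (expected.startsWith("#")) {
        const bound = tags.get(expected);
        if (bound !== undefined && actual !== undefined && bound !== actual) return false;
        tags.set(expected, bound ?? actual);
      } else if (actual !== undefined && actual !== expected) {
        return false;
      }
    }
  }
  return true;
}

// Hand-copied slice of the lexicon; the name mary has no case feature (top).
const LEX = {
  he: { cat: "N", num: "sg", case: "nom", pron: "he" },
  him: { cat: "N", num: "sg", case: "acc", pron: "he" },
  himself: { cat: "N", num: "sg", case: "acc", pron: "ref", needs_pron: "he" },
  herself: { cat: "N", num: "sg", case: "acc", pron: "ref", needs_pron: "she" },
  mary: { cat: "N", num: "sg", pron: "she" },
  likes: { cat: "TV", num: "sg", tense: "nonpast" },
  like: { cat: "TV", num: "pl", tense: "nonpast" },
};

// S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
const TRANSITIVE = [
  { cat: "N", case: "nom", pron: "#1", num: "#2" },
  { cat: "TV", num: "#2" },
  { cat: "N", case: "acc", needs_pron: "#1" },
];

const accepts = (sentence) =>
  matchRule(TRANSITIVE, sentence.split(" ").map((word) => LEX[word]));
```

With this slice, "he likes himself" and "mary likes herself" pass, while "he likes herself" (pronoun mismatch through #1) and "he like him" (number mismatch through #2) fail.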

Now that we have this augmented grammar (available as examples/reflexives.fgr), we can try it out and see that it rejects illicit sentences that were previously accepted, while still accepting valid ones:

> he fell
Parsed 1 tree
(0..2: S
  (0..1: N (0..1: he))
  (1..2: IV (1..2: fell)))
[
  child-1: [
    child-0: [ word: fell ]
    num: #0 sg
    tense: past
  ]
  child-0: [
    pron: he
    case: nom
    num: #0
    child-0: [ word: he ]
  ]
]

> he like him
Parsed 0 trees

> he likes himself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: he))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: himself)))
[
  child-1: [
    num: #0 sg
    child-0: [ word: likes ]
    tense: nonpast
  ]
  child-2: [
    needs_pron: #1 he
    num: sg
    child-0: [ word: himself ]
    pron: ref
    case: acc
  ]
  child-0: [
    child-0: [ word: he ]
    pron: #1
    num: #0
    case: nom
  ]
]

> he likes herself
Parsed 0 trees

> mary likes herself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: mary))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: herself)))
[
  child-0: [
    pron: #0 she
    num: #1 sg
    case: nom
    child-0: [ word: mary ]
  ]
  child-1: [
    tense: nonpast
    child-0: [ word: likes ]
    num: #1
  ]
  child-2: [
    child-0: [ word: herself ]
    num: sg
    pron: ref
    case: acc
    needs_pron: #0
  ]
]

> mary likes themself
Parsed 0 trees

> sue likes themself
Parsed 1 tree
(0..3: S
  (0..1: N (0..1: sue))
  (1..2: TV (1..2: likes))
  (2..3: N (2..3: themself)))
[
  child-0: [
    pron: #0 they
    child-0: [ word: sue ]
    case: nom
    num: #1 sg
  ]
  child-1: [
    tense: nonpast
    num: #1
    child-0: [ word: likes ]
  ]
  child-2: [
    needs_pron: #0
    case: acc
    pron: ref
    child-0: [ word: themself ]
    num: sg
  ]
]

> sue likes himself
Parsed 0 trees

If this is interesting to you and you want to learn more, you can check out my blog series, the excellent textbook Syntactic Theory: A Formal Introduction (2nd ed.), and the DELPH-IN project, whose work on the LKB inspired this simplified version.

Using from code

I need to write this section in more detail, but if you're comfortable with Rust, I suggest looking through the codebase. It's not perfect (it started as one of my first Rust projects, after migrating through F# -> TypeScript -> C in search of the right performance/ergonomics tradeoff) and it could use more tests, but overall it's not too bad.

Basically, the processing pipeline is:

  1. Make a Grammar struct
    • Grammar is defined in rules.rs.
    • The easiest way to make a Grammar is Grammar::parse_from_file, which is mostly a hand-written recursive descent parser in parse_grammar.rs. Yes, I recognize the irony here.
  2. The Grammar takes input (via Grammar::parse, which does everything for you, or Grammar::parse_chart, which only builds the chart)
  3. The input is first chart-parsed in earley.rs
  4. Then, a forest is built from the chart, in forest.rs, using an algorithm I found in a very useful blog series whose URL I've forgotten, because the algorithms in the academic literature for this are... weird.
  5. Finally, feature unification is used to prune the forest down to only valid trees. It would be more efficient to do this during parsing, but meh.
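For readers unfamiliar with chart parsing, here is a compact Earley recognizer in JavaScript. It is a generic textbook version, not a port of earley.rs: it only answers whether a sentence is accepted (no forest, no features), assumes the grammar has no empty rules, and follows the .fgr convention that terminals start with a lowercase letter. The [lhs, rhs] grammar shape and the earleyRecognize name are assumptions.

```javascript
// Generic Earley recognizer (sketch). grammar: array of [lhs, rhsSymbols].
// Terminals start with a lowercase letter, non-terminals with an uppercase one.
// Assumes no empty (epsilon) rules.
function earleyRecognize(grammar, start, words) {
  const isTerminal = (sym) => sym[0] === sym[0].toLowerCase();
  // chart[i] maps an item key to an item { lhs, rhs, dot, origin }.
  const chart = Array.from({ length: words.length + 1 }, () => new Map());
  const add = (i, item) => {
    const key = `${item.lhs}->${item.rhs.join(" ")}|${item.dot}|${item.origin}`;
    if (!chart[i].has(key)) chart[i].set(key, item);
  };

  for (const [lhs, rhs] of grammar) {
    if (lhs === start) add(0, { lhs, rhs, dot: 0, origin: 0 });
  }

  for (let i = 0; i <= words.length; i++) {
    // Map iterators also visit entries added during iteration,
    // so chart[i] doubles as its own worklist.
    for (const item of chart[i].values()) {
      const next = item.rhs[item.dot];
      if (next === undefined) {
        // Complete: advance items in the origin set that were waiting for item.lhs.
        for (const waiting of chart[item.origin].values()) {
          if (waiting.rhs[waiting.dot] === item.lhs) {
            add(i, { ...waiting, dot: waiting.dot + 1 });
          }
        }
      } else if (isTerminal(next)) {
        // Scan: consume the next word if it matches the expected terminal.
        if (i < words.length && words[i] === next) add(i + 1, { ...item, dot: item.dot + 1 });
      } else {
        // Predict: start every rule for the expected non-terminal at position i.
        for (const [lhs, rhs] of grammar) {
          if (lhs === next) add(i, { lhs, rhs, dot: 0, origin: i });
        }
      }
    }
  }

  return [...chart[words.length].values()].some(
    (item) => item.lhs === start && item.origin === 0 && item.dot === item.rhs.length
  );
}
```

With a slice of the no-features grammar from the tutorial, "he said that he likes her" is accepted while "he falls her" and "he likes" are not, matching the CLI output above.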

The most interesting thing you can do via code and not via the CLI is probably getting at the raw feature DAG, as that would let you do things like pronoun coreference. The DAG code is in featurestructure.rs, and should be fairly approachable -- there's a lot of Rust ceremony around Rc<RefCell<...>> because using an arena allocation crate seemed like overkill, but that is somewhat mitigated by the NodeRef type alias. Hit me up at https://vgel.me/contact if you need help with anything here!
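To illustrate what a reentrant feature DAG buys you (for example, the subject's num and the verb's num pointing at the same node, shown as #0 in the CLI output), here is a JavaScript sketch of destructive unification with forwarding pointers. This is a common textbook technique and only an approximation of featurestructure.rs; FeatureNode and unifyNodes are hypothetical names, and because the sketch mutates nodes in place, a real parser would copy structures before trying a rule that might fail.

```javascript
// Sketch of a reentrant feature DAG. A node's value is undefined (top),
// a string (atom), or a Map from feature name to child FeatureNode.
class FeatureNode {
  constructor(value) {
    this.value = value;
    this.forward = null; // set once this node has been merged into another
  }
  // Follow forwarding pointers to the node that currently represents this one.
  deref() {
    let node = this;
    while (node.forward) node = node.forward;
    return node;
  }
}

// Destructively unify two nodes; returns false on a clash.
function unifyNodes(x, y) {
  const a = x.deref();
  const b = y.deref();
  if (a === b) return true;
  if (a.value === undefined) { a.forward = b; return true; } // top unifies with anything
  if (b.value === undefined) { b.forward = a; return true; }
  if (typeof a.value === "string" || typeof b.value === "string") {
    if (a.value !== b.value) return false;                  // atoms must be equal
    a.forward = b;
    return true;
  }
  // Both complex: merge a's features into b, unifying shared feature names.
  a.forward = b;
  for (const [name, child] of a.value) {
    if (b.value.has(name)) {
      if (!unifyNodes(child, b.value.get(name))) return false;
    } else {
      b.value.set(name, child);
    }
  }
  return true;
}
```

Because both parents hold the same child node, fixing the subject's num to sg through unification is immediately visible from the verb, and a later attempt to make the verb pl fails.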

Download Details:
Author: vgel
Source Code: https://github.com/vgel/treebender
License: MIT License


A Plugin for D3.js That Allows You to Easily Use Context Menus

d3-context-menu

This is a plugin for d3.js that allows you to easily use context menus in your visualizations. It's 100% d3 based and done in the "d3 way", so you don't need to worry about including additional frameworks.

Install with Bower

bower install d3-context-menu

Basic usage:

// Define your menu
var menu = [
    {
        title: 'Item #1',
        action: function(d) {
            console.log('Item #1 clicked!');
            console.log('The data for this circle is: ' + d);
        },
        disabled: false // optional, defaults to false
    },
    {
        title: 'Item #2',
        action: function(d) {
            console.log('You have clicked the second item!');
            console.log('The data for this circle is: ' + d);
        }
    }
]

var data = [1, 2, 3];

var g = d3.select('body').append('svg')
    .attr('width', 200)
    .attr('height', 400)
    .append('g');

// bind one circle per datum
g.selectAll('circle')
    .data(data)
    .enter()
    .append('circle')
    .attr('r', 30)
    .attr('fill', 'steelblue')
    .attr('cx', function(d) {
        return 100;
    })
    .attr('cy', function(d) {
        return d * 100;
    })
    .on('contextmenu', d3.contextMenu(menu)); // attach menu to element

Advanced usage:

Headers and Dividers

Menus can have Headers and Dividers. To specify a header, simply omit the "action" property. To specify a divider, add a "divider: true" property to the menu item. Example menu definition:

var menu = [
    {
        title: 'Header',
    },
    {
        title: 'Normal item',
        action: function() {}
    },
    {
        divider: true
    },
    {
        title: 'Last item',
        action: function() {}
    }
];

Nested Menu

Menus can contain nested menus. To specify a nested menu, add a "children" property holding an array of menu items.

var menu = [
    {
        title: 'Parent',
        children: [
            {
                title: 'Child',
                children: [
                    {
                        // header
                        title: 'Grand-Child1'
                    },
                    {
                        // normal
                        title: 'Grand-Child2',
                        action: function() {}
                    },
                    {
                        // divider
                        divider: true
                    },
                    {
                        // disabled
                        title: 'Grand-Child3',
                        action: function() {},
                        disabled: true
                    }
                ]
            }
        ]
    },
];

See the index.htm file in the example folder to see this in action.

Pre-show callback

You can pass in a callback that will be executed before the context menu appears. This can be useful if you need something to close tooltips or perform some other task before the menu appears:

    ...
    .on('contextmenu', d3.contextMenu(menu, function() {
        console.log('Quick! Before the menu appears!');
    })); // attach menu to element

Open and close callbacks

Instead of a single function, you can pass an options object: onOpen is executed before the context menu appears, and onClose is executed after the menu has been closed:

    ...
    .on('contextmenu', d3.contextMenu(menu, {
        onOpen: function() {
            console.log('Quick! Before the menu appears!');
        },
        onClose: function() {
            console.log('Menu has been closed.');
        }
    })); // attach menu to element

Context-sensitive menu items

You can use information from the bound data in menu titles: simply specify a function for title that returns a string:

var menu = [
    {
        title: function(d) {
            return 'Delete circle '+d.circleName;
        },
        action: function(d) {
            // delete it
        }
    },
    {
        title: function(d) {
            return 'Item 2';
        },
        action: function(d) {
            // do nothing interesting
        }
    }
];

// Menu shown is (for a datum whose circleName is 'MyCircle'):

[Delete circle MyCircle]
[Item 2]

Dynamic menu list

You can also have different lists of menu items for different nodes if menu is a function:

var menu = function(data) {
    if (data.x > 100) {
        return [{
            title: 'Item #1',
            action: function(d) {
                console.log('Item #1 clicked!');
                console.log('The data for this circle is: ' + d);
            }
        }];
    } else {
        return [{
            title: 'Item #1',
            action: function(d) {
                console.log('Item #1 clicked!');
                console.log('The data for this circle is: ' + d);
            }
        }, {
            title: 'Item #2',
            action: function(d) {
                console.log('Item #2 clicked!');
                console.log('The data for this circle is: ' + d);
            }
        }];
    }
};

// Menu shown for nodes with x > 100 contains 1 item, while other nodes have 2 menu items
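Since menu and title (and, per the 1.0.1 notes, disabled and divider) may each be either a value or a function, it can be handy to check which items a given datum would get without a browser. The DOM-free sketch below assumes each function is called with the element as this and the datum plus a second argument, as the changelog describes; resolve and menuTitles are hypothetical helpers, not part of d3-context-menu.

```javascript
// Hypothetical helper: an option is either a plain value or a function that is
// called with the element as `this` and the datum plus a second argument.
function resolve(option, element, datum, second) {
  return typeof option === "function" ? option.call(element, datum, second) : option;
}

// DOM-free sketch: list the titles a menu would show for one datum, skipping dividers.
function menuTitles(menu, element, datum, second) {
  return resolve(menu, element, datum, second)
    .filter((item) => !resolve(item.divider, element, datum, second))
    .map((item) => resolve(item.title, element, datum, second));
}
```

For the dynamic menu above, menuTitles(menu, null, { x: 150 }, 0) would list one title and { x: 50 } would list two.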

Deleting Nodes Example

The following example shows how to add a right click menu to a tree diagram:

http://plnkr.co/edit/bDBe0xGX1mCLzqYGOqOS?p=info

Explicitly set menu position

The default position can be overridden by providing a position option (either an object or a function returning an object):

    ...
    .on('contextmenu', d3.contextMenu(menu, {
        onOpen: function() {
            ...
        },
        onClose: function() {
            ...
        },
        position: {
            top: 100,
            left: 200
        }
    })); // attach menu to element

or

    ...
    .on('contextmenu', d3.contextMenu(menu, {
        onOpen: function() {
            ...
        },
        onClose: function() {
            ...
        },
        position: function(d) {
            var elm = this;
            var bounds = elm.getBoundingClientRect();

            // eg. align bottom-left
            return {
                top: bounds.top + bounds.height,
                left: bounds.left
            }
        }
    })); // attach menu to element
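A position function can also keep the menu on screen. The sketch below is a hypothetical helper (clampToViewport is not part of the plugin) that shifts a proposed { top, left } up or left when a menu of a known size would overflow. In a browser you might pass { width: window.innerWidth, height: window.innerHeight } as the viewport and an estimated menu size; both are assumptions about how you measure things.

```javascript
// Hypothetical helper (not part of d3-context-menu): shift a proposed { top, left }
// so that a menu of the given size stays inside the viewport.
function clampToViewport(position, menuSize, viewport) {
  const maxLeft = Math.max(0, viewport.width - menuSize.width);
  const maxTop = Math.max(0, viewport.height - menuSize.height);
  return {
    top: Math.min(Math.max(position.top, 0), maxTop),
    left: Math.min(Math.max(position.left, 0), maxLeft),
  };
}
```

Inside a position function, you would return clampToViewport(...) applied to the bottom-left position computed from getBoundingClientRect().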

Set your own CSS class as theme (make sure to style it)

d3.contextMenu(menu, {
    ...
    theme: 'my-awesome-theme'
});

or

d3.contextMenu(menu, {
    ...
    theme: function () {
        if (foo) {
            return 'my-foo-theme';
        }
        else {
            return 'my-awesome-theme';
        }
    }
});

Close the context menu programmatically (can be used as cleanup, as well)

d3.contextMenu('close');

Additional callback arguments

Depending on the D3 version used, the callback functions receive an additional argument:

  • for D3 6.x or above it will be the event, since the global d3.event is not available.
var menu = [
    {
        title: 'Item #1',
        action: function(d, event) {
            console.log('Item #1 clicked!');
            console.log('The data for this circle is: ' + d);
            console.log('The event is: ' + event);
        }
    }
]
  • for D3 5.x or below it will be the index, for backward compatibility reasons.
var menu = [
    {
        title: 'Item #1',
        action: function(d, index) {
            console.log('Item #1 clicked!');
            console.log('The data for this circle is: ' + d);
            console.log('The index is: ' + index);
        }
    }
]
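If the same menu definition has to run under both conventions, one option is to branch on the type of the second argument. This is a hypothetical pattern (makeItemAction is not part of the plugin), and it assumes the index is always passed as a number under D3 5.x and below:

```javascript
// Hypothetical wrapper: D3 6.x+ passes the event as the second argument,
// D3 5.x and below pass the index (a number).
function makeItemAction(report) {
  return function (d, eventOrIndex) {
    if (typeof eventOrIndex === "number") {
      report({ datum: d, index: eventOrIndex, event: null });
    } else {
      report({ datum: d, index: null, event: eventOrIndex });
    }
  };
}

// Usage: the same menu item works regardless of the D3 version.
var versionAgnosticMenu = [
  { title: 'Item #1', action: makeItemAction(function (info) { console.log(info); }) }
];
```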

What's new in version 2.1.0

  • Added support for accessing event information when using D3 6.x.

What's new in version 2.0.0

  • Added support for D3 6.x
  • The index parameter of callbacks is undefined when using D3 6.x or above. See the index.htm file in the example folder to see how to get the proper index value in that case.
  • Added class property for menu items that allows specifying CSS classes (see: https://github.com/patorjk/d3-context-menu/pull/56).

What's new in version 1.1.2

  • Menu updated so it won't go off the bottom or right of the screen when the window is small.

What's new in version 1.1.1

  • Menu close bug fix.

What's new in version 1.1.0

  • Nested submenus are now supported.

What's new in version 1.0.1

  • Default theme styles extracted to their own CSS class (d3-context-menu-theme)
  • Ability to specify own theme css class via the theme configuration option (as string or function returning string)
  • onOpen/onClose callbacks now have consistent signature (they receive data and index, and this argument refers to the DOM element the context menu is related to)
  • all other functions (e.g. position, menu) have the same signature and this object as onClose/onOpen
  • Context menu now closes on mousedown outside of the menu, instead of click outside (to mimic behaviour of the native context menu)
  • disabled and divider can now be functions as well and have the same signature and this object as explained above
  • Close the context menu programmatically using d3.contextMenu('close');

What's new in version 0.2.1

  • Ability to set menu position
  • Minified css and js versions

What's new in version 0.1.3

  • Fixed issue where context menu element is never removed from DOM
  • Fixed issue where <body> click event is never removed
  • Fixed issue where the incorrect onClose callback was called when menu was closed as a result of clicking outside

What's new in version 0.1.2

  • If contextmenu is clicked twice it will close rather than open the browser's context menu.

What's new in version 0.1.1

  • Header and Divider items.
  • Ability to disable items.

It's written to be very lightweight and customizable. You can see it in action here:

http://plnkr.co/edit/hAx36JQhb0RsvVn7TomS?p=info

Author: Patorjk
Source Code: https://github.com/patorjk/d3-context-menu 
License: MIT license


A Wrapper for Sembast and SQFlite to Enable Easy

FHIR_DB

This is really just a wrapper around Sembast_SQFLite - so all of the heavy lifting was done by Alex Tekartik. I highly recommend that if you have any questions about working with this package that you take a look at Sembast. He's also just a super nice guy, and even answered a question for me when I was deciding which sembast version to use. As usual, ResoCoder also has a good tutorial.

I have an interest in low-resource settings and thus a specific reason to be able to store data offline. To encourage this use, there are a number of other packages I have created based around the data format FHIR. FHIR® is the registered trademark of HL7 and is used with the permission of HL7. Use of the FHIR trademark does not constitute endorsement of this product by HL7.

Using the Db

So, while not absolutely necessary, I highly recommend that you use some sort of interface class. This adds the benefit of more easily handling errors, plus if you change to a different database in the future, you don't have to change the rest of your app, just the interface.

I've used something like this in my projects:

class IFhirDb {
  IFhirDb();
  final ResourceDao resourceDao = ResourceDao();

  Future<Either<DbFailure, Resource>> save(Resource resource) async {
    Resource resultResource;
    try {
      resultResource = await resourceDao.save(resource);
    } catch (error) {
      return left(DbFailure.unableToSave(error: error.toString()));
    }
    return right(resultResource);
  }

  Future<Either<DbFailure, List<Resource>>> returnListOfSingleResourceType(
      String resourceType) async {
    List<Resource> resultList;
    try {
      resultList =
          await resourceDao.getAllSortedById(resourceType: resourceType);
    } catch (error) {
      return left(DbFailure.unableToObtainList(error: error.toString()));
    }
    return right(resultList);
  }

  Future<Either<DbFailure, List<Resource>>> searchFunction(
      String resourceType, String searchString, String reference) async {
    List<Resource> resultList;
    try {
      resultList =
          await resourceDao.searchFor(resourceType, searchString, reference);
    } catch (error) {
      return left(DbFailure.unableToObtainList(error: error.toString()));
    }
    return right(resultList);
  }
}

I like this because if there's an I/O error or something similar, it won't crash your app. Then you can call this interface in your app like the following:

final patient = Patient(
    resourceType: 'Patient',
    name: [HumanName(text: 'New Patient Name')],
    birthDate: Date(DateTime.now()),
);

final saveResult = await IFhirDb().save(patient);

This will save your newly created patient to the locally embedded database.

IMPORTANT: this database will expect that all previously created resources have an id. When you save a resource, it will check to see if that resource type has already been stored. (Each resource type is saved in its own store in the database.) It will then check if there is an ID. If there's no ID, it will create a new one for that resource (along with metadata on version number and creation time), save it, and return the resource. If it already has an ID, it will copy the old version of the resource into a _history store, then update the metadata of the new resource and save that version into the appropriate store for that resource. If, for instance, we have a previously created patient:

{
    "resourceType": "Patient",
    "id": "fhirfli-294057507-6811107",
    "meta": {
        "versionId": "1",
        "lastUpdated": "2020-10-16T19:41:28.054369Z"
    },
    "name": [
        {
            "given": ["New"],
            "family": "Patient"
        }
    ],
    "birthDate": "2020-10-16"
}

And we update the last name to 'Provider'. The above version of the patient will be kept in _history, while in the 'Patient' store in the db, we will have the updated version:

{
    "resourceType": "Patient",
    "id": "fhirfli-294057507-6811107",
    "meta": {
        "versionId": "2",
        "lastUpdated": "2020-10-16T19:45:07.316698Z"
    },
    "name": [
        {
            "given": ["New"],
            "family": "Provider"
        }
    ],
    "birthDate": "2020-10-16"
}

This way we can keep track of all previous versions of all resources (which is obviously important in medicine).
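The save-and-version flow above can be modeled in a few lines. What follows is a language-neutral sketch, written in Ruby rather than Dart, and it is not fhir_db's code. It assumes one hash per resource type, a generated UUID for new ids, and a hypothetical `_history` key format of `type/id/version`. The package's real id format (e.g. `fhirfli-...`) and history layout may differ.

```ruby
require 'securerandom'
require 'time'

# Hypothetical model of the behaviour described above (not fhir_db's code):
# one store per resource type, a generated id for new resources, and the
# previous version copied into a _history store before an update.
class VersionedStore
  attr_reader :stores

  def initialize
    # store name => { key => resource }
    @stores = Hash.new { |hash, name| hash[name] = {} }
  end

  def save(resource)
    type = resource.fetch('resourceType')
    resource = resource.dup
    resource['id'] ||= SecureRandom.uuid # assumed id format
    previous = @stores[type][resource['id']]
    if previous
      # Keep the old version; the "type/id/version" key is an assumption.
      key = "#{type}/#{resource['id']}/#{previous['meta']['versionId']}"
      @stores['_history'][key] = previous
      version = previous['meta']['versionId'].to_i + 1
    else
      version = 1
    end
    resource['meta'] = { 'versionId' => version.to_s,
                         'lastUpdated' => Time.now.utc.iso8601(6) }
    @stores[type][resource['id']] = resource # Hash#[]= returns the resource
  end
end

db = VersionedStore.new
first = db.save({ 'resourceType' => 'Patient', 'name' => [{ 'family' => 'Patient' }] })
second = db.save(first.merge('name' => [{ 'family' => 'Provider' }]))
second['meta']['versionId']  # => "2"
db.stores['_history'].size   # => 1
```

In this model the current store always holds only the latest version, so reads stay simple. History lookups go through the separate store.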

Most interactions (saving, deleting, etc.) work the way you'd expect. The only difference is search: because Sembast is NoSQL, we can search on any of the fields in a resource. If our interface class has the following function:

  Future<Either<DbFailure, List<Resource>>> searchFunction(
      String resourceType, String searchString, String reference) async {
    List<Resource> resultList;
    try {
      resultList =
          await resourceDao.searchFor(resourceType, searchString, reference);
    } catch (error) {
      return left(DbFailure.unableToObtainList(error: error.toString()));
    }
    return right(resultList);
  }

You can search for all immunizations of a certain patient:

searchFunction(
        'Immunization', 'patient.reference', 'Patient/$patientId');

This function will search through all entries in the 'Immunization' store. It will look at all 'patient.reference' fields, and return any that match 'Patient/$patientId'.
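Matching on a dotted path such as 'patient.reference' amounts to walking nested maps key by key. Below is a language-neutral sketch in Ruby, not fhir_db's Dart code (which delegates the search to Sembast). The helpers `dig_path` and `search_for` are hypothetical.

```ruby
# Hypothetical helpers (not fhir_db's API): follow a dotted field path
# through nested hashes and keep the records whose value matches.
def dig_path(record, path)
  path.split('.').reduce(record) do |node, key|
    node.is_a?(Hash) ? node[key] : nil # stop safely on non-hash values
  end
end

def search_for(records, path, value)
  records.select { |record| dig_path(record, path) == value }
end

immunizations = [
  { 'id' => 'imm1', 'patient' => { 'reference' => 'Patient/123' } },
  { 'id' => 'imm2', 'patient' => { 'reference' => 'Patient/456' } }
]
search_for(immunizations, 'patient.reference', 'Patient/123').map { |r| r['id'] }
# => ["imm1"]
```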

The last thing I'll mention is that this is a password-protected db, using AES-256 encryption (although it can also use Salsa20). Anytime you use the db, you have the option of using a password for encryption/decryption. Remember, if you set up the database using encryption, you will only be able to access it using that same password. When you're ready to change the password, you will need to call the update password function. If we again assume we created a change password method in our interface, it might look something like this:

class IFhirDb {
  IFhirDb();
  final ResourceDao resourceDao = ResourceDao();

  // ... the other methods shown earlier ...

  Future<Either<DbFailure, Unit>> updatePassword(
      String oldPassword, String newPassword) async {
    try {
      await resourceDao.updatePw(oldPassword, newPassword);
    } catch (error) {
      return left(DbFailure.unableToUpdatePassword(error: error.toString()));
    }
    // dartz: `unit` is the single value of type `Unit`
    return right(unit);
  }
}

You don't have to use a password, and in that case, it will save the db file as plain text. If you want to add a password later, it will encrypt it at that time.

General Store

After using this for a while in an app, I've realized that it needs to be able to store data apart from just FHIR resources, at least on occasion. For this, I've added a second class for all versions of the database called GeneralDao. This is similar to the ResourceDao, but with fewer options. So, in order to save something, it would look like this:

await GeneralDao().save('password', {'new':'map'});
await GeneralDao().save('password', {'new':'map'}, 'key');

The difference between these two options is that the first one will generate a key for the map being stored, while the second will store the map using the key provided. Both will return the key after successfully storing the map.

Other functions available include:

// deletes everything in the general store
await GeneralDao().deleteAllGeneral('password'); 

// delete specific entry
await GeneralDao().delete('password','key'); 

// returns map with that key
await GeneralDao().find('password', 'key'); 

FHIR® is a registered trademark of Health Level Seven International (HL7) and its use does not constitute an endorsement of products by HL7®

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add fhir_db

This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

dependencies:
  fhir_db: ^0.4.3

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:fhir_db/dstu2.dart';
import 'package:fhir_db/dstu2/fhir_db.dart';
import 'package:fhir_db/dstu2/general_dao.dart';
import 'package:fhir_db/dstu2/resource_dao.dart';
import 'package:fhir_db/encrypt/aes.dart';
import 'package:fhir_db/encrypt/salsa.dart';
import 'package:fhir_db/r4.dart';
import 'package:fhir_db/r4/fhir_db.dart';
import 'package:fhir_db/r4/general_dao.dart';
import 'package:fhir_db/r4/resource_dao.dart';
import 'package:fhir_db/r5.dart';
import 'package:fhir_db/r5/fhir_db.dart';
import 'package:fhir_db/r5/general_dao.dart';
import 'package:fhir_db/r5/resource_dao.dart';
import 'package:fhir_db/stu3.dart';
import 'package:fhir_db/stu3/fhir_db.dart';
import 'package:fhir_db/stu3/general_dao.dart';
import 'package:fhir_db/stu3/resource_dao.dart'; 

example/lib/main.dart

import 'package:fhir/r4.dart';
import 'package:fhir_db/r4.dart';
import 'package:flutter/material.dart';
import 'package:test/test.dart';

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();

  final resourceDao = ResourceDao();

  // await resourceDao.updatePw('newPw', null);
  await resourceDao.deleteAllResources(null);

  group('Playing with passwords', () {
    test('Playing with Passwords', () async {
      final patient = Patient(id: Id('1'));

      final saved = await resourceDao.save(null, patient);

      await resourceDao.updatePw(null, 'newPw');
      final search1 = await resourceDao.find('newPw',
          resourceType: R4ResourceType.Patient, id: Id('1'));
      expect(saved, search1[0]);

      await resourceDao.updatePw('newPw', 'newerPw');
      final search2 = await resourceDao.find('newerPw',
          resourceType: R4ResourceType.Patient, id: Id('1'));
      expect(saved, search2[0]);

      await resourceDao.updatePw('newerPw', null);
      final search3 = await resourceDao.find(null,
          resourceType: R4ResourceType.Patient, id: Id('1'));
      expect(saved, search3[0]);

      await resourceDao.deleteAllResources(null);
    });
  });

  final id = Id('12345');
  group('Saving Things:', () {
    test('Save Patient', () async {
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);
      final patient = Patient(id: id, name: [humanName]);
      final saved = await resourceDao.save(null, patient);

      expect(saved.id, id);

      expect((saved as Patient).name?[0], humanName);
    });

    test('Save Organization', () async {
      final organization = Organization(id: id, name: 'FhirFli');
      final saved = await resourceDao.save(null, organization);

      expect(saved.id, id);

      expect((saved as Organization).name, 'FhirFli');
    });

    test('Save Observation1', () async {
      final observation1 = Observation(
        id: Id('obs1'),
        code: CodeableConcept(text: 'Observation #1'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save(null, observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1');
    });

    test('Save Observation1 Again', () async {
      final observation1 = Observation(
          id: Id('obs1'),
          code: CodeableConcept(text: 'Observation #1 - Updated'));
      final saved = await resourceDao.save(null, observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1 - Updated');

      expect(saved.meta?.versionId, Id('2'));
    });

    test('Save Observation2', () async {
      final observation2 = Observation(
        id: Id('obs2'),
        code: CodeableConcept(text: 'Observation #2'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save(null, observation2);

      expect(saved.id, Id('obs2'));

      expect((saved as Observation).code.text, 'Observation #2');
    });

    test('Save Observation3', () async {
      final observation3 = Observation(
        id: Id('obs3'),
        code: CodeableConcept(text: 'Observation #3'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save(null, observation3);

      expect(saved.id, Id('obs3'));

      expect((saved as Observation).code.text, 'Observation #3');
    });
  });

  group('Finding Things:', () {
    test('Find 1st Patient', () async {
      final search = await resourceDao.find(null,
          resourceType: R4ResourceType.Patient, id: id);
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);

      expect(search.length, 1);

      expect((search[0] as Patient).name?[0], humanName);
    });

    test('Find 3rd Observation', () async {
      final search = await resourceDao.find(null,
          resourceType: R4ResourceType.Observation, id: Id('obs3'));

      expect(search.length, 1);

      expect(search[0].id, Id('obs3'));

      expect((search[0] as Observation).code.text, 'Observation #3');
    });

    test('Find All Observations', () async {
      final search = await resourceDao.getResourceType(
        null,
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 3);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), true);

      expect(idList.contains('obs3'), true);
    });

    test('Find All (non-historical) Resources', () async {
      final search = await resourceDao.getAll(null);

      expect(search.length, 5);
      final patList = search.toList();
      final orgList = search.toList();
      final obsList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);
      obsList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Observation);

      expect(patList.length, 1);

      expect(orgList.length, 1);

      expect(obsList.length, 3);
    });
  });

  group('Deleting Things:', () {
    test('Delete 2nd Observation', () async {
      await resourceDao.delete(
          null, null, R4ResourceType.Observation, Id('obs2'), null, null);

      final search = await resourceDao.getResourceType(
        null,
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 2);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), false);

      expect(idList.contains('obs3'), true);
    });

    test('Delete All Observations', () async {
      await resourceDao.deleteSingleType(null,
          resourceType: R4ResourceType.Observation);

      final search = await resourceDao.getAll(null);

      expect(search.length, 2);

      final patList = search.toList();
      final orgList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);

      expect(patList.length, 1);

      expect(orgList.length, 1);
    });

    test('Delete All Resources', () async {
      await resourceDao.deleteAllResources(null);

      final search = await resourceDao.getAll(null);

      expect(search.length, 0);
    });
  });

  group('Password - Saving Things:', () {
    test('Save Patient', () async {
      await resourceDao.updatePw(null, 'newPw');
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);
      final patient = Patient(id: id, name: [humanName]);
      final saved = await resourceDao.save('newPw', patient);

      expect(saved.id, id);

      expect((saved as Patient).name?[0], humanName);
    });

    test('Save Organization', () async {
      final organization = Organization(id: id, name: 'FhirFli');
      final saved = await resourceDao.save('newPw', organization);

      expect(saved.id, id);

      expect((saved as Organization).name, 'FhirFli');
    });

    test('Save Observation1', () async {
      final observation1 = Observation(
        id: Id('obs1'),
        code: CodeableConcept(text: 'Observation #1'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save('newPw', observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1');
    });

    test('Save Observation1 Again', () async {
      final observation1 = Observation(
          id: Id('obs1'),
          code: CodeableConcept(text: 'Observation #1 - Updated'));
      final saved = await resourceDao.save('newPw', observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1 - Updated');

      expect(saved.meta?.versionId, Id('2'));
    });

    test('Save Observation2', () async {
      final observation2 = Observation(
        id: Id('obs2'),
        code: CodeableConcept(text: 'Observation #2'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save('newPw', observation2);

      expect(saved.id, Id('obs2'));

      expect((saved as Observation).code.text, 'Observation #2');
    });

    test('Save Observation3', () async {
      final observation3 = Observation(
        id: Id('obs3'),
        code: CodeableConcept(text: 'Observation #3'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save('newPw', observation3);

      expect(saved.id, Id('obs3'));

      expect((saved as Observation).code.text, 'Observation #3');
    });
  });

  group('Password - Finding Things:', () {
    test('Find 1st Patient', () async {
      final search = await resourceDao.find('newPw',
          resourceType: R4ResourceType.Patient, id: id);
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);

      expect(search.length, 1);

      expect((search[0] as Patient).name?[0], humanName);
    });

    test('Find 3rd Observation', () async {
      final search = await resourceDao.find('newPw',
          resourceType: R4ResourceType.Observation, id: Id('obs3'));

      expect(search.length, 1);

      expect(search[0].id, Id('obs3'));

      expect((search[0] as Observation).code.text, 'Observation #3');
    });

    test('Find All Observations', () async {
      final search = await resourceDao.getResourceType(
        'newPw',
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 3);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), true);

      expect(idList.contains('obs3'), true);
    });

    test('Find All (non-historical) Resources', () async {
      final search = await resourceDao.getAll('newPw');

      expect(search.length, 5);
      final patList = search.toList();
      final orgList = search.toList();
      final obsList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);
      obsList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Observation);

      expect(patList.length, 1);

      expect(orgList.length, 1);

      expect(obsList.length, 3);
    });
  });

  group('Password - Deleting Things:', () {
    test('Delete 2nd Observation', () async {
      await resourceDao.delete(
          'newPw', null, R4ResourceType.Observation, Id('obs2'), null, null);

      final search = await resourceDao.getResourceType(
        'newPw',
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 2);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), false);

      expect(idList.contains('obs3'), true);
    });

    test('Delete All Observations', () async {
      await resourceDao.deleteSingleType('newPw',
          resourceType: R4ResourceType.Observation);

      final search = await resourceDao.getAll('newPw');

      expect(search.length, 2);

      final patList = search.toList();
      final orgList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);

      expect(patList.length, 1);

      expect(orgList.length, 1);
    });

    test('Delete All Resources', () async {
      await resourceDao.deleteAllResources('newPw');

      final search = await resourceDao.getAll('newPw');

      expect(search.length, 0);

      await resourceDao.updatePw('newPw', null);
    });
  });
} 

Download Details:

Author: MayJuun

Source Code: https://github.com/MayJuun/fhir/tree/main/fhir_db


Rufus Scheduler: Job Scheduler for Ruby (at, Cron, in and Every Jobs)

rufus-scheduler

Job scheduler for Ruby (at, cron, in and every jobs).

It uses threads.

Note: maybe you are looking for the README of rufus-scheduler 2.x? (especially if you're using Dashing, which is stuck on rufus-scheduler 2.0.24)

Quickstart:

# quickstart.rb

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

scheduler.in '3s' do
  puts 'Hello... Rufus'
end

scheduler.join
  #
  # let the current thread join the scheduler thread
  #
  # (please note that this join should be removed when scheduling
  # in a web application (Rails and friends) initializer)

(run with ruby quickstart.rb)

Various forms of scheduling are supported:

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

# ...

scheduler.in '10d' do
  # do something in 10 days
end

scheduler.at '2030/12/12 23:30:00' do
  # do something at a given point in time
end

scheduler.every '3h' do
  # do something every 3 hours
end
scheduler.every '3h10m' do
  # do something every 3 hours and 10 minutes
end

scheduler.cron '5 0 * * *' do
  # do something every day, five minutes after midnight
  # (see "man 5 crontab" in your terminal)
end

# ...

Rufus-scheduler uses fugit for parsing time strings, et-orbi for pairing time and tzinfo timezones.
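Duration strings like '3h10m' or '10d' reduce to a number of seconds. Below is a deliberately simplified, hypothetical parser that shows the idea. It is not fugit's implementation. It only handles integer amounts with w/d/h/m/s units, and it treats a bare number as seconds, matching the 3.0 behaviour of every('100').

```ruby
# Hypothetical, simplified duration parser (NOT fugit's implementation).
UNIT_SECONDS = {
  'w' => 7 * 24 * 3600,
  'd' => 24 * 3600,
  'h' => 3600,
  'm' => 60,
  's' => 1
}.freeze

def parse_duration(str)
  # A bare number like '100' is read as seconds.
  return Integer(str, 10) if str.match?(/\A\d+\z/)

  unless str.match?(/\A(\d+[wdhms])+\z/)
    raise ArgumentError, "not a duration: #{str.inspect}"
  end

  # Sum each "<amount><unit>" pair, e.g. '3h10m' -> 3 * 3600 + 10 * 60.
  str.scan(/(\d+)([wdhms])/).sum { |amount, unit| amount.to_i * UNIT_SECONDS[unit] }
end

parse_duration('3h10m') # => 11400
parse_duration('10d')   # => 864000
parse_duration('100')   # => 100
```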

non-features

Rufus-scheduler (out of the box) is an in-process, in-memory scheduler. It uses threads.

It does not persist your schedules. When the process is gone and the scheduler instance with it, the schedules are gone.

A rufus-scheduler instance will go on scheduling while it is present among the objects in a Ruby process. To make it stop scheduling you have to call its #shutdown method.

related and similar gems

  • Whenever - let cron call back your Ruby code, trusted and reliable cron drives your schedule
  • ruby-clock - a clock process / job scheduler for Ruby
  • Clockwork - rufus-scheduler inspired gem
  • Crono - an in-Rails cron scheduler
  • PerfectSched - highly available distributed cron built on Sequel and more

(please note: rufus-scheduler is not a cron replacement)

note about the 3.0 line

It's a complete rewrite of rufus-scheduler.

There is no EventMachine-based scheduler anymore.

I don't know what this Ruby thing is, where are my Rails?

I'll drive you right to the tracks.

notable changes:

  • As said, no more EventMachine-based scheduler
  • scheduler.every('100') { ... } will schedule every 100 seconds (previously, it would have been 0.1s). This aligns rufus-scheduler with Ruby's sleep(100)
  • The scheduler isn't catching the whole of Exception anymore, only StandardError
  • The error_handler is #on_error (instead of #on_exception), by default it now prints the details of the error to $stderr (used to be $stdout)
  • Rufus::Scheduler::TimeOutError renamed to Rufus::Scheduler::TimeoutError
  • Introduction of "interval" jobs. Whereas "every" jobs are like "every 10 minutes, do this", interval jobs are like "do that, then wait for 10 minutes, then do that again, and so on"
  • Introduction of a lockfile: true/filename mechanism to prevent multiple schedulers from executing
  • "discard_past" is on by default. If the scheduler (its host) sleeps for 1 hour and an every '10m' job is on, it will trigger once at wakeup, not 6 times (discard_past was false by default in rufus-scheduler 2.x). There is no intention to re-introduce discard_past: false in 3.0 for now.
  • Introduction of Scheduler #on_pre_trigger and #on_post_trigger callback points
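The discard_past arithmetic from the list above can be made concrete with a hypothetical helper. This is not rufus-scheduler's code. It counts how often an every job would fire when its host wakes up after missing slots.

```ruby
# Hypothetical helper (not rufus-scheduler's code): how many times an
# "every" job fires at wakeup after the host slept past its slots.
def triggers_on_wakeup(last_trigger, now, frequency, discard_past: true)
  missed = ((now - last_trigger) / frequency).floor
  return 0 if missed < 1

  # With discard_past, all missed slots collapse into a single trigger.
  discard_past ? 1 : missed
end

# An every '10m' job that last fired at t=0; the host sleeps for an hour.
triggers_on_wakeup(0, 3600, 600)                      # => 1
triggers_on_wakeup(0, 3600, 600, discard_past: false) # => 6
```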

getting help

So you need help. People can help you, but first help them help you, and don't waste their time. Provide a complete description of the issue. If it works on A but not on B and others have to ask you: "so what is different between A and B" you are wasting everyone's time.

"hello", "please" and "thanks" are not swear words.

Go read how to report bugs effectively, twice.

Update: help_help.md might help help you.

on Gitter

You can find help via chat at https://gitter.im/floraison/fugit. It's the combined chat room for fugit, et-orbi, and rufus-scheduler.

Please be courteous.

issues

Yes, issues can be reported in rufus-scheduler issues, I'd actually prefer bugs in there. If there is nothing wrong with rufus-scheduler, a Stack Overflow question is better.

faq

scheduling

Rufus-scheduler supports five kinds of jobs: in, at, every, interval, and cron jobs.

Most of the rufus-scheduler examples show block scheduling, but it's also OK to schedule handler instances or handler classes.

in, at, every, interval, cron

In and at jobs trigger once.

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

scheduler.in '10d' do
  puts "10 days reminder for review X!"
end

scheduler.at '2014/12/24 2000' do
  puts "merry xmas!"
end

In jobs are scheduled with a time interval and trigger once that interval has elapsed. At jobs are scheduled with a point in time and trigger when that point in time is reached (it's best to choose a point in the future).

Every, interval and cron jobs trigger repeatedly.

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

scheduler.every '3h' do
  puts "change the oil filter!"
end

scheduler.interval '2h' do
  puts "thinking..."
  puts sleep(rand * 1000)
  puts "thought."
end

scheduler.cron '00 09 * * *' do
  puts "it's 9am! good morning!"
end

Every jobs try hard to trigger following the frequency they were scheduled with.

Interval jobs trigger, execute, and then trigger again once the interval has elapsed. (Every jobs measure the time between trigger starts; interval jobs measure the time between the end of one run and the start of the next.)
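A small simulation makes the every/interval difference visible. The two functions below are hypothetical, not rufus-scheduler code. Each lists the start times (in seconds) of a job that takes `run_time` seconds to run.

```ruby
# Hypothetical simulation (not rufus-scheduler code) of trigger start times.
def every_starts(period, _run_time, count)
  # "every": starts are spaced by the period, whatever the run time
  # (runs may overlap when the run time exceeds the period).
  Array.new(count) { |i| i * period }
end

def interval_starts(period, run_time, count)
  # "interval": the next start is measured from the end of the previous run.
  Array.new(count) { |i| i * (period + run_time) }
end

every_starts(10, 3, 3)    # => [0, 10, 20]
interval_starts(10, 3, 3) # => [0, 13, 26]
```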

Cron jobs are based on the venerable cron utility (man 5 crontab). They trigger following a pattern given in (almost) the same language cron uses.
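To show how a five-field cron string maps onto a point in time, here is a hypothetical, heavily simplified matcher. It is not fugit's parser. It supports only '*', '*/n', plain numbers, and comma lists. It also ANDs the day-of-month and day-of-week fields, whereas real cron ORs them when both are restricted.

```ruby
# Hypothetical, simplified cron matcher (not fugit's implementation).
# Fields: minute hour day-of-month month day-of-week.
def field_match?(spec, value)
  spec.split(',').any? do |part|
    if part == '*'
      true
    elsif part.start_with?('*/')
      (value % Integer(part[2..-1], 10)).zero?
    else
      Integer(part, 10) == value # base 10 so '09' isn't read as octal
    end
  end
end

def cron_match?(cron, time)
  minute, hour, dom, month, dow = cron.split
  field_match?(minute, time.min) &&
    field_match?(hour, time.hour) &&
    field_match?(dom, time.day) &&
    field_match?(month, time.month) &&
    field_match?(dow, time.wday)
end

cron_match?('5 0 * * *', Time.new(2024, 1, 1, 0, 5))     # => true
cron_match?('00 09 * * *', Time.new(2024, 1, 1, 9, 0))   # => true
cron_match?('*/15 8 * * *', Time.new(2024, 1, 1, 8, 30)) # => true
```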

 

#schedule_x vs #x

schedule_in, schedule_at, schedule_cron, etc will return the new Job instance.

in, at, cron will return the new Job instance's id (a String).

job_id =
  scheduler.in '10d' do
    # ...
  end
job = scheduler.job(job_id)

# versus

job =
  scheduler.schedule_in '10d' do
    # ...
  end

# also

job =
  scheduler.in '10d', job: true do
    # ...
  end

#schedule and #repeat

Sometimes it pays to be less verbose.

The #schedule method schedules an at, in or cron job. It just decides based on its input. It returns the Job instance.

scheduler.schedule '10d' do; end.class
  # => Rufus::Scheduler::InJob

scheduler.schedule '2013/12/12 12:30' do; end.class
  # => Rufus::Scheduler::AtJob

scheduler.schedule '* * * * *' do; end.class
  # => Rufus::Scheduler::CronJob

The #repeat method schedules and returns an EveryJob or a CronJob.

scheduler.repeat '10d' do; end.class
  # => Rufus::Scheduler::EveryJob

scheduler.repeat '* * * * *' do; end.class
  # => Rufus::Scheduler::CronJob

(Yes, no combination here gives back an IntervalJob).
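The dispatch that #schedule and #repeat perform can be approximated by looking at the shape of the string. The classifier below is a hypothetical sketch, not rufus-scheduler's code. It treats 5 or 6 whitespace-separated fields as cron and a unit-suffixed number as a duration. Anything else it treats as a point in time.

```ruby
# Hypothetical classifier (not rufus-scheduler's code) mimicking how
# #schedule and #repeat could pick a job kind from their argument.
DURATION = /\A(\d+[wdhms])+\z/
CRON = /\A\s*\S+(\s+\S+){4,5}\s*\z/ # 5 or 6 whitespace-separated fields

def schedule_kind(str)
  return :cron if str.match?(CRON)
  return :in if str.match?(DURATION)

  :at # anything else is treated as a point in time
end

def repeat_kind(str)
  str.match?(CRON) ? :cron : :every
end

schedule_kind('10d')              # => :in
schedule_kind('2013/12/12 12:30') # => :at
schedule_kind('* * * * *')        # => :cron
repeat_kind('10d')                # => :every
```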

schedule blocks arguments (job, time)

A schedule block may be given 0, 1 or 2 arguments.

The first argument is "job", it's simply the Job instance involved. It might be useful if the job is to be unscheduled for some reason.

scheduler.every '10m' do |job|

  status = determine_pie_status

  if status == 'burnt' || status == 'cooked'
    stop_oven
    takeout_pie
    job.unschedule
  end
end

The second argument is "time", it's the time when the job got cleared for triggering (not Time.now).

Note that time is the time when the job got cleared for triggering. If there are mutexes involved, now = mutex_wait_time + time...

"every" jobs and changing the next_time in-flight

It's OK to change the next_time of an every job in-flight:

scheduler.every '10m' do |job|

  # ...

  status = determine_pie_status

  job.next_time = Time.now + 30 * 60 if status == 'burnt'
    #
    # if burnt, wait 30 minutes for the oven to cool a bit
end

This should work with cron jobs as well, but not with interval jobs, whose next_time is computed after their block ends its current run.

scheduling handler instances

It's OK to pass any object, as long as it responds to #call(), when scheduling:

class Handler
  def self.call(job, time)
    p "- Handler called for #{job.id} at #{time}"
  end
end

scheduler.in '10d', Handler

# or

class OtherHandler
  def initialize(name)
    @name = name
  end
  def call(job, time)
    p "* #{time} - Handler #{@name.inspect} called for #{job.id}"
  end
end

oh = OtherHandler.new('Doe')

scheduler.every '10m', oh
scheduler.in '3d5m', oh

The call method must accept 2 (job, time), 1 (job) or 0 arguments.

Note that time is the time when the job got cleared for triggering. If there are mutexes involved, now = mutex_wait_time + time...
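A scheduler can support blocks and handlers with 0, 1, or 2 parameters by checking their arity. The dispatcher below is a hypothetical sketch of that idea, not rufus-scheduler's code. It works for procs, lambdas, and objects or classes that define #call.

```ruby
# Hypothetical dispatcher (not rufus-scheduler's code): pass job and time
# only if the callable (block, lambda or handler object) accepts them.
def invoke(callable, job, time)
  arity =
    if callable.respond_to?(:arity)
      callable.arity # procs, lambdas, Method objects
    else
      callable.method(:call).arity # handler objects or classes with #call
    end

  case arity
  when 0 then callable.call
  when 1 then callable.call(job)
  else callable.call(job, time) # 2, or a variable arity such as -1
  end
end

invoke(proc { :no_args }, :job, :time)            # => :no_args
invoke(proc { |job| job }, :job, :time)           # => :job
invoke(proc { |job, t| [job, t] }, :job, :time)   # => [:job, :time]
```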

scheduling handler classes

One can pass a handler class to rufus-scheduler when scheduling. Rufus will instantiate it and that instance will be available via job#handler.

class MyHandler
  attr_reader :count
  def initialize
    @count = 0
  end
  def call(job)
    @count += 1
    puts ". #{self.class} called at #{Time.now} (#{@count})"
  end
end

job = scheduler.schedule_every '35m', MyHandler

job.handler
  # => #<MyHandler:0x000000021034f0>
job.handler.count
  # => 0

If you want to keep that "block feeling":

job_id =
  scheduler.every('10m', Class.new do
    # the parentheses make do...end bind to Class.new, not to #every
    def call(job)
      puts ". hello #{self.inspect} at #{Time.now}"
    end
  end)

pause and resume the scheduler

The scheduler can be paused via the #pause and #resume methods. One can determine if the scheduler is currently paused by calling #paused?.

While paused, the scheduler still accepts schedules, but no schedule will get triggered as long as #resume isn't called.
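The pause semantics above can be illustrated with a toy scheduler driven by explicit ticks. `MiniScheduler` is a hypothetical class, not rufus-scheduler's. In this sketch a job that comes due during the pause fires on the first tick after #resume. That is a property of the sketch only, not a claim about how each rufus job type behaves.

```ruby
# Hypothetical miniature scheduler (not rufus-scheduler's code): schedules
# are accepted while paused, but ticks trigger nothing until #resume.
class MiniScheduler
  def initialize
    @jobs = [] # [trigger_time, block] pairs
    @paused = false
  end

  def at(time, &block)
    @jobs << [time, block]
  end

  def pause
    @paused = true
  end

  def resume
    @paused = false
  end

  def paused?
    @paused
  end

  # Trigger every job whose time has come and return the blocks' results.
  def tick(now)
    return [] if @paused

    due, @jobs = @jobs.partition { |time, _| time <= now }
    due.map { |_, block| block.call }
  end
end

s = MiniScheduler.new
s.pause
s.at(1) { :fired }
s.tick(5) # => [] (paused: nothing triggers)
s.resume
s.tick(5) # => [:fired]
```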

job options

name: string

Sets the name of the job.

scheduler.cron '*/15 8 * * *', name: 'Robert' do |job|
  puts "A, it's #{Time.now} and my name is #{job.name}"
end

job1 =
  scheduler.schedule_cron '*/30 9 * * *', n: 'temporary' do |job|
    puts "B, it's #{Time.now} and my name is #{job.name}"
  end
# ...
job1.name = 'Beowulf'

blocking: true

By default, jobs are triggered in their own, new threads. When blocking: true, the job is triggered in the scheduler thread (a new thread is not created). Yes, while a blocking job is running, the scheduler is not scheduling.

overlap: false

Since, by default, jobs are triggered in their own new threads, job instances might overlap. For example, a job that takes 10 minutes and is scheduled every 7 minutes will have overlaps.

To prevent overlap, one can set overlap: false. Such a job will not trigger if one of its instances is already running.

The :overlap option is considered before the :mutex option when the scheduler is reviewing jobs for triggering.

mutex: mutex_instance / mutex_name / array of mutexes

When a job with a mutex triggers, the job's block is executed with the mutex around it, preventing other jobs with the same mutex from entering (it makes the other jobs wait until it exits the mutex).

This is different from overlap: false, which is, first, limited to instances of the same job, and, second, doesn't make the incoming job instance block/wait but makes it give up.

:mutex accepts a mutex instance or a mutex name (String). It also accepts an array of mutex names / mutex instances. It allows for complex relations between jobs.

Array of mutexes: original idea and implementation by Rainux Luo

Note: creating lots of different mutexes is OK. Rufus-scheduler will place them in its Scheduler#mutexes hash... And they won't get garbage collected.

The :overlap option is considered before the :mutex option when the scheduler is reviewing jobs for triggering.
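The difference between "give up" (overlap: false) and "wait" (:mutex) maps onto Ruby's Mutex#try_lock and Mutex#synchronize. The two helpers below are hypothetical sketches of that contrast, not rufus-scheduler's code.

```ruby
# Hypothetical helpers (not rufus-scheduler's code).

# overlap: false style: give up immediately if a run is already in progress.
def run_unless_running(guard)
  return :skipped unless guard.try_lock

  begin
    yield
  ensure
    guard.unlock
  end
end

# :mutex style: wait until the mutex is free, then run.
def run_with_mutex(mutex, &block)
  mutex.synchronize(&block)
end

guard = Mutex.new
guard.lock                         # simulate a run in progress
run_unless_running(guard) { :ran } # => :skipped
guard.unlock
run_unless_running(guard) { :ran } # => :ran
```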

timeout: duration or point in time

It's OK to specify a timeout when scheduling some work. After the time specified, it gets interrupted via a Rufus::Scheduler::TimeoutError.

scheduler.in '10d', timeout: '1d' do
  begin
    # ... do something
  rescue Rufus::Scheduler::TimeoutError
    # ... that something got interrupted after 1 day
  end
end

The :timeout option accepts either a duration (like "1d" or "2w3d") or a point in time (like "2013/12/12 12:00").
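Ruby's standard Timeout module provides a comparable way to interrupt a block with a chosen exception class. The sketch below only shows the interrupt-then-rescue pattern. It does not necessarily reflect how rufus-scheduler implements :timeout. WorkTimeoutError and run_with_timeout are hypothetical names, with WorkTimeoutError standing in for Rufus::Scheduler::TimeoutError.

```ruby
require 'timeout'

# Hypothetical stand-in for Rufus::Scheduler::TimeoutError.
class WorkTimeoutError < StandardError; end

# Runs the given block, interrupting it with WorkTimeoutError after `seconds`.
def run_with_timeout(seconds)
  Timeout.timeout(seconds, WorkTimeoutError) do
    yield
    :completed
  end
rescue WorkTimeoutError
  :interrupted # "that something got interrupted"
end

fast = run_with_timeout(1) { 1 + 1 }
slow = run_with_timeout(0.05) { sleep 1 }
p fast # => :completed
p slow # => :interrupted
```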

:first_at, :first_in, :first, :first_time

This option is for repeat jobs (cron / every) only.

It's used to specify the point in time at which the repeat job should trigger for the first time.

In the case of an "every" job, this will be the first time (modulo the scheduler frequency) the job triggers. For a "cron" job as well, the :first will point to the first time the job has to trigger, the following trigger times are then determined by the cron string.

scheduler.every '2d', first_at: Time.now + 10 * 3600 do
  # ... every two days, but start in 10 hours
end

scheduler.every '2d', first_in: '10h' do
  # ... every two days, but start in 10 hours
end

scheduler.cron '00 14 * * *', first_in: '3d' do
  # ... every day at 14h00, but start after 3 * 24 hours
end

:first, :first_at and :first_in all accept a point in time or a duration (number or time string). Use the symbol you think makes your schedule more readable.

Note: it's OK to change the first_at (a Time instance) directly:

job.first_at = Time.now + 10
job.first_at = Rufus::Scheduler.parse('2029-12-12')

The :first option (in all its flavours: :first, :first_at, :first_in, :first_time) also accepts a :now or :immediately value. That schedules the first occurrence for immediate triggering. Consider:

require 'rufus-scheduler'

s = Rufus::Scheduler.new

n = Time.now; p [ :scheduled_at, n, n.to_f ]

s.every '3s', first: :now do
  n = Time.now; p [ :in, n, n.to_f ]
end

s.join

that'll output something like:

[:scheduled_at, 2014-01-22 22:21:21 +0900, 1390396881.344438]
[:in, 2014-01-22 22:21:21 +0900, 1390396881.6453865]
[:in, 2014-01-22 22:21:24 +0900, 1390396884.648807]
[:in, 2014-01-22 22:21:27 +0900, 1390396887.651686]
[:in, 2014-01-22 22:21:30 +0900, 1390396890.6571937]
...

:last_at, :last_in, :last

This option is for repeat jobs (cron / every) only.

It indicates the point in time after which the job should unschedule itself.

scheduler.cron '5 23 * * *', last_in: '10d' do
  # ... do something every evening at 23:05 for 10 days
end

scheduler.every '10m', last_at: Time.now + 10 * 3600 do
  # ... do something every 10 minutes for 10 hours
end

scheduler.every '10m', last_in: 10 * 3600 do
  # ... do something every 10 minutes for 10 hours
end

:last, :last_at and :last_in all accept a point in time or a duration (number or time string). Use the symbol you think makes your schedule more readable.

Note: it's OK to change the last_at (nil or a Time instance) directly:

job.last_at = nil
  # remove the "last" bound

job.last_at = Rufus::Scheduler.parse('2029-12-12')
  # set the last bound

times: nb of times (before auto-unscheduling)

One can tell how many times a repeat job (CronJob or EveryJob) is to execute before unscheduling by itself.

scheduler.every '2d', times: 10 do
  # ... do something every two days, but not more than 10 times
end

scheduler.cron '0 23 * * *', times: 31 do
  # ... do something every day at 23:00 but do it no more than 31 times
end

It's OK to assign nil to :times to make sure the repeat job is not limited. It's useful when the :times is determined at scheduling time.

scheduler.cron '0 23 * * *', times: (nolimit ? nil : 10) do
  # ...
end

The value set by :times is accessible in the job. It can be modified anytime.

job =
  scheduler.cron '0 23 * * *' do
    # ...
  end

# later on...

job.times = 10
  # 10 days and it will be over
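The way :first_in, :last_in and :times bound a repeat job can be modelled as a list of trigger offsets. This is a simplified sketch, not rufus-scheduler code. every_offsets is a hypothetical helper. It assumes that an "every" job first triggers one period after scheduling unless :first_in says otherwise, and that the :last_in bound is inclusive.

```ruby
# Simplified model (not rufus-scheduler code): offsets, in seconds from
# scheduling time, at which an "every" job would trigger.
#   first_in - offset of the first trigger (assumed default: one period)
#   last_in  - no trigger after this offset (assumed inclusive here)
#   times    - maximum number of triggers
def every_offsets(frequency, first_in: frequency, last_in: nil, times: nil)
  raise ArgumentError, 'need last_in or times to bound the list' unless last_in || times

  offsets = []
  t = first_in
  until (last_in && t > last_in) || (times && offsets.size >= times)
    offsets << t
    t += frequency
  end
  offsets
end

day = 24 * 3600

# every '2d', first_in: '10h', times: 3
p every_offsets(2 * day, first_in: 10 * 3600, times: 3) # => [36000, 208800, 381600]

# every '10m', last_in: '1h'
p every_offsets(600, last_in: 3600).size # => 6
```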

Job methods

When calling one of the short schedule methods (#in, #at, #every, #interval, #cron), the id (String) of the job is returned. The longer schedule methods (#schedule_in, #schedule_every, ...) return Job instances directly. Calling the short schedule methods with the job: true option also returns Job instances instead of job ids (Strings).

  require 'rufus-scheduler'

  scheduler = Rufus::Scheduler.new

  job_id =
    scheduler.in '10d' do
      # ...
    end

  job =
    scheduler.schedule_in '1w' do
      # ...
    end

  job =
    scheduler.in '1w', job: true do
      # ...
    end

Those Job instances have a few interesting methods / properties:

id, job_id

Returns the job id.

job = scheduler.schedule_in('10d') do; end
job.id
  # => "in_1374072446.8923042_0.0_0"

scheduler

Returns the scheduler instance itself.

opts

Returns the options passed at the Job creation.

job = scheduler.schedule_in('10d', tag: 'hello') do; end
job.opts
  # => { :tag => 'hello' }

original

Returns the original schedule.

job = scheduler.schedule_in('10d', tag: 'hello') do; end
job.original
  # => '10d'

callable, handler

callable() returns the scheduled block (or the call method of the callable object passed in lieu of a block)

handler() returns nil if a block was scheduled and the instance scheduled otherwise.

# when passing a block

job =
  scheduler.schedule_in('10d') do
    # ...
  end

job.handler
  # => nil
job.callable
  # => #<Proc:0x00000001dc6f58@/home/jmettraux/whatever.rb:115>

and

# when passing something other than a block

class MyHandler
  attr_reader :counter
  def initialize
    @counter = 0
  end
  def call(job, time)
    @counter = @counter + 1
  end
end

job = scheduler.schedule_in('10d', MyHandler.new)

job.handler
  # => #<MyHandler:0x0000000163ae88 @counter=0>
job.callable
  # => #<Method: MyHandler#call>

source_location

Added to rufus-scheduler 3.8.0.

Returns the array [ 'path/to/file.rb', 123 ] like Proc#source_location does.

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

job = scheduler.schedule_every('2h') { p Time.now }

p job.source_location
  # => [ '/home/jmettraux/rufus-scheduler/test.rb', 6 ]

scheduled_at

Returns the Time instance when the job got created.

job = scheduler.schedule_in('10d', tag: 'hello') do; end
job.scheduled_at
  # => 2013-07-17 23:48:54 +0900

last_time

Returns the last time the job triggered (is usually nil for AtJob and InJob).

job = scheduler.schedule_every('10s') do; end

job.scheduled_at
  # => 2013-07-17 23:48:54 +0900
job.last_time
  # => nil (since we've just scheduled it)

# after 10 seconds

job.scheduled_at
  # => 2013-07-17 23:48:54 +0900 (same as above)
job.last_time
  # => 2013-07-17 23:49:04 +0900

previous_time

Returns the previous #next_time

scheduler.every('10s') do |job|
  puts "job scheduled for #{job.previous_time} triggered at #{Time.now}"
  puts "next time will be around #{job.next_time}"
  puts "."
end

last_work_time, mean_work_time

The job keeps track of how long its work was in the last_work_time attribute. For a one time job (in, at) it's probably not very useful.

The attribute mean_work_time contains a computed mean work time. It's recomputed after every run (if it's a repeat job).

next_times(n)

Returns an array of EtOrbi::EoTime instances (Time instances with a designated time zone), listing the n next occurrences for this job.

Please note that for "interval" jobs, a mean work time is computed each time and it's used by this #next_times(n) method to approximate the next times beyond the immediate next time.
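The relation between last_work_time, mean_work_time and the approximated next times of an interval job can be sketched as follows. This is a simplified model under stated assumptions, not the library's code. WorkStats is a hypothetical class. It assumes the mean is a plain running average and that successive interval starts are spaced by "interval + mean work time".

```ruby
# Simplified model: tracks last/mean work time and uses the mean to
# project the next trigger times of an interval job.
class WorkStats
  attr_reader :last_work_time, :mean_work_time, :runs

  def initialize
    @runs = 0
    @mean_work_time = 0.0
    @last_work_time = nil
  end

  # Record the duration (in seconds) of one finished run.
  def record(duration)
    @runs += 1
    @last_work_time = duration
    # Incremental mean: no need to keep every past duration around.
    @mean_work_time += (duration - @mean_work_time) / @runs
  end

  # An interval job starts `interval` seconds after the previous run ends,
  # so successive starts are spaced by roughly interval + mean work time.
  def next_times(next_time, interval, n)
    Array.new(n) { |i| next_time + i * (interval + @mean_work_time) }
  end
end

stats = WorkStats.new
[1.0, 3.0].each { |d| stats.record(d) }
p stats.mean_work_time         # => 2.0
p stats.next_times(100, 10, 3) # => [100.0, 112.0, 124.0]
```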

unschedule

Unschedules the job, preventing it from firing again and removing it from the schedule. This doesn't stop a thread that is currently running this job: that run goes on until its end.

threads

Returns the list of threads currently "hosting" runs of this Job instance.

kill

Interrupts all the work threads currently running for this job instance. They discard their work and are free for their next run (of whatever job).

Note: this doesn't unschedule the Job instance.

Note: if the job is pooled for another run, a free work thread will probably pick up that next run and the job will appear as running again. You'd have to unschedule and kill to make sure the job doesn't run again.

running?

Returns true if there is at least one running Thread hosting a run of this Job instance.

scheduled?

Returns true if the job is scheduled (is due to trigger). For repeat jobs it should return true until the job gets unscheduled. "at" and "in" jobs will respond with false as soon as they start running (execution triggered).

pause, resume, paused?, paused_at

These four methods are only available to CronJob, EveryJob and IntervalJob instances. One can pause or resume such jobs thanks to these methods.

job =
  scheduler.schedule_every('10s') do
    # ...
  end

job.pause
  # => 2013-07-20 01:22:22 +0900
job.paused?
  # => true
job.paused_at
  # => 2013-07-20 01:22:22 +0900

job.resume
  # => nil

tags

Returns the list of tags attached to this Job instance.

By default, returns an empty array.

job = scheduler.schedule_in('10d') do; end
job.tags
  # => []

job = scheduler.schedule_in('10d', tag: 'hello') do; end
job.tags
  # => [ 'hello' ]

[]=, [], key?, has_key?, keys, values, and entries

Just as threads have thread-local variables, rufus-scheduler jobs have job-local variables. They behave more like a Hash with thread-safe access.

job =
  @scheduler.schedule_every '1s' do |job|
    job[:timestamp] = Time.now.to_f
    job[:counter] ||= 0
    job[:counter] += 1
  end

sleep 3.6

job[:counter]
  # => 3

job.key?(:timestamp) # => true
job.has_key?(:timestamp) # => true
job.keys # => [ :timestamp, :counter ]

Locals can be set at schedule time:

job0 =
  @scheduler.schedule_cron '*/15 12 * * *', locals: { a: 0 } do
    # ...
  end
job1 =
  @scheduler.schedule_cron '*/15 13 * * *', l: { a: 1 } do
    # ...
  end

One can fetch the Hash directly with Job#locals. Of course, direct manipulation is not thread-safe.

job.locals.each do |k, v|
  p "#{k}: #{v}"
end
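A job-local store with thread-safe access can be sketched as a Hash guarded by a Mutex. JobLocals and its update helper are hypothetical and do not come from rufus-scheduler. Note that an expression like job[:counter] += 1 performs a read and a write as two separate calls. In this sketch, update does the read-modify-write under a single lock acquisition, so concurrent increments are not lost.

```ruby
# Minimal sketch of a job-local store: a Hash guarded by a Mutex so that
# concurrent runs of the same job can read and write it safely.
class JobLocals
  def initialize(initial = {})
    @mutex = Mutex.new
    @hash = initial.dup
  end

  def [](key)
    @mutex.synchronize { @hash[key] }
  end

  def []=(key, value)
    @mutex.synchronize { @hash[key] = value }
  end

  def key?(key)
    @mutex.synchronize { @hash.key?(key) }
  end
  alias has_key? key?

  def keys
    @mutex.synchronize { @hash.keys }
  end

  # Read-modify-write under one lock acquisition.
  def update(key, default)
    @mutex.synchronize { @hash[key] = yield(@hash.fetch(key, default)) }
  end
end

locals = JobLocals.new(a: 0)
threads = Array.new(10) do
  Thread.new { 100.times { locals.update(:counter, 0) { |c| c + 1 } } }
end
threads.each(&:join)
p locals[:counter] # => 1000
```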

call

Job instances have a #call method. It simply calls the scheduled block or callable immediately.

job =
  @scheduler.schedule_every '10m' do |job|
    # ...
  end

job.call

Warning: the Scheduler#on_error handler is not involved. Error handling is the responsibility of the caller.

If the call has to be rescued by the error handler of the scheduler, call(true) might help:

require 'rufus-scheduler'

s = Rufus::Scheduler.new

def s.on_error(job, err)
  if job
    p [ 'error in scheduled job', job.class, job.original, err.message ]
  else
    p [ 'error while scheduling', err.message ]
  end
rescue
  p $!
end

job =
  s.schedule_in('1d') do
    fail 'again'
  end

job.call(true)
  #
  # true lets the error_handler deal with error in the job call

AtJob and InJob methods

time

Returns when the job will trigger (hopefully).

next_time

An alias for time.

EveryJob, IntervalJob and CronJob methods

next_time

Returns the next time the job will trigger (hopefully).

count

Returns how many times the job fired.

EveryJob methods

frequency

It returns the scheduling frequency. For a job scheduled "every 20s", it's 20.

It's used to determine if the job frequency is higher than the scheduler frequency (it raises an ArgumentError if that is the case).

IntervalJob methods

interval

Returns the interval scheduled between each execution of the job.

Every jobs use a time duration between each start of their execution, while interval jobs use a time duration between the end of an execution and the start of the next.
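The every / interval distinction can be made concrete by computing start offsets for a few runs of known duration. This is a simplified model that ignores scheduler frequency and thread availability. every_starts and interval_starts are hypothetical helpers.

```ruby
# Simplified model: start offsets (in seconds) of successive runs, given
# the duration of each run.
#   every:    starts are spaced by the period, whatever the work takes
#   interval: the next start comes `period` seconds after the previous end
def every_starts(period, durations)
  durations.each_index.map { |i| i * period }
end

def interval_starts(period, durations)
  start = 0
  durations.map do |d|
    current = start
    start += d + period
    current
  end
end

durations = [3, 3, 3]
p every_starts(10, durations)    # => [0, 10, 20]
p interval_starts(10, durations) # => [0, 13, 26]
```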

CronJob methods

brute_frequency

An expensive method to run, it's brute. It caches its results. By default it runs for 2017 (a non-leap year).

  require 'rufus-scheduler'

  Rufus::Scheduler.parse('* * * * *').brute_frequency
    #
    # => #<Fugit::Cron::Frequency:0x00007fdf4520c5e8
    #      @span=31536000.0, @delta_min=60, @delta_max=60,
    #      @occurrences=525600, @span_years=1.0, @yearly_occurrences=525600.0>
      #
      # Occurs 525600 times in a span of 1 year (2017) and 1 day.
      # There are at least 60 seconds between "triggers" and at most 60 seconds.

  Rufus::Scheduler.parse('0 12 * * *').brute_frequency
    # => #<Fugit::Cron::Frequency:0x00007fdf451ec6d0
    #      @span=31536000.0, @delta_min=86400, @delta_max=86400,
    #      @occurrences=365, @span_years=1.0, @yearly_occurrences=365.0>
  Rufus::Scheduler.parse('0 12 * * *').brute_frequency.to_debug_s
    # => "dmin: 1D, dmax: 1D, ocs: 365, spn: 52W1D, spnys: 1, yocs: 365"
      #
      # 365 occurrences, at most 1 day between each, at least 1 day.

The CronJob#frequency method found in rufus-scheduler < 3.5 has been retired.

looking up jobs

Scheduler#job(job_id)

The scheduler #job(job_id) method can be used to look up Job instances.

  require 'rufus-scheduler'

  scheduler = Rufus::Scheduler.new

  job_id =
    scheduler.in '10d' do
      # ...
    end

  # later on...

  job = scheduler.job(job_id)

Scheduler #jobs #at_jobs #in_jobs #every_jobs #interval_jobs and #cron_jobs

These methods look up lists of scheduled Job instances.

Here is an example:

  #
  # let's unschedule all the at jobs

  scheduler.at_jobs.each(&:unschedule)

Scheduler#jobs(tag: / tags: x)

When scheduling a job, one can specify one or more tags attached to the job. These can be used to look up the job later on.

  scheduler.in '10d', tag: 'main_process' do
    # ...
  end
  scheduler.in '10d', tags: [ 'main_process', 'side_dish' ] do
    # ...
  end

  # ...

  jobs = scheduler.jobs(tag: 'main_process')
    # find all the jobs with the 'main_process' tag

  jobs = scheduler.jobs(tags: [ 'main_process', 'side_dish' ])
    # find all the jobs with the 'main_process' AND 'side_dish' tags

Scheduler#running_jobs

Returns the list of Job instances that have currently running instances.

Whereas the other "_jobs" methods scan the scheduled job list, this method scans the thread list to find the jobs. It thus includes jobs that are running but are not scheduled anymore (that happens for at and in jobs).

misc Scheduler methods

Scheduler#unschedule(job_or_job_id)

Unschedules a job, given directly or by its id.

Scheduler#shutdown

Shuts down the scheduler, ceases any scheduler/triggering activity.

Scheduler#shutdown(:wait)

Shuts down the scheduler, waits (blocks) until all the jobs cease running.

Scheduler#shutdown(wait: n)

Shuts down the scheduler, waits (blocks) at most n seconds until all the jobs cease running. (Jobs are killed after n seconds have elapsed).
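The "wait at most n seconds, then kill" behaviour can be sketched with Thread#join and its timeout argument. This is a simplified model, not the scheduler's shutdown code. shutdown_with_wait is a hypothetical helper, and the two threads stand in for work threads.

```ruby
# Simplified model of shutdown(wait: n): wait at most `wait` seconds in
# total for the work threads, then kill whatever is still running.
def shutdown_with_wait(threads, wait)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + wait
  threads.each do |t|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    t.join([remaining, 0].max) # join(0) only checks, it doesn't block
  end
  stragglers = threads.select(&:alive?)
  stragglers.each(&:kill)
  stragglers.each(&:join) # make sure the killed threads are gone
  stragglers.size
end

quick = Thread.new { sleep 0.05 }
slow  = Thread.new { sleep 10 }
killed = shutdown_with_wait([quick, slow], 0.5)
p killed # => 1
```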

Scheduler#shutdown(:kill)

Kills all the job threads and then shuts the scheduler down. Radical.

Scheduler#down?

Returns true if the scheduler has been shut down.

Scheduler#started_at

Returns the Time instance at which the scheduler got started.

Scheduler #uptime / #uptime_s

Returns the count of seconds for which the scheduler has been running.

#uptime_s returns this count in a String easier to grasp for humans, like "3d12m45s123".

Scheduler#join

Lets the current thread join the scheduling thread in rufus-scheduler. The thread comes back when the scheduler gets shut down.

#join is mostly used in standalone scheduling scripts (or tiny one-file examples). Calling #join from a web application initializer will probably hijack the main thread and prevent the web application from being served. Do not put a #join in such a web application initializer file.

Scheduler#threads

Returns all the threads associated with the scheduler, including the scheduler thread itself.

Scheduler#work_threads(query=:all/:active/:vacant)

Lists the work threads associated with the scheduler. The query option defaults to :all.

  • :all : all the work threads
  • :active : all the work threads currently running a Job
  • :vacant : all the work threads currently not running a Job

Note that the main scheduler thread will be returned if it is currently running a Job (i.e. one of those blocking: true jobs).

Scheduler#scheduled?(job_or_job_id)

Returns true if the arg is a currently scheduled job (see Job#scheduled?).

Scheduler#occurrences(time0, time1)

Returns a hash { job => [ t0, t1, ... ] } mapping jobs to their potential trigger time within the [ time0, time1 ] span.

Please note that, for interval jobs, the #mean_work_time is used, so the result is only a prediction.

Scheduler#timeline(time0, time1)

Like #occurrences but returns a list [ [ t0, job0 ], [ t1, job1 ], ... ] of time + job pairs.
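The relation between the two return shapes can be shown with plain Ruby collections. This is a sketch only: to_timeline is a hypothetical helper, strings stand in for Job instances, and integers stand in for times.

```ruby
# Sketch: turn an occurrences-style Hash { job => [ t0, t1, ... ] } into a
# timeline-style Array [ [ t, job ], ... ] sorted by time.
def to_timeline(occurrences)
  occurrences
    .flat_map { |job, times| times.map { |t| [t, job] } }
    .sort_by(&:first)
end

occurrences = { 'job_a' => [30, 10], 'job_b' => [20] }
p to_timeline(occurrences) # => [[10, "job_a"], [20, "job_b"], [30, "job_a"]]
```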

dealing with job errors

The easy, job-granular way of dealing with errors is to rescue and deal with them immediately. The next two sections show examples. Skip them for explanations on how to deal with errors at the scheduler level.

block jobs

As said, jobs could take care of their errors themselves.

scheduler.every '10m' do
  begin
    # do something that might fail...
  rescue => e
    $stderr.puts '-' * 80
    $stderr.puts e.message
    $stderr.puts e.backtrace
    $stderr.puts '-' * 80
  end
end

callable jobs

Jobs are not limited to blocks. Here is how the above would look with a dedicated class:

scheduler.every '10m', Class.new {
  def call(job)
    # do something that might fail...
  rescue => e
    $stderr.puts '-' * 80
    $stderr.puts e.message
    $stderr.puts e.backtrace
    $stderr.puts '-' * 80
  end
}

TODO: talk about callable#on_error (if implemented)

(see scheduling handler instances and scheduling handler classes for more about those "callable jobs")

Rufus::Scheduler#stderr=

By default, rufus-scheduler intercepts all errors (that inherit from StandardError) and dumps abundant details to $stderr.

If, for example, you'd like to divert that flow to another file (descriptor), you can reassign $stderr for the current Ruby process:

$stderr = File.open('/var/log/myapplication.log', 'ab')

or you can limit that reassignment to the scheduler itself:

scheduler.stderr = File.open('/var/log/myapplication.log', 'ab')

Rufus::Scheduler#on_error(job, error)

We've just seen that, by default, rufus-scheduler dumps error information to $stderr. If one needs to completely change what happens in case of error, it's OK to override #on_error:

require 'logger'

LOGGER = Logger.new($stderr) # or whichever logger your application uses

def scheduler.on_error(job, error)

  LOGGER.warn("intercepted error in #{job.id}: #{error.message}")
end

On Rails, the on_error method redefinition might look like:

def scheduler.on_error(job, error)

  Rails.logger.error(
    "err#{error.object_id} rufus-scheduler intercepted #{error.inspect}" +
    " in job #{job.inspect}")
  error.backtrace.each_with_index do |line, i|
    Rails.logger.error(
      "err#{error.object_id} #{i}: #{line}")
  end
end

Callbacks

Rufus::Scheduler #on_pre_trigger and #on_post_trigger callbacks

One can bind callbacks before and after jobs trigger:

s = Rufus::Scheduler.new

def s.on_pre_trigger(job, trigger_time)
  puts "triggering job #{job.id}..."
end

def s.on_post_trigger(job, trigger_time)
  puts "triggered job #{job.id}."
end

s.every '1s' do
  # ...
end

The trigger_time is the time at which the job triggers. It might be a bit before Time.now.

Warning: these two callbacks are executed in the scheduler thread, not in the work threads (the threads where the job execution really happens).

Rufus::Scheduler#around_trigger

One can create an around callback which will wrap a job:

def s.around_trigger(job)
  t = Time.now
  puts "Starting job #{job.id}..."
  yield
  puts "job #{job.id} finished in #{Time.now-t} seconds."
end

The around callback is executed in the work thread, around the job's own execution.

Rufus::Scheduler#on_pre_trigger as a guard

Returning false in on_pre_trigger will prevent the job from triggering. Returning anything else (nil, -1, true, ...) will let the job trigger.

Note: your business logic should go in the scheduled block itself (or the scheduled instance). Don't put business logic in on_pre_trigger. Return false for admin reasons (backend down, etc), not for business reasons that are tied to the job itself.

def s.on_pre_trigger(job, trigger_time)

  return false if Backend.down?

  puts "triggering job #{job.id}..."
end
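The trigger sequence described in this section and in the advanced lock schemes (#confirm_lock, then #on_pre_trigger as a guard, then #around_trigger wrapping the work) can be modelled in a few lines. This is a simplified sketch, not rufus-scheduler's code. TriggerModel and backend_down are hypothetical names, and the callbacks receive a job name instead of a Job and a trigger time.

```ruby
# Simplified model of the trigger sequence: confirm_lock, then
# on_pre_trigger (a false return vetoes the trigger), then around_trigger
# wrapping the work.
class TriggerModel
  attr_accessor :backend_down
  attr_reader :log

  def initialize
    @backend_down = false
    @log = []
  end

  def confirm_lock
    true
  end

  def on_pre_trigger(job_name)
    return false if backend_down # admin reason, not business logic
    @log << "pre #{job_name}"
  end

  def around_trigger(job_name)
    @log << "start #{job_name}"
    yield
    @log << "end #{job_name}"
  end

  def trigger(job_name, &work)
    return :not_locked unless confirm_lock
    return :vetoed if on_pre_trigger(job_name) == false
    around_trigger(job_name, &work)
    :triggered
  end
end

s = TriggerModel.new
r1 = s.trigger('report') { s.log << 'work' }
s.backend_down = true
r2 = s.trigger('report') { s.log << 'work' }
p [r1, r2] # => [:triggered, :vetoed]
p s.log    # => ["pre report", "start report", "work", "end report"]
```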

Rufus::Scheduler.new options

:frequency

By default, rufus-scheduler sleeps 0.300 second between every step. At each step it checks for jobs to trigger and so on.

The :frequency option lets you change that 0.300 second to something else.

scheduler = Rufus::Scheduler.new(frequency: 5)

It's OK to use a time string to specify the frequency.

scheduler = Rufus::Scheduler.new(frequency: '2h10m')
  # this scheduler will sleep 2 hours and 10 minutes between every "step"

Use with care.

lockfile: "mylockfile.txt"

This feature only works on OSes that support the flock (man 2 flock) call.

Starting the scheduler with lockfile: '.rufus-scheduler.lock' will make the scheduler attempt to create and lock the file .rufus-scheduler.lock in the current working directory. If that fails, the scheduler will not start.

The idea is to guarantee only one scheduler (in a group of schedulers sharing the same lockfile) is running.

This is useful in environments where the Ruby process holding the scheduler gets started multiple times.

If the lockfile mechanism here is not sufficient, you can plug your custom mechanism. It's explained in advanced lock schemes below.
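The lockfile idea relies on an exclusive, non-blocking flock. Ruby exposes flock through File#flock, so the mechanism can be sketched with the standard library. This is a minimal sketch, not the scheduler's own lockfile code. try_lockfile is a hypothetical helper, and the sketch assumes an OS where flock locks from two separate opens conflict (as on Linux and macOS).

```ruby
require 'tmpdir'

# Minimal sketch of a lockfile guard using File#flock (see man 2 flock).
# Returns the open File (keep it open to hold the lock) or nil if someone
# else already holds an exclusive lock on the path.
def try_lockfile(path)
  file = File.open(path, File::RDWR | File::CREAT, 0o644)
  if file.flock(File::LOCK_EX | File::LOCK_NB) # 0 on success, false if taken
    file
  else
    file.close
    nil
  end
end

path = File.join(Dir.tmpdir, "rufus-sketch-#{Process.pid}.lock")
first  = try_lockfile(path) # acquires the lock
second = try_lockfile(path) # fails while `first` holds it
p [first.nil?, second.nil?] # => [false, true]
first.close                 # closing the file releases the lock
File.delete(path)
```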

:scheduler_lock

(since rufus-scheduler 3.0.9)

The scheduler lock is an object that responds to #lock and #unlock. The scheduler calls #lock when starting up. If the answer is false, the scheduler stops its initialization work and won't schedule anything.

Here is a sample of a scheduler lock that only lets the scheduler on host "coffee.example.com" start:

class HostLock
  def initialize(lock_name)
    @lock_name = lock_name
  end
  def lock
    @lock_name == `hostname -f`.strip
  end
  def unlock
    true
  end
end

scheduler =
  Rufus::Scheduler.new(scheduler_lock: HostLock.new('coffee.example.com'))

By default, the scheduler_lock is an instance of Rufus::Scheduler::NullLock, with a #lock that returns true.

:trigger_lock

(since rufus-scheduler 3.0.9)

The trigger lock is an object that responds to #lock. The scheduler calls that method on the trigger lock right before triggering any job. If the answer is false, the trigger doesn't happen and the job is not run (at least not by this scheduler).

Here is a (stupid) PingLock example, it'll only trigger if an "other host" is not responding to ping. Do not use that in production, you don't want to fork a ping process for each trigger attempt...

class PingLock
  def initialize(other_host)
    @other_host = other_host
  end
  def lock
    ! system("ping -c 1 #{@other_host}")
  end
end

scheduler =
  Rufus::Scheduler.new(trigger_lock: PingLock.new('main.example.com'))

By default, the trigger_lock is an instance of Rufus::Scheduler::NullLock, with a #lock that always returns true.

As explained in advanced lock schemes, another way to tune that behaviour is by overriding the scheduler's #confirm_lock method. (You could also do that with an #on_pre_trigger callback).

:max_work_threads

In rufus-scheduler 2.x, by default, each job triggering received its own, brand new, thread of execution. In rufus-scheduler 3.x, execution happens in a pooled work thread. The max work thread count (the pool size) defaults to 28.

One can set this maximum value when starting the scheduler.

scheduler = Rufus::Scheduler.new(max_work_threads: 77)

It's OK to increase the :max_work_threads of a running scheduler.

scheduler.max_work_threads += 10
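A pooled work-thread model caps how many job runs execute concurrently. The sketch below shows that cap with a Queue and a fixed number of threads. It is a simplification: WorkPool is a hypothetical class, and rufus-scheduler's own pool may create and reuse threads differently.

```ruby
# Minimal sketch of a fixed-size work-thread pool: work is pushed onto a
# Queue and at most `size` threads execute it concurrently.
class WorkPool
  def initialize(size)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        while (work = @queue.pop) != :stop
          work.call
        end
      end
    end
  end

  def post(&work)
    @queue << work
  end

  # Let the queued work finish, then stop the threads.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each(&:join)
  end
end

pool = WorkPool.new(3)
running = 0
peak = 0
counter_lock = Mutex.new
10.times do
  pool.post do
    counter_lock.synchronize { running += 1; peak = [peak, running].max }
    sleep 0.02 # simulated work
    counter_lock.synchronize { running -= 1 }
  end
end
pool.shutdown
p peak # never more than 3
```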

Rufus::Scheduler.singleton

Don't want to store a reference to your rufus-scheduler instance? Then Rufus::Scheduler.singleton can help: it returns a singleton instance of the scheduler, initialized the first time this class method is called.

Rufus::Scheduler.singleton.every('10s') { puts "hello, world!" }

It's OK to pass initialization arguments (like :frequency or :max_work_threads) but they will only be taken into account the first time .singleton is called.

Rufus::Scheduler.singleton(max_work_threads: 77)
Rufus::Scheduler.singleton(max_work_threads: 277) # no effect

The .s is a shortcut for .singleton.

Rufus::Scheduler.s.every('10s') { puts "hello, world!" }
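The "first call wins" behaviour of .singleton is the usual memoized-singleton pattern. Here it is sketched on a hypothetical SchedulerModel class, not on Rufus::Scheduler itself. The options passed on the first call are kept and later options are ignored. A Mutex guards the lazy creation, and .s is an alias.

```ruby
# Minimal sketch of a lazily created, memoized singleton whose options
# only matter on the first call (SchedulerModel is hypothetical).
class SchedulerModel
  attr_reader :opts

  def initialize(opts = {})
    @opts = opts
  end

  @singleton_lock = Mutex.new

  class << self
    def singleton(opts = {})
      @singleton_lock.synchronize { @singleton ||= new(opts) }
    end
    alias_method :s, :singleton
  end
end

a = SchedulerModel.singleton(max_work_threads: 77)
b = SchedulerModel.singleton(max_work_threads: 277) # options ignored
p a.equal?(b)                             # => true
p SchedulerModel.s.opts[:max_work_threads] # => 77
```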

advanced lock schemes

As seen above, rufus-scheduler provides the :lockfile system out of the box. If in a group of schedulers only one is supposed to run, the lockfile mechanism prevents schedulers that have not set/created the lockfile from running.

There are situations where this is not sufficient.

By overriding #lock and #unlock, one can customize how schedulers lock.

This example was provided by Eric Lindvall:

class ZookeptScheduler < Rufus::Scheduler

  def initialize(zookeeper, opts={})
    @zk = zookeeper
    super(opts)
  end

  def lock
    @zk_locker = @zk.exclusive_locker('scheduler')
    @zk_locker.lock # returns true if the lock was acquired, false else
  end

  def unlock
    @zk_locker.unlock
  end

  def confirm_lock
    return false if down?
    @zk_locker.assert!
  rescue ZK::Exceptions::LockAssertionFailedError => e
    # we've lost the lock, shutdown (and return false to at least prevent
    # this job from triggering)
    shutdown
    false
  end
end

This uses a zookeeper to make sure only one scheduler in a group of distributed schedulers runs.

The methods #lock and #unlock are overridden and #confirm_lock is provided, to make sure that the lock is still valid.

The #confirm_lock method is called right before a job triggers (if it is provided). The more generic callback #on_pre_trigger is called right after #confirm_lock.

:scheduler_lock and :trigger_lock

(introduced in rufus-scheduler 3.0.9).

Another way of providing #lock, #unlock and #confirm_lock to a rufus-scheduler is by using the :scheduler_lock and :trigger_lock options.

See :trigger_lock and :scheduler_lock.

The scheduler lock may be used to prevent a scheduler from starting, while a trigger lock prevents individual jobs from triggering (the scheduler goes on scheduling).

One has to be careful with what goes in #confirm_lock or in a trigger lock, as it gets called before each trigger.

Warning: you may think you're heading towards "high availability" by using a trigger lock and having lots of schedulers at hand. It may be so if you limit yourself to scheduling the same set of jobs at scheduler startup. But if you add schedules at runtime, they stay local to their scheduler. There is no magic that propagates the jobs to all the schedulers in your pack.

parsing cronlines and time strings

(Please note that fugit does the heavy-lifting parsing work for rufus-scheduler).

Rufus::Scheduler provides a class method .parse to parse time durations and cron strings. It's what the scheduler uses when receiving schedules. One can use it directly (no need to instantiate a Scheduler).

require 'rufus-scheduler'

Rufus::Scheduler.parse('1w2d')
  # => 777600.0
Rufus::Scheduler.parse('1.0w1.0d')
  # => 777600.0

Rufus::Scheduler.parse('Sun Nov 18 16:01:00 2012').strftime('%c')
  # => 'Sun Nov 18 16:01:00 2012'

Rufus::Scheduler.parse('Sun Nov 18 16:01:00 2012 Europe/Berlin').strftime('%c %z')
  # => 'Sun Nov 18 15:01:00 2012 +0000'

Rufus::Scheduler.parse(0.1)
  # => 0.1

Rufus::Scheduler.parse('* * * * *')
  # => #<Fugit::Cron:0x00007fb7a3045508
  #      @original="* * * * *", @cron_s=nil,
  #      @seconds=[0], @minutes=nil, @hours=nil, @monthdays=nil, @months=nil,
  #      @weekdays=nil, @zone=nil, @timezone=nil>

It returns a number when the input is a duration and a Fugit::Cron instance when the input is a cron string.

It will raise an ArgumentError if it can't parse the input.

Beyond .parse, there are also .parse_cron and .parse_duration, for finer granularity.

There is an interesting helper method named .to_duration_hash:

require 'rufus-scheduler'

Rufus::Scheduler.to_duration_hash(60)
  # => { :m => 1 }
Rufus::Scheduler.to_duration_hash(62.127)
  # => { :m => 1, :s => 2, :ms => 127 }

Rufus::Scheduler.to_duration_hash(62.127, drop_seconds: true)
  # => { :m => 1 }
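The breakdown performed by .to_duration_hash can be sketched with Integer#divmod. This is a hypothetical helper, not the library method. Its unit set (weeks down to milliseconds) is an assumption, and the real method may also handle other units such as months or years.

```ruby
# Sketch of a duration -> { unit => count } breakdown in the spirit of
# .to_duration_hash (hypothetical helper; assumed units: w d h m s ms).
DURATION_UNITS = [
  [:w, 7 * 24 * 3600], [:d, 24 * 3600], [:h, 3600], [:m, 60], [:s, 1]
].freeze

def duration_hash(seconds, drop_seconds: false)
  ms = ((seconds - seconds.floor) * 1000).round
  rest = seconds.floor
  h = {}
  DURATION_UNITS.each do |unit, size|
    count, rest = rest.divmod(size)
    h[unit] = count if count > 0
  end
  h[:ms] = ms if ms > 0
  if drop_seconds
    h.delete(:s)
    h.delete(:ms)
  end
  h
end

p duration_hash(62.127) == { m: 1, s: 2, ms: 127 }      # => true
p duration_hash(62.127, drop_seconds: true) == { m: 1 } # => true
```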

cronline notations specific to rufus-scheduler

first Monday, last Sunday et al

To schedule something at noon every first Monday of the month:

scheduler.cron('00 12 * * mon#1') do
  # ...
end

To schedule something at noon the last Sunday of every month:

scheduler.cron('00 12 * * sun#-1') do
  # ...
end
#
# OR
#
scheduler.cron('00 12 * * sun#L') do
  # ...
end

Such cronlines can be tested with scripts like:

require 'rufus-scheduler'

Time.now
  # => 2013-10-26 07:07:08 +0900
Rufus::Scheduler.parse('* * * * mon#1').next_time.to_s
  # => 2013-11-04 00:00:00 +0900
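The date that "mon#1" or "sun#-1" (also written "sun#L") picks in a given month can be computed with Ruby's Date class. This is a sketch for checking such cronlines by hand, not how fugit or rufus-scheduler compute it. nth_wday is a hypothetical helper, with wday 0 meaning Sunday.

```ruby
require 'date'

# Sketch: the date of the n-th given weekday of a month, as in "mon#1"
# (n = 1) or "sun#-1" / "sun#L" (n = -1). wday: 0 = Sunday ... 6 = Saturday.
# Returns nil when the month has no such day (e.g. a fifth Monday).
def nth_wday(year, month, wday, n)
  raise ArgumentError, 'n must not be 0' if n == 0

  date =
    if n > 0
      first = Date.new(year, month, 1)
      first + (wday - first.wday) % 7 + 7 * (n - 1)
    else
      last = Date.new(year, month, -1)
      last - (last.wday - wday) % 7 - 7 * (-n - 1)
    end
  date.month == month ? date : nil
end

p nth_wday(2013, 11, 1, 1).to_s  # => "2013-11-04" (first Monday)
p nth_wday(2013, 10, 0, -1).to_s # => "2013-10-27" (last Sunday)
```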

L (last day of month)

L can be used in the "day" slot:

In this example, the cronline is supposed to trigger every last day of the month at noon:

require 'rufus-scheduler'
Time.now
  # => 2013-10-26 07:22:09 +0900
Rufus::Scheduler.parse('00 12 L * *').next_time.to_s
  # => 2013-10-31 12:00:00 +0900

negative day (x days before the end of the month)

It's OK to pass negative values in the "day" slot:

scheduler.cron '0 0 -5 * *' do
  # do it at 00h00 5 days before the end of the month...
end

Negative ranges (-10--5: 10 days before the end of the month to 5 days before the end of the month) are OK, but mixed positive / negative ranges will raise an ArgumentError.

Negative ranges with increments (-10--2/2) are accepted as well.

Descending day ranges are not accepted (10-8 or -8--10 for example).
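The rules above can be sketched for a single month with Ruby's Date class. That class already resolves negative days from the end of the month, with -1 as the last day. The sketch assumes the cron "day" slot follows the same convention (-1 being the last day, like L). resolve_day and expand_day_range are hypothetical helpers, not fugit or rufus-scheduler code.

```ruby
require 'date'

# Assumption: in the "day" slot, -1 is the last day of the month (like L),
# -2 the day before it, and so on. Date.new counts negative days this way.
def resolve_day(year, month, day)
  Date.new(year, month, day).day
end

# Expands a day range such as "-10--5" or "-10--2/2" for one month,
# rejecting mixed-sign and descending ranges.
def expand_day_range(year, month, from, to, step = 1)
  if (from < 0) != (to < 0)
    raise ArgumentError, 'mixed positive / negative day range'
  end
  raise ArgumentError, 'descending day range' if from > to

  (from..to).step(step).map { |d| resolve_day(year, month, d) }
end

p resolve_day(2013, 10, -1)              # => 31
p expand_day_range(2013, 10, -10, -5)    # => [22, 23, 24, 25, 26, 27]
p expand_day_range(2013, 10, -10, -2, 2) # => [22, 24, 26, 28, 30]
```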

a note about timezones

Cron schedules and at schedules support the specification of a timezone.

scheduler.cron '0 22 * * 1-5 America/Chicago' do
  # the job...
end

scheduler.at '2013-12-12 14:00 Pacific/Samoa' do
  puts "it's tea time!"
end

# or even

Rufus::Scheduler.parse("2013-12-12 14:00 Pacific/Saipan")
  # => #<Rufus::Scheduler::ZoTime:0x007fb424abf4e8 @seconds=1386820800.0, @zone=#<TZInfo::DataTimezone: Pacific/Saipan>, @time=nil>

I get "zotime.rb:41:in `initialize': cannot determine timezone from nil"

For when you see an error like:

rufus-scheduler/lib/rufus/scheduler/zotime.rb:41:
  in `initialize':
    cannot determine timezone from nil (etz:nil,tnz:"中国标准时间",tzid:nil)
      (ArgumentError)
    from rufus-scheduler/lib/rufus/scheduler/zotime.rb:198:in `new'
    from rufus-scheduler/lib/rufus/scheduler/zotime.rb:198:in `now'
    from rufus-scheduler/lib/rufus/scheduler.rb:561:in `start'
    ...

It may happen on Windows or on systems that poorly hint to Ruby which timezone to use. It should be solved by setting explicitly the ENV['TZ'] before the scheduler instantiation:

ENV['TZ'] = 'Asia/Shanghai'
scheduler = Rufus::Scheduler.new
scheduler.every '2s' do
  puts "#{Time.now} Hello #{ENV['TZ']}!"
end

On Rails you might want to try with:

ENV['TZ'] = Time.zone.name # Rails only
scheduler = Rufus::Scheduler.new
scheduler.every '2s' do
  puts "#{Time.now} Hello #{ENV['TZ']}!"
end

(Hat tip to Alexander in gh-230)

Rails sets its timezone under config/application.rb.

Since version 3.3.3, rufus-scheduler detects the presence of Rails and uses its timezone setting (tested with Rails 4), so setting ENV['TZ'] should not be necessary.

The value can be determined thanks to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones.

Use a "continent/city" identifier (for example "Asia/Shanghai"). Do not use an abbreviation (not "CST") and do not use a local time zone name (not "中国标准时间" nor "Eastern Standard Time" which, for instance, points to a time zone in America and to another one in Australia...).

If the error persists (and especially on Windows), try adding the tzinfo-data gem to your Gemfile, as in:

gem 'tzinfo-data'

or by manually requiring it before requiring rufus-scheduler (if you don't use Bundler):

require 'tzinfo/data'
require 'rufus-scheduler'

so Rails?

Yes, I know, all of the above is boring and you're only looking for a snippet to paste in your Ruby-on-Rails application to schedule...

Here is an example initializer:

#
# config/initializers/scheduler.rb

require 'rufus-scheduler'

# Let's use the rufus-scheduler singleton
#
s = Rufus::Scheduler.singleton


# Stupid recurrent task...
#
s.every '1m' do

  Rails.logger.info "hello, it's #{Time.now}"
  Rails.logger.flush
end

And now you tell me that this is good, but you want to schedule stuff from your controller.

Maybe:

class ScheController < ApplicationController

  # GET /sche/
  #
  def index

    job_id =
      Rufus::Scheduler.singleton.in '5s' do
        Rails.logger.info "time flies, it's now #{Time.now}"
      end

    # render text: was removed in Rails 5.1; render plain: works since Rails 4.1
    render plain: "scheduled job #{job_id}"
  end
end

The rufus-scheduler singleton is instantiated in the config/initializers/scheduler.rb file; it's then available throughout the webapp via Rufus::Scheduler.singleton.

Warning: this works well with single-process Ruby servers like WEBrick and Thin. Using rufus-scheduler with Passenger or Unicorn requires a bit more knowledge and tuning, which a bit of googling and reading will provide; see the FAQ above.
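With multi-process servers, depending on their preload settings, the initializer may run in several worker processes, each starting its own scheduler so that jobs fire more than once. One common workaround, sketched here assuming a POSIX system where File#flock maps to flock(2), is to let only the process that wins an exclusive file lock start the scheduler. try_scheduler_lock and the lock path are hypothetical, not rufus-scheduler APIs.

```ruby
# Hypothetical helper: try to take an exclusive, non-blocking lock on `path`.
# Returns the open File (keep a reference to it for as long as the lock must
# be held) or nil when another open file description already holds the lock.
def try_scheduler_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0o644)
  if file.flock(File::LOCK_EX | File::LOCK_NB) # 0 on success, false if busy
    file
  else
    file.close
    nil
  end
end
```

In an initializer, one could store the result in a constant (for example SCHEDULER_LOCK = try_scheduler_lock(Rails.root.join('tmp', 'scheduler.lock').to_s)) and only add jobs when it is non-nil. The lock is released when the file is closed or the holding process exits, letting another process take over on its next boot.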

avoid scheduling when running the Ruby on Rails console

(Written in reply to gh-186)

If you don't want rufus-scheduler to trigger anything while running the Ruby on Rails console, running for tests/specs, or running from a Rake task, you can insert a conditional return statement before jobs are added to the scheduler instance:

#
# config/initializers/scheduler.rb

require 'rufus-scheduler'

return if defined?(Rails::Console) || Rails.env.test? || File.split($PROGRAM_NAME).last == 'rake'
  #
  # do not schedule when Rails is run from its console, for a test/spec, or
  # from a Rake task

# return if $PROGRAM_NAME.include?('spring')
  #
  # see https://github.com/jmettraux/rufus-scheduler/issues/186

s = Rufus::Scheduler.singleton

s.every '1m' do
  Rails.logger.info "hello, it's #{Time.now}"
  Rails.logger.flush
end

(Beware of later versions of Rails, where Spring takes care of pre-running the initializers. Running spring stop or disabling Spring might be necessary in some cases for changes to initializers to be taken into account.)
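To unit-test the guard condition without booting Rails, it can be pulled out into a plain method. This is a hypothetical refactoring, not something rufus-scheduler provides; the Rails-specific values are passed in as arguments.

```ruby
# Hypothetical, testable version of the initializer's guard. Returns true when
# jobs should NOT be scheduled: inside the Rails console, in the test
# environment, or when the running program is named "rake".
def skip_scheduling?(program_name, console: false, test_env: false)
  console || test_env || File.split(program_name).last == 'rake'
end
```

The initializer line would then read: return if skip_scheduling?($PROGRAM_NAME, console: !!defined?(Rails::Console), test_env: Rails.env.test?). The double negation turns the string returned by defined? into a boolean.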

rails server -d

(Written in reply to https://github.com/jmettraux/rufus-scheduler/issues/165 )

There is the handy rails server -d that starts a development Rails server as a daemon. The annoying thing is that the scheduler, as seen above, is started in the main process, which then gets forked and daemonized. The rufus-scheduler thread (and any other thread) gets lost, and no scheduling happens.
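The thread loss comes from fork semantics: the child process only keeps the thread that called fork. The stdlib-only sketch below (Unix only; no Rails or rufus-scheduler involved, and thread_counts_across_fork is a hypothetical name) starts a stand-in background thread, forks, and reports how many threads each side sees.

```ruby
# Demonstrates that background threads do not survive fork: the parent keeps
# its extra thread, while the child only has the thread that called fork.
def thread_counts_across_fork
  ticker = Thread.new { loop { sleep 1 } } # stand-in for the scheduler thread
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Thread.list.size.to_s) # threads visible in the child
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  child_count = reader.read.to_i
  reader.close
  Process.wait(pid)
  parent_count = Thread.list.size
  ticker.kill
  [parent_count, child_count]
end
```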

I avoid running -d in development mode and bother about daemonizing only for production deployment.

These are two well-crafted articles on process daemonization; please read them:

If you still need something like rails server -d, why not try bundle exec unicorn -D instead? In my (limited) experience, it worked out of the box (well, I had to add gem 'unicorn' to the Gemfile first).

executor / reloader

You might benefit from wrapping your scheduled code in the Rails executor or reloader. Read more here: https://guides.rubyonrails.org/threading_and_code_execution.html
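A minimal sketch of the reloader variant, assuming a hypothetical helper every_with_reloader. The scheduler and the application are passed in as arguments (in an initializer: Rufus::Scheduler.singleton and Rails.application), which keeps the helper independent of Rails at load time and lets it be exercised with stand-in objects. Rails.application.reloader.wrap is the call described in the guide linked above.

```ruby
# Hypothetical helper: register a recurring job whose body runs inside the
# application's reloader. `scheduler` must respond to #every (as
# Rufus::Scheduler does) and `app` to #reloader (as Rails.application does).
def every_with_reloader(scheduler, app, frequency, &work)
  scheduler.every(frequency) do
    # Each run goes through the reloader, as the Rails threading guide
    # recommends for application code invoked outside a request.
    app.reloader.wrap(&work)
  end
end
```

In config/initializers/scheduler.rb this could be used as every_with_reloader(Rufus::Scheduler.singleton, Rails.application, '1m') { Rails.logger.info "hello, it's #{Time.now}" }. If the job does not need freshly reloaded code, Rails.application.executor.wrap is the lighter alternative described in the same guide.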

support

See "getting help" above.


Author: jmettraux
Source code: https://github.com/jmettraux/rufus-scheduler
License: MIT license