Brooke  Giles

Brooke Giles

1569035058

How to perform Speech Recognition with the 3 Deep Learning Frameworks

Introduction

Speech recognition is invading our lives. It’s built into our phones (Siri), our game consoles (Kinect), our smartwatches (Apple Watch), and even our homes (Amazon Echo). But speech recognition has been around for decades, so why is it just now hitting the mainstream?

The reason is that Deep Learning finally made speech recognition accurate enough to be useful outside of carefully-controlled environments. In this post, we’ll learn how to perform speech recognition with 3 different implementations of popular Deep Learning Frameworks.

Speech Recognition — The Classic Way

In the era of OK Google, I might not really need to define ASR, but here’s a basic description: Say you have a person or an audio source saying something textual, and you have a bunch of microphones that are receiving the audio signals. You can get these signals from one or many devices, and then pass them into an ASR system — whose job it is to infer the original source transcript that the person spoke or that the device played.

So why is ASR important?

Firstly, it’s a very natural interface for human communication. You don’t need a mouse or a keyboard, so it’s obviously a good way to interact with machines. You don’t even really need to learn new techniques because most people learn to speak as a function of natural development. It’s a very natural interface for talking with simple devices such as cars, handheld phones, and chatbots.

So how is this done classically?

As observed above, the classic way of building a speech recognition system is to build a generative model of language. On the rightmost side, you produce a certain sequence of words from language models. And then for each word, you have a pronunciation model that says how this particular word is spoken. Typically it’s written out as the sequence of phonemes — which are basic units of sound, but for our vocabulary, we’ll just say a sequence of tokens — which represent a cluster of things that have been defined by linguistics experts.

Then, the pronunciation models are fed into an acoustic model, which basically defines how does a given token sounds. These acoustic models are now used to describe the data itself. Here the data would be x, which is the sequence of frames of audio features from x1 to xT. Typically, these features are something that signal processing experts have defined (such as the frequency components of the audio waveforms that are captured).

Each of these different components in this pipeline uses a different statistical model:

  • In the past, language models were typically N-gram models, which worked very well for simple problems with limited speech input data. They are essentially tables describing the probabilities of token sequences.
  • The pronunciation models were simple lookup tables with probabilities associated with pronunciations. These tables would be very large tables of different pronunciations.
  • Acoustic models are built using Gaussian Mixture Models with very specific architectures associated with them.
  • The speech processing was pre-defined.

Once we have this kind of model built, we can perform the recognition by doing the inference on the data received. So you get a waveform, you compute the features for it (X) and do a search for Y that gives the highest probabilities of X.

The Neural Network Invasion

Over time, researchers started noticing that each of these components could work more effectively if we used neural networks.

  • Instead of the N-gram language models, we can build neural language models and feed them into a speech recognition system to restore things that were produced by a first path speech recognition system.
  • Looking into the pronunciation models, we can figure out how to do pronunciation for a new sequence of characters that we’ve never seen before using a neural network.
  • For acoustic models, we can build deep neural networks (such as LSTM-based models) to get much better classification accuracy scores of the features for the current frame.
  • Interestingly enough, even the speech pre-processing steps were found to be replaceable with convolutional neural networks on raw speech signals.

However, there’s still a problem. There are neural networks in each component, but they’re trained independently with different objectives. Because of that, the errors in one component may not behave well with the errors in another component. So that’s the basic motivation for devising a process where you can train the entire model as one big component itself.

These so-called end-to-end models encompass more and more components in the pipeline discussed above. The 2 most popular ones are (1) Connectionist Temporal Classification (CTC), which is in wide usage these days at Baidu and Google, but it requires a lot of training; and (2) Sequence-To-Sequence (Seq-2-Seq), which doesn’t require manual customization.

The basic motivation is that we want to do end-to-end speech recognition. We are given the audio X — which is a sequence of frames from x1 to xT, and the corresponding output text Y — which is a sequence of y1 to yL. Y is just a text sequence (transcript) and X is the audio processed spectrogram. We want to perform speech recognition by learning a probabilistic model p(Y|X): starting with the data and predicting the target sequences themselves.

1 — Connectionist Temporal Classification

The first of these models is called Connectionist Temporal Classification (CTC) ([1], [2], [3]). X is a sequence of data frames with length T: x1, x2, …, xT, and Y is the output tokens of length L: y1, y2, …, yL. Because of the way the model is constructed, we require T to be greater than L.

This model has a very specific structure that makes it suitable for speech:

  • You get the spectrogram at the bottom (X). You feed it into a bi-directional recurrent neural network, and as a result, the arrow pointing at any time step depends on the entirety of the input data. As such, it can compute a fairly complicated function of the entire data X.
  • This model, at the top, has softmax functions at every timeframe corresponding to the input. The softmax function is applied to a vocabulary with a particular length that you’re interested in. In this case, you have the lowercase letters a to z and some punctuation symbols. So the vocabulary for CTC would be all that and an extra token called a blank token.
  • Each frame of the prediction is basically producing a log probability for a different token class at that time step. In the case above, a score s(k, t) is the log probability of category k at time step t given the data X.

In a CTC model, if you look at just the softmax functions that are produced by the recurring neural network over the entire time step, you’ll be able to find the probability of the transcript through these individual softmax functions over time.

Let’s take a look at an example (below). The CTC model can represent all these paths through the entire space of softmax functions and look at only the symbols that correspond to each of the time steps.

As seen on the left, the CTC model will go through 2 C symbols, then through a blank symbol, then produce 2 A symbols, then produce another blank symbol, then transition to a T symbol, and then finally produce a blank symbol again.

So when you go through these paths with the constraint, you can only transition between the same phoneme from one step to the next. Therefore, you’ll end up with different ways of representing an output sequence.

For the example above, we have cc <b> aa <b> t <b> or cc <b> <b> a <b> t <b> or cccc <b> aaaa <b> tttt <b>. Given these constraints, it turns out that even though there’s an exponential number of paths by which you can produce the same output symbol, you can actually do it correctly using a dynamic programming algorithm. Because of dynamic programming, it’s possible to compute both the log probability p(Y|X) and its gradient exactly. This gradient can be backpropagated to a neural network whose parameters can then be adjusted by your favorite optimizer!

Below are some results for CTC, which show how the model functions on given audio. A raw waveform is aligned at the bottom, and the corresponding predictions are outputted at the top. You can see that it produces the symbol H at the beginning. At a certain point, it gets a very high probability, which means that the model is confident that it hears the sound corresponding to H.

However, there are some drawbacks to CTC language models. They often misspell words and struggle with grammar. So if you had some way to figure out how to rank the different paths produced from the model and re-rank them just by the language model, the results should be much better.

Google actually fixed these problems by integrating a language model as part of the CTC model itself during training. That’s the kind of production model currently being deployed with OK Google.

2 — Sequence-To-Sequence

An alternative approach to speech processing is the sequence-to-sequence model that makes next-step predictions. Let’s say that you’re given some data X and that you need to produce some symbols y1 to y{i}. The model predicts the probability of the next symbol of y{i+1}. The goal here is to basically learn a very good model for p.

With the model architecture (left), you have a neural network (which is the decoder in a sequence-to-sequence model) that looks at the entire input (which is the encoder). It feeds in the path symbols that are produced as a recurrent neural network, and then you predict the next token itself as the output.

So this model does speech recognition with the sequence-to-sequence framework. In translation, the X would be the source language. In the speech domain, the X would be a huge sequence of audio that’s now encoded with a recurrent neural network.

What it needs to function is the ability to look at different parts of temporal space, because the input is really long. Intuitively, translation results get worse as the source sentence becomes longer. That’s because it’s really difficult for the model to look in the right place. Turns out, that problem is aggravated a lot more with audio streams that are much longer. Therefore, you would need to implement an attention mechanism if you want to make this model work at all.

Seen in the example on the left, you’re trying to produce the 1st character C. You create an attention vector that essentially looks at different parts of the input time steps and produces the next chapter (which is A) after changing the attention.

If you keep doing this over the entire input stream, then you’re moving forward attention just learned by the model itself. Seen here, it produces the output sequence “cancel, cancel, cancel.”

The Listen, Attend, and Spell [4] model is the canonical model for the seq-2-seq category. Let’s look at the diagram below taken from the paper:

  • In the Listener architecture, you have an encoder structure. For every time step of the input, it produces a vector representation that encodes the input and is represented as h_t at time step t.
  • In the Speller architecture, you have a decoder architecture. You generate the next character c_t at every time step t.
  • The LAS model uses a hierarchical encoder to replace the traditional recurrent neural network. Instead of processing one frame for every time step, it collapses neighboring frames as you feed into the next layer. Because of that, it reduces the number of time steps to be processed, thus making the processing faster.

So what are the limitations of this model?

  • One of the big limitations preventing its use in an online system is that the output produced is being conditioned on the entire input. That means if you’re going to put the model in a real-world speech recognition system, you’d have to first wait for the entire audio to be received before outputting the symbol.
  • Another limitation is that the attention model itself is a computational bottleneck since every output token pays attention to every input time step. This makes it harder and slower for the model to do its learning.
  • Further, as the input is received and becomes longer, the word error rate goes down.

3 — Online Sequence-to-Sequence

Online sequence-to-sequence models are designed to overcome the limits of sequence-to-sequence models—you don’t want to wait for the entire input sequence to arrive, and you also want to avoid using the attention model itself over the entire sequence. Essentially, the intention is to produce the outputs as the inputs arrive. It has to solve the following problem: is the model ready to produce an output now that it’s received this much input?

The most notable online seq-2-seq model is called a Neural Transducer [5]. If you take the input as it comes in, and every so often at a regular interval, you can run a seq-2-seq model on what’s been received in the last block. As seen in the architecture below, the encoder's attention (instead of looking at the entire input) will focus only on a little block. The transducer will produce the output symbols.

The nice thing about the neural transducer is that it maintains causality. More specifically, the model preserves the disadvantage of a seq-2-seq model. It also introduces an alignment problem: in essence, what you want to know is that you have to produce some symbols as outputs, but you don’t know which chunk these symbols should be aligned to.

You can actually make this model better by incorporating convolutional neural networks, which are borrowed from computer vision. The paper [6] uses CNNs to do the encoder side in speech architecture.

You take the traditional model for the pyramid as seen to the left, and instead of building the pyramid by simply stacking 2 things together, you can put a fancy architecture on top when you do the stacking. More specifically, as seen below, you can stack them as feature maps and put a CNN on the top. For the speech recognition problem, the frequency bands and the timestamps of the features that you look at will correspond to a natural substructure of the input data. The convolutional architecture essentially looks at that substructure.

Conclusion

You should now generally be up to speed on the 3 most common deep learning-based frameworks for performing automatic speech recognition in a variety of contexts. The papers that I’ve referenced below will help you get into the nitty-gritty technical details of how they work if you’re inclined to do that.

#deep-learning #machine-learning #data-science

What is GEEK

Buddha Community

How to perform Speech Recognition with the 3 Deep Learning Frameworks
Veronica  Roob

Veronica Roob

1653475560

A Pure PHP Implementation Of The MessagePack Serialization Format

msgpack.php

A pure PHP implementation of the MessagePack serialization format.

Features

Installation

The recommended way to install the library is through Composer:

composer require rybakit/msgpack

Usage

Packing

To pack values you can either use an instance of a Packer:

$packer = new Packer();
$packed = $packer->pack($value);

or call a static method on the MessagePack class:

$packed = MessagePack::pack($value);

In the examples above, the method pack automatically packs a value depending on its type. However, not all PHP types can be uniquely translated to MessagePack types. For example, the MessagePack format defines map and array types, which are represented by a single array type in PHP. By default, the packer will pack a PHP array as a MessagePack array if it has sequential numeric keys, starting from 0 and as a MessagePack map otherwise:

$mpArr1 = $packer->pack([1, 2]);               // MP array [1, 2]
$mpArr2 = $packer->pack([0 => 1, 1 => 2]);     // MP array [1, 2]
$mpMap1 = $packer->pack([0 => 1, 2 => 3]);     // MP map {0: 1, 2: 3}
$mpMap2 = $packer->pack([1 => 2, 2 => 3]);     // MP map {1: 2, 2: 3}
$mpMap3 = $packer->pack(['a' => 1, 'b' => 2]); // MP map {a: 1, b: 2}

However, sometimes you need to pack a sequential array as a MessagePack map. To do this, use the packMap method:

$mpMap = $packer->packMap([1, 2]); // {0: 1, 1: 2}

Here is a list of type-specific packing methods:

$packer->packNil();           // MP nil
$packer->packBool(true);      // MP bool
$packer->packInt(42);         // MP int
$packer->packFloat(M_PI);     // MP float (32 or 64)
$packer->packFloat32(M_PI);   // MP float 32
$packer->packFloat64(M_PI);   // MP float 64
$packer->packStr('foo');      // MP str
$packer->packBin("\x80");     // MP bin
$packer->packArray([1, 2]);   // MP array
$packer->packMap(['a' => 1]); // MP map
$packer->packExt(1, "\xaa");  // MP ext

Check the "Custom types" section below on how to pack custom types.

Packing options

The Packer object supports a number of bitmask-based options for fine-tuning the packing process (defaults are in bold):

NameDescription
FORCE_STRForces PHP strings to be packed as MessagePack UTF-8 strings
FORCE_BINForces PHP strings to be packed as MessagePack binary data
DETECT_STR_BINDetects MessagePack str/bin type automatically
  
FORCE_ARRForces PHP arrays to be packed as MessagePack arrays
FORCE_MAPForces PHP arrays to be packed as MessagePack maps
DETECT_ARR_MAPDetects MessagePack array/map type automatically
  
FORCE_FLOAT32Forces PHP floats to be packed as 32-bits MessagePack floats
FORCE_FLOAT64Forces PHP floats to be packed as 64-bits MessagePack floats

The type detection mode (DETECT_STR_BIN/DETECT_ARR_MAP) adds some overhead which can be noticed when you pack large (16- and 32-bit) arrays or strings. However, if you know the value type in advance (for example, you only work with UTF-8 strings or/and associative arrays), you can eliminate this overhead by forcing the packer to use the appropriate type, which will save it from running the auto-detection routine. Another option is to explicitly specify the value type. The library provides 2 auxiliary classes for this, Map and Bin. Check the "Custom types" section below for details.

Examples:

// detect str/bin type and pack PHP 64-bit floats (doubles) to MP 32-bit floats
$packer = new Packer(PackOptions::DETECT_STR_BIN | PackOptions::FORCE_FLOAT32);

// these will throw MessagePack\Exception\InvalidOptionException
$packer = new Packer(PackOptions::FORCE_STR | PackOptions::FORCE_BIN);
$packer = new Packer(PackOptions::FORCE_FLOAT32 | PackOptions::FORCE_FLOAT64);

Unpacking

To unpack data you can either use an instance of a BufferUnpacker:

$unpacker = new BufferUnpacker();

$unpacker->reset($packed);
$value = $unpacker->unpack();

or call a static method on the MessagePack class:

$value = MessagePack::unpack($packed);

If the packed data is received in chunks (e.g. when reading from a stream), use the tryUnpack method, which attempts to unpack data and returns an array of unpacked messages (if any) instead of throwing an InsufficientDataException:

while ($chunk = ...) {
    $unpacker->append($chunk);
    if ($messages = $unpacker->tryUnpack()) {
        return $messages;
    }
}

If you want to unpack from a specific position in a buffer, use seek:

$unpacker->seek(42); // set position equal to 42 bytes
$unpacker->seek(-8); // set position to 8 bytes before the end of the buffer

To skip bytes from the current position, use skip:

$unpacker->skip(10); // set position to 10 bytes ahead of the current position

To get the number of remaining (unread) bytes in the buffer:

$unreadBytesCount = $unpacker->getRemainingCount();

To check whether the buffer has unread data:

$hasUnreadBytes = $unpacker->hasRemaining();

If needed, you can remove already read data from the buffer by calling:

$releasedBytesCount = $unpacker->release();

With the read method you can read raw (packed) data:

$packedData = $unpacker->read(2); // read 2 bytes

Besides the above methods BufferUnpacker provides type-specific unpacking methods, namely:

$unpacker->unpackNil();   // PHP null
$unpacker->unpackBool();  // PHP bool
$unpacker->unpackInt();   // PHP int
$unpacker->unpackFloat(); // PHP float
$unpacker->unpackStr();   // PHP UTF-8 string
$unpacker->unpackBin();   // PHP binary string
$unpacker->unpackArray(); // PHP sequential array
$unpacker->unpackMap();   // PHP associative array
$unpacker->unpackExt();   // PHP MessagePack\Type\Ext object

Unpacking options

The BufferUnpacker object supports a number of bitmask-based options for fine-tuning the unpacking process (defaults are in bold):

NameDescription
BIGINT_AS_STRConverts overflowed integers to strings [1]
BIGINT_AS_GMPConverts overflowed integers to GMP objects [2]
BIGINT_AS_DECConverts overflowed integers to Decimal\Decimal objects [3]

1. The binary MessagePack format has unsigned 64-bit as its largest integer data type, but PHP does not support such integers, which means that an overflow can occur during unpacking.

2. Make sure the GMP extension is enabled.

3. Make sure the Decimal extension is enabled.

Examples:

$packedUint64 = "\xcf"."\xff\xff\xff\xff"."\xff\xff\xff\xff";

$unpacker = new BufferUnpacker($packedUint64);
var_dump($unpacker->unpack()); // string(20) "18446744073709551615"

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_GMP);
var_dump($unpacker->unpack()); // object(GMP) {...}

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_DEC);
var_dump($unpacker->unpack()); // object(Decimal\Decimal) {...}

Custom types

In addition to the basic types, the library provides functionality to serialize and deserialize arbitrary types. This can be done in several ways, depending on your use case. Let's take a look at them.

Type objects

If you need to serialize an instance of one of your classes into one of the basic MessagePack types, the best way to do this is to implement the CanBePacked interface in the class. A good example of such a class is the Map type class that comes with the library. This type is useful when you want to explicitly specify that a given PHP array should be packed as a MessagePack map without triggering an automatic type detection routine:

$packer = new Packer();

$packedMap = $packer->pack(new Map([1, 2, 3]));
$packedArray = $packer->pack([1, 2, 3]);

More type examples can be found in the src/Type directory.

Type transformers

As with type objects, type transformers are only responsible for serializing values. They should be used when you need to serialize a value that does not implement the CanBePacked interface. Examples of such values could be instances of built-in or third-party classes that you don't own, or non-objects such as resources.

A transformer class must implement the CanPack interface. To use a transformer, it must first be registered in the packer. Here is an example of how to serialize PHP streams into the MessagePack bin format type using one of the supplied transformers, StreamTransformer:

$packer = new Packer(null, [new StreamTransformer()]);

$packedBin = $packer->pack(fopen('/path/to/file', 'r+'));

More type transformer examples can be found in the src/TypeTransformer directory.

Extensions

In contrast to the cases described above, extensions are intended to handle extension types and are responsible for both serialization and deserialization of values (types).

An extension class must implement the Extension interface. To use an extension, it must first be registered in the packer and the unpacker.

The MessagePack specification divides extension types into two groups: predefined and application-specific. Currently, there is only one predefined type in the specification, Timestamp.

Timestamp

The Timestamp extension type is a predefined type. Support for this type in the library is done through the TimestampExtension class. This class is responsible for handling Timestamp objects, which represent the number of seconds and optional adjustment in nanoseconds:

$timestampExtension = new TimestampExtension();

$packer = new Packer();
$packer = $packer->extendWith($timestampExtension);

$unpacker = new BufferUnpacker();
$unpacker = $unpacker->extendWith($timestampExtension);

$packedTimestamp = $packer->pack(Timestamp::now());
$timestamp = $unpacker->reset($packedTimestamp)->unpack();

$seconds = $timestamp->getSeconds();
$nanoseconds = $timestamp->getNanoseconds();

When using the MessagePack class, the Timestamp extension is already registered:

$packedTimestamp = MessagePack::pack(Timestamp::now());
$timestamp = MessagePack::unpack($packedTimestamp);

Application-specific extensions

In addition, the format can be extended with your own types. For example, to make the built-in PHP DateTime objects first-class citizens in your code, you can create a corresponding extension, as shown in the example. Please note, that custom extensions have to be registered with a unique extension ID (an integer from 0 to 127).

More extension examples can be found in the examples/MessagePack directory.

To learn more about how extension types can be useful, check out this article.

Exceptions

If an error occurs during packing/unpacking, a PackingFailedException or an UnpackingFailedException will be thrown, respectively. In addition, an InsufficientDataException can be thrown during unpacking.

An InvalidOptionException will be thrown in case an invalid option (or a combination of mutually exclusive options) is used.

Tests

Run tests as follows:

vendor/bin/phpunit

Also, if you already have Docker installed, you can run the tests in a docker container. First, create a container:

./dockerfile.sh | docker build -t msgpack -

The command above will create a container named msgpack with PHP 8.1 runtime. You may change the default runtime by defining the PHP_IMAGE environment variable:

PHP_IMAGE='php:8.0-cli' ./dockerfile.sh | docker build -t msgpack -

See a list of various images here.

Then run the unit tests:

docker run --rm -v $PWD:/msgpack -w /msgpack msgpack

Fuzzing

To ensure that the unpacking works correctly with malformed/semi-malformed data, you can use a testing technique called Fuzzing. The library ships with a help file (target) for PHP-Fuzzer and can be used as follows:

php-fuzzer fuzz tests/fuzz_buffer_unpacker.php

Performance

To check performance, run:

php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target            Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0030 ........ 0.0139
false ................ 0.0037 ........ 0.0144
true ................. 0.0040 ........ 0.0137
7-bit uint #1 ........ 0.0052 ........ 0.0120
7-bit uint #2 ........ 0.0059 ........ 0.0114
7-bit uint #3 ........ 0.0061 ........ 0.0119
5-bit sint #1 ........ 0.0067 ........ 0.0126
5-bit sint #2 ........ 0.0064 ........ 0.0132
5-bit sint #3 ........ 0.0066 ........ 0.0135
8-bit uint #1 ........ 0.0078 ........ 0.0200
8-bit uint #2 ........ 0.0077 ........ 0.0212
8-bit uint #3 ........ 0.0086 ........ 0.0203
16-bit uint #1 ....... 0.0111 ........ 0.0271
16-bit uint #2 ....... 0.0115 ........ 0.0260
16-bit uint #3 ....... 0.0103 ........ 0.0273
32-bit uint #1 ....... 0.0116 ........ 0.0326
32-bit uint #2 ....... 0.0118 ........ 0.0332
32-bit uint #3 ....... 0.0127 ........ 0.0325
64-bit uint #1 ....... 0.0140 ........ 0.0277
64-bit uint #2 ....... 0.0134 ........ 0.0294
64-bit uint #3 ....... 0.0134 ........ 0.0281
8-bit int #1 ......... 0.0086 ........ 0.0241
8-bit int #2 ......... 0.0089 ........ 0.0225
8-bit int #3 ......... 0.0085 ........ 0.0229
16-bit int #1 ........ 0.0118 ........ 0.0280
16-bit int #2 ........ 0.0121 ........ 0.0270
16-bit int #3 ........ 0.0109 ........ 0.0274
32-bit int #1 ........ 0.0128 ........ 0.0346
32-bit int #2 ........ 0.0118 ........ 0.0339
32-bit int #3 ........ 0.0135 ........ 0.0368
64-bit int #1 ........ 0.0138 ........ 0.0276
64-bit int #2 ........ 0.0132 ........ 0.0286
64-bit int #3 ........ 0.0137 ........ 0.0274
64-bit int #4 ........ 0.0180 ........ 0.0285
64-bit float #1 ...... 0.0134 ........ 0.0284
64-bit float #2 ...... 0.0125 ........ 0.0275
64-bit float #3 ...... 0.0126 ........ 0.0283
fix string #1 ........ 0.0035 ........ 0.0133
fix string #2 ........ 0.0094 ........ 0.0216
fix string #3 ........ 0.0094 ........ 0.0222
fix string #4 ........ 0.0091 ........ 0.0241
8-bit string #1 ...... 0.0122 ........ 0.0301
8-bit string #2 ...... 0.0118 ........ 0.0304
8-bit string #3 ...... 0.0119 ........ 0.0315
16-bit string #1 ..... 0.0150 ........ 0.0388
16-bit string #2 ..... 0.1545 ........ 0.1665
32-bit string ........ 0.1570 ........ 0.1756
wide char string #1 .. 0.0091 ........ 0.0236
wide char string #2 .. 0.0122 ........ 0.0313
8-bit binary #1 ...... 0.0100 ........ 0.0302
8-bit binary #2 ...... 0.0123 ........ 0.0324
8-bit binary #3 ...... 0.0126 ........ 0.0327
16-bit binary ........ 0.0168 ........ 0.0372
32-bit binary ........ 0.1588 ........ 0.1754
fix array #1 ......... 0.0042 ........ 0.0131
fix array #2 ......... 0.0294 ........ 0.0367
fix array #3 ......... 0.0412 ........ 0.0472
16-bit array #1 ...... 0.1378 ........ 0.1596
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.1865 ........ 0.2283
fix map #1 ........... 0.0725 ........ 0.1048
fix map #2 ........... 0.0319 ........ 0.0405
fix map #3 ........... 0.0356 ........ 0.0665
fix map #4 ........... 0.0465 ........ 0.0497
16-bit map #1 ........ 0.2540 ........ 0.3028
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.2372 ........ 0.2710
fixext 1 ............. 0.0283 ........ 0.0358
fixext 2 ............. 0.0291 ........ 0.0371
fixext 4 ............. 0.0302 ........ 0.0355
fixext 8 ............. 0.0288 ........ 0.0384
fixext 16 ............ 0.0293 ........ 0.0359
8-bit ext ............ 0.0302 ........ 0.0439
16-bit ext ........... 0.0334 ........ 0.0499
32-bit ext ........... 0.1845 ........ 0.1888
32-bit timestamp #1 .. 0.0337 ........ 0.0547
32-bit timestamp #2 .. 0.0335 ........ 0.0560
64-bit timestamp #1 .. 0.0371 ........ 0.0575
64-bit timestamp #2 .. 0.0374 ........ 0.0542
64-bit timestamp #3 .. 0.0356 ........ 0.0533
96-bit timestamp #1 .. 0.0362 ........ 0.0699
96-bit timestamp #2 .. 0.0381 ........ 0.0701
96-bit timestamp #3 .. 0.0367 ........ 0.0687
=============================================
Total                  2.7618          4.0820
Skipped                     4               4
Failed                      0               0
Ignored                     0               0

With JIT:

php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target            Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0005 ........ 0.0054
false ................ 0.0004 ........ 0.0059
true ................. 0.0004 ........ 0.0059
7-bit uint #1 ........ 0.0010 ........ 0.0047
7-bit uint #2 ........ 0.0010 ........ 0.0046
7-bit uint #3 ........ 0.0010 ........ 0.0046
5-bit sint #1 ........ 0.0025 ........ 0.0046
5-bit sint #2 ........ 0.0023 ........ 0.0046
5-bit sint #3 ........ 0.0024 ........ 0.0045
8-bit uint #1 ........ 0.0043 ........ 0.0081
8-bit uint #2 ........ 0.0043 ........ 0.0079
8-bit uint #3 ........ 0.0041 ........ 0.0080
16-bit uint #1 ....... 0.0064 ........ 0.0095
16-bit uint #2 ....... 0.0064 ........ 0.0091
16-bit uint #3 ....... 0.0064 ........ 0.0094
32-bit uint #1 ....... 0.0085 ........ 0.0114
32-bit uint #2 ....... 0.0077 ........ 0.0122
32-bit uint #3 ....... 0.0077 ........ 0.0120
64-bit uint #1 ....... 0.0085 ........ 0.0159
64-bit uint #2 ....... 0.0086 ........ 0.0157
64-bit uint #3 ....... 0.0086 ........ 0.0158
8-bit int #1 ......... 0.0042 ........ 0.0080
8-bit int #2 ......... 0.0042 ........ 0.0080
8-bit int #3 ......... 0.0042 ........ 0.0081
16-bit int #1 ........ 0.0065 ........ 0.0095
16-bit int #2 ........ 0.0065 ........ 0.0090
16-bit int #3 ........ 0.0056 ........ 0.0085
32-bit int #1 ........ 0.0067 ........ 0.0107
32-bit int #2 ........ 0.0066 ........ 0.0106
32-bit int #3 ........ 0.0063 ........ 0.0104
64-bit int #1 ........ 0.0072 ........ 0.0162
64-bit int #2 ........ 0.0073 ........ 0.0174
64-bit int #3 ........ 0.0072 ........ 0.0164
64-bit int #4 ........ 0.0077 ........ 0.0161
64-bit float #1 ...... 0.0053 ........ 0.0135
64-bit float #2 ...... 0.0053 ........ 0.0135
64-bit float #3 ...... 0.0052 ........ 0.0135
fix string #1 ....... -0.0002 ........ 0.0044
fix string #2 ........ 0.0035 ........ 0.0067
fix string #3 ........ 0.0035 ........ 0.0077
fix string #4 ........ 0.0033 ........ 0.0078
8-bit string #1 ...... 0.0059 ........ 0.0110
8-bit string #2 ...... 0.0063 ........ 0.0121
8-bit string #3 ...... 0.0064 ........ 0.0124
16-bit string #1 ..... 0.0099 ........ 0.0146
16-bit string #2 ..... 0.1522 ........ 0.1474
32-bit string ........ 0.1511 ........ 0.1483
wide char string #1 .. 0.0039 ........ 0.0084
wide char string #2 .. 0.0073 ........ 0.0123
8-bit binary #1 ...... 0.0040 ........ 0.0112
8-bit binary #2 ...... 0.0075 ........ 0.0123
8-bit binary #3 ...... 0.0077 ........ 0.0129
16-bit binary ........ 0.0096 ........ 0.0145
32-bit binary ........ 0.1535 ........ 0.1479
fix array #1 ......... 0.0008 ........ 0.0061
fix array #2 ......... 0.0121 ........ 0.0165
fix array #3 ......... 0.0193 ........ 0.0222
16-bit array #1 ...... 0.0607 ........ 0.0479
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.0749 ........ 0.0824
fix map #1 ........... 0.0329 ........ 0.0431
fix map #2 ........... 0.0161 ........ 0.0189
fix map #3 ........... 0.0205 ........ 0.0262
fix map #4 ........... 0.0252 ........ 0.0205
16-bit map #1 ........ 0.1016 ........ 0.0927
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.1096 ........ 0.1030
fixext 1 ............. 0.0157 ........ 0.0161
fixext 2 ............. 0.0175 ........ 0.0183
fixext 4 ............. 0.0156 ........ 0.0185
fixext 8 ............. 0.0163 ........ 0.0184
fixext 16 ............ 0.0164 ........ 0.0182
8-bit ext ............ 0.0158 ........ 0.0207
16-bit ext ........... 0.0203 ........ 0.0219
32-bit ext ........... 0.1614 ........ 0.1539
32-bit timestamp #1 .. 0.0195 ........ 0.0249
32-bit timestamp #2 .. 0.0188 ........ 0.0260
64-bit timestamp #1 .. 0.0207 ........ 0.0281
64-bit timestamp #2 .. 0.0212 ........ 0.0291
64-bit timestamp #3 .. 0.0207 ........ 0.0295
96-bit timestamp #1 .. 0.0222 ........ 0.0358
96-bit timestamp #2 .. 0.0228 ........ 0.0353
96-bit timestamp #3 .. 0.0210 ........ 0.0319
=============================================
Total                  1.6432          1.9674
Skipped                     4               4
Failed                      0               0
Ignored                     0               0

You may change default benchmark settings by defining the following environment variables:

NameDefault
MP_BENCH_TARGETSpure_p,pure_u, see a list of available targets
MP_BENCH_ITERATIONS100_000
MP_BENCH_DURATIONnot set
MP_BENCH_ROUNDS3
MP_BENCH_TESTS-@slow, see a list of available tests

For example:

export MP_BENCH_TARGETS=pure_p
export MP_BENCH_ITERATIONS=1000000
export MP_BENCH_ROUNDS=5
# a comma separated list of test names
export MP_BENCH_TESTS='complex array, complex map'
# or a group name
# export MP_BENCH_TESTS='-@slow' // @pecl_comp
# or a regexp
# export MP_BENCH_TESTS='/complex (array|map)/'

Another example, benchmarking both the library and the PECL extension:

MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target            Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0031 ........ 0.0141 ...... 0.0055 ........ 0.0064
false ................ 0.0039 ........ 0.0154 ...... 0.0056 ........ 0.0053
true ................. 0.0038 ........ 0.0139 ...... 0.0056 ........ 0.0044
7-bit uint #1 ........ 0.0061 ........ 0.0110 ...... 0.0059 ........ 0.0046
7-bit uint #2 ........ 0.0065 ........ 0.0119 ...... 0.0042 ........ 0.0029
7-bit uint #3 ........ 0.0054 ........ 0.0117 ...... 0.0045 ........ 0.0025
5-bit sint #1 ........ 0.0047 ........ 0.0103 ...... 0.0038 ........ 0.0022
5-bit sint #2 ........ 0.0048 ........ 0.0117 ...... 0.0038 ........ 0.0022
5-bit sint #3 ........ 0.0046 ........ 0.0102 ...... 0.0038 ........ 0.0023
8-bit uint #1 ........ 0.0063 ........ 0.0174 ...... 0.0039 ........ 0.0031
8-bit uint #2 ........ 0.0063 ........ 0.0167 ...... 0.0040 ........ 0.0029
8-bit uint #3 ........ 0.0063 ........ 0.0168 ...... 0.0039 ........ 0.0030
16-bit uint #1 ....... 0.0092 ........ 0.0222 ...... 0.0049 ........ 0.0030
16-bit uint #2 ....... 0.0096 ........ 0.0227 ...... 0.0042 ........ 0.0046
16-bit uint #3 ....... 0.0123 ........ 0.0274 ...... 0.0059 ........ 0.0051
32-bit uint #1 ....... 0.0136 ........ 0.0331 ...... 0.0060 ........ 0.0048
32-bit uint #2 ....... 0.0130 ........ 0.0336 ...... 0.0070 ........ 0.0048
32-bit uint #3 ....... 0.0127 ........ 0.0329 ...... 0.0051 ........ 0.0048
64-bit uint #1 ....... 0.0126 ........ 0.0268 ...... 0.0055 ........ 0.0049
64-bit uint #2 ....... 0.0135 ........ 0.0281 ...... 0.0052 ........ 0.0046
64-bit uint #3 ....... 0.0131 ........ 0.0274 ...... 0.0069 ........ 0.0044
8-bit int #1 ......... 0.0077 ........ 0.0236 ...... 0.0058 ........ 0.0044
8-bit int #2 ......... 0.0087 ........ 0.0244 ...... 0.0058 ........ 0.0048
8-bit int #3 ......... 0.0084 ........ 0.0241 ...... 0.0055 ........ 0.0049
16-bit int #1 ........ 0.0112 ........ 0.0271 ...... 0.0048 ........ 0.0045
16-bit int #2 ........ 0.0124 ........ 0.0292 ...... 0.0057 ........ 0.0049
16-bit int #3 ........ 0.0118 ........ 0.0270 ...... 0.0058 ........ 0.0050
32-bit int #1 ........ 0.0137 ........ 0.0366 ...... 0.0058 ........ 0.0051
32-bit int #2 ........ 0.0133 ........ 0.0366 ...... 0.0056 ........ 0.0049
32-bit int #3 ........ 0.0129 ........ 0.0350 ...... 0.0052 ........ 0.0048
64-bit int #1 ........ 0.0145 ........ 0.0254 ...... 0.0034 ........ 0.0025
64-bit int #2 ........ 0.0097 ........ 0.0214 ...... 0.0034 ........ 0.0025
64-bit int #3 ........ 0.0096 ........ 0.0287 ...... 0.0059 ........ 0.0050
64-bit int #4 ........ 0.0143 ........ 0.0277 ...... 0.0059 ........ 0.0046
64-bit float #1 ...... 0.0134 ........ 0.0281 ...... 0.0057 ........ 0.0052
64-bit float #2 ...... 0.0141 ........ 0.0281 ...... 0.0057 ........ 0.0050
64-bit float #3 ...... 0.0144 ........ 0.0282 ...... 0.0057 ........ 0.0050
fix string #1 ........ 0.0036 ........ 0.0143 ...... 0.0066 ........ 0.0053
fix string #2 ........ 0.0107 ........ 0.0222 ...... 0.0065 ........ 0.0068
fix string #3 ........ 0.0116 ........ 0.0245 ...... 0.0063 ........ 0.0069
fix string #4 ........ 0.0105 ........ 0.0253 ...... 0.0083 ........ 0.0077
8-bit string #1 ...... 0.0126 ........ 0.0318 ...... 0.0075 ........ 0.0088
8-bit string #2 ...... 0.0121 ........ 0.0295 ...... 0.0076 ........ 0.0086
8-bit string #3 ...... 0.0125 ........ 0.0293 ...... 0.0130 ........ 0.0093
16-bit string #1 ..... 0.0159 ........ 0.0368 ...... 0.0117 ........ 0.0086
16-bit string #2 ..... 0.1547 ........ 0.1686 ...... 0.1516 ........ 0.1373
32-bit string ........ 0.1558 ........ 0.1729 ...... 0.1511 ........ 0.1396
wide char string #1 .. 0.0098 ........ 0.0237 ...... 0.0066 ........ 0.0065
wide char string #2 .. 0.0128 ........ 0.0291 ...... 0.0061 ........ 0.0082
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0040 ........ 0.0129 ...... 0.0120 ........ 0.0058
fix array #2 ......... 0.0279 ........ 0.0390 ...... 0.0143 ........ 0.0165
fix array #3 ......... 0.0415 ........ 0.0463 ...... 0.0162 ........ 0.0187
16-bit array #1 ...... 0.1349 ........ 0.1628 ...... 0.0334 ........ 0.0341
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0345 ........ 0.0391 ...... 0.0143 ........ 0.0168
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0459 ........ 0.0473 ...... 0.0151 ........ 0.0163
16-bit map #1 ........ 0.2518 ........ 0.2962 ...... 0.0400 ........ 0.0490
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.2380 ........ 0.2682 ...... 0.0545 ........ 0.0579
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  1.5625          2.3866        0.7735          0.7243
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7

With JIT:

MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target            Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0001 ........ 0.0052 ...... 0.0053 ........ 0.0042
false ................ 0.0007 ........ 0.0060 ...... 0.0057 ........ 0.0043
true ................. 0.0008 ........ 0.0060 ...... 0.0056 ........ 0.0041
7-bit uint #1 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0041
7-bit uint #2 ........ 0.0021 ........ 0.0043 ...... 0.0062 ........ 0.0041
7-bit uint #3 ........ 0.0022 ........ 0.0044 ...... 0.0061 ........ 0.0040
5-bit sint #1 ........ 0.0030 ........ 0.0048 ...... 0.0062 ........ 0.0040
5-bit sint #2 ........ 0.0032 ........ 0.0046 ...... 0.0062 ........ 0.0040
5-bit sint #3 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0040
8-bit uint #1 ........ 0.0054 ........ 0.0079 ...... 0.0062 ........ 0.0050
8-bit uint #2 ........ 0.0051 ........ 0.0079 ...... 0.0064 ........ 0.0044
8-bit uint #3 ........ 0.0051 ........ 0.0082 ...... 0.0062 ........ 0.0044
16-bit uint #1 ....... 0.0077 ........ 0.0094 ...... 0.0065 ........ 0.0045
16-bit uint #2 ....... 0.0077 ........ 0.0094 ...... 0.0063 ........ 0.0045
16-bit uint #3 ....... 0.0077 ........ 0.0095 ...... 0.0064 ........ 0.0047
32-bit uint #1 ....... 0.0088 ........ 0.0119 ...... 0.0063 ........ 0.0043
32-bit uint #2 ....... 0.0089 ........ 0.0117 ...... 0.0062 ........ 0.0039
32-bit uint #3 ....... 0.0089 ........ 0.0118 ...... 0.0063 ........ 0.0044
64-bit uint #1 ....... 0.0097 ........ 0.0155 ...... 0.0063 ........ 0.0045
64-bit uint #2 ....... 0.0095 ........ 0.0153 ...... 0.0061 ........ 0.0045
64-bit uint #3 ....... 0.0096 ........ 0.0156 ...... 0.0063 ........ 0.0047
8-bit int #1 ......... 0.0053 ........ 0.0083 ...... 0.0062 ........ 0.0044
8-bit int #2 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0044
8-bit int #3 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0043
16-bit int #1 ........ 0.0089 ........ 0.0097 ...... 0.0069 ........ 0.0046
16-bit int #2 ........ 0.0075 ........ 0.0093 ...... 0.0063 ........ 0.0043
16-bit int #3 ........ 0.0075 ........ 0.0094 ...... 0.0062 ........ 0.0046
32-bit int #1 ........ 0.0086 ........ 0.0122 ...... 0.0063 ........ 0.0044
32-bit int #2 ........ 0.0087 ........ 0.0120 ...... 0.0066 ........ 0.0046
32-bit int #3 ........ 0.0086 ........ 0.0121 ...... 0.0060 ........ 0.0044
64-bit int #1 ........ 0.0096 ........ 0.0149 ...... 0.0060 ........ 0.0045
64-bit int #2 ........ 0.0096 ........ 0.0157 ...... 0.0062 ........ 0.0044
64-bit int #3 ........ 0.0096 ........ 0.0160 ...... 0.0063 ........ 0.0046
64-bit int #4 ........ 0.0097 ........ 0.0157 ...... 0.0061 ........ 0.0044
64-bit float #1 ...... 0.0079 ........ 0.0153 ...... 0.0056 ........ 0.0044
64-bit float #2 ...... 0.0079 ........ 0.0152 ...... 0.0057 ........ 0.0045
64-bit float #3 ...... 0.0079 ........ 0.0155 ...... 0.0057 ........ 0.0044
fix string #1 ........ 0.0010 ........ 0.0045 ...... 0.0071 ........ 0.0044
fix string #2 ........ 0.0048 ........ 0.0075 ...... 0.0070 ........ 0.0060
fix string #3 ........ 0.0048 ........ 0.0086 ...... 0.0068 ........ 0.0060
fix string #4 ........ 0.0050 ........ 0.0088 ...... 0.0070 ........ 0.0059
8-bit string #1 ...... 0.0081 ........ 0.0129 ...... 0.0069 ........ 0.0062
8-bit string #2 ...... 0.0086 ........ 0.0128 ...... 0.0069 ........ 0.0065
8-bit string #3 ...... 0.0086 ........ 0.0126 ...... 0.0115 ........ 0.0065
16-bit string #1 ..... 0.0105 ........ 0.0137 ...... 0.0128 ........ 0.0068
16-bit string #2 ..... 0.1510 ........ 0.1486 ...... 0.1526 ........ 0.1391
32-bit string ........ 0.1517 ........ 0.1475 ...... 0.1504 ........ 0.1370
wide char string #1 .. 0.0044 ........ 0.0085 ...... 0.0067 ........ 0.0057
wide char string #2 .. 0.0081 ........ 0.0125 ...... 0.0069 ........ 0.0063
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0014 ........ 0.0059 ...... 0.0132 ........ 0.0055
fix array #2 ......... 0.0146 ........ 0.0156 ...... 0.0155 ........ 0.0148
fix array #3 ......... 0.0211 ........ 0.0229 ...... 0.0179 ........ 0.0180
16-bit array #1 ...... 0.0673 ........ 0.0498 ...... 0.0343 ........ 0.0388
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0148 ........ 0.0180 ...... 0.0156 ........ 0.0179
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0252 ........ 0.0201 ...... 0.0214 ........ 0.0167
16-bit map #1 ........ 0.1027 ........ 0.0836 ...... 0.0388 ........ 0.0510
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.1104 ........ 0.1010 ...... 0.0556 ........ 0.0602
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  0.9642          1.0909        0.8224          0.7213
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7

Note that the msgpack extension (v2.1.2) doesn't support ext, bin and UTF-8 str types.

License

The library is released under the MIT License. See the bundled LICENSE file for details.

Author: rybakit
Source Code: https://github.com/rybakit/msgpack.php
License: MIT License

#php 

Marget D

Marget D

1618317562

Top Deep Learning Development Services | Hire Deep Learning Developer

View more: https://www.inexture.com/services/deep-learning-development/

We at Inexture, strategically work on every project we are associated with. We propose a robust set of AI, ML, and DL consulting services. Our virtuoso team of data scientists and developers meticulously work on every project and add a personalized touch to it. Because we keep our clientele aware of everything being done associated with their project so there’s a sense of transparency being maintained. Leverage our services for your next AI project for end-to-end optimum services.

#deep learning development #deep learning framework #deep learning expert #deep learning ai #deep learning services

宇野  和也

宇野 和也

1593877440

Indian Accent Speech Recognition

Traditional ASR (Signal Analysis, MFCC, DTW, HMM & Language Modelling) and DNNs (Custom Models & Baidu DeepSpeech Model) on Indian Accent Speech

Courtesy_: _Speech and Music Technology Lab, IIT Madras

Image Courtesy

Notwithstanding an approved Indian-English accent speech, accent-less enunciation is a myth. Irregardless of the racial stereotypes, our speech is naturally shaped by the vernacular we speak, and the Indian vernaculars are numerous! Then how does a computer decipher speech from different Indian states, which even Indians from other states, find ambiguous to understand?

**ASR (Automatic Speech Recognition) **takes any continuous audio speech and output the equivalent text . In this blog, we will explore some challenges in speech recognition with focus on the speaker-independent recognition, both in theory and practice.

The** challenges in ASR** include

  • Variability of volume
  • Variability of words speed
  • Variability of Speaker
  • Variability of** pitch**
  • Word boundaries: we speak words without pause.
  • **Noises **like background sound, audience talks etc.

Lets address** each of the above problems** in the sections discussed below.

The complete source code of the above studies can be found here.

Models in speech recognition can conceptually be divided into:

  • Acoustic model: Turn sound signals into some kind of phonetic representation.
  • Language model: houses domain knowledge of words, grammar, and sentence structure for the language.

Signal Analysis

When we speak we create sinusoidal vibrations in the air. Higher pitches vibrate faster with a higher frequency than lower pitches. A microphone transduce acoustical energy in vibrations to electrical energy.

If we say “Hello World’ then the corresponding signal would contain 2 blobs

Some of the vibrations in the signal have higher amplitude. The amplitude tells us how much acoustical energy is there in the sound

Our speech is made up of many frequencies at the same time, i.e. it is a sum of all those frequencies. To analyze the signal, we use the component frequencies as features. **Fourier transform **is used to break the signal into these components.

We can use this splitting technique to convert the sound to a Spectrogram, where **frequency **on the vertical axis is plotted against time. The intensity of shading indicates the amplitude of the signal.

Spectrogram of the hello world phrase

To create a Spectrogram,

  1. **Divide the signal **into time frames.
  2. Split each frame signal into frequency components with an FFT.
  3. Each time frame is now represented with a** vector of amplitudes** at each frequency.

one dimensional vector for one time frame

If we line up the vectors again in their time series order, we can have a visual picture of the sound components, the Spectrogram.

Spectrogram can be lined up with the original audio signal in time

Next, we’ll look at Feature Extraction techniques which would reduce the noise and dimensionality of our data.

Unnecessary information is encoded in Spectrograph

Feature Extraction with MFCC

Mel Frequency Cepstrum Coefficient Analysis is the reduction of an audio signal to essential speech component features using both Mel frequency analysis and Cepstral analysis. The range of frequencies are reduced and binned into groups of frequencies that humans can distinguish. The signal is further separated into source and filter so that variations between speakers unrelated to articulation can be filtered away.

a) Mel Frequency Analysis

Only **those frequencies humans can hear are **important for recognizing speech. We can split the frequencies of the Spectrogram into bins relevant to our own ears and filter out sound that we can’t hear.

Frequencies above the black line will be filtered out

b) Cepstral Analysis

We also need to separate the elements of sound that are speaker-independent. We can think of a human voice production model as a combination of source and filter, where the source is unique to an individual and the filter is the articulation of words that we all use when speaking.

Cepstral analysis relies on this model for separating the two. The cepstrum can be extracted from a signal with an algorithm. Thus, we drop the component of speech unique to individual vocal chords and preserving the shape of the sound made by the vocal tract.

Cepstral analysis combined with Mel frequency analysis get you 12 or 13 MFCC features related to speech. **Delta and Delta-Delta MFCC features **can optionally be appended to the feature set, effectively doubling (or tripling) the number of features, up to 39 features, but gives better results in ASR.

Thus MFCC (Mel-frequency cepstral coefficients) Features Extraction,

  • Reduced the dimensionality of our data and
  • We squeeze noise out of the system

So there are 2 Acoustic Features for Speech Recognition:

  • Spectrograms
  • Mel-Frequency Cepstral Coefficients (MFCCs):

When you construct your pipeline, you will be able to choose to use either spectrogram or MFCC features. Next, we’ll look at sound from a language perspective, i.e. the phonetics of the words we hear.

Phonetics

Phonetics is the study of sound in human speech. Linguistic analysis is used to break down human words into their smallest sound segments.


phonemes define the distinct sounds

  • Phoneme is the smallest sound segment that can be used to distinguish one word from another.
  • Grapheme, in contrast, is the smallest distinct unit written in a language. Eg: English has 26 alphabets plus a space (27 graphemes).

Unfortunately, we can’t map phonemes to grapheme, as some letters map to multiple phonemes & some phonemes map to many letters. For example, the C letter sounds different in cat, chat, and circle.

Phonemes are often a useful intermediary between speech and text. If we can successfully produce an acoustic model that decodes a sound signal into phonemes the remaining task would be to map those phonemes to their matching words. This step is called Lexical Decoding, named so as it is based on a lexicon or dictionary of the data set.

If we want to train a limited vocabulary of words we might just skip the phonemes. If we have a large vocabulary, then converting to smaller units first, reduces the total number of comparisons needed.

Acoustic Models and the Trouble with Time

With feature extraction, we’ve addressed noise problems as well as variability of speakers. But we still haven’t solved the problem of matching variable lengths of the same word.

Dynamic Time Warping (DTW) calculates the similarity between two signals, even if their time lengths differ. This can be used to align the sequence data of a new word to its most similar counterpart in a dictionary of word examples.

2 signals mapped with Dynamic Time Warping

#deep-speech #speech #deep-learning #speech-recognition #machine-learning #deep learning

Mikel  Okuneva

Mikel Okuneva

1603735200

Top 10 Deep Learning Sessions To Look Forward To At DVDC 2020

The Deep Learning DevCon 2020, DLDC 2020, has exciting talks and sessions around the latest developments in the field of deep learning, that will not only be interesting for professionals of this field but also for the enthusiasts who are willing to make a career in the field of deep learning. The two-day conference scheduled for 29th and 30th October will host paper presentations, tech talks, workshops that will uncover some interesting developments as well as the latest research and advancement of this area. Further to this, with deep learning gaining massive traction, this conference will highlight some fascinating use cases across the world.

Here are ten interesting talks and sessions of DLDC 2020 that one should definitely attend:

Also Read: Why Deep Learning DevCon Comes At The Right Time


Adversarial Robustness in Deep Learning

By Dipanjan Sarkar

**About: **Adversarial Robustness in Deep Learning is a session presented by Dipanjan Sarkar, a Data Science Lead at Applied Materials, as well as a Google Developer Expert in Machine Learning. In this session, he will focus on the adversarial robustness in the field of deep learning, where he talks about its importance, different types of adversarial attacks, and will showcase some ways to train the neural networks with adversarial realisation. Considering abstract deep learning has brought us tremendous achievements in the fields of computer vision and natural language processing, this talk will be really interesting for people working in this area. With this session, the attendees will have a comprehensive understanding of adversarial perturbations in the field of deep learning and ways to deal with them with common recipes.

Read an interview with Dipanjan Sarkar.

Imbalance Handling with Combination of Deep Variational Autoencoder and NEATER

By Divye Singh

**About: **Imbalance Handling with Combination of Deep Variational Autoencoder and NEATER is a paper presentation by Divye Singh, who has a masters in technology degree in Mathematical Modeling and Simulation and has the interest to research in the field of artificial intelligence, learning-based systems, machine learning, etc. In this paper presentation, he will talk about the common problem of class imbalance in medical diagnosis and anomaly detection, and how the problem can be solved with a deep learning framework. The talk focuses on the paper, where he has proposed a synergistic over-sampling method generating informative synthetic minority class data by filtering the noise from the over-sampled examples. Further, he will also showcase the experimental results on several real-life imbalanced datasets to prove the effectiveness of the proposed method for binary classification problems.

Default Rate Prediction Models for Self-Employment in Korea using Ridge, Random Forest & Deep Neural Network

By Dongsuk Hong

About: This is a paper presentation given by Dongsuk Hong, who is a PhD in Computer Science, and works in the big data centre of Korea Credit Information Services. This talk will introduce the attendees with machine learning and deep learning models for predicting self-employment default rates using credit information. He will talk about the study, where the DNN model is implemented for two purposes — a sub-model for the selection of credit information variables; and works for cascading to the final model that predicts default rates. Hong’s main research area is data analysis of credit information, where she is particularly interested in evaluating the performance of prediction models based on machine learning and deep learning. This talk will be interesting for the deep learning practitioners who are willing to make a career in this field.


#opinions #attend dldc 2020 #deep learning #deep learning sessions #deep learning talks #dldc 2020 #top deep learning sessions at dldc 2020 #top deep learning talks at dldc 2020

Chatgpt-api: Node.js client for the official ChatGPT API

ChatGPT API

Node.js client for the official ChatGPT API.

Intro

This package is a Node.js wrapper around ChatGPT by OpenAI. TS batteries included. ✨

Example usage

Updates

March 1, 2023

The official OpenAI chat completions API has been released, and it is now the default for this package! 🔥

MethodFree?Robust?Quality?
ChatGPTAPI❌ No✅ Yes✅️ Real ChatGPT models
ChatGPTUnofficialProxyAPI✅ Yes☑️ Maybe✅ Real ChatGPT

Note: We strongly recommend using ChatGPTAPI since it uses the officially supported API from OpenAI. We may remove support for ChatGPTUnofficialProxyAPI in a future release.

  1. ChatGPTAPI - Uses the gpt-3.5-turbo-0301 model with the official OpenAI chat completions API (official, robust approach, but it's not free)
  2. ChatGPTUnofficialProxyAPI - Uses an unofficial proxy server to access ChatGPT's backend API in a way that circumvents Cloudflare (uses the real ChatGPT and is pretty lightweight, but relies on a third-party server and is rate-limited)

CLI

To run the CLI, you'll need an OpenAI API key:

export OPENAI_API_KEY="sk-TODO"
npx chatgpt "your prompt here"

By default, the response is streamed to stdout, the results are stored in a local config file, and every invocation starts a new conversation. You can use -c to continue the previous conversation and --no-stream to disable streaming.

Under the hood, the CLI uses ChatGPTAPI with text-davinci-003 to mimic ChatGPT.

Usage:
  $ chatgpt <prompt>

Commands:
  <prompt>  Ask ChatGPT a question
  rm-cache  Clears the local message cache
  ls-cache  Prints the local message cache path

For more info, run any command with the `--help` flag:
  $ chatgpt --help
  $ chatgpt rm-cache --help
  $ chatgpt ls-cache --help

Options:
  -c, --continue          Continue last conversation (default: false)
  -d, --debug             Enables debug logging (default: false)
  -s, --stream            Streams the response (default: true)
  -s, --store             Enables the local message cache (default: true)
  -t, --timeout           Timeout in milliseconds
  -k, --apiKey            OpenAI API key
  -n, --conversationName  Unique name for the conversation
  -h, --help              Display this message
  -v, --version           Display version number

Install

npm install chatgpt

Make sure you're using node >= 18 so fetch is available (or node >= 14 if you install a fetch polyfill).

Usage

To use this module from Node.js, you need to pick between two methods:

MethodFree?Robust?Quality?
ChatGPTAPI❌ No✅ Yes✅️ Real ChatGPT models
ChatGPTUnofficialProxyAPI✅ Yes☑️ Maybe✅ Real ChatGPT

ChatGPTAPI - Uses the gpt-3.5-turbo-0301 model with the official OpenAI chat completions API (official, robust approach, but it's not free). You can override the model, completion params, and system message to fully customize your assistant.

ChatGPTUnofficialProxyAPI - Uses an unofficial proxy server to access ChatGPT's backend API in a way that circumvents Cloudflare (uses the real ChatGPT and is pretty lightweight, but relies on a third-party server and is rate-limited)

Both approaches have very similar APIs, so it should be simple to swap between them.

Note: We strongly recommend using ChatGPTAPI since it uses the officially supported API from OpenAI. We may remove support for ChatGPTUnofficialProxyAPI in a future release.

Usage - ChatGPTAPI

Sign up for an OpenAI API key and store it in your environment.

import { ChatGPTAPI } from 'chatgpt'

async function example() {
  const api = new ChatGPTAPI({
    apiKey: process.env.OPENAI_API_KEY
  })

  const res = await api.sendMessage('Hello World!')
  console.log(res.text)
}

You can override the default model (gpt-3.5-turbo-0301) and any OpenAI chat completion params using completionParams:

const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY,
  completionParams: {
    temperature: 0.5,
    top_p: 0.8
  }
})

If you want to track the conversation, you'll need to pass the parentMessageId like this:

const api = new ChatGPTAPI({ apiKey: process.env.OPENAI_API_KEY })

// send a message and wait for the response
let res = await api.sendMessage('What is OpenAI?')
console.log(res.text)

// send a follow-up
res = await api.sendMessage('Can you expand on that?', {
  parentMessageId: res.id
})
console.log(res.text)

// send another follow-up
res = await api.sendMessage('What were we talking about?', {
  parentMessageId: res.id
})
console.log(res.text)

You can add streaming via the onProgress handler:

const res = await api.sendMessage('Write a 500 word essay on frogs.', {
  // print the partial response as the AI is "typing"
  onProgress: (partialResponse) => console.log(partialResponse.text)
})

// print the full text at the end
console.log(res.text)

You can add a timeout using the timeoutMs option:

// timeout after 2 minutes (which will also abort the underlying HTTP request)
const response = await api.sendMessage(
  'write me a really really long essay on frogs',
  {
    timeoutMs: 2 * 60 * 1000
  }
)

If you want to see more info about what's actually being sent to OpenAI's chat completions API, set the debug: true option in the ChatGPTAPI constructor:

const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY,
  debug: true
})

We default to a basic systemMessage. You can override this in either the ChatGPTAPI constructor or sendMessage:

const res = await api.sendMessage('what is the answer to the universe?', {
  systemMessage: `You are ChatGPT, a large language model trained by OpenAI. You answer as concisely as possible for each responseIf you are generating a list, do not have too many items.
Current date: ${new Date().toISOString()}\n\n`
})

Note that we automatically handle appending the previous messages to the prompt and attempt to optimize for the available tokens (which defaults to 4096).

Usage in CommonJS (Dynamic import)

async function example() {
  // To use ESM in CommonJS, you can use a dynamic import
  const { ChatGPTAPI } = await import('chatgpt')

  const api = new ChatGPTAPI({ apiKey: process.env.OPENAI_API_KEY })

  const res = await api.sendMessage('Hello World!')
  console.log(res.text)
}

Usage - ChatGPTUnofficialProxyAPI

The API for ChatGPTUnofficialProxyAPI is almost exactly the same. You just need to provide a ChatGPT accessToken instead of an OpenAI API key.

import { ChatGPTUnofficialProxyAPI } from 'chatgpt'

async function example() {
  const api = new ChatGPTUnofficialProxyAPI({
    accessToken: process.env.OPENAI_ACCESS_TOKEN
  })

  const res = await api.sendMessage('Hello World!')
  console.log(res.text)
}

See demos/demo-reverse-proxy for a full example:

npx tsx demos/demo-reverse-proxy.ts

ChatGPTUnofficialProxyAPI messages also contain a conversationid in addition to parentMessageId, since the ChatGPT webapp can't reference messages across

Reverse Proxy

You can override the reverse proxy by passing apiReverseProxyUrl:

const api = new ChatGPTUnofficialProxyAPI({
  accessToken: process.env.OPENAI_ACCESS_TOKEN,
  apiReverseProxyUrl: 'https://your-example-server.com/api/conversation'
})

Known reverse proxies run by community members include:

Reverse Proxy URLAuthorRate LimitsLast Checked
https://chat.duti.tech/api/conversation@acheong08120 req/min by IP2/19/2023
https://gpt.pawan.krd/backend-api/conversation@PawanOsman?2/19/2023

Note: info on how the reverse proxies work is not being published at this time in order to prevent OpenAI from disabling access.

Access Token

To use ChatGPTUnofficialProxyAPI, you'll need an OpenAI access token from the ChatGPT webapp. To do this, you can use any of the following methods which take an email and password and return an access token:

These libraries work with email + password accounts (e.g., they do not support accounts where you auth via Microsoft / Google).

Alternatively, you can manually get an accessToken by logging in to the ChatGPT webapp and then opening https://chat.openai.com/api/auth/session, which will return a JSON object containing your accessToken string.

Access tokens last for days.

Note: using a reverse proxy will expose your access token to a third-party. There shouldn't be any adverse effects possible from this, but please consider the risks before using this method.

Docs

See the auto-generated docs for more info on methods and parameters.

Demos

Most of the demos use ChatGPTAPI. It should be pretty easy to convert them to use ChatGPTUnofficialProxyAPI if you'd rather use that approach. The only thing that needs to change is how you initialize the api with an accessToken instead of an apiKey.

To run the included demos:

  1. clone repo
  2. install node deps
  3. set OPENAI_API_KEY in .env

A basic demo is included for testing purposes:

npx tsx demos/demo.ts

A demo showing on progress handler:

npx tsx demos/demo-on-progress.ts

The on progress demo uses the optional onProgress parameter to sendMessage to receive intermediary results as ChatGPT is "typing".

A conversation demo:

npx tsx demos/demo-conversation.ts

A persistence demo shows how to store messages in Redis for persistence:

npx tsx demos/demo-persistence.ts

Any keyv adaptor is supported for persistence, and there are overrides if you'd like to use a different way of storing / retrieving messages.

Note that persisting message is required for remembering the context of previous conversations beyond the scope of the current Node.js process, since by default, we only store messages in memory. Here's an external demo of using a completely custom database solution to persist messages.

Note: Persistence is handled automatically when using ChatGPTUnofficialProxyAPI because it is connecting indirectly to ChatGPT.

Projects

All of these awesome projects are built using the chatgpt package. 🤯

If you create a cool integration, feel free to open a PR and add it to the list.

Compatibility

  • This package is ESM-only.
  • This package supports node >= 14.
  • This module assumes that fetch is installed.
    • In node >= 18, it's installed by default.
    • In node < 18, you need to install a polyfill like unfetch/polyfill (guide) or isomorphic-fetch (guide).
  • If you want to build a website using chatgpt, we recommend using it only from your backend API

Credits


Previous Updates

Feb 19, 2023
 

We now provide three ways of accessing the unofficial ChatGPT API, all of which have tradeoffs:

MethodFree?Robust?Quality?
ChatGPTAPI❌ No✅ Yes☑️ Mimics ChatGPT
ChatGPTUnofficialProxyAPI✅ Yes☑️ Maybe✅ Real ChatGPT
ChatGPTAPIBrowser (v3)✅ Yes❌ No✅ Real ChatGPT

Note: I recommend that you use either ChatGPTAPI or ChatGPTUnofficialProxyAPI.

  1. ChatGPTAPI - Uses text-davinci-003 to mimic ChatGPT via the official OpenAI completions API (most robust approach, but it's not free and doesn't use a model fine-tuned for chat)
  2. ChatGPTUnofficialProxyAPI - Uses an unofficial proxy server to access ChatGPT's backend API in a way that circumvents Cloudflare (uses the real ChatGPT and is pretty lightweight, but relies on a third-party server and is rate-limited)
  3. ChatGPTAPIBrowser - (deprecated; v3.5.1 of this package) Uses Puppeteer to access the official ChatGPT webapp (uses the real ChatGPT, but very flaky, heavyweight, and error prone)

Feb 5, 2023
 

OpenAI has disabled the leaked chat model we were previously using, so we're now defaulting to text-davinci-003, which is not free.

We've found several other hidden, fine-tuned chat models, but OpenAI keeps disabling them, so we're searching for alternative workarounds.

Feb 1, 2023
 

This package no longer requires any browser hacks – it is now using the official OpenAI completions API with a leaked model that ChatGPT uses under the hood. 🔥

import { ChatGPTAPI } from 'chatgpt'

const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY
})

const res = await api.sendMessage('Hello World!')
console.log(res.text)

Please upgrade to chatgpt@latest (at least v4.0.0). The updated version is significantly more lightweight and robust compared with previous versions. You also don't have to worry about IP issues or rate limiting.

Huge shoutout to @waylaidwanderer for discovering the leaked chat model!

If you run into any issues, we do have a pretty active Discord with a bunch of ChatGPT hackers from the Node.js & Python communities.

Lastly, please consider starring this repo and following me on twitter twitter to help support the project.

Thanks && cheers, Travis


Download Details:

Author: Transitive-bullshit
Source Code: https://github.com/transitive-bullshit/chatgpt-api 
License: MIT license

#chatgpt #api #node #AI #openai #chatbot