Python  Library

Python Library

1644408360

A Guide on Features Of Python 3 with Examples

Python became a mainstream language for machine learning and other scientific fields that heavily operate with data; it boasts various deep learning frameworks and well-established set of tools for data processing and visualization.

However, Python ecosystem co-exists in Python 2 and Python 3, and Python 2 is still used among data scientists. By the end of 2019 the scientific stack will stop supporting Python2. As for numpy, after 2018 any new feature releases will only support Python3. Update (Sep 2018): same story now with pandas, matplotlib, ipython, jupyter notebook and jupyter lab.

To make the transition less frustrating, I've collected a bunch of Python 3 features that you may find useful.

Image from Dario Bertini post (toptal)

Better paths handling with pathlib

pathlib is a default module in python3, that helps you to avoid tons of os.path.joins:

from pathlib import Path

dataset = 'wiki_images'
datasets_root = Path('/path/to/datasets/')

train_path = datasets_root / dataset / 'train'
test_path = datasets_root / dataset / 'test'

for image_path in train_path.iterdir():
    with image_path.open() as f: # note, open is a method of Path object
        # do something with an image

Previously it was always tempting to use string concatenation (concise, but obviously bad), now with pathlib the code is safe, concise, and readable.

Also pathlib.Path has a bunch of methods and properties, that every python novice previously had to google:

p.exists()
p.is_dir()
p.parts
p.with_name('sibling.png') # only change the name, but keep the folder
p.with_suffix('.jpg') # only change the extension, but keep the folder and the name
p.chmod(mode)
p.rmdir()

pathlib should save you lots of time, please see docs and reference for more.

Type hinting is now part of the language

Example of type hinting in pycharm: 

Python is not just a language for small scripts anymore, data pipelines these days include numerous steps each involving different frameworks (and sometimes very different logic).

Type hinting was introduced to help with growing complexity of programs, so machines could help with code verification. Previously different modules used custom ways to point types in docstrings (Hint: pycharm can convert old docstrings to fresh type hinting).

As a simple example, the following code may work with different types of data (that's what we like about python data stack).

def repeat_each_entry(data):
    """ Each entry in the data is doubled
    <blah blah nobody reads the documentation till the end>
    """
    index = numpy.repeat(numpy.arange(len(data)), 2)
    return data[index]

This code e.g. works for numpy.array (incl. multidimensional ones), astropy.Table and astropy.Column, bcolz, cupy, mxnet.ndarray and others.

This code will work for pandas.Series, but in the wrong way:

repeat_each_entry(pandas.Series(data=[0, 1, 2], index=[3, 4, 5])) # returns Series with Nones inside

This was two lines of code. Imagine how unpredictable behavior of a complex system, because just one function may misbehave. Stating explicitly which types a method expects is very helpful in large systems, this will warn you if a function was passed unexpected arguments.

def repeat_each_entry(data: Union[numpy.ndarray, bcolz.carray]):

If you have a significant codebase, hinting tools like MyPy are likely to become part of your continuous integration pipeline. A webinar "Putting Type Hints to Work" by Daniel Pyrathon is good for a brief introduction.

Sidenote: unfortunately, hinting is not yet powerful enough to provide fine-grained typing for ndarrays/tensors, but maybe we'll have it once, and this will be a great feature for DS.

Type hinting → type checking in runtime

By default, function annotations do not influence how your code is working, but merely help you to point code intentions.

However, you can enforce type checking in runtime with tools like ... enforce, this can help you in debugging (there are many cases when type hinting is not working).

@enforce.runtime_validation
def foo(text: str) -> None:
    print(text)

foo('Hi') # ok
foo(5)    # fails


@enforce.runtime_validation
def any2(x: List[bool]) -> bool:
    return any(x)

any ([False, False, True, False]) # True
any2([False, False, True, False]) # True

any (['False']) # True
any2(['False']) # fails

any ([False, None, "", 0]) # False
any2([False, None, "", 0]) # fails

Other usages of function annotations

Update: starting from python 3.7 this behavior was deprecated, and function annotations should be used for type hinting only. Python 4 will not support other usages of annotations.

As mentioned before, annotations do not influence code execution, but rather provide some meta-information, and you can use it as you wish.

For instance, measurement units are a common pain in scientific areas, astropy package provides a simple decorator to control units of input quantities and convert output to required units

# Python 3
from astropy import units as u
@u.quantity_input()
def frequency(speed: u.meter / u.s, wavelength: u.nm) -> u.terahertz:
    return speed / wavelength

frequency(speed=300_000 * u.km / u.s, wavelength=555 * u.nm)
# output: 540.5405405405404 THz, frequency of green visible light

If you're processing tabular scientific data in python (not necessarily astronomical), you should give astropy a shot.

You can also define your application-specific decorators to perform control / conversion of inputs and output in the same manner.

Matrix multiplication with @

Let's implement one of the simplest ML models — a linear regression with l2 regularization (a.k.a. ridge regression):

# l2-regularized linear regression: || AX - y ||^2 + alpha * ||x||^2 -> min

# Python 2
X = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(y))
# Python 3
X = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ y)

The code with @ becomes more readable and more translatable between deep learning frameworks: same code X @ W + b[None, :] for a single layer of perceptron works in numpy, cupy, pytorch, tensorflow (and other frameworks that operate with tensors).

Globbing with **

Recursive folder globbing is not easy in Python 2, even though the glob2 custom module exists that overcomes this. A recursive flag is supported since Python 3.5:

import glob

# Python 2
found_images = (
    glob.glob('/path/*.jpg')
  + glob.glob('/path/*/*.jpg')
  + glob.glob('/path/*/*/*.jpg')
  + glob.glob('/path/*/*/*/*.jpg')
  + glob.glob('/path/*/*/*/*/*.jpg'))

# Python 3
found_images = glob.glob('/path/**/*.jpg', recursive=True)

A better option is to use pathlib in python3 (minus one import!):

# Python 3
found_images = pathlib.Path('/path/').glob('**/*.jpg')

Note: there are minor differences between glob.glob, Path.glob and bash globbing.

Print is a function now

Yes, code now has these annoying parentheses, but there are some advantages:

  • simple syntax for using file descriptor:
print >>sys.stderr, "critical error"      # Python 2
print("critical error", file=sys.stderr)  # Python 3
  • printing tab-aligned tables without str.join:
# Python 3
print(*array, sep='\t')
print(batch, epoch, loss, accuracy, time, sep='\t')
  • hacky suppressing / redirection of printing output:
# Python 3
_print = print # store the original print function
def print(*args, **kargs):
    pass  # do something useful, e.g. store output to some file

In jupyter it is desirable to log each output to a separate file (to track what's happening after you got disconnected), so you can override print now.

Below you can see a context manager that temporarily overrides behavior of print:

@contextlib.contextmanager
def replace_print():
    import builtins
    _print = print # saving old print function
    # or use some other function here
    builtins.print = lambda *args, **kwargs: _print('new printing', *args, **kwargs)
    yield
    builtins.print = _print

with replace_print():
    <code here will invoke other print function>

It is not a recommended approach, but a small dirty hack that is now possible.

  • print can participate in list comprehensions and other language constructs
# Python 3
result = process(x) if is_valid(x) else print('invalid item: ', x)

Underscores in Numeric Literal (Thousands Separator)

PEP-515 introduced underscores in Numeric Literals. In Python3, underscores can be used to group digits visually in integral, floating-point, and complex number literals.

# grouping decimal numbers by thousands
one_million = 1_000_000

# grouping hexadecimal addresses by words
addr = 0xCAFE_F00D

# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110

# same, for string conversions
flags = int('0b_1111_0000', 2)

f-strings for simple and reliable formatting

The default formatting system provides a flexibility that is not required in data experiments. The resulting code is either too verbose or too fragile towards any changes.

Quite typically data scientists outputs some logging information iteratively in a fixed format. It is common to have a code like:

# Python 2
print '{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {acc_mean:0.4f}±{acc_std:0.4f} time: {avg_time:3.2f}'.format(
    batch=batch, epoch=epoch, total_epochs=total_epochs,
    acc_mean=numpy.mean(accuracies), acc_std=numpy.std(accuracies),
    avg_time=time / len(data_batch)
)

# Python 2 (too error-prone during fast modifications, please avoid):
print '{:3} {:3} / {:3}  accuracy: {:0.4f}±{:0.4f} time: {:3.2f}'.format(
    batch, epoch, total_epochs, numpy.mean(accuracies), numpy.std(accuracies),
    time / len(data_batch)
)

Sample output:

120  12 / 300  accuracy: 0.8180±0.4649 time: 56.60

f-strings aka formatted string literals were introduced in Python 3.6:

# Python 3.6+
print(f'{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} time: {time / len(data_batch):3.2f}')

Explicit difference between 'true division' and 'floor division'

For data science this is definitely a handy change

data = pandas.read_csv('timing.csv')
velocity = data['distance'] / data['time']

Results in Python 2 depend on whether 'time' and 'distance' (e.g. measured in meters and seconds) are stored as integers. In Python 3, the result is correct in both cases, because the result of division is float.

Another case is floor division, which is now an explicit operation:

n_gifts = money // gift_price  # correct for int and float arguments

In a nutshell:

>>> from operator import truediv, floordiv
>>> truediv.__doc__, floordiv.__doc__
('truediv(a, b) -- Same as a / b.', 'floordiv(a, b) -- Same as a // b.')
>>> (3 / 2), (3 // 2), (3.0 // 2.0)
(1.5, 1, 1.0)

Note, that this applies both to built-in types and to custom types provided by data packages (e.g. numpy or pandas).

Strict ordering

# All these comparisons are illegal in Python 3
3 < '3'
2 < None
(3, 4) < (3, None)
(4, 5) < [4, 5]

# False in both Python 2 and Python 3
(4, 5) == [4, 5]
  • prevents from occasional sorting of instances of different types
sorted([2, '1', 3])  # invalid for Python 3, in Python 2 returns [2, 3, '1']
  • helps to spot some problems that arise when processing raw data

Sidenote: proper check for None is (in both Python versions)

if a is not None:
  pass

if a: # WRONG check for None
  pass

Unicode for NLP

s = '您好'
print(len(s))
print(s[:2])

Output:

  • Python 2: 6\n��
  • Python 3: 2\n您好.
x = u'со'
x += 'co' # ok
x += 'со' # fail

Python 2 fails, Python 3 works as expected (because I've used russian letters in strings).

In Python 3 strs are unicode strings, and it is more convenient for NLP processing of non-english texts.

There are other funny things, for instance:

'a' < type < u'a'  # Python 2: True
'a' < u'a'         # Python 2: False
from collections import Counter
Counter('Möbelstück')
  • Python 2: Counter({'\xc3': 2, 'b': 1, 'e': 1, 'c': 1, 'k': 1, 'M': 1, 'l': 1, 's': 1, 't': 1, '\xb6': 1, '\xbc': 1})
  • Python 3: Counter({'M': 1, 'ö': 1, 'b': 1, 'e': 1, 'l': 1, 's': 1, 't': 1, 'ü': 1, 'c': 1, 'k': 1})

You can handle all of this in Python 2 properly, but Python 3 is more friendly.

Preserving order of dictionaries and **kwargs

In CPython 3.6+ dicts behave like OrderedDict by default (and this is guaranteed in Python 3.7+). This preserves order during dict comprehensions (and other operations, e.g. during json serialization/deserialization)

import json
x = {str(i):i for i in range(5)}
json.loads(json.dumps(x))
# Python 2
{u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}
# Python 3
{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}

Same applies to **kwargs (in Python 3.6+), they're kept in the same order as they appear in parameters. Order is crucial when it comes to data pipelines, previously we had to write it in a cumbersome manner:

from torch import nn

# Python 2
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

# Python 3.6+, how it *can* be done, not supported right now in pytorch
model = nn.Sequential(
    conv1=nn.Conv2d(1,20,5),
    relu1=nn.ReLU(),
    conv2=nn.Conv2d(20,64,5),
    relu2=nn.ReLU())
)

Did you notice? Uniqueness of names is also checked automatically.

Iterable unpacking

# handy when amount of additional stored info may vary between experiments, but the same code can be used in all cases
model_paramteres, optimizer_parameters, *other_params = load(checkpoint_name)

# picking two last values from a sequence
*prev, next_to_last, last = values_history

# This also works with any iterables, so if you have a function that yields e.g. qualities,
# below is a simple way to take only last two values from a list
*prev, next_to_last, last = iter_train(args)

Default pickle engine provides better compression for arrays

Pickling is a mechanism to pass data between threads / processes, in particular used inside multiprocessing package.

# Python 2
import cPickle as pickle
import numpy
print len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))
# result: 23691675

# Python 3
import pickle
import numpy
len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))
# result: 8000162

Three times less space. And it is much faster. Actually similar compression (but not speed) is achievable with protocol=2 parameter, but developers typically ignore this option (or simply are not aware of it).

Note: pickle is not safe (and not quite transferrable), so never unpickle data received from an untrusted or unauthenticated source.

Safer comprehensions

labels = <initial_value>
predictions = [model.predict(data) for data, labels in dataset]

# labels are overwritten in Python 2
# labels are not affected by comprehension in Python 3

Super, simply super()

Python 2 super(...) was a frequent source of mistakes in code.

# Python 2
class MySubClass(MySuperClass):
    def __init__(self, name, **options):
        super(MySubClass, self).__init__(name='subclass', **options)

# Python 3
class MySubClass(MySuperClass):
    def __init__(self, name, **options):
        super().__init__(name='subclass', **options)

More on super and method resolution order on stackoverflow.

Better IDE suggestions with variable annotations

The most enjoyable thing about programming in languages like Java, C# and alike is that IDE can make very good suggestions, because type of each identifier is known before executing a program.

In python this is hard to achieve, but annotations will help you

  • write your expectations in a clear form
  • and get good suggestions from IDE


This is an example of PyCharm suggestions with variable annotations. This works even in situations when functions you use are not annotated (e.g. due to backward compatibility).

Multiple unpacking

Here is how you merge two dicts now:

x = dict(a=1, b=2)
y = dict(b=3, d=4)
# Python 3.5+
z = {**x, **y}
# z = {'a': 1, 'b': 3, 'd': 4}, note that value for `b` is taken from the latter dict.

See this thread at StackOverflow for a comparison with Python 2.

The same approach also works for lists, tuples, and sets (a, b, c are any iterables):

[*a, *b, *c] # list, concatenating
(*a, *b, *c) # tuple, concatenating
{*a, *b, *c} # set, union

Functions also support multiple unpacking for *args and **kwargs:

# Python 3.5+
do_something(**{**default_settings, **custom_settings})

# Also possible, this code also checks there is no intersection between keys of dictionaries
do_something(**first_args, **second_args)

Future-proof APIs with keyword-only arguments

Let's consider this snippet

model = sklearn.svm.SVC(2, 'poly', 2, 4, 0.5)

Obviously, an author of this code didn't get the Python style of coding yet (most probably, just jumped from cpp or rust). Unfortunately, this is not just question of taste, because changing the order of arguments (adding/deleting) in SVC will break this code. In particular, sklearn does some reordering/renaming from time to time of numerous algorithm parameters to provide consistent API. Each such refactoring may drive to broken code.

In Python 3, library authors may demand explicitly named parameters by using *:

class SVC(BaseSVC):
    def __init__(self, *, C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, ... )
  • users have to specify names of parameters sklearn.svm.SVC(C=2, kernel='poly', degree=2, gamma=4, coef0=0.5) now
  • this mechanism provides a great combination of reliability and flexibility of APIs

Data classes

Python 3.7 introduces data classes, a good replacement for namedtuple in most cases.

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Coder(Person):
    preferred_language: str = 'Python 3'

dataclass decorator takes the job of implementing routine methods for you (initialization, representation, comparison, and hashing when applicable). Let's name some features:

  • data classes can be both mutable and immutable
  • default values for fields are supported
  • inheritance
  • data classes are still old good classes: you can define new methods and override existing
  • post-init processing (e.g. to verify consistency)

Geir Arne Hjelle gives a good overview of dataclasses in his post.

Customizing access to module attributes

In Python you can control attribute access and hinting with __getattr__ and __dir__ for any object. Since python 3.7 you can do it for modules too.

A natural example is implementing a random submodule of tensor libraries, which is typically a shortcut to skip initialization and passing of RandomState objects. Here's implementation for numpy:

# nprandom.py
import numpy
__random_state = numpy.random.RandomState()

def __getattr__(name):
    return getattr(__random_state, name)

def __dir__():
    return dir(__random_state)
    
def seed(seed):
    __random_state = numpy.random.RandomState(seed=seed)

One can also mix this way functionalities of different objects/submodules. Compare with tricks in pytorch and cupy.

Additionally, now one can

Built-in breakpoint()

Just write breakpoint() in the code to invoke debugger.

# Python 3.7+, not all IDEs support this at the moment
foo()
breakpoint()
bar()

For remote debugging you may want to try combining breakpoint() with web-pdb

Minor: constants in math module

# Python 3
math.inf # Infinite float
math.nan # not a number

max_quality = -math.inf  # no more magic initial values!

for model in trained_models:
    max_quality = max(max_quality, compute_quality(model, data))

Minor: single integer type

Python 2 provides two basic integer types, which are int (64-bit signed integer) and long for long arithmetics (quite confusing after C++).

Python 3 has a single type int, which incorporates long arithmetics.

Here is how you check that value is integer:

isinstance(x, numbers.Integral) # Python 2, the canonical way
isinstance(x, (long, int))      # Python 2
isinstance(x, int)              # Python 3, easier to remember

Update: first check also works for other integral types, such as numpy.int32, numpy.int64, but others don't. So they're not equivalent.

Other stuff

  • Enums are theoretically useful, but
    • string-typing is already widely adopted in the python data stack
    • Enums don't seem to interplay with numpy and categorical from pandas
  • coroutines also sound very promising for data pipelining (see slides by David Beazley), but I don't see their adoption in the wild.
  • Python 3 has stable ABI
  • Python 3 supports unicode identifies (so ω = Δφ / Δt is ok), but you'd better use good old ASCII names
  • some libraries e.g. jupyterhub (jupyter in cloud), django and fresh ipython only support Python 3, so features that sound useless for you are useful for libraries you'll probably want to use once.

Problems for code migration specific for data science (and how to resolve those)

map(lambda x, (y, z): x, z, dict.items())
  • However, it is still perfectly working with different comprehensions:
{x:z for x, (y, z) in d.items()}

In general, comprehensions are also better 'translatable' between Python 2 and 3.

map(), .keys(), .values(), .items(), etc. return iterators, not lists. Main problems with iterators are:

  • no trivial slicing
  • can't be iterated twice

see Python FAQ: How do I port to Python 3? when in trouble

Main problems for teaching machine learning and data science with python

Course authors should spend time in the first lectures to explain what is an iterator, why it can't be sliced / concatenated / multiplied / iterated twice like a string (and how to deal with it).

I think most course authors would be happy to avoid these details, but now it is hardly possible.

Conclusion

Python 2 and Python 3 have co-existed for almost 10 years, but we should move to Python 3.

Research and production code should become a bit shorter, more readable, and significantly safer after moving to Python 3-only codebase.

Right now most libraries support both Python versions. And I can't wait for the bright moment when packages drop support for Python 2 and enjoy new language features.

Following migrations are promised to be smoother: "we will never do this kind of backwards-incompatible change again"

Links

Download Details:
Author: arogozhnikov
Source Code: https://github.com/arogozhnikov/python3_with_pleasure
License:

#python 

What is GEEK

Buddha Community

A Guide on Features Of Python 3 with Examples
Veronica  Roob

Veronica Roob

1653475560

A Pure PHP Implementation Of The MessagePack Serialization Format

msgpack.php

A pure PHP implementation of the MessagePack serialization format.

Features

Installation

The recommended way to install the library is through Composer:

composer require rybakit/msgpack

Usage

Packing

To pack values you can either use an instance of a Packer:

$packer = new Packer();
$packed = $packer->pack($value);

or call a static method on the MessagePack class:

$packed = MessagePack::pack($value);

In the examples above, the method pack automatically packs a value depending on its type. However, not all PHP types can be uniquely translated to MessagePack types. For example, the MessagePack format defines map and array types, which are represented by a single array type in PHP. By default, the packer will pack a PHP array as a MessagePack array if it has sequential numeric keys, starting from 0 and as a MessagePack map otherwise:

$mpArr1 = $packer->pack([1, 2]);               // MP array [1, 2]
$mpArr2 = $packer->pack([0 => 1, 1 => 2]);     // MP array [1, 2]
$mpMap1 = $packer->pack([0 => 1, 2 => 3]);     // MP map {0: 1, 2: 3}
$mpMap2 = $packer->pack([1 => 2, 2 => 3]);     // MP map {1: 2, 2: 3}
$mpMap3 = $packer->pack(['a' => 1, 'b' => 2]); // MP map {a: 1, b: 2}

However, sometimes you need to pack a sequential array as a MessagePack map. To do this, use the packMap method:

$mpMap = $packer->packMap([1, 2]); // {0: 1, 1: 2}

Here is a list of type-specific packing methods:

$packer->packNil();           // MP nil
$packer->packBool(true);      // MP bool
$packer->packInt(42);         // MP int
$packer->packFloat(M_PI);     // MP float (32 or 64)
$packer->packFloat32(M_PI);   // MP float 32
$packer->packFloat64(M_PI);   // MP float 64
$packer->packStr('foo');      // MP str
$packer->packBin("\x80");     // MP bin
$packer->packArray([1, 2]);   // MP array
$packer->packMap(['a' => 1]); // MP map
$packer->packExt(1, "\xaa");  // MP ext

Check the "Custom types" section below on how to pack custom types.

Packing options

The Packer object supports a number of bitmask-based options for fine-tuning the packing process (defaults are in bold):

NameDescription
FORCE_STRForces PHP strings to be packed as MessagePack UTF-8 strings
FORCE_BINForces PHP strings to be packed as MessagePack binary data
DETECT_STR_BINDetects MessagePack str/bin type automatically
  
FORCE_ARRForces PHP arrays to be packed as MessagePack arrays
FORCE_MAPForces PHP arrays to be packed as MessagePack maps
DETECT_ARR_MAPDetects MessagePack array/map type automatically
  
FORCE_FLOAT32Forces PHP floats to be packed as 32-bits MessagePack floats
FORCE_FLOAT64Forces PHP floats to be packed as 64-bits MessagePack floats

The type detection mode (DETECT_STR_BIN/DETECT_ARR_MAP) adds some overhead which can be noticed when you pack large (16- and 32-bit) arrays or strings. However, if you know the value type in advance (for example, you only work with UTF-8 strings or/and associative arrays), you can eliminate this overhead by forcing the packer to use the appropriate type, which will save it from running the auto-detection routine. Another option is to explicitly specify the value type. The library provides 2 auxiliary classes for this, Map and Bin. Check the "Custom types" section below for details.

Examples:

// detect str/bin type and pack PHP 64-bit floats (doubles) to MP 32-bit floats
$packer = new Packer(PackOptions::DETECT_STR_BIN | PackOptions::FORCE_FLOAT32);

// these will throw MessagePack\Exception\InvalidOptionException
$packer = new Packer(PackOptions::FORCE_STR | PackOptions::FORCE_BIN);
$packer = new Packer(PackOptions::FORCE_FLOAT32 | PackOptions::FORCE_FLOAT64);

Unpacking

To unpack data you can either use an instance of a BufferUnpacker:

$unpacker = new BufferUnpacker();

$unpacker->reset($packed);
$value = $unpacker->unpack();

or call a static method on the MessagePack class:

$value = MessagePack::unpack($packed);

If the packed data is received in chunks (e.g. when reading from a stream), use the tryUnpack method, which attempts to unpack data and returns an array of unpacked messages (if any) instead of throwing an InsufficientDataException:

while ($chunk = ...) {
    $unpacker->append($chunk);
    if ($messages = $unpacker->tryUnpack()) {
        return $messages;
    }
}

If you want to unpack from a specific position in a buffer, use seek:

$unpacker->seek(42); // set position equal to 42 bytes
$unpacker->seek(-8); // set position to 8 bytes before the end of the buffer

To skip bytes from the current position, use skip:

$unpacker->skip(10); // set position to 10 bytes ahead of the current position

To get the number of remaining (unread) bytes in the buffer:

$unreadBytesCount = $unpacker->getRemainingCount();

To check whether the buffer has unread data:

$hasUnreadBytes = $unpacker->hasRemaining();

If needed, you can remove already read data from the buffer by calling:

$releasedBytesCount = $unpacker->release();

With the read method you can read raw (packed) data:

$packedData = $unpacker->read(2); // read 2 bytes

Besides the above methods BufferUnpacker provides type-specific unpacking methods, namely:

$unpacker->unpackNil();   // PHP null
$unpacker->unpackBool();  // PHP bool
$unpacker->unpackInt();   // PHP int
$unpacker->unpackFloat(); // PHP float
$unpacker->unpackStr();   // PHP UTF-8 string
$unpacker->unpackBin();   // PHP binary string
$unpacker->unpackArray(); // PHP sequential array
$unpacker->unpackMap();   // PHP associative array
$unpacker->unpackExt();   // PHP MessagePack\Type\Ext object

Unpacking options

The BufferUnpacker object supports a number of bitmask-based options for fine-tuning the unpacking process (defaults are in bold):

NameDescription
BIGINT_AS_STRConverts overflowed integers to strings [1]
BIGINT_AS_GMPConverts overflowed integers to GMP objects [2]
BIGINT_AS_DECConverts overflowed integers to Decimal\Decimal objects [3]

1. The binary MessagePack format has unsigned 64-bit as its largest integer data type, but PHP does not support such integers, which means that an overflow can occur during unpacking.

2. Make sure the GMP extension is enabled.

3. Make sure the Decimal extension is enabled.

Examples:

$packedUint64 = "\xcf"."\xff\xff\xff\xff"."\xff\xff\xff\xff";

$unpacker = new BufferUnpacker($packedUint64);
var_dump($unpacker->unpack()); // string(20) "18446744073709551615"

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_GMP);
var_dump($unpacker->unpack()); // object(GMP) {...}

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_DEC);
var_dump($unpacker->unpack()); // object(Decimal\Decimal) {...}

Custom types

In addition to the basic types, the library provides functionality to serialize and deserialize arbitrary types. This can be done in several ways, depending on your use case. Let's take a look at them.

Type objects

If you need to serialize an instance of one of your classes into one of the basic MessagePack types, the best way to do this is to implement the CanBePacked interface in the class. A good example of such a class is the Map type class that comes with the library. This type is useful when you want to explicitly specify that a given PHP array should be packed as a MessagePack map without triggering an automatic type detection routine:

$packer = new Packer();

$packedMap = $packer->pack(new Map([1, 2, 3]));
$packedArray = $packer->pack([1, 2, 3]);

More type examples can be found in the src/Type directory.

Type transformers

As with type objects, type transformers are only responsible for serializing values. They should be used when you need to serialize a value that does not implement the CanBePacked interface. Examples of such values could be instances of built-in or third-party classes that you don't own, or non-objects such as resources.

A transformer class must implement the CanPack interface. To use a transformer, it must first be registered in the packer. Here is an example of how to serialize PHP streams into the MessagePack bin format type using one of the supplied transformers, StreamTransformer:

$packer = new Packer(null, [new StreamTransformer()]);

$packedBin = $packer->pack(fopen('/path/to/file', 'r+'));

More type transformer examples can be found in the src/TypeTransformer directory.

Extensions

In contrast to the cases described above, extensions are intended to handle extension types and are responsible for both serialization and deserialization of values (types).

An extension class must implement the Extension interface. To use an extension, it must first be registered in the packer and the unpacker.

The MessagePack specification divides extension types into two groups: predefined and application-specific. Currently, there is only one predefined type in the specification, Timestamp.

Timestamp

The Timestamp extension type is a predefined type. Support for this type in the library is done through the TimestampExtension class. This class is responsible for handling Timestamp objects, which represent the number of seconds and optional adjustment in nanoseconds:

$timestampExtension = new TimestampExtension();

$packer = new Packer();
$packer = $packer->extendWith($timestampExtension);

$unpacker = new BufferUnpacker();
$unpacker = $unpacker->extendWith($timestampExtension);

$packedTimestamp = $packer->pack(Timestamp::now());
$timestamp = $unpacker->reset($packedTimestamp)->unpack();

$seconds = $timestamp->getSeconds();
$nanoseconds = $timestamp->getNanoseconds();

When using the MessagePack class, the Timestamp extension is already registered:

$packedTimestamp = MessagePack::pack(Timestamp::now());
$timestamp = MessagePack::unpack($packedTimestamp);

Application-specific extensions

In addition, the format can be extended with your own types. For example, to make the built-in PHP DateTime objects first-class citizens in your code, you can create a corresponding extension, as shown in the example. Please note, that custom extensions have to be registered with a unique extension ID (an integer from 0 to 127).

More extension examples can be found in the examples/MessagePack directory.

To learn more about how extension types can be useful, check out this article.

Exceptions

If an error occurs during packing/unpacking, a PackingFailedException or an UnpackingFailedException will be thrown, respectively. In addition, an InsufficientDataException can be thrown during unpacking.

An InvalidOptionException will be thrown in case an invalid option (or a combination of mutually exclusive options) is used.

Tests

Run tests as follows:

vendor/bin/phpunit

Also, if you already have Docker installed, you can run the tests in a docker container. First, create a container:

./dockerfile.sh | docker build -t msgpack -

The command above will create a container named msgpack with PHP 8.1 runtime. You may change the default runtime by defining the PHP_IMAGE environment variable:

PHP_IMAGE='php:8.0-cli' ./dockerfile.sh | docker build -t msgpack -

See a list of various images here.

Then run the unit tests:

docker run --rm -v $PWD:/msgpack -w /msgpack msgpack

Fuzzing

To ensure that the unpacking works correctly with malformed/semi-malformed data, you can use a testing technique called Fuzzing. The library ships with a help file (target) for PHP-Fuzzer and can be used as follows:

php-fuzzer fuzz tests/fuzz_buffer_unpacker.php

Performance

To check performance, run:

php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target            Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0030 ........ 0.0139
false ................ 0.0037 ........ 0.0144
true ................. 0.0040 ........ 0.0137
7-bit uint #1 ........ 0.0052 ........ 0.0120
7-bit uint #2 ........ 0.0059 ........ 0.0114
7-bit uint #3 ........ 0.0061 ........ 0.0119
5-bit sint #1 ........ 0.0067 ........ 0.0126
5-bit sint #2 ........ 0.0064 ........ 0.0132
5-bit sint #3 ........ 0.0066 ........ 0.0135
8-bit uint #1 ........ 0.0078 ........ 0.0200
8-bit uint #2 ........ 0.0077 ........ 0.0212
8-bit uint #3 ........ 0.0086 ........ 0.0203
16-bit uint #1 ....... 0.0111 ........ 0.0271
16-bit uint #2 ....... 0.0115 ........ 0.0260
16-bit uint #3 ....... 0.0103 ........ 0.0273
32-bit uint #1 ....... 0.0116 ........ 0.0326
32-bit uint #2 ....... 0.0118 ........ 0.0332
32-bit uint #3 ....... 0.0127 ........ 0.0325
64-bit uint #1 ....... 0.0140 ........ 0.0277
64-bit uint #2 ....... 0.0134 ........ 0.0294
64-bit uint #3 ....... 0.0134 ........ 0.0281
8-bit int #1 ......... 0.0086 ........ 0.0241
8-bit int #2 ......... 0.0089 ........ 0.0225
8-bit int #3 ......... 0.0085 ........ 0.0229
16-bit int #1 ........ 0.0118 ........ 0.0280
16-bit int #2 ........ 0.0121 ........ 0.0270
16-bit int #3 ........ 0.0109 ........ 0.0274
32-bit int #1 ........ 0.0128 ........ 0.0346
32-bit int #2 ........ 0.0118 ........ 0.0339
32-bit int #3 ........ 0.0135 ........ 0.0368
64-bit int #1 ........ 0.0138 ........ 0.0276
64-bit int #2 ........ 0.0132 ........ 0.0286
64-bit int #3 ........ 0.0137 ........ 0.0274
64-bit int #4 ........ 0.0180 ........ 0.0285
64-bit float #1 ...... 0.0134 ........ 0.0284
64-bit float #2 ...... 0.0125 ........ 0.0275
64-bit float #3 ...... 0.0126 ........ 0.0283
fix string #1 ........ 0.0035 ........ 0.0133
fix string #2 ........ 0.0094 ........ 0.0216
fix string #3 ........ 0.0094 ........ 0.0222
fix string #4 ........ 0.0091 ........ 0.0241
8-bit string #1 ...... 0.0122 ........ 0.0301
8-bit string #2 ...... 0.0118 ........ 0.0304
8-bit string #3 ...... 0.0119 ........ 0.0315
16-bit string #1 ..... 0.0150 ........ 0.0388
16-bit string #2 ..... 0.1545 ........ 0.1665
32-bit string ........ 0.1570 ........ 0.1756
wide char string #1 .. 0.0091 ........ 0.0236
wide char string #2 .. 0.0122 ........ 0.0313
8-bit binary #1 ...... 0.0100 ........ 0.0302
8-bit binary #2 ...... 0.0123 ........ 0.0324
8-bit binary #3 ...... 0.0126 ........ 0.0327
16-bit binary ........ 0.0168 ........ 0.0372
32-bit binary ........ 0.1588 ........ 0.1754
fix array #1 ......... 0.0042 ........ 0.0131
fix array #2 ......... 0.0294 ........ 0.0367
fix array #3 ......... 0.0412 ........ 0.0472
16-bit array #1 ...... 0.1378 ........ 0.1596
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.1865 ........ 0.2283
fix map #1 ........... 0.0725 ........ 0.1048
fix map #2 ........... 0.0319 ........ 0.0405
fix map #3 ........... 0.0356 ........ 0.0665
fix map #4 ........... 0.0465 ........ 0.0497
16-bit map #1 ........ 0.2540 ........ 0.3028
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.2372 ........ 0.2710
fixext 1 ............. 0.0283 ........ 0.0358
fixext 2 ............. 0.0291 ........ 0.0371
fixext 4 ............. 0.0302 ........ 0.0355
fixext 8 ............. 0.0288 ........ 0.0384
fixext 16 ............ 0.0293 ........ 0.0359
8-bit ext ............ 0.0302 ........ 0.0439
16-bit ext ........... 0.0334 ........ 0.0499
32-bit ext ........... 0.1845 ........ 0.1888
32-bit timestamp #1 .. 0.0337 ........ 0.0547
32-bit timestamp #2 .. 0.0335 ........ 0.0560
64-bit timestamp #1 .. 0.0371 ........ 0.0575
64-bit timestamp #2 .. 0.0374 ........ 0.0542
64-bit timestamp #3 .. 0.0356 ........ 0.0533
96-bit timestamp #1 .. 0.0362 ........ 0.0699
96-bit timestamp #2 .. 0.0381 ........ 0.0701
96-bit timestamp #3 .. 0.0367 ........ 0.0687
=============================================
Total                  2.7618          4.0820
Skipped                     4               4
Failed                      0               0
Ignored                     0               0

With JIT:

php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target            Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0005 ........ 0.0054
false ................ 0.0004 ........ 0.0059
true ................. 0.0004 ........ 0.0059
7-bit uint #1 ........ 0.0010 ........ 0.0047
7-bit uint #2 ........ 0.0010 ........ 0.0046
7-bit uint #3 ........ 0.0010 ........ 0.0046
5-bit sint #1 ........ 0.0025 ........ 0.0046
5-bit sint #2 ........ 0.0023 ........ 0.0046
5-bit sint #3 ........ 0.0024 ........ 0.0045
8-bit uint #1 ........ 0.0043 ........ 0.0081
8-bit uint #2 ........ 0.0043 ........ 0.0079
8-bit uint #3 ........ 0.0041 ........ 0.0080
16-bit uint #1 ....... 0.0064 ........ 0.0095
16-bit uint #2 ....... 0.0064 ........ 0.0091
16-bit uint #3 ....... 0.0064 ........ 0.0094
32-bit uint #1 ....... 0.0085 ........ 0.0114
32-bit uint #2 ....... 0.0077 ........ 0.0122
32-bit uint #3 ....... 0.0077 ........ 0.0120
64-bit uint #1 ....... 0.0085 ........ 0.0159
64-bit uint #2 ....... 0.0086 ........ 0.0157
64-bit uint #3 ....... 0.0086 ........ 0.0158
8-bit int #1 ......... 0.0042 ........ 0.0080
8-bit int #2 ......... 0.0042 ........ 0.0080
8-bit int #3 ......... 0.0042 ........ 0.0081
16-bit int #1 ........ 0.0065 ........ 0.0095
16-bit int #2 ........ 0.0065 ........ 0.0090
16-bit int #3 ........ 0.0056 ........ 0.0085
32-bit int #1 ........ 0.0067 ........ 0.0107
32-bit int #2 ........ 0.0066 ........ 0.0106
32-bit int #3 ........ 0.0063 ........ 0.0104
64-bit int #1 ........ 0.0072 ........ 0.0162
64-bit int #2 ........ 0.0073 ........ 0.0174
64-bit int #3 ........ 0.0072 ........ 0.0164
64-bit int #4 ........ 0.0077 ........ 0.0161
64-bit float #1 ...... 0.0053 ........ 0.0135
64-bit float #2 ...... 0.0053 ........ 0.0135
64-bit float #3 ...... 0.0052 ........ 0.0135
fix string #1 ....... -0.0002 ........ 0.0044
fix string #2 ........ 0.0035 ........ 0.0067
fix string #3 ........ 0.0035 ........ 0.0077
fix string #4 ........ 0.0033 ........ 0.0078
8-bit string #1 ...... 0.0059 ........ 0.0110
8-bit string #2 ...... 0.0063 ........ 0.0121
8-bit string #3 ...... 0.0064 ........ 0.0124
16-bit string #1 ..... 0.0099 ........ 0.0146
16-bit string #2 ..... 0.1522 ........ 0.1474
32-bit string ........ 0.1511 ........ 0.1483
wide char string #1 .. 0.0039 ........ 0.0084
wide char string #2 .. 0.0073 ........ 0.0123
8-bit binary #1 ...... 0.0040 ........ 0.0112
8-bit binary #2 ...... 0.0075 ........ 0.0123
8-bit binary #3 ...... 0.0077 ........ 0.0129
16-bit binary ........ 0.0096 ........ 0.0145
32-bit binary ........ 0.1535 ........ 0.1479
fix array #1 ......... 0.0008 ........ 0.0061
fix array #2 ......... 0.0121 ........ 0.0165
fix array #3 ......... 0.0193 ........ 0.0222
16-bit array #1 ...... 0.0607 ........ 0.0479
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.0749 ........ 0.0824
fix map #1 ........... 0.0329 ........ 0.0431
fix map #2 ........... 0.0161 ........ 0.0189
fix map #3 ........... 0.0205 ........ 0.0262
fix map #4 ........... 0.0252 ........ 0.0205
16-bit map #1 ........ 0.1016 ........ 0.0927
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.1096 ........ 0.1030
fixext 1 ............. 0.0157 ........ 0.0161
fixext 2 ............. 0.0175 ........ 0.0183
fixext 4 ............. 0.0156 ........ 0.0185
fixext 8 ............. 0.0163 ........ 0.0184
fixext 16 ............ 0.0164 ........ 0.0182
8-bit ext ............ 0.0158 ........ 0.0207
16-bit ext ........... 0.0203 ........ 0.0219
32-bit ext ........... 0.1614 ........ 0.1539
32-bit timestamp #1 .. 0.0195 ........ 0.0249
32-bit timestamp #2 .. 0.0188 ........ 0.0260
64-bit timestamp #1 .. 0.0207 ........ 0.0281
64-bit timestamp #2 .. 0.0212 ........ 0.0291
64-bit timestamp #3 .. 0.0207 ........ 0.0295
96-bit timestamp #1 .. 0.0222 ........ 0.0358
96-bit timestamp #2 .. 0.0228 ........ 0.0353
96-bit timestamp #3 .. 0.0210 ........ 0.0319
=============================================
Total                  1.6432          1.9674
Skipped                     4               4
Failed                      0               0
Ignored                     0               0

You may change default benchmark settings by defining the following environment variables:

NameDefault
MP_BENCH_TARGETSpure_p,pure_u, see a list of available targets
MP_BENCH_ITERATIONS100_000
MP_BENCH_DURATIONnot set
MP_BENCH_ROUNDS3
MP_BENCH_TESTS-@slow, see a list of available tests

For example:

export MP_BENCH_TARGETS=pure_p
export MP_BENCH_ITERATIONS=1000000
export MP_BENCH_ROUNDS=5
# a comma separated list of test names
export MP_BENCH_TESTS='complex array, complex map'
# or a group name
# export MP_BENCH_TESTS='-@slow' // @pecl_comp
# or a regexp
# export MP_BENCH_TESTS='/complex (array|map)/'

Another example, benchmarking both the library and the PECL extension:

MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target            Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0031 ........ 0.0141 ...... 0.0055 ........ 0.0064
false ................ 0.0039 ........ 0.0154 ...... 0.0056 ........ 0.0053
true ................. 0.0038 ........ 0.0139 ...... 0.0056 ........ 0.0044
7-bit uint #1 ........ 0.0061 ........ 0.0110 ...... 0.0059 ........ 0.0046
7-bit uint #2 ........ 0.0065 ........ 0.0119 ...... 0.0042 ........ 0.0029
7-bit uint #3 ........ 0.0054 ........ 0.0117 ...... 0.0045 ........ 0.0025
5-bit sint #1 ........ 0.0047 ........ 0.0103 ...... 0.0038 ........ 0.0022
5-bit sint #2 ........ 0.0048 ........ 0.0117 ...... 0.0038 ........ 0.0022
5-bit sint #3 ........ 0.0046 ........ 0.0102 ...... 0.0038 ........ 0.0023
8-bit uint #1 ........ 0.0063 ........ 0.0174 ...... 0.0039 ........ 0.0031
8-bit uint #2 ........ 0.0063 ........ 0.0167 ...... 0.0040 ........ 0.0029
8-bit uint #3 ........ 0.0063 ........ 0.0168 ...... 0.0039 ........ 0.0030
16-bit uint #1 ....... 0.0092 ........ 0.0222 ...... 0.0049 ........ 0.0030
16-bit uint #2 ....... 0.0096 ........ 0.0227 ...... 0.0042 ........ 0.0046
16-bit uint #3 ....... 0.0123 ........ 0.0274 ...... 0.0059 ........ 0.0051
32-bit uint #1 ....... 0.0136 ........ 0.0331 ...... 0.0060 ........ 0.0048
32-bit uint #2 ....... 0.0130 ........ 0.0336 ...... 0.0070 ........ 0.0048
32-bit uint #3 ....... 0.0127 ........ 0.0329 ...... 0.0051 ........ 0.0048
64-bit uint #1 ....... 0.0126 ........ 0.0268 ...... 0.0055 ........ 0.0049
64-bit uint #2 ....... 0.0135 ........ 0.0281 ...... 0.0052 ........ 0.0046
64-bit uint #3 ....... 0.0131 ........ 0.0274 ...... 0.0069 ........ 0.0044
8-bit int #1 ......... 0.0077 ........ 0.0236 ...... 0.0058 ........ 0.0044
8-bit int #2 ......... 0.0087 ........ 0.0244 ...... 0.0058 ........ 0.0048
8-bit int #3 ......... 0.0084 ........ 0.0241 ...... 0.0055 ........ 0.0049
16-bit int #1 ........ 0.0112 ........ 0.0271 ...... 0.0048 ........ 0.0045
16-bit int #2 ........ 0.0124 ........ 0.0292 ...... 0.0057 ........ 0.0049
16-bit int #3 ........ 0.0118 ........ 0.0270 ...... 0.0058 ........ 0.0050
32-bit int #1 ........ 0.0137 ........ 0.0366 ...... 0.0058 ........ 0.0051
32-bit int #2 ........ 0.0133 ........ 0.0366 ...... 0.0056 ........ 0.0049
32-bit int #3 ........ 0.0129 ........ 0.0350 ...... 0.0052 ........ 0.0048
64-bit int #1 ........ 0.0145 ........ 0.0254 ...... 0.0034 ........ 0.0025
64-bit int #2 ........ 0.0097 ........ 0.0214 ...... 0.0034 ........ 0.0025
64-bit int #3 ........ 0.0096 ........ 0.0287 ...... 0.0059 ........ 0.0050
64-bit int #4 ........ 0.0143 ........ 0.0277 ...... 0.0059 ........ 0.0046
64-bit float #1 ...... 0.0134 ........ 0.0281 ...... 0.0057 ........ 0.0052
64-bit float #2 ...... 0.0141 ........ 0.0281 ...... 0.0057 ........ 0.0050
64-bit float #3 ...... 0.0144 ........ 0.0282 ...... 0.0057 ........ 0.0050
fix string #1 ........ 0.0036 ........ 0.0143 ...... 0.0066 ........ 0.0053
fix string #2 ........ 0.0107 ........ 0.0222 ...... 0.0065 ........ 0.0068
fix string #3 ........ 0.0116 ........ 0.0245 ...... 0.0063 ........ 0.0069
fix string #4 ........ 0.0105 ........ 0.0253 ...... 0.0083 ........ 0.0077
8-bit string #1 ...... 0.0126 ........ 0.0318 ...... 0.0075 ........ 0.0088
8-bit string #2 ...... 0.0121 ........ 0.0295 ...... 0.0076 ........ 0.0086
8-bit string #3 ...... 0.0125 ........ 0.0293 ...... 0.0130 ........ 0.0093
16-bit string #1 ..... 0.0159 ........ 0.0368 ...... 0.0117 ........ 0.0086
16-bit string #2 ..... 0.1547 ........ 0.1686 ...... 0.1516 ........ 0.1373
32-bit string ........ 0.1558 ........ 0.1729 ...... 0.1511 ........ 0.1396
wide char string #1 .. 0.0098 ........ 0.0237 ...... 0.0066 ........ 0.0065
wide char string #2 .. 0.0128 ........ 0.0291 ...... 0.0061 ........ 0.0082
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0040 ........ 0.0129 ...... 0.0120 ........ 0.0058
fix array #2 ......... 0.0279 ........ 0.0390 ...... 0.0143 ........ 0.0165
fix array #3 ......... 0.0415 ........ 0.0463 ...... 0.0162 ........ 0.0187
16-bit array #1 ...... 0.1349 ........ 0.1628 ...... 0.0334 ........ 0.0341
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0345 ........ 0.0391 ...... 0.0143 ........ 0.0168
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0459 ........ 0.0473 ...... 0.0151 ........ 0.0163
16-bit map #1 ........ 0.2518 ........ 0.2962 ...... 0.0400 ........ 0.0490
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.2380 ........ 0.2682 ...... 0.0545 ........ 0.0579
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  1.5625          2.3866        0.7735          0.7243
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7

With JIT:

MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php

Example output

Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target            Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0001 ........ 0.0052 ...... 0.0053 ........ 0.0042
false ................ 0.0007 ........ 0.0060 ...... 0.0057 ........ 0.0043
true ................. 0.0008 ........ 0.0060 ...... 0.0056 ........ 0.0041
7-bit uint #1 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0041
7-bit uint #2 ........ 0.0021 ........ 0.0043 ...... 0.0062 ........ 0.0041
7-bit uint #3 ........ 0.0022 ........ 0.0044 ...... 0.0061 ........ 0.0040
5-bit sint #1 ........ 0.0030 ........ 0.0048 ...... 0.0062 ........ 0.0040
5-bit sint #2 ........ 0.0032 ........ 0.0046 ...... 0.0062 ........ 0.0040
5-bit sint #3 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0040
8-bit uint #1 ........ 0.0054 ........ 0.0079 ...... 0.0062 ........ 0.0050
8-bit uint #2 ........ 0.0051 ........ 0.0079 ...... 0.0064 ........ 0.0044
8-bit uint #3 ........ 0.0051 ........ 0.0082 ...... 0.0062 ........ 0.0044
16-bit uint #1 ....... 0.0077 ........ 0.0094 ...... 0.0065 ........ 0.0045
16-bit uint #2 ....... 0.0077 ........ 0.0094 ...... 0.0063 ........ 0.0045
16-bit uint #3 ....... 0.0077 ........ 0.0095 ...... 0.0064 ........ 0.0047
32-bit uint #1 ....... 0.0088 ........ 0.0119 ...... 0.0063 ........ 0.0043
32-bit uint #2 ....... 0.0089 ........ 0.0117 ...... 0.0062 ........ 0.0039
32-bit uint #3 ....... 0.0089 ........ 0.0118 ...... 0.0063 ........ 0.0044
64-bit uint #1 ....... 0.0097 ........ 0.0155 ...... 0.0063 ........ 0.0045
64-bit uint #2 ....... 0.0095 ........ 0.0153 ...... 0.0061 ........ 0.0045
64-bit uint #3 ....... 0.0096 ........ 0.0156 ...... 0.0063 ........ 0.0047
8-bit int #1 ......... 0.0053 ........ 0.0083 ...... 0.0062 ........ 0.0044
8-bit int #2 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0044
8-bit int #3 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0043
16-bit int #1 ........ 0.0089 ........ 0.0097 ...... 0.0069 ........ 0.0046
16-bit int #2 ........ 0.0075 ........ 0.0093 ...... 0.0063 ........ 0.0043
16-bit int #3 ........ 0.0075 ........ 0.0094 ...... 0.0062 ........ 0.0046
32-bit int #1 ........ 0.0086 ........ 0.0122 ...... 0.0063 ........ 0.0044
32-bit int #2 ........ 0.0087 ........ 0.0120 ...... 0.0066 ........ 0.0046
32-bit int #3 ........ 0.0086 ........ 0.0121 ...... 0.0060 ........ 0.0044
64-bit int #1 ........ 0.0096 ........ 0.0149 ...... 0.0060 ........ 0.0045
64-bit int #2 ........ 0.0096 ........ 0.0157 ...... 0.0062 ........ 0.0044
64-bit int #3 ........ 0.0096 ........ 0.0160 ...... 0.0063 ........ 0.0046
64-bit int #4 ........ 0.0097 ........ 0.0157 ...... 0.0061 ........ 0.0044
64-bit float #1 ...... 0.0079 ........ 0.0153 ...... 0.0056 ........ 0.0044
64-bit float #2 ...... 0.0079 ........ 0.0152 ...... 0.0057 ........ 0.0045
64-bit float #3 ...... 0.0079 ........ 0.0155 ...... 0.0057 ........ 0.0044
fix string #1 ........ 0.0010 ........ 0.0045 ...... 0.0071 ........ 0.0044
fix string #2 ........ 0.0048 ........ 0.0075 ...... 0.0070 ........ 0.0060
fix string #3 ........ 0.0048 ........ 0.0086 ...... 0.0068 ........ 0.0060
fix string #4 ........ 0.0050 ........ 0.0088 ...... 0.0070 ........ 0.0059
8-bit string #1 ...... 0.0081 ........ 0.0129 ...... 0.0069 ........ 0.0062
8-bit string #2 ...... 0.0086 ........ 0.0128 ...... 0.0069 ........ 0.0065
8-bit string #3 ...... 0.0086 ........ 0.0126 ...... 0.0115 ........ 0.0065
16-bit string #1 ..... 0.0105 ........ 0.0137 ...... 0.0128 ........ 0.0068
16-bit string #2 ..... 0.1510 ........ 0.1486 ...... 0.1526 ........ 0.1391
32-bit string ........ 0.1517 ........ 0.1475 ...... 0.1504 ........ 0.1370
wide char string #1 .. 0.0044 ........ 0.0085 ...... 0.0067 ........ 0.0057
wide char string #2 .. 0.0081 ........ 0.0125 ...... 0.0069 ........ 0.0063
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0014 ........ 0.0059 ...... 0.0132 ........ 0.0055
fix array #2 ......... 0.0146 ........ 0.0156 ...... 0.0155 ........ 0.0148
fix array #3 ......... 0.0211 ........ 0.0229 ...... 0.0179 ........ 0.0180
16-bit array #1 ...... 0.0673 ........ 0.0498 ...... 0.0343 ........ 0.0388
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0148 ........ 0.0180 ...... 0.0156 ........ 0.0179
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0252 ........ 0.0201 ...... 0.0214 ........ 0.0167
16-bit map #1 ........ 0.1027 ........ 0.0836 ...... 0.0388 ........ 0.0510
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.1104 ........ 0.1010 ...... 0.0556 ........ 0.0602
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  0.9642          1.0909        0.8224          0.7213
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7

Note that the msgpack extension (v2.1.2) doesn't support ext, bin and UTF-8 str types.

License

The library is released under the MIT License. See the bundled LICENSE file for details.

Author: rybakit
Source Code: https://github.com/rybakit/msgpack.php
License: MIT License

#php 

Lawrence  Lesch

Lawrence Lesch

1677668905

TS-mockito: Mocking Library for TypeScript

TS-mockito

Mocking library for TypeScript inspired by http://mockito.org/

1.x to 2.x migration guide

1.x to 2.x migration guide

Main features

  • Strongly typed
  • IDE autocomplete
  • Mock creation (mock) (also abstract classes) #example
  • Spying on real objects (spy) #example
  • Changing mock behavior (when) via:
  • Checking if methods were called with given arguments (verify)
    • anything, notNull, anyString, anyOfClass etc. - for more flexible comparision
    • once, twice, times, atLeast etc. - allows call count verification #example
    • calledBefore, calledAfter - allows call order verification #example
  • Resetting mock (reset, resetCalls) #example, #example
  • Capturing arguments passed to method (capture) #example
  • Recording multiple behaviors #example
  • Readable error messages (ex. 'Expected "convertNumberToString(strictEqual(3))" to be called 2 time(s). But has been called 1 time(s).')

Installation

npm install ts-mockito --save-dev

Usage

Basics

// Creating mock
let mockedFoo:Foo = mock(Foo);

// Getting instance from mock
let foo:Foo = instance(mockedFoo);

// Using instance in source code
foo.getBar(3);
foo.getBar(5);

// Explicit, readable verification
verify(mockedFoo.getBar(3)).called();
verify(mockedFoo.getBar(anything())).called();

Stubbing method calls

// Creating mock
let mockedFoo:Foo = mock(Foo);

// stub method before execution
when(mockedFoo.getBar(3)).thenReturn('three');

// Getting instance
let foo:Foo = instance(mockedFoo);

// prints three
console.log(foo.getBar(3));

// prints null, because "getBar(999)" was not stubbed
console.log(foo.getBar(999));

Stubbing getter value

// Creating mock
let mockedFoo:Foo = mock(Foo);

// stub getter before execution
when(mockedFoo.sampleGetter).thenReturn('three');

// Getting instance
let foo:Foo = instance(mockedFoo);

// prints three
console.log(foo.sampleGetter);

Stubbing property values that have no getters

Syntax is the same as with getter values.

Please note, that stubbing properties that don't have getters only works if Proxy object is available (ES6).

Call count verification

// Creating mock
let mockedFoo:Foo = mock(Foo);

// Getting instance
let foo:Foo = instance(mockedFoo);

// Some calls
foo.getBar(1);
foo.getBar(2);
foo.getBar(2);
foo.getBar(3);

// Call count verification
verify(mockedFoo.getBar(1)).once();               // was called with arg === 1 only once
verify(mockedFoo.getBar(2)).twice();              // was called with arg === 2 exactly two times
verify(mockedFoo.getBar(between(2, 3))).thrice(); // was called with arg between 2-3 exactly three times
verify(mockedFoo.getBar(anyNumber()).times(4);    // was called with any number arg exactly four times
verify(mockedFoo.getBar(2)).atLeast(2);           // was called with arg === 2 min two times
verify(mockedFoo.getBar(anything())).atMost(4);   // was called with any argument max four times
verify(mockedFoo.getBar(4)).never();              // was never called with arg === 4

Call order verification

// Creating mock
let mockedFoo:Foo = mock(Foo);
let mockedBar:Bar = mock(Bar);

// Getting instance
let foo:Foo = instance(mockedFoo);
let bar:Bar = instance(mockedBar);

// Some calls
foo.getBar(1);
bar.getFoo(2);

// Call order verification
verify(mockedFoo.getBar(1)).calledBefore(mockedBar.getFoo(2));    // foo.getBar(1) has been called before bar.getFoo(2)
verify(mockedBar.getFoo(2)).calledAfter(mockedFoo.getBar(1));    // bar.getFoo(2) has been called before foo.getBar(1)
verify(mockedFoo.getBar(1)).calledBefore(mockedBar.getFoo(999999));    // throws error (mockedBar.getFoo(999999) has never been called)

Throwing errors

let mockedFoo:Foo = mock(Foo);

when(mockedFoo.getBar(10)).thenThrow(new Error('fatal error'));

let foo:Foo = instance(mockedFoo);
try {
    foo.getBar(10);
} catch (error:Error) {
    console.log(error.message); // 'fatal error'
}

Custom function

You can also stub method with your own implementation

let mockedFoo:Foo = mock(Foo);
let foo:Foo = instance(mockedFoo);

when(mockedFoo.sumTwoNumbers(anyNumber(), anyNumber())).thenCall((arg1:number, arg2:number) => {
    return arg1 * arg2; 
});

// prints '50' because we've changed sum method implementation to multiply!
console.log(foo.sumTwoNumbers(5, 10));

Resolving / rejecting promises

You can also stub method to resolve / reject promise

let mockedFoo:Foo = mock(Foo);

when(mockedFoo.fetchData("a")).thenResolve({id: "a", value: "Hello world"});
when(mockedFoo.fetchData("b")).thenReject(new Error("b does not exist"));

Resetting mock calls

You can reset just mock call counter

// Creating mock
let mockedFoo:Foo = mock(Foo);

// Getting instance
let foo:Foo = instance(mockedFoo);

// Some calls
foo.getBar(1);
foo.getBar(1);
verify(mockedFoo.getBar(1)).twice();      // getBar with arg "1" has been called twice

// Reset mock
resetCalls(mockedFoo);

// Call count verification
verify(mockedFoo.getBar(1)).never();      // has never been called after reset

You can also reset calls of multiple mocks at once resetCalls(firstMock, secondMock, thirdMock)

Resetting mock

Or reset mock call counter with all stubs

// Creating mock
let mockedFoo:Foo = mock(Foo);
when(mockedFoo.getBar(1)).thenReturn("one").

// Getting instance
let foo:Foo = instance(mockedFoo);

// Some calls
console.log(foo.getBar(1));               // "one" - as defined in stub
console.log(foo.getBar(1));               // "one" - as defined in stub
verify(mockedFoo.getBar(1)).twice();      // getBar with arg "1" has been called twice

// Reset mock
reset(mockedFoo);

// Call count verification
verify(mockedFoo.getBar(1)).never();      // has never been called after reset
console.log(foo.getBar(1));               // null - previously added stub has been removed

You can also reset multiple mocks at once reset(firstMock, secondMock, thirdMock)

Capturing method arguments

let mockedFoo:Foo = mock(Foo);
let foo:Foo = instance(mockedFoo);

// Call method
foo.sumTwoNumbers(1, 2);

// Check first arg captor values
const [firstArg, secondArg] = capture(mockedFoo.sumTwoNumbers).last();
console.log(firstArg);    // prints 1
console.log(secondArg);    // prints 2

You can also get other calls using first(), second(), byCallIndex(3) and more...

Recording multiple behaviors

You can set multiple returning values for same matching values

const mockedFoo:Foo = mock(Foo);

when(mockedFoo.getBar(anyNumber())).thenReturn('one').thenReturn('two').thenReturn('three');

const foo:Foo = instance(mockedFoo);

console.log(foo.getBar(1));    // one
console.log(foo.getBar(1));    // two
console.log(foo.getBar(1));    // three
console.log(foo.getBar(1));    // three - last defined behavior will be repeated infinitely

Another example with specific values

let mockedFoo:Foo = mock(Foo);

when(mockedFoo.getBar(1)).thenReturn('one').thenReturn('another one');
when(mockedFoo.getBar(2)).thenReturn('two');

let foo:Foo = instance(mockedFoo);

console.log(foo.getBar(1));    // one
console.log(foo.getBar(2));    // two
console.log(foo.getBar(1));    // another one
console.log(foo.getBar(1));    // another one - this is last defined behavior for arg '1' so it will be repeated
console.log(foo.getBar(2));    // two
console.log(foo.getBar(2));    // two - this is last defined behavior for arg '2' so it will be repeated

Short notation:

const mockedFoo:Foo = mock(Foo);

// You can specify return values as multiple thenReturn args
when(mockedFoo.getBar(anyNumber())).thenReturn('one', 'two', 'three');

const foo:Foo = instance(mockedFoo);

console.log(foo.getBar(1));    // one
console.log(foo.getBar(1));    // two
console.log(foo.getBar(1));    // three
console.log(foo.getBar(1));    // three - last defined behavior will be repeated infinity

Possible errors:

const mockedFoo:Foo = mock(Foo);

// When multiple matchers, matches same result:
when(mockedFoo.getBar(anyNumber())).thenReturn('one');
when(mockedFoo.getBar(3)).thenReturn('one');

const foo:Foo = instance(mockedFoo);
foo.getBar(3); // MultipleMatchersMatchSameStubError will be thrown, two matchers match same method call

Mocking interfaces

You can mock interfaces too, just instead of passing type to mock function, set mock function generic type Mocking interfaces requires Proxy implementation

let mockedFoo:Foo = mock<FooInterface>(); // instead of mock(FooInterface)
const foo: SampleGeneric<FooInterface> = instance(mockedFoo);

Mocking types

You can mock abstract classes

const mockedFoo: SampleAbstractClass = mock(SampleAbstractClass);
const foo: SampleAbstractClass = instance(mockedFoo);

You can also mock generic classes, but note that generic type is just needed by mock type definition

const mockedFoo: SampleGeneric<SampleInterface> = mock(SampleGeneric);
const foo: SampleGeneric<SampleInterface> = instance(mockedFoo);

Spying on real objects

You can partially mock an existing instance:

const foo: Foo = new Foo();
const spiedFoo = spy(foo);

when(spiedFoo.getBar(3)).thenReturn('one');

console.log(foo.getBar(3)); // 'one'
console.log(foo.getBaz()); // call to a real method

You can spy on plain objects too:

const foo = { bar: () => 42 };
const spiedFoo = spy(foo);

foo.bar();

console.log(capture(spiedFoo.bar).last()); // [42] 

Thanks


Download Details:

Author: NagRock
Source Code: https://github.com/NagRock/ts-mockito 
License: MIT license

#typescript #testing #mock 

Shardul Bhatt

Shardul Bhatt

1626775355

Why use Python for Software Development

No programming language is pretty much as diverse as Python. It enables building cutting edge applications effortlessly. Developers are as yet investigating the full capability of end-to-end Python development services in various areas. 

By areas, we mean FinTech, HealthTech, InsureTech, Cybersecurity, and that's just the beginning. These are New Economy areas, and Python has the ability to serve every one of them. The vast majority of them require massive computational abilities. Python's code is dynamic and powerful - equipped for taking care of the heavy traffic and substantial algorithmic capacities. 

Programming advancement is multidimensional today. Endeavor programming requires an intelligent application with AI and ML capacities. Shopper based applications require information examination to convey a superior client experience. Netflix, Trello, and Amazon are genuine instances of such applications. Python assists with building them effortlessly. 

5 Reasons to Utilize Python for Programming Web Apps 

Python can do such numerous things that developers can't discover enough reasons to admire it. Python application development isn't restricted to web and enterprise applications. It is exceptionally adaptable and superb for a wide range of uses.

Robust frameworks 

Python is known for its tools and frameworks. There's a structure for everything. Django is helpful for building web applications, venture applications, logical applications, and mathematical processing. Flask is another web improvement framework with no conditions. 

Web2Py, CherryPy, and Falcon offer incredible capabilities to customize Python development services. A large portion of them are open-source frameworks that allow quick turn of events. 

Simple to read and compose 

Python has an improved sentence structure - one that is like the English language. New engineers for Python can undoubtedly understand where they stand in the development process. The simplicity of composing allows quick application building. 

The motivation behind building Python, as said by its maker Guido Van Rossum, was to empower even beginner engineers to comprehend the programming language. The simple coding likewise permits developers to roll out speedy improvements without getting confused by pointless subtleties. 

Utilized by the best 

Alright - Python isn't simply one more programming language. It should have something, which is the reason the business giants use it. Furthermore, that too for different purposes. Developers at Google use Python to assemble framework organization systems, parallel information pusher, code audit, testing and QA, and substantially more. Netflix utilizes Python web development services for its recommendation algorithm and media player. 

Massive community support 

Python has a steadily developing community that offers enormous help. From amateurs to specialists, there's everybody. There are a lot of instructional exercises, documentation, and guides accessible for Python web development solutions. 

Today, numerous universities start with Python, adding to the quantity of individuals in the community. Frequently, Python designers team up on various tasks and help each other with algorithmic, utilitarian, and application critical thinking. 

Progressive applications 

Python is the greatest supporter of data science, Machine Learning, and Artificial Intelligence at any enterprise software development company. Its utilization cases in cutting edge applications are the most compelling motivation for its prosperity. Python is the second most well known tool after R for data analytics.

The simplicity of getting sorted out, overseeing, and visualizing information through unique libraries makes it ideal for data based applications. TensorFlow for neural networks and OpenCV for computer vision are two of Python's most well known use cases for Machine learning applications.

Summary

Thinking about the advances in programming and innovation, Python is a YES for an assorted scope of utilizations. Game development, web application development services, GUI advancement, ML and AI improvement, Enterprise and customer applications - every one of them uses Python to its full potential. 

The disadvantages of Python web improvement arrangements are regularly disregarded by developers and organizations because of the advantages it gives. They focus on quality over speed and performance over blunders. That is the reason it's a good idea to utilize Python for building the applications of the future.

#python development services #python development company #python app development #python development #python in web development #python software development

Garry Taylor

Garry Taylor

1669952228

Dijkstra's Algorithm Explained with Examples

In this tutorial, you'll learn: What is Dijkstra's Algorithm and how Dijkstra's algorithm works with the help of visual guides.

You can use algorithms in programming to solve specific problems through a set of precise instructions or procedures.

Dijkstra's algorithm is one of many graph algorithms you'll come across. It is used to find the shortest path from a fixed node to all other nodes in a graph.

There are different representations of Dijkstra's algorithm. You can either find the shortest path between two nodes, or the shortest path from a fixed node to the rest of the nodes in a graph.

In this article, you'll learn how Dijkstra's algorithm works with the help of visual guides.

How Does Dijkstra’s Algorithm Work?

Before we dive into more detailed visual examples, you need to understand how Dijkstra's algorithm works.

Although the theoretical explanation may seem a bit abstract, it'll help you understand the practical aspect better.

In a given graph containing different nodes, we are required to get the shortest path from a given node to the rest of the nodes.

These nodes can represent any object like the names of cities, letters, and so on.

Between each node is a number denoting the distance between two nodes, as you can see in the image below:

nodes-1

We usually work with two arrays – one for visited nodes, and another for unvisited nodes. You'll learn more about the arrays in the next section.

When a node is visited, the algorithm calculates how long it took to get to the node and stores the distance. If a shorter path to a node is found, the initial value assigned for the distance is updated.

Note that a node cannot be visited twice.

The algorithm runs recursively until all the nodes have been visited.

Dijkstra's Algorithm Example

In this section, we'll take a look at a practical example that shows how Dijkstra's algorithm works.

Here's the graph we'll be working with:

nodes

We'll use the table below to put down the visited nodes and their distance from the fixed node:

NODESHORTEST DISTANCE FROM FIXED NODE
A
B
C
D
E

Visited nodes = []
Unvisited nodes = [A,B,C,D,E]

Above, we have a table showing each node and the shortest distance from the that node to the fixed node. We are yet to choose the fixed node.

Note that the distance for each node in the table is currently denoted as infinity (∞). This is because we don't know the shortest distance yet.

We also have two arrays – visited and unvisited. Whenever a node is visited, it is added to the visited nodes array.

Let's get started!

To simplify things, I'll break the process down into iterations. You'll see what happens in each step with the aid of diagrams.

Iteration #1

The first iteration might seem confusing, but that's totally fine. Once we start repeating the process in each iteration, you'll have a clearer picture of how the algorithm works.

Step #1 - Pick an unvisited node

We'll choose A as the fixed node. So we'll find the shortest distance from A to every other node in the graph.

node1-1

We're going to give A a distance of 0 because it is the initial node. So the table would look like this:

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B
C
D
E

Step #2 - Find the distance from current nodenode1a-3

The next thing to do after choosing a node is to find the distance from it to the unvisited nodes around it.

The two unvisited nodes directly linked to A are B and C.

To get the distance from A to B:

0 + 4 = 4

0 being the value of the current node (A), and 4 being the distance between A and B in the graph.

To get the distance from A to C:

0 + 2 = 2

Step #3 - Update table with known distances

In the last step, we got 4 and 2 as the values of B and C respectively. So we'll update the table with those values:

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B4
C2
D
E

Step #4 - Update arrays

At this point, the first iteration is complete. We'll move node A to the visited nodes array:

Visited nodes = [A]
Unvisited nodes = [B,C,D,E]

Before we proceed to the next iteration, you should know the following:

  • Once a node has been visited, it cannot be linked to the current node. Refer to step #2 in the iteration above and step #2 in the next iteration.
  • A node cannot be visited twice.
  • You can only update the shortest known distance if you get a value smaller than the recorded distance.

Iteration #2

Step #1 - Pick an unvisited node

We have four unvisited nodes — [B,C,D,E]. So how do you know which node to pick for the next iteration?

Well, we pick the node with the smallest known distance recorded in the table. Here's the table:

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B4
C2
D
E

So we're going with node C.

node2-2

Step #2 - Find the distance from current node

To find the distance from the current node to the fixed node, we have to consider the nodes linked to the current node.

The nodes linked to the current node are A and B.

But A has been visited in the previous iteration so it will not be linked to the current node. That is:

node2a-1

From the diagram above,

  • The green color denotes the current node.
  • The blue color denotes the visited nodes. We cannot link to them or visit them again.
  • The red color shows the link from the unvisited nodes to the current node.

To find the distance from C to B:

2 + 1 = 3

2 above is recorded distance for node C while 1 is the distance between C and B in the graph.

Step #3 - Update table with known distances

In the last step, we got the value of B to be 3. In the first iteration, it was 4.

We're going to update the distance in the table to 3.

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B3
C2
D
E

So, A --> B = 4 (First iteration).

A --> C --> B = 3 (Second iteration).

The algorithm has helped us find the shortest path to B from A.

Step #4 - Update arrays

We're done with the last visited node. Let's add it to the visited nodes array:

Visited nodes = [A,C]
Unvisited nodes = [B,D,E]

Iteration #3

Step #1 - Pick an unvisited node

We're down to three unvisited nodes — [B,D,E]. From the array, B has the shortest known distance.

node3-2

To restate what is going on in the diagram above:

  • The green color denotes the current node.
  • The blue color denotes the visited nodes. We cannot link to them or visit them again.
  • The red color shows the link from the unvisited nodes to the current node.

Step #2 - Find the distance from current node

The nodes linked to the current node are D and E.

B (the current node) has a value of 3. Therefore,

For node D, 3 + 3 = 6.

For node E, 3 + 2 = 5.

Step #3 - Update table with known distances

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B3
C2
D6
E5

Step #4 - Update arrays

Visited nodes = [A,C,B]
Unvisited nodes = [D,E]

Iteration #4

Step #1 - Pick an unvisited node

Like other iterations, we'll go with the unvisited node with the shortest known distance. That is E.

node4-1

Step #2 - Find the distance from current node

According to our table, E has a value of 5.

For D in the current iteration,

5 + 5 = 10.

The value gotten for D here is 10, which is greater than the recorded value of 6 in the previous iteration. For this reason, we'll not update the table.

Step #3 - Update table with known distances

Our table remains the same:

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B3
C2
D6
E5

Step #4 - Update arrays

Visited nodes = [A,C,B,E]
Unvisited nodes = [D]

Iteration #5

Step #1 - Pick an unvisited node

We're currently left with one node in the unvisited array — D.

node5-1

Step #2 - Find the distance from current node

The algorithm has gotten to the last iteration. This is because all nodes linked to the current node have been visited already so we can't link to them.

Step #3 - Update table with known distances

Our table remains the same:

NODESHORTEST DISTANCE FROM FIXED NODE
A0
B3
C2
D6
E5

At this point, we have updated the table with the shortest distance from the fixed node to every other node in the graph.

Step #4 - Update arrays

Visited nodes = [A,C,B,E,D]
Unvisited nodes = []

As can be seen above, we have no nodes left to visit. Using Dijkstra's algorithm, we've found the shortest distance from the fixed node to others nodes in the graph.

Dijkstra's Algorithm Pseudocode Example

The pseudocode example in this section was gotten from Wikipedia. Here it is:

 1  function Dijkstra(Graph, source):
 2      
 3      for each vertex v in Graph.Vertices:
 4          dist[v] ← INFINITY
 5          prev[v] ← UNDEFINED
 6          add v to Q
 7      dist[source] ← 0
 8      
 9      while Q is not empty:
10          u ← vertex in Q with min dist[u]
11          remove u from Q
12          
13          for each neighbor v of u still in Q:
14              alt ← dist[u] + Graph.Edges(u, v)
15              if alt < dist[v]:
16                  dist[v] ← alt
17                  prev[v] ← u
18
19      return dist[], prev[]

Applications of Dijkstra's Algorithm

Here are some of the common applications of Dijkstra's algorithm:

  • In maps to get the shortest distance between locations. An example is Google Maps.
  • In telecommunications to determine transmission rate.
  • In robotic design to determine shortest path for automated robots.

Summary

In this article, we talked about Dijkstra's algorithm. It is used to find the shortest distance from a fixed node to all other nodes in a graph.

We started by giving a brief summary of how the algorithm works.

We then had a look at an example that further explained Dijkstra's algorithm in steps using visual guides.

We concluded with a pseudocode example and some of the applications of Dijkstra's algorithm.

Happy coding!

Original article source at https://www.freecodecamp.org

#algorithm #datastructures

Chatgpt-api: Node.js client for the official ChatGPT API

ChatGPT API

Node.js client for the official ChatGPT API.

Intro

This package is a Node.js wrapper around ChatGPT by OpenAI. TS batteries included. ✨

Example usage

Updates

March 1, 2023

The official OpenAI chat completions API has been released, and it is now the default for this package! 🔥

MethodFree?Robust?Quality?
ChatGPTAPI❌ No✅ Yes✅️ Real ChatGPT models
ChatGPTUnofficialProxyAPI✅ Yes☑️ Maybe✅ Real ChatGPT

Note: We strongly recommend using ChatGPTAPI since it uses the officially supported API from OpenAI. We may remove support for ChatGPTUnofficialProxyAPI in a future release.

  1. ChatGPTAPI - Uses the gpt-3.5-turbo-0301 model with the official OpenAI chat completions API (official, robust approach, but it's not free)
  2. ChatGPTUnofficialProxyAPI - Uses an unofficial proxy server to access ChatGPT's backend API in a way that circumvents Cloudflare (uses the real ChatGPT and is pretty lightweight, but relies on a third-party server and is rate-limited)

CLI

To run the CLI, you'll need an OpenAI API key:

export OPENAI_API_KEY="sk-TODO"
npx chatgpt "your prompt here"

By default, the response is streamed to stdout, the results are stored in a local config file, and every invocation starts a new conversation. You can use -c to continue the previous conversation and --no-stream to disable streaming.

Under the hood, the CLI uses ChatGPTAPI with text-davinci-003 to mimic ChatGPT.

Usage:
  $ chatgpt <prompt>

Commands:
  <prompt>  Ask ChatGPT a question
  rm-cache  Clears the local message cache
  ls-cache  Prints the local message cache path

For more info, run any command with the `--help` flag:
  $ chatgpt --help
  $ chatgpt rm-cache --help
  $ chatgpt ls-cache --help

Options:
  -c, --continue          Continue last conversation (default: false)
  -d, --debug             Enables debug logging (default: false)
  -s, --stream            Streams the response (default: true)
  -s, --store             Enables the local message cache (default: true)
  -t, --timeout           Timeout in milliseconds
  -k, --apiKey            OpenAI API key
  -n, --conversationName  Unique name for the conversation
  -h, --help              Display this message
  -v, --version           Display version number

Install

npm install chatgpt

Make sure you're using node >= 18 so fetch is available (or node >= 14 if you install a fetch polyfill).

Usage

To use this module from Node.js, you need to pick between two methods:

MethodFree?Robust?Quality?
ChatGPTAPI❌ No✅ Yes✅️ Real ChatGPT models
ChatGPTUnofficialProxyAPI✅ Yes☑️ Maybe✅ Real ChatGPT

ChatGPTAPI - Uses the gpt-3.5-turbo-0301 model with the official OpenAI chat completions API (official, robust approach, but it's not free). You can override the model, completion params, and system message to fully customize your assistant.

ChatGPTUnofficialProxyAPI - Uses an unofficial proxy server to access ChatGPT's backend API in a way that circumvents Cloudflare (uses the real ChatGPT and is pretty lightweight, but relies on a third-party server and is rate-limited)

Both approaches have very similar APIs, so it should be simple to swap between them.

Note: We strongly recommend using ChatGPTAPI since it uses the officially supported API from OpenAI. We may remove support for ChatGPTUnofficialProxyAPI in a future release.

Usage - ChatGPTAPI

Sign up for an OpenAI API key and store it in your environment.

import { ChatGPTAPI } from 'chatgpt'

async function example() {
  const api = new ChatGPTAPI({
    apiKey: process.env.OPENAI_API_KEY
  })

  const res = await api.sendMessage('Hello World!')
  console.log(res.text)
}

You can override the default model (gpt-3.5-turbo-0301) and any OpenAI chat completion params using completionParams:

const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY,
  completionParams: {
    temperature: 0.5,
    top_p: 0.8
  }
})

If you want to track the conversation, you'll need to pass the parentMessageId like this:

const api = new ChatGPTAPI({ apiKey: process.env.OPENAI_API_KEY })

// send a message and wait for the response
let res = await api.sendMessage('What is OpenAI?')
console.log(res.text)

// send a follow-up
res = await api.sendMessage('Can you expand on that?', {
  parentMessageId: res.id
})
console.log(res.text)

// send another follow-up
res = await api.sendMessage('What were we talking about?', {
  parentMessageId: res.id
})
console.log(res.text)

You can add streaming via the onProgress handler:

const res = await api.sendMessage('Write a 500 word essay on frogs.', {
  // print the partial response as the AI is "typing"
  onProgress: (partialResponse) => console.log(partialResponse.text)
})

// print the full text at the end
console.log(res.text)

You can add a timeout using the timeoutMs option:

// timeout after 2 minutes (which will also abort the underlying HTTP request)
const response = await api.sendMessage(
  'write me a really really long essay on frogs',
  {
    timeoutMs: 2 * 60 * 1000
  }
)

If you want to see more info about what's actually being sent to OpenAI's chat completions API, set the debug: true option in the ChatGPTAPI constructor:

const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY,
  debug: true
})

We default to a basic systemMessage. You can override this in either the ChatGPTAPI constructor or sendMessage:

const res = await api.sendMessage('what is the answer to the universe?', {
  systemMessage: `You are ChatGPT, a large language model trained by OpenAI. You answer as concisely as possible for each responseIf you are generating a list, do not have too many items.
Current date: ${new Date().toISOString()}\n\n`
})

Note that we automatically handle appending the previous messages to the prompt and attempt to optimize for the available tokens (which defaults to 4096).

Usage in CommonJS (Dynamic import)

async function example() {
  // To use ESM in CommonJS, you can use a dynamic import
  const { ChatGPTAPI } = await import('chatgpt')

  const api = new ChatGPTAPI({ apiKey: process.env.OPENAI_API_KEY })

  const res = await api.sendMessage('Hello World!')
  console.log(res.text)
}

Usage - ChatGPTUnofficialProxyAPI

The API for ChatGPTUnofficialProxyAPI is almost exactly the same. You just need to provide a ChatGPT accessToken instead of an OpenAI API key.

import { ChatGPTUnofficialProxyAPI } from 'chatgpt'

async function example() {
  const api = new ChatGPTUnofficialProxyAPI({
    accessToken: process.env.OPENAI_ACCESS_TOKEN
  })

  const res = await api.sendMessage('Hello World!')
  console.log(res.text)
}

See demos/demo-reverse-proxy for a full example:

npx tsx demos/demo-reverse-proxy.ts

ChatGPTUnofficialProxyAPI messages also contain a conversationid in addition to parentMessageId, since the ChatGPT webapp can't reference messages across

Reverse Proxy

You can override the reverse proxy by passing apiReverseProxyUrl:

const api = new ChatGPTUnofficialProxyAPI({
  accessToken: process.env.OPENAI_ACCESS_TOKEN,
  apiReverseProxyUrl: 'https://your-example-server.com/api/conversation'
})

Known reverse proxies run by community members include:

Reverse Proxy URLAuthorRate LimitsLast Checked
https://chat.duti.tech/api/conversation@acheong08120 req/min by IP2/19/2023
https://gpt.pawan.krd/backend-api/conversation@PawanOsman?2/19/2023

Note: info on how the reverse proxies work is not being published at this time in order to prevent OpenAI from disabling access.

Access Token

To use ChatGPTUnofficialProxyAPI, you'll need an OpenAI access token from the ChatGPT webapp. To do this, you can use any of the following methods which take an email and password and return an access token:

These libraries work with email + password accounts (e.g., they do not support accounts where you auth via Microsoft / Google).

Alternatively, you can manually get an accessToken by logging in to the ChatGPT webapp and then opening https://chat.openai.com/api/auth/session, which will return a JSON object containing your accessToken string.

Access tokens last for days.

Note: using a reverse proxy will expose your access token to a third-party. There shouldn't be any adverse effects possible from this, but please consider the risks before using this method.

Docs

See the auto-generated docs for more info on methods and parameters.

Demos

Most of the demos use ChatGPTAPI. It should be pretty easy to convert them to use ChatGPTUnofficialProxyAPI if you'd rather use that approach. The only thing that needs to change is how you initialize the api with an accessToken instead of an apiKey.

To run the included demos:

  1. clone repo
  2. install node deps
  3. set OPENAI_API_KEY in .env

A basic demo is included for testing purposes:

npx tsx demos/demo.ts

A demo showing on progress handler:

npx tsx demos/demo-on-progress.ts

The on progress demo uses the optional onProgress parameter to sendMessage to receive intermediary results as ChatGPT is "typing".

A conversation demo:

npx tsx demos/demo-conversation.ts

A persistence demo shows how to store messages in Redis for persistence:

npx tsx demos/demo-persistence.ts

Any keyv adaptor is supported for persistence, and there are overrides if you'd like to use a different way of storing / retrieving messages.

Note that persisting message is required for remembering the context of previous conversations beyond the scope of the current Node.js process, since by default, we only store messages in memory. Here's an external demo of using a completely custom database solution to persist messages.

Note: Persistence is handled automatically when using ChatGPTUnofficialProxyAPI because it is connecting indirectly to ChatGPT.

Projects

All of these awesome projects are built using the chatgpt package. 🤯

If you create a cool integration, feel free to open a PR and add it to the list.

Compatibility

  • This package is ESM-only.
  • This package supports node >= 14.
  • This module assumes that fetch is installed.
    • In node >= 18, it's installed by default.
    • In node < 18, you need to install a polyfill like unfetch/polyfill (guide) or isomorphic-fetch (guide).
  • If you want to build a website using chatgpt, we recommend using it only from your backend API

Credits


Previous Updates

Feb 19, 2023
 

We now provide three ways of accessing the unofficial ChatGPT API, all of which have tradeoffs:

MethodFree?Robust?Quality?
ChatGPTAPI❌ No✅ Yes☑️ Mimics ChatGPT
ChatGPTUnofficialProxyAPI✅ Yes☑️ Maybe✅ Real ChatGPT
ChatGPTAPIBrowser (v3)✅ Yes❌ No✅ Real ChatGPT

Note: I recommend that you use either ChatGPTAPI or ChatGPTUnofficialProxyAPI.

  1. ChatGPTAPI - Uses text-davinci-003 to mimic ChatGPT via the official OpenAI completions API (most robust approach, but it's not free and doesn't use a model fine-tuned for chat)
  2. ChatGPTUnofficialProxyAPI - Uses an unofficial proxy server to access ChatGPT's backend API in a way that circumvents Cloudflare (uses the real ChatGPT and is pretty lightweight, but relies on a third-party server and is rate-limited)
  3. ChatGPTAPIBrowser - (deprecated; v3.5.1 of this package) Uses Puppeteer to access the official ChatGPT webapp (uses the real ChatGPT, but very flaky, heavyweight, and error prone)

Feb 5, 2023
 

OpenAI has disabled the leaked chat model we were previously using, so we're now defaulting to text-davinci-003, which is not free.

We've found several other hidden, fine-tuned chat models, but OpenAI keeps disabling them, so we're searching for alternative workarounds.

Feb 1, 2023
 

This package no longer requires any browser hacks – it is now using the official OpenAI completions API with a leaked model that ChatGPT uses under the hood. 🔥

import { ChatGPTAPI } from 'chatgpt'

const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY
})

const res = await api.sendMessage('Hello World!')
console.log(res.text)

Please upgrade to chatgpt@latest (at least v4.0.0). The updated version is significantly more lightweight and robust compared with previous versions. You also don't have to worry about IP issues or rate limiting.

Huge shoutout to @waylaidwanderer for discovering the leaked chat model!

If you run into any issues, we do have a pretty active Discord with a bunch of ChatGPT hackers from the Node.js & Python communities.

Lastly, please consider starring this repo and following me on twitter twitter to help support the project.

Thanks && cheers, Travis


Download Details:

Author: Transitive-bullshit
Source Code: https://github.com/transitive-bullshit/chatgpt-api 
License: MIT license

#chatgpt #api #node #AI #openai #chatbot