1606059664
PyTorch Basics and Gradient Descent | Deep Learning with PyTorch: Zero to GANs | Part 1 of 6
“Deep Learning with PyTorch: Zero to GANs” is a beginner-friendly online course offering a practical and coding-focused introduction to deep learning using the PyTorch framework. Learn more and register for a certificate of accomplishment here: http://zerotogans.com/
Code and Resources:
PyTorch basics: https://jovian.ai/aakashns/01-pytorch-basics
Linear regression: https://jovian.ai/aakashns/02-linear-regression
Machine learning: https://jovian.ai/aakashns/machine-learning-intro
Discussion forum: https://jovian.ai/forum/c/pytorch-zero-to-gans/lecture-1-pytorch-basics-linear-regression/63
Topics covered in this video:
Introduction to machine learning and Jupyter notebooks
PyTorch basics: tensors, gradients, and autograd
Linear regression & gradient descent from scratch
Using PyTorch modules: nn.Linear & nn.functional
This course is taught by Aakash N S, co-founder & CEO of Jovian - a data science platform and global community.
#deep-learning #pytorch #python #artificial-intelligence #machine-learning
1653475560
msgpack.php
A pure PHP implementation of the MessagePack serialization format.
The recommended way to install the library is through Composer:
composer require rybakit/msgpack
To pack values you can either use an instance of a Packer
:
$packer = new Packer();
$packed = $packer->pack($value);
or call a static method on the MessagePack
class:
$packed = MessagePack::pack($value);
In the examples above, the method pack
automatically packs a value depending on its type. However, not all PHP types can be uniquely translated to MessagePack types. For example, the MessagePack format defines map
and array
types, which are represented by a single array
type in PHP. By default, the packer will pack a PHP array as a MessagePack array if it has sequential numeric keys, starting from 0
and as a MessagePack map otherwise:
$mpArr1 = $packer->pack([1, 2]); // MP array [1, 2]
$mpArr2 = $packer->pack([0 => 1, 1 => 2]); // MP array [1, 2]
$mpMap1 = $packer->pack([0 => 1, 2 => 3]); // MP map {0: 1, 2: 3}
$mpMap2 = $packer->pack([1 => 2, 2 => 3]); // MP map {1: 2, 2: 3}
$mpMap3 = $packer->pack(['a' => 1, 'b' => 2]); // MP map {a: 1, b: 2}
However, sometimes you need to pack a sequential array as a MessagePack map. To do this, use the packMap
method:
$mpMap = $packer->packMap([1, 2]); // {0: 1, 1: 2}
Here is a list of type-specific packing methods:
$packer->packNil(); // MP nil
$packer->packBool(true); // MP bool
$packer->packInt(42); // MP int
$packer->packFloat(M_PI); // MP float (32 or 64)
$packer->packFloat32(M_PI); // MP float 32
$packer->packFloat64(M_PI); // MP float 64
$packer->packStr('foo'); // MP str
$packer->packBin("\x80"); // MP bin
$packer->packArray([1, 2]); // MP array
$packer->packMap(['a' => 1]); // MP map
$packer->packExt(1, "\xaa"); // MP ext
Check the "Custom types" section below on how to pack custom types.
The Packer
object supports a number of bitmask-based options for fine-tuning the packing process (defaults are in bold):
Name | Description |
---|---|
FORCE_STR | Forces PHP strings to be packed as MessagePack UTF-8 strings |
FORCE_BIN | Forces PHP strings to be packed as MessagePack binary data |
DETECT_STR_BIN | Detects MessagePack str/bin type automatically |
FORCE_ARR | Forces PHP arrays to be packed as MessagePack arrays |
FORCE_MAP | Forces PHP arrays to be packed as MessagePack maps |
DETECT_ARR_MAP | Detects MessagePack array/map type automatically |
FORCE_FLOAT32 | Forces PHP floats to be packed as 32-bits MessagePack floats |
FORCE_FLOAT64 | Forces PHP floats to be packed as 64-bits MessagePack floats |
The type detection mode (
DETECT_STR_BIN
/DETECT_ARR_MAP
) adds some overhead which can be noticed when you pack large (16- and 32-bit) arrays or strings. However, if you know the value type in advance (for example, you only work with UTF-8 strings or/and associative arrays), you can eliminate this overhead by forcing the packer to use the appropriate type, which will save it from running the auto-detection routine. Another option is to explicitly specify the value type. The library provides 2 auxiliary classes for this,Map
andBin
. Check the "Custom types" section below for details.
Examples:
// detect str/bin type and pack PHP 64-bit floats (doubles) to MP 32-bit floats
$packer = new Packer(PackOptions::DETECT_STR_BIN | PackOptions::FORCE_FLOAT32);
// these will throw MessagePack\Exception\InvalidOptionException
$packer = new Packer(PackOptions::FORCE_STR | PackOptions::FORCE_BIN);
$packer = new Packer(PackOptions::FORCE_FLOAT32 | PackOptions::FORCE_FLOAT64);
To unpack data you can either use an instance of a BufferUnpacker
:
$unpacker = new BufferUnpacker();
$unpacker->reset($packed);
$value = $unpacker->unpack();
or call a static method on the MessagePack
class:
$value = MessagePack::unpack($packed);
If the packed data is received in chunks (e.g. when reading from a stream), use the tryUnpack
method, which attempts to unpack data and returns an array of unpacked messages (if any) instead of throwing an InsufficientDataException
:
while ($chunk = ...) {
$unpacker->append($chunk);
if ($messages = $unpacker->tryUnpack()) {
return $messages;
}
}
If you want to unpack from a specific position in a buffer, use seek
:
$unpacker->seek(42); // set position equal to 42 bytes
$unpacker->seek(-8); // set position to 8 bytes before the end of the buffer
To skip bytes from the current position, use skip
:
$unpacker->skip(10); // set position to 10 bytes ahead of the current position
To get the number of remaining (unread) bytes in the buffer:
$unreadBytesCount = $unpacker->getRemainingCount();
To check whether the buffer has unread data:
$hasUnreadBytes = $unpacker->hasRemaining();
If needed, you can remove already read data from the buffer by calling:
$releasedBytesCount = $unpacker->release();
With the read
method you can read raw (packed) data:
$packedData = $unpacker->read(2); // read 2 bytes
Besides the above methods BufferUnpacker
provides type-specific unpacking methods, namely:
$unpacker->unpackNil(); // PHP null
$unpacker->unpackBool(); // PHP bool
$unpacker->unpackInt(); // PHP int
$unpacker->unpackFloat(); // PHP float
$unpacker->unpackStr(); // PHP UTF-8 string
$unpacker->unpackBin(); // PHP binary string
$unpacker->unpackArray(); // PHP sequential array
$unpacker->unpackMap(); // PHP associative array
$unpacker->unpackExt(); // PHP MessagePack\Type\Ext object
The BufferUnpacker
object supports a number of bitmask-based options for fine-tuning the unpacking process (defaults are in bold):
Name | Description |
---|---|
BIGINT_AS_STR | Converts overflowed integers to strings [1] |
BIGINT_AS_GMP | Converts overflowed integers to GMP objects [2] |
BIGINT_AS_DEC | Converts overflowed integers to Decimal\Decimal objects [3] |
1. The binary MessagePack format has unsigned 64-bit as its largest integer data type, but PHP does not support such integers, which means that an overflow can occur during unpacking.
2. Make sure the GMP extension is enabled.
3. Make sure the Decimal extension is enabled.
Examples:
$packedUint64 = "\xcf"."\xff\xff\xff\xff"."\xff\xff\xff\xff";
$unpacker = new BufferUnpacker($packedUint64);
var_dump($unpacker->unpack()); // string(20) "18446744073709551615"
$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_GMP);
var_dump($unpacker->unpack()); // object(GMP) {...}
$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_DEC);
var_dump($unpacker->unpack()); // object(Decimal\Decimal) {...}
In addition to the basic types, the library provides functionality to serialize and deserialize arbitrary types. This can be done in several ways, depending on your use case. Let's take a look at them.
If you need to serialize an instance of one of your classes into one of the basic MessagePack types, the best way to do this is to implement the CanBePacked interface in the class. A good example of such a class is the Map
type class that comes with the library. This type is useful when you want to explicitly specify that a given PHP array should be packed as a MessagePack map without triggering an automatic type detection routine:
$packer = new Packer();
$packedMap = $packer->pack(new Map([1, 2, 3]));
$packedArray = $packer->pack([1, 2, 3]);
More type examples can be found in the src/Type directory.
As with type objects, type transformers are only responsible for serializing values. They should be used when you need to serialize a value that does not implement the CanBePacked interface. Examples of such values could be instances of built-in or third-party classes that you don't own, or non-objects such as resources.
A transformer class must implement the CanPack interface. To use a transformer, it must first be registered in the packer. Here is an example of how to serialize PHP streams into the MessagePack bin
format type using one of the supplied transformers, StreamTransformer
:
$packer = new Packer(null, [new StreamTransformer()]);
$packedBin = $packer->pack(fopen('/path/to/file', 'r+'));
More type transformer examples can be found in the src/TypeTransformer directory.
In contrast to the cases described above, extensions are intended to handle extension types and are responsible for both serialization and deserialization of values (types).
An extension class must implement the Extension interface. To use an extension, it must first be registered in the packer and the unpacker.
The MessagePack specification divides extension types into two groups: predefined and application-specific. Currently, there is only one predefined type in the specification, Timestamp.
Timestamp
The Timestamp extension type is a predefined type. Support for this type in the library is done through the TimestampExtension
class. This class is responsible for handling Timestamp
objects, which represent the number of seconds and optional adjustment in nanoseconds:
$timestampExtension = new TimestampExtension();
$packer = new Packer();
$packer = $packer->extendWith($timestampExtension);
$unpacker = new BufferUnpacker();
$unpacker = $unpacker->extendWith($timestampExtension);
$packedTimestamp = $packer->pack(Timestamp::now());
$timestamp = $unpacker->reset($packedTimestamp)->unpack();
$seconds = $timestamp->getSeconds();
$nanoseconds = $timestamp->getNanoseconds();
When using the MessagePack
class, the Timestamp extension is already registered:
$packedTimestamp = MessagePack::pack(Timestamp::now());
$timestamp = MessagePack::unpack($packedTimestamp);
Application-specific extensions
In addition, the format can be extended with your own types. For example, to make the built-in PHP DateTime
objects first-class citizens in your code, you can create a corresponding extension, as shown in the example. Please note, that custom extensions have to be registered with a unique extension ID (an integer from 0
to 127
).
More extension examples can be found in the examples/MessagePack directory.
To learn more about how extension types can be useful, check out this article.
If an error occurs during packing/unpacking, a PackingFailedException
or an UnpackingFailedException
will be thrown, respectively. In addition, an InsufficientDataException
can be thrown during unpacking.
An InvalidOptionException
will be thrown in case an invalid option (or a combination of mutually exclusive options) is used.
Run tests as follows:
vendor/bin/phpunit
Also, if you already have Docker installed, you can run the tests in a docker container. First, create a container:
./dockerfile.sh | docker build -t msgpack -
The command above will create a container named msgpack
with PHP 8.1 runtime. You may change the default runtime by defining the PHP_IMAGE
environment variable:
PHP_IMAGE='php:8.0-cli' ./dockerfile.sh | docker build -t msgpack -
See a list of various images here.
Then run the unit tests:
docker run --rm -v $PWD:/msgpack -w /msgpack msgpack
To ensure that the unpacking works correctly with malformed/semi-malformed data, you can use a testing technique called Fuzzing. The library ships with a help file (target) for PHP-Fuzzer and can be used as follows:
php-fuzzer fuzz tests/fuzz_buffer_unpacker.php
To check performance, run:
php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php
Example output
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000
=============================================
Test/Target Packer BufferUnpacker
---------------------------------------------
nil .................. 0.0030 ........ 0.0139
false ................ 0.0037 ........ 0.0144
true ................. 0.0040 ........ 0.0137
7-bit uint #1 ........ 0.0052 ........ 0.0120
7-bit uint #2 ........ 0.0059 ........ 0.0114
7-bit uint #3 ........ 0.0061 ........ 0.0119
5-bit sint #1 ........ 0.0067 ........ 0.0126
5-bit sint #2 ........ 0.0064 ........ 0.0132
5-bit sint #3 ........ 0.0066 ........ 0.0135
8-bit uint #1 ........ 0.0078 ........ 0.0200
8-bit uint #2 ........ 0.0077 ........ 0.0212
8-bit uint #3 ........ 0.0086 ........ 0.0203
16-bit uint #1 ....... 0.0111 ........ 0.0271
16-bit uint #2 ....... 0.0115 ........ 0.0260
16-bit uint #3 ....... 0.0103 ........ 0.0273
32-bit uint #1 ....... 0.0116 ........ 0.0326
32-bit uint #2 ....... 0.0118 ........ 0.0332
32-bit uint #3 ....... 0.0127 ........ 0.0325
64-bit uint #1 ....... 0.0140 ........ 0.0277
64-bit uint #2 ....... 0.0134 ........ 0.0294
64-bit uint #3 ....... 0.0134 ........ 0.0281
8-bit int #1 ......... 0.0086 ........ 0.0241
8-bit int #2 ......... 0.0089 ........ 0.0225
8-bit int #3 ......... 0.0085 ........ 0.0229
16-bit int #1 ........ 0.0118 ........ 0.0280
16-bit int #2 ........ 0.0121 ........ 0.0270
16-bit int #3 ........ 0.0109 ........ 0.0274
32-bit int #1 ........ 0.0128 ........ 0.0346
32-bit int #2 ........ 0.0118 ........ 0.0339
32-bit int #3 ........ 0.0135 ........ 0.0368
64-bit int #1 ........ 0.0138 ........ 0.0276
64-bit int #2 ........ 0.0132 ........ 0.0286
64-bit int #3 ........ 0.0137 ........ 0.0274
64-bit int #4 ........ 0.0180 ........ 0.0285
64-bit float #1 ...... 0.0134 ........ 0.0284
64-bit float #2 ...... 0.0125 ........ 0.0275
64-bit float #3 ...... 0.0126 ........ 0.0283
fix string #1 ........ 0.0035 ........ 0.0133
fix string #2 ........ 0.0094 ........ 0.0216
fix string #3 ........ 0.0094 ........ 0.0222
fix string #4 ........ 0.0091 ........ 0.0241
8-bit string #1 ...... 0.0122 ........ 0.0301
8-bit string #2 ...... 0.0118 ........ 0.0304
8-bit string #3 ...... 0.0119 ........ 0.0315
16-bit string #1 ..... 0.0150 ........ 0.0388
16-bit string #2 ..... 0.1545 ........ 0.1665
32-bit string ........ 0.1570 ........ 0.1756
wide char string #1 .. 0.0091 ........ 0.0236
wide char string #2 .. 0.0122 ........ 0.0313
8-bit binary #1 ...... 0.0100 ........ 0.0302
8-bit binary #2 ...... 0.0123 ........ 0.0324
8-bit binary #3 ...... 0.0126 ........ 0.0327
16-bit binary ........ 0.0168 ........ 0.0372
32-bit binary ........ 0.1588 ........ 0.1754
fix array #1 ......... 0.0042 ........ 0.0131
fix array #2 ......... 0.0294 ........ 0.0367
fix array #3 ......... 0.0412 ........ 0.0472
16-bit array #1 ...... 0.1378 ........ 0.1596
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.1865 ........ 0.2283
fix map #1 ........... 0.0725 ........ 0.1048
fix map #2 ........... 0.0319 ........ 0.0405
fix map #3 ........... 0.0356 ........ 0.0665
fix map #4 ........... 0.0465 ........ 0.0497
16-bit map #1 ........ 0.2540 ........ 0.3028
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.2372 ........ 0.2710
fixext 1 ............. 0.0283 ........ 0.0358
fixext 2 ............. 0.0291 ........ 0.0371
fixext 4 ............. 0.0302 ........ 0.0355
fixext 8 ............. 0.0288 ........ 0.0384
fixext 16 ............ 0.0293 ........ 0.0359
8-bit ext ............ 0.0302 ........ 0.0439
16-bit ext ........... 0.0334 ........ 0.0499
32-bit ext ........... 0.1845 ........ 0.1888
32-bit timestamp #1 .. 0.0337 ........ 0.0547
32-bit timestamp #2 .. 0.0335 ........ 0.0560
64-bit timestamp #1 .. 0.0371 ........ 0.0575
64-bit timestamp #2 .. 0.0374 ........ 0.0542
64-bit timestamp #3 .. 0.0356 ........ 0.0533
96-bit timestamp #1 .. 0.0362 ........ 0.0699
96-bit timestamp #2 .. 0.0381 ........ 0.0701
96-bit timestamp #3 .. 0.0367 ........ 0.0687
=============================================
Total 2.7618 4.0820
Skipped 4 4
Failed 0 0
Ignored 0 0
With JIT:
php -n -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php
Example output
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000
=============================================
Test/Target Packer BufferUnpacker
---------------------------------------------
nil .................. 0.0005 ........ 0.0054
false ................ 0.0004 ........ 0.0059
true ................. 0.0004 ........ 0.0059
7-bit uint #1 ........ 0.0010 ........ 0.0047
7-bit uint #2 ........ 0.0010 ........ 0.0046
7-bit uint #3 ........ 0.0010 ........ 0.0046
5-bit sint #1 ........ 0.0025 ........ 0.0046
5-bit sint #2 ........ 0.0023 ........ 0.0046
5-bit sint #3 ........ 0.0024 ........ 0.0045
8-bit uint #1 ........ 0.0043 ........ 0.0081
8-bit uint #2 ........ 0.0043 ........ 0.0079
8-bit uint #3 ........ 0.0041 ........ 0.0080
16-bit uint #1 ....... 0.0064 ........ 0.0095
16-bit uint #2 ....... 0.0064 ........ 0.0091
16-bit uint #3 ....... 0.0064 ........ 0.0094
32-bit uint #1 ....... 0.0085 ........ 0.0114
32-bit uint #2 ....... 0.0077 ........ 0.0122
32-bit uint #3 ....... 0.0077 ........ 0.0120
64-bit uint #1 ....... 0.0085 ........ 0.0159
64-bit uint #2 ....... 0.0086 ........ 0.0157
64-bit uint #3 ....... 0.0086 ........ 0.0158
8-bit int #1 ......... 0.0042 ........ 0.0080
8-bit int #2 ......... 0.0042 ........ 0.0080
8-bit int #3 ......... 0.0042 ........ 0.0081
16-bit int #1 ........ 0.0065 ........ 0.0095
16-bit int #2 ........ 0.0065 ........ 0.0090
16-bit int #3 ........ 0.0056 ........ 0.0085
32-bit int #1 ........ 0.0067 ........ 0.0107
32-bit int #2 ........ 0.0066 ........ 0.0106
32-bit int #3 ........ 0.0063 ........ 0.0104
64-bit int #1 ........ 0.0072 ........ 0.0162
64-bit int #2 ........ 0.0073 ........ 0.0174
64-bit int #3 ........ 0.0072 ........ 0.0164
64-bit int #4 ........ 0.0077 ........ 0.0161
64-bit float #1 ...... 0.0053 ........ 0.0135
64-bit float #2 ...... 0.0053 ........ 0.0135
64-bit float #3 ...... 0.0052 ........ 0.0135
fix string #1 ....... -0.0002 ........ 0.0044
fix string #2 ........ 0.0035 ........ 0.0067
fix string #3 ........ 0.0035 ........ 0.0077
fix string #4 ........ 0.0033 ........ 0.0078
8-bit string #1 ...... 0.0059 ........ 0.0110
8-bit string #2 ...... 0.0063 ........ 0.0121
8-bit string #3 ...... 0.0064 ........ 0.0124
16-bit string #1 ..... 0.0099 ........ 0.0146
16-bit string #2 ..... 0.1522 ........ 0.1474
32-bit string ........ 0.1511 ........ 0.1483
wide char string #1 .. 0.0039 ........ 0.0084
wide char string #2 .. 0.0073 ........ 0.0123
8-bit binary #1 ...... 0.0040 ........ 0.0112
8-bit binary #2 ...... 0.0075 ........ 0.0123
8-bit binary #3 ...... 0.0077 ........ 0.0129
16-bit binary ........ 0.0096 ........ 0.0145
32-bit binary ........ 0.1535 ........ 0.1479
fix array #1 ......... 0.0008 ........ 0.0061
fix array #2 ......... 0.0121 ........ 0.0165
fix array #3 ......... 0.0193 ........ 0.0222
16-bit array #1 ...... 0.0607 ........ 0.0479
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.0749 ........ 0.0824
fix map #1 ........... 0.0329 ........ 0.0431
fix map #2 ........... 0.0161 ........ 0.0189
fix map #3 ........... 0.0205 ........ 0.0262
fix map #4 ........... 0.0252 ........ 0.0205
16-bit map #1 ........ 0.1016 ........ 0.0927
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.1096 ........ 0.1030
fixext 1 ............. 0.0157 ........ 0.0161
fixext 2 ............. 0.0175 ........ 0.0183
fixext 4 ............. 0.0156 ........ 0.0185
fixext 8 ............. 0.0163 ........ 0.0184
fixext 16 ............ 0.0164 ........ 0.0182
8-bit ext ............ 0.0158 ........ 0.0207
16-bit ext ........... 0.0203 ........ 0.0219
32-bit ext ........... 0.1614 ........ 0.1539
32-bit timestamp #1 .. 0.0195 ........ 0.0249
32-bit timestamp #2 .. 0.0188 ........ 0.0260
64-bit timestamp #1 .. 0.0207 ........ 0.0281
64-bit timestamp #2 .. 0.0212 ........ 0.0291
64-bit timestamp #3 .. 0.0207 ........ 0.0295
96-bit timestamp #1 .. 0.0222 ........ 0.0358
96-bit timestamp #2 .. 0.0228 ........ 0.0353
96-bit timestamp #3 .. 0.0210 ........ 0.0319
=============================================
Total 1.6432 1.9674
Skipped 4 4
Failed 0 0
Ignored 0 0
You may change default benchmark settings by defining the following environment variables:
Name | Default |
---|---|
MP_BENCH_TARGETS | pure_p,pure_u , see a list of available targets |
MP_BENCH_ITERATIONS | 100_000 |
MP_BENCH_DURATION | not set |
MP_BENCH_ROUNDS | 3 |
MP_BENCH_TESTS | -@slow , see a list of available tests |
For example:
export MP_BENCH_TARGETS=pure_p
export MP_BENCH_ITERATIONS=1000000
export MP_BENCH_ROUNDS=5
# a comma separated list of test names
export MP_BENCH_TESTS='complex array, complex map'
# or a group name
# export MP_BENCH_TESTS='-@slow' // @pecl_comp
# or a regexp
# export MP_BENCH_TESTS='/complex (array|map)/'
Another example, benchmarking both the library and the PECL extension:
MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php
Example output
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000
===========================================================================
Test/Target Packer BufferUnpacker msgpack_pack msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0031 ........ 0.0141 ...... 0.0055 ........ 0.0064
false ................ 0.0039 ........ 0.0154 ...... 0.0056 ........ 0.0053
true ................. 0.0038 ........ 0.0139 ...... 0.0056 ........ 0.0044
7-bit uint #1 ........ 0.0061 ........ 0.0110 ...... 0.0059 ........ 0.0046
7-bit uint #2 ........ 0.0065 ........ 0.0119 ...... 0.0042 ........ 0.0029
7-bit uint #3 ........ 0.0054 ........ 0.0117 ...... 0.0045 ........ 0.0025
5-bit sint #1 ........ 0.0047 ........ 0.0103 ...... 0.0038 ........ 0.0022
5-bit sint #2 ........ 0.0048 ........ 0.0117 ...... 0.0038 ........ 0.0022
5-bit sint #3 ........ 0.0046 ........ 0.0102 ...... 0.0038 ........ 0.0023
8-bit uint #1 ........ 0.0063 ........ 0.0174 ...... 0.0039 ........ 0.0031
8-bit uint #2 ........ 0.0063 ........ 0.0167 ...... 0.0040 ........ 0.0029
8-bit uint #3 ........ 0.0063 ........ 0.0168 ...... 0.0039 ........ 0.0030
16-bit uint #1 ....... 0.0092 ........ 0.0222 ...... 0.0049 ........ 0.0030
16-bit uint #2 ....... 0.0096 ........ 0.0227 ...... 0.0042 ........ 0.0046
16-bit uint #3 ....... 0.0123 ........ 0.0274 ...... 0.0059 ........ 0.0051
32-bit uint #1 ....... 0.0136 ........ 0.0331 ...... 0.0060 ........ 0.0048
32-bit uint #2 ....... 0.0130 ........ 0.0336 ...... 0.0070 ........ 0.0048
32-bit uint #3 ....... 0.0127 ........ 0.0329 ...... 0.0051 ........ 0.0048
64-bit uint #1 ....... 0.0126 ........ 0.0268 ...... 0.0055 ........ 0.0049
64-bit uint #2 ....... 0.0135 ........ 0.0281 ...... 0.0052 ........ 0.0046
64-bit uint #3 ....... 0.0131 ........ 0.0274 ...... 0.0069 ........ 0.0044
8-bit int #1 ......... 0.0077 ........ 0.0236 ...... 0.0058 ........ 0.0044
8-bit int #2 ......... 0.0087 ........ 0.0244 ...... 0.0058 ........ 0.0048
8-bit int #3 ......... 0.0084 ........ 0.0241 ...... 0.0055 ........ 0.0049
16-bit int #1 ........ 0.0112 ........ 0.0271 ...... 0.0048 ........ 0.0045
16-bit int #2 ........ 0.0124 ........ 0.0292 ...... 0.0057 ........ 0.0049
16-bit int #3 ........ 0.0118 ........ 0.0270 ...... 0.0058 ........ 0.0050
32-bit int #1 ........ 0.0137 ........ 0.0366 ...... 0.0058 ........ 0.0051
32-bit int #2 ........ 0.0133 ........ 0.0366 ...... 0.0056 ........ 0.0049
32-bit int #3 ........ 0.0129 ........ 0.0350 ...... 0.0052 ........ 0.0048
64-bit int #1 ........ 0.0145 ........ 0.0254 ...... 0.0034 ........ 0.0025
64-bit int #2 ........ 0.0097 ........ 0.0214 ...... 0.0034 ........ 0.0025
64-bit int #3 ........ 0.0096 ........ 0.0287 ...... 0.0059 ........ 0.0050
64-bit int #4 ........ 0.0143 ........ 0.0277 ...... 0.0059 ........ 0.0046
64-bit float #1 ...... 0.0134 ........ 0.0281 ...... 0.0057 ........ 0.0052
64-bit float #2 ...... 0.0141 ........ 0.0281 ...... 0.0057 ........ 0.0050
64-bit float #3 ...... 0.0144 ........ 0.0282 ...... 0.0057 ........ 0.0050
fix string #1 ........ 0.0036 ........ 0.0143 ...... 0.0066 ........ 0.0053
fix string #2 ........ 0.0107 ........ 0.0222 ...... 0.0065 ........ 0.0068
fix string #3 ........ 0.0116 ........ 0.0245 ...... 0.0063 ........ 0.0069
fix string #4 ........ 0.0105 ........ 0.0253 ...... 0.0083 ........ 0.0077
8-bit string #1 ...... 0.0126 ........ 0.0318 ...... 0.0075 ........ 0.0088
8-bit string #2 ...... 0.0121 ........ 0.0295 ...... 0.0076 ........ 0.0086
8-bit string #3 ...... 0.0125 ........ 0.0293 ...... 0.0130 ........ 0.0093
16-bit string #1 ..... 0.0159 ........ 0.0368 ...... 0.0117 ........ 0.0086
16-bit string #2 ..... 0.1547 ........ 0.1686 ...... 0.1516 ........ 0.1373
32-bit string ........ 0.1558 ........ 0.1729 ...... 0.1511 ........ 0.1396
wide char string #1 .. 0.0098 ........ 0.0237 ...... 0.0066 ........ 0.0065
wide char string #2 .. 0.0128 ........ 0.0291 ...... 0.0061 ........ 0.0082
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0040 ........ 0.0129 ...... 0.0120 ........ 0.0058
fix array #2 ......... 0.0279 ........ 0.0390 ...... 0.0143 ........ 0.0165
fix array #3 ......... 0.0415 ........ 0.0463 ...... 0.0162 ........ 0.0187
16-bit array #1 ...... 0.1349 ........ 0.1628 ...... 0.0334 ........ 0.0341
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0345 ........ 0.0391 ...... 0.0143 ........ 0.0168
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0459 ........ 0.0473 ...... 0.0151 ........ 0.0163
16-bit map #1 ........ 0.2518 ........ 0.2962 ...... 0.0400 ........ 0.0490
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.2380 ........ 0.2682 ...... 0.0545 ........ 0.0579
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total 1.5625 2.3866 0.7735 0.7243
Skipped 4 4 4 4
Failed 0 0 24 17
Ignored 24 24 0 7
With JIT:
MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
-dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
tests/bench.php
Example output
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000
===========================================================================
Test/Target Packer BufferUnpacker msgpack_pack msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0001 ........ 0.0052 ...... 0.0053 ........ 0.0042
false ................ 0.0007 ........ 0.0060 ...... 0.0057 ........ 0.0043
true ................. 0.0008 ........ 0.0060 ...... 0.0056 ........ 0.0041
7-bit uint #1 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0041
7-bit uint #2 ........ 0.0021 ........ 0.0043 ...... 0.0062 ........ 0.0041
7-bit uint #3 ........ 0.0022 ........ 0.0044 ...... 0.0061 ........ 0.0040
5-bit sint #1 ........ 0.0030 ........ 0.0048 ...... 0.0062 ........ 0.0040
5-bit sint #2 ........ 0.0032 ........ 0.0046 ...... 0.0062 ........ 0.0040
5-bit sint #3 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0040
8-bit uint #1 ........ 0.0054 ........ 0.0079 ...... 0.0062 ........ 0.0050
8-bit uint #2 ........ 0.0051 ........ 0.0079 ...... 0.0064 ........ 0.0044
8-bit uint #3 ........ 0.0051 ........ 0.0082 ...... 0.0062 ........ 0.0044
16-bit uint #1 ....... 0.0077 ........ 0.0094 ...... 0.0065 ........ 0.0045
16-bit uint #2 ....... 0.0077 ........ 0.0094 ...... 0.0063 ........ 0.0045
16-bit uint #3 ....... 0.0077 ........ 0.0095 ...... 0.0064 ........ 0.0047
32-bit uint #1 ....... 0.0088 ........ 0.0119 ...... 0.0063 ........ 0.0043
32-bit uint #2 ....... 0.0089 ........ 0.0117 ...... 0.0062 ........ 0.0039
32-bit uint #3 ....... 0.0089 ........ 0.0118 ...... 0.0063 ........ 0.0044
64-bit uint #1 ....... 0.0097 ........ 0.0155 ...... 0.0063 ........ 0.0045
64-bit uint #2 ....... 0.0095 ........ 0.0153 ...... 0.0061 ........ 0.0045
64-bit uint #3 ....... 0.0096 ........ 0.0156 ...... 0.0063 ........ 0.0047
8-bit int #1 ......... 0.0053 ........ 0.0083 ...... 0.0062 ........ 0.0044
8-bit int #2 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0044
8-bit int #3 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0043
16-bit int #1 ........ 0.0089 ........ 0.0097 ...... 0.0069 ........ 0.0046
16-bit int #2 ........ 0.0075 ........ 0.0093 ...... 0.0063 ........ 0.0043
16-bit int #3 ........ 0.0075 ........ 0.0094 ...... 0.0062 ........ 0.0046
32-bit int #1 ........ 0.0086 ........ 0.0122 ...... 0.0063 ........ 0.0044
32-bit int #2 ........ 0.0087 ........ 0.0120 ...... 0.0066 ........ 0.0046
32-bit int #3 ........ 0.0086 ........ 0.0121 ...... 0.0060 ........ 0.0044
64-bit int #1 ........ 0.0096 ........ 0.0149 ...... 0.0060 ........ 0.0045
64-bit int #2 ........ 0.0096 ........ 0.0157 ...... 0.0062 ........ 0.0044
64-bit int #3 ........ 0.0096 ........ 0.0160 ...... 0.0063 ........ 0.0046
64-bit int #4 ........ 0.0097 ........ 0.0157 ...... 0.0061 ........ 0.0044
64-bit float #1 ...... 0.0079 ........ 0.0153 ...... 0.0056 ........ 0.0044
64-bit float #2 ...... 0.0079 ........ 0.0152 ...... 0.0057 ........ 0.0045
64-bit float #3 ...... 0.0079 ........ 0.0155 ...... 0.0057 ........ 0.0044
fix string #1 ........ 0.0010 ........ 0.0045 ...... 0.0071 ........ 0.0044
fix string #2 ........ 0.0048 ........ 0.0075 ...... 0.0070 ........ 0.0060
fix string #3 ........ 0.0048 ........ 0.0086 ...... 0.0068 ........ 0.0060
fix string #4 ........ 0.0050 ........ 0.0088 ...... 0.0070 ........ 0.0059
8-bit string #1 ...... 0.0081 ........ 0.0129 ...... 0.0069 ........ 0.0062
8-bit string #2 ...... 0.0086 ........ 0.0128 ...... 0.0069 ........ 0.0065
8-bit string #3 ...... 0.0086 ........ 0.0126 ...... 0.0115 ........ 0.0065
16-bit string #1 ..... 0.0105 ........ 0.0137 ...... 0.0128 ........ 0.0068
16-bit string #2 ..... 0.1510 ........ 0.1486 ...... 0.1526 ........ 0.1391
32-bit string ........ 0.1517 ........ 0.1475 ...... 0.1504 ........ 0.1370
wide char string #1 .. 0.0044 ........ 0.0085 ...... 0.0067 ........ 0.0057
wide char string #2 .. 0.0081 ........ 0.0125 ...... 0.0069 ........ 0.0063
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0014 ........ 0.0059 ...... 0.0132 ........ 0.0055
fix array #2 ......... 0.0146 ........ 0.0156 ...... 0.0155 ........ 0.0148
fix array #3 ......... 0.0211 ........ 0.0229 ...... 0.0179 ........ 0.0180
16-bit array #1 ...... 0.0673 ........ 0.0498 ...... 0.0343 ........ 0.0388
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0148 ........ 0.0180 ...... 0.0156 ........ 0.0179
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0252 ........ 0.0201 ...... 0.0214 ........ 0.0167
16-bit map #1 ........ 0.1027 ........ 0.0836 ...... 0.0388 ........ 0.0510
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.1104 ........ 0.1010 ...... 0.0556 ........ 0.0602
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total 0.9642 1.0909 0.8224 0.7213
Skipped 4 4 4 4
Failed 0 0 24 17
Ignored 24 24 0 7
Note that the msgpack extension (v2.1.2) doesn't support ext, bin and UTF-8 str types.
The library is released under the MIT License. See the bundled LICENSE file for details.
Author: rybakit
Source Code: https://github.com/rybakit/msgpack.php
License: MIT License
1648641360
A symbolic natural language parsing library for Rust, inspired by HDPSG.
This is a library for parsing natural or constructed languages into syntax trees and feature structures. There's no machine learning or probabilistic models, everything is hand-crafted and deterministic.
You can find out more about the motivations of this project in this blog post.
I'm using this to parse a constructed language for my upcoming xenolinguistics game, Themengi.
Using a simple 80-line grammar, introduced in the tutorial below, we can parse a simple subset of English, checking reflexive pronoun binding, case, and number agreement.
$ cargo run --bin cli examples/reflexives.fgr
> she likes himself
Parsed 0 trees
> her likes herself
Parsed 0 trees
> she like herself
Parsed 0 trees
> she likes herself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: she))
(1..2: TV (1..2: likes))
(2..3: N (2..3: herself)))
[
child-2: [
case: acc
pron: ref
needs_pron: #0 she
num: sg
child-0: [ word: herself ]
]
child-1: [
tense: nonpast
child-0: [ word: likes ]
num: #1 sg
]
child-0: [
child-0: [ word: she ]
case: nom
pron: #0
num: #1
]
]
Low resource language? Low problem! No need to train on gigabytes of text, just write a grammar using your brain. Let's hypothesize that in American Sign Language, topicalized nouns (expressed with raised eyebrows) must appear first in the sentence. We can write a small grammar (18 lines), and plug in some sentences:
$ cargo run --bin cli examples/asl-wordorder.fgr -n
> boy sit
Parsed 1 tree
(0..2: S
(0..1: NP ((0..1: N (0..1: boy))))
(1..2: IV (1..2: sit)))
> boy throw ball
Parsed 1 tree
(0..3: S
(0..1: NP ((0..1: N (0..1: boy))))
(1..2: TV (1..2: throw))
(2..3: NP ((2..3: N (2..3: ball)))))
> ball nm-raised-eyebrows boy throw
Parsed 1 tree
(0..4: S
(0..2: NP
(0..1: N (0..1: ball))
(1..2: Topic (1..2: nm-raised-eyebrows)))
(2..3: NP ((2..3: N (2..3: boy))))
(3..4: TV (3..4: throw)))
> boy throw ball nm-raised-eyebrows
Parsed 0 trees
As an example, let's say we want to build a parser for English reflexive pronouns (himself, herself, themselves, themself, itself). We'll also support number ("He likes X" v.s. "They like X") and simple embedded clauses ("He said that they like X").
Grammar files are written in a custom language, similar to BNF, called Feature GRammar (.fgr). There's a VSCode syntax highlighting extension for these files available as fgr-syntax
.
We'll start by defining our lexicon. The lexicon is the set of terminal symbols (symbols in the actual input) that the grammar will match. Terminal symbols must start with a lowercase letter, and non-terminal symbols must start with an uppercase letter.
// pronouns
N -> he
N -> him
N -> himself
N -> she
N -> her
N -> herself
N -> they
N -> them
N -> themselves
N -> themself
// names, lowercase as they are terminals
N -> mary
N -> sue
N -> takeshi
N -> robert
// complementizer
Comp -> that
// verbs -- intransitive, transitive, and clausal
IV -> falls
IV -> fall
IV -> fell
TV -> likes
TV -> like
TV -> liked
CV -> says
CV -> say
CV -> said
Next, we can add our sentence rules (they must be added at the top, as the first rule in the file is assumed to be the top-level rule):
// sentence rules
S -> N IV
S -> N TV N
S -> N CV Comp S
// ... previous lexicon ...
Assuming this file is saved as examples/no-features.fgr
(which it is :wink:), we can test this file with the built-in CLI:
$ cargo run --bin cli examples/no-features.fgr
> he falls
Parsed 1 tree
(0..2: S
(0..1: N (0..1: he))
(1..2: IV (1..2: falls)))
[
child-1: [ child-0: [ word: falls ] ]
child-0: [ child-0: [ word: he ] ]
]
> he falls her
Parsed 0 trees
> he likes her
Parsed 1 tree
(0..3: S
(0..1: N (0..1: he))
(1..2: TV (1..2: likes))
(2..3: N (2..3: her)))
[
child-2: [ child-0: [ word: her ] ]
child-1: [ child-0: [ word: likes ] ]
child-0: [ child-0: [ word: he ] ]
]
> he likes
Parsed 0 trees
> he said that he likes her
Parsed 1 tree
(0..6: S
(0..1: N (0..1: he))
(1..2: CV (1..2: said))
(2..3: Comp (2..3: that))
(3..6: S
(3..4: N (3..4: he))
(4..5: TV (4..5: likes))
(5..6: N (5..6: her))))
[
child-0: [ child-0: [ word: he ] ]
child-2: [ child-0: [ word: that ] ]
child-1: [ child-0: [ word: said ] ]
child-3: [
child-2: [ child-0: [ word: her ] ]
child-1: [ child-0: [ word: likes ] ]
child-0: [ child-0: [ word: he ] ]
]
]
> he said that he
Parsed 0 trees
This grammar already parses some correct sentences, and blocks some trivially incorrect ones. However, it doesn't care about number, case, or reflexives right now:
> she likes himself // unbound reflexive pronoun
Parsed 1 tree
(0..3: S
(0..1: N (0..1: she))
(1..2: TV (1..2: likes))
(2..3: N (2..3: himself)))
[
child-0: [ child-0: [ word: she ] ]
child-2: [ child-0: [ word: himself ] ]
child-1: [ child-0: [ word: likes ] ]
]
> him like her // incorrect case on the subject pronoun, should be nominative
// (he) instead of accusative (him)
Parsed 1 tree
(0..3: S
(0..1: N (0..1: him))
(1..2: TV (1..2: like))
(2..3: N (2..3: her)))
[
child-0: [ child-0: [ word: him ] ]
child-1: [ child-0: [ word: like ] ]
child-2: [ child-0: [ word: her ] ]
]
> he like her // incorrect verb number agreement
Parsed 1 tree
(0..3: S
(0..1: N (0..1: he))
(1..2: TV (1..2: like))
(2..3: N (2..3: her)))
[
child-2: [ child-0: [ word: her ] ]
child-1: [ child-0: [ word: like ] ]
child-0: [ child-0: [ word: he ] ]
]
To fix this, we need to add features to our lexicon, and restrict the sentence rules based on features.
Features are added with square brackets, and are key: value pairs separated by commas. **top**
is a special feature value, which basically means "unspecified" -- we'll come back to it later. Features that are unspecified are also assumed to have a **top**
value, but sometimes explicitly stating top is more clear.
/// Pronouns
// The added features are:
// * num: sg or pl, whether this noun wants a singular verb (likes) or
// a plural verb (like). note this is grammatical number, so for example
// singular they takes plural agreement ("they like X", not *"they likes X")
// * case: nom or acc, whether this noun is nominative or accusative case.
// nominative case goes in the subject, and accusative in the object.
// e.g., "he fell" and "she likes him", not *"him fell" and *"her likes he"
// * pron: he, she, they, or ref -- what type of pronoun this is
// * needs_pron: whether this is a reflexive that needs to bind to another
// pronoun.
N[ num: sg, case: nom, pron: he ] -> he
N[ num: sg, case: acc, pron: he ] -> him
N[ num: sg, case: acc, pron: ref, needs_pron: he ] -> himself
N[ num: sg, case: nom, pron: she ] -> she
N[ num: sg, case: acc, pron: she ] -> her
N[ num: sg, case: acc, pron: ref, needs_pron: she] -> herself
N[ num: pl, case: nom, pron: they ] -> they
N[ num: pl, case: acc, pron: they ] -> them
N[ num: pl, case: acc, pron: ref, needs_pron: they ] -> themselves
N[ num: sg, case: acc, pron: ref, needs_pron: they ] -> themself
// Names
// The added features are:
// * num: sg, as people are singular ("mary likes her" / *"mary like her")
// * case: **top**, as names can be both subjects and objects
// ("mary likes her" / "she likes mary")
// * pron: whichever pronoun the person uses for reflexive agreement
// mary pron: she => mary likes herself
// sue pron: they => sue likes themself
// takeshi pron: he => takeshi likes himself
N[ num: sg, case: **top**, pron: she ] -> mary
N[ num: sg, case: **top**, pron: they ] -> sue
N[ num: sg, case: **top**, pron: he ] -> takeshi
N[ num: sg, case: **top**, pron: he ] -> robert
// Complementizer doesn't need features
Comp -> that
// Verbs -- intransitive, transitive, and clausal
// The added features are:
// * num: sg, pl, or **top** -- to match the noun numbers.
// **top** will match either sg or pl, as past-tense verbs in English
// don't agree in number: "he fell" and "they fell" are both fine
// * tense: past or nonpast -- this won't be used for agreement, but will be
// copied into the final feature structure, and the client code could do
// something with it
IV[ num: sg, tense: nonpast ] -> falls
IV[ num: pl, tense: nonpast ] -> fall
IV[ num: **top**, tense: past ] -> fell
TV[ num: sg, tense: nonpast ] -> likes
TV[ num: pl, tense: nonpast ] -> like
TV[ num: **top**, tense: past ] -> liked
CV[ num: sg, tense: nonpast ] -> says
CV[ num: pl, tense: nonpast ] -> say
CV[ num: **top**, tense: past ] -> said
Now that our lexicon is updated with features, we can update our sentence rules to constrain parsing based on those features. This uses two new features, tags and unification. Tags allow features to be associated between nodes in a rule, and unification controls how those features are compatible. The rules for unification are:
If unification fails anywhere, the parse is aborted and the tree is discarded. This allows the programmer to discard trees if features don't match.
// Sentence rules
// Intransitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
S -> N[ case: nom, num: #1 ] IV[ num: #1 ]
// Transitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #2)
// * If there's a reflexive in the object position, make sure its `needs_pron`
// feature matches the subject's `pron` feature. If the object isn't a
// reflexive, then its `needs_pron` feature will implicitly be `**top**`, so
// will unify with anything.
S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
// Clausal verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
// * Reflexives can't cross clause boundaries (*"He said that she likes himself"),
// so we can ignore reflexives and delegate to inner clause rule
S -> N[ case: nom, num: #1 ] CV[ num: #1 ] Comp S
Now that we have this augmented grammar (available as examples/reflexives.fgr
), we can try it out and see that it rejects illicit sentences that were previously accepted, while still accepting valid ones:
> he fell
Parsed 1 tree
(0..2: S
(0..1: N (0..1: he))
(1..2: IV (1..2: fell)))
[
child-1: [
child-0: [ word: fell ]
num: #0 sg
tense: past
]
child-0: [
pron: he
case: nom
num: #0
child-0: [ word: he ]
]
]
> he like him
Parsed 0 trees
> he likes himself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: he))
(1..2: TV (1..2: likes))
(2..3: N (2..3: himself)))
[
child-1: [
num: #0 sg
child-0: [ word: likes ]
tense: nonpast
]
child-2: [
needs_pron: #1 he
num: sg
child-0: [ word: himself ]
pron: ref
case: acc
]
child-0: [
child-0: [ word: he ]
pron: #1
num: #0
case: nom
]
]
> he likes herself
Parsed 0 trees
> mary likes herself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: mary))
(1..2: TV (1..2: likes))
(2..3: N (2..3: herself)))
[
child-0: [
pron: #0 she
num: #1 sg
case: nom
child-0: [ word: mary ]
]
child-1: [
tense: nonpast
child-0: [ word: likes ]
num: #1
]
child-2: [
child-0: [ word: herself ]
num: sg
pron: ref
case: acc
needs_pron: #0
]
]
> mary likes themself
Parsed 0 trees
> sue likes themself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: sue))
(1..2: TV (1..2: likes))
(2..3: N (2..3: themself)))
[
child-0: [
pron: #0 they
child-0: [ word: sue ]
case: nom
num: #1 sg
]
child-1: [
tense: nonpast
num: #1
child-0: [ word: likes ]
]
child-2: [
needs_pron: #0
case: acc
pron: ref
child-0: [ word: themself ]
num: sg
]
]
> sue likes himself
Parsed 0 trees
If this is interesting to you and you want to learn more, you can check out my blog series, the excellent textbook Syntactic Theory: A Formal Introduction (2nd ed.), and the DELPH-IN project, whose work on the LKB inspired this simplified version.
I need to write this section in more detail, but if you're comfortable with Rust, I suggest looking through the codebase. It's not perfect, it started as one of my first Rust projects (after migrating through F# -> TypeScript -> C in search of the right performance/ergonomics tradeoff), and it could use more tests, but overall it's not too bad.
Basically, the processing pipeline is:
Grammar
structGrammar
is defined in rules.rs
.Grammar
is Grammar::parse_from_file
, which is mostly a hand-written recusive descent parser in parse_grammar.rs
. Yes, I recognize the irony here.Grammar::parse
, which does everything for you, or Grammar::parse_chart
, which just does the chart)earley.rs
forest.rs
, using an algorithm I found in a very useful blog series I forget the URL for, because the algorithms in the academic literature for this are... weird.The most interesting thing you can do via code and not via the CLI is probably getting at the raw feature DAG, as that would let you do things like pronoun coreference. The DAG code is in featurestructure.rs
, and should be fairly approachable -- there's a lot of Rust ceremony around Rc<RefCell<...>>
because using an arena allocation crate seemed too harlike overkill, but that is somewhat mitigated by the NodeRef
type alias. Hit me up at https://vgel.me/contact if you need help with anything here!
Download Details:
Author: vgel
Source Code: https://github.com/vgel/treebender
License: MIT License
1635917640
このモジュールでは、Rustでハッシュマップ複合データ型を操作する方法について説明します。ハッシュマップのようなコレクション内のデータを反復処理するループ式を実装する方法を学びます。演習として、要求された注文をループし、条件をテストし、さまざまなタイプのデータを処理することによって車を作成するRustプログラムを作成します。
錆遊び場は錆コンパイラにブラウザインタフェースです。言語をローカルにインストールする前、またはコンパイラが利用できない場合は、Playgroundを使用してRustコードの記述を試すことができます。このコース全体を通して、サンプルコードと演習へのPlaygroundリンクを提供します。現時点でRustツールチェーンを使用できない場合でも、コードを操作できます。
Rust Playgroundで実行されるすべてのコードは、ローカルの開発環境でコンパイルして実行することもできます。コンピューターからRustコンパイラーと対話することを躊躇しないでください。Rust Playgroundの詳細については、What isRust?をご覧ください。モジュール。
このモジュールでは、次のことを行います。
Rustのもう1つの一般的なコレクションの種類は、ハッシュマップです。このHashMap<K, V>
型は、各キーK
をその値にマッピングすることによってデータを格納しますV
。ベクトル内のデータは整数インデックスを使用してアクセスされますが、ハッシュマップ内のデータはキーを使用してアクセスされます。
ハッシュマップタイプは、オブジェクト、ハッシュテーブル、辞書などのデータ項目の多くのプログラミング言語で使用されます。
ベクトルのように、ハッシュマップは拡張可能です。データはヒープに格納され、ハッシュマップアイテムへのアクセスは実行時にチェックされます。
次の例では、書評を追跡するためのハッシュマップを定義しています。ハッシュマップキーは本の名前であり、値は読者のレビューです。
use std::collections::HashMap;
let mut reviews: HashMap<String, String> = HashMap::new();
reviews.insert(String::from("Ancient Roman History"), String::from("Very accurate."));
reviews.insert(String::from("Cooking with Rhubarb"), String::from("Sweet recipes."));
reviews.insert(String::from("Programming in Rust"), String::from("Great examples."));
このコードをさらに詳しく調べてみましょう。最初の行に、新しいタイプの構文が表示されます。
use std::collections::HashMap;
このuse
コマンドは、Rust標準ライブラリの一部HashMap
からの定義をcollections
プログラムのスコープに取り込みます。この構文は、他のプログラミング言語がインポートと呼ぶものと似ています。
HashMap::new
メソッドを使用して空のハッシュマップを作成します。reviews
必要に応じてキーと値を追加または削除できるように、変数を可変として宣言します。この例では、ハッシュマップのキーと値の両方がString
タイプを使用しています。
let mut reviews: HashMap<String, String> = HashMap::new();
このinsert(<key>, <value>)
メソッドを使用して、ハッシュマップに要素を追加します。コードでは、構文は<hash_map_name>.insert()
次のとおりです。
reviews.insert(String::from("Ancient Roman History"), String::from("Very accurate."));
ハッシュマップにデータを追加した後、get(<key>)
メソッドを使用してキーの特定の値を取得できます。
// Look for a specific review
let book: &str = "Programming in Rust";
println!("\nReview for \'{}\': {:?}", book, reviews.get(book));
出力は次のとおりです。
Review for 'Programming in Rust': Some("Great examples.")
ノート
出力には、書評が単なる「すばらしい例」ではなく「Some( "すばらしい例。")」として表示されていることに注意してください。get
メソッドはOption<&Value>
型を返すため、Rustはメソッド呼び出しの結果を「Some()」表記でラップします。
この.remove()
メソッドを使用して、ハッシュマップからエントリを削除できます。get
無効なハッシュマップキーに対してメソッドを使用すると、get
メソッドは「なし」を返します。
// Remove book review
let obsolete: &str = "Ancient Roman History";
println!("\n'{}\' removed.", obsolete);
reviews.remove(obsolete);
// Confirm book review removed
println!("\nReview for \'{}\': {:?}", obsolete, reviews.get(obsolete));
出力は次のとおりです。
'Ancient Roman History' removed.
Review for 'Ancient Roman History': None
このコードを試して、このRustPlaygroundでハッシュマップを操作できます。
演習:ハッシュマップを使用して注文を追跡する
この演習では、ハッシュマップを使用するように自動車工場のプログラムを変更します。
ハッシュマップキーと値のペアを使用して、車の注文に関する詳細を追跡し、出力を表示します。繰り返しになりますが、あなたの課題は、サンプルコードを完成させてコンパイルして実行することです。
この演習のサンプルコードで作業するには、次の2つのオプションがあります。
ノート
サンプルコードで、
todo!
マクロを探します。このマクロは、完了するか更新する必要があるコードを示します。
最初のステップは、既存のプログラムコードを取得することです。
car_quality
、car_factory
およびmain
機能を。次のコードをコピーしてローカル開発環境で編集する
か、この準備されたRustPlaygroundでコードを開きます。
#[derive(PartialEq, Debug)]
struct Car { color: String, motor: Transmission, roof: bool, age: (Age, u32) }
#[derive(PartialEq, Debug)]
enum Transmission { Manual, SemiAuto, Automatic }
#[derive(PartialEq, Debug)]
enum Age { New, Used }
// Get the car quality by testing the value of the input argument
// - miles (u32)
// Return tuple with car age ("New" or "Used") and mileage
fn car_quality (miles: u32) -> (Age, u32) {
// Check if car has accumulated miles
// Return tuple early for Used car
if miles > 0 {
return (Age::Used, miles);
}
// Return tuple for New car, no need for "return" keyword or semicolon
(Age::New, miles)
}
// Build "Car" using input arguments
fn car_factory(order: i32, miles: u32) -> Car {
let colors = ["Blue", "Green", "Red", "Silver"];
// Prevent panic: Check color index for colors array, reset as needed
// Valid color = 1, 2, 3, or 4
// If color > 4, reduce color to valid index
let mut color = order as usize;
if color > 4 {
// color = 5 --> index 1, 6 --> 2, 7 --> 3, 8 --> 4
color = color - 4;
}
// Add variety to orders for motor type and roof type
let mut motor = Transmission::Manual;
let mut roof = true;
if order % 3 == 0 { // 3, 6, 9
motor = Transmission::Automatic;
} else if order % 2 == 0 { // 2, 4, 8, 10
motor = Transmission::SemiAuto;
roof = false;
} // 1, 5, 7, 11
// Return requested "Car"
Car {
color: String::from(colors[(color-1) as usize]),
motor: motor,
roof: roof,
age: car_quality(miles)
}
}
fn main() {
// Initialize counter variable
let mut order = 1;
// Declare a car as mutable "Car" struct
let mut car: Car;
// Order 6 cars, increment "order" for each request
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #2: Used, Convertible
order = order + 1;
car = car_factory(order, 2000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #3: New, Hard top
order = order + 1;
car = car_factory(order, 0);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #4: New, Convertible
order = order + 1;
car = car_factory(order, 0);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #5: Used, Hard top
order = order + 1;
car = car_factory(order, 3000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
println!("{}: {:?}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
}
2. プログラムをビルドします。次のセクションに進む前に、コードがコンパイルされて実行されることを確認してください。
次の出力が表示されます。
1: Used, Hard top = true, Manual, Blue, 1000 miles
2: Used, Hard top = false, SemiAuto, Green, 2000 miles
3: New, Hard top = true, Automatic, Red, 0 miles
4: New, Hard top = false, SemiAuto, Silver, 0 miles
5: Used, Hard top = true, Manual, Blue, 3000 miles
6: Used, Hard top = true, Automatic, Green, 4000 miles
現在のプログラムは、各車の注文を処理し、各注文が完了した後に要約を印刷します。car_factory
関数を呼び出すたびにCar
、注文の詳細を含む構造体が返され、注文が実行されます。結果はcar
変数に格納されます。
お気づきかもしれませんが、このプログラムにはいくつかの重要な機能がありません。すべての注文を追跡しているわけではありません。car
変数は、現在の注文の詳細のみを保持しています。関数car
の結果で変数が更新されるたびcar_factory
に、前の順序の詳細が上書きされます。
ファイリングシステムのようにすべての注文を追跡するために、プログラムを更新する必要があります。この目的のために、<K、V>ペアでハッシュマップを定義します。ハッシュマップキーは、車の注文番号に対応します。ハッシュマップ値は、Car
構造体で定義されているそれぞれの注文の詳細になります。
main
関数の先頭、最初の中括弧の直後に次のコードを追加します{
。// Initialize a hash map for the car orders
// - Key: Car order number, i32
// - Value: Car order details, Car struct
use std::collections::HashMap;
let mut orders: HashMap<i32, Car> = HashMap;
2. orders
ハッシュマップを作成するステートメントの構文の問題を修正します。
ヒント
ハッシュマップを最初から作成しているので、おそらくこの
new()
メソッドを使用することをお勧めします。
3. プログラムをビルドします。次のセクションに進む前に、コードがコンパイルされていることを確認してください。コンパイラからの警告メッセージは無視してかまいません。
次のステップは、履行された各自動車注文をハッシュマップに追加することです。
このmain
関数では、car_factory
車の注文ごとに関数を呼び出します。注文が履行された後、println!
マクロを呼び出して、car
変数に格納されている注文の詳細を表示します。
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
...
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
新しいハッシュマップで機能するように、これらのコードステートメントを修正します。
car_factory
関数の呼び出しは保持します。返された各Car
構造体は、ハッシュマップの<K、V>ペアの一部として格納されます。println!
マクロの呼び出しを更新して、ハッシュマップに保存されている注文の詳細を表示します。main
関数で、関数の呼び出しcar_factory
とそれに伴うprintln!
マクロの呼び出しを見つけます。// Car order #1: Used, Hard top
car = car_factory(order, 1000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
...
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
println!("{}: {}, Hard top = {}, {:?}, {}, {} miles", order, car.age.0, car.roof, car.motor, car.color, car.age.1);
2. すべての自動車注文のステートメントの完全なセットを次の改訂されたコードに置き換えます。
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #2: Used, Convertible
order = order + 1;
car = car_factory(order, 2000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #3: New, Hard top
order = order + 1;
car = car_factory(order, 0);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #4: New, Convertible
order = order + 1;
car = car_factory(order, 0);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #5: Used, Hard top
order = order + 1;
car = car_factory(order, 3000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
orders(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
3. 今すぐプログラムをビルドしようとすると、コンパイルエラーが表示されます。<K、V>ペアをorders
ハッシュマップに追加するステートメントに構文上の問題があります。問題がありますか?先に進んで、ハッシュマップに順序を追加する各ステートメントの問題を修正してください。
ヒント
orders
ハッシュマップに直接値を割り当てることはできません。挿入を行うにはメソッドを使用する必要があります。
プログラムが正常にビルドされると、次の出力が表示されます。
Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("Used", 1000) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 2000) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("New", 0) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("New", 0) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("Used", 3000) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 4000) })
改訂されたコードの出力が異なることに注意してください。println!
マクロディスプレイの内容Car
各値を示すことによって、構造体と対応するフィールド名。
次の演習では、ループ式を使用してコードの冗長性を減らします。
for、while、およびloop式を使用します
多くの場合、プログラムには、その場で繰り返す必要のあるコードのブロックがあります。ループ式を使用して、繰り返しの実行方法をプログラムに指示できます。電話帳のすべてのエントリを印刷するには、ループ式を使用して、最初のエントリから最後のエントリまで印刷する方法をプログラムに指示できます。
Rustは、プログラムにコードのブロックを繰り返させるための3つのループ式を提供します。
loop
:手動停止が発生しない限り、繰り返します。while
:条件が真のままで繰り返します。for
:コレクション内のすべての値に対して繰り返します。この単元では、これらの各ループ式を見ていきます。
loop
式は、無限ループを作成します。このキーワードを使用すると、式の本文でアクションを継続的に繰り返すことができます。ループを停止させるための直接アクションを実行するまで、アクションが繰り返されます。
次の例では、「We loopforever!」というテキストを出力します。そしてそれはそれ自体で止まりません。println!
アクションは繰り返し続けます。
loop {
println!("We loop forever!");
}
loop
式を使用する場合、ループを停止する唯一の方法は、プログラマーとして直接介入する場合です。特定のコードを追加してループを停止したり、Ctrl + Cなどのキーボード命令を入力してプログラムの実行を停止したりできます。
loop
式を停止する最も一般的な方法は、break
キーワードを使用してブレークポイントを設定することです。
loop {
// Keep printing, printing, printing...
println!("We loop forever!");
// On the other hand, maybe we should stop!
break;
}
プログラムがbreak
キーワードを検出すると、loop
式の本体でアクションの実行を停止し、次のコードステートメントに進みます。
break
キーワードは、特別な機能を明らかにするloop
表現を。break
キーワードを使用すると、式本体でのアクションの繰り返しを停止することも、ブレークポイントで値を返すこともできます。
次の例はbreak
、loop
式でキーワードを使用して値も返す方法を示しています。
let mut counter = 1;
// stop_loop is set when loop stops
let stop_loop = loop {
counter *= 2;
if counter > 100 {
// Stop loop, return counter value
break counter;
}
};
// Loop should break when counter = 128
println!("Break the loop at counter = {}.", stop_loop);
出力は次のとおりです。
Break the loop at counter = 128.
私たちのloop
表現の本体は、これらの連続したアクションを実行します。
stop_loop
変数を宣言します。loop
式の結果にバインドするようにプログラムに指示します。loop
式の本体でアクションを実行します:counter
値を現在の値の2倍にインクリメントします。counter
値を確認してください。counter
値が100以上です。ループから抜け出し、
counter
値を返します。
4. もしcounter
値が100以上ではありません。
ループ本体でアクションを繰り返します。
5. stop_loop
値を式のcounter
結果である値に設定しますloop
。
loop
式本体は、複数のブレークポイントを持つことができます。式に複数のブレークポイントがある場合、すべてのブレークポイントは同じタイプの値を返す必要があります。すべての値は、整数型、文字列型、ブール型などである必要があります。ブレークポイントが明示的に値を返さない場合、プログラムは式の結果を空のタプルとして解釈します()
。
while
ループは、条件式を使用しています。条件式が真である限り、ループが繰り返されます。このキーワードを使用すると、条件式がfalseになるまで、式本体のアクションを実行できます。
while
ループは、ブール条件式を評価することから始まります。条件式がと評価されるtrue
と、本体のアクションが実行されます。アクションが完了すると、制御は条件式に戻ります。条件式がと評価されるfalse
と、while
式は停止します。
次の例では、「しばらくループします...」というテキストを出力します。ループを繰り返すたびに、「カウントが5未満である」という条件がテストされます。条件が真のままである間、式本体のアクションが実行されます。条件が真でなくなった後、while
ループは停止し、プログラムは次のコードステートメントに進みます。
while counter < 5 {
println!("We loop a while...");
counter = counter + 1;
}
for
ループは、項目のコレクションを処理するためにイテレータを使用しています。ループは、コレクション内の各アイテムの式本体のアクションを繰り返します。このタイプのループの繰り返しは、反復と呼ばれます。すべての反復が完了すると、ループは停止します。
Rustでは、配列、ベクトル、ハッシュマップなど、任意のコレクションタイプを反復処理できます。Rustはイテレータを使用して、コレクション内の各アイテムを最初から最後まで移動します。
for
ループはイテレータとして一時変数を使用しています。変数はループ式の開始時に暗黙的に宣言され、現在の値は反復ごとに設定されます。
次のコードでは、コレクションはbig_birds
配列であり、イテレーターの名前はbird
です。
let big_birds = ["ostrich", "peacock", "stork"];
for bird in big_birds
iter()
メソッドを使用して、コレクション内のアイテムにアクセスします。for
式は結果にイテレータの現在の値をバインドするiter()
方法。式本体では、イテレータ値を操作できます。
let big_birds = ["ostrich", "peacock", "stork"];
for bird in big_birds.iter() {
println!("The {} is a big bird.", bird);
}
出力は次のとおりです。
The ostrich is a big bird.
The peacock is a big bird.
The stork is a big bird.
イテレータを作成するもう1つの簡単な方法は、範囲表記を使用することですa..b
。イテレータはa
値から始まりb
、1ステップずつ続きますが、値を使用しませんb
。
for number in 0..5 {
println!("{}", number * 2);
}
このコードは、0、1、2、3、および4の数値をnumber
繰り返し処理します。ループの繰り返しごとに、値を変数にバインドします。
出力は次のとおりです。
0
2
4
6
8
このコードを実行して、このRustPlaygroundでループを探索できます。
演習:ループを使用してデータを反復処理する
この演習では、自動車工場のプログラムを変更して、ループを使用して自動車の注文を反復処理します。
main
関数を更新して、注文の完全なセットを処理するためのループ式を追加します。ループ構造は、コードの冗長性を減らすのに役立ちます。コードを簡素化することで、注文量を簡単に増やすことができます。
このcar_factory
関数では、範囲外の値での実行時のパニックを回避するために、別のループを追加します。
課題は、サンプルコードを完成させて、コンパイルして実行することです。
この演習のサンプルコードで作業するには、次の2つのオプションがあります。
ノート
サンプルコードで、
todo!
マクロを探します。このマクロは、完了するか更新する必要があるコードを示します。
前回の演習でプログラムコードを閉じた場合は、この準備されたRustPlaygroundでコードを再度開くことができます。
必ずプログラムを再構築し、コンパイラエラーなしで実行されることを確認してください。
より多くの注文をサポートするには、プログラムを更新する必要があります。現在のコード構造では、冗長ステートメントを使用して6つの注文をサポートしています。冗長性は扱いにくく、維持するのが困難です。
ループ式を使用してアクションを繰り返し、各注文を作成することで、構造を単純化できます。簡略化されたコードを使用すると、多数の注文をすばやく作成できます。
main
機能、削除次の文を。このコードブロックは、order
変数を定義および設定し、自動車の注文のcar_factory
関数とprintln!
マクロを呼び出し、各注文をorders
ハッシュマップに挿入します。// Order 6 cars
// - Increment "order" after each request
// - Add each order <K, V> pair to "orders" hash map
// - Call println! to show order details from the hash map
// Initialize order variable
let mut order = 1;
// Car order #1: Used, Hard top
car = car_factory(order, 1000);
orders.insert(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
...
// Car order #6: Used, Hard top
order = order + 1;
car = car_factory(order, 4000);
orders.insert(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
2. 削除されたステートメントを次のコードブロックに置き換えます。
// Start with zero miles
let mut miles = 0;
todo!("Add a loop expression to fulfill orders for 6 cars, initialize `order` variable to 1") {
// Call car_factory to fulfill order
// Add order <K, V> pair to "orders" hash map
// Call println! to show order details from the hash map
car = car_factory(order, miles);
orders.insert(order, car);
println!("Car order {}: {:?}", order, orders.get(&order));
// Reset miles for order variety
if miles == 2100 {
miles = 0;
} else {
miles = miles + 700;
}
}
3. アクションを繰り返すループ式を追加して、6台の車の注文を作成します。order
1に初期化された変数が必要です。
4. プログラムをビルドします。コードがエラーなしでコンパイルされることを確認してください。
次の例のような出力が表示されます。
Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("Used", 1400) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 700) })
プログラムは現在、ループを使用して6台の車の注文を処理しています。6台以上注文するとどうなりますか?
main
関数のループ式を更新して、11台の車を注文します。 todo!("Update the loop expression to create 11 cars");
2. プログラムを再構築します。実行時に、プログラムはパニックになります!
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 1.26s
Running `target/debug/playground`
thread 'main' panicked at 'index out of bounds: the len is 4 but the index is 4', src/main.rs:34:29
この問題を解決する方法を見てみましょう。
このcar_factory
関数では、if / else式を使用color
して、colors
配列のインデックスの値を確認します。
// Prevent panic: Check color index for colors array, reset as needed
// Valid color = 1, 2, 3, or 4
// If color > 4, reduce color to valid index
let mut color = order as usize;
if color > 4 {
// color = 5 --> index 1, 6 --> 2, 7 --> 3, 8 --> 4
color = color - 4;
}
colors
配列には4つの要素を持ち、かつ有効なcolor
場合は、インデックスの範囲は0〜3の条件式をチェックしているcolor
私たちはをチェックしません(インデックスが4よりも大きい場合color
、その後の関数で4に等しいインデックスへのときに我々のインデックスを車の色を割り当てる配列では、インデックス値から1を減算しますcolor - 1
。color
値4はcolors[3]
、配列と同様に処理されます。)
現在のif / else式は、8台以下の車を注文するときの実行時のパニックを防ぐためにうまく機能します。しかし、11台の車を注文すると、プログラムは9番目の注文でパニックになります。より堅牢になるように式を調整する必要があります。この改善を行うために、別のループ式を使用します。
car_factory
機能、ループ式であれば/他の条件文を交換してください。color
インデックス値が4より大きい場合に実行時のパニックを防ぐために、次の擬似コードステートメントを修正してください。// Prevent panic: Check color index, reset as needed
// If color = 1, 2, 3, or 4 - no change needed
// If color > 4, reduce to color to a valid index
let mut color = order as usize;
todo!("Replace `if/else` condition with a loop to prevent run-time panic for color > 4");
ヒント
この場合、if / else条件からループ式への変更は実際には非常に簡単です。
2. プログラムをビルドします。コードがエラーなしでコンパイルされることを確認してください。
次の出力が表示されます。
Car order 1: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 2: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 3: Some(Car { color: "Red", motor: Automatic, roof: true, age: ("Used", 1400) })
Car order 4: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 5: Some(Car { color: "Blue", motor: Manual, roof: true, age: ("New", 0) })
Car order 6: Some(Car { color: "Green", motor: Automatic, roof: true, age: ("Used", 700) })
Car order 7: Some(Car { color: "Red", motor: Manual, roof: true, age: ("Used", 1400) })
Car order 8: Some(Car { color: "Silver", motor: SemiAuto, roof: false, age: ("Used", 2100) })
Car order 9: Some(Car { color: "Blue", motor: Automatic, roof: true, age: ("New", 0) })
Car order 10: Some(Car { color: "Green", motor: SemiAuto, roof: false, age: ("Used", 700) })
Car order 11: Some(Car { color: "Red", motor: Manual, roof: true, age: ("Used", 1400) })
このモジュールでは、Rustで使用できるさまざまなループ式を調べ、ハッシュマップの操作方法を発見しました。データは、キーと値のペアとしてハッシュマップに保存されます。ハッシュマップは拡張可能です。
loop
手動でプロセスを停止するまでの式は、アクションを繰り返します。while
式をループして、条件が真である限りアクションを繰り返すことができます。このfor
式は、データ収集を反復処理するために使用されます。
この演習では、自動車プログラムを拡張して、繰り返されるアクションをループし、すべての注文を処理しました。注文を追跡するためにハッシュマップを実装しました。
このラーニングパスの次のモジュールでは、Rustコードでエラーと障害がどのように処理されるかについて詳しく説明します。
リンク: https://docs.microsoft.com/en-us/learn/modules/rust-loop-expressions/
1603753200
So far in our journey through the Machine Learning universe, we covered several big topics. We investigated some regression algorithms, classification algorithms and algorithms that can be used for both types of problems (SVM**, **Decision Trees and Random Forest). Apart from that, we dipped our toes in unsupervised learning, saw how we can use this type of learning for clustering and learned about several clustering techniques.
We also talked about how to quantify machine learning model performance and how to improve it with regularization. In all these articles, we used Python for “from the scratch” implementations and libraries like TensorFlow, Pytorch and SciKit Learn. The word optimization popped out more than once in these articles, so in this and next article, we focus on optimization techniques which are an important part of the machine learning process.
In general, every machine learning algorithm is composed of three integral parts:
As you were able to see in previous articles, some algorithms were created intuitively and didn’t have optimization criteria in mind. In fact, mathematical explanations of why and how these algorithms work were done later. Some of these algorithms are Decision Trees and kNN. Other algorithms, which were developed later had this thing in mind beforehand. SVMis one example.
During the training, we change the parameters of our machine learning model to try and minimize the loss function. However, the question of how do you change those parameters arises. Also, by how much should we change them during training and when. To answer all these questions we use optimizers. They put all different parts of the machine learning algorithm together. So far we mentioned Gradient Decent as an optimization technique, but we haven’t explored it in more detail. In this article, we focus on that and we cover the grandfather of all optimization techniques and its variation. Note that these techniques are not machine learning algorithms. They are solvers of minimization problems in which the function to minimize has a gradient in most points of its domain.
Data that we use in this article is the famous Boston Housing Dataset . This dataset is composed 14 features and contains information collected by the U.S Census Service concerning housing in the area of Boston Mass. It is a small dataset with only 506 samples.
For the purpose of this article, make sure that you have installed the following _Python _libraries:
Once installed make sure that you have imported all the necessary modules that are used in this tutorial.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
Apart from that, it would be good to be at least familiar with the basics of linear algebra, calculus and probability.
Note that we also use simple Linear Regression in all examples. Due to the fact that we explore optimizationtechniques, we picked the easiest machine learning algorithm. You can see more details about Linear regression here. As a quick reminder the formula for linear regression goes like this:
where w and b are parameters of the machine learning algorithm. The entire point of the training process is to set the correct values to the w and b, so we get the desired output from the machine learning model. This means that we are trying to make the value of our error vector as small as possible, i.e. to find a global minimum of the cost function.
One way of solving this problem is to use calculus. We could compute derivatives and then use them to find places where is an extrema of the cost function. However, the cost function is not a function of one or a few variables; it is a function of all parameters of a machine learning algorithm, so these calculations will quickly grow into a monster. That is why we use these optimizers.
#ai #machine learning #python #artificaial inteligance #artificial intelligence #batch gradient descent #data science #datascience #deep learning #from scratch #gradient descent #machine learning #machine learning optimizers #ml optimization #optimizers #scikit learn #software #software craft #software craftsmanship #software development #stochastic gradient descent
1642390128
파이썬 무료 강의 (활용편6 - 이미지 처리)입니다.
OpenCV 를 이용한 다양한 이미지 처리 기법과 재미있는 프로젝트를 진행합니다.
누구나 볼 수 있도록 쉽고 재미있게 제작하였습니다. ^^
[소개]
(0:00:00) 0.Intro
(0:00:31) 1.소개
(0:02:18) 2.활용편 6 이미지 처리 소개
[OpenCV 전반전]
(0:04:36) 3.환경설정
(0:08:41) 4.이미지 출력
(0:21:51) 5.동영상 출력 #1 파일
(0:29:58) 6.동영상 출력 #2 카메라
(0:34:23) 7.도형 그리기 #1 빈 스케치북
(0:39:49) 8.도형 그리기 #2 영역 색칠
(0:42:26) 9.도형 그리기 #3 직선
(0:51:23) 10.도형 그리기 #4 원
(0:55:09) 11.도형 그리기 #5 사각형
(0:58:32) 12.도형 그리기 #6 다각형
(1:09:23) 13.텍스트 #1 기본
(1:17:45) 14.텍스트 #2 한글 우회
(1:24:14) 15.파일 저장 #1 이미지
(1:29:27) 16.파일 저장 #2 동영상
(1:39:29) 17.크기 조정
(1:50:16) 18.이미지 자르기
(1:57:03) 19.이미지 대칭
(2:01:46) 20.이미지 회전
(2:06:07) 21.이미지 변형 - 흑백
(2:11:25) 22.이미지 변형 - 흐림
(2:18:03) 23.이미지 변형 - 원근 #1
(2:27:45) 24.이미지 변형 - 원근 #2
[반자동 문서 스캐너 프로젝트]
(2:32:50) 25.미니 프로젝트 1 - #1 마우스 이벤트 등록
(2:42:06) 26.미니 프로젝트 1 - #2 기본 코드 완성
(2:49:54) 27.미니 프로젝트 1 - #3 지점 선 긋기
(2:55:24) 28.미니 프로젝트 1 - #4 실시간 선 긋기
[OpenCV 후반전]
(3:01:52) 29.이미지 변형 - 이진화 #1 Trackbar
(3:14:37) 30.이미지 변형 - 이진화 #2 임계값
(3:20:26) 31.이미지 변형 - 이진화 #3 Adaptive Threshold
(3:28:34) 32.이미지 변형 - 이진화 #4 오츠 알고리즘
(3:32:22) 33.이미지 변환 - 팽창
(3:41:10) 34.이미지 변환 - 침식
(3:45:56) 35.이미지 변환 - 열림 & 닫힘
(3:54:10) 36.이미지 검출 - 경계선
(4:05:08) 37.이미지 검출 - 윤곽선 #1 기본
(4:15:26) 38.이미지 검출 - 윤곽선 #2 찾기 모드
(4:20:46) 39.이미지 검출 - 윤곽선 #3 면적
[카드 검출 & 분류기 프로젝트]
(4:27:42) 40.미니프로젝트 2
[퀴즈]
(4:31:57) 41.퀴즈
[얼굴인식 프로젝트]
(4:41:25) 42.환경설정 및 기본 코드 정리
(4:54:48) 43.눈과 코 인식하여 도형 그리기
(5:10:42) 44.그림판 이미지 씌우기
(5:20:52) 45.캐릭터 이미지 씌우기
(5:33:10) 46.보충설명
(5:40:53) 47.마치며 (학습 참고 자료)
(5:42:18) 48.Outro
[학습자료]
수업에 필요한 이미지, 동영상 자료 링크입니다.
고양이 이미지 : https://pixabay.com/images/id-2083492/
크기 : 640 x 390
파일명 : img.jpg
고양이 동영상 : https://www.pexels.com/video/7515833/
크기 : SD (360 x 640)
파일명 : video.mp4
신문 이미지 : https://pixabay.com/images/id-350376/
크기 : 1280 x 853
파일명 : newspaper.jpg
카드 이미지 1 : https://pixabay.com/images/id-682332/
크기 : 1280 x 1019
파일명 : poker.jpg
책 이미지 : https://www.pexels.com/photo/1029807/
크기 : Small (640 x 853)
파일명 : book.jpg
눈사람 이미지 : https://pixabay.com/images/id-1300089/
크기 : 1280 x 904
파일명 : snowman.png
카드 이미지 2 : https://pixabay.com/images/id-161404/
크기 : 640 x 408
파일명 : card.png
퀴즈용 동영상 : https://www.pexels.com/video/3121459/
크기 : HD (1280 x 720)
파일명 : city.mp4
프로젝트용 동영상 : https://www.pexels.com/video/3256542/
크기 : Full HD (1920 x 1080)
파일명 : face_video.mp4
프로젝트용 캐릭터 이미지 : https://www.freepik.com/free-vector/cute-animal-masks-video-chat-application-effect-filters-set_6380101.htm
파일명 : right_eye.png (100 x 100), left_eye.png (100 x 100), nose.png (300 x 100)
무료 이미지 편집 도구 : https://pixlr.com/kr/
(Pixlr E -Advanced Editor)
#python #opencv