xxHash: Extremely Fast Non-cryptographic Hash Algorithm

xxHash - Extremely fast hash algorithm

  • XXH32 : generates 32-bit hashes, using 32-bit arithmetic
  • XXH64 : generates 64-bit hashes, using 64-bit arithmetic
  • XXH3 (since v0.8.0): generates 64 or 128-bit hashes, using vectorized arithmetic. The 128-bit variant is called XXH128.

All variants successfully complete the SMHasher test suite, which evaluates the quality of hash functions (collision, dispersion and randomness). Additional tests, which evaluate the speed and collision properties of 64-bit hashes more thoroughly, are also provided.

Benchmarks

The benchmarked reference system uses an Intel i7-9700K CPU, and runs Ubuntu x64 20.04. The open source benchmark program is compiled with clang v10.0 using the -O3 flag.

Hash Name           | Width | Bandwidth (GB/s) | Small Data Velocity | Quality | Comment
XXH3 (SSE2)         | 64    | 31.5 GB/s        | 133.1               | 10      |
XXH128 (SSE2)       | 128   | 29.6 GB/s        | 118.1               | 10      |
RAM sequential read | N/A   | 28.0 GB/s        | N/A                 | N/A     | for reference
City64              | 64    | 22.0 GB/s        | 76.6                | 10      |
T1ha2               | 64    | 22.0 GB/s        | 99.0                | 9       | Slightly worse [collisions]
City128             | 128   | 21.7 GB/s        | 57.7                | 10      |
XXH64               | 64    | 19.4 GB/s        | 71.0                | 10      |
SpookyHash          | 64    | 19.3 GB/s        | 53.2                | 10      |
Mum                 | 64    | 18.0 GB/s        | 67.0                | 9       | Slightly worse [collisions]
XXH32               | 32    | 9.7 GB/s         | 71.9                | 10      |
City32              | 32    | 9.1 GB/s         | 66.0                | 10      |
Murmur3             | 32    | 3.9 GB/s         | 56.1                | 10      |
SipHash             | 64    | 3.0 GB/s         | 43.2                | 10      |
FNV64               | 64    | 1.2 GB/s         | 62.7                | 5       | Poor avalanche properties
Blake2              | 256   | 1.1 GB/s         | 5.1                 | 10      | Cryptographic
SHA1                | 160   | 0.8 GB/s         | 5.6                 | 10      | Cryptographic but broken
MD5                 | 128   | 0.6 GB/s         | 7.8                 | 10      | Cryptographic but broken

Note 1: Small data velocity is a rough evaluation of an algorithm's efficiency on small data. For a more detailed analysis, see the next section.

Note 2: Some algorithms run faster than RAM speed. In that case, they can only reach their full speed potential when the input is already in CPU cache (L3 or better). Otherwise, they max out at the RAM speed limit.

Small data

Performance on large data is only one part of the picture. Hashing is also very useful in constructions like hash tables and bloom filters. In these use cases, it's frequent to hash a lot of small data (starting at a few bytes). An algorithm's performance can be very different in such scenarios, since parts of the algorithm, such as initialization or finalization, become a fixed cost. The impact of branch misprediction also becomes much more noticeable.

XXH3 has been designed for excellent performance on both long and small inputs, which can be observed in the following graph:

[Graph: XXH3, latency, random size]

For a more detailed analysis, please visit the wiki: https://github.com/Cyan4973/xxHash/wiki/Performance-comparison#benchmarks-concentrating-on-small-data-

Quality

Speed is not the only property that matters. Produced hash values must exhibit excellent dispersion and randomness properties, so that any sub-section of a hash can be used to maximally spread out a table or index, and so that the number of collisions stays at the minimal theoretical level predicted by the birthday paradox.

xxHash has been tested with Austin Appleby's excellent SMHasher test suite, and passes all tests, ensuring reasonable quality levels. It also passes extended tests from newer forks of SMHasher, featuring additional scenarios and conditions.

Finally, xxHash provides its own massive collision tester, able to generate and compare billions of hashes to test the limits of 64-bit hash algorithms. On this front too, xxHash features good results, in line with the birthday paradox. A more detailed analysis is documented in the wiki.
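As a rough illustration of the birthday-paradox baseline mentioned above: for n uniformly random b-bit hashes, the expected number of colliding pairs is n(n-1) / 2^(b+1). The C sketch below computes that estimate. The function name is hypothetical; it is not part of xxHash or of its collision tester.

```c
#include <assert.h>

/* Expected number of colliding pairs among n uniformly random hashes of
 * `bits` bits: n * (n - 1) / 2^(bits + 1).
 * Hypothetical helper for illustration; not part of xxHash. */
static double expected_collisions(double n, int bits)
{
    double space = 1.0;
    assert(n >= 0.0 && bits > 0 && bits <= 128);
    for (int i = 0; i < bits; i++) space *= 2.0;   /* 2^bits, exact in double */
    return n * (n - 1.0) / (2.0 * space);
}
```

For instance, expected_collisions(4294967296.0, 64) is just under 0.5, so an ideal 64-bit hash is expected to produce about half a colliding pair, on average, over 2^32 distinct inputs. A collision tester can compare its observed count against this kind of figure.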

Build modifiers

The following macros can be set at compilation time to modify libxxhash's behavior. They are generally disabled by default.

  • XXH_INLINE_ALL: Make all functions inline, with implementations being directly included within xxhash.h. Inlining functions is beneficial for speed on small keys. It's extremely effective when key length is expressed as a compile time constant, with performance improvements observed in the +200% range. See this article for details.
  • XXH_PRIVATE_API: same outcome as XXH_INLINE_ALL. Still available for legacy support. The name underlines that XXH_* symbol names will not be exported.
  • XXH_NAMESPACE: Prefixes all symbols with the value of XXH_NAMESPACE. This macro can only use compilable character set. Useful to evade symbol naming collisions, in case of multiple inclusions of xxHash's source code. Client applications still use the regular function names, as symbols are automatically translated through xxhash.h.
  • XXH_FORCE_ALIGN_CHECK: Use a faster direct read path when input is aligned. This option can result in dramatic performance improvement when input to hash is aligned on 32 or 64-bit boundaries, when running on architectures unable to load memory from unaligned addresses, or suffering a performance penalty from it. It is (slightly) detrimental on platforms with good unaligned memory access performance (same instruction for both aligned and unaligned accesses). This option is automatically disabled on x86, x64 and aarch64, and enabled on all other platforms.
  • XXH_FORCE_MEMORY_ACCESS: The default method 0 uses a portable memcpy() notation. Method 1 uses a gcc-specific packed attribute, which can provide better performance for some targets. Method 2 forces unaligned reads, which is not standard compliant, but might sometimes be the only way to extract better read performance. Method 3 uses a byteshift operation, which is best for old compilers which don't inline memcpy() or big-endian systems without a byteswap instruction.
  • XXH_VECTOR : manually select a vector instruction set (default: auto-selected at compilation time). Available instruction sets are XXH_SCALAR, XXH_SSE2, XXH_AVX2, XXH_AVX512, XXH_NEON and XXH_VSX. Compiler may require additional flags to ensure proper support (for example, gcc on linux will require -mavx2 for AVX2, and -mavx512f for AVX512).
  • XXH_NO_PREFETCH : disable prefetching. Some platforms or situations may perform better without prefetching. XXH3 only.
  • XXH_PREFETCH_DIST : select prefetching distance. For close-to-metal adaptation to specific hardware platforms. XXH3 only.
  • XXH_NO_STREAM: Disables the streaming API, limiting it to single shot variants only.
  • XXH_SIZE_OPT:
      • 0: default, optimize for speed
      • 1: default for -Os and -Oz: disables some speed hacks for size optimization
      • 2: makes code as small as possible, performance may cry
  • XXH_NO_INLINE_HINTS: By default, xxHash uses __attribute__((always_inline)) and __forceinline to improve performance at the cost of code size. Defining this macro to 1 will mark all internal functions as static, allowing the compiler to decide whether to inline a function or not. This is very useful when optimizing for smallest binary size, and is automatically defined when compiling with -O0, -Os, -Oz, or -fno-inline on GCC and Clang. This may also increase performance depending on compiler and architecture.
  • XXH32_ENDJMP: Replace the multi-branch finalization stage of XXH32 with a single jump. This is generally undesirable for performance, especially when hashing inputs of random sizes. But depending on exact architecture and compiler, a jump might provide slightly better performance on small inputs. Disabled by default.
  • XXH_NO_STDLIB: Disable invocation of <stdlib.h> functions, notably malloc() and free(). libxxhash's XXH*_createState() will always fail and return NULL. But one-shot hashing (like XXH32()) or streaming using statically allocated states still work as expected. This build flag is useful for embedded environments without dynamic allocation.
  • XXH_STATIC_LINKING_ONLY: gives access to internal state declaration, required for static allocation. Incompatible with dynamic linking, due to risks of ABI changes.
  • XXH_NO_XXH3 : removes symbols related to XXH3 (both 64 & 128 bits) from generated binary. Useful to reduce binary size, notably for applications which do not employ XXH3.
  • XXH_NO_LONG_LONG: removes compilation of algorithms relying on 64-bit types (XXH3 and XXH64). Only XXH32 will be compiled. Useful for targets (architectures and compilers) without 64-bit support.
  • XXH_IMPORT: MSVC specific: should only be defined for dynamic linking, as it prevents linkage errors.
  • XXH_CPU_LITTLE_ENDIAN: By default, endianness is determined by a runtime test resolved at compile time. If, for some reason, the compiler cannot simplify the runtime test, it can cost performance. It's possible to skip auto-detection and simply state that the architecture is little-endian by setting this macro to 1. Setting it to 0 states big-endian.
  • XXH_DEBUGLEVEL : When set to any value >= 1, enables assert() statements. This (slightly) slows down execution, but may help finding bugs during debugging sessions.
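The XXH_FORCE_MEMORY_ACCESS and XXH_CPU_LITTLE_ENDIAN entries above describe a portable memcpy() read, a byteshift read, and a runtime endianness test. The C sketch below illustrates those three ideas under stated assumptions: all function names are hypothetical and this is not xxHash's actual code.

```c
#include <stdint.h>
#include <string.h>

/* Runtime endianness probe. Optimizing compilers typically fold this
 * into a constant. Hypothetical name, for illustration only. */
static int is_little_endian(void)
{
    const union { uint32_t u; uint8_t c[4]; } one = { 1 };
    return one.c[0];
}

/* Portable unaligned read through memcpy(), in the spirit of method 0.
 * Returns the value in native byte order. */
static uint32_t read32_memcpy(const void* p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-by-byte little-endian read, in the spirit of method 3.
 * Needs neither unaligned loads nor a byteswap instruction. */
static uint32_t read_le32_byteshift(const void* p)
{
    const uint8_t* b = (const uint8_t*)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

static uint32_t swap32(uint32_t x)
{
    return ((x << 24) & 0xff000000u) | ((x << 8) & 0x00ff0000u)
         | ((x >> 8) & 0x0000ff00u) | ((x >> 24) & 0x000000ffu);
}

/* Little-endian read built on the memcpy() path plus the endianness probe. */
static uint32_t read_le32(const void* p)
{
    uint32_t v = read32_memcpy(p);
    return is_little_endian() ? v : swap32(v);
}
```

Both read_le32() and read_le32_byteshift() return the same value for any 4-byte input, aligned or not; which one is faster depends on the compiler and target, which is why the choice is exposed as a build macro.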

When compiling the Command Line Interface xxhsum using make, the following environment variables can also be set:

  • DISPATCH=1 : use xxh_x86dispatch.c, to automatically select between scalar, sse2, avx2 or avx512 instruction set at runtime, depending on local host. This option is only valid for x86/x64 systems.
  • XXH_1ST_SPEED_TARGET : select an initial speed target, expressed in MB/s, for the first speed test in benchmark mode. Benchmark will adjust the target at subsequent iterations, but the first test is made "blindly" by targeting this speed. Currently conservatively set to 10 MB/s, to support very slow (emulated) platforms.

Building xxHash - Using vcpkg

You can download and install xxHash using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install xxhash

The xxHash port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Building and Using xxHash - tipi.build

You can work on xxHash and depend on it in your tipi.build projects by adding the following entry to your .tipi/deps:

{
    "Cyan4973/xxHash": { "@": "v0.8.1" }
}

An example of such usage can be found in the /cli folder of this project which, if built as the root project, will depend on release v0.8.1 of xxHash.

To contribute to xxHash itself, use tipi.build on this repository (change the target name appropriately to linux, macos or windows):

tipi . -t <target> --test all

Example

The simplest example calls xxhash 64-bit variant as a one-shot function generating a hash value from a single buffer, and invoked from a C/C++ program:

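#include "xxhash.h"

/* hash_buffer() is a hypothetical wrapper added to make the snippet complete */
XXH64_hash_t hash_buffer(const void* buffer, size_t size, XXH64_hash_t seed)
{
    /* One-shot hashing of a single contiguous buffer */
    XXH64_hash_t hash = XXH64(buffer, size, seed);
    return hash;
}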

Streaming variant is more involved, but makes it possible to provide data incrementally:

#include <stdlib.h>   /* abort(), malloc(), free() */
#include "xxhash.h"


XXH64_hash_t calcul_hash_streaming(FileHandler fh)
{
    /* create a hash state */
    XXH64_state_t* const state = XXH64_createState();
    if (state==NULL) abort();

    size_t const bufferSize = SOME_SIZE;
    void* const buffer = malloc(bufferSize);
    if (buffer==NULL) abort();

    /* Initialize state with selected seed */
    XXH64_hash_t const seed = 0;   /* or any other value */
    if (XXH64_reset(state, seed) == XXH_ERROR) abort();

    /* Feed the state with input data, any size, any number of times */
    (...)
    while ( /* some data left */ ) {
        size_t const length = get_more_data(buffer, bufferSize, fh);
        if (XXH64_update(state, buffer, length) == XXH_ERROR) abort();
        (...)
    }
    (...)

    /* Produce the final hash value */
    XXH64_hash_t const hash = XXH64_digest(state);

    /* State could be re-used; but in this example, it is simply freed  */
    free(buffer);
    XXH64_freeState(state);

    return hash;
}

License

The library files xxhash.c and xxhash.h are BSD licensed. The utility xxhsum is GPL licensed.

Other programming languages

Beyond the C reference version, xxHash is also available from many different programming languages, thanks to great contributors. They are listed here.

Packaging status

Many distributions bundle a package manager which allows easy xxhash installation as both a libxxhash library and xxhsum command line interface.

Special Thanks

  • Takayuki Matsuoka, aka @t-mat, for creating xxhsum -c and great support during early xxh releases
  • Mathias Westerdahl, aka @JCash, for introducing the first version of XXH64
  • Devin Hussey, aka @easyaspi314, for incredible low-level optimizations on XXH3 and XXH128

Download Details:

Author: Cyan4973
Source Code: https://github.com/Cyan4973/xxHash 
License: View license

Royce Reinger

1678964100

RBDyn: Provides A Set Of Classes and Functions to Model The Dynamics

RBDyn

RBDyn provides a set of classes and functions to model the dynamics of rigid body systems.

This implementation is based on Roy Featherstone's Rigid Body Dynamics Algorithms book and other state-of-the-art publications.

Installing

Ubuntu LTS (16.04, 18.04, 20.04)

You must first set up our package mirror:

curl -1sLf \
  'https://dl.cloudsmith.io/public/mc-rtc/stable/setup.deb.sh' \
  | sudo -E bash

You can also choose the head mirror which will have the latest version of this package:

curl -1sLf \
  'https://dl.cloudsmith.io/public/mc-rtc/head/setup.deb.sh' \
  | sudo -E bash

You can then install the package:

sudo apt install librbdyn-dev python-rbdyn python3-rbdyn

vcpkg

Use the registry available here

Homebrew OS X install

Install from the command line using Homebrew:

# Use mc-rtc tap
brew tap mc-rtc/mc-rtc
# install RBDyn and its Python bindings
brew install rbdyn

Manually build from source

Dependencies

To compile you need the following tools:

For Python bindings:

Building

git clone --recursive https://github.com/jrl-umi3218/RBDyn
cd RBDyn
mkdir _build
cd _build
cmake [options] ..
make && make install

CMake options

By default, the build uses the python and pip commands to install the bindings for the default system version (this behaviour can be used to build the bindings in a given virtualenv). The following options let you control this behaviour:

  • PYTHON_BINDING: Build the Python binding (ON/OFF, default: ON)
  • PYTHON_BINDING_FORCE_PYTHON2: use python2 and pip2 instead of python and pip
  • PYTHON_BINDING_FORCE_PYTHON3: use python3 and pip3 instead of python and pip
  • PYTHON_BINDING_BUILD_PYTHON2_AND_PYTHON3: builds two sets of bindings, one with python2 and pip2, the other with python3 and pip3
  • BUILD_TESTING: Enable unit tests building (ON/OFF, default: ON)

Arch Linux

You can use the following AUR package.

Documentation

Features:

  • C++11 implementation of kinematic tree kinematics and dynamics algorithms
  • Uses the Eigen3 and SpaceVecAlg libraries
  • Free, Spherical, Planar, Cylindrical, Revolute, Prismatic joint support
  • Translation, Rotation, Vector, CoM, Momentum Jacobian computation
  • Inverse Dynamics, Forward Dynamics
  • Inverse Dynamic Identification Model (IDIM)
  • Kinematics tree body merging/filtering
  • Kinematics tree base selection
  • Python binding

To make sure that RBDyn works as intended, unit tests are available for each algorithm. Besides, the library has been used extensively to control humanoid robots such as HOAP-3, HRP-2, HRP-4 and Atlas.

A short tutorial is available here.

The SpaceVecAlg and RBDyn tutorial is also a great resource for understanding how to use RBDyn: it provides many IPython notebooks that present real use cases.

Doxygen documentation is available online.


Download Details:

Author: jrl-umi3218
Source Code: https://github.com/jrl-umi3218/RBDyn 
License: BSD-2-Clause license

Rupert Beatty

1677143843

SigmaSwiftStatistics: σ (sigma) - Statistics Library Written in Swift

σ (sigma) - statistics library written in Swift

This library is a collection of functions that perform statistical calculations in Swift. It can be used in Swift apps for Apple devices and in open source Swift programs on other platforms.

Setup

There are four ways you can add Sigma to your project.

Add source (iOS 7+)

Simply add the SigmaDistrib.swift file to your project.

Setup with Carthage (iOS 8+)

Alternatively, add github "evgenyneu/SigmaSwiftStatistics" ~> 9.0 to your Cartfile and run carthage update.

Setup with CocoaPods (iOS 8+)

If you are using CocoaPods add this text to your Podfile and run pod install.

use_frameworks!
target 'Your target name'
pod 'SigmaSwiftStatistics', '~> 9.0'

Setup with Swift Package Manager

Legacy Swift versions

Set up a previous version of the library if you use an older version of Swift.

Usage

Add import SigmaSwiftStatistics to your source code unless you used the file setup method.

Average / mean

Computes arithmetic mean of values in the array.

Note:

  • Returns nil for an empty array.
  • Same as AVERAGE in Microsoft Excel and Google Docs Sheets.

Formula

A = Σ(x) / n

Where:

  • n is the number of values.

Sigma.average([1, 3, 8])
// Result: 4

Central moment

Computes central moment of the dataset.

Note:

  • Returns nil for an empty array.
  • Same as in Wolfram Alpha and "moments" R package.

Formula

Σ(x - m)^k / n

Where:

  • m is the sample mean.
  • k is the order of the moment (0, 1, 2, 3, ...).
  • n is the sample size.

Sigma.centralMoment([3, -1, 1, 4.1, 4.1, 0.7], order: 3)
// Result: -1.5999259259
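To make the central moment formula concrete, here is a minimal C sketch of Σ(x - m)^k / n. It mirrors the formula above rather than Sigma's Swift implementation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Central moment of order k: sum((x - m)^k) / n, where m is the mean.
 * The caller must pass n > 0 (the Swift library returns nil for an
 * empty array instead). Hypothetical name, for illustration only. */
static double central_moment(const double* x, size_t n, unsigned k)
{
    double mean = 0.0, sum = 0.0;
    assert(x != NULL && n > 0);
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double term = 1.0;
        for (unsigned j = 0; j < k; j++) term *= x[i] - mean;   /* (x - m)^k */
        sum += term;
    }
    return sum / (double)n;
}
```

Called with the six values from the example and k = 3, it returns approximately -1.5999259259. With k = 0 every term is 1, so the result is exactly 1.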

Covariance of a population

Computes covariance of the entire population between two variables: x and y.

Note:

  • Returns nil if arrays x and y have different number of values.
  • Returns nil for empty arrays.
  • Same as COVAR and COVARIANCE.P functions in Microsoft Excel and COVAR in Google Docs Sheets.

Formula

cov(x,y) = Σ(x - mx)(y - my) / n

Where:

  • mx is the population mean of the first variable.
  • my is the population mean of the second variable.
  • n is the total number of values.

let x = [1, 2, 3.5, 3.7, 8, 12]
let y = [0.5, 1, 2.1, 3.4, 3.4, 4]
Sigma.covariancePopulation(x: x, y: y)
// Result: 4.19166666666667

Covariance of a sample

Computes sample covariance between two variables: x and y.

Note:

  • Returns nil if arrays x and y have different number of values.
  • Returns nil for empty arrays or arrays containing a single element.
  • Same as COVARIANCE.S function in Microsoft Excel.

Formula

cov(x,y) = Σ(x - mx)(y - my) / (n - 1)

Where:

  • mx is the sample mean of the first variable.
  • my is the sample mean of the second variable.
  • n is the total number of values.

let x = [1, 2, 3.5, 3.7, 8, 12]
let y = [0.5, 1, 2.1, 3.4, 3.4, 4]
Sigma.covarianceSample(x: x, y: y)
// Result: 5.03
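The two covariance formulas differ only in the divisor (n for a population, n - 1 for a sample). A minimal C sketch, with hypothetical function names and no relation to Sigma's Swift source:

```c
#include <assert.h>
#include <stddef.h>

/* Sum of products of deviations: sum((x - mx) * (y - my)). */
static double co_deviation_sum(const double* x, const double* y, size_t n)
{
    double mx = 0.0, my = 0.0, sum = 0.0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;
    for (size_t i = 0; i < n; i++) sum += (x[i] - mx) * (y[i] - my);
    return sum;
}

/* Population covariance: divide by n. Requires n >= 1. */
static double covariance_population(const double* x, const double* y, size_t n)
{
    assert(n >= 1);
    return co_deviation_sum(x, y, n) / (double)n;
}

/* Sample covariance: divide by n - 1. Requires n >= 2. */
static double covariance_sample(const double* x, const double* y, size_t n)
{
    assert(n >= 2);
    return co_deviation_sum(x, y, n) / (double)(n - 1);
}
```

With the x and y arrays from the examples, the shared sum of products is 25.15, which gives 25.15 / 6 ≈ 4.1917 for the population and 25.15 / 5 = 5.03 for the sample.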

Coefficient of variation of a sample

Computes coefficient of variation based on a sample.

Note:

  • Returns nil when the array is empty or contains a single value.
  • Returns Double.infinity if the mean is zero.
  • Same as in Wolfram Alpha and in "raster" R package (expressed as a percentage in "raster").

Formula

CV = s / m

Where:

  • s is the sample standard deviation.
  • m is the mean.

Sigma.coefficientOfVariationSample([1, 12, 19.5, -5, 3, 8])
// Result: 1.3518226672

Frequencies

Returns a dictionary with the keys containing the numbers from the input array and the values corresponding to the frequencies of those numbers.

Sigma.frequencies([1, 2, 3, 4, 5, 4, 4, 3, 5])
// Result: [2:1, 3:2, 4:3, 5:2, 1:1]

Kurtosis A

Returns the kurtosis of a series of numbers.

Note:

  • Returns nil if the dataset contains fewer than 4 values.
  • Returns nil if all the values in the dataset are the same.
  • Same as KURT in Microsoft Excel and Google Docs Sheets.

Formula

[Image: kurtosis formula]

Sigma.kurtosisA([2, 1, 3, 4.1, 19, 1.5])
// Result: 5.4570693277

Kurtosis B

Returns the kurtosis of a series of numbers.

Note:

  • Returns nil if the dataset contains fewer than 2 values.
  • Returns nil if all the values in the dataset are the same.
  • Same as in Wolfram Alpha and "moments" R package.

Formula

[Image: kurtosis formula]

Sigma.kurtosisB([2, 1, 3, 4.1, 19, 1.5])
// Result: 4.0138523409
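The formula images for both kurtosis variants are missing from this text. The C sketch below therefore rests on an assumption drawn from the notes: Kurtosis A uses the bias-corrected sample excess kurtosis documented for Excel's KURT, and Kurtosis B uses the moment ratio m4 / m2^2 as in the "moments" R package. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of (x - mean)^p over the dataset, for p = 2 or p = 4. */
static double deviation_power_sum(const double* x, size_t n, int p)
{
    double mean = 0.0, sum = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d2 = (x[i] - mean) * (x[i] - mean);
        sum += (p == 4) ? d2 * d2 : d2;
    }
    return sum;
}

/* Assumed Kurtosis A: n(n+1)/((n-1)(n-2)(n-3)) * sum(((x-m)/s)^4)
 *                     - 3(n-1)^2/((n-2)(n-3)),
 * with s the sample standard deviation. Requires n >= 4 and
 * non-constant data. */
static double kurtosis_a(const double* x, size_t n)
{
    assert(n >= 4);
    double dn = (double)n;
    double s2 = deviation_power_sum(x, n, 2) / (dn - 1.0);   /* sample variance */
    double z4 = deviation_power_sum(x, n, 4) / (s2 * s2);    /* sum(((x-m)/s)^4) */
    return dn * (dn + 1.0) / ((dn - 1.0) * (dn - 2.0) * (dn - 3.0)) * z4
         - 3.0 * (dn - 1.0) * (dn - 1.0) / ((dn - 2.0) * (dn - 3.0));
}

/* Assumed Kurtosis B: m4 / m2^2, where m2 and m4 are the second and
 * fourth central moments. Requires n >= 2 and non-constant data. */
static double kurtosis_b(const double* x, size_t n)
{
    assert(n >= 2);
    double m2 = deviation_power_sum(x, n, 2) / (double)n;
    double m4 = deviation_power_sum(x, n, 4) / (double)n;
    return m4 / (m2 * m2);
}
```

On the example dataset these two functions give values close to the results listed for Sigma.kurtosisA and Sigma.kurtosisB, which supports the assumption but does not prove that the library uses these exact expressions.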

Max

Returns the maximum value in the array.

Note: returns nil for an empty array.

Sigma.max([1, 8, 3])
// Result: 8

Median

Returns the median value from the array.

Note:

  • Returns nil when the array is empty.
  • Returns the mean of the two middle values if there is an even number of items in the array.
  • Same as MEDIAN in Microsoft Excel and Google Docs Sheets.

Sigma.median([1, 12, 19.5, 3, -5])
// Result: 3

Median high

Returns the median value from the array.

Note:

  • Returns nil when the array is empty.
  • Returns the higher of the two middle values if there is an even number of items in the array.

Sigma.medianHigh([1, 12, 19.5, 10, 3, -5])
// Result: 10

Median low

Returns the median value from the array.

Note:

  • Returns nil when the array is empty.
  • Returns the lower of the two middle values if there is an even number of items in the array.

Sigma.medianLow([1, 12, 19.5, 10, 3, -5])
// Result: 3
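The three median variants differ only in how they treat the two middle values of an even-sized dataset. The C sketch below shows one way to express that; the enum and function names are hypothetical, and this is not how Sigma implements it in Swift.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int compare_doubles(const void* a, const void* b)
{
    double da = *(const double*)a, db = *(const double*)b;
    return (da > db) - (da < db);
}

typedef enum { MEDIAN_MEAN, MEDIAN_LOW, MEDIAN_HIGH } median_mode;

/* Median of n > 0 values. For an even count, `mode` selects the mean of
 * the two middle values, the lower one, or the higher one. The input is
 * left untouched: a private copy is sorted. Aborts if allocation fails. */
static double median(const double* x, size_t n, median_mode mode)
{
    assert(x != NULL && n > 0);
    double* sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) abort();
    memcpy(sorted, x, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    double lo = sorted[(n - 1) / 2];   /* lower middle (the middle itself when n is odd) */
    double hi = sorted[n / 2];         /* upper middle */
    free(sorted);

    if (mode == MEDIAN_LOW)  return lo;
    if (mode == MEDIAN_HIGH) return hi;
    return (lo + hi) / 2.0;
}
```

For the six-value example, the sorted data is [-5, 1, 3, 10, 12, 19.5], so the low median is 3, the high median is 10, and the regular median is their mean, 6.5.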

Min

Returns the minimum value in the array.

Note: returns nil for an empty array.

Sigma.min([7, 2, 3])
// Result: 2

Normal distribution

Returns the normal distribution for the given values of x, μ and σ. The returned value is the area under the normal curve to the left of the value x.

Note:

  • Returns nil if σ is zero or negative.
  • Defaults: μ = 0, σ = 1.
  • Same as NORM.S.DIST, NORM.DIST and NORMDIST Excel functions and NORMDIST function in Google Docs sheet with cumulative argument equal to true.

Sigma.normalDistribution(x: -1, μ: 0, σ: 1)
// Result: 0.1586552539314570

Normal density

Returns density of the normal function for the given values of x, μ and σ.

Note:

  • Returns nil if σ is zero or negative.
  • Defaults: μ = 0, σ = 1.
  • Same as NORM.S.DIST, NORM.DIST and NORMDIST Excel functions and NORMDIST function in Google Docs sheet with cumulative argument equal to false.

Formula

f(x) = exp( -(x - μ)^2 / (2σ^2) ) / ( σ * sqrt(2π) )

Where:

  • x is the input value of the normal density function.
  • μ is the mean.
  • σ is the standard deviation.

Sigma.normalDensity(x: 0, μ: 0, σ: 1)
// Result: 0.3989422804014327

Normal quantile

Returns the quantile function for the normal distribution (the inverse of normal distribution). The p argument is the probability, or the area under the normal curve to the left of the returned value.

Note:

  • Returns nil if σ is zero or negative.
  • Returns nil if p is negative or greater than one.
  • Returns -Double.infinity if p is zero, and Double.infinity if p is one.
  • Defaults: μ = 0, σ = 1.
  • Same as NORM.INV, NORM.S.INV and NORMINV Excel functions and NORMINV, NORMSINV Google Docs sheet functions.

Sigma.normalQuantile(p: 0.025, μ: 0, σ: 1)
// Result: -1.9599639845400538

Pearson correlation coefficient

Calculates the Pearson product-moment correlation coefficient between two variables: x and y.

Note:

  • Returns nil if arrays x and y have different number of values.
  • Returns nil for empty arrays.
  • Same as CORREL and PEARSON functions in Microsoft Excel and Google Docs Sheets.

Formula

p(x,y) = cov(x,y) / (σx * σy)

Where:

  • cov is the population covariance.
  • σ is the population standard deviation.

let x = [1, 2, 3.5, 3.7, 8, 12]
let y = [0.5, 1, 2.1, 3.4, 3.4, 4]
Sigma.pearson(x: x, y: y)
// Result: 0.843760859352745

Percentile

Calculates the Percentile value for the given dataset.

Note:

  • Returns nil when the values array is empty.
  • Returns nil when supplied percentile parameter is negative or greater than 1.
  • Same as PERCENTILE or PERCENTILE.INC in Microsoft Excel and PERCENTILE in Google Docs Sheets.
  • Same as the 7th sample quantile method from the Hyndman and Fan paper (1996).

See the Percentile method documentation for more information.

// Calculate 40th percentile
Sigma.percentile([35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29

// Same as
Sigma.quantiles.method7([35, 20, 50, 40, 15], probability: 0.4)
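Method 7 of Hyndman and Fan interpolates linearly between the sorted values at the zero-based position h = (n - 1) * p. The C sketch below follows that definition; the function name is hypothetical and the code is not a port of Sigma's Swift implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int compare_doubles(const void* a, const void* b)
{
    double da = *(const double*)a, db = *(const double*)b;
    return (da > db) - (da < db);
}

/* Quantile by linear interpolation between order statistics
 * (Hyndman-Fan method 7). Requires n > 0 and 0 <= p <= 1.
 * Sorts a private copy of the input; aborts if allocation fails. */
static double quantile_method7(const double* x, size_t n, double p)
{
    assert(x != NULL && n > 0 && p >= 0.0 && p <= 1.0);
    double* sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) abort();
    memcpy(sorted, x, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    double h = (double)(n - 1) * p;        /* fractional zero-based position */
    size_t lower = (size_t)h;              /* floor, since h >= 0 */
    size_t upper = (lower + 1 < n) ? lower + 1 : lower;
    double result = sorted[lower]
                  + (h - (double)lower) * (sorted[upper] - sorted[lower]);
    free(sorted);
    return result;
}
```

For the 40th percentile example, the sorted data is [15, 20, 35, 40, 50] and h = 4 * 0.4 = 1.6, so the result is 20 + 0.6 * (35 - 20) = 29.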

Quantiles

Collection of nine functions that calculate sample quantiles corresponding to the given probability. This is an implementation of the nine algorithms described in the Hyndman and Fan paper (1996). The documentation of the functions is based on R and Wikipedia.

Note:

  • Returns nil if the dataset is empty.
  • Returns nil if the probability is outside the [0, 1] range.
  • Same as quantile function in R.

Quantile method 1

This method calculates quantiles using the inverse of the empirical distribution function.

Sigma.quantiles.method1([1, 12, 19.5, -5, 3, 8], probability: 0.5)
// Result: 3

Quantile method 2

This method uses inverted empirical distribution function with averaging.

Sigma.quantiles.method2([1, 12, 19.5, -5, 3, 8], probability: 0.5)
// Result: 5.5

Quantile method 3

Sigma.quantiles.method3([1, 12, 19.5, -5, 3, 8], probability: 0.5)
// Result: 3

Quantile method 4

The method uses linear interpolation of the empirical distribution function.

Sigma.quantiles.method4([1, 12, 19.5, -5, 3, 8], probability: 0.17)
// Result: -4.88

Quantile method 5

This method uses a piecewise linear function where the knots are the values midway through the steps of the empirical distribution function.

Sigma.quantiles.method5([1, 12, 19.5, -5, 3, 8], probability: 0.11)
// Result: -4.04

Quantile method 6

This method is implemented in Microsoft Excel (PERCENTILE.EXC), Minitab and SPSS. It uses linear interpolation of the expectations for the order statistics for the uniform distribution on [0,1].

Sigma.quantiles.method6([1, 12, 19.5, -5, 3, 8], probability: 0.1999)
// Result: -2.6042

Quantile method 7

This method is implemented in S, Microsoft Excel (PERCENTILE or PERCENTILE.INC) and Google Docs Sheets (PERCENTILE). It uses linear interpolation of the modes for the order statistics for the uniform distribution on [0, 1].

Sigma.quantiles.method7([1, 12, 19.5, -5, 3, 8], probability: 0.00001)
// Result: -4.9997

Quantile method 8

The quantiles returned by the method are approximately median-unbiased regardless of the distribution of x.

Sigma.quantiles.method8([1, 12, 19.5, -5, 3, 8], probability: 0.11)
// Result: -4.82

Quantile method 9

The quantiles returned by this method are approximately unbiased for the expected order statistics if x is normally distributed.

Sigma.quantiles.method9([1, 12, 19.5, -5, 3, 8], probability: 0.10001)
// Result: -4.999625

Rank

Returns the ranks of the values in the dataset.

Note:

Receives an optional ties parameter that determines how the ranks for the equal values ('ties') are calculated. Default value is .average. Possible values:

  • .average: uses the average rank. Same as RANK.AVG in Microsoft Excel and Google Docs Sheets.
  • .min, .max: uses the minimum/maximum rank. The value .min is the same as RANK and RANK.EQ in Microsoft Excel and Google Docs Sheets.
  • .first, .last: the ranks are incremented/decremented.

Same as rank function in R.

Sigma.rank([2, 3, 6, 5, 3], ties: .average)
// Result: [1.0, 2.5, 5.0, 4.0, 2.5]
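The default .average tie rule can be sketched as follows: a value's rank is the mean of the 1-based positions that its group of equal values would occupy in sorted order. This C version is a simple O(n^2) illustration with a hypothetical function name, not Sigma's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Ranks with ties averaged. For each value, `less` counts strictly
 * smaller values and `equal` counts values equal to it (itself
 * included). The tied group occupies positions less+1 .. less+equal,
 * whose mean is less + (equal + 1) / 2. Writes n ranks into `ranks`. */
static void rank_average(const double* x, size_t n, double* ranks)
{
    assert(n == 0 || (x != NULL && ranks != NULL));
    for (size_t i = 0; i < n; i++) {
        size_t less = 0, equal = 0;
        for (size_t j = 0; j < n; j++) {
            if (x[j] < x[i]) less++;
            else if (x[j] == x[i]) equal++;
        }
        ranks[i] = (double)less + ((double)equal + 1.0) / 2.0;
    }
}
```

For [2, 3, 6, 5, 3], the two 3s share positions 2 and 3 and both receive rank 2.5, which matches the example above.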

Skewness A

Returns the skewness of the dataset.

Note:

  • Returns nil if the dataset contains fewer than 3 values.
  • Returns nil if all the values in the dataset are the same.
  • Same as SKEW in Microsoft Excel and Google Docs Sheets.

Formula

[Image: skewness formula]

Sigma.skewnessA([4, 2.1, 8, 21, 1])
// Result: 1.6994131524

Skewness B

Returns the skewness of the dataset.

Note:

  • Returns nil if the dataset contains fewer than 3 values.
  • Returns nil if all the values in the dataset are the same.
  • Same as in Wolfram Alpha, SKEW.P in Microsoft Excel and skewness function in "moments" R package.

Formula

[Image: skewness formula]

Sigma.skewnessB([4, 2.1, 8, 21, 1])
// Result: 1.1400009992

Standard deviation of a population

Computes standard deviation of entire population.

Note:

  • Returns nil for an empty array.
  • Same as STDEVP and STDEV.P in Microsoft Excel and STDEVP in Google Docs Sheets.

Formula

σ = sqrt( Σ( (x - m)^2 ) / n )

Where:

  • m is the population mean.
  • n is the population size.

Sigma.standardDeviationPopulation([1, 12, 19.5, -5, 3, 8])
// Result: 7.918420858282849

Standard deviation of a sample

Computes standard deviation based on a sample.

Note:

  • Returns nil when the array is empty or contains a single value.
  • Same as STDEV and STDEV.S in Microsoft Excel and STDEV in Google Docs Sheets.

Formula

s = sqrt( Σ( (x - m)^2 ) / (n - 1) )

Where:

  • m is the sample mean.
  • n is the sample size.

Sigma.standardDeviationSample([1, 12, 19.5, -5, 3, 8])
// Result: 8.674195447801869

Standard error of the mean

Computes standard error of the mean.

Note:

  • Returns nil when the array is empty or contains a single value.

Formula

SE = s / sqrt(n)

Where:

  • s is the sample standard deviation.
  • n is the sample size.

Sigma.standardErrorOfTheMean([1, 12, 19.5, -5, 3, 8])
// Result: 3.5412254627

Sum

Computes sum of values from the array.

Sigma.sum([1, 3, 8])
// Result: 12

Unique values

Returns an unsorted array containing all values that occur within the input array without the duplicates.

Sigma.uniqueValues([2, 1, 3, 4, 5, 4, 3, 5])
// Result: [2, 3, 4, 5, 1]

Variance of a population

Computes variance of entire population.

Note:

  • Returns nil when the array is empty.
  • Same as VAR.P or VARPA in Microsoft Excel and VARP or VARPA in Google Docs Sheets.

Formula

σ^2 = Σ( (x - m)^2 ) / n

Where:

  • m is the population mean.
  • n is the population size.

Sigma.variancePopulation([1, 12, 19.5, -5, 3, 8])
// Result: 62.70138889

Variance of a sample

Computes variance based on a sample.

Note:

  • Returns nil when the array is empty or contains a single value.
  • Same as VAR, VAR.S or VARA in Microsoft Excel and VAR or VARA in Google Docs Sheets.

Formula

s^2 = Σ( (x - m)^2 ) / (n - 1)

Where:

  • m is the sample mean.
  • n is the sample size.

Sigma.varianceSample([1, 12, 19.5, -5, 3, 8])
// Result: 75.24166667

Feedback is welcome

If you need help or want to extend the library feel free to create an issue or submit a pull request.

Help will always be given at Hogwarts to those who ask for it.

-- J.K. Rowling, Harry Potter

Contributors

Download Details:

Author: Evgenyneu
Source Code: https://github.com/evgenyneu/SigmaSwiftStatistics 
License: MIT license

CurveFit.jl: Simple Least Squares and Curve Fitting Functions

CurveFit

A package that implements a few curve fitting functions.

Linear Least squares

Linear least squares is a commonly used technique for approximating a discrete set of data. Given the sets of points x[i] and y[i] and a list of functions f_i(x), the least squares method finds coefficients a[i] such that

a[1]*f_1(x) + a[2]*f_2(x) + ... + a[n]*f_n(x)

minimizes the squares of the errors in relation to y[i].

The basic function is implemented using QR decomposition: A \ y:

coefs = A \ y

where A[:,i] = f_i(x). While x is usually a single variable, the same procedure can be used when several independent variables are required, something along the lines of: A[:,i] = f_i(x1, x2, ..., xn).

Several typical cases are possible:

  • linear_fit(x, y) finds coefficients a and b for y[i] = a + b*x[i]
  • power_fit(x, y) finds coefficients a and b for y[i] = a*x[i]^b
  • log_fit(x, y) finds coefficients a and b for y[i] = a + b*log(x[i])
  • exp_fit(x, y) finds coefficients a and b for y[i] = a*exp(b*x[i])
  • expsum_fit(x, y, 2, withconst = true) finds coefficients k, p, and λ for y[i] = k + p[1]*exp(λ[1]*x[i]) + p[2]*exp(λ[2]*x[i])
  • poly_fit(x, y, n) finds coefficients a[k] for y[i] = a[1] + a[2]*x[i] + a[3]*x[i]^2 + ... + a[n+1]*x[i]^n
  • linear_king_fit(E, U) finds coefficients a and b for E[i]^2 = a + b*U[i]^0.5
  • linear_rational_fit(x, y, p, q) finds the coefficients for rational polynomials: y[i] = (a[1] + a[2]*x[i] + ... + a[p+1]*x[i]^p) / (1 + a[p+1+1]*x[i] + ... + a[p+1+q]*x[i]^q)
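For the straight-line case y[i] = a + b*x[i], the least-squares solution that A \ y would return also has a well-known closed form: b = Σ(x - mx)(y - my) / Σ(x - mx)^2 and a = my - b*mx. The C sketch below illustrates that math only. It is not CurveFit's implementation (which, per the text above, relies on QR decomposition), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficients of the fitted line y = a + b*x. Hypothetical type. */
typedef struct { double a; double b; } line_coefs;

/* Closed-form least-squares fit of a straight line.
 * Requires n >= 2 and at least two distinct x values. */
static line_coefs linear_fit_closed_form(const double* x, const double* y, size_t n)
{
    double mx = 0.0, my = 0.0, sxy = 0.0, sxx = 0.0;
    line_coefs c;
    assert(x != NULL && y != NULL && n >= 2);
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    assert(sxx > 0.0);          /* all x equal: the slope is undefined */
    c.b = sxy / sxx;            /* slope */
    c.a = my - c.b * mx;        /* intercept */
    return c;
}
```

As a worked example, the points (0, 0), (1, 1), (2, 1) give mx = 1, my = 2/3, Σ(x - mx)(y - my) = 1 and Σ(x - mx)^2 = 2, hence b = 0.5 and a = 1/6.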

Nonlinear least squares

Sometimes the fitting function is non linear with respect to the fitting coefficients. In this case, given an approximation of the coefficients, the fitting function is linearized around this approximation and linear least squares is used to calculate a correction to the approximate coefficients. This iteration is repeated until convergence is reached. The fitting function has the following form:

f(x_1, x_2, x_3, ..., x_n, a_1, a_2, ...,  a_p) = 0

where xi are the known data points and ai are the coefficients that should be fitted.

When the model formula is not linear on the fitting coefficients, a nonlinear algorithm is necessary. This library implements a Newton-type algorithm that doesn't explicitly need derivatives. This is implemented in the function:

coefs, converged, iter = nonlinear_fit(x, fun, a0, eps=1e-7, maxiter=200)

In this function, x is an array where each column represents a different variable of the data set, fun is a callable that returns the fitting error and should be callable with the following signature:

residual = fun(x, a)

where x is a vector representing a row of the argument array x and a is an estimate of the fitting coefficients which should all be different from zero (to provide a scale). eps and maxiter are convergence parameters.

The nonlinear_fit function is used to implement the following fitting functions.

  • king_fit(E, U) finds coefficients a, b and n for E[i]^2 = a + b*U[i]^n
  • rational_fit works just like linear_rational_fit but tries to improve the results using nonlinear least squares (nonlinear_fit)

Generic interface

A generic interface was developed to have a common interface for all curve fitting possibilities and to make it easy to use the results:

fit = curve_fit(::Type{T}, x, y...)

where T is the curve fitting type.

The following cases are implemented:

  • curve_fit(LinearFit, x, y)
  • curve_fit(LogFit, x, y)
  • curve_fit(PowerFit, x, y)
  • curve_fit(ExpFit, x, y)
  • curve_fit(Polynomial, x, y, n=1)
  • curve_fit(LinearKingFit, E, U)
  • curve_fit(KingFit, E, U)
  • curve_fit(RationalPoly, x, y, p, q)

The curve_fit generic function returns an object that can be used to compute estimates of the model with apply_fit. call is overloaded so that the object can also be used as a function.

Example

using PyPlot
using CurveFit

x = 0.0:0.02:2.0
y0 = @. 1 + x + x*x + randn()/10
fit = curve_fit(Polynomial, x, y0, 2)
y0b = fit.(x) 
plot(x, y0, "o", x, y0b, "r-", linewidth=3)

King's law

In hotwire anemometry, a simple expression for the calibration curve of the probe is known as King's law, expressed as:

E^2 = A + B*sqrt(U)

where E is voltage on the anemometer bridge, U is the flow velocity. The coefficients A and B are obtained from a calibration. The function linear_king_fit estimates coefficients A and B.

A better approximation for the calibration curve is known as modified King's law:

E^2 = A + B*U^n

Now, this is a nonlinear curve fit. The linear fit (linear_king_fit) is usually a very good first guess for the coefficients (where n=0.5). This curve fit is implemented in function king_fit.

Example

using PyPlot
using CurveFit

U = 1.0:20.0
E = @. sqrt(2 + 1 * U ^ 0.45) + randn()/60
e = range(minimum(E), maximum(E), length=50)

f1 = curve_fit(KingFit, E, U)
U1 = f1.(e)

f2 = curve_fit(Polynomial, E, U, 5)
U2 = f2.(e)

plot(U, E, "o", U1, e, "r-", U2, e, "g-", linewidth=3)


Download Details:

Author: Pjabardo
Source Code: https://github.com/pjabardo/CurveFit.jl 
License: View license


Types and Helper Functions for Dealing with The HTTP in Julia

HttpCommon.jl

Installation: julia> Pkg.add("HttpCommon")

This package provides types and helper functions for dealing with the HTTP protocol in Julia:

  • Types to represent Headers, Requests, Cookies, and Responses
  • A dictionary of STATUS_CODES that maps HTTP codes to descriptions
  • Utility functions escapeHTML and parsequerystring

HTTP Types

Headers

Headers represents the header fields for an HTTP request, and is a type alias for Dict{AbstractString,AbstractString}. There is a default constructor, headers, that produces a reasonable default set of headers:

Dict( "Server"           => "Julia/$VERSION",
      "Content-Type"     => "text/html; charset=utf-8",
      "Content-Language" => "en",
      "Date"             => Dates.format(now(Dates.UTC), Dates.RFC1123Format) )

Request

A Request represents an HTTP request sent by a client to a server. It has five fields:

  • method: an HTTP method string (e.g. "GET")
  • resource: the resource requested (e.g. "/hello/world")
  • headers: see Headers above
  • data: the data in the request as a vector of bytes
  • uri: additional details, normally not used

Cookie

A Cookie represents an HTTP cookie. It has three fields: name and value are strings, and attrs is a dictionary of pairs of strings.

Response

A Response represents an HTTP response sent to a client by a server. It has six fields:

  • status: HTTP status code (see STATUS_CODES) [default: 200]
  • headers: Headers [default: HttpCommon.headers()]
  • cookies: Dictionary of strings => Cookies
  • data: the response data as a vector of bytes [default: UInt8[]]
  • finished: true if the Response is valid, meaning that it can be converted to an actual HTTP response [default: false]
  • requests: the history of requests that generated the response. Can be greater than one if a redirect was involved.

Response has many constructors - use methods(Response) for full list.

Constants

STATUS_CODES

STATUS_CODES is a dictionary (Int => AbstractString) that maps all the status codes defined in RFCs to their descriptions, e.g.

STATUS_CODES[200] # => "OK"
STATUS_CODES[404] # => "Not Found"
STATUS_CODES[418] # => "I'm a teapot"
STATUS_CODES[500] # => "Internal Server Error"

Utility functions

escapeHTML(i::AbstractString)

Returns a string with special HTML characters escaped: &, <, >, ", '

parsequerystring(query::AbstractString)

Convert a valid querystring to a Dict:

q = "foo=bar&baz=%3Ca%20href%3D%27http%3A%2F%2Fwww.hackershool.com%27%3Ehello%20world%21%3C%2Fa%3E"
parsequerystring(q)
# Dict{String,String} with 2 entries:
#   "baz" => "<a href='http://www.hackershool.com'>hello world!</a>"
#   "foo" => "bar"

:::::::::::::
::         ::
:: Made at ::
::         ::
:::::::::::::
     ::
Hacker School
:::::::::::::

Download Details:

Author: JuliaWeb
Source Code: https://github.com/JuliaWeb/HttpCommon.jl 
License: MIT license


Simplified parallel processing for PHP based on Amp

parallel-functions

Simplified parallel processing for PHP based on Amp

Installation

This package can be installed as a Composer dependency.

composer require amphp/parallel-functions

Requirements

  • PHP 7.4+

Documentation

Documentation can be found on amphp.org as well as in the ./docs directory.

Example

<?php

use function Amp\ParallelFunctions\parallelMap;
use function Amp\Promise\wait;

$responses = wait(parallelMap([
    'https://google.com/',
    'https://github.com/',
    'https://stackoverflow.com/',
], function ($url) {
    return file_get_contents($url);
}));

Further examples can be found in the ./examples directory.

Versioning

amphp/parallel-functions follows the semver semantic versioning specification like all other amphp packages.

Security

If you discover any security related issues, please email me@kelunik.com instead of using the issue tracker.

Download Details:

Author: amphp
Source Code: https://github.com/amphp/parallel-functions 
License: MIT license

Rupert Beatty

1672045020

Higher-Order Functions in JavaScript

Functions that take another function as an argument, or that define a function as the return value, are called higher-order functions.

JavaScript supports higher-order functions. This ability, among other characteristics, makes JavaScript one of the programming languages well-suited for functional programming.

JavaScript Treats Functions as First-Class Citizens

You may have heard that JavaScript functions are first-class citizens. This means functions in JavaScript are objects.

They are instances of Object (even though typeof reports "function" for them), they can be assigned as the value of a variable, and they can be passed and returned just like any other reference variable.
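As a minimal sketch of what "first-class" means in practice, the snippet below treats one function as an ordinary value. The names greet, sayHello, and run are illustrative only and don't come from the article.

```javascript
// A function declaration creates a function object.
function greet(name) {
    return "Hello, " + name;
}

// typeof reports "function", but the value is still an object.
console.log(typeof greet);            // "function"
console.log(greet instanceof Object); // true

// It can be assigned to a variable...
const sayHello = greet;

// ...passed to another function as an argument...
function run(fn, value) {
    return fn(value);
}

// ...and carry properties like any other object.
greet.callCount = 0;

console.log(run(sayHello, "Ada")); // "Hello, Ada"
```

Because sayHello and greet refer to the same object, either name can be handed to run.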

First-class functions give JavaScript special powers and enable us to benefit from higher-order functions.

Because functions are objects, JavaScript is one of the popular programming languages that supports a natural approach to functional programming.

In fact, first-class functions are so native to JavaScript’s approach that I bet you’ve been using them without even thinking about it.

Higher-Order Functions Can Take a Function as an Argument

If you’ve done much JavaScript web development, you’ve probably come across functions that use a callback.

A callback function is a function that executes at the end of an operation, once all other operations are complete.

Usually, we pass this function as an argument last, after other parameters. It’s often defined inline as an anonymous function. Callback functions rely on JavaScript’s ability to deal with higher-order functions.

JavaScript is a single-threaded language. This means only one operation can execute at a time.

To keep operations from blocking each other or the main thread (which would freeze the page), the engine runs them in order. They're queued on this single thread, and each one waits until the code ahead of it has finished.

The ability to pass in a function as an argument and run it after the parent function’s other operations are complete is essential for a language to support higher-order functions.

Callback functions in JavaScript allow for asynchronous behavior, so a script can continue executing other functions or operations while waiting for a result.

The ability to pass a callback function is critical when dealing with resources that may return a result after an undetermined period of time.

This higher-order function pattern is very useful in web development. A script may send a request off to a server, and then need to handle the response whenever it arrives, without requiring any knowledge of the server’s network latency or processing time.

Node.js frequently uses callback functions to make efficient use of server resources. This asynchronous approach is also useful in the case of an app that waits for user input before performing a function.
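Here is a rough sketch of that asynchronous flow. fetchGreeting is a hypothetical stand-in for a slow operation such as a network request, with setTimeout simulating the delay.

```javascript
// fetchGreeting is a hypothetical helper: it "responds" after a short delay
// by calling whatever function it was given.
function fetchGreeting(name, callback) {
    setTimeout(function () {
        callback("Hello, " + name);
    }, 10);
}

const log = [];

fetchGreeting("Ada", function (message) {
    // Runs later, once the timer fires.
    log.push(message);
    console.log(log.join(" | ")); // "request sent | Hello, Ada"
});

// This line runs first: the script does not wait for the timer.
log.push("request sent");
```

The callback is the only piece of code that needs to know about the result, so the rest of the script carries on without waiting.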

Example: Passing an Alert Function to an Element Event Listener

Consider this snippet of simple JavaScript that adds an event listener to a button.

<button id="clicker">So Clickable</button>

document.getElementById("clicker").addEventListener("click", function() {
    alert("you triggered " + this.id);
});

This script uses an anonymous inline function to display an alert.

But it could just as easily have used a separately defined function and passed that named function to the addEventListener method:

var proveIt = function() {
    alert("you triggered " + this.id);
};

document.getElementById("clicker").addEventListener("click", proveIt);

We haven’t just demonstrated higher-order functions by doing this. We’ve made our code more readable and resilient, and separated functionality for different tasks (listening for clicks vs. alerting the user).

How Higher-Order Functions Support Code Reusability

Our little proveIt() function is structurally independent of the code around it, always reporting the id of whatever element was triggered. This approach to function design is at the core of functional programming.

This bit of code could exist in any context where you display an alert with the id of an element, and could be called with any event listener.

The ability to replace an inline function with a separately defined and named function opens up a world of possibilities.

In functional programming, we try to develop pure functions that don’t alter external data and return the same result for the same input every time.

We now have one of the essential tools to help us develop a library of small, targeted higher-order functions you can use generically in any application.

Note: Passing Functions vs. Passing the Function Object

Note that we passed proveIt and not proveIt() to our addEventListener function.

When you pass a function by name without parentheses, you are passing the function object itself.

When you pass it with parentheses, you are passing the result of executing that function.
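The difference is easy to see in a small sketch. register is a hypothetical stand-in for something like addEventListener: it just stores whatever it receives.

```javascript
function makeLabel() {
    return "clicked";
}

// register is a hypothetical helper that keeps whatever it is given
// so that it could be called later.
const handlers = [];
function register(handler) {
    handlers.push(handler);
}

register(makeLabel);   // passes the function object itself
register(makeLabel()); // calls makeLabel now and passes the string "clicked"

console.log(typeof handlers[0]); // "function"
console.log(typeof handlers[1]); // "string"
```

Only the first entry can be invoked later; the second is just the value the function produced at registration time.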

Returning Functions as Results with Higher-Order Functions

In addition to taking functions as arguments, JavaScript allows functions to return other functions as a result.

This makes sense since functions are simply objects. Objects (including functions) can be defined as a function’s returned value, just like strings, arrays, or other values.

But what does it mean to return a function as a result?

Functions are a powerful way to break down problems and create reusable pieces of code. When we define a function as the return value of a higher-order function, it can serve as a template for new functions!

That opens the door to another world of functional JavaScript magic.

Say you’ve read one too many articles about Millennials and grown bored. You decide you want to replace the word Millennials with the phrase Snake People every time it occurs.

Your impulse might be simply to write a function that performed that text replacement on any text you passed to it:

var snakify = function(text) {
    return text.replace(/millennials/ig, "Snake People");
};
console.log(snakify("The Millennials are always up to something."));
// The Snake People are always up to something.

That works, but it’s pretty specific to this one situation. Perhaps your patience has also outgrown articles about the Baby Boomers. You’d like to make a custom function for them as well.

But even with such a simple function, you don’t want to have to repeat the code that you’ve written when you can start with a higher-order function instead.

var hippify = function(text) {
    return text.replace(/baby boomers/ig, "Aging Hippies");
};
console.log(hippify("The Baby Boomers just look the other way."));
// The Aging Hippies just look the other way.

But what if you decided that you wanted to do something fancier to preserve the case in the original string? You would have to modify both of your new functions to do this.

That’s a hassle, and it makes your code more brittle and harder to read. In situations like this, we can use a higher-order function as a solution.

Building a Template Higher-Order Function

What you really want is the flexibility to be able to replace any term with any other term in a template function, and define that behavior as a foundational function from which you can build new custom functions.

With the ability to assign functions as return values, JavaScript offers up ways to make that scenario much more convenient:

var attitude = function(original, replacement, source) {
    // The outer source parameter is never used; the returned
    // function's own source parameter shadows it.
    return function(source) {
        return source.replace(original, replacement);
    };
};

var snakify = attitude(/millennials/ig, "Snake People");
var hippify = attitude(/baby boomers/ig, "Aging Hippies");

console.log(snakify("The Millennials are always up to something."));
// The Snake People are always up to something.
console.log(hippify("The Baby Boomers just look the other way."));
// The Aging Hippies just look the other way.

What we’ve done is isolate the code that does the actual work into a versatile and extensible attitude function. It encapsulates all of the work needed to modify any input string using the original phrase as the initial value, and output a replacement phrase with some attitude.

What do we gain when we define this new function as a reference to the attitude higher-order function, pre-populated with the first two arguments it takes? It allows the new function to take whatever text you pass it and use that argument in the return function we’ve defined as the attitude function’s output.

JavaScript functions don’t care about the number of arguments you pass them.

If the second argument is missing, it will treat it as undefined. And it will do the same when we opt not to provide a third argument, or any number of additional arguments, too.

Further, you can pass that additional argument in later. You can do this when you’ve defined the higher-order function you wish to call, as just demonstrated.

Simply define it as a reference to a function a higher-order function returns with one or more arguments left undefined.

Go over that a few times if you need to, so you fully understand what’s happening.

We’re creating a template higher-order function that returns another function. Then we’re defining that newly returned function, minus one argument, as a custom implementation of the template function.

All the functions you create this way will inherit the working code from the higher-order function. However, you can predefine them with different default arguments.

You’re Already Using Higher-Order Functions

Higher-order functions are so basic to the way JavaScript works, you’re already using them.

Every time you pass an anonymous function or a callback, you’re handing a function object to another function, and the function that receives it is acting as a higher-order function. Array methods such as map and filter, often called with arrow functions, are a familiar case.

Developers become familiar with higher-order functions early in the process of learning JavaScript. It’s so inherent to JavaScript’s design that the need to learn about the concept driving arrow functions or callbacks may not arise until later on.

The ability to assign functions that return other functions extends JavaScript’s convenience. Higher-order functions allow us to create custom-named functions to perform specialized tasks with shared template code from a higher-order function.

Each of these functions can inherit any improvements made in the higher-order function down the road. This helps us avoid code duplication, and keeps our source code clean and readable.
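To make that concrete, here is one possible sketch of the case-preserving improvement mentioned earlier, added once in the template function. matchCase is a hypothetical helper introduced for this example, and it only handles all-uppercase and all-lowercase matches.

```javascript
// matchCase is a hypothetical helper: it copies simple casing
// (ALL CAPS or all lowercase) from the matched text onto the replacement.
function matchCase(matched, replacement) {
    if (matched === matched.toUpperCase()) {
        return replacement.toUpperCase();
    }
    if (matched === matched.toLowerCase()) {
        return replacement.toLowerCase();
    }
    return replacement;
}

// The template function: the improvement lives here, in one place.
function attitude(original, replacement) {
    return function (source) {
        return source.replace(original, function (matched) {
            return matchCase(matched, replacement);
        });
    };
}

const snakify = attitude(/millennials/ig, "Snake People");
const hippify = attitude(/baby boomers/ig, "Aging Hippies");

console.log(snakify("MILLENNIALS are always up to something."));
// SNAKE PEOPLE are always up to something.
console.log(hippify("The baby boomers just look the other way."));
// The aging hippies just look the other way.
```

Neither snakify nor hippify had to change; both pick up the new behavior because they are built from the same template.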

If you ensure your functions are pure (they don’t alter external values and always return the same value for any given input), you can create tests to verify that your code changes don’t break anything when you update your higher-order functions.

Conclusion

Now that you know how a higher-order function works, you can start thinking about ways you can take advantage of the concept in your own projects.

One of the great things about JavaScript is that you can mix functional techniques right in with the code you’re already familiar with.

Try some experiments. Even if you start by using a higher-order function for the sake of it, you’ll become familiar with the extra flexibility they provide soon enough.

A little work with higher-order functions now can improve your code for years to come.

Original article source at: https://www.sitepoint.com


Function Expression vs. Function Declaration

There are two ways to create functions in JavaScript: function expressions and function declarations. In this article, we will discuss when to use function expressions vs. function declarations, and explain the differences between them.

Function declarations have been used for a long time, but function expressions have been gradually taking over. Many developers aren’t sure when to use one or the other, so they end up using the wrong one.

There are a few key differences between function expressions and function declarations. Let’s take a closer look at those differences, and when to use function expressions vs. function declarations in your code.

function funcDeclaration() {
    return 'A function declaration';
}

let funcExpression = function () {
    return 'A function expression';
}

What Are Function Declarations?

Function declarations are when you create a function and give it a name. You declare the name of the function when you write the function keyword, followed by the function name. For example:

function myFunction() {
  // do something
}

As you can see, the function name (myFunction) is declared when the function is created. This means that you can call the function before it is defined.

Here is an example of a function declaration:

function add(a, b) {
  return a + b;
}

What Are Function Expressions?

Function expressions are when you create a function and assign it to a variable. The function is often anonymous, which means it doesn’t have a name of its own. For example:

let myFunction = function() {
  // do something
};

As you can see, the function is assigned to the myFunction variable. This means that you must define the function before you can call it.

Here is an example of a function expression:

let add = function (a, b) {
  return a + b;
};

The Differences Between Function Expressions & Declarations

There are a few key differences between function expressions and function declarations:

  • Function declarations are hoisted, while function expressions are not. This means that you can call a function declaration before it is defined, but you cannot do this with a function expression.
  • A function expression is created only when execution reaches it, so you can use it from that point onward. A function declaration is set up when its enclosing scope is entered, before any code in that scope runs.
  • A function expression can be written inline as an argument to another function. A function declaration has to be defined separately and then passed by name.
  • Function expressions can be anonymous, while function declarations cannot.
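The hoisting difference can be checked with a short sketch. The names declared, expressed, and outcome are illustrative only.

```javascript
// Works: the declaration is hoisted together with its body.
console.log(declared()); // "declaration"

function declared() {
    return "declaration";
}

// Fails: the let binding exists but is uninitialized until its line runs.
let outcome;
try {
    expressed();
} catch (err) {
    outcome = err.name;
}
console.log(outcome); // "ReferenceError"

let expressed = function () {
    return "expression";
};

// From here on, the expression behaves like any other function.
console.log(expressed()); // "expression"
```

With var instead of let, the early call would fail with a TypeError instead, because the variable would be hoisted with the value undefined.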

Understanding Scope in Your Function Expression: JavaScript Hoisting Differences

Similar to the var statement, function declarations are hoisted to the top of their scope.

Function expressions aren’t hoisted; they come into existence only when the line that defines them runs. Like any function, they keep access to the local variables of the scope where they were defined.

Normally, you can use function declarations and function expressions interchangeably. But there are times when function expressions result in easier-to-understand code without the need for a temporary function name.

How to Choose Between Expressions and Declarations

So, when should you use function expressions vs. function declarations?

The answer depends on your needs. If you need a more flexible function or one that is not hoisted, then a function expression is the way to go. If you need a more readable and understandable function, then use a function declaration.

As you’ve seen, the two syntaxes are similar. The most obvious difference is that function expressions are often anonymous, while function declarations always have a name.

Today, you would typically use a function declaration when you need to do something that function expressions cannot do. If you don’t need to do anything that can only be done with a function declaration, then it is generally best to use a function expression.

Use function declarations when you need to create a function that is recursive, or when you need to call the function before you define it. As a rule of thumb, use function expressions for cleaner code when you don’t need to do either of those things.

Benefits of function declarations

There are a few key benefits to using function declarations.

  • It can make your code more readable. If you have a long function, giving it a name can help you keep track of what it’s doing.
  • Function declarations are hoisted, which means that they are available before they are defined in your code. This helps if you need to use the function before it is defined.

Benefits of function expressions

Function expressions also have a few benefits.

  • They are more flexible than function declarations. You can create function expressions and assign them to different variables, which can be helpful when you need to use the same function in different places.
  • Function expressions are not hoisted, so you can’t use them before they are defined in your code. This helps if you want to make sure that a function is only used after it is defined.

When to choose a function declaration vs. function expression

In most cases, it’s easy to figure out which method of defining a function is best for your needs. These guidelines will help you make a quick decision in most situations.

Use a function declaration when:

  • you need a more readable and understandable function (such as a long function, or one you’ll need to use in different places)
  • an anonymous function won’t suit your needs
  • you need to create a function that is recursive
  • you need to call the function before it is defined

Use a function expression when:

  • you need a more flexible function
  • you need a function that isn’t hoisted
  • the function should only be used after it is defined
  • the function is anonymous, or doesn’t need a name for later use
  • you want to control when the function is executed, using techniques like immediately-invoked function expressions (IIFE)
  • you want to pass the function as an argument to another function

That said, there are a number of cases where the flexibility of function expressions becomes a powerful asset.

Unlocking the Function Expression: JavaScript Hoisting Differences

There are several different ways that function expressions become more useful than function declarations.

  • Closures
  • Arguments to other functions
  • Immediately Invoked Function Expressions (IIFE)

Creating Closures with Function Expressions

Closures are used when you want to give parameters to a function before that function is executed. A good example of how this can benefit you is when looping through a NodeList.

A closure allows you to retain other information such as the index, in situations where that information isn’t available once the function is executed.

function tabsHandler(index) {
    return function tabClickEvent(evt) {
        // Do stuff with tab.
        // The index variable can be accessed from within here.
    };
}

let tabs = document.querySelectorAll('.tab'),
    i;

for (i = 0; i < tabs.length; i += 1) {
    tabs[i].onclick = tabsHandler(i);
}

The attached event handlers are executed at a later time (after the loop is finished), so a closure is needed to retain the appropriate value of the for loop.

// Bad code, demonstrating why a closure is needed
let i;

for (i = 0; i < list.length; i += 1) {
    document.querySelector('#item' + i).onclick = function doSomething(evt) {
        // Do something with item i
        // But, by the time this function executes, the value of i is always list.length
    }
}

It’s easier to understand why the problem occurs by extracting the doSomething() function out from within the for loop.

// Bad code, demonstrating why a closure is needed

let list = document.querySelectorAll('.item'),
    i,
    doSomething = function (evt) {
        // Do something with item i.
        // But, by the time this function executes, the value of i is not what it was in the loop.
    };

for (i = 0; i < list.length; i += 1) {
    list[i].onclick = doSomething;
}

The solution here is to pass the index as a function argument to an outer function so that it can pass that value to an inner function. You’ll commonly see handler functions used to organize the information that an inner returning function needs.

// The following is good code, demonstrating the use of a closure

let list = ['item1', 'item2', 'item3'],
    i,
    doSomethingHandler = function (itemIndex) {
        return function doSomething(evt) {
            // now this doSomething function can retain knowledge of
            // the index variable via the itemIndex parameter,
            // along with other variables that may be available too.
            console.log('Doing something with ' + list[itemIndex]);
        };
    };

for (i = 0; i < list.length; i += 1) {
    document.querySelector('#' + list[i]).onclick = doSomethingHandler(i);
}
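The same effect can be reproduced without a page or any DOM elements. This is a minimal sketch in which plain arrays of functions stand in for the click handlers; shared, captured, and makeHandler are illustrative names.

```javascript
// Shared variable: every handler sees the final value of i.
const shared = [];
let i;
for (i = 0; i < 3; i += 1) {
    shared.push(function () {
        return i;
    });
}

// Factory function: each handler keeps its own index.
function makeHandler(index) {
    return function () {
        return index;
    };
}

const captured = [];
for (i = 0; i < 3; i += 1) {
    captured.push(makeHandler(i));
}

console.log(shared.map(function (fn) { return fn(); }));   // [ 3, 3, 3 ]
console.log(captured.map(function (fn) { return fn(); })); // [ 0, 1, 2 ]
```

Declaring the counter in the loop header with let, as in for (let i = 0; ...), also gives each iteration its own binding, which is another way to avoid the problem.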

Learn more about closures and their usage.

Passing Function Expressions as Arguments

Function expressions can be passed directly to functions without having to be assigned to an intermediate temporary variable.

You’ll most often see them in the form of an anonymous function. Here’s a familiar jQuery function expression example:

$(document).ready(function () {
    console.log('An anonymous function');
});

A function expression is also used to handle the array items when using methods such as forEach().

They don’t have to be unnamed anonymous functions, either. It’s a good idea to name the function expression to help express what the function is supposed to do and to aid in debugging:

let productIds = ['12356', '13771', '15492'];

productIds.forEach(function showProduct(productId) {
    ...
});

Immediately Invoked Function Expressions (IIFE)

IIFEs help prevent your functions and variables from affecting the global scope.

All the properties within fall inside the anonymous function’s scope. This is a common design pattern that’s used to prevent your code from having unwanted or undesired side-effects elsewhere.

It’s also used as a module pattern to contain blocks of code in easy-to-maintain sections. We take a deeper look at these in Demystifying JavaScript closures, callbacks, and IIFEs.

Here’s a simple example of an IIFE:

(function () {
    // code in here
}());

… which when used as a module, can result in some easy-to-achieve maintainability for your code.

let myModule = (function () {
    let privateMethod = function () {
        console.log('A private method');
    },
    someMethod = function () {
        console.log('A public method');
    },
    anotherMethod = function () {
        console.log('Another public method');
    };

    return {
        someMethod: someMethod,
        anotherMethod: anotherMethod
    };
}());
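As a further sketch of the same pattern, the module below keeps a piece of state private. The counter module and its method names are made up for illustration.

```javascript
const counter = (function () {
    // count is private: nothing outside the IIFE can reach it directly.
    let count = 0;

    return {
        increment: function () {
            count += 1;
            return count;
        },
        current: function () {
            return count;
        }
    };
}());

counter.increment();
counter.increment();

console.log(counter.current()); // 2
console.log(counter.count);     // undefined
```

Only the two returned methods can read or change count, so the rest of the program can't put the module into an inconsistent state by accident.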

Conclusion

As we’ve seen, function expressions aren’t radically different from function declarations, but they can often result in cleaner and more readable code.

Their widespread use makes them an essential part of every developer’s toolbox. Do you use function expressions in your code in any interesting ways that I haven’t mentioned above? Comment and let me know!

Original article source at: https://www.sitepoint.com/

Nat Grady

1670947260

Useful JavaScript Math Functions

The built-in JavaScript Math object includes a number of useful functions for performing a variety of mathematical operations. Let’s dive in and take a look at how they work and what you might use them for.

Math.max and Math.min

These functions pretty much do what you’d expect: they return the maximum or minimum of the list of arguments supplied:

Math.max(1,2,3,4,5)
<< 5

Math.min(4,71,-7,2,1,0)
<< -7

The arguments should all be numbers. If any argument can’t be converted to a number, NaN will be returned:

Math.max('a','b','c')
<< NaN

Math.min(5,"hello",6)
<< NaN

Watch out, though. JavaScript will attempt to coerce values into a number:

Math.min(5,true,6)
<< 1

In this example, the Boolean value true is coerced into the number 1, which is why this is returned as the minimum value. If you’re not familiar with type coercion, it happens when the operands of an operator are of different types. In this case, JavaScript will attempt to convert one operand to an equivalent value of the other operand’s type. You can read more about type coercion in JavaScript: Novice to Ninja, 2nd Edition, in Chapter 2.
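The conversions involved can be inspected directly with Number(). The sketch below also includes strictMax, a hypothetical helper (not part of the Math object) for cases where silent coercion is unwanted.

```javascript
// Each argument is converted with the same rules as Number(value).
console.log(Number(true));    // 1
console.log(Number("42"));    // 42
console.log(Number("hello")); // NaN
console.log(Number(null));    // 0

console.log(Math.min(5, true, 6)); // 1
console.log(Math.max("42", 7));    // 42
console.log(Math.min(5, null, 6)); // 0
console.log(Math.max(5, "hello")); // NaN

// strictMax is a hypothetical helper that refuses anything
// that isn't already a number.
function strictMax(...values) {
    if (!values.every(function (v) { return typeof v === "number"; })) {
        throw new TypeError("strictMax expects numbers only");
    }
    return Math.max(...values);
}

console.log(strictMax(1, 2, 3)); // 3
```

Calling strictMax(5, true, 6) throws a TypeError instead of quietly treating true as 1.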

A list of numbers needs to be supplied as the argument, not an array, but you can use the spread operator (...) to unpack an array of numbers:

Math.max(...[8,4,2,1])
<< 8

The Math.max function is useful for finding the high score from a list of scores saved in an array:

const scores = [23,12,52,6,25,38,19,37,76,54,24]
const highScore = Math.max(...scores)
<< 76
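Spreading works well for lists of this size, but every element becomes a separate function argument, and engines limit how many arguments a call can take. For very large arrays, one alternative is to compare values pairwise with reduce. maxOf below is a hypothetical helper written for this sketch.

```javascript
const scores = [23, 12, 52, 6, 25, 38, 19, 37, 76, 54, 24];

// maxOf is a hypothetical helper: it compares two values at a time,
// so it never passes the whole array as an argument list.
function maxOf(values) {
    return values.reduce(function (best, value) {
        return Math.max(best, value);
    }, -Infinity);
}

console.log(maxOf(scores)); // 76
console.log(maxOf([]));     // -Infinity, the same as Math.max() with no arguments
```

Starting the reduction at -Infinity mirrors what Math.max returns when it is called with no arguments.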

The Math.min function is useful for finding the best price on a price-comparison website:

const prices = [19.99, 20.25, 18.57, 19.75, 25, 22.50]
const bestPrice = Math.min(...prices)
<< 18.57

Absolute Values

An absolute value is simply the size of the number, no matter what its sign. This means that positive numbers stay the same and negative numbers lose their minus sign. The Math.abs function will calculate the absolute value of its argument:

Math.abs(5)
<< 5

Math.abs(-42)
<< 42

Math.abs(-3.14159)
<< 3.14159

Why would you want to do this? Well, sometimes you want to calculate the difference between two values, which you work out by subtracting the smaller from the larger, but often you won’t know in advance which of the two values is smaller. To get around this, you can just subtract the numbers in either order and take the absolute value:

const x = 5
const y = 8

const difference = Math.abs(x - y)
<< 3

A practical example might be on a money-saving website, where you want to know how much you could save by calculating the difference between two deals. Since you’d be dealing with live price data, you wouldn’t know in advance which deal was the cheaper:

const dealA = 150
const dealB = 167

const saving = Math.abs(dealA - dealB)
<< 17

Math.pow

Math.pow performs power calculations, like these:

3⁴ = 81

 

In the example above, 3 is known as the base number and 4 is the exponent. We would read it as “3 to the power of 4 is 81”.

The function accepts two values — the base and the exponent — and returns the result of raising the base to the power of the exponent:

Math.pow(2,3)
<< 8

Math.pow(8,0)
<< 1

Math.pow(-1,-1)
<< -1

Math.pow has pretty much been replaced by the infix exponentiation operator (**) — introduced in ES2016 — which does exactly the same operation:

2 ** 3
<< 8

8 ** 0
<< 1

(-1) ** (-1)
<< -1

Calculating Roots

Roots are the inverse operation to powers. For example, since 3 squared is 9, the square root of 9 is 3.

Math.sqrt can be used to return the square root of the number provided as an argument:

Math.sqrt(4)
<< 2

Math.sqrt(100)
<< 10

Math.sqrt(2)
<< 1.4142135623730951

This function will return NaN if a negative number or non-numerical value is provided as an argument:

Math.sqrt(-1)
<< NaN

Math.sqrt("four")
<< NaN

But watch out, because JavaScript will attempt to coerce the type:

Math.sqrt('4') 
<< 2

Math.sqrt(true)
<< 1

Math.cbrt returns the cube root of a number. This accepts all numbers — including negative numbers. It will also attempt to coerce the type if a value that’s not a number is used. If it can’t coerce the value to a number, it will return NaN:

Math.cbrt(1000)
<< 10

Math.cbrt(-1000)
<< -10

Math.cbrt("10")
<< 2.154434690031884

Math.cbrt(false)
<< 0

It’s possible to calculate other roots using the exponentiation operator and a fractional power. For example, the fourth root of a number can be found by raising it to the power one-quarter (or 0.25). So the following code will return the fourth root of 625:

625 ** 0.25
<< 5

To find the fifth root of a number, you would raise it to the power of one fifth (or 0.2):

32 ** 0.2
<< 2

In general, to find the nth root of a number you would raise it to the power of 1/n, so to find the sixth root of a million, you would raise it to the power of 1/6:

1000000 ** (1/6)
<< 9.999999999999998

Notice that there’s a rounding error here, as the answer should be exactly 10. This will often happen with fractional powers that can’t be expressed exactly in binary. (You can read more about this rounding issue in “A Guide to Rounding Numbers in JavaScript“.)

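Also note that you can’t find an even root of a negative number, because the result isn’t a real number. This will return NaN. So you can’t find the 10th root of -7, for example (because 10 is even). In fact, JavaScript returns NaN whenever a negative base is raised to a fractional power, even for odd roots, so use Math.cbrt for cube roots of negative numbers: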

(-7) ** 0.1 // 0.1 is 1/10
<< NaN

One reason you might want to calculate roots is to work out growth rates. For example, say you want to 10x your profits by the end of the year. How much do your profits need to grow each month? To find this out, you’d need to calculate the 12th root of 10, or 10 to the power of a twelfth:

10 ** (1/12)
<< 1.2115276586285884

This result tells us that the monthly growth factor has to be around 1.21 in order to 10x profits by the end of the year. Or to put it another way, you’d need to increase your profits by 21% every month in order to achieve your goal.

Logs and Exponentials

Logarithms — or logs for short — can be used to find the exponent of a calculation. For example, imagine you wanted to solve the following equation:

2ˣ = 100

 

In the equation above, x certainly isn’t an integer, because 100 isn’t a power of 2. This can be solved by using base 2 logarithms:

x = log₂(100) ≈ 6.64 (rounded to 2 d.p.)

 

The Math object has a log2 method that will perform this calculation:

Math.log2(100)
<< 6.643856189774724

It also has a log10 method that performs the same calculations, but uses 10 as the base number:

Math.log10(100)
<< 2

This result is telling us that, to get 100, you need to raise 10 to the power of 2.

There’s one other log method, which is just Math.log. This calculates the natural logarithm, which uses Euler’s number, e (approximately 2.718), as the base. This might seem to be a strange value to use, but it actually occurs often in nature when exponential growth happens, hence the name “natural logarithms”:

Math.log(100)
<< 4.605170185988092

Math.log(Math.E)
<< 1

The last calculation shows that Euler’s number (e) — which is stored as the constant Math.E — needs to be raised to the power of 1 to obtain itself. This makes sense, because any number to the power of 1 is in fact itself. The same results can be obtained if 2 and 10 are provided as arguments to Math.log2 and Math.log10:

Math.log2(2)
<< 1

Math.log10(10)
<< 1

Why would you use logarithms? It’s common when dealing with data that grows exponentially to use a logarithmic scale so that the growth rate is easier to see. Logarithmic scales were often used to measure the number of daily COVID-19 cases during the pandemic as they were rising so quickly.

If you’re lucky enough to have a website that’s growing rapidly in popularity (say, doubling every day), then you might want to consider using a logarithmic scale when displaying a graph to show how your popularity is growing.

Hypotenuse

You might remember studying Pythagoras’ theorem at school. This states that the length of the longest side of a right-angled triangle (the hypotenuse) can be found using the following formula:

h² = x² + y²

 

Here, x and y are the lengths of the other two sides.

The Math object has a hypot method that will calculate the length of the hypotenuse when provided with the other two lengths as arguments. For example, if one side is length 3 and the other is length 4, we can work out the hypotenuse using the following code:

Math.hypot(3,4)
<< 5

But why would this ever be useful? Well, the hypotenuse is a measure of the shortest distance between two points. This means that, if you know the x and y coordinates of two elements on the page, you could use this function to calculate how far apart they are:

const ship = {x: 220, y: 100}
const boat = {x: 340, y: 50}

const distance = Math.hypot(ship.x - boat.x, ship.y - boat.y)
<< 130

I hope that this short roundup has been useful and helps you utilize the full power of the JavaScript Math object in your projects.

Original article source at: https://www.sitepoint.com/

#javascript #functions #math 

Gordon Murray

1670696280

Guide to Python Lambda Functions, with Examples

This article introduces Python lambda functions and shows how to write and use them.

Although Python is an object-oriented programming language, lambda functions are handy when you’re doing various kinds of functional programming.

Note: this article will assume you already understand Python programming and how to use a regular function. It’s also assumed you have Python 3.8 or above installed on your device.

Explaining Python Lambda Functions

In Python, functions can take in one or more positional or keyword arguments, a variable list of arguments, a variable list of keyword arguments, and so on. They can be passed into a higher-order function and returned as output. Regular functions can have several expressions and multiple statements. They also always have a name.

A Python lambda function is simply an anonymous function. It could also be called a nameless function. Normal Python functions are defined by the def keyword. Lambda functions in Python are usually composed of the lambda keyword, any number of arguments, and one expression.

Note: the terms lambda functions, lambda expressions, and lambda forms can be used interchangeably, depending on the programming language or programmer.

Lambda functions are mostly used as one-liners. They’re very often passed as arguments to higher-order functions like map() and filter(), where a short, throwaway function is needed only once. This pattern isn’t unique to Python; many languages pass anonymous functions to higher-order functions in the same way.

A lambda function is also very useful for handling list comprehension in Python — with various options for using Python lambda expressions for this purpose.

Lambdas are great when used for conditional rendering in UI frameworks like Tkinter, wxPython, Kivy, etc. Although the workings of Python GUI frameworks aren’t covered in this article, some code snippets reveal heavy use of lambda functions to render UI based on a user’s interaction.

Things to Understand before Delving into Python Lambda Functions

Because Python is an object-oriented programming language, everything is an object. Python classes, class instances, modules and functions are all handled as objects.

A function object can be assigned to a variable.

It’s common to assign regular functions to variables in Python. The same can be done with lambda functions, because they’re function objects too, even though they’re nameless:

def greet(name):
    return f'Hello {name}'

greetings = greet
print(greetings('Clint'))
>>>>
Hello Clint

Higher-order functions like map(), filter(), and reduce()

It’s likely you’ll need to use a lambda function within built-in functions such as filter() and map(), and also with reduce(), which must be imported from the functools module because it’s not a built-in function. By definition, higher-order functions are functions that receive other functions as arguments.

As seen in the code examples below, the normal functions can be replaced with lambdas, passed as arguments into any of these higher-order functions:

# map function
names = ['Clint', 'Lisa', 'Asake', 'Ada']

greet_all = list(map(greet, names))
print(greet_all)
>>>>
['Hello Clint', 'Hello Lisa', 'Hello Asake', 'Hello Ada']

# filter function
numbers = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
def multiples_of_three(x):
    return x % 3 == 0

print(list(filter(multiples_of_three, numbers)))
>>>>
[12, 15, 18]

# reduce function
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
def add_numbers(x, y):
    return x + y

print(reduce(add_numbers, numbers))
>>>>
55
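The same three calls can be written with lambdas in place of the named helper functions. The following is a minimal sketch that reuses the data from the examples above; the lambda bodies simply mirror greet, multiples_of_three, and add_numbers:

```python
from functools import reduce  # reduce() lives in functools, not in builtins

names = ['Clint', 'Lisa', 'Asake', 'Ada']
numbers = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

# map(): build a greeting for every name
greet_all = list(map(lambda name: f'Hello {name}', names))

# filter(): keep only the multiples of three
multiples = list(filter(lambda x: x % 3 == 0, numbers))

# reduce(): fold the numbers 1 to 10 into a single sum
total = reduce(lambda x, y: x + y, range(1, 11))

print(greet_all)
print(multiples)
print(total)
```

Each lambda is defined right where it is used, which is convenient for logic this short; anything longer usually reads better as a named function.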

The difference between a statement and an expression

A common point of confusion amongst developers is differentiating between a statement and an expression in programming.

A statement is any piece of code that does something or performs an action — such as if or while conditions.

An expression is made of a combination of variables, values, and operators and evaluates to a new value.

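This distinction is important as we explore the subject of lambda functions in Python, because a lambda body must be a single expression. In the code below, 3 ** 2 is an expression that evaluates to a value, which is then assigned to a variable: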

square_of_three = 3 ** 2
print(square_of_three)
>>>>
9

A statement looks like this:

for i in range(len(numbers), 0, -1):
    if i % 2 == 1:
        print(i)
    else:
        print('even')
>>>>
even
9
even
7
even
5
even
3
even
1

How to Use Python Lambda Functions

Python’s syntax requires that every lambda function begin with the keyword lambda (unlike normal functions, which begin with the def keyword). The syntax for a lambda function generally goes like this:

lambda arguments : expression

Lambda functions can take any number of positional arguments, keyword arguments, or both, followed by a colon and only one expression. There can’t be more than one expression, as it’s syntactically restricted. Let’s examine an example of a lambda expression below:

add_number = lambda x, y : x + y
print(add_number(10, 4))
>>>>
14

From the example above, the lambda expression is assigned to the variable add_number. A function call is made by passing arguments, which evaluates to 14.

Let’s take another example below:

discounted_price = lambda price, discount = 0.1, vat = 0.02 : price * (1 - discount) * (1 + vat)

print(discounted_price(1000, vat=0.04, discount=0.3))
>>>>
728.0

As seen above, the lambda function evaluates to 728.0. The call combines positional and keyword arguments. Positional arguments must follow the order outlined in the function definition, whereas keyword arguments can be given in any order, as long as they come after the positional arguments.

Lambda functions can also be invoked immediately, just like immediately invoked function expressions (IIFEs) in JavaScript. This is mostly used in a Python interpreter session, as shown in the following example:

print((lambda x, y: x - y)(45, 18))
>>>>
27

The lambda function object is wrapped within parentheses, and another pair of parentheses follows closely with the arguments passed. As with an IIFE, the expression is evaluated immediately and the function returns a value, which here is passed straight to print().

Python lambda functions can also be executed within a list comprehension. A list comprehension always has an output expression, which can be an immediately invoked lambda function. Here are some examples:

my_list = [(lambda x: x * 2)(x) for x in range(10) if x % 2 == 0]
print(my_list)
>>>>
[0, 4, 8, 12, 16]

value = [(lambda x: x % 2 and 'odd' or 'even')(x) for x in my_list]
print(value)
>>>>
['even', 'even', 'even', 'even', 'even']

Lambda functions can be used when writing ternary expressions in Python. A ternary expression outputs a result based on a given condition. Check out the examples below:

test_condition1 = lambda x: x / 5 if x > 10 else x + 5
print(test_condition1(9))
>>>>
14

test_condition2 = lambda x: f'{x} is even' if x % 2 == 0 else (lambda x: f'{x} is odd')(x)

print(test_condition2(9))
>>>>
9 is odd

Lambda functions within higher-order functions

The concept of higher-order functions is popular in Python, just as in other languages. They are functions that accept other functions as arguments, return functions as output, or both.

The higher-order functions covered here, map(), filter(), and reduce(), each take two arguments: a function and an iterable. The function argument is applied to the items in the iterable object. Since we can pass a function as an argument to a higher-order function, we can equally pass in a lambda function.

Here are some examples of a lambda function used with the map() function:

square_of_numbers = list(map(lambda x: x ** 2, range(10)))

print(square_of_numbers)
>>>>
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

strings = ['Nigeria', 'Ghana', 'Niger', 'Kenya', 'Ethiopia', 'South Africa', 'Tanzania', 'Egypt', 'Morocco', 'Uganda']

length_of_strings = list(map(lambda x: len(x), strings))

print(length_of_strings)
>>>>
[7, 5, 5, 5, 8, 12, 8, 5, 7, 6]

Here are some lambda functions used with the filter() function:

length_of_strings_above_five = list(filter(lambda x: len(x) > 5, strings))

print(length_of_strings_above_five)
>>>>
['Nigeria', 'Ethiopia', 'South Africa', 'Tanzania', 'Morocco', 'Uganda']

fruits_numbers_alphanumerics = ['apple', '123', 'python3', '4567', 'mango', 'orange', 'web3', 'banana', '890']

fruits = list(filter(lambda x: x.isalpha(), fruits_numbers_alphanumerics))

numbers = list(filter(lambda x: x.isnumeric(), fruits_numbers_alphanumerics))

print(fruits)
print(numbers)
>>>>
['apple', 'mango', 'orange', 'banana']
['123', '4567', '890']

Here are some lambda functions used with the reduce() function:

from functools import reduce

values = [13, 6, 12, 23, 15, 31, 16, 21]
max_value = reduce(lambda x, y: x if (x > y) else y, values)
print(max_value)
>>>>
31

nums = [1, 2, 3, 4, 5, 6]
multiplication_of_nums = reduce(lambda x, y: x * y, nums)

print(multiplication_of_nums)
>>>>
720

Conclusion

Although Python lambdas can significantly reduce the number of lines of code you write, they should be used sparingly and only when necessary. The readability of your code should be prioritized over conciseness. For more readable code, always use a normal function where suited over lambda functions, as recommended by the Python Style Guide.

Lambdas can be very handy with Python ternary expressions, but again, try not to sacrifice readability. Lambda functions really come into their own when higher-order functions are being used.

In summary:

  • Python lambdas are good for writing one-liner functions.
  • They are also used for IIFEs (immediately invoked function expressions).
  • Lambdas shouldn’t be used when there are multiple expressions, as it makes code unreadable.
  • Python is an object-oriented programming language, but lambdas are a good way to explore functional programming in Python.

Original article source at: https://www.sitepoint.com/

#python #lambda #functions 

Nat Grady

1670413740

Learn Python Regex Functions, with Examples

Regular expressions (regex) are special sequences of characters used to find or match patterns in strings, as this introduction to regex explains. We’ve previously shown how to use regular expressions with JavaScript and PHP. The focus of this article is Python regex, with the goal of helping you better understand how to manipulate regular expressions in Python.

You’ll learn how to use Python regex functions and methods effectively in your programs as we cover the nuances involved in handling Python regex objects.

Regular Expression Modules in Python: re and regex

Python has two modules, re and regex, that facilitate working with regular expressions. The re module is built in to Python, while the regex module was developed by Matthew Barnett and is available on PyPI. The regex module is designed to be backward compatible with the built-in re module, so both offer similar functionality, with regex adding some extra features. They differ in terms of implementation. The built-in re module is the more popular of the two, so we’ll be working with that module here.

Python’s Built-in re Module

More often than not, Python developers use the re module when executing regular expressions. The general construct of regular expression syntax remains the same (characters and symbols), but the module provides some functions and methods to effectively execute regex in a Python program.

Before we can use the re module, we have to import it into our file like any other Python module or library:

import re

This makes the module available in the current file so that Python’s regex functions and methods are easily accessible. With the re module, we can create Python regex objects, manipulate matched objects, and apply flags where necessary.

A Selection of re Functions

The re module has functions such as re.search(), re.match(), and re.compile(), which we’ll discuss first.

re.search(pattern, string, flags=0) vs re.match(pattern, string, flags=0)

The re.search() and re.match() functions search through a string for a Python regex pattern and return a match object if a match is found, or None if not.

Both functions always return the first matched substring found in a given string and maintain a default value 0 for flag. But while the search() function scans through an entire string to find a match, match() only searches for a match at the beginning of a string.

Python’s re.search() documentation:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

Python’s re.match() documentation:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Let’s see some code examples to further clarify:

search_result = re.search(r'\d{2}', 'I live at 22 Garden Road, East Legon')
print(search_result)
print(search_result.group())
>>>>
<re.Match object; span=(10, 12), match='22'>
22

match_result = re.match(r'\d{2}', 'I live at 22 Garden Road, East Legon')
print(match_result)
print(match_result.group())
>>>>
None
Traceback (most recent call last):
  File "/home/ini/Dev./sitepoint/regex.py", line 4, in <module>
    print(match_result.group())
AttributeError: 'NoneType' object has no attribute 'group'

In the second example, None was returned because there was no match at the beginning of the string. An AttributeError was raised when the group() method was called, because None isn’t a match object. When the string does begin with two digits, match() succeeds:

match_result = re.match(r'\d{2}', "45 cars were used for the president's convoy")
print(match_result)
print(match_result.group())
>>>>
<re.Match object; span=(0, 2), match='45'>
45

Because the string begins with 45, match() finds it at position 0 and returns a match object.
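Since both functions return None when nothing matches, it’s safer to test the result before calling group(). The sketch below wraps each call in a small helper; the helper names first_two_digits and leading_two_digits are made up for this illustration:

```python
import re

def first_two_digits(text):
    """Return the first two-digit run anywhere in text, or None."""
    result = re.search(r'\d{2}', text)
    # re.search() returns None on failure, so check before calling group()
    return result.group() if result else None

def leading_two_digits(text):
    """Return the two digits at the very start of text, or None."""
    result = re.match(r'\d{2}', text)
    return result.group() if result else None

address = 'I live at 22 Garden Road, East Legon'
print(first_two_digits(address))    # search() scans the whole string
print(leading_two_digits(address))  # match() only looks at the start
```

With this guard in place, a failed match yields None instead of raising an AttributeError.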

re.compile(pattern, flags=0)

The compile() function takes a given regular expression pattern and compiles it into a regular expression object used in finding a match in a string or text. It also accepts a flag as an optional second argument. This method is useful because the regex object can be assigned to a variable and used later in our Python code. Always remember to use a raw string r"..." when creating a Python regex object.

Here’s an example of how it works:

regex_object = re.compile(r'b[ae]t')
mo = regex_object.search('I bet, you would not let a bat be your president')
print(regex_object)
print(mo.group())
>>>>
re.compile('b[ae]t')
bet
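The main benefit of a compiled pattern is that it can be reused. As a rough sketch, the same regex object can be applied to several strings and through several methods; the list of sentences here is invented for the illustration:

```python
import re

# Compile once, then reuse the same regex object
bat_or_bet = re.compile(r'b[ae]t')

sentences = ['I bet you would not', 'let a bat be', 'no match here']

# search() on each sentence: does it contain "bat" or "bet"?
hits = [bool(bat_or_bet.search(sentence)) for sentence in sentences]

# The same object also offers findall(), sub(), and the other methods
all_words = bat_or_bet.findall('I bet, you would not let a bat be your president')

print(hits)
print(all_words)
```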

re.fullmatch(pattern, string, flags=0)

This function takes two required arguments, a regular expression pattern and a string to search, plus an optional flags argument. A match object is returned if the entire string matches the given regex pattern. If there’s no match, it returns None:

regex_object = re.compile(r'Tech is the future')
mo = regex_object.fullmatch('Tech is the future, join now')
print(mo)
print(mo.group())
>>>>
None
Traceback (most recent call last):
  File "/home/ini/Dev./sitepoint/regex.py", line 16, in <module>
    print(mo.group())
AttributeError: 'NoneType' object has no attribute 'group'

The code raises an AttributeError because fullmatch() returned None: the pattern matches only the beginning of the string, not the entire string.
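For contrast, here is a short sketch of fullmatch() succeeding. It uses the same pattern, and only the string being tested changes:

```python
import re

pattern = re.compile(r'Tech is the future')

# The pattern covers the whole string, so a match object is returned
whole = pattern.fullmatch('Tech is the future')

# Extra text after the pattern means the whole string no longer matches
partial = pattern.fullmatch('Tech is the future, join now')

print(whole.group())
print(partial)
```

This all-or-nothing behavior makes fullmatch() a natural fit for validating input such as IDs or codes, where trailing characters should be rejected.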

re.findall(pattern, string, flags=0)

The findall() function returns a list of all non-overlapping matches found in a given string, as strings rather than match objects. It scans the string from left to right and returns the matches in the order found. See the code snippet below:

regex_object = re.compile(r'[A-Z]\w+')
mo = regex_object.findall('Pick out all the Words that Begin with a Capital letter')
print(mo)
>>>>
['Pick', 'Words', 'Begin', 'Capital']

In the code snippet above, the regex consists of the character class [A-Z] followed by one or more word characters (\w+), which ensures that each matched substring begins with a capital letter.
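One detail worth knowing is that findall() changes what it returns when the pattern contains capturing groups: each list item becomes a tuple of the groups instead of the full match. The sketch below shows both forms; the second phone number in the text is a made-up value for the illustration:

```python
import re

text = 'Call +233 54 502 9074 or +234 80 123 4567'

# No capturing groups: each item is the full matched string
no_groups = re.findall(r'\+\d{3} \d{2} \d{3} \d{4}', text)

# Two capturing groups: each item is a (country code, number) tuple
with_groups = re.findall(r'(\+\d{3}) (\d{2} \d{3} \d{4})', text)

print(no_groups)
print(with_groups)
```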

re.sub(pattern, repl, string, count=0, flags=0)

Parts of a string can be substituted with another substring with the help of the sub() function. It takes at least three arguments: the search pattern, the replacement string, and the string to be worked on. The original string is returned unchanged if no matches are found. If no count argument is passed, the function replaces every occurrence of the pattern.

Here’s an example:

regex_object = re.compile(r'disagreed')
mo = regex_object.sub('agreed', "The founder and the CEO disagreed on the company's new direction, the investors disagreed too.")
print(mo)
>>>>
The founder and the CEO agreed on the company's new direction, the investors agreed too.
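To limit the number of replacements, pass the count argument. This minimal sketch reuses the sentence from the example above and replaces only the first occurrence:

```python
import re

sentence = ("The founder and the CEO disagreed on the company's new direction, "
            "the investors disagreed too.")

# count=1 stops after the first replacement; the default (0) replaces all
first_only = re.sub(r'disagreed', 'agreed', sentence, count=1)

print(first_only)
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with the flags argument.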

re.subn(pattern, repl, string, count=0, flags=0)

The subn() function performs the same operation as sub(), but it returns a tuple containing the new string and the number of replacements made. See the code snippet below:

regex_object = re.compile(r'disagreed')
mo = regex_object.subn('agreed', "The founder and the CEO disagreed on the company's new direction, the investors disagreed too.")
print(mo)
>>>>
("The founder and the CEO agreed on the company's new direction, the investors agreed too.", 2)

Match Objects and Methods

A match object is returned when a regex pattern matches a given string in the regex object’s search() or match() method. Match objects have several methods that prove useful when working with regex in Python.

Match.group([group1, …])

This method returns one or more subgroups of a match object. A single argument will return a single subgroup; multiple arguments will return a tuple of subgroups, based on their indexes. By default, the group() method returns the entire matched substring. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.

Here’s an example:

regex_object = re.compile(r'(\+\d{3}) (\d{2} \d{3} \d{4})')
mo = regex_object.search('Pick out the country code from the phone number: +233 54 502 9074')
print(mo.group(1))
>>>>
+233

The argument 1 passed into the group() method, as seen in the above example, picks out the country code for Ghana, +233. Calling the method without an argument, or with 0 as an argument, returns the entire match:

regex_object = re.compile(r'(\+\d{3}) (\d{2} \d{3} \d{4})')
mo = regex_object.search('Pick out the phone number: +233 54 502 9074')
print(mo.group())
>>>>
+233 54 502 9074

Match.groups(default=None)

groups() returns a tuple of subgroups that match the given string. Regex pattern groups are always captured with parentheses — () — and these groups are returned when there’s a match, as elements in a tuple:

regex_object = re.compile(r'(\+\d{3}) (\d{2}) (\d{3}) (\d{4})')
mo = regex_object.search('Pick out the phone number: +233 54 502 9074')
print(mo.groups())
>>>>
('+233', '54', '502', '9074')
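The default parameter in the signature controls what groups() reports for groups that didn’t take part in the match. The sketch below assumes a variation of the pattern above in which the country code is optional; both the pattern and the 'unknown' placeholder are made up for this illustration:

```python
import re

# The country code (group 1) is optional; the local number (group 2) is required
pattern = re.compile(r'(?:(\+\d{3}) )?(\d{2} \d{3} \d{4})')

mo = pattern.search('Call 54 502 9074')

# Group 1 did not participate, so it is reported as None by default
plain = mo.groups()

# Passing a default swaps None for a value of our choosing
filled = mo.groups('unknown')

print(plain)
print(filled)
```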

Match.start([group]) & Match.end([group])

The start() method returns the start index, while the end() method returns the end index of the match object:

regex_object = re.compile(r'\s\w+')
mo = regex_object.search('Match any word after a space')
print('Match begins at', mo.start(), 'and ends', mo.end())
print(mo.group())
>>>>
Match begins at 5 and ends 9
 any

The example above has a regex pattern that matches a whitespace character followed by one or more word characters. A match was found, ' any', starting at position 5 and ending at position 9.
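Both methods also accept the optional group argument shown in the heading, in which case they report the position of that group rather than of the whole match. Here is a small sketch that reuses the phone number pattern from earlier and slices the original string with the returned indexes:

```python
import re

text = 'Pick out the phone number: +233 54 502 9074'
mo = re.search(r'(\+\d{3}) (\d{2} \d{3} \d{4})', text)

# With no argument, start() and end() describe the whole match
whole = text[mo.start():mo.end()]

# With a group number, they describe just that group
country_code = text[mo.start(1):mo.end(1)]
local_number = text[mo.start(2):mo.end(2)]

print(whole)
print(country_code)
print(local_number)
```

Slicing with start() and end() gives the same text as group(), which is handy when the positions themselves are needed, for example to highlight a match.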

Pattern.search(string[, pos[, endpos]])

The pos value indicates the index position where the search for a match object should begin. endpos indicates where the search for a match should stop. The value for both pos and endpos can be passed as arguments in the search() or match() methods after the string. This is how it works:

regex_object = re.compile(r'[a-z]+[0-9]')
mo = regex_object.search('find the alphanumeric character python3 in the string', 20, 40)
print(mo.group())
>>>>
python3

The pattern above matches a run of lowercase letters followed by a single digit.

The search begins at string index position 20 and stops at index 40, a range that contains python3 (indexes 32 to 38). If endpos were set to 30, the range would end before python3 and search() would return None.

re Regex Flags

Python allows the use of flags when using re module methods like search() and match(), which gives more context to regular expressions. The flags are optional arguments that specify how the Python regex engine finds a match object.

re.I (re.IGNORECASE)

This flag is used when performing a case-insensitive match. The regex engine will ignore uppercase or lowercase variation of regular expression patterns:

regex_object = re.search('django', 'My tech stack comprises of python, Django, MySQL, AWS, React', re.I)
print(regex_object.group())
>>>>
Django

The re.I ensures that a match object is found, regardless of whether it’s in uppercase or lowercase.

re.S (re.DOTALL)

The '.' special character matches any character except a newline. Introducing this flag will also match a newline in a block of text or string. See the example below:

regex_object = re.search('.+', 'What is your favourite coffee flavor \nI prefer the Mocha')
print(regex_object.group())
>>>>
What is your favourite coffee flavor

The '.' character only finds a match from the beginning of the string and stops at the newline. Introducing the re.DOTALL flag will match a newline character. See the example below:

regex_object = re.search('.+', 'What is your favourite coffee flavor \nI prefer the Mocha', re.S)
print(regex_object.group())
>>>>
What is your favourite coffee flavor
I prefer the Mocha

re.M (re.MULTILINE)

By default the '^' special character only matches the beginning of a string. With this flag introduced, the function searches for a match at the beginning of each line. The '$' character only matches patterns at the end of the string. But the re.M flag ensures it also finds matches at the end of each line:

regex_object = re.search(r'^J\w+', 'Popular programming languages in 2022: \nPython \nJavaScript \nJava \nRust \nRuby', re.M)
print(regex_object.group())
>>>>
JavaScript
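The effect of the flag is easiest to see by running the same pattern with and without it. This sketch reuses the string from the example above and switches to findall() so that every matching line is collected:

```python
import re

languages = ('Popular programming languages in 2022: \n'
             'Python \nJavaScript \nJava \nRust \nRuby')

# Without re.M, '^' only matches at the very start of the string
without_flag = re.findall(r'^J\w+', languages)

# With re.M, '^' also matches right after every newline
with_flag = re.findall(r'^J\w+', languages, re.M)

print(without_flag)
print(with_flag)
```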

re.X (re.VERBOSE)

Sometimes, Python regex patterns can get long and messy. The re.X flag helps out when we need to add comments within our regex pattern. We can use the ''' string format to create a multiline regex with comments:

email_regex = re.search(r'''
    [a-zA-Z0-9._%+-]+   # username composed of alphanumeric characters
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name has word characters
    (\.[a-zA-Z]{2,4})   # dot-something
''', 'extract the email address in this string kwekujohnson1@gmail.com and send an email', re.X)
print(email_regex.group())
>>>>
kwekujohnson1@gmail.com

Practical Examples of Regex in Python

Let’s now dive in to some more practical examples.

Python password strength test regex

One of the most popular use cases for regular expressions is to test for password strength. When signing up for any new account, there’s a check to ensure we input an appropriate combination of letters, numbers, and characters to ensure a strong password.

Here’s a sample regex pattern for checking password strength:

password_regex = re.match(r"""
    ^(?=.*?[A-Z])         # this ensures user inputs at least one uppercase letter
    (?=.*?[a-z])          # this ensures user inputs at least one lowercase letter
    (?=.*?[0-9])          # this ensures user inputs at least one digit
    (?=.*?[#?!@$%^&*-])   # this ensures user inputs one special character
    .{8,}$                # this ensures that password is at least 8 characters long
""", '@Sit3po1nt', re.X)
print('Your password is', password_regex.group())
>>>>
Your password is @Sit3po1nt

Note the use of '^' and '$' to ensure that the pattern is matched against the entire input string (password), not just part of it.
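In practice, a check like this usually returns True or False rather than printing the password. One possible way to package the same pattern is sketched below; the names PASSWORD_PATTERN and is_strong are made up for this illustration:

```python
import re

# Same rules as the pattern above, compiled once for reuse
PASSWORD_PATTERN = re.compile(r"""
    ^(?=.*?[A-Z])         # at least one uppercase letter
    (?=.*?[a-z])          # at least one lowercase letter
    (?=.*?[0-9])          # at least one digit
    (?=.*?[#?!@$%^&*-])   # at least one special character
    .{8,}$                # at least 8 characters in total
""", re.X)

def is_strong(password):
    """Return True when the password satisfies every rule in the pattern."""
    # match() returns None on failure, so compare against None explicitly
    return PASSWORD_PATTERN.match(password) is not None

print(is_strong('@Sit3po1nt'))
print(is_strong('password'))
```

Returning a Boolean also avoids the AttributeError that calling group() on a failed match would raise.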

Python search and replace in file regex

Here’s our goal for this example:

  • Create a file ‘pangram.txt’.
  • Add some simple text to the file: "The five boxing wizards climb quickly."
  • Write a simple Python regex to search and replace “climb” to “jump” so we have a pangram.

Here’s some code for doing that:

# import the regex module
import re

file_path = "pangram.txt"
text = "climb"
subs = "jump"

# define the replace function
def search_and_replace(file_path, text, subs, flags=0):
    with open(file_path, "r+") as file:
        # read the file contents
        file_contents = file.read()
        # escape the search text so it's matched literally
        text_pattern = re.compile(re.escape(text), flags)
        file_contents = text_pattern.sub(subs, file_contents)
        # rewind, clear the old contents, and write the new ones
        file.seek(0)
        file.truncate()
        file.write(file_contents)

# call the search_and_replace function
search_and_replace(file_path, text, subs)
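If you'd like to try this without leaving files behind, here's a self-contained sketch that writes the sentence to a temporary folder, runs the replacement, and checks that the result really is a pangram. The helper names replace_in_file and is_pangram are assumptions made for this example, and it uses re.subn so we also get the number of replacements.

```python
import os
import re
import string
import tempfile


def replace_in_file(path, text, subs, flags=0):
    """Replace every literal occurrence of text in the file at path."""
    with open(path, "r", encoding="utf-8") as file:
        contents = file.read()
    # re.escape makes the search text match literally, not as a pattern
    new_contents, count = re.subn(re.escape(text), subs, contents, flags=flags)
    with open(path, "w", encoding="utf-8") as file:
        file.write(new_contents)
    return count


def is_pangram(sentence):
    """True if sentence uses every letter of the English alphabet."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "pangram.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("The five boxing wizards climb quickly.")

    replaced = replace_in_file(path, "climb", "jump")
    with open(path, encoding="utf-8") as file:
        result = file.read()

print(replaced, result, is_pangram(result))
```

Reading and writing in two separate steps avoids the seek and truncate calls, at the cost of opening the file twice.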

Python web scraping regex

Sometimes you might need to harvest some data on the Internet or automate simple tasks like web scraping. Regular expressions are very useful when extracting certain data online. Below is an example:

import re
import urllib.request

phone_number_regex = r'\(\d{3}\) \d{3}-\d{4}'

url = 'https://www.summet.com/dmsi/html/codesamples/addresses.html'

# get response
response = urllib.request.urlopen(url)

# convert response to string
string_object = response.read().decode("utf8")

# use regex to extract phone numbers
regex_object = re.compile(phone_number_regex)
mo = regex_object.findall(string_object)

# print the first 5 phone numbers
print(mo[:5])

>>>>

['(257) 563-7401', '(372) 587-2335', '(786) 713-8616', '(793) 151-6230', '(492) 709-6392']
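Since the page above may change or be unavailable, here's a small offline sketch of the same extraction step. The sample_html string is a made-up stand-in for the downloaded page; only the pattern comes from the example above.

```python
import re

# Same pattern as above: "(ddd) ddd-dddd"
phone_number_regex = re.compile(r'\(\d{3}\) \d{3}-\d{4}')

# Hypothetical stand-in for the downloaded page, so no network is needed
sample_html = """
<li>Ada Example, (257) 563-7401, 12 Some Street</li>
<li>Bob Example, 372-587-2335, 34 Other Road</li>
<li>Cy Example, (786) 713-8616 ext. 9</li>
"""

# findall returns every non-overlapping match as a list of strings
matches = phone_number_regex.findall(sample_html)
print(matches)
```

The second entry is skipped because it doesn't use the parenthesized area code the pattern asks for, which is a reminder that a scraping regex only finds the formats we describe.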

Conclusion

Regular expressions can vary from simple to complex. They’re a vital part of programming, as the examples above demonstrate. To better understand regex in Python, it’s good to begin by getting familiar with things like character classes, special characters, anchors, and grouping constructs.

There’s a lot further we can go to deepen our understanding of regex in Python. The Python re module makes it easier to get up and running quickly.

Regex significantly reduces the amount of code we need to write to do things like validate input and implement search algorithms.

It’s also good to be able to answer questions about the use of regular expressions, as they often come up in technical interviews for software engineers and developers.

Original article source at: https://www.sitepoint.com/


Why call C functions from Rust?

The Rust FFI and the bindgen utility are well designed for making Rust calls out to C libraries. Rust talks readily to C and thereby to any other language that talks to C.

Why call C functions from Rust? The short answer is software libraries. A longer answer touches on where C stands among programming languages in general and towards Rust in particular. C, C++, and Rust are systems languages, which give programmers access to machine-level data types and operations. Among these three systems languages, C remains the dominant one. The kernels of modern operating systems are written mainly in C, with assembly language accounting for the rest. The standard system libraries for input and output, number crunching, cryptography, security, networking, internationalization, string processing, memory management, and more, are likewise written mostly in C. These libraries represent a vast infrastructure for applications written in any other language. Rust is well along the way to providing fine libraries of its own, but C libraries (around since the 1970s and still growing) are a resource not to be ignored. Finally, C is still the lingua franca among programming languages: most languages can talk to C and, through C, to any other language that does so.

Two proof-of-concept examples

Rust has an FFI (Foreign Function Interface) that supports calls to C functions. An issue for any FFI is whether the calling language covers the data types in the called language. For example, ctypes is an FFI for calls from Python into C, but Python doesn't cover the unsigned integer types available in C. As a result, ctypes must resort to workarounds.
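For a feel of what such workarounds look like, here is a minimal Python sketch using ctypes. It assumes a POSIX system where the C library can be loaded as shown; the variable names are illustrative only.

```python
import ctypes
import ctypes.util

# Python's int has no fixed width or signedness, so ctypes supplies
# wrapper types such as c_uint. Wrapping -1 shows C-style wraparound.
wrapped = ctypes.c_uint(-1)
print(wrapped.value)

# Load the C library. find_library may return None, in which case
# CDLL(None) falls back to the symbols already loaded into the process
# (an assumption that holds on POSIX systems, not on Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Without these declarations, ctypes would assume an int return value
# and guess the argument types from the Python values passed in.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int
print(libc.abs(-123))
```

The caller has to describe each C type by hand, because the language itself has no matching primitive.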

By contrast, Rust covers all the primitive (that is, machine-level) types in C. For example, the Rust i32 type matches the C int type. C specifies only that the char type must be one byte in size and other types, such as int, must be at least this size; but nowadays every reasonable C compiler supports a four-byte int, an eight-byte double (in Rust, the f64 type), and so on.

There is another challenge for an FFI directed at C: Can the FFI handle C's raw pointers, including pointers to arrays that count as strings in C? C does not have a string type, but rather implements strings as character arrays with a non-printing terminating character, the null terminator of legend. By contrast, Rust has two string types: String and &str (string slice). The question, then, is whether the Rust FFI can transform a C string into a Rust one, and the answer is yes.

Pointers to structures also are common in C. The reason is efficiency. By default, a C structure is passed by value (that is, by a byte-per-byte copy) when a structure is either an argument passed to a function or a value returned from one. C structures, like their Rust counterparts, can include arrays and nest other structures and so be arbitrarily large in size. Best practice in either language is to pass and return structures by reference, that is, by passing or returning the structure's address rather than a copy of the structure. Once again, the Rust FFI is up to the task of handling C pointers to structures, which are common in C libraries.

The first code example focuses on calls to relatively simple C library functions such as abs (absolute value) and sqrt (square root). These functions take non-pointer scalar arguments and return a non-pointer scalar value. The second code example, which covers strings and pointers to structures, introduces the bindgen utility, which generates Rust code from C interface (header) files such as math.h and time.h. C header files specify the calling syntax for C functions and define structures used in such calls. The two code examples are available on my homepage.

Calling relatively simple C functions

The first code example has four Rust calls to C functions in the standard mathematics library: one call apiece to abs (absolute value) and pow (exponentiation), and two calls to sqrt (square root). The program can be built directly with the rustc compiler, or more conveniently with the cargo build command:

use std::os::raw::c_int;    // 32 bits
use std::os::raw::c_double; // 64 bits

// Import three functions from the standard library libc.
// Here are the Rust declarations for the C functions:
extern "C" {
    fn abs(num: c_int) -> c_int;
    fn sqrt(num: c_double) -> c_double;
    fn pow(num: c_double, power: c_double) -> c_double;
}

fn main() {
    let x: i32 = -123;
    println!("\nAbsolute value of {x}: {}.",
             unsafe { abs(x) });

    let n: f64 = 9.0;
    let p: f64 = 3.0;
    println!("\n{n} raised to {p}: {}.",
             unsafe { pow(n, p) });

    let mut y: f64 = 64.0;
    println!("\nSquare root of {y}: {}.",
             unsafe { sqrt(y) });
    y = -3.14;
    println!("\nSquare root of {y}: {}.",
             unsafe { sqrt(y) }); //** NaN = NotaNumber
}

The two use declarations at the top are for the Rust data types c_int and c_double, which match the C types int and double, respectively. The standard Rust module std::os::raw defines fourteen such types for C compatibility. The module std::ffi has the same fourteen type definitions together with support for strings.

The extern "C" block above the main function then declares the three C library functions called in the main function below. Each call uses the standard C function's name, but each call must occur within an unsafe block. As every programmer new to Rust discovers, the Rust compiler enforces memory safety with a vengeance. Other languages (in particular, C and C++) do not make the same guarantees. The unsafe block thus says: Rust takes no responsibility for whatever unsafe operations may occur in the external call.

The first program's output is:

Absolute value of -123: 123.
9 raised to 3: 729.
Square root of 64: 8.
Square root of -3.14: NaN.

In the last output line, the NaN stands for Not a Number: the C sqrt library function expects a non-negative value as its argument, which means that the argument -3.14 generates NaN as the returned value.

Calling C functions involving pointers

C library functions in security, networking, string processing, memory management, and other areas regularly use pointers for efficiency. For example, the library function asctime (time as an ASCII string) expects a pointer to a structure as its single argument. A Rust call to a C function such as asctime is thus trickier than a call to sqrt, which involves neither pointers nor structures.

The C structure for the asctime function call is of type struct tm. A pointer to such a structure also is passed to library function mktime (make a time value). The structure breaks a time into units such as the year, the month, the hour, and so forth. The structure's fields are of type int. The two library functions combine these broken-apart time pieces into a single value: asctime returns a string representation of the time, whereas mktime returns a value of type time_t, an alias for an integer type such as int (32 bits) or long (64 bits), that represents the number of elapsed seconds since the epoch, which is a time relative to which a system's clock and timestamp are determined. Typical epoch settings are January 1 00:00:00 (zero hours, minutes, and seconds) of either 1900 or 1970.
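The relationship between broken-down time and seconds since the epoch can be sketched in a few lines of Python, whose time module wraps the same family of C functions. This is only an illustration: Python's struct_time stores the full year and months 1 to 12, whereas C's struct tm stores years since 1900 and months 0 to 11.

```python
import calendar
import time

# The POSIX epoch: zero seconds corresponds to 1970-01-01 00:00:00 UTC.
epoch = time.gmtime(0)
print(time.asctime(epoch))

# Break a timestamp into fields, then combine the fields again.
# calendar.timegm is the UTC counterpart of time.mktime, which
# interprets the fields as local time instead.
one_day = time.gmtime(86400)
print(one_day.tm_year, one_day.tm_mon, one_day.tm_mday)
print(calendar.timegm(one_day))
```

The offsets in C's struct tm explain why setting tm_year and tm_mon to 1, as the C program that follows does, yields a date in February 1901.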

The C program below calls asctime and mktime, and uses another library function strftime to convert the mktime returned value into a formatted string. This program acts as a warm-up for the Rust version:

#include <stdio.h>
#include <time.h>

int main () {
  struct tm sometime;  /* time broken out in detail */
  char buffer[80];
  int utc;

  sometime.tm_sec = 1;
  sometime.tm_min = 1;
  sometime.tm_hour = 1;
  sometime.tm_mday = 1;
  sometime.tm_mon = 1;
  sometime.tm_year = 1;
  sometime.tm_isdst = -1; /* let mktime work out daylight saving time */
  sometime.tm_wday = 1;
  sometime.tm_yday = 1;

  printf("Date and time: %s\n", asctime(&sometime));

  utc = mktime(&sometime);
  if( utc < 0 ) {
    fprintf(stderr, "Error: unable to make time using mktime\n");
  } else {
    printf("The integer value returned: %d\n", utc);
    strftime(buffer, sizeof(buffer), "%c", &sometime);
    printf("A more readable version: %s\n", buffer);
  }

  return 0;
}

The program outputs:

Date and time: Fri Feb  1 01:01:01 1901
The integer value returned: 2120218157
A more readable version: Fri Feb  1 01:01:01 1901

In summary, the Rust calls to library functions asctime and mktime must deal with two issues:

Passing a raw pointer as the single argument to each library function.

Converting the C string returned from asctime into a Rust string.

Rust calls to asctime and mktime

The bindgen utility generates Rust support code from C header files such as math.h and time.h. In this example, a simplified version of time.h will do but with two changes from the original:

The built-in type int is used instead of the alias type time_t. The bindgen utility can handle the time_t type but generates some distracting warnings along the way because time_t does not follow Rust naming conventions: in time_t an underscore separates the t at the end from the time that comes first; Rust would prefer a CamelCase name such as TimeT.

The type struct tm is given the alias StructTM for the same reason.

Here is the simplified header file with declarations for mktime and asctime at the bottom:

typedef struct tm {
    int tm_sec;    /* seconds */
    int tm_min;    /* minutes */
    int tm_hour;   /* hours */
    int tm_mday;   /* day of the month */
    int tm_mon;    /* month */
    int tm_year;   /* year */
    int tm_wday;   /* day of the week */
    int tm_yday;   /* day in the year */
    int tm_isdst;  /* daylight saving time */
} StructTM;

extern int mktime(StructTM*);
extern char* asctime(StructTM*);

With bindgen installed, % as the command-line prompt, and mytime.h as the header file above, the following command generates the required Rust code and saves it in the file mytime.rs:

% bindgen mytime.h > mytime.rs

Here is the relevant part of mytime.rs:

/* automatically generated by rust-bindgen 0.61.0 */

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct tm {
    pub tm_sec: ::std::os::raw::c_int,
    pub tm_min: ::std::os::raw::c_int,
    pub tm_hour: ::std::os::raw::c_int,
    pub tm_mday: ::std::os::raw::c_int,
    pub tm_mon: ::std::os::raw::c_int,
    pub tm_year: ::std::os::raw::c_int,
    pub tm_wday: ::std::os::raw::c_int,
    pub tm_yday: ::std::os::raw::c_int,
    pub tm_isdst: ::std::os::raw::c_int,
}

pub type StructTM = tm;

extern "C" {
    pub fn mktime(arg1: *mut StructTM) -> ::std::os::raw::c_int;
}

extern "C" {
    pub fn asctime(arg1: *mut StructTM) -> *mut ::std::os::raw::c_char;
}

#[test]
fn bindgen_test_layout_tm() {
    const UNINIT: ::std::mem::MaybeUninit<tm> =
       ::std::mem::MaybeUninit::uninit();
    let ptr = UNINIT.as_ptr();
    assert_eq!(
        ::std::mem::size_of::<tm>(),
        36usize,
        concat!("Size of: ", stringify!(tm))
    );
    ...

The Rust structure struct tm, like the C original, contains nine 4-byte integer fields. The field names are the same in C and Rust. The extern "C" blocks declare the library functions asctime and mktime as taking one argument apiece, a raw pointer to a mutable StructTM instance. (The library functions may mutate the structure via the pointer passed as an argument.)

The remaining code, under the #[test] attribute, tests the layout of the Rust version of the time structure. The test can be run with the cargo test command. At issue is that C does not fully specify how the compiler must lay out a structure. The fields appear in memory in declaration order, so the C struct tm starts with the field tm_sec for the second, but the compiler is free to insert padding between fields and after the last one, which means that the structure's size and field offsets can vary across platforms. In any case, the Rust tests should succeed and the Rust calls to the library functions should work as expected.
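The same kind of layout check can be sketched outside Rust. The Python snippet below mirrors the simplified header with ctypes and inspects the structure's size and the offsets of its first two fields. It is an illustration under the assumption that ctypes follows the platform's C layout rules; it is not part of the bindgen output.

```python
import ctypes


class StructTM(ctypes.Structure):
    """Python mirror of the simplified struct tm from the header above."""
    _fields_ = [
        ("tm_sec", ctypes.c_int),
        ("tm_min", ctypes.c_int),
        ("tm_hour", ctypes.c_int),
        ("tm_mday", ctypes.c_int),
        ("tm_mon", ctypes.c_int),
        ("tm_year", ctypes.c_int),
        ("tm_wday", ctypes.c_int),
        ("tm_yday", ctypes.c_int),
        ("tm_isdst", ctypes.c_int),
    ]


int_size = ctypes.sizeof(ctypes.c_int)

# Nine int fields with no padding between them
print(ctypes.sizeof(StructTM), 9 * int_size)

# Offsets of the first two fields within the structure
print(StructTM.tm_sec.offset, StructTM.tm_min.offset)
```

With a four-byte int, the reported size matches the 36usize that the generated Rust test asserts.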

Getting the second example up and running

The code generated from bindgen does not include a main function and, therefore, is a natural module. Below is the main function with the StructTM initialization and the calls to asctime and mktime:

mod mytime;
use mytime::*;
use std::ffi::CStr;

fn main() {
    let mut sometime  = StructTM {
        tm_year: 1,
        tm_mon: 1,
        tm_mday: 1,
        tm_hour: 1,
        tm_min: 1,
        tm_sec: 1,
        tm_isdst: -1,
        tm_wday: 1,
        tm_yday: 1
    };

    unsafe {
        // mutable reference, coerced to a raw pointer (*mut StructTM)
        let c_ptr: *mut StructTM = &mut sometime;

        // make the call, then borrow the returned
        // C string as a CStr (no copy is made)
        let char_ptr = asctime(c_ptr);
        let c_str = CStr::from_ptr(char_ptr);
        println!("{:#?}", c_str.to_str());

        let utc = mktime(c_ptr);
        println!("{}", utc);
    }
}

The Rust code can be compiled (using either rustc directly or cargo) and then run. The output is:

Ok(
    "Mon Feb  1 01:01:01 1901\n",
)
2120218157

The calls to the C functions asctime and mktime again must occur inside an unsafe block, as the Rust compiler cannot be held responsible for any memory-safety mischief in these external functions. For the record, asctime and mktime are well behaved. In the calls to both functions, the argument is the raw pointer c_ptr, which holds the (stack) address of the sometime structure.

The call to asctime is the trickier of the two calls because this function returns a pointer to a C char, the character M in Mon of the text output. Yet the Rust compiler does not know where the C string (the null-terminated array of char) is stored. In the static area of memory? On the heap? The array used by the asctime function to store the text representation of the time is, in fact, in the static area of memory. In any case, the C-to-Rust string conversion is done in two steps to avoid compile-time errors:

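The call CStr::from_ptr(char_ptr) wraps the C string in a borrowed CStr, without copying it, and returns a reference stored in the c_str variable.

The call to c_str.to_str() checks that the bytes are valid UTF-8 and, if so, returns them as a borrowed Rust string slice (&str) wrapped in Ok, which accounts for the Ok(...) wrapper in the output. An owned String, if needed, can be made from the slice with to_owned().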
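The two steps (find the terminator, then validate the bytes as text) are not specific to Rust. As a rough analogue, and not a description of how CStr is implemented, the Python sketch below builds a null-terminated buffer with ctypes and converts it to a Python string. The buffer contents simply reuse the text from the output above.

```python
import ctypes

# A C string is a char array ending in a zero byte. create_string_buffer
# allocates such an array and appends the terminator.
buffer = ctypes.create_string_buffer(b"Mon Feb  1 01:01:01 1901\n")

# Step 1: .value reads the bytes up to, but not including, the terminator
c_bytes = buffer.value
print(len(buffer.raw), len(c_bytes))

# Step 2: decoding validates the bytes as UTF-8 and yields a str;
# invalid input would raise UnicodeDecodeError rather than return Err
text = c_bytes.decode("utf-8")
print(repr(text))
```

The raw buffer is one byte longer than the string it holds, which is the terminator that the first step relies on.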

The Rust code does not generate a human-readable version of the integer value returned from mktime, which is left as an exercise for the interested. The third-party chrono crate supports strftime-style format strings (documented in its chrono::format::strftime module), which can be used much like the C function of the same name to get a text representation of the time.

Calling C with FFI and bindgen

The Rust FFI and the bindgen utility are well designed for making Rust calls out to C libraries, whether standard or third-party. Rust talks readily to C and thereby to any other language that talks to C. For calling relatively simple library functions such as sqrt, the Rust FFI is straightforward because Rust's primitive data types cover their C counterparts.

For more complicated interchanges (in particular, Rust calls to C library functions such as asctime and mktime that involve structures and pointers), the bindgen utility is the way to go. This utility generates the support code together with appropriate tests. Of course, the Rust compiler cannot assume that C code measures up to Rust standards when it comes to memory safety; hence, calls from Rust to C must occur in unsafe blocks.

Original article source at: https://opensource.com/


How to Use Modern C++ to Eliminate Virtual Functions

Using Modern C++ to Eliminate Virtual Functions. You'll gain an understanding of the purposes of virtual functions and the mechanisms in modern C++ that can now be used to achieve those same purposes.

As of C++20, there are no cases in which statically linked programs require virtual functions. This talk will explore techniques for replacing runtime polymorphism with compile-time polymorphism such that virtual functions are never necessary. This talk will also address the higher-order concern of when it might make sense to avoid virtual functions or remove them from a codebase, as that decision ultimately is a design decision that only the author of the code can make. Attendees can expect to come away with a stronger understanding of the purposes of virtual functions and the mechanisms in modern C++ that can now be used to achieve those same purposes.


How to Use Rust Calls To C Library Functions

The Rust FFI and the bindgen utility are well designed for making Rust calls out to C libraries. Rust talks readily to C and thereby to any other language that talks to C.

Why call C functions from Rust? The short answer is software libraries. A longer answer touches on where C stands among programming languages in general and towards Rust in particular. C, C++, and Rust are systems languages, which give programmers access to machine-level data types and operations. Among these three systems languages, C remains the dominant one. The kernels of modern operating systems are written mainly in C, with assembly language accounting for the rest. The standard system libraries for input and output, number crunching, cryptography, security, networking, internationalization, string processing, memory management, and more, are likewise written mostly in C. These libraries represent a vast infrastructure for applications written in any other language. Rust is well along the way to providing fine libraries of its own, but C libraries—​around since the 1970s and still growing—​are a resource not to be ignored. Finally, C is still the lingua franca among programming languages: most languages can talk to C and, through C, to any other language that does so.

Two proof-of-concept examples

Rust has an FFI (Foreign Function Interface) that supports calls to C functions. An issue for any FFI is whether the calling language covers the data types in the called language. For example, ctypes is an FFI for calls from Python into C, but Python doesn't cover the unsigned integer types available in C. As a result, ctypes must resort to workarounds.

By contrast, Rust covers all the primitive (that is, machine-level) types in C. For example, the Rust i32 type matches the C int type. C specifies only that the char type must be one byte in size and other types, such as int, must be at least this size; but nowadays every reasonable C compiler supports a four-byte int, an eight-byte double (in Rust, the f64 type), and so on.

There is another challenge for an FFI directed at C: Can the FFI handle C's raw pointers, including pointers to arrays that count as strings in C? C does not have a string type, but rather implements strings as character arrays with a non-printing terminating character, the null terminator of legend. By contrast, Rust has two string types: String and &str (string slice). The question, then, is whether the Rust FFI can transform a C string into a Rust one—​and the answer is yes.

Pointers to structures also are common in C. The reason is efficiency. By default, a C structure is passed by value (that is, by a byte-per-byte copy) when a structure is either an argument passed to a function or a value returned from one. C structures, like their Rust counterparts, can include arrays and nest other structures and so be arbitrarily large in size. Best practice in either language is to pass and return structures by reference, that is, by passing or returning the structure's address rather than a copy of the structure. Once again, the Rust FFI is up to the task of handling C pointers to structures, which are common in C libraries.

The first code example focuses on calls to relatively simple C library functions such as abs (absolute value) and sqrt (square root). These functions take non-pointer scalar arguments and return a non-pointer scalar value. The second code example, which covers strings and pointers to structures, introduces the bindgen utility, which generates Rust code from C interface (header) files such as math.h and time.h. C header files specify the calling syntax for C functions and define structures used in such calls. The two code examples are available on my homepage.

Calling relatively simple C functions

The first code example has four Rust calls to C functions in the standard mathematics library: one call apiece to abs (absolute value) and pow (exponentiation), and two calls to sqrt (square root). The program can be built directly with the rustc compiler, or more conveniently with the cargo build command:

use std::os::raw::c_int;    // 32 bits
use std::os::raw::c_double; // 64 bits

// Import three functions from the standard library libc.
// Here are the Rust declarations for the C functions:
extern "C" {
    fn abs(num: c_int) -> c_int;
    fn sqrt(num: c_double) -> c_double;
    fn pow(num: c_double, power: c_double) -> c_double;
}

fn main() {
    let x: i32 = -123;
    println!("\nAbsolute value of {x}: {}.",
             unsafe { abs(x) });

    let n: f64 = 9.0;
    let p: f64 = 3.0;
    println!("\n{n} raised to {p}: {}.",
             unsafe { pow(n, p) });

    let mut y: f64 = 64.0;
    println!("\nSquare root of {y}: {}.",
             unsafe { sqrt(y) });
    y = -3.14;
    println!("\nSquare root of {y}: {}.",
             unsafe { sqrt(y) }); //** NaN = NotaNumber
}

The two use declarations at the top are for the Rust data types c_int and c_double, which match the C types int and double, respectively. The standard Rust module std::os::raw defines fourteen such types for C compatibility. The module std::ffi has the same fourteen type definitions together with support for strings.

The extern "C" block above the main function then declares the three C library functions called in the main function below. Each call uses the standard C function's name, but each call must occur within an unsafe block. As every programmer new to Rust discovers, the Rust compiler enforces memory safety with a vengeance. Other languages (in particular, C and C++) do not make the same guarantees. The unsafe block thus says: Rust takes no responsibility for whatever unsafe operations may occur in the external call.

The first program's output is:

Absolute value of -123: 123.
9 raised to 3: 729
Square root of 64: 8.
Square root of -3.14: NaN.

In the last output line, the NaN stands for Not a Number: the C sqrt library function expects a non-negative value as its argument, which means that the argument -3.14 generates NaN as the returned value.

Calling C functions involving pointers

C library functions in security, networking, string processing, memory management, and other areas regularly use pointers for efficiency. For example, the library function asctime (time as an ASCII string) expects a pointer to a structure as its single argument. A Rust call to a C function such as asctime is thus trickier than a call to sqrt, which involves neither pointers nor structures.

The C structure for the asctime function call is of type struct tm. A pointer to such a structure also is passed to library function mktime (make a time value). The structure breaks a time into units such as the year, the month, the hour, and so forth. The structure's fields are of type time_t, an alias for for either int (32 bits) or long (64 bits). The two library functions combine these broken-apart time pieces into a single value: asctime returns a string representation of the time, whereas mktime returns a time_t value that represents the number of elapsed seconds since the epoch, which is a time relative to which a system's clock and timestamp are determined. Typical epoch settings are January 1 00:00:00 (zero hours, minutes, and seconds) of either 1900 or 1970.

The C program below calls asctime and mktime, and uses another library function strftime to convert the mktime returned value into a formatted string. This program acts as a warm-up for the Rust version:

#include <stdio.h>
#include <time.h>

int main () {
  struct tm sometime;  /* time broken out in detail */
  char buffer[80];
  int utc;

  sometime.tm_sec = 1;
  sometime.tm_min = 1;
  sometime.tm_hour = 1;
  sometime.tm_mday = 1;
  sometime.tm_mon = 1;
  sometime.tm_year = 1;
  sometime.tm_hour = 1;
  sometime.tm_wday = 1;
  sometime.tm_yday = 1;

  printf("Date and time: %s\n", asctime(&sometime));

  utc = mktime(&sometime);
  if( utc < 0 ) {
    fprintf(stderr, "Error: unable to make time using mktime\n");
  } else {
    printf("The integer value returned: %d\n", utc);
    strftime(buffer, sizeof(buffer), "%c", &sometime);
    printf("A more readable version: %s\n", buffer);
  }

  return 0;
}

The program outputs:

Date and time: Fri Feb  1 01:01:01 1901
The integer value returned: 2120218157
A more readable version: Fri Feb  1 01:01:01 1901

In summary, the Rust calls to library functions asctime and mktime must deal with two issues:

Passing a raw pointer as the single argument to each library function.

Converting the C string returned from asctime into a Rust string.

Rust calls to asctime and mktime

The bindgen utility generates Rust support code from C header files such as math.h and time.h. In this example, a simplified version of time.h will do but with two changes from the original:

The built-in type int is used instead of the alias type time_t. The bindgen utility can handle the time_t type but generates some distracting warnings along the way because time_t does not follow Rust naming conventions: in time_t an underscore separates the t at the end from the time that comes first; Rust would prefer a CamelCase name such as TimeT.

The type struct tm type is given StructTM as an alias for the same reason.

Here is the simplified header file with declarations for mktime and asctime at the bottom:

typedef struct tm {
    int tm_sec;    /* seconds */
    int tm_min;    /* minutes */
    int tm_hour;   /* hours */
    int tm_mday;   /* day of the month */
    int tm_mon;    /* month */
    int tm_year;   /* year */
    int tm_wday;   /* day of the week */
    int tm_yday;   /* day in the year */
    int tm_isdst;  /* daylight saving time */
} StructTM;

extern int mktime(StructTM*);
extern char* asctime(StructTM*);

With bindgen installed, % as the command-line prompt, and mytime.h as the header file above, the following command generates the required Rust code and saves it in the file mytime.rs:

% bindgen mytime.h > mytime.rs

Here is the relevant part of mytime.rs:

/* automatically generated by rust-bindgen 0.61.0 */

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct tm {
    pub tm_sec: ::std::os::raw::c_int,
    pub tm_min: ::std::os::raw::c_int,
    pub tm_hour: ::std::os::raw::c_int,
    pub tm_mday: ::std::os::raw::c_int,
    pub tm_mon: ::std::os::raw::c_int,
    pub tm_year: ::std::os::raw::c_int,
    pub tm_wday: ::std::os::raw::c_int,
    pub tm_yday: ::std::os::raw::c_int,
    pub tm_isdst: ::std::os::raw::c_int,
}

pub type StructTM = tm;

extern "C" {
    pub fn mktime(arg1: *mut StructTM) -> ::std::os::raw::c_int;
}

extern "C" {
    pub fn asctime(arg1: *mut StructTM) -> *mut ::std::os::raw::c_char;
}

#[test]
fn bindgen_test_layout_tm() {
    const UNINIT: ::std::mem::MaybeUninit<tm> =
       ::std::mem::MaybeUninit::uninit();
    let ptr = UNINIT.as_ptr();
    assert_eq!(
        ::std::mem::size_of::<tm>(),
        36usize,
        concat!("Size of: ", stringify!(tm))
    );
    ...

The Rust structure struct tm, like the C original, contains nine 4-byte integer fields. The field names are the same in C and Rust. The extern "C" blocks declare the library functions asctime and mktime as taking one argument apiece, a raw pointer to a mutable StructTM instance. (The library functions may mutate the structure via the pointer passed as an argument.)

The remaining code, under the #[test] attribute, tests the layout of the Rust version of the time structure. The test can be run with the cargo test command. At issue is that C does not specify how the compiler must lay out the fields of a structure. For example, the C struct tm starts out with the field tm_sec for the second; but C does not require that the compiled version has this field as the first. In any case, the Rust tests should succeed and the Rust calls to the library functions should work as expected.

Getting the second example up and running

The code generated from bindgen does not include a main function and, therefore, is a natural module. Below is the main function with the StructTM initialization and the calls to asctime and mktime:

mod mytime;
use mytime::*;
use std::ffi::CStr;

fn main() {
    let mut sometime  = StructTM {
        tm_year: 1,
        tm_mon: 1,
        tm_mday: 1,
        tm_hour: 1,
        tm_min: 1,
        tm_sec: 1,
        tm_isdst: -1,
        tm_wday: 1,
        tm_yday: 1
    };

    unsafe {
        let c_ptr = &mut sometime; // raw pointer

        // make the call, convert and then own
        // the returned C string
        let char_ptr = asctime(c_ptr);
        let c_str = CStr::from_ptr(char_ptr);
        println!("{:#?}", c_str.to_str());

        let utc = mktime(c_ptr);
        println!("{}", utc);
    }
}

The Rust code can be compiled (using either rustc directly or cargo) and then run. The output is:

Ok(
    "Mon Feb  1 01:01:01 1901\n",
)
2120218157

The calls to the C functions asctime and mktime again must occur inside an unsafe block, as the Rust compiler cannot be held responsible for any memory-safety mischief in these external functions. For the record, asctime and mktime are well behaved. In the calls to both functions, the argument is the raw pointer ptr, which holds the (stack) address of the sometime structure.

The call to asctime is the trickier of the two calls because this function returns a pointer to a C char, the character M in Mon of the text output. Yet the Rust compiler does not know where the C string (the null-terminated array of char) is stored. In the static area of memory? On the heap? The array used by the asctime function to store the text representation of the time is, in fact, in the static area of memory. In any case, the C-to-Rust string conversion is done in two steps to avoid compile-time errors:

The call Cstr::from_ptr(char_ptr) converts the C string to a Rust string and returns a reference stored in the c_str variable.

The call to c_str.to_str() ensures that c_str is the owner.

The Rust code does not generate a human-readable version of the integer value returned from mktime, which is left as an exercise for the interested reader. The chrono crate accepts strftime-style format strings (documented in its chrono::format::strftime module), which can be used much like the C function of the same name to get a text representation of the time.
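One way to approach the exercise without any external crate is sketched below. It assumes the integer is a count of seconds since the Unix epoch and renders it in UTC only, with no time zone or daylight saving handling. The function names civil_from_days and format_utc are hypothetical, and the date arithmetic is the widely used days-to-civil-date conversion for the proleptic Gregorian calendar, not anything provided by bindgen or the C library.

```rust
/// Convert days since 1970-01-01 into (year, month, day)
/// in the proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era: 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1; // 1..=31
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // 1..=12
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format seconds since the Unix epoch as a UTC timestamp.
fn format_utc(secs: i64) -> String {
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400); // seconds into the day
    let (year, month, day) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02} UTC",
        year,
        month,
        day,
        rem / 3_600,
        (rem % 3_600) / 60,
        rem % 60
    )
}

fn main() {
    // The integer printed by the program above.
    println!("{}", format_utc(2_120_218_157)); // prints "2037-03-09 13:29:17 UTC"
}
```

Because mktime interprets its input as local time, a local-time rendering of the same value can differ from this UTC string.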

Calling C with FFI and bindgen

The Rust FFI and the bindgen utility are well designed for making Rust calls out to C libraries, whether standard or third-party. Rust talks readily to C and thereby to any other language that talks to C. For calling relatively simple library functions such as sqrt, the Rust FFI is straightforward because Rust's primitive data types cover their C counterparts.

For more complicated interchanges, in particular Rust calls to C library functions such as asctime and mktime that involve structures and pointers, the bindgen utility is the way to go. This utility generates the support code together with appropriate tests. Of course, the Rust compiler cannot assume that C code measures up to Rust standards when it comes to memory safety; hence, calls from Rust to C must occur in unsafe blocks.

Original article source at: https://opensource.com/


How to Make PHP Functions Return an Array

Learn how you can make PHP functions return an array with code examples

A PHP function can return only one value using the return statement.

The return value from a PHP function can be of any type, including arrays and objects.

To return an array from a PHP function:

  • you can create the array next to the return keyword
  • you can return a variable of array type

Here’s an example of making a PHP function return an array:

<?php
// 👇 1. return a literal array
function get_names() {
    return ["Nathan", "Jenny"];
}

$names = get_names();
print $names[0]. "\r\n"; // Nathan
print $names[1]. "\r\n"; // Jenny

// 👇 2. return an array variable
function get_weathers() {
    $weathers = ["Sunny", "Cloudy", "Rainy"];
    return $weathers;
}

$var = get_weathers();
print $var[0]. "\r\n"; // Sunny
print $var[1]; // Cloudy
?>

As you can see, it’s very easy to make a PHP function return an array.

Original article source at: https://sebhastian.com/
