Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.10 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files. You can download and install (or update to) the latest release of Whisper with the following command:
pip install -U openai-whisper
Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/openai/whisper.git
To update the package to the latest version of this repository, please run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
You may need Rust installed as well, in case tokenizers does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:
pip install setuptools-rust
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by language on the Fleurs dataset using the large-v2 model. More WER and BLEU scores corresponding to the other models and datasets can be found in Appendix D of the paper. Lower WER is better.
The following command will transcribe speech in audio files, using the medium model:
whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
whisper japanese.wav --language Japanese
Adding --task translate will translate the speech into English:
whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:
whisper --help
See tokenizer.py for the list of all available languages.
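If you prefer to inspect the supported languages programmatically, here is a small sketch; it assumes the LANGUAGES mapping exposed by whisper.tokenizer, which may differ between versions:
from whisper.tokenizer import LANGUAGES

# LANGUAGES maps language codes to names, e.g. "en" -> "english"
for code, name in sorted(LANGUAGES.items()):
    print(f"{code}: {name}")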
Transcription can also be performed within Python:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
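For reference, transcribe() also forwards decoding options passed as keyword arguments; below is a minimal sketch mirroring the CLI examples above (the file name is a placeholder, and the exact set of accepted keywords may vary between versions):
import whisper

model = whisper.load_model("medium")
# decoding options such as language and task can be passed as keyword arguments;
# "japanese.wav" is a placeholder path
result = model.transcribe("japanese.wav", language="ja", task="translate", fp16=False)
print(result["text"])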
Below is an example usage of whisper.detect_language() and whisper.decode(), which provide lower-level access to the model.
import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
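DecodingOptions can also be constructed with explicit settings; this short sketch continues the example above (the option names here are assumed from the version described and may change between releases):
# e.g. force English output without timestamps, using FP32 on CPU
options = whisper.DecodingOptions(language="en", without_timestamps=True, fp16=False)
result = whisper.decode(model, mel, options)
print(result.text)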
Please use the Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.
Author: OpenAI
Source Code: https://github.com/openai/whisper
License: MIT license
This is a Python wrapper for TA-LIB based on Cython instead of SWIG. From the homepage:
TA-Lib is widely used by trading software developers requiring to perform technical analysis of financial market data.
- Includes 150+ indicators such as ADX, MACD, RSI, Stochastic, Bollinger Bands, etc.
- Candlestick pattern recognition
- Open-source API for C/C++, Java, Perl, Python and 100% Managed .NET
The original Python bindings included with TA-Lib use SWIG, which unfortunately makes them difficult to install and less efficient than they could be. Therefore this project uses Cython and Numpy to efficiently and cleanly bind to TA-Lib -- producing results 2-4 times faster than the SWIG interface.
In addition, this project also supports the use of the Polars and Pandas libraries.
You can install from PyPI:
$ python3 -m pip install TA-Lib
Or check out the sources and run setup.py yourself:
$ python setup.py install
It also appears possible to install via Conda Forge:
$ conda install -c conda-forge ta-lib
To use TA-Lib for Python, you need to have the underlying TA-Lib C library already installed. You should probably follow their installation directions for your platform, but some suggestions are included below for reference.
Some Conda Forge users have reported success installing the underlying TA-Lib C library using the libta-lib package:
$ conda install -c conda-forge libta-lib
Mac OS X
You can simply install using Homebrew:
$ brew install ta-lib
If you are using Apple Silicon, such as the M1 processors, and building mixed architecture Homebrew projects, you might want to make sure it's being built for your architecture:
$ arch -arm64 brew install ta-lib
You may also need to set these environment variables before installing with pip:
$ export TA_INCLUDE_PATH="$(brew --prefix ta-lib)/include"
$ export TA_LIBRARY_PATH="$(brew --prefix ta-lib)/lib"
You might also find this helpful, particularly if you have tried several different installations without success:
$ your-arm64-python -m pip install --no-cache-dir ta-lib
Windows
Download ta-lib-0.4.0-msvc.zip and unzip to C:\ta-lib.
This is a 32-bit binary release. If you want to use 64-bit Python, you will need to build a 64-bit version of the library. Some unofficial (and unsupported) instructions for building on 64-bit Windows 10, here for reference:
- Download and unzip ta-lib-0.4.0-msvc.zip
- Move the unzipped folder ta-lib to C:\
- Download and install Visual Studio Community (2015 or later); remember to select the [Visual C++] feature
- Build the TA-Lib library: from the Windows Start Menu, start the [VS2015 x64 Native Tools Command Prompt], move to C:\ta-lib\c\make\cdr\win32\msvc, and build the library with nmake
You might also try these unofficial windows binaries for both 32-bit and 64-bit:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#ta-lib
Linux
Download ta-lib-0.4.0-src.tar.gz and:
$ tar -xzf ta-lib-0.4.0-src.tar.gz
$ cd ta-lib/
$ ./configure --prefix=/usr
$ make
$ sudo make install
If you build TA-Lib using make -jX it will fail, but that's OK! Simply rerun make -jX followed by [sudo] make install.
Note: if your directory path includes spaces, the installation will probably fail with No such file or directory errors.
If you get a warning that looks like this:
setup.py:79: UserWarning: Cannot find ta-lib library, installation may fail.
warnings.warn('Cannot find ta-lib library, installation may fail.')
This typically means setup.py can't find the underlying TA-Lib library, a dependency which needs to be installed.
If you installed the underlying TA-Lib
library with a custom prefix (e.g., with ./configure --prefix=$PREFIX
), then when you go to install this python wrapper you can specify additional search paths to find the library and include files for the underlying TA-Lib
library using the TA_LIBRARY_PATH
and TA_INCLUDE_PATH
environment variables:
$ export TA_LIBRARY_PATH=$PREFIX/lib
$ export TA_INCLUDE_PATH=$PREFIX/include
$ python setup.py install # or pip install ta-lib
Sometimes installation will produce build errors like this:
talib/_ta_lib.c:601:10: fatal error: ta-lib/ta_defs.h: No such file or directory
601 | #include "ta-lib/ta_defs.h"
| ^~~~~~~~~~~~~~~~~~
compilation terminated.
or:
common.obj : error LNK2001: unresolved external symbol TA_SetUnstablePeriod
common.obj : error LNK2001: unresolved external symbol TA_Shutdown
common.obj : error LNK2001: unresolved external symbol TA_Initialize
common.obj : error LNK2001: unresolved external symbol TA_GetUnstablePeriod
common.obj : error LNK2001: unresolved external symbol TA_GetVersionString
This typically means that it can't find the underlying TA-Lib
library, a dependency which needs to be installed. On Windows, this could be caused by installing the 32-bit binary distribution of the underlying TA-Lib
library, but trying to use it with 64-bit Python.
Sometimes installation will fail with errors like this:
talib/common.c:8:22: fatal error: pyconfig.h: No such file or directory
#include "pyconfig.h"
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
This typically means that you need the Python headers, and should run something like:
$ sudo apt-get install python3-dev
Sometimes building the underlying TA-Lib
library has errors running make
that look like this:
../libtool: line 1717: cd: .libs/libta_lib.lax/libta_abstract.a: No such file or directory
make[2]: *** [libta_lib.la] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
This might mean that the directory path to the underlying TA-Lib
library has spaces in the directory names. Try putting it in a path that does not have any spaces and trying again.
Sometimes you might get this error running setup.py
:
/usr/include/limits.h:26:10: fatal error: bits/libc-header-start.h: No such file or directory
#include <bits/libc-header-start.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~
This is likely an issue with trying to compile for 32-bit platform but without the appropriate headers. You might find some success looking at the first answer to this question.
If you get an error on macOS like this:
code signature in <141BC883-189B-322C-AE90-CBF6B5206F67>
'python3.9/site-packages/talib/_ta_lib.cpython-39-darwin.so' not valid for
use in process: Trying to load an unsigned library)
You might look at this question and use xcrun codesign
to fix it.
If you wonder why STOCHRSI gives you different results than you expect, you probably want STOCH applied to RSI, which is a little different from STOCHRSI, which is STOCHF applied to RSI:
>>> import talib
>>> import numpy as np
>>> c = np.random.randn(100)
# this is the library function
>>> k, d = talib.STOCHRSI(c)
# this produces the same result, calling STOCHF
>>> rsi = talib.RSI(c)
>>> k, d = talib.STOCHF(rsi, rsi, rsi)
# you might want this instead, calling STOCH
>>> rsi = talib.RSI(c)
>>> k, d = talib.STOCH(rsi, rsi, rsi)
If the build appears to hang, you might be running on a VM with not enough memory -- try 1 GB or 2 GB.
If you get "permission denied" errors such as this, you might need to give your user access to the location where the underlying TA-Lib C library is installed -- or install it to a user-accessible location.
talib/_ta_lib.c:747:28: fatal error: /usr/include/ta-lib/ta_defs.h: Permission denied
#include "ta-lib/ta-defs.h"
^
compilation terminated
error: command 'gcc' failed with exit status 1
Similar to TA-Lib, the Function API provides a lightweight wrapper of the exposed TA-Lib indicators.
Each function returns an output array and has default values for its parameters, unless specified as keyword arguments. Typically, these functions have an initial "lookback" period (a required number of observations before an output is generated) for which the output is set to NaN.
For convenience, the Function API supports numpy.ndarray, pandas.Series, and polars.Series inputs.
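As a quick illustration of that (a minimal sketch; it assumes pandas is installed and relies on the Series-in/Series-out behaviour described above):
import numpy as np
import pandas as pd
import talib

close = pd.Series(np.random.random(100))
# with a pandas.Series input, the result should come back as a pandas.Series
sma = talib.SMA(close, timeperiod=10)
print(type(sma))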
All of the following examples use the Function API:
import numpy as np
import talib
close = np.random.random(100)
Calculate a simple moving average of the close prices:
output = talib.SMA(close)
Calculating bollinger bands, with triple exponential moving average:
from talib import MA_Type
upper, middle, lower = talib.BBANDS(close, matype=MA_Type.T3)
Calculating momentum of the close prices, with a time period of 5:
output = talib.MOM(close, timeperiod=5)
NaN's
The underlying TA-Lib C library handles NaN's in a sometimes surprising manner by typically propagating NaN's to the end of the output, for example:
>>> c = np.array([1.0, 2.0, 3.0, np.nan, 4.0, 5.0, 6.0])
>>> talib.SMA(c, 3)
array([nan, nan, 2., nan, nan, nan, nan])
You can compare that to a Pandas rolling mean, where their approach is to output NaN until enough "lookback" values are observed to generate new outputs:
>>> c = pandas.Series([1.0, 2.0, 3.0, np.nan, 4.0, 5.0, 6.0])
>>> c.rolling(3).mean()
0 NaN
1 NaN
2 2.0
3 NaN
4 NaN
5 NaN
6 5.0
dtype: float64
If you're already familiar with using the function API, you should feel right at home using the Abstract API.
Every function takes a collection of named inputs: either a dict of numpy.ndarray, pandas.Series, or polars.Series, or a pandas.DataFrame or polars.DataFrame. If a pandas.DataFrame or polars.DataFrame is provided, the output is returned as the same type with named output columns.
For example, inputs could be provided for the typical "OHLCV" data:
import numpy as np
# note that all ndarrays must be the same length!
inputs = {
'open': np.random.random(100),
'high': np.random.random(100),
'low': np.random.random(100),
'close': np.random.random(100),
'volume': np.random.random(100)
}
Functions can either be imported directly or instantiated by name:
from talib import abstract
# directly
SMA = abstract.SMA
# or by name
SMA = abstract.Function('sma')
From there, calling functions is basically the same as the function API:
from talib.abstract import *
# uses close prices (default)
output = SMA(inputs, timeperiod=25)
# uses open prices
output = SMA(inputs, timeperiod=25, price='open')
# uses close prices (default)
upper, middle, lower = BBANDS(inputs, 20, 2, 2)
# uses high, low, close (default)
slowk, slowd = STOCH(inputs, 5, 3, 0, 3, 0) # uses high, low, close by default
# uses high, low, open instead
slowk, slowd = STOCH(inputs, 5, 3, 0, 3, 0, prices=['high', 'low', 'open'])
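And, as noted above, passing a pandas.DataFrame should return a DataFrame with named output columns; a small sketch under that assumption:
import numpy as np
import pandas as pd
from talib import abstract

ohlcv = pd.DataFrame({
    'open': np.random.random(100),
    'high': np.random.random(100),
    'low': np.random.random(100),
    'close': np.random.random(100),
    'volume': np.random.random(100),
})
# per the description above, the output is a DataFrame with named columns
bbands = abstract.BBANDS(ohlcv, timeperiod=20)
print(list(bbands.columns))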
An experimental Streaming API was added that allows users to compute the latest value of an indicator. This can be faster than using the Function API, for example in an application that receives streaming data, and wants to know just the most recent updated indicator value.
import numpy as np
import talib
from talib import stream

close = np.random.random(100)
# the Function API
output = talib.SMA(close)
# the Streaming API
latest = stream.SMA(close)
# the latest value is the same as the last output value
assert abs(output[-1] - latest) < 0.00001
We can show all the TA functions supported by TA-Lib, either as a list or as a dict sorted by group (e.g. "Overlap Studies", "Momentum Indicators", etc.):
import talib
# list of functions
print(talib.get_functions())
# dict of functions by group
print(talib.get_function_groups())
Overlap Studies
BBANDS Bollinger Bands
DEMA Double Exponential Moving Average
EMA Exponential Moving Average
HT_TRENDLINE Hilbert Transform - Instantaneous Trendline
KAMA Kaufman Adaptive Moving Average
MA Moving average
MAMA MESA Adaptive Moving Average
MAVP Moving average with variable period
MIDPOINT MidPoint over period
MIDPRICE Midpoint Price over period
SAR Parabolic SAR
SAREXT Parabolic SAR - Extended
SMA Simple Moving Average
T3 Triple Exponential Moving Average (T3)
TEMA Triple Exponential Moving Average
TRIMA Triangular Moving Average
WMA Weighted Moving Average
Momentum Indicators
ADX Average Directional Movement Index
ADXR Average Directional Movement Index Rating
APO Absolute Price Oscillator
AROON Aroon
AROONOSC Aroon Oscillator
BOP Balance Of Power
CCI Commodity Channel Index
CMO Chande Momentum Oscillator
DX Directional Movement Index
MACD Moving Average Convergence/Divergence
MACDEXT MACD with controllable MA type
MACDFIX Moving Average Convergence/Divergence Fix 12/26
MFI Money Flow Index
MINUS_DI Minus Directional Indicator
MINUS_DM Minus Directional Movement
MOM Momentum
PLUS_DI Plus Directional Indicator
PLUS_DM Plus Directional Movement
PPO Percentage Price Oscillator
ROC Rate of change : ((price/prevPrice)-1)*100
ROCP Rate of change Percentage: (price-prevPrice)/prevPrice
ROCR Rate of change ratio: (price/prevPrice)
ROCR100 Rate of change ratio 100 scale: (price/prevPrice)*100
RSI Relative Strength Index
STOCH Stochastic
STOCHF Stochastic Fast
STOCHRSI Stochastic Relative Strength Index
TRIX 1-day Rate-Of-Change (ROC) of a Triple Smooth EMA
ULTOSC Ultimate Oscillator
WILLR Williams' %R
Volume Indicators
AD Chaikin A/D Line
ADOSC Chaikin A/D Oscillator
OBV On Balance Volume
Cycle Indicators
HT_DCPERIOD Hilbert Transform - Dominant Cycle Period
HT_DCPHASE Hilbert Transform - Dominant Cycle Phase
HT_PHASOR Hilbert Transform - Phasor Components
HT_SINE Hilbert Transform - SineWave
HT_TRENDMODE Hilbert Transform - Trend vs Cycle Mode
Price Transform
AVGPRICE Average Price
MEDPRICE Median Price
TYPPRICE Typical Price
WCLPRICE Weighted Close Price
Volatility Indicators
ATR Average True Range
NATR Normalized Average True Range
TRANGE True Range
Pattern Recognition
CDL2CROWS Two Crows
CDL3BLACKCROWS Three Black Crows
CDL3INSIDE Three Inside Up/Down
CDL3LINESTRIKE Three-Line Strike
CDL3OUTSIDE Three Outside Up/Down
CDL3STARSINSOUTH Three Stars In The South
CDL3WHITESOLDIERS Three Advancing White Soldiers
CDLABANDONEDBABY Abandoned Baby
CDLADVANCEBLOCK Advance Block
CDLBELTHOLD Belt-hold
CDLBREAKAWAY Breakaway
CDLCLOSINGMARUBOZU Closing Marubozu
CDLCONCEALBABYSWALL Concealing Baby Swallow
CDLCOUNTERATTACK Counterattack
CDLDARKCLOUDCOVER Dark Cloud Cover
CDLDOJI Doji
CDLDOJISTAR Doji Star
CDLDRAGONFLYDOJI Dragonfly Doji
CDLENGULFING Engulfing Pattern
CDLEVENINGDOJISTAR Evening Doji Star
CDLEVENINGSTAR Evening Star
CDLGAPSIDESIDEWHITE Up/Down-gap side-by-side white lines
CDLGRAVESTONEDOJI Gravestone Doji
CDLHAMMER Hammer
CDLHANGINGMAN Hanging Man
CDLHARAMI Harami Pattern
CDLHARAMICROSS Harami Cross Pattern
CDLHIGHWAVE High-Wave Candle
CDLHIKKAKE Hikkake Pattern
CDLHIKKAKEMOD Modified Hikkake Pattern
CDLHOMINGPIGEON Homing Pigeon
CDLIDENTICAL3CROWS Identical Three Crows
CDLINNECK In-Neck Pattern
CDLINVERTEDHAMMER Inverted Hammer
CDLKICKING Kicking
CDLKICKINGBYLENGTH Kicking - bull/bear determined by the longer marubozu
CDLLADDERBOTTOM Ladder Bottom
CDLLONGLEGGEDDOJI Long Legged Doji
CDLLONGLINE Long Line Candle
CDLMARUBOZU Marubozu
CDLMATCHINGLOW Matching Low
CDLMATHOLD Mat Hold
CDLMORNINGDOJISTAR Morning Doji Star
CDLMORNINGSTAR Morning Star
CDLONNECK On-Neck Pattern
CDLPIERCING Piercing Pattern
CDLRICKSHAWMAN Rickshaw Man
CDLRISEFALL3METHODS Rising/Falling Three Methods
CDLSEPARATINGLINES Separating Lines
CDLSHOOTINGSTAR Shooting Star
CDLSHORTLINE Short Line Candle
CDLSPINNINGTOP Spinning Top
CDLSTALLEDPATTERN Stalled Pattern
CDLSTICKSANDWICH Stick Sandwich
CDLTAKURI Takuri (Dragonfly Doji with very long lower shadow)
CDLTASUKIGAP Tasuki Gap
CDLTHRUSTING Thrusting Pattern
CDLTRISTAR Tristar Pattern
CDLUNIQUE3RIVER Unique 3 River
CDLUPSIDEGAP2CROWS Upside Gap Two Crows
CDLXSIDEGAP3METHODS Upside/Downside Gap Three Methods
Statistic Functions
BETA Beta
CORREL Pearson's Correlation Coefficient (r)
LINEARREG Linear Regression
LINEARREG_ANGLE Linear Regression Angle
LINEARREG_INTERCEPT Linear Regression Intercept
LINEARREG_SLOPE Linear Regression Slope
STDDEV Standard Deviation
TSF Time Series Forecast
VAR Variance
Author: TA-Lib
Source Code: https://github.com/TA-Lib/ta-lib-python
License: View license
flutter_plugin
A new flutter plugin project.
This project is a starting point for a Flutter plug-in package, a specialized package that includes platform-specific implementation code for Android and/or iOS.
For help getting started with Flutter, view our online documentation, which offers tutorials, samples, guidance on mobile development, and a full API reference.
Run this command:
With Flutter:
$ flutter pub add flutter_image_recognition
This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):
dependencies:
flutter_image_recognition: ^0.0.1
Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.
Now in your Dart code, you can use:
import 'package:flutter_image_recognition/flutter_plugin.dart';
import 'package:flutter/material.dart';
import 'dart:async';
import 'package:flutter/services.dart';
import 'package:flutter_plugin/flutter_plugin.dart';
void main() {
runApp(const MyApp());
}
class MyApp extends StatefulWidget {
const MyApp({Key? key}) : super(key: key);
@override
State<MyApp> createState() => _MyAppState();
}
class _MyAppState extends State<MyApp> {
String _platformVersion = 'Unknown';
@override
void initState() {
super.initState();
initPlatformState();
}
// Platform messages are asynchronous, so we initialize in an async method.
Future<void> initPlatformState() async {
String platformVersion;
// Platform messages may fail, so we use a try/catch PlatformException.
// We also handle the message potentially returning null.
try {
platformVersion =
await FlutterPlugin.platformVersion ?? 'Unknown platform version';
} on PlatformException {
platformVersion = 'Failed to get platform version.';
}
// If the widget was removed from the tree while the asynchronous platform
// message was in flight, we want to discard the reply rather than calling
// setState to update our non-existent appearance.
if (!mounted) return;
setState(() {
_platformVersion = platformVersion;
});
}
@override
Widget build(BuildContext context) {
return MaterialApp(
home: Scaffold(
appBar: AppBar(
title: const Text('Plugin example app'),
),
body: Center(
child: Column(
children: [
Text('Running on: $_platformVersion\n'),
MaterialButton(
onPressed: () async {
String? message = await FlutterPlugin.sayHello('Hello');
print('Data returned from iOS: $message');
},
child: const Text('sayHello'),
color: Colors.yellowAccent,
)
],
),
),
),
);
}
}
Download Details:
Author: Lawlignt
Source Code: https://github.com/Lawlignt/flutter_plugin
JavaScript face recognition API for the browser and nodejs implemented on top of tensorflow.js core (tensorflow/tfjs-core)
Running the Examples
Clone the repository:
git clone https://github.com/justadudewhohacks/face-api.js.git
cd face-api.js/examples/examples-browser
npm i
npm start
Browse to http://localhost:3000/.
cd face-api.js/examples/examples-nodejs
npm i
Now run one of the examples using ts-node:
ts-node faceDetection.ts
Or simply compile and run them with node:
tsc faceDetection.ts
node faceDetection.js
face-api.js for the Browser
Simply include the latest script from dist/face-api.js.
Or install it via npm:
npm i face-api.js
face-api.js for Nodejs
We can use the equivalent API in a nodejs environment by polyfilling some browser specifics, such as HTMLImageElement, HTMLCanvasElement and ImageData. The easiest way to do so is by installing the node-canvas package.
Alternatively you can simply construct your own tensors from image data and pass tensors as inputs to the API.
Furthermore you want to install @tensorflow/tfjs-node (not required, but highly recommended), which speeds things up drastically by compiling and binding to the native Tensorflow C++ library:
npm i face-api.js canvas @tensorflow/tfjs-node
Now we simply monkey patch the environment to use the polyfills:
// import nodejs bindings to native tensorflow,
// not required, but will speed up things drastically (python required)
import '@tensorflow/tfjs-node';
// implements nodejs wrappers for HTMLCanvasElement, HTMLImageElement, ImageData
import * as canvas from 'canvas';
import * as faceapi from 'face-api.js';
// patch nodejs environment, we need to provide an implementation of
// HTMLCanvasElement and HTMLImageElement
const { Canvas, Image, ImageData } = canvas
faceapi.env.monkeyPatch({ Canvas, Image, ImageData })
Getting Started
All global neural network instances are exported via faceapi.nets:
console.log(faceapi.nets)
// ageGenderNet
// faceExpressionNet
// faceLandmark68Net
// faceLandmark68TinyNet
// faceRecognitionNet
// ssdMobilenetv1
// tinyFaceDetector
// tinyYolov2
To load a model, you have to provide the corresponding manifest.json file as well as the model weight files (shards) as assets. Simply copy them to your public or assets folder. The manifest.json and shard files of a model have to be located in the same directory / accessible under the same route.
Assuming the models reside in public/models:
await faceapi.nets.ssdMobilenetv1.loadFromUri('/models')
// accordingly for the other models:
// await faceapi.nets.faceLandmark68Net.loadFromUri('/models')
// await faceapi.nets.faceRecognitionNet.loadFromUri('/models')
// ...
In a nodejs environment you can furthermore load the models directly from disk:
await faceapi.nets.ssdMobilenetv1.loadFromDisk('./models')
You can also load the model from a tf.NamedTensorMap:
await faceapi.nets.ssdMobilenetv1.loadFromWeightMap(weightMap)
Alternatively, you can also create own instances of the neural nets:
const net = new faceapi.SsdMobilenetv1()
await net.loadFromUri('/models')
You can also load the weights as a Float32Array (in case you want to use the uncompressed models):
// using fetch
net.load(await faceapi.fetchNetWeights('/models/face_detection_model.weights'))
// using axios
const res = await axios.get('/models/face_detection_model.weights', { responseType: 'arraybuffer' })
const weights = new Float32Array(res.data)
net.load(weights)
In the following, input can be an HTML img, video or canvas element, or the id of that element.
<img id="myImg" src="images/example.png" />
<video id="myVideo" src="media/example.mp4" />
<canvas id="myCanvas" />
const input = document.getElementById('myImg')
// const input = document.getElementById('myVideo')
// const input = document.getElementById('myCanvas')
// or simply:
// const input = 'myImg'
Detect all faces in an image. Returns Array<FaceDetection>:
const detections = await faceapi.detectAllFaces(input)
Detect the face with the highest confidence score in an image. Returns FaceDetection | undefined:
const detection = await faceapi.detectSingleFace(input)
By default detectAllFaces and detectSingleFace utilize the SSD Mobilenet V1 Face Detector. You can specify the face detector by passing the corresponding options object:
const detections1 = await faceapi.detectAllFaces(input, new faceapi.SsdMobilenetv1Options())
const detections2 = await faceapi.detectAllFaces(input, new faceapi.TinyFaceDetectorOptions())
You can tune the options of each face detector as shown here.
After face detection, we can furthermore predict the facial landmarks for each detected face as follows:
Detect all faces in an image + computes 68 Point Face Landmarks for each detected face. Returns Array<WithFaceLandmarks<WithFaceDetection<{}>>>:
const detectionsWithLandmarks = await faceapi.detectAllFaces(input).withFaceLandmarks()
Detect the face with the highest confidence score in an image + computes 68 Point Face Landmarks for that face. Returns WithFaceLandmarks<WithFaceDetection<{}>> | undefined:
const detectionWithLandmarks = await faceapi.detectSingleFace(input).withFaceLandmarks()
You can also specify to use the tiny model instead of the default model:
const useTinyModel = true
const detectionsWithLandmarks = await faceapi.detectAllFaces(input).withFaceLandmarks(useTinyModel)
After face detection and facial landmark prediction the face descriptors for each face can be computed as follows:
Detect all faces in an image + compute 68 Point Face Landmarks for each detected face. Returns Array<WithFaceDescriptor<WithFaceLandmarks<WithFaceDetection<{}>>>>:
const results = await faceapi.detectAllFaces(input).withFaceLandmarks().withFaceDescriptors()
Detect the face with the highest confidence score in an image + compute 68 Point Face Landmarks and face descriptor for that face. Returns WithFaceDescriptor<WithFaceLandmarks<WithFaceDetection<{}>>> | undefined:
const result = await faceapi.detectSingleFace(input).withFaceLandmarks().withFaceDescriptor()
Face expression recognition can be performed for detected faces as follows:
Detect all faces in an image + recognize face expressions of each face. Returns Array<WithFaceExpressions<WithFaceLandmarks<WithFaceDetection<{}>>>>:
const detectionsWithExpressions = await faceapi.detectAllFaces(input).withFaceLandmarks().withFaceExpressions()
Detect the face with the highest confidence score in an image + recognize the face expressions for that face. Returns WithFaceExpressions<WithFaceLandmarks<WithFaceDetection<{}>>> | undefined:
const detectionWithExpressions = await faceapi.detectSingleFace(input).withFaceLandmarks().withFaceExpressions()
You can also skip .withFaceLandmarks(), which will skip the face alignment step (less stable accuracy):
Detect all faces without face alignment + recognize face expressions of each face. Returns Array<WithFaceExpressions<WithFaceDetection<{}>>>:
const detectionsWithExpressions = await faceapi.detectAllFaces(input).withFaceExpressions()
Detect the face with the highest confidence score without face alignment + recognize the face expression for that face. Returns WithFaceExpressions<WithFaceDetection<{}>> | undefined:
const detectionWithExpressions = await faceapi.detectSingleFace(input).withFaceExpressions()
Age estimation and gender recognition from detected faces can be done as follows:
Detect all faces in an image + estimate age and recognize gender of each face. Returns Array<WithAge<WithGender<WithFaceLandmarks<WithFaceDetection<{}>>>>>:
const detectionsWithAgeAndGender = await faceapi.detectAllFaces(input).withFaceLandmarks().withAgeAndGender()
Detect the face with the highest confidence score in an image + estimate age and recognize gender for that face. Returns WithAge<WithGender<WithFaceLandmarks<WithFaceDetection<{}>>>> | undefined:
const detectionWithAgeAndGender = await faceapi.detectSingleFace(input).withFaceLandmarks().withAgeAndGender()
You can also skip .withFaceLandmarks(), which will skip the face alignment step (less stable accuracy):
Detect all faces without face alignment + estimate age and recognize gender of each face. Returns Array<WithAge<WithGender<WithFaceDetection<{}>>>>:
const detectionsWithAgeAndGender = await faceapi.detectAllFaces(input).withAgeAndGender()
Detect the face with the highest confidence score without face alignment + estimate age and recognize gender for that face. Returns WithAge<WithGender<WithFaceDetection<{}>>> | undefined:
const detectionWithAgeAndGender = await faceapi.detectSingleFace(input).withAgeAndGender()
Tasks can be composed as follows:
// all faces
await faceapi.detectAllFaces(input)
await faceapi.detectAllFaces(input).withFaceExpressions()
await faceapi.detectAllFaces(input).withFaceLandmarks()
await faceapi.detectAllFaces(input).withFaceLandmarks().withFaceExpressions()
await faceapi.detectAllFaces(input).withFaceLandmarks().withFaceExpressions().withFaceDescriptors()
await faceapi.detectAllFaces(input).withFaceLandmarks().withAgeAndGender().withFaceDescriptors()
await faceapi.detectAllFaces(input).withFaceLandmarks().withFaceExpressions().withAgeAndGender().withFaceDescriptors()
// single face
await faceapi.detectSingleFace(input)
await faceapi.detectSingleFace(input).withFaceExpressions()
await faceapi.detectSingleFace(input).withFaceLandmarks()
await faceapi.detectSingleFace(input).withFaceLandmarks().withFaceExpressions()
await faceapi.detectSingleFace(input).withFaceLandmarks().withFaceExpressions().withFaceDescriptor()
await faceapi.detectSingleFace(input).withFaceLandmarks().withAgeAndGender().withFaceDescriptor()
await faceapi.detectSingleFace(input).withFaceLandmarks().withFaceExpressions().withAgeAndGender().withFaceDescriptor()
To perform face recognition, one can use faceapi.FaceMatcher to compare reference face descriptors to query face descriptors.
First, we initialize the FaceMatcher with the reference data, for example we can simply detect faces in a referenceImage and match the descriptors of the detected faces to faces of subsequent images:
const results = await faceapi
.detectAllFaces(referenceImage)
.withFaceLandmarks()
.withFaceDescriptors()
if (!results.length) {
return
}
// create FaceMatcher with automatically assigned labels
// from the detection results for the reference image
const faceMatcher = new faceapi.FaceMatcher(results)
Now we can recognize a person's face shown in queryImage1:
const singleResult = await faceapi
.detectSingleFace(queryImage1)
.withFaceLandmarks()
.withFaceDescriptor()
if (singleResult) {
const bestMatch = faceMatcher.findBestMatch(singleResult.descriptor)
console.log(bestMatch.toString())
}
Or we can recognize all faces shown in queryImage2:
const results = await faceapi
.detectAllFaces(queryImage2)
.withFaceLandmarks()
.withFaceDescriptors()
results.forEach(fd => {
const bestMatch = faceMatcher.findBestMatch(fd.descriptor)
console.log(bestMatch.toString())
})
You can also create labeled reference descriptors as follows:
const labeledDescriptors = [
new faceapi.LabeledFaceDescriptors(
'obama',
[descriptorObama1, descriptorObama2]
),
new faceapi.LabeledFaceDescriptors(
'trump',
[descriptorTrump]
)
]
const faceMatcher = new faceapi.FaceMatcher(labeledDescriptors)
Preparing the overlay canvas:
const displaySize = { width: input.width, height: input.height }
// resize the overlay canvas to the input dimensions
const canvas = document.getElementById('overlay')
faceapi.matchDimensions(canvas, displaySize)
face-api.js predefines some highlevel drawing functions, which you can utilize:
/* Display detected face bounding boxes */
const detections = await faceapi.detectAllFaces(input)
// resize the detected boxes in case your displayed image has a different size than the original
const resizedDetections = faceapi.resizeResults(detections, displaySize)
// draw detections into the canvas
faceapi.draw.drawDetections(canvas, resizedDetections)
/* Display face landmarks */
const detectionsWithLandmarks = await faceapi
.detectAllFaces(input)
.withFaceLandmarks()
// resize the detected boxes and landmarks in case your displayed image has a different size than the original
const resizedResults = faceapi.resizeResults(detectionsWithLandmarks, displaySize)
// draw detections into the canvas
faceapi.draw.drawDetections(canvas, resizedResults)
// draw the landmarks into the canvas
faceapi.draw.drawFaceLandmarks(canvas, resizedResults)
/* Display face expression results */
const detectionsWithExpressions = await faceapi
.detectAllFaces(input)
.withFaceLandmarks()
.withFaceExpressions()
// resize the detected boxes and landmarks in case your displayed image has a different size than the original
const resizedResults = faceapi.resizeResults(detectionsWithExpressions, displaySize)
// draw detections into the canvas
faceapi.draw.drawDetections(canvas, resizedResults)
// draw a textbox displaying the face expressions with minimum probability into the canvas
const minProbability = 0.05
faceapi.draw.drawFaceExpressions(canvas, resizedResults, minProbability)
You can also draw boxes with custom text (DrawBox):
const box = { x: 50, y: 50, width: 100, height: 100 }
// see DrawBoxOptions below
const drawOptions = {
label: 'Hello I am a box!',
lineWidth: 2
}
const drawBox = new faceapi.draw.DrawBox(box, drawOptions)
drawBox.draw(document.getElementById('myCanvas'))
DrawBox drawing options:
export interface IDrawBoxOptions {
boxColor?: string
lineWidth?: number
drawLabelOptions?: IDrawTextFieldOptions
label?: string
}
Finally you can draw custom text fields (DrawTextField):
const text = [
'This is a textline!',
'This is another textline!'
]
const anchor = { x: 200, y: 200 }
// see DrawTextField below
const drawOptions = {
anchorPosition: 'TOP_LEFT',
backgroundColor: 'rgba(0, 0, 0, 0.5)'
}
const drawBox = new faceapi.draw.DrawTextField(text, anchor, drawOptions)
drawBox.draw(document.getElementById('myCanvas'))
DrawTextField drawing options:
export interface IDrawTextFieldOptions {
anchorPosition?: AnchorPosition
backgroundColor?: string
fontColor?: string
fontSize?: number
fontStyle?: string
padding?: number
}
export enum AnchorPosition {
TOP_LEFT = 'TOP_LEFT',
TOP_RIGHT = 'TOP_RIGHT',
BOTTOM_LEFT = 'BOTTOM_LEFT',
BOTTOM_RIGHT = 'BOTTOM_RIGHT'
}
export interface ISsdMobilenetv1Options {
// minimum confidence threshold
// default: 0.5
minConfidence?: number
// maximum number of faces to return
// default: 100
maxResults?: number
}
// example
const options = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.8 })
export interface ITinyFaceDetectorOptions {
// size at which image is processed, the smaller the faster,
// but less precise in detecting smaller faces, must be divisible
// by 32, common sizes are 128, 160, 224, 320, 416, 512, 608,
// for face tracking via webcam I would recommend using smaller sizes,
// e.g. 128, 160, for detecting smaller faces use larger sizes, e.g. 512, 608
// default: 416
inputSize?: number
// minimum confidence threshold
// default: 0.5
scoreThreshold?: number
}
// example
const options = new faceapi.TinyFaceDetectorOptions({ inputSize: 320 })
export interface IBox {
x: number
y: number
width: number
height: number
}
export interface IFaceDetection {
score: number
box: Box
}
export interface IFaceLandmarks {
positions: Point[]
shift: Point
}
export type WithFaceDetection<TSource> = TSource & {
detection: FaceDetection
}
export type WithFaceLandmarks<TSource> = TSource & {
unshiftedLandmarks: FaceLandmarks
landmarks: FaceLandmarks
alignedRect: FaceDetection
}
export type WithFaceDescriptor<TSource> = TSource & {
descriptor: Float32Array
}
export type WithFaceExpressions<TSource> = TSource & {
expressions: FaceExpressions
}
export type WithAge<TSource> = TSource & {
age: number
}
export type WithGender<TSource> = TSource & {
gender: Gender
genderProbability: number
}
export enum Gender {
FEMALE = 'female',
MALE = 'male'
}
Instead of using the high level API, you can directly use the forward methods of each neural network:
const detections1 = await faceapi.ssdMobilenetv1(input, options)
const detections2 = await faceapi.tinyFaceDetector(input, options)
const landmarks1 = await faceapi.detectFaceLandmarks(faceImage)
const landmarks2 = await faceapi.detectFaceLandmarksTiny(faceImage)
const descriptor = await faceapi.computeFaceDescriptor(alignedFaceImage)
const regionsToExtract = [
new faceapi.Rect(0, 0, 100, 100)
]
// actually extractFaces is meant to extract face regions from bounding boxes
// but you can also use it to extract any other region
const canvases = await faceapi.extractFaces(input, regionsToExtract)
// meant to be used for computing the euclidean distance between two face descriptors
const dist = faceapi.euclideanDistance([0, 0], [0, 10])
console.log(dist) // 10
const landmarkPositions = landmarks.positions
// or get the positions of individual contours,
// only available for 68 point face landmarks (FaceLandmarks68)
const jawOutline = landmarks.getJawOutline()
const nose = landmarks.getNose()
const mouth = landmarks.getMouth()
const leftEye = landmarks.getLeftEye()
const rightEye = landmarks.getRightEye()
const leftEyeBrow = landmarks.getLeftEyeBrow()
const rightEyeBrow = landmarks.getRightEyeBrow()
<img id="myImg" src="">
const image = await faceapi.fetchImage('/images/example.png')
console.log(image instanceof HTMLImageElement) // true
// displaying the fetched image content
const myImg = document.getElementById('myImg')
myImg.src = image.src
const json = await faceapi.fetchJson('/files/example.json')
<img id="myImg" src="">
<input id="myFileUpload" type="file" onchange="uploadImage()" accept=".jpg, .jpeg, .png">
async function uploadImage() {
const imgFile = document.getElementById('myFileUpload').files[0]
// create an HTMLImageElement from a Blob
const img = await faceapi.bufferToImage(imgFile)
document.getElementById('myImg').src = img.src
}
<img id="myImg" src="images/example.png" />
<video id="myVideo" src="media/example.mp4" />
const canvas1 = faceapi.createCanvasFromMedia(document.getElementById('myImg'))
const canvas2 = faceapi.createCanvasFromMedia(document.getElementById('myVideo'))
Available Models
For face detection, this project implements an SSD (Single Shot Multibox Detector) based on MobileNetV1. The neural net computes the locations of each face in an image and returns the bounding boxes together with its probability for each face. This face detector aims at obtaining high accuracy in detecting face bounding boxes rather than low inference time. The size of the quantized model is about 5.4 MB (ssd_mobilenetv1_model).
The face detection model has been trained on the WIDERFACE dataset and the weights are provided by yeephycho in this repo.
The Tiny Face Detector is a very performant, realtime face detector, which is much faster, smaller and less resource-consuming compared to the SSD Mobilenet V1 face detector; in return it performs slightly less well on detecting small faces. This model is extremely mobile and web friendly, thus it should be your go-to face detector on mobile devices and resource-limited clients. The size of the quantized model is only 190 KB (tiny_face_detector_model).
The face detector has been trained on a custom dataset of ~14K images labeled with bounding boxes. Furthermore the model has been trained to predict bounding boxes, which entirely cover facial feature points, thus it in general produces better results in combination with subsequent face landmark detection than SSD Mobilenet V1.
This model is basically an even tinier version of Tiny Yolo V2, replacing the regular convolutions of Yolo with depthwise separable convolutions. Yolo is fully convolutional, thus can easily adapt to different input image sizes to trade off accuracy for performance (inference time).
This package implements a very lightweight and fast, yet accurate 68 point face landmark detector. The default model has a size of only 350kb (face_landmark_68_model) and the tiny model is only 80kb (face_landmark_68_tiny_model). Both models employ the ideas of depthwise separable convolutions as well as densely connected blocks. The models have been trained on a dataset of ~35k face images labeled with 68 face landmark points.
For face recognition, a ResNet-34-like architecture is implemented to compute a face descriptor (a feature vector with 128 values) from any given face image, which is used to describe the characteristics of a person's face. The model is not limited to the set of faces used for training, meaning you can use it for face recognition of any person, for example yourself. You can determine the similarity of two arbitrary faces by comparing their face descriptors, for example by computing the euclidean distance or using any other classifier of your choice.
The neural net is equivalent to the FaceRecognizerNet used in face-recognition.js and the net used in the dlib face recognition example. The weights have been trained by davisking and the model achieves a prediction accuracy of 99.38% on the LFW (Labeled Faces in the Wild) benchmark for face recognition.
The size of the quantized model is roughly 6.2 MB (face_recognition_model).
The face expression recognition model is lightweight, fast and provides reasonable accuracy. The model has a size of roughly 310kb and it employs depthwise separable convolutions and densely connected blocks. It has been trained on a variety of images from publicly available datasets as well as images scraped from the web. Note, that wearing glasses might decrease the accuracy of the prediction results.
The age and gender recognition model is a multitask network, which employs a feature extraction layer, an age regression layer and a gender classifier. The model has a size of roughly 420kb and the feature extractor employs a tinier but very similar architecture to Xception.
This model has been trained and tested on the following databases with an 80/20 train/test split each: UTK, FGNET, Chalearn, Wiki, IMDB*, CACD*, MegaAge, MegaAge-Asian. The * indicates that these databases have been algorithmically cleaned up, since the initial databases are very noisy.
Total MAE (Mean Age Error): 4.54
Total Gender Accuracy: 95%
The - indicates that there are no gender labels available for these databases.
Database | UTK | FGNET | Chalearn | Wiki | IMDB* | CACD* | MegaAge | MegaAge-Asian |
---|---|---|---|---|---|---|---|---|
MAE | 5.25 | 4.23 | 6.24 | 6.54 | 3.63 | 3.20 | 6.23 | 4.21 |
Gender Accuracy | 0.93 | - | 0.94 | 0.95 | - | 0.97 | - | - |
Age Range | 0 - 3 | 4 - 8 | 9 - 18 | 19 - 28 | 29 - 40 | 41 - 60 | 60 - 80 | 80+ |
---|---|---|---|---|---|---|---|---|
MAE | 1.52 | 3.06 | 4.82 | 4.99 | 5.43 | 4.94 | 6.17 | 9.91 |
Gender Accuracy | 0.69 | 0.80 | 0.88 | 0.96 | 0.97 | 0.97 | 0.96 | 0.9 |
Author: justadudewhohacks
Source Code: https://github.com/justadudewhohacks/face-api.js
License: MIT license
A practical hand tracking engine.
npm install @handtracking.io/yoha
Please note: you need to serve the files from node_modules/@handtracking.io/yoha, since the library needs to download its model files from there. (Webpack Example)
Yoha is a hand tracking engine that is built with the goal of being a versatile solution in practical scenarios where hand tracking is employed to add value to an application. While ultimately the goal is to be a general-purpose hand tracking engine supporting any hand pose, the engine evolves around specific hand poses that users/developers find useful. These poses are detected by the engine, which allows developers to build applications with meaningful interactions. See the demo for an example.
Yoha is currently in beta.
About the name: Yoha is short for ("Your Hand Tracking").
Yoha is currently available for the web via JavaScript. More languages will be added in the future. If you want to port Yoha to another language and need help, feel free to reach out.
Yoha was built from scratch. It uses a custom neural network trained using a custom dataset. The backbone for the inference in the browser is currently TensorFlow.js
Your desired pose is not on this list? Feel free to create an issue for it.
Yoha was built with performance in mind. It is able to provide a realtime user experience on a broad range of laptops and desktop devices. The performance on mobile devices is not great, which hopefully will change with the further development of inference frameworks like TensorFlow.js.
Please note that native inference speed cannot be compared with web inference speed. Put differently, if you were to run Yoha natively it would be much faster than via the web browser.
git clone https://github.com/handtracking-io/yoha && \
cd yoha/example && \
yarn && \
yarn start
git clone https://github.com/handtracking-io/yoha && \
cd yoha && \
./download_models.sh && \
yarn && \
yarn start
Author: Handtracking-io
Source Code: https://github.com/handtracking-io/yoha
License: MIT license
Speech recognition software is used to convert verbal language into text form using algorithms. Based on a report by Research and Markets, the speech recognition market will grow from USD 10.70 billion in 2020 to USD 27.155 billion in 2026.
Speech recognition software is not just used for complex processes but is also part of day-to-day life. For example, voice assistants are now being used in smartphones, at home, and even in automated vehicles.
When it comes to businesses, it is used by customer service executives to process routine requests. In healthcare and the legal system, it is used for documentation. Basically, speech recognition software helps companies improve communication by turning speech into data that is easy to search and manage. There are many advanced solutions that provide AI or biometric speech recognition.
Natural Language Processing, or NLP, is a part of artificial intelligence that analyzes language data. Speech recognition and AI play an important role in NLP models, and they also help in improving the accuracy of recognizing human language.
Here are the top 7 speech recognition software:
Let's talk about each in detail.
Braina is one of the most highly preferred speech recognition software products and is adept at handling over 90 languages with a high level of accuracy. It follows an AI-based voice recognition system via which you can control apps and interpret text on any application and website. The best part is that it is compatible with Windows, iOS, and Android. It is available in 3 versions: Braina Lite, Braina PRO, and Braina PRO Lifetime. Whereas the first one is free of cost, the latter two have annual and lifetime subscriptions.
This is one of the best software products out there for students, healthcare workers, and legal professionals. Owned by Nuance, it also supports cloud document management. It has an accuracy rate of 99%. It is available in mobile and desktop versions and supports languages like Dutch, English, Italian, French, German, and Spanish. Its Auto-Text feature allows you to shorten an entire address into just a single word! When it comes to versions, Dragon Home and Dragon Professional are paid products. Users are also given a free 7-day trial to get to know the software better.
Winscribe is a software owned by the company Nuance and provides documentation workflow management so that users can organize their text. It supports Android, iPhone, and PC devices. It provides fast, easy, and secure documentation solutions. This way it aims at giving professionals more time to do work that adds value to their organization. One thing to note here is that Winscribe is a speech recognition and document management application meant for professionals at medium and large organizations.
I am sure you must be familiar with this one. It is an easy-to-use application for Android users. It allows you to dictate text, perform emoji and GIF searches when texting, and also offers swipe-style input. Simple, accurate, and free of cost, it works well with both Android and iOS. It also offers the option of personalizing the app so that Gboard can recognize the voice usage patterns of the user and improve on them. This promises higher accuracy over time.
If you are a Windows 10 user, this speech recognition software is already included on your PC; you do not need to install or download it. Features like punctuation commands, commands for moving the cursor within your document, deleting words, and more make it an excellent choice for speech recognition software. Note, however, that the commands are available only in US English. This software supports Mandarin Chinese, English, French, Japanese, Portuguese, and German, to name a few languages.
This is speech recognition software designed by Otter.ai. It was built primarily to support Windows. It uses an AI-powered transcription service that can be easily integrated with Zoom or Google Meet to record meetings or webinars. Otter supports a variety of accents, such as Indian, American, German, Swiss, British, and Chinese. This software is available in 3 packages: Essential Otter, Premium Otter, and Teams Otter.
A powerful, speech-enabled online notepad, Speechnotes allows users to type at the speed of speech and also lets you move from voice to key typing swiftly. This product has been featured on ProductHunt, Techpp.com and other leading magazines loved by tech users. It is powered by Google's Speech-to-Text engines and makes for a world-class business tool. Its accuracy is measured at over 90%. It comes in 2 versions: Basic and Premium. The basic version is free of cost and the premium Chrome extension costs USD 9.9. The best part about this software is that it can work on any website, has custom text stamps, and can be exported easily to Google Drive.
With the impact of speech recognition in various fields, and AI in personal lives as well as at work, there is a high demand for AI engineers and machine learning engineers. They are required to build a stronger relationship between the software and humans.
You can begin your journey into AI with the Post Graduate Program in Artificial Intelligence & Machine Learning: Business Applications, offered by the McCombs School of Business at The University of Texas at Austin. This is one of the best artificial intelligence training programs offered by one of the top universities in the world.
The 6-month-long online program helps you master the skills needed to build machine learning and deep learning models, with lectures from UT Austin faculty and live mentorship sessions by industry professionals. You also get access to video lectures, hands-on practical projects, assignments, and quizzes curated to carve your niche in the AI market. On completion of the online program, you'll also earn a certificate from UT Austin.
Original article source at: https://www.mygreatlearning.com
https://www.blog.duomly.com/the-best-image-recognition-apps/
In the age of smartphones and image-centric social media, image recognition apps are becoming more and more popular. These apps allow users to search for and identify specific images and recognize objects or scenes in pictures. While there are many image recognition apps available on the market, the best ones are those that can accurately identify a wide range of images.
If you're interested in learning more about the best image recognition apps available in 2022, be sure to read our article!
Library for performing speech recognition, with support for several engines and APIs, online and offline.
Speech recognition engine/API support:
Quickstart: pip install SpeechRecognition. See the "Installing" section for more details.
To quickly try it out, run python -m speech_recognition after installing.
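For orientation, here is a minimal transcription sketch using the library's documented API (the file name is a placeholder, and recognize_google requires an internet connection):
import speech_recognition as sr

r = sr.Recognizer()
# "audio.wav" is a placeholder path to a WAV/AIFF/FLAC file
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)  # read the entire audio file into an AudioData object

try:
    print(r.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print("API request failed: {}".format(e))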
Project links:
The library reference documents every publicly accessible object in the library. This document is also included under reference/library-reference.rst.
See Notes on using PocketSphinx for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under reference/pocketsphinx.rst.
See the examples/ directory in the repository root for usage examples, including calibrating the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details).
First, make sure you have all the requirements listed in the "Requirements" section.
The easiest way to install this is using pip install SpeechRecognition. Otherwise, download the source distribution from PyPI, and extract the archive. In the folder, run python setup.py install.
To use all of the functionality of the library, you should have:
- Python 2.6, 2.7, or 3.3+ (required)
- PyAudio 0.2.11+ (required only if you need to use microphone input, Microphone)
- PocketSphinx (required only if you need to use the Sphinx recognizer, recognizer_instance.recognize_sphinx)
- Google Cloud Speech library for Python (required only if you need to use the Google Cloud Speech API, recognizer_instance.recognize_google_cloud)
- FLAC encoder (required only if the system is not x86-based Windows/Linux/OS X)
The following requirements are optional, but can improve or extend functionality in some situations:
- On Python 2, some functions (like recognizer_instance.recognize_bing) will run slower if you do not have Monotonic for Python 2 installed.
The following sections go over the details of each requirement.
The first software requirement is Python 2.6, 2.7, or Python 3.3+. This is required to use the library.
PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations.
If not installed, everything in the library will still work, except attempting to instantiate a Microphone object will raise an AttributeError.
The installation instructions on the PyAudio website are quite good - for convenience, they are summarized below:
On Windows, install PyAudio using Pip: execute pip install pyaudio in a terminal.
On Debian-derived Linux distributions, install PyAudio using APT: execute sudo apt-get install python-pyaudio python3-pyaudio in a terminal. If the version in the repositories is too old, install the latest release using Pip: execute sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio (replace pip with pip3 if using Python 3).
On OS X, install PortAudio using Homebrew: brew install portaudio. Then, install PyAudio using Pip: pip install pyaudio.
On other POSIX-based systems, install the portaudio19-dev and python-all-dev (or python3-all-dev if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using Pip: pip install pyaudio (replace pip with pip3 if using Python 3).
PyAudio wheel packages for common 64-bit Python versions on Windows and Linux are included for convenience, under the third-party/ directory in the repository root. To install, simply run pip install wheel followed by pip install ./third-party/WHEEL_FILENAME (replace pip with pip3 if using Python 3) in the repository root directory.
PocketSphinx-Python is required if and only if you want to use the Sphinx recognizer (recognizer_instance.recognize_sphinx).
PocketSphinx-Python wheel packages for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the third-party/ directory. To install, simply run pip install wheel followed by pip install ./third-party/WHEEL_FILENAME (replace pip with pip3 if using Python 3) in the SpeechRecognition folder.
On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in Notes on using PocketSphinx for installation instructions.
Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.
See Notes on using PocketSphinx for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under reference/pocketsphinx.rst.
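Once PocketSphinx-Python is installed, offline transcription is a short script on any AudioData object. A small sketch, assuming a hypothetical file audio.wav:

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:  # placeholder path to a WAV/AIFF/FLAC file
    audio = r.record(source)  # read the entire file into an AudioData instance

try:
    print(r.recognize_sphinx(audio))  # runs fully offline
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
except sr.RequestError as e:
    print("Sphinx error: {0}".format(e))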
Google Cloud Speech library for Python is required if and only if you want to use the Google Cloud Speech API (recognizer_instance.recognize_google_cloud).
If not installed, everything in the library will still work, except calling recognizer_instance.recognize_google_cloud will raise a RequestError.
According to the official installation instructions, the recommended way to install this is using Pip: execute pip install google-cloud-speech (replace pip with pip3 if using Python 3).
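After installation, usage looks roughly like the sketch below; the service-account JSON path and audio file name are assumptions, and you need credentials from your own Google Cloud project:

import speech_recognition as sr

# placeholder path to a Google Cloud service account JSON key
GOOGLE_CLOUD_SPEECH_CREDENTIALS = open("service-account.json").read()

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:  # placeholder audio file
    audio = r.record(source)

try:
    print(r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
except sr.RequestError as e:
    print("Google Cloud Speech request failed: {0}".format(e))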
A FLAC encoder is required to encode the audio data to send to the API. If using Windows (x86 or x86-64), OS X (Intel Macs only, OS X 10.6 or higher), or Linux (x86 or x86-64), this is already bundled with this library - you do not need to install anything.
Otherwise, ensure that you have the flac command line tool, which is often available through the system package manager. For example, this would usually be sudo apt-get install flac on Debian-derivatives, or brew install flac on OS X with Homebrew.
On Python 2, and only on Python 2, if you do not install the Monotonic for Python 2 library, some functions will run slower than they otherwise could (though everything will still work correctly).
On Python 3, that library's functionality is built into the Python standard library, which makes it unnecessary.
This is because monotonic time is necessary to handle cache expiry properly in the face of system time changes and other time-related issues. If monotonic time functionality is not available, then things like access token requests will not be cached.
To install, use Pip: execute pip install monotonic in a terminal.
Try increasing the recognizer_instance.energy_threshold property. This is basically how sensitive the recognizer is to when recognition should start. Higher values mean that it will be less sensitive, which is useful if you are in a loud room.
This value depends entirely on your microphone or audio data. There is no one-size-fits-all value, but good values typically range from 50 to 4000.
Also, check on your microphone volume settings. If it is too sensitive, the microphone may be picking up a lot of ambient noise. If it is too insensitive, the microphone may be rejecting speech as just noise.
The recognizer_instance.energy_threshold property is probably set to a value that is too high to start off with, and is then being adjusted lower automatically by dynamic energy threshold adjustment. Before it reaches a good level, the energy threshold is so high that speech is just considered ambient noise.
The solution is to decrease this threshold, or to call recognizer_instance.adjust_for_ambient_noise beforehand, which will set the threshold to a good value automatically.
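In code, those two options look roughly like this (the threshold value and calibration duration are only illustrative):

import speech_recognition as sr

r = sr.Recognizer()
r.energy_threshold = 4000  # manual: higher means less sensitive; typical values range from 50 to 4000
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source, duration=1)  # or calibrate automatically against ~1 second of ambient noise
    audio = r.listen(source)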
Try setting the recognition language to your language/dialect. To do this, see the documentation for recognizer_instance.recognize_sphinx, recognizer_instance.recognize_google, recognizer_instance.recognize_wit, recognizer_instance.recognize_bing, recognizer_instance.recognize_api, recognizer_instance.recognize_houndify, and recognizer_instance.recognize_ibm.
For example, if your language/dialect is British English, it is better to use "en-GB" as the language rather than "en-US".
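For instance, with the Google recognizer the language is passed as a keyword argument (audio here is any AudioData instance obtained from listen() or record()):

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)

print(r.recognize_google(audio, language="en-GB"))  # British English instead of the default "en-US"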
The recognizer hangs on recognizer_instance.listen, specifically while it is calling Microphone.MicrophoneStream.read. This usually happens when you're using a Raspberry Pi board, which doesn't have audio input capabilities by itself. This causes the default microphone used by PyAudio to simply block when we try to read it. If you happen to be using a Raspberry Pi, you'll need a USB sound card (or USB microphone).
Once you do this, change all instances of Microphone() to Microphone(device_index=MICROPHONE_INDEX), where MICROPHONE_INDEX is the hardware-specific index of the microphone.
To figure out what the value of MICROPHONE_INDEX should be, run the following code:
import speech_recognition as sr
for index, name in enumerate(sr.Microphone.list_microphone_names()):
print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
This will print out something like the following:
Microphone with name "HDA Intel HDMI: 0 (hw:0,3)" found for `Microphone(device_index=0)`
Microphone with name "HDA Intel HDMI: 1 (hw:0,7)" found for `Microphone(device_index=1)`
Microphone with name "HDA Intel HDMI: 2 (hw:0,8)" found for `Microphone(device_index=2)`
Microphone with name "Blue Snowball: USB Audio (hw:1,0)" found for `Microphone(device_index=3)`
Microphone with name "hdmi" found for `Microphone(device_index=4)`
Microphone with name "pulse" found for `Microphone(device_index=5)`
Microphone with name "default" found for `Microphone(device_index=6)`
Now, to use the Snowball microphone, you would change Microphone() to Microphone(device_index=3).
Calling Microphone() gives the error IOError: No Default Input Device Available. As the error says, the program doesn't know which microphone to use.
To proceed, either use Microphone(device_index=MICROPHONE_INDEX, ...) instead of Microphone(...), or set a default microphone in your OS. You can obtain possible values of MICROPHONE_INDEX using the code in the troubleshooting entry right above this one.
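Concretely, a short sketch (device index 3 is only an example value taken from the listing above):

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone(device_index=3) as source:  # explicitly select the USB microphone instead of the default device
    audio = r.listen(source)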
The code examples raise UnicodeEncodeError: 'ascii' codec can't encode character when run. When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is raised when trying to write non-ASCII characters.
This is because in Python 2, recognizer_instance.recognize_sphinx, recognizer_instance.recognize_google, recognizer_instance.recognize_wit, recognizer_instance.recognize_bing, recognizer_instance.recognize_api, recognizer_instance.recognize_houndify, and recognizer_instance.recognize_ibm return unicode strings (u"something") rather than byte strings ("something"). In Python 3, all strings are unicode strings.
To make printing of unicode strings work in Python 2 as well, replace all print statements in your code of the following form:
print SOME_UNICODE_STRING
With the following:
print SOME_UNICODE_STRING.encode("utf8")
This change, however, will prevent the code from working in Python 3.
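One way to keep a single script working on both versions is to branch on the interpreter version before printing; a minimal sketch (the string value here is just an example):

import sys

SOME_UNICODE_STRING = u"r\u00e9sum\u00e9"  # example value; any unicode string works here

if sys.version_info[0] < 3:
    print(SOME_UNICODE_STRING.encode("utf8"))  # Python 2: encode to UTF-8 bytes before printing
else:
    print(SOME_UNICODE_STRING)  # Python 3: strings are already unicode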
As of PyInstaller version 3.0, SpeechRecognition is supported out of the box. If you're getting weird issues when compiling your program using PyInstaller, simply update PyInstaller.
You can easily do this by running pip install --upgrade pyinstaller.
The "bt_audio_service_open" error means that you have a Bluetooth audio device, but as a physical device is not currently connected, we can't actually use it - if you're not using a Bluetooth microphone, then this can be safely ignored. If you are, and audio isn't working, then double check to make sure your microphone is actually connected. There does not seem to be a simple way to disable these messages.
For errors of the form "ALSA lib [...] Unknown PCM", see this StackOverflow answer. Basically, to get rid of an error of the form "Unknown PCM cards.pcm.rear", simply comment out pcm.rear cards.pcm.rear in /usr/share/alsa/alsa.conf, ~/.asoundrc, and /etc/asound.conf.
For "jack server is not running or cannot be started" or "connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)" or "attempt to connect to server failed", these are caused by ALSA trying to connect to JACK, and can be safely ignored. I'm not aware of any simple way to turn those messages off at this time, besides entirely disabling printing while starting the microphone.
On OS X, a ChildProcessError saying that it couldn't find the system FLAC converter can occur even though FLAC is installed. Installing FLAC for OS X directly from the source code will not work, since it doesn't correctly add the executables to the search path.
Installing FLAC using Homebrew ensures that the search path is correctly updated. First, ensure you have Homebrew, then run brew install flac to install the necessary files.
To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.
Most of the library code lives in speech_recognition/__init__.py.
Examples live in the examples/ directory, and the demo script lives in speech_recognition/__main__.py.
The FLAC encoder binaries are in the speech_recognition/ directory.
Documentation can be found in the reference/ directory.
Third-party libraries, utilities, and reference material are in the third-party/ directory.
To install/reinstall the library locally, run python setup.py install in the project root directory.
Before a release, the version number is bumped in README.rst and speech_recognition/__init__.py. Version tags are then created using git config gpg.program gpg2 && git config user.signingkey DB45F6C431DE7C2DCD99FF7904882258A4063489 && git tag -s VERSION_GOES_HERE -m "Version VERSION_GOES_HERE".
Releases are done by running make-release.sh VERSION_GOES_HERE to build the Python source packages, sign them, and upload them to PyPI.
To run all the tests:
python -m unittest discover --verbose
Testing is also done automatically by TravisCI, upon every push. To set up the environment for offline/local Travis-like testing on a Debian-like system:
sudo docker run --volume "$(pwd):/speech_recognition" --interactive --tty quay.io/travisci/travis-python:latest /bin/bash
su - travis && cd /speech_recognition
sudo apt-get update && sudo apt-get install swig libpulse-dev
pip install --user pocketsphinx monotonic && pip install --user flake8 rstcheck && pip install --user -e .
python -m unittest discover --verbose # run unit tests
python -m flake8 --ignore=E501,E701 speech_recognition tests examples setup.py # ignore errors for long lines and multi-statement lines
python -m rstcheck README.rst reference/*.rst # ensure RST is well-formed
The included flac-win32 executable is the official FLAC 1.3.2 32-bit Windows binary.
The included flac-linux-x86 and flac-linux-x86_64 executables are built from the FLAC 1.3.2 source code with Manylinux to ensure that they are compatible with a wide variety of distributions.
The built FLAC executables should be bit-for-bit reproducible. To rebuild them, run the following inside the project directory on a Debian-like system:
# download and extract the FLAC source code
cd third-party
sudo apt-get install --yes docker.io
# build FLAC inside the Manylinux i686 Docker image
tar xf flac-1.3.2.tar.xz
sudo docker run --tty --interactive --rm --volume "$(pwd):/root" quay.io/pypa/manylinux1_i686:latest bash
cd /root/flac-1.3.2
./configure LDFLAGS=-static # compiler flags to make a static build
make
exit
cp flac-1.3.2/src/flac/flac ../speech_recognition/flac-linux-x86 && sudo rm -rf flac-1.3.2/
# build FLAC inside the Manylinux x86_64 Docker image
tar xf flac-1.3.2.tar.xz
sudo docker run --tty --interactive --rm --volume "$(pwd):/root" quay.io/pypa/manylinux1_x86_64:latest bash
cd /root/flac-1.3.2
./configure LDFLAGS=-static # compiler flags to make a static build
make
exit
cp flac-1.3.2/src/flac/flac ../speech_recognition/flac-linux-x86_64 && sudo rm -r flac-1.3.2/
The included flac-mac executable is extracted from xACT 2.39, which is a frontend for FLAC 1.3.2 that conveniently includes binaries for all of its encoders. Specifically, it is a copy of xACT 2.39/xACT.app/Contents/Resources/flac in xACT2.39.zip.
Uberi <me@anthonyz.ca> (Anthony Zhang)
bobsayshilol
arvindch <achembarpu@gmail.com> (Arvind Chembarpu)
kevinismith <kevin_i_smith@yahoo.com> (Kevin Smith)
haas85
DelightRun <changxu.mail@gmail.com>
maverickagm
kamushadenes <kamushadenes@hyadesinc.com> (Kamus Hadenes)
sbraden <braden.sarah@gmail.com> (Sarah Braden)
tb0hdan (Bohdan Turkynewych)
Thynix <steve@asksteved.com> (Steve Dougherty)
beeedy <broderick.carlin@gmail.com> (Broderick Carlin)
Please report bugs and suggestions at the issue tracker!
How to cite this library (APA style):
Zhang, A. (2017). Speech Recognition (Version 3.8) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
How to cite this library (Chicago style):
Zhang, Anthony. 2017. Speech Recognition (version 3.8).
Also check out the Python Baidu Yuyin API, which is based on an older version of this project, and adds support for Baidu Yuyin. Note that Baidu Yuyin is only available inside China.
Author: Uberi
Source Code: https://github.com/Uberi/speech_recognition
License: View license
1634678640
Key-value stores offer higher speed and performance than relational databases. In this video, we are going to build a facial recognition application with the Redis key-value database, because key-value databases perform very well on facial verification tasks.
We will use the deepface framework for face recognition models. It wraps several state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, Dlib, and ArcFace. We will use FaceNet in this video.
We will retrieve the vector representations of faces from Redis and compare them to a target image on the client side in Python.
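As a rough, hedged sketch of that idea (not the video's exact code - the Redis key names, file paths, and the 0.40 distance threshold are assumptions), the flow looks like:

import json
import numpy as np
import redis
from deepface import DeepFace

r = redis.Redis(host="localhost", port=6379)

def embed(img_path):
    # DeepFace.represent returns the embedding; depending on the deepface version it is
    # either a plain list of floats or a list of dicts with an "embedding" key
    rep = DeepFace.represent(img_path=img_path, model_name="Facenet")
    return np.array(rep[0]["embedding"] if isinstance(rep[0], dict) else rep)

# enrollment: store each known face's vector under a key
r.set("face:alice", json.dumps(embed("alice.jpg").tolist()))

# verification: compare a target image against the stored vector using cosine distance
target = embed("target.jpg")
stored = np.array(json.loads(r.get("face:alice")))
cosine = 1 - np.dot(target, stored) / (np.linalg.norm(target) * np.linalg.norm(stored))
print("same person" if cosine < 0.40 else "different person")  # 0.40 is an assumed threshold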
1634660008
In this video, we are going to show how to use the Apache Cassandra wide-column store for facial recognition tasks. Notice that stores such as Redis and Cassandra perform very well on face verification tasks.
Cassandra is a wide-column store; in contrast to the Redis key-value store, it can store multiple columns per row.
We will use the deepface library for Python to represent facial images as vectors. Then, we will store the vector representations in Cassandra.
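A hedged sketch of that storage step with the cassandra-driver package might look as follows; the keyspace, table, and column names are assumptions, not the video's exact schema:

from cassandra.cluster import Cluster
from deepface import DeepFace

cluster = Cluster(["127.0.0.1"])  # connect to a local Cassandra node
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS facedb
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS facedb.embeddings (
        name text PRIMARY KEY,
        embedding list<double>
    )
""")

# represent a face image as a vector and store it as one row
rep = DeepFace.represent(img_path="alice.jpg", model_name="Facenet")
vector = rep[0]["embedding"] if isinstance(rep[0], dict) else rep
session.execute(
    "INSERT INTO facedb.embeddings (name, embedding) VALUES (%s, %s)",
    ("alice", [float(x) for x in vector]),
)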
1625155680
Face recognition tutorial with Python using OpenCV in only 10 minutes from your live webcam. Thank you for watching.
Requirements.txt: https://github.com/ageitgey/face_recognition/blob/master/requirements.txt
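For reference, a minimal sketch of the kind of webcam loop such a tutorial builds, using the face_recognition package from the linked repository together with OpenCV (known.jpg is a placeholder reference image):

import cv2
import face_recognition

# encode one known reference face
known_encoding = face_recognition.face_encodings(face_recognition.load_image_file("known.jpg"))[0]

video = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = video.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # face_recognition expects RGB images
    locations = face_recognition.face_locations(rgb)
    encodings = face_recognition.face_encodings(rgb, locations)
    for (top, right, bottom, left), enc in zip(locations, encodings):
        match = face_recognition.compare_faces([known_encoding], enc)[0]
        label = "Match" if match else "Unknown"
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, label, (left, top - 8), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Face recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
video.release()
cv2.destroyAllWindows()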
Activate venv with these commands:
Learn how to make your own Discord bot: https://www.youtube.com/watch?v=_o8lwjVNJsg
Follow me on my new Twitter account:
https://twitter.com/elilopezdev
#enterflash #face #recognition #python #opencv #agitgey
1598671246
In this tutorial we are going to learn how to perform facial recognition with high accuracy. We will first briefly go through the theory and learn the basic implementation. Then we will create an Attendance project that uses the webcam to detect faces and records attendance live in an Excel sheet.
https://www.murtazahassan.com/face-recognition-opencv/
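As a rough illustration of the attendance-recording step described above (not the tutorial's exact code - the CSV file name is an assumption), the marking logic can be as simple as:

from datetime import datetime

def mark_attendance(name, path="attendance.csv"):  # file name is an assumption
    # append the name with a timestamp, but only once per sheet
    with open(path, "a+") as f:
        f.seek(0)
        already_logged = {line.split(",")[0] for line in f.read().splitlines() if line}
        if name not in already_logged:
            f.write("{0},{1}\n".format(name, datetime.now().strftime("%H:%M:%S")))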
#opencv #face #recognition #attendance
1597457400
The necessity of digitisation is rapidly increasing in the modern era. Due to the growth of information and communication technologies (ICT) and the wide availability of handheld devices, people often prefer digitised content over printed materials such as books and newspapers. It is also easier to organise digitised data and analyse it for various purposes with advanced techniques such as artificial intelligence. So, to keep up with the present technological scenario, it is necessary to convert the information that still exists only in printed format into a digitised format.
Here comes OCR, our saviour, which helps us perform the tedious work of digitising that information. OCR stands for Optical Character Recognition, whose primary job is to recognise printed text in an image. Once we recognise the printed text with the help of OCR, we can use that information in various ways.
This is a 3-part series of articles that explains various concepts and phases of an OCR system. Let's have a look at what you are going to learn in each part.
The image below shows the different phases in the workflow of an OCR system.
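Before diving into those phases, here is a quick, hedged taste of the end goal using the pytesseract wrapper around the Tesseract OCR engine (the image file name is a placeholder, and this is not the pipeline built in the series itself):

from PIL import Image
import pytesseract

# run Tesseract on a scanned page and print whatever printed text it recognises
text = pytesseract.image_to_string(Image.open("scanned_page.png"))
print(text)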
#machine-learning #ocr #image-processing #recognition #segmentation #deep learning
1592642710
Being one of the most scrutinised technologies of the current era, facial recognition has been the subject of raging debate for quite some time. However, the recent killing of George Floyd by a Minneapolis police officer has added urgency to calls for strict regulation and guidelines on the use of this technology by law enforcement. Nevertheless, this divisive technology has penetrated almost every aspect of human life - smartphones, airports, police stations, advertising, and payments - and has even replaced the dated technology of biometrics amid the COVID pandemic. But the growing concerns raised by the incident have urged tech giants to reconsider their decisions to build and offer this technology to police authorities.
Read more: https://analyticsindiamag.com/is-there-a-case-of-regulating-facial-recognition-technology/
#facial #recognition #technology
1592456417
The Computer Vision and Pattern Recognition (CVPR) conference is one of the most popular events around the globe, where computer vision experts and researchers gather to share their work and views on trending techniques across various computer vision topics, including object detection, video understanding, and visual recognition, among others.
Read more: https://analyticsindiamag.com/everything-so-far-in-cvpr-2020-conference/
#artificial-intelligence #conference #computervision #techniques #recognition #imageclasification