Hal Sauer

Speech Recognition with TensorFlow.js

When we talk about AI, deep learning, or machine learning, we automatically think of Python, R, or C++. But what about JavaScript? Well… it turns out that one of the most popular machine learning libraries for Python is available for JavaScript as well. We are talking about TensorFlow, and today we will give a short introduction to the library and build a fun project together.


What is TensorFlow.js and what can it be used for?

TensorFlow.js is a JavaScript library developed by Google for training and deploying machine learning models in the browser and in Node.js. It's a companion library to TensorFlow, the popular ML library for Python.

TensorFlow.js is not just a toy library; it is serious business. The performance is surprising, especially with hardware acceleration through WebGL. But should we train models with it? Probably not: even though you can achieve great performance, its Python counterpart is even faster, and when working with Python you will find more libraries to support your code, like NumPy and Pandas, as well as more learning materials, since there's not as much available for TensorFlow.js as there is for TensorFlow.

This doesn't mean you shouldn't use TensorFlow.js. On the contrary, I think it's a great library for deploying and running ML models, and that is what we are going to focus on for the rest of the article.
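
Getting started takes only a few lines. Here's a minimal sketch (my own, not part of the project below), assuming the library has been installed from NPM as @tensorflow/tfjs:

import * as tf from '@tensorflow/tfjs';

// Build two small tensors and add them element-wise. The operation runs
// on the active backend: WebGL when hardware acceleration is available,
// otherwise a plain CPU implementation.
const a = tf.tensor([1, 2, 3]);
const b = tf.tensor([10, 20, 30]);
a.add(b).print(); // Tensor [11, 22, 33]

console.log(tf.getBackend()); // e.g. "webgl"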


Deploying a sample model with TensorFlow.js

As we said, TensorFlow.js is a powerful library, and we can work on many different tasks, such as image classification, video manipulation, and speech recognition, among others. For today I decided to work on a basic speech recognition example.

Our code will be able to listen through the microphone and identify what the user is saying, at least up to a few words, since the sample model I'm using has some limitations. But rather than explaining, I think it's cool if we see it first in action:

Unfortunately, I can't run the code on Medium, but you can access the live demo here.

Pretty cool, right? I know it can be a bit erratic, and it's limited to a few words, but with the right model the possibilities are endless. Enough talking, let's start coding.
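
The full demo code is at the link above. As a condensed sketch of the approach (the package name is real, but the options and threshold here are illustrative assumptions, not the demo's exact code), recognition with the pre-trained speech-commands model can look like this:

import * as speechCommands from '@tensorflow-models/speech-commands';

async function startListening() {
  // 'BROWSER_FFT' uses the browser's WebAudio API to compute the
  // spectrogram that the pre-trained model expects.
  const recognizer = speechCommands.create('BROWSER_FFT');
  await recognizer.ensureModelLoaded();

  // The vocabulary is fixed by the model: a small set of command
  // words ("yes", "no", "up", "down", digits, ...).
  const labels = recognizer.wordLabels();

  recognizer.listen(async (result) => {
    // result.scores holds one probability per label; pick the best.
    const scores = Array.from(result.scores);
    const best = scores.indexOf(Math.max(...scores));
    console.log('Heard:', labels[best]);
  }, { probabilityThreshold: 0.75 });
}

startListening();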

#data-science #tensorflow #nodejs #javascript #ai


NBB: Ad-hoc CLJS Scripting on Node.js

Nbb

Not babashka. Node.js babashka!?

Ad-hoc CLJS scripting on Node.js.

Status

Experimental. Please report issues here.

Goals and features

Nbb's main goal is to make it easy to get started with ad hoc CLJS scripting on Node.js.

Additional goals and features are:

  • Fast startup without relying on a custom version of Node.js.
  • Small artifact (current size is around 1.2MB).
  • First class macros.
  • Support building small TUI apps using Reagent.
  • Complement babashka with libraries from the Node.js ecosystem.

Requirements

Nbb requires Node.js v12 or newer.

How does this tool work?

CLJS code is evaluated through SCI, the same interpreter that powers babashka. Because SCI works with advanced compilation, the bundle size, especially when combined with other dependencies, is smaller than what you get with self-hosted CLJS. That makes startup faster. The trade-off is that execution is less performant and that only a subset of CLJS is available (e.g. no deftype, yet).

Usage

Install nbb from NPM:

$ npm install nbb -g

Omit -g for a local install.

Try out an expression:

$ nbb -e '(+ 1 2 3)'
6

And then install some other NPM libraries to use in the script. E.g.:

$ npm install csv-parse shelljs zx

Create a script which uses the NPM libraries:

(ns script
  (:require ["csv-parse/lib/sync$default" :as csv-parse]
            ["fs" :as fs]
            ["path" :as path]
            ["shelljs$default" :as sh]
            ["term-size$default" :as term-size]
            ["zx$default" :as zx]
            ["zx$fs" :as zxfs]
            [nbb.core :refer [*file*]]))

(prn (path/resolve "."))

(prn (term-size))

(println (count (str (fs/readFileSync *file*))))

(prn (sh/ls "."))

(prn (csv-parse "foo,bar"))

(prn (zxfs/existsSync *file*))

(zx/$ #js ["ls"])

Call the script:

$ nbb script.cljs
"/private/tmp/test-script"
#js {:columns 216, :rows 47}
510
#js ["node_modules" "package-lock.json" "package.json" "script.cljs"]
#js [#js ["foo" "bar"]]
true
$ ls
node_modules
package-lock.json
package.json
script.cljs

Macros

Nbb has first class support for macros: you can define them right inside your .cljs file, like you are used to from JVM Clojure. Consider the plet macro to make working with promises more palatable:

(defmacro plet
  [bindings & body]
  (let [binding-pairs (reverse (partition 2 bindings))
        body (cons 'do body)]
    (reduce (fn [body [sym expr]]
              (let [expr (list '.resolve 'js/Promise expr)]
                (list '.then expr (list 'clojure.core/fn (vector sym)
                                        body))))
            body
            binding-pairs)))

Using this macro we can make async code look more like sync code. Consider this puppeteer example:

(-> (.launch puppeteer)
      (.then (fn [browser]
               (-> (.newPage browser)
                   (.then (fn [page]
                            (-> (.goto page "https://clojure.org")
                                (.then #(.screenshot page #js{:path "screenshot.png"}))
                                (.catch #(js/console.log %))
                                (.then #(.close browser)))))))))

Using plet this becomes:

(plet [browser (.launch puppeteer)
       page (.newPage browser)
       _ (.goto page "https://clojure.org")
       _ (-> (.screenshot page #js{:path "screenshot.png"})
             (.catch #(js/console.log %)))]
      (.close browser))

See the puppeteer example for the full code.

Since v0.0.36, nbb includes promesa which is a library to deal with promises. The above plet macro is similar to promesa.core/let.

Startup time

$ time nbb -e '(+ 1 2 3)'
6
nbb -e '(+ 1 2 3)'   0.17s  user 0.02s system 109% cpu 0.168 total

The baseline startup time for a script is about 170ms on my laptop. Invoking via npx adds another 300ms or so, so for faster startup, either use a globally installed nbb or use $(npm bin)/nbb script.cljs to bypass npx.

Dependencies

NPM dependencies

Nbb itself has no NPM dependencies. All NPM libraries loaded by a script are resolved relative to that script. When using the Reagent module, React is resolved in the same way as any other NPM library.

Classpath

To load .cljs files from local paths or dependencies, you can use the --classpath argument. The current dir is added to the classpath automatically. So if there is a file foo/bar.cljs relative to your current dir, then you can load it via (:require [foo.bar :as fb]). Note that nbb uses the same naming conventions for namespaces and directories as other Clojure tools: foo-bar in the namespace name becomes foo_bar in the directory name.

To load dependencies from the Clojure ecosystem, you can use the Clojure CLI or babashka to download them and produce a classpath:

$ classpath="$(clojure -A:nbb -Spath -Sdeps '{:aliases {:nbb {:replace-deps {com.github.seancorfield/honeysql {:git/tag "v2.0.0-rc5" :git/sha "01c3a55"}}}}}')"

and then feed it to the --classpath argument:

$ nbb --classpath "$classpath" -e "(require '[honey.sql :as sql]) (sql/format {:select :foo :from :bar :where [:= :baz 2]})"
["SELECT foo FROM bar WHERE baz = ?" 2]

Currently nbb only reads from directories, not jar files, so you are encouraged to use git libs. Support for .jar files will be added later.

Current file

The name of the file that is currently being executed is available via nbb.core/*file* or on the metadata of vars:

(ns foo
  (:require [nbb.core :refer [*file*]]))

(prn *file*) ;; "/private/tmp/foo.cljs"

(defn f [])
(prn (:file (meta #'f))) ;; "/private/tmp/foo.cljs"

Reagent

Nbb includes reagent.core which will be lazily loaded when required. You can use this together with ink to create a TUI application:

$ npm install ink

ink-demo.cljs:

(ns ink-demo
  (:require ["ink" :refer [render Text]]
            [reagent.core :as r]))

(defonce state (r/atom 0))

(doseq [n (range 1 11)]
  (js/setTimeout #(swap! state inc) (* n 500)))

(defn hello []
  [:> Text {:color "green"} "Hello, world! " @state])

(render (r/as-element [hello]))

Promesa

Working with callbacks and promises can become tedious. Since nbb v0.0.36 the promesa.core namespace is included with the let and do! macros. An example:

(ns prom
  (:require [promesa.core :as p]))

(defn sleep [ms]
  (js/Promise.
   (fn [resolve _]
     (js/setTimeout resolve ms))))

(defn do-stuff
  []
  (p/do!
   (println "Doing stuff which takes a while")
   (sleep 1000)
   1))

(p/let [a (do-stuff)
        b (inc a)
        c (do-stuff)
        d (+ b c)]
  (prn d))

$ nbb prom.cljs
Doing stuff which takes a while
Doing stuff which takes a while
3

Also see API docs.

Js-interop

Since nbb v0.0.75 applied-science/js-interop is available:

(ns example
  (:require [applied-science.js-interop :as j]))

(def o (j/lit {:a 1 :b 2 :c {:d 1}}))

(prn (j/select-keys o [:a :b])) ;; #js {:a 1, :b 2}
(prn (j/get-in o [:c :d])) ;; 1

Most of this library is supported in nbb, except the following:

  • destructuring using :syms
  • property access using .-x notation. In nbb, you must use keywords.

See the example of what is currently supported.

Examples

See the examples directory for small examples.

Also check out these projects built with nbb:

API

See API documentation.

Migrating to shadow-cljs

See this gist on how to convert an nbb script or project to shadow-cljs.

Build

Prerequisites:

  • babashka >= 0.4.0
  • Clojure CLI >= 1.10.3.933
  • Node.js 16.5.0 (a lower version may work, but this is the one I used to build)

To build:

  • Clone and cd into this repo
  • bb release

Run bb tasks for more project-related tasks.

Download Details:
Author: borkdude
Download Link: Download The Source Code
Official Website: https://github.com/borkdude/nbb 
License: EPL-1.0

#node #javascript

How to Categorize Images with TensorFlow.js Made Easy

TensorFlow.js Image Classification Made Easy
In this video you're going to discover an easy way to train a convolutional neural network for image classification and then use the resulting TensorFlow.js image classifier to score x-ray images locally in your web browser.

TensorFlow.js is a great JavaScript-based machine learning framework for running your machine learning models locally in the web browser as well as on your server using Node.js.
But defining your model structure and training it is much more complex than just using a trained model.
Azure Custom Vision, one of the various Cognitive Services, offers an easy way to avoid this hassle.
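
To illustrate the "just using a trained model" part, here is a minimal sketch (not taken from the video; the model path and the 224x224 input size are placeholder assumptions) of scoring an image in the browser:

import * as tf from '@tensorflow/tfjs';

async function classify(imgElement) {
  // Load a trained classifier exported in TensorFlow.js layers format.
  // 'model/model.json' is a placeholder path.
  const model = await tf.loadLayersModel('model/model.json');

  // Turn the <img> element into a tensor, resize it to the model's
  // expected input size, and normalize pixel values to [0, 1].
  const input = tf.tidy(() => {
    const pixels = tf.browser.fromPixels(imgElement);
    const resized = tf.image.resizeBilinear(pixels, [224, 224]);
    return resized.div(255).expandDims(0); // add a batch dimension
  });

  const scores = await model.predict(input).data();
  input.dispose();
  return scores; // one probability per class
}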

#TensorFlow.js #TensorFlow #js

宇野 和也

Indian Accent Speech Recognition

Traditional ASR (Signal Analysis, MFCC, DTW, HMM & Language Modelling) and DNNs (Custom Models & Baidu DeepSpeech Model) on Indian Accent Speech

Courtesy: Speech and Music Technology Lab, IIT Madras


While there is a recognized Indian-English accent, accent-less enunciation is a myth. Regardless of racial stereotypes, our speech is naturally shaped by the vernacular we speak, and the Indian vernaculars are numerous! How, then, does a computer decipher speech from different Indian states, which even Indians from other states find ambiguous to understand?

**ASR (Automatic Speech Recognition)** takes any continuous audio speech and outputs the equivalent text. In this blog, we will explore some challenges in speech recognition, with a focus on speaker-independent recognition, both in theory and practice.

The **challenges in ASR** include:

  • Variability of volume
  • Variability of word speed
  • Variability of speaker
  • Variability of **pitch**
  • Word boundaries: we speak words without pauses.
  • **Noises** like background sound, audience talk, etc.

Let's address **each of the above problems** in the sections discussed below.

The complete source code of the above studies can be found here.

Models in speech recognition can conceptually be divided into:

  • Acoustic model: Turn sound signals into some kind of phonetic representation.
  • Language model: houses domain knowledge of words, grammar, and sentence structure for the language.

Signal Analysis

When we speak, we create sinusoidal vibrations in the air. Higher pitches vibrate faster, with a higher frequency, than lower pitches. A microphone transduces the acoustical energy of the vibrations into electrical energy.

If we say "Hello World", the corresponding signal would contain two blobs of energy.

Some of the vibrations in the signal have a higher amplitude. The amplitude tells us how much acoustical energy is in the sound.

Our speech is made up of many frequencies at the same time, i.e. it is the sum of all those frequencies. To analyze the signal, we use the component frequencies as features. The **Fourier transform** is used to break the signal into these components.

We can use this splitting technique to convert the sound to a spectrogram, where **frequency** on the vertical axis is plotted against time. The intensity of shading indicates the amplitude of the signal.

Spectrogram of the hello world phrase

To create a spectrogram:

  1. **Divide the signal** into time frames.
  2. Split each frame's signal into frequency components with an FFT.
  3. Each time frame is now represented by a **vector of amplitudes** at each frequency (the sketch below makes these steps concrete).

one dimensional vector for one time frame

If we line up the vectors again in their time series order, we can have a visual picture of the sound components, the Spectrogram.

Spectrogram can be lined up with the original audio signal in time
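
As a toy sketch of those three steps (my own illustration; a naive DFT stands in for the FFT a real pipeline would use):

// Split a signal into fixed-size frames and compute the magnitude
// spectrum of each frame. The resulting array of amplitude vectors,
// lined up in time order, is the spectrogram.
function spectrogram(signal, frameSize) {
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += frameSize) {
    frames.push(signal.slice(start, start + frameSize));
  }
  return frames.map(frame => dftMagnitudes(frame));
}

// Naive discrete Fourier transform, O(n^2); real code would use an FFT.
function dftMagnitudes(frame) {
  const n = frame.length;
  const mags = new Array(Math.floor(n / 2));
  for (let k = 0; k < mags.length; k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im);
  }
  return mags;
}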

Next, we’ll look at Feature Extraction techniques which would reduce the noise and dimensionality of our data.

Unnecessary information is encoded in the spectrogram

Feature Extraction with MFCC

Mel Frequency Cepstrum Coefficient (MFCC) analysis is the reduction of an audio signal to essential speech component features using both mel frequency analysis and cepstral analysis. The range of frequencies is reduced and binned into groups of frequencies that humans can distinguish. The signal is further separated into source and filter so that variations between speakers unrelated to articulation can be filtered away.

a) Mel Frequency Analysis

Only **those frequencies humans can hear** are important for recognizing speech. We can split the frequencies of the spectrogram into bins relevant to our own ears and filter out sound that we can't hear.

Frequencies above the black line will be filtered out
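
The binning uses the mel scale, which spaces frequency bins the way human pitch perception does. The standard conversion (a textbook formula, not specific to this article) is:

// Hertz-to-mel conversion: roughly linear below 1 kHz, logarithmic above,
// mirroring how humans perceive pitch differences.
function hertzToMel(hz) {
  return 2595 * Math.log10(1 + hz / 700);
}

function melToHertz(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

console.log(hertzToMel(440)); // ~550 mel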

b) Cepstral Analysis

We also need to separate the elements of sound that are speaker-independent. We can think of a human voice production model as a combination of source and filter, where the source is unique to an individual and the filter is the articulation of words that we all use when speaking.

Cepstral analysis relies on this model to separate the two. The cepstrum can be extracted from a signal with an algorithm. Thus, we drop the component of speech unique to individual vocal cords and preserve the shape of the sound made by the vocal tract.

Cepstral analysis combined with mel frequency analysis gets you 12 or 13 MFCC features related to speech. **Delta and delta-delta MFCC features** can optionally be appended to the feature set, effectively doubling (or tripling) the number of features, up to 39 features, which generally gives better results in ASR.

Thus MFCC (mel-frequency cepstral coefficient) feature extraction:

  • reduces the dimensionality of our data, and
  • squeezes noise out of the system.

So there are two acoustic feature options for speech recognition:

  • Spectrograms
  • Mel-frequency cepstral coefficients (MFCCs)

When you construct your pipeline, you will be able to choose either spectrogram or MFCC features. Next, we'll look at sound from a language perspective, i.e. the phonetics of the words we hear.

Phonetics

Phonetics is the study of sound in human speech. Linguistic analysis is used to break down human words into their smallest sound segments.


phonemes define the distinct sounds

  • A phoneme is the smallest sound segment that can be used to distinguish one word from another.
  • A grapheme, in contrast, is the smallest distinct written unit of a language. E.g., English has 26 letters plus a space (27 graphemes).

Unfortunately, we can't map phonemes to graphemes one-to-one, as some letters map to multiple phonemes and some phonemes map to many letters. For example, the letter C sounds different in cat, chat, and circle.

Phonemes are often a useful intermediary between speech and text. If we can successfully produce an acoustic model that decodes a sound signal into phonemes, the remaining task is to map those phonemes to their matching words. This step is called lexical decoding, named so because it is based on a lexicon or dictionary of the data set.

If we want to train on a limited vocabulary of words, we might just skip the phonemes. If we have a large vocabulary, then converting to smaller units first reduces the total number of comparisons needed.

Acoustic Models and the Trouble with Time

With feature extraction, we've addressed the noise problem as well as speaker variability. But we still haven't solved the problem of matching variable lengths of the same word.

Dynamic Time Warping (DTW) calculates the similarity between two signals, even if their time lengths differ. This can be used to align the sequence data of a new word to its most similar counterpart in a dictionary of word examples.

2 signals mapped with Dynamic Time Warping
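
A minimal sketch of the classic DTW recurrence (my own illustration, not from the source code linked above):

// cost[i][j] is the cheapest alignment of the first i samples of a
// with the first j samples of b.
function dtwDistance(a, b) {
  const n = a.length, m = b.length;
  const cost = Array.from({ length: n + 1 },
                          () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = Math.abs(a[i - 1] - b[j - 1]);
      // Each step may advance either sequence or both, which is what
      // lets signals of different lengths line up.
      cost[i][j] = d + Math.min(cost[i - 1][j],
                                cost[i][j - 1],
                                cost[i - 1][j - 1]);
    }
  }
  return cost[n][m];
}

// The same "word" spoken slowly still aligns perfectly with a faster take:
console.log(dtwDistance([0, 1, 2, 1, 0], [0, 1, 1, 2, 2, 1, 0])); // 0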

#deep-speech #speech #deep-learning #speech-recognition #machine-learning #deep learning

Vincent Lab

Text to Speech in Node.js

In this video, I will be showing you how to turn text into speech in Node.js.
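
As a minimal sketch of one common approach (I'm assuming the say package from NPM here; it may not be the exact library used in the video, and it shells out to the platform's speech engine, so voice support varies by OS):

// npm install say
const say = require('say');

// Speak a sentence with the platform's default voice.
say.speak('Hello from Node.js!');

// Or render the speech to a WAV file instead (export support
// varies by platform; null picks the default voice).
say.export('Hello from Node.js!', null, 1.0, 'hello.wav', (err) => {
  if (err) return console.error(err);
  console.log('Wrote hello.wav');
});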

#javascript #text to speech #javascript api #text to speech app #node.js text to speech #javascript text to speech

Hire Dedicated Node.js Developers - Hire Node.js Developers

If you look at the backend technology used by today's most popular apps, there is one thing you will find they have in common: the Node.js framework. Yes, the Node.js framework is that effective and successful.

If you wish to have a strong backend for efficient app performance, then use Node.js for your backend.

WebClues Infotech offers professionals with different levels of experience and expertise for your app development needs. So hire a dedicated Node.js developer from WebClues Infotech to match your experience and expertise requirements.

So what are you waiting for? Get your app, with strong performance parameters, developed by WebClues Infotech.

For inquiry click here: https://www.webcluesinfotech.com/hire-nodejs-developer/

Book Free Interview: https://bit.ly/3dDShFg

#hire dedicated node.js developers #hire node.js developers #hire top dedicated node.js developers #hire node.js developers in usa & india #hire node js development company #hire the best node.js developers & programmers