Royce  Reinger

1659212880

Pocketsphinx-ruby: Ruby Speech Recognition with Pocketsphinx

pocketsphinx-ruby   

This gem provides Ruby FFI bindings for Pocketsphinx, a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. Pocketsphinx is part of the CMU Sphinx Open Source Toolkit For Speech Recognition.

Pocketsphinx's SWIG interface was initially considered for this gem, but dropped in favor of FFI for many of the reasons outlined here; most importantly ease of maintenance and JRuby support.

The goal of this project is to make it as easy as possible for the Ruby community to experiment with speech recognition. Please do contribute fixes and enhancements.

Installation

This gem depends on Pocketsphinx (libpocketsphinx) and Sphinxbase (libsphinxbase and libsphinxad). The current stable versions (0.8) date from late 2012 and are now outdated. Either build them manually from source or, on OSX, install the latest development (potentially unstable) versions using Homebrew as follows (more information here).

Add the Homebrew tap:

$ brew tap watsonbox/cmu-sphinx

You'll see some warnings as these formulae conflict with those in the main repository, but that's fine.

Install the libraries:

$ brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxbase
$ brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxtrain # optional
$ brew install --HEAD watsonbox/cmu-sphinx/cmu-pocketsphinx

You can test continuous recognition as follows:

$ pocketsphinx_continuous -inmic yes

Then add this line to your application's Gemfile:

gem 'pocketsphinx-ruby'

And then execute:

$ bundle

Or install it yourself as:

$ gem install pocketsphinx-ruby

Usage

The LiveSpeechRecognizer is modeled on the same class in Sphinx4. It uses the Microphone and Decoder classes internally to provide a simple, high-level recognition interface:

require 'pocketsphinx-ruby' # Omitted in subsequent examples

Pocketsphinx::LiveSpeechRecognizer.new.recognize do |speech|
  puts speech
end

The AudioFileSpeechRecognizer decodes directly from an audio file by coordinating interactions between an AudioFile and Decoder.

recognizer = Pocketsphinx::AudioFileSpeechRecognizer.new

recognizer.recognize('spec/assets/audio/goforward.raw') do |speech|
  puts speech # => "go forward ten meters"
end

These two classes split speech into utterances by detecting silence between them. By default this uses Pocketsphinx's internal Voice Activity Detection (VAD) which can be configured by adjusting the vad_postspeech, vad_prespeech, and vad_threshold configuration settings.
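To make the idea of splitting on silence concrete, here is a minimal pure-Ruby sketch. It is not Pocketsphinx's actual VAD: it uses a crude mean-amplitude energy measure, and the `split_utterances` method with its `frame_size`, `threshold`, and `min_silence_frames` parameters are hypothetical names chosen for illustration only.

```ruby
# Simplified energy-based silence detection; NOT Pocketsphinx's real VAD.
# All parameter names and default values here are illustrative assumptions.
def split_utterances(samples, frame_size: 4, threshold: 100.0, min_silence_frames: 2)
  utterances = []
  current = []
  silent_run = 0

  samples.each_slice(frame_size) do |frame|
    # Mean absolute amplitude as a crude measure of frame energy.
    energy = frame.sum { |s| s.abs } / frame.size.to_f

    if energy >= threshold
      current.concat(frame)
      silent_run = 0
    elsif !current.empty?
      silent_run += 1
      # Enough consecutive silent frames: close the current utterance.
      if silent_run >= min_silence_frames
        utterances << current
        current = []
        silent_run = 0
      end
    end
  end

  utterances << current unless current.empty?
  utterances
end

speech  = [500, -400, 450, -480]
silence = [3, -2, 1, 0]
signal  = speech * 2 + silence * 3 + speech

puts split_utterances(signal).map(&:size).inspect
```

Raising `threshold` makes the sketch treat more frames as silence, which is loosely analogous to raising vad_threshold.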

Configuration

All of Pocketsphinx's decoding settings are managed by the Configuration class, which can be passed into the high-level speech recognizers:

configuration = Pocketsphinx::Configuration.default
configuration.details('vad_threshold')
# => {
#   :name => "vad_threshold",
#   :type => :float,
#   :default => 2.0,
#   :value => 2.0,
#   :info => "Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level."
# }

configuration['vad_threshold'] = 4

Pocketsphinx::LiveSpeechRecognizer.new(configuration)

You can find the output of configuration.details here for more information on the various different settings.

Microphone

The Microphone class uses Pocketsphinx's libsphinxad to record audio for speech recognition. For desktop applications this should normally be 16bit/16kHz raw PCM audio, so these are the default settings. The exact audio backend depends on what was selected when libsphinxad was built. On OSX, OpenAL is now supported and should work just fine.

For example, to record and save a 5 second raw audio file:

microphone = Pocketsphinx::Microphone.new

File.open("test.raw", "wb") do |file|
  microphone.record do
    FFI::MemoryPointer.new(:int16, 2048) do |buffer|
      50.times do
        sample_count = microphone.read_audio(buffer, 2048)
        file.write buffer.get_bytes(0, sample_count * 2)

        sleep 0.1
      end
    end
  end
end

To open this audio file take a look at this wiki page.
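Another option is to wrap the raw PCM data in a WAV header so that ordinary audio players can open it. The following is a minimal sketch, assuming the recording is mono, 16-bit, little-endian, 16 kHz PCM as described above; `wav_from_raw` is a hypothetical helper and not part of this gem.

```ruby
# Wrap headerless PCM data in a minimal 44-byte WAV (RIFF) header.
# Defaults assume mono 16-bit/16 kHz audio; adjust them if your
# recording settings differ.
def wav_from_raw(pcm_bytes, sample_rate: 16_000, channels: 1, bits_per_sample: 16)
  block_align = channels * bits_per_sample / 8
  byte_rate   = sample_rate * block_align

  header = [
    'RIFF', 36 + pcm_bytes.bytesize, 'WAVE',
    'fmt ', 16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
    'data', pcm_bytes.bytesize
  ].pack('a4Va4a4VvvVVvva4V')

  header + pcm_bytes
end

# One second of silence stands in for the contents of test.raw here.
raw = ([0] * 16_000).pack('s<*')
wav = wav_from_raw(raw)
puts wav.bytesize
```

For a real recording, something like `File.binwrite('test.wav', wav_from_raw(File.binread('test.raw')))` should produce a playable file under the same assumptions.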

Decoder

The Decoder class uses Pocketsphinx's libpocketsphinx to decode audio data into text. For example, to decode a single utterance:

decoder = Pocketsphinx::Decoder.new(Pocketsphinx::Configuration.default)
decoder.decode 'spec/assets/audio/goforward.raw'

puts decoder.hypothesis # => "go forward ten meters"

And split into individual words with frame data:

decoder.words
# => [
#  #<struct Pocketsphinx::Decoder::Word word="<s>", start_frame=608, end_frame=610>,
#  #<struct Pocketsphinx::Decoder::Word word="go", start_frame=611, end_frame=622>,
#  #<struct Pocketsphinx::Decoder::Word word="forward", start_frame=623, end_frame=675>,
#  #<struct Pocketsphinx::Decoder::Word word="ten", start_frame=676, end_frame=711>,
#  #<struct Pocketsphinx::Decoder::Word word="meters", start_frame=712, end_frame=770>,
#  #<struct Pocketsphinx::Decoder::Word word="</s>", start_frame=771, end_frame=821>
# ]
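Frame numbers can be converted to timestamps once the frame rate is known. The sketch below assumes 100 frames per second, which is the usual Pocketsphinx default but may differ in your configuration. The local `Word` struct is only a stand-in for Pocketsphinx::Decoder::Word so the example runs without the gem, and `word_timings` is a hypothetical helper.

```ruby
# Stand-in for Pocketsphinx::Decoder::Word so this runs without the gem.
Word = Struct.new(:word, :start_frame, :end_frame)

# Assumption: 100 frames per second (10 ms per frame).
FRAMES_PER_SECOND = 100.0

# Convert frame indices into start/end times in seconds.
def word_timings(words, frame_rate = FRAMES_PER_SECOND)
  words.map do |w|
    { word: w.word, start: w.start_frame / frame_rate, end: w.end_frame / frame_rate }
  end
end

words = [
  Word.new('go', 611, 622),
  Word.new('forward', 623, 675)
]

word_timings(words).each do |t|
  puts format('%-8s %.2fs - %.2fs', t[:word], t[:start], t[:end])
end
```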

Note: When the Decoder is initialized, the supplied Configuration is updated by Pocketsphinx with some settings from the acoustic model. To see exactly what's going on:

Pocketsphinx::Decoder.new(Pocketsphinx::Configuration.default).configuration.changes

Keyword Spotting

Keyword spotting is another feature that is not in the current stable (0.8) releases of Pocketsphinx, having been merged into trunk early in 2014. It can be useful for detecting an activation keyword in a command and control application, while ignoring all other speech. Set up a recognizer as follows:

configuration = Pocketsphinx::Configuration::KeywordSpotting.new('Okay computer')
recognizer = Pocketsphinx::LiveSpeechRecognizer.new(configuration)

The KeywordSpotting configuration accepts a second argument for adjusting the sensitivity of the keyword detection. Note that this is just a wrapper which sets the keyphrase and kws_threshold settings on the default configuration, and removes the language model:

Pocketsphinx::Configuration::KeywordSpotting.new('keyword', 2).changes
# => [
#   { :name => "keyphrase", :type => :string, :default => nil, :required => false, :value => "keyword", :info => "Keyphrase to spot" },
#   { :name => "kws_threshold", :type => :float, :default => 1.0, :required => false, :value => 2.0, :info => "Threshold for p(hyp)/p(alternatives) ratio" },
#   { :name => "lm", :type => :string, :default => "/usr/local/Cellar/cmu-pocketsphinx/HEAD/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP", :required => false, :value => nil, :info => "Word trigram language model input file" }
# ]

Grammars

Another way of configuring Pocketsphinx is with a grammar, which is normally used to describe very simple types of languages for command and control. Restricting the set of possible utterances in this way can greatly improve recognition accuracy for these types of application.

Load a JSGF grammar from a file:

configuration = Pocketsphinx::Configuration::Grammar.new('sentences.gram')

Or build one dynamically with this simple DSL (currently only supports sentence lists):

configuration = Pocketsphinx::Configuration::Grammar.new do
  sentence "Go forward ten meters"
  sentence "Go backward ten meters"
end
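For reference, a sentence list like the one above corresponds to a very small JSGF grammar. The following sketch shows what such a grammar file might look like by generating it in plain Ruby. `jsgf_from_sentences` is a hypothetical helper, and its output is not necessarily identical to what the gem's DSL produces internally.

```ruby
# Hypothetical helper: render a list of sentences as JSGF text,
# with each sentence as one alternative of a single public rule.
def jsgf_from_sentences(sentences, grammar_name: 'default', rule_name: 'sentence')
  alternatives = sentences.map { |s| s.downcase.strip }.join(' | ')

  [
    '#JSGF V1.0;',
    "grammar #{grammar_name};",
    "public <#{rule_name}> = #{alternatives};"
  ].join("\n")
end

puts jsgf_from_sentences(['Go forward ten meters', 'Go backward ten meters'])
```

Writing this text to a file such as sentences.gram gives something that can be loaded the same way as any other JSGF grammar file.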

Recognition Accuracy and Training

See the CMU Sphinx resources on training and adapting acoustic models for more information.

Peter Grasch, author of Simon, has also made a number of interesting posts on the state of open source speech recognition, as well as improving language and acoustic models.

See sphinxtrain-ruby for an experimental toolkit for training/adapting CMU Sphinx acoustic models. Its main goal is to help with adapting existing acoustic models to a specific speaker/accent.

Troubleshooting

First and foremost, because this gem depends on development versions of CMU Sphinx packages, there will be times when errors are caused by API changes or bugs in those packages. Unfortunately, until some up-to-date releases are made, this is going to happen from time to time, so please do open an issue with as much detail as you have.

This gem has been tested with a manual Pocketsphinx installation on Ubuntu 14.04 and a Homebrew Pocketsphinx installation on OSX 10.9.4 Mavericks. Take a look at the following common problems before opening an issue.

`attach_function': Function 'ps_default_search_args' not found in [libpocketsphinx.so] (FFI::NotFoundError)

An error like this probably means that you have an old version of the Pocketsphinx libraries installed. If necessary, replace them with a recent development version which supports the features available in this gem.

Contributing

  1. Fork it ( https://github.com/watsonbox/pocketsphinx-ruby/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Projects Using pocketsphinx-ruby

  • Isabella - A voice-computing assistant built in Ruby.
  • sphinxtrain-ruby - A Toolkit for training/adapting CMU Sphinx acoustic models.

Author: Watsonbox
Source Code: https://github.com/watsonbox/pocketsphinx-ruby 
License: MIT license

#ruby 

Shardul Bhatt

1626850869

7 Reasons to Trust Ruby on Rails

Ruby on Rails is a popular web development framework. Known for its adaptability, it powers 3,903,258 sites worldwide. Ruby on Rails speeds up web application development: it is productive enough that a Ruby on Rails developer can build an application 25% to 40% faster than with other frameworks.

Around 2.1% (21,034) of the top 1 million sites use Ruby on Rails. The framework is suited to creating web applications in every industry. Whether in healthcare or automotive, Rails brings a higher degree of dynamism to each application.

So what makes the framework so popular? Some say it is affordable; others say it is because the Ruby on Rails development environment is simple. There are many reasons that make it well suited to creating dynamic applications.

Read more: Best Ruby on Rails projects Examples

7 reasons Ruby on Rails is preferred

There are several other well-known backend frameworks for web applications, such as Django, Flask, and Laravel, to name only a few. So why should organizations pick Ruby on Rails application development? We believe the following reasons highlight why organizations trust the framework:

Quick prototyping 

Rails simplifies building MVPs in a couple of months. Organizations prefer Ruby on Rails for rapid application development because it leaves them more time to bring features to market. Typical development teams achieve 25% to 40% higher productivity when working with Rails. Combined with agile practices, Ruby on Rails enables timely delivery.

Basic and simple 

Ruby on Rails is easy to set up and work with, and it is not difficult to learn. Both are possible because of Ruby: the language has one of the simplest syntaxes, which reads much like English. Ruby is a general-purpose programming language, which simplifies things for web applications.

Cost-effective 

One of the greatest advantages of Rails is that it is very affordable. The framework is open source, so there is no licensing fee. Developers are also readily available, and at a lower cost. There are a large number of Ruby on Rails engineers for hire at an average salary of $107,381 per year.

Startup-friendly

Ruby on Rails is often called "the startup technology." It offers flexible, fast, and dynamic web development to new companies. Most emerging organizations and startups prefer it because of its rapid application development capabilities. It leads to faster MVP development, which allows startups to seek venture investment sooner.

Adaptable framework 

Ruby on Rails is highly flexible and adaptable. Even when developers miss a function, they can use modules to add features to the application. They can also redefine components by removing or adding them during development. Even individual projects can be extended and changed.

Convention over configuration

Whether it's Ruby on Rails enterprise application development or ecommerce-focused applications, the framework follows convention over configuration. Developers don't have to spend hours setting up the Ruby on Rails development environment. The standard conventions cover everything, simplifying things for developers on the project. The framework also follows the "Don't Repeat Yourself" principle to avoid redundancy.

Scalable applications

When organizations scale, applications often lag. However, this isn't the case with Ruby on Rails web application development. The framework powers high-traffic sites and can handle a large number of server requests at once. This scalability lets startups keep using the framework even after they ready their first prototype for launch.

Checkout Pros and Cons of Ruby on Rails for Web Development

Bottom Line 

Ruby on Rails is still a significant framework used by organizations of every kind all over the world. Today, it remains one of the best frameworks for digitizing businesses through powerful web applications.

A software development company provides comprehensive Ruby on Rails development to guarantee startups and MNCs can benefit as much as possible from their digital application needs. 

#ruby on rails development #ruby on rails application development #ruby on rails web application development #ruby on rails developer

宇野  和也

1593877440

Indian Accent Speech Recognition

Traditional ASR (Signal Analysis, MFCC, DTW, HMM & Language Modelling) and DNNs (Custom Models & Baidu DeepSpeech Model) on Indian Accent Speech

Image courtesy: Speech and Music Technology Lab, IIT Madras

Notwithstanding the existence of an approved Indian-English accent, accent-less enunciation is a myth. Regardless of racial stereotypes, our speech is naturally shaped by the vernacular we speak, and the Indian vernaculars are numerous! How, then, does a computer decipher speech from different Indian states, which even Indians from other states find hard to understand?

**ASR (Automatic Speech Recognition)** takes continuous audio speech and outputs the equivalent text. In this blog, we will explore some challenges in speech recognition, with a focus on speaker-independent recognition, both in theory and in practice.

The **challenges in ASR** include:

  • Variability of volume
  • Variability of speaking speed
  • Variability of speaker
  • Variability of **pitch**
  • Word boundaries: we speak words without pausing between them.
  • **Noises** such as background sound, audience talk, etc.

Let's address **each of the above problems** in the sections below.

The complete source code of the above studies can be found here.

Models in speech recognition can conceptually be divided into:

  • Acoustic model: Turn sound signals into some kind of phonetic representation.
  • Language model: houses domain knowledge of words, grammar, and sentence structure for the language.

Signal Analysis

When we speak, we create sinusoidal vibrations in the air. Higher pitches vibrate faster, with a higher frequency, than lower pitches. A microphone transduces the acoustical energy in these vibrations into electrical energy.

If we say “Hello World”, the corresponding signal contains 2 blobs.

Some of the vibrations in the signal have higher amplitude. The amplitude tells us how much acoustical energy there is in the sound.

Our speech is made up of many frequencies at the same time, i.e. it is a sum of all those frequencies. To analyze the signal, we use the component frequencies as features. The **Fourier transform** is used to break the signal into these components.

We can use this splitting technique to convert the sound to a Spectrogram, where **frequency** on the vertical axis is plotted against time. The intensity of shading indicates the amplitude of the signal.

Spectrogram of the hello world phrase

To create a Spectrogram,

  1. **Divide the signal** into time frames.
  2. Split each frame signal into frequency components with an FFT.
  3. Each time frame is now represented with a **vector of amplitudes** at each frequency.

one dimensional vector for one time frame

If we line up the vectors again in their time series order, we can have a visual picture of the sound components, the Spectrogram.
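The three steps above can be sketched in a few lines of Ruby. This is only an illustration: it uses a naive O(n²) discrete Fourier transform instead of a real FFT, applies no windowing or frame overlap, and the `dft_magnitudes` and `spectrogram` method names are hypothetical.

```ruby
# Naive DFT magnitudes for one frame (a real pipeline would use an FFT).
# Only the first n/2 bins are kept, since the rest mirror them for real input.
def dft_magnitudes(frame)
  n = frame.size
  (0...(n / 2)).map do |k|
    re = 0.0
    im = 0.0
    frame.each_with_index do |x, t|
      angle = 2 * Math::PI * k * t / n
      re += x * Math.cos(angle)
      im -= x * Math.sin(angle)
    end
    Math.sqrt(re * re + im * im)
  end
end

# Steps 1-3: divide into frames, transform each frame, and keep one
# vector of amplitudes per frame. Incomplete trailing frames are dropped.
def spectrogram(samples, frame_size)
  samples.each_slice(frame_size)
         .select { |frame| frame.size == frame_size }
         .map { |frame| dft_magnitudes(frame) }
end

frame_size = 32
# A pure tone completing 4 cycles per frame should peak in frequency bin 4.
tone = (0...64).map { |t| Math.sin(2 * Math::PI * 4 * t / frame_size) }
spec = spectrogram(tone, frame_size)
peak_bins = spec.map { |column| column.index(column.max) }

puts peak_bins.inspect
```

Each inner array is one column of the spectrogram; lining the columns up in time order gives the picture described above.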

Spectrogram can be lined up with the original audio signal in time

Next, we’ll look at Feature Extraction techniques which would reduce the noise and dimensionality of our data.

Unnecessary information is encoded in the Spectrogram

Feature Extraction with MFCC

Mel Frequency Cepstrum Coefficient Analysis is the reduction of an audio signal to essential speech component features using both Mel frequency analysis and Cepstral analysis. The range of frequencies is reduced and binned into groups of frequencies that humans can distinguish. The signal is further separated into source and filter so that variations between speakers unrelated to articulation can be filtered away.

a) Mel Frequency Analysis

Only **those frequencies humans can hear** are important for recognizing speech. We can split the frequencies of the Spectrogram into bins relevant to our own ears and filter out sound that we can’t hear.

Frequencies above the black line will be filtered out
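As a rough illustration of the binning, the sketch below converts between Hz and the mel scale using one commonly used formula, mel = 2595 · log10(1 + f / 700), and then computes band edges that are evenly spaced in mels. The formula choice, the method names, and the 0 to 8000 Hz range are assumptions made for this example rather than details taken from the text.

```ruby
# One common Hz-to-mel formula; other variants exist.
def hz_to_mel(hz)
  2595.0 * Math.log10(1.0 + hz / 700.0)
end

def mel_to_hz(mel)
  700.0 * (10.0**(mel / 2595.0) - 1.0)
end

# Edges (in Hz) of `bins` bands that are equally wide on the mel scale.
def mel_band_edges(low_hz, high_hz, bins)
  low  = hz_to_mel(low_hz)
  high = hz_to_mel(high_hz)
  step = (high - low) / bins
  (0..bins).map { |i| mel_to_hz(low + i * step) }
end

edges = mel_band_edges(0.0, 8000.0, 10)
puts edges.map { |hz| hz.round(1) }.inspect
```

The bands come out narrow at low frequencies and wide at high frequencies, mirroring the fact that human hearing resolves low frequencies more finely.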

b) Cepstral Analysis

We also need to separate the elements of sound that are speaker-independent. We can think of a human voice production model as a combination of source and filter, where the source is unique to an individual and the filter is the articulation of words that we all use when speaking.

Cepstral analysis relies on this model to separate the two. The cepstrum can be extracted from a signal algorithmically. We thus drop the component of speech unique to an individual's vocal cords while preserving the shape of the sound made by the vocal tract.

Cepstral analysis combined with Mel frequency analysis gives 12 or 13 MFCC features related to speech. **Delta and Delta-Delta MFCC features** can optionally be appended to the feature set. This doubles (or triples) the number of features, up to 39, but gives better results in ASR.
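The arithmetic behind the 39 features can be sketched as follows. Here the delta is a simplified central difference between neighbouring frames; real toolkits typically use a regression over a wider window, so treat `deltas` and `with_deltas` as hypothetical helpers for illustration only.

```ruby
# Simplified delta: half the difference between the next and previous
# frame. Edge frames reuse themselves as the missing neighbour.
def deltas(frames)
  frames.each_index.map do |i|
    prev_frame = frames[[i - 1, 0].max]
    next_frame = frames[[i + 1, frames.size - 1].min]
    next_frame.zip(prev_frame).map { |a, b| (a - b) / 2.0 }
  end
end

# Append delta and delta-delta features to each frame of coefficients.
def with_deltas(mfcc_frames)
  d1 = deltas(mfcc_frames)
  d2 = deltas(d1)
  mfcc_frames.each_index.map { |i| mfcc_frames[i] + d1[i] + d2[i] }
end

# Five dummy frames of 13 coefficients each stand in for real MFCCs.
frames = Array.new(5) { |i| Array.new(13) { |c| (i + c).to_f } }
features = with_deltas(frames)

puts features.first.size
```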

Thus, MFCC (Mel-frequency cepstral coefficients) feature extraction:

  • Reduces the dimensionality of our data, and
  • Squeezes noise out of the system.

So there are 2 Acoustic Features for Speech Recognition:

  • Spectrograms
  • Mel-Frequency Cepstral Coefficients (MFCCs)

When you construct your pipeline, you will be able to choose to use either spectrogram or MFCC features. Next, we’ll look at sound from a language perspective, i.e. the phonetics of the words we hear.

Phonetics

Phonetics is the study of sound in human speech. Linguistic analysis is used to break down human words into their smallest sound segments.


phonemes define the distinct sounds

  • A phoneme is the smallest sound segment that can be used to distinguish one word from another.
  • A grapheme, in contrast, is the smallest distinct unit written in a language. E.g.: English has 26 letters plus a space (27 graphemes).

Unfortunately, we can’t map phonemes to graphemes one-to-one, as some letters map to multiple phonemes and some phonemes map to many letters. For example, the letter C sounds different in cat, chat, and circle.

Phonemes are often a useful intermediary between speech and text. If we can successfully produce an acoustic model that decodes a sound signal into phonemes the remaining task would be to map those phonemes to their matching words. This step is called Lexical Decoding, named so as it is based on a lexicon or dictionary of the data set.
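A toy version of lexical decoding might look like the sketch below. The two-word lexicon uses ARPAbet-style phoneme symbols without stress markers, and the greedy longest-match strategy is a simplification chosen for illustration. Real decoders search over many competing hypotheses, and `LEXICON` and `lexical_decode` are hypothetical names.

```ruby
# Tiny illustrative lexicon: phoneme sequence => word.
LEXICON = {
  %w[HH AH L OW] => 'hello',
  %w[W ER L D]   => 'world'
}.freeze

# Greedy longest-match decoding of a phoneme sequence into words.
# Returns nil when some part of the sequence is not in the lexicon.
def lexical_decode(phonemes, lexicon = LEXICON)
  words = []
  position = 0
  max_len = lexicon.keys.map(&:size).max

  while position < phonemes.size
    length = max_len.downto(1).find { |len| lexicon.key?(phonemes[position, len]) }
    return nil unless length

    words << lexicon[phonemes[position, length]]
    position += length
  end

  words.join(' ')
end

puts lexical_decode(%w[HH AH L OW W ER L D])
```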

If we are training on a limited vocabulary of words, we might skip the phonemes altogether. With a large vocabulary, converting to smaller units first reduces the total number of comparisons needed.

Acoustic Models and the Trouble with Time

With feature extraction, we’ve addressed noise problems as well as variability of speakers. But we still haven’t solved the problem of matching variable lengths of the same word.

Dynamic Time Warping (DTW) calculates the similarity between two signals, even if their time lengths differ. This can be used to align the sequence data of a new word to its most similar counterpart in a dictionary of word examples.

2 signals mapped with Dynamic Time Warping
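A minimal sketch of the classic DTW recurrence is shown below, using one-dimensional sequences and absolute difference as the local cost. In practice the sequences would be feature vectors such as MFCC frames with a vector distance; `dtw_distance` is a hypothetical name.

```ruby
# Dynamic Time Warping distance between two numeric sequences.
# cost[i][j] holds the cheapest alignment of a[0...i] with b[0...j].
def dtw_distance(a, b)
  inf = Float::INFINITY
  cost = Array.new(a.size + 1) { Array.new(b.size + 1, inf) }
  cost[0][0] = 0.0

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      local = (a[i - 1] - b[j - 1]).abs
      # Extend the cheapest of: insertion, deletion, or match.
      cost[i][j] = local + [cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]].min
    end
  end

  cost[a.size][b.size]
end

slow = [1, 1, 2, 2, 3]
fast = [1, 2, 3]

puts dtw_distance(fast, slow)
puts dtw_distance(fast, [2, 3, 4])
```

The first pair differs only in timing, so its distance is zero, while the second pair differs in shape and gets a positive distance.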

#deep-speech #speech #deep-learning #speech-recognition #machine-learning #deep learning

HI Python

1621673040

How to Use ASR System for Accurate Transcription Properties of Your Digital Product

Thanks to advances in speech recognition, companies can now build a whole range of products with accurate transcription capabilities at their heart. Conversation intelligence platforms, personal assistants and video and audio editing tools, for example, all rely on speech to text transcription. However, you often need to train these systems for every domain you want to transcribe, using supervised data. In practice, you need a large body of transcribed audio that’s similar to what you are transcribing just to get started in a new domain.

Recently, Facebook released wav2vec 2.0, which goes some way towards addressing this challenge. wav2vec 2.0 allows you to pre-train transcription systems using audio only, with no corresponding transcription, and then use just a tiny transcribed dataset for training.

In this blog, we share how we worked with wav2vec 2.0, with great results.

#speech-to-text-recognition #speech-recognition #machine-learning #artificial-intelligence #python #pytorch #speech-recognition-in-python #hackernoon-top-story