Reid Rohan


8 Best Libraries for Natural Language Processing For JavaScript

In today's post, we'll look at eight of the best libraries for natural language processing in JavaScript.

What is Natural Language Processing (NLP)?

Natural language refers to the way humans communicate with each other.

Natural Language Processing (NLP) is broadly defined as the electronic manipulation of natural language, like speech and text, by software.

NLP is important because we want machines and humans to communicate in a more natural way. NLP has many use cases, such as powering search engines, sentiment analysis, entity recognition, voice-based apps, chatbots, and personal assistants.

The history of natural language processing (NLP) generally starts in the 1950s, when Alan Turing published “Computing Machinery and Intelligence,” a seminal paper on artificial intelligence.

Some of the notably successful NLP systems developed in the 1960s were SHRDLU and ELIZA. Up to the 1980s, most natural language processing systems were based on complex sets of hand-written rules; in the 1980s, NLP began to pick up after the introduction of machine learning algorithms.

Now, decades later, the world is full of NLP libraries and engines. Let’s look at some of them in the JavaScript and Node.js ecosystem.

Table of contents:

  • Natural - general natural language facilities for node
  • nlp_compromise - natural language processing
  • Hanzi - HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js
  • Salient - Machine Learning, Natural Language Processing and Sentiment Analysis Toolkit for Node.js
  • Node-summary - Node module that summarizes text using a naive summarization algorithm
  • Snowball-js - JavaScript implementation of the popular Snowball word stemming NLP algorithm
  • Porter-stemmer - Martin Porter's stemmer for node.js
  • Lunr-languages - A collection of language stemmers and stopwords for the Lunr JavaScript library

1 - Natural: General natural language facilities for node.

"Natural" is a general natural language facility for Node.js. It offers a broad range of functionality for natural language processing.


If you’re just looking to use natural without your own node application, you can install via NPM like so:

npm install natural

If you’re interested in contributing to natural, or just hacking on it, then by all means fork away!


Word, Regexp, and Treebank tokenizers are provided for breaking text up into arrays of tokens:

var natural = require('natural');
var tokenizer = new natural.WordTokenizer();
console.log(tokenizer.tokenize("your dog has fleas."));
// [ 'your', 'dog', 'has', 'fleas' ]

The other tokenizers follow a similar pattern:

tokenizer = new natural.TreebankWordTokenizer();
console.log(tokenizer.tokenize("my dog hasn't any fleas."));
// [ 'my', 'dog', 'has', 'n\'t', 'any', 'fleas', '.' ]

tokenizer = new natural.RegexpTokenizer({pattern: /\-/});
console.log(tokenizer.tokenize("flea-dog"));
// [ 'flea', 'dog' ]

tokenizer = new natural.WordPunctTokenizer();
console.log(tokenizer.tokenize("my dog hasn't any fleas."));
// [ 'my',  'dog',  'hasn',  '\'',  't',  'any',  'fleas',  '.' ]

tokenizer = new natural.OrthographyTokenizer({language: "fi"});
console.log(tokenizer.tokenize("Mikä sinun nimesi on?"));
// [ 'Mikä', 'sinun', 'nimesi', 'on' ]

tokenizer = new natural.SentenceTokenizer();
console.log(tokenizer.tokenize("This is a sentence. This is another sentence"));
// ["This is a sentence.", "This is another sentence."]

In addition to the sentence tokenizer based on regular expressions (called SentenceTokenizer), there is a sentence tokenizer based on parsing (called SentenceTokenizerNew). It is built using PEG.js; it handles more cases and can be extended in a more structured way than regular expressions.
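To see where pure regular expressions struggle, here is a minimal, dependency-free sketch of a regex-based sentence splitter (the `splitSentences` helper is made up for illustration, not part of natural). It handles simple prose but mis-splits abbreviations such as "Dr.", which is exactly the kind of case a parser-based tokenizer can be taught to handle:

```javascript
// Naive sentence splitter: break after ., ! or ? when followed by
// whitespace and an upper-case letter. Illustrative sketch only.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+(?=[A-Z])/)
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

console.log(splitSentences('This is a sentence. This is another sentence.'));
// [ 'This is a sentence.', 'This is another sentence.' ]

// An abbreviation defeats the regex: "Dr." is treated as a sentence end.
console.log(splitSentences('Dr. Smith arrived. He was late.'));
// [ 'Dr.', 'Smith arrived.', 'He was late.' ]
```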

View on Github

2 - nlp_compromise: Natural language processing.

compromise tries its best to turn text into data. it makes limited and sensible decisions. it's not as smart as you'd think. 

import nlp from 'compromise'

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

don't be fancy, at all:

if (doc.has('simon says #Verb')) {
  return true
}

grab parts of the text:

let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"


and get data:

import plg from 'compromise-speech'
nlp.extend(plg)

let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{
    "normal": "milwaukee",
    "syllables": ["mil", "wau", "kee"]
  }]
}]
*/


View on Github

3 - Hanzi: HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js.


npm install hanzi

How to use

Initiate HanziJS. Required.

var hanzi = require("hanzi");
//initiate the dictionary (required before other calls)
hanzi.start();


hanzi.decompose(character, type of decomposition);

A function that takes a Chinese character and returns an object with decomposition data. Type of decomposition is optional.

Type of decomposition levels:

  • 1 - "Once" (only decomposes the character once)
  • 2 - "Radical" (decomposes the character into its lowest radical components)
  • 3 - "Graphical" (decomposes into lowest forms, mostly strokes and small indivisible units)

var decomposition = hanzi.decompose('爱');
console.log(decomposition);

{ character: '爱',
  components1: [ 'No glyph available', '友' ],
  components2: [ '爫', '冖', '𠂇', '又' ],
  components3: [ '爫', '冖', '𠂇', '㇇', '㇏' ] }

//Example of forced level decomposition

var decomposition = hanzi.decompose('爱', 2);

{ character: '爱', components: [ '爫', '冖', '𠂇', '又' ] }

hanzi.decomposeMany(character string, type of decomposition);
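Conceptually, an API like decompose is a recursive lookup in a table of character components. As a rough, dependency-free sketch of the idea, using the 爱 data shown above (the tiny table and `decompose` helper are illustrative, not Hanzi's internals):

```javascript
// Toy decomposition table: each character maps to its immediate parts.
// Real decomposition databases are far larger than this.
var once = {
  '爱': ['爫', '冖', '友'],
  '友': ['𠂇', '又']
};

// Recursively expand a character down to components with no further entry.
function decompose(ch) {
  var parts = once[ch];
  if (!parts) return [ch]; // no entry: treat as a base component
  return parts.reduce(function (acc, p) {
    return acc.concat(decompose(p));
  }, []);
}

console.log(decompose('爱'));
// [ '爫', '冖', '𠂇', '又' ] -- the "Radical" level shown above
```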

View on Github

4 - Salient: Machine Learning, Natural Language Processing and Sentiment Analysis Toolkit for Node.js.


Plenty of libraries perform tokenization, and this library is no different, except that it also performs the tokenization steps necessary to clean up random HTML, XML, Wiki, Twitter, and other sources. More examples are in the specs directory. Tokenizers in salient are built on top of each other and include the following:

  • Tokenizer (lib/salient/tokenizers/tokenizer.js) abstract
  • RegExpTokenizer (lib/salient/tokenizers/regexp_tokenizer.js) extends: Tokenizer
var salient = require('salient');
var tokenizer = new salient.tokenizers.RegExpTokenizer({ pattern: /\W+/ });
tokenizer.tokenize('these are things');
> ['these', 'are', 'things']
  • UrlTokenizer (lib/salient/tokenizers/url_tokenizer.js) extends: Tokenizer
var salient = require('salient');
var tokenizer = new salient.tokenizers.UrlTokenizer();
tokenizer.tokenize('Some text a wikipedia url in it.');
> ['']
  • WordPunctTokenizer (lib/salient/tokenizers/wordpunct_tokenizer.js) extends: RegExpTokenizer
    • Handles time, numerics (including 1st, 2nd, etc.), numerics with commas/decimals/percents, $, words with hyphenation, words with and without accents, with and without apostrophes, punctuation, and optional emoticon preservation.
var salient = require('salient');
var tokenizer = new salient.tokenizers.WordPunctTokenizer();
tokenizer.tokenize('From 12:00 am-11:59 pm. on Nov. 12th, you can make donation online to support Wylie Center.')
> [ 'From', '12:00 am', '-', '11:59 pm.', 'on', 'Nov', '.', '12th', ',', 'you', 'can', 'make', 'donation', 'online', 'to', 'support', 'Wylie', 'Center', '.' ]

// preserve emoticons
tokenizer = new salient.tokenizers.WordPunctTokenizer({ preserveEmoticons: true })
tokenizer.tokenize('data, here to clean.Wouldn\'t you say so? :D I have $20.45, are you ok? >:(');
> [ 'data', ',', 'here', 'to', 'clean', '.', 'Wouldn', '\'', 't', 'you', 'say', 'so', '? ', ':D', 'I', 'have', ' $', '20.45', ',', 'are', 'you', 'ok', '?', '>:(' ]
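The trick behind emoticon preservation is simply to try an emoticon pattern before the generic word/punctuation split. A rough, dependency-free sketch of that priority-ordered matching (the patterns here are made up for illustration, not salient's own):

```javascript
// Match, in priority order: emoticons, then words, then single
// punctuation characters. Illustrative sketch only.
var EMOTICON = /[<>]?[:;=8][-o*']?[)\](\[dDpP\/:}{@|\\]/;
var TOKEN = new RegExp(EMOTICON.source + "|[A-Za-z0-9']+|[^\\sA-Za-z0-9]", 'g');

function tokenize(text) {
  return text.match(TOKEN) || [];
}

console.log(tokenize('are you ok? :D >:('));
// [ 'are', 'you', 'ok', '?', ':D', '>:(' ]
```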

View on Github

5 - Node-summary: Node module that summarizes text using a naive summarization algorithm.


The algorithm used is explained here. Essentially

This algorithm extracts the key sentence from each paragraph in the text.
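node-summary's exact scoring isn't reproduced here, but the "key sentence from each paragraph" idea can be sketched dependency-free: score each sentence by how many of its words recur elsewhere in the paragraph and keep the top scorer (the `summarize` function below is a hypothetical illustration, not the module's code):

```javascript
// Naive extractive summarizer: for each paragraph, keep the sentence
// whose words recur most often in the rest of the paragraph.
function summarize(text) {
  return text.split(/\n+/).filter(Boolean).map(function (para) {
    var sentences = para.match(/[^.!?]+[.!?]?/g) || [para];
    var lowered = para.toLowerCase();
    var best = sentences[0];
    var bestScore = -1;
    sentences.forEach(function (s) {
      var words = s.toLowerCase().match(/\w+/g) || [];
      // a word scores if it appears more than once in the paragraph
      var score = words.filter(function (w) {
        return lowered.split(w).length > 2;
      }).length;
      if (score > bestScore) { bestScore = score; best = s.trim(); }
    });
    return best;
  }).join(' ');
}

var text = 'Cats purr. Cats sleep and cats purr loudly.\n' +
           'Dogs bark at strangers.';
console.log(summarize(text));
// Cats sleep and cats purr loudly. Dogs bark at strangers.
```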


$ npm install node-summary


var SummaryTool = require('node-summary');

var title = "Swayy is a beautiful new dashboard for discovering and curating online content [Invites]";
var content = "";
content += "Lior Degani, the Co-Founder and head of Marketing of Swayy, pinged me last week when I was in California to tell me about his startup and give me beta access. I heard his pitch and was skeptical. I was also tired, cranky and missing my kids – so my frame of mind wasn't the most positive.\n";
content += "I went into Swayy to check it out, and when it asked for access to my Twitter and permission to tweet from my account, all I could think was, \"If this thing spams my Twitter account I am going to bitch-slap him all over the Internet.\" Fortunately that thought stayed in my head, and not out of my mouth.\n";
content += "One week later, I'm totally addicted to Swayy and glad I said nothing about the spam (it doesn't send out spam tweets but I liked the line too much to not use it for this article). I pinged Lior on Facebook with a request for a beta access code for TNW readers. I also asked how soon can I write about it. It's that good. Seriously. I use every content curation service online. It really is That Good.\n";
content += "What is Swayy? It's like Percolate and LinkedIn recommended articles, mixed with trending keywords for the topics you find interesting, combined with an analytics dashboard that shows the trends of what you do and how people react to it. I like it for the simplicity and accuracy of the content curation.\n";
content += "Everything I'm actually interested in reading is in one place – I don't have to skip from another major tech blog over to Harvard Business Review then hop over to another major tech or business blog. It's all in there. And it has saved me So Much Time\n\n";
content += "After I decided that I trusted the service, I added my Facebook and LinkedIn accounts. The content just got That Much Better. I can share from the service itself, but I generally prefer reading the actual post first – so I end up sharing it from the main link, using Swayy more as a service for discovery.\n";
content += "I'm also finding myself checking out trending keywords more often (more often than never, which is how often I do it on\n\n\n";
content += "The analytics side isn't as interesting for me right now, but that could be due to the fact that I've barely been online since I came back from the US last weekend. The graphs also haven't given me any particularly special insights as I can't see which post got the actual feedback on the graph side (however there are numbers on the Timeline side.) This is a Beta though, and new features are being added and improved daily. I'm sure this is on the list. As they say, if you aren't launching with something you're embarrassed by, you've waited too long to launch.\n";
content += "It was the suggested content that impressed me the most. The articles really are spot on – which is why I pinged Lior again to ask a few questions:\n";
content += "How do you choose the articles listed on the site? Is there an algorithm involved? And is there any IP?\n";
content += "Yes, we're in the process of filing a patent for it. But basically the system works with a Natural Language Processing Engine. Actually, there are several parts for the content matching, but besides analyzing what topics the articles are talking about, we have machine learning algorithms that match you to the relevant suggested stuff. For example, if you shared an article about Zuck that got a good reaction from your followers, we might offer you another one about Kevin Systrom (just a simple example).\n";
content += "Who came up with the idea for Swayy, and why? And what's your business model?\n";
content += "Our business model is a subscription model for extra social accounts (extra Facebook / Twitter, etc) and team collaboration.\n";
content += "The idea was born from our day-to-day need to be active on social media, look for the best content to share with our followers, grow them, and measure what content works best.\n";
content += "Who is on the team?\n";
content += "Ohad Frankfurt is the CEO, Shlomi Babluki is the CTO and Oz Katz does Product and Engineering, and I [Lior Degani] do Marketing. The four of us are the founders. Oz and I were in 8200 [an elite Israeli army unit] together. Emily Engelson does Community Management and Graphic Design.\n";
content += "If you use Percolate or read LinkedIn's recommended posts I think you'll love Swayy.\n";
content += "Want to try Swayy out without having to wait? Go to this secret URL and enter the promotion code thenextweb . The first 300 people to use the code will get access.\n";
content += "Image credit: Thinkstock";

SummaryTool.summarize(title, content, function(err, summary) {
	if(err) console.log("Something went wrong man!");

	console.log(summary);

	console.log("Original Length " + (title.length + content.length));
	console.log("Summary Length " + summary.length);
	console.log("Summary Ratio: " + (100 - (100 * (summary.length / (title.length + content.length)))));
});

var url = "";

SummaryTool.summarizeFromUrl(url, function(err, summary) {
  if(err) {
    console.log("err is ", err);
  } else {
    console.log(summary);
  }
});

View on Github

6 - Snowball-js: JavaScript implementation of the popular Snowball word stemming NLP algorithm.


npm install snowball


var Snowball = require('snowball');
var stemmer = new Snowball('English');

stemmer.setCurrent('abbreviate');
stemmer.stem();
stemmer.getCurrent(); // => 'abbrevi'
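Snowball itself compiles declarative stemming rules into code like the above. At its heart is ordered suffix stripping; the following toy sketch shows the idea (a handful of made-up rules, not the real English algorithm with its region and measure conditions):

```javascript
// Toy suffix stripper: apply the first matching rule, longest suffix first.
var rules = [
  ['ational', 'ate'],
  ['iveness', 'ive'],
  ['izer', 'ize'],
  ['ing', ''],
  ['ed', ''],
  ['s', '']
];

function stem(word) {
  word = word.toLowerCase();
  for (var i = 0; i < rules.length; i++) {
    var suffix = rules[i][0];
    // keep a minimal stem of at least three letters
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, word.length - suffix.length) + rules[i][1];
    }
  }
  return word;
}

console.log(stem('relational')); // 'relate'
console.log(stem('jumping'));    // 'jump'
```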


To make porting from the Java sources easier, each stemmer can be validated with a regex:


The Snowball.min.js library was compressed with the Google Closure Compiler:

java -jar compiler.jar --js Snowball.js --js_output_file Snowball.min.js

View on Github

7 - Porter-stemmer: Martin Porter's stemmer for node.js.


For node.js, using npm:

npm install porter-stemmer

or git clone this repo.


> var stemmer = require('porter-stemmer').stemmer
> stemmer('Smurftastic')
'smurftast'

Test Suite

I have included Dr Porter's sample input and output text in a test suite.

To verify:

npm test

View on Github

8 - Lunr-languages: A collection of language stemmers and stopwords for the Lunr JavaScript library.

How to use

Lunr-languages works well with script loaders (Webpack, RequireJS) and can be used in the browser and on the server.

In a web browser

The following example is for the German language (de).

Add the following JS files to the page:

<script src="lunr.js"></script> <!-- lunr.js library -->
<script src=""></script>
<script src=""></script> <!-- or any other language you want -->

then use the language when initializing lunr:

var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  // now you can call this.add(...) to add documents written in German
});

That's it. Just add the documents and you're done. When searching, the language stemmer and stopwords list will be the one you used.
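Under the hood, a language pack mainly contributes two things to lunr's token pipeline: a stopword filter and a stemmer. The pipeline idea can be sketched dependency-free (toy German stopword list and a crude suffix rule for illustration; the real lunr.de ships full tables):

```javascript
// Simplified indexing pipeline: lowercase, drop stopwords, then stem.
var stopwords = ['der', 'die', 'das', 'und', 'ist'];

// Crude stand-in for a German stemmer: strip a few common endings.
function stemDe(token) {
  return token.replace(/(en|er|e)$/, '');
}

function pipeline(text) {
  return text.toLowerCase().split(/\s+/)
    .filter(function (t) { return t && stopwords.indexOf(t) === -1; })
    .map(stemDe);
}

console.log(pipeline('Die Katzen und die Hunde'));
// [ 'katz', 'hund' ]
```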

In a web browser, with RequireJS

Add require.js to the page:

<script src="lib/require.js"></script>

then use the language when initializing lunr:

require(['lib/lunr.js', '../', '../'], function(lunr, stemmerSupport, de) {
  // since stemmerSupport and de add keys on the lunr object, we pass it to them by reference
  // in the end, we will only need lunr.
  stemmerSupport(lunr); // adds lunr.stemmerSupport
  de(lunr); // adds the lunr.de key

  // at this point, lunr can be used
  var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    // now you can call this.add(...) to add documents written in German
  });
});
View on Github

Thank you for following this article.

Related videos:

NLP Libraries for NodeJS and JavaScript

#javascript #naturallanguageprocessing 

Vicenta Hauck

How to Do N-gram Language Modeling in Natural Language Processing

An N-gram is a sequence of n words in Natural Language Processing (NLP) modeling. How can this technique be useful in language modeling?



Introduction

Language modeling is used to determine the probability of a sequence of words. This modeling has a large number of applications, such as speech recognition, spam filtering, and so on [1].

Natural Language Processing (NLP)

Natural Language Processing (NLP) is the convergence of artificial intelligence (AI) and linguistics. It is used to make computers understand words or statements written in human languages. NLP was developed to make working and communicating with computers easy and satisfying. Since not every computer user can know a machine's specific languages, NLP works better for users who don't have time to learn them. We can define a language as a set of rules or symbols: symbols are combined to convey information, and they are governed by the set of rules. NLP is divided into two parts, natural language understanding and natural language generation, which cover the tasks of understanding and producing text.

Figure 1: Classifications of NLP

Language modeling methods

Language models are classified as follows:

Statistical language models: In this approach, probabilistic models are developed that predict the next word in a sequence; the N-gram language model is an example. Such models can be used to disambiguate input and to select a likely solution. They rest on probability theory, probability being the prediction of how likely something is to happen.

Neural language models: Neural language modeling gives better results than classical methods, both for standalone models and when models are incorporated into larger models for difficult tasks such as speech recognition and machine translation. One method of performing neural language modeling is through word embeddings [1].

N-gram modeling in NLP

An N-gram is a sequence of N words in NLP modeling. Consider an example statement for modeling: “I love reading history books and watching documentaries.” In a one-gram, or unigram, there is a sequence of one word; for the statement above, the unigrams are “I”, “love”, “reading”, “history”, “books”, “and”, “watching”, “documentaries”. In a two-gram, or bigram, there is a sequence of two words, e.g. “I love”, “love reading”, or “history books”. In a three-gram, or trigram, there are sequences of three words, e.g. “I love reading”, “history books and”, or “watching documentaries” [3]. An illustration of N-gram modeling for N = 1, 2, 3 is given below in Figure 2 [5].

Figure 2: Uni-gram, bi-gram and tri-gram models

Given the previous N-1 words, an N-gram model predicts the most probable next word in the sequence. The model is a probabilistic language model trained on a collection of text, and it is useful in applications such as speech recognition and machine translation. A simple model has some limitations that can be improved with smoothing, interpolation, and backoff. So the N-gram language model is about finding probability distributions over sequences of words. Consider the sentences “There was heavy rain” and “There was heavy flood”. From experience, we can say the first statement is good. An N-gram language model tells us that “heavy rain” occurs more frequently than “heavy flood”, so the first statement is more probable and will be selected by the model. In a one-gram model, the model relies only on how often a word occurs, without considering the previous words. In a 2-gram, only the previous word is considered to predict the current word; in a 3-gram, the two previous words are considered. In an N-gram language model, the following probabilities are calculated:

P (“There was heavy rain”) = P (“There”, “was”, “heavy”, “rain”) = P (“There”) P (“was” |“There”) P (“heavy”| “There was”) P (“rain” |“There was heavy”).

Since computing these conditional probabilities directly is impractical, the “Markov assumption” is used to approximate them with a bigram model as [4]:

P (“There was heavy rain”) ~ P (“There”) P (“was” |“There”) P (“heavy” |“was”) P (“rain” |“heavy”)
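The bigram factorization above is easy to compute by counting. A minimal JavaScript sketch (with a toy three-sentence corpus made up for illustration), estimating P(next | previous) as count(previous next) / count(previous):

```javascript
// Bigram model from a toy corpus: P(w2 | w1) = count(w1 w2) / count(w1).
// Sentence starts are marked with <s>. Illustrative sketch only.
var corpus = [
  '<s> there was heavy rain',
  '<s> there was heavy rain',
  '<s> there was heavy flooding'
];

var unigrams = {}, bigrams = {};
corpus.forEach(function (sentence) {
  var words = sentence.split(' ');
  for (var i = 0; i < words.length; i++) {
    unigrams[words[i]] = (unigrams[words[i]] || 0) + 1;
    if (i + 1 < words.length) {
      var pair = words[i] + ' ' + words[i + 1];
      bigrams[pair] = (bigrams[pair] || 0) + 1;
    }
  }
});

function p(next, prev) {
  return (bigrams[prev + ' ' + next] || 0) / (unigrams[prev] || 1);
}

console.log(p('rain', 'heavy'));     // 2/3: "heavy rain" is more frequent
console.log(p('flooding', 'heavy')); // 1/3
```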

Applications of the N-gram model in NLP

In speech recognition, the input can be noisy, and the noise can cause a wrong speech-to-text conversion. The N-gram language model corrects such errors by using its knowledge of probabilities. The model is likewise used in machine translation to produce more natural statements in the target language. For spelling-error correction, a dictionary is sometimes useless: in “in about fifteen minuets”, for example, “minuets” is a valid word according to the dictionary, but it is incorrect in the phrase. The N-gram language model can fix this type of error.

The N-gram language model generally works at the word level. It is also used at the character level to perform stemming, i.e. to separate root words from suffixes. By looking at N-gram statistics, languages can be classified, or US and UK spellings distinguished. Many applications benefit from N-gram models, including part-of-speech tagging, natural language generation, word similarity, and sentiment extraction [4].

Limitations of the N-gram model in NLP

The N-gram language model also has some limitations. There is a problem with out-of-vocabulary words: words that occur during testing but were not seen in training. One solution is to use a fixed vocabulary and convert out-of-vocabulary words in the training data into pseudo-words. When implemented for sentiment analysis, the bigram model outperformed the unigram model, but the number of features doubled; scaling N-gram models to larger datasets or higher orders therefore requires better feature-selection methods. N-gram models also capture long-distance context poorly: it has been shown that performance gains are limited after about 6-grams.


#ngram #naturallanguageprocessing #nlp 

Cách Lập Mô Hình Ngôn Ngữ N-gram Trong Natural Language Processing

Как смоделировать язык N-грамм в Natural Language Processing

N-грамма — это последовательность из n слов в моделировании НЛП. Как этот метод может быть полезен в языковом моделировании?


Языковое моделирование используется для определения вероятности последовательности слов. Это моделирование имеет большое количество приложений, таких как распознавание речи, фильтрация спама и т. д. [1].

Обработка естественного языка (NLP)

Обработка естественного языка (NLP) — это слияние искусственного интеллекта (ИИ) и лингвистики. Он используется для того, чтобы компьютеры понимали слова или утверждения, написанные на человеческом языке. НЛП было разработано для того, чтобы сделать работу и общение с компьютером легкими и приятными. Поскольку все пользователи компьютеров не могут быть хорошо знакомы с конкретными языками машин, НЛП лучше работает с пользователями, у которых нет времени на изучение новых языков машин. Мы можем определить язык как набор правил или символов. Символы объединяются для передачи информации. Они тиранизированы набором правил. НЛП подразделяется на две части: понимание естественного языка и генерация естественного языка, которые развивают задачи для понимания и создания текста.

Классификации НЛП
Рис. 1. Классификации НЛП

Методы языкового моделирования

Языковые модели классифицируются следующим образом:

Моделирование статистического языка : в этом моделировании происходит разработка вероятностных моделей. Эта вероятностная модель предсказывает следующее слово в последовательности. Например, моделирование языка N-грамм. Это моделирование можно использовать для устранения неоднозначности ввода. Их можно использовать для выбора вероятного решения. Это моделирование зависит от теории вероятностей. Вероятность — это предсказание вероятности того, что что-то произойдет.

Нейронно-языковое моделирование. Нейронно-языковое моделирование дает лучшие результаты, чем классические методы, как для автономных моделей, так и при включении моделей в более крупные модели для решения сложных задач, таких как распознавание речи и машинный перевод. Одним из методов моделирования нейронного языка является встраивание слов [1].

Моделирование N-грамм в НЛП

N-грамма — это последовательность N-слов в моделировании НЛП. Рассмотрим пример постановки для моделирования. «Я люблю читать книги по истории и смотреть документальные фильмы». В одном грамме или униграмме есть последовательность из одного слова. Что касается приведенного выше высказывания, то в одном грамме это может быть «я», «люблю», «история», «книги», «и», «смотрю», «документальные фильмы». В двухграммах или биграммах есть последовательность из двух слов, т. е. «я люблю», «люблю читать» или «книги по истории». В трехграммах или триграммах есть последовательности из трех слов, т. е. «я люблю читать», «книги по истории» или «и смотреть документальные фильмы» [3]. Иллюстрация моделирования N-грамм, т.е. для N=1,2,3, приведена ниже на рисунке 2 [5].

Модель униграммы, биграммы и триграммы
Рис. 2. Модель униграммы, биграммы и триграммы

Для N-1 слов моделирование N-грамм предсказывает наиболее часто встречающиеся слова, которые могут следовать за последовательностями. Модель представляет собой вероятностную языковую модель, которая обучается на наборе текста. Эта модель полезна в приложениях, таких как распознавание речи и машинный перевод. Простая модель имеет некоторые ограничения, которые можно улучшить за счет сглаживания, интерполяции и отсрочки. Итак, языковая модель N-грамм предназначена для нахождения вероятностных распределений последовательностей слов. Рассмотрим предложения т.е. «Был сильный дождь» и «Было сильное наводнение». По опыту можно сказать, что первое утверждение хорошее. Модель языка N-грамм говорит о том, что «сильный дождь» происходит чаще, чем «сильный паводок». Так, первое утверждение более вероятно, и оно будет выбрано этой моделью. В модели с одним граммом модель обычно опирается на то, какое слово встречается часто, не задумываясь над предыдущими словами. В 2-грамме для предсказания текущего слова учитывается только предыдущее слово. В 3-грамме учитываются два предыдущих слова. В языковой модели N-грамм вычисляются следующие вероятности:

P (“There was heavy rain”) = P (“There”, “was”, “heavy”, “rain”) = P (“There”) P (“was” |“There”) P (“heavy”| “There was”) P (“rain” |“There was heavy”).

Поскольку расчет условной вероятности нецелесообразен, кроме как с использованием « марковских предположений» , это аппроксимируется биграммной моделью как [4]:

P (“There was heavy rain”) ~ P (“There”) P (“was” |“'There”) P (“heavy” |“was”) P (“rain” |“heavy”)

Применение модели N-грамм в НЛП

При распознавании речи ввод может быть шумным. Этот шум может исказить речь при преобразовании текста. Модель языка N-грамм исправляет шум, используя знание вероятности. Точно так же эта модель используется в машинных переводах для создания более естественных утверждений на целевом и заданных языках. Для исправления орфографических ошибок словарь иногда бесполезен. Например, «примерно через пятнадцать минут» «менуэт» является допустимым словом в соответствии со словарем, но неверным во фразе. Языковая модель N-грамм может исправить этот тип ошибки.

Языковая модель N-грамм обычно находится на уровне слов. Он также используется на уровне символов для определения основы, т. е. для отделения корневых слов от суффикса. Глядя на модель N-грамм, можно классифицировать языки или различать их правописание в США и Великобритании. Многие приложения получают преимущества от модели N-грамм, включая тегирование частей речи, генерацию естественного языка, сходство слов и извлечение тональностей. [4].

Ограничения модели N-грамм в НЛП

Языковая модель N-грамм также имеет некоторые ограничения. Есть проблема со словарными словами. Эти слова во время тестирования, но не в обучении. Одним из решений является использование фиксированного словарного запаса, а затем преобразование словарных слов при обучении в псевдослова. При реализации в анализе настроений модель биграмм превзошла модель униграммы, но количество функций удвоилось. Таким образом, масштабирование модели N-грамм для больших наборов данных или переход к более высокому порядку требует более эффективных подходов к выбору признаков. Модель N-грамм плохо отражает контекст междугородной связи. Было показано, что после каждых 6 грамм прирост производительности ограничен. 


#ngram #naturallanguageprocessing #nlp 

Как смоделировать язык N-грамм в Natural Language Processing

如何在 Natural Language Processing 中進行 N-gram 語言建模

N-gram 是 NLP 建模中的 n 個單詞的序列。這種技術如何在語言建模中有用?



自然語言處理 (NLP)

自然語言處理 (NLP) 是人工智能 (AI) 和語言學的融合。它用於使計算機理解用人類語言編寫的單詞或語句。NLP 的開發是為了使與計算機的工作和交流變得容易和令人滿意。由於無法通過機器的特定語言了解所有計算機用戶,因此 NLP 更適合沒有時間學習機器新語言的用戶。我們可以將語言定義為一組規則或符號。符號組合起來傳達信息。他們被一套規則所壓制。NLP 分為自然語言理解和自然語言生成兩部分,自然語言生成演變了理解和生成文本的任務。

圖1 NLP的分類



統計語言建模:在這種建模中,有概率模型的發展。該概率模型預測序列中的下一個單詞。例如 N-gram 語言建模。該建模可用於消除輸入的歧義。它們可用於選擇可能的解決方案。這種建模依賴於概率論。概率是預測某事發生的可能性。

神經語言建模:對於獨立模型以及將模型合併到具有挑戰性任務(即語音識別和機器翻譯)的更大模型中時,神經語言建模比經典方法提供更好的結果。執行神經語言建模的一種方法是通過詞嵌入 [1]。

NLP 中的 N-gram 建模

N-gram 是 NLP 建模中的 N 個單詞的序列。考慮一個建模語句的例子。“我喜歡看歷史書和看紀錄片”。在 one-gram 或 unigram 中,有一個單詞序列。至於上面的說法,一克可以是“我”、“愛”、“歷史”、“書”、“和”、“看”、“紀錄片”。在二元或二元中,有兩個詞的序列,即“我愛”、“愛閱讀”或“歷史書”。在三元組或三元組中,有三個單詞序列,即“我喜歡閱讀”、“歷史書籍”或“和看紀錄片”[3]。下面的圖 2 [5] 中給出了 N-gram 建模的說明,即 N=1,2,3。

Uni-gram、Bi-gram 和 Tri-gram 模型
圖 2 Uni-gram、Bi-gram 和 Tri-gram 模型

對於 N-1 個單詞,N-gram 模型會預測最常出現的可以遵循序列的單詞。該模型是在文本集合上訓練的概率語言模型。該模型在語音識別和機器翻譯等應用中很有用。一個簡單的模型有一些限制,可以通過平滑、插值和回退來改進。因此,N-gram 語言模型是關於尋找單詞序列的概率分佈。考慮句子,即“有大雨”和“有大洪水”。通過使用經驗,可以說第一種說法是好的。N-gram 語言模型告訴我們,“大雨”比“大洪水”更頻繁地發生。所以,第一個語句更有可能發生,然後將由該模型選擇。在 one-gram 模型中,該模型通常依賴於哪個詞經常出現,而不考慮前面的詞。在 2-gram 中,只考慮前一個詞來預測當前詞。在 3-gram 中,考慮了之前的兩個單詞。在 N-gram 語言模型中,計算以下概率:

P (“There was heavy rain”) = P (“There”, “was”, “heavy”, “rain”) = P (“There”) P (“was” |“There”) P (“heavy”| “There was”) P (“rain” |“There was heavy”).

由於通過使用“馬爾可夫假設”來計算條件概率是不切實際的,因此將其近似為二元模型 [4]:

P (“There was heavy rain”) ~ P (“There”) P (“was” |“'There”) P (“heavy” |“was”) P (“rain” |“heavy”)

N-gram 模型在 NLP 中的應用

在語音識別中,輸入可能是嘈雜的。這種噪音會導致錯誤的語音向文本轉換。N-gram 語言模型通過使用概率知識來糾正噪聲。同樣,該模型用於機器翻譯,以在目標語言和指定語言中生成更自然的語句。對於拼寫錯誤更正,字典有時是無用的。例如,根據字典,“大約十五分鐘”“小步舞曲”是一個有效的詞,但在短語中是不正確的。N-gram 語言模型可以糾正這種類型的錯誤。

N-gram 語言模型通常處於單詞級別。它還用於在字符級別進行詞幹提取,即將根詞與後綴分開。通過查看 N-gram 模型,可以在美國和英國的拼寫之間對語言進行分類或區分。許多應用程序都從 N-gram 模型中受益,包括部分語音的標記、自然語言生成、單詞相似性和情感提取。[4]。

NLP 中 N-gram 模型的局限性

N-gram 語言模型也有一些限制。詞彙量不足的單詞存在問題。這些話是在測試期間,而不是在訓練中。一種解決方案是使用固定詞彙,然後將訓練中的詞彙轉換為偽詞。在情感分析中實施時,二元模型的性能優於一元模型,但特徵數量增加了一倍。因此,將 N-gram 模型擴展到更大的數據集或移動到更高階需要更好的特徵選擇方法。N-gram 模型很難捕捉到長距離上下文。每 6 克後顯示,性能增益有限。 

來源:  https ://

#ngram #naturallanguageprocessing #nlp 

如何在 Natural Language Processing 中進行 N-gram 語言建模
Nat  Grady

Nat Grady


Quanteda: An R Package for The Quantitative Analysis Of Textual Data


An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

The quanteda family of packages

As of v3.0, we have continued our trend of splitting quanteda into modular packages. These are now the following:

  • quanteda: contains all of the core natural language processing and textual data management functions
  • quanteda.textmodels: contains all of the text models and supporting functions, namely the textmodel_*() functions. This was split from the main package with the v2 release
  • quanteda.textstats: statistics for textual data, namely the textstat_*() functions, split with the v3 release
  • quanteda.textplots: plots for textual data, namely the textplot_*() functions, split with the v3 release

We are working on additional package releases, available in the meantime from our GitHub pages:

  • quanteda.sentiment: Functions and lexicons for sentiment analysis using dictionaries
  • quanteda.tidy: Extensions for manipulating document variables in core quanteda objects using your favourite tidyverse functions

and more to come.

How To…

How to Install

The normal way from CRAN, using your R GUI or:

install.packages("quanteda")


Or for the latest development version:

# devtools package required to install quanteda from Github 
devtools::install_github("quanteda/quanteda")

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers to build the development version.

How to Use

See the quick start guide to learn how to use quanteda.

How to Get Help

How to Cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774.

For a BibTeX entry, use the output from citation(package = "quanteda").

How to Leave Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

How to Contribute

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

For more details, see

quanteda version 3: New major release

quanteda 3.0 is a major release that improves functionality, completes the modularisation of the package begun in v2.0, further improves function consistency by removing previously deprecated functions, and enhances workflow stability and consistency by deprecating some shortcut steps built into some functions.

See the release notes for a full list of the changes.

Download Details:

Author: Quanteda
Source Code: 
License: GPL-3.0 license

#r #naturallanguageprocessing #text 

Quanteda: An R Package for The Quantitative Analysis Of Textual Data
Nat  Grady

Nat Grady


Text2vec: Fast Vectorization, Distances & GloVe Word Embeddings in R

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).

Goals which we aimed to achieve as a result of development of text2vec:

  • Concise - expose as few functions as possible
  • Consistent - expose unified interfaces, no need to explore new interface for each task
  • Flexible - make it easy to solve complex tasks
  • Fast - maximize efficiency per single thread, transparently scale to multiple threads on multicore machines
  • Memory efficient - use streams and iterators; avoid keeping data in RAM where possible

See API section for details.



This package is efficient because it is carefully written in C++, which also means that text2vec is memory friendly. Some parts are fully parallelized using OpenMP.

Other embarrassingly parallel tasks (such as vectorization) can use any fork-based parallel backend on UNIX-like machines. They can achieve near-linear scalability with the number of available cores.

Finally, a streaming API means that users do not have to load all the data into RAM.


The package has an issue tracker on GitHub where I'm filing feature requests and notes for future work. Any ideas are appreciated.

Contributors are welcome. You can help by:

Download Details:

Author: Dselivanov
Source Code: 
License: View license

#r #text #naturallanguageprocessing 

Text2vec: Fast Vectorization, Distances & GloVe Word Embeddings in R
Monty  Boehm

Monty Boehm


DependencyTrees.jl: Dependency Parsing in Julia


DependencyTrees.jl is a Julia package for working with dependency structures in natural language sentences. It provides a representation of dependency parse trees (DependencyTree), a treebank reader, and transition-based parsing algorithms.


Trees and Treebanks

The DependencyTree type represents a dependency parse of a natural language sentence.

julia> using DependencyTrees

julia> treebank = Treebank("test/data/english.conllu")

julia> for tree in treebank
           # ...
       end

julia> tree = first(treebank)
┌──────────────────── ROOT
│                 ┌─► Pierre
│     ┌─►┌──┌──┌──└── Vinken
│     │  │  │  └────► ,
│     │  │  │     ┌─► 61
│     │  │  │  ┌─►└── years
│     │  │  └─►└───── old
│     │  └──────────► ,
└─►┌──└───────────┌── will
   │  ┌─────┌──┌──└─► join
   │  │     │  │  ┌─► the
   │  │     │  └─►└── board
   │  │  ┌──└───────► as
   │  │  │     ┌────► a
   │  │  │     │  ┌─► nonexecutive
   │  │  └────►└──└── director
   │  └──────────►┌── Nov.
   │              └─► 29
   └────────────────► .

Transition-based parsing

A number of transition systems and oracles are implemented in the TransitionParsing submodule.

julia> using DependencyTrees
julia> using DependencyTrees.TransitionParsing
julia> treebank = Treebank("test/data/english.conll")
julia> oracle = Oracle(ArcHybrid(), dynamic_oracle)
julia> for tree in treebank, (config, gold_ts) in oracle(tree)
           # ...
       end

Transition systems:

  • ArcStandard (static oracle)
  • ArcEager[1],[2] (static and dynamic oracles)
  • ArcHybrid[3],[4] (static and dynamic oracles)
  • ArcSwift[5] (static oracle)
  • ListBasedNonProjective[2] (static oracle)

See the documentation for details.


]add DependencyTrees

Contributing & Help

Open an issue! Bug reports, feature requests, etc. are all welcome.


[1]: Nivre 2003: An efficient algorithm for projective dependency parsing.

[2]: Nivre 2008: Algorithms for Deterministic Incremental Dependency Parsing.

[3]: Kuhlmann et al. 2011: Dynamic programming algorithms for transition-based dependency parsers.

[4]: Goldberg & Nivre 2013: Training deterministic parsers with non-deterministic oracles.

[5]: Qi & Manning 2016: Arc-swift: a novel transition system for dependency parsing.

Read the docs!

Author: Dellison
Source Code: 
License: MIT license

#julia #tree #naturallanguageprocessing 

DependencyTrees.jl: Dependency Parsing in Julia
Royce  Reinger

Royce Reinger


Stealth: an Open Source Ruby Framework for Text and Voice Chatbots

Stealth is a Ruby framework for creating text and voice chatbots. Its design is inspired by Ruby on Rails' philosophy of convention over configuration. It has an MVC architecture, with the slight caveat that views are aptly named replies.


  • Deploy anywhere, it's just a Rack app
  • Variants allow you to use a single codebase on multiple messaging platforms
  • Structured, universal reply format
  • Sessions utilize a state-machine concept and are Redis backed
  • Highly scalable. Incoming webhooks are processed via a Sidekiq queue
  • Built-in best practices: catch-alls (error handling), hello flows, goodbye flows

Getting Started

Getting started with Stealth is simple:

> gem install stealth
> stealth new <bot>

Service Integrations

Stealth is extensible. All service integrations are split out into separate Ruby Gems. Things like analytics and natural language processing (NLP) can be added in as gems as well.

Currently, there are gems for:



Natural Language Processing



You can find our full docs here. If something is not clear in the docs, please file an issue! We consider all shortcomings in the docs as bugs.


Stealth is versioned using Semantic Versioning, but it's more like the Linux kernel: major version releases are just as arbitrary as minor version releases. We strive to never break anything with any version change. Patches are still issued as the "third dot" in the version string.

Author: Hellostealth
Source Code: 
License: MIT license

#ruby #bot #rails #naturallanguageprocessing 

Stealth: an Open Source Ruby Framework for Text and Voice Chatbots
Royce  Reinger

Royce Reinger


Wlapi: Ruby Based API for The Project Wortschatz Leipzig



WLAPI is a programmatic API for web services provided by the project Wortschatz, University of Leipzig. These services are a great source of linguistic knowledge for morphological, syntactic and semantic analysis of German both for traditional and Computational Linguistics (CL).

Use this API to gain data on word frequencies, left and right neighbours, collocations and semantic similarity. Check it out if you are interested in Natural Language Processing (NLP) and Human Language Technology (HLT).

This library is a set of Ruby bindings for the following features. You may also be interested in other language-specific bindings:





The original Java based clients with many examples can be found on the project overview page.

Implemented Features

You can use the following search methods:



















The services NGrams and NGramReferences are under development and will be available soon. Both methods throw a NotImplementedError for now.

The interface will be slightly changed in the version 1.0 to be more readable. For example, #cooccurrences_all may become #all_cooccurrences.

There are two additional services by Wortschatz Leipzig: MARS and Kookurrenzschnitt. They will not be implemented due to internal restrictions of the service provider.


WLAPI is provided as a .gem package. Simply install it via RubyGems.

To install WLAPI issue the following command:

$ gem install wlapi

The current version of WLAPI works with the second Savon generation. You might want to install versions prior to 0.8.0 if you are bound to the old implementation of Savon:

$ gem install wlapi -v 0.7.4

If you want to do a system wide installation, do this as root (possibly using sudo).

Alternatively use your Gemfile for dependency management.

We are working on a .deb package, which will be released soon.


Basic usage is very simple:

$ require 'wlapi'
$ api = WLAPI::API.new
$ api.synonyms('Haus', 15) # returns an array with string values (UTF8 encoded)
$ api.domain('Auto') # => Array

If you are going to send mass requests, please contact the support team of the project Wortschatz, get your private credentials and instantiate an authenticated client:

$ require 'wlapi'
$ api = WLAPI::API.new(user, password)

See documentation in the WLAPI::API class for details on particular search methods.


While using WLAPI you can face the following errors:



The errors here are presented in the order they may occur during WLAPI's work.

First, WLAPI checks the user input and throws a WLAPI::UserError if the arguments are not appropriate.

Then it fetches a response from the remote server, which can result in a WLAPI::ExternalError. In most cases this will be a simple wrapper around other errors, e.g. Savon::SOAP::Fault.

All of them are subclasses of WLAPI::Error, which is in turn a subclass of the standard RuntimeError.

If you want to intercept any and every exception thrown by WLAPI simply rescue WLAPI::Error.


If you have questions, bug reports or any suggestions, please drop me an email :) Any help is deeply appreciated!

If you need some new functionality, please contact me or provide a pull request. Your code should be complete and tested. Please use the local_* and remote_* naming convention for your tests.

Supported Ruby Versions

The library is tested on the following Ruby interpreters:

  • MRI 1.8.7
  • MRI 1.9.3
  • MRI 2.0.x
  • MRI 2.1.x
  • JRuby (both 1.8 and 1.9 modes)



For details on future plans and work in progress, see CHANGELOG.


This library is a work in progress! Though the interface is mostly complete, you might face some unimplemented features.

Please contact me with your suggestions, bug reports and feature requests.

DISCLAIMER We are working on the new RESTful client. Please be patient!

Author: Arbox
Source Code: 
License: MIT license

#ruby #nlp #naturallanguageprocessing 

Wlapi: Ruby Based API for The Project Wortschatz Leipzig
Royce Reinger


Tickle: Natural Language Parser for Recurring Events


If you wish to contribute then please take a look at the Contribution section further down the page, but I'd be really, really grateful if anyone wishes to contribute specs. Not unit tests, but specs. This library's internals will be changing a lot over the coming months, and it would be good to have integration tests - a black-box spec of the library - to work against. Even if you've never contributed to a library before, now is your chance! I'll help anyone through with what they may need, and I promise not to be the standard snarky Open Source dictator that a lot of projects have. We'll try and improve this library together.

Take a look at the develop branch where all this stuff will be happening.


Tickle is a natural language parser for recurring events.

Tickle is designed to be a complement to Chronic and can interpret things such as "every 2 days", "every Sunday", "Sundays", "Weekly", etc.

Tickle has one main method, Tickle.parse, which returns the next time the event should occur, at which point you simply call Tickle.parse again.


If you started using Tickle pre version 0.1.X, you will need to update your code to either include the :next_only => true option or read correctly from the options hash. Sorry.


You can parse strings containing a natural language interval using the Tickle.parse method.

You can either pass a string prefixed with the word "every", "each" or "on the" or simply the timeframe.

Tickle.parse returns a hash containing the following keys:

  • next - the next occurrence of the event. This is NEVER today, as it's always the next date in the future.
  • starting - the date all calculations are based on. If not passed as an option, the start date is right now.
  • until - the last date you want this event to run until.
  • expression - the natural language expression to store, so you can run it through Tickle later to get the next occurrence.

Tickle returns nil if it cannot parse the string.
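A minimal sketch of consuming that result hash. The hash literal here is a hand-written stand-in for a real Tickle.parse return value, using the keys listed above:

```ruby
require 'time'

# Stand-in for a Tickle.parse result (for "week", say); a real call
# would produce the same keys with live Time values.
result = {
  :next       => Time.parse('2010-05-16 12:00:00'),
  :starting   => Time.parse('2010-05-09 20:57:36'),
  :until      => nil,
  :expression => 'week'
}

if result.nil?
  puts 'could not parse that expression' # Tickle returns nil on failure
else
  puts result[:expression]               # persist this to re-parse later
  puts result[:next] > result[:starting] # :next always falls after the start
end
```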

Tickle heavily uses Chronic for parsing both the event and the start date.


There are two ways to pass options: natural language or an options hash.


  • Pass a start date with the word "starting", "start", or "starts", e.g. Tickle.parse('every 3 days starting next friday')
  • Pass an end date with the word "until", "end", "ends", or "ending", e.g. Tickle.parse('every 3 days until next friday')
  • Pass both at the same time, e.g. "starting May 5th repeat every other week until December 1"

OPTIONS HASH: Valid options are:

  • start - must be a valid date or Chronic date expression, e.g. Tickle.parse('every other day', {:start => Date.new(2010,8,1)})
  • until - must be a valid date or Chronic date expression, e.g. Tickle.parse('every other day', {:until => Date.new(2010,10,1)})
  • next_only - legacy switch to return only the next occurrence as a date instead of a hash


You may notice when parsing an expression with a start date that the next occurrence is the start date passed. This is by design.

Here's why: assume your user says "remind me every 3 weeks starting Dec 1" and today is May 8th. The first reminder needs to be sent on Dec 1, not Dec 21 (three weeks later).
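The rule can be illustrated in a few lines of plain Ruby (this is a sketch of the behaviour described above, not Tickle's own code):

```ruby
require 'date'

# The first occurrence IS the start date; each subsequent
# occurrence adds the interval to it.
def occurrences(start, interval_days, count)
  (0...count).map { |i| start + i * interval_days }
end

start = Date.new(2010, 12, 1)       # "starting Dec 1" (today is May 8)
dates = occurrences(start, 21, 3)   # "every 3 weeks"

puts dates.first                    # the first reminder: Dec 1 itself
puts dates[1]                       # then three weeks later, Dec 22
```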

If you don't like that, fork and have fun but don't say you weren't warned.


require 'tickle'

Tickle.parse('day')
  #=> {:next=>2010-05-10 20:57:36 -0400, :expression=>"day", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('week')
  #=> {:next=>2010-05-16 20:57:36 -0400, :expression=>"week", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('month')
  #=> {:next=>2010-06-09 20:57:36 -0400, :expression=>"month", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('year')
  #=> {:next=>2011-05-09 20:57:36 -0400, :expression=>"year", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('daily')
  #=> {:next=>2010-05-10 20:57:36 -0400, :expression=>"daily", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('weekly')
  #=> {:next=>2010-05-16 20:57:36 -0400, :expression=>"weekly", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('monthly')
  #=> {:next=>2010-06-09 20:57:36 -0400, :expression=>"monthly", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('yearly')
  #=> {:next=>2011-05-09 20:57:36 -0400, :expression=>"yearly", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 days')
  #=> {:next=>2010-05-12 20:57:36 -0400, :expression=>"3 days", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 weeks')
  #=> {:next=>2010-05-30 20:57:36 -0400, :expression=>"3 weeks", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 months')
  #=> {:next=>2010-08-09 20:57:36 -0400, :expression=>"3 months", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 years')
  #=> {:next=>2013-05-09 20:57:36 -0400, :expression=>"3 years", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('other day')
  #=> {:next=>2010-05-11 20:57:36 -0400, :expression=>"other day", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('other week')
  #=> {:next=>2010-05-23 20:57:36 -0400, :expression=>"other week", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('other month')
  #=> {:next=>2010-07-09 20:57:36 -0400, :expression=>"other month", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('other year')
  #=> {:next=>2012-05-09 20:57:36 -0400, :expression=>"other year", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('monday')
  #=> {:next=>2010-05-10 12:00:00 -0400, :expression=>"monday", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('wednesday')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"wednesday", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('friday')
  #=> {:next=>2010-05-14 12:00:00 -0400, :expression=>"friday", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('February', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2021-02-01 12:00:00 -0500, :expression=>"february", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('May', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-01 12:00:00 -0400, :expression=>"may", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('june', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-06-01 12:00:00 -0400, :expression=>"june", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('beginning of the week')
  #=> {:next=>2010-05-16 12:00:00 -0400, :expression=>"beginning of the week", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('middle of the week')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"middle of the week", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('end of the week')
  #=> {:next=>2010-05-15 12:00:00 -0400, :expression=>"end of the week", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('beginning of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-01 00:00:00 -0400, :expression=>"beginning of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('middle of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-15 00:00:00 -0400, :expression=>"middle of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('end of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-30 00:00:00 -0400, :expression=>"end of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('beginning of the year', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2021-01-01 00:00:00 -0500, :expression=>"beginning of the year", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('middle of the year', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-06-15 00:00:00 -0400, :expression=>"middle of the year", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('end of the year', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-12-31 00:00:00 -0500, :expression=>"end of the year", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 3rd of May', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-03 00:00:00 -0400, :expression=>"the 3rd of may", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 3rd of February', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2021-02-03 00:00:00 -0500, :expression=>"the 3rd of february", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 3rd of February 2022', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2022-02-03 00:00:00 -0500, :expression=>"the 3rd of february 2022", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 3rd of Feb 2022', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2022-02-03 00:00:00 -0500, :expression=>"the 3rd of feb 2022", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 4th of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-04 00:00:00 -0400, :expression=>"the 4th of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 10th of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-10 00:00:00 -0400, :expression=>"the 10th of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the tenth of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-10 00:00:00 -0400, :expression=>"the tenth of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('first', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-01 00:00:00 -0400, :expression=>"first", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the first of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-01 00:00:00 -0400, :expression=>"the first of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the thirtieth', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-30 00:00:00 -0400, :expression=>"the thirtieth", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the fifth', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-05 00:00:00 -0400, :expression=>"the fifth", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 1st Wednesday of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-01 00:00:00 -0400, :expression=>"the 1st wednesday of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 3rd Sunday of May', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-05-17 12:00:00 -0400, :expression=>"the 3rd sunday of may", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 3rd Sunday of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-19 12:00:00 -0400, :expression=>"the 3rd sunday of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the 23rd of June', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-06-23 00:00:00 -0400, :expression=>"the 23rd of june", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the twenty third of June', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-06-23 00:00:00 -0400, :expression=>"the twenty third of june", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the thirty first of July', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-07-31 00:00:00 -0400, :expression=>"the thirty first of july", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the twenty first', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-21 00:00:00 -0400, :expression=>"the twenty first", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}

Tickle.parse('the twenty first of the month', {:start => Date.new(2020,4,1), :now => Date.new(2020,4,1)})
  #=> {:next=>2020-04-21 00:00:00 -0400, :expression=>"the twenty first of the month", :starting=>2020-04-01 00:00:00 -0400, :until=>nil}


Tickle.parse('starting today and ending one week from now')
  #=> {:next=>2010-05-10 22:30:00 -0400, :expression=>"day", :starting=>2010-05-09 22:30:00 -0400, :until=>2010-05-16 20:57:35 -0400}

Tickle.parse('starting tomorrow and ending one week from now')
  #=> {:next=>2010-05-10 12:00:00 -0400, :expression=>"day", :starting=>2010-05-10 12:00:00 -0400, :until=>2010-05-16 20:57:35 -0400}

Tickle.parse('starting Monday repeat every month')
  #=> {:next=>2010-05-10 12:00:00 -0400, :expression=>"month", :starting=>2010-05-10 12:00:00 -0400, :until=>nil}

Tickle.parse('starting May 13th repeat every week')
  #=> {:next=>2010-05-13 12:00:00 -0400, :expression=>"week", :starting=>2010-05-13 12:00:00 -0400, :until=>nil}

Tickle.parse('starting May 13th repeat every other day')
  #=> {:next=>2010-05-13 12:00:00 -0400, :expression=>"other day", :starting=>2010-05-13 12:00:00 -0400, :until=>nil}

Tickle.parse('every other day starts May 13th')
  #=> {:next=>2010-05-13 12:00:00 -0400, :expression=>"other day", :starting=>2010-05-13 12:00:00 -0400, :until=>nil}

Tickle.parse('every other day starts May 13')
  #=> {:next=>2010-05-13 12:00:00 -0400, :expression=>"other day", :starting=>2010-05-13 12:00:00 -0400, :until=>nil}

Tickle.parse('every other day starting May 13th')
  #=> {:next=>2010-05-13 12:00:00 -0400, :expression=>"other day", :starting=>2010-05-13 12:00:00 -0400, :until=>nil}

Tickle.parse('every other day starting May 13')
  #=> {:next=>2010-05-13 12:00:00 -0400, :expression=>"other day", :starting=>2010-05-13 12:00:00 -0400, :until=>nil}

Tickle.parse('every week starts this wednesday')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"week", :starting=>2010-05-12 12:00:00 -0400, :until=>nil}

Tickle.parse('every week starting this wednesday')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"week", :starting=>2010-05-12 12:00:00 -0400, :until=>nil}

Tickle.parse('every other day starting May 1st 2021')
  #=> {:next=>2021-05-01 12:00:00 -0400, :expression=>"other day", :starting=>2021-05-01 12:00:00 -0400, :until=>nil}

Tickle.parse('every other day starting May 1 2021')
  #=> {:next=>2021-05-01 12:00:00 -0400, :expression=>"other day", :starting=>2021-05-01 12:00:00 -0400, :until=>nil}

Tickle.parse('every other week starting this Sunday')
  #=> {:next=>2010-05-16 12:00:00 -0400, :expression=>"other week", :starting=>2010-05-16 12:00:00 -0400, :until=>nil}

Tickle.parse('every week starting this wednesday until May 13th')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"week", :starting=>2010-05-12 12:00:00 -0400, :until=>2010-05-13 12:00:00 -0400}

Tickle.parse('every week starting this wednesday ends May 13th')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"week", :starting=>2010-05-12 12:00:00 -0400, :until=>2010-05-13 12:00:00 -0400}

Tickle.parse('every week starting this wednesday ending May 13th')
  #=> {:next=>2010-05-12 12:00:00 -0400, :expression=>"week", :starting=>2010-05-12 12:00:00 -0400, :until=>2010-05-13 12:00:00 -0400}


Tickle.parse('May 1st 2020', {:next_only=>true})
  #=> 2020-05-01 00:00:00 -0400

Tickle.parse('3 days', {:start => Time.parse('2010-05-09 20:57:36 -0400')})
  #=> {:next=>2010-05-12 20:57:36 -0400, :expression=>"3 days", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 weeks', {:start => Time.parse('2010-05-09 20:57:36 -0400')})
  #=> {:next=>2010-05-30 20:57:36 -0400, :expression=>"3 weeks", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 months', {:start => Time.parse('2010-05-09 20:57:36 -0400')})
  #=> {:next=>2010-08-09 20:57:36 -0400, :expression=>"3 months", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 years', {:start => Time.parse('2010-05-09 20:57:36 -0400')})
  #=> {:next=>2013-05-09 20:57:36 -0400, :expression=>"3 years", :starting=>2010-05-09 20:57:36 -0400, :until=>nil}

Tickle.parse('3 days', {:start => Time.parse('2010-05-09 20:57:36 -0400'), :until => Time.parse('2010-10-09 00:00:00 -0400')})
  #=> {:next=>2010-05-12 20:57:36 -0400, :expression=>"3 days", :starting=>2010-05-09 20:57:36 -0400, :until=>2010-10-09 00:00:00 -0400}

Tickle.parse('3 weeks', {:start => Time.parse('2010-05-09 20:57:36 -0400'), :until => Time.parse('2010-10-09 00:00:00 -0400')})
  #=> {:next=>2010-05-30 20:57:36 -0400, :expression=>"3 weeks", :starting=>2010-05-09 20:57:36 -0400, :until=>2010-10-09 00:00:00 -0400}

Tickle.parse('3 months', {:until => Time.parse('2010-10-09 00:00:00 -0400')})
  #=> {:next=>2010-08-09 20:57:36 -0400, :expression=>"3 months", :starting=>2010-05-09 20:57:36 -0400, :until=>2010-10-09 00:00:00 -0400}


Tickle.parse('New Years Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2021-01-01 12:00:00 -0500, :expression=>"january 1, 2021", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Inauguration', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-01-20 12:00:00 -0500, :expression=>"january 20", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Martin Luther King Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-01-20 12:00:00 -0500, :expression=>"third monday in january", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('MLK', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-01-20 12:00:00 -0500, :expression=>"third monday in january", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Presidents Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-02-17 12:00:00 -0500, :expression=>"third monday in february", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Memorial Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-05-25 12:00:00 -0400, :expression=>"4th monday of may", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Independence Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-07-04 12:00:00 -0400, :expression=>"july 4, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Labor Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-09-07 12:00:00 -0400, :expression=>"first monday in september", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Columbus Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-10-12 12:00:00 -0400, :expression=>"second monday in october", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Veterans Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-11-11 12:00:00 -0500, :expression=>"november 11, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Christmas', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-12-25 12:00:00 -0500, :expression=>"december 25, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Super Bowl Sunday', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-02-02 12:00:00 -0500, :expression=>"first sunday in february", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Groundhog Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-02-02 12:00:00 -0500, :expression=>"february 2, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Valentines Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-02-14 12:00:00 -0500, :expression=>"february 14, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Saint Patricks day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-03-17 12:00:00 -0400, :expression=>"march 17, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('April Fools Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-04-01 12:00:00 -0400, :expression=>"april 1, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Earth Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-04-22 12:00:00 -0400, :expression=>"april 22, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Arbor Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-04-24 12:00:00 -0400, :expression=>"fourth friday in april", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Cinco De Mayo', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-05-05 12:00:00 -0400, :expression=>"may 5, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Mothers Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-05-10 12:00:00 -0400, :expression=>"second sunday in may", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Flag Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-06-14 12:00:00 -0400, :expression=>"june 14, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Fathers Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-06-21 12:00:00 -0400, :expression=>"third sunday in june", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Halloween', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-10-31 12:00:00 -0400, :expression=>"october 31, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Christmas Day', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-12-25 12:00:00 -0500, :expression=>"december 25, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Christmas Eve', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2020-12-24 12:00:00 -0500, :expression=>"december 24, 2020", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}

Tickle.parse('Kwanzaa', {:start => Date.new(2020,1,1), :now => Date.new(2020,1,1)})
  #=> {:next=>2021-01-01 12:00:00 -0500, :expression=>"january 1, 2021", :starting=>2020-01-01 00:00:00 -0500, :until=>nil}


To use in your app, we recommend adding two attributes to your database model:

  • next_occurrence
  • tickle_expression

Then call Tickle.parse("date expression goes here") when you need to and save the results accordingly. In your code, each day, simply check to see if today is >= next_occurrence and, if so, run your block.

After it completes, call next_occurrence = Tickle.parse(tickle_expression) again to update the next occurrence of the event.
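The daily check described above can be sketched like this. TickleStub is a hypothetical stand-in for Tickle.parse so the snippet runs without the gem; a real call returns a hash whose :next key holds the new occurrence:

```ruby
require 'time'

# Hypothetical stand-in for Tickle.parse: always "every 3 days".
# 86_400 seconds per day (a fixed offset; ignores DST shifts).
module TickleStub
  def self.parse(expression, now: Time.now)
    { :next => now + 3 * 86_400, :expression => expression }
  end
end

# The two recommended model attributes, as a plain hash here.
record = {
  :tickle_expression => 'every 3 days',
  :next_occurrence   => Time.parse('2020-04-01 00:00:00 +0000')
}

today = Time.parse('2020-04-03 00:00:00 +0000')

if today >= record[:next_occurrence]
  # ... run the scheduled block here ...
  # then re-parse to roll the event forward
  result = TickleStub.parse(record[:tickle_expression], now: today)
  record[:next_occurrence] = result[:next]
end

puts record[:next_occurrence].strftime('%Y-%m-%d')  # 2020-04-06
```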


Tickle can be installed via RubyGems:

gem install tickle

or if you're using Bundler, add this to the Gemfile:

gem "tickle"


Tickle depends on the Chronic gem:

gem install chronic

and, for running the test suite, thoughtbot's shoulda:

gem install shoulda

or just run bundle install.


Currently, Tickle only works with day-level intervals, but feel free to fork and add time-based interval support, or send me a note if you really want me to add it.


Fork it, create a new branch for your changes, and send in a pull request.

  • Only tested code gets in.
  • Document it.
  • If you want to work on something but aren't sure whether it'll get in, create a new branch and open a pull request before you've done any code. That will open an issue and we can discuss it.
  • Do not mess with the Rakefile, version, or history (if you want to have your own version, that is fine but do it on a separate branch from the one I'm going to merge.)


Tickle comes with a full test suite covering simple expressions, complex expressions, the options hash and invalid arguments.

You also have some command line options:

  • --v verbose output like the examples above
  • --d debug output showing the guts of a date expression


The original work on the library was done by Joshua Lippiner a.k.a. Noctivity.

HUGE shout-out both to the creator of Chronic, Tom Preston-Werner, and to Brian Brownling, who maintains a GitHub version at

As always, BIG shout-out to the RVM Master himself, Wayne Seguin, for putting up with me and Ruby from day one. Ask Wayne to make you some ciabatta bread next time you see him.

Author: yb66
Source Code: 
License: MIT license

#ruby #time #naturallanguageprocessing 

Anil Sakhiya


NLP Tutorial | Build a Chatbot with Python

NLP Tutorial | What is Natural Language Processing? | NLP Full Course

Natural Language Processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
In this tutorial, you will start from the absolute basics of NLP and then proceed toward semantic analysis, focusing on learning how to build a recommendation engine and a chatbot using the widely used programming language Python.
If this interests you, then fasten your seatbelt as you are going to dive deeper into the world of artificial intelligence.

#naturallanguageprocessing #nlp #computerscience #artificialintelligence #ai #python #chatbot

