1679558460
html5gum is a WHATWG-compliant HTML tokenizer.
use std::fmt::Write;
use html5gum::{Tokenizer, Token};
let html = "<title >hello world</title>";
let mut new_html = String::new();
for token in Tokenizer::new(html).infallible() {
match token {
Token::StartTag(tag) => {
write!(new_html, "<{}>", String::from_utf8_lossy(&tag.name)).unwrap();
}
Token::String(hello_world) => {
write!(new_html, "{}", String::from_utf8_lossy(&hello_world)).unwrap();
}
Token::EndTag(tag) => {
write!(new_html, "</{}>", String::from_utf8_lossy(&tag.name)).unwrap();
}
_ => panic!("unexpected input"),
}
}
assert_eq!(new_html, "<title>hello world</title>");
html5gum fully implements 13.2.5 of the WHATWG HTML spec, i.e. is able to tokenize HTML documents and passes html5lib's tokenizer test suite. Since it is just a tokenizer, this means:
- html5gum does not implement charset detection. This implementation takes and returns bytes, but assumes UTF-8. It recovers gracefully from invalid UTF-8.
- html5gum does not correct mis-nested tags.
- html5gum does not recognize implicitly self-closing elements like <img>; as a tokenizer it will simply emit a start token. It does however emit a self-closing tag for <img .. />.
- html5gum doesn't implement the DOM, and unfortunately in the HTML spec, constructing the DOM ("tree construction") influences how tokenization is done. For an example of which problems this causes, see this example code.
- html5gum does not generally qualify as a browser-grade HTML parser as per the WHATWG spec. This can change in the future, see issue 21.
With those caveats in mind, html5gum can pretty much tokenize anything that browsers can.
Emitter trait
A distinguishing feature of html5gum is that you can bring your own token data structure and hook into token creation by implementing the Emitter trait. This allows you to:
- Rewrite all per-HTML-tag allocations to use a custom allocator or data structure.
- Efficiently filter out uninteresting categories of data without ever allocating for them. For example, if any plaintext between tokens is not of interest to you, you can implement the respective trait methods as no-ops and therefore avoid any overhead from creating plaintext tokens.
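Even without a custom Emitter you can do this kind of filtering at the token level; here is a minimal sketch that uses only the iterator API shown in the example above (implementing Emitter pushes the same filtering further down, so the unwanted tokens are never allocated at all):
use html5gum::{Tokenizer, Token};

// keep only the text content, ignoring all start and end tags
let html = "<p>tag <b>soup</b> is fine</p>";
let mut text = String::new();
for token in Tokenizer::new(html).infallible() {
    if let Token::String(s) = token {
        text.push_str(&String::from_utf8_lossy(&s));
    }
}
assert_eq!(text, "tag soup is fine");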
Dependencies such as jetscii can be disabled via crate features (see Cargo.toml).
html5gum was created out of a need to parse HTML tag soup efficiently. Previous options were to:
- use quick-xml or xmlparser with some hacks to make either one not choke on bad HTML. For some (rather large) set of HTML input this works well (particularly quick-xml can be configured to be very lenient about parsing errors) and parsing speed is stellar. But neither can parse all HTML. For my own usecase html5gum is about 2x slower than quick-xml.
- use html5ever's own tokenizer to avoid as much tree-building overhead as possible. This was functional but had poor performance for my own usecase (10-15x slower than quick-xml).
- use lol-html, which would probably perform at least as well as html5gum, but comes with a closure-based API that I didn't manage to get working for my usecase.
Why is this library called html5gum?
G.U.M: Giant Unreadable Match-statement
<insert "how it feels to chew 5 gum parse HTML" meme here>
Author: Untitaker
Source Code: https://github.com/untitaker/html5gum
License: MIT license
1676661480
Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Foundation includes the String method components(separatedBy:) that allows us to get substrings divided up by certain characters:
let sentence = "hello 2017 year"
let words = sentence.components(separatedBy: .whitespaces)
// words.count -> 3
// words = ["hello", "2017", "year"]
Mustard provides a similar feature, but with the opposite approach, where instead of matching by separators you can match by one or more character sets, which is useful if separators simply don't exist:
import Mustard
let sentence = "hello2017year"
let words = sentence.components(matchedWith: .letters, .decimalDigits)
// words.count -> 3
// words = ["hello", "2017", "year"]
If you want more than just the substrings, you can use the tokens(matchedWith: CharacterSet...) method which will return an array of TokenType.
As a minimum, TokenType requires properties for text (the substring matched) and range (the range of the substring in the original string). When using CharacterSets as tokenizers, the more specific type CharacterSetToken is returned, which includes the property set, containing the instance of CharacterSet that was used to create the match.
import Mustard
let tokens = "123Hello world&^45.67".tokens(matchedWith: .decimalDigits, .letters)
// tokens: [CharacterSet.Token]
// tokens.count -> 5 (characters '&', '^', and '.' are ignored)
//
// second token..
// tokens[1].text -> "Hello"
// tokens[1].range -> Range<String.Index>(3..<8)
// tokens[1].set -> CharacterSet.letters
//
// last token..
// tokens[4].text -> "67"
// tokens[4].range -> Range<String.Index>(19..<21)
// tokens[4].set -> CharacterSet.decimalDigits
Mustard can do more than match from character sets. You can create your own tokenizers with more sophisticated matching behavior by implementing the TokenizerType and TokenType protocols.
Here's an example of using DateTokenizer (see example for implementation) that finds substrings that match a MM/dd/yy format.
DateTokenizer returns tokens with the type DateToken. Along with the substring text and range, DateToken includes a Date object corresponding to the date in the substring:
import Mustard
let text = "Serial: #YF 1942-b 12/01/17 (Scanned) 12/03/17 (Arrived) ref: 99/99/99"
let tokens = text.tokens(matchedWith: DateTokenizer())
// tokens: [DateTokenizer.Token]
// tokens.count -> 2
// ('99/99/99' is *not* matched by `DateTokenizer` because it's not a valid date)
//
// first date
// tokens[0].text -> "12/01/17"
// tokens[0].date -> Date(2017-12-01 05:00:00 +0000)
//
// last date
// tokens[1].text -> "12/03/17"
// tokens[1].date -> Date(2017-12-03 05:00:00 +0000)
Feedback and contributions, whether bug fixes or improvements, are welcome. Feel free to submit a pull request or open an issue.
Author: Mathewsanders
Source Code: https://github.com/mathewsanders/Mustard
License: MIT license
1665833528
A chatbot is AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with people in a human-like way. A chatbot is arguably one of the best applications of natural language processing.
Over the past few years, chatbots in Python have become hugely popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human language and conversing with humans that companies across many industries are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to create a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in JSON format, so we use the json package to read the JSON file into Python.
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # a single SGD import is enough; the tf.keras one is the one actually used
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
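The tutorial does not ship a specific intents.json, so treat the following as a hypothetical, minimal example of the shape the code in this tutorial assumes: a top-level "intents" list whose entries each carry a tag, some patterns (used for training), and some responses (used when replying).
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "How are you?"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Talk to you soon."]}
  ]
}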
Before we can build a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the requirements, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenization is the process of breaking a text into small pieces, such as words.
Here, we go through the patterns, use the nltk.word_tokenize() function to split each sentence into words, and add each word to the words list. We also build a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Now we will work out what each word means and get rid of any words that are already in the list. Lemmatization is the process of reducing a word to its lemma form; we then create a pickle file to store the Python objects we will use when predicting.
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Now we will create the training data, which includes both the inputs and the outputs. The pattern will be our input, and the class that pattern belongs to will be our output. But the computer can't read words, so we will turn the words into numbers.
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Now that our training data is ready, we will build a 3-layer deep neural network. We do this with the Keras sequential API. After training the model for 200 epochs, it was 100% accurate on the training data. Let's name the file chatbot_model.h5 and save it.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')  # hist is the training history returned by fit(); it is not an argument to save()
print("model created")
To predict sentences and get a response from the user, let's create a new file named chatapp.py.
We will load the trained model and then use a graphical user interface to predict the bot's response. The model will only tell us which class the input belongs to, so we will write some functions that figure out the class and then pick a random response from that class's list of responses.
Again, we load the 'words.pkl' and 'classes.pkl' pickle files that we created when we trained our model:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
To predict the class, we have to provide input in the same way we did during training. So we will write some functions that preprocess the text and then guess the class.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
After predicting the class, we get a random response from the list of intents.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
Now we will create a graphical user interface (GUI). Let's use the Tkinter library, which comes with many useful GUI widgets.
We will take the user's message and use the helper functions we created to get the bot's answer and display it in the GUI. Here is the complete source code for the GUI.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
To run the chatbot, we have two main files: train_chatbot.py and chatapp.py.
First, we train the model using this command in the terminal:
python train_chatbot.py
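Once training finishes and chatbot_model.h5 has been saved, start the GUI by running the second file (this assumes you saved the GUI code above as chatapp.py, as described earlier):
python chatapp.py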
1665829886
A chatbot is AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with people in a human-like way. A chatbot is arguably one of the best applications of natural language processing.
Over the past few years, chatbots in Python have become hugely popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human language and conversing with humans that companies across many industries are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to create a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in JSON format, so we use the json package to read the JSON file into Python.
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # a single SGD import is enough; the tf.keras one is the one actually used
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
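The tutorial does not ship a specific intents.json, so treat the following as a hypothetical, minimal example of the shape the code in this tutorial assumes: a top-level "intents" list whose entries each carry a tag, some patterns (used for training), and some responses (used when replying).
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "How are you?"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Talk to you soon."]}
  ]
}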
Before we can build a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the requirements, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenization is the process of breaking a text into small pieces, such as words.
Here, we go through the patterns, use the nltk.word_tokenize() function to split each sentence into words, and add each word to the words list. We also build a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Now we will work out what each word means and get rid of any words that are already in the list. Lemmatization is the process of reducing a word to its lemma form; we then create a pickle file to store the Python objects we will use when predicting.
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Now we will create the training data, which includes both the inputs and the outputs. The pattern will be our input, and the class that pattern belongs to will be our output. But the computer can't read words, so we will turn the words into numbers.
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Now that our training data is ready, we will build a 3-layer deep neural network. We do this with the Keras sequential API. After training the model for 200 epochs, it was 100% accurate on the training data. Let's name the file chatbot_model.h5 and save it.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')  # hist is the training history returned by fit(); it is not an argument to save()
print("model created")
To predict sentences and get a response from the user, let's create a new file named chatapp.py.
We will load the trained model and then use a graphical user interface to predict the bot's response. The model will only tell us which class the input belongs to, so we will write some functions that figure out the class and then pick a random response from that class's list of responses.
Again, we load the 'words.pkl' and 'classes.pkl' pickle files that we created when we trained our model:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
To predict the class, we have to provide input in the same way we did during training. So we will write some functions that preprocess the text and then guess the class.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
After predicting the class, we get a random response from the list of intents.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
Now we will create a graphical user interface (GUI). Let's use the Tkinter library, which comes with many useful GUI widgets.
We will take the user's message and use the helper functions we created to get the bot's answer and display it in the GUI. Here is the complete source code for the GUI.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
To run the chatbot, we have two main files: train_chatbot.py and chatapp.py.
First, we train the model using this command in the terminal:
python train_chatbot.py
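Once training finishes and chatbot_model.h5 has been saved, start the GUI by running the second file (this assumes you saved the GUI code above as chatapp.py, as described earlier):
python chatapp.py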
1665826260
A chatbot is AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with people in a human-like way. A chatbot is arguably one of the best applications of natural language processing.
Over the past few years, chatbots in Python have become hugely popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human language and conversing with humans that companies across many industries are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to create a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in JSON format, so we use the json package to read the JSON file into Python.
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # a single SGD import is enough; the tf.keras one is the one actually used
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
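The tutorial does not ship a specific intents.json, so treat the following as a hypothetical, minimal example of the shape the code in this tutorial assumes: a top-level "intents" list whose entries each carry a tag, some patterns (used for training), and some responses (used when replying).
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "How are you?"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Talk to you soon."]}
  ]
}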
Before we can build a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the requirements, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenization is the process of breaking a text into small pieces, such as words.
Here, we go through the patterns, use the nltk.word_tokenize() function to split each sentence into words, and add each word to the words list. We also build a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Now we will work out what each word means and get rid of any words that are already in the list. Lemmatization is the process of reducing a word to its lemma form; we then create a pickle file to store the Python objects we will use when predicting.
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Now we will create the training data, which includes both the inputs and the outputs. The pattern will be our input, and the class that pattern belongs to will be our output. But the computer can't read words, so we will turn the words into numbers.
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Now that our training data is ready, we will build a 3-layer deep neural network. We do this with the Keras sequential API. After training the model for 200 epochs, it was 100% accurate on the training data. Let's name the file chatbot_model.h5 and save it.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')  # hist is the training history returned by fit(); it is not an argument to save()
print("model created")
To predict sentences and get a response from the user, let's create a new file named chatapp.py.
We will load the trained model and then use a graphical user interface to predict the bot's response. The model will only tell us which class the input belongs to, so we will write some functions that figure out the class and then pick a random response from that class's list of responses.
Again, we load the 'words.pkl' and 'classes.pkl' pickle files that we created when we trained our model:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
To predict the class, we have to provide input in the same way we did during training. So we will write some functions that preprocess the text and then guess the class.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
After predicting the class, we get a random response from the list of intents.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
Now we will create a graphical user interface (GUI). Let's use the Tkinter library, which comes with many useful GUI widgets.
We will take the user's message and use the helper functions we created to get the bot's answer and display it in the GUI. Here is the complete source code for the GUI.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
To run the chatbot, we have two main files: train_chatbot.py and chatapp.py.
First, we train the model using this command in the terminal:
python train_chatbot.py
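Once training finishes and chatbot_model.h5 has been saved, start the GUI by running the second file (this assumes you saved the GUI code above as chatapp.py, as described earlier):
python chatapp.py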
1665822616
A chatbot is AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with people in a human-like way. A chatbot is arguably one of the best applications of natural language processing.
Over the past few years, chatbots in Python have become hugely popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human language and conversing with humans that companies across many industries are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to create a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in JSON format, so we use the json package to read the JSON file into Python.
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # a single SGD import is enough; the tf.keras one is the one actually used
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
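The tutorial does not ship a specific intents.json, so treat the following as a hypothetical, minimal example of the shape the code in this tutorial assumes: a top-level "intents" list whose entries each carry a tag, some patterns (used for training), and some responses (used when replying).
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "How are you?"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Talk to you soon."]}
  ]
}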
Before we can build a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the requirements, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenization is the process of breaking a text into small pieces, such as words.
Here, we go through the patterns, use the nltk.word_tokenize() function to split each sentence into words, and add each word to the words list. We also build a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Now we will work out what each word means and get rid of any words that are already in the list. Lemmatization is the process of reducing a word to its lemma form; we then create a pickle file to store the Python objects we will use when predicting.
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Now we will create the training data, which includes both the inputs and the outputs. The pattern will be our input, and the class that pattern belongs to will be our output. But the computer can't read words, so we will turn the words into numbers.
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Now that our training data is ready, we will build a 3-layer deep neural network. We do this with the Keras sequential API. After training the model for 200 epochs, it was 100% accurate on the training data. Let's name the file chatbot_model.h5 and save it.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')  # hist is the training history returned by fit(); it is not an argument to save()
print("model created")
To predict sentences and get a response from the user, let's create a new file named chatapp.py.
We will load the trained model and then use a graphical user interface to predict the bot's response. The model will only tell us which class the input belongs to, so we will write some functions that figure out the class and then pick a random response from that class's list of responses.
Again, we load the 'words.pkl' and 'classes.pkl' pickle files that we created when we trained our model:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
To predict the class, we have to provide input in the same way we did during training. So we will write some functions that preprocess the text and then guess the class.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
After predicting the class, we get a random response from the list of intents.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
Now we will create a graphical user interface (GUI). Let's use the Tkinter library, which comes with many useful GUI widgets.
We will take the user's message and use the helper functions we created to get the bot's answer and display it in the GUI. Here is the complete source code for the GUI.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
To run the chatbot, we have two main files: train_chatbot.py and chatapp.py.
First, we train the model using this command in the terminal:
python train_chatbot.py
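Once training finishes and chatbot_model.h5 has been saved, start the GUI by running the second file (this assumes you saved the GUI code above as chatapp.py, as described earlier):
python chatapp.py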
1665815400
A chatbot is AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with people in a human-like way. A chatbot is arguably one of the best applications of natural language processing.
Over the past few years, chatbots in Python have become hugely popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human language and conversing with humans that companies across many industries are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to create a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in JSON format, so we use the json package to read the JSON file into Python.
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # a single SGD import is enough; the tf.keras one is the one actually used
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
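The tutorial does not ship a specific intents.json, so treat the following as a hypothetical, minimal example of the shape the code in this tutorial assumes: a top-level "intents" list whose entries each carry a tag, some patterns (used for training), and some responses (used when replying).
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "How are you?"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Talk to you soon."]}
  ]
}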
Before we can build a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the requirements, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenization is the process of breaking a text into small pieces, such as words.
Here, we go through the patterns, use the nltk.word_tokenize() function to split each sentence into words, and add each word to the words list. We also build a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Now we will work out what each word means and get rid of any words that are already in the list. Lemmatization is the process of reducing a word to its lemma form; we then create a pickle file to store the Python objects we will use when predicting.
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Now we will create the training data, which includes both the inputs and the outputs. The pattern will be our input, and the class that pattern belongs to will be our output. But the computer can't read words, so we will turn the words into numbers.
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Now that our training data is ready, we will build a 3-layer deep neural network. We do this with the Keras sequential API. After training the model for 200 epochs, it was 100% accurate on the training data. Let's name the file chatbot_model.h5 and save it.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')  # hist is the training history returned by fit(); it is not an argument to save()
print("model created")
To predict sentences and get a response from the user, let's create a new file named chatapp.py.
We will load the trained model and then use a graphical user interface to predict the bot's response. The model will only tell us which class the input belongs to, so we will write some functions that figure out the class and then pick a random response from that class's list of responses.
Again, we load the 'words.pkl' and 'classes.pkl' pickle files that we created when we trained our model:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
To predict the class, we have to provide input in the same way we did during training. So we will write some functions that preprocess the text and then guess the class.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
After predicting the class, we get a random response from the list of intents.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
Now we will create a graphical user interface (GUI). Let's use the Tkinter library, which comes with many useful GUI widgets.
We will take the user's message and use the helper functions we created to get the bot's answer and display it in the GUI. Here is the complete source code for the GUI.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
To run the chatbot, we have two main files: train_chatbot.py and chatapp.py.
First, we train the model using this command in the terminal:
python train_chatbot.py
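Once training finishes and chatbot_model.h5 has been saved, start the GUI by running the second file (this assumes you saved the GUI code above as chatapp.py, as described earlier):
python chatapp.py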
1665811740
A chatbot is AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with people in a human-like way. A chatbot is arguably one of the best applications of natural language processing.
Over the past few years, chatbots in Python have become hugely popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human language and conversing with humans that companies across many industries are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to create a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in JSON format, so we use the json package to read the JSON file into Python.
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # a single SGD import is enough; the tf.keras one is the one actually used
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
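The tutorial does not ship a specific intents.json, so treat the following as a hypothetical, minimal example of the shape the code in this tutorial assumes: a top-level "intents" list whose entries each carry a tag, some patterns (used for training), and some responses (used when replying).
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "How are you?"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Talk to you soon."]}
  ]
}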
Before we can build a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the requirements, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenization is the process of breaking a text into small pieces, such as words.
Here, we go through the patterns, use the nltk.word_tokenize() function to split each sentence into words, and add each word to the words list. We also build a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Jetzt werden wir herausfinden, was jedes Wort bedeutet, und alle Wörter entfernen, die bereits auf der Liste stehen. Lemmatisierung ist der Prozess, ein Wort in seine Lemmaform umzuwandeln und dann eine Pickle-Datei zu erstellen, um die Python-Objekte zu speichern, die wir bei der Vorhersage verwenden werden.
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Jetzt erstellen wir die Trainingsdaten, die sowohl die Eingaben als auch die Ausgaben enthalten. Das Muster ist unsere Eingabe, und die Klasse, zu der das Muster gehört, ist unsere Ausgabe. Aber der Computer kann keine Wörter lesen, also verwandeln wir die Wörter in Zahlen.
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Nachdem unsere Trainingsdaten nun fertig sind, werden wir ein dreischichtiges tiefes neuronales Netzwerk aufbauen. Wir tun dies mit der Kerassequentiellen API. Nach dem Training des Modells für 200 Iterationen war es zu 100 % genau. Nennen wir die Datei „ chatbot model.h5“ und speichern sie.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5', hist)
print("model created")
Um die Sätze vorherzusagen und eine Antwort vom Benutzer zu erhalten, damit wir eine neue Datei mit dem Namen „ chatapp.py“ erstellen können.
Wir laden das trainierte Modell und verwenden dann eine grafische Benutzeroberfläche, um die Antwort des Bots vorherzusagen. Das Modell teilt uns nur mit, zu welcher Klasse es gehört, also erstellen wir einige Funktionen, die die Klasse herausfinden und dann eine zufällige Antwort aus der Liste der Antworten auswählen.
Wieder laden wir die ' words.pkl'- und ' classes.pkl'-Pickle-Dateien, die wir beim Trainieren unseres Modells erstellt haben:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
Um die Klasse vorherzusagen, müssen wir genauso Eingaben machen wie während des Trainings. Also werden wir einige Funktionen erstellen, die den Text vorverarbeiten und dann die Klasse erraten.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
Nachdem wir die Klasse vorhergesagt haben, erhalten wir eine zufällige Antwort aus der Liste der Absichten.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
Jetzt erstellen wir eine grafische Benutzeroberfläche (GUI). Verwenden wir die Tkinter-Bibliothek, die mit vielen anderen nützlichen GUI-Bibliotheken geliefert wird.
Wir nehmen die Nachricht des Benutzers und verwenden die von uns erstellten Hilfsfunktionen, um die Antwort vom Bot zu erhalten und auf der GUI anzuzeigen. Hier ist der vollständige Quellcode der GUI.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
Um den Chatbot auszuführen, haben wir zwei Hauptdateien; train_chatbot.py und chatapp.py.
Zuerst trainieren wir das Modell mit dem Befehl im Terminal:
python train_chatbot.py
1665799396
A chatbot is an AI-based software designed to interact with humans in their natural languages. These chatbots usually converse via auditory or textual methods, and they can effortlessly mimic human language to communicate with human beings in a human-like manner. A chatbot is arguably one of the best applications of natural language processing.
In the past few years, chatbots in Python have become wildly popular in the tech and business sectors. These intelligent bots are so adept at imitating natural human languages and conversing with humans that companies across various industrial sectors are adopting them. From e-commerce firms to healthcare institutions, everyone seems to be leveraging this nifty tool to drive business benefits. In this article, we will learn about chatbots using Python and how to make a chatbot in Python.
To create a chatbot in Python from scratch, we follow these steps.
First, you need to make a file called train_chatbot.py. We bring in the packages our chatbot needs and set up the variables we will use in our Python project.
The data file is in the JSON format, so we used the json package to read the JSON file into Python.
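The tutorial does not reproduce the intents.json file itself, so here is a minimal sketch of the structure the code below expects; the "greeting" and "goodbye" intents, their patterns, and their responses are placeholders of our own, not the tutorial's data. Only the "intents", "tag", "patterns", and "responses" keys are taken from the code that follows.
import json
# Hypothetical minimal intents.json; the values are made up for illustration.
sample_intents = {
    "intents": [
        {"tag": "greeting",
         "patterns": ["Hi", "Hello", "How are you?"],
         "responses": ["Hello!", "Hi there, how can I help?"]},
        {"tag": "goodbye",
         "patterns": ["Bye", "See you later"],
         "responses": ["Goodbye!", "Talk to you soon."]}
    ]
}
# Write the sample file so the training script has something to read.
with open('intents.json', 'w') as f:
    json.dump(sample_intents, f, indent=2)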
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD
import random
words=[]
classes = []
documents = []
ignore_words = ['?', '!']
data_file = open('intents.json').read()
intents = json.loads(data_file)
Before we can make a machine learning or deep learning model from text data, we have to process the data in different ways. Depending on the needs, we have to use different operations to preprocess the data.
Tokenizing text data is the first and most basic thing you can do with it. Tokenizing is the process of breaking a text into small pieces, like words.
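As a quick illustration (the sample sentence is our own, not from the dataset), nltk.word_tokenize splits a sentence into word and punctuation tokens:
import nltk
nltk.download('punkt', quiet=True)  # Punkt models used by word_tokenize (newer NLTK versions may also need 'punkt_tab')
print(nltk.word_tokenize("Hi there, how are you?"))
# ['Hi', 'there', ',', 'how', 'are', 'you', '?']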
Here, we go through the patterns, use the nltk.word_tokenize() function to break the sentence into words, and add each word to the words list. We also make a list of the classes our tags belong to.
for intent in intents['intents']:
for pattern in intent['patterns']:
#tokenize each word
w = nltk.word_tokenize(pattern)
words.extend(w)
#add documents in the corpus
documents.append((w, intent['tag']))
# add to our classes list
if intent['tag'] not in classes:
classes.append(intent['tag'])
Now we lemmatize each word, convert it to lowercase, and remove duplicates and the ignored characters. Lemmatizing is the process of changing a word into its lemma (base) form. We then make pickle files to store the Python objects we will use when predicting.
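For example (our own words, not the tutorial's vocabulary), WordNetLemmatizer maps inflected forms back to a base form:
import nltk
nltk.download('wordnet', quiet=True)  # corpus required by WordNetLemmatizer
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("dogs"))              # dog (default part of speech is noun)
print(lemmatizer.lemmatize("running", pos="v"))  # run
print(lemmatizer.lemmatize("better", pos="a"))   # good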
# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))
Now, we’ll make the training data, which will include both the inputs and outputs. The pattern will be our input, and the class that pattern belongs to will be our output. But the computer can’t read words, so we’ll turn the words into numbers.
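To make the encoding concrete, here is a tiny bag-of-words example with a made-up vocabulary (not the tutorial's actual word list): each position of the vector corresponds to one vocabulary word and is 1 if that word occurs in the pattern.
# Hypothetical vocabulary and pattern, for illustration only
vocabulary = ["are", "hi", "how", "there", "you"]
pattern_words = ["how", "are", "you"]
bag = [1 if w in pattern_words else 0 for w in vocabulary]
print(bag)  # [1, 0, 1, 0, 1]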
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []
# list of tokenized words for the pattern
pattern_words = doc[0]
# lemmatize each word - create base word, in attempt to represent related words
pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
# create our bag of words array with 1, if word match found in current pattern
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
# output is a '0' for each tag and '1' for current tag (for each pattern)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training, dtype=object)  # dtype=object because bag and output_row have different lengths
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
Now that our training data is ready, we will build a 3-layer deep neural network. We do this with the Keras sequential API. After training the model for 200 epochs, it reaches close to 100% accuracy on the training data. Let's name the file "chatbot_model.h5" and save it.
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)  # 'lr' is deprecated in favour of 'learning_rate'; very recent Keras releases may also reject 'decay'
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')  # save the trained model; the training history does not need to be passed to save()
print("model created")
To predict a response to the user's sentences, let's create a new file named "chatapp.py".
We will load the trained model and then use a graphical user interface to predict the bot’s response. The model will only tell us what class it belongs to, so we will make some functions that will figure out the class and then pick a random response from the list of responses.
Again, we load the 'words.pkl' and 'classes.pkl' pickle files that we made when we trained our model:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np
from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))
To predict the class, we will have to give input the same way we did during training. So, we’ll make some functions that will do preprocessing on the text and then guess the class.
def clean_up_sentence(sentence):
# tokenize the pattern - split words into array
sentence_words = nltk.word_tokenize(sentence)
# stem each word - create short form for word
sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=True):
# tokenize the pattern
sentence_words = clean_up_sentence(sentence)
# bag of words - matrix of N words, vocabulary matrix
bag = [0]*len(words)
for s in sentence_words:
for i,w in enumerate(words):
if w == s:
# assign 1 if current word is in the vocabulary position
bag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(bag))
def predict_class(sentence, model):
# filter out predictions below a threshold
p = bow(sentence, words,show_details=False)
res = model.predict(np.array([p]))[0]
ERROR_THRESHOLD = 0.25
results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
# sort by strength of probability
results.sort(key=lambda x: x[1], reverse=True)
return_list = []
for r in results:
return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
return return_list
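For a trained model, predict_class returns the matching intents sorted by probability, highest first; for a greeting message it might return something like [{'intent': 'greeting', 'probability': '0.98'}] (the tag and value here are illustrative, not output from the tutorial's data).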
After predicting the class, we’ll get a random response from the list of intents.
def getResponse(ints, intents_json):
tag = ints[0]['intent']
list_of_intents = intents_json['intents']
for i in list_of_intents:
if(i['tag']== tag):
result = random.choice(i['responses'])
break
return result
def chatbot_response(text):
ints = predict_class(text, model)
res = getResponse(ints, intents)
return res
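Before wiring up the GUI, you can sanity-check the pipeline from a Python shell; the message below is just an example and assumes the functions and model defined above have been loaded:
# Quick console test of the helper functions above
print(chatbot_response("Hello"))  # prints one of the responses for the predicted intent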
Now, we'll make a graphical user interface (GUI). Let's use the Tkinter library, which ships with Python and provides the widgets we need for a simple chat window.
We’ll take the user’s message and use the helper functions we’ve made to get the answer from the bot and show it on the GUI. Here is the GUI’s full source code.
#Creating GUI with tkinter
import tkinter
from tkinter import *
def send():
msg = EntryBox.get("1.0",'end-1c').strip()
EntryBox.delete("0.0",END)
if msg != '':
ChatLog.config(state=NORMAL)
ChatLog.insert(END, "You: " + msg + '\n\n')
ChatLog.config(foreground="#442265", font=("Verdana", 12 ))
res = chatbot_response(msg)
ChatLog.insert(END, "Bot: " + res + '\n\n')
ChatLog.config(state=DISABLED)
ChatLog.yview(END)
base = Tk()
base.title("Hello")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)
#Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatLog.config(state=DISABLED)
#Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set
#Create Button to send message
SendButton = Button(base, font=("Verdana",12,'bold'), text="Send", width="12", height=5,
bd=0, bg="#32de97", activebackground="#3c9d9b",fg='#ffffff',
command= send )
#Create the box to enter message
EntryBox = Text(base, bd=0, bg="white",width="29", height="5", font="Arial")
#EntryBox.bind("<Return>", send)
#Place all components on the screen
scrollbar.place(x=376,y=6, height=386)
ChatLog.place(x=6,y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)
base.mainloop()
To run the chatbot, we have two main files: train_chatbot.py and chatapp.py.
First, we train the model using the command in the terminal:
python train_chatbot.py
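Once training finishes and chatbot_model.h5 has been saved, launch the chat window by running the second script:
python chatapp.py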
1665757080
A small library for converting tokenized PHP source code into XML.
You can add this library as a local, per-project dependency to your project using Composer:
composer require theseer/tokenizer
If you only need this library during development, for instance to run your project's test suite, then you should add it as a development-time dependency:
composer require --dev theseer/tokenizer
$tokenizer = new TheSeer\Tokenizer\Tokenizer();
$tokens = $tokenizer->parse(file_get_contents(__DIR__ . '/src/XMLSerializer.php'));
$serializer = new TheSeer\Tokenizer\XMLSerializer();
$xml = $serializer->toXML($tokens);
echo $xml;
The generated XML structure looks something like this:
<?xml version="1.0"?>
<source xmlns="https://github.com/theseer/tokenizer">
<line no="1">
<token name="T_OPEN_TAG"><?php </token>
<token name="T_DECLARE">declare</token>
<token name="T_OPEN_BRACKET">(</token>
<token name="T_STRING">strict_types</token>
<token name="T_WHITESPACE"> </token>
<token name="T_EQUAL">=</token>
<token name="T_WHITESPACE"> </token>
<token name="T_LNUMBER">1</token>
<token name="T_CLOSE_BRACKET">)</token>
<token name="T_SEMICOLON">;</token>
</line>
</source>
Author: Theseer
Source Code: https://github.com/theseer/tokenizer
License: View license
1660886280
Tokenize is a Julia package that serves a similar purpose and exposes a similar API to Python's tokenize module, but for Julia: it takes a string or buffer containing Julia code, performs lexical analysis, and returns a stream of tokens.
The goals of this package are to be
The function tokenize is the main entrypoint for generating Tokens. It takes a string or a buffer and creates an iterator that will sequentially return the next Token until the end of the string or buffer. The argument to tokenize can either be a String, an IOBuffer, or an IOStream.
julia> collect(tokenize("function f(x) end"))
1,1-1,8 KEYWORD "function"
1,9-1,9 WHITESPACE " "
1,10-1,10 IDENTIFIER "f"
1,11-1,11 LPAREN "("
1,12-1,12 IDENTIFIER "x"
1,13-1,13 RPAREN ")"
1,14-1,14 WHITESPACE " "
1,15-1,17 KEYWORD "end"
1,18-1,17 ENDMARKER ""
Tokens
Each Token is represented by where it starts and ends, what string it contains and what type it is.
The API for a Token (non-exported from the Tokenize.Tokens module) is:
startpos(t)::Tuple{Int, Int} # row and column where the token start
endpos(t)::Tuple{Int, Int} # row and column where the token ends
startbyte(T)::Int # byte offset where the token start
endbyte(t)::Int # byte offset where the token ends
untokenize(t)::String # string representation of the token
kind(t)::Token.Kind # kind of the token
exactkind(t)::Token.Kind # exact kind of the token
The difference between kind and exactkind is that kind returns OP for all operators and KEYWORD for all keywords, while exactkind returns a unique kind for each different operator and keyword, e.g.:
julia> tok = collect(tokenize("⇒"))[1];
julia> Tokens.kind(tok)
OP::Tokenize.Tokens.Kind = 90
julia> Tokens.exactkind(tok)
RIGHTWARDS_DOUBLE_ARROW::Tokenize.Tokens.Kind = 128
All the different Token.Kind values can be seen in the token_kinds.jl file.
Author: JuliaLang
Source Code: https://github.com/JuliaLang/Tokenize.jl/
License: View license
1659368242
Pragmatic Tokenizer is a multilingual tokenizer to split a string into tokens.
Ruby
gem install pragmatic_tokenizer
Ruby on Rails
Add this line to your application’s Gemfile:
gem 'pragmatic_tokenizer'
Example Usage
text = "\"I said, 'what're you? Crazy?'\" said Sandowsky. \"I can't afford to do that.\""
PragmaticTokenizer::Tokenizer.new.tokenize(text)
# => ["\"", "i", "said", ",", "'", "what're", "you", "?", "crazy", "?", "'", "\"", "said", "sandowsky", ".", "\"", "i", "can't", "afford", "to", "do", "that", ".", "\""]
# You can pass many different options to #initialize:
options = {
language: :en, # the language of the string you are tokenizing
abbreviations: ['a.b', 'a'], # a user-supplied array of abbreviations (downcased with ending period removed)
stop_words: ['is', 'the'], # a user-supplied array of stop words (downcased)
remove_stop_words: true, # remove stop words
contractions: { "i'm" => "i am" }, # a user-supplied hash of contractions (key is the contracted form; value is the expanded form - both the key and value should be downcased)
expand_contractions: true, # (i.e. ["isn't"] will change to two tokens ["is", "not"])
filter_languages: [:en, :de], # process abbreviations, contractions and stop words for this array of languages
punctuation: :none, # see below for more details
numbers: :none, # see below for more details
remove_emoji: :true, # remove any emoji tokens
remove_urls: :true, # remove any urls
remove_emails: :true, # remove any emails
remove_domains: :true, # remove any domains
hashtags: :keep_and_clean, # remove the hashtag prefix
mentions: :keep_and_clean, # remove the @ prefix
clean: true, # remove some special characters
classic_filter: true, # removes dots from acronyms and 's from the end of tokens
downcase: false, # do not downcase tokens
minimum_length: 3, # remove any tokens less than 3 characters
long_word_split: 10 # split tokens longer than 10 characters at hyphens or underscores
}
Options
language: default = 'en'; pass the language as a symbol (i.e. :en) or string (i.e. 'en')
abbreviations: default = nil
stop_words: default = nil
contractions: default = nil
remove_stop_words: default = false; true or false
expand_contractions: default = false; true or false
filter_languages: default = nil; an array of languages (i.e. [:en, :de] or ['en', 'de'])
punctuation: default = 'all'; one of :all, :semi, :none, :only
numbers: default = 'all'; one of :all, :semi, :none, :only
remove_emoji: default = false; true or false
remove_urls: default = false; true or false
remove_emails: default = false; true or false
remove_domains: default = false; true or false
clean: default = false; true or false
hashtags: default = :keep_original; one of :keep_original, :keep_and_clean, :remove
mentions: default = :keep_original; one of :keep_original, :keep_and_clean, :remove
classic_filter: default = false; true or false
downcase: default = true
minimum_length: default = 0; the minimum number of characters a token should be
long_word_split: default = nil; the number of characters after which a token should be split at hyphens or underscores
The following lists the current level of support for different languages. Pull requests or help for any languages that are not fully supported would be greatly appreciated.
N.B. - contractions might not be applicable for all languages below - in that case the CONTRACTIONS hash should stay empty.
English
Specs: Yes
Abbreviations: Yes
Stop Words: Yes
Contractions: Yes
Arabic
Specs: No
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Bulgarian
Specs: More needed
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Catalan
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Czech
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Danish
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Deutsch
Specs: More needed
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Finnish
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
French
Specs: More needed
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Greek
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Indonesian
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Italian
Specs: No
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Latvian
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Norwegian
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Persian
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Polish
Specs: No
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Portuguese
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Romanian
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Russian
Specs: No
Abbreviations: Yes
Stop Words: Yes
Contractions: No
Slovak
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Spanish
Specs: No
Abbreviations: Yes
Stop Words: Yes
Contractions: Yes
Swedish
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
Turkish
Specs: No
Abbreviations: No
Stop Words: Yes
Contractions: No
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
Author: Diasks2
Source Code: https://github.com/diasks2/pragmatic_tokenizer
License: MIT license
1658763196
A Naive Bayes text classification implementation as an OmniCat classifier strategy.
Add this line to your application's Gemfile:
gem 'omnicat-bayes'
And then execute:
$ bundle
Or install it yourself as:
$ gem install omnicat-bayes
See rdoc for detailed usage.
Optional configuration sample:
OmniCat.configure do |config|
# you can enable auto train mode by :unique or :continues
# unique: only uniq docs will be added to training docs on prediction
# continues: always add docs to training docs on prediction
config.auto_train = :off
config.exclude_tokens = ['something', 'anything'] # exclude token list
config.token_patterns = {
# exclude tokens with Regex patterns
minus: [/[\s\t\n\r]+/, /(@[\w\d]+)/],
# include tokens with Regex patterns
plus: [/[\p{L}\-0-9]{2,}/, /[\!\?]/, /[\:\)\(\;\-\|]{2,3}/]
}
end
Create a classifier object with Bayes strategy.
# If you need to change strategy on runtime, you should prefer this inialization
bayes = OmniCat::Classifier.new(OmniCat::Classifiers::Bayes.new)
or
# If you only need to use only Bayes classification, then you can use
bayes = OmniCat::Classifiers::Bayes.new
Create a classification category.
bayes.add_category('positive')
bayes.add_category('negative')
Train category with a document.
bayes.train('positive', 'great if you are in a slap happy mood .')
bayes.train('negative', 'bad tracking issue')
Untrain category with a document.
bayes.untrain('positive', 'great if you are in a slap happy mood .')
bayes.untrain('negative', 'bad tracking issue')
Train category with multiple documents.
bayes.train_batch('positive', [
'a feel-good picture in the best sense of the term...',
'it is a feel-good movie about which you can actually feel good.',
'love and money both of them are good choises'
])
bayes.train_batch('negative', [
'simplistic , silly and tedious .',
'interesting , but not compelling . ',
'seems clever but not especially compelling'
])
Untrain category with multiple documents.
bayes.untrain_batch('positive', [
'a feel-good picture in the best sense of the term...',
'it is a feel-good movie about which you can actually feel good.',
'love and money both of them are good choises'
])
bayes.untrain_batch('negative', [
'simplistic , silly and tedious .',
'interesting , but not compelling . ',
'seems clever but not especially compelling'
])
Classify a document.
result = bayes.classify('I feel so good and happy')
=> #<OmniCat::Result:0x007febb152af68 @top_score_key="positive", @scores={"positive"=>#<OmniCat::Score:0x007febb152add8 @key="positive", @value=6.813226744186048e-09, @percentage=58>, "negative"=>#<OmniCat::Score:0x007febb152ac70 @key="negative", @value=4.875003449064939e-09, @percentage=42>}, @total_score=1.1688230193250986e-08>
result.to_hash
=> {:top_score_key=>"positive", :scores=>{"positive"=>{:key=>"positive", :value=>6.813226744186048e-09, :percentage=>58}, "negative"=>{:key=>"negative", :value=>4.875003449064939e-09, :percentage=>42}}, :total_score=>1.1688230193250986e-08}
result.top_score
=> #<OmniCat::Score:0x007febb152add8 @key="positive", @value=6.813226744186048e-09, @percentage=58>
result.top_score.to_hash
=> {:key=>"positive", :value=>6.813226744186048e-09, :percentage=>58}
Classify multiple documents at a time.
results = bayes.classify_batch(
[
'the movie is silly so not compelling enough',
'a good piece of work'
]
)
=> [#<OmniCat::Result:0x007febb14f3680 @top_score_key="negative", @scores={"positive"=>#<OmniCat::Score:0x007febb14f34a0 @key="positive", @value=7.971480930520432e-14, @percentage=22>, "negative"=>#<OmniCat::Score:0x007febb14f32c0 @key="negative", @value=2.834304330851709e-13, @percentage=78>}, @total_score=3.6314524239037524e-13>, #<OmniCat::Result:0x007febb14f2aa0 @top_score_key="positive", @scores={"positive"=>#<OmniCat::Score:0x007febb14f2960 @key="positive", @value=3.802731206057328e-07, @percentage=72>, "negative"=>#<OmniCat::Score:0x007febb14f2820 @key="negative", @value=1.4625010347194818e-07, @percentage=28>}, @total_score=5.26523224077681e-07>]
Convert full Bayes object to hash.
# For storing, restoring modal data
bayes_hash = bayes.to_hash
=> {:categories=>{"positive"=>{:doc_count=>4, :docs=>{"28fd29bbf840c86db65e510ff3cd07a9"=>{:content=>"great if you are in a slap happy mood .", :content_md5=>"28fd29bbf840c86db65e510ff3cd07a9", :count=>1, :tokens=>{"great"=>1, "if"=>1, "you"=>1, "are"=>1, "in"=>1, "slap"=>1, "happy"=>1, "mood"=>1}}, "82b4cd9513f448dea0024f2d0e2ccd44"=>{:content=>"a feel-good picture in the best sense of the term...", :content_md5=>"82b4cd9513f448dea0024f2d0e2ccd44", :count=>1, :tokens=>{"feel-good"=>1, "picture"=>1, "in"=>1, "the"=>2, "best"=>1, "sense"=>1, "of"=>1, "term"=>1}}, "f917bf1cf1256c78c5436d850dab3104"=>{:content=>"it is a feel-good movie about which you can actually feel good.", :content_md5=>"f917bf1cf1256c78c5436d850dab3104", :count=>1, :tokens=>{"it"=>1, "is"=>1, "feel-good"=>1, "movie"=>1, "about"=>1, "which"=>1, "you"=>1, "can"=>1, "actually"=>1, "feel"=>1, "good"=>1}}, "4343bbe84c035733708c3f58136f321e"=>{:content=>"love and money both of them are good choises", :content_md5=>"4343bbe84c035733708c3f58136f321e", :count=>1, :tokens=>{"love"=>1, "and"=>1, "money"=>1, "both"=>1, "of"=>1, "them"=>1, "are"=>1, "good"=>1, "choises"=>1}}}, :name=>"positive", :tokens=>{"great"=>1, "if"=>1, "you"=>2, "are"=>2, "in"=>2, "slap"=>1, "happy"=>1, "mood"=>1, "feel-good"=>2, "picture"=>1, "the"=>2, "best"=>1, "sense"=>1, "of"=>2, "term"=>1, "it"=>1, "is"=>1, "movie"=>1, "about"=>1, "which"=>1, "can"=>1, "actually"=>1, "feel"=>1, "good"=>2, "love"=>1, "and"=>1, "money"=>1, "both"=>1, "them"=>1, "choises"=>1}, :token_count=>37, :prior=>0.5}, "negative"=>{:doc_count=>4, :docs=>{"89b36e774579662591ea21b3283d9b35"=>{:content=>"bad tracking issue", :content_md5=>"89b36e774579662591ea21b3283d9b35", :count=>1, :tokens=>{"bad"=>1, "tracking"=>1, "issue"=>1}}, "b0ec48bc87527e285b26d6cce8e278e7"=>{:content=>"simplistic , silly and tedious .", :content_md5=>"b0ec48bc87527e285b26d6cce8e278e7", :count=>1, :tokens=>{"simplistic"=>1, "silly"=>1, "and"=>1, "tedious"=>1}}, "ae9d4fbaf40906614ca712a888648c5f"=>{:content=>"interesting , but not compelling . ", :content_md5=>"ae9d4fbaf40906614ca712a888648c5f", :count=>1, :tokens=>{"interesting"=>1, "but"=>1, "not"=>1, "compelling"=>1}}, "0e495f5d88d8049746a1b6961bf3cc90"=>{:content=>"seems clever but not especially compelling", :content_md5=>"0e495f5d88d8049746a1b6961bf3cc90", :count=>1, :tokens=>{"seems"=>1, "clever"=>1, "but"=>1, "not"=>1, "especially"=>1, "compelling"=>1}}}, :name=>"negative", :tokens=>{"bad"=>1, "tracking"=>1, "issue"=>1, "simplistic"=>1, "silly"=>1, "and"=>1, "tedious"=>1, "interesting"=>1, "but"=>2, "not"=>2, "compelling"=>2, "seems"=>1, "clever"=>1, "especially"=>1}, :token_count=>17, :prior=>0.5}}, :category_count=>2, :category_size_limit=>0, :doc_count=>8, :token_count=>54, :unique_token_count=>43, :k_value=>1.0}
Load full Bayes object from hash.
another_bayes_obj = OmniCat::Classifiers::Bayes.new(bayes_hash)
=> #<OmniCat::Classifiers::Bayes:0x007febb14d15a8 @categories={"positive"=>#<OmniCat::Classifiers::BayesInternals::Category:0x007febb14d1530 @doc_count=4, @docs={"28fd29bbf840c86db65e510ff3cd07a9"=>{:content=>"great if you are in a slap happy mood .", :content_md5=>"28fd29bbf840c86db65e510ff3cd07a9", :count=>1, :tokens=>{"great"=>1, "if"=>1, "you"=>1, "are"=>1, "in"=>1, "slap"=>1, "happy"=>1, "mood"=>1}}, "82b4cd9513f448dea0024f2d0e2ccd44"=>{:content=>"a feel-good picture in the best sense of the term...", :content_md5=>"82b4cd9513f448dea0024f2d0e2ccd44", :count=>1, :tokens=>{"feel-good"=>1, "picture"=>1, "in"=>1, "the"=>2, "best"=>1, "sense"=>1, "of"=>1, "term"=>1}}, "f917bf1cf1256c78c5436d850dab3104"=>{:content=>"it is a feel-good movie about which you can actually feel good.", :content_md5=>"f917bf1cf1256c78c5436d850dab3104", :count=>1, :tokens=>{"it"=>1, "is"=>1, "feel-good"=>1, "movie"=>1, "about"=>1, "which"=>1, "you"=>1, "can"=>1, "actually"=>1, "feel"=>1, "good"=>1}}, "4343bbe84c035733708c3f58136f321e"=>{:content=>"love and money both of them are good choises", :content_md5=>"4343bbe84c035733708c3f58136f321e", :count=>1, :tokens=>{"love"=>1, "and"=>1, "money"=>1, "both"=>1, "of"=>1, "them"=>1, "are"=>1, "good"=>1, "choises"=>1}}}, @name="positive", @tokens={"great"=>1, "if"=>1, "you"=>2, "are"=>2, "in"=>2, "slap"=>1, "happy"=>1, "mood"=>1, "feel-good"=>2, "picture"=>1, "the"=>2, "best"=>1, "sense"=>1, "of"=>2, "term"=>1, "it"=>1, "is"=>1, "movie"=>1, "about"=>1, "which"=>1, "can"=>1, "actually"=>1, "feel"=>1, "good"=>2, "love"=>1, "and"=>1, "money"=>1, "both"=>1, "them"=>1, "choises"=>1}, @token_count=37, @prior=0.5>, "negative"=>#<OmniCat::Classifiers::BayesInternals::Category:0x007febb14d14e0 @doc_count=4, @docs={"89b36e774579662591ea21b3283d9b35"=>{:content=>"bad tracking issue", :content_md5=>"89b36e774579662591ea21b3283d9b35", :count=>1, :tokens=>{"bad"=>1, "tracking"=>1, "issue"=>1}}, "b0ec48bc87527e285b26d6cce8e278e7"=>{:content=>"simplistic , silly and tedious .", :content_md5=>"b0ec48bc87527e285b26d6cce8e278e7", :count=>1, :tokens=>{"simplistic"=>1, "silly"=>1, "and"=>1, "tedious"=>1}}, "ae9d4fbaf40906614ca712a888648c5f"=>{:content=>"interesting , but not compelling . ", :content_md5=>"ae9d4fbaf40906614ca712a888648c5f", :count=>1, :tokens=>{"interesting"=>1, "but"=>1, "not"=>1, "compelling"=>1}}, "0e495f5d88d8049746a1b6961bf3cc90"=>{:content=>"seems clever but not especially compelling", :content_md5=>"0e495f5d88d8049746a1b6961bf3cc90", :count=>1, :tokens=>{"seems"=>1, "clever"=>1, "but"=>1, "not"=>1, "especially"=>1, "compelling"=>1}}}, @name="negative", @tokens={"bad"=>1, "tracking"=>1, "issue"=>1, "simplistic"=>1, "silly"=>1, "and"=>1, "tedious"=>1, "interesting"=>1, "but"=>2, "not"=>2, "compelling"=>2, "seems"=>1, "clever"=>1, "especially"=>1}, @token_count=17, @prior=0.5>}, @category_count=2, @category_size_limit=0, @doc_count=8, @token_count=54, @unique_token_count=43, @k_value=1.0>
another_bayes_obj.classify('best senses')
=> #<OmniCat::Result:0x007febb14c0fc8 @top_score_key="positive", @scores={"positive"=>#<OmniCat::Score:0x007febb14c0ed8 @key="positive", @value=0.00029069767441860465, @percentage=52>, "negative"=>#<OmniCat::Score:0x007febb14c0de8 @key="negative", @value=0.0002704164413196322, @percentage=48>}, @total_score=0.0005611141157382368>
For Bayes classification, always try to train the same number of documents for each category. So, do not activate auto-train mode, because it unbalances the number of trained documents across categories and throws the algorithm off. To get the best results on text classification, you should apply cleaning steps such as spell checking, stemming, and stop-word removal before training and prediction.
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
Author: Mustafaturan
Source Code: https://github.com/mustafaturan/omnicat-bayes
License: MIT license