Jace  Effertz

Jace Effertz

1609639140

Print Function in Python

The print() function prints the specified message to the screen, or other standard output device. The message can be a string, or any other object, the object will be converted into a string before written to the screen.

print() function and input() function makes the program more interactive with the user.

Although, there are some extra things we can do with print() which makes the output formattable.

#python #programming #developer

What is GEEK

Buddha Community

Print Function in Python
Ray  Patel

Ray Patel

1619510796

Lambda, Map, Filter functions in python

Welcome to my Blog, In this article, we will learn python lambda function, Map function, and filter function.

Lambda function in python: Lambda is a one line anonymous function and lambda takes any number of arguments but can only have one expression and python lambda syntax is

Syntax: x = lambda arguments : expression

Now i will show you some python lambda function examples:

#python #anonymous function python #filter function in python #lambda #lambda python 3 #map python #python filter #python filter lambda #python lambda #python lambda examples #python map

Ray  Patel

Ray Patel

1619518440

top 30 Python Tips and Tricks for Beginners

Welcome to my Blog , In this article, you are going to learn the top 10 python tips and tricks.

1) swap two numbers.

2) Reversing a string in Python.

3) Create a single string from all the elements in list.

4) Chaining Of Comparison Operators.

5) Print The File Path Of Imported Modules.

6) Return Multiple Values From Functions.

7) Find The Most Frequent Value In A List.

8) Check The Memory Usage Of An Object.

#python #python hacks tricks #python learning tips #python programming tricks #python tips #python tips and tricks #python tips and tricks advanced #python tips and tricks for beginners #python tips tricks and techniques #python tutorial #tips and tricks in python #tips to learn python #top 30 python tips and tricks for beginners

Chatbot conversacional de IA con Transformers en Python

Aprenda a usar la biblioteca de transformadores Huggingface para generar respuestas conversacionales con el modelo DialoGPT previamente entrenado en Python.

Los chatbots han ganado mucha popularidad en los últimos años y, a medida que crece el interés en el uso de chatbots para empresas, los investigadores también hicieron un gran trabajo en el avance de los chatbots de IA conversacionales.

En este tutorial, usaremos la biblioteca de transformadores Huggingface para emplear el modelo DialoGPT previamente entrenado para la generación de respuestas conversacionales.

DialoGPT es un modelo de generación de respuesta conversacional neuronal sintonizable a gran escala que se entrenó en 147 millones de conversaciones extraídas de Reddit, y lo bueno es que puede ajustarlo con su conjunto de datos para lograr un mejor rendimiento que el entrenamiento desde cero.

Para comenzar, instalemos transformadores :

$ pip3 install transformers

Abra un nuevo archivo o cuaderno de Python y haga lo siguiente:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# model_name = "microsoft/DialoGPT-large"
model_name = "microsoft/DialoGPT-medium"
# model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

Hay tres versiones de DialoGPT; pequeño, mediano y grande. Por supuesto, cuanto más grande, mejor, pero si ejecuta esto en su máquina, creo que el tamaño pequeño o mediano se adapta a su memoria sin problemas. También puede utilizar Google Colab para probar el más grande.

Generación de respuestas con búsqueda codiciosa

En esta sección, usaremos el algoritmo de búsqueda codiciosa para generar respuestas. Es decir, seleccionamos la respuesta del chatbot que tiene la mayor probabilidad de ser seleccionada en cada paso de tiempo.

Hagamos un código para chatear con nuestra IA usando una búsqueda codiciosa:

# chatting 5 times with greedy search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Expliquemos el núcleo de este código:

  • Primero tomamos información del usuario para chatear.
  • Codificamos el texto para input_idsusar el tokenizador DialoGPT, también agregamos el final del token de cadena y lo devolvemos como un tensor de Pytorch.
  • Si esta es la primera vez que chateamos con el bot, alimentaremos directamente input_idsnuestro modelo durante una generación. De lo contrario, agregamos el historial de chat usando la concatenación con la ayuda del torch.cat()método.
  • Después de eso, usamos el model.generate()método para generar la respuesta del chatbot.
  • Por último, como la salida devuelta también es una secuencia tokenizada, decodificamos la secuencia usando tokenizer.decode()y configuramos skip_special_tokenspara Trueasegurarnos de que no veamos ningún token especial molesto como <|endoftext|>. Además, dado que el modelo devuelve la secuencia completa, omitimos el historial de chat anterior e imprimimos solo la respuesta del chatbot recién generada.

A continuación se muestra una discusión de muestra con el bot:

>> You:How can you be rich so quickly?
DialoGPT: I'm not rich, I'm just a rich man.
>> You:Ok, you don't make sense, how old are you ?
DialoGPT: I'm not a rich man, I'm just a rich man.
>> You:How can you be happy ?
DialoGPT: I'm not happy, I'm just a rich man.
>> You:But they say money buys happiness!
DialoGPT: Money buys happiness, but not happiness itself.
>> You:You don't make sense again!
DialoGPT: I'm not happy, I'm just a rich man.

Verá que el modelo repite muchas respuestas, ya que estas son la probabilidad más alta y lo elige cada vez.

De forma predeterminada, model.generate()utiliza un algoritmo de búsqueda codicioso cuando no se establecen otros parámetros; en las siguientes secciones, agregaremos algunos argumentos a este método para ver si podemos mejorar la generación.

Generación de respuestas con Beam Search

La búsqueda por haz nos permite reducir el riesgo de perder secuencias de alta probabilidad al mantener las num_beamshipótesis más probables en cada paso de tiempo y luego tomar las secuencias que tienen la probabilidad general más alta, el siguiente código generará respuestas de chatbot con búsqueda de haz:

# chatting 5 times with beam search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        num_beams=3,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Cuando se ajusta num_beamsa 3de model.generate()método, a continuación, vamos a seleccionar 3 palabras en cada paso de tiempo y desarrollarlas para encontrar la más alta probabilidad global de la secuencia, el establecimiento num_beamsde 1 es la misma que la búsqueda codiciosa.

A continuación se muestra una discusión de muestra con el chatbot usando la búsqueda de haz:

>> You:How can I be rich?
DialoGPT: You can't.
>> You:Why not?
DialoGPT: Because you can't.
>> You:Then how am I supposed to live well?
DialoGPT: You can't.
>> You:So basically, you want me to be a poor guy?
DialoGPT: No, I want you to be a rich guy.
>> You:Ok how to be rich, then?
DialoGPT: I don't know.

Generando respuestas con muestreo

En las secciones anteriores, usamos la búsqueda de rayos y codiciosos para generar la secuencia de probabilidad más alta. Eso es genial para tareas como la traducción automática o el resumen de texto donde el resultado es predecible. Sin embargo, no es la mejor opción para una generación abierta como en los chatbots.

Para una mejor generación, necesitamos introducir algo de aleatoriedad donde muestreamos de una amplia gama de secuencias candidatas basadas en probabilidades:

# chatting 5 times with sampling
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=0,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Esta vez, establece do_sampleque Truepara el muestreo, y nos pusimos top_ka 0lo que indica que estamos seleccionando todas las probabilidades posibles, vamos a discutir más adelante top_kparámetro.

Aquí hay un chat con estos parámetros:

>> You:Hello, how can I be rich?
DialoGPT: You can't
>> You:Really? then how others get rich?
DialoGPT: They work as money artists
>> You:How to be a money artist?
DialoGPT: I don't know, I don't have enough knowledge on artists
>> You:Alright, thank you!
DialoGPT: No problem, if you need tips pm me.
>> You:Great, you're my savior!
DialoGPT: Thank you for helping a stranger out like me!

Claramente hay algunas mejoras. Sin embargo, el muestreo en una amplia lista de secuencias con bajas probabilidades puede conducir a una generación aleatoria (como se ve en la última oración).

Para mejorarlo aún más, podemos:

  • temperatureDisminuir el muestreo , eso nos ayuda a disminuir la probabilidad de elegir palabras de baja probabilidad y aumentar la probabilidad de elegir palabras de alta probabilidad.
  • Utilice el muestreo de Top-k en lugar de seleccionar todas las ocurrencias probables, esto nos ayudará a descartar palabras de baja probabilidad para que no sean seleccionadas.
# chatting 5 times with Top K sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=100,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Ahora, configuramos top_kpara 100muestrear las 100palabras principales ordenadas de forma descendente por probabilidad. También establecemos temperatureen 0.75(el valor predeterminado es 1.0) para brindar una mayor probabilidad de elegir palabras de alta probabilidad, establecer la temperatura en 0.0es lo mismo que la búsqueda codiciosa, establecerla en infinito es lo mismo que completamente aleatorio.

Aquí hay una discusión con estos parámetros:

>> You:How can I get rich?
DialoGPT: I'd also like to know this.
>> You:So you don't know it?
DialoGPT: I know it, but it doesn't matter.
>> You:Really? money matters, man!
DialoGPT: Well, money is money and I gotta get my money.
>> You:Yeah how to get them?
DialoGPT: Well, that's a question. I'd like to know more, just for curiosity's sake, to know what other people have said about how I should get rich.
>> You:Some robots like you told me you have to buy gold, silver or stocks.
DialoGPT: I think my wallet is empty.

Como se puede ver, es mucho mejor ahora, no dude en modificar temperaturey top_kparámetros y ver si puede mejorarlo.

Muestreo de núcleos

El muestreo de núcleo o muestreo de Top-p elige entre las palabras más pequeñas posibles cuya probabilidad acumulada excede el parámetro pque establecimos.

A continuación se muestra un ejemplo con el muestreo de Top-p:

# chatting 5 times with nucleus sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=0,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Configuramos top_kpara 0deshabilitar el muestreo de Top-k, pero puede usar ambos métodos, lo que tiende a funcionar mejor. Aquí hay un chat:

>> You:How can I be rich fast?
DialoGPT: Inflation and free money
>> You:lol, but that takes time, I need fast money.
DialoGPT: Your post makes no sense. You just need a lot of money to be rich fast
>> You:Ok how to get a lot of money?
DialoGPT: Buy gold and then trade it for money.
>> You:Hmm, I got your point, thanks.
DialoGPT: No problem. You have to have money to buy gold, but its only good for trading it.
>> You:Yeah, I understand, thank you so much.
DialoGPT: Ok, let me know if you need help

Ahora el chatbot claramente tiene sentido en muchos casos.

Ahora agreguemos algo de código para generar más de una respuesta de chatbot, y luego elegimos qué respuesta incluir en la siguiente entrada:

# chatting 5 times with nucleus & top-k sampling & tweaking temperature & multiple
# sentences
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids_list = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.75,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the outputs
    for i in range(len(chat_history_ids_list)):
      output = tokenizer.decode(chat_history_ids_list[i][bot_input_ids.shape[-1]:], skip_special_tokens=True)
      print(f"DialoGPT {i}: {output}")
    choice_index = int(input("Choose the response you want for the next input: "))
    chat_history_ids = torch.unsqueeze(chat_history_ids_list[choice_index], dim=0)

He configurado num_return_sequencespara 5devolver 5 oraciones a la vez, tenemos que elegir la que se incluirá en la siguiente secuencia. Así es como fue:

>> You:Hello, friend.
DialoGPT 0: Hey there!
DialoGPT 1: I am your friend, and I am also your friend.
DialoGPT 2: Hello. :
DialoGPT 3: Hiya! How are you?
DialoGPT 4: Hello. I like your username.
Choose the response you want for the next input: 3
>> You:Fine, thanks, what about you?
DialoGPT 0: Good. Not much to do.
DialoGPT 1: Not much, you?
DialoGPT 2: Fine, thanks.
DialoGPT 3: I'm okay, thank you for asking.
DialoGPT 4: I'm fine too.
Choose the response you want for the next input: 2
>> You:Can you tell me how can I get rich quickly?
DialoGPT 0: Yes, I can!
DialoGPT 1: Yes, just look up how to get rich on Google.
DialoGPT 2: Just keep doing your thing. You'll get rich.
DialoGPT 3: No, I can't.
DialoGPT 4: no, i have no money
Choose the response you want for the next input: 1
>> You:Yeah, but I thought you have an answer...
DialoGPT 0: I'm getting married in the morning.
DialoGPT 1: How do you get rich?
DialoGPT 2: I think there are many good answers, but I could send you one, if you want.
DialoGPT 3: I will, thanks!
DialoGPT 4: I thought you have an answer.
Choose the response you want for the next input: 2
>> You:Yes, please send me.
DialoGPT 0: You can send me link.
DialoGPT 1: I am very interested
DialoGPT 2: I sent you a PM
DialoGPT 3: I'll send you a PM
DialoGPT 4: I am always interested in new ideas.
Choose the response you want for the next input: 2

Conclusión

Y ahí lo tienes, espero que este tutorial te haya ayudado a generar texto en DialoGPT y modelos similares. Para obtener más información sobre cómo generar texto, le recomiendo que lea la guía Cómo generar texto con Transformers .

Te dejo ajustando los parámetros para ver si puedes hacer que el bot funcione mejor.

Además, puede combinar esto con tutoriales de texto a voz y de voz a texto para crear un asistente virtual como Alexa , Siri , Cortana , etc.

#python #chatbot #ai 

Hoang  Kim

Hoang Kim

1635843812

Chatbot AI hội thoại với Máy biến áp được đào tạo trước bằng Python

Tìm hiểu cách sử dụng thư viện máy biến áp Huggingface để tạo phản hồi hội thoại bằng mô hình DialoGPT được đào tạo trước bằng Python.

Chatbots đã trở nên phổ biến trong những năm gần đây và khi mối quan tâm ngày càng tăng trong việc sử dụng chatbots cho kinh doanh, các nhà nghiên cứu cũng đã làm rất tốt trong việc phát triển các chatbot AI đàm thoại.

Trong hướng dẫn này, chúng tôi sẽ sử dụng thư viện máy biến áp Huggingface để sử dụng mô hình DialoGPT đã được đào tạo trước để tạo phản hồi hội thoại.

DialoGPT là một mô hình tạo phản hồi hội thoại thần kinh có thể điều chỉnh quy mô lớn được đào tạo trên 147 triệu cuộc hội thoại được trích xuất từ ​​Reddit và điều tốt là bạn có thể tinh chỉnh nó với bộ dữ liệu của mình để đạt được hiệu suất tốt hơn so với đào tạo từ đầu.

Để bắt đầu, hãy cài đặt máy biến áp :

$ pip3 install transformers

Mở tệp hoặc sổ ghi chép Python mới và thực hiện như sau:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# model_name = "microsoft/DialoGPT-large"
model_name = "microsoft/DialoGPT-medium"
# model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

Có ba phiên bản của DialoGPT; nhỏ, vừa và lớn. Tất nhiên, càng lớn càng tốt, nhưng nếu bạn chạy điều này trên máy của mình, tôi nghĩ nhỏ hoặc trung bình phù hợp với bộ nhớ của bạn mà không có vấn đề gì. Bạn cũng có thể sử dụng Google Colab để thử cái lớn.

Tạo phản hồi bằng Tìm kiếm tham lam

Trong phần này, chúng tôi sẽ sử dụng thuật toán tìm kiếm tham lam để tạo phản hồi. Đó là, chúng tôi chọn phản hồi chatbot có xác suất cao nhất được chọn trên mỗi bước thời gian.

Hãy tạo mã để trò chuyện với AI của chúng tôi bằng cách sử dụng tìm kiếm tham lam:

# chatting 5 times with greedy search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Hãy giải thích cốt lõi của mã này:

  • Đầu tiên chúng tôi lấy thông tin đầu vào từ người dùng để trò chuyện.
  • Chúng tôi mã hóa văn bản input_idsbằng cách sử dụng trình mã hóa DialoGPT, chúng tôi cũng nối vào cuối chuỗi mã thông báo và trả về nó dưới dạng bộ căng Pytorch.
  • Nếu đây là lần đầu tiên trò chuyện với bot, thì chúng tôi sẽ trực tiếp cung cấp input_idscho mô hình của mình trong một thế hệ. Nếu không, chúng tôi nối lịch sử trò chuyện bằng cách nối với sự trợ giúp của torch.cat()phương thức.
  • Sau đó, chúng tôi sử dụng model.generate()phương pháp tạo phản hồi chatbot.
  • Cuối cùng, như sản lượng trở lại là một chuỗi tokenized quá, chúng tôi giải mã trình tự sử dụng tokenizer.decode()và thiết lập skip_special_tokensđể Trueđảm bảo chúng tôi không thấy bất kỳ đặc biệt gây phiền nhiễu mã thông báo như <|endoftext|>. Ngoài ra, vì mô hình trả về toàn bộ chuỗi, chúng tôi bỏ qua lịch sử trò chuyện trước đó và chỉ in câu trả lời chatbot mới được tạo.

Dưới đây là một cuộc thảo luận mẫu với bot:

>> You:How can you be rich so quickly?
DialoGPT: I'm not rich, I'm just a rich man.
>> You:Ok, you don't make sense, how old are you ?
DialoGPT: I'm not a rich man, I'm just a rich man.
>> You:How can you be happy ?
DialoGPT: I'm not happy, I'm just a rich man.
>> You:But they say money buys happiness!
DialoGPT: Money buys happiness, but not happiness itself.
>> You:You don't make sense again!
DialoGPT: I'm not happy, I'm just a rich man.

Bạn thấy mô hình lặp lại rất nhiều phản hồi, vì đây là xác suất cao nhất và nó luôn chọn nó.

Theo mặc định, model.generate()sử dụng thuật toán tìm kiếm tham lam khi không có tham số nào khác được đặt, trong các phần tiếp theo, chúng tôi sẽ thêm một số đối số vào phương thức này để xem liệu chúng tôi có thể cải thiện việc tạo không.

Tạo phản hồi với Tìm kiếm chùm

Tìm kiếm theo chùm cho phép chúng tôi giảm nguy cơ bỏ lỡ các chuỗi có xác suất cao bằng cách giữ lại num_beamscác giả thuyết có khả năng xảy ra nhất ở mỗi bước thời gian và sau đó lấy các chuỗi có xác suất cao nhất tổng thể, đoạn mã dưới đây sẽ tạo ra các phản hồi của chatbot với tìm kiếm theo chùm:

# chatting 5 times with beam search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        num_beams=3,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Khi đặt num_beamsthành 3trong model.generate()phương thức, chúng tôi sẽ chọn 3 từ ở mỗi bước và phát triển chúng để tìm xác suất tổng thể cao nhất của chuỗi, đặt num_beamsthành 1 cũng giống như tìm kiếm tham lam.

Dưới đây là một cuộc thảo luận mẫu với chatbot bằng cách sử dụng tìm kiếm chùm:

>> You:How can I be rich?
DialoGPT: You can't.
>> You:Why not?
DialoGPT: Because you can't.
>> You:Then how am I supposed to live well?
DialoGPT: You can't.
>> You:So basically, you want me to be a poor guy?
DialoGPT: No, I want you to be a rich guy.
>> You:Ok how to be rich, then?
DialoGPT: I don't know.

Tạo phản hồi bằng lấy mẫu

Trong các phần trước, chúng tôi đã sử dụng tìm kiếm chùm và tham lam để tạo ra chuỗi xác suất cao nhất. Giờ đây, điều đó thật tuyệt vời cho các tác vụ như dịch máy hoặc tóm tắt văn bản trong đó kết quả đầu ra có thể dự đoán được. Tuy nhiên, nó không phải là lựa chọn tốt nhất cho thế hệ kết thúc mở như trong chatbots.

Để có một thế hệ tốt hơn, chúng tôi cần đưa ra một số ngẫu nhiên trong đó chúng tôi lấy mẫu từ một loạt các trình tự ứng viên dựa trên xác suất:

# chatting 5 times with sampling
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=0,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Lần này, chúng tôi đặt do_sampleđể Truelấy mẫu và chúng tôi đặt top_kđể 0chỉ ra rằng chúng tôi đang chọn tất cả các xác suất có thể xảy ra, sau đó chúng ta sẽ thảo luận về top_ktham số.

Đây là một cuộc trò chuyện với các thông số này:

>> You:Hello, how can I be rich?
DialoGPT: You can't
>> You:Really? then how others get rich?
DialoGPT: They work as money artists
>> You:How to be a money artist?
DialoGPT: I don't know, I don't have enough knowledge on artists
>> You:Alright, thank you!
DialoGPT: No problem, if you need tips pm me.
>> You:Great, you're my savior!
DialoGPT: Thank you for helping a stranger out like me!

Rõ ràng là có một số cải tiến. Tuy nhiên, việc lấy mẫu trên một danh sách rộng các chuỗi với xác suất thấp có thể dẫn đến việc tạo ngẫu nhiên (như bạn thấy trong câu cuối cùng).

Để cải thiện nó hơn nữa, chúng tôi có thể:

  • Hạ thấp mẫu temperature, điều đó giúp chúng tôi giảm khả năng chọn các từ có xác suất thấp và tăng khả năng chọn các từ có xác suất cao.
  • Sử dụng lấy mẫu Top-k thay vì chọn tất cả các lần xuất hiện có thể xảy ra, điều này sẽ giúp chúng tôi loại bỏ các từ có xác suất thấp để được chọn.
# chatting 5 times with Top K sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=100,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Bây giờ, chúng ta thiết lập top_kđể 100lấy mẫu từ đỉnh 100từ được sắp xếp descendingly bởi xác suất. Chúng tôi cũng đặt temperaturethành 0.75(mặc định là 1.0) để có cơ hội chọn các từ có xác suất cao hơn, đặt nhiệt độ 0.0giống như tìm kiếm tham lam, đặt nhiệt độ thành vô cùng giống như hoàn toàn ngẫu nhiên.

Đây là một cuộc thảo luận với các thông số này:

>> You:How can I get rich?
DialoGPT: I'd also like to know this.
>> You:So you don't know it?
DialoGPT: I know it, but it doesn't matter.
>> You:Really? money matters, man!
DialoGPT: Well, money is money and I gotta get my money.
>> You:Yeah how to get them?
DialoGPT: Well, that's a question. I'd like to know more, just for curiosity's sake, to know what other people have said about how I should get rich.
>> You:Some robots like you told me you have to buy gold, silver or stocks.
DialoGPT: I think my wallet is empty.

Như bạn có thể thấy, bây giờ nó đã tốt hơn nhiều, hãy thoải mái tinh chỉnh temperaturetop_kcác thông số và xem liệu nó có thể cải thiện nó hay không.

Lấy mẫu hạt nhân

Lấy mẫu hạt nhân hoặc lấy mẫu Top-p chọn từ các từ nhỏ nhất có thể có xác suất tích lũy vượt quá tham số pchúng tôi đặt.

Dưới đây là một ví dụ sử dụng lấy mẫu Top-p:

# chatting 5 times with nucleus sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=0,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

Chúng tôi đặt top_kđể 0tắt lấy mẫu Top-k, nhưng bạn có thể sử dụng cả hai phương pháp có xu hướng hoạt động tốt hơn. Đây là một cuộc trò chuyện:

>> You:How can I be rich fast?
DialoGPT: Inflation and free money
>> You:lol, but that takes time, I need fast money.
DialoGPT: Your post makes no sense. You just need a lot of money to be rich fast
>> You:Ok how to get a lot of money?
DialoGPT: Buy gold and then trade it for money.
>> You:Hmm, I got your point, thanks.
DialoGPT: No problem. You have to have money to buy gold, but its only good for trading it.
>> You:Yeah, I understand, thank you so much.
DialoGPT: Ok, let me know if you need help

Giờ đây, chatbot rõ ràng có ý nghĩa trong nhiều trường hợp.

Bây giờ, hãy thêm một số mã để tạo nhiều hơn một phản hồi chatbot và sau đó chúng tôi chọn phản hồi nào sẽ bao gồm trong đầu vào tiếp theo:

# chatting 5 times with nucleus & top-k sampling & tweaking temperature & multiple
# sentences
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids_list = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.75,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the outputs
    for i in range(len(chat_history_ids_list)):
      output = tokenizer.decode(chat_history_ids_list[i][bot_input_ids.shape[-1]:], skip_special_tokens=True)
      print(f"DialoGPT {i}: {output}")
    choice_index = int(input("Choose the response you want for the next input: "))
    chat_history_ids = torch.unsqueeze(chat_history_ids_list[choice_index], dim=0)

Tôi đã thiết lập num_return_sequencesđể 5trả lại 5 câu cùng một lúc, chúng ta phải chọn một câu sẽ có trong chuỗi tiếp theo. Dưới đây là làm thế nào nó đi:

>> You:Hello, friend.
DialoGPT 0: Hey there!
DialoGPT 1: I am your friend, and I am also your friend.
DialoGPT 2: Hello. :
DialoGPT 3: Hiya! How are you?
DialoGPT 4: Hello. I like your username.
Choose the response you want for the next input: 3
>> You:Fine, thanks, what about you?
DialoGPT 0: Good. Not much to do.
DialoGPT 1: Not much, you?
DialoGPT 2: Fine, thanks.
DialoGPT 3: I'm okay, thank you for asking.
DialoGPT 4: I'm fine too.
Choose the response you want for the next input: 2
>> You:Can you tell me how can I get rich quickly?
DialoGPT 0: Yes, I can!
DialoGPT 1: Yes, just look up how to get rich on Google.
DialoGPT 2: Just keep doing your thing. You'll get rich.
DialoGPT 3: No, I can't.
DialoGPT 4: no, i have no money
Choose the response you want for the next input: 1
>> You:Yeah, but I thought you have an answer...
DialoGPT 0: I'm getting married in the morning.
DialoGPT 1: How do you get rich?
DialoGPT 2: I think there are many good answers, but I could send you one, if you want.
DialoGPT 3: I will, thanks!
DialoGPT 4: I thought you have an answer.
Choose the response you want for the next input: 2
>> You:Yes, please send me.
DialoGPT 0: You can send me link.
DialoGPT 1: I am very interested
DialoGPT 2: I sent you a PM
DialoGPT 3: I'll send you a PM
DialoGPT 4: I am always interested in new ideas.
Choose the response you want for the next input: 2

Phần kết luận

Và bạn hiểu rồi, tôi hy vọng hướng dẫn này đã giúp bạn cách tạo văn bản trên DialoGPT và các mô hình tương tự. Để biết thêm thông tin về cách tạo văn bản, tôi thực sự khuyên bạn nên đọc hướng dẫn Cách tạo văn bản bằng Transformers .

Tôi sẽ để bạn điều chỉnh các thông số để xem liệu bạn có thể làm cho bot hoạt động tốt hơn hay không.

Ngoài ra, bạn có thể kết hợp điều này với các hướng dẫn chuyển văn bản thành giọng nói và chuyển lời nói thành văn bản để xây dựng một trợ lý ảo như Alexa , Siri , Cortana , v.v.

#python #ai #chatbot 

Khaitan

Khaitan

1635844603

पायथन में ट्रांसफॉर्मर के साथ संवादी एआई चैटबॉट

जानें कि पाइथन में प्री-ट्रेन्ड DialoGPT मॉडल के साथ संवादी प्रतिक्रियाएं उत्पन्न करने के लिए हगिंगफेस ट्रांसफॉर्मर लाइब्रेरी का उपयोग कैसे करें।

हाल के वर्षों में चैटबॉट्स ने बहुत लोकप्रियता हासिल की है, और जैसे-जैसे व्यवसाय के लिए चैटबॉट्स का उपयोग करने में रुचि बढ़ती है, शोधकर्ताओं ने संवादी एआई चैटबॉट्स को आगे बढ़ाने पर भी बहुत अच्छा काम किया है।

इस ट्यूटोरियल में, हम संवादी प्रतिक्रिया पीढ़ी के लिए पूर्व-प्रशिक्षित DialoGPT मॉडल को नियोजित करने के लिए हगिंगफेस ट्रांसफॉर्मर लाइब्रेरी का उपयोग करेंगे ।

DialoGPT एक बड़े पैमाने पर ट्यून करने योग्य तंत्रिका संवादी प्रतिक्रिया पीढ़ी मॉडल है जिसे रेडिट से निकाले गए 147M वार्तालापों पर प्रशिक्षित किया गया था, और अच्छी बात यह है कि आप स्क्रैच से प्रशिक्षण की तुलना में बेहतर प्रदर्शन प्राप्त करने के लिए इसे अपने डेटासेट के साथ ठीक कर सकते हैं।

आरंभ करने के लिए, आइए ट्रांसफॉर्मर स्थापित करें :

$ pip3 install transformers

एक नई पायथन फ़ाइल या नोटबुक खोलें और निम्न कार्य करें:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# model_name = "microsoft/DialoGPT-large"
model_name = "microsoft/DialoGPT-medium"
# model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

DialoGPT के तीन संस्करण हैं; छोटा, मध्यम और बड़ा। बेशक, जितना बड़ा बेहतर होगा, लेकिन अगर आप इसे अपनी मशीन पर चला रहे हैं, तो मुझे लगता है कि छोटा या मध्यम आपकी याददाश्त को बिना किसी समस्या के फिट करता है। बड़े वाले को आज़माने के लिए आप Google Colab का भी उपयोग कर सकते हैं।

लालची खोज के साथ प्रतिक्रिया उत्पन्न करना

इस खंड में, हम प्रतिक्रिया उत्पन्न करने के लिए लालची खोज एल्गोरिथ्म का उपयोग करेंगे । यही है, हम चैटबॉट प्रतिक्रिया का चयन करते हैं जिसमें प्रत्येक समय चरण पर चुने जाने की सबसे अधिक संभावना होती है।

आइए लालची खोज का उपयोग करके हमारे AI के साथ चैट करने के लिए कोड बनाएं:

# chatting 5 times with greedy search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

आइए इस कोड के मूल की व्याख्या करें:

  • हम सबसे पहले चैटिंग के लिए यूजर से इनपुट लेते हैं।
  • हम input_idsDialoGPT टोकननाइज़र का उपयोग करने के लिए टेक्स्ट को एन्कोड करते हैं , हम स्ट्रिंग टोकन के अंत को भी जोड़ते हैं और इसे पाइटोरच टेंसर के रूप में वापस करते हैं।
  • अगर यह पहली बार बॉट के साथ चैट कर रहा है, तो हम input_idsएक पीढ़ी के लिए सीधे अपने मॉडल को फीड करते हैं। अन्यथा, हम torch.cat()मेथड की मदद से कॉन्सटेनेशन का उपयोग करके चैट हिस्ट्री को जोड़ देते हैं ।
  • उसके बाद, हम model.generate()चैटबॉट प्रतिक्रिया उत्पन्न करने के लिए विधि का उपयोग करते हैं ।
  • अंत में, जैसा कि लौटा हुआ आउटपुट एक टोकन अनुक्रम भी है, हम अनुक्रम का उपयोग करके डीकोड करते हैं tokenizer.decode()और यह सुनिश्चित skip_special_tokensकरने के Trueलिए सेट करते हैं कि हमें कोई कष्टप्रद विशेष टोकन जैसे कि <|endoftext|>. साथ ही, चूंकि मॉडल पूरे अनुक्रम को लौटाता है, हम पिछले चैट इतिहास को छोड़ देते हैं और केवल नए जेनरेट किए गए चैटबॉट उत्तर को प्रिंट करते हैं।

नीचे बॉट के साथ एक नमूना चर्चा है:

>> You:How can you be rich so quickly?
DialoGPT: I'm not rich, I'm just a rich man.
>> You:Ok, you don't make sense, how old are you ?
DialoGPT: I'm not a rich man, I'm just a rich man.
>> You:How can you be happy ?
DialoGPT: I'm not happy, I'm just a rich man.
>> You:But they say money buys happiness!
DialoGPT: Money buys happiness, but not happiness itself.
>> You:You don't make sense again!
DialoGPT: I'm not happy, I'm just a rich man.

आप देखते हैं कि मॉडल बहुत सारी प्रतिक्रियाओं को दोहराता है, क्योंकि ये सबसे अधिक संभावना है और यह हर बार इसे चुन रहा है।

डिफ़ॉल्ट रूप से, model.generate()लालची खोज एल्गोरिथ्म का उपयोग करता है जब कोई अन्य पैरामीटर सेट नहीं किया जाता है, अगले अनुभागों में, हम इस पद्धति में कुछ तर्क जोड़ेंगे कि क्या हम पीढ़ी में सुधार कर सकते हैं।

बीम खोज के साथ प्रतिक्रिया उत्पन्न करना

बीम खोज हमें num_beamsहर समय कदम पर परिकल्पना की सबसे अधिक संभावना रखते हुए उच्च संभावना अनुक्रमों के लापता होने के जोखिम को कम करने की अनुमति देता है और फिर उन अनुक्रमों को लेकर जिनकी समग्र उच्चतम संभावना है, नीचे दिए गए कोड बीम खोज के साथ चैटबॉट प्रतिक्रियाएं उत्पन्न करेंगे:

# chatting 5 times with beam search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        num_beams=3,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

सेट करते समय num_beamsके लिए 3में model.generate()विधि है, तो हम हर बार कदम पर 3 शब्दों का चयन और अनुक्रम के उच्चतम समग्र संभावना खोजने के लिए उन्हें विकसित करने के लिए जा रहे हैं, की स्थापना num_beams1 के लिए लालची खोज के समान है।

नीचे बीम खोज का उपयोग करके चैटबॉट के साथ एक नमूना चर्चा है:

>> You:How can I be rich?
DialoGPT: You can't.
>> You:Why not?
DialoGPT: Because you can't.
>> You:Then how am I supposed to live well?
DialoGPT: You can't.
>> You:So basically, you want me to be a poor guy?
DialoGPT: No, I want you to be a rich guy.
>> You:Ok how to be rich, then?
DialoGPT: I don't know.

नमूनाकरण के साथ प्रतिक्रिया उत्पन्न करना

पिछले अनुभागों में, हमने उच्चतम संभाव्यता अनुक्रम उत्पन्न करने के लिए बीम और लालची खोज का उपयोग किया था। अब यह मशीनी अनुवाद या टेक्स्ट सारांश जैसे कार्यों के लिए बहुत अच्छा है जहां आउटपुट अनुमानित है। हालाँकि, चैटबॉट्स की तरह ओपन-एंडेड पीढ़ी के लिए यह सबसे अच्छा विकल्प नहीं है।

एक बेहतर पीढ़ी के लिए, हमें कुछ यादृच्छिकता पेश करने की आवश्यकता है जहां हम संभावनाओं के आधार पर उम्मीदवार अनुक्रमों की एक विस्तृत श्रृंखला से नमूना लेते हैं:

# chatting 5 times with sampling
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=0,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

इस बार, हमने नमूनाकरण के लिए सेट do_sampleकिया Trueहै, और हम यह इंगित करने के top_kलिए सेट हैं 0कि हम सभी संभावित संभावनाओं का चयन कर रहे हैं, हम बाद में top_kपैरामीटर पर चर्चा करेंगे ।

यहाँ इन मापदंडों के साथ बातचीत है:

>> You:Hello, how can I be rich?
DialoGPT: You can't
>> You:Really? then how others get rich?
DialoGPT: They work as money artists
>> You:How to be a money artist?
DialoGPT: I don't know, I don't have enough knowledge on artists
>> You:Alright, thank you!
DialoGPT: No problem, if you need tips pm me.
>> You:Great, you're my savior!
DialoGPT: Thank you for helping a stranger out like me!

स्पष्ट रूप से कुछ सुधार हैं। हालांकि, कम संभावनाओं वाले अनुक्रमों की एक विस्तृत सूची पर नमूना लेने से यादृच्छिक पीढ़ी हो सकती है (जैसा कि आप अंतिम वाक्य में देखते हैं)।

इसे और बेहतर बनाने के लिए, हम यह कर सकते हैं:

  • नमूनाकरण कम करें temperature, जिससे हमें कम संभावना वाले शब्दों को चुनने की संभावना कम करने में मदद मिलती है और उच्च संभावना वाले शब्दों को चुनने की संभावना बढ़ जाती है।
  • सभी संभावित घटनाओं को चुनने के बजाय टॉप-के नमूने का उपयोग करें, इससे हमें कम संभावना वाले शब्दों को चुनने से रोकने में मदद मिलेगी।
# chatting 5 times with Top K sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=100,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

अब, हम संभाव्यता द्वारा अवरोही क्रम में शीर्ष शब्दों से नमूना लेने के लिए सेट top_kकरते हैं । हम उच्च संभावना वाले शब्दों को चुनने का एक उच्च मौका देने के लिए (डिफ़ॉल्ट है ) पर भी सेट करते हैं , तापमान को लालची खोज के समान ही सेट करते हैं, इसे अनंत पर सेट करना पूरी तरह से यादृच्छिक के समान है।100100temperature0.751.00.0

यहाँ इन मापदंडों के साथ एक चर्चा है:

>> You:How can I get rich?
DialoGPT: I'd also like to know this.
>> You:So you don't know it?
DialoGPT: I know it, but it doesn't matter.
>> You:Really? money matters, man!
DialoGPT: Well, money is money and I gotta get my money.
>> You:Yeah how to get them?
DialoGPT: Well, that's a question. I'd like to know more, just for curiosity's sake, to know what other people have said about how I should get rich.
>> You:Some robots like you told me you have to buy gold, silver or stocks.
DialoGPT: I think my wallet is empty.

जैसा कि आप देख सकते हैं, यह अब बहुत बेहतर है, बेझिझक ट्विक करें temperatureऔर top_kपैरामीटर देखें और देखें कि क्या यह इसमें सुधार कर सकता है।

न्यूक्लियस सैंपलिंग

न्यूक्लियस सैंपलिंग या टॉप-पी सैंपलिंग उन सबसे छोटे संभव शब्दों में से चुनता है जिनकी संचयी संभावना pहमारे द्वारा निर्धारित पैरामीटर से अधिक होती है ।

टॉप-पी सैंपलिंग का उपयोग करते हुए एक उदाहरण नीचे दिया गया है:

# chatting 5 times with nucleus sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=0,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")

हमने टॉप-के सैंपलिंग को अक्षम top_kकरने के 0लिए सेट किया है, लेकिन आप दोनों विधियों का उपयोग कर सकते हैं जो बेहतर काम करती हैं। यहाँ एक चैट है:

>> You:How can I be rich fast?
DialoGPT: Inflation and free money
>> You:lol, but that takes time, I need fast money.
DialoGPT: Your post makes no sense. You just need a lot of money to be rich fast
>> You:Ok how to get a lot of money?
DialoGPT: Buy gold and then trade it for money.
>> You:Hmm, I got your point, thanks.
DialoGPT: No problem. You have to have money to buy gold, but its only good for trading it.
>> You:Yeah, I understand, thank you so much.
DialoGPT: Ok, let me know if you need help

अब चैटबॉट कई मामलों में स्पष्ट रूप से समझ में आता है।

अब एक से अधिक चैटबॉट प्रतिक्रिया उत्पन्न करने के लिए कुछ कोड जोड़ते हैं, और फिर हम चुनते हैं कि अगले इनपुट में किस प्रतिक्रिया को शामिल करना है:

# chatting 5 times with nucleus & top-k sampling & tweaking temperature & multiple
# sentences
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids_list = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.75,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id
    )
    #print the outputs
    for i in range(len(chat_history_ids_list)):
      output = tokenizer.decode(chat_history_ids_list[i][bot_input_ids.shape[-1]:], skip_special_tokens=True)
      print(f"DialoGPT {i}: {output}")
    choice_index = int(input("Choose the response you want for the next input: "))
    chat_history_ids = torch.unsqueeze(chat_history_ids_list[choice_index], dim=0)

मैंने एक बार में 5 वाक्यों को वापस num_return_sequencesकरने के 5लिए निर्धारित किया है, हमें एक को चुनना होगा जिसे अगले अनुक्रम में शामिल किया जाएगा। यहां बताया गया है कि यह कैसे चला गया:

>> You:Hello, friend.
DialoGPT 0: Hey there!
DialoGPT 1: I am your friend, and I am also your friend.
DialoGPT 2: Hello. :
DialoGPT 3: Hiya! How are you?
DialoGPT 4: Hello. I like your username.
Choose the response you want for the next input: 3
>> You:Fine, thanks, what about you?
DialoGPT 0: Good. Not much to do.
DialoGPT 1: Not much, you?
DialoGPT 2: Fine, thanks.
DialoGPT 3: I'm okay, thank you for asking.
DialoGPT 4: I'm fine too.
Choose the response you want for the next input: 2
>> You:Can you tell me how can I get rich quickly?
DialoGPT 0: Yes, I can!
DialoGPT 1: Yes, just look up how to get rich on Google.
DialoGPT 2: Just keep doing your thing. You'll get rich.
DialoGPT 3: No, I can't.
DialoGPT 4: no, i have no money
Choose the response you want for the next input: 1
>> You:Yeah, but I thought you have an answer...
DialoGPT 0: I'm getting married in the morning.
DialoGPT 1: How do you get rich?
DialoGPT 2: I think there are many good answers, but I could send you one, if you want.
DialoGPT 3: I will, thanks!
DialoGPT 4: I thought you have an answer.
Choose the response you want for the next input: 2
>> You:Yes, please send me.
DialoGPT 0: You can send me link.
DialoGPT 1: I am very interested
DialoGPT 2: I sent you a PM
DialoGPT 3: I'll send you a PM
DialoGPT 4: I am always interested in new ideas.
Choose the response you want for the next input: 2

निष्कर्ष

और आप वहां जाएं, मुझे आशा है कि इस ट्यूटोरियल ने आपको DialoGPT और इसी तरह के मॉडल पर टेक्स्ट जेनरेट करने में मदद की। टेक्स्ट जेनरेट करने के तरीके के बारे में अधिक जानकारी के लिए, मैं आपको ट्रांसफॉर्मर्स गाइड के साथ टेक्स्ट जेनरेट करने का तरीका पढ़ने की अत्यधिक सलाह देता हूं ।

यह देखने के लिए कि क्या आप बॉट को बेहतर प्रदर्शन कर सकते हैं, मैं आपको मापदंडों को बदलना छोड़ दूँगा।

साथ ही, आप इसे टेक्स्ट-टू-स्पीच और स्पीच-टू-टेक्स्ट ट्यूटोरियल्स के साथ जोड़कर एक वर्चुअल असिस्टेंट जैसे एलेक्सा , सिरी , कोरटाना आदि बना सकते हैं।

#python #chatbot #ai