The Conversational Speaker, a.k.a. "Friend Bot", uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back.
This project is written in .NET 6 which supports Linux/Raspbian, macOS, and Windows.
Read time: 15 minutes
Build time: 30 minutes
Cost:
Setup
You will need an instance of Azure Cognitive Services and an OpenAI account. You can run the software on nearly any platform, but let's start with a Raspberry Pi.
You may use an instance of Azure OpenAI in place of OpenAI.
If you are new to Raspberry Pis, check out this getting started guide.
In the Raspberry Pi Imager:
1. Click Choose OS and select the default Raspberry Pi OS (32-bit).
2. Click Choose Storage and select the SD card.
3. Click Write and wait for the imaging to complete.

The conversational speaker uses Azure Cognitive Services for speech-to-text and text-to-speech. Below are the steps to create an Azure account and an instance of Azure Cognitive Services.
1. Go to Try Azure for Free and click Start Free to start creating a free Azure account.

NOTE: Even though this is a free account, Azure still requires credit card information. You will not be charged unless you change settings later.

2. Once you are signed in to the Azure portal, search for Cognitive Services. Under Marketplace, select Cognitive Services. (It may take a few seconds to populate.)
3. Under Resource Group, select Create New and enter a resource group name (e.g., conv-speak-rg).
4. Select a region and enter an instance name (e.g., my-conv-speak-cog-001).

NOTE: EastUS, WestEurope, or SoutheastAsia are recommended, as those regions tend to support the greatest number of features.

5. Click Review + Create. After validation passes, click Create.
6. Once deployment has completed, click Go to resource to view your Azure Cognitive Services resource.
7. Under Resource Management, select Keys and Endpoint.

Windows 11 users: If the application is stalling when calling the text-to-speech API, make sure you have applied all current security updates (link).
The conversational speaker uses OpenAI's models to hold a friendly conversation. Below are the steps to create a new account and access the AI models.
1. Go to https://aka.ms/maker/openai and click Sign up.

NOTE: You can use a Google account, Microsoft account, or email to create a new account.

NOTE: If you are new to OpenAI, please review the usage guidelines (https://beta.openai.com/docs/usage-guidelines).

2. After logging in, open your account menu and click View API keys.
3. Click + Create new secret key. Copy the generated key and save it in a secure location for later.

If you are curious to play with the large language models directly, check out the playground (https://platform.openai.com/playground?mode=chat) after logging in at https://aka.ms/maker/openai.
The Code
1. Click Download .NET SDK x64 and run the installer.
2. Replace {MyCognitiveServicesKey} with your Azure Cognitive Services key and {MyOpenAIKey} with your OpenAI API key from the sections above.

There are several ways to run a program when the Raspberry Pi boots. Below is a suggested method, which runs the application in a visible terminal window automatically. This allows you to not only see the output but also cancel the application by clicking on the terminal window and pressing CTRL+C.
/etc/xdg/autostart/friendbot.desktop
The code base already has a default wake phrase ("Hey, Computer."), which I suggest you use first. If you want to create your own (free!) custom wake word, follow the steps below.
1. Download the generated .table file and copy it to src/ConversationalSpeaker/Handlers/WakePhrases.
2. Update the ConversationalSpeaker.csproj file to include your wake phrase file in the build.

To customize the application, edit ~/conversational-speaker/src/ConversationalSpeaker/configuration.json. Among the settings you can change are the prefix applied to the AI's responses (PromptEngine:OutputPrefix), the AI's voice (AzureCognitiveServices:SpeechSynthesisVoiceName), the AI's personality (PromptEngine:Description), and System:TextListener, which you can set to true to type your messages instead of speaking them (good for testing changes).

To change the AI's voice and personality, update the AzureCognitiveServices:SpeechSynthesisVoiceName and PromptEngine settings in src/ConversationalSpeaker/configuration.json.
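For reference, a configuration.json containing these settings might look roughly like the sketch below. The values shown are illustrative defaults (en-US-JennyNeural is the voice used in the SSML example later in this guide); the exact keys and defaults in the repository may differ.

```json
{
  "System": {
    "TextListener": false
  },
  "AzureCognitiveServices": {
    "SpeechSynthesisVoiceName": "en-US-JennyNeural"
  },
  "PromptEngine": {
    "OutputPrefix": "Computer:",
    "Description": "A friendly, helpful conversation partner."
  }
}
```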
.How it works
This application uses .NET's generic "HostBuilder" paradigm. The HostBuilder encapsulates handling dependencies (i.e., dependency injection), configuration, logging, and running a set of hosted services. In this example, there is only one hosted service, ConversationLoopHostedService
, which contains the primary logic loop.
// ConversationLoopHostedService.cs
while (!cancellationToken.IsCancellationRequested)
{
// Listen to the user.
string userMessage = await _listener.ListenAsync(cancellationToken);
// Run the message through the AI and get a response.
string response = await _conversationHandler.ProcessAsync(userMessage, cancellationToken);
// Speak the response.
await _speaker.SpeakAsync(response, cancellationToken);
}
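The hosted service above is registered through the standard .NET generic host API. A minimal sketch of that wiring follows; the registration details here are illustrative rather than copied from the repository, but ConversationLoopHostedService is the service named above.

```csharp
// Program.cs (sketch) — wiring a single hosted service into the generic host.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

await Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        // The single hosted service containing the conversation loop.
        services.AddHostedService<ConversationLoopHostedService>();
    })
    .Build()
    .RunAsync();
```

CreateDefaultBuilder also sets up configuration and logging, which is how configuration.json and the Logging settings reach the rest of the application.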
Azure Cognitive Services has excellent (and free!) wake word support. After generating a keyword model (see "Create a custom wake word" above), we load it into the Speech SDK and wait for the system to recognize the keyword.
// AzCognitiveServicesWakeWordListener.cs
_keywordModel = KeywordRecognitionModel.FromFile(keywordModelPath);
_audioConfig = AudioConfig.FromDefaultMicrophoneInput();
_keywordRecognizer = new KeywordRecognizer(_audioConfig);
do
{
result = await _keywordRecognizer.RecognizeOnceAsync(_keywordModel);
} while (result.Reason != ResultReason.RecognizedKeyword);
To listen to the user, the application leverages Azure Cognitive Services' speech-to-text feature. The feature supports many languages and configurations. This project's default language is English (en-US) and it uses the default system microphone.
// AzCognitiveServicesListener.cs
// Configure the connection to Azure.
SpeechConfig speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);
speechConfig.SpeechRecognitionLanguage = _options.SpeechRecognitionLanguage;
speechConfig.SetProperty(PropertyId.SpeechServiceResponse_PostProcessingOption, "TrueText");
// Configure the local audio setup
_audioConfig = AudioConfig.FromDefaultMicrophoneInput();
_speechRecognizer = new SpeechRecognizer(speechConfig, _audioConfig);
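With the recognizer configured, capturing a single utterance is one call to RecognizeOnceAsync. The sketch below shows the shape of that call and the result handling; the method body is illustrative, not the repository's exact implementation.

```csharp
// Sketch: one listen pass, assuming a configured SpeechRecognizer (_speechRecognizer).
public async Task<string> ListenAsync(CancellationToken cancellationToken)
{
    SpeechRecognitionResult result = await _speechRecognizer.RecognizeOnceAsync();
    if (result.Reason == ResultReason.RecognizedSpeech)
    {
        // result.Text holds the transcribed user message.
        return result.Text;
    }

    // NoMatch or Canceled: return an empty message and let the loop continue.
    return string.Empty;
}
```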
And last, but not least, we head back to Azure Cognitive Services for its text-to-speech feature to give a voice to our AI. Since we are parsing out a style cue from OpenAI, we'll need to use the text-to-speech's Speech Synthesis Markup Language (SSML) support.
// AzCognitiveServicesSpeaker.cs
SpeechConfig speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);
speechConfig.SpeechSynthesisVoiceName = _options.SpeechSynthesisVoiceName;
_speechSynthesizer = new SpeechSynthesizer(speechConfig);
message = ExtractStyle(message, out string style);
string ssml = GenerateSsml(message, style, _options.SpeechSynthesisVoiceName);
await _speechSynthesizer.SpeakSsmlAsync(ssml);
In the case of speaking "That's great to hear! ~~excited~~", the SSML sent to Azure Cognitive Services would look like this:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-JennyNeural">
<mstts:express-as style="excited">That's great to hear!</mstts:express-as>
</voice>
</speak>
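The ~~style~~ cue has to be stripped from the message before synthesis so it is not read aloud. A sketch of how ExtractStyle might do this with a regular expression (the repository's actual implementation may differ):

```csharp
// Sketch: pull a "~~style~~" cue out of the message, if present.
using System.Text.RegularExpressions;

private static readonly Regex StyleRegex = new(@"~~(\w+)~~");

private static string ExtractStyle(string message, out string style)
{
    Match match = StyleRegex.Match(message);
    style = match.Success ? match.Groups[1].Value : string.Empty;

    // Remove the cue so only the spoken text remains.
    return StyleRegex.Replace(message, string.Empty).Trim();
}
```

For the example above, this would yield the message "That's great to hear!" and the style "excited", which GenerateSsml then wraps in an mstts:express-as element.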
Troubleshooting
This can occur when the Azure Speech SDK is having trouble accessing your microphone. To get more details on the issue, enable debug logging by setting Logging:LogLevel:Default in configuration.json to Debug and running the application again.
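The logging section of configuration.json would then look like this (only the logging keys are shown; your other settings stay as they are):

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Debug"
    }
  }
}
```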
Additionally, make sure your microphone is not being used by another application and is not set to "Do not allow apps to access your microphone".
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
<ItemGroup>
<None Update="Handlers\WakePhrases\{YOUR FILE}.table">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
</ItemGroup>
reboot
[Desktop Entry]
# Type and Name are required keys for a valid desktop entry.
Type=Application
Name=FriendBot
Exec=lxterminal --command "/bin/bash -c '~/.dotnet/dotnet run --project ~/conversational-speaker/src/ConversationalSpeaker; /bin/bash'"
sudo nano /etc/xdg/autostart/friendbot.desktop
cd ~/conversational-speaker/src/ConversationalSpeaker
dotnet build
dotnet run
cd ~/conversational-speaker/src/ConversationalSpeaker
dotnet user-secrets set "AzureCognitiveServices:Key" "{MyCognitiveServicesKey}"
dotnet user-secrets set "AzureCognitiveServices:Region" "{MyCognitiveServicesRegion (e.g., EastUS)}"
dotnet user-secrets set "OpenAI:Key" "{MyOpenAIKey}"
git clone https://github.com/microsoft/conversational-speaker.git
dotnet --version
echo 'export DOTNET_ROOT=$HOME/.dotnet' >> ~/.bashrc
echo 'export PATH=$PATH:$HOME/.dotnet' >> ~/.bashrc
source ~/.bashrc
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 6.0
Download Details:
Author: Microsoft
Official GitHub: https://github.com/microsoft/conversational-speaker
License: MIT