This article is for users who are already familiar with Amazon Sumerian Hosts and want to reduce their AWS bill (specifically for Amazon Polly) while maintaining an interactive and immersive experience.

Introduction

Amazon Sumerian Hosts is an open-source project published by AWS on GitHub that lets developers integrate 3D virtual characters into their web-based (ThreeJS or BabylonJS) interactive experiences. The virtual characters are referred to as Amazon Sumerian Hosts.

Technology

Amazon Sumerian Hosts are mostly powered by Amazon Polly (Amazon’s text-to-speech service). With Amazon Polly, the Hosts can speak 29 different languages (at the time of writing), and SSML gives the developer fine-grained control over exactly how a sentence is spoken. To learn more about Amazon Sumerian Hosts, please visit the GitHub repository.
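To illustrate the kind of control SSML offers, here is a minimal sketch that wraps a sentence in standard SSML tags (`<speak>`, `<prosody>`, `<break>`). The helper names `escapeForSsml` and `toSsml` are hypothetical and are not part of the Hosts repository; they only show how a rate and a pause could be expressed.

```typescript
// Escape characters that are reserved in XML so plain text is safe inside SSML.
function escapeForSsml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps a sentence in SSML with a speaking rate
// and an optional trailing pause in milliseconds.
function toSsml(
  text: string,
  rate: "slow" | "medium" | "fast" = "medium",
  pauseMs: number = 0
): string {
  const pause = pauseMs > 0 ? `<break time="${pauseMs}ms"/>` : "";
  return `<speak><prosody rate="${rate}">${escapeForSsml(text)}</prosody>${pause}</speak>`;
}

const ssml = toSsml("Welcome back & enjoy the tour.", "slow", 500);
console.log(ssml);
```

The resulting string would be sent to Amazon Polly as SSML rather than plain text.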

Most interactive web applications are client-based, meaning that the browser renders the experience. As a result, the only cost associated with Amazon Sumerian Hosts, besides the cost of hosting and delivery, is the API calls made to Amazon Polly.

Amazon Polly is a serverless, pay-as-you-go service, meaning that you are billed monthly for the number of characters of text that you process.

Amazon Polly’s Standard voices are priced at $4.00 per 1 million characters for speech or Speech Marks requests (when outside the free tier). Amazon Polly’s Neural voices are priced at $16.00 per 1 million characters for speech or Speech Marks requests (when outside the free tier). These prices are taken from Amazon Polly’s Pricing page.
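As a rough worked example of these rates, the sketch below converts a number of billed characters into a dollar amount. The function name `estimatePollyCostUsd` and the 2.5 million character figure are assumptions for illustration; the prices are the ones quoted above and ignore the free tier.

```typescript
type PollyEngine = "standard" | "neural";

// Prices quoted in this article, in USD per 1 million characters.
const USD_PER_MILLION_CHARACTERS: Record<PollyEngine, number> = {
  standard: 4.0,
  neural: 16.0,
};

// Hypothetical helper: estimates the cost of a number of billed characters,
// ignoring the free tier.
function estimatePollyCostUsd(billedCharacters: number, engine: PollyEngine): number {
  return (billedCharacters / 1_000_000) * USD_PER_MILLION_CHARACTERS[engine];
}

// Illustrative volume: 2.5 million billed characters in a month.
console.log(estimatePollyCostUsd(2_500_000, "standard")); // 10
console.log(estimatePollyCostUsd(2_500_000, "neural")); // 40
```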

An Amazon Sumerian Host performs 2 API calls every time the developer requests it to speak a message.

  1. The first API call is to synthesize the text (transform the text to an audio file) and retrieve the MP3 Audio file to play.
  2. The second API call is to retrieve the Speech Marks for the text.
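The two calls listed above can be sketched as a pair of request parameter objects. The field names (`Text`, `VoiceId`, `OutputFormat`, `SpeechMarkTypes`) follow Amazon Polly's SynthesizeSpeech API; the helper names, the voice, and the choice of speech mark types are assumptions for illustration, and this is not the Hosts library's actual code. No request is sent here.

```typescript
// Subset of the Amazon Polly SynthesizeSpeech parameters used in this sketch.
interface SynthesizeSpeechParams {
  Text: string;
  VoiceId: string;
  OutputFormat: "mp3" | "json";
  SpeechMarkTypes?: Array<"sentence" | "ssml" | "viseme" | "word">;
}

// Hypothetical helper: builds the two requests needed to speak one message.
function buildSpeakRequests(
  text: string,
  voiceId: string
): [SynthesizeSpeechParams, SynthesizeSpeechParams] {
  // 1. Audio request: returns the MP3 stream to play.
  const audioRequest: SynthesizeSpeechParams = {
    Text: text,
    VoiceId: voiceId,
    OutputFormat: "mp3",
  };
  // 2. Speech Marks request: returns metadata instead of audio.
  const speechMarksRequest: SynthesizeSpeechParams = {
    Text: text,
    VoiceId: voiceId,
    OutputFormat: "json",
    SpeechMarkTypes: ["sentence", "word", "viseme"], // assumed selection
  };
  return [audioRequest, speechMarksRequest];
}

// Both requests carry the same text, so the characters are counted twice.
// String length is only an approximation of Polly's billed character count.
function billedCharactersPerSpeak(text: string, voiceId: string = "Joanna"): number {
  return buildSpeakRequests(text, voiceId).reduce((sum, req) => sum + req.Text.length, 0);
}

console.log(billedCharactersPerSpeak("Hello there")); // 22
```

Because both speech and Speech Marks requests are billed per character, each spoken message costs roughly twice its character count.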

Speech marks are metadata that describe the speech that you synthesize, such as where a sentence or word starts and ends in the audio stream. When you request speech marks for your text, Amazon Polly returns this metadata instead of synthesized speech. By using speech marks in conjunction with the synthesized speech audio stream, you can provide your applications with an enhanced visual experience. Referenced from Amazon Polly’s Documentation.
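Speech marks are returned as a stream of JSON objects, one per line. The sketch below parses such a response and picks out the viseme entries, which are the ones typically used for lip sync. The `SpeechMark` interface and `parseSpeechMarks` helper are hypothetical names, and the sample lines are hand-written for illustration rather than real Amazon Polly output.

```typescript
// Shape of one speech mark, following the fields described in Polly's documentation.
interface SpeechMark {
  time: number; // offset in milliseconds from the start of the audio
  type: "sentence" | "word" | "viseme" | "ssml";
  value: string;
  start?: number; // offsets into the input text (sentence and word marks)
  end?: number;
}

// Hypothetical helper: parses a line-delimited JSON speech marks response.
function parseSpeechMarks(body: string): SpeechMark[] {
  return body
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SpeechMark);
}

// Hand-written sample, not real service output.
const sampleResponse = [
  '{"time":0,"type":"word","start":0,"end":5,"value":"Hello"}',
  '{"time":0,"type":"viseme","value":"k"}',
  '{"time":120,"type":"viseme","value":"sil"}',
].join("\n");

const marks = parseSpeechMarks(sampleResponse);
const visemes = marks.filter((mark) => mark.type === "viseme");
console.log(visemes.map((mark) => `${mark.time}ms:${mark.value}`));
```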


Amazon Sumerian Hosts: How to reduce your Amazon Polly cost