Building a “Smarter” Voice Feed
One of the many nice things about RSS feeds is that they are updated regularly – this means that your voice application will provide a steady stream of updated headlines or events. Unfortunately, it means that you have less control over how the content is presented. Its always a good practice in developing VoiceXML applications to prerecord prompt and messages – this helps avoid user confusion when a text-to-speech engine has difficulty pronouncing a word. But because RSS feeds are so dynamic, prerecording prompts is not possible.
This wouldn’t be such a problem if RSS didn’t have its roots in news delivery. Because RSS feeds are typically used to provide up to the minute headlines or news stories, the content of these feeds may not lend themselves to aural presentation. Headlines are meant to be eye-catching, not ear-catching. This is particularly true of technology feeds, which can be littered with
acronyms, abbreviations and other strange anomalies.
For example, consider the following example headline: “IBM introduces new UNIX OS to support customer IT efforts.” Depending on which TTS engine is being used, your results on rendering this written sentence as spoken text will vary. Some TTS engines will assume that words composed of all upper case letters are meant to be sounded out by individual letters (for example, IBM would be rendered in speech as eye-bee-em). However, the above sentence shows the limitations of such an approach. The word UNIX is typically written in upper case but is pronounced “yoo-niks.” Fortunately, with XSLT it is possible to create “smarter” voice feeds.
Using XSLT templates, it is possible to identify troublesome words or text strings and replace them with other, more easily pronounceable words or phrases. These replacements can be the full spelling of abbreviated words, or the phonetic spelling of acronyms and other pesky items. I’ll skip the details in this tutorial because there is an excellent overview of this technique available on XML.com. My example of a “search and replace” XSLT document to improve the pronunciation quality of the of the CNET news RSS feed we have been working with can be found here.
Once you have your search and replace XSLT file, you have some decisions to make. You must decide what words you want to replace as your XSLT file converts the source RSS data into VoiceXML. You must also decide what you want to replace these words with. For acronyms or abbreviated words, I recommend replacement using the entire word (for example, replace OS with Operating System). This will probably be the cleanest and most efficient way of ensuring your users hear the headlines correctly.
VoiceXML provides a way to phonetically pronounce words. The <phoneme> element provides guidance to the TTS engine to instruct it on how to pronounce words properly. Note – this element is actually part of the Speech Synthesis Markup Language (SSML) specification, as is the <voice> element discussed previously, but it is available for use in VoiceXML. (There is some excellent information on this available from the VoiceGenie web site.)
The SSML specification also provides another option for dealing with words that are sometimes troublesome for TTS engines to pronounce. The <say-as> element allows developers to specify the kind of data to be rendered as speech, to help the TTS engine pronounce it properly. This element has two different attributes (“format” and “interpret-as”) to provide the appropriate context for the text to be rendered as speech to the
For example, to help a TTS engine properly render an e-mail address, the following code could be used:
<prompt> This is a test to show proper rendering of an email address <break size=“100”/> <say-as interpret-as=“net” format=“email”>firstname.lastname@example.org</say-as>
This is nice, but the options provided for the <say-as> element are somewhat limited. As a result, the implementation of this specification can very greatly between different VoiceXML hosting providers. If you decide to take the <say-as> approach to deal with unique pronunciations, you should read the implementation materials for your provider carefully. Information from the major VoiceXML hosting providers can be reviewed at the following links:
- BeVocal Hosting Platform (custom implementation)
- TellMe Hosting Platform (custom
- Voxeo Hosting Platform (appears to conform to SSML specification)
This tutorial provides a basic overview of how to convert RSS feeds into dynamic VoiceXML. It demonstrates the power of XSLT to convert content in one XML format to VoiceXML. XSLT is a powerful enabler of VoiceXML documents and applications, as subsequent tutorials will demonstrate.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.