User Agent Sniffing for Multi-Channel Apps

Several months ago, voice platform company Voxeo announced an exciting addition to its industry-leading VoiceXML platform: the ability to repurpose existing VoiceXML applications as instant messaging (IM) and text messaging (SMS) applications.

And while I’ve had several opportunities to expound on the importance of this announcement, I haven’t yet had time to walk through a practical example that shows just how powerful Voxeo’s new functionality can be.

How big a deal is this really?

To be honest, when I first read about Voxeo’s new offering, I didn’t think it was a big deal. I thought it was a freaking huge deal – I’m talking Godzilla, baby. This is big!

Lots of people are wary of proprietary extensions to open standards like VoiceXML because they can make your code less portable (if you find you need to move from one vendor to another). But unlike other platform-specific extensions to VoiceXML, Voxeo’s new feature doesn’t make you write VoiceXML that violates the standard; instead, it changes the way the platform consumes VoiceXML.

The example that I will demonstrate below is written in 100% standard-compliant VoiceXML. It can be run on any VoiceXML platform that adheres to the VoiceXML spec, and probably a bunch that don’t.

That’s why this announcement from Voxeo is a game changer for developers: it provides powerful new functionality without forcing you to alter your code in ways that make it less portable. It also lets developers build sophisticated applications that can be delivered to users via IM and SMS using all the familiar tools of VoiceXML.

All of the benefits of working with VoiceXML are available to developers who want to write SMS and IM apps: grammars, the FIA, the standard root-leaf structure of VoiceXML applications. It’s all there.

The nose knows

While it might be tempting to think that any existing VoiceXML application can be repurposed with no changes for IM and/or SMS, this is probably not true except for the most simplistic of apps. Just as there are certain types of interactions that may make sense for use with visual web applications and not with voice apps, there are certain types of dialog flows that work well in voice apps but don’t necessarily translate well to the world of IM and SMS.

Consider the following:

  • Phone applications often employ a confirmation dialog to prompt a caller to verify what they have just entered, particularly if the information is important or sensitive (e.g., a credit card number or account number). This is probably not a good practice with SMS or IM apps because, unlike with a phone application, a user can see what they are about to submit before they actually send it. Also, since multiple text messages can mean increased charges for a user, it’s a good idea to cut down on them where possible.
  • DTMF-based phone applications often use prompts like “Press 1 to repeat, press 2 to go to the previous menu…” The notion of “pressing” something is really tied to the telephone keypad and is probably not appropriate for SMS or IM applications. It’s much clearer to use prompts like “Enter 1 to go to the next option” or “Send #back to go to the previous step”.
  • Re-prompting a user on noinput is a pretty standard practice in phone applications, but it is probably not a good idea in IM or SMS apps. If a person is using their cell phone to interact with an SMS application and then gets a call, they probably don’t want your app repeatedly prompting them for input they won’t be able to enter until the call is over.
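To make the three points above concrete, here is a minimal PHP sketch of a per-channel dialog policy. The function name dialogPolicy(), its array keys, and the 'voice' channel value are hypothetical; any other channel value is simply treated as SMS or IM.

```php
<?php
// Hypothetical per-channel dialog policy reflecting the three points above.
// 'voice' means a telephone caller; any other value is treated as SMS or IM.
function dialogPolicy(string $channel): array
{
    $isVoice = ($channel === 'voice');

    return [
        // Confirmation helps callers, but costs text users an extra message.
        'confirm'  => $isVoice,
        // "Press" only makes sense when there is a keypad to press.
        'menu'     => $isVoice
            ? 'Press 1 to repeat, press 2 to go to the previous menu.'
            : 'Enter 1 to repeat, or send #back to go to the previous step.',
        // Re-prompting on noinput suits a live call, not a text session.
        'reprompt' => $isVoice,
    ];
}

// Example: look up the menu wording for an SMS session.
$policy = dialogPolicy('sms');
echo $policy['menu'], "\n";
```

Keeping these decisions in one place means the dialog markup only has to ask the policy, rather than repeating channel checks throughout the page.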

So if all of this is true, then how does a developer determine how an application is being accessed? It seems pretty clear that there needs to be a way to determine if an application is being called from a phone, from an SMS cell phone or from an IM client.

Turns out, there is: good old-fashioned browser sniffing.

A simple example

The Voxeo blog has a good overview of this new feature, and how you can set up and run a VoiceXML application that is also IM and SMS enabled.

One of the things that is really nice about this new feature of the Voxeo Prophecy platform is that the text sent to a user via SMS or IM, and all of the inputs sent back from the user, follow the same logic rules as a phone interaction. This means that if there are VoiceXML elements with conditional attributes on them, those conditions get evaluated when rendering text for SMS and IM, just as they do for the telephone.

Another really nice feature is that Voxeo lets you deploy a single number for SMS and traditional phone access to your application. So if you dial a provisioned number on your telephone, the traditional VoiceXML browser engages and executes your code in the typical fashion. If you send an SMS message to the number, the new Prophecy 10 browser engages your code and manages the SMS interaction.

Because of this, it’s fairly straightforward to detect which browser is accessing your code and create a simple variable declaration that will govern how your output is rendered.

The following is a simple PHP class that can be used to sniff the browser type requesting a specific file.

To use this class, we simply include it in our PHP page that will render VoiceXML, determine what kind of user agent is requesting the page by calling getChannelType() and set a VoiceXML variable accordingly.

If an SMS or IM client is interacting with our application, the Prophecy 10 browser will make the request. If it’s a standard telephone, it will be the Prophecy 8 browser, so we really just need to use the value of $_SERVER['HTTP_USER_AGENT'] to guide our app.
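What follows is a minimal sketch of how such a class might look; it is not the original listing. It assumes the Prophecy 10 browser includes the string "Prophecy 10" in its User-Agent header and treats every other agent as the Prophecy 8 voice browser, so check the User-Agent values your platform actually sends before relying on that test. The class name ChannelSniffer and the 'text'/'voice' return values are hypothetical; only getChannelType() and $_SERVER['HTTP_USER_AGENT'] come from the text.

```php
<?php
// Minimal sketch of a user agent sniffer. ASSUMPTION: the Prophecy 10
// (SMS/IM) browser includes "Prophecy 10" in its User-Agent header; any
// other agent is treated as the Prophecy 8 voice browser.
class ChannelSniffer
{
    private string $userAgent;

    public function __construct(?string $userAgent = null)
    {
        // Default to the header PHP exposes for the current request.
        $this->userAgent = $userAgent ?? ($_SERVER['HTTP_USER_AGENT'] ?? '');
    }

    // Returns 'text' for SMS/IM sessions and 'voice' for telephone calls.
    public function getChannelType(): string
    {
        return stripos($this->userAgent, 'Prophecy 10') !== false ? 'text' : 'voice';
    }
}

// Usage in a page that renders VoiceXML: expose the channel as a
// VoiceXML variable so that cond attributes can test it.
$sniffer = new ChannelSniffer();
$channel = $sniffer->getChannelType();
echo '<var name="channel" expr="\'' . $channel . '\'"/>' . "\n";
```

Because getChannelType() only ever returns one of two fixed strings, it is safe to drop its value straight into the generated expr attribute.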

You’ll notice that if an SMS or IM client is accessing our app, we skip the confirmation field. We’ve also customized the reprompt logic on noinput to ensure that a user does not get successive SMS or IM messages telling them “Sorry, I did not get your response.”
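One way to get this behavior, sketched here under stated assumptions, is to write the channel into a VoiceXML variable and let standard cond attributes do the work, since Prophecy evaluates conditional attributes for SMS and IM just as it does for the phone. A field whose cond is false is skipped by the FIA, and a noinput handler that queues no prompt and does not execute reprompt sends nothing before the field waits again. The function name, field names, prompts, and the 'voice'/'text' values are all hypothetical.

```php
<?php
// Sketch only: element names, prompts, and the 'voice'/'text' channel
// values are hypothetical. The channel is written into a VoiceXML
// variable, and standard cond attributes decide what each user gets.
function renderAccountForm(string $channel): string
{
    // Only allow known values into the generated ECMAScript expression.
    $channel = ($channel === 'voice') ? 'voice' : 'text';

    return <<<VXML
<var name="channel" expr="'$channel'"/>
<form id="getAccount">
  <field name="account" type="digits">
    <prompt>Please enter your account number.</prompt>
    <!-- Callers hear an apology followed by the prompt again. -->
    <noinput cond="channel == 'voice'">
      Sorry, I did not get your response. <reprompt/>
    </noinput>
    <!-- Text users get no extra message; the field simply waits again. -->
    <noinput cond="channel != 'voice'"/>
  </field>
  <!-- The FIA skips this field entirely when its cond is false. -->
  <field name="confirm" type="boolean" cond="channel == 'voice'">
    <prompt>You entered <value expr="account"/>. Is that correct?</prompt>
    <filled>
      <if cond="!confirm"><clear namelist="account confirm"/></if>
    </filled>
  </field>
</form>
VXML;
}
```

The returned markup is a fragment meant to sit inside a vxml document; the page would typically call renderAccountForm() with the channel it detected for the current request.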

So with a few lines of server-side code, we’ve tailored our VoiceXML dialog so that it renders properly regardless of the type of user agent accessing it.

Voxeo Prophecy’s new features are powerful, and VoiceXML developers should take notice. With Prophecy, they can leverage their skills and become crackerjack SMS and IM app developers as well.


First Draft of VoiceXML 3.0 Released

The W3C has released the First Public Working Draft of Voice Extensible Markup Language (VoiceXML) 3.0. This is the next version of the VoiceXML language that reportedly will include a host of new features, including speaker authentication.

Although this is an early draft, according to the document:

By the middle of 2009 the group expects to have all existing functionality defined in detail, the new functionality stubbed out, and the VoiceXML 2.1 profile largely defined. By late-2009 the group expects to have all functionality defined and both profiles defined in detail.

I’ve got to find some time to go through this document — in addition to being a very interesting read, it might be kind of neat to provide input that might get incorporated into the standard by the W3C.

Guess I know what I’ll be doing for New Year’s Eve. 😉

Accessing Web Services From VoiceXML

A few weeks ago, I posted about accessing web services from CCXML using PHP. This post will demonstrate how to do the same thing, only from VoiceXML. We’ll be using Voxeo Prophecy and PHP for this example. We’ll also be referring to the GreenPhone project — available free for download — for the sample code.

Before we dive in, it’s important to keep in mind that there are a number of different techniques for getting information from web services into a VoiceXML dialog. This is just one method; there are many others. Voxeo even has its own platform-specific way of accessing SOAP web services via JavaScript. Ultimately, the method you employ needs to be a good fit for the environment you’re working in and the requirements of your project.

Using the greenSoapClient Class

In the last post on this topic, I demonstrated how to use a simple PHP class as a way to access multiple SOAP-based web services from CCXML. This class forms the basis of our method for accessing web services from VoiceXML as well. However, in this instance, instead of using the CCXML <send/> element, we’ll use a VoiceXML subdialog.

Subdialogs in VoiceXML are typically used to create reusable dialog components for capturing common types of input, like a series of digits (e.g., credit card numbers, account numbers, etc.). They can also be used to compartmentalize complex interactions with a caller and provide a simple interface for accessing results. By way of example, this is how the OSDMs from Nuance work, as well as the Targus service from Voxeo. We’ll borrow this approach to access a web service from StrikeIron that will send the details of an E85 or bio-diesel station to a cell phone via SMS.

Setting up our Subdialog

In order to send an SMS message with details on an E85 or bio-diesel station, we’ll need two things: the station details, and a cell phone number to send them to.

In order to send the details on a station from VoiceXML to PHP, we’ll pack it up in a pipe-delimited string called “detailsToSend” (I won’t go into too much detail about how this is done in this post — to learn more, refer to the GreenPhone Project code). The cell phone number we are sending to is obtained from the caller ID of the calling party, stored in a variable named “ani”. Details on how to access caller ID are given in a previous post.
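On the PHP side, the two namelist variables arrive as ordinary request parameters. The sketch below assumes they are POSTed under their VoiceXML names and that detailsToSend holds three pipe-delimited fields in the order name, address, fuel type. That field list and the helper names unpackStationDetails() and normalizeAni() are hypothetical, so match them to whatever your dialog actually packs.

```php
<?php
// Sketch of the receiving end of the subdialog call.
// ASSUMPTIONS: ani and detailsToSend arrive as POST fields, and
// detailsToSend holds "name|address|fuel" (a hypothetical field order).
function unpackStationDetails(string $packed): array
{
    $keys  = ['name', 'address', 'fuel'];
    $parts = explode('|', $packed);
    // Pad or trim to exactly one value per key so array_combine() succeeds.
    $parts = array_slice(array_pad($parts, count($keys), ''), 0, count($keys));

    return array_combine($keys, array_map('trim', $parts));
}

// Keep only the digits of the caller ID before using it as an SMS target.
function normalizeAni(string $ani): string
{
    return preg_replace('/\D+/', '', $ani);
}

$ani     = normalizeAni($_POST['ani'] ?? '');
$details = unpackStationDetails($_POST['detailsToSend'] ?? '');
```

Padding the exploded array means a short or empty string still yields a predictable set of keys instead of an error.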

Our subdialog call will look like this:

<form id="sendDetails">
  <catch event="error.badfetch">
    <prompt>
      There was a problem sending the station details to your phone.
      <break strength="weak"/>
    </prompt>
    <goto next="#goodbye"/>
  </catch>
  <subdialog name="sendSMS" src="../php/sendStationDetails.php" method="post" namelist="ani detailsToSend">
    <prompt>
      Sending the station details to
      <say-as interpret-as="telephone"><value expr="ani"/></say-as>
    </prompt>
    <filled>
      <if cond="sendSMS.result==0">
        <prompt>Your message has been sent.<break strength="weak"/></prompt>
      <else/>
        <prompt>
          There was a problem sending the station details to your phone.
          <break strength="weak"/>
        </prompt>
      </if>
      <goto next="#goodbye"/>
    </filled>
  </subdialog>
</form>

We use the attributes on the <subdialog> element to give our subdialog a name (which we’ll use to access the results sent back from PHP), to specify where to POST our variables to and also to specify which variables to POST.

You’ll also notice that we have set up a handler here for an “error.badfetch” event. This is a good habit to get into whenever you set up a request to an external resource (like a PHP script). If the script isn’t there or has problems, an “error.badfetch” event will get returned and unless you specified a handler for this event, your day will not end well.

Additionally, we’ve set up logic in our filled block to inspect the result of the subdialog call. We access the result as a property of the subdialog, using the name we set up in the <subdialog> element and the dot notation (“.”) familiar to JavaScript.

<if cond="sendSMS.result==0">
  <!-- code logic goes here -->
</if>

With this in mind, our PHP script needs to send back a variable called “result”. How do we do this? Let’s take a look at the PHP script:

A Simple Subdialog using PHP

The subdialog that we want to render is extremely simple — we only need to render enough VoiceXML to declare a variable called “result” and return it to the parent dialog. We’ll do this after we make our web service call to send the SMS message.

There are two pieces of information returned from the StrikeIron web service that we are interested in: a string that holds the response message from the service (i.e., “success”, “failure”, etc.) and a number indicating the outcome of the web service call.

We’ll take these two bits of information and assign them to PHP variables:

$result = $xml->soapHeader->ResponseInfo->ResponseCode;
$message = $xml->soapHeader->ResponseInfo->Response;
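Property access like this returns SimpleXMLElement objects rather than plain strings and integers. Echoing them works, but comparisons can surprise you, so it is safer to cast. The sketch below assumes $xml came from simplexml_load_string(); the namespace-free sample document is invented purely for illustration and only reuses the element names from the property path above. A real SOAP response carries namespaces, which this sketch ignores.

```php
<?php
// Illustrative, namespace-free stand-in for the parsed web service response.
// Only the element names come from the property path in the text.
$sample = <<<XML
<response>
  <soapHeader>
    <ResponseInfo>
      <ResponseCode>0</ResponseCode>
      <Response>Success</Response>
    </ResponseInfo>
  </soapHeader>
</response>
XML;

$xml = simplexml_load_string($sample);

// Property access yields SimpleXMLElement objects; cast them to scalars
// so later comparisons and output behave predictably.
$result  = (int) $xml->soapHeader->ResponseInfo->ResponseCode;
$message = (string) $xml->soapHeader->ResponseInfo->Response;
```

With the casts in place, $result can be compared with === and written into an expr attribute without any object-to-string conversion happening behind the scenes.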

Now, we want to write out these variables in a simple VoiceXML subdialog:

<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
 <form id="F_1">
  <var name="result" expr="<?php echo (int) $result; ?>"/>
  <block>
   <log>*** SMS response message was: <?php echo htmlspecialchars($message); ?>. ***</log>
   <return namelist="result"/>
  </block>
 </form>
</vxml>

As discussed above, this creates just enough VoiceXML to instantiate a variable and return it to the parent dialog. For good measure, we’ll write out the web service string (contained in the PHP variable $message) as a log statement, in case it contains information we want to look at later.
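If you would rather not mix PHP tags into the markup, the same subdialog can be built as a string, which also makes it easy to escape the message before it lands in the log element. The function name renderSubdialogResult() is hypothetical. In this sketch the log and return elements sit inside a block, since both are executable content in VoiceXML.

```php
<?php
// Sketch: build the subdialog document as a string so the values can be
// escaped. The function name is hypothetical; the structure is a form-level
// var plus a block that logs the message and returns "result".
function renderSubdialogResult(int $result, string $message): string
{
    $safeMessage = htmlspecialchars($message, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return <<<VXML
<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="F_1">
    <var name="result" expr="$result"/>
    <block>
      <log>*** SMS response message was: $safeMessage. ***</log>
      <return namelist="result"/>
    </block>
  </form>
</vxml>
VXML;
}
```

In the subdialog script you would then send a suitable Content-Type header (application/voicexml+xml is the registered media type) and echo renderSubdialogResult($result, $message). The int type on $result guarantees the expr attribute always holds a valid ECMAScript number.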

Why This Approach?

Using this technique for accessing web services from VoiceXML provides a couple of advantages. First, it allows us to completely separate the presentation layer (the VoiceXML) from the logic used to invoke the web service. This is a fairly standard design practice that makes creating the dialog much easier for a developer that does not necessarily know a whole lot about web services. With this approach, they don’t really need to — they only need to know that the subdialog call will return a variable called “result” whose value can be inspected to determine what to do next.

Additionally, because the parent dialog is just static VoiceXML, it may be possible to cache it for fast access, while the subdialog, which must be dynamic, is the only component sent from the web server to the VoiceXML platform each time a caller accesses the application. Careful design can yield additional caching opportunities that make your applications more efficient and less bandwidth intensive.

In the next post, we’ll explore one additional method for accessing web services from VoiceXML. Stay tuned…