On the demise of SALT

There has been much talk of late about Microsoft’s announcement that it will support VoiceXML in a forthcoming version of Speech Server. Many of the commentators I have read or listened to have pointed to this as

  • Good news; and,
  • Evidence that open standards (like VoiceXML) are truly the best way to develop phone-based applications.

There is ample evidence that Microsoft has no problem advancing its own standard – even under the banner of “openness” – if it sees a financial benefit in doing so. If you don’t agree, I’d refer you to the debate raging about an open document format. So, while I agree wholeheartedly with the second point, I’m not so sure I agree with the first. At least not totally.

I think the bad news in Microsoft’s announcement can be identified by remembering what its nascent SALT specification was designed to do. Speech Application Language Tags were designed to be extensions to XHTML – in other words, the specification was developed specifically to build multimodal applications. And although you can build pure telephone applications with SALT, this was not the original intent.

So, if Microsoft suddenly got religion and decided to support VoiceXML for building telephone applications, it may mean that multimodal applications aren’t going anywhere for a while. That’s bad news, in my opinion.

I’d ask that readers refute this assertion by pointing out some existing production uses of multimodal technology. If they can find any…

Changing VoiceXML Property Values

Properties are used to set values [in VoiceXML applications] that affect platform behavior, such as the recognition process, timeouts, caching policy, etc.

One of the most frequently used properties is the “inputmodes” property, which controls the types of input that a caller may use to interact with a VoiceXML application. In practice, it’s often a wise design consideration to construct handlers for when certain types of input are not working well (e.g., voice input) and direct a caller to utilize another input method (e.g., touch tone, or DTMF entry).

However, while it is fairly easy to detect the input method a caller is using, and even easier to simply tell a caller to use an alternate input method, it can sometimes be tricky to change the “inputmodes” property from within a VoiceXML application. Changing this property value becomes important when a VoiceXML application is not handling voice input efficiently. This can occur when background noise or other factors (such as static on the line) cause the Automatic Speech Recognition (ASR) engine to attempt recognition on sound that is not speech; an ASR engine will often attempt a recognition whenever it detects what it thinks is spoken input. Setting the “inputmodes” property to “dtmf” will cause the VoiceXML platform to ignore spoken input.


<!-- Default / typical setting -->
<property name="inputmodes" value="dtmf voice"/>

<!-- Setting for use in noisy environments -->
<property name="inputmodes" value="dtmf"/>

VoiceXML doesn’t allow property values to be set through client-side scripting. There is no “expr” attribute to the <property> tag, as there is with so many other VoiceXML elements, but perhaps there will be in a future version of the specification. So, what options are there for manipulating this property from within a VoiceXML application…?

Detecting input modes

Before we can change the “inputmodes” property, we need to be able to detect which mode a caller is using, and set up handlers for when there are problems. To detect the input mode used by a caller, we can access the “application.lastresult$” variable. This variable holds, among other things, information about the input mode last used. A set of handlers that leverages this information to help a caller having trouble might look like this:


<nomatch count="1">
  <prompt>I'm sorry, I didn't understand what you said. </prompt>
  <reprompt/>
</nomatch>

<nomatch count="2">
  <!-- "==" compares; a single "=" would assign 'voice' to inputmode -->
  <if cond="application.lastresult$.inputmode == 'voice'">
    <prompt>I'm still having trouble hearing you.
    Please try entering your selection using your touch tone key pad. </prompt>
    <reprompt />
  <else/>
    <prompt>I didn't get that. Please try entering your selection again.
    </prompt>
    <reprompt />
  </if>
</nomatch>

<nomatch count="3">
  <prompt>I'm sorry. I'm still having trouble understanding your selection.
  Please wait while I transfer you to a customer service representative.
  </prompt>
  <goto next="#transfer" />
</nomatch>
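The handlers above read only the “inputmode” member of “application.lastresult$”. As a rough sketch of what else is available, the fragment below logs the members that the VoiceXML 2.0 specification defines for the last recognition result. It assumes it is placed inside a form whose field is named “selection”; that field name is hypothetical and not part of the examples above.

```xml
<filled>
  <!-- application.lastresult$ describes the most recent recognition -->
  <log>Input mode: <value expr="application.lastresult$.inputmode"/></log>
  <log>Utterance: <value expr="application.lastresult$.utterance"/></log>
  <log>Confidence: <value expr="application.lastresult$.confidence"/></log>

  <!-- The field's shadow variable carries the same input mode information
       ("selection" is a hypothetical field name) -->
  <log>Field input mode: <value expr="selection$.inputmode"/></log>
</filled>
```

Logging these values during testing can help show whether failed recognitions are coming from voice or from DTMF entry before you decide how aggressive the handlers need to be.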

As stated above, it’s pretty straightforward to simply tell the caller to use DTMF entry – which I’ve done above – but quite another to actually enforce it by changing the “inputmodes” property. If the problem the caller is having is a function of a faulty line, or a noisy environment (both factors probably outside their control), telling them to use DTMF entry probably isn’t a good enough solution by itself.

Changing property values inside a VoiceXML application – the static approach

The easiest way to change property settings is simply to direct a caller to another dialog or module with the desired setting. So, using the above example, we might detect when a caller is having trouble and move them to a different part of the application:


<nomatch count="2">
  <!-- "==" compares; a single "=" would assign 'voice' to inputmode -->
  <if cond="application.lastresult$.inputmode == 'voice'">
    <prompt>I'm still having trouble hearing you.
    Please try entering your selection using your touch tone key pad. </prompt>
    <goto next="#dtmfOnly"/>
  <else/>
    <prompt>I didn't get that. Please try entering your selection again.
    </prompt>
    <reprompt />
  </if>
</nomatch>

<form id="dtmfOnly">
<property name="inputmodes" value="dtmf"/>

..some more VoiceXML logic...

</form>

By directing the caller to a specific portion of the application with a property setting scoped to that part of the application, we may have gone a long way toward solving the problem. However, we may have inadvertently created another – this approach can get burdensome because it means that there is more code to write and take care of. The “dtmfOnly” form may be identical to the form that directed the caller there in every way, with the exception of this one property setting. In some extreme cases, it could mean developing parallel call legs for DTMF and Voice entry. (Shudder!)
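To illustrate the scoping and the duplication it implies, here is a minimal sketch of a document with a document-level default and a form-level override. The “mainMenu” form id, the “selection” field name, and the grammar file names are hypothetical placeholders, and the two forms are deliberately near-identical to show the maintenance cost.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level default: both voice and DTMF are accepted -->
  <property name="inputmodes" value="dtmf voice"/>

  <form id="mainMenu">
    <!-- Inherits "dtmf voice" from the document level -->
    <field name="selection">
      <!-- Hypothetical grammar files -->
      <grammar src="menu-voice.grxml" type="application/srgs+xml"/>
      <grammar src="menu-dtmf.grxml" type="application/srgs+xml" mode="dtmf"/>
      <prompt>Please say or enter your selection.</prompt>
    </field>
  </form>

  <form id="dtmfOnly">
    <!-- Form-level setting overrides the document-level default
         for this form only -->
    <property name="inputmodes" value="dtmf"/>
    <field name="selection">
      <grammar src="menu-dtmf.grxml" type="application/srgs+xml" mode="dtmf"/>
      <prompt>Please enter your selection using your keypad.</prompt>
    </field>
  </form>
</vxml>
```

Everything in “dtmfOnly” other than the property, the prompt wording, and the missing voice grammar repeats “mainMenu”, which is exactly the duplication that makes this approach costly as an application grows.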

Another approach involves the introduction of some server-side scripting.

Changing property values inside a VoiceXML application – the dynamic approach

You can’t change the value of a property setting with VoiceXML alone (or even with VoiceXML and JavaScript), but it’s possible if you use some server-side logic. For this example I’ll use PHP, but you can use any server-side language you like.

To start out, let’s add some server-side code at the beginning of our VoiceXML script, or in the application root document if you prefer:


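<?php

// Values the "inputmodes" property may legitimately take
$allowed_modes = array('dtmf', 'voice', 'dtmf voice');

// Check to see if a recognized mode value is submitted with page request;
// unrecognized values are ignored rather than echoed into the markup
if (isset($_REQUEST['mode']) && in_array($_REQUEST['mode'], $allowed_modes, true)) {
    echo '<property name="inputmodes" value="'.$_REQUEST['mode'].'"/>';
}

// If not, use default value with property setting
else {
    echo '<property name="inputmodes" value="dtmf voice"/>';
}

?>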

This admittedly simplistic logic does two things. First, it checks whether a variable named “mode” was submitted with the page request and, if so, uses its value to generate the VoiceXML markup that sets the property value for our application. Otherwise, it simply sets the “inputmodes” property to “dtmf voice” (this step is probably not necessary, since this is likely the default setting on the platform you are using).
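To make the effect concrete, here is a sketch of roughly what the VoiceXML platform would receive when the page is requested with a “mode” value of “dtmf”, assuming the PHP block sits directly inside the <vxml> element. The form id, field name, and grammar file names are hypothetical; only the <property> line comes from the PHP logic.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Emitted by the server-side logic when mode=dtmf is submitted -->
  <property name="inputmodes" value="dtmf"/>

  <form id="mainMenu">
    <field name="selection">
      <!-- Both grammars stay in the document; with inputmodes set to
           "dtmf" the platform ignores the voice grammar -->
      <grammar src="menu-voice.grxml" type="application/srgs+xml"/>
      <grammar src="menu-dtmf.grxml" type="application/srgs+xml" mode="dtmf"/>
      <prompt>Please enter your selection.</prompt>
    </field>
  </form>
</vxml>
```

The dialog markup itself is unchanged between requests; only the one property line differs, which is what spares us a parallel DTMF-only call flow.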

Now, let’s return for a moment to our handy handler from previous examples, and continue to add server-side markup:


<nomatch count="2">
  <!-- "==" compares; a single "=" would assign 'voice' to inputmode -->
  <if cond="application.lastresult$.inputmode == 'voice'">
    <prompt>I'm still having trouble hearing you. Please try entering your selection using your touch tone key pad. </prompt>
    <assign name="mode" expr="'dtmf'"/>
    <submit next="<?php echo $_SERVER['PHP_SELF'] ?>" namelist="mode"/>
  <else/>
    <prompt>I didn't get that. Please try entering your selection again. </prompt>
    <reprompt />
  </if>
</nomatch>

This handler will now transition the caller to a new dialog – the destination of the submit element is the name of the currently executing script (i.e., it’s submitted to itself). When this submit takes place, our “mode” variable will be sent along for the ride with a value of ‘dtmf’ to be caught and processed by the logic we added above. (Note, in order to use the <assign> within this handler, we should declare the “mode” variable in our application root document, or somewhere else within our application, before getting to this point.)
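As a sketch of the declaration mentioned in the note above, an application root document might declare “mode” like this. The file name “app-root.vxml” and the initial value are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- app-root.vxml (hypothetical file name): the application root document -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Declared at application scope so that <assign name="mode" .../>
       in a leaf document has a variable to assign to -->
  <var name="mode" expr="'dtmf voice'"/>
</vxml>
```

A leaf document would then point at it with the “application” attribute on its own <vxml> element, for example application="app-root.vxml". Declaring the variable with a <var> element inside the leaf document's own form would work just as well if you don't use a root document.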

This approach lets us rewrite the value of our “inputmodes” property without creating a parallel call flow for DTMF entry only. It has the additional flexibility of allowing us to set our “inputmodes” property to voice only, if that’s what we want to do:


<assign name="mode" expr="'voice'"/>
<submit next="<?php echo $_SERVER['PHP_SELF'] ?>" namelist="mode"/>

You’ll obviously want to get a bit more sophisticated with your server-side logic. However, this approach demonstrates that it is possible to dynamically change the value of a property setting in a VoiceXML application to dramatically improve a caller’s experience.