Web 2.0 Talk

I made a presentation at the recent State of Delaware IT Conference on new advances in web application development. My talk focused specifically on Rich Internet Applications (RIAs – e.g., AJAX), web services and VoiceXML 2.1.

My main theme in this talk was that despite a good deal of hype around some of these new technologies, they will ultimately have an impact on the way that applications are developed and maintained. Governments need to think about how these changes will affect them in the future, and also look for opportunities to exploit these new approaches to their benefit today.

A copy of my presentation can be obtained here (PPT format). Any feedback or reactions are welcome.


IP Telephony & Voice 2.0?

Meg Whitman, Chair and CEO of eBay, made an interesting prediction about the future of Internet telephony recently. In an interview with financial analysts discussing eBay’s purchase of Skype, Whitman said:

“In the end, the price that anyone can provide for voice transmission on the ‘Net will trend toward zero…”

This raises questions about what is next for the world of telephony and voice applications. There is an interesting post on the Simply Relevant blog describing the concept of “Voice 2.0.” The concept and term (which appears to be derived from the notion of Web 2.0) are, in the words of the author:

“…the marriage of IP Telephony to the Web.”

There is a good discussion in this post about the confluence of certain factors that will drive the development of Voice 2.0 – things like meterless calling (via Skype or a similar service), development tools for building web applications like PHPVoice (a project to which I am a contributor) and presence.

While I agree with the author, I’d add another recent technology to the mix – podcasting. One of the key differences between the visual web and the voice web (or Voice 2.0 if you want to think of it that way) is permanence. A voice conversation or interaction happens in real time while the visual web has more permanence – value can be “stored up” as text or images and reread, reviewed and referenced for an indefinite period of time.

Podcasting is a recent trend that seeks to do something similar with voice – to store the value of a conversation in the form of an MP3 file that can be referenced at a later time. It’s still pretty new, but so is this whole concept of Voice 2.0 (or Web 2.0 for that matter).

When Voice Applications Fail

It’s always frustrating to hear people complain about poorly designed telephone applications, particularly when these complaints relate to government-sponsored applications. (Encouraging good design principles for developing voice applications is kind of the point of this here blog – ‘ya dig?)

You can imagine how disheartened I was to read a story cataloging a litany of complaints about one such system in my hometown newspaper. This article focuses on some extremely negative experiences by callers to the Medicare help line (1-800-MEDICARE). In an analysis of the Medicare help line, the General Accounting Office concluded late last year that less than two thirds of callers to this “help” line received accurate answers to their questions.

To be perfectly fair, there are few things in this world – with the possible exception of the U.S. Tax Code – as complex as the Medicare program. Not a surprise considering it’s the largest health insurance program in the nation, covering some 40 million Americans (and counting!). When eligibility and benefit rules are complex, the job of providing clear, concise answers to recipient questions is difficult, so some degree of frustration is probably inevitable.

It’s also more likely that a phone system set up to answer benefit questions will have to utilize a mixture of automated dialogs and customer service representatives. Given the nature of the program, developing a fully automated system would seem to be highly impractical.

However, when I read about some of the negative caller experiences, I can’t do anything but shake my head:

After two minutes of following instructions from an automated female voice, Bingham arrived at a list of choices.

“If you’re calling on the Medicare discount drug cards say, ‘Drug card,’ ” the voice said. “For Medicare health plans, say ‘Plan choices.’ ”

“Plan choices,” Bingham said.

“I’m sorry I didn’t understand you,” the voice said.

“Plan choices!” Bingham said again.

“I’m sorry, I still didn’t understand you.”

Bingham brought the telephone two inches from her lips.


“Please hold a moment while I transfer you to a customer service representative who can help you.”

This is just a fundamental lack of proper menu construction and grammar tuning – the fact that these things do not appear to have been done for an application as heavily used as this one is almost criminal. At a minimum, the menu should accept both spoken and DTMF input so that a caller can use their key pad to enter a numeric choice if the application is having trouble recognizing what they are saying.


<menu>
<prompt>Welcome to the Medicare help line. If you're calling on the Medicare discount drug cards say, Drug card. For Medicare health plans, say Plan Choices.</prompt>
<choice next="#drugs">drug cards</choice>
<choice next="#choice">plan choices</choice>
<noinput>I'm sorry I didn't understand you.</noinput>
<nomatch>I'm sorry I didn't understand you.</nomatch>
</menu>

This type of menu structure has several flaws. First, the keywords callers are asked to say are too dissimilar to the way the options are described. If the second option is introduced as “Medicare health plans,” why doesn’t the prompt direct the caller to say exactly that? “For Medicare health plans, say Plan Choices.” should be “For Medicare health plans, say Health Plans,” with the menu choice changed to match. This increases the likelihood that the caller will provide the right input.

Second, the menu does not allow for approximate input – if a caller says simply “cards,” or “discount cards” the application will not recognize the input. Under the VoiceXML 2.0 specification, the default setting for menu input is “exact” — in other words, the VoiceXML interpreter will look for an exact match on the menu items, and will not recognize input that includes some (but not all) of the words in the menu item.

Third, the menu does not allow for DTMF entry, which would allow a caller to fall back to their key pad for entry if the application was having trouble recognizing their input. Properly constructed voice applications will check for the type of input being provided, and direct callers to modify it accordingly (demonstrated below).

Finally, the menu does not use tapered prompts to assist the caller in refining their input, or selecting the appropriate input type.


<!-- Accept attribute is set to allow approximate input -->
<menu accept="approximate">

<!-- Tapered prompts when no input is detected -->
<noinput count="1">
<prompt>I'm sorry. I didn't hear what you said.</prompt>
</noinput>

<noinput count="2">
<prompt>I'm having trouble hearing you. If you are using a speaker phone, you may want to pick up your telephone handset or make sure your phone is not on mute.</prompt>
</noinput>

<noinput count="3">
<prompt>I'm sorry I'm still having trouble hearing you. Please wait while I transfer you to a customer service representative.</prompt>
<goto next="#transfer" />
</noinput>

<!-- Tapered prompts when application does not recognize input -->
<nomatch count="1">
<prompt>I'm sorry I didn't understand what you said.</prompt>
</nomatch>

<nomatch count="2">
<!-- Suggest the key pad if the failed attempt was spoken; otherwise just ask again -->
<if cond="application.lastresult$.inputmode == 'voice'">
<prompt>I'm still having trouble hearing you. Please try entering your selection using your touch tone key pad.</prompt>
<reprompt />
<else />
<prompt>I didn't get that. Please try entering your selection again.</prompt>
<reprompt />
</if>
</nomatch>

<nomatch count="3">
<prompt>I'm sorry. I'm still having trouble understanding your selection. Please wait while I transfer you to a customer service representative.</prompt>
<goto next="#transfer" />
</nomatch>

<!-- _prompt and _dtmf are only defined inside an enumerate element -->
<prompt>Welcome to the Medicare help line.
<enumerate>
For the Medicare <value expr="_prompt"/> service, say <value expr="_prompt"/>, or press <value expr="_dtmf"/>.
</enumerate>
</prompt>

<!-- menu choices allow for alternate DTMF input -->
<choice next="#drugs" dtmf="1">discount drug cards</choice>
<choice next="#choice" dtmf="2">health plans</choice>
</menu>

Additionally, a good system will verify user input if the confidence level on recognition is lower than a pre-specified threshold. There is a good overview of this technique in an earlier posting on this site.
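
For readers who haven't seen that technique, here is a rough sketch of how confidence-based confirmation might look for the same two choices. This is my own illustration, not the Medicare system's code: the 0.6 threshold, the form and field names, and the stub dialogs at the bottom are all assumptions, and a real threshold would have to come out of tuning. It uses a form with fields rather than a menu, because a field exposes the confidence of its own recognition through the service$ shadow variable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

<form id="main">
<!-- Assumed threshold: anything recognized below 0.6 gets confirmed -->
<var name="threshold" expr="0.6"/>

<field name="service">
<prompt>For discount drug cards, say discount drug cards, or press 1. For health plans, say health plans, or press 2.</prompt>
<option dtmf="1" value="drugs">discount drug cards</option>
<option dtmf="2" value="plans">health plans</option>
<filled>
<!-- High confidence: pre-fill the confirm field so it is skipped -->
<if cond="service$.confidence &gt;= threshold">
<assign name="confirm" expr="true"/>
</if>
</filled>
</field>

<!-- Only visited when the confidence was below the threshold -->
<field name="confirm" type="boolean">
<prompt>I think you said <value expr="service$.utterance"/>. Is that right?</prompt>
<filled>
<!-- Caller said no: clear both fields and ask again -->
<if cond="!confirm">
<clear namelist="service confirm"/>
</if>
</filled>
</field>

<!-- Reached only once the choice is accepted or confirmed -->
<block>
<if cond="service == 'drugs'">
<goto next="#drugs"/>
<else/>
<goto next="#choice"/>
</if>
</block>
</form>

<!-- Stub dialogs standing in for the real content -->
<form id="drugs"><block>Drug card information goes here.</block></form>
<form id="choice"><block>Health plan information goes here.</block></form>

</vxml>
```

The nice thing about this arrangement is that callers who are recognized cleanly never hear the confirmation question at all; only the shaky recognitions pay the cost of an extra turn.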

This is pretty basic stuff – even if the application developers didn’t know enough to take this approach when they built the system, it is most certainly something that should have been uncovered during testing. This is the kind of second-rate development that gives voice applications a bad rep.

New Tutorial Coming Soon

Several weeks back, I wrote a tutorial covering the basics of converting visual XHTML web pages into superhappyfantastic XHTML+Voice content using PHP.

Specifically, I made use of a PHP class library called MiniXML that works with PHP 4.x. Since the new version of PHP (Version 5) has now been out for a bit and is gaining support, I think it makes sense to leverage some of the new features of this latest release to do the same thing. I’m hoping this will be easier to do – in my original tutorial, I had to modify (read “hack”) the underlying classes to get it to work the way I wanted.

To this end, I’m working on a new tutorial covering this same technique but using the new DOM Functions of PHP 5. Should be ready soon – stay tuned…