The Human Factor

The true value of a human being can be found in the degree to which he has attained liberation from the self.
Albert Einstein

When you create voice applications for a living, it can be hard to watch things like the new Citibank commercials that extol the virtues of being able to dump an automated phone system and talk to a human. (Although I will admit that I laugh every time the actor in the commercial has to say his password louder a second time … “Big Boy!”)

There are entire web sites devoted to providing information on how to circumvent automated phone systems. As a VoiceXML developer, I’ve been asking myself some tough questions about what these developments say about the current state of voice technologies.

Aggravation with automated phone systems is not a new phenomenon. It’s just that we’re currently sitting in a pretty happy time for the technologies used to build telephone applications. The VoiceXML 2.0 standard has been widely adopted, along with a host of related technologies that make creating voice applications easier and more cost effective. Platform vendors and developers are embracing the new standards with gusto. And improvements to the current standards, which will dramatically enhance their power and flexibility, are already in the works.

So why aren’t more people happy with the current state of voice applications? Why aren’t consumers taking corporate web applications to task in the same way they do voice applications? Why is it still possible for those Madison Avenue weenies to elicit such a visceral reaction from the public when they take jabs at telephone applications?

After I stopped feeling picked on for a few minutes (and secretly laughing at the Citibank commercials), I came up with at least two reasons that explain this apparent paradox:

  • With voice, it’s personal. Voice applications are inherently more personal than other types of interactive applications, web-based or otherwise. Because the act of talking is such a fundamental way of communicating and emoting, people will always react differently to voice applications. As such, they will always hold voice applications to a different (and higher) standard. I don’t think there is a way around this, but I do think that there is a silver lining in this precept for voice developers.
  • VoiceXML makes it easier, not (necessarily) better. There is an excellent discussion in the latest issue of VoiceXML Review of the reasons the technology was developed. It helps underscore the simple fact that it is very possible to build a lousy IVR system using a great technology like VoiceXML. VoiceXML changes the economics and the complexity of building voice applications – it doesn’t make voice applications immune to second-rate performance or design issues.

It is incumbent upon voice application developers and designers to understand the unique nature of voice as an interactive medium, and to appreciate the limitations that come with even the most powerful new voice technologies. Simply put, we have to use the new generation of voice technologies to build the intuitive, agile and elegant voice applications users expect. I think most would admit that a lot of work is still needed to remove the stigma that hangs over voice applications.

Until then, enjoy your laughs while you can, Big Boy.

E4X in VoiceXML

Voice application developers familiar with the proposed VoiceXML 2.1 specification are aware of new functionality that allows external XML documents to be accessed from within VoiceXML scripts. The new <data> element will allow developers to fetch XML data from a web server without transitioning to a new VoiceXML document (a handy trick indeed). The XML data that is fetched is exposed as a read-only subset of the W3C Document Object Model (DOM) and can be manipulated using ECMAScript (JavaScript).

While I’m very excited about this new functionality, using the DOM is not really my favorite thing in the world to do – I’ve always found it a bit clunky and hard to get really comfortable with. Using the DOM is not the only way to manipulate XML from within JavaScript. There is another ECMA standard that allows developers to do this – ECMAScript for XML (or E4X).

E4X is an extension to JavaScript that adds native XML support to the language. With E4X, in addition to native JavaScript types like Number, String and Boolean, there is an XML type for representing XML elements, attributes, comments, processing instructions or text nodes, and an XMLList type for representing lists of XML objects.
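To make the contrast a little more concrete, here is a rough sketch of what pulling a single value out of fetched XML looks like DOM-style. Everything specific in it is my own assumption for illustration: the quote, symbol and price element names, the firstText helper, and the small stand-in objects that mimic the read-only DOM subset so the snippet can run outside a VoiceXML platform (on a real platform, something like <data name="quote" src="..."/> would supply the document object). The E4X version appears only in comments, since it needs an engine with E4X support.

```javascript
// Stand-in for a DOM text node (hypothetical helper, not part of any platform).
function textNode(value) {
  return { nodeType: 3, nodeName: "#text", data: value, childNodes: [] };
}

// Stand-in for a DOM element exposing only the members used below.
function element(name, children) {
  var node = {
    nodeType: 1,
    nodeName: name,
    childNodes: children,
    firstChild: children.length > 0 ? children[0] : null,
    getElementsByTagName: function (tag) {
      var found = [];
      // Depth-first walk over descendants, collecting matching elements.
      (function walk(n) {
        for (var i = 0; i < n.childNodes.length; i++) {
          var child = n.childNodes[i];
          if (child.nodeType === 1 && child.nodeName === tag) found.push(child);
          walk(child);
        }
      })(node);
      return {
        length: found.length,
        item: function (i) { return found[i] || null; }
      };
    }
  };
  return node;
}

// Made-up sample data standing in for the fetched XML:
// <quote><symbol>ACME</symbol><price>42.50</price></quote>
var quote = {
  documentElement: element("quote", [
    element("symbol", [textNode("ACME")]),
    element("price", [textNode("42.50")])
  ])
};

// DOM style: find the first matching element, then read its text child.
function firstText(doc, tag) {
  var matches = doc.documentElement.getElementsByTagName(tag);
  if (matches.length === 0) return null;
  var first = matches.item(0).firstChild;
  return first ? first.data : null;
}

var symbol = firstText(quote, "symbol");
var price = firstText(quote, "price");

// E4X style, for comparison (comments only, requires an E4X-capable engine):
//   var quote = <quote><symbol>ACME</symbol><price>42.50</price></quote>;
//   var symbol = quote.symbol.toString();
//   var price = quote.price.toString();
```

The DOM lookup takes a helper function and a null check; the E4X lines read almost like the XML itself, which is most of the appeal.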

E4X is an official ECMA standard, but right now support for it is kind of sparse. There are a couple of JavaScript engines (not surprisingly, both from Mozilla) that support it, but it isn’t supported yet in any of the standard browser releases. If you want to use E4X, you have to download one of the nightly builds from the Mozilla download site. Hopefully this will change soon, and E4X support will become standard in JavaScript engines.

As the W3C and the VoiceXML community move toward final adoption of the VoiceXML 2.1 standard, it may be worth considering whether the DOM is the only (or best) way that voice developers should be able to access and manipulate XML data from within VoiceXML scripts. Choice is a good thing (hint, hint). 😉