Detecting Caller Frustration

IVR developers have few good tools for detecting when callers are getting frustrated. Anger is a human emotion, and human emotions are complex. The tools generally used within VoiceXML applications for dealing with frustrated callers tend to be a bit ham-fisted.

For example, most developers use a graduated set of noinput/nomatch handlers to transfer callers who are having problems to an agent. It is also possible to detect when a certain type of input (e.g., voice) is causing problems for the caller and to direct them to use another (e.g., DTMF). There is a more complete discussion of this approach here, and here.
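A graduated handler set of the kind described above might look roughly like the following. This is a minimal sketch, not taken from the linked discussions: the form and field names, the prompt wording, and the transfer number are all placeholders, and a real application would tune the number of retries.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- "rating" is a hypothetical field; the builtin digits grammar accepts voice or DTMF -->
    <field name="rating" type="digits?length=1">
      <prompt>On a scale of one to five, how satisfied are you?</prompt>

      <!-- First failure: apologize and replay the prompt -->
      <noinput count="1">Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch count="1">Sorry, I did not catch that. <reprompt/></nomatch>

      <!-- Second failure: steer the caller toward the keypad -->
      <noinput count="2">Please press a number from one to five on your keypad. <reprompt/></noinput>
      <nomatch count="2">Please press a number from one to five on your keypad. <reprompt/></nomatch>

      <!-- Third failure: stop retrying and hand the caller to an agent -->
      <noinput count="3"><goto next="#agent"/></noinput>
      <nomatch count="3"><goto next="#agent"/></nomatch>
    </field>
  </form>

  <form id="agent">
    <!-- Placeholder destination; replace with a real agent queue number -->
    <transfer name="xfer" dest="tel:+15555550100" bridge="false">
      <prompt>Let me connect you with an agent.</prompt>
    </transfer>
  </form>
</vxml>
```

Note that noinput and nomatch events are counted separately, so a caller who alternates between the two can take more than three attempts to reach the agent.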

We cannot detect frustration by looking at the volume, pitch or prosody of spoken input within VoiceXML – at least not yet.

However, the new VoiceXML 2.1 specification provides another tool that we can use to try to detect when callers are becoming testy. The <mark> element is a new VoiceXML element that allows developers to determine when caller barge-in occurs. This can be very handy for detecting when callers are becoming frustrated with repetitive prompts (like confirmation dialogs that ask the caller to confirm what they have said or entered).

By using the <mark> element judiciously, we can make reasonable assumptions about when callers are getting sick of confirming input, and act accordingly. A sample script with some <mark> elements in it can be found here.

This sample contains a customer satisfaction survey that asks the caller to rate (from 1 to 5) their agreement with several statements. After each turn, the caller is asked to confirm their answer – you can see how this could become a bit of a pain for a caller, especially if they are not happy to begin with.

The following bit of code does the neat stuff:

<if cond="confirmation$.markname == 'conf_start' &amp;&amp; confirmation$.marktime &lt; 1500">
  <assign name="confirm" expr="false"/>
</if>

At the beginning of our confirmation dialog, we insert a <mark> with a name of ‘conf_start’, and at the end we insert a <mark> with a name of ‘conf_end’. Platforms that support this element will expose some useful information: the name of the last <mark> executed, and the time in milliseconds between that <mark> executing and the caller barging in.

The conditional statement above checks the name of the last executed <mark>. If the name is ‘conf_start’ we know that the ‘conf_end’ mark was not reached (the caller barged in before the prompt was done). We also check the time between the <mark> executing and barge-in occurring. If it is a relatively short period (here, under 1500 milliseconds), the caller barged in quickly and we can assume that they are growing frustrated. We therefore turn off our confirmation flag (setting the variable “confirm” to false) so that the caller is not asked to confirm any more input.
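To show where the marks and the conditional sit relative to each other, here is a minimal sketch of a confirmation field. It is not the linked sample script: the “rating” field, the prompt wording, and the use of the field-level cond attribute to skip confirmation are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- Flag checked before each confirmation; starts out enabled -->
    <var name="confirm" expr="true"/>

    <!-- Hypothetical survey question -->
    <field name="rating" type="digits?length=1">
      <prompt>The agent resolved my issue. Rate from one to five.</prompt>
    </field>

    <!-- This field is skipped once confirm has been set to false -->
    <field name="confirmation" type="boolean" cond="confirm">
      <prompt bargein="true">
        <mark name="conf_start"/>
        You entered <value expr="rating"/>. Is that correct?
        <mark name="conf_end"/>
      </prompt>
      <filled>
        <!-- Barge-in happened before conf_end, within 1500 ms of conf_start -->
        <if cond="confirmation$.markname == 'conf_start' &amp;&amp; confirmation$.marktime &lt; 1500">
          <assign name="confirm" expr="false"/>
        </if>
        <!-- Caller said no: collect the rating again -->
        <if cond="!confirmation">
          <clear namelist="rating confirmation"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

In a survey with several questions, every confirmation field would carry the same cond="confirm" guard, so one quick barge-in switches off all later confirmations.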

There are lots of ways to get creative with the <mark> element: we could use it to end the survey altogether, we could use it to trigger a shorter, more concise set of prompts, etc. Generally speaking, there is no silver bullet for detecting caller frustration, but with the growing number of tools available (including the <mark> tag), the job is getting easier.
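As one illustration of the shorter-prompts idea, the sketch below places a <mark> between a question and its instructions. If the caller barges in during the instructions, we assume they no longer need them and switch later questions to a terse prompt. The field names, mark names, “terse” flag and prompt wording are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- Hypothetical flag: true once the caller has shown they know the drill -->
    <var name="terse" expr="false"/>

    <field name="q1" type="digits?length=1">
      <prompt bargein="true">
        The agent resolved my issue.
        <mark name="q1_instructions"/>
        Please say or press a number from one to five, where one means
        strongly disagree and five means strongly agree.
        <mark name="q1_end"/>
      </prompt>
      <filled>
        <!-- Last mark reached was q1_instructions, so the caller cut the instructions short -->
        <if cond="q1$.markname == 'q1_instructions'">
          <assign name="terse" expr="true"/>
        </if>
      </filled>
    </field>

    <field name="q2" type="digits?length=1">
      <!-- Only one of these two prompts is queued, depending on the flag -->
      <prompt cond="!terse">
        The wait time was reasonable. Please say or press a number from one
        to five, where one means strongly disagree and five means strongly agree.
      </prompt>
      <prompt cond="terse">The wait time was reasonable. One to five?</prompt>
    </field>
  </form>
</vxml>
```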

Interpreting Spoken Input

Since the VoiceXML Forum Community Bulletin Board is increasingly besieged by spammers, going forward I’m going to cross-post responses I submit there on this site so that interested parties (assuming there are any ;-)) can read them.

This response relates to the use of semantic interpretation in VoiceXML applications, something I have written on before. I hope readers find the exchange below helpful.


Is there a way to map a response to a certain value? For instance, if the user says “yes,” “sure,” or “yeah” I’d like to put 1 in the database? If the user says “no,” “nope,” or “nah” I’d put 0.


There are a couple of options open to you if all you are using is a simple yes/no grammar.

Option 1 = use the builtin “boolean” grammar type. By specifying a field type of “boolean”, an implicit grammar is created that should cover affirmative or negative responses for whatever language is being used. A boolean field returns an ECMAScript boolean based on what the user says (true for yes, false for no). You can convert this to a 1 or 0 using a simple if/else construct and a predefined variable.

<var name="convert" expr="0"/>

<field name="F_1" type="boolean">
  <prompt>Do you think VoiceXML rocks?</prompt>

  <filled>
    <!-- If the user says yes, then the expression in the "cond" attribute will evaluate to true -->
    <if cond="F_1">
      <assign name="convert" expr="1"/>
    </if>

    <!-- If the preceding if statement did not execute, then the expression in the cond attribute evaluated to false. User said no, so we keep our original value of 0 -->
    <submit next="mypage.jsp" namelist="convert"/>
  </filled>
</field>


Option 2 = you can use the <tag> element with a custom yes/no grammar to return a 1 or a 0. (Check your platform vendor’s documentation on this element, as there is some variation.)

<!-- In your VoiceXML document, reference the yes/no grammar -->
<field name="F_1">
  <grammar src="yesno.grxml"/>
</field>

<!-- Contents of yesno.grxml file -->
<?xml version="1.0"?>
<grammar xml:lang="en-US" version="1.0" root="R_1" type="application/srgs+xml" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="R_1">
    <one-of>
      <item>yes <tag>F_1=1;</tag> </item>
      <item>yeah <tag>F_1=1;</tag> </item>
      <item>hells yeah <tag>F_1=1;</tag> </item>
      <item>yur damn skippy <tag>F_1=1;</tag> </item>
      <item>no <tag>F_1=0;</tag> </item>
      <item>nope <tag>F_1=0;</tag> </item>
      <item>no way <tag>F_1=0;</tag> </item>
      <item>hells no <tag>F_1=0;</tag> </item>
    </one-of>
  </rule>
</grammar>


This has the effect of filling the field named “F_1” with the value specified in the <tag> when one of the grammar items is recognized. A few good links to get you started follow:

BeVocal Cafe

Voxeo Community