Make the Cloud Listen (and Understand)

Yesterday I wrote a post about the changing cloud telephony landscape, and highlighted some key factors that will dictate which cloud telephony providers are around for the long haul and deliver the next innovations.

One of those factors – support for speech recognition – is a good differentiator for developers to use when choosing a cloud telephony platform.

Speech recognition is becoming increasingly important in our everyday lives. Smartphones and powerful handheld devices enable multimodality, and there are more and more restrictions placed on our use of phones while doing other tings (like driving).

Plus, I can’t think of a more deflating concept than a cloud telephony provider that allows developers to build sophisticated apps and mashups in the language of their choice but that chains users of those apps to a telephone keypad. No fun.

To give an example of how powerful speech recognition can be, and how easy it is to use with a cloud telephony provider that supports it, I worked up a small demo to illustrate the point. The sample code for this demo is on Github, and we’ll dive into it in more detail below.

This demo uses two PHP libraries that are designed to work with the Tropo platform (one of the only cloud telephony providers to support speech recognition):

If you’ve read any of my previous posts on build applications for the Tropo platform, you’ll see lots of similarities between this and previous sample apps. Here I continue my use of the insanely awesome Limonade Framework for PHP.

Let’s take the example of a company directory that allows callers to dial a single number, select a person or department at the company and then be transferred to the person they select.

With cloud telephony, there is no need to have such a system live on a machine in the server room – it can be hosted externally in the cloud, making it easier to manage and to scale. In addition, with the Tropo Platform, it doesn’t have to be the same tired old DTMF-based menu telling callers to press an extension number or to “dial by name…”.

Using the PHP WebAPI Library and Limonade, we can construct a simple, yet power script that looks like this:

This script is pretty self-explanatory, but there are some key points I want to emphasize. First, note the $options array that holds the reference to an external grammar file (more on that in a bit). Tropo seems to need for this reference to be an absolute one and not a relative reference to the file (not hard to do with PHP – you just need to be aware of it).

Also, the file reference needs to include a trailing parameter indicating that this is an XML grammar (;type=application/grammar-xml). This seems to be true even if the grammar file is served with the correct MIME type by whatever is serving it.

Now lets have a look at this grammar file.

This simplistic example demonstrates how to use the PHPGrammar library. Note the simple array structure that is being used to hold the details of employees for our fictitious company. This could very easily be replaced with a dip into a data source of pretty much any kind, like an LDAP directory or database holding employee details.

Also note in this example that we want to do something referred to as Semantic Interpretation. Our grammar file is a set of rules that will be applied to what the caller says – Semantic Interpretation (SI) dictates the value that is given to our application from the grammar when a successful match occurs.

In this example, we want the caller to be able to say the name of the person they want to be transfered to. We make the first name optional so they may either say the last name of the person or (optionally) the full name. Obviously this may need to be changed based on the size of the directory to render in a grammar file (e.g., multiple employees with the same last name).

Do note that the Tropo platform seems to require the “Script” sytax for returning SI values on a successful match as opposed to the “String Literal” syntax. (More on these alternatives here.)

Works on Tropo (Script syntax):
<item>foo<tag>out="bar";</tag></item>

Does not work on Tropo (String Literal syntax):
<item>foo<tag>bar</tag></item>

So, when a caller says the name of a person in our company directory we want to return the number for that person to our Tropo script so we can transfer the call to them. This can clearly be seen when we examine the Result object that is delivered by the Tropo platform.

Tropo’s Result object includes the full grammar engine output, and lots of very detailed information about the recognition. As you can see, the utterance that the speech recognition engine heard was the name of one of our faux employees. The value that was returned is the number of that person.

We use this value in the transfer_call() method of our Tropo script.


// Create a new instance of the Result object.
$result = new Result();

// Get the value of the selection the caller made.
$phone = $result->getValue();

// Create a new instance of the Tropo object and transfer the call.
$tropo = new Tropo();
$tropo->transfer('+1'.$phone);

// Write out the JSON for Tropo to consume.
$tropo->RenderJson();

Using the PHP WebAPI library, it takes just 5 lines of code (excluding comments) to get the value of the grammar result and transfer the call. How cool is that?!

Obviously there are lots of things that can be done to enhance this script, to make it more robust, but it illustrates the essential concepts of speech recognition in the cloud.

What’s more, because of all of the great functionality provided by the Tropo cloud platform we can really push the envelope on the tired old company directory:

  • We could take an inbound call from a Skype user and transfer to a cell phone (or a SIP endpoint).
  • We could let our caller select a department in our company and then ring several different numbers at once, transferring the call to the first one answered (sort of a “hunt group in the cloud”).
  • We could use Tropo’s built in IM capabilities to send a screen pop to the person receiving the call.

The sky is the limit. Which I guess is the point of cloud telephony…

Advertisements

Building Cloud Communication Apps with Tropo: Part 2

This post is a continuation of the series on building cloud communication applications with Tropo and the PHP WebAPI Library.

In this post, we’ll be looking at Tropo’s support for multi-channel applications and using the incredibly flexible and powerful Limonade library for PHP (think Sinatra for PHP).

Working with the Session Object

As I explained very briefly in the previous post on this subject, the Tropo WebAPI is an HTTP/JSON API for building multi-channel communication apps.

What this means essentially is that the Tropo platform does all of the hard stuff involved with executing a communication app – DTMF/speech recognition, rendering Text-To-Speech (TTS), maintaining and managing all of the connections to the different communication networks (PSTN, SMS, IM networks, Twitter). You tell Tropo how to govern the interaction between a caller and your application on a specific channel by sending it a set of instructions in JSON format.

In this series of posts, we’re using the PHP WebAPI Library for Tropo to generate the JSON that gets sent to, and consumed by Tropo. But this exchange of JSON isn’t one-way – Tropo also sends JSON packages to your application with important information about (among other things) the network a user selects to interact with your application on and any input they have provided in response to prompts.

At the beginning of a user session (when a user first connects to your application), Tropo will deliver a JSON Session object to your application. This object contains all sorts of useful information that your app can use when rendering out JSON instructions to send back to Tropo. Let’s examine what a real life Session object looks like.

The easiest way to do this is to simply go over to PostBin.org and make a new PostBin. PostBin is a service that lets you see HTTP posts that get sent to the special URL that is generated when you create a new PostBin.

After you have created a new PostBin, log into your Tropo account and create a new WebAPI application. Use the PostBin URL as the URL that powers your new Tropo WebAPI app. After your app is created, you will have a newly provisioned Skype number that you can use to call it.

When you call your application using the Skype number provisioned by Tropo, you won’t hear anything – remember, we haven’t yet generated any JSON to tell the Tropo platform what to say or do when a user connects. After you make your call (it will be over quickly), go back to your PostBin URL (you may need to refresh) and you will see an object in JSON format, like this:

This is the Session object for the call you just made. It’s what is sent to your application (via HTTP POST) each time a new session is started on Tropo. Working with this object using the PHP WebAPI Library is easy. You just create a new instance of the Session object in PHP and you can start accessing the properties of this object:


$session = new Session();
$from_info = $session->getFrom();
echo $from_info['channel'];

// Using the example Session object JSON from above would render VOICE.

Being able to access the channel and network a user is accessing your application from can be useful when you want to tailor prompts or actions to a specific channel – e.g., a phone call vs. an IM session.

Also make note of the initialText property – this will be important when building SMS and IM applications, where a user will begin an interaction with your application by sending information to it. This property will allow you to process the initial input for those channels without having to ask the user for it again (something users generally dislike).

Next, let’s take a look a the Result object that is sent from Tropo to your application when a user provides input in response to a prompt or direction. In order to do this, we need to take a sip of Limonade.

Mmmm… Limonade!

Limonade is a lightweight PHP framework that is very much like the Sinatra framework for Ruby. I won’t go into too much detail on it, as there is ample documentation available on the Limonade site , but here is quick introduction that will let us build enough of a structure to see the Tropo result object.

When you use Limonade, you set up routes for HTTP requests. A route is comprised of an HTTP method, a URL matching pattern and a PHP method. When an HTTP request is made to a URL that matches the pattern, and uses the method specified in the route, the designated PHP function gets invoked. For example:

dispatch_post('/', 'test');
  function test() {
    echo 'This is a test.';
}

The ‘dispatch_post()’ directive specifies that the HTTP method for this route with be POST (which is what is used by Tropo to send JSON to your application). The two parameters to this directive specify the URL pattern to match (in this case, the root directory on the domain were this script is located) and the PHP method to invoke, which is defined below this directive. In a nutshell, whenever an HTTP POST is made to the root domain where this script is located, the text This is a test will be rendered.

Let’s build out a simple shell that we’ll use to construct our Tropo application for the next few posts in this series:

// Include Tropo classes.
require('TropoClasses.php');

// Include Limonade framework (http://www.limonade-php.net/).
require('path/to/limonade/lib/limonade.php');

dispatch_post('/start', 'zip_start');
function zip_start() {
	// Tell the user to enter their zip code.
}

dispatch_post('/end', 'zip_end');
function zip_end() {
	// Do something with the entered zip code.
}

dispatch_post('/error', 'zip_error');
function zip_error() {
	// Tell the user an error has occurred.
}

// Run this sucker!
run();

Our Tropo application will collect a user’s zip code and then look up some information based on the input they provide. As you can see, we’ve included the PHP WebAPI Library and the Limonade Framework. We’ve also set up three Limonade routes start, end and error (all using the HTTP POST method) and stubbed out the PHP function that will render JSON for Tropo to consume.

To get a look at the Tropo Result object, lets add some logic to the zip_start() function:


dispatch_post('/start', 'zip_start');
function zip_start() {

	// Step 1. Create a new instance of the Session object, and get the channel information.
	$session = new Session();
	$from_info = $session->getFrom();
	$network = $from_info['channel'];

       // Step 2. Create a new instance of the Tropo object.
	$tropo = new Tropo();

	// Step 3. Welcome prompt.
	$tropo->say("Welcome to the Tropo PHP zip code example for $network");

	// Step 4. Set up options for zip code input.
	$options = array("attempts" => 3, "bargein" => true, "choices" => "[5 DIGITS]", "name" => "zip", "timeout" => 5);

	// Step 5. Ask the caller for input, pass in options.
	$tropo->ask("Please enter your 5 digit zip code.", $options);

	// Step 6. Tell Tropo what to do when the user has entered input. Enter your PostBin URL in the "next" array element.
	$tropo->on(array("event" => "continue", "next" => "http://www.PostBin.org/xxxxxxx", "say" => "Please hold."));

	// Step 7. Render the JSON for the Tropo WebAPI to consume.
	return $tropo->RenderJson();

}

As you can see, inside this function we create a new instance of the Session object and get the channel the user is accessing our application from. We also create a new instance of the Tropo object (this is what we’ll use to send JSON instructions back to the Tropo platform).

The next several steps are fairly self explanatory, but take special note of Step 6. Here we are telling the Tropo platform that when a ‘continue’ event is raised (when a user finishes entering input) tell them to ‘Please hold’ and then POST the results of their input to a PostBin URL. (Note – replace the value above with the PostBin URL you used at the beginning of this tutorial.)

Working with the Result Object

Save your script and change the URL for your WebAPI application in the Tropo Applications manager to point to it. You can now test your script using the the Skype number for your app as we did before . When you access your script, you’ll get the instructions to enter a zip code, after which Tropo will POST the results to the PostBin URL you inserted into the script in Step 6 above.

Now, when you look at your PostBin URL, you’ll see something like this:

As you can see, the Result object that gets sent from Tropo to your app has a wealth of information on what the user entered, how it was interpreted by Tropo and even the confidence level of the recognition (if speech recognition is used).

You can access the Result object using the PHP WebAPI Library just like you can the Session object:

$result = new Result();
$zip = $result->getValue();
echo $zip

// Using the example Result object JSON from above would render 12345

You would use the Result object in the zip_end() function we stubbed out above. You use the value of the zip code entered to look up information relevant for that zip code (like a weather forecast) and present it to the caller.

In the next post in this series, we’ll complete our simple zip code example by adding a weather forecast lookup and present it to the user. We’ll also tweak our script to optimize it for different channels that a user might employ to access it, to ensure the experience is optimized for phone, IM and SMS.

Stay tuned…