Make the Cloud Listen (and Understand)

Yesterday I wrote a post about the changing cloud telephony landscape, and highlighted some key factors that will dictate which cloud telephony providers are around for the long haul and deliver the next innovations.

One of those factors – support for speech recognition – is a good differentiator for developers to use when choosing a cloud telephony platform.

Speech recognition is becoming increasingly important in our everyday lives. Smartphones and powerful handheld devices enable multimodality, and there are more and more restrictions placed on our use of phones while doing other things (like driving).

Plus, I can’t think of a more deflating concept than a cloud telephony provider that allows developers to build sophisticated apps and mashups in the language of their choice but that chains users of those apps to a telephone keypad. No fun.

To give an example of how powerful speech recognition can be, and how easy it is to use with a cloud telephony provider that supports it, I worked up a small demo to illustrate the point. The sample code for this demo is on Github, and we’ll dive into it in more detail below.

This demo uses two PHP libraries that are designed to work with the Tropo platform (one of the only cloud telephony providers to support speech recognition):

If you’ve read any of my previous posts on building applications for the Tropo platform, you’ll see lots of similarities between this and previous sample apps. Here I continue my use of the insanely awesome Limonade Framework for PHP.

Let’s take the example of a company directory that allows callers to dial a single number, select a person or department at the company and then be transferred to the person they select.

With cloud telephony, there is no need to have such a system live on a machine in the server room – it can be hosted externally in the cloud, making it easier to manage and to scale. In addition, with the Tropo Platform, it doesn’t have to be the same tired old DTMF-based menu telling callers to press an extension number or to “dial by name…”.

Using the PHP WebAPI Library and Limonade, we can construct a simple yet powerful script that looks like this:

This script is pretty self-explanatory, but there are some key points I want to emphasize. First, note the $options array that holds the reference to an external grammar file (more on that in a bit). Tropo seems to need this reference to be absolute rather than relative (not hard to do with PHP – you just need to be aware of it).

Also, the file reference needs to include a trailing parameter indicating that this is an XML grammar (;type=application/grammar-xml). This seems to be true even if the grammar file is served with the correct MIME type by whatever is serving it.
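Here is a minimal sketch of building a reference that satisfies both requirements. The helper name absolute_grammar_url() and the host and path values are made up for illustration; only the ;type=application/grammar-xml suffix comes from the behavior described above.

```php
<?php
// Hypothetical helper (not part of any Tropo library): build an absolute
// grammar URL and append the type parameter described above.
function absolute_grammar_url($host, $path, $scheme = 'http') {
    // Normalize so exactly one slash separates the host from the path.
    $url = $scheme . '://' . rtrim($host, '/') . '/' . ltrim($path, '/');
    return $url . ';type=application/grammar-xml';
}

// Example values only; substitute your own host and grammar script.
$grammar = absolute_grammar_url('example.com', '/directory/grammar.php');
echo $grammar, PHP_EOL;
```

The resulting string is what you would hand to the library wherever it expects the grammar reference.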

Now let’s have a look at this grammar file.

This simplistic example demonstrates how to use the PHPGrammar library. Note the simple array structure that is being used to hold the details of employees for our fictitious company. This could very easily be replaced with a dip into a data source of pretty much any kind, like an LDAP directory or database holding employee details.

Also note in this example that we want to do something referred to as Semantic Interpretation. Our grammar file is a set of rules that will be applied to what the caller says – Semantic Interpretation (SI) dictates the value that is given to our application from the grammar when a successful match occurs.

In this example, we want the caller to be able to say the name of the person they want to be transferred to. We make the first name optional, so they may say either the last name of the person or the full name. Obviously this may need to be changed based on the size of the directory to render in a grammar file (e.g., multiple employees with the same last name).

Do note that the Tropo platform seems to require the “Script” syntax for returning SI values on a successful match as opposed to the “String Literal” syntax. (More on these alternatives here.)

Works on Tropo (Script syntax):
<item>foo<tag>out="bar";</tag></item>

Does not work on Tropo (String Literal syntax):
<item>foo<tag>bar</tag></item>
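To make the shape of such a grammar concrete, here is a rough sketch that builds it with plain string concatenation. It does not use the PHPGrammar library's API; the employee names, phone numbers, rule id and the build_items() helper are all invented for illustration. Each entry gets an optional first name, a required last name and a Script-syntax tag that returns the phone number.

```php
<?php
// Illustrative data only: last name => first name and phone number.
$employees = array(
    'Smith' => array('first' => 'Jane', 'phone' => '4075550100'),
    'Jones' => array('first' => 'Bob',  'phone' => '4075550101'),
);

// Hypothetical helper: one <item> per employee, using Script-syntax tags.
function build_items(array $employees) {
    $items = '';
    foreach ($employees as $last => $person) {
        $items .= '<item>'
                . '<item repeat="0-1">' . htmlspecialchars($person['first']) . '</item> '
                . htmlspecialchars($last)
                . '<tag>out="' . $person['phone'] . '";</tag>'
                . '</item>' . "\n";
    }
    return $items;
}

// Wrap the items in a standard SRGS XML grammar with a single root rule.
$grammar = '<?xml version="1.0"?>' . "\n"
    . '<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"'
    . ' version="1.0" root="directory" tag-format="semantics/1.0">' . "\n"
    . '<rule id="directory"><one-of>' . "\n"
    . build_items($employees)
    . '</one-of></rule>' . "\n"
    . '</grammar>' . "\n";

echo $grammar;
```

Swapping the hard-coded array for a database or LDAP lookup would leave the rest of the sketch unchanged.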

So, when a caller says the name of a person in our company directory we want to return the number for that person to our Tropo script so we can transfer the call to them. This can clearly be seen when we examine the Result object that is delivered by the Tropo platform.

Tropo’s Result object includes the full grammar engine output, and lots of very detailed information about the recognition. As you can see, the utterance that the speech recognition engine heard was the name of one of our faux employees. The value that was returned is the number of that person.

We use this value in the transfer_call() method of our Tropo script.


// Create a new instance of the Result object.
$result = new Result();

// Get the value of the selection the caller made.
$phone = $result->getValue();

// Create a new instance of the Tropo object and transfer the call.
$tropo = new Tropo();
$tropo->transfer('+1'.$phone);

// Write out the JSON for Tropo to consume.
$tropo->RenderJson();

Using the PHP WebAPI library, it takes just 5 lines of code (excluding comments) to get the value of the grammar result and transfer the call. How cool is that?!

Obviously there are lots of things that can be done to enhance this script, to make it more robust, but it illustrates the essential concepts of speech recognition in the cloud.

What’s more, because of all of the great functionality provided by the Tropo cloud platform we can really push the envelope on the tired old company directory:

  • We could take an inbound call from a Skype user and transfer to a cell phone (or a SIP endpoint).
  • We could let our caller select a department in our company and then ring several different numbers at once, transferring the call to the first one answered (sort of a “hunt group in the cloud”).
  • We could use Tropo’s built in IM capabilities to send a screen pop to the person receiving the call.

The sky is the limit. Which I guess is the point of cloud telephony…


Building Cloud Communication Apps with Tropo: Part 3

This post is a continuation of the series on building cloud communication applications with Tropo, the PHP WebAPI Library and the Limonade framework for PHP.

If you’re just starting, you can take a look back at part 1 and part 2 to get caught up.

In this post, we’ll continue our work from the last post and complete a simple, yet powerful multi-channel application that can be accessed via telephone, SMS or IM client.

In the previous post, we looked closely at the Session and Result objects – these are JSON objects that are sent to your application by the Tropo platform that contain information about how a user is accessing your app (i.e., through which channel) and any input they have provided in response to prompts. If you worked through the last post, you have a partially complete script that looks like this:

You should save this script to a server that can be accessed by the Tropo platform – any web hosting platform that supports PHP >= 5.2.0 will do. Let’s call our script get_zip_code.php.

When you set up the start URL for this script in the Tropo Application Manager, you’ll want to structure it like so:

http://name_of_my_host.com/path/to/get_zip_code.php?uri=start

As you can see, we’ve added a querystring parameter called uri. This will ensure that the initial HTTP POST to this script by the Tropo platform matches our /start pattern and executes our zip_start() method, which is where we want users to begin. Make sure you review the Limonade documentation on setting up routes, as there are multiple options for configuring route pattern matching.
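Conceptually, the uri parameter is just a key used to pick a handler. The sketch below shows that idea in plain PHP; it is not Limonade's implementation, and resolve_handler() and the $routes table are hypothetical names used only for illustration.

```php
<?php
// Conceptual sketch only: Limonade does this work for you via dispatch_post().
$routes = array(
    '/start' => 'zip_start',
    '/end'   => 'zip_end',
    '/error' => 'zip_error',
);

// Hypothetical helper: map the "uri" querystring value to a handler name.
function resolve_handler(array $routes, array $query) {
    $uri  = isset($query['uri']) ? $query['uri'] : '';
    $path = '/' . ltrim($uri, '/');
    return isset($routes[$path]) ? $routes[$path] : null;
}

// A request for get_zip_code.php?uri=start resolves to zip_start.
echo resolve_handler($routes, array('uri' => 'start')), PHP_EOL;
```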

Next, we’ll want to start modifying our partially constructed script. First go to step 6 in the zip_start() method, where we had set up a PostBin URL for Tropo to send a user’s input to so we could examine the Result object. Now that we know what the Result object looks like, we want to start using it to look up information and present it to the caller.

You’ll want to set up a URL to the get_zip_code.php script that will match the route for the zip_end() method. This is where we will access the Tropo Result object and process it. Change the URL in the “next” array element to look like this:

$tropo->on(array("event" => "continue", "next" => "get_zip_code.php?uri=end", "say" => "Please hold."));

This change tells Tropo that when the “continue” event is raised (after the caller has completed entering input) POST the Result object back to the get_zip_code.php script using a relative URL and a querystring parameter that will ensure matching of our /end pattern.

Next, we need to build out the zip_end() method to process the results:

dispatch_post('/end', 'zip_end');
function zip_end() {

    // Step 1. Create a new instance of the result object
    $result = new Result();
    $zip = $result->getValue(); // get the value of the user input.

    // Step 2. Get weather information for the zip code the caller entered.
    $weather_info = getWeather($zip);
    $city = array_pop($weather_info);

    // Step 3. Create a new instance of the Tropo object.
    $tropo = new Tropo();

    // Step 4. Begin telling the user the weather for the city their zip code is in.
    $tropo->say("The current weather for $city is...");

    // Step 5. Iterate over an array of weather information.
    foreach ($weather_info as $info) {
        $tropo->say("$info.");
    }

    // Step 6. Say thank you (never hurts to be polite) and end the session.
    $tropo->say("Thank you for using Tropo!");
    $tropo->hangup();

    // Step 7. Render the JSON for the Tropo WebAPI to consume.
    return $tropo->RenderJson();

}

As you can see, our zip_end() method looks similar to our zip_start() method – both use a Tropo object to format information that will be presented to the user, and both call the RenderJson() method of the Tropo object at the end.

You may be wondering about the getWeather() method that is called in step 2. Let’s build that out now and examine how it works – to keep things simple, we’ll make use of the Google Weather API, which provides weather information by zip code and returns the information in XML format.

// The URL to the Google weather service. Renders as XML doc.
define("GOOGLE_WEATHER_URL", "http://www.google.com/ig/api?weather=%zip%&hl=en");

// A helper method to get weather details by zip code.
function getWeather($zip) {

	// Replace the full %zip% placeholder (including the trailing %) with the zip code.
	$url = str_replace("%zip%", $zip, GOOGLE_WEATHER_URL);
	$weatherXML = simplexml_load_file($url);
	$city = $weatherXML->weather->forecast_information->city["data"];
	$current_conditions = $weatherXML->weather->current_conditions;
	$current_weather = array(
		"condition" => $current_conditions->condition["data"],
		"temperature" => $current_conditions->temp_f["data"]." degrees",
		"wind" => formatDirection($current_conditions->wind_condition["data"]),
		"city" => $city
	);
	return $current_weather;

}

// A helper method to format directional abbreviations.
function formatDirection($wind) {
	$abbreviated = array(" N ", " S ", " E ", " W ", " NE ", " SE ", " SW ", " NW ");
	$full_name = array(" North ", " South ", " East ", " West ", " North East ", " South East ", " South West ", " North West ");
	return str_replace($abbreviated, $full_name, str_replace("mph", "miles per hour", $wind));
}

The mechanics of these functions are pretty straightforward, so I won’t go into too much detail – you can now see the connection between the call to the getWeather() method mentioned above and the array of weather data that it returns.
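For a quick feel of how the pieces fit, here is a small standalone sketch that runs without calling any weather service. formatDirection() is repeated from above so the snippet runs on its own; the wind string, conditions and city are made-up sample values shaped like the array getWeather() builds.

```php
<?php
// Same helper as above, repeated so this snippet is self-contained.
function formatDirection($wind) {
    $abbreviated = array(" N ", " S ", " E ", " W ", " NE ", " SE ", " SW ", " NW ");
    $full_name = array(" North ", " South ", " East ", " West ", " North East ", " South East ", " South West ", " North West ");
    return str_replace($abbreviated, $full_name, str_replace("mph", "miles per hour", $wind));
}

// Made-up sample data; "city" is deliberately the last element.
$weather_info = array(
    "condition"   => "Partly Cloudy",
    "temperature" => "72 degrees",
    "wind"        => formatDirection("Wind: NE at 5 mph"),
    "city"        => "Orlando, FL",
);

// array_pop() removes and returns the last element, which is the city.
$city = array_pop($weather_info);

echo "The current weather for $city is...", PHP_EOL;
foreach ($weather_info as $info) {
    echo "$info.", PHP_EOL;
}
```

Because zip_end() relies on array_pop() to pull out the city, the "city" key has to stay last in the array that getWeather() returns.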

The last thing we need to do in order to complete our zip code weather demo script is to finish the zip_error() method. This is a method we’ll use to tell a user an error occurred (never hurts to be prepared for the unexpected):

dispatch_post('/error', 'zip_error');
function zip_error() {

	// Step 1. Create a new instance of the Tropo object.
	$tropo = new Tropo();

	// Step 2. This is the last thing the user will be told before the session ends.
	$tropo->say("Please try your request again later.");

	// Step 3. End the session.
	$tropo->hangup();

	// Step 4. Render the JSON for the Tropo WebAPI to consume.
	return $tropo->renderJSON();
}

In order for this method to be invoked, we need to make sure that we set up the proper handler in our zip_start() method for it. The Tropo WebAPI makes it possible to set up callback methods that handle things when certain events are raised. This is done by using the On object.

Setting up an event handler using the On object with the PHP WebAPI Library is easy. In fact, we’ve already done it once – look at the zip_start() method and you’ll see a handler for the “continue” event (which is raised when a user has finished entering the proper input). We want to set up something similar for when an error event is raised. Let’s add a handler in our zip_start() method for an error event:

	// Step 6. Tell Tropo what to do when the user has entered input, or if there is an error.
	$tropo->on(array("event" => "continue", "next" => "get_zip_code.php?uri=end", "say" => "Please hold."));
	$tropo->on(array("event" => "error", "next" => "get_zip_code.php?uri=error", "say" => "An error has occurred."));

Our script is now complete and ready to test.

Make sure you log into your Tropo account and set up the start URL to your script as discussed above. You can test this script with the phone numbers that are automatically provisioned by Tropo when you set up your account.

Tropo will automatically provision a Skype number, a SIP number and an iNum. You can additionally add a PSTN number in a range of different area codes at no charge. This PSTN number can also be used to send an SMS to, so you can interact with this script via text message. Additionally, you can add an IM account, so you can test this script using your favorite IM client/network.

You may notice, if you test this script using SMS or IM, that there are things that don’t yet work perfectly. In the next post, we will make some very simple changes to this script to optimize it for use with SMS and IM (and even Twitter!).

This will transform our simple PHP script into a powerful unified communications application.

Stay tuned…

Building Cloud Communication Apps with Tropo: Part 1

A few months back, I wrote a series of posts on building NoSQL telephony applications with Tropo and CouchDB. Today I’m going to start a continuation of that series, focusing on how to build cutting edge cloud communications apps with the Tropo WebAPI.

What is the Tropo WebAPI?

The Tropo WebAPI is, in a nutshell, an HTTP/JSON API for building multi-channel communication applications – applications that you interact with via phone, IM, SMS or Twitter. While my earlier series on Tropo focused on building applications in Tropo’s scripting environment (another fine option for developers), this series will focus on building JSON-based applications (generated using PHP) that can be hosted anywhere and executed in the Tropo cloud environment.

Faithful readers will recognize some similarities here to a post I did a while back on the HTTP/JSON API provided by CloudVox, another cloud telephony provider. While the concept behind these two APIs is very similar, there are some key differences that make Tropo a highly attractive option for developers.

First, the Tropo service is truly multi-channel – using the Tropo WebAPI you can build applications that work on a range of different communication channels, not just phones (although you can build some pretty slamming phone apps as well).

Since I’m a phone app developer at heart, some of the features that Tropo provides for phone applications really get me excited. Tropo supports both DTMF entry and speech recognition. It also has broad multilingual support. In addition, Tropo gives phone application developers the ability to manipulate SIP headers, an important feature in building sophisticated cloud communication apps that I hope to demonstrate down the road a bit.

Getting Started

Head on over to Tropo.com and set up a new account (if you don’t have one already). Take a little time to review the documentation for the Tropo WebAPI. For the example applications in this series of blog posts I’ll be using a PHP class library I developed specifically to interact with the Tropo WebAPI.

The crew behind Tropo have provided a Ruby Gem for interacting with the Tropo WebAPI. However, since I like to do my cloud telephony work with PHP I decided to write my own set of classes for doing this. Whether you’re a Ruby-head or a PHP enthusiast, using one of these tools to generate JSON for consumption by the Tropo WebAPI can make building an application significantly easier, particularly as you get into more sophisticated application development.

You can get the PHP Library, as well as some of the sample apps we’ll be looking at, from GitHub:

$ git clone git://github.com/tropo/tropo-webapi-php.git

You’ll need to host these classes and the PHP scripts you write with them on a server that can be accessed from the Tropo environment. Any web server that supports PHP will do.

My First Tropo WebAPI Application

Let’s start with the standard Hello World app:


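<?php
require 'tropo.class.php'; // Tropo classes from the GitHub checkout; adjust the path for your setup.

// Create a new instance of the Tropo object and say something.
$tropo = new Tropo();
$tropo->Say("Hello World!");

// Render the JSON for the Tropo WebAPI to consume.
$tropo->RenderJson();

?>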

You can look at the rendered JSON in your browser, and you should see something like this:


{
    "tropo": [
        {
            "say": [
                {
                    "value": "Hello World!"
                }
            ]
        }
    ]
}
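If you are curious what the library is doing for you, the same structure can be produced by hand with PHP's built-in json_encode(). This is only an illustrative sketch of the payload shape shown above, not how the PHP WebAPI Library is implemented.

```php
<?php
// Build the same structure as the rendered JSON above with nested arrays.
$payload = array(
    'tropo' => array(
        array(
            'say' => array(
                array('value' => 'Hello World!'),
            ),
        ),
    ),
);

// json_encode() emits compact JSON; the content matches the output above.
$json = json_encode($payload);
echo $json, PHP_EOL;
```

Hand-rolling arrays like this gets tedious quickly, which is why a class library becomes more useful as applications grow.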

Go to the Applications section in your Tropo account and set up a new WebAPI application that points to the location of this script.

Create a new Tropo WebAPI application

Assign a URL to your new Tropo WebAPI application

When you create your application, Tropo will automatically provision a Skype number, a SIP number and an iNum. You can additionally add a PSTN number in a range of different area codes at no charge.

You may also notice the section below the provisioned phone numbers entitled “Instant Messaging Networks” – this section allows you to set up any number of different IM accounts (and Twitter!) that your application can use. We’ll dive deeper into this in future posts.

For now, we’ll keep it simple and use the auto provisioned Skype number – when you call this number, you will hear it say “Hello World.”

The next post in this series will focus on a more sophisticated application that uses the TropoPHP classes and the utterly awesome Limonade PHP framework.

Stay tuned…

User Agent Sniffing for Multi-Channel Apps

Several months ago, voice platform company Voxeo announced an exciting addition to its industry leading VoiceXML platform – the ability to repurpose existing VoiceXML applications and turn them into instant messaging (IM) applications and text messaging (SMS) applications.

And while I’ve had several opportunities to expound on the importance of this announcement, I have not yet had time to use a practical example to demonstrate just how powerful the new functionality Voxeo is now offering can be.

How big a deal is this really?

To be honest, when I first read about Voxeo’s new offering, I didn’t think it was a big deal. I thought it was a freaking huge deal – I’m talking Godzilla, baby. This is big!

Lots of people are wary of proprietary extensions to open standards like VoiceXML because they can make your code less portable (if you find you need to transition from one vendor to another). But unlike other platform-specific extensions to VoiceXML, Voxeo’s new feature doesn’t make you write VoiceXML in a way that violates the standard, it changes the way that the platform consumes VoiceXML.

The example that I will demonstrate below is written in 100% standard-compliant VoiceXML. It can be run on any VoiceXML platform that adheres to the VoiceXML spec, and probably a bunch that don’t.

That’s why this announcement from Voxeo is a game changer for developers – it provides sophisticated and powerful new functionality without the burden of having to alter your code in a way that makes it less portable. In addition, it allows developers to create sophisticated applications that can be delivered to users via IM and SMS using all the familiar tools of VoiceXML.

All of the benefits of working with VoiceXML are available for developers that want to write SMS and IM apps – grammars, the FIA, the standard root-leaf structure of VoiceXML applications – it’s all there.

The nose knows

While it might be tempting to think that any existing VoiceXML application can be repurposed with no changes for IM and/or SMS, this is probably not true except for the most simplistic of apps. Just as there are certain types of interactions that may make sense for use with visual web applications and not with voice apps, there are certain types of dialog flows that work well in voice apps but don’t necessarily translate well to the world of IM and SMS.

Consider the following:

  • Phone applications often employ a confirmation dialog, to prompt a caller to verify what they have just entered – particularly if the information is important or sensitive (e.g., a credit card number, or account number). This is probably not a good practice with SMS or IM apps because, unlike with a phone application, a user can see what information they are about to submit before they actually send it. Also, since multiple text messages can mean increased charges for a user, it’s a good idea to cut down on this where possible.
  • DTMF-based phone applications often use prompts like “Press 1 to repeat, press 2 to go to the previous menu…” The notion of “pressing” something is really tied to the telephone key pad and is probably not appropriate for SMS or IM applications. It’s much clearer to use prompts like “Enter 1 to go to the next option”, or “Send #back to go to the previous step”.
  • Re-prompting a user on noinput is a pretty standard practice in phone applications, but is probably not a good idea in IM or SMS apps. If a person is using their cell phone to interact with an SMS application, and then gets a call, they probably do not want to be repeatedly prompted for input by your app that they will not have an opportunity to enter until after their call has finished.

So if all of this is true, then how does a developer determine how an application is being accessed? It seems pretty clear that there needs to be a way to determine if an application is being called from a phone, from an SMS cell phone or from an IM client.

Turns out, there is – good old fashioned browser sniffing.

A simple example

The Voxeo blog has a good overview of this new feature, and how you can set up and run a VoiceXML application that is also IM and SMS enabled.

One of the things that is really nice about this new feature of the Voxeo Prophecy platform is that the text that is sent to a user via SMS or IM, and all of the inputs sent back from the user, follow the same logic rules as with a phone interaction. This means that if there are VoiceXML elements with conditional attributes on them, they get evaluated when rendering text for SMS and IM, just like they do for the telephone.

Another really nice feature is that Voxeo lets you deploy a single number for SMS and traditional phone access to your application. So if you dial a provisioned number on your telephone, the traditional VoiceXML browser engages and executes your code in the typical fashion. If you send an SMS message to the number, the new Prophecy 10 browser engages your code and manages the SMS interaction.

Because of this, it’s fairly straightforward to detect which browser is accessing your code, and create a simple variable declaration that will govern how your output is rendered.

The following is a simple PHP class that can be used to sniff the browser type requesting a specific file.

To use this class, we simply include it in our PHP page that will render VoiceXML, determine what kind of user agent is requesting the page by calling getChannelType() and set a VoiceXML variable accordingly.

If an SMS or IM client is interacting with our application, the Prophecy 10 browser will make the request. If it’s a standard telephone, it will be the Prophecy 8 browser, so we really just need to use the value of $_SERVER['HTTP_USER_AGENT'] to guide our app.
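A rough sketch of what such a sniffer might look like follows. The class name ChannelSniffer, the constants and especially the 'Prophecy/10' marker string are assumptions made for illustration; check the actual User-Agent values your platform sends before relying on any particular substring.

```php
<?php
// Hypothetical sniffer: classifies a request as text (SMS/IM) or voice
// based on a marker substring in the User-Agent header.
class ChannelSniffer {
    const VOICE = 'voice';
    const TEXT  = 'text';

    private $textMarker;

    // $textMarker: substring assumed to identify the SMS/IM browser.
    public function __construct($textMarker) {
        $this->textMarker = $textMarker;
    }

    // Return TEXT when the marker is present (case-insensitive), else VOICE.
    public function getChannelType($userAgent) {
        return (stripos($userAgent, $this->textMarker) !== false)
            ? self::TEXT
            : self::VOICE;
    }
}

// 'Prophecy/10' is a placeholder marker, not a documented value.
$sniffer = new ChannelSniffer('Prophecy/10');
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$channel = $sniffer->getChannelType($userAgent);

// Expose the result to the dialog as a VoiceXML variable.
echo '<var name="channel" expr="\'' . $channel . '\'"/>', PHP_EOL;
```

Defaulting to voice when the marker is absent keeps the traditional phone behavior as the fallback.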

You’ll notice that if an SMS or IM client is accessing our app, we skip the confirmation field. We’ve also customized the reprompt logic on noinput to ensure that a user does not get successive SMS or IM messages telling them “Sorry, I did not get your response.”

So with a few lines of server side code, we’ve custom tailored our VoiceXML dialog to ensure that it renders properly regardless of the type of user agent accessing it.

Voxeo Prophecy’s new features are powerful, and VoiceXML developers should take notice. With Prophecy, they can leverage their skills and become crackerjack SMS and IM app developers as well.

Wanted: One IM/VoIP Client That Does it All

Vacancy Description:

There is an immediate opening in my life for a smart, fast, next-generation IM client that can integrate with multiple social networks, standard XMPP servers, email and POP accounts and VoIP services.

Candidates already evaluated:

Digsby: Digsby is an IM client that connects to AIM, MSN, Yahoo, ICQ, Google Talk, Jabber, and Facebook Chat.

Pros:

  • Can be used to manage e-mail on different networks (Hotmail, Gmail, Yahoo Mail, AOL/AIM Mail, IMAP, and POP accounts).
  • Integrates with social networks like Facebook (receive friend requests, etc.) and Twitter (change status).

Cons:

  • Windows only (for now).
  • Can’t use for VoIP phone / video calls.

WengoPhone: A SIP client that also integrates with IM networks such as Google and AIM.

Pros:

  • Multi-platform (runs on Windows, Linux or Mac).
  • Supports SIP.
  • Support for AIM and Jabber IM accounts.

Cons:

  • Can’t integrate with social networks like Facebook.
  • Can’t be used to manage e-mail accounts.

Meebo: A web-based service that integrates with a host of different IM and social networks, including Facebook and MySpace.

Pros:

  • Web-based — no install required, and can be accessed from anywhere.
  • Can connect to AIM, Yahoo, Google, MySpace, Facebook, Jabber and ICQ accounts.

Cons:

  • Can’t use for VoIP phone / video calls.
  • Ad supported, so I get asked if I want to watch Miley Cyrus videos on occasion (who doesn’t though…)

This position will remain open until a suitable candidate has been found.