Make the Cloud Listen (and Understand)

Yesterday I wrote a post about the changing cloud telephony landscape, and highlighted some key factors that will dictate which cloud telephony providers are around for the long haul and deliver the next innovations.

One of those factors – support for speech recognition – is a good differentiator for developers to use when choosing a cloud telephony platform.

Speech recognition is becoming increasingly important in our everyday lives. Smartphones and powerful handheld devices enable multimodality, and there are more and more restrictions placed on our use of phones while doing other tings (like driving).

Plus, I can’t think of a more deflating concept than a cloud telephony provider that allows developers to build sophisticated apps and mashups in the language of their choice but that chains users of those apps to a telephone keypad. No fun.

To give an example of how powerful speech recognition can be, and how easy it is to use with a cloud telephony provider that supports it, I worked up a small demo to illustrate the point. The sample code for this demo is on Github, and we’ll dive into it in more detail below.

This demo uses two PHP libraries that are designed to work with the Tropo platform (one of the only cloud telephony providers to support speech recognition):

If you’ve read any of my previous posts on build applications for the Tropo platform, you’ll see lots of similarities between this and previous sample apps. Here I continue my use of the insanely awesome Limonade Framework for PHP.

Let’s take the example of a company directory that allows callers to dial a single number, select a person or department at the company and then be transferred to the person they select.

With cloud telephony, there is no need to have such a system live on a machine in the server room – it can be hosted externally in the cloud, making it easier to manage and to scale. In addition, with the Tropo Platform, it doesn’t have to be the same tired old DTMF-based menu telling callers to press an extension number or to “dial by name…”.

Using the PHP WebAPI Library and Limonade, we can construct a simple, yet power script that looks like this:

This script is pretty self-explanatory, but there are some key points I want to emphasize. First, note the $options array that holds the reference to an external grammar file (more on that in a bit). Tropo seems to need for this reference to be an absolute one and not a relative reference to the file (not hard to do with PHP – you just need to be aware of it).

Also, the file reference needs to include a trailing parameter indicating that this is an XML grammar (;type=application/grammar-xml). This seems to be true even if the grammar file is served with the correct MIME type by whatever is serving it.

Now lets have a look at this grammar file.

This simplistic example demonstrates how to use the PHPGrammar library. Note the simple array structure that is being used to hold the details of employees for our fictitious company. This could very easily be replaced with a dip into a data source of pretty much any kind, like an LDAP directory or database holding employee details.

Also note in this example that we want to do something referred to as Semantic Interpretation. Our grammar file is a set of rules that will be applied to what the caller says – Semantic Interpretation (SI) dictates the value that is given to our application from the grammar when a successful match occurs.

In this example, we want the caller to be able to say the name of the person they want to be transfered to. We make the first name optional so they may either say the last name of the person or (optionally) the full name. Obviously this may need to be changed based on the size of the directory to render in a grammar file (e.g., multiple employees with the same last name).

Do note that the Tropo platform seems to require the “Script” sytax for returning SI values on a successful match as opposed to the “String Literal” syntax. (More on these alternatives here.)

Works on Tropo (Script syntax):
<item>foo<tag>out="bar";</tag></item>

Does not work on Tropo (String Literal syntax):
<item>foo<tag>bar</tag></item>

So, when a caller says the name of a person in our company directory we want to return the number for that person to our Tropo script so we can transfer the call to them. This can clearly be seen when we examine the Result object that is delivered by the Tropo platform.

Tropo’s Result object includes the full grammar engine output, and lots of very detailed information about the recognition. As you can see, the utterance that the speech recognition engine heard was the name of one of our faux employees. The value that was returned is the number of that person.

We use this value in the transfer_call() method of our Tropo script.


// Create a new instance of the Result object.
$result = new Result();

// Get the value of the selection the caller made.
$phone = $result->getValue();

// Create a new instance of the Tropo object and transfer the call.
$tropo = new Tropo();
$tropo->transfer('+1'.$phone);

// Write out the JSON for Tropo to consume.
$tropo->RenderJson();

Using the PHP WebAPI library, it takes just 5 lines of code (excluding comments) to get the value of the grammar result and transfer the call. How cool is that?!

Obviously there are lots of things that can be done to enhance this script, to make it more robust, but it illustrates the essential concepts of speech recognition in the cloud.

What’s more, because of all of the great functionality provided by the Tropo cloud platform we can really push the envelope on the tired old company directory:

  • We could take an inbound call from a Skype user and transfer to a cell phone (or a SIP endpoint).
  • We could let our caller select a department in our company and then ring several different numbers at once, transferring the call to the first one answered (sort of a “hunt group in the cloud”).
  • We could use Tropo’s built in IM capabilities to send a screen pop to the person receiving the call.

The sky is the limit. Which I guess is the point of cloud telephony…

Advertisements

Building Cloud Communication Apps with Tropo: Part 1

A few months back, I wrote a series of posts on building NoSQL telephony applications with Tropo and CouchDB. Today I’m going to start a continuation of that series, focusing on how to build cutting edge cloud communications apps with the Tropo WebAPI.

What is the Tropo WebAPI?

The Tropo WebAPI is, in a nutshell, an HTTP/JSON API for building multi-channel communication applications – applications that you interact with via phone, IM, SMS or Twitter. While my earlier series on Tropo focused on building applications in Tropo’s scripting environment (another fine option for developers), this series will focus on building JSON-based applications (generated using PHP) that can be hosted anywhere and executed in the Tropo cloud environment.

Faithful readers will recognize some similarities here to a post I did a while back on the HTTP/JSON API provided by CloudVox, another cloud telephony provider. While the concept behind these two API’s is very similar, there are some key differences that make Tropo a highly attractive option for developers.

First, the Tropo service is truly multi-channel – using the Tropo WebAPI you can build applications that work on a range of different communication channels, not just phones (although you can build some pretty slamming phone apps as well).

Since I’m a phone app developer at heart, some of the features that Tropo provides for phone applications really get me excited. Tropo supports both DTMF entry and speech recognition. It also has broad multilingual support. In addition, Tropo gives phone application developers the ability to manipulate SIP headers, an important feature in building sophisticated cloud communication apps that I hope to demonstrate down the road a bit.

Getting Started

Head on over to Tropo.com and set up a new account (if you don’t have one already). Take a little time to review the documentation for the Tropo WebAPI. For the example applications in this series of blog posts I’ll be using a PHP class library I developed specifically to interact with the Tropo WebAPI.

The crew behind Tropo have provided a Ruby Gem for interacting with the Tropo WebAPI. However, since I like to do my cloud telephony work with PHP I decided to write my own set of classes for doing this. Whether you’re a Ruby-head or a PHP enthusiast, using one of these tools to generate JSON for consumption by the Tropo WebAPI can make build an application significantly easier, particularly as you get into more sophisticated application development.

You can get the PHP Library, as well as some of the sample apps we’ll be looking at, from GitHub:

$ git clone git://github.com/tropo/tropo-webapi-php.git

You’ll need to host these classes and the PHP scripts you write with them on a server that can be accessed from the Tropo environment. Any web server that supports PHP will do.

My First Tropo WebAPI Application

Let’s start with the standard Hello World app:


Say("Hello World!");

// Render the JSON for the Tropo WebAPI to consume.
$tropo->RenderJson();

?>

You can look at the rendered JSON in your browser, and you should see something like this:


{
    "tropo": [
        {
            "say": [
                {
                    "value": "Hello World!"
                }
            ]
        }
    ]
}

Go to the Applications section in your Tropo account and set up a new WebAPI application that points to the location of this script.

Create a new Tropo WebAPI application

Assign a URL to your new Tropo WebAPI application

When you create your application, Tropo will automatically provision a Skype number, a SIP number and an iNum. You can additionally add a PSTN number in a range of different area codes at no charge.

You may also notice the section below the provisioned phone numbers entitled “Instant Messaging Networks” – this section allows you to set up any number of different IM accounts (and Twitter!) that your application can use. We’ll dive deeper into this in future posts.

For now, we’ll keep it simple and use the auto provisioned Skype number – when you call this number, you will hear it say “Hello World.”

The next post in this series will focus on a more sophisticated application that uses the TropoPHP classes and the utterly awesome Limonade PHP framework.

Stay tuned…

Customizing Caller ID with CiviCRM and trixbox

Recently, I had an opportunity to set up a CiviCRM site for a client to use to track contributors.

I think CiviCRM is a great piece of software that can be enormously useful to non-profit organizations, and I’m actually surprised that more don’t opt to use it. It’s powerful, robust, extensible and open source.

The site I set up is running a Red Hat-based LAMP stack with Drupal, CiviCRM and the CiviContribute module. I have the same set up mirrored in a local development environment that lets me test tweaks and upgrades to the site before they get pushed out into production. I also have an instance of trixbox CE running locally, and I decided to try and see if I could set up a trixbox PBX that uses CiviCRM as the data source for looking up caller information on inbound calls. This way, if a non-profit is using both CiviCRM and trixbox they could set their PBX to look up information about clients, donors, volunteers, etc. that call their offices and display this information in either a SIP client or in a screen pop.

The process of setting up trixbox to use CiviCRM as the data source for caller ID lookups couldn’t be easier (note, I used these steps on trixbox version 2.6.1.10):

  1. Log into your trixbox and enter admin mode.
  2. From the toolbar, select PBX >> PBX Settings.
  3. From the FreePBX left-hand side menu (under “Inbound Call Control”), select CallerID Lookup Sources
  4. In the section entitled “Add Source”, enter the Source Description (CiviCRM), and Source Type (MySQL). Using Cache Results will cache the CID lookup in AstDB. Depending on your own set up, you may or may not feel the need to enable this setting.
  5. In the section entitled “MySQL” (appears after you designate the Source Type), enter the Host, Database, Username and Password to access your CiviCRM database.
  6. The section entitled Query is where you will enter the SQL query that will return the information you want to display about on incoming calls. I used the following SQL (you can modify this as you like to alter your display information – note, the special token ‘[NUMBER]’ is replaced with the caller ID of the incoming call):

  7. SELECT CONCAT('CiviCRM: ',display_name) FROM civicrm_contact WHERE id = (SELECT id FROM civicrm_phone WHERE REPLACE(phone, '-', '') = SUBSTRING('[NUMBER]',-10))

  8. Click Submit Changes.
  9. Navigate to Inbound Routes (also under the “Inbound Call Control” section of the FreePBX menu).
  10. Select the route that you would like to use CiviCRM as the data source for.
  11. In the section entitled “CID Lookup Source”, select CiviCRM from the drop down menu.
  12. Click Submit.
  13. Scroll to the top of the page and click on the orange bar that says Apply Configuration Changes and reload the settings.

Thats it!

Now on an inbound call, the name of the CiviCRM contributor in the format CiviCRM: John Doe is displayed in my SIP client whenever a successful lookup occurs. There are all sorts of options that can now be used to display the caller information to non-profit staff, send a screen pop, route the call to a specific destination, etc.

CiviCRM and trixbox are a potent combination. I hope this brief explanation of how to use the two together gets more non-profits excited about using them.

Screen Pops with Asterisk and XMPP

A few weeks ago, I wrote a post on using CCXML and PHP to do screen pops with the Voxeo Prophecy platform. The task was made incredibly easy with a nifty PHP class library designed specifically to interact with XMPP servers.

After getting screen pops working in Prophecy, I decided to try my hand at getting them to work in Asterisk (this is, after all, how the majority of phone calls to me are handled). It turns out, the PHP script I developed to do screen pops in Prophecy can be reused to do the very same thing in Asterisk. If you’d like to give this a try on your own, here is what you will need:

  1. A working Asterisk instance. (This example assumes you know how to modify Asterisk config files.)
  2. A copy of the very cool XMPPHP library from Google Code. (This example assumes that the PHP code used to interact with XMPP servers is running on a separate server than the one housing Asterisk. More on this below.)
  3. An XMPP server and a client, or simply use your Google account.

There are three components to this example. First, you’ll need a PHP script to interact with an XMPP server. You can use the same script employed in my previous screen pop example:


connect();
$conn->processUntil('session_start');
$conn->message('someguy@gmail.com', 'You are receiving a call from: $ani');
$conn->disconnect();
?>

Next, you’ll need a simple AGI script for Asterisk to interact with. This script will send an HTTP request to the PHP script above:


#!/bin/bash
# Contents of screen_pop.agi
# It don't get any easier than this...
curl http://ip_address_to_your_web_server/screen_pop.php?ani="$1"

On your Asterisk server, place this script in /var/lib/asterisk/agi-bin/ and ensure that it is executable. You’ll need to specify the IP address to the server running this script. Note — there isn’t any reason that these two scripts can’t run on the same machine (you can run PHP scripts on an Asterisk server), or even be consolidated into one single script (just make sure to include the XMPPHP library). In fact, you could even run the XMPP server used to send screen pops on the same machine running Asterisk.

The mechanics of this specific example were influenced by the set up for my previous screen pop example, and I am (at heart) a lazy basterd. But I digress…

The last piece to be put in place is to modify extensions.conf to ensure that our AGI script is invoked. You’ll want to add something like the following to the appropriate dial plan context (your specific Asterisk set up will influence this heavily):

exten => 123,1,AGI(screen_pop.agi|${CALLERID(num)})

This will pass the caller ID to our AGI script, which will then invoke the PHP script and send the details of the call to the XMPP account of our choice. I’ve noticed in practice that adding this to my dial plan causes the IM to be sent to my Google Talk account a good 2-3 seconds before the phone rings. Plenty of time to give someone a heads up about who is on the other end of an incoming call.

Obviously there are lots of options for looking up information on the caller, once you have their caller ID, that you can use to augment the information in your screen pop.

Just goes to show you, there isn’t much you can’t do with open source / open standards.

Viva screen pop!

Accessing Web Services From VoiceXML

A few weeks ago, I posted about accessing web services from CCXML using PHP. This post will demonstrate how to do the same thing, only from VoiceXML. We’ll be using Voxeo Prophecy and PHP for this example. We’ll also be referring to the GreenPhone project — available free for download — for the sample code.

Before we dive in, its important to keep in mind that there are a number of different techniques for getting information from web services into a VoiceXML dialog. This is just one method — there are many others. Voxeo even has its own platform-specific way of accessing SOAP web services via JavaScript. Ultimately, the method you employ needs to be a good fit for the environment your working in and the requirements of your project.

Using the greenSoapClient Class

In the last post on this topic, I demonstrated how to use a simple PHP class as a way to access multiple SOAP-based web services from CCXML. This class forms the basis of our method for accessing web services from VoiceXML as well. However, in this instance, instead of using the CCXML <send/> element, we’ll use a VoiceXML subdialog.

Subdialogs in VoiceXML are typically used to create reusable dialog components for capturing common types of input, like a series of digits (e.g., credit card numbers, account numbers, etc). They can also be used to compartmentalize complex interactions with a caller and provide a simple interface for accessing results. By way of example, this is how the OSDMs from Nuance work, as well as the Targus service from Voxeo. We’ll borrow this approach to access a web service from StrikeIron that will send the details of an E85 or bio-diesel station to a cell phone via SMS.

Setting up our Subdialog

In order to send an SMS message with details on an E85 or bio-diesel station, we’ll need 2 things; the station details, and a cell phone number to send it to.

In order to send the details on a station from VoiceXML to PHP, we’ll pack it up in a pipe-delimited string called “detailsToSend” (I won’t go into too much detail about how this is done in this post — to learn more, refer to the GreenPhone Project code). The cell phone number we are sending to is obtained from the caller ID of the calling party, stored in a variable named “ani”. Details on how to access caller ID are given in a previous post.

Our subdialog call will look like this:

<form id="sendDetails">
 <catch event="error.badfetch">
  <prompt>
   There was a problem sending the station details to your phone.
   <break strength="weak"/>
  </prompt>
  <goto next="#goodbye"/>
 </catch>
<subdialog name="sendSMS" src="../php/sendStationDetails.php" namelist="ani detailsToSend">
 <prompt>
  Sending the station details to
  <say-as interpret-as="telephone"><value expr="ani"/></say-as>
 </prompt>
 <filled>
  <if cond="sendSMS.result==0">
   <prompt>Your message has been sent.<break strength="weak"/></prompt>
  <else/>
   <prompt>
    There was a problem sending the station details to your phone.
    <break strength="weak"/>
   </prompt>
  </if>
 <goto next="#goodbye"/>
 </filled>
</subdialog>
</form>

We use the attributes on the <subdialog> element to give our subdialog a name (which we’ll use to access the results sent back from PHP), to specify where to POST our variables to and also to specify which variables to POST.

You’ll also notice that we have set up a handler here for an “error.badfetch” event. This is a good habit to get into whenever you set up a request to an external resource (like a PHP script). If the script isn’t there or has problems, an “error.badfetch” event will get returned and unless you specified a handler for this event, your day will not end well.

Additionally, we’ve set up logic in our filled block to inspect the result of the subdialog call. We access the result as a property of the subdialog, using the name we set up in the <subdialog> element and the dot notation (“.”) familiar to JavaScript.

<if cond=”sendSMS.result==0″>

… code logic goes here …

</if>

With this in mind, our PHP script needs to send back a variable called “result”. How do we do this? Lets take a look at the PHP script:

A Simple Subdialog using PHP

The subdialog that we want to render is extremely simple — we only need to render enough VoiceXML to declare a variable called “result” and return it to the parent dialog. We’ll do this after we make our web service call to send the SMS message.

There are two pieces of information returned from the StrikeIron web service that we are interested in; a string that holds the response message from the service (i.e., “success”, “failure”, etc.) and a number indicating the outcome of the web service call.

We’ll take these two bits if information and assign them to PHP variables:


$result = $xml->soapHeader->ResponseInfo->ResponseCode;
$message = $xml->soapHeader->ResponseInfo->Response;

Now, we want to write out these variables in a simple VoiceXML subdialog:


<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
 <log>*** SMS response message was: <?php echo $message; ?>. ***</log>
 <block>
  <var name="result" expr="<?php echo $result ?>"/>
  <return namelist="result"/>
 </block>
</form>
</vxml>

As discussed above, this creates just enough VoiceXML to instantiate a variable and return it to the parent dialog. For good measure, we’ll write out the web service string (contained in the PHP variable $message) as a log statement, in case it contains information we want to look at later.

Why This Approach?

Using this technique for accessing web services from VoiceXML provides a couple of advantages. First, it allows us to completely separate the presentation layer (the VoiceXML) from the logic used to invoke the web service. This is a fairly standard design practice that makes creating the dialog much easier for a developer that does not necessarily know a whole lot about web services. With this approach, they don’t really need to — they only need to know that the subdialog call will return a variable called “result” whose value can be inspected to determine what to do next.

Additionally, because the parent dialog is just static VoiceXML it may be possible to cache it. Since the parent dialog isn’t dynamic, it can be cached for fast access, while the subdialog — which must be dynamic — is the only component sent from the web server to the VoiceXML platform each time a caller accesses the application. Careful design can yield additional caching opportunities that can make your applications more efficient and less bandwidth intensive.

In the next post, we’ll explore one additional method for accessing web service from VoiceXML. Stay tuned…