Using SoX on Headerless (VOX) Audio Files

Recently, I had occasion to use Sound eXchange (SoX), the open source sound processing program that lets you convert audio files into different formats, add sound effects to audio files, mix multiple audio files together, and a bunch of other cool stuff.

I really like this tool: it’s powerful, easy to use, and easy to incorporate into scripts for batch processing. I did, however, run into a snag when trying to convert audio files in “.vox” format (used on one VoiceXML platform) to “.wav” format (for use on another platform). I simply could not get SoX to convert my “.vox” files successfully. Every “.wav” file I generated was garbled nonsense that sounded only vaguely like human speech.

Trying things like…

~$ sox myfile.vox myfile.wav


~$ sox -r 8000 -t vox -c 1 myfile.vox myfile.wav

…failed miserably. I had a hard time figuring out why. I could find almost nothing helpful on the web.

According to the SoX documentation:

SoX attempts to determine the file type of input files automatically by looking at the header of the audio file. When it is unable to detect the file type or if its an output file then it uses the file extension of the file to determine what type of file format handler to use.

The thing about “.vox” files is that they are headerless — they don’t have descriptive information that can be read in by utilities that play these files. The utility has to be told things like sample rate, encoding format, etc. But based on my read of the documentation, I thought that SoX would pick this up through the “.vox” extension on the file.

After a number of unsuccessful attempts, I decided to try a different approach. The SoX documentation also states that “audio data can usually be totally described by four characteristics”:

  • The sample rate;
  • The precision the data is stored in;
  • The data encoding, and;
  • The number of channels

When I told SoX to treat the input file as a raw file (i.e., ignore the “.vox” extension), and then specifically told SoX each of the four characteristics of my file, it finally worked.

~$ sox -t raw -r 8000 -c 1 -U -b myfile.vox myfile.wav
  • The -t raw option tells SoX to treat the input file as raw (headerless) audio
  • The -r 8000 option tells SoX that the input file's sample rate is 8000 Hz
  • The -c 1 option tells SoX that the input file has one channel (mono)
  • The -U option tells SoX that the data encoding is u-law
  • The -b option tells SoX that each sample is one byte (8 bits) in size
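
Since SoX drops easily into scripts, the same command can be wrapped in a loop for batch conversion. What follows is only a minimal sketch, assuming every “.vox” file in a directory is 8000 Hz, mono, 8-bit u-law like mine. The function names (wav_name, convert_vox, convert_dir) are made up for illustration and are not part of SoX.

```shell
#!/bin/sh
# Sketch: batch-convert headerless u-law ".vox" files to ".wav".
# Assumes all inputs share the same parameters (8000 Hz, mono, u-law, 1 byte/sample).

# Derive the output name: foo.vox -> foo.wav (hypothetical helper)
wav_name() {
    printf '%s\n' "${1%.vox}.wav"
}

# Convert one file using the same flags as the working command above
convert_vox() {
    sox -t raw -r 8000 -c 1 -U -b "$1" "$(wav_name "$1")"
}

# Convert every ".vox" file found directly inside the given directory
convert_dir() {
    for f in "$1"/*.vox; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        convert_vox "$f"
    done
}
```

After sourcing the file, running convert_dir ./prompts would write a “.wav” next to each “.vox” in that directory. The -U and -b flags are the ones from the command above; SoX 14.x releases spell the same settings as -e mu-law -b 8, so the flags may need adjusting for the installed version.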

My generated “.wav” files are clear as a bell and sound great. In the end, it turned out not to be a SoX issue as much as a lack of information on how to use it properly with “.vox” files, which is a rather old format but is still used on many telephony platforms. I hope this post helps someone else down the line struggling with the same issue.
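
If you are not sure which parameters a headerless file really uses, one rough check is to compare its expected playing time with what you know about the recording. The helper below is a sketch under stated assumptions (raw_duration_secs is a made-up name, not a SoX feature): it estimates the length in whole seconds from the file size, sample rate, bits per sample, and channel count.

```shell
#!/bin/sh
# Sketch: estimate the playing time of a headerless audio file.
# Arguments: file, sample rate (Hz), bits per sample, channels.
# Result is truncated to whole seconds.
raw_duration_secs() {
    bytes=$(wc -c < "$1")
    # total bits in the file divided by bits consumed per second of audio
    echo $(( $bytes * 8 / ($2 * $3 * $4) ))
}
```

For example, a 16,000-byte file read as 8000 Hz, 8-bit, mono works out to 2 seconds. Read as 4-bit ADPCM (which, as far as I can tell, is what SoX's vox type assumes), the same file would come out at 4 seconds, so a duration that is off by a factor of two is a hint that the encoding guess is wrong.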

Phoning up some controversy

It’s interesting to see the part that automated outbound calling applications (or “robo calls” as they are more commonly referred to) are playing in the election taking place today.

Unfortunately, it looks like for the most part they are not being used very scrupulously:

Political messages are exempt from the federal do-not-call rules meant to discourage unwanted sales pitches. But a New Hampshire law prohibits making automated calls to people who are on the do-not-call list. The Republican committee agreed on Sunday to halt calls there at the urging of the state attorney general.

I haven’t gotten that many calls, but for the next election I’ll be getting my Asterisk setup configured to block this kind of nonsense.

Credit where credit is due

It’s nice to see that the State of Delaware is still receiving acclaim for the Access Delaware project that I helped start several years ago with Lt. Governor John Carney. After its initial launch, the project was selected as an award winner in the Accenture/MIT Digital Government Award program, one of the first national technology awards the State of Delaware ever received.

Thanks to a core group of dedicated programmers in the Department of Technology and Information, and the support of forward-thinking public officials like John Carney, the project continues to evolve and improve.

It’s interesting to go back now and listen to an interview I gave when the program was first launched. I’m amazed, and very proud, that the program is still being recognized. Now, if only more governments would follow this lead… 😉

Choice is good

This year at the polls, voters with visual impairments will have more options than ever. There was a nice article on this very issue waiting for me in my local newspaper this morning.

One of the downsides to the options that currently exist for such voters is the cost of the equipment:

Voting machines nationwide are being fitted with such technology, paid for through the federal Help America Vote Act.

Each conversion cost $1,000 and an added machine was bought for each site, said Elaine Manlove, administrative director, Department of Elections for New Castle County. Added storage space also had to be rented.

“This was no small change for us,” Manlove said.

One of the benefits of a centralized remote voting system using telephones is that it could be dramatically less expensive. It could also make the process of voting more efficient:

A downside is that hearing a whole ballot and directions may take 20 minutes, [Manlove] said.

One of the nice things about a phone voting system using speech recognition is that it would provide a much more intuitive interface, cutting down on the time needed to explain to voters how to use the equipment. Let’s face it, voting equipment is technology you use once per election. Telephones are used every day.

Vocal Democracy Released!

The alpha version of Vocal Democracy is now available for download from SourceForge. I’ve already got lots of changes planned, but I’m excited to get feedback on the state of the work so far.

This version was designed specifically to run on the Voxeo Prophecy platform. Going forward, I’ll be working on enhancements that will help ensure this package runs easily on a wide array of VoiceXML platforms.