Saturday, August 1, 2009

Julius Speech Recognition and Tools

I did not work on speech recognition engines for very long, around 3-4 months. I had to research which ASR engine was best suited to the idea we had in mind, and Julius topped our list. Sphinx 4 is also exciting because it is written in Java, and I expect it to mature further over time.
As I mentioned in my previous post, the Julius home page does not offer much beyond the source code and a handbook. To try Julius out right away, download the Julius Quick Start demo from VoxForge.

Julius is written in C and compiled with GCC, so if you are on Windows you will need Cygwin to compile, build, and run Julius from source.

Julius comes with a set of tools that are quite useful for building your SR system. The executables are in the 'bin' folder. I will cover only the 3 handy tools that we will use most.

  • 'Julius' - The main recognizer module, which does the actual speech recognition. To run as a speech recognizer, Julius needs a language model and an acoustic model. It uses an HMM acoustic model, and the language model can be a word N-gram, a grammar, or an isolated-word list. Input can be a waveform file, an MFCC (mfc) file, direct microphone input, or even voice data from the network. Note that for waveform file input, only WAV (no compression) and RAW (mono, 16 bit, big endian) are supported by default. There is also an option to pass a file that lists the input files to be recognized. Dozens of command-line options are available, and going through the Julius manual will give you a better idea of them. It is always better to put these options in a configuration file and pass that file to Julius.
  • 'adinrec' - This tool records voice in an audio format Julius accepts: 16 bit, 1 channel WAV. As with Julius, you can set the sampling frequency and record at rates as high as 48 kHz. The tool records each utterance as a single file.
  • 'adintool' - This tool is similar to adinrec but has more options. It accepts all the Julius options, but since it is only an audio tool, options that do not apply to it are skipped without any error. One interesting option is 'adinnet', which lets you run Julius in 'server mode'. With adinnet we specify a port number for Julius to listen on, and a server name using the -server option, so that Julius receives audio data directly from adintool for recognition. For example, Julius can run the recognition on the server side while you run an SR program on the client side, which makes real-time recognition possible.
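Since Julius only accepts uncompressed WAV or headerless 16 bit mono big-endian RAW by default, it can save some head-scratching to check your recordings before feeding them in. Below is a minimal Python sketch using only the standard library `wave` module. The helper names `is_julius_wav`, `wav_to_raw_big_endian`, and `write_filelist` are hypothetical (they are not part of Julius), and the one-path-per-line layout of the list file is my assumption about what Julius's file-list option expects, so check it against the manual.

```python
import os
import wave


def is_julius_wav(path):
    """Return True if `path` is an uncompressed 16 bit mono WAV file.

    The `wave` module only reads PCM data, so compressed or non-WAV
    files raise wave.Error and are reported as unsupported.
    """
    try:
        with wave.open(path, "rb") as w:
            return w.getsampwidth() == 2 and w.getnchannels() == 1
    except (wave.Error, EOFError):
        return False


def wav_to_raw_big_endian(wav_path, raw_path):
    """Write the samples of a 16 bit mono WAV as headerless big-endian RAW.

    Returns the sampling rate, since RAW files carry no header and the
    rate has to be given to Julius separately.
    """
    with wave.open(wav_path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError(f"{wav_path}: expected 16 bit mono PCM")
        rate = w.getframerate()
        le = w.readframes(w.getnframes())  # WAV stores samples little-endian
    be = bytearray(len(le))
    be[0::2] = le[1::2]  # swap the two bytes of every 16 bit sample
    be[1::2] = le[0::2]
    with open(raw_path, "wb") as f:
        f.write(be)
    return rate


def write_filelist(wav_dir, list_path):
    """Write one path per line for every acceptable WAV file in wav_dir.

    Assumption: the list file format is simply one input path per line.
    """
    paths = []
    for name in sorted(os.listdir(wav_dir)):
        path = os.path.join(wav_dir, name)
        if name.lower().endswith(".wav") and is_julius_wav(path):
            paths.append(path)
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return paths
```

You could then run something like `julius -C main.jconf -input rawfile -filelist files.txt`, where `main.jconf` is a hypothetical configuration file holding the rest of your options. Double-check the option names against the manual for your Julius version.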

In my next post we will look at a small example of running Julius in server mode. Explore Julius until then!