Showing posts with label julius. Show all posts

Saturday, August 1, 2009

Julius Speech Recognition and Tools

I have not worked on Speech Recognition Engines for very long, only around 3-4 months. I had to do some research on which ASR would be best suited for the idea we had in mind. Julius topped our list. Sphinx 4 is exciting as it's in Java, and I expect it to mature even more as time goes on.
As I mentioned in my previous post, the Julius home page won't give you much detail other than the source code and a handbook. To try your hand at Julius immediately, download the Julius Quick Start Demo from voxforge.

Julius is written in C and compiled using GCC. So if you are using Windows, you will need cygwin to compile, build, and run the Julius source.

Julius provides you with a set of tools which are pretty useful in building your SR system. These executables are found in the 'bin' folder. I am going to cover only 3 of those handy tools, the ones we will use the most.

  • 'Julius' - The main recognizer module, which does the actual speech recognition. Julius needs a language model and an acoustic model to run as a speech recognizer. The acoustic model is an HMM, and the language model can be a word N-gram, a grammar, or an isolated word list. Input can be a wave or mfc file, direct mic input, or even voice data from the network. Note that for waveform file input, only WAV (no compression) and RAW (mono, 16bit, big endian) are supported by default. You can also give Julius a file that lists the input files to be recognized. There are dozens of command line options available, and going through the Julius manual will give you a better idea of them. It's always better to put these options in a configuration file and pass that file to Julius.
  • 'adinrec' - This tool helps you record voice in an audio format that Julius accepts: 16 bit, 1 channel, WAV. Here too, like Julius, you can set the sampling frequency, recording even at 48 kHz. The tool records every utterance as a single file.
  • 'adintool' - This tool is similar to adinrec but comes with more options. All the Julius options can be set, but since it's just an audio tool, the ones that do not apply are skipped without any error. One interesting option is 'adinnet', which lets you use Julius in 'server mode'. With adinnet you specify the port number that Julius listens on and, using the -server option, the server name. Julius then receives audio data directly from adintool for recognition. Say the Julius recognizer is on the server side and you are running an SR program on the client side. This option lets you do real-time recognition.
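
The waveform input format noted in the list above (uncompressed WAV, 16 bit, 1 channel) can be checked before a file is handed to Julius. Here is a minimal sketch in Python that uses only the standard wave module. The function name check_julius_wav and the default expected_rate of 16000 Hz are my own assumptions for illustration, not something from the Julius tools, so set the rate to whatever your acoustic model was trained on.

```python
import wave


def check_julius_wav(path, expected_rate=16000):
    """Return a list of problems; an empty list means the file looks usable.

    expected_rate is an assumption for this sketch. Use the sampling
    frequency that matches your own acoustic model.
    """
    try:
        # wave only reads uncompressed PCM WAV, so a compressed or
        # non-WAV file is reported here as an error.
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            sample_width = wav.getsampwidth()  # bytes per sample
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable uncompressed WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected 1 channel (mono), found {channels}")
    if sample_width != 2:
        problems.append(f"expected 16 bit samples, found {sample_width * 8} bit")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

Calling check_julius_wav("utterance.wav") on a recording (the file name is only an example) returns an empty list when the file is mono, 16 bit, and at the expected rate, and otherwise a list of messages saying what to fix.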

In my next post we will see a small example on how to run Julius in server mode. Explore Julius till then !


/A

Wednesday, June 17, 2009

A Dive into Japanese LVCSR Engine

In my previous post, I mentioned switching to Julius from Sphinx. I had no idea what was in store for me. Googling for 'Speech Recognizer' gives you CMU Sphinx in 8 out of 10 results. I found Julius through a wiki link. All I wanted was an Open Source Speech Recognizer, and Julius seemed to be the best remaining option to dig into and see if I could make use of it.

I must say using Julius was not easy, but the end results I got from it were great.
Let me highlight some initial problems you will face when you go for Julius.
  • Julius is an Open Source Japanese Speech Recognizer. It has been developed as a Japanese LVCSR engine since 1997. The home page is available in both English and Japanese.
  • The site has user documentation which was originally written in Japanese. An English version is still under development. But do not worry, Google Translator comes to our rescue. Here is the translated English version of Julius Book.
  • Being a Japanese recognizer, it only had an acoustic model for the Japanese language. But good people are present all over the web. The VoxForge project is working on the creation of an open-source acoustic model for the English language.
  • If you go to the Julius home site, you might get lost after downloading the source code or binaries and reading bits and pieces of info. I suggest you start by downloading the Julius Quick Start from Voxforge. It is based on version 3.5.2 of Julius, but porting to the latest version is as easy as copying the acoustic model and grammar files.
  • The Julius Forum was also a painful experience for me. It has English and Japanese forum topics, so again use Google Translator. I don't think what is asked in the Japanese forum is reflected in the English forum.

The points above will definitely get you started with Julius, especially the Quick Start from voxforge. Check out the voxforge forums too. The information there is useful, but meagre for a novice.

Saturday, May 30, 2009

Switching Sphinx to Julius

Ah.. It's been a long time since I updated this blog. I was really busy trying to use Julius.
Yes, right. I had to ditch Sphinx4 and move to Julius.

Following are the reasons that made me shift from Sphinx to Julius:
  • First and foremost, poor recognition. I really could not get even 80% accuracy from this ASR. One of my American friends and I tested it many times. Still no success.
  • Sphinx4 is based on Java. Hence it's 'obviously' slow and hogs a lot of memory.
  • It doesn't recognize words properly. However, digits are recognized pretty accurately.
  • No backward compatibility. Sphinx4 is a rewrite in Java, whereas all previous versions are written in C/C++.
Reasons I will miss Sphinx:
  • Good documentation
  • Great helper demo examples
  • Active Forums and help by their developers
  • I feel comfortable using Java. Hence using Eclipse for Sphinx4 really made learning about Sphinx4 easy.
From the next article onwards I will switch to Julius. I really couldn't get the hang of Sphinx and make it work for my task. Sometime later, I shall work on it again and find out where I went wrong, or whether Sphinx has indeed become a better recognizer :)

Monday, April 13, 2009

Why this Blog?

Speech Recognition (SR) is a budding technology. Big market players like Microsoft, IBM, Philips, etc. are eyeing a piece of the pie in this field. Still, based on my study of SR software over the past few months, I think this technology needs a lot of improvement. I found a few Open Source SR tools which do a fairly good job of recognition.

I don't have big bucks to pay for SR software from the vendors mentioned above, so I have no idea how well it works. Based on reviews from various sites, I found these products are kind of OKAY. Not great !!

This blog will basically talk about my experience and the trouble I faced using Open Source SR tools (cos' only they were within my reach). I have done a study on Sphinx4 and Julius. I used the HMM Toolkit (HTK) to develop acoustic models and did a lot of weird stuff playing with these tools.
There are a lot of resources on the net. But I came across a few problems which were not covered anywhere, or for which the information was distorted. So I will try to compile as much info as I can.

PS: If you have any questions or any good links related to SR, please post a comment here. Thanks

ciao,
/A