Saturday, May 30, 2009

Switching from Sphinx to Julius

Ah, it's been a long time since I updated this blog. I was really busy trying to use Julius.
Yes, that's right. I had to ditch Sphinx4 and move to Julius.

These are the reasons that made me shift from Sphinx to Julius:
  • First and foremost, poor recognition. I really could not get even 80% accuracy from this ASR. One of my American friends and I tested it many times, still with no success.
  • Sphinx4 is based on Java. Hence it's 'obviously' slow and hogs a lot of memory.
  • It doesn't recognize general words properly. Digits, however, are recognized pretty accurately.
  • No backward compatibility. Sphinx4 is a rewrite in Java, whereas all the previous versions were written in C/C++.
Reasons I will miss Sphinx:
  • Good documentation
  • Great helper demo examples
  • Active Forums and help by their developers
  • I feel fairly comfortable with Java, so using Eclipse with Sphinx4 made learning about Sphinx4 really easy.
From the next article onwards I will switch to Julius. I really couldn't get the hang of Sphinx well enough to make it work for my task. Sometime later, I shall work on this again and find out where I went wrong, or whether Sphinx has indeed become a better recognizer :)

Thursday, May 7, 2009

Creating Your own Demo using Sphinx4

Sphinx4 provides a good number of demos, which I used in my program. I had to write an application that records the user's speech on the client side and sends it as a WAV file to a server. On the server side I had to recognize this WAV file and return the result, with a confidence score attached indicating how well the speech was recognized.
Sounds pretty simple?

I decided to use a Java applet like the one on VoxForge: display a list of sentences and ask the user to record their voice. I was partly successful. I developed an applet that used the Java Sound API for recording and playback. I ran into security issues, as unsigned applets are not allowed to save files on the client machine or access the file system. After some digging, I got around this by signing my applet JAR with jarsigner. So my front end was ready. The applet sends the WAV file to the server.
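The recording step can be sketched roughly as follows. This is only a minimal illustration with the standard `javax.sound.sampled` classes, not the applet's actual code: the class name `WavEncoder`, the fixed recording length, and the 16 kHz, 16-bit, mono format are all assumptions made for the example.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper: records raw PCM and wraps it in a WAV container.
class WavEncoder {
    // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    // Captures up to the given number of seconds from the default microphone.
    static byte[] record(int seconds) throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        try {
            line.start();
            int bytesPerSecond = (int) FORMAT.getFrameRate() * FORMAT.getFrameSize();
            byte[] pcm = new byte[bytesPerSecond * seconds];
            int filled = 0;
            while (filled < pcm.length) {
                int n = line.read(pcm, filled, pcm.length - filled);
                if (n <= 0) {
                    break; // line was stopped or closed early
                }
                filled += n;
            }
            line.stop();
            return Arrays.copyOf(pcm, filled);
        } finally {
            line.close();
        }
    }

    // Adds a WAV header to raw PCM bytes so they can be sent as a WAV file.
    static byte[] toWav(byte[] pcm) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize();
        AudioInputStream in =
                new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }
}
```

Something like `byte[] wav = WavEncoder.toWav(WavEncoder.record(5));` would then give the bytes to send to the server. Keeping the audio in memory like this avoids writing a file on the client, though an unsigned applet would still need permission to use the microphone.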


Next came the server side. For the demo I used sockets to receive the input and send out the results. Sphinx4 has a sample program that shows how to pass an input audio file to Sphinx for recognition. That's it, my task was done. Later on I created a new program based on that demo to recognize more words, using my own language model. This was my first application using Sphinx. I wanted to let users download the application and test it, but one problem with Sphinx4 is that it is Java-based, and the acoustic model and dictionary make the program too heavy for me to upload.
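The socket exchange might look something like the sketch below. The wire format (a 4-byte length, then the WAV bytes, answered with a UTF string and a double) is an assumption made for illustration, and the `Recognizer` interface is a hypothetical stand-in for the Sphinx4 call, not part of Sphinx4 itself.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class RecognitionServer {
    // Assumed upper bound on an upload, to reject nonsense lengths.
    static final int MAX_BYTES = 10 * 1024 * 1024;

    // Hypothetical hook: a real implementation would hand the audio to Sphinx4.
    interface Recognizer {
        Result recognize(byte[] wav);
    }

    // Recognized text plus a confidence score for it.
    static final class Result {
        final String text;
        final double confidence;

        Result(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Handles one client: read the WAV bytes, recognize, write the result back.
    static void serveOnce(ServerSocket server, Recognizer recognizer) throws IOException {
        try (Socket client = server.accept();
             DataInputStream in = new DataInputStream(client.getInputStream());
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            int length = in.readInt();
            if (length < 0 || length > MAX_BYTES) {
                throw new IOException("Unexpected payload length: " + length);
            }
            byte[] wav = new byte[length];
            in.readFully(wav);
            Result result = recognizer.recognize(wav);
            out.writeUTF(result.text);
            out.writeDouble(result.confidence);
            out.flush();
        }
    }

    // Client side: send the WAV bytes and wait for the text and score.
    static Result send(String host, int port, byte[] wav) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(wav.length);
            out.write(wav);
            out.flush();
            String text = in.readUTF();
            double confidence = in.readDouble();
            return new Result(text, confidence);
        }
    }
}
```

Sending the length first lets the server know when the upload is complete without the client having to close its side of the connection, so the same socket can carry the reply.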

As for accuracy, I was not very satisfied. Various factors determine the accuracy of a speech recognition system (SRS), such as pronunciation, microphone quality and surrounding noise. I got good results when I used it to recognize digits, but when I gave it random words, accuracy dropped below 50%. I searched the forums for a fix but found no proper solution.
My focus is still on improving the results. Changing a few parameters did increase the accuracy, but not enough to convince me to use it in production. I had to leave this work stalled for now.
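One common way to put a number on recognition quality is the word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words said. The sketch below is only an illustration of that metric, not how the figures above were measured; the class and method names are made up for the example.

```java
class WordAccuracy {
    // Lower-cases the text and splits it into words on whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Levenshtein distance over words: substitutions, insertions and deletions.
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= ref.length; i++) {
            int[] cur = new int[hyp.length + 1];
            cur[0] = i;
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = prev[j] + 1;
                int insert = cur[j - 1] + 1;
                cur[j] = Math.min(substitute, Math.min(delete, insert));
            }
            prev = cur;
        }
        return prev[hyp.length];
    }

    // WER = word-level edits / number of reference words.
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("Reference must contain at least one word");
        }
        return (double) editDistance(ref, tokenize(hypothesis)) / ref.length;
    }
}
```

For example, `WordAccuracy.wordErrorRate("one two three four", "one too three")` counts one substitution and one deletion against four reference words, giving 0.5.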

Edit: This is one of the initial samples I had developed. Download
Update 20/3/2012: The download link was broken. Thanks Jaishu for pointing it out.