Wednesday, June 17, 2009

A Dive into a Japanese LVCSR Engine

In my previous post, I mentioned switching from Sphinx to Julius. I had no idea what I was in for. Googling for 'Speech Recognizer' gives you CMU Sphinx in 8 out of 10 results. I found Julius through a wiki link. All I wanted was an open-source speech recognizer, and Julius seemed like the best option to dig into and find out whether I could make use of it.

I must say using Julius was not easy, but the end results were great.
Let me highlight some initial problems you will face when you choose Julius.
  • Julius is an open-source Japanese speech recognition engine. It has been developed as a Japanese LVCSR (large-vocabulary continuous speech recognition) engine since 1997. Its home page is available in both English and Japanese.
  • The site has user documentation that was originally written in Japanese; an English version is still under development. But don't worry, Google Translate comes to the rescue. Here is the translated English version of the Julius Book.
  • Being a Japanese recognizer, Julius came with an acoustic model for Japanese only. But there are good people all over the web: the VoxForge project is working on the creation of an open-source acoustic model for the English language.
  • If you go to the Julius home site, you might get lost after downloading the source code or binaries and reading bits and pieces of information. I suggest you start by downloading the Julius Quick Start from VoxForge. It is based on Julius version 3.5.2, but porting it to the latest version is as easy as copying over the acoustic model and grammar files.
  • The Julius forum was also a painful experience for me. It has separate English and Japanese topics, so again, use Google Translate: I don't think the questions asked by Japanese users are reflected in the English forum.

The points above will definitely get you started with Julius, especially the Quick Start from VoxForge. Check out the VoxForge forums too; they have useful information, though it is meagre for a novice.

3 comments:

  1. Which results did you get with Julius, and what was your pronunciation dictionary size?

    I trained models with HTK on 40 h of audio, and my pronunciation dictionary contains 120,000 entries, but I can't get above 55%.

    Besides, Julius doesn't have acoustic model adaptation.

  2. Hi Rauf,
    My dictionary size was approximately 7 MB.
    I got very good results from Sphinx, and I still have that sample. If you need it, just let me know your email ID. I hope that helps.

    Changing Julius parameters might help you.

  3. Hi Amit,
    I am an Indian student at Delft University of Technology, Netherlands. I am developing a speech-enabled application for Dutch people, and I would like to exchange experiences with you regarding speech recognition. Can I have your email ID?
    Akhil
