Sunday, April 26, 2009

Sphinx4 Configuration File: config.xml

There are three primary modules in the Sphinx-4 framework:
  • The FrontEnd
  • The Decoder
  • The Linguist

Sphinx4 is very modular in nature: every small block can be configured separately, and all of these blocks are configured in a single configuration file. In this file we specify the front end Sphinx4 will use; the acoustic model and dictionary, which are used to create the search graph used during recognition; and the language model (grammar), which makes the recognizer look for the 'most likely' words. The Sphinx-4 Decoder uses the output of the FrontEnd together with the SearchGraph produced by the Linguist to generate the recognition Result.
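To make this data flow concrete, here is a toy Java sketch. Everything in it is hypothetical and heavily simplified, not the real Sphinx-4 API: the "front end" reduces raw samples to one feature per frame (mean absolute amplitude), the "linguist" pairs each word's made-up acoustic template (standing in for the acoustic model and dictionary) with a unigram probability (standing in for the language model), and the "decoder" combines both scores and returns the best word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the Sphinx-4 data flow. FrontEnd, Linguist and Decoder here are
 * hypothetical simplifications for illustration only, not the real Sphinx-4 classes.
 */
public class PipelineSketch {

    /** Toy FrontEnd: one feature per fixed-size frame (mean absolute amplitude). */
    static double[] frontEnd(short[] samples, int frameSize) {
        double[] features = new double[samples.length / frameSize];
        for (int f = 0; f < features.length; f++) {
            double sum = 0;
            for (int i = 0; i < frameSize; i++) {
                sum += Math.abs(samples[f * frameSize + i]);
            }
            features[f] = sum / frameSize;
        }
        return features;
    }

    /** One entry of the toy search graph: a word, its made-up acoustic template and its LM log-probability. */
    static final class Path {
        final String word;
        final double[] template;
        final double logProb;

        Path(String word, double[] template, double logProb) {
            this.word = word;
            this.template = template;
            this.logProb = logProb;
        }
    }

    /** Toy Linguist: joins "acoustic model + dictionary" templates with unigram "language model" probabilities. */
    static List<Path> linguist(Map<String, double[]> templates, Map<String, Double> unigrams) {
        List<Path> graph = new ArrayList<>();
        for (Map.Entry<String, double[]> e : templates.entrySet()) {
            double p = unigrams.getOrDefault(e.getKey(), 1e-6); // unseen words get a tiny probability
            graph.add(new Path(e.getKey(), e.getValue(), Math.log(p)));
        }
        return graph;
    }

    /** Toy Decoder: score = LM log-probability minus squared distance between features and template. */
    static String decode(double[] features, List<Path> graph) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Path path : graph) {
            if (path.template.length != features.length) {
                continue; // the toy only compares equal-length sequences
            }
            double distance = 0;
            for (int i = 0; i < features.length; i++) {
                double d = features[i] - path.template[i];
                distance += d * d;
            }
            double score = path.logProb - distance;
            if (score > bestScore) {
                bestScore = score;
                best = path.word;
            }
        }
        return best; // null if no path matched the feature length
    }

    public static void main(String[] args) {
        short[] samples = {100, -100, 100, -100, 10, -10, 10, -10};
        double[] features = frontEnd(samples, 4); // {100.0, 10.0}

        Map<String, double[]> templates = new LinkedHashMap<>();
        templates.put("yes", new double[] {100, 10});
        templates.put("no", new double[] {10, 100});
        Map<String, Double> unigrams = new LinkedHashMap<>();
        unigrams.put("yes", 0.5);
        unigrams.put("no", 0.5);

        System.out.println(decode(features, linguist(templates, unigrams))); // prints "yes"
    }
}
```

A real decoder scores hidden Markov model states with a beam search instead of comparing whole-word templates, but the division of labour between the three modules is the same.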

Let us now walk through a sample configuration file, config.xml.
Every config file is logically separated into different sections. You can find the syntax and rules for creating a configuration file on the Sphinx-4 Configuration Management page.
  1. Frequently Used Properties: global properties that are used by the other sections.
  2. Language Model: here we specify the grammar Sphinx will use to match the speech. Sphinx-4 supports pluggable language models: ASCII and binary versions of unigram, bigram and trigram models, Java Speech API Grammar Format (JSGF), and ARPA-format FST grammars.
  3. Dictionary: either Wall Street Journal (WSJ), TIDIGITS, or your own dictionary in the standard ARPA format. The WSJ and TIDIGITS dictionaries ship with the Sphinx4 binaries. A dictionary lists the words and their pronunciation phonemes.
  4. Acoustic Model: define it according to the dictionary you use. Again, Sphinx4 includes acoustic models for WSJ and TIDIGITS.
  5. Front End: here we specify whether the input comes from the microphone or from a data source (wav, au, etc.).
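Item 3 describes a dictionary as a list of words with their pronunciation phonemes. The sketch below reads such a file into a map, assuming the plain-text layout of CMU-style dictionaries: one entry per line, the word followed by its phonemes, alternate pronunciations marked with a suffix such as (2), and lines starting with ';;;' treated as comments. It is an illustration, not Sphinx-4's own dictionary loader, and the sample entries are only illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of reading a CMU-style pronunciation dictionary (assumed format, not Sphinx-4's loader). */
public class DictionarySketch {

    /** Maps each word to all of its pronunciations; each pronunciation is a list of phonemes. */
    static Map<String, List<List<String>>> load(Reader in) throws IOException {
        Map<String, List<List<String>>> dict = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith(";;;")) {
                continue; // skip blank lines and (assumed) comment lines
            }
            String[] tokens = line.split("\\s+");
            // "HELLO(2)" is an alternate pronunciation of "HELLO"
            String word = tokens[0].replaceAll("\\(\\d+\\)$", "");
            List<String> phones = new ArrayList<>();
            for (int i = 1; i < tokens.length; i++) {
                phones.add(tokens[i]);
            }
            dict.computeIfAbsent(word, k -> new ArrayList<>()).add(phones);
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        String text = "HELLO  HH AH L OW\nHELLO(2)  HH EH L OW\nWORLD  W ER L D\n";
        Map<String, List<List<String>>> dict = load(new StringReader(text));
        System.out.println(dict.get("HELLO")); // [[HH, AH, L, OW], [HH, EH, L, OW]]
        System.out.println(dict.get("WORLD")); // [[W, ER, L, D]]
    }
}
```

Keeping every pronunciation under one key mirrors what a recognizer needs: the search graph must allow any of a word's pronunciations.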
These are the major sections in any Sphinx4 configuration file. I will discuss a few of these sections in subsequent articles.
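Since a config file is plain XML, the sketch below uses the JDK's built-in DOM parser to list what each component declares. It assumes the layout Sphinx-4 configuration files use: global properties as property elements directly under config, components as component elements with name and type attributes and their own property children, and ${name} to refer to a global property. It is not Sphinx-4's ConfigurationManager, it ignores propertylist elements, and the sample XML in main is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Sketch: list the component properties of a Sphinx-4 style config.xml using plain JDK DOM parsing. */
public class ConfigSketch {

    private static final Pattern GLOBAL_REF = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces ${name} with the matching global property; unknown references are left as-is. */
    static String resolve(String value, Map<String, String> globals) {
        Matcher m = GLOBAL_REF.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = globals.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns "component.property" -> resolved value, plus "component.type" for each component. */
    static Map<String, String> componentProperties(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        NodeList top = root.getChildNodes();

        // First pass: global properties are <property> elements directly under <config>.
        Map<String, String> globals = new LinkedHashMap<>();
        for (int i = 0; i < top.getLength(); i++) {
            Node n = top.item(i);
            if (n instanceof Element && ((Element) n).getTagName().equals("property")) {
                Element p = (Element) n;
                globals.put(p.getAttribute("name"), p.getAttribute("value"));
            }
        }

        // Second pass: each <component> has a name, a Java class (type) and its own <property> children.
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < top.getLength(); i++) {
            Node n = top.item(i);
            if (!(n instanceof Element) || !((Element) n).getTagName().equals("component")) {
                continue;
            }
            Element c = (Element) n;
            String name = c.getAttribute("name");
            result.put(name + ".type", c.getAttribute("type"));
            NodeList props = c.getChildNodes();
            for (int j = 0; j < props.getLength(); j++) {
                Node pn = props.item(j);
                if (pn instanceof Element && ((Element) pn).getTagName().equals("property")) {
                    Element p = (Element) pn;
                    result.put(name + "." + p.getAttribute("name"), resolve(p.getAttribute("value"), globals));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<property name=\"absoluteBeamWidth\" value=\"500\"/>"
                + "<component name=\"activeList\""
                + " type=\"edu.cmu.sphinx.decoder.search.PartitionActiveListFactory\">"
                + "<property name=\"absoluteBeamWidth\" value=\"${absoluteBeamWidth}\"/>"
                + "</component>"
                + "</config>";
        componentProperties(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running main should print activeList.type = edu.cmu.sphinx.decoder.search.PartitionActiveListFactory followed by activeList.absoluteBeamWidth = 500, which shows why the Frequently Used Properties section exists: one global value can feed several components.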


Friday, April 17, 2009

Getting Started with Sphinx4 and Eclipse

I started by reading the Getting Started page of Sphinx4. After downloading both the source code and the binaries, it was time to set up our environment. I have always used and enjoyed working with Eclipse, so I went with an Eclipse setup.

Prerequisites: Eclipse (Callisto, Europa or Ganymede), JRE 1.4+
Follow these steps to set up your development environment in the Eclipse IDE:
  1. Extract the Sphinx4 source and binaries into a folder. I used /Speech/Sphinx/sphinx4-1.0beta2.
  2. Create an empty Java project titled 'sphinx' (or any name you wish).
  3. In the Package Explorer, right-click the project and select 'Build Path -> Link Source'.
  4. A dialog box will ask you for the source folder on your filesystem. Navigate to /Speech/Sphinx/sphinx4-1.0beta2/src/sphinx4.
  5. Eclipse parses the entire folder structure recursively and also names your source folder.
  6. Click Next and Finish. A new source folder is now linked to your project.
  7. The project will show a lot of errors, because the libraries have not been added yet.
  8. Add lib/js.jar, lib/tags.jar and lib/jsapi.jar (from the Sphinx4 folder) to your project classpath by right-clicking the project and selecting 'Build Path -> Configure Build Path -> Libraries'.
  9. Eclipse then refreshes the workspace and all the errors should disappear.

Sphinx4 also provides a handful of sample programs. I found every sample program very useful, and together they cover most of the details needed to learn Sphinx4.

To set up the environment for running the samples and viewing their source code, follow these steps:

  1. Create another new project and title it 'sphinx-demos'.
  2. Right-click the project and link this source folder (Build Path -> Link Source): /Speech/Sphinx/sphinx4-1.0beta2/src/apps
  3. Add the missing libraries lib/js.jar, lib/tags.jar, lib/jsapi.jar and lib/sphinx4.jar to your project classpath.
  4. Every demo is a simple Java file. To test a demo, right-click it and select Run As -> Java Application.
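Before running the demos, a quick way to confirm the build path is complete is to try loading one class from each library. The sketch below does that with plain Class.forName; the two class names are assumptions about what lib/jsapi.jar and lib/sphinx4.jar (or the linked sphinx4 source folder) provide, so adjust the list to whatever your project actually needs.

```java
/** Sketch: check that classes from the jars added to the build path can be loaded. */
public class ClasspathCheck {

    /** Returns true if the named class is visible on the current classpath (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "javax.speech.recognition.Recognizer", // assumed to come from lib/jsapi.jar
            "edu.cmu.sphinx.recognizer.Recognizer" // assumed to come from lib/sphinx4.jar or the linked source
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "OK      " : "MISSING ") + name);
        }
    }
}
```

Run it with Run As -> Java Application from either project; any line starting with MISSING points at a library that still has to be added.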

There you go. You are now ready to learn Sphinx4 with a proper development environment.

The sphinx project gives you access to the entire Sphinx4 source code.
The sphinx-demos project gives you access to all the Sphinx4 demo apps.

Wednesday, April 15, 2009

Now where to start?

Initially I had no idea what to look into, or where, to find good speech recognition software. I kind of 'specialize' in Java, so I narrowed down my search to Java-based SR software.

I came across the Java Speech API (JSAPI). The Java Speech API was developed by Sun Microsystems, Inc. in collaboration with leading speech technology companies: Apple Computer, Inc., AT&T, Dragon Systems, Inc., IBM Corporation, Novell, Inc., Philips Speech Processing, and Texas Instruments Incorporated. The moment I saw this, I felt it was SUPER COOL. Sun has done some brilliant work in every field. BUT after reading a bit more, I realized that these are just APIs: interface definitions, not an actual speech engine.

The Java Speech API defines a standard, easy-to-use, cross-platform software interface to state-of-the-art speech technology. JSAPI defines two technologies: speech synthesis and speech recognition.

Speech Synthesis is basically Text To Speech.
Speech Recognition is Speech to Text.


These APIs led me to Sphinx, IBM's ViaVoice, the Microsoft Speech API, Julius, and others.
I couldn't get IBM ViaVoice or the MS Speech API for my use, so I started off with Sphinx.

Sphinx was the first of its kind: a speaker-independent, large-vocabulary continuous speech recognizer. It covers only the recognition side of JSAPI. Sphinx was developed at Carnegie Mellon University by Kai-Fu Lee and is currently in its fourth version. Sphinx-4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework for research in speech recognition. It is written entirely in the Java programming language, leveraging the Java Speech API standard.

Sphinx4 is currently one of the best speech recognizers around. A version for handheld devices, called PocketSphinx, is also available. If you need a speech recognizer, I suggest you start with this one.
Sphinx has very good documentation and an active forum too. I really enjoyed exploring it. In the following articles I will write more detailed scripts for using Sphinx.

One thing that struck me was that Sphinx was developed by a non-American (Kai-Fu Lee). Kai-Fu Lee is said to be the man behind Microsoft's Speech *WRECKecognition* initiative. This video shows it all. :)

Monday, April 13, 2009

Why this Blog?

Speech Recognition (SR) is a budding technology. Big market players like Microsoft, IBM, Philips, etc. are eyeing a piece of the pie in this field. Still, based on my study of SR software over the past few months, I think this technology needs a lot of improvement. I found a few open source SR tools that do a fairly good job of recognition.

I don't have big bucks to pay for SR software from the vendors mentioned above, so I have no first-hand idea of how well it works. Based on reviews from various sites, I found they are kind of OKAY. Not great!!

This blog will basically talk about my experience with open source SR tools and the troubles I faced using them ('cause only they were within my reach). I have done a study of Sphinx4 and Julius, used the HMM Toolkit (HTK) to develop acoustic models, and did a lot of weird stuff playing with these tools.
There are a lot of resources on the net, but I came across a few problems that were not covered anywhere, or where the information was distorted. So I will try to compile as much info as I can.

PS: If you have any questions or any good links related to SR, please post a comment here. Thanks

ciao,
/A