Wednesday, April 15, 2009

Now where to start?

Initially I had no idea what to look for, or where to look, to find good speech recognition software. I kind of 'specialize' in Java, so I narrowed down my search to speech recognition (SR) software based on Java.

I came across the Java Speech API. The Java Speech API was developed by Sun Microsystems, Inc. in collaboration with leading speech technology companies: Apple Computer, Inc., AT&T, Dragon Systems, Inc., IBM Corporation, Novell, Inc., Philips Speech Processing, and Texas Instruments Incorporated. The moment I saw this, I felt it was SUPER COOL. Sun has done some brilliant work in every field. BUT after reading a bit more I realized that this is just an API: it describes the interfaces, and you still need a separate speech engine that actually implements them.

The Java Speech API defines a standard, easy-to-use, cross-platform software interface to state-of-the-art speech technology. It covers two core technologies: Speech Synthesis and Speech Recognition.

Speech Synthesis is basically Text To Speech.
Speech Recognition is Speech to Text.
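
To make the "just an API" point concrete, here is a tiny sketch in Java. It does not use the real Java Speech API classes. The interface and class names (SpeechRecognizer, SpeechSynthesizer, ToyRecognitionEngine) are hypothetical ones I made up for illustration, and the "engine" is only a lookup table. The idea it shows is that application code talks to an interface, and some engine behind it does the actual work (and an engine may implement only one of the two halves).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the recognition half of a speech API: speech in, text out.
interface SpeechRecognizer {
    String recognize(String audioClipName);
}

// Hypothetical stand-in for the synthesis half: text in, (a description of) audio out.
interface SpeechSynthesizer {
    String synthesize(String text);
}

// A toy "engine" that implements only recognition, using a fixed lookup table
// instead of real acoustic processing. This is an assumption for illustration only.
class ToyRecognitionEngine implements SpeechRecognizer {
    private final Map<String, String> transcripts = new HashMap<>();

    ToyRecognitionEngine() {
        transcripts.put("greeting.wav", "hello world");
    }

    @Override
    public String recognize(String audioClipName) {
        // Unknown clips get a placeholder instead of a guess.
        return transcripts.getOrDefault(audioClipName, "<unknown>");
    }
}

public class SpeechApiSketch {
    // Application code depends only on the interface, not on a particular engine.
    static String transcribe(SpeechRecognizer recognizer, String clip) {
        return recognizer.recognize(clip);
    }

    public static void main(String[] args) {
        SpeechRecognizer recognizer = new ToyRecognitionEngine();
        System.out.println(transcribe(recognizer, "greeting.wav")); // prints "hello world"
    }
}
```

Swapping in a different engine would mean changing only the line that constructs the recognizer, which is the whole appeal of having a standard API in the first place.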

These APIs led me to Sphinx, IBM ViaVoice, the Microsoft Speech API, Julius, and others.
I couldn't get IBM ViaVoice or the Microsoft Speech API for my use, so I started off with Sphinx.

The original Sphinx was a first-of-its-kind recognizer: a speaker-independent, continuous speech recognition system. It was developed at Carnegie Mellon University by Kai-Fu Lee. Sphinx covers only the recognition part of the Java Speech API, not synthesis. It is currently in its fourth version. Sphinx 4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework for research in speech recognition. It is written entirely in the Java programming language and leverages the Java Speech API standard.

Sphinx 4 is currently one of the best speech recognizers in use. The Sphinx family also includes a version for handheld and embedded devices, called PocketSphinx. If you need a speech recognizer, I suggest you start with this.
Sphinx has very good documentation and an active forum too. I really enjoyed exploring it. In the following articles I will write up more detailed scripts for using Sphinx.

One thing that struck me was that Sphinx was developed by a non-American (Kai-Fu Lee). Kai-Fu Lee is said to be the man behind Microsoft's Speech *WRECKecognition* initiative. This video shows all. :)
