Wednesday, April 15, 2009

Now where to start?

Initially I had no idea what to look for, or where to look, to find good speech recognition (SR) software. I kind of 'specialize' in Java, so I narrowed down my search to SR software based on Java.

I came across the Java Speech API. The Java Speech API was developed by Sun Microsystems, Inc. in collaboration with leading speech technology companies: Apple Computer, Inc., AT&T, Dragon Systems, Inc., IBM Corporation, Novell, Inc., Philips Speech Processing, and Texas Instruments Incorporated. The moment I saw this, I felt it was SUPER COOL. Sun has done some brilliant work in every field. BUT after reading a bit more, I realized that these are just APIs: they specify the interfaces, and you still need a separate speech engine that implements them.

The Java Speech API (JSAPI) defines a standard, easy-to-use, cross-platform software interface to state-of-the-art speech technology. It covers two technologies: Speech Synthesis and Speech Recognition.

Speech Synthesis is basically Text To Speech.
Speech Recognition is Speech to Text.
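
To make the "just APIs" point concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: the names `Synthesizer`, `Recognizer`, `ConsoleSynthesizer` and `KeywordRecognizer` are made up for illustration and are not the real `javax.speech` types, and a plain string stands in for audio. It only shows the split between an interface (what an API defines) and an engine (what actually does the work).

```java
import java.util.Locale;

// Hypothetical sketch only: none of these types belong to the real Java Speech API.
final class SpeechApiSketch {
    public static void main(String[] args) {
        // The calling code depends only on the interfaces...
        Synthesizer tts = new ConsoleSynthesizer();
        Recognizer sr = new KeywordRecognizer("hello", "stop");

        // ...while the engines behind them can be swapped freely.
        tts.speak("hello");
        System.out.println("Recognized: " + sr.recognize("HELLO there"));
    }
}

/** Speech synthesis: text in, speech out. */
interface Synthesizer {
    String speak(String text);
}

/** Speech recognition: speech in, text out. A string stands in for audio here. */
interface Recognizer {
    String recognize(String utterance);
}

/** Toy engine that prints the text instead of producing sound. */
final class ConsoleSynthesizer implements Synthesizer {
    @Override
    public String speak(String text) {
        String spoken = "[spoken] " + text;
        System.out.println(spoken);
        return spoken;
    }
}

/** Toy engine that returns the first vocabulary word found in the input. */
final class KeywordRecognizer implements Recognizer {
    private final String[] vocabulary;

    KeywordRecognizer(String... vocabulary) {
        this.vocabulary = vocabulary;
    }

    @Override
    public String recognize(String utterance) {
        String normalized = utterance.toLowerCase(Locale.ROOT);
        for (String word : vocabulary) {
            if (normalized.contains(word)) {
                return word;
            }
        }
        return "<unknown>";
    }
}
```

Without `ConsoleSynthesizer` and `KeywordRecognizer` the two interfaces do nothing on their own, which is exactly why an API alone was not enough and I had to go looking for an actual engine.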


These APIs led me to Sphinx, IBM's ViaVoice, the Microsoft Speech API, Julius, and others.
I couldn't get IBM ViaVoice or the MS Speech API for my use, so I started off with Sphinx.

Sphinx was the first-of-its-kind continuous speech recognizer. It covers only the recognition part of the JSAPI. Sphinx was developed at Carnegie Mellon University by Kai-Fu Lee and is currently in its 4th version. Sphinx 4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework for research in speech recognition. It is written entirely in the Java programming language, leveraging the Java Speech API standard.

Sphinx 4 is currently one of the best speech recognizers in use. A version for handheld and embedded devices, called PocketSphinx, is also available. If you need a speech recognizer, I suggest you start with this.
Sphinx has very good documentation and an active forum too. I really enjoyed exploring it. In a following article I will write more detailed scripts for using Sphinx.

One thing that struck me was that Sphinx was developed by a non-American (Kai-Fu Lee). Kai-Fu Lee is said to be the man behind Microsoft's Speech *WRECKecognition* initiative. This video shows all. :)
