Thursday, May 7, 2009

Creating Your Own Demo Using Sphinx4

Sphinx4 provides a good number of demos, which I used as a starting point for my program. I had to write an application that records the user's speech on the client side and sends it as a WAV file to a server. On the server side, I had to recognize the speech in this WAV file and return the result with a confidence score indicating how well the speech was recognized.
Sounds pretty simple?

I decided to use a Java applet like the one on VoxForge: display a list of sentences and ask the user to record each one. I was partly successful. I developed an applet that uses the Java Sound API for recording and playback. I ran into security issues, because applets are not supposed to save files on the client machine or access its file system. After some digging, I got around this by signing my applet JAR with jarsigner. So my front end was ready. This applet sends the WAV file to the server.
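
As a rough illustration of the recording step, here is a minimal sketch using the Java Sound API (`javax.sound.sampled`). It is not the applet code from this project: the class name `WavRecorder`, the method names, and the 16 kHz, 16-bit mono format are all assumptions chosen for the example.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Sphinx4 or of the original applet.
class WavRecorder {
    // Assumed capture format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    /** Captures roughly the given number of milliseconds from the default microphone. */
    static byte[] record(int millis) throws Exception {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        try {
            line.open(FORMAT);
            line.start();
            // Work in whole frames so every read is a multiple of the frame size.
            long frames = (long) (FORMAT.getFrameRate() * millis / 1000);
            int bytesWanted = (int) (frames * FORMAT.getFrameSize());
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (out.size() < bytesWanted) {
                int n = line.read(buf, 0, Math.min(buf.length, bytesWanted - out.size()));
                if (n <= 0) {
                    break; // line was stopped or closed
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            line.stop();
            line.close();
        }
    }

    /** Wraps raw PCM bytes in a WAV header and writes them to the target file. */
    static void writeWav(byte[] pcm, File target) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize();
        try (AudioInputStream ais =
                 new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, target);
        }
    }
}
```

A caller could do `WavRecorder.writeWav(WavRecorder.record(3000), new File("utterance.wav"))` to capture about three seconds of speech. Inside an unsigned applet the file write would be blocked by the sandbox, which is the security issue described above.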


Next came the server side. For the demo I used sockets to receive the input and send back the results. Sphinx4 has a sample program that shows how to pass an input audio file to Sphinx for recognition. That's it; my task was done. Later I created a new program based on that demo to recognize more words, using my own language model. This was my first application using Sphinx. I wanted to let users download the application and test it, but one problem with Sphinx4 is that it is based on Java, and the acoustic model and dictionary make the program too heavy for me to upload.
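
The socket exchange described above can be sketched roughly as follows. This is only an illustration under assumptions, not the server from this project: the class name `SpeechSocketServer`, the length-prefixed wire format, and the 50 MB limit are invented for the example, and the `recognizer` function is a placeholder for whatever code hands the WAV bytes to Sphinx4.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Function;

// Hypothetical sketch of the client/server exchange; names and protocol are assumptions.
class SpeechSocketServer {
    // Assumed upper bound on an upload, to reject corrupt length prefixes.
    static final int MAX_BYTES = 50 * 1024 * 1024;

    /** Serves one client: reads a length-prefixed WAV payload, replies with one string. */
    static void serveOnce(ServerSocket server, Function<byte[], String> recognizer)
            throws IOException {
        try (Socket client = server.accept();
             DataInputStream in = new DataInputStream(client.getInputStream());
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            int length = in.readInt();
            if (length < 0 || length > MAX_BYTES) {
                throw new IOException("Unexpected payload length: " + length);
            }
            byte[] wav = new byte[length];
            in.readFully(wav);                    // block until the whole file has arrived
            out.writeUTF(recognizer.apply(wav));  // placeholder for the recognition call
            out.flush();
        }
    }

    /** Client side: sends the WAV bytes and waits for the recognition result. */
    static String send(String host, int port, byte[] wav) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(wav.length);
            out.write(wav);
            out.flush();
            return in.readUTF();
        }
    }
}
```

Sending the length first lets the server know when the file is complete without the client having to close its side of the connection, so the same socket can carry the result back.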

As for accuracy, I was not very satisfied. Various factors determine the accuracy of a speech recognition system (SRS), such as pronunciation, microphone quality, and surrounding noise. I got good results when I used it to recognize digits, but when I gave it random words, accuracy dropped below 50%. I searched the forums for a fix but found no proper solution.
My focus is still on improving the results. Changing a few parameters did increase the accuracy, but not enough to convince me to use it in production. I have had to leave this work stalled for now.
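
The post does not say how the accuracy figure was measured. One common way to put a number on it is word error rate (WER): the word-level edit distance between the reference sentence and the recognized sentence, divided by the number of reference words. The sketch below is a generic implementation of that metric, offered as an assumption about how one might measure it; the class name `WordErrorRate` is hypothetical and nothing here comes from Sphinx4.

```java
import java.util.Locale;

// Hypothetical helper for measuring recognition quality; not a Sphinx4 API.
class WordErrorRate {
    /** Returns (substitutions + deletions + insertions) / number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokens(reference);
        String[] hyp = tokens(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("Reference must contain at least one word");
        }
        // Word-level Levenshtein distance, keeping only two rows of the table.
        int[] prev = new int[hyp.length + 1];
        int[] cur = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // turning an empty reference into j words costs j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            cur[0] = i; // dropping i reference words costs i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                cur[j] = Math.min(prev[j - 1] + cost,
                         Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[hyp.length] / (double) ref.length;
    }

    /** Lower-cases and splits on whitespace; an empty string yields no tokens. */
    private static String[] tokens(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }
}
```

For example, with the reference "one two three four" and the hypothesis "one too three" there is one substitution and one deletion, so the WER is 2/4 = 0.5, which corresponds to a word accuracy of 50%.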

Edit: This is one of the initial samples I had developed. Download
Update 20/3/2012: The download link was broken. Thanks Jaishu for pointing it out.

55 comments:

  1. Hello. First of all, thank you for all the knowledge in this blog. I'm doing an academic research project that uses voice recognition (authentication and dictation), so I liked the way you explain this hard subject so easily.
    If you could email me that code so I can analyse it, it would help me a lot.
    Thank you!

  2. Hello, I too tried the whole Sphinx4 procedure, but my application still does not recognize speech properly. Can you please help me with the work you have done?

    my email id is:
    moizrajgadhwala@gmail.com
    gunjapatel@gmail.com

    waiting for your reply
    Thank you very, very much.

  3. Hello friend!
    Thank you for writing such a helpful blog. Can you please send your code to me? Thanking you in advance.
    dun_kiill_me@yahoo.com

  4. Hello,
    I am currently evaluating Sphinx4 and other SR software, and I find your reports very useful. My initial test of Sphinx4 was also very disappointing in terms of recognition precision. I thought I was not using it correctly.
    I'd like to try your code to see if it offers better results. Could you please email me your code at taleofdragon@gmail.com?
    Thanks a lot!

  5. Hey, I am trying to develop a speech recognition application, but I got a runtime error and I am not able to use Sphinx4. Can you help me with that?
    my email id is:kundan.pijdurkar@gmail.com

    thanks in advance.

  6. Hi. Can you elaborate on your error? We might be able to help you out here.

  7. I solved that error.
    Now I want to use my own dictionary.
    Can you tell me how I can create my own dictionary?

    My email-id:kundan.pijdurkar@gmail.com

  8. Hi Kundan,
    If my memory is correct, I used an LM and a toolkit that is on the Sphinx site itself.

    You can also download some free speech data available in voxforge and create a dictionary using that.

  9. Can u elaborate on this toolkit more..

  10. check this link : http://cmusphinx.sourceforge.net/wordpress/download/

    Read more on cmuclmtk : This is LM Toolkit you will require.

  11. Hello,
    I have been working on a speech recognition project for a few days. So far I have figured out that to run a program you need a .java, a .config.xml, and a .gram file. I have tried the digits demo program and it works perfectly, but I want to write a program that recognizes words. For that I am not sure which files I need to create and how much of the XML file I need to edit. Could you help with this?

  12. I want to recognize only digits in conversation.
    I am using TIDIGITS. I want it to recognize only digits, but it converts non-digit words to digits.

    Is there any way to filter out non-digit words?

  13. @Nisha,
    I believe this blog can definitely get you started a bit. :) If you have any specific doubts feel free to ask.
    You just need Sphinx4 + Sphinx Config file + LM + Grammar. CMU guys have done wonderful job in providing us with sample. Please do look into that.


    @Kundan,
    AFAIK, every utterance is matched as closely as possible against the words in the grammar. There is also a configuration setting that lets you skip unknown words. Have you checked the Sphinx4 forum on SourceForge? http://sourceforge.net/projects/cmusphinx/forums/forum/382337

  14. Hi... I tried the hellodigits program. Now I want it to detect 16 continuous digits from a WAV file. So far I am able to read a WAV file, but the problem is that any other speech in the WAV file is also being converted into digits, so it all becomes a mess...

  15. @Neha.
    16.. hmm.. sounds like a credit card number to me ;)

    Well, if you look at the question above from Kundan, he had the same problem. I found the solution on the Sphinx forum, so you too will need to do some digging around.

    You need to change the way you accept the grammar input.

  16. Hi..thanks for replying..and its NISHA :)...I followed the link earlier but didn't find anything helpful..can u tell me what exactly needs to be changed...I mean the xml file or the grammar file or dictionary or java file??

  17. @NISHA
    The changes will be in the config file + grammar file.

  18. Hi..thanks for replying...I tried changing the parameters



    but no difference in the output.

  19. "absoluteBeamWidth"
    "relativeBeamWidth"
    "wordInsertionProbability"
    I tried editing these parameters

  20. I have tried all result class functions..
    but it's giving the same result...
    so can u tell me something specific....

  21. i solved that problem...
    Now i m using NGram model....
    can u tell why do we need sentence for recognition.

  22. Great !

    @NISHA :: Kundan has your solution. NGram Model will not bring in extra words which you dont want.

    @Kundan. Can you be more specific ? Sentence as in ? Example pls.

  23. lm toolkit create three files..
    dict file,lm file and sentence file..
    so,why do we need sentence file...

  24. Hi Kundan,
    A sentence file is nothing but your corpus file. It consists of the pattern in which your sentence will be recognized.

    The *.sent file consists of SILENCE \ pattern to make sure there is a pause when recognition starts and ends.

    Every grammar you use must have a pattern with pause and silence. That will improve recognition.

  25. Hi,
    Can you send me the code for recording the user's speech on the client side?
    my email id is sphinx4project@gmail.com
    Thanks in advance

  26. @anonymous.
    Check entire post. I have provided the link to download the sample. !

  27. Hi Amit..

    I am working on the same project, not in Java but in PHP. Can you explain to me how AER is returned from Sphinx? How is the recognition percentage obtained?

  28. Hey An0nymous..

    You can better write your ASR code in Java, pack it in a jar and call the jar from PHP.
    You can return the result the way you want.

    When I tried Sphinx, the results were moderate, not great, as my accent is not native English.

  29. Thank you, Amit. But I have already used a JAR file in my PHP page. It includes the grammar representation of my project. In my PHP page I execute a shell command (shell_exec($lname);) and the result returned is only the text representation of the speech. But I really want to do not only recognition but also analysis of the speech.

  30. Can u publish the output of your application??

  31. I see. So you want to get some percentage out of the result. I used Sphinx API itself to parse the output and analyze the recognition.

    No. I cannot publish. Its not free ! :)

    What I could publish is in the blog.

  32. k...i wil pay 4 it..how much??

  33. I don't deal with anonymous !
    Thanks for offer.

  34. You are welcome!!
    Is it possible to get this percentage out of the result? Please tell me, or shall I move to Julius or another SR engine? How can I get WER from Sphinx4?

  35. Now we are speaking!
    Yep. I was able to approximate the percentage recognition using some logarithmic calc and stuff.

  36. Thank you:)
    Are the logarithmic calc and stuff included in your sample demo?

  37. Thank You.
    I tested the Confidence.jar file and that was successful. But when I tried to run it by passing my own WAV file, it returned the following error:


    Loading Recognizer...

    Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Error while parsing line 1 of file:/D:/sphinx4-1.0beta3/bin/HelloWorld/streams/sph/Recording_0.8951813718304038.wav: Content is not allowed in prolog.
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:61)
    at edu.cmu.sphinx.demo.confidence.Confidence.main(Confidence.java:43)
    Caused by: java.io.IOException: Error while parsing line 1 of file:/D:/sphinx4-1.0beta3/bin/HelloWorld/streams/sph/Recording_0.8951813718304038.wav: Content is not allowed in prolog.
    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:77)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:59)
    ... 1 more

    How can I pass my own WAV file? Help me with this too...

  38. Hi amit,

    I've downloaded your initial samples and have also downloaded Sphinx4. The following JARs were added after unzipping Sphinx4:

    -WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz
    -jsapi
    -sphinx4

    no error shows up after building but when i run the project it shows following error:

    class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    java.lang.ExceptionInInitializerError
    Caused by: Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
    edu.cmu.sphinx.util.props.InternalConfigurationException
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:291)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:222)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
    at com.sample.speech.recognizer.WavFileRecognizer.intialize(WavFileRecognizer.java:37)
    at com.sample.speech.recognizer.WavFileRecognizer.(WavFileRecognizer.java:22)
    at com.sample.speech.socket.SpeechProcessingServer.(SpeechProcessingServer.java:15)
    Could not find the main class: com.sample.speech.socket.SpeechProcessingServer. Program will exit.
    Exception in thread "main" Java Result: 1

  39. Hi,

    seems like you are missing
    com.sample.speech.socket.SpeechProcessingServer

    Make sure all the dependencies are added in classpath as well.

  40. hi,

    Amit, it's not working somehow. Something is wrong on my side. I am using the Eclipse IDE. I import your project, and then what? Please guide me.

  41. Amit, any updates on my request? You can email me, if possible, at blurlogic@gmail.com.

  42. Hi..,
    How can I modify the dictionary so that it contains only the words I need in my application, so that I can improve the efficiency?
    Please help me!

  43. Hey
    Is it possible to make Sphinx recognize all the words in its vocabulary without including them in the grammar?

  44. Hello sir,
    Can You mail me your project sir. I need it for a research project in sphinx.
    My mail id is
    sai25590@gmail.com

  45. Hello, I also tried to expand the Sphinx4 dictionary, but I am also getting this error:

    class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    java.lang.ExceptionInInitializerError
    Caused by: Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
    edu.cmu.sphinx.util.props.InternalConfigurationException
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:291)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:222)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
    at com.sample.speech.recognizer.WavFileRecognizer.intialize(WavFileRecognizer.java:37)
    at com.sample.speech.recognizer.WavFileRecognizer.(WavFileRecognizer.java:22)
    at com.sample.speech.socket.SpeechProcessingServer.(SpeechProcessingServer.java:15)
    Could not find the main class: com.sample.speech.socket.SpeechProcessingServer. Program will exit.
    Exception in thread "main" Java Result: 1

    can anybody please help me....

  46. Hi AMIT.S

    Thank you for sharing Sphinx.
    I got an error when I tried to download from your download link:
    http://files.suranaamit.com/uploads/SpeechRecognizerServer.zip

    Thank you.

    Replies
    1. Hey dude,

      That location doesn't exist anymore. I will try to find the file and update the link. thanks for pointing out!

    2. Thank you Amit.
      Can you try sphinx on Android?
      If yes, Please guide me.

      Thanks for sharing.

    3. Hi Amit,
      Can you send SpeechRecognizerServer.zip to my mail
      jaisankar.arumugam@gmail.com

      Thanks in advance ;)

  47. Hi Amit,

    I am also working on Android + Sphinx.
    Your download link http://files.suranaamit.com/uploads/SpeechRecognizerServer.zip is not available.
    Can you send your source code to my mail id
    jaisankar.arumugam@gmail.com

    thank you in advance

  48. Hi Amit,
    Did you find SpeechRecognizerServer.zip?
    Please send it as soon as you possibly can.

    Thanks

  49. Hello sir,
    Can You mail me your project sir. I need it for a research project in sphinx.
    My mail id is
    sun.futbol@ymail.com

  50. Hello,
    I am currently evaluating Sphinx4 and other SR software, and I find your reports very useful. My initial test of Sphinx4 was also very disappointing in terms of recognition precision. I thought I was not using it correctly.
    I'd like to try your code to see if it offers better results. Could you please email me your code at soccer.ravi@gmail.com?
    Thanks a lot!
