Sphinx4 provides a good number of demos, which I used in my program. I had to write an application that records the user's speech on the client side and sends it as a WAV file to a server. On the server side, I had to recognize this WAV file and return the result with a confidence score indicating how well the speech was recognized.
Sounds pretty simple?
I decided to use a Java applet like the one on VoxForge: display a list of sentences and ask the user to record each one. I was partly successful. I developed an applet that used the Java Sound API for recording and playback. I ran into security issues, because applets are not supposed to save files on the client machine or access its file system. After some digging, I got around this by signing my applet JAR with jarsigner. So my front end was ready: the applet sends the WAV file to the server.
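As a rough illustration of the recording step, here is a minimal sketch using the standard `javax.sound.sampled` API. It is not the applet's actual code: the class name `WavWriterSketch`, the 16 kHz / 16-bit / mono format, and the fixed recording duration are all assumptions made for the example.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not taken from the original applet.
final class WavWriterSketch {

    // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    // Captures roughly `millis` milliseconds of raw PCM from the default microphone.
    static byte[] record(int millis) {
        int total = (int) (FORMAT.getFrameRate() * millis / 1000) * FORMAT.getFrameSize();
        byte[] pcm = new byte[total];
        TargetDataLine line = null;
        try {
            line = AudioSystem.getTargetDataLine(FORMAT);
            line.open(FORMAT);
            line.start();
            int offset = 0;
            while (offset < total) {
                int n = line.read(pcm, offset, total - offset);
                if (n <= 0) {
                    break; // line was stopped or closed early
                }
                offset += n;
            }
            return pcm;
        } catch (LineUnavailableException e) {
            throw new IllegalStateException("No capture line available", e);
        } finally {
            if (line != null) {
                line.stop();
                line.close();
            }
        }
    }

    // Wraps raw PCM bytes in a WAV container and writes them to `target`.
    static void writeWav(byte[] pcm, File target) {
        long frames = pcm.length / FORMAT.getFrameSize();
        try (AudioInputStream in =
                 new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `WavWriterSketch.writeWav(WavWriterSketch.record(3000), new File("utterance.wav"))` would record about three seconds and save it. Inside an unsigned applet the file write is exactly the step that the security sandbox blocks.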
Next came the server side. For the demo I used sockets to receive the input and send out the results. Sphinx4 has a sample program that shows how to pass an input audio file to Sphinx for recognition. That's it, task over. Later I created a new program based on the demo to recognize more words, using my own language model. This was my first application using Sphinx. I wanted to let users download the application and test it, but one problem with Sphinx4 is that it is Java-based, and the acoustic model and dictionary make the package too heavy for me to upload.
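The socket exchange described above could look roughly like the sketch below. The wire format (a 4-byte length followed by the WAV bytes, answered with one UTF string) is an assumption for illustration, and the recognizer is passed in as a plain `Function` so that no Sphinx4 call has to be guessed at. `WavSocketSketch` and its method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Function;

// Hypothetical sketch of the client/server exchange; not the original server code.
final class WavSocketSketch {

    // Server side: accept one client, read a length-prefixed WAV payload,
    // hand it to the recognizer, and write the text result back.
    static void serveOnce(ServerSocket server, Function<byte[], String> recognizer) {
        try (Socket client = server.accept();
             DataInputStream in = new DataInputStream(client.getInputStream());
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            byte[] wav = new byte[in.readInt()]; // a real server should bound this length
            in.readFully(wav);
            out.writeUTF(recognizer.apply(wav));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: send the WAV bytes and wait for the recognition result.
    static String send(String host, int port, byte[] wav) {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(wav.length);
            out.write(wav);
            out.flush();
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loopback demo: runs the server on a free local port and sends it one payload.
    static String roundTrip(byte[] wav, Function<byte[], String> recognizer) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server, recognizer));
            serverThread.start();
            String reply = send(server.getInetAddress().getHostAddress(),
                                server.getLocalPort(), wav);
            serverThread.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a real server the `recognizer` function would wrap whatever Sphinx4 code decodes the audio. Keeping it behind a function makes the transport easy to test on its own.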
As for accuracy, I was not very satisfied. Various factors determine the accuracy of a speech recognition system (SRS), such as pronunciation, microphone quality and surrounding noise. I got good results when recognizing digits, but when I gave it random words, accuracy dropped below 50%. I searched the forums for a fix but found no proper solution.
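One common way to put a number on accuracy is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of reference words. The post does not say how its accuracy figure was measured, so the sketch below is only one reasonable option. `WordErrorRateSketch` is a hypothetical name and the code uses no Sphinx4 API.

```java
// Hypothetical helper for scoring a recognizer's output against a known transcript.
final class WordErrorRateSketch {

    // Lower-cases and splits on whitespace; an empty string yields no tokens.
    static String[] tokens(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Word-level Levenshtein distance (substitutions, deletions, insertions).
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                curr[j] = Math.min(substitute, Math.min(delete, insert));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[hyp.length];
    }

    // WER = edit distance / number of reference words. Can exceed 1.0.
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokens(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, tokens(hypothesis)) / ref.length;
    }
}
```

For example, comparing the reference "one two three four" with the hypothesis "one too three" gives one substitution and one deletion over four reference words, so the WER is 0.5.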
The focus is still on improving the results. Changing a few parameters did increase the accuracy, but not enough to convince me to use it in production. I have had to leave this work stalled for now.
Edit: This is one of the initial samples I had developed. Download
Update 20/3/2012: The download link was broken. Thanks Jaishu for pointing it out.
Hello, first of all, thank you for all the knowledge in this blog. I'm doing an academic research project that uses voice recognition (authentication and dictation), so I liked the way you explain this hard subject so easily...
If you could send me an email with that code so I can analyse it, it'll help me a lot.
thank you!
Hello, even I tried the whole Sphinx4 procedure, but I am still not able to get my application to recognize speech properly. Can you please help me with the work you have done?
my email id is:
moizrajgadhwala@gmail.com
gunjapatel@gmail.com
waiting for your reply
Thankyou Very Very much.
Hello friend!
Thank you for writing such a helpful blog. Can you please send your code to me? Thanking you in advance
dun_kiill_me@yahoo.com
Hello,
I am currently evaluating Sphinx4 and other SR software and I find your reports very useful. My initial test of Sphinx4 was also very disappointing in terms of recognition precision. I thought I was not using it correctly.
I'd like to try your code to see if it offers better results. Could you please email me your code at: taleofdragon@gmail.com?
Thanks a lot!
Hey, I am trying to develop a speech recognition application, but I got a runtime error and I am not able to use Sphinx4. Can u help me with that?
my email id is: kundan.pijdurkar@gmail.com
thanks in advance.
Hi. Can u elaborate on your error. We might help you out here.
I solved that error.
Now I want to use my own dictionary.
Can u tell me how I can create my own dictionary?
My email-id: kundan.pijdurkar@gmail.com
Hi Kundan,
If my memory is correct, I used the LM toolkit that is on the Sphinx site itself.
You can also download some free speech data available in voxforge and create a dictionary using that.
thanks Amit...
Can u elaborate on this toolkit more..
check this link : http://cmusphinx.sourceforge.net/wordpress/download/
Read more on cmuclmtk : This is the LM Toolkit you will require.
Hello,
I m working on a speech recognition project for the last few days. So far I have figured out that to run a program you need a .java, a .config.xml and a .gram file. I have tried the digits demo program and so far it works perfectly alright, but I want to write a program that recognizes words... for that I m not sure which files I need to create and how much I need to edit the xml file... could u help with this?
I want to recognize only digits through conversation.
I am using tidigits. Now I want it to recognize only digits, but it converts non-digit words to digits.
Is there any possibility to remove non-digit words?
@Nisha,
I believe this blog can definitely get you started a bit. :) If you have any specific doubts feel free to ask.
You just need Sphinx4 + Sphinx config file + LM + grammar. The CMU guys have done a wonderful job in providing us with samples. Please do look into them.
@Kundan,
AFAIK, every utterance will be matched as well as possible against the grammar words. There is a configuration setting which lets you skip the unknown words too. Have you checked the Sphinx4 forum on SourceForge? http://sourceforge.net/projects/cmusphinx/forums/forum/382337
Hi...I tried the hellodigits program...now I want it to detect 16 continuous digits from a wav file...so far I am able to read a wav file, but the problem is that whatever other speech is present in the wav file is also being converted into digits...so it's becoming a mess...
@Neha.
16.. hmm.. sounds like a credit card number to me ;)
Well, if you see the above question from Kundan, he too had the same problem. I found the solution on the Sphinx forum. So you too need to do some digging around.
You need to change the way you accept the grammar input.
Hi..thanks for replying..and it's NISHA :)...I followed the link earlier but didn't find anything helpful..can u tell me what exactly needs to be changed...I mean the xml file or the grammar file or the dictionary or the java file??
@NISHA
The changes should be in the config file + grammar file.
Hi..thanks for replying...I tried changing the parameters
but no difference in the output.
"absoluteBeamWidth"
"relativeBeamWidth"
"wordInsertionProbability"
I tried editing these parameters
I have tried all result class functions..
but it's giving the same result...
so can u tell me something specific....
i solved that problem...
Now i m using NGram model....
can u tell me why we need a sentence for recognition?
Great !
@NISHA :: Kundan has your solution. The NGram model will not bring in extra words which you don't want.
@Kundan. Can you be more specific ? Sentence as in ? Example pls.
The LM toolkit creates three files..
dict file, lm file and sentence file..
so, why do we need the sentence file...
Hi Kundan,
A sentence file is nothing but your corpus file. It consists of the pattern in which your sentence will be recognized.
The *.sent file consists of SILENCE \ pattern to make sure there is pause once the recognition starts and ends !.
Every grammar you use must have a pattern with pause and silence. That will improve recognition.
Hi,
Can you send me the code for recording the user's speech on the client side?
my email id is sphinx4project@gmail.com
Thanks in advance
@anonymous.
Check the entire post. I have provided the link to download the sample!
Hi Amit..
I am working on the same project, not in Java but with PHP. Can u explain to me how AER is returned from Sphinx? How is the recognition percentage obtained??
Hey An0nymous..
You would do better to write your ASR code in Java, pack it in a jar and call the jar from PHP.
You can return the result the way you want.
When I tried Sphinx, the result was moderate. Not gr8 as my accent is not native English.
Thank you Amit. But I have already used a jar file in my php page. It includes the grammar representation of my project. In my php page I execute a shell command (shell_exec($lname);) and the result returned is only the text representation of the speech. But I really want to do not only recognition but also analysis of the speech.
Can u publish the output of your application??
I see. So you want to get some percentage out of the result. I used the Sphinx API itself to parse the output and analyze the recognition.
No. I cannot publish. It's not free ! :)
What I could publish is in the blog.
k...i wil pay 4 it..how much??
I don't deal with anonymous !
Thanks for the offer.
You are welcome!!
Is it possible to get this percentage out of the result??? plzz tell me.. or shall I move to Julius or any other SR? How can I get WER from Sphinx4??
Now we are speaking!
Yep. I was able to approximate the percentage recognition using some logarithmic calc and stuff.
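The reply does not spell out the calculation, so the following is only a guess at the general idea, not the method used in the sample. One common approach is to take the log-domain scores of several competing hypotheses and normalize them into percentages. The sketch assumes the scores have already been converted to natural-log values, and `ConfidenceSketch` is a hypothetical class that calls no Sphinx4 API.

```java
// Hypothetical sketch: turn log-domain hypothesis scores into normalized confidences.
final class ConfidenceSketch {

    // Returns exp(score_i) / sum_j exp(score_j) for each hypothesis.
    // Subtracting the maximum first keeps exp() from underflowing on very
    // negative log scores (the usual log-sum-exp trick).
    static double[] posteriors(double[] logScores) {
        if (logScores.length == 0) {
            throw new IllegalArgumentException("need at least one hypothesis score");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double score : logScores) {
            max = Math.max(max, score);
        }
        double sum = 0.0;
        for (double score : logScores) {
            sum += Math.exp(score - max);
        }
        double[] result = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            result[i] = Math.exp(logScores[i] - max) / sum;
        }
        return result;
    }

    // Confidence of the best hypothesis as a percentage in [0, 100].
    static double bestConfidencePercent(double[] logScores) {
        double best = 0.0;
        for (double p : posteriors(logScores)) {
            best = Math.max(best, p);
        }
        return best * 100.0;
    }
}
```

The resulting number only says how strongly the top hypothesis beats the alternatives that were scored. It is not a measured accuracy.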
Thank you:)
Is that logarithmic calc and stuff included in ur sample demo??
Yes !
Thank You.
I tested the Confidence.jar file and that was successful. But when I tried to run it by passing my own wav file, it returned the following error
Loading Recognizer...
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Error while parsing line 1 of file:/D:/sphinx4-1.0beta3/bin/HelloWorld/streams/sph/Recording_0.8951813718304038.wav: Content is not allowed in prolog.
at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:61)
at edu.cmu.sphinx.demo.confidence.Confidence.main(Confidence.java:43)
Caused by: java.io.IOException: Error while parsing line 1 of file:/D:/sphinx4-1.0beta3/bin/HelloWorld/streams/sph/Recording_0.8951813718304038.wav: Content is not allowed in prolog.
at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:77)
at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:59)
... 1 more
How can I pass my own wave file?? Help me with this too...
Hi amit,
I've downloaded your initial samples and have also downloaded Sphinx-4. The following .jars were added after unzipping Sphinx4:
-WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz
-jsapi
-sphinx4
No error shows up after building, but when I run the project it shows the following error:
class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
java.lang.ExceptionInInitializerError
Caused by: Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
edu.cmu.sphinx.util.props.InternalConfigurationException
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:291)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:222)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
at com.sample.speech.recognizer.WavFileRecognizer.intialize(WavFileRecognizer.java:37)
at com.sample.speech.recognizer.WavFileRecognizer.<init>(WavFileRecognizer.java:22)
at com.sample.speech.socket.SpeechProcessingServer.<init>(SpeechProcessingServer.java:15)
Could not find the main class: com.sample.speech.socket.SpeechProcessingServer. Program will exit.
Exception in thread "main" Java Result: 1
Hi,
seems like you are missing
com.sample.speech.socket.SpeechProcessingServer
Make sure all the dependencies are added in classpath as well.
hi,
Amit, it's not working somehow. Something is wrong on my side. I am using the Eclipse IDE. I import your project, and then what? Please guide.
Amit, any updates on my request? You can email me if possible at blurlogic@gmail.com.
Hi..,
How can I modify the dictionary so that it contains only those words that I require in my application, so I can improve the efficiency!!!
help me plzzzzzz!!
Hey
Is it possible to make Sphinx recognize all words from its vocabulary without including them in the grammar?
Hello sir,
Can you mail me your project, sir? I need it for a research project in Sphinx.
My mail id is
sai25590@gmail.com
Hello, I also tried to expand the Sphinx4 dictionary but I am also getting this error:
class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
java.lang.ExceptionInInitializerError
Caused by: Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
edu.cmu.sphinx.util.props.InternalConfigurationException
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:291)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:222)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
at com.sample.speech.recognizer.WavFileRecognizer.intialize(WavFileRecognizer.java:37)
at com.sample.speech.recognizer.WavFileRecognizer.<init>(WavFileRecognizer.java:22)
at com.sample.speech.socket.SpeechProcessingServer.<init>(SpeechProcessingServer.java:15)
Could not find the main class: com.sample.speech.socket.SpeechProcessingServer. Program will exit.
Exception in thread "main" Java Result: 1
can anybody please help me....
Hi AMIT.S
Thank you for sharing Sphinx.
I got an error when I tried your download link
http://files.suranaamit.com/uploads/SpeechRecognizerServer.zip
Thank you.
Hey dude,
That location doesn't exist anymore. I will try to find the file and update the link. Thanks for pointing it out!
Thank you Amit.
Have you tried Sphinx on Android?
If yes, please guide me.
Thanks for sharing.
Hi Amit,
Can you send SpeechRecognizerServer.zip to my mail
jaisankar.arumugam@gmail.com
Thanks in advance ;)
Hi Amit,
I am also working on Android + Sphinx.
Your download link http://files.suranaamit.com/uploads/SpeechRecognizerServer.zip is not available.
Can you send your source code to my mail id
jaisankar.arumugam@gmail.com
thank you in advance
Hi Amit,
Did you find SpeechRecognizerServer.zip?
Could you please send it as soon as possible?
Thanks
Hello sir,
Can you mail me your project, sir? I need it for a research project in Sphinx.
My mail id is
sun.futbol@ymail.com
Hello,
I am currently evaluating Sphinx4 and other SR software and I find your reports very useful. My initial test of Sphinx4 was also very disappointing in terms of recognition precision. I thought I was not using it correctly.
I'd like to try your code to see if it offers better results. Could you please email me your code at: soccer.ravi@gmail.com
Thanks a lot!