Speech Recognition Woes: Creating Your own Demo using Sphinx4

Thursday, May 7, 2009

Creating Your own Demo using Sphinx4

Sphinx4 provides a good number of demos, which I used in my program. I had to write an application that records the user's speech on the client side and sends it to a server as a WAV file. On the server side, I had to recognize this WAV file and return the result with a confidence score attached, indicating how well the speech was recognized.
Sounds pretty simple?

I decided to use a Java applet like the one on VoxForge: display a list of sentences and ask the user to record them. I was partly successful. I developed an applet that used the Java Sound API for recording and playback. I ran into security issues, because applets are not supposed to save files locally on the client machine or access its file system. After some digging, I got around this by signing my applet JAR with jarsigner. With that, my front end was ready: the applet sends the WAV file to the server.
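For anyone building a similar front end, here is a minimal sketch of the capture step with the Java Sound API (javax.sound.sampled). The 16 kHz, 16-bit, mono format is an assumption on my part, chosen to match 16k acoustic models such as WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz, and WavCapture is a hypothetical helper name. The WAV container is built in memory, so the recording can be sent as bytes instead of being written to the client's disk (microphone access from an applet may still need the signed JAR).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

class WavCapture {
    // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    // Records about `millis` milliseconds from the default microphone.
    // Needs a real capture device (and, inside an applet, the permissions discussed above).
    static byte[] record(int millis) throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
        int wanted = (int) (FORMAT.getFrameRate() * FORMAT.getFrameSize() * millis / 1000);
        byte[] pcm = new byte[wanted];
        int read = 0;
        while (read < wanted) {
            int n = line.read(pcm, read, wanted - read);
            if (n <= 0) break;
            read += n;
        }
        line.stop();
        line.close();
        return pcm;
    }

    // Wraps raw PCM bytes in a WAV container in memory, ready to send to the server.
    static byte[] toWav(byte[] pcm, AudioFormat format) {
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream in = new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be `byte[] wav = WavCapture.toWav(WavCapture.record(3000), WavCapture.FORMAT);` for a three-second take.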

Next came the server side. For the demo I used sockets to receive the input and send out the results. Sphinx4 has a sample program that shows how to pass an audio file to Sphinx for recognition, and that was it: my task was done. Later I created a new program based on the demo to recognize more words, using my own language model for the task. This was my first application using Sphinx. I wanted to let users download the application and test it, but one problem with Sphinx4 is that it is Java-based, and the acoustic model and dictionary make the program too heavy for me to upload.
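A minimal sketch of the socket round trip, assuming a simple protocol of my own choosing: the client sends a 4-byte length followed by the WAV bytes, and the server answers with a single string. The recognizer is passed in as a plain Function<byte[], String>; in a real server that is where the Sphinx4 recognition call (as in the Sphinx4 sample program) would go, and it is not reproduced here. WavSocketServer and its methods are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Function;

class WavSocketServer {
    // Server side: read one length-prefixed WAV payload and answer with the recognizer's text.
    static void handle(Socket client, Function<byte[], String> recognizer) throws IOException {
        try (Socket s = client;
             DataInputStream in = new DataInputStream(s.getInputStream());
             DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
            int length = in.readInt();           // 4-byte big-endian length prefix
            byte[] wav = new byte[length];
            in.readFully(wav);                   // block until the whole file has arrived
            out.writeUTF(recognizer.apply(wav)); // a real server might also append a confidence score
            out.flush();
        }
    }

    // Starts a server on a free port that answers exactly one request, and returns that port.
    static int serveOnce(Function<byte[], String> recognizer) {
        try {
            ServerSocket server = new ServerSocket(0);
            int port = server.getLocalPort();
            Thread worker = new Thread(() -> {
                try (ServerSocket ss = server) {
                    handle(ss.accept(), recognizer);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            worker.setDaemon(true);
            worker.start();
            return port;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side (the applet's role): send the WAV bytes and wait for the result string.
    static String send(String host, int port, byte[] wav) {
        try (Socket s = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(s.getOutputStream());
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            out.writeInt(wav.length);
            out.write(wav);
            out.flush();
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The length prefix lets the server know exactly when the upload is complete, without relying on the client closing its half of the connection.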

As for accuracy, I was not very satisfied. Various factors determine the accuracy of a speech recognition system (SRS), such as pronunciation, microphone quality and surrounding noise. I got good results when recognizing digits, but with random words, accuracy dropped below 50%. I looked on the forums for a solution but found nothing conclusive.
My focus is still on improving the results. Changing a few parameters did increase accuracy, but not enough to convince me to use it in production, so I have had to leave this work stalled for now.
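A common way to put a number on accuracy figures like these is the word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between what was said and what was recognized, divided by the number of words actually said. Word accuracy is then 1 - WER. A small sketch; WordErrorRate is a hypothetical name, and the case-insensitive comparison is my own choice.

```java
class WordErrorRate {
    // WER = (substitutions + deletions + insertions) / number of reference words,
    // computed with a word-level Levenshtein distance.
    static double wer(String reference, String hypothesis) {
        String[] ref = split(reference);
        String[] hyp = split(hypothesis);
        if (ref.length == 0) throw new IllegalArgumentException("reference must contain at least one word");
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // all reference words deleted
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // all hypothesis words inserted
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equalsIgnoreCase(hyp[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    // Splits on runs of whitespace; an empty or blank string has no words.
    private static String[] split(String s) {
        String t = s.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }
}
```

For example, reference "one two three four" against recognized "one too three" is one substitution plus one deletion, so WER = 2/4 = 0.5.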

Edit: This is one of the initial samples I developed: Download
Update 20/3/2012: The download link was broken. Thanks to Jaishu for pointing it out.

57 comments:

DarkSHare® 2008©, July 27, 2009 at 12:43 AM
Hello! First of all, thank you for all the knowledge in this blog. I'm doing an academic research project that uses voice recognition (authentication and dictation), so I liked the way you explain this hard subject so easily.
If you could email me that code so I can analyse it, it would help me a lot.
Thank you!
Håkan, September 10, 2009 at 5:41 AM
Hello friend!
Thank you for writing such a helpful blog. Can you please send your code to me? Thanking you in advance.
dun_kiill_me@yahoo.com
Anonymous, November 16, 2009 at 9:39 AM
Hello,
I am currently evaluating Sphinx4 and other SR software, and I find your reports very useful. My initial test of Sphinx4 was also very disappointing in terms of recognition precision. I thought I was not using it correctly.
I'd like to try your code to see if it gives better results. Could you please email it to me at taleofdragon@gmail.com?
Thanks a lot!
Anonymous, December 10, 2009 at 1:10 AM
Hey, I am trying to develop a speech recognition application, but I got a runtime error and I'm not able to use Sphinx4. Can you help me with that?
My email ID is: kundan.pijdurkar@gmail.com

Thanks in advance.
Amit S, December 10, 2009 at 2:21 AM
Hi. Can you elaborate on your error? We might be able to help you out here.
Kundan, December 22, 2009 at 4:53 AM
I solved that error.
Now I want to use my own dictionary.
Can you tell me how I can create my own dictionary?

My email ID: kundan.pijdurkar@gmail.com
Amit S, December 22, 2009 at 7:54 AM
Hi Kundan,
If my memory is correct, I used an LM and a toolkit that is on the Sphinx site itself.

You can also download some free speech data available on VoxForge and create a dictionary from that.
Kundan, December 22, 2009 at 10:14 PM
Thanks, Amit!
Kundan, December 22, 2009 at 11:07 PM
Can you elaborate more on this toolkit?
Amit S, December 22, 2009 at 11:29 PM
Check this link: http://cmusphinx.sourceforge.net/wordpress/download/

Read more on cmuclmtk: this is the LM toolkit you will need.
Nisha, December 22, 2009 at 11:49 PM
Hello,
I have been working on a speech recognition project for a few days. So far I have figured out that to run a program you need a .java file, a .config.xml file and a .gram file. I have tried the digits demo program and it works perfectly, but I want to write a program that recognizes words. For that, I'm not sure which files I need to create and how much I need to edit the XML file. Could you help with this?
Kundan, December 23, 2009 at 2:06 AM
I want to recognize only digits in conversation,
and I am using TIDIGITS. I want it to recognize only digits, but it converts non-digit words into digits.

Is there any way to remove the non-digit words?
Amit S, December 23, 2009 at 5:22 AM
@Nisha,
I believe this blog can get you started. :) If you have any specific doubts, feel free to ask.
You just need Sphinx4 + a Sphinx config file + an LM + a grammar. The CMU team has done a wonderful job of providing samples. Please do look into those.

@Kundan,
AFAIK, every utterance will be matched as closely as possible to the grammar words. There is a configuration setting that lets you skip unknown words too. Have you checked the Sphinx4 forum on SourceForge? http://sourceforge.net/projects/cmusphinx/forums/forum/382337
Nisha, December 23, 2009 at 8:36 PM
Hi, I tried the HelloDigits program. Now I want it to detect 16 continuous digits from a WAV file. So far I am able to read a WAV file, but the problem is that any other speech in the WAV file is also being converted into digits, so the output is a mess.
Amit S, December 23, 2009 at 9:37 PM
@Neha,
16... hmm, sounds like a credit card number to me ;)

Well, if you look at Kundan's question above, he had the same problem. I found the solution on the Sphinx forum, so you will need to do some digging around too.

You need to change the way you accept the grammar input.
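One possible grammar-side change for the 16-digit case, sketched under assumptions: instead of an open-ended digit loop, generate a JSGF grammar whose public rule has exactly 16 digit slots. DigitGrammar is a hypothetical helper, and the word list assumes a TIDIGITS-style vocabulary (oh, zero through nine). JSGF has no "repeat exactly N times" operator (only * and +), so the slots are written out. This only fixes the length of the result; as noted above, speech outside the grammar is still matched to the closest digits.

```java
class DigitGrammar {
    // Builds a JSGF grammar that accepts exactly `count` spoken digits.
    static String exactDigits(String grammarName, int count) {
        if (count < 1) throw new IllegalArgumentException("count must be >= 1");
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");
        sb.append("grammar ").append(grammarName).append(";\n\n");
        // Private rule listing the digit words (assumed TIDIGITS-style vocabulary).
        sb.append("<digit> = oh | zero | one | two | three | four | five | six | seven | eight | nine;\n\n");
        // Public rule: `count` digit slots in a row.
        sb.append("public <number> =");
        for (int i = 0; i < count; i++) sb.append(" <digit>");
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Save this output as a .gram file and point the config's grammar settings at it.
        System.out.print(exactDigits("digits", 16));
    }
}
```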
Nisha, December 23, 2009 at 9:54 PM
Hi, thanks for replying, and it's NISHA :) I followed the link earlier but didn't find anything helpful. Can you tell me exactly what needs to be changed: the XML file, the grammar file, the dictionary or the Java file?
Amit S, December 23, 2009 at 10:31 PM
@NISHA
The changes should be in the config file + the grammar file.
Nisha, December 23, 2009 at 10:49 PM
Hi, thanks for replying. I tried changing the parameters, but there was no difference in the output.
Nisha, December 23, 2009 at 10:51 PM
"absoluteBeamWidth"
"relativeBeamWidth"
"wordInsertionProbability"
I tried editing these parameters.
Kundan, December 24, 2009 at 2:35 AM
I have tried all the Result class functions,
but it gives the same result.
Can you tell me something more specific?
Kundan, December 28, 2009 at 3:16 AM
I solved that problem.
Now I'm using an n-gram model.
Can you tell me why we need sentences for recognition?
Amit S, December 28, 2009 at 3:20 AM
Great!

@NISHA: Kundan has your solution. An n-gram model will not bring in extra words that you don't want.

@Kundan: Can you be more specific? Sentence as in? An example, please.
Kundan, December 28, 2009 at 7:53 PM
The LM toolkit creates three files:
a dict file, an LM file and a sentence file.
So why do we need the sentence file?
Amit S, December 28, 2009 at 8:14 PM
Hi Kundan,
A sentence file is simply your corpus file. It contains the patterns in which your sentences will be recognized.

The *.sent file consists of a SILENCE \ pattern to make sure there is a pause when the recognition starts and ends!

Every grammar you use must have a pattern with pauses and silence. That will improve recognition.
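As a related sketch: in the CMU Sphinx language-model tutorials, corpus text for the LM toolkit is written one sentence per line with explicit sentence-start and sentence-end markers, for example "<s> call home </s>". The helper below only does that wrapping and whitespace cleanup; CorpusPrep is a hypothetical name, and it does not reproduce the exact contents of the toolkit's *.sent file described above.

```java
import java.util.ArrayList;
import java.util.List;

class CorpusPrep {
    // Wraps each non-blank sentence in <s> ... </s> and collapses repeated whitespace.
    // Word case is left untouched so it can match the dictionary you use.
    static List<String> toCorpus(List<String> sentences) {
        List<String> out = new ArrayList<>();
        for (String s : sentences) {
            String words = s.trim().replaceAll("\\s+", " ");
            if (!words.isEmpty()) out.add("<s> " + words + " </s>");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> corpus = toCorpus(List.of("call home", "open the door"));
        corpus.forEach(System.out::println); // one corpus line per sentence
    }
}
```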
Anonymous, March 29, 2010 at 1:05 PM
Hi,
Can you send me the code for recording the user's speech on the client side?
My email ID is sphinx4project@gmail.com
Thanks in advance.
Amit S, April 4, 2010 at 3:20 AM
@anonymous
Check the entire post. I have provided the link to download the sample!
Anonymous, June 23, 2010 at 2:27 AM
Hi Amit,

I am working on the same project, but in PHP rather than Java. Can you explain to me how AER is returned from Sphinx? How is the recognition percentage obtained?
Amit S, June 23, 2010 at 3:00 AM
Hey Anonymous,

You would do better to write your ASR code in Java, pack it in a JAR and call the JAR from PHP.
You can return the result any way you want.

When I tried Sphinx, the results were moderate, not great, as my accent is not native English.
Anonymous, June 23, 2010 at 3:35 AM
Thank you, Amit. But I have already used a JAR file in my PHP page. It includes the grammar for my project. In my PHP page I execute a shell command (shell_exec($lname);), and the result returned is only a text representation of the speech. But I really want to do not only recognition but also analysis of the speech.
Anonymous, June 23, 2010 at 3:36 AM
Can you publish the output of your application?
Amit S, June 23, 2010 at 3:47 AM
I see. So you want to get some percentage out of the result. I used the Sphinx API itself to parse the output and analyze the recognition.

No, I cannot publish it. It's not free! :)

What I could publish is in the blog.
Anonymous, June 23, 2010 at 3:52 AM
OK... I will pay for it. How much?
Amit S, June 23, 2010 at 4:00 AM
I don't deal with anonymous users!
Thanks for the offer.
sruthi, June 23, 2010 at 4:05 AM
You are welcome!!
Is it possible to get this percentage out of the result? Please tell me, or should I move to Julius or another SR engine? How can I get the WER from Sphinx4?
Amit S, June 23, 2010 at 4:27 AM
Now we are speaking!
Yep. I was able to approximate the recognition percentage using some logarithmic calculations and the like.
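Sphinx4 keeps scores in the log domain, so a "logarithmic calc" step typically means converting a score back to a linear value before reading it as a percentage. Amit's own calculation is not published, so the sketch below is only the generic conversion, not his method. It assumes the score is a logarithm in base logBase (many Sphinx4 configs set logBase to 1.0001; check yours), and LogScore and the sample value are hypothetical.

```java
import java.util.Locale;

class LogScore {
    // Converts a value stored as a logarithm in base `logBase` back to the linear domain:
    // linear = logBase^logValue = exp(logValue * ln(logBase)).
    static double logToLinear(double logValue, double logBase) {
        return Math.exp(logValue * Math.log(logBase));
    }

    // Clamps a linear confidence to [0, 1] and formats it as a percentage.
    static String asPercent(double linear) {
        double clamped = Math.max(0.0, Math.min(1.0, linear));
        return String.format(Locale.ROOT, "%.1f%%", clamped * 100.0);
    }

    public static void main(String[] args) {
        double logBase = 1.0001;      // assumption: use the logBase from your own config
        double logConfidence = -6932; // hypothetical log-domain confidence value
        System.out.println(asPercent(logToLinear(logConfidence, logBase)));
    }
}
```

With base 1.0001, a log value of about -6932 corresponds to a linear value of roughly 0.5, which shows how small base-1.0001 log values map to ordinary probabilities.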
sruthi, June 23, 2010 at 4:31 AM
Thank you :)
Are those logarithmic calculations included in your sample demo?
Amit S, June 23, 2010 at 6:57 AM
Yes!
sruthi, June 24, 2010 at 3:49 AM
Thank you.
I tested the Confidence.jar file and it was successful. But when I tried to run it with my own WAV file, it returned the following error:

Loading Recognizer...

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Error while parsing line 1 of file:/D:/sphinx4-1.0beta3/bin/HelloWorld/streams/sph/Recording_0.8951813718304038.wav: Content is not allowed in prolog.
at edu.cmu.sphinx.util.props.ConfigurationManager.(ConfigurationManager.java:61)
at edu.cmu.sphinx.demo.confidence.Confidence.main(Confidence.java:43)
Caused by: java.io.IOException: Error while parsing line 1 of file:/D:/sphinx4-1.0beta3/bin/HelloWorld/streams/sph/Recording_0.8951813718304038.wav: Content is not allowed in prolog.
at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:77)
at edu.cmu.sphinx.util.props.ConfigurationManager.(ConfigurationManager.java:59)
... 1 more

How can I pass my own WAV file? Please help me with this too.
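Reading the trace: ConfigurationManager's SaxLoader is parsing Recording_0.8951813718304038.wav as XML, and "Content is not allowed in prolog" is the XML parser's complaint about a file that does not start with XML. In other words, the WAV path ended up where the configuration file was expected. A small sanity check before handing paths to the demo, sketched with a hypothetical InputCheck class; the two-argument order in main is also an assumption, not the Confidence demo's actual interface.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class InputCheck {
    // "RIFF" at offset 0 and "WAVE" at offset 8 mark a RIFF/WAVE audio file.
    static boolean looksLikeWav(byte[] head) {
        return head.length >= 12
                && new String(head, 0, 4, StandardCharsets.US_ASCII).equals("RIFF")
                && new String(head, 8, 4, StandardCharsets.US_ASCII).equals("WAVE");
    }

    // An XML config starts with '<' after an optional UTF-8 BOM and whitespace.
    static boolean looksLikeXml(byte[] head) {
        int i = 0;
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        while (i < head.length && Character.isWhitespace(head[i] & 0xFF)) i++;
        return i < head.length && head[i] == '<';
    }

    // Reads up to the first 64 bytes of a file for the checks above.
    static byte[] head(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[64];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) break;
                n += r;
            }
            return Arrays.copyOf(buf, n);
        }
    }

    // Hypothetical usage: java InputCheck <config.xml> <audio.wav>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: InputCheck <config.xml> <audio.wav>");
            return;
        }
        if (!looksLikeXml(head(Paths.get(args[0])))) System.err.println("Not an XML config: " + args[0]);
        if (!looksLikeWav(head(Paths.get(args[1])))) System.err.println("Not a WAV file: " + args[1]);
    }
}
```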
ciscosoccer, August 5, 2010 at 5:46 AM
Hi Amit,

I've downloaded your initial samples and have also downloaded Sphinx-4. The following JARs were added after unzipping Sphinx4:

-WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz
-jsapi
-sphinx4

No error shows up after building, but when I run the project it shows the following error:

class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
java.lang.ExceptionInInitializerError
Caused by: Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
edu.cmu.sphinx.util.props.InternalConfigurationException
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:291)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:222)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
at com.sample.speech.recognizer.WavFileRecognizer.intialize(WavFileRecognizer.java:37)
at com.sample.speech.recognizer.WavFileRecognizer.(WavFileRecognizer.java:22)
at com.sample.speech.socket.SpeechProcessingServer.(SpeechProcessingServer.java:15)
Could not find the main class: com.sample.speech.socket.SpeechProcessingServer. Program will exit.
Exception in thread "main" Java Result: 1
Amit S, August 5, 2010 at 6:38 AM
Hi,

It seems like you are missing
com.sample.speech.socket.SpeechProcessingServer

Make sure all the dependencies are added to the classpath as well.
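The trace above begins with a ClassNotFoundException for edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model, after which the flatLinguist's mandatory acousticModel property is reported as not set. One way to confirm at runtime whether the acoustic model JAR is really on the classpath the program runs with (which in an IDE can differ from the build path) is a quick check like this; ClasspathCheck is a hypothetical helper.

```java
class ClasspathCheck {
    // Returns true if `className` can be loaded from the current classpath (without initializing it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the error message above.
        String model = "edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model";
        if (isOnClasspath(model)) {
            System.out.println("Acoustic model found: " + model);
        } else {
            System.err.println("Acoustic model NOT on classpath: " + model);
        }
    }
}
```

Running this with exactly the same classpath as SpeechProcessingServer tells you whether the model JAR is visible before the configuration is loaded.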
ciscosoccer, August 9, 2010 at 4:11 AM
Hi,

Amit, it's not working somehow. Something is wrong on my side. I am using the Eclipse IDE. I import your project, and then what? Please guide me.
ciscosoccer, August 13, 2010 at 11:52 PM
Amit, any updates on my request? If possible, you can email me at blurlogic@gmail.com.
archana, November 8, 2010 at 3:47 AM
Hi,
How can I modify the dictionary so that it contains only the words my application requires, to improve efficiency?
Please help me!
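A minimal sketch of trimming a large pronunciation dictionary down to an application's vocabulary. It assumes the CMU/Sphinx dictionary layout of one entry per line, "WORD  PH O N E S", with alternate pronunciations written as "WORD(2) ..."; DictionaryFilter is a hypothetical name. Words that come back from missing() have no entry and need a pronunciation from somewhere else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class DictionaryFilter {
    // Keeps only the dictionary lines (including alternates) whose word is in `vocabulary`.
    // Matching is case-insensitive; the original lines are kept unchanged.
    static List<String> filter(List<String> dictLines, Set<String> vocabulary) {
        Set<String> wanted = new TreeSet<>();
        for (String w : vocabulary) wanted.add(w.toUpperCase(Locale.ROOT));
        List<String> kept = new ArrayList<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && wanted.contains(baseWord(trimmed))) kept.add(trimmed);
        }
        return kept;
    }

    // Vocabulary words with no entry in the filtered lines.
    static Set<String> missing(List<String> keptLines, Set<String> vocabulary) {
        Set<String> found = new TreeSet<>();
        for (String line : keptLines) found.add(baseWord(line));
        Set<String> result = new TreeSet<>();
        for (String w : vocabulary) {
            String upper = w.toUpperCase(Locale.ROOT);
            if (!found.contains(upper)) result.add(upper);
        }
        return result;
    }

    // "HELLO(2)  HH EH L OW" -> "HELLO"
    private static String baseWord(String line) {
        String word = line.trim().split("\\s+", 2)[0];
        int paren = word.indexOf('(');
        return (paren > 0 ? word.substring(0, paren) : word).toUpperCase(Locale.ROOT);
    }
}
```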
Salma, February 8, 2011 at 10:32 AM
Hey,
Is it possible to make Sphinx recognize all words from its vocabulary without including them in the grammar?
sai, February 20, 2011 at 8:22 AM
Hello sir,
Can you mail me your project? I need it for a research project on Sphinx.
My mail ID is
sai25590@gmail.com
Anonymous, March 28, 2011 at 3:12 AM
Hello, I also tried to expand the Sphinx4 dictionary, but I'm also getting this error:

class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
java.lang.ExceptionInInitializerError
Caused by: Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
edu.cmu.sphinx.util.props.InternalConfigurationException
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:291)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:222)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:279)
at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:460)
at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
at com.sample.speech.recognizer.WavFileRecognizer.intialize(WavFileRecognizer.java:37)
at com.sample.speech.recognizer.WavFileRecognizer.(WavFileRecognizer.java:22)
at com.sample.speech.socket.SpeechProcessingServer.(SpeechProcessingServer.java:15)
Could not find the main class: com.sample.speech.socket.SpeechProcessingServer. Program will exit.
Exception in thread "main" Java Result: 1

Can anybody please help me?
Jaishu, March 15, 2012 at 3:58 AM
Hi Amit S,

Thank you for sharing Sphinx.
I got an error when I tried to use your download link
http://files.suranaamit.com/uploads/SpeechRecognizerServer.zip

Thank you.
Jaishu, March 15, 2012 at 4:23 AM
Hi Amit,

I am also working on Android + Sphinx.
Your download link http://files.suranaamit.com/uploads/SpeechRecognizerServer.zip is not available.
Can you send your source code to my mail ID
jaisankar.arumugam@gmail.com

Thank you in advance.
Jaishu, March 19, 2012 at 5:41 AM
Hi Amit,
Did you find SpeechRecognizerServer.zip?
Could you please send it soon, if possible?

Thanks
Anonymous, February 28, 2013 at 3:54 AM
Hello sir,
Can you mail me your project? I need it for a research project on Sphinx.
My mail ID is
sun.futbol@ymail.com
Ravi, February 28, 2013 at 3:55 AM
Hello,
I am currently evaluating Sphinx4 and other SR software, and I find your reports very useful. My initial test of Sphinx4 was also very disappointing in terms of recognition precision. I thought I was not using it correctly.
I'd like to try your code to see if it gives better results. Could you please email it to me at soccer.ravi@gmail.com?
Thanks a lot!