- The FrontEnd
- The Decoder
- The Linguist
Sphinx-4 is highly modular: each of the blocks above can be configured independently through a configuration file. In this file we specify the front end Sphinx-4 will use, the acoustic model and dictionary from which the Linguist builds the search graph used during recognition, and the language model (grammar), which tells the recognizer which words are most likely to occur in the speech. The Decoder block then combines the output of the FrontEnd with the SearchGraph produced by the Linguist to generate the recognition Result.
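The wiring between these three blocks looks roughly like this in a configuration file. This is a trimmed sketch modeled on the HelloWorld demo config shipped with Sphinx-4; the component names are conventional and the class paths should be checked against your Sphinx-4 release:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <!-- the recognizer ties everything together -->
    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
    </component>

    <!-- the Decoder block drives the search manager -->
    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <!-- the search manager consumes features from the FrontEnd (through the
         scorer) and the SearchGraph built by the Linguist -->
    <component name="searchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="linguist" value="flatLinguist"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="activeListFactory" value="activeList"/>
    </component>
</config>
```

Each `value` that names another component is resolved by the configuration manager, which is what makes every block separately replaceable.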
Let us now walk through a sample configuration file: config.xml (download here)
Every config file is logically separated into different sections. You can find the syntax and rules for writing a configuration file at the Sphinx configuration management site.
- Frequently Used Properties holds values that are referenced by components in the other sections.
- In Language Model we specify the grammar the recognizer will match the speech against. Sphinx-4's pluggable language-model support covers ASCII and binary unigram, bigram, and trigram models, the Java Speech API Grammar Format (JSGF), and ARPA-format FST grammars.
- The Dictionary can be the Wall Street Journal (WSJ) dictionary, the TIDIGITS dictionary, or your own dictionary in the same standard format. The WSJ and TIDIGITS dictionaries are bundled in the Sphinx-4 binaries themselves. A dictionary maps each word to its pronunciation as a sequence of phonemes.
- Next, define the Acoustic Model, which must match the dictionary you use. Again, Sphinx-4 includes acoustic models for WSJ and TIDIGITS.
- In Front End we specify whether the input comes from a microphone or from an audio data source (a wav or au file, etc.).
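Put together, the sections above map onto config.xml entries like the following. This is an illustrative fragment modeled on the Sphinx-4 demo configurations; the grammar location and name are placeholders, and the exact class paths and resource paths vary between Sphinx-4 releases, so verify them against the demo configs shipped with your version:

```xml
<!-- Frequently Used Properties: values referenced by the sections below -->
<property name="logLevel" value="WARNING"/>

<!-- Language Model: here a JSGF grammar; grammarLocation and grammarName
     are placeholders (the class lives in edu.cmu.sphinx.jsapi in older releases) -->
<component name="jsgfGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
    <property name="dictionary" value="dictionary"/>
    <property name="grammarLocation" value="resource:/yourapp/"/>
    <property name="grammarName" value="yourGrammar"/>
</component>

<!-- Dictionary: the WSJ dictionary bundled in the Sphinx-4 jars -->
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath"
              value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
    <property name="fillerPath"
              value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- Acoustic Model: must match the dictionary chosen above -->
<component name="wsj" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="wsjLoader"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- Front End input: live microphone audio; swap in an audio-file data
     source component to read from a wav/au file instead -->
<component name="microphone" type="edu.cmu.sphinx.frontend.util.Microphone">
    <property name="msecPerRead" value="10"/>
</component>
```

Because every entry is just a named component, switching from, say, microphone input to file input, or from the WSJ to the TIDIGITS models, is a matter of editing the config file rather than recompiling any code.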