Chatterbox - STT / TTS / TTA project. Part 2
Part 2 of my "chatterbox" project is aimed at getting Puppy to monitor the microphone, listen to my response, and create a text file which accurately reflects what I have spoken.
Part 1 and chatterbox project description here:
http://www.murga-linux.com/puppy/viewtopic.php?t=89258
Part 3 (Making puppy act on decoded commands) here:
http://murga-linux.com/puppy/viewtopic.php?t=89260
Progress so far is based on the "pocketsphinx_continuous" .pet offered by technosaurus here:
http://www.murga-linux.com/puppy/viewto ... 5&start=27
.
Last edited by greengeek on Wed 27 Nov 2013, 04:45, edited 4 times in total.
This is going to be the tough bit. Getting your computer to understand even one single word is tough enough, never mind the entire English language.
For these purposes, however, even the ability to discern between 2 words like "yes" and "no" would be extremely helpful.
I've been told to try Sphinx, Verbio, Ubuntu, and all sorts of other things, but I have not had any luck with any of it. But I can tell you this much: I know when I am beaten, and there is a 6 month chunk of my life gone that I won't ever get back that I spent banging my head against this very wall (hindsight being 20/20, I'd avoid Sphinx if I were you), so by all means, please have a go at it...
I look forward to seeing what comes of it!
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
Sorry to hear of your experience with Sphinx - I was getting my hopes up this morning when technosaurus posted a pet of pocketsphinx
http://www.murga-linux.com/puppy/viewto ... 5&start=27
Never mind, I will give it a go. As you say, teaching it the difference between "yes" and "no" is all that is required to make a start. To be honest I've read a few posts that suggest it is a mistake to use short words with STT - better to try to teach it the difference between "affirmative" and "not bloody likely" - apparently the longer phrases are easier to decode reliably.
Well, I've been playing with pocketsphinx and it seems to be pretty good at decoding what I'm saying. I can certainly get it to distinguish yes and no with excellent reliability. Surprisingly it also seems very good (sometimes) at assembling entire sentences - although the accuracy does vary if the room has background noise.
I found that the program itself was extremely sensitive to mic volume and it was necessary for me to turn the capture volume right DOWN to almost nothing, and to turn OFF the 20 dB mic boost which is usually a necessity with all other audio programs like mhwaveedit etc. Quite surprising.
The problem is what to do with the output of the recognition program? I can see the decoded speech in the terminal but how to feed it to a text file in real time??
Technosaurus mentioned the following tutorial:
http://hackaday.com/2010/07/11/adding-s ... -platform/
and one of the comments was as follows:
I have a robot and I want to use Pocketsphinx so I can talk to the robot thing like…where is this room and it will tell me where it is or move foward and it should move forward. Right now I have install pockectsphinx.07 and sphinxbase and when I run using ubuntu 10.04LTS: pocketsphinx_continuous -lm 1998.lm -dict .dict 1998.dic it say READY then listening the when I say something like Good morning it write back Goodmorning….But how do I go from here…how do I use pocketsphinx to allow me to just talk and have what I just said be recorded and send to my robot to move…PLEASE HELP
To which the author replied:
Hello Steve
The way to connect recognizer library output to an action is a standard task every programmer could solve. I suppose you need to learn how to write programs. I’m sure you could find quite some references on the web. If you learn Python for example you can do it in a minute. For futher questions please use CMUSphinx forums
http://cmusphinx.sourceforge.net/wiki/communicate
So - not being a programmer, I'm stuck.
Technosaurus makes the following comment:
One way to handle the output from speech recognition is to use /dev/stdout as the output and pipe it through a while-read-case block like:
Code: Select all
pocketsphinx_continuous <params> | while read LINE; do
case "$LINE" in
*) ... ;; #use different regex here for different actions
esac
done
I will need to scavenge the CMUSphinx forums and learn what all this means and see if there are any examples that give me some clues how to finetune this for puppy.
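Technosaurus's while-read-case idea can be fleshed out roughly as follows. This is only a sketch: the function name act_on_line and the "ACTION:" strings are made-up placeholders, and it assumes the decoded words appear somewhere in each line that pocketsphinx_continuous prints. Note that shell patterns like *yes* match substrings, so "eyes" would also count as a yes.

```shell
#!/bin/sh
# Sketch only: act_on_line and the echoed actions are hypothetical placeholders.
act_on_line() {
    case "$1" in
        *affirmative*|*yes*) echo "ACTION: confirm" ;;
        *negative*)          echo "ACTION: cancel" ;;
        *)                   echo "ACTION: ignore" ;;   # anything unrecognised
    esac
}

# Intended use (not run here), discarding the INFO logging on stderr:
#   pocketsphinx_continuous 2>/dev/null | while read -r LINE; do
#       act_on_line "$LINE"
#   done
```

Each echo would be swapped for whatever Puppy is supposed to do for that word.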
lucid 5.2.8
Code: Select all
#One or more words that identify this distribution:
DISTRO_NAME='Lucid '
#A three-digit numeric value, version number of this distribution:
DISTRO_VERSION=528
#A two-digit numeric value, minor-version number of this distribution:
DISTRO_MINOR_VERSION=00
#The distro whose binary packages were used to build this distribution:
DISTRO_BINARY_COMPAT='ubuntu'
#Prefix for some filenames: exs: lupusave.2fs, lupu-528.sfs
DISTRO_FILE_PREFIX='lupu'
#The version of the distro whose binary packages were used to build this distro:
DISTRO_COMPAT_VERSION='lucid'
#the kernel pet package used:
DISTRO_KERNEL_PET='linux_kernel-2.6.33.2-tickless_smp_patched-L3.pet'
#16-byte alpha-numeric ID-string appended to vmlinuz, lupu_528.sfs, zl528332.sfs and devx.sfs:
DISTRO_IDSTRING='l528120404231153'
#Puppy default filenames...
#Note, the 'SFS' files below are what the 'init' script in initrd.gz searches for,
#for the partition, path and actual files loaded, see PUPSFS and ZDRV in /etc/rc.d/PUPSTATE
DISTRO_PUPPYSFS='lupu_528.sfs'
DISTRO_ZDRVSFS='zl528332.sfs'
When the decoded speech is extracted I thought it would be useful to have it placed into a text file called something like chatdump or voicedump or something like that - the stream of text would flow in as the user spoke, and maybe the file would need to be cleared every 3 seconds or so.
When Puppy was ready to assess the user's answer to a question it would go looking at the chatdump and view the last word (or words if appropriate).
If the user was busy chatting to other people in the room this chatter would be discarded after 3 seconds, and then when it came time to answer a Puppy question the user would reach a natural break in their conversation and the chatdump would just contain their answer to that question.
Just tossing ideas into the mix....
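The chatdump idea could be sketched something like this. The file location /tmp/chatdump and the function names are assumptions for illustration only, not anything that exists in Puppy.

```shell
#!/bin/sh
# Sketch only: the file name and function names are hypothetical.
CHATDUMP="${CHATDUMP:-/tmp/chatdump}"

# Print the last word of the last non-empty line in the dump file.
last_word() {
    awk 'NF { w = $NF } END { if (w != "") print w }' "$CHATDUMP"
}

# Empty the dump file without deleting it.
clear_dump() {
    : > "$CHATDUMP"
}

# Intended use (not run here): discard stale chatter every 3 seconds.
#   while sleep 3; do clear_dump; done &
```

One thing to watch with a fixed 3 second timer is that it could wipe the file just as the user finishes answering, so clearing the file at the moment Puppy asks its question might turn out to be safer.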
We are of one mind here. While I can see the merit of piping the stdout into python and then continuing in python, I would prefer to stay in the shallow end with my water wings and just write the stdout to a txt file which can then be bash-ed into submission. I can write a monitor-script to check the txt file for changes every few seconds and, when they are detected, act on them appropriately.
Arguably not as elegant as a singular python script, but i think it will do the job. Luckily there are many ways to skin a cat programmatically
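A monitor-script along those lines might look roughly like this. It is a sketch under assumptions: the function name, the /tmp/voicedump.txt path and the 2 second interval are all invented for the example, and it uses cksum to notice when the file content changes.

```shell
#!/bin/sh
# Sketch only: names and paths are hypothetical.
LAST_SUM=""

# Return 0 (true) if the file content differs from the previous check.
# The very first call always reports a change because LAST_SUM starts empty.
file_changed() {
    new_sum=$(cksum < "$1")
    if [ "$new_sum" != "$LAST_SUM" ]; then
        LAST_SUM=$new_sum
        return 0
    fi
    return 1
}

# Intended use (not run here): poll every 2 seconds and show the newest line.
#   while sleep 2; do
#       if file_changed /tmp/voicedump.txt; then
#           tail -n 1 /tmp/voicedump.txt
#       fi
#   done
```

Polling is crude but easy to follow, which fits the water-wings approach.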
I've just booted into a live session of Lupu 528 and can confirm that pocketsphinx works fine.
(Interestingly I did not need to wind down the mic volume in Lupu the way I did on Upup. It worked fine in Lupu without any changes).
Steps as follows:
1) Download technosaurus's pocketsphinx pet from here:
http://murga-linux.com/puppy/viewtopic. ... 5&start=27
2) Install the pet
3) Create a new directory of /usr/share/pocketsphinx (we will be using this later...)
4) Download the other source files referred to by technosaurus from this link:
http://hivelocity.dl.sourceforge.net/pr ... 0.8.tar.gz
5) Extract these files in your download directory and copy the "model" directory from the source into the /usr/share/pocketsphinx directory created above. (ie it becomes /usr/share/pocketsphinx/model)
6) Go into /usr/bin, rightclick in the open space and choose "window, terminal here"
7) Type: #./pocketsphinx_continuous
You should see sphinx set itself up and eventually show a "Ready" prompt. At that point you can speak into your microphone and you should see it say "listening..." and then once you stop speaking it will try to decode what you said.
Try saying "negative" or "affirmative" - I found the detection of those words to be 100% accurate if I used an American accent (ie: roll the r slightly in affirmative, just like Mr Spock would have.)
(The biggest problem is I keep spelling "shpinx" wrong a million times).
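For anyone who prefers the terminal, steps 3 and 5 above can be sketched as a small helper. The function name install_model is made up, and it assumes the extracted source tree has its "model" directory at the top level, so adjust the path if yours is laid out differently.

```shell
#!/bin/sh
# Sketch only: install_model and its arguments are hypothetical. It assumes
# the extracted source tree has a "model" directory at its top level.
install_model() {
    src_dir=$1                               # extracted source directory
    dest_dir=${2:-/usr/share/pocketsphinx}   # directory from step 3
    if [ ! -d "$src_dir/model" ]; then
        echo "no model directory in $src_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp -r "$src_dir/model" "$dest_dir/"      # becomes $dest_dir/model
}

# Intended use (not run here), after extracting the tarball from step 4:
#   install_model /path/to/extracted/source
```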
Also, I found some words worked really well and others were unreliable (this probably depends on the microphone, the soundcard and the voice of the user etc)
Here is a list of the words I found that work pretty consistently so far:
negative (pronounce the t clearly)
affirmative (pronounce the t clearly and roll the r slightly as Americans do)
yes
no
right
down
north (roll the r slightly as Americans do)
program
clear
again (pronounce "agen" not "agayn")
welcome
beginning
screen
return (roll the r slightly as Americans do)
absolutely
music
internet (pronounced as "innnternet" as Americans would. Roll the r slightly)
one
four (roll the r slightly as Americans do)
six
self
finish
fiction
america
Out house
Avoid start and stop as they are too easily confused.
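One way to put a list like the one above to work is to accept only words that are known to decode reliably and ignore everything else. The sketch below is an illustration only: RELIABLE_WORDS holds just a few entries from the list, and the function name is made up.

```shell
#!/bin/sh
# Sketch only: RELIABLE_WORDS is a hypothetical subset of the list above.
RELIABLE_WORDS="negative affirmative yes no right down north clear again"

# Return 0 if the given word is in the reliable list, 1 otherwise.
is_reliable() {
    for w in $RELIABLE_WORDS; do
        [ "$w" = "$1" ] && return 0
    done
    return 1
}

# Intended use (not run here):
#   if is_reliable "$WORD"; then echo "acting on $WORD"; fi
```

Because "start" and "stop" are left out of the list, they are rejected along with any other word that has not been tested.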
.
.
Last edited by greengeek on Sun 13 Oct 2013, 00:39, edited 1 time in total.
Code: Select all
sh-4.1# ./pocketsphinx_continuous
INFO: cmd_ln.c(691): Parsing command line:
./pocketsphinx_continuous
Current configuration:
[NAME] [DEFLT] [VALUE]
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-infile
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 137543 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(211): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(335): 133436 words read
INFO: dict.c(341): Reading filler dictionary: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(242): 5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 418286 = LM.trigrams read
INFO: ngram_model_dmp.c(339): 37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359): 14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407): 854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463): 5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(338): after: 457 root, 13300 non-root channels, 26 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(371): ./pocketsphinx_continuous COMPILED ON: Oct 11 2013, AT: 11:34:56
Warning: Could not find Mic element
FATAL_ERROR: "continuous.c", line 254: Failed to calibrate voice activity detection
? i have 2 mics. they work and are recognized....any thoughts?
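One thing that may be worth trying with two mics: the option list in the log shows -adcdev with no value set, so the recognizer is using whatever its default capture device is. Assuming this build captures through ALSA (not confirmed anywhere in this thread), the output of arecord -l can be turned into device names to pass to -adcdev. The helper name capture_devices is made up, and the plughw:1,0 value in the usage comment is only an example.

```shell
#!/bin/sh
# Sketch only: capture_devices is a hypothetical helper. It turns the
# "card N: ..., device M: ..." lines printed by `arecord -l` into
# ALSA device names of the form plughw:N,M.
capture_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

# Intended use (not run here):
#   arecord -l | capture_devices
#   ./pocketsphinx_continuous -adcdev plughw:1,0
```

If the pet was built against OSS instead, the device name would take a different form, so treat this as something to experiment with rather than a known fix.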
when i switch between mics, i get this...
i assume it cannot find the mic? i dunno...ill keep picking at it. no smoking HDDs tho so its progress...
Code: Select all
sh-4.1# ./pocketsphinx_continuous
INFO: cmd_ln.c(691): Parsing command line:
./pocketsphinx_continuous
Current configuration:
[NAME] [DEFLT] [VALUE]
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-infile
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 137543 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(211): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(335): 133436 words read
INFO: dict.c(341): Reading filler dictionary: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(242): 5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 418286 = LM.trigrams read
INFO: ngram_model_dmp.c(339): 37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359): 14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407): 854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463): 5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(338): after: 457 root, 13300 non-root channels, 26 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(371): ./pocketsphinx_continuous COMPILED ON: Oct 11 2013, AT: 11:34:56
Warning: Could not find Mic element
READY....
despite the error message, it DOES seem to be listening!
NICE JOB!
give me a few to play with this and see what I can make of it. Looks like part 2 may be close to done
I'll be back....
Code: Select all
#!/bin/sh
file="inputtxt"
pocketsphinx_continuous | while read LINE; do
case "$LINE" in
echo "$LINE" >> "$file"
done
i have created a script in the /usr/bin folder and given it the above code to chew on, but i'm getting no joy as yet. i'll figure it out tho...might take me a minute to nail down but i'll get it.
if any other code monkey wants to jump in and tell me my syntax error i would not complain...feel free! but this is not so tough and i'll untangle it sooner or later.
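The syntax error in the script above is the case statement: it is opened with case "$LINE" in but never given a pattern, and it is never closed with esac, so the shell trips over the echo line. A minimal corrected sketch is below. The function name log_lines is made up, and it logs every non-blank line, which may include status lines such as READY as well as the decoded text.

```shell
#!/bin/sh
# Sketch only: log_lines is a hypothetical name; "inputtxt" is the file
# name from the script above.
file="inputtxt"

# Append every non-blank line read from stdin to $file.
log_lines() {
    while read -r LINE; do
        case "$LINE" in
            "") ;;                              # ignore blank lines
            *)  echo "$LINE" >> "$file" ;;      # log everything else
        esac
    done
}

# Intended use (not run here), discarding the INFO logging on stderr:
#   pocketsphinx_continuous 2>/dev/null | log_lines
```

Since file is a relative path, inputtxt ends up in whatever directory the script is started from.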