Author |
Message |
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Fri 11 Oct 2013, 17:36 Post subject:
Chatterbox - STT / TTS / TTA project. Part 2 Subject description: Make Puppy listen. |
|
Part 2 of my "chatterbox" project is aimed at getting Puppy to monitor the microphone, listen to my response, and create a text file which accurately reflects what I have spoken.
Part 1 and chatterbox project description here:
http://www.murga-linux.com/puppy/viewtopic.php?t=89258
Part 3 (Making puppy act on decoded commands) here:
http://murga-linux.com/puppy/viewtopic.php?t=89260
Progress so far is based on the "pocketsphinx_continuous" .pet offered by technosaurus here:
http://www.murga-linux.com/puppy/viewtopic.php?t=88095&start=27
.
Last edited by greengeek on Wed 27 Nov 2013, 00:45; edited 4 times in total
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Fri 11 Oct 2013, 17:36 Post subject:
|
|
reserved
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Fri 11 Oct 2013, 17:36 Post subject:
|
|
reserved
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Fri 11 Oct 2013, 17:37 Post subject:
|
|
reserved
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Fri 11 Oct 2013, 17:37 Post subject:
|
|
reserved
|
H4LF82

Joined: 02 Oct 2012 Posts: 124
|
Posted: Fri 11 Oct 2013, 21:04 Post subject:
|
|
This is going to be the tough bit. Getting your computer to understand even one single word is tough enough, never mind the entire English language.
For these purposes, however, even the ability to discern between 2 words like "yes" and "no" would be extremely helpful.
I've been told to try Sphinx, Verbio, Ubuntu, and all manner of other things, but I have not had any luck with any of it. But I can tell you this much: I know when I am beaten, and there is a 6 month chunk of my life gone that I won't ever get back that I spent banging my head against this very wall (hindsight being 20/20, I'd avoid Sphinx if I were you), so by all means, please have a go at it...
I look forward to seeing what comes of it!
_________________ "The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Fri 11 Oct 2013, 22:05 Post subject:
|
|
Sorry to hear of your experience with Sphinx - I was getting my hopes up this morning when technosaurus posted a pet of pocketsphinx:
http://www.murga-linux.com/puppy/viewtopic.php?t=88095&start=27
Never mind, I will give it a go. As you say, teaching it the difference between "yes" and "no" is all that is required to make a start. To be honest I've read a few posts that suggest it is a mistake to use short words with STT - better to try to teach it the difference between "affirmative" and "not bloody likely" - apparently the longer phrases are easier to decode reliably.
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Sat 12 Oct 2013, 04:32 Post subject:
|
|
Well, I've been playing with pocketsphinx and it seems to be pretty good at decoding what I'm saying. I can certainly get it to distinguish yes and no with excellent reliability. Surprisingly it also seems very good (sometimes) at assembling entire sentences - although the accuracy does vary if the room has background noise.
I found that the program itself was extremely sensitive to mic volume, and it was necessary for me to turn the capture volume right DOWN to almost nothing, and to turn OFF the 20 dB mic boost which is usually a necessity with other audio programs like mhwaveedit. Quite surprising.
The problem is what to do with the output of the recognition program? I can see the decoded speech in the terminal but how to feed it to a text file in real time??
Technosaurus mentioned the following tutorial:
http://hackaday.com/2010/07/11/adding-speach-recognition-to-your-embedded-platform/
and one of the comments was as follows:
Quote: | I have a robot and I want to use Pocketsphinx so I can talk to the robot thing like…where is this room and it will tell me where it is or move foward and it should move forward. Right now I have install pockectsphinx.07 and sphinxbase and when I run using ubuntu 10.04LTS: pocketsphinx_continuous -lm 1998.lm -dict .dict 1998.dic it say READY then listening the when I say something like Good morning it write back Goodmorning….But how do I go from here…how do I use pocketsphinx to allow me to just talk and have what I just said be recorded and send to my robot to move…PLEASE HELP |
To which the author replied:
Quote: | Hello Steve
The way to connect recognizer library output to an action is a standard task every programmer could solve. I suppose you need to learn how to write programs. I’m sure you could find quite some references on the web. If you learn Python for example you can do it in a minute. For futher questions please use CMUSphinx forums
http://cmusphinx.sourceforge.net/wiki/communicate
|
So - not being a programmer, I'm stuck.
Technosaurus makes the following comment:
Quote: | One way to handle the output from speech recognition is to use /dev/stdout as the output and pipe it through a while-read-case block like:
Code: |
pocketsphinx_continuous <params>| while read LINE; do
case "$LINE" in
*)...;; #use different regex here for different actions
esac;
done |
|
I will need to scavenge the CMUSphinx forums, learn what all this means, and see if there are any examples that give me some clues how to fine-tune this for Puppy.
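For anyone else puzzling over the while-read-case idea quoted above, here is a minimal sketch of how it might look once filled in. It is only an illustration under assumptions: it assumes the recogniser prints one decoded phrase per line on stdout, and the names handle_line, process_stream and the /tmp/chatdump.txt path are made up for this example. Note that case matches shell glob patterns rather than true regular expressions. A printf stands in for the recogniser so the sketch runs without a microphone.

```shell
#!/bin/sh
# Hypothetical log file (an assumption, not part of pocketsphinx).
DUMP="${DUMP:-/tmp/chatdump.txt}"

# Append one decoded line to the log file and choose an action for it.
handle_line() {
    line=$1
    printf '%s\n' "$line" >> "$DUMP"
    # Lower-case the text so "AFFIRMATIVE" and "affirmative" match alike.
    lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
        *affirmative*) echo "ACTION: yes" ;;
        *negative*)    echo "ACTION: no" ;;
        *)             echo "ACTION: ignore" ;;
    esac
}

# Read lines from stdin until the input ends, handling each in turn.
process_stream() {
    while IFS= read -r LINE; do
        handle_line "$LINE"
    done
}

# Stand-in for the recogniser: three fake "decoded" phrases.
printf '%s\n' "AFFIRMATIVE" "good morning" "negative" | process_stream
```

With the real program the last line would presumably become something like pocketsphinx_continuous <params> | process_stream, as in the quote, though the actual output may contain extra text that needs filtering first.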
|
H4LF82

Joined: 02 Oct 2012 Posts: 124
|
Posted: Sat 12 Oct 2013, 13:04 Post subject:
sphinx |
|
If we can practice on Lucid I'll give it a go...
Gimme a few to get caffeine and I'm on it...
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Sat 12 Oct 2013, 13:28 Post subject:
|
|
What version of Lucid are you using? (Could you post the bottom few lines of your /etc/DISTRO_SPECS file?)
|
H4LF82

Joined: 02 Oct 2012 Posts: 124
|
Posted: Sat 12 Oct 2013, 13:39 Post subject:
|
|
lucid 5.2.8
Code: |
#One or more words that identify this distribution:
DISTRO_NAME='Lucid '
#A three-digit numeric value, version number of this distribution:
DISTRO_VERSION=528
#A two-digit numeric value, minor-version number of this distribution:
DISTRO_MINOR_VERSION=00
#The distro whose binary packages were used to build this distribution:
DISTRO_BINARY_COMPAT='ubuntu'
#Prefix for some filenames: exs: lupusave.2fs, lupu-528.sfs
DISTRO_FILE_PREFIX='lupu'
#The version of the distro whose binary packages were used to build this distro:
DISTRO_COMPAT_VERSION='lucid'
#the kernel pet package used:
DISTRO_KERNEL_PET='linux_kernel-2.6.33.2-tickless_smp_patched-L3.pet'
#16-byte alpha-numeric ID-string appended to vmlinuz, lupu_528.sfs, zl528332.sfs and devx.sfs:
DISTRO_IDSTRING='l528120404231153'
#Puppy default filenames...
#Note, the 'SFS' files below are what the 'init' script in initrd.gz searches for,
#for the partition, path and actual files loaded, see PUPSFS and ZDRV in /etc/rc.d/PUPSTATE
DISTRO_PUPPYSFS='lupu_528.sfs'
DISTRO_ZDRVSFS='zl528332.sfs' |
|
H4LF82

Joined: 02 Oct 2012 Posts: 124
|
Posted: Sat 12 Oct 2013, 13:46 Post subject:
|
|
If you would confirm that Sphinx plays well with Lucid (i.e. no smoking HDDs) then I will give it another try. I may have had an old version last time...
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Sat 12 Oct 2013, 14:01 Post subject:
|
|
When the decoded speech is extracted I thought it would be useful to have it placed into a text file called something like chatdump or voicedump or something like that - the stream of text would flow in as the user spoke, and maybe the file would need to be cleared every 3 seconds or so.
When Puppy was ready to assess the user's answer to a question it would go looking at the chatdump and view the last word (or words if appropriate).
If the user was busy chatting to other people in the room this chatter would be discarded after 3 seconds, and then when it came time to answer a Puppy question the user would reach a natural break in their conversation and the chatdump would just contain their answer to that question.
Just tossing ideas into the mix....
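To make the chatdump idea a bit more concrete, here is a rough sketch under some assumptions of mine. Instead of wiping the whole file every 3 seconds, it stamps each phrase with the time it arrived and drops anything older than 3 seconds whenever the file is read, which gives much the same effect without clearing an answer that is only half spoken. The file location and the function names add_phrase, prune_chatdump and last_word are invented for the example, and it relies on date +%s being available.

```shell
#!/bin/sh
CHATDUMP="${CHATDUMP:-/tmp/chatdump}"   # hypothetical file location
MAX_AGE="${MAX_AGE:-3}"                 # seconds of speech to keep

# Append one decoded phrase, prefixed with the current epoch time.
add_phrase() {
    printf '%s %s\n' "$(date +%s)" "$1" >> "$CHATDUMP"
}

# Drop every entry whose timestamp is more than MAX_AGE seconds old.
prune_chatdump() {
    [ -f "$CHATDUMP" ] || return 0
    cutoff=$(( $(date +%s) - MAX_AGE ))
    awk -v cutoff="$cutoff" '$1 >= cutoff' "$CHATDUMP" > "$CHATDUMP.tmp"
    mv "$CHATDUMP.tmp" "$CHATDUMP"
}

# Print the last word of the most recent surviving phrase, if any.
last_word() {
    prune_chatdump
    [ -s "$CHATDUMP" ] || return 0
    tail -n 1 "$CHATDUMP" | awk '{ if (NF > 1) print $NF }'
}

# Small demonstration: two phrases arrive, then Puppy asks for the answer.
: > "$CHATDUMP"
add_phrase "what a lovely day"
add_phrase "affirmative"
last_word
```

One weakness of this sketch: if the recogniser appends a line at the exact moment the file is being pruned, that line could be lost, so a real version would probably want some form of locking.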
|
H4LF82

Joined: 02 Oct 2012 Posts: 124
|
Posted: Sat 12 Oct 2013, 14:12 Post subject:
|
|
We are of one mind here. While I can see the merit of piping the stdout into Python and then continuing in Python, I would prefer to stay in the shallow end with my water wings and just write the stdout to a txt file which can then be bash-ed into submission. I can write a monitor script to check the txt file for changes every few seconds and, when they are detected, act on them appropriately.
Arguably not as elegant as a single Python script, but I think it will do the job. Luckily there are many ways to skin a cat programmatically.
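A monitor script along those lines could be sketched roughly as follows. This is only one possible shape, not a finished tool: the file name /tmp/voicedump.txt, the 3 second interval and the function names act_on, check_for_new_lines and monitor are all assumptions for illustration. It remembers how many lines it has already handled and only acts on lines added since the previous check.

```shell
#!/bin/sh
WATCHED="${WATCHED:-/tmp/voicedump.txt}"  # hypothetical file name
SEEN=0                                    # number of lines already handled

# Placeholder action: a real script would decide what to do with the text.
act_on() {
    echo "new: $1"
}

# Handle any lines added since the last call, then remember the new count.
check_for_new_lines() {
    [ -f "$WATCHED" ] || return 0
    total=$(wc -l < "$WATCHED" | tr -d ' ')
    if [ "$total" -gt "$SEEN" ]; then
        new=$((total - SEEN))
        # Skip the lines already seen; head stops at the count taken above.
        tail -n "+$((SEEN + 1))" "$WATCHED" | head -n "$new" |
        while IFS= read -r line; do
            act_on "$line"
        done
    fi
    SEEN=$total
}

# Endless polling loop; not started here so the demonstration can finish.
monitor() {
    while :; do
        check_for_new_lines
        sleep "${INTERVAL:-3}"
    done
}

# Demonstration without the loop.
: > "$WATCHED"
printf 'affirmative\n' >> "$WATCHED"
check_for_new_lines      # handles the first line
check_for_new_lines      # nothing new this time
printf 'negative\n' >> "$WATCHED"
check_for_new_lines      # handles only the line just added
```

Calling monitor instead of the demonstration lines would keep checking the file every few seconds until interrupted.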
|
greengeek

Joined: 20 Jul 2010 Posts: 4937 Location: Republic of Novo Zelande
|
Posted: Sat 12 Oct 2013, 15:53 Post subject:
|
|
I've just booted into a live session of Lupu 528 and can confirm that pocketsphinx works fine.
(Interestingly I did not need to wind down the mic volume in Lupu the way I did on Upup. It worked fine in Lupu without any changes).
Steps as follows:
1) Download technosaurus's pocketsphinx pet from here:
http://murga-linux.com/puppy/viewtopic.php?t=88095&start=27
2) Install the pet
3) Create a new directory of /usr/share/pocketsphinx (we will be using this later...)
4) Download the other source files referred to by technosaurus from this link:
http://hivelocity.dl.sourceforge.net/project/cmusphinx/pocketsphinx/0.8/pocketsphinx-0.8.tar.gz
5) Extract these files in your download directory and copy the "model" directory from the source into the /usr/share/pocketsphinx directory created above (i.e. it becomes /usr/share/pocketsphinx/model).
6) Go into /usr/bin, right-click in the open space and choose "window, terminal here"
7) At the # prompt, type: ./pocketsphinx_continuous
You should see sphinx set itself up and eventually show a "Ready" prompt. At that point you can speak into your microphone and you should see it say "listening..." and then once you stop speaking it will try to decode what you said.
Try saying "negative" or "affirmative" - I found the detection of those words to be 100% accurate if I used an American accent (i.e. roll the r slightly in affirmative, just like Mr Spock would have).
(The biggest problem is I keep spelling "shpinx" wrong a million times).
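For those who prefer the terminal, steps 3 to 7 above might be condensed into something like the sketch below. It assumes the pet from steps 1 and 2 is already installed, that wget is available, and that the tarball unpacks into a directory called pocketsphinx-0.8 (I have not verified that name here). The run helper and the DRY_RUN switch are my own additions: by default the script only prints the commands, and setting DRY_RUN=0 makes it really execute them.

```shell
#!/bin/sh
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
URL="http://hivelocity.dl.sourceforge.net/project/cmusphinx/pocketsphinx/0.8/pocketsphinx-0.8.tar.gz"
DEST="/usr/share/pocketsphinx"

# Either execute the given command or just show it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

setup_pocketsphinx() {
    run mkdir -p "$DEST"                        # step 3: create the directory
    run wget "$URL"                             # step 4: fetch the source
    run tar -xzf pocketsphinx-0.8.tar.gz        # step 5: extract (assumed name)
    run cp -r pocketsphinx-0.8/model "$DEST/"   # step 5: copy the model directory
    run /usr/bin/pocketsphinx_continuous        # steps 6-7: start the recogniser
}

setup_pocketsphinx
```

Run it once as is to check that the printed commands look right for your system before switching DRY_RUN to 0.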
|