Audeme MOVI User manual

MOVI™ Voice Dialog Shield
for Arduino® boards
User’s Manual
covers MOVI™ Firmware v1.10beta
Copyright © 2015, 2016 by Audeme LLC.

Acknowledgements
MOVI™ would not have been possible without the enduring feedback and assistance of
many people. First, we’d like to thank the 334 Kickstarter backers who, through their
financial contributions, questions, and comments, allowed us to make this project happen. A
list of the backers is available here: http://www.audeme.com/kickstarter-backers.html
Moreover, we are grateful for the voluntary help by our beta testers Jared Peters from
Origami Robotics and Lars Knipping from University of Technology Berlin.
We are also indebted to the open source community. Without the many people creating
open source tools, we wouldn’t have been able to put MOVI together. MOVI uses the following
open source packages:
- The Advanced Linux Sound Architecture (ALSA) project ( http://www.alsa-project.org )
- eSpeak text to speech ( http://espeak.sourceforge.net/ )
- CMU Sphinx and PocketSphinx, Speech Recognition Toolkit
( http://sourceforge.net/projects/cmusphinx/ )
- The OpenFst Library ( http://www.openfst.org/twiki/bin/view/FST/WebHome )
- The Many-to-Many alignment model ( https://code.google.com/p/m2m-aligner/ )
- The CMUCLMTK Development Toolkit
( http://cmusphinx.sourceforge.net/wiki/cmuclmtkdevelopment )
- The Phonetisaurus package ( https://github.com/AdolfVonKleist/Phonetisaurus )
- The MIT Language Modeling Toolkit ( https://code.google.com/p/mitlm/ )
- SVOX Pico ( https://launchpad.net/ubuntu/precise/+source/svox/+copyright )
MOVI boots a standard Linux kernel using U-Boot and relies on many of the GNU
standard tools and libraries, such as GLIBC and bash. We also make extensive use of
Python ( https://www.python.org/ ).
The SD card shipped with MOVI has a mountable ext3 partition “MOVI Root” which contains
a directory src/ with instructions on how to obtain the source code for any of the open source
packages we used, independently of any license requirement for us to do so.
Ultimately, though, we need to thank you, the reader of this document, for your interest and
commitment. Users are what make a product.
Thank you so much for all your support!
MOVI™ User’s Manual Revision 1.10beta2 -- July 31st, 2016

Table of Contents
Acknowledgements
Table of Contents
1. Introduction
2. Board Description
Power Supply
Speakers
Audio Input
LED
Reset Button
Jumper 1 (5V_REF)
Jumper 2 and Jumper 3 (Alternative Serial Communication)
3. Getting Started
4. Getting the Best Speech Recognition Results
Operation Modes
Training Sentences vs Words
Saving Arduino Memory
5. MOVI Firmware Updates
Linux and Mac OS X
Windows
6. Further Information
7. FAQ
Appendix
A. Compatibility
General Compatibility
Uno R1 and R2, MEGA2560 R1 and R2, Leonardo R1 and R2
Uno R3, Mega2560 R3, Leonardo R3
Freeduino
Olimexino-328
Diavolino
Arduino Yun
Arduino Due
Intel Galileo Gen 2
Intel Edison
Boards our users were able to get to work with MOVI
Boards we were not able to get to work with MOVI
B. MOVIShield Library Reference
Methods that must be used in setup()
MOVI constructors
Initialization methods
Methods that are typically used in setup() but can also be used in loop()
Methods that are typically used in loop()
Infrequently used advanced commands
C. The Low Level Interface
D. MOVI Event Categories
E. Special Files on the SD card
Playing sound files
Changing the communication bit rate
Voxforge.org models
F. Terms and Conditions

1. Introduction
Welcome to MOVI!
MOVI stands for My Own Voice Interface and is the first standalone speech recognizer and
voice synthesizer for Arduino with full sentence capability, natively supporting English and
optionally other languages from Voxforge.org:
● Lots of space for customizable English sentences (we tested up to 200, users
reported up to 1000)
● Speaker independent
● Standalone, cloudless and private
● Easy to program
● Different, configurable speech synthesizers included
MOVI provides an alternative to buttons, remote controls, or cell phones by letting you use
full-sentence voice commands for tasks such as turning devices on and off, entering alarm
codes, and carrying on programmed conversations with projects.
This manual will guide you through the first steps, provide compatibility information, and
serve as a programming reference. It will also give you some tips on how to get the best
speech recognition results with MOVI. We know how boring it seems to have to read a
manual when a shiny device just arrived in the mail and all we want is to get our hands on
it... However, the world of speech recognition is not only fascinating but also sometimes
tricky: things that should be easy aren’t, while things that should be hard just
miraculously work. So we strongly recommend that you not only read this manual but also
keep it handy. Further information can be found on our website, especially in the
ever-growing forum: http://www.audeme.com/forum.html
Yours,
Bertrand Irissou and Gerald Friedland
Makers of MOVI

2. Board Description
Figure 1. Bird’s-eye view of the MOVI™ board.
Figure 1 shows your MOVI board from the top with a legend of the most important
components.
Power Supply
What you don’t see in Figure 1 is a power supply jack. MOVI is powered through the
Arduino board, which in turn must be powered by an external power supply. The external
power supply should provide between 7V and 16V and at least 500mA of current. During tests
we usually used either 9V or 12V. Battery packs that meet this specification work as well.
Please note: MOVI cannot be powered through a USB power supply or the USB cable
of the Arduino board as the voltage provided is less than 6V.
Speakers
For proper operation, MOVI requires a speaker connected to Audio Out. The speaker can
be mono or stereo but the signal provided by MOVI is mono. The speaker impedance
should be 32 ohms, which is the standard impedance for headphones. The output volume
can be controlled in software using the MOVI library (see Appendix B and C). For
convenience, we recommend active speakers with an amplifier and volume control.
Note: Officially, you cannot connect 4 ohm or 8 ohm speakers. These require an
amplifier and might damage the board. Unofficially (i.e., no warranty!), many users
have tested it and it seems fine when the wattage of the speakers is low (e.g. 0.25W,
0.5W).
Note: Only connect 3-conductor (stereo) headphone jacks to MOVI. 2-conductor
(mono) and 4-conductor (stereo plus microphone) jacks require an adapter.
Audio Input
By default, the integrated electret microphone (see Microphone in Figure 1) is used. This
microphone is internally connected to an Automatic Gain Control that will amplify incoming
sounds to a standard level independent of the distance. This will work up to about 15 feet (5
meters), under good conditions sometimes even for greater distances. Under bad conditions
(noise, room echo) the distance will be shorter. For usability reasons or in difficult
environmental conditions, a headset microphone should be used. A headset microphone
or an alternative electret microphone can be connected to External MicIn. This audio jack is
a stereo jack but only accepts a mono signal. Connecting a device to External MicIn disables
the integrated microphone. Also, the signal that comes through External MicIn is not
amplified.
Note: Do not connect a Line-In signal or any other signal that is pre-amplified to the
microphone jack. Also, microphones that require phantom power will not work.
Note: Only connect 3-conductor (stereo) headphone jacks to MOVI. 2-conductor
(mono) and 4-conductor (stereo plus microphone) jacks require an adapter.
LED
MOVI uses the LED as an indicator for the state that MOVI is in. The following states are
signalled:
LED off: LED constantly off means MOVI is turned off, there is not enough power to
operate, and/or the SD card is not plugged in.
LED blinking faster and faster: MOVI is booting.
LED blinking randomly: MOVI is writing to the SD card. This happens during an update,
training, or resetting to defaults. MOVI should not be powered off while the LED is blinking
randomly.
LED blinking with constant frequency: If MOVI’s LED is blinking with constant frequency,
there is a serious issue with the SD card, e.g., MOVI’s file system check failed permanently.
LED constantly on: MOVI’s LED constantly on indicates MOVI is ready to operate or is
operating normally. Only in this state will MOVI recognize the call sign and the
programmed sentences.
Reset Button
The button is programmed as a reset button on a short press. MOVI will reboot (not the
underlying Arduino board though). Please do not press the reset button while the LED
blinks randomly (see above).
A long press will revert MOVI’s call sign, trained sentences and other configuration
parameters to factory default. Please note that the board can’t be reset and shouldn’t be
powered off during the restoration process.
Jumper 1 (5V_REF)
On most boards Jumper 1 must be open. Jumper 1 only needs to be closed when a 5 V
board is used that does not have the IOREF pin. This is explained in detail in Appendix A.
Caution: Jumper 1 hard wires the 5 V pin to IOREF. Therefore, setting Jumper 1 on a
3.3V board will destroy the board!
Jumper 2 and Jumper 3 (Alternative Serial Communication)
In most cases, Jumper 2 and Jumper 3 should be closed. By default, MOVI uses Arduino
pins D10 and D11 for communication between shield and board. If these pins are used for
other purposes or by another shield, Jumper 2 and Jumper 3 can be used to rewire the
communication. Also, some boards, such as the Arduino Due, are not able to have serial
communication on D10 and D11 and therefore need to be rewired for MOVI to operate.
Refer to Appendix A for details.
To rewire MOVI’s communication to different pins, open Jumper 2 and Jumper 3 and
connect the left side of MOVI's TX jumper (Jumper 2) and the left side of MOVI's RX jumper
(Jumper 3) to two other connectors on the Arduino headers using jumper wires. The left
side is the pin that is further away from the Arduino headers and the microphone (MIC1)
and closer to the button.

3. Getting Started
If you use a new Arduino UNO R3, an Arduino MEGA R3 or an Arduino Leonardo R3 (with an
IOREF pin), go right ahead through this section. If you are not sure or use any other board,
including older versions of the UNO, MEGA and LEONARDO and “compatibles”, please read
Appendix A first.
You need: A computer that can run the Arduino IDE, your MOVI board, your Arduino board,
your Arduino programming cable, an external power supply, and active speakers that can
be driven by a headphone jack. Optional: A green and a red LED for some of the example
sketches.
1. Download and install the Arduino IDE recommended for your board.
To do that, follow the instructions in your Arduino documentation or on this
website: https://www.arduino.cc/en/Main/Software
2. Download the MOVI library as a zip file from http://www.audeme.com/MOVI/ .
Users familiar with both MOVI and open source programming may also check out
the latest source code from https://github.com/audeme/MOVIArduinoAPI .
3. Install the MOVI library into the Arduino IDE. Instructions on how to install a library
can be found here: https://www.arduino.cc/en/Guide/Libraries
4. Load the LightSwitch example by opening Examples in the File menu. Choose
MOVI or MOVI(tm) Voice Dialog Shield, depending on the version of the IDE. The result
should look similar to this:
5. Disconnect all power and USB cables from your Arduino board and connect the
MOVI shield onto it:
6. Connect an external speaker or a headset to the Audio Out (see Section 2). Audio
Out is labeled “HEADPHONES” and is the audio jack further away from the
integrated microphone, closer to the Arduino headers.
7. On an Arduino board with IOREF pin, make sure Jumper 1 is open (unset).
Otherwise, check Appendix A about the best setting for Jumper 1. Connect the
external power supply to the Arduino board and switch it on.
8. After about 2 seconds, you should see MOVI’s LED (close to microphone) blinking
with increasing frequency. The speakers will say “MOVI is booting”. Eventually the
LED will stop blinking and just be constantly on. This indicates MOVI is ready. If the
LED does not go on at all, please turn off the power and read Appendix A. If, by the
time the LED has become steady red, you haven’t heard anything, please check your
speakers/headset and the connection.
9. Connect the USB programming cable to the Arduino board.
Important: Always connect the USB cable after you have connected the external
power supply. It is safe to disconnect the USB while the power is on. With the
exception of MOVI updating, learning a new call sign, learning new sentences, or
resetting to factory settings, you can always unplug the power safely. MOVI’s LED
(close to the microphone, see image) will blink randomly while it is not safe to
unplug. However, please do not disconnect the external power while the USB
cable is plugged into the Arduino. Powering MOVI from USB will not supply
enough voltage to the board and will therefore leave MOVI in an unstable
state where the LED might be on or blinking but MOVI does not work properly.
10. In the Arduino IDE, compile and upload the LightSwitch Example.
11. Get close to the microphone capsule and say “Arduino” in a normal voice, then wait for
<beep beep>. Say “Let there be light”. Wait for <beep>.
12. The speakers should say “and there was light” and the LED on the Arduino board turns on.
Please note that Arduino’s onboard LED might be a bit hidden below the MOVI
shield. For better visibility, connect an LED to Arduino port D13 (+) and GND (-).
13. Say “Arduino”, wait for <beep beep>. Say “Go dark”. Wait for <beep>.
14. The LED on the Arduino board turns off.
15. Congrats! Now play around with the code and/or load other examples. The
examples are roughly sorted by difficulty. Many of them don’t require extra
hardware but are mostly focussed on MOVI’s functionality.
16. Read ahead in this manual, especially Section 4, and also check out our forum on
http://www.audeme.com/forum .

4. Getting the Best Speech Recognition Results
Read this section when everything is set up and you have had your first experiences with
MOVI but before you plan out your first bigger project. Speech recognition can
be tricky and sometimes things just need a little `magic` even when everything was set up
correctly. After all, speech recognition is still a field of active research and many problems
are far from solved ( http://en.wikipedia.org/wiki/Speech_recognition ). Having
said that, knowing how MOVI works in principle will help when tinkering with issues that come up.
Operation Modes
MOVI’s speech recognizer has two basic modes of operation, training and recognition,
which are described as follows.
Training: MOVI’s Arduino library sends the training sentences in textual form over the
serial connection to the shield. The shield phonetizes the words in each sentence using a
2 GB English dictionary that knows spelling rules and approximates pronunciations even for proper names.
The phoneme sequences are used to create a temporal model that makes sure that only
words that have been part of the training sentences are recognized. A second temporal
model favors word sequences that are part of the sentences over sequences that are not
by assigning higher probabilities to phoneme sequences that occurred in the trained
sentences than to those that didn’t.
Recognition: During recognition, a waveform comes in over the microphone and is broken
down into speech and non-speech regions. This is done by an algorithm that monitors the
energy of the incoming signal over a short time period and compares it to a threshold. If
the pattern looks like speech and speech pauses, it assumes speech; otherwise the signal is
ignored. The speech regions of the signal are then passed to a classifier that has been
trained on hundreds of adult speakers. It breaks down the waveform into possible
phoneme sequences. Using the temporal models created in training, the phoneme
sequences are matched to the trained words, and word sequences that are part of
the training sentences are favored. A last correction step maps the words to the most likely
sentence in the training set (result from poll()). This last step can be skipped in the
library by using getResult().
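The speech/non-speech step described above can be illustrated off-board with a short Python sketch. This is not MOVI's firmware code: the frame length, the threshold value, and the function names frame_energy and speech_regions are assumptions chosen purely for illustration, and the sketch only performs the energy comparison, not the speech/pause pattern check.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def speech_regions(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of runs of frames whose energy
    exceeds the threshold. frame_len and threshold are illustrative values."""
    regions = []
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        loud = frame_energy(samples[i:i + frame_len]) > threshold
        if loud and start is None:
            start = i                      # a candidate speech region begins
        elif not loud and start is not None:
            regions.append((start, i))     # the region ended before this frame
            start = None
    if start is not None:                  # signal ended while still loud
        regions.append((start, len(samples) - len(samples) % frame_len))
    return regions


# Toy signal: silence, a loud burst, silence again.
signal = [0.0] * 8 + [0.9, -0.8, 0.7, -0.9] * 2 + [0.0] * 8
print(speech_regions(signal))
```

Raising the threshold makes such a detector ignore more background noise at the cost of missing quiet speech, which is the trade-off to keep in mind when working in noisy rooms.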
Now that we’ve got that out of the way, let’s discuss some common issues.
Training Sentences vs Words
Let’s assume you want to recognize any combination of numbers between one and three. Is
it better to train one sentence “one two three” or three ‘sentences’ “one”, “two”, and “three”?
If the goal is to recognize any combination of the numbers one, two, and three and each of
them is equally likely to appear, it’s best to train three sentences and use the getResult()
method. Training one sentence will work too, but there is a high likelihood that the
recognizer will favor “one two three”.
If it’s really always a combination of three different numbers between one and three, it is
preferable to train all six combinations: “one two three”, “one three two”, “two three one”,
“three two one”, “two one three”, “three one two”. This way, poll() can be used and MOVI’s
algorithm can correct errors.
What if the combination of numbers was between 1 and 10? We have tested MOVI
successfully with about 150 short sentences and we are pretty sure there can be some
more, but we also know that training 10! = 3,628,800 sentences will not work. So obviously 10
single-word sentences need to be trained and getResult() needs to be used.
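The sentence counts above are easy to check on a desktop computer. The following Python sketch is independent of the MOVI library; it enumerates the six orderings of “one two three” and shows why training every ordering of ten numbers is out of reach. The variable names are illustrative only.

```python
from itertools import permutations
from math import factorial

words = ["one", "two", "three"]

# Every ordering of the three words, joined into a trainable sentence.
sentences = [" ".join(p) for p in permutations(words)]
for s in sentences:
    print(s)

print(len(sentences))   # number of orderings of three words
print(factorial(10))    # number of orderings of ten words
```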
What if only one number between one and ten was to be recognized? In this case, it is fine
to train one sentence (“one two three four five six seven eight nine ten”) since it saves
memory and training time, and the temporal model isn’t used anyway as there is only one
word to be recognized. However, training ten sentences will not harm the recognition
accuracy.
What if there was some known word sequencing but not for the entire sentence? Let’s say
you want to recognize ‘0.x’ and ‘1.x’ with x being a number between 0 and 9. The best way
to do this is to train twenty sentences “zero point zero”, “zero point one”, “zero point two”, ...
“one point nine”. However, if the acoustic conditions in the room are good, it’s feasible to
break the sequences up into fewer sentences, for example: “zero point”, “one point”, and 8
single-word sentences “two”, “three”, “four”, etc. (the words zero and one have already
been trained). This may be further reduced to three sentences by making the numbers 2-9
one sentence “two three four five six seven eight nine”. Splitting up this task into fewer than
twenty sentences, however, requires using the getResult() method.
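As a rough illustration of the two training strategies for ‘0.x’ and ‘1.x’, the Python sketch below builds both sentence lists on a desktop computer. It does not talk to MOVI; the resulting strings would still have to be passed to addSentence() in a sketch. The names DIGITS, full_set and reduced_set are made up for this example.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Full training set: every "0.x" and "1.x" as its own sentence.
full_set = [f"{lead} point {d}" for lead in ("zero", "one") for d in DIGITS]

# Reduced set: two prefixes plus single-word sentences for the digits
# that the prefixes do not already contain.
reduced_set = ["zero point", "one point"] + DIGITS[2:]

print(len(full_set), len(reduced_set))
```

The full set lets poll() correct errors at the sentence level, while the reduced set trades that correction for fewer trained sentences.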
The overall rule of thumb is: Favor training all known word sequences as sentences.
Otherwise, train words as sentences individually.
Sentences do not get higher priority if they are defined twice as, in fact, the system will
remove duplicate sentences. However, one can give a certain sequence (out of many
possible) a higher priority by first defining individual word sentences and then the actual
sequence. For example, defining the sentences “one”, “two”, “three” and the sentence “three
two one” will give a higher probability to the sequence “three two one” than to any other
sequence. This does play a role in noisy room conditions.
If you want to create a keyword spotter, i.e., recognize a particular word out of many, it’s
best to train a lot of other words as well. For example, if you want to recognize whenever
the word “Simon” appears in a sentence, you would train the word “simon” as a sentence
along with a set of other words, for example words that appear very frequently in English
such as “the”, “be”, “to”, “of” (for a more comprehensive list check out this link:
https://en.wikipedia.org/wiki/Most_common_words_in_English ) as well as words that are
similar to Simon (e.g., “assignment”). This way, these words are recognized as themselves, which lowers the
false alarm rate of MOVI detecting “Simon” when other words are spoken.
Please also note that for sentence matching (as part of the poll() function) it is best for all
trained sentences to have about equal length. A single very long sentence will always be
favored when a lot of words are spoken.
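One way to spot an unbalanced training set before uploading a sketch is to compare word counts on a desktop computer. The Python helper below is only a convenience sketch, not part of the MOVI library; the function name length_outliers and the factor of 2 are arbitrary assumptions.

```python
from statistics import median

def length_outliers(sentences, factor=2.0):
    """Return sentences whose word count exceeds `factor` times the median
    word count. The factor of 2 is an arbitrary illustrative choice."""
    counts = [len(s.split()) for s in sentences]
    limit = factor * median(counts)
    return [s for s, n in zip(sentences, counts) if n > limit]

trained = [
    "let there be light",
    "go dark",
    "turn on the fan",
    "please switch on every single lamp in the living room right now",
]
print(length_outliers(trained))
```

Any sentence reported here is a candidate for shortening or splitting so that it does not dominate sentence matching.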
Saving Arduino Memory
Some Arduino models, especially the Uno, come with very limited memory. Using the
addSentence() commands in setup() is convenient but it does mean that the
sentences are stored in Arduino’s memory even when MOVI has already learned them.
More on Arduino’s memory restrictions can be found here:
https://www.arduino.cc/en/Tutorial/Memory
The first solution is to comment out the addSentence() and train() calls and re-compile and
upload after MOVI has learned the sentences. Please note, however, that the MOVI API
itself as well as any other libraries potentially included in a sketch also occupy some
SRAM. Another solution therefore is to use the so-called PROGMEM method and the F()
macro to store constant strings in flash memory. The concept is described here:
https://www.arduino.cc/en/Reference/PROGMEM
MOVI’s API allows F() macro strings to be used with addSentence, say, ask, and
password. This means addSentence(“Let there be light”) works as well as addSentence(F(“Let
there be light”)) but the second option saves critical SRAM.
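To get a feeling for how much SRAM plain string literals cost, the following Python sketch gives a rough desktop-side estimate. It assumes ASCII sentences, one byte per character plus a terminating NUL, and the 2048 bytes of SRAM on the Uno's ATmega328P; the function name sram_cost is made up for this example, and the real figure also depends on the compiler and the rest of the sketch.

```python
UNO_SRAM_BYTES = 2048  # SRAM of the ATmega328P on the Arduino Uno

def sram_cost(sentences):
    """Rough SRAM use of plain (non-F()) ASCII string literals:
    one byte per character plus a terminating NUL for each string."""
    return sum(len(s.encode("ascii")) + 1 for s in sentences)

sentences = ["Let there be light", "Go dark"]
used = sram_cost(sentences)
print(used, "bytes,", round(100 * used / UNO_SRAM_BYTES, 1), "% of Uno SRAM")
```

Two short sentences are negligible, but a few dozen longer ones can take a noticeable share of the Uno's SRAM, which is where the F() macro pays off.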
If the above tricks do not provide enough memory savings, then the best way is to use
the low level interface. Compile and upload the LowLevelInterface example (see
Examples/MOVI/proficient/LowLevelInterface in the Arduino IDE menu) or make sure the
MOVI object is constructed using ‘true’ as first argument. Open the Serial Console and then
use the manual TRAIN command as described in Appendix C. The TRAIN command will ask
for sentences one per line (enter ends a sentence) until “#” is used to finish the input and
will then automatically learn all the sentences given. Then switch back to your project and
use poll() and/or getResult() normally. Just make sure that no addSentence() or train() call is used
in your sketch, as this would overwrite your trained sentences. Sentences are stored even
after MOVI is reset or powered down until either retrained or a factory reset is initiated,
regardless of the method used for training.
Good Call Signs
Since MOVI maps any input to the trained words, regardless of how far off it is, MOVI uses a
keyword spotting algorithm to make sure the sound registered is intended for
recognition. This cuts down on false alarm rates and allows MOVI to run in the background,
e.g. as a light switch that doesn’t go erratic while a TV is on. Good call signs are
distinctive from other words. Obviously, choosing the word “the” or “to” as a
call sign works pretty terribly. We found that names that are easy to pronounce and spell
and contain two to three syllables work best. The default, “Arduino”, has more syllables
but definitely works. So does “MOVI” or “computer”. Call signs from other personal assistant
speech recognition products, unsurprisingly, work well too.
The Role of the Acoustic Environment
Perhaps one of the most counterintuitive flaws of state-of-the-art speech recognizers is that
they can’t cope with the influence that the environment has on the acoustic properties of
the speech signal. At the same time, the human ear is incredibly good at it. The human ear
can ignore overlapping sounds, echo, reverberation, noise induced by wind, etc. Speech
recognizers are not able to do that, or only to a limited extent. Room influence is usually a
bigger factor than individual speech variance, such as accent. Having said that, speaking
rate is a factor as well: very slow or very fast speech is hard to recognize.
While testing MOVI, we have observed situations where the room was completely silent, yet
MOVI had serious problems picking up the call sign. Analyzing the situation, we found that
there was too much reverberation in the room. It was not audible but clearly visible on the
oscilloscope. Short of installing carpet just to make MOVI work, we found that different locations
in the room worked with different accuracy. Also, of course, the closer we moved to MOVI’s
integrated microphone, the better it worked, as this cuts down on the room influence. Using a
headset microphone worked perfectly.
The rule of thumb here is: Try to shorten the distance to the microphone as much as
possible. If in doubt, use a headset microphone connected to External MicIn.
Operating MOVI under Noisy Conditions
By default, MOVI will give a one-time spoken warning about a too noisy room environment.
Moreover, if you find that MOVI takes very long to acknowledge your spoken sentence with
beeps after you are finished, the noise level is too high. Needless to say, in the presence
of any kind of noise, recognition accuracy will go down, especially when using the getResult()
method.
The short version is: If there is significant noise in the room, use a headset microphone
connected to External MicIn.
If the noise isn’t too heavy and a headset is not an option, MOVI provides the THRESHOLD
command (or setThreshold() call in the MOVI library). The call sets the noise threshold of the
speech/non-speech detector component of the recognizer. The possible values the