Micromint MICROVOX User manual

USERS MANUAL
THE MICROMINT INC.
561 Willow Avenue, Cedarhurst, N.Y. 11516


TABLE OF CONTENTS
                                                  Page
Introduction                                      1-2
MicroVox Hardware                                 3-4
Speech Synthesizer and Inflection Circuitry       5-6
Device Control Codes                              7-15
Setting the Serial Port                           15-16
Parallel Input Port                               17
Phoneme Codes                                     18-19
J3 - Connector Pinout                             20
J1 - Connector Pinout                             21
Phonetic Word List                                22-25
Parts List                                        26-27
Layout                                            28
Schematics                                        29-32
Warranty                                          33


THE MICROVOX TEXT TO SPEECH SYNTHESIZER
Copyright 1982 The Micromint Inc.

The MicroVox is a stand alone intelligent peripheral that converts ASCII text to spoken English. The MicroVox is attached to the computer (or terminal, modem, etc.) via either a serial or parallel cable. Its operation is similar to that of a printer except that the output is speech rather than printed word. The MicroVox has many programmable options which produce its high level of intelligibility. It has the following features:
* Phoneme based speech synthesizer
* 6502 Microprocessor
* 64 crystal controlled inflection levels
* 700 character buffer (optional 2.7K)
* 6K byte text-to-phoneme algorithm
* Full ASCII character set recognition and echo
* Adjustable baud rates (75-9600)
* EIA RS232C and parallel input interfaces
* Phoneme access modes
* Serial X-on/X-off handshaking
* One watt amplifier and volume control
* Onboard power supply
* Music and sound effects capability
Basically, the MicroVox Speech Synthesizer consists of a 6502 based microcomputer with a voice synthesizer output port. It has a 6502 microprocessor, crystal controlled 75-9600 bps full handshaking serial interface, parallel input port, 2K bytes of RAM, 8K bytes of EPROM, and an onboard power supply. The EPROM contains the operating system and text to speech algorithm. Special control signals are sent from the host computer to select among many different user options. In general, these control signals are in the form:
!(letter)(option)
The exclamation point is a signal to the MicroVox that a control code follows. Options can be changed at any time by sending the appropriate codes preceding or imbedded within the text.
What
is
a
Text
to
Speech
Synthesizer
?
With
the
majority
of
speech
synthesizer
interfaces,
text
to
speech,
or
the
actual
conversion
from
ASCII
characters
to
phonemes,
LPC
formants,
word
codes
etc.,
is
left
to
the
user.
Such
a
conversion
routine
will
be
more
or
less
elaborate,
depending
upon
the
required
vocabulary.
For
short
vocabularies,
the
conversion
program
might
consist
merely
of
a
table
of
words
and
their
appropriate
synthesizer
codes.
When
the
required
vocabulary
becomes
very
long,
or
in
fact
unlimited,
tables
become
cumbersome
and
a
text
to
speech
algorithm
is
required
instead.
page
1

A
text
to
speech
algorithm
is
a
program
which
takes
ASCII
data
and
performs
a
synthesis
by
rule
analysis
of
character
strings.
It
determines
which
characters
are
voiced
and
which
are
silent
by
following
a
set
of
general
rules
for
pronouncing
English
(text
to
speech
algorithms
can
be
written
for
other
languages
as
well).
Text
to
speech
algorithms
vary
in
length
depending
upon
exactness
of
voice
translation.
Typical algorithms are in the 4K to 8K byte range, but some of the more sophisticated programs are up to 80K bytes.
The
primary
difference
between
a
6K
and
a
20K
algorithm
is
more
often
the
spelling
of
input
text
rather
than
any
specific
sound
quality
differences
(an
80K
algorithm
can
often
be
half
look
up
tables
for
exceptions
to
the
rules).
For
exact
pronunciation
it
might
be
necessary
to
spell
words
differently
to
more
easily
fit
the
prescribed
rules
on
the
smaller
algorithm,
such
as
entering
"com
pu
ter"
instead
of
computer.
The
only
other
limitations
are
features
such
as
pronunciation
of
punctuation
or
inflected
speech.
Both
of
these
capabilities
are
supported
in
the
MicroVox.
The
MicroVox
text
to
speech
synthesizer
is
a
smart
peripheral.
It
speaks
only
those
ASCII
strings
which
are
directed
to
it
through
either
its
serial
or
parallel
input
ports.
The
ASCII
text
can
result
from
PRINT
statements
in
BASIC
or
the
contents
of
complete
disk
files.
MicroVox
connects
to
the
computer
in
the
same
manner
as
a
printer
or
modem
and
virtually
anything
that
can
be
printed
or
viewed
on
the
CRT
can
be
vocalized.
The
MicroVox
is
a
combination
of
two
major
elements:
a
6502
based
microcomputer
and
a
Votrax
SC-01
speech
synthesizer
chip.
The
SC-01
is
a
CMOS
(complementary
metal
oxide
semiconductor)
chip
which
consists
of
a
digital
code
translator
and
an
electronic
model
of
the
vocal
tract.
Internally,
there
is
a
phoneme
controller
which
translates
a
6
bit
phoneme
and
2
bit
pitch
code
into
a
matrix
of
spectral
parameters
which
adjusts
the
vocal
tract
model
to
synthesize
speech.
The
output
pitch
of
the
phonemes
is
controlled
by
the
frequency
of
the
clock
signal.
The
clock
frequency
is
nominally
720
KHz
but
subtle
variations
of
pitch
can
be
induced
to
add
inflection.
This
prevents
the
synthesized
voice
from
sounding
too
monotonous
or
"robotlike".
Listed
in
Table
1
are
the
64
phonemes
defined
for
the
English
language
(three
produce
no
sounds).
The
phoneme
sound
is
generated
when
a
6
bit
phoneme
code
is
transmitted
to
the
SC-01.
Each
phoneme
is
internally
timed
and
has
a
duration
of
47-250
msec
depending
on
the
particular
phoneme
selected
and
the
clock
frequency.
The
computer
operating
system
sends
these
codes
to
the
synthesizer
chip
through
a
latched
parallel
output
port
and
monitors
the
synthesizer's
activities
(the
A/R
line)
through
an
interrupt
line.
page
2

The
MicroVox
Hardware
As
previously
mentioned,
the
MicroVox
is
a
stand-alone
microcomputer
configured
to
function
as
an
intelligent
peripheral.
Figure
1
is
a
basic
block
diagram
of
MicroVox.
It
can
be
viewed
as
a
general
purpose
6502
based
computer
with
a
speech
synthesizer
attached
as
a
memory
mapped
I/O
port.
MicroVox
is
best
explained
by
dividing
the
circuitry
into
four
functional
subsections:
processor
and
timing,
memory,
serial
and
parallel
I/O,
and
speech
synthesizer.
Figure
2
is
the
complete
MicroVox
schematic.
Processor
and
Data
Rate
Clock
The
processor
is
a
1
MHz
6502.
The
processor
and
data
rate
clocks
are
derived
by
dividing
down
a
4.9152
MHz
crystal
through
IC6.
Using
a
4.9152
MHz
crystal
(base
is
75
times
2
to
the
16th)
and
a
12
stage
CD4040
binary
divider
(IC6),
9
rates
are
derived
directly:
75
bps,
150
bps,
300
bps,
600
bps,
1200
bps,
2400
bps,
4800
bps,
9600
bps,
and
19200
bps
(while
the
hardware
can
produce
19200
bps,
it
is
not
supported
by
the
operating
system).
The
MicroVox
will
not
communicate
at
110
bps.
See
"Setting
the
serial
port".
The
6502
processor
operates
at
a
clock
frequency
of
611
KHz.
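The arithmetic behind the data rate clock can be checked with a short sketch. It assumes the ACIA runs in a divide-by-16 clock mode and that divider stages 4 through 12 are the ones brought out to the rate selector; neither detail is stated in the text above, they are inferred from the listed rates. The names `CRYSTAL_HZ`, `ACIA_DIVIDE` and `divider_rates` are illustrative only.

```python
CRYSTAL_HZ = 4_915_200   # 4.9152 MHz crystal, which equals 75 * 2**16
ACIA_DIVIDE = 16         # assumed ACIA clock ratio (not stated in the manual)


def divider_rates(crystal_hz=CRYSTAL_HZ, stages=range(4, 13)):
    """Return {stage: bps} for each binary divider stage n (divide by 2**n)."""
    return {n: crystal_hz // (2 ** n) // ACIA_DIVIDE for n in stages}


rates = divider_rates()
for stage, bps in sorted(rates.items(), key=lambda item: item[1]):
    print(f"divide by 2^{stage:<2} -> {bps:>5} bps")
```

Under these assumptions the nine stages give exactly the nine rates listed, and 110 bps is not among them because it is not 75 times a power of two.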
Memory
Section
ICs
2-5
and
9
form
the
address
decoding
and
memory
section
of
the
MicroVox.
IC
9
decodes
the
5
most
significant
address
bits
to
create
8
strobes.
They
are
defined
as
follows:
Name   Hex Address   Connection and Function
SEL0   0000          IC2  Memory Block (RAM)
SEL1   0800          IC3  Memory Block (RAM)
SEL2   1000          IC10 Serial Port
SEL3   1800          IC11 Parallel Ports
SEL4   8000          IC14 Inflection Clock Rate
SEL5   A000          IC14 Phoneme Latch
SEL6   C000          IC5  Memory Block (EPROM)
SEL7   E000          IC4  Memory Block (EPROM)
In the MicroVox configuration, ICs 2 and 3 are intended to be RAM while ICs 4 and 5 are EPROM or ROM.
The
pin
designations
for
ICs
2
and
3
are
for
2K
by
8
RAM
chips
such
as
the
Hitachi
6116
or
Toshiba
2016
(these
devices
are
2716
pin
compatible.
You
could
also
use
2716
EPROMs
in
these
sockets).
This
programmable
memory
is
used
for
conversion
tables,
register
stacks,
and
the
ASCII
input
buffer
(the
MicroVox
can
receive
data
faster
than
it
can
speak
it).
The
basic
MicroVox
uses
only
one
RAM
chip
which
allows
a
700
character
input
buffer.
By
adding
the
second
RAM
chip
in
IC3
(and
changing
a
few
EPROM
constants),
this
buffer
can
be
optionally
expanded
to
2.7K
characters.
page
3

The
text
to
speech
algorithm
is
placed
in
EPROM/ROM
positions
ICs
4
and
5.
Either
2716
(2K
X
8),
2732
(4K
X
8),
or
2764
(8K
X
8)
devices
can
be
used
in
these
positions
depending
upon
the
jumper
selections
JP4
and
JP5.
The
8K
byte
MicroVox
software
will
be
either
on
two
2732
EPROMs
and
require
both
sockets
or
a
single
8K
2764
(or
ROM
equivalent).
Serial
and
Parallel
I/O
MicroVox,
unlike
most
other
voice
synthesizers,
has
both
serial
and
parallel
input
ports
to
receive
ASCII
characters.
The
serial
port
uses
a
6850
asynchronous
communications
interface
adapter
(ACIA,
IC10)
which
is
software
programmable.
During
initialization,
the
ACIA's
functional
configuration
is
preset.
Considerations
such
as
word
length,
clock
division
ratios,
parity,
stop
bits,
etc.,
are
selected
by
properly
setting
bits
in
the
ACIA's
control
register.
The
data
rate
is
set
by
the
system
data
rate
clock
(from
SW2
and
IC6)
and
data
is
sent
and
received
from
the
Transmit
and
Receive
data
registers
respectively.
Information
such
as
framing
errors,
parity
errors,
and
buffer
and
handshaking
status,
are
determined
by
reading
the
ACIA
status
register.
On
the
MicroVox,
the
serial
port
can
be
used
with
or
without
hardware
handshaking
(CTS,
DCD,
RTS,
etc.).
This
is
especially
useful
when
communicating
over
modems
or
terminals
which
have
no
handshaking
signals.
Instead,
the
MicroVox
software
incorporates
software
handshaking.
When receiving ASCII text in the software handshaking mode, the MicroVox sends an "@" to the host computer when its input buffer is almost full (the host should stop sending data). It sends a "#" when it is ready to receive data again.
Obviously,
even
this
can
be
ignored
if
the
data
rate
from
the
host
computer
never
exceeds
the
speed
at
which
the
buffer
is
emptied.
The
parallel
input
section
uses
an
8255
PIA
(IC11)
which
is
also
programmable.
As
configured,
8
bits
of
it
are
used
to
receive
parallel
format
ASCII
data
such
as
would
be
transmitted
to
a
parallel
printer.
Using
2
additional
pairs
for
the
strobe
and
acknowledge
handshaking,
the
MicroVox
can
attach
to
any
conventional
Centronics
printer
interface.
(As
configured,
the
34
pin
edge
connector
is
exactly
compatible
with
the
Radio
Shack
TRS
line
of
computers
and
can
connect
directly
to
their
34
pin
Centronics
printer
edge
connector).
DIP
switch
SW1
also
attaches
to
IC11.
Switch
positions
6
thru
8
set
serial
word
length,
stop
bits,
and
parity
on
the
ACIA;
switch
section
3
selects
hardware
or
software
handshaking;
sections
1,
2,
4,
and
5
are
not
used.
page
4

Speech
Synthesizer
and
Inflection
Circuitry
Probably
the
most
important
section
of
the
MicroVox
is
the
actual
speech
synthesizer
circuitry.
The
MicroVox
allows
64
levels
of
pitch
inflection.
The
output
pitch
of
the
phonemes
is
controlled
by
the
frequency
of
the
clock
signal.
The
output
pitch
is
a
function
of
this
clock
input
frequency
and
two
pitch
control
lines,
I1 and I2
(each
acts
independently).
Four
rather
large
variations
in
pitch
(corresponding
to
!P1
thru
!P4
in
the
operating
system),
can
be
achieved
simply
by
using
these
manual
inflection
inputs.
More
subtle
variations
in
output
pitch
are
attained
by
externally
controlling
the
synthesizer
clock.
Using
the
1.22
MHz
system
clock
and
a
digital
rate
multiplier,
a
programmable
clock
can
be
created
to
produce
smaller
and
more
defined
pitch
inflection
changes.
On
a
SEL4
strobe,
a
four
bit
inflection
code
is
latched
into
IC13
and
applied
to
the
rate
multiplier.
The
four
bit
combination
results
in
16
clock
rates
from
614.4
KHz
to
902.4
KHz
in
19.2
KHz
increments
(corresponding to !R1 thru !R16 in the operating system). A step of about 20 KHz creates a relatively small pitch change by itself (out of a 720 KHz nominal input frequency) but, used dynamically in a sentence, it creates a definite improvement in intelligibility.
The pitch levels !P1 thru !P4 are the base pitch, and the 16 frequencies from the rate multiplier, !R1 thru !R16, are the clock rate. The combination of the two functions results in 64 pitch levels or inflections.
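The clock rate figures above can be tabulated with a few lines of code. This is only a sketch: it assumes the sixteen rates map to !R1 thru !R16 in ascending order with even 19.2 KHz spacing, as the text implies, and the names `BASE_KHZ`, `STEP_KHZ`, `clock_khz` and `levels` are hypothetical.

```python
BASE_KHZ = 614.4   # lowest rate multiplier output, taken to be !R1
STEP_KHZ = 19.2    # spacing between adjacent clock rates


def clock_khz(rate):
    """Synthesizer clock in KHz for clock rate code !R<rate>, rate in 1..16."""
    if not 1 <= rate <= 16:
        raise ValueError("clock rate must be 1..16")
    return round(BASE_KHZ + STEP_KHZ * (rate - 1), 1)


# Every (base pitch, clock rate) pair: 4 base pitches times 16 clock rates.
levels = [(pitch, rate) for pitch in range(1, 5) for rate in range(1, 17)]

for rate in (1, 5, 16):
    print(f"!R{rate}: {clock_khz(rate)} KHz")
print(len(levels), "pitch levels in total")
```

On this reading !R5 works out to 691.2 KHz, and the 720 KHz nominal frequency falls between !R6 and !R7.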
The
pitch
at
which
individual
phonemes
are
pronounced
may
be
controlled
automatically
by
the
text
to
speech
algorithm,
kept
fixed,
or
altered
by
user
command.
Some
users
prefer
automatic
inflection,
because
of
the
variety
it
gives
to
the
speech.
Others
think
a
computer
should
sound
like
a
computer
and
prefer
the
flat
speech
to
artificially
intoned
speech.
Still
others
may
wish
to
directly
control
the
pitch
to
make
the
unit
"sing"
(pitch
and
rate
codes
may
be
mixed
with
phoneme
codes
to
produce
"singing")
or
pronounce
words
with
special
emphasis.
The
user
may
control
the
base
pitch
setting
independently
of
the
clock
rate.
The user options are:
!P1 (low pitch)
!P2 (medium low pitch)
!P3 (medium high pitch)
!P4 (high pitch)
The user may also control the clock rate:
!R1 (slowest rate, lowest level for the given base pitch)
!R2 (slightly faster)
!R3...!R16 (increasingly faster rates)
page
5

The
MicroVox
has
the
ability
to
play
musical
notes
and
produce
sound
effects.
This
is
accomplished
by
using
a
program
routine
to
toggle
one
bit
of
IC11
at
a
predetermined
rate.
This
lead,
in
addition
to
the
output
from
the
speech
synthesizer
chip
(IC12)
is
connected
to
the
output
amplifier.
The
results
are
similar
to
the
sound
produced
on
the
internal
speaker
in
an
APPLE
II
computer
(it
uses
the
same
technique).
OPERATOR
INTERACTION
WITH
THE
TEXT
TO
SPEECH
SOFTWARE
The
MicroVox
is
a
stand
alone
intelligent
peripheral
that
converts
ASCII
text
to
spoken
English.
The
MicroVox
is
attached
to
the
computer
(or
terminal,
modem,
etc.)
via
either
a
serial
or
parallel
cable.
Its
operation
is
similar
to
that
of
a
printer
except
that
the
output
is
speech
rather
than
printed
word.
The
MicroVox
has
many
programmable
options
which
produce
its
high
level
of
intelligibility.
These
options
are
called
device
control
signals
and
are
transmitted
to
the
MicroVox
along
with
the
text.
Device
control
signals
are
sent
from
the
host
computer
to
select
among
many
different
user
options.
In general, MicroVox control signals are in the form:
!(letter)(option)(option)
for example:
!HXY
The exclamation point is a signal to the MicroVox that a control code follows. The user may, if desired, use any other character as the signal. This is done by giving the following instruction:
(old signal character)X(new signal character)
for example:
!X$
will change the control signal from an exclamation point to a dollar sign, and
$X*
will then change it from a dollar sign to an asterisk.
Device
control
signals
can
be
imbedded
anywhere
in
the
text
transmission
and
are
not
spoken.
Once
a
device
control
signal
has
been
sent
to
the
MicroVox,
all
succeeding
text
entry
will
be
subject
to
that
default
setting
until
it
is
changed.
For example, if letter by letter pronunciation is invoked with !E then all text will be spelled until a !T is sent to reinvoke text to speech translation.
page
6

DEVICE
CONTROL
CODES
Software
Handshaking
If
standard
parallel
or
RS-232C
serial
connections
are
used
the
sending
computer
hardware
will
detect
and
examine
the
RTS
signal
and
determine
whether
the
MicroVox
is
ready
to
receive
a
character
or,
if
busy,
take
appropriate
action.
However,
many
popular
brands
of
microcomputers
lack
the
hardware
to
detect
RS-232C
handshaking
signals
and
these
handshaking
signals
do
not
pass
through
modems
back
to
mainframe
computers.
In
the
MicroVox,
special
software
handshaking
signals,
described
below,
are
provided
for
these
purposes
(in
general,
hardware
handshaking
is
preferable
whenever
it
is
possible
to
use
it,
because
it
relieves
the
host
computer's
processor
of
the
handshaking
chores
and
allows
use
of
higher
data
rates).
For software handshaking, switch position 3 on DIP switch SW1 is set in the closed position (open is hardware handshaking). The following option is provided:
!H(busy character)(ready character)
Example:
!H@#
In the example shown, the MicroVox will send the character "@" to the computer when it is unable to receive more data, and will send "#" to the computer when it is again ready to receive data.
It
is
the
responsibility
of
the
computer
programmer
to
write
the
software
necessary
for
the
use
of
these
options.
NOTE:
While
in
the
example
above
the
handshaking
characters
are
'@'
and
'#',
the
default
mode
of
the
MicroVox
uses
the
characters
'R'
and
'B'
instead.
Use
the
above
described
method
to
set
any
other
pair
of
handshaking
characters.
Finally,
it
is
possible
to
use
the
MicroVox
with
no
handshaking
by
simply
invoking
the
software
handshaking
mode
and
ignoring
the
handshaking
transmissions.
In
this
case,
it
is
the
user's
responsibility
to
insert
timing
delays
in
the
program
so
that
data
will
not
be
sent
to
the
MicroVox
faster
than
it
can
handle
the
data.
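The host-side software that this section leaves to the programmer might look roughly like the sketch below. It assumes the "@" and "#" pair from the !H@# example has been selected, and that the host has some way to write one byte and to poll for a received byte. `send_with_handshake`, `write`, `poll` and the `FakeMicroVox` stand-in are all hypothetical names used only for illustration; no real serial library is implied.

```python
def send_with_handshake(text, write, poll, busy=b"@", ready=b"#"):
    """Send text one byte at a time, holding off while the device is busy.

    write(data) transmits bytes; poll() returns one received byte, or b""
    when nothing is waiting. Both are supplied by the caller.
    """
    paused = False
    for ch in text.encode("ascii"):
        while True:
            reply = poll()
            if reply == busy:
                paused = True          # device asked us to stop
            elif reply == ready:
                paused = False         # device can accept data again
            elif reply == b"" and not paused:
                break                  # nothing pending, safe to send
            # any other received byte is simply discarded in this sketch
        write(bytes([ch]))


class FakeMicroVox:
    """Stand-in for the hardware: reports busy, then ready, every few bytes."""

    def __init__(self, limit=4):
        self.received = bytearray()
        self.outbox = []
        self.limit = limit

    def write(self, data):
        self.received += data
        if len(self.received) % self.limit == 0:
            self.outbox += [b"@", b"#"]

    def poll(self):
        return self.outbox.pop(0) if self.outbox else b""


device = FakeMicroVox()
send_with_handshake("!H@#HELLO WORLD\r", device.write, device.poll)
print(bytes(device.received))
```

The loop busy-waits while paused, which is adequate for a demonstration; a real program would probably sleep briefly or use a timeout instead.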
Speech,
Spelled
Speech,
Phoneme
Code,
and
Music
Modes
The
MicroVox
can
operate
in
four
different
modes:
text
to
speech,
text
to
spelled
speech,
phoneme
codes,
and
music.
When the MicroVox is turned on it is in text to speech mode; however, the user can select among the following options:
!T (text to speech)
!E (spelled speech, say each letter)
!C (phoneme codes)
!N (musical notes)
Page
7

NOTE: The default mode is !T. To exit any mode you must enter another. For example, if you are in the !E mode, to return to text to speech you must type !T. Also, changing between modes frequently resets selected options to the default mode.
Text
to
Speech
The
software
used
in
the
text
to
speech
algorithm
incorporated
in
the
MicroVox
is
derived
from
an
algorithm
conceived
by
the
Naval
Research
Laboratory.
This
algorithm
combines
word,
morph
and
letter
rules
in
a
single
table
of
about
400
rules.
This
table
contains
subtables
for
each
letter
of
the
alphabet
and
achieves
very
intelligible
speech.
In
the
text
to
speech
mode
(!T),
this
algorithm
attempts
the
correct
pronunciation
of
any
phrase
sent
to
it.
However,
no
program
of
reasonable
size
can
possibly
contain
all
the
rules
and
exceptions
for
the
pronunciation
of
English.
Moreover,
since
the
MicroVox
lacks
extra-sensory
perception,
it
cannot
tell
for
instance,
when
the
user
sends
"READ"
if
the
present
or
the
past
tense
is
meant.
The
solution
when
a
word
is
not
pronounced
to
the
user's
satisfaction
is
to
alter
the
spelling.
By
typing
RED
or
REED
instead
of
READ,
the
user
can
be
sure
to
get
the
desired
pronunciation.
If
HICCOUGH
is
pronounced
strangely,
try
HICCUP.
Often
it
helps
to
break
a
word
into
syllables.
Compare
the
pronunciation
of
TYPEWRITER
and
TYPE
WRITE
ER.
Foreign
words
will
require
considerable
ingenuity,
since
the
MicroVox
works
on
the
principles
of
English
pronunciation.
Compare
PARLEZ
VOUS
and
PARLAY
VOO.
Spelled
Speech
The
spelled
speech
mode
is
useful
for
abbreviations
and
words
that
a
user
might
have
difficulty
in
understanding.
When
this
option
is
selected,
every
letter
is
pronounced
separately.
(By selecting the !A punctuation mode, punctuation will also be pronounced).
Example:
!T THE WORD AWFUL IS SPELLED !E AWFUL !T
In this example, the MicroVox will say "THE WORD AWFUL IS SPELLED", and then spell out A W F U L. The !T at the end returns the MicroVox to the text to speech mode.
Phoneme
Mode
The
MicroVox
may
also
be
programmed
directly
in
phoneme
codes.
A
space
must
be
left
between
the
mnemonic
codes.
For example:
!C AE N D PA0 THV UH2 PA0 S E PA0 I Z PA0 B O1 AY I3 L I NG PA0 H AH T PA1
will say "and the sea is boiling hot".
Page
8

The
intonation
I
or
F
modes
can
be
either
on
or
off
when
phoneme
codes
are
used.
If
the
intonation
is
off,
the
rate
which
is
output
will
be
the
base
rate.
If
it
is
on,
intonation
will
be
like
that
for
text.
If
there
are
errors
in
the
codes,
the
erroneous
codes
will
be
spoken
as
if
they
were
text.
Music
Mode
Music
mode
can
be
turned
on
by
!N.
In
music
mode,
the
following
notation
is
used.
There
are
7
octaves
centered
about
middle
C,
indicated
by
numbers
from
1
to
7.
Notes
are
A,
B,
C,
D,
E,
F,
G.
A
sharp
is
indicated
by
"+",
flat
by
The
length
of
a
note
may
be
from
1
to
256
times
an
internal
time
constant.
Rests
are
indicated
by
R.
For
instance
3F+26
means
third
octave,
F
sharp,
26
time
constants
long.
R16
means
a
sixteen
time
constant
rest.
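The notation can be made concrete with a small parser. This is a sketch only: it handles naturals and sharps, since the flat marker is not legible in the text above, and it assumes every token carries an explicit length and that rests obey the same 1 to 256 range as notes. `NOTE_RE` and `parse_note` are hypothetical names.

```python
import re

# Either a rest (R + length) or octave digit, note letter, optional "+", length.
NOTE_RE = re.compile(r"^(?:R(\d+)|([1-7])([A-G])(\+?)(\d+))$")


def parse_note(token):
    """Parse one music mode token such as '3F+26' or 'R16' into a dict."""
    match = NOTE_RE.match(token)
    if not match:
        raise ValueError(f"not a recognised note: {token!r}")
    if match.group(1) is not None:
        length = int(match.group(1))
        note = {"rest": True, "length": length}
    else:
        length = int(match.group(5))
        note = {
            "rest": False,
            "octave": int(match.group(2)),
            "name": match.group(3),
            "sharp": match.group(4) == "+",
            "length": length,
        }
    if not 1 <= length <= 256:
        raise ValueError("length must be 1..256 time constants")
    return note


print(parse_note("3F+26"))
print(parse_note("R16"))
```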
The
music
mode
suspends
the
MicroVox
operating
system
and
no
serial
or
parallel
data
can
be
received
during
music
output.
Also,
entering
music
mode
will
reset
most
previously
set
control
codes.
Text
Synchronization
For many applications it is important to synchronize speech with external events such as text or actions appearing on the screen.
For
instance,
an
instructional
program
may
require
placing
a
picture
on
the
screen
when
certain
speech
output
begins
and
a
question
on
the
screen
when
it
ends.
For synchronization, the following option is provided:
!K(synchronization character)
Example:
!K#JOHN!K%MARSHA!K$
In the example shown, the MicroVox will send a "#" back to the computer just before starting to say "JOHN"; it will send a "%" to the computer just after saying "JOHN" and just before starting to say "MARSHA"; and it will send a "$" character to the computer just after saying "MARSHA".
Example:
LOOK AT THE SCREEN NOW !K#
In
this
example,
a
"#"
will
be
transmitted
to
the
host
computer
after
saying
"LOOK
AT
THE
SCREEN
NOW".
None
of
these
special
synchronization
characters
will
be
spoken.
It
is
the
programmer's
responsibility
to
use
the
incoming
synchronization
characters
to
coordinate
the
screen
display
with
the
speech.
Page
9

Phrase
Termination
Many
aspects
of
English
pronunciation
are
controlled
by
the
context
in
which
a
given
letter
or
word
is
spoken.
For
this
reason,
the
MicroVox
will
await
a
complete
phrase
before
translating
from
text
to
speech.
If
the
user
does
not
specify
otherwise,
the
MicroVox
will
wait
to
translate
a
phrase
until
it
has
received
one
of
the
following
phrase
terminating
characters:
(1)
a
period
followed
by
two
spaces
or
a
carriage
return
(2)
a
comma,
semicolon,
colon,
exclamation
point,
or
question
mark
followed
by
a
space
or
carriage
return.
(3)
a
carriage
return
For
some
types
of
output,
such
as
computer
programs
or
poems,
the
user
will
want
each
line
read
as
a
separate
phrase.
For
others,
such
as
ordinary
English
text,
the
user
may
not
want
a
carriage
return
to
terminate
a
phrase.
The
user
is
given
the
following
options
to
deal
with
this
situation:
!L and !W
"!W"
means
"Whole
text
pronunciation".
If
this
option
is
selected,
a
carriage
return
will
not
terminate
a
phrase
unless
the
carriage
return
is
preceded
by
one
of
the
punctuation
marks
indicated
in
(1)
and
(2)
above.
"!L"
means
"Line-by-line
pronunciation".
If
this
option
is
selected,
a
carriage
return
will
always
be
treated
by
the
MicroVox
as
terminating
a
phrase.
When
the
MicroVox
is
first
turned
on
it
is
in
the
"L"
mode.
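The terminator rules can be modelled on the host side to predict where the MicroVox will break phrases. The sketch below is one reading of rules (1) to (3), not the firmware's actual logic: it treats a period as terminating when followed by either two spaces or a carriage return, and `split_phrases` and its `line_mode` flag (True for !L, False for !W) are hypothetical names.

```python
def split_phrases(text, line_mode=True):
    """Split text into completed phrases plus any unterminated remainder."""
    phrases, start, i = [], 0, 0
    while i < len(text):
        ch, nxt = text[i], text[i + 1:i + 3]
        end = None
        if ch == "." and nxt[:1] == "\r":
            end = i + 2                      # rule (1): period + return
        elif ch == "." and nxt == "  ":
            end = i + 3                      # rule (1): period + two spaces
        elif ch in ",;:!?" and nxt[:1] in (" ", "\r"):
            end = i + 2                      # rule (2)
        elif ch == "\r" and line_mode:
            end = i + 1                      # rule (3), !L mode only
        if end is None:
            i += 1
            continue
        phrase = text[start:end].replace("\r", " ").strip()
        if phrase:
            phrases.append(phrase)
        start = i = end
    pending = text[start:].replace("\r", " ").strip()
    return phrases, pending


sample = "HELLO, WORLD.\rNEXT LINE\rTAIL"
print(split_phrases(sample, line_mode=True))
print(split_phrases(sample, line_mode=False))
```

The second element of the result is text still waiting for a terminator, which in whole text mode includes lines that ended with a bare carriage return.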
Rather
than
send
a
special
signal
to
terminate
a
phrase,
the
user
may
wish
to
have
the
MicroVox
treat
a
phrase
as
terminated
if
a
certain
delay
occurs
without
any
phrase
terminator
being
received.
Possible
applications
of
this
option
include
situations
where
the
user
does
not
fully
control
the
output.
For
instance,
suppose
the
MicroVox
is
passively
connected
to
a
transmitting
device
which
doesn't
send
any
of
the
terminating
characters
listed
above
(maybe
it
sends
"STOP"
instead).
In
such
a
case,
there
is
no
way
to
insert
phrase
termination
characters
in
the
output
stream.
However,
if
the
MicroVox
is
set
to
treat
a
half
second
delay
without
receipt
of
information
as
the
end
of
a
phrase,
computer
output
will
not
be
lost
or
ignored.
The
user
is
given
the
following
option
to
provide
delayed
phrase
termination:
!D(delay number)
!D1 through !D8 result in a delay of 50 x 2^n milliseconds, where "n" is the number following "D"
(Note:
If
too
short
a
delay
is
used,
a
phrase
may
be
translated
in
pieces
resulting
in
odd
intonation
or
pronunciation,
since
the
MicroVox
uses
the
context
of
letters
and
words
to
determine
their
pronunciation.)
Page
10

!D9 is a special case. The MicroVox waits for a phrase terminating character even if it has to wait forever. !D9 is the default mode (at power up) and should be used with slow data sources such as hand typing on a terminal.
This
selectable
delay
feature
is
particularly
useful
for
the
handicapped.
It
allows
a
blind
programmer
to
use
a
standard
unintelligent
terminal.
This
is
facilitated
by
connecting
the
MicroVox
to
receive
the
output
from
both
the
user
and
the
computer.
Using
the
"!D"
command,
the
MicroVox
can
echo
all
communication
either
way.
If
the
delay
is
set
to
about
0.1
seconds,
keys
pressed
by
the
user
would
be
echoed
as
spelled
letters
because
the
slight
delay
between
them
will
be
treated
as
an
end
of
phrase
but,
output
generated
by
the
computer
will
be
spoken
as
complete
lines,
because
there
generally
will
be
no
significant
delay
between
characters.
The
delay
may
be
varied
to
fit
the
particular
application.
The MicroVox must be in the !F mode before entering the D mode. Also, once in the D mode, other control changes can only be received if the MicroVox is set to !D9 first (so that it can interpret the input rather than just echo the characters).
Intonation
Within
the
MicroVox,
a
special
intonation
algorithm
is
included.
However,
providing
realistic
intonation
is
much
more
difficult
than
choosing
the
correct
phonemes.
Most
intonation
patterns
are
not
represented
in
English
spelling.
Without
knowing
the
writer's
state
of
mind,
achieving
the
proper
intonation
may
require
grammatical
parsing
of
a
sentence.
The
algorithm
attempts
to
raise
the
pitch
on
stressed
syllables,
raising
it
at
the
start
of
sentences
and
before
commas,
lowering
the
pitch
before
the
period
at
the
end
of
a
sentence.
Before
a
question
mark,
the
pitch
is raised,
unless
the
sentence
begins
with
a
question
word
(who,
what,
when,
where,
etc.),
in
which
case
it
is
lowered.
The
pitch
at
which
individual
phonemes
are
pronounced
may
be
controlled
automatically
by
the
text
to
speech
algorithm,
be
kept
fixed,
or
be
altered
by
user
command.
Some
people
prefer
automatic
inflection,
because
of
the
variety
it
gives
to
the
speech,
even
though
the
inflection
is
often
not
accurate.
Others
think
a
computer
should
sound
like
a
computer
and
prefer
the
flat
speech
to
artificially
intoned
speech.
Still
others
may
wish
to
experiment
with
controlling
the
pitch
themselves
to
optimize
intelligibility.
This
control
can
extend
even
to
make
the
MicroVox
"sing".
The
hardware
in
the
MicroVox
allows
control
of
pitch
in
two
different
ways.
The
VOTRAX
SC-01A
synthesizer
chip
has
four
selectable
pitch
levels.
In
addition,
the
output
pitch
may
be
varied
by
selecting
one
of
sixteen
different
rates
for
the
clock
which
controls
the
synthesizer
chip.
When
the
MicroVox
is
first
turned
on,
the
synthesizer
chip
is
set
to
base
pitch
level
1
(low)
and
clock
rate
#5
(defined
below).
The
intonation
Page
11

is
generated
by
an
algorithm
which
selects
an
appropriate
clock
rate
for
each
phoneme.
To
turn
on
or
off
the
automatic
intonation
algorithm,
the
user
may
send
the
command:
!F (flat intonation, monotone)
and
the
output
rate
will
stay
at
the
default
base
and
clock
rate.
To
invoke
automatic
clock
rate
setting,
the
user
may
send
the
command:
!I (inflected intonation by algorithm)
The intonation algorithm adds or subtracts from the base rate to ultimately select the final voice pitch. Using the !I mode, however,
only
four
clock
rate
pitch
level
shifts
(out
of
16
possible)
are
used.
The
user
may
decide
not
to
implement
automatic
inflection
on
all
text
to
speech
translation
yet
desire
to
add
certain
pitch
changes
on
specific
words
or
phonemes.
This
can
be
easily
done
on
the
MicroVox
since
the
base
pitch
and
the
clock
rate
can
be
controlled
independently
and
changed
at
any
time.
The user options are:
!P1 (low pitch)
!P2 (medium low pitch)
!P3 (medium high pitch)
!P4 (high pitch)
The user may also control the clock rate:
!R1 (slowest rate, lowest level for the given base pitch)
!R2 (slightly faster)
!R3...!R16 (increasingly faster rates)
Example:
!P1 !R5 THIS IS A !R8 TEST
In
this
example,
"THIS
IS"
will
be
spoken
at
clock
rate
R5
and
"TEST"
will
be
spoken
at
R8.
(Note:
The
clock
rate
will
remain
at
R8
from
then
on
unless
changed).
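A host program can build strings like this from a list of (clock rate, words) pairs. The helper below is a minimal sketch, assuming the default "!" signal character is still in force; `with_rates` and its parameters are hypothetical names, not part of the MicroVox software.

```python
def with_rates(segments, pitch=None, flat=False):
    """Build a MicroVox string from (clock_rate, words) pairs.

    pitch selects an optional base pitch (1..4); flat=True prepends !F so
    that the automatic intonation algorithm does not add its own inflection.
    """
    parts = ["!F"] if flat else []
    if pitch is not None:
        if not 1 <= pitch <= 4:
            raise ValueError("base pitch must be 1..4")
        parts.append(f"!P{pitch}")
    for rate, words in segments:
        if not 1 <= rate <= 16:
            raise ValueError("clock rate must be 1..16")
        parts.append(f"!R{rate} {words}")
    return " ".join(parts)


print(with_rates([(5, "THIS IS A"), (8, "TEST")], pitch=1))
```

Because a clock rate stays in force until changed, a caller that wants only one word raised would add a final segment that restores the earlier rate.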
Example:
!F !P1 !R5 IS YOUR NUM !R8 BER !R4 FOUR FIVE !R9 NINE ?
In
this
example,
we
can
make
a
question
sound
more
like
a
question
by
adding
pitch
changes
at
important
points
in
the
sentence.
"IS
YOUR"
and
"NUM"
are
spoken
at
R5.
"BER"
is
raised
in
pitch
to
R8
and
then,
"FOUR
FIVE"
(you
could
also
use
45)
is
pronounced
at
a
lower
frequency
of
R4.
Finally,
"NINE"
is
raised
in
pitch
to
R9
to
end
the
sentence
in
a
questioning
tone.
The
question
mark
will
only
be
spoken
if
the
punctuation
modes
(!A or !M)
are
invoked.
Page
12

Note: When using the manual inflection mode, it is important to set flat inflection (!F) mode or the algorithm will try to add automatic inflection in addition to that manually selected.
Also,
pitch
and
clock
rates
may
be
changed
at
any
time
in
any
mode.
Punctuation
modes
There
are
three
modes
for
pronunciation
of
punctuation
in
the
MicroVox.
The user options are:
!A (all mode, all punctuation pronounced)
!M (most mode, all punctuation pronounced except return, linefeed, and space)
!S (some mode, only unusual punctuation pronounced)
When the MicroVox is turned on it is in "some" mode. In the !M mode spaces between words are treated as pauses and can be used to regulate the pace of speech or emphasize particular words.
The
MicroVox
recognizes
and
pronounces
all
ASCII
characters
with
codes
between
hex
20
and
hex
7F.
The operating system does not recognize control codes other than BACKSPACE (08), TAB (09), LINE FEED (0A), RETURN (0D), and ESCAPE (1B).
Receipt
of
other
control
codes
or
nulls,
can
have
unpredictable
results
since
the
MicroVox
uses
some
of
them
for
internal
coding.
Illegal
control
codes
should
be
avoided
in
the
text
sent
to
the
MicroVox.
On
Line
/
Off
Line
Mode
The
MicroVox
can
be
selectively
turned
on
and
off
line
(it
has
to
remain
powered,
however).
This
capability
allows
it
to
be
attached
in
parallel
with
another
peripheral
such
as
a
printer,
yet
not
speak
what
is
being
printed.
The control codes are:
!O (On Line - MicroVox is operational. It responds to all device codes and text input)
!Q (Quit - Off Line - MicroVox only responds to !O)
Default
Modes
When
the
MicroVox
is
powered
up
certain
default
modes
are
in
force.
They
are
equivalent
to
entering
the
following
commands:
Page
13

!O        on line
!P1 !R5   low base pitch, clock rate #5
!F        flat intonation
!T        text to speech mode
!S        some punctuation
!L        line by line pronunciation
!D9       wait for carriage return phrase terminator
(When
shipped
from
the
factory,
MicroVox
is
set
for
300
bps,
8
bit
words,
no
parity,
2
stop
bits,
and
software
handshaking)
At
any
time
these
defaults
are
to
be
changed,
simply
send
the
control
code
to
the
MicroVox.
The
codes
can
be
transmitted
separately
or
imbedded
in
text.
For
example,
entering
THIS
IS
A
TEST,
and
a
carriage
return
will
result
in
that
phrase
being
spoken
with
no
intonation.
To
add
automatic
intonation
the
sentence
becomes
(all
sentences
are
presumed
to
end
with
a
carriage
return):
!I THIS IS A TEST
From this point on all spoken text will have automatic inflection unless flat intonation is resumed with !F.
As
previously
mentioned,
intonation
can
be
added
selectively
or
by
the
automatic
algorithm.
You
can
say
the
following
sentence
four
ways:
1. text to speech, no added inflection
!T !F PLEASE ENTER YOUR ACCESS NUMBER
2. automatic inflection in text to speech mode
!T !I PLEASE ENTER YOUR ACCESS NUMBER
3. selected inflection in text to speech mode
!T !F !P1 !R5 PLEASE !R8 EN !R5 TER !R7 YOR !R5 ACCESS NUMBER
4. phoneme input mode with selected intonation
!F !C !P1!R5 P L E1 Y Z PA1 PA1 PA1 PA1 !R9 EH1 EH3 N !R5 T ER PA1 Y !R8 O2 O2 O2 !R5 R PA1 !R7 AE1 !R5 K S EH1 EH3 S PA1 N UH1 M B ER
These
examples
demonstrate
various
ways
in
which
the
user
can
increase
intelligibility
of
the
synthesized
speech.
The
MicroVox
is
completely
programmable,
you
can
combine
text
to
speech
with
either
selective
or
automatic
intonation
or
optimize
pronunciation
by
choosing
exactly
the
pitches
and
phonemes
you
wish.
An
exaggerated
example
of
combined
pitch
and
phoneme
control
can
actually
allow
MicroVox
to
sing
as
demonstrated
in
a
bar
of
"happy
birthday"
and
a
musical
scale.
Page
14

"Happy Birthday"
!C !P3 !R3 H H H AE1 AE1 AE1 AE1 AE1 AE1 P P !P2!R5 Y Y Y
!P3!R5 B ER ER ER ER R TH TH TH TH !R1 D A1 A1 A1 A1 I3
!R9 T IU IU IU IU U1 U1 U1 U1 U1
!R7 Y1 IU IU IU U1 U1 U1 U1 U1 U1

!C !P1 !R1 D D E1 E1 Y Y Y
!P1 !R5 E1 E1 E1 Y Y Y
!P1 !R11 EH1 EH1 EH1 EH2 F F P
!P2 !R5 D J J E1 E1 Y Y Y
!P2 !R11 A1 A1 A1 A1 A1 Y
!P2 !R14 B B E1 E1 Y Y Y
!P3 !R11 S S E1 E1 Y Y Y
!P3 !R15 D D E1 E1 Y Y Y
Summary Table of Device Codes
!O, !Q - On line and Off line
!K - synchronize speech and text
!L - line by line pronunciation
!W - whole text pronunciation
!E - each letter pronunciation
!C - pronounce by direct phoneme input
!N - produce musical notes
!T - pronounce by text-to-speech algorithm
!A, !M, or !S - speak all, most, or some punctuation
!F - set monotone or flat intonation
!I - set automatic inflected intonation
!P and !R - set intonation base pitch and clock rate
!D1-!D8 and !D9 - set phrase terminator delay
SETTING
THE
SERIAL
PORT
DTE/DCE
Setting
Behind
J1
(the
DB-25
serial
connector)
on
the
PC
board
is
a
2
by
3
header
and
two
jumpers.
These
jumpers
set
whether
pins
2
and
3
are
transmit
data
and
receive
data
respectively
or
vice
versa.
As
received
from
the
factory,
the
jumpers
are
in
the
DCE
position
and
pin
2
is
RD
and
pin
3
is
TD.
To
reverse
these
designations,
place
the
jumpers
in
the
DTE
positions.
Data
Rate
SW2
is
the
data
rate
(sometimes
called
BAUD
rate)
selection
switch.
The
data
rates
are
listed
along
side
SW2.
SW2
can
be
either
a
2
by
8
or
9
position
Berg
type
pin
connector
or
a
16
pin
DIP
switch.
If
a
Berg
connector
is
installed,
a
jumper
is
provided
to
select
the
desired
data
rate.
Simply
place
it
across
the
pair
of
terminals
next
to
the
desired
data
rate.
Page
15

If
SW2
is
a
DIP
switch,
close
the
switch
position
next
to
the
desired
data
rate.
Only
that
one
position
should
be
closed
and
the
other
seven
positions
should
be
in
the
open
position.
For
75
bits
per
second,
it
will
be
necessary
to
attach
a
physical
jumper
across
JP1.
All
positions
on
SW2
should
be
left
open.
Handshaking
For
software
handshaking,
switch
position
3
on
dip
switch
SW1
is
set
in
the
closed
position.
For
hardware
handshaking,
switch
position
3
is
left
open.
If
standard
EIA
RS-232C
serial
connections
are
used,
the
sending
computer
hardware
will
detect
and
examine
the
RTS
signal
and
determine
whether
the
MicroVox
is
ready
to
receive
a
character
or,
if
busy,
take
appropriate
action.
With software handshaking, the MicroVox will send the character "R" to the computer when it is unable to receive more data, and will send "B" to the computer when it is again ready to receive data.
It
is
the
responsibility
of
the
computer
programmer
to
write
the
software
necessary
for
the
use
of
these
options.
Finally,
it
is
possible
to
use
the
MicroVox
with
no
handshaking
by
simply
invoking
the
software
handshaking
mode
and
ignoring
the
handshaking
transmissions.
In
this
case,
it
is
the
user's
responsibility
to
insert
timing
delays
in
the
program
so
that
data
will
not
be
sent
to
the
MicroVox
faster
than
it
can
handle
the
data.
Word
Length,
Parity
and
Stop
Bits
Three
switch
positions
on
SW1
set
the
transmission
protocol.
The
following
is
a
list
of
the
eight
possibilities
and
their
functions:
Function           Position 6   Position 7   Position 8
7 bits, EP, 2SB    closed       closed       closed
7 bits, OP, 2SB    closed       open         closed
7 bits, EP, 1SB    closed       closed       open
7 bits, OP, 1SB    closed       open         open
8 bits, 2SB        open         closed       closed
8 bits, 1SB        open         open         closed
8 bits, EP, 1SB    open         closed       open
8 bits, OP, 1SB    open         open         open

EP = Even Parity    OP = Odd Parity    SB = Stop Bit(s)
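For setup scripts or documentation tools, the switch table can be held as a lookup. The sketch below reads the second row as 7 bits, odd parity, 2 stop bits, and treats the two 8 bit rows with no parity label as "no parity"; `SW1_PROTOCOL` and `switches_for` are hypothetical names.

```python
C, O = "closed", "open"

# (position 6, position 7, position 8) -> (word length, parity, stop bits)
SW1_PROTOCOL = {
    (C, C, C): (7, "even", 2),
    (C, O, C): (7, "odd", 2),
    (C, C, O): (7, "even", 1),
    (C, O, O): (7, "odd", 1),
    (O, C, C): (8, "none", 2),
    (O, O, C): (8, "none", 1),
    (O, C, O): (8, "even", 1),
    (O, O, O): (8, "odd", 1),
}


def switches_for(bits, parity, stop_bits):
    """Return the (pos 6, pos 7, pos 8) settings for a desired protocol."""
    for positions, protocol in SW1_PROTOCOL.items():
        if protocol == (bits, parity, stop_bits):
            return positions
    raise ValueError("combination not available on SW1")


# Factory setting quoted in the manual: 8 bit words, no parity, 2 stop bits.
print(switches_for(8, "none", 2))
```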
Page
16