Micromint Microvox User manual

USERS MANUAL

THE MICROMINT INC.
561 Willow Avenue, Cedarhurst, N.Y. 11516

TABLE OF CONTENTS

Introduction
MicroVox Hardware
Speech Synthesizer and Inflection Circuitry
Device Control Codes
Setting the Serial Port
Parallel Input Port
Phoneme Codes
J3 - Connector Pinout
J1 - Connector Pinout
Phonetic Word List
Parts List
Layout
Schematics
Warranty

THE MICROVOX TEXT TO SPEECH SYNTHESIZER

Copyright 1982 The Micromint Inc.

The MicroVox is a stand-alone intelligent peripheral that converts ASCII text to spoken English. The MicroVox is attached to the computer (or terminal, modem, etc.) via either a serial or parallel cable. Its operation is similar to that of a printer except that the output is speech rather than the printed word. The MicroVox has many programmable options which produce its high level of intelligibility. It has the following features:

* Phoneme based speech synthesizer
* 6502 Microprocessor
* 64 crystal controlled inflection levels
* 700 character buffer (optional 2.7K)
* 6K byte text-to-phoneme algorithm
* Full ASCII character set recognition and echo
* Adjustable baud rates (75-9600)
* EIA RS232C and parallel input interfaces
* Phoneme access modes
* Serial X-on/X-off handshaking
* One watt amplifier and volume control
* Onboard power supply
* Music and sound effects capability

Basically, the MicroVox Speech Synthesizer consists of a 6502 based microcomputer with a voice synthesizer output port. It has a 6502 microprocessor, a crystal controlled 75-9600 bps full handshaking serial interface, a parallel input port, 2K bytes of RAM, 8K bytes of EPROM, and an onboard power supply. The EPROM contains the operating system and text to speech algorithm.

Special control signals are sent from the host computer to select among many different user options. In general, these control signals are in the form:

!(letter)(option)

The exclamation point is a signal to the MicroVox that a control code follows. Options can be changed at any time by sending the appropriate codes preceding or embedded within the text.

What is a Text to Speech Synthesizer?

With the majority of speech synthesizer interfaces, text to speech, or the actual conversion from ASCII characters to phonemes, LPC formants, word codes, etc., is left to the user. Such a conversion routine will be more or less elaborate, depending upon the required vocabulary. For short vocabularies, the conversion program might consist merely of a table of words and their appropriate synthesizer codes. When the required vocabulary becomes very long, or in fact unlimited, tables become cumbersome and a text to speech algorithm is required instead.


A text to speech algorithm is a program which takes ASCII data and performs a synthesis-by-rule analysis of character strings. It determines which characters are voiced and which are silent by following a set of general rules for pronouncing English (text to speech algorithms can be written for other languages as well). Text to speech algorithms vary in length depending upon exactness of voice translation. Typical algorithms are in the 4K to 8K byte range, but some of the more sophisticated programs are up to 80K bytes.

The primary difference between a 6K and a 20K algorithm is more often the spelling of input text rather than any specific sound quality differences (an 80K algorithm can often be half look-up tables for exceptions to the rules). For exact pronunciation it might be necessary to spell words differently to more easily fit the prescribed rules on the smaller algorithm, such as entering "com pu ter" instead of computer. The only other limitations are features such as pronunciation of punctuation or inflected speech. Both of these capabilities are supported in the MicroVox.

The MicroVox text to speech synthesizer is a smart peripheral. It speaks only those ASCII strings which are directed to it through either its serial or parallel input ports. The ASCII text can result from PRINT statements in BASIC or the contents of complete disk files. MicroVox connects to the computer in the same manner as a printer or modem, and virtually anything that can be printed or viewed on the CRT can be vocalized.

The MicroVox is a combination of two major elements: a 6502 based microcomputer and a Votrax SC-01 speech synthesizer chip. The SC-01 is a CMOS (complementary metal oxide semiconductor) chip which consists of a digital code translator and an electronic model of the vocal tract. Internally, there is a phoneme controller which translates a 6 bit phoneme and 2 bit pitch code into a matrix of spectral parameters which adjusts the vocal tract model to synthesize speech. The output pitch of the phonemes is controlled by the frequency of the clock signal. The clock frequency is nominally 720 KHz, but subtle variations of pitch can be induced to add inflection. This prevents the synthesized voice from sounding too monotonous or "robotlike".

Listed in Table 1 are the 64 phonemes defined for the English language (three produce no sounds). The phoneme sound is generated when a 6 bit phoneme code is transmitted to the SC-01. Each phoneme is internally timed and has a duration of 47-250 msec depending on the particular phoneme selected and the clock frequency. The computer operating system sends these codes to the synthesizer chip through a latched parallel output port and monitors the synthesizer's activities (the A/R line) through an interrupt line.


The MicroVox Hardware

As previously mentioned, the MicroVox is a stand-alone microcomputer configured to function as an intelligent peripheral. Figure 1 is a basic block diagram of MicroVox. It can be viewed as a general purpose 6502 based computer with a speech synthesizer attached as a memory mapped I/O port. MicroVox is best explained by dividing the circuitry into four functional subsections: processor and timing, memory, serial and parallel I/O, and speech synthesizer. Figure 2 is the complete MicroVox schematic.

Processor and Data Rate Clock

The processor is a 1 MHz 6502. The processor and data rate clocks are derived by dividing down a 4.9152 MHz crystal through IC6. Using a 4.9152 MHz crystal (base is 75 times 2 to the 16th) and a 12 stage CD4040 binary divider (IC6), 9 rates are derived directly: 75 bps, 150 bps, 300 bps, 600 bps, 1200 bps, 2400 bps, 4800 bps, 9600 bps, and 19200 bps (while the hardware can produce 19200 bps, it is not supported by the operating system). The MicroVox will not communicate at 110 bps. See "Setting the serial port". The 6502 processor operates at a clock frequency of 611 KHz.
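
The divider arithmetic above can be checked with a short sketch. It assumes that the 6850 ACIA runs in its divide-by-16 clock mode, which is common but is not stated in this manual; the function names are illustrative only.

```python
CRYSTAL_HZ = 75 * 2**16     # 4,915,200 Hz, the crystal named in the text
DIVIDER_STAGES = 12         # the CD4040 has 12 binary stages
ACIA_CLOCK_DIVIDE = 16      # assumption: ACIA set to divide-by-16 mode


def divider_taps(crystal_hz=CRYSTAL_HZ, stages=DIVIDER_STAGES):
    """Frequency at each divider output: crystal/2, crystal/4, ... crystal/2**stages."""
    return [crystal_hz // 2**k for k in range(1, stages + 1)]


def data_rates(max_bps=19200):
    """Bit rates obtainable from the divider taps, lowest first, up to max_bps."""
    rates = [tap // ACIA_CLOCK_DIVIDE for tap in divider_taps()]
    return sorted(rate for rate in rates if rate <= max_bps)


if __name__ == "__main__":
    print(data_rates())
```

Under that assumption the sketch yields the nine rates listed above, 75 through 19200 bps. A rate such as 110 bps cannot appear because it is not a power-of-two fraction of the crystal frequency.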

Memory Section

ICs 2-5 and 9 form the address decoding and memory section of the MicroVox. IC 9 decodes the 5 most significant address bits to create 8 strobes. They are defined as follows:

Name   Hex Address   Connection and Function
SEL0   0000          IC2   Memory Block (RAM)
SEL1   0800          IC3   Memory Block (RAM)
SEL2   1000          IC10  Serial Port
SEL3   1800          IC11  Parallel Ports
SEL4   8000          IC14  Inflection Clock Rate
SEL5   A000          IC14  Phoneme Latch
SEL6   C000          IC5   Memory Block (EPROM)
SEL7   E000          IC4   Memory Block (EPROM)
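
As a rough illustration of the strobe table, the following sketch maps an address to the strobe that would respond. The block sizes (2K for the lower four strobes, 8K for the upper four) are an assumption inferred from the spacing of the base addresses; the manual does not give them, and the function name is hypothetical.

```python
# (name, base address, assumed block size, function) from the strobe table.
# The sizes are inferred from the spacing of the base addresses, not stated
# in the manual.
STROBES = [
    ("SEL0", 0x0000, 0x0800, "IC2 RAM"),
    ("SEL1", 0x0800, 0x0800, "IC3 RAM"),
    ("SEL2", 0x1000, 0x0800, "IC10 serial port"),
    ("SEL3", 0x1800, 0x0800, "IC11 parallel ports"),
    ("SEL4", 0x8000, 0x2000, "inflection clock rate"),
    ("SEL5", 0xA000, 0x2000, "phoneme latch"),
    ("SEL6", 0xC000, 0x2000, "IC5 EPROM"),
    ("SEL7", 0xE000, 0x2000, "IC4 EPROM"),
]


def strobe_for(address):
    """Return the strobe name covering a 16-bit address, or None if unmapped."""
    for name, base, size, _function in STROBES:
        if base <= address < base + size:
            return name
    return None


if __name__ == "__main__":
    for addr in (0x0100, 0x1000, 0xA000, 0xFFFC, 0x4000):
        print(f"{addr:04X} -> {strobe_for(addr)}")
```

With these assumed sizes, the 6502 reset vector at FFFC falls in the SEL7 EPROM block, which is consistent with the operating system residing in EPROM.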

In the MicroVox configuration, ICs 2 and 3 are intended to be RAM while ICs 4 and 5 are EPROM or ROM. The designations for ICs 2 and 3 are for 2K by 8 RAM chips such as the Hitachi 6116 or Toshiba 2016 (these devices are 2716 compatible, so you could also use 2716 EPROMs in these sockets). This programmable memory is used for conversion tables, register stacks, and the ASCII input buffer (the MicroVox can receive data faster than it can speak it). The basic MicroVox uses only one RAM chip, which allows a 700 character input buffer. By adding the second RAM chip in IC3 (and changing a few EPROM constants), this buffer can be optionally expanded to 2.7K characters.


The text to speech algorithm is placed in EPROM/ROM positions ICs 4 and 5. Either 2716 (2K x 8), 2732 (4K x 8), or 2764 (8K x 8) devices can be used in these positions depending upon the jumper selections JP4 and JP5. The 8K byte MicroVox software will be either on two 2732 EPROMs, requiring both sockets, or on a single 8K 2764 (or ROM equivalent).

Serial and Parallel I/O

MicroVox, unlike most other voice synthesizers, has both serial and parallel input ports to receive ASCII characters. The serial port uses a 6850 asynchronous communications interface adapter (ACIA, IC10) which is software programmable. During initialization, the ACIA's functional configuration is preset. Considerations such as word length, clock division ratios, parity, stop bits, etc., are selected by properly setting bits in the ACIA's control register. The data rate is set by the system data rate clock (from SW2 and IC6), and data is sent and received through the Transmit and Receive data registers respectively. Information such as framing errors, parity errors, and buffer and handshaking status is determined by reading the ACIA status register.

On the MicroVox, the serial port can be used with or without hardware handshaking (CTS, DCD, RTS, etc.). This is especially useful when communicating over modems or terminals which have no handshaking signals. Instead, the MicroVox software incorporates software handshaking. When receiving ASCII text in the software handshaking mode, the MicroVox sends an "@" to the host computer when its input buffer is almost full (the host should stop sending data). It sends a "#" when it is ready to receive data again. Obviously, even this can be ignored if the data rate from the host computer never exceeds the speed at which the buffer is emptied.

The parallel input section uses an 8255 PIA (IC11) which is also programmable. As configured, 8 bits of it are used to receive parallel format ASCII data such as would be transmitted to a parallel printer. Using 2 additional pairs for the strobe and acknowledge handshaking, the MicroVox can attach to any conventional Centronics printer interface. (As configured, the 34-pin edge connector is exactly compatible with the Radio Shack TRS-80 line of computers and can connect directly to their 34-pin Centronics printer edge connector.)

DIP switch SW1 also attaches to IC11. Switch positions 6 thru 8 set serial word length, stop bits, and parity on the ACIA; switch section 3 selects hardware or software handshaking; sections 1, 2, 4, and 5 are not used.


Speech Synthesizer and Inflection Circuitry

Probably the most important section of the MicroVox is the actual speech synthesizer circuitry. The MicroVox allows 64 levels of pitch inflection. The output pitch of the phonemes is controlled by the frequency of the clock signal. The output pitch is a function of this clock input frequency and two pitch control lines, I1 and I2 (each acts independently). Four rather large variations in pitch (corresponding to !P1 thru !P4 in the operating system) can be achieved simply by using these manual inflection inputs. More subtle variations in output pitch are attained by externally controlling the synthesizer clock.

Using the 1.22 MHz system clock and a digital rate multiplier, a programmable clock can be created to produce smaller and more defined pitch inflection changes. On a SEL4 strobe, a four bit inflection code is latched into IC13 and applied to the rate multiplier. The four bit combination results in 16 clock rates from 614.4 KHz to 902.4 KHz in 19.2 KHz increments (corresponding to !R1 thru !R16 in the operating system). 20 KHz creates a relatively small pitch change by itself (out of a 720 KHz nominal input frequency) but, used dynamically in a sentence, it creates a definite improvement in intelligibility.

The pitch levels !P1 thru !P4 are the base pitch, and the 16 frequencies from the rate multiplier, !R1 thru !R16, are the clock rate. The combination of the two functions results in 64 pitch levels or inflections.
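
The clock-rate arithmetic can be illustrated with a short sketch. It uses only the figures quoted above (614.4 KHz lowest rate, 19.2 KHz step, 16 rates, 4 base pitches); the function names are made up for illustration and say nothing about how the firmware encodes the four bit value.

```python
LOWEST_CLOCK_HZ = 614_400   # rate 1, from the text
STEP_HZ = 19_200            # increment per rate step, from the text


def clock_hz(rate):
    """Synthesizer clock frequency in Hz for rate number 1..16."""
    if not 1 <= rate <= 16:
        raise ValueError("rate must be between 1 and 16")
    return LOWEST_CLOCK_HZ + (rate - 1) * STEP_HZ


def inflection_levels():
    """Every (base pitch, clock rate) pair: 4 pitches x 16 rates."""
    return [(pitch, rate) for pitch in range(1, 5) for rate in range(1, 17)]


if __name__ == "__main__":
    for rate in (1, 8, 16):
        print(rate, clock_hz(rate) / 1000, "KHz")
    print(len(inflection_levels()), "combinations")
```

The highest rate works out to 614.4 + 15 x 19.2 = 902.4 KHz, matching the range stated above, and the 4 x 16 pairs give the 64 inflection levels.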

The pitch at which individual phonemes are pronounced may be controlled automatically by the text to speech algorithm, kept fixed, or altered by user command. Some users prefer automatic inflection because of the variety it gives to the speech. Others think a computer should sound like a computer and prefer the flat speech to artificially intoned speech. Still others may wish to directly control the pitch to make the unit "sing" (pitch and rate codes may be mixed with phoneme codes to produce "singing") or pronounce words with special emphasis.

The user may control the base pitch setting independently of the clock rate. The user options are:

!P1 (low pitch)
!P2 (medium low pitch)
!P3 (medium high pitch)
!P4 (high pitch)

The user may also control the clock rate.

!R1 (slowest rate, lowest level for the given base pitch)
!R2 (slightly faster)
!R3...!R16 (increasingly faster rates)


The MicroVox has the ability to play musical notes and produce sound effects. This is accomplished by using a program routine to toggle one bit of IC11 at a predetermined rate. This lead, in addition to the output from the speech synthesizer chip (IC12), is connected to the output amplifier. The results are similar to the sound produced on the internal speaker in an APPLE II computer (it uses the same technique).

OPERATOR INTERACTION WITH THE TEXT TO SPEECH SOFTWARE

The MicroVox is a stand-alone intelligent peripheral that converts ASCII text to spoken English. The MicroVox is attached to the computer (or terminal, modem, etc.) via either a serial or parallel cable. Its operation is similar to that of a printer except that the output is speech rather than the printed word. The MicroVox has many programmable options which produce its high level of intelligibility.

These options are called device control signals and are transmitted to the MicroVox along with the text. Device control signals are sent from the host computer to select among many different user options. In general, MicroVox control signals are in the form:

!(letter)(option)(option)

for example:

!HXY

The exclamation point is a signal to the MicroVox that a control code follows. The user may, if he wishes, use any other character as the signal. This is done by giving the following instruction:

(old signal character)X(new signal character)

for example:

!X$

will change the control signal from an exclamation point to a dollar sign, and

$X*

will then change it from a dollar sign to an asterisk.
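
A host program might keep track of the current signal character so that later codes are built with the right prefix. The following is a minimal sketch of that bookkeeping; the class and method names are hypothetical and are not part of the MicroVox software.

```python
class ControlCodes:
    """Builds control strings and remembers the current signal character."""

    def __init__(self, signal="!"):
        self.signal = signal  # "!" is the signal character described in the text

    def code(self, letter, *options):
        """Return signal + letter + options, e.g. code("H", "@", "#") -> "!H@#"."""
        return self.signal + letter + "".join(str(opt) for opt in options)

    def change_signal(self, new_signal):
        """Return the (old)X(new) instruction and switch to the new signal."""
        command = self.signal + "X" + new_signal
        self.signal = new_signal
        return command


if __name__ == "__main__":
    codes = ControlCodes()
    print(codes.code("H", "X", "Y"))   # !HXY
    print(codes.change_signal("$"))    # !X$
    print(codes.change_signal("*"))    # $X*
    print(codes.code("E"))             # *E
```

Tracking the signal character on the host side avoids sending a code with a prefix the MicroVox no longer recognizes, which would presumably be spoken as ordinary text.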

Device control signals can be embedded anywhere in the text transmission and are not spoken. Once a device control signal has been sent to the MicroVox, all succeeding text entry will be subject to that default setting until it is changed. For example, if letter-by-letter pronunciation is invoked with !E, then all text will be spelled until a !T is sent to reinvoke text to speech translation.


DEVICE CONTROL CODES

Software Handshaking

If standard parallel or RS-232C serial connections are used, the sending computer hardware will detect and examine the RTS signal and determine whether the MicroVox is ready to receive a character or, if busy, take appropriate action. However, many popular brands of microcomputers lack the hardware to detect RS-232C handshaking signals, and these handshaking signals do not pass through modems back to mainframe computers. In the MicroVox, special software handshaking signals, described below, are provided for these purposes (in general, hardware handshaking is preferable whenever it is possible to use it, because it relieves the host computer's processor of the handshaking chores and allows use of higher data rates). For software handshaking, switch position 3 on DIP switch SW1 is set in the closed position (open is hardware handshaking).

The following option is provided:

!H(busy character)(ready character)

Example: !H@#

In the example shown, the MicroVox will send the character "@" to the computer when it is unable to receive more data, and will send "#" to the computer when it is again ready to receive data. It is the responsibility of the computer programmer to write the software necessary for the use of these options.
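
The host-side logic for software handshaking can be sketched as a small state tracker. This is only an illustration of the idea under the !H@# example; the class name and methods are hypothetical, and the actual serial I/O is left out.

```python
class HandshakeGate:
    """Tracks whether the host may send, based on characters from the MicroVox."""

    def __init__(self, busy="@", ready="#"):
        self.busy = busy
        self.ready = ready
        self.clear_to_send = True  # assume the MicroVox starts out ready

    def setup_command(self, signal="!"):
        """Control string that selects this busy/ready pair, e.g. "!H@#"."""
        return signal + "H" + self.busy + self.ready

    def on_receive(self, character):
        """Update the state from one received character; return clear_to_send."""
        if character == self.busy:
            self.clear_to_send = False
        elif character == self.ready:
            self.clear_to_send = True
        return self.clear_to_send


if __name__ == "__main__":
    gate = HandshakeGate()
    print(gate.setup_command())   # !H@#
    print(gate.on_receive("@"))   # False: stop sending
    print(gate.on_receive("#"))   # True: resume sending
```

A sending loop would check `clear_to_send` before transmitting each character and otherwise wait for further input from the MicroVox.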

NOTE: While in the example above the handshaking characters are '@' and '#', the default mode of the MicroVox uses the characters 'R' and 'B' instead. Use the method described above to set any other pair of handshaking characters.

Finally, it is possible to use the MicroVox with no handshaking by simply invoking the software handshaking mode and ignoring the handshaking transmissions. In this case, it is the user's responsibility to insert timing delays in the program so that data will not be sent to the MicroVox faster than it can handle the data.

Speech, Spelled Speech, Phoneme Code, and Music Modes

The MicroVox can operate in four different modes: text to speech, text to spelled speech, phoneme codes, and music. When the MicroVox is turned on it is in text to speech mode; however, the user can select among the following options:

!T (text to speech)
!E (spelled speech, say each letter)
!C (phoneme codes)
!N (musical notes)


NOTE: The default mode is !T. To exit any mode you must enter another. For example, if you are in the !E mode, to return to text to speech you must type !T. Also, changing between modes frequently resets selected options to the default mode.

Text to Speech

The software used in the text to speech algorithm incorporated in the MicroVox is derived from an algorithm conceived by the Naval Research Laboratory. This algorithm combines word, morph and letter rules in a single table of about 400 rules. This table contains subtables for each letter of the alphabet and achieves very intelligible speech.

In the text to speech mode (!T), this algorithm attempts the correct pronunciation of any phrase sent to it. However, no program of reasonable size can possibly contain all the rules and exceptions for the pronunciation of English. Moreover, since the MicroVox lacks extra-sensory perception, it cannot tell, for instance, when the user sends "READ", whether the present or the past tense is meant.

The solution when a word is not pronounced to the user's satisfaction is to alter the spelling. By typing RED or REED instead of READ, the user can be sure to get the desired pronunciation. If HICCOUGH is pronounced strangely, try HICCUP. Often it helps to break a word into syllables. Compare the pronunciation of TYPEWRITER and TYPE WRITE ER. Foreign words will require considerable ingenuity, since the MicroVox works on the principles of English pronunciation. Compare PARLEZ VOUS and PARLAY VOO.

Spelled Speech

The spelled speech mode is useful for abbreviations and words that a user might have difficulty in understanding. When this option is selected, every letter is pronounced separately. (By selecting the !A punctuation mode, punctuation will also be pronounced.)

Example: !T THE WORD AWFUL IS SPELLED !E AWFUL !T

In this example, the MicroVox will say "THE WORD AWFUL IS SPELLED", and then spell out A W F U L. The !T at the end returns the MicroVox to the text to speech mode.

Phoneme Mode

The MicroVox may also be programmed directly in phoneme codes. A space must be left between the mnemonic codes. For example:

!C AE N D PA0 THV UH2 PA0 S E PA0 I Z PA0 B O1 AY I3 L I NG PA0 H AH T PA1

will say "and the sea is boiling hot".
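
Because the mnemonics must be separated by spaces, a host program may find it convenient to build the string from a list. The sketch below does only that; the helper name is hypothetical, and it does not check the mnemonics against the phoneme table.

```python
def phoneme_command(mnemonics, signal="!"):
    """Join phoneme mnemonics with single spaces after the phoneme-mode code."""
    for mnemonic in mnemonics:
        # A mnemonic containing a space would be read as two separate codes.
        if not mnemonic or " " in mnemonic:
            raise ValueError(f"bad mnemonic: {mnemonic!r}")
    return signal + "C " + " ".join(mnemonics)


if __name__ == "__main__":
    # The first word of the example above: "and", followed by a pause.
    print(phoneme_command(["AE", "N", "D", "PA0"]))
```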


The intonation !I or !F modes can be either on or off when phoneme codes are used. If the intonation is off, the rate which is output will be the base rate. If it is on, intonation will be like that for text. If there are errors in the codes, the erroneous codes will be spoken as if they were text.

Music Mode

Music mode can be turned on by !N. In music mode, the following notation is used. There are 7 octaves centered about middle C, indicated by numbers from 1 to 7. Notes are A, B, C, D, E, F, G.

A sharp is indicated by "+", flat by "-".

The length of a note may be from 1 to 256 times an internal time constant. Rests are indicated by R. For instance, 3F+26 means third octave, F sharp, 26 time constants long. R16 means a sixteen time constant rest.

The music mode suspends the MicroVox operating system and no serial or parallel data can be received during music output. Also, entering music mode will reset most previously set control codes.
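
The note notation described for music mode can be generated with a small helper, sketched here. The function names are hypothetical, the flat marker "-" is an assumption (only "+" for sharp is certain from the text), and how successive notes are separated in the data stream is not covered.

```python
NOTE_LETTERS = "ABCDEFG"
SHARP = "+"
FLAT = "-"   # assumption: the flat character is not legible in the source


def note(octave, letter, length, accidental=""):
    """Format one note, e.g. note(3, "F", 26, "+") -> "3F+26"."""
    if not 1 <= octave <= 7:
        raise ValueError("octave must be 1..7")
    if len(letter) != 1 or letter not in NOTE_LETTERS:
        raise ValueError("note must be one of A..G")
    if accidental not in ("", SHARP, FLAT):
        raise ValueError("accidental must be '', '+' or '-'")
    if not 1 <= length <= 256:
        raise ValueError("length must be 1..256 time constants")
    return f"{octave}{letter}{accidental}{length}"


def rest(length):
    """Format a rest, e.g. rest(16) -> "R16"."""
    if not 1 <= length <= 256:
        raise ValueError("length must be 1..256 time constants")
    return f"R{length}"


if __name__ == "__main__":
    print(note(3, "F", 26, SHARP), rest(16))
```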

Text Synchronization

For many applications it is important to synchronize speech with external events such as text or actions appearing on the screen. For instance, an instructional program may require placing a picture on the screen when certain speech output begins and a question on the screen when it ends. For synchronization, the following option is provided:

!K(synchronization character)

Example: !K#JOHN!K%MARSHA!K$

In the example shown, the MicroVox will send a "#" back to the computer just before starting to say "JOHN"; it will send a "%" to the computer just after saying "JOHN" and just before starting to say "MARSHA"; and it will send a "$" character to the computer just after saying "MARSHA".

Example: LOOK AT THE SCREEN NOW !K#

In this example, a "#" will be transmitted to the host computer after saying "LOOK AT THE SCREEN NOW".

None of these special synchronization characters will be spoken. It is the programmer's responsibility to use the incoming synchronization characters to coordinate the screen display with the speech.
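
Building a marked-up string such as the JOHN/MARSHA example can be sketched as follows. The helper names are hypothetical; the sketch only assembles the outgoing text and leaves reading the returned characters to the host program.

```python
def sync_mark(character, signal="!"):
    """Return the synchronization code for one character, e.g. "!K#"."""
    if len(character) != 1:
        raise ValueError("synchronization character must be a single character")
    return signal + "K" + character


def bracket_words(words, marks):
    """Interleave marks and words: mark, word, mark, word, ..., mark."""
    if len(marks) != len(words) + 1:
        raise ValueError("need exactly one more mark than words")
    pieces = [sync_mark(marks[0])]
    for word, mark in zip(words, marks[1:]):
        pieces.append(word)
        pieces.append(sync_mark(mark))
    return "".join(pieces)


if __name__ == "__main__":
    print(bracket_words(["JOHN", "MARSHA"], "#%$"))
```

The host would then watch its input for "#", "%", and "$" in turn and update the screen as each one arrives.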


Phrase Termination

Many aspects of English pronunciation are controlled by the context in which a given letter or word is spoken. For this reason, the MicroVox will await a complete phrase before translating from text to speech. If the user does not specify otherwise, the MicroVox will wait to translate a phrase until it has received one of the following phrase terminating characters:

(1) a period followed by two spaces or a carriage return
(2) a comma, semicolon, colon, exclamation point, or question mark followed by a space or carriage return
(3) a carriage return

For some types of output, such as computer programs or poems, the user will want each line read as a separate phrase. For others, such as ordinary English text, the user may not want a carriage return to terminate a phrase. The user is given the following options to deal with this situation:

!L and !W

"!W" means "Whole text pronunciation". If this option is selected, a carriage return will not terminate a phrase unless the carriage return is preceded by one of the punctuation marks indicated in (1) and (2) above.

"!L" means "Line-by-line pronunciation". If this option is selected, a carriage return will always be treated by the MicroVox as terminating a phrase. When the MicroVox is first turned on it is in the "L" mode.
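
The termination rules above can be restated as a predicate on the text received so far. This is a reading of the rules as written, not the firmware's actual logic; the function name and the `line_mode` flag (True for !L, False for !W) are hypothetical, and control-code handling is ignored.

```python
CR = "\r"


def phrase_complete(buffer, line_mode=True):
    """True if the received text ends with a phrase terminator.

    line_mode=True models !L (any carriage return terminates);
    line_mode=False models !W (carriage return needs preceding punctuation).
    """
    # Rule 1: a period followed by two spaces.
    if buffer.endswith(".  "):
        return True
    if len(buffer) >= 2:
        previous, last = buffer[-2], buffer[-1]
        # Rules 1 and 2: punctuation followed by a carriage return.
        if last == CR and previous in ".,;:!?":
            return True
        # Rule 2: these marks followed by a single space.
        if last == " " and previous in ",;:!?":
            return True
    # Rule 3: a bare carriage return, honored only in line-by-line mode.
    return line_mode and buffer.endswith(CR)


if __name__ == "__main__":
    print(phrase_complete("HELLO, "))
    print(phrase_complete("HELLO" + CR, line_mode=False))
```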

Rather than send a special signal to terminate a phrase, the user may wish to have the MicroVox treat a phrase as terminated if a certain delay occurs without any phrase terminator being received. Possible applications of this option include situations where the user does not fully control the output. For instance, suppose the MicroVox is passively connected to a transmitting device which doesn't send any of the terminating characters listed above (maybe it sends "STOP" instead). In such a case, there is no way to insert phrase termination characters in the output stream. However, if the MicroVox is set to treat a half second delay without receipt of information as the end of a phrase, computer output will not be lost or ignored. The user is given the following option to provide delayed phrase termination:

!D(delay number)

!D1 through !D8 result in a delay of 50 x 2^n milliseconds, where "n" is the number following "D". (Note: If too short a delay is used, a phrase may be translated in pieces, resulting in odd intonation or pronunciation, since the MicroVox uses the context of letters and words to determine their pronunciation.)
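
The delay settings can be tabulated with a short sketch. It reads the delay formula as 50 x 2^n milliseconds, which is an assumption about a superscript lost in reproduction; the function names are hypothetical.

```python
def delay_ms(n):
    """Phrase-termination delay in milliseconds for delay numbers 1..8."""
    if not 1 <= n <= 8:
        raise ValueError("delay number must be 1..8")
    return 50 * 2**n   # assumption: the formula is 50 times 2 to the power n


def delay_number_for(minimum_ms):
    """Smallest delay number whose delay is at least minimum_ms, or None."""
    for n in range(1, 9):
        if delay_ms(n) >= minimum_ms:
            return n
    return None


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"D{n}: {delay_ms(n)} ms")
```

Under this reading the shortest setting is 100 milliseconds and the longest is 12.8 seconds; a half second delay would fall between the third setting (400 ms) and the fourth (800 ms).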


!D9 is a special case. The MicroVox waits for a phrase terminating character even if it has to wait forever. !D9 is the default mode (at power up) and should be used with slow data sources such as hand typing on a terminal.

This selectable delay feature is particularly useful for the handicapped. It allows a blind programmer to use a standard unintelligent terminal. This is facilitated by connecting the MicroVox to receive the output from both the user and the computer. Using the "!D" command, the MicroVox can echo all communication either way. If the delay is set to about 0.1 seconds, keys pressed by the user would be echoed as spelled letters because the slight delay between them will be treated as an end of phrase, but output generated by the computer will be spoken as complete lines, because there generally will be no significant delay between characters. The delay may be varied to fit the particular application.

The MicroVox must be in the !F mode before entering the D mode. Also, once in the D mode, other control changes can only be received if the MicroVox is set to !D9 first (so that it can interpret the input rather than just echo the characters).

Intonation

Within the MicroVox, a special intonation algorithm is included. However, providing realistic intonation is much more difficult than choosing the correct phonemes. Most intonation patterns are not represented in English spelling. Without knowing the writer's state of mind, achieving the proper intonation may require grammatical parsing of a sentence. The algorithm attempts to raise the pitch on stressed syllables, raising it at the start of sentences and before commas, and lowering the pitch before the period at the end of a sentence. Before a question mark, the pitch is raised, unless the sentence begins with a question word (who, what, when, where, etc.), in which case it is lowered.

The pitch at which individual phonemes are pronounced may be controlled automatically by the text to speech algorithm, be kept fixed, or be altered by user command. Some people prefer automatic inflection because of the variety it gives to the speech, even though the inflection is often not accurate. Others think a computer should sound like a computer and prefer the flat speech to artificially intoned speech. Still others may wish to experiment with controlling the pitch themselves to optimize intelligibility. This control can extend even to making the MicroVox "sing".

The hardware in the MicroVox allows control of pitch in two different ways. The VOTRAX SC-01A synthesizer chip has four selectable pitch levels. In addition, the output pitch may be varied by selecting one of sixteen different rates for the clock which controls the synthesizer chip. When the MicroVox is first turned on, the synthesizer chip is set to base pitch level 1 (low) and clock rate #5 (defined below). The intonation is generated by an algorithm which selects an appropriate clock rate for each phoneme.

To turn the automatic intonation algorithm off, the user may send the command:

!F (flat intonation, monotone)

and the output rate will stay at the default base and clock rate. To invoke automatic clock rate setting, the user may send the command:

!I (inflected intonation by algorithm)

The intonation algorithm adds or subtracts from the base rate to ultimately select the final voice pitch. Using the !I mode, however, only four clock rate pitch level shifts (out of 16 possible) are used.

The user may decide not to implement automatic inflection on all text to speech translation yet desire to add certain pitch changes on specific words or phonemes. This can be easily done on the MicroVox since the base pitch and the clock rate can be controlled independently and changed at any time. The user options are:

!P1 (low pitch)
!P2 (medium low pitch)
!P3 (medium high pitch)
!P4 (high pitch)

The user may also control the clock rate:

!R1 (slowest rate, lowest level for the given base pitch)
!R2 (slightly faster)
!R3...!R16 (increasingly faster rates)

Example:

    !P1 !R5 THIS IS A !R8 TEST

In this example, "THIS IS A" will be spoken at clock rate R5 and "TEST" will be spoken at R8. (Note: The clock rate will remain at R8 from then on unless changed.)

Example:

    !F !P1 !R5 IS YOUR NUM !R8 BER !R4 FOUR FIVE !R9 NINE ?

In this example, we can make a question sound more like a question by adding pitch changes at important points in the sentence. "IS YOUR" and "NUM" are spoken at R5. "BER" is raised in pitch to R8 and then "FOUR FIVE" (you could also use 45) is pronounced at a lower frequency of R4. Finally, "NINE" is raised in pitch to R9 to end the sentence in a questioning tone. The question mark will only be spoken if the punctuation modes (!A or !M) are invoked.


Note: When using the manual inflection mode, it is important to set flat inflection (!F) mode or the algorithm will try to add automatic inflection in addition to that manually selected. Also, pitch and clock rates may be changed at any time in any mode.
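A host program can assemble manually inflected lines like the examples above rather than typing the codes by hand. The following is a minimal sketch, not part of the MicroVox firmware: the function name and argument layout are invented for illustration, the "!" prefix is assumed from the printed code listings, and separating codes from text with single spaces follows the printed examples rather than any stated requirement.

```python
def inflect(segments, base_pitch=1):
    """Build one line of text with manual clock-rate changes.

    segments   -- list of (clock_rate, text) pairs, clock_rate in 1..16
    base_pitch -- base pitch level, 1 (low) to 4 (high)

    The line starts with !F so the automatic algorithm does not add
    its own inflection on top of the manual settings.
    """
    if not 1 <= base_pitch <= 4:
        raise ValueError("base pitch must be 1..4")
    parts = ["!F", f"!P{base_pitch}"]
    for rate, text in segments:
        if not 1 <= rate <= 16:
            raise ValueError("clock rate must be 1..16")
        parts.append(f"!R{rate}")
        parts.append(text)
    return " ".join(parts)

line = inflect([(5, "IS YOUR NUM"), (8, "BER"), (4, "FOUR FIVE"), (9, "NINE ?")])
print(line)  # !F !P1 !R5 IS YOUR NUM !R8 BER !R4 FOUR FIVE !R9 NINE ?
```

Because the clock rate stays at its last value until changed, a program that mixes inflected and plain lines may want to end each line by restoring its preferred rate.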

Punctuation modes

There are three modes for pronunciation of punctuation in the MicroVox. The user options are:

    !A   (all mode - all punctuation pronounced)
    !M   (most mode - all punctuation pronounced except return, linefeed, and space)
    !S   (some mode - only unusual punctuation pronounced)

When the MicroVox is turned on it is in "some" mode.

In the !M mode, spaces between words are treated as pauses and can be used to regulate the pace of speech or emphasize particular words.

The MicroVox recognizes and pronounces all ASCII characters with codes between hex 20 and hex 7F. The operating system does not recognize control codes other than BACKSPACE (08), TAB (09), LINE FEED (0A), RETURN (0D), and ESCAPE (1B). Receipt of other control codes or nulls can have unpredictable results since the MicroVox uses some of them for internal coding. Illegal control codes should be avoided in the text sent to the MicroVox.
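Since stray control codes and nulls can have unpredictable results, a sending program may wish to filter its output first. The sketch below keeps only the byte values the text lists as acceptable; the function name is illustrative, and silently dropping the other bytes (rather than, say, replacing them with spaces) is a choice made here, not something the manual prescribes.

```python
# Control codes the text names as recognized: BS, TAB, LF, CR, ESC.
ALLOWED_CONTROLS = {0x08, 0x09, 0x0A, 0x0D, 0x1B}

def sanitize(data: bytes) -> bytes:
    """Drop every byte outside hex 20-7F that is not a recognized control code."""
    return bytes(
        b for b in data
        if 0x20 <= b <= 0x7F or b in ALLOWED_CONTROLS
    )

clean = sanitize(b"HI\x00\x07 THERE\r")
print(clean)  # b'HI THERE\r'
```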

On Line / Off Line Mode

The MicroVox can be selectively turned on and off line (it has to remain powered, however). This capability allows it to be attached in parallel with another peripheral such as a printer, yet not speak what is being printed. The control codes are:

    !O   (On Line - MicroVox is operational. It responds to all device codes and text input)
    !Q   (Quit - Off Line - MicroVox only responds to !O)

Default Modes

When the MicroVox is powered up certain default modes are in force. They are equivalent to entering the following commands:

    !O        on line
    !P1 !R5   low base pitch, clock rate #5
    !F        flat intonation
    !T        text to speech mode
    !S        some punctuation
    !L        line by line pronunciation
    !D9       wait for carriage return phrase terminator

(When shipped from the factory, MicroVox is set for 300 bps, 8 bit words, no parity, 2 stop bits, and software handshaking)
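A program that changes several modes may want a way back to the power-up state without cycling power. The sketch below joins the default codes listed above into one line; it assumes the "!" code prefix seen in the printed listings, and it assumes that sending the codes together on a single line ending in a carriage return is acceptable, which the manual does not spell out. The names are illustrative.

```python
# Power-up defaults, in the order the manual lists them.
POWER_UP_DEFAULTS = [
    ("!O", "on line"),
    ("!P1", "low base pitch"),
    ("!R5", "clock rate #5"),
    ("!F", "flat intonation"),
    ("!T", "text to speech mode"),
    ("!S", "some punctuation"),
    ("!L", "line by line pronunciation"),
    ("!D9", "wait for carriage return phrase terminator"),
]

def restore_defaults() -> str:
    """Return one line of device codes that re-selects the power-up modes.

    !O comes first because an off-line unit responds to nothing else.
    """
    return " ".join(code for code, _ in POWER_UP_DEFAULTS) + "\r"

print(repr(restore_defaults()))  # '!O !P1 !R5 !F !T !S !L !D9\r'
```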

To change any of these defaults, simply send the corresponding control code to the MicroVox at any time. The codes can be transmitted separately or embedded in text. For example, entering

    THIS IS A TEST

and a carriage return will result in that phrase being spoken with no intonation.

To add automatic intonation the sentence becomes (all sentences are presumed to end with a carriage return):

    !I THIS IS A TEST

From this point on all spoken text will have automatic inflection unless flat intonation is resumed with !F.

As previously mentioned, intonation can be added selectively or by the automatic algorithm. You can say the following sentence four ways:

1. text to speech, no added inflection

    !T !F PLEASE ENTER YOUR ACCESS NUMBER

2. automatic inflection in text to speech mode

    !T !I PLEASE ENTER YOUR ACCESS NUMBER

3. selected inflection in text to speech mode

    !T !F !P1 !R5 PLEASE !R8 EN !R5 TER !R7 YOR !R5 ACCESS NUMBER

4. phoneme input mode with selected intonation

    !F !C !P1!R5 P L E1 Y Z PA1 !R9 EH1 EH3 N !R5 T ER PA1 Y !R8 O2 !R5 R PA1 !R7 AE1 !R5 K S EH1 EH3 S PA1 N UH1 M B ER

These examples demonstrate various ways in which the user can increase the intelligibility of the synthesized speech. The MicroVox is completely programmable: you can combine text to speech with either selective or automatic intonation, or optimize pronunciation by choosing exactly the pitches and phonemes you wish. An exaggerated example of combined pitch and phoneme control can actually allow MicroVox to sing, as demonstrated in a bar of "happy birthday" and a musical scale.

"Happy Birthday"

    !C !P3 !R3 H AE1 AE1 AE1 AE1 P !P2!R5 Y !P3!R5 B ER R TH !R1 D A1 A1 I3 !R9 T IU U1 U1 !R7 Y1 IU U1 U1

    !C !P1 !R1 D E1 Y
    !P1 !R5 E1 Y
    !P1 !R11 EH1 EH2 F P
    !P2 !R5 D J E1 Y
    !P2 !R11 A1 Y
    !P2 !R14 B E1 Y
    !P3 !R11 S E1 Y
    !P3 !R15 D E1 Y

Summary Table of Device Codes

    !O, !Q            - On line and Off line
    !K                - synchronize speech and text
    !L                - line by line pronunciation
    !W                - whole text pronunciation
    !E                - each letter pronunciation
    !C                - pronounce by direct phoneme input
    !N                - produce musical notes
    !T                - pronounce by text-to-speech algorithm
    !A, !M, or !S     - speak all, most, or some punctuation
    !F                - set monotone or flat intonation
    !I                - set automatic inflected intonation
    !P and !R         - set intonation base pitch and clock rate
    !D1-!D8 and !D9   - set phrase terminator delay

SETTING THE SERIAL PORT

DTE/DCE Setting

Behind J1 (the DB-25 serial connector) on the PC board is a 2 by 3 header and two jumpers. These jumpers set whether pins 2 and 3 are transmit data and receive data respectively, or vice versa. As received from the factory, the jumpers are in the DCE position, so pin 2 is RD and pin 3 is TD. To reverse these designations, place the jumpers in the DTE positions.

Data Rate

SW2 is the data rate (sometimes called BAUD rate) selection switch. The data rates are listed alongside SW2. SW2 can be either a 2 by 8 or 9 position Berg type connector or a 16-pin DIP switch. If a Berg connector is installed, a jumper is provided to select the desired data rate. Simply place it across the pair of terminals for the desired data rate.

If SW2 is a DIP switch, close the switch position for the desired data rate. Only that one position should be closed and the other seven positions should be in the open position.

For 75 bits per second, it will be necessary to attach a physical jumper across JP1. All positions on SW2 should be left open.

Handshaking

For software handshaking, switch position 3 on DIP switch SW1 is set in the closed position. For hardware handshaking, switch position 3 is left open. If standard EIA RS-232C serial connections are used, the sending computer hardware will detect and examine the RTS signal and determine whether the MicroVox is ready to receive a character or, if busy, take appropriate action.

With software handshaking, the MicroVox will send the character "R" to the computer when it is unable to receive more data, and will send "B" to the computer when it is again ready to receive data. It is the responsibility of the computer programmer to write the software necessary for the use of these options.

Finally, it is possible to use the MicroVox with no handshaking by simply invoking the software handshaking mode and ignoring the handshaking transmissions. In this case, it is the user's responsibility to insert timing delays in the program so that data will not be sent to the MicroVox faster than it can handle the data.
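The software handshaking described above leaves the sending loop to the programmer. The sketch below shows one possible shape for it, independent of any particular serial library: write_byte and poll_byte are hypothetical callbacks the host program would supply. The stop and resume characters are parameters; the defaults used here are the conventional ASCII X-off (hex 13) and X-on (hex 11) codes, an assumption based on the X-on/X-off handshaking named in the feature list, and should be checked against the unit.

```python
from collections import deque

XOFF = 0x13  # conventional ASCII "stop sending" code (assumed)
XON = 0x11   # conventional ASCII "resume sending" code (assumed)

def send_with_handshake(data, write_byte, poll_byte, stop=XOFF, resume=XON):
    """Send data one byte at a time, holding off while the device is busy.

    write_byte(b) -- hypothetical callback that transmits one byte value
    poll_byte()   -- hypothetical non-blocking callback returning a received
                     byte value, or None when nothing is waiting
    """
    paused = False
    for byte in data:
        while True:
            reply = poll_byte()
            if reply == stop:
                paused = True
            elif reply == resume:
                paused = False
            elif reply is None and not paused:
                break  # nothing pending and the device is ready
            # Otherwise keep polling; a real program would sleep briefly here.
        write_byte(byte)

# Simulated session: the device reports busy after the first byte,
# then ready again two polls later.
replies = deque([None, XOFF, None, None, XON, None, None])
sent = []
send_with_handshake(b"ABC", sent.append,
                    lambda: replies.popleft() if replies else None)
print(bytes(sent))  # b'ABC'
```

With real hardware the inner loop would wait indefinitely if the resume character never arrived, so a timeout is worth adding.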

Word Length, Parity and Stop Bits

Three switch positions on SW1 set the transmission protocol. The following is a list of the eight possibilities and their functions:

Function / Position 6 / Position 7 / Position 8

    7 bits, EP, 2 SB    closed
    7 bits, OP, 2 SB    closed open closed
    7 bits, EP, 1 SB    closed open
    7 bits, OP, 1 SB    closed open
    8 bits, 2 SB        open closed
    8 bits, 1 SB        open closed
    8 bits, EP, 1 SB    open closed open
    8 bits, OP, 1 SB    open

EP = Even Parity
OP = Odd Parity
SB = Stop Bit(s)
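The word length, parity and stop bit settings determine how long each character occupies the line, which matters when choosing timing delays for operation without handshaking. The sketch below works this out for any setting; the function names are illustrative. It counts one start bit per character, as is standard for asynchronous serial links. The result is only the limit set by the serial line; the rate at which the MicroVox actually speaks is slower, so the buffer can still fill.

```python
def frame_bits(data_bits: int, parity: bool, stop_bits: int) -> int:
    """Bits on the line per character: start + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits

def max_chars_per_second(bps: int, data_bits: int, parity: bool, stop_bits: int) -> float:
    """Upper bound on character throughput at the given data rate."""
    return bps / frame_bits(data_bits, parity, stop_bits)

# Factory setting: 300 bps, 8 bit words, no parity, 2 stop bits.
bits = frame_bits(8, False, 2)                  # 1 + 8 + 0 + 2 = 11
rate = max_chars_per_second(300, 8, False, 2)   # 300 / 11, about 27.3
print(bits, round(rate, 1))  # 11 27.3
```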
