
VoiceXML: Introduction to Developing
Speech Applications
11-1 Explain the features of a VoiceXML dialog that support callers in each
of the following stages of general knowledge of the subject.
A. Orientation stage
Directed dialogs
assist users during the orientation stage by directing them to answer
questions. Event handlers also guide users in the orientation stage
by helping them answer prompts correctly.
B. Exploration stage
Event handlers,
especially the help event handler, instruct callers by informing them
what is possible to do at each point in the dialog.
C. Manipulation stage
Callers in the
manipulation stage can use the mixed-initiative features of VoiceXML
to perform their tasks quickly. Callers in this stage make heavy use
of barge-in.
11-2 Design
and implement a VoiceXML application for collecting preference data.
It should be easy to change the questions when the application is used
to collect preference data for a new application.
See Figure 11.3
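One possible way to keep the questions easy to change is to hold them as data and generate the VoiceXML form from that data. The sketch below is an illustration only and is not necessarily the design shown in Figure 11.3: the question texts, field names, rating words, and the `build_survey` function are all hypothetical, and the generated form omits submitting the answers to a server.

```python
import xml.etree.ElementTree as ET

# Hypothetical survey content: this list is the only part that should need
# editing when the survey is reused for a new application.
QUESTIONS = [
    ("ease", "How easy was the application to use?"),
    ("speed", "How satisfied were you with the speed of the application?"),
]

# Hypothetical rating scale offered for every question.
RATINGS = ["one", "two", "three", "four", "five"]


def build_survey(questions, ratings):
    """Return a VoiceXML document (as a string) with one field per question."""
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    form = ET.SubElement(vxml, "form", {"id": "preferences"})
    for name, text in questions:
        field = ET.SubElement(form, "field", {"name": name})
        prompt = ET.SubElement(field, "prompt")
        prompt.text = f"{text} Please say a number from {ratings[0]} to {ratings[-1]}."
        # Each <option> becomes one allowed spoken answer for the field.
        for rating in ratings:
            option = ET.SubElement(field, "option")
            option.text = rating
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_survey(QUESTIONS, RATINGS))
```

With this arrangement, adapting the survey to a new application means editing `QUESTIONS` and regenerating the document; the dialog structure itself stays untouched.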
11-3 Design
and implement a VoiceXML application for measuring performance data
for the pizza application developed in exercise 8-3.
A. Determine what
performance data is needed for this application
Measuring the
word error rate requires someone to listen to what the user says and transcribe
it. Instead, we will use a weaker performance measure that
can be completely automated: whether the system understood
one of the prompt words. In effect, this measures both the
system and the user.
Caller Task | Measure | Typical Criteria | Calculation |
The caller speaks one of "small, medium, large" to the prompt "what size?" | System understands the words "small, medium, large" | More than 95% of the time | (number of times size was understood) / (number of times size was understood + number of times size was not understood) |
The caller speaks one of "coke, pepsi, diet pepsi, lemonade, water" in response to "what drink?" | System understands the words "coke, pepsi, diet pepsi, lemonade, water" | More than 95% of the time | (number of times drink was understood) / (number of times drink was understood + number of times drink was not understood) |
The caller confirms that the drink order is correct | The caller says "yes" in response to "is that correct?" | More than 95% of the time | (number of times user said yes) / (number of times user said yes + number of times user said no) |
Caller orders a drink | Time taken to order drink | | Average of (timestamp of end of order - timestamp of beginning of order) |
B. Modify the VoiceXML
application to create a log file containing start and stop times required
to calculate the performance data needed for this application.
C. Create an application
that reads the log file and calculates the performance data. You may
use any programming language, scripting language, or even a spreadsheet
application for this purpose.
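As a minimal sketch of part C, the Python function below reads log lines and computes the three recognition rates and the average ordering time defined in the table. The log format is an assumption made for illustration: each line holds a timestamp in seconds and an event name separated by whitespace, and the event names (`order_start`, `order_end`, `size_ok`, `size_nomatch`, `drink_ok`, `drink_nomatch`, `confirm_yes`, `confirm_no`) are hypothetical. The log produced in part B may look different.

```python
from collections import Counter


def rate(successes, failures):
    """Return successes / (successes + failures), or None if there is no data."""
    total = successes + failures
    return successes / total if total else None


def analyze(lines):
    """Compute performance measures from an iterable of log lines."""
    counts = Counter()
    durations = []
    start = None
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        timestamp, event = float(parts[0]), parts[1]
        counts[event] += 1
        if event == "order_start":
            start = timestamp
        elif event == "order_end" and start is not None:
            durations.append(timestamp - start)
            start = None
    return {
        "size_rate": rate(counts["size_ok"], counts["size_nomatch"]),
        "drink_rate": rate(counts["drink_ok"], counts["drink_nomatch"]),
        "confirm_rate": rate(counts["confirm_yes"], counts["confirm_no"]),
        "avg_order_seconds": sum(durations) / len(durations) if durations else None,
    }


if __name__ == "__main__":
    sample = [
        "100.0 order_start",
        "103.0 size_ok",
        "105.0 drink_nomatch",
        "107.0 drink_ok",
        "109.0 confirm_yes",
        "110.0 order_end",
    ]
    print(analyze(sample))
```

To analyze a real log, pass an open file object instead of the sample list, for example `analyze(open("pizza.log"))`, where `pizza.log` is a placeholder file name.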
11-4 Normally
a human must transcribe the words and phrases spoken by a caller in
order to calculate the word error rate. Explain how it may be possible
to use a second speech recognition system to replace the human transcriber.
Specify a high-level architecture of a system for automatically estimating
the word-error rate.

The usual approach
for calculating the word error rate is to have a human annotator listen
to the audio file and transcribe it into text. The annotator's text
is then compared to the ASR's text, the differences noted, and the word
error rate calculated.
It is possible to
replace the human annotator with a powerful speech recognition engine
(different from the production ASR). Run the recorded audio through this
second engine and use its text in place of the human transcript: compare
it with the production ASR's text, note the differences, and estimate the
word error rate. Because the
second ASR is not perfect, it will itself produce errors, but because
it works offline, its errors should be fewer than those of the original ASR.
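The comparison step of this architecture can be sketched as follows. The code assumes both recognizers have already produced text for the same utterances; it treats the second (offline) recognizer's text as the reference and counts word-level substitutions, insertions, and deletions with a standard edit-distance calculation. The function names and the sample utterances are hypothetical, and the result is only an estimate because the reference text itself contains recognition errors.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the reference word list into the hypothesis word list."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def estimate_wer(pairs):
    """Estimate the word error rate from (reference_text, production_text) pairs.

    reference_text comes from the second, offline recognizer;
    production_text comes from the production ASR.
    """
    errors = 0
    words = 0
    for reference_text, production_text in pairs:
        reference = reference_text.lower().split()
        hypothesis = production_text.lower().split()
        errors += word_edit_distance(reference, hypothesis)
        words += len(reference)
    return errors / words if words else None


if __name__ == "__main__":
    utterances = [
        ("diet pepsi", "pepsi"),  # production ASR dropped one word
        ("large", "large"),       # both recognizers agree
    ]
    print(estimate_wer(utterances))
```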