
VoiceXML: Chapter 11 Exercises

 


VoiceXML: Introduction to Developing
Speech Applications

11-1 Explain the features of a VoiceXML dialog that support callers in each of the following stages of general knowledge of the subject.

A. Orientation stage

Directed dialogs assist callers during the orientation stage by directing them to answer questions. Event handlers also guide callers in the orientation stage by helping them answer prompts correctly.

B. Exploration stage

Event handlers, especially the help event handler, tell callers what they can do at each point in the dialog.

C. Manipulation stage

Callers in the manipulation stage can use the mixed-initiative features of VoiceXML to perform their tasks quickly. Callers in this stage make heavy use of barge-in.

11-2 Design and implement a VoiceXML application for collecting preference data. It should be easy to change the questions when the application is used to collect preference data for a new application.

See Figure 11.3

11-3 Design and implement a VoiceXML application for measuring performance data for the pizza application developed in exercise 8-3.

A. Determine what performance data is needed for this application

To measure word error rate, someone must listen to what the caller says and transcribe it. Instead, we will use a weaker performance measure that can be completely automated: whether the system understood one of the words offered in the prompt. In effect, this measures both the system and the caller.

Caller Task | Measure | Typical Criteria | Calculation
The caller speaks one of "small, medium, large" in response to the prompt "what size?" | System understands the words "small, medium, large" | More than 95% of the time | (number of times size was understood)/(number of times size was understood + number of times size was not understood)
The caller speaks one of "coke, pepsi, diet pepsi, lemonade, water" in response to "what drink?" | System understands the words "coke, pepsi, diet pepsi, lemonade, water" | More than 95% of the time | (number of times drink was understood)/(number of times drink was understood + number of times drink was not understood)
The caller confirms that the drink order is correct | The caller says "yes" in response to "is that correct?" | More than 95% of the time | (number of times user said yes)/(number of times user said yes + number of times user said no)
Caller orders a drink | Time taken to order drink | | Average of (timestamp of end of order - timestamp of beginning of order)

B. Modify the VoiceXML application to create a log file containing start and stop times required to calculate the performance data needed for this application.

C. Create an application that reads the log file and calculates the performance data. You may use any programming language, scripting language, or even a spreadsheet application for this purpose.
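
One possible shape for part C, sketched in Python. The log format is an assumption made for illustration, since the exercise does not specify one: each line holds a timestamp in seconds, an event name, and an optional outcome, separated by whitespace. The event names (order_start, order_end, size, drink, confirm) and outcomes (match, nomatch, yes, no) are likewise hypothetical.

```python
from collections import Counter


def summarize(log_lines):
    """Compute the table's measures from lines of 'timestamp event [outcome]'."""
    counts = Counter()   # (event, outcome) -> number of occurrences
    durations = []       # seconds from order_start to order_end, one per order
    start = None
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        timestamp, event = float(parts[0]), parts[1]
        outcome = parts[2] if len(parts) > 2 else None
        if event == "order_start":
            start = timestamp
        elif event == "order_end" and start is not None:
            durations.append(timestamp - start)
            start = None
        elif outcome is not None:
            counts[(event, outcome)] += 1

    def rate(event, good, bad):
        # good / (good + bad), or None when the event never occurred
        total = counts[(event, good)] + counts[(event, bad)]
        return counts[(event, good)] / total if total else None

    return {
        "size_rate": rate("size", "match", "nomatch"),
        "drink_rate": rate("drink", "match", "nomatch"),
        "confirm_rate": rate("confirm", "yes", "no"),
        "avg_order_seconds": sum(durations) / len(durations) if durations else None,
    }


# Hypothetical log for a single call, in the assumed format.
sample_log = [
    "1000.0 order_start",
    "1003.2 size match",
    "1007.9 drink nomatch",
    "1010.1 drink match",
    "1012.5 confirm yes",
    "1013.0 order_end",
]
report = summarize(sample_log)
```

Each rate can then be compared against the 95% criterion from the table. For a real log, pass the open file object instead of the sample list.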

11-4 Normally a human must transcribe the words and phrases spoken by a caller in order to calculate the word error rate. Explain how it may be possible to use a second speech recognition system to replace the human transcriber. Specify a high-level architecture of a system for automatically estimating the word error rate.

The usual approach for calculating the word error rate is to have a human annotator listen to the audio file and transcribe it into text. The annotator's text is then compared to the ASR's text, the differences noted, and the word error rate calculated.

It is possible to replace the human annotator with a more powerful speech recognition engine (different from the production ASR). The text produced by this second engine is used as the reference against which the production ASR's text is compared to estimate the word error rate. Because the second ASR is not perfect, it will itself produce errors, but because it works offline, its errors should be fewer than those of the production ASR.
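
The architecture described above can be sketched in Python as follows. The two recognizers are passed in as plain functions from an audio identifier to text; these are stand-ins for illustration, not any real ASR API. Word errors are counted with a standard word-level edit distance (substitutions, insertions, and deletions). Because the reference text comes from a second recognizer rather than a human, the result is only an estimate of the true word error rate.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def estimate_wer(utterances, production_asr, reference_asr):
    """Estimate WER, treating the offline reference ASR's text as the transcript."""
    errors = words = 0
    for audio in utterances:
        e, n = word_errors(reference_asr(audio), production_asr(audio))
        errors += e
        words += n
    return errors / words if words else None


# Stand-in recognizers: dictionary lookups keyed by hypothetical file names.
production_text = {"a.wav": "one large pizzas", "b.wav": "a diet pepsi"}
reference_text = {"a.wav": "one large pizza", "b.wav": "a diet pepsi please"}

wer = estimate_wer(["a.wav", "b.wav"], production_text.get, reference_text.get)
```

Here the production output has one substitution in the first utterance and one missing word in the second, so the estimate is 2 errors over 7 reference words.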
