
VoiceXML: Introduction to Developing
Speech Applications
In addition to the
text areas below, there is a link to a Microsoft Word version of the
code.
2-1 The
grammar for telephoneNumber.grxml is:
This grammar is
used by the VoiceXML code for Example 2.1:
An equivalent grammar
that enables the caller to press the keys on a touch-tone phone follows:
This revised grammar
is used by the following VoiceXML application. The only changes to the
application are the prompt, which solicits the one-digit phone number
from the caller, and the reference to the revised grammar.
2-2 A VoiceXML
form that solicits a date, reusing the grammars month.grxml, day.grxml,
and year.grxml.
2-3 See the
resources web page for information about HTML verbal browsers
2-4 The W3C
home page looks like this:
[Image: screenshot of the W3C home page]

A. What information
from this site should be extracted and included in a voice Web page?
- Navigation
bar. From the navigation bar at the top of the W3C home page,
include each of the five links.
- Working Group
list. From the list of Working Groups in the left-hand column,
include each of the Working Group links.
- General information.
From the General information section in the right-hand column, include
each of the links.
- News section.
From the news section in the middle of the web page, include the text
and links.
B. Briefly describe
the menus and forms in the voice Web page that presents this information
to the caller.
- Navigation
bar. A voice menu that presents the five choices to the caller.
- Working Group
list. A voice menu that presents the list of Working Groups to
the caller. Because this is such a long list, consider using a scrollable
voice menu, that is, a voice menu with commands for skip ahead and skip backward.
- General information.
This section lends itself to a two-level hierarchy of voice menus.
The top level contains options for mission, contact, get involved,
member area, W3C Team, and past news. The second level of menus contains
the corresponding links.
- News section.
From the news section in the middle of the web page, include the text
and links. Use a scrollable voice menu for the news items. Each news
item is a sequence of prompts that read the text to the user. Between
some of the prompts are one-option menus that enable the user to jump
to another page.
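
The scrollable voice menu suggested above can be sketched in a few lines of Python. This is only an illustration of the paging logic, not VoiceXML code from the book: the class name `ScrollableMenu`, the page size of three, and the placeholder group names are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrollableMenu:
    """Hypothetical model of a voice menu that reads a long list in pages."""
    items: List[str]
    page_size: int = 3   # assumed number of options read per prompt
    start: int = 0       # index of the first option on the current page

    def current_page(self) -> List[str]:
        # The options the caller would hear in the current prompt.
        return self.items[self.start:self.start + self.page_size]

    def handle(self, command: str) -> List[str]:
        # "skip ahead" and "skip backward" are the browsing commands
        # named in the answer; any other input leaves the page unchanged.
        if command == "skip ahead":
            if self.start + self.page_size < len(self.items):
                self.start += self.page_size
        elif command == "skip backward":
            self.start = max(0, self.start - self.page_size)
        return self.current_page()


# Placeholder names; the real list would come from the W3C home page.
menu = ScrollableMenu([f"group {i}" for i in range(1, 8)])
print(menu.current_page())
print(menu.handle("skip ahead"))
```

Skipping ahead at the last page, or backward at the first, simply repeats the current page, so the caller is never left with an empty prompt.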
C. Indicate the
sequence in which the menus and forms from part (b) above should be presented
to the caller.
First, welcome
the user with the following prompt: "Welcome to the W3C, the World Wide
Web Consortium, leading the web to its full potential."
The four main sections
of the visual web page should be presented in order of access frequency,
most frequently accessed first. For example, if most of the callers want to access the Working
Group page to which they belong, then the Working Group List option
should be presented next, followed by news with general information
last.
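
As a small illustration of ordering the sections by access frequency, the Python sketch below sorts section names by a count of caller requests. The function name `order_sections` and the counts are invented for the example; real figures would have to come from usage logs or user testing.

```python
from typing import Dict, List


def order_sections(access_counts: Dict[str, int]) -> List[str]:
    """Return section names, most frequently accessed first."""
    ranked = sorted(access_counts.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _count in ranked]


# Hypothetical access counts, for illustration only.
counts = {
    "navigation bar": 120,
    "working group list": 450,
    "general information": 60,
    "news": 200,
}
print(order_sections(counts))
```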
D. What prompt wording
do you recommend for each menu and form from part (b)?
- Navigation
bar. Say the name of one of the following information categories:
activities, technical reports, site index, about W3C, or contact us.
- Working Group
list. Speak the name of one of the following working groups: Accessibility,
Amaya, and so on.
- General information.
Speak the name of one of the following categories: mission, contact
us, get involved, member area, W3C team, and past news.
- News section.
News headlines. Say "this one" to hear details. W3C team presentations
in January (pause), SVG 1.1 and Mobile SVG Profiles Working Drafts
Published (pause), and so on.
E. What words should
be in the grammar for each menu and form from part (b)?
- Navigation
bar. Activities, Technical reports, Site index, About W3C, contact
us, and the appropriate synonyms.
- Working Group
list. The names of each of the Working Groups and appropriate
synonyms (for example, "speech" is a frequent synonym for "voice"), plus
browsing commands such as skip ahead and skip backward.
- General information.
Mission, contact us, get involved, member area, W3C Team, past news,
and synonyms.
- News section.
This one, first, second, and so on, plus the keywords in each headline.
F. What error handlers
should be specified for each menu and form from part (b)?
Each menu and
form field should have event handlers that respond to the three most
common errors and events caused by users:
- Failure to respond
(no response).
- A response with
a word or phrase not covered by the grammar (mismatch).
- A request for
assistance (help).
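
To show what attaching the three handlers to a menu might look like, the following Python sketch builds a VoiceXML `<menu>` element containing `<noinput>`, `<nomatch>`, and `<help>` children. It is a sketch under stated assumptions: the function name `build_menu`, the handler wording, and the `#activities` style link targets are hypothetical and do not come from the book's examples.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_menu(menu_id: str, prompt_text: str,
               choices: List[Tuple[str, str]], help_text: str) -> ET.Element:
    """Build a VoiceXML <menu> with the three common event handlers."""
    menu = ET.Element("menu", id=menu_id)
    ET.SubElement(menu, "prompt").text = prompt_text
    for label, target in choices:
        # Each <choice> names a spoken option and where it leads.
        ET.SubElement(menu, "choice", next=target).text = label
    # No response: repeat the prompt (wording is an assumption).
    ET.SubElement(menu, "noinput").text = "Sorry, I did not hear you. " + prompt_text
    # Mismatch: the caller said something the grammar does not cover.
    ET.SubElement(menu, "nomatch").text = "Sorry, I did not understand. " + prompt_text
    # Help: the caller asked for assistance.
    ET.SubElement(menu, "help").text = help_text
    return menu


menu = build_menu(
    "navigation",
    "Say activities or contact us.",
    [("activities", "#activities"), ("contact us", "#contact")],  # hypothetical targets
    "You can say the name of a category to hear more about it.",
)
print(ET.tostring(menu, encoding="unicode"))
```

How each handler prompt is phrased still needs a designer's attention; the strings here are placeholders.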
2-5 Discuss
the possibility of constructing a "transcoding" procedure that produces
a VoiceXML document for the home page of the W3C.
A. Which of the
steps (a) - (f) from question 2-4 do you think can be automated as part
of the transcoding procedure?
(a) Determine
what information to extract. While it is easy to automate the extraction
of data from the W3C home page, it is not, in general, possible to
automatically determine what to extract. For example, if most of the
callers are members of Working Groups, then it may not be necessary
to extract general information that all members already know. On the
other hand, if most are new to the W3C, then the general information
should be extracted, but not the detailed Working Group list.
(b) Determine
whether to use menus or forms. It may be possible to automate heuristics
that determine whether to use menus or forms for each information
set. However, these decisions should be reviewed by a designer and
then user-tested to make sure the decisions are correct.
(c) Determine
the sequence in which the menus and forms should be presented to the user.
Because this is very dependent upon how callers will use the voice
site, it may be impossible to predict automatically.
(d) Specify the
grammar. The words used as link names on the visual page can be automatically
extracted for use in the grammar. A grammar specialist will be needed
to extend the grammar to include synonyms and phrases that callers
frequently speak yet do not appear on the home web page.
(e) Determine
what event handlers to write. Every menu and form item should have
the three major event handlers: mismatch, no response, and help. However,
it may not be possible to automatically determine how to phrase each
prompt in the error handlers.
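
Step (d) is the part most easily automated. As a hedged illustration, the Python sketch below uses the standard library's `html.parser` to pull link names out of a page for use as starting grammar words. The class and function names and the sample HTML are assumptions made for the example; the sketch does not add synonyms, which still require a grammar specialist.

```python
from html.parser import HTMLParser
from typing import List


class LinkTextExtractor(HTMLParser):
    """Collect the visible text of every <a href=...> link."""

    def __init__(self) -> None:
        super().__init__()
        self._in_link = False
        self._buffer: List[str] = []
        self.link_names: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchors with an href are links; named anchors are ignored.
        if tag == "a" and any(name == "href" for name, _value in attrs):
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.link_names.append(text)
            self._in_link = False


def grammar_words(html: str) -> List[str]:
    """Return lower-cased link names, without duplicates, in page order."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    words: List[str] = []
    for name in parser.link_names:
        key = name.lower()
        if key not in words:
            words.append(key)
    return words


# Made-up HTML fragment standing in for the navigation bar.
sample = '<a href="/a">Activities</a> <a href="/tr">Technical  Reports</a>'
print(grammar_words(sample))
```

If the page layout changes, the extraction may silently return different words, which is one reason the answers below recommend transcoding only for stable sites.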
B. Do you recommend
implementing a transcoding procedure for the W3C web site?
Only under two conditions
does it make sense to implement a transcoding procedure:
(a) A designer
fine-tunes the transcoding procedure, providing the information that
cannot be automatically derived by the transcoding procedure.
(b) The web site
does not change, so the extraction task does not need to be modified.
C. Do you recommend
implementing a transcoding procedure that can be applied to any visual
web site?
No. Web sites
that frequently change their layout and/or content will break transcoding
procedures.
2-6 WML is
a language for writing applications that display data, menus, and forms
on the small screens of many of today's cell phones. See http://www.wapforum.org/
to download a copy of the WML specification. The browser for WML resides
on the cell phone itself, while the browser for VoiceXML resides on
a server that is connected to the cell phone. By combining the functionality
of VoiceXML and WML, it is possible to enable multimodal applications
that can both speak and listen to the caller as well as display information
and accept touch-tone button input. Discuss the advantages and disadvantages
of each of the following architectural approaches, a, b, and c:
a. Integrate the WML and VoiceXML browsers for execution on the cell
phone.
b. Integrate the WML and VoiceXML browsers for execution on a connected
server.
c. Enable the synchronization between the WML browser executing on
the cell phone and the VoiceXML browser executing on the server.

| Issue | a | b | c |
|---|---|---|---|
| Service delays due to communication delays | No | Yes | Yes |
| Service disruptions when the communication facility is "out of range" | No | Yes | Yes |
| Extra expense for memory and processing capability | Yes | No | No |
| Extra complexity due to synchronization between client and server | No | No | Yes |