B-1 Extend the W3C Speech Interface Framework (Figure B.1) as follows:
A. Insert languages for voice identification and verification, word spotting, language generation, classification, machine translation, summarization, music synthesis, and history.
B. Insert SPIs for ASR, TTS, touch tone recognition, audio capture, and audio.
C. Insert multimodal technologies for handwriting recognition, motion detection, and vision.
D. What else can be inserted into the W3C Speech Interface Framework to make it more complete?
B-2 Review the W3C Voice Browser web site at http://www.w3.org/voice to determine the status of each of the W3C Speech Interface Framework languages. Which languages are at Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, and Full Recommendation status?
B-4 For each of the following applications, briefly define the application and list its major functions. List the components of the W3C Speech Interface Framework required to implement the application. List any additional technology required to implement it. Which of these applications would you recommend be implemented now, and which would you recommend be postponed until additional standards are in place? Explain why.
A. Foreign language training
Goal: Train users to speak a second language. The application should be able to converse with the user in the user's first language as well as the second language.
Requirements: The application should be able to listen to the user speak the second language and detect and correct mispronunciations. There may be a visual component that supports language training, for example, objects that the user may manipulate and discuss using the second language.
Detecting mispronounced words is not easily achieved with VoiceXML. Additional programming may be necessary to achieve this.
B. Catalogue sales
Goal: Browse a visual catalogue, ask questions and listen to answers about products, and order goods and services.
Requirements: A multimodal user interface in which the user can browse a visual catalogue, ask simple questions, and listen to the answers. The application requires a "shopping cart" facility and a transaction capability to pay for goods and services purchased.
There are several possible versions of this application. The first uses application-directed dialogs to lead the user to the desired product. If the product is audio, the user may listen to "samples" and decide whether to purchase.
The next version involves natural language, so that the user can ask questions about a product and discuss its features. True natural language is not supported by current standards.
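The difference between the two versions can be made concrete with a minimal sketch of an application-directed dialog, written here in Python rather than VoiceXML. The `MenuNode` class, the `run_directed_dialog` function, and the sample catalogue are all hypothetical names introduced for illustration; they are not part of any W3C language. The system asks a fixed question at each step and accepts only the replies it lists, so a free-form question about a product is simply not understood.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MenuNode:
    """One step of a system-led dialog (hypothetical structure)."""
    prompt: str                                   # what the system says
    choices: Dict[str, "MenuNode"] = field(default_factory=dict)
    product: Optional[str] = None                 # set only on leaf nodes


def run_directed_dialog(root: MenuNode, replies: List[str]) -> Optional[str]:
    """Walk the menu tree using one caller reply per prompt.

    An unrecognized reply leaves the dialog at the same node, which
    models a re-prompt. Returns the selected product, or None if the
    replies never reached a leaf.
    """
    node = root
    for reply in replies:
        if node.product is not None:
            break  # a product has already been selected
        node = node.choices.get(reply.strip().lower(), node)
    return node.product


# Illustrative catalogue; the products are made up for this sketch.
catalogue = MenuNode(
    prompt="Would you like music or books?",
    choices={
        "music": MenuNode(
            prompt="Jazz or classical?",
            choices={
                "jazz": MenuNode(prompt="", product="jazz sampler"),
                "classical": MenuNode(prompt="", product="classical sampler"),
            },
        ),
        "books": MenuNode(prompt="", product="book catalogue"),
    },
)

selected = run_directed_dialog(catalogue, ["music", "jazz"])
```

A reply such as "tell me about the warranty" matches no listed choice, so the sketch stays at the current prompt. Handling that kind of question is what the natural language version would require.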
C. Receive, create, and send e-mail messages
Goal: Browse, listen to, create, and send e-mail messages.
Requirements: Browse, listen to, create, and send e-mail messages. To create messages to forward to other users, the user may record voice messages or dictate text messages.
Browsing a long list of messages is difficult to do with a voice-only interface, but possible with a browsable list object. Creating and editing a text message is very difficult to do with voice only. A dictation engine will be necessary, and editing the text will be difficult to achieve. A better solution is to record the outgoing message and replay it to the recipient.
D. Personal calendar
Goal: Access and update information on a calendar.
Requirements: Browse calendar entries; edit and update calendar entries.
Record each user entry and store the recording with the text generated by the speech recognition engine. If the engine does not recognize the text correctly, the user can listen to the recording for the information. Because this is a personal calendar, others will not hear the ASR mistakes.
E. Group calendar
Same as the personal calendar, except that users may not want ASR mistakes presented to other users. There needs to be some way to edit text created by ASR for this application to be successful.
F. Track shipping
Goal: The user enters a package number using DTMF to find out the status of a package en route for delivery.
Requirements:
The few responses generated by the system are known ahead of time and can be prerecorded. Reusable components (such as telephone number, address, etc.) are very useful. This is possible today, especially if the package number is a digit string rather than an alphanumeric string.
G. Audio jukebox
Goal: Enable the user to select and listen to musical selections.
Requirements: System-directed menus guide the user to select tunes. The user can construct a play list of tunes.
This is similar to (B) catalogue sales, above.
H. "To do" lists and reminders
Goal: Enable the user to record, search, and listen to a list of things to do.
Requirements: A dictation engine to record an arbitrary list of things to do. Speech synthesis to present the list of things to do to the user.
It may be possible to develop this application using just record and playback (no speech recognition or synthesis). However, searching the to-do list would then be difficult.
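The following Python sketch illustrates why search is the sticking point for a record-and-playback design. It assumes a hypothetical `TodoItem` that always holds a recording and only sometimes holds a transcript from a dictation engine; the class, the `search` function, and the file names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TodoItem:
    audio_file: str                   # recording of the user's spoken item
    transcript: Optional[str] = None  # dictation engine output, if available


def search(items: List[TodoItem], keyword: str) -> List[TodoItem]:
    """Return the items whose transcript mentions the keyword.

    Items stored as audio only have no text to match against, so they
    can be replayed in order but cannot be found by keyword.
    """
    needle = keyword.lower()
    return [
        item for item in items
        if item.transcript is not None and needle in item.transcript.lower()
    ]


# Illustrative data: the first item was recorded only, the second was
# also transcribed.
items = [
    TodoItem("todo_001.wav"),
    TodoItem("todo_002.wav", transcript="Call the dentist on Friday"),
]
matches = search(items, "dentist")
```

Only the transcribed item can be matched here. With recordings alone, the application would have to fall back on playing every item in sequence.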