James A. Larson
Portland State University
Oregon Health & Science University
Oregon Institute of Technology
Opera, with its new voice feature, enables developers to create and deploy multimodal applications. To experience the multimodal applications below, you will need the Opera version 8.0 browser. See the Voice Quick Start-up Guide: How to talk with your browser.
Opera can support three types of user interfaces:
Dialog Type | URI | Comments |
---|---|---|
Verbal only | jim/paymentVerbal.xml | VoiceXML code embedded into HTML so Opera browser users can hear the verbal-only dialog |
GUI only | jim/paymentGUI.xml | A traditional HTML application |
Multimodal | jim/paymentMM.xml | VoiceXML code embedded into HTML code using X+V |
During the last two weeks of the school term, students in the Oregon Health & Science University course CSE 564 and the Portland State University course CS 410/510 created several XHTML plus Voice (X+V) applications. Students had a working knowledge of XHTML and VoiceXML but no previous knowledge of X+V. I presented a one-hour overview of X+V syntax and two sample applications to students in both courses. Teams of two students were formed to design and implement a multimodal application within two weeks. The table below contains some of the student projects. If you install the Opera browser and the voice plug-in, you can experience these multimodal applications. By clicking "source" under the "view" pulldown, you can examine the source code to see how students implemented the multimodal user interface.
Project number | Project Name | Author | URI | Comments | |
---|---|---|---|---|---|
1 | Kid's holiday craft projects | Ashley Irving | ashley/home.xml | Choose the project by saying its number. Page through the instructions by saying "next." Do you think that the content should be spoken in addition to being displayed on the screen? | |
2 | Making a peanut butter sandwich | Emerson Murphy-Hill | No longer available | Page through the instructions by saying "next," "back," or "read." | |
3 | Buying a car | Glenn Diviney | glenn/index.xml | ||
4 | Origami | Khanh Duong | No longer available | ||
5 | Collecting health data | Sunil Lahudia | sunil/healthData.xml | ||
6 | Buy movie tickets | Oindrila Mukherjee | qindrila/movieSel_test.xml | Use the name "James Bond" on the second screen to reserve tickets. | |
7 | National park tours | Medha Nirguide | medha/Main.xml | ||
8 | Restaurant menu | Quang Nguyen | quang/project3.xml | ||
9 | Banking | Rajeshwari Patil | No longer available | Account = "10"; Passcode = "hello". Amounts must be in increments of 100. | |
10 | Library | Frank Adrian and Ken Anderson | No longer available | ||
11 | Quick finder | Driss Takir | driss/voice.xml | ||
12 | Order computer | Ashwini Kulkarni | ashwini/login.xml | Use "david" for the user name and "capital" for the password. | |
13 | Personal travel pictures | Chris Holm | chris/index.vxml | Say "next" or "previous" to view the cities. | |
14 | Animal shelter | David Graves and Dina Suehiro | dina/index.xml | ||
15 | Cyber reader | Tom Feliz | tom/index.xml | Hear how much better a recording is than TTS. | |
16 | Picture album | Ricky Cancro | No longer available | PHP to generate XML code for a picture album. | |
17 | Flash cards (addition) | | jim/flashCardAddition.xml | ||
18 | Tune your violin | Based on a SALT program developed by Deborah Dahl, chair of the W3C Multimodal Interaction Working Group | dahl/tune.xml | Both hands are busy as a violin player requests to hear tones so the violin player can tune the violin. | |
19 | Music world | Doan Ng | doan/mainpage.xml | This application simulates a hardware device with a push-to-speak button. You must click the "push to speak" soft button as you press the (default) ScrLk button on your computer. | |
20 | The game of "go" | Sean Pearson | http://web.pdx.edu/~seanp/school/cs410/capturego.cgi |
I have grouped the student projects into the following categories.
Hands-busy instruction. This category of project provides incremental instruction while the user manipulates real-world artifacts with their hands. It includes applications in which the user diagnoses a problem (determining what is wrong with a car's motor), repairs a device (fixing a leaky faucet), or constructs artifacts (project 1: a talking holiday project book for children; project 2: a talking recipe book; and project 4: creating a complex origami artifact). Such applications could be used to diagnose or repair products without calling a live-help agent. The application may be available by connecting to an application server, or it may be supplied on a CD-ROM packaged with the product. Project 17 (not necessarily a hands-busy instruction application) illustrates the multimodal equivalent of "flash cards." Some children use flash cards to learn their addition tables. Repetitive drills, such as flash cards, can be used not only for math skills but also for language training, spelling, and other rote memory training.
Entertainment. This category includes audio poems, stories (project 3: a child's fairy tale book), music (an audio-controlled jukebox), and games. Gaming enthusiasts can use voice as a "third hand" while manipulating controls with both real hands.
Data collection. These applications are primarily verbal forms. For each electronic form slot or menu, the user speaks an answer to a question that is presented verbally, visually, or both. As a fallback, voice users may also enter information directly into a GUI-based form. Projects falling into this category include:
These applications can be very useful if your hands are busy, or if these applications are implemented on a handheld device. Usability testing is needed to determine if voice really benefits these applications when used on a desktop PC in an office environment.
Photo album tour. Another popular class of applications is a photo album tour. Some projects consist of an ordered sequence of pictures that the user navigates by saying "next," "previous," or "home" (project 13: personal travel pictures). More complicated projects support a hierarchical structure in which users move up or down a hierarchy of pictures and navigate within sequential sets of pictures at the leaves of the hierarchy (project 7: national parks tour). Project 19 (music world) replaces the photos with audio clips, which enables users to create their own playlist of downloaded tunes. These are simple applications to construct. One student constructed a PHP program to generate a photo album tour (project 16: picture album). Almost anyone can use such a generation tool to create their own personal photo album tour. Just as enabling users to author text, spreadsheets, presentations, and e-mail made GUI-based office applications popular, enabling users to be authors may be the key to popularizing multimodal applications.
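
The sequential navigation described above can be sketched in a few lines of JavaScript. This is only an illustration, not code taken from any student project: the function name `navigate`, its parameters, and the choice to stop at the first and last picture (rather than wrap around) are all assumptions made for the example.

```javascript
// Hypothetical helper: given the current picture index, a recognized spoken
// command, and the number of pictures, return the index of the picture to show.
function navigate(index, command, count) {
  switch (command) {
    case "next":
      // Assumption: stay on the last picture instead of wrapping around.
      return Math.min(index + 1, count - 1);
    case "previous":
      // Assumption: stay on the first picture instead of wrapping around.
      return Math.max(index - 1, 0);
    case "home":
      return 0;
    default:
      // Unrecognized command: keep showing the current picture.
      return index;
  }
}

// Example: walk through a three-picture album.
let current = 0;
for (const spoken of ["next", "next", "next", "previous", "home"]) {
  current = navigate(current, spoken, 3);
}
console.log(current);
```

In an X+V page, a function along these lines could be called from whatever handler receives the recognized word, with the returned index used to choose the picture to display. A hierarchical tour would need an additional notion of moving up and down between albums, which this sketch does not attempt.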
Multimodal interface to a traditional Web page. Project 10 (library application) is an example of a GUI application that also supports verbal input. The user may switch between the traditional GUI and the multimodal user interface at any time. Because of factors such as users' typing speed, background noise, and privacy issues, extensive usability testing is needed to determine whether multimodal user interfaces, such as the library application, will become popular.
Novel applications. These applications are not possible with a traditional GUI. For example, project 18 illustrates an interesting "hands busy" application: the user asks for musical notes to be played while using both hands to tune a violin or other musical instrument. Project 19 illustrates how to speak the name of a tune to be played. Imagine speaking to your iPod to select tunes!
While most students were able to complete their projects within two weeks, they experienced several types of problems:
I spoke with an Opera representative who indicated that the Opera documentation will be improved, and W3C SSML and SRGS will be supported in the future. He also indicated that Opera is considering development tools.
IBM recently announced the Multimodal Tools Project for Eclipse [http://www.alphaworks.ibm.com/tech/mmtp], which avoids or overcomes the problems noted above. Many papers and articles are referenced at http://www-306.ibm.com/software/pervasive/multimodal/ and http://www-128.ibm.com/developerworks/. There is also an X+V Programmer's Reference Guide in the IBM Multimodal Tools Package. For debugging, the IBM browser provides an X+V debugger and a voice log window in which programmers can trace their application. The IBM version also supports the W3C Speech Synthesis Markup Language (SSML) and both versions of the W3C Speech Recognition Grammar Specification (SRGS).
If you are inspired by these examples and have a novel multimodal application in mind, I invite you to build a demo using X+V. See getting started.
Clever students are able to create interesting and novel applications as long as some training in VoiceXML, HTML, XPath, JavaScript, and X+V is provided. Usability testing is needed to (1) refine and improve the user interfaces of the initial prototypes and (2) determine whether the application has both wide appeal and use.
I challenge students to use X+V to create other new and novel multimodal applications. I'll post them along with the applications shown above if they (1) are error free, (2) conform to general moral standards (no pornography, please), and (3) represent a new use of multimodal technology.
Fall quarter students at Portland State University and Oregon Institute of Technology developed the following X+V applications:
Project number | Project Name | Author | URI | Comments | |
---|---|---|---|---|---|
21 | Repot an orchid | Dianna Carroll | Caroll/1_main.xml | A useful hands-busy application that uses prerecorded voice in place of speech synthesis | |
22 | Multimodal favorites | John Chee | Chee/favorites.xml | Speak the name of the web site you wish to visit. Question: Can you think of a way to keep the voice active after going to a site, so that the user can always choose one of the sites at any time by speaking? John's answer: I thought about this and I think the simplest way would be to have the website that the user chose inside a frame, although I don't know how Opera would like that. My thought was that a user using this application would likely be able to get used to the Opera built-in voice commands fairly quickly and use phrases such as "opera back," although that still leaves something to be desired. Users could also use the application as a landing page, so that every time they open a window (or tab) the application would automatically start; users could then create a new tab and close the one that navigated away from the application. | |
23 | World Guide | Chris Roberts | http://dev.modspox.com/cgi-bin/guide.rb | Provides speech access to Wikipedia content. Chris's comments: The VoiceXML engine that Opera uses is very particular about what it will let you do, and the fact that it doesn't support the 2.0 version of VoiceXML made it very hard to get things working. I finally broke down and split it into two pages and used a Ruby script to load the actual pages up. This way I can pre-insert the dynamic options that can be matched. I tried many things to allow a dynamic grammar, but everything I tried, from using JavaScript to physically change the field during runtime to messing around with the src to load the correct dynamic page, just crashed the browser. So, attached is the finished script. | |
24 | Online dictionary | Yasuko (Katharine) K Horiguchi | Kathy/onlineDictionary.xml | For the traveler, translate English phrases to Chinese | |
25 | Internet phone system | Hwa Young Lee | Lee/phone.xml | Multimodal user interface for a VoIP phone system | |
26 | Maze game | Michael A. Laskowski | mal/mazeGame/mazeGame.xhtml | Trains children to pronounce difficult-to-say sounds by repeating words containing the sound as part of a maze game. The user navigates the maze by voice, saying a word that lies in the path of the user icon (a kid on a tricycle). At each location in the maze, the words that can be said are outlined in yellow. The user can also say "help" to get the list of available words. Say "stop" to exit the game. | |
27 | Nagoya Scenic Spots | Michiko I Luther | michiko/proj3.xml | ||
28 | Online unit conversion | Sidy | Sidy/index.xml | Author's comments: The JavaScript code: <vxml:script><![CDATA[ function convertV(m) {return Math.floor((m - 32) / 1.8);} ]]></vxml:script> works fine. But Opera just won't let me call the convertV function for some reason. From all the research that I have done, it seems like coverage of the JavaScript language in Opera is kind of spotty. I guess this is how the Opera browser is kept so small and fast. One more thing: even though I used the <xv:sync xv:input="..." xv:field="..."> tags to synchronize between my VoiceXML form and my graphical (XHTML) form, the functionality that I got was not exactly what I expected. What happened is that the <xv:sync> tag did the right thing in triggering an event to stop the prompt (i.e., barge in) whenever I clicked on the input corresponding to a field or simply barged in. But the entered value was NOT reflected in the HTML form until I threw another event. Fortunately, I read in the W3C specification that since a VoiceXML form is contained within an XHTML form, elements of the XHTML document were accessible from within VoiceXML. This allowed me to update my XHTML field after collecting my voice input. | |
29 | Computer museum | Michael A. Turner | Turner/index.xhtml | Say the number of the link you want to visit. To refresh the page, say "refresh." To go to the previous page, say "back." To go to the next page in the number sequence, say "next." To go to the home page, say "home." To have the computer read the specs, say "read." | |
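
The conversion function quoted in the comments for project 28 can be tried outside the browser as plain JavaScript. The sketch below keeps the quoted `convertV` body and assumes, from the formula alone, that it converts degrees Fahrenheit to whole degrees Celsius. The second function, `convertRounded`, is a hypothetical variant that does not appear in the project; it is included only to show what `Math.floor` does to the result.

```javascript
// Body as quoted in the project 28 comments.
// Assumption: m is a temperature in degrees Fahrenheit.
function convertV(m) {
  return Math.floor((m - 32) / 1.8);
}

// Hypothetical variant (not in the original project): rounds to the
// nearest whole degree instead of always rounding down.
function convertRounded(m) {
  return Math.round((m - 32) / 1.8);
}

// Print both results for a few sample inputs.
for (const f of [212, 32, 30]) {
  console.log(f, convertV(f), convertRounded(f));
}
```

Because `Math.floor` always rounds toward negative infinity, an input of 30 gives -2 with `convertV` but -1 with the rounding variant. Whether that difference matters depends on how the converted value is presented to the user.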