Twenty Multimodal Projects Using X+V on the Opera Browser

James A. Larson

Portland State University
Oregon Health and Sciences University
Oregon Institute of Technology

 

Student projects

Opera, with its new voice feature, enables developers to create and deploy multimodal applications. To try the multimodal applications below, you will need version 8.0 of the Opera browser. See the Voice Quick Start-up Guide: How to talk with your browser.

Opera can support three types of user interfaces:

Dialog Type | URI | Comments
Verbal only | jim/paymentVerbal.xml | VoiceXML code embedded into HTML so Opera browser users can hear the verbal-only dialog
GUI only | jim/paymentGUI.xml | A traditional HTML application
Multimodal | jim/paymentMM.xml | VoiceXML code embedded into HTML code using X+V

During the last two weeks of the school term, students in the Oregon Health and Sciences University course CSE 564 and the Portland State University course CS 410/510 created several XHTML plus Voice (X+V) applications. Students had a working knowledge of XHTML and VoiceXML but no previous knowledge of X+V. I presented a one-hour overview of X+V syntax and two sample applications to students in both courses. Teams of two students were then formed to design and implement a multimodal application within two weeks. The table below contains some of the student projects. If you install the Opera browser and the voice plug-in, you can try these multimodal applications. By choosing "source" from the "view" pull-down menu, you can examine the source code to see how students implemented the multimodal user interface.

Project number | Project Name | Author | URI | Comments
1 | Kid's holiday craft projects | Ashley Irving | ashley/home.xml | Choose the project by saying its number. Page through the instructions by saying "next." Do you think that the content should be spoken in addition to being displayed on the screen?
2 | Making a peanut butter sandwich | Emerson Murphy-Hill | No longer available | Page through the instructions by saying "next," "back," or "read."
3 | Buying a car | Glenn Diviney | glenn/index.xml |
4 | Origami | Khanh Duong | No longer available |
5 | Collecting health data | Sunil Lahudia | sunil/healthData.xml |
6 | Buy movie tickets | Oindrila Mukherjee | qindrila/movieSel_test.xml | Use the name "James Bond" on the second screen to reserve tickets.
7 | National park tours | Medha Nirguide | medha/Main.xml |
8 | Restaurant menu | Quang Nguyen | quang/project3.xml |
9 | Banking | Rajeshwari Patil | No longer available | Account = "10"; Passcode = "hello". Amounts must be in increments of 100.
10 | Library | Frank Adrian and Ken Anderson | No longer available |
11 | Quick finder | Driss Takir | driss/voice.xml |
12 | Order computer | Ashwini Kulkarni | ashwini/login.xml | Use "david" for the user name and "capital" for the password.
13 | Personal travel pictures | Chris Holm | chris/index.vxml | Say "next" or "previous" to view the cities.
14 | Animal shelter | David Graves and Dina Suehiro | dina/index.xml |
15 | Cyber reader | Tom Feliz | tom/index.xml | Hear how much better a recording is than TTS.
16 | Picture album | Ricky Cancro | No longer available | PHP to generate XML code for a picture album.
17 | Flash cards (addition) | | jim/flashCardAddition.xml |
18 | Tune your violin | Based on a SALT program developed by Deborah Dahl, chair of the W3C Multimodal Interaction Working Group | dahl/tune.xml | Both hands are busy as a violin player requests to hear tones so the violin player can tune the violin.
19 | Music world | Doan Ng | doan/mainpage.xml | This application simulates a hardware device with a push-to-speak button. You must click the "push to speak" soft button as you press the (default) ScrLk button on your computer.
20 | The game of "go" | Sean Pearson | http://web.pdx.edu/~seanp/school/cs410/capturego.cgi |

I have grouped the student projects into the following categories.

Hands-busy instruction. This category of project provides incremental instruction while users manipulate real-world artifacts with their hands. It includes applications in which the user diagnoses a problem (determining what is wrong with a car's motor), repairs a device (fixing a leaky faucet), or constructs artifacts (project 1: a talking holiday project book for children; project 2: a talking recipe book; and project 4: creating a complex origami artifact). Applications in this category could be used to diagnose or repair products without calling a live-help agent. The application may be available by connecting to an application server, or it may be supplied on a CD-ROM packaged with the product. Project 17 (not necessarily a hands-busy instruction application) illustrates the multimodal equivalent of "flash cards." Some children use flash cards to learn their addition tables. Repetitive drills, such as flash cards, can be used not only for math skills but also for language training, spelling, and other rote memory training.

Entertainment. This category includes audio poems, stories (project 3: a child's fairy tale book), music (an audio-controlled juke box), and games. Gaming enthusiasts can use voice as a "third hand" while manipulating controls with both real hands.

Data collection. These applications are primarily verbal forms. For each electronic form slot or menu, the user speaks an answer to a question that is presented verbally, visually, or both. As a fall-back, voice users may also enter information directly into a GUI-based form. Projects falling into this category include:

These applications can be very useful if your hands are busy, or if these applications are implemented on a handheld device. Usability testing is needed to determine if voice really benefits these applications when used on a desktop PC in an office environment.
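
The slot-by-slot dialog described above can be sketched in plain JavaScript. This is only an illustration of the idea, not code from any student project: createForm, fill, nextEmptySlot, and isComplete are hypothetical names, and the sample slot values are made up. Each slot accepts an answer from either modality, so typing into the GUI works as the fall-back for speech.

```javascript
// Hypothetical form model: each slot can be filled by speech or by typing.
function createForm(slotNames) {
  // A Map keeps the slots in the order in which the dialog should ask for them.
  const values = new Map(slotNames.map((name) => [name, null]));
  return {
    // Record an answer from either modality; the latest answer wins.
    fill(name, value, modality) {
      if (!values.has(name)) {
        throw new Error(`unknown slot: ${name}`);
      }
      values.set(name, { value, modality });
    },
    // The next slot the dialog should prompt for, or null when all are filled.
    nextEmptySlot() {
      for (const [name, entry] of values) {
        if (entry === null) return name;
      }
      return null;
    },
    isComplete() {
      return this.nextEmptySlot() === null;
    },
  };
}

const form = createForm(["account", "amount"]);
form.fill("account", "10", "voice"); // spoken answer
form.fill("amount", "200", "gui");   // typed into the GUI as a fall-back
```

In a real X+V page the browser's VoiceXML form interpreter does this bookkeeping; the sketch only shows why the same slot can be served by both modalities.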

Photo album tour. Another popular class of applications is a photo album tour. Some projects consist of an ordered sequence of pictures that the user navigates by saying "next," "previous," or "home" (project 13: personal travel pictures). More complicated projects support a hierarchical structure in which users move up or down a hierarchy of pictures and navigate within sequential sets of pictures at the leaves of the hierarchy (project 7: national parks tour). Project 19 (music world) replaces the photos with audio clips, which enables users to create their own playlist of downloaded tunes. These are simple applications to construct. One student wrote a PHP program to generate a photo album tour (project 16: picture album). Almost anyone can use such a generation tool to create their own personal photo album tour. Just as enabling users to author text, spreadsheets, presentations, and e-mail made GUI-based office applications popular, enabling users to be authors may be the key to popularizing multimodal applications.
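
The flat "next," "previous," "home" navigation described above amounts to moving an index over an ordered list. The sketch below shows that logic in plain JavaScript under stated assumptions: createTour and handleCommand are hypothetical names, the city list is invented, and the hierarchical variant is not covered. In an X+V page the recognized word would come from the voice form rather than from a direct function call.

```javascript
// Hypothetical sketch: a flat photo tour driven by recognized voice commands.
function createTour(pictures) {
  if (pictures.length === 0) {
    throw new Error("a tour needs at least one picture");
  }
  let index = 0; // position of the picture currently shown

  return {
    current() {
      return pictures[index];
    },
    // Apply one recognized word; unknown words leave the position unchanged.
    handleCommand(word) {
      switch (word) {
        case "next":
          index = Math.min(index + 1, pictures.length - 1); // stop at the last picture
          break;
        case "previous":
          index = Math.max(index - 1, 0); // stop at the first picture
          break;
        case "home":
          index = 0;
          break;
      }
      return pictures[index];
    },
  };
}

const tour = createTour(["Portland", "Seattle", "Vancouver"]);
tour.handleCommand("next"); // moves to "Seattle"
tour.handleCommand("next"); // moves to "Vancouver"
tour.handleCommand("next"); // already at the end, stays on "Vancouver"
tour.handleCommand("home"); // back to "Portland"
```

Clamping at both ends is one design choice; a tour could just as well wrap around from the last picture to the first.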

Multimodal interface to a traditional Web page. Project 10 (library application) is an example of a GUI application that also supports verbal input. The user may switch between the traditional GUI and the multimodal user interface at any time. Because of factors such as users' typing speed, background noise, and privacy concerns, extensive usability testing is needed to determine whether multimodal user interfaces, such as the library application, will become popular.

Novel applications. These applications are not possible with a traditional GUI. For example, project 18 illustrates an interesting "hands busy" application: the user asks for musical notes to be played while using both hands to tune a violin or other musical instrument. Project 19 illustrates how to speak the name of a tune to be played. Imagine speaking to your iPod to select tunes!

Students' experience with Opera's implementation of X+V

While most students were able to complete their projects within two weeks, they experienced several types of problems:

I spoke with an Opera representative who indicated that the Opera documentation will be improved, and W3C SSML and SRGS will be supported in the future. He also indicated that Opera is considering development tools.

IBM recently announced the Multimodal Tools Project for Eclipse [http://www.alphaworks.ibm.com/tech/mmtp], which avoids or overcomes the problems above. Many papers and articles are referenced at http://www-306.ibm.com/software/pervasive/multimodal/ and http://www-128.ibm.com/developerworks/. There is also an X+V Programmer's Reference Guide in the IBM Multimodal Tools Package. For debugging, the IBM browser has an X+V debugger and a voice log window where programmers can trace their application. The IBM version also supports the W3C Speech Synthesis Markup Language (SSML) and both versions of the W3C Speech Recognition Grammar Specification (SRGS).

If you are inspired by these examples and have a novel multimodal application in mind, I invite you to build a demo using X+V. See getting started.

Conclusion

Clever students are able to create interesting and novel applications as long as some training in VoiceXML, HTML, XPath, JavaScript, and X+V is provided. Usability testing is needed to (1) refine and improve the user interfaces of the initial prototypes and (2) determine whether the application has both wide appeal and use.

I challenge students to use X+V to create other new and novel multimodal applications. I'll post them along with the applications shown above if they (1) are error free, (2) conform to general moral standards (no pornography, please), and (3) represent a new use of multimodal technology.

Update

Fall quarter students at Portland State University and Oregon Institute of Technology developed the following X+V applications:

Project number | Project Name | Author | URI | Comments
21 | Repot an orchid | Dianna Carroll | Caroll/1_main.xml | A useful hands-busy application that uses prerecorded voice in place of speech synthesis
22 | Multimodal favorites | John Chee | Chee/favorites.xml | Speak the name of the web site you wish to visit. Question: Can you think of a way to keep the voice active after going to a site, so that the user can always choose one of the sites at any time by speaking? John's answer: I thought about this, and I think the simplest way would be to put the website that the user chose inside a frame, although I don't know how Opera would like that. My thought was that a user of this application would likely get used to Opera's built-in voice commands fairly quickly and use phrases such as "opera back," although that still leaves something to be desired. Users could also use the application as a landing page, so that the application starts automatically every time they open a window (or tab); they could then create a new tab and close the one that navigated away from the application.
23 | World Guide | Chris Roberts | http://dev.modspox.com/cgi-bin/guide.rb | Provides speech access to Wikipedia content. Chris's comments: The VoiceXML engine that Opera uses is very particular about what it will let you do, and the fact that it doesn't support version 2.0 of VoiceXML made it very hard to get things working. I finally broke down and split it into two pages and used a Ruby script to load the actual pages. This way I can pre-insert the dynamic options that can be matched. I tried many things to allow a dynamic grammar, but everything I tried, from using JavaScript to change the field at runtime to manipulating the src to load the correct dynamic page, just crashed the browser. So, attached is the finished script.
24 | Online dictionary | Yasuko (Katharine) K Horiguchi | Kathy/onlineDictionary.xml | For the traveler, translate English phrases to Chinese
25 | Internet phone system | Hwa Young Lee | Lee/phone.xml | Multimodal user interface for a VoIP phone system
26 | Buy movie tickets | Michael A. Laskowski | mal/mazeGame/mazeGame.xhtml | Trains children to pronounce difficult-to-say sounds by repeating words containing the sound as part of a maze game. You use your voice to navigate the maze by saying a word that is in the path of the user icon (a kid on a tricycle). At each location in the maze, the words that can be said are outlined in yellow. Say "stop" to exit the game. Say "end" to navigate to the end of the maze (you cannot say "end" until it is one of the available words). Say "help" to get the list of available words.
27 | Nagoya Scenic Spots | Michiko I Luther | michiko/proj3.xml |
28 | Online unit conversion | Sidy | Sidy/index.xml | Author's comments: The JavaScript code: <vxml:script><![CDATA[ function convertV(m) {return Math.floor((m - 32) / 1.8);} ]]></vxml:script> works fine. But Opera just won't let me call the convertV function for some reason. From all the research that I have done, it seems that Opera's coverage of the JavaScript language is kind of spotty. I guess this is how the Opera browser is kept so small and fast. One more thing: even though I used the <xv:sync xv:input="..." xv:field="..."> tags to synchronize my VoiceXML form and my graphical (XHTML) form, the functionality that I got was not exactly what I expected. The <xv:sync> tag did the right thing in triggering an event to stop the prompt (i.e., barge in) whenever I clicked on the input corresponding to a field or simply barged in. But the entered value was NOT reflected in the HTML form until I threw another event. Fortunately, I read in the W3C specification that since a VoiceXML form is contained within an XHTML form, elements of the XHTML document are accessible from within VoiceXML. This allowed me to update my XHTML field after collecting my voice input.
29 | Computer museum | Michael A. Turner | Turner/index.xhtml | Say the # of the link you want to visit; say "refresh" to refresh the page; say "back" to go to the previous page; say "next" to go to the next page in the # sequence; say "home" to go to the home page; say "read" to have the computer read the specs.
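
The convertV arithmetic quoted in the project 28 comments can be checked outside the browser. The sketch below is plain JavaScript and says nothing about why Opera refused to call the function. It assumes, from the formula, that the input is a temperature in degrees Fahrenheit and the result is whole degrees Celsius. convertVFiveNinths is a hypothetical variant, not part of the student's project, that multiplies by 5 and divides by 9 instead of dividing by 1.8.

```javascript
// Same arithmetic as the convertV function quoted in the project 28 comments.
function convertV(m) {
  // Math.floor rounds toward negative infinity, so -17.78 becomes -18.
  return Math.floor((m - 32) / 1.8);
}

// Hypothetical variant: integer inputs stay exact until the final division by 9.
function convertVFiveNinths(m) {
  return Math.floor(((m - 32) * 5) / 9);
}

console.log(convertV(32));  // 0
console.log(convertV(100)); // 37
console.log(convertV(0));   // -18
console.log(convertVFiveNinths(212)); // 100
```

Because the result is floored rather than rounded, temperatures below freezing are pushed to the next colder whole degree. Whether that is the intended behavior is a design decision for the application.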