March/April 2002
Technology Trends
By Jim A. Larson
Users can choose from among several devices to access the World Wide Web. These devices include (a) PCs with a visual Web browser, such as Netscape Navigator or Microsoft Internet Explorer, that interprets HTML files downloaded from a Web server and executes on the PC; (b) telephones and cell phones served by a voice browser that interprets VoiceXML files downloaded from a Web server and executes on a voice server; and (c) WAP phones with a visual browser that interprets WML files downloaded from a Web server and executes on the WAP phone.
Additional devices, expected to appear shortly, include PDAs with wireless connections to a server supporting both visual and audio interfaces; wearable devices, such as a display and microphone on a wristband with a speaker in the user’s ear, that support both visual and audio interfaces; and other devices, or combinations of devices, that support visual and audio user interfaces.
How can developers create a single application to support these various devices? By separating the application implementation from each user interface implementation. Figure 1 illustrates an architecture that supports multiple devices with differing modes of input/output.
Each user interface must satisfy different requirements. For example, a telephone user interface may require a system-directed dialog, in which the user responds to a series of questions, while a PC user interface may be user-directed, in which the user selects and initiates actions.
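To make the system-directed style concrete, here is a minimal VoiceXML 2.0-style sketch in which the browser asks the questions; the form name, built-in digits grammar, and server URL are illustrative assumptions, not taken from the article:

    <?xml version="1.0"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <form id="flightQuery">
        <!-- System-directed: the browser prompts, the caller answers -->
        <field name="flightNumber" type="digits">
          <prompt>Welcome to flight information. What is your flight number?</prompt>
        </field>
        <block>
          <!-- Hypothetical server page that returns the arrival information -->
          <submit next="http://example.com/arrivals.asp" namelist="flightNumber"/>
        </block>
      </form>
    </vxml>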
To isolate the application from each of the user interfaces, the application should expose a single data structure that every user interface can use. One convenient format is XML: data expressed in XML is translated to the format required by each user interface. For example, consider a flight query application in which the user requests the arrival time and gate for a specific flight. This data could be described using XML tags, as illustrated in Figure 2.
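Figure 2 is not reproduced here, but a minimal sketch of such markup, with illustrative element names and values, might be:

    <?xml version="1.0"?>
    <flight>
      <!-- All element names and values are illustrative -->
      <flightNumber>123</flightNumber>
      <arrivesFrom>Portland</arrivesFrom>
      <arrivalTime>3:40 PM</arrivalTime>
      <arrivalGate>C7</arrivalGate>
    </flight>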
The arrival information can be extracted, translated, and presented to the user as one or more controls. A control (sometimes called an interactor or widget) is a technique for presenting information to the user and/or soliciting information from the user. Different devices use different controls. For example, the PC application presents the four information items as an HTML table; the telephony application presents them as a VoiceXML verbal prompt, a verbal message created by a speech synthesizer or replayed from a voice file; the WAP application presents them as a card; and the multimodal application presents them as an animation with a voice prompt. (Figures 3-6)
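Since the figures are not reproduced here, minimal sketches of the first two controls, reusing the illustrative flight values from above, might look like this (these are assumptions, not the article's actual Figures 3 and 4):

    <!-- PC: the four items as an HTML table -->
    <table border="1">
      <tr><th>Flight</th><th>Arrives From</th><th>Arrival Time</th><th>Arrival Gate</th></tr>
      <tr><td>123</td><td>Portland</td><td>3:40 PM</td><td>C7</td></tr>
    </table>

    <?xml version="1.0"?>
    <!-- Telephone: the four items as a VoiceXML verbal prompt -->
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <form>
        <block>
          <prompt>Flight 123 from Portland arrives at 3:40 PM at gate C7.</prompt>
        </block>
      </form>
    </vxml>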
Controls are the basic components of a dialog. A dialog enables the user to perform a task by interacting with the application using a sequence of controls. Some dialogs present controls in parallel—such as the simultaneous display of multiple tables on a PC screen. Other dialogs present controls serially to the user—for example, a sequence of verbal questions and answers in a telephony user interface.
To design each of the user interfaces, the designer must perform the following four activities:
1. Extract the data to be presented via the user interface.
The designer determines that the flight number, departure airport, arrival time and arrival gate should be extracted from the application for presentation to the user.
2. Select controls appropriate for the user interface.
Each device supports controls that are specific to its user interface. Continuing with the example above, the designer selects a different presentation control for each of the four devices. The designer determines that a table is the appropriate control for presenting information on a PC screen. A verbal prompt is used to present the information verbally to a telephone user. A card is appropriate for presenting the information to the WAP telephone user. For the multimodal device, the designer creates an animation showing red dotted lines indicating the direction the user should follow, together with a verbal message presenting the flight number, departure airport, arrival time, and arrival gate.
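A minimal sketch of such a WML card, again with illustrative values, might be:

    <?xml version="1.0"?>
    <!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
      "http://www.wapforum.org/DTD/wml_1.1.xml">
    <wml>
      <!-- WAP: the four items on a single card -->
      <card id="arrival" title="Flight 123">
        <p>From: Portland<br/>
           Arrives: 3:40 PM<br/>
           Gate: C7</p>
      </card>
    </wml>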
3. Construct a dialog by combining controls.
For this example, the sequence of control presentation is elementary for the PC and WAP users: a single table is presented to the PC user, and a single card is presented to the WAP user. Verbal interfaces require that information be presented sequentially, so the designer presents the flight number, departure airport, arrival time, and arrival gate number in a single VoiceXML prompt. The multimodal user interface combines modes: the user watches a moving dotted line and sees a visual message in a box while hearing a verbal message.
4. Add media-specific “decorations” to the data to be presented via each user interface.
Each interface is unique in the decorations it presents to the user. Decorations—control-specific specifications such as font size and color of the HTML table, gender and pitch of the synthesized voice of the VoiceXML prompt, color of the animation of the multimodal control—make the presentation pleasant for the user, while emphasizing the extracted information.
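As a sketch of such decorations (the specific style values and voice attributes are illustrative), the HTML table might carry font and color styling, while a VoiceXML 2.0 prompt can use the SSML <voice> and <prosody> elements to select the gender and pitch of the synthesized voice:

    <!-- HTML: font, size, and color decorations on the table -->
    <table border="1" style="font-family: Arial; font-size: 14pt; color: navy">
      <tr><th>Flight</th><th>Arrival Gate</th></tr>
      <tr><td>123</td><td>C7</td></tr>
    </table>

    <!-- VoiceXML: gender and pitch of the synthesized voice -->
    <prompt>
      <voice gender="female">
        <prosody pitch="high">Flight 123 arrives at gate C7.</prosody>
      </voice>
    </prompt>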
Figure 7 summarizes the four activities for designing each of the four user interfaces.
Activities | PC | Telephone | WAP | MM
1. Extract | Flight number, arrives from, arrival time, arrival gate | same | same | same
2. Select controls | HTML table | VoiceXML prompts | WAP card | Multimodal animation and voice prompts
3. Construct the dialog | Display a single table | Present a verbal prompt | Display a card | Display a multimodal animation and voice prompt
4. Add decorations | Color, font size, font type, and position of the table on the screen | Voice, volume, and speaking rate | Card format | Graphics for the animation; voice, volume, and speaking rate for the prompt
Figure 7: Four steps for designing user interfaces for different devices
The extracted values are transferred from the application to each of the user interfaces, or a complete user interface is generated from the extracted values. Using current Web technology, it is possible to generate the user interface dynamically from XML information supplied by the application. For example, Extensible Stylesheet Language Transformations (XSLT) is a W3C language for specifying such transformations, and Active Server Pages (ASP) or another dynamic page-generation facility can be used to generate the user interfaces illustrated in Figures 3-6.
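As a rough sketch, an XSLT stylesheet that renders the illustrative <flight> markup from above as an HTML table might look like this; a second stylesheet could emit the VoiceXML prompt from the same data, which is precisely the isolation the architecture is after:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Transform the illustrative <flight> data into an HTML table -->
      <xsl:template match="/flight">
        <html>
          <body>
            <table border="1">
              <tr><th>Flight</th><th>Arrives From</th><th>Arrival Time</th><th>Arrival Gate</th></tr>
              <tr>
                <td><xsl:value-of select="flightNumber"/></td>
                <td><xsl:value-of select="arrivesFrom"/></td>
                <td><xsl:value-of select="arrivalTime"/></td>
                <td><xsl:value-of select="arrivalGate"/></td>
              </tr>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>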
With the careful separation of the user interface dialog and input/output controls from an application, developers can create multiple user interfaces for different devices, all of which give users access to the same application. Each user interface has its own dialog and controls. As new types of devices become available, designers need only create new controls and the corresponding transformation; the new device can then access the existing application without any modification to that application. In effect, the XML format isolates the application from the various user interfaces and also isolates the user interfaces from each other.
Dr. James A. Larson is chairman of the W3C Voice Browser Working Group. He is the author of "Developing Speech Applications Using VoiceXML" and teaches courses in user interfaces and speech applications at Portland State University and Oregon Health & Science University. He may be contacted at http://www.larson-tech.com.