September/October 2003

InkML and Speech

By Dr. James A. Larson

Have you ever been somewhere that made speaking to a VoiceXML application on a cell phone impractical? Not all locations or situations are suitable for using speech-enabled handheld devices, including:

• Noisy environments: you can't hear the application, and your cell phone picks up too much background noise;
• Areas with a lack of privacy: it may be quiet, but you may not want others to hear confidential information; and
• Meetings: talking on your cell phone during a meeting is a breach of etiquette.

In all of these situations, you can still send information using digital ink. Future Internet appliances will combine the functions of a personal digital assistant (PDA) and a cell phone, accepting both ink and voice input.

Digital Ink and InkML

Because most of us learned to write in school, pen input is familiar and easy to use. Digital ink is an input mode that captures the movement, angle and pressure of a stylus in electronic form. With digital ink, messages can be sent, stored and retrieved exactly as the user originally wrote them. Messages can be written in any language, or they can be drawn.

Until now, there has been no widely used method for storing, retrieving and interpreting ink. In July 2003, the World Wide Web Consortium's (W3C) Multimodal Interaction Working Group published a working draft of the Digital Ink Markup Language (InkML). InkML supports a comprehensive and accurate representation of digital ink. It will provide a way to extend current Web-based applications to accept pen input, and it will enable new applications that are not possible without pen input. To read the complete InkML working draft, visit www.w3.org/2002/mmi.
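To give a feel for what an ink representation looks like, here is a minimal Python sketch that serializes one pen stroke into InkML. It assumes the `<ink>`/`<trace>` syntax and the `http://www.w3.org/2003/InkML` namespace of the later W3C InkML Recommendation; the July 2003 working draft may differ in detail. The function name `stroke_to_inkml` is hypothetical, and only the default X and Y channels are written; recording pressure or angle would require declaring additional channels in a trace format, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the later W3C InkML Recommendation, not the 2003 draft.
INKML_NS = "http://www.w3.org/2003/InkML"


def stroke_to_inkml(points):
    """Hypothetical helper: serialize one pen stroke, given as (x, y) samples,
    into a minimal InkML document string containing a single <trace>."""
    # Emit InkML as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", INKML_NS)
    ink = ET.Element(f"{{{INKML_NS}}}ink")
    trace = ET.SubElement(ink, f"{{{INKML_NS}}}trace")
    # Within a trace, a point's coordinates are separated by spaces
    # and successive points are separated by commas.
    trace.text = ", ".join(f"{x} {y}" for x, y in points)
    return ET.tostring(ink, encoding="unicode")


if __name__ == "__main__":
    # Three sampled positions along a short downward pen stroke.
    print(stroke_to_inkml([(10, 0), (9, 14), (8, 28)]))
```

Run as a script, this prints `<ink xmlns="http://www.w3.org/2003/InkML"><trace>10 0, 9 14, 8 28</trace></ink>`. A real capture device would emit many more samples per stroke and one `<trace>` element per stroke.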

InkML Uses

Pen input is flexible, easy to use and convenient. In the future, people will use pen input for two-way communication (similar to instant messaging), in which two or more people hold "real time" conversations or "brainstorming" sessions on their portable Internet appliances. With ink, users will see words and drawings as they are written and drawn. In another example, digital photographs can be stored together with speech and ink annotations. In multimodal applications on portable Internet appliances, ink, speech and keyboards will serve as interchangeable input methods. Pen input will also complement speech, as in "put that (point to the item) there (point to the location)." In addition, people with sight or hearing disabilities will be able to use ink as a viable alternative to speech.

Scenario 1

John Smith is on his way to his next meeting. Using speech, he checks his portable Internet appliance for the meeting location (Figures 1a and 1b). Because John has had meetings all morning, he wants to check his e-mail and respond to high-priority messages (Figures 1c and 1d). As he enters the meeting room, John switches his portable Internet appliance to silent mode (Figure 1e) to conform with meeting etiquette. However, he has received an e-mail from his boss, Dan Harris, that must be answered, so John pens his reply using digital ink (Figure 1f).

Scenario 2

Janis Cooper is driving to work when she remembers that she needs to order a medication from the pharmacy. Using speech, Janis tells her portable Internet appliance to dial the Evergreen Pharmacy (Figure 2a). The pharmacy has a form that Janis must fill in and sign before the prescription can be ordered. Janis responds vocally to the prompts from the pharmacy's application (Figures 2b and 2c). To verify her identity and confirm that the information is correct, Janis must sign the form, so she parks her car and signs it using digital ink (Figure 2d). In this example the pharmacy's application prompted Janis for voice responses, but she could also have completed the entire form using digital ink.

Call to Action

By creating applications that accept several modes of input, such as ink, speech and keyboard, developers give consumers maximum flexibility and ease of use. InkML is an exciting addition to multimodal applications. So, what should you do? Review the InkML working draft. Send your comments and suggestions to the W3C Multimodal Interaction Working Group (www-multimodal@w3.org). And, to have the greatest influence on this proposed standard, join the Working Group and help establish the standard for integrating pen input into multimodal applications.


Dr. James A. Larson is Manager of Advanced Human Input/Output at Intel Corporation and author of the book VoiceXML: Introduction to Developing Speech Applications. He can be reached at jim@larson-tech.com, and his Web site is http://www.larson-tech.com.