Consistency and Symmetry—and Other Guiding Principles for Designing Speech Applications

Designing dialogs for verbal user-computer interactions is still an art as much as it is a science. Dramatically improve the usability of your speech interfaces by following some simple guiding principles.

Designing dialogs for verbal user-computer interactions is still an art as much as it is a science. Speech application developers test applications iteratively, refining dialogs to minimize communication breakdowns, shorten the time callers need to complete tasks, and make the conversation more natural.

Although experience to date is limited, some guiding principles for designing dialogs have become apparent. These principles are used to derive human factors guidelines for verbal user interfaces. Whereas guidelines often serve as a checklist of what to do and what not to do in a verbal user interface, guiding principles motivate and encourage good dialog design.

Two early guiding principles have emerged: consistency and symmetry. Presentation consistency means that the prompts presented to the caller share a similar structure and format. Response consistency means that the caller phrases responses to the prompts in similar words. Symmetry is a type of consistency between prompts and responses, in which callers mimic or parrot the structure, format, and words they hear in the prompt.

Presentation Consistency

Unlike creative writing, in which consistency can sometimes be boring, presentation consistency enables callers to predict what they will hear, so they are ready to respond appropriately. More experienced callers can skip much of the prompt by barging in, which accelerates the dialog. For example, if every menu prompt consists of the menu name, then a question, then the list of options, callers at each level of experience know when to respond:

Color.

(This is the name of the menu. Expert callers can barge in here because they know the question and the allowable options.)

Say the color you want.

(An average caller can listen to the question and barge in if they know the answer.)

Green, red, or blue?

(A novice caller listens to all of the menu options before responding.)
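A simple way to keep this three-part structure consistent across an application is to build every menu prompt with the same function. The Python sketch below is a hypothetical illustration, not a VoiceXML feature or vendor API: build_menu_prompt and join_options are names invented here, and returning the prompt as a list of segments simply mirrors the three points at which expert, average, and novice callers respond.

```python
from typing import List


def join_options(options: List[str]) -> str:
    """Join menu options as a spoken list: "A?", "A or B?", "A, B, or C?"."""
    if not options:
        raise ValueError("a menu needs at least one option")
    if len(options) == 1:
        text = options[0]
    elif len(options) == 2:
        text = f"{options[0]} or {options[1]}"
    else:
        text = ", ".join(options[:-1]) + f", or {options[-1]}"
    # Capitalize only the first letter so the options read as a sentence.
    return text[0].upper() + text[1:] + "?"


def build_menu_prompt(menu_name: str, question: str, options: List[str]) -> List[str]:
    """Hypothetical helper: every menu prompt is name, then question, then options."""
    return [f"{menu_name}.", question, join_options(options)]


segments = build_menu_prompt("Color", "Say the color you want.", ["green", "red", "blue"])
print(" ".join(segments))  # Color. Say the color you want. Green, red, or blue?
```

Because every menu passes through one function, a change to the format applies to all menus at once, which keeps the persona consistent.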

Other types of presentation consistency include consistent formats for help and error messages, formats for items in a form, validation of the caller's input, and the use of audio icons to indicate that "it is the caller's turn to speak," "the computer is working," and "this is a hyperlink that can be invoked by speaking its name."

If the style and format of dialogs are consistent, then an application's persona (the personality of the application) will emerge as consistent and helpful rather than random and confused.

Response Consistency

Callers can complete their tasks more quickly if they can respond consistently. Responding becomes more automatic when callers can use the same words and phrases in similar situations. Here is one way to increase response consistency:

If applications reuse standard dialog modules, callers learn how to interact with each module the first time they encounter it and apply that knowledge whenever they encounter it again. For example, speaking a credit card number in four chunks of four digits, or a North American telephone number in chunks of three, three, and four digits, accelerates the dialog and makes the experience more natural for the caller. Most VoiceXML vendors offer collections of dialog modules that, when reused, shorten not only the callers' dialogs but also the time needed to implement the application.
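A reusable module for these numbers might rely on a single chunking routine, used both to check the caller's input and to read it back in the same groups the caller is expected to use. The sketch below assumes the recognizer has already returned a plain digit string; chunk_digits, readback, and the pattern constants are hypothetical names, not part of any vendor's module library.

```python
from typing import List, Sequence

# Hypothetical chunk patterns taken from the examples in the text.
CREDIT_CARD_PATTERN = (4, 4, 4, 4)
NANP_PHONE_PATTERN = (3, 3, 4)


def chunk_digits(digits: str, pattern: Sequence[int]) -> List[str]:
    """Split a digit string into consecutive chunks of the sizes in pattern."""
    if not all(ch in "0123456789" for ch in digits):
        raise ValueError("expected ASCII digits only")
    if len(digits) != sum(pattern):
        raise ValueError(f"expected {sum(pattern)} digits, got {len(digits)}")
    chunks, start = [], 0
    for size in pattern:
        chunks.append(digits[start:start + size])
        start += size
    return chunks


def readback(digits: str, pattern: Sequence[int]) -> str:
    """Text for a confirmation prompt: digits spoken singly, commas marking pauses between chunks."""
    return ", ".join(" ".join(chunk) for chunk in chunk_digits(digits, pattern))


print(chunk_digits("6175550123", NANP_PHONE_PATTERN))  # ['617', '555', '0123']
print(readback("6175550123", NANP_PHONE_PATTERN))      # 6 1 7, 5 5 5, 0 1 2 3
```

Reading the number back in the same chunks the caller used reinforces the pattern each time the module is reused.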

Symmetry

The principle of symmetry suggests that callers respond to prompts in the same style and wording used by the prompt. If the prompt is lengthy, the caller's responses tend to be lengthy. If the prompt is vague, the caller tends to respond vaguely. To encourage callers to say the words and phrases covered by a grammar, make the prompt short, precise, and instructive. To paraphrase the age-old Golden Rule: "Do unto the caller as you would have the caller do unto you."

Designers specify two key items for each turn of an application-directed dialog:

  • Specify a grammar containing the words and phrases that minimize the errors made by the automatic speech recognizer.

  • Formulate the prompts to encourage the caller to speak the words and phrases in the grammar.
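The two items can be kept together by deriving the prompt from the grammar, so the prompt never mentions a word the grammar lacks. In a real VoiceXML application the grammar would typically be an SRGS grammar interpreted by the speech recognizer; the Python class below is only a hypothetical stand-in that treats the grammar as a tuple of single words and matches typed text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DialogTurn:
    """Hypothetical model of one application-directed turn: a question plus its grammar."""
    question: str
    grammar: Tuple[str, ...]  # lowercase words the recognizer accepts, in prompt order

    def prompt(self) -> str:
        # Enumerate every grammar word so the caller can parrot one of them.
        words = list(self.grammar)
        if not words:
            raise ValueError("grammar must contain at least one word")
        if len(words) == 1:
            options = words[0]
        elif len(words) == 2:
            options = f"{words[0]} or {words[1]}"
        else:
            options = ", ".join(words[:-1]) + f", or {words[-1]}"
        return f"{self.question} {options[0].upper() + options[1:]}?"

    def accepts(self, utterance: str) -> Optional[str]:
        # Toy stand-in for recognition: normalize, then check grammar membership.
        word = utterance.strip().rstrip(".!?").lower()
        return word if word in self.grammar else None


color = DialogTurn("What color?", ("green", "gray"))
print(color.prompt())                # What color? Green or gray?
print(color.accepts("Green."))       # green
print(color.accepts("Chartreuse."))  # None
```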

After specifying the grammar, apply the principle of symmetry by encouraging the caller to parrot words, phrases, and patterns used within the prompt.

  • Prompt the caller with a menu of words from the grammar. The caller can respond by selecting one of the words from the menu rather than guessing which words are in the grammar. For example, suppose the grammar contains two words: "green" and "gray." Avoid using the following prompt:

    Prompt: What color?

    Response: Chartreuse.

    Instead, enumerate the key words from the grammar:

    Prompt: What color? Green or gray?

    Response: Green.

    Selecting from a verbal menu is cognitively easier for the caller than recalling a word from memory. Verbal menus also reduce speech-recognition mismatches, which occur when callers say words that are not in the current grammar.

  • Prompt the caller with a pattern that the caller can mimic. Consider the following prompt to confirm values previously spoken by the caller:

    Prompt: Do you want to fly from Boston to Chicago on Thursday?

    Response: No, I want to fly from Austin to Chicago on Thursday.

    Pattern words help the application assign spoken values to fields. In the example above, the spoken word "Austin" is assigned to the departure_city field because it follows the pattern word "from," whereas "Chicago" is assigned to the arrival_city field because it follows the pattern word "to."

  • Prompt the caller with a short question. This encourages the caller to give a short response. For example, avoid using the following:

    Prompt: Please say your first name.

    Response: My first name is Jim.

    Instead, use the following:

    Prompt: Your first name?

    Response: Jim.

    Short responses are easier for the speech-recognition engine to recognize than longer responses from which it must pick out keywords. A short prompt may also avoid raising false expectations about what the speech-recognition engine can recognize.
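The pattern-word technique from the flight example can be approximated with regular expressions. In VoiceXML, this assignment is normally done by semantic interpretation tags inside the grammar (the W3C Semantic Interpretation for Speech Recognition specification); the Python sketch below instead scans recognized text, and its field names, city list, and day list are hypothetical values chosen to match the example.

```python
import re
from typing import Dict, Tuple

CITIES = ("Austin", "Boston", "Chicago")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Hypothetical mapping: field name -> (pattern word, allowed values).
FIELD_PATTERNS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "departure_city": ("from", CITIES),
    "arrival_city": ("to", CITIES),
    "travel_day": ("on", DAYS),
}


def fill_fields(utterance: str) -> Dict[str, str]:
    """Assign a value to a field when it directly follows that field's pattern word."""
    fields = {}
    for field, (word, values) in FIELD_PATTERNS.items():
        alternatives = "|".join(re.escape(v) for v in values)
        match = re.search(rf"\b{word}\s+({alternatives})\b", utterance, re.IGNORECASE)
        if match:
            fields[field] = match.group(1)
    return fields


# Values the application read back in its confirmation prompt.
confirmed = {"departure_city": "Boston", "arrival_city": "Chicago", "travel_day": "Thursday"}
confirmed.update(fill_fields("No, I want to fly from Austin to Chicago on Thursday."))
print(confirmed["departure_city"])  # Austin
```

Because "to fly" does not match any city, the search moves on and finds "to Chicago"; a production grammar would resolve such ambiguities far more robustly than this sketch.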

Other Guiding Principles

Consistency and symmetry are two guiding principles for designing verbal user interfaces. Other guiding principles include the following:

  1. Callers learn by doing. Psychologists tell us that one of the best ways to learn how to do something is to do it. This "sink-or-swim" approach is especially effective when callers are assured that if they start to sink, they will receive help to get back on track. Do not present long tutorials at the beginning of a telephony application. Callers remember little of the specific instructions and resent the time required to listen to the tutorial. Instead, encourage callers to explore. Dialog designers encourage callers to explore the speech application in at least two ways:

    • Prompts suggest words that the caller may say to explore new portions of the application. This encourages the caller to jump into "unknown waters." For example, the following prompt tells the caller which household devices can be controlled by telephone:

      "What action? Thermostat, lights, security, kitchen appliances, or alarm clock?"

    • Specify event handlers to help the caller when the speech-recognition system fails to understand what the caller says. This pulls the "sinking" caller back to the surface.

  2. Establish specific performance criteria. Developers need to conduct usability tests and measure performance criteria, such as the time callers take to complete tasks and the number of communication breakdowns, to determine whether a change to an application has an overall beneficial or harmful effect. It is easy to be misled by a change that solves a specific problem yet harms the application overall.

  3. Leverage the caller's knowledge. A caller familiar with the application domain should find the application easy to learn and natural to use. This occurs when the caller's conceptual model of the domain (the caller's real-life understanding of the domain's objects, relationships, commands, and constraints) matches the application's conceptual model. Talk with several prospective callers before building the application, and use the terms, phrases, and names they use most often.

  4. Optimize the frequent case. Make it easy to do simple things and possible to do complex things. Airport security works this way: fliers with no suspicious objects go to an express line, whereas fliers with suspicious objects are sent to a "trouble" line. Frequent fliers quickly learn how to avoid the trouble line so they can get through the security check faster. Similarly, organize the call flow so callers can perform simple, frequent tasks quickly, and route them to specific paths in the call flow for infrequent and difficult tasks.

    NOTE

    For additional suggestions on how to improve the quality and experience of using telephony applications, see the author's book, VoiceXML: An Introduction to Building Voice Applications (Prentice Hall, 2002, ISBN: 0130092622).
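The event handlers mentioned under the first principle are often written to escalate: each consecutive recognition failure produces a more helpful message, pulling the "sinking" caller back to the surface. VoiceXML supports this with the count attribute on its nomatch and noinput catch elements; the Python class below is a hypothetical sketch of the same idea, and its messages are invented for the home-control prompt shown earlier.

```python
from typing import List, Sequence


class EscalatingHelp:
    """Hypothetical nomatch handler whose messages grow more detailed with each failure."""

    def __init__(self, messages: Sequence[str]):
        if not messages:
            raise ValueError("at least one help message is required")
        self.messages: List[str] = list(messages)
        self.failures = 0

    def on_nomatch(self) -> str:
        # Count consecutive failures; repeat the last message once they run out.
        self.failures += 1
        return self.messages[min(self.failures, len(self.messages)) - 1]

    def on_match(self) -> None:
        # A successful recognition resets the escalation.
        self.failures = 0


help_messages = EscalatingHelp([
    "Sorry, I didn't understand. What action?",
    "Please say thermostat, lights, security, kitchen appliances, or alarm clock.",
    "You can control devices in your home by saying a device name, such as lights. What action?",
])
print(help_messages.on_nomatch())  # Sorry, I didn't understand. What action?
print(help_messages.on_nomatch())  # Please say thermostat, lights, security, kitchen appliances, or alarm clock.
```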