PSU CS 410/510

CS 410/510 SLA-Spoken Language Applications
Fall Quarter 2006

Saturdays 9:00-12:30
Room 103 Engineering Building
September 30-December 9, 2006
(updated 25 May, 2006)

Instructor
Jim Larson
jim@larson-tech.com
(503) 645-3598

Student Contest
Submit Project 2 or Project 3 to the AVIOS Student Voice Application Programming Contest and possibly win up to $2000.

Course motivation
The Internet is changing the very fabric of society and business. It enables people to communicate with each other, to access nearly unlimited information, and to do business with one another. Voice is the bridge connecting the Internet and the telephone network. With the widespread availability of telephones and the explosion in the number of cell phones, anyone can access the Internet from home, at work, away from their office, or on the road by speaking and listening. For these users, the phone IS the Internet.

VoiceXML is affecting the speech industry dramatically by changing how developers create speech-enabled Internet applications. Because VoiceXML hides many low-level details, developers create speech-enabled applications by specifying high-level prompt messages for menus and forms rather than writing detailed, procedural code in a programming language. The reduced programming time and effort lets developers perform additional iterations of usability testing and design refinement. VoiceXML is lowering the entry barrier to creating speech applications.

While VoiceXML makes it easy to create a speech-enabled application, it is difficult to create a good one. An HTML programmer can easily learn how to write VoiceXML scripts, but designing a usable VoiceXML form or menu is still more an art than a science. VoiceXML language manuals offer little advice on how to phrase a prompt, what to include in the grammar describing what a user may say in response to a prompt, and what to do if the user does not respond appropriately. This course answers fundamental speech user interface questions, including:

How to involve users in every stage of the design and implementation of speech-enabled applications
How to enable the computer to listen to the user's speech by writing speech grammars that guide a speech recognizer
How to enable the computer to speak to users by preparing textual prompts that are converted to speech by a speech synthesizer or prerecorded by a professional voice actor
How to write error handlers that deal with events such as no response from the user, unrecognizable words, and requests for help
How to choose the appropriate speech dialog style and implement the style using VoiceXML
How to create new speech applications by reusing pieces of existing speech applications
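
The prompt, grammar, and error-handling questions above can be illustrated with a small simulation. This is only a sketch in Python, not VoiceXML: the function name run_field, the wording of the messages, the three-attempt limit, and the use of a plain set of phrases as the "grammar" are all assumptions made for illustration. A real voice platform would use a speech recognizer and a VoiceXML interpreter instead.

```python
def run_field(prompt, grammar, responses, max_attempts=3):
    """Simulate filling one dialog field.

    prompt    -- text the computer speaks to the user
    grammar   -- set of lowercase phrases the user is allowed to say
    responses -- simulated user utterances; None stands for silence
    Returns (value, transcript), where value is None if the field
    was not filled within max_attempts (a hypothetical limit).
    """
    transcript = []
    answers = iter(responses)
    for _ in range(max_attempts):
        transcript.append(prompt)
        heard = next(answers, None)
        if heard is None:
            # No response by the user.
            transcript.append("Sorry, I did not hear you.")
            continue
        utterance = heard.strip().lower()
        if utterance == "help":
            # The user asked for help: list what may be said.
            options = ", ".join(sorted(grammar))
            transcript.append("Please say one of: " + options + ".")
            continue
        if utterance in grammar:
            # The utterance matches the grammar: the field is filled.
            return utterance, transcript
        # The utterance is not in the grammar.
        transcript.append("Sorry, I did not understand.")
    return None, transcript


if __name__ == "__main__":
    sizes = {"small", "medium", "large"}
    value, spoken = run_field("What size drink?", sizes, [None, "huge", "Large"])
    for line in spoken:
        print("Computer:", line)
    print("Recognized value:", value)
```

Even in this toy form, the design questions the course addresses are visible: how the prompt is worded, which phrases the grammar accepts, and what the computer says after silence, an unrecognized word, or a request for help.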

VoiceXML makes iterative design and testing of speech-enabled applications possible. Developers can quickly mock up designs for evaluation by prospective users, then identify and fix trouble spots. VoiceXML hides the complex programming details and enables the developer to concentrate on developing the overall design and refining the detailed wording of prompts and messages spoken to the user. VoiceXML does NOT displace the need for user testing; it makes it possible to perform more user testing.

Designing voice user interfaces is still an art. This course presents numerous guidelines, suggestions, and conventional wisdom, but voice dialog designers learn more each day, so the guidelines are evolving quickly. The VoiceXML language itself is also evolving: the W3C Voice Browser Working Group meets at least three hours each week to discuss and modify VoiceXML and its related languages. Because this course uses an early version of VoiceXML 2.0, some of the examples may be out of date by the time you read them. For the latest language specifications of VoiceXML and its related languages, see http://www.w3.org/voice

With the introduction of mobile devices that integrate the functionality of cell phones and PDAs, multimodal applications that provide both visual and verbal user interfaces will be popular. "X+V", which speech-enables (X)HTML pages, provides one approach for implementing this important and exciting new class of applications.

Course Goal
Prepare students to design, construct, and evaluate spoken language applications.

Course Content
This course will consist of a combination of four activities: lectures, projects, evaluations, and tests/quizzes.

Lectures: Summarize the state-of-the-art practices in constructing spoken language applications.
Projects: Design and implement three spoken-language applications, including a voice-driven fast food ordering application, a voice portal to a Web site, and an interactive animated adventure story.
Evaluations: Demonstrations and usability tests to review, evaluate, and improve spoken language applications.
Tests/Quizzes: Midterm and final exams; surprise quizzes.

Prerequisite
Understanding of HTML or XHTML

Grading

The midterm and final exams are each worth 100 points.
Each surprise quiz is worth 10 points. (The lowest surprise quiz score is dropped.) There will usually be a surprise quiz each week.
Each homework assignment is worth 10 points.
Each of the three projects is worth 50 points.
Students earning more than 90% of the possible points will earn a course grade of A.
Students earning between 80% and 89% of the possible points will earn a course grade of B.
Students earning between 70% and 79% of the possible points will earn a course grade of C.
Students earning less than 70% will not earn credit for the course.
The exams will be closed book. Cheating during the quizzes or exams will result in no credit for the course.
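
The point scheme above can be worked through with a short calculation. The following Python sketch is an illustration only, not an official grade calculator. The function names are hypothetical. Two points are assumptions because the syllabus does not state them: the letter grade boundaries are treated as inclusive lower bounds (90, 80, and 70 percent), and the lowest quiz is dropped only when more than one quiz was given.

```python
def course_percentage(midterm, final, quizzes, homework, projects):
    """Return the percentage of possible points earned.

    midterm, final -- exam scores, each out of 100
    quizzes        -- list of surprise quiz scores, each out of 10
    homework       -- list of homework scores, each out of 10
    projects       -- list of project scores, each out of 50
    """
    # Drop the lowest surprise quiz score.
    # Assumption: nothing is dropped when only one quiz was given.
    kept = sorted(quizzes)[1:] if len(quizzes) > 1 else list(quizzes)
    earned = midterm + final + sum(kept) + sum(homework) + sum(projects)
    possible = 100 + 100 + 10 * len(kept) + 10 * len(homework) + 50 * len(projects)
    return 100.0 * earned / possible


def letter_grade(percentage):
    """Map a percentage to a course grade.

    Assumption: each boundary is an inclusive lower bound.
    """
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    return "no credit"


if __name__ == "__main__":
    pct = course_percentage(
        midterm=90, final=80,
        quizzes=[10, 8, 0],       # the 0 is dropped
        homework=[10, 10],
        projects=[50, 40, 45],
    )
    print(round(pct, 1), letter_grade(pct))
```

In this made-up example the student earns 343 of 390 possible points, so the dropped quiz score of 0 does not lower the percentage.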

Text

We will use the VoiceXML Guide, a CD-ROM home-study guide, available from http://www.vxmlguide.com or the PSU bookstore.

Course Schedule

Date	Material covered	Due on this date
Sept. 30	This syllabus	
Oct. 7	Lesson 1: XML Background; Lesson 2: VoiceXML Background; Lesson 3: VoiceXML Application Structure; Lesson 4: Menus	Exercises from Lessons 1-4
Oct. 14	Lesson 5: Forms and the Form Interpretation Algorithm (FIA); Lesson 6: Input Form Items - <field> and <record> Elements; Lesson 7: Executable Content and Navigation; Lesson 8: Procedural Elements	Exercises from Lessons 5-8; Project 1
Oct. 21	Lesson 9: Input-form Items - <object>, <subdialog>, and <transfer> Elements; Lesson 10: Variables; Lesson 11: Events; Lesson 12: Resource Management	Exercises from Lessons 9-12; revised Project 1; Project 2 proposal
Oct. 28	Lesson 13: Properties; Lesson 14: Grammars; Lesson 15: Use of Grammars in VoiceXML; Lesson 16: Writing Complex Grammars	Exercises from Lessons 13-16
Nov. 4	Lesson 17: Speech Synthesis Markup Language (SSML); Lesson 18: Introduction to Semantic Interpretation; Lesson 19: Semantic Interpretation - Towards Natural Language Understanding; Lesson 20: Dialog Design	Exercises from Lessons 17-20; Project 2
Nov. 11	Multimodal user interfaces using "X+V"	Project 3 proposal
Nov. 18	Multimodal user interfaces using "X+V"
Nov. 25	No class, Thanksgiving weekend
Dec. 2	Project 3 demonstrations	Project 3
Dec. 9	Final Exam

Project 1:

Download the Prophecy voice platform (http://www.voxeo.com/prophecy/) to your PC and implement the voice equivalent of the following paper form:

Ajax University

Student Name____________________
Student ID _______________________
Today's Date _____________________
Course Number___________________
Reason for dropping course___________
_______________________________
Student signature _______________

Project 2:

Step 1: Turn in a written proposal describing (a) the purpose of your proposed speech application, (b) an example scenario of the use of your application, and (c) the target use of your application.

Step 2: After instructor approval, implement and demonstrate your project.

Project 3:

Step 1: Turn in a written proposal describing (a) the purpose of your proposed multimodal application, (b) an example scenario of the use of your application, and (c) the target use of your application.

Step 2: After instructor approval, implement and demonstrate your multimodal project.