ITU-T Study Group 12 (Study Period 2005-2008)

Roadmap for Question 12/12: Performance Evaluation of Services Based on Speech Technology

Scope

The goal of Q.12/12 (Performance evaluation of services based on speech technology) is to develop assessment and evaluation methods for telephone services that rely on speech technology, such as speech and speaker recognition, natural language understanding, speech synthesis, and spoken dialogue systems.

Evaluation should be carried out from two different points of view:

The user’s perspective: How is the service perceived by the user? Which aspects contribute to overall quality, user satisfaction and acceptability?
The system developer’s perspective: What is the performance of the individual components of the system? How is their performance influenced by the transmission channel, and by the behaviour of the user?

Q.12/12 aims at determining parameters which describe the performance of all major components of spoken dialogue systems:

Speech recognition
Speech understanding
Dialogue management
Response generation
Speech synthesis

Performance depends on the transmission channel (network, user interfaces, acoustic situation) between the system and the user, as depicted below.

The work will be divided into three steps:

Determine speech input performance
Determine speech output quality
Determine overall system quality, user satisfaction and acceptability
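
As an illustration of the first step, speech input performance is often summarised by the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcription, divided by the number of reference words. The text above does not name a specific parameter, so the choice of WER, the function name word_error_rate and the whitespace tokenisation are assumptions made for this sketch only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Hypothetical helper for illustration: words are split on whitespace,
    and all three error types (substitution, deletion, insertion) cost 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("john" -> "jon") and one deletion ("at") over 5 words.
    print(word_error_rate("call john smith at home", "call jon smith home"))
```

Because insertions are counted, the value can exceed 1.0 when the recogniser outputs many more words than were spoken.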

Achievements

ITU-T produced two Recommendations addressing subjective evaluation methods related to spoken dialogue systems:

P.85 (1994): A method for subjective performance assessment of the quality of speech voice output devices
P.851 (2003): Subjective quality evaluation of services based on speech technology

Current Work

Our current work aims at answering the following questions:

Which parameters can be used to reliably quantify the performance of speech technology devices in the context of voice-enabled telephone services? How can these parameters be measured?
Is it possible to determine the quality of synthesised speech instrumentally, e.g. using objective methods such as ITU-T Rec. P.563 (Single-ended method for objective speech quality assessment in narrow-band telephony applications)?
What is the influence of transmission impairments encountered in modern networks (non-linear codec distortions, time-variant channel characteristics, circuit and comfort noise, handset, headset and hands-free terminal (HFT) characteristics) and of acoustically adverse conditions (e.g. in a moving car) on the performance of speech and speaker recognition devices, and on the quality of synthesised speech?
How can this influence be described and predicted? Are objective methods and network planning models recommended by the ITU-T able to predict the influence of transmission impairments on recognition performance and synthesised speech quality as well? Are the requirements defined for ensuring a sufficiently high speech communication quality also sufficient to guarantee high recognition accuracy?
Which quality aspects are important for the users of such services? How can these aspects be quantified with subjective evaluation methods? To what extent is the user of the service distracted from other tasks (e.g. from driving)?
How are the subjective quality aspects of the overall service related to the performance of the individual speech technology devices? Is it possible to predict service quality on the basis of measurable parameters?
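
The last question, predicting service quality from measurable parameters, can be pictured with a very simple model: a straight line fitted by least squares between one measured parameter and a subjective rating. This is only a sketch of the idea. The roadmap does not say which model Q.12/12 intends to use; the function name fit_line is hypothetical, and the numbers in the usage example are synthetic values chosen so the arithmetic is easy to check, not measurements.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Hypothetical helper: xs could be a measured parameter (for example a
    recognition error rate) and ys the matching subjective ratings.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new parameter value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Synthetic, perfectly linear data used purely for illustration.
    error_rates = [0.0, 0.1, 0.2, 0.3]
    ratings = [4.5, 4.0, 3.5, 3.0]
    slope, intercept = fit_line(error_rates, ratings)
    print(slope, intercept, predict(slope, intercept, 0.15))
```

A single-parameter line is the smallest possible prediction model; whether such a relationship holds for real services is exactly what the question above asks.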

Expected Outcome

A number of new and revised Recommendations are expected as an outcome of our work:

New Supplement to P.85X Series: Collection of parameters quantifying the interaction with spoken dialogue systems, and the performance of system components. Expected approval: 10/2005.
New Recommendation P.PST: Parameters describing the performance of speech technology components. Expected approval: 2006.
New Recommendation P.TCI: Transmission channel impact on speech technology performance. Expected approval: 2006.
New Recommendation P.QVS: Quality prediction models for services based on speech technology. Expected approval: 2007.
Update of Recommendation P.851: Subjective quality evaluation of services based on speech technology.

Please contribute!

If you are interested in the work of Q.12/12, please feel free to contribute questions, suggestions, or descriptions of your own work. Please send an email to one of the Rapporteurs:

Sebastian Möller: Sebastian.moeller@ruhr-uni-bochum.de
Alexander Raake: Alexander.Raake@limsi.fr