Work item: P.AI-MOS
Subject/title: Objective quality assessment method for QoE in multimodal interaction with AI applications
Status: Under study
Approval process: AAP
Type of work item: Recommendation
Version: New
Equivalent number: -
Timing: 2027 (Medium priority)
Liaison: -
Supporting members: China Mobile Communications Co. Ltd.; Huawei Technologies Co. Ltd.; TU Berlin; LuleĆ„ University of Technology; Blekinge Institute of Technology
Summary:
AI multimodal interactive applications typically transmit user-side data such as text, audio, images and video to the cloud, where it is processed by large-scale AI models; the resulting multimodal data is then transmitted back to the terminal side and presented to the user as feedback. The factors that affect the user experience of these applications mainly include the quality of the media sources input to the AI, the interaction quality as perceived by users, and the quality of the presentation returned as feedback or viewed by users.
Owing to factors such as transmission network conditions, encoding and decoding, and terminal devices, users may experience quality degradation when using AI multimodal interactive applications, for example loss of image/video information, reduced image/video quality, audio distortion, audio-visual asynchrony, excessive first-word/first-image delay in the output, and unstable end-to-end latency.
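As an illustration of two of the impairments above, the first-word/first-image delay and the end-to-end latency can be timestamped at the client side. The following is a minimal sketch in Python; stream_response() is a hypothetical client call standing in for whatever streaming interface the application exposes.

    import time
    from typing import Iterable, Tuple

    def measure_latency(chunks: Iterable[bytes]) -> Tuple[float, float]:
        # Returns (first-chunk delay, end-to-end latency) in seconds
        # for one streamed AI response.
        t_start = time.monotonic()
        t_first = None
        for _chunk in chunks:
            if t_first is None:
                t_first = time.monotonic()   # first word/first image arrives
        t_end = time.monotonic()
        if t_first is None:                  # response produced no output
            return float("nan"), t_end - t_start
        return t_first - t_start, t_end - t_start

    # Usage (stream_response() is hypothetical):
    # first_delay, total = measure_latency(stream_response("describe this image"))

Repeating such measurements over many requests would also characterize the stability of the end-to-end latency.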
Compared with subjective evaluation, objective quality evaluation methods for AI multimodal interactive applications can greatly improve evaluation efficiency, cover a wide range of applicable scenarios, and support the assurance and improvement of user experience quality.
Some objective evaluation methods for multimedia quality have already been developed and standardized, for services such as video on demand, video calls and live streaming. However, most existing methods evaluate user experience for symmetric modalities or audio-visual interaction, and the corresponding services do not involve processes such as AI data perception and large-scale model inference. For AI multimodal interactive applications, by contrast, the user experience not only involves multiple modalities, but is also the combined result of the interaction experience, media quality and presentation quality across the various modality combinations.
This new work item is therefore proposed to fill the gap, focusing on two types of AI applications: real-time visual conversational AI applications (e.g. ChatGPT-4o) and task-based user-agent AI applications (e.g. Manus, Genspark, Claude). The work item can be advanced through the following steps:
1. Study and analyze the applicability of existing subjective test methods; prepare a preliminary subjective test plan.
2. Jointly discuss and determine the subjective test protocol among the participating parties.
3. Conduct subjective tests, analyze the test results and build the subjective test datasets.
4. Develop and validate objective models based on the subjective test datasets (see the validation sketch after this list).
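As a hint of what step 4 involves, objective model outputs are typically compared against subjective MOS using performance metrics such as the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC) and the RMSE, in the spirit of ITU-T P.1401. The sketch below uses placeholder arrays, not real test data.

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    def validate_model(predicted_mos, subjective_mos):
        # Standard QoE model performance metrics (cf. ITU-T P.1401).
        plcc, _ = pearsonr(predicted_mos, subjective_mos)    # linear correlation
        srocc, _ = spearmanr(predicted_mos, subjective_mos)  # rank-order correlation
        diff = np.asarray(predicted_mos) - np.asarray(subjective_mos)
        rmse = float(np.sqrt(np.mean(diff ** 2)))
        return {"PLCC": plcc, "SROCC": srocc, "RMSE": rmse}

    # Placeholder values for illustration only:
    pred = [3.1, 4.0, 2.5, 4.6, 3.8]
    subj = [3.0, 4.2, 2.7, 4.5, 3.6]
    print(validate_model(pred, subj))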
Comment: -
Reference(s):
Historic references:
Contact(s):
ITU-T A.5 justification(s):
First registration in the WP: 2025-09-18 14:53:27
Last update: 2025-09-19 09:42:05