Page 695 - AI for Good Innovate for Impact
When the video content is actually played at the called terminal, the core function should
remain in the administrative state Ready, while the execution state reports the playback result
directly: smooth display of a 20-second video is Passed; playback that fails due to terminal
incompatibility, lag or interruption is Failed.
The called user's response (answer, reject or no action) does not involve the administrative
state; user behavior alone triggers the different execution states. If the user answers and the
system successfully establishes the connection, the main business objective is achieved and the
execution state is Passed. The user's refusal to answer results in the caller receiving [...]. If
the called user takes no action within the ringing timeout period, the system automatically ends
the call, and the execution state can be regarded as Skip because the interaction was not
effectively handled.
Finally, at call establishment or termination, a successfully established call is in the Passed
state, while a call that fails due to rejection or timeout is in the Failed state. The
administrative state is not usually applied directly here.
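The state assignments described above can be summarized in a minimal Python sketch. All names here (ExecutionState, CalleeResponse, playback_state, response_state, call_outcome_state) are hypothetical and introduced only for illustration; they are not part of any system described in this use case. The execution state for a rejected call at the response stage is not given in the text, so the sketch leaves it unspecified (None) rather than guessing.

```python
from enum import Enum
from typing import Optional


class ExecutionState(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    SKIP = "Skip"


class CalleeResponse(Enum):
    ANSWER = "answer"
    REJECT = "reject"
    NO_ACTION = "no action"  # ringing timeout expires without user operation


def playback_state(played_smoothly: bool) -> ExecutionState:
    """Playback stage: smooth display is Passed, any playback failure is Failed."""
    return ExecutionState.PASSED if played_smoothly else ExecutionState.FAILED


def response_state(response: CalleeResponse) -> Optional[ExecutionState]:
    """Response stage: answer is Passed, timeout is Skip.

    The state for REJECT is not stated in the text, so None is returned
    (an assumption of this sketch, meaning "unspecified").
    """
    if response is CalleeResponse.ANSWER:
        return ExecutionState.PASSED
    if response is CalleeResponse.NO_ACTION:
        return ExecutionState.SKIP
    return None


def call_outcome_state(response: CalleeResponse) -> ExecutionState:
    """Final stage: an established call is Passed; rejection or timeout is Failed."""
    if response is CalleeResponse.ANSWER:
        return ExecutionState.PASSED
    return ExecutionState.FAILED
```

Keeping one function per stage mirrors the text, where the same user action (for example, a timeout) maps to Skip at the response stage but to Failed at the final call outcome.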
Partners: N/A
2.2 Benefits of the use case
AI video ringback tones make information transmission in mobile communication more
efficient and valuable. At the social level, they promote the rapid dissemination of information
and break geographical limitations, allowing remote areas to access cultural and entertainment
information and to transmit their own information to the outside world. From an
environmental perspective, they reduce unnecessary travel for information exchange, lowering
transportation energy consumption and exhaust emissions.
2.3 Future Work
On the one hand, we will focus on user needs and provide services such as commercial promotion,
personal display, and life tips to users who can transmit information before a call; on the
other hand, we will continue to explore the value of AI technology, promote its
implementation, create products, and benefit more users.
3 Use Case Requirements
REQ-01: Multi-Modal Input Support
The system must be able to handle heterogeneous input data. Users can supply text descriptions
(≤100 characters of natural language) or image files (Joint Photographic Experts
Group/Portable Network Graphics (JPG/PNG) format, resolution ≥1080P, size ≤5MB) as the video
generation source. The implementation needs to deploy a multimodal alignment model (e.g., a
Contrastive Language–Image Pretraining (CLIP) architecture) that maps textual semantics and
visual features to a unified latent space, and to ensure that the input content parsing delay
is ≤3 seconds (P95 response time), and the parsing results

