Page 695 - AI for Good Innovate for Impact



               When the video content is actually played on the called terminal, the core function should
               remain in the administrative state Ready, while the execution state reports the playback
               result directly: smooth display of a 20-second video is Passed; if playback fails due to
               terminal incompatibility, lag or interruption, it is Failed.

               The next stage, the called user's response (answer, reject or no action), does not involve the
               administrative state; the user's behavior alone triggers the different execution states. If the
               user answers and the system successfully establishes the connection, the main business
               objective is achieved and the execution state is Passed. The user's refusal to answer results
               in the caller receiving [...]. If the called user takes no action within the ringing timeout
               period, the system automatically ends the call, and the execution state can be regarded as
               Skip because the interaction was not handled effectively.

               Finally, in the call establishment or termination stage, successful establishment of the call
               is a Passed state, while failure of the call due to rejection or timeout is a Failed state. The
               administrative state is not usually applied directly here.
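
               The stage-to-state rules described above can be summarized as a small lookup table. The
               following is a minimal sketch, not the system's actual implementation: the names Stage,
               ExecutionState, the event strings and execution_state() are all hypothetical and chosen
               only for illustration. Rejection during the user-response stage is deliberately left
               unmapped, because the text does not state its execution state.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Call stages discussed in the text (names are illustrative assumptions)."""
    PLAYBACK = "playback"
    USER_RESPONSE = "user_response"
    CALL_OUTCOME = "call_outcome"


class ExecutionState(Enum):
    """Execution states named in the text."""
    PASSED = "Passed"
    FAILED = "Failed"
    SKIP = "Skip"


# (stage, event) -> execution state. Event strings are hypothetical labels.
_RULES = {
    # Playback on the called terminal: smooth 20-second video vs. any failure.
    (Stage.PLAYBACK, "played_smoothly"): ExecutionState.PASSED,
    (Stage.PLAYBACK, "playback_failed"): ExecutionState.FAILED,
    # Called user's response: answering passes, ringing timeout is skipped.
    (Stage.USER_RESPONSE, "answered"): ExecutionState.PASSED,
    (Stage.USER_RESPONSE, "timeout"): ExecutionState.SKIP,
    # Call establishment or termination.
    (Stage.CALL_OUTCOME, "established"): ExecutionState.PASSED,
    (Stage.CALL_OUTCOME, "rejected"): ExecutionState.FAILED,
    (Stage.CALL_OUTCOME, "timeout"): ExecutionState.FAILED,
}


def execution_state(stage: Stage, event: str) -> Optional[ExecutionState]:
    """Return the execution state for an event, or None if no rule is defined."""
    return _RULES.get((stage, event))


if __name__ == "__main__":
    print(execution_state(Stage.USER_RESPONSE, "timeout").value)  # prints "Skip"
```

               Keeping the rules in one table makes the difference between the stages explicit: a ringing
               timeout is Skip when judged as a user response, but Failed when judged as the call outcome.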

               Partners: N/A


               2.2     Benefits of the use case

               AI video ringback tones make information transmission in mobile communication more
               efficient and valuable. At the social level, they promote the rapid dissemination of
               information and break geographical limitations, allowing remote areas to access cultural and
               entertainment information and to transmit their own information to the outside world. From
               an environmental perspective, they reduce unnecessary travel caused by information exchange,
               lowering transportation energy consumption and exhaust emissions.


               2.3     Future Work

               On the one hand, we will focus on user needs and provide services such as commercial promotion,
               personal display, and life tips to users who are able to transmit information before a call;

               On the other hand, we will continue to explore the value of AI technology, promote its
               implementation, create products, and benefit more users.

               3      Use Case Requirements


               REQ-01: Multi-Modal Input Support

               The system must be able to handle heterogeneous input data. Users can provide a text
               description (≤100 characters of natural language) or an image file (Joint Photographic
               Experts Group/Portable Network Graphics (JPG/PNG) format, resolution ≥1080P, size ≤5MB)
               as the video generation source. The implementation needs to deploy a multimodal alignment
               model (e.g., the Contrastive Language–Image Pretraining (CLIP) architecture) that maps
               textual semantics and visual features to a unified latent space, and to ensure that the
               input content parsing delay is ≤3 seconds (P95 response time), and the parsing results





