Page 225 - AI for Good-Innovate for Impact



               Use case – 53: vivo’s Technology for All: Bridging the Accessibility Gap









               Country: China

               Organization: vivo Mobile Communication Co., Ltd.

               Contact person: Li Mengzhu, limengzhu.ai@vivo.com


               53.1 Use Case Summary Table


                Domain                   Accessibility
                The Problem to be        Smartphone OS-based Information Accessibility Solutions and
                addressed                Public Welfare for People with Disabilities
                Key aspects of the       vivo AI Lab, founded in 2018, is committed to building
                solution                 industry-leading AI technologies and giving users the best
                                         possible product experience. Its areas of work include computer
                                         vision, speech technology, natural language processing, and
                                         machine learning.
                                         The lab has AI R&D bases in Shenzhen, Beijing, Hangzhou, and
                                         Nanjing, a team of over 1,000 AI engineers, dozens of papers
                                         published at top AI academic conferences (AAAI, ICLR, ECCV,
                                         CVPR, Interspeech, etc.), and hundreds of granted AI patents.
                                         1.  vivo Sight: Offline technologies such as automatic
                                             speech recognition (ASR), facial recognition, optical
                                             character recognition, and multi-target tracking and
                                             recognition are integrated with the visual multimodal
                                             capabilities of a large AI model for image processing,
                                             helping users "see" the world in personalized scenarios
                                             through multi-round Q&A. The environmental description
                                             technology converts recognized images into spoken
                                             descriptions and reads them aloud, thereby improving
                                             comprehension of both on-screen and off-screen
                                             environmental information.
                                         2.  vivo Score Reading: Using capabilities such as note
                                             recognition algorithms, this feature lets users customize
                                             how music scores are read, according to notes, beats, and
                                             measures. It aids in reading and learning piano scores.
                                         3.  vivo Voice/Accessibility Calls: ASR, speech-to-text, and
                                             text-to-speech technologies are applied to support fluent
                                             face-to-face communication and telephone conversations
                                             for hearing-impaired individuals. In addition, multilingual
                                             recognition and translation technologies have significantly
                                             reduced language barriers, making communication between
                                             users of different languages far easier.








