Page 226 - AI for Good-Innovate for Impact Final Report 2024




                      Use case – 53: Vivo’s Technology for All: Bridging the Accessibility Gap









                      Country: China

                      Organization: vivo

                      Contact person: Li Mengzhu, limengzhu.ai@vivo.com


                      53.1 Use Case Summary Table


                       Domain                   Accessibility
                       The Problem to be        Smartphone OS-based Information Accessibility Solutions and
                       addressed                Public Welfare for People with Disabilities
                       Key aspects of the       vivo AI Lab, founded in 2018, is committed to building
                       solution                 industry-leading AI technologies and providing users with
                                                the best possible product experience. Areas of work include
                                                Computer Vision, Speech Technology, Natural Language
                                                Processing and Machine Learning.
                                                We have AI R&D bases in Shenzhen, Beijing, Hangzhou and
                                                Nanjing, a team of over 1,000 AI engineers, dozens of
                                                papers published at top AI academic conferences (AAAI, ICLR,
                                                ECCV, CVPR, Interspeech, etc.), and hundreds of AI patents
                                                granted.
                                                1.  vivo Sight: Offline technologies such as Automatic
                                                   Speech Recognition (ASR), facial recognition, optical
                                                   character recognition (OCR), and multi-target tracking and
                                                   recognition are integrated with the large AI model's visual
                                                   multimodal capabilities for image processing, helping users
                                                   "see" the world in personalized scenarios through multi-turn
                                                   Q&A. The environmental description technology converts
                                                   recognized images into voice descriptions and reads them
                                                   aloud, improving users' understanding of both on-screen
                                                   and off-screen environmental information.
                                                2.  vivo Score Reading: Using capabilities such as note
                                                   recognition algorithms, users can customize the reading of
                                                   music scores by note, beat, and measure. This feature
                                                   aids in reading and learning piano scores.
                                                3.  vivo Voice/Accessibility Calls: ASR and Speech-to-Text/
                                                   Text-to-Speech technologies help hearing-impaired
                                                   individuals communicate fluently face to face and over
                                                   the telephone. In addition, multilingual recognition and
                                                   translation technology significantly reduces language
                                                   barriers, making communication between different users
                                                   easier.
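
The Score Reading item describes reading a score by note, beat, or measure. A minimal sketch of that kind of grouping follows, purely as an illustration. It is not vivo's implementation: the Note type, the function names, the mode strings, and the assumption that a recognition step yields a pitch name and a duration in beats are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    pitch: str    # e.g. "C4"; assumed output of a note recognition step
    beats: float  # duration in beats (hypothetical representation)


def split_into_measures(notes: List[Note], beats_per_measure: float) -> List[List[Note]]:
    """Group notes into measures by accumulating their durations."""
    measures: List[List[Note]] = []
    current: List[Note] = []
    filled = 0.0
    for note in notes:
        current.append(note)
        filled += note.beats
        if filled >= beats_per_measure:
            measures.append(current)
            current, filled = [], 0.0
    if current:  # keep a trailing, incomplete measure
        measures.append(current)
    return measures


def reading_units(notes: List[Note], beats_per_measure: float = 4.0,
                  mode: str = "measure") -> List[str]:
    """Return the text units a reader could step through, one per item."""
    if mode == "note":
        return [n.pitch for n in notes]
    if mode == "beat":
        # Each note together with its duration in beats.
        return [f"{n.pitch}, {n.beats:g} beat(s)" for n in notes]
    if mode == "measure":
        measures = split_into_measures(notes, beats_per_measure)
        return [
            f"Measure {i}: " + " ".join(n.pitch for n in measure)
            for i, measure in enumerate(measures, start=1)
        ]
    raise ValueError(f"unknown mode: {mode}")


example = [Note("C4", 1), Note("D4", 1), Note("E4", 2), Note("G4", 4)]
print(reading_units(example, 4.0, "measure"))
```

In this sketch the strings returned by reading_units could be passed one at a time to a Text-to-Speech engine, so that a user steps through the score at the chosen granularity.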








