Enhanced shared experiences in heterogeneous network with generative AI
Authors: Neeraj Kumar, Ankur Narang, Brejesh Lall, Nitish Kumar Singh
Date of publication: 30 July 2021
Published in: ITU Journal on Future and Evolving Technologies, Volume 2 (2021), Issue 4 - AI and machine learning solutions in 5G and future networks, Pages 27-46
Journal page: https://www.itu.int/en/journal/j-fet
COVID-19 has made immersive experiences such as video conferencing and virtual/augmented reality the most important modes of exchanging information. Despite much advancement in network bandwidth and codec techniques, current systems still suffer from glitches, lag and poor video quality, especially under unreliable network conditions. In this paper, we propose a video streaming pipeline that provides better video quality under erratic network conditions. We propose an environment where participants can interact with each other through video conferencing while sending only the audio over the network. We propose a Multimodal Adaptive Normalization (MAN)-based architecture that synthesizes a talking-person video of arbitrary length from two inputs: an audio signal and a single image of the person. The architecture uses multimodal adaptive normalization, a keypoint heatmap predictor, an optical flow predictor and class activation map-based layers to learn the movements of expressive facial components, and hence generates a highly expressive talking-head video of the given person. We demonstrate the effectiveness of the proposed streaming pipeline, which dynamically controls the Quality of Experience (QoE) as per the requirements.
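To make the core idea of multimodal adaptive normalization concrete, the following is a minimal NumPy sketch of one such layer: the generator's feature map is normalized per channel and then re-modulated with a scale and shift predicted from fused audio and image features. This is an illustrative sketch only; the paper's actual layer is learned end-to-end, and the fixed random linear maps below (`w_gamma`, `w_beta`) merely stand in for trained modulation networks.

```python
import numpy as np

def multimodal_adaptive_norm(feature_map, audio_feat, image_feat, eps=1e-5):
    """Illustrative adaptive normalization conditioned on two modalities.

    feature_map: (C, H, W) generator activations
    audio_feat, image_feat: 1-D conditioning feature vectors
    """
    c = feature_map.shape[0]

    # Per-channel normalization of the feature map.
    mean = feature_map.mean(axis=(1, 2), keepdims=True)
    std = feature_map.std(axis=(1, 2), keepdims=True)
    normalized = (feature_map - mean) / (std + eps)

    # Fuse the modalities; in the real model this fusion is learned.
    cond = np.concatenate([audio_feat, image_feat])

    # Stand-ins for learned layers that predict per-channel scale/shift.
    rng = np.random.default_rng(0)
    w_gamma = rng.standard_normal((c, cond.size)) * 0.1
    w_beta = rng.standard_normal((c, cond.size)) * 0.1
    gamma = (w_gamma @ cond).reshape(c, 1, 1)
    beta = (w_beta @ cond).reshape(c, 1, 1)

    # Modulate the normalized activations with the predicted statistics.
    return (1.0 + gamma) * normalized + beta
```

The design mirrors SPADE/AdaIN-style conditional normalization: the conditioning signal does not pass through the main feature pathway but instead controls the normalization statistics, which is what lets audio features steer the motion of the generated face.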
Keywords: Audio to video generation, deep learning architecture, dynamic QoE control, GAN, multimodal adaptive normalization, video streaming pipeline
Rights: © International Telecommunication Union, available under the CC BY-NC-ND 3.0 IGO license.