[22] Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, and Matthias Grundmann. “BlazeFace: Sub-millisecond neural face detection on mobile GPUs”. In: arXiv preprint arXiv:1907.05047 (2019).

[23] Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko, and Matthias Grundmann. “Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs”. In: arXiv preprint arXiv:1907.06724 (2019).

[24] Adrian Bulat and Georgios Tzimiropoulos. “How far are we from solving the 2D & 3D face alignment problem? (And a dataset of 230,000 3D facial landmarks)”. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 1021–1030.

[25] Ivan Grishchenko, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, and Matthias Grundmann. “Attention Mesh: High-fidelity Face Mesh Prediction in Real-time”. In: arXiv preprint arXiv:2006.10962 (2020).

[26] Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, and Matthias Grundmann. “BlazePose: On-device Real-time Body Pose Tracking”. In: arXiv preprint arXiv:2006.10204 (2020).

[27] Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. “Deep high-resolution representation learning for human pose estimation”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 5693–5703.

[28] George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, and Kevin Murphy. “PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model”. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 269–286.

[29] Peter Eisert and Bernd Girod. “Analyzing facial expressions for virtual conferencing”. In: IEEE Computer Graphics and Applications 18.5 (1998), pp. 70–78.

[30] Peter Eisert. “MPEG-4 facial animation in video analysis and synthesis”. In: International Journal of Imaging Systems and Technology 13.5 (2003), pp. 245–256.

[31] A. Simons and Stephen Cox. “Generation of mouthshapes for a synthetic talking head”. In: Proceedings of the Institute of Acoustics, Autumn Meeting (Jan. 1990).

[32] FirstName Alpher, FirstName Fotheringham-Smythe, and FirstName Gamow. “Can a machine frobnicate?” In: Journal of Foo 14.1 (2004), pp. 234–778.

[33] Andreea Stef, Kaveen Perera, Hubert Shum, and Edmond Ho. “Synthesizing Expressive Facial and Speech Animation by Text-to-IPA Translation with Emotion Control”. In: Dec. 2018, pp. 1–8. DOI: 10.1109/SKIMA.2018.8631536.

[34] Tero Karras, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. “Audio-driven facial animation by joint end-to-end learning of pose and emotion”. In: ACM Transactions on Graphics 36 (July 2017), pp. 1–12. DOI: 10.1145/3072959.3073658.

[35] Sarah Taylor, Moshe Mahler, Barry-John Theobald, and Iain Matthews. “Dynamic units of visual speech”. In: July 2012, pp. 275–284.

[36] Pif Edwards, Chris Landreth, Eugene Fiume, and Karan Singh. “JALI: an animator-centric viseme model for expressive lip synchronization”. In: ACM Transactions on Graphics 35 (July 2016), pp. 1–11. DOI: 10.1145/2897824.2925984.

[37] Wesley Mattheyses and Werner Verhelst. “Audiovisual speech synthesis: An overview of the state-of-the-art”. In: Speech Communication 66 (Nov. 2014). DOI: 10.1016/j.specom.2014.11.001.

[38] Supasorn Suwajanakorn, Steven Seitz, and Ira Kemelmacher-Shlizerman. “Synthesizing Obama: learning lip sync from audio”. In: ACM Transactions on Graphics 36 (July 2017), pp. 1–13. DOI: 10.1145/3072959.3073640.

[39] Joon Son Chung, Amir Jamaludin, and Andrew Zisserman. “You said that?” In: British Machine Vision Conference. 2017.

[40] Masaki Saito, Eiichi Matsumoto, and Shunta Saito. “Temporal Generative Adversarial Nets with Singular Value Clipping”. In: Oct. 2017. DOI: 10.1109/ICCV.2017.308.

[41] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. “Generating Videos with Scene Dynamics”. In: Sept. 2016.

[42] Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. “End-to-End Speech-Driven Facial Animation with Temporal GANs”. In: British Machine Vision Conference. 2018.

[43] O. Wiles, A.S. Koepke, and A. Zisserman. “X2Face: A network for controlling face generation by using images, audio, and pose codes”. In: European Conference on Computer Vision. 2018.

[44] Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz. “MoCoGAN: Decomposing motion and content for video generation”. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, pp. 1526–1535.

[45] R. Yi, Zipeng Ye, J. Zhang, H. Bao, and Yongjin Liu. “Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose”. In: arXiv: Computer Vision and Pattern Recognition (2020).

