Page 61 - ITU Journal Future and evolving technologies Volume 2 (2021), Issue 4 – AI and machine learning solutions in 5G and future networks






[46] Dipanjan Das, Sandika Biswas, Sanjana Sinha, and Brojeshwar Bhowmick. "Speech-Driven Facial Animation Using Cascaded GANs for Learning of Motion and Texture". In: (Oct. 2019).
[47] Chelsea Finn, P. Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". In: ICML. 2017.
[48] Lele Chen, Ross Maddox, Zhiyao Duan, and Chenliang Xu. Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss. May 2019.
[49] Lele Chen, Zhiheng Li, Ross Maddox, Zhiyao Duan, and Chenliang Xu. "Lip Movements Generation at a Glance". In: (July 2018).
[50] Hang Zhou, Y. Liu, Z. Liu, Ping Luo, and X. Wang. "Talking Face Generation by Adversarially Disentangled Audio-Visual Representation". In: AAAI. 2019.
[51] K. Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, and C. Jawahar. A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild. Aug. 2020.
[52] Hao Zhu, Huaibo Huang, Y. Li, A. Zheng, and R. He. "Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning". In: arXiv: Computer Vision and Pattern Recognition (2020).
[53] Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition". In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.
[54] Alireza M. Javid, Sandipan Das, M. Skoglund, and S. Chatterjee. "A ReLU Dense Layer to Improve the Performance of Neural Networks". In: ICASSP. 2021.
[55] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. "Learning Deep Features for Discriminative Localization". In: CVPR (2016).
[56] Dmitry Nikitko. stylegan-encoder. https://github.com/Puzer/stylegan-encoder. 2019.
[57] Gary Storey, Ahmed Bouridane, Richard Jiang, and Chang-Tsun Li. "Atypical Facial Landmark Localisation with Stacked Hourglass Networks: A Study on 3D Facial Modelling for Medical Diagnosis". In: Jan. 2020, pp. 37–49. ISBN: 978-3-030-32582-4. DOI: 10.1007/978-3-030-32583-1_3.
[58] PyWORLD. https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder. 2019.
[59] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[60] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[61] Yanchun Li, Nanfeng Xiao, and Wanli Ouyang. "Improved Generative Adversarial Networks with Reconstruction Loss". In: Neurocomputing 323 (Oct. 2018). DOI: 10.1016/j.neucom.2018.10.014.
[62] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: arXiv:1409.1556 (Sept. 2014).
[63] Wang Mei and Weihong Deng. "Deep Face Recognition: A Survey". In: (Apr. 2018).
[64] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution". In: (2016).
[65] facial keypoint detection. https://github.com/raymon-tian/hourglass-facekeypoints-detection. 2017.
[66] Gunnar Farnebäck. "Two-Frame Motion Estimation Based on Polynomial Expansion". In: vol. 2749. June 2003, pp. 363–370. DOI: 10.1007/3-540-45103-X_50.
[67] Saman Zadtootaghaj, Steven Schmidt, and Sebastian Möller. "Modeling gaming QoE: Towards the impact of frame rate and bit rate on cloud gaming". In: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX). IEEE. 2018, pp. 1–6.
[68] FirstName Alpher and FirstName Fotheringham-Smythe. "Frobnication revisited". In: Journal of Foo 13.1 (2003), pp. 234–778.
[69] Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker, and Guy J. Brown. "A corpus of audio-visual Lombard speech with frontal and profile view". In: The Journal of the Acoustical Society of America 143, EL523 (2018). DOI: 10.1121/1.5042758.
[70] Houwei Cao, David Cooper, Michael Keutmann, Ruben Gur, Ani Nenkova, and Ragini Verma. "CREMA-D: Crowd-sourced emotional multimodal actors dataset". In: IEEE Transactions on Affective Computing 5 (Oct. 2014), pp. 377–390. DOI: 10.1109/TAFFC.2014.2336244.
[71] J. S. Chung, A. Nagrani, and A. Zisserman. "VoxCeleb2: Deep Speaker Recognition". In: INTERSPEECH. 2018.
[72] PyWorldVocoder. https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder. 2017.





© International Telecommunication Union, 2021