Page 61 - ITU Journal Future and evolving technologies Volume 2 (2021), Issue 4 – AI and machine learning solutions in 5G and future networks
[46] Dipanjan Das, Sandika Biswas, Sanjana Sinha, and Brojeshwar Bhowmick. "Speech-Driven Facial Animation Using Cascaded GANs for Learning of Motion and Texture". In: (Oct. 2019).
[47] Chelsea Finn, Pieter Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". In: ICML. 2017.
[48] Lele Chen, Ross Maddox, Zhiyao Duan, and Chenliang Xu. "Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss". In: (May 2019).
[49] Lele Chen, Zhiheng Li, Ross Maddox, Zhiyao Duan, and Chenliang Xu. "Lip Movements Generation at a Glance". In: (July 2018).
[50] Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, and Xiaogang Wang. "Talking Face Generation by Adversarially Disentangled Audio-Visual Representation". In: AAAI. 2019.
[51] K. R. Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, and C. V. Jawahar. "A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild". In: (Aug. 2020).
[52] Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, and Ran He. "Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning". In: arXiv: Computer Vision and Pattern Recognition (2020).
[53] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition". In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.
[54] Alireza M. Javid, Sandipan Das, Mikael Skoglund, and Saikat Chatterjee. "A ReLU Dense Layer to Improve the Performance of Neural Networks". In: ICASSP. 2021.
[55] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Learning Deep Features for Discriminative Localization". In: CVPR (2016).
[56] Dmitry Nikitko. stylegan-encoder. https://github.com/Puzer/stylegan-encoder. 2019.
[57] Gary Storey, Ahmed Bouridane, Richard Jiang, and Chang-Tsun Li. "Atypical Facial Landmark Localisation with Stacked Hourglass Networks: A Study on 3D Facial Modelling for Medical Diagnosis". In: (Jan. 2020), pp. 37–49. ISBN: 978-3-030-32582-4. DOI: 10.1007/978-3-030-32583-1_3.
[58] PyWORLD. https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder. 2019.
[59] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[60] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[61] Yanchun Li, Nanfeng Xiao, and Wanli Ouyang. "Improved Generative Adversarial Networks with Reconstruction Loss". In: Neurocomputing 323 (Oct. 2018). DOI: 10.1016/j.neucom.2018.10.014.
[62] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: arXiv 1409.1556 (Sept. 2014).
[63] Mei Wang and Weihong Deng. "Deep Face Recognition: A Survey". In: (Apr. 2018).
[64] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution". In: (2016).
[65] Facial keypoint detection. https://github.com/raymon-tian/hourglass-facekeypoints-detection. 2017.
[66] Gunnar Farnebäck. "Two-Frame Motion Estimation Based on Polynomial Expansion". In: vol. 2749. June 2003, pp. 363–370. DOI: 10.1007/3-540-45103-X_50.
[67] Saman Zadtootaghaj, Steven Schmidt, and Sebastian Möller. "Modeling gaming QoE: Towards the impact of frame rate and bit rate on cloud gaming". In: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX). IEEE. 2018, pp. 1–6.
[69] Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker, and Guy J. Brown. "A corpus of audio-visual Lombard speech with frontal and profile views". In: The Journal of the Acoustical Society of America 143, EL523 (2018). DOI: 10.1121/1.5042758.
[70] Houwei Cao, David Cooper, Michael Keutmann, Ruben Gur, Ani Nenkova, and Ragini Verma. "CREMA-D: Crowd-sourced emotional multimodal actors dataset". In: IEEE Transactions on Affective Computing 5 (Oct. 2014), pp. 377–390. DOI: 10.1109/TAFFC.2014.2336244.
[71] J. S. Chung, A. Nagrani, and A. Zisserman. "VoxCeleb2: Deep Speaker Recognition". In: INTERSPEECH. 2018.
[72] PyWorldVocoder. https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder. 2017.
© International Telecommunication Union, 2021