Page 60 - ITU Journal Future and evolving technologies Volume 2 (2021), Issue 4 – AI and machine learning solutions in 5G and future networks
[22] Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, and Matthias Grundmann. “BlazeFace: Sub‑millisecond neural face detection on mobile GPUs”. In: arXiv preprint arXiv:1907.05047 (2019).
[23] Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko, and Matthias Grundmann. “Real‑time Facial Surface Geometry from Monocular Video on Mobile GPUs”. In: arXiv preprint arXiv:1907.06724 (2019).
[24] Adrian Bulat and Georgios Tzimiropoulos. “How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks)”. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 1021–1030.
[25] Ivan Grishchenko, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, and Matthias Grundmann. “Attention Mesh: High‑fidelity Face Mesh Prediction in Real‑time”. In: arXiv preprint arXiv:2006.10962 (2020).
[26] Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, and Matthias Grundmann. “BlazePose: On‑device Real‑time Body Pose Tracking”. In: arXiv preprint arXiv:2006.10204 (2020).
[27] Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. “Deep high‑resolution representation learning for human pose estimation”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 5693–5703.
[28] George Papandreou, Tyler Zhu, Liang‑Chieh Chen, Spyros Gidaris, Jonathan Tompson, and Kevin Murphy. “PersonLab: Person pose estimation and instance segmentation with a bottom‑up, part‑based, geometric embedding model”. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 269–286.
[29] Peter Eisert and Bernd Girod. “Analyzing facial expressions for virtual conferencing”. In: IEEE Computer Graphics and Applications 18.5 (1998), pp. 70–78.
[30] Peter Eisert. “MPEG‑4 facial animation in video analysis and synthesis”. In: International Journal of Imaging Systems and Technology 13.5 (2003), pp. 245–256.
[31] A. Simons and Stephen Cox. “Generation of mouthshapes for a synthetic talking head”. In: Proceedings of the Institute of Acoustics, Autumn Meeting (Jan. 1990).
[32] FirstName Alpher, FirstName Fotheringham‑Smythe, and FirstName Gamow. “Can a machine frobnicate?” In: Journal of Foo 14.1 (2004), pp. 234–778.
[33] Andreea Stef, Kaveen Perera, Hubert Shum, and Edmond Ho. “Synthesizing Expressive Facial and Speech Animation by Text‑to‑IPA Translation with Emotion Control”. In: International Conference on Software, Knowledge, Information Management and Applications (SKIMA). Dec. 2018, pp. 1–8. DOI: 10.1109/SKIMA.2018.8631536.
[34] Tero Karras, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. “Audio‑driven facial animation by joint end‑to‑end learning of pose and emotion”. In: ACM Transactions on Graphics 36 (July 2017), pp. 1–12. DOI: 10.1145/3072959.3073658.
[35] Sarah Taylor, Moshe Mahler, Barry‑John Theobald, and Iain Matthews. “Dynamic units of visual speech”. In: July 2012, pp. 275–284.
[36] Pif Edwards, Chris Landreth, Eugene Fiume, and Karan Singh. “JALI: an animator‑centric viseme model for expressive lip synchronization”. In: ACM Transactions on Graphics 35 (July 2016), pp. 1–11. DOI: 10.1145/2897824.2925984.
[37] Wesley Mattheyses and Werner Verhelst. “Audiovisual speech synthesis: An overview of the state‑of‑the‑art”. In: Speech Communication 66 (Nov. 2014). DOI: 10.1016/j.specom.2014.11.001.
[38] Supasorn Suwajanakorn, Steven Seitz, and Ira Kemelmacher‑Shlizerman. “Synthesizing Obama: learning lip sync from audio”. In: ACM Transactions on Graphics 36 (July 2017), pp. 1–13. DOI: 10.1145/3072959.3073640.
[39] Joon Son Chung, Amir Jamaludin, and Andrew Zisserman. “You said that?” In: British Machine Vision Conference. 2017.
[40] Masaki Saito, Eiichi Matsumoto, and Shunta Saito. “Temporal Generative Adversarial Nets with Singular Value Clipping”. In: IEEE International Conference on Computer Vision (ICCV). Oct. 2017. DOI: 10.1109/ICCV.2017.308.
[41] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. “Generating Videos with Scene Dynamics”. In: (Sept. 2016).
[42] Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. “End‑to‑End Speech‑Driven Facial Animation with Temporal GANs”. In: British Machine Vision Conference. 2018.
[43] O. Wiles, A.S. Koepke, and A. Zisserman. “X2Face: A network for controlling face generation by using images, audio, and pose codes”. In: European Conference on Computer Vision. 2018.
[44] Sergey Tulyakov, Ming‑Yu Liu, Xiaodong Yang, and Jan Kautz. “MoCoGAN: Decomposing motion and content for video generation”. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, pp. 1526–1535.
[45] R. Yi, Zipeng Ye, J. Zhang, H. Bao, and Yongjin Liu. “Audio‑driven Talking Face Video Generation with Learning‑based Personalized Head Pose”. In: arXiv: Computer Vision and Pattern Recognition (2020).
© International Telecommunication Union, 2021