[38] M. Kamp, L. Adilova, J. Sicking, F. Hüger, P. Schlicht, T. Wirtz, and S. Wrobel. Efficient decentralized deep learning by dynamic model averaging. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 393–409. Springer, 2018.

[39] S. P. Karimireddy, Q. Rebjock, S. U. Stich, and M. Jaggi. Error feedback fixes SignSGD and other gradient compression schemes. arXiv preprint arXiv:1901.09847, 2019.

[40] E. D. Karnin. A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1(2):239–242, 1990.

[41] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3128–3137, 2015.

[42] Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush. Character-aware neural language models. In AAAI, pages 2741–2749, 2016.

[43] A. Koloskova, T. Lin, S. U. Stich, and M. Jaggi. Decentralized deep learning with arbitrary communication compression. arXiv preprint arXiv:1907.09356, 2019.

[44] A. Koloskova, S. U. Stich, and M. Jaggi. Decentralized stochastic optimization and gossip algorithms with compressed communication. arXiv preprint arXiv:1902.00340, 2019.

[45] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.

[46] A. Lalitha, O. C. Kilinc, T. Javidi, and F. Koushanfar. Peer-to-peer federated learning on graphs. arXiv preprint arXiv:1901.11173, 2019.

[47] T. Li, Z. Liu, V. Sekar, and V. Smith. Privacy for free: Communication-efficient learning with differential privacy using sketches. arXiv preprint arXiv:1911.00972, 2019.

[48] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith. Federated learning: Challenges, methods, and future directions. arXiv preprint arXiv:1908.07873, 2019.

[49] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally. Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887, 2017.

[50] L. Liu, J. Zhang, S. Song, and K. B. Letaief. Edge-assisted hierarchical federated learning with non-IID data. arXiv preprint arXiv:1905.06641, 2019.

[51] D. Marpe and T. Wiegand. A highly efficient multiplication-free binary arithmetic coder and its application in video coding. In Proceedings 2003 International Conference on Image Processing (Cat. No. 03CH37429), volume 2, pages II–263. IEEE, 2003.

[52] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629, 2016.

[53] D. Molchanov, A. Ashukha, and D. Vetrov. Variational dropout sparsifies deep neural networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2498–2507. JMLR.org, 2017.

[54] B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693–701, 2011.

[55] A. Reisizadeh, H. Taheri, A. Mokhtari, H. Hassani, and R. Pedarsani. Robust and communication-efficient collaborative learning. In Advances in Neural Information Processing Systems, pages 8386–8397, 2019.

[56] A. K. Sahu, T. Li, M. Sanjabi, M. Zaheer, A. Talwalkar, and V. Smith. On the convergence of federated optimization in heterogeneous networks. arXiv preprint arXiv:1812.06127, 2018.

[57] F. Sattler, K.-R. Müller, and W. Samek. Clustered federated learning: Model-agnostic distributed multi-task optimization under privacy constraints. arXiv preprint arXiv:1910.01991, 2019.

[58] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek. Sparse binary compression: Towards distributed deep learning with minimal communication. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2019.

[59] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek. Robust and communication-efficient federated learning from non-i.i.d. data. IEEE Transactions on Neural Networks and Learning Systems, pages 1–14, 2019.

[60] M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, and B. Catanzaro. Megatron-LM: Training multi-billion parameter language models using GPU model parallelism. arXiv preprint arXiv:1909.08053, 2019.