Summary

The deep neural network (DNN) model inference process usually requires a large amount of computing resources and memory, so it is difficult for end devices to execute DNN model inference independently. An effective way to implement end-edge collaborative DNN execution is DNN model partition, which can reduce latency and improve resource utilization at the same time. Recommendation ITU-T F.748.20 specifies the technical framework for DNN model partition and collaborative execution. First, the overall inference latency of each candidate DNN partition strategy is predicted in advance under the current system state. Then, appropriate partition locations and a collaborative execution strategy are chosen based on the device computation capabilities, the network status and the DNN model properties. Finally, collaborative execution of the model is implemented while resource allocation is optimized.
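As an illustration of the first two steps, the sketch below selects a partition point for a chain-structured DNN by minimizing predicted end-to-end latency. It is a minimal example only and is not defined by this Recommendation; the function and parameter names (choose_partition_point, device_ms, edge_ms, out_bytes, bandwidth_bps, input_bytes) and the profiling inputs are hypothetical assumptions.

```python
# Minimal sketch of latency-aware partition point selection for a
# chain-structured DNN. All inputs are hypothetical profiling data:
#   device_ms[i] / edge_ms[i]: predicted latency of layer i on the end
#                              device / edge server, in milliseconds
#   out_bytes[i]: output size of layer i (data sent if the cut is after i)
#   bandwidth_bps: current uplink bandwidth, in bytes per second
#   input_bytes: size of the raw model input

def choose_partition_point(device_ms, edge_ms, out_bytes,
                           bandwidth_bps, input_bytes):
    """Return (best_cut, best_latency_ms): layers [0, best_cut) run on the
    end device, layers [best_cut, n) run on the edge server."""
    n = len(device_ms)
    best_cut, best_latency = 0, float("inf")
    for cut in range(n + 1):  # cut == 0: all on edge; cut == n: all on device
        device_part = sum(device_ms[:cut])
        edge_part = sum(edge_ms[cut:])
        # Data transmitted at the cut point: the raw input if nothing runs
        # on the device, otherwise the last device-side layer's output.
        sent = input_bytes if cut == 0 else out_bytes[cut - 1]
        transfer = 0.0 if cut == n else sent / bandwidth_bps * 1000.0
        total = device_part + transfer + edge_part
        if total < best_latency:
            best_cut, best_latency = cut, total
    return best_cut, best_latency


# Example with made-up profiles for a four-layer model and a ~1 MB/s uplink:
cut, latency = choose_partition_point(
    device_ms=[4.0, 9.0, 7.0, 3.0],
    edge_ms=[0.5, 1.2, 0.9, 0.4],
    out_bytes=[600_000, 150_000, 40_000, 4_000],
    bandwidth_bps=1_000_000,
    input_bytes=1_200_000,
)
```

In a full implementation of the framework, the per-layer latencies would come from the latency prediction step, and the chosen cut point would be re-evaluated as the network bandwidth and device load change.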