Page 88 - Kaleidoscope Academic Conference Proceedings 2020





(e.g., CPU cycles), resource utilization (i.e., the amount of the allocated resource that is utilized or kept busy), and performance latency (i.e., the time taken by the VNF to process a service request and provide the response). The data is collected at the highest frequency supported by the system, i.e., without hampering system performance. The data is cleaned and processed to extract the most relevant features and their values (e.g., mean, median, maximum, or minimum values) according to their correlation coefficients with the target variables. The data set is then split into training and test data sets.
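The preprocessing step above (ranking candidate features by their correlation with the target variables, then splitting the observations into training and test sets) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the feature names (`cpu_mean`, `noise`) and helper functions are invented for the example.

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two value sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_features(features, target, k):
    """Keep the k candidate features most correlated (in absolute value)
    with the target variable; `features` maps name -> list of observations."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the cleaned observations and split them into training
    and test data sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

# Illustrative monitoring data: per-interval feature summaries and the
# target metric (here, latency); the feature names are made up.
target = [10.0, 20.0, 30.0, 40.0, 50.0]
features = {"cpu_mean": [2.0, 4.0, 6.0, 8.0, 10.0],
            "noise":    [5.0, 1.0, 4.0, 2.0, 3.0]}
print(select_features(features, target, k=1))  # -> ['cpu_mean']
```

In practice a library routine (e.g., a data-frame correlation matrix and a ready-made splitter) would replace these helpers, but the selection criterion is the same: features with weak correlation to the target metrics are dropped before model fitting.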

The offline training procedure is shown in Figure 3. To train the model, we fit regression model(s) to the training data by tuning hyper-parameters. During training, each model is ranked according to its prediction error and training time. This procedure is repeated until all test data sets have been used, after which the model with the lowest prediction error is selected.

3.3.   Online retraining

By retraining the model online, we can update it dynamically so that it can deal with changing workload patterns and system status. The flowchart of dynamic resource adjustment with online retraining of regression models is shown in Figure 4. At one-second intervals, the current workload is observed from the system and fed to the best-trained model, which determines the amount of resources required to maintain the target values of the latency and utilization metrics. Accordingly, the decision to either increase or decrease the amount of allocated resource is executed through the control commands of the container platform. In parallel, we append the currently observed workloads, performance metrics, and resource allocation and utilization data to the training data set as the latest entry. Since a fixed training data set size ensures a deterministic model training time (including hyper-parameter tuning), whenever we add a new entry to the training data set we drop the oldest entry, keeping the training data set size unchanged. When some fixed number (say Z) of observations has been added to the training data set, the models are retrained online using a copy of the most recent training data set. The retrained models replace the old ones at the beginning of the next time slot. If the workload variation is significant, a small value of Z is preferred, and vice versa. Retraining the models at regular intervals in this way helps cope with the unknown nature of future workload variations. Note that multiple trained models can be used to predict the proper values of resource allocation: we can select a single model or create an ensemble of several models to improve prediction accuracy.

Figure 4 - Flowchart of model deployment and online retraining

4.  PERFORMANCE EVALUATION RESULTS

In this section, we describe the parameter settings of the IoT-DS experimental system, its setup, input workload patterns, evaluation criteria, and evaluation results.

4.1.   Experimental setup and considerations

Figure 5 - Experimental setup for performance evaluation

As shown in Figure 5, the IoT-DS test bed system consisted of three virtual machines (VMs): two VMs served as a resource controller (RC) and an EU sending lookup queries for records, respectively, and the third VM contained the Docker-containerized IoT-DS module with 100K records in its registry database (DB). The VMs were created with the VirtualBox software on a Windows 8.1 PC with an Intel Core i7-5930K 12-core CPU (3.5 GHz) and 64 GB of memory. The front end of IoT-DS received and processed lookup queries from the EU and sent back the requested IoT device records in response messages after the queries were processed against the back-end database. We exclusively allocated one CPU core to the Docker container that implemented the front end. This core was the monitoring target for both the computational resource (i.e., the number of CPU cycles, or CPU time) allocated to the container and the performance in terms of record lookup response time (measured as the difference between the time instant t1 at which a lookup query is issued by the EU and the time instant t2 at which the corresponding response is received). We applied regression models to predict the number of CPU cycles to be allocated to the container based on different patterns of input workloads and current resource utilizations. We aimed at



