true when the representation learning is applied to traffic packets.

Figure 5 – Illustration of the packet number selection for PERT classification

We performed PERT classification multiple times on the two data sets under different settings of "packet_num"; the results are shown in Figure 5. On each data set, the classification result improves greatly as the first few packets are added, but further packets bring only slight gains. For example, the F1-score reaches 91.35% when classifying the Android data set with 20 packets, yet this is only 1.28% higher than the result obtained with 5 packets. Given the cost of PERT encoding for so many packets and such a minor improvement, a large "packet_num" is not recommended.

We suggest that using 5-10 packets is sufficient for our PERT classification. Similar conclusions can also be found in other flow-level classification research such as [9], [10].
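
As a concrete illustration of this setting, the sketch below (our own Python, with hypothetical helper names; the paper does not provide code) shows how a flow can be truncated or padded to a fixed "packet_num" before PERT encoding:

    def select_packets(flow, packet_num=5, pad=b""):
        # Keep only the first packet_num packets of the flow; pad short
        # flows with empty placeholder packets so every sample has the
        # same fixed length expected by the encoder.
        chosen = list(flow[:packet_num])
        chosen += [pad] * (packet_num - len(chosen))
        return chosen

    # Example: a 3-packet flow padded up to packet_num = 5.
    flow = [b"\x16\x03\x01", b"\x16\x03\x03", b"\x17\x03\x03"]
    print(select_packets(flow))  # 5 items: 3 payloads + 2 empty pads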

4.4    Discussion: Merging of the Encoded Packets

A major difference between our PERT classification and most flow-level DL-based methods such as HAST-II is how the encoded packets are merged. HAST-II constructs a 2-layer LSTM after encoding the packet data, whereas we simply apply a concatenation. To compare the two approaches, we modify both our PERT model and the HAST-II model.

First, following HAST-II, we construct the PERT_lstm model by attaching a 2-layer LSTM to our PERT-encoded packets. Then, we remove the LSTM layer from HAST-II and generate HAST_con by concatenating the HAST-encoded packets to fit an ordinary softmax classifier, just as our original PERT model does. For all the compared methods, we consistently select 5 packets for classification, based on the discussion above.

Figure 6 – F1-score converging speed comparison

We perform validation every training epoch for each classification experiment and record the corresponding F1-scores for evaluation. As illustrated in Figure 6, we cannot actually tell which merging approach is better in terms of classification accuracy: whether concatenation or the LSTM is used for merging has no major influence on the final classification results.

However, the choice of merging approach has an obvious impact on the convergence speed of classification training. In Figure 6, the model always converges within fewer training epochs when concatenation merging is used. We believe the LSTM is not a satisfactory option for merging the encoded packets, since a simple concatenation reaches a very close classification result while training much faster.
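
To make the two merging strategies concrete, the following PyTorch sketch (our own illustration, not the authors' published code; the dimensions, class names, and class count are assumptions) contrasts concatenation merging, as in PERT and HAST_con, with 2-layer LSTM merging, as in HAST-II and PERT_lstm:

    import torch
    import torch.nn as nn

    class ConcatMerge(nn.Module):
        # PERT-style merging: concatenate the packet_num encoded packet
        # vectors and feed them to an ordinary softmax classifier.
        def __init__(self, packet_num, enc_dim, num_classes):
            super().__init__()
            self.fc = nn.Linear(packet_num * enc_dim, num_classes)

        def forward(self, enc):              # enc: (batch, packet_num, enc_dim)
            flat = enc.flatten(start_dim=1)  # (batch, packet_num * enc_dim)
            return self.fc(flat)             # logits; softmax lives in the loss

    class LstmMerge(nn.Module):
        # HAST-II-style merging: a 2-layer LSTM reads the packet sequence
        # and its final hidden state feeds the classifier.
        def __init__(self, enc_dim, hidden_dim, num_classes):
            super().__init__()
            self.lstm = nn.LSTM(enc_dim, hidden_dim, num_layers=2,
                                batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, enc):
            _, (h_n, _) = self.lstm(enc)     # h_n: (2, batch, hidden_dim)
            return self.fc(h_n[-1])          # top layer's last hidden state

    # Both heads accept the same encoded input, e.g. 5 packets of 768-d
    # vectors (an assumed encoder width) over 12 assumed classes.
    enc = torch.randn(8, 5, 768)
    print(ConcatMerge(5, 768, 12)(enc).shape)  # torch.Size([8, 12])
    print(LstmMerge(768, 256, 12)(enc).shape)  # torch.Size([8, 12])

Under these assumptions, the concatenation head is a single linear map over a fixed-size input, which is consistent with the faster convergence observed above, while the LSTM head must additionally learn sequential dynamics across packets.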








