ML-based: We refer to [15] to implement our ML-based method using a decision tree classifier (we name it ML-1). However, ML-1 only uses basic flow-statistical features, which we do not consider the most effective ML-based configuration. Thus, based on ML-1, we further add time series features, namely the source ports, destination ports, directions, packet lengths and inter-arrival times of the first 10 packets in a flow, to build the ML-2 model.
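To make the feature construction concrete, the following is a minimal sketch of how such time series features could be flattened into a fixed-length vector; the Packet fields and the zero-padding scheme are our assumptions for illustration, not details taken from [15].

```python
from dataclasses import dataclass
from typing import List

# Hypothetical per-packet record; field names are ours, not from [15].
@dataclass
class Packet:
    src_port: int
    dst_port: int
    direction: int      # +1 client-to-server, -1 server-to-client
    length: int
    timestamp: float

def time_series_features(flow: List[Packet], n: int = 10) -> List[float]:
    """Flatten per-packet attributes of the first n packets into a
    fixed-length feature vector; short flows are zero-padded."""
    feats: List[float] = []
    prev_ts = flow[0].timestamp if flow else 0.0
    for i in range(n):
        if i < len(flow):
            p = flow[i]
            feats += [p.src_port, p.dst_port, p.direction, p.length,
                      p.timestamp - prev_ts]   # inter-arrival time
            prev_ts = p.timestamp
        else:
            feats += [0, 0, 0, 0, 0.0]         # padding for short flows
    return feats
```

The resulting vector would simply be concatenated with the basic flow-statistical features of ML-1 to form the ML-2 input.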
CNN: We adopt the two CNN models provided by [7], the 1D-CNN and the 2D-CNN. Both use the first 784 bytes of a traffic flow to perform the classification, as sketched below.
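For reference, here is a minimal sketch of the byte-level input preparation these models rely on; the zero-padding and [0, 1] normalization follow common practice and are our assumptions rather than a verbatim reproduction of the preprocessing in [7].

```python
import numpy as np

FLOW_LEN = 784  # first 784 bytes of a flow, as used by the models in [7]

def flow_to_vector(raw: bytes) -> np.ndarray:
    """Truncate or zero-pad the flow's byte stream to 784 bytes and
    normalize each byte to [0, 1]; this serves as the 1D-CNN input."""
    buf = raw[:FLOW_LEN].ljust(FLOW_LEN, b"\x00")
    return np.frombuffer(buf, dtype=np.uint8).astype(np.float32) / 255.0

def flow_to_image(raw: bytes) -> np.ndarray:
    """Reshape the same 784-byte vector into a 28 x 28 grid,
    a natural image-like input for the 2D-CNN variant."""
    return flow_to_vector(raw).reshape(28, 28)
```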
HAST: The two HAST models proposed in [10] are state-of-the-art end-to-end methods for intrusion detection. HAST-I uses the first 784 bytes of a flow for direct representation learning. HAST-II, in contrast, performs only packet-level encoding and then introduces an LSTM to merge the encoded packets.
During the evaluation, we randomly chose 90% of the samples from the data set as the training set and the remaining 10% for validation. Then, three widely used classification metrics are applied:
$$
\begin{aligned}
\text{precision}(i) &= \frac{TP_i}{TP_i + FP_i} \\
\text{recall}(i) &= \frac{TP_i}{TP_i + FN_i} \\
F1(i) &= \frac{2 \times \text{precision}(i) \times \text{recall}(i)}{\text{precision}(i) + \text{recall}(i)}
\end{aligned}
\tag{9}
$$
Take a class $y_i$ as an example: $TP_i$ is the number of samples correctly classified as $y_i$, $FP_i$ is the number of samples mistakenly classified as $y_i$, and $FN_i$ is the number of samples of $y_i$ mistakenly classified as non-$y_i$. As for the overall evaluation over all classes, we use the average values of these metrics.
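As a concrete illustration of this protocol, here is a minimal, self-contained sketch using scikit-learn; the decision tree and random data are stand-ins, and macro averaging is one plausible reading of "average values", so treat the details as assumptions rather than the paper's exact pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

# Toy stand-ins for the real feature matrix and class labels.
X = np.random.rand(1000, 25)
y = np.random.randint(0, 5, size=1000)

# 90%/10% train/validation split, matching the protocol above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y)

clf = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_val)

# Per-class precision/recall/F1 of Eq. (9), averaged over all classes.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_val, y_pred, average="macro", zero_division=0)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```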
4.2 Overall Analysis

Table 3 – Classification results (ISCX data set)
Model          Precision   Recall   F1
ML-1 [15]      0.8194      0.8136   0.8164
ML-2           0.8901      0.8896   0.8898
CNN-1D [7]     0.8616      0.8605   0.8610
CNN-2D [7]     0.8425      0.8420   0.8422
HAST-I [10]    0.8757      0.8729   0.8742
HAST-II [10]   0.8502      0.8427   0.8409
PERT (ours)    0.9327      0.9322   0.9323
Results on the ISCX data set: This group of experiments discusses classification under data settings consistent with [7]. As we can see in Table 3, our flow-level PERT classification achieves the best results, with precision reaching 93.27% and recall reaching 93.22%. This proves that PERT is a powerful representation learning method for encrypted traffic classification.

As for the other models, using the same manner of data preprocessing, the CNN classification results are quite close to those reported in [7]. The CNN methods clearly obtain higher precision and recall than ML-1, which is implemented based on [15]. However, ML-1 can still be improved: when the time series features are added, the precision of the ML-2 classification exceeds 89%, which is much better than what the basic CNN methods achieve. In other words, the basic CNN methods have no absolute advantage when classifying the ISCX data set.

HAST-I achieves better results than the typical CNN models, yet HAST-II with its LSTM performs relatively worse. In fact, we think that performing deep learning directly on the first bytes of a flow (as HAST-I and CNN-1D do) works better than merging packet-level encoded vectors, since the representation learning can directly capture flow-level information. However, the encoding cost of such a long string is not affordable for complex dynamic word embedding. At the current stage, "packet-level encoding + flow-level merging" is the best option for our PERT classification, as sketched below.
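To make the "packet-level encoding + flow-level merging" idea concrete, here is a schematic PyTorch sketch. The feed-forward packet encoder and all dimensions are illustrative placeholders (PERT uses a dynamic word-embedding encoder and HAST-II a CNN per packet), not the real configurations.

```python
import torch
import torch.nn as nn

class EncodeThenMerge(nn.Module):
    """Schematic 'packet-level encoding + flow-level merging' model:
    each packet is encoded independently, then an LSTM merges the
    per-packet vectors into a single flow representation."""
    def __init__(self, pkt_len=100, enc_dim=128, hidden=256, n_classes=100):
        super().__init__()
        self.pkt_encoder = nn.Sequential(          # stand-in packet encoder
            nn.Linear(pkt_len, enc_dim), nn.ReLU())
        self.merger = nn.LSTM(enc_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, flows):                      # (batch, n_pkts, pkt_len)
        b, n, l = flows.shape
        enc = self.pkt_encoder(flows.reshape(b * n, l)).reshape(b, n, -1)
        _, (h, _) = self.merger(enc)               # final hidden state
        return self.head(h[-1])                    # flow-level class logits
```

For example, `EncodeThenMerge()(torch.rand(32, 10, 100))` yields logits for a batch of 32 flows of 10 packets each, with the LSTM's final hidden state serving as the flow-level representation.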
Table 4 – Classification results (Android data set)

Model          Precision   Recall   F1
ML-1 [15]      /           /        /
ML-2           0.7351      0.7335   0.7321
CNN-1D [7]     0.7709      0.7683   0.7668
CNN-2D [7]     0.7684      0.7659   0.7643
HAST-I [10]    0.8201      0.8185   0.8167
HAST-II [10]   0.7924      0.7813   0.7826
PERT (ours)    0.9042      0.9003   0.9007

Results on the Android data set: These experiments use full HTTPS traffic to evaluate the actual encrypted traffic classification ability of each method. As all the data here are HTTPS flows, in contrast with the ISCX data set, whose data covers several traffic protocols, it is harder to find distinctly different flow behaviors among the chosen applications. Consequently, the ML-based methods that rely strongly on flow-statistical features perform extremely poorly. Even when enhanced by time series features, ML-2 still obtains a worse result than the basic DL methods. As for the original ML-1, we find it entirely incapable of addressing this 100-class HTTPS classification, so we omit its result in Table 4.

The results on the Android data set demonstrate that the DL-based methods are more suitable for processing fully encrypted traffic data. More importantly, our PERT classification again shows its superiority, as it introduces a more powerful representation learning strategy: its F1-score on the 100-class encrypted traffic classification exceeds 90%, whereas the best HAST model only achieves 81.67%.
4.3 Discussion: Selection of the Packet Number

In a flow-level classification model, increasing the number of packets used incurs significant costs. This is particularly