ML-based: We refer to [15] to implement our ML-based method using a decision tree classifier (we name it ML-1). However, it only contains basic flow-statistical features, which we consider not the most optimized ML-based method. Thus, based on ML-1, we further add time series features, namely the source ports, destination ports, directions, packet lengths, and arrival time intervals of the first 10 packets in a flow, to generate the ML-2 model.
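To make this baseline concrete, the following is a minimal sketch of how an ML-2-style feature vector could be assembled and fed to a decision tree with scikit-learn. The helper function, the flow dictionary fields, and the zero-padding are our own illustrative assumptions, not details taken from [15].

```python
# Illustrative sketch only: basic flow statistics plus time series
# features from the first 10 packets, fed to a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

N_PACKETS = 10  # leading packets used for the time series features

def flow_to_features(flow):
    """Turn one flow (a dict of whole-flow statistics and per-packet
    metadata, an assumed format) into a fixed-length feature vector."""
    pkts = flow["packets"][:N_PACKETS]
    feats = [flow["duration"], flow["total_bytes"], flow["total_packets"]]
    for p in pkts:
        feats += [p["src_port"], p["dst_port"], p["direction"], p["length"]]
    feats += list(np.diff([p["timestamp"] for p in pkts]))  # arrival intervals
    expected = 3 + N_PACKETS * 4 + (N_PACKETS - 1)
    return feats + [0.0] * (expected - len(feats))  # zero-pad short flows

# Given labelled flows, an ML-2-style classifier is then simply:
# clf = DecisionTreeClassifier().fit([flow_to_features(f) for f in flows], labels)
```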
CNN: The two types of CNN models provided by [7] are the 1D-CNN and the 2D-CNN. Both use the first 784 bytes of a traffic flow to perform the classification.
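For reference, a byte-level 1D-CNN of this kind can be sketched as follows with Keras; the filter counts, kernel sizes, and number of classes are illustrative assumptions, not the exact architecture of [7].

```python
# Hedged sketch of a 1D-CNN over the first 784 bytes of a flow.
import tensorflow as tf

NUM_CLASSES = 12  # assumed number of traffic classes; adjust to the task

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784, 1)),  # raw bytes scaled to [0, 1]
    tf.keras.layers.Conv1D(32, kernel_size=25, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=3),
    tf.keras.layers.Conv1D(64, kernel_size=25, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```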
HAST: The two HAST models proposed by [10] are state-of-the-art end-to-end methods for intrusion detection. HAST-I uses the first 784 bytes of a flow for direct representation learning. HAST-II, in contrast, only performs packet-level encoding; it further introduces an LSTM to merge the encoded packets.
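The "packet-level encoding + flow-level merging" design of HAST-II can be sketched roughly as below; the packet count, per-packet byte budget, and layer sizes are our assumptions and do not reproduce the published HAST-II configuration.

```python
# Rough sketch of a HAST-II-style model: encode each packet separately
# with a shared CNN, then merge the packet vectors with an LSTM.
import tensorflow as tf

N_PACKETS, PKT_BYTES, NUM_CLASSES = 10, 100, 12  # assumed sizes

model = tf.keras.Sequential([
    # first PKT_BYTES bytes of each of the first N_PACKETS packets
    tf.keras.Input(shape=(N_PACKETS, PKT_BYTES, 1)),
    # packet-level encoding: the same small CNN applied to every packet
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu")),
    tf.keras.layers.TimeDistributed(tf.keras.layers.GlobalMaxPooling1D()),
    # flow-level merging: an LSTM consumes the sequence of packet vectors
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
```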
During the evaluation, we randomly chose 90% of the samples from the data set as the training set and the remaining 10% for validation. Then, three widely used classification metrics are applied:
\[
\mathrm{Precision}\,(P) = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall}\,(R) = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \times P \times R}{P + R}
\tag{9}
\]
Take a class yi as an example: TPi is the number of samples correctly classified as yi, FPi is the number of samples mistakenly classified as yi, and FNi is the number of samples mistakenly classified as non-yi. For the overall evaluation over all classes, we use the average values of those metrics.
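As a minimal sketch of this evaluation protocol with scikit-learn (the decision tree is just a stand-in for any of the models above, and X / y denote an already-prepared feature matrix and label vector):

```python
# Random 90/10 train/validation split, then per-class precision,
# recall and F1 averaged over all classes ("macro" averaging).
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.10)
    clf = DecisionTreeClassifier().fit(X_tr, y_tr)  # interchangeable model
    y_pred = clf.predict(X_val)
    p, r, f1, _ = precision_recall_fscore_support(y_val, y_pred,
                                                  average="macro")
    return p, r, f1
```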
4.2   Overall Analysis
        Table 3 – Classification results (ISCX data set)

    Model          Precision    Recall      F1
    ML-1 [15]      0.8194       0.8136      0.8164
    ML-2           0.8901       0.8896      0.8898
    CNN-1D [7]     0.8616       0.8605      0.8610
    CNN-2D [7]     0.8425       0.8420      0.8422
    HAST-I [10]    0.8757       0.8729      0.8742
    HAST-II [10]   0.8502       0.8427      0.8409
    PERT (Ours)    0.9327       0.9322      0.9323
Results on the ISCX data set: This group of experiments is used to discuss the classification under the consistent data settings of [7]. As we can see in Table 3, our flow-level PERT classification achieves the best classification results, with precision reaching 93.27% and recall reaching 93.22%. This proves that PERT is a powerful representation learning method for encrypted traffic classification.

As for the other models, using the same manner of data preprocessing, the CNN classification results are quite close to those reported in [7]. The CNN methods obtain clearly higher precision and recall than ML-1, which is implemented based on [15]. However, ML-1 can still be improved: when the time series features are added, the precision of the ML-2 classification exceeds 89%, which is much better than what the basic CNN methods achieve. In other words, the basic CNN methods actually have no absolute advantage when classifying the ISCX data set.

HAST-I achieves better results than the typical CNN models, yet HAST-II with an LSTM performs relatively worse. In fact, we think that applying deep learning directly to the first few bytes of a flow (as in HAST-I and CNN-1D) is better than merging packet-level encoded vectors, since the representation learning can directly capture flow-level information. However, the encoding cost of such a long string is not affordable for complex dynamic word embedding. At the current stage, “packet-level encoding + flow-level merging” is the best option for our PERT classification.

        Table 4 – Classification results (Android data set)

    Model          Precision    Recall      F1
    ML-1 [15]      /            /           /
    ML-2           0.7351       0.7335      0.7321
    CNN-1D [7]     0.7709       0.7683      0.7668
    CNN-2D [7]     0.7684       0.7659      0.7643
    HAST-I [10]    0.8201       0.8185      0.8167
    HAST-II [10]   0.7924       0.7813      0.7826
    PERT (Ours)    0.9042       0.9003      0.9007

Results on the Android data set: These experiments are based on full HTTPS traffic, to evaluate the actual encrypted traffic classification ability of each method. As all the data here are HTTPS flows, in comparison with the ISCX data set, whose data covers several traffic protocols, it is harder to locate distinctly different flow behaviors among the chosen applications. Consequently, the ML-based methods that strongly rely on flow-statistical features perform extremely poorly. Even when enhanced by time series features, ML-2 still obtains a worse result than the basic DL methods. As for the original ML-1, we find it entirely incapable of addressing this 100-class HTTPS classification, so we omit its result from Table 4.

The results on the Android data set demonstrate that the DL-based methods are more suitable for processing fully encrypted traffic data. More importantly, our PERT classification again shows its superiority, as it introduces a more powerful representation learning strategy. Its F1-score on the 100-class encrypted traffic classification exceeds 90%, whereas HAST achieves at most 81.67%.

4.3   Discussion: Selection of the Packet Number

In a flow-level classification model, increasing the number of packets used incurs significant costs. This is particularly




