Proceedings of the 2017 ITU Kaleidoscope

2017 ITU Kaleidoscope Academic Conference




$\{0, 1\}$, depending on the outcome of the similarity measure. That is, $S_f : C \times C \to \{0, 1\} \subset \mathbb{N}$, and its value is based on correlation with other objects or features:

$$
S_f(C_i, C_j) =
\begin{cases}
1 & \text{if } P_f(C_i) = P_f(C_j),\\
0 & \text{otherwise.}
\end{cases}
$$

Definition 2. (Coherence) The coherence of a pair of crimes $C_i, C_j$ is defined as the sum of their pairwise similarities over all $F$ features:

$$
\mathrm{Coherence}(C_i, C_j) = \sum_{f=1}^{F} S_f(C_i, C_j). \tag{1}
$$

Definition 3. (Significance Threshold (S)) The significance threshold $S$ for a set of crimes, $C$, in the feature space is defined as the coherence threshold that two crime objects $C_i$ and $C_j$ must reach to be considered similar. That is, if the two crimes have sufficiently many related attributes in common, we define crime similarity $S(C_i, C_j)$ as follows:

$$
S(C_i, C_j) =
\begin{cases}
1 & \text{if } \mathrm{Coherence}(C_i, C_j) \ge S,\\
0 & \text{otherwise.}
\end{cases}
$$

The crime similarity for any non-null crime object reference(s) has the following properties:

   1. $S(C_i, C_i) = \text{true}$ (i.e. 1); [reflexivity].
   2. $S(C_i, C_j) = \text{true} \iff S(C_j, C_i) = \text{true}$; [symmetry].
   3. $S(C_i, C_j) \ge 0$; [non-negativity].
   4. $S(C_i, C_j) = 0 \iff C_i$ and $C_j$ are independent; [well-definedness].
   5. $(S(C_i, C_j) = 0) \lor (S(C_i, C_j) = 1)$; [consistency].

Our thresholds combine a mathematical criterion with crime expert recommendations. The significance and prevalence thresholds measure the similarity support of interest and help to conceptualise the underlying graphical structure. They ensure that a link exists between two crimes if and only if the support of their similarity attributes is at least $S$ ($:= 5$) and the distance between them does not exceed $P$. The parameter $S$ comes from crime intelligence experts, as was also done in previous research [5], while the coefficient $P$ is a parameter we learn from the data. The prevalence threshold considers the "day", "time" and "location" attributes of a crime incident. These features are chosen because crimes in a series tend to occur in close space-time proximity. The significance threshold eliminates the first level of uncertainty between two crime objects: whether the crime objects, say $C_i$ and $C_j$, are sufficiently similar to be considered for further analysis. The prevalence characteristics (threshold $P$) then further affirm the proximity condition.

To learn a suitable value for parameter $P$, we consider the data set derived for analysis as shown in Table 2, so that the distance between crimes can be derived. In computing the distance measure that captures the information for the prevalence threshold, our approach adopts key principles of basic geometry and extends them to obtain 2-D components for the day and time attributes. The location (loc) attribute naturally has the longitude (long) and latitude (lat) as its 2-D components $(X, Y)$. The components for day and time are computed using standard geometric concepts.

Table 2. A depiction of the 2-D components for determining prevalence characteristics

   Geo    Loc            Day       Time
   C_1    (long, lat)    (x, y)    (x, y)
   C_2    (long, lat)    (x, y)    (x, y)
   ...    ...            ...       ...

More formally, $P$ is set to the 3rd quartile of a set of values computed as follows. For a crime object $C_i \in C$, we form the 6-component vector $A^i$ from the 2-D coordinates of $P_{day}(C_i)$, $P_{loc}(C_i)$ and $P_{time}(C_i)$. Thus,

$$
A^i = \left(P_{day}(C_i),\ P_{loc}(C_i),\ P_{time}(C_i)\right).
$$

If $\mathrm{Coherence}(C_i, C_j)$ is at least $S$ (the significance threshold), we compute the 6-D Euclidean distance $d_{ij} = \lVert A^i - A^j \rVert_2$ between $A^i$ and $A^j$. If the distance is within range, that is, not greater than the threshold $P$, then $C_i$ and $C_j$ are connected in the similarity graph. On the advice of crime experts, $P$ is set to the 3rd quartile of the $d_{ij}$'s.

Definition 4. (Similarity Graph) A similarity graph is an undirected graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges,

$$
E = \left\{ \{v_i, v_j\} : \Lambda(v_i, v_j) \ge S,\ (v_i, v_j) \vdash P,\ v_i, v_j \in V,\ v_i \neq v_j \right\},
$$

with

$$
\Lambda(v_i, v_j) = \sum_{f=1}^{F} S_f(v_i, v_j), \tag{2}
$$

and where $(v_i, v_j) \vdash P$ denotes that the pair satisfies the prevalence condition $d_{ij} \le P$.

[Figure 2: crime similarity graph over nine nodes, Crime 1 to Crime 9]
Fig. 2. Identifying sufficiently connected nodes in a crime similarity graph (red edges are min-cut)

The underlying dependency structure of crime incidents can be modelled using a graphical approach, as shown in Figure 2,


