Page 109 - Proceedings of the 2017 ITU Kaleidoscope
P. 109

Challenges for a data-driven society




          based on a metric similarity function, say Λ(v i , v j ) as shown  then return (G)
          in Equation (2). The similarity function, S f (v i , v j ), mea-  else
          sures the similarity between crimes v i , v j in the f th  feature.  HCS(H)
                                                                                ¯
          The representation of the crime similarity data in a similar-   HCS(H)
          ity graph helps to simplify computation, since we then only  end if
          need to find sufficiently inter-connected sets of nodes [13],  end
          which is referred to as highly connected sub-graphs (HCS)
          as shown in Figure 2. The HCS approach significantly helps  The correctness and completeness of the approach consid-
          to simplify the computation of relevant clusters.  ered in this research can be seen as a direct consequence of
                                                             the correctness and completeness of the methods and algo-
                                                             rithms utilised for achieving the target task [13],[14].
          3.3. CriClust Problem Specification

             • Specifying the Similarity Graph
                                                                  4. CRICLUST WEB-BASED KNOWLEDGE
               For a crime object C, let L d (C) denote the 2-D (x,          SUPPORT SYSTEM
               y) component of day attribute, L t (C) - the 2-D (x,
               y) component of time attribute, and L l (C) - the 2-D  It is important to stress that while this research explores a
               (longitude, latitude) component of location attribute.  rape database for identifying crime series, the idea consid-
               Moreover, let L i (C) be the i th  attribute of the crime  ered in this study can be extended to other forms of crime.
               object C. The specification for CriClust model is as  We carefully and deliberately try to focus our analysis on
               follows:                                      relevant attributes that will assist the analysis according to
               m : N           number of attributes (≥ 9)    crime expert advice.
               S : N          significance threshold (≥ 7)
               P : N           prevalence threshold (≤11)
                                                             4.1. System Overview
               N : N           number of crime objects
                    m
               C : R           crime objects with m features each  The experimental set-up was implemented using Java Net-
                                 m    m
               G : array [N, N) of R ×R       similarity graph  Beans platform with multi-threading, Apache Derby Net-
               (originally empty)                            work Server 10.10.2.0., with security manager installed
                    q
                        r
               For C , C ∈ C                                 using the basic server security policy. The experiment was
                              r
                                       q
                   if |{i : L i (C ) = L i (C )}| ≥ S    suffi-  conducted on an Inspiron-7347 DELL machine, Intel(R)
               ciently similar crime objects                 Core(TM) i5-4210U CPU @ 1.70GH. The data considered
                                  q
                                         q
                                                q
                         q
                       A := (L d (C ), L t (C ), L l (C ))   for the experiment consist of 5500 rape crime records across
                                  r
                         r
                                                r
                                         r
                       A := (L d (C ), L t (C ), L l (C ))   40 locations (suburbs) in Western Cape, South Africa, com-
                                           r
                                        q
                           if Euclid-dist(A , A ) ≤ P        prising of 13 attributes of relevant features as prescribed by
                                        q
                                           r
                               G+ = (C , C ) }               the crime intelligence unit. In what follows, we present some
               Return G                  Similarity graph    results to show the potential usefulness and reliability of the
                                                             CriClust model.
             • Specifying the Minimum Cut (min-cut) using Karger’s
               algorithm [14] on a graph G = (V, E)
               minCut(G = (V, E))
                   G 0 ← G
                   j = 0
                   While G j has |V | > 2 do
                       pick an edge e j from G j at random
                       G j+1 ← G j \ e j
                       j ← j + 1
                   {T, V \ T} is the cut in the original graph that
               corresponds to the min-cut of G j
             • Specifying the Highly Connected Sub-graph (HCS)
               For a graph G, let E(G) be the edge set, V (G) be the  Fig  3.  CriClust  selection  interface  for  data  processing
               vertex set and C be a minimum cut  5
                                                                                   features
               HCS(G(V, E))
               begin                                         Figure 3 shows the process selection interface for data pro-
                        ¯
                   {H, H, C} ←− minCut(G)
                                                             cessing features. Upon successful login, the functionalities
                   if G is highly connected
                                                             of the system use this interface for flexible feature selection.
             5                                               Other features from the view include “crime forms” tab that
             A minimum cut, C, is the minimum number of edges which separates
                           ¯
          G into sub-graphs H and H.                         allows users to capture crime data. There is also an added
                                                          – 93 –
   104   105   106   107   108   109   110   111   112   113   114