Page 107 - Proceedings of the 2017 ITU Kaleidoscope

Challenges for a data-driven society




within the framework is central to deriving a useful spatial analysis. This means that when crime data from one spatial region is analysed alongside spatial data from another region or province, there is a tendency for over-fitting or under-fitting in the resulting model or pattern, leading to poor predictions on new data sets. Thus, the spatial characteristics of data within a specified proximity are crucial during analysis. Such proximity-centred analysis can be achieved if local stations are empowered to analyse data from their own region or suburb effectively.

2.3. Paucity of Research in Crime Series Identification in Developing Nations

Over the past decade, there has been significant research effort on crime mining, for example in the areas of hotspots and spatio-temporal research [6],[7],[8], but there is a paucity of research on crime series identification, particularly in developing nations [9]. Moreover, while crime series identification seems to be gaining the attention of researchers in developed countries such as the USA [10],[11], its exploration in developing nations remains minimal, despite its critical importance for improving public safety in smart city development.

Crime series analysis focuses on crimes thought to have been committed by the same individual or group of offenders, which may not necessarily occur at hotspot locations [4]. Experience has shown that many crimes are committed by repeat offenders [10],[11],[12]. However, our findings reveal that the crime intelligence units in most developing nations (e.g., South Africa) do not currently have an automated means of identifying these similar attributes or incidents. Hence, this research focuses on the development of a crime series mining model, CriClust, augmented with a dual-threshold scheme, which applies established theoretical concepts from clustering (highly connected sub-graphs and similarity ranking) [13] to derive useful evidence for security agencies as a way to improve public safety outcomes in developing nations.

3. CRICLUST MODEL FORMULATION

3.1. Data Used: Rape Database

This work serves to assist in identifying CSP in rape data; however, it can be extended to other forms of crime. The motivation for considering rape is that, despite heightened sensitivity to and understanding of sexual assault and violence, rape, assault and murder (particularly of women and children) remain of great concern in South African communities⁴.

Table 1 presents a description of some of the features and subjects considered in this research. The prefix on the gender information (e.g., I-male, B-female) represents the different racial population categories in SA.

Table 1. Description of the different categories of important features considered

Features                         Categories
Victim and Suspect information   Indian-male (I-male), Indian-female (I-female),
                                 Black-male (B-male), Black-female (B-female),
                                 White-male (W-male), White-female (W-female),
                                 Coloured-male (C-male), Coloured-female (C-female)
Method of victim capture         Lured, Kidnapped, Weapon, Deceit
Incident location/day/time       Time and location information
Substance abuse suspected        Traces of substance (drug) abuse: (yes, no, unsure)
Suspect disguised (Masked)       (yes, no, unsure)

3.2. Problem Definition and Analysis

The proposition in this study is that most crime patterns exhibit at least a $k$ minimum principal set that characterises the modus operandi (MO) of the offender(s). This minimum principal set induces a similarity graph of crime objects and can reveal both specific and general crime trends. To identify crime series in a (rape) crime database, a hybrid model called CriClust, which combines similarity concepts, geometric projection, and graph connectivity (highly connected subgraphs), was adopted. CriClust is augmented with a dual-threshold scheme. First, a crime similarity function was derived to connect crime instances that share related attribute information, based on the dual-threshold scheme. The similar objects are then modelled as a graph, yielding a similarity graph based on an established graph-theoretic model, which is then partitioned into highly connected subgraphs of related crimes [13].

Let $C$ be a set of crime items or objects, where each crime object $C_i \in C$ is defined by a set of attributes $A(C_i)$ with cardinality $F$. Our interest lies in crime objects that exhibit a coherent pattern on a subset of the attributes in $A$. This requires understanding the different characteristics of a data set and prioritising the features that promote the goal of the analysis. The measure used in this work identifies attribute similarity between crimes $C_i$ and $C_j$ based on two thresholds, $S$ and $P$, for sufficiently high (strict) coherence, where $S$ is the interest similarity support measure (significance threshold) and $P$ is the prevalence support threshold. We therefore introduce the following definitions.

Definition 1. (Instance Feature (IF)) Consider a crime $C_i \in C$ and a feature $f$. Let $P_f(C_i)$ be the value of the $f$-th feature in $C_i$. For example, if crime $C_2$ occurs on a Monday, then $P_{\text{day}}(C_2) = \text{Monday}$.
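To make the stages of this subsection concrete, the following Python sketch runs them on made-up records: an instance-feature lookup in the spirit of Definition 1, a per-feature Kronecker-delta match (assumed here to return 1 for identical values and 0 otherwise), a thresholded similarity graph, and its partition into highly connected subgraphs. It is a minimal sketch under stated assumptions, not the CriClust implementation. The single threshold `min_shared` (the number of matching features needed to link two crimes) is a hypothetical stand-in for the paper's dual $S$/$P$ scheme, and the records and feature names are invented. The partitioning step is a textbook highly-connected-subgraphs (HCS) procedure: it recursively splits a graph along a Stoer–Wagner minimum cut until each part's edge connectivity exceeds half its vertex count.

```python
from collections.abc import Iterable
from itertools import combinations

# Hypothetical crime records: feature name -> value (names loosely follow Table 1).
Crime = dict[str, str]

FEATURES = ["capture", "day", "masked", "substance"]
SAMPLE_CRIMES: list[Crime] = [
    {"capture": "Lured", "day": "Monday", "masked": "yes", "substance": "no"},
    {"capture": "Lured", "day": "Monday", "masked": "yes", "substance": "unsure"},
    {"capture": "Lured", "day": "Monday", "masked": "yes", "substance": "no"},
    {"capture": "Weapon", "day": "Friday", "masked": "no", "substance": "yes"},
    {"capture": "Weapon", "day": "Friday", "masked": "no", "substance": "no"},
    {"capture": "Weapon", "day": "Friday", "masked": "no", "substance": "yes"},
]


def instance_feature(crime: Crime, f: str) -> str:
    """P_f(C_i): the value of feature f in crime C_i (Definition 1)."""
    return crime[f]


def kronecker(a: str, b: str) -> int:
    """Kronecker delta (assumed form): 1 if the two values are identical, else 0."""
    return 1 if a == b else 0


def shared_features(ci: Crime, cj: Crime, features: list[str]) -> int:
    """Number of features on which two crimes agree (sum of per-feature deltas)."""
    return sum(kronecker(instance_feature(ci, f), instance_feature(cj, f)) for f in features)


def similarity_graph(crimes: list[Crime], features: list[str], min_shared: int) -> dict[int, set[int]]:
    """Link crimes i and j if they agree on >= min_shared features.
    min_shared is a hypothetical single threshold, NOT the paper's S/P scheme."""
    adj: dict[int, set[int]] = {i: set() for i in range(len(crimes))}
    for i, j in combinations(range(len(crimes)), 2):
        if shared_features(crimes[i], crimes[j], features) >= min_shared:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def min_cut(nodes: list[int], adj: dict[int, set[int]]) -> tuple[float, set[int]]:
    """Stoer-Wagner global minimum cut of the subgraph induced by nodes.
    Returns (cut weight, vertices on one side of the cut)."""
    groups = {v: {v} for v in nodes}  # original vertices inside each super-vertex
    w = {u: {v: 0 for v in nodes} for u in nodes}
    for u in nodes:
        for v in adj[u]:
            if v in w:  # keep only edges inside the induced subgraph
                w[u][v] = 1
    active = list(nodes)
    best: tuple[float, set[int]] = (float("inf"), set())
    while len(active) > 1:
        # Maximum-adjacency ordering: repeatedly add the most tightly connected vertex.
        order = [active[0]]
        conn = {v: w[active[0]][v] for v in active[1:]}
        while conn:
            nxt = max(conn, key=conn.get)
            order.append(nxt)
            del conn[nxt]
            for v in conn:
                conn[v] += w[nxt][v]
        s, t = order[-2], order[-1]
        # Cut of the phase: total weight between t and all other active super-vertices.
        phase_cut = sum(w[t][v] for v in active if v != t)
        if phase_cut < best[0]:
            best = (phase_cut, set(groups[t]))
        # Merge t into s.
        groups[s] |= groups[t]
        for v in active:
            if v not in (s, t):
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        active.remove(t)
    return best


def hcs(nodes: Iterable[int], adj: dict[int, set[int]]) -> list[set[int]]:
    """Split on minimum cuts until every part has edge connectivity > n/2."""
    part = set(nodes)
    if len(part) <= 1:
        return [part]
    cut, side = min_cut(sorted(part), adj)
    if cut > len(part) / 2:
        return [part]  # highly connected: keep as one candidate crime series
    return hcs(side, adj) + hcs(part - side, adj)


if __name__ == "__main__":
    adj = similarity_graph(SAMPLE_CRIMES, FEATURES, min_shared=3)
    print(instance_feature(SAMPLE_CRIMES[0], "day"))  # Monday
    clusters = hcs(adj.keys(), adj)
    print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

On the sample data, crimes 0 to 2 share a lured, Monday, masked profile and crimes 3 to 5 share a weapon, Friday, unmasked one, so the sketch recovers those two groups. In the usual HCS formulation, singleton parts produced by the recursion are treated as unclustered.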
⁴ http://rapecrisis.org.za/

We define a binary feature similarity function $S_f$ using the Kronecker delta function, where $S_f(c_i, c_j)$ takes on values in



                                                          – 91 –