Page 109 - Proceedings of the 2017 ITU Kaleidoscope
P. 109
Challenges for a data-driven society
based on a metric similarity function, say Λ(v i , v j ) as shown then return (G)
in Equation (2). The similarity function, S f (v i , v j ), mea- else
sures the similarity between crimes v i , v j in the f th feature. HCS(H)
¯
The representation of the crime similarity data in a similar- HCS(H)
ity graph helps to simplify computation, since we then only end if
need to find sufficiently inter-connected sets of nodes [13], end
which is referred to as highly connected sub-graphs (HCS)
as shown in Figure 2. The HCS approach significantly helps The correctness and completeness of the approach consid-
to simplify the computation of relevant clusters. ered in this research can be seen as a direct consequence of
the correctness and completeness of the methods and algo-
rithms utilised for achieving the target task [13],[14].
3.3. CriClust Problem Specification
• Specifying the Similarity Graph
4. CRICLUST WEB-BASED KNOWLEDGE
For a crime object C, let L d (C) denote the 2-D (x, SUPPORT SYSTEM
y) component of day attribute, L t (C) - the 2-D (x,
y) component of time attribute, and L l (C) - the 2-D It is important to stress that while this research explores a
(longitude, latitude) component of location attribute. rape database for identifying crime series, the idea consid-
Moreover, let L i (C) be the i th attribute of the crime ered in this study can be extended to other forms of crime.
object C. The specification for CriClust model is as We carefully and deliberately try to focus our analysis on
follows: relevant attributes that will assist the analysis according to
m : N number of attributes (≥ 9) crime expert advice.
S : N significance threshold (≥ 7)
P : N prevalence threshold (≤11)
4.1. System Overview
N : N number of crime objects
m
C : R crime objects with m features each The experimental set-up was implemented using Java Net-
m m
G : array [N, N) of R ×R similarity graph Beans platform with multi-threading, Apache Derby Net-
(originally empty) work Server 10.10.2.0., with security manager installed
q
r
For C , C ∈ C using the basic server security policy. The experiment was
r
q
if |{i : L i (C ) = L i (C )}| ≥ S suffi- conducted on an Inspiron-7347 DELL machine, Intel(R)
ciently similar crime objects Core(TM) i5-4210U CPU @ 1.70GH. The data considered
q
q
q
q
A := (L d (C ), L t (C ), L l (C )) for the experiment consist of 5500 rape crime records across
r
r
r
r
A := (L d (C ), L t (C ), L l (C )) 40 locations (suburbs) in Western Cape, South Africa, com-
r
q
if Euclid-dist(A , A ) ≤ P prising of 13 attributes of relevant features as prescribed by
q
r
G+ = (C , C ) } the crime intelligence unit. In what follows, we present some
Return G Similarity graph results to show the potential usefulness and reliability of the
CriClust model.
• Specifying the Minimum Cut (min-cut) using Karger’s
algorithm [14] on a graph G = (V, E)
minCut(G = (V, E))
G 0 ← G
j = 0
While G j has |V | > 2 do
pick an edge e j from G j at random
G j+1 ← G j \ e j
j ← j + 1
{T, V \ T} is the cut in the original graph that
corresponds to the min-cut of G j
• Specifying the Highly Connected Sub-graph (HCS)
For a graph G, let E(G) be the edge set, V (G) be the Fig 3. CriClust selection interface for data processing
vertex set and C be a minimum cut 5
features
HCS(G(V, E))
begin Figure 3 shows the process selection interface for data pro-
¯
{H, H, C} ←− minCut(G)
cessing features. Upon successful login, the functionalities
if G is highly connected
of the system use this interface for flexible feature selection.
5 Other features from the view include “crime forms” tab that
A minimum cut, C, is the minimum number of edges which separates
¯
G into sub-graphs H and H. allows users to capture crime data. There is also an added
– 93 –