Accepted Manuscript
Title: An approach for understanding offender modus operandi to detect serial robbery crimes
Authors: Yu-Sheng Li, Ming-Liang QI
PII: S1877-7503(19)30341-2
DOI: https://doi.org/10.1016/j.jocs.2019.101024
Article Number: 101024
Reference: JOCS 101024
To appear in: Journal of Computational Science
Received date: 25 March 2019
Revised date: 12 July 2019
Accepted date: 25 July 2019
Please cite this article as: Li Y-Sheng, QI M-Liang, An approach for understanding offender modus operandi to detect serial robbery crimes, Journal of Computational Science (2019), https://doi.org/10.1016/j.jocs.2019.101024 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
An approach for understanding offender modus operandi to detect serial robbery crimes
Yu-Sheng Li a,b, Ming-Liang QI a,b,*
a. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
b. School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China
Highlights
The crime process is incorporated into the understanding of criminals' behaviours.
The crime feature similarity calculation takes into account the frequency of each feature, and a modified dynamic time warping (DTW) is used to measure the similarity of the crime process.
A real-world robbery dataset is employed to measure the performance of finding serial crimes.
This approach can improve the efficiency of crime analysis and help maintain public security.
* Corresponding author at: Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China. E-mail address: [email protected] (M. Qi).
Abstract: Detecting serial crimes is one of the most challenging tasks in crime analysis. Linking crimes committed by the same criminal can improve the efficiency of police work and help maintain public safety. Previous crime linkage studies have focused on the crime features of the modus operandi (M.O.) but did not address the crime process. In this paper, we propose an approach for detecting serial robbery crimes based on understanding the offender's M.O. by integrating crime process information. A natural language processing method is used to extract the action and object characteristics of the crime process from the crime narrative text, a dynamic time warping method is introduced to measure the similarity of these characteristics, and an information entropy method is used to weight the similarities of the action and object characteristics to obtain the comprehensive similarity of the criminals' crime processes. A real-world robbery dataset is employed to measure the performance of finding serial crimes after adding the crime process information. According to the results, information about the crime process obtained from the case narrative text has significant separability and can better characterize the offender's M.O. Five machine learning algorithms are used to classify the case pairs into serial and nonserial cases. The results show that adding crime process information to the crime features can substantially improve the detection of serial crimes.
Keywords: Serial crimes, Crime linkage, Modus operandi, Natural Language Processing, Dynamic time warping
1 Introduction
Among property crimes in China, serial crimes account for a considerable proportion[1]. As the most violent property crime[2], robbery seriously jeopardizes public security and occurs relatively frequently in practice[3]. Linking crimes can improve the efficiency of crime analysis and help maintain public security. Therefore, China's police are increasingly focusing their attention on research and the development of algorithms to detect serial crimes[4].
The detection of serial crimes is one of the most important problems in crime analysis[5]. Detection relies on the similarity among offenders' behaviors to identify crimes committed by the same suspects[6]. Studies have shown that a large proportion of crimes are committed by a minority of offenders[7]; e.g., some researchers have discovered that approximately 50% of crimes are committed by 6%-10% of criminals in the United States (US) and the United Kingdom (UK)[8].
Some forensic evidence, such as DNA and fingerprints, can be used to link crimes, but the availability of such evidence is limited[7]. Therefore, geospatial and temporal features, the offense target and criminal behavior are used as alternative information to detect serial crimes. According to rational choice theory, criminals choose so as to maximize rewards, and their choices reflect the decision criteria that they use[9]. Previous studies found that both geographical distance and temporal proximity achieved statistically significant levels of discrimination accuracy when differentiating between a wide variety of linked
and unlinked crimes[10].
Criminal behavior is more widely used in serial crime detection. Modus operandi (M.O.) refers to behaviors committed during an offense that serve to ensure its completion while also protecting the perpetrator's identity and facilitating escape following the offense[11]. It characterizes a criminal's crime series and is a key part of case linkage[12, 13]. Previous studies primarily identified the M.O. by extracting key information from the crime description; however, they did not consider the process of the crime. For instance, in case A, the criminal suspect snatched the property and the victim resisted, and then the criminal suspect pushed the victim to the ground and took the property. In case B, the criminal suspect pushed the victim to the ground and snatched the property, and then the victim resisted and the criminal suspect took the property. The criminal suspects in these two cases have the same crime features. From the perspective of the crime process, however, the criminal suspect in case B is more aggressive than the one in case A. Considering only the crime features disregards some information about the M.O. This paper studies whether adding crime process information to the crime features can improve the performance of detecting serial robbery crimes.
The crime narrative contains the entire process of criminals committing crimes. Thus, the crime process can be regarded as the changing process of the key information in the crime narrative text during the time of the crime. Before comparing the similarities in the crime process, we need to identify the key information in the crime
narrative text. In practice, the criminal suspect's action is an important characteristic of the crime process; thus, the verbs in the crime narrative text can be treated as action characteristics. In Chinese and English, nouns are the most important part of speech for characterizing the content of a text[14]. Thus, the characteristics of the crime process can be divided into two parts: the action characteristic, which is represented by the verbs in the crime narrative, and the object characteristic, which is represented by the nouns in the crime narrative. We treat the key information in the crime narrative as time series data and employ the dynamic time warping (DTW) method to measure its similarity.
The contributions of this study are as follows: First, we incorporate information about the crime process into the understanding of the M.O. Second, the crime feature similarity calculation takes into account the frequency of each feature, and a modified DTW is used to measure the similarity of the crime process. Third, several machine learning algorithms are used to demonstrate the effect of adding process information on the detection of serial crimes.
The remainder of this paper is organized as follows: Section 2 presents a brief summary and a review of relevant research. In Section 3, we describe the crime series detection problem. Section 4 summarizes the methodology applied to detect serial robbery crimes, proposes several attribute similarity measures, including the use of DTW to calculate the similarity of the crime process, and introduces several detection models. In Section 5, a real-world robbery crime dataset from a city in China is introduced,
and some evaluation criteria and the analysis of the experimental results are presented. Conclusions and future research are discussed in Section 6.
2 Literature review
Crime series detection relies on three primary assumptions. The first assumption is that criminals act consistently, the second is that criminals exhibit some distinctiveness in their behaviors, and the third is that important aspects of criminals' behaviors can be observed, measured and accurately recorded[15]. Based on these three assumptions, researchers can extract crime attributes according to the criminals' M.O. and detect whether two cases were committed by the same offender according to the similarity among the case attributes. Therefore, the quality of attribute extraction determines the quality of the final detection results.
The M.O. is the main basis for describing criminal behavior. Researchers primarily extract attributes from the M.O. of robbery cases according to the following domains: the action of criminals[16], weapons of criminals[17], ways of threat[18], ways of harm[19], and ways of disguise[20].
In addition to the crime features, some researchers have attempted to use crime narrative information for crime linkage. They handled the crime narrative in two ways. The first way was to treat the crime narrative as one of the factors in identifying serial crimes. A few studies employed term frequency-inverse document frequency (TF-IDF) to calculate the similarity between two cases' narrative texts and integrated other information for crime linkage[21, 22]. Zhu and Xie applied regularized restricted Boltzmann machines (RBMs) to jointly capture time, location, and the complex free-text component of each crime, which can select the key features of crimes to link crimes without supervision[23]. The second way used the case narrative as the sole basis for identifying crime series, without considering other information. Helbich et al. applied the self-organizing map algorithm and its visualization capabilities to explore the hidden relationships and information in a geographical context and investigate a crime series[24]. Some researchers introduced natural language processing to calculate the similarity among crime reports to support crime series detection[25, 26]. These studies show that crime narrative information is effective for crime series detection.
Although researchers have focused on extracting crime attributes from the perspective of crime features, the crime process has not been considered. Studies have shown that traces and changes at the scene are inextricably linked to the crime process[27]. In criminology, some studies detect crime patterns based on crime processes. Cornish proposed the concept of the crime script, which organizes our knowledge about how to understand and enact commonplace behavioral processes of crimes; a script can be divided into scenes involving smaller units of action, or plans[28]. Crime scripts have been used to analyze the offense process of serial offenders. Beauregard et al. used a rational choice theory approach to analyze the offense behavior of serial sex offenders and identified hunting process scripts in a sample of serial sex crimes by using
multivariate statistical methods and hierarchical cluster analysis[29, 30]. Analysis of the crime process helps to fully grasp the criminal activities of the perpetrators[31]. Thus, crime process information can be integrated into the detection of serial crimes. The crime narrative text records the entire process of a case from beginning to end; thus, the crime process can be obtained from the narrative text.
The overarching theme in previous crime linkage research is to find crimes with similar attribute values. Studies have suggested that the similarity among the attributes of serial cases is higher than that among nonserial cases[32]. Many different similarity measures exist for different crime attribute types. Brown and Hagen measured the similarity between two numeric values based on the absolute distance[33], which is a normalized form of the absolute value of the difference. For a categorical attribute, the values are usually binarized and then binary similarity measures are applied[34]. Bennell et al. employed Jaccard's coefficient to calculate the crime similarity for a pair of crimes[35].
Previous studies paid more attention to extracting crime attributes from the perspective of the criminal M.O. The obtained crime attributes are crime features, but the crime process is not considered. To further understand the offender's M.O., this paper divides the M.O. into the crime features and the crime process. Information about the crime process is derived from the narrative text of the case, which simplifies the acquisition of the process information.
3 Problem description
Table 1 lists the notation used in this section. The problem of serial case detection can be described as follows. We are given the set of cases $C = \{c_i \mid i = 1, 2, \ldots, n\}$, where $c_i$ denotes the $i$-th case in the crime set $C$. Every two cases form a pairwise case $p_{ij}$, with $c_i, c_j \in C$ and $i \neq j$, so that $N = n(n-1)/2$ pairwise cases exist. We extract the crime attributes from two parts: the crime features and the crime process. Each case $c_i$ has $m$ crime feature attributes, and the attribute vector of the case $c_i$ is denoted by $a_i = (a_i^1, a_i^2, \ldots, a_i^k, \ldots, a_i^m)$. According to the attribute types, different similarity measures are selected to calculate the similarity of each case pair on each attribute. Let $S_{ij}^k$ denote the similarity of the $k$-th crime feature attribute of $p_{ij}$ and $S_{ij}^{process}$ denote the similarity of the crime process of $p_{ij}$. The similarity vector of the pairwise case $p_{ij}$ is expressed by $S_{ij}' = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m, S_{ij}^{process})$. We want to determine whether the two cases were committed by the same criminal. If the two cases in $p_{ij}$ were committed by the same offender (or the same gang), we refer to $p_{ij}$ as a serial case.
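The pairing scheme described above can be illustrated with a minimal Python sketch. The function names `case_pairs` and `similarity_vector` are hypothetical helpers introduced only for illustration; they are not part of the original method description.

```python
from itertools import combinations


def case_pairs(n):
    """Return every unordered pair (i, j) of case indices 1..n, i before j."""
    return list(combinations(range(1, n + 1), 2))


def similarity_vector(feature_sims, process_sim):
    """Build S'_ij by appending the process similarity to the m feature similarities."""
    return tuple(feature_sims) + (process_sim,)


# Four cases give N = 4 * 3 / 2 = 6 pairwise cases.
pairs = case_pairs(4)
print(len(pairs), pairs[0], pairs[-1])

# A pair with m = 2 feature similarities plus one process similarity.
print(similarity_vector([0.5, 1.0], 0.8))
```

Each pair then becomes one sample for the classifier, with the similarity vector as its input and $y_{ij}$ as its label.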
Table 1 Notation description
Notation | Description
$n$ | Number of cases
$N$ | Number of case pairs, $N = n(n-1)/2$
$p_{ij}$ | Pairwise case consisting of the cases $i$ and $j$, where $i, j \in \{1, 2, \ldots, n\}$, $i \neq j$
$k$ | The feature attribute of crime, $k \in [1, m]$
$a_i^k$ | The value of the attribute $k$ of the case $i$
$a_i$ | The attribute vector of the case $i$, $a_i = (a_i^1, a_i^2, \ldots, a_i^m)$
$a^k$ | The attribute $k$
$a^{process}$ | The crime process attribute
$S_{ij}^k$ | Similarity value of the pairwise case $p_{ij}$ in attribute $k$
$S_{ij}^{process}$ | Similarity value of the pairwise case $p_{ij}$ in the crime process attribute
$S_{ij}$ | Similarity vector of the pairwise case $p_{ij}$, where $S_{ij} = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m)$
$S_{ij}'$ | Similarity vector of the pairwise case $p_{ij}$ after adding the crime process attribute, where $S_{ij}' = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m, S_{ij}^{process})$
$y_{ij}$ | Actual value of whether the cases $i$ and $j$ belong to a series, $y_{ij} \in \{0, 1\}$
$\hat{y}_{ij}$ | Predicted value of whether the cases $i$ and $j$ belong to a series, $\hat{y}_{ij} \in \{0, 1\}$
4 Methodology
In this paper, an approach that integrates crime process information into the understanding of an offender's M.O. is proposed for detecting serial robbery crimes. The key steps are shown in Fig. 1.
The first step is constructing robbery crime attributes from two aspects: the crime features and the crime process. The attributes are primarily derived from the properties of robbery crimes and from previous studies; some of them are mentioned in the literature review.
The second step is measuring the similarity of each crime attribute for the case pairs. Multiple attribute types exist, and the similarity measure should be appropriate for the corresponding type.
The third step is applying machine learning classification algorithms to model the problem of crime series detection. Cases are gathered and processed in this step. The cases include serial crimes and nonserial crimes. Case processing involves extracting attribute values and representing them in the format of the problem description. An extensive range of algorithms is applied.
[Figure 1: flowchart with three stages. Attribute construction: crime attributes, split into crime features (numeric, categorical and keyword attributes) and the crime process attribute. Attribute similarity measures: absolute distance, Jaccard's coefficient and word2vec for the crime features; a similarity measure based on DTW and information entropy for the crime process. Detection models: algorithms applied to the crime feature similarity and crime process similarity, yielding the crime series detection results.]
Fig. 1. The methodology applied to detect serial robbery crimes
4.1 Attribute similarity measurement
The similarity calculation of the case attributes includes two aspects: the similarity of the crime features and the similarity of the crime process. The crime feature attributes include three types: numeric attributes, categorical attributes, and keyword attributes; absolute distance, Jaccard's coefficient, and word2vec, respectively, are used to compute the similarity for these three types.
The similarity calculation of the crime process can be regarded as matching the changes in the key information in the crime narrative. Because cases vary in complexity, their durations differ and the lengths of the key information sequences in the case narrative texts are inconsistent. DTW is based on dynamic programming and finds the optimal matching between two sequences of different lengths[36]. Thus, DTW can solve the matching problem of two cases' crime processes. The process information is contained in the narrative text. Some researchers have applied DTW to text data, treating the text as a sequence of words[37, 38]. Following this idea, we design a similarity measure that extracts key characteristics of the crime process to calculate the process similarity. The attribute information and similarity measures are shown in Table 2.
Table 2 Attribute information and similarity measures
M.O. | Name of attribute | Description | Measure
Crime features | C_Num | Number of criminals | Absolute
Crime features | C_Tools | Tools used by the criminal | Jaccard
Crime features | W_Disguise | The way criminals disguise | Jaccard
Crime features | W_Harm | The way criminals harm victims | Jaccard
Crime features | W_Property | The way criminals rob property | Jaccard
Crime features | W_Threat | The way criminals threaten victims | Jaccard
Crime features | W_Breaking | The way criminals break through obstacles | word2vec
Crime features | C_Action | Actions taken by the criminals | word2vec
Crime process | C_process | The crime process of criminals | DTW
4.1.1 Similarity of the numeric attribute
We use the absolute distance to measure the similarity between two numeric values, as shown in Eq. (1).
$$Sim_{ab\_dist}^{k}(a_i^k, a_j^k) = 1 - \frac{|a_i^k - a_j^k|}{a_{\max}^k - a_{\min}^k} \qquad (1)$$
where $a_{\max}^k$ denotes the maximum value of the attribute $k$ and $a_{\min}^k$ denotes the minimum value.
4.1.2 Similarity of the categorical attribute
For categorical attributes, Jaccard's coefficient is used to calculate their similarity. Jaccard's coefficient is calculated as $|q| / (|q| + |r| + |s|)$, where $q$ represents the set of features that are present in both cases, and $r$ and $s$ represent the sets of features that are present in one case but absent in the other. In the traditional Jaccard's coefficient, all values of an attribute are regarded as equally important. However, when a group of records has some common features and these features are outliers, these records are more likely to be generated by the same cause[39]. Therefore, different scores should be given according to the features' frequencies: the lower the occurrence probability of a feature, the higher its importance. We calculate the score $\omega_f^k$ of each feature by Eq. (2). Let $n_f^k$ denote the number of cases with the feature $f$ on the attribute $k$ and $n$ denote the total number of cases (refer to Table 1). The more often a feature appears, the more common it is and the lower its score. In addition, when the attribute value is missing, the feature $f^k$ is NULL. We denote the score of NULL as $\omega_{NULL}^k$, and we calculate it according to Eq. (2).
$$\omega_f^k = 1 - n_f^k / n \qquad (2)$$
[Figure 2: a $Q$-by-$V$ grid with Word Sequence 1 ($x_1, \ldots, x_q, \ldots, x_Q$) on one axis and Word Sequence 2 ($z_1, \ldots, z_v, \ldots, z_V$) on the other; the cell $(x_q, z_v)$ holds the distance $d(x_q, z_v)$, and a warping path runs through the grid.]
Fig. 2. A warping path using DTW to calculate the distance of two word sequences
We calculate the similarity of the pairwise case on the attribute $k$, as shown in Eq. (3).
$$Sim_{Jaccard}^{k}(a_i^k, a_j^k) = \begin{cases} \left(\sum_{f \in q} \omega_f^k\right) / (|q| + |r| + |s|), & \text{otherwise} \\ \omega_{NULL}^k, & a_i^k = a_j^k = NULL \\ 0, & a_i^k = NULL \text{ or } a_j^k = NULL \end{cases} \qquad (3)$$
In Eq. (3), the numerator of the traditional Jaccard's coefficient becomes the sum of the scores of the shared features. Each $\omega_f^k$ lies between 0 and 1, and $\sum_{f \in q} \omega_f^k$ is equal to $|q| - \sum_{f \in q} (n_f^k / n)$. If the value of the attribute $k$ is missing in both cases, the similarity of the case pair on this attribute is $\omega_{NULL}^k$ (calculated according to Eq. (2)). When the value is missing in one of the two cases but not in the other, the similarity is set to 0.
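The numeric and categorical measures of Sections 4.1.1 and 4.1.2 can be sketched as follows. This is a minimal illustration under stated assumptions: each case's categorical attribute is stored as a Python set, an empty set stands for a missing (NULL) value, and the function names and toy data are hypothetical.

```python
def absolute_similarity(a_i, a_j, a_min, a_max):
    """Eq. (1): 1 - |a_i - a_j| / (a_max - a_min)."""
    if a_max == a_min:
        return 1.0  # assumption: a constant attribute shows no difference
    return 1.0 - abs(a_i - a_j) / (a_max - a_min)


def feature_scores(cases):
    """Eq. (2): score 1 - n_f / n for each feature; the key None stands for NULL."""
    n = len(cases)
    counts = {}
    for features in cases:
        for f in (set(features) if features else {None}):
            counts[f] = counts.get(f, 0) + 1
    return {f: 1.0 - c / n for f, c in counts.items()}


def weighted_jaccard(a_i, a_j, scores):
    """Eq. (3): frequency-weighted Jaccard similarity of two feature sets."""
    a_i, a_j = set(a_i or ()), set(a_j or ())
    if not a_i and not a_j:
        return scores.get(None, 0.0)  # both values missing: score of NULL
    if not a_i or not a_j:
        return 0.0  # exactly one value missing
    shared = a_i & a_j  # the set q
    union = a_i | a_j   # its size equals |q| + |r| + |s|
    return sum(scores[f] for f in shared) / len(union)


# Toy data: the tools used in four cases; the last case has no recorded tool.
tools = [{"knife"}, {"knife", "rope"}, {"rope"}, set()]
scores = feature_scores(tools)
print(scores)
print(weighted_jaccard(tools[0], tools[1], scores))
print(absolute_similarity(1, 3, 1, 5))
```

Because the score of a shared feature is below 1 whenever that feature occurs in the data, the weighted value never exceeds the traditional Jaccard's coefficient for the same pair.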
4.1.3 Similarity of the keyword attribute
Some of the crime features are derived from keywords in the crime narrative text. The comparison of words is credible only if the actual meaning of the words is understood. In this paper, we use word2vec to learn word embeddings[40]. Based on the word embeddings, we use the word vectors to calculate the similarity of the words $w_1$ and $w_2$, which is denoted as $W2V(w_1, w_2)$. The similarity on the attribute $k$ of the pairwise case $p_{ij}$ is calculated as shown in Eq. (4).
$$Sim_{word2vec}^{k}(a_i^k, a_j^k) = \begin{cases} \max_{w} \left(W2V(a_{i,w}^k, a_{j,w}^k)\right), & a_{i,w}^k \in a_i^k,\ a_{j,w}^k \in a_j^k \\ \omega_{NULL}^k, & a_i^k = a_j^k = NULL \\ 0, & a_i^k = NULL \text{ or } a_j^k = NULL \end{cases} \qquad (4)$$
In Eq. (4), when a case $i$ has multiple keywords on the attribute $k$, each keyword is recorded as $a_{i,w}^k$, where $a_{i,w}^k \in a_i^k$. The similarity of the pairwise case $p_{ij}$ on the attribute $k$ is the highest word similarity over all pairs of keywords from the two cases. When the attribute $k$ is missing in both cases, the similarity of the case pair on this attribute is $\omega_{NULL}^k$, which is calculated in the same way as in Eq. (2). When the attribute $k$ is missing in one of the two cases but not in the other, the similarity is set to 0.
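The keyword measure can be sketched in a few lines of Python. The sketch assumes that the word similarity is the cosine similarity between embedding vectors, which is a common choice for word2vec but is not stated explicitly in the text; the two-dimensional vectors and the NULL score below are toy values, not trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (assumed stand-in for the word similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_similarity(words_i, words_j, vectors, null_score):
    """Eq. (4): best word similarity over all keyword pairs of the two cases."""
    if not words_i and not words_j:
        return null_score  # both cases lack the attribute
    if not words_i or not words_j:
        return 0.0         # only one case lacks the attribute
    return max(cosine(vectors[w1], vectors[w2])
               for w1 in words_i for w2 in words_j)


# Hypothetical toy embeddings for three "breaking" keywords.
toy_vectors = {"pry": (1.0, 0.0), "smash": (0.8, 0.6), "climb": (0.0, 1.0)}
print(keyword_similarity(["pry"], ["smash", "climb"], toy_vectors, 0.9))
print(keyword_similarity([], [], toy_vectors, 0.9))
print(keyword_similarity(["pry"], [], toy_vectors, 0.9))
```

Taking the maximum means that a single closely related keyword pair is enough to make two cases similar on this attribute.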
4.1.4 Similarity of the crime process attribute
The action characteristics and the object characteristics constitute the crime process; they can be regarded as sequence data composed of a series of words obtained from the crime narrative text. In this paper, we design a method based on the symmetric form of the standard DTW to calculate the similarity between the crime processes of two crimes. This similarity measure is divided into two stages. The first stage calculates the similarity of the action characteristics and of the object characteristics of the case pair, and the second stage weights the two similarities to obtain the crime process similarity.
First, we calculate the similarity between two word sequences based on DTW[41]. Assume the two word sequences $X = \{x_1, \ldots, x_q, \ldots, x_Q\}$ and $Z = \{z_1, \ldots, z_v, \ldots, z_V\}$, as shown in Fig. 2. A $Q$-by-$V$ grid is constructed, and the element $(x_q, z_v)$ of the grid represents the distance $d(x_q, z_v)$ between the word $x_q$ and the word $z_v$. The warping path is not unique but must start at the comparison of the first words of the two sequences and end at the comparison of the last words. If the warping path passes through $T$ grid cells, it can be defined as $W$; thus, we have
$$W = \{(x_1, z_1)_1, \ldots, (x_q, z_v)_t, \ldots, (x_Q, z_V)_T\}$$
The goal of optimization is to obtain the smallest cumulative distance from $(x_1, z_1)$ to $(x_Q, z_V)$. The distance is defined as $Dist$ (refer to Eq. (5)).
$$Dist(X, Z) = \min_{W} \sum_{t=1}^{T} d(x_q, z_v)_t, \quad 1 \le q \le Q,\ 1 \le v \le V \qquad (5)$$
The cumulative distance from $(x_1, z_1)_1$ to $(x_q, z_v)_t$ is calculated by the cumulative distance function, as shown in Eq. (6). The cumulative distance of step $t$ is the sum of the cumulative distance of step $t-1$ and the minimum of the distances at $(x_{q+1}, z_v)$, $(x_{q+1}, z_{v+1})$, and $(x_q, z_{v+1})$; that is, each step of a warping path can only be extended to one of three adjacent grid cells. $x_{q+1}$ and $z_{v+1}$ indicate the next action or object characteristic in the crime process.
$$D(t) = D(t-1) + \min\{d(x_{q+1}, z_v),\ d(x_{q+1}, z_{v+1}),\ d(x_q, z_{v+1})\} \qquad (6)$$
In this paper, we use the same word vectors as in Section 4.1.3 to measure word similarity. The distance between the words $x_q$ and $z_v$ in the two word sequences $X$ and $Z$ is calculated by Eq. (7).
$$d(x_q, z_v) = 1 - W2V(x_q, z_v) \qquad (7)$$
The result $Dist$ represents the distance between two word sequences. The similarity between the two word sequences is measured by Eq. (8).
$$\varphi(X, Z) = 1 - \frac{Dist(X, Z)}{T} \qquad (8)$$
where $T$ is the length of the warping path. To keep the distance between two crime processes in the range from 0 to 1, we divide $Dist$ by the warping path length. Because we use similarity to detect serial crimes, we subtract the obtained distance from 1. The range of the similarity $\varphi(X, Z)$ is 0 to 1.
Algorithm 1. The similarity measure for word sequence
Input: $X = \{x_1, \ldots, x_q, \ldots, x_Q\}$, $Z = \{z_1, \ldots, z_v, \ldots, z_V\}$
Output: $\varphi(X, Z)$
D(1) = d(x_1, z_1)
For q in range (2, Q) do
    For v in range (2, V) do
        D(t) = D(t-1) + min{d(x_{q+1}, z_v), d(x_{q+1}, z_{v+1}), d(x_q, z_{v+1})}
    End for
End for
Calculate Dist(X, Z), T
Return: $\varphi(X, Z)$
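A runnable counterpart to Algorithm 1 might look like the sketch below. It is a minimal illustration and not the authors' implementation: it assumes that the minimum of Eq. (5) is found with the standard dynamic programming recurrence over the grid, it tracks the path length $T$ alongside the cumulative cost so that Eq. (8) can be applied, and it uses a hypothetical 0/1 word distance in place of the word2vec-based distance of Eq. (7).

```python
def dtw_similarity(X, Z, dist):
    """Return 1 - Dist(X, Z) / T for two non-empty word sequences (Eq. (8))."""
    Q, V = len(X), len(Z)
    # cost[q][v]: smallest cumulative distance of a warping path ending at (x_q, z_v)
    # steps[q][v]: number of grid cells on that path, i.e. its length T
    cost = [[0.0] * V for _ in range(Q)]
    steps = [[0] * V for _ in range(Q)]
    for q in range(Q):
        for v in range(V):
            d = dist(X[q], Z[v])
            if q == 0 and v == 0:
                cost[q][v], steps[q][v] = d, 1
                continue
            candidates = []
            if q:
                candidates.append((cost[q - 1][v], steps[q - 1][v]))
            if v:
                candidates.append((cost[q][v - 1], steps[q][v - 1]))
            if q and v:
                candidates.append((cost[q - 1][v - 1], steps[q - 1][v - 1]))
            # Cheapest predecessor; ties are broken in favour of the shorter path.
            best_cost, best_steps = min(candidates)
            cost[q][v] = best_cost + d
            steps[q][v] = best_steps + 1
    return 1.0 - cost[Q - 1][V - 1] / steps[Q - 1][V - 1]


def exact_match_distance(w1, w2):
    """Hypothetical stand-in for 1 - W2V(w1, w2): 0 for equal words, 1 otherwise."""
    return 0.0 if w1 == w2 else 1.0


# Action sequences loosely based on cases A and B in the Introduction.
case_a = ["snatch", "push", "take"]
case_b = ["push", "snatch", "take"]
print(dtw_similarity(case_a, case_a, exact_match_distance))
print(dtw_similarity(case_a, case_b, exact_match_distance))
```

With this toy distance, two identical sequences receive the highest similarity, whereas reordering the same actions lowers it, which is the kind of process information that the crime features alone do not capture.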
Algorithm 1 describes how to measure the similarity between two word sequences; next, we calculate the weight of each sequence. Assume that narrative texts exist for the cases $i$ and $j$; we can compute the similarity of their action characteristics $S_{ij}^{ac}$ and the similarity of their object characteristics $S_{ij}^{ob}$, which are denoted as $S_{ij}^{seq}$, $seq \in \{ac, ob\}$. The similarity of the crime process is obtained by the weighted summation of the two characteristics' similarities. The similarity value of each characteristic in the crime process can be regarded as a predicted value for linking crimes. The deviation between the similarity value and the actual value $y_{ij}$ of the pairwise case is considered to be the error of detecting serial crimes; for example, the deviation between the action characteristic similarity $S_{ij}^{ac}$ and the actual value $y_{ij}$ is $|S_{ij}^{ac} - y_{ij}|$. When determining the weights of the action characteristic and the object characteristic, we consider two factors: the level of the error and the stability of the error. We want to increase the weight of the characteristic with a low error level and a small error variation. A lower error level indicates that the characteristic is more capable of detecting serial crimes. The smaller the error variation, the more robust and stable the characteristic is in linking crimes. The error vector $S_{deviation}^{seq}$ of the characteristic similarity is composed of the deviations $d_{ij}^{seq}$ of all characteristic similarity values from the actual values $y_{ij}$. The calculation is expressed as follows:
$$S_{deviation}^{seq} = \{d_{ij}^{seq} \mid d_{ij}^{seq} = |S_{ij}^{seq} - y_{ij}|,\ i \neq j,\ i, j \in \{1, 2, \ldots, n\}\} \qquad (9)$$
where $d_{ij}^{seq}$ denotes the deviation of the action characteristic or the object characteristic of the pairwise case $p_{ij}$. When calculating the error level, we obtain the error vector of each characteristic according to Eq. (9). In Eqs. (10) and (11), $D_{level}^{ac}$ denotes the sum of the action characteristic's errors and $D_{level}^{ob}$ denotes the sum of the object characteristic's errors; we then calculate the ratio of each error vector's sum to the sum of the two characteristics' errors. The larger the error level of an error vector, the smaller the corresponding sequence weight. The weights of the two characteristics are $\lambda_{level}^{ac}$ and $\lambda_{level}^{ob}$.
$$\lambda_{level}^{ac} = 1 - \frac{D_{level}^{ac}}{D_{level}^{ac} + D_{level}^{ob}}, \quad \text{where } D_{level}^{ac} = \sum S_{deviation}^{ac} \qquad (10)$$
$$\lambda_{level}^{ob} = 1 - \frac{D_{level}^{ob}}{D_{level}^{ac} + D_{level}^{ob}}, \quad \text{where } D_{level}^{ob} = \sum S_{deviation}^{ob} \qquad (11)$$
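The error-level weighting of Eqs. (9)-(11) can be sketched as follows. The function names and the toy similarity values are hypothetical, and the even split when both error sums are zero is an assumption added to avoid division by zero.

```python
def error_vector(similarities, labels):
    """Eq. (9): absolute deviation between each similarity and its label y_ij."""
    return [abs(s - y) for s, y in zip(similarities, labels)]


def level_weights(err_ac, err_ob):
    """Eqs. (10)-(11): each weight is 1 minus that characteristic's share of the total error."""
    d_ac, d_ob = sum(err_ac), sum(err_ob)
    total = d_ac + d_ob
    if total == 0:
        return 0.5, 0.5  # assumption: with no error at all, split the weight evenly
    return 1.0 - d_ac / total, 1.0 - d_ob / total


# Toy example with four case pairs: label 1 means serial, 0 means nonserial.
labels = [1, 0, 0, 1]
sim_action = [0.9, 0.2, 0.1, 0.8]  # action similarities, close to the labels
sim_object = [0.6, 0.5, 0.4, 0.7]  # object similarities, further from the labels

w_ac, w_ob = level_weights(error_vector(sim_action, labels),
                           error_vector(sim_object, labels))
print(w_ac, w_ob)
```

With only two characteristics the two weights sum to 1, and the characteristic with the smaller total error receives the larger weight.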
We utilize the information entropy[42] method to calculate the error variation. Entropy is the measure of a system's uncertainty in information theory; the larger the entropy, the greater the uncertainty, and vice versa[43, 44]. Some studies use information entropy to quantitatively describe the equilibrium of the decision attributes[45]. If all values in a sequence are equal, the entropy of the sequence is largest. When the values in the sequence differ greatly, the entropy of the sequence is small. For the error vector, a small entropy means that the variation of the error is large: the error of the characteristic is inconsistent, with some large error values and some small ones. Such an error vector has poor stability, so the weight given to the corresponding characteristic should be smaller. First, the error vector needs to be normalized (Eq. (12)).
$$\delta_{ij}^{seq} = \frac{d_{ij}^{seq}}{\sum_{i,j} d_{ij}^{seq}}, \quad c_i, c_j \in C,\ i \neq j \qquad (12)$$
Second, the entropy of each error vector needs to be obtained, as shown in Eq. (13):
$$E_{uncertainty}^{seq} = -\frac{1}{\ln(|S_{deviation}^{seq}|)} \sum_{c_i, c_j \in C,\ i \neq j} \delta_{ij}^{seq} \ln \delta_{ij}^{seq} \qquad (13)$$
where the factor $1/\ln(|S_{deviation}^{seq}|)$ ensures that $E_{uncertainty}^{seq} \in [0, 1]$. According to the meaning of entropy, an error vector with a larger entropy value has a smaller variation and better stability, and the weight of the corresponding sequence should be larger, as shown in Eq. (14).
$$\lambda_{uncertainty}^{ac} = \frac{E_{uncertainty}^{ac}}{E_{uncertainty}^{ac} + E_{uncertainty}^{ob}}, \quad \lambda_{uncertainty}^{ob} = \frac{E_{uncertainty}^{ob}}{E_{uncertainty}^{ac} + E_{uncertainty}^{ob}} \qquad (14)$$
After calculating the weights of the action characteristic similarity and the object characteristic similarity in terms of the error level and the error variation, the final weights of the two characteristics are calculated according to Eqs. (15) and (16). We assign a higher weight to the characteristic with the lower error level and the smaller error variation.
$$\lambda^{ac} = \frac{\lambda_{level}^{ac} \lambda_{uncertainty}^{ac}}{\lambda_{level}^{ac} \lambda_{uncertainty}^{ac} + \lambda_{level}^{ob} \lambda_{uncertainty}^{ob}} \qquad (15)$$
$$\lambda^{ob} = \frac{\lambda_{level}^{ob} \lambda_{uncertainty}^{ob}}{\lambda_{level}^{ac} \lambda_{uncertainty}^{ac} + \lambda_{level}^{ob} \lambda_{uncertainty}^{ob}} \qquad (16)$$
Last, the similarity of the crime process is expressed as follows:
$$S_{ij}^{process} = \lambda^{ac} S_{ij}^{ac} + \lambda^{ob} S_{ij}^{ob} \qquad (17)$$
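The entropy-based weighting and the final combination can be put together in a short sketch. It is an illustration under stated assumptions: the entropy carries a leading minus sign so that it falls in the range 0 to 1, terms with a zero share contribute nothing to the sum, both error vectors have a non-zero total, and all names and toy values are hypothetical.

```python
import math


def normalized_entropy(errors):
    """Normalize the error vector and return its entropy scaled to [0, 1]."""
    total = sum(errors)
    if total == 0 or len(errors) == 1:
        return 1.0  # assumption: a degenerate error vector counts as fully stable
    shares = [e / total for e in errors]
    # A zero share is skipped, i.e. 0 * ln(0) is taken as 0.
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(errors))


def combined_weights(err_ac, err_ob):
    """Combine the error-level weights with the entropy weights into final weights."""
    d_ac, d_ob = sum(err_ac), sum(err_ob)  # assumed not both zero
    level_ac = 1.0 - d_ac / (d_ac + d_ob)
    level_ob = 1.0 - d_ob / (d_ac + d_ob)
    e_ac, e_ob = normalized_entropy(err_ac), normalized_entropy(err_ob)
    unc_ac = e_ac / (e_ac + e_ob)
    unc_ob = e_ob / (e_ac + e_ob)
    denom = level_ac * unc_ac + level_ob * unc_ob
    return level_ac * unc_ac / denom, level_ob * unc_ob / denom


def process_similarity(s_ac, s_ob, w_ac, w_ob):
    """Weighted sum of the action and object similarities of one case pair."""
    return w_ac * s_ac + w_ob * s_ob


# Toy error vectors for four case pairs.
err_action = [0.1, 0.2, 0.1, 0.2]
err_object = [0.4, 0.5, 0.4, 0.3]
w_ac, w_ob = combined_weights(err_action, err_object)
print(w_ac, w_ob)
print(process_similarity(0.9, 0.6, w_ac, w_ob))
```

In this toy example the action characteristic has both the lower total error and a comparably stable error, so it receives most of the weight in the process similarity.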
4.2 Detection models
To create detection models, we selected five machine learning algorithms that are extensively employed for classification problems and are suitable for our problem. In our experiments, the average results of the 10-fold cross-validation are regarded as the classification performance. In the selection of algorithm parameters, we make some adjustments based on the default parameters of the scikit-learn machine learning library. The algorithms are briefly described as follows:
Logistic Regression (LR): This paper applies a binomial logistic regression model. In our experiments, the l2 penalty is used.
Support Vector Machine (SVM): The support vector machine takes each sample as a point in space, and the goal is to obtain a hyperplane that separates the samples. In this paper, we use a linear kernel.
K-Nearest Neighbor (KNN): KNN obtains the k instances in the training data set that are closest to the input instance. The input instance is assigned to the class to which the majority of these k instances belong. In our experiment, k = 1, and the distance between two points is the Minkowski distance.
Neural Network (NN): Neural networks can simulate the interaction of biological neural networks with real-world objects and classify samples by training the parameters of the network connections and the neuron thresholds. Our model is a network with 100 hidden layer nodes, and the learning rate is 0.001.
Random Forest (RF): The random forest is a classifier that contains multiple decision trees and introduces random attribute selection in the decision tree training process. In our experiment, the number of trees is 100.
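The settings listed above could be expressed with scikit-learn roughly as follows. This is a minimal sketch rather than the experimental code: the synthetic data from `make_classification` stand in for the pairwise similarity vectors, a single hidden layer of 100 units is one interpretation of "100 hidden layer nodes", and `max_iter` and `random_state` are assumptions added only to make the example reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the pairwise similarity vectors (not the paper's data).
X, y = make_classification(n_samples=300, n_features=9, random_state=0)

models = {
    "LR": LogisticRegression(penalty="l2", max_iter=1000),
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=1, metric="minkowski"),
    "NN": MLPClassifier(hidden_layer_sizes=(100,), learning_rate_init=0.001,
                        max_iter=500, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

scoring = ["accuracy", "precision", "recall", "f1"]
results = {}
for name, model in models.items():
    # Average each metric over the 10 folds, as done for the reported results.
    cv = cross_validate(model, X, y, cv=10, scoring=scoring)
    results[name] = {m: cv[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```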
5 Case studies
In this section, we apply the proposed serial crime detection method to real-world case data and introduce the evaluation criteria. The experimental results are then analyzed and discussed.
5.1 Data sets
5.1.1 Wikipedia dumps
We apply the CBOW model to train word embeddings on the Chinese Wikipedia corpus dumps. The corpus contains 6.98 GB of articles (XML format). We reduce the noise in the textual data by removing empty lines and stopwords. The dimensionality of the word embedding vectors is 400.
5.1.2 Case description
The case dataset was derived from the judicial documents published on the OpenLaw website, and the crime features and the crime process of each case were obtained manually. These crimes were real solved robbery cases in Zhengzhou City, Henan Province, China, which occurred from January 2013 to February 2018. The dataset includes a total of 334 cases committed by 279 criminals. Of these, 248 cases were each committed by a criminal with only one case in the dataset and are therefore not serial crimes. The remaining 86 cases were serial crimes committed by 31 criminals, each of whom committed a minimum of 2 and a maximum of 8 crimes (refer to Table 3). Among the serial criminals, 18 committed 2 cases, 10 committed 3 cases, 1 committed 4 cases, and 2 committed 8 cases. Therefore, the entire dataset yields 55,611 case pairs, of which 110 pairs were committed by the same offender and 55,501 pairs were committed by different offenders.
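The pair counts follow directly from the figures above: every two of the 334 cases form one pair, and a series of k cases by one offender contributes k(k-1)/2 same-offender pairs. The short check below reproduces the arithmetic; the variable names are illustrative.

```python
from math import comb

n_cases = 334
# Number of serial offenders, keyed by how many cases each one committed.
series_sizes = {2: 18, 3: 10, 4: 1, 8: 2}

serial_cases = sum(size * count for size, count in series_sizes.items())
serial_offenders = sum(series_sizes.values())
total_pairs = comb(n_cases, 2)
same_offender_pairs = sum(comb(size, 2) * count
                          for size, count in series_sizes.items())
different_offender_pairs = total_pairs - same_offender_pairs

print(serial_cases, serial_offenders)   # 86 31
print(total_pairs, same_offender_pairs, different_offender_pairs)  # 55611 110 55501
```

Only about 0.2% of the pairs are positive, so the classification problem is heavily imbalanced.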
In this paper, we obtain the crime attribute values from judicial documents. From the perspective of the victim or eyewitness, we reorganize the crime narrative text based on the judicial document, and the reorganized text is employed as the crime process of the case. For each attribute, the similarity between the two cases of a pair is calculated using the similarity functions proposed in Section 4, and the similarity values fall between 0 and 1. The similarity vector $S_{ij} = (S_{ij}^{1}, S_{ij}^{2}, \ldots, S_{ij}^{m})$ of a single pairwise case indicates the attribute similarities of this pair. The number of cases is denoted as $N$; for each attribute, the similarities of the pairwise cases form an $N \times N$ similarity matrix $M_{N \times N}$. Each case has $m$ attributes; thus, $m$ similarity matrices are formed.
5.2 Attribute sets
We divide the crime process of the case narrative into action characteristics and object characteristics, which correspond to the verb sequence and the noun sequence in the narrative text. The two sequences are weighted by the information entropy. We denote by $a^{ac}$ (or $a^{ob}$) the attribute that uses only the similarity of the action (or object) characteristics as the similarity of the crime process, and we denote the weighted summation of $a^{ac}$ and $a^{ob}$ as $a^{process}$. The entire case narrative text (containing all parts of speech) is denoted as $a^{all}$, and the word sequence that contains only verbs and nouns (verbs and nouns are not split) is denoted as $a^{ac\&ob}$.
To illustrate the effect of adding crime narrative information on the effectiveness of case linkage and to demonstrate the superiority of the weighted sequence, we compare the results of several attribute sets on each machine learning classification algorithm described in Section 4.2. Each attribute set is described as follows:
$A_1$: the attribute set that includes only the 8 attributes of the crime features (refer to Table 2).
$A_2$: the attribute set that adds the verb sequence attribute $a^{ac}$, $A_2 = A_1 \cup \{a^{ac}\}$.
$A_3$: the attribute set that adds the noun sequence attribute $a^{ob}$, $A_3 = A_1 \cup \{a^{ob}\}$.
$A_4$: the attribute set that adds the word sequence attribute $a^{all}$, which contains all parts of speech, $A_4 = A_1 \cup \{a^{all}\}$.
$A_5$: the attribute set that adds the word sequence attribute $a^{ac\&ob}$, which contains only verbs and nouns (verbs and nouns are not split), $A_5 = A_1 \cup \{a^{ac\&ob}\}$.
$A_6$: the attribute set that adds the weighted sequence attribute of verbs and nouns $a^{process}$, $A_6 = A_1 \cup \{a^{process}\}$.
5.3 Evaluation criteria of results
5.3.1 Measurement of attributes’ separability
Separability is a measure of the complexity of a dataset and can be used to compare the effects of attributes for classification [46]. To measure the separability of attributes and the impact of case narrative information on the separability of the entire dataset, we employ the following methods as indicators of separability [47].
5.3.1.1 Two-sample Kolmogorov-Smirnov (KS) test value
The KS value can be used to evaluate whether two samples are produced by the same underlying distribution; it is defined as follows:
$$KS = \max_{x} \left| F_{0,n_0}(x) - F_{1,n_1}(x) \right| \tag{18}$$
where $F_{0,n_0}$ and $F_{1,n_1}$ are the empirical distribution functions of the two classes, and
$n_0$, $n_1$ indicate their sample sizes. The KS value represents the largest vertical distance between the two sample distribution curves [48]. The KS value ranges from 0 to 1. The larger the KS value, the larger the difference between the two classes and the better the separability of the attribute.
5.3.1.2 Non-overlapped domain (NOD) and nonoverlapped ratio (NOR)
The non-overlapped domain (NOD) is based on the geometrical scope of the overlap of the two classes. NOD can be influenced by outliers, so we also use the nonoverlapped ratio (NOR), which accounts for how many samples of each class fall in the overlap. They are defined as follows:
$$NOD = 1 - \frac{\min(\max(f^{c_0}), \max(f^{c_1})) - \max(\min(f^{c_0}), \min(f^{c_1}))}{\max(\max(f^{c_0}), \max(f^{c_1})) - \min(\min(f^{c_0}), \min(f^{c_1}))} \tag{19}$$
$$R = [\max(\min(f^{c_0}), \min(f^{c_1})),\ \min(\max(f^{c_0}), \max(f^{c_1}))] \tag{20}$$
$$sgn(f_i^{c_j}) = \begin{cases} 1, & f_i^{c_j} \notin R,\ j \in \{0,1\} \\ 0, & \text{otherwise} \end{cases} \tag{21}$$
$$NOR = \frac{\sum_{j=0}^{1} \sum_{i=1}^{n_j} sgn(f_i^{c_j})}{\sum_{j=0}^{1} n_j} \tag{22}$$
where $\max(f^{c_j})$ and $\min(f^{c_j})$ represent the maximum and the minimum values of the feature $f$ on class $c_j$, with $j = 0$ or 1 for the binary problem. The function $sgn(\cdot)$ marks whether each sample point $f_i^{c_j}$ lies outside the field $R$. NOD represents the percentage of the two classes’ nonoverlapping domain, and NOR represents the proportion of the sample points in the nonoverlapping region. Both NOD and NOR range from 0 to 1. The larger the values, the lower the degree of overlap between the two samples and the higher the separability of the attributes.
5.3.1.3 Fisher’s discriminant ratio (FDR)
FDR is used to compare the difference between the means of the two samples relative to their variances, as defined by Eq. (23).
$$FDR = \frac{(\mu_0 - \mu_1)^2}{\sigma_0^2 + \sigma_1^2} \tag{23}$$
where $\mu_0$, $\mu_1$ and $\sigma_0^2$, $\sigma_1^2$ represent the means and the variances of the two classes, respectively. A larger FDR indicates that the samples are more compact within each class and farther apart between the classes, so the attribute has excellent separability.
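The four separability indicators of Section 5.3.1 can be sketched in plain Python as shown below. This is an illustrative reading of Eqs. (18)-(23), not the code used in the experiments. It assumes that NOR counts the share of points outside the overlap interval R, that the overlap length is clamped at zero when the two classes do not overlap, and that the population variance is used in FDR; the two toy samples are invented for the example.

```python
from bisect import bisect_right
from statistics import mean, pvariance


def ks_value(c0, c1):
    """Eq. (18): largest gap between the two empirical distribution functions."""
    s0, s1 = sorted(c0), sorted(c1)
    return max(abs(bisect_right(s0, x) / len(s0) - bisect_right(s1, x) / len(s1))
               for x in s0 + s1)


def overlap_interval(c0, c1):
    """Eq. (20): interval R shared by the value ranges of both classes."""
    return max(min(c0), min(c1)), min(max(c0), max(c1))


def nod(c0, c1):
    """Eq. (19): one minus the overlap length over the total value range."""
    low, high = overlap_interval(c0, c1)
    total = max(max(c0), max(c1)) - min(min(c0), min(c1))
    # Clamping at zero for disjoint classes is an assumption of this sketch.
    return 1 - max(high - low, 0.0) / total


def nor(c0, c1):
    """Eq. (22): share of sample points lying outside the interval R."""
    low, high = overlap_interval(c0, c1)
    outside = sum(1 for x in list(c0) + list(c1) if not low <= x <= high)
    return outside / (len(c0) + len(c1))


def fdr(c0, c1):
    """Eq. (23): squared mean difference over the sum of the class variances."""
    return (mean(c0) - mean(c1)) ** 2 / (pvariance(c0) + pvariance(c1))


# Toy similarity values of one attribute (hypothetical, not from the dataset).
nonserial = [0.1, 0.2, 0.3, 0.4, 0.5]
serial = [0.4, 0.6, 0.8, 0.9]

ks = ks_value(nonserial, serial)
nod_value = nod(nonserial, serial)
nor_value = nor(nonserial, serial)
fdr_value = fdr(nonserial, serial)
print(round(ks, 3), round(nod_value, 3), round(nor_value, 3), round(fdr_value, 3))
```

For these toy samples the overlap interval is [0.4, 0.5], which covers one eighth of the total value range and contains three of the nine points.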
5.3.2 Measurement of serial crime detection performance
After calculating the similarity of the pairwise case attributes, we consider the confusion matrix and several other evaluation indexes (accuracy, precision, recall, F-measure, and logistic loss) to measure the performance of case linkage. The confusion matrix is shown in Table 3. True Positive (TP) refers to the serial pairwise cases that were correctly labeled as serial crimes. True Negative (TN) indicates the nonserial pairwise cases that were correctly labeled as nonserial crimes. False Negative (FN) represents the serial pairwise cases that were incorrectly labeled as nonserial crimes. False Positive (FP) indicates the nonserial pairwise cases that were incorrectly labeled as serial crimes.
Table 3
Confusion matrix
                    Predicted
Actual              Serial crimes   Nonserial crimes   Total
Serial crimes       TP              FN                 P
Nonserial crimes    FP              TN                 N
Total               P’              F’                 P+N
According to the confusion matrix provided in Table 3, the following indexes can be calculated:
$$Accuracy = (TP + TN) / (P + N) \tag{24}$$
$$Precision = TP / (TP + FP) \tag{25}$$
$$Recall = TP / (TP + FN) \tag{26}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{27}$$
In addition, the logistic loss can measure the quality of the classifier’s probability estimate [49]. For the binary classification problem, the formula is shown in Eq. (28).
$$LogL = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log p_i + (1 - y_i) \log(1 - p_i) \right) \tag{28}$$
where $N$ is the number of samples, $y_i$ is the true label of the $i$-th sample, and $p_i$ is its predicted probability of belonging to the positive class.
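The evaluation indexes of Eqs. (24)-(28) can be computed directly from the confusion matrix cells and the predicted probabilities, as in the following sketch. The counts, labels, and probabilities are hypothetical values chosen for illustration, and the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
from math import log


def linkage_metrics(tp, fn, fp, tn):
    """Eqs. (24)-(27) computed from the four cells of the confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def log_loss(y_true, p_pred, eps=1e-15):
    """Eq. (28): mean negative log-likelihood of the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0); eps is an assumption
        total += y * log(p) + (1 - y) * log(1 - p)
    return -total / len(y_true)


# Hypothetical counts and probabilities (not results from the experiments).
acc, prec, rec, f1 = linkage_metrics(tp=8, fn=2, fp=4, tn=86)
loss = log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3), round(loss, 4))
```

With so few serial pairs in the data, accuracy alone can be high even when many serial pairs are missed, which is why precision, recall, and F1 are reported alongside it.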
5.4 Experiment results and discussion
In this subsection, two aspects are investigated. First, the action characteristics and object characteristics are weighted to obtain a weighted sequence. We verify the superiority of the weighted sequence from the perspective of separability and crime linkage performance. Second, the performance of crime series detection is assessed after adding the crime process.
5.4.1 Experiment 1: comparisons of the word sequence
To evaluate the effect of selecting the action and object characteristics to characterize the crime process and the superiority of the weighting method, we validate the performance of five attribute sets ($A_2$, $A_3$, $A_4$, $A_5$, $A_6$) for crime series detection and compare the separability of the five sequences ($a^{ac}$, $a^{ob}$, $a^{all}$, $a^{ac\&ob}$, and $a^{process}$).
As shown in Fig. 3, the attribute set $A_6$ is generally superior to the attribute sets $A_2$ and $A_3$ in classification accuracy, precision, recall, and F1. The weighted sequence attribute characterizes the crime process more accurately and is more capable of detecting serial crimes. In particular, $A_6$ achieves a significant increase in recall while maintaining precision. As shown in Fig. 5d, for the attribute $a^{ac}$, the upper quartile, the lower quartile, and the median of the serial crimes’ similarity are the highest among all attributes, and its similarity values are concentrated in the upper quartile, forming a shape with a wide top and a narrow tail (Fig. 5c). However, its lower bound is too low (Fig. 5d), so these pairs are easily confused with nonserial cases. In Fig. 5c, the overall level of the serial crimes’ similarity on attribute $a^{ob}$ is low, and the overall level of similarity is higher in nonserial crimes. However, the lower bound of the attribute $a^{ob}$ in the serial crimes’ similarity is higher than that of $a^{ac}$ (see Fig. 5d), so $a^{ob}$ is superior to $a^{ac}$ in this respect. The attribute $a^{process}$ combines the advantages of both $a^{ac}$ and $a^{ob}$: it presents both a high similarity and a high lower bound in serial crimes. As shown in Fig. 5c, the similarity of the attribute $a^{process}$ in the serial crimes is concentrated in the upper quartile, and its lower bound is the highest among the five attributes. In the nonserial crimes, its similarity is concentrated around the median (refer to Fig. 5a). Among the three attributes $a^{process}$, $a^{ac}$, and $a^{ob}$, $a^{process}$ has the highest separability.
Fig. 3. The average results in 10-fold cross-validation of attribute sets A2, A3, and A6
As shown in Fig. 4, the experimental result of attribute set $A_4$ is not superior to that of attribute set $A_5$, and the separability measures of the attribute $a^{ac\&ob}$ are superior to those of the attribute $a^{all}$ in Table 4. The attributes $a^{ac\&ob}$ and $a^{all}$ have similar distributions in Fig. 5. This analysis reveals that focusing only on verbs and nouns does not cause a loss of the key information of the case narrative. On the contrary, this approach can eliminate noise in the case narrative and capture the important characteristics of the crime process.
From the results of the crime series detection in Fig. 4, the effect of attribute set $A_6$ is better than that of $A_4$ and $A_5$. In Fig. 5c and Fig. 5d, $a^{ac\&ob}$ and $a^{all}$ have low similarity levels in serial crimes, the lower bounds of their similarity values are not high, and the similarity values are not concentrated, which contributes to mistakes in detecting serial crimes. The attributes $a^{process}$ and $a^{ac\&ob}$ both consider verbs and nouns only. The difference is that the attribute $a^{ac\&ob}$ does not separate verbs and nouns; the words of the two parts of speech are combined into one sequence. The better performance of attribute set $A_6$ indicates that verbs and nouns should be separated and shows the superiority of our weighting method. In Table 4, the separability of $a^{process}$ is the best on all four indicators, which explains why $a^{process}$ performs well in crime linkage. In addition, classifiers generally achieve a higher quality of probability estimates on the attribute set $A_6$ (Table 5).
In general, the experimental results are consistent with our ideas. The action characteristics and object characteristics in the crime process are the most important and best reflect the main information of the crime process. In particular, the action characteristics and object characteristics should be separated. To handle case narratives, we should focus on the nouns and verbs that appear in the case narrative, calculate the similarity of each of the two part-of-speech sequences between cases, and weight the two similarities so that the crime process is characterized more accurately.
Fig. 4. The average results in 10-fold cross-validation of attribute sets A4, A5, and A6
Fig. 5. Boxplot and violinplot for the attributes
Table 4
Comparisons of attributes’ separability (values marked with * indicate the best effect)
Attribute     KS       NOD      NOR      FDR
a^ac          0.863    0.116    0.088    3.584
a^ob          0.785    0.160    0.142    2.797
a^all         0.828    0.141    0.090    3.574
a^ac&ob       0.847    0.200    0.106    3.844
a^process     0.871*   0.262*   0.274*   4.020*
Table 5
The average results of logistic loss (LogL) in 10-fold cross-validation of attribute sets (values marked with * indicate the best effect)
Attribute set   LR         SVM        KNN        NN         RF
A1              0.00538    0.00549    0.05714    0.00538    0.01661
A2              0.00399    0.00407    0.03602    0.00407    0.01091*
A3              0.00437    0.00462    0.03478    0.00444    0.01679
A4              0.00401    0.00430    0.03354    0.00413    0.01249
A5              0.00394    0.00413    0.03292    0.00408    0.01416
A6              0.00382*   0.00405*   0.02795*   0.00393*   0.01180
5.4.2 Experiment 2: the effect of adding crime process information
According to Experiment 1, the attribute $a^{process}$ can more accurately describe the criminal’s crime process, so we now compare attribute sets $A_6$ and $A_1$. In this paper, we propose that integrating the crime process information can better characterize the offender’s M.O., which can improve the effect of detecting serial crimes. The objective of this experiment is to verify this viewpoint.
Fig. 6. The average results in 10-fold cross-validation of attribute sets A1 and A6
As shown in Fig. 6, the performance of attribute set $A_1$ is weaker than that of $A_6$ in crime series detection, which supports the positive effect of the crime process in crime linkage. Fig. 7 shows the attribute sets $A_1$ and $A_6$ after they are reduced to two dimensions by PCA. For $A_1$, the samples of the nonserial crimes are very dispersed, and many serial crime samples are mixed in the nonserial crime samples. For $A_6$, the samples of the nonserial crimes are very concentrated, and the distance between the serial crime samples and the nonserial crime samples is generally large. Some serial crime samples are still mixed in the nonserial crime samples; however, the degree of confusion is greatly reduced, which shows that the separability of the sample set is improved after adding the crime process.
Table 6
The crime features of two pairs of nonserial crimes (“--” represents the missing data)
Case    C_Num   W_Disguise         W_Harm     C_Action   W_Property   W_Threat
Case1   1       --                 Violence   --         Snatched     --
Case2   1       --                 Violence   --         Snatched     --
Case3   2       Accused cheating   Violence   hit        Asked for    Violence threat
Case4   6       Accused cheating   Violence   hit        Asked for    Violence threat/Speech threat
Table 7
The crime process of two pairs of nonserial crimes
Case    Crime process
Case1   Violence → snatched
Case2   Snatched → Revolted → violence
Case3   Disguised → hit → snatched → threat and hit → asked for the property
Case4   Disguised → threat → snatched → threat and hit → asked for the property
To illustrate the practical effect of adding crime process information, we selected four typical cases in the case dataset for further analysis. Case 1 and Case 2 are two cases of bag robbery, and Case 3 and Case 4 are two cases of casino robbery. Case 1 and Case 2 are nonserial crimes; they were correctly classified with attribute set $A_6$ but were incorrectly classified as serial crimes with attribute set $A_1$. The same holds for Case 3 and Case 4.
We analyze the contribution of crime process information to the classification of serial crimes as follows. The crime features of Case 1 and Case 2 are identical (refer to Table 6), which explains why they were incorrectly classified with attribute set $A_1$. However, their crime processes differ. As listed in Table 7, the criminal in Case 1 violently harmed the victim at the beginning of the crime, which indicates that this criminal was more aggressive, whereas the violence in Case 2 occurred only later in the crime. According to the calculation results, the crime process similarity between Case 1 and Case 2 is 0.26, which is lower than the average similarity of 0.67 for serial crimes. Adding the crime process helps identify the difference between the two criminals’ M.O.; thus, Case 1 and Case 2 are correctly classified with attribute set $A_6$. The crime features of Case 3 and Case 4 are not identical but are similar (refer to Table 6). The similarity of their crime processes is 0.29, which shows that clear differences remain (refer to Table 7). The difference in the crime features alone does not guarantee that the two cases are correctly classified, and the difference in their crime processes plays an additional role in the classification.
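To make the role of the step order concrete, the following sketch applies textbook dynamic time warping [41] to the Case 1 and Case 2 sequences of Table 7. It is not the modified DTW of Section 4: the 0/1 token distance, the normalisation by the longer sequence length, and the function names are simplifying assumptions, and the input is the summarized process of Table 7 rather than the verb and noun sequences of the narrative text. The resulting values are therefore not expected to reproduce the reported similarity of 0.26.

```python
def dtw_distance(seq_a, seq_b, dist):
    """Classic DTW: minimum cumulative cost of aligning seq_a with seq_b."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # reuse element of seq_b
                                    cost[i][j - 1],      # reuse element of seq_a
                                    cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def token_distance(a, b):
    """Assumed 0/1 distance; the paper compares word embeddings instead."""
    return 0.0 if a == b else 1.0


def dtw_similarity(seq_a, seq_b):
    """Map the DTW distance to [0, 1] using the longer length (an assumption)."""
    d = dtw_distance(seq_a, seq_b, token_distance)
    return 1.0 - d / max(len(seq_a), len(seq_b))


case1 = ["violence", "snatched"]
case2 = ["snatched", "revolted", "violence"]
print(dtw_similarity(case1, case1))
print(dtw_similarity(case1, case2))
```

Although both cases contain the same two actions, every alignment that respects the step order pairs differing actions, so the order-aware similarity is far lower than a comparison of unordered features would suggest.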
After the crime narrative information is added, both the actual classification effect and the separability of the samples are greatly improved. Considering the crime process therefore improves the separability between serial crime samples and nonserial crime samples and contributes to crime series detection.
Fig. 7. The PCA results of attribute sets A1 and A6
6 Conclusion
In this paper, we divide the modus operandi into the crime features and the crime process. Regarding the similarity among the crime features, we design similarity measures according to the different characteristics of the features, taking into account the frequencies of the features in particular. The crime process is obtained from the crime narrative. We regard the verbs and nouns in the crime narrative as the action characteristics and object characteristics of the crime process. The verb sequence and the noun sequence are treated as time series data, and their similarity is calculated by dynamic time warping. The similarities of the two word sequences are weighted by information entropy to give the similarity of the crime process.
The analysis shows that the crime process attributes have excellent separability. We apply several popular machine learning classification algorithms to verify the effect of linking crimes after adding the crime process information. The results show that the performance of crime series detection is substantially improved and that the weighting method characterizes the crime process more accurately.
In this paper, the crime features are extracted by hand and cannot be obtained directly. In the future, we intend to apply natural language processing technology, such as entity extraction, to automatically obtain the features of the modus operandi from the crime narrative text, which will enable a greater degree of automation in crime series detection.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
References
[1] Z. Hua-Wei and S. Xiao-Ming, "The Application of Behavior Patterns Analyzing Techniques to Series of Cases," Journal of Political Science and Law, vol. 32, no. 5, pp. 64-71, 2015.
[2] X. Zhang and C. Chen, "A Brief Discussion on the Difficulties and Ways of Investigation in Robbery Cases," Journal of Zhejiang Public Security College, no. 06, pp. 108-110, 1999.
[3] Y. Zhang and C. Zhang, "Several hot issues in the identification of robbery crime," Law Science, no. 05, pp. 155-160, 2014.
[4] W. Chen, "Research on the Criminal Investigation System Based on Data Mining," Journal of ShanXi Police Academy, vol. 18, no. 4, pp. 71-75, 2010.
[5] T. Wang, C. Rudin, D. Wagner, and R. Sevieri, "Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores," Big Data, vol. 3, no. 1, pp. 3-21, Mar 2015. doi:https://doi.org/10.1089/big.2014.0021
[6] J. Woodhams, R. Bull, and C. R. Hollin, "Case Linkage," in Criminal Profiling: International Theory, Research, and Practice, R. N. Kocsis, Ed. Totowa, NJ: Humana Press, 2007, pp. 117-133. doi:https://doi.org/10.1007/978-1-60327-146-2_6
[7] A. Borg, M. Boldt, N. Lavesson, U. Melander, and V. Boeva, "Detecting serial residential burglaries using clustering," Expert Systems with Applications, vol. 41, no. 11, pp. 5252-5266, 2014. doi:https://doi.org/10.1016/j.eswa.2014.02.035
[8] M. Tonkin et al., "Using offender crime scene behavior to link stranger sexual assaults: A comparison of three statistical approaches," Journal of Criminal Justice, vol. 50, pp. 19-28, 2017. doi:https://doi.org/10.1016/j.jcrimjus.2017.04.002
[9] W. Bernasco and P. Nieuwbeerta, "How Do Residential Burglars Select Target Areas?," The British Journal of Criminology, vol. 45, no. 3, pp. 296-315, 2005. doi:https://doi.org/10.1093/bjc/azh070
[10] M. Tonkin, J. Woodhams, R. Bull, J. W. Bond, and E. J. Palmer, "Linking Different Types of Crime Using Geographical and Temporal Proximity," Criminal Justice and Behavior, vol. 38, no. 11, pp. 1069-1088, 2011. doi:https://doi.org/10.1177/0093854811418599
[11] D. Gee and A. Belofastov, "Profiling Sexual Fantasy," in Criminal Profiling: International Theory, Research, and Practice, R. N. Kocsis, Ed. Totowa, NJ: Humana Press, 2007, pp. 49-71. doi:https://doi.org/10.1007/978-1-60327-146-2_3
[12] A. Borg and M. Boldt, "Clustering Residential Burglaries Using Modus Operandi and Spatiotemporal Information," International Journal of Information Technology & Decision Making, vol. 15, no. 01, pp. 23-42, 2016. doi:https://doi.org/10.1142/s0219622015500339
[13] R. R. Hazelwood and J. I. Warren, "Linkage analysis: modus operandi, ritual, and signature in serial sexual crime," Aggression and Violent Behavior, vol. 9, no. 3, pp. 307-318, 2004. doi:https://doi.org/10.1016/j.avb.2004.02.002
[14] P. Han, D. Wang, Y. Liu, and X. Su, "Influence of part-of-speech on Chinese and English document clustering," Journal of Chinese Information Processing, vol. 27, no. 02, pp. 65-73, 2013.
[15] M. D. Porter, "A Statistical Approach to Crime Linkage," The American Statistician, vol. 70, no. 2, pp. 152-165, 2016. doi:https://doi.org/10.1080/00031305.2015.1123185
[16] H. Chi, Z. Lin, H. Jin, B. Xu, and M. Qi, "A decision support system for detecting serial crimes," Knowledge-Based Systems, vol. 123, pp. 88-101, 2017. doi:https://doi.org/10.1016/j.knosys.2017.02.017
[17] L. F. Alarid, V. S. Burton, and A. L. Hochstetler, "Group and solo robberies: Do accomplices shape criminal form?," Journal of Criminal Justice, vol. 37, no. 1, pp. 1-9, 2009. doi:https://doi.org/10.1016/j.jcrimjus.2008.12.001
[18] A. Burrell, R. Bull, and J. Bond, "Linking Personal Robbery Offences Using Offender Behaviour," Journal of Investigative Psychology and Offender Profiling, vol. 9, no. 3, pp. 201-222, 2012. doi:https://doi.org/10.1002/jip.1365
[19] J. Woodhams and K. Toye, "An empirical test of the assumptions of case linkage and offender profiling with serial commercial robberies," Psychology, Public Policy, and Law, vol. 13, no. 1, pp. 59-85, 2007. doi:https://doi.org/10.1037/1076-8971.13.1.59
[20] L. E. Porter and L. J. Alison, "Behavioural coherence in group robbery: a circumplex model of offender and victim interactions," Aggressive Behavior, vol. 32, no. 4, pp. 330-342, 2006. doi:https://doi.org/10.1002/ab.20132
[21] F. G. M. Prats, "Textual Analysis and Linking of Narratives (TALON)," in Systems & Information Engineering Design Symposium, 2005.
[22] X. Wang, D. E. Brown, and J. H. Conklin, "Crime Incident Association with Consideration of Narrative Information," in Systems & Information Engineering Design Symposium, 2007.
[23] S. Zhu and Y. Xie, "Crime Event Embedding with Unsupervised Feature Selection," arXiv e-prints, Accessed on: June 01, 2018. Available: https://ui.adsabs.harvard.edu/#abs/2018arXiv180606095Z
[24] M. Helbich, J. Hagenauer, M. Leitner, and R. Edwards, "Exploration of unstructured narrative crime reports: an unsupervised neural network and point pattern analysis approach," Cartography and Geographic Information Science, vol. 40, no. 4, pp. 326-336, 2013. doi:https://doi.org/10.1080/15230406.2013.779780
[25] C.-H. Ku and G. Leroy, "A decision support system: Automated crime report analysis and classification for e-government," Government Information Quarterly, vol. 31, no. 4, pp. 534-544, 2014. doi:https://doi.org/10.1016/j.giq.2014.08.003
[26] D. Lei and T. Xu, "A Short Text Similarity Algorithm for Finding Similar Police 110 Incidents," in International Conference on Cloud Computing & Big Data, 2017.
[27] C. Ding, "The crime process: Leading on site investigation," Journal of Henan Public Security Academy, no. 06, pp. 140-141, 2008.
[28] D. Cornish, "The procedural analysis of offending and its relevance for situational prevention," Crime Prevention Studies, vol. 3, pp. 151-196, 1994.
[29] E. Beauregard, D. K. Rossmo, and J. Proulx, "A Descriptive Model of the Hunting Process of Serial Sex Offenders: A Rational Choice Perspective," Journal of Family Violence, vol. 22, no. 6, pp. 449-463, 2007. doi:https://doi.org/10.1007/s10896-007-9101-3
[30] E. Beauregard, J. Proulx, K. Rossmo, B. Leclerc, and J.-F. Allaire, "Script Analysis of the Hunting Process of Serial Sex Offenders," Criminal Justice and Behavior, vol. 34, no. 8, pp. 1069-1084, 2007. doi:https://doi.org/10.1177/0093854807300851
[31] X. Hu, H. Yao, and J. Tan, "Research on analysis method of crime process," Journal of Hubei University of Police, vol. 24, no. 03, pp. 73-76, 2011.
[32] L. Ma, Y. Chen, and H. Hao, "AK-Modes: A weighted clustering algorithm for finding similar case subsets," in International Conference on Intelligent Systems & Knowledge Engineering, 2010.
[33] D. E. Brown and S. Hagen, "Data association methods with applications to law enforcement," Decision Support Systems, vol. 34, no. 3, pp. 369-378, 2003.
[34] S. Boriah, V. Chandola, and V. Kumar, "Similarity Measures for Categorical Data: A Comparative Evaluation," in Proceedings of the 2008 SIAM International Conference on Data Mining, 2008, pp. 243-254. doi:https://doi.org/10.1137/1.9781611972788.22
[35] C. Bennell, N. J. Jones, and T. Melnyk, "Addressing problems with traditional crime linking methods using receiver operating characteristic analysis," Legal and Criminological Psychology, vol. 14, no. 2, pp. 293-310, 2009. doi:https://doi.org/10.1348/135532508x349336
[36] N. Pan et al., "Nonlinear tool traces fast tracing algorithm based on single point laser detection," Journal of Intelligent & Fuzzy Systems, pp. 1-12, 2018. doi:https://doi.org/10.3233/jifs-169885
[37] X. Liu, Y. Zhou, and R. Zheng, "Sentence Similarity based on Dynamic Time Warping," presented at the International Conference on Semantic Computing (ICSC 2007), 2007. doi:https://doi.org/10.1109/icsc.2007.48
[38] X. Zhu, D. Klabjan, and P. Bless, "Semantic Document Distance Measures and Unsupervised Document Revision Detection," arXiv e-prints, Accessed on: September 01, 2017. Available: https://ui.adsabs.harvard.edu/#abs/2017arXiv170901256Z
[39] S. Lin and D. E. Brown, "An outlier-based data association method for linking criminal incidents," Decision Support Systems, vol. 41, no. 3, pp. 604-615, 2006. doi:https://doi.org/10.1016/j.dss.2004.06.005
[40] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," arXiv e-prints, Accessed on: January 01, 2013. Available: https://ui.adsabs.harvard.edu/#abs/2013arXiv1301.3781M
[41] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43-49, 1978.
[42] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 4, pp. 623-656, 1948.
[43] I. A. Rezek and S. J. Roberts, "Stochastic complexity measures for physiological signal analysis," IEEE Transactions on Bio-medical Engineering, vol. 45, no. 9, pp. 1186-1191, 1998.
[44] K. H. Guo and L. I. Wen-Li, "Evidential Reasoning-Based Approach for Multiple Attribute Decision Making Problems under Uncertainty," Journal of Industrial Engineering and Engineering Management, vol. 26, no. 2, pp. 94-100, 2012.
[45] D. H. Sun, W. N. Liu, and W. Song, "A Model with the Evaluation of the Equilibrium of Attribute Indexes Based on the Relative Entropy Measuring," Systems Engineering-theory Practice, 2001.
[46] J.-R. Cano, "Analysis of data complexity measures for classification," Expert Systems with Applications, vol. 40, no. 12, pp. 4820-4831, 2013. doi:https://doi.org/10.1016/j.eswa.2013.02.025
[47] Z. Lin, "The Application of Separability Analysis in Feature Selection of the Serial Crime Linkage Problem," presented at the 45th International Conference on Computers & Industrial Engineering, Metz, France, 2015.
[48] R. C. Spear and G. M. Hornberger, "Eutrophication in peel inlet–II. Identification of critical uncertainties via generalized sensitivity analysis," Water Research, vol. 14, no. 1, pp. 43-49, 1980.
[49] C. Ferri, J. Hernández-Orallo, and R. Modroiu, "An experimental comparison of performance measures for classification," Pattern Recognition Letters, vol. 30, no. 1, pp. 27-38, 2009. doi:https://doi.org/10.1016/j.patrec.2008.08.010
Mingliang QI obtained his Ph.D. in Management Science from the Chinese Academy of Sciences in 2007. He is currently an associate professor at the Institutes of Science and Development, Chinese Academy of Sciences. His research interests are emergency management, public safety management, and science and technology policy. He has published several papers in peer-reviewed journals and conferences.
Yusheng LI has been a student in the five-year successive master-doctor program at the Institutes of Science and Development, Chinese Academy of Sciences, since 2017. His research interest is data mining.