An approach for understanding offender modus operandi to detect serial robbery crimes


Accepted Manuscript

Title: An approach for understanding offender modus operandi to detect serial robbery crimes
Authors: Yu-Sheng Li, Ming-Liang QI
PII: S1877-7503(19)30341-2
DOI: https://doi.org/10.1016/j.jocs.2019.101024
Article Number: 101024
Reference: JOCS 101024
To appear in: Journal of Computational Science
Received date: 25 March 2019
Revised date: 12 July 2019
Accepted date: 25 July 2019

Please cite this article as: Li Y-Sheng, QI M-Liang, An approach for understanding offender modus operandi to detect serial robbery crimes, Journal of Computational Science (2019), https://doi.org/10.1016/j.jocs.2019.101024 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

An approach for understanding offender modus operandi to detect serial robbery crimes

Yu-Sheng Li a,b, Ming-Liang QI a,b,*

a. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China

b. School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China

Highlights

The crime process is incorporated into the understanding of the criminals’ behaviours.

The crime feature similarity calculation takes into account the frequency of each feature, and a modified dynamic time warping (DTW) is used to measure the similarity of the crime process.

A real-world robbery dataset is employed to measure the performance of finding serial crimes.

This approach can improve the efficiency of crime analysis and help maintain social security.

* Corresponding author at: Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China. E-mail address: [email protected] (M. Qi).

Abstract: Detecting serial crimes is one of the most challenging tasks in crime analysis. Linking crimes committed by the same criminal can improve the work efficiency of police offices and help maintain public safety. Previous crime linkage studies have focused on the crime features of the modus operandi (M.O.) but did not address the crime process. In this paper, we propose an approach for detecting serial robbery crimes based on understanding the offender’s M.O. by integrating crime process information. A natural language processing method is used to extract the action and object characteristics of the crime process from the crime narrative text, a dynamic time warping method is introduced to measure the similarity of these characteristics, and an information entropy method is used to weight the similarities of the action and object characteristics to obtain the comprehensive similarity of the criminals’ crime process. A real-world robbery dataset is employed to measure the performance of finding serial crimes after adding the crime process information. According to the results, information about the crime process obtained from the case narrative text has significant separability and can better characterize the offender’s M.O. Five machine learning algorithms are used to classify the case pairs and identify serial cases and nonserial cases. The results show that adding crime process information to the crime features can substantially improve the detection of serial crimes.

Keywords: Serial crimes, Crime linkage, Modus operandi, Natural language processing, Dynamic time warping

1 Introduction

Of China’s crimes of encroachment on property, serial crimes account for a considerable proportion[1]. As the most violent crime of encroaching on property[2], robbery seriously jeopardizes public security and occurs relatively frequently in practice[3]. Linking crimes can improve the efficiency of crime analysis and help maintain social security. Therefore, China’s police officials are increasingly focusing their attention on research and the development of algorithms to detect serial crimes[4].

The detection of serial crimes is one of the most important problems in crime analysis[5]. Detection is based on the similarity among offenders’ behaviors to identify crimes committed by the same suspects[6]. Studies have shown that a large proportion of crimes are committed by a minority of offenders[7]; e.g., some researchers have discovered that approximately 50% of crimes are committed by 6%-10% of criminals in the United States (US) and the United Kingdom (UK)[8].

Some forensic evidence, such as DNA and fingerprints, can be used to link crimes, but the availability of such evidence is limited[7]. Therefore, geospatial and temporal features, the offense target and criminal behavior are used as alternative information to detect serial crimes. According to rational choice theory, criminals choose so as to maximize rewards, and their choices reflect the decision criteria that they use[9]. Previous studies found that both geographical distance and temporal proximity achieved statistically significant levels of discrimination accuracy when differentiating between a wide variety of linked and unlinked crimes[10].

Criminal behavior is more widely used in serial crime detection. Modus operandi (M.O.) refers to behaviors committed during an offense that serve to ensure its completion while also protecting the perpetrator’s identity and facilitating escape following the offense[11]. It characterizes a criminal’s crime series and is a key part of case linkage[12, 13]. Previous studies primarily identified the M.O. by extracting the key information from the crime description; however, they did not consider the process of the crime. For instance, in case A, the criminal suspect snatched the property and the victim resisted, and then the criminal suspect pushed the victim to the ground and took the property. In case B, the criminal suspect pushed the victim to the ground and snatched the property, and then the victim resisted and the criminal suspect took the property. The criminal suspects in these two cases have the same crime features. From the perspective of the crime process, however, the criminal suspect in case B is more aggressive than the criminal suspect in case A. Considering only the crime features disregards some information about the M.O. This paper studies whether adding crime process information to the crime features can improve the performance of detecting serial robbery crimes.

The crime narrative contains the entire process of criminals committing crimes. Thus, the crime process can be regarded as the change of the key information in the crime narrative text over the course of the crime. Before comparing the similarities in the crime process, we need to identify the key information in the crime narrative text. In practice, the criminal suspect’s action is an important characteristic of the crime process; thus, the verbs in the crime narrative text can be treated as action characteristics. In Chinese and English, nouns are the most important parts of speech that characterize the content of a text[14]. Thus, the characteristics of the crime process can be divided into two parts: the action characteristic, which is represented by the verbs in the crime narrative, and the object characteristic, which is represented by the nouns in the crime narrative. We treat the key information in the crime narrative as time series data and employ the dynamic time warping (DTW) method to measure its similarity.

The contributions of this study are as follows. First, we incorporate information about the crime process into the understanding of the M.O. Second, the crime feature similarity calculation takes into account the frequency of each feature, and a modified DTW is used to measure the similarity of the crime process. Third, several machine learning algorithms are used to demonstrate the effect of adding process information on the detection of serial crimes.

The remainder of this paper is organized as follows. Section 2 presents a brief summary and a review of relevant research. In Section 3, we describe the crime series detection problem. Section 4 summarizes the methodology applied to detect serial robbery crimes, proposes several attribute similarity measures, including the use of DTW to calculate the similarity of the crime process, and introduces several detection models. In Section 5, a real-world robbery crime dataset from a city in China is introduced, and the evaluation criteria and the analysis of the experimental results are presented. Conclusions and future research are discussed in Section 6.

2 Literature review

Crime series detection relies on three primary assumptions. The first assumption is that criminals act consistently, the second assumption is that criminals exhibit some distinctiveness in their behaviors, and the third assumption is that important aspects of criminals’ behaviors can be observed, measured and accurately recorded[15]. Based on these three assumptions, researchers can extract the crime attributes according to the criminals’ M.O. and detect whether two cases were committed by the same offender according to the similarity among the case attributes. Therefore, the quality of attribute extraction determines the quality of the final detection results.

The M.O. is the main basis for describing criminal behavior. Researchers primarily extract attributes from the M.O. of robbery cases according to the following domains: the actions of criminals[16], the weapons of criminals[17], ways of threatening[18], ways of harming[19], and ways of disguise[20].

In addition to the crime features, some researchers have attempted to use crime narrative information for crime linkage. They handled the crime narrative in two ways. The first way was to treat the crime narrative as one of the factors in identifying serial crimes. A few studies employed term frequency-inverse document frequency (TF-IDF) to calculate the similarity between two cases’ narrative texts and combined it with other information for crime linkage[21, 22]. Zhu and Xie applied regularized restricted Boltzmann machines (RBMs) to jointly capture the time, location, and complex free-text component of each crime, which can select the key features of crimes for linking them without supervision[23]. The second way used the case narrative as the sole basis for identifying crime series, without considering other information. Helbich et al. applied the self-organizing map algorithm and its visualization capabilities to explore the hidden relationships and information in a geographical context and to investigate a crime series[24]. Some researchers introduced natural language processing to calculate the similarity among crime reports to support crime series detection[25, 26]. These studies show that crime narrative information is effective for crime series detection.

Although researchers have focused on extracting the crime attributes from the perspective of crime features, the crime process has not been considered. Studies have shown that traces and changes on the scene are inextricably linked to the crime process[27]. In criminology, some studies detect crime patterns based on crime processes. Cornish proposed the concept of the crime script, which organizes our knowledge about how to understand and enact commonplace behavioral processes of crimes; a script can be divided into scenes involving smaller units of action, or plans[28]. Crime scripts have been used to analyze the offense process of serial offenders. Beauregard et al. used a rational choice theory approach to analyze the offense behavior of serial sex offenders and identified hunting process scripts in a sample of serial sex crimes by using multivariate statistical methods and hierarchical cluster analysis[29, 30]. Analyzing the crime process helps to fully grasp the criminal activities of the perpetrators[31]. Thus, crime process information can be integrated into the detection of serial crimes. The crime narrative text records the entire process of a case from the beginning to the end; thus, the crime process can be obtained from the narrative text.

The overarching theme in previous crime linkage research is to find crimes with similar attribute values. Studies have suggested that the similarity among serial cases’ attributes is higher than that among nonserial cases[32]. Many different similarity measures exist for different crime attribute types. Brown and Hagen measured the similarity between two numeric values based on the absolute distance[33], which is the absolute value of the difference after normalization. A categorical attribute is usually binarized, and then binary similarity measures are applied[34]. Bennell et al. employed Jaccard’s coefficient to calculate the crime similarity for a pair of crimes[35].

Previous studies paid more attention to extracting the crime attributes from the perspective of the criminal M.O. The obtained crime attributes are crime features, but the crime process is not considered. To further understand the offender’s M.O., this paper divides the M.O. into the crime features and the crime process. Information about the crime process is derived from the narrative text of the case, which simplifies the acquisition of the process information.

3 Problem description

Table 1 lists some notations that are used in this section. The problem of serial case detection can be described as follows. We know the set of cases $C = \{c_i \mid i = 1, 2, \ldots, n\}$, where $c_i$ denotes the $i$-th case in the crime set $C$. Every two cases can form a pairwise case $p_{ij}$, with $c_i, c_j \in C$ and $i \neq j$, so $N = n \times (n - 1) / 2$ pairwise cases exist. We extract the crime attributes from two parts: the crime features and the crime process. Each case $c_i$ has $m$ attributes on the crime features, and the attribute vector of the case $c_i$ is denoted by $a_i = (a_i^1, a_i^2, \ldots, a_i^k, \ldots, a_i^m)$. According to the different attribute types, different similarity measures are selected to calculate the similarity of each case pair on each attribute. Let $S_{ij}^k$ denote the similarity of the $k$-th crime feature attribute of $p_{ij}$ and $S_{ij}^{process}$ denote the similarity of the crime process of $p_{ij}$. The similarity vector of the pairwise case $p_{ij}$ is expressed by $S_{ij}' = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m, S_{ij}^{process})$. We want to determine whether the two cases were committed by the same criminal. If both cases of $p_{ij}$ were committed by the same offender (or the same gang), we refer to $p_{ij}$ as a serial case.
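The pairing step can be illustrated with a minimal sketch. The case identifiers and the helper name `build_case_pairs` are hypothetical and serve only to show that $n$ cases yield $N = n(n-1)/2$ unordered pairs.

```python
from itertools import combinations


def build_case_pairs(case_ids):
    """Return every unordered pair (c_i, c_j) with i < j."""
    return list(combinations(case_ids, 2))


# Hypothetical case identifiers, used only for illustration.
cases = ["c1", "c2", "c3", "c4"]
pairs = build_case_pairs(cases)

n = len(cases)
# The number of pairs matches N = n(n-1)/2 from the problem description.
assert len(pairs) == n * (n - 1) // 2
```

Each pair would then be mapped to a similarity vector $S_{ij}'$ and a label $y_{ij}$ before classification.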

Table 1 Notation description

Notation | Description
$n$ | Number of cases
$N$ | Number of case pairs, $N = n \times (n - 1) / 2$
$p_{ij}$ | Pairwise case consisting of the cases $i$ and $j$, where $i, j \in \{1, 2, \ldots, n\}$, $i \neq j$
$k$ | The feature attribute of crime, $k \in [1, m]$
$a_i^k$ | The value of the attribute $k$ of the case $i$
$a_i$ | The attribute vector of the case $i$, $a_i = (a_i^1, a_i^2, \ldots, a_i^m)$
$a^k$ | The attribute $k$
$a^{process}$ | The crime process attribute
$S_{ij}^k$ | Similarity value of the pairwise case $p_{ij}$ in attribute $k$
$S_{ij}^{process}$ | Similarity value of the pairwise case $p_{ij}$ in the crime process attribute
$S_{ij}$ | Similarity vector of the pairwise case $p_{ij}$, where $S_{ij} = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m)$
$S_{ij}'$ | Similarity vector of the pairwise case $p_{ij}$ after adding the crime process attribute, where $S_{ij}' = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m, S_{ij}^{process})$
$y_{ij}$ | Actual value of whether the cases $i$ and $j$ belong to a series, $y_{ij} \in \{0, 1\}$
$\hat{y}_{ij}$ | Predicted value of whether the cases $i$ and $j$ belong to a series, $\hat{y}_{ij} \in \{0, 1\}$

4 Methodology

In this paper, an approach that integrates crime process information into the understanding of an offender’s M.O. is proposed for detecting serial robbery crimes. The key steps are provided in Fig. 1.

The first step is constructing robbery crime attributes from two aspects: the crime features and the crime process. The attributes are primarily derived from the properties of robbery crimes and from previous studies; some attributes are mentioned in the literature review.

The second step is measuring the similarity of each crime attribute of the case pairs. Multiple attribute types exist, and the similarity method should be appropriate for the corresponding type.

The third step is applying machine learning classification algorithms to model the problem of crime series detection. Cases are gathered and processed in this step. The cases include serial crimes and nonserial crimes. Case processing involves extracting attribute values and representing them in the format of the problem description. An extensive range of algorithms is applied.

[Figure: flowchart with three stages. Attribute construction: crime attributes are split into crime features (numeric, categorical and keyword attributes) and the crime process attribute. Attribute similarity measures: absolute distance, Jaccard’s coefficient and word2vec for the crime features; a similarity measure based on DTW and information entropy for the crime process. Detection models: algorithms are applied to the crime feature similarity and the crime process similarity to obtain the crime series detection results.]

Fig. 1. The methodology applied to detect serial robbery crimes

4.1 Attribute similarity measurement

The similarity calculation of the case attributes includes two aspects: the similarity of the crime features and the similarity of the crime process. The crime feature attributes may include three types of content: numeric attributes, categorical attributes, and keyword attributes; absolute distance, Jaccard’s coefficient, and word2vec, respectively, are used to compute the similarity for these three types of crime feature attributes.

The similarity calculation of the crime process can be regarded as matching the change in the key information in the crime narrative. Because cases vary in complexity, their durations differ and the key information sequences in the case narrative texts have different lengths. DTW is based on the idea of dynamic programming and finds the optimal matching between two sequences of different lengths[36]. Thus, DTW can solve the problem of matching two cases’ crime processes. The process information is contained in the narrative text. Some researchers have applied DTW to text data, treating textual data as a sequence of words[37, 38]. Following their idea, we design a similarity measure that extracts key characteristics of the crime process to calculate the process similarity. The attribute information and similarity measures are shown in Table 2.

Table 2 Attribute information and similarity measures

M.O. | Name of attribute | Description | Measure
Crime features | C_Num | Number of criminals | Absolute
Crime features | C_Tools | Tools used by the criminal | Jaccard
Crime features | W_Disguise | The way criminals disguise | Jaccard
Crime features | W_Harm | The way criminals harm victims | Jaccard
Crime features | W_Property | The way criminals rob property | Jaccard
Crime features | W_Threat | The way criminals threaten victims | Jaccard
Crime features | W_Breaking | The way criminals break through obstacles | word2vec
Crime features | C_Action | Actions taken by the criminals | word2vec
Crime process | C_process | The crime process of criminals | DTW

4.1.1 Similarity of the numeric attribute

We use the absolute distance to measure the similarity between two numeric values. The absolute distance is shown in Eq. (1).

$$Sim_{ab\_dist}^{k}(a_i^k, a_j^k) = 1 - \frac{|a_i^k - a_j^k|}{a_{max}^k - a_{min}^k} \qquad (1)$$

where $a_{max}^k$ denotes the maximum value of the attribute $k$ and $a_{min}^k$ denotes the minimum value.

4.1.2 Similarity of the categorical attribute

For categorical attributes, Jaccard’s coefficient is used to calculate their similarity. Jaccard’s coefficient is calculated as $|q| / (|q| + |r| + |s|)$, where $q$ represents the set of features that are present in both cases, and $r$ and $s$ represent the sets of features that are present in one case but absent in the other case. In the calculation of the traditional Jaccard’s coefficient, all values of the attribute are regarded as equally important. However, when a group of records has some common features and these features are outliers, these records are more likely to be generated by the same cause[39]. Therefore, different scores should be given according to the features’ frequencies: the lower the occurrence probability of a feature, the higher its importance. We calculate the score $\omega_f^k$ of each feature by Eq. (2). Let $n_f^k$ denote the number of cases with the feature $f$ on the attribute $k$ and $n$ denote the total number of cases (refer to Table 1). The more often a feature appears, the more common it is and the lower its score. In addition, when the attribute value is missing, the feature $f$ is NULL. We denote the score of NULL as $\omega_{NULL}^k$ and calculate it according to Eq. (2).

$$\omega_f^k = 1 - n_f^k / n \qquad (2)$$

[Figure: a $Q$-by-$V$ grid pairing the words $x_1, \ldots, x_q, \ldots, x_Q$ of Word Sequence 1 with the words $z_1, \ldots, z_v, \ldots, z_V$ of Word Sequence 2; each cell holds the distance $d(x_q, z_v)$.]

Fig. 2 A warping path using DTW to calculate the distance of two word sequences

We calculate the similarity of the pairwise case on the attribute $k$, as shown in Eq. (3).

$$Sim_{Jaccard}^{k}(a_i^k, a_j^k) = \begin{cases} \left(\sum_{f \in q} \omega_f^k\right) / (|q| + |r| + |s|), & \text{both values present} \\ \omega_{NULL}^k, & a_i^k = a_j^k = NULL \\ 0, & a_i^k = NULL \text{ or } a_j^k = NULL \end{cases} \qquad (3)$$

In Eq. (3), the numerator of the traditional Jaccard’s coefficient becomes the sum of the scores of the shared features. Each $\omega_f^k$ lies between 0 and 1, and $\sum_{f \in q} \omega_f^k$ is equal to $|q| - \sum_{f \in q} (n_f^k / n)$. If both cases are missing a value on the attribute $k$, the similarity of the case pair on this attribute is $\omega_{NULL}^k$ (calculated according to Eq. (2)). When one of the two cases is missing a value on attribute $k$ but the other is not, the similarity is set to 0.
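The numeric and categorical measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: attribute values are represented as Python sets, an empty set stands for a missing (NULL) value, and the function names and toy data are hypothetical.

```python
from collections import Counter


def absolute_similarity(a_i, a_j, a_min, a_max):
    """Eq. (1): 1 - |a_i - a_j| / (a_max - a_min)."""
    if a_max == a_min:
        # Assumption: a constant attribute gives identical values, hence similarity 1.
        return 1.0
    return 1.0 - abs(a_i - a_j) / (a_max - a_min)


def feature_scores(values_per_case):
    """Eq. (2): score of feature f is 1 - n_f / n; a missing value counts as NULL."""
    n = len(values_per_case)
    counts = Counter()
    for values in values_per_case:
        counts.update(values if values else {"NULL"})
    return {f: 1.0 - c / n for f, c in counts.items()}


def weighted_jaccard(a_i, a_j, scores):
    """Eq. (3): frequency-weighted Jaccard similarity of two feature sets."""
    if not a_i and not a_j:
        # Both missing: use the NULL score (1.0 if NULL never occurs, per Eq. (2)).
        return scores.get("NULL", 1.0)
    if not a_i or not a_j:
        return 0.0  # exactly one value is missing
    shared = a_i & a_j
    # |q| + |r| + |s| equals the size of the union of the two sets.
    return sum(scores[f] for f in shared) / len(a_i | a_j)


# Hypothetical values of the C_Tools attribute for four cases.
tools = [{"knife"}, {"knife", "rope"}, {"gun"}, set()]
scores = feature_scores(tools)
sim_tools = weighted_jaccard(tools[0], tools[1], scores)
sim_num = absolute_similarity(1, 3, a_min=1, a_max=5)
```

In this toy data, "knife" appears in two of four cases and therefore scores 0.5, so the two knife cases share a weighted numerator of 0.5 over a union of two features.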

4.1.3 Similarity of the keyword attribute

Some of the crime features are derived from the keywords in the crime narrative text. The comparison of words is credible only if it reflects their actual meaning. In this paper, we use word2vec to learn word embeddings[40]. Based on the word embeddings, we use word vectors to calculate the similarity of words $w_1$ and $w_2$, which is denoted as $W2V(w_1, w_2)$. The similarity calculation method on attribute $k$ in the pairwise case $p_{ij}$ is shown in Eq. (4).

$$Sim_{word2vec}^{k}(a_i^k, a_j^k) = \begin{cases} \max_{w} \left(W2V(a_{i,w}^k, a_{j,w}^k)\right), & a_{i,w}^k \in a_i^k,\ a_{j,w}^k \in a_j^k \\ \omega_{NULL}^k, & a_i^k = a_j^k = NULL \\ 0, & a_i^k = NULL \text{ or } a_j^k = NULL \end{cases} \qquad (4)$$

In Eq. (4), when a case $i$ has multiple keywords on the attribute $k$, each keyword is recorded as $a_{i,w}^k$, where $a_{i,w}^k \in a_i^k$. The similarity of the pairwise case $p_{ij}$ on attribute $k$ is the highest word similarity over all keyword pairs; that is, the maximum value is taken over the similarities of the keywords. When both cases are missing a value on the attribute $k$, the similarity of the case pair on this attribute is $\omega_{NULL}^k$, which is calculated in the same way as in Eq. (2). When one of the two cases is missing a value on attribute $k$ but the other is not, the similarity is set to 0.
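Eq. (4) can be sketched with toy embeddings. The example assumes that $W2V$ is the cosine similarity between word vectors, which the text does not state explicitly; the vectors, words and function names are hypothetical stand-ins for embeddings learned with word2vec.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (assumed form of W2V)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_similarity(words_i, words_j, vectors, null_score):
    """Eq. (4): maximum word similarity over all keyword pairs."""
    if not words_i and not words_j:
        return null_score  # both values missing
    if not words_i or not words_j:
        return 0.0  # exactly one value missing
    return max(cosine(vectors[w1], vectors[w2])
               for w1 in words_i for w2 in words_j)


# Toy two-dimensional embeddings, for illustration only.
vectors = {"pry": [1.0, 0.0], "smash": [0.8, 0.6], "climb": [0.0, 1.0]}
sim_breaking = keyword_similarity(["pry"], ["smash", "climb"], vectors, null_score=0.75)
```

Taking the maximum means that a single closely related keyword pair is enough to make two cases similar on this attribute.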

4.1.4 Similarity of the crime process attribute

The action characteristics and the object characteristics constitute the crime process; they can be regarded as sequence data composed of a series of words that are obtained from the crime narrative text. In this paper, we design a method based on the symmetric form of the standard DTW to calculate the similarity in the crime process of two crimes.

This similarity measure is divided into two stages. The first stage is to calculate the similarity between the action characteristics and between the object characteristics of the case pair, and the second stage is to weight the two similarities to obtain the crime process similarity.

First, we calculate the similarity between two word sequences based on DTW[41]. Assume the two word sequences $X = \{x_1, \ldots, x_q, \ldots, x_Q\}$ and $Z = \{z_1, \ldots, z_v, \ldots, z_V\}$, as shown in Fig. 2. A $Q$-by-$V$ grid is constructed, and the element $(x_q, z_v)$ of the grid represents the distance $d(x_q, z_v)$ between the word $x_q$ and the word $z_v$. The warping path is not unique but must start at the comparison of the first words of the two word sequences and end at the comparison of the last words. If the warping path passes through $T$ grid cells, it can be defined as $W$; thus, we have

$$W = \{(x_1, z_1)_1, \ldots, (x_q, z_v)_t, \ldots, (x_Q, z_V)_T\}$$

The goal of optimization is to obtain the smallest cumulative distance from $(x_1, z_1)$ to $(x_Q, z_V)$. The distance can be defined as $Dist$ (refer to Eq. (5)).

$$Dist(X, Z) = \min_{W} \sum_{t=1}^{T} d(x_q, z_v)_t, \quad 1 \le q \le Q,\ 1 \le v \le V \qquad (5)$$

The cumulative distance from $(x_1, z_1)_1$ to $(x_q, z_v)_t$ is calculated by the cumulative distance function, as shown in Eq. (6). The cumulative distance of step $t$ is the sum of the cumulative distance of step $t - 1$ and the minimum of the distances at $(x_{q+1}, z_v)$, $(x_{q+1}, z_{v+1})$, and $(x_q, z_{v+1})$; that is, each step of a warping path can only be extended to one of these three grid cells. Here $x_{q+1}$ and $z_{v+1}$ denote the next words in the two sequences, that is, the next action or object characteristics in the crime process.

$$D(t) = D(t - 1) + \min\{d(x_{q+1}, z_v),\ d(x_{q+1}, z_{v+1}),\ d(x_q, z_{v+1})\} \qquad (6)$$

In this paper, we use the same word vectors as in Section 4.1.3 to measure words’ similarities. The distance between words $x_q$ and $z_v$ in the two word sequences $X$ and $Z$ is calculated by Eq. (7).

$$d(x_q, z_v) = 1 - W2V(x_q, z_v) \qquad (7)$$

The calculation result $Dist$ represents the distance between two word sequences. The similarity between the two word sequences can be measured by Eq. (8).

$$\rho(X, Z) = 1 - \frac{Dist(X, Z)}{T} \qquad (8)$$

where $T$ is the length of the warping path. To keep the distance between two crimes’ crime processes within the range from 0 to 1, we divide $Dist$ by the warping path length. Because we use the similarity to detect serial crimes, we subtract the obtained distance from 1. The range of the similarity $\rho(X, Z)$ is 0 to 1.

Algorithm 1. The similarity measure for word sequences
Input: $X = \{x_1, \ldots, x_q, \ldots, x_Q\}$, $Z = \{z_1, \ldots, z_v, \ldots, z_V\}$
Output: $\rho(X, Z)$
1: $D(1) = d(x_1, z_1)$
2: For q in range (2, Q) do
3:   For v in range (2, V) do
4:     $D(t) = D(t - 1) + \min\{d(x_{q+1}, z_v),\ d(x_{q+1}, z_{v+1}),\ d(x_q, z_{v+1})\}$
5:     Calculate $Dist(X, Z)$, $T$
6:   End for
7: End for
8: Return: $\rho(X, Z)$
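A runnable sketch of the word-sequence similarity is given below. It is not a literal transcription of Algorithm 1: it uses the standard dynamic programming recurrence for DTW, which finds the minimum cumulative distance of Eq. (5), and it additionally tracks the number of cells $T$ on the chosen path so that Eq. (8) can be applied. Breaking ties in favor of the shorter path, the function name `dtw_similarity`, and the exact-match word similarity used in the example are all assumptions made for illustration.

```python
def dtw_similarity(seq_x, seq_z, word_sim):
    """Return 1 - Dist / T for a minimum-cost warping path (Eqs. (5), (7), (8))."""
    if not seq_x or not seq_z:
        raise ValueError("both word sequences must be non-empty")
    Q, V = len(seq_x), len(seq_z)
    inf = float("inf")
    # cost[q][v]: smallest cumulative distance of a path ending at cell (q, v)
    # steps[q][v]: number of cells on that path
    cost = [[inf] * V for _ in range(Q)]
    steps = [[0] * V for _ in range(Q)]
    for q in range(Q):
        for v in range(V):
            d = 1.0 - word_sim(seq_x[q], seq_z[v])  # Eq. (7)
            if q == 0 and v == 0:
                cost[q][v], steps[q][v] = d, 1
                continue
            candidates = []
            if q > 0:
                candidates.append((cost[q - 1][v], steps[q - 1][v]))
            if v > 0:
                candidates.append((cost[q][v - 1], steps[q][v - 1]))
            if q > 0 and v > 0:
                candidates.append((cost[q - 1][v - 1], steps[q - 1][v - 1]))
            # Lowest cost first; ties are broken by the shorter path (assumption).
            best_cost, best_steps = min(candidates)
            cost[q][v] = best_cost + d
            steps[q][v] = best_steps + 1
    return 1.0 - cost[Q - 1][V - 1] / steps[Q - 1][V - 1]  # Eq. (8)


def exact_match(w1, w2):
    """Toy word similarity standing in for W2V: 1 for identical words, else 0."""
    return 1.0 if w1 == w2 else 0.0


# Action sequences loosely modelled on cases A and B from the introduction.
case_a = ["snatch", "push", "take"]
case_b = ["push", "snatch", "take"]
sim_ab = dtw_similarity(case_a, case_b, exact_match)
sim_aa = dtw_similarity(case_a, case_a, exact_match)
```

With this toy similarity the two cases contain the same actions but in a different order, so the process similarity is well below that of a case compared with itself.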

Algorithm 1 describes how to measure the similarity between two word sequences. Next, we calculate the weight of each sequence. Given the narrative texts of cases $i$ and $j$, we can compute the similarity of their action characteristics, $S_{ij}^{ac}$, and the similarity of their object characteristics, $S_{ij}^{ob}$, which are jointly denoted as $S_{ij}^{seq}$, $seq \in \{ac, ob\}$. The similarity of the crime process is obtained by the weighted summation of the two characteristics’ similarities. The similarity value of each characteristic in the crime process can be regarded as a predicted value for linking crimes. The deviation between the similarity value and the actual value $y_{ij}$ of the pairwise case is considered to be the error of detecting serial crimes; for example, the deviation between the action characteristic similarity $S_{ij}^{ac}$ and the actual value $y_{ij}$ is $|S_{ij}^{ac} - y_{ij}|$. When determining the weights of the action characteristic and the object characteristic, we consider two factors: the level of the error and the stability of the error. We want to increase the weight of the characteristic with a low error level and small error variation. A lower level of error indicates that the characteristic is more capable of detecting serial crimes. The smaller the error variation, the more robust and stable the characteristic is in linking crimes. The error vector $S_{deviation}^{seq}$ of the characteristic similarity is composed of the deviations $d_{ij}^{seq}$ of all characteristic similarity values from the actual values $y_{ij}$. The calculation is expressed as follows:

$$S_{deviation}^{seq} = \{d_{ij}^{seq} \mid d_{ij}^{seq} = |S_{ij}^{seq} - y_{ij}|,\ i \neq j,\ i, j \in \{1, 2, \ldots, n\}\} \qquad (9)$$

where $d_{ij}^{seq}$ denotes the deviation of the action characteristic or object characteristic similarity of the pairwise case $p_{ij}$. When calculating the error level, we get the error vector of each characteristic according to Eq. (9). In Eq. (10) and Eq. (11), $D_{level}^{ac}$ denotes the sum of the action characteristic’s errors and $D_{level}^{ob}$ denotes the sum of the object characteristic’s errors; we then calculate the share of each error vector’s sum in the total error of the two characteristics. The larger the error level of an error vector, the smaller the corresponding sequence weight. The weights of the two characteristics are $\lambda_{level}^{ac}$ and $\lambda_{level}^{ob}$.

$$\lambda_{level}^{ac} = 1 - \frac{D_{level}^{ac}}{D_{level}^{ac} + D_{level}^{ob}}, \quad \text{where } D_{level}^{ac} = \sum S_{deviation}^{ac} \qquad (10)$$

$$\lambda_{level}^{ob} = 1 - \frac{D_{level}^{ob}}{D_{level}^{ac} + D_{level}^{ob}}, \quad \text{where } D_{level}^{ob} = \sum S_{deviation}^{ob} \qquad (11)$$

We utilize the information entropy[42] method to calculate the error variation. Entropy is the measure of a system’s uncertainty in information theory; the larger the entropy, the greater the uncertainty, and vice versa[43, 44]. Some studies use information entropy to quantitatively describe the equilibrium of the decision attributes[45]. If all values in a sequence are equal, the entropy of the sequence is largest. When the values in the sequence differ greatly, the entropy of the sequence is small. For the error vector, a small entropy means that the variation of the error is large: the error of the characteristic is inconsistent, with some large error values and some small ones. Such an error vector has poor stability, so the weight given to the corresponding characteristic should be smaller. First, the error vector needs to be normalized, as shown in Eq. (12):

$$p_{ij}^{seq} = \frac{d_{ij}^{seq}}{\sum_{i,j} d_{ij}^{seq}}, \quad c_i, c_j \in C,\ i \neq j \qquad (12)$$

Second, the entropy of each error vector needs to be obtained, as shown in Eq. (13):

$$E_{uncertainty}^{seq} = -\frac{1}{\ln(|S_{deviation}^{seq}|)} \sum_{c_i, c_j \in C,\ i \neq j} p_{ij}^{seq} \ln p_{ij}^{seq} \qquad (13)$$

where the factor $1 / \ln(|S_{deviation}^{seq}|)$ makes $E_{uncertainty}^{seq} \in [0, 1]$. According to the meaning of entropy, an error vector with a larger entropy value has smaller variation and better stability, and the weight of the corresponding sequence should be larger, as shown in Eq. (14).

$$\lambda_{uncertainty}^{ac} = \frac{E_{uncertainty}^{ac}}{E_{uncertainty}^{ac} + E_{uncertainty}^{ob}}, \quad \lambda_{uncertainty}^{ob} = \frac{E_{uncertainty}^{ob}}{E_{uncertainty}^{ac} + E_{uncertainty}^{ob}} \qquad (14)$$

After calculating the weights of the action characteristic and the object characteristic similarity in terms of the error level and the error variation, the corresponding weights of the two characteristics are calculated according to Eqs. (15) and (16). We assign a higher weight to the characteristic with the lower error level and the smaller error variation.

\[ \omega^{ac} = \frac{\omega_{level}^{ac}\, \omega_{uncertainty}^{ac}}{\omega_{level}^{ac}\, \omega_{uncertainty}^{ac} + \omega_{level}^{ob}\, \omega_{uncertainty}^{ob}} \tag{15} \]

\[ \omega^{ob} = \frac{\omega_{level}^{ob}\, \omega_{uncertainty}^{ob}}{\omega_{level}^{ac}\, \omega_{uncertainty}^{ac} + \omega_{level}^{ob}\, \omega_{uncertainty}^{ob}} \tag{16} \]

Last, the similarity of the crime process is expressed as follows:

\[ S_{ij}^{process} = \omega^{ac} S_{ij}^{ac} + \omega^{ob} S_{ij}^{ob} \tag{17} \]
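The entropy-based weighting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy error vectors are hypothetical, the error-level weights are simply passed in as given numbers, and the level and uncertainty weights are assumed to be combined by multiplication before normalization.

```python
import math


def normalize(errors):
    """Scale the pairwise errors so that they sum to 1 (normalization step)."""
    total = sum(errors)
    return [e / total for e in errors]


def normalized_entropy(errors):
    """Shannon entropy of the normalized errors divided by ln(n), so it lies in [0, 1].

    Needs at least two errors and a positive error sum.
    """
    shares = normalize(errors)
    # By convention, 0 * ln(0) is treated as 0.
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(len(shares))


def uncertainty_weights(err_ac, err_ob):
    """Share of each sequence in the total entropy: more even errors, larger weight."""
    e_ac, e_ob = normalized_entropy(err_ac), normalized_entropy(err_ob)
    return e_ac / (e_ac + e_ob), e_ob / (e_ac + e_ob)


def combined_weights(level_ac, level_ob, unc_ac, unc_ob):
    """Combine level and uncertainty weights (ASSUMPTION: by product) and renormalize."""
    num_ac, num_ob = level_ac * unc_ac, level_ob * unc_ob
    return num_ac / (num_ac + num_ob), num_ob / (num_ac + num_ob)


def process_similarity(s_ac, s_ob, w_ac, w_ob):
    """Weighted sum of the action and object similarities."""
    return w_ac * s_ac + w_ob * s_ob


# Hypothetical error vectors: the action errors are perfectly even,
# the object errors are dominated by one large value.
err_ac = [0.2, 0.2, 0.2, 0.2]
err_ob = [0.7, 0.1, 0.1, 0.1]

unc_ac, unc_ob = uncertainty_weights(err_ac, err_ob)
# Equal error-level weights (0.5 each) are placeholders for illustration only.
w_ac, w_ob = combined_weights(0.5, 0.5, unc_ac, unc_ob)
print(round(w_ac, 3), round(w_ob, 3))
print(round(process_similarity(0.8, 0.4, w_ac, w_ob), 3))
```

In this toy case the evenly distributed action errors reach the maximum normalized entropy of 1, so the action sequence receives the larger weight.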

4.2 Detection models

To create detection models, we selected five machine learning algorithms that are extensively employed for classification problems and are suitable for our problem. In our experiments, the average results of the 10-fold cross-validation are regarded as the classification performance. For the algorithm parameters, we make some adjustments based on the default parameters of the scikit-learn machine learning library. The algorithms are briefly described as follows:

• Logistic Regression (LR): This paper applies a binomial logistic regression model. In our experiments, the l2 penalty is used.

• Support Vector Machine (SVM): The support vector machine takes each sample as a point in space, and the goal is to obtain a hyperplane that separates the samples. In this paper, we use a linear kernel.

• K-Nearest Neighbor (KNN): KNN finds the k instances in the training data set that are closest to the input instance and assigns the input instance to the class to which the majority of these k instances belong. In our experiment, k = 1, and the distance between two points is the Minkowski distance.

• Neural Network (NN): Neural networks simulate the interaction of biological neural networks with real-world objects and classify samples by training the connection weights and neuron thresholds of the network. Our model is a network with 100 hidden-layer nodes, and the learning rate is 0.001.

• Random Forest (RF): The random forest is a classifier that contains multiple decision trees and introduces random attribute selection in the decision tree training process. In our experiment, the number of trees is 100.
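The five models and the 10-fold cross-validation can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' code: the data are synthetic stand-ins for pairwise similarity vectors, and `max_iter`, `random_state`, fold shuffling and the choice of F1 as the score are assumptions added here for reproducibility. Only the parameters named in the list above come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data: each row is a similarity vector of one case pair
# (9 columns, e.g. 8 crime features plus one crime process similarity).
rng = np.random.default_rng(0)
X_nonserial = rng.uniform(0.0, 0.6, size=(200, 9))
X_serial = rng.uniform(0.4, 1.0, size=(40, 9))
X = np.vstack([X_nonserial, X_serial])
y = np.array([0] * 200 + [1] * 40)  # 1 = same offender (serial pair)

models = {
    "LR": LogisticRegression(),  # scikit-learn's default penalty is l2
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=1, metric="minkowski"),
    "NN": MLPClassifier(hidden_layer_sizes=(100,), learning_rate_init=0.001,
                        max_iter=1000, random_state=0),  # max_iter is an assumption
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the rare serial class present in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_f1 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    mean_f1[name] = scores.mean()
    print(f"{name}: mean F1 over 10 folds = {mean_f1[name]:.3f}")
```

Stratification is a design choice made for this sketch: with very few serial pairs, unstratified folds could contain no positive samples at all.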

5 Case studies

In this section, we apply the proposed serial crime detection method to real-world case data and introduce the evaluation criteria. The experimental results are then analyzed and discussed.

5.1 Data sets

5.1.1 Wikipedia dumps

We apply the CBOW model to train word embeddings on the Chinese Wikipedia corpus dumps. The corpus contains 6.98 GB of articles in XML format. We reduce the noise in the text by removing empty lines and stopwords. The dimensionality of the word embedding vectors is 400.

5.1.2 Case description

The case dataset was derived from the judicial documents published on the OpenLaw website, and the crime features and process of each case were manually obtained. These crimes were real, solved robbery cases in Zhengzhou City, Henan Province, China, that occurred from January 2013 to February 2018. The dataset includes a total of 334 cases, which were committed by 279 criminals. Of these, 248 cases were nonserial: each was committed by a criminal with only one case in the dataset. The remaining 86 cases were serial crimes committed by 31 criminals, each of whom committed between 2 and 8 crimes (refer to Table 3): 18 criminals committed 2 cases, 10 criminals committed 3 cases, 1 criminal committed 4 cases, and 2 criminals committed 8 cases. Therefore, the entire dataset yields 55,611 case pairs, of which 110 pairs were committed by the same offender and 55,501 pairs were committed by different offenders.
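The pair counts follow directly from the series sizes. A short Python check (variable names are illustrative) reproduces the arithmetic:

```python
from math import comb

total_cases = 334
# {number of cases per serial offender: number of such offenders}
series = {2: 18, 3: 10, 4: 1, 8: 2}

serial_cases = sum(size * count for size, count in series.items())
serial_offenders = sum(series.values())
all_pairs = comb(total_cases, 2)
same_offender_pairs = sum(comb(size, 2) * count for size, count in series.items())

print(serial_cases, serial_offenders)   # 86 31
print(all_pairs, same_offender_pairs)   # 55611 110
print(all_pairs - same_offender_pairs)  # 55501
```

Only about 0.2% of all pairs are serial pairs, so the classification problem is strongly imbalanced.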

In this paper, we obtain the crime attribute values from judicial documents. We reorganize the crime narrative text in each judicial document from the perspective of the victim or eyewitness, and the reorganized text is used as the crime process of the case. The similarity between the two cases of each pair is calculated for every attribute using the similarity functions proposed in Section 4, and the similarity values fall between 0 and 1. The similarity vector $S_{ij} = (S_{ij}^1, S_{ij}^2, \ldots, S_{ij}^m)$ of a single case pair indicates the attribute similarities of this pair. Let $N$ denote the number of cases; for each attribute, the pairwise similarities form an $N \times N$ similarity matrix $M_{N \times N}$. Each case has $m$ attributes; thus, $m$ similarity matrices are formed.
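How the similarity vectors become a labeled data set can be sketched as follows. The case records, attribute names and the two similarity functions are hypothetical placeholders, not the similarity functions of Section 4; the sketch only shows the pairing and labeling logic.

```python
from itertools import combinations


def pair_dataset(cases, similarity_functions, offender_of):
    """Build one similarity vector and one label for every unordered case pair.

    The label is 1 when both cases share an offender (serial pair), else 0.
    """
    vectors, labels = [], []
    for i, j in combinations(range(len(cases)), 2):
        vectors.append([sim(cases[i], cases[j]) for sim in similarity_functions])
        labels.append(int(offender_of[i] == offender_of[j]))
    return vectors, labels


# Hypothetical toy cases with two attributes each.
cases = [
    {"c_num": 1, "action": "snatched"},
    {"c_num": 1, "action": "snatched"},
    {"c_num": 2, "action": "asked for"},
]
offender_of = ["A", "A", "B"]  # offender identifier of each case


def num_similarity(a, b):
    # Placeholder numeric similarity, scaled by an arbitrary constant.
    return 1.0 - abs(a["c_num"] - b["c_num"]) / 10.0


def action_similarity(a, b):
    # Placeholder categorical similarity: exact match or not.
    return 1.0 if a["action"] == b["action"] else 0.0


vectors, labels = pair_dataset(cases, [num_similarity, action_similarity], offender_of)
for vector, label in zip(vectors, labels):
    print(vector, label)
```

With N cases this produces N(N-1)/2 rows, one per unordered pair, which is the input expected by the classifiers.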

5.2 Attribute sets

We divide the crime process of the case narrative into action characteristics and object characteristics, which correspond to the verb sequence and the noun sequence in the narrative text. The two sequences are weighted by the information entropy. We denote the attribute that uses only the similarity of the action (or object) characteristics as the similarity of the crime process as $a^{ac}$ (or $a^{ob}$), and we denote the weighted summation of $a^{ac}$ and $a^{ob}$ as $a^{process}$. The entire case narrative text (containing all parts of speech) is denoted as $a^{all}$, and the word sequence that contains only verbs and nouns (not split into separate sequences) is denoted as $a^{ac\&ob}$.

To illustrate how adding crime narrative information improves the effectiveness of case linkage and to demonstrate the superiority of the weighted sequence, we compare the results of several attribute sets on each machine learning classification algorithm described in Section 4.2. Each attribute set is described as follows:

• $A_1$: the attribute set that includes only the 8 attributes of the crime features (refer to Table 2).

• $A_2$: the attribute set that adds the verb sequence attribute $a^{ac}$, $A_2 = A_1 \cup \{a^{ac}\}$.

• $A_3$: the attribute set that adds the noun sequence attribute $a^{ob}$, $A_3 = A_1 \cup \{a^{ob}\}$.

• $A_4$: the attribute set that adds the word sequence attribute $a^{all}$, which contains all parts of speech, $A_4 = A_1 \cup \{a^{all}\}$.

• $A_5$: the attribute set that adds the word sequence attribute $a^{ac\&ob}$, which contains only verbs and nouns (not split into separate sequences), $A_5 = A_1 \cup \{a^{ac\&ob}\}$.

• $A_6$: the attribute set that adds the weighted sequence attribute of verbs and nouns $a^{process}$, $A_6 = A_1 \cup \{a^{process}\}$.

5.3 Evaluation criteria of results

5.3.1 Measurement of attributes’ separability

Separability is a measure of the complexity of a dataset and can be used to compare the effects of attributes for classification[46]. To measure the separability of attributes and the impact of case narrative information on the separability of the entire dataset, we employ the following methods as indicators of separability[47].

5.3.1.1 Two-sample Kolmogorov-Smirnov (KS) test value

The KS value can be used to evaluate whether two samples are drawn from the same underlying distribution; it is defined as follows:

\[ KS = \max_x |F_{0,n_0}(x) - F_{1,n_1}(x)| \tag{18} \]

where $F_{0,n_0}$ and $F_{1,n_1}$ are the empirical distribution functions of the two classes, and $n_0$, $n_1$ indicate their sample sizes. The KS value represents the largest vertical distance between the two empirical distribution curves[48]. The KS value ranges from 0 to 1. The larger the KS value, the larger the difference between the two classes and the better the separability of the attribute.

5.3.1.2 Non-overlapped domain (NOD) and non-overlapped ratio (NOR)

The non-overlapped domain (NOD) is based on the geometrical extent of the overlap between the two classes. Because NOD can be influenced by outliers, we also use the non-overlapped ratio (NOR), which is based on the proportion of samples that fall in the overlap.

They can be defined as follows:

\[ NOD = 1 - \frac{\min(\max(f^{c_0}), \max(f^{c_1})) - \max(\min(f^{c_0}), \min(f^{c_1}))}{\max(\max(f^{c_0}), \max(f^{c_1})) - \min(\min(f^{c_0}), \min(f^{c_1}))} \tag{19} \]

\[ R = [\max(\min(f^{c_0}), \min(f^{c_1})),\ \min(\max(f^{c_0}), \max(f^{c_1}))] \tag{20} \]

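\[ sgn(f_i^{c_j}) = \begin{cases} 1, & f_i^{c_j} \notin R,\ j \in \{0,1\} \\ 0, & \text{otherwise} \end{cases} \tag{21} \]

\[ NOR = \frac{\sum_{j=0}^{1} \sum_{i=1}^{n_j} sgn(f_i^{c_j})}{\sum_{j=0}^{1} n_j} \tag{22} \]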

where $\max(f^{c_j})$ and $\min(f^{c_j})$ represent the maximum and minimum values of the feature $f$ on class $c_j$; for a binary problem, $j = 0$ or $1$. The function sgn() marks whether each sample point $f_i^{c_j}$ lies in the region $R$. NOD represents the percentage of the two classes’ non-overlapping domain, and NOR represents the proportion of the sample points in the non-overlapping region. NOD and NOR range from 0 to 1. The larger the values, the lower the degree of overlap between the two samples and the higher the separability of the attribute.

5.3.1.3 Fisher’s discriminant ratio (FDR)

FDR compares the difference between the class means with the class variances of the two samples, as defined by Eq. (23).

\[ FDR = \frac{(\mu_0 - \mu_1)^2}{\sigma_0^2 + \sigma_1^2} \tag{23} \]

where $\mu_0$, $\mu_1$ and $\sigma_0^2$, $\sigma_1^2$ represent the means and the variances of the two classes, respectively. A larger FDR indicates that the samples are closer together within a class and farther apart between the classes, so the attribute has better separability.
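The four separability indicators can be sketched in plain Python for a single attribute. Several choices here are assumptions rather than details confirmed by the text: NOR is computed as the pooled share of samples lying outside the overlap interval R, the overlap length is clamped at zero for disjoint classes, population variances are used for FDR, and the similarity values are invented toy data.

```python
from statistics import fmean, pvariance


def ks_value(x0, x1):
    """Largest gap between the two empirical distribution functions."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    # The maximum gap is always attained at one of the observed values.
    return max(abs(ecdf(x0, x) - ecdf(x1, x)) for x in set(x0) | set(x1))


def overlap_interval(x0, x1):
    """Interval R shared by both classes: [largest minimum, smallest maximum]."""
    return max(min(x0), min(x1)), min(max(x0), max(x1))


def nod(x0, x1):
    """One minus the overlap length divided by the total value range.

    Assumes the pooled values are not all identical (non-zero range).
    """
    lo, hi = overlap_interval(x0, x1)
    total_range = max(max(x0), max(x1)) - min(min(x0), min(x1))
    overlap = max(0.0, hi - lo)  # ASSUMPTION: disjoint classes count as no overlap
    return 1 - overlap / total_range


def nor(x0, x1):
    """Pooled share of samples outside R (ASSUMED reading of the NOR definition)."""
    lo, hi = overlap_interval(x0, x1)
    outside = sum(not (lo <= v <= hi) for v in list(x0) + list(x1))
    return outside / (len(x0) + len(x1))


def fdr(x0, x1):
    """Squared mean difference over the sum of the (population) variances."""
    return (fmean(x0) - fmean(x1)) ** 2 / (pvariance(x0) + pvariance(x1))


# Invented similarity values of one attribute for the two classes.
nonserial = [0.1, 0.2, 0.3, 0.4, 0.5]
serial = [0.4, 0.6, 0.7, 0.8, 0.9]

print(round(ks_value(nonserial, serial), 3))  # 0.8
print(round(nod(nonserial, serial), 3))       # 0.875
print(round(nor(nonserial, serial), 3))       # 0.7
print(round(fdr(nonserial, serial), 3))       # 2.911
```

For all four indicators a larger value means the attribute separates serial pairs from nonserial pairs more cleanly.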

5.3.2 Measurement of serial crime detection performance

After calculating the similarity of the pairwise case attributes, we use the confusion matrix and several other evaluation indexes (accuracy, precision, recall, F-measure, and logistic loss) to measure the performance of case linkage. The confusion matrix is shown in Table 3. True Positive (TP) refers to the case pairs that were correctly labeled as serial crimes. True Negative (TN) refers to the case pairs that were correctly labeled as nonserial crimes. False Negative (FN) refers to the serial case pairs that were incorrectly labeled as nonserial crimes. False Positive (FP) refers to the nonserial case pairs that were incorrectly labeled as serial crimes.

Table 3 Confusion matrix

                     Predicted
Actual               Serial crimes    Nonserial crimes    Total
Serial crimes        TP               FN                  P
Nonserial crimes     FP               TN                  N
Total                P’               F’                  P+N

According to the confusion matrix provided in Table 3, the following indexes can be calculated:

\[ Accuracy = (TP + TN) / (P + N) \tag{24} \]

\[ Precision = TP / (TP + FP) \tag{25} \]

\[ Recall = TP / (TP + FN) \tag{26} \]

\[ F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{27} \]

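In addition, the logistic loss measures the quality of the classifier’s probability estimates[49]. For the binary classification problem, the formula is shown in Eq. (28).

\[ LogL = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log p_i + (1 - y_i) \log(1 - p_i) \right) \tag{28} \]

where $y_i \in \{0, 1\}$ is the true label of the $i$-th case pair, $p_i$ is the predicted probability that it is a serial pair, and $N$ is the number of case pairs evaluated.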
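The evaluation indexes can be computed from predicted probabilities as in the following sketch. The labels, probabilities, the 0.5 decision threshold and the clipping constant `eps` are illustrative assumptions, not values from the experiments.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) with label 1 meaning a serial pair."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn


def linkage_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1; assumes TP + FP > 0 and TP + FN > 0."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # P + N = TP + FN + FP + TN
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def log_loss(y_true, probs, eps=1e-15):
    """Binary logistic loss; probabilities are clipped to avoid log(0)."""
    clipped = [min(max(p, eps), 1 - eps) for p in probs]
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, clipped))
    return -total / len(y_true)


# Invented labels and predicted probabilities for eight case pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.6, 0.05]
y_pred = [int(p >= 0.5) for p in probs]  # 0.5 threshold is an assumption

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = linkage_metrics(tp, fn, fp, tn)
print(tp, fn, fp, tn)  # 2 1 1 4
print(accuracy)        # 0.75
print(round(precision, 3), round(recall, 3), round(f1, 3))
print(round(log_loss(y_true, probs), 3))
```

Because serial pairs are rare, accuracy alone can look high even when recall is poor, which is why precision, recall and F1 are reported alongside it.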

5.4 Experiment results and discussion

In this subsection, two aspects are investigated. First, the action characteristics and object characteristics are weighted to obtain a weighted sequence, and we verify the superiority of the weighted sequence from the perspective of separability and crime linkage performance. Second, the performance of crime series detection is assessed after adding the crime process.

5.4.1 Experiment 1: comparisons of the word sequence

To evaluate the effect of selecting the action and object characteristics to characterize the crime process and the superiority of the weighting method, we validate the performance of five attribute sets ($A_2$, $A_3$, $A_4$, $A_5$, $A_6$) for crime series detection and compare the separability of the five sequences ($a^{ac}$, $a^{ob}$, $a^{all}$, $a^{ac\&ob}$, and $a^{process}$).

As shown in Fig. 3, the attribute set $A_6$ is generally superior to the attribute sets $A_2$ and $A_3$ in classification accuracy, precision, recall, and F1. The weighted sequence attribute is more accurate in characterizing the crime process and more capable of detecting serial crimes. In particular, $A_6$ achieves a significant increase in recall while maintaining precision. As shown in Fig. 5d, for the attribute $a^{ac}$, the upper quartile, the lower quartile, and the median of the serial crimes’ similarity are the highest among the attributes, and its similarity values are concentrated in the upper quartile, forming a shape with a wide top and a narrow tail (Fig. 5c). However, its lower bound is too low (Fig. 5d), so serial cases are easily confused with nonserial cases. In Fig. 5c, the overall level of the serial crimes’ similarity on attribute $a^{ob}$ is low, while its overall level of similarity in nonserial crimes is higher. However, the lower bound of $a^{ob}$ in the serial crimes’ similarity is higher than that of $a^{ac}$ (see Fig. 5d), so $a^{ob}$ is superior to $a^{ac}$ in this respect. The attribute $a^{process}$ combines the advantages of both $a^{ac}$ and $a^{ob}$: it presents both high similarity and a high lower bound in serial crimes. As shown in Fig. 5c, the similarity of $a^{process}$ in the serial crimes is concentrated in the upper quartile, and its lower bound is the highest among the five attributes. In the nonserial crimes, its similarity is concentrated around the median (refer to Fig. 5a). Among the three attributes $a^{process}$, $a^{ac}$, and $a^{ob}$, $a^{process}$ has the highest separability.

Fig. 3. The average results in 10-fold cross-validation of attribute sets A2, A3, and A6

As shown in Fig. 4, the experimental result of attribute set $A_4$ is not superior to that of attribute set $A_5$, and the separability measures of the attribute $a^{ac\&ob}$ are superior to those of the attribute $a^{all}$ in Table 4. The attributes $a^{ac\&ob}$ and $a^{all}$ have similar distributions in Fig. 5. This analysis reveals that focusing only on verbs and nouns does not cause a loss of the key information in the case narrative. On the contrary, this approach can eliminate noise in the case narrative and capture the important characteristics of the crime process.

From the results of the crime series detection in Fig. 4, the effect of attribute set $A_6$ is better than that of $A_4$ and $A_5$. In Fig. 5c and Fig. 5d, $a^{ac\&ob}$ and $a^{all}$ have low similarity levels in serial crimes, their lower bounds are not high, and their similarity values are not concentrated, which contributes to mistakes in detecting serial crimes. The attributes $a^{process}$ and $a^{ac\&ob}$ both consider verbs and nouns only. The difference is that $a^{ac\&ob}$ does not separate verbs and nouns; the words of the two parts of speech are combined in one sequence. The better performance of attribute set $A_6$ indicates that verbs and nouns should be separated and shows the superiority of our weighting method. In Table 4, $a^{process}$ has the best separability on all four indicators, which explains why $a^{process}$ performs well in crime linkage. In addition, the logistic loss on attribute set $A_6$ is the lowest for LR, SVM, KNN, and NN and the second lowest for RF (Table 5), which indicates a higher quality of probability estimates.

In general, the experimental results are consistent with our ideas. The action characteristics and object characteristics in the crime process are the most important and best reflect the main information of the crime process. In particular, the action characteristics and object characteristics should be separated. To handle case narratives, we should focus on the nouns and verbs that appear in the narrative, calculate the similarity of each of the two part-of-speech sequences between cases, and weight the two similarities so that the crime process is characterized more accurately.

Fig. 4. The average results in 10-fold cross-validation of attribute sets A4, A5, and A6


Fig. 5. Boxplot and violinplot for the attributes

Table 4

Comparisons of attributes’ separability (values marked with * are the best for each indicator)

Attribute    KS        NOD       NOR       FDR
a^ac         0.863     0.116     0.088     3.584
a^ob         0.785     0.160     0.142     2.797
a^all        0.828     0.141     0.090     3.574
a^ac&ob      0.847     0.200     0.106     3.844
a^process    0.871*    0.262*    0.274*    4.020*

Table 5 The average results of logistic loss (LogL) in 10-fold cross-validation of attribute sets (values marked with * are the best for each classifier)

Attribute set    LR          SVM         KNN         NN          RF
A1               0.00538     0.00549     0.05714     0.00538     0.01661
A2               0.00399     0.00407     0.03602     0.00407     0.01091*
A3               0.00437     0.00462     0.03478     0.00444     0.01679
A4               0.00401     0.00430     0.03354     0.00413     0.01249
A5               0.00394     0.00413     0.03292     0.00408     0.01416
A6               0.00382*    0.00405*    0.02795*    0.00393*    0.01180

5.4.2 Experiment 2: the effect of adding crime process information

Experiment 1 showed that the attribute $a^{process}$ describes the criminal’s crime process more accurately. We now compare the attribute sets $A_6$ and $A_1$. In this paper, we propose that integrating the crime process information better characterizes the offender’s M.O. and can thereby improve the detection of serial crimes. The objective of this experiment is to verify this proposition.

Fig. 6. The average results in 10-fold cross-validation of attribute sets A1 and A6

As shown in Fig. 6, the performance of attribute set $A_1$ is weaker than that of $A_6$ in crime series detection, which supports the positive effect of the crime process in crime linkage. Fig. 7 shows the attribute sets $A_1$ and $A_6$ after they are reduced to two dimensions by PCA. For $A_1$, the nonserial crime samples are very dispersed, and many serial crime samples are mixed in with them. For $A_6$, the nonserial crime samples are very concentrated, and the distance between the serial crime samples and the nonserial crime samples is generally large. Some serial crime samples are still mixed in with the nonserial crime samples; however, the degree of confusion is greatly reduced, which shows that the separability of the sample set is improved after adding the crime process.

Table 6

The crime features of two pairs of nonserial crimes (“--” represents missing data)

Case     C_Num    W_Disguise          W_Harm      C_Action    W_Property    W_Threat
Case1    1        --                  Violence    --          Snatched      --
Case2    1        --                  Violence    --          Snatched      --
Case3    2        Accused cheating    Violence    hit         Asked for     Violence threat
Case4    6        Accused cheating    Violence    hit         Asked for     Violence threat/Speech threat

Table 7 The crime process of two pairs of nonserial crimes

Case     Crime process
Case1    Violence → snatched
Case2    Snatched → revolted → violence
Case3    Disguised → hit → snatched → threat and hit → asked for the property
Case4    Disguised → threat → snatched → threat and hit → asked for the property

To illustrate the practical effect of adding crime process information, we selected four typical cases from the case dataset for further analysis. Case 1 and Case 2 are two bag robberies, and Case 3 and Case 4 are two casino robberies. Case 1 and Case 2 form a nonserial pair; the pair was correctly classified with attribute set $A_6$ but was incorrectly classified as serial with attribute set $A_1$. The same holds for Case 3 and Case 4.

We analyze the contribution of crime process information to the classification of serial crimes as follows. The crime features of Case 1 and Case 2 are identical (refer to Table 6), which explains why the pair was incorrectly classified with attribute set $A_1$. However, their crime processes differ. As listed in Table 7, the criminal in Case 1 violently harmed the victim at the beginning of the crime, which indicates that this criminal was more aggressive. According to the calculation results, the crime process similarity between Case 1 and Case 2 is 0.26, which is lower than the average similarity of 0.67 among serial crimes. Adding the crime process helps identify the difference between the two criminals’ M.O.; thus, Case 1 and Case 2 can be correctly classified with attribute set $A_6$. The crime features of Case 3 and Case 4 are not identical but are similar (refer to Table 6). The similarity of their crime processes is 0.29, which shows that their processes also differ (refer to Table 7). The difference in the crime features alone does not guarantee that the two cases are correctly classified, and the difference in their crime processes provides additional evidence for the classification.

After adding the crime narrative information, both the actual classification effect and the separability of the samples are greatly improved. The crime process improves the separability between serial crime samples and nonserial crime samples and contributes to the effect of crime series detection.

Fig. 7. The PCA’s results of attribute sets A1, A6

6 Conclusion

In this paper, we divide the modus operandi into the crime features and the crime process. For the similarity of the crime features, we design similarity measures according to the different characteristics of the features, taking into account the frequencies of features in particular. The crime process is obtained from the crime narrative. We regard the verbs and nouns in the crime narrative as the action characteristics and object characteristics of the crime process. The verb sequence and the noun sequence are treated as time series data, and their similarity is calculated by dynamic time warping. The similarities of the two word sequences are weighted by information entropy to give the similarity of the crime process.

The analysis shows that the crime process attributes have excellent separability. We apply several popular machine learning classification algorithms to verify the effect of linking crimes after adding the crime process information. The results show that the performance of crime series detection is substantially improved and that the weighting method characterizes the crime process more accurately.

In this paper, the crime features are extracted by hand and cannot be obtained directly. In the future, we intend to apply natural language processing technology, such as entity extraction, to automatically obtain the features of the modus operandi from the crime narrative text, which will enable a greater degree of automation in crime series detection.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

References

[1] Z. Hua-Wei and S. Xiao-Ming, "The Application of Behavior Patterns Analyzing Techniques to Series of Cases," Journal of Political Science and Law, vol. 32, no. 5, pp. 64-71, 2015.

[2] X. Zhang and C. Chen, "A Brief Discussion on the Difficulties and Ways of Investigation in Robbery Cases," Journal of Zhejiang Public Security College, no. 06, pp. 108-110, 1999.

[3] Y. Zhang and C. Zhang, "Several hot issues in the identification of robbery crime," Law Science, no. 05, pp. 155-160, 2014.

[4] W. Chen, "Research on the Criminal Investigation System Based on Data Mining," Journal of ShanXi Police Academy, vol. 18, no. 4, pp. 71-75, 2010.

[5] T. Wang, C. Rudin, D. Wagner, and R. Sevieri, "Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores," Big Data, vol. 3, no. 1, pp. 3-21, Mar 2015. doi: https://doi.org/10.1089/big.2014.0021

[6] J. Woodhams, R. Bull, and C. R. Hollin, "Case Linkage," in Criminal Profiling: International Theory, Research, and Practice, R. N. Kocsis, Ed. Totowa, NJ: Humana Press, 2007, pp. 117-133. doi: https://doi.org/10.1007/978-1-60327-146-2_6

[7] A. Borg, M. Boldt, N. Lavesson, U. Melander, and V. Boeva, "Detecting serial residential burglaries using clustering," Expert Systems with Applications, vol. 41, no. 11, pp. 5252-5266, 2014. doi: https://doi.org/10.1016/j.eswa.2014.02.035

[8] M. Tonkin et al., "Using offender crime scene behavior to link stranger sexual assaults: A comparison of three statistical approaches," Journal of Criminal Justice, vol. 50, pp. 19-28, 2017. doi: https://doi.org/10.1016/j.jcrimjus.2017.04.002

[9] W. Bernasco and P. Nieuwbeerta, "How Do Residential Burglars Select Target Areas?," The British Journal of Criminology, vol. 45, no. 3, pp. 296-315, 2005. doi: https://doi.org/10.1093/bjc/azh070

[10] M. Tonkin, J. Woodhams, R. Bull, J. W. Bond, and E. J. Palmer, "Linking Different Types of Crime Using Geographical and Temporal Proximity," Criminal Justice and Behavior, vol. 38, no. 11, pp. 1069-1088, 2011. doi: https://doi.org/10.1177/0093854811418599

[11] D. Gee and A. Belofastov, "Profiling Sexual Fantasy," in Criminal Profiling: International Theory, Research, and Practice, R. N. Kocsis, Ed. Totowa, NJ: Humana Press, 2007, pp. 49-71. doi: https://doi.org/10.1007/978-1-60327-146-2_3

[12] A. Borg and M. Boldt, "Clustering Residential Burglaries Using Modus Operandi and Spatiotemporal Information," International Journal of Information Technology & Decision Making, vol. 15, no. 01, pp. 23-42, 2016. doi: https://doi.org/10.1142/s0219622015500339

[13] R. R. Hazelwood and J. I. Warren, "Linkage analysis: modus operandi, ritual, and signature in serial sexual crime," Aggression and Violent Behavior, vol. 9, no. 3, pp. 307-318, 2004. doi: https://doi.org/10.1016/j.avb.2004.02.002

[14] P. Han, D. Wang, Y. Liu, and X. Su, "Influence of part-of-speech on Chinese and English document clustering," Journal of Chinese Information Processing, vol. 27, no. 02, pp. 65-73, 2013.

[15] M. D. Porter, "A Statistical Approach to Crime Linkage," The American Statistician, vol. 70, no. 2, pp. 152-165, 2016. doi: https://doi.org/10.1080/00031305.2015.1123185

[16] H. Chi, Z. Lin, H. Jin, B. Xu, and M. Qi, "A decision support system for detecting serial crimes," Knowledge-Based Systems, vol. 123, pp. 88-101, 2017. doi: https://doi.org/10.1016/j.knosys.2017.02.017

[17] L. F. Alarid, V. S. Burton, and A. L. Hochstetler, "Group and solo robberies: Do accomplices shape criminal form?," Journal of Criminal Justice, vol. 37, no. 1, pp. 1-9, 2009. doi: https://doi.org/10.1016/j.jcrimjus.2008.12.001

[18] A. Burrell, R. Bull, and J. Bond, "Linking Personal Robbery Offences Using Offender Behaviour," Journal of Investigative Psychology and Offender Profiling, vol. 9, no. 3, pp. 201-222, 2012. doi: https://doi.org/10.1002/jip.1365

[19] J. Woodhams and K. Toye, "An empirical test of the assumptions of case linkage and offender profiling with serial commercial robberies," Psychology, Public Policy, and Law, vol. 13, no. 1, pp. 59-85, 2007. doi: https://doi.org/10.1037/1076-8971.13.1.59

[20] L. E. Porter and L. J. Alison, "Behavioural coherence in group robbery: a circumplex model of offender and victim interactions," Aggressive Behavior, vol. 32, no. 4, pp. 330-342, 2006. doi: https://doi.org/10.1002/ab.20132

[21] F. G. M. Prats, "Textual Analysis and Linking of Narratives (TALON)," in Systems & Information Engineering Design Symposium, 2005.

[22] X. Wang, D. E. Brown, and J. H. Conklin, "Crime Incident Association with Consideration of Narrative Information," in Systems & Information Engineering Design Symposium, 2007.

[23] S. Zhu and Y. Xie, "Crime Event Embedding with Unsupervised Feature Selection," arXiv e-prints, Accessed on: June 01, 2018. Available: https://ui.adsabs.harvard.edu/#abs/2018arXiv180606095Z

[24] M. Helbich, J. Hagenauer, M. Leitner, and R. Edwards, "Exploration of unstructured narrative crime reports: an unsupervised neural network and point pattern analysis approach," Cartography and Geographic Information Science, vol. 40, no. 4, pp. 326-336, 2013. doi: https://doi.org/10.1080/15230406.2013.779780

[25] C.-H. Ku and G. Leroy, "A decision support system: Automated crime report analysis and classification for e-government," Government Information Quarterly, vol. 31, no. 4, pp. 534-544, 2014. doi: https://doi.org/10.1016/j.giq.2014.08.003

[26] D. Lei and T. Xu, "A Short Text Similarity Algorithm for Finding Similar Police 110 Incidents," in International Conference on Cloud Computing & Big Data, 2017.

[27] C. Ding, "The crime process: Leading on site investigation," Journal of Henan Public Security Academy, no. 06, pp. 140-141, 2008.

[28] D. Cornish, "The procedural analysis of offending and its relevance for situational prevention," Crime Prevention Studies, vol. 3, pp. 151-196, 1994.

[29] E. Beauregard, D. K. Rossmo, and J. Proulx, "A Descriptive Model of the Hunting Process of Serial Sex Offenders: A Rational Choice Perspective," Journal of Family Violence, vol. 22, no. 6, pp. 449-463, 2007. doi: https://doi.org/10.1007/s10896-007-9101-3

[30] E. Beauregard, J. Proulx, K. Rossmo, B. Leclerc, and J.-F. Allaire, "Script Analysis of the Hunting Process of Serial Sex Offenders," Criminal Justice and Behavior, vol. 34, no. 8, pp. 1069-1084, 2007. doi: https://doi.org/10.1177/0093854807300851

[31] X. Hu, H. Yao, and J. Tan, "Research on analysis method of crime process," Journal of Hubei University of Police, vol. 24, no. 03, pp. 73-76, 2011.

[32] L. Ma, Y. Chen, and H. Hao, "AK-Modes: A weighted clustering algorithm for finding similar case subsets," in International Conference on Intelligent Systems & Knowledge Engineering, 2010.

[33] D. E. Brown and S. Hagen, "Data association methods with applications to law enforcement," Decision Support Systems, vol. 34, no. 3, pp. 369-378, 2003.

[34] S. Boriah, V. Chandola, and V. Kumar, "Similarity Measures for Categorical Data: A Comparative Evaluation," in Proceedings of the 2008 SIAM International Conference on Data Mining, 2008, pp. 243-254. doi: https://doi.org/10.1137/1.9781611972788.22

[35] C. Bennell, N. J. Jones, and T. Melnyk, "Addressing problems with traditional crime linking methods using receiver operating characteristic analysis," Legal and Criminological Psychology, vol. 14, no. 2, pp. 293-310, 2009. doi: https://doi.org/10.1348/135532508x349336

[36] N. Pan et al., "Nonlinear tool traces fast tracing algorithm based on single point laser detection," Journal of Intelligent & Fuzzy Systems, pp. 1-12, 2018. doi: https://doi.org/10.3233/jifs-169885

[37] X. Liu, Y. Zhou, and R. Zheng, "Sentence Similarity based on Dynamic Time Warping," presented at the International Conference on Semantic Computing (ICSC 2007), 2007. doi: https://doi.org/10.1109/icsc.2007.48

[38] X. Zhu, D. Klabjan, and P. Bless, "Semantic Document Distance Measures and Unsupervised Document Revision Detection," arXiv e-prints, Accessed on: September 01, 2017. Available: https://ui.adsabs.harvard.edu/#abs/2017arXiv170901256Z

[39] S. Lin and D. E. Brown, "An outlier-based data association method for linking criminal incidents," Decision Support Systems, vol. 41, no. 3, pp. 604-615, 2006. doi: https://doi.org/10.1016/j.dss.2004.06.005

[40] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," arXiv e-prints, Accessed on: January 01, 2013. Available: https://ui.adsabs.harvard.edu/#abs/2013arXiv1301.3781M

[41] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43-49, 1978.

[42] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 4, pp. 623-656, 1948.

[43] I. A. Rezek and S. J. Roberts, "Stochastic complexity measures for physiological signal analysis," IEEE Transactions on Biomedical Engineering, vol. 45, no. 9, pp. 1186-1191, 1998.

[44] K. H. Guo and L. I. Wen-Li, "Evidential Reasoning-Based Approach for Multiple Attribute Decision Making Problems under Uncertainty," Journal of Industrial Engineering and Engineering Management, vol. 26, no. 2, pp. 94-100, 2012.

[45] D. H. Sun, W. N. Liu, and W. Song, "A Model with the Evaluation of the Equilibrium of Attribute Indexes Based on the Relative Entropy Measuring," Systems Engineering-theory Practice, 2001.

[46] J.-R. Cano, "Analysis of data complexity measures for classification," Expert Systems with Applications, vol. 40, no. 12, pp. 4820-4831, 2013. doi: https://doi.org/10.1016/j.eswa.2013.02.025

[47] Z. Lin, "The Application of Separability Analysis in Feature Selection of the Serial Crime Linkage Problem," presented at the 45th International Conference on Computers & Industrial Engineering, Metz, France, 2015.

[48] R. C. Spear and G. M. Hornberger, "Eutrophication in peel inlet-II. Identification of critical uncertainties via generalized sensitivity analysis," Water Research, vol. 14, no. 1, pp. 43-49, 1980.

[49] C. Ferri, J. Hernández-Orallo, and R. Modroiu, "An experimental comparison of performance measures for classification," Pattern Recognition Letters, vol. 30, no. 1, pp. 27-38, 2009. doi: https://doi.org/10.1016/j.patrec.2008.08.010

Mingliang QI obtained his Ph.D. in Management Science from the Chinese Academy of Sciences in 2007. He is currently an associate professor at the Institutes of Science and Development, Chinese Academy of Sciences. His research interests are emergency management, public safety management, and science and technology policy. He has published several papers in peer-reviewed journals and conferences.

Yusheng LI has been a student in a five-year successive master-doctor program at the Institutes of Science and Development, Chinese Academy of Sciences, since 2017. His research interest is data mining.