Keyword-based approach for recognizing fraudulent messages by keystroke dynamics


Pattern Recognition 98 (2020) 107067

Contents lists available at ScienceDirect

Pattern Recognition journal homepage: www.elsevier.com/locate/patcog

Cheng-Jung Tsai a,∗, Po-Hao Huang b

a Graduate Institute of Statistics and Information Science, National Changhua University of Education, Jin-De Campus, No. 1, Jin-De Road, Chang-Hua 500, Taiwan, R.O.C.
b Department of Mathematics, National Changhua University of Education, Taiwan, R.O.C.

Article info

Article history: Received 23 July 2019; Revised 9 August 2019; Accepted 23 September 2019; Available online 24 September 2019

Keywords: Biometrics; Internet fraud; Instant messaging; Keystroke dynamics; Free text

Abstract

In recent years, many approaches that use keystroke dynamics in free text authentication have been proposed. The major drawback of the proposed approaches is that training generally requires several months, thereby resulting in low practicality. In this study, a method to detect U.S. English fraudulent messages by analyzing keyboard users’ keystroke dynamics is proposed. To the best of our knowledge, this is the first study to apply keystroke dynamics to detect fraudulent instant messages. In the proposed system, each user requires only approximately 20 min of training in U.S. English keystroke dynamics. Furthermore, a voting-based statistical classifier is presented to improve the recognition accuracy of instant messages and prevent phishing messages. Experimental results indicate that the proposed approach outperforms other relevant published methods in terms of shorter training time, fewer false alarms, and comparable recognition accuracy. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

With the prevalence of Skype, Line, WeChat, and other instant messaging (IM) software, frauds involving IM applications have increased significantly. For example, a user may click on an unknown website link displayed in IM software, which allows the user’s username and password to be stolen and used by scam gangs to defraud the user’s friends or family. In another typical scam, a scam gang steals a user’s username and password for the IM software and subsequently sends messages to the user’s friends requesting the purchase of game points. Fig. 1 depicts a flowchart of such a scam. In 2018, the Internet Crime Complaint Center issued an announcement stating that Internet fraud had continued to increase. According to the FBI’s 2018 Internet Crime Report, the global exposure to Internet fraud increased by 136%, and the total losses worldwide were approximately 12.5 billion USD from December 2016 to May 2018. To strengthen the security of usernames and passwords, researchers have recently applied different biometric techniques relating to iris [13], retina [37], fingerprint [39], face [27], palm print [32], voice [18], online signature [33], or keystroke dynamics [24,28] as login-identification mechanisms. However, biometric-identification technologies for retina, fingerprint, and face require



Corresponding author. E-mail address: [email protected] (C.-J. Tsai).

https://doi.org/10.1016/j.patcog.2019.107067 0031-3203/© 2019 Elsevier Ltd. All rights reserved.

additional hardware costs. Because a keyboard is a typical computer peripheral and requires no change in computer-use habits, most recent studies have focused on keystroke dynamics as a method of increasing user account security [1,16,20]. In recent years, keystroke dynamics has also been applied in research on smartphones [9,25], touch devices [36], and emotion recognition [23]. However, traditional keystroke-dynamics verification methods are not applicable to detecting fraudulent messages in IM software.

Recently, free-text studies have been performed. A major problem with the proposed free-text approaches is that users were required to undergo several months of training in keystroke dynamics. For example, in the study of Gunetti and Picardi [15], approximately one month of keystroke-dynamics training was required, while Teh et al. [35] required four months of training. Messerman et al.’s study [26] required up to 12 months of training. In our previously performed keystroke-dynamics-related experiments [8,9,36], we observed that users felt bored and were unwilling to use an authentication system if the training time was greater than 30 min. Specifically, approximately 90% of the users were unwilling to use such a system if the training time was longer than 30 min, even if the system could protect them from fraudulent messages. Furthermore, because current keystroke-dynamics authentication (KDA) systems do not reach 100% recognition accuracy, they will produce excessively frequent false alarms for the free-text authentication of IM users, thereby reducing the users’ willingness to use the authentication system. For instance, suppose that the accuracy rate of a KDA system is 90% and that a user inputs 100 characters per min on average. If the KDA system monitors every character, a recognition error will be generated for every ten characters on average; i.e., every 6 s, the KDA system will generate a false alarm.

Fig. 1. Flowchart example of a typical Internet fraud.

Fig. 2. Four most typical keystroke dynamics.

Hence, a prediction mechanism for keystroke dynamics is proposed herein. Our method can significantly reduce the training time required for users during the enrollment phase. In the conducted experiments, our approach yielded high accuracy even when most users spent only approximately 20 min on keystroke-dynamics training. Furthermore, to solve the problem of excessively frequent false alarms, we use a keyword-based identification mechanism: our KDA system performs authentication only when a user enters keywords that are associated with fraud. The contributions of this study can be summarized as follows.
(1) This is the first study that focuses on applying keystroke dynamics to detect fraudulent instant messages.
(2) An effective keystroke-dynamics-prediction method, a keyword-based identification mechanism, and a voting-based statistical (VBS) classifier are proposed.
(3) Our KDA system requires only a short training time in keystroke dynamics.
(4) The number of participants in this study is 300, exceeding the experimental threshold of 100 suggested by Giot et al. [14].
(5) Finally, our approach can be applied widely. It is not limited to IM authentication; it can also be applied to other applications such as email identification.

This paper is divided into five sections. In Section 2, the literature related to long texts and keystroke dynamics is reviewed and discussed in detail.
Section 3 describes the proposed KDA system, including the enrollment phase, prediction mechanism, VBS classifier, and keyword-based authentication mechanism. Subsequently, the experimental analysis and comparison are presented in Section 4. Section 5 presents the conclusions and suggestions for future research directions.

2. Related works

2.1. Keystroke dynamics

The most typical method of user-identity authentication is to use usernames and passwords. Nonetheless, owing to the lack of knowledge about antivirus and anti-hacking software, a user’s password can be easily stolen or cracked by hackers. To enhance

the security of authentication systems, researchers have proposed KDA approaches. When a user enters a password into a KDA system, the system measures the period of time from when a specific key is pressed to when the key is released in order to differentiate between legitimate users and imposters. As each person exhibits his/her own unique typing pattern, even if a user’s password has been stolen or cracked, an imposter cannot easily intrude into the KDA system. The concept of KDA emerged during World War II, when U.S. ships distinguished each sending telegraph operator on the basis of specific keystroke-dynamics rhythms. In recent years, methods related to keystroke-dynamics verification have primarily used four keystroke-time values, namely, press–release (PR) time, press–press (PP) time, release–press (RP) time, and release–release (RR) time, as the characteristic properties of fixed-length usernames and passwords; subsequently, nearest neighbor [17], statistics [7], artificial neural network [2], or the sorting method [15] were used to distinguish between legitimate users and imposters. Suppose that key_i denotes a key i on a standard keyboard. The definitions of PR, PP, RP, and RR are as follows:
• PR_i: time duration between the pressing and releasing of the same key_i.
• RP_ij: time interval between releasing key_i and then pressing key_j.
• PP_ij: time interval between pressing key_i and then pressing key_j.
• RR_ij: time interval between releasing key_i and then releasing key_j.
For example, suppose that a user entered “ABCD”; the four standard keystroke dynamics are depicted in Fig. 2. Furthermore, Teh et al. described the usages of PR, PP, RP, and RR, and analyzed the permutations and combinations of the four keystroke dynamics. Their experimental results indicated that the optimal combination was PR and RP [35].
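The four timing features defined above can be illustrated with a minimal Python sketch. It assumes that each keystroke is available as a (key, press time, release time) record in milliseconds; the `KeyEvent` class, the `timing_features` function, and the sample timestamps are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str        # the character typed
    press: float    # press timestamp in ms (hypothetical data)
    release: float  # release timestamp in ms (hypothetical data)


def timing_features(events):
    """Return PR per key and RP, PP, RR per consecutive key pair."""
    pr = [(e.key, e.release - e.press) for e in events]
    rp, pp, rr = [], [], []
    for a, b in zip(events, events[1:]):
        pair = a.key + b.key
        rp.append((pair, b.press - a.release))    # release of key_i to press of key_j
        pp.append((pair, b.press - a.press))      # press of key_i to press of key_j
        rr.append((pair, b.release - a.release))  # release of key_i to release of key_j
    return {"PR": pr, "RP": rp, "PP": pp, "RR": rr}


# Made-up timestamps for a user typing "ABCD".
events = [
    KeyEvent("A", 0, 90),
    KeyEvent("B", 150, 230),
    KeyEvent("C", 300, 380),
    KeyEvent("D", 460, 520),
]
features = timing_features(events)
```

For a word of four characters this yields four PR values and three values each of RP, PP, and RR. Note that RP can be negative when a typist presses the next key before releasing the previous one.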
Some researchers have also proposed applying other features, such as typing rate and frequency of word errors [38], to increase the prediction accuracy of KDA systems. Another important topic for KDA systems is their performance evaluation. The three most typically considered performance-evaluation criteria are the false rejection rate (FRR), false acceptance rate (FAR), and equal error rate (EER) [16,22,28]. Most recently proposed KDA systems adopt the EER to evaluate and verify their performance.

2.2. Long-text identification using keystroke dynamics

The problem of long-text identification can be divided into two categories, namely, fixed long-text identification and free-text identification. Initially, keystroke dynamics was applied only to enhance the security of fixed long texts, such as usernames and passwords. Studies of fixed long texts generally ask users to input the same content in both the enrollment and authentication phases. In contrast, free-text methods allow users to input any content in the authentication phase. Since


1980, many KDA systems that focus on the identification of fixed long texts have been proposed; however, these methods were inapplicable to free texts. Monrose and Rubin first studied the issue of free text [30]. Their experiments involved 42 participants and lasted for seven weeks. The training samples included fixed long texts and free texts, and each participant used his/her own computer to participate in all the relevant experiments. In the authentication phase, Monrose and Rubin applied the RP and PR features and a Euclidean-distance-based classifier. Furthermore, they analyzed various weightings between RP and PR. The experimental results indicated that their method could yield a 10% EER in identifying fixed long texts, but an extremely high EER (i.e., 77%) in identifying free texts.

Dowland et al. [11] used the mean and standard deviation of keystroke dynamics to identify free texts. However, their experiments involved only five participants, and the reported EER was approximately 40%.

Meanwhile, Gunetti and Picardi [15] proposed using the speed and the corresponding order of keystroke dynamics, such as PR or RP, to identify free texts. Each entered key of the training samples and authentication data was sorted on the basis of keystroke duration. According to the different ranks between the training samples and authentication data, the sum of the ranking differences was calculated. If the sum was less than the default threshold, the user was identified as legitimate; if the sum was greater than a specific critical value, the user was identified as an impostor. They observed 40 participants as legitimate users and 165 participants as impostors; each legitimate user provided 15 training samples and each impostor provided one authentication sample. In their experiment, the keystroke-dynamics training and authentication samples each contained approximately 800 words, and the average training time was approximately 1–2 months, with a 15% EER.

Using the mean and standard deviation of keystroke dynamics, Villani et al. [38] proposed a statistical classifier to identify long texts. The training samples were divided into four data categories, namely, fixed long text on desktops, free text on desktops, fixed long text on laptops, and free text on laptops. Their study involved 118 participants, each of whom provided at least two different data categories; 36 participants provided all four. For the 52 participants in the desktop fixed-long-text category, their approach yielded a 0.8% EER, while a 3.6% EER was obtained from the 40 participants in the desktop free-text category. Their experimental results indicated that keystroke dynamics could accurately identify free text if sufficient enrollment samples were available.

In contrast to approaches that use the four typical keystroke-dynamics features, i.e., PR, PP, RP, and RR, Park et al. [31] proposed four new keystroke-dynamics modes, namely, left hand, right hand, space, and delete, to model the keystroke dynamics of users and identify whether they were legitimate. With 35 participants, the reported result was an 8.9% EER. Samura and Nishimura [34] described how to efficiently identify free text in the Japanese language by utilizing the unique properties of its consonants and vowels. They used a classifier based on the Euclidean distance to analyze various weightings between PR, RP, RR, and PP. Their experiments involved 112 participants classified into three groups according to typing speed, with a reported result of 5.3% EER.
Davoudi and Kabir [10] selected 21 of 40 participants as legitimate users and subsequently divided them into several classes according to typing speed. Building on the sorting approach proposed by Gunetti and Picardi [15], Davoudi and Kabir [10] proposed a new classifier to identify free text and performed a series of experiments. On combining the sorting approach and their classifier, they obtained only five false rejections and nine false acceptances.

To enhance the accuracy of free-text identification, Messerman et al. [26] improved the Euclidean-distance formula used for designing the classifier. Their study involved 55 participants, each of whom used his/her own computer in all the related experiments, and the collection of training data took approximately one year. Messerman et al. [26] then removed the outliers in the training data and performed a series of experiments; their approach yielded a 2% EER. Ahmed and Traore [2] used an approach based on feedforward neural networks to resolve the inconsistency in two freely typed keystroke-dynamics datasets and achieved a 2.13% EER. However, as stated by Kim et al. [22], this approach was impractical because too many parameters required optimization.

Alsultan et al. [5] proposed a key-layout-based method to authenticate free texts in the Arabic language. Furthermore, Alsultan et al. [4] proposed two levels of fusion, namely, feature-level fusion and decision-level fusion, to improve the performance of free-text authentication. However, Alsultan et al. reported the FAR and FRR instead of the typically used EER in these two studies. Kim et al. [22] proposed a user-adaptive feature-extraction method and used five types of classifiers, namely, Gaussian density estimation, Parzen window density estimation, k-nearest neighbor (k-NN), support vector machine (SVM), and k-means clustering, to improve the performance of keystroke-dynamics-based authentication for free texts. Compared with the traditional fixed feature-extraction method, their method improved the authentication performance by 45.3% for the Korean language and 39.0% for the English language.
Kang and Cho [21] explored the expandability of KDA for long and free texts by using diverse input devices, namely, a standard 104-key keyboard, a soft keyboard, and a touch keyboard. Kang [19] reported that typing in different languages and with different levels of language familiarity could affect keystroke-dynamics-based authentication for free texts. Mondal and Bours [29] applied a user’s freely typed keystroke dynamics together with the user’s mouse dynamics to a continuous authentication problem. Alpar [3] used the Fourier transform to generate the frequency spectrograms of keystroke dynamics to authenticate users.

3. Methodology

To develop a KDA system that can effectively and efficiently identify fraudulent IM instances and solve the problem of long training during the enrollment phase, we first propose a method to predict all the possible keystroke dynamics of a legitimate user from the limited keystroke-dynamics data gathered during the enrollment phase. We refer to the predicted data as “predicted keystroke dynamics (PKD).” Furthermore, because fraudulent messages usually contain typical keywords such as “money” and “remit,” we propose a KDA system that employs keyword-based recognition to solve the problem of overly frequent false alarms. Finally, we propose a VBS classifier to further improve the efficiency of our KDA system.

3.1. System architecture

Fig. 3 depicts the architecture of the IM KDA system proposed herein. When two users communicate through IM software such as Line, Skype, or Facebook, our KDA system must be installed on both users’ computers. Subsequently, both users must input the training samples during the enrollment phase; the process of collecting keystroke dynamics is detailed in Section 3.2. A user’s training samples contain only a part of all the keystroke dynamics that he/she may produce when typing in the IM software; from these samples, our proposed system will predict this user’s complete keystroke


Fig. 3. Architecture of our proposed KDA system.

dynamics and further establish the PKD for each legitimate user. This prediction mechanism is described in detail in Section 3.3. Whenever two users use the IM software, their KDA systems must have each other’s PKD profiles, which can be accomplished by downloading the required PKD from a remote database. When a message containing one of the fraudulent keywords is sent, the KDA system determines, on the basis of the sender’s PKD, whether the message was sent by a legitimate user. If the result indicates that the sender is illegitimate, the KDA system sends a warning message to the recipient. Finally, Section 3.4 describes our proposed VBS classifier used in the authentication phase.

3.2. Enrollment phase

Following the findings of Teh et al. [35], we applied the PR and RP features gathered during the enrollment phase, coupled with our own proposed keystroke-dynamics-prediction method, to generate a unique PKD for each legitimate user. In regard to the number of training samples used during the enrollment phase, based on the results of Araújo et al. [6], Giot et al. [14], and Gunetti and Picardi [15], a legitimate user is required to input an article of approximately 200 words thrice as training samples in our proposed KDA system. Assuming that a legitimate user entered three samples (i.e., I1, I2, I3), four training profiles {I1, I2}, {I2, I3}, {I1, I3}, and {I1, I2, I3} are created for that user to calculate the FAR, FRR, and EER during the authentication phase. Fig. 4 displays the interface of our proposed KDA system, wherein the users must enter basic information such as gender, age, computer-use habits, and the IM software they typically use.

3.3. Construction of PKD profile

The training samples in this paper consist of approximately 200 U.S. English words, i.e., approximately 1000 characters. Using the training data obtained during the enrollment phase, our proposed KDA system can obtain the PR time of every letter of the English alphabet and the RP times of some letter combinations. Because our KDA system uses statistical information such as the mean and standard deviation of users’ keystroke dynamics, as defined in Eqs. (1)–(4), in the authentication phase, the proposed mechanism records the means and standard deviations of the available RPs in two 26 × 26 matrices, namely μRP[] and σRP[], and uses them to predict the means and standard deviations of the RPs that could not be obtained during the enrollment phase. On the basis of the user’s training samples obtained during the enrollment phase, Eqs. (1) and (2) are used to calculate the mean and standard deviation of keystroke dynamics PR_i, where i represents a letter key key_i on a standard keyboard, while Eqs. (3) and (4) are used to calculate the mean and standard deviation of keystroke dynamics RP_ij. In Eqs. (1)–(4), i, j ∈ {a ∼ z ∪ A ∼ Z}, and n is the total number of keystroke dynamics PR_i or RP_ij. No data are recorded if the user presses a non-letter key, for example, a numeric key or a symbol key.

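$$\mu_{PR_i} = \frac{1}{n}\sum PR_i, \qquad i \in \{a \sim z \cup A \sim Z\}. \tag{1}$$

$$\sigma_{PR_i} = \frac{1}{n-1}\sum \left|PR_i - \mu_{PR_i}\right|, \qquad i \in \{a \sim z \cup A \sim Z\}. \tag{2}$$

$$\mu_{RP_{ij}} = \frac{1}{n}\sum RP_{ij}, \qquad i, j \in \{a \sim z \cup A \sim Z\}. \tag{3}$$

$$\sigma_{RP_{ij}} = \frac{1}{n-1}\sum \left|RP_{ij} - \mu_{RP_{ij}}\right|, \qquad i, j \in \{a \sim z \cup A \sim Z\}. \tag{4}$$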
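As an illustration of Eqs. (1)–(4), the following Python sketch groups latencies by key (for PR) or by key pair (for RP) and computes the mean and the deviation measure exactly as written in Eqs. (2) and (4). The function names, the case folding of letters (assumed here because the matrices are 26 × 26), and the sample latencies are assumptions made for this example only.

```python
from collections import defaultdict


def mean_and_deviation(samples):
    """Mean (Eqs. 1, 3) and deviation (Eqs. 2, 4) of a list of latencies."""
    n = len(samples)
    mu = sum(samples) / n
    if n < 2:
        return mu, None  # the 1/(n-1) factor is undefined for one sample
    sigma = sum(abs(x - mu) for x in samples) / (n - 1)
    return mu, sigma


def build_profile(observations):
    """observations: iterable of (label, latency_ms).

    label is a single letter for PR or a two-letter pair for RP.
    Labels containing non-letter keys are skipped, as stated in the text.
    """
    grouped = defaultdict(list)
    for label, value in observations:
        if label.isalpha():
            grouped[label.lower()].append(value)  # assumption: case is folded
    return {label: mean_and_deviation(vals) for label, vals in grouped.items()}


# Made-up RP latencies in milliseconds.
rp_obs = [("th", 100), ("th", 120), ("th", 110), ("he", 80), ("t1", 50)]
profile = build_profile(rp_obs)
```

Pairs that never occur in the training text are simply absent from the resulting dictionary; these are the entries that the prediction mechanism of this section is meant to fill.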

Fig. 4. Proposed system interface of enrollment phase.

Our proposed prediction mechanism is divided into two methods, namely, symmetric prediction and cross prediction. In the symmetric-prediction method, for key_i and key_j, if the system collects RP_ji but no RP_ij, our system sets RP_ij = RP_ji. The principle of the symmetric-prediction method was observed in our previously conducted keystroke-dynamics-related experiments, where, for the same user, the latency between the release of key_i and the press of key_j was similar to the latency between the release of key_j and the press of key_i. In the cross-prediction stage, the remaining uncollected keystroke dynamics RP_ij are predicted using Eq. (5). In Eq. (5), RP_i∗ denotes the latency from the release of key_i to the press of any key, RP_∗j the latency from the release of any key to the press of key_j, and n the total number of available RP_i∗ or RP_∗j after implementing the symmetric-prediction method. The cross-prediction method is based on a statistical principle according to which the unknown RP_ij can be predicted by calculating the mean of all the related latencies RP_i∗ and RP_∗j. This technique has been widely used in many studies. For example, the data-preprocessing step used in data mining uses this technique to handle null data.

$$RP_{ij} = \left(\frac{1}{n}\sum RP_{i*} + \frac{1}{n}\sum RP_{*j}\right) \Big/\, 2. \tag{5}$$

The pseudocode of the algorithm used by our proposed KDA system to generate a user’s PKD is as follows. For a legitimate user, line 1 loads his/her keystroke dynamics obtained during the enrollment phase, while lines 2–6 use our symmetric-prediction method to generate the average of some unknown RP_ij. Lines 7–11 generate the standard deviation of some unknown RP_ij. Next, lines 12–16 use our cross-prediction method to predict the average of the remaining unknown RP_ij, and lines 17–21 predict the standard deviation of the remaining unknown RP_ij. Finally, the legitimate user’s PKD is output in line 22. The time complexity of generating a PKD is analyzed as follows. After inputting a user’s keystroke dynamics in line 1, lines 2–11 implement the symmetric-prediction method, which requires O(n^2) computational cost, where n denotes the number of letters in the English alphabet and is a constant, i.e., 26. In lines 12–21, the cross-prediction method is implemented, which also requires O(n^2) computational cost. Consequently, for a user, the time complexity of generating his/her PKD is O(26^2) = O(676), i.e., a constant.

Algorithm for generating PKD

μRP_ij: the mean of all latencies between key_i release and key_j press;
σRP_ij: the standard deviation of all latencies between key_i release and key_j press;
μRP[]: a 26 × 26 matrix that records the μRP_ij obtained in the training step;
σRP[]: a 26 × 26 matrix that records the σRP_ij obtained in the training step;
RP_i∗: the latency between key_i release and any key press;
RP_∗j: the latency between any key release and key_j press;

1.  Input μRP[] and σRP[];
2.  for every μRP_ij in μRP[]
3.    if μRP_ij is null and μRP_ji is not null
4.      μRP_ij = μRP_ji;
5.    end if
6.  end for
7.  for every σRP_ij in σRP[]
8.    if σRP_ij is null and σRP_ji is not null
9.      σRP_ij = σRP_ji;
10.   end if
11. end for
12. for every μRP_ij in μRP[]
13.   if μRP_ij is null
14.     μRP_ij = (1/n Σ RP_i∗ + 1/n Σ RP_∗j) / 2;
15.   end if
16. end for
17. for every σRP_ij in σRP[]
18.   if σRP_ij is null
19.     σRP_ij = (1/n Σ RP_i∗ + 1/n Σ RP_∗j) / 2;
20.   end if
21. end for
22. Output μRP[] and σRP[];
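The two prediction steps can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' code: the matrix is represented as a dictionary keyed by letter pairs, the function name `predict_matrix` is hypothetical, and a pair is left unfilled when its row or its column contains no data at all (a case the text does not discuss). The same function would be applied once to the matrix of means and once to the matrix of standard deviations.

```python
import string

LETTERS = string.ascii_lowercase


def _mean(values):
    return sum(values) / len(values) if values else None


def predict_matrix(known):
    """Fill a sparse 26 x 26 matrix given as {(i, j): value}."""
    # Symmetric prediction: copy RP_ji into RP_ij when only RP_ji was observed.
    filled = dict(known)
    for (i, j), value in known.items():
        filled.setdefault((j, i), value)

    # Row and column means over everything available after the symmetric step.
    row_mean = {k: _mean([filled[(k, j)] for j in LETTERS if (k, j) in filled])
                for k in LETTERS}
    col_mean = {k: _mean([filled[(i, k)] for i in LETTERS if (i, k) in filled])
                for k in LETTERS}

    # Cross prediction (Eq. 5): average the mean of row i and the mean of column j.
    complete = dict(filled)
    for i in LETTERS:
        for j in LETTERS:
            if (i, j) in complete:
                continue
            if row_mean[i] is not None and col_mean[j] is not None:
                complete[(i, j)] = (row_mean[i] + col_mean[j]) / 2
    return complete


# Made-up mean RP latencies (ms) for three observed letter pairs.
known_mu = {("a", "b"): 100, ("b", "c"): 80, ("c", "a"): 120}
predicted_mu = predict_matrix(known_mu)
```

Both loops run over at most 26 × 26 pairs, which matches the constant cost stated in the complexity analysis.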

3.4. VBS classifier and keyword-based authentication mechanism

The study by Boechat et al. [7] indicated that statistical classifiers are simple and fast, impose a low computational burden, and offer authentication accuracy comparable to that of other types of classifiers. On the basis of the statistical classifier proposed by Boechat et al. [7], we propose a new VBS classifier for our keyword-based authentication mechanism. The statistical classifier proposed by Boechat et al. uses the mean and standard deviation of a legitimate user’s keystroke dynamics, as defined in Eqs. (1)–(4). Given a predefined threshold thr, the statistical classifier calculates the variation S between the user’s enrollment data and authentication data, as displayed in Eq. (6), where Y represents the keystroke dynamics (PR or RP) obtained during the authentication phase, μ and σ the mean and standard deviation of the keystroke dynamics (PR or RP) obtained during the enrollment phase, respectively, and m the total number of occurrences of PR and RP obtained during the authentication phase. If S < thr, the user is identified as legitimate. However, this classifier is not suited to our keyword-based authentication mechanism. Whereas a free text contains hundreds or even thousands of characters, a keyword sample typically consists of only a few characters; therefore, a single outlier can dominate S, and the keyword-based authentication mechanism is vulnerable to outliers.

$$S = \frac{1}{m}\sum \frac{Y - \mu}{\sigma}. \tag{6}$$

To solve the aforementioned problems, the proposed VBS classifier is defined in Eqs. (7)–(9). Eq. (7) is used for PR identification and Eq. (8) for RP identification, where X_PR_i and X_RP_ij denote the PR and RP keystroke dynamics obtained in the authentication phase, respectively. Eq. (9), which combines Eqs. (7) and (8), is used to differentiate between legitimate users and imposters. For a given threshold σ_T and a specific keystroke dynamic (X_PR_i or X_RP_ij), the mean and standard deviation calculated from Eqs. (1)–(4) are substituted into Eqs. (7) and (8); the proposed system outputs 1 if the variation between the enrollment data and authentication data is less than σ_T, and 0 if the variation is greater than or equal to σ_T. The integer value ΣD + ΣF is subsequently obtained by summing the outputs of Eqs. (7) and (8), where ΣD and ΣF are the numbers of PRs and RPs that pass, respectively. Finally, a value G (0 ≤ G ≤ 1) is obtained by dividing (ΣD + ΣF) by AL, the total number of PRs and RPs in the keyword. For a given threshold percent, G > percent indicates that the keyword was input by a legitimate user; otherwise, it is identified as an input by an imposter.
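The voting rule of Eqs. (7)–(9) can be sketched in Python as shown below. The sketch assumes that the variation is measured as an absolute deviation in units of the enrolled standard deviation (the absolute value is an assumption of this example), and that the profile holds a positive standard deviation for every key and key pair in the keyword. The function names, the profile values, and the observed latencies are hypothetical; they are chosen so that the keyword “bank” obtains four passing PRs and two passing RPs.

```python
def vbs_score(pr_obs, rp_obs, profile, sigma_t=2.0):
    """Return G, the fraction of PR and RP observations that pass the vote."""
    votes = 0
    total = 0  # AL: number of PRs and RPs in the keyword
    for label, x in list(pr_obs) + list(rp_obs):
        mu, sigma = profile[label]
        total += 1
        # Assumption: deviation is taken as an absolute value.
        if abs(x - mu) / sigma < sigma_t:
            votes += 1
    return votes / total


def is_legitimate(pr_obs, rp_obs, profile, sigma_t=2.0, percent=0.70):
    """Accept the sender when G exceeds the percent threshold."""
    return vbs_score(pr_obs, rp_obs, profile, sigma_t) > percent


# Hypothetical enrolled (mean, deviation) values in ms for the keyword "bank".
profile = {
    "b": (100, 10), "a": (90, 10), "n": (95, 10), "k": (105, 10),
    "ba": (60, 10), "an": (70, 10), "nk": (65, 10),
}
pr_obs = [("b", 105), ("a", 85), ("n", 100), ("k", 110)]
rp_obs = [("ba", 65), ("an", 75), ("nk", 120)]  # "nk" is an outlier

g = vbs_score(pr_obs, rp_obs, profile)
accepted = is_legitimate(pr_obs, rp_obs, profile)
```

Because each observation contributes a single vote, one outlier such as the “nk” latency lowers G by only 1/7 instead of dominating an averaged score. With a stricter setting such as `percent=0.87`, the same input would be flagged.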



$$D = \begin{cases} 1\ (\text{pass}), & \text{if } \dfrac{X_{PR_i} - \mu_{PR_i}}{\sigma_{PR_i}} < \sigma_T, \\ 0\ (\text{no pass}), & \text{otherwise.} \end{cases} \tag{7}$$

$$F = \begin{cases} 1\ (\text{pass}), & \text{if } \dfrac{X_{RP_{ij}} - \mu_{RP_{ij}}}{\sigma_{RP_{ij}}} < \sigma_T, \\ 0\ (\text{no pass}), & \text{otherwise.} \end{cases} \tag{8}$$

$$G = \frac{\sum D + \sum F}{AL}. \tag{9}$$

In the following, the pseudocode of the VBS classifier’s working algorithm is presented. When a message that contains one of the fraudulent keywords kw is passed, line 1 loads the kw. Lines 2–6 check the PR of each key in the keyword, while lines 7–11 check the RP of each key pair in the keyword. Subsequently, line 12 calculates the value G defined in Eq. (9). Finally, the proposed VBS classifier uses a threshold percent to determine whether the received IM was sent by a legitimate user. If G < percent, our system displays the corresponding alert to the user. The time complexity of building our VBS classifier is O(np), where n is the number of training users and p the number of different keystroke dynamics. For PR and RP, p is a constant of 26 and 676, respectively. Therefore, we can reduce the time complexity from O(np) to O(n). With regard to the time complexity of authentication, our method requires only O(1) time because the number of characters in a keyword is a small integer.

Algorithm of VBS classifier

kw: a fraudulent keyword in a received instant message;
PR_i: the latency between key_i press and key_i release;
RP_ij: the latency between key_i release and key_j press;
μPR_i, σPR_i: the mean and standard deviation of PR_i obtained in the enrollment phase;
μRP_ij: the mean of all latencies between key_i release and key_j press;
σRP_ij: the standard deviation of all latencies between key_i release and key_j press;
D: the number of times that [(PR_i − μPR_i) / σPR_i] < σ_T;
F: the number of times that [(RP_ij − μRP_ij) / σRP_ij] < σ_T;
AL: the total number of PRs and RPs in the keyword;
percent: a threshold

1.  Input kw;
2.  for every PR_i in the keyword
3.    if [(PR_i − μPR_i) / σPR_i] < σ_T then
4.      D++;
5.    end if
6.  end for
7.  for every RP_ij in the keyword
8.    if [(RP_ij − μRP_ij) / σRP_ij] < σ_T then
9.      F++;
10.   end if
11. end for
12. G = (D + F) / AL;
13. if G < percent then
14.   Warn the user that the received instant message may have been sent by an imposter;
15. end if

Suppose that the user enters the keyword “bank”; this keyword contains four PRs {b, a, n, k} and three RPs {ba, an, nk}. Assuming that σ_T is 2 and that percent is 70%, the four PRs and three RPs are substituted into Eqs. (7) and (8), respectively, to obtain, for example, ΣD = 4 and ΣF = 2. From Eq. (9), as G = 6 / 7 ≈ 85.7% > percent, the predicted result indicates that the keyword was input by a legitimate user. Suppose that the KDA accuracy is approximately 90% and that a general user can input 100 characters per minute. On average, a non-keyword-based KDA system will then generate an identification error for every 10 characters; that is, every 6 s, such a system will generate a false alarm. In contrast, suppose that a fraudulent keyword appears only once in every 10 min; our KDA system will then produce a false alert only once in every 100 min. Consequently, the proposed VBS classifier is feasible, as it can significantly improve users’ willingness to use our mechanism for identifying IM.

4. Experimental results and discussions

The proposed KDA system was implemented using Microsoft Visual C#, and the participants used personal computers running the Windows XP operating system and equipped with Intel Core2 Duo processors and 2 GB of main memory. All the experiments were performed in a computer classroom, and all the computers used in the experiments were equipped with a standard 104-key keyboard. Before starting the experiments, we asked all the participants to close all other applications and background programs to ensure the same PC condition on all the experimental computers. All 300 participants were either college graduates or undergraduates in Taiwan, and most of them were familiar with English-language typing. Keystroke dynamics such as PR and RP were calculated using the KeyDown() and KeyUP() functions of the Microsoft Visual C# library, and the keystroke-dynamics times were recorded in milliseconds. Because Giot et al. [14] recommended that KDA-related experiments should involve more than 100 participants, we followed their recommendation and involved 300 participants between the ages of 19 and 26 years. The profile statistics of the 300 participants are presented in Table 1. It is interesting that more than half of our participants had some experience with identity theft. Therefore, the results of this research are of high practicality.

C.-J. Tsai and P.-H. Huang / Pattern Recognition 98 (2020) 107067

Table 1
Statistical information of the 300 participants.

Attribute                               Entry       Number of subjects
Gender                                  Male        150
                                        Female      150
Experience of Fraud Message             Yes         162
                                        No          138
Age                                     18–20       223
                                        21–26       77
Typing Method (The Number of Fingers)   1–3         32
                                        4–7         58
                                        8–10        210
PC Usage Frequency (Year)               1–4         28
                                        4–7         66
                                        7–10        130
                                        10+         76
PC Usage Frequency (Hour)               0–1         22
                                        2–4         118
                                        5–7         98
                                        8+          62
Handedness                              Right       276
                                        Left        24
Keyboard                                Desktop     156
                                        Laptop      82
                                        Both        62
Commonly Used IM-Software               Facebook    117
                                        Line        151
                                        Skype       32

The enrollment phase combined the findings of Araújo et al. [6] and Giot et al. [14] on KDA with the experimental design of Gunetti and Picardi [15]. We asked each legitimate user to type a long text of approximately 200 words three times as training samples, from which we generated the PKD of that user. The training article was randomly extracted from news on the Internet. Assume that each legitimate user typed the three requested training samples, namely, I1, I2, and I3. From these three training samples, four user profiles {I1, I2}, {I2, I3}, {I1, I3}, and {I1, I2, I3} could be created. One week later, the same 300 participants keyed in an approximately 100-word article four times in the same computer classroom. This article was different from that used during the enrollment phase, and the collected data were used as attack data during the authentication phase. Ten fraudulent keywords were embedded in the training and authentication data, and our proposed KDA system currently contains seven built-in keywords, namely, “passbook,” “operate,” “remit,” “accident,” “cash,” “lottery,” and “money.”

According to the experimental design mentioned above, the samples of each legitimate user were validated against one of his/her PKD profiles approximately 40 times (10 × 4), while each profile was attacked by illegitimate users approximately 11,960 times (10 × 4 × 299). In summary, our experiment included 48,000 (10 × 4 × 4 × 300) instances of legitimate validation and 14,352,000 (10 × 4 × 4 × 299 × 300) instances of illegitimate validation. The EER, FRR, and FAR of the conducted experiments are presented in Table 2. The optimal result is a 10.4% EER at a 95% confidence level, obtained when σT was set to 2 and percent to 87%. To understand the effect of the thresholds σT and percent on the authentication rate of our proposed system, σT was varied between 1 and 3 in increments of 0.1.
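The profile combinations and sample counts stated above can be checked with a few lines of Python. The variable names are illustrative only; the numbers are the ones given in the text.

```python
from itertools import combinations

samples = ["I1", "I2", "I3"]
# Profiles are built from every pair of training samples plus the full set.
profiles = [set(c) for r in (2, 3) for c in combinations(samples, r)]

keywords = 10        # fraudulent keywords embedded in each article
repetitions = 4      # times each participant typed the test article
participants = 300

# Validations against a single profile.
legit_per_profile = keywords * repetitions
impostor_per_profile = keywords * repetitions * (participants - 1)

# Totals over all profiles of all participants.
legit_total = legit_per_profile * len(profiles) * participants
impostor_total = impostor_per_profile * len(profiles) * participants
```

The computation yields four profiles per user, 40 legitimate and 11,960 illegitimate validations per profile, and totals of 48,000 and 14,352,000, which agrees with the figures reported in the text.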
The experimental results were plotted as receiver operating characteristic (ROC) curves [12] to further evaluate the system’s accuracy. To keep the comparison chart readable, Fig. 5 depicts only the ROC curves for threshold σT values of 1, 1.5, 2, 2.5, and 3. It is noteworthy that we could plot multiple ROC curves in Fig. 5 because our system authenticates users by considering two parameters: σT and percent.
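To illustrate how one such curve and its EER can be obtained, the following Python sketch sweeps the percent threshold for a fixed σT and reports the point where FAR and FRR are closest. The scores are synthetic, the function names are hypothetical, and the acceptance rule (accept when G ≥ percent) follows the description of the VBS classifier; this is a sketch under those assumptions, not the evaluation code used in the experiments.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float], percent: float) -> Tuple[float, float]:
    """FAR and FRR when a message is accepted if its score G >= percent."""
    frr = sum(g < percent for g in genuine) / len(genuine)
    far = sum(s >= percent for s in impostor) / len(impostor)
    return far, frr


def roc_points(genuine, impostor, thresholds):
    """One (percent, FAR, FRR) triple per threshold: a single ROC curve."""
    return [(t, *far_frr(genuine, impostor, t)) for t in thresholds]


def approximate_eer(points):
    """Pick the threshold where |FAR - FRR| is smallest and average the two."""
    t, far, frr = min(points, key=lambda p: abs(p[1] - p[2]))
    return t, (far + frr) / 2


# Synthetic vote scores (percentage of passing features); not the paper's data.
genuine_scores = [100, 100, 86, 100, 71, 100, 86, 100, 100, 57]
impostor_scores = [43, 29, 57, 14, 86, 43, 71, 29, 100, 43]
curve = roc_points(genuine_scores, impostor_scores, range(0, 101, 5))
best_percent, eer = approximate_eer(curve)
```

Repeating the sweep for each σT value would give one curve per setting, as in Fig. 5. On a finite threshold grid FAR and FRR rarely coincide exactly, so taking the closest pair is a common approximation.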


Table 2
Resulting EER with the most balanced optimum threshold of our proposed system.

Threshold percent                           87%
EER(%)                                      10.4%
Legitimate user samples                     48,000
# of times to reject a legitimate user      4985
FRR(%)                                      10.4%
Confidence interval (a)                     [0.0989–0.1089]
Impostor samples                            14,352,000
# of times to accept an impostor            1,490,556
FAR(%)                                      10.4%
Confidence interval (a)                     [0.1019–0.1059]

(a) Confidence level = 95%, sampling error = $\sqrt{\hat{P}(1-\hat{P})/n}$, where $\hat{P}$ is FRR or FAR, and n is the # of legitimate user samples or impostor samples.
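The FRR and FAR in Table 2 follow directly from the reported counts, as the short Python check below shows. The variable names are illustrative; the numbers are taken from the table.

```python
legit_samples = 48_000
rejected_legit = 4_985
impostor_samples = 14_352_000
accepted_impostors = 1_490_556

# FRR: share of legitimate attempts that were rejected.
frr = rejected_legit / legit_samples
# FAR: share of impostor attempts that were accepted.
far = accepted_impostors / impostor_samples

frr_percent = round(frr * 100, 1)
far_percent = round(far * 100, 1)
```

Both rates round to 10.4%, which is why this threshold is reported as the equal error point.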

Fig. 5. ROC curves of different threshold settings.

In other words, each ROC curve in Fig. 5 is obtained by varying the parameter percent for a fixed value of σT. By the definition of ROC curves, the closer the curve is to the origin, the higher is the identification accuracy of the system. As shown in Fig. 5, when σT is set to 2, the ROC curve is closest to the origin, with an EER of 10.4%. Although we are the first to apply keystroke dynamics to detect fraudulent instant messages, we compare our proposed approach with other free-text-related studies in Table 3. In Table 3, the entry “−” indicates that the value was not specified in the corresponding published paper. For example, Alsultan et al. reported the FAR and FRR instead of the typically used EER and did not clearly state the training time in their two studies [4,5]. Dowland et al. [11], Villani et al. [38], and Samura and Nishimura [34] also did not disclose the training times of their experiments. Considering the number of participants in the free-text studies, our study and those of Samura and Nishimura [34] and Kim et al. [22] are the only ones with more than 100 participants. Regarding training time, our proposed approach requires only approximately 20 min, far less than the several weeks or months required in other studies. Regarding authentication accuracy, our system yields an EER of 10.4%. Although this EER is not the best among the related studies, our system reduces the training burden on users and is therefore more practical. It is noteworthy that the primary focuses of this paper are the feasible training time, relatively smaller number of false alarms, and comparable EER. In fact, a trade-off exists between the EER


Table 3
Comparison of free-text related works, where the symbol “−” indicates unspecified in the corresponding published paper.

Study                                Year  # of subjects  Training time (# of training characters)  Classifier(s)          EER(%)
Monrose and Rubin [30]               1997  42             7 weeks (−)                               Statistical            77
Dowland et al. [11]                  2002  5              − (−)                                     Statistical            40
Gunetti and Picardi [15]             2005  40             1–2 months (−)                            Statistical            15
Villani et al. [38]                  2006  40             − (−)                                     Statistical            3.6
Samura and Nishimura [34]            2009  112            − (−)                                     Statistical            5.3
Davoudi and Kabir [10]               2009  21             1–2 months (−)                            Statistical            −
Park et al. [31]                     2010  35             − (−)                                     Statistical            8.9
Messerman et al. [26]                2011  55             12 months (−)                             Statistical            2
Ahmed and Traore [2]                 2014  53             5 months (18,008)                         Neural networks        2.46
Alsultan et al. [5]                  2016  21             − (7200)                                  SVM and decision tree  −
Alsultan et al. [4]                  2018  25             − (7200)                                  SVM and decision tree  −
Kim et al. [22]                      2018  150            − (180,000)                               Five classifiers       1.06 ∼ 10.1
This work (non-prediction)           2019  300            20 min (1000)                             Statistical [7]        24.2
This work (non-prediction)           2019  300            20 min (1000)                             VBS                    21.0
This work (cross)                    2019  300            20 min (1000)                             VBS                    15.0
This work (symmetric)                2019  300            20 min (1000)                             VBS                    16.9
This work-female (prediction + VBS)  2019  150            20 min (1000)                             VBS                    10.2
This work-male (prediction + VBS)    2019  150            20 min (1000)                             VBS                    10.0
This work (prediction + VBS)         2019  300            20 min (1000)                             VBS                    10.4

and training time. For example, our method did not outperform the method proposed by Kim et al. [22] in terms of EER because the training time of our method is much shorter than that required by [22]. In [22], each user was asked to key in 180,000 characters during the training phase, whereas our method required a user to key in only approximately 1000 characters. Since our method requires approximately 20 min of training for 1000 characters, we can infer that [22] would need approximately 60 h for the same (180 × 20 min = 3600 min). If the training time is longer, our system can yield a better EER. The purpose of presenting Table 3 is to provide readers an overview of the recent studies in free text. A direct comparison of the EERs presented in Table 3 is not meaningful because our training time is much shorter than those of the other studies. Regarding the training times of the previous studies listed in Table 3, the actual values should be smaller than those reported because their training was not continuous. However, the actual training times could not be reported because these studies did not disclose detailed information regarding their experiments.

We also analyzed the performance of our two prediction methods and VBS classifier, as shown in Table 3. We found that the VBS classifier improved the EER from 24.2% to 21.0% when our two prediction methods were not used. Regarding the effectiveness of our two prediction methods, the symmetry-prediction method should theoretically perform better than the cross-prediction method. The main reason is that the symmetry-prediction method is based on a natural phenomenon, namely that PRij is naturally highly similar to PRji, while the cross-prediction method is based on statistical prediction. However, we obtained a different result. An EER of 15.0% was achieved using the cross-prediction method, and an EER of 16.9% was achieved using the symmetry-prediction method. By analyzing our experimental data, we found that the main reason is that the cross-prediction method filled in more of the missing keystroke dynamics embedded in our seven built-in keywords than the symmetry-prediction method did. However, this situation may change as the number of built-in keywords increases. In summary, the experimental results indicated that our two proposed prediction methods and VBS classifier can effectively predict the legitimate user’s keystroke dynamics and recognize fraudulent instant messages.

Considering that females’ fingers are typically longer and thinner than those of males, we analyzed whether the results differed between males and females. Our experimental results indicated that males and females reach 10.0% and 10.2% EER, respectively. That is, no significant difference between the results of males and females was observed. We analyzed our experimental data and found that the length and thickness of fingers are not the key factors in determining keystroke dynamics. Rather, keystroke dynamics are primarily decided by the users’ typing speed and typing habits.

Finally, we compare the computational cost of our proposed approach with that of other free-text related studies in Table 4. The comparison consists of two parts: building of classifiers and authentication. Please note that the time required for these two parts is very short compared to the time spent on keystroke-dynamics training.

Table 4
Comparison of computational cost of the free-text related works.

Study                      Classifier               Building of classifier(s)  Authentication
Monrose and Rubin [30]     Statistical              O(n)                       O(1)
Dowland et al. [11]        Statistical              O(n)                       O(1)
Gunetti and Picardi [15]   Statistical              O(n)                       O(1)
Villani et al. [38]        Statistical              O(n)                       O(1)
Samura and Nishimura [34]  Statistical              O(n)                       O(1)
Davoudi and Kabir [10]     Statistical              O(n)                       O(1)
Park et al. [31]           Statistical              O(n)                       O(1)
Messerman et al. [26]      Statistical              O(n)                       O(1)
Ahmed and Traore [2]       Forward neural networks  O(nu^2)                    O(w)
Alsultan et al. [5]        SVM and decision tree    O(nmd) ∼ O(n^3)            O(m) ∼ O(d)
Alsultan et al. [4]        SVM and decision tree    O(nmd) ∼ O(n^3)            O(m) ∼ O(d)
Kim et al. [22]            Five classifiers         O(nm) ∼ O(n^3)             O(1) ∼ O(d)
This paper                 VBS                      O(n)                       O(1)

The time complexity of building a classifier is decided by the adopted techniques. For example, if we use SVM, the time complexity is O(n^3), where n is the number of training data. For the feedforward neural networks based approach adopted in [2], since the authors used the Levenberg-Marquardt algorithm, the time complexity was O(nu^2), where u is the number of neurons. The time complexity of the decision tree, k-means, and k-NN approaches is O(nmd), O(nmk), and O(nm), respectively, where n is the number of training examples, m the number of attributes, d the depth of the decision tree, and k the number of clusters. Building a statistical classifier was observed to be the fastest among the approaches discussed because it only needs to calculate the mean and standard deviation of each keystroke dynamic. The time complexity of a statistical classifier is O(np), where n is the number of training users and p the total number of keystroke dynamics. For PR and RP, p is a constant, i.e., 26 and 676, respectively. Therefore, the time complexity reduces from O(np) to O(n). Please note that our method requires an extra step to predict the missing keystroke dynamics. As derived in Section 3.3, the time complexity of our prediction methods is only O(656), which is constant time, O(1). Therefore, our method requires O(n) to build our VBS classifier.

After the authentication system is built, the authentication time of each method was observed to be very short. A statistical classifier was again observed to be the fastest since it only needs to compute the difference between a user’s current keystroke dynamics and his/her profile obtained during the enrollment phase by using the mean and standard deviation.
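The two costs discussed above for a statistical classifier can be illustrated with a minimal Python sketch: building a profile takes one pass over the training timings, and checking one observed value against the profile is a constant-time lookup. The names `build_profile` and `deviation_in_sigmas` are hypothetical, the use of the population standard deviation is an assumption, and the timing values are made up.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def build_profile(samples: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Compute (mean, standard deviation) for every keystroke feature.

    Each timing value is visited a constant number of times, so the cost
    grows linearly with the amount of training data.
    """
    return {name: (mean(times), pstdev(times))
            for name, times in samples.items() if times}


def deviation_in_sigmas(value: float, feature: Tuple[float, float]) -> float:
    """Constant-time check: distance from the mean in standard deviations."""
    mu, sigma = feature
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Made-up training timings in ms; not data from the experiments.
training = {"PR_a": [80, 90, 100], "RP_ca": [110, 130]}
user_profile = build_profile(training)
distance = deviation_in_sigmas(140, user_profile["RP_ca"])
```

In this example the RP feature for the pair "ca" has a mean of 120 ms and a standard deviation of 10 ms, so an observed value of 140 ms lies two standard deviations from the mean.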
The SVM, neural networks, and decision tree required O(m), O(w), and O(d) computational cost, respectively, where w is the number of weights in the neural networks. Regarding the real execution time, our experiment showed that our method needed only 17 ms to generate a legitimate user’s PKD and 18 ms to build his/her VBS, while the authentication time of a keyword was only 0.06 ms.

5. Conclusions and future works

Because Skype, Line, and WeChat are prevalent online IM applications, incidents of fraud against their users have increased significantly. To effectively prevent online scams, we herein proposed a novel approach to authenticate English-language fraudulent instant messages on the basis of keystroke dynamics, by using a prediction mechanism combining two prediction methods, namely, the symmetry-prediction method and the cross-prediction method. Using this prediction mechanism, users can be authenticated immediately after undergoing approximately 20 min of keystroke-dynamics training. The training time required by our approach was significantly shorter and more feasible than that required by the other published studies. Furthermore, a VBS classifier was designed to further improve the authentication accuracy. Finally, a keyword-based identification method was proposed to reduce the problem of excessively frequent false alarms. The experimental results demonstrated that our approach outperformed other relevant studies in terms of a shorter training time and a relatively smaller number of false alarms, with a comparable EER. Our experiment also demonstrated that there was no significant difference between the keystroke dynamics of males and females, despite the fact that females’ fingers are typically longer and thinner than those of males. In future work, we seek to improve the EER of our proposed system.
Although increasing the training time is the simplest method, it is infeasible because it also decreases users’ willingness to use our proposed system. There are two possible directions to improve the EER of our proposed system. The first direction is to use more keystroke dynamics such as frequency of word errors, corresponding order of keystroke dynamics, and keystroke-dynamics modes. The second direction is to use incremental learning to gradually increase the prediction accuracy of our prediction mechanism. Finally, we hope that this study will motivate researchers to propose more efficient and effective methods to recognize fraudulent instant messages.

Declaration of competing interest

The authors declare that they have no conflict of interest.

Acknowledgments

This research was partially supported by the Ministry of Science and Technology of Republic of China under grant nos. MOST 107-2622-E-018-002-CC3 and MOST 107-2221-E-018-015. Partial results of this paper have been presented in The 3rd International Scientific Conference on Engineering and Applied Sciences.

References

[1] A. Abo-alian, N.L. Badr, M.F. Tolba, Keystroke dynamics-based user authentication service for cloud computing, Concurr. Comput. 28 (9) (2016) 2567–2585.
[2] A.A. Ahmed, I. Traore, Biometric recognition based on free-text keystroke dynamics, IEEE Trans. Cybern. 44 (4) (2014) 458–472.
[3] O. Alpar, Frequency spectrograms for biometric keystroke authentication using neural network based classifier, Knowl. Based Syst. 116 (2017) 163–171.
[4] A. Alsultan, K. Warwick, H. Wei, Improving the performance of free-text keystroke dynamics authentication by fusion, Appl. Soft Comput. 70 (2018) 1024–1033.
[5] A. Alsultan, K. Warwick, H. Wei, Free-text keystroke dynamics authentication for Arabic language, IET Biom. 5 (3) (2016) 164–169.
[6] L.C.F. Araújo, L.H.R. Sucupira, M.G. Lizarraga, L.L. Ling, J.B.T. Yabu-Uti, User authentication through typing biometrics features, IEEE Trans. Signal Process. 53 (2) (2005) 851–855.
[7] G.C. Boechat, J.C. Ferreira, E.C.B.C. Filho, Authentication personal, in: IEEE International Conference on Intelligent and Advanced Systems, 2007, pp. 254–256.
[8] T.Y. Chang, C.J. Tsai, J.H. Lin, A graphical-based password keystroke dynamics authentication system for touch screen handheld mobile devices, J. Syst. Softw. 85 (5) (2012) 1157–1165.
[9] T.Y. Chang, C.J. Tsai, W.J. Tsai, C.C. Peng, H.S. Wu, A changeable number-based keystroke dynamic authentication system on smart phones, Secur. Commun. Netw. 9 (15) (2016) 2674–2685.
[10] H. Davoudi, E. Kabir, A new distance measure for free text keystroke authentication, in: Proceedings of International CSI Computer Conference, 2009, pp. 570–575.
[11] P. Dowland, S. Furnell, M. Papadaki, Keystroke analysis as a method of advanced user authentication and response, in: Proceedings of the IFIP TC11 Seventeenth International Conference on Information Security: Visions and Perspectives, 2002, pp. 215–226.
[12] T. Fawcett, An introduction to ROC analysis, Pattern Recognit. Lett. 27 (8) (2006) 861–874.
[13] R. Gad, M. Talha, A.A.A. El-Latif, M. Zorkany, E.S. Ayman, E.F. Nawal, G. Muhammad, Iris recognition using multi-algorithmic approaches for cognitive internet of things (CIoT) framework, Future Gener. Comput. Syst. 89 (2018) 178–191.
[14] R. Giot, M. El-Abed, B. Hemery, C. Rosenberger, Unconstrained keystroke dynamics authentication with shared secret, Comput. Secur. 30 (6–7) (2011) 427–445.
[15] D. Gunetti, C. Picardi, Keystroke analysis of free text, ACM Trans. Inf. Syst. Secur. 8 (3) (2005) 312–347.
[16] J. Ho, D.K. Kang, Mini-batch bagging and attribute ranking for accurate user authentication in keystroke dynamics, Pattern Recognit. 70 (2017) 139–151.
[17] J. Hu, D. Gingrich, A. Sentosa, A k-nearest neighbor approach for user authentication through biometric keystroke dynamics, in: Proceedings of IEEE International Conference on Communications, 2008, pp. 1556–1560.
[18] A.K. Jain, A. Ross, S. Prabhakar, An introduction to biometric recognition, IEEE Trans. Circuits Syst. Video Technol. 14 (1) (2004) 4–20.
[19] P. Kang, The effects of different alphabets on free text keystroke authentication: a case study on the Korean–English users, J. Syst. Softw. 102 (2015) 1–11.
[20] P. Kang, S. Cho, A hybrid novelty score and its use in keystroke dynamics-based user authentication, Pattern Recognit. 42 (11) (2009) 3115–3127.
[21] P. Kang, S. Cho, Keystroke dynamics-based user authentication using long and free text strings from various input devices, Inf. Sci. 308 (2015) 72–93.
[22] J. Kim, H. Kim, P. Kang, Keystroke dynamics-based user authentication using freely typed text based on user-adaptive feature extraction and novelty detection, Appl. Soft Comput. 62 (2018) 1077–1087.
[23] A. Kołakowska, Usefulness of keystroke dynamics features in user authentication and emotion recognition, Hum.-Comput. Syst. Interact. 551 (2018) 42–52.



[24] H.J. Lee, S. Cho, Retraining a keystroke dynamics-based authenticator with impostor patterns, Comput. Secur. 26 (4) (2007) 300–310.
[25] C.L. Liu, C.J. Tsai, T.Y. Chang, W.J. Tsai, P.K. Zhong, Implementing multiple biometric features for a recall-based graphical keystroke dynamics authentication system on a smart phone, J. Netw. Comput. Appl. 53 (2015) 128–139.
[26] A. Messerman, T. Mustafic, S.A. Camtepe, S. Albayrak, Continuous and non-intrusive identity verification in real-time environments based on free-text keystroke dynamics, in: Proceedings of IEEE International Joint Conference on Biometrics, 2011, pp. 1–8.
[27] A. Mian, Online learning from local features for video-based face recognition, Pattern Recognit. 44 (5) (2011) 1068–1075.
[28] J.V. Monaco, C.C. Tappert, The partially observable hidden Markov model and its application to keystroke dynamics, Pattern Recognit. 76 (2018) 449–462.
[29] S. Mondal, P. Bours, A study on continuous authentication using a combination of keystroke and mouse biometrics, Neurocomputing 230 (2017) 1–22.
[30] F. Monrose, A. Rubin, Authentication via keystroke dynamics, in: Proceedings of the Fourth ACM Conference on Computer and Communications Security, 1997, pp. 48–56.
[31] S. Park, J. Park, S. Cho, User authentication based on keystroke analysis of long free texts with a reduced number of features, in: Proceedings of IEEE International Conference on Communication Systems, Networks and Applications, 1, 2010, pp. 433–435.
[32] R. Raghavendra, B. Dorizzi, A. Rao, G.H. Kumar, Designing efficient fusion schemes for multimodal biometric systems using face and palmprint, Pattern Recognit. 44 (5) (2011) 1076–1088.
[33] N. Sae-Bae, N. Memon, P. Sooraksa, Distinctiveness, complexity, and repeatability of online signature templates, Pattern Recognit. 84 (2018) 332–344.
[34] T. Samura, H. Nishimura, Keystroke timing analysis for individual identification in Japanese free text typing, in: Proceedings of ICROS-SICE International Joint Conference, 2009, pp. 3166–3170.
[35] P.S. Teh, S. Yue, A.B.J. Teoh, Improving keystroke dynamics authentication system via multiple feature fusion scheme, in: Proceedings of IEEE International Conference on Cyber Security, Cyber Warfare and Digital Forensic, 2012, pp. 277–282.
[36] C.J. Tsai, T.Y. Chang, P.C. Cheng, J.H. Lin, Two novel biometric features in keystroke dynamics authentication systems for touch screen devices, Secur. Commun. Netw. 7 (4) (2014) 750–758.
[37] J.A. Unar, W.C. Seng, A. Abbasi, A review of biometric technology along with trends and prospects, Pattern Recognit. 47 (8) (2014) 2673–2688.
[38] M. Villani, C. Tappert, G. Ngo, J. Simone, H.S. Fort, S.H. Cha, Keystroke biometric recognition studies on long-text input under ideal and application-oriented conditions, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 39–39.
[39] W. Yang, S. Wang, J. Hu, G. Zheng, C. Valli, A fingerprint and finger-vein based cancelable multi-biometric system, Pattern Recognit. 78 (2018) 242–251.

Cheng-Jung Tsai is currently an associate professor in the Graduate Institute of Statistics and Information Science at National Changhua University of Education, Chang-Hua, Taiwan, R.O.C. His research interests include data mining, big data analysis, information security, e-learning, and digital image processing.

Po-Hao Huang is currently an engineer in CHIMEI Materials Technology Corporation, Tainan, Taiwan, R.O.C. His current research interests include information security and data mining.