Applied Soft Computing 24 (2014) 40–49
A novel model for credit card fraud detection using Artificial Immune Systems

Neda Soltani Halvaiee, Mohammad Kazem Akbari

Amirkabir University of Technology, 424 Hafez Street, Tehran, Iran
Article history: Received 3 February 2013; Received in revised form 4 December 2013; Accepted 24 June 2014; Available online 15 July 2014.

Keywords: Credit card fraud detection; Artificial Immune Systems; Artificial Immune Recognition System; Memory cell; Cloud computing; MapReduce
Abstract

The number of online transactions is growing rapidly, and a large portion of them are credit card transactions. Online fraud is growing as well, largely because cutting-edge technology is now easily accessible to everyone. Many models and methods have been studied for credit card fraud prevention and detection; Artificial Immune Systems is one of them. However, organizations need both accuracy and speed in their fraud detection systems, and this combination has not been fully achieved yet. In this paper we address credit card fraud detection using Artificial Immune Systems (AIS) and introduce a new model called the AIS-based Fraud Detection Model (AFDM). We use an immune-system-inspired algorithm (AIRS) and improve it for fraud detection. Compared to the base algorithm, we increase accuracy by up to 25%, reduce cost by up to 85%, and decrease system response time by up to 40%.
1. Introduction

Credit card fraud is an important issue and imposes considerable cost on banks and card issuer companies. Financial organizations try to prevent account misuse using different security solutions, but the more complex the security solutions become, the more sophisticated fraudsters get; that is, fraudsters change their methods over time. It is therefore crucial to improve fraud detection methods alongside the security modules that try to prevent fraud. Fraud detection has become a crucial activity for decreasing the impact of fraudulent transactions on service delivery, costs, and the reputation of the company. There are plenty of methods used for fraud detection, each of which tries to retain maximum quality of service while keeping the false alarm rate at a minimum. Fraud is costly, and detecting it before the transaction is registered reduces this cost significantly, which requires a very accurate system with very few false alarms. Edge and Falcone Sampaio [1] state that while implementation of proactive methods increases the potential for early fraud alerting, real-time processing significantly reduces the available time window within which computational analysis must be performed and an accurate decision made in response to newly arriving transactions. The quicker a fraud detection system
responds, the better. Fraud detection systems are trained on older transactions in order to decide about new ones. This training phase is time-consuming but can be parallelized in most cases. In order to reduce computation time one can reduce the number of previous transactions processed by shrinking the time window, use less complicated methods, and so on, each of which may reduce accuracy, meaning more missed fraud cases and more false alarms. Accordingly, a powerful platform is needed on which the fraud detection system can run and process transactions in minimum time. This paper suggests using cloud computing, i.e. implementing the fraud detection system on a cloud-based file system, namely Hadoop, which makes data parallelization possible for large datasets.

Different methods have been used for fraud detection, including the Bayesian algorithm [2], neural networks [3], Markov models [4], account signatures [1], and Artificial Immune Systems [5–8]. AIS is based on the human immune system and is similar to a fraud detection system in many aspects:

1. Both pursue the same goal of separating normal records from unauthorized ones.
2. In both cases the number of normal records is much larger than the number of unauthorized ones.
3. In both cases unauthorized records resemble normal ones. In the human body, viruses and non-self cells carry protein and masquerade as self cells; fraudsters likewise try to behave similarly to the card owner.
4. Both systems have to detect and learn new methods of misuse. The human body can face new types of non-self cells at any time and has to not only detect the new types but also remember them so that they can be detected later. Similarly, a typical fraud detection system should be able to detect any type
of fraud even if it has not happened before, and the system should learn it for future cases. AIS addresses detecting non-self cells by imitating the functions of the human body during generating detector cells, detecting non-self cells, and cleaning the body of non-self cells while learning their pattern. Detector cells, namely lymphocytes, are self-tolerant, which means they are not stimulated by self cells but only by non-self cells. The immune system can learn new patterns of non-self cells that it has not come across before. Once a detector is stimulated by a non-self cell, the system keeps the detector as a memory cell. Therefore, if that particular non-self cell enters the body later, it can be detected again. This makes AIS adaptable to its environment. The immune system starts training with no information about non-self cells, which means it is trained using only self cells.

In this paper we use an AIS-based method for credit card fraud detection and introduce AFDM. We improve a previously introduced algorithm [9] in various aspects to obtain higher precision. We also propose a new implementation model for the method in order to reduce training time. The results are compared to a similar work [6] which improved the AIS system parameters. We use the same parameters as well as the dataset used in [6].

The remainder of this paper is structured as follows: Section 2 presents the background information. First AIS is described, followed by the Artificial Immune Recognition System, the algorithm used in this paper. Then, after a brief introduction to cloud computing, the Hadoop file system and the MapReduce API are described. Section 3 covers related work on credit card fraud detection, focusing on the use of AIS. Section 4 describes AFDM, the methodology, the improvements on AIRS, and the implementation model. Section 5 presents the test results. Finally, Sections 6 and 7 conclude the paper and discuss future research directions.

2. Background

2.1. AIS

AIS simulates the functionality of the human immune system. The human body detects non-self cells, which might be viruses, pathogens, germs, etc., by creating detector cells named lymphocytes. As this functionality is similar to what a typical fraud detection system does, AIS has been used for fraud detection in several studies. AIS detects non-self cells using two basic functions of the human body which generate and mature lymphocytes: Negative Selection and Clonal Selection. Detector cells (lymphocytes) are generated through random composition of protein patterns; by covering many protein patterns randomly they are able to counter many potential threats. In order to have self-tolerant detectors which do not react to self cells, the system discards those which do: any randomly generated detector which detects a self cell dies immediately. Right after generation, detectors are presented to self cells and only those which do not react to self cells survive. This process is called Negative Selection. After this process the detectors enter the system and in their short lifetime they are expected to face any potential non-self cell and detect it. If a detector detects a non-self cell, it is allowed to live longer in order to make the body vaccinated against that non-self cell; this process is called Clonal Selection. When a detector comes across a non-self cell and detects it, the detector is cloned through mutation. The clone having the highest affinity with the non-self cell is selected as a memory cell and lives longer in the body.
If that specific type of non-self cell enters the human body again, the system will detect it using memory cells. In this paper we make improvements on an AIS-based algorithm named the Artificial Immune Recognition System (AIRS), first introduced by Watkins et al. [9]. It is a classification algorithm which uses Clonal Selection. In [10] the following are stated as some features of AIRS:
Fig. 1. Lifecycle overview of the AIRS algorithm [9].
- Self-regulation: AIRS does not require the user to select an architecture; instead the adaptive process discovers or learns an appropriate architecture during training.
- Performance: Evaluations of the technique in several studies [6,10] show that AIRS is a competitive classification system.
- Generalization: Unlike techniques such as K-Nearest Neighbor that use the entire training dataset for classification, AIRS performs generalization via data reduction. This means that the resulting classifier produced by the algorithm represents the training data with a reduced or minimal number of exemplars. It is typical for AIRS to produce classifiers with half the number of training instances.
- Parameter adjustment: The algorithm has a number of parameters that allow tuning of the technique to a specific problem, with the intent of achieving improved results.

In AIRS both self/non-self cells and detector cells are represented as feature vectors. In order to reduce redundancy, ARBs (Artificial Recognition Balls) are used, each of which represents a set of similar memory cells. ARBs are generated using a random mutation process with a certain probability, and ARBs that comply with the stop condition are selected as memory cells. Fig. 1 shows the flowchart of the algorithm. In the first step the classifier is prepared by normalizing the training set records. Then the parameters of the algorithm are initialized and the affinity threshold is calculated using the distance between every two records in the training set. After this, some records are chosen randomly as initial ARBs. This part of the algorithm is time-consuming. The rest of the algorithm processes each single record. In this part, the memory cells which have been stimulated the most by the training record (antigen) and whose class attribute matches that of the training record are chosen to be added to the memory cell pool. Stimulation is calculated using the distance function. Choosing the memory cells is done by competition between existing memory cells and their clones. In the end a memory cell is chosen and added to the memory cell pool, and the training on that specific training record is done. The algorithm continues training on the rest of the training records in the same way. This part is time-consuming, too. The last part of the algorithm is classification, which starts when the training is done on all records. In this part K-Nearest Neighbor is used as the classifier. Each record in the test set is presented to the memory cells and the neighbors are chosen based on stimulation. Then the class of the antigen is decided based on the class of the majority of the K memory cells in the neighborhood.

2.2. Cloud computing

Cloud computing offers features which can help fraud detection in several aspects. First, cloud computing offers computation power: it uses datacenters with vast resources which have almost no limitation in computing, memory, and storage.
Cloud computing is distributed: it offers the benefits of distributed systems, e.g. the data are accessible everywhere, at any time, using any device. Cloud computing decreases cost: under its pay-as-you-go payment model, one pays only for resources which are actually used. Resources can also be allocated on demand, and whenever there is no need one does not pay for them. AFDM uses cloud computing and parallelizes the time-consuming parts of AIRS. The training phase of AIRS has two time-consuming parts: affinity threshold calculation, where the distance between every two records in the training set should be calculated in order to obtain the average, and memory cell generation, in which each record is processed to generate memory cells. We used the Hadoop Distributed File System for storing transaction records and the MapReduce API for processing those records. Hadoop is a framework for running programs on a cluster. It uses MapReduce, a computing paradigm which divides a program into smaller parts and runs them on cluster nodes. Both Hadoop and MapReduce are designed to manage nodes, including non-working nodes. Using MapReduce the programmer can divide their program into parallel parts. The Mapper class is the parallel part; the Reducer class integrates the results of the parallel parts and returns the final result. Both the Mapper and Reducer classes can be overridden and manipulated in the way the programmer needs.

3. Related work: credit card fraud detection

Many researchers address credit card fraud detection and many methods have been developed, yet real-time fraud detection remains an issue. Edge and Falcone Sampaio [1] survey account signatures, which are a step toward real-time fraud detection. Their paper notes that methods based on data mining need flagged records, are time-consuming, and need to be updated. The main problem is that an account signature is an inflexible behavior model, as it considers the overall trend of a user. A user may change their trend over time, so one has to consider updating more frequently; the update should also be intelligent enough to decide whether a new behavior constitutes a new trend or only a single distinct transaction. Krenker et al. [3] address mobile fraud detection using a bidirectional artificial neural network. The system predicts user behavior and compares it to the user's current behavior; prediction is based on history. Kundu et al. [11] suggest a method based on bioinformatics, namely sequence alignment, which is used to find the similarity between two transactions. Further analysis is done on a transaction to find the deviation from normal behavior. All these methods have the drawbacks of a typical behavior-based method, including not tolerating major changes in user behavior, having a high FP rate, needing a rather massive number of transactions for a single user, and the difficulty of covering all possible normal scenarios. Krivko [12] suggests a hybrid model to cover the drawbacks of behavioral models: the suggested system combines a behavioral model and a rule-based system. The results are promising, yet the system is complicated and time-consuming. Therefore, one needs a system which offers a fair tradeoff between precision, time, and cost.

3.1. Fraud detection using AIS

The Artificial Immune System is a rather old method mostly used for intrusion detection. The authors believe AIS has good potential for fraud detection, and research on AIS for fraud detection supports this claim.
As mentioned before, AIS starts with little information about potential non-self cells, so it avoids the first issue of rule-based systems, which need a massive number of fraud examples in order to extract the relevant rules. At the same time it can learn new non-self patterns while it works. AIS is also distributed, which means a single detector does not decide on its own; this improves precision. AIS
detectors die after a while if they do not detect a non-self cell, so the system tolerates changes in self behavior while being updated continuously. Huang et al. [7] propose a hybrid model for online fraud detection in Video-on-Demand systems, which aims to improve the current Risk Management Pipeline (RMP) by using an Artificial Immune System (AIS) for fraud detection on logged data. The AIS-based model combines two artificial immune system algorithms with behavior-based intrusion detection using Classification and Regression Trees (CART). Brabazon et al. [5] investigated the effectiveness of Artificial Immune Systems (AIS) for credit card fraud detection. Three AIS algorithms were implemented and their performance was benchmarked against a logistic regression model. The results suggest that AIS algorithms have potential for inclusion in fraud detection systems, but further work is required to realize their full potential in this domain. Wong et al. [8] suggest a fraud detection system architecture based on Negative Selection and vaccination. Detectors are generated using fraudulent records; they are mutated and the distance between the mutated cells and fraudulent records is calculated. If the distance is higher than the threshold, Negative Selection comes into action. Gadi et al. [6] compare credit card fraud detection methods including Neural Networks, Bayesian Networks, Naïve Bayes, Decision Trees, and the Artificial Immune Recognition System. They show that after improving the parameters of AIRS, it has the lowest cost and the highest precision.

4. Materials, methods, and theory of AFDM

The most important issue in credit card fraud detection is to achieve a high detection rate while keeping the false alarm rate low. It is also important to have real-time responses, so that the system decides about a transaction before it is registered. Therefore time-consuming algorithms should respond as soon as possible, especially those with high training time. AFDM presents some improvements on the AIRS algorithm, which follow. In order to get more precise results with fewer false alarms, we have included Negative Selection and model updating, changed the distance function, and considered dataset properties while generating memory cells. On the other hand, we have implemented our model using Hadoop in order to reduce training time. We have also used ranking for those records which are flagged as fraud by the system, so that analysts know which flagged record is most likely fraud and not a false alarm. The details of each improvement follow, along with the implementation model.

4.1. Memory cell generation

AIRS uses Clonal Selection in order to generate the memory cells, yet the generated cells are not the best memory cells. In the first stage of memory cell generation the distance between each training record and each ARB having the same class as the record is calculated in order to select the closest cell for mutation. If the closest cell does not have the same class as the training record, other cells, which might be quite far away, will be selected. Therefore the cell selected as closest to the training record might be rather far from it. On the other hand, a memory cell which is very close to a record from the other class is actually matched with the wrong class, i.e. it makes a wrong detection. One could apply Negative Selection at this stage and delete any memory cell which detects wrongly, yet this is not the best solution:
deleting every memory cell that has once detected a wrong record would cost many memory cells. In credit card datasets the number of fraudulent records is far smaller than the number of normal records, so the chance that a fraud memory cell detects a normal record is rather high, which would result in losing many fraud cells if Negative
Selection were used. Therefore we use another method in order to weaken those memory cells that have made many wrong detections. In this method memory cells are rated, instead of being deleted, based on their distance from any record they have detected wrongly. Each detector starts with a rate of 0, and each time it matches a record from the other class, the corresponding distance is subtracted from its rate. At the end of the memory cell generation phase we therefore have rated memory cells; those with a strongly negative rate are weak detectors for their own class, which means they could be powerful detectors for the other class, so the class of detectors with a strongly negative rate is changed. This process can be seen as a modified type of Negative Selection adapted for fraud detection.

4.2. Distance function

In AIRS, affinity is calculated using the Euclidean distance. Euclidean distance is a good metric in multidimensional spaces; however, it might not be good enough considering the fields in credit card transaction records. Each field has a different type from the others, e.g. date, amount, location, quantity, etc. The distance between two transactions therefore expresses the difference between two behaviors rather than a spatial distance, and one expects two similar (close) transactions to have similar field values with low differences. We use the average normalized distance (d) between fields in order to express the difference, or affinity, between two transactions:
d = (Σ_{i=1}^{n} |v1i − v2i| / (maxi − mini)) / n    (1)
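For concreteness, the following is a minimal sketch of this normalized average field distance (the symbols v1i, v2i, maxi, mini and n are explained right after the formula; the class and method names are ours, not part of AFDM):

```java
// Sketch of the normalized average field distance of Formula (1).
// Assumes records are numeric feature vectors of equal length and that the
// per-field maxima/minima have been computed over the whole dataset.
public final class FieldDistance {

    public static double distance(double[] v1, double[] v2,
                                  double[] max, double[] min) {
        int n = v1.length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double range = max[i] - min[i];
            // Guard against constant fields to avoid division by zero.
            double normalizedDiff = (range == 0.0) ? 0.0
                    : Math.abs(v1[i] - v2[i]) / range;
            sum += normalizedDiff;
        }
        return sum / n;   // d lies in [0, 1]; affinity is 1 - d
    }
}
```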
v1i is the value of the i'th field of the first record and v2i is the value of the i'th field of the second one; maxi and mini are the maximum and minimum values of the i'th field over the whole dataset; and n is the number of fields. In this formula the distance is not a spatial distance but the raw, normalized difference between the two values. This way one can use different distance functions for different field types and add weights to important fields, and because the weight is not squared it has a direct impact, so it is easier to find proper weights. In this paper all the fields of our dataset records were already categorized into numerical values, so we did not test field weighting or different per-field functions, but we show that this distance function is more precise than the Euclidean distance.

4.3. Updating system

A typical fraud detection system should be kept up to date, which means any new fraud type should be added to the system immediately after being detected. This is not completely possible in a real system, because not all flagged records are really fraud; some might be false alarms. Still, the system can add these new fraud types over time, once they are confirmed to be fraudulent. This is done by training the system on the test data whenever a new fraud is detected, and it is done in fixed time windows in order to have more realistic results. In AFDM we generate memory cells for any newly detected fraud; the new memory cells are used while testing the rest of the dataset.

4.4. Dataset properties

In the field of credit card fraud detection there are different datasets with different fraud properties, e.g. the number of fraudulent records, the type of fraud, the distribution of fraudulent records among normal records, and the variety of fraud types. For example, if there is a large variety of fraud types in the dataset, more fraud memory cells should be generated. Also, if the space in a dataset
can be divided into two regions, of fraudulent and of normal records, so that most fraudulent records are gathered in a subspace, the memory cells should cover that subspace with more precision. As another example, some fields are more correlated with fraud in some datasets, so we can focus on them while generating memory cells and detecting fraud. The number of fraudulent records is also different in every dataset, which may be due to the different security protocols used by different organizations and banks. This leads to different fraud characteristics in each dataset, which affects the performance of the fraud detection system. Dataset preprocessing helps us understand the overall trend of transactions in the dataset, discover important fields, and remove or weaken unimportant fields (from the fraud point of view) in the calculations. In order to address this issue we include a preprocessing phase in AFDM. In this phase the dataset is preprocessed and the number of memory cells is controlled based on the distribution of fraudulent records. The ratio of fraudulent records for each value of each field is calculated, giving the percentage of fraud occurring for each value. All fields are examined, and an index (α) is calculated for each field using the following formula:

αij = Nij / Nf    (2)
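As an illustration, the sketch below computes this fraud ratio for every field value and uses it to scale the hyper-mutation (clone) count; the exact scaling rule, the class names, and the data layout are our assumptions rather than the authors' implementation (the symbols of Formula (2) are explained just below):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the preprocessing index of Formula (2) and one possible way to
// let it increase the number of clones generated from a training record.
public final class FraudRatioIndex {

    // alpha[i].get(v) = (# fraudulent records with value v in field i) / (total # fraudulent records)
    private final Map<Double, Double>[] alpha;

    @SuppressWarnings("unchecked")
    public FraudRatioIndex(double[][] fraudRecords, int numFields) {
        alpha = new Map[numFields];
        for (int i = 0; i < numFields; i++) {
            alpha[i] = new HashMap<>();
        }
        double nf = fraudRecords.length;
        for (double[] record : fraudRecords) {
            for (int i = 0; i < numFields; i++) {
                alpha[i].merge(record[i], 1.0 / nf, Double::sum);
            }
        }
    }

    // Fraud ratio for value v of field i; 0 if the value never occurs in a fraud record.
    public double alpha(int field, double value) {
        return alpha[field].getOrDefault(value, 0.0);
    }

    // One possible scaling (our assumption): boost the base hyper-mutation
    // rate by the largest alpha over the record's fields, so records with
    // fraud-prone values produce more clones and hence more fraud memory cells.
    public int scaledCloneCount(double[] record, int baseHyperMutationRate) {
        double maxAlpha = 0.0;
        for (int i = 0; i < record.length; i++) {
            maxAlpha = Math.max(maxAlpha, alpha(i, record[i]));
        }
        return (int) Math.ceil(baseHyperMutationRate * (1.0 + maxAlpha));
    }
}
```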
αij is the fraud ratio for the j'th value of the i'th field, Nij is the number of fraudulent records having the j'th value for the i'th field, and Nf is the total number of fraudulent records in the dataset. While generating memory cells, this ratio affects the number of memory cells: it is multiplied by the hyper-mutation parameter, which determines the number of clones a memory cell can have. With this method, when a training record contains a value of a field with a high α, more mutations and therefore more memory cells are generated from that record. The overall number of memory cells covering the fields that are more critical from the fraud point of view grows, and so does the possibility of detecting fraud of that type. This is confirmed by the test results [13].

4.5. Scoring

In most cases the alarms raised by the fraud detection system are sent to the fraud department in order to be investigated by fraud analysts. Further analysis is needed in order to make sure the records flagged as fraud are really fraudulent, so the number of flagged records should be kept low enough that all of them can be analyzed. The department may call the card owner in order to make sure a record is fraudulent; the fewer such calls, the better for the bank or card issuer, because they affect the organization's reputation and the customers' trust. Therefore, it is essential to have a proper metric which shows how serious an alarm is. This helps improve the analysis phase and reduce the effects of false alarms. In this paper we introduce fraud scoring for this purpose; the fraud score is calculated as follows. After generating memory cells, we test the training set using KNN in order to evaluate the memory cells and see how precise they are. Each memory cell is ranked based on its distance from each record it is matched with (Formula (3)). If the memory cell has the same class as the matching record, the memory cell gets a positive rank based on the distance; if not, the rank is negative.

detrank = prerank ± (1 − d)    (3)
Formula (3) shows how the rank of a detector is calculated. detrank is the new rank of the detector, prerank (previous rank) is its rank so far, and d (the same as in Formula (1)) is the distance between the detector and the matching record, so 1 − d is the affinity. If the memory cell has the same class as the matching record, it gets a positive rank update based on the distance (i.e. detrank = prerank + (1 − d) is used); and if
not, the rank update is negative (i.e. detrank = prerank − (1 − d) is used). Therefore, a memory cell that makes a correct detection at a low distance gets a high rank, and a memory cell that makes a wrong detection at a low distance gets a low (strongly negative) rank. At the end, each memory cell has a rank which shows how precise it is. These ranks are then used for scoring records as follows. While testing the test set using KNN, the algorithm considers the classes of the K closest memory cells and takes a vote. The ranks of these K cells are used to score a record which is flagged as fraudulent; the score is calculated using Formula (4). If more precise memory cells (with strongly positive ranks) detect the record as fraudulent while being very close to it, the record gets a high score, which indicates a serious alarm.

FScore = PScore + detrank × (1 − d)    (4)
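Formulas (3) and (4) can be read as two small update rules applied per matched record; a hedged sketch follows (method names are ours, and d is the distance of Formula (1)):

```java
// Sketch of detector ranking (Formula (3)) and fraud scoring (Formula (4)).
// d is the distance of Formula (1), so (1 - d) is the affinity.
public final class FraudScoring {

    // Formula (3): reward a detector for a correct match, penalise it for a
    // wrong one, in proportion to its affinity with the matched record.
    public static double updateDetectorRank(double previousRank, double d,
                                            boolean sameClassAsRecord) {
        double affinity = 1.0 - d;
        return sameClassAsRecord ? previousRank + affinity
                                 : previousRank - affinity;
    }

    // Formula (4): accumulate a fraud score for a flagged record over the K
    // nearest memory cells that vote "fraud"; higher-ranked, closer detectors
    // push the score up more.
    public static double updateFraudScore(double previousScore,
                                          double detectorRank, double d) {
        return previousScore + detectorRank * (1.0 - d);
    }
}
```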
In this formula, FScore (fraud score) is the score given to a flagged record, PScore (previous score) is the score of the record so far, detrank is the rank calculated in Formula (3), and d is again the distance between the record and the detector.

4.6. Implementation model

In this paper we suggest a distributed implementation model for the fraud detection system. The training phase of AIRS is time-consuming. The solution suggested here is based on cloud computing, which helps run the training phase in parallel and obtain quicker results. There are two time-consuming steps in AIRS training: calculation of the affinity threshold, which is the average affinity between every two records in the training set, and memory cell generation, which processes each record in the training set to find or generate the best matching memory cell. It is essential to decrease the training time considering the large number of transactions entered each day. Apache Hadoop is a framework for running programs on a cluster. It uses a computational paradigm named MapReduce which splits a program into smaller parts and runs them on cluster nodes. Hadoop also uses a distributed file system (HDFS) which stores the data on cluster nodes. The management of nodes and the exclusion of non-working nodes is automatic thanks to the design of HDFS and MapReduce. Doing the two time-consuming training parts in parallel reduces training time considerably. Parallelizing AIRS raises the following issues:

- The results of the parallel parts of the program have to be combined. In the case of memory cells this combination might produce duplicate cells. On the other hand, parallelizing the calculation of the affinity threshold is not straightforward, as it needs all the data in order to obtain a precise average.
- A program running in parallel is only as quick as the slowest machine. There are also management issues such as handling machines and ignoring non-working nodes.

Hadoop addresses these issues:

- Combining the results of the parallel parts is done by the Reduce function, which is managed automatically, yet the programmer can add code to it. So one can check for duplicate cells by manipulating the Reduce function.
- With virtualization, the limited computation power and memory of individual nodes is no longer an issue.

We have used the MapReduce API, which separates a job into tasks and runs them in parallel. The most important advantage of MapReduce over other parallelization methods is that, in order to add more processing nodes, one simply adds new Hadoop nodes to the cluster; there is no need to change the code [14]. The management of tasks running in parallel is automatic, so adding a new node is easy. Fig. 2 shows the implementation model. In the initialization (affinity threshold calculation) part, each Map function sums the distances between every two records in one split of the dataset and returns the total sum and the number of records to the Reduce function. The Reducer gets the total sums and record counts from all Mappers and returns the affinity threshold, which is the average over all of them. In the second part of the training phase, memory cells are generated for each split of the dataset. Each Mapper generates the corresponding detectors and returns them to the Reducer. As mentioned in Sections 4.1, 4.4 and 4.5, each Mapper applies the corresponding improvements: it uses Negative Selection, rates the detectors, and uses α (which is calculated over the whole training dataset). The Reducer here simply gathers the generated detectors; using key-value pairs, duplicate memory cells are merged in the Reducer and returned to the main program. The test (classification) phase contains the KNN algorithm, along with fraud scoring and updating.

5. Results and discussion

5.1. Evaluation metrics

Four parameters are used for evaluating fraud detection methods, and classification methods in general: true negatives (TN), the number of normal transactions flagged as normal; false negatives (FN), the number of fraudulent transactions wrongly flagged as normal, i.e. missed fraud cases; true positives (TP), the number of fraudulent transactions flagged as fraud, i.e. detected fraud cases; and false positives (FP), the number of normal transactions flagged as fraud. Obviously, the best method is one that minimizes FP and FN and maximizes TP and TN. The following metrics are defined based on these parameters [15]:

False Positive Rate = FP / (FP + TP + FN + TN)    (5)

Detection Rate = TP / (TP + FN)    (6)

Hit Rate = TP / (TP + FP)    (7)
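For reference, a minimal helper computing these three rates from the confusion-matrix counts might look as follows (names are ours):

```java
// Sketch of the evaluation metrics of Formulas (5)-(7), computed from the
// confusion-matrix counts TN, FN, TP and FP.
public final class Metrics {

    public static double falsePositiveRate(long tp, long tn, long fp, long fn) {
        return (double) fp / (fp + tp + fn + tn);   // Formula (5), as defined in [15]
    }

    public static double detectionRate(long tp, long fn) {
        return (double) tp / (tp + fn);             // Formula (6)
    }

    public static double hitRate(long tp, long fp) {
        return (double) tp / (tp + fp);             // Formula (7)
    }
}
```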
There are also other metrics, including cost functions based on these parameters, the training time of the algorithm addressed in some papers, and the precision, which shows the percentage of records flagged correctly. All these metrics are based on the number of correct and mistaken classifications the algorithm makes. Jyotindra and Ashok [16] offer a risk score for evaluation; it is claimed that the risk score is minimal for transactions similar to the user's behavior and maximal for those that differ from it. Krivko [12] compares a rule-based method to behavioral models, with evaluation based on TP and FP. Yu and Dasgupta [15] also use TP to show that their suggested method is better. Brabazon et al. [5] consider the false alarm rate, missed fraud rate, precision, and the algorithm running time. In some other papers a cost function is calculated based on FP and FN. Gadi et al. [6] use the following cost function in order to compare several methods for credit card fraud detection:

Cost = 100 × FN + 10 × FP + TP    (8)
Fig. 2. Implementation model of AFDM using MapReduce.
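To accompany Fig. 2, the sketch below illustrates how the initialization (affinity threshold) step could be expressed with the Hadoop MapReduce API: each Mapper buffers the records of its split, emits the per-split sum of pairwise distances and the pair count, and a single Reducer averages them. The class names, the key choice, the CSV parsing, and the placeholder distance are our assumptions, not the authors' code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of the initialization (affinity threshold) step of Fig. 2.
public class AffinityThresholdJob {

    private static final Text KEY = new Text("affinity");

    public static class AffinityMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final List<double[]> records = new ArrayList<>();

        @Override
        protected void map(LongWritable offset, Text line, Context ctx) {
            records.add(parseRecord(line.toString())); // one CSV transaction per line (assumed format)
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            double sum = 0.0;
            long pairs = 0;
            for (int i = 0; i < records.size(); i++) {
                for (int j = i + 1; j < records.size(); j++) {
                    sum += distance(records.get(i), records.get(j));
                    pairs++;
                }
            }
            // Emit the per-split partial sum and pair count.
            ctx.write(KEY, new Text(sum + "," + pairs));
        }

        private static double[] parseRecord(String csv) {
            String[] parts = csv.split(",");
            double[] v = new double[parts.length];
            for (int i = 0; i < parts.length; i++) v[i] = Double.parseDouble(parts[i]);
            return v;
        }

        // Placeholder: in AFDM this would be the normalized field distance of
        // Formula (1); the per-field max/min would have to be made available
        // on every node, e.g. through the job configuration.
        private static double distance(double[] a, double[] b) {
            double s = 0.0;
            for (int i = 0; i < a.length; i++) s += Math.abs(a[i] - b[i]);
            return s / a.length;
        }
    }

    public static class AffinityReducer extends Reducer<Text, Text, Text, DoubleWritable> {

        @Override
        protected void reduce(Text key, Iterable<Text> partials, Context ctx)
                throws IOException, InterruptedException {
            double sum = 0.0;
            long pairs = 0;
            for (Text partial : partials) {
                String[] p = partial.toString().split(",");
                sum += Double.parseDouble(p[0]);
                pairs += Long.parseLong(p[1]);
            }
            ctx.write(new Text("affinityThreshold"), new DoubleWritable(sum / pairs));
        }
    }
}
```

The memory cell generation step would follow the same Mapper/Reducer pattern, with the Reducer additionally merging duplicate memory cells by key, as described in Section 4.6.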
This means each missed fraudulent transaction has a cost of 100 currency units, while each false alarm costs 10 currency units; it also assigns 1 currency unit for processing a fraudulent record. The cost imposed by the mistakes of the fraud detection system does not only include processing costs, but also other factors. First, the amount of missed fraud directly affects the bank or card issuing company. Second, each mistake requires further processing: fraud analysts do extra work or contact the card owner in order to make sure the transaction was made by them. If the number of false alarms is not kept low, a lot of analysis is needed, as well as contacting owners for each suspicious case, which can affect the reputation of the organization. Therefore, in addition to the processing cost, each false alarm imposes a reputation-loss cost. The problem with the FP and FN parameters is that they do not fully represent the efficiency of the fraud detection system. These parameters are generally adequate for evaluating classification methods; yet fraud detection is a delicate application of classification in which cost is strictly involved, so the number of mistakes alone is not enough. One needs more precise metrics which show the real cost of these mistakes, including the processing cost, the cost of reputation loss, and the amount of money involved in each fraudulent transaction. A method might work very well and detect many fraudulent transactions, but what if all the detected transactions are retail ones with low value? This might be the result of a database containing many low-valued fraud cases: the training phase then learns from these cases and fails to detect high-valued fraud. A good method is therefore one which does not miss high-valued frauds, because the value of a missed fraudulent transaction is a direct cost for the bank. The authors have introduced a new metric for evaluating fraud detection methods [17]. In this metric the following cost function is used:

Total Cost = Σ(Amount FN) − Σ(Amount TP) + c × FP    (9)
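A small sketch of this amount-based metric (the symbols are explained just below; the method and parameter names are ours):

```java
import java.util.List;

// Sketch of the Total Cost metric of Formula (9): missed fraud amounts count
// as direct cost, detected fraud amounts as saved cost, and each false alarm
// adds a fixed cost c.
public final class TotalCost {

    public static double totalCost(List<Double> missedFraudAmounts,   // amounts of FN records
                                   List<Double> detectedFraudAmounts, // amounts of TP records
                                   long falsePositives, double c) {
        double amountFn = missedFraudAmounts.stream().mapToDouble(Double::doubleValue).sum();
        double amountTp = detectedFraudAmounts.stream().mapToDouble(Double::doubleValue).sum();
        return amountFn - amountTp + c * falsePositives;
    }
}
```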
Amount FN and Amount TP denote the total amounts of the undetected and detected fraudulent transactions, respectively. c is a parameter which represents the cost of each false alarm; it can be set according to the processing cost and the anticipated reputation-loss cost. Amount FN is a direct cost, whereas Amount TP is a saved cost; therefore the former is positive and the latter negative. FP imposes cost too, so its corresponding cost is added to the total.

5.2. Dataset

We used a real dataset to test our model. The dataset was obtained from the transactions of a Brazilian bank; Gadi et al. [6] use this dataset to compare other methods to AIRS. Fraudulent records make up 3.74% of the dataset and include lost and stolen cards, skimming, mail and telephone orders, and account takeover.

5.3. Results

The evaluation covers each improvement on AIRS mentioned so far, and each one is compared to the base AIRS. We have used the Robust Parameters for AIRS based on the results of [6] in all tests. Table 1 shows the results; each row shows the results of one improvement. We have used the metrics of Formulas (5) through (9) in order to evaluate our model. c is set to 1 in the Total Cost function; the values of the amount field of the transactions are between 1 and 9, and knowing this we decided on the value of c. As mentioned before, c can be chosen based on the dataset properties and the real cost of a false alarm. The first row in Table 1 shows the results of the base AIRS. Comparing all rows, the best results are obtained when we implement all improvements together (last row); both cost functions also give better results for the improved AIRS. Each improvement on its own still gives better results than the base algorithm. Only when we consider dataset analysis alone does the Hit Rate decrease slightly and the FP Rate become higher than for the base AIRS. This is because of the focus on fraud detectors (fraud memory cells): as mentioned in Section 4.4, we create more memory cells for records which contain critical fields, so the overall number of fraud memory cells is higher than in the base AIRS, and it is expected that this method flags more
Table 1
Test results for each improvement on AIRS, one per row; the last row shows the results of implementing all improvements in AFDM.

Method | TN | FN | TP | FP | Detection rate | FP rate | Hit rate | Cost | Total cost
AIRS [6] | 8226 | 189 | 137 | 179 | 0.420 | 0.021 | 0.434 | 479 | 20,827
AIRS with Memory Cell Generation improved | 8227 | 185 | 141 | 178 | 0.433 | 0.021 | 0.442 | 448 | 20,421
AIRS with Distance function improved | 8266 | 174 | 152 | 139 | 0.466 | 0.017 | 0.522 | 259 | 18,942
AIRS with Update | 8252 | 164 | 162 | 153 | 0.497 | 0.018 | 0.514 | 177 | 18,092
AIRS with Dataset properties considered | 8111 | 177 | 149 | 294 | 0.457 | 0.035 | 0.336 | 455 | 20,789
AFDM | 8253 | 157 | 169 | 152 | 0.518 | 0.017 | 0.526 | 70 | 17,389
Diagram 1. The decrease in the Cost function for each improvement: Memory Cell Generation improved 6.47%, Distance function improved 45.93%, Updating System 63.05%, Dataset properties considered 5.01%, All improvements (AFDM) 85.39%.
records as fraudulent, because of the use of KNN. Comparing the Detection Rate shows that the fully improved AIRS (AFDM) is the best. Updating the system also yields a better Detection Rate than the other improvements, quite close to that of AFDM, whereas the modified Negative Selection gives the smallest increase over the base AIRS; this is because Negative Selection aims at decreasing the mistakes, not at increasing TP or TN. Improving the distance function is as effective on the FP Rate as implementing all improvements together. Diagrams 1 and 2 compare the decrease of cost under the two cost functions. Diagram 1 shows a much larger decrease than Diagram 2, so in terms of the Cost function the improvements are doing well. The Total Cost function emphasizes the amounts of the transactions, so if the detected fraud cases are high-valued this cost function shows a larger decrease.
Further analysis was done on the results in order to evaluate the scoring of flagged records, shown in Diagrams 3 and 4. The correlation between the assigned score and the fraud field is 0.25, which is rather high and indicates that the assigned score reflects how critical a flagged record is, that is, the likelihood that an alarm is correct. Diagram 3 shows the histogram of TP scores, i.e. the frequency of records at each score value; most detected fraudulent records received a score of 0.23. Diagram 4, the histogram of FP scores, shows that most false alarms scored 0.22 or less. This can be considered a boundary for the scores: flagged records with a score of 0.22 or less are probably not fraudulent, while those scoring 0.23 or more are more likely fraudulent and need further processing and investigation.
Diagram 2. The decrease in the Total Cost function for each improvement: Memory Cell Generation improved 1.95%, Distance function improved 9.05%, Updating System 13.13%, Dataset properties considered 0.18%, All improvements (AFDM) 16.51%.
Diagram 3. TP score histogram (frequency and cumulative percentage of true-positive records per score bin).
Diagram 4. FP score histogram (frequency and cumulative percentage of false-positive records per score bin).
Table 2
The results of running the algorithm in parallel using MapReduce.

Test case | Node CPU | Node RAM | No. of nodes | No. of Map functions | Threshold computing time (s) | Memory cell generation time (s)
Serial | 2.50 GHz | 1 GB | 1 | – | 131 | 168
Test case 2 | 2.50 GHz | 512 MB | 2 | 4 | 75 | 91
Test case 3 | 2.50 GHz | 512 MB | 2 | 10 | 81 | 94
Test case 4 | 2.50 GHz | 128 MB | 4 | 4 | 87 | 133
Test case 5 | 2.50 GHz | 128 MB | 4 | 10 | 79 | 132
As discussed in Section 4.6, the implementation model is based on Hadoop. In order to show that this implementation improves the system's training time, we ran the following tests. Table 2 shows the training time of the fraud detection system in five test cases; in the first test case the system runs serially. The training time in these tests is divided into two parts, threshold computing time and memory cell generation time, which are the two parts that are parallelized independently. If the number of training records were larger, the improvement would be even more apparent; nevertheless, the parallel tests already show better results. Table 2 lists the properties of each node, the number of nodes in each test, the number of Mappers, and the timing results. The number of Mappers is the same for both parallelized parts. The Reducer in both parts only gathers the results and performs no heavy processing, so we use just one Reducer in all tests. The last two columns report times in seconds.
As can be seen, having 2 nodes and 4 Mappers decreases the time of both parts; this is the effect of parallelization. Test case 3, however, behaves differently: while it still outperforms the serial test, it is slightly slower than test case 2. This is due to the extra overhead of communication between Mappers and of managing the parallel parts. If the training set were larger, this extra processing time would be negligible or even imperceptible. A similar increase is seen when comparing test case 4 to test case 3, and it is larger, as a result of adding two more nodes, which means more communication and more management. Test case 5 shows slightly better results than test case 4; we believe this is because handling 10 Mappers is easier for 4 nodes than for 2, so less time is wasted. Overall, test case 2 appears to be a suitable configuration given the number of records in the training set and the node properties, and it gives the best results.
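For reference, one way to influence the number of map tasks in such experiments is to bound the input split size; a hedged snippet using the Hadoop mapreduce API follows (the job name, input path, and split size are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Sketch: with the mapreduce API the number of Mappers follows the number of
// input splits, which can be bounded through the maximum split size.
public class TrainingJobSetup {

    public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "AFDM training");            // job name is a placeholder
        FileInputFormat.addInputPath(job, new Path("/afdm/train"));  // placeholder HDFS path
        // A smaller maximum split size yields more splits and hence more map tasks.
        FileInputFormat.setMaxInputSplitSize(job, 8L * 1024 * 1024); // e.g. 8 MB per split
        return job;
    }
}
```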
6. Conclusion

This paper addressed credit card fraud detection using AIS (Artificial Immune Systems), and a new model called the AIS-based Fraud Detection Model (AFDM) was introduced for this purpose. The model adds several improvements to the AIRS (Artificial Immune Recognition System) algorithm which increase precision and decrease cost and training time. Affinity between antigens is calculated using a novel method in AFDM. Negative Selection is used along with Clonal Selection in order to achieve higher precision. We preprocess the dataset in order to generate more accurate memory cells. We added updating to the system, so that AFDM is updated whenever a fraudulent transaction is detected. Considering the rather high training time of the system, AFDM uses a cloud computing solution in order to perform the training phase in parallel. Another issue in fraud analysis is that fraud detection systems are based on tagging transactions as fraudulent or normal; AFDM scores flagged transactions, which shows how risky a fraud-flagged transaction might be.

One major result is that every individual improvement gives better results, yet not as good as all of them together. Each of these changes addresses one of the inefficiencies of the base algorithm. Table 1 shows that improving memory cell generation improves the detection rate, because this improvement focuses on generating more precise fraud memory cells. Changing the distance function, as we see in Table 1, performs better regarding FP, because, as explained in Section 4.2, this distance formula is closer to the meaning of the distance between two transaction records, which results in a better performance of KNN. The next improvement, updating the system during the test phase, results in a higher detection rate, which is expected because new fraud memory cells are added to the system and a wider range of fraud types is covered. Finally, preprocessing the dataset is also expected to give a higher detection rate. In this improvement the system considers fraud cases only, and the way the values of the different fields relate to fraud; since we do not focus on normal transactions, we do not expect improvements in FP, but in TP. Putting all these changes together gives the best results. AFDM improved the detection rate by up to 23%, decreased the cost by up to 85%, and reduced the training time by up to 40%. We obtained a detection rate higher than 50% while keeping a very low false positive rate (less than 2%), which is considerable. Also, the parallel model was implemented only in a test environment and already shows a fair decrease in training time, which is expected to be better in a real cloud computing system.

Improving classification methods for fraud detection, which is a delicate application and dependent on the training dataset, requires more than just tuning parameters or adding extra parts. The authors believe that doing more processing on transaction records may need more time and power, but it is worth the improvement in results. Furthermore, this paper uses parallelization and addresses an issue which has not been targeted before, but is important.

7. Discussion and future work

As we showed in this paper, further improvements on AIS can help to obtain better fraud detection results. The authors believe AIS has the potential for much better results. The following improvements could be built on this work:

- Weighting dataset fields in the distance function.
There are plenty of data fields in a transaction database, some of which are more important than others for fraud detection, while others are barely relevant, i.e. there is no meaningful correlation between those fields and a transaction being fraudulent. Therefore,
the distance function could ignore those fields or reduce their effect, similar to what is done in neural networks [3]. This can be achieved by weighting the fields in the distance function (a sketch of such a weighted distance is given after this list). There are several ways to determine the weight of each field, e.g. cross-validation.
- Artificial immune networks for fraud detection. AIN considers immune cells as interconnected network nodes which can communicate with each other. Following the models introduced in [18], the idea is this: when a detector cell detects a non-self cell with low affinity, it can stimulate other detector cells, which then help it detect more precisely. This idea can be quite useful in a typical fraud detection system, so that memory cells recall each other when they come across a suspicious record. An AIN-based fraud detection system could create detectors based on the importance of fields: when a memory cell detects a suspicious record, it recalls the memory cells which consider the more important fields; if these fail to classify the suspicious record, they call memory cells concerning less important fields, and so on. A decision is then made based on the overall decision of the memory cells about the suspicious record.
- Distance function based on dataset properties. As mentioned before, a typical transaction contains many fields, each with a different meaning, so the concept of distance differs from field to field. For example, if the user has the habit of shopping on certain days of the month, then a distance of 30 in the day-of-month field is actually equivalent to 0, whereas a distance of 30 in the transaction amount is rather large. Therefore, it makes sense to calculate the distance for each field based on the application and the type of that field.
- Cloud computing misuse detection using AIRS. There are many security issues around cloud computing. One is the misuse of cloud services for fraud, attacks, and hacking; this phenomenon is named Fraud as a Service. We can use fraud detection techniques in order to detect such misuse. A node can be considered as a cell which can be either self or non-self (a node which is being misused shows an abnormal traffic or resource usage pattern), and detectors can define normal usage models for the nodes.
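As a possible realization of the first item above, a weighted variant of the distance in Formula (1) might look like the following sketch (the weight vector and all names are illustrative assumptions):

```java
// Sketch of a weighted variant of the distance of Formula (1): fields judged
// irrelevant to fraud get weight 0, important fields get larger weights.
public final class WeightedFieldDistance {

    public static double distance(double[] v1, double[] v2,
                                  double[] max, double[] min, double[] w) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < v1.length; i++) {
            double range = max[i] - min[i];
            double diff = (range == 0.0) ? 0.0 : Math.abs(v1[i] - v2[i]) / range;
            weightedSum += w[i] * diff;
            weightTotal += w[i];
        }
        return (weightTotal == 0.0) ? 0.0 : weightedSum / weightTotal;
    }
}
```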
Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.asoc.2014.06.042.

References

[1] M.E. Edge, P.R. Falcone Sampaio, A survey of signature based methods for financial fraud detection, Comput. Secur. 28 (2009) 381–394.
[2] S. Panigrahi, A. Kundu, S. Sural, A. Majumdar, Credit card fraud detection: a fusion approach using Dempster–Shafer theory and Bayesian learning, Inf. Fusion 10 (2009) 354–363.
[3] A. Krenker, M. Volk, U. Sedlar, J. Bešter, A. Kos, Bidirectional artificial neural networks for mobile-phone fraud detection, ETRI J. 31 (2009) 92–94.
[4] A. Srivastava, A. Kundu, S. Sural, A.K. Majumdar, Credit card fraud detection using hidden Markov model, IEEE Trans. Depend. Secur. Comput. 5 (2008) 37–48.
[5] A. Brabazon, J. Cahill, P. Keenan, D. Walsh, Identifying online credit card fraud using artificial immune systems, in: 2010 IEEE Congress on Evolutionary Computation (CEC), IEEE Press, 2010.
[6] M. Gadi, X. Wang, A. do Lago, Credit card fraud detection with artificial immune system, Artif. Immune Syst. (2008) 119–131.
[7] R. Huang, H. Tawfik, A. Nagar, A novel hybrid artificial immune inspired approach for online break-in fraud detection, Proc. Comput. Sci. 1 (2010) 2733–2742.
[8] N. Wong, P. Ray, G. Stephens, L. Lewis, Artificial immune systems for the detection of credit card fraud: an architecture, prototype and preliminary results, Inf. Syst. J. 22 (2012) 53–76.
[9] A. Watkins, J. Timmis, L. Boggess, Artificial immune recognition system (AIRS): an immune-inspired supervised learning algorithm, Genet. Program. Evol. Mach. 5 (2004) 291–317.
[10] J. Brownlee, Artificial Immune Recognition System (AIRS) – A Review and Analysis, Centre for Intelligent Systems and Complex Processes (CISCP), Faculty of Information & Communication Technologies (ICT), Swinburne University of Technology (SUT), 2005, p. 16.
[11] A. Kundu, S. Panigrahi, S. Sural, A.K. Majumdar, BLAST-SSAHA hybridization for credit card fraud detection, IEEE Trans. Depend. Secur. Comput. 6 (2009) 309–315.
[12] M. Krivko, A hybrid model for plastic card fraud detection systems, Expert Syst. Appl. 37 (2010) 6070–6076.
[13] N. Soltani, M.K. Akbari, M. Sargolzaei Javan, A new user-based model for credit card fraud detection based on artificial immune system, in: 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), IEEE, 2012, pp. 029–033.
[14] V. Amiry, S.Z. Rad, M.K. Akbari, M.S. Javan, Implementing Hadoop platform on Eucalyptus cloud infrastructure, in: 2012 Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2012, pp. 74–78.
[15] S. Yu, D. Dasgupta, Conserved self pattern recognition algorithm, Artif. Immune Syst. (2008) 279–290.
[16] N.D. Jyotindra, R.P. Ashok, A data mining with hybrid approach based Transaction Risk Score Generation Model (TRSGM) for fraud detection of online financial transaction, Int. J. Comput. Appl. 16 (2011) 18–25.
[17] N. Soltani, M.K. Akbari, M. Sargolzaei Javan, A new metric for evaluation and comparison of banking fraud detection algorithms, based on transaction cost, in: 2012 4th Conference on Information and Knowledge Technology (IKT), Babol, Iran, 2012.
[18] J.C. Galeano, A. Veloza-Suan, F.A. Gonzalez, A comparative analysis of artificial immune network models, in: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Washington, DC, USA, 2005.
Neda Soltani Halvaiee graduated from the College of Computer Engineering and Information Technology at Amirkabir University of Technology. Her research focuses on credit card fraud detection; she also has a background in cloud computing and Artificial Immune Systems. She is currently pursuing a PhD at Amirkabir University of Technology, during which she will extend her work to the field of pervasive computing.
Mohammad Kazem Akbari is an associate professor of high performance computing in the College of Computer Engineering and Information Technology at Amirkabir University of Technology. He leads the cloud computing laboratory at Amirkabir University of Technology, where the research team examines cloud computing services, architecture, and development.