Applied Soft Computing 24 (2014) 142–157
Defending against XML-related attacks in e-commerce applications with predictive fuzzy associative rules
Gaik-Yee Chan a,1, Chien-Sing Lee b,∗, Swee-Huay Heng c,2
a Faculty of Information Technology, Multimedia University, Cyberjaya, Malaysia
b Faculty of Creative Industries, Universiti Tunku Abdul Rahman, Malaysia
c Faculty of Information Science and Technology, Multimedia University, Melaka, Malaysia
Article info
Article history: Received 22 May 2012; Received in revised form 12 June 2014; Accepted 28 June 2014; Available online 14 July 2014
Keywords: Intrusion detection; Intrusion prevention; Fuzzy logic; Association rule mining; e-Commerce
Abstract
Security administrators need to prioritise which features to focus on amidst the various possibilities and avenues of attack, especially via Web Services in e-commerce applications. This study addresses the feature selection problem by proposing a predictive fuzzy associative rule model (FARM). FARM validates inputs by segregating anomalies based on fuzzy associative patterns discovered from five attributes in the intrusion datasets. These associative patterns lead to the discovery of a set of 18 interesting rules at 99% confidence and, subsequently, to categorisation into not only certainly allow/deny but also probably deny access decision classes. FARM's classification provides 99% classification accuracy and a false alarm rate of less than 1%. Our findings indicate two benefits of using fuzzy datasets. First, fuzziness enables the discovery of fuzzy association patterns, fuzzy association rules and more sensitive classification. Second, although the root mean squared error (RMSE) and classification accuracy for fuzzy and crisp datasets do not differ much when using the Random Forest classifier, the fuzzy datasets perform much better when other classifiers are used with an increasing number of instances. Future research will involve experimentation on bigger data sets and on different data types. © 2014 Elsevier B.V. All rights reserved.
Introduction
Both the Internet and eXtensible Markup Language (XML)-based Web Services (WS) have revolutionised the Information Technology (IT) industry due to their many attractive features such as platform independence, interoperability, ease of use and the ability to transport huge amounts of information over the World Wide Web. Thus, more and more software applications, especially e-commerce applications, are built on the Internet-enabled WS platform. Consequently, the complex and largely unsecured Application Layer is open to various types of threats. This is further confirmed by a study by Veracode [1] on software-related cybersecurity risks. It reported that 84% of Web applications are found to be vulnerable to the most
∗ Corresponding author. Tel.: +60 3 7956 1831. E-mail addresses:
[email protected] (G.-Y. Chan),
[email protected],
[email protected] (C.-S. Lee),
[email protected] (S.-H. Heng). 1 Present address: Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Malaysia. Tel.: +60 3 8312 5215. 2 Present address: Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450, Bukit Beruang, Melaka, Malaysia. Tel.: +60 6 252 3084. http://dx.doi.org/10.1016/j.asoc.2014.06.053 1568-4946/© 2014 Elsevier B.V. All rights reserved.
frequently exploited Web application vulnerabilities. As a result, the Application Layer is open to various WS-related threats such as Structured Query Language (SQL) injection, XML injection, XML content and parameter tampering, Simple Object Access Protocol (SOAP) oversized payload, coercive parsing, and recursive payload leading to XML Denial-of-Service (DoS) attacks. The costs of these security threats are tremendous. Enterprises lose not only the confidentiality, integrity and availability of the WS system, but also data, business and the confidence of customers. Intrusion detection (ID) and intrusion prevention (IP) systems are mainly network- or host-based. These systems merely detect attacks such as User-to-Root (U2R), Remote-to-Local (R2L), Denial of Service (DoS) and Probe by observing various network and host activities. Moreover, packet-level firewalls and ID systems are not able to secure WS traffic because they do not detect SOAP and XML traffic. For example, SOAP typically uses HTTP or SMTP. Therefore, it easily passes through traditional firewalls, a phenomenon known as the port 80 problem, as mentioned in Khari et al. [2]. Unfortunately, WS vulnerabilities and attacks related to SOAP and XML might eventually turn into XML DoS attacks. Additionally, these known attacks may evolve into new types of attacks as hacking skills and tools evolve with time. Although the WS-Security standard exists, it does not define any direct
countermeasure for DoS attacks according to Jensen et al. [3]. In addition, encrypted content can conceal attacks such as oversized payload and coercive parsing leading to XML injection or XML DoS. In view of these security threats, a predictive fuzzy associative rule model (FARM) incorporated in an adaptive framework is developed to validate inputs and counter attacks in WS e-commerce applications. FARM provides an added layer of security protection to complement the network-based ID systems and the WS-Security standard, which are inadequate on their own. This paper is organised as follows: the next section reviews related work; the third section describes the ID/IP framework; the fourth section defines the methodology and experimental design; the fifth section presents the evaluation of the predictive fuzzy association rule model, and summarises and compares results; and the last section concludes and discusses future work.
Related work
Hybrid AI techniques to countermeasure network-based attacks
Research has been on-going in developing enhanced techniques in intrusion detection and prevention to fine-tune both signature-based and anomaly-based approaches to more accurately identify, prevent and predict attacks, while minimizing false alarms. To provide the best of both signature- and anomaly-based intrusion detection, a hybrid approach combining Artificial Intelligence (AI) and data mining techniques is actively being applied in ID/IP research. Some of these techniques are decision tree, rule-based technique, fuzzy logic with association rules and frequent episodes, genetic algorithm, neural network, Bayesian network, support vector machine and so on. The techniques used in this study are association rule mining and fuzzy logic. The former helps us to identify associative patterns. We can adjust the confidence and support levels to generate different numbers of interesting rules. Hence, this approach is different from the classification approach. We choose to use fuzzy logic for the following reasons. Unlike the classical set denoted by Boolean logic, fuzzy logic is a form of many-valued logic which corresponds to "degrees of truth" that range between 0 and 1. Fuzzy logic is used to deal with reasoning that is approximate and is thus suitable for reasoning with uncertainty. Fuzzy logic is used in FARM to determine the decision for actions based on the relationship between the SOAP size, XML content, input size and input values. Additionally, even though SOAP size and input size are continuous variables, their sub-ranges can be transformed into linguistic variables, where the degree of truth of each linguistic variable is governed by its membership function.
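To make the idea of linguistic variables concrete, the following is a minimal sketch of how a continuous SOAP size could be mapped to degrees of membership. It is not the membership function used in FARM: the trapezoidal shape, the labels "undersized", "normal" and "oversized", and the 10-byte transition bands are assumptions chosen for illustration. Only the 411–437 byte normal range is taken from the paper (Table 3).

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership in [0, 1] for a trapezoidal fuzzy set.

    Membership rises from a to b, equals 1 between b and c,
    and falls from c to d.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical breakpoints (bytes) around the 411-437 normal SOAP range.
# The 10-byte transition bands are an assumption, not taken from the paper.
SOAP_SIZE_SETS = {
    "undersized": (-1.0, 0.0, 401.0, 411.0),
    "normal": (401.0, 411.0, 437.0, 447.0),
    "oversized": (437.0, 447.0, 1e9, 1e9 + 1.0),
}


def fuzzify_soap_size(size_bytes):
    """Map a crisp SOAP size to a degree of truth for each linguistic label."""
    return {
        label: trapezoid(size_bytes, *params)
        for label, params in SOAP_SIZE_SETS.items()
    }


if __name__ == "__main__":
    for size in (394, 418, 442, 1157):
        print(size, fuzzify_soap_size(size))
```

With these assumed breakpoints, a size just outside the normal range (for example 442 bytes) belongs partly to "normal" and partly to "oversized", which is the kind of graded outcome that a crisp in-range/out-of-range test cannot express.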
If non-fuzzy sets are used, input values can only be valid or malicious, input size and SOAP size can only be in the normal range or out of range, and XML content can only be malicious or non-malicious, so the decision outcome would only be certainly deny or certainly allow access. As a result, many probably deny access cases may turn into false alarm cases, contributing towards a high false alarm rate. With fuzzy logic, our adaptive framework coupled with FARM is able to detect, prevent and predict attacks such as SOAP oversized payload, coercive parsing, recursive payload leading to XML DoS, SQL injection, XML injection, and XML content and parameter tampering with competitive detection and prediction accuracy and minimal false alarms. In another study, Chan et al. [4], fuzzy if–then rules are obtained from an adaptive neural fuzzy inference system (ANFIS) to countermeasure WS attacks with good detection accuracy and minimal false alarms. Many fuzzy rule-based models such as Chen [5], Lin et al. [6] and Hsu et al. [7] dealing with uncertain nonlinear
systems are found to be effective as well. Hence, in the following elaborations, we focus on these two techniques only. Prior studies (Table 1a) have demonstrated that association rule mining and fuzzy logic have each been used with other AI techniques for high detection accuracy and low false alarm rate. For example, association rule mining is used together with multilayer perceptron neural networks in Sheikhan and Jadidi's [8] hybrid misuse-based ID system, resulting in a classification-based association rule approach. In this approach, the inputs are selected through feature relevancy analysis. Zainal et al. [9] have deployed linear genetic programming, ANFIS and random forests in their ID system. Jawhar and Mehrotra [10] have combined fuzzy C-means and neural network. Examples of research which hybridises fuzzy logic with association rule mining in ID/IP systems are Tajbakhsh et al. [11]; fuzzy association rule mining with genetic algorithm, Ozyer et al. [12]; fuzzy association rule mining with neural network, Sheikhan and Rad [13]; and fuzzy association rule mining with genetic network programming, Mabu et al. [14]. These systems aim to improve detection accuracy and reduce false alarm rate. On the other hand, Markam and Dubey [15] have utilised fuzzy association rule mining and K-means clustering in their ID/IP system for improved efficiency in memory and time performance. Refer to Table 1b for more details. From the above studies, it is found that the ID/IP systems which are built using fuzzy logic with neural network (Table 1a: Item #2 and Item #3) and fuzzy logic with association rule mining techniques (Table 1b: Items #2, #3, and #4) are able to achieve high detection rates of 94–99% or greater, and false alarm rates as low as 1% or less.
However, it needs to be highlighted again that these efficient and effective systems, whether signature-based and/or anomaly-based are network-based ID/IP systems for identifying abnormalities in network traffic for detection of Probe, DoS, U2R and R2L. They are not for the detection of WS application-based attacks related to SOAP and XML. Nevertheless, the above studies have provided evidence of the feasibility and possibility that application-based ID/IP systems, such as our FARM, can also employ these techniques in defending against WS attacks related to SOAP and XML effectively.
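The support and confidence levels mentioned earlier in this section can be illustrated with a small sketch. The records below are invented toy data, and the item labels (for example "soap=oversized") are hypothetical names for fuzzified attribute values; they are not drawn from the paper's datasets. The two functions follow the standard definitions: support is the fraction of records containing an itemset, and confidence is the support of antecedent and consequent together divided by the support of the antecedent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Invented toy records; each is a set of (hypothetical) fuzzified items.
TOY_TRANSACTIONS = [
    {"soap=normal", "input=valid", "decision=C_Allow"},
    {"soap=normal", "input=valid", "decision=C_Allow"},
    {"soap=oversized", "input=valid", "decision=C_Deny"},
    {"soap=normal", "input=invalid", "decision=P_Deny"},
]

if __name__ == "__main__":
    print(support(TOY_TRANSACTIONS, {"soap=normal"}))
    print(confidence(TOY_TRANSACTIONS, {"soap=oversized"}, {"decision=C_Deny"}))
```

Raising the minimum support or confidence threshold prunes more candidate rules, which is how the number of "interesting" rules can be tuned.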
Techniques to countermeasure WS attacks
Although research has been conducted to countermeasure Web application attacks, it is still not adequate in countering all the SOAP and XML-related attacks effectively and efficiently. For example, Ye [16] has designed a scheme to authenticate and validate a service request when the system is suspected of being under XML DoS attack. Experiments conducted show that the time taken to authenticate and validate SOAP messages increases as the SOAP size increases. This is because more time is taken for the system to digest and decrypt larger SOAP messages. Studies tabulated in Table 2 also reveal that each existing approach or technique has limited attack coverage, although AI, data mining and various ID/IP techniques have been applied to countermeasure WS attacks with effective and efficient results. In Relan and Sonawane [17], the Markov models developed are incapable of detecting malicious user behaviour. They are merely capable of detecting SQL injection attacks, buffer overflow attacks, and cross-site scripting (XSS) attacks. In another study by Thakar et al. [18], requests for WS are simulated on honeypots and the support vector machine-based semi-supervised classifier used is able to intercept SOAP requests to identify, for example, SQL injection and XML DoS attacks only. The coverage of other research techniques, for example, Pinzón et al. [19], Bazarganigilani et al. [20], Karthigeyan et al. [21], Patil and More [22], Corona et al. [23], Chan et al. [4] and
Table 1a. Hybrid approach: AI and data mining techniques to increase detection and reduce false alarm rates.

| Item # | ID techniques & approach | Type of attacks & research focus | Detection rate (DTR %); False alarm rate (FA %) / False positive rate (FP %) | Source |
|---|---|---|---|---|
| 1 | Association rule + Neural Network (Signature-based) | Detection of network-based attacks (KDD99 data set with 16 features). Handle feature selection for improved classification accuracy | DTR: 94.71–95.14; FA: 1.49–1.58 | [8] |
| 2 | ANFIS + Linear Genetic Programming + Random Forests (Anomaly-based) | Detection of network-based attacks (KDD99 data set with 41 features). Ensemble of classifiers utilizing different learning mechanisms for improved detection accuracy. Random Forest to address imbalanced dataset problem | DTR: 99.27–99.96; FP: 0–0.0029 | [9] |
| 3 | Fuzzy C-Means + Neural Network (Signature & Anomaly-based) | Detection of network-based attacks (KDD99 data set with 35 features). Classification model with high detection accuracy & low false alarm rate | DTR: 99.9; FA: 0.01 | [10] |
Table 1b. Hybrid approach: fuzzy logic and association rule mining techniques.

| Item # | ID techniques & approach | Type of attacks & research focus | Detection rate (DTR %); False alarm rate (FA %) / False positive rate (FP %) | Source |
|---|---|---|---|---|
| 1 | Fuzzy association rule mining (Signature & Anomaly-based) | Detection of network-based attacks (KDD99 data set with 41 features). Generation of human comprehensible rules that handle symbolic or categorical attributes on large datasets | DTR: 80.6–91; FP: 2.95–3.34 | [11] |
| 2 | Fuzzy association rule + Genetic Algorithm (Anomaly-based) | Detection of DoS attacks (KDD99 data set with 42 features). Classification and generation of human comprehensible rules | DTR: 97.4; FA/FP: not available | [12] |
| 3 | Fuzzy association rule + Neural Network (Signature-based) | Detection of network-based attacks (KDD99 data set with 31 features). Feature selection, reduction on dimension of input feature vector to classifier | DTR: 94.37–96.81; FA: 0.18–0.36 | [13] |
| 4 | Fuzzy association rule + Genetic Network Programming (Signature & Anomaly-based) | Detection of network-based attacks (KDD99 data set with 41 features). Dealing with discrete and continuous data attributes. Extraction of association rules for enhanced detection | DTR: 94.4–98.7; FP: 0.53–7.2 | [14] |
| 5 | Fuzzy association rule + K-means clustering (Signature & Anomaly-based) | Detection of network-based attacks (KDD99 data set with 6 features). Evaluation of technique based on packet, memory & time performance | DTR: not available; FA/FP: not available | [15] |
Kalavadekar and Mogal [24], is tabulated in Table 2 in more detail.
Our adaptive framework and predictive fuzzy association rule model
In order to capture both signature-based and anomaly-based intrusions with minimal false alarms, an adaptive framework coupled with a predictive fuzzy association rule model has been developed to counter SOAP and XML-related attacks. In our framework (Fig. 1), as in any real-life practical system, a valid user login ID and password are initially captured and stored as the normal user profile for subsequent authentication purposes. Particularly in e-commerce applications where payment for items purchased is made through credit cards, the user's 16-digit credit card number and 3-digit pin number are maintained in the database of a "Financial Entity" for subsequent authentication and authorisation. Other user behaviour or dimensions such as the user's email address, initial SOAP message size for a request, XML syntax and SQL commands are also captured as the normal profile and stored in a normal profile database. Unethical use of confidential information is not a concern; in fact, our framework is designed to prevent unethical or unauthorised use of confidential information.
As a service request is transmitted, user information (the user's login ID, password, 16-digit credit card number, 3-digit pin number, and email address) and SOAP service request information (SOAP message size, computed input size, XML syntax and SQL commands) are captured and compared to the normal profile. Any violation of the normal profile is then dynamically identified and immediate action is taken to block or reject the request, terminate the subsequent activity or grant an alternative action. The anomaly captured is then stored in an anomaly database for further offline analysis, either to confirm it as a genuine existing attack or to discover a new attack. However, items that match the normal profile are not captured as anomalies. They are allowed to continue along with normal activities. Hence, the impact on performance is kept to a minimum. Another Web service request transaction instance flowing through the framework is shown in Fig. 2. As Fig. 2 shows, a user submits credit card information through the e-commerce application onto the proxy server, which serves as a honeypot. Input values, input lengths and SOAP message size are validated. The validated values are then matched with the rules generated by the fuzzy association rule model. After matching with the rules, a single decision is made whether to certainly allow, certainly deny or probably deny access. If the transaction is found to have valid inputs with non-malicious XML content and normal SOAP size, then it is
Table 2. Research to countermeasure WS attacks.

| Item # | ID techniques & approach | Type of attacks / Research focus / Performance evaluation / Comments (a) | Source |
|---|---|---|---|
| 1 | Decision trees and multilayer perceptron | A distributed hierarchical multi-agent architecture for blocking malicious SOAP messages. False alarm and prediction error rates range from 12% to about 3%. This approach can also be considered as a means to prevent and detect DoS attacks in the WS environments. However, there is still room for improvement on accuracy and false alarm rates. Much work is especially required in checking the validity of the architecture in heterogeneous real environments | [19] |
| 2 | XML similarity classifiers | To distinguish malicious SOAP request and XML content based on the structure and content. Performance evaluation of this method is promising with high accuracy. However, time performance issue is not discussed while dealing with XML content and structure which is resource intensive | [20] |
| 3 | Security policies | By limiting the size of a payload, limiting the time to process a SOAP request, limiting the maximum number of requests, limiting the number of bytes contained in any given XML message and so on in curbing XML DoS attacks with satisfactory results. There could be more intelligent technique in countering XML DoS and other WS attacks instead of just simple setting of security policies | [21] |
| 4 | Container architecture + lightweight virtualisation technique | The techniques identify session hijack and SQL injection attacks. The virtualisation technique is used to assign each user's web session to a dedicated container in an isolated virtual computing environment. In this way, the container ID is able to accurately associate the web request with the subsequent database queries. Other main WS attacks are not addressed by these techniques | [22] |
| 5 | Multiple classifier + Anomaly templates | User-defined models in the form of anomaly templates that associate a suitable action in response to anomalous web traffic is able to accurately detect attacks such as XSS, SQL injection and buffer overflow attacks with false alarm rates ranging from 5.4% to 0.47%. The anomaly templates have to be constantly updated to reflect changing anomalous web traffic which means detection or prevention could not be truly real-time | [23] |
| 6 | Container architecture + query processing & mapping | Container architecture is used to keep each session separate so that each client's communication occurs in different channel. Through query processing and mapping, this approach is able to identify which SQL query is for which HTTP request, thus detecting attacks in a multitier web environment. It has limited attack coverage especially for anomalous WS attacks. Additionally, the time performance issue when handling HTTP requests and sessions over the multitier web environment is not discussed | [24] |
| 7 | ANFIS + business policies | Fuzzy if–then rules are obtained from an adaptive neural fuzzy inference system (ANFIS) to countermeasure SOAP and XML attacks with good detection accuracy (∼99%) and minimal false alarms (∼1%). Only two attributes (SOAP size and input size) are used as inputs for the fuzzy if–then rules to determine the decision output. There may be other input attributes that could influence the decision outcome, hence the accuracy of the results | [4] |

(a) Note: WS attacks include but are not limited to XML DoS, XML content tampering, SOAP oversized payload, XSS, SQL/XML injection, buffer overflow and new types of attacks.
Fig. 1. An overview of the adaptive framework.
Fig. 2. An instance of a web service request transaction flowing through the framework.
considered a normal transaction. Thus access is certainly allowed. However, if inputs or XML contents are malicious or the SOAP size is out of range, then access is certainly denied. In the case that XML contents are non-malicious but inputs violate business policies, the decision is made to probably deny access. For a normal transaction, user inputs and transaction information are duplicated onto the normal profile database of the main server. For a transaction whose access has been denied, action is taken to reject, block or kill the transaction immediately, and inputs and transaction information are updated to the malicious profile database of the proxy server. In the case of an XML DoS attack, a re-boot of the proxy server may be necessary. For a transaction whose access is probably denied, action is taken to reject the current transaction while granting a second attempt to re-enter the inputs, at the same time updating the inputs and transaction information to the malicious profile database. Isolating the normal profile and malicious databases and servers ensures that the backend main server is not affected by attacks and the backend normal database is not contaminated by attack data. Consequently, the framework is able to detect, prevent and predict attacks on a real-time basis while keeping the effect on the normal transaction's performance and the main server's availability to a minimum. In addition, this framework has an adaptive nature as the anomaly database is dynamic, whereby new attacks or variants are identified continuously. As time evolves, the normal profile database is constantly updated to reflect changes in the normal user profile and the anomaly database is updated almost in real-time to adapt to new genuine attacks. Furthermore, reports generated from the normal and anomaly databases allow the security or system administrator to flexibly monitor, configure and fine-tune the system according to security and business policies.
This provides optimum detection with minimum false alarms and more accurate remedial or preventive action.
Methodology and experimental design
The simulated data and business policies
At this point, there is no open access Web-service-derived ID/IP data. Hence, we have generated our own set of data. Simulation of a WS e-commerce application is carried out by using open source tools such as wsKnight [25]. This tool is used in penetration testing on WS applications to simulate attacks and locate vulnerabilities
[26]. It captures a WS request and displays the entire XML scheme with SOAP size. Valid or malicious inputs can be entered into the user input fields or the XML content and request re-submitted. A response of whether the request gets through or an error message is displayed indicating the request has caused an internal server error. Different transactions with valid inputs or with existing malicious codes (refer to Appendix D of Andreu [26] for a list of malicious codes) are entered into the WS e-commerce application to simulate the scenarios for different SOAP sizes. As every WS request comes with a specific content-length, when different inputs are entered, the content-length then varies according to the length of the inputs regardless whether the inputs are genuine or malicious. Even though the user inputs do not contain malicious codes, if the SOAP size is out of normal range, then the transaction will still cause an internal server error. Referring to Fig. 3a, the actual SOAP size should be 400 bytes, but the captured SOAP size is 439 bytes, oversized by 39 bytes. This is due to malicious codes being inserted in the XML content causing an oversized payload or CDATA attack. When the SOAP size is in normal range, but it still causes an internal server error (Fig. 3b), this is due to malicious codes, (@’), entered in the user input fields. Making use of this fact, for every WS request, the SOAP message size is checked against the initial size, and input values and input fields lengths are validated following a set of business policies. Business policies are applied onto the data set so as to limit the input values to lie within the valid range of values. Outliers that contain invalid input values and out-of-range input or SOAP size can then be identified. The business policies are listed as shown in Table 3. 
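The size check described above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, and the rule that each input character adds one byte to a base envelope size is taken from the simulation description in the paper (one alphanumeric character per byte, 373 bytes with no input).

```python
def expected_soap_size(base_size, inputs):
    """Base envelope size plus one byte per input character.

    base_size: SOAP size in bytes when all input fields are empty.
    inputs:    iterable of the input field values as strings.
    """
    return base_size + sum(len(value) for value in inputs)


def size_deviation(captured_size, expected_size):
    """Positive when the captured message is oversized, negative when undersized."""
    return captured_size - expected_size


if __name__ == "__main__":
    # Fig. 3a: expected 400 bytes, captured 439 bytes.
    print(size_deviation(439, 400))
    # Table 4, case 1: a 15-digit card number, pin "1", amount "55", email "aaa".
    print(expected_soap_size(373, ["123456789012345", "1", "55", "aaa"]))
```

A non-zero deviation with otherwise valid inputs is the signal the paper associates with content hidden in the XML body, such as the CDATA case in Fig. 3a.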
Since the simulation indicates that one alphanumeric character consumes one byte of storage, the original SOAP size with no input is 373 bytes. Subsequently, the minimum input size (38 bytes) and maximum input size (64 bytes), the minimum SOAP size (411 bytes) and maximum SOAP size (437 bytes), and the normal SOAP size range for the data set are computed. Each simulated transaction with the validated input values, input size, and SOAP size is then tabulated together with the appropriate decision code as follows:
• P Deny (probably deny access) indicating inputs are invalid (in violation of business policies) but XML content is non-malicious regardless of the input and SOAP sizes (Table 4, case 1);
Fig. 3. (a) Sample simulation showing CDATA attack. (b) Sample simulation showing malicious input.
• C Allow (certainly allow access) indicating normal transaction with valid inputs and non-malicious XML content (Table 4, case 2);
• C Deny (certainly deny access) indicating either inputs or XML content is malicious (Table 4, case 3: SQL injection; case 4: Buffer overflow; case 5: Oversized payload; case 6: XML DoS; case 7: XML content tampering).
Notice that the decision category P Allow (probably allow access) is redundant and therefore not applicable here, due to the efficient binding effect of the business policies. The efficient binding effect, as shown in Table 5, has given rise to elimination of false
negative alarms, whereby the probably allow access case turns into a certainly deny access case. Referring to Table 5, case 1, when no business policy is imposed, there is no restriction on the input values. As inputs are not validated, input size is assumed to be in the normal range and XML content is assumed to be non-malicious. Consequently, the decision is made to probably allow access. However, when business policies are imposed, the transaction reveals that the inputs have violated business policies. After input validation and re-computation of input size and SOAP size, it is found that they fall outside the normal range. Therefore, the decision is re-tagged to certainly deny access (Table 5, case 1a).
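The three decision codes can be sketched as a small function. This is one possible reading of the definitions above and not the fuzzy rule base of FARM: the function name, the boolean flags and the order of the checks are assumptions, and the sketch uses crisp tests only. It is written so that it reproduces the decisions listed for cases 1, 2, 3, 6 and 7 of Table 4.

```python
NORMAL_SOAP_RANGE = (411, 437)  # bytes, normal SOAP size range from Table 3


def decide(inputs_valid, inputs_malicious, xml_malicious, soap_size):
    """Return 'C_Deny', 'P_Deny' or 'C_Allow' for one transaction.

    The ordering of the checks is an assumption made for this sketch.
    """
    low, high = NORMAL_SOAP_RANGE
    # Malicious inputs or malicious XML content: certainly deny.
    if inputs_malicious or xml_malicious:
        return "C_Deny"
    # Policy violation with non-malicious content: probably deny,
    # regardless of the input and SOAP sizes.
    if not inputs_valid:
        return "P_Deny"
    # Valid, non-malicious inputs but SOAP size outside the normal range.
    if not low <= soap_size <= high:
        return "C_Deny"
    return "C_Allow"


if __name__ == "__main__":
    print(decide(False, False, False, 394))   # Table 4, case 1
    print(decide(True, False, False, 418))    # Table 4, case 2
    print(decide(True, False, False, 1157))   # Table 4, case 6
```

There is deliberately no "P_Allow" branch, mirroring the argument that the business policies make that category redundant.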
Table 3. List of business policies.

| Policy # | Inputs | Policy description |
|---|---|---|
| 1 | Credit card # | 16-digit numbers only, mandatory input |
| 2 | 3-digit pin | 3-digit numbers only, mandatory input |
| 3 | Payment amount | Float values with 2 decimals, 9999999.99, mandatory input |
| 4 | Email address | Min. 15, max. 35 alpha-numeric with no special characters, except @, mandatory input |
| 5 | Input size | Min. 38, max. 64 bytes |
| 6 | SOAP size | Min. 411, max. 437 bytes (original SOAP size with no input is 373 bytes) |
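The policies in Table 3 lend themselves to a simple validation routine, sketched below under stated assumptions. The field names, the regular expressions and the function name are illustrative and not part of the paper. In particular, the amount pattern assumes at most seven integer digits (from the 9999999.99 upper bound), and the email pattern additionally permits a dot so that addresses such as those at 'yahoo.com' pass, which is looser than the literal policy text. Under these assumptions the size bounds are consistent with the field policies: 16 + 3 + 4 + 15 = 38 bytes at the minimum (taking "0.00" as the shortest amount) and 16 + 3 + 10 + 35 = 64 bytes at the maximum, which gives 373 + 38 = 411 and 373 + 64 = 437 bytes for the SOAP size.

```python
import re

# (policy number, field name, pattern); names and patterns are assumptions.
FIELD_POLICIES = [
    (1, "credit_card", re.compile(r"[0-9]{16}")),
    (2, "pin", re.compile(r"[0-9]{3}")),
    (3, "amount", re.compile(r"[0-9]{1,7}\.[0-9]{2}")),
    (4, "email", re.compile(r"[A-Za-z0-9@.]{15,35}")),
]
INPUT_SIZE_RANGE = (38, 64)    # bytes, policy 5
SOAP_SIZE_RANGE = (411, 437)   # bytes, policy 6


def violated_policies(record, soap_size):
    """Return the list of Table 3 policy numbers that a transaction violates.

    record:    dict mapping field names to the submitted strings.
    soap_size: captured SOAP message size in bytes.
    """
    violated = [
        number
        for number, field, pattern in FIELD_POLICIES
        if not pattern.fullmatch(record.get(field, ""))
    ]
    # Input size is the total number of characters entered (one byte each).
    input_size = sum(len(record.get(field, "")) for _, field, _ in FIELD_POLICIES)
    if not INPUT_SIZE_RANGE[0] <= input_size <= INPUT_SIZE_RANGE[1]:
        violated.append(5)
    if not SOAP_SIZE_RANGE[0] <= soap_size <= SOAP_SIZE_RANGE[1]:
        violated.append(6)
    return violated


if __name__ == "__main__":
    # Table 5, case 1: card, pin and amount are well formed, email is missing.
    record = {"credit_card": "1234567890123456", "pin": "111",
              "amount": "555.55", "email": ""}
    print(violated_policies(record, 398))
```

For the Table 5 record, the sketch reports policies 4, 5 and 6, which agrees with the violations listed in case 1a of that table.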
Table 4. Categories of decision outcomes.

| Case # | Input values | Size & decision | Comments |
|---|---|---|---|
| 1 | Credit card #: 123456789012345; 3-digit pin: 1; Amount: 55; Email: aaa | Input size: 21; SOAP size: 394; Decision: P deny | Invalid inputs in violation of business policies. No malicious input, greatly undersized |
| 2 | Credit card #: 8765432187654320; 3-digit pin: 333; Amount: 77.00; Email: [email protected] | Input size: 45; SOAP size: 418; Decision: C allow | Valid inputs, non-malicious XML, normal size |
| 3 | Credit card #: ‘or 1=1 or ”=1‘; 3-digit pin: ‘or 1=1 or ”=1‘; Amount: ‘or 1=1 or ”=1‘; Email: ’or 1=1 or ”=1 | Input size: 64; SOAP size: 437; Decision: C deny | Normal range, malicious input (SQL injection) |
| 4 | Credit card #: *; 3-digit pin: *; Amount: *; Email: ************************ **************************************** | Input size: 103; SOAP size: 476; Decision: C deny | Malicious input, greatly oversized (Buffer overflow) |
| 5 | Credit card #: 123456789012345; 3-digit pin: 555; Amount: 3.33; Email: [email protected] | Input size: 38; SOAP size: 440; Decision: C deny | Valid inputs, SOAP oversized payload with hidden scripts: . . . . . . (CDATA attack) or hidden script: . . .. . . (cross-site scripting, XSS attack) |
| 6 | Credit card #: 123456789012345; 3-digit pin: 555; Amount: 3.33; Email: [email protected] | Input size: 38; SOAP size: 1157; Decision: C deny | Valid inputs, extremely oversized (predictive of XML DoS or new attack) |
| 7 | Credit card #: 123456789012345; 3-digit pin: 555; Amount: 102.35; Email: [email protected] | Input size: 41; SOAP size: 300; Decision: C deny | Valid inputs, extremely undersized (predictive of XML content tampering or new attack) |

Furthermore, if a probably allow access transaction were given a second attempt to re-enter inputs, the decision outcome would be certainly allow access for a normal transaction, probably deny access for invalid inputs or certainly deny access for a malicious transaction, each of which would be captured accordingly. Allowing a second attempt for a probably allow access transaction to re-enter inputs is, therefore, redundant, and so is the decision category for probably allow access. Eliminating this redundancy also eliminates the possibility of false negative alarms at the first instance. False negative alarms are vulnerabilities which are not identified although they in fact exist, so that the decision should have been to deny access rather than allow access. Consequently, this provides detection and prevention of vulnerabilities or attacks on a real-time basis. In real-life environments, a genuine user will not intentionally enter invalid inputs that violate business policies consecutively. In order to segregate genuine transactions from genuine attacks, another business policy can be imposed. This policy is to allow those users who have been probably denied access the first time, as in FARM, to re-enter valid inputs. A genuine attack shall remain a genuine attack and will not turn into a false positive if subsequent attempts at re-entering inputs still render it a probably or certainly deny access. The first crisp set of data (FC313), consisting of 313 pairs of SOAP size and transaction inputs, namely, credit card number, pin
number, and amount payable, is obtained. This first crisp data set is used to obtain a second crisp set (SC313) by computing the SOAP size and input field length differently: each alphanumeric character is assumed to occupy 2 bytes and each float value 8 bytes. As seen from the simulation samples (Fig. 3a and b) and the seven cases of Table 4, these simulated sets of data (FC313 and SC313), as governed by the business policies (Table 3), are no different from actual real-life data. A valid credit card number can be any 16-digit number and a valid pin number can be any 3-digit number. Similarly, a valid payment amount can be any float value with 2 decimals. For the email address, any 15–35 alphanumeric characters with no special character except the ‘@’ character can form a valid account, for example, with ‘yahoo.com’ or ‘gmail.com’. As for attack data, the existing malicious codes obtained from Andreu [26] represent actual attack signatures.

Fuzzy logic and predictive associative patterns

The simulated crisp sets of data are quantitative data. For example, input lengths (normal range: 38–64 bytes) and SOAP message sizes (normal range: 411–437 bytes) are ranges of quantitative values which do not provide meaningful means to perform the
Table 5 Elimination of false negative alarm.

| Case | Input values | Size & decision | Comments |
|---|---|---|---|
| 1 | Credit card #: 1234567890123456; 3-digit pin: 111; Amount: 555.55; Email: | Input size: 25; SOAP size: 398; Decision: P allow | Without policy, no restriction on inputs, assuming input size in normal range & XML content non malicious |
| 1a | Refer to Table 3. Inputs violate business policy #: 4, 5, 6. Re-compute input & SOAP size. Re-tag decision code | 25 < min (38); 398 < min (411); Decision: C deny | With business policies, both input size & SOAP size are out of normal range. Decision: P allow to C deny (False –ve) |
G.-Y. Chan et al. / Applied Soft Computing 24 (2014) 142–157
Table 6 Categorized attributes.

| Input values | Input size | SOAP size | XML content | Decision |
|---|---|---|---|---|
| Valid | Normal | Matched | Non-malicious | C Allow |
| Malicious | Out-of-range | Not-matched | Malicious | C Deny |
| New | Extremely-out | Extremely-not-matched | New | P Deny |
| Invalid | | | | |
Fig. 4. (a) Fuzzification of input size. (b) Fuzzification of SOAP size.
intrusion detection and prevention functions. In addition, these quantitative data consist of many overlapping values as they are bounded within the minimum and maximum values of the input length and SOAP size defined by the business policies.

Our predictive association rule model uses fuzzy logic to convert numerical attributes to fuzzy attributes through the fuzzification process. For example, input length is “slightly undersized” when it is slightly out-of-range of the lower limit of 38 bytes (Fig. 4a), and SOAP message size is “extremely oversized” when it is extremely out-of-range, by many folds, above the upper limit of 437 bytes (Fig. 4b). Consequently, after a set of meaningful linguistic labels represented by fuzzy sets is defined on the domain of the quantitative attributes and mapped to a new domain, the meaning of the values in the new domain is clear, as shown in Fig. 4a and b. With relevant business policies applied, FARM is expected to be able to resolve unclear or ambiguous decisions arising from different degrees of fuzziness or the sharp boundary problem. These ambiguities concern decisions such as probably allow or probably disallow access, which contribute to false positive and false negative alarms.

On the other hand, a crisp data set uses sharp partitioning to transform numerical attributes to binary ones. For example, an input size of “65 bytes and above” or a SOAP message size of “410 bytes and below” is considered as not in the normal range, and the sharp ranges give rise to loss of information. Fuzzy logic thus provides a smooth transition between member and non-member of a class or category, and addresses the sharp boundary problem. The justification for using fuzzy logic in FARM can be seen from performance results as shown in the subsection “Comparisons on classification accuracy”.

From detailed analysis of the crisp sets of data, it is observed that the data fall into certain associative patterns characterised by five attributes. These attributes are: input values, input size, SOAP size, XML content and decision. Each attribute is categorised as shown in Table 6. Referring to Table 6, for example, the input values, which are categorical in nature, can be either valid (in normal user profile), invalid (in violation of business policies), malicious (in existing malicious profile) or new/unknown (not in normal or malicious profile). The input size can fall in the normal range of values between the minimum and maximum limits, slightly/far below the minimum limit, or slightly/extremely above the maximum limit (Fig. 4a). The SOAP size is categorised into ‘matched’ or ‘not matched’ to indicate whether or not the combined input lengths and original SOAP size match the SOAP size that comes with the captured service request, or ‘extremely not matched’ to indicate that the captured service request contains a payload many folds heavier or lighter (Fig. 4b). Also, when SOAP size is ‘matched’, it indicates XML content is non-malicious. On the other hand, ‘not matched’ means XML content is malicious and ‘extremely not matched’ means new or unknown XML contents. Hence there is a correlation between SOAP size and XML content.
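The fuzzification of input size described above can be illustrated with a minimal sketch. Only the normal range of 38–64 bytes comes from the text; the trapezoid breakpoints, the label names and the helper functions `trapezoid`, `fuzzify_input_size` and `best_label` are assumptions made for illustration, since the exact membership functions of Fig. 4a are not reproduced here.

```python
import math

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

INF = math.inf

# Assumed breakpoints (in bytes); only the 38-64 normal range is from the text.
INPUT_SIZE_SETS = {
    "far_undersized": (-INF, -INF, 18, 28),
    "slightly_undersized": (18, 28, 33, 38),
    "normal": (33, 38, 64, 69),
    "slightly_oversized": (64, 69, 128, 192),
    "extremely_oversized": (128, 192, INF, INF),
}

def fuzzify_input_size(size):
    """Return the membership degree of `size` in every linguistic label."""
    return {label: trapezoid(size, *params)
            for label, params in INPUT_SIZE_SETS.items()}

def best_label(size):
    """Pick the label with the highest membership degree."""
    memberships = fuzzify_input_size(size)
    return max(memberships, key=memberships.get)

if __name__ == "__main__":
    for size in (25, 35, 50, 66, 1000):
        print(size, best_label(size), fuzzify_input_size(size))
```

With these assumed breakpoints, a size near a limit belongs partly to two neighbouring labels, which is the smooth transition that a crisp cut at 38 or 64 bytes cannot express.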
Subsequently, decision is made to allow access, deny access or probably deny access based on whether the input values are malicious or not, the input lengths and SOAP size are in normal range, slightly or way out-of-range. Identifying meaningful associative patterns and subsequently matching each transaction to the relevant pattern thus helps to identify the anomalies from the normal. The associative patterns formed are tabulated and presented in Table 7. To highlight how this works, we refer to Table 7. In Pattern 1, only when all the input values are valid and the input size is in normal range and the captured SOAP size matches with the combined input size and the original SOAP size indicates non-malicious XML content, then access is certainly allowed (Table 7, Pattern 1). In the case that the input values are malicious or new disregarding input size, SOAP size and XML content, access is certainly denied (Table 7, Patterns 4–15). Similarly, in the case that SOAP size does not match, or extremely does not match, this indicates malicious, new or unknown XML content disregarding input values and input size. Consequently, access is certainly denied as well (Table 7, Patterns 2–3, 7–9, 13–18). However, when the SOAP size matches, indicating non-malicious XML content, then decision is made with reference to the input values – certainly allow access if inputs are valid (Table 7, Pattern 1); certainly deny access when inputs are malicious, new or unknown (Table 7, Patterns 4–6, 10–12); probably deny access when inputs are invalid (Table 7, Patterns 19–21). Consequently, SOAP-related attack such as oversized payload (Table 4, case 5: Oversized payload e.g. CDATA, XSS) can be detected and prevented in the case that SOAP size is not matched. Other
Table 7 Association patterns between attributes.

| Pattern # | Input values | Input size | SOAP size | XML content | Decision |
|---|---|---|---|---|---|
| 1 | Valid | Normal | Matched | Non malicious | C Allow |
| 2 | Valid | Normal | Not matched | Malicious | C Deny |
| 3 | Valid | Normal | Extremely not matched | New | C Deny |
| 4 | Malicious | Normal | Matched | Non malicious | C Deny |
| 5 | Malicious | Out-of-range | Matched | Non malicious | C Deny |
| 6 | Malicious | Extremely out | Matched | Non malicious | C Deny |
| 7 | Malicious | Normal | Not matched | Malicious | C Deny |
| 8 | Malicious | Out-of-range | Not matched | Malicious | C Deny |
| 9 | Malicious | Extremely out | Extremely not matched | New | C Deny |
| 10 | New | Normal | Matched | Non malicious | C Deny |
| 11 | New | Out-of-range | Matched | Non malicious | C Deny |
| 12 | New | Extremely out | Matched | Non malicious | C Deny |
| 13 | New | Normal | Not matched | Malicious | C Deny |
| 14 | New | Out-of-range | Not matched | Malicious | C Deny |
| 15 | New | Extremely out | Extremely not matched | New | C Deny |
| 16 | Invalid | Normal | Not matched | Malicious | C Deny |
| 17 | Invalid | Out-of-range | Not matched | Malicious | C Deny |
| 18 | Invalid | Extremely out | Extremely not matched | New | C Deny |
| 19 | Invalid | Normal | Matched | Non malicious | P Deny |
| 20 | Invalid | Out-of-range | Matched | Non malicious | P Deny |
| 21 | Invalid | Extremely out | Matched | Non malicious | P Deny |
attacks such as coercive parsing, recursive payload leading to XML DoS (Table 4, case 6: XML DoS) can also be detected and prevented in the case that the SOAP size is extremely not matched. In addition, when the SOAP size is not matched or extremely not matched, it can carry an undersized payload, thus indicating XML content is being tampered with (Table 4, case 7: XML content tampering). In all these three cases, there is also a possibility of discovering new kinds of attacks.
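The decision logic that the association patterns of Table 7 encode can be sketched as a small function. This is an illustrative reading of the patterns, not the authors' implementation: the function name `decide` and the lowercase category strings are hypothetical, and the final fallback for a combination that Table 7 does not list (valid inputs, matched SOAP size, input size not normal) is an assumption that denies conservatively.

```python
def decide(input_values, input_size, soap_size):
    """Map categorised attributes to an access decision, following Table 7.

    input_values: "valid", "invalid", "malicious" or "new"
    input_size:   "normal", "out-of-range" or "extremely out"
    soap_size:    "matched", "not matched" or "extremely not matched"
    """
    # Patterns 4-15: malicious or new inputs are denied regardless of sizes.
    if input_values in ("malicious", "new"):
        return "C Deny"
    # Patterns 2-3 and 16-18: a SOAP size mismatch signals malicious or new XML content.
    if soap_size != "matched":
        return "C Deny"
    # Patterns 19-21: invalid inputs with matched SOAP size are probably denied.
    if input_values == "invalid":
        return "P Deny"
    # Pattern 1: valid inputs, normal input size, matched SOAP size.
    if input_size == "normal":
        return "C Allow"
    # Assumption: this combination is not listed in Table 7, so deny conservatively.
    return "C Deny"

if __name__ == "__main__":
    print(decide("valid", "normal", "matched"))            # Pattern 1
    print(decide("invalid", "out-of-range", "matched"))    # Pattern 20
    print(decide("valid", "normal", "extremely not matched"))  # Pattern 3
```

XML content is not passed in because, as noted in the text, it is correlated with the SOAP size category.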
Balanced and imbalanced datasets

The first (FC313) and second (SC313) crisp sets of data, each consisting of 313 instances of simulated transactions, are tagged according to the associative patterns shown in Table 7. Two fuzzy sets of data (1stFZ and 2ndFZ), each consisting of 313 instances, are thus created. Detailed analysis of these two sets of fuzzy data reveals that they do not contain instances for all the associative patterns shown in Table 7; for example, no instance follows Patterns 7–10, 13–16 and 21. To address this, some instances are randomly replaced with instances following the missing associative patterns. Subsequently, two more fuzzy sets (1stFZN and 2ndFZN), each consisting of 313 instances with new associative patterns, are formed.

The 1stFZN and 2ndFZN fuzzy datasets with new associative patterns are combined to obtain 132 instances for the ‘p deny’ class. Down sampling and over sampling [27,28] are then applied to the ‘c allow’ and ‘c deny’ decision classes so as to make each class evenly distributed with 132 instances. Hence, a balanced fuzzy data set (BFZ) with 396 instances is obtained. The average class size distributions for the four imbalanced datasets and the class size distribution for the balanced data set are shown in Table 8.
Table 8 Class size distribution.

| Data set | C Allow (%) | C Deny (%) | P Deny (%) |
|---|---|---|---|
| Imbalanced data set | 54.63 | 23.81 | 21.56 |
| Balanced set | 33.34 | 33.33 | 33.33 |
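The balancing step can be sketched as plain random resampling to a common class size. This is only one possible reading: the specific down-sampling and over-sampling procedures of [27,28] are not detailed in the text, so simple random sampling is an assumption here, as are the function name `balance` and the class counts in the usage example (only the target of 132 per class comes from the text).

```python
import random
from collections import Counter, defaultdict

def balance(instances, label_of, target, seed=0):
    """Resample so that every class has exactly `target` instances.

    Larger classes are down-sampled without replacement; smaller classes are
    over-sampled by duplicating randomly chosen members.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for inst in instances:
        by_class[label_of(inst)].append(inst)

    balanced = []
    for members in by_class.values():
        if len(members) >= target:
            balanced.extend(rng.sample(members, target))
        else:
            extra = [rng.choice(members) for _ in range(target - len(members))]
            balanced.extend(members + extra)
    return balanced

if __name__ == "__main__":
    # Illustrative class counts, not taken from the paper.
    data = ([("c allow", i) for i in range(340)]
            + [("c deny", i) for i in range(90)]
            + [("p deny", i) for i in range(132)])
    bfz = balance(data, label_of=lambda inst: inst[0], target=132)
    print(len(bfz), Counter(label for label, _ in bfz))
```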
These five datasets (1stFZ, 2ndFZ, 1stFZN, 2ndFZN and BFZ) are then used in WEKA [29] for classification and association rule mining.

Random forests and imbalanced data set

Due to the imbalanced nature of the data set (normal instances outnumber attack instances), it is a concern that the different cluster sizes inherent in an intrusion data set would affect classification accuracy. This concern is supported by the experimental results of Li et al. [27], in which the F-measure averaged over 10 down-sampling (balanced) runs was only 0.815, versus 0.956 before down sampling (imbalanced). Experiments conducted by Zhang et al. [28] also showed a significant difference in overall error rate between a balanced (0.05%) and an imbalanced (1.92%) data set. Experiments conducted in Ye et al. [30] with balanced and imbalanced data sets on different classifiers, such as an associative classifier, a linear support vector machine (SVM) and their hierarchical associative classifier (HAC), showed varying accuracy in precision and prediction. As random forests are robust and well suited to the imbalanced data problem [9], the Random Forests algorithm from WEKA [29] is applied to all the data sets.

The random forests algorithm represents an ensemble of un-pruned classification trees [28]. For large datasets containing many features, this algorithm is regarded as highly accurate. This accuracy may be attributed to a tree classification process that generates many classification trees. In this process, every tree is built from a different bootstrap sample of the original data. Once a forest is constructed, a new object to be classified is put into every tree of the forest. In this way, every tree provides a vote to indicate its decision about the class of the new object. The forest then chooses the class that has the greatest number of votes for this new object.
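The bootstrap-and-vote mechanism described above can be sketched in a few lines. This is not WEKA's implementation; the helper names `bootstrap_indices`, `out_of_bag` and `majority_vote` are hypothetical, and the tree-building step itself is omitted.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n indices with replacement: the training sample for one tree."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag(n, drawn):
    """Indices never drawn for a tree; these are that tree's oob cases."""
    return sorted(set(range(n)) - set(drawn))

def majority_vote(votes):
    """Return the class with the most votes (ties go to the class seen first)."""
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    rng = random.Random(0)
    n = 313  # instances per dataset, as in the text
    drawn = bootstrap_indices(n, rng)
    print("oob fraction:", len(out_of_bag(n, drawn)) / n)
    # Hypothetical votes from a forest of 10 trees for one new transaction.
    votes = ["C Deny"] * 7 + ["P Deny"] * 2 + ["C Allow"]
    print(majority_vote(votes))
```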
When each tree is built, about one-third of the cases are left out of its bootstrap sample and not used in training. These left-out cases are the out-of-bag (oob) cases. As trees are added to the forest, these left-out cases are used to obtain a run-time unbiased estimate of the classification error [31]. The Random Forests algorithm runs through each data set three times: first as a full training set with all instances, second with 10-fold cross-validation, and third with a 66% training: 34% testing split. For each run, five attributes, namely, input values, input size, SOAP size,
XML content and decision, are used in generating a random forest of 10 trees, with each constructed tree considering three random features. Classification accuracy, root mean squared error and out-of-bag (oob) error are obtained for each run and each set of data. The results obtained are summarised and discussed under the heading “Summary of results and comparisons”.
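The "about one-third" figure for the out-of-bag cases follows from sampling with replacement. As a short worked derivation (standard bootstrap reasoning, not taken from the paper), the probability that a given case is never drawn in a bootstrap sample of size n is:

```latex
P(\text{case } i \text{ is out-of-bag})
  = \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[n \to \infty]{}\; e^{-1} \approx 0.368,
\qquad
\left(1 - \frac{1}{313}\right)^{313} \approx 0.367 \quad (n = 313).
```

So for datasets of 313 instances, roughly 37% of the cases are expected to be out-of-bag for any one tree.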
Association rule and Apriori algorithm

Association rule mining is a well-known data-mining technique initially applied to market basket analysis, which aims to find regularities in the buying habits of customers in a supermarket. Recently, association rules have been used in intrusion detection research for pattern or attack signature recognition.

Our association rule mining process follows the most common approach of finding strong association rules. This process consists of two phases. The first phase finds all frequent itemsets using the basic Apriori algorithm. The second phase generates strong association rules from the frequent itemsets identified in the first phase. Candidate itemsets are pruned based on the fact that if an itemset is frequent, all its subsets are frequent as well. Therefore, the algorithm discards every candidate itemset that has an infrequent subset.

However, not all discovered strong rules are interesting enough to be of good use for decision making. According to Zhang et al. [32], when building classifiers, only the rules whose confidence degrees are higher than or equal to their consequents’ degrees of support are to be considered. An interesting rule shall, therefore, have the support of its consequent equal to the support of its antecedent and achieve the minimal confidence level. Hence, a heuristic rule-of-thumb for identifying an uninteresting or misleading rule is to check whether the support for the consequent and the support for the antecedent do not tally, even though the overall minimal support is fulfilled.

Studies have shown that associative classification is able to handle unstructured data with high classification accuracy and strong flexibility. On the other hand, it still suffers from issues such as generation of a huge set of rules, and biased classification or overfitting due to the fact that only one high-confidence rule is used in classification. As such, these limitations should not be ignored.
However, determining efficient algorithms for classification, discovering interesting rules and reducing redundant rules are beyond the scope of this study. Nevertheless, the classification accuracy of FARM based on random forest as classifier is presented for discussion under the heading “The classification accuracy, detection and false alarm rates”.
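The two quantities that drive the Apriori process, support and confidence, and the subset-based pruning test can be sketched as follows. This is a minimal illustration rather than WEKA's Apriori: the function names and the four toy transactions are assumptions, with items written as hypothetical `attribute=value` strings.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def has_infrequent_subset(candidate, frequent_smaller):
    """Apriori pruning: a candidate k-itemset is discarded if any of its
    (k-1)-subsets is not among the frequent (k-1)-itemsets."""
    k = len(candidate) - 1
    return any(frozenset(s) not in frequent_smaller
               for s in combinations(candidate, k))

# Toy transactions (illustrative only).
TRANSACTIONS = [
    frozenset({"inputs=valid", "soap=matched", "decision=C Allow"}),
    frozenset({"inputs=valid", "soap=matched", "decision=C Allow"}),
    frozenset({"inputs=invalid", "soap=matched", "decision=P Deny"}),
    frozenset({"inputs=malicious", "soap=not matched", "decision=C Deny"}),
]

if __name__ == "__main__":
    print(support({"soap=matched"}, TRANSACTIONS))
    print(confidence({"inputs=valid"}, {"decision=C Allow"}, TRANSACTIONS))
    print(confidence({"soap=matched"}, {"decision=C Allow"}, TRANSACTIONS))
```

In the toy data, "inputs=valid" always co-occurs with "decision=C Allow", whereas "soap=matched" alone does not determine the decision, so only the first rule would pass a 0.99 confidence threshold.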
Sensitivity and extensibility analysis

To generate interesting and valuable association rules from the simulated sets of data, the Apriori algorithm of WEKA is used. The Apriori algorithm requires two thresholds, i.e., minimum confidence and minimum support, to determine the degree of association that must hold before a rule will be mined. In our case, a minimum confidence of 0.99 and a minimum support of 0.01 are used. The CAR (Classification Association Rule) parameter is set to ‘true’ to specify the decision class as the consequent, so that the rules generated match the associative patterns for the purpose of comparison and not for classification.

To demonstrate the sensitivity of this association rule model, 50 rules are generated for each imbalanced fuzzy dataset: 1stFZ, 2ndFZ, 1stFZN and 2ndFZN. All 50 rules matched each other for the 1stFZN and 2ndFZN fuzzy datasets with new instances, while the 1stFZ and 2ndFZ fuzzy datasets matched on only 48 rules, with 2 rules in each not matched. In another instance, the 52 rules found from matching the 1stFZ and 2ndFZ fuzzy datasets are compared with the 50 rules identified from matching the 1stFZN and 2ndFZN fuzzy datasets with new instances. It is found that 47 rules matched while 7 rules are unmatched. Hence, this model is sensitive to different datasets with varied associative patterns or instances.

To further demonstrate the extensibility of this model, it is found that 73 and 69 rules are generated with minimum confidence of close to 1 for the 1stFZ and 2ndFZ fuzzy datasets respectively, and 75 and 68 rules for the 1stFZN and 2ndFZN fuzzy datasets with new instances respectively. Additionally, each dataset has 6–9 rules whose confidence ranges from 0.82 to 0.93. Even with the balanced dataset, 108 rules are generated when the CAR parameter is set to true, while 371 rules are generated when the CAR parameter is set to false, with minimum confidence and support set to 0.98 and 0.02 respectively.
Not all the rules generated are of interest for the best decision-making. A sample of the best rules obtained is summarised and discussed in the next section.
Summary of results and comparisons

The classification accuracy, detection and false alarm rates

Results showing random forest classification accuracy, root mean squared error (RMSE) and out-of-bag (oob) error are tabulated for comparison in Table 9. Referring to Table 9 (Columns 2–5), it is seen that the classification rate for the imbalanced and balanced fuzzy datasets is 100% with root mean squared error and oob error of no more than 0.05. Notice that the oob error for each dataset across the
Table 9 Summary of random forest classification results. Datasets
Correctly classified (%)
Incorrectly classified (%)
Root mean squared error (RMSE)a
Out-of-bag (oob) error
1stFZ (Imbalanced)
100
0
0.0256
2ndFZ (Imbalanced)
100
0
1stFZN (Imbalanced)
100
0
2ndFZN (Imbalanced)
100
0
BFZ (Balanced)
100
0
0 0.0137 0.0211 0.0087 0.0189 0 0 0.0138 0.0492 0 0.0107 0.0201 0
0.0351
0.0256
0.0351
0.0455
a The three results of RMSE are in the order of full training set, 10-fold cross-validation, 66% training: 34% testing split. Only 1 result is displayed when the results are the same for the three modes of testing.
Fig. 5. Sample of invalid rules not matching fuzzy association patterns.
three modes of training (full training set with all instances, 10-fold cross-validation, 66% training: 34% testing split) is the same. This indicates that when using the random forests algorithm, cross-validation for getting an unbiased estimate of the test error is redundant. This is because when each tree is built using the bootstrap sample, about one-third of the cases (the oob cases) are left out and not used in training. This is implicitly cross-validation or split testing. However, the results are tabulated here for comparison with other error measures such as the root mean squared error. Both the root mean squared error and oob error are less than 0.05, indicating good results and further affirming the classification accuracy to be close to 100%. The consistency in these results indicates that the imbalanced intrusion datasets issue is not a concern here. Nevertheless, the random forests algorithm can periodically be run on constantly updated simulated datasets or real intrusion datasets to check for inconsistency and cater for fine-tuning.

As the classification is based on the decision category of certainly allow access, certainly deny access or probably deny access, a classification rate greater than 99% also indicates a detection rate or prediction accuracy of greater than 99%. Furthermore, there is no decision category for probably allow access, hence eliminating the possibility of false negative alarm. In addition, the possibility of a probably deny access case changing to a certainly allow access case (false positive alarm) is almost zero. This is due to the fact that the probably deny access cases exist only when they violate the business policies with no malicious inputs and non-malicious XML content. Moreover, they are given a second chance to enter valid inputs to prove that they are genuine valid transactions.
There is no doubt of it being a malicious transaction if the second or subsequent input attempts still render it a probably deny access or certainly deny access decision, as explained in the section under the heading “The simulated data and business policies”.

Interesting and valid association rules

As mentioned in the section “Sensitivity and extensibility analysis”, the association rule model is able to generate a large number of rules according to the parameters set or defined. However, some of the rules may not be valid to cater for the best detection, prevention and prediction functions required. By applying the heuristic rule-of-thumb mentioned in the section “Association rule and Apriori algorithm”, those rules having antecedent frequencies greater than consequent frequencies, with minimal support of 2% but overall confidence lower than the minimal confidence of 99%, are identified as uninteresting or invalid rules not matching the fuzzy association patterns (Fig. 5: Rules 1–6).

However, the main aim of our model is to find interesting rules in order to identify the most important attribute or set of attributes that match the 21 association patterns between the five attributes shown in Table 7. Following the heuristic rule-of-thumb that an interesting rule shall have the support of its consequent equal to the support of its antecedent and achieve the minimal confidence level, interesting and valid rules are found. Examples are rules with minimal support of about 20% (Fig. 6: Rules 1–6, Fig. 7: Rules 7–9) and new rules obtained with new instances whose minimal support is at least 2% (Fig. 8: Rules 10–18); the overall confidence is at least 99% for all rules.

Referring to Fig. 6 (Rules 1–6), where the decision is to certainly allow access, rule 1 (input values valid, SOAP size matched) and rule 2 (input values valid, XML content non-malicious) are actually sufficient for making the decision as compared to association pattern 1. However, to further confirm that the SOAP size matched or XML content is non-malicious, Rules 3–5 are used to confirm the input size is normal or SOAP size matches. Rule 6 is the full test for all the attributes that conform to association pattern 1, where input values are valid, input size is normal, SOAP size matches and XML content is non-malicious, and therefore the decision is to certainly allow access.
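The heuristic rule-of-thumb for separating interesting from uninteresting rules can be sketched as a simple filter. This is one interpretation of the heuristic, not the authors' code: it assumes each mined rule carries two counts, the number of instances matching the antecedent and the number matching antecedent and consequent together, and treats the rule as interesting when the two counts tally and the support and confidence thresholds are met. The `Rule` class, its field names and the counts in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: tuple       # e.g. ("inputs=valid", "soap=matched")
    consequent: str         # decision class
    antecedent_count: int   # instances matching the antecedent
    rule_count: int         # instances matching antecedent AND consequent

def is_interesting(rule, n_instances, min_support=0.02, min_confidence=0.99):
    """Keep a rule only if its counts tally and both thresholds are met."""
    support = rule.rule_count / n_instances
    confidence = rule.rule_count / rule.antecedent_count
    counts_tally = rule.rule_count == rule.antecedent_count
    return counts_tally and support >= min_support and confidence >= min_confidence

if __name__ == "__main__":
    n = 313  # instances per dataset, as in the text; rule counts are made up
    rules = [
        Rule(("inputs=valid", "soap=matched"), "C Allow", 170, 170),
        Rule(("soap=matched",), "C Allow", 240, 170),
        Rule(("inputs=new",), "C Deny", 3, 3),
    ]
    for r in rules:
        print(r.antecedent, "=>", r.consequent, is_interesting(r, n))
```

When the counts tally, the confidence is exactly 1, so the explicit confidence check is redundant under this reading; it is kept to mirror the two conditions stated in the text.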
As for the case when decision is made to probably deny access, Fig. 7 (Rules 7–9) is sufficient for making the right decision. When the input values are invalid, SOAP size matches and/or XML content is non-malicious, decision is therefore made to probably deny access as compared to association patterns 19–21. The rules for making the decision to certainly deny access are even much simpler. Conditions relevant to the said purpose are when input values are considered as new as shown in Fig. 8 (Rules 10–13) or malicious (Rule 17); SOAP size extremely not matched
Fig. 6. Sample of interesting rules matching association pattern 1.
Fig. 7. Sample of interesting rules matching association patterns 19–21.
(Rules 14 and 18), or input size extremely out of range (Rule 15), or XML content contains new values (Rules 16 and 18), as compared to association patterns 2–18. A point to note is that when the Predictive Apriori algorithm from WEKA [29] is applied onto these datasets, the same 18 rules can be retrieved. This is because the algorithm searches with an increasing
support threshold for the best ‘n’ rules concerning a support-based corrected confidence value. A rule is added when its expected predictive accuracy is among the ‘n’ best and it is not subsumed by a rule with at least the same expected predictive accuracy. This thus demonstrated the predictive ability of our FARM. Our ID/IP framework coupled with this model indeed provides the adaptivity,
Fig. 8. Sample of interesting rules matching association patterns 2–18.
Table 10 Performance comparisons of different classifiers.

| Classifier | Random Forests | Multilayer Perceptron | Bayes Naive | Decision Table |
|---|---|---|---|---|
| RMSE | <0.009 | <0.005 | <0.07 | <0.08 |
| Time to build model (s) | 0–0.03 | 2–2.55 | 0 | 0.02–0.05 |
Table 11 Performance comparisons of fuzzy and crisp datasets using Random Forest.

| Datasets | Correctly classified (%) | Incorrectly classified (%) | Root mean squared error (RMSE)a | Out-of-bag (oob) error |
|---|---|---|---|---|
| 1stFZ (313 instances) | 100 | 0 | 0, 0.0137, 0.0211 | 0.0256 |
| 2ndFZ (313 instances) | 100 | 0 | 0.0087, 0.0189, 0 | 0.0351 |
| 1stCRP (313 instances) | 100, 99.04, 99.06 | 0, 0.96, 0.94 | 0.0103, 0.0631, 0.0797 | 0.0288 |
| 2ndCRP (313 instances) | 100, 99.06, 99.04 | 0, 0.94, 0.96 | 0.0080, 0.0556, 0.0830 | 0.0351 |

a The three results of RMSE are in the order of full training set, 10-fold cross-validation, 66% training: 34% testing split. Only 1 result is displayed when the results are the same for the three modes of testing.
predictivity, sensitivity, scalability and accuracy towards effective intrusion detection, prevention and prediction functions.

Comparisons on classification accuracy

As seen from Table 9, FARM provides a very good classification rate with small error measures (lowest RMSE: 0.0087). Different classifiers from WEKA, such as Multilayer Perceptron, Naïve Bayes and Decision Table, are applied to the same imbalanced and balanced datasets. The root mean squared errors (RMSEs) and the time to build each model are tabulated in Table 10. As seen in Table 10 (Column 2), classification using random forests still achieves the best results, with RMSE close to zero and time to build the model within 0.03 s for each data set with a sample size of no more than 400 instances.

To further justify the use of fuzzy logic in our FARM, experiments using Random Forest as the classifier from WEKA are conducted on both the fuzzy and crisp datasets. As seen from Table 11, the fuzzy datasets perform slightly better than the crisp datasets both in terms of RMSE and classification accuracy. The RMSE of the fuzzy datasets ranges from 0 to 0.02 while the RMSE of the crisp datasets ranges from 0.008 to 0.083. The fuzzy datasets achieve a classification accuracy of 100% while the crisp datasets obtain a classification error of about 1%. At this juncture, with these results, we may tend to conclude that there is no significant difference between the fuzzy and crisp datasets.

Subsequently, more experiments using different classifiers such as Naïve Bayes, Decision Table and Multilayer Perceptron from WEKA are conducted on both the fuzzy and crisp datasets consisting of an increasing number of instances: 313, 626 and 1252. As seen from Table 12, the fuzzy datasets again perform better than the crisp datasets both in terms of RMSE and classification accuracy. The RMSE of the fuzzy datasets ranges from 0.0022 to 0.0324 while the RMSE of the crisp datasets ranges from 0.008 to 0.1809. The fuzzy datasets achieve a classification accuracy of 100% while the crisp datasets obtain a classification error of up to 5.43%. In terms of time-to-build model, except for the Multilayer Perceptron classifier, the results for both the fuzzy and crisp datasets are consistently low, in the range of 0–0.05 s. The RMSE, a measure of the differences between the predicted values of a model and the actually observed values, is a good
Table 12 Performance comparisons of fuzzy and crisp datasets with different classifiers.

| Datasets | Correctly classified (%)a | Incorrectly classified (%)a | Root mean squared error (RMSE)a | Time-to-build model (s)a |
|---|---|---|---|---|
| 1stCRP (313 instances) | 96.17, 99.68, 96.49 | 3.83, 0.32, 3.51 | 0.1508, 0.0461, 0.1377 | 0, 0.02, 0.25 |
| 2ndCRP (313 instances) | 96.49, 100, 96.49 | 3.51, 0, 3.51 | 0.1444, 0.0363, 0.1418 | 0, 0, 0.27 |
| 1stCRP + 2ndCRP (626 instances) | 94.60, 99.84, 96.49 | 5.40, 0.16, 3.51 | 0.1809, 0.0272, 0.1392 | 0, 0.05, 0.52 |
| (1stCRP + 2ndCRP) × 2 (1252 instances) | 94.57, 100, 97.60 | 5.43, 0, 2.40 | 0.0080, 0.0556, 0.0830 | 0, 0.03, 0.98 |
| 1stFZ + 2ndFZ (626 instances) | 100, 100, 100 | 0, 0, 0 | 0.0319, 0.0324, 0.0034 | 0, 0.02, 1.76 |
| (1stFZ + 2ndFZ) × 2 (1252 instances) | 100, 100, 100 | 0, 0, 0 | 0.0277, 0.0177, 0.0022 | 0, 0.02, 3.37 |

a The three results of correctly classified (%), incorrectly classified (%), RMSE and time-to-build model (in seconds) are in the order of different classifiers: Naïve Bayes, Decision Table, and Multilayer Perceptron.
Table 13a Web service request–response time for cases with non-malicious XML content.

| Case # | User inputs | XML content | SOAP size | Status | Request–response time |
|---|---|---|---|---|---|
| 1 | Credit card: 123; Pin: 111; Amount: 345; Email: [email protected] | Non-malicious | 396 | Invalid inputs; SOAP size in normal range | Within 1 s |
| 2 | Credit card: 1234567812345678; Pin: 123; Amount: 456.78; Email: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | Non-malicious | 1100 | Malicious user inputs (buffer overflow); SOAP size is oversized by many folds | Within 1 s |
| 3 | Credit card: 1234567812345678; Pin: 123; Amount: 9.99; Email: [email protected] | Non-malicious | 411 | Valid user inputs, normal transaction; SOAP size is in minimum normal range | Within 1 s |
| 4 | Credit card: 1234567812345678; Pin: 123; Amount: 9999999.99; Email: [email protected] | Non-malicious | 437 | Valid user inputs, normal transaction; SOAP size is in maximum normal range | Within 1 s |

measure of accuracy. It is used in our experiments as a basis of comparison of results between the crisp and fuzzy datasets. Basically, the smaller the RMSE value, the better the accuracy. Based on the experimental results tabulated in Tables 11 and 12, we may have to conclude that there is no significant difference between using a crisp and a fuzzy method except for the RMSE at a point where the data size is maximum. Therefore, further experiments using a bigger scale of data are required to determine the optimum data size for best performance. This opens up an area of research for the future.

However, a point to note is that actual crisp values cannot form the fuzzy association patterns shown in Table 7. In the crisp datasets, the SOAP size and input size are quantitative values; the categorical input values of new, valid, malicious and invalid are represented by the quantitative values 3, 5, 7 and 9 in that order; and whether the combined input size and original SOAP size matches or does not match the SOAP size that comes with the captured service request is represented by the values 0 and 1 respectively. Consequently, no fuzzy associative rule can be derived from them. Hence, our FARM with 18 fuzzy associative rules does contribute significantly towards detecting, preventing and predicting WS attacks.

Web service request–response time performance

In order to evaluate the time performance for capturing SOAP size and XML content for input validation, each Web service’s request and response times are recorded during simulation for the generation of experimental data (see the section “The simulated
data and business policies” for details of the generation of experimental data). The simulation of the Web service request–response examples are performed using machines with Intel Pentium Dual CPU E2160. The processors run at 1.80 GHz and 1.79 GHz together with 0.99 GB of RAM under the operating system of Microsoft Windows Server 2003. Some examples of request–response time performance are tabulated as shown in Tables 13a and 13b. It is found that regardless whether the SOAP size is in normal range, slightly out-of-range or out-of-range by many folds, the request–response time for each example is consistent, just a split of a second. As seen in Table 13a, all the four cases (Case 1: invalid user inputs, SOAP size in normal range; Case 2: malicious user inputs with SOAP extremely oversized; Case 3: normal transaction with minimum SOAP size; Case 4: normal transaction with maximum SOAP size) have different SOAP sizes with non-malicious XML content, yet the time taken to respond to each request is still the same, i.e. within 1 s. Similarly as seen in Table 13b, all the three cases (Case 1: malicious user inputs, SOAP size in normal range; Case 2: hidden malicious XML content, SOAP oversized greatly; Case 3: hidden malicious content, SOAP extremely oversized) whether with malicious user inputs or malicious XML content having different SOAP sizes, yet the request–response time is still within 1 s. As seen from Tables 13a and 13b each of the WS request–response time is consistently low with split of a second across different SOAP sizes. Consequently, our approach does not require additional steps or process to determine whether
Table 13b Web service request–response time for cases with malicious user inputs/XML content. Case #
User inputs
XML content
Status
Request–response time
1
Credit card: @’ Pin: @’ Amount: @’ Email: @’ Credit card: 2345 Pin: 123 Amount: 345678 Email:
[email protected] Credit card: 1234567812345678 Pin: 123 Amount: 9999999.99 Email:
[email protected]
Non-malicious
381
Malicious user inputs SOAP size in normal range
Within 1 s
Malicious. Hidden scripts
439
Malicous XML content (CDATA attack) SOAP size is oversized by 39
Within 1 s
2168
Malicous XML content SOAP size is oversized by many folds
Within 1 s
2
3
Hidden malicious content @@@@@@ @@@@@@ @@@@@@ @@@@@@ @@@@@@
SOAP size
156
G.-Y. Chan et al. / Applied Soft Computing 24 (2014) 142–157
to authenticate or validate the SOAP’s message content first in order to minimise the cost of filtering out the attacks. Nevertheless, the efficiency of our framework incorporated with FARM is yet to be verified further in the future using real-world WS e-commerce applications and with higher performance machines. Performance of other models in defending against WS attacks The inherent technological nature of Web service has given rise to attacks such as recursive payload attacks, oversized payload attacks and coercive payload attacks, which in turn leads to XML DoS attacks. To countermeasure against these attacks, Pinzón et al. [19] have proposed a distributed hierarchical multi-agent architecture for blocking malicious SOAP messages with satisfactory results (Table 2, Item #1). In an earlier work, Chan et al. [4] have developed an adaptive neural fuzzy inference system (ANFIS) enhanced with business policies and fuzzy if–then rules to countermeasure WS attacks such as SQL and XML injections, SOAP oversized payload, recursive and coercive payloads that lead to XML DoS with detection accuracy as high as 99% and false alarm rate of close to 1% (Table 2, Item #7). In another recent research in Corona et al. [23], user-defined models in the form of anomaly templates that associate a suitable action in response to anomalous web traffic is able to accurately detect attacks such as Cross-site scripting (XSS), SQL injection and buffer overflow attacks with false alarm rates ranging from 5.4% to 0.47% (Table 2, Item #5). Our FARM, with 18 fuzzy associative rules, detection accuracy of greater than 99% and false alarm rate of less than 1%, coupled with the ID/IP framework in defending against WS attacks, is indeed novel and significantly provides an added layer of security protection to WS e-commerce applications. 
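The comparisons in this section rest on three figures: RMSE, detection accuracy and false alarm rate. As a minimal sketch, assuming the standard textbook definitions (the exact formulas used in the experiments are not restated here), they can be computed as follows. The function names and the confusion counts are hypothetical and serve only to illustrate the arithmetic; they are not the experimental results.

```python
import math
from typing import Dict, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual values."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, detection rate and false alarm rate from confusion counts.

    tp: attacks flagged as attacks    fn: attacks that were missed
    fp: normal requests flagged       tn: normal requests let through
    """
    attacks = tp + fn
    normals = fp + tn
    if attacks == 0 or normals == 0:
        raise ValueError("need at least one attack and one normal instance")
    return {
        "accuracy": (tp + tn) / (attacks + normals),
        "detection_rate": tp / attacks,
        "false_alarm_rate": fp / normals,
    }


# Hypothetical counts (assumption): 500 attack and 500 normal requests.
metrics = detection_metrics(tp=495, fp=4, tn=496, fn=5)
print(metrics)

# Hypothetical predictions versus labels, with one error out of four.
print(rmse([1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 0.0, 1.0]))
```

Under these definitions a lower RMSE and a lower false alarm rate are better, while a higher accuracy and detection rate are better, which is the sense in which the figures quoted above are compared.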
Conclusion and future work

SOAP and XML-related attacks do exist at the Application layer and can be detected and prevented by validating input values, input field lengths, and SOAP size. We have applied fuzzy logic in our model to define a set of meaningful linguistic labels, represented by fuzzy sets on the domain of the quantitative attributes, and to map them to a new domain. Further analyses of the datasets allow us to discover associative patterns among the attributes. By using the Apriori algorithm to generate and prune the rules, and through a series of sensitivity and extensibility analyses conducted on the simulated datasets, our FARM is able to discover 18 interesting rules with at least 99% confidence. Furthermore, segregating the anomalies from the normal using our FARM has enabled the ID/IP framework to determine frequently occurring features from the set of interesting rules. This in turn helps the security administrator to prioritise which feature to focus on in the future, thus addressing the feature selection problem. Subsequently, by restricting the inputs using business policies, we have further strengthened the model so that it is able to detect and prevent SQL injection, buffer overflow, SOAP oversized payload such as cross-site scripting (XSS) and CDATA attacks; predict and prevent XML DoS caused by coercive parsing and recursive payload; and discover new or previously unknown XML content tampering attacks. Our most significant contribution is that our ID/IP framework coupled with this FARM is adaptive, dynamic, predictive, sensitive, scalable and accurate, with a detection rate and prediction accuracy of greater than 99% and a false alarm rate of less than 1%. This novel framework and FARM thus present a viable added layer of security protection for WS e-commerce applications. Future work is to make use of real-world WS e-commerce applications, with tremendously scaled-up datasets, to capture the
normal and attack data for further evaluation of the framework, not only on detection and false alarm rates for effectiveness but also on time performance for efficiency.

Acknowledgment

This study was conducted while the second author was a faculty member at the Faculty of Information Technology, currently known as the Faculty of Computing and Informatics, of Multimedia University, Malaysia.

References

[1] Veracode, Study of Software Related Cybersecurity Risks in Public Companies, 2012, Available from http://www.iseprograms.com/lib/Veracode Software Related Cybersecurity Risks Public Companies.PDF (15.08.12).
[2] M. Khari, M. Gaur, Y. Tuteja, Meticulous study of firewall using security detection tools, Int. J. Comp. Appl. Inform. Technol. 2 (1) (2013) 1–9.
[3] M. Jensen, N. Gruschka, R. Herkenhöner, A survey of attacks on web services, J. Comp. Sci.-Res. Develop. 24 (4) (2009) 185–197.
[4] G.Y. Chan, C.S. Lee, S.H. Heng, Policy-enhanced ANFIS model to counter SOAP-related attacks, Knowledge-Based Syst. 35 (2012) 64–76.
[5] C.W. Chen, Stability analysis and robustness design of nonlinear systems: an NN-based approach, Appl. Soft Comput. 11 (2011) 2735–2742.
[6] J.W. Lin, C.W. Chen, C.Y. Peng, Potential hazard analysis and risk assessment of debris flow by fuzzy modeling, Nat. Hazards 64 (2012) 273–282.
[7] C.F. Hsu, C.M. Lin, R.G. Yeh, Supervisory adaptive dynamic RBF-based neural-fuzzy control system design for unknown nonlinear systems, Appl. Soft Comput. 13 (2013) 1620–1626.
[8] M. Sheikhan, Z. Jadidi, Misuse detection using hybrid of association rule mining and connectionist modeling, World Appl. Sci. J. 7 (2009) 31–37 (Special Issue of Computer & IT).
[9] A. Zainal, M.A. Maarof, S.M. Shamsuddin, Ensemble classifiers for network intrusion detection system, J. Inform. Assur. Secur. 4 (2009) 217–225.
[10] M.M.T. Jawhar, M. Mehrotra, Design network intrusion detection system using hybrid fuzzy-neural network, Int. J. Comp. Sci. Secur. 4 (3) (2010) 285–294.
[11] A. Tajbakhsh, M. Rahmati, A. Mirzaei, Intrusion detection using fuzzy association rules, Appl. Soft Comput. 9 (2009) 462–469.
[12] T. Ozyer, R. Alhajj, K. Barker, Intrusion detection by integrating boosting genetic fuzzy classifier and data mining criteria for rule pre-screening, J. Network Comp. Appl. 30 (1) (2007) 99–113.
[13] M. Sheikhan, M.S. Rad, Misuse detection based on feature selection by fuzzy association rule mining, World Appl. Sci. J. 10 (2010) 32–40 (Special Issue of Computer & Electrical Engineering).
[14] S. Mabu, C. Chen, N. Lu, K. Shimada, An intrusion-detection model based on fuzzy class-association-rule mining using genetic network programming, IEEE Trans. Syst. Man Cybernet. – Part C: Appl. Rev. 41 (1) (2011) 130–139.
[15] V. Markam, L.S.M. Dubey, A general study of associations rule mining in intrusion detection system, Int. J. Emerg. Techn. Adv. Eng. 2 (1) (2012) 347–356.
[16] X. Ye, Countering DDoS and XDoS attacks against Web services, in: Proceedings of IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, 2008, pp. 346–352.
[17] V. Relan, B. Sonawane, Detection and Mitigation of Web Services Attacks using Markov Model, CMSC 678: Machine Learning Project, Spring 2009, Available from http://userpages.umbc.edu/ relan1/CMSC%20678 %20Project%20Report.pdf (19.07.10).
[18] U. Thakar, N. Dagdee, S. Varma, Pattern analysis and signature extraction for intrusion attacks on web services, Int. J. Network Secur. Appl. (IJNSA) 2 (3) (2010) 190–205.
[19] C.I. Pinzón, J. Bajo, J.F. De Paz, J.M. Corchado, S-MAS: an adaptive hierarchical distributed multi-agent architecture for blocking malicious SOAP messages within Web services environments, Expert Syst. Appl. 38 (2011) 5486–5499.
[20] M. Bazarganigilani, B. Fridey, A. Syed, Web service intrusion detection using XML similarity classification and wsdl description, Int. J. u- and e- Service Sci. Technol. 4 (3) (2011) 61–72.
[21] A. Karthigeyan, C. Andavar, A.J. Ramya, Adaptable practices for curbing XDoS attacks, Int. J. Sci. Eng. Res. 3 (6) (2012) 1–6.
[22] M.E. Patil, R.D. More, Survey of intrusion detection system in multitier web application, Int. J. Emerg. Technol. Adv. Eng. 2 (10) (2012) 570–575.
[23] I. Corona, R. Tronci, G. Giacinto, SuStorID: a multiple classifier system for the protection of web services, in: 21st International Conference on Pattern Recognition (ICPR 2012), November 11–15, Tsukuba, Japan, 2012, pp. 2375–2378.
[24] P. Kalavadekar, V. Mogal, A practical approach to intrusion detection system for multilayer web services, Int. J. Emerg. Technol. Adv. Eng. 3 (3) (2013) 951–955.
[25] wsKnight. http://net-square.com/wschess/index.html (downloaded in July 2008).
[26] A. Andreu, Professional Pen Testing for Web Applications, Wiley Publishing Inc., Indianapolis, 2006.
[27] P. Li, L. Liu, D. Gao, M.K. Reiter, On challenges in evaluating malware clustering, in: The 13th International Symposium on Recent Advances in Intrusion Detection (RAID) 2010, LNCS 6307, 2010, pp. 238–255.
[28] J. Zhang, M. Zulkernine, A. Haque, Random-forests-based network intrusion detection systems, IEEE Trans. Syst. Man Cybernet. – Part C: Appl. Rev. 38 (5) (2008) 649–659.
[29] WEKA software (weka-3-6-4jre.exe): http://www.cs.waikato.ac.nz/ml/weka/ (downloaded in December 2010).
[30] Y. Ye, T. Li, K. Huang, Q. Jiang, Y. Chen, Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list, J. Intell. Inform. Syst. 35 (1) (2010) 1–20.
[31] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[32] X. Zhang, G. Chen, Q. Wei, Building a highly-compact and accurate associative classifier, Appl. Intell. 34 (2011) 74–86.