Journal of Systems Architecture 57 (2011) 259–268
Efficient file fuzz testing using automated analysis of binary file format

Hyoung Chun Kim a, Young Han Choi a, Dong Hoon Lee b,*

a The Attached Institute of Electronics and Telecommunications Research Institute (ETRI), P.O. Box 1, Yuseong Post Office, Daejeon 305-600, Republic of Korea
b The Graduate School of Information Management and Security, Korea University, Anam-dong, Sungbuk-ku, Seoul 136-701, Republic of Korea
Article history: Received 15 October 2009; Received in revised form 27 February 2010; Accepted 17 March 2010; Available online 20 March 2010

Keywords: Software testing; Fuzzing; Security testing
Abstract

Fuzz testing is regarded as the most useful technique for finding serious security holes in a software system: it feeds unexpected data into the input of the software system and finds the system's bugs or errors. However, one disadvantage of fuzz testing executed on binary files is that it requires a large number of fault-inserted files to cover every test case, up to 2^(8·FILESIZE) files. To overcome this drawback, we propose a novel algorithm that efficiently reduces the number of fault-inserted files while still maintaining maximum test case coverage. The proposed approach enables automatic analysis of the fields of binary files by tracking and analyzing stack frames, assembly code, and registers as the software system parses the files. We evaluate the efficacy of the new method with a practical tool, the Binary File Analyzer and Fault Injector (BFAFI), which traces program execution and analyzes the fields of a binary file format. Our experiments demonstrate that the BFAFI reduced the total number of fault-inserted files while keeping maximum test case coverage, and that it detected approximately 14 times more exceptions than a general fuzzer. The BFAFI also found 11 causes of exceptions, five of which were found only by the BFAFI. Ten of the 11 causes were generated by a graphic rendering engine (GDI32.dll); the other was generated by the system library (kernel32.dll) in Windows XP SP2.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction

Attackers commonly exploit vulnerabilities in the file-parsing code of software systems. Such vulnerabilities can compromise a system when a user simply opens a file containing exploit code. Recently, an increasing number of file-related vulnerabilities have been found in web browsers, office software, email clients, and multimedia players, and the risks have become more serious with the content-sharing trend of the Web 2.0 era. Zero-day attacks in particular are becoming widespread before security patches are in place. It is therefore critical to identify software security problems through security testing before attackers do. Among the many security testing techniques, fuzz testing is the most useful for finding serious security holes in a software system. For example, Microsoft finds about 20–25% of its security bugs by fuzzing a product before it is shipped [1]. Fuzz testing as a security test inserts random data or faults into the input of the software system and watches for software exceptions [2]. Similarly, file fuzz testing inserts fault data into files, has the software read them, and detects the resulting exceptions.
To cover all test cases, file fuzz testing must generate a large number of fault-inserted files, up to 2^(8·FILESIZE), where FILESIZE is the total size of the file in bytes. To lower the total number of fault-inserted files used for testing, the file format should be analyzed first; however, this is a time-consuming task, given that there are over 14,000 file formats [3] and most of them are not open to the public. In this paper, we propose a new methodology that reduces the number of fault-inserted files used for file fuzz testing by automatically analyzing binary file formats, thus maximizing test case coverage. To do so, we divide a binary file format into fields, the minimum logically meaningful units in file processing, and insert fault data into the fields according to their data types. As a result, we decrease the total number of fault-inserted files while maintaining maximum test case coverage. To analyze the fields of a file format automatically, we trace the execution of the software system while it parses the target file: the data in a binary file is divided into fields, and these fields are passed as parameters to parsing functions. With this in mind, we implemented a practical tool that traces and analyzes the execution of the software system. We verified our approach by applying it to the WMF image file format [4] and implemented a debugger that traces and analyzes the execution of WMF file parsing software. We named the debugger the Binary File Analyzer and Fault Injector (BFAFI).
The BFAFI detected many more exceptions than a general fuzzer given the same number of fault-inserted files, and many of the exceptions were found only by the BFAFI. The BFAFI also found many more causes of exceptions than the general fuzzer. In other words, a general fuzzer needs a much larger number of fault-inserted files to identify the same exceptions and causes that the BFAFI detects. In this paper, our methodology focuses on the buffer overflow vulnerability because it is the most common and most dangerous problem in software systems. The key contributions of this paper are as follows:

- We propose a methodology that efficiently reduces the total number of fault-inserted files for file fuzz testing by analyzing the parsing mechanism of the software system in real time.
- To prove the efficacy of our approach, we implemented a practical tool, the BFAFI, which analyzed the parsing of the WMF file format and automatically divided it into fields.
- The BFAFI found many more exceptions than a general fuzzer with the same number of fault-inserted files, including exceptions detected only by the BFAFI, and it found many more causes of exceptions than the general fuzzer did. Ten of the 11 causes of exceptions found were generated by a graphic rendering engine (GDI32.dll) and the other by a system library (kernel32.dll) in Windows XP SP2.

The rest of the paper is organized as follows. Section 2 introduces related work and compares it with our research. Section 3 explains the problem of binary file fuzz testing. Section 4 proposes a new method that automatically divides binary file data into fields. Section 5 shows how the method is implemented for the WMF file format and reports the performance of the BFAFI in experiments. Section 6 discusses the limitations of our research and their solutions, and Section 7 concludes the paper.

2. Related work

Fuzz testing is a method that inserts faults into the input data of a software system in order to find software exceptions [5–7]; such exceptions can indicate security vulnerabilities in the software system. Fig. 1 depicts the general methodology of fuzz testing. Input is the entry point of fuzz testing and can take various forms, such as files, configurations, registry entries, APIs, user interfaces, and network interfaces [8]. Output is the result of the software system's processing of the fuzzed input. Throughout this process, fuzz testing monitors any exceptions or crashes that the software system experiences. In our research, we focus on binary files as input. A binary file is not human-readable because it consists of binary codes such as 0xD3 0x3E. Binary file fuzz testing inserts unexpected data (faults) into binary files to make fuzzed data (fault-inserted files), lets the software system read them, and watches for exceptions or crashes. Generally, fuzz testing is of two types: generation and data mutation [8]. Generation builds test cases (fuzzed data) from a specification after the target protocol (or format) has been analyzed completely; data mutation builds test cases by simply inserting faults into existing sample files.
Miller et al. [5] first introduced fuzz testing, inserting random fault data into the input of UNIX system utilities using data mutation. The same group later performed fuzz testing on Windows NT and MacOS [9,10]. Random data is easy to generate because the input data format need not be considered; as a result, when fuzz testing is performed, most random inputs are simply rejected as errors. FileFuzz [11,12] and SPIKEfile [13] perform file-oriented fuzz testing using data mutation. FileFuzz inserts faults into a file at random without considering the file format. SPIKEfile, based on SPIKE [14], which introduced a framework for network fuzz testing, uses the same fault-insertion method as FileFuzz. Our methodology, in contrast, inserts fault data into the file with the binary file format taken into account and consequently achieves maximum test case coverage with the minimum number of fault-inserted files.

PROTOS [15] is a network fuzzer that uses generation. PROTOS implements the network protocol under test and inserts fault data into network packets. By design, the user must analyze each network protocol manually, which takes a very long time per target protocol. We also use semi-generation for fuzz testing, but our methodology analyzes the binary file format automatically using the BFAFI. Autodafe [16] inserts fault data into input by considering protocol fields that are passed to dangerous functions, and it traces the fault data using a debugging mechanism. We also implemented our tool on a debugging mechanism, but our tool analyzes the fields of binary files automatically.

Automatic protocol reverse engineering analyzes protocol formats automatically, targeting network protocols and file formats. Owing to the complexity of file formats, most of these approaches focus on network protocols. Recently, protocol reverse engineering has been used in various research fields, such as the analysis of botnet communication protocols [17] and fuzz testing of network protocols [18]. Discoverer [19] is a tool for automatic reverse engineering of network protocols: it clusters network messages with the same format and infers the message format by comparing the messages within a cluster. However, it has two limitations, trace dependency and pre-defined semantics: as stated in [19], the information in the network traces determines the accuracy of the results, and only pre-defined protocol semantics can be recognized. In particular, trace-based approaches [19,20] become ineffective on encrypted network traffic. Cui et al. [21] presented a tool that can analyze arbitrary record sequences of file formats and network protocols; after marking the input data as tainted, they track the data through all instructions in loop modules. The limitation of this procedure is that the coverage of the reverse-engineered format depends on the diversity of the input samples. Lin et al. [22] presented a tool that extracts network protocol fields by monitoring program execution. Wondracek et al. [23] dynamically monitor the execution of the application and analyze how the program processes received messages, using dynamic taint analysis to track the propagation and processing of data in the application. While these approaches monitor delimiters, loops, comparisons, and the like during program execution, our approach monitors the starting point of each field in a file by tracing instructions and registers in order to find number type fields, which are closely related to security problems. In this way we can significantly reduce the fuzzing area in file fuzz testing.
3. Problem: Number of test cases for binary file fuzz testing
Fig. 1. Methodology of fuzz testing.
One of the problems of file fuzz testing is that it generates a large number of fault-inserted files to cover all test cases. The number of fault-inserted files increases exponentially with the file size.
Fig. 2. File data expressed as a byte set. P_i is the start position of B_i, S_i is the size of B_i, and B_i is the block defined by P_i and S_i. i and j are positive numbers; N_B is the number of blocks.
In the worst case, it could generate up to 2^(8·FILESIZE) files, many of which are not necessary for successful testing: a software system parsing a fault-inserted file that was generated without regard to the file format may reject the file as invalid before the inserted fault ever reaches faulty code. Therefore, file fuzz testing does not need to generate all test cases to complete testing. Fig. 2 shows binary file data expressed as a byte set. We define a block as an area into which fault data will be inserted; most files consist of multiple blocks. The ith block B_i is given by its start position P_i and its size S_i, and no blocks overlap. B denotes the total area into which fault data can be inserted. Once B is determined, the total number of fault-inserted files is:
Total number of fault-inserted files ∝ number of data mutation cases of B.

The total number of fault-inserted files is proportional to the number of data mutations of B; in other words, the test cases of B decide the number of fault-inserted files. The test cases of B depend on the size of the fault-inserted area and the number of variations of the byte data. The worst case occurs when N_B = 1, P_1 = 1, S_1 = FILESIZE, and each byte ranges over 0x00 to 0xFF; the total number of generated files in that case is 2^(8·FILESIZE). When FILESIZE is large, it is impractical to generate such a huge number of files during testing. When faults are inserted without considering the file format, a large number of files must be generated to reach high test case coverage. In FileFuzz [8], the tester decides the fault insertion position and the fault value, and FileFuzz generates one fault-inserted file per fault position and value. For instance, to test a 35 KB WMF file sufficiently, FileFuzz generates 9,175,040 (= 35 × 1024 × 2^8) fuzzed files, where 2^8 is the number of values one byte can take, from 0x00 to 0xFF. Our tool, by contrast, generates only 86,820 files for a WMF file of the same size while covering most test cases. This is only 0.95% (= 86,820/9,175,040) of the files generated by FileFuzz, as shown in Section 5.
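Restating the two counts explicitly in our notation: exhaustive mutation of every byte at once yields

    (2^8)^FILESIZE = 2^(8·FILESIZE) files,

whereas a one-byte-at-a-time sweep over all positions and values, as FileFuzz performs, yields

    FILESIZE × 2^8 files = 35 × 1024 × 256 = 9,175,040 for the 35 KB example.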
There is another problem in fuzz testing that uses a random function: because the random function chooses the start position of each fault insertion at random, fault-inserted areas can be duplicated, so a huge number of fault-inserted files is needed to reach high test case coverage. We ran an experiment to measure how often such duplication occurs, using the Random class of the C# language (a sketch reproducing the experiment appears at the end of this section). First, we define coverage as the ratio of the total fault-inserted area to the file size, with one-byte faults; 100% coverage means the random function has touched every byte at least once. Fig. 3 shows the test case coverage of fuzz testing using a random function. There are two parameters in the experiment: the target file size, on the X-axis, and the number of calls to the random function, in the legend. In (a), we targeted 1, 10, 100, and 1000 KB files and performed fuzz testing by calling the random function 1000 and 10,000 times, respectively. In (b), we called the random function 100,000 and 1,000,000 times, respectively, on files of 1, 2, 5, 10, 15, and 20 MB. If the file size is 1 MB and the random function is called 100,000 times, coverage is only about 10%. This result shows that the larger the file, the more test case files must be generated to cover all test cases; coverage drops rapidly for larger files because of duplicated fault insertions. Therefore, the file format must be analyzed before inserting faults into files. Analyzing a file format means finding the number of blocks N_B and each block's position P_i and size S_i. Because our BFAFI tool can also find the types of the fields, we need to perform fuzz testing only at the necessary positions with the necessary values; by doing so, we cut down the number of test cases while achieving high test case coverage. In the next section, we explain the method that automatically analyzes the fields of binary files by tracing the parsing mechanism of the software system that processes them.
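The duplication effect is easy to reproduce. The following minimal sketch re-creates the experiment; the paper used the Random class of C#, whereas this sketch uses C's rand(), and all names are ours:

    #include <stdio.h>
    #include <stdlib.h>

    /* Draw `calls` random one-byte fault positions in a file of `filesize`
     * bytes and report how many distinct positions were hit (coverage). */
    static double coverage(size_t filesize, size_t calls)
    {
        unsigned char *hit = calloc(filesize, 1);
        size_t distinct = 0;
        if (hit == NULL)
            return 0.0;
        for (size_t i = 0; i < calls; i++) {
            /* rand() may yield only 15 bits (e.g. on Windows); combine two calls */
            size_t pos = (((size_t)rand() << 15) | (size_t)rand()) % filesize;
            if (!hit[pos]) { hit[pos] = 1; distinct++; }
        }
        free(hit);
        return 100.0 * distinct / filesize;
    }

    int main(void)
    {
        srand(1);
        /* 1 MB file, 100,000 calls: about 10% coverage, as in Fig. 3(b) */
        printf("coverage: %.1f%%\n", coverage(1u << 20, 100000));
        return 0;
    }

With a 1 MB file and 100,000 calls, roughly 9–10% of the bytes are hit, in line with Fig. 3(b).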
4. Automatic analysis of binary file format fields and implementation

In this section, we propose a novel analysis algorithm that divides a binary file format into fields so that fault data can be inserted with the file format taken into account. The algorithm makes it possible to reach maximum test case coverage efficiently, with a small number of fault-inserted files, compared with file fuzzers that ignore the file format. Fig. 4 shows our methodology for file fuzz testing, which analyzes the file format automatically and finds abnormal behavior of a program. Our research focuses on analyzing the file format and producing the minimum number of fault-inserted files with maximum test case coverage. By tracing the execution of the program and analyzing it at the assembly level, we extract the field information of a sample file.
Fig. 3. Test case coverage using the Random class in C#. The Y-axis is the coverage for the targeted file size. The random function is called 1000 and 10,000 times on the smaller files in (a), and 100,000 and 1,000,000 times on the larger files in (b).
Fig. 4. Our methodology for file fuzz testing.
After inserting faults into the sample file using the field information, we let the target program read the fault-inserted files. If the program generates an exception event, we analyze the abnormal behavior in order to find the cause of the exception.

First, we explain the file parsing mechanism of a software system in terms of the file format. To use the data in files meaningfully, software systems divide the data into various fields. A field is the minimum unit that represents a data type in a file format. For example, the BMP image file format stores images using bits (0 or 1), and the system divides the file data into a file header and bitmap data when it reads a BMP file. The file header includes the header size, the size of the bitmap, the number of bits per pixel, and so on. The bitmap data holds the actual image data, which is parsed based on the file header; the software system loads the bitmap data into memory and displays it on the screen using the parsed fields.

To divide a binary file into fields, our analysis tool collects and analyzes the data generated during the parsing process, such as stack frames, assembly code, and registers. Stack frames and registers point to data in the file, and the assembly code determines how the data is divided into fields based on the file format. A file format parsing engine usually has many functions and processes various data at run time; therefore, by tracing the parsing functions and the run-time data related to parsing, we can divide the file data into fields based on the file format. We classify file fields into two types, number type and byte array type, defined as follows:

Number type (NT): In most cases, NT data is used as a number while the software system parses files. On x86 CPUs [24], most numbers are stored as binary values of 2 or 4 bytes. NT correlates with security problems such as buffer overflow vulnerabilities: a software system parses an NT field as the size of a buffer or as the relative address (offset) of a field position in memory. For instance, the first field of a record in the WMF file format is the size of the record; it determines the size of the allocated memory buffer and the relative address that the parsing module will access next. In many security problems, these values are used to trigger a buffer overflow. Therefore, we focus on NT fields for successful security testing.

Byte array type (BAT): We define BAT as every field that is not NT. It can be a set of number types, byte types, data structures, strings, and so on. We did not insert faults into BAT fields in our experiments, because BAT fields have less influence on security problems than NT fields.

In this paper, we focus on NT fields because this type is related to security holes such as buffer overflows. Fig. 5 shows an example of a buffer overflow triggered when a parsing module receives a wrong value.
Fig. 5. An example of generating a buffer overflow when a parsing module receives a wrong value.
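Since Fig. 5 is reproduced here only as a caption, the following C sketch reconstructs the pseudo code it describes; Field_1_Size and Field_1_Data follow the figure's naming, and everything else (SIZE, the function name) is our assumption:

    #include <stdlib.h>
    #include <string.h>

    #define SIZE 64  /* fixed buffer size assumed by the parser */

    /* Parse one field: a 4-byte size followed by the field data.
     * If an attacker enlarges Field_1_Size in the file, memcpy writes
     * past the SIZE-byte buffer -- a classic buffer overflow. */
    void parse_field(const unsigned char *record)
    {
        unsigned int Field_1_Size;
        memcpy(&Field_1_Size, record, 4);          /* NT field: size of the data */
        const unsigned char *Field_1_Data = record + 4;

        unsigned char *buffer = malloc(SIZE);      /* buffer of SIZE bytes */
        if (buffer == NULL)
            return;

        /* VULNERABLE: Field_1_Size comes from the file and is not
         * checked against SIZE before the copy. */
        memcpy(buffer, Field_1_Data, Field_1_Size);

        /* ... use buffer ... */
        free(buffer);
    }

A safe parser would reject any Field_1_Size larger than SIZE before performing the copy.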
In this code, the parser allocates a buffer of SIZE bytes. When the parser copies Field_1_Data into the buffer, if the value of Field_1_Size has been enlarged by a malicious user, a buffer overflow occurs because the parser copies far more data than the smaller buffer can hold.

In a file format parsing engine, fields are passed as the parameters of parsing functions. Therefore, if we trace the parameters of parsing functions and the registers, we can divide a file into its fields. The relation between data in the stack frame and binary file data in memory is shown in Fig. 6. On an x86 CPU, a software system stores data in the stack frame in four-byte units, and each stack frame entry is either a number or a pointer to data in memory. These data are passed as the parameters of parsing functions, and the parameters have three types, Address, Number, and Generated value, defined as follows:

TYPE 1 (Address): the address that points to a BAT field in memory. As stated above, a BAT field can be a set of NT fields, data structures, and strings, so the address enables the BFAFI to trace BAT fields continuously until they can be divided into NT fields. Whatever remains that cannot be divided further is regarded as a BAT field.

TYPE 2 (Number): two-byte (16-bit) or four-byte (32-bit) data used as a number. Such data is stored directly in the stack frame.
Fig. 6. Relation between data in the stack frame and binary file data in memory. During parsing of the file, data is divided into various fields.
Table 1
Algorithm for extracting field information from record data.

1   INPUT: assembly code (assem): log data from tracing the execution of a program;
    register (param): the parameter of the parsing function that refers to the file record data
2   OUTPUT: field information set (fields). For the ith field: (1) fields[i].start – start byte of the field;
    (2) fields[i].size – size of the field; (3) fields[i].type – type of the field
3   Extract_Field(assem, param) {
4       Insert param into reg; s = 1;            /* reg is a string array; s is an index into fields */
5       for (i = 1; i <= L; i++) {               /* L: the number of assembly code (assem) lines */
6           for (j = 1; j <= M; j++) {           /* M: size of reg */
7               if (assem[i] includes reg[j]) {              /* reg[j] is an address referring to file data */
8                   if (assem[i] accesses memory of reg[j]) {     /* assem[i] reads file data */
9                       Insert the address that assem[i] accesses into fields[s].start;
10                      s++;
11                  } else if (assem[i] copies reg[j] into R) {   /* R is another register */
12                      Insert R into reg;
13                  }
14                  break;
15              }
16          }
17      }
18      Insert the size of the file record data into fields.size;  /* for calculating the size of the last field */
19      for (k = 1; k <= N - 1; k++) {           /* N: size of fields */
20          fields[k].size = fields[k + 1].start - fields[k].start;
21          if (fields[k].size is 2 or 4) fields[k].type = NT;
22          else fields[k].type = BAT;
23      }
24      return fields;
25  }
The position and the size of each number type field are saved in order to perform fuzz testing.

TYPE 3 (Generated value): a value generated by the software system while parsing the binary file. It can be NT or BAT, but it is not directly related to the fields of the binary file, which is why we do not consider Generated value parameters in the automatic analysis. For instance, various file handles are generated while the Windows system parses files.

Table 1 describes our algorithm for extracting fields from file record data. After writing a log file during program execution, we analyze the log file with this algorithm. The log file includes all the assembly code of the parsing functions that the BFAFI traces. The inputs of the algorithm are the assembly code and a register that is a parameter of the parsing function (line 1); param is the register that refers to the file record data. The algorithm outputs field information consisting of the start position, size, and type of each field (line 2). First, we save param into reg (line 4); reg is a string array holding all registers related to parameters that refer to the file record data, so based on param we trace every register related to the file record data. Next, we check whether the registers in each assembly code line are related to the file record data by comparing reg with assem line by line (lines 5–7). If a line includes a register in reg, we analyze it: if the code accesses another address within the file record data through a related register, we insert that address into fields.start (lines 8–10). For example, if reg includes the ebx register and the assembly code is mov cx, [ebx+0x4], then ebx holds the start address of the file record data and the code accesses the 5th byte from the start of the record, so we record the position [ebx+0x4] as the start of the following field. If an assembly code line copies a register in reg into a new register, we add the new register to reg (lines 11 and 12). The last step is to calculate the size of each field and select its type: we set the size of a field to the difference between fields[k + 1].start and fields[k].start (line 20); if the size is 2 or 4 bytes we set the field type to NT, and otherwise to BAT (lines 21 and 22).
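As a toy illustration of lines 5–12 of Table 1, the following sketch replays the Fig. 8 example on a simplified log format of our own; the real BFAFI parses WinDbg traces and also follows register-to-register copies:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "mov ebx, [ebp+0x10]" in the trace tells us the record pointer
         * (the 3rd parameter) now lives in ebx, so ebx is the tracked base. */
        const char *base = "ebx";
        const char *log[] = {
            "mov edi, [ebx]",        /* read at record offset 0 -> field start */
            "mov cx, [ebx+0x4]",     /* read at record offset 4 -> field start */
        };

        char pat[16];
        snprintf(pat, sizeof pat, "[%s", base);

        for (size_t i = 0; i < sizeof log / sizeof log[0]; i++) {
            const char *hit = strstr(log[i], pat);
            if (hit == NULL)
                continue;                      /* line does not touch the record */
            int off = 0;                       /* plain "[ebx]" means offset 0 */
            sscanf(hit + strlen(pat), "+0x%x", &off);
            printf("field starts at record offset %d\n", off);
        }
        return 0;
    }

The sketch prints start offsets 0 and 4; line 20 of Table 1 then yields a 4-byte field at offset 0, i.e. an NT field.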
5. File fuzz testing for the WMF file format: system design and experiment results

In this section, we explain the system design and the experiment results. We applied the algorithm proposed in Section 4 to the WMF (Windows Metafile) image file format [4] and implemented a debugger that traces and analyzes the fields of the WMF file format automatically. We named the debugger the Binary File Analyzer and Fault Injector (BFAFI). After performing fuzz testing using the BFAFI, we found 11 causes of exceptions in the graphic rendering engine and the system library of Windows XP SP2.

5.1. System design: automatic analysis of the WMF file format

WMF is a vector graphic image file format designed by Microsoft. A WMF file consists of a WMF file header and several records. Each record is composed of three parts: the size of the record, the function number, and the parameters of the function; the number and size of the function parameters are fixed by the function number. To analyze the fields of the WMF file format automatically, we implemented the debugger (BFAFI), which traces the parsing of WMF files in the software system. The BFAFI automatically determines the field positions in a WMF file and the field types (NT or BAT). We implemented the BFAFI using the WinDbg extension interface [25]. The architecture of the BFAFI is shown in Fig. 7: it divides the WMF file format into fields automatically and inserts fault data into the NT fields of the WMF file. The modules of the BFAFI are as follows.

The Binary File Format Analyzer (BFA) traces the parsing of a WMF file and analyzes the field positions and field types of the WMF file format. The BFA is composed of a Software Monitor (SM) and a Log Analyzer (LA). The SM traces the execution of the software system while it parses the WMF file format and writes a log of the execution, including register values and instructions. The SM monitors all instructions of the parsing functions and traces the CPU registers related to them. When the parsing functions are called, the SM writes the values of registers, assembly code, and parameters to the log file.
Fig. 7. Architecture of Binary File Analyzer and Fault Injector tool (BFAFI).
Table 2
Monitoring rule for the WMF file format.

<PROGRAM name='IE' path='C:\Program Files\Internet Explorer\iexplore.exe'>
  <DLLFILE name='gdi32' func_number='1'>
    <FUNCTION name='PlayMetaFileRecord' para_number='4'>
      <PARAM name='hdc' type='Generated value' size=''/>
      <PARAM name='hdc' type='Generated value' size=''/>
      <PARAM name='hdc' type='Generated value' size=''/>
      <PARAM name='hdc' type='Generated value' size=''/>
    </FUNCTION>
  </DLLFILE>
</PROGRAM>
Using the recorded execution information, the LA analyzes the log file and extracts field information from the WMF file records.
Because it takes a long time for the BFA to trace and analyze the WMF file format, the LA analyzes the log file only after the SM has finished writing the log of the parsing functions' execution. The LA then generates field information for the file format and transfers it to the Fault Injection Tool.

The Fault Injection Tool (FIT) inserts fault data into the fields of the WMF file according to the file format and executes the fault-inserted files automatically; this is the module that performs the fuzz testing. When the Fault Injector (FI) inserts faults into a WMF file, it considers the two field types (NT and BAT) of the WMF file format. In this way, the BFAFI lowers the number of fault-inserted files while keeping maximum test case coverage. After the Exception Monitor (EM) executes the target software with a fault-inserted file, it monitors the target software's exception events. If the target software generates an exception event, the EM holds the event and analyzes it.

We defined a configuration rule for the BFAFI using XML, as shown in Table 2.
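The EM's hold-and-analyze behavior can be approximated with the documented Win32 debug API. The sketch below is our reconstruction, not the BFAFI source; the command line and file names are hypothetical:

    #include <windows.h>
    #include <stdio.h>

    /* Launch the target under the debugger and report access violations,
     * as the Exception Monitor does for each fault-inserted file. */
    int main(void)
    {
        STARTUPINFOA si = { sizeof si };
        PROCESS_INFORMATION pi;
        /* hypothetical command line: IE opening one fault-inserted file */
        char cmd[] = "\"C:\\Program Files\\Internet Explorer\\iexplore.exe\" test.html";

        if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE,
                            DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
            return 1;

        DEBUG_EVENT ev;
        while (WaitForDebugEvent(&ev, 10000)) {
            DWORD status = DBG_CONTINUE;
            if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
                const EXCEPTION_RECORD *er = &ev.u.Exception.ExceptionRecord;
                if (er->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
                    printf("AV at address %p\n", er->ExceptionAddress);
                    break;                       /* hold the event and analyze */
                }
                if (er->ExceptionCode != EXCEPTION_BREAKPOINT)
                    status = DBG_EXCEPTION_NOT_HANDLED;
            } else if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT) {
                break;
            }
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        }
        TerminateProcess(pi.hProcess, 0);
        return 0;
    }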
Fig. 8. An example of analyzing a field of the WMF file format.
The configuration rule represents the functions related to a parsing module. The PROGRAM tag gives the name of the program that the SM in the BFA traces, and the DLLFILE tag names a DLL file that includes parsing functions. Inside the PROGRAM tag, we define the target parsing function; the attributes of the PARAM tag are the name, type, and size of each parameter of the parsing function. The parameter types were explained in Section 4.

We now walk through an example in which the BFAFI analyzes the fields of the WMF file format (Fig. 8). The BFAFI traces the graphic rendering engine of Internet Explorer (IE) in order to analyze the WMF file format. IE processes WMF files using GDI32.DLL. Among the functions in this DLL, PlayMetaFileRecord( ) is the main function for parsing a WMF file, and it divides a record into fields. A record includes the information a WMF function uses, such as the function number, record size, and parameters. PlayMetaFileRecord( ) extracts this information from the record, so we select it as the target function. In the BFA, the Software Monitor (SM) traces IE while it reads a WMF file and writes a log whenever IE calls PlayMetaFileRecord( ). The log includes register values, assembly code, and stack data. After IE parses the WMF file, the Log Analyzer (LA) starts to analyze the log. Fig. 8 shows an example of analyzing one field of the WMF file format; we reconstruct the log data for explanation. The SM knows that the third parameter of PlayMetaFileRecord( ) refers to a file record and writes out the record data, 0x04 0x00 0x00 0x00 0x03 0x01 0x08 0x00. STACK emulates the stack frame using push and pop instructions: at push 0x130, the LA inserts 0x130 into STACK. In the assembly code, the third parameter is represented as [ebp+0x10]. First, the LA searches for [ebp+0x10] in the log data. The instruction mov ebx, [ebp+0x10] shows that the address of the third parameter is saved in the ebx register, so from then on the LA searches for both ebx and
[ebp+0x10], because the two values are related to the third parameter. In mov edi, [ebx], the WMF parser accesses the first byte of the record data; the LA recognizes that the parser reads the first byte of the record and that this is the start position of a field. Next, the LA finds ebx in mov cx, [ebx+0x4]; [ebx+0x4] means the parser accesses the 5th byte of the record, so the LA records the 5th byte as the start of the following field. After the LA has analyzed all the instructions in the log, it extracts the fields from the collected start positions. In Fig. 8, bytes 0–3 form one field, and its type is NT (number type) because its size is 4 bytes; the rest of the record data is divided into two fields, 0x03 0x01 and 0x08 0x00. The LA analyzes the remainder of the WMF file format in the same way.

5.2. Experiment results: fuzz testing for the WMF file format

We performed fuzz testing with 10 ordinary WMF files using the BFAFI. First, we configured a rule as shown in Table 2 and analyzed the fields of the WMF file format using the BFAFI. We focused on the PlayMetaFileRecord function because it is the entry function of the parsing engine for the WMF file format; the BFAFI therefore traced the execution of this function.

Automatic Field Analysis: The BFA in the BFAFI analyzes a target WMF file and divides the data in the file into fields. A WMF file consists of a WMF header record and several WMF records [26], and each WMF record consists of a record size field, a record function field, and several parameter fields, as shown in Fig. 9. Fig. 10 compares, for each record, the size of the region accurately analyzed by the BFA with the real size of the record. The X-axis is the function number of a record; the blue bar denotes the real record size and the red bar denotes the size of the region accurately analyzed by the BFA.
Fig. 9. Windows MetaFile (WMF) file format.
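For reference, the header and record layouts below mirror the METAHEADER and METARECORD declarations in the Windows SDK (wingdi.h) and match the three-part record structure just described; the comments are ours:

    #include <stdint.h>

    typedef uint16_t WORD;
    typedef uint32_t DWORD;

    #pragma pack(push, 2)          /* on disk these structures are 2-byte packed */

    /* WMF header record */
    typedef struct tagMETAHEADER {
        WORD  mtType;              /* memory (1) or disk (2) metafile        */
        WORD  mtHeaderSize;        /* header size in 16-bit words            */
        WORD  mtVersion;
        DWORD mtSize;              /* file size in words                     */
        WORD  mtNoObjects;
        DWORD mtMaxRecord;         /* size of the largest record, in words   */
        WORD  mtNoParameters;
    } METAHEADER;

    /* one WMF record: size, function number, then parameters */
    typedef struct tagMETARECORD {
        DWORD rdSize;              /* record size in words -- the NT field the
                                      parser uses for buffer sizes and offsets */
        WORD  rdFunction;          /* function number of the record            */
        WORD  rdParm[1];           /* variable-length parameter array          */
    } METARECORD;

    #pragma pack(pop)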
Fig. 10. Comparison of the real record size and the analyzed region size of each record using the BFA. The blue bar is the real record size and the red bar is the size of the area analyzed by the BFA. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Therefore, the difference between the two bars represents the region that the BFA cannot analyze. For instance, for an ExtTextOut (0x32) record, the real size is 26 bytes and the field positions found by the BFA are 0, 4, 6, 8, 10, and 12. This result shows that the BFA analyzed the region from byte 0 to byte 12 and could not analyze the region from byte 12 to the end of the record; the BFAFI therefore divided the record into five NT fields and one BAT field. We counted the last BAT field as an error, because we do not know whether it could be divided into further NT fields. The BFA exactly analyzed 17 of the 21 record types that compose the 10 target WMF files. The partially analyzed records are SetBkColor (0x01), SetTextColor (0x09), ExtTextOut (0x32), and CreateFontIndirect (0xFB), with analysis accuracies of 40.0%, 40.0%, 46.2%, and 41.1%, respectively. Our algorithm uses a pattern matching method that recognizes fields from the memory addresses accessed by the parsing function, so it cannot cover all cases, and our experiment showed some records that the BFAFI cannot fully analyze. Even so, it analyzed most records exactly.

Fault-inserted File Generation: The FIT in the BFAFI generates fault-inserted files using the field information produced by the BFA. Table 3 shows the analysis results and the number of fault-inserted files generated by the BFAFI and by FileFuzz for each WMF file. We use FileFuzz as the general fuzzer for comparison with the BFAFI because FileFuzz is the most representative file fuzzer. As mentioned above, we use only NT (number type) fields and insert boundary values (B) into them. NT fields come in two sizes, two bytes (NT2) and four bytes (NT4), and we generated fault-inserted files using the boundary values of the two types.
For four-byte fields, B = {-2^31, -2^15, -1, 0, 1, 2^15 - 1, 2^31 - 1}, and for two-byte fields, B = {-2^15, -1, 0, 1, 2^15 - 1} [27]; these are the boundary values of the number types. We did not insert fault data into BAT fields. For instance, the 1.wmf file has 17,259 fields, of which 17,232 are NT fields (NT2 = 16,902, NT4 = 330) and 27 are BAT fields. Focusing on the NT fields, the total number of fault-inserted files is 86,820 (= 16,902 × 5 + 330 × 7); a sketch of this injection step follows.
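A minimal sketch of the FIT's boundary-value injection step, assuming field positions supplied by the BFA; the sample record is the one from Fig. 8, and all file and function names are ours:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Boundary values from this section: 5 for NT2 fields, 7 for NT4 fields. */
    static const int32_t B2[] = { -(1 << 15), -1, 0, 1, (1 << 15) - 1 };
    static const int64_t B4[] = { -(1LL << 31), -(1LL << 15), -1, 0, 1,
                                  (1LL << 15) - 1, (1LL << 31) - 1 };

    /* Write one fault-inserted copy of the sample: the NT field at `start`
     * (size 2 or 4) is overwritten with boundary value `v`. */
    static void inject(const unsigned char *sample, size_t len,
                       size_t start, int size, int64_t v, const char *outname)
    {
        unsigned char buf[1 << 16];
        memcpy(buf, sample, len);
        memcpy(buf + start, &v, (size_t)size);   /* low bytes first: x86 is little-endian */
        FILE *f = fopen(outname, "wb");
        if (f) { fwrite(buf, 1, len, f); fclose(f); }
    }

    int main(void)
    {
        /* tiny stand-in sample: the record from Fig. 8; a real run loads 1.wmf */
        unsigned char sample[] = { 0x04,0x00,0x00,0x00, 0x03,0x01, 0x08,0x00 };
        char name[32];
        int n = 0;

        /* the record-size field at offset 0 is NT4: 7 boundary files */
        for (size_t i = 0; i < sizeof B4 / sizeof B4[0]; i++) {
            snprintf(name, sizeof name, "fault_%03d.wmf", n++);
            inject(sample, sizeof sample, 0, 4, B4[i], name);
        }
        /* the function-number field at offset 4 is NT2: 5 boundary files */
        for (size_t i = 0; i < sizeof B2 / sizeof B2[0]; i++) {
            snprintf(name, sizeof name, "fault_%03d.wmf", n++);
            inject(sample, sizeof sample, 4, 2, B2[i], name);
        }
        return 0;
    }

For 1.wmf, sweeping all 16,902 NT2 and 330 NT4 fields this way yields the 86,820 files shown in Table 3.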
Exception Analysis: We performed fuzz testing of Internet Explorer (IE) on Windows XP SP2 using the fault-inserted files, as shown in Table 3. By opening an HTML document that embeds each fault-inserted WMF file, we executed fuzz testing of the WMF parsing engine. We focused on the Abort failure among the CRASH scale categories [28], the state in which a program terminates abnormally. On Windows, this failure corresponds to an Access Violation (AV) exception, which can indicate a security hole. We found various exceptions in the graphic rendering engine (GDI32.dll) and the system library (kernel32.dll), and we named them EX1, EX2, ..., EX11. Fig. 11 compares the number of exceptions found by the BFAFI and FileFuzz. For an exact comparison, the results are based on fuzz testing with the same number of fault-inserted files from each tool: in the case of 1.wmf, we ran the BFAFI with its 86,820 fault-inserted files and FileFuzz with 86,820 randomly selected fault-inserted files. As shown in Fig. 11, the BFAFI found approximately 14 times more exceptions than FileFuzz with the same number of fault-inserted files, and many exceptions were found only by the BFAFI; for instance, while FileFuzz found only 43 EX1 exceptions in the 1.wmf file, the BFAFI found 340. Next, we compared the number of causes of exceptions found by the BFAFI and FileFuzz, respectively. As shown in Fig. 12, the BFAFI found more causes of exceptions than FileFuzz.
Table 3
Analysis results from the BFA and the number of fault-inserted files generated by the BFAFI and FileFuzz. NT2 denotes a two-byte NT field and NT4 a four-byte NT field. (Field = NT2 + NT4 + BAT; BFAFI = NT2 × 5 + NT4 × 7.)

File name   File size (KB)   Field    NT2      NT4   BAT   BFAFI    FileFuzz
1.wmf       35               17,259   16,902   330   27    86,820   9,175,040
2.wmf       3                766      476      202   88    3,794    786,432
3.wmf       14               6,212    5,722    462   28    31,844   3,670,016
4.wmf       6                2,566    2,163    367   36    13,384   1,572,864
5.wmf       8                3,113    2,666    421   26    16,277   2,097,152
6.wmf       1                149      80       31    38    617      262,144
7.wmf       9                4,133    4,029    77    27    20,684   2,359,296
8.wmf       37               18,265   18,112   126   27    91,442   9,699,328
9.wmf       4                1,294    1,069    169   56    6,528    1,048,576
10.wmf      8                3,404    3,070    298   36    17,436   2,097,152
Fig. 11. Total number of exceptions found by the BFAFI (a) and FileFuzz (b).
Fig. 12. Number of causes of exceptions found by the BFAFI and FileFuzz. I is the case of the BFAFI and II is the case of FileFuzz.
Table 4
Causes of exceptions found by the BFAFI: the position in memory of the assembly code that causes each exception, and the code itself.

Name   Position                               Assembly code
EX1    GDI32!PlayMetaFileRecord+0xcfa         movzx ecx, word ptr [eax]
EX2    GDI32!PlayMetaFile+0x262               movsx ecx, word ptr [ebx]
EX3    GDI32!PlayMetaFileRecord+0x2bd         rep movs dword ptr es:[edi], dword ptr [esi]
EX4    GDI32!PlayMetaFileRecord+0x130         rep movs dword ptr es:[edi], dword ptr [esi]
EX5    GDI32!PlayMetaFileRecord+0x458         rep movs dword ptr es:[edi], dword ptr [esi]
EX6    GDI32!PlayMetaFileRecord+0x712         rep movs dword ptr es:[edi], dword ptr [esi]
EX7    GDI32!PlayMetaFileRecord+0x59f         movsx ecx, word ptr [ebx+eax*2-4]
EX8    GDI32!PlayMetaFileRecord+0xda3         movsx ebx, word ptr [edx]
EX9    kernel32!MultiByteToWideChar+0x7e7     movzx ecx, byte ptr [edi]
EX10   GDI32!PlayMetaFileRecord+0xd95         movsx ebx, word ptr [edx]
EX11   GDI32!PlayMetaFileRecord+0x56b         movsx edx, word ptr [ebx]
There are five causes of exceptions that were found only by the BFAFI: EX3, EX5, EX8, EX9, and EX10. In fuzz testing, finding the causes of exceptions depends on the test case coverage of the fuzzer; we can therefore conclude that the BFAFI maximizes test case coverage for a given number of fault-inserted files.

Last, we analyzed the causes of the exceptions that the BFA found by debugging the exceptions. Using symbol files provided by Microsoft, we located the memory addresses and the positions in modules where the exceptions are generated, as shown in Table 4. Ten of the 11 exceptions are generated by the graphic rendering engine (GDI32.dll); the other is generated by the system library (kernel32.dll), as string processing in the graphic rendering engine triggers the exception. In Table 4, the position column gives the memory position of the assembly code that causes each exception, and the corresponding instruction appears on the same row in the assembly code column. Because Microsoft products are not open source software, we analyzed the exceptions only at the assembly code level. All exceptions are generated by wrong memory accesses when the code reads data from memory. In the case of EX1, the eax register accesses a memory address that the operating system did not allocate, so eax causes an Access Violation (AV), the Windows exception for invalid memory access.

6. Limitation and future work

We proposed a methodology that extracts the field information of a file format by tracing its parsing functions. The limitation of our method is that the parsing function for the file format must be identified in advance. In our experiments, we manually analyzed the parsing module and learned that the PlayMetaFileRecord function is central to the WMF parsing module. If we know the parsing functions for various file formats, we can divide the file data into fields; however, analyzing a parsing module is tedious and time-consuming. To overcome this limitation, we have been developing an extended BFAFI that traces all functions of a parsing module and automatically reconstructs the control flow of the parsing functions. With it, the BFAFI will be able to perform the whole file fuzz testing process while automatically considering the file format.
7. Conclusion

The algorithm we propose successfully reduces the number of fault-inserted files while maintaining maximum test case coverage. It automatically analyzes the fields of a binary file format and inserts fault data into those fields based on the format. To do so, we trace and analyze stack frames and assembly code while the software system reads the binary file. To evaluate our method, we applied it to the WMF file format and implemented a debugger called the Binary File Analyzer and Fault Injector (BFAFI). This practical tool traces the program as it executes and analyzes the fields of WMF files automatically. In the field analysis experiment, the BFA exactly analyzed 17 of the 21 record types that composed the 10 target WMF files. The BFAFI also reduced the total number of fault-inserted files while keeping maximum test case coverage: it generated 86,820 fault-inserted files for a 35 KB WMF file, only 0.95% (= 86,820/9,175,040) of the files required by FileFuzz. The exception analysis experiments showed that the BFAFI was capable of finding about 14 times more exceptions than FileFuzz while needing the same number of fault-inserted files; furthermore, many of the exceptions were detected only by the BFAFI. The BFAFI also captured 11 causes of exceptions, five of which were captured only by the BFAFI. Given that a fuzzer must have good test coverage to find the causes of exceptions well, we demonstrated that the BFAFI maximizes test case coverage for a given number of fault-inserted files; in other words, obtaining the same number of exceptions and causes of exceptions with a general fuzzer requires a much larger number of fault-inserted files. Among the 11 causes of exceptions, 10 were found in the graphic rendering engine (GDI32.dll) and one in the system library (kernel32.dll) of Windows XP SP2.

References

[1] K.R. Van Wyk, Software security: integrating security tools into a secure software development process, in: 19th Annual FIRST Conference, 2007.
[2] A. Takanen, J.D. Demott, C. Miller, Fuzzing for Software Security Testing and Quality Assurance, Artech House, 2008.
[3] FILExt – The File Extension Source.
[4] C. Petzold, Programming Windows, fifth ed., Microsoft Press, 1998, pp. 1097–1170.
[5] B.P. Miller, L. Fredriksen, B. So, An empirical study of the reliability of UNIX utilities, Communications of the ACM 33 (12) (1990) 32–44.
[6] M. Sutton, A. Greene, P. Amini, Fuzzing: Brute Force Vulnerability Discovery, Addison-Wesley, 2007.
[7] J.M. Voas, G. McGraw, Software Fault Injection: Inoculating Programs Against Errors, John Wiley and Sons, 1998.
[8] P. Oehlert, Violating assumptions with fuzzing, IEEE Security and Privacy Magazine 3 (2) (2005) 58–62.
[9] J.E. Forrester, B.P. Miller, An empirical study of the robustness of Windows NT applications using random testing, in: 4th USENIX Windows System Symposium, August 2000.
[10] B.P. Miller, G. Cooksey, F. Moore, An empirical study of the robustness of MacOS applications using random testing, in: International Symposium on Software Testing and Analysis, July 2006.
[11] FileFuzz.
[12] M. Sutton, A. Greene, The art of file format fuzzing, in: Black Hat USA Conference, 2005.
[13] SPIKEfile.
[14] D. Aitel, The Advantage of Block-based Protocol Analysis for Security Testing, Immunity Inc., 2002.
[15] PROTOS.
[16] M. Vuagnoux, Autodafe: an act of software torture, in: 22nd Chaos Communication Congress, 2005.
[17] J. Caballero, P. Poosankam, C. Kreibich, D. Song, Bidirectional Protocol Reverse Engineering: Message Format Extraction and Field Semantics Inference, Technical Report EECS-2009-57, 2009.
[18] P.M. Comparetti, G. Wondracek, C. Kruegel, E. Kirda, Prospex: protocol specification extraction, in: IEEE Symposium on Security and Privacy, 2009.
[19] W. Cui, J. Kannan, H.J. Wang, Discoverer: automatic protocol reverse engineering from network traces, in: 16th USENIX Security Symposium, 2007.
[20] J. Caballero, D. Song, Polyglot: automatic extraction of protocol format using dynamic binary analysis, in: 14th ACM Conference on Computer and Communications Security, 2007.
[21] W. Cui, M. Peinado, K. Chen, H.J. Wang, L. Irun-Briz, Tupni: automatic reverse engineering of input formats, in: 15th ACM Conference on Computer and Communications Security, 2008.
[22] Z. Lin, X. Jiang, D. Xu, X. Zhang, Automatic protocol reverse engineering through context-aware monitored execution, in: 15th Symposium on Network and Distributed System Security, 2008.
[23] G. Wondracek, P.M. Comparetti, C. Kruegel, E. Kirda, Automatic network protocol analysis, in: 15th Symposium on Network and Distributed System Security, 2008.
[24] IA-32 Intel Architecture Optimization Reference Manual, Intel Corporation, June 2005.
[25] Debugging Tools for Windows.
[26] Microsoft Corporation, Windows Metafile Format Specification, Microsoft Developer Network Library, 2009.
[27] M. Schmid, F. Hill, Data generation techniques for automated software robustness testing, in: 6th International Conference on Testing Computer Software, 1999.
[28] P. Koopman, J. Sung, C. Dingman, D. Siewiorek, T. Marz, Comparing operating systems using robustness benchmarks, in: 16th Symposium on Reliable Distributed Systems, 1997.

Hyoung Chun Kim is currently a senior research engineer at the Attached Institute of the Electronics and Telecommunications Research Institute. His research interests include software security, intrusion detection systems, and network security. He received his B.Sc. and M.Sc. degrees in computer science from Korea University, Korea, in 1999 and 2001, respectively.

Young Han Choi is currently a research engineer at the Attached Institute of the Electronics and Telecommunications Research Institute. His research interests include software security, intrusion detection systems, and operating systems. He received his B.Sc. and M.Sc. degrees in electronic engineering from Hanyang University and the Korea Advanced Institute of Science and Technology, Korea, in 2002 and 2004, respectively.

Dong Hoon Lee is a Professor and the Director of Academic Affairs of the Graduate School of Information Security at Korea University, Seoul, Korea. His research interests include information security, cryptology, and ubiquitous security. He received a B.S. (1985) in Economics from Korea University and M.Sc. (1988) and Ph.D. (1992) degrees from the University of Oklahoma.