Journal of Systems Architecture 57 (2011) 259–268
Efficient file fuzz testing using automated analysis of binary file format

Hyoung Chun Kim a, Young Han Choi a, Dong Hoon Lee b,*

a The Attached Institute of Electronics and Telecommunications Research Institute (ETRI), P.O. Box 1, Yuseong Post Office, Daejeon 305-600, Republic of Korea
b The Graduate School of Information Management and Security, Korea University, Anam-dong, Sungbuk-ku, Seoul 136-701, Republic of Korea
Article history: Received 15 October 2009; Received in revised form 27 February 2010; Accepted 17 March 2010; Available online 20 March 2010

Keywords: Software testing; Fuzzing; Security testing
Abstract

Fuzz testing is regarded as the most useful technique for finding serious security holes in a software system: it feeds unexpected data into the input of the software system and finds the system's bugs or errors. However, one disadvantage of fuzz testing executed on binary files is that it requires a large number of fault-inserted files to cover every test case, up to 2^(8·FILESIZE) files. To overcome this drawback, we propose a novel algorithm that efficiently reduces the number of fault-inserted files while still maintaining maximum test case coverage. The proposed approach enables automatic analysis of the fields of binary files by tracking and analyzing stack frames, assembly code, and registers as the software system parses the files. We evaluate the efficacy of the new method with a practical tool, the Binary File Analyzer and Fault Injector (BFAFI), which traces program execution and analyzes the fields of a binary file format. Our experiments demonstrate that the BFAFI reduced the total number of fault-inserted files while keeping maximum test case coverage, and that it detected approximately 14 times more exceptions than a general fuzzer. The BFAFI also found 11 causes of exceptions, five of which were found only by the BFAFI. Ten of the 11 causes were generated by a graphic rendering engine (GDI32.dll); the other was generated by the system library (kernel32.dll) in Windows XP SP2.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction

Attackers commonly exploit vulnerabilities in the file-parsing code of software systems. Such vulnerabilities can compromise a system when a user simply opens a file containing exploit code. Recently, an increasing number of file-related vulnerabilities have been found in web browsers, office software, email clients, and multimedia players, and the risks have become more serious with the content-sharing trend of the Web 2.0 era. Zero-day attacks in particular are becoming widespread before security patches are in place. It is therefore critical to identify software security problems through security testing before attackers do. Among the many security testing techniques, fuzz testing is the most useful for finding serious security holes in a software system. For example, Microsoft finds about 20–25% of its security bugs by fuzzing a product before it is shipped [1]. Fuzz testing as a security test inserts random data or faults into the input of the software system and watches for software exceptions [2]. Similarly, file fuzz testing inserts fault data into files, has the software read them, and detects the resulting exceptions.
To cover all test cases, file fuzz testing must generate a large number of fault-inserted files, up to 2^(8·FILESIZE), where FILESIZE is the total size of the file in bytes. To lower the total number of fault-inserted files used for testing, the file format should be analyzed first; however, this is a time-consuming task, given that there are over 14,000 file formats [3] and most of them are not open to the public. In this paper, we propose a new methodology that reduces the number of fault-inserted files used for file fuzz testing by automatically analyzing binary file formats, thus maximizing test case coverage. To do so, we divide a binary file format into fields, the minimum logically meaningful units in file processing, and insert fault data into the fields according to their data types. As a result, we decrease the total number of fault-inserted files while maintaining maximum test case coverage. To analyze the fields of a file format automatically, we trace the execution of the software system while it parses the target file: the data in a binary file is divided into fields, and these fields are passed as parameters to parsing functions. With this in mind, we implemented a practical tool that traces and analyzes the execution of the software system. We verified our approach by applying it to the WMF image file format [4] and implemented a debugger that traces and analyzes the execution of WMF file parsing software. We named the debugger the Binary File Analyzer and Fault Injector (BFAFI).
The BFAFI detected many more exceptions than a general fuzzer given the same number of fault-inserted files, and many of the exceptions were found only by the BFAFI. The BFAFI also found many more causes of exceptions than the general fuzzer. In other words, a general fuzzer needs a much larger number of fault-inserted files to identify the same exceptions and causes that the BFAFI detects. In this paper, our methodology focuses on the buffer overflow vulnerability because it is the most common and most dangerous problem in software systems. The key contributions of this paper are as follows:

- We propose a methodology that efficiently reduces the total number of fault-inserted files for file fuzz testing by analyzing the parsing mechanism of the software system in real time.
- To prove the efficacy of our approach, we implemented a practical tool, the BFAFI, which analyzed the parsing of the WMF file format and automatically divided it into fields.
- The BFAFI found many more exceptions than a general fuzzer with the same number of fault-inserted files, including exceptions detected only by the BFAFI, and it found many more causes of exceptions than the general fuzzer did. Ten of the 11 causes of exceptions found were generated by a graphic rendering engine (GDI32.dll) and the other by a system library (kernel32.dll) in Windows XP SP2.

The rest of the paper is organized as follows. Section 2 introduces related work and compares it with our research. Section 3 explains the problem of binary file fuzz testing. Section 4 proposes a new method that automatically divides binary file data into fields. Section 5 shows how the method is implemented for the WMF file format and reports the performance of the BFAFI in experiments. Section 6 discusses the limitations of our research and their solutions, and Section 7 concludes the paper.

2. Related work

Fuzz testing is a method that inserts faults into the input data of a software system in order to find software exceptions [5–7]; such exceptions can indicate security vulnerabilities in the software system. Fig. 1 depicts the general methodology of fuzz testing. Input is the entry point of fuzz testing and can take various forms, such as files, configurations, registry entries, APIs, user interfaces, and network interfaces [8]. Output is the result of the software system's processing of the fuzzed input. Throughout this process, fuzz testing monitors any exceptions or crashes that the software system experiences. In our research, we focus on binary files as input. A binary file is not human-readable because it consists of binary codes such as 0xD3 0x3E. Binary file fuzz testing inserts unexpected data (faults) into binary files to make fuzzed data (fault-inserted files), lets the software system read them, and watches for exceptions or crashes. Generally, fuzz testing is of two types: generation and data mutation [8]. Generation builds test cases (fuzzed data) from a specification after the target protocol (or format) has been analyzed completely; data mutation builds test cases by simply inserting faults into existing sample files.
Miller et al. [5] first introduced fuzz testing, inserting random fault data into the input of UNIX system utilities using data mutation. The same group later performed fuzz testing on Windows NT and MacOS [9,10]. Random data is easy to generate because the input data format need not be considered; as a result, when fuzz testing is performed, most random inputs are simply rejected as errors. FileFuzz [11,12] and SPIKEfile [13] perform file-oriented fuzz testing using data mutation. FileFuzz inserts faults into a file at random without considering the file format. SPIKEfile, based on SPIKE [14], which introduced a framework for network fuzz testing, uses the same fault-insertion method as FileFuzz. Our methodology, in contrast, inserts fault data into the file with the binary file format taken into account and consequently achieves maximum test case coverage with the minimum number of fault-inserted files.

PROTOS [15] is a network fuzzer that uses generation. PROTOS implements the network protocol under test and inserts fault data into network packets. By design, the user must analyze each network protocol manually, which takes a very long time per target protocol. We also use semi-generation for fuzz testing, but our methodology analyzes the binary file format automatically using the BFAFI. Autodafe [16] inserts fault data into input by considering protocol fields that are passed to dangerous functions, and it traces the fault data using a debugging mechanism. We also implemented our tool on a debugging mechanism, but our tool analyzes the fields of binary files automatically.

Automatic protocol reverse engineering analyzes protocol formats automatically, targeting network protocols and file formats. Owing to the complexity of file formats, most of these approaches focus on network protocols. Recently, protocol reverse engineering has been used in various research fields, such as the analysis of botnet communication protocols [17] and fuzz testing of network protocols [18]. Discoverer [19] is a tool for automatic reverse engineering of network protocols: it clusters network messages with the same format and infers the message format by comparing the messages within a cluster. However, it has two limitations, trace dependency and pre-defined semantics: as stated in [19], the information in the network traces determines the accuracy of the results, and only pre-defined protocol semantics can be recognized. In particular, trace-based approaches [19,20] become ineffective on encrypted network traffic. Cui et al. [21] presented a tool that can analyze arbitrary record sequences of file formats and network protocols; after marking the input data as tainted, they track the data through all instructions in loop modules. The limitation of this procedure is that the coverage of the reverse-engineered format depends on the diversity of the input samples. Lin et al. [22] presented a tool that extracts network protocol fields by monitoring program execution. Wondracek et al. [23] dynamically monitor the execution of the application and analyze how the program processes received messages, using dynamic taint analysis to track the propagation and processing of data in the application. While these approaches monitor delimiters, loops, comparisons, and the like during program execution, our approach monitors the starting point of each field in a file by tracing instructions and registers in order to find number type fields, which are closely related to security problems. In this way we can significantly reduce the fuzzing area in file fuzz testing.
3. Problem: Number of test cases for binary file fuzz testing
Fig. 1. Methodology of fuzz testing.
One of the problems of file fuzz testing is that it generates a large number of fault-inserted files to cover all test cases. The number of fault-inserted files increases exponentially with the file size.
Fig. 2. File data expressed as a byte set. P_i is the start position of B_i, S_i is the size of B_i, and B_i is the block defined by P_i and S_i. i and j are positive numbers; N_B is the number of blocks.
In the worst case, it could generate up to 2^(8·FILESIZE) files, many of which are not necessary for successful testing: a software system parsing a fault-inserted file that was generated without regard to the file format may reject the file as invalid before the inserted fault ever reaches faulty code. Therefore, file fuzz testing does not need to generate all test cases to complete testing. Fig. 2 shows binary file data expressed as a byte set. We define a block as an area into which fault data will be inserted; most files consist of multiple blocks. The ith block B_i is given by its start position P_i and its size S_i, and no blocks overlap. B denotes the total area into which fault data can be inserted. Once B is determined, the total number of fault-inserted files is:
Total number of fault-inserted files ∝ number of data mutation cases of B.

The total number of fault-inserted files is proportional to the number of data mutations of B; in other words, the test cases of B decide the number of fault-inserted files. The test cases of B depend on the size of the fault-inserted area and the number of variations of the byte data. The worst case occurs when N_B = 1, P_1 = 1, S_1 = FILESIZE, and each byte ranges over 0x00 to 0xFF; the total number of generated files in that case is 2^(8·FILESIZE). When FILESIZE is large, it is impractical to generate such a huge number of files during testing. When faults are inserted without considering the file format, a large number of files must be generated to reach high test case coverage. In FileFuzz [8], the tester decides the fault insertion position and the fault value, and FileFuzz generates one fault-inserted file per fault position and value. For instance, to test a 35 KB WMF file sufficiently, FileFuzz generates 9,175,040 (= 35 × 1024 × 2^8) fuzzed files, where 2^8 is the number of values one byte can take, from 0x00 to 0xFF. Our tool, by contrast, generates only 86,820 files for a WMF file of the same size while covering most test cases. This is only 0.95% (= 86,820/9,175,040) of the files generated by FileFuzz, as shown in Section 5.
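Restating the two counts explicitly in our notation: exhaustive mutation of every byte at once yields

    (2^8)^FILESIZE = 2^(8·FILESIZE) files,

whereas a one-byte-at-a-time sweep over all positions and values, as FileFuzz performs, yields

    FILESIZE × 2^8 files = 35 × 1024 × 256 = 9,175,040 for the 35 KB example.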
There is another problem in fuzz testing that uses a random function: because the random function chooses the start position of each fault insertion at random, fault-inserted areas can be duplicated, so a huge number of fault-inserted files is needed to reach high test case coverage. We ran an experiment to measure how often such duplication occurs, using the Random class of the C# language (a sketch reproducing the experiment appears at the end of this section). First, we define coverage as the ratio of the total fault-inserted area to the file size, with one-byte faults; 100% coverage means the random function has touched every byte at least once. Fig. 3 shows the test case coverage of fuzz testing using a random function. There are two parameters in the experiment: the target file size, on the X-axis, and the number of calls to the random function, in the legend. In (a), we targeted 1, 10, 100, and 1000 KB files and performed fuzz testing by calling the random function 1000 and 10,000 times, respectively. In (b), we called the random function 100,000 and 1,000,000 times, respectively, on files of 1, 2, 5, 10, 15, and 20 MB. If the file size is 1 MB and the random function is called 100,000 times, coverage is only about 10%. This result shows that the larger the file, the more test case files must be generated to cover all test cases; coverage drops rapidly for larger files because of duplicated fault insertions. Therefore, the file format must be analyzed before inserting faults into files. Analyzing a file format means finding the number of blocks N_B and each block's position P_i and size S_i. Because our BFAFI tool can also find the types of the fields, we need to perform fuzz testing only at the necessary positions with the necessary values; by doing so, we cut down the number of test cases while achieving high test case coverage. In the next section, we explain the method that automatically analyzes the fields of binary files by tracing the parsing mechanism of the software system that processes them.
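The duplication effect is easy to reproduce. The following minimal sketch re-creates the experiment; the paper used the Random class of C#, whereas this sketch uses C's rand(), and all names are ours:

    #include <stdio.h>
    #include <stdlib.h>

    /* Draw `calls` random one-byte fault positions in a file of `filesize`
     * bytes and report how many distinct positions were hit (coverage). */
    static double coverage(size_t filesize, size_t calls)
    {
        unsigned char *hit = calloc(filesize, 1);
        size_t distinct = 0;
        if (hit == NULL)
            return 0.0;
        for (size_t i = 0; i < calls; i++) {
            /* rand() may yield only 15 bits (e.g. on Windows); combine two calls */
            size_t pos = (((size_t)rand() << 15) | (size_t)rand()) % filesize;
            if (!hit[pos]) { hit[pos] = 1; distinct++; }
        }
        free(hit);
        return 100.0 * distinct / filesize;
    }

    int main(void)
    {
        srand(1);
        /* 1 MB file, 100,000 calls: about 10% coverage, as in Fig. 3(b) */
        printf("coverage: %.1f%%\n", coverage(1u << 20, 100000));
        return 0;
    }

With a 1 MB file and 100,000 calls, roughly 9–10% of the bytes are hit, in line with Fig. 3(b).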
4. Automatic analysis of binary file format fields and implementation

In this section, we propose a novel analysis algorithm that divides a binary file format into fields so that fault data can be inserted with the file format taken into account. The algorithm makes it possible to reach maximum test case coverage efficiently, with a small number of fault-inserted files, compared with file fuzzers that ignore the file format. Fig. 4 shows our methodology for file fuzz testing, which analyzes the file format automatically and finds abnormal behavior of a program. Our research focuses on analyzing the file format and producing the minimum number of fault-inserted files with maximum test case coverage. By tracing the execution of the program and analyzing it at the assembly level, we extract the field information of a sample file.
Fig. 3. Test case coverage using the Random class in C#. The Y-axis is the coverage for the targeted file size. The random function is called 1000 and 10,000 times on the smaller files in (a), and 100,000 and 1,000,000 times on the larger files in (b).
Fig. 4. Our methodology for file fuzz testing.
After inserting faults into the sample file using the field information, we let the target program read the fault-inserted files. If the program generates an exception event, we analyze the abnormal behavior in order to find the cause of the exception.

First, we explain the file parsing mechanism of a software system in terms of the file format. To use the data in files meaningfully, software systems divide the data into various fields. A field is the minimum unit that represents a data type in a file format. For example, the BMP image file format stores images using bits (0 or 1), and the system divides the file data into a file header and bitmap data when it reads a BMP file. The file header includes the header size, the size of the bitmap, the number of bits per pixel, and so on. The bitmap data holds the actual image data, which is parsed based on the file header; the software system loads the bitmap data into memory and displays it on the screen using the parsed fields.

To divide a binary file into fields, our analysis tool collects and analyzes the data generated during the parsing process, such as stack frames, assembly code, and registers. Stack frames and registers point to data in the file, and the assembly code determines how the data is divided into fields based on the file format. A file format parsing engine usually has many functions and processes various data at run time; therefore, by tracing the parsing functions and the run-time data related to parsing, we can divide the file data into fields based on the file format. We classify file fields into two types, number type and byte array type, defined as follows:

Number type (NT): In most cases, NT data is used as a number while the software system parses files. On x86 CPUs [24], most numbers are stored as binary values of 2 or 4 bytes. NT correlates with security problems such as buffer overflow vulnerabilities: a software system parses an NT field as the size of a buffer or as the relative address (offset) of a field position in memory. For instance, the first field of a record in the WMF file format is the size of the record; it determines the size of the allocated memory buffer and the relative address that the parsing module will access next. In many security problems, these values are used to trigger a buffer overflow. Therefore, we focus on NT fields for successful security testing.

Byte array type (BAT): We define BAT as every field that is not NT. It can be a set of number types, byte types, data structures, strings, and so on. We did not insert faults into BAT fields in our experiments, because BAT fields have less influence on security problems than NT fields.

In this paper, we focus on NT fields because this type is related to security holes such as buffer overflows. Fig. 5 shows an example of a buffer overflow triggered when a parsing module receives a wrong value.
Fig. 5. An example of generating a buffer overflow when a parsing module receives a wrong value.
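Since Fig. 5 is reproduced here only as a caption, the following C sketch reconstructs the pseudo code it describes; Field_1_Size and Field_1_Data follow the figure's naming, and everything else (SIZE, the function name) is our assumption:

    #include <stdlib.h>
    #include <string.h>

    #define SIZE 64  /* fixed buffer size assumed by the parser */

    /* Parse one field: a 4-byte size followed by the field data.
     * If an attacker enlarges Field_1_Size in the file, memcpy writes
     * past the SIZE-byte buffer -- a classic buffer overflow. */
    void parse_field(const unsigned char *record)
    {
        unsigned int Field_1_Size;
        memcpy(&Field_1_Size, record, 4);          /* NT field: size of the data */
        const unsigned char *Field_1_Data = record + 4;

        unsigned char *buffer = malloc(SIZE);      /* buffer of SIZE bytes */
        if (buffer == NULL)
            return;

        /* VULNERABLE: Field_1_Size comes from the file and is not
         * checked against SIZE before the copy. */
        memcpy(buffer, Field_1_Data, Field_1_Size);

        /* ... use buffer ... */
        free(buffer);
    }

A safe parser would reject any Field_1_Size larger than SIZE before performing the copy.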
In this code, the parser allocates a buffer of SIZE bytes. When the parser copies Field_1_Data into the buffer, if the value of Field_1_Size has been enlarged by a malicious user, a buffer overflow occurs because the parser copies far more data than the smaller buffer can hold.

In a file format parsing engine, fields are passed as the parameters of parsing functions. Therefore, if we trace the parameters of parsing functions and the registers, we can divide a file into its fields. The relation between data in the stack frame and binary file data in memory is shown in Fig. 6. On an x86 CPU, a software system stores data in the stack frame in four-byte units, and each stack frame entry is either a number or a pointer to data in memory. These data are passed as the parameters of parsing functions, and the parameters have three types, Address, Number, and Generated value, defined as follows:

TYPE 1 (Address): the address that points to a BAT field in memory. As stated above, a BAT field can be a set of NT fields, data structures, and strings, so the address enables the BFAFI to trace BAT fields continuously until they can be divided into NT fields. Whatever remains that cannot be divided further is regarded as a BAT field.

TYPE 2 (Number): two-byte (16-bit) or four-byte (32-bit) data used as a number. Such data is stored directly in the stack frame.
Fig. 6. Relation between data in the stack frame and binary file data in memory. During parsing of the file, data is divided into various fields.
Table 1
Algorithm for extracting field information from record data.

1   INPUT: assembly code (assem): log data from tracing the execution of a program;
    register (param): the parameter of the parsing function that refers to the file record data
2   OUTPUT: field information set (fields). For the ith field: (1) fields[i].start – start byte of the field;
    (2) fields[i].size – size of the field; (3) fields[i].type – type of the field
3   Extract_Field(assem, param) {
4       Insert param into reg; s = 1;            /* reg is a string array; s is an index into fields */
5       for (i = 1; i <= L; i++) {               /* L: the number of assembly code (assem) lines */
6           for (j = 1; j <= M; j++) {           /* M: size of reg */
7               if (assem[i] includes reg[j]) {              /* reg[j] is an address referring to file data */
8                   if (assem[i] accesses memory of reg[j]) {     /* assem[i] reads file data */
9                       Insert the address that assem[i] accesses into fields[s].start;
10                      s++;
11                  } else if (assem[i] copies reg[j] into R) {   /* R is another register */
12                      Insert R into reg;
13                  }
14                  break;
15              }
16          }
17      }
18      Insert the size of the file record data into fields.size;  /* for calculating the size of the last field */
19      for (k = 1; k <= N - 1; k++) {           /* N: size of fields */
20          fields[k].size = fields[k + 1].start - fields[k].start;
21          if (fields[k].size is 2 or 4) fields[k].type = NT;
22          else fields[k].type = BAT;
23      }
24      return fields;
25  }
The position and the size of each number type field are saved in order to perform fuzz testing.

TYPE 3 (Generated value): a value generated by the software system while parsing the binary file. It can be NT or BAT, but it is not directly related to the fields of the binary file, which is why we do not consider Generated value parameters in the automatic analysis. For instance, various file handles are generated while the Windows system parses files.

Table 1 describes our algorithm for extracting fields from file record data. After writing a log file during program execution, we analyze the log file with this algorithm. The log file includes all the assembly code of the parsing functions that the BFAFI traces. The inputs of the algorithm are the assembly code and a register that is a parameter of the parsing function (line 1); param is the register that refers to the file record data. The algorithm outputs field information consisting of the start position, size, and type of each field (line 2). First, we save param into reg (line 4); reg is a string array holding all registers related to parameters that refer to the file record data, so based on param we trace every register related to the file record data. Next, we check whether the registers in each assembly code line are related to the file record data by comparing reg with assem line by line (lines 5–7). If a line includes a register in reg, we analyze it: if the code accesses another address within the file record data through a related register, we insert that address into fields.start (lines 8–10). For example, if reg includes the ebx register and the assembly code is mov cx, [ebx+0x4], then ebx holds the start address of the file record data and the code accesses the 5th byte from the start of the record, so we record the position [ebx+0x4] as the start of the following field. If an assembly code line copies a register in reg into a new register, we add the new register to reg (lines 11 and 12). The last step is to calculate the size of each field and select its type: we set the size of a field to the difference between fields[k + 1].start and fields[k].start (line 20); if the size is 2 or 4 bytes we set the field type to NT, and otherwise to BAT (lines 21 and 22).
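As a toy illustration of lines 5–12 of Table 1, the following sketch replays the Fig. 8 example on a simplified log format of our own; the real BFAFI parses WinDbg traces and also follows register-to-register copies:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "mov ebx, [ebp+0x10]" in the trace tells us the record pointer
         * (the 3rd parameter) now lives in ebx, so ebx is the tracked base. */
        const char *base = "ebx";
        const char *log[] = {
            "mov edi, [ebx]",        /* read at record offset 0 -> field start */
            "mov cx, [ebx+0x4]",     /* read at record offset 4 -> field start */
        };

        char pat[16];
        snprintf(pat, sizeof pat, "[%s", base);

        for (size_t i = 0; i < sizeof log / sizeof log[0]; i++) {
            const char *hit = strstr(log[i], pat);
            if (hit == NULL)
                continue;                      /* line does not touch the record */
            int off = 0;                       /* plain "[ebx]" means offset 0 */
            sscanf(hit + strlen(pat), "+0x%x", &off);
            printf("field starts at record offset %d\n", off);
        }
        return 0;
    }

The sketch prints start offsets 0 and 4; line 20 of Table 1 then yields a 4-byte field at offset 0, i.e. an NT field.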
5. File fuzz testing for the WMF file format: system design and experiment results

In this section, we explain the system design and the experiment results. We applied the algorithm proposed in Section 4 to the WMF (Windows Metafile) image file format [4] and implemented a debugger that traces and analyzes the fields of the WMF file format automatically. We named the debugger the Binary File Analyzer and Fault Injector (BFAFI). After performing fuzz testing using the BFAFI, we found 11 causes of exceptions in the graphic rendering engine and the system library of Windows XP SP2.

5.1. System design: automatic analysis of the WMF file format

WMF is a vector graphic image file format designed by Microsoft. A WMF file consists of a WMF file header and several records. Each record is composed of three parts: the size of the record, the function number, and the parameters of the function; the number and size of the function parameters are fixed by the function number. To analyze the fields of the WMF file format automatically, we implemented the debugger (BFAFI), which traces the parsing of WMF files in the software system. The BFAFI automatically determines the field positions in a WMF file and the field types (NT or BAT). We implemented the BFAFI using the WinDbg extension interface [25]. The architecture of the BFAFI is shown in Fig. 7: it divides the WMF file format into fields automatically and inserts fault data into the NT fields of the WMF file. The modules of the BFAFI are as follows.

The Binary File Format Analyzer (BFA) traces the parsing of a WMF file and analyzes the field positions and field types of the WMF file format. The BFA is composed of a Software Monitor (SM) and a Log Analyzer (LA). The SM traces the execution of the software system while it parses the WMF file format and writes a log of the execution, including register values and instructions. The SM monitors all instructions of the parsing functions and traces the CPU registers related to them. When the parsing functions are called, the SM writes the values of registers, assembly code, and parameters to the log file.
Fig. 7. Architecture of Binary File Analyzer and Fault Injector tool (BFAFI).
Table 2
Monitoring rule for the WMF file format.

<PROGRAM name='IE' path='C:\Program Files\Internet Explorer\iexplore.exe'>
  <DLLFILE name='gdi32' func_number='1'>
    <FUNCTION name='PlayMetaFileRecord' para_number='4'>
      <PARAM name='hdc' type='Generated value' size=''/>
      <PARAM name='hdc' type='Generated value' size=''/>
      <PARAM name='hdc' type='Generated value' size=''/>
      <PARAM name='hdc' type='Generated value' size=''/>
    </FUNCTION>
  </DLLFILE>
</PROGRAM>
Using the recorded execution information, the LA analyzes the log file and extracts field information from the WMF file records.
Because it takes a long time for the BFA to trace and analyze the WMF file format, the LA analyzes the log file only after the SM has finished writing the log of the parsing functions' execution. The LA then generates field information for the file format and transfers it to the Fault Injection Tool.

The Fault Injection Tool (FIT) inserts fault data into the fields of the WMF file according to the file format and executes the fault-inserted files automatically; this is the module that performs the fuzz testing. When the Fault Injector (FI) inserts faults into a WMF file, it considers the two field types (NT and BAT) of the WMF file format. In this way, the BFAFI lowers the number of fault-inserted files while keeping maximum test case coverage. After the Exception Monitor (EM) executes the target software with a fault-inserted file, it monitors the target software's exception events. If the target software generates an exception event, the EM holds the event and analyzes it.

We defined a configuration rule for the BFAFI using XML, as shown in Table 2.
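The EM's hold-and-analyze behavior can be approximated with the documented Win32 debug API. The sketch below is our reconstruction, not the BFAFI source; the command line and file names are hypothetical:

    #include <windows.h>
    #include <stdio.h>

    /* Launch the target under the debugger and report access violations,
     * as the Exception Monitor does for each fault-inserted file. */
    int main(void)
    {
        STARTUPINFOA si = { sizeof si };
        PROCESS_INFORMATION pi;
        /* hypothetical command line: IE opening one fault-inserted file */
        char cmd[] = "\"C:\\Program Files\\Internet Explorer\\iexplore.exe\" test.html";

        if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE,
                            DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
            return 1;

        DEBUG_EVENT ev;
        while (WaitForDebugEvent(&ev, 10000)) {
            DWORD status = DBG_CONTINUE;
            if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
                const EXCEPTION_RECORD *er = &ev.u.Exception.ExceptionRecord;
                if (er->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
                    printf("AV at address %p\n", er->ExceptionAddress);
                    break;                       /* hold the event and analyze */
                }
                if (er->ExceptionCode != EXCEPTION_BREAKPOINT)
                    status = DBG_EXCEPTION_NOT_HANDLED;
            } else if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT) {
                break;
            }
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        }
        TerminateProcess(pi.hProcess, 0);
        return 0;
    }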
Fig. 8. An example of analyzing a field of the WMF file format.
The configuration rule represents the functions related to a parsing module. The PROGRAM tag gives the name of the program that the SM in the BFA traces, and the DLLFILE tag names a DLL file that includes parsing functions. Inside the PROGRAM tag, we define the target parsing function; the attributes of the PARAM tag are the name, type, and size of each parameter of the parsing function. The parameter types were explained in Section 4.

We now walk through an example in which the BFAFI analyzes the fields of the WMF file format (Fig. 8). The BFAFI traces the graphic rendering engine of Internet Explorer (IE) in order to analyze the WMF file format. IE processes WMF files using GDI32.DLL. Among the functions in this DLL, PlayMetaFileRecord( ) is the main function for parsing a WMF file, and it divides a record into fields. A record includes the information a WMF function uses, such as the function number, record size, and parameters. PlayMetaFileRecord( ) extracts this information from the record, so we select it as the target function. In the BFA, the Software Monitor (SM) traces IE while it reads a WMF file and writes a log whenever IE calls PlayMetaFileRecord( ). The log includes register values, assembly code, and stack data. After IE parses the WMF file, the Log Analyzer (LA) starts to analyze the log. Fig. 8 shows an example of analyzing one field of the WMF file format; we reconstruct the log data for explanation. The SM knows that the third parameter of PlayMetaFileRecord( ) refers to a file record and writes out the record data, 0x04 0x00 0x00 0x00 0x03 0x01 0x08 0x00. STACK emulates the stack frame using push and pop instructions: at push 0x130, the LA inserts 0x130 into STACK. In the assembly code, the third parameter is represented as [ebp+0x10]. First, the LA searches for [ebp+0x10] in the log data. The instruction mov ebx, [ebp+0x10] shows that the address of the third parameter is saved in the ebx register, so from then on the LA searches for both ebx and
[ebp+0x10], because the two values are related to the third parameter. In mov edi, [ebx], the WMF parser accesses the first byte of the record data; the LA recognizes that the parser reads the first byte of the record and that this is the start position of a field. Next, the LA finds ebx in mov cx, [ebx+0x4]; [ebx+0x4] means the parser accesses the 5th byte of the record, so the LA records the 5th byte as the start of the following field. After the LA has analyzed all the instructions in the log, it extracts the fields from the collected start positions. In Fig. 8, bytes 0–3 form one field, and its type is NT (number type) because its size is 4 bytes; the rest of the record data is divided into two fields, 0x03 0x01 and 0x08 0x00. The LA analyzes the remainder of the WMF file format in the same way.

5.2. Experiment results: fuzz testing for the WMF file format

We performed fuzz testing with 10 ordinary WMF files using the BFAFI. First, we configured a rule as shown in Table 2 and analyzed the fields of the WMF file format using the BFAFI. We focused on the PlayMetaFileRecord function because it is the entry function of the parsing engine for the WMF file format; the BFAFI therefore traced the execution of this function.

Automatic Field Analysis: The BFA in the BFAFI analyzes a target WMF file and divides the data in the file into fields. A WMF file consists of a WMF header record and several WMF records [26], and each WMF record consists of a record size field, a record function field, and several parameter fields, as shown in Fig. 9. Fig. 10 compares, for each record, the size of the region accurately analyzed by the BFA with the real size of the record. The X-axis is the function number of a record; the blue bar denotes the real record size and the red bar denotes the size of the region accurately analyzed by the BFA.
Fig. 9. Windows MetaFile (WMF) file format.
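For reference, the header and record layouts below mirror the METAHEADER and METARECORD declarations in the Windows SDK (wingdi.h) and match the three-part record structure just described; the comments are ours:

    #include <stdint.h>

    typedef uint16_t WORD;
    typedef uint32_t DWORD;

    #pragma pack(push, 2)          /* on disk these structures are 2-byte packed */

    /* WMF header record */
    typedef struct tagMETAHEADER {
        WORD  mtType;              /* memory (1) or disk (2) metafile        */
        WORD  mtHeaderSize;        /* header size in 16-bit words            */
        WORD  mtVersion;
        DWORD mtSize;              /* file size in words                     */
        WORD  mtNoObjects;
        DWORD mtMaxRecord;         /* size of the largest record, in words   */
        WORD  mtNoParameters;
    } METAHEADER;

    /* one WMF record: size, function number, then parameters */
    typedef struct tagMETARECORD {
        DWORD rdSize;              /* record size in words -- the NT field the
                                      parser uses for buffer sizes and offsets */
        WORD  rdFunction;          /* function number of the record            */
        WORD  rdParm[1];           /* variable-length parameter array          */
    } METARECORD;

    #pragma pack(pop)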
Fig. 10. Comparison of the real record size and the analyzed region size of each record using the BFA. The blue bar is the real record size and the red bar is the size of the area analyzed by the BFA. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Therefore, the difference between the two bars represents the region that the BFA cannot analyze. For instance, for an ExtTextOut (0x32) record, the real size is 26 bytes and the field positions found by the BFA are 0, 4, 6, 8, 10, and 12. This result shows that the BFA analyzed the region from byte 0 to byte 12 and could not analyze the region from byte 12 to the end of the record; the BFAFI therefore divided the record into five NT fields and one BAT field. We counted the last BAT field as an error, because we do not know whether it could be divided into further NT fields. The BFA exactly analyzed 17 of the 21 record types that compose the 10 target WMF files. The partially analyzed records are SetBkColor (0x01), SetTextColor (0x09), ExtTextOut (0x32), and CreateFontIndirect (0xFB), with analysis accuracies of 40.0%, 40.0%, 46.2%, and 41.1%, respectively. Our algorithm uses a pattern matching method that recognizes fields from the memory addresses accessed by the parsing function, so it cannot cover all cases, and our experiment showed some records that the BFAFI cannot fully analyze. Even so, it analyzed most records exactly.

Fault-inserted File Generation: The FIT in the BFAFI generates fault-inserted files using the field information produced by the BFA. Table 3 shows the analysis results and the number of fault-inserted files generated by the BFAFI and by FileFuzz for each WMF file. We use FileFuzz as the general fuzzer for comparison with the BFAFI because FileFuzz is the most representative file fuzzer. As mentioned above, we use only NT (number type) fields and insert boundary values (B) into them. NT fields come in two sizes, two bytes (NT2) and four bytes (NT4), and we generated fault-inserted files using the boundary values of the two types.
For four-byte fields, B = {-2^31, -2^15, -1, 0, 1, 2^15 - 1, 2^31 - 1}, and for two-byte fields, B = {-2^15, -1, 0, 1, 2^15 - 1} [27]; these are the boundary values of the number types. We did not insert fault data into BAT fields. For instance, the 1.wmf file has 17,259 fields, of which 17,232 are NT fields (NT2 = 16,902, NT4 = 330) and 27 are BAT fields. Focusing on the NT fields, the total number of fault-inserted files is 86,820 (= 16,902 × 5 + 330 × 7); a sketch of this injection step follows.
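A minimal sketch of the FIT's boundary-value injection step, assuming field positions supplied by the BFA; the sample record is the one from Fig. 8, and all file and function names are ours:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Boundary values from this section: 5 for NT2 fields, 7 for NT4 fields. */
    static const int32_t B2[] = { -(1 << 15), -1, 0, 1, (1 << 15) - 1 };
    static const int64_t B4[] = { -(1LL << 31), -(1LL << 15), -1, 0, 1,
                                  (1LL << 15) - 1, (1LL << 31) - 1 };

    /* Write one fault-inserted copy of the sample: the NT field at `start`
     * (size 2 or 4) is overwritten with boundary value `v`. */
    static void inject(const unsigned char *sample, size_t len,
                       size_t start, int size, int64_t v, const char *outname)
    {
        unsigned char buf[1 << 16];
        memcpy(buf, sample, len);
        memcpy(buf + start, &v, (size_t)size);   /* low bytes first: x86 is little-endian */
        FILE *f = fopen(outname, "wb");
        if (f) { fwrite(buf, 1, len, f); fclose(f); }
    }

    int main(void)
    {
        /* tiny stand-in sample: the record from Fig. 8; a real run loads 1.wmf */
        unsigned char sample[] = { 0x04,0x00,0x00,0x00, 0x03,0x01, 0x08,0x00 };
        char name[32];
        int n = 0;

        /* the record-size field at offset 0 is NT4: 7 boundary files */
        for (size_t i = 0; i < sizeof B4 / sizeof B4[0]; i++) {
            snprintf(name, sizeof name, "fault_%03d.wmf", n++);
            inject(sample, sizeof sample, 0, 4, B4[i], name);
        }
        /* the function-number field at offset 4 is NT2: 5 boundary files */
        for (size_t i = 0; i < sizeof B2 / sizeof B2[0]; i++) {
            snprintf(name, sizeof name, "fault_%03d.wmf", n++);
            inject(sample, sizeof sample, 4, 2, B2[i], name);
        }
        return 0;
    }

For 1.wmf, sweeping all 16,902 NT2 and 330 NT4 fields this way yields the 86,820 files shown in Table 3.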
Exception Analysis: We performed fuzz testing of Internet Explorer (IE) on Windows XP SP2 using the fault-inserted files, as shown in Table 3. By opening an HTML document that embeds each fault-inserted WMF file, we executed fuzz testing of the WMF parsing engine. We focused on the Abort failure among the CRASH scale categories [28], the state in which a program terminates abnormally. On Windows, this failure corresponds to an Access Violation (AV) exception, which can indicate a security hole. We found various exceptions in the graphic rendering engine (GDI32.dll) and the system library (kernel32.dll), and we named them EX1, EX2, ..., EX11. Fig. 11 compares the number of exceptions found by the BFAFI and FileFuzz. For an exact comparison, the results are based on fuzz testing with the same number of fault-inserted files from each tool: in the case of 1.wmf, we ran the BFAFI with its 86,820 fault-inserted files and FileFuzz with 86,820 randomly selected fault-inserted files. As shown in Fig. 11, the BFAFI found approximately 14 times more exceptions than FileFuzz with the same number of fault-inserted files, and many exceptions were found only by the BFAFI; for instance, while FileFuzz found only 43 EX1 exceptions in the 1.wmf file, the BFAFI found 340. Next, we compared the number of causes of exceptions found by the BFAFI and FileFuzz, respectively. As shown in Fig. 12, the BFAFI found more causes of exceptions than FileFuzz.
Table 3
Analysis results from the BFA and the number of fault-inserted files generated by the BFAFI and FileFuzz. NT2 denotes a two-byte NT field and NT4 a four-byte NT field. (Field = NT2 + NT4 + BAT; BFAFI = NT2 × 5 + NT4 × 7.)

File name   File size (KB)   Field    NT2      NT4   BAT   BFAFI    FileFuzz
1.wmf       35               17,259   16,902   330   27    86,820   9,175,040
2.wmf       3                766      476      202   88    3,794    786,432
3.wmf       14               6,212    5,722    462   28    31,844   3,670,016
4.wmf       6                2,566    2,163    367   36    13,384   1,572,864
5.wmf       8                3,113    2,666    421   26    16,277   2,097,152
6.wmf       1                149      80       31    38    617      262,144
7.wmf       9                4,133    4,029    77    27    20,684   2,359,296
8.wmf       37               18,265   18,112   126   27    91,442   9,699,328
9.wmf       4                1,294    1,069    169   56    6,528    1,048,576
10.wmf      8                3,404    3,070    298   36    17,436   2,097,152
Fig. 11. Total number of exceptions found by the BFAFI (a) and FileFuzz (b).
Fig. 12. Number of causes of exceptions found by the BFAFI and FileFuzz. I is the case of the BFAFI and II is the case of FileFuzz.
Table 4
Causes of exceptions found by the BFAFI: the position in memory of the assembly code that causes each exception, and the code itself.

Name   Position                               Assembly code
EX1    GDI32!PlayMetaFileRecord+0xcfa         movzx ecx, word ptr [eax]
EX2    GDI32!PlayMetaFile+0x262               movsx ecx, word ptr [ebx]
EX3    GDI32!PlayMetaFileRecord+0x2bd         rep movs dword ptr es:[edi], dword ptr [esi]
EX4    GDI32!PlayMetaFileRecord+0x130         rep movs dword ptr es:[edi], dword ptr [esi]
EX5    GDI32!PlayMetaFileRecord+0x458         rep movs dword ptr es:[edi], dword ptr [esi]
EX6    GDI32!PlayMetaFileRecord+0x712         rep movs dword ptr es:[edi], dword ptr [esi]
EX7    GDI32!PlayMetaFileRecord+0x59f         movsx ecx, word ptr [ebx+eax*2-4]
EX8    GDI32!PlayMetaFileRecord+0xda3         movsx ebx, word ptr [edx]
EX9    kernel32!MultiByteToWideChar+0x7e7     movzx ecx, byte ptr [edi]
EX10   GDI32!PlayMetaFileRecord+0xd95         movsx ebx, word ptr [edx]
EX11   GDI32!PlayMetaFileRecord+0x56b         movsx edx, word ptr [ebx]
There are five causes of exceptions that were found only by the BFAFI: EX3, EX5, EX8, EX9, and EX10. In fuzz testing, finding the causes of exceptions depends on the test case coverage of the fuzzer; we can therefore conclude that the BFAFI maximizes test case coverage for a given number of fault-inserted files.

Last, we analyzed the causes of the exceptions that the BFA found by debugging the exceptions. Using symbol files provided by Microsoft, we located the memory addresses and the positions in modules where the exceptions are generated, as shown in Table 4. Ten of the 11 exceptions are generated by the graphic rendering engine (GDI32.dll); the other is generated by the system library (kernel32.dll), as string processing in the graphic rendering engine triggers the exception. In Table 4, the position column gives the memory position of the assembly code that causes each exception, and the corresponding instruction appears on the same row in the assembly code column. Because Microsoft products are not open source software, we analyzed the exceptions only at the assembly code level. All exceptions are generated by wrong memory accesses when the code reads data from memory. In the case of EX1, the eax register accesses a memory address that the operating system did not allocate, so eax causes an Access Violation (AV), the Windows exception for invalid memory access.

6. Limitation and future work

We proposed a methodology that extracts the field information of a file format by tracing its parsing functions. The limitation of our method is that the parsing function for the file format must be identified in advance. In our experiments, we manually analyzed the parsing module and learned that the PlayMetaFileRecord function is central to the WMF parsing module. If we know the parsing functions for various file formats, we can divide the file data into fields; however, analyzing a parsing module is tedious and time-consuming. To overcome this limitation, we have been developing an extended BFAFI that traces all functions of a parsing module and automatically reconstructs the control flow of the parsing functions. With it, the BFAFI will be able to perform the whole file fuzz testing process while automatically considering the file format.
7. Conclusion

The algorithm we propose successfully reduces the number of fault-inserted files while maintaining maximum test case coverage. It automatically analyzes the fields of a binary file format and inserts fault data into those fields based on the format. To do so, we trace and analyze stack frames and assembly code while the software system reads the binary file. To evaluate our method, we applied it to the WMF file format and implemented a debugger called the Binary File Analyzer and Fault Injector (BFAFI). This practical tool traces the program as it executes and analyzes the fields of WMF files automatically. In the field analysis experiment, the BFA exactly analyzed 17 of the 21 record types that composed the 10 target WMF files. The BFAFI also reduced the total number of fault-inserted files while keeping maximum test case coverage: it generated 86,820 fault-inserted files for a 35 KB WMF file, only 0.95% (= 86,820/9,175,040) of the files required by FileFuzz. The exception analysis experiments showed that the BFAFI was capable of finding about 14 times more exceptions than FileFuzz while needing the same number of fault-inserted files; furthermore, many of the exceptions were detected only by the BFAFI. The BFAFI also captured 11 causes of exceptions, five of which were captured only by the BFAFI. Given that a fuzzer must have good test coverage to find the causes of exceptions well, we demonstrated that the BFAFI maximizes test case coverage for a given number of fault-inserted files; in other words, obtaining the same number of exceptions and causes of exceptions with a general fuzzer requires a much larger number of fault-inserted files. Among the 11 causes of exceptions, 10 were found in the graphic rendering engine (GDI32.dll) and one in the system library (kernel32.dll) of Windows XP SP2.

References

[1] K.R. Van Wyk, Software security: integrating security tools into a secure software development process, in: 19th Annual FIRST Conference, 2007.
[2] A. Takanen, J.D. Demott, C. Miller, Fuzzing for Software Security Testing and Quality Assurance, Artech House, 2008.
[3] FILExt – The File Extension Source.
[4] C. Petzold, Programming Windows, fifth ed., Microsoft Press, 1998, pp. 1097–1170.
[5] B.P. Miller, L. Fredriksen, B. So, An empirical study of the reliability of UNIX utilities, Communications of the ACM 33 (12) (1990) 32–44.
[6] M. Sutton, A. Greene, P. Amini, Fuzzing: Brute Force Vulnerability Discovery, Addison-Wesley, 2007.
[7] J.M. Voas, G. McGraw, Software Fault Injection: Inoculating Programs Against Errors, John Wiley and Sons, 1998.
[8] P. Oehlert, Violating assumptions with fuzzing, IEEE Security and Privacy Magazine 3 (2) (2005) 58–62.
[9] J.E. Forrester, B.P. Miller, An empirical study of the robustness of Windows NT applications using random testing, in: 4th USENIX Windows System Symposium, August 2000.
[10] B.P. Miller, G. Cooksey, F. Moore, An empirical study of the robustness of MacOS applications using random testing, in: International Symposium on Software Testing and Analysis, July 2006.
[11] FileFuzz.
[12] M. Sutton, A. Greene, The art of file format fuzzing, in: Black Hat USA Conference, 2005.
[13] SPIKEfile.
[14] D. Aitel, The Advantage of Block-based Protocol Analysis for Security Testing, Immunity Inc., 2002.
[15] PROTOS.
[16] M. Vuagnoux, Autodafe: an act of software torture, in: 22nd Chaos Communication Congress, 2005.
[17] J. Caballero, P. Poosankam, C. Kreibich, D. Song, Bidirectional Protocol Reverse Engineering: Message Format Extraction and Field Semantics Inference, Technical Report EECS-2009-57, 2009.
[18] P.M. Comparetti, G. Wondracek, C. Kruegel, E. Kirda, Prospex: protocol specification extraction, in: IEEE Symposium on Security and Privacy, 2009.
[19] W. Cui, J. Kannan, H.J. Wang, Discoverer: automatic protocol reverse engineering from network traces, in: 16th USENIX Security Symposium, 2007.
[20] J. Caballero, D. Song, Polyglot: automatic extraction of protocol format using dynamic binary analysis, in: 14th ACM Conference on Computer and Communications Security, 2007.
[21] W. Cui, M. Peinado, K. Chen, H.J. Wang, L. Irun-Briz, Tupni: automatic reverse engineering of input formats, in: 15th ACM Conference on Computer and Communications Security, 2008.
[22] Z. Lin, X. Jiang, D. Xu, X. Zhang, Automatic protocol reverse engineering through context-aware monitored execution, in: 15th Symposium on Network and Distributed System Security, 2008.
[23] G. Wondracek, P.M. Comparetti, C. Kruegel, E. Kirda, Automatic network protocol analysis, in: 15th Symposium on Network and Distributed System Security, 2008.
[24] IA-32 Intel Architecture Optimization Reference Manual, Intel Corporation, June 2005.
[25] Debugging Tools for Windows.
[26] Microsoft Corporation, Windows Metafile Format Specification, Microsoft Developer Network Library, 2009.
[27] M. Schmid, F. Hill, Data generation techniques for automated software robustness testing, in: 6th International Conference on Testing Computer Software, 1999.
[28] P. Koopman, J. Sung, C. Dingman, D. Siewiorek, T. Marz, Comparing operating systems using robustness benchmarks, in: 16th Symposium on Reliable Distributed Systems, 1997.

Hyoung Chun Kim is currently a senior research engineer at the Attached Institute of the Electronics and Telecommunications Research Institute. His research interests include software security, intrusion detection systems, and network security. He received his B.Sc. and M.Sc. degrees in computer science from Korea University, Korea, in 1999 and 2001, respectively.

Young Han Choi is currently a research engineer at the Attached Institute of the Electronics and Telecommunications Research Institute. His research interests include software security, intrusion detection systems, and operating systems. He received his B.Sc. and M.Sc. degrees in electronic engineering from Hanyang University and the Korea Advanced Institute of Science and Technology, Korea, in 2002 and 2004, respectively.

Dong Hoon Lee is a Professor and the Director of Academic Affairs of the Graduate School of Information Security at Korea University, Seoul, Korea. His research interests include information security, cryptology, and ubiquitous security. He received a B.S. (1985) in Economics from Korea University and M.Sc. (1988) and Ph.D. (1992) degrees from the University of Oklahoma.