An abstract semantics tool for secure information flow of stack-based assembly programs


Microprocessors and Microsystems 26 (2002) 391–398 www.elsevier.com/locate/micpro

C. Bernardeschi*, N. De Francesco, G. Lettieri
Department of Information Engineering, University of Pisa, Via Diotisalvi 2, 56126 Pisa, Italy
Received 27 March 2002; revised 7 July 2002; accepted 18 July 2002

Abstract

We present a tool supporting the verification of programs written in a stack-based assembly language against the secure information flow property. First, the tool builds the transition system, which corresponds to an abstract execution of the program, embodying security information and abstracting from the actual values. Then the states of the abstract transition system are checked to detect the satisfaction of the secure information flow property. The tool offers a windows user interface, through which the user can control the verification process and observe the intermediate and final results. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Secure information flow; Formal verification; Abstract interpretation; Stack-based assembly code

1. Introduction

Security of computer systems requires that information is manipulated only in accordance with some specified security policy. The access control mechanisms normally used to protect data can in some cases be inadequate. They help to prevent information release, but they do not control information propagation. For example, we may want to allow an untrusted program to read and manipulate private data, but not to distribute them. However, an access control mechanism will either prevent the program from accessing the data or impractically restrict its communications. Therefore the need has been recognized for more flexible mechanisms, by means of which the flow of data can be analyzed, and communication can be allowed as long as sensitive data are not leaked. In this paper we adopt a mechanism based on the secure information flow property. The secure information flow property of programs in systems based on a multilevel security policy requires that information at a given security level does not flow to lower levels [1,7,13,14]. Suppose we have two levels, low and high, used to model public and private data, respectively. A program, with variables partitioned into two disjoint sets corresponding to high and low security, has

* Corresponding author. Tel.: +39-50-568511; fax: +39-50-568522. E-mail address: [email protected] (C. Bernardeschi).

secure information flow if observations of the final value of the low security variables do not reveal any information about the initial values of the high security ones. Assume y is a high security variable and x a low security one. Examples of violations of secure information flow are: (1) x := y and (2) if y = 0 then x := 1 else x := 0. In the first case there is an explicit information flow from y to x, while in the second case there is an implicit information flow: in both cases, checking the final value of x reveals information on the value of the higher security variable y. The problem of security leakages has been extensively studied for programs written in structured high-level languages [1,4,5,18,20]. However, it is often the case that the source code of programs is not available. For example, when an applet is sent over the Internet, the bytecode of the applet is transmitted and remotely executed [22]. To protect end-users from hostile applets that try to leak private information, the secure information flow property must be investigated directly on the bytecode of the downloaded applet. Java bytecode is an example of a stack-based assembly language. Secure information flow in programs written in stack-based assembly languages differs from the case of high-level languages in many aspects. In a high-level language, explicit flow of information occurs with the assignment statement: the security level of the information stored in the variable on the left-hand side of the assignment

0141-9331/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0141-9331(02)00064-9


Fig. 1. The instruction set.

can be statically deduced from the variables occurring in the expression on the right-hand side. In stack-based assembly code, instead, assignment instructions pop their arguments off the stack. The assigned value is pushed onto the stack by a different instruction. Therefore the security level of a value cannot be deduced syntactically by examining the code. Moreover, in high-level languages the scope of the implicit flow caused by the condition of a branching or repetitive statement can be easily derived, since it coincides with the syntactic scope of the command itself. Since assembly languages are unstructured, jumps may go to any program point, making it more complicated to find the scope of implicit flows. Finally, the operand stack is also influenced by implicit flows, because the stack may be manipulated in different ways by the branches of a conditional instruction: they can perform a different number of pop and push operations, in a different order. In this paper, we present a tool for checking secure information flow in assembly code, based on abstract interpretation. Abstract interpretation [11,12,15,19] is a method for analyzing programs in order to collect approximate information about their run-time behavior. It is based on a non-standard semantics, that is, a semantic definition in which a simpler (abstract) domain replaces the standard (concrete) one, and the operations are interpreted on the new domain. Using this approach different analyses can be systematically defined. Moreover, the proof of the correctness of the analysis can be done in a standard way. The tool, named abstract semantic tool (AST), relies on the work of some of the authors and others [6,8]. In these papers, a concrete semantics is defined which handles, in addition to execution aspects, the flow of information among variables. Then an abstract semantics is given for the language, embodying only the security information and abstracting from the actual values.
Both semantics are

operational and the abstract semantics is obtained by a suitable redefinition of the domains on which the concrete one operates. The tool applies the abstract operational semantics of the language to build an abstract transition system (state graph) of the program, which approximates the concrete semantics of all possible executions in a finite way. Then the secure information flow property is expressed in terms of conditions on the states of this transition system. The states of the abstract transition system are checked to detect the satisfaction of the property. The paper is organized as follows: Section 2 presents the language, the security model and the basis of the verification method. Section 3 introduces the tool and its application to an example. Section 4 concludes the work.
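To make the two violations from the introduction concrete, the following sketch (in Python, used purely as illustration) shows how an observer of the low variable x learns about the high variable y in both cases:

```python
# Hypothetical illustration of the two leaks from the introduction:
# observing the low variable x reveals information about the high y.

def explicit_flow(y):
    # x := y  -- explicit flow: x directly copies the high value
    x = y
    return x

def implicit_flow(y):
    # if y = 0 then x := 1 else x := 0  -- implicit flow via the branch
    x = 1 if y == 0 else 0
    return x

# After either program, x leaks information on y:
assert explicit_flow(42) == 42                 # x equals the secret
assert implicit_flow(0) != implicit_flow(7)    # x tells whether y was 0
```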

2. Secure information flow by abstract interpretation

We consider a simple assembly language with an operand stack, a memory containing the local variables, simple arithmetic instructions and conditional/unconditional branches. The instruction set is shown in Fig. 1, where x ranges over a set var of local variables and op ranges over a set of binary arithmetic operations (add, sub, …). A program P is a sequence of instructions, P = ⟨c⟩, numbered starting from 0. We denote by c[i] the ith instruction of the program. From now on, we also say that instruction c[i] is at address i, i ∈ N. We assume that a program is always executed starting from instruction c[0] and with an empty operand stack. Moreover, we assume no stack overflow or underflow, and jumps made only to target addresses inside the program. We give the standard semantics of the language in terms of a transition system [17]. A state of the transition system is given by the triple ⟨PC, MEM, STACK⟩, where PC is the program counter, MEM is the memory representing the current state of the local variables of the program and STACK is the current state of the operand stack (λ indicates an empty stack). The transition between states occurs by executing an instruction. Consider the assembly program in Fig. 2(a). It corresponds to the high-level construct: if y = 0 then x := 1 else x := 0. Fig. 2(b) shows the operational semantics of the program when the initial memory is MEM(x) = 5 and MEM(y) = 1. Starting from the initial state ⟨0, MEM, λ⟩, the instruction load y at address 0 is executed. The new state reached is ⟨1, MEM, 1⟩, where MEM(y) has been pushed onto the operand stack and PC is equal to 1. Then the true branch of the if instruction is executed. In the final state MEM(x) = 0 and MEM(y) = 1.

2.1. The security model

Fig. 2. (a) The program P. (b) Its standard semantics.
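The standard semantics can be sketched as a small interpreter. The instruction encoding below is an assumption (the figure itself is not reproduced here), including the convention that `if a` jumps to address a when the popped value is zero; it is chosen to match the addresses used in the text (the if at address 1, its join point at address 5):

```python
# A minimal sketch of the standard semantics as a transition system.
# Assumed encoding of Fig. 2(a): if y = 0 then x := 1 else x := 0.

PROGRAM = [
    ("load", "y"),   # 0: push MEM(y)
    ("if", 4),       # 1: pop v; jump to 4 if v = 0, else fall through
    ("push", 0),     # 2: push constant 0
    ("goto", 5),     # 3: unconditional jump
    ("push", 1),     # 4: push constant 1
    ("store", "x"),  # 5: pop v; MEM(x) := v
]

def run(program, mem):
    """Execute from c[0] with an empty operand stack; yield each state."""
    pc, stack = 0, []
    while pc < len(program):
        yield (pc, dict(mem), list(stack))
        op, arg = program[pc]
        if op == "load":
            stack.append(mem[arg]); pc += 1
        elif op == "push":
            stack.append(arg); pc += 1
        elif op == "store":
            mem[arg] = stack.pop(); pc += 1
        elif op == "goto":
            pc = arg
        elif op == "if":
            pc = arg if stack.pop() == 0 else pc + 1
    yield (pc, dict(mem), list(stack))

final = list(run(PROGRAM, {"x": 5, "y": 1}))[-1]
assert final[1] == {"x": 0, "y": 1}   # MEM(x) = 0, MEM(y) = 1 as in the text
```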

To model secure information flow, we assume a finite set L of security levels, ranged over by s, t, …, totally ordered


Fig. 3. The concrete transition system for P.

by ⊑. Given s, t ∈ L, by s ⊔ t we denote the higher of s and t. A program P is defined as a pair P = ⟨c, L⟩ where each variable of the program is assigned a security level. L is a partition of the variables of the program according to their security level: ∀s ∈ L, L_s denotes the set of variables with security level s. We denote by L_⊑s the set of variables with level less than or equal to s: L_⊑s = ∪_{t ⊑ s} L_t.

security environment, which models the open implicit flows. Enriched values are pairs v = (k, s), where k is a constant and s ∈ L is the security level of v. We use the control flow graph of the program and the notion of immediate postdomination to handle implicit flows. The control flow graph is a directed graph that contains the control dependencies among the instructions of the program [3]. The nodes of the graph correspond to the program instructions, and the graph contains an edge from node i to node j if and only if the instruction at address j can be immediately executed after that at address i. A node j immediately postdominates the node i, denoted by j = ipd(i), if j is the first node that lies on every path starting from i. Given a branching instruction at address i, ipd(i) is the first instruction not affected by the implicit flow, since it represents the point at which the different branches join. We use a stack to handle the implicit flows opened by nested conditional instructions. A state of the transition system is a 5-tuple ⟨ENV, PC, MEM, STACK, IPD⟩, where ENV is a security level, representing the security environment, PC, MEM and STACK are the program counter, the concrete memory and the concrete operand stack, respectively, and IPD is a stack used to handle implicit flows. Given an initial memory MEM, in the initial state of the concrete transition system ENV is set to the lowest security level, PC is equal to the address of the first instruction, MEM is the given memory, and STACK and IPD are empty. The environment is possibly upgraded when a branching instruction is executed and downgraded when the scope of the implicit flow of that instruction terminates. Consider an if instruction at address i. If this instruction is executed under the security environment s and the value on top of the operand stack is (k, t), then the environment is upgraded to s ⊔ t and the pair (ipd(i), s) is recorded in the IPD stack.
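As an illustration, the immediate postdominator of a branch can be computed from the control flow graph. The program encoding below is the same assumed sketch of Fig. 2(a) used earlier, and the naive path enumeration only works for loop-free code:

```python
# Sketch: immediate postdominators on the control flow graph of the
# assumed encoding of Fig. 2(a). ipd(i) is the first node lying on
# every path leaving i, i.e. the join point of the branches.

def successors(program, i):
    op, arg = program[i]
    if op == "goto":
        return [arg]
    if op == "if":
        return [i + 1, arg]     # fall-through and jump target
    return [i + 1] if i + 1 < len(program) else []

def ipd(program, i):
    """Intersect the nodes reachable along each path from i and keep
    the earliest common one. Naive, loop-free version for illustration."""
    paths = []
    def walk(j, seen):
        succ = successors(program, j)
        if not succ:
            paths.append(seen)
        for k in succ:
            walk(k, seen + [k])
    walk(i, [])
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    return next(n for n in paths[0] if n in common)

PROGRAM = [("load", "y"), ("if", 4), ("push", 0),
           ("goto", 5), ("push", 1), ("store", "x")]
assert ipd(PROGRAM, 1) == 5   # matches ipd(1) = 5 used in the text
```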
In fact, the choice of the branch to be executed depends both on the already open implicit flows (s) and on the condition of the if, that is, the top of the operand stack. In the semantics, on entering an implicit flow, the security level of each value present in the operand stack is upgraded, to take into account the fact that the stack may be manipulated in different ways by the two branches. When the instruction at address ipd(i) is executed (PC = ipd(i) and IPD = (ipd(i), s)·IPD′), the environment is reset to the one holding before entering the branching instruction (s). Then the instruction c[PC] is executed. In fact, the instruction c[PC] is not in the scope of the if instruction. Since we consider sequential programs only, the concrete transition system has only one, possibly infinite, path. Assume L = {l, h}, with l the low security level and h the high level. Consider the program in Fig. 2 with L_l = {x} and L_h = {y}. The concrete semantics of the program, when in the initial memory x holds 5 and y holds 1, is the concrete transition system shown in Fig. 3. Note that the execution of c[1] = if 4 upgrades ENV to h and pushes (5, l) onto the IPD stack. When the instruction at address 5 is


Theorem 2. Let P = ⟨c, L⟩ and s ∈ L. An abstract memory MEM♮ is s-safe for P if ∀x ∈ L_⊑s: MEM♮(x) ⊑ s. Consider the abstract transition system for P. If, for each final state ⟨ENV, PC, MEM♮, STACK♮, λ⟩ of the transition system, it holds that MEM♮ is s-safe for P, then P is s-secure.

Fig. 4. The abstract transition system for P.

executed, first the environment is downgraded to l, then the instruction is executed. The purpose of abstract interpretation (or abstract semantics) is to correctly approximate the concrete semantics of all executions in a finite way. The first step in the construction of the abstract semantics is the definition of the abstract domains. In particular, in our abstract semantics each concrete value (k, s), composed of a pair of a value and a security level, is approximated by considering only its security level (s). The abstract transition system of the program is defined in the same way as for the concrete semantics, but with the abstract states. We indicate abstract memories with MEM♮ and abstract stacks with STACK♮. Due to the loss of precision of data, when dealing with conditional or iterative commands, the abstract transition system has multiple execution paths. The abstract states are analogous to the concrete ones, but with values replaced by their security levels. For example, the abstract state corresponding to ⟨h, 5, [(5, l), (1, h), (0, h)], (5, l)⟩ in Fig. 3 is ⟨h, 5, [l, h, h], (5, l)⟩. Let P = ⟨c, L⟩. The initial state of the abstract transition system is ⟨s₀, 0, MEM♮₀, λ, λ⟩, where s₀ is the lowest security level in L and MEM♮₀ is the abstract memory for P such that ∀s ∈ L, ∀x ∈ L_s: MEM♮₀(x) = s. Theorem 2 states the adequacy of the abstract semantics to characterize secure information flow.
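A minimal sketch of the abstract execution follows, under the assumptions that a pushed constant takes the level of the current environment and that load and store join the environment into the value's level (the program encoding and ipd value are the same assumed sketch as before):

```python
# Sketch of the abstract semantics: values become security levels
# (0 = l, 1 = h), both branches of an `if` are explored, the
# environment is upgraded on entering an implicit flow and restored
# at the ipd, and stacked levels are upgraded on entering a branch.

L, H = 0, 1
PROGRAM = [("load", "y"), ("if", 4), ("push", 0),
           ("goto", 5), ("push", 1), ("store", "x")]
IPD = {1: 5}   # ipd of the branching instruction at address 1

def abstract_run(program, mem):
    """Explore all abstract paths; return the list of final memories."""
    finals = []
    def step(env, pc, mem, stack, ipd_stack):
        if ipd_stack and pc == ipd_stack[-1][0]:
            env = ipd_stack[-1][1]          # close the implicit flow
            ipd_stack = ipd_stack[:-1]
        if pc >= len(program):
            finals.append(dict(mem)); return
        op, arg = program[pc]
        if op == "load":
            step(env, pc + 1, mem, stack + [max(env, mem[arg])], ipd_stack)
        elif op == "push":
            step(env, pc + 1, mem, stack + [env], ipd_stack)
        elif op == "store":
            step(env, pc + 1, {**mem, arg: max(env, stack[-1])},
                 stack[:-1], ipd_stack)
        elif op == "goto":
            step(env, arg, mem, stack, ipd_stack)
        elif op == "if":
            new_env = max(env, stack[-1])
            # upgrade every stacked level and record (ipd, old env)
            new_stack = [max(s, new_env) for s in stack[:-1]]
            new_ipd = ipd_stack + [(IPD[pc], env)]
            step(new_env, pc + 1, mem, new_stack, new_ipd)  # fall-through
            step(new_env, arg, mem, new_stack, new_ipd)     # jump
    step(L, 0, mem, [], [])
    return finals

finals = abstract_run(PROGRAM, {"x": L, "y": H})
# x ends up with level h on every path: the program is not l-secure
assert all(m["x"] == H for m in finals)
```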

Theorem 2 is the basis of our security checking methodology: proving s-security of a program P = ⟨c, L⟩ can be done by building the abstract transition system starting from an initial state with the lowest environment in L and in which each variable of the program is assigned its security level as specified by L. Then, final states are examined for s-safety of memories. Fig. 4 shows the abstract transition system of the program in Fig. 2. Note that there are two possible paths starting from the branching instruction if. The abstract transition system does not satisfy l-security: in the final state, the low variable x has acquired the h security level.
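The s-safety check of Theorem 2 on a final abstract memory reduces to a simple predicate; levels are encoded here as natural numbers (0 = l, 1 = h), the same convention the tool uses:

```python
# Sketch of the check the methodology performs on each final state:
# a final abstract memory is s-safe if every variable of level <= s
# still holds a value of level <= s.

def is_s_safe(abstract_mem, var_levels, s):
    return all(abstract_mem[x] <= s
               for x, lvl in var_levels.items() if lvl <= s)

# With x low (0) and y high (1): a final memory where x has acquired
# level 1 is not 0-safe, so the program is not 0-secure.
assert not is_s_safe({"x": 1, "y": 1}, {"x": 0, "y": 1}, 0)
assert is_s_safe({"x": 0, "y": 1}, {"x": 0, "y": 1}, 0)
```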

3. The tool

AST supports verification of the secure information flow property for the assembly language introduced in Section 2. Its purpose is to check the s-security of a program, with respect to a given assignment of security levels to the variables used in the program. The tool offers a windows user interface, through which the user can control the verification process and observe the intermediate and final results. AST is composed of the following building blocks (Fig. 5): IPD calculator, transition system engine, state repository and state checker. First, the IPD calculator finds the ipd of all the if instructions occurring in the input program. The ipd is calculated by inspecting the control flow graph of the program, which is built by statically examining the program. Then, the transition system engine starts building the abstract transition system of the input program, using the information obtained by the IPD calculator and the information about the security levels of the variables, provided by the user. The generated states are kept in the

Fig. 5. Building blocks of the abstract semantic tool.


Fig. 6. File example.jvm.

state repository, where they are later retrieved by the state checker. The state checker receives a safe level s as input and checks the s-safety of the system states, with respect to the given assignment of security levels to the program variables. If s-unsafe states are found among the final states, the input program is not s-secure. Inputs to AST must be provided in a text input file. As an example, Fig. 6 shows the input file used to check the s-security of the program introduced in Fig. 2. The file starts with a number of directives, followed by the program to be checked. The #safeLevel directive specifies the safe


level, while each #variableLevel directive assigns a security level to a variable used in the program. Note that security levels are identified by natural numbers, with a higher number representing a higher security level. Fig. 7 shows the windows interface of AST for the input file shown in Fig. 6. The code window shows the code that is currently being checked. The states window contains a line for each state generated so far. Each line contains the information that completely identifies a state. Note that the lines describing the states contain only references to a memory, an operand stack and an ipd stack, while the actual contents of these items are shown in separate windows. In fact, it is common for several states to have equal contents in at least one of these items. AST reduces memory occupation by storing only distinct memories, operand stacks and ipd stacks. The last window is the results window, which shows a number of items, such as the number of states generated and the number of states processed. The most important information is the number of final-unsafe states. A value greater than zero in this field indicates that the program is not s-secure. The user interface also offers a control panel. Using the control

Fig. 7. User interface of the abstract semantic tool.


Fig. 8. Verification of the input file of Fig. 6.

panel, the user can start, pause and stop the verification process. It is also possible to observe the verification process step by step. Fig. 8 shows the contents of the windows when the verification process has finished. We can see that the results window reports the existence of an unsafe final state. The unsafe memory corresponding to the unsafe final state is shown in the Memory window. Here, we can see that the low variable x has acquired high-level information. The complete state of the tool can be saved to a file, to allow further processing by other tools, or to stop the verification process and resume it at a later time.
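The directive-based input format of Fig. 6 could be parsed along the following lines; the exact argument order and separators of #safeLevel and #variableLevel are assumptions, since the figure is not reproduced here:

```python
# Hypothetical sketch of parsing an AST input file as described above:
# #safeLevel and #variableLevel directives followed by the program.
# The precise directive syntax is an assumption.

def parse_input(text):
    safe_level, var_levels, program = None, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#safeLevel"):
            safe_level = int(line.split()[1])
        elif line.startswith("#variableLevel"):
            _, var, lvl = line.split()
            var_levels[var] = int(lvl)
        else:
            program.append(line)
    return safe_level, var_levels, program

example = """#safeLevel 0
#variableLevel x 0
#variableLevel y 1
load y
if 4
"""
safe, levels, prog = parse_input(example)
assert safe == 0 and levels == {"x": 0, "y": 1} and prog[0] == "load y"
```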

Fig. 9. The library decryption procedure (pseudocode).

3.1. An example

In the following, we show an example of application of the tool to the library decryption procedure, taken from Ref. [23]. Pseudocode for the procedure is shown in Fig. 9. The procedure decrypts an encrypted character array cypher, using a secret key, and stores the result in a separate clear text array clear. It also calculates the cost of doing the decryption and stores it in a variable charge. Before applying AST, a security level must be assigned to the variables. In a complete application scenario, security levels are properties of the input and output of the program. Typically, a program manipulates information, normally obtained from some file on the host or some other input channel (keyboard, …), producing some output on some file or some other output channel (video, network, …). Suppose that these channels, besides being characterized by access rights, are also assigned a secrecy level. For example, a file f can have read access rights, but it can be required that the information it holds must not be transferred into a public file. In this case, the data read from this file will be private, i.e. they will be considered as having a high secrecy level.


Fig. 10. Output file produced by AST (excerpt).

The assignment of a secrecy level to the variables of the program is made taking into account the variables into which the input is stored and from which the output is taken. In our example, the variables into which file f, or some part of it, is stored will be assigned a high security level, while the variables whose values are output to a public file will be assigned a low security level. Applying the AST tool to the program will check whether some information held by file f is propagated to some public file. In the decryption procedure example, the encrypted text is stored in a public file; the secret key and the clear text are read from and written to private files; and the final value of charge is written to a public file. Therefore, the encrypted array and the charge variable have a low security level, while the key variable and the clear text array have a high security level. The question is whether, by examining the final value of the charge variable, one can deduce information on the value of the key or of the clear text array. The code for the example consists of 157 instructions, 20 of which are if instructions. The tool correctly reports that the program is safe. Fig. 10 shows some of the information contained in the output file produced by AST for the library decryption program. Note that 222 states have been generated. However, only one memory, seven operand stacks and five ipd stacks have actually been generated. The [states] section contains all the information required to build the abstract transition system. This information could be used by other tools to create a visual representation of the state graph, or to find a path leading from the initial state to a final-unsafe state.

4. Related work and conclusions

Nowadays, the problem of security leakages for assembly languages is of great importance, since code downloaded from the web is often assembly code. Different techniques for the analysis of assembly code are defined in the literature, but all of them check safety of machine code, where safety means correct typing, no stack underflow or overflow, jumps only to target addresses inside the code,


and so on. In Ref. [16] a typed stack-based assembly language is defined, while Ref. [21] defines a type system for checking Java bytecode. In Ref. [2] the proof-carrying code approach is described, where safety is proved using a higher-order logic. In Ref. [25] the machine program is annotated with some initial information on the machine on which the code must be executed, and correctness properties are derived by means of an abstract execution of the program. Our approach allows the verification of code written in a simple stack-based assembly language with respect to security leakages caused by information flow. In particular, AST is a tool supporting automatic verification of the secure information flow property. The tool is based on abstract interpretation of the operational semantics of the language. Before applying verification to a (downloaded) code, a high security level is assigned to the variables storing private input or output data and a low security level to the variables storing public input or output data. To the best of our knowledge, the only other work on secure information flow in assembly code is presented in Ref. [9]. That work is based on model checking and enables a smart card issuer to verify that a new applet securely interacts with already downloaded applets. The work concentrates on applet interfaces. We are extending our assembly language by including subroutine calls and objects. This will allow the application of our verification approach directly to downloaded Java applets before their execution on the local host.

Acknowledgements Special thanks go to the anonymous reviewers for their suggestions. We thank Dr Massimo Grasso, who participated in the development of the tool during his master thesis.

References

[1] G.R. Andrews, R.P. Reitman, An axiomatic approach to information flow in programs, ACM Trans. Program. Lang. Syst. 2 (1) (1980) 56–76.
[2] A.W. Appel, A.P. Felty, A Semantic Model of Types and Machine Instructions for Proof-Carrying Code, ACM SIGPLAN Conference on Programming Language Design and Implementation Proceedings, 2000, pp. 243–253.
[3] T. Ball, What's in a region? Or computing control dependence regions in near-linear time for reducible control flow, ACM Lett. Program. Lang. Syst. 2 (1–4) (1993) 1–16.
[4] J. Banâtre, C. Bryce, D. Le Métayer, Compile-time detection of information flow in sequential programs, LNCS 875 (1994) 55–73.
[5] R. Barbuti, C. Bernardeschi, N. De Francesco, Abstract interpretation of operational semantics for secure information flow, Inform. Process. Lett. 83 (2) (2002) 101–108.
[6] R. Barbuti, C. Bernardeschi, N. De Francesco, Checking Security of Java Bytecode by Abstract Interpretation, The 17th ACM Symposium on Applied Computing: Special Track on Computer Security Proceedings, Madrid, March 2002.
[7] D.E. Bell, L.J. La Padula, Secure computer systems: mathematical foundations and model, Technical Report M74-244, MITRE Corporation, Bedford, MA, 1973.
[8] C. Bernardeschi, N. De Francesco, Combining Abstract Interpretation and Model Checking for Analyzing Security Properties of Java Bytecode, Third International Workshop on Verification, Model Checking and Abstract Interpretation Proceedings, Venice, January 2002.
[9] P. Bieber, J. Cazin, P. Girard, J.-L. Lanet, V. Wiels, G. Zanon, Checking Secure Interactions of Smart Card Applets, ESORICS 2000 Proceedings, 2000.
[10] P. Cousot, R. Cousot, Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints, Fourth Annual ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages Proceedings, Los Angeles, CA, 1977, pp. 238–252.
[11] P. Cousot, R. Cousot, Abstract interpretation frameworks, J. Logic Comput. 2 (1992) 511–547.
[12] P. Cousot, R. Cousot, Inductive Definitions, Semantics and Abstract Interpretations, ACM POPL'92 Proceedings, 1992, pp. 83–94.
[13] D.E. Denning, A lattice model of secure information flow, Commun. ACM 19 (5) (1976) 236–243.
[14] D.E. Denning, P.J. Denning, Certification of programs for secure information flow, Commun. ACM 20 (7) (1977) 504–513.
[15] N.D. Jones, F. Nielson, Abstract interpretation: a semantics-based tool for program analysis, in: S. Abramsky, D.M. Gabbay, T.S.E. Maibaum (Eds.), Handbook of Logic in Computer Science, vol. 4, Oxford University Press, Oxford, 1995, pp. 527–636.
[16] G. Morrisett, D. Walker, K. Crary, N. Glew, From System F to typed assembly language, ACM Trans. Program. Lang. Syst. 21 (3) (1999) 527–568.
[17] G.D. Plotkin, A structural approach to operational semantics, Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, 1981.
[18] A. Sabelfeld, D. Sands, A per model of secure information flow in sequential programs, LNCS 1576 (1999) 40–58.
[19] D.A. Schmidt, Abstract Interpretation of Small-Step Semantics, Fifth LOMAPS Workshop on Analysis and Verification of Multiple-Agent Languages Proceedings, 1996.
[20] G. Smith, D. Volpano, Secure Information Flow in a Multi-Threaded Imperative Language, 25th Annual ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages Proceedings, San Diego, CA, 1998, pp. 1–10.
[21] R. Stata, M. Abadi, A type system for Java bytecode subroutines, ACM Trans. Program. Lang. Syst. 21 (1) (1999) 90–137.
[22] T. Lindholm, F. Yellin, The Java Virtual Machine Specification, Addison-Wesley, Reading, MA, 1996.
[23] D. Volpano, C. Irvine, Secure flow typing, Comput. Security 16 (2) (1997) 137–144.
[24] D. Volpano, G. Smith, C. Irvine, A sound type system for secure flow analysis, J. Comput. Security 4 (3) (1996) 167–187.
[25] Z. Xu, B.P. Miller, T. Reps, Safety Checking of Machine Code, ACM SIGPLAN Conference on Programming Language Design and Implementation Proceedings, 2000, pp. 70–82.

Cinzia Bernardeschi received the Laurea degree in Computer Science in 1987 and the PhD degree in 1996, both from the University of Pisa, Pisa, Italy. She is an associate professor with the Department of Information Engineering of the University of Pisa. Since 1990 she has been working on formal methods and dependable systems. Her current research interests are in the verification of security properties of embedded systems.

Nicoletta De Francesco received the Laurea degree in Computer Science in 1974 from the University of Pisa, Pisa, Italy. She is a professor with the Department of Information Engineering of the University of Pisa. Her research interests include formal methods for the description and analysis of concurrent and distributed systems, modal and temporal logics, automatic verification and system security.

Giuseppe Lettieri received the Laurea degree, cum laude, in Computer Systems Engineering in 1998 and the PhD degree in 2001, both from the University of Pisa, Pisa, Italy. Since 2001 he has held a research position with the Department of Information Engineering of the University of Pisa. His research interests include operating systems and programming languages.