C. R. Rao, ed., Handbook of Statistics, Vol. 9 © 1993 Elsevier Science Publishers B.V. All rights reserved.
Programming Languages and Systems
S. Purushothaman and J. Seaman
1. Introduction
In this chapter we will provide a bird's eye view of the organization, use and effectiveness of computer systems. A computer can be viewed as a machine that performs calculations when given appropriate instructions. Thus, the task of programming a computer can be informally described as providing the computer with instructions to perform calculations that will solve a given problem. The difficulty is that the computer can perform only a small set of operations, and only a small set of extremely specific commands can be used to induce the computer to carry out any of these operations. Thus the programmer must be very specific in developing this sequence of instructions, or program, and consequently the task of programming a computer is extremely error-prone. For instance, in programming a missile system, a programmer once used the code (in FORTRAN)
DO 50 I = 1. 100
50 CONTINUE
instead of
DO 50 I = 1, 100
50 CONTINUE
where the first statement was intended to instruct the computer to repeat a series of instructions, but the programmer accidentally used a period instead of a very important comma. The resulting command altered the contents of a memory location and failed to commence the repetition of instructions as intended. This mistake almost led to a missile being fired unintentionally. A number of such instances have been reported in the literature wherein bugs have been the cause of huge losses. Considering that errors in programming can lead to catastrophes, it is
imperative that programs be correct. Unfortunately, there is no easy way of verifying that a given program does what the programmer intended it to do. However, it is possible to create tools that aid in the development of programs that are correct. Thus the question 'How can programs be verified to be correct?' should be rephrased as 'How can computer systems be organized such that the task of programming them correctly becomes easy?' The answer to this question has been to build abstractions. Figure 1 illustrates the abstractions that are built on top of a bare machine to make it easier to use. Consider the situation before 1950, when using a computer meant programming it in machine language, which consists of mnemonics corresponding to the actual set of instructions that are understood directly by the machine. For instance, evaluating (a + b)^2 typically involved a sequence of instructions of the following form (one can also think of these as instructions for, say, a hand-held calculator):
Load a, R1
Load b, R2
Add R1, R2
Mul R1, R1
where the first two instructions load the contents of two memory locations denoted by a and b into appropriate registers, the third instruction adds these values, storing the result in register R1, and the fourth instruction squares the sum and leaves the result in the register R1 (values stored in memory locations must first be loaded into registers in order to perform calculations on them). If expressing such a simple calculation necessitates such a large number of instructions, the reader can imagine the effort required to come up with instructions for, say, programming Karmarkar's algorithm for linear programming. In addition to managing such a great number of instructions, a programmer
Fig. 1. The layers of abstraction built on top of a bare machine: architecture, operating systems, programming environment.
also had to be familiar with all of the details of the instructions concerning the hardware components of the system, in order to program them correctly. Each device had its own very specific instructions regarding its operation and had to be carefully programmed in order to get the desired result. For example, if a programmer wanted to print some data out on paper, he/she would have had to know all the specific instructions necessary first to transfer the data from where it was stored to the appropriate device and then to cause the printer to begin outputting that data. These instructions would include many device-specific instructions involving buffers, flags, registers, block sizes, status bits, etc. The need to keep track of all of this information prevented the programmer from being able to concentrate entirely on solving problems which are computational in nature and have nothing to do with device-specific instructions. Fortunately, many developments have been made in the past forty years which allow programmers to work in environments that promote the development of correct programs. This improvement is due to a combination of factors, which include technological innovations, the development of high level programming languages, the development of operating systems, etc. Though the developments have been in disparate areas, advances in programming languages, architectures and operating systems have combined to make computer programming a reliable and profitable pursuit.
2. Programming languages

As stated above, programming a computer initially consisted of writing a series of commands in machine language and required that the programmer be aware of all of the details of the machine in order to give correct instructions. Usually, a general description of the task the computer was to accomplish would be far removed from these details. The idea that programming would be much simpler if the language contained instructions that were more general and could more naturally express a description of the task led to the introduction of high level programming languages.
2.1. High level languages

The first attempt at designing a language that was more general than machine language resulted in the invention of the programming language FORTRAN. In order for this language to be of any practical use, there had to be some way to translate from FORTRAN to machine language. This introduced the notion of compilation, wherein programs written in a high level language are translated to machine language. This translation is performed by a program called a compiler. A compiler for a language such as FORTRAN would take as input a program P written in that language and translate it to a program F(P) in machine language, such that P and F(P) have the same meaning. In the
process of translating from P to F(P), it is reasonable to expect the translation to be based on the translation of the subparts of P. For example, consider evaluation of the expression (a + b) * (b + c). Let us assume that a, b and c are in memory locations a, b and c respectively. If we translate this expression inductively into machine (or assembly) code, the resulting program would look as follows:
Load a, R1
Load b, R2
Add R1, R2
Load b, R3
Load c, R4
Add R3, R4
Mul R1, R3
where the first sequence adds a and b leaving the result in register R1, the second sequence adds b and c leaving the result in R3, and the final instruction multiplies the contents of R1 and R3. Since the translation of this expression is performed by translating the subparts independently and then combining the results, this code causes the value of b to be unnecessarily loaded twice into two different registers. A programmer programming in machine language could easily come up with a sequence of instructions that uses just three registers. As compared to machine code (instructions in machine language) produced by a programmer, the code produced by such inductive compilation could be inefficient (in number of instructions and use of resources). Although writing a program in a high level language such as FORTRAN and using a compiler to translate it into machine code instead of coding it directly into machine language may lead to such inefficiencies, the advantage would be the savings in the programmer's effort and time. More importantly, such a program would be portable across machines. Suppose a programmer wants to perform the same task on two different machines which have different machine languages. If the programmer chooses to write programs directly in the machine language, he/she is required to write two programs, one in each language. On the other hand, if each machine has its own FORTRAN compiler which translates FORTRAN programs into the machine language of that machine, the user can write one program which can be compiled separately on each machine into the two respective machine languages, eliminating the need for the programmer to develop a new program for each machine. It is precisely with these motivations that the job of constructing the first compiler for FORTRAN was undertaken. Backus [5], head of the FORTRAN design team in the mid 1950s, reports that a lot of effort was spent in assuring that the code produced by their compiler would be as efficient (in number of instructions and time taken by the code produced) as hand-produced code.
The first compiler for FORTRAN demonstrated that high level languages are an effective alternative to programming in machine languages. The introduction of FORTRAN and compilation raised new concerns for computer programming. First was the need to be able to specify both the exact syntax of a program and its meaning. This was followed by the interest in developing compilers that generate efficient code for even more abstract languages.
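The inductive (syntax-directed) translation just described can be made concrete with a small sketch. The following Python fragment is our own illustration, not the scheme used by any actual FORTRAN compiler; it compiles expressions such as (a + b) * (b + c) into the register code shown above, and because each subexpression is translated independently with a fresh register, the generated code reloads b twice, exactly as discussed.

from itertools import count

def compile_expr(expr, code, regs):
    # Emit instructions for expr and return the register holding its value.
    # expr is either a variable name or a tuple (op, left, right) with op in {'+', '*'}.
    if isinstance(expr, str):                      # a variable: load it from memory
        reg = "R%d" % next(regs)
        code.append("Load %s, %s" % (expr, reg))
        return reg
    op, left, right = expr
    r_left = compile_expr(left, code, regs)        # translate the subparts independently
    r_right = compile_expr(right, code, regs)
    code.append("%s %s, %s" % ("Add" if op == "+" else "Mul", r_left, r_right))
    return r_left                                  # the result is left in the first register

code = []
compile_expr(("*", ("+", "a", "b"), ("+", "b", "c")), code, count(1))
print("\n".join(code))                             # reproduces the seven-instruction sequence above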
2.2. Syntactic issues

Given a program P written in a high level language, a compiler for that language must be able to verify that it is a valid program syntactically, in the sense that it is composed of correctly formed instructions in an acceptable order. Of course it does not make sense to try to translate something that is not an acceptable program. Therefore there must be some way for a compiler to decide whether a program is acceptable or not. It is obvious that trying to enumerate all possible syntactically correct programs is meaningless. A more reasonable method is to come up with a recursive/inductive definition of the set of all syntactically meaningful programs. Such a definition would create a standard for various compilers (i.e., compilers producing code for different machines, or compilers written by different sets of people for the same language). The design of Algol60 [23] was noteworthy for its use of context-free grammars (also called Backus-Naur form) to specify the syntax of the language being designed, for its insistence on a strict typing scheme, and for its introduction of block structure and recursion. Assume that we have somehow specified a set S of all syntactically legal programs. Given a program P, we would like to test whether P ∈ S or not. Such a membership test is also called parsing. There are two considerations now: the first is that membership should be decidable, and the second, more stringent condition is that the membership test be carried out in time linearly proportional to the length of the program (refer to [16] for a discussion of complexity). The following notation will be used in describing syntax:
• e denotes an empty string, or string of length zero.
• If x and y are strings then their juxtaposition xy denotes concatenation of the two strings.
• Given two sets of strings S and R, SR denotes the set of strings {xy | x ∈ S and y ∈ R}.
• Given a set S, we use S^n to denote the n-fold concatenation of strings from S, where S^0 = {e}, S* = ∪_{i≥0} S^i and S^+ = S* − {e}.
A grammar, defined below, is one method of establishing an inductive definition of the syntax of a language.
DEFINITION 1. A grammar is specified by the tuple (N, T, S, P), where N is a set of nonterminal symbols, T is a set of terminal symbols, S ∈ N is the start symbol, and P is a set of productions (or rules), P ⊆ (N ∪ T)* × (N ∪ T)*.

Intuitively, nonterminals denote syntactic classes, the terminals describe the alphabet of the language, the productions capture the rules of the grammar, and the start symbol is a given nonterminal symbol which denotes the largest syntactic class being defined. For instance, in any natural language (as opposed to a programming language) the start symbol might denote the category of (say) sentences, the nonterminals might include verbs, nouns, adjectives, etc., and the terminal symbols would include the words of the language. Let r ∈ (N ∪ T)* with r = uαv and (α, β) ∈ P. By replacing α in r by β, we derive s = uβv. Let G = (N, T, S, P) be a grammar. The language L(G) associated with G is the set of strings containing only terminal symbols and derivable from S by a finite sequence of derivations. By imposing various restrictions on the form of productions, one derives Chomsky's hierarchy of languages and grammars. We can now complete the definition of what grammars are.

DEFINITION 2. If every production p ∈ P of a grammar G = (N, T, S, P) is in N × (N ∪ T)*, then G is a context-free grammar. If, in addition, every production p = (α, β) of a context-free grammar G is such that β has at most one nonterminal, which appears as the last symbol of β, then G is regular. If every production p ∈ P of a grammar G = (N, T, S, P) is in (N ∪ T)^+ × (N ∪ T)*, where α in p = (α, β) has at least one nonterminal, then G is a phrase-structured grammar. If, in addition, the length of β is always larger than or equal to the length of α, then G is context-sensitive. If G is a phrase-structured (context-sensitive, context-free or regular) grammar then L(G) is a phrase-structured (context-sensitive, context-free or regular) language.

It is easy to see that the class of regular languages is properly contained in the class of context-free languages (technically this holds only if context-sensitive languages are allowed to also contain e), which are in turn contained in the class of context-sensitive languages, and that the class of phrase-structured languages properly contains all the other classes. Each grammar has the ability to describe languages with certain characteristics. This ability is referred to as the power of the corresponding class of languages. To give an idea of the relative power of the various classes of languages: {a^n | n ≥ 0} is regular, {a^n b^n | n ≥ 0} is context-free (but not regular), and {a^n b^(n^2) | n ≥ 0} is context-sensitive (but not context-free). The languages {a^m b^n c^(A(m,n)) | m, n ≥ 0}, where A is Ackermann's function, and {M#x | M halts on input x}, where M is the binary encoding of a Turing machine, are phrase-structured (but not context-sensitive).
More importantly, given a grammar G and a string w, the test of membership 'Is w ∈ L(G)?' is decidable for all classes except the phrase-structured languages. But the membership question can be answered in linear time only for regular grammars and certain subclasses of context-free grammars. Fortunately, it has been observed (over the years) that certain subclasses of context-free grammars are powerful enough to specify most of the syntax of useful programming languages. More about the properties of the various classes of languages and grammars can be found in [10], their use in building compilers can be found in [3], and their use in natural language processing can be found in [15].
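To make the idea of a linear-time membership test concrete, consider the toy context-free grammar S → (S)S | e of balanced parentheses (our own illustrative example, not one used in the chapter). Because this grammar belongs to one of the well-behaved subclasses just mentioned, membership can be decided by a recursive-descent parser in a single left-to-right scan of the input; a minimal Python sketch follows.

def parse_S(s, i=0):
    # Try to match nonterminal S starting at position i.
    # Return the position after the matched prefix, or None on failure.
    if i < len(s) and s[i] == '(':
        j = parse_S(s, i + 1)                  # S -> ( S ...
        if j is None or j >= len(s) or s[j] != ')':
            return None
        return parse_S(s, j + 1)               # ... ) S
    return i                                   # S -> e (the empty production)

def in_language(s):
    return parse_S(s) == len(s)                # the whole string must be consumed

print(in_language("(()())"))                   # True
print(in_language("(()"))                      # False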
2.3. Semantic issues

While discussing the notion of compilation, we mentioned that a compiler should translate a program in a high level language to a program in a low level (or machine) language with the same meaning. This leads to the questions 'How does one specify the meaning of a program?' and 'When will two programs be considered to have the same meaning?' There are two advantages to specifying the semantics of a programming language. First, it leaves no ambiguity in implementations. Simply put, consider two programmers, A and B, who develop compilers for a language on machines MA and MB. We expect a program P compiled and run on either machine to produce the same output for the same input. Without a formal semantics describing the meaning of a program there is no means to compare the 'correctness' of different compilers for the same language. Secondly, one needs a set of tools to mathematically argue about the properties of a program. Properties might range from something as complicated as 'Does a program P terminate?' to something as simple as determining whether all assignment statements 'X gets the value of expression E' are type correct, i.e., have the property that the type of the value of E is the same as the type of the variable X. Information about properties of programs can be used by a compiler to produce code that runs more efficiently. Reasoning about properties of a program is referred to as analysis, and using this information to produce efficient code is referred to as optimization.

One way to assign meaning to a program is to consider each program as a black box, so that the effect of a program can be captured as a function from input to output. In the early 1960s Strachey did precisely that: he used the notation of Church's λ-calculus to state the meaning of programs as mappings from inputs to outputs. The λ-calculus is a formalism that was designed to capture the computational (or intensional) nature of functions, as opposed to describing functions by their graphs, the actual pairs of values that represent the mapping of the function. The terms of the λ-calculus (λ-terms) are:
• x, where x is a variable,
• λx.M, where x is a variable and M is a λ-term,
• MN, where M and N are λ-terms.
A term of the form λx.M is called an abstraction and a term of the form MN is called an application. The main rule of the λ-calculus, called β-reduction, is the following:

(λx.M)N = M[x := N],

where M[x := N] is the λ-term M with all occurrences of x replaced by the term N. This rule models the application of a function to an argument, or in programming language concepts, it models calling a function λx.M with the argument x given the value of N. These λ-terms, or more specifically the abstractions, were used by Strachey to describe the meaning of programs. He used environments, which are themselves mappings from variables to values, to model the input and output of the program. Thus the meaning of a program was a mapping from environments to environments, represented by a λ-abstraction. Actually, each instruction in the language was assigned a λ-abstraction to designate its meaning. Then the meaning of an entire program could be designated as the composition of the functions corresponding to each of the individual statements that make up the program. However, a problem arose in trying to explain the meaning of loops or iterations, statements that conditionally repeat another sequence of statements. For these statements, Strachey was forced to use the paradoxical Y-combinator, which allows for self-application of functions. More specifically, the Y-combinator is a λ-term that satisfies the following equation:

YF = F(YF),

where F is any term. Continually replacing YF with F(YF) in the term YF leads to the term F^n(YF), where F^n represents n consecutive applications of the function F. This is where self-application of functions comes in. Self-application of functions causes a problem in finding a domain or model for the λ-calculus, because any domain X would have to include the set of all functions from X to X, which is impossible because of cardinality. So, because of the existence of the Y-combinator, as of 1969 there was no semantic model for the untyped λ-calculus. This meant, consequently, that there was no model on which to base the semantics of programming languages built on the λ-calculus. Therefore, developing a semantics to discuss the mathematical meaning of phrases in a programming language entailed developing a semantic model for the untyped λ-calculus.
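The fixed-point equation YF = F(YF) can be made concrete with a short sketch of our own in Python. Since Python evaluates arguments eagerly, the sketch uses the call-by-value variant of Y (often called the Z combinator), but the idea is the same: Z(F) behaves like F(Z(F)), which is exactly what is needed to give meaning to a recursive definition.

# The call-by-value fixed-point combinator (the Z combinator): Z(F) = F(Z(F)).
Z = lambda F: (lambda x: F(lambda v: x(x)(v)))(lambda x: F(lambda v: x(x)(v)))

# One "unfolding" of factorial, written without any self-reference;
# `rec` stands for the function being defined.
F = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

factorial = Z(F)       # the fixed point of F
print(factorial(5))    # prints 120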
Let D be a space of values which can be used to give meanings to terms of the λ-calculus or of any programming language. For D to be a valid model (or domain), it must meet the following requirements: (a) the notion of undefinedness is 'objectified', (b) the space is closed under cartesian products and unions, and (c) the space is isomorphic to function spaces over itself. It was left to Dana Scott to discover the appropriate constructions necessary to satisfy all these requirements. Clearly, as stated above, it is impossible to find a value space that is isomorphic to the function space over itself. Scott argued that it is more reasonable to expect the value space to be isomorphic to the restricted space of 'continuous' functions over itself. Indeed, it was shown by Scott in [25] (a more recent, gentle introduction to these topics can be found in [9]) that complete lattices with a unique least value (according to the ordering in the lattice) are appropriate structures. By choosing such a value space, the semantics of a nonterminating computation can be identified with the least value, and the notions of iteration and recursion can be explained as the limit of a sequence of successively better values (by the information ordering associated with the lattice). More importantly, only monotone and continuous maps over these complete lattices can be used as possible meanings of functions in the object language.

Once a model (among possibly many) was found for the untyped λ-calculus, it led the way to using it as a target language in the prescription of denotational (compositional) semantics for any programming language, as described above. More importantly, there have been a number of efforts at designing a programming language not only by describing its syntax, but also by formally describing its semantics. Once the semantics for a programming language has been established, it allows for (a) comparing the 'correctness' of various compilers for the same language, and (b) deriving compilers automatically from the semantic descriptions. In addition to this, it provides a domain for reasoning about characteristics of the language and for developing optimizing compilers.

Although Strachey's model provided a basis for describing the meaning of programs, there are still certain properties of some languages that are difficult to characterize in a reasonable way. For example, languages that include statements about processing in real time, or describing concurrency (executing instructions simultaneously), cause difficulties in establishing a well-understood semantics. Meanings for complicated control structures (such as coroutines) are also difficult to describe formally. Put differently, the question 'What is the correct algebraic structure that can be used as a model of parallelism?' has not been answered completely.

Another important open problem is the full-abstraction problem. Consider the Newtonian laws of motion and the physical reality they describe. The Newtonian laws are useful precisely because they abstract the details of the physical world; that they do so can be established by experimentation and by the fact that the laws fit in mathematically with other pre-existing validated laws of nature. Now turning to issues in computer science, assuming that a programming language embodies the characteristics of a computing machine, any semantic model of that language should be faithful to physical reality. Consider a language P and a semantic function M that maps phrases of P to a structure D. Two phrases p1 and p2 can be considered to be equivalent provided the semantic function M equates them, i.e., M(p1) = M(p2). Consider an arbitrary program p in which p1 is a subpart. Let p' be syntactically
identical to p except that the subpart p1 is replaced by p2, i.e., there is a program c with a hole such that p = c[p1] and p' = c[p2]. Since M is an equality (equivalence) relation that is generally defined in a compositional way, we can expect M(p) = M(p'), i.e., M is a congruence. The full-abstraction problem is the converse: if M(c[p1]) = M(c[p2]) for every program context c, does it follow that M(p1) = M(p2)? Clearly, such a statement is both a property of the semantic function M and of the semantic domain D. If M does not have this property, then it distinguishes phrases that behave identically in every program context, i.e., it is based on some characteristics that are not observable in any real implementation. It is surprising that the problem of finding appropriate semantic functions and semantic domains for real programming languages has been open for almost fifteen years. A statement and history of the problem appears in [20] and the first successful attempt at providing a fully-abstract semantics for one of the real languages appears in [7]. But a lot more needs to be done before a methodology can be worked out to easily construct a fully-abstract semantics for a new language.

A problem that has not been addressed until recently concerns optimal reductions in the λ-calculus. In the λ-calculus, various sub-expressions of an expression may be evaluated in different orders, which alters the amount of work required in the evaluation without affecting the result. In defining the semantics of a functional language, which is based on the λ-calculus, the order of evaluation of the various subexpressions may be determined. However, it has been difficult to establish a semantics that can be used to determine the optimal (with respect to the amount of work required) order of evaluation. An attempt at addressing this problem can be found in [1].
2.4. Symbolic computation

The design of the language FORTRAN showed that languages can be designed that capture commonly used computation paradigms of a particular domain. In the case of FORTRAN, this domain was algebraic formulas. In much the same way, other languages were successfully designed which could better describe problems in other domains. For example, the design of the language LISP was intended to capture symbol manipulation. The specific features of LISP that are now considered to have been revolutionary are: (a) the notion of symbolic information, (b) a list data-structure capturing tree structures that is used to represent both data and programs, (c) the ability to construct programs on the fly and have them evaluated, and (d) a notation for describing recursive functions and composing them. McCarthy, the inventor of LISP, in his summary of the history of LISP [21] mentions that he designed the language in such a fashion that it would be easy to argue about programs written in such a language. Most importantly, the notion of referential transparency, which allows substitution of equivalent phrases for each other in the text, decreed that LISP be based merely on expressions, function definitions and function applications. Thus it is not surprising that a subset of LISP, called pure LISP, is merely the untyped λ-calculus. In fact, the symbol-pushing nature of the λ-calculus has
caused LISP to be the language of choice for programming Artificial Intelligence systems, where building an inference engine for some fragment of first-order theory is a primitive task. Assuming that performing inferencing in some first-order theory is an important task for AI, treating any formal theory itself as a programming language would help. But the difficulty of such an identification depends upon the axiom system for the formal theory being considered. The important criterion is the number of inference rules required to prove new theorems from old theorems. The greater the number of inference rules in a theory, the less likely it will become a candidate for being treated as a programming language. Among the basic formal theories, the λ-calculus and first-order predicate calculus have had the distinct advantage that just one or two inference rules are enough to prove all new theorems from old theorems. The β-conversion rule of the λ-calculus and the resolution rule of predicate calculus have thus been made use of in designing the languages LISP and PROLOG, respectively. The motivation for designing languages out of formal theories is that the inference mechanism becomes available as a basic primitive and does not have to be explicitly built. The advantage to designing languages at such a high level of abstraction is that they are easier for humans to use, reason about, and write correct programs with. On the other hand, execution of programs in that language can consume a great deal of resources (space and time). Thus language design becomes one of balancing the needs of abstraction and execution mechanisms with meaningful use of resources. To give an idea of the ease of use of such high level languages, the following definition of the transitive closure of a relation R can be coded (almost) verbatim in PROLOG. Let R be a binary relation. The transitive closure R* can be defined as
(x, y) ∈ R*  if x = y,
(x, y) ∈ R*  if R(x, z) & R*(z, y).
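The chapter does not reproduce the PROLOG clauses themselves; as a purely illustrative substitute of our own (and noticeably less declarative than the PROLOG the text alludes to), the same two rules can be iterated to a fixpoint in Python, with R given as a set of pairs over a finite set of nodes:

def transitive_closure(R, nodes):
    # R is a set of pairs over `nodes`; the result is R* as defined above.
    closure = {(x, x) for x in nodes}              # (x, y) in R*  if x = y
    changed = True
    while changed:                                 # apply the second rule until nothing new is added
        changed = False
        for (x, z) in R:
            for (z2, y) in list(closure):
                if z == z2 and (x, y) not in closure:
                    closure.add((x, y))            # (x, y) in R*  if R(x, z) & R*(z, y)
                    changed = True
    return closure

print(sorted(transitive_closure({(1, 2), (2, 3)}, {1, 2, 3})))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]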
An important criticism of PROLOG has been that it cannot be used for describing and programming open systems [12]. By definition, an open system is one in which the database of information can change dynamically with time. More abstractly, open systems model systems in which the knowledge can change with time, even to the extent of being contradictory between two different instances of time. A good example of this is a person's beliefs which obviously are not constant with time. So let us consider the simulation of a person's beliefs by a PROLOG database and associated programs. Assume, for instance, that John learns from the airline schedule that flight #202 is to arrive at 4:30pm. If the database is queried whether flight #202 is on time, it would answer in the affirmative. Now assume that John later hears that flight #202 has been canceled due to inclement weather. If the system is queried at this point it should answer in the negative. If PROLOG is restricted to pure Horn clause programs (i.e., no cuts, negation, retracts or asserts) then a PROLOG
system cannot be written to handle this situation. Carl Hewitt's criticism of PROLOG has been based on the fact that first-order logic cannot be used to describe dynamic systems. Though the criticism is valid for PROLOG, it is not valid for logic programming systems in general. There has been a great deal of interest lately in the use of temporal/modal logics for describing dynamic systems. First-order temporal logic is based on first-order predicate calculus and has additional operators for describing the passage of time. Some of the operators and their intended interpretations are given in Table 1. To describe the fact that a proposition p is true at this instant and false at the next instant, the predicate p ∧ O(¬p) can be used. To suggest the possibility that a proposition p is sometimes true and sometimes false, the predicate ¬□p ∧ ¬□¬p can be used. The temporal operators used are instances of modal operators. Modalities introduce the notion of possible worlds different from the present state of affairs, and can be used to describe dynamically evolving systems. The problem of how to use modal/temporal logics effectively for AI-related work is still open. In using modal/temporal logic for knowledge representation, the main concern is one of balancing expressibility against the tractability of the decision problem for the calculus being used. The expressibility increases with any increase in the number of modal operators in the chosen logic, but unfortunately the problem of showing the consistency of a set of formulae becomes either exponentially hard or undecidable. An instance of this effort to balance these two requirements can be found in Allen's paper [4], where he discusses the use of modal operators to express intervals of time and also gives a decidable decision procedure. Information on how temporal logic can be used for plan formation can be found in [22].

In the realm of programming languages, greater expressive power means that one has to specify what has to be done rather than how to do it. This paradigm of problem solving is practiced by all engineers and scientists. For instance, when an equation such as

v = δs/δt
is given, it can be used in a number of ways depending on what is required. In most contexts, when an equation involves n variables, knowing n − 1 of them is enough to solve for the last variable. Obviously the equation is acting as a constraint over the n variables.

Table 1
Interpretation of certain temporal logic operators

Operator    Interpretation
O(P)        P will be true at the next time instant
□P          P will be true at all time instants in the future
◇P          P will be true at some time instant (either now or in the future)
P U Q       P will be true at all time instants until Q becomes true
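To make the interpretations in Table 1 concrete, the following sketch of our own evaluates these operators over a finite trace (a list of states, each state being the set of atomic propositions true at that instant); this finite-trace reading is a simplification of the intended semantics.

def holds(formula, trace, i=0):
    # formula is ('atom', p), ('next', f), ('always', f), ('eventually', f) or ('until', f, g)
    kind = formula[0]
    if kind == 'atom':                      # an atomic proposition
        return formula[1] in trace[i]
    if kind == 'next':                      # O(P): true at the next time instant
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if kind == 'always':                    # []P: true at all instants from now on
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if kind == 'eventually':                # <>P: true at some instant, now or later
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if kind == 'until':                     # P U Q: P holds at every instant until Q becomes true
        f, g = formula[1], formula[2]
        return any(holds(g, trace, j) and all(holds(f, trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))

trace = [{'p'}, {'p'}, {'q'}]
print(holds(('until', ('atom', 'p'), ('atom', 'q')), trace))   # True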
This notion of solving equations or satisfying constraints has been termed programming with constraints [27, 6]. The paradigm of programming proposed in [27] involves specifying constraints over a set of variables and an automatic system which solves the constraints for possible solutions. It can be claimed that PROLOG is an instance of the paradigm of constraint-driven programming, due to its ability to solve a number of problems from a given specification. Constraint-driven programming has been developed in the context of CAD tools and other engineering-oriented problems. Two well-known implementations of this paradigm, by Steele [27] and Borning [6], allow a user to specify a number of variables and constraints to be satisfied among them. Once such a set of constraints has been set up, the user can define or change one of the variables and view the changes in the other variables. Both systems analyze the constraints and build a network of dependencies among the variables. When one of the variables is changed, the dependency relation is used to propagate the changes, modulo the constraints among them. A form of relaxation algorithm is used in both of the implementations. Since it is impossible to come up with implementations that solve arbitrary equations, both of these implementations adopt the approach of cooperating experts. Such an approach allows a user to include in the system separate modules (experts) for solving equations of a certain kind, for instance, modules to solve linear equations, modules to solve linear inequalities, or modules for performing symbolic integration or differentiation.

To summarize, we have argued thus far that it is advantageous to have a language that is as expressive as possible in order to be able to program AI systems easily. However, the problem with using a very expressive language is that its implementation may be difficult, or even impossible, or perhaps just too inefficient. On the other hand, by using a language of very low expressive power, the user of the language is burdened with details not directly connected with the problem that is being solved. Thus what is necessary is a language that is somewhere in between, offering the correct kind of abstraction that makes programming easy and whose implementation makes reasonable use of time and space.
3. Architecture and operating systems

As stated above, in order for a programmer to use a computer system effectively, a high level language should be provided to eliminate the problems associated with programming in machine code. Another way to increase the programmer's accuracy and efficiency is to provide an operating system which separates the programmer from the physical details of the system. The operating system is a program which acts as an interface between the user or programmer and the hardware of the system, performing operations such as input/output or memory allocation on behalf of the user. It can also be seen as
a resource allocator, where the resources are the CPU (central processing unit), memory, disks, printers, etc. The operating system determines which users or which programs can use these resources at any given time and in what capacity. The goal of the operating system is to make the computer system convenient for the programmer to use, as well as to make it run as efficiently and accurately as possible. Another attempt at developing efficient systems has been made in the area of architecture. The main contribution of architecture has been to create systems where parallelization is possible, which is usually implemented by increasing the number of processing units in the computer so that many instructions may be processed simultaneously. Efforts have also been made in developing hardware components that operate at higher speeds and lower costs. Both the operating system and the underlying architecture are designed to aid each other in creating an efficient and accurate system. As new architectures are developed, operating systems are redesigned to take advantage of these new developments. On the other hand, architectures can be designed to make the job of the operating system easier or more efficient. The operating system usually makes these new developments somewhat invisible to the programmer (except for their effects). However, sometimes language support is necessary in order for the programmer to incorporate the new developments into his/her program. Thus in many ways the developments of operating systems, architectures, and programming languages have been dependent on each other.
3.1. The origins of operating systems

In order to better understand what an operating system is, it is beneficial to look at how operating systems have developed through time. When computers were first introduced, only one person could operate them at a time. There essentially was no operating system, just the programmer and the hardware, which meant the responsibility of loading tapes and programs and accessing primary and secondary storage belonged to each individual programmer. Basically, the programmer was the operating system. Either each programmer had to sign up to be the sole user of the computer for a given period of time, or there was one (human) operator that would run programs on behalf of the programmers and was responsible for all of the loading. In either event, the CPU was not being used very efficiently. It would sit idle while the programmer or operator would load or unload tapes or programs. An attempt to solve this problem by making the transition from one program (or job) to another more efficient was automatic job sequencing. Along with the program itself, commands were loaded that would give instructions to the computer to perform certain jobs that were previously done by the operator. For example, these commands could tell the computer to load a certain compiler, load the program, and run the program. This method, though, required a certain program, called the resident monitor, to be permanently
loaded in the computer in order to interpret and carry out the automatic job sequencing commands. The resident monitor was the first operating system. Though the resident monitor increased efficiency, some problems still remained. One of these was that because the CPU is a great deal faster than I/O (input and output), the CPU still sat idle while it waited for the input and output to execute. This problem was solved by requiring the CPU to only initiate the I/O, and then to continue processing while the I/O was simultaneously carried out by the I/O device. When the I/O was complete, it notified the CPU by an interrupt. Thus the I/O and the CPU could run concurrently. Another method that allowed the CPU to run more efficiently was to read several programs along with all necessary input to a disk. This allowed the CPU to access I/O whenever it needed it directly to or from the disk. The interrupt method was still used, but when the CPU was done with one job it could proceed to the next without waiting for I/O from the current job to complete. The possibility of having the input for many jobs available on disk and having more than one job or program loaded into memory at a time led to the idea of multiprogramming. One situation that still left the CPU idle occurred when one program was required to wait for the completion of its I/O, especially input, before continuing. In multiprogramming, when one job must wait for I/O, the CPU simply stops processing that job and continues processing some other job in memory. When this job requires I/O, the CPU resumes processing some other program in memory. When a job's I/O completes, it then becomes eligible to be resumed by the CPU when some other job requires I/O or completes. This way, the CPU rarely sits idle. Multiprogramming made it possible for several programs to be in memory at once, each sharing the CPU. In a sense, it made it possible for the CPU to be used by many programs at once. Since a method had been found to share the CPU among programs, the next step was to share the CPU among users, which is called time sharing. When programmers use a terminal, in general they generate short commands or programs that use the CPU for only a short period of time. In time sharing, there is a time limit, called a time slice, for how much time one user can occupy the CPU at a time. At the end of the time slice for a given user, another user gets the CPU for his/her allotted time slice. A certain user program or command may require more than one time slice to complete, but since these time slices are very small, it does not appear to the user that they are really sharing the CPU, which makes time-sharing feasible. So with the operating system, several users may use the computer at once, with the appearance that they are the only one using it. Also, the CPU is used efficiently, being constantly switched between ready users and programs and rarely sitting idle. The operating system also takes care of details such as loading compilers, other programs, data, and starting programs. Thus the operating system successfully makes using the computer very convenient and efficient for the programmer. For more information about the history of operating systems, see the first chapters of [8, 26].
3.2. Resource management
As stated above, the operating system can be thought of as a resource manager, allocating resources such as the CPU, I/O devices, disks, memory, etc. to users and programs. We consider any user command or program that wants to use the CPU to be a process. So in a computer system, many processes compete for the same resources. The job of the resource allocator then becomes one of managing shared resources, which introduces new problems into the efficient operation of the system. The two major sets of problems associated with shared resource management are: (a) the validity of the actions used to manage competing processes, and (b) the effectiveness of the policy used to allocate resources. To understand the first problem, consider the following situation. Assume a central processor is used to manage the books of a small bank. Consider two people (say H and W, for husband and wife) who share a joint account and are at two different counters of the same bank at the same time. Moreover, assume that H is trying to withdraw an amount of money (say hm) and W is trying to deposit an amount of money (say wm) into the same account. Clearly, the cashiers at the two counters would use their terminals to run different programs, one to withdraw money and the other to deposit money into the same account. Therefore, it is a single file, memory location, or counter holding information about the current balance that is being changed by the two cashiers. Just as a multiprogramming system would interleave the actions of various jobs to give every user the impression that he is the sole user, so the instructions of the two programs being run by the two cashiers would be interleaved. The set of instructions (at the machine language level) that form the programs for withdrawing and depositing money could be represented as follows:

WITHDRAWAL
w1: Load account balance to register R1 from File
w2: Is R1 > hm?
w3: If yes, subtract hm from R1
w4: If yes, Load R1 to File
w5: If no, say 'not enough balance'

DEPOSIT
d1: Load account balance to register R2 from File
d2: Add wm to R2
d3: Load R2 to File

Given that both of these programs would be allowed to run for a small time slice at a time, and their executions interleaved, it is possible that the instructions are executed in the order

w1 d1 d2 d3 w2 w3 w4 ...
Clearly such an interleaved execution of instructions would change the old balance (A_old) to A_old − hm, whereas after these two transactions are executed the new balance should be A_old − hm + wm. From a correctness point of view, the account balance is a shared resource and should never be used by two or more programs at the same time. Such problems with shared resources are not new. They date back to the use of a single train track by trains running in opposite directions, and before [13]. The use of such shared resources necessitates the use of protocols. Each party in an interaction would, of course, have to follow a script for accessing shared resources. The use of a protocol, in itself, is not guaranteed to produce 'correct' results. For example, consider a four-way stop sign. Assume that the driver's manual states that the only rules governing the use of the intersection are: (a) the first automobile to arrive has the right of access, and (b) in the event that two automobiles arrive in orthogonal directions at the same time, the one on the right has the right of access. Clearly this protocol can be used without dire consequences most of the time. But it does not mention what should happen when automobiles arrive from all four directions at the same time. By following these rules, of course, all four automobiles would endlessly wait for each other, a situation that is termed deadlock. A similar problem, called indefinite postponement, arises when a protocol allows the component programs of a system to behave in such a fashion that some programs conspire to lock out the other programs indefinitely from accessing the shared resource. The design of 'correct' protocols is not limited to multiprogramming systems. In fact, they have gained widespread importance due to the powerful combination of telecommunication and computers [2].

All along we have been discussing the qualitative aspects of sharing resources between a number of programs. There is more to designing computer systems that manage shared resources than just the 'correctness' aspect. Consider, for example, the situation where there are a number of programs P1, ..., Pn that share a main memory of M locations, managed by a 'memory management system' (MMS). Assume that each process would at various times (a) request a contiguous section of the memory of arbitrary size (less than M) from the MMS, (b) use it, and later (c) return it to the MMS. At an arbitrary time, the central pool of shared memory can be characterized as an alternating sequence of memory blocks that are in use and blocks that are free. When a new request is made by one of the participating processes, the memory management system could use one of a number of possible policies to satisfy the request. For example, it could use the first free block that it finds suitable to satisfy the request, or it could search for the block of least size among all blocks that can satisfy the request. Each policy has its own advantages and disadvantages. The choice of a policy can only be based on assumptions about the purported behavior of the participating processes. Clearly such behaviors can be modelled as stochastic processes. More information about the 'quantitative' aspects of resource sharing can be found in Kant's Chapter 2 in this volume [17].
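The lost-update problem just described is easy to reproduce, and to prevent, in any language with threads. The following Python sketch is our own illustration, not part of the original discussion: without the lock, the load/test/store sequences of the two transactions can interleave exactly as in the schedule w1 d1 d2 d3 w2 w3 w4 above and one update is lost; holding the lock for the whole transaction makes each of them atomic.

import threading

balance = 100                     # the shared account balance (the 'File' above)
lock = threading.Lock()           # the protocol: hold the lock for a whole transaction

def withdraw(hm):
    global balance
    with lock:                    # without this lock the three steps below can interleave
        current = balance         # w1: load the balance
        if current > hm:          # w2: is the balance sufficient?
            balance = current - hm   # w3, w4: subtract and store back

def deposit(wm):
    global balance
    with lock:
        current = balance         # d1: load the balance
        balance = current + wm    # d2, d3: add and store back

threads = [threading.Thread(target=withdraw, args=(30,)),
           threading.Thread(target=deposit, args=(50,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)                    # always 120 with the lock; without it, 70 or 150 are possible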
3.3. Memory hierarchies

Because it is one of the more specialized resources in a computer system, the memory requires more specialized management techniques. Memory management is also a good example of how operating systems and architecture developments have worked together to create efficient systems. In order to allow several processes to share the CPU effectively, several processes must be in memory (have their instructions in memory, as opposed to secondary storage) at the same time. This limits the amount of space in memory that one process may use, but it is the goal of the operating system to eliminate this limit, as far as the user is concerned. This goal is accomplished by virtual memory. See Figure 2. Virtual memory uses a multi-level memory system to meet this goal. The simplest situation is a two-level memory, which is described here.
Fig. 2. A memory hierarchy: cache, main memory, and secondary storage. Speed and cost increase toward the top of the hierarchy; size increases toward the bottom.
The lowest level of the system, usually referred to as secondary storage, stores all of the programs and data for the whole system. It is usually made of large and slow, but inexpensive, media. In a two-level system, the next level, or the top level, is the main memory, which is much faster but more expensive and thus smaller than the secondary storage. Everything that is stored in main memory is originally copied from secondary storage and eventually copied back to secondary storage. Main memory is broken down into sections so that it can be fairly allocated to all the currently active processes. If these sections are of equal size then they are called pages. To make things easily compatible, secondary storage programs and data are also broken down into units that are the same size as the pages in memory. In order for the CPU to run a process, the instructions and data that the CPU will access must be in main memory. When the CPU needs to access an instruction or some data that is not currently in main memory, it must perform what is called a page swap. It must locate the instruction or data in secondary storage, and then move the page containing that information into main memory before it can continue running that process. Since a page swap requires access to secondary storage, it is considered an I/O operation, so once the I/O is initiated, the CPU begins to run some other process in memory instead of waiting for the input to complete, as described earlier. Thus each process has only a subset of its instructions and data in main memory at a time. It may seem that this method of swapping pages each time an access to information in secondary storage is encountered will cause the operating system to spend more time on swapping than on executing programs, but under the right circumstances, this is not the case. Usually, during a short duration of time, a program tends to access information and data that are in the same areas of the program, as opposed to randomly skipping around the program referencing various locations. It is this locality of references in time and space that causes virtual memory to work well. There are many issues associated with maintaining virtual memory that the operating system and architecture must address (Figure 3). First, the CPU must be able to easily determine if a given address is in memory or not. For each process, its pages in secondary storage are numbered in order. For each process, there is a function that maps pages in secondary storage to pages (or page locations) in main memory. Each of these functions is stored in the computer as a table called the 'page table', which is implemented as a set of tuples. For each page of the secondary memory that is in main memory there will be an entry in the page table. The left component of the tuple identifies the secondary memory page for which it is an entry, and the right component identifies where this page has been stored in the main memory. Only the pages that are in main memory are listed in the page table. When a memory reference is encountered, the CPU must search the page table to determine if that reference is currently in memory or not. The page table is often stored in special memory called the translation lookaside buffer, which provides for extremely fast table lookup, to speed up the search. Another issue in maintaining virtual memory is determining, when a page swap is necessary, which page in main memory should be replaced. An often used method is 'least recently used', which selects the page in memory whose time of last access is the earliest. This works well, but may be hard to
implement because it may be difficult to determine when each page was last accessed. Once the page is selected, if it has been altered by its process, it is copied back to secondary storage before the next page is brought in.

Fig. 3. Handling a memory reference: (1) a memory reference is generated; (2) the reference is looked up in the page table; (3a) if the page is in main memory (MM), it is accessed; (3b) if it is in secondary storage (SS), it is located and (4) the page is moved to MM to be accessed.

In virtual memory, as stated above, there may be more than two levels of memory, in which case it is a hierarchy of memory. All the information on one level must have originated as a copy of some information on the level below it. As the levels increase, the cost and speed of the media increase, and the size or amount of storage decreases. Often there is another level above the main memory, called the cache. It is a fast and expensive memory and usually acts as a buffer between the main memory and the CPU. The interaction between the cache and main memory is similar to the paging described above, but a 'page swap' from main memory to cache does not require I/O, so it does not cause
the CPU to switch to another process. Generally, the addition of this level does speed up the operation of the CPU, demonstrating how the architecture of a system can work together with the operating system to improve the efficiency of the system. For more details on the architectural aspects of memory hierarchies see [14]. For an introductory approach to the concepts of paging, locality of references, and various page-replacement schemes, see [8]. For a more in-depth discussion of the theoretical basis of these concepts, see [19]. Reference [26] also gives a thorough discussion of these topics.
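A minimal sketch of our own of the 'least recently used' policy described above: frames holds the pages currently resident in main memory, and the ordering of the dictionary records how recently each page was accessed.

from collections import OrderedDict

def simulate_lru(references, num_frames):
    # Return the number of page faults for a reference string under the LRU policy.
    frames = OrderedDict()                 # page -> None, ordered from least to most recently used
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)       # a hit: mark the page as most recently used
        else:
            faults += 1                    # a miss: a page swap would be needed here
            if len(frames) == num_frames:
                frames.popitem(last=False) # evict the least recently used page
            frames[page] = None
    return faults

print(simulate_lru([1, 2, 3, 1, 4, 2], num_frames=3))   # 5 page faults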
3.4. CISC versus RISC

In addition to aiding the operating system directly, as in virtual memory management, the architecture can increase the efficiency of the system in other ways, such as through the size of the instruction set. Each computer has a specific set of machine instructions that it is capable of executing, which is determined when the architecture of the system is established. When computers were first developed, they had small sets of very simple instructions. As computers developed, more complicated instructions were devised and added to the instruction sets of new computers, creating large sets of complicated instructions. These new instructions did not necessarily introduce new capabilities, but performed the same operation as a sequence of the previous instructions. But since the operation was now one machine instruction, it could be executed faster than the several simpler instructions that performed the same operation. Another advantage to these more complex instructions was that one high level language operation could now correspond to one complex instruction instead of the usual several simpler instructions. This speeds up compilation and makes it more efficient. A major disadvantage of complex instructions is that they cannot be optimized. As a compiler translates a program in a high level language to machine instructions, it optimizes the code by using information it has gleaned while translating the program. It can use this information to rearrange instructions and use certain registers to eliminate redundant instructions (as mentioned in Section 2.1) and replace some instructions with simpler ones. Thus it can actually reduce the number of instructions that would be translated from a given high level instruction. If the high level instruction is translated to one complex instruction, then the compiler cannot optimize that operation because its execution time and use of registers are fixed by the instruction. Thus some computer developers proposed that a simpler, smaller instruction set would actually produce more efficient code, given a compiler with a good optimizer. These machines are called reduced instruction set computers, or RISC machines. Computers with larger, more complex instruction sets are called complex instruction set computers, or CISC machines. In general, programs translated for CISC machines result in machine code with fewer instructions than the same program translated
for a RISC machine. But the machine program generated for the RISC machine may execute faster under certain circumstances. First of all, the code must have been optimized in order for it to run faster on a RISC machine. In certain applications, such as those involving many integer operations, the RISC machine is faster; if the application involves many complex operations, such as those associated with real numbers, the CISC machine is faster. In any case, the size and complexity of a machine's instruction set has much to do with how efficient it is. More information about the dichotomy can be found in [24].
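As a toy illustration of why simple instructions leave room for the compiler to optimize, the following C sketch applies one peephole rule to an invented three-address-style instruction list: a load that immediately follows a store of the same memory location into the same register is dropped, since the value is already in the register. The instruction set and operand layout are hypothetical; a real optimizer applies many such rules over RISC-like code.

/* Toy peephole pass over a made-up three-address-style instruction list. */
#include <stdio.h>
#include <string.h>

typedef enum { LOAD, STORE, ADD, MUL } Op;

typedef struct {
    Op op;
    char arg1[8];   /* memory location name */
    char arg2[8];   /* register name        */
} Instr;

/* Remove a LOAD that immediately follows a STORE of the same memory
   location into the same register ("redundant load"). */
static int peephole(Instr *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0 &&
            code[i].op == LOAD && code[i - 1].op == STORE &&
            strcmp(code[i].arg1, code[i - 1].arg1) == 0 &&
            strcmp(code[i].arg2, code[i - 1].arg2) == 0) {
            continue;               /* value is already in the register */
        }
        code[out++] = code[i];
    }
    return out;                     /* new instruction count */
}

int main(void) {
    Instr code[] = {
        { LOAD,  "a", "R1" },
        { ADD,   "b", "R1" },
        { STORE, "t", "R1" },
        { LOAD,  "t", "R1" },       /* redundant: t is still in R1 */
        { MUL,   "c", "R1" },
    };
    int n = sizeof code / sizeof code[0];
    int m = peephole(code, n);
    printf("%d instructions before, %d after the peephole pass\n", n, m);
    return 0;
}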
3.5. Specialized units

Another way the architecture may improve the efficiency of the system is by providing parallelization, the ability to execute more than one instruction at a time. This can be achieved to a small degree by allowing the CPU to carry out more than one operation at a time, which is made possible by specialized units. The CPU, or central processing unit, consists of basically two parts. One is the control unit, which is responsible for interpreting instructions and controlling their execution. The other is the arithmetic logic unit (ALU), which is responsible for executing arithmetic operations such as addition, subtraction, multiplication, and division, and logical operations such as AND, OR, and NOT. In addition to these two units, the CPU also contains registers, which are storage locations for temporary results.

An ALU is a general processing unit capable of performing various operations, but only one at a time. A specialized unit, on the other hand, is a very specific processing unit: it can perform only one type of operation, such as addition. One way to introduce parallelism in the CPU is to build the ALU out of many individual specialized units, at least one for each operation the ALU needs to carry out, so that these units may work independently of each other. This means that if the CPU must carry out several different types of operations, it may perform them simultaneously on different specialized units. On the other hand, if the CPU must perform many operations of the same type, it must do them sequentially, as an ordinary processor would. A CPU with specialized units can therefore contribute some parallelism to a system, provided the processes perform many different types of operations. This parallelism provides a faster, and thus more efficient, system, but it is limited to a certain kind of program and may require overhead to control. Vector processors, generally used to speed up matrix computation, fall within this class.
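A back-of-the-envelope comparison makes the point about specialized units. The C sketch below assumes, purely for illustration, that every operation takes one cycle and that units of different types overlap perfectly; it then compares the cycles a single general ALU would need with the cycles needed when each operation type has its own unit, for a made-up operation mix.

/* Toy comparison: one general ALU (one operation per cycle) versus one
   specialized unit per operation type (different types overlap). */
#include <stdio.h>

enum { ADD_OP, MUL_OP, DIV_OP, NTYPES };

int main(void) {
    /* operation types issued by a hypothetical instruction stream */
    int ops[] = { ADD_OP, MUL_OP, ADD_OP, DIV_OP, MUL_OP, ADD_OP };
    int nops = sizeof ops / sizeof ops[0];

    int per_type[NTYPES] = { 0 };
    for (int i = 0; i < nops; i++)
        per_type[ops[i]]++;

    /* with one unit per type, types proceed in parallel; the most
       heavily used unit determines the total cycle count */
    int parallel = 0;
    for (int t = 0; t < NTYPES; t++)
        if (per_type[t] > parallel) parallel = per_type[t];

    printf("general ALU: %d cycles, specialized units: %d cycles\n",
           nops, parallel);
    return 0;
}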
3.6. Parallel architectures

Another attempt at employing parallelization at the architectural level is to incorporate more than one processor into the system. Though it is obvious that
instructions executed in parallel will speed up the operation of the system, it is sometimes difficult to achieve the maximum possible speedup because some problems, or parts of problems, must run sequentially. To run various parts of a program in parallel, these parts must be known to be independent of each other, creating the need for corresponding developments in operating systems and programming languages to take full advantage of parallel architectures.

There are many methods of describing or classifying parallel structure in a multi-processor computer. Those presented here are described more thoroughly in [11]. The first method is basically behavioral, based on the instruction and data streams to the processor. It classifies parallel structure as follows:

SISD: single instruction, single data stream. This class includes the sequential processor, which operates on one instruction at a time and one item of data at a time.

SIMD: single instruction, multiple data stream. This class contains computers with several processors that all execute the same instruction at the same time on different items of data. This type of parallelism works well for vector and matrix applications.

MISD: multiple instruction, single data stream. These computers are somewhat more difficult to describe; one possible example is the pipelined processor described below.

MIMD: multiple instruction, multiple data stream. This class consists of groups of processors that execute separate processes on separate data.

Most computer systems can be placed in one of the above classes.

Another way to classify parallel processors is structurally. Assuming that the processors have access to the same (shared) memory units, there are at least three possible structures; see Figure 4. The first is the shared bus, where each processor and each memory unit is connected to one bus (communication path), so that all communication and data transfer occur on this one connection. This structure is prone to message collisions. One way to resolve this is to provide more than one bus, called a multi-bus, where certain processors and memory units are connected to certain busses. This reduces the traffic on any one bus and can also be arranged so that only certain processors are connected to certain memory units. The third shared-memory structure is the crossbar connection, in which each processor is connected to each memory unit. Only one processor can communicate with a given memory unit at a time; this eliminates message collisions, but may require some processors to wait for memory units that are already being accessed. When the processors do not share memory, they have their own local memory and are best described as distributed systems, which are discussed in the next subsection.
Fig. 4. Shared-memory multiprocessor structures: (1) shared bus, (2) multi-bus, (3) crossbar connection, with processors P1, ..., Pn and memory units M1, ..., Mm.
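As a small, concrete taste of the shared-memory MIMD style, the sketch below uses POSIX threads (compile with -pthread) to have two threads sum the two halves of a shared array concurrently and combine the partial results. The array size, thread count, and partitioning are chosen only for illustration.

/* Shared-memory MIMD-style sketch: two threads sum halves of an array. */
#include <pthread.h>
#include <stdio.h>

#define N 1000

static int data[N];          /* shared memory read by both threads  */
static long partial[2];      /* per-thread partial sums              */

static void *worker(void *arg) {
    long id = (long)arg;     /* 0 sums the first half, 1 the second  */
    long sum = 0;
    for (int i = (int)id * (N / 2); i < (int)(id + 1) * (N / 2); i++)
        sum += data[i];
    partial[id] = sum;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (int i = 0; i < N; i++)
        data[i] = i + 1;

    for (long id = 0; id < 2; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < 2; id++)
        pthread_join(t[id], NULL);

    printf("sum = %ld\n", partial[0] + partial[1]);   /* 500500 */
    return 0;
}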
Finally, it is possible to characterize parallel architectures by the presence of pipelining. Pipelining can be used to execute a series of instructions on many data items. Each processor is assigned a given instruction in the series, and the data proceeds from one processor to the next in the appropriate order. As soon as the first instruction has been executed on the first item, that item proceeds to the next processor, freeing the first processor to begin executing its operation on the second data item. Execution proceeds in this manner, with each processor performing its operation on the next data item and passing that item on to the next processor, until all the items have been processed. Consequently all of the processors are kept busy, and all of the data gets processed. This method allows a speedup by as many steps as there are in the operation. The problem with this method is that it is restricted to those operations that can be
broken down into smaller steps and performed simultaneously on a group of data. Further discussion of parallel architectures can be found in Chapter 3 of this Handbook by Krishnamurthy and Narahari [18]. Although parallel architectures introduce the potential for considerable speedup in processing time, much remains to be learned at the operating system and programming language levels before that potential can be fully realized.
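The speedup claim can be checked with simple arithmetic: if an operation is split into k pipeline stages and applied to n data items, pipelined execution takes about k + n - 1 steps versus k * n steps sequentially, so the speedup approaches k as n grows. The short C program below, with invented values for k and n, prints both counts.

/* Back-of-the-envelope pipeline timing. */
#include <stdio.h>

int main(void) {
    int k = 4;       /* stages in the operation */
    int n = 100;     /* data items to process   */

    int sequential = k * n;
    int pipelined  = k + n - 1;

    printf("sequential: %d steps, pipelined: %d steps, speedup ~%.2fx\n",
           sequential, pipelined, (double)sequential / pipelined);
    return 0;
}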
3.7. Distributed systems

Distributed systems are an attempt to exploit parallelism from the operating systems point of view. A distributed system is a system of several processors, each with its own memory, connected to each other by communication lines. Since each processor has its own memory, all communication must be done by sending messages, as opposed to accessing shared memory locations; this characteristic is what distinguishes distributed systems from the parallel architectures above. When the processors reside on computers that are separated by some distance, the system is called a network. Each processor in the system is referred to as a 'node' or a 'site'. The advantages presented by distributed systems are the ability to share resources located at different sites, speedup in computation due to running processes concurrently, and reliability (if one processor fails, another one in the system can be used). There are many logistical problems in a network or distributed system, concerning the layout of the network and how messages are sent across it, which will not be addressed in detail here; these details can be found in [28].

Distributed systems, once physically established, allow for various methods of computation. In data migration, when a process running on one node requires data that is stored at another node, that data is transferred to the node where the process resides. When the process has completed its computation on the data, the data is copied back to its original node if any changes were made. In this method the consistency of the data must be maintained, perhaps by mutually exclusive access to it. In computation migration, data resides permanently on the nodes, and any computation involving data must be done by processes on the node where the data resides. If a process at one node wishes to use certain data on another node, it must ask a process at the node holding the data to perform the task for it; such a request is called a 'remote procedure call'. In job migration, jobs or processes can be performed on any of the nodes, and the operating system of the distributed system must determine which process gets executed at which node. The system then has the ability (and responsibility) to balance the processor load across the sites, to speed up computations by allowing several processes to run concurrently on different processors, and to honor a process's hardware preferences.
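To contrast shipping data with shipping work, here is a toy, single-process C sketch of computation migration: the requesting 'node' sends only a small request describing the work, and the 'node' that holds the data performs the computation locally and returns just the result. The node names, record array, and request structure are invented; in a real distributed system the request and reply would travel over a network, and the function call stands in for a remote procedure call.

/* Toy illustration of computation migration (work goes to the data). */
#include <stdio.h>

#define RECORDS 5

/* data that "lives" at node B */
static int node_b_data[RECORDS] = { 4, 8, 15, 16, 23 };

typedef struct { int lo, hi; } Request;   /* range of records to sum */

/* the "remote procedure" executed at node B on its local data */
static int node_b_sum(Request r) {
    int sum = 0;
    for (int i = r.lo; i <= r.hi && i < RECORDS; i++)
        sum += node_b_data[i];
    return sum;
}

int main(void) {
    /* node A wants a computation over node B's data: it ships the
       small request, not the records themselves */
    Request req = { 0, 4 };
    int result = node_b_sum(req);     /* stands in for a remote call */
    printf("node A received result %d from node B\n", result);
    return 0;
}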
Distributed systems must deal with the same problems as single-processor operating systems do, but the operating system of a distributed system is limited to communicating with processes by messages only. The problem of establishing mutual exclusion for certain resources becomes very complicated when communication is restricted to message passing. It is somewhat simplified if there is one central node responsible for enforcing mutual exclusion. If a process desires exclusive access to a resource, it sends a request to the central node. If the resource is available, the central node sends a reply to the requesting process, giving it permission to use the resource. When the process is done using the resource, it sends a release message to the central node. The central node is responsible for keeping track of which process or node is using each resource and which processes are waiting for which resources. Though this method is fairly simple, its major drawback is that if the central node fails, the whole system of nodes is unable to share resources.

In a fully distributed system, where there is no central controlling node, mutual exclusion is much more difficult to implement, but is more reliable because it is not destroyed by the failure of one node. In one such method, a process desiring exclusive access must send a request to all other nodes asking permission to use the resource. Each node either replies that it is not using the resource, giving the requesting process permission, or defers its reply if it is currently using the resource. If (and when) the requesting process has received all of the replies, it proceeds to use the resource. Another solution in a fully distributed system is to require a process to obtain a 'token' before it may use a resource. In this method there is a particular message, called the token, that is passed from one node to the next around the network in a cycle. A process desiring exclusive access must wait until the token reaches its node; it then keeps the token until it has completed its critical section, after which it continues passing the token around the cycle. There are many more solutions to the mutual exclusion problem, many of which are described in [19]. By offering a greater variety of resources, the potential to decrease computation time, and a reliable environment in which to work, distributed systems can aid in increasing the efficiency of the programmer.
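The token scheme is simple enough to sketch in a few lines. The following single-process C simulation, with a made-up set of nodes and requests and no real message passing, circulates a token around the nodes; a node enters its 'critical section' only during the step in which it holds the token.

/* Toy simulation of token-based mutual exclusion on a ring of nodes. */
#include <stdio.h>

#define NODES 4

int main(void) {
    /* which nodes currently want the shared resource */
    int wants[NODES] = { 0, 1, 0, 1 };
    int token = 0;                        /* node currently holding the token */

    for (int step = 0; step < 2 * NODES; step++) {
        if (wants[token]) {
            printf("node %d holds the token: critical section\n", token);
            wants[token] = 0;             /* done with the resource */
        }
        token = (token + 1) % NODES;      /* pass the token to the next node */
    }
    return 0;
}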
4. Conclusion
In this chapter we have explained how advancements in programming languages and operating systems have combined to create an environment that is easy for the programmer to work in. By easy we mean that the programmer can communicate with the computer in a language that is easy to understand as well as far removed from the details of the machine. This decreases the number of errors that are due to human mistakes. The system is also made more efficient by performing certain tasks on behalf of the programmer or user as well as distributing the resources in such a manner that they are used as often as possible. In programming, these goals are achieved by
providing the programmer with a more abstract language with which to communicate instructions to the computer. Also, through semantic analyses, compilers can be developed that perform optimizations to yield more efficient machine code. In operating systems, many efforts have gone into creating a machine that is efficient and convenient to use. These efforts include creating operating systems that manage resources, employ virtual memory systems, and use parallelism and multi-processors. Further improvements continue to be made, especially in finding efficient ways of implementing languages with greater expressive power and in developing reasonable organizations of distributed systems.
References

[1] Abadi, M., L. Cardelli, P.-L. Curien and J.-J. Levy (1990). Explicit substitutions. In: Proc. 17th ACM Symp. on Principles of Programming Languages, San Francisco, January. ACM Press, 31-46.
[2] Special issue on Protocol Testing and Verification (1990). AT&T Tech. J. 69(1).
[3] Aho, A. V., R. Sethi and J. Ullman (1986). Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA.
[4] Allen, J. (1983). Maintaining knowledge about temporal intervals. Comm. ACM 26, 832-843.
[5] Backus, J. (1981). The history of FORTRAN I, II, and III. In: R. Wexelblat, ed., History of Programming Languages, Academic Press, New York, 25-44.
[6] Borning, A. (1981). The programming language aspects of ThingLab, a constraint-oriented simulation laboratory. ACM Trans. Programming Languages and Systems 3(4), 353-387.
[7] Cartwright, R. S. and M. Felleisen (1992). Observable sequentiality and full abstraction. In: Proc. 19th ACM Symp. on Principles of Programming Languages, Albuquerque, January. ACM Press, 328-342.
[8] Deitel, H. M. (1984). An Introduction to Operating Systems. Revised 1st ed., Addison-Wesley, Reading, MA.
[9] Gunter, C. A. and D. S. Scott (1990). Semantic domains. In: J. van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B, North-Holland, Amsterdam, 633-674.
[10] Harrison, M. (1978). Introduction to Formal Language Theory. Addison-Wesley, Reading, MA.
[11] Hayes, J. P. (1988). Computer Architecture and Organization. 2nd ed., McGraw-Hill, New York.
[12] Hewitt, C. (1985). The challenge of open systems. Byte 10(4), 223-241.
[13] Holzmann, G. J. (1991). Design and Validation of Computer Protocols. Prentice-Hall, Englewood Cliffs, NJ.
[14] Hwang, K. and F. A. Briggs (1984). Computer Architecture and Parallel Processing. McGraw-Hill, New York.
[15] Joshi, A. K. (1993). Natural language processing. In: Handbook of Statistics, this volume.
[16] Kalyanasundaram, B. (1993). Design and analysis of algorithms. In: Handbook of Statistics, this volume.
[17] Kant, K. (1993). Steady state analysis of stochastic systems. In: Handbook of Statistics, this volume, Chapter 2.
[18] Krishnamurthy, R. and B. Narahari (1993). Parallel computer architectures. In: Handbook of Statistics, this volume.
[19] Maekawa, M., A. E. Oldehoeft and R. R. Oldehoeft (1987). Operating Systems: Advanced Concepts. Benjamin/Cummings, Menlo Park, CA.
[20] Meyer, A. R. and S. S. Cosmadakis (1988). Semantical paradigms: Notes for an invited lecture. In: Y. Gurevich, ed., Logic in Computer Science, IEEE Press, New York, 236-255.
[21] McCarthy, J. (1978). History of LISP. ACM SIGPLAN Notices 13(8). Also in: R. L. Wexelblat, ed., History of Programming Languages, Academic Press, New York, 173-197.
[22] McDermott, D. (1982). A temporal logic for reasoning about processes and plans. Cognitive Sci. 6(2), 101-155.
[23] Naur, P. (1960). Report on the algorithmic language ALGOL 60. Comm. ACM 3(5), 299-314.
[24] Patterson, D. A. (1985). Reduced instruction set computers. Comm. ACM 28(1), 8-21.
[25] Scott, D. S. (1970). Outline of a mathematical theory of computation. In: Proc. 4th Ann. Princeton Conf. on Information Sciences and Systems, Princeton University Press, Princeton, NJ, 169-176.
[26] Silberschatz, A. and J. L. Peterson (1988). Operating System Concepts. Addison-Wesley, Reading, MA.
[27] Steele, G. (1980). The definition and implementation of a computer language based on constraints. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.
[28] Tanenbaum, A. S. (1988). Computer Networks. 2nd ed., Prentice-Hall, Englewood Cliffs, NJ.