systems
C standardization
ANSI streamlines the language
by Chris Miller
The language C was developed by Dennis Ritchie as part of the evolution of the Unix operating system in the mid-1970s. C has itself evolved since its first implementation, and there is as yet no definitive standard for the language. The American National Standards Institute (ANSI) has drafted a standard; this article describes its principal features and the probable effects of standardization on users and implementers of C.
Abstract: A draft standard for the C programming language has been produced by the American National Standards Institute. The ANSI subcommittee is concerned with standardizing the language as it exists, rather than changing it. The introduction of standards will make the writing of portable code easier, although not all programmers will limit themselves to standard code. Adjustments will have to be made to compilers.
Keywords: data processing, programming languages, C, standards
Chris Miller is a lecturer in computer science at Heriot-Watt University and a director of the software consultancy Soft Mill Ltd. He is coauthor of a book on Unix, Unix for Users¹, and author of a forthcoming book on C, C for Software².
vol 28 no 7 september 1986 0011-684X/86/070369-05$03.00 © 1986 Butterworth & Co (Publishers) Ltd.
Current definition of C
The most widely accepted definition of C is contained in the book The C Programming Language by Kernighan and Ritchie³, widely referred to simply as ‘K&R’. However, this definition leaves several ‘grey’ areas where the exact semantics of the language are ambiguous or unclear, and inevitably different implementers have chosen varying interpretations of these. Furthermore, certain features have been added, with slight variations, to most C implementations since K&R was published; these include structure assignment, the ‘void’ pseudo-type, and enumeration types. An excellent manual for the de facto C ‘standard’ of two years ago is A C Reference Manual⁴; this presents a far more precise specification than K&R, and describes most common variants of the language.
C is available for a very large range of processors and environments, and is widely used as an implementation language for portable applications; the omni-availability of Unix owes much to the portability of its code, almost entirely written in C. However, the existence of variants creates many pitfalls; to write truly portable code requires extensive experience of what can and cannot be assumed of different compilers, and to port existing code can require substantial effort.
It is clearly time that a reference standard be introduced; in recognition of this, ANSI set up a subcommittee (X3J11) in 1982 to consider C language standardization. This committee has produced several draft proposals; the draft standard is now almost stable, and it
is reasonable to expect that something very close to the most recent draft⁵ will be adopted within the next year as the ANSI standard. Whether or not the International Organization for Standardization (ISO) officially adopts the ANSI standard, it is inevitable that ‘ANSI standard C’ will achieve wide international acceptance.
It should perhaps be made clear that the group considering a C standard is quite separate from the various groups currently trying to standardize Unix, although there is continuing discussion among them. A Unix standard is unlikely to emerge for some time, and there is a serious risk that conflicting standards may be issued by the different committees; a C standard is closer at hand, and will almost certainly be the C standard.
Goals of standardization
The X3J11 committee is concerned primarily with standardizing the C language as it exists, and not with improving or altering the present de facto language definitions. The intention is that almost all ‘reasonable’ existing programs should be legal and retain their current meaning under the new definition. Hence the committee has taken a very conservative attitude towards language innovation. The few new features are mostly compatible with existing versions, and are generally aimed at making it easier to write portable code.
The main points of difficulty have arisen in the ‘grey’ areas mentioned above. In each such case, the standard must either prescribe a particular interpretation, and thus render programs conforming to an alternative interpretation illegal, or must leave the interpretation ‘implementation-dependent’, and thus leave problems of portability unresolved. Where the latter decision has been taken (as, for example, in deciding whether ‘char’ variables are considered as signed or unsigned integers) a new feature has occasionally been introduced to allow the conscientious programmer to be precise (e.g. the ‘signed char’ and ‘unsigned char’ data types).
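As an illustration of the ‘char’ example, the following is a minimal sketch of how a programmer might state signedness explicitly instead of relying on the implementation's default. It assumes a compiler that accepts the proposed ‘signed char’ and ‘unsigned char’ types and the prototype syntax described later; the function names are hypothetical and do not come from the draft.

```c
/* Sketch only: plain_char_is_signed, byte_sum and step are hypothetical names. */

/* Reports how this implementation treats plain 'char':
 * returns 1 if it behaves as a signed type, 0 if it behaves as unsigned. */
int plain_char_is_signed(void)
{
    return (char) -1 < 0;
}

/* Adds up n bytes, each taken as a non-negative value, whatever the
 * default signedness of plain 'char' may be on this implementation. */
unsigned long byte_sum(const unsigned char *p, unsigned long n)
{
    unsigned long total = 0;

    while (n-- > 0)
        total += *p++;
    return total;
}

/* Applies a small offset that is guaranteed to be able to hold
 * negative values, because its type is explicitly 'signed char'. */
int step(int position, signed char delta)
{
    return position + delta;
}
```

A checksum routine written with plain ‘char’ could give different totals on different machines; naming the unsigned type removes that dependence.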
Proposed standard
C programmers are prone to view K&R with great reverence. The language defined there will therefore be the starting point in this paper, with descriptions of the principal changes and extensions proposed by the X3J11 committee; as mentioned previously, some of these have already gained widespread use, while others are recognized as legal by few existing compilers and implemented by fewer. The proposals can be considered under three headings: the pre-processor, the C language itself, and the run-time support library.
The Pre-processor
Conditional compilation. The ‘#if ... #else ... #endif’ construct has been extended to ‘#if ... #elif ... #elif ... #else ... #endif’ to allow easier specification of a chain of alternate conditional compilations, for example:
    #if DEBUG > 1
    /* Verbose debugging code */
    #elif DEBUG == 1
    /* Terse debugging code */
    #else
    /* No debugging code */
    #endif
This feature is available in a few existing compilers, and is a compatible extension to K&R C. The operator ‘defined’ is allowed in ‘#if’ conditions, so that:
    #if defined DEBUG
is equivalent to
    #ifdef DEBUG
and compound conditions such as in Figure 1:
    #if (defined (DEBUG) && (DEBUG > 3)) || !defined (VERBOSE)
Figure 1. Compound conditions
can be written. This is compatible with K&R C and is already widely implemented.
Macro definitions. An area of discrepancy among existing implementations has been the treatment of the body of macros. Common coding tricks which work only on some compilers are:
    #define concat(a,b) a/**/b
or
    #define copy(a) a
    #define concat(a,b) copy(a)b
where the intention is to concatenate the two macro arguments into a single token; some preprocessors will insert a space between the parameters or leave an empty comment there, so that ‘concat(Aleph,Null)’ might expand to ‘AlephNull’, ‘Aleph Null’ or ‘Aleph/**/Null’. These tricks are invalid under the ANSI proposal. A new token concatenation operator ‘##’ is introduced, so that the above can be written:
    #define concat(a,b) a ## b
Another unportable trick is the replacement of macro arguments that appear within strings in the replacement body, for example:
    #define libfile(file) "/usr/local/lib/file"
where ‘libfile(errors)’ should expand to ‘"/usr/local/lib/errors"’.
Under the ANSI proposals this example can be handled by using another new operator, ‘#’, which encloses the following token in string quotes, together with a new rule that adjacent string constants are concatenated:
    #define libfile(file) "/usr/local/lib/" #file
These changes will affect a small but not negligible fraction of existing programs; the conversion effort to change such a program to the ANSI standard will be very small.
Predefined symbols. One important restriction is that the draft standard explicitly forbids any macro names to be predefined, other than ‘__LINE__’ for the current source line number and ‘__FILE__’ for the current file. It is currently standard practice for compilers to predefine one or more macros specifying environmental features such as target operating system, compiler version and target hardware.
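The pre-processor features described in this section can be gathered into one small translation unit. The following is a sketch only, assuming a pre-processor that implements the proposals; the value given to DEBUG and the accessor function names are hypothetical choices made for the illustration.

```c
#include <string.h>

#define DEBUG 1   /* hypothetical setting chosen for this sketch */

/* A chain of alternatives using '#elif', with 'defined' in the conditions. */
#if defined(DEBUG) && DEBUG > 1
#define DEBUG_LEVEL "verbose"
#elif defined(DEBUG) && DEBUG == 1
#define DEBUG_LEVEL "terse"
#else
#define DEBUG_LEVEL "none"
#endif

/* Token concatenation with the new '##' operator. */
#define concat(a, b) a ## b

/* '#' turns the argument into a string constant; the two adjacent
 * string constants are then joined into one. */
#define libfile(file) "/usr/local/lib/" #file

int concat(Aleph, Null) = 42;   /* declares a variable named AlephNull */

/* Accessors (hypothetical names) that expose the results of expansion. */
const char *debug_level(void) { return DEBUG_LEVEL; }
const char *errors_path(void) { return libfile(errors); }
int aleph_null(void)          { return AlephNull; }
int current_line(void)        { return __LINE__; }
```

With these definitions ‘errors_path()’ yields the single string "/usr/local/lib/errors", and ‘debug_level()’ yields "terse" because DEBUG is defined as 1.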
Data types. Most of the extensions to the C language are concerned with specification of data types. The (already widely implemented) pseudo-type ‘void’ is introduced. There are two intended uses: a function that returns no result should be declared as ‘void’, e.g.:
    void errmess(m)
    char *m;
    {
        fprintf(stderr, "error: %s\n", m);
    }
and a call to a function whose result is to be ignored should be explicitly cast to ‘void’, e.g.:
    if (skipnext)
        (void) getchar(); /* Skip over next input character */
In addition, the type ‘(void *)’ is introduced as a ‘generic pointer’. If a value of any pointer type is cast to ‘(void *)’ and then back to the original type, the original value will be obtained. Many existing programs rely on ‘(char *)’ exhibiting such behaviour; that assumption is not portable, and can cause great problems in moving programs to certain machines with, for example, tagged architectures. The ‘(void *)’ type makes the use of generic pointers portable and, incidentally, will finally enable programmers to gag the infuriating ‘possible pointer alignment problem’ warning messages generated by the type-checker ‘lint’ whenever a space allocation function such as ‘malloc()’ is used.
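The two uses of ‘void’ and the generic pointer can be sketched together as follows. This is an illustration under assumptions rather than text from the draft: it uses the prototype syntax and the ‘const’ qualifier described later in this article, and ‘make_counter’ and ‘read_through_generic’ are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* A function that returns no result is declared 'void'. */
void errmess(const char *m)
{
    /* The result of fprintf() is deliberately discarded. */
    (void) fprintf(stderr, "error: %s\n", m);
}

/* malloc() returns a generic pointer, so the result can be assigned to
 * a (long *) without first passing through (char *). */
long *make_counter(long start)
{
    long *p = malloc(sizeof *p);

    if (p == NULL) {
        errmess("out of memory");
        return NULL;
    }
    *p = start;
    return p;
}

/* Round trip: a (long *) converted to (void *) and back again refers
 * to the same object as the original pointer. */
long read_through_generic(long *p)
{
    void *generic = p;
    long *back = generic;

    return *back;
}
```

A caller would obtain a counter with ‘make_counter(7)’, read it back through ‘read_through_generic()’, and release it with ‘free()’.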
    volatile int *ptr_to_vol;               /* Pointer to volatile integer */
    int *volatile vol_ptr_to_int;           /* Volatile pointer to integer */
    volatile int *volatile vol_ptr_to_vol;  /* Volatile pointer to volatile integer */
Figure 2. Use of ‘volatile’
Three new type qualifiers are introduced: ‘signed’, ‘volatile’ and ‘const’. Any integer type can be declared as being either ‘signed’ or ‘unsigned’; as in all existing versions of C, the default for ‘short’, ‘int’ and ‘long’ is ‘signed’; implementations differ in how variables of type ‘char’ are treated. The draft standard expressly leaves the default treatment of ‘char’ up to the implementer, but allows the programmer explicitly to specify ‘signed char’ or ‘unsigned char’ when necessary.
The intention of declaring a variable ‘volatile’ is to warn the compiler that its contents may change in ways unknown to it. This invalidates optimizations based on the assumption that a value stored there will still be there when next the variable is used. A common use of ‘volatile’ would be to refer to control or status registers for I/O-mapped devices in an operating system or embedded application. The syntax follows C’s usual twisted logic, as shown in Figure 2.
The ‘const’ declaration introduces an object whose contents may not be altered by the C program. This can replace many uses of ‘#define’, with the additional feature that it is legal to take the address of a ‘const’ object.
Note that ‘const’ objects need not be true constants, as in:
    const volatile int *const clock_address = CLOCK_REGISTER;
which declares ‘clock_address’ as an unchangeable pointer to an integer that may not be changed by the program but may change spontaneously; ‘clock_address’ itself is (presumably) a true constant, but ‘(*clock_address)’ is simply a read-only location. This might be an appropriate declaration for a device status register.
Enumeration types. Enumeration lists (‘enum’) are permitted, and are treated as collections of integer constants; hence enumeration constants may legally appear as ‘case’ labels, array subscripts, and anywhere else that a constant integer value is valid (except in ‘#if’ expressions). The form of a declaration is shown in Figure 3.
    typedef enum
    { red, orange, yellow, green, blue, indigo, violet } spectrum; /* red = 0, orange = 1, etc. */
    typedef enum
    { HorTab = 5, VerTab = 11, FormFeed = 12, CarriageReturn = 13, Newline = 21, Space = 64 } EBCDIC_whitespace;
Figure 3. Form of declaration using ‘enum’
Structures, arrays and unions. Structures may be assigned to other structures of the same type, passed as parameters, and returned as function results; many but not all compilers already allow this. All variables can be initialized, including local variables of structure, union and array types (illegal in many current compilers). Initialization of a union is defined as initialization of the first component.
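The type qualifiers, enumeration types and aggregate features described above can be sketched in one place. The code below is an illustration only. In particular, ‘fake_status_register’ is an ordinary variable standing in for a device register, since a real register address is machine-specific, and every function name is hypothetical.

```c
/* Enumeration constants are integer constants, so they can serve as
 * array bounds, subscripts and 'case' labels. */
typedef enum { red, orange, yellow, green, blue, indigo, violet } spectrum;

static const char *const colour_name[violet + 1] = {
    "red", "orange", "yellow", "green", "blue", "indigo", "violet"
};

const char *name_of(spectrum c)
{
    return colour_name[c];
}

int is_warm(spectrum c)
{
    switch (c) {
    case red: case orange: case yellow:
        return 1;
    default:
        return 0;
    }
}

struct point { int x, y; };

/* A structure is passed by value, assigned and returned as a result. */
struct point mirror(struct point p)
{
    struct point q = { 0, 0 };   /* initialized local structure */

    q = p;                       /* structure assignment */
    q.x = -q.x;
    return q;
}

union word { int i; char bytes[sizeof(int)]; };

int union_first_member(void)
{
    union word w = { 1234 };     /* initializes the first component, 'i' */

    return w.i;
}

/* Stand-in for a device status register (an assumption of this sketch). */
static volatile int fake_status_register;

/* An unchangeable pointer to a location that the program may only read
 * through this pointer, but whose contents may change behind its back. */
static const volatile int *const status_address = &fake_status_register;

void simulate_device(int value) { fake_status_register = value; }
int read_status(void)           { return *status_address; }
```

Because ‘status_address’ points to a ‘volatile’ object, each call of ‘read_status()’ must fetch the value afresh; the compiler may not reuse an earlier reading.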
‘extern’ declarations. The present rather odd semantics of ‘extern’ definitions are made explicit: any number of ‘extern’ declarations of an identifier may appear as long as they agree in type; at most one of these may be explicitly initialized. All such declarations will refer to the same object.
Function prototypes. On machines where the representations of pointers and integers differ, a common source of error is failure to cast function arguments correctly, e.g.:
    /* time() requires one parameter of type (long *) */
    long time_now, time();
    time_now = time(0); /* ERROR! */
    /* Should be "time((long *) 0);" */
The ANSI draft proposes an extension of external function declarations whereby the expected argument types can be declared in a function ‘prototype’. The above example could now be written:
    long time_now, time(long *);
    time_now = time(0); /* The compiler knows to cast the parameter to (long *) */
The forms:
    int nextnumber(void);
    int fprintf(FILE *, char *, ...);
    int blackbox();
declare respectively a function with no parameters, a function with an unknown or variable number of parameters, the types of the first two of which are as specified, and a function with parameters about which nothing is specified. A similar syntax can be used in function definitions:
    int minimum (int a, int b)
    {
        return a < b ? a : b;
    }
is equivalent to the form acceptable to all current compilers:
    int minimum (a, b)
    int a, b;
    {
        return a < b ? a : b;
    }
The latter form is ‘deprecated’ by the draft standard.
Run-time support library
A compiler may conform to the language standard as a free-standing implementation without providing any run-time library at all. However, in practice most compilers will provide a hosted implementation, incorporating a standard library for tasks such as I/O, mathematical functions and string manipulation.
The library specified in the draft standard is closely similar to a subset of Unix libraries, omitting Unix-specific functions. ‘High-level’ I/O functions such as ‘printf()’, ‘scanf()’, ‘fread()’ and ‘fseek()’ are included; ‘fopen()’ and ‘fclose()’ are omitted from the February draft, but this is apparently a clerical error and they are intended to be part of the standard. The string-handling functions defined correspond largely to those of Unix System V rather than those of, say, Berkeley Unix systems. The mathematical functions are already almost standard, as are the character classification macros ‘isalpha()’, ‘isdigit()’, etc.
The library includes the widely implemented ‘varargs’ macros (in the header ‘<stdarg.h>’) for defining functions that take variable parameter lists. The Unix-based functions ‘kill()’ and ‘signal()’ for the handling of asynchronous interrupts are also included. However, an implementation need not support any form of interprocess communication; it is only obliged to allow a program to send a signal to itself.
A valuable feature for those concerned with complete portability is that all implementations (not just hosted ones) must supply two standard headers, ‘<limits.h>’ and ‘<float.h>’, defining implementation-dependent limitations such as the size of the largest positive integer, the largest unsigned long integer or the smallest positive floating-point number. In most cases minimum limits are set on these, so that all implementations must, for example, guarantee at least 8 bits for a ‘char’ and at least 16 bits for an ‘int’.
Implications for C implementations
The eventual standard will be close enough to existing implementations that the work of converting compilers to support the changes should be small. Almost all implementations already support most of the library functions required, although not necessarily with the same names as in the draft standard.
In practice, most implementers will probably ignore the prohibition on predefined macro names; too much existing code contains sections compiled conditionally upon the operating system or hardware that the program is to run on. It might be preferable for the standard to restrict the form of predefined macro identifiers, or to specify a standard header file to contain predefinitions, or to specify that the programmer should be given the option of excluding all such definitions.
Once a standard is agreed, there will need to be a formal validation procedure for conforming implementations. It is not clear how free-standing implementations can be validated at all, since no standard for their I/O is specified. Once such a validation procedure exists there will certainly be strong market pressure on implementers to supply validated compilers, and it is likely that several such compilers will be available for most common systems, in particular Unix and MS-DOS, within a few months of the standard being approved.
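Two of the library facilities described above, variable parameter lists and the implementation-limit headers, lend themselves to a short sketch. The macro and header names used here (‘va_start’, ‘va_arg’, ‘va_end’, ‘INT_MAX’, ‘CHAR_BIT’, ‘FLT_MIN’) are those of standard C as later published; whether the February 1986 draft spells every one identically is an assumption, and the function names are hypothetical.

```c
#include <float.h>
#include <limits.h>
#include <stdarg.h>

/* Adds up 'count' further arguments, each of which must be an int. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Returns 1 if 'value' can be stored in an int on this implementation,
 * using the published limits instead of an assumed word length. */
int fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}

/* Checks the minimum guarantees mentioned in the text: at least 8 bits
 * in a char, at least 16 bits in an int, and a smallest positive
 * floating-point number greater than zero. */
int meets_minimum_limits(void)
{
    return CHAR_BIT >= 8 && INT_MAX >= 32767 && FLT_MIN > 0.0f;
}
```

A call such as ‘sum_ints(3, 1, 2, 3)’ passes the count first, because the called function has no other way of knowing how many arguments follow.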
Implications for the programmer
The standard is being designed to codify as far as possible existing C implementations. Hence, most existing code should work on standard-conforming compilers without modification. The converse is not true. Standard-conforming code may be rejected by most existing compilers; for example, function prototypes are understood only by a minority of current compilers.
The existing programs that do not carry over unchanged will for the most part be programs that make assumptions about the validity of certain forms of type-punning (e.g. that all pointer types share the same representation), and programs that make assumptions specific to particular environments (e.g. character set, word length, integer representation) or use extended or system-specific libraries. All of these constitute impediments to portability already, so that the problem of converting to conformity with the standard should be no worse than converting to any other C environment; in practice most vendors will no doubt continue to support their existing libraries and as far as possible to provide backward compatibility.
The greatest problem in practice is likely to be ensuring that new code is written to conform with the standard; it is almost inevitable that implementations will provide mutually incompatible extensions, and few programmers will take the trouble to restrict themselves to pure standard-conforming code, or indeed to find out what the standard is. Of course, there are occasions when such restriction is in any case inappropriate, such as in very low-level hardware-specific routines, but extensions will often be used for convenience and ease of development at the cost of portability.
The introduction of a standard will at least make it easier to write portable code when this is an important objective, and as such should be welcomed by most C programmers.
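One way a programmer might keep a single source file acceptable both to compilers that understand function prototypes and to those that do not is to hide the difference behind a macro. The following sketch is not taken from the draft: ‘USE_PROTOTYPES’ and ‘PROTO’ are hypothetical names, and the switch is assumed to be set by the programmer, since the draft forbids the compiler from predefining such a macro.

```c
/* USE_PROTOTYPES is a hypothetical switch set by the programmer (here,
 * or on the compiler command line); remove the definition when building
 * with a compiler that does not understand prototypes. */
#define USE_PROTOTYPES 1

#ifdef USE_PROTOTYPES
#define PROTO(args) args    /* keep the parameter types */
#else
#define PROTO(args) ()      /* say nothing about the parameters */
#endif

/* The declaration needs doubled parentheses so that the whole parameter
 * list is passed to PROTO as a single macro argument. */
int minimum PROTO((int a, int b));

#ifdef USE_PROTOTYPES
int minimum(int a, int b)
#else
int minimum(a, b)
int a, b;
#endif
{
    return a < b ? a : b;
}
```

The price is some clutter in every declaration and definition, which is perhaps one reason few programmers will take the trouble.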
References
1 Miller, C D F and Boyle, R D UNIX for Users Blackwell, Oxford, UK (1984)
2 Miller, C D F C for Software Blackwell, Oxford, UK (forthcoming)
3 Kernighan, B W and Ritchie, D M The C Programming Language Prentice-Hall, Englewood Cliffs, NJ, USA (1978)
4 Harbison, S P and Steele Jr, G L A C Reference Manual Prentice-Hall, Englewood Cliffs, NJ, USA (1984)
5 X3J11 Committee Draft Proposed ANSI Standard for Information Systems - Programming Language C American National Standards Committee, Washington, USA (February 1986)
Heriot-Watt University, Department of Computer Science, 79 Grassmarket, Edinburgh EH1 2HJ, UK.