Technical Note
A Method of Translating Binary Punched Hollerith Cards to Computer Data Files
1. INTRODUCTION
In the early 1950s, under the sponsorship of the Medical Research Council, Sir Dugald Baird and colleagues in Aberdeen initiated a maternity record system oriented towards clinical and epidemiological research. Data were extracted from case notes and transcribed to Cope-Chat cards. Subsequently the data were coded and punched on to Hollerith cards. This system has continued, with only slight modifications, until the present day. There is now a unique collection of research data which details the socio-demographic characteristics and obstetric performance of two generations of Aberdeen mothers. In recent years, however, the coding frame employed and the limitations of card sorting techniques have rendered the material unsuitable for many research projects. Consequently it was decided to investigate the possibility of constructing a computerised data bank to contain these data in a form more appropriate to current and future research needs and interests. This Note details the methods and techniques devised to transcribe some 80,000 binary punched Hollerith cards to magnetic tape files.
2. BINARY PUNCHED HOLLERITH CARDS
The central problem relating to the original Hollerith card files was the use of binary punching, otherwise known as multiple punching or overcoding. This practice is quite acceptable when using cards on sorting and collating equipment. Indeed it offers the possibility of concentrating a substantial amount of data on to one Hollerith card. However, such cards cannot be used as data for a computer program in the normal way. Computer data must be punched so that each card column represents only one alpha-numeric character (i.e. according to one of the standard code sets, of which EBCDIC is the most common). In conventional mode a computer card reader both reads and translates the punch combination on each card column, and raises an error condition when a non-standard combination is encountered. The Aberdeen data cards would provoke many such errors. Nevertheless, it has proved possible to use these cards by reading them in direct (binary) mode. This necessitates the use of a small ‘machine code’ program which transcribes the punch pattern, column by column, to an eighty-character record, which is then written to tape. (The writer is indebted to staff of the University of Aberdeen Computing Centre who provided such a program for his use.) This intermediate tape file contains ‘binary card images’.
Int. J. Bio-Medical Computing (6) (1975). © Applied Science Publishers Ltd, 1975. Printed in Great Britain.
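The difference between a conventionally punched column and a binary punched one can be illustrated with a short Python sketch. It is an illustration only: the Note does not describe how the machine code program stores a column on tape, so the 12-bit integer used here (top card row in the most significant bit) and the name `punched_rows` are assumptions. The row order 12, 11, 0, 1 ... 9 is the usual top-to-bottom order of rows on a Hollerith card.

```python
# Rows of a Hollerith card column, from the top edge to the bottom edge.
ROW_LABELS = ["12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def punched_rows(mask):
    """Return the labels of the rows punched in one column.

    `mask` is a hypothetical 12-bit integer, one bit per punch site,
    with the top row ("12") held in the most significant bit.
    """
    if not 0 <= mask < (1 << 12):
        raise ValueError("a card column has exactly 12 punch sites")
    return [
        label
        for position, label in enumerate(ROW_LABELS)
        if mask & (1 << (11 - position))
    ]


# A binary punched column may carry any combination of punches,
# because each site is an independent item of data.
print(punched_rows(0b100000000001))  # rows "12" and "9" punched
print(punched_rows(0))               # a blank column
```

Read in direct mode, every one of the 4096 possible patterns in a column is accepted as it stands, which is why no error condition arises.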
3. PROGRAM CHARACTERISTICS
The program developed to translate the ‘binary card images’ is designed to be of general application, i.e. it will accept any data format and incorporates a range of options for recoding and manipulating the data during the transcription process. The technique employed for transcribing the data is one of extracting each data element from pre-defined data fields, i.e. sets of punch sites (see Section 5 below), and subsequently reconstructing the data record in an appropriate format. Both economy and simplicity are achieved by employing a single binary punched ‘map card’ to define the location and extent of each data field to be scanned. This ‘map card’ is placed at the front of the deck of binary cards to be translated, and is therefore the first record in the ‘binary card image’ file. Parameters submitted to the program at run-time control the translation process. The facilities provided are as follows:
(i) Allocation of an integer ‘code’ to each punch site on the data cards.
(ii) Extraction of data from ‘split fields’. (See Section 5, below.)
(iii) Reconstruction of ‘numeric’ data of multiple digit format. (See Section 5, below.)
(iv) Allocation of an integer code to ‘blank’ data fields.
(v) Re-ordering of the sequence of data items.
(vi) Relegation of redundant data items.
(vii) Choice of output record formats.
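Facilities (iv) to (vii) amount to re-composing a list of extracted codes under the control of run-time parameters. The Python sketch below shows one way this could be done; it is not the author's Fortran program, and the function name `recompose`, its parameters and the fixed-width output layout are all assumptions made for illustration.

```python
def recompose(codes, keep_order, blank_code=0, width=2):
    """Build one output record from the codes extracted from a card.

    codes      -- one integer code per data field, or None for a blank field
    keep_order -- field indices in the desired output order; any field
                  not listed is dropped as redundant
    blank_code -- integer written in place of a blank field
    width      -- characters allotted to each item (hypothetical format)
    """
    items = []
    for index in keep_order:
        value = codes[index]
        items.append(blank_code if value is None else value)
    # Right-justify every item in a fixed-width slot.
    return "".join(f"{item:{width}d}" for item in items)


# Four fields were extracted; the second was blank.
extracted = [3, None, 7, 1]

# Output fields 2, 0 and 1 in that order, drop field 3, write 99 for blanks.
record = recompose(extracted, keep_order=[2, 0, 1], blank_code=99)
print(repr(record))
```

Holding the output order as a list of field indices covers re-ordering and relegation with a single parameter, since a field that is not listed is simply never written.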
4. TRANSLATION PROCESS
The eighty-character ‘binary card image’ is decoded and the punching pattern of the original card is reproduced within the computer in a 960-element matrix (i.e. 80 × 12 punch sites). Each data field, as defined on the ‘map card’, is then scanned. If a ‘punch’ is encountered within a field the corresponding numeric code value is allocated. An intermediate record is thus generated containing a numeric code or ‘blank’ for each data field. This record is then re-composed according to the control parameters submitted, and written to a tape file in the appropriate format.
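The decoding and scanning steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of the original program: each column of the card image is taken to be a 12-bit integer with the top row in the most significant bit, fields are given as hypothetical (start, length) pairs, and the code allocated to a punch is simply its position within the field, counting from 1.

```python
COLUMNS, ROWS = 80, 12


def decode_image(column_masks):
    """Expand 80 column masks into a flat list of 960 punch sites.

    Sites are ordered column by column, top row first, so that
    site index = column * 12 + row.
    """
    if len(column_masks) != COLUMNS:
        raise ValueError("a card image must hold 80 columns")
    sites = []
    for mask in column_masks:
        for row in range(ROWS):
            sites.append(bool(mask >> (ROWS - 1 - row) & 1))
    return sites


def scan_field(sites, start, length):
    """Return the code of the first punch in a field, or None if blank."""
    for offset in range(length):
        if sites[start + offset]:
            return offset + 1  # assumed coding: position within the field
    return None


def translate_card(column_masks, fields):
    """Produce the intermediate record: one code (or None) per field."""
    sites = decode_image(column_masks)
    return [scan_field(sites, start, length) for start, length in fields]


# One punch in the top row of column 0, two punches in column 1.
image = [0] * COLUMNS
image[0] = 0b100000000000
image[1] = 0b000000000101

fields = [(0, 1), (12, 12), (24, 12)]  # hypothetical field definitions
print(translate_card(image, fields))
```

In the second field only the first punch encountered contributes a code; the later punch in the same column is passed over.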
5. DEFINITION OF DATA FIELDS
The program is designed to interpret three types of data field:
(i) Binary Fields. Where a single punch site is used to denote the presence or absence of a characteristic, etc. (i.e. a punch/no-punch dichotomy).
(ii) Scalar Fields. Where a series of punch sites is used to code the several mutually exclusive points of a scale.
(iii) Nominal Fields. Where a series of punch sites is used to code a set of characteristics or attributes.
By definition a data field may contain only one ‘punch’. (The program scans a ‘data field’ until a ‘punch’ is located. Any subsequent ‘punches’ within that field are ignored.) Binary and scalar fields meet this requirement without question, but some nominal fields may have to be treated as a group of binary fields where multiple punching has been practised.
Again, by definition, a data field may only extend over a continuous sequence of punch sites. For the purposes of this program the Hollerith card is taken to be a sequence of 960 sites reading from left to right, column by column. A field may extend over more than one card column, as long as the site sequence is unbroken. Where the sequence is divided into more than one section, each of these is defined as a separate data field. The several sections form a ‘split field’ which is re-formed by the program after the relevant data have been extracted.
‘Numeric data’ is defined as one or more scalar fields, corresponding to the maximum number of digits encountered (e.g. numbers in the range 1-5000 are treated as four fields). After extraction these are re-grouped into a single data value.
The location and extent of all data fields to be scanned by the program are encoded on to a single ‘map card’. This is achieved by ‘punching’ the first site in each data field sequence. (Binary fields constitute a sequence of one site.) Unused sections of the card are similarly defined as dummy fields.
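The map card convention and the re-grouping of numeric data can be sketched in Python as below. The sketch rests on assumptions the Note does not state: each field is taken to run up to the site before the next map punch, the last field is taken to run to site 960, any sites before the first map punch are left unassigned, and the digits of a numeric item are taken to arrive most significant first with no blanks. The names `field_bounds` and `combine_digits` are hypothetical.

```python
TOTAL_SITES = 960  # 80 columns x 12 rows, read column by column


def field_bounds(map_sites):
    """Derive (start, length) pairs from the punches on a map card.

    `map_sites` is a list of 960 booleans. A punch marks the first
    site of a field; the field is assumed to end where the next begins.
    """
    if len(map_sites) != TOTAL_SITES:
        raise ValueError("a map card must describe 960 sites")
    starts = [index for index, punched in enumerate(map_sites) if punched]
    ends = starts[1:] + [TOTAL_SITES]
    return [(start, end - start) for start, end in zip(starts, ends)]


def combine_digits(digits):
    """Re-group the digits from several scalar fields into one value."""
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return value


# A map card marking a binary field at site 0, a ten-site scalar field
# at site 1, a binary field at site 11 and a dummy field from site 12.
map_card = [False] * TOTAL_SITES
for first_site in (0, 1, 11, 12):
    map_card[first_site] = True

print(field_bounds(map_card))
print(combine_digits([4, 0, 7, 2]))
```

Because only field starts are punched, dummy fields must be marked as well; otherwise an unused stretch of the card would be absorbed into the field that precedes it.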
6. TECHNICAL DETAILS
The program was developed on the Aberdeen University ICL System 4 computer, and comprises some four hundred Fortran statements. To date it has been used successfully to transcribe more than 150,000 binary cards, from a number of different data files. (The program has also been used by members of another department in the University Medical School.)
M. L. SAMPHIER
M.R.C. Medical Sociology Unit, Centre for Social Studies, Westburn Road, Aberdeen, AB9 2ZE, Scotland (Great Britain).