Carcinogenesis bioassay data system


COMPUTERS AND BIOMEDICAL RESEARCH 7, 230 (1974)

Carcinogenesis Bioassay Data System

MARY S. LINHART AND JOHN CARPER, National Cancer Institute*
ROBERT L. MARTIN, Division of Computer Research and Technology*
NORBERT PAGE AND JAMES PETERS, National Cancer Institute*

Received August 28, 1973

The Carcinogenesis Bioassay Data System (CBDS) provides for the collection, maintenance, and reporting of bioassay information. CBDS was developed for the Carcinogenesis Area of the Division of Cancer Cause and Prevention of the National Cancer Institute (NCI). System design and programming were provided by the Data Management Branch of the Division of Computer Research and Technology at NIH. Bioassay investigators provide input data to CBDS. Background data is collected about the chemicals and chemical preparations, the experimental environment, the animal colonies, and the animal groups. Throughout the experiments, observation data is gathered. A complete pathology report is submitted following the death of each individual animal. This data is essential when evaluating the carcinogenicity of the substances under test. Computer programs edit and maintain the data base and prepare reports. CBDS reports are used by NCI in contract administration, and both investigators and NCI use the output to assist in evaluating the bioassays.

I. INTRODUCTION TO CBDS

A. Background

1. National Cancer Institute

The Carcinogenesis Area of the Division of Cancer Cause and Prevention is responsible for research on the causes and prevention of cancer produced by chemical and physical agents. Five program approaches exist to achieve this goal: (1) identification and characterization of population groups which may have a higher level of risk than the general population for the development of a variety of specific cancers; (2) identification of carcinogenic activity of selected agents by bioassays; (3) development and selection of improved biological models for the characterization of carcinogenic processes, to improve the degree of correlation between animal studies and the human situation and to develop more effective bioassay systems; (4) identification, as target points for corrective measures in man, of the specific metabolic processes required for the expression of carcinogenic effects of selected agents; and (5) development, application, and evaluation of corrective measures for man and his environment.

* National Institutes of Health, Bethesda, Md., 20014.

Copyright © 1974 by Academic Press, Inc. All rights of reproduction in any form reserved. Printed in Great Britain.

Carcinogenesis research activity is conducted both intramurally, at the National Cancer Institute, and extramurally, by contractors. The contract-supported collaborative program was developed to respond to the need for bioassays of potential environmental carcinogens and for fundamental research programs on the mechanisms of carcinogenesis. The collaborative program is presently organized into nine Segments: Bioassay Operations, Biological Models, Biology and Immunology, Carcinogen Metabolism and Toxicology, Chemistry and Molecular Carcinogenesis, Colon Cancer, Information and Resources, Lung Cancer, and Tobacco Research. The Bioassay Operations and the Information and Resources Segments are administered primarily by the Carcinogen Bioassay and Program Resources Branch.

The Bioassay Operations program is responsible for the design and conduct of standardized bioassay tests to detect carcinogenic hazards of chemical and physical agents. This entails (1) identifying and selecting chemical and physical agents for bioassay; (2) acquiring, characterizing, and purifying these agents; (3) establishing logistical capabilities for testing; (4) identifying, developing, and selecting biological models for carcinogenesis bioassay, including improved animal models and short-term bioassay procedures; (5) identifying carcinogenic activity of selected agents by the appropriate bioassay tests; (6) monitoring testing performance; (7) developing a data bank to include results of bioassay testing and information on use and characteristics of the agents being tested; (8) analyzing and evaluating the test results; and (9) deciding on further action required on the tested agents.
The Information and Resources Segment furnishes support for the entire program and is responsible for the development of information resources, including a computerized system for the collection, maintenance, and analysis of bioassay data. In addition, this segment supplies chemical and animal resources.

2. Origins of CBDS

In 1968, members of the Carcinogen Bioassay and Program Resources Branch (then known as the Program and Data Analysis Unit) began examining the information requirements of the bioassay portion of the carcinogenesis program. Bioassay tests measure the effects of a substance on a living system; in this case, the primary concern is the carcinogenic effects. Data concerning the substance under test (and any other substances involved in the test), and the characteristics of the animals, their environment, development, disposition, and condition at the termination of the experiment, must be examined and correlated in order to evaluate the agent’s carcinogenic properties.

The Carcinogen Bioassay and Program Resources Branch found that existing procedures would be inadequate as the program expanded. Contractors were providing vast quantities of data. NCI’s program scientists were faced with thick stacks of forms and computer printouts in different formats. Even the types of information varied from contract to contract. Some investigators provided individual animal data and others did not; in the latter case, judgements had to be made on summary information that was not statistically adequate. It was obvious that contract administration would become increasingly difficult without standardized reporting formats for bioassay studies.

A computerized information system appeared to be the solution. Bioassay data would be uniform, centralized, and current. A variety of comparative analyses could be made, both within a contract and among contracts. Such analyses would be particularly helpful when several investigators tested a single agent or examined tumors which had been induced in a single strain under different environmental conditions. The contractors themselves, who would bear the burden of responsibility for data in such a system, would share in the benefits. Standard procedures would simplify their data collection process. Periodic reports would assist them in checking the progress of the research. Reports of completed studies would be useful when evaluating the bioassay’s findings.

B. System Design and Programming

Such considerations led to the development of the Carcinogenesis Bioassay Data System (CBDS). The Carcinogen Bioassay and Program Resources Branch assumed responsibility for system design and implementation and requested assistance from the Data Management Branch of the Division of Computer Research and Technology at NIH. In a series of meetings, the two groups discussed the fundamental information needs of carcinogenesis bioassays, designed (and redesigned) input and output formats, and formulated processing requirements. Data Management Branch personnel then turned to system design and eventually submitted a schema for approval. Further discussions ironed out processing details, and programmers began work on the Carcinogenesis Bioassay Data System. Initial efforts were directed to developing a file maintenance system and producing a single report to portray essential information about each bioassay. This was termed “Phase I”. Phase II, at that time, was a general term assigned to cover system development beyond the perimeter of Phase I; usually this meant other reports and further correlations and analyses of the data. (Phase I has been completed, and several reports have been developed for Phase II.)

C. Data Collection

1. General

Data collection is a very important phase of the system and the one most prone to human error. As in every system, CBDS can produce valid results only if the input is complete and correct. The Carcinogen Bioassay and Program Resources Branch and the contractor share the task of assuring the validity and accuracy of input.

2. Carcinogen Bioassay and Program Resources Branch

a. Input data. The Carcinogen Bioassay and Program Resources Branch maintains a directory of chemicals on test in the Bioassay Program. As new chemicals are selected for testing, they are assigned identifying numbers, and descriptive data about each chemical is entered into the System. Reports may be generated on the basis of chemical name or number, detailing information about studies using the requested chemical. Background information about each contract is also included in CBDS. This includes the contractor name, the identifying number, and a list of the chemical numbers assigned to the particular bioassay. The Chemical List is used to verify correct use of chemical numbers by researchers.

b. Assistance to contractors. The Carcinogen Bioassay and Program Resources Branch furnishes the investigator with input data forms. The forms are on self-carboning paper; the original is returned to NCI, and the copy is retained by the investigator. All researchers receive Environment, Batch Chemical, Preparation, Colony, Animal Group, Group or Cage Observation, and Individual Animal Data Record forms. Contractors also receive guidelines specifying requirements and restrictions on input data. Furthermore, although the input forms were intended to be self-explanatory, caution and experience have shown that it is the better part of wisdom to instruct bioassay personnel in correct input procedures. A training program has been established to train contract personnel in data collection techniques.

3. Contract holders

a. Input preparation. Investigators are responsible for the mass of input data, and although NCI provides guidelines, training, and advice, the researcher must establish and monitor data collection procedures at the test site.
This should be accomplished in a systematic and orderly fashion, with clear-cut areas of responsibility. Typically the contractor plans and organizes his research procedures, first in a general fashion and then in detail, delineating preparations, animals, and all other features of the study. Once this is established, perhaps even before the study actually begins, background information may be prepared for entry into the system. The investigator assigns identifying codes to the components of the experiment. These are used as record keys by CBDS. For each type of information, a unique identifier is designated. The keys may be any combination of letters and numbers. Thus a contractor may have Preparation 123 and Preparation ABC, Colony AA and Colony 1C.

b. Batch Chemical Data Records. Information about the chemicals used in the test is essential; the contractor is responsible for reporting information about the particular substances, the so-called Batch Chemical Data. This includes information about the manufacturer, whether there is a reserve supply, purity, the critical temperature range, and any special treatment of the substance by the contractor.

c. Preparation Data Records. Investigators submit Preparation Data Record forms describing the preparations they will administer to the animals. This information includes the chemicals in each preparation, the concentration of each chemical, information about methods of preparation and storage, stability, and, when appropriate, pH and particle size.

d. Environmental Data Records. In the initial phases of the assay, the researcher establishes the experimental conditions. These are reported as Environmental Data describing the feed, bedding, caging, lighting, water, and room air conditions. The living conditions and diet of the animals are relevant to the outcome of the experiment, and this data should be available to resolve questions which might arise.

e. Colony Data Records. The bulk of the data collected relates directly to the animals. Colony Data details the animal species, breeding background, and the source of the animals, i.e., whether bred by the investigator or supplied by an outside source. The contractor also reports on this form the disease control procedures he will apply to the colony.

f. Animal Group Data Records. Colonies are divided into animal groups. The animal group is the fundamental unit of study, and individual animals are normally considered in terms of their group membership. The size of a group is determined by the investigator and could range from a single animal to several hundred. Group members are from the same colony; they inhabit the same environment and receive the same treatment. The investigator reports a group’s size, sex, birth rate, any special handling, and the method by which the group was selected. Animal Groups may receive one or more preparations during an experiment. All members of a group are given identical preparations by the same method.
As part of Animal Group Data, the investigator reports the Treatment Regimen. This includes the amount and method of administration for each preparation, the age of the animals when they receive the first dose, the frequency of administration, and the duration of treatment. The role of a particular group is determined by the preparations it receives. Test groups receive the substance or agent under investigation. If the agent under test is administered in conjunction with another substance (the vehicle), this secondary substance alone is given to the vehicle control group. Thus, if test groups develop tumors, the tumor incidence in the vehicle control group indicates whether the tumors may reasonably be attributed to the substance under test or to the vehicle. Other groups may be given known carcinogens to verify that the animals are susceptible to tumor induction; these are known as positive control groups. Untreated control groups receive no special treatment. The effects of the experimental environment on the untreated groups give an indication of the animals’ natural propensity for developing tumors of specific types, which must be discounted when evaluating the effects of the agent on the test group.
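The rule that a group’s role follows from the preparations it receives can be expressed as a small classification function. This is an illustrative sketch only: in CBDS the investigator identifies the test compounds and the associated control groups on the Animal Group Data Record, and the set-based representation of preparations, the substance identifiers, and the name `classify_group` are all hypothetical.

```python
from enum import Enum


class Role(Enum):
    TEST = "test"
    VEHICLE_CONTROL = "vehicle control"
    POSITIVE_CONTROL = "positive control"
    UNTREATED_CONTROL = "untreated control"


def classify_group(preparations, agents_under_test, vehicles, known_carcinogens):
    """Infer an Animal Group's role from the substances in its preparations.

    preparations: list of sets of substance identifiers, one set per
    preparation the group receives (a hypothetical representation).
    """
    substances = set().union(*preparations)
    if not substances:
        # No special treatment at all.
        return Role.UNTREATED_CONTROL
    if substances & agents_under_test:
        # Any exposure to the agent under investigation makes it a test group.
        return Role.TEST
    if substances & known_carcinogens:
        # A known carcinogen checks that the animals can develop tumors.
        return Role.POSITIVE_CONTROL
    if substances <= vehicles:
        # The vehicle alone, without the agent under test.
        return Role.VEHICLE_CONTROL
    raise ValueError(f"cannot classify group receiving {sorted(substances)}")


# Hypothetical identifiers: C-101 is the agent under test, CORN-OIL its
# vehicle, and POS-1 a known carcinogen.
AGENTS, VEHICLES, POSITIVES = {"C-101"}, {"CORN-OIL"}, {"POS-1"}
print(classify_group([{"C-101", "CORN-OIL"}], AGENTS, VEHICLES, POSITIVES))
print(classify_group([{"CORN-OIL"}], AGENTS, VEHICLES, POSITIVES))
print(classify_group([], AGENTS, VEHICLES, POSITIVES))
```

Checking for the agent under test before the known carcinogens means a group receiving both would be classed as a test group; that ordering is a choice made for the sketch, not something the text specifies.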


A carcinogenesis bioassay usually includes several preparations and many animal groups. The suspected carcinogen may be tested several times with controlled variations in experimental conditions. The researcher may wish to vary the amounts of the test agent, test by different routes of administration, or test the agent on both sexes. Within the scope of a bioassay, the contractor normally sets up a number of tests in which specified animal groups are related, or “linked”, by their roles in the test. Each experimental relationship usually has one or more test, vehicle, positive, and untreated groups. Linked groups are as nearly identical as possible (e.g., same age, same sex, same genetic background) and are treated as identically as possible (in the same environment, with the same method of administration). To evaluate the bioassay, the effects of the test on linked Animal Groups should be examined together. For groups which function as test groups, the investigator reports the major compounds in the treatment, that is, the substance(s) under investigation, and identifies the control groups treated in conjunction with the test group. This information about the experimental relationship of animal groups is fundamental to the structuring of CBDS reports, since results for a single group are meaningless without the data about the related groups in a particular experiment. The researcher also discloses his plans for the routine examination of organs and tissues at the end of the study.

g. Observation Data Records. At the onset of a bioassay, a contractor provides CBDS with background information about batch chemicals, preparations, environments, colonies, and animal groups. As the study proceeds, the investigator reports his periodic observations of the animal groups. This data is reported by cage or, in feeding studies, by group.
Observation Data includes a count of survivors to date, the total weight of the animals, counts of observable tumors, and, for Group Observations, a measure of food consumption.

h. Individual Animal Data Records. A substance, by definition, is carcinogenic if it is capable of inducing cancer. Therefore, information about tumors in animals is essential. The investigator cannot completely determine tumor development until he performs both a gross and a histopathologic examination of an animal. This is reported on an Individual Animal Data Record. On this Record, the contractor reports when the animal died and whether the death was natural or otherwise (at the termination of some bioassays, survivors are sacrificed). The investigator also reports whether the animal underwent a necropsy and, if so, the general condition of the animal at that time. Finally, a detailed report of the animal’s necropsy is required, itemizing all physical abnormalities, including tumors. This record is central to the entire system, since data about the histopathology examination is essential to the evaluation of the carcinogenic effects of the agent under test.

i. Summary. Obviously, bioassay investigators are responsible for providing a vast amount of input. In large studies, Individual Animal Data alone could require submission of thousands and thousands of items of information to the Carcinogenesis Bioassay Data System. It should therefore be evident that a computerized system is the best method for storing and correlating this data. Besides reporting the basic items of information considered fundamental to carcinogenesis bioassays, the researcher is free to report other information he considers pertinent. This data may be reported, ad lib, in an unstructured fashion. CBDS provides for storage of such comments on microfilm. At present, no computer correlation or analysis of the comments is envisioned.

D. Preprocessing of Data

The investigator forwards completed forms to the Carcinogen Bioassay and Program Resources Branch to be processed for entry into CBDS. Input forms are checked for omissions and errors before being submitted for computer processing. Although the computer programs check for errors, it has been found that a visual scan of the forms by trained personnel results in substantial time savings: if an error is caught before entry into the data base, no correction need be entered (and the correction itself, like all input, is a potential error). Batch Chemical and Individual Animal Data Records receive special handling before entering the computerized phase of the system. The U.S. Tariff Commission’s code for the manufacturer of the chemical is inserted in each Batch Chemical Record. Before Individual Animal Data Records are transcribed, diagnoses of tumors and other animal abnormalities are coded using SNOP codes. The coding system is described in The College of American Pathologists, Committee on Nomenclature and Classification of Diseases, “Systematized Nomenclature of Pathology,” Chicago: College of American Pathologists, 1965. One code is entered for the topography, or site, of the lesion, and another for the diagnosis. After coding and prechecking, the forms are ready to be translated into machine-readable format. Keypunching is used, since it is the most economical method of input for the large volume of data. As a further backup, the forms are microfilmed and stored on fiche. Each fiche contains a copy of the forms relating to a particular group. All forms are included except those Observation Data Records which do not bear comments. A fiche is consulted when there is any question about the data in the CBDS files, or when an individual interpreting a completed experiment wishes to look at comments entered on the forms or at the descriptions of lesions recorded by examining pathologists.
Data entry, microfilming, and the day-to-day operation of CBDS, including submission of jobs and error corrections, are presently carried out under contractual arrangement with a local firm. After conversion to computer format, an Input Procedure stores data as 80-character card images on a system input tape and concurrently on a history tape. Data may be continuously added to these tapes until it is time to run the CBDS Update Procedure.
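Because everything enters CBDS as 80-character card images, each record can be read back by fixed column positions. The text does not give the CBDS card layout, so the column boundaries, the two-letter record-type code, the contract number, and the names `CardImage` and `parse_card` below are hypothetical; the sketch only shows how a key (contract, record type, identifier) and a card code might be sliced out of one card image. Identifiers are kept as strings because CBDS keys may mix letters and digits.

```python
from dataclasses import dataclass

CARD_WIDTH = 80


@dataclass(frozen=True)
class CardImage:
    contract: str     # contract number (hypothetical position)
    record_type: str  # e.g. "EN" for Environment (hypothetical code)
    record_id: str    # investigator-assigned key such as "123" or "ABC"
    card_code: str    # sequence of this card within a multi-card form
    data: str         # remaining columns, interpreted per record type


def parse_card(line: str) -> CardImage:
    """Split one card image into fixed-width fields (hypothetical layout)."""
    line = line.rstrip("\n")
    if len(line) > CARD_WIDTH:
        raise ValueError(f"card image longer than {CARD_WIDTH} columns")
    line = line.ljust(CARD_WIDTH)  # restore trailing blanks lost in storage
    return CardImage(
        contract=line[0:8].strip(),      # columns 1-8
        record_type=line[8:10].strip(),  # columns 9-10
        record_id=line[10:16].strip(),   # columns 11-16
        card_code=line[16:18].strip(),   # columns 17-18
        data=line[18:80].rstrip(),       # columns 19-80
    )


card = parse_card("C-000123" + "EN" + "ABC".ljust(6) + "01" + "PELLETED FEED")
print(card.record_id, card.card_code, card.data)
```

Padding short lines back to 80 columns before slicing keeps the field positions stable even when trailing blanks were not stored.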


E. CBDS Update Procedure

1. General

When sufficient input data is on tape, the Update Procedure of CBDS is run. The Update normally runs on an IBM computer at the Division of Computer Research and Technology at NIH. Actual computer time for the Update Procedure on a 370/165 for 2000 transactions has been approximately 44 seconds. This may be expected to increase with the expansion of the Master Files. The CBDS Update Procedure performs two major tasks: editing and file maintenance. In editing, the computer programs check for errors and inconsistencies in data. File maintenance procedures provide for insertion, deletion, and modification of the information in the CBDS Master Files. The Update Procedure also includes a Checkpoint Feature which monitors the job.

The CBDS Update Procedure receives input from three sources. Records are entered for new chemicals not previously involved in carcinogenesis studies. Corrections for records which failed initial edit checks in the previous run of the system make up a second input file. The bulk of input is on the third file. This contains transactions derived from the Data Record forms submitted by bioassay investigators, which will result in new entries in the system’s files, together with the modifications, deletions, or insertions necessary to update existing file data.

2. Chemical Input

The CBDS Update task begins with the Chemical Input job steps. If new chemicals have been added to the off-line Chemical File, the Chemical Input Program builds transactions to be inserted in the System Master File.

3. Edit Pre-Sort

The first major step in CBDS update processing is the Edit Pre-Sort, which sorts all input. The primary sort is on the identifying keys. A secondary sort is made on card code so that, within a particular key, all transactions will be sequential; e.g., the six card images from an Environmental Data Record form will be in order. A further sort is made on the observation date of Group and Cage Observations so they may be entered sequentially on the Support Master File. The final sort arranges deletions, insertions, and modifications in proper sequence. In conjunction with sorting, the Edit Pre-Sort verifies Contract Numbers on bioassay Data Records and checks references to NCI chemical numbers to be sure the numbers were assigned to the particular contract.

4. Edit-Router

The Edit-Router Program is the next major job step. This is the largest program in the Update Procedure. Each transaction is checked for errors. There are several general categories of edit checks. Keys are checked to determine the type of record. Duplicate records are prohibited. Format checks made by a 360 Assembly Language subroutine guarantee that numeric and alphabetic fields contain appropriate data. Required items must not be blank, and range and legitimacy checks ensure acceptable codes. Dates are checked for legitimate values. Items which are related are verified for compatibility.

The Edit-Router program also reformats some of the input. Colony, Animal Group, and Preparation Data Record forms are modified for use by the subsequent Update programs. The first card image of a set is coded with an “I” to indicate an insertion, and subsequent card images with “M”s to denote modification of the record which is created. In addition, the location of the new data in the master records is inserted in the transactions.

The Edit-Router creates seven output data sets. Four of these contain the valid Update transactions for the Animal Group, Preparation-Chemical-Tumor, Link, and Support Updates. A fifth output file is a History File with replicas of all edited transactions. The sixth output file is an Error Report describing the items in error and the type of mistake, and printing the incorrect records. The rejected records themselves are placed on the seventh, the Error File.

Error correction is a shared responsibility of the Carcinogen Bioassay and Program Resources Branch and the bioassay contractor. The Error Report from the Edit-Router is examined to determine the nature of the problem. Some errors are relatively simple to correct; these may generally be classified as typographical errors, where reference to the original Data Record indicates the correct information. The corrective process is more difficult when the Data Record form itself was in error. In those cases, the investigator must be contacted. The error correction problems are compounded by the fact that researchers are located all over the United States, and discussions of problems are frequently carried out over long distances. Resolving a single error may involve a series of telephone calls over a period of days.
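The ordering imposed by the Edit Pre-Sort and the routing performed by the Edit-Router can be sketched together. This is a simplified illustration, not the CBDS programs: the record-type codes, the item schema, the MMDDYY date format, the D, I, M order used for the final sort pass (the text says only that deletions, insertions, and modifications are put “in proper sequence”), and all function names are assumptions. Link transactions, which the text says are derived from test-group Animal Group records, are not generated here, so the link output stays empty.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional

TXN_ORDER = {"D": 0, "I": 1, "M": 2}  # assumed order for the final sort pass
# Hypothetical record-type codes mapped to the four update transaction files.
ROUTES = {"CO": "animal_group", "AG": "animal_group", "PR": "pct",
          "BC": "support", "EN": "support", "OB": "support", "IA": "support"}
# Hypothetical per-record-type schema: item name -> (kind, required).
SCHEMA = {"IA": {"animal_num": ("numeric", True), "death_date": ("date", True)}}


@dataclass
class Txn:
    contract: str
    record_type: str
    record_id: str
    card_code: str
    txn_type: str                    # "D", "I", or "M"
    data: dict = field(default_factory=dict)
    obs_date: Optional[date] = None  # observation records only
    chemicals: tuple = ()            # NCI chemical numbers referenced


def presort(txns, chem_assignments):
    """Verify contracts and chemical numbers, then sort the survivors."""
    errors = []
    for t in txns:
        assigned = chem_assignments.get(t.contract)
        if assigned is None:
            errors.append((t, f"unknown contract {t.contract}"))
            continue
        for chem in t.chemicals:
            if chem not in assigned:
                errors.append((t, f"chemical {chem} not assigned to {t.contract}"))
    rejected = {id(t) for t, _ in errors}
    kept = [t for t in txns if id(t) not in rejected]
    # Key, then card code, then observation date, then transaction type.
    kept.sort(key=lambda t: ((t.contract, t.record_type, t.record_id), t.card_code,
                             t.obs_date or date.min, TXN_ORDER.get(t.txn_type, 99)))
    return kept, errors


def valid_date(value):
    if len(value) != 6 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%m%d%y")  # rejects e.g. 022973 (Feb. 29, 1973)
    except ValueError:
        return False
    return True


def item_errors(t):
    problems = []
    for name, (kind, required) in SCHEMA.get(t.record_type, {}).items():
        value = t.data.get(name, "").strip()
        if not value:
            # Assumption: required items are enforced on insertions only.
            if required and t.txn_type == "I":
                problems.append(f"{name}: required item is blank")
        elif kind == "numeric" and not value.isdigit():
            problems.append(f"{name}: not numeric")
        elif kind == "date" and not valid_date(value):
            problems.append(f"{name}: illegal date {value}")
    return problems


def edit_router(txns):
    """Return the seven outputs: four update files, history, report, error file."""
    updates = {"animal_group": [], "pct": [], "link": [], "support": []}
    history, error_report, error_file, seen = [], [], [], set()
    for t in txns:
        history.append(t)
        problems = item_errors(t)
        if t.record_type not in ROUTES:
            problems.append(f"unknown record type {t.record_type}")
        ident = (t.contract, t.record_type, t.record_id, t.card_code,
                 t.obs_date, t.txn_type)
        if ident in seen:
            problems.append("duplicate record")
        seen.add(ident)
        if problems:
            error_report.append((t, problems))
            error_file.append(t)
        else:
            updates[ROUTES[t.record_type]].append(t)
    return updates, history, error_report, error_file
```

Sorting on a single composite tuple reproduces the primary, secondary, and further sorts in one pass, since Python compares tuples element by element.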
Corrections are submitted as input in the next run of the CBDS Update Procedure. They again go through the edit checks and, if acceptable, are passed on to the appropriate Update.

5. PCT Update

Information in the Carcinogenesis Bioassay Data System is maintained on four Master Files. The Preparation-Chemical-Tumor File is a sequential tape file containing Preparation Records created from Preparation Data Record forms completed by the bioassay investigators. These are arranged in order by Contract and by Preparation Number. Chemical Records contain information about chemicals involved in carcinogenesis investigations; they are generated using data from the Chemical File. A “Tumor Record” is on file for each SNOP code and contains the verbal equivalent of the code. (“Tumor Record” is actually a misnomer, since such records exist for all SNOP terms, not just tumors. The name was assigned when the file was intended to carry only tumor records and has become so widely used that it is difficult to change now.)


The Preparation-Chemical-Tumor Update Program prevents invalid data from entering the file. Error messages must be checked by the user, who corrects the errors by sending in modifications in a subsequent run.

6. Animal Group Update

The Animal Group File contains three record types: Contract, Colony, and Animal Group. Records are sequenced by Contract, Colony, and Animal Group numbers. Contract Records contain the name of the contractor. They are introduced into the system by Carcinogen Bioassay and Program Resources Branch personnel. Colony and Animal Group Records contain the information from the associated Data Record forms completed by the bioassay investigator. Animal Group Records also contain a Tumor Table listing the tumors occurring most frequently in animals of that group.

The CBDS Animal Group Update makes its own edit checks, because particular items in a Master File record can be modified with Update Transactions, and these changes might introduce errors. For example, a bioassay investigator may decide during the course of an investigation to add a Treatment Regimen for an Animal Group. The investigator should then notify NCI, and the individual responsible for input preparation would prepare a modification designating the area in the Animal Group Record to be changed and the new data. The Edit-Router does not edit such modifications; instead, the altered Animal Group Master Record is edited by the Update. Invalid information is rejected, and error messages report any problems.

7. Standard Update

Records on the Animal Group and the Preparation-Chemical-Tumor (PCT) Files are maintained with the Standard Update Program developed by the Data Management Branch of the Division of Computer Research and Technology. The Standard Update is a 360 Assembly Language program which updates a master file with deletions, insertions, and modifications. The Standard Update makes basic edit checks on the transactions. Errors exist when deletions and modifications find no matching records, and when insertions are entered for record keys already on file. Transactions with invalid data are rejected, as are transactions without a legitimate transaction type. The Standard Update performs the basic functions of any update job and has been adapted to fit the file maintenance needs of many systems. It is easily comprehended by nontechnical users, who find transactions simple to prepare. Furthermore, it allows users to incorporate their own program logic to perform special processing. The Standard Update will execute user subroutines after each transaction is processed and before an updated master record is written to the output file. The user is provided with the old and new master records and may make edit checks on the new record, rejecting invalid data. The user routine should include messages to report errors and any corrective actions.
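The Standard Update’s logic, a sequential match of a sorted transaction file against a sorted master file with a user exit, can be sketched as follows. This is a hypothetical Python rendering, not the 360 Assembly Language program: records are dictionaries with a `key` item, transactions are `(type, key, data)` tuples already sorted by key, and `user_exit(old, new)` stands in for the user subroutine, returning a list of error messages (a non-empty list rejects the transaction). The same hook is where an Animal Group Update could re-edit a record after a modification.

```python
from itertools import groupby


def standard_update(master, txns, user_exit=None):
    """Apply sorted D/I/M transactions to a master file sorted by key.

    Returns (new_master, errors). Unmatched deletions and modifications,
    insertions for keys already on file, and illegal transaction types are
    rejected, as are transactions that the user exit objects to.
    """
    out, errors = [], []
    records = iter(master)
    current = next(records, None)
    for key, group in groupby(txns, key=lambda t: t[1]):
        # Copy master records that have no transactions straight to the output.
        while current is not None and current["key"] < key:
            out.append(current)
            current = next(records, None)
        rec = None
        if current is not None and current["key"] == key:
            rec = dict(current)  # work on a copy of the matching master record
            current = next(records, None)
        for txn_type, _, data in group:
            if txn_type == "D":
                if rec is None:
                    errors.append(f"{key}: deletion finds no matching record")
                else:
                    rec = None
                continue
            if txn_type == "I":
                if rec is not None:
                    errors.append(f"{key}: insertion for a key already on file")
                    continue
                candidate = {"key": key, **data}
            elif txn_type == "M":
                if rec is None:
                    errors.append(f"{key}: modification finds no matching record")
                    continue
                candidate = {**rec, **data}
            else:
                errors.append(f"{key}: illegal transaction type {txn_type!r}")
                continue
            # The user subroutine sees the old and new record before it is kept.
            problems = user_exit(rec, candidate) if user_exit else []
            if problems:
                errors.extend(f"{key}: {p}" for p in problems)
            else:
                rec = candidate
        if rec is not None:
            out.append(rec)
    # Copy any remaining master records.
    while current is not None:
        out.append(current)
        current = next(records, None)
    return out, errors
```

Because both inputs are in key order, one pass suffices: unaffected records are copied unchanged, and the output becomes the input master for the next run, as with the tape-based Master Files described in the Checkpoint discussion.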


8. Link Update

The records on the Animal Group and PCT Files were easily derived from input data and have relatively simple data formats. The Link File is more complicated. It is an indexed sequential (ISAM) file on disk. Link Master Records exist for each experimental relationship established by an investigator for testing a particular agent or group of agents. The Link Master Record is essentially a control record containing pointers to the keys of the vehicle, positive, and untreated animal groups which are to be considered together with a particular test group. The Link File also has other control records: Agent Controls, Contract Controls, and Link Controls. Agent Control Records contain the contract numbers of all bioassays testing a particular substance. When these are used with Link Master Records, reports may be generated for specified agents. Conversely, using Contract Control Records, reports may be generated for designated contracts. Using Link Control Records, selection criteria may be even more specific and request data for a particular agent within a particular contract. The Link Update is a COBOL program and includes the normal update logic for deletions, insertions, and modifications. The input to the Link Update is a Link Transaction file and the Link Master. Most Link Transactions are generated by the Edit-Router from the Animal Group Data Record when a group’s role is “test”. A transaction contains contract and agent numbers, the animal groups involved in the experimental relationship, and the proposed necropsy plan. This type of record becomes a Link Master Record. It also may cause the Link Update to generate Agent, Contract, and Link Controls if none exist for the particular numbers. If the associated controls have already been created for the Contract-Agent set, they are updated. User-initiated deletions and modifications can be used on Link Master Records.
They may not be used on the Control Records, which are generated and maintained by the computer. A unique transaction changes the agent of a Link Master. Since the agent number is part of the key, such a change actually triggers program routines to simulate deletion of records for the old agent and insertion of records for the new agent.

9. Support Update

The Support File is a sequential tape file containing Batch Chemical, Environment, Cage and Group Observation, and Individual Animal Data Records. These are 80-character card images, exactly the same records created from the associated Data Record forms. Records on the Support File are sequenced by key and card code, and Observation Records are further sorted on observation date. The Support Update checks for valid data and provides for deletion and replacement of records on the Support File.

In processing Individual Animal Records, the Support Update also modifies associated Animal Group Records on the Animal Group File. To accomplish this, the Animal Group File is converted to an ISAM file so that affected Group Records may be accessed without processing the entire file. Upon receipt of an Individual Animal Record, the associated Animal Group Master Record is retrieved, and Number Survived in the Animal Group Record is reduced by one. Number Necropsied and Number of Tumor Bearing Animals may be increased. An Animal Group Master Record also contains a summarized record of tumors which have affected individuals within the group. If an Individual Animal Record shows tumor pathology, the Support Update inserts the SNOP code for the tumor in the Tumor Table, and the number of tumors and the number of animals having that tumor are increased. If other animals have the same tumor, the related count fields are again increased. The Support Update also makes adjustments if, for some reason (e.g., error correction), the data for an individual animal is changed. After the Support Update is finished, the Animal Group Master File is restored to a sequential file.

10. Checkpoint Feature

The Update Procedure involves so many programs and files that the run procedure itself is extremely complex. To minimize the degree of systems understanding required, another series of programs allows nontechnical users to initiate update processing. This is the Checkpoint Feature of the Update. Checkpoint programs maintain an indicator that is changed after each step in the Update task. If a failure occurs in a particular job step, the Checkpoint Indicator is used when the Update is rerun to prevent reprocessing of the steps that succeeded. (Naturally, the cause of the failure must be corrected before the rerun.) The Checkpoint Feature also improves system efficiency. A Checkpoint program determines whether there is to be chemical input; if not, the job steps which prepare that input are bypassed. In addition, the Checkpoint Feature bypasses updating any Master File for which there is no input. The Checkpoint Feature thus allows the Update Procedure to operate with a minimum of human intervention, reducing the possibility of errors. A set of control cards in IBM Job Control Language is used to run the CBDS Update on the computer. Instructions on these cards request the CBDS Update and Checkpoint programs and describe the files involved in processing. File descriptions may include the identifying number of the tape or disk. For the Master Files, these numbers in the Job Control Language must be changed after each processing operation: the current output master becomes the input master for the next update, and a new tape reel becomes the output master. The Checkpoint Feature automatically adjusts the Master File tape numbers in the Job Control Language after successful completion of the Update. Checkpoint also reorganizes the ISAM Link File when necessary.

F. File Compatibility Procedure

Information in the Carcinogenesis Bioassay Data System is further protected by another series of programs, the File Compatibility Procedure. This set of programs is run quarterly to ensure compatibility of the four Master Files. Cross-checks are made to verify that the data in CBDS is compatible for each bioassay. These cross-checks cannot be made at input time, since contractors may submit data at any time. In fact, gaps will certainly exist between the entry of Animal Group Observations and Individual Animal Data Records. The File Compatibility Procedure prints informative messages when discrepancies exist between files. The Link File is checked to ensure all records are compatible: records must exist on the Animal Group File for the Animal Groups mentioned in the Link Master Records. Preparations and SNOP codes referred to in Animal Group Records must be on the PCT File. Animal Groups must have related Environment, Cage or Group Observation, and Individual Animal Records on the Support File. Preparation Records must have related Chemical Records on the PCT File and Batch Chemical Records on the Support File. Conversely, errors exist if there is no matching Animal Group Record for Support File records, or if there is no Preparation Record using a particular Batch Chemical. In cross-checking Individual Animal and Animal Group Records, the File Compatibility programs compare the Tumor Table in the Animal Group Record with the tumors recorded on the Individual Animal Records. The File Compatibility System does not correct errors; NCI personnel must examine the Error Report and submit the necessary deletions, modifications, and insertions.

G. NCI Carcinogenesis Bioassay Animal Group Report

Data collection and file maintenance are necessary preliminaries to reporting. Four reports currently exist. All four are of general interest to both NCI and investigators; they provide a fairly complete picture of the information in the system, although they do not begin to exhaust the possibilities of data analysis and correlation. The NCI Carcinogenesis Bioassay Animal Group Report is the basic system report, containing the most significant information about bioassays. A Select Program permits reports to be generated for the entire file, for specified contracts, for specified agents, or for particular contract-agent combinations. The Link File is the key to bioassay reporting. Once selection criteria are established, Link File records are used to determine all animal groups in the experimental relationships which meet those criteria. The Select Program prepares input for a Print Program for each animal group in the selected experimental relationships. The Print Records contain animal group background information and the Tumor Table from the Animal Group File. Preparation and chemical data and the English equivalents of the SNOP codes are selected from the PCT File and placed in the Print Record. Selected Animal Group Print Records are then sorted. Normally the major sort key is Contract Number, but if a report is desired by agent, Agent Number may be the major sort key. After sorting by Contract and Agent (or Agent and Contract), the records are sequenced by Link Number, which places related animal groups together. The next sort is on role, so that test, vehicle, positive, and untreated groups are printed in that order. The Print Program prints the sorted Print Records. It prints preparation information by age of the animals at first dose, and ranks the various tumors the group has suffered in order of frequency.

H. Phase II Reports

Two statistical reports currently exist in CBDS. Cage and Group Observations serve as basic input for a Weight Curve and a Survival Curve. Experimentally related animal groups are presented together. The records are sorted on Observation Date so that both weight and survival data are displayed as a function of time. The fourth CBDS report is the Individual Animal Pathology Report. This report is a complete presentation of the pathology data for the animals within a group. It includes both tumor and nontumor diagnoses.
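The ordering rules given for the Animal Group Report (Contract and Agent, then Link Number, then role, with tumors ranked by frequency) and for the Phase II curves (Observation Date) can be sketched in Python as below. The record layouts, field names, and placeholder tumor codes are hypothetical; only the sort sequence and the role order test, vehicle, positive, untreated come from the text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple

# Role order for printing, as described for the Animal Group Report.
ROLE_ORDER = {"test": 0, "vehicle": 1, "positive": 2, "untreated": 3}


@dataclass
class GroupPrintRecord:  # hypothetical layout
    contract: str
    agent: str
    link: str
    role: str
    group: str
    tumors: List[str] = field(default_factory=list)  # one placeholder code per tumor


@dataclass
class Observation:  # hypothetical Cage/Group Observation layout
    group: str
    date: str  # ISO "YYYY-MM-DD", so string order equals date order
    survivors: int
    mean_weight: float


def report_order(records: List[GroupPrintRecord],
                 by_agent: bool = False) -> List[GroupPrintRecord]:
    """Major key Contract-Agent (or Agent-Contract), then Link Number, then role."""
    def key(r: GroupPrintRecord) -> Tuple:
        major = (r.agent, r.contract) if by_agent else (r.contract, r.agent)
        return major + (r.link, ROLE_ORDER[r.role])
    return sorted(records, key=key)


def ranked_tumors(rec: GroupPrintRecord) -> List[Tuple[str, int]]:
    """Tumors in one group, most frequent first."""
    return Counter(rec.tumors).most_common()


def curve(observations: List[Observation], group: str) -> List[Tuple[str, int, float]]:
    """(date, survivors, mean weight) points for one group, in observation-date order."""
    points = sorted((o for o in observations if o.group == group), key=lambda o: o.date)
    return [(o.date, o.survivors, o.mean_weight) for o in points]


recs = [
    GroupPrintRecord("C01", "A100", "L1", "untreated", "G3"),
    GroupPrintRecord("C01", "A100", "L1", "test", "G1", ["tumor-A", "tumor-B", "tumor-A"]),
    GroupPrintRecord("C01", "A100", "L1", "vehicle", "G2"),
]
print([r.group for r in report_order(recs)])  # ['G1', 'G2', 'G3']
print(ranked_tumors(recs[1]))                  # [('tumor-A', 2), ('tumor-B', 1)]
```

Because Python's sort is stable and compares tuples element by element, a single composite key reproduces the successive sorts the text describes in one pass.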

II. COST AND PERSONNEL

It is impossible to determine the exact cost of CBDS, and even an approximate figure is difficult to obtain. The main problem is establishing the cost of the NCI personnel who participated in the project; no records have been kept of the number of hours devoted to the effort by specific individuals. Furthermore, no clear-cut distinction can be made between developmental costs and operating costs. The Update Procedure was first run with data from a bioassay contractor in December 1971. As a result of that job, many changes were made in programming logic, and the same holds true for other runs in the first six months of actual operation. System maintenance is, of course, a permanent fixture of a system; currently one programmer is responsible for correcting system problems and making modifications to the existing programs. Nevertheless, some facts and figures are known. CBDS is now one of the responsibilities of the Carcinogen Bioassay and Program Resources Branch. The nucleus of this group was the Program and Data Analysis Unit, which originally had 2 staff members. Currently there are 9 members of the group under the direction of Dr. Norbert Page. They are Dr. Thomas Cameron, Ursula Evans, Terry Kuch, Mary Linhart, Gretrude Merl, Sharon O’Connor, Dr. Sidney Siegel, Patricia Steinour, and Mary Stewart. The Data Management Branch of the Division of Computer Research and Technology furnished more than 4 man-years of personnel time to the analysis, design, and programming of CBDS. The total charge as of June 1972 was $120,619. Nine members of the Data Management Branch staff contributed some part of their professional effort to the development of CBDS. Computer and computer-related costs as of June 1972 amounted to more than $55,000. This figure includes test and operational costs (e.g., computer time, terminal support, and storage space) at the Division of Computer Research and Technology. The volume and complexity of CBDS data make running costs fairly high, with data for about 19 contractors on file. As of April 1973, typical costs were as follows:

Procedure             CPU Time (seconds)      Cost
Input                                2.7     $2.95
Chemical List                        2.4     $3.11
Update                              43.5    $59.43
Animal Group Report                 69.5   $102.00
File Compatibility                  18.6    $24.24
Phase II Reports                   120.0   $234.00

Notes: input count = 2793 records; input count = 3628 records; (report on four contractors.)
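Summing the April 1973 figures and dividing each cost by its CPU time is simple arithmetic, shown here in Python. The totals are only the sum of the listed rows, not the cost of any single run, since the procedures run on different schedules (File Compatibility, for example, is run quarterly).

```python
# April 1973 figures from the cost table: procedure -> (CPU seconds, cost in dollars).
costs = {
    "Input": (2.7, 2.95),
    "Chemical List": (2.4, 3.11),
    "Update": (43.5, 59.43),
    "Animal Group Report": (69.5, 102.00),
    "File Compatibility": (18.6, 24.24),
    "Phase II Reports": (120.0, 234.00),
}

total_cpu = sum(cpu for cpu, _ in costs.values())
total_cost = sum(cost for _, cost in costs.values())
rates = {name: cost / cpu for name, (cpu, cost) in costs.items()}  # dollars per CPU second

print(f"total: {total_cpu:.1f} CPU seconds, ${total_cost:.2f}")  # total: 256.7 CPU seconds, $425.73
for name, rate in rates.items():
    print(f"{name:20s} ${rate:.2f} per CPU second")
```

The charge per CPU second ranges from about $1.09 (Input) to $1.95 (Phase II Reports), which suggests the charges were not a flat rate per CPU second; the table itself does not say what else was billed.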

The tasks of input preparation, error correction, microfilm preparation, and system operation are currently performed by a contractor responsible to the Carcinogen Bioassay and Program Resources Branch. The cost of the contract for the first year is approximately $85,000. There are 7 employees handling CBDS input and microfilming operations: a Project Manager, Supervisory Clerk, Control Clerk, Systems Operation Specialist, Data Technician, Programmer, and Medical Coder. As of May 1973, microfilming charges at the Washington National Records Center have been $650.00. Each contract has personnel directly involved with data preparation for CBDS. The number of individuals involved varies with the size of the contract; the average contract requires 1 full-time data clerk. Presumably these individuals would be engaged in data collection anyway. In the long run, CBDS may reduce contractor expenses by eliminating the need for extensive data storage at the contract site, providing a relatively simple method for data collection, and relieving each investigator of the task of designing and producing data collection forms and reports. Of course, expenses incurred by investigators are ultimately NCI expenses.

III. POSTSCRIPT

A. Overview

The Carcinogenesis Bioassay Data System began operation in early 1972. By the end of the year, the system was running smoothly. CBDS is more than a simple file maintenance and reporting system; its very size puts it out of the “simple” category. Thousands of records and millions of data items will be maintained. Besides volume and the multiplicity of data items, CBDS is complicated by the need to provide for the relationships and interdependence of the elements. A Contract may have several Colonies; for each Colony, a number of Animal Groups exist. Animal Group data relates directly to Cage or Group Observations, Individual Animal, and Preparation data. Preparation Records refer to Chemical and Batch Chemical Records, and Individual Animal Records refer directly to SNOP Records. Reporting means correlating data from different records, often from different files. CBDS is further complicated by the experimental relationships of the Animal Groups. The linking of test groups with related control groups is the trickiest and most crucial part of the System. The link problem begins in the laboratories: researchers have not developed any uniform nomenclature for an experimental relationship. A “test” may mean all the tests using an agent, or it may mean an experimental relationship between specific test, vehicle, positive, and untreated control groups. The Link File, which contains the data about the relationships among these groups, requires the most complicated update processing in the System. Because of the experimental relationships, reports are not simple lists of data about animal groups arranged in sequential order by key; rather, the records for related groups must be presented together. The Animal Group File therefore has to be accessed randomly, and this is, of course, more expensive.

B. Evaluation

The wisdom of developing an information system has been confirmed by the growth of the bioassay program. In fiscal 1972, the program operated at an annual level of about five million dollars and maintained 115,000 animals on test throughout the country. Projections of program growth suggest that by fiscal year 1976, the bioassay segment could reach a level of twenty-five million dollars per annum and involve records maintenance for 700,000 animals. Due to the long-term nature of animal studies, relatively little Individual Animal Data has been submitted to date. This is extremely important, since it includes the information about tumors. Any judgements about CBDS will be more equitable after this data is incorporated and analyzed.
The National Cancer Institute uses CBDS to monitor and evaluate bioassay investigations. For example, defects in experimental design have been found in the process of data entry, and unacceptable lags in conducting histopathology examinations of tissues have been detected. Bioassay data is so detailed and voluminous that the only feasible method of interpreting the information is via summary and statistical analyses. An equally important blessing for contract administrators is the availability of current data. Using CBDS reporting facilities, an administrator can now obtain a fair picture of the current status of a research effort without contacting the investigator. CBDS contains data from all bioassays and thus provides a broader base for comparison. All types of relationships and counter-relationships may be examined. Data from studies involving the same animal may be correlated, as may data from all studies on the same agent. A fringe benefit resulting from the development of CBDS has been a reevaluation of the entire bioassay administrative process. Problems which already existed became more evident in the light of CBDS. In addition, the System has emphasized the need for a closer relationship with investigators at all levels. Bioassay investigators, who contribute so heavily to CBDS, also share in the benefits. All reports are available to them. Standard data collection and reporting methods should eliminate the duplication of effort which occurred when each investigator designed individual procedures. CBDS reduces tedious manual labor for all contractors. Those with computers will in all probability be able to cut back on input preparation; for those without computers, CBDS reduces the requirement for arduous manual analysis of the data. The standardization of input formats may not always have been welcomed with open arms by the investigators, particularly those who had already developed their own procedures. In the long run, however, the contractor should benefit, since the tedious task of recording data has been organized and formalized into a routine procedure. Data collection should take less time (assuming that investigators had been doing a thorough job of recording data in the past). A word to the wise, however, seems appropriate. CBDS is not the be-all, catch-all, and end-all for carcinogenesis information processing. This is obviously true at this point, when much remains to be done in the development of analytical and reporting capabilities. Investigators are encouraged to submit ideas and suggestions to the Carcinogen Bioassay and Program Resources Branch for additions and improvements. To paraphrase Gertrude Stein, a system is a system is a system. CBDS, by its very nature as a system, structures information. The structured format of CBDS is dynamic and subject to change, but any structure implies limitations. In order to design the system, literally hundreds of decisions were made. For the most part, the decisions were well reasoned and logical, bearing in mind the goals of the carcinogenesis program.
Occasionally, equally logical reasons may have existed for making alternative decisions. Other decisions, such as file structure, were based primarily on system concerns. Finally, some decisions, generally minor ones, were purely arbitrary: a choice had to be made, and no cogent reason could be found at the time for selecting one alternative over the other. Some of these decisions may have to be changed. A case in point is the use of the SNOP coding system to summarize Individual Animal Data. Certainly the data is virtually useless unless it is summarized, but summarization implies loss of detail. To be sure, the details will be in the Microfilm Files. More importantly, the SNOP system was designed for use with humans; e.g., there is no topography code for tail. The decision to use these codes was based on the general understanding and acceptance of SNOP by pathologists and the fact that the Division of Computer Research and Technology had developed a system for NCI’s Laboratory of Pathology which provided computer encoding of diagnoses. (To solve the tail problem if it arises, NCI may be required to designate a code for tail.) An attempt has been made to keep the limitations of CBDS minimal, within the bounds of system efficiency and reasonable expense. For example, on the input data forms, respondents are frequently limited to a choice of replies, which hopefully include all the possible alternatives. It might have been possible to allow free-form input, but the time and money involved in handling it would have been prohibitive.

C. Future Considerations

CBDS is a fluid and growing system. In fact, some of the problems of system development resulted from this very fluidity, and it was necessary at various stages to freeze changes in order to get the basic system underway. Now that the system is operating, attempts are being made to improve input procedures, maximize program efficiency, and reduce run costs. NCI is concerned with the costs of CBDS, and it would seem particularly appropriate to give them special attention during the second year of system operation. One possibility which might be considered is reducing the number of Update runs by entering more data at one time. This would reduce update costs, which are largely determined by the time required to handle the Master Files. CBDS has not yet approached its full potential. Other correlations and analyses can and should be made. It would be useful to develop even greater specificity in reporting; e.g., a user might request tumor information on test group mice that did not develop lung cancer in tests with cigarette smoke. Investigators are encouraged to make suggestions about reports they would find useful. Much consideration has been given to error correction problems. One solution is to prevent entry of errors into the Update Procedure by performing edit checks as input is put into machine-readable form. A program exists which makes such checks as data is entered via terminal. Entry of data is initiated by the typist, who indicates the type of Data Record form; the program then requests items for that form, makes edit checks, and rejects illegal entries. If this program were used by personnel at the contract site, many of the present complications of error correction would be eliminated. Undoubtedly, this procedure would have problems all its own. The program is a slower and more expensive method of preparing input than straight keypunching, and so far the volume of errors does not seem to warrant its use.
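The terminal edit-check program is described only in outline. The following Python sketch shows the general pattern under stated assumptions: the typist names a form type, the program requests each item in turn, and an entry that fails its edit check is rejected and requested again. The form name, its fields, and the edit rules (including a YYMMDD date) are hypothetical, and responses come from a list rather than a terminal so the sketch can run as-is.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Validator = Callable[[str], bool]

# Hypothetical form layout: (item name, edit check) in the order items are requested.
FORMS: Dict[str, List[Tuple[str, Validator]]] = {
    "individual_animal": [
        ("animal_id", lambda s: s.isdigit() and len(s) <= 6),
        ("sex", lambda s: s in {"M", "F"}),
        ("death_date", lambda s: s.isdigit() and len(s) == 6),  # YYMMDD (assumed)
    ],
}


def enter_record(form_type: str,
                 responses: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Request each item of the named form; reject illegal entries and ask again."""
    items = FORMS[form_type]   # the typist first indicates the form type
    answers = iter(responses)  # stands in for lines typed at the terminal
    record: Dict[str, str] = {}
    rejected: List[str] = []
    for name, check in items:
        for value in answers:
            if check(value):
                record[name] = value
                break
            rejected.append(f"{name}: {value!r}")  # illegal entry; request it again
        else:
            raise ValueError(f"input ended before {name} was accepted")
    return record, rejected


rec, bad = enter_record("individual_animal", ["12345", "X", "F", "730415"])
print(rec)  # {'animal_id': '12345', 'sex': 'F', 'death_date': '730415'}
print(bad)  # ["sex: 'X'"]
```

Because every entry is checked before it is accepted, a record produced this way cannot reach the Update Procedure with a value that fails one of its edit checks; the price is an interactive step per item, consistent with the text's remark that the method is slower than straight keypunching.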
The desirability of having contract personnel enter data is also questionable, since this involves support of computer terminals at each contract site and training of personnel. CBDS contains a vast amount of data, and the existing reports are voluminous. It would be wise to consider a method of producing abbreviated reports, or reports of exceptional conditions which might warrant investigation. This would be most appropriate for experienced users who do not require extraneous background information. In addition, it might be feasible to develop profiles of various types of experiments and generate reports that signal deviations from the norms. In conjunction with this, faster turnaround on report requests is always desirable. Currently a user can expect at least a one-day lapse between computer input and output. Computer turnaround time could be reduced to a few hours if the files were smaller, and it might be worth considering special spin-off files for frequent users of CBDS. Another delay which might be minimized is the time necessary to transmit data between CBDS and the contract holders. Currently the postal system is the chief carrier. In some cases, long-lines data transmission might be worth considering for both input and output. Such special procedures are expensive, and volume and use must warrant the expense. Investigators may receive cards or tapes of their data to use in their own data systems. In addition, investigators may be provided with CBDS program decks; in fact, most programs were written in machine-independent languages so that they could be adapted for use by others. In addition to efforts to encourage and train contractors to use CBDS output, a modus operandi is being established to handle output requests. This would mean an output clearinghouse to receive requests, initiate computer runs, and forward output to the contractor. The clearinghouse would also accept special requests (e.g., those which might necessitate programming) and make recommendations as to the feasibility of filling them. Occasionally, investigators wish to deviate from CBDS input requirements. Each request has been evaluated on its own merits, and some have been permitted. In the long run, it will be better for all concerned if exceptions to CBDS procedure are minimal, since each deviation must be monitored and processed, and this means an additional expenditure. At present, relationships between investigators and CBDS are rather informal, and this, of course, has some advantages; hopefully it will encourage a spirit congenial to dialogue and exchange of ideas. However, informality can degenerate into carelessness, which may be dangerous as the system grows and informal controls become impossible. Plans are being made to develop and implement formal guidelines, particularly in regard to input procedures and schedules.
In addition, a formal program is being developed to introduce bioassay personnel to CBDS. This might include literature, on-site training sessions, and even regional conferences to discuss changes, problems, suggestions, etc. CBDS will be a much more productive system when it is fully understood and used by the investigators.

ACKNOWLEDGMENTS

CBDS was developed for the Carcinogenesis Area of the Division of Cancer Cause and Prevention of the National Cancer Institute under the direction of Dr. Umberto Saffiotti. National Cancer Institute personnel who made a significant contribution to CBDS include Ursula Evans, Terrence Kuch, and Howard Rosenberg. From the Division of Computer Research and Technology, special acknowledgement should go to Roger Dailey, George Dobenecker, Martin Epstein, Sandra Foote, Ann Gallagher, Sam Harper, John Parks, Gary Stoner, and Emmett Ward. Morris Johnson and Dalton Tidweli from Woff Research and Development also contributed technical support.