Automated generation of a World Wide Web-based data entry and check program for medical applications


Computer Methods and Programs in Biomedicine 52 (1997) 129-138

T. Kiuchi a,*, S. Kaihara b

a Department of Epidemiology and Biostatistics, Faculty of Medicine, University of Tokyo, Tokyo, Japan
b Hospital Computer Center, University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan

Received 15 February 1996; revised 1 October 1996; accepted 7 October 1996

Abstract

The World Wide Web-based form is a promising method for the construction of an on-line data collection system for clinical and epidemiological research. It is, however, laborious to prepare a common gateway interface (CGI) program for each project, which the World Wide Web server needs in order to handle the submitted data. In medicine, it is even more laborious because the CGI program must check the entered data for deficits (missing entries), types, ranges, and logical errors (bad combinations of data) for quality assurance, and must also check the length of the entered data and detect meta-characters in them to enhance the security of the server. We have extended the specification of the hypertext markup language (HTML) form to accommodate the information necessary for such data checking, and we have developed software named AUTOFORM for this purpose. The software automatically analyzes the extended HTML form and generates the corresponding ordinary HTML form, a 'Makefile', and the C source of the CGI programs. The resultant CGI program checks the data entered through the HTML form, records them in a computer, and returns them to the end-user. AUTOFORM drastically reduces the burden of developing a World Wide Web-based data entry system and allows the CGI programs to be prepared more securely and reliably than if they had been written from scratch. Copyright © 1997 Elsevier Science Ireland Ltd.

Keywords: Hypertext transfer protocol; Hypertext markup language; Data management; World Wide Web

* Corresponding author. Present address: Hospital Computer Center, University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan. Tel.: +81 3 38122111, ext. 3523.

0169-2607/97/$17.00 Copyright © 1997 Elsevier Science Ireland Ltd. All rights reserved. PII S0169-2607(96)01793-2

1. Introduction

The World Wide Web (WWW), developed a few years ago, provides a single user interface to a variety of services and data formats on the Internet and is now very popular [1]. Its native and most powerful feature is a multimedia, distributed information service that uses hypertext data formats, based on the hypertext transfer protocol (HTTP) and the hypertext markup language (HTML), which was developed from the standard generalized markup language (SGML) [2,3]. HTTP/HTML was originally developed to provide information to network users. However, thanks to recent revisions of the HTTP and HTML specifications, data submission from clients to the server has become possible. HTTP/HTML now provides the following functions:

(1) The server can send a data entry form to a client.
(2) Using this form, the client can submit data to the server.
(3) The server can process the submitted data.
(4) The server can return the results of processing to the client.

We have developed an automated patient registration and random allocation system [4] and a total data management system for a clinical trial using HTTP/HTML [5]. The most popular HTTP server software, such as NCSA httpd, CERN httpd, and NetScape Commerce Server, has an interface for running external programs or gateways, called the common gateway interface (CGI) [6]. We have developed CGI programs in C for three projects in order to process data entered using an HTML form. However, developing a CGI program for each project was laborious. One major reason is that, for medical use, we have to write routines that check every item for deficits (missing entries), data types, range errors, and logical errors in order to assure data quality, and that check the length of the entered data and detect illegal meta-characters in order to assure server security. This paper presents a software tool, AUTOFORM, which analyzes an HTML form and automatically generates a CGI program that handles incoming data reliably and securely.

2. Method

2.1. Hardware and operating system

The system was developed on a PC running the BSD/OS 1.1 (BSDI, USA) operating system.


2.2. Software

The main program of AUTOFORM was written in Perl (Perl 4.109 with Japanese patch). The generated C programs were compiled with gcc version 2.5.8 and GNU make version 3.00. The HTTP server software was NCSA httpd version 1.4, and the clients used for testing were NCSA Mosaic version 2.4 on a UNIX-based workstation and NetScape 1.1 on MS-Windows-NT 3.5 [7].

2.3. Extension of the specification of HTML form for AUTOFORM

AUTOFORM was designed to recognize all HTML form tags supported by NCSA Mosaic 2.4 [8]. For the generated CGI programs to check deficits, data types, data lengths, range errors and logical errors, and to detect illegal meta-characters, they need information beyond what the tags supported by Mosaic can express. We therefore defined an 'extended HTML form' for AUTOFORM, which describes this information in HTML tags by means of additional parameters.
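As a rough illustration of the kind of per-item information such additional parameters have to carry, the sketch below shows how a C program might hold the rules for one integer item and apply the deficit, length, type and range checks in turn. The struct `field_rule`, the result codes and the function `check_int_field` are hypothetical names chosen for this example, and the limits used in any call are assumed values; this is not AUTOFORM's generated code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for one submitted item. */
enum check_result { CHECK_OK, ERR_DEFICIT, ERR_LENGTH, ERR_TYPE, ERR_RANGE };

/* Hypothetical per-item rule set; not AUTOFORM's real data structure. */
struct field_rule {
    const char *name;   /* item name used in the form */
    int required;       /* non-zero: an empty value is a deficit */
    size_t max_length;  /* longest accepted input, in characters */
    long min_value;     /* smallest accepted value */
    long max_value;     /* largest accepted value */
};

/* Check one integer item: deficit, length, type, then range. */
enum check_result check_int_field(const struct field_rule *rule,
                                  const char *value, long *out)
{
    char *end;
    long parsed;

    if (value == NULL || *value == '\0')
        return rule->required ? ERR_DEFICIT : CHECK_OK;
    if (strlen(value) > rule->max_length)
        return ERR_LENGTH;

    errno = 0;
    parsed = strtol(value, &end, 10);
    /* Reject input with no digits, trailing garbage, or overflow. */
    if (end == value || *end != '\0' || errno == ERANGE)
        return ERR_TYPE;
    if (parsed < rule->min_value || parsed > rule->max_value)
        return ERR_RANGE;

    *out = parsed;
    return CHECK_OK;
}
```

A caller would fill one `field_rule` per form item and report the first failing check back to the end-user instead of recording the data.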

Table 1
Necessary, optional and extended parameters for each combination of a tag, type and data type

Tag        Type      Data type(a)  Necessary parameters  Optional parameters       Extended parameters for AUTOFORM
INPUT      RADIO     -             NAME, VALUE           CHECKED                   ALIAS, REQUIRED
INPUT      CHECKBOX  -             NAME                  VALUE, CHECKED            ALIAS
INPUT      TEXT      INT           NAME                  VALUE, SIZE, MAXLENGTH    ALIAS, REQUIRED, MAXVALUE, MINVALUE
INPUT      TEXT      DECIMAL       NAME                  VALUE, SIZE, MAXLENGTH    ALIAS, REQUIRED, MAXVALUE, MINVALUE
INPUT      TEXT      CHAR          NAME                  VALUE, SIZE, MAXLENGTH    ALIAS, REQUIRED, SECURITY
INPUT      PASSWORD  -             NAME                  VALUE, SIZE, MAXLENGTH    ALIAS, REQUIRED
SELECT     -         -             NAME, OPTION          SIZE, SELECTED, MULTIPLE  ALIAS, REQUIRED
TEXTAREA   -         -             NAME, COLS, ROWS      -                         ALIAS, REQUIRED
LCHECK(b)  -         -             IF, THEN              -                         -

(a) Data type is specified using a DATATYPE parameter, which is itself an extended one.
(b) The LCHECK tag is an extended tag.

2.4. Security consideration

AUTOFORM is designed to provide protection against the following known CGI security problems [9].

(1) Memory overwrite

One weak point of many CGI programs is that, if they receive data longer than expected, the excess overwrites memory beyond the buffer. If the data contain executable code and the CGI program's instruction pointer is switched to that code, potentially hazardous extraneous code may be executed within the server. Note that data can be sent directly to a server without using the form-based interface whose output the CGI program handles. AUTOFORM avoids this problem because it is designed to check the length of each string submitted to the server, using parameters in the ordinary and extended HTML form.

(2) Insecure data passed to a shell

Another security hole lies in the way a shell interprets data. Some CGI programs receive data and pass them to a shell, which can produce potentially dangerous results. For example, when sending e-mail using an HTML-based form, the address is passed to a C function such as system(commands), where commands is a char array whose contents are set by sprintf(commands, "mail %s", form_data) and the mail address string is held in the char array form_data. If the contents of form_data were "someone@local.edu; mail cracker@elsewhere.edu < /etc/passwd", the server's user information would become available to the cracker. The best way to avoid this security hole is not to use functions that invoke a shell. However, such functions are convenient and will probably continue to be used. AUTOFORM was therefore designed to detect potentially hazardous meta-characters when an additional parameter for security protection is specified in the extended HTML form, on the assumption that the data may be passed to a shell.
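A minimal sketch of the two safeguards described in this section, a length check before copying and meta-character detection before a value is handed to a shell, is given below. The function names, the buffer-size convention and the particular set of characters treated as hazardous are assumptions made for illustration; the text does not list the characters that AUTOFORM rejects.

```c
#include <stddef.h>
#include <string.h>

/* Assumed set of shell meta-characters; the paper does not give AUTOFORM's list. */
static const char SHELL_META[] = ";&|<>`$\\\"'()*?[]{}~!\n\r";

/* Return 1 if value contains any character from SHELL_META, else 0. */
int has_shell_metachar(const char *value)
{
    /* strcspn stops at the first hazardous character, or at the end. */
    return value[strcspn(value, SHELL_META)] != '\0';
}

/*
 * Copy value into dest only if it fits, including the terminating NUL.
 * Return 0 on success, -1 if the value is too long; dest is then left empty.
 */
int copy_checked(char *dest, size_t dest_size, const char *value)
{
    size_t len = strlen(value);

    if (dest_size == 0)
        return -1;
    if (len >= dest_size) {
        dest[0] = '\0';
        return -1;
    }
    memcpy(dest, value, len + 1);
    return 0;
}
```

With helpers of this kind, the address in the mail example would be rejected because of the ';' and '<' characters before it could reach system(), and an over-long submission would be refused instead of overwriting memory.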

3. Result

The parameters used in the extended HTML form are summarized in Table 1 and examples are shown in Fig. 1. The meanings of the extended parameters are described in this section. An ALIAS parameter gives an alias for the variable name that is specified in the NAME parameter and used as a variable name in the resultant CGI program in C. The alias is used to present the item name of the entered data to the user when the submitted data are successfully accepted, or when they are rejected because of a deficit or other error. This parameter is especially important in non-English-speaking countries, because it is convenient for end-users to read item names in their native language, whereas English must be used for variable names in C. An ALIAS


[Fig. 1. Examples of extended HTML form items: radio type (Sex: Male, Female), checkbox type (confirmation of patient registration sent via e-mail), text type (Age, Hemoglobin), password type (second password for the registration of a patient), SELECT tag (institution selected from a list), TEXTAREA tag (questions or correspondence).]