Machine-independent system for processing medical text


COMPUTER PROGRAMS IN BIOMEDICINE 3 (1973) 163-174. NORTH-HOLLAND PUBLISHING COMPANY

MACHINE-INDEPENDENT SYSTEM FOR PROCESSING MEDICAL TEXT

T.C. SHARPE and D.E. CLARK
Medical Computing Unit, University of Manchester, England

This paper describes a set of three programs, written in FORTRAN IV, for handling medical records which consist of free text interspersed with numerical results. The first program, CONCORD, generates a separate dictionary and concordance for each record. The concordances are stored on magnetic tape and may be searched by the second program, CONQUEST, to establish which of the medical records contained a given word or phrase, or a numerical value in a certain range. There are facilities for specifying lists of synonyms, and a number of searches may be carried out on a single run. There is no restriction on the terminology used for recording information, but a third program, CONDIC, is used to generate a list of all the words used in a set of medical records so that synonyms and mis-spellings may be taken into account when designing searches.

Concord, Conquest, Condic

1. Introduction

Although a number of computer programs are available for processing medical report narrative, each one tends to be restricted to a certain type of machine, for two reasons:
(a) The programs are often written in low-level language.
(b) Some are so large that they can only be run on the most powerful computers where large amounts of disc/core storage are available.
This means that few hospitals and medical schools have access to them; there is a need for a system which can be run on a medium-sized computer and is written as far as possible in a high-level language.

The programs described in this paper are written in FORTRAN IV, and can be run on a machine with 32k of core storage. FORTRAN was chosen because it is almost universally available. It was found that all the programming could be done quite adequately in this language if five short machine-language routines were added. In practice these were available as standard library routines under most operating systems, and in some cases they could be implemented in extended FORTRAN.

The programs are based on a system for searching Atomic Energy Law developed at Harwell [1]. There are two major programs, CONCORD and CONQUEST. CONCORD processes the original documents, which are punched onto cards in the same format as they would appear on the type-written page, i.e., as words separated by blanks. It orders the words alphabetically and stores a reference to the position of each word in the document. The alphabetical list of words and the list of position references, termed the dictionary and concordance respectively, are written to magnetic tape. CONQUEST reads a set of instructions from cards which specify words or phrases which may have occurred in the text. A search "language" allows the positional relationship of the words to be defined. CONQUEST indicates which document or documents contained the words requested, and prints out the original text of the documents if required.

For medical purposes, a facility for handling numbers (decimal and integer) and dates has been added. Numbers have to be identified in the text by a group of characters enclosed in brackets, e.g. (AGE) 35. Also a program called CONDIC has been added which combines the dictionaries produced separately for a number of medical records to generate a dictionary of terms. This is essential for recognising synonyms and mis-spellings.


2. Computational methods

2.1. CONCORD (fig. 1)

2.1.1. Purpose
To generate a dictionary and concordance from each medical record fed in.

2.1.2. Techniques used

2.1.2.1. Chained dictionary. Words appear in storage in the same order as they occurred in the original document. A system of pointers is used to indicate the alphabetic sequence of the words. The dictionary is divided into sections or subchains by markers, which usually consist of zero (the start of the dictionary), the letters A to Z, and the binary word containing all ones (the end of the dictionary). The maximum size of the dictionary is 1000 words.

2.1.2.2. Concordance. Each dictionary entry has a chain of numbers associated with it which specifies the position of each occurrence of the word in the original document, e.g. sentence No. 3, word No. 6. This information goes to make up the concordance. The concordance may contain up to 2000 entries.

2.1.2.3. Tail stack. If there are n characters per machine word, the first n letters of a dictionary word are packed in array NAWORD, and the remaining characters are packed in successive locations in NASTAK, the array of tails. A set of pointers is used to indicate the position of the tail in NASTAK and the number of locations occupied by the tail.

2.1.3. Data
The data consists of narrative punched onto cards in the same way as it would appear on the typewritten page, the "line" being 72 characters long. No special conventions are used with pure text, but numbers require an identifier which must precede the number and be enclosed in brackets. The standard FORTRAN character set is used: digits 0-9, letters A-Z (upper case only), mathematical symbols + - * / =, and punctuation ( ) . , plus apostrophe ' and $. Dollar may be used to erase the previous word. The first card of each record is a header card which carries identification material and is not processed along with the text; the last card contains the end-of-document marker ++.

2.1.4. Description of the program
The program contains 31 subroutines, which are described below.

2.1.4.1. START. START is called once only and performs various initialisation tasks. It sets up certain alphabetic constants and generates a look-up table NATABL which contains an entry for each character in the character set. The entry consists of a number which identifies the type of character, e.g. 1 = a letter, 2 = blank or character to be ignored, 3 = full stop, etc. START also sets up the parameters NCHWD (No. of characters per word on this machine), NCHCD (No. of characters per card) and NWDCD (No. of machine words per card).

2.1.4.2. MARCOM. MARCOM reads markers and common words from cards. Common words are words such as THE, IT, IS, OF, which do not carry any information.

2.1.4.3. INITAL. INITAL is called before each document is processed to clear the working arrays and reset the counters and logical variables.

2.1.4.4. SCANMK. SCANMK scans the marker words (user-defined) and enters them in the dictionary. The markers must already be in alphabetical order.

2.1.4.5. SCANCM. SCANCM scans the common words (also user-defined) and enters them in the dictionary.

2.1.4.6. SCANTX. SCANTX takes each word in the text in turn and deals with it according to whether it is a number, an identifier, a marker, or a true word. If it is a true word, SCANTX updates the concordance.

2.1.4.7. SCAN. SCAN delivers one word at a time to SCANMK, SCANCM, or SCANTX.

Fig. 1. CONCORD flowchart. (Box labels legible in the figure: MAIN, INITAL, SCAN, READCD, SCANTX, FNDNXT, WRTMT.)

2.1.4.8. READCD. READCD calls READ to read in a card, and scans it character by character. NATABL is consulted to find out what type of character it is; if it is a letter or digit, it is packed into the next character position in a buffer array; other characters act as separators to indicate end of word or end of sentence. Some characters are ambiguous (e.g., full stop/decimal point and hyphen/minus sign) and require special logic for correct interpretation. The separated words are delivered to SCAN.
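The table-driven scan described for READCD can be illustrated with a minimal Python sketch. It is not the authors' FORTRAN: the type codes 1 to 3 follow the examples given for NATABL, while the code 4 for digits, the function names, and the rule used to decide whether a full stop is a decimal point are assumptions made for illustration. Bracketed identifiers and the hyphen/minus ambiguity are left out.

```python
LETTER, IGNORE, STOP, DIGIT = 1, 2, 3, 4   # 4 is an assumed code for digits

def build_table():
    """Build a NATABL-like look-up table: character -> type code."""
    table = {}
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        table[ch] = LETTER
    for ch in "0123456789":
        table[ch] = DIGIT
    table["."] = STOP
    return table            # any other character is treated as IGNORE

def scan_card(card, table):
    """Split one card image into sentences, each a list of words."""
    sentences, words, buf = [], [], ""
    for i, ch in enumerate(card):
        kind = table.get(ch, IGNORE)
        nxt = card[i + 1] if i + 1 < len(card) else " "
        if kind in (LETTER, DIGIT):
            buf += ch                       # pack into the word buffer
        elif kind == STOP and buf[-1:].isdigit() and nxt.isdigit():
            buf += ch                       # assumed rule: digit.digit is a decimal point
        else:
            if buf:                         # separator: deliver the word
                words.append(buf)
                buf = ""
            if kind == STOP and words:      # full stop closes the sentence
                sentences.append(words)
                words = []
    if buf:
        words.append(buf)
    if words:
        sentences.append(words)
    return sentences
```

For example, `scan_card("WEIGHT 72.5 KG. LIVER NORMAL.", build_table())` yields two sentences, with `72.5` kept as a single word.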

2.1.4.9. LOOKUP. LOOKUP tells SCANTX whether a word is currently in the dictionary, and if so, where. It first compares the word with the markers to determine the correct subchain, then scans the subchain.

2.1.4.10. ENTER. If the current word is not yet in the dictionary, ENTER sets up the appropriate pointers to enter it.

2.1.4.11. CFWORD. CFWORD extracts the tail of a dictionary word from the tail stack. It is only called when the new word and the dictionary word both have the same first n letters. A flag is set to indicate whether the full word is earlier than, later than, or identical to the dictionary word.

2.1.4.12. COMWRD. Sets NLCOMM to be TRUE if the current word is a common word.

2.1.4.13. NUMBER. Converts numbers from BCD into binary by multiplying each character (plus or minus a constant factor) by a power of ten, and converting to integer (if there is no decimal point), floating point (one decimal point) or date (two decimal points). The number is stored, with its identifier, in NANUM.

2.1.4.14. LISTC. Prints the concordance for each document, if required.

2.1.4.15. PRINTD. Prints the dictionary for each document.

2.1.4.16. LISTCA. Proceeds through the dictionary chain, placing the words in true alphabetical order in arrays NAWRD (containing the first n letters) and NASTK (containing the tails). The concordance entries are transferred in a similar way to NACCK, and various pointers are placed in arrays NAFTC and NAFT2. These pointers locate the tails and concordance references for each entry in NAWRD.
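The chained dictionary used by LOOKUP, ENTER and LISTCA can be sketched as follows. This is an illustration in Python under stated assumptions, not the original code: words are held whole (the tail stack is not modelled), the empty string and the character `\uffff` stand in for the zero and all-ones markers, and the class and method names are hypothetical.

```python
import string

class ChainedDictionary:
    def __init__(self):
        # Markers are entered first, in alphabetical order: start marker,
        # the letters A to Z, end marker.
        self.words = [""] + list(string.ascii_uppercase) + ["\uffff"]
        self.nxt = list(range(1, len(self.words))) + [-1]   # alphabetical chain
        self.n_markers = len(self.words)
        self.refs = [[] for _ in self.words]                 # concordance chains

    def _subchain_start(self, word):
        """Index of the last marker (excluding the end marker) that is <= word."""
        start = 0
        for m in range(self.n_markers - 1):
            if self.words[m] <= word:
                start = m
        return start

    def add(self, word, sentence, position):
        """Look the word up; enter it if absent; record one occurrence."""
        prev = self._subchain_start(word)
        while self.words[self.nxt[prev]] < word:     # scan the subchain
            prev = self.nxt[prev]
        cur = self.nxt[prev]
        if self.words[cur] != word:                  # not present: splice it in
            self.words.append(word)                  # stored in order of arrival
            self.refs.append([])
            self.nxt.append(cur)
            cur = len(self.words) - 1
            self.nxt[prev] = cur
        self.refs[cur].append((sentence, position))

    def in_order(self):
        """Walk the chain alphabetically, skipping the markers."""
        out, i = [], self.nxt[0]
        while i != -1:
            if i >= self.n_markers:
                out.append((self.words[i], self.refs[i]))
            i = self.nxt[i]
        return out
```

Storage order is the order of first occurrence, so entering a word never moves existing entries; only two pointers change. The alphabetical walk in `in_order` corresponds to the pass that LISTCA makes before the dictionary is written out.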

2.1.4.17. FNDNXT. Finds the location of the next word in NAWORD, in alphabetical order.

2.1.4.18. GTWORD. Fetches the next dictionary word from NAWORD and (if more than n letters) NASTAK.

2.1.4.19. WRTMT. Writes 5 logical records to tape:
(i) A copy of the first card of the document, which contains identification.
(ii) The text of the original document.
(iii) A list of parameters defining the effective array sizes of NAWRD etc.
(iv) Arrays NAWRD, NASTK, NAFTC, NAFT2, NACCK.
(v) Array NANUM (if any numbers were present).

2.1.4.20. FAIL. Prints a failure message if the dictionary or concordance overflows.

2.1.4.21. BLOW. Unpacks the characters read in, n to a word, by READ, and places them, one to a word and left justified, in array NABOTH for processing in READCD.

Minor subroutines:

2.1.4.22. RESETI. Resets all the elements of a one-dimensional array to a given value (either zero or blank).

2.1.4.23. RESETJ. As above, but operates on two-dimensional arrays.

2.1.4.24. READ. Reads a card in (mAn) format, where m is the number of words per card and n is the number of characters per word.

2.1.4.25. PACKFR*. Packs two numbers into a single location.

2.1.4.26. UNPKFR*. Unpacks two numbers from a single location.
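The conversion carried out by NUMBER (sect. 2.1.4.13) can be sketched in Python as below. The classification by number of decimal points follows the description in the text; the function name, the returned tuples, the omission of signs, and the reading of a date as (day, month, year) are assumptions made for the sketch.

```python
def convert_number(text):
    """Classify and convert a numeric string by its number of decimal points."""
    fields = text.split(".")
    values = []
    for field in fields:
        value = 0
        for ch in field:
            # character code minus a constant, accumulated in powers of ten
            value = value * 10 + (ord(ch) - ord("0"))
        values.append(value)
    if len(fields) == 1:
        return ("integer", values[0])
    if len(fields) == 2:
        return ("real", values[0] + values[1] / 10 ** len(fields[1]))
    if len(fields) == 3:
        return ("date", tuple(values))      # assumed order: day, month, year
    raise ValueError("too many decimal points: " + text)
```

Keeping the type alongside the value matters at search time, since a search number is only compared with a stored number of the same type.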

2.1.4.27. GET* (S, I, T). Gets the Ith character from a location containing n characters and places it left adjusted in another location, followed by blanks.

2.1.4.28. PUT* (S, I, T). Puts the left-most character of a machine word into the Ith position in another machine word, leaving the other characters undisturbed.

2.1.4.29. KOMPAR (A, B, K). Sets K = -1 if A is logically less than B, K = 0 if A = B, and K = 1 if A > B.

2.1.4.30. PACKCH. Packs three numbers into a single word. This is done by integer multiplication, as the subroutine is only used for processing dates and speed is not important.

2.1.4.31. UNPKCH. Inverse of PACKCH.

* Written in extended FORTRAN in the 360 version.
+ Library subroutine.

2.2. CONQUEST (fig. 2)

2.2.1. Purpose
To interpret search instructions read from cards and scan the output from CONCORD to determine which documents satisfy the requirements of the search.

2.2.2. Techniques used

2.2.2.1. Binary search. Since the dictionary produced by CONCORD contains words in true alphabetical order, a binary search technique can be used in CONQUEST to locate a given word. The search word is compared with the middle word of the dictionary; if it is greater, the bottom half of the dictionary is disregarded and the process is repeated using only the top half of the dictionary; if it is less, the process is repeated using the bottom half. In this way the desired word is repeatedly 'straddled' until either the matching word is located, or the search length becomes zero. For a dictionary containing 1000 words, the process requires a maximum of 10 repetitions.

2.2.3. Data
Consists of:
(i) The output from CONCORD.
(ii) Search requests punched on cards. The search language uses the following instructions:

Instruction   +     -     Number   Word
OR                                 a1
AND           n1    n2             a1
NOT           n1    n2             a1
STOP
FINISH
SUBSET                    n4
PRINT
LT                        n3       a2
EQ                        n3       a2
GT                        n3       a2

n1 and n2 are the upper and lower limits for the proximity of two words, n3 represents a number or date, and n4 refers to the set of records from which a subset should be created. a1 is the word which we are trying to match, and a2 is an identifier for numeric quantities, enclosed in brackets. The sample run illustrates the way in which the various instructions are used.

2.2.4. Program description
The program contains 20 subroutines, which are described below.

2.2.4.1. START. Called once for each run, to reset the counters and arrays. Also sets up Hollerith constants OR, AND, etc.

2.2.4.2. QCODE. Reads the search instructions from cards. The instruction words are compared against the list of instructions set up in START, using function NCODE, and replaced by a numeric code. Any numbers on the search cards are converted to binary using NUMBER. If a SUBSET instruction is encountered, control is passed to the SUBSET routine, which reads the next instructions. If FINISH is encountered, QCODE returns to MAIN.

2.2.4.3. NUMBER. As in CONCORD, sect. 2.1.4.13.

2.2.4.4. NCODE. Returns a numeric code for all legal instructions, zero for illegal ones.

Fig. 2. CONQUEST flowchart. (Box labels legible in the figure: MAIN, START, QCODE, NUMBER, NCODE, SEARCH, ANDNOR, LOOKUP, TEST.)

2.2.4.5. SUBSET. Reads in cards following the SUBSET card until STOP or FINISH is encountered. The numeric field on these cards refers to a search which has just been carried out, e.g. if 4 searches have been carried out on the last run, the number could be 1, 2, 3 or 4. The instructions specify which of the searches must be satisfied by the document if it is to be included in the subset. The number on the SUBSET card identifies the set from which the subset is to be drawn. Blank or zero indicates the entire set of records. The new subset would then be labelled 1. This number is used for a logical tape unit, and in the 360 version cannot exceed 4.
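The way a subset is drawn from earlier search results can be sketched in Python as follows. The per-document result vector plays the role of NACARD, which holds a 1 or 0 for each search. The function name and the strictly left-to-right evaluation of OR, AND and NOT are assumptions of the sketch; the paper does not spell out the evaluation order.

```python
def in_subset(nacard, instructions):
    """Decide whether one document belongs to the subset.

    nacard[k - 1] is 1 if the document satisfied search k, else 0.
    instructions is a list of (operation, search number) pairs.
    """
    keep = False
    for op, search_no in instructions:
        hit = nacard[search_no - 1] == 1
        if op == "OR":
            keep = keep or hit
        elif op == "AND":
            keep = keep and hit
        elif op == "NOT":
            keep = keep and not hit
        else:
            raise ValueError("illegal instruction: " + op)
    return keep
```

With the instructions `[("OR", 1), ("AND", 2), ("AND", 3)]`, a document is kept only if it satisfied searches 1, 2 and 3.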

2.2.4.6. READMT. Reads back the information put out by WRTMT in CONCORD. Has an entry WRTMT for writing subsets to tape or writing text onto the output tape.

2.2.4.7. SEARCH. SEARCH scans through the search request for the current run, now in core. It calls in the appropriate routines depending on the search instructions, as follows:

OR, AND, NOT    ANDNOR
STOP            STORE
LT, EQ, GT      NUMSCH
FINISH          Return to main program.

2.2.4.8. ANDNOR.
OR. Calls LOOKUP (NAWK, NR) to place the references for the specified word (if any) in NAWK.
AND. Calls LOOKUP (NATEMP, NTEMP) to place the references for the specified word in NATEMP. Deletes the references in NAWK which do not satisfy the AND condition. Calls TEST if required.
NOT. Similar to AND, but the references in NAWK which do satisfy the NOT condition are deleted.

2.2.4.9. LOOKUP. Performs a binary search to determine whether the search word is in the dictionary. If it is, the concordance references are placed in array NAWK (if it is an OR instruction) or NATEMP (if it is an AND or a NOT instruction).
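The binary search performed by LOOKUP can be sketched in Python as below. The function name and the returned step count are additions for illustration; the step count is there only to check the bound of 10 repetitions quoted for a 1000-word dictionary.

```python
def binary_search(dictionary, word):
    """Search an alphabetically ordered list.

    Returns (index, comparisons); index is -1 if the word is absent.
    """
    low, high, steps = 0, len(dictionary) - 1, 0
    while low <= high:                   # search length not yet zero
        mid = (low + high) // 2
        steps += 1
        if dictionary[mid] == word:
            return mid, steps
        if dictionary[mid] < word:
            low = mid + 1                # disregard the lower half
        else:
            high = mid - 1               # disregard the upper half
    return -1, steps
```

Each comparison halves the remaining search length, and since 2 to the power 10 is 1024, ten comparisons are enough for 1000 entries.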

2.2.4.10. TEST. If the search request specifies that the word in the AND or NOT instruction must lie a certain number of words before or after the word in the OR instruction, TEST is called to see if the difference between the concordance references is in the specified range. If the words are not in the same sentence, the condition will not be fulfilled.

2.2.4.11. NUMSCH. Searches NANUM for the identifier specified on the search card (in an LT, EQ or GT instruction). If it is found, the number in NANUM is checked to see if it is the same type as the search number. If it is the same type, it is compared with the search number to see if the condition less than, equal, or greater than is fulfilled.

2.2.4.12. STORE. If the present search has achieved a positive result, STORE places a 1 in NACARD (NOSRCH), where NOSRCH is the number of the current search. If not, NACARD (NOSRCH) is set to zero. NACARD is later written to tape; it contains a record of the search results for one document.

2.2.4.13. ORDER. Called by ANDNOR after processing an AND or a NOT instruction. Removes the blank elements from NAWK, places the references in order, and sets NR (No. of references) accordingly.

2.2.4.14. PRINT. Prints a summary of each search including:
(i) Heading.
(ii) Replica of search instructions.
(iii) Number of documents satisfying the search.
(iv) Header cards of documents satisfying the search.

The following routines are also used in CONCORD:

2.2.4.15. RESETI
2.2.4.16. RESETJ
2.2.4.17. UNPKFR
2.2.4.18. KOMPAR
2.2.4.19. PACKCH
2.2.4.20. UNPKCH


2.3. CONDIC (fig. 3)

2.3.1. Purpose
To generate a dictionary of terms from the dictionaries produced separately for a set of medical records.

2.3.2. Techniques used
CONDIC uses a straightforward core-bounded merge technique. It is written so as to make most economical use of storage, and allows a dictionary of 10,000 words to be accommodated on a 32k machine. This is done by overlapping the storage areas used for the "old" and "new" versions of the dictionary before and after merging in the dictionary for the next record.

2.3.3. Data
Output from CONCORD.

Fig. 3. CONDIC flowchart. (Box labels legible in the figure: WORD1 (1) to (3), WORD2 (1) to (3), WORD3 (1) to (3).)

2.3.4. Program description
The program contains five subroutines, which are described below.

2.3.4.1. WORD1. Has 3 calls; deals with words without tails.
(i) Compare the new word (from NAWRD) with the next word in the CONDIC dictionary. If it is earlier, it becomes the next entry in the dictionary; if it is the same, both words are entered in the same location; if it is later, the old dictionary word is entered.
(ii) Enter any remaining words after the end of the list in NAWRD has been reached. Shift the "new" dictionary back into the "old" locations.
(iii) Print a dictionary of all the words with n letters or less.

2.3.4.2. WORD2. Deals with words with n < c ≤ 2n, where c is the number of characters in the word. Similar to WORD1, except that tails have to be extracted from the correct locations in NASTK.

2.3.4.3. WORD3. Deals with words where 2n < c ≤ 3n. Words with more than 3n letters are truncated.

The following routines are common to CONCORD and CONQUEST:

2.3.4.4. KOMPAR

2.3.4.5. UNPKFR
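The merge step described for WORD1 can be sketched in Python as below. The function name is hypothetical, whole words are compared (the separate handling of tails in WORD2 and WORD3 is not modelled), and a fresh list is built for clarity, whereas the program overlaps the "old" and "new" storage areas to save core.

```python
def merge_dictionary(old, new):
    """Merge two alphabetically ordered word lists, keeping one copy of each word."""
    merged, i, j = [], 0, 0
    while i < len(old) and j < len(new):
        if new[j] < old[i]:              # new word is earlier: enter it
            merged.append(new[j])
            j += 1
        elif new[j] == old[i]:           # same word: one shared entry
            merged.append(old[i])
            i += 1
            j += 1
        else:                            # new word is later: enter the old word
            merged.append(old[i])
            i += 1
    merged.extend(old[i:])               # enter any remaining words
    merged.extend(new[j:])
    return merged
```

Starting from an empty list and merging in the dictionary of each record in turn gives the combined dictionary of terms; each merge is a single pass over both lists.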

3. Hardware and software requirements

The program exists in two versions:
(i) I.B.M. 7090.
(ii) I.B.M. 370.

The specifications for the smaller machine (7090) are as follows:
Core storage - 32k, 36 bit/word.
Cycle time - 2.18 µsec.
Direct-access storage - none.
Magnetic tape - 16 at 60 kc/sec, 7 track.
Operating system - IBSYS version 13, FORTRAN IV compiler.

The specifications for the 370:
Core storage - 2M byte, 32 bit word.
Direct-access - 2314 and 3330 disc.
Magnetic tape - 5, 9 track; 2, 7 track.
Cycle time - Main core 1.2 µsec, 8 bit word.
Buffer store - 160 nsec, 8 bit word.
Operating system - FORTRAN G-LEVEL compiler.

4. Mode of availability

The program is available as a tape from the Medical Computing Unit, Clinical Sciences Building, York Place, Manchester M13 0JJ.

Sample run

The data for the sample run consist of ten post-mortem reports. These are written in ordinary medical English, and the only additions to the original reports are the bracket-strings for identifying height, weight, date of autopsy, etc., and the header and terminator cards (fig. 5c). The reports each contain about 1000 words on average, and occupy about 150 punched cards. Fig. 4 is a print-out showing the structure of the concordance for the first report. Note that concordance references are not generated for the words A, AN, AS, etc., which have been defined as common words.

A search run is shown in fig. 5. The search requests are fed in as a card deck in the present version, but could be entered via a teletype if one was available, with slight modifications to the program. In this example a search is carried out for:
(a) Females.
(b) Weight less than 100 kg.
(c) Date of autopsy after 1.1.68.
(d) Height equal to 6 ft.
A subset is created of all cases satisfying the first three searches; this subset is searched for cases of MONILIAL OESOPHAGITIS; the text of the report containing this (only one was found) is printed out. Fig. 6 represents the combined dictionary of terms for all 10 post-mortem reports.

WORD            CODE   FREQ   DOC   SENT   WORD
MELAENA                   1     0    105      6
MODERATELY                3     0     96      7
                                0    107      4
                                0    121      3
MONILIAL                  1     0     20      1
MOUTH                     1     0     46      1
MUCOSA                    3     0     55      2
                          . . .
OBESE                     1     0     25      7
OCCUPATION                1     0      2      3
OEDEMA                    1     0     26      2
OESOPHAGITIS              1     0     20      2
OESOPHAGUS                1     0     55      5
OESOPHAGUS                1     0     54      1

Fig. 4. CONCORD - print-out of a typical CONCORDANCE.

COMPUTER SEARCH OF MEDICAL DATA
SEARCH NUMBER 1
SEARCH INSTRUCTIONS
OPERATION   PLUS   MINUS   NUMBER    WORD
OR             0       0             FEMALE
STOP           0       0

RESULTS OF COMPUTER SEARCH
SEARCH NUMBER 1
THE SEARCH QUERY IS SATISFIED BY 3 DOCUMENTS -
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 164/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 461/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 460/69

COMPUTER SEARCH OF MEDICAL DATA
SEARCH NUMBER 2
SEARCH INSTRUCTIONS
OPERATION   PLUS   MINUS   NUMBER    WORD
LT             0       0   100       (WEIGHT)
STOP           0       0

RESULTS OF COMPUTER SEARCH
SEARCH NUMBER 2
THE SEARCH QUERY IS SATISFIED BY 10 DOCUMENTS -
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 164/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 139/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 132/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 318/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 461/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 231/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 460/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 335/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 134/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 475/69

Fig. 5a. CONQUEST - sample run. Four typical searches carried out on a set of 10 post-mortem reports.

COMPUTER SEARCH OF MEDICAL DATA
SEARCH NUMBER 3
SEARCH INSTRUCTIONS
OPERATION   PLUS   MINUS   NUMBER    WORD
GT             0       0   1 1 68    (DATE OF AUTOPSY)
STOP           0       0

RESULTS OF COMPUTER SEARCH
SEARCH NUMBER 3
THE SEARCH QUERY IS SATISFIED BY 8 DOCUMENTS -
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 164/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 139/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 318/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 461/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 231/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 460/69
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 335/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 134/69

COMPUTER SEARCH OF MEDICAL DATA
SEARCH NUMBER 4
SEARCH INSTRUCTIONS
OPERATION   PLUS   MINUS   NUMBER    WORD
EQ             0       0   6.00000   (HEIGHT)
STOP           0       0
FINI

RESULTS OF COMPUTER SEARCH
SEARCH NUMBER 4
THE SEARCH QUERY IS SATISFIED BY 2 DOCUMENTS -
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 318/68
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 335/68

Fig. 5b.

SUBSET INSTRUCTIONS
OR     1
AND    2
AND    3
STOP

STATISTICS
NO. OF RECORDS SEARCHED    10
NO. OF RECORDS IN SUBSET    3

END OF FILE REACHED
3 DOCUMENTS PROCESSED

COMPUTER SEARCH OF MEDICAL DATA
SEARCH NUMBER 1
SEARCH INSTRUCTIONS
OPERATION   PLUS   MINUS   NUMBER    WORD
OR             0       0             MONILIAL
AND            1       0             OESOPHAGITIS
PRINT          0       0
STOP           0       0
FINI

RESULTS OF COMPUTER SEARCH
SEARCH NUMBER 1
THE SEARCH QUERY IS SATISFIED BY 1 DOCUMENT -
MANCHESTER ROYAL INFIRMARY   AUTOPSY NO. 460/69

SUMMARY OF AUTOPSY RECORD -
. . . PULMONARY ARTERIES.
3. THROMBOSIS OF INTRAHEPATIC BRANCHES OF THE HEPATIC VEIN (BUDD CHIARI SYNDROME).
4. MONILIAL OESOPHAGITIS.
5. PREVIOUS CHOLYCYSTECTOMY.
SIGNED . . . . . . .
///

Fig. 5c.

ALPHABETIC LISTING OF WORDS USED, GROUPED BY NO. OF LETTERS.
. . . BRAIN, BRANCH . . . DUCTS, DUODENAL, DUODENUM, DURA, DURATION . . . HOWEVER, I/C, ILIAC, ILIACS . . . MIXED, MODERATE, MONILIAL, MONTHS, MORNING . . . PRESENCE, PRESENT, PRESENTS, PRESSURE, PREVIOUS . . . STAGE, STAINED, STALK, STARTED, STATE . . . VISCERAL, WALLS, WARD, WARRANT . . .
. . . ANGIOGRAM, ANTIBIOTICS . . . DEGENERATING, DEGENERATION, DEMARCATED . . . IMPROVEMENT, INCISIONS . . . OCCUPATION, OEDEMATOUS, OESOPHAGITIS, OESOPHAGUS . . . SCELEROSIS, SECONDLY, SEMINIFEROUS, SENSATIONS, SEPTEMBER . . .

Fig. 6. CONDIC - combined dictionary of terms for all 10 post-mortem reports.

Reference

[1] G.F. Niblett and N.H. Price, STATUS, a concordance generating program (H.M.S.O., London).