
MATHEMATICAL MODEL FOR MEDICAL DIAGNOSIS

JOSEF WARTAK
Department of Medicine, University of Alberta, Edmonton, Alberta, Canada

Abstract: Medical diagnosis is viewed as a problem in statistical classification wherein an n-dimensional sample space is partitioned into categories (diseases). The members (patients) of the categories are each represented by an ordered sequence of n numbers, or equivalently as a data vector in an n-dimensional hyperspace. Assuming there exists a probability distribution associated with each category, the space is partitioned in an optimal fashion and the patient's data vector is assigned to the category (disease) in whose region it falls with the greatest probability. Before making the final decision certain optimization rules may be used.

Keywords: Diagnosis, Mathematical model, Computer, Statistics, Probability, Optimization

INTRODUCTION

Diseases are characterized by the presence of certain manifestations (symptoms, signs, laboratory data) which the physician acquires in studying and examining the patient. The physician seeking to detect the presence of disease has to evaluate these manifestations in the light of his clinical experience and textbook knowledge. As a result, the patient is considered to suffer from the disease which, in many other cases, produced the manifestation cluster most similar to that discovered in the patient. For the experienced physician the mental process required to make a diagnosis has become so intuitive that very few clinicians seem to realize that it can be explicitly presented in mathematical terms and, of much greater importance, that it can be investigated and perfected by mathematical techniques.

In mathematical terms, medical diagnosis is the process of recognizing a new input (i.e. the patient being studied) as a member of a given class (i.e. disease category). Each member of a class may be considered as an n-dimensional vector, where each dimension represents an attribute (i.e. disease manifestation). To recognize membership in classes, it is necessary to learn about the common properties of classes which are known only through a set of their samples. This learning amounts to the estimation of the probability densities of the sets of samples in the n-dimensional vector space. The central problem is the linear transformation that minimizes the mean-square distance between members of the same class, under a constraint that ensures a non-trivial solution. After the linear transformation is done, the probability density of a class can be estimated by finding contours of equiprobability hyperellipsoids. The recognition of membership is based on the evaluation of the already learned conditional probability density of each class at the point in the vector space that represents the patient's data vector to be classified. The motivating criterion for the final diagnosis is that the decision should minimize the overall "loss" associated with error.


VECTOR REPRESENTATION OF DISEASES

Assume we have samples S_1, S_2, …, S_m and they come from diagnostic categories D_1, D_2, …, D_m. These diagnostic categories are exclusive, i.e. they do not have common members. Further, assume that n attributes have been measured for all members of each sample. Then each member (patient) can be represented as a vector or point in n-dimensional space. Consequently, the members of a particular sample S_k (k = 1, 2, …, m) form a cluster that can be described by equations of extremal hyperplanes forming a new, uncorrelated system of coordinates which closely fits the shape of the cluster. For illustrative purposes, an example involving two samples and two attributes is shown in Fig. 1(a).


Fig. 1. Geometrical representation of symptom space and disease clusters. For simplicity only two symptoms (S_1 and S_2) and two disease clusters (D_1 and D_2) are shown. (a) Original symptom space. (b) Disease clusters in their own reference frame determined by eigenvectors. (c) A family of equiprobability ellipsoids.
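The vector representation described above can be sketched numerically. A minimal illustration, assuming two diagnostic categories measured on two attributes as in Fig. 1(a); the attribute values and sample sizes are invented for the example:

```python
import numpy as np

# Each row is one patient (member), each column one attribute (symptom).
# Two diagnostic categories D1 and D2, each with p = 4 members and
# n = 2 attributes, as in the two-dimensional example of Fig. 1(a).
D1 = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.3], [1.1, 2.0]])
D2 = np.array([[3.0, 0.9], [3.2, 1.1], [2.8, 1.0], [3.1, 0.8]])

# A new patient is likewise a point in the same symptom space.
patient = np.array([1.05, 2.05])

# Each disease cluster is summarized, to first order, by its mean vector.
mean_D1 = D1.mean(axis=0)
mean_D2 = D2.mean(axis=0)
```

Each category forms a cluster of points whose shape is characterized further below by its extremal hyperplanes.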

Let x_ij^(k) be a general symbol for a member (patient) of the kth sample (k = 1, 2, …, m), where index i (i = 1, 2, …, p) is the ordinal number of the patient and index j (j = 1, 2, …, n) indicates a particular attribute. The problem is to obtain the best approximation, in the least-squares sense, of the data points from the matrix [x_ij^(k)] by an (n − 1)-dimensional hyperplane defined by the equation

a_0 + Σ_{j=1}^{n} a_j x_j = 0

where a_0, a_1, …, a_n are coefficients.


The coefficients a_0, a_1, …, a_n are defined from the condition

Δ = Σ_{i=1}^{p} (a_0 + Σ_{j=1}^{n} a_j x_ij)² = minimum.

The necessary condition for Δ to be a minimum is

∂Δ/∂a_j = 0    (for all j = 0, 1, …, n).

Solving the above system of equations we obtain

(λ[I] − [C]) a = 0          (1)

where [I] is the identity matrix, a is a column vector (a_1, a_2, …, a_n), and [C] is a covariance matrix with elements

c_ij = (1/p) Σ_{k=1}^{p} (x_ki − x̄_i)(x_kj − x̄_j),    i, j = 1, 2, …, n

where x_kj is the jth attribute of the kth member of the sample and x̄_j the average value of the jth attribute. The system of equations (1) has non-zero solutions only if the determinant |C − λI| = 0. The above determinant is a polynomial in λ set equal to zero. The values of λ which are roots of this polynomial are the eigenvalues of the matrix [C]. If all the values of λ are different, then each is associated with a certain vector x, which is called an eigenvector of the matrix [C]. Those eigenvectors are mutually perpendicular and make a new, uncorrelated system of coordinates, in relation to which one can compute the coordinates y_1, y_2, …, y_n for the members of each sample (Fig. 1b). This can be done according to the equation

y = x′ [V]

where y is a row vector (y_1, y_2, …, y_n), [V] is a matrix with the eigenvectors as columns, and x′ is a row vector (x′_1, x′_2, …, x′_n) in which x′_j is the jth attribute of a member expressed in the old coordinate system, from which the jth coordinate of the origin of the new uncorrelated system of coordinates is subtracted.
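The derivation above amounts to a principal-axes (eigenvector) transformation of each cluster. A minimal sketch with numpy, assuming the covariance is normalized by the sample size p; the sample values are invented:

```python
import numpy as np

# Sample S_k: p = 4 members (rows), n = 2 attributes (columns).
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.3], [1.1, 2.0]])

x_bar = X.mean(axis=0)                    # attribute means, the x-bar_j
C = (X - x_bar).T @ (X - x_bar) / len(X)  # covariance matrix [C]

# Eigenvalues lam (the lambdas, roots of |C - lambda*I| = 0) and
# eigenvectors (columns of [V]) of the symmetric matrix [C].
lam, V = np.linalg.eigh(C)

# New, uncorrelated coordinates y = x'[V], where x' is the
# mean-centred attribute vector of each member (Fig. 1b).
Y = (X - x_bar) @ V

# In the new frame the coordinates are uncorrelated: the covariance
# of Y is diagonal, with the eigenvalues on the diagonal.
C_Y = Y.T @ Y / len(Y)
```

Because [V] is orthogonal, the transformed covariance V.T C V is exactly the diagonal matrix of eigenvalues, which is what makes the coordinates y_1, …, y_n independent under the normality assumption used in the next section.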

PROBABILITY DISTRIBUTION

In the cluster representing the sample S_k (k = 1, 2, …, m) a family of minimal convex areas can be distinguished wherein definite numbers of points are contained. Assuming a priori that the distribution of those points is normal, it can be shown that the minimal areas having, for given sizes, the highest probability of containing the points are the corresponding hyperellipsoids expressed by the equations

y_1²/λ_1 + y_2²/λ_2 + … + y_n²/λ_n = R²

where R² is the parameter of a family of hyperellipsoids. For illustrative purposes an example in two-dimensional space is shown in Fig. 1(c). As the coordinates y_1, y_2, …, y_n are independent, the probability that the point X = (x_1, x_2, …, x_n) is contained inside an elementary hypercube with measurements dy_1, dy_2, …, dy_n can be expressed by the equation:

P(X/D_k) = [1/((2π)^(n/2) √(λ_1 λ_2 … λ_n))] exp(−R²/2) dy_1 dy_2 … dy_n          (2)

This is a conditional probability that a particular set of attributes x_1, x_2, …, x_n will occur in a member belonging to the kth type of pathological population (disease). Similar information is given by the medical experience and knowledge which expresses the relation between diseases and symptoms, also in the form of a conditional probability. For as a rule medical textbooks contain formulations like this: "The patient with acute appendicitis usually has pain in the right lower quadrant of the abdomen, accompanied by vomiting." What is indicated here is a conditional probability:

P(abdominal pain, vomiting / acute appendicitis).
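Numerically, what equation (2) evaluates for a given patient is R² in the cluster's own eigenvector frame. A minimal sketch of the density factor per unit volume dy_1 … dy_n; the eigenvalues and coordinates are invented example values:

```python
import numpy as np

# Eigenvalues lambda_j of one disease cluster, and the patient's
# coordinates y_j in that cluster's eigenvector frame (invented).
lam = np.array([0.5, 0.2])
y = np.array([0.3, -0.1])

# R^2 = sum of y_j^2 / lambda_j locates the patient on one of the
# equiprobability hyperellipsoids of the cluster.
R2 = np.sum(y**2 / lam)

# Density factor of equation (2), per unit volume dy_1 ... dy_n.
n = len(lam)
p_density = np.exp(-R2 / 2) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.prod(lam)))
```

A small R² means the patient lies close to the centre of the cluster and the conditional density is correspondingly high.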

The difference is that in medical practice those conditional probabilities are, so far, indicated mostly not numerically but approximately: nearly always, usually, very often, often, rarely, almost never. If besides the value of P(X/D_k) one knows the absolute probability of occurrence of the kth type of pathological population (disease), i.e. P(D_k), then it is possible to use for establishing a medical diagnosis Bayes' formula:

P(D_k/X) = P(D_k) P(X/D_k) / Σ_{i=1}^{m} P(D_i) P(X/D_i)          (3)

Usually the value of P(D_k) cannot be computed because of the difficulty in determining the population. Even if it could be computed, it would change from time to time and from place to place. Thus, in many cases one is forced to assume equal probabilities of occurrence of the diseases, though in reality P(D_k) will, in general, be different for each disease. Equation (3) allows one to eliminate the common factor (2π)^(n/2) dy_1 … dy_n from equation (2) and thus calculate P(X/D_k) according to the equation:

P(X/D_k) = [1/√(λ_1 λ_2 … λ_n)] exp(−R²/2)

The left side of equation (3) represents the conditional probability of the occurrence of the kth disease when a set of attributes X = (x_1, x_2, …, x_n) has occurred in a patient. The most probable type of disease will be that for which the probability P(D_k/X) is the greatest [1-4]. A similar, but not as precise, probability approach is the basis of classical medical diagnosis: also in this case a physician has to consider P(D_k) and P(X/D_k) and chooses the disease which, in his opinion, maximizes the value of P(D_k/X).
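The maximum-posterior rule of equation (3) can be sketched as follows, using the simplified likelihood above. The priors, eigenvalues, and patient coordinates are all invented for the example:

```python
import numpy as np

def likelihood(y, lam):
    """P(X/D_k) up to factors common to all diseases:
    exp(-R^2/2) / sqrt(lambda_1 ... lambda_n)."""
    R2 = np.sum(y**2 / lam)
    return np.exp(-R2 / 2) / np.sqrt(np.prod(lam))

# Patient's coordinates in each disease cluster's eigenvector frame,
# the eigenvalues of each cluster, and a priori probabilities P(D_k).
y_per_disease = [np.array([0.3, -0.1]), np.array([2.5, 1.8])]
lam_per_disease = [np.array([0.5, 0.2]), np.array([0.4, 0.3])]
priors = np.array([0.5, 0.5])  # equal P(D_k) when incidence is unknown

# Bayes' formula (3): posterior proportional to P(D_k) * P(X/D_k),
# normalized over all diseases.
post = priors * np.array([likelihood(y, lam)
                          for y, lam in zip(y_per_disease, lam_per_disease)])
post /= post.sum()

diagnosis = int(np.argmax(post))  # index of the most probable disease
```

The common factor (2π)^(n/2) dy_1 … dy_n cancels in the normalization, which is why the simplified likelihood suffices.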

OPTIMIZATION

Before making the final diagnosis one can apply certain optimization rules, which permit one to calculate the "loss" to be associated with each type of misdiagnosis (error), since certain errors may be much more "costly" than others. The optimum decision is that which minimizes the expected value of the loss. The expected value of the loss can be calculated on the basis of the loss matrix L and the vector of the a priori probabilities P(D_k). The loss matrix L describes the relative cost associated with every misdiagnosis and takes the following form:

                          Diseases
                    D_1    D_2   ...   D_m
  Decisions  T_1     0     L_12  ...   L_1m
             T_2    L_21    0    ...   L_2m
             ...    ...    ...   ...   ...
             T_m    L_m1   L_m2  ...    0

where the rows correspond to the different decisions and the columns to the different diseases. In the matrix, the element L_lk is the loss or cost associated with the decision T_l when in reality it is the D_k type of disease. When the decision T_l corresponds to the type of disease D_l, the element of L is zero, since a correct decision implies no loss. The zero elements lie on the main diagonal of L. The variation of the off-diagonal elements, indicating the relative cost of different sorts of errors, is essentially the error criterion adopted for the optimization. For any decision T_l the expected average loss, taken over the types of disease D_k, is expressed by the formula

L̄(T_l) = Σ_{k=1}^{m} L_lk P(D_k/X)

Of course one should choose the T_l which minimizes the value of L̄(T_l). There are various kinds of loss in medicine which can be taken into consideration by a physician. One criterion may be death, permanent disturbances, or irretrievable social and economic losses of the individual himself. The other criterion relates to infectious disease or disturbance of the lives of other persons, as with mental disease, that is, a loss in public health. The elements of the loss matrix will nearly always be a matter of personal interpretation and will reflect the attitudes of the doctor utilizing the decision theory. This fact, however, does not necessarily detract from the importance of the solution.

CONCLUSIONS

The use of digital computers to aid the physician in making a clinical diagnosis has often been suggested in the last decade [1-13]. There is no doubt that, with the rapid increase in the volume of data that the physician of the present, and indeed of the future, needs to evaluate, the conventional methods of arriving at a diagnosis are becoming inefficient. It may therefore be that a partnership will develop between the physician and the computer, in which the computer takes over those tasks that it can do particularly well (due to perfect memory, for example) while the physician uses those skills and aptitudes for which no machine can at present substitute.
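The decision rule above amounts to a minimization over the rows of the loss matrix. A minimal sketch, assuming the expected loss is taken over the posterior probabilities P(D_k/X); the loss values and posteriors are invented for the example:

```python
import numpy as np

# Loss matrix L: rows = decisions T_l, columns = diseases D_k.
# Zeros on the diagonal (correct decision costs nothing); here
# missing disease D_3 is judged far more costly than other errors.
L = np.array([[0.0, 1.0, 10.0],
              [1.0, 0.0, 10.0],
              [2.0, 2.0,  0.0]])

# Posterior probabilities P(D_k/X) for the patient (invented).
post = np.array([0.5, 0.3, 0.2])

# Expected loss of each decision: row l gives sum over k of L_lk * P(D_k/X).
expected_loss = L @ post

# Optimum decision: the T_l minimizing the expected loss.
decision = int(np.argmin(expected_loss))
```

Note that although D_1 is the most probable disease, the loss-minimizing decision here is T_3, because errors against D_3 are weighted heavily; this is exactly how the optimization step can override the maximum-posterior diagnosis.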


One of the difficulties the physician very often experiences is the lack of knowledge about the probabilities that ought to be attached to the data. If he had access to a computer stored with information obtained from surveys of large populations, these probabilities could be supplied to him. The computer, being free from fatigue and emotions, can also help to eliminate some judgemental errors that often result from the physician's tiredness, momentary inattention, bias, etc. By combining the special skills of the physician and the technique of the computer a fruitful partnership will one day be achieved.

REFERENCES

1. R. S. LEDLEY and L. B. LUSTED, Reasoning foundations of medical diagnosis, Science 130, 9 (1959).
2. R. S. LEDLEY, Use of Computers in Biology and Medicine, pp. 418-446. McGraw-Hill, New York (1965).
3. H. R. WARNER, A. F. TORONTO, L. G. VEASY and R. STEPHENSON, A mathematical approach to medical diagnosis: application to congenital heart disease, J. Am. Med. Assoc. 177, 177 (1961).
4. H. R. WARNER, A. F. TORONTO and L. G. VEASY, Experience with Bayes' theorem for computer diagnosis of congenital heart disease, Ann. N.Y. Acad. Sci. 115, 558 (1964).
5. J. E. OVERALL, Configural analysis of psychiatric diagnostic stereotypes, Behavioral Sci. 8, 211 (1963).
6. J. E. OVERALL and C. M. WILLIAMS, Conditional probability program for diagnosis of thyroid function, J. Am. Med. Assoc. 183, 307 (1963).
7. J. A. RINALDO, P. SCHEINOK and C. E. RUPE, Symptom diagnosis: a mathematical analysis of epigastric pain, Ann. Intern. Med. 59, 145 (1963).
8. G. S. LODWICK, L. M. COSMO, W. E. SMITH, R. F. KELLER and E. D. ROBERTSON, Computer diagnosis of primary bone tumors, Radiol. 80, 273 (1963).
9. E. KIMURA, Y. MIBUKURA and S. MIURA, Statistical diagnosis of electrocardiogram by theorem of Bayes, Japan. Heart J. 4, 469 (1963).
10. J. WARTAK, An information theory approach to medical diagnosis, Cybernetics 3, 163 (1965).
11. J. WARTAK, Computer-aided differentiation of glycemic curves: an attempt towards increasing the accuracy of diagnosis in diabetology, Data Acquisition and Processing in Biology and Medicine, Proc. 1966 Rochester Conference, Vol. 5, p. 337.
12. J. WARTAK, Computer-aided recognition of electrocardiograms, Acta Cardiol. 22, 350 (1967).
13. J. WARTAK, Computers in Electrocardiography, C. Thomas, Springfield, Ill. (1971).