Archs oral Biol. Vol. 15, pp. 869-877, 1970. Pergamon Press. Printed in Great Britain.
A MARKOV MODEL OF DIAGNOSTIC ERRORS IN ASSESSMENT OF CARIES EXPERIENCE

K. H. Lu
Department of Biostatistics, University of Oregon Dental School, Portland, Oregon 97201, U.S.A.

Summary: Two models of diagnostic errors in caries experience assessment are examined in terms of their theoretical relationship and their maximum difference in caries increment. Assume that $\mu$ is the proportion of surfaces over which all diagnoses are certain to be correct; then (1) the independent model assumes that all diagnoses over the proportion $1-\mu$ are pure guesswork regardless of what the true condition of the surfaces may be, and (2) the dependent model assumes that, over $1-\mu$, the diagnoses are more likely to be carious than non-carious if the surface condition is truly carious, and vice versa. The study shows that the dependent model approaches the independent as a limit, i.e., the independent model is the eigen solution to the dependent model. The maximum difference in the estimate of caries increment is negligible if $\mu$ is large.

ALTHOUGH various investigators have been aware of the problem of diagnostic errors and have devised methods of handling data of this type during the past decade or so, theoretical inquiries into the nature of such errors by mathematical models are of much more recent origin. Lu (1968) suggested a model based on the postulates that: (1) a population of tooth surfaces consists of:
$\pi_1$ = proportion of surfaces whose true conditions are carious both at $t_1$ and $t_2$.
$\pi_2$ = proportion of surfaces whose true conditions are non-carious both at $t_1$ and $t_2$.
$\pi_3$ = proportion of surfaces whose true conditions are non-carious at $t_1$ and carious at $t_2$.
Such that $\sum \pi_i = 1$;
(2) there exists a portion of surfaces $\mu$ over which diagnoses are certain to be correct, and over the portion $1-\mu$ the probabilities of correct and incorrect diagnoses are equally likely, regardless of the true condition of the surface.

CARLOS and SENNING (1968) proposed a model defined with the following parameters:
(1) $v$ = "The proportion of 'teeth' (or surfaces) that are truly caries-free at examination $t$" $= \pi_2 + \pi_3$ (Lu's notation),
(2) $h$ = The average probability of attack by caries during the interval $(t, t+1)$ over all teeth (or surfaces) which were caries-free at $t$ $= \pi_3/(\pi_2 + \pi_3)$ (Lu's notation),
(3) $E$ = The probability that the true state of a tooth (surface) will be misclassified at $t$ or $t+1$ $= \tfrac{1}{2}(1-\mu)$ (Lu's notation).

Since each model estimates three parameters, and the parameters of one model may be expressed in terms of those of the other, the two models are equivalent. While both models succinctly pointed out the inadequacy of current methods of handling data containing diagnostic errors, they both rest on the assumption that diagnostic errors occur independently of the true condition of the surfaces, as implied by the last postulates of both models. It may be argued that diagnostic outcomes should be dependent on the true condition of the surface, i.e., a truly healthy surface is more likely to be diagnosed as a healthy one than as a carious one, and vice versa. We shall call this the dependent model and Lu's 1968 model the independent model. Obviously the assessments of caries experience based on these two models would differ somewhat from each other.

It is the purpose of this paper to present a theoretical investigation of the two models with regard to (1) their mathematical relationship from the Markov chain point of view and (2) the maximum difference in their estimates of caries increment.
THEORETICAL CONSIDERATION
Suppose we are concerned with the presence or absence of a carious lesion on a surface at two points in time, $t_1$ and $t_2$, over a population of surfaces. Let the population be represented by a vector $\Pi = (\pi_1\ \pi_2\ \pi_3\ \pi_4)$ where
$\pi_1$ = the proportion of surfaces each of which has a carious lesion both at $t_1$ and $t_2$; this condition is hereafter referred to as state (11).
$\pi_2$ = the proportion of surfaces none of which has a carious lesion at $t_1$ and $t_2$; this condition is hereafter referred to as state (00).
$\pi_3$ = the proportion of surfaces each of which has no carious lesion at $t_1$ but has it at $t_2$; this condition is hereafter referred to as state (01).
$\pi_4$ = the proportion of surfaces each of which has a carious lesion at $t_1$ but does not have it at $t_2$; this condition is hereafter referred to as state (10).

Let us postulate that there exists a proportion $\mu$ in each $\pi_i$ such that the diagnosis for the presence or absence of carious lesions is certain to be correct, i.e., the transition probability matrix (rows: intrinsic state; columns: diagnostic state; both in the order 11, 00, 01, 10) is:

$$\mu I = \begin{pmatrix} \mu & 0 & 0 & 0 \\ 0 & \mu & 0 & 0 \\ 0 & 0 & \mu & 0 \\ 0 & 0 & 0 & \mu \end{pmatrix}$$

Consequently, the diagnoses over the proportion $(1-\mu)$ for each $\pi_i$ are subject to error as specified by the transition matrix:

$$(1-\mu)P = (1-\mu)\begin{pmatrix} a & b & c & c \\ b & a & c & c \\ c & c & a & b \\ c & c & b & a \end{pmatrix}$$

where $a + b + 2c = 1$ and $0 \le a, b, c \le 1$.

We note that P is a symmetric transition matrix. The way P is defined above implies the following restrictions:
(1) $P\{S(ii) \to S(ii)\} = P\{S(ij) \to S(ij)\} = P\{S(ji) \to S(ji)\} = a$
(2) $P\{S(ii) \to S(jj)\} = P\{S(ij) \to S(ji)\} = b$
(3) $P\{S(ij) \to S(ii)\} = P\{S(ii) \to S(ij)\} = P\{S(ii) \to S(ji)\} = P\{S(ij) \to S(jj)\} = c$
$i, j = 0, 1$; $i \ne j$,
where
(1) "a" denotes the probability that the diagnoses at $t_1$ and $t_2$ agree with the true state,
(2) "b" denotes the probability that the diagnoses differ from the true state both at $t_1$ and $t_2$,
(3) "c" denotes the probability that the diagnoses agree with the true state at $t_1$ and differ at $t_2$, or vice versa.

Since diagnostic errors can only arise from the $(1-\mu)$ part of the population, we shall focus our attention on the matrix P. While it is convenient to postulate formally the conceptual existence of the elements of matrix P, namely a, b, and c, it is quite difficult to obtain estimates of them experimentally. Owing to the nature of the diagnostic process, a, b, and c vary from population to population, from criterion to criterion, and from examiner to examiner. There is no set of diagnostic outcomes that can be considered as the absolute standard. Any descriptive states assigned to any group of surfaces are subject to error. In short, the true states of any group are, in an existential sense, not knowable. The problem is then: in the face of our inability to ascertain the elements of P, what in the long run are the expected values of a, b, and c? In the narrative to follow, we shall show that all possible values of a, b, and c such that $a + b + 2c = 1$ lead, in the long run, to the expectation that $a = b = c = \tfrac{1}{4}$.

I. The mathematical relationship between the two models: It can be shown that some power of matrix P has no zero entries (as is the case, for example, whenever a, b, and c are all positive), which implies that P is the transition matrix of a regular Markov chain. It is well known that
$$\lim_{n\to\infty} P^{n} = P^{*} = \xi\alpha$$

where $\alpha = (\alpha_1\ \alpha_2\ \alpha_3\ \alpha_4)$ is the unique probability vector satisfying $\alpha P = \alpha$, and $\xi$ is a column vector with four elements identically equal to unity. It is interesting to note that

$$\left(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\right) P = \left(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\right).$$

Since each column of P sums to 1, the left-hand side of the equation becomes $(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4})$; thus

$$(\alpha_1\ \alpha_2\ \alpha_3\ \alpha_4) = \left(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\right),$$

and we conclude that matrix P approaches its limiting form with rows identically equal to $(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4})$. Furthermore, had an eigen analysis been performed for P, it would have been found that $(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4})$ is an eigenvector of P with eigenvalue 1 regardless of what values the unknowns a, b, and c may assume. There are other eigenvectors associated with P; however, the vector $(\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4})$ is the only one with nonnegative elements satisfying the Markov condition $\sum \alpha_i = 1$.

The limiting form of P is of special interest for our present problem because it implies that, regardless of the true state in which our observation may start, in the long run the expected probabilities of diagnostic outcome are identically the same, i.e., $\tfrac{1}{4}$. In other words, it implies

$$\Pr\{\text{the diagnosis agrees with the true condition at } t_i\} = \Pr\{\text{the diagnosis differs from the true condition at } t_i\} = \tfrac{1}{2}.$$

That is to say, the diagnosis over the $(1-\mu)$ portion of the population is pure guesswork. Indeed, this is the second postulate of Lu's model (1968). The importance of this finding is that it provides the theoretical link between the independent and the dependent models. The latter approaches the former as a limit.

The practical meaning of the limiting vector of P is that the examiner's diagnostic behavior, in the long run, may be safely approximated by the limiting probabilities, i.e., $a = b = c = \tfrac{1}{4}$, especially when the examiner is engaged in the experiment for a fairly long time.

The above remarks are also useful in another sense. Since there is no satisfactory way of estimating an individual examiner's values of a, b, and c, one is forced to seek the best unbiased estimate of them. It is clear that the expected values of a, b, and c are identically $\tfrac{1}{4}$; hence it is indeed the only unbiased estimate available under the uncertainty principle.

II. The maximum differences in caries increment estimates: For the entire population, the intrinsic-to-diagnosis transition matrix A is as follows:

$$A = \mu I + (1-\mu)P = \begin{pmatrix} \mu + (1-\mu)a & (1-\mu)b & (1-\mu)c & (1-\mu)c \\ (1-\mu)b & \mu + (1-\mu)a & (1-\mu)c & (1-\mu)c \\ (1-\mu)c & (1-\mu)c & \mu + (1-\mu)a & (1-\mu)b \\ (1-\mu)c & (1-\mu)c & (1-\mu)b & \mu + (1-\mu)a \end{pmatrix}$$
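The limiting behaviour of P described in Section I, and the construction of A, can be checked numerically. The following is only an illustrative sketch: the values a = 0.6, b = 0.1, c = 0.15 and a certainty proportion of 0.9 are arbitrary assumptions, not estimates from any data, and the helper names (`build_p`, `build_a`, `mat_mul`, `mat_pow`) are hypothetical.

```python
def build_p(a, b, c):
    """Error matrix P over the states (11, 00, 01, 10), in that order."""
    if abs(a + b + 2 * c - 1.0) > 1e-12:
        raise ValueError("a + b + 2c must equal 1")
    return [
        [a, b, c, c],
        [b, a, c, c],
        [c, c, a, b],
        [c, c, b, a],
    ]


def mat_mul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """m raised to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


def build_a(mu, a, b, c):
    """Intrinsic-to-diagnosis matrix A = mu*I + (1 - mu)*P."""
    p = build_p(a, b, c)
    return [[(mu if i == j else 0.0) + (1.0 - mu) * p[i][j] for j in range(4)]
            for i in range(4)]


# Arbitrary illustrative values (assumptions, not estimates from any data).
P = build_p(0.6, 0.1, 0.15)
P_LIMIT = mat_pow(P, 60)          # every entry should be close to 1/4
A = build_a(0.9, 0.6, 0.1, 0.15)  # each row of A should sum to 1

for row in P_LIMIT:
    print([round(x, 6) for x in row])
```

With these values the eigenvalues of P other than 1 are a + b - 2c = 0.4 and a - b = 0.5 (twice), so the powers of P settle quickly; values of a, b, c closer to the degenerate cases would need a higher power.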
Let $K = (k_1\ k_2\ k_3\ k_4)$ be the diagnosis vector where
$k_1$ = the proportion of population with diagnosis state (11)
$k_2$ = the proportion of population with diagnosis state (00)
$k_3$ = the proportion of population with diagnosis state (01)
$k_4$ = the proportion of population with diagnosis state (10)
then

$$\Pi A = K.$$
When A is non-singular, the system of simultaneous equations associated with matrix A is formally solvable. However, there are other restrictions which must be satisfied. We see that there are only four known quantities, namely, the $k_i$'s. Since $\sum k_i = 1$, there are three independent observed quantities; a unique solution is possible only if the system of equations consists of exactly three unknowns. For instance, the solution is possible for $\Pi$ if the transition matrix A is completely specified a priori. On the other hand, if one of the $\pi_i$'s is known, then it is possible to allow one of the parameters in A to be estimated from the matrix equation.

Now, if an element of $\Pi$ is known, say $\pi_4 = 0$, then the equation $\Pi A = K$ has a unique solution in the form $\Pi = A^{-1}K$ in terms of a, b and c (with $\Pi$ and $K$ read as column vectors, which is permissible because A is symmetric). The case where $\pi_4 = 0$ implies that once a carious lesion is present on a tooth surface, the surface cannot restore itself to the healthy state. Thus

$$\Pi = (\pi_1\ \pi_2\ \pi_3\ 0).$$

While $\pi_4$ is known a priori, due to diagnostic error we may still observe $k_4 > 0$, which consists solely of diagnostic errors. The equation $\Pi = A^{-1}K$ yields four equations (only three are independent) in three unknowns as follows:

$$k_1 = \pi_1\mu + (1-\mu)(a\pi_1 + b\pi_2 + c\pi_3)$$
$$k_2 = \pi_2\mu + (1-\mu)(b\pi_1 + a\pi_2 + c\pi_3)$$
$$k_3 = \pi_3\mu + (1-\mu)(c\pi_1 + c\pi_2 + a\pi_3)$$
$$k_4 = (1-\mu)(c\pi_1 + c\pi_2 + b\pi_3).$$
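The elimination behind the solution of this system can be sketched as follows. This is a worked derivation from the four equations, using only the constraints $\pi_4 = 0$ and $\sum \pi_i = 1$; the shorthand $D$ is introduced here for convenience and is not part of the paper's notation.

```latex
\begin{align*}
D &\equiv \mu + (1-\mu)(a-b) \\
k_1 - k_2 &= \mu(\pi_1 - \pi_2) + (1-\mu)(a-b)(\pi_1 - \pi_2) = (\pi_1 - \pi_2)\,D \\
k_3 - k_4 &= \mu\pi_3 + (1-\mu)(a-b)\,\pi_3 = \pi_3\,D \\
\pi_1 + \pi_2 &= 1 - \pi_3 = 1 - \frac{k_3 - k_4}{D} \\
\pi_1 &= \tfrac{1}{2}\bigl[(\pi_1 + \pi_2) + (\pi_1 - \pi_2)\bigr], \qquad
\pi_2 = \tfrac{1}{2}\bigl[(\pi_1 + \pi_2) - (\pi_1 - \pi_2)\bigr]
\end{align*}
```

The terms in c cancel in both differences, which is why c does not appear in the resulting expressions for the $\pi_i$.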
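Solving the above system of equations for the $\pi_i$ and $\mu$ in terms of a, b, and c, we have

$$\pi_1 = \frac{\mu + (1-\mu)(a-b) + k_1 - k_2 - k_3 + k_4}{2[\mu + (1-\mu)(a-b)]}$$

$$\pi_2 = \frac{\mu + (1-\mu)(a-b) - k_1 + k_2 - k_3 + k_4}{2[\mu + (1-\mu)(a-b)]}$$

$$\pi_3 = \frac{k_3 - k_4}{\mu + (1-\mu)(a-b)}$$

while $\mu$ is fixed by the remaining equation for $k_4$.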
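As a consistency check on these expressions, one can generate a diagnosis vector from an assumed true state and recover the true proportions from it. The sketch below is illustrative only: the numbers are arbitrary, the function names `diagnose` and `recover` are hypothetical, and the certainty proportion is treated as known rather than solved for. The estimate of the increment used in `recover` is the difference of the third and fourth equations divided by the common denominator.

```python
def diagnose(pi, mu, a, b, c):
    """Diagnosis vector K for a true state (pi1, pi2, pi3, 0)."""
    pi1, pi2, pi3 = pi
    d = 1.0 - mu
    k1 = mu * pi1 + d * (a * pi1 + b * pi2 + c * pi3)
    k2 = mu * pi2 + d * (b * pi1 + a * pi2 + c * pi3)
    k3 = mu * pi3 + d * (c * pi1 + c * pi2 + a * pi3)
    k4 = d * (c * pi1 + c * pi2 + b * pi3)
    return (k1, k2, k3, k4)


def recover(k, mu, a, b):
    """Closed-form estimates of (pi1, pi2, pi3), with mu assumed known."""
    k1, k2, k3, k4 = k
    denom = mu + (1.0 - mu) * (a - b)
    pi1 = (denom + k1 - k2 - k3 + k4) / (2.0 * denom)
    pi2 = (denom - k1 + k2 - k3 + k4) / (2.0 * denom)
    pi3 = (k3 - k4) / denom
    return (pi1, pi2, pi3)


# Arbitrary illustrative values (assumptions, not data from the paper).
PI_TRUE = (0.30, 0.60, 0.10)

# Dependent model with a > b.
K_DEP = diagnose(PI_TRUE, 0.9, 0.6, 0.1, 0.15)
PI_DEP = recover(K_DEP, 0.9, 0.6, 0.1)

# Independent model, a = b = c = 1/4: mu = 1 - 4*k4 and pi_i = (k_i - k4)/mu.
K_IND = diagnose(PI_TRUE, 0.9, 0.25, 0.25, 0.25)
MU_IND = 1.0 - 4.0 * K_IND[3]
PI_IND = tuple((k - K_IND[3]) / MU_IND for k in K_IND[:3])

print(PI_DEP)
print(MU_IND, PI_IND)
```

Both routes should return the assumed true proportions up to rounding error, which is what the reduction to the independent model requires.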
It is noteworthy that when $a = b = c$, the above equations reduce to Lu's previous finding:

$$\pi_i^{*} = \frac{k_i - k_4}{\mu} \quad\text{and}\quad \mu = 1 - 4k_4,$$

in the independent model (1968), as they should. As mentioned earlier, because the differential probabilities a, b, and c are unknown and difficult to assess experimentally, the independent model where $a = b = c = \tfrac{1}{4}$ has been used in practice. It is therefore of interest to examine the maximum deviation resulting from the independent model when it is used in lieu of the dependent model. We shall concentrate on the estimate of the caries increment, namely, the $\pi_3$'s from the two models. Let the ratio of estimates R be defined as follows:
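$$R = \frac{\pi_3}{\pi_3^{*}} = \frac{(k_3 - k_4)/[\mu + (1-\mu)(a-b)]}{(k_3 - k_4)/\mu} = \frac{\mu}{\mu + (1-\mu)(a-b)}.$$

We see that R is a function of $\mu$ and the diagnostic probabilities a and b, such that $R < 1$ if $a > b$, $\mu < 1$; also $R > 1$ if $a < b$, $\mu < 1$.

As $a \to 1$, $R \to \mu$, so that the two estimates differ by $1 - \mu$ at most, which is negligible if $\mu$ is close to unity. As $b \to 1$,

$$\lim_{b\to 1} R = \frac{\mu}{2\mu - 1}.$$

In this case, the examiner always makes the incorrect diagnosis, and the independent model underestimates the increment as compared to the dependent model by

$$1 - \frac{\mu}{2\mu - 1} = \frac{\mu - 1}{2\mu - 1}$$

at its worst.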
Again, if $\mu$ is known to be close to unity, the discrepancy becomes negligible. From the above discussion, it is clear that the most efficient means to ensure the successful use of the independent model is to have $\mu$ as close to unity as possible. A high value of $\mu$ can be achieved in two ways:
(1) Clear and accurate diagnostic criteria set forth by the examiner, such that only a small portion of the surfaces falls into the uncertain portion $(1-\mu)$.
(2) Faithful adherence to the criteria by the examiner during the course of the experiment.
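The size of the two extreme deviations can be tabulated for a few values of the certainty proportion. This is a minimal sketch under the ratio of estimates defined above, taken as the dependent-model increment over the independent-model increment; the function name `ratio` and the chosen grid of values are assumptions for illustration. The case b = 1 requires the certainty proportion to exceed one half, otherwise the denominator is not positive.

```python
def ratio(mu, a, b):
    """R = mu / (mu + (1 - mu)(a - b)), the ratio of the two increment estimates."""
    denom = mu + (1.0 - mu) * (a - b)
    if denom <= 0.0:
        raise ValueError("requires mu + (1 - mu)(a - b) > 0")
    return mu / denom


# Extreme cases for a few illustrative certainty proportions.
for mu in (0.80, 0.90, 0.95, 0.99):
    r_correct = ratio(mu, 1.0, 0.0)  # a = 1: examiner always right, R = mu
    r_wrong = ratio(mu, 0.0, 1.0)    # b = 1: examiner always wrong, R = mu / (2 mu - 1)
    print(f"mu={mu:.2f}  R(a=1)={r_correct:.4f}  R(b=1)={r_wrong:.4f}")
```

Both ratios move toward 1 as the certainty proportion approaches unity, which is the sense in which the discrepancy becomes negligible.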
CONCLUSION
I. Consider a population of surfaces at two points in time, $t_1$ and $t_2$, such that the population may be grouped into four subgroups according to the presence or absence of a carious lesion on each of its members. The four subgroups are: (1) $\pi_1$, the proportion that has a carious lesion both at $t_1$ and $t_2$, S(11); (2) $\pi_2$, the proportion that does not have a carious lesion at either $t_1$ or $t_2$, S(00); (3) $\pi_3$, the proportion that does not have a carious lesion at $t_1$ but has it at $t_2$, S(01); and (4) $\pi_4$, the proportion that has a carious lesion at $t_1$ but does not have it at $t_2$, S(10).

II. It is postulated that:
1. There exists in each $\pi_i$ a proportion $\mu$ such that the diagnosis is certain to be correct, i.e., the transition matrix is $\mu I$.
2. Over the proportion $1 - \mu$ the transition matrix is $(1-\mu)P$ where

$$P = \begin{pmatrix} a & b & c & c \\ b & a & c & c \\ c & c & a & b \\ c & c & b & a \end{pmatrix}, \qquad a + b + 2c = 1.$$

This is called the dependent model, as compared to the independent model where $a = b = c = \tfrac{1}{4}$.

III. The transition matrix for the entire population is

$$A = \mu I + (1-\mu)P.$$

Let $K = (k_1\ k_2\ k_3\ k_4)$ be the vector of proportions of diagnostic outcomes (11), (00), (01), and (10) respectively; then

$$\Pi = A^{-1}K$$
has a unique solution for $\pi_1$, $\pi_2$, $\pi_3$ and $\mu$ if $\pi_4$ is known to be zero, i.e., the true reversal state (10) cannot occur. The solutions are:

$$\pi_1 = \frac{\mu + (1-\mu)(a-b) + k_1 - k_2 - k_3 + k_4}{2[\mu + (1-\mu)(a-b)]}$$

$$\pi_2 = \frac{\mu + (1-\mu)(a-b) - k_1 + k_2 - k_3 + k_4}{2[\mu + (1-\mu)(a-b)]}$$

$$\pi_3 = \frac{k_3 - k_4}{\mu + (1-\mu)(a-b)}$$

with $\mu$ determined by the remaining equation, $k_4 = (1-\mu)[c + (b-c)\pi_3]$.

When $a = b = c = \tfrac{1}{4}$, the $\pi_i$ and $\mu$ reduce to $\pi_i^{*} = (k_i - k_4)/\mu$ $(i = 1, 2, 3)$ and $\mu = 1 - 4k_4$ as in the independent model by Lu. The maximum difference between $\pi_3^{*}$ and $\pi_3$ is $1 - \mu$ when $a \to 1$ and $(\mu - 1)/(2\mu - 1)$ as $b \to 1$.

IV. It can be shown that

$$\lim_{n\to\infty} P^{n} = \xi\alpha$$
where $\xi$ is a unit column vector and $\alpha = (\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4})$, an eigenvector of A, for all values of $\mu$, a, b, and c. The important conclusion arrived at here is that the limiting form of A corresponds to the transition matrix where diagnostic outcomes are independent of the true condition of the surfaces in the proportion $(1 - \mu)$. Since in actual populations the values of $\mu$ are usually high, the limiting model is an excellent approximation of the more "structured" dependent model.
REFERENCES

CARLOS, J. P. and SENNING, R. S. 1968. Error and bias in dental clinical trials. J. dent. Res. 47, 1, 142-148.
KEMENY, J. G. and SNELL, J. L. 1960. Finite Markov Chains. Van Nostrand, New York.
LANCZOS, C. 1956. Applied Analysis. Prentice Hall, Inc., Englewood Cliffs, New Jersey.
LU, K. H. 1968. A critical evaluation of diagnostic errors, true increment, and examiner's accuracy in caries experience assessment by a probabilistic model. Archs oral Biol. 13, 1133-1147.