ESTIMATION OF MISSING MONTHLY TEST-DAY RECORDS

L. D. VAN VLECK AND C. R. HENDERSON
Department of Animal Husbandry, Cornell University, Ithaca, New York

SUMMARY

Three methods of estimating a missing monthly test-day record from preceding or succeeding known test-day records are compared: within-herd (intra-herd) regression, regression ignoring herd effects, and a ratio factor method. The intra-herd regression method is considered to be better than regression ignoring herd effects or the ratio factor method. The regression procedure ignoring herd effects, however, is generally less than 10% less efficient than the intra-herd method and is recommended for use, since population averages are easier to obtain and to use than are the herd averages required for the intra-herd method. Factors are presented for estimating a missing monthly test-day record by the three methods when (1) the preceding monthly record is known, (2) the succeeding monthly record is known, and (3) both preceding and succeeding records are known.

The favorable acceptance of DHIA central processing of dairy records has brought with it many problems. One of these is what to do about missing test-day records, which occur frequently, according to O'Bleness (3). Gross comparison factors obtained from ratios of average monthly test-day records for predicting missing monthly test-day records were presented by Kendrick (2) in 1940. Since that time little attention has apparently been given to this problem. The purpose of this study was to develop factors for predicting a missing monthly test-day milk record by regression ignoring herd effects (total regression), intra-herd regression, and the ratio of adjacent monthly test-day averages, when (1) the previous monthly test-day record is known, (2) the succeeding monthly record is known, and (3) both the preceding and succeeding monthly records are known.
DATA

The DHIA milk records used in this study were from 9,036 Holstein cows in 374 New York herds. Ten monthly test-day records were available for each cow's lactation, beginning with the first month of lactation and ending with the tenth month. The sum of the first ten test-day records was defined to be a complete lactation yield. Actual 305-day yield can be estimated by multiplying this sum by 30.5, the number of days in the average DHIA month. The monthly records were adjusted to a common age at calving (72-77 mo.) and season of calving (December-March) by the use of ratio factors described by Van Vleck and Henderson (5).

METHODS OF ESTIMATION
Suppose that the test-day record for the $i$th month of lactation was not reported and that the test-day record for the adjacent $j$th month is known. Then, a procedure is required for estimating the $i$th record from the $j$th. Three methods will be proposed and discussed. It will be assumed that the records have been adjusted for all external effects, such as age at calving and season of calving, except herd effects.

Received for publication February 27, 1961.

Method 1. The record of the $i$th month can be estimated from the intra-herd regression of the $i$th monthly record on the $j$th monthly record. The prediction equation is

$$y_{i1} = \mu_i' + B_j(x_j - \mu_j'); \quad j = i-1 \text{ or } i+1;$$

where $y_{i1}$ is the predicted monthly record, $\mu_i'$ is the herd average for the $i$th month of lactation, $x_j$ is the known monthly record, $B_j$ is the intra-herd regression factor associated with $x_j$, and $\mu_j'$ is the herd average for the $j$th month of lactation.

Method 2. The record of the $i$th month can be estimated from the total regression (ignoring herd effects) of the $i$th monthly record on the $j$th monthly record. The prediction equation is

$$y_{i2} = \mu_i + b_j(x_j - \mu_j)$$

or,

$$y_{i2} = a_i + b_j x_j; \quad a_i = \mu_i - b_j\mu_j; \quad j = i-1 \text{ or } i+1;$$

where $y_{i2}$ is the predicted monthly record, $\mu_i$ is the population average for the $i$th month, $x_j$ is the known monthly record, $b_j$ is the total regression factor associated with $x_j$, and $\mu_j$ is the population average for the $j$th month of lactation.

Method 3. The ratio factor method, which ignores herd effects, estimates the $i$th monthly test-day record from the $j$th monthly record as

$$y_{i3} = \frac{\mu_i}{\mu_j}x_j = \mu_i + \frac{\mu_i}{\mu_j}(x_j - \mu_j); \quad j = i-1 \text{ or } i+1;$$

where $y_{i3}$ is the predicted monthly record, $\mu_i$ is the population average for the $i$th month, $\mu_j$ is the population average for the $j$th month, and $x_j$ is the known monthly record.
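As a minimal sketch of the three single-record prediction equations, the following Python functions evaluate each estimator. The function and argument names are illustrative assumptions and are not part of the paper. The numerical inputs are those of the worked example given later in the paper (fourth-month record estimated from a fifth-month record of 31.0), with the factors taken from its Tables 1 and 2.

```python
def intra_herd_estimate(herd_mean_i, herd_mean_j, B_j, x_j):
    """Method 1: y_i1 = mu_i' + B_j * (x_j - mu_j'), using herd averages."""
    return herd_mean_i + B_j * (x_j - herd_mean_j)


def total_regression_estimate(a_i, b_j, x_j):
    """Method 2 in intercept form: y_i2 = a_i + b_j * x_j."""
    return a_i + b_j * x_j


def ratio_estimate(ratio_factor, x_j):
    """Method 3: y_i3 = (mu_i / mu_j) * x_j, with the ratio given as a factor."""
    return ratio_factor * x_j


# Inputs from the paper's worked example: month 4 is missing, month 5 is known.
x5 = 31.0                      # known fifth-month test-day record
herd_mean_4, herd_mean_5 = 35.0, 30.0

y41 = intra_herd_estimate(herd_mean_4, herd_mean_5, B_j=0.85, x_j=x5)
y42 = total_regression_estimate(a_i=8.7, b_j=0.92, x_j=x5)
y43 = ratio_estimate(ratio_factor=1.11, x_j=x5)

print(round(y41, 2), round(y42, 2), round(y43, 2))
```

Method 2 is written in its intercept form here because that is the form in which the total regression factors are tabulated; only Method 1 needs herd averages as inputs.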
Bias. Under the condition that the records are corrected for all effects (age, season, year, etc.), except herd effects, Method 1 will give the best linear unbiased estimates of missing monthly test-day records. There are, however, considerations which make this method undesirable. Therefore, the biases of the other two procedures are of interest, so that these biases can be balanced against the objectionable features of Method 1, to determine which procedure to use in the normal operation of a records processing center. Assuming Method 1 is best, the bias of another procedure is the difference between Method 1 and the alternative method. Then, the bias of Method 2 relative to Method 1 is

$$d_{12} = y_{i1} - y_{i2} = (\mu_i' - \mu_i) + (b_j\mu_j - B_j\mu_j') + (B_j - b_j)x_j.$$
The bias of Method 3 relative to Method 1 similarly is

$$d_{13} = \mu_i' - B_j\mu_j' + \left(B_j - \frac{\mu_i}{\mu_j}\right)x_j.$$

The bias of Method 3 relative to Method 2, as shown by Harvey (1), is

$$d_{23} = \left(b_j - \frac{\mu_i}{\mu_j}\right)(x_j - \mu_j).$$
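The three bias expressions can be checked numerically against the direct differences of the estimates. The sketch below is only an illustration: it assumes the centered form of Method 2 and the unrounded ratio $\mu_i/\mu_j$ for Method 3, so its values differ slightly from those obtained with rounded tabulated factors. The function names are hypothetical, and the inputs are the averages and regression factors of the paper's worked example.

```python
def estimates(mu_i, mu_j, herd_i, herd_j, B_j, b_j, x_j):
    """Return (y_i1, y_i2, y_i3) for one known adjacent record x_j."""
    y1 = herd_i + B_j * (x_j - herd_j)      # Method 1, intra-herd regression
    y2 = mu_i + b_j * (x_j - mu_j)          # Method 2, total regression
    y3 = (mu_i / mu_j) * x_j                # Method 3, ratio factor
    return y1, y2, y3


def biases(mu_i, mu_j, herd_i, herd_j, B_j, b_j, x_j):
    """Return (d12, d13, d23) from the closed-form bias expressions."""
    r = mu_i / mu_j
    d12 = (herd_i - mu_i) + (b_j * mu_j - B_j * herd_j) + (B_j - b_j) * x_j
    d13 = herd_i - B_j * herd_j + (B_j - r) * x_j
    d23 = (b_j - r) * (x_j - mu_j)
    return d12, d13, d23


# Month 4 estimated from month 5; averages and factors from the worked example.
example = dict(mu_i=51.7, mu_j=46.7, herd_i=35.0, herd_j=30.0,
               B_j=0.85, b_j=0.92, x_j=31.0)

y1, y2, y3 = estimates(**example)
d12, d13, d23 = biases(**example)

# Each closed form should agree with the corresponding difference of estimates.
print(d12, y1 - y2)
print(d13, y1 - y3)
print(d23, y2 - y3)
```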
Inspection of $d_{12}$ reveals that if the total and intra-herd regressions are equal, $b_j = B_j = c$, then the bias is strictly accounted for by the differences of the herd averages from the population averages. In fact, in that case

$$d_{12} = (\mu_i' - \mu_i) + c(\mu_j - \mu_j').$$

Actually, if $c$ is near unity then the bracketed components of the bias tend to balance each other. In practice, it is doubtful whether the bias is corrected by this compensating force. The difference between Methods 1 and 3, $d_{13}$, also exhibits this compensating influence. If $B_j = \mu_i/\mu_j$, then the constant bias is $\mu_i' - (\mu_i/\mu_j)\mu_j'$, which is a function of the population and herd averages. This bias would approach zero as the herd averages approach the population averages and, conversely, the bias would increase with an increased disparity between the herd and population averages. The bias of Method 3 relative to Method 2 approaches zero as the total regression coefficient, $b_j$, approaches the ratio of monthly averages, $\mu_i/\mu_j$.
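The compensating behavior described above can be illustrated with a short sketch. The function names are hypothetical, and the common slope values tried for $c$ are arbitrary; the averages are those of the paper's worked example, in which the herd falls below the population by the same amount in both months.

```python
def d12_equal_slopes(mu_i, mu_j, herd_i, herd_j, c):
    """d12 when b_j = B_j = c: (mu_i' - mu_i) + c * (mu_j - mu_j')."""
    return (herd_i - mu_i) + c * (mu_j - herd_j)


def d13_constant(mu_i, mu_j, herd_i, herd_j):
    """d13 when B_j = mu_i / mu_j: mu_i' - (mu_i / mu_j) * mu_j'."""
    return herd_i - (mu_i / mu_j) * herd_j


# The two bracketed components offset each other more closely as c nears 1.
for c in (0.8, 0.9, 1.0):
    print(c, d12_equal_slopes(51.7, 46.7, 35.0, 30.0, c))

# The constant bias vanishes when herd averages equal population averages.
print(d13_constant(51.7, 46.7, herd_i=51.7, herd_j=46.7))
print(d13_constant(51.7, 46.7, herd_i=35.0, herd_j=30.0))
```

With these particular averages the herd deviations are equal in the two months, so the two components cancel entirely at $c = 1$; for herds whose deviations differ between months the cancellation is only partial.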
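The corresponding estimates of the missing monthly test-day records when both the preceding and succeeding records are known are

$$y_{i1} = \mu_i' + B_{i-1}(x_{i-1} - \mu_{i-1}') + B_{i+1}(x_{i+1} - \mu_{i+1}');$$

$$y_{i2} = \mu_i + b_{i-1}(x_{i-1} - \mu_{i-1}) + b_{i+1}(x_{i+1} - \mu_{i+1}); \text{ and}$$

$$y_{i3} = \frac{\mu_i}{\mu_{i-1} + \mu_{i+1}}(x_{i-1} + x_{i+1}).$$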
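The biases similarly are

$$d_{12} = \mu_i' - \mu_i + (b_{i-1}\mu_{i-1} - B_{i-1}\mu_{i-1}') + (b_{i+1}\mu_{i+1} - B_{i+1}\mu_{i+1}') + (B_{i-1} - b_{i-1})x_{i-1} + (B_{i+1} - b_{i+1})x_{i+1}$$

and

$$d_{23} = \left(b_{i-1} - \frac{\mu_i}{\mu_{i-1} + \mu_{i+1}}\right)(x_{i-1} - \mu_{i-1}) + \left(b_{i+1} - \frac{\mu_i}{\mu_{i-1} + \mu_{i+1}}\right)(x_{i+1} - \mu_{i+1}).$$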
These biases behave similarly to those described previously, when only one adjacent test-day record is known, but are slightly more complex.
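For the case in which both adjacent records are known, the estimators and the bias of the ratio method relative to total regression can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the ratio estimator is taken to be the month's population average divided by the sum of the two adjacent population averages, and the sixth-month population average, the herd averages, and the two known records are made-up values. The fourth- and fifth-month population averages and the regression factors are those reported in the paper for estimating month 5 from months 4 and 6.

```python
def intra_herd_two(herd_i, herd_prev, herd_next, B_prev, B_next, x_prev, x_next):
    """Method 1 with both adjacent records known (herd averages required)."""
    return herd_i + B_prev * (x_prev - herd_prev) + B_next * (x_next - herd_next)


def total_two(mu_i, mu_prev, mu_next, b_prev, b_next, x_prev, x_next):
    """Method 2 with both adjacent records known (population averages only)."""
    return mu_i + b_prev * (x_prev - mu_prev) + b_next * (x_next - mu_next)


def ratio_two(mu_i, mu_prev, mu_next, x_prev, x_next):
    """Method 3: ratio factor applied to the sum of the adjacent records."""
    return mu_i / (mu_prev + mu_next) * (x_prev + x_next)


def d23_two(mu_i, mu_prev, mu_next, b_prev, b_next, x_prev, x_next):
    """Closed-form bias of Method 3 relative to Method 2."""
    r = mu_i / (mu_prev + mu_next)
    return (b_prev - r) * (x_prev - mu_prev) + (b_next - r) * (x_next - mu_next)


# Population averages: months 4 and 5 from the paper; month 6 is hypothetical.
mu4, mu5, mu6 = 51.7, 46.7, 41.6
# Hypothetical known records for months 4 and 6, and hypothetical herd averages.
x4, x6 = 50.0, 40.0
herd4, herd5, herd6 = 35.0, 30.0, 26.0

y51 = intra_herd_two(herd5, herd4, herd6, 0.39, 0.54, x4, x6)
y52 = total_two(mu5, mu4, mu6, 0.40, 0.56, x4, x6)
y53 = ratio_two(mu5, mu4, mu6, x4, x6)
d23 = d23_two(mu5, mu4, mu6, 0.40, 0.56, x4, x6)

print(y51, y52, y53)
print(d23, y52 - y53)   # the closed form should match the direct difference
```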
FACTORS

Ratio factors for predicting a missing monthly test-day record are shown in Table 1. The values presented indicate that an average factor for all months is not appropriate, since the factors are dissimilar, especially for the extreme end months of lactation. When the preceding record is known, the factors decrease as expected with stage of lactation, but not at a constant rate. Conversely, when the succeeding record is known, the factors increase with the length of the lactation. The factors change little with stage of lactation when both adjacent monthly records are known. The usual procedure of using the average of the two adjacent records to predict the missing record is evidently a close approximation, since the factors are all near 0.5. The ratio factors presented here do not agree with those reported earlier by Kendrick (2), the average lactation curve being much flatter in his study.

TABLE 1
Ratio factors for predicting a missing monthly test-day record when (1) the preceding test-day record is known, (2) the succeeding test-day record is known, and (3) both the preceding and succeeding test-day records are known

                   Monthly test-day record to be estimated
            1      2      3      4      5      6      7      8      9     10
(1)^a     ....   1.05    .95    .91    .90    .89    .85    .80    .67    .62
(2)^b      .95   1.05   1.10   1.11   1.13   1.18   1.24   1.50   1.60   ....
(3)^c     ....    .52    .51    .50    .50    .51    .50    .52    .47   ....

^a To estimate the test-day record for the $i$th month, multiply the factor under month $i$ by the test-day record of month $i-1$.
^b To estimate the test-day record for the $i$th month, multiply the factor under month $i$ by the test-day record of month $i+1$.
^c To estimate the test-day record for the $i$th month, multiply the factor under month $i$ by the sum of the test-day records of months $i-1$ and $i+1$.

Total and intra-herd regression factors are given in Table 2 for estimating a missing monthly record when either the preceding or succeeding record is known. Table 3 presents the regression factors for the case when both the preceding and succeeding monthly records are known.

TABLE 2
Total and intra-herd regression factors for estimating a missing monthly test-day record when (1) the preceding test-day record is known and (2) the succeeding test-day record is known^a

      (1) Preceding record known                  (2) Succeeding record known
Test-day record   $a_i$  $b_{i-1}$  $B_{i-1}$     Test-day record   $a_i$  $b_{i+1}$  $B_{i+1}$
2 from 1          11.8     .84        .70         1 from 2          20.6     .61        .52
3 from 2           5.5     .86        .80         2 from 3          10.9     .86        .80
4 from 3           5.6     .81        .75         3 from 4           7.8     .95        .90
5 from 4           5.9     .79        .73         4 from 5           8.7     .92        .85
6 from 5           2.7     .83        .79         5 from 6           8.1     .9?        .85
7 from 6           1.2     .82        .79         6 from 7           8.8     .93        .86
8 from 7          -0.6     .82        .80         7 from 8          12.0     .82        .75
9 from 8          -4.0     .81        .81         8 from 9          13.2     .80        .74
10 from 9         -3.9     .83        .83         9 from 10         10.5     .71        .68

^a $a_i$ is the intercept of the prediction equation $y_i = a_i + b_j x_j$, where $y_i$ is the predicted test-day record for month $i$, $b_j$ ($j = i-1$ or $i+1$) is the total regression coefficient, and $x_j$ is the test-day record of the preceding ($i-1$) or succeeding ($i+1$) month. $B_{i-1}$ and $B_{i+1}$ are the intra-herd regression coefficients associated with the intra-herd prediction equation $y_i = \mu_i' + B_j(x_j - \mu_j')$, where $\mu_i'$ and $\mu_j'$ are the herd averages for the $i$th and $j$th test-day records.

TABLE 3
Total and intra-herd regression factors for estimating a missing monthly test-day record when both the preceding and succeeding test-day records are known^a

                        Total regression              Intra-herd regression
Test-day record       $a_i$  $b_{i-1}$  $b_{i+1}$     $B_{i-1}$  $B_{i+1}$
2 from 1 and 3         2.8     .31        .69           .29        .67
3 from 2 and 4         1.0     .45        .56           .44        .56
4 from 3 and 5         1.9     .49        .47           .48        .46
5 from 4 and 6         2.8     .40        .56           .39        .54
6 from 5 and 7         1.6     .47        .51           .45        .51
7 from 6 and 8         1.5     .54        .40           .53        .39
8 from 7 and 9         0.8     .53        .47           .51        .47
9 from 8 and 10       -1.5     .54        .43           .54        .42

^a $a_i$ is the intercept of the prediction equation $y_i = a_i + b_{i-1}x_{i-1} + b_{i+1}x_{i+1}$, where $b_{i-1}$ and $b_{i+1}$ are the total regression coefficients associated with $x_{i-1}$ and $x_{i+1}$, the test-day records of the preceding and succeeding months, respectively. $B_{i-1}$ and $B_{i+1}$ are the intra-herd regression coefficients associated with the within-herd prediction equation $y_i = \mu_i' + B_{i-1}(x_{i-1} - \mu_{i-1}') + B_{i+1}(x_{i+1} - \mu_{i+1}')$, where $\mu_i'$, $\mu_{i-1}'$, and $\mu_{i+1}'$ are the herd averages for the test-day records of months of lactation $i$, $i-1$, and $i+1$, respectively.

It is apparent that except for the latter situation the total and intra-herd regressions, in general, are not in close agreement. Then the bias of the total regression procedure relative to the intra-herd procedure depends on the known record as well as the herd and population monthly averages. Since the difference between $B_j$ and $b_j$ is numerically small, most of the bias depends on the differences between the herd and population averages for the monthly records.

Suppose, as an example, that the population average test-day record for the fifth month of lactation is 46.7 and for the fourth month, 51.7. Further suppose that the herd averages for the herd in which a missing record is to be estimated are 35.0 and 30.0 for the fourth and fifth months, respectively.
Now suppose that the fifth monthly record of a cow in this herd is 31.0 and that the fourth record is missing. The three estimates are

$$y_{41} = 35.0 + .85(31.0 - 30.0) = 35.85,$$

$$y_{42} = 8.7 + .92(31.0) = 37.22, \text{ and}$$

$$y_{43} = 1.11(31.0) = 34.41.$$

Assuming $y_{41}$ is the best unbiased estimate, the magnitudes of the biases are $|d_{12}| = 1.37$ and $|d_{13}| = 1.43$. These biases are about 4% of the best estimate.

The total and intra-herd correlations between the estimated missing monthly records and the actual records are given in Table 4. The total and intra-herd correlations are not appropriate, however, for comparing the accuracy of the two prediction procedures; see Van Vleck and Henderson (4). The relative efficiency of the two procedures can be measured by the ratio of their residual variances. The residual variance of the procedure ignoring herd effects is denoted by $V_1$ and that of the intra-herd procedure by $V_2$. Since it is expected that the intra-herd procedure has the smaller variance, the relative efficiency is measured by $E = (V_1/V_2)(100)$. The relative efficiencies, as well as the residual standard deviations, are also presented in Table 4. The relative efficiencies indicate that the intra-herd procedure is only slightly better for nearly all months. When both adjacent records are known, the efficiencies of the two procedures are very similar. These comparisons indicate that the increased accuracy of the intra-herd procedure probably does not overcome the disadvantage of requiring the herd averages, which are difficult and expensive for records processing centers to obtain.

TABLE 4
Total and intra-herd correlations between estimated and actual missing monthly test-day records, residual standard errors of the estimates, and the relative efficiency of total regression and intra-herd regression estimates^a

Preceding record known
Monthly record estimated   $R_1$   $V_1^{1/2}$   $R_2$   $V_2^{1/2}$    $E$
 2                          .72      9.66         .60      9.13        112
 3                          .86      7.01         .80      6.81        106
 4                          .88      6.15         .82      5.92        108
 5                          .86      6.12         .79      5.92        107
 6                          .87      5.41         .82      5.32        103
 7                          .87      5.14         .83      5.03        104
 8                          .82      5.95         .78      5.81        105
 9                          .80      6.27         .77      6.12        105
10                          .77      7.22         .75      7.11        103

Succeeding record known
Monthly record estimated   $R_1$   $V_1^{1/2}$   $R_2$   $V_2^{1/2}$    $E$
 1                          .72      8.23         .60      7.87        109
 2                          .86      7.02         .80      6.82        106
 3                          .88      6.67         .82      6.50        105
 4                          .86      6.61         .79      6.38        107
 5                          .87      5.73         .82      5.53        107
 6                          .87      5.48         .83      5.24        109
 7                          .82      5.96         .78      5.62        112
 8                          .80      6.24         .77      5.83        115
 9                          .77      6.70         .75      6.40        110

Preceding and succeeding records known
Monthly record estimated   $R_1$   $V_1^{1/2}$   $R_2$   $V_2^{1/2}$    $E$
 2                          .89      6.45         .83      6.38        102
 3                          .92      5.56         .88      5.50        102
 4                          .92      5.05         .88      4.99        102
 5                          .91      4.79         .87      4.73        103
 6                          .92      4.36         .88      4.33        101
 7                          .91      4.30         .88      4.26        102
 8                          .90      4.50         .88      4.46        102
 9                          .88      4.90         .87      4.84        105

^a $R_1$ is the correlation ignoring herd effects between the predicted and actual test-day record. $V_1^{1/2}$ is the residual standard deviation from estimation ignoring herd effects. $R_2$ is the intra-herd correlation between the predicted and actual test-day record. $V_2^{1/2}$ is the residual standard deviation from intra-herd estimation. $E$ is the ratio $(V_1/V_2)(100)$, which expresses the relative efficiency of total regression to intra-herd regression prediction of missing test-day records.

CONCLUSIONS
The intra-herd regression method is the best of the three procedures considered for predicting a missing monthly test-day record from a preceding or succeeding test-day record. The slight additional accuracy of prediction, however, is outweighed by the disadvantage that herd averages for the monthly test-day records are required. Since population averages are easier to obtain and to use than herd averages, prediction of missing monthly test-day records by regression ignoring herd effects is preferable to intra-herd regression estimation.

REFERENCES
(1) HARVEY, W. R. Problems to Consider in Determining Appropriate Extension Factors for Incomplete Records. Unpublished report. 1959.
(2) KENDRICK, J. F. The Cow Tester's Manual. USDA Misc. Publ. 359. U.S.G.P.O., Washington. 1940.
(3) O'BLENESS, G. V. Personal communication. 1960.
(4) VAN VLECK, L. D., AND HENDERSON, C. R. Extending Part Lactation Milk Records by Regression Ignoring Herd Effects. J. Dairy Sci., 44: 1519. 1961.
(5) VAN VLECK, L. D., AND HENDERSON, C. R. Ratio Factors for Adjusting Monthly Test-Day Data for Age and Season of Calving and Ratio Factors for Extending Part Lactation Records. J. Dairy Sci., 44: 1093. 1961.