
Laboratory Data in Clinical Trials: A Statistician’s Perspective

Christy Chuang-Stein, PhD
Clinical Development Biostatistics I, Pharmacia and Upjohn Company, Kalamazoo, Michigan

ABSTRACT: Even though laboratory data provide the best indicators for systemic toxicities in clinical trials of investigational medications, many applied statisticians lack a basic understanding of the interpretation of such data. Understanding is essential to a statistician’s ability to help evaluate a patient’s overall safety experience in a trial, the latter being the primary objective for collecting laboratory data in the trial. In this paper, we discuss the purpose of conducting laboratory evaluations as well as some hidden issues concerning the current practice of laboratory data analysis. The issues include the use of reference ranges, the one-parameter-at-a-time approach, and the exploratory nature of safety data analyses. Controlled Clin Trials 1998;19:167–177 © Elsevier Science Inc. 1998

KEY WORDS: Laboratory toxicity, multicenter clinical trial, reference range, safety profile

INTRODUCTION

Laboratory specimens are routinely collected and assayed in clinical trials to ensure that patients are not experiencing any untoward toxicities. Laboratory data are by far the best indicators of systemic toxicities, and they therefore provide vital information regarding a patient’s safety. These data are important because of their objectivity and their relationship to organ function. Laboratory abnormalities often serve as early warnings of undesirable clinical signs or symptoms that surface later. The importance of such data leads to their collection in almost all clinical trials that evaluate new therapies. Such trials typically have prespecified schedules for collecting laboratory specimens for safety evaluations. Moreover, investigators participating in a trial frequently order unscheduled blood draws to follow up on a toxicity previously identified in the laboratory or to confirm a suspicious clinical observation. Results from laboratory safety evaluations help physicians decide on treatment strategies for their patients.

This paper was presented at an invited paper session entitled Hidden Issues Concerning Laboratory Data Evaluation at the Joint Statistical Meetings on August 4, 1996, in Chicago, IL. Address reprint requests to: Dr. Christy Chuang-Stein, Clinical Development Biostatistics I, Pharmacia and Upjohn Company, Kalamazoo, MI 49001. Received November 14, 1996; revised April 1997; accepted August 7, 1997.



Despite the extensive use of laboratory data by physicians, many pharmaceutical statisticians do not understand how physicians use and obtain laboratory data or how to present the data in a way that facilitates the decision-making process. The Laboratory Data Working Party of PSI (Statisticians in the Pharmaceutical Industry) concluded from a survey of its members that pharmaceutical statisticians analyze laboratory data without much thought [1]. Because of the overwhelming emphasis on evaluating efficacy results in clinical trials and the lack of explicit requirements from regulatory authorities on laboratory data summarization, pharmaceutical statisticians devote too little effort to understanding laboratory data. Other factors contributing to this phenomenon include pharmaceutical statisticians’ basic lack of knowledge of the biology underlying laboratory data and the scarcity of studies designed from the safety perspective.

This paper addresses the above deficiencies. It begins with an example of the utility of laboratory data, showing some of the correspondence between laboratory parameters and body systems; understanding this correspondence can greatly help statisticians appreciate the predictive value of laboratory data. The paper then comments on the current practice of using reference ranges to interpret laboratory data and the appropriateness of this practice in the clinical setting. In addition, the paper points out how the clinical trial setting differs from a tertiary care environment and how this difference affects the role of laboratory data evaluations. The paper concludes with a general discussion.

IMPLICATIONS OF LABORATORY PARAMETERS

Laboratory data are routinely used to predict potential organ dysfunction. To appreciate the predictive value of laboratory results, we need first to understand the correspondence between laboratory parameters and organ functions. Table 1 gives an example of such correspondence. Depending on the disease and the treatment under consideration, physicians will rely on some variables in Table 1 more heavily than on others. Some laboratory parameters, such as gamma glutamyltransferase (GGT) and creatine kinase (CK), are quite variable, as transient factors can greatly influence their values. Others are very specific to the organ functions with which they are associated; an example is pancreatic amylase, a very reliable indicator of pancreatitis. Because laboratory abnormalities often precede clinical symptoms, laboratory data often serve as the first clue to upcoming clinical signs or symptoms. For example, elevated amylase or lipase often predicts pancreatitis; elevated SGOT, SGPT, and bilirubin can warn of liver failure; and increased creatinine and BUN are frequently early indications of renal dysfunction.

INTERPRETATION OF LABORATORY DATA

Because laboratories frequently use different equipment and assay procedures, laboratory data are commonly interpreted using a reference range, that is, a range of laboratory results within which the likelihood of a biochemical abnormality in a patient is small. This concept arose from the need to identify diagnostically useful laboratory values to help general practitioners decide whether to initiate or discontinue particular treatments for their patients. Reference ranges are useful for distinguishing a potential abnormality from a normal state.
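
The mechanics of this interpretation are simple; the subtlety, taken up below, is that each result must be judged against the range of the laboratory that produced it. A minimal sketch, with a hypothetical analyte, laboratories, and limits:

```python
# Hypothetical reference ranges keyed by (laboratory, analyte); a real trial
# would take these from each analyzing laboratory's documentation.
REFERENCE_RANGES = {
    ("lab_A", "ALT"): (7.0, 56.0),   # U/L; illustrative numbers only
    ("lab_B", "ALT"): (10.0, 49.0),  # same analyte, different assay setup
}

def flag(lab, analyte, value):
    """Flag a result low/normal/high against the originating lab's range."""
    lo, hi = REFERENCE_RANGES[(lab, analyte)]
    if value < lo:
        return "L"
    if value > hi:
        return "H"
    return "N"

print(flag("lab_A", "ALT", 52.0))  # 'N' within lab_A's range
print(flag("lab_B", "ALT", 52.0))  # 'H' against lab_B's range
```

The same value of 52 U/L is unflagged at one laboratory and flagged at the other, which is exactly the multilaboratory problem discussed in the sections that follow.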


Table 1  Some Laboratory Parameters and Corresponding Body Systems

Body System           Laboratory Parameters
Hematology            Bone marrow function: Hemoglobin, WBC, Differentials, Platelets
                      Coagulation: Prothrombin time (PT), Partial PT, Fibrinogen, Fibrin split product, Methemoglobin
Hepatic               Hematology: Prothrombin time
                      Chemistries: Bilirubin
                      Enzymes: AST (SGOT), ALT (SGPT), GGT, Alkaline phosphatase
Gastrointestinal      Enzymes: Amylase, Lipase
Renal                 Chemistries: Blood urea nitrogen (BUN), Creatinine
                      Urinalysis: Blood, Protein, Glucose, WBC, RBC, Casts, Bacteria, fungus
Musculoskeletal       Chemistries: Calcium
                      Enzymes: Alkaline phosphatase, Creatine kinase
Metabolic/Endocrine   Chemistries: Sodium, Potassium, Glucose, Calcium, Chloride, CO2, Magnesium, Phosphate, Albumin, Total protein


Evaluation of results from clinical trials relies heavily on reference ranges. The Laboratory Data Working Party of PSI [1] reported that all responders in its survey used reference ranges provided by the analyzing laboratories, which in turn often use the ranges supplied by the manufacturers. In the analysis of clinical trial results, reference ranges are routinely used to flag abnormal laboratory values and to construct protocol-defined grades for laboratory toxicities. Despite this prevalent usage, however, the Working Party reported that very few pharmaceutical statisticians know how the ranges are determined. In this paper, we explain that although reference ranges adequately serve the diagnosis-oriented needs of individual patient care, they fall short of meeting the challenges of a clinical trial that uses multiple laboratories to process the laboratory specimens.

Determination of a reference range requires three major classes of decisions: analytical, clinical, and mathematical. All three add to the variability of the constructed range.

First, we consider analytical decisions. Although analytical accuracy is a highly desirable feature of an assay, it is an ill-defined target. When setting up an assay, a laboratory accounts for the clinical intentions of its users and selects the operating characteristics of the assay accordingly. Because laboratory assays are traditionally used to diagnose disease, a laboratory tends to balance the reagents so that the resulting biochemistry is most precise at or near the points where physicians make clinical decisions, usually the upper and lower ends of the reference population range. A clinical trial laboratory, by contrast, in its concern for safety, should ensure that assay precision outside the reference range is at least as good as that at the two ends of the range, so that values outside the range are measured precisely enough to characterize the extent of an abnormality. In addition, a clinical trial laboratory sometimes measures parameters used for efficacy evaluations. These differences in required precision lead to different assay specifications in different laboratories.

Next, we consider clinical decisions. Using a laboratory test diagnostically requires that clinicians have a high degree of confidence that their interpretation of results against the quoted reference range will correctly identify the actions needed to complete a diagnosis or initiate a treatment. Unfortunately, the analyte levels associated with many disease entities lie very close to, or even overlap, the range of values observed in a normal, healthy population; a level consistent with a nondisease state in one person may indicate disease in another. Two questions therefore influence the determination of the limits of a range: (1) What degree of confidence does one want that a result outside the reference range is indeed abnormal (sensitivity)? (2) What degree of confidence does one want that a result inside the reference range is indeed normal (specificity)? Because sensitivity and specificity work against each other, they need to be specified in advance, and which rate carries more weight in determining the limits of a range depends very much on a laboratory’s clientele. Concerns about false positive and false negative rates can lead to different reference ranges and can therefore contribute to a disparity in range values even when all the other factors involved in the range construction are the same.
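
The tension between these two questions can be made concrete with a small calculation. The sketch below assumes, purely for illustration, normally distributed analyte values in a healthy population and in a diseased population whose distributions overlap; widening the reference range buys specificity at the cost of sensitivity.

```python
from scipy.stats import norm

healthy = norm(30, 8)     # hypothetical distribution of healthy values
diseased = norm(55, 12)   # hypothetical distribution under disease

for coverage in (0.90, 0.95, 0.99):
    alpha = 1 - coverage
    upper = healthy.ppf(1 - alpha / 2)  # upper limit of a central range
    sensitivity = diseased.sf(upper)    # diseased subjects flagged as high
    print(f"central {coverage:.0%} range: upper limit {upper:5.1f}, "
          f"specificity {coverage:.2f}, sensitivity {sensitivity:.3f}")
```

Moving from 90% to 99% central coverage raises specificity by construction but shrinks the fraction of diseased subjects that the range flags.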


Third, we consider mathematical decisions. Another important source of variability is that no accepted set of guidelines exists for defining a reference population or for mathematically constructing the reference range from a set of laboratory values. A long-used and common approach is to assay specimens from 40 laboratory or hospital employees who are self-diagnosed as healthy, exclude the highest and the lowest results, and call the range defined by the remaining results a 95% reference range. The standard errors of the upper and lower limits of this range are derived from the theory of order statistics. A range constructed this way carries the flavor of a tolerance interval; however, it falls short of the requirements of the latter, as the discussion of tolerance limits later in this paper indicates. One can refine this method by including a medical history and/or a physical exam to obtain a more complete assessment of a volunteer’s health, and one can also increase the number of subjects. When the data look reasonably normal, a normal approximation is generally used, and constructing a reference range is often equated with constructing a confidence interval, with 95% a common choice for the confidence coefficient. Royston and Matthews [2] used this procedure to construct a 95% confidence interval for an individual. Results in DiCiccio and Efron [3] suggest that bootstrap methods can improve this standard procedure. When data are skewed, as is often the case with biochemistry parameters, various other methods are employed to construct the ranges, including transforming the data to make them appear more normal. The lack of consensus on how reference ranges should be mathematically constructed leads to different ranges, even from the same set of laboratory results.

Because of all the above complications associated with reference ranges, Oliver and Chuang-Stein [4] suggested that one not rely totally on the reference ranges supplied by the manufacturers when multiple laboratories are used in a trial. Instead, they recommended using the results of proficiency surveys to normalize results when summarizing data from different laboratories. A proficiency survey is a test required of all accredited laboratories by the Clinical Laboratory Improvement Act of 1988; the testing is done several times a year. Under this program, all participating laboratories analyze a common set of specimens. By comparing the results from different laboratories, one can devise an algorithm that adjusts the results to make them compatible across laboratories. Using a central laboratory in a multicenter trial can remove the need to adjust laboratory results in order to obtain summary statistics on the amount of change across all centers; however, the ambiguity inherent in range construction still affects the value of the ranges supplied by the central laboratory. The next section further evaluates the value of manufacturer-supplied ranges in the clinical trial setting.
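
To illustrate the construction methods just described, the sketch below applies the n = 40 order-statistic recipe and the normal-approximation recipe to the same simulated sample of presumed-healthy results; even on identical data the two recipes give different ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.sort(rng.normal(100.0, 15.0, size=40))  # 40 presumed-healthy results

# Order-statistic recipe: drop the lowest and highest result and call the
# remaining span a 95% reference range.
order_range = (values[1], values[-2])

# Normal-approximation recipe: mean +/- 1.96 sample standard deviations.
m, s = values.mean(), values.std(ddof=1)
normal_range = (m - 1.96 * s, m + 1.96 * s)

print(f"order-statistic range: ({order_range[0]:.1f}, {order_range[1]:.1f})")
print(f"normal-theory range:   ({normal_range[0]:.1f}, {normal_range[1]:.1f})")
```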

THE ROLE OF LABORATORY DATA IN CLINICAL TRIALS

Thompson et al [5] suggested three major reasons for routine laboratory tests in clinical trials: (1) to screen patients for inclusion in a trial; (2) to see whether the new treatment under investigation changes individuals’ preexisting biochemistry in an undesirable fashion; and (3) if the new treatment does change individuals’ preexisting biochemistry, to determine whether the extent of change among patients receiving the new treatment is comparable to that induced by its comparator in a controlled study. In other words, laboratory tests in clinical trials ensure the continued safety of patients after they receive the new treatment. Obviously, exceptions exist, such as when efficacy is measured by a parameter that is typically included in the safety laboratory panels; cholesterol and glucose are examples.

In a clinical trial, patients at the extremes of the distribution of laboratory values are often of the most interest from the safety perspective. Within a given trial, clinicians have greater concern for patients with considerable changes in laboratory values than for those with negligible changes. This suggests that interpreting laboratory data solely on the basis of reference ranges does not fully meet the objectives of collecting the data. A more serious problem occurs when the target population in a trial is far from normal, for example, patients with AIDS or patients experiencing trauma. In such cases, using a set of ranges derived from a presumably normal population is often inappropriate.

Thompson et al [5] suggested using local laboratories for diagnostic purposes to qualify patients for a trial but using a central laboratory to process the rest of the laboratory specimens. In addition, they suggested using the baseline values (which are different from the screen values) obtained from the central laboratory to construct disease-specific reference ranges for evaluating the experience of a target patient population after the treatment of interest is initiated. Oliver and Chuang-Stein [4] also suggested constructing disease-specific ranges. They extended the concept and recommended constructing protocol-specific ranges so that the ranges, which represent the safety outlook of the patients prior to a trial, can be used to study the safety experience of the same group of patients in the trial.

ANALYSES OF LABORATORY DATA

Routine analyses of laboratory data include constructing tables that show the percentage of patients with each specific type of laboratory abnormality; tables of the frequencies of subjects experiencing a change from normal to abnormal status, or from abnormal to normal status, for each selected body function; tables of the frequencies of subjects experiencing a change in their pretreatment laboratory toxicity grade; and tables that present summary statistics on the amount of change. Frequently, graphs depict the amount of change in relation to the pretreatment value, or the average group change over time. For statistical inference, nonparametric tests are frequently appropriate, even though p-values from the inferential procedures are typically used for descriptive purposes only. Recently, some authors have discussed more informative graphic displays [5,7]. In addition, multiple small figures [7] that display laboratory data can also be quite informative.
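
As a minimal sketch of one of these routine summaries, the snippet below tabulates the shift from baseline status to worst post-baseline status for one analyte; the subject records and flags are hypothetical.

```python
from collections import Counter

# (subject, baseline flag, worst post-baseline flag), with 'L'/'N'/'H'
# meaning below/inside/above the reference range; records are hypothetical.
records = [
    ("001", "N", "N"), ("002", "N", "H"), ("003", "H", "H"),
    ("004", "N", "N"), ("005", "L", "N"), ("006", "N", "H"),
]

shifts = Counter((base, post) for _, base, post in records)
flags = ["L", "N", "H"]
print("baseline\\post " + " ".join(f"{f:>3}" for f in flags))
for base in flags:
    row = " ".join(f"{shifts.get((base, post), 0):>3}" for post in flags)
    print(f"{base:>13} {row}")
```

The same cross-tabulation extends directly to protocol-defined toxicity grades in place of the three flags.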

Even though statisticians routinely process laboratory data, I believe that we do not devote enough effort to finding new and better ways to analyze such data. Consideration of the following points may lead to increased research efforts pertinent to the analysis and presentation of laboratory data.

1. The multivariate nature of laboratory data: One of the survey results of the Laboratory Data Working Party indicated that most clinicians prefer to base a clinical decision on an overall safety profile built from a patient’s laboratory data rather than on a one-parameter-at-a-time analysis. In addition, the one-parameter-at-a-time approach does not exploit the cumulative information provided by related assays indicative of the same body function, as Table 1 suggests. Brown et al [8] proposed a method of comparing the safety of two antibiotics using laboratory data obtained during and after therapy: they first selected tolerable limits for laboratory results and ranked abnormalities according to the frequencies of the abnormal values detected, and they then analyzed pairs of related test results concurrently. Sogliero-Gilbert et al [9] proposed combining laboratory results by constructing a score, called the Genie score, for each functional group for each patient. Gilbert et al [10] used Genie scores and the change in the scores to compare treatment groups in a controlled clinical trial. Chuang-Stein et al [11] proposed incorporating laboratory results with other types of safety data, such as clinical signs and symptoms, to obtain the overall safety experience of a patient in a trial. Aggregating data in this fashion helps produce an overall safety profile for a patient. It also facilitates further collapsing of the data into suitable univariate scores that can lead to efficient comparisons among treatment groups. Because the consolidation of multivariate data into univariate scores can increase the efficiency of testing, it would be useful to develop algorithms that automate the data aggregation and consolidation process.

2. Regression to the mean: Most trials have eligibility criteria that require safety parameters to be normal, or clinically acceptable, prior to study entry. For chemistry parameters such as SGOT and SGPT, this typically means that a subject cannot have too high a value in order to be eligible. When this selection occurs and one evaluates the average change from baseline for individual treatments, with baseline defined by the screen value, the change statistics are subject to a regression to the mean effect. Assuming a bivariate normal distribution for a pair of repeated measurements on the same laboratory parameter, Chuang-Stein [12] set forth the mean change that is due entirely to regression to the mean under the conditions stated above and proposed some adjustment procedures for this effect. The absolute value of the mean change increases as the correlation coefficient between the repeated measurements decreases, and it increases as the percentage of exclusion increases. The SGOT and SGPT values therefore tend to increase even when the intervention has absolutely no effect on these two parameters. As a result, an analysis that fails to adjust for regression to the mean might raise more alarms than necessary regarding the safety of a treatment. My experience indicates that few people adjust for the regression effect in the analysis of safety laboratory data. As others have argued [13,14,15], if the correlation coefficient between the value after study initiation and the screen value is the same as that between a second pretreatment value and the screen value, then there will be no expected mean change (from the second pretreatment value) due to regression. Therefore, when this condition holds at least approximately, it is worthwhile to use as the baseline a pretreatment value taken after the screen period; many trials indeed use the predose sample on the first day of dosing as their baseline value. Regression to the mean does not affect the comparison of the average change from baseline among treatment groups in controlled studies, even when the screen value is used as the baseline, as long as randomization balances the treatment groups with respect to the baseline values.
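
Under the bivariate normal assumption in point 2, the size of the pure regression effect has a simple closed form. The sketch below checks that expression (a standard truncated-normal result, not necessarily the one given in [12]) against simulation; the eligibility cutoff and parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Screen value X and follow-up Y are bivariate normal with common mean mu,
# common sd sigma, and correlation rho; the treatment has no effect, and
# only subjects with X <= c are enrolled. Then, with a = (c - mu) / sigma,
#   E[Y - X | X <= c] = (1 - rho) * sigma * pdf(a) / cdf(a) > 0.
mu, sigma, rho, c = 40.0, 10.0, 0.6, 45.0
a = (c - mu) / sigma
closed_form = (1 - rho) * sigma * norm.pdf(a) / norm.cdf(a)

rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, 1_000_000)
y = mu + rho * (x - mu) + rng.normal(0.0, sigma * np.sqrt(1 - rho**2), x.size)
eligible = x <= c
simulated = (y[eligible] - x[eligible]).mean()
print(f"closed form {closed_form:.3f} vs simulated {simulated:.3f}")
```

The expected change is positive, so the screened parameter drifts upward on average with no treatment effect at all, and it grows as rho falls or as the cutoff becomes more stringent, matching the behavior described above.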

3. The mixture of response distributions: The response of a laboratory parameter to the treatment of interest may well follow a mixture distribution, with the treatment affecting only some subjects. To formalize this situation, assume that the response distribution G(x) for the experimental treatment group is a mixture of the response distribution F(x) for the control group and a location shift plus scale change of F(x), that is, G(x) = pF(x) + (1 − p)F((x − Δ)/λ). The task is to estimate the mixture proportion p and to test the null hypothesis p = 1 against the alternative p ≠ 1. Cherng et al [16], in discussing testing procedures specifically designed for this alternative, applied and compared the procedures proposed by Conover and Salsburg [17] and O’Brien [18]. The mixture formulation deserves more investigation, for it reflects a very realistic description of the response distribution in situations in which some patients tolerate a treatment extremely well while others experience varying degrees of treatment-emergent side effects. It would also be worthwhile to extend Conover and Salsburg’s work to binary and ordinal data.

4. Tolerance limits: Even if disease-specific reference ranges are constructed as recommended by Thompson et al [5], most approaches ignore the fact that a range covers a percentage of the population, not a parameter that characterizes the population distribution. In this regard, constructing reference ranges is analogous to constructing tolerance limits, whose objective is an interval that contains a desired portion of the population with a certain confidence; such limits provide a specified confidence that a certain proportion of subjects will have laboratory values between the limits. When the value for a laboratory assay follows a normal distribution, Ostle and Malone [19] discussed tolerance limits of the form x̄ ± Ks, where x̄ is the sample mean and s is the sample standard deviation. K is determined so that, over many random samples from a normal population, a fraction γ of the intervals x̄ ± Ks will contain at least 100P% of the population; P is called the coverage probability and γ the confidence coefficient, reflecting 100γ% confidence that the tolerance range x̄ ± Ks includes at least 100P% of the normal population sampled. When the normal distributional assumption is in doubt (as with most chemistry parameters), other positive distributions such as the log-normal might be considered, or nonparametric procedures can be used to construct distribution-free tolerance limits based on order statistics; Wilks [20] presented a detailed exposition of the latter. The concept of tolerance limits can also be used to calculate the required sample size if one wants a certain level of assurance that, for example, the limits given by the second smallest and the fifth largest observations contain a specified portion of the population values (see [21] and [22]). A trial designed from the safety perspective should be large enough that what one observes in the trial reasonably reflects the response of the population at large.
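
To make point 4 concrete, the sketch below computes a two-sided normal-theory tolerance interval x̄ ± Ks, with K from Howe’s approximation (one common choice, not necessarily the construction in [19]), and then the confidence attached to the distribution-free interval running from the second smallest to the fifth largest observation, the sample-size question mentioned above.

```python
import numpy as np
from scipy.stats import beta, chi2, norm

def howe_k(n, coverage=0.95, confidence=0.95):
    """Two-sided normal tolerance factor K, Howe's approximation."""
    z = norm.ppf((1 + coverage) / 2)
    df = n - 1
    return z * np.sqrt(df * (1 + 1 / n) / chi2.ppf(1 - confidence, df))

rng = np.random.default_rng(2)
sample = rng.normal(100.0, 15.0, size=60)   # simulated assay values
xbar, s = sample.mean(), sample.std(ddof=1)
k = howe_k(len(sample))
print(f"normal-theory tolerance interval: "
      f"({xbar - k * s:.1f}, {xbar + k * s:.1f}), K = {k:.2f}")

# Distribution-free version: the coverage of the interval from the r-th
# smallest to the t-th largest of n observations follows a
# Beta(n - r - t + 1, r + t) distribution, so the confidence of covering
# at least 100P% of the population is its upper tail at P.
n, P, r, t = 60, 0.90, 2, 5                 # second smallest, fifth largest
print(f"P(coverage >= {P}) = {beta.sf(P, n - r - t + 1, r + t):.3f}")
```

With n = 60 the confidence of covering 90% of the population between those two order statistics is well below conventional levels, which is precisely why a safety-oriented design may need a larger sample.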

5. Correlating laboratory data with pharmacokinetic parameters: There is an increasing need to correlate pharmacokinetic parameters with laboratory data. If, for example, all the observed severe laboratory toxicities occur at high plasma drug concentrations and the pharmacokinetic parameters show high subject-to-subject variability, it might be desirable to recommend therapeutic drug monitoring. It is also often necessary to decide whether a baseline condition such as hepatitis B affects drug absorption. While these analyses are not routinely conducted, they can often help in understanding the pharmacodynamic properties of a drug.

6. Identifying patients at greater risk for laboratory toxicity: As stated earlier, safety analysis focuses on an individual patient’s response to the treatments. Traditional phase I/II trials often exclude patients with major organ impairment. While this practice can better isolate the causes of observed laboratory toxicities, it provides little information on how patients with major organ impairment are likely to respond to a new treatment. Unfortunately, once a new treatment gains regulatory approval for public use, there is usually very little control over how it is applied, which raises serious safety concerns for some types of patients. Increasingly, regulatory bodies have been requiring studies of a treatment’s safety profile in certain subgroups of patients during the development program; the subgroups include pediatric patients, geriatric patients, and patients with hepatic or renal dysfunction. Exploratory analysis helps to identify factors associated with the presence, absence, or time to onset of laboratory toxicities of certain severity grades; logistic regression, or regression-type analyses for survival data, can be used for this purpose, as sketched below. While such analyses are currently conducted for extreme cases of adverse reactions, I recommend their use to help physicians anticipate and prepare for less severe reactions to a treatment. Intelligent use of this information can facilitate the preparation of the package insert for a new treatment regimen.
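
A sketch of the kind of exploratory model point 6 calls for appears below: a logistic regression of a binary laboratory-toxicity outcome on baseline factors. The covariates, outcome definition, and simulated relationship are all hypothetical; a real analysis would choose and prespecify factors with clinical input.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
dose = rng.choice([0.0, 1.0], size=n)        # control vs. active treatment
creatinine = rng.normal(1.0, 0.25, size=n)   # baseline renal function marker
age = rng.uniform(18.0, 80.0, size=n)

# Simulated truth: toxicity risk rises with active treatment, with impaired
# renal function, and (mildly) with age.
logit = -6.0 + 1.2 * dose + 2.5 * creatinine + 0.02 * age
toxicity = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = sm.add_constant(np.column_stack([dose, creatinine, age]))
fit = sm.Logit(toxicity, X).fit(disp=0)
print(fit.summary(xname=["intercept", "dose", "creatinine", "age"]))
```

The fitted coefficients and their p-values indicate which baseline factors are associated with toxicity; an analogous Cox or parametric survival model handles time to onset.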

COMMENTS

This paper dwells not on the routine analysis of laboratory data but on fundamental issues related to laboratory data that can increase statisticians’ understanding of such data. The fact that it is inappropriate to subject laboratory data to the type of strict inferential analysis applied to efficacy data does not justify the lack of attention to laboratory data summarization. Nearly all analyses of laboratory data will be exploratory. Outliers and unusual safety laboratory values will attract more medical attention than their counterparts in the efficacy data: a few serious medical events incurred in the development of a new drug for a nonserious medical condition can lead to the termination of the development program, whereas anecdotal accounts of effectiveness alone will rarely win regulatory approval for a drug. In this sense, extreme value distributions [23,24] are relevant to the safety analysis, while the efficacy analysis focuses more on the location shift of the response distribution. Even though a shift in the location parameters of a laboratory assay value is informative, a more urgent need in safety evaluation is the characterization of extreme responses and the identification of individuals at greater risk of having such responses.

Because our medical colleagues are professionally trained to examine and interpret safety data, we need to work closely with them to develop clinically relevant methods. Statistical thinking can help design algorithms that combine laboratory parameters for easy detection of major organ impairment, and such algorithms can flag the time periods during the course of a trial when organ functional status appears to change. By helping our medical colleagues make the best use of the information contained in laboratory data, we as statisticians can also greatly increase our own understanding of such data.

The author wants to thank the editor and two referees for their comments, which have helped to improve the quality of this paper.

REFERENCES

1. Smith A. Report of the PSI (Statisticians in the Pharmaceutical Industry) Laboratory Data Working Party. 1980.
2. Royston P, Matthews JNS. Estimation of reference ranges from normal samples. Stat Med 1991;10:691–695.
3. DiCiccio TJ, Efron B. Bootstrap confidence intervals. Stat Sci 1996;11:189–228.
4. Oliver LK, Chuang-Stein C. Laboratory data in multi-center trials: monitoring, adjustment and summarization. In: Gilbert G, ed. Drug Safety Assessment in Clinical Trials. New York: Marcel Dekker; 1993:111–123.
5. Thompson WL, Brunelle RL, Enas GG, et al. Routine laboratory tests in clinical trials. In: Cato AE, ed. Clinical Trials and Tribulations. New York: Marcel Dekker; 1988:1–54.
6. Silliman NP. Hidden issues concerning laboratory data analysis [discussion]. Joint Statistical Meetings, Chicago, 1996.
7. Tufte ER. The Visual Display of Quantitative Information. Cheshire: Graphics Press; 1983:170–175.
8. Brown KR, Getson AJ, Gould AL, et al. Safety of cefoxitin: an approach to the analysis of laboratory data. Rev Infect Dis 1979;1:228–231.
9. Sogliero-Gilbert G, Mosher K, Zubkoff L. A procedure for the simplification and assessment of lab parameters in clinical trials. Drug Info J 1986;20:279–296.
10. Gilbert GS, Ting N, Zubkoff L. A statistical comparison of drug safety in controlled clinical trials: the Genie score as an objective measure of lab abnormalities. Drug Info J 1991;25:81–96.
11. Chuang-Stein C, Mohberg NR, Musselman DM. The organization and analysis of safety data using a multivariate approach. Stat Med 1992;11:1075–1089.


12. Chuang-Stein C. The regression fallacy. Drug Info J 1993;27:1213–1220.
13. Davis CE. The effect of regression to the mean in epidemiologic and clinical studies. Am J Epidemiol 1976;104:493–498.
14. Ederer F. Serum cholesterol changes: effects of diet and regression toward the mean. J Chron Dis 1972;25:277–289.
15. Gardner MJ, Heady JA. Some effects of within-person variability in epidemiological studies. J Chron Dis 1973;26:781–795.
16. Cherng NC, et al. Some approaches to the evaluation and display of laboratory safety data. Joint Statistical Meetings, Chicago, 1996.
17. Conover WJ, Salsburg DS. Locally most powerful tests for detecting treatment effects when only a subset of patients can be expected to “respond” to treatment. Biometrics 1988;44:189–196.
18. O’Brien PC. Comparing two samples: extensions of the t, rank-sum, and log-rank tests. JASA 1988;83:52–61.
19. Ostle B, Malone LC. Statistics in Research: Basic Concepts and Techniques for Research Workers. 4th ed. Ames: Iowa State University Press; 1988:132–144.
20. Wilks SS. Mathematical Statistics. New York: Wiley & Sons; 1962:329–341.
21. Sachs L (translated by Reynarowych Z). Applied Statistics: A Handbook of Techniques. 2nd ed. New York: Springer-Verlag; 1984:281–284.
22. Nickens DJ. Using tolerance limits to evaluate laboratory data. Joint Statistical Meetings, Chicago, 1996.
23. Lechner JA, Simiu E, Heckert NA. Assessment of peaks over threshold methods for estimating extreme-value distribution tails. Structural Safety 1993;12:305–314.
24. Smith RL. Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Stat Sci 1989;4:367–393.