Statistical considerations of parasiticide screening tests and confirmation trials


Chapter 4

Statistical considerations of parasiticide screening tests and confirmation trials Hima Bindu Vanimisetti, PhD and Sean P. Mahabir, PhD Zoetis Inc., Kalamazoo, MI, United States

This chapter will introduce general statistical considerations for experimental design and for summary and analysis of data followed by more specific discussions for in vitro and in vivo parasiticide screening applications.

General experimental design considerations Experimental design can be defined as the method to plan experiments such that they meet the objectives of the research as clearly and efficiently as possible and ensure inferential validity of the results. Therefore, careful thought about experimental design is essential for the successful outcome of any study. Essential components of experimental design include randomization, replication, any restrictions applied in randomizing (such as blocking), and proper definition of experimental units (EUs).1 An experimental unit is the smallest physical entity to which a treatment can be assigned; EUs should be independent, and any two units should be able to receive different treatments with equal opportunity. In in vivo experiments, the EU is most commonly an individual animal but can also be a group of animals, for example, a cage, pen, litter, pasture, household, or herd of animals.2,3 In in vitro experiments, it could be a test tube, a Petri dish, an entire microtiter plate, or even wells within a microtiter plate. Replication occurs when the same experimental condition or treatment is applied to multiple independent EUs such that the underlying natural or inherent variation or “noise” among units can be estimated. Adequate replication at the level of the EU (true replication) is necessary for valid comparisons of treatments to be possible using statistical analysis methods.2,3

Parasiticide Screening, Vol 2. DOI: https://doi.org/10.1016/B978-0-12-816577-5.00009-0 © 2019 Elsevier Inc. All rights reserved.


When treatments are applied to an entire group of animals such that all animals in that group get the same treatment, for example, when drug is delivered via feed or water to a pen housing multiple animals or when all animals within a household are treated with the same drug, the pen or household is the EU and not the animal itself. In these situations, the animal is considered an observational unit and true replication can only be achieved by treating multiple independent pens or households with the same treatment. Randomization is the process of allocating EUs to treatment groups, as well as to housing or locations within a dilution plate, at random. This can be achieved using different methods ranging from drawing a number from a hat to using computer-generated treatment allocation lists. Systematic assignment of units to treatment, for example, alternating allocation of two treatment groups in a 1,2,1,2 sequence, is not considered randomization and precludes valid inferential statistical analyses (e.g., P-values would no longer be valid or meaningful). Randomization, along with treatment masking or blinding, helps avoid bias and confounding in experimentation and decision-making. Randomization is essential not only at the time of allocating treatments but also when collecting data; for example, an entire treatment group should not be necropsied first followed by the next group. Both replication and randomization are prerequisites for any calculated P-values or confidence intervals to have statistical validity. Blinding or masking is a procedure by which investigators, owners (as in the case of field or clinical trials), and/or other study personnel are kept unaware of the treatment received by animals, which helps to reduce potential bias.
Blinding is relevant for any experiment but essential for studies intended to support registration, as it is a criterion for study validity and a requirement as per good clinical practice guidelines.4,5 A common restriction applied during the randomization process is a technique called blocking. Blocking is a method of grouping similar EUs or animals together into a “block” to control for known or systematic nuisance factors that can affect measurements of interest. The goal of blocking is to produce homogeneous groups of EUs. Blocking can produce more accurate estimates of treatment differences, reduce residual variance or inherent noise variation, and increase the power of studies. When blocking is efficient, that is, when it adequately reduces variability within a block, smaller differences between treatment groups can be detected and study power is increased. Common examples of blocking variables include body weight, pretreatment fecal egg counts, pretreatment ectoparasite counts, source of animals, pasture, age, breed/strain, and litter. Blocking should be performed using a nuisance factor that is likely to impact the outcome of interest; blocking on an unimportant variable is inefficient and reduces study power.1 In study designs involving blocking, animals or EUs are first


grouped together based on the blocking variable, and then treatments are randomly assigned to animals within the block.
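The blocking procedure described above (rank the units on the blocking variable, form blocks, then randomize treatments within each block) can be sketched in a few lines. This is a minimal illustration only: the function name, the dog identifiers, the pretreatment flea counts, and the seed are all hypothetical, and it assumes one unit per treatment in each block.

```python
import random

def randomized_block_allocation(units, treatments, seed=None):
    """Allocate treatments at random within blocks of similar units.

    units: list of (unit_id, blocking_value) pairs (hypothetical data).
    treatments: list of treatment labels; block size equals len(treatments).
    Returns a dict mapping unit_id -> (block_number, treatment).
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # seeded so the allocation list can be reproduced
    # Rank units on the blocking variable so that neighbors are similar.
    ranked = sorted(units, key=lambda u: u[1], reverse=True)
    size = len(treatments)
    allocation = {}
    for start in range(0, len(ranked), size):
        block = ranked[start:start + size]
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # random treatment order within this block
        for (unit_id, _), treatment in zip(block, shuffled):
            allocation[unit_id] = (start // size + 1, treatment)
    return allocation

# Hypothetical pretreatment flea counts for eight dogs.
dogs = [("D1", 112), ("D2", 87), ("D3", 95), ("D4", 140),
        ("D5", 60), ("D6", 101), ("D7", 73), ("D8", 128)]
plan = randomized_block_allocation(dogs, ["control", "treated"], seed=2019)
for dog, (block, treatment) in sorted(plan.items(), key=lambda kv: kv[1]):
    print(dog, block, treatment)
```

Each block contains every treatment exactly once, so treatment groups end up balanced on the blocking variable while the assignment within a block remains random.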

General data summary and analysis considerations How we summarize or analyze data depends on the objectives of the study and the type of variable of interest. Data summaries are statistical tools used to characterize the data collected through the use of simple descriptive statistics or graphical display. The term analysis will be reserved for formal statistical inference, where the data collected are used to make some determination or decision regarding the population of interest from which the data or sample was drawn.

Variables Variables can be classified as quantitative or qualitative. Quantitative variables have outcomes that are numerical and for which differences between levels (values) are mathematically well defined. For example, for a variable such as parasitic burden measured as a count, the difference between any two values can be calculated through simple arithmetic. Quantitative variables can be further classified as discrete or continuous. Discrete numerical variables have levels that are countable and the scale is described as interval, whereas for continuous variables the levels are not countable and the scale can be defined as interval or ratio. In simple mathematical terms, a continuous measure is one for which, for any two distinct values x and z, where x < z, there always exists a value y between x and z (i.e., x < y < z). Variables such as weight, temperature, and height are examples of continuous variables, while variables such as number of fleas, ticks, mite counts, worm counts, and number of emetic events are examples of discrete variables. On the other hand, qualitative variables have outcomes that are categorical and can be further classified into either nominal or ordinal discrete variables. If the category levels have no inherent order and can be arranged arbitrarily, the variable is nominal [e.g., animal infected (Yes/No), True/False, color]. If the category levels can be ordered by magnitude, it is an ordinal variable (e.g., Mild/Moderate/Severe; Cold/Warm/Hot/Extremely Hot; 0/1–10/11–50/51–100/>100). The difference in magnitude or distance between levels for an ordinal categorical variable is not well defined, that is, the difference between any two levels cannot be quantified on a numerical scale. For example, for scores of lesion severity of mild, moderate, and severe, the difference between severe and moderate is not measurable on a numerical scale, nor is it necessarily the same as the difference between moderate and mild.
Table 4.1 summarizes the different classifications of types of variables.


TABLE 4.1 Types of random variables.

| Quantitative variables | Qualitative variables |
| --- | --- |
| Discrete (e.g., the number of fleas/ticks on host animal; the number of worms at necropsy; fecal egg count) | Ordered categorical/Ordinal [e.g., mite burden (0, 1–10, 11–100, >100); body condition score; lesion severity score (none, mild, moderate, severe)] |
| Continuous [e.g., body weight (kg or lb); body temperature] | Categorical/Nominal [e.g., mites present (Yes/No); clinical sign (normal, abnormal)] |

Data summaries Data summaries may be in the form of tables or graphs. The method of summary should be consistent with the type of variable to be summarized. For example, qualitative data are often summarized by frequency distributions, in which the counts and relative frequencies (or percentages) are given for each level of the variable for each treatment group. This can be displayed in a table or in a suitable graph, for example, a bar chart where treatment groups are represented as bars along the x-axis and the height of each bar on the y-axis corresponds to the relative frequency or percentage for that treatment group. Typical summaries for quantitative variables will include calculation of estimates of central tendency such as mean, median, or mode, and calculation of estimates of variation such as the standard deviation, range (minimum and maximum value), or interquartile range. Percentile summaries of the distribution of the data are also sometimes used, such as a 5-number summary which includes the minimum, 25th percentile (P25), 50th percentile (P50 = median), 75th percentile (P75), and maximum. These three percentiles are also sometimes referred to as quartiles, Q1, Q2, and Q3, respectively, for the distribution. For visual display, the quartiles can be used to construct box and whisker plots, or box plots, as a visual summary of the distribution for a variable. The unit of variation of the box plot is the interquartile range, defined as the difference Q3 − Q1, while the box itself represents the middle 50% of the distribution. Distributions for quantitative variables are also sometimes summarized with histograms (bar chart analog for quantitative variables). How relationships between variables are summarized also depends on the type of variables they are. If the variables co-vary and the aim is to describe the relationship between them, correlation coefficients provide a quantifiable measure of association between two variables. For example, in


the case of a linear relationship between two quantitative variables, the strength of the linear relationship can be measured with Pearson’s correlation coefficient. If the two variables are not necessarily related linearly, but may be nonlinearly related, or one or both variables are ordinal, then Spearman’s correlation coefficient offers a rank-based measure of association. Generally, measures of association are bounded by −1 and +1. The closer in absolute value to 1, the stronger the relationship between the two variables, and the closer the value is to 0, the weaker the relationship. The relationship between two qualitative variables may be summarized by a simple cross-tabulation of the two variables, or contingency table, which summarizes the bivariate distribution of the two variables. Each cell represents a level of variable 1 and a level of variable 2 and includes the frequency or count as well as a relative frequency or percentage. Rows and columns are summarized with row/column totals and relative frequencies that summarize the marginal or univariate distributions of the variables. Graphically, the relationship between two quantitative variables can be summarized with a scatter plot. Scatter plots help to describe what underlying trends characterize the relationship between two variables. For instance, if the scatter plot for two quantitative variables shows a linear trend, then Pearson’s correlation coefficient can be used to measure the strength of that linear association. Box plots may also be used to describe the distribution of one variable within the levels of the second. For relationships between more than two variables, contour plots, 3-D plots, and other methods not mentioned here may be utilized. Table 4.2 lists different types of variables and corresponding summary methods.
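As an illustration of the summaries discussed in this section, the sketch below computes a 5-number summary with the interquartile range and contrasts Pearson’s and Spearman’s coefficients on a monotone but nonlinear relationship. It uses only the Python standard library; the worm counts and dose-response values are hypothetical, and the "inclusive" quartile convention is one of several in use, so other software may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3, max, and IQR for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "Q1": q1, "median": q2, "Q3": q3,
            "max": max(values), "IQR": q3 - q1}

def pearson(x, y):
    """Pearson correlation: strength of the linear association."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(midranks(x), midranks(y))

worm_counts = [0, 1, 2, 4, 7, 12, 20, 35, 90]   # hypothetical, right-skewed
summary = five_number_summary(worm_counts)

dose = [1, 2, 3, 4, 5, 6]
response = [d ** 3 for d in dose]               # monotone but not linear
r_pearson = pearson(dose, response)
r_spearman = spearman(dose, response)
print(summary)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because the hypothetical response increases with every increase in dose, the rank-based coefficient equals 1, whereas Pearson’s coefficient is below 1 because the relationship is not a straight line.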

Statistical analysis Statistical analysis as presented in this chapter generally refers to some form of statistical inference. In statistical inference, we are concerned with making a decision or determination regarding parameters, which are single-valued population characteristics of interest to be estimated. The corresponding single-valued characteristic of a sample drawn from the population, or of the data collected, is defined as a statistic. For example, the mean, median, and standard deviation of the population are all parameters, while the mean, median, and standard deviation of a sample from the population are all examples of statistics. In addition, any function of a statistic is also a statistic. The objectives of the study drive the statistical hypotheses to be tested, where a hypothesis is defined as a statement regarding the parameter(s) of interest. Typically, the data collected are used to test whether they support rejecting the null hypothesis (H0), representing the status quo, in favor of the complementary statement, or alternate hypothesis (Ha), at the stated significance level. The significance level, or level of significance of the test,


TABLE 4.2 Types of variables and corresponding summaries.

| Variable type | Summary statistics | Graphical methods |
| --- | --- | --- |
| Qualitative | Frequency, percent, cumulative frequency for each level | Bar chart; pie chart |
| Quantitative | Central tendency: mean, median, mode. Location: kth percentile (Pk), quartiles [Q1 = P25, Q2 = P50 (median), Q3 = P75], min, max. Variation: standard deviation, range, IQR, CV | Histogram; stem-and-leaf plot; dot plot; box plot |
| 2 Qualitative variables. Note: 1. Frequently the first variable is an outcome variable and the second an independent variable (e.g., treatment/dose/formulation or time). 2. Variables are both dependent variables and covary | 1. Frequency tables (counts and percentages) for the first variable for each level of the second variable. 2. Contingency table (cell, row, and column frequencies and percentages) | Bar chart for the first variable at each level of the second |
| Quantitative outcome, qualitative independent variable (e.g., treatment) | Calculate summary statistics for the dependent variable by the levels of the independent variable | Plots for a single quantitative variable, plotted by the levels of the independent variable |
| 2 Quantitative variables | Calculate summary statistics for each variable. Calculate Pearson and/or Spearman correlation | Scatter plot |

CV, coefficient of variation; IQR, interquartile range.

denoted by α, is the fixed probability or risk we are willing to take that the null hypothesis is falsely rejected. Below are a few examples of different pairs of hypotheses for illustration.
H0: No difference between Control versus Therapeutic for mean flea count at Day 2.
Ha: There is a difference between Control versus Therapeutic for mean flea count at Day 28.


H0: The mean tick count for dogs treated with the investigational product is inferior to the positive control product at Day 60 of treatment.
Ha: The mean tick count for dogs treated with the investigational product is noninferior to the positive control product at Day 60 of treatment.
H0: There is no difference between the LD50 values for compounds 1 and 2.
Ha: There is a difference between the LD50 values for compounds 1 and 2.
H0: Formulations 1 and 2 of the drug product are not bioequivalent for maximum drug concentration (Cmax).
Ha: Formulations 1 and 2 of the drug product are bioequivalent for maximum drug concentration (Cmax).
H0: No relationship between variables X and Y.
Ha: There is a relationship between variables X and Y.
To test the hypothesis, a suitable test statistic is calculated based on the data collected from a random sample and compared to the theoretical probability distribution of the test statistic under the null hypothesis (H0). The P-value, defined as the probability, under H0, of observing a value of the test statistic as extreme as or more extreme than the value obtained from the observed data, is calculated. If the P-value is ≤ α, then H0 is rejected in favor of Ha. Table 4.3 summarizes the possible outcomes and probabilities for a cross-tabulation of underlying (unknown) truth versus decision made regarding H0. The probabilities for the two possible errors in testing hypotheses are α and β, where α was defined above as the probability of making a type I error (rejecting H0 when H0 is true). The second, β, is the probability of making a type II error (failing to reject H0 when Ha is true). In practice, α is fixed to a value (most commonly 0.05, or 5%). In contrast, β, which decreases as sample size or effect size increases, is controlled by selecting a sample size that keeps it at a suitably low level. The power of the test, which is the probability of correctly rejecting H0, is 1 − β. Generally, we want to design studies with small α and β.
To control β, we typically ensure that the experiment is adequately replicated at the EU level for a desirable level of power (typically 80%–90%). As part of the planning

TABLE 4.3 Outcomes (probability) in hypothesis testing.

| Truth | Decision: H0 | Decision: Ha |
| --- | --- | --- |
| H0 | Correctly fail to reject H0 (1 − α) | Falsely reject H0 (α) |
| Ha | Falsely fail to reject H0 (β) | Correctly reject H0 (1 − β) |


stages of an experiment, once the hypothesis of interest is established, calculations are done to determine what sample size is needed (at the EU level). The method of analysis used determines what the test statistic is, which is used to evaluate how sample size changes for a desired level of power under different assumptions for effect size and variance. The selection of sample size is based on this exercise. Some further discussion on this topic will be reserved for later in the chapter. Traditional hypotheses for comparing two populations are sometimes stated for the parameters of interest (means for quantitative variables or proportions for binary outcome variables) in terms of the difference. Most commonly it is of interest to test the null hypothesis that the parameters for two populations of interest are equal versus whether they are different, or alternatively stated,
H0: Difference in parameters = 0, against
Ha: Difference in parameters ≠ 0
Sometimes it may be of interest to test for the similarity of two products (groups). This can be two-sided equivalence or one-sided equivalence (no worse than standard product), called noninferiority. We may also want to demonstrate superiority of one product relative to another. These three types of hypothesis all involve a prespecified equivalence margin, delta, for which these hypotheses are to be tested. See the hypotheses displayed below, where difference = group 1 − group 2 for the parameter of interest, and for the sake of presentation, we will assume that group 1 > group 2 implies group 1 is better than group 2.
Equivalence
H0: Groups are not equivalent, that is, |difference| ≥ delta
Ha: Groups are equivalent, that is, |difference| < delta
Noninferiority
H0: Test is inferior to standard, that is, difference ≤ −delta
Ha: Test is not inferior to standard, that is, difference > −delta
Superiority
H0: Test is no better than standard, that is, difference ≤ 0
Ha: Test is superior to standard, that is, difference > 0
In addition to P-values, confidence intervals can also be used to test hypotheses. Fig. 4.1 illustrates how confidence intervals for the difference can be used to test the four pairs of hypotheses presented above. Additional discussion of these types of tests and the choice of delta is provided in several guidance documents.6,7 Statistical inference in its simplest form involves a single variable and parameter of interest (e.g., hypothesis about population mean). In most experimental situations in this book, a single response variable may be of interest and two or more groups/populations (treatments) of interest are being


FIGURE 4.1 How confidence limits can be used to test hypotheses: In each case the dot symbolizes point estimate of the difference between two groups/treatments (means, proportions, etc.). The chart on top (A) demonstrates possible outcomes for the difference and corresponding confidence limits for simple tests. The chart on the bottom (B) illustrates how confidence limits can be used to test hypothesis for equivalence, noninferiority, and superiority.

compared. In the cases of quantal/bioassay experiments or dose-ranging or dose determination experiments, the relationship between the response (mortality or efficacy end-point) and dose (level of drug) is of interest. While simple comparisons involving two populations can be done using fairly simple univariate tests (t-test, Mann–Whitney U-test, signed-rank test, etc.), when more than two populations are to be compared, or multiple independent variables or factors are involved, statistical models are used to test hypotheses. Indeed, the more complex the experimental design, the more likely some form of a statistical model is needed to appropriately analyze the data.
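The confidence-interval approach to the difference, equivalence, noninferiority, and superiority hypotheses described in this section can be written as a small set of decision rules. The sketch below is illustrative only: the function name is hypothetical, it assumes difference = group 1 − group 2 with larger values being better, and it assumes the caller has already computed confidence limits at a level appropriate to each test, since the appropriate level differs between one-sided and two-sided hypotheses.

```python
def ci_conclusions(lower, upper, delta):
    """Decide which alternative hypotheses a confidence interval supports.

    lower, upper: confidence limits for the difference (group 1 - group 2).
    delta: prespecified, positive equivalence/noninferiority margin.
    """
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if delta <= 0:
        raise ValueError("delta must be positive")
    return {
        # Ha: difference != 0 is supported when the interval excludes 0.
        "different": lower > 0 or upper < 0,
        # Ha: |difference| < delta needs the whole interval inside (-delta, delta).
        "equivalent": -delta < lower and upper < delta,
        # Ha: difference > -delta needs the lower limit above -delta.
        "noninferior": lower > -delta,
        # Ha: difference > 0 needs the lower limit above 0.
        "superior": lower > 0,
    }

# Hypothetical confidence limits with a margin of 3 units.
print(ci_conclusions(-1.0, 2.0, delta=3.0))   # equivalent and noninferior
print(ci_conclusions(0.5, 4.0, delta=3.0))    # different, noninferior, superior
print(ci_conclusions(-4.0, -1.0, delta=3.0))  # different, but not noninferior
```

Note that an interval can support more than one conclusion at once; for example, an interval lying entirely between 0 and delta supports both superiority and equivalence.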

Statistical models In an experiment, what is primarily of interest is the relationship between the outcome variable, or dependent variable, and the independent variable(s) (e.g., treatment, dose, concentration, formulation, and time). The outcome variable or response variable is what is being observed and is a random variable. In contrast, the levels of the independent variables, sometimes called factors in experimental design, are controlled. The terms outcome variable, dependent variable, and response variable are all interchangeable. Similarly, independent variables are also referred to as explanatory variables or predictor variables. Appendix 4.1 summarizes a variety of scenarios for type of response variable, type of independent variable, the objective (hypothesis),


and type of tests (if simple) or statistical model that can be used. This is by no means an exhaustive list, and most of these methods are covered in statistical textbooks.

Transformations Statistical tests typically rely on assumptions such as independence, an underlying probability distribution (the normal distribution in many cases), homoscedasticity (constant variance across populations), a functional relationship between two variables in regression, etc. If the assumptions are not satisfied, then the validity of the results of the test (P-values, confidence limits, etc.) is weakened. Some tests may be more robust to violations of some assumptions than others. For example, many tests assuming normality (t-tests, ANOVA, linear regression, etc.) are robust to mild departures from normality and homoscedasticity but may be affected by serious departures.8 In such situations, transformations are often used as a remedial measure to address one or both of these issues. Square-root and log-transformations are often used for unbounded continuous data. For proportions bounded by 0 and 1, the arcsine square-root transformation is often used to aid in variance stabilization. In parasitology, the end points of interest are often counts, which are frequently skewed in distribution, and a log-transformation is often used, as right-skewed counts of these types are often assumed to be log-normally distributed. An alternative to transformations is to use generalized linear models,9 which model data using an appropriate probability distribution from the exponential family of distributions and an associated link function that relates the mean to a linear function of the independent (predictor) variables. The normal distribution is a member of this family of distributions and has the identity link (i.e., the mean is the linear function of the independent variables). For dichotomous outcomes, a binomial distribution is often used with a logit link; for count data, the Poisson and negative binomial distributions are sometimes considered with a log link.
While this approach is not widely used, its use has been proposed by several authors10,11 for parasitology data. Software for these models is available but is not as familiar as that for the more common general linear models. When transformations and other remedial measures do not adequately address violations of the assumptions for linear models (t-tests, ANOVA, etc.), nonparametric methods of statistical testing offer an alternative.
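The effect of the transformations mentioned above can be illustrated on a small set of right-skewed counts. The sketch below is a minimal example using the Python standard library; the counts are hypothetical, the helper names are not from the chapter, and adding 1 before taking logs is an assumed convention for handling zero counts rather than something prescribed here.

```python
import math
import statistics

def log_transform(counts):
    """Natural log of (count + 1); the +1 is an assumed convention for zeros."""
    return [math.log(c + 1) for c in counts]

def sqrt_transform(counts):
    """Square-root transformation for counts."""
    return [math.sqrt(c) for c in counts]

def arcsine_sqrt(proportion):
    """Arcsine square-root transformation for a proportion in [0, 1]."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))

def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

counts = [0, 1, 2, 2, 3, 5, 8, 13, 40, 150]   # hypothetical parasite counts
skew_raw = skewness(counts)
skew_sqrt = skewness(sqrt_transform(counts))
skew_log = skewness(log_transform(counts))
print(round(skew_raw, 2), round(skew_sqrt, 2), round(skew_log, 2))
```

For these hypothetical counts, skewness is reduced by the square-root transformation and reduced further by the log-transformation, which is the pattern the transformations are intended to produce; whether the residuals are adequately symmetric after transformation still needs to be checked for each data set.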

Parametric versus nonparametric statistics Statistical tests that rely on the assumption of an underlying probability distribution are called parametric tests. Nonparametric tests are a separate class of tests that do not require distributional assumptions. In general, when parametric tests are appropriately used, that is, with no serious violations of the


TABLE 4.4 Commonly used parametric tests and corresponding nonparametric alternatives.

| Parametric test | Nonparametric test |
| --- | --- |
| t-Test for 1 group (population) | Sign test |
| t-Test for 2 independent groups (populations) | Wilcoxon rank-sum test or Mann–Whitney U-test |
| Paired test for 2 dependent groups | Signed-rank test; McNemar’s test |
| One-way ANOVA | Kruskal–Wallis test |
| Two-way ANOVA with block as one of the 2 factors | Friedman test |
| Pearson correlation test of association (specifically for linear association between quantitative variables) | Spearman correlation test of association between two variables (variables must be at least ordinal; the functional relationship is not necessarily linear) |

assumptions, then they provide more statistical power. Table 4.4 summarizes some commonly used parametric tests and a corresponding nonparametric alternative for each. These tests can be found in most statistical textbooks. This is not intended to be a comprehensive list.

Multiple testing A single hypothesis test has an associated probability of a type I error, α. Multiple testing occurs when several simultaneous hypothesis tests are performed for the same variable, or over multiple variables. An important issue that arises in multiple testing is the lack of control over the overall type I error rate across all comparisons/testing done, or the experimentwise type I error rate, which is the probability of making a type I error in at least one comparison for an experiment. This issue is highly prevalent across all areas of science and is largely ignored in some. The simplest example of this is a one-way ANOVA with k groups to be compared. If all pairwise comparisons are performed, then there are k(k − 1)/2 possible comparisons. With a comparison-wise error rate (error rate for a single comparison) of α = 0.05 and no adjustment, the overall error rate for m independent comparisons is 1 − (1 − α)^m, which is already 0.2649 for m = 6 and, treating the comparisons as independent, about 0.54 for the m = 15 pairwise comparisons among k = 6 groups. As k increases, the number of comparisons increases and so too does the overall error rate. Hence the general recommendation to minimize the number of post hoc comparisons. Multiple comparison methods/procedures provide a means for conducting these types of comparisons and controlling the overall type I error rate and are available in most commercially available


statistical software where ANOVA-type analyses are included. More details on multiple comparison procedures can be found in statistical textbooks on experimental design and analysis, such as Montgomery.8
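The growth of the overall type I error rate with the number of comparisons can be checked with a few lines of arithmetic. The sketch below assumes the comparisons are independent, which pairwise comparisons among the same groups are not, so the values are approximations. The Bonferroni adjustment shown is one simple, widely known multiple comparison procedure; it is included as an illustration and is not a method singled out by this chapter.

```python
def n_pairwise(k):
    """Number of pairwise comparisons among k groups: k(k - 1)/2."""
    return k * (k - 1) // 2

def overall_error_rate(alpha, m):
    """Probability of at least one type I error in m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-comparison significance level under a Bonferroni adjustment."""
    return alpha / m

alpha = 0.05
for k in (3, 4, 6, 10):
    m = n_pairwise(k)
    unadjusted = overall_error_rate(alpha, m)
    adjusted = overall_error_rate(bonferroni_alpha(alpha, m), m)
    print(f"k={k:2d}  comparisons={m:2d}  "
          f"unadjusted={unadjusted:.4f}  bonferroni={adjusted:.4f}")
```

With the adjustment, the overall rate stays at or just below the nominal 0.05 regardless of the number of comparisons, at the cost of a stricter threshold for each individual comparison.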

Fixed and random effects In ANOVA-type models, the effects associated with factors can be regarded as fixed or random. Fixed effects are effects for which inference is directly of interest for only the levels of the factor observed (e.g., treatment, gender, day of study, and parity). Simple one-way ANOVA models, and linear or nonlinear regression models with a single independent variable, are all examples of fixed effects models. However, in some experimental situations the levels of a factor may not themselves be of interest; instead, the levels used represent a sample of possible levels from a larger population about which we wish to make inference (e.g., clinic, block, room, household, and animal). Effects corresponding to factors such as these are called random effects. Models for study designs that contain both fixed and random effects are called mixed effects models. In these models the random effects are treated as normally distributed random variables with zero mean and unknown variance. Mixed effects models can be linear or nonlinear. General linear mixed models assume normally distributed errors; they are a subclass of the larger class of generalized linear mixed models, which allow for modeling probability distributions for both quantitative and qualitative random variables that come from the exponential family of probability distributions.

Repeated measurements Repeated measurements are used to refer to multiple observations of the same variable performed on the same subject. This is a form of pseudo-replication in time. Because these measurements are from the same individual, these values are inherently correlated, so any comparisons within subject should adjust or account for this autocorrelation within subject. Paired data are a special case of this when two time points are considered for comparison. When more time points are considered, repeated measures can be modeled using mixed linear models which allow for modeling different covariance structures. SAS (SAS v9.4, Cary, NC) procedures MIXED for general linear mixed models and GLIMMIX for generalized linear mixed models offer a variety of options for modeling this type of data for normal and nonnormal distributions, respectively.

More on sample size and power Appendix 4.1 lists several methods for analysis based on the type of variables considered. Sample size and power calculations can be done using


commercially available software such as nQuery,12 which covers a wide variety of tests for simple experimental designs. There are also websites that offer online tools for simple sample size and power calculations. SAS13 procedures such as POWER and GLMPOWER also provide a means for performing this exercise for a wide variety of tests. High14 illustrates the use of these procedures for general linear models. For more complicated study designs with mixed effects, such as multicenter trials with repeated measurements or with pen as the EU, more information is needed15 and simulations may be performed to estimate sample size in terms of number of sites, number of pens, and number of animals per pen. As an alternative to simulations, Stroup16,17 illustrates how the MIXED and GLIMMIX procedures in SAS can be used to do these types of calculations for mixed effects models.
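For the simplest case, a two-sided comparison of two group means, the relationship between sample size, power, effect size, and variability can be sketched with a textbook normal approximation, n per group = 2(z_{1−α/2} + z_{1−β})²σ²/Δ². This is not the method implemented by the software named above, which generally uses exact or t-based calculations and so may return slightly larger sample sizes; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sigma, alpha=0.05, power=0.80):
    """Approximate EUs per group for a two-sided, two-group comparison of means.

    difference: smallest true difference in means worth detecting.
    sigma: assumed common standard deviation among experimental units.
    Uses the normal approximation n = 2 * ((z_alpha + z_beta) * sigma / difference)^2.
    """
    if difference == 0 or sigma <= 0:
        raise ValueError("difference must be nonzero and sigma positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile for power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / difference) ** 2)

# Hypothetical planning values on the analysis scale.
for power in (0.80, 0.90):
    for difference in (1.0, 0.5):
        n = n_per_group(difference, sigma=1.0, power=power)
        print(f"power={power:.2f}  difference={difference}  n per group={n}")
```

Halving the difference to be detected roughly quadruples the required number of EUs per group, which is why the assumed effect size and variance deserve as much attention as the choice of α and power.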

Choice of estimator for the mean (arithmetic mean vs geometric mean) Abbott’s formula for percent efficacy features prominently in the evaluation of efficacy of antiparasitic products. What estimator should be used for the mean has become the focus of some debate. In practice both arithmetic means and geometric means are used in Abbott’s formula18 as provided below.

$$\%\ \text{Efficacy} = \frac{C - T}{C} \times 100,$$

where C is the mean parasitic burden for the Control Group and T is the mean parasitic burden for the Treated Group. In studies where a negative control is not present, C is the mean parasitic burden at pretreatment and T is the mean parasitic burden posttreatment. Which estimator is used depends on any one or more of the following: parasite of interest, guidelines, regulatory agency, and objectives of the study.11 For anthelmintics, WAAVP (see Appendix A of Ref. [19]) recommends the use of geometric means for both laboratory and field studies. This is also consistent with the VICH guidelines [Guidance for the Industry 90, VICH GL7 (2001)] for anthelmintics.20 The Center for Veterinary Medicine (CVM) continues to support the use of geometric means for laboratory and field studies for endoparasites and field studies for ectoparasites, all conditions where distributions of parasitic burdens are expected to be skewed. For laboratory cat and dog studies for fleas and ticks the European Medicines Agency (EMA) recommends the use of arithmetic means.21 Other regulatory agencies, including CVM, are already starting to move in this direction for cat and dog flea and tick laboratory studies. EMA22 argue that for studies of this type where the infestation is uniform (fixed number of parasites administered per challenge) counts will be normally distributed prior to treatment and normally distributed for control animals after treatment, but not necessarily for treated animals. EMA cite McKenna23 who reported that

geometric-based efficacy tends to mask treatment failures. They also cite the publication from Dobson et al.24 which demonstrated through Monte Carlo simulations that geometric means can produce upwardly biased estimates of % efficacy when conducting a fecal egg count reduction test, whereas arithmetic mean efficacy estimates are closer to the true efficacy. The results of that paper suggest that the arithmetic mean should be preferred for estimating % efficacy, even for skewed distributions, and that this conclusion can be extended to parasite count data. The simulations reported by Dobson et al.24 make very specific assumptions about the probability distribution of counts, using the Poisson and negative binomial distributions. Simulations were also done by Smothers et al.25 assuming a log-normal distribution for counts based on empirical data; these demonstrated superiority of the geometric mean over the arithmetic mean as a measure of central tendency. Data for some species of worm counts for cattle were confirmed to be log-normal through chi-square tests, and bootstrap simulations were used to confirm theoretical expectations. The efficacy of antiparasitic products is based both on % efficacy and on statistical inference on the means. The focus on % efficacy puts much emphasis on the selection of the point estimator (arithmetic mean or geometric mean) and perhaps not enough on statistical inference. Arithmetic means are appropriate estimators for completely randomized designs when the residuals are not skewed. If any randomization restrictions are put in place, as they often are through some form of blocking, or if there is lack of balance, then there is a potential for the arithmetic means to be biased estimators of the mean. Using model-based means such as least squares means from a general linear model reduces this bias, and the residuals from the models should determine whether model-based means on the original scale or on the log-transformed scale are appropriate, or whether another approach is needed.
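The practical consequence of the choice of estimator can be seen in a small sketch. The counts below are hypothetical and chosen so that one treated animal is an apparent treatment failure; the geometric mean is computed as the back-transformed mean of ln(count + 1) so that zero counts can be handled. The function names are illustrative only.

```python
import math


def arithmetic_mean(counts):
    return sum(counts) / len(counts)


def geometric_mean(counts):
    # Back-transformed mean of ln(count + 1); tolerates zero counts.
    logs = [math.log(c + 1) for c in counts]
    return math.exp(sum(logs) / len(logs)) - 1


def percent_efficacy(control_mean, treated_mean):
    # Abbott's formula: (C - T) / C x 100
    return (control_mean - treated_mean) / control_mean * 100


# Hypothetical parasite counts (not from any study in this chapter).
control = [120, 95, 150, 80, 110, 130]
treated = [0, 0, 1, 0, 2, 45]   # the last animal is an apparent treatment failure

am_efficacy = percent_efficacy(arithmetic_mean(control), arithmetic_mean(treated))
gm_efficacy = percent_efficacy(geometric_mean(control), geometric_mean(treated))
print(f"arithmetic: {am_efficacy:.1f}%  geometric: {gm_efficacy:.1f}%")
```

With skewed treated-group counts like these, the geometric mean is pulled less by the single high count, so the geometric efficacy is the higher of the two, which is the masking effect described by McKenna.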

In vitro method(s)

Statistical methodology for in vitro tests as introduced in Chapter 1 in Volume 1, Defining in vitro parasiticide screening and test methods, is covered in this section. These studies typically involve exposing the organism of interest to control and test materials under well-controlled laboratory conditions outside of a biological system (host-target or suitable small animal model), using vessels, substrates, etc. With regard to experimental design, blocking can be an especially useful tool to control variability and the distribution of treatments for in vitro studies26,27; the following are all specific types of experimental designs that employ blocking and can be applied in in vitro studies: randomized complete block design (RCBD) (one replicate per block per treatment), incomplete block design (not all treatments represented in each block), generalized block design (multiple replicates per treatment per block), Latin square designs, and split-plot designs. The choice of the design generally depends on the

experimental conditions and on which variables/factors are potential nuisance variables that should be controlled for. Factorial designs, such as 2^k factorial or fractional factorial designs,8,26 are also sometimes used when trying to determine a small number of important factors impacting the response among multiple factors under consideration. If the data are to be analyzed statistically, then it is also important that the EUs are randomized to give validity to the tests. Identification of the correct EU can sometimes be challenging because, for in vitro experiments, the experimental design may be hierarchical in nature (multiple analysts, days, plates/runs, vials, etc.). Improper replication of the EU can lead to misleading results in analysis. Several authors stress the importance of distinguishing between true replication of the EU and pseudo-replication.27–29 If pseudo-replication is present, it needs to be accounted for in the statistical model so that the correct statistical tests can be conducted. Alternatively, the mean of the pseudo-replicates of the EU response can be calculated as an estimator of the EU level response, and those values can be used in the statistical model. In Chapter 1 of Volume 1, Defining in vitro parasiticide screening and test methods, some outcome variables of interest included morbidity, mortality, and repellency. When mortality is the outcome of interest, statistics of interest may include the dose/concentration of parasiticide required to kill 50% or 90% of the parasites (population) under test, LD50 and LD90, respectively. Also of interest may be the corrected mortality at each concentration, $m_{\text{corr}}$,30 listed below. LD-values and $m_{\text{corr}}$ values both come from an appropriate regression method for estimating mortality as a function of concentration.
\[ m_{\text{corr}} = \frac{m_o - m_c}{100 - m_c} \times 100 \]

where $m_{\text{corr}}$ is the corrected mortality at each concentration tested (in percent); $m_o$ is the mean observed mortality in the treated groups (in percent); $m_c$ is the mean observed mortality in the control groups (in percent). Percent efficacy can also be expressed in terms of mortality-based mean counts of organisms using Abbott’s formula18 below:

\[ \%\,\text{Efficacy} = \frac{C - T}{C} \times 100 \]

where C is the mean number of organisms in the Control Group and T is the mean number of organisms in the Treated Group. The mean estimates used in practice may be simple arithmetic means or geometric means. Model-based mean estimates are also sometimes used. A few options for modeling include probit regression, logit regression, and nonlinear regression with mortality as the response. Software for these types of models is available on multiple platforms (SAS, R, SPSS, MINITAB, etc.). For the examples in this chapter, SAS will be the software of choice.
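The correction for control mortality defined above is simple to compute directly. The following is a minimal sketch; the function name and the input percentages are hypothetical and serve only to illustrate the arithmetic.

```python
def corrected_mortality(m_obs, m_ctrl):
    """Corrected mortality (percent) given observed and control mortality (percent)."""
    if not 0 <= m_ctrl < 100:
        raise ValueError("control mortality must be in [0, 100)")
    return (m_obs - m_ctrl) / (100 - m_ctrl) * 100


# Hypothetical values: 60% observed mortality with 20% mortality in controls.
print(corrected_mortality(60, 20))
```

When the treated and control mortalities are equal the corrected mortality is zero, and when control mortality is zero the observed mortality is returned unchanged.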

TABLE 4.5 Number of hatched larvae.

| Test compound | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Arithmetic mean | Geometric mean^a |
|---|---|---|---|---|---|---|
| Control | 22 | 21 | 20 | 18 | 20.3 | 20.2 |
| 0.1 ppm | 20 | 19 | 13 | 23 | 18.8 | 18.4 |
| 1 ppm | 21 | 18 | 15 | 21 | 18.8 | 18.6 |
| 10 ppm | 12 | 10 | 8 | 17 | 11.8 | 11.3 |
| 100 ppm | 9 | 5 | 6 | 10 | 7.5 | 7.2 |
| 1000 ppm | 3 | 2 | 2 | 6 | 3.3 | 3.0 |
| 10,000 ppm | 1 | 0 | 9 | 2 | 3.0 | 1.8 |

^a Since the mathematical definition is not defined for 0-values, the Williams geometric mean,31 defined as the back-transformed mean for log-transformed counts, ln(count + 1), is used.
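The two mean columns of Table 4.5 can be recomputed from the replicate counts. The sketch below assumes the Williams geometric mean is exp(mean of ln(count + 1)) − 1, as described in the table footnote; the variable and function names are illustrative.

```python
import math

# Replicate counts of hatched larvae from Table 4.5.
table_4_5 = {
    "Control":    [22, 21, 20, 18],
    "0.1 ppm":    [20, 19, 13, 23],
    "1 ppm":      [21, 18, 15, 21],
    "10 ppm":     [12, 10, 8, 17],
    "100 ppm":    [9, 5, 6, 10],
    "1000 ppm":   [3, 2, 2, 6],
    "10,000 ppm": [1, 0, 9, 2],
}


def williams_geometric_mean(counts):
    # Mean on the ln(count + 1) scale, back-transformed and shifted by -1.
    mean_log = sum(math.log(c + 1) for c in counts) / len(counts)
    return math.exp(mean_log) - 1


summary = {
    name: (sum(reps) / len(reps), williams_geometric_mean(reps))
    for name, reps in table_4_5.items()
}
for name, (am, gm) in summary.items():
    print(f"{name:>10}: arithmetic {am:.2f}, geometric {gm:.2f}")
```

The difference between the two estimators is largest in the 10,000 ppm row, where one replicate count is zero and another is comparatively high.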

Flea egg eclosion (ovicidal activity), dose determination example: Twenty-five newly deposited flea eggs were placed on treated filter paper, with four replicates per compound. Exposure was for 1 h, after which the eggs were placed in flea rearing media and incubated at 28 °C. The number of hatched larvae was determined 72 h postincubation. See Table 4.5. The objective is to estimate LD50 and LD90. In this experiment the filter papers are assumed to be pseudoreplicates within treatment. For the statistical model, to remove the pseudoreplicate level, the means across reps are calculated for each treatment group. If the objective were to evaluate percent efficacy at each level of the compound, then the mean treatment responses could be modeled using a general linear model. Since the objective is to estimate LD50 and LD90, a more appropriate model is a regression of the percent response as a function of dose, which also facilitates the inverse estimation of the concentration at fixed levels of the response (hatch inhibition). A graph of the mean egg hatch inhibition as a function of log-dose confirms that the data display a trend that is consistent with a sigmoid response. A probit or logit regression can be used to model a sigmoid relationship and allows for the estimation of LD50 and LD90. For these data, the SAS procedure PROBIT was used to fit the model for both the probit (cumulative normal) and the logit distributions. From Table 4.6 the estimates for LD50 are fairly close for the probit and logit models (8.05 vs 7.96, respectively), with 95% confidence limits that are very similar. For LD90 the point estimates appear to be slightly different for the probit and logit models (6826 vs 7584, respectively), with the 95% confidence limits for the logit model being wider than those for the probit model.
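The idea of fitting a logit curve on log-dose and inverting it for LD50 and LD90 can be sketched without SAS. The code below is not PROC PROBIT and makes assumptions that are not stated in the text: the four replicates of 25 eggs are pooled to 100 eggs per concentration, dose enters as log10(ppm), the control row is left out, and no natural response parameter is fitted. It is therefore not expected to reproduce Table 4.6 exactly, and it gives point estimates only, without confidence limits.

```python
import math

# Row totals of hatched larvae from Table 4.5 (control row excluded).
# Assumption: replicates are pooled, giving 100 eggs per concentration.
doses_ppm = [0.1, 1, 10, 100, 1000, 10000]
hatched = [75, 75, 47, 30, 13, 12]
eggs_per_dose = 100
inhibited = [eggs_per_dose - h for h in hatched]   # eggs that failed to hatch
log_dose = [math.log10(d) for d in doses_ppm]


def fit_logit(x, events, n, iterations=50):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson (Fisher scoring)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, events):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - n * p          # contribution to the score vector
            w = n * p * (1.0 - p)       # contribution to the information matrix
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def lethal_dose(b0, b1, fraction):
    """Invert the fitted curve: dose (ppm) at which the response equals `fraction`."""
    logit = math.log(fraction / (1.0 - fraction))
    return 10 ** ((logit - b0) / b1)


b0, b1 = fit_logit(log_dose, inhibited, eggs_per_dose)
ld50 = lethal_dose(b0, b1, 0.50)
ld90 = lethal_dose(b0, b1, 0.90)
print(f"LD50 ~ {ld50:.2f} ppm, LD90 ~ {ld90:.0f} ppm")
```

A probit fit would replace the logistic function with the cumulative normal distribution, and adjusting for natural response would add a third parameter for the background rate.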

TABLE 4.6 Results from probit/logit regression analysis.

| Distribution | Control included Y(es)/N(o) | LD-value | Point estimate (95% confidence limits) | Pearson chi-square P-value |
|---|---|---|---|---|
| Probit | No | LD50 | 8.05 (2.03, 26.1) | 0.7061 |
| | | LD90 | 6826 (1168, 154,408) | |
| Logit | No | LD50 | 7.96 (2.00, 26.28) | 0.7472 |
| | | LD90 | 7584 (1141, 269,807) | |
| Probit | Yes | LD50 | 35.89 (3.95, 156.87) | 0.6732 |
| | | LD90 | 7584 (1428, 177,472) | |
| Logit | Yes | LD50 | 33.66 (3.93, 146.38) | 0.7221 |
| | | LD90 | 7487 (1280, 297,620) | |

In all cases the Pearson chi-square P-value testing for lack-of-fit is not significant (P ≥ 0.67), indicating that there is no lack-of-fit to be concerned about for any of these models. Default options in the PROBIT procedure in SAS exclude zero doses (the control group); however, there is an option (OPTC) that allows for the inclusion of control groups. The last four rows in the table summarize these results. Comparing these to the first four rows, the point estimates for LD50 have all increased to ~35, with a corresponding increase in the width of the 95% confidence limits. The LD90 estimates for the most part have not been unduly impacted. The last four models are essentially corrected for background or natural response, while the models from the first four rows are not. For illustration purposes, Fig. 4.2 displays the estimated logit curve adjusted for natural response (Control) plotted with the observed treatment means.

In vivo method(s)

This section will elaborate on statistical design and analysis aspects of in vivo tests described in Chapter 2 of Volume 1, Defining in vivo parasiticide screening and test methods, and will cover both laboratory studies and field effectiveness studies. Emphasis is given to studies evaluating efficacy using clinical outcomes. Bioequivalence studies using blood-level pharmacokinetic and pharmacodynamic outcomes are not specifically discussed. Bioequivalence studies can be conducted and analyzed as per the available Bioequivalence Guidance documents.32,33

FIGURE 4.2 Estimated logit curve adjusted for natural response (control) plotted with the observed treatment means and 95% confidence bands. [Plot: predicted probabilities for egg inhibition with 95% confidence limits; x-axis: Dose (ppm), 0.1 to 10,000; y-axis: Probability, 0.0 to 1.0; annotation: OPTC = 0.18.]

Laboratory studies

Laboratory studies are conducted during the early to late stages of the drug development process to evaluate preliminary efficacy and safety of the investigational veterinary product (IVP) or compound against a negative control in either the target animal species or an appropriate nontarget animal model. The main study types include dose determination in the earlier phase of drug development and dose confirmation in the later phase of drug development. Use of controlled tests is preferred, and efficacy is typically evaluated by comparing parasite (adult or larvae), tick, or flea counts between pre- and posttreatment time points34 or, more commonly, between treatment groups. Egg counts are typically only used in field efficacy studies. Abbott’s formula,18 as defined previously, is commonly used for determining efficacy. Alternatively, the Henderson–Tilton35 formula can be used in ectoparasitic studies when counts vary over time among control animals and is defined as:

\[ \%\,\text{Efficacy} = 100 \times \left( 1 - \frac{\text{Mean of Treated Group after treatment} \times \text{Mean of Control Group before treatment}}{\text{Mean of Treated Group before treatment} \times \text{Mean of Control Group after treatment}} \right) \]

The mean used in these formulae can be either the arithmetic or the geometric mean. Historically, due to the skewed distribution of egg and worm counts across animals, a log transformation has been applied to these data, as the geometric means give a better measure of central tendency. In recent years,

however, arithmetic means have been preferred as they give a more conservative measure of efficacy. Current regulatory guidance tends to vary based on the regulatory authority, the type of study, and whether the parasite is an endoparasite or ectoparasite; therefore, approaches should be aligned with regulatory requirements and confirmed with the agency concerned, if necessary, during protocol development. For endoparasites, current VICH guidelines recommend the use of geometric means for both laboratory and field studies. Arithmetic means are used for laboratory studies evaluating ectoparasites in both the European Union and the United States; for field studies, however, geometric means are acceptable in the United States, whereas in the European Union justification should be provided for not using arithmetic means. It is recommended to report both arithmetic and geometric means if data will be submitted to multiple regulatory authorities. Data may be analyzed using either parametric or nonparametric methods; analysis methods should be clearly specified in the protocol and agreed with the regulatory agencies, if applicable. To demonstrate efficacy, the % efficacy as calculated above should be ≥90% for most parasites, with a significant difference demonstrated between treated and control animals at a significance level of 0.05. Adequate infestation has to be demonstrated as per the applicable VICH guidelines prior to demonstrating efficacy. A sample size calculation may be performed, although a minimum of six animals per treatment group is needed for laboratory studies. Since the magnitude of difference between treatments (negative control and investigational product) that is considered clinically relevant is high (≥90% reduction in means), six to eight animals should be sufficient, although more may be enrolled to ensure that at least six adequately infected animals are available for calculation of efficacy. RCBDs are the most commonly used experimental designs for laboratory studies.
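The Henderson–Tilton formula introduced earlier in this section can be written as a short function. This is a minimal sketch; the argument names and the example means are hypothetical.

```python
def henderson_tilton(treated_before, treated_after, control_before, control_after):
    """Percent efficacy adjusted for the change in control counts over time."""
    ratio = (treated_after * control_before) / (treated_before * control_after)
    return 100 * (1 - ratio)


# Hypothetical group means: control counts fall from 50 to 40 without treatment.
print(henderson_tilton(treated_before=50, treated_after=5,
                       control_before=50, control_after=40))
```

When the control mean does not change between the two time points, the control terms cancel and the result depends only on the ratio of the treated means after and before treatment.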
Animal is typically the EU since animals are individually housed. For an RCBD, common considerations for blocking are pretreatment body weight, pretreatment egg/ectoparasite counts, age, or housing location. If both sexes are used in the study, balance between sexes is desirable and, if appropriate, randomization should be carried out within each sex. With these general considerations, examples of laboratory studies are presented below. Unless otherwise stated, the data used in these examples are simulated data provided for illustrative purposes. Where possible, references to publications for each study type are included as examples for further reading. Dose determination example using a tick efficacy study in dogs with induced infestation: This example follows the WAAVP guideline for evaluating the efficacy of parasiticides for the treatment, prevention, and control of flea and tick infestations in dogs and cats (see Appendix A of Ref. [36]). Thirty-two dogs are each individually infested with ~50 ticks 4 days prior to treatment (TD −4). Two days later (TD −2), tick counts are recorded for each dog. Dogs are ranked by descending tick count into eight blocks with four dogs

per block. Within each block, dogs are randomly assigned to one of four treatments (untreated control, 0.5×, 1×, and 2×) using a computer-generated randomization program implemented using SAS software (example code is provided in Appendix 4.3). Dogs are housed in individual dog runs. The study design is an RCBD with animal as the EU and with blocking based on TD −2 tick counts. On TD 0, dogs are treated as per their assigned treatment group; they are then infested with ~50 ticks on TD 0, 5, 12, 19, 26, 33, and 40, and tick counts are conducted 48 h PI on TD 2, 7, 14, 21, 28, 35, and 42. The count data are presented in Appendix 4.2. These data are analyzed, both with and without log transformation [ln(count + 1)], using a general linear mixed model for repeated measures with the fixed effects of treatment, day of count, and their interaction and the random effects of block, block by treatment interaction (the animal term), and residual error. The repeated measures model was used to model the within-subject covariance since multiple observations were collected on each animal over time. Modeling was implemented using the MIXED procedure in SAS (SAS v9.4, Cary, NC). When using repeated measures models, several within-subject covariance structures that may be plausible or biologically reasonable in the context of the study should be considered, for example, unstructured, compound symmetry, heterogeneous compound symmetry, autoregressive, heterogeneous autoregressive, and spatial power.37 The final model may be chosen using various selection criteria; a good choice is to select the model that minimizes the corrected Akaike’s Information Criterion.37,38 Treated group means are compared to the negative control at each time point using a two-sided significance level of α = 0.05.
Treatment least squares means (arithmetic means) or back-transformed least squares means (geometric means), along with percent efficacy and P-values in comparison to negative control, are presented in Table 4.7. In this example the 0.5× dose was not efficacious, whereas the 1× and 2× doses were found to be efficacious for 21 days PT using arithmetic means, and for 28 and 42 days PT, respectively, using geometric means. Data may also be analyzed separately by day of count, rather than using repeated measures, depending on protocol requirements as agreed with the relevant regulatory agencies. In that case, for the above example, a general linear mixed model can be used with a fixed effect of treatment and random effects of block and residual. Finally, the data can also be analyzed using nonparametric methods such as the Kruskal–Wallis test if analyzing data separately for each day of count; this can be implemented using the NPAR1WAY procedure in SAS software. A recent publication with examples of dose determination studies for fleas and ticks is McTier et al.39
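The blocking and randomization step of the dose determination example (rank by descending pretreatment count, form blocks of four, assign the four treatments at random within each block) can be sketched as follows. This is an illustrative Python analogue and not the SAS program of Appendix 4.3; the function name, the dog identifiers, the seeds, and the pretreatment counts are all hypothetical.

```python
import random


def randomize_rcbd(pretreatment_counts, treatments, seed=2024):
    """Return {animal: (block, treatment)} for a randomized complete block design."""
    rng = random.Random(seed)   # fixed seed so the allocation can be reproduced
    ranked = sorted(pretreatment_counts, key=pretreatment_counts.get, reverse=True)
    block_size = len(treatments)
    if len(ranked) % block_size:
        raise ValueError("number of animals must be a multiple of the number of treatments")
    allocation = {}
    for block, start in enumerate(range(0, len(ranked), block_size), start=1):
        shuffled = rng.sample(treatments, k=block_size)   # random order within the block
        for animal, treatment in zip(ranked[start:start + block_size], shuffled):
            allocation[animal] = (block, treatment)
    return allocation


# Hypothetical TD -2 tick counts for 32 dogs.
count_rng = random.Random(1)
tick_counts = {f"dog{i:02d}": count_rng.randint(15, 45) for i in range(1, 33)}

plan = randomize_rcbd(tick_counts, ["control", "0.5x", "1x", "2x"])
for animal in sorted(plan, key=lambda a: plan[a]):
    print(animal, tick_counts[animal], *plan[animal])
```

Because every block contains each treatment exactly once, treatment comparisons are made among dogs with similar pretreatment tick counts.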

TABLE 4.7 Least squares means (arithmetic), back-transformed least squares means (geometric), and corresponding percent efficacy for tick counts from a dose determination study in dogs.

| Type of mean | TD tick count | Mean, NC | Mean, 0.5× | Mean, 1× | Mean, 2× | Percent efficacy, 0.5× | Percent efficacy, 1× | Percent efficacy, 2× |
|---|---|---|---|---|---|---|---|---|
| Arithmetic | TD 2 | 36.1 | 16.8 | 0.5 | 0 | 53.5 | 98.6 | 100 |
| | TD 7 | 40.5 | 18.4 | 1.4 | 0.9 | 54.6 | 96.5 | 97.8 |
| | TD 14 | 38.4 | 19.5 | 3.1 | 2.3 | 49.2 | 91.9 | 94.0 |
| | TD 21 | 38.1 | 19.6 | 3.4 | 2.1 | 48.6 | 91.1 | 94.5 |
| | TD 28 | 39.3 | 23.8 | 7.1 | 4.9 | 39.4 | 81.9 | 87.5 |
| | TD 35 | 37.9 | 28.5 | 9 | 7.1 | 24.8 | 76.3 | 81.3 |
| | TD 42 | 36 | 30 | 14 | 11.5 | 16.7 | 61.1 | 68.1 |
| Geometric | TD 2 | 35.1 | 16.5 | 0.4 | 0 | 53 | 98.9 | 100 |
| | TD 7 | 40.4 | 18 | 0.8 | 0.5 | 55.4 | 98 | 98.8 |
| | TD 14 | 37.8 | 18.8 | 1.6 | 1.1 | 50.3 | 95.8 | 97.1 |
| | TD 21 | 29.9 | 19.4 | 1.6 | 0.8 | 35.1 | 94.6 | 97.3 |
| | TD 28 | 38.1 | 23 | 3.5 | 1.7 | 39.6 | 90.8 | 95.5 |
| | TD 35 | 37.8 | 27.4 | 5 | 2 | 27.5 | 86.8 | 94.7 |
| | TD 42 | 35.8 | 29.3 | 8.4 | 2.6 | 18.2 | 76.5 | 92.7 |

NC, Negative control.
^a P ≤ .05, compared to negative control group.
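The duration of efficacy reported for this example can be traced from the means in Table 4.7. The sketch below applies Abbott's formula at each count day and returns the last day before efficacy first drops below 90%; it looks only at the point estimate and ignores the accompanying significance test. The function name is illustrative.

```python
days = [2, 7, 14, 21, 28, 35, 42]

# Means from Table 4.7 (negative control, 1x and 2x doses).
arithmetic = {
    "NC": [36.1, 40.5, 38.4, 38.1, 39.3, 37.9, 36],
    "1x": [0.5, 1.4, 3.1, 3.4, 7.1, 9, 14],
    "2x": [0, 0.9, 2.3, 2.1, 4.9, 7.1, 11.5],
}
geometric = {
    "NC": [35.1, 40.4, 37.8, 29.9, 38.1, 37.8, 35.8],
    "1x": [0.4, 0.8, 1.6, 1.6, 3.5, 5, 8.4],
    "2x": [0, 0.5, 1.1, 0.8, 1.7, 2, 2.6],
}


def days_efficacious(days, control_means, treated_means, threshold=90.0):
    """Last count day reached before percent efficacy first falls below threshold."""
    last_day = None
    for day, c, t in zip(days, control_means, treated_means):
        if (c - t) / c * 100 < threshold:
            break
        last_day = day
    return last_day


for label, means in (("arithmetic", arithmetic), ("geometric", geometric)):
    for dose in ("1x", "2x"):
        print(label, dose, days_efficacious(days, means["NC"], means[dose]))
```

The two types of mean lead to different durations of efficacy for the same data, which is why the choice of estimator should be settled in the protocol.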



Dose confirmation example using a nematode efficacy study in cattle with natural infection: This example follows the WAAVP guideline for evaluating the efficacy of anthelmintics in ruminants (see Appendix A of Ref. [19]). Sixteen infected animals (confirmed using pretreatment fecal egg counts) are ranked by descending pretreatment body weight into eight blocks with two animals per block. Within each block, animals are randomly assigned to one of two treatments (untreated control, IVP) using a computer-generated randomization program implemented using SAS software (example code is provided in Appendix 4.3; tick counts can be replaced with body weights). All animals are housed on a single pasture. The study design is an RCBD with animal as the EU and with blocking based on pretreatment body weights. On TD 0, the animals are treated as per their assigned treatment group, and they are necropsied 14 days PT. Nematodes recovered from animals at necropsy are counted and identified. The data from two example species, Ostertagia spp. and Trichostrongylus colubriformis, are presented in Appendix 4.2. These data are analyzed, separately for each species, both with and without log transformation [ln(count + 1)], using a general linear mixed model with the fixed effect of treatment and random effects of block and residual error. Modeling was implemented using the MIXED procedure in SAS (SAS v9.4, Cary, NC). Treatment group means are compared using a two-sided significance level of α = 0.05. Treatment least squares means (arithmetic means) or back-transformed least squares means (geometric means), along with percent efficacy and P-values, are presented in Table 4.8. Efficacy was demonstrated for Ostertagia spp., but efficacy could not be demonstrated for T. colubriformis since adequate infection was not observed, with fewer than six infected control animals.
As mentioned previously, these data can also be analyzed using nonparametric methods such as the Wilcoxon rank-sum test implemented using the NPAR1WAY procedure in SAS software; associated P-values are presented in Table 4.8. Recent publications with examples of dose confirmation studies include Snyder and Wiseman (fleas and hookworm in dogs)40 and Liebenberg et al. (insecticidal activity against mosquitoes in dogs).41 From a statistical design and analysis perspective, studies that evaluate persistent efficacy or speed of kill are similar to the two worked examples above but differ in the times at which counts are measured PT; comparisons are made to untreated controls at each time of measurement. Specific examples for these types of studies are not discussed. Recent publications with examples of speed of kill studies include Six et al. (ticks on dogs),42 Becskei et al. (ticks on dogs),43 and Fourie et al. (ticks on cats).44 Examples of persistent efficacy studies are Cramer et al. (nematodes in cattle)45 and Geurden et al. (ticks in cats).46

TABLE 4.8 Least squares means (arithmetic), back-transformed least squares means (geometric), and corresponding percent efficacy for two endoparasites from a dose confirmation study in cattle.

| Type of mean | Statistic | Ostertagia | Trichostrongylus colubriformis |
|---|---|---|---|
| NA | Number of infected controls | 8 | 5 |
| Arithmetic | Control mean | 175.5 | 4.4 |
| | IVP mean | 0.5 | 0.3 |
| | Percent efficacy | 99.7 | 93.2^a |
| | P-value (general linear model) | <.0001 | .0254^a |
| Geometric | Control mean | 171.5 | 2.6 |
| | IVP mean | 0.4 | 0.2 |
| | Percent efficacy | 99.8 | 92.3^a |
| | P-value (general linear model) | <.0001 | .0191^a |
| NA | P-value (Wilcoxon rank-sum test) | .0002 | .0536 |

IVP, Investigational veterinary product.
^a Efficacy cannot be demonstrated due to lack of adequate infection in the control group with less than six infected animals.
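The three conditions applied in this example (adequate infection in at least six control animals, percent efficacy of at least 90%, and a significant difference from control at the 0.05 level) can be combined into a single check. This is a sketch of the decision logic as described in the text; the function name is illustrative, the thresholds are exposed as parameters because requirements differ by parasite and guideline, and the reported P-value of <.0001 is entered as 0.0001.

```python
def efficacy_demonstrated(percent_efficacy, p_value, infected_controls,
                          min_efficacy=90.0, alpha=0.05, min_infected=6):
    """True only if infection was adequate and both efficacy criteria are met."""
    adequate_infection = infected_controls >= min_infected
    return adequate_infection and percent_efficacy >= min_efficacy and p_value <= alpha


# Arithmetic-mean results from Table 4.8.
results = {
    "Ostertagia": efficacy_demonstrated(99.7, 0.0001, infected_controls=8),
    "T. colubriformis": efficacy_demonstrated(93.2, 0.0254, infected_controls=5),
}
print(results)
```

For T. colubriformis the efficacy and P-value thresholds are both met, so the outcome is driven entirely by the number of infected controls.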

Field studies

Field effectiveness studies are conducted using the final formulation of the drug product to confirm effectiveness and safety. The studies can be conducted as single-site studies or as multisite studies to ensure adequate enrollment of study animals. Field effectiveness studies use natural infestations, and efficacy is based on the reduction in fecal egg counts or ectoparasite counts compared to a control group. Either a negative or a positive control group can be used in these studies depending on the protocol objectives and the animal and parasite species involved. When comparing to a positive control group, either superiority or noninferiority tests can be used. Example of a field effectiveness study for fleas in dogs: Cherni et al.47 describe a field study conducted to evaluate the efficacy and safety of sarolaner (Simparica) against fleas on dogs presented as veterinary patients in the United States. A total of 479 client-owned dogs from 19 clinics in multiple

geographic locations throughout the USA were included in the study. The study was designed as a multiclinic, single-masked, positively controlled clinical trial with a randomized complete block design within each clinic. Households were enrolled if they had at least one dog with 10 or more live fleas. If the household had multiple dogs that met the criteria, a primary dog was randomly selected and the other dogs were designated as supplemental dogs. Primary dogs were included in the efficacy evaluations, and all dogs were included in safety evaluations. Households were randomly assigned to treatment with either sarolaner or spinosad in a ratio of 2:1 based on order of enrollment within a clinic. Owners were provided tablets as per

TABLE 4.9 Results from field effectiveness study example for fleas in dogs.47

| Treatment | Statistic^a | TD 0 | TD 14 | TD 30 | TD 60 | TD 90 |
|---|---|---|---|---|---|---|
| Sarolaner | N | 186 | 177 | 176 | 161 | 156 |
| | AM | 110.1 | 1 | 0.6 | 0.1 | <0.1 |
| | % AE | | 99.1 | 99.5 | 99.9 | >99.9 |
| | GM | 48.4 | 0.4 | 0.3 | <0.1 | <0.1 |
| | % GE | | 99.3 | 99.5 | >99.9 | >99.9 |
| | % Success | | 97.7 | 98.3 | 100 | 99.4 |
| Spinosad | N | 94 | 90 | 87 | 83 | 77 |
| | AM | 96.7 | 5.2 | 9.2 | 0.5 | 1.2 |
| | % AE | | 94.6 | 90.5 | 99.5 | 98.8 |
| | GM | 47.8 | 1.2 | 2.3 | 0.3 | 0.2 |
| | % GE | | 97.5 | 95.1 | 99.4 | 99.6 |
| | % Success | | 90 | 69 | 100 | 97.5 |
| Sarolaner vs Spinosad | 95% CI of difference in % success | | (1.5, 16) | (20.1, 40.0) | (−2.3, 4.7) | (−1.3, 8.7) |
| | Noninferiority^b | | Yes | Yes | Yes | Yes |
| | Superiority^c | | Yes | Yes | | |

^a %AE, %efficacy using AM; %GE, %efficacy using GM; AM, arithmetic mean; GM, geometric mean; N, number of animals.
^b Yes = lower CI ≥ −15%.
^c Yes = lower CI > 0%.

the randomization plan at the clinic visit and proceeded to dose the dogs at home; dosing occurred on TD 0 (first dose), 30, and 60. Primary dogs returned to the clinic for flea counts 14, 30, 60, and 90 days after the first treatment. Live flea counts were summarized using both arithmetic and geometric means by treatment group and time point. Percent effectiveness of each treatment group was calculated using the formula [(C − T)/C] × 100, where C = pretreatment mean flea count and T = posttreatment mean flea count. To compare the two products, a percentage reduction was calculated, as defined previously, for each dog at each time point. If the percentage reduction was ≥90%, then the treatment was considered a success for that dog. Exact 95% confidence intervals for the difference in treatment success rates were constructed using the BINOMIAL procedure in SAS from the StatXact package. The lower confidence limit of the difference was used to

TABLE 4.10 Model terms for common statistical designs.

| Design^a | EU^a | Fixed effects | Random effects |
|---|---|---|---|
| CRD | Animal | Treatment | Residual |
| CRD, repeated measures | Animal | Treatment, time, treatment by time | Animal, residual |
| RCBD | Animal | Treatment | Block, residual |
| RCBD, repeated measures | Animal | Treatment, time, treatment by time | Block, block by treatment, residual |
| RCBD | Pen | Treatment | Block, block by treatment (pen term), residual |
| RCBD, repeated measures | Pen | Treatment, time, treatment by time | Block, block by treatment, animal within block by treatment, block by treatment by time, residual |
| RCBD, multisite/clinic | Animal | Treatment | Site (or clinic), block within site (or clinic), site (or clinic) by treatment, residual |
| RCBD, multisite/clinic, repeated measures | Animal | Treatment, time, treatment by time | Site, block within site, site by treatment, block within site by treatment (animal term), site by treatment by time, residual |

^a CRD, Completely randomized design; EU, experimental unit; RCBD, randomized complete block design.

test the hypothesis of noninferiority of sarolaner to spinosad, using a delta of 15%, and of superiority of sarolaner to spinosad, at the one-sided 0.025 level of significance. Results are summarized in Table 4.9. Efficacy was demonstrated for both treatments using both arithmetic and geometric means, with % efficacy measures above 90% at all time points. While P-values for the comparisons between pre- and posttreatment counts at each time point were not presented in this publication, based on the study design, a repeated measures analysis could be conducted using a general linear mixed model with fixed effects of treatment, time point, and treatment by time point and random effects of clinic, clinic by treatment interaction, animal, clinic by treatment by time, and residual. Noninferiority of sarolaner compared to spinosad was demonstrated at all PT time points since the lower 95% confidence limit of the difference in percent success rates was ≥ −15%. Superiority of sarolaner compared to spinosad was demonstrated at days 14 and 30 PT, with the lower 95% confidence limit of the difference in percent success rates being >0%. Other recent publications with examples of field studies include Hayes et al. (multicenter field trials for treatment and prevention of fleas and nematodes in dogs in Europe)48 and Edmonds et al. (nematodes in grazing cattle with pen as the EU).49

Common study designs and associated analysis models: To end this section, model terms for common study designs encountered in parasitology studies are presented in Table 4.10.
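Returning to the field study example, the noninferiority and superiority conclusions follow mechanically from the lower confidence limits reported in Table 4.9. The sketch below encodes only that decision rule; it does not compute the exact confidence intervals themselves, and the function name is illustrative.

```python
# 95% confidence limits for the difference in % success (sarolaner minus spinosad),
# keyed by day of count, as reported in Table 4.9.
confidence_limits = {
    14: (1.5, 16.0),
    30: (20.1, 40.0),
    60: (-2.3, 4.7),
    90: (-1.3, 8.7),
}


def classify(lower_limit, margin=15.0):
    """Apply the noninferiority margin (delta) and the superiority rule to a lower limit."""
    return {
        "noninferior": lower_limit >= -margin,
        "superior": lower_limit > 0.0,
    }


conclusions = {day: classify(lower) for day, (lower, _upper) in confidence_limits.items()}
for day, result in conclusions.items():
    print(day, result)
```

Superiority implies noninferiority under this rule, since a lower limit above zero is necessarily above the negative margin.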

References

1. Milliken GA, Johnson DE. Analysis of messy data volume 1: designed experiments. 2nd ed; 2009. p. 71–98.
2. St. Pierre NR. Design and analysis of pen studies in the Animal Sciences. J Dairy Sci 2007;90(E. Suppl.):E87–99.
3. Bello NM, Kramer M, Templeman RJ, Stroup WW, St-Pierre NR, Craig BA, et al. Short communication: on recognizing the proper experimental unit in animal studies in the dairy sciences. J Dairy Sci 2016;99(11):8871–9.
4. European Medicines Agency. Committee of Medicinal Products for Veterinary Use. Guideline on statistical principles for clinical trials for veterinary medicinal products (pharmaceuticals). 2012. EMA/CVMP/EWP/81976/2010.
5. VICH GL 9 (GCP). Good Clinical Practice. <https://www.fda.gov/downloads/AnimalVeterinary/GuidanceComplianceEnforcement/GuidanceforIndustry/ucm052417.pdf>; 2000.
6. European Agency for the Evaluation of Medicinal Products. Committee for Proprietary Medicinal Products. Points to consider on switching between superiority and noninferiority; 2000. CPMP/EWP/EWP/482/99.
7. European Medicines Agency. Committee of Medicinal Products for Human Use. Guideline on the choice of the non-inferiority margin; 2005. EMEA/CVMP/EWP/2158/99.
8. Montgomery DC. Design and analysis of experiments. 4th ed. New York: John Wiley & Sons; 1997.

9. McCullagh P, Nelder JA. Generalized linear models. 1st ed. London: Chapman and Hall; 1983.
10. Wilson K, Grenfell BT. Generalized linear modelling for parasitologists. Parasitol Today 1997;13:33–8.
11. Anderson N. Analysis of parasite and other skewed counts. Trop Med Int Health 2012;17(6):684–93.
12. nQuery Advisor 7.0. Statistical solutions. Saugus, MA: Stonehill Corporate Center; 2007.
13. SAS Institute Inc. SAS/STAT 14.1 User’s Guide. Cary, NC; 2015.
14. High R. An introduction to the Statistical Power Calculations for Linear Models with SAS 9.1. <https://www.lexjansen.com/pnwsug/2007/Robin%20High%20-%20Statistical%20Power%20Calculations%20for%20Linear%20Models.pdf> [accessed 05.08.18].
15. Ciu Z, Zimmerman AG, Mowrey DH. Sample size determination in animal health studies. In: Conference on applied statistics in agriculture. <https://doi.org/10.4148/2475-7772.1114>.
16. Stroup W. Mixed model procedures to assess power, precision, and sample size in the design of experiments. In: Proceedings of the biopharmaceutical section. Alexandria, VA: American Statistical Association; 1999. p. 15–24.
17. Stroup W. PROC GLIMMIX as a Teaching and Planning Tool for Experiment Design. <http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf>; 2016 [accessed 05.08.18].
18. Abbott WS. A method of computing the effectiveness of an insecticide. J Econ Entomol 1925;18:265–7.
19. Wood IB, Amaral NK, Bairden K, Duncan JL, Kassai T, Malone Jr JB, et al. World Association for the Advancement of Veterinary Parasitology (W.A.A.V.P.) second edition of guidelines for evaluating the efficacy of anthelmintics in ruminants (bovine, ovine, caprine). Vet Parasitol 1995;58:181–213.
20. Guidance for the Industry 90, Effectiveness of Anthelmintics: General Recommendations VICH GL7. <https://www.fda.gov/downloads/animalveterinary/guidancecomplianceenforcement/guidanceforindustry/ucm052425.pdf> [accessed 04.08.18].
21. Guideline for the testing and evaluation of the efficacy of antiparasitic substances for the treatment and prevention of tick and flea infestation in dogs and cats. EMEA/CVMP/EWP/005/2000-Rev.3. <http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2016/07/WC500210927.pdf>; 2016 [accessed 05.08.18].
22. Questions and answers on the CVMP guideline on the “Testing and evaluation of the efficacy of antiparasitic substances for the treatment and prevention of tick and flea infestations in dogs and cats”. EMA/CVMP/005/00-Rev.2. <http://www.ema.europa.eu/docs/en_GB/document_library/Medicine_QA/2009/10/WC500004604.pdf> [accessed 04.08.18].
23. McKenna PB. What do anthelmintics efficacy figures really signify? NZ Vet J 1998;46:82–3.
24. Dobson RJ, Sangster NC, Besier RB, Woodate RG. Geometric means provide a biased efficacy result when conducting faecal egg count reduction test (FECRT). Vet Parasitol 2009;161:162–7.
25. Smothers CD, Sun F, Dayton AD. Comparison of arithmetic and geometric means as measures of a central tendency in cattle nematode populations. Vet Parasitol 1999;81:211–24.
26. Festing MFW. Guidelines for the design and statistical analysis of experiments in papers submitted to ATLA. ATLA 2001;29:427–46.
27. Compton ME, Mize CW. Statistical considerations for in vitro research: I. Birth of an idea to collecting data. In Vitro Cell Dev Biol Plant 1999;35:115–21.
28. Lazic SE. The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neurosci 2010;11:5. Available from: https://doi.org/10.1186/1471-2202-11-5.

29. Lazic SE, Clarke-Williams CJ, Munafo MR. What exactly is ‘N’ in cell culture experiments? PLoS Biol 2018. Available from: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2005282. Published 04 April 2018. Accessed 05 August 2018.
30. Busvine JR. Toxicological statistics. A critical review of the techniques for testing insecticides. 2nd ed. Commonwealth Agricultural Bureaux; 1971. p. 263-88.
31. Williams CB. The use of logarithms in the interpretation of certain entomological problems. Ann Appl Biol 1937;24:404-14.
32. FDA CVM Guidance for Industry # 35. Bioequivalence Guidance. https://www.fda.gov/downloads/AnimalVeterinary/GuidanceComplianceEnforcement/GuidanceforIndustry/ucm052363.pdf; 2006.
33. VICH GL 52 (GCP). Bioequivalence: blood level bioequivalence study. https://www.fda.gov/downloads/AnimalVeterinary/GuidanceComplianceEnforcement/GuidanceforIndustry/UCM415697.pdf; 2016.
34. Petrie A, Watson P. Statistics for veterinary and animal science. 3rd ed. Wiley Blackwell; 2013. p. 923.
35. Henderson CF, Tilton EW. Tests with acaricides against the brown wheat mite. J Econ Entomol 1955;48:157-61.
36. Marchiondo AA, Holdsworth PA, Fourie LJ, Rugg D, Hellmann K, Snyder DE, et al. World Association for the Advancement of Veterinary Parasitology (W.A.A.V.P.) second edition: guidelines for evaluating the efficacy of parasiticides for the treatment, prevention and control of flea and tick infestations on dogs and cats. Vet Parasitol 2013;194:84-97.
37. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS for mixed models. 2nd ed. SAS Institute Inc.; 2007. p. 1836.
38. Kincaid C. Guidelines for selecting the covariance structure in mixed model analysis. In: Proc 30th annual SAS users group international conference. Paper 198-30. http://www2.sas.com/proceedings/sugi30/198-30.pdf; 2005 [accessed 05.08.18].
39. McTier TL, Six RH, Fourie JJ, Pullins A, Hedges L, Mahabir SP, et al. Determination of the effective dose of a novel oral formulation of sarolaner (Simparica™) for the treatment and month-long control of fleas and ticks on dogs. Vet Parasitol 2016;222:12-17.
40. Snyder DE, Wiseman S. Dose confirmation and non-interference evaluations of the oral efficacy of a combination of milbemycin oxime and spinosad against the dose limiting parasites, adult cat flea (Ctenocephalides felis) and hookworm (Ancylostoma caninum), in dogs. Vet Parasitol 2012;184:284-90.
41. Liebenberg J, Fourie J, Lebon W, Larsen D, Halos L, Beugnet F. Assessment of the insecticidal activity of afoxolaner against Aedes aegypti in dogs treated with NexGard. Parasite 2017;24:39. Available from: https://doi.org/10.1051/parasite/2017042.
42. Six RH, Liebenberg J, Honsberger NA, Mahabir SP. Comparative speed of kill of sarolaner (Simparica™) and fluralaner (Bravecto®) against induced infestations of Ctenocephalides felis on dogs. Parasit Vectors 2016;9:93. Available from: https://doi.org/10.1186/s13071-016-1374-z.
43. Becskei C, Geurden T, Erasmus H, Cuppens O, Mahabir SP, Six RH. Comparative speed of kill after treatment with Simparica™ (sarolaner) and Advantix® (imidacloprid + permethrin) against induced infestations of Dermacentor reticulatus on dogs. Parasit Vectors 2016;9:104. Available from: https://doi.org/10.1186/s13071-016-1377-9.
44. Fourie JJ, Horak IG, de Vos C, Deuster K, Schunack B. Comparative speed of kill, repellent (anti-feeding) and acaricidal efficacy of an imidacloprid/flumethrin collar (Seresto®) and a Fipronil/(S)-Methoprene/Eprinomectin/Praziquantel Spot-on (Broadline®) against Ixodes ricinus (Linné, 1758) on cats. Parasitol Res 2015;114:S109-16.

45. Cramer LG, Pitt SR, Rehbein S, Gogolewski RP, Kunkle BN, Langholff WK, et al. Persistent efficacy of topical eprinomectin against nematode parasites in cattle. Parasitol Res 2000;86:944-6.
46. Geurden T, Borowski S, Wozniakiewicz M, King V, Fourie J, Liebenberg L. Comparative efficacy of a new spot-on combination product containing selamectin and sarolaner (Stronghold® Plus) versus fluralaner (Bravecto®) against induced infestations with Ixodes ricinus ticks on cats. Parasit Vectors 2017;10:319.
47. Cherni JA, Mahabir SP, Six RH. Efficacy and safety of sarolaner (Simparica™) against fleas on dogs presented as veterinary patients in the United States. Vet Parasitol 2016;222:43-8.
48. Hayes B, Schnitzler B, Wiseman S, Snyder DE. Field evaluation of the efficacy and safety of a combination of spinosad and milbemycin oxime in the treatment and prevention of naturally acquired flea infestations and treatment of intestinal nematode infections in dogs in Europe. Vet Parasitol 2015;207:99-106.
49. Edmonds MD, Vatta AF, Marchiondo AA, Vanimisetti HB, Edmonds JD. Concurrent treatment with a macrocyclic lactone and benzimidazole provides season long performance advantages in grazing cattle harboring macrocyclic lactone resistant nematodes. Vet Parasitol 2018;252:157-62.

Appendices

Appendix 4.1

APPENDIX 4.1 Variety of scenarios for dependent variable and independent variable(s), objective, and statistical test/model.

| Response variable | Independent variable(s) | Objective | Test/Type of model |
|---|---|---|---|
| Qualitative (2 levels) | Qualitative (2 levels) | Comparing group proportions | Fisher’s exact test; Chi-square test |
| Qualitative (2 levels) | Qualitative (more than 2 levels) | Comparing group proportions | Generalized linear model (logit link) |
| Qualitative (2 levels) | Quantitative | Sigmoid functional relationship | Logistic regression; Probit regression; Generalized linear model (logit link); Nonlinear regression |
| Qualitative (2 levels) | Multiple qualitative variables | Comparing group proportions | Generalized linear model (logit link) |
| Qualitative (2 levels) | Mixed effects qualitative variables | Comparing group proportions | Generalized linear mixed model (logit link) |
| Qualitative (2 levels) | Repeated measures in time | Comparing groups within time | Generalized linear mixed model (logit link) |
| Quantitative | Qualitative (2 groups) | Comparing independent group | t-Test |
| Quantitative | Qualitative (2 groups) | Comparing dependent (paired) group | Paired t-test |
| Quantitative | Qualitative (2 or more groups) | Comparing independent group | ANOVA; general fixed effects linear model |
| Quantitative | Qualitative with uncontrolled independent/nuisance variables (any variable type) | Comparing independent groups, adjusting for nuisance variables | ANCOVA; general fixed effects linear model |
| Quantitative | Qualitative variables (fixed and random) | Comparing group means (larger inferential scope than ANOVA) | General linear mixed model |
| Quantitative | Quantitative | Functional relationship | Linear regression; polynomial regression; Nonlinear regression |
| Quantitative | Multiple variables (mixed types) | Functional relationship | General linear model; general |
| Quantitative | Repeated measures | Functional relationship | Mixed effects regression (linear or nonlinear) |
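To make the first scenario in Appendix 4.1 concrete (a two-level response compared between two groups), the following Python sketch computes a two-sided Fisher's exact test from first principles. It is an illustration only and not part of the original appendix: the function name `fisher_exact_two_sided` and the counts are hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical data: 1 of 8 treated animals and 7 of 8 control animals
# remain parasite-positive after treatment.
p_value = fisher_exact_two_sided(1, 7, 7, 1)
print(f"two-sided p = {p_value:.4f}")
```

For larger tables or more than two groups, the generalized linear model rows of the appendix are the more natural choice.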

Appendix 4.2

APPENDIX 4.2 Data for dog tick dose determination study example presented in Table 4.6.

Columns TD -2 to TD 42 are tick counts.

| Treatment | Block | Animal | TD -2 | TD 2 | TD 7 | TD 14 | TD 21 | TD 28 | TD 35 | TD 42 |
|---|---|---|---|---|---|---|---|---|---|---|
| Negative control | 1 | D22 | 25 | 19 | 37 | 39 | 46 | 45 | 39 | 40 |
| Negative control | 2 | D17 | 27 | 42 | 41 | 46 | 43 | 45 | 40 | 35 |
| Negative control | 3 | D29 | 28 | 33 | 39 | 32 | 34 | 34 | 38 | 30 |
| Negative control | 4 | D02 | 34 | 35 | 36 | 28 | 28 | 20 | 35 | 38 |
| Negative control | 5 | D06 | 37 | 36 | 45 | 48 | 3 | 36 | 36 | 42 |
| Negative control | 6 | D18 | 38 | 35 | 41 | 42 | 82 | 41 | 41 | 34 |
| Negative control | 7 | D26 | 41 | 46 | 43 | 39 | 27 | 47 | 36 | 34 |
| Negative control | 8 | D13 | 43 | 43 | 42 | 33 | 42 | 46 | 38 | 35 |
| Treatment dose 0.5× | 1 | D10 | 26 | 20 | 14 | 22 | 16 | 28 | 33 | 41 |
| Treatment dose 0.5× | 2 | D08 | 28 | 18 | 23 | 10 | 15 | 16 | 21 | 34 |
| Treatment dose 0.5× | 3 | D30 | 29 | 14 | 13 | 22 | 22 | 30 | 42 | 20 |
| Treatment dose 0.5× | 4 | D25 | 33 | 21 | 18 | 22 | 20 | 15 | 20 | 30 |
| Treatment dose 0.5× | 5 | D21 | 36 | 13 | 20 | 14 | 23 | 24 | 22 | 22 |
| Treatment dose 0.5× | 6 | D01 | 38 | 14 | 25 | 26 | 21 | 24 | 35 | 33 |
| Treatment dose 0.5× | 7 | D05 | 41 | 15 | 16 | 17 | 18 | 33 | 20 | 31 |
| Treatment dose 0.5× | 8 | D20 | 45 | 19 | 18 | 23 | 22 | 20 | 35 | 29 |
| Treatment dose 1× | 1 | D27 | 25 | 1 | 0 | 0 | 0 | 0 | 1 | 5 |
| Treatment dose 1× | 2 | D03 | 26 | 1 | 5 | 8 | 9 | 12 | 19 | 34 |
| Treatment dose 1× | 3 | D31 | 32 | 0 | 0 | 0 | 0 | 0 | 1 | 3 |
| Treatment dose 1× | 4 | D07 | 35 | 0 | 3 | 5 | 6 | 19 | 12 | 15 |
| Treatment dose 1× | 5 | D23 | 35 | 0 | 0 | 4 | 2 | 7 | 7 | 9 |
| Treatment dose 1× | 6 | D04 | 40 | 1 | 3 | 8 | 10 | 15 | 25 | 34 |
| Treatment dose 1× | 7 | D19 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Treatment dose 1× | 8 | D24 | 42 | 1 | 0 | 0 | 0 | 4 | 7 | 12 |
| Treatment dose 2× | 1 | D11 | 26 | 0 | 3 | 5 | 8 | 12 | 17 | 25 |
| Treatment dose 2× | 2 | D09 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Treatment dose 2× | 3 | D28 | 30 | 0 | 0 | 6 | 0 | 12 | 15 | 35 |
| Treatment dose 2× | 4 | D16 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Treatment dose 2× | 5 | D14 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Treatment dose 2× | 6 | D32 | 40 | 0 | 4 | 7 | 9 | 15 | 25 | 32 |
| Treatment dose 2× | 7 | D12 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Treatment dose 2× | 8 | D15 | 45 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

APPENDIX 4.2 Data for cattle nematode dose confirmation study example presented in Table 4.7.

| Treatment | Block | Animal | Ostertagia | Trichostrongylus colubriformis |
|---|---|---|---|---|
| Negative control | 1 | C23 | 202 | 0 |
| Negative control | 2 | C43 | 162 | 7 |
| Negative control | 3 | C27 | 122 | 0 |
| Negative control | 4 | C87 | 202 | 11 |
| Negative control | 5 | C54 | 168 | 5 |
| Negative control | 6 | C19 | 227 | 8 |
| Negative control | 7 | C76 | 198 | 4 |
| Negative control | 8 | C83 | 123 | 0 |
| IVP | 1 | C20 | 0 | 0 |
| IVP | 2 | C35 | 2 | 1 |
| IVP | 3 | C28 | 0 | 0 |
| IVP | 4 | C39 | 0 | 0 |
| IVP | 5 | C41 | 1 | 0 |
| IVP | 6 | C37 | 0 | 0 |
| IVP | 7 | C24 | 0 | 1 |
| IVP | 8 | C52 | 1 | 0 |

IVP, Investigational veterinary product.
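As an illustration of how the Ostertagia counts in this table can be summarized, the Python sketch below computes percent efficacy as 100 × (control mean − treated mean) / control mean, once with arithmetic means and once with geometric means back-transformed from ln(count + 1). The percent-reduction formula and the +1 offset are assumed here as common conventions; the analysis behind Table 4.7 may differ in detail, and the function names are illustrative.

```python
import math

# Ostertagia counts from Appendix 4.2 (cattle dose confirmation example).
control = [202, 162, 122, 202, 168, 227, 198, 123]
ivp = [0, 2, 0, 0, 1, 0, 0, 1]


def arithmetic_mean(counts):
    return sum(counts) / len(counts)


def geometric_mean(counts):
    # ln(count + 1) keeps zero counts defined; subtract 1 after
    # back-transforming so the result is on the original count scale.
    mean_log = sum(math.log(c + 1) for c in counts) / len(counts)
    return math.exp(mean_log) - 1


def percent_efficacy(control_mean, treated_mean):
    return 100 * (control_mean - treated_mean) / control_mean


efficacy_arithmetic = percent_efficacy(arithmetic_mean(control), arithmetic_mean(ivp))
efficacy_geometric = percent_efficacy(geometric_mean(control), geometric_mean(ivp))

print(f"Arithmetic-mean efficacy: {efficacy_arithmetic:.2f}%")
print(f"Geometric-mean efficacy:  {efficacy_geometric:.2f}%")
```

With these data the arithmetic-mean efficacy is about 99.7%. The two summaries can diverge more noticeably when counts are strongly skewed, which is the situation references 24 and 25 address.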

Appendix 4.3 SAS Code For Dog Tick Dose Determination Study

/* Dog Tick Study for Dose Determination - Randomization */

/* read in data for randomization - 32 dogs with Day -2 Tick counts */
data random1;
  input animal $ count @@;
  label count = 'Day -2 Tick Counts';
  infile cards;
  cards;
D01 38 D09 27 D17 27 D25 33
D02 34 D10 26 D18 38 D26 41
D03 26 D11 26 D19 42 D27 25
D04 40 D12 42 D20 45 D28 30
D05 41 D13 43 D21 36 D29 28
D06 37 D14 36 D22 25 D30 29
D07 35 D15 45 D23 35 D31 32
D08 28 D16 33 D24 42 D32 40
;
run;

%let seed1 = 3846724; /* seed for tiebreaks when creating blocks */
%let seed2 = 2475181; /* seed for treatment assignment within block */
/* assigning a seed makes the results reproducible if the program is rerun.
   seeds can be picked from any random number table available online and
   should be chosen arbitrarily. new seeds should be used for each new study */

data random1;
  set random1;
  tieb = ranuni(&seed1);  /* tie break random number from uniform distribution */
  tseed = ranuni(&seed2); /* treatment assignment random number */
run;

/* sort by Day -2 tick count; as written, DESCENDING applies only to the
   variable that follows it (tieb), so count is sorted in ascending order,
   which matches the block numbering in Appendix 4.2 */
proc sort data = random1;
  by count descending tieb; /* use random number to break ties when counts are equal */
run;

/* group animals into 8 blocks of four animals each */
data random1;
  set random1;
  block = ceil(_n_/4);
run;

/* randomly assign animals within each block to treatment */
proc rank data = random1 out = random2;
  ranks trt; /* assign treatment number */
  by block;  /* within each block */
  var tseed; /* using the treatment assignment random number */
run;

data random2;
  length treat $50;
  set random2;
  if trt = 1 then treat = 'Negative Control';
  else if trt = 2 then treat = 'Treatment Dose 0.5x';
  else if trt = 3 then treat = 'Treatment Dose 1x';
  else if trt = 4 then treat = 'Treatment Dose 2x';
run;

proc report data = random2;
  columns animal count tieb block tseed trt treat;
  label animal = 'Animal'
        tieb = 'Tie break random number when creating block'
        block = 'Block'
        tseed = 'Random number for treatment assignment'
        trt = 'Treatment number'
        treat = 'Treatment Description';
run;
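For readers without access to SAS, the same blocked randomization logic can be sketched in Python: rank animals on the Day -2 count with a random tie-break, cut the ranked list into blocks of four, and assign treatments within each block by ranking a second random number. This is an illustrative analogue, not a port. The function name `blocked_randomization` is hypothetical, and because Python's generator differs from SAS `ranuni`, the same seeds will not reproduce the allocation shown in Appendix 4.2.

```python
import random

# Day -2 tick counts for the 32 dogs, as read in by the SAS program.
DAY_MINUS2_COUNTS = {
    "D01": 38, "D02": 34, "D03": 26, "D04": 40, "D05": 41, "D06": 37,
    "D07": 35, "D08": 28, "D09": 27, "D10": 26, "D11": 26, "D12": 42,
    "D13": 43, "D14": 36, "D15": 45, "D16": 33, "D17": 27, "D18": 38,
    "D19": 42, "D20": 45, "D21": 36, "D22": 25, "D23": 35, "D24": 42,
    "D25": 33, "D26": 41, "D27": 25, "D28": 30, "D29": 28, "D30": 29,
    "D31": 32, "D32": 40,
}

TREATMENTS = [
    "Negative Control",
    "Treatment Dose 0.5x",
    "Treatment Dose 1x",
    "Treatment Dose 2x",
]


def blocked_randomization(counts, seed_tiebreak, seed_treatment):
    """Return one record per animal with its block and treatment."""
    rng_tie = random.Random(seed_tiebreak)    # separate, seeded streams so the
    rng_trt = random.Random(seed_treatment)   # allocation is reproducible
    rows = [
        {"animal": animal, "count": count,
         "tieb": rng_tie.random(), "tseed": rng_trt.random()}
        for animal, count in counts.items()
    ]
    # Ascending count; the random tie-break orders animals with equal counts.
    rows.sort(key=lambda r: (r["count"], -r["tieb"]))

    size = len(TREATMENTS)  # one animal per treatment in every block
    if len(rows) % size != 0:
        raise ValueError("number of animals must be a multiple of the block size")
    for start in range(0, len(rows), size):
        block = rows[start:start + size]
        # Rank the treatment random numbers within the block.
        for rank, row in enumerate(sorted(block, key=lambda r: r["tseed"])):
            row["block"] = start // size + 1
            row["treat"] = TREATMENTS[rank]
    return rows


allocation = blocked_randomization(DAY_MINUS2_COUNTS, 3846724, 2475181)
for row in allocation:
    print(row["block"], row["animal"], row["count"], row["treat"])
```

Using two independent seeded generators mirrors the two seeds in the SAS program: one stream only resolves ties when forming blocks, the other only decides treatment order within a block.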