Approaches to Integrating Biomarkers Into Clinical Trials and Care Pathways as Targets for the Treatment of Inflammatory Bowel Diseases


Gastroenterology 2019;-:1–12


Parambir S. Dulai,1 Laurent Peyrin-Biroulet,2 Silvio Danese,3 Bruce E. Sands,4 Axel Dignass,5 Dan Turner,6 Gerassimos Mantzaris,7 Juergen Schölmerich,8 Jean-Yves Mary,9 Walter Reinisch,10 and William J. Sandborn1

1 Division of Gastroenterology, University of California San Diego, La Jolla, California
2 Department of Gastroenterology, Nancy University Hospital, Lorraine University, Nancy, France
3 Department of Biomedical Sciences, Humanitas University, Humanitas Clinical and Research Centre, Milan, Italy
4 Dr Henry D. Janowitz Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, New York
5 Department of Medicine I, Agaplesion Markus Hospital and Crohn Colitis Clinical Research Center Rhein-Main, Frankfurt/Main, Germany
6 Institute of Paediatric Gastroenterology, Shaare Zedek Medical Center, The Hebrew University of Jerusalem, Jerusalem, Israel
7 Department of Gastroenterology, Evaggelismos-Ophthalmiatreion Athinon-Polycliniki, Athens, Greece
8 Goethe University, Frankfurt, Germany
9 INSERM UMR, Paris Diderot University, Saint Louis Hospital, Paris, France
10 Universitätsklinik für Innere Medizin III, Vienna, Austria

BACKGROUND & AIMS: There is no consensus on the best way to integrate biomarkers into inflammatory bowel disease (IBD) research and clinical practice. The International Organization for the Study of Inflammatory Bowel Disease aimed to outline biomarker definitions, categories, and operating properties required for their use in registration trials and clinical practice. Using fecal calprotectin as an example, we provide a framework for biomarker development and validation in patients with IBD. METHODS: We reviewed international society guidelines, regulatory agency guidance documents, and standardized reporting guidelines for biomarkers, in combination with publications on fecal calprotectin levels in patients with IBD. We assessed the validity of fecal calprotectin to serve as a surrogate biomarker of IBD activity and outlined a framework for further validation and development of biomarkers. RESULTS: No endpoints have been fully validated as surrogates of risk of disease complications; mucosal healing is the most valid endpoint used to determine risk of disease complications. Fecal level of calprotectin has not been validated as a biomarker for IBD activity because of lack of technical and clinical reliability, assessment of performance when used as a replacement for endoscopy, and assessment of responsiveness to changes in disease states. The level of fecal calprotectin can be used only as a prognostic factor for disease recurrence in patients in remission after medical or surgical treatment. CONCLUSIONS: We reviewed guidelines, regulatory documents, and publications to identify properties required for the development of biomarkers of IBD activity and areas in need of clarification from regulatory agencies and societies. We propose a path forward for research of biomarkers for IBD.

Keywords: Crohn’s Disease; Ulcerative Colitis; Outcome; Response to Treatment.

Significant progress has been made in drug development for Crohn’s disease (CD) and ulcerative colitis (UC). There has also been an increasing emphasis placed on appropriate endpoint selection and disease activity monitoring, due largely to an evolving appreciation of how misinterpretation of inflammatory vs clinical disease activity can affect clinical trial success and clinical decision making. Endoscopy represents the criterion standard of objective disease activity assessment, requiring central blinded reading to ensure reproducible, unbiased assessment. In routine practice, endoscopy can be associated with considerable cost, risk, and burden to patients and health care systems, and from a patient perspective, endoscopy-based monitoring is ranked least acceptable.1

Biomarkers are objectively measured and evaluated assessments of normal or pathogenic biologic processes and/or pharmacologic responses to therapeutic interventions.2 The use of biomarkers in inflammatory bowel disease (IBD) has largely been limited to noninvasive adjunctive assessment of disease activity, and biomarkers are not approved by regulatory agencies as predictive markers or as surrogates of a further clinical endpoint.3 There is no consensus on and only limited knowledge about how to optimally integrate biomarkers into IBD research and clinical practice. A framework for biomarker validation and integration into practice would potentially have utility in both early- and late-stage drug development and in clinical practice. In the current review we outline definitions and characteristics of biomarkers and operating properties of importance for both registration trials and for clinical practice, and we propose a path forward for surrogate biomarker development in IBD, using fecal calprotectin (FC) as an example.

Abbreviations used in this paper: ACG, American College of Gastroenterology; AGA, American Gastroenterological Association; CD, Crohn’s disease; CRP, C-reactive protein; ECCO, European Crohn’s and Colitis Organization; EMA, European Medicines Agency; FC, fecal calprotectin; FDA, US Food and Drug Administration; IBD, inflammatory bowel diseases; MH, mucosal healing; UC, ulcerative colitis.

© 2019 by the AGA Institute 0016-5085/$36.00 https://doi.org/10.1053/j.gastro.2019.06.018

WHAT YOU NEED TO KNOW

BACKGROUND AND CONTEXT
There is no consensus on the best way to integrate biomarkers into inflammatory bowel disease (IBD) research and clinical practice.

NEW FINDINGS
No endpoints have been validated for studies of IBD. Fecal level of calprotectin has not been validated as a biomarker for IBD activity. Level of fecal calprotectin can be used only as a prognostic factor for relapse following treatment.

LIMITATIONS
This was a review of guidelines, regulatory agency guidance documents, and publications, in combination with expert opinion.

IMPACT
The authors propose a path forward for research of biomarkers for IBD.

Methods

The Endpoints Cluster Group of the International Organization for the Study of Inflammatory Bowel Diseases (IOIBD) met on March 17, 2018, in Rio de Janeiro, Brazil, and decided to write an article on biomarkers and biomarker qualification for regulatory approval, using FC as the example. The cluster group members were polled regarding interest in participating, and a subgroup of the members volunteered and became the authors of the article. Parambir S. Dulai was invited to participate by the group lead, William J. Sandborn, with the consent of the other authors. The final article was reviewed by the entire membership of the Endpoints Cluster Group and by the Chairman of the IOIBD, Geert D’Haens.

A review of the US Food & Drug Administration (FDA) and the European Medicines Agency (EMA) Web sites was conducted for guidance documents on clinical trials in IBD and/or biomarkers. This included the BEST (Biomarkers, EndpointS, and other Tools) Resource prepared by the FDA and National Institutes of Health.4 A review of the American Gastroenterological Association (AGA) (https://www.gastro.org/guidelines/ibd-and-bowel-disorders), American College of Gastroenterology (ACG) (http://gi.org/clinical-guidelines/clinical-guidelines-sortable-list/), and European Crohn’s and Colitis Organization (ECCO) (https://www.ecco-ibd.eu/publications/ecco-guidelines-science/published-ecco-guidelines.html) Web sites was conducted for IBD-specific guidelines and technical reviews to identify statements relevant to biomarkers and/or the integration of biomarkers into clinical trials or care pathways. A search of PubMed was conducted for studies related to calprotectin, and 4060 titles and/or abstracts were screened for full-text studies relevant to FC in IBD. The Standards for Reporting Diagnostic Accuracy were also reviewed to capture fundamental pillars in biomarker reporting for FC in IBD.5

Biomarker Categories

A summary of key biomarker-specific terms and their importance to IBD is provided in Table 1. Important definitions and distinctions are summarized here:

• Diagnostic biomarker: Used to detect or confirm the presence of a disease or condition and assessed through cross-sectional studies of biomarker measurements and disease activity measurements at the same time point.

• Monitoring biomarker: Measured serially over time to evaluate changes in biomarker and disease state and assessed through longitudinal studies in which variation of the biomarker and change in clinical disease activity or in disease states are measured in the same time frame.

• Prognostic or predictive biomarker: Assessed through longitudinal studies of the biomarker at a given time point and of clinical disease activity or disease complication occurrence during subsequent follow-up or at a later time point. A prognostic biomarker reflects an individual patient’s intrinsic probability of experiencing a specific event (ie, disease recurrence, progression, complication, or surgery), often derived from observational data. An example of this is a tool that stratifies an individual patient’s probability of experiencing a disease-related complication or recurrence or of requiring surgery (irrespective of treatment exposure).6,7 A predictive biomarker reflects the ability to identify individuals with a positive biomarker who are more likely than similar individuals with a negative biomarker to experience a favorable effect from exposure to a specific intervention, best derived from clinical trial data. An example of this is a tool that stratifies an individual patient’s probability of responding to therapy (specific to treatment exposure).8 The integration of prognostic and predictive biomarkers is possible by showing that a specific therapy may be effective in both patients with a positive predictive biomarker and those with a negative predictive biomarker; however, the effect size is larger among a subgroup of patients based on the status of their prognoses related to a disease-related complication.

• Composite biomarker: A combination of 2 or more biomarkers, which are combined by using a stated algorithm or approach to obtain a single interpretive readout. An example of this can be taken from nonalcoholic fatty liver disease, for which magnetic resonance imaging–based noninvasive assessments (ie, biomarkers) of fat quantity (magnetic resonance imaging–proton-density fat fraction) and hepatic fibrosis (magnetic resonance elastography) are combined into a single interpretative readout to inform the full spectrum of disease progression and regression and the prognostication of disease outcomes.2


Table 1. Key Biomarker Terminology and Description

Term: Diagnostic
FDA–NIH definition: Detect or confirm presence of a disease or condition of interest OR identify individuals with a subtype of the disease.
FDA–NIH explanation: Determination of whether a patient has a particular medical condition for which treatment may be indicated OR whether an individual should be enrolled in a clinical trial studying a particular disease or subtype of the disease.
Importance in clinical trials (enrichment strategies): Decrease heterogeneity: accurate classification of disease, activity state, severity, and/or distribution of involvement. Increase study power and the ability to demonstrate a treatment effect.
Operating properties to be assessed and examples: Accuracy, Precision, Reliability. Identify IBD and subtypes and quantify extent and severity of involvement. Example: EF in patients with HF to identify subsets of disease state.

Term: Monitoring
FDA–NIH definition: Measured serially for assessing status of a disease or medical condition or for evidence of exposure to (or effect of) a medical product or an environmental agent.
FDA–NIH explanation: Serial nature of the measurements focuses attention on change in the biomarker’s value as an indicator of an individual’s current or future condition, beneficial or adverse effect of a drug or other intervention, or effect of an exposure over time. Can include other biomarker types when measured serially.
Importance in clinical trials (enrichment strategies): Decrease heterogeneity during screening: confirm persistent stable activity before enrollment. Decrease heterogeneity during active treatment: confirm drug concentration/exposure and rate of response, progression, or toxicity.
Operating properties to be assessed and examples: Responsiveness. Detect worsening or improvement in disease before treatment and tolerability/adequate exposure/response during treatment. Example: LFTs to monitor toxicity.

Term: Pharmacodynamic or response (application of monitoring)
FDA–NIH definition: Show that a biological response has occurred in an individual who has been exposed to a medical product or an environmental agent.
FDA–NIH explanation: Level changes in response to an exposure and provides early evidence that a treatment might have an effect on a clinical endpoint of interest or can be used to assess a pharmacologic endpoint related to safety concerns.
Importance in clinical trials (enrichment strategies): Enrichment through randomized withdrawal: empiric strategy whereby patients with an apparent response to treatment (during open-label or treatment arm) are randomly reassigned to receive drug or placebo. Establish long-term effectiveness when long-term use of placebo not acceptable or high dropout and avoid long-term exposure to ineffective drug.
Operating properties to be assessed and examples: Responsiveness, Feasibility. Ease of monitoring regression or progression of inflammation after an intervention without the need for invasive testing. Example: HCV-RNA or HIV-RNA to measure response and guide further treatment.

Term: Predictive
FDA–NIH definition: Identify at baseline individuals who are more likely than similar individuals without the biomarker to experience a favorable or unfavorable effect from exposure to a medical product or an environmental agent.
FDA–NIH explanation: Used either to select patients for participation or to stratify patients into biomarker-positive and biomarker-negative groups, with the primary endpoint being the effect in the biomarker-positive group.
Importance in clinical trials (enrichment strategies): Predictive strategy: selecting patients more likely to respond to an intervention (positively or negatively) compared with other patients with the condition. Increase absolute and relative effect size, reduction in size of study population needed, and enhanced benefit to risk relationship (ideal for early-phase proof-of-concept trials).
Operating properties to be assessed and examples: Accuracy, Reliability. Identify patients likely to respond to specific interventions. Example: TPMT genotype or activity for treatment with thiopurines to predict toxicity.

Term: Prognostic
FDA–NIH definition: Identify baseline likelihood of a clinical event, disease recurrence, or progression in patients who have the disease or medical condition of interest.
FDA–NIH explanation: Indicates an increased (or decreased) likelihood of a future clinical event, disease recurrence, or progression in an identified population. Measured at defined baseline and therefore may include background or prior treatments.
Importance in clinical trials (enrichment strategies): Prognostic strategy: selecting patients more likely to have clinical events of interest or to progress rapidly with substantial worsening of the condition being treated. Increase absolute effect size between groups (ideal for prevention trials) with some impact on the size of study population needed.
Operating properties to be assessed and examples: Accuracy, Reliability. Balanced randomization based on clinical prediction models for disease progression and/or complication. Example: Gleason score for progression of prostate cancer under therapy.

Adapted from FDA-NIH BEST Resource.4 EF, ejection fraction; HCV, hepatitis C virus; HF, heart failure; LFT, XXXX; NIH, National Institutes of Health; TPMT, thiopurine methyltransferase.


Endpoint Definitions for Biomarker Studies

It is important to consider the endpoint against which the biomarker is being assessed and to validate whether that endpoint may serve as a surrogate for disease effects, net benefits of interventions, and/or future clinical events.

• Reasonably likely surrogate endpoint: An endpoint supported by strong mechanistic and/or epidemiologic rationale such that an effect on the surrogate endpoint is expected to be correlated with an endpoint intended to assess clinical benefit in clinical trials but without sufficient clinical data to show that it is a validated surrogate endpoint. Symptom-based indices have served as a prime example of a noninvasive assessment previously believed to be a reasonably likely surrogate endpoint in IBD trials, despite the lack of evidence supporting an association between resolution of symptoms and reduction in disease-related complications.

• Validated surrogate endpoint: An endpoint supported by a clear mechanistic rationale and clinical data providing strong evidence that an effect on the surrogate endpoint predicts a specific clinical benefit. Achieving mucosal healing (MH) is an early example of a possibly validated surrogate endpoint, given its clear association with reductions in several disease-related complications9; however, it is not enough for an early surrogate endpoint (ie, MH) to correlate with a later clinical endpoint (disease-related complications) for it to be considered a validated surrogate endpoint. The effect of the intervention on the surrogate endpoint must reliably predict or fully capture the net effect of the intervention on the clinical outcome.4,10,11

Indeed, this is not straightforward to prove because many confounding factors may be at the origin of a correlation between the observation of a possibly surrogate endpoint and an observed favorable further clinical outcome. The randomized clinical trial is an attractive method in this situation, as are promising new methods that deal with causal effects.12

A randomized trial by Landi et al13 assessed the validity of endoscopic remission serving as a surrogate endpoint of future clinical relapse among patients with colonic CD being treated with steroids for acute flares. The study enrolled 147 patients with CD being treated with oral prednisolone for an acute flare, of whom 136 achieved clinical remission. Endoscopic activity was observed in 96 of these 136, and these 96 patients were subsequently randomly assigned to receive immediate steroid taper (n = 46) or prolonged prednisolone for 5 weeks with subsequent taper (n = 50); the remaining 40 patients who had achieved both clinical remission and endoscopic remission had immediate steroid tapering. Although the prolonged steroid treatment before tapering resulted in an improved endoscopic remission rate, there were no significant differences across groups in terms of clinical relapse after steroid withdrawal. That study was performed before the development and validation of endoscopic disease activity indices, and endoscopic remission was empirically defined as no lesion at all, or only scarred lesions, or minor lesions with at least a 2-grade decrease on a 6-grade scale and no deep ulceration. However, it does provide insight into, and a framework for, how a trial may be used to validate MH as a surrogate endpoint, and it highlights the facts that fully capturing the relationship is not easy to prove and that most endpoints presented as surrogate endpoints of a more distant clinical endpoint do not fulfill this requirement. Therefore, although MH is not a fully validated surrogate endpoint, it is considered our most valid surrogate endpoint of many further disease complications, and this is now being considered a requirement as an endpoint in IBD trials.

Ideal biomarkers in IBD are those that measure (diagnostic), monitor (pharmacodynamics/responsive), predict, or prognosticate previously recognized and validated surrogate endpoints for disease-related complications. For this reason, any future consideration for biomarker integration into clinical trials and clinical practice for IBD will need to focus on a comparison of biomarkers to disease remission (mucosal and transmural healing) and the ability to measure, monitor, predict, or prognosticate the achievement of remission across diseases (CD and UC) and subtypes (disease location and/or disease-related complication–specific groups).9

Assessment of Biomarker Operating Properties

A summary of technical and clinical operating properties (including their components) for biomarkers and their implications for clinical trials can be found in Figure 1.

Figure 1. Technical and clinical operating properties of biomarkers and implications for clinical trials. Reliability can relate to the technical side of performing the assay and the clinical side of variation between measured patients. Technical aspects of reliability focus on intra-assessments and interassessments of an operator (the same operator twice for the same sample or 2 individual operators for the same sample), laboratory or lot (sources of samples), day (timing of collection during day), and extraction or recovery (separate extractions of the same sample or different samples of varying concentrations). Clinical aspects of reliability focus on intra-assessments and interassessments of a patient (the same patient assessed twice without change in reference activity measure or 2 patients with similar reference activity measures), treatment (different or similar treatments), and disease subtype (presence or absence of complications or alterations, eg, strictures or fistulae).

• Reliability: The consistency of repeated measures when the measured concept has remained unchanged, that is, the ability of the biomarker to give the same answer every time. It is usually expressed as the intraclass correlation coefficient estimate.

• Accuracy: The extent to which a biomarker measures the outcome it was intended to measure. For measuring disease remission, when the biomarker is used as a continuous marker, accuracy may be expressed as a receiver operating characteristic area under the curve or a concordance index (C-index), whereas when assessing operating properties related to a given threshold of a biomarker, accuracy may be expressed as sensitivity (ability to rule out active inflammation), specificity (ability to rule in active inflammation), predictive values (negative or positive), and/or likelihood ratios (negative or positive). Negative and positive predictive values provide the distinct advantage of being able to take into consideration both diagnostic accuracy (sensitivity/specificity) and diagnosis prevalence (proportion of patients with MH) in the studied population to provide estimates of the test performance. However, these estimates depend on the observed prevalence in the sample, which therefore limits generalizability. Negative and positive likelihood ratios outline the likelihood that the patient has the outcome of interest and do not depend on outcome prevalence. A positive likelihood ratio expresses the ratio of the probability of being classified as positive by the biomarker among patients having the outcome to the same probability among patients not having the outcome. In addition, the ratio of the positive to the negative likelihood ratio provides the odds ratio estimate of the association between the biomarker result and the outcome.

• Responsiveness: The ability of a biomarker to measure meaningful changes in disease states, which can be presented in 1 of 3 ways: biomarker correlations to changes in reference measures of disease status, consistency in biomarker accuracy measures across disease states, and distributional methods (ie, Cohen or Guyatt statistics) that evaluate changes in the biomarker and its associated variability (ie, standard deviation).14

Optimizing reliability, accuracy, and responsiveness across treatment interventions and patient subgroups can minimize the sample size required for a clinical trial.10,11,15,16 Optimization of these operating properties can be performed at each stage of assessment through colocalization or composite biomarker strategies, such as the combination of FC with C-reactive protein (CRP) level to optimize the detection of mucosal inflammation17 or the use of 2 or more diagnostic assessment modalities to obtain 100% accuracy for determining fistula anatomy in patients with CD with perianal disease.18

The accuracy and responsiveness are strongly influenced by the reference measurement (ie, criterion standard), and when an operating characteristic is estimated in a sample through a rule derived from the same sample, the quality of this characteristic is overestimated, and validation is necessary on a new independent sample. Thus, for disease remission in IBD (mucosal and/or transmural healing), factors such as blinding or adjudication of endoscopic or radiographic scoring, disease activity indices, segment-specific scoring, combination of endoscopy with histologic or radiographic assessment of transmural healing, and external validation in additional cohorts are important.
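The threshold-based accuracy measures described in this section can be computed from a 2 × 2 cross-classification of the biomarker result against the reference standard. The following is a minimal sketch in Python; the class and function names and all patient counts are hypothetical and chosen only for illustration. It uses two invented cohorts with the same sensitivity and specificity but different outcome prevalence, so that the predictive values shift while the likelihood ratios and odds ratio stay the same.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Biomarker threshold result cross-classified against the reference standard."""
    tp: int  # biomarker positive, outcome present
    fp: int  # biomarker positive, outcome absent
    fn: int  # biomarker negative, outcome present
    tn: int  # biomarker negative, outcome absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return threshold-based accuracy measures (assumes specificity is neither 0 nor 1)."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    lr_pos = sens / (1 - spec)   # P(positive | outcome) / P(positive | no outcome)
    lr_neg = (1 - sens) / spec   # P(negative | outcome) / P(negative | no outcome)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "diagnostic_or": lr_pos / lr_neg,
    }


# Hypothetical cohorts: identical sensitivity (0.80) and specificity (0.90)
high_prevalence = TwoByTwo(tp=80, fp=10, fn=20, tn=90)   # outcome in 100 of 200
low_prevalence = TwoByTwo(tp=16, fp=18, fn=4, tn=162)    # outcome in 20 of 200

for label, table in (("high", high_prevalence), ("low", low_prevalence)):
    m = accuracy_measures(table)
    print(label, {name: round(value, 3) for name, value in m.items()})
```

In these invented data the positive predictive value is lower in the low-prevalence cohort, whereas the likelihood ratios are unchanged, which is the prevalence dependence noted above.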

Importance of Validated Biomarkers for Drug Development in Inflammatory Bowel Disease

The 21st Century Cures Act highlighted the importance of qualifying biomarkers as drug development tools to accelerate development of new safe and effective drugs and biologics. The FDA has accordingly now placed a greater focus on the generation of data and evidence needed to support biomarker development for these purposes. The implications of validated biomarkers on clinical trial design are highlighted in Figure 1. With regard to drug development programs specifically, validated biomarkers have 3 key potential roles: assessment of desired biochemical effect (exposure-response), providing early feedback for preclinical decision making (go/no go), and targeting recruitment to study participants most likely to respond or progress within a reasonable and manageable clinical study time frame (prognostic).19 A biomarker validated for a surrogate endpoint such as MH could be of value as a response criterion within early dose-finding studies to overcome the lack of granularity in responsiveness of currently available indices. Subsequent use of the validated biomarker for a remission criterion would allow for efficient and cost-effective assessments of treatment effectiveness vs placebo or standard of care to allow for early go/no-go decision making. Finally, if the biomarker is validated against an endpoint that is a true surrogate for a future event (clinical relapse or surgery), it could be used to stratify patient recruitment into specific trials focusing on prevention strategies within an umbrella or basket trial design platform, or it could be used as a trial endpoint to obtain a more rapid assessment and comparison between treatment arms without requiring the longer follow-up periods traditionally needed for the occurrence of rare events (ie, surgery).20
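The earlier statement that better reliability and responsiveness can reduce the sample size required for a trial can be illustrated with a rough calculation. The sketch below is an illustration only: it assumes a two-arm comparison of mean biomarker change, equal arm sizes, a two-sided test, and a normal approximation, and every number in it is hypothetical. The function names are not taken from any cited source.

```python
from math import ceil
from statistics import NormalDist


def standardized_effect(mean_difference, sd):
    """Cohen-type standardized effect: between-arm difference in mean change over SD."""
    return mean_difference / sd


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate patients per arm (two-sided test, equal arms, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical values: the same mean difference measured with more or less variability
noisy = standardized_effect(mean_difference=100, sd=200)      # d = 0.5
reliable = standardized_effect(mean_difference=100, sd=100)   # d = 1.0

for label, d in (("noisy measurement", noisy), ("reliable measurement", reliable)):
    print(f"{label}: d = {d:.2f}, about {n_per_arm(d)} patients per arm")
```

Under these assumptions, halving the variability of the measured change doubles the standardized effect and cuts the approximate sample size to about one quarter.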

Current State of Biomarkers in Inflammatory Bowel Disease Clinical Trials and Clinical Practice

The FDA and the EMA both encourage the collection of biomarkers to facilitate the monitoring of FDA-regulated products and their appropriate use in clinical practice; however, both agencies state that these biomarkers may not be used to support labeling claims. The EMA outlines that this rationale is due to the inability of currently available biomarkers to provide standalone evidence of inflammation, and the FDA highlights the fact that no established biomarker can be relied on to establish an exposure-efficacy relationship. FDA guidance has been provided for the development of FC immunologic tests21; however, there are no specific guidelines or consensus statements pertaining to the use of biomarkers in clinical trials.

AGA, ACG, and ECCO guidance documents all note a growing recognition of the diagnostic performance of FC for assessing the absence of endoscopic recurrence postoperatively in CD and the need for future studies comparing endoscopy to biomarker-based decision making.22–24 The AGA CD care pathway and ACG CD clinical guideline acknowledge the adjunctive value of FC for monitoring disease activity and response to therapy; however, they specifically note that biomarkers should not be used exclusively to serve as endpoints for treatment and that endoscopy is still required to confirm the absence of mucosal inflammation. The AGA states that consideration could be given to making treatment decisions based on inflammatory markers (FC or CRP) vs confirmation with colonoscopy in symptomatic patients with CD, but this may depend on whether there is historically good correlation between the biomarker selected and colonoscopy in the specific patient. In symptomatic patients with CD without inflammatory markers, the AGA guidance specifically comments on performing cross-sectional imaging; however, in symptomatic patients with CD with inflammatory markers, the guidance does not make any recommendations on cross-sectional imaging. ECCO guidelines for CD state that CRP and FC can be used to guide therapy and short-term follow-up and to predict clinical relapse. However, no recommendation is made on how to position these biomarkers within treatment care pathways or what their role should be in replacing or augmenting endoscopic evaluations.25 It should be noted that these recommendations were made before the completion of the CALM trial and will likely need to be updated.17

The AGA UC care pathway recommends testing for CRP but makes no specific recommendations for FC and still recommends the use of lower endoscopy in all patients to determine inflammatory status. The ECCO guidelines for UC note the value of CRP and FC for prognostication of disease risk (colectomy in acute severe UC) and relapse, and they acknowledge the diagnostic value of FC for assessing mucosal inflammatory status and severity. However, no specific recommendation is made on its use to guide therapy and short-term follow-up in UC.26,27

Fecal Calprotectin in Inflammatory Bowel Disease

The FDA defines FC as an immunologic test system intended for in vitro diagnostic use as an aid in the diagnosis of IBD.21 There is currently no mention of FC as a diagnostic biomarker to quantify the extent or severity of inflammation or for its use as a biomarker of responsiveness, prognostic and/or predictive performance.

Reliability FC has significant variability across platforms and collection techniques (point of care vs send out, commercial extraction vs manual weighing), leading to low reliability28–33 (Supplementary Table 1). This variability translates to meaningful differences in diagnostic accuracy in some studies but not all.29,30,34,35 Although the literature and the multiple company device inserts suggest that no significant degradation in FC occurs until after 3 days at room temperature,36 a 12% decline in FC levels has been observed within 24 hours after stool collection irrespective of storage temperature.29 FC has been observed to have reasonable intra-stool correlations (intraclass correlation, 0.79; 95% confidence interval, 0.48–0.90)36; however, the inter-stool correlations are quite variable and significantly influenced by factors such as timing of collection within the day, physical attributes and characteristics of stool provided, and time between bowel movements (ie, stool frequency).36–40 The individual coefficient of variation for samples collected throughout a single day has been reported to be as high as 52%,36,37 with nearly one quarter of this variability being clinically significant and affecting the interpretation of the test.39,41 It has been suggested that the first stool sample of the day in combination with an FC cutoff of 250 µg/g will result in the most consistent measurement of disease activity and most reliable differentiation between MH and mucosal inflammation.39,42
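To make the within-day coefficient of variation concrete, the following minimal Python sketch computes it for a single patient. The function name and the sample values are hypothetical and are not taken from any of the cited studies; the calculation is the usual sample standard deviation divided by the mean, which may differ from the exact estimator used in a given study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample coefficient of variation as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    # Sample standard deviation (n - 1 denominator) relative to the mean.
    return 100.0 * stdev(values) / average


# Hypothetical FC results from three stools passed by one patient on one day.
same_day_fc = [100, 200, 300]
print(f"within-day CV: {coefficient_of_variation(same_day_fc):.0f}%")
```

With values this far apart, one sample can fall below a decision cutoff while another from the same day falls above it, which is the sense in which variability can affect interpretation of the test.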

Accuracy


Several studies have derived overall pooled sensitivity and specificity estimates for FC from meta-analyses of studies that used different cutoffs for assessing mucosal disease activity in both UC and CD43,44; however, this approach has several inherent limitations that matter when deciding how to integrate biomarkers as treatment targets. The dichotomization of a continuous variable will result in a loss of overall diagnostic performance, but this dichotomization is of clinical utility when understanding how results should be used to guide treatment decisions. The choice of value for dichotomization depends on whether the objective is to optimize sensitivity (rule out disease) or specificity (rule in disease). Attempts can be made to identify a threshold that optimizes selection for both (ie, Youden index); however, this will invariably sacrifice performance in both domains for the sake of balance. Therefore, when interpreting accuracy of FC, we must consider cutoffs separately and not as overall pooled estimates and then consider the context within which they are being used. This is best shown in a recent meta-analysis in which the overall pooled sensitivity and specificity for FC (cutoffs, 48–400 µg/g) in IBD were reported to be 0.85 and 0.75, respectively. However, when looking at separate cutoffs individually, the overall sensitivity and specificity were reported to be 0.91 and 0.61, respectively, when using a cutoff of 50 µg/g and 0.88 and 0.67 when using a cutoff of 100 µg/g.45 A separate meta-analysis similarly showed that the overall sensitivity and specificity for FC in IBD were 0.92 and 0.60, respectively, when using a cutoff of 50 µg/g, 0.84 and 0.66 when using a cutoff of 100 µg/g, and 0.80 and 0.82 when using a cutoff of 250 µg/g.46 These studies have all observed that the diagnostic performance of FC is better in UC than CD; however, they have often combined different CD phenotypes together for analyses. For postoperative endoscopic recurrence detection in CD, the overall sensitivity and specificity of FC were 0.90 and 0.36, respectively, when using a cutoff of 50 µg/g, 0.81 and 0.57 when using a cutoff of 100 µg/g, 0.70 and 0.69 when using a cutoff of 150 µg/g, and 0.55 and 0.71 when using a cutoff of 200 µg/g.47 For small bowel CD confirmed by capsule endoscopy, the overall sensitivity and specificity of FC were 0.83 and 0.53, respectively, when using a cutoff of 50 µg/g and 0.42 and 0.94 when using a cutoff of 200 µg/g.48
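The trade-off between cutoffs can be illustrated with a short Python sketch. It uses only the three sensitivity and specificity pairs quoted above from the meta-analysis cited as reference 46; the variable and function names are illustrative assumptions, and the Youden index is computed as sensitivity + specificity - 1.

```python
# Sensitivity and specificity of FC by cutoff, as quoted in the text above
# for the meta-analysis cited as reference 46 (cutoff values as in the text).
REPORTED = {
    50: (0.92, 0.60),
    100: (0.84, 0.66),
    250: (0.80, 0.82),
}


def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def summarize(table):
    """Return (cutoff, sensitivity, specificity, J) rows sorted by cutoff."""
    rows = []
    for cutoff, (sens, spec) in sorted(table.items()):
        rows.append((cutoff, sens, spec, round(youden_index(sens, spec), 2)))
    return rows


rows = summarize(REPORTED)
best_rule_out = max(rows, key=lambda r: r[1])[0]  # highest sensitivity
best_rule_in = max(rows, key=lambda r: r[2])[0]   # highest specificity
best_youden = max(rows, key=lambda r: r[3])[0]    # highest Youden index

for cutoff, sens, spec, j in rows:
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}, J {j:.2f}")
print("rule-out:", best_rule_out, "rule-in:", best_rule_in, "Youden:", best_youden)
```

Among these three cutoffs the highest sensitivity and the highest Youden index fall on different values, so a single balanced threshold gives up rule-out performance that the lowest cutoff would provide.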

Composite Biomarker Strategy to Optimize Accuracy of Fecal Calprotectin Few studies have examined the use of composite biomarker strategies to optimize the diagnostic performance of FC in patients with established IBD. These studies have largely focused on improvements in overall accuracy (area under the curve) when combining stool or serum biomarkers with FC.49–53 However, this is of limited value for understanding the clinical utility of adding additional biomarkers to FC for disease activity assessment because it does not allow for an assessment of incremental diagnostic yield with prespecified cutoffs for each successive biomarker added. Post hoc analyses of CALM have recently attempted to address this gap when considering the addition of CRP to FC. Among patients for whom both CRP and FC were evaluated at week 48, 66% (n = 58/88) of patients with a CRP level < 5 mg/L and 75% (n = 71/95) of patients with an FC level < 250 µg/g were observed to have MH (Crohn’s Disease Endoscopic Index of Severity < 4 and no deep ulcer). When combining the 2 biomarkers, 79% (n = 55/70) of patients who had a CRP level < 5 mg/L and FC level < 250 µg/g were observed to have MH. Among patients who had a CRP level < 5 mg/L but an FC level ≥ 250 µg/g, only 17% (n = 3/18) were observed to have MH, whereas 64% (n = 16/25) of patients with a CRP level ≥ 5 mg/L but an FC level < 250 µg/g were observed to have MH. This shows that a composite biomarker strategy requiring both CRP and FC values to be below or above prespecified cutoffs improves the overall sensitivity and specificity for MH,54 and it is FC that adds the greatest incremental yield in diagnostic performance, leading to the majority of treatment escalation within this composite strategy.55
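The proportions quoted from the CALM post hoc analysis can be rechecked from the counts given in the text. The Python sketch below is only an arithmetic illustration; the dictionary keys and function name are assumptions introduced here, and the counts are exactly those stated above.

```python
# (patients with MH, patients in stratum) at week 48, as quoted in the text.
# CRP cutoff is 5 mg/L; the FC cutoff is 250 in the units used in the text.
STRATA = {
    "CRP low": (58, 88),
    "FC low": (71, 95),
    "CRP low and FC low": (55, 70),
    "CRP low and FC high": (3, 18),
    "CRP high and FC low": (16, 25),
}


def mh_percent(healed, total):
    """Percentage of a stratum observed to have mucosal healing."""
    return 100.0 * healed / total


percentages = {name: round(mh_percent(*counts)) for name, counts in STRATA.items()}

# Internal consistency: each single-marker stratum is the sum of its substrata.
crp_low_sum = tuple(
    a + b for a, b in zip(STRATA["CRP low and FC low"], STRATA["CRP low and FC high"])
)
fc_low_sum = tuple(
    a + b for a, b in zip(STRATA["CRP low and FC low"], STRATA["CRP high and FC low"])
)

for name, pct in percentages.items():
    print(f"{name}: {pct}% with MH")
```

The substrata add up to the single-marker totals, and the gap between the strata with discordant markers is what the text describes as the incremental yield contributed by FC.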

Responsiveness There is an overall paucity of literature surrounding the responsiveness of FC in IBD, particularly in CD. Post hoc analyses of UC clinical trials have observed that changes in FC correlate with meaningful changes in health status at a group level (ie, group average). However, at an individual level, the diagnostic accuracy was only fair to good for classifying clinical and endoscopic outcomes in clinical trials.56–59 Blinding of investigators to the reference standard significantly affected the heterogeneity of accuracy for biomarkers and, thus, should also be considered when evaluating the diagnostic accuracy and/or responsiveness of FC.43

Prognostic and/or Predictive Capacity FC measurements have been proposed for prognosticating and monitoring the risk of relapse,60,61 predicting disease course and progression,62,63 predicting risk of treatment failure or need for therapy escalation,64,65 and predicting future achievement of disease remission.66,67 A recent systematic review pooled studies following patients with IBD in clinical remission who had at least 2 serial FC measurements and quantified the ability of serial FC to prognosticate the risk for future disease relapse.68 Although the cutoffs varied across studies, the review observed that asymptomatic patients with IBD who had repeatedly elevated FC measurements were 53%–83% more likely to have clinical disease relapse in the 2–3 months after testing.


Table 2. Components of Biomarker Qualification for Use in Clinical Trials and Clinical Practice

Operating property: Technical reliability
Areas in need of clarification: Minimally acceptable variability between operators, laboratories, lots, samples, extraction techniques, and storage conditions, with specific focus on the clinical impact and/or impact on trial result outcomes.
Evidence concerns for FC: Assessed, but significant variability across platforms and collection techniques observed. Minimally acceptable variability with focus on clinical impact not known. Requires further investigation regarding the impact of variability on clinical decision making to identify acceptable variability.

Operating property: Clinical reliability
Areas in need of clarification: Minimally acceptable variability between samples for an individual, different individuals, collection techniques, and disease-specific subpopulations with characteristics known to (or considered to be biologically plausible reasons to) affect reliability of measurements (ie, short gut in CD and bowel transit time on stool frequency).
Evidence concerns for FC: Found to have significant variability throughout the day, with substantial impact on clinical interpretation. Requires optimization through an integrated approach combining clinical characteristics affecting reliability with variation in sampling techniques.

Operating property: Diagnostic accuracy
Areas in need of clarification: Thresholds of acceptability in sensitivity/specificity for patients, providers, payors, and other key stakeholders to allow for integration of biomarkers. Clinical situation–specific assessments of accuracy to guide study design for the assessment of operating properties (ie, cutoff for identifying MH to serve as treatment target and cutoff for identifying inflammation to guide therapy adjustments). Standardization of reference criterion standard for disease activity measurement.
Evidence concerns for FC: Cutoffs identified to maximize sensitivity and specificity; early validation when using these cutoffs as interim monitoring tools. No formal assessment of FC as a replacement for endoscopic assessments, and assessments required using more stringent references (blinded evaluations of endoscopy/histology). Consistency in accuracy across drug classes required.

Operating property: Responsiveness
Areas in need of clarification: Degree of correlation needed to reference changes in health status, standardization of reference criterion standard for changes in health status.
Evidence concerns for FC: Early assessment in post hoc analyses of UC trials, limited assessment of responsiveness in CD. Unclear if effect size is comparable to reference criterion standard and how variability in measurements affects responsiveness.

Operating property: Prognostic
Areas in need of clarification: Ranking of outcomes by importance for prognostication, and strength of association needed. Interchangeability of prognostic performance for a possible surrogate endpoint (ie, MH).
Evidence concerns for FC: Prognostic performance for relapse established, but no assessments of prognostic performance for achieving MH, transmural healing, and/or risk of hospitalization or surgery (agnostic of treatment/intervention).

Operating property: Predictive
Areas in need of clarification: Ranking outcomes by importance for prediction, and strength of association needed. Interchangeability of prognostic performance for reasonably likely surrogate endpoint (ie, predicting clinical response interchangeable with predicting ultimate achievement of MH).
Evidence concerns for FC: Early assessment for treatment-/intervention-specific predictive performance, and integration with other available biomarkers; further validation required with specific cutoffs.


Conversely, patients with repeated normal FC values were 67%–94% more likely to remain in clinical remission in the 2–3 months after testing. Similarly, the POCER trial observed that in postoperative patients with CD who were in endoscopic remission at 6 months, an FC value of >50 µg/g had 0.68 specificity for the development of future endoscopic recurrence at 18 months.59 These data help support the potential prognostic value of FC when used serially as a monitoring biomarker, and the integration of FC with colonoscopy for disease monitoring appears to be cost effective in the management of IBD.69 Across all studies taken together, an FC cutoff of 50 µg/g or 100 µg/g optimized sensitivity (ability to rule out active inflammation), and a cutoff of 250 µg/g optimized specificity (ability to rule in active inflammation). Consideration may therefore be given to 1 cutoff serving as a treatment endpoint (ie, confirmation that disease remission has been achieved) and another cutoff serving as treatment monitoring (ie, confirmation that inflammation is still present and treatment adjustments should be made).17,70,71 However, FC cutoffs will likely evolve in parallel with evolving therapeutic goals (ie, histologic healing or modified definitions for endoscopic and radiographic healing), and further assessments will be required for the performance of FC in assessing transmural healing in CD72 and across different drug classes with more focused and immune-specific mechanisms of action. Furthermore, none of these studies assessed the replacement of endoscopy with FC-based monitoring algorithms as a treatment target, and all of these approaches have yet to be fully studied in UC.
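The idea of using 2 separate cutoffs can be sketched as a simple three-zone classification. The Python example below is a hypothetical illustration only and is not a validated clinical algorithm: the names, the default cutoffs of 50 and 250 (in the units used in the text), and the handling of values that fall exactly on a cutoff are all assumptions made here.

```python
from enum import Enum


class FCZone(Enum):
    BELOW_LOWER = "below the sensitivity-optimized cutoff"
    INDETERMINATE = "between the two cutoffs"
    AT_OR_ABOVE_UPPER = "at or above the specificity-optimized cutoff"


def classify_fc(fc_value, lower_cutoff=50.0, upper_cutoff=250.0):
    """Place one FC result into a zone defined by two cutoffs.

    The defaults are assumptions chosen from the cutoffs discussed in the
    text; they are not recommendations.
    """
    if lower_cutoff >= upper_cutoff:
        raise ValueError("lower_cutoff must be smaller than upper_cutoff")
    if fc_value < 0:
        raise ValueError("fc_value cannot be negative")
    if fc_value < lower_cutoff:
        return FCZone.BELOW_LOWER
    if fc_value >= upper_cutoff:
        return FCZone.AT_OR_ABOVE_UPPER
    return FCZone.INDETERMINATE


for value in (30, 120, 400):
    print(value, "->", classify_fc(value).value)
```

The middle zone is kept explicit because neither cutoff supports a confident conclusion there, and the text notes that replacing endoscopy with FC-based algorithms has not been formally assessed.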

Needs Assessment for Biomarker Qualification in Inflammatory Bowel Disease and Future Strategies

One of the greatest limitations for biomarker qualification is the lack of clear guidance on biomarker development and validation by regulatory agencies. Although definitions and frameworks/examples have been provided, no clear guidance is available on ideal study designs, a clinically acceptable coefficient of variation, or other metrics used to gain approval. In Table 2, we have summarized operating properties that require assessment for the development of a surrogate biomarker for use in clinical trials and clinical practice and areas in need of clarification from both regulatory agencies and disease-specific societies. FC remains to be fully validated as a surrogate biomarker in IBD. A formal assessment of its ability to replace endoscopy in a treat-to-target monitoring and endpoint algorithm is required to understand if biomarkers can serve as endpoints as opposed to adjunctive monitoring tools.

Supplementary Material

Note: To access the supplementary material accompanying this article, visit the online version of Gastroenterology at www.gastrojournal.org, and at https://doi.org/10.1053/j.gastro.2019.06.018.


References

1. Buisson A, Gonzalez F, Poullenot F, et al. Comparative acceptability and perceived clinical utility of monitoring tools: a nationwide survey of patients with inflammatory bowel disease. Inflamm Bowel Dis 2017;23:1425–1433.
2. Dulai PS, Sirlin CB, Loomba R. MRI and MRE for noninvasive quantitative assessment of hepatic steatosis and fibrosis in NAFLD and NASH: Clinical trials to clinical practice. J Hepatol 2016;65:1006–1016.
3. Peyrin-Biroulet L, Sandborn W, Sands BE, et al. Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE): determining therapeutic goals for treat-to-target. Am J Gastroenterol 2015;110:1324–1338.
4. FDA-NIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and other Tools) Resource. US Food and Drug Administration Web site, https://www.ncbi.nlm.nih.gov/books/NBK326791/.
5. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015;351:h5527.
6. Siegel CA, Horton H, Siegel LS, et al. A validated Web-based tool to display individualised Crohn’s disease predicted outcomes based on clinical, serologic and genetic variables. Aliment Pharmacol Ther 2016;43:262–271.
7. Guizzetti L, Zou G, Khanna R, et al. Development of clinical prediction models for surgery and complications in Crohn’s disease. J Crohns Colitis 2018;12:167–177.
8. Dulai PS, Boland BS, Singh S, et al. Development and validation of a scoring system to predict outcomes of vedolizumab treatment in patients with Crohn’s disease. Gastroenterology 2018;155:687–695.
9. Dulai PS, Levesque BG, Feagan BG, et al. Assessment of mucosal healing in inflammatory bowel disease: review. Gastrointest Endosc 2015;82:246–255.
10. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996;125:605–613.
11. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med 1989;8:431–440.
12. Buyse M, Molenberghs G, Paoletti X, et al. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J 2016;58:104–132.
13. Landi B, Anh TN, Cortot A, et al. Endoscopic monitoring of Crohn’s disease treatment: a prospective, randomized clinical trial. The Groupe d’Etudes Therapeutiques des Affections Inflammatoires Digestives. Gastroenterology 1992;102:1647–1653.
14. Middel B, van Sonderen E. Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integr Care 2002;2:e15.
15. Wagner JA. Overview of biomarkers and surrogate endpoints in drug development. Dis Markers 2002;18:41–46.


16. Lee JW, Devanarayan V, Barrett YC, et al. Fit-for-purpose method development and validation for successful biomarker measurement. Pharm Res 2006;23:312–328.
17. Colombel JF, Panaccione R, Bossuyt P, et al. Effect of tight control management on Crohn’s disease (CALM): a multicentre, randomised, controlled phase 3 trial. Lancet 2018;390(10114):2779–2789.
18. Schwartz DA, Wiersema MJ, Dudiak KM, et al. A comparison of endoscopic ultrasound, magnetic resonance imaging, and exam under anesthesia for evaluation of Crohn’s perianal fistulas. Gastroenterology 2001;121:1064–1072.
19. Kraus VB, Burnett B, Coindreau J, et al. Application of biomarkers in the development of drugs intended for the treatment of osteoarthritis. Osteoarthritis Cartilage 2011;19:515–542.
20. Woodcock J, LaVange LM. Master protocols to study multiple therapies, multiple diseases, or both. N Engl J Med 2017;377:62–70.
21. Class II special controls guidance document: fecal calprotectin immunological test systems - guidance for industry and FDA staff. US Food and Drug Administration Web site, https://www.fda.gov/RegulatoryInformation/Guidances/ucm079129.htm.
22. Regueiro M, Velayos F, Greer JB, et al. American Gastroenterological Association Institute technical review on the management of Crohn’s disease after surgical resection. Gastroenterology 2017;152:277–295.
23. Gionchetti P, Dignass A, Danese S, et al. Third European evidence-based consensus on the diagnosis and management of Crohn’s disease 2016: part 2: surgical management and special situations. J Crohns Colitis 2017;11:135–149.
24. Lichtenstein GR, Loftus EV, Isaacs KL, et al. ACG Clinical Guideline: management of Crohn’s disease in adults. Am J Gastroenterol 2018;113:481–517.
25. Gomollon F, Dignass A, Annese V, et al. Third European evidence-based consensus on the diagnosis and management of Crohn’s disease 2016: part 1: diagnosis and medical management. J Crohns Colitis 2017;11:3–25.
26. Magro F, Gionchetti P, Eliakim R, et al. Third European evidence-based consensus on diagnosis and management of ulcerative colitis. part 1: definitions, diagnosis, extra-intestinal manifestations, pregnancy, cancer surveillance, surgery, and ileo-anal pouch disorders. J Crohns Colitis 2017;11:649–670.
27. Harbord M, Eliakim R, Bettenworth D, et al. Third European evidence-based consensus on diagnosis and management of ulcerative colitis. part 2: current management. J Crohns Colitis 2017;11:769–784.
28. Whitehead SJ, French J, Brookes MJ, et al. Between-assay variability of faecal calprotectin enzyme-linked immunosorbent assay kits. Ann Clin Biochem 2013;50(Pt 1):53–61.
29. Padoan A, D’Inca R, Scapellato ML, et al. Improving IBD diagnosis and monitoring by understanding preanalytical, analytical and biological fecal calprotectin variability. Clin Chem Lab Med 2018;56:11926–11935.
30. Oyaert M, Boel A, Jacobs J, et al. Analytical performance and diagnostic accuracy of six different faecal calprotectin assays in inflammatory bowel disease. Clin Chem Lab Med 2017;55:1564–1573.
31. Rodriguez A, Yokomizo L, Christofferson M, et al. Correlation of rapid point-of-care vs send-out fecal calprotectin monitoring in pediatric inflammatory bowel disease. World J Gastrointest Pharmacol Ther 2017;8:127–130.
32. Hejl J, Theede K, Mollgren B, et al. Point of care testing of fecal calprotectin as a substitute for routine laboratory analysis. Pract Lab Med 2018;10:10–14.
33. Heida A, Knol M, Kobold AM, et al. Agreement between home-based measurement of stool calprotectin and ELISA results for monitoring inflammatory bowel disease activity. Clin Gastroenterol Hepatol 2017;15:1742–1749.
34. Whitehead SJ, Ford C, Gama RM, et al. Effect of faecal calprotectin assay variability on the management of inflammatory bowel disease and potential role of faecal S100A12. J Clin Pathol 2017;70:1049–1056.
35. Jang HW, Kim HS, Park SJ, et al. Accuracy of three different fecal calprotectin tests in the diagnosis of inflammatory bowel disease. Intest Res 2016;14:305–313.
36. Lasson A, Stotzer PO, Ohman L, et al. The intraindividual variability of faecal calprotectin: a prospective study in patients with active ulcerative colitis. J Crohns Colitis 2015;9:26–32.
37. Calafat M, Cabre E, Manosa M, et al. High within-day variability of fecal calprotectin levels in patients with active ulcerative colitis: what is the best timing for stool sampling? Inflamm Bowel Dis 2015;21:1072–1076.
38. Naismith GD, Smith LA, Barry SJ, et al. A prospective single-centre evaluation of the intra-individual variability of faecal calprotectin in quiescent Crohn’s disease. Aliment Pharmacol Ther 2013;37:613–621.
39. Du L, Foshaug R, Huang VW, et al. Within-stool and within-day sample variability of fecal calprotectin in patients with inflammatory bowel disease: a prospective observational study. J Clin Gastroenterol 2018;52:235–240.
40. Kittanakom S, Shajib MS, Garvie K, et al. Comparison of fecal calprotectin methods for predicting relapse of pediatric inflammatory bowel disease. Can J Gastroenterol Hepatol 2017;2017:1450970.
41. Toyonaga T, Kobayashi T, Nakano M, et al. Usefulness of fecal calprotectin for the early prediction of short-term outcomes of remission-induction treatments in ulcerative colitis in comparison with two-item patient-reported outcome. PLoS One 2017;12(9):e0185131.
42. Kristensen V, Malmstrom GH, Skar V, et al. Clinical importance of faecal calprotectin variability in inflammatory bowel disease: intra-individual variability and standardisation of sampling procedure. Scand J Gastroenterol 2016;51:548–555.
43. Mosli MH, Zou G, Garg SK, et al. C-reactive protein, fecal calprotectin, and stool lactoferrin for detection of endoscopic activity in symptomatic inflammatory bowel disease patients: a systematic review and meta-analysis. Am J Gastroenterol 2015;110:802–819.
44. Qiu Y, Mao R, Chen BL, et al. Fecal calprotectin for evaluating postoperative recurrence of Crohn’s disease: a meta-analysis of prospective studies. Inflamm Bowel Dis 2015;21:315–322.


45. Rokkas T, Portincasa P, Koutroubakis IE. Fecal calprotectin in assessing inflammatory bowel disease endoscopic activity: a diagnostic accuracy meta-analysis. J Gastrointestin Liver Disease 2018;27(3):299–306.
46. Lin JF, Chen JM, Zuo JH, et al. Meta-analysis: fecal calprotectin for assessment of inflammatory bowel disease activity. Inflamm Bowel Dis 2014;20:1407–1415.
47. Tham YS, Yung DE, Fay S, et al. Fecal calprotectin for detection of postoperative endoscopic recurrence in Crohn’s disease: systematic review and meta-analysis. Therap Adv Gastroenterol 2018;11:1756284818785571.
48. Kopylov U, Yung DE, Engel T, Avni T, et al. Fecal calprotectin for the prediction of small-bowel Crohn’s disease by capsule endoscopy: a systematic review and meta-analysis. Eur J Gastroenterol Hepatol 2016;28:1137–1144.
49. Kim DJ, Jeoun YM, Lee DW, et al. Usefulness of fecal immunochemical test and fecal calprotectin for detection of active ulcerative colitis. Intest Res 2018;16:563–570.
50. Langhorst J, Elsenbruch S, Koelzer J, et al. Noninvasive markers in the assessment of intestinal inflammation in inflammatory bowel diseases: performance of fecal lactoferrin, calprotectin, and PMN-elastase, CRP, and clinical indices. Am J Gastroenterol 2008;103:162–169.
51. af Bjorkesten CG, Nieminen U, Turunen U, et al. Surrogate markers and clinical indices, alone or combined, as indicators for endoscopic remission in anti-TNF-treated luminal Crohn’s disease. Scand J Gastroenterol 2012;47:528–537.
52. Cozijnsen MA, Ben Shoham A, Kang B, et al. Development and validation of the mucosal inflammation non-invasive index for pediatric Crohn’s disease. Clin Gastroenterol Hepatol. In press. https://doi.org/10.1016/j.cgh.2019.04.012.
53. Cerrillo E, Moret I, Iborra M, et al. A nomogram combining fecal calprotectin levels and plasma cytokine profiles for individual prediction of postoperative Crohn’s disease recurrence. Inflamm Bowel Dis. In press. https://doi.org/10.1093/ibd/izz053.
54. Reinisch W, Panaccione R, Bossuyt P, et al. OP015 biomarker correlation with endoscopic outcomes in patients with Crohn’s disease: data from CALM. J Crohns Colitis 2018;12(Suppl 1):S011.
55. Reinisch W, Panaccione R, Bossuyt P, et al. P274 factors driving treatment escalation in Crohn’s disease in the CALM trial. J Crohns Colitis 2018;12(Suppl 1):S239.
56. Sandborn WJ, Panes J, Zhang H, et al. Correlation between concentrations of fecal calprotectin and outcomes of patients with ulcerative colitis in a phase 2 trial. Gastroenterology 2016;150:96–102.
57. Rosario M, French JL, Dirks NL, et al. Exposure-efficacy relationships for vedolizumab induction therapy in patients with ulcerative colitis or Crohn’s disease. J Crohns Colitis 2017;11:921–929.
58. Parikh A, Leach T, Wyant T, et al. Vedolizumab for the treatment of active ulcerative colitis: a randomized controlled phase 2 dose-ranging study. Inflamm Bowel Dis 2012;18:1470–1479.
59. Wright EK, Kamm MA, De Cruz P, et al. Measurement of fecal calprotectin improves monitoring and detection of recurrence of Crohn’s disease after surgery. Gastroenterology 2015;148:938–947.
60. Zhulina Y, Cao Y, Amcoff K, et al. The prognostic significance of faecal calprotectin in patients with inactive inflammatory bowel disease. Aliment Pharmacol Ther 2016;44:495–504.
61. De Vos M, Louis EJ, Jahnsen J, et al. Consecutive fecal calprotectin measurements to predict relapse in patients with ulcerative colitis receiving infliximab maintenance therapy. Inflamm Bowel Dis 2013;19:2111–2117.
62. Lasson A, Simren M, Stotzer PO, et al. Fecal calprotectin levels predict the clinical course in patients with new onset of ulcerative colitis. Inflamm Bowel Dis 2013;19:576–581.
63. Kennedy NA, Jones GR, Plevris N, et al. Association between level of fecal calprotectin and progression of Crohn’s disease. Clin Gastroenterol Hepatol. In press. https://doi.org/10.1016/j.cgh.2019.02.017.
64. Xie T, Zhao C, Ding C, et al. Fecal calprotectin as an alternative to ulcerative colitis endoscopic index of severity to predict the response to corticosteroids of acute severe ulcerative colitis: a prospective observational study. Dig Liver Dis 2017;49:984–990.
65. Kwapisz L, Gregor J, Chande N, et al. The utility of fecal calprotectin in predicting the need for escalation of therapy in inflammatory bowel disease. Scand J Gastroenterol 2017;52:846–850.
66. Boschetti G, Garnero P, Moussata D, et al. Accuracies of serum and fecal S100 proteins (calprotectin and calgranulin C) to predict the response to TNF antagonists in patients with Crohn’s disease. Inflamm Bowel Dis 2015;21:331–336.
67. De Vos M, Dewit O, D’Haens G, et al. Fast and sharp decrease in calprotectin predicts remission by infliximab in anti-TNF naive patients with ulcerative colitis. J Crohns Colitis 2012;6:557–562.
68. Heida A, Park KT, van Rheenen PF. Clinical utility of fecal calprotectin monitoring in asymptomatic patients with inflammatory bowel disease: a systematic review and practical guide. Inflamm Bowel Dis 2017;23:894–902.
69. Motaganahalli S, Beswick L, Con D, van Langenberg DR. Faecal calprotectin delivers on convenience, cost reduction and clinical decision making in inflammatory bowel disease: a real world cohort study. Intern Med J 2019;49:94–100.
70. Turvill J, Rook L, Rawle M, et al. Validation of a care pathway for the use of faecal calprotectin in monitoring patients with Crohn’s disease. Frontline Gastroenterol 2017;8:183–188.
71. Wright EK, Kamm MA, De Cruz P, et al. Measurement of fecal calprotectin improves monitoring and detection of recurrence of Crohn’s disease after surgery. Gastroenterology 2015;148:938–947.
72. Weinstein-Nakar I, Focht G, Church P, et al. Associations among mucosal and transmural healing and fecal level of calprotectin in children with Crohn’s Disease. Clin Gastroenterol Hepatol 2018;16:1089–1097.
73. Fraser CG. Reference change values. Clin Chem Lab Med 2011;50:807–812.


Received February 28, 2019. Accepted June 13, 2019.

Reprint requests
Address requests for reprints to: Parambir S. Dulai, MD, University of California at San Diego, Division of Gastroenterology, 9500 Gilman Drive, La Jolla, California 92093. e-mail: [email protected].

Acknowledgement
Author contributions: Parambir S. Dulai: guarantor of article, review of literature, drafting of the manuscript. All authors: critical review and editing of the manuscript, final approval of the manuscript.

Conflicts of interest
These authors disclose the following: Parambir S. Dulai has received research support from AbbVie, Pfizer, Janssen, Takeda, Bühlmann, Alpco, and Polymedco; consulting or honoraria from Janssen and Takeda; and travel support from Janssen and Takeda. Laurent Peyrin-Biroulet has received honoraria from Merck, AbbVie, Janssen, Genentech, Ferring, Tillotts, Vifor, Pharmacosmos, Celltrion, Takeda, Biogaran, Boehringer Ingelheim, Lilly, Pfizer, Index Pharmaceuticals, Amgen, Sandoz, Celgene, Biogen, Samsung Bioepis, Alma, Sterna, Nestlé, Enterome, Mylan, and HAC-Pharma. Silvio Danese reports consultancy fees from AbbVie, Allergan, Amgen, AstraZeneca, Biogen, Boehringer Ingelheim, Celgene, Celltrion, Ferring Pharmaceuticals, Gilead, Hospira, Janssen, Johnson & Johnson, MSD, Mundipharma, Pfizer, Roche, Sandoz, Takeda, TiGenix, UCB, and Vifor. Bruce E. Sands has received consulting fees and research grants from AbbVie, Pfizer, Amgen, Bristol-Myers Squibb, Celgene, Janssen, and Takeda and has received consulting fees from 4D Pharma, Boehringer Ingelheim, Arena Pharmaceuticals, Forward Pharma, Gilead, Immune Pharmaceuticals, Lilly, Otsuka, Synergy Pharmaceuticals, Theravance, Receptos, TiGenix, TopiVert Pharma, MedImmune, Allergan, UCB Pharma, EnGene, Target PharmaSolutions, Lycera, Lyndra, Vivelix Pharmaceuticals, and Oppilan Pharma.
Axel Dignass served as a consultant for AbbVie, Celgene, MSD, Roche, Sandoz/Hexal, Pfizer, Takeda, Tillotts, Janssen, and Vifor; received payment for development of educational presentations from Ferring and Tillotts; received speaker fees from Falk Foundation, AbbVie, Ferring, Pfizer, Janssen, MSD, Vifor, Celgene, Tillotts, and Takeda; received manuscript fees from Thieme and Wiley; and received research grants from Institut für Gemeinwohl, Deutsche Morbus Crohn und Colitis ulcerosa Vereinigung, and Stiftung Leben mit Krebs. Dan Turner received consultation fees, research grants, royalties, or honoraria from Janssen, Pfizer, Hospital for Sick Children, Ferring, AstraZeneca, AbbVie, Takeda, Boehringer Ingelheim, Biogen, Atlantic Health, Shire, Celgene, and Lilly. Gerassimos Mantzaris is an advisory board member for AbbVie, Astellas, Celgene, Danone, Ferring, Genesis, Hospira, Janssen, Millennium Pharmaceuticals, MSD, Otsuka, Pharmacosmos, Pfizer, Sandoz, Takeda, and UCB; is a speaker for AbbVie, Angelini, Astellas, Danone, Falk Pharma, Ferring, Galenica, Hospira, Janssen, MSD, Omega Pharma, and Takeda; is a consultant for MSD and Takeda; and provides research support for AbbVie, Galenica, Genesis, Menarini Group, MSD, and Pharmathen. Walter Reinisch has served as a speaker for Abbott Laboratories, AbbVie, Aesca, Aptalis, Astellas, Centocor, Celltrion, Danone Austria, Elan, Falk Pharma, Ferring, Immundiagnostik, Mitsubishi Tanabe Pharma Corporation, MSD, Otsuka, PDL, Pharmacosmos, PLS Education, Schering-Plough, Shire, Takeda, Therakos, Vifor, and Yakult; as a consultant for Abbott Laboratories, AbbVie, Aesca, Amgen, AM Pharma, AOP Orphan, Arena Pharmaceuticals, Astellas, AstraZeneca, Avaxia, Roland Berger, Bioclinica, Biogen IDEC, Boehringer Ingelheim, Bristol-Myers Squibb, Cellerix, Chemocentryx, Celgene, Centocor, Celltrion, Covance, Danone Austria, Elan, Eli Lilly, Ernst & Young, Falk Pharma, Ferring, Galapagos, Genentech, Gilead, Grünenthal, ICON, Index Pharma, Inova, Janssen, Johnson & Johnson, Kyowa Hakko Kirin Pharma, Lipid Therapeutics, LivaNova, Mallinckrodt, Medahead, MedImmune, Millennium, Mitsubishi Tanabe Pharma Corporation, MSD, Nash Pharmaceuticals, Nestle, Nippon Kayaku, Novartis, Ocera, Otsuka, Parexel, PDL, Periconsulting, Pharmacosmos, Philip Morris Institute, Pfizer, Procter & Gamble, Prometheus, Protagonist, Provention, Robarts Clinical Trials, Sandoz, Schering-Plough, Second Genome, Seres Therapeutics, SetPoint Medical, Sigmoid, Takeda, Therakos, Tigenix, UCB, Vifor, Zealand, Zyngenia, and 4SC; as an advisory board member for Abbott Laboratories, AbbVie, Aesca, Amgen, AM Pharma, Astellas, AstraZeneca, Avaxia, Biogen IDEC, Boehringer Ingelheim, Bristol-Myers Squibb, Cellerix, Chemocentryx, Celgene, Centocor, Celltrion, Danone Austria, Elan, Ferring, Galapagos, Genentech, Grünenthal, Inova, Janssen, Johnson & Johnson, Kyowa Hakko Kirin Pharma, Lipid Therapeutics, MedImmune, Millennium, Mitsubishi Tanabe Pharma Corporation, MSD, Nestle, Novartis, Ocera, Otsuka, PDL, Pharmacosmos, Pfizer, Procter & Gamble, Prometheus, Sandoz, Schering-Plough, Second Genome, SetPoint Medical, Takeda, Therakos, Tigenix, UCB, Zealand, Zyngenia, and 4SC; and has received research funding from Abbott Laboratories, AbbVie, Aesca, Centocor, Falk Pharma GmbH, Immundiagnostik, and MSD. William J. Sandborn reports research grants from Atlantic Healthcare Limited, Amgen, Genentech, Gilead Sciences, AbbVie, Janssen, Takeda, Lilly, and Celgene/Receptos; consulting fees from AbbVie, Allergan, Amgen, Arena Pharmaceuticals, Avexegen Therapeutics, BeiGene, Boehringer Ingelheim, Celgene, Celltrion, Conatus, Cosmo, Escalier Biosciences, Ferring, Forbion, Genentech, Gilead Sciences, Gossamer Bio, Incyte, Janssen, Kyowa Kirin Pharmaceutical Research, Landos Biopharma, Lilly, Oppilan Pharma, Otsuka, Pfizer, Precision IBD, Progenity, Prometheus Laboratories, Reistone, Ritter Pharmaceuticals, Robarts Clinical Trials (owned by Health Academic Research Trust, HART), Seres Therapeutics, Shire, Sienna Biopharmaceuticals, Sigmoid Biotechnologies, Sterna Biologicals, Sublimity Therapeutics, Takeda, Theravance Biopharma, Tigenix, Tillotts Pharma, UCB Pharma, Ventyx Biosciences, Vimalan Biosciences, and Vivelix Pharmaceuticals; and stock or stock options from BeiGene, Escalier Biosciences, Gossamer Bio, Oppilan Pharma, Precision IBD, Progenity, Ritter Pharmaceuticals, Ventyx Biosciences, and Vimalan Biosciences; his spouse has relationships with Ophthotech (consultant, stock options), Progenity (consultant, stock), Oppilan Pharma (employee, stock options), Escalier Biosciences (employee, stock options), Precision IBD (employee, stock options), Ventyx Biosciences (employee, stock options), and Vimalan Biosciences (employee, stock options). The remaining authors disclose no conflicts.


Supplementary Table 1. Technical Considerations and Reliability of Fecal Calprotectin

Reference | Key findings and observations
Whitehead et al28 | Bühlmann (Amherst, NH) assay reported 3.8 times higher FC concentrations than Immundiagnostik (Bensheim, Germany) and Eurospital (Trieste, Italy). Commercial extraction devices led to a 7.8%–28.1% underrecovery of FC compared with manual weighing.
Padoan et al29 | A 12% decline in FC levels was seen within 24 hours of stool collection, irrespective of storage temperature. Samples were progressively more unstable with longer storage times at room temperature. Within- and between-subject FC variability resulted in a reference change value of nearly 100%.a
Oyaert et al30 | Low-quality agreement between assays necessitated different numerical values between assays for clinical cutoffs.
Rodriguez et al31 | Correlation between POC FC testing kit and ELISA was better at clinically meaningful lower ranges (250 μg/g).
Hejl et al32 | There was strong correlation between POC FC testing kit and laboratory assay (correlation coefficient, 0.89); however, coefficient of variation was only 2.3%–5.5% for laboratory assay and 4.8%–26.6% for POC FC.
Heida et al33 | POC FC was in agreement with alternative laboratory-based testing, but only at lower ranges (<500 μg/g); however, the predetermined limits of agreement were wide, at ±100 μg/g.
Whitehead et al34 | Interkit/interassay variability in absolute values was observed; however, the diagnostic sensitivity of FC was not affected by the interkit/interassay variability in absolute values.
Jang et al35 | There was variability across assays in diagnostic accuracy and correlation with disease activity measures.
Lasson et al36 | A 52% median individual coefficient of variation was found for samples collected during the same day, and FC values were significantly influenced by stool frequency.
Calafat et al37 | A median coefficient of variation of 40% (5%–114%) was found, with a median range of variation for FC of 3887 (69–9946 mg/kg). The first morning sample appeared to be most consistent in obtaining the highest and lowest within-day FC values.
Naismith et al38 | Low variability in FC was found across samples collected during different days (ICC, 0.84; 95% CI, 0.79–0.89) in clinical remission.
Du et al39 | An 8%–23% intra-stool variability was found in FC, which was influenced by the timing of collection or the cutoff used to define a positive FC result. A 13%–26% inter-stool variability was found throughout the day depending on the FC cutoff used to define a positive result.
Kittanakom et al40 | There was substantial variability in agreement across platforms/assays.
Toyonaga et al41 | FC reductions were not observed until week 2 of therapy, whereas symptoms decreased by day 3, and a decrease in symptoms more accurately predicted ultimate MH. Within-day variabilities in FC were very wide within the first 2 weeks of therapy and may have caused this inability of FC to respond to changes in health status.
Kristensen et al42 | Two morning samples with FC cutoff >250 μg/g provided modest reliability.

CD, Crohn’s disease; CI, confidence interval; ELISA, enzyme-linked immunosorbent assay; FC, fecal calprotectin; ICC, intraclass correlation; MH, mucosal healing; POC, point of care.
a Reference change values provide objective tools for assessment of the significance of differences in serial results from an individual.73
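The reference change value mentioned in the table footnote is conventionally computed from the analytical and within-subject coefficients of variation as RCV = sqrt(2) × Z × sqrt(CV_A² + CV_I²). The following is a minimal sketch of that standard formula, not the method of any study in the table; the function names, the default Z of 1.96 (two-sided 95%), and the input values are assumptions chosen purely for illustration.

```python
import math


def reference_change_value(cv_analytical, cv_within_subject, z=1.96):
    """Symmetric reference change value, in percent.

    cv_analytical and cv_within_subject are coefficients of variation in
    percent. z=1.96 assumes a two-sided 95% probability level.
    """
    if cv_analytical < 0 or cv_within_subject < 0:
        raise ValueError("coefficients of variation must be non-negative")
    # math.hypot(a, b) returns sqrt(a**2 + b**2), the combined variation.
    return math.sqrt(2.0) * z * math.hypot(cv_analytical, cv_within_subject)


def exceeds_rcv(previous, current, rcv_percent):
    """True if the relative change between two serial results exceeds the RCV."""
    if previous <= 0:
        raise ValueError("previous result must be positive")
    change_percent = abs(current - previous) / previous * 100.0
    return change_percent > rcv_percent


# Hypothetical inputs (not taken from the cited studies): 10% analytical
# and 35% within-subject variation.
rcv = reference_change_value(10.0, 35.0)
print(f"RCV = {rcv:.1f}%")
print(exceeds_rcv(200.0, 450.0, rcv))  # a 125% rise between serial samples
```

With hypothetical inputs of this size the reference change value is close to 100%, meaning that a serial result would need to roughly double before the difference could be distinguished from analytical and biological noise.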
