Graphic report of the results from propensity score method analyses


Accepted Manuscript

Graphic report of the results from propensity score method analyses

Ian Shrier, MD, PhD, Menglan Pang, MSc, Robert W. Platt, PhD

PII: S0895-4356(16)30832-0
DOI: 10.1016/j.jclinepi.2017.06.003
Reference: JCE 9417
To appear in: Journal of Clinical Epidemiology
Received Date: 17 December 2016
Revised Date: 22 May 2017
Accepted Date: 3 June 2017

Please cite this article as: Shrier I, Pang M, Platt RW, Graphic report of the results from propensity score method analyses, Journal of Clinical Epidemiology (2017), doi: 10.1016/j.jclinepi.2017.06.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Graphic report of the results from propensity score method analyses

Ian Shrier MD, PhD; Menglan Pang MSc; Robert W. Platt PhD

Affiliations

IS: Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Canada.

MP, RWP: Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada.

Address Correspondence To:

Ian Shrier MD, PhD
Centre for Clinical Epidemiology
Lady Davis Institute for Medical Research, Jewish General Hospital
3755 Cote Ste-Catherine Road
Montreal, QC H3T 1E2
Tel: 514-340-8222 ext 4244
Fax: 514-340-7564
[email protected]

Word Count: 2629 words

Abstract Word Count: 180 (max 200 words)

Abstract

Objective: To increase transparency in papers reporting propensity scores by using graphical methods that clearly illustrate 1) the number of participant exclusions that occur as a consequence of the analytic strategy, and 2) whether treatment effects are constant or heterogeneous across propensity scores.

Study design and setting: We applied graphical methods to a real-world pharmacoepidemiologic study that evaluated the effect of initiating statin medication on the 1-year all-cause mortality post-myocardial infarction. We propose graphical methods to show the consequences of trimming and matching on the exclusion of participants from the analysis. We also propose the use of meta-analytical forest plots to show the magnitude of effect heterogeneity.

Results: A density plot with vertical lines demonstrated the proportion of subjects excluded due to trimming. A frequency plot with horizontal lines demonstrated the proportion of subjects excluded due to matching. An augmented forest plot illustrates the amount of effect heterogeneity present in the data.

Conclusion: Our proposed techniques present additional and useful information that helps readers understand the sample that is analyzed with propensity score methods, and whether effect heterogeneity is present.

Keywords: Propensity score, trimming, effect heterogeneity, meta-analysis

Introduction

Observational studies usually encounter confounding when estimating the effect of treatment.[1] One method used to control for confounding is the propensity score.[2] The propensity score is defined as the probability of receiving the treatment conditional on covariates.[3] It is commonly estimated using logistic regression, and is considered a summary score for the included covariates. Subjects with identical propensity scores have, on average, the same prognosis and can be treated as exchangeable, if the key assumptions of positivity, consistency, no unmeasured confounding and correct model specification hold.[4]
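In symbols, and only as a sketch of the definition above: writing T for the binary treatment indicator and X for the vector of p covariates (notation assumed here for illustration, not taken from the manuscript), the propensity score and a main-effects logistic model for it can be written as

```latex
e(x) = \Pr(T = 1 \mid X = x), \qquad
\operatorname{logit}\, e(x) = \log\frac{e(x)}{1 - e(x)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j .
```

Positivity then corresponds to 0 < e(x) < 1 for every covariate pattern x present in the analyzed population. The linear main-effects form is an illustrative assumption; a fitted model may also include interactions or higher-order terms.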

There are several ways in which the propensity score can be used to estimate causal effects, including stratification, matching, regression adjustment, and inverse probability weighting (IPW).[5-7] Using propensity score methods is only appropriate if the probability of receiving any level of treatment (conditional on the covariates) is greater than zero for each participant in the analysis.[4, 8] In practice, one way to verify this is to check that no subject has a propensity score extremely close to 0 or 1, and that the propensity score distributions of the two treatment groups overlap throughout their full range. Sometimes the full data set includes some participants with very low (or high) propensity scores, or for whom there is no participant in the other treatment group with the same propensity score. In these contexts, investigators will use one of two common approaches to restrict the analyzed population so that the assumption holds. First, investigators may “trim” (exclude) those participants who have extreme propensity scores from the study population, as recommended by Stürmer et al.[9, 10] Trimming can be performed based on the regions of non-overlap of the estimated propensity score, percentiles of the estimated propensity score, or pre-specified extreme values. Although trimming may ensure overlap of propensity scores, the distribution of propensity scores between the two treatment groups will generally be very different and adjustment as described above is still required. Second, investigators may match participants in the treatment group to one or more participants in the untreated group with the same propensity score (perhaps many-to-many). Matching on the exact propensity score is usually not feasible, and nearest neighbor matching with a certain caliper is recommended.[11, 12] With a relatively narrow caliper, participants who have no comparator with respect to the propensity score will be eliminated from the population.
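The two restriction strategies described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in any cited study: the function names `trim` and `caliper_match`, the fixed trimming bounds, the greedy 1:1 matching without replacement, and the toy scores are all hypothetical choices made for the example.

```python
import random
import statistics

def trim(scores, lower=0.025, upper=0.975):
    """Return indices of participants whose score lies within [lower, upper]."""
    return [i for i, s in enumerate(scores) if lower <= s <= upper]

def caliper_match(treated, untreated, caliper, seed=0):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, untreated: dicts mapping participant id -> propensity score.
    Returns a list of (treated_id, untreated_id) pairs.
    """
    rng = random.Random(seed)
    order = list(treated)
    rng.shuffle(order)              # treated participants are processed in random order
    available = dict(untreated)     # untreated participants not yet used
    pairs = []
    for t_id in order:
        if not available:
            break
        # closest remaining untreated participant on the propensity score
        u_id = min(available, key=lambda u: abs(available[u] - treated[t_id]))
        if abs(available[u_id] - treated[t_id]) <= caliper:
            pairs.append((t_id, u_id))
            del available[u_id]
        # otherwise the treated participant stays unmatched and is excluded
    return pairs

# Toy data (hypothetical scores, for illustration only)
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.99}
untreated = {"u1": 0.31, "u2": 0.60, "u3": 0.05, "u4": 0.10}

all_scores = list(treated.values()) + list(untreated.values())
caliper = 0.2 * statistics.stdev(all_scores)   # caliper as a fraction of the score SD
pairs = caliper_match(treated, untreated, caliper)
kept = trim(all_scores)
```

In this toy example the treated participant with a score of 0.99 has no untreated comparator within the caliper and is dropped by matching; the same participant is also removed by trimming at 0.975. Untreated participants with low scores are dropped by matching but retained by trimming, which is why the two strategies can yield different analyzed populations.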

Both trimming and matching result in exclusions from the sample. One recommendation within the STROBE statement is to enhance transparency of observational studies through a participant flow diagram.[13] This diagram should illustrate how many participants were originally approached, how many had complete follow-up, and reasons for excluding participants.[13] Although it is possible to include a line in the participant flow diagram indicating the additional exclusions due to trimming and/or matching, the actual exclusions due to matching occur at specific propensity scores that cannot be easily conveyed with text. Further, these exclusions may change the population being analyzed considerably, such that both trimming and matching may change the parameters of interest compared to the original sample population. Current standard practice includes the presentation of the mean and standard deviation of each baseline covariate before and after matching. This is important because it provides direct evidence for imbalance of potential confounders. However, when there are many variables in the propensity score, there are very likely to be meaningful differences between groups for different variables. This leads to difficulties in interpreting the standard mean (SD) table of comparisons. Figure 1 in our manuscript provides a general overview of how the study population is altered in terms of propensity score distribution before and after matching or trimming on propensity score. If the population is substantially altered, this may lead to challenges in interpretation. Without appropriate transparency, readers and decision makers may make incorrect inferences based on the results provided.

In addition to exclusions, the effect of treatment is often believed to vary across subgroups of the population (e.g. participants who are more sick may improve less).[9] When true, presenting a single estimate may be misleading. We believe that investigators should transparently and explicitly explore treatment effect heterogeneity. Although simple linear regression can be used to assess trends in treatment effect estimates across the propensity score groups, this method would not account for the different variances in the outcome associated with each treatment level within each propensity score stratum. Rather, these effects should be explored fully, using methods analogous to meta-regression in meta-analysis.[14-16]

Therefore, the objective of this paper is to propose that some particular (but rarely used) graphical methods that increase transparency in the reporting of results based on propensity scores become part of standard practice when publishing a manuscript. More specifically, these methods more clearly illustrate 1) the proportion of subjects excluded when using propensity score methods, and 2) whether treatment effects are constant or heterogeneous across propensity scores.

Methods

This study (protocol number: 14_018) was approved by the Independent Scientific Advisory Committee of the CPRD and the Research Ethics Board of the Jewish General Hospital (Montreal, Quebec).

We use a real-world pharmacoepidemiologic study [17] as an example to demonstrate our graphical methods. This study aimed to evaluate the effect of initiating statin medication on 1-year all-cause mortality post-myocardial infarction (MI). The study population consisted of patients aged 18 years and older who were first diagnosed with MI between April 1st, 1998 and March 31st, 2012. Details of the study design and the baseline characteristics table are described elsewhere.[17] We carried out a complete-case analysis that excluded subjects who were lost to follow-up within 1 year, as the assumption of non-informative censoring was likely to be satisfied.[17] Forty-seven pre-specified baseline characteristics were included as potential confounders in a logistic regression model to estimate the propensity score, including demographics (e.g. age, sex), smoking, alcohol use, obesity, year of cohort entry, important comorbidities, and relevant previous medication prescribed. Stratification and matching approaches were then used in our examples. In order to assess whether this model was adequately specified, we examined the covariate balance between the two treatment groups within propensity score quintiles and in the matched sample. Covariate balance was measured using standardized mean differences (SMD). We found the covariates were reasonably well balanced, with overall small SMDs in each propensity score quintile stratum. In the matched sample, the SMDs for all the covariates were less than 0.1. Moreover, there were only three terms with SMDs that were slightly larger than 0.1 when considering all the possible two-way interactions and second-order terms among all 47 covariates. These findings together suggested that our propensity score model was likely to be appropriately specified.[18]

We first present an augmented propensity density plot for trimming that has been proposed by others.[2, 5] It displays the number of covariates included in the propensity score, the range of the estimated propensity score, the criterion used for trimming, and the number of participants in the different treatment groups before and after trimming. In our stratified analysis, we excluded patients with extreme propensity scores by trimming the estimated propensity score at 0.025 and 0.975 (trimming based on the non-overlap region or percentiles of the estimated propensity score is also possible).

For matching, we used nearest neighbour matching with a caliper to find the closest untreated comparator for each treated participant with respect to the estimated propensity score.[19] The caliper was set to 0.2 times the standard deviation of the estimated propensity score, and the order of the matching was random. For this approach, we used augmented frequency histograms instead of density plots because we believed they better convey the changes in the study population.

With regard to our second objective, to transparently describe effect heterogeneity, we plotted the risk ratios (with 95% confidence intervals) for each propensity score stratum, similar to a horizontal meta-analysis forest plot.[20] The propensity score strata represent 10 equally sized groups (deciles) from the range of propensity scores in the trimmed population. We show τ² and I² estimated using meta-analytic techniques to describe the magnitude of between-stratum heterogeneity. Specifically, τ² represents the between-stratum variance, whereas I² describes the percentage of the variability in effect estimates that is due to between-stratum heterogeneity rather than sampling error.[21] We present additional data such as the proportions of events in each treatment group.
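As a minimal sketch of the stratum-specific estimates described above, the following Python fragment assigns participants to propensity score deciles and computes a risk ratio with a Wald-type 95% confidence interval on the log scale. The function names, the use of `statistics.quantiles` for the cut-points, and the toy numbers are assumptions made for illustration; the manuscript does not state how the confidence intervals were computed.

```python
import math
import statistics

def decile_cutpoints(scores):
    """Nine cut-points dividing the scores into 10 roughly equal groups."""
    return statistics.quantiles(scores, n=10)

def decile_of(score, cutpoints):
    """1-based decile index for a score, given sorted cut-points."""
    return 1 + sum(score > c for c in cutpoints)

def risk_ratio_ci(events_t, n_t, events_u, n_u, z=1.96):
    """Risk ratio (treated vs untreated) with a Wald interval on the log scale.

    Assumes at least one event in each group (no zero-cell correction).
    """
    rr = (events_t / n_t) / (events_u / n_u)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_u - 1 / n_u)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, se_log, lo, hi

# Hypothetical scores 0.01, 0.02, ..., 1.00
scores = [i / 100 for i in range(1, 101)]
cuts = decile_cutpoints(scores)
sizes = [0] * 10
for s in scores:
    sizes[decile_of(s, cuts) - 1] += 1   # number of participants per decile

# Hypothetical stratum: 10/100 events among treated, 20/100 among untreated
rr, se_log, lo, hi = risk_ratio_ci(10, 100, 20, 100)
```

Each decile holds the same total number of participants, but the split between treated and untreated participants (and hence the standard error of the log risk ratio) can differ greatly from one stratum to the next, which is the reason the stratum estimates are weighted rather than compared directly.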

Results

The complete-case study cohort included 29,274 participants, of whom 2,978 died during the one year of follow-up. A total of 17,019 patients received a new prescription for a statin within 30 days of their diagnosis of MI and 12,255 patients did not.

Participant Flow Diagrams

Figure 1A (top) illustrates the impact of trimming on participants included in the analysis and is similar to displays recommended by others.[2, 5] More generally, it represents an extension to the participant flow diagram recommended by the STROBE statement.[13] The panel displays a back-to-back density plot of the estimated propensity score across the different treatment groups before and after trimming. The two vertical dashed lines visibly separate the propensity score distribution before and after trimming. We also superimposed a table showing that 47 covariates were included in the propensity score model, the method used to trim, the estimated final propensity score range (0.0018 to 0.9942), the number of patients trimmed in each group, and the final number of patients in each group after trimming.

Figure 1B (bottom) is a similar plot for the matched analysis using frequency histograms. With trimming, participants are excluded at the extremes of propensity scores, and this is easily conveyed by applying vertical lines on a density plot as illustrated in Figure 1A. However, with matching, participants (typically in the untreated group, which is usually much larger) may be excluded in each of the strata rather than just at the extremes. Visually, this could be represented as a curvilinear horizontal line in the displays suggested by others.[2, 5] However, the entire shape of the density plot is affected with matching, and the impact is different for each treatment group. By using a frequency histogram like that in Figure 1B, the number of excluded participants in each group at each propensity score is easily visible. The superimposed table provides the estimated propensity score range before and after matching, and the number of participants who were matched for the final analysis. It clearly identifies the exclusion of 11,449 treated and 6,685 untreated participants.
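The counts behind a frequency display of this kind can be tabulated per propensity score bin. The following Python fragment is a hedged sketch only: the bin width of 0.1, the function names `bin_counts` and `excluded_per_bin`, and the toy scores are assumptions for illustration and do not reproduce Figure 1B.

```python
from collections import Counter

def bin_counts(scores, n_bins=10):
    """Count scores in equal-width bins on [0, 1]; a score of 1.0 goes in the last bin."""
    counts = Counter(min(int(s * n_bins), n_bins - 1) for s in scores)
    return [counts.get(b, 0) for b in range(n_bins)]

def excluded_per_bin(before, after, n_bins=10):
    """Participants lost in each bin: count before matching minus count after."""
    return [b - a for b, a in zip(bin_counts(before, n_bins), bin_counts(after, n_bins))]

# Hypothetical untreated group: five participants before matching, two retained
untreated_before = [0.05, 0.08, 0.12, 0.31, 0.60]
untreated_after = [0.31, 0.60]
lost = excluded_per_bin(untreated_before, untreated_after)
```

Plotting the before and after counts for each treatment group side by side gives the per-bin exclusions directly. As a consistency check on the totals reported above, 17,019 − 11,449 = 5,570 treated and 12,255 − 6,685 = 5,570 untreated participants remain after matching, as expected when each treated participant is matched to one untreated comparator.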

Heterogeneity of Treatment Effect Across Propensity Score Strata

Figure 2 shows the heterogeneity in the risk ratios across strata after trimming. Since the strata were created based on propensity score deciles, each decile contains an approximately equal number of total participants but very different numbers of treated and untreated participants. Because the number of participants in each treatment group differs across strata, we expect the uncertainty associated with the information within each stratum to differ as well. This needs to be accounted for when assessing heterogeneity of treatment effects. Meta-analytical methods represent one efficient method that is familiar to many epidemiologists and clinicians. Whether it is appropriate to combine effect estimates (i.e., whether the variability across propensity score subgroups is real or comes from sampling variability) is a matter of judgment. In similar settings, such as meta-analyses, there are no fixed rules for such decisions.[21] The bottom of the figure indicates that 2686 to 2688 participants were included in each stratum in our example. In our data, the stratum-specific estimated risk ratios vary but there is no obvious trend across strata; the large decrease in risk ratio for the highest propensity decile is associated with very wide confidence intervals. The horizontal lines in the figure represent the overall combined effect and 95% confidence interval from the inverse probability weighted estimate (RR = 0.58; 95% CI: 0.51 to 0.65). The overall effect is superimposed to place the heterogeneity within the context of the overall analytical results. Although we used the inverse probability weighted estimate that evaluates the average treatment effect (as this was what was used in our earlier publication on the same material), users could superimpose whichever overall summary result they feel is most appropriate for their context. The values for τ² and I² are presented, showing that the between-stratum variance equals 0.035, and that 63.2% of the variability in effect estimates is due to between-stratum heterogeneity rather than sampling error. Even though the I² suggests moderate relative heterogeneity, all of the effect estimates across the propensity score range would lead to the same decision and are clinically homogeneous. Therefore, reporting one overall effect estimate might be considered appropriate in our example. In other data, reporting only a single estimate might lead to the loss of important information for medical decision-making. Finally, the top of Figure 2 indicates the absolute proportion of treated and untreated participants who had an event, which is also required for informed decision making.[22, 23]

For the matching analysis, a similar graphical display could be used to present the heterogeneity in the risk ratio across deciles in the matched sample. In our data, we present only the graph based on the trimmed analysis because the two graphs were very similar. In other contexts the different methods of data exclusion could lead to different conclusions.
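For readers who wish to compute heterogeneity measures of this kind from stratum-specific log risk ratios and their standard errors, a minimal Python sketch follows. It assumes the DerSimonian-Laird moment estimator for τ² and the Higgins-Thompson definition of I²; the manuscript does not specify which estimator was used, so this is one common choice rather than a reproduction of the reported values. The function name and the toy inputs are hypothetical.

```python
def heterogeneity(log_rr, se):
    """Cochran's Q, DerSimonian-Laird tau^2 and I^2 (%) for stratum estimates.

    log_rr: stratum-specific log risk ratios; se: their standard errors.
    """
    w = [1 / s ** 2 for s in se]                          # inverse-variance weights
    pooled = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_rr))
    df = len(log_rr) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # truncated at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # percent of variability
    return q, tau2, i2

# Hypothetical example: three strata with equal precision
q, tau2, i2 = heterogeneity([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

Because each stratum enters with weight 1/SE², strata with few treated or few untreated participants contribute less to Q, which addresses the unequal uncertainty across deciles noted above.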

Discussion

We presented diagnostic methods that increase transparency when investigators use propensity score methods. First, we extend the density plots proposed by others for trimming [2, 5], and provide a new figure for matching so that the participant flow diagram recommended by STROBE more accurately conveys which participants with which propensity scores were excluded from the final analysis. Second, we propose a graphical tool instead of a table [2] for detecting and illustrating effect heterogeneity among propensity score deciles. These methods allow the reader of the study to better understand the potential biases and limitations of the analytical strategy used.

Our density plot increases transparency of reporting because it illustrates the proportion of patients in the full data set with a high probability (high propensity score) or low probability (low propensity score) of receiving treatment. These proportions are not usually identifiable using conventional multiple logistic regression or through one or two sentences within the text. In addition, the proposed graphs illustrate the distributions of the propensity score in different treatment groups, as well as the change in population (through trimming or matching) in preparation for the final step of the analysis. Furthermore, other important information could be added to the figure. For example, the frequency of the matched pairs by the number of untreated participants can also be reported if 1:m (m ≥ 2) matching was performed. Without this information, the transparency advised by STROBE to minimize bias and misunderstanding may fall short of its desired goal.

Understanding when a treatment effect might be stronger or weaker in particular subgroups of patients is the foundation of personalized medicine.[24] However, heterogeneity of effect may not be present in any one indication, but still be present when indications for treatment across many covariates are combined. One of the strengths of propensity score methods is that they reduce many covariates into one single variable indicating the overall probability of receiving treatment. This overall probability can then be divided into quantiles of appropriate sizes. With only one overall score for indication of treatment, researchers can more easily assess effect heterogeneity across different strengths of indications. Other authors have also proposed similar analyses either in tabular or graphical form, especially within the economics literature.[5, 25-27] Our figure provides a visual examination of this heterogeneity along with the values of two quantitative measures (τ² and I²). An appropriate overall summary effect measure can be provided if the effects are homogeneous across the propensity score strata, or if an average effect is of interest. However, if strong effect heterogeneity is found, it raises important questions similar to those in meta-analyses, including: What criteria should be used for ascertaining effect heterogeneity? When is it appropriate to summarize the stratum-specific results as one overall effect? What covariates are responsible for the differing effect across propensity scores, and can clinicians identify meaningful subgroups of patients within their clinical practice? Our method is designed to make the results more transparent so that investigators and readers can apply their own values and judgments when making inferences, and develop new processes to increase efficiency and maximize health when treating individual patients.

Although our density plot and histogram illustrate whether trimming or matching satisfied the assumption that all participants had indications where they may have received either level of treatment, the graphs do not assess the other necessary assumptions for propensity score based methods such as consistency, no unmeasured confounding and correct model specification. Authors still need to carefully verify the propensity score model using appropriate methods similar to those described in our methods section. In addition, Figures 1A and 1B clearly illustrate that trimming and matching result in different numbers of participants being analyzed, and therefore different parameters of interest. Our graphical methods are applicable regardless of which parameter is of interest for which particular question. Finally, the thresholds used for propensity score diagnostics, and for trimming and matching, are based on value judgments to some extent. Our methods are designed to illustrate the impacts on the population analyzed; they are applicable regardless of the specific judgments made in any particular context.

In summary, our graphical methods visualize important information in the trimming or matching process that might be neglected or hidden in the manuscript text. These methods also present an easy and transparent mode for effect heterogeneity exploration, and can be used to assist in carrying out final statistical analyses.

Funding

Dr. Shrier is funded by the Lady Davis Institute, Jewish General Hospital in Montreal, Canada. Ms. Pang is funded by the Fonds de Recherche du Québec Santé (FRQS). Dr. Platt is funded by the chercheur-national (National Scholar) of the FRQS and holds the Albert Boehringer I Chair in Pharmacoepidemiology at McGill University.

References

1. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Philadelphia: Wolters Kluwer Health, 2008.
2. Stürmer T, Joshi M, Glynn RJ, et al. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol 2006;59(5):437.e431-437.e424.
3. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70(1):41–55.
4. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol 2008;168(6):656-664.
5. Kurth T, Walker AM, Glynn RJ, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol 2006;163(3):262-270.
6. Austin PC, Mamdani MM. A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med 2006;25(12):2084-2106.
7. d’Agostino RB. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998;17(19):2265–2281.
8. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11(5):550-560.
9. Glynn RJ, Schneeweiss S, Stürmer T. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic Clin Pharmacol Toxicol 2006;98(3):253-259.
10. Stürmer T, Rothman KJ, Avorn J, et al. Treatment effects in the presence of unmeasured confounding: Dealing with observations in the tails of the propensity score distribution: A simulation study. Am J Epidemiol 2010;172(7):843-854.
11. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician 1985;39(1):33–38.
12. Austin PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat 2011;10(2):150-161.
13. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration. PLoS Med 2007;4(10):e297.
14. Higgins JP. Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified. Int J Epidemiol 2008;37(5):1158-1160.
15. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21(11):1539-1558.
16. Thompson SG. Systematic reviews: why sources of heterogeneity in meta-analysis should be investigated. BMJ 1994;309:1351-1355.
17. Pang M, Schuster T, Filion KB, et al. Targeted maximum likelihood estimation for pharmacoepidemiologic research. Epidemiology 2016;27(4):570-577.
18. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res 2011;46(3):399-424.
19. Ho D, Imai K, Imai MK. Series Editor. Package ‘MatchIt’. 2013 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.409.3968&rep=rep1&type=pdf).
20. Sutton AJ, Abrams KR, Jones DR, et al. Methods for meta-analysis in medical research. J. Wiley Chichester; New York, 2000.
21. Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration. Available from www.cochranehandbook.org, 2011.
22. Stovitz SD, Shrier I. Medical decision making and the importance of baseline risk. Br J Gen Pract 2013;63:795-797.
23. Natter HM, Berry DC. Effects of presenting the baseline risk when communicating absolute and relative risk reductions. Psychol Health Med 2005;10(4):326-334.
24. Stratified, personalised or P4 medicine: a new direction for placing the patient at the centre of healthcare and health education. Academy of Medical Sciences, 2015.
25. Hu AN, Mustillo SA. Recent development of propensity score methods in observational studies: Multi-categorical treatment, causal mediation, and heterogeneity. Current Sociology 2016;64(1):60-82.
26. Heckman JJ, Urzua S, Vytlacil E. Understanding instrumental variables in models with essential heterogeneity. Rev Econ Stat 2006;88(3):389-432.
27. Xie Y, Brand JE, Jann B. Estimating Heterogeneous Treatment Effects with Observational Data. Sociol Methodol 2012;42(1):314-347.

Figure Legends

Figure 1: In A, a density plot illustrates the participants included in the analysis before and after trimming, with associated information about propensity score methods. In B, a frequency plot is used to best illustrate the participants included in the analysis before and after matching. Both analyses use data from a study of statin and 1-year all-cause mortality post-myocardial infarction.[17]

Figure 2: Effect heterogeneity and associated risk ratios (with 95% CI) across propensity score deciles in the trimmed sample in the study of statin and 1-year all-cause mortality post-myocardial infarction.[17] The overall combined effect (solid line) and 95% confidence interval (dotted lines) are superimposed in order to place the heterogeneity within the context of the overall analytical results. In our example, we used the inverse probability weighted estimate that evaluates the average treatment effect (risk ratio = 0.576; 95% CI: 0.509 to 0.652),[17] but users could superimpose whatever overall summary result they feel is most appropriate for their context.

[Figure 2. Forest plot for the trimmed sample. Axes: risk ratio (ticks from 0.05 to 10) and propensity score decile (1 to 10; total n range: 2686 to 2688 per decile). The percentages of events in treated and untreated participants are shown for each decile. Heterogeneity: I² = 63.2%, τ² = 0.035.]

What is New?



- Graphical methods may be helpful in understanding how propensity score methods alter the population under study.
- An augmented density plot for propensity score trimming, or an augmented frequency plot for propensity score matching, increases the transparency of the analysis when propensity score methods are used.
- Methods analogous to meta-analytical techniques are helpful to assess if treatment effects are heterogeneous across strengths of indication for treatment, and therefore if a single propensity score estimate is appropriate.

Conflicts of Interest


None of the authors have any conflicts of interest to declare.