Using a matrix-analytical approach to synthesising evidence solved incompatibility problem in the Hierarchy of Evidence

Harald Walach, Martin Loef

PII: S0895-4356(15)00321-2
DOI: 10.1016/j.jclinepi.2015.03.027
Reference: JCE 8925
To appear in: Journal of Clinical Epidemiology
Received Date: 4 August 2014
Revised Date: 7 January 2015
Accepted Date: 23 March 2015

Please cite this article as: Walach H, Loef M, Using a matrix-analytical approach to synthesising evidence solved incompatibility problem in the Hierarchy of Evidence, Journal of Clinical Epidemiology (2015), doi: 10.1016/j.jclinepi.2015.03.027.
Using a matrix-analytical approach to synthesising evidence solved incompatibility problem in the Hierarchy of Evidence
Harald Walach, Martin Loef
Ms. Ref. No.: JCE-14-541
Correspondence:
Prof. Harald Walach
Institute for Transcultural Health Studies
European University Viadrina Frankfurt (Oder)
Grosse Scharrnstrasse 59
15230 Frankfurt (Oder)
Phone: +49-335-5534-2380
Fax: +49-335-5534-2748
[email protected]
Abstract

Objectives: The hierarchy of evidence presupposes linearity and additivity of effects, as well as commutativity of knowledge structures. Thereby it implicitly assumes a classical theoretical model.

Study Design and Setting: This is an argumentative article that uses theoretical analysis based on pertinent literature and known facts to examine the standard view of methodology.

Results: We show that the assumptions of the hierarchical model are wrong. The knowledge structures gained by various types of studies are not sequentially indifferent, i.e. do not commute. External and internal validity are at least partially incompatible concepts. Therefore, one needs a different theoretical structure, typical of quantum-type theories, to model this situation. The consequence of this situation is that the implicit assumptions of the hierarchical model are wrong, if generalised to the concept of evidence in total.

Conclusion: The problem can be solved by using a matrix-analytical approach to synthesising evidence. Here, research methods that produce different types of evidence that complement each other are synthesised to yield the full knowledge. We show by an example how this might work. We conclude that the hierarchical model should be complemented by a broader reasoning in methodology.
Key words: Evidence-Based Medicine, hierarchy of methods, methodology, placebo, quantum theory, matrix-analysis.
Running Title: The Incompatibility Problem in Methodology
Word Count: 5,528
What is new?

Key Findings:
· We advocate a reconsideration of the current thinking about methods and types of evidence.
· Thinking in terms of a hierarchy of increasing "levels of evidence" is insufficient. The conceptual foundations of this type of reasoning are most likely wrong.

What this adds to what is known:
· Different research methods complement each other, because the types of validity they produce are incompatible and complementary. Therefore a different theoretical structure than the one used to support the implicit assumptions of additivity and commutativity that underlie the hierarchical view has to be used.
· This structure emphasises the importance of non-commutativity and complementarity, similar to quantum-type formalisms.

What is the implication, what should change now:
· Instead of using only evidence from RCTs or evidence of the "highest level available" in systematic reviews, a matrix-analytical approach to reviewing extant evidence should be used, giving an overview of all types of evidence, producing a full picture of the current situation and informing further developments.
1. Background
The current consensus on how to create empirical evidence within medical research postulates a hierarchy of evidence. This means that research findings have to be considered in hierarchical order according to the methodological strength of the design used, as gauged by the internal validity of the research methods 1. Since internal validity is highest in randomised controlled trials (RCTs), they are placed at the top of the hierarchy. And since meta-analyses normally combine effect size estimates from single RCTs, effect size estimates from such meta-analyses or summaries from systematic reviews are considered even more valid, forming the peak of the pyramid. Other types of evidence, for instance evidence from cohort studies in naturally occurring cohorts and comparisons of such cohorts, evidence from case-control studies, case series or single-armed observational studies, are normally considered weaker evidence, since the internal validity of such studies is difficult to gauge.

This "received view" has some very important arguments in its favour. Randomised studies, if large enough 2, distribute potential confounders equally between groups and thus destroy their potential correlation with the outcome. Since many confounders of potential treatment effects are still unknown, any naturalistic, non-randomised study may potentially be influenced by covert confounders. Indeed, bias is introduced by a lack of randomisation 3-6 or if randomisation is compromised 7. Thus it seems intuitively clear to use "lower grade" evidence only as long as better evidence does not exist, and to dismiss it as soon as better evidence, namely results from RCTs and, finally, meta-analyses, becomes available 8,9. Although this rigid stance has recently been replaced by a more modest claim that demands a reasonable effect size ruling out confounders through any kind of comparative studies 10, and by the option to downgrade randomised evidence, if poor, and upgrade non-randomised, naturalistic evidence, if valid 11, the hierarchical stance itself remains untouched.

Such a view suggests that the hierarchy of internal validity is not only necessary, but also sufficient to make clinical decisions. However, since the hierarchy is one of internal validity only, external validity, or generalisability and applicability, is neglected 12. In fact, as we will argue, external and internal validity are incompatible concepts: the more internal validity is emphasised in a study, the lower external validity tends to become (see Figure 1). Although this problem has been recognised by recent developments within the evidence-based medicine movement, it cannot be solved within the current thinking, because that thinking neglects the implicit incompatibility relationship between external and internal validity, which forms the basis of our proposal.
In what follows we will argue that internal and external validity are, conceptually speaking, incompatible and therefore complementary concepts. They either cannot be maximised at the same time, or the sequence of knowledge produced – first internally valid knowledge from RCTs, then externally valid knowledge from observational studies, or the other way round – is not indifferent, or, technically speaking, does not commute. Theoretically, such concepts cannot be mapped by any theoretical frame that assumes additivity and substitution of concepts. We need theoretical structures to map such relationships that, formally speaking, are non-Abelian. A non-Abelian algebra is a formal structure that is required whenever incompatible concepts have to be theoretically modelled. Unlike our normal, Abelian algebra, where the sequence of operations, for instance when multiplying, is irrelevant, a non-Abelian algebra takes care of situations where the sequencing of operations makes a difference, or, in other words, where concepts are incompatible. Such theoretical structures were introduced by quantum theory to model the strange behaviour of nature, and such a model seems to be necessary also to understand the relationship between types of evidence.

Before we continue, we hasten to add that we are completely in line with the impulse of enlightenment that evidence-based medicine has brought to medicine through its emphasis on empirical evidence over expert judgement, and on research over authority. What we offer is not an argument against evidence-based medicine, but an argument to improve its theoretical foundations, its scope and its clinical usefulness. To make this clearer, we will show by an example what this might mean in practical terms.
2. The Mistaken Assumption of Linearity, Hierarchy and Separability of Types and Elements of Evidence

Assuming linearity, hierarchy, and separability of evidence means we can neglect types of evidence that are "weaker", because whatever can be learned from those "weaker" types of studies will either be part of the evidence that is "stronger", or, if it contradicts the latter, can be ignored because it is less reliable. This reasoning is employed by meta-analysts, systematic reviewers and guideline panels using only such systematic reviews and RCTs, and thus this thinking is pervasive in the field of clinical decision making. It infiltrates how research is conducted, how funding is allocated and what types of studies are published in high-impact journals, and thus, ultimately, it determines what is considered "good scientific practice" by the public and by courts. It is important to realise that this whole edifice rests neither on proof, nor on empirical evidence, nor on a sound theory 13,14. It is simply a – reasonable – assumption. We show that this assumption is wrong when generalised, using three arguments:

1. An argument from Validity.
2. An argument from Equipoise and Agency.
3. An argument from Synergy.

2.1 An Argument From Validity
The received view rests on the assumption that internal validity is key, and that external validity is of somewhat lesser importance. However, maximising internal validity normally happens at the expense of external validity or generalisability 12,15,16 (see also Figure 1). If a study is to be maximally internally valid, it has to employ most or all of the following strategies: it has to allocate patients randomly to groups and must not reveal the randomisation code before all data are collected and analysed. In addition, some measures for masking allocators, patients, outcome assessors and statisticians are employed. Therefore it is an ethical requirement that patients have given informed consent. We know that many patients do not consent because they refuse randomisation 17-21, and those that consent differ in many important respects from patients that do not consent 12,19,20,22,23. Thus, in principle, it is very difficult to generalise from an RCT to the whole population.

But in addition to that principal, theoretical limitation, it is also important to realise a practical limitation of external validity that holds for nearly all RCTs. From an experimental point of view, homogenising the patient population tested within an RCT is a requirement, because it reduces error variance and increases the signal-to-noise ratio. For this reason regulatory and other trials employ long lists of inclusion criteria. Thereby they can be successful with fewer patients – which is an ethical and economical requirement 24,25. The practical downside is that the result is applicable only to a very small percentage of the full patient population. A typical example is trials of antidepressants, which very often stipulate that only patients with major depression are treated, while patients with anxiety disorders, personality disorders, substance abuse and so forth are excluded. The results gleaned from such studies should only be applied to the same types of patients as were tested in the trial. They rarely are, because those patients are rare. STAR*D, a large outcomes study in realistic patient groups, found that one-year remission rates are maximally 43% 26, more likely 38%, approaching what can be expected from a placebo effect 27 and well below the usefulness expected from theory and trials 28.

Increasing internal validity will thus normally come at the cost of reduced external validity. Some exclusion criteria in some studies are posited because researchers want to homogenise the population and decrease error variance; other criteria are stipulated for ethical or insurance reasons. In a theoretically ideal trial with no exclusion criterion and only one inclusion criterion – a clear definition of the disease treated – effect size estimates would drop, because a lot of error variance would be introduced. Thus, maximising internal validity and external validity at the same time would likely jeopardise the usefulness of the research in general. The problem is that we have no theoretical model of the relationship between these two types of validity.

Internal and external validity are therefore incompatible from at least two perspectives: principally, because by definition results from RCTs do not apply to all those patients unwilling to be randomised, and practically, because most trials have to introduce exclusion criteria to be feasible, restricting the generalisability of their results. Maximising and emphasising internal validity thus normally comes at the expense of external validity. Indeed, a review of external validity in trials mentions only two examples out of 149 references where internal validity was high and external validity was also good 12. Evidence that is internally valid is not simply something that can replace evidence that is externally valid, as is done when RCT evidence replaces "lower grade evidence". Internal and external validity are, at least partially, incompatible concepts, and yet they have to be applied conjointly.
2.2 An Argument From Equipoise and Agency
The fact that patients have to consent to be randomised, and thus take the chance of receiving an – often unwanted – control treatment, means that they are restricted in their agency. If the treatment were freely available and they had the money to buy it, many patients would likely just do that, and in fact do. RCTs, conceptually speaking, treat patients as passive recipients of interventions. This might be practically negligible for new interventions where patients and doctors are in equipoise, and for pharmacological or surgical interventions, where patients are indeed passive receivers of interventions. However, it cannot be neglected for all those interventions where patients either have to invest an effort to change, or where doctors or patients have strong a-priori preferences 29,30. If patients have to invest effort, such as in psychotherapy, meditation, lifestyle or other complex and behavioural interventions, patients who seek such therapies by choice will very likely differ from those who are offered such treatments as a potential option within a trial. Patients with strong preferences are likely to be unwilling to be randomised, and many aborted studies, for instance in complementary medicine, are due to this fact 31. Thus, by default, randomised studies in such areas can only produce evidence of limited external validity.

But we might also have a problem with model validity here. If an intervention, by definition, needs the active collaboration of patients, then studying such an intervention in an RCT will not produce a valid estimate of potential effects. It might produce estimates of minimally achievable effect sizes with patients that have no clear preference, or introduce other kinds of bias 32. But such a result is not representative of those patients the treatment was initially designed for, and thus the results also lose their model validity. Model validity refers to the question whether a study represents a fair trial and models the treatment in the way it is used in practice. This is a point of discussion around the recent question whether meditation-based treatments are effective for depression and anxiety 33. As we have pointed out, RCTs produce biased estimates of potential effects 34.

The problem becomes more pertinent when parts of the patient population have such strong preferences that they will not participate in trials. In such a situation it is not possible to generalise from the results. This is very often the case with treatments that have already been in use for some time, because they are part of traditional healing systems 30. Observational knowledge about real-world effects already exists, and thus patients and doctors are often unwilling to perform trials because they "know that it works" anyway. This is a variety of Philip's paradox 10: often we do not have scientific evidence from well-controlled randomised studies for those interventions that are most effective and have been around for quite some time, and where effects are very obvious it is impossible to conduct a masked trial 35. While this is obvious for some standard procedures in medicine, such as defibrillation or anaesthesia, where randomised trials would not be required because of the sheer size of the effect 36, it is also true for many treatments of traditional medicine 37. In such a case, equipoise is missing and valid RCTs are near impossible to perform.

Thus, there is a double problem: not only do we observe an incompatibility between internal and external validity, we also see that the sequence in which the types of evidence are gleaned is important. When an intervention is new, as with the introduction of new drugs, there is equipoise, and RCTs are not a problem. What is a problem is generating evidence that is externally valid. Therefore, in such situations, RCTs normally come first, and large observational cohort studies in clinical practice that determine the acceptability, applicability, real-world effects and side effects of the new intervention follow. For traditional interventions that have been used for quite some time, such as in traditional medicine or in surgery, followers and providers have certain opinions based on their experience. Observational evidence already exists, and therefore evidence from RCTs is often either more difficult or sometimes even impossible to garner, and, if produced, very often discarded as meaningless by patients and practitioners.

Internal and external validity are neither in a linear nor in a sequentially indifferent relationship, nor do they hierarchically implicate each other in the sense that one has to precede the other. Rather, the temporal sequence determines the knowledge structure that is created, which in turn reacts back on potential evidence. This means that the sequence in which these types of evidence are generated does make a difference, and what is known cannot be undone. Technically speaking: the types of evidence do not commute. Let us bear this in mind, for this situation calls for a different type of theoretical model than the one hitherto applied.
2.3 An Argument from Synergy, Or: Harnessing the Placebo Effect

The linear model of additivity of treatment effects used for the interpretation of placebo-controlled RCTs assumes that a treatment effect is a combination of artifacts, natural course of the disease, regression to the mean, non-specific, psychologically mediated treatment effects, and specific effects. It employs a subtractive rationale: by comparing a true treatment group to a sham group, it effectively subtracts the effect of the control treatment from that of the true treatment and only looks at the difference, neglecting everything else 10,38. This procedure makes a vital assumption that has rarely been tested and, where tested, has been found to be wrong 39: namely that the treatment components are additive and that the non-specific effects are roughly stable and uncorrelated with treatment effects across trials. The latter assumption has been shown to be wrong: treatment and placebo effects are highly correlated, at r = 0.78, in a variety of studies with treatment durations of more than 3 months 40. And the former assumption of stability of non-specific effects across trials, at least within domains, has never been tested. However, given all we know about placebo effects, it is a highly unreasonable assumption 41. Placebo effects are dramatic and strong in studies where researchers aim to maximise them 42, while they are comparatively small in trials where researchers aim to minimise noise 43. They are dependent on the individual meaning patients attribute to an intervention and on the quality of interaction 44, and hence cannot be easily predicted 45,46.

Thus, focusing only on specific efficacy is misleading and leads to what has been termed the efficacy paradox 47,48. This paradox describes a situation whereby a treatment can be very effective in general terms but inefficacious vis-a-vis its own placebo, because the non-specific effects are very strong while the specific effects are very small. And yet a different intervention, which might produce small non-specific effects but comparatively strong specific effects, may be proved efficacious although its general effectiveness might be comparatively small. Such a situation is depicted in Figure 2.
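The arithmetic behind the paradox is simple enough to state in a few lines. The following minimal sketch (in Python; the effect sizes are invented for illustration and are not taken from any trial) contrasts what a placebo-controlled comparison sees – the specific component only – with the total effect a patient experiences under the additive reading:

```python
# Illustrative sketch of the efficacy paradox. The effect sizes below
# are invented for illustration; they are not taken from any trial.

def total_effect(specific: float, nonspecific: float) -> float:
    """Total observed effect under the (questionable) additive model."""
    return specific + nonspecific

# Treatment x: large non-specific (placebo) component, tiny specific effect.
x_specific, x_nonspecific = 0.05, 0.80
# Treatment y: small non-specific component, clear specific effect.
y_specific, y_nonspecific = 0.40, 0.20

print(f"x: specific={x_specific}, total={total_effect(x_specific, x_nonspecific)}")
print(f"y: specific={y_specific}, total={total_effect(y_specific, y_nonspecific)}")

# A placebo-controlled RCT judges only the specific component:
# y (0.40 > 0.05) is declared "efficacious", x is not. Yet the overall
# effect a patient experiences is larger for x (0.85 vs 0.60).
```

Treatment y wins the placebo-controlled comparison while treatment x delivers the larger overall benefit; nothing beyond this additive bookkeeping is needed to generate the paradox.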
An example of the first situation can be found in antidepressant medications. While critics point out that the specific effect of antidepressants over and above placebo is quite small once publication bias is removed 49 – in fact about d = 0.32, substantially less than the d = 0.5 that is stipulated by NICE as the benchmark for antidepressant treatment – the generic effect of current antidepressant treatment, including the placebo effect produced through the powerful myth that has been created around those drugs, can be quite strong 50, reaching more than two standard deviations against baseline 51.

The clear experimental demonstration of the efficacy paradox would require at least a four-armed clinical experiment, which to our knowledge has not been conducted. However, the three German Acupuncture Trials (GERAC) come closest to illustrating the reality of the efficacy paradox. In those studies, with about 1200 patients in each trial, acupuncture was tested against a sham-acupuncture procedure – shallow needling at non-therapeutic points – and against best conventional practice. The latter contained all types of guideline-derived therapies, including medications, all of which had already been tested against placebos. None of the studies succeeded in showing any superiority of acupuncture over the sham-acupuncture procedure. But in two of the three studies – in osteoarthritis and low back pain – both the real and the sham acupuncture were nearly twice as effective as the best-practice conventional treatment 52,53, while in the third all three arms were roughly equal 54. Thus, in two studies a placebo procedure was statistically significantly and clinically more effective than a proven active treatment, illustrating the reality of the efficacy paradox.

An example of the second situation – clear, significant specific effects that are nevertheless, clinically and overall speaking, relatively small – can be seen in modern anticancer or antidementia drugs, which all have to prove their efficacy, but whose non-specific placebo component is small, because dementia impairs the capacity to form expectations and hope, thus diminishing potentially beneficial effects of non-specific elements of treatments, and because cancer in its end stage is difficult to influence by psychological means 55-57.

Thus, the efficacy paradox describes, and is due to, a lack of linear additivity of therapeutic components. Very small specific effects can be therapeutically extremely valuable, because they not only add to, but synergistically boost, non-specific treatment effects. And comparatively strong specific effects can be clinically useless because they fail to activate some kind of internal response. Note that in all those cases the sequence of events is very important: patients need first to believe, through previous studies and their public perception or through a popular myth, that an intervention is effective before a strong placebo response can be produced. If one were to take away the specific effects and trust the placebo component only, the latter would collapse as well. This shows that these effects are synergistic. Synergistic systems are not additive, but have to be described by multiplicative operations. The hallmark of synergy is that all components working together can achieve things that each by itself, added up, will not achieve. An example of a synergistic system is a skilled child rider on a powerful horse: the child can make the horse do things that the horse would not do by itself, unless in danger, and with the horse the child can move faster and jump higher than alone.
Thus, any therapeutic system optimises non-specific effects and capitalises on the knowledge or rumour of its specific efficacy, producing a greater clinical overall effectiveness than each and every component would produce on its own. Additivity of components is a formal abstraction that is wrong in practice and dangerous conceptually. It is dangerous because it suggests we can treat evidence about treatment effects as systems of linear components, although they are synergistic and multiplicative, rather than additive.
This analysis has led to the conclusion that internal validity and external validity are partially incompatible, that agency and the lack of equipoise also suggest incompatibility of internal and external validity, and moreover an incompatibility of sequencing or a lack of commutativity. This latter point is also emphasised by the argument from synergy stating that components of therapeutic interventions are not additive. Taken together, these arguments call into question the received view of a linear hierarchy of evidence that can look at internally valid evidence only and neglect the rest. What are the conceptual consequences?
3. Consequences: Rethink Hierarchies and Think Quantum Theoretically
This implies that methods are not arranged in a hierarchical order but mutually complement each other in their strengths and weaknesses. Thus, a summary of empirical evidence has to be, by necessity, a multi-faceted, multi-method review, as is advocated by realist synthesis 58-62 and other recent models 60,63. Conceptually, we need to think about the theoretical foundation of methodology in terms of a theoretical model that follows quantum theoretical reasoning. This does not mean applying quantum physics, which would be silly, but applying theoretical structures that have been shown to be useful for modelling relationships within the analysis of matter.
3.1 The Quantum Theoretical Structure of the Methodological Problems of Evidence
Normally and implicitly we model our concepts along classical lines, following classical rules of thinking. Whenever we have additive components we do that, as in

y = a + b + c + e.    (1)
This is a formal description of a linear system where four components—e.g. specific effect, non-specific effect, regression, random error—add up to yield the full effect. We have seen that this is likely wrong for the understanding of components of efficacy.
In methodology we also implicitly assume that we can use Abelian thinking to model the concepts in question. An Abelian algebra is one where the concepts we are dealing with are all compatible, and therefore it is irrelevant which operation is performed first. A methodological example of indifference to the sequencing of operations would be if it were irrelevant whether we first have evidence from RCTs and then from observational studies, or the other way round. A formal expression of this situation is

a*b = b*a    (2)

or

a*b – b*a = 0.    (2a)

As soon as we substitute "2" and "3" for "a" and "b" in (2a), we can immediately see that this is how we normally operate. The terms "a" and "b" are therefore called "compatible" or "commutative", because we can change their sequence and all stays the same.
We have seen in 2.1 that external and internal validity are partially incompatible concepts, and in 2.2 and 2.3 that sequencing comes into play when dealing with issues of external and internal validity and various traditions of knowledge. It does make a difference for some interventions whether we first know about specific efficacy and only later learn about effectiveness, or the other way round. The scandal about the lethal side effects of Cox2-inhibitors is a typical example of the sequencing effects of the types of evidence. The legal requirements necessitate that efficacy studies and RCTs come first, and that long-term observational trials, which teach us about acceptability and side effects, only come later. Clearly this sequence of procedures does not commute: had we learned about the problematic side effects first, we would not have tested for efficacy; and once efficacy has been tested, data about side effects and acceptability struggle to be taken seriously. The whole issue of traditional or well-known classical medical interventions or surgery shows the same situation from the reverse viewpoint: where effectiveness is commonly assumed, studies about efficacy are either impossible, useless or too late. Here also, we see a sequence that does not follow the postulate of commutativity expressed in (2) or (2a). To model such cases, where concepts are incompatible and/or sequencing plays a decisive role, we need an algebra of non-commuting operations, or a non-Abelian algebra, as the underlying theoretical structure. This is formally written as

a*b ≠ b*a    (3)

or

a*b – b*a ≠ 0.    (3a)
If we again insert "2" and "3" for "a" and "b" as in (2a) above, we can immediately see how strange this situation is and how different from our "normal" theoretical reasoning. Such a formal structure is at the core of the quantum formalism and is a generalised form of the famous uncertainty relation. In other words, if one neglects everything that is typical of physical quantum theory proper and asks what the indispensable difference between a classical and a quantum theory is, then exactly this way of dealing with non-commuting or incompatible operations turns out to be the decisive difference 64-66. It is important to realise that this has nothing to do with quantum physics proper. It is simply a theoretical structure that is necessary whenever we deal with incompatible operations; quantum physics merely turned out to be the first discipline to note this. It is therefore at least reasonable to assume that such a structure might also be pertinent in other areas. And, as we have seen, this is indeed the case for problems of validity, additivity of effects, and sequencing of types of evidence. It has been pointed out that such structures might be important in more areas of human inquiry 67,68. For instance, cognitive science researchers have discovered that using quantum formalisms to model the results they find in cognitive research is more powerful and matches reality more closely than using classical procedures 69. By the same token, it is not surprising that methodology and the structure of empirical evidence seem to follow theoretical concepts that can more easily be modelled using quantum theoretical descriptions.
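The contrast between (2a) and (3a) can be made concrete with a few lines of code. The following minimal sketch (Python, plain nested lists, no external libraries) shows that ordinary numbers satisfy (2a), while 2x2 matrices – the simplest familiar non-Abelian objects – generally violate it, which is exactly the structure (3a) describes:

```python
# Numbers commute: a*b - b*a is always 0.
a, b = 2, 3
print(a * b - b * a)  # 0 - the Abelian case of equation (2a)

# 2x2 matrices generally do not commute: AB - BA != 0.
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]

AB = matmul(A, B)
BA = matmul(B, A)
commutator = [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]
print(commutator)  # [[1, 0], [0, -1]] - non-zero: the non-Abelian case of (3a)
```

Nothing quantum-physical is involved here; the non-zero commutator is purely a consequence of order-dependent composition, which is the property the argument above attributes to sequences of evidence.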
3.2. Abolish Hierarchies
An immediate consequence is that hierarchical and linear thinking in methodology should be abolished 47. There is no such thing as an always valid hierarchy of evidence. There is only evidence that has a higher level of internal validity, and evidence that may have a higher level of external validity 70. And neither can be dispensed with at the expense of the other. Incompatible or complementary concepts, such as internal and external validity, have to be employed conjointly, although they are mutually incompatible. This is in fact the conceptual definition of incompatibility or complementarity that was introduced by Niels Bohr 71,72. It is part of the definition of incompatible operations that they cannot be applied at the same time, and that the sequence of their application does make a difference. This is why they are called non-commuting operations.

Hierarchies suggest that we can neglect evidence that is of lesser "dignity". The model advocated here suggests that there is no such thing as evidence of lesser dignity – apart from evidence that is useless because the methodology employed was not applied properly or the implementation or organisation of a study was flawed. Provided evidence is produced properly, evidence from cohort studies or case series is also valuable. It is valuable because it is different in type and scope from the evidence produced by RCTs 73,74. Therefore, systematic reviews should widen their remit and include not only RCT-level evidence, but complement and qualify the findings from RCTs with evidence from other types of studies 60,63,75,76. And it should be considered bad practice for guideline panels and reviewers to dismiss evidence from non-randomised studies. Various authors have pointed this out already 47,77,78. One major criticism of such proposals has been that no one knows how to put this into practice. We think that a combination of qualitative and quantitative approaches might be useful here. We will demonstrate this by an example.
3.3. Example: Combining Realist Synthesis and Matrix Analysis in Systematic Reviews
Realist synthesis is a method of synthesising evidence in an informed, qualitative-narrative way 59-62. It produces suggestions for decision makers of the type "For patient A with disease Y under condition Q and in circumstances C do X". All available evidence is brought together in order to inform practical decisions. This is very close to what is envisaged here, as the method also turns away from a hierarchical understanding of methodology, albeit in a more pragmatic and less formal stance. Matrix analysis is an instrument from qualitative research 79. Here, different outcomes and results from various studies, addressing different questions, are presented in a matrix that gives the researcher a clear overview. Our proposal is to combine these methods. This would respect the inherent incompatibility of methodological concepts.

Research methods are ways to answer different questions. Placebo-controlled RCTs answer the question about specific efficacy. Pragmatic RCTs answer the question of comparative effectiveness. Non-randomised comparative cohort studies answer the question of differential effectiveness of interventions under real-world conditions for different types of patients. Long-term observational studies answer the question of general effectiveness in clinical practice and in the general population, and document side effects, and so forth. None of these questions is "better" or "more valuable" or "stronger" than the others, nor do the answers to these questions produce "better" or "stronger" evidence. They simply produce different answers and diverse knowledge.

All this information together yields a matrix that, in some cases, may be broadened by fundamental research producing mechanistic knowledge 80, in-vitro results, or still others. The research types answering certain questions form the rows of the matrix. Depending on the fine-tuning required, one can produce columns for answers such as positive, negative or unclear results, effect sizes, or patient populations studied. If required, the matrix can be made three-dimensional by including measures of the quality of the studies. Thus, the layout of the matrix will change according to the research question. The matrix is then populated with, for instance, the vote counts of the studies or effect sizes, depending on the question. That way, all research methods are treated equally, no information is dropped, systematic reviewers produce an overall pattern, and readers can form their own opinion. Similar approaches have been used in applied health systems analysis to model trajectories of practice and patient care 81.

We have applied this method in various systematic reviews that have produced meaningful results which, had we looked at certain types of evidence only, would have remained completely devoid of any meaning 82-85. In one study we collated evidence in a systematic review about the potential causative role of metallic mercury in Alzheimer's disease 86. The evidence found can be rearranged in the following matrix (Table 1).
One can see immediately that one row of the matrix is empty: there are no RCTs on the influence of metallic mercury on cognitive decline or cognitive measures, simply because it would have been unethical to conduct such studies. Any systematic review looking only at randomised trial evidence would have concluded that there are no studies and hence that the question cannot be answered. Looking at various types of evidence, however, one sees a different picture. It is very clear for basic research, comparatively clear for cohort studies and case studies, and less clear for autopsy studies. This demands an explanation. The explanation can be found in a modifying theoretical model, published with the original review, stating that mercury is a selenium scavenger, aggravating a low-grade selenium deficit in certain populations, which will only become apparent in the long term by compromising the brain's redox systems. Thereby, the matrix approach has helped in two ways: it has broadened the evidence base, and it has made obvious where inconsistent evidence has to be explained by a theoretical model. Matrix analysis is a practical realisation of the circular understanding of the mutual dependence of methods on knowledge from other types of studies: it leads to an immediate understanding of what types of evidence are still necessary and thus creates the argumentative and conceptual basis for such research.
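As a minimal sketch of how such a matrix can be represented and scanned programmatically (the data structure and labels are our own illustrative choices, not a prescribed format), the following Python snippet encodes the vote counts from Table 1 and flags empty rows – the evidence gaps discussed above:

```python
# Sketch of the matrix-analytic layout: rows are research types (each
# answering a different question), columns are vote counts, populated
# here with the vote counts from Table 1 (metallic mercury/Alzheimer's).

evidence_matrix = {
    # research type:                 (positive, negative, undecided)
    "Basic research":                 (25, 0, 0),
    "Human studies: natural cohorts": (11, 1, 0),
    "Cross sectional":                (3, 1, 3),
    "Randomised studies":             (0, 0, 0),
    "Single cases":                   (18, 1, 3),
    "Autopsy studies":                (4, 4, 1),
}

print(f"{'Research type':<32}{'pos':>5}{'neg':>5}{'und':>5}")
for row, (pos, neg, und) in evidence_matrix.items():
    # An all-zero row is an evidence gap: that study type has not been done.
    flag = "  <- empty row: evidence gap" if pos + neg + und == 0 else ""
    print(f"{row:<32}{pos:>5}{neg:>5}{und:>5}{flag}")
```

Run against these counts, the script prints the matrix and marks the randomised-studies row as the gap, mirroring the reading of Table 1 given above.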
The matrix approach could be refined by a Bayesian type of reasoning: if one makes explicit, for instance by providing theoretical rationales, the prior probability for the effectiveness of an intervention, one could estimate the number of studies that are needed to shift probability towards virtual certainty. One could also use mechanistic knowledge from animal and in-vitro models to construct a prior probability that is then contrasted or enriched with findings from clinical studies of various types. A toy sketch of such an updating scheme is given below.

This proposal has an obvious downside: it would be the end of loads of "quick-and-dirty" systematic reviews that make their authors' lives easy by neglecting everything except RCTs, thus reducing complexity and cognitive load 87.
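Returning to the Bayesian refinement suggested above, a toy sketch might look as follows. All numbers here (the prior, the likelihoods and the certainty threshold) are invented assumptions for illustration, not empirically derived values:

```python
# Toy sketch of the Bayesian refinement of the matrix approach. The
# likelihoods are invented assumptions (e.g. that a positive study is
# four times more likely if the intervention truly works).

def update(prior: float, p_pos_if_effective: float, p_pos_if_not: float) -> float:
    """One Bayes update of P(effective) after observing a positive study."""
    numerator = prior * p_pos_if_effective
    return numerator / (numerator + (1 - prior) * p_pos_if_not)

# Assumed prior from mechanistic/in-vitro knowledge, and assumed likelihoods.
p_effective = 0.30
p_pos_if_effective, p_pos_if_not = 0.80, 0.20

n_studies = 0
while p_effective < 0.99:          # "virtual certainty" threshold (assumed)
    p_effective = update(p_effective, p_pos_if_effective, p_pos_if_not)
    n_studies += 1
    print(f"after {n_studies} positive studies: P(effective) = {p_effective:.3f}")
```

With these assumptions, four consecutive positive studies push the probability past the 0.99 threshold; weaker likelihood ratios would require correspondingly more studies, which is exactly the kind of estimate the refinement is meant to provide.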
4. Conclusion
We advocate a reconsideration of the current thinking about methods and types of evidence, and in particular of the notion of a hierarchy of increasing "levels of evidence". We have shown that the conceptual foundations of this type of reasoning are weak at best. While a hierarchy is useful within the domain of internal validity, it is counterproductive for assembling the totality of evidence across domains. Instead, research methods should be seen as complementing each other, because the types of validity they produce are incompatible and complementary. This situation calls for an alternative theoretical structure. This view is inspired by the quantum theoretical treatment of incompatible operations. One way of putting this into practice is a matrix-analytical approach to reviewing extant evidence. Here, an overview of all types of evidence produces a full picture of the current situation and will also inform further developments.
Acknowledgement

This paper would not have been possible without the understanding that Harald Walach gleaned from many discussions and other collaborations with Hartmann Römer, who is the theoretical father of Generalised Quantum Theory, and the funding provided by the Samueli Institute for work around that concept. Martin Loef was supported by a grant from the Oberberg-Foundation, Berlin, while working on this paper. The studies that formed the basis of our matrix-analysis were supported by the Samueli Institute's Brain, Mind, and Healing program.

Conflict of Interest

The authors declare that they do not have conflicts of interest.
References

1. Sackett DL. Evidence Based Medicine: How to Practice and Teach EBM. New York: Churchill Livingstone, 1997.
2. Aickin M. Randomization, balance, and the validity and efficiency of design-adaptive allocation methods. Journal of Statistical Planning and Inference 2001;94:97-119.
3. Sacks H, Chalmers TC, Smith H. Randomized versus historical controls for clinical trials. American Journal of Medicine 1982;72:233-240.
4. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Journal of the American Medical Association 1995;273:408-412.
5. Berger VW, Exner DV. Detecting selection bias in randomized clinical trials. Controlled Clinical Trials 1999;20:319-327.
6. Writing Group for the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women's Health Initiative randomized controlled trial. Journal of the American Medical Association 2002;288:321-333.
7. Chalmers TC, Celano P, Sacks HS, Smith H. Bias in treatment assignment in controlled clinical trials. New England Journal of Medicine 1983;309:1358-1361.
8. Collins R, Gray R, Godwin J, Peto R. Avoidance of large biases and large random errors in the assessment of moderate treatment effects: the need for systematic overviews. Statistics in Medicine 1987;6:245-250.
9. Chalmers TC, Levin H, Sacks HS, Reitman D, Berrier J, Nagalingam R. Meta-analysis of clinical trials as a scientific discipline. I: Control of bias and comparison with large co-operative trials. Statistics in Medicine 1987;6:315-325.
10. Howick J. The Philosophy of Evidence-Based Medicine. Chichester: Wiley-Blackwell, 2011.
11. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. British Medical Journal 2008;336:924.
12. Rothwell PM. External validity of randomised controlled trials: "To whom do the results of this trial apply?" Lancet 2005;365:82-93.
13. Abel U, Koch A. The mythology of randomization. In: Abel U, Koch A, eds. Nonrandomized Comparative Clinical Studies. Düsseldorf: Symposion Publishing, 1998: 27-40.
14. Black N. Why we need observational studies to evaluate the effectiveness of health care. British Medical Journal 1996;312:1215-1218.
15. Mant D. Can randomised trials inform clinical decisions about individual patients? Lancet 1999;353:743-746.
16. Cook TD, Campbell DT. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally, 1979.
17. Halpern SD. Evaluating preference effects in partially unblinded randomized clinical trials. Journal of Clinical Epidemiology 2003;56:109-115.
18. Howard KI, Cox WM, Saunders SM. Attrition in substance abuse comparative treatment research: The illusion of randomization. In: Onken LS, Blaine JD, eds. Psychotherapy and Counseling in the Treatment of Drug Abuse. Rockville: National Institute of Drug Abuse, 1990: 66-79.
19. Llewellyn-Thomas HA, McGreal MJ, Thiel EC, Fine S, Erlichman C. Patients' willingness to enter clinical trials: measuring the association with perceived benefit and preference for decision participation. Social Science and Medicine 1991;32:35-42.
20. Brewin CR, Bradley C. Patient preferences and randomized clinical trials. British Medical Journal 1989;299:313-315.
21. Feinstein AR. Problems of randomized trials. In: Abel U, Koch A, eds. Nonrandomized Comparative Clinical Studies. Düsseldorf: Symposion Publishing, 1998: 1-13.
22. Lloyd-Williams F, Mair F, Shiels C, et al. Why are patients in clinical trials of heart failure not like those we see in everyday practice? Journal of Clinical Epidemiology 2003;56:1157-1162.
23. Moore MJ, O'Sullivan B, Tannock IF. How expert physicians would wish to be treated if they had genitourinary cancer. Journal of Clinical Oncology 1988;6:1736-1745.
24. van der Lem R, de Wever WWH, van der Wee NJA, van Veen T, Cuijpers P, Zitman FG. The generalizability of psychotherapy efficacy trials in major depressive disorder: an analysis of the influence of patient selection in efficacy trials on symptom outcome in daily practice. BMC Psychiatry 2012;12:192.
25. Van Spall HG, Toren A, Kiss A, Fowler RA. Eligibility criteria of randomized controlled trials published in high-impact general medical journals. Journal of the American Medical Association 2007;297(11):1233-1240.
26. Rush JA, Trivedi MH, Wisniewski SR, et al. Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: A STAR*D report. American Journal of Psychiatry 2006;163:1905-1917.
27. Pigott HE, Leventhal AM, Alter GS, Boren JJ. Efficacy and effectiveness of antidepressants: current status of research. Psychotherapy and Psychosomatics 2010;79:267-279.
28. Fava GA, Tomba E, Grandi S. The road to recovery from depression - don't drive today with yesterday's map. Psychotherapy and Psychosomatics 2007;76:260-265.
29. Lilford RJ, Jackson J. Equipoise and the ethics of randomization. Journal of the Royal Society of Medicine 1995;88:552-559.
30. Miller FG, Joffe S. Equipoise and the dilemma of randomized clinical trials. New England Journal of Medicine 2011;364:476-480.
31. von Rohr E, Pampallona S, van Wegberg B, et al. Experiences in the realisation of a research project on anthroposophical medicine in patients with advanced cancer. Schweizerische Medizinische Wochenschrift 2000;130:1173-1184.
32. McCambridge J, Kypri K, Elbourne D. In randomization we trust? There are overlooked problems in experimenting with people in behavioral intervention trials. Journal of Clinical Epidemiology 2014;67:247-253.
33. Goyal M, Singh S, Sibinga EM, et al. Meditation programs for psychological stress and well-being: A systematic review and meta-analysis. JAMA Internal Medicine 2014; doi:10.1001/jamainternmed.2013.13018.
34. Walach H, Esch T, Schmidt S. Meditation is no medication (Letter). Journal of the American Medical Association 2014; in print.
35. Ney PG, Collings C, Spensor C. Double-blind: double talk or are there ways to do better research. Medical Hypotheses 1986;21:119-126.
36. Glasziou P. When are randomised trials unnecessary? Picking signal from noise. British Medical Journal 2007;334:349-351.
37. Fonnebo V, Grimsgaard S, Walach H, et al. Researching complementary and alternative treatments - the gatekeepers are not at home. BMC Medical Research Methodology 2007;7:7.
38. Howick J, Friedemann C, Tsakok M, et al. Are treatments more effective than placebos? A systematic review and meta-analysis. PLoS One 2013;8(5):e62599.
39. Kleijnen J, de Craen AJM, Van Everdingen J, Krol L. Placebo effect in double-blind clinical trials: a review of interactions with medications. Lancet 1994;344:1347-1349.
40. Walach H, Sadaghiani C, Dehm C, Bierman DJ. The therapeutic effect of clinical trials: understanding placebo response rates in clinical trials - A secondary analysis. BMC Medical Research Methodology 2005;5:26.
41. Meissner K, Kohls N, Colloca L, eds. Placebo effects in medicine: mechanisms and clinical implications. London: Royal Society, 2011.
42. Vase L, Riley JL, Price DD. A comparison of placebo effects in clinical analgesic trials versus studies of placebo analgesia. Pain 2002;99:443-452.
43. Hróbjartsson A, Gøtzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. New England Journal of Medicine 2001;344:1594-1602.
44. Kaptchuk TJ, Kelley JM, Conboy LA, et al. Components of placebo effect: randomised controlled trial in patients with irritable bowel syndrome. British Medical Journal 2008;336:999-1003.
45. Moerman DE, Jonas WB. Deconstructing the placebo effect and finding the meaning response. Annals of Internal Medicine 2002;136:471-476.
46. Walach H. Placebo controls: historical, methodological and general aspects. Philosophical Transactions of the Royal Society B: Biological Sciences 2011;366:1870-1878.
47. Walach H, Falkenberg T, Fonnebo V, Lewith G, Jonas W. Circular instead of hierarchical: Methodological principles for the evaluation of complex interventions. BMC Medical Research Methodology 2006;6:29.
48. Walach H. The efficacy paradox in randomized controlled trials of CAM and elsewhere: Beware of the placebo trap. Journal of Alternative & Complementary Medicine 2001;7:213-218.
49. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine 2008;358:252-260.
50. Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ, Johnson BT. Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLoS Medicine 2008;5(2):e45.
51. Rief W, Nestoriuc Y, Weiss S, Welzel E, Barsky AJ, Hofmann SG. Meta-analysis of the placebo response in antidepressant trials. Journal of Affective Disorders 2009;118:1-8.
52. Scharf H-P, Mansmann U, Streitberger K, et al. Acupuncture and knee osteoarthritis. Annals of Internal Medicine 2006;145:12-20.
53. Haake M, Muller HH, Schade-Brittinger C, et al. German Acupuncture Trials (GERAC) for chronic low back pain: randomized, multicenter, blinded, parallel-group trial with 3 groups. Archives of Internal Medicine 2007;167(17):1892-1898.
54. Diener HC, Kronfeld K, Boewing G, et al. Efficacy of acupuncture for the prophylaxis of migraine: A multicentre randomised controlled clinical trial. Lancet Neurology 2006;5:310-316.
55. Howard R, McShane R, Lindesay J, et al. Donepezil and memantine for moderate-to-severe Alzheimer's disease. New England Journal of Medicine 2012;366:893-903.
56. Petticrew M, Bell R, Hunter D. Influence of psychological coping on survival and recurrence in people with cancer: systematic review. British Medical Journal 2002;325(7372):1066.
57. Smedslund G, Ringdal GI. Meta-analysis of the effects of psychosocial interventions on survival time in cancer patients. Journal of Psychosomatic Research 2004;57(2):123-131.
58. Mays N, Pope C, Popay J. Systematically reviewing qualitative and quantitative evidence to inform management and policy-making in the health field. Journal of Health Services Research and Policy 2005;10(Suppl 1):6-20.
59. Pawson R. Evidence-based policy: in search of a method. Evaluation 2002;8:157-181.
60. Petticrew M, Rehfuss E, Noyes J, et al. Synthesizing evidence on complex interventions: how meta-analytical, qualitative, and mixed-method approaches can contribute. Journal of Clinical Epidemiology 2013;66(11):1230-1243.
61. Sager F, Andereggen C. Dealing with complex causality in realist synthesis: The promise of qualitative comparative analysis. American Journal of Evaluation 2012;33:60-78.
62. Greenhalgh T, Wong G, Westhorp G, Pawson R. Protocol - realist and meta-narrative evidence synthesis: Evolving standards (RAMESES). BMC Medical Research Methodology 2011;11:115.
63. Noyes J, Gough D, Lewin S, et al. A research and development agenda for systematic reviews that ask complex questions about complex interventions. Journal of Clinical Epidemiology 2013;66(11):1262-1270.
64. Atmanspacher H, Römer H, Walach H. Weak quantum theory: Complementarity and entanglement in physics and beyond. Foundations of Physics 2002;32:379-406.
65. Atmanspacher H, Filk T, Römer H. Weak Quantum Theory: Formal Framework and Selected Applications. In: Khrennikov A, ed. Quantum Theory: Reconsiderations of Foundations. AIP Conference Proceedings. Melville, NY: American Institute of Physics, 2006.
66. Filk T, Römer H. Generalized Quantum Theory: Overview and latest developments. Axiomathes 2011;21:211-220.
67. Walach H, von Stillfried N. Generalizing Quantum Theory - Approaches and Applications. Axiomathes 2011;21(2)(Special Issue):185-371.
68. Walach H, von Stillfried N. Generalised Quantum Theory - Basic idea and general intuition: A background story and overview. Axiomathes 2011; doi:10.1007/s10516-010-9145-5.
69. Pothos EM, Busemeyer JR. Can quantum probability provide a new direction for cognitive modeling? Behavioral and Brain Sciences 2013;36:255-327.
70. Evans D. Hierarchy of evidence: a framework for ranking evidence evaluating healthcare interventions. Journal of Clinical Nursing 2003;12(1):77-84.
71. Hinterberger T, von Stillfried N. The concept of complementarity and its role in quantum entanglement and generalized entanglement. Axiomathes 2013;23:443-459.
72. Bohr N. Causality and complementarity. Philosophy of Science 1937;4:289-298.
73. Concato J, Horwitz RI. Beyond randomised versus observational studies. Lancet 2004;363:1660-1661.
74. Concato J, Lawler EV, Lew RA, Gaziano JM, Aslan M, Huang GD. Observational methods in comparative effectiveness research. American Journal of Medicine 2010;123:e16-e23.
75. Linde K, Scholz M, Melchart D, Willich SN. Should systematic reviews include non-randomized and uncontrolled studies? The case of acupuncture for chronic headache. Journal of Clinical Epidemiology 2002;55:77-85.
76. Petticrew M, Anderson L, Elder R, et al. Complex interventions and their implications for systematic reviews: a pragmatic approach. Journal of Clinical Epidemiology 2013;66(11):1209-1214.
77. Reilly D, Taylor MA. The evidence profile. The multidimensional nature of proof. Complementary Therapies in Medicine 1993;1(Suppl 1):11-12.
78. Jonas WB. Building an Evidence House: Challenges and Solutions to Research in Complementary and Alternative Medicine. Forschende Komplementärmedizin und Klassische Naturheilkunde 2005;12:159-167.
79. Averill JB. Matrix analysis as a complementary analytic strategy in qualitative inquiry. Qualitative Health Research 2002;12:855-866.
80. Clarke B, Gillies D, Illari P, Russo F, Williamson J. The evidence that evidence-based medicine omits. Preventive Medicine 2013;57:745-747.
81. Bingham JW, Quinn DC, Richardson MG, Miles PV, Gabbe SG. Using a healthcare matrix to assess patient care in terms of aims for improvement and core competencies. Journal on Quality and Patient Safety 2005;31:98-105.
82. Loef M, Mendoza LF, Walach H. Lead (Pb) and the risk of Alzheimer's disease or cognitive decline: A systematic review. Toxin Reviews 2011;30:103-114.
83. Loef M, Walach H. Copper and iron in Alzheimer's disease: a systematic review and its dietary implications. British Journal of Nutrition 2012;107:7-19.
84. Loef M, Walach H. Fruit, vegetables, and prevention of cognitive decline or dementia: A systematic review of cohort studies. Journal of Nutrition, Health & Aging 2012;16:626-630.
85. Loef M, Walach H. The omega-6/omega-3 ratio and dementia or cognitive decline: A systematic review on human studies and biological evidence. Journal of Nutrition in Gerontology and Geriatrics 2013;32:1-23.
86. Mutter J, Curth A, Naumann J, Deth R, Walach H. Does inorganic mercury play a role in Alzheimer's disease? A systematic review and an integrated molecular mechanism. Journal of Alzheimer's Disease 2010;22:357-374.
87. Vickers AJ. Reducing systematic reviews to a cut and paste. Forschende Komplementärmedizin 2010;17:303-305.
Table 1 – Example: Matrix Analysis of the Evidence Pertaining to the Question Whether Metallic Mercury is a Causative Factor for Alzheimer's Disease – from Mutter et al (2010) [86]

                                 Positive   Negative   Undecided
Basic Research                      25          0           0
Human Studies: Natural Cohorts      11          1           0
Cross Sectional                      3          1           3
Randomised Studies                   0          0           0
Single Cases                        18          1           3
Autopsy Studies                      4          4           1
Figure 1 – Idealised Hierarchy of Evidence: Studies are arranged hierarchically according to the inherent internal validity of their design, with randomised controlled trials (RCTs), ideally placebo-controlled ones, ranked highest and studies with weaker designs arranged below. In such a hierarchy, internal validity increases as we follow the arrow up to the top of the hierarchy (right arrow). But at the same time, normally, external validity, applicability and generalisability are reduced, because these are highest in naturalistic studies (left arrow).
Figure 2 – Schematic Illustration of the Efficacy Paradox – Adapted from (Walach, 2001). Two different treatments, x and y, are tested against their respective placebo controls. While treatment x is found to be non-significantly different from placebo and hence termed "not efficacious", treatment y, due to a larger specific effect, is found efficacious. However, the effect of the placebo in treatment x can be larger than the total treatment effect of the specific treatment y. (Reproduced with permission from the Journal of Alternative and Complementary Medicine)