CLINICAL TRIALS
PRESENTING, ANALYSING & DISCUSSING THE RESULTS

MICHAEL KIRK-SMITH

The purpose of clinical evaluation is to identify changes due to a treatment. The Results section is where these changes are presented as data summarised in tables, graphs and plots, so that any changes and their extent can be easily seen. Statistical tests are used to confirm that these changes are due to treatment and not due to chance. The choice of test depends on factors such as the measurement scales and data distribution and these are outlined. Finally, the meaning of the results and their ramifications are given in the Discussion section.

The main purpose of a clinical evaluation is to find out whether a therapy or treatment causes a change in the condition of the patients when compared to no treatment at all or compared to other treatments. The design of the study ensures that one can be sure that any change (or lack of change) was due to the treatment alone and not to other factors. The Results section, in contrast, is where any change and its extent is reported. Being sure that a change has actually happened or not is the key issue to be addressed in the Results section. There are two aspects to identifying change; first, visual inspection of the results and secondly, statistical analysis to confirm the evidence of the visual inspection. There should be no interpretation of the data in the Results section; it should just contain results, nothing more. What is interesting in the results or what the results might mean has no place here. These go into the Discussion section.

It is important to note that if the design or measures are flawed, then no amount of sophisticated statistical analysis can sort out these problems. On the other hand, if the Aims, measures and design are well planned, then the analysis of results is likely to be straightforward. If data have been collected and one does not know what to do with them, then the study is effectively being planned and designed after it has been done - not a good idea. To avoid this, it is strongly recommended that the results section, with mock graphs and tables, etc., be drawn up at the planning stage.

The first step in identifying whether there has been a change due to the treatment is to summarise and display the results in a form that will allow an easy visual inspection of any changes.
As well as using tables, good ways of doing this are by using a graph (Fig. 1) or a bar plot (Fig. 2; the vertical lines represent the spread of scores, see later) in which the effects of the treatment are contrasted with the non-treatment measures. Spreadsheets, graphics programmes and statistical packages can be used to produce professional quality tables, graphs and plots. These should be accompanied with short descriptive texts drawing out the main points, e.g., "Table 1 shows the average mood scores of the patient groups, with Group A having the highest scores (average = 74)".

A subsidiary purpose of a clinical evaluation may be to see the influence of other factors on the effects of the treatment, e.g. the age, weight or sex of the patient, or a variation in the conditions, e.g. how the treatment was given. These factors may be measured as a continuous range of numbers (like age and weight), or might be discrete classes, e.g. male or female, smoking and non-smoking, with and without expectation. Again, the first step is to display the results. For example, if it is thought that the effect of the treatment might be affected by a continuous range, e.g. patients' ages, then the change due to the treatment for each patient's age would be plotted. If a line can be drawn through the points (Fig. 3), then the treatment effects and age go up and down together, or are said to be "correlated". If the influencing factors can be classified then the average scores for patients in each group can be displayed as numbers in a table or presented graphically in a bar plot (see Fig. 2).

If an obvious visible difference or change is seen between the control and treatment groups, or the baseline and intervention periods, when the data are plotted, then there is likely to be a real difference or change. However, the change may not always be obvious, or people might disagree about whether there is a real change. For example, in a multi-subject design, a treatment group of 24 patients might have an average improvement of 42 (say, on a pain scale), with lower and upper scores in the group of 38 and 46. The control group of another 24 patients might have an improvement of 38 with scores between 34 and 42. Although the average scores differ by four points, it might be argued that another control group, being different people, might just as well have had a score of 42 like the treatment group. So maybe the difference between the treatment and control groups is just due to random variations in the particular people selected for each group. Similarly, in a single case example, a patient's average baseline temperature might be 94°C, varying between 92 and 98°C. Then during treatment sessions the average decreases to 93, varying between 93 and 97°C. One could argue that the small average change might have happened anyway, since the temperature is varying over a wide range.

The question in both these cases is whether the average score during treatment is really different from the average score with no treatment, because the spreads of scores making up both averages overlap and are taken over a limited number of patients or time. This spread of scores about each average is often summarised and calculated as the "standard deviation" and given as the vertical bars in Fig. 2. So, if the spread of scores within each group is wide, might the difference between the two groups also be due to this same natural variation?

Statistical tests help sort out this question by calculating the chance (i.e., probability) of whether the average difference between the treatment and control groups is more likely to be due to the treatment or due to the random differences one might expect between any two samples of patients.
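The reasoning above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the individual scores are invented to match the averages and ranges in the example (42 and 38, with 24 patients per group), and the random regrouping used here (a permutation test) is just one simple way of estimating the chance of such a difference, not a method named in this article. The function names `average` and `chance_of_difference` are hypothetical.

```python
import random
import statistics

# Hypothetical improvement scores, invented to match the example in the text:
# 24 patients per group, averages of 42 and 38, ranges 38-46 and 34-42.
treatment = ([38] + [39] * 2 + [40] * 3 + [41] * 4 + [42] * 4
             + [43] * 4 + [44] * 3 + [45] * 2 + [46])
control = [score - 4 for score in treatment]


def average(scores):
    """Return the arithmetic average of a list of scores."""
    return sum(scores) / len(scores)


def chance_of_difference(group_a, group_b, n_shuffles=5000, seed=1):
    """Estimate how often randomly regrouping the same scores gives a
    difference in averages at least as large as the one observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_diff = abs(average(pooled[:size_a]) - average(pooled[size_a:]))
        if shuffled_diff >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# The spread of scores about each average (the "standard deviation").
treatment_sd = statistics.stdev(treatment)
control_sd = statistics.stdev(control)
p_value = chance_of_difference(treatment, control)

print(f"Treatment: average {average(treatment):.1f}, standard deviation {treatment_sd:.2f}")
print(f"Control:   average {average(control):.1f}, standard deviation {control_sd:.2f}")
print(f"Estimated chance of this difference arising at random: p = {p_value:.4f}")
```

With these invented scores the spread within each group is small compared with the four-point difference, so the estimated chance comes out well below 1 in 20. Wider spreads, or fewer patients, would raise it.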
For example, a statistical test carried out on the difference between the 42 and 38 averages in pain scores might show that with this spread of scores within each group there is less than a 1 in 20 chance (usually written as p < 0.05) of there being this difference in averages if two randomly selected control groups were compared. It is conventionally accepted that a difference with less than a 1 in 20 chance of arising in this way (called a "significant difference") is probably due to the treatment and not due to chance differences between the two groups. Similarly, if there was only a 1 in 100 chance that the difference between the treatment and control group averages was due to chance, then this is called "highly significant", and one can be even more certain that the difference is due to the treatment. However, it is important to note that even if a result is significantly (or reliably) different it may not be clinically useful, since a very reliable but small improvement due to a complicated treatment might be more trouble than it is worth.

Statistical tests are used in a similar way to assess whether two factors are actually correlated (i.e., going up and down together) or whether the association is due to chance. The degree of correlation (how close the points are to the line in Figure 3) is calculated as a 'Correlation Coefficient', and the probability of getting this value by chance alone is determined by the number of points (or pairs) used in the calculation, so that 'significant' and 'highly significant' correlations can be confirmed as in differences between groups.

The best way to select an appropriate statistical test is to get advice from a statistician or research advisor by presenting them with the aims, measures, design and results sections already drawn up before any data are collected (e.g. in the format described in the first article of this series). This will allow them to assess quickly which statistics are most appropriate. General issues to do with choice of tests will be considered here. Standard textbooks on statistical analysis should be consulted for detailed explanations about individual tests and how they should be used. Table 1 gives examples of some common tests used to determine changes and differences in data. The headings on this table will now be explained.

The 'shape' of the data and the type of measurement scale used determine whether a "parametric" or "non-parametric" statistical test is used. Parametric tests (tests in Section C, Table 1) require that data give a bell-shaped curve (or a 'normal distribution') when they are displayed. Also, the steps or intervals on the measurement scales must be equally far apart. These are called interval and ratio scales and cover measures such as blood pressure, rash area and temperature, as well as measures which can be counted, e.g., the number of cigarettes smoked by each person on a treatment programme. If the data are not "bell-shaped" when displayed and small samples are involved, a common situation in small scale clinical trials, then a non-parametric test (tests in Sections A and B, Table 1) might be considered.

Non-parametric tests are also used when interval or ratio scales are not being used, i.e. for nominal or ordinal measurements. Nominal measures are those that are counted or classified into different groups or categories, e.g., the numbers of patients which are 'yes' or 'no', or 'red', 'blue' or 'black'. Ordinal measures cover data that can be ordered or ranked in order of magnitude, e.g. "High", "Middle" and "Low", and where the steps between scale points are not equal. Psychological measures are often ordinal, e.g. scales such as 'strongly agree - agree - not sure - disagree - strongly disagree'. Psychological scales with numbers might be best regarded as ordinal as well since the distances between points on a scale may not be equal, e.g., on a 1-10 "relaxation" scale we may not be sure that the distance between 2 and 3 is the same in psychological terms as between 9 and 10.

'Related' refers to whether the samples of data used are matched in some way, i.e., by using a patient as their own control (e.g., before and after treatment) or by pairing two subjects who are as alike as possible and then allocating them to different treatment groups. 'Unrelated' refers to when patients are allocated randomly to treatment and control groups.
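The headings just described can be summarised as a small decision helper. This is a minimal sketch of the rules stated in the text, not a substitute for a statistician's advice. It deliberately returns only the family of test and the type of samples, because the individual tests listed in Table 1 are not reproduced here. The names `Suggestion` and `suggest_test_family` are hypothetical, and treating any data that are not bell-shaped as non-parametric is a simplifying assumption.

```python
from dataclasses import dataclass

# Measurement scales as described in the text.
INTERVAL_OR_RATIO = {"interval", "ratio"}
NOMINAL_OR_ORDINAL = {"nominal", "ordinal"}


@dataclass(frozen=True)
class Suggestion:
    family: str   # "parametric" or "non-parametric"
    samples: str  # "related" or "unrelated"


def suggest_test_family(scale, bell_shaped, matched):
    """Suggest a family of tests from the scale, data shape and design.

    scale       -- "nominal", "ordinal", "interval" or "ratio"
    bell_shaped -- True if the displayed data look normally distributed
    matched     -- True if patients act as their own control or are paired
    """
    if scale not in INTERVAL_OR_RATIO | NOMINAL_OR_ORDINAL:
        raise ValueError(f"unknown measurement scale: {scale!r}")
    # Parametric tests need equal steps on the scale AND bell-shaped data.
    if scale in INTERVAL_OR_RATIO and bell_shaped:
        family = "parametric"
    else:
        family = "non-parametric"
    samples = "related" if matched else "unrelated"
    return Suggestion(family, samples)


# Blood pressure measured before and after treatment in the same patients.
print(suggest_test_family("ratio", bell_shaped=True, matched=True))
# A 1-10 "relaxation" scale in randomly allocated treatment and control groups.
print(suggest_test_family("ordinal", bell_shaped=False, matched=False))
```

The two calls at the end correspond to examples mentioned in the text: a before-and-after design on a ratio scale, and an ordinal psychological scale compared across randomly allocated groups.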
Statistical tests are usually calculated on computer, although they can also be done by hand or calculator. A spreadsheet can be used to calculate statistics (Soper and Lee 1990), and most PCs have spreadsheet software, but calculations need to be double checked for accuracy (e.g. in case of the spreadsheet rounding up figures or having the wrong formulae). The simplest approach is to use statistical packages. Most educational establishments now have large and easy to use statistical packages, e.g. Minitab, SPSS (e.g. Bryman and Kramer 1990), Systat, Unistat and StatXact. These all have non-parametric and parametric tests and good 'help' facilities (StatXact concentrates on non-parametric tests). There are also smaller, more limited (but cheaper or free) statistical programmes, many of which are available as 'public domain' programmes.

The Results section is followed by the Discussion section (which can also include 'Conclusions' and 'Recommendations'). This gives your interpretation of the results and their ramifications. Detailed results, e.g., numbers, should not be repeated, but should be mentioned in summary form, e.g., "The unexpectedly high mood ratings in Group A might be due to....". Typical points that might be covered in a Discussion are:

• Interpretation of the results in terms of past work mentioned in the Introduction, e.g. where it agrees or differs from previous findings and ideas and why this should be.
• Drawing attention to interesting or surprising aspects of the results, and their implications.
• Stating the limitations of the study.
• Giving clinical recommendations that arise from the results.
• Suggesting future research that should be done.

This concludes the series of articles on planning clinical evaluations. The purpose of the series is to give an insight into good practice in conducting research and into the issues underlying it rather than giving comprehensive guidance. It is hoped that the articles will encourage readers to consider evaluating their treatments and to explore research methods and issues more deeply.

NOTE: Dr Kirk-Smith will be pleased to collaborate with readers who are interested in evaluating their treatments and also to discuss any aspects of this series with readers. He can be contacted at the University of Ulster at Jordanstown on the Tel & email numbers printed on the title page.

Bryman, A. and Kramer, D. (1990) Quantitative data analysis for social sciences. Routledge, London.
Kratochwill, T.R. (1992) Single case research design and analysis. Lawrence Erlbaum, Hillsdale, NJ.
Pilcher, D.M. (1990) Data analysis for the helping professions. Sage Pubs., London.
Siegel, S. and Castellan, N.J. (1988) Nonparametric statistics for the behavioural sciences. McGraw-Hill International Editions, New York.
Soper, J.B. and Lee, M.P. (1990) Statistics with Lotus 123 (2nd Ed.). Chartwell-Bratt Ltd., Sweden.
Yarnold, P.R. (1992) Statistical analysis for single case designs. In Bryant, F. et al. (Eds.) Methodological issues in applied social psychology. Plenum Press, N.Y.