Guidelines for the presentation of numerical tables


Research in Veterinary Science 1995, 58, 1-4

This paper is the second of a series designed to help veterinary scientists to make the most effective use of statistical techniques. The first paper was published in volume 51, page 229.

K. RYDER*, Statistics Department, AFRC Institute for Arable Crops Research, Rothamsted Experimental Station, Harpenden, Hertfordshire AL5 2JQ

THE main criterion for a good table is that the reader should be able to see its main features without having to search for the salient points. There may be several ways of preparing a table, some more readable and therefore more understandable than others, which raises the question of which aspects determine whether a table is easily understood. Several components of a table have to be considered: the order in which the factors are displayed, ie, as rows or columns; the number of digits in each number; the type of information being given, ie, subsidiary facts such as standard errors; the use of superscripts and subscripts with attached footnotes; and the layout of the values, including the use of lines and spaces.

The main objective when preparing a table is that it should be concise, giving sufficient information for the reader to see clearly the results being presented, and yet self-explanatory, so that the reader does not have to refer to the text for the units in which the numbers are expressed or for a description of the row and column headings. Tables can contain so much information that it becomes difficult to follow the arguments given in the text; conversely, there have been tables with so little information, and so much blank space and so many asterisks, that they are almost redundant. The points to be made in designing a table may conflict, and the ideas discussed below should not be taken as law. Many journals have their own style for publishing tables, which must be followed, but there should be sufficient scope within that style to produce a better table.

* Present address: 11 The Wick, High Street, Kimpton, Hitchin, Hertfordshire SG4 8SA

Order of factors and variables

The order in which the different factors and variables are presented in the table will depend on how the data are to be discussed in the text. If the values of two variables are to be compared it is advisable to have them near to each other in the table, either side by side or one beneath the other, the latter being preferable. Table 1 shows how tables are sometimes presented; it has been included to illustrate several aspects of presentation, but mainly as an example of how the ordering of the factors can change the way a table can be used. One of the major components of the data is the change in the variables over time, and it would seem appropriate to have time going across the page. The other two components are two variables and three treatment groups, and the order of presentation does give rise to difficulties in comparing the values.

TABLE 1: An illustration of several aspects of presentation for tables

Treatment                                        Time (hours)
group                  2              4              8              16             32

1          Variable 1  38.75 ± 5.97   23.42 ± 7.40   16.92 ± 7.95   21.83 ± 6.86   13.75 ± 4.87
           Variable 2  72.08 ± 9.43   54.92 ± 5.30   35.08 ± 11.34  37.05 ± 8.44   25.47 ± 4.47

2          Variable 1   6.17 ± 6.17   10.60 ± 3.61   45.83 ± 8.21    7.33 ± 5.08   20.67 ± 7.70
           Variable 2  24.75 ± 2.83   24.50 ± 6.92   18.17 ± 1.58   22.00 ± 3.27   26.83 ± 4.79

3          Variable 1  24.33 ± 9.88   13.00 ± 7.66   15.33 ± 8.72   13.10 ± 7.68   31.83 ± 10.38
           Variable 2  36.00 ± 10.97  32.83 ± 8.31   29.00 ± 8.86   47.60 ± 8.34   60.75 ± 9.58

For example, when comparing the means of variable 1 for treatment groups 1 and 2 at two hours, the values are 38.8 and 6.2. The values are separated by seven lines, and the reader has to search for the numbers, which takes time. If the two values are too far apart the first is forgotten while the second is being found, and so the eyes scan the table several times between the two values before both are remembered. With the values in close proximity, the eyes can read both together, and the reader can progress quickly to the next point. Replacing the standard errors for each treatment group by one for each variable, as in Table 2a, reduces the eye movement and makes it easier to compare two means of the same variable. In this example, variable 1 can be taken as a component of variable 2, and the layout in Table 2a allows the ratio of variable 1 to variable 2 to be estimated; but if the two variables are to be discussed separately there is a case for giving the results as in Table 2b, which is effectively two separate tables put end to end.

TABLE 2: Two possible re-arrangements of Table 1

(a)
Treatment                        Time (hours)
group                      2      4      8     16     32

1          Variable 1  38.75  23.42  16.92  21.83  13.75
           Variable 2  72.08  54.92  35.08  37.05  25.67

2          Variable 1   6.17  10.60  45.83   7.33  20.67
           Variable 2  24.75  24.50  18.17  22.00  26.83

3          Variable 1  24.33  13.00  15.33  13.10  31.83
           Variable 2  36.00  32.83  29.00  47.60  60.75

SEM        Variable 1  11.41   8.20   8.28   6.33   7.97
           Variable 2  12.99  10.82   8.49   7.27   6.70

(b)
Variable 1
Treatment            Time (hours)
group          2      4      8     16     32

1           38.8   23.4   16.9   21.8   13.8
2            6.2   10.6   45.8    7.3   25.7
3           24.3   13.0   15.3   13.1   31.8

SEM        11.41   8.20   8.28   6.63   7.97

Variable 2
Treatment            Time (hours)
group          2      4      8     16     32

1           72.1   54.9   35.1   37.0   25.7
2           24.8   24.5   18.2   22.0   26.8
3           36.0   32.8   29.0   47.6   60.8

SEM        12.99  10.82   8.49   7.27   6.70

Sometimes there are several variables to be published and the results are combined into one table as in Fig 1, where the variables are the columns. This allows the reader to compare the relationship between two variables but makes it difficult to compare values within the same column. In one such table the readers were asked to compare pairs of values which were separated by 12 other lines. If there is no need to assess two variables directly against each other it would be better to give each variable as a small sub-table, laid out as in Fig 2.

Some of the factors will have a natural order for presentation, ie, increasing levels of a drug or all combinations of two or more factors. This will determine the way the table is given. Looking at recent issues of this journal, there seems to be little use made of factorial designs; for a discussion of how to present tables for three or more factors see Ryder (1986). Where there is no structure to the factor levels, ranking the levels according to the more important variables, for example, total weight or perhaps daily weight gain, would help identify the best and worst and present a more informative picture (Ryder 1986).
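The effect of row order on the distance between values that are to be compared can be sketched in a few lines of Python. This is an illustration only and is not taken from the paper: the names `row_order` and `rows_between` are hypothetical, and the rows are simply the six treatment-by-variable rows of means, with the standard errors left out.

```python
# Each row of means is identified by (treatment group, variable).
ROWS = [(group, variable) for group in (1, 2, 3) for variable in (1, 2)]


def row_order(rows, outer):
    """Sort the rows so that `outer` ('treatment' or 'variable') varies slowest."""
    if outer == "treatment":
        return sorted(rows, key=lambda r: (r[0], r[1]))
    if outer == "variable":
        return sorted(rows, key=lambda r: (r[1], r[0]))
    raise ValueError("outer must be 'treatment' or 'variable'")


def rows_between(order, first, second):
    """Count the rows the eye must skip to move from `first` to `second`."""
    return abs(order.index(first) - order.index(second)) - 1


by_treatment = row_order(ROWS, "treatment")  # variables nested within treatments
by_variable = row_order(ROWS, "variable")    # treatments nested within variables

# Variable 1, treatment group 1 against treatment group 2
print(rows_between(by_treatment, (1, 1), (2, 1)))  # 1
print(rows_between(by_variable, (1, 1), (2, 1)))   # 0

# Treatment group 1, variable 1 against variable 2
print(rows_between(by_treatment, (1, 1), (1, 2)))  # 0
print(rows_between(by_variable, (1, 1), (1, 2)))   # 2
```

Neither ordering is best for every comparison; the sketch only shows that the ordering should be chosen to bring together the values the text asks the reader to compare.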

FIG 1: Variables presented as the columns of one large table making it difficult to compare values of level 1 of factor A with values at level 4 [schematic: Variables 1 to 4 as columns, factor levels as rows]

FIG 2: Variables presented in individual tables making comparisons of different factor levels easier [schematic: one sub-table for each of Variables 1 to 4, with Factor A across and Factor B down]

Number of digits: 3 significant figures

Perhaps the first thing a reader notices about a table is the size of the numbers being given. This is to some extent determined by the variables being reported and the size of the samples taken. Unfortunately, a number of statistical packages used to analyse data present their results to 10 decimal places and some authors feel the need to include as many as possible in their tables. For scientific papers, results should be reported in the appropriate SI units with approximately three significant figures. Monteith (1984) gave the following advice: 'when a quantity is quoted to two or more significant figures, the choice of unit should preferably allow its numerical component to fall between 1 and 100; but when only 1 significant figure is available, it should normally lie between 1 and 10.' Three significant figures should in many cases be adequate as the original data would seldom be recorded to more than three digits, and any gain in accuracy from recording extra digits is usually so small compared to the experimental error that little difference is made to the analyses (Riley et al 1983).

The mean values given in Table 1 are a mixture of three and four digits because the integer part ranges from 0 to 80. All the figures are given to two decimal places and are known to be the means of fewer than 10 observations. The original data were at best measured to one decimal place, but were more likely to be recorded to the nearest half unit, so reporting to one decimal place would suffice. For the standard errors it is normal practice to print one more decimal place than is given for the means, so that the reader can do any rounding after any calculations with the standard errors. In Table 1 the standard errors, printed to two decimal places, would therefore suit means given to one decimal place.

For internal publications which will be read by administrators you need to follow a different line. Too many digits may hinder the reader who has not the time to look at the precise details. As long as such readers can judge that the points of your arguments are justified they will be satisfied. Ehrenberg (1977, 1982) put forward the idea of 'effective digits'. For example, compare the values of variable 2 in Table 1 at the first time, two hours, that is 72.1, 24.8 and 36.0. To make the comparisons the mind will simplify the numbers by changing them to two digits. If that is done in the report the reader will see 72, 25 and 36 and should quickly judge that treatment 3 is half the value of treatment 1 while treatment 2 is about a third. If the numbers had been multiplied by 10 then the results would have been 720, 250 and 360. Now look at the figures 4830, 4320, 4270 and 4190. These numbers are, according to Ehrenberg, given to two effective digits. The eye quickly sees that the leading digit is the same and therefore concentrates on the second and third digits. However, we do have to be careful when working with diverse numbers such as 3580, 967 and 184; rounding them to the nearest 100, giving 3600, 1000 and 200, does mean that you have rounded 184 by a larger percentage than 3580. For such data rounding each number to its own two digits, 3600, 970 and 180, would be better.
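The two ways of rounding diverse numbers can be checked with a short Python sketch. It is offered as an illustration under stated assumptions rather than as a method from the paper: `round_sig` and `percent_change` are hypothetical helper names, and Python's built-in `round` resolves exact ties to the even digit, which does not affect the numbers used here.

```python
import math


def round_sig(value, digits=2):
    """Round `value` to `digits` significant digits."""
    if value == 0:
        return 0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


def percent_change(original, rounded):
    """Size of the rounding as a percentage of the original value."""
    return 100 * abs(rounded - original) / original


values = [3580, 967, 184]
nearest_100 = [round(v, -2) for v in values]
own_two_digits = [round_sig(v, 2) for v in values]

print(nearest_100)     # [3600, 1000, 200]
print(own_two_digits)  # [3600, 970, 180]

# Rounding to the nearest 100 changes 184 by far more than 3580.
for v, r in zip(values, nearest_100):
    print(v, r, f"{percent_change(v, r):.1f} per cent")

# Variable 2 at two hours, reduced to two digits as in the text.
print([round_sig(v, 2) for v in (72.1, 24.8, 36.0)])  # [72.0, 25.0, 36.0]
```

Rounding each value to its own two digits keeps the relative size of the rounding below about 5 per cent for every number, whereas a fixed rounding unit penalises the smallest values.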

Standard error of the means

Most tables report the mean values for a set of observations from either a survey or an experiment; sometimes the coefficients from a regression analysis are included and, on rare occasions, the differences between specified treatments. In all cases some indication of the reliability of the means, coefficients and differences should be given. There is a selection of statistics to choose from: standard deviation (SD), standard error of the mean (SEM), standard error of the difference (SED), coefficient of variation (CV) and the least significant difference (LSD).

The SD should be given in those papers reporting the parameters of a population. For instance Roberts et al (1990) reported the dimensions (µm) of 40 Toxocara vitulorum larvae collected from the milk of buffalo cows. They have correctly given the SD for the populations found so that the reader can form the distributions of the variables. This enables other researchers to see what variation the population shows, and so they should be able to work out the replication needed for their own experiments. The authors made no attempt to compare the mean figures with those of other breeds. When such comparisons are to be made there is a case for reporting the SD for the populations, but it may be better if either the standard errors of the means or the standard errors of the differences are given to allow the tests to be checked mentally. For complex experimental work where either split-plot or incomplete block designs have been used there is a good argument for giving the standard errors for the different comparisons that can be made. For other designs it may be better to give the standard error of the means, which will allow readers to make their own comparisons using the appropriate multiplication factors.

For the sake of clarity it is, in my opinion, preferable when indicating variation to omit the ± signs; all standard errors are positive and Table 1 would be clearer with the ± signs omitted. The values can instead be given in a separate row or column or in brackets after the means.

Many people like to give the CV as an indication of the variation. However, this hides the actual size of the variation, and readers who want to make their own comparisons have that little extra work to do. There are other reasons for not using it. First, it can be manipulated to look better than it is. For instance, if the mean percentage of healthy cows in a number of herds is 80 and the standard deviation is 16 then the CV is 20 per cent. But the mean percentage of unhealthy cows is 20 and the standard deviation remains at 16, so the CV is now 80 per cent. The results do not look as good. Secondly, if the data are transformed other than by simple multiplication, for example by taking logarithms or changing from Celsius to Fahrenheit, then the CVs could be drastically altered, especially if the means on the new scale are close to zero.

The LSD should be avoided. The reader should be allowed to decide which significance levels to use. Although many authors use the 5 per cent, 1 per cent and 0.1 per cent levels, other levels such as 10 per cent, 5 per cent and 2 per cent have been used, which could lead a reader to make a wrong interpretation.

When presenting the coefficients from a regression model the standard errors of the coefficients should also be given. Unfortunately, many statistical packages do not give them but the situation is improving.

Each line in Table 1 has its own standard errors of the means but are they all needed? Any comparison between the means should be made with the best estimate of the variation available. For this set of data the best estimate would be the pooled estimate from all three treatments at each time. However, what should be done when there are means which are zero with a zero standard error? Here the best estimate would be based on the pooled estimate from the other treatments. The observations for the zero mean are precisely known and should not be used in the pooled estimate.

The original 12 lines of table could be reduced to eight: three for each of the two variables, plus one line giving the standard errors for each variable (Table 2a and 2b).
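The arithmetic behind the objections to the CV, and one of the multiplication factors mentioned above, can be reproduced with a minimal Python sketch. The function names are hypothetical, the temperature figures are invented purely for illustration, and the factor of the square root of 2 for converting an SEM to an SED assumes two independent means with equal replication and a common SEM.

```python
import math


def coefficient_of_variation(mean, sd):
    """CV expressed as a percentage of the mean."""
    return 100.0 * sd / mean


def sed_from_sem(sem):
    """SED for two independent means that share the same SEM (assumption)."""
    return math.sqrt(2.0) * sem


# Healthy against unhealthy cows: same SD, very different CV.
print(coefficient_of_variation(80, 16))  # 20.0
print(coefficient_of_variation(20, 16))  # 80.0

# Hypothetical temperatures: mean 5 and SD 2 in degrees Celsius.
mean_c, sd_c = 5.0, 2.0
mean_f = mean_c * 9 / 5 + 32   # the mean is rescaled and shifted
sd_f = sd_c * 9 / 5            # the SD is only rescaled
cv_c = coefficient_of_variation(mean_c, sd_c)
cv_f = coefficient_of_variation(mean_f, sd_f)
print(f"{cv_c:.1f} {cv_f:.1f}")  # the same data give two different CVs

# SEM for variable 1 at two hours in Table 2a.
print(f"{sed_from_sem(11.41):.2f}")
```

The SD and SEM are unaffected by which of two complementary percentages is reported, which is why they are the safer statistics to print.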

Minimise footnotes, superscripts and subscripts

Perhaps some of the main distractions in a table are the galaxy of stars and legions of letters which decorate the means and standard errors. The numerical values in a table have been followed in some instances by up to 10 letters from the results of multiple range tests, making the table difficult to read. Footnotes are acceptable, especially to highlight differences in how the values were calculated, as for example when a mean is based on fewer data than normal. If an indication of significance is to be given it may be better to give the probabilities in one extra row or column than to have one, two or three stars attached to individual values. The reliance on significance can be so dominant that tables have been produced from which the data were excluded. The best arrangement will be the one where the reader can judge the significance of any differences from the information given. Multiple range tests are overused. There are many instances where multiple range tests are inappropriate and where comparisons can be made by using single degree of freedom contrasts.

Use white spaces

The final topic to discuss is the use of lines and spaces to make a table more readable. The policy of this journal is that horizontal lines are used under the table heading, under the column headings and after the main body of the table but before any footnotes. This is acceptable. Too many lines interfere with the appearance of the table and can distract the reader. If there is a need to separate parts of the table, such as the standard errors from the means, it is better to use white lines, ie, to insert an extra blank line or two between important subdivisions, than to draw a line. Spaces are restful upon the eyes and do ease the look of a table. Too much detail in one table can be very off-putting, as can too much white space between columns. Tables are easier to read if the columns can be seen without moving the eyes, so tables with the columns centred will be more acceptable than those with the columns spread out to cover the space available.

The objective in designing tables is to encourage readers to look at your results, not to frighten them off with a mass of type. A little time spent on thinking how the reader will react to your work may be worth more to your reputation than you think.
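As a rough sketch of these layout rules, the following Python function prints one variable as a plain-text table with three horizontal lines, a blank line rather than a rule before the standard errors, means to one decimal place and standard errors to two. The function name `render_table` and the column widths are assumptions made for the illustration; the figures are those listed for variable 1 in Table 2a.

```python
def render_table(title, times, means, sems, width=7):
    """Return a plain-text table laid out with rules and white space."""
    header = "Treatment".ljust(10) + "".join(str(t).rjust(width) for t in times)
    rule = "-" * len(header)
    lines = [title, rule, header, rule]
    for group, values in means.items():
        cells = "".join(f"{v:.1f}".rjust(width) for v in values)
        lines.append(str(group).ljust(10) + cells)
    lines.append("")  # white line instead of a drawn line before the SEM
    lines.append("SEM".ljust(10) + "".join(f"{s:.2f}".rjust(width) for s in sems))
    lines.append(rule)
    return "\n".join(lines)


times = [2, 4, 8, 16, 32]
variable_1 = {
    1: [38.75, 23.42, 16.92, 21.83, 13.75],
    2: [6.17, 10.60, 45.83, 7.33, 20.67],
    3: [24.33, 13.00, 15.33, 13.10, 31.83],
}
sem_1 = [11.41, 8.20, 8.28, 6.33, 7.97]

table = render_table("Variable 1 by time (hours)", times, variable_1, sem_1)
print(table)
```

Keeping the column width small holds the columns close together, so that a whole row can be read without moving the eyes across blank space.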

REFERENCES

EHRENBERG, A. S. C. (1977) Rudiments of numeracy. Journal of the Royal Statistical Society A 140, 277-297
EHRENBERG, A. S. C. (1982) A Primer in Data Reduction. London, Wiley
MONTEITH, J. L. (1984) Consistency and convenience in the choice of units in agricultural science. Experimental Agriculture 20, 105-117
RILEY, J., BEKELE, T. & SHREWSBURY, B. (1983) How an analysis of variance is affected by the degree of precision of the data. Bulletin of Applied Statistics 10, 18-43
ROBERTS, J. A., FERNANDO, S. T. & SIVANATHAN, S. (1990) Toxocara vitulorum in the milk of buffalo (Bubalus bubalis) cows. Research in Veterinary Science 49, 289-291
RYDER, K. (1986) Numerical tables: guidelines for presentation. Journal of Applied Statistics 13, 77-87