Early Human Development xxx (xxxx) xxx–xxx
Contents lists available at ScienceDirect
Early Human Development journal homepage: www.elsevier.com/locate/earlhumdev
WASP (write a scientific paper) using Excel – 4: Histograms Victor Grech Department of Paediatrics, Mater Dei Hospital, Malta
A R T I C L E I N F O
A B S T R A C T
Keywords: Software Computers Statistics Biostatistics
Plotting data into graphs is a crucial step in data analysis as part of an initial descriptive statistics exercise since it gives the researcher an overview of the shape and nature of the data. Outlier values may also be identified, and these may be incorrect data, or true and important outliers. This paper explains how to access Microsoft Excel's Analysis Toolpak and provides some pointers for the utilisation of the histogram tool within the Toolpak.
1. Introduction
3. Excel analysis ToolPak
Karl Pearson FRS (1857–1936) was an English mathematician and biostatistician. He established mathematical statistics and founded the world's first university statistics department in London in 1911 [1]. His work is still used today in that it is of primary importance to understand the properties of data that has been collected and be able to objectively describe it in a universally comprehensible manner that adheres to accepted conventions. This allows the presentation of data and its submission as an abstract for an oral presentation or a poster at a meeting. If research is not published, it is almost as if it has not been done, so this work should eventually lead to a publication in a peerreviewed journal.
Excel statistical functions are available using the Analysis ToolPak. This may not be available in Excel by default in which case it may be easily activated, a one-off procedure. Click Data Analysis in the Analysis group on the Data tab. If the Data Analysis command is not available (at the far right), click File tab, Options, Add-Ins category. In the Manage box, select Excel Add-ins, click Go, check the Analysis ToolPak check box and click OK. The data analysis ToolPak is now available in the Data tab.
2. Data conversion Data can be shown pictorially as a graph, such as a histogram or a boxplot and whisker. If data is made available as a set of values separated by delimiters, such as commas, Excel can convert this data to individual values in cells (a form of tabulation). This is done by copying and pasting these values into a cell, and from the data tab, convert text to columns (choose Delimited – Commas options), so that: Urinary concentration of lead in 16 rural children (μmol/24 h) [2]. 0.2, 0.3, 0.6, 0.7, 0.8, 1.5, 1.7, 1.8, 1.9, 1.9, 2.0, 2.0, 2.1, 2.8, 3.1, 3.4 Becomes Table 1: This row can then be easily converted into a column by selecting the values, copying and using Paste-Paste Special-Paste values and transpose options (Figs. 1 and 2), to obtain Table 2:
4. Histogram function If a series of cut-off values are required, Excel has a histogram function that allows the apportioning of data using preset values as well as the creation of an actual histogram. The cut-off values are input into a column and this is referred to by Excel as a “bin”. In this case, data and bin are shown in Table 3 [2]: The creation of a histogram is via Data Analysis, Histogram and choose options (Fig. 3). The Labels check box is ticked so as to inform Excel that the first row in each column is a header. Tick chart output. Output range specifies where the analysis and graph will appear. The output is a frequency distribution that is plotted as a histogram (Fig. 4). With some manipulation via the Chart Elements Button (which appears as a plus sign to the top right of a graph whenever a graph is selected), a more presentable histogram can be obtained (Fig. 5). If more data is collected (urinary concentration of lead in μmol/24 h [2]) as per Table 4 and plotted, a graph such as Fig. 6 will be obtained.
E-mail address:
[email protected]. https://doi.org/10.1016/j.earlhumdev.2018.01.005
0378-3782/ © 2018 Published by Elsevier B.V.
Please cite this article as: Grech, V., Early Human Development (2018), https://doi.org/10.1016/j.earlhumdev.2018.01.005
Early Human Development xxx (xxxx) xxx–xxx
V. Grech
Table 1 Urinary lead concentration in rows (μmol/24 h). 0.2
0.3
0.6
0.7
0.8
1.5
1.7
1.8
1.9
Table 2 Urinary lead concentration in column (μmol/24 h). 1.9
2
2
2.1
2.8
3.1
3.4
0.2 0.3 0.6 0.7 0.8 1.5 1.7 1.8 1.9 1.9 2 2 2.1 2.8 3.1 3.4
Table 3 Urinary lead concentration (μmol/24 h) and bin.
Fig. 1. Paste special options.
Data
Bin
0.2 0.3 0.6 0.7 0.8 1.5 1.7 1.8 1.9 1.9 2 2 2.1 2.8 3.1 3.4
0 0.4 0.8 1.2 1.6 2 2.4 2.8 3.2 3.6 2.4 2.8 3.2 3.6 4 4.4
5. Values in categories – weighted means Data may occasionally be available in categories already. In this case, a weighted mean must be calculated by taking the midpoint of the range for each range, multiplying by n for each range, adding the total and dividing that by the total number of observations, as shown in Table 5 (formulas used in calculations are shown at the bottom).
6. Other basic data descriptors Once data collection is complete, the descriptive statistics exercise includes the acquisition of basic descriptors which may include standard deviation and quartiles. The next paper provides pointers as to how to do this in Microsoft Excel and describes the relationship between the two.
Fig. 2. Paste special options.
2
Early Human Development xxx (xxxx) xxx–xxx
V. Grech
Fig. 3. Histogram options.
Fig. 4. Initial Excel histogram output.
Fig. 5. As Fig. 4 with further customisation.
3
Early Human Development xxx (xxxx) xxx–xxx
V. Grech
Table 4 Urinary lead concentration (μmol/24 h) – larger sample.
Table 5 Calculation of weighted means.
Urinary concentration of lead in μmol/24 h
n
0–0.4 0.4–0.8 0.8–1.2 1.2–1.6 1.6–2 2–2.4 2.4–2.4 2.4–2.8 2.8–2.8 2.8–3.2 3.2–3.2 3.2–3.6 3.6–3.6 3.6–4 4–4.4 > 4.4
2 7 10 16 23 28 19 16 11 7 19 16 11 7 1 0
Urinary conc. lead (μmol/ 24 h)
Midpoint
n
Multiplied
0–0.4 0.4–0.8 0.8–1.2 1.2–1.6 1.6–2 2–2.4 2.4–2.4 2.4–2.8 2.8–2.8 2.8–3.2 3.2–3.2
0.2 0.6 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8 4.2 Totals Weighted mean Totals
2 7 10 16 23 28 19 16 11 7 1 140
0.4 4.2 10 22.4 41.4 61.6 49.4 48 37.4 26.6 4.2 305.6 2.18 =SUM(D2:D12)
Weighted mean
= SUM (C2:C12)
=D13/C13
The last two rows demonstrate the equations used in the rows above.
Fig. 6. A larger sample as per previous two graphs.
publication and there has been no significant financial support for this work that could have influenced its outcome.
Acknowledgments The inspiration for this series of papers arises from Thomas Douglas Victor Swinscow's original series of papers in the 1970s entitled “Statistics at Square One” [2] as well as the Excel-based statistics talks prepared for the international Write a Scientific Paper course (WASP – http://www.ithams.com/wasp). [3,4]. I would also like to thank Dr. Neville Calleja (Director at Department of Health Information & Research - Department of Health Information & Research, Ministry of Health, the Elderly and Community Care) for reviewing these manuscripts.
References [1] G.U. Yule, L.N.G. Filon, Karl Pearson. 1857–1936. Obituary notices of fellows of the Royal Society, 2 (1936) 72–110. [2] T. Swinscow, Statistics at square one, Br. Med. J. 1 (6020) (1976) 1240. [3] V. Grech, WASP – write a scientific paper course: why and how, J. Vis. Commun. Med. 40 (3) (Jul 2017) 130–134. [4] V. Grech, S. Cuschieri, Write a Scientific Paper (WASP) - a career-critical skill, (2018), http://dx.doi.org/10.1016/j.earlhumdev.2018.01.001.
Conflict of interest statement There are no known conflicts of interest associated with this
4