WASP (Write a Scientific Paper) using Excel – 2: Pivot tables

WASP (Write a Scientific Paper) using Excel – 2: Pivot tables

Early Human Development xxx (xxxx) xxx–xxx Contents lists available at ScienceDirect Early Human Development journal homepage: www.elsevier.com/loca...

1MB Sizes 0 Downloads 105 Views

Early Human Development xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Early Human Development journal homepage: www.elsevier.com/locate/earlhumdev

WASP (write a scientific paper) using Excel – 2: Pivot tables Grech Victor Department of Paediatrics, Mater Dei Hospital, Malta

A R T I C L E I N F O

A B S T R A C T

Keywords: Software Computers, statistics Biostatistics

Data analysis at the descriptive stage and the eventual presentation of results requires the tabulation and summarisation of data. This exercise should always precede inferential statistics. Pivot tables and pivot charts are one of Excel's most powerful and underutilised features, with tabulation functions that immensely facilitate descriptive statistics. Pivot tables permit users to dynamically summarise and cross-tabulate data, create tables in several dimensions, offer a range of summary statistics and can be modified interactively with instant outputs. Large and detailed datasets are thereby easily manipulated making pivot tables arguably the best way to explore, summarise and present data from many different angles. This second paper in the WASP series in Early Human Development provides pointers for pivot table manipulation in Excel™.

1. Pivot tables Data analysis and eventual presentation requires the tabulation of data. Pivot tables are one of Excel's most powerful and underutilised features, tabulation functions that immensely facilitate descriptive statistics. Pivot tables permit users to summarise and cross-tabulate data, create tables in several dimensions, offer a range of summary statistics, and can be modified interactively with instant outputs. Large and detailed datasets are easily manipulated making pivot tables arguably the best way to explore, summarise and present data from diverse angles. For example, the dataset described in the first paper in this series (United States live births for 2007–2013 by state, gender, and month and year of birth) [1] can be conveniently manipulated as a pivot table. This is done by clicking anywhere in the dataset and then clicking the Insert tab – Pivot Table. Dragging the Pivot Table Fields on the right hand side from the top to the desired area below will generate a table as shown in the figure (Fig. 1). Fields can also be nested, e.g. adding Gender under Year in the ROWS section (bottom right of Fig. 2). A useful additional option is the possibility to change what is displayed from the default (Sum of Births in this case) to others such as mean, standard deviation or percentage and so on by clicking on Sum of Births – Value Field Settings. These tables can also be used to count the number of unique values in a field. 2. Calculations Right clicking in the spreadsheet displays a Show Values option that opens a window permitting additional calculations instead of the sum

of births (Fig. 3). This function is useful, for example, in order to check whether the data is all present and accounted for. 3. Filters It is also possible to filter data so as to view only selected parts of the dataset. This may be done in the raw data or in pivot tables. For the former, the option is by clicking the Data tab – Filter. Drop-downs appear next to each header (Fig. 4) and these allow filtering options. The same drop-down options are available at the top left hand side of pivot tables and these drop-downs also permit other useful functions such as sorting. 4. Grouping It may be useful to group data and this may be done for the raw data by selecting it, click the Data tab, Group. It is also possible to do this in pivot tables. In the births sheet, click on any year, click group and the following dialog box allows the user, for example, to group in two year intervals (Figs. 5 and 6). It is also possible to select just a range within the values being studied by using the control key while selecting, prior to grouping. 5. Field renaming A pivot table will generate new results that will be “Sum of” or “Count of”. These can be overwritten with new names for ease of use and reference.

E-mail address: [email protected]. https://doi.org/10.1016/j.earlhumdev.2018.01.003

0378-3782/ © 2018 Elsevier B.V. All rights reserved.

Please cite this article as: Grech, V., Early Human Development (2018), https://doi.org/10.1016/j.earlhumdev.2018.01.003

Early Human Development xxx (xxxx) xxx–xxx

V. Grech

Fig. 1. Creating a Pivot Table.

Fig. 2. Nesting variables in a Pivot Table.

6. Same field twice

replacement character, such as the symbol “-”.

It is possible to add the same field to a pivot table twice, for example, to count unique occurrences of a variable in a field.

10. Autofit

Pivot tables can be cloned so as to create different tables without altering extant ones. A pivot table can be simply copied and pasted and then modified as needed.

By default, each time that a pivot table changes, the columns that contain data are adjusted automatically to best fit the data displayed. This will change the size of cells and graphs along the entire column length. In order to disable this feature, right-click inside the pivot table, choose PivotTable Options and uncheck “AutoFit Column Widths on Update”.

8. Eliminating totals and subtotals

11. Self-contained pivot tables

By default, pivot tables show totals for both rows and columns. These can be eliminated via the Design tab using the totals and subtotal buttons options.

Pivot tables have a pivot cache, a repository of the data used to create the pivot table, so a pivot table will continue to function even if the original data is deleted in the spreadsheet.

9. Empty cells

12. Pivot charts

By default, empty cells will display nothing. To set your own character instead, right-click inside the pivot table and select Pivot Table options. Find the “Empty cells as:”, check it and enter the desired

These may be created using the steps as invoking a pivot chart but in addition to creating tables, Excel also instantaneously plots the resulting pivot table as a chart. This greatly facilitates visual data

7. Pivot table cloning

2

Early Human Development xxx (xxxx) xxx–xxx

V. Grech

Fig. 3. Additional calculations possible in a Pivot Table.

Fig. 5. Grouping data in a Pivot Table.

inspection, as will be constantly reiterated in this series of papers. 13. Pivot table to database Under some circumstances, it may be desirable to reverse-engineer a two-dimensional table that is not a pivot table into columnar format. If one has access to the original data, then one simply resorts to it but if this is not the case and the data is required in original database format, there is no easy way to do this in Excel. Having the data in database format may be useful, for instance, to invoke filtering or reengineer the table in a different format. There is a simple way recreate the original database using macros. A

Fig. 4. Filtering data in a Pivot Table.

3

Early Human Development xxx (xxxx) xxx–xxx

V. Grech

Fig. 6. Grouped data in a Pivot Table.

purposes of acknowledgment). The simplest way to run a macro is from a button on the Quick Access toolbar. Assigning a new button to a macro is a one-time step, as was the Form button in the first paper of this series: [2]

macro is an automated task using software that can be activated when needed. The macro is hereunder and must be copied into a placeholder, a so-called module. The instructions at the end of the macro describe how to copy the macro into a module.

• Click down arrow at the right end of the Quick Access Toolbar. • Choose More Commands. • This opens the Customize the Quick Access Toolbar dialog box. • Instead of Popular Commands select Macros. • Scroll down the alphabetical list to ReversePivotTable. • Click on Add button then click OK.

Sub ReversePivotTable() ' Before running this, make sure you have a summary table with column headers. ' The output table will have three columns. Dim SummaryTable As Range, OutputRange As Range Dim OutRow As Long Dim r As Long, c As Long On Error Resume Next Range("A1").Select Set SummaryTable = ActiveCell.CurrentRegion If SummaryTable.Count = 1 Or SummaryTable.Rows.Count < 3 Then MsgBox "Select a cell within the summary table.", vbCritical Exit Sub End If SummaryTable.Select Set OutputRange = Application.InputBox(prompt:="Select a cell for the 3-column output", Type:=8) ' Convert the range OutRow = 2 Application.ScreenUpdating = False OutputRange.Range("A1:C3") = Array("Column1", "Column2", "Column3") For r = 2 To SummaryTable.Rows.Count For c = 2 To SummaryTable.Columns.Count OutputRange.Cells(OutRow, 1) = SummaryTable.Cells(r, 1) OutputRange.Cells(OutRow, 2) = SummaryTable.Cells(1, c) OutputRange.Cells(OutRow, 3) = SummaryTable.Cells(r, c) OutputRange.Cells(OutRow, 3).NumberFormat = SummaryTable.Cells(r, c).NumberFormat OutRow = OutRow + 1 Next c Next r End Sub

The Form button will now be in the ReversePivotTable macro button. Analogous steps were graphically described in the first paper of this series [2]. To run the macro, click on any cell in the two-dimensional table then click on the macro button. A popup window will ask where the three columns of converted data will be placed. The plotting of data into graphs should be the next and almost mandatory step in all data analysis as part of a descriptive statistics exercise, since it familiarises the researcher with the shape and nature of the data under consideration. Moreover, outlier values may be identified. These may be incorrect data, or true outliers, from which important findings may arise. The next paper in the WASP series will deal with data plotting and provides some pointers for doing so in Microsoft Excel [3]. Acknowledgments The inspiration for this series of papers arises from Thomas Douglas Victor Swinscow's original series of papers in the 1970s entitled “Statistics at Square One” [4] as well as the Excel-based statistics talks prepared for the international Write a Scientific Paper course (WASP – http://www.ithams.com/wasp) [5,6]. I would also like to thank Dr. Neville Calleja (Director at Department of Health Information & Research - Department of Health Information & Research, Ministry of Health, the Elderly and Community Care) for reviewing these manuscripts. Conflict of interest statement

Open the relevant sheet that contains the two-dimensional table. Press Alt-F11 to open what is known as the Visual Basic code editor. The left pane has a so-called Project box, with a tree structure representing the Excel objects in the workbook and any extant Visual Basic code (modules). Right click anywhere in the Project ox and select Insert - Module to create a blank module file. Copy and paste the above code into the newly created blank page. The macro is called ReversePivotTable. Save it and close the Visual Basic editor window (the macro was found online and the original author could not be identified for the

There are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. References [1] Centers for Disease Control and Prevention. National Center for Health Statistics. CDC Wonder. http://wonder.cdc.gov/natality.html (Accessed December 2017). [2] V. Grech, WASP (write a scientific paper) using Excel – 1: data entry and validation, (2018) Previous paper in series.

4

Early Human Development xxx (xxxx) xxx–xxx

V. Grech

[3] V. Grech, WASP (write a scientific paper) using Excel – 3: plotting data, (2018) Next paper in series. [4] T. Swinscow, Statistics at square one, Br. Med. J. 1 (6020) (1976) 1240. [5] V. Grech, WASP – write a scientific paper course: why and how, J. Vis. Commun. Med. 40 (3) (Jul 2017) 130–134.

[6] V. Grech, S. Cuschieri, Write a Scientific Paper (WASP) - a career-critical skill, (2018), http://dx.doi.org/10.1016/j.earlhumdev.2018.01.001.

5