CHAPTER 18.1
An Introduction to Data Visualizations

INTRODUCTION TO DATA VISUALIZATIONS: OVERVIEW
Data is necessary for business. When data is put into a visualization, it can be used to tell a story. A story is a sequence of events that can show the past, present, and future. Using visualizations to tell a story about data can reveal patterns, trends, and relationships and focus attention on what is important. A visualization can also enable discovery of new information by making it easier to understand the data and the different dimensions hidden in the data. Visualizations are changing the way we tell a story about the data to provide better information, knowledge, and insights.

Example
A hotel collects a large amount of data through guest surveys. By combining guest survey data with review data from the Internet, a visualization is created to better understand where hotel improvements are needed. By observing trends or patterns in all the data collected from the hotel guest surveys over time, additional insights about the hotel operations may be gained. Using visualizations allows the hotel manager to explore the data and to show insights across all the data that tell a story about where the hotel may have strategic opportunities before business is lost to competitors.
Fig. 18.1.1 shows where a visualization can help along the path to gaining deeper insights into the data for decisions and actions that provide business value. Data can take opinions and turn them into facts. Fact-based decision-making requires not only good data but also the ability to turn the data into useful information and knowledge through analysis. Many decision-makers are not in the role of analyzing or working with data; they need data to be presented in a format that lets them make decisions. Visualizations allow data to be explored easily or to be presented in a way that makes them easier to understand so that decisions can be made. When visualizations are created correctly, good decisions can be made that provide business value.

FIG. 18.1.1 Using visualizations for data to insights.
Data (necessary for the business): content; facts; digital format; structured and unstructured.
Information (organized for a purpose): context; meaningful patterns and trends; data and business relationships; assumptions.
Knowledge (awareness): evaluation; abstract reasoning; problem solving; decision making.
Insights (accurate understanding): value; future growth; strategic opportunities; positive disruption.

Data Architecture. https://doi.org/10.1016/B978-0-12-816916-2.00052-8 © 2019 Elsevier Inc. All rights reserved.
PURPOSE AND CONTEXT
A good visualization is one that is understood by the audience and meets the purpose for which it was created. The purpose may be to show analytic results to executive management, to show social media trends to educate the public, or to show findings in the data to improve business performance. If the purpose is to improve business performance, a dashboard can be created through the use of visualizations. The outcome of a good visualization will be improved collaboration, decisions, and actionable insights. Visualizations can be used to explore the data or to tell a story. They can be simple or more complex depending on the story to be told and the data available. Selecting the appropriate forms for the story is important to make it easy for the audience to understand. It is also important to add details such as labels to the visualization to ensure that it is interpreted correctly by the audience and that the proper context is understood. Fig. 18.1.2 is an example of a bad visualization due to missing titles. Although this visualization includes a scale, it is not clear what is being compared, so it can be misinterpreted or can mislead.
FIG. 18.1.2 Bad visualization showing missing titles. (Untitled chart showing TransU 26.1%, Experian 40.4%, and Equifax 33.5%.)

VISUALIZATION: A SCIENCE AND AN ART
Telling a story about the data is both a science and an art. Selecting the right colors in the visualization can have an impact on how the story is interpreted. Certain color pairs, such as red and green, should be avoided as the only indicators because readers who are color-blind may not be able to tell them apart. Both shapes and colors should be used. Colors may relate directly to a business, or certain colors can trigger different emotions in a visualization. Blues can have a calming effect and greens can convey a feeling of safety, whereas reds can signal danger or energy and purple can evoke feelings of power or luxury. Shapes, colors, color hues, and appropriate font sizes should be considered so that the story is presented in a way the audience can easily understand in the right context. Too much information should also be avoided so that the story presented in the visualization is clear and not cluttered; Fig. 18.1.3 shows a cluttered example.
VISUALIZATION FRAMEWORK
A framework or methodology should be used to create a visualization that is interpreted in a way that brings value to the audience. Too often, a developer has a story in mind, but without a clear methodology, the solution is not interpreted in a meaningful way or with the right context. Poor decisions can be made if the context is not clear or if the purpose for the visualization was not well defined in the beginning. Fig. 18.1.4 shows a framework that is easy and effective to use when creating a visualization.
FIG. 18.1.3 Bad visualization example showing clutter. (Chart of percentages by location crowded with about 30 state and province labels.)

FIG. 18.1.4 Visualization framework. (Step 1: Define; Step 2: Data; Step 3: Design; Step 4: Distribute.)

STEP 1: DEFINE
The first step in creating a good visualization is to define what problem needs to be better understood through analyzing and presenting the data in a visualization solution. This step involves understanding the purpose for the visualization and who will have access to view or interact with the visualization when it is complete. Different roles in the organization may understand or use the results in different ways. Table 18.1.1 shows some examples of different roles to consider before creating a visualization to understand the business need or purpose.

Table 18.1.1 Example Visualization Audience Roles and Actions
Role | Purpose or Action
Executive | Corporate strategy decisions
Chief data officer (CDO) | Influence corporate strategy and define data management strategy
Business manager | Understand performance
Data analyst | Immediate response
Customers or prospects | Inform or educate

The define step also considers the purpose for the visualization to meet the needs of the audience. Will the visualization be used to inform or educate the audience, or will it be used to influence a decision? Is there an immediate problem to be solved, or is the purpose to explore the data to provide more insights for strategic decisions? To answer these questions, it will be important for the visualization designer to meet with the audience at the beginning to understand the business need or purpose.

Example
Hotel staff tell management that new mattresses are needed because of guest feedback. The hotel manager needs to decide where to invest in hotel improvements. The visualization designer meets with the hotel manager to understand the business problem. Data are collected from guest surveys and online comments, and a visualization is designed to look at guest sentiment over time. The visualization is used to explore the data and to determine whether bed comfort or mattress complaints are a problem compared with other issues. The visualization designer presents the story by showing different visualizations with a recommendation to the hotel manager, who can see clearly where funding is needed to improve guest feedback and reduce loss of business.
STEP 2: DATA
The second step in creating a good visualization is to understand the data to be used for the visualization. The data chosen should be relevant to the purpose defined in the first step. It is also important to understand what type of data is available, how much data are available, and whether the data available can tell the right story through a visualization.
Types of Data
When it comes to visualizations, data can be categorized into different types. The most common grouping is structured versus unstructured. When data are put into a workable format, such as a table with rows and columns or a database, they are considered structured. Unstructured data are data that do not fit into a standard workable format, such as text or comments. When working with unstructured data to create a visualization, additional work may be needed first to put the data into a workable format.
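As a minimal sketch of that additional work, the following Python turns free-text guest comments into rows with one column per topic. The comments, the topic keywords, and the function name are hypothetical and chosen only for illustration; real text analysis usually needs much more than keyword matching.

```python
# Hypothetical guest comments (unstructured text); not real survey data.
comments = [
    "The mattress was too soft and the bed squeaked.",
    "Great breakfast, friendly staff.",
    "Room was noisy, but the bed was comfortable.",
]

# Assumed topic keywords; a real project would refine these with the business.
TOPIC_KEYWORDS = {
    "bed": ("bed", "mattress"),
    "food": ("breakfast", "dinner"),
    "noise": ("noisy", "noise"),
}


def to_rows(texts):
    """Return one dict per comment with a 0/1 flag for each topic."""
    rows = []
    for comment_id, text in enumerate(texts, start=1):
        lowered = text.lower()
        row = {"comment_id": comment_id}
        for topic, keywords in TOPIC_KEYWORDS.items():
            row[topic] = int(any(word in lowered for word in keywords))
        rows.append(row)
    return rows


rows = to_rows(comments)
for row in rows:
    print(row)
```

Each resulting row has the same columns, so the comments can be counted, filtered, and charted like any other structured table.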
Data Sources
Too often, companies are data-rich but information-poor. This is usually the case when there are a lot of data, but they reside in many different places and don’t integrate well enough to be useful. For example, data may be in a spreadsheet, text file, or database. To create a visualization, data can be gathered from many different sources, but it’s important to understand how the different data sets may be related. Not all the data gathered will be used or be important; that can be determined when creating the visualization. Data sources may be internal to the company or external, such as publicly available data. Depending on the visualization software used, additional data such as maps might be provided to enrich the visualization. An example of using public review data from the Internet and combining it with maps using Qlik Sense¹ is shown in Fig. 18.1.5. This example shows higher volumes of data by location on a map using bubble size. Other examples of data sources include the following:
• Operational applications
• Cloud systems
• Files (such as Excel and comma-separated values (CSV))
• Time-tracking systems
• Scanning systems
• E-mails
• Customer call centers
• Surveys
• Internet

FIG. 18.1.5 Data visualized using Qlik Sense map feature. (Map titled "Credit Bureau reviews by consumer location".)

¹ Qlik Sense is an analytic tool that can be used for creating visualizations. https://www.qlik.com/us/products/qlik-sense
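One way to see how data sets from different sources are related is to join them on a shared key. The sketch below assumes a hypothetical `hotel_id` field that appears in both an internal survey source and an external review source; the records and field names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical internal source: guest survey scores keyed by hotel_id.
survey_rows = [
    {"hotel_id": "H1", "survey_score": 4.2},
    {"hotel_id": "H2", "survey_score": 3.1},
]

# Hypothetical external source: online reviews, several per hotel.
review_rows = [
    {"hotel_id": "H1", "stars": 5},
    {"hotel_id": "H1", "stars": 3},
    {"hotel_id": "H3", "stars": 2},
]


def combine(surveys, reviews, key="hotel_id"):
    """Join the two sources on a shared key and report unmatched keys."""
    stars_by_key = defaultdict(list)
    for review in reviews:
        stars_by_key[review[key]].append(review["stars"])

    combined = []
    for survey in surveys:
        stars = stars_by_key.get(survey[key], [])
        combined.append({
            key: survey[key],
            "survey_score": survey["survey_score"],
            "review_count": len(stars),
            "avg_stars": sum(stars) / len(stars) if stars else None,
        })
    survey_keys = {survey[key] for survey in surveys}
    unmatched = sorted(set(stars_by_key) - survey_keys)
    return combined, unmatched


combined, unmatched = combine(survey_rows, review_rows)
print(combined)
print("review keys with no survey data:", unmatched)
```

Reporting the unmatched keys makes it visible when two sources do not line up, which is worth knowing before any chart is drawn from the combined data.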
Data Organization
The data must be organized to create a visualization. This means data must be put into a workable format. Most tools for creating a visualization provide detailed information on how to manage data in the application or how to connect different data sources. Best practice is to organize the data into a row and column (table) format. Each value in a column should use the same unit of measure. For example, Table 18.1.2 shows airline flight data in a row and column format with a consistent unit of measure. When dealing with time data, the time format must also be consistent. For example, dates should be in a single format such as MMDDYYYY to be visualized correctly.

Table 18.1.2 Airline Flight Data Organized in a Row and Column Format
Year | Airline | Domestic Flights | International Flights | Total Flights
2017 | Southwest | 1,313,573 | 34,308 | 1,347,881
2017 | American Airlines | 886,803 | 193,145 | 1,079,948
2017 | Delta | 917,231 | 144,295 | 1,061,526
2017 | United | 580,293 | 167,578 | 747,871
2017 | JetBlue | 291,995 | 62,369 | 354,364
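The organization rules above can be checked in a few lines of Python. This sketch holds the rows of Table 18.1.2 as tuples, confirms that each total equals domestic plus international flights, and normalizes dates to MMDDYYYY. The list of accepted input formats and the sample date strings are assumptions made for illustration.

```python
from datetime import datetime

# Rows from Table 18.1.2: (year, airline, domestic, international, total).
flights = [
    (2017, "Southwest", 1_313_573, 34_308, 1_347_881),
    (2017, "American Airlines", 886_803, 193_145, 1_079_948),
    (2017, "Delta", 917_231, 144_295, 1_061_526),
    (2017, "United", 580_293, 167_578, 747_871),
    (2017, "JetBlue", 291_995, 62_369, 354_364),
]

# Every count uses the same unit (number of flights), so totals must add up.
totals_ok = all(dom + intl == total for _, _, dom, intl, total in flights)

# Assumed input formats for illustration; extend the tuple as needed.
KNOWN_FORMATS = ("%m%d%Y", "%Y-%m-%d", "%m/%d/%Y")


def to_mmddyyyy(text):
    """Parse a date written in any known format and return it as MMDDYYYY."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m%d%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


# Hypothetical mixed-format dates normalized to a single format.
normalized = [to_mmddyyyy(d) for d in ("03152017", "2017-03-15", "3/15/2017")]
print(totals_ok, normalized)
```

Raising an error for an unrecognized date, rather than guessing, keeps inconsistent values from silently reaching the chart.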
Depending on the story to tell, skills and knowledge in statistics may be needed. More complicated visualizations can use calculations to show the results of the analysis. Although visualizations can tell a story using good data, they can also be used to distort reality by presenting the data in different ways. When using line or bar charts, take care not to distort the data by truncating the bottom of the chart, which makes differences between the data points appear larger than they are. Also take care with scaled marks, such as bubbles of different sizes, to ensure they are drawn at the correct scale for comparisons.
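The effect of a truncated axis can be worked out with simple arithmetic. The sketch below uses the American Airlines and Delta totals from Table 18.1.2 and compares the ratio of drawn bar heights when the axis starts at 0 and when it starts at 1,000,000. The 1,000,000 baseline, the function names, and the square-root bubble scaling are illustrative assumptions, not rules taken from any particular tool.

```python
import math


def apparent_ratio(larger, smaller, baseline=0):
    """Ratio of drawn bar heights when the axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)


# Total flights from Table 18.1.2.
american, delta = 1_079_948, 1_061_526

honest = apparent_ratio(american, delta)                # axis starts at 0
truncated = apparent_ratio(american, delta, 1_000_000)  # axis starts at 1,000,000


def bubble_radius(value, max_value, max_radius=40.0):
    """Scale radius by the square root so bubble AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


# A value four times larger gets twice the radius, hence four times the area.
r_small = bubble_radius(25, 100)
r_large = bubble_radius(100, 100)

print(f"zero baseline ratio: {honest:.3f}")
print(f"truncated baseline ratio: {truncated:.3f}")
print(f"radius ratio for a 4x value: {r_large / r_small:.1f}")
```

With the truncated axis, the American Airlines bar is drawn roughly 30% taller than the Delta bar even though American Airlines had less than 2% more flights. Scaling bubble radius directly by the value would exaggerate differences in a similar way, because readers tend to compare areas.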
Data Quality
Data quality is important for a good visualization. Good data are complete, clean, valid, and neither questionable nor conflicting. Quality data can lead to better decisions and better visualizations. There are different dimensions of data quality to be considered, including the following:
• Accuracy: correct values
• Completeness: no missing values
• Consistency: same units of measure or time format
• Integrity: data that are reliable
• Timeliness: data relevant to the time period
• Uniqueness: removal of duplicates
• Validity: data that are valid and not made up
• Accessibility: data that are accessible with permissible use

Data can be collected from many different places. Before designing a visualization, it’s important to understand the data that will be used. Data can be structured, such as a customer name and location, or unstructured, such as a customer comment or phone call transcribed to text. When collecting the data, it’s important to understand how different data sets are related. For example, if structured customer data and unstructured customer comments are going to be used, then how are they related? What will be communicated through a visualization and what kind of story will be told? Once these questions are answered, the right type of visualization can be chosen.
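Some of the quality dimensions listed above can be checked automatically before a chart is built. The following sketch counts simple violations of completeness, uniqueness, accuracy, and consistency. The records, the 1-to-5 rating scale, and the eight-digit date rule are hypothetical and only illustrate the idea; the other dimensions generally need business knowledge rather than a simple rule.

```python
from collections import Counter

# Hypothetical survey records with deliberate quality problems.
records = [
    {"id": 1, "hotel": "H1", "rating": 4, "date": "03152017"},
    {"id": 2, "hotel": "H1", "rating": None, "date": "03162017"},  # missing value
    {"id": 2, "hotel": "H1", "rating": None, "date": "03162017"},  # duplicate
    {"id": 3, "hotel": "H2", "rating": 9, "date": "2017-03-17"},   # bad rating, date
]


def quality_report(rows):
    """Count simple violations for four of the dimensions listed above."""
    id_counts = Counter(row["id"] for row in rows)
    return {
        # Completeness: ratings that are missing.
        "completeness": sum(1 for r in rows if r["rating"] is None),
        # Uniqueness: extra copies of an id that already appeared.
        "uniqueness": sum(n - 1 for n in id_counts.values() if n > 1),
        # Accuracy: ratings outside the assumed 1-to-5 scale.
        "accuracy": sum(
            1 for r in rows
            if r["rating"] is not None and not 1 <= r["rating"] <= 5
        ),
        # Consistency: dates that are not eight digits (assumed MMDDYYYY).
        "consistency": sum(
            1 for r in rows
            if not (len(r["date"]) == 8 and r["date"].isdigit())
        ),
    }


report = quality_report(records)
print(report)
```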
STEP 3: DESIGN
The concept of using a visualization to represent data has been around for hundreds of years. Today, with the advancements in technology and business intelligence (BI) capabilities, there are many tools available to help create a visualization. Technology has made it possible to process large amounts of data quickly. Technology may continue to advance the capabilities for creating a visualization, perhaps through audio describing what a user wants to see or through machine learning. No matter where the creation of visualizations is heading, there are fundamentals that are important to understand. When it comes to design, the most important fundamental is to ensure that the context of the visualization is understood by the user. Before starting the design step, it’s important to have followed the methodology and completed the define and data steps. Choosing the appropriate chart requires an understanding of the data properties and the purpose for the visualization.
Forms of Visualizations
When the business need or problem is understood and the data have been gathered, the visualization can be designed. There are many different forms of visualizations that can be used depending on the data, but choosing the right visualization to improve the user experience in telling the story is important. All visualizations should include not only the visual that represents the data but also additional information such as labels and text so the audience can understand the content and the context. Table 18.1.3 shows some basic forms of visualizations that can be used. Some of these charts can be enhanced; for example, a time element can be used for a bubble chart to show changes over time. Examples of some common basic charts are discussed in the following sections. However, there are many other forms of visualizations that should be reviewed before designing a visualization.
Table 18.1.3 Forms of Visualizations
Visualization Form | Number of Categories | Number of Numerical Variables | Purpose | Audience Ease of Interpretation | Example
Number chart | - | 1 | Display | Easy | Average rating or score
Pie chart | 1 | 1 | Proportion comparison | Easy | % of negative sentiment by company
Bar chart (basic) | 1 | 1 | Showing exact values | Easy | Top consumer complaints about Equifax in a given period of time
Bar chart (grouped side by side) | Multiple | 1 or 2 | Compare categories | Easy | Compare hotels grouped by hotel ratings
Bar chart (stacked) | Multiple | 1 | Compare categories | Easy | Compare hotels by online customer review sentiment
Line (single) | 1 | 1 + date variable | Trends over time | Easy | Sales over time
Line (multiple) | Multiple | 1 + date variable | Compare multiple categories over time | Difficult | Consumer sentiment over time for each credit bureau
Maps | Multiple | Multiple | Comparing variables and geospatial analytics | Difficult | Location and volume of customer complaints
Scatter chart | 0 or 1 | 2 | Relationships and correlations between numerical values | Difficult | Relationship between cancer rates and country
Bubble chart | 0 or 1 | 3 | Relationships and correlations between numerical values | Difficult | Comparing airlines by assets, revenue, and profit
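The selection logic in Table 18.1.3 can be sketched as a small helper that suggests a chart form from the number of categories and numerical variables. This is a simplified reading of the table, not a complete rule set; the function name and parameters are hypothetical, and treating the number chart as having no category is an assumption.

```python
def suggest_chart(categories, numeric_vars, over_time=False, geographic=False):
    """Suggest a chart form from counts of categories and numerical variables.

    A simplified reading of Table 18.1.3, not a complete rule set.
    """
    if geographic:
        return "map"
    if over_time:
        return "line (single)" if categories <= 1 else "line (multiple)"
    if numeric_vars >= 3:
        return "bubble chart"
    if numeric_vars == 2 and categories <= 1:
        return "scatter chart"
    if categories == 0:
        return "number chart"
    if categories == 1:
        return "pie chart or basic bar chart"
    return "grouped or stacked bar chart"


# Example: sentiment over time for three credit bureaus.
print(suggest_chart(categories=3, numeric_vars=1, over_time=True))
```

A helper like this only narrows the options; the purpose and the audience defined in the first step still decide the final form.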
Number Charts
The most common visualization is a simple number chart. A number chart as shown in Fig. 18.1.6 is a good visual for a dashboard to easily communicate any total such as a count, a percentage, an average, or a dollar amount. Trend indicators can also be used in a number chart but should represent the same period of time (such as annual, quarterly, monthly, or daily).
FIG. 18.1.6 Number chart.
Pie Charts
Pie charts have been around for hundreds of years to show part-to-whole relationships at a static point in time (such as a slice of the pie vs. the whole pie). Pie charts are a simple way to visualize simple comparisons for a single category; however, they do not work well for comparing the size of segments across multiple pie charts. A pie chart splits a population of data for a single category into segments, and the total of all the segments equals 100%. If there are too many segments, pie charts do not work well because it becomes difficult to label the segments or to show the differences in proportions. Also, a pie chart can take a lot of space on a dashboard or report. Fig. 18.1.7 shows an example of a pie chart where the category is ratings for a hotel. Ratings are segmented 1 through 5, and the pie chart shows the percentage of each segment.

FIG. 18.1.7 Pie chart. (Embassy Suites Galleria Hotel Ratings, 5 = high, 1 = low; five segments of 14.6%, 16.7%, 20.8%, 22.9%, and 25.0%.)
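The rule that all segments of a pie chart total 100% follows directly from how the percentages are computed. The sketch below uses hypothetical review counts per rating; they are invented for illustration and are not the data behind Fig. 18.1.7.

```python
# Hypothetical counts of reviews per rating (1 = low, 5 = high).
rating_counts = {1: 8, 2: 7, 3: 11, 4: 10, 5: 12}


def segment_percentages(counts):
    """Return each segment's share of the total as a percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a pie chart needs at least one observation")
    return {label: 100.0 * n / total for label, n in counts.items()}


shares = segment_percentages(rating_counts)
for rating, pct in sorted(shares.items()):
    print(f"rating {rating}: {pct:.1f}%")
```

Because every share is divided by the same total, the segments always add up to the whole, apart from rounding in the displayed labels.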
Bar Chart
A bar chart is used to compare or rank values across one or more categories. There are different types of bar charts, and choosing the best one will depend on the data available. A simple bar chart is easy to interpret and can be used to show totals or trends for a single category. Fig. 18.1.8 shows an example of a simple bar chart.

FIG. 18.1.8 Bar chart. (Airline flights in 2017: All airports. Bars for Southwest, American Airlines, Delta, United, and JetBlue on an axis from 0 to 1,600,000. Data source: Bureau of Transportation Statistics T-100 Segment data.)
Stacked Bar Chart
A stacked bar chart can be used to show totals for a single category or to compare categories when there is more than one. For example, Fig. 18.1.9 uses public data to show the number of scheduled flights in 2017 for US airlines, split into domestic and international, in one stacked bar chart. Stacked bar charts are great for showing survey responses or any type of data that has multiple categories.

FIG. 18.1.9 Stacked bar chart example. (Number of scheduled flights in 2017 by airline. Stacked bars for American Airlines, Delta, JetBlue, Southwest, and United, split into sum of domestic and sum of international, on an axis from 0 to 1,600,000. Data source: Bureau of Transportation Statistics T-100 Segment data.)
Horizontal Bar Chart
A horizontal bar chart works well if the category labels are long. Although the data presented are similar to those in a simple or stacked bar chart, a horizontal bar chart may be selected to better display the labels or to fit the space where it will be displayed. A horizontal bar chart may be chosen over the other types of bar charts to better tell the story with the data available (Fig. 18.1.10).

FIG. 18.1.10 Horizontal bar chart. (Number of scheduled domestic and international flights by airline. Horizontal bars for United, Southwest, JetBlue, Delta, and American Airlines on an axis from 0 to 4,500K, with a legend for 2015, 2016, and 2017. Data source: Bureau of Transportation Statistics T-100 Segment data.)
Line Chart
Another basic form of visualizing data is a line chart. Line charts require time data in consistent intervals. Fig. 18.1.11 shows an example of a multiple line chart where multiple categories are plotted over time. The variable being plotted is customer sentiment for three different companies. This type of chart is not well suited to a static visualization, such as a PowerPoint presentation, because it can be too cluttered. However, with a visualization tool such as Qlik Sense, the audience can interact with the chart and select a custom time range to drill down and see more details. This chart combined with others in an interactive visualization can be very powerful for exploring the data to tell a story.

FIG. 18.1.11 Line chart showing multiple categories. (Customer sentiment trend. Sentiment plotted by review date from 2006 onward, with one line each for TransU, Experian, and Equifax.)
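The requirement for consistent time intervals, and the idea of selecting a custom time range, can be sketched as follows. The review records and sentiment scores are hypothetical, and filling a year that has no reviews with 0 is a design choice made for this illustration.

```python
from collections import defaultdict

# Hypothetical review records: (year, company, sentiment score).
reviews = [
    (2014, "Equifax", 1), (2014, "Equifax", -1), (2014, "Experian", 1),
    (2015, "Equifax", -1), (2015, "TransU", 1), (2015, "TransU", 1),
]


def build_series(records, start, end):
    """Sum sentiment per company for every year in [start, end].

    Years with no reviews are filled with 0 so each line has the same,
    evenly spaced x-values.
    """
    sums = defaultdict(int)
    companies = set()
    for year, company, score in records:
        companies.add(company)
        if start <= year <= end:
            sums[(company, year)] += score
    years = list(range(start, end + 1))
    return {c: [sums[(c, y)] for y in years] for c in sorted(companies)}


series = build_series(reviews, 2014, 2015)
print(series)
```

Calling `build_series` with a narrower range, such as 2015 to 2015, mimics the drill-down an interactive tool offers when the user selects a custom time range.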
Bubble Chart
Another type of chart for comparing different variables is a bubble chart or scatterplot. A bubble chart is a good way to show three dimensions of the data in one view, but it is more complicated and requires more skill to create. Different colors or bubble sizes can be used to show a lot of information in a single chart. A bubble chart looks at data at a snapshot in time. However, by plotting snapshots of the data over different periods of time, the chart can be animated to show changes in the data in an interesting form.
STEP 4: DISTRIBUTE
A data visualization is a way to tell a story through a graphic representation of the data and a way to share the story among both technical and nontechnical people. The last step, when the visualization is complete, is to distribute it. There are many ways a visualization can be shared or distributed. It’s important to consider this step before you design your visualization, as the purpose will define how it should be distributed. Is the purpose for the visualization to inform or to allow data discovery? Will your audience only view the visualization or interact with it to discover insights?
Purpose: To Inform or Educate
To inform or educate the audience, the story should unfold by showing the data and visualizations in an order that tells the story. For example, if data are collected to understand customer sentiment about their hotel stay, a visualization can be created to show the customer sentiment over time and put into a story format. Consider using data from the past, present, and predicted future to tell stories for the best outcome or decisions. Visualizations can be shared or distributed to inform or educate the audience in different ways that may include the following.
PowerPoint: Visualization charts may be copied and pasted to a PowerPoint slide with additional details added to highlight what is being told.
Dashboard: A dashboard is a collection of visualizations aligned to the business goals to be used as a management report. A dashboard can provide an at-a-glance view of key performance indicators (KPIs) and measures to drive action for improvements.
Infographic: An infographic is similar to a visualization as it is a visual representation of the data. However, an infographic may contain more images, pictures, and words that are conceptual in addition to data in a visual form. Infographics are usually very focused and used to captivate an audience. They can be a single page or multiple pages in length. Infographics can be a great tool to use for marketing campaigns or to summarize a study that includes visualizations.
Purpose: To Interact or Explore
If the purpose for the visualization is to explore the data, then an interactive visualization can be valuable. How an interactive visualization is distributed will depend on the software used. Most visualization tools have the capability to publish the visualization to the Internet (cloud) so the user can interact with and explore the data. With defined user permissions, the user can change different variables, and all charts in the story update. Interactive visualizations are great for letting the user ask “what if” questions and see the outcome visually.
DATA VISUALIZATION TOOLS AND SOFTWARE
The practice of creating visualizations is rapidly growing just as machine learning, digital facial recognition, unstructured data analytics, and data science are growing. There are many smart and user-friendly tools available for creating visualizations. Selecting the appropriate tool will depend on many factors, including the knowledge, skills, and abilities of the visualization producer. Some features to consider when selecting a tool include the following:
• Ease of use
• Drag and drop capability
• Ability to connect to multiple data sources
• Ability to manage the data
• Open and standard APIs
• User-friendly development environment
• Ability to share and collaborate
• Interactive capability
• Features that are up to date
• Scalable
• Manageable security
• Nice-looking visual results to fit the purpose

Here are some of the leading tools on the market today for creating visualizations without requiring detailed programming skills:
• Qlik
• Tableau
• Microsoft Power BI
• Sisense
SUMMARY
There is great value in the process of creating and telling a story through visualizations. The visualization framework is an effective methodology for ensuring that visualizations are created with the right content and can be understood in the right context. The process of defining the purpose and talking with the audience, collecting the data, designing the visualization in a story format, and distributing the visualization makes the data easier to understand so the audience can focus on what is important. Using visualizations to tell a story through data is a great way to provide better information, knowledge, and insights. Telling a story through visualizations will continue to be necessary to enable data to be better understood for more accurate outcomes and decisions.