An Introduction to Data Visualizations


CHAPTER 18.1

INTRODUCTION TO DATA VISUALIZATIONS: OVERVIEW

Data are necessary for business. When data are put into a visualization, they can be used to tell a story. A story is a sequence of events that can show the past, present, and future. Using visualizations to tell a story about data can reveal patterns, trends, and relationships and focus attention on what is important. A visualization can also enable the discovery of new information by making it easier to understand the data and the different dimensions hidden in the data. Visualizations are changing the way we tell a story about data to provide better information, knowledge, and insights.

Example

A hotel collects a large amount of data through guest surveys. By combining guest survey data with review data from the Internet, a visualization is created to better understand where hotel improvements are needed. By observing trends or patterns in all the data collected from the hotel guest surveys over time, additional insights about the hotel operations may be gained. Using visualizations allows the hotel manager to explore the data and to show insights across all of the data that tell a story about where the hotel may have strategic opportunities before business is lost to competitors.

Fig. 18.1.1 shows where a visualization can help along the path to gaining deeper insights from the data for decisions and actions that provide business value. Data can take opinions and turn them into facts. Fact-based decision-making requires not only good data but also the ability to turn the data into useful information and knowledge through analysis. Many decision-makers do not analyze or work with data themselves; they need data to be presented in a format that lets them make decisions. Visualizations allow data to be explored easily or to be presented in a way where they can be better understood for decisions to be made. When visualizations are created correctly, good decisions can be made that provide business value.

Data (necessary for the business): content, facts, digital format, structured & unstructured
Information (organized for a purpose): context, meaningful patterns & trends, data and business relationships, assumptions
Knowledge (awareness): evaluation, abstract reasoning, problem solving, decision making
Insights (accurate understanding): value, future growth, strategic opportunities, positive disruption

FIG. 18.1.1 Using visualizations for data to insights.

Data Architecture. https://doi.org/10.1016/B978-0-12-816916-2.00052-8 © 2019 Elsevier Inc. All rights reserved.

PURPOSE AND CONTEXT

A good visualization is one that is understood by the audience and meets the purpose for which it was created. The purpose may be to show analytic results to executive management, to show social media trends to educate the public, or to show findings in the data to improve business performance. If the purpose is to improve business performance, a dashboard can be created from a set of visualizations. The outcome of a good visualization is improved collaboration, better decisions, and actionable insights. Visualizations can be used to explore the data or to tell a story. They can be simple or more complex depending on the story to be told and the data available. Selecting the appropriate forms for the story is important so that the audience can understand it easily. It is also important to add details such as labels to the visualization to ensure that the audience interprets it correctly and understands the proper context. Fig. 18.1.2 is an example of a bad visualization due to missing titles. Although this visualization includes a scale, it is not clear what is being compared, so it can be misinterpreted or be misleading.

FIG. 18.1.2 Bad visualization showing missing titles. [Untitled pie chart: Experian 40.4%, Equifax 33.5%, TransU 26.1%.]

VISUALIZATION: A SCIENCE AND AN ART

Telling a story about the data is both a science and an art. The colors selected for a visualization can affect how the story is interpreted. Certain color pairs, such as red and green, should be avoided as the only indicators because readers who are color-blind may not be able to tell them apart. Both shapes and colors should be used. Colors may relate directly to a business, or certain colors can trigger different emotions in a visualization. Blues can have a calming effect and greens can convey a feeling of safety, whereas reds can suggest danger or energy and purple can suggest power or luxury. Shapes, colors, color hues, and appropriate font sizes should all be considered so that the story is presented in a way the audience can easily understand in the right context. Too much information should also be avoided so that the story presented in the visualization is clear and not cluttered, unlike the example in Fig. 18.1.3.

VISUALIZATION FRAMEWORK

A framework or methodology should be used to create a visualization that is interpreted in a way that brings value to the audience. Too often, a developer has a story in mind, but without a clear methodology, the solution is not interpreted in a meaningful way or with the right context. Poor decisions can be made if the context is not clear or if the purpose for the visualization was not well defined in the beginning. Fig. 18.1.4 shows a framework that is easy and effective to use when creating a visualization.

FIG. 18.1.3 Bad visualization example showing clutter. [Pie chart of shares by state or province with dozens of labeled slices, including California 14.7% and Florida 8.5%.]

FIG. 18.1.4 Visualization framework. [Step 1: Define, Step 2: Data, Step 3: Design, Step 4: Distribute.]

STEP 1: DEFINE

The first step to create a good visualization is to define what problem needs to be better understood through analyzing and presenting the data in a visualization solution. This step involves understanding the purpose for the visualization and who will have access to view or interact with the visualization when it is complete. Different roles in the organization may understand or use the results in different ways. Table 18.1.1 shows some examples of different roles to consider before creating a visualization to understand the business need or purpose.

The define step also considers the purpose for the visualization to meet the needs of the audience. Will the visualization be used to inform or educate the audience, or will it be used to influence a decision? Is there an immediate problem to be solved, or is the purpose to explore the data to provide more insights for strategic decisions? To answer these questions, it is important for the visualization designer to meet with the audience at the beginning to understand the business need or purpose.

Table 18.1.1 Example Visualization Audience Roles and Actions

Role | Purpose or Action
Executive | Corporate strategy decisions
Chief data officer (CDO) | Influence corporate strategy and define data management strategy
Business manager | Understand performance
Data analyst | Immediate response
Customers or prospects | Inform or educate

Example

Hotel staff tell management that new mattresses are needed because of guest feedback. The hotel manager needs to decide where to invest in hotel improvements. The visualization designer meets with the hotel manager to understand the business problem. Data are collected from guest surveys and online comments, and a visualization is designed to look at guest sentiment over time. The visualization is used to explore the data and to determine whether bed comfort or mattress complaints are a problem compared with other issues. The visualization designer presents the story by showing different visualizations with a recommendation to the hotel manager, who can see clearly where funding is needed to improve guest feedback and reduce loss of business.

STEP 2: DATA

The second step to create a good visualization is to understand the data to be used for the visualization. The data chosen should be relevant to the purpose defined in the first step. It is also important to understand what type of data is available, how much data is available, and whether the available data can tell the right story through a visualization.

Types of Data

When it comes to visualizations, data can be categorized into different types. The most common grouping is structured versus unstructured. When data are put into a workable format, such as a table with rows and columns or a database, they are considered structured. Unstructured data do not fit into a standard workable format and may include data such as text or comments. When working with unstructured data to create a visualization, additional work may be needed first to put the data into a workable format.
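As a minimal sketch of that extra work, the Python snippet below turns free-text guest comments into rows with one true/false column per topic. The topic names, the keyword lists, and the sample comments are all assumptions made up for illustration; a real project would more likely rely on a text analytics tool than on simple keyword matching.

```python
import re

# Hypothetical topic keywords; a real project would tune these to its own data.
TOPIC_KEYWORDS = {
    "bed": ["bed", "mattress", "pillow"],
    "cleanliness": ["clean", "dirty", "stain"],
    "staff": ["staff", "service", "front desk"],
}

def comments_to_rows(comments):
    """Turn free-text comments into rows with one column per topic flag."""
    rows = []
    for comment_id, text in enumerate(comments, start=1):
        lowered = text.lower()
        row = {"comment_id": comment_id, "text": text}
        for topic, keywords in TOPIC_KEYWORDS.items():
            # Flag the topic when any keyword appears as a whole word or phrase.
            row[topic] = any(
                re.search(r"\b" + re.escape(word) + r"\b", lowered)
                for word in keywords
            )
        rows.append(row)
    return rows

# Made-up comments, for illustration only.
sample_comments = [
    "The mattress was too soft and the pillow was flat.",
    "Front desk staff were friendly. Room was clean.",
]
rows = comments_to_rows(sample_comments)
```

Each resulting row has the same columns, so the comments can now be counted, filtered, or charted like any other structured table.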


Data Sources

Too often, companies are data-rich but information-poor. This is usually the case when there are a lot of data, but they reside in many different places and do not integrate well enough to be useful. For example, data may be in a spreadsheet, text file, or database. To create a visualization, data can be gathered from many different sources, but it is important to understand how the different data sets may be related. Not all of the gathered data will be used or prove important; that can be determined while creating the visualization. Data sources may be internal to the company or external, such as publicly available data. Depending on the visualization software used, additional data such as maps might be provided to enrich the visualization. An example of combining public review data from the Internet with maps using Qlik Sense [1] is shown in Fig. 18.1.5. This example shows higher volumes of data by location on a map using bubble size. Other examples of data sources include the following:

• Operational applications
• Cloud systems
• Files (such as Excel and comma-separated values (CSV))
• Time-tracking systems
• Scanning systems
• E-mails
• Customer call centers
• Surveys
• Internet

FIG. 18.1.5 Data visualized using Qlik Sense map feature. [Map titled "Credit Bureau reviews by consumer location".]

[1] Qlik Sense is an analytic tool that can be used for creating visualizations. https://www.qlik.com/us/products/qlik-sense
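To make the idea of related data sets concrete, here is a small Python sketch that combines two sources on a shared key. The field names (`hotel_id`, `survey_score`, `review_stars`) and all values are hypothetical; the point is only that the two sources must share some identifier before they can be shown together.

```python
from collections import defaultdict

# Made-up records; "hotel_id" is a hypothetical shared key.
survey_rows = [
    {"hotel_id": "H1", "survey_score": 4},
    {"hotel_id": "H1", "survey_score": 2},
    {"hotel_id": "H2", "survey_score": 5},
]
review_rows = [
    {"hotel_id": "H1", "review_stars": 3},
    {"hotel_id": "H3", "review_stars": 1},
]

def average_by_key(rows, key, value):
    """Group rows by a key column and average one numeric column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in grouped.items()}

def combine_sources(surveys, reviews):
    """Join two sources on hotel_id, keeping hotels found in either one."""
    survey_avg = average_by_key(surveys, "hotel_id", "survey_score")
    review_avg = average_by_key(reviews, "hotel_id", "review_stars")
    combined = []
    for hotel_id in sorted(set(survey_avg) | set(review_avg)):
        combined.append({
            "hotel_id": hotel_id,
            # None marks a hotel that is missing from that source.
            "avg_survey_score": survey_avg.get(hotel_id),
            "avg_review_stars": review_avg.get(hotel_id),
        })
    return combined

combined = combine_sources(survey_rows, review_rows)
```

Keeping hotels that appear in only one source, with `None` for the missing side, makes gaps between the sources visible instead of silently dropping them.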

Data Organization

The data must be organized to create a visualization. This means the data must be put into a workable format. Most tools for creating a visualization provide detailed information on how to manage data in the application or how to connect different data sources. Best practice is to organize the data into a row and column (table) format. All values in a column should use the same unit of measure. For example, Table 18.1.2 shows airline flight data in a row and column format with a consistent unit of measure. When dealing with time data, the time format must also be consistent. For example, dates should be in a single consistent format, such as MMDDYYYY, to be visualized correctly.

Table 18.1.2 Airline Flight Data Organized in a Row and Column Format

Year | Airline | Domestic Flights | International Flights | Total Flights
2017 | Southwest | 1,313,573 | 34,308 | 1,347,881
2017 | American Airlines | 886,803 | 193,145 | 1,079,948
2017 | Delta | 917,231 | 144,295 | 1,061,526
2017 | United | 580,293 | 167,578 | 747,871
2017 | JetBlue | 291,995 | 62,369 | 354,364
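The two organization rules above (a consistent table and a consistent date format) can be sketched in Python as follows. The flight figures come from Table 18.1.2; the list of accepted input date formats is an assumption chosen for illustration.

```python
from datetime import datetime

# Rows from Table 18.1.2: (year, airline, domestic, international, total).
FLIGHTS_2017 = [
    (2017, "Southwest", 1313573, 34308, 1347881),
    (2017, "American Airlines", 886803, 193145, 1079948),
    (2017, "Delta", 917231, 144295, 1061526),
    (2017, "United", 580293, 167578, 747871),
    (2017, "JetBlue", 291995, 62369, 354364),
]

def totals_are_consistent(rows):
    """Check that domestic + international equals the total in every row."""
    return all(dom + intl == total for _, _, dom, intl, total in rows)

# Input formats this sketch accepts; the list is an assumption.
KNOWN_DATE_FORMATS = ["%m%d%Y", "%Y-%m-%d", "%m/%d/%Y"]

def to_mmddyyyy(text):
    """Rewrite a date string into the MMDDYYYY format used in the text."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m%d%Y")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")
```

Rejecting unrecognized dates with an error, rather than guessing, keeps badly formatted values from being plotted in the wrong place on a time axis.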

Depending on the story to tell, skills and knowledge in statistics may be needed. More complicated visualizations can use calculations to show the results of the analysis. Although visualizations can tell a story using good data, they can also distort reality by presenting the data in misleading ways. When using line or bar charts, take care not to distort the data by truncating the bottom of the chart, which makes differences between the data points appear larger than they are. Also use caution with scales, such as bubbles of different sizes, to ensure they are drawn at the correct scale for comparisons.
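The effect of a truncated axis can be worked through with a short Python calculation. It uses the American Airlines and Delta totals from Table 18.1.2; the truncated starting point of 1,000,000 is an arbitrary value picked for illustration.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0):
    """Ratio of the drawn bar heights when the axis starts at axis_start."""
    if axis_start >= min(value_a, value_b):
        raise ValueError("axis_start must be below both values")
    return (value_a - axis_start) / (value_b - axis_start)

american, delta = 1079948, 1061526  # total flights from Table 18.1.2

# Axis starting at zero: drawn heights match the real values.
honest = drawn_height_ratio(american, delta)

# Axis starting at 1,000,000: only the part above the cut is drawn.
truncated = drawn_height_ratio(american, delta, 1000000)
```

With the axis at zero, the American Airlines bar is drawn about 1.02 times as tall as the Delta bar, which matches the real difference of under 2%. With the axis cut at 1,000,000, the same bar is drawn about 1.30 times as tall, so the gap looks roughly 30% wide.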

Data Quality

Data quality is important for a good visualization. Good data are complete, clean, valid, and neither questionable nor conflicting. Quality data can lead to better decisions and better visualizations. There are different dimensions of data quality to consider, including the following:

• Accuracy: correct values
• Completeness: no missing values
• Consistency: same unit of measure or time format
• Integrity: data that are reliable
• Timeliness: data relevant to the time period
• Uniqueness: removal of duplicates
• Validity: data that are valid and not made up
• Accessibility: data that are accessible with permissible use

Data can be collected from many different places. Before designing a visualization, it is important to understand the data that will be used. Data can be structured, such as a customer name and location, or unstructured, such as a customer comment or a phone call transcribed to text. When collecting the data, it is important to understand how different data sets are related. For example, if structured customer data and unstructured customer comments are going to be used, how are they related? What will be communicated through a visualization, and what kind of story will be told? Once these questions are answered, the right type of visualization can be chosen.
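Two of the data quality dimensions listed above, completeness and uniqueness, are simple enough to check mechanically. The Python sketch below shows one possible check; the field names and the sample survey rows are made up, and the other dimensions (such as accuracy or validity) generally need business knowledge that a script like this does not have.

```python
def quality_report(rows, required_fields, key_field):
    """Count missing values and duplicate keys in a list of dict rows."""
    missing = 0
    seen = set()
    duplicates = 0
    for row in rows:
        # Completeness: a required field is absent, None, or an empty string.
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing += 1
        # Uniqueness: the same key value appears more than once.
        key = row.get(key_field)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "rows": len(rows),
        "missing_values": missing,
        "duplicate_keys": duplicates,
    }

# Made-up survey rows, for illustration only.
survey = [
    {"response_id": 1, "hotel": "A", "rating": 4},
    {"response_id": 2, "hotel": "A", "rating": None},
    {"response_id": 2, "hotel": "B", "rating": 5},
    {"response_id": 3, "hotel": "", "rating": 3},
]
report = quality_report(survey, ["hotel", "rating"], "response_id")
```

Running a report like this before the design step gives an early signal of whether the data can support the story at all.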

STEP 3: DESIGN

The concept of using a visualization to represent data has been around for hundreds of years. Today, with advances in technology and in business intelligence (BI) capabilities, there are many tools available to help create a visualization. Technology has made it possible to process large amounts of data quickly. Technology may continue to advance the ways a visualization can be created, perhaps through audio describing what a user wants to see or through machine learning. No matter how visualizations are created in the future, there are fundamentals that are important to understand. When it comes to design, the most important fundamental is to ensure that the user understands the context of the visualization. Before the design step, it is important to have followed the methodology and to have completed the define and data steps. Choosing the appropriate chart requires an understanding of the data properties and the purpose for the visualization.

Forms of Visualizations

When the business need or problem is understood and the data have been gathered, the visualization can be designed. Many different forms of visualizations can be used depending on the data, but it is important to choose the form that best improves the user experience in telling the story. All visualizations should include not only the visual that represents the data but also additional information such as labels and text so the audience can understand the content and the context. Table 18.1.3 shows some basic forms of visualizations that can be used. Some of these charts can be enhanced; for example, a time element can be added to a bubble chart to show changes over time. Examples of some common basic charts are discussed in the following sections. However, there are many other forms of visualizations that should be reviewed before designing a visualization.

Table 18.1.3 Forms of Visualizations

Visualization Form | Number of Categories | Number of Numerical Variables | Purpose | Audience Ease of Interpretation | Example
Number chart | - | 1 | Display | Easy | Average rating or score
Pie chart | 1 | 1 | Proportion | Easy | % of negative sentiment by company
Bar chart (basic) | 1 | 1 | Comparison; showing exact values | Easy | Top consumer complaints about Equifax in a given period of time
Bar chart (grouped side by side) | Multiple | 1 or 2 | Compare categories | Easy | Compare hotels grouped by hotel ratings
Bar chart (stacked) | Multiple | 1 | Compare categories | Easy | Compare hotels by online customer review sentiment
Line (single) | 1 | 1 + Date variable | Trends over time | Easy | Sales over time
Line (multiple) | Multiple | 1 + Date variable | Compare multiple categories over time | Difficult | Consumer sentiment over time for each credit bureau
Maps | Multiple | Multiple | Comparing variables and geospatial analytics | Difficult | Location and volume of customer complaints
Scatter chart | 0 or 1 | 2 | Relationships and correlations between numerical values | Difficult | Relationship between cancer rates and country
Bubble chart | 0 or 1 | 3 | Relationships and correlations between numerical values | Difficult | Comparing airlines by assets, revenue, and profit

Number Charts

The most common visualization is a simple number chart. A number chart, as shown in Fig. 18.1.6, is a good visual for a dashboard because it easily communicates a single total such as a count, a percentage, an average, or a dollar amount. Trend indicators can also be used in a number chart, but they should compare periods of the same length (such as annual, quarterly, monthly, or daily).

FIG. 18.1.6 Number chart.
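A number chart with a trend indicator reduces to one value plus a comparison against an earlier period of the same length. The Python sketch below shows one way to compute it; the function name, the arrow labels, and the sample review counts are assumptions for illustration.

```python
def number_with_trend(current_total, previous_total):
    """Return the headline number plus a trend arrow and percent change.

    Both totals are assumed to cover periods of the same length,
    for example two consecutive quarters.
    """
    if previous_total == 0:
        # Percent change is undefined when the earlier period is zero.
        return {"value": current_total, "arrow": "n/a", "change_pct": None}
    change_pct = (current_total - previous_total) / previous_total * 100
    if change_pct > 0:
        arrow = "up"
    elif change_pct < 0:
        arrow = "down"
    else:
        arrow = "flat"
    return {
        "value": current_total,
        "arrow": arrow,
        "change_pct": round(change_pct, 1),
    }

# Made-up quarterly review counts, for illustration only.
card = number_with_trend(current_total=460, previous_total=400)
```

Comparing a full quarter against a partial one would make the arrow meaningless, which is why the two inputs must cover equal periods.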


Pie Charts

Pie charts have been around for hundreds of years to show the relationship of parts to a total at a single point in time (such as a slice of the pie vs. the whole pie). Pie charts are a simple way to visualize simple comparisons for a single category; however, they do not work well for comparing the size of segments across multiple pie charts. A pie chart splits a population of data for a single category into segments, and the total of all the segments equals 100%. If there are too many segments, pie charts do not work well because the segments become difficult to label and the differences in proportion become hard to see. Also, a pie chart can take a lot of space on a dashboard or report. Fig. 18.1.7 shows an example of a pie chart where the category is ratings for a hotel. Ratings are segmented 1 through 5, and the pie chart shows the percentage of each segment.

FIG. 18.1.7 Pie chart. [Embassy Suites Galleria hotel ratings (5 = high, 1 = low), with five segments for ratings 1 through 5 sized 14.6%, 16.7%, 20.8%, 22.9%, and 25.0%.]
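The rule that the segments of a pie chart total 100% follows directly from how the shares are computed. Here is a minimal Python sketch; the rating counts are made up and are not the data behind Fig. 18.1.7.

```python
def pie_shares(counts):
    """Convert counts per segment into percentage shares that total 100."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("A pie chart needs at least one observation")
    return {segment: count / total * 100 for segment, count in counts.items()}

# Made-up rating counts (rating -> number of guests), for illustration only.
rating_counts = {1: 3, 2: 5, 3: 12, 4: 20, 5: 10}
shares = pie_shares(rating_counts)
```

Because every share is divided by the same total, the shares always add up to 100%, which is also why a slice from one pie cannot be compared directly with a slice from another pie built on a different total.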

Bar Chart

A bar chart is used for comparison and ranking across one or multiple categories. There are different types of bar charts, and the best choice depends on the data available. A simple bar chart is easy to interpret and can be used to show totals or trends for a single category. Fig. 18.1.8 shows an example of a simple bar chart.

FIG. 18.1.8 Bar chart. [Airline flights in 2017, all airports: vertical bars for Southwest, American Airlines, Delta, United, and JetBlue on an axis from 0 to 1,600,000. Data source: Bureau of Transportation Statistics T-100 Segment data.]
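The ranking that a simple bar chart conveys can be sketched without any charting tool. The Python snippet below draws the total flights from Table 18.1.2 as text bars, sorted from largest to smallest; the text rendering and the default bar width are illustrative choices, not how a BI tool draws the chart.

```python
# Total flights in 2017, taken from Table 18.1.2.
TOTAL_FLIGHTS = {
    "Southwest": 1347881,
    "American Airlines": 1079948,
    "Delta": 1061526,
    "United": 747871,
    "JetBlue": 354364,
}

def text_bar_chart(values, width=40):
    """Render a ranked bar chart as lines of text.

    Bars are scaled from zero so their lengths stay proportional to the values.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    for label, value in ranked:
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value:,}")
    return lines

chart_lines = text_bar_chart(TOTAL_FLIGHTS)
print("\n".join(chart_lines))
```

Sorting the bars by value makes the ranking readable at a glance, and printing the exact value next to each bar keeps the detail that the bar length alone cannot show.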

Stacked Bar Chart

A stacked bar chart can be used to show totals for a single category or to compare categories when there is more than one. For example, Fig. 18.1.9 shows the number of scheduled flights in 2017 for US airlines, split into domestic and international, in one stacked bar chart built from public data. Stacked bar charts work well for survey responses or any type of data that has multiple categories.

FIG. 18.1.9 Stacked bar chart example. [Number of scheduled flights in 2017 by airline: bars for American Airlines, Delta, JetBlue, Southwest, and United, each stacked into sum of domestic and sum of international, on an axis from 0 to 1,600,000. Data source: Bureau of Transportation Statistics T-100 Segment data.]
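Behind a stacked bar, each segment starts where the previous one ended. The Python sketch below computes those segment positions and each segment's share of its bar, using the domestic and international counts from Table 18.1.2; the output structure is an assumption made for illustration.

```python
# (domestic, international) flights in 2017, taken from Table 18.1.2.
FLIGHTS_BY_TYPE = {
    "Southwest": (1313573, 34308),
    "American Airlines": (886803, 193145),
    "Delta": (917231, 144295),
    "United": (580293, 167578),
    "JetBlue": (291995, 62369),
}

def stacked_segments(values):
    """Return, per category, each segment's start, end, and share of the bar."""
    result = {}
    for label, parts in values.items():
        total = sum(parts)
        segments = []
        start = 0
        for part in parts:
            # Each segment begins where the previous one ended.
            segments.append({
                "start": start,
                "end": start + part,
                "share": part / total,
            })
            start += part
        result[label] = segments
    return result

stacks = stacked_segments(FLIGHTS_BY_TYPE)
```

Only the first segment of each bar starts at zero, so the bottom series is easy to compare across bars while the upper series is harder to judge by eye. The computed shares help when that comparison matters.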

Horizontal Bar Chart

A horizontal bar chart works well if the category labels are long. Although the data presented are similar to those in a simple or stacked bar chart, a horizontal bar chart may be selected to display the labels better or to fit the space where it will be displayed. A horizontal bar chart may be chosen over the other types of bar charts to better tell the story with the data available (Fig. 18.1.10).

FIG. 18.1.10 Horizontal bar chart. [Number of scheduled domestic and international flights by airline: horizontal bars for United, Southwest, JetBlue, Delta, and American Airlines, with series for 2015, 2016, and 2017, on an axis from 0 to 4,500K. Data source: Bureau of Transportation Statistics T-100 Segment data.]

Line Chart

Another basic form of visualizing data is the line chart. Line charts require time data in consistent intervals. Fig. 18.1.11 shows an example of a multiple line chart where multiple categories are plotted over time. The variable being plotted is customer sentiment for three different companies. This type of chart is not well suited to a static visualization, such as a PowerPoint presentation, because it can be too cluttered. However, with a visualization tool such as Qlik Sense, the audience can interact with the chart and select a custom time range to drill down and see more details. Combined with other charts in an interactive visualization, this chart can be very powerful for exploring the data to tell a story.

FIG. 18.1.11 Line chart showing multiple categories. [Customer sentiment trend: one line each for TransU, Experian, and Equifax; y-axis: sentiment (0 to 400); x-axis: review date by year, 2006 onward.]
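The requirement for consistent intervals means that a period with no data still needs a point on the time axis. As a rough sketch, the Python function below counts reviews per year and keeps empty years as zero; review counts stand in for the sentiment measure, and the ISO date format and sample dates are assumptions. Choosing a narrower start and end year mimics the custom time range selection described above.

```python
from collections import Counter

def yearly_series(review_dates, start_year, end_year):
    """Count reviews per year, keeping years with no reviews as zero.

    review_dates holds ISO date strings such as "2012-06-30" (an assumed format).
    """
    counts = Counter(int(date[:4]) for date in review_dates)
    return [(year, counts.get(year, 0)) for year in range(start_year, end_year + 1)]

# Made-up review dates, for illustration only.
dates = ["2010-01-15", "2010-07-02", "2012-03-09", "2013-11-21", "2013-12-01"]

series = yearly_series(dates, 2010, 2013)      # full range
drill_down = yearly_series(dates, 2012, 2013)  # narrower, user-selected range
```

Without the explicit zero for 2011, a line drawn straight from 2010 to 2012 would hide the gap and suggest a smooth trend that the data do not support.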

Bubble Chart

Another type of chart for comparing different variables is a bubble chart or scatterplot. A bubble chart is a good way to show three numerical variables at once (horizontal position, vertical position, and bubble size), but it is more complicated and requires more skill to create. Different colors or bubble sizes can be used to show a lot of information in a single chart. A bubble chart looks at data in a snapshot of time. However, by plotting snapshots of data from different periods of time, this chart can be animated to show changes in the data in an interesting form.
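One detail worth working through is how bubble size is scaled. Readers tend to judge a bubble by its area, so a common approach is to make the area, not the radius, proportional to the value. The Python sketch below shows that calculation; the maximum radius and the sample values are arbitrary assumptions.

```python
import math

def bubble_radii(values, max_radius=30.0):
    """Scale bubble radii so that bubble AREA is proportional to each value."""
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("values must include a positive number")
    return {
        # Area grows with the square of the radius, so take the square root.
        label: max_radius * math.sqrt(max(value, 0) / largest)
        for label, value in values.items()
    }

# Made-up values, for illustration only.
radii = bubble_radii({"A": 100, "B": 25})
```

Here value A is four times value B, and the radii come out as 30 and 15, which gives A four times the area of B. Scaling the radius directly (30 and 7.5) would make A look sixteen times larger.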

STEP 4: DISTRIBUTE

A data visualization is a way to tell a story through a graphic representation of the data and a way to share the story among both technical and nontechnical people. The last step, once the visualization is complete, is to distribute it. There are many ways a visualization can be shared or distributed. It is important to consider this step before designing the visualization because the purpose will define how it should be distributed. Is the purpose of the visualization to inform or to allow data discovery? Will the audience only view the visualization, or will they interact with it to discover insights?

Purpose: To Inform or Educate

To inform or educate the audience, the story should unfold by showing the data and visualizations in an order that tells the story. For example, if data are collected to understand customer sentiment about a hotel stay, a visualization can be created to show the customer sentiment over time and put into a story format. Consider using data from the past, present, and predicted future to tell stories that lead to the best outcomes or decisions. Visualizations can be shared or distributed to inform or educate the audience in different ways, including the following.

PowerPoint: Visualization charts may be copied and pasted into a PowerPoint slide, with additional details added to highlight the story being told.


Dashboard: A dashboard is a collection of visualizations aligned to the business goals to be used as a management report. A dashboard can provide an at-a-glance view of key performance indicators (KPIs) and measures to drive action for improvements.

Infographic: An infographic is similar to a visualization in that it is a visual representation of the data. However, an infographic may contain more images, pictures, and words that are conceptual, in addition to data in a visual form. Infographics are usually very focused and are used to captivate an audience. They can be a single page or multiple pages in length. Infographics can be a great tool for marketing campaigns or for summarizing a study that includes visualizations.

Purpose: To Interact or Explore

If the purpose for the visualization is to explore the data, then an interactive visualization can be valuable. How an interactive visualization is distributed depends on the software used. Most visualization tools can publish the visualization to the Internet (cloud) so the user can interact with it and explore the data. With defined user permissions, the user can change different variables while all charts in the story update. Interactive visualizations are great for letting the user ask "what if" questions and see the outcome visually.

DATA VISUALIZATION TOOLS AND SOFTWARE

The practice of creating visualizations is growing rapidly, just as machine learning, digital facial recognition, unstructured data analytics, and data science are growing. There are many smart and user-friendly tools available for creating visualizations. Selecting the appropriate tool will depend on many factors, including the knowledge, skills, and abilities of the visualization producer. Some features to consider when selecting a tool include the following:

• Ease of use
• Drag and drop capability
• Ability to connect to multiple data sources
• Ability to manage the data
• Open and standard APIs
• User-friendly development environment
• Ability to share and collaborate
• Interactive capability
• Features that are up to date
• Scalable
• Manageable security
• Nice-looking visual results to fit the purpose

Here are some of the leading tools on the market today for creating visualizations without requiring detailed programming skills:

• Qlik
• Tableau
• Microsoft Power BI
• Sisense

SUMMARY

There is great value in the process of creating and telling a story through visualizations. Following the visualization framework helps ensure that visualizations are created with the right content and can be understood in the right context. The process of defining the purpose and talking with the audience, collecting the data, designing the visualization in a story format, and distributing the visualization allows data to be more easily understood so the audience can focus on what is important. Using visualizations to tell a story through data is a great way to provide better information, knowledge, and insights. Telling a story through visualizations will continue to be necessary to enable data to be better understood for more accurate outcomes and decisions.
