Data Visualization

Resources

Important

  • 1D and 2D Graphing Standards – For your assignments and projects, you need to know which graphs are appropriate for different types of data. For a single numeric (continuous) variable, a histogram or a box plot is standard. For a single categorical variable, a bar plot or a Cleveland dot plot is used. For two numeric variables, a scatter plot is best. For one categorical and one numeric variable, separate box plots for each category are ideal. For two categorical variables, a mosaic plot can be used.
  • Calculator Requirement – You will need a calculator for the test, as there will be computation questions covering topics like central tendency and variability.
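The graphing standards above amount to a small lookup on variable types. The sketch below encodes them as a helper. The function name `suggest_plots` and the convention of describing each variable as the string "numeric" or "categorical" are assumptions made for illustration, not part of any plotting library.

```python
# Hypothetical helper encoding the 1D and 2D graphing standards above.
# Assumption: each variable is described as "numeric" or "categorical".

def suggest_plots(*kinds: str) -> list[str]:
    """Return the conventional plot types for one or two variables."""
    valid = {"numeric", "categorical"}
    if not 1 <= len(kinds) <= 2 or any(k not in valid for k in kinds):
        raise ValueError("expected one or two of 'numeric' / 'categorical'")

    if len(kinds) == 1:  # 1D case
        if kinds[0] == "numeric":
            return ["histogram", "box plot"]
        return ["bar plot", "Cleveland dot plot"]

    pair = sorted(kinds)  # 2D case; sorting makes the argument order irrelevant
    if pair == ["numeric", "numeric"]:
        return ["scatter plot"]
    if pair == ["categorical", "numeric"]:
        return ["side-by-side box plots (one per category)"]
    return ["mosaic plot"]  # two categorical variables


print(suggest_plots("numeric"))
print(suggest_plots("numeric", "categorical"))
```

Sorting the pair means "numeric, categorical" and "categorical, numeric" give the same answer, which matches the rule as stated.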

Core concepts

  • Why Visualize Data? – Graphics reveal data. The goal of data visualisation is to take complex, multi-dimensional information and make it clear and easily understandable through visual representation. This process is crucial because it can reveal patterns and insights that are not apparent from raw data tables.
  • Graphical Perception – Humans are better at making accurate judgments about certain visual elements than others. Judgements are most accurate for position along a common scale (like in a bar chart) and least accurate for elements like volume, density, and colour hue. This hierarchy of perceptual accuracy should guide the choice of graph type to ensure clarity and avoid misinterpretation. For instance, bar plots are preferable to pie charts because they use position on a common scale, which is easier to judge than the angles and areas in a pie chart.
  • Plotting Data – Organising raw data into a frequency distribution, or displaying it graphically as a histogram or stem-and-leaf display, makes it more intelligible. A frequency distribution organises data by counting how often each value or interval occurs. A histogram visually represents this distribution with bars, while a stem-and-leaf display shows the shape of the distribution while retaining the individual data values.
  • Describing Distributions – The shape of a data distribution can be described by its modality (the number of peaks) and its skewness (the degree of asymmetry). A distribution with one peak is unimodal, while one with two is bimodal. A symmetric distribution has the same shape on both sides of its centre. A distribution with a long tail extending to the right is positively skewed, while one with a long tail to the left is negatively skewed.
  • Principles of Graphical Standards – The key principles of good data visualisation are clarity and simplicity. This involves avoiding the complication of simple information and instead aiming for the clear portrayal of complexity. It is important to avoid hiding data, use appropriate scales, and eliminate “chart junk”—unnecessary visual elements that don’t add information. Maximising the data-ink ratio ensures that most of the ink on a graphic is used to display data, enhancing clarity.
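A frequency distribution and a stem-and-leaf display can be built by hand in a few lines. This is a minimal sketch assuming non-negative integer data, equal-width intervals that include their lower bound, and stems taken as the tens digit; the names `frequency_table` and `stem_and_leaf` and the `scores` values are hypothetical, made up for illustration.

```python
def frequency_table(values, start, width):
    """Count values in consecutive intervals [lo, lo + width), keeping empty ones."""
    if start > min(values):
        raise ValueError("start must not exceed the smallest value")
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1  # index of the interval containing v
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


def stem_and_leaf(values):
    """Stem = tens digit(s), leaf = units digit; assumes non-negative integers."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # show empty stems too
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


scores = [42, 47, 51, 55, 55, 58, 63, 64, 69, 71, 88]  # made-up example data

# A text "histogram": one # per observation in each interval.
for lo, hi, count in frequency_table(scores, start=40, width=10):
    print(f"{lo}-{hi - 1}: {'#' * count} ({count})")

print("\n".join(stem_and_leaf(scores)))
```

Both outputs show the same shape, but only the stem-and-leaf display lets you read back every original value (for example, the stem 5 with leaves 1558 stands for 51, 55, 55, 58).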
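The central tendency and variability computations mentioned under Important, together with the skewness description above, can be checked with Python's standard `statistics` module. This is a sketch, not the course's required formulas: the helper name `describe` and the data are made up, and skewness here uses the moment coefficient g1 = m3 / m2^1.5 (other textbooks use variants such as Pearson's 3(mean − median)/sd, which give slightly different numbers).

```python
import statistics


def describe(data):
    """Summarise central tendency, variability and skewness of numeric data.

    Assumes at least two values that are not all identical.
    """
    n = len(data)
    mean = statistics.mean(data)
    # Population central moments used by the moment coefficient of skewness.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return {
        # Central tendency
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # a list, since data can be multimodal
        # Variability
        "range": max(data) - min(data),
        "sample_variance": statistics.variance(data),  # divides by n - 1
        "sample_sd": statistics.stdev(data),
        # g1 > 0: long right tail (positive skew); g1 < 0: long left tail
        "skewness": m3 / m2 ** 1.5,
    }


# Made-up values with one large observation on the right.
data = [2, 3, 3, 4, 5, 6, 15]
summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these values the mean (about 5.43) sits above the median (4) and g1 is positive, the usual signature of a positively skewed distribution: the long right tail pulls the mean towards it.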
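Tufte states the data-ink ratio as a simple proportion. The second line below is a worked example on a hypothetical chart in which gridlines, borders and background shading account for 40% of the ink and can all be erased without losing any data; the 40% figure is invented for illustration.

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}

% Hypothetical chart: 40% of its ink is erasable non-data ink
\text{data-ink ratio} = 1 - 0.40 = 0.60
```

Removing the erasable elements would push the ratio towards 1, which is what "maximising the data-ink ratio" means in practice.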

Theories and Frameworks

  • Miasma theory – An obsolete theory of disease propagation that held that diseases like cholera were spread through “bad air”.
  • Exploratory Data Analysis (EDA) – An approach to data analysis developed by John Tukey that uses a variety of methods, like stem-and-leaf displays and boxplots, for displaying data in visually meaningful ways.
  • Graphical Cognition Model (Spence, 2006) – Describes the process of interpreting a graph, where a visual query (the graph) is perceived, encoded into working memory, and processed to draw inferences and interpretations.
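The boxplot from Tukey's EDA is drawn from a five-number summary, with points beyond the "fences" plotted individually as possible outliers. The sketch below assumes the common 1.5 × IQR fence rule and uses `statistics.quantiles(..., method="inclusive")` for the quartiles; quartile conventions vary, and Tukey's original hinges can give slightly different values on small samples. The function names and data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum: the values a boxplot is drawn from."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


def tukey_fences(data, k=1.5):
    """Return (lower fence, upper fence, points outside the fences)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1  # interquartile range: the length of the box
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < low or x > high]
    return low, high, outliers


data = [2, 3, 3, 4, 5, 6, 15]  # made-up example data
print(five_number_summary(data))
print(tukey_fences(data))
```

Here the box runs from Q1 = 3 to Q3 = 5.5, the upper fence falls at 9.25, and the value 15 lies beyond it, so a boxplot would show it as a separate point rather than extending the whisker to reach it.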

Notable Individuals

  • William S. Cleveland – A statistician whose experiments on graphical perception established a hierarchy of how accurately people judge different visual elements.
  • Charles Minard – Created a famous visualisation of Napoleon’s 1812 invasion of Russia, considered one of the best statistical graphics ever made.
  • Florence Nightingale – A nurse and statistician who used data visualisation, such as the “coxcomb” diagram, to demonstrate that most soldier deaths in the Crimean War were from preventable diseases, leading to policy changes.
  • John Snow – A physician who used a dot map to plot cholera deaths during the 1854 outbreak in Soho, London, linking them to a contaminated water pump on Broad Street and challenging the prevailing miasma theory.
  • Edward R. Tufte – A statistician and design expert who wrote a key textbook on data visualisation and emphasised principles like revealing complexity clearly and avoiding “chart junk”.
  • John Tukey – An influential statistician who developed exploratory data analysis (EDA), which includes methods like the stem-and-leaf display and the boxplot.