Data Visualization Seminar NCDC, April 27 2011. Todd Pierce Module 5 Types of Graphs. Best Practices Time Series (sources: Colin Ware and Stephen Kosslyn). Time Series Graphs. Most graphics show values changing over time – time gives us a context for understanding data
Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs
Best Practices Time Series (sources: Colin Ware and Stephen Kosslyn)
Time Series Graphs • Most graphics show values changing over time – time gives us a context for understanding data • random sample of 4000 newspaper graphics 1874-1989 found that 75% of them were time series • Time series are usually shown best by line graphs, but sometimes other graphs work better
Time Series Graphs • Patterns • Trend: overall tendency of values to increase, decrease, or stay stable during a time period; trend lines can show this (but see later caveats) • Variability: average degree of change from one point in time to the next in a time period; but be careful, if the y scale is narrow or does not start at zero, variability may be overstated • Rate of change: percent difference between one value and the next; rates of change may be increasing faster than the raw data values would indicate
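The variability and rate-of-change definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are made up here, not taken from the seminar.

```python
from statistics import mean

def rate_of_change(values):
    """Percent difference between each value and the one before it."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]

def variability(values):
    """Average absolute change from one point in time to the next."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))

# Hypothetical monthly values, chosen only for illustration.
series = [100, 110, 121, 133.1]

rates = rate_of_change(series)   # percent change per step
avg_step = variability(series)   # average raw change per step

print(rates)
print(avg_step)
```

In this made-up series the raw steps grow (10, 11, 12.1) while the percent change stays at about 10% per step, which is why the two views can tell different stories.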
Time Series Graphs • Patterns • Co-variation: changes in one time series are reflected as changes in another, either immediately or later; changes can be in same or different directions; if changes are not immediate, we have leading or lagging indicators • Cycles: patterns that repeat at regular intervals (for example daily, weekly, or seasonally) rather than occurring once • Exceptions: values that fall far outside the norm
Time Series Graphs • Line Graphs: show how quantitative values have changed over a continuous time period; show pattern or shape of change over time; show exceptions • Lines make visible the sequential flow of values over time • Lines trace connection from one value to the next • Lines show extent and direction of change through slope • If we want to compare magnitudes of values at a point in time, we should add dots to the lines
Time Series Graphs • Bar Graphs: emphasize individual values and allow for comparisons of specific values at points in time • Visual weight of bars and their separation makes us focus on individual values rather than the overall patterns • Dot Plots: useful when sampling at irregular intervals • A line connecting sporadic values implies smooth transitions between values • More regular sampling might show different picture • Use dots instead of lines to avoid false conclusions
Time Series Graphs • Box Plots: show distribution of values over time by showing the center (typically the median), min and max • see Distribution Analysis for more information • Animated Scatterplots: show correlation analysis over time – such as Gapminder • see Correlation Analysis for more information • Great for telling a story, not so good for analysis – hard to track individual dots • Must be combined with trails to show patterns of change over time, and small multiples (trellis display) to compare patterns of changes for multiple items
Time Series Graphs • Best Practices • Aggregating to different time intervals: combine data into different time spans (month, week, year, day) to see different patterns emerge • Viewing time periods in context: extend the time period – trends that look significant in a small time span may not be over longer periods • Grouping related time intervals: add vertical lines or shading on the time axis to show for example each quarter or when the weekends are
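Aggregating to different time intervals can be sketched as follows. The `aggregate` helper and the sample records are hypothetical; the point is only that the same daily data can be rolled up by month or by ISO week before graphing.

```python
from collections import defaultdict
from datetime import date

def aggregate(records, key):
    """Sum values into buckets; `key` maps a date to its bucket label."""
    totals = defaultdict(float)
    for day, value in records:
        totals[key(day)] += value
    return dict(totals)

# Hypothetical daily observations (date, value).
records = [
    (date(2011, 1, 3), 5.0),
    (date(2011, 1, 4), 7.0),
    (date(2011, 1, 31), 2.0),
    (date(2011, 2, 1), 4.0),
]

# Same data, two different time spans.
by_month = aggregate(records, lambda d: (d.year, d.month))
by_week = aggregate(records, lambda d: tuple(d.isocalendar())[:2])  # (ISO year, ISO week)

print(by_month)
print(by_week)
```

Note that January 31 and February 1 land in different months but in the same ISO week, so the two roll-ups group them differently.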
Time Series Graphs • Best Practices • Using running averages to enhance perception of high level patterns: trend lines can mislead if they don’t take into account values just outside the time period; better to look at running averages of current value and a few previous values – this smoothing can reduce variability that throws off trend lines • Omitting missing values from a display: rather than have the line dip to zero, either skip the value (show a broken line) or show the line lighter or dashed; do not confuse a valid zero value with a missing value
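A minimal sketch of a running average that also respects the missing-value advice above. It assumes missing observations are stored as `None` (an assumption of this example) and uses a trailing window of the current value plus a few previous ones.

```python
def running_average(values, window=3):
    """Trailing mean of the current value and up to window-1 previous values.

    None marks a missing observation. Missing values are skipped rather than
    treated as zero, and the output keeps a gap (None) where the current
    value is missing, so a plotted line would be broken there.
    """
    smoothed = []
    for i, current in enumerate(values):
        if current is None:
            smoothed.append(None)  # keep the gap instead of dipping to zero
            continue
        recent = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        smoothed.append(sum(recent) / len(recent))
    return smoothed

# Hypothetical readings: one missing value (None) and one valid zero (0.0).
readings = [4.0, 6.0, None, 8.0, 0.0, 10.0]
smoothed = running_average(readings)
print(smoothed)
```

The valid zero at the fifth position is averaged like any other value, while the missing third position stays a gap.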
Time Series Graphs • Best Practices • Optimizing a graph’s aspect ratio: change the aspect ratio to get a lumpy profile instead of a flat or spiky profile, to allow for optimal comparison of slopes • Using log scales and percentages to compare rates of change: variations in numerical magnitudes may hide true rates of change – use log scales, or percent change from previous value or from a baseline value, to see true rates of change • Overlapping time scales to compare cyclical patterns: instead of showing for example all three years in one line, show each year as a different line over the 12 months, to allow comparisons from year to year for a given month
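The point about log scales and percentages can be illustrated with two made-up series that differ by a factor of 100 in magnitude but double at every step. The helper name `percent_of_baseline` is hypothetical.

```python
import math

def percent_of_baseline(values):
    """Percent change of each value relative to the first (baseline) value."""
    base = values[0]
    return [100.0 * (v - base) / base for v in values]

# Hypothetical series: very different magnitudes, same rate of change.
small = [10, 20, 40]
large = [1000, 2000, 4000]

small_pct = percent_of_baseline(small)
large_pct = percent_of_baseline(large)

# On a log scale, equal ratios become equal vertical steps.
small_steps = [math.log10(b) - math.log10(a) for a, b in zip(small, small[1:])]
large_steps = [math.log10(b) - math.log10(a) for a, b in zip(large, large[1:])]

print(small_pct, large_pct)
print(small_steps, large_steps)
```

Plotted on a linear axis, the small series would look nearly flat next to the large one; the percent and log views show that both grow at the same rate.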
Time Series Graphs • Best Practices • Using cycle plots to examine trends and cycles together: compare cycles and see trends across multiple cycles • Shifting time to compare leading and lagging indicators: shift the time axis on one graph so it aligns with the other and see patterns • Stacking line graphs to compare multiple values: if multiple time series have very different units or scale ranges, put them in stacked line graphs with the same time axis
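Shifting time to compare leading and lagging indicators can be approximated numerically by trying several shifts and keeping the one where the two series line up best. The sketch below uses a plain Pearson correlation as the alignment score, which is one reasonable choice rather than the method from the seminar; the data and function names are invented for illustration.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def best_lag(leading, lagging, max_lag):
    """Try shifts of 0..max_lag periods and score each alignment."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]  # drop the tail that has no partner
        ys = lagging[lag:]                  # shift the lagging series back
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores

# Hypothetical data: shipments repeat the orders pattern two periods later.
orders = [1, 3, 2, 5, 4, 6, 5, 8]
shipments = [0, 0, 1, 3, 2, 5, 4, 6]

lag, scores = best_lag(orders, shipments, max_lag=3)
print(lag, scores)
```

Once a plausible shift is found, the lagging series can be drawn offset by that many periods so the two lines can be compared directly.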
Time Series Graphs • Best Practices • Expressing time as 0-100% to compare asynchronous processes: if activities have different start dates, reduce each to 0% and show later dates as percentage of total activity time, to compare values at similar times in total activity length • Maintaining consistency through time: must adjust for inflation in currency over time; and account for changes in how information was gathered or how values were defined over time
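Expressing time as 0-100% amounts to rescaling each activity's time axis by its own start and end. A minimal sketch, with hypothetical project timelines measured in days:

```python
def to_percent_elapsed(times):
    """Rescale time stamps so the first is 0% and the last is 100%."""
    start, end = times[0], times[-1]
    return [100.0 * (t - start) / (end - start) for t in times]

# Hypothetical projects with different start dates and different durations.
project_a_days = [0, 10, 20, 40]
project_b_days = [100, 125, 150, 200]

a_pct = to_percent_elapsed(project_a_days)
b_pct = to_percent_elapsed(project_b_days)

print(a_pct)
print(b_pct)
```

After rescaling, both projects have observations at 0%, 25%, 50% and 100% of their duration, so their values can be compared at matching stages.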
Time Series Graphs • Do’s and Don’ts • Change salience of lines if needed to show relative importance. • Ensure crossing or nearby lines are discriminable. • If using points on lines, make points at least twice as thick as the lines. • Vary the lengths of dashes in dashed lines by at least a ratio of 2 to 1. • Use different, discriminable symbols for points on different lines.
Time Series Graphs • Do’s and Don’ts • Do not fill in the areas between two lines – it’s not an area graph. • In a mixed line and bar display, make one more salient and important. • Put labels of all lines in same part of graph (else it draws attention to certain lines – also less busy). • Put labels at end of lines (so labels and lines group with each other). • Label any critical data points explicitly rather than labeling all points.
Best Practices Part-to-Whole and Ranking Analysis (sources: Colin Ware and Stephen Kosslyn)
Part-to-Whole and Ranking • Comparing parts to a whole and ranking them by value – for example the expenses of each department of a company as a % of total expenses, ranked in order
Part-to-Whole and Ranking • Patterns • Uniform – all values roughly the same • Uniformly different – values change from one to the next by roughly the same amount • Non-uniformly different – differences from one value to the next vary significantly
Part-to-Whole and Ranking • Patterns • Increasingly different – differences from one value to the next increase • Decreasingly different – differences from one value to the next decrease • Alternating differences – differences from one value to the next begin small then shift to large and finally back to small • Exceptional – one or more values are very different from the rest
Part-to-Whole and Ranking • Part to whole is usually shown with pie charts – bad idea! • Makes us compare areas or angles, both of which humans do poorly • If pie uses a legend, eye must bounce between chart and legend • You can label pie wedges directly with name and % value – but this is no better than a table – why use a graph if we must resort to printed values to make sense of it?
Part-to-Whole and Ranking [Figure: chart examples labeled Bad and Acceptable]
Part-to-Whole and Ranking [Figure: chart example labeled Acceptable?]
Part-to-Whole and Ranking • Instead, use a bar graph • One exception – if values cluster close together, the bar differences are small and hard to see • So narrow the scale (zoom in) so the differences look bigger • But then use a dot plot – dots or lines instead of bars – so we don’t misjudge the bar lengths
Part-to-Whole and Ranking • Use a Pareto chart to show the cumulative contributions of each part to a whole • a line graph plus a bar chart shows how the parts sum to 100% • Pareto charts summarize and display the relative importance of the differences between groups of data • Pareto charts distinguish the "vital few" from the "useful many."
Part-to-Whole and Ranking • Vilfredo Pareto, a turn-of-the-century Italian economist, studied the distributions of wealth, finding that about 20% of people controlled about 80% of a society's wealth. • This same distribution has been observed in other areas and has been termed the Pareto Principle or 80/20 rule.
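The numbers behind a Pareto chart are simple to derive: rank the parts from largest to smallest (the bars), then accumulate their shares of the total (the line). The following sketch uses an invented `pareto_table` helper and made-up department expenses.

```python
def pareto_table(parts):
    """Return (name, share %, cumulative share %) rows, largest part first."""
    total = sum(parts.values())
    ranked = sorted(parts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for name, value in ranked:
        running += value
        rows.append((name, 100.0 * value / total, 100.0 * running / total))
    return rows

# Hypothetical expenses by department.
expenses = {"HR": 15, "Sales": 50, "Legal": 5, "IT": 30}
rows = pareto_table(expenses)

for name, share, cumulative in rows:
    print(f"{name:6s} {share:5.1f}% {cumulative:6.1f}%")
```

The share column would be drawn as bars and the cumulative column as the line that rises to 100%.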
Part-to-Whole and Ranking • Best Practices • Grouping categorical values in ad hoc manner: group very small categories into one called ‘other’ or regroup similar categories into one master category for better analysis • Using Pareto charts with percentile scales: group values into percentile intervals (top 10%, next 10%, etc.) and use Pareto line – can lead to new insights • Using line graphs to view ranking changes through time: use line graphs to show changes in ranking (such as salesperson’s sales) over time – the lines show the relative ranking but not the actual values – inspired by bump charts from racing
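Grouping very small categories into a single catch-all can be sketched like this. The 5% threshold, the `group_small` name, and the spending figures are all assumptions made for the example; the cutoff is a judgment call in practice.

```python
def group_small(parts, min_share=0.05, label="Other"):
    """Merge categories below min_share of the total into one labeled bucket."""
    total = sum(parts.values())
    grouped = {}
    for name, value in parts.items():
        key = name if value / total >= min_share else label
        grouped[key] = grouped.get(key, 0) + value
    return grouped

# Hypothetical spending categories.
spend = {"Rent": 60, "Payroll": 30, "Postage": 4, "Snacks": 3, "Misc": 3}
grouped = group_small(spend)
print(grouped)
```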
Part-to-Whole and Ranking • Best Practices • Re-expressing values to solve quantitative scaling problems: sometimes the small values on a bar chart are hard to see relative to the large values – so re-express the number using the square root, or a logarithm, if it reduces the range from highest to lowest; can also use an inverse scale (divide each value by the largest value or some other value such as a million)
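To see how re-expressing values shrinks the range from highest to lowest, compare the spread of some made-up values before and after a square root and a base-10 logarithm:

```python
import math

def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

# Hypothetical bar values spanning four orders of magnitude.
values = [1, 100, 10_000]

sqrt_scaled = [math.sqrt(v) for v in values]
log_scaled = [math.log10(v) for v in values]

print(value_range(values), value_range(sqrt_scaled), value_range(log_scaled))
```

The ordering of the values is preserved in both cases, so ranking is unaffected, but the bar lengths no longer represent the raw quantities and the axis should say so.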
Part-to-Whole and Ranking • Do’s and Don’ts: Bar Charts • Do not insist on minimizing ink. • Mark corresponding bars in same color or symbol for multiple parameters. • Arrange corresponding bars in same order for multiple parameters. • Ensure overlapping bars do not look like stacked bars – offset the bars. • Leave space between bar clusters for multiple parameters. • Do not extend bars beyond the end of the scale.
Part-to-Whole and Ranking • Do’s and Don’ts: Pie Charts • Draw radii from the center of the circle. • Explode a maximum of 25% of the wedges. • Arrange wedges in a simple increasing progression. • Place labels in wedges provided they can be easily read. • Place labels next to all wedges if they cannot fit inside wedges (otherwise reader will think ones outside wedge are more important).
Best Practices Deviation Analysis (sources: Colin Ware and Stephen Kosslyn)
Deviation Analysis • Examining how a set of values deviate from a reference point (a budget, average, or price in time) • Usually use a bar graph with two bars per entity – the actual and expected, such as for a budget • However this makes user subtract values in head • Better to have the graph 0 line be the expected reference, and the bars show the amount over or under (the deviation)
Deviation Analysis • Comparisons • Current target, future target • Same point in time in past • Immediately prior period • Standard or norm • Other items in same category or same market
Deviation Analysis • Best shown as bar or line graphs with reference line at 0 or 100% • If at 0, values expressed as positive and negative deviations in dollars or percents • If at 100%, values expressed as percentages of the reference value • Best to use a line graph when doing comparisons over time, from one period to the next; if comparing entities such as areas or companies, use a bar graph
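The two reference-line conventions above correspond to two ways of re-expressing the same numbers. A minimal sketch with hypothetical budget figures and an invented `deviations` helper:

```python
def deviations(actual, expected):
    """For each item: (amount over/under, percent over/under, percent of reference)."""
    result = {}
    for name, reference in expected.items():
        diff = actual[name] - reference
        result[name] = (
            diff,                            # plotted against a 0 line, in dollars
            100.0 * diff / reference,        # plotted against a 0 line, in percent
            100.0 * actual[name] / reference,  # plotted against a 100% line
        )
    return result

# Hypothetical budget (expected) and spending (actual).
budget = {"Travel": 200, "Supplies": 100}
spent = {"Travel": 250, "Supplies": 80}

dev = deviations(spent, budget)
print(dev)
```

Plotting the first or second element puts the reference at 0 with bars above and below it; plotting the third puts the reference at 100%.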
Deviation Analysis • Best Practices • Expressing deviations as percentages: helps normalize multiple data sets to same units to allow for better comparison – works best if values are mostly <= 100% and nothing exceeds 500% • Comparing deviations to other points of reference: besides showing reference line, show other lines such as acceptable deviations from norm, or standard deviations from mean
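Additional reference lines at a chosen number of standard deviations from the mean can be computed as below. The one-standard-deviation band, the use of the sample standard deviation, and the data are assumptions of this sketch, not recommendations from the seminar.

```python
from statistics import mean, stdev

def outside_band(values, k=1.0):
    """Return the (low, high) band at k standard deviations and the values outside it."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation
    low, high = center - k * spread, center + k * spread
    return (low, high), [v for v in values if v < low or v > high]

# Hypothetical weekly values with one unusual week.
values = [10, 12, 11, 13, 30, 12, 10]
band, flagged = outside_band(values)

print(band)
print(flagged)
```

The two band limits would be drawn as extra horizontal lines alongside the main reference line, making the flagged value stand out.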
Best Practices Distribution Analysis (sources: Colin Ware and Stephen Kosslyn)
Distribution Analysis • Seeing how numerical values are distributed from low to high, and comparing how multiple value sets are distributed • “The median isn’t the message” (Stephen Jay Gould) • knowing the average or median value hides the full range of values • even knowing the max and min values hides the number of values at each numerical value in a range of data
Distribution Analysis • Characteristics of distributions of values • Spread: the difference between the max and min values – the full range of values • Center: estimate of the middle of a set of values – the mean (average) or median • Shape: where values are located in a spread – skewed to a side? Evenly distributed? • Distribution summaries: • 3 value: low, median, high • 5 value: low, 25th %ile, median, 75th %ile, high
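The 5-value summary can be computed with Python's standard library. Quartile definitions vary between tools; this sketch uses the "inclusive" method of `statistics.quantiles`, and the function name and data are made up.

```python
from statistics import quantiles

def five_value_summary(values):
    """Low, 25th percentile, median, 75th percentile, high."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_value_summary(data)
print(summary)
```

These five numbers are the ones a box plot draws: the ends of the whiskers, the edges of the box, and the line inside it.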
Distribution Analysis • Patterns - Shape: • Curved or flat? • If curved, curved upward (bell curve) or downward (opposite of bell curve)? • If curved upward, one peak, two peaks (bi-modal), or more? • If single peaked, symmetrical or skewed left or right? • Concentrations? Noticeably high peaks, that may not be the absolute peak • Gaps? Areas of low or no values
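Questions about shape, peaks and gaps are easiest to answer after counting how many values fall in each interval. A rough sketch follows; the numbers are invented purely to produce two peaks and are not real salary data.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up values (in thousands) with two clusters and a gap between them.
salaries_k = [45, 48, 50, 52, 55, 58, 90, 160, 162, 165, 168, 170]
bins = histogram(salaries_k, 20)

for lower_edge, count in bins.items():
    print(f"{lower_edge:3d}-{lower_edge + 19:3d} {'#' * count}")
```

Bins that never appear in the output (here 60, 100, 120 and 140) are the gaps; the two tall bins are the peaks of a bimodal shape.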
Distribution Analysis Gaussian distribution
Distribution Analysis Bimodal distribution for graduating lawyer salaries