2: Frequency distributions Stemplot, frequency tables, histograms Frequency Distributions
Stem-and-leaf plots (stemplots)
• Analyses start by exploring data with pictures
• My favorite technique is the stemplot: a histogram-like display of data points
“You can observe a lot by looking” – Yogi Berra
Illustrative example: sample.sav
• An SRS (simple random sample) of AGE (in years)
• Data as an ordered array (n = 10): 05 11 21 24 27 28 30 42 50 52
• Divide each data point into
  • Stem values: first one or two digits
  • Leaf values: next digit
• In this example
  • Stem values: tens place
  • Leaf values: ones place
• e.g., 21 has a stem value of 2 and leaf value of 1
Stemplot (cont.)
• Draw stem-like axis from lowest to highest stem
  0|
  1|
  2|
  3|
  4|
  5|
  ×10 (axis multiplier: important!)
• Place leaves next to stem
• e.g., 21 is plotted as leaf 1 next to stem 2 (2|1)
Continue plotting …
• Rearrange leaves in rank order:
  0|5
  1|1
  2|1478
  3|0
  4|2
  5|02
  ×10
• For discussion, let’s rotate the plot:
      8
      7
      4     2
  5 1 1 0 2 0
  -----------
  0 1 2 3 4 5 (×10)
  -----------
  Rotated stemplot
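The plotting steps above can be sketched in Python. This is only a minimal illustration, not the SPSS procedure used elsewhere in these slides. The helper name `stemplot` is hypothetical, and the sketch assumes non-negative integers whose leaf is the ones digit.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows ("stem|leaves") for non-negative integers."""
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):                 # sorting puts leaves in rank order
        stem, leaf = divmod(v, 10)           # tens (and above) vs. ones digit
        leaves_by_stem[stem].append(str(leaf))
    lo, hi = min(leaves_by_stem), max(leaves_by_stem)
    # Walk every stem from lowest to highest so empty stems still appear.
    return [f"{s}|{''.join(leaves_by_stem.get(s, []))}" for s in range(lo, hi + 1)]

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
age_rows = stemplot(ages)
print("\n".join(age_rows))   # each row is read with the ×10 axis multiplier
```

For the illustrative age data this should reproduce the six rows shown on the slide (0|5 through 5|02).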
Interpreting frequency distributions
• Central Location
  • Gravitational center: mean
  • Middle value: median
• Spread
  • Range and inter-quartile range
  • Standard deviation and variance (next week)
• Shape
  • Symmetry
  • Modality
  • Kurtosis
Mean = arithmetic average
“Eye-ball method”: visualize where plot would balance
Arithmetic method = total divided by n
      8
      7
      4     2
  5 1 1 0 2 0
  -----------
  0 1 2 3 4 5
  -----------
       ^ Grav. center
Eye-ball method balances around 25 to 30
Actual arithmetic average = 29.0
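As a quick check of the arithmetic method, the following short sketch computes the total divided by n for the illustrative age data. The variable names are chosen here for illustration only.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)               # 290
mean_age = total / len(ages)    # total divided by n
print(mean_age)                 # 29.0
```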
Middle point: median
• Count from top to depth of (n + 1) ÷ 2
• For illustrative data:
  • n = 10
  • Depth of median = (10 + 1) ÷ 2 = 5.5
  • A depth of 5.5 falls halfway between the 5th and 6th ordered values (27 and 28), so median = 27.5
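The depth rule can be sketched as follows. The function name `median_by_depth` is a hypothetical helper; when the depth is fractional, the sketch averages the two values on either side of it, which is the usual convention.

```python
import math

def median_by_depth(values):
    """Return (depth, median) using the (n + 1) / 2 depth rule."""
    ordered = sorted(values)
    depth = (len(ordered) + 1) / 2
    lower = ordered[math.floor(depth) - 1]   # depth is 1-based, lists are 0-based
    upper = ordered[math.ceil(depth) - 1]    # same element as lower when n is odd
    return depth, (lower + upper) / 2

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
depth, median_age = median_by_depth(ages)
print(depth, median_age)   # 5.5 27.5
```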
Spread: variability
• Easiest way to describe spread is by stating its range, e.g., “from 5 to 52” (not the best way)
• A better way is to divide the ordered data into a low group and a high group
  • Quartile 1 = median of low group
  • Quartile 3 = median of high group
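A minimal sketch of the low-group/high-group approach follows. The helper name `quartiles_by_halves` is hypothetical. The slide does not say how to split the data when n is odd; this sketch assumes the middle value is left out of both groups, and other conventions exist.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the low and high halves (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    low = ordered[:half]       # low group
    high = ordered[-half:]     # high group; middle value dropped when n is odd (assumption)
    return median(low), median(high)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, q3 = quartiles_by_halves(ages)
print(q1, q3, q3 - q1)   # 21 42 21
```

The last printed number is the inter-quartile range, Q3 minus Q1.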
Shape: visual pattern
• Skyline silhouette of plot
• Symmetry
• Mounds
• Outliers (if any)
• When n is small, it’s too difficult to describe shape accurately
      X
      X
      X     X
  X X X X X X
  -----------
  0 1 2 3 4 5
  -----------
What to look for in shape
• Idealized shape = density curve
• Look for:
  • General pattern
  • Symmetry
  • Outliers
Symmetrical shapes
Asymmetrical shapes
Modality (no. of peaks)
Kurtosis (steepness of peak)
• Mesokurtic (medium)
• Platykurtic (flat): skinny tails
• Leptokurtic (steep): fat tails
Kurtosis can NOT be easily judged by eye
Second example (n = 8)
• Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
• Truncate extra digit (e.g., 1.47 → 1.4)
• Stem = ones-place
• Leaves = tenths-place
• Do not plot decimal
  1|4
  2|03
  3|4779
  4|4
  (×1)
• Center: between 3.4 & 3.7
• Spread: 1.4 to 4.4
• Shape: mound, no outliers
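The truncate-then-plot step can be sketched as below. This is an illustration under stated assumptions: the values are entered as strings and handled with Python's `decimal` module so that truncation to one decimal place is exact, and the variable names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

raw = ["1.47", "2.06", "2.36", "3.43", "3.74", "3.78", "3.94", "4.42"]

leaves_by_stem = {}
for text in sorted(raw, key=Decimal):
    # Truncate (do not round) the extra digit: 1.47 -> 1.4
    t = Decimal(text).quantize(Decimal("0.1"), rounding=ROUND_DOWN)
    stem = int(t)                    # ones place
    leaf = int((t - stem) * 10)      # tenths place
    leaves_by_stem.setdefault(stem, []).append(str(leaf))

decimal_rows = [f"{s}|{''.join(leaves_by_stem[s])}" for s in sorted(leaves_by_stem)]
print("\n".join(decimal_rows))       # read with a ×1 axis multiplier
```

Note that 3.78 is plotted as leaf 7, not 8, because the extra digit is dropped rather than rounded.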
Third example (pollution.sav)
Regular stem:
  1|4789
  2|223466789
  3|000123445678
  (×1)
• Regular stemplot (top) too squished
• Split-stem (bottom)
  • First 1 on stem: leaves 0 to 4
  • Second 1 on stem: leaves 5 to 9
Split-stem:
  1|4
  1|789
  2|2234
  2|66789
  3|00012344
  3|5678
  (×1)
Note negative skew
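A split-stem plot can be sketched as follows. The input list is read off the regular stemplot above as stem-and-leaf digit pairs (14 stands for stem 1, leaf 4); it is not the raw pollution.sav data. The helper name `split_stemplot` is hypothetical.

```python
def split_stemplot(stem_leaf_pairs):
    """Return rows with each stem split in two: leaves 0-4, then leaves 5-9."""
    rows = {}
    for v in sorted(stem_leaf_pairs):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1          # 0 = first copy of stem, 1 = second
        rows.setdefault((stem, half), []).append(str(leaf))
    stems = range(min(k[0] for k in rows), max(k[0] for k in rows) + 1)
    return [f"{s}|{''.join(rows.get((s, h), []))}" for s in stems for h in (0, 1)]

pairs = [14, 17, 18, 19,
         22, 22, 23, 24, 26, 26, 27, 28, 29,
         30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
split_rows = split_stemplot(pairs)
print("\n".join(split_rows))
```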
How many stem-values?
• Start with between 4 and 12 stem-values
• Then, trial and error to draw out shape for the most informative plot (use judgment)
Body weight (n = 53)
Data range from 100 to 260 lbs.
100 lb. multiplier seems too broad (only two stem values)
100 lb. multiplier w/ split stem-values still too broad (only 4 stem values)
Try 10 pound stem multiplier
Body weight (n = 53)
  10|0166
  11|009
  12|0034578
  13|00359
  14|08
  15|00257
  16|555
  17|000255
  18|000055567
  19|245
  20|3
  21|025
  22|0
  23|
  24|
  25|
  26|0
  (×10)
10|0 means “100”
Shape: Positive skew, high outlier (260)
Location: median = 165
Spread: from 100 to 260
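The summary values on this slide can be checked by reading the data back out of the stemplot. The sketch below is one way to do that; the values it recovers are only as precise as the plot (whole pounds), and the variable names are hypothetical.

```python
from statistics import median

plot_rows = ["10|0166", "11|009", "12|0034578", "13|00359", "14|08",
             "15|00257", "16|555", "17|000255", "18|000055567", "19|245",
             "20|3", "21|025", "22|0", "23|", "24|", "25|", "26|0"]

weights = []
for row in plot_rows:
    stem, leaves = row.split("|")
    # Stem is in tens of pounds and each leaf is a ones digit: 10|6 -> 106
    weights.extend(int(stem) * 10 + int(leaf) for leaf in leaves)

print(len(weights), min(weights), median(weights), max(weights))   # 53 100 165 260
```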
Quintuple split: Body weight data (n = 53)
  1*|0000111
  1t|222222233333
  1f|4455555
  1s|666777777
  1.|888888888999
  2*|0111
  2t|2
  2f|
  2s|6
  (×100)
• Codes:
  • * for leaves 0 and 1
  • t for leaves two and three
  • f for leaves four and five
  • s for leaves six and seven
  • . for leaves eight and nine
• Example:
  • 2t|2 (×100) means a value of about 220 (2.2 × 100; the ones digit is truncated)
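The quintuple split can be sketched as below, again using values read back from the 10-pound stemplot rather than the original data file. The helper name `quintuple_split` is hypothetical; the stem is the hundreds digit, the leaf is the tens digit, and the ones digit is truncated.

```python
CODES = "*tfs."   # leaf pairs 0-1, 2-3, 4-5, 6-7, 8-9

def quintuple_split(values):
    """Return rows such as '1t|2233' with each stem split into five lines."""
    rows = {}
    for v in sorted(values):
        stem, rest = divmod(v, 100)
        leaf = rest // 10                         # keep tens digit, drop ones digit
        rows.setdefault((stem, leaf // 2), []).append(str(leaf))
    first, last = min(rows), max(rows)            # first and last non-empty lines
    keys = [(s, c) for s in range(first[0], last[0] + 1) for c in range(5)]
    return [f"{s}{CODES[c]}|{''.join(rows.get((s, c), []))}"
            for (s, c) in keys if first <= (s, c) <= last]

plot_rows = ["10|0166", "11|009", "12|0034578", "13|00359", "14|08",
             "15|00257", "16|555", "17|000255", "18|000055567", "19|245",
             "20|3", "21|025", "22|0", "23|", "24|", "25|", "26|0"]
weights = [int(stem) * 10 + int(leaf)
           for stem, leaves in (row.split("|") for row in plot_rows)
           for leaf in leaves]

quint_rows = quintuple_split(weights)
print("\n".join(quint_rows))
```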
Frequency counts (SPSS plot)
Age of participants
SPSS provides frequency counts w/ stemplot:
  Frequency    Stem &  Leaf
      2.00        3 .  0
      9.00        4 .  0000
     28.00        5 .  00000000000000
     37.00        6 .  000000000000000000
     54.00        7 .  000000000000000000000000000
     85.00        8 .  000000000000000000000000000000000000000000
     94.00        9 .  00000000000000000000000000000000000000000000000
     81.00       10 .  0000000000000000000000000000000000000000
     90.00       11 .  000000000000000000000000000000000000000000000
     57.00       12 .  0000000000000000000000000000
     43.00       13 .  000000000000000000000
     25.00       14 .  000000000000
     19.00       15 .  000000000
     13.00       16 .  000000
      8.00       17 .  0000
      9.00 Extremes    (>=18)
  Stem width: 1
  Each leaf: 2 case(s)
3 . 0 means 3.0 years
Because of large n, each leaf represents 2 observations
Frequency tables
  AGE   | Freq  Rel.Freq  Cum.Freq.
  ------+--------------------------
      3 |    2     0.3%      0.3%
      4 |    9     1.4%      1.7%
      5 |   28     4.3%      6.0%
      6 |   37     5.7%     11.6%
      7 |   54     8.3%     19.9%
      8 |   85    13.0%     32.9%
      9 |   94    14.4%     47.2%
     10 |   81    12.4%     59.6%
     11 |   90    13.8%     73.4%
     12 |   57     8.7%     82.1%
     13 |   43     6.6%     88.7%
     14 |   25     3.8%     92.5%
     15 |   19     2.9%     95.4%
     16 |   13     2.0%     97.4%
     17 |    8     1.2%     98.6%
     18 |    6     0.9%     99.5%
     19 |    3     0.5%    100.0%
  ------+--------------------------
  Total |  654   100.0%
• Frequency = count
• Relative frequency = proportion or %
• Cumulative frequency = % less than or equal to current value
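The relative and cumulative columns follow directly from the counts. The sketch below recomputes them from the frequencies in the table; the variable names are illustrative, and percentages are rounded to one decimal place only when printed.

```python
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

total = sum(counts.values())          # 654
running = 0
freq_table = []
for age, freq in sorted(counts.items()):
    running += freq                   # count at or below this age
    rel_pct = 100 * freq / total      # relative frequency, in %
    cum_pct = 100 * running / total   # cumulative frequency, in %
    freq_table.append((age, freq, rel_pct, cum_pct))

for age, freq, rel_pct, cum_pct in freq_table:
    print(f"{age:>5} | {freq:>4} {rel_pct:>7.1f}% {cum_pct:>8.1f}%")
```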
Class intervals
• When data are sparse, group data into class intervals
• Classes can be uniform or non-uniform
Uniform class intervals
• Create 4 to 12 class intervals
• Set end-point convention: include left boundary and exclude right boundary
  • e.g., first class interval includes 0 and excludes 10 (0 to 9.99 years of age)
• Tally frequencies
• Calculate relative frequency
• Calculate cumulative frequency (demo)
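The tallying steps above can be sketched for the ten ages from the first illustrative example. The helper name `tally_classes` and the 10-year class width are assumptions made for this illustration; the sketch also assumes integer values at or above the starting boundary.

```python
def tally_classes(values, width, start=0):
    """Tally values into classes [start + k*width, start + (k+1)*width)."""
    n_classes = (max(values) - start) // width + 1
    freqs = [0] * n_classes
    for v in values:
        freqs[(v - start) // width] += 1      # left boundary included, right excluded
    n = len(values)
    table, running = [], 0
    for k, f in enumerate(freqs):
        running += f
        lo = start + k * width
        table.append((lo, lo + width, f, f / n, running / n))
    return table

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
age_classes = tally_classes(ages, width=10)
for lo, hi, f, rel, cum in age_classes:
    print(f"{lo:>2} to <{hi:<2}  {f}  {rel:.0%}  {cum:.0%}")
```

With a width of 10 the data fall into six classes, which sits inside the suggested range of 4 to 12. A value of exactly 50 is counted in the 50-to-60 class, not the 40-to-50 class, because the left boundary is included.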
Here’s age data in sample.sav…
Histogram – for quantitative data
• Bars are contiguous
Bar chart – for categorical data
• Bars are discrete