Stat 31, Section 1, Last Time • Course Organization & Website https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html • What is Statistics? • Data types and structure • Get going in EXCEL • Exploratory Data Analysis • Bar Graphs
Stat 31, Student Poll Results As indicated on “Student Info” form: Big changes from the past: More biology More diversity
Stat 31, Student Poll Results “Have you taken an AP Exam?” Only ~10% had & grades generally low So don’t worry if you haven’t…
Major Concept: Distributions “Distribution” = “Patterns of data” = “way data is spread out” e.g. Bar Graph is visual display of categorical “distribution”
Exploratory Data Analysis 3 Visual Display of Quantitative Distributions: 1. Stem and Leaf Plots Not Recommended (Main motivation was pencil and paper statistical analysis, but now have better graphical methods readily accessible) A limited special case of….
Visual Disp: Quantitative Dist’ns 2. Histograms Idea: Apply bar graph idea, By creating categories, Called “class intervals” or “classes” or “bins”
Histograms Idea: put numbers into “bins”, bar heights are counts, or “frequencies” 1.3 3.6 1.9 3.1 1.5 0 1 2 3 4
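A minimal sketch of the binning idea, in Python for illustration only (the course itself uses EXCEL). It counts the five numbers above into the bins with edges 0, 1, 2, 3, 4. The function name `bin_counts` and the edge convention (each bin includes its left edge but not its right, except the last bin) are assumptions; the slide does not specify them.

```python
def bin_counts(data, edges):
    """Count how many values fall in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            is_last = (i == len(edges) - 2)
            # Assumed convention: left edge included, right edge excluded,
            # except that the final bin also includes its right edge.
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[i + 1]):
                counts[i] += 1
                break
    return counts

data = [1.3, 3.6, 1.9, 3.1, 1.5]   # the numbers on the slide
edges = [0, 1, 2, 3, 4]            # the bin edges on the slide
counts = bin_counts(data, edges)
print(counts)  # bar heights ("frequencies"): [0, 3, 0, 2]
```

Three values land in the bin from 1 to 2 and two in the bin from 3 to 4, so those are the only bars with non-zero height.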
Class Histogram Example Buffalo, N. Y. (Annual) Snowfall Data Raw Data: https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Raw.xls 63 years, ranging from ~30 - ~120 (inches)
Buffalo Snowfall Data Buffalo, N. Y. (Annual) Snowfall Data Raw Data: https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Raw.xls 63 years, ranging from ~30 - ~120 (inches) Histogram Analysis (pre-done): https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Done.xls
Buffalo Snowfall Data, I A. EXCEL default (of bin edges) • Unround numbers for bin edges • Data “centered around 90” • Most data between 50 and 130 • Asymmetric Distribution
Buffalo Snowfall Data, II B. Smaller bins • Chosen by me • Binwidth = 5, << ~13 from EXCEL default • Nicer edge numbers • Data centered around 84 (now more precise) • Bar graph rougher (fewer points in each bin) • Suggests 3 main groups (called “modes”) (can’t see this above: bin width counts)
Buffalo Snowfall Data, III C. Larger bins • Chosen by me • Binwidth = 30, >> ~13 from EXCEL default • Bar graph is “smooth” (since many points in each bin) • Only one mode??? • Quite symmetric? (different from above: bin width counts)
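The effect of binwidth on the apparent number of modes can be sketched in a few lines of Python. Everything here is an illustrative assumption: the data values are made up (they are NOT the Buffalo snowfall data), and `histogram` and `count_modes` are hypothetical helper names. The mode count is deliberately crude: a bar counts as a mode if it is strictly taller than both neighbours.

```python
def histogram(data, width, start):
    """Counts per bin; bin i covers [start + i*width, start + (i+1)*width)."""
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

def count_modes(counts):
    """Number of bars strictly taller than both neighbours (crude mode count)."""
    padded = [0] + counts + [0]   # treat missing neighbours as empty bins
    return sum(
        1 for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical values, NOT the Buffalo snowfall data.
data = [48, 52, 53, 55, 57, 78, 81, 82, 83, 84, 85, 87, 108, 111, 112, 114]

small = histogram(data, width=10, start=40)
large = histogram(data, width=40, start=40)
print(small, count_modes(small))  # [1, 4, 0, 1, 6, 0, 1, 3] 3
print(large, count_modes(large))  # [6, 10] 1
```

With the small binwidth the made-up data show three groups; with the large binwidth the same data collapse to a single mode, which mirrors the contrast between cases B and C.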
Buffalo Snowfall Data, IV • What’s under the hood (how to do this): • Tools → Data Analysis → Histogram (& Chart Output) (may need Data Analysis “Add-in”) • Massage pic (especially bar width) • Sigma min, max • Bin range: create first two & drag • Histogram, using input bin edges
Histogram HW HW: 1.21 • Use Excel and histograms • Get data from CDrom • Do both: • Excel Default bins • Bins set to: 0,10,20,…,240 • Which gives answers closer to answers in back of book? • Turn in only one page
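The "create first two & drag" step for the bin range amounts to continuing an evenly spaced sequence. A minimal Python sketch, for illustration only (the homework itself is to be done in EXCEL), that builds the edges 0, 10, 20, …, 240; the function name `extend_bin_edges` is a hypothetical helper.

```python
def extend_bin_edges(first, second, last):
    """Continue the pattern set by the first two edges, up to `last`."""
    step = second - first
    if step <= 0:
        raise ValueError("second edge must be larger than the first")
    edges = []
    edge = first
    while edge <= last:
        edges.append(edge)
        edge += step
    return edges

edges = extend_bin_edges(0, 10, 240)
print(len(edges), edges[:3], edges[-1])  # 25 [0, 10, 20] 240
```

Twenty-five edges define twenty-four bins of width 10.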
Histogram Binwidths Nice example from Webster West, U.S.C.: http://www.stat.sc.edu/~west/applets/histogram.html Control Binwidth with slider: • Undersmoothing? • About right? • Oversmoothing? (critical to visual impression)
Histogram Binwidth Example Hidalgo Stamp Data From Mexico in 1800s How many sources of paper? How many modes: 1, 2, 5, 7, 10?
Histogram Binwidth Example How many modes? Caution: Answer depends on binwidth (a serious and current statistical research problem)
Stamps Data Histogram How many modes? 2nd Caution: Answer also depends on bin location (i.e. “shift” of bins)
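The bin location caution can be illustrated with a small Python sketch. The data are made up for the purpose (they are NOT the Hidalgo stamp data), and `shifted_counts` is a hypothetical helper name. The binwidth stays at 10; only the starting point of the first bin changes.

```python
def shifted_counts(data, width, start):
    """Counts per bin; bin i covers [start + i*width, start + (i+1)*width).

    Assumes start <= min(data).
    """
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

# Hypothetical values chosen so that two clusters straddle bin edges.
data = [8, 9, 11, 12, 28, 29, 31, 32]

unshifted = shifted_counts(data, width=10, start=0)
shifted = shifted_counts(data, width=10, start=5)
print(unshifted)  # [2, 2, 2, 2]  looks flat
print(shifted)    # [4, 0, 4]     two clear groups
```

Same data and same binwidth, yet one choice of bin location shows a flat histogram and the other shows two separated groups.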
Histogram Bins For this course: Try several binwidths, to “get the idea” Weakness of EXCEL (we will see several): This is inconvenient
Comparison of Histograms Class Example: Study Habits Data Idea: Compare Study Habits of Males vs. Females (measured by some “survey score”, perhaps of questionable value?) https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg4Done.xls
Study Habits Data EXCEL default histograms: • Populations look similar??? • Careful: Binwidth very big… • Careful: Different bin ranges… • Need smaller binwidths, and common scales
Study Habits Data Better Choice: Binwidths = 10, same bins for both • Clear difference, easy to see • Females higher “on average” • Males are “more spread” • 1 “exceptional value”, really true???
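One way to get "same bins for both" is to compute the bin edges from the two groups combined and then count each group against those shared edges. The Python sketch below is illustrative only: the scores are made up (they are NOT the study habits data), and `common_edges` and `counts_in` are hypothetical helper names. The binwidth of 10 is taken from the slide.

```python
import math

def common_edges(group_a, group_b, width):
    """Evenly spaced edges, on multiples of `width`, covering both groups."""
    combined = group_a + group_b
    lo = math.floor(min(combined) / width) * width
    hi = math.ceil(max(combined) / width) * width
    if hi == lo:                      # all values identical and on an edge
        hi = lo + width
    n_bins = round((hi - lo) / width)
    return [lo + i * width for i in range(n_bins + 1)]

def counts_in(data, edges):
    """Counts per bin; the last bin also includes its right edge."""
    width = edges[1] - edges[0]
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - edges[0]) // width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical survey scores, NOT the class data.
females = [112, 125, 131, 138, 144, 152]
males = [71, 95, 108, 121, 137, 163]

edges = common_edges(females, males, width=10)
female_counts = counts_in(females, edges)
male_counts = counts_in(males, edges)
print(edges[0], edges[-1])  # 70 170
print(female_counts)        # [0, 0, 0, 0, 1, 1, 2, 1, 1, 0]
print(male_counts)          # [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]
```

Because both count lists refer to the same ten bins, the bars can be compared position by position, which is what makes differences in center and spread easy to see.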
Things to look for (in histo’s) • Population Center Point (Study Habits Data) • Population Spread (Study Habits Data) • Shape - Symmetric vs. Skewed Right Skewed: Left Skewed: • Modes - Unexpected clusters • Outliers - “unusual data points”
Comparison of Histograms HW HW: 1.25b, 1.27, 1.29, 1.22 • Work in this order • Get data from CDrom • Use EXCEL and histograms • Odd answers in back • You choose the bins (if you miss something in answers, change this) • Turn in at most one page for each
Plotting Bivariate Data Toy Example: (1,2) (3,1) (-1,0) (2,-1)
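To show where the four toy points sit, here is a minimal text-only sketch in Python (illustration only; the course uses EXCEL's chart tools). The function name `text_scatter` is a hypothetical helper, and the approach only works for integer coordinates like those in the toy example.

```python
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]   # the toy example from the slide

def text_scatter(points):
    """Return rows of text with '*' at each point; integer coordinates only."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):      # top row = largest y
        row = ""
        for x in range(min(xs), max(xs) + 1):      # left column = smallest x
            row += "*" if (x, y) in points else "."
        rows.append(row)
    return rows

rows = text_scatter(points)
print("\n".join(rows))
# ..*..
# ....*
# *....
# ...*.
```

Columns run from x = -1 to x = 3 and rows from y = 2 (top) down to y = -1 (bottom).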
Plotting Bivariate Data Sometimes: Can see more insightful patterns by connecting points
Plotting Bivariate Data Sometimes: Useful to switch off points, and only look at lines/curves
Plotting Bivariate Data Common Name: “Scatterplot” A look under the hood: EXCEL: Chart Wizard (colored bar icon) • Chart Type: XY (scatter) • Subtype controls points only, or lines • Later steps similar to above (can massage the pic!)