
Stat 226

Stat 226. Note Set 1. James D. Abbey. Statistics? The age-old question of all courses: What is ______? For our purposes, we need to know what statistics does for us. Perhaps we will have an answer by the end of the course!


Presentation Transcript


  1. Stat 226 Note Set 1 James D. Abbey

  2. Statistics? • The age-old question of all courses: What is ______? For our purposes, we need to know what statistics does for us. Perhaps we will have an answer by the end of the course! • Your text's definition: Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. What does this definition mean to us?!

  3. The inevitable terms • As with any study, we must learn certain terms. A few early ones: • Individuals: The objects described by a set of data • A person, a monkey, a person who acts like a monkey, etc. • Variables: A characteristic of an individual or set of individuals • Purchasing budget, a demographic, etc.

  4. Definitions of definitions! • Now that we know what a variable represents, we need to know what types of variables we will encounter: • Categorical: A categorical variable differentiates individuals into categories • City, suburb or rural locations • Coke or Pepsi? (What about Mountain Dew!) • Quantitative: A quantitative variable has an intuitive numerical value • Budget amounts in dollars, total time to complete a project in days, etc.

  5. Analyzing the variables • We need to see the distribution of our variables. • Raw data? 76,77,78,82,83,84,85,86,86, 88,89,92,93,95,96,96,98,99…. Not easy to see patterns. • Graphical analysis?

  6. Analyzing Distributions • Describing a distribution (exploratory analysis) • Method 1: Look at each variable on its own, then expand to a comparison of multiple variables • Jump straight to the data. Works well for very simple data sets. • Method 2: Start with graphs of the data. Next, examine specific numerical summaries • Use graphs to tame the data and give intuition about variables/topics worth investigating.

  7. Graphical Analysis • Categorical: • Bar graph: One bar per category, with height showing the count or percent in that category • Pareto graph: A reorganized bar graph with categories ordered from greatest to least relative frequency • Pie graph: Slices of pie represent the percent or count for each category • Quantitative: • Bar graph: Use the bar graph to give counts of pre-specified ranges for the quantitative variable • Also known as a histogram
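
The numbers behind a bar graph and a Pareto graph can be sketched in a few lines of Python. This is only an illustration: the `responses` list is made-up data built from the Coke/Pepsi/Mountain Dew example, and the text bars stand in for a real graph.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration).
responses = ["Coke", "Pepsi", "Coke", "Mountain Dew", "Coke", "Pepsi"]

counts = Counter(responses)
total = sum(counts.values())

# Bar graph values: the count and the percent for each category.
percents = {category: 100 * n / total for category, n in counts.items()}

# Pareto ordering: categories sorted from most to least frequent.
pareto_order = [category for category, _ in counts.most_common()]

for category in pareto_order:
    bar = "#" * counts[category]
    print(f"{category:<13}{bar} ({percents[category]:.1f}%)")
```

A pie graph would use the same `percents` values, one slice per category.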

  8. Categorical Graphical Analysis • Bar Graph: • Analyze → Distribution

  9. Categorical Graphical Analysis • Pareto Graph: Graph → Pareto Plot

  10. Categorical Graphical Analysis • Pie Graph: Graph → Chart

  11. Quantitative Graphical Analysis • The Histogram • Steps • 1) Divide the individuals into non-overlapping, equal width groups • 2) Count the number of individuals in each group • 3) Break out the old, dusty drawing skills to draw the histogram (or use a computer!)
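
Steps 1 and 2 above can be sketched in Python as follows. The function name `histogram_counts` and the choice of groups (width 5, starting at 75) are assumptions made for this example; the data are the raw values shown on slide 5, using only the values actually listed there.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width, non-overlapping groups.

    Group i covers start + i*width up to (but not including)
    start + (i+1)*width. Values outside all groups are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Raw data from slide 5 (the slide's list trails off; only shown values are used).
scores = [76, 77, 78, 82, 83, 84, 85, 86, 86, 88, 89, 92, 93, 95, 96, 96, 98, 99]

counts = histogram_counts(scores, start=75, width=5, n_bins=5)

# Step 3, drawn with text instead of dusty drawing skills.
for i, count in enumerate(counts):
    low = 75 + 5 * i
    print(f"{low}-{low + 4}: {'*' * count}")
```

The printed labels such as `75-79` assume whole-number data; each group still has the same width of 5.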

  12. The Histogram • Notes before we see a histogram • The base or horizontal axis (x-axis) contains the pre-defined categories of individuals • The vertical axis (y-axis) contains the count or relative frequency of the individuals within each category • Caution: Avoid histograms with all the individuals in one category (skyscrapers) or with individuals spread among too many categories (the wide blob)

  13. The Histogram • The data: 1,1,1,1,2,2,3,3,3,3,3,4,4,4 • JMP output from Analyze → Distribution • Two versions of the histogram: one good enough, one bad (too wide)
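
To see how the choice of groups changes the picture, here is a rough sketch using the data from this slide. The two group widths (1 and 0.25) are assumptions chosen for illustration, not the settings JMP used on the slide; the second one spreads the individuals over too many categories.

```python
data = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4]


def bin_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width groups beginning at start."""
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts


# Groups of width 1: every group holds some individuals.
unit_bins = bin_counts(data, start=1, width=1, n_bins=4)

# Groups of width 0.25: most groups are empty, which hides the shape.
narrow_bins = bin_counts(data, start=1, width=0.25, n_bins=13)
empty_groups = narrow_bins.count(0)

print(unit_bins)
print(narrow_bins, "empty groups:", empty_groups)
```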

  14. Another Histogram • Histogram of Unemployment Data on pages 10-11 of text (ex 1.3)

  15. Histograms show us what? • Histograms show us three prime attributes • Shape: Symmetric or skewed • Examples shown: Symmetric | Skewed Right | Skewed Left (minor) • Skewed right means the data extends far to the right • Skewed left means the data extends far to the left • Center: Where the data clumps most heavily • Spread: How low and high the data values go • Outliers also fall under spread
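
As a rough numerical companion to these three attributes, the sketch below uses the data from slide 13. Two choices here are assumptions, not something the slides prescribe: the median is used as the measure of center, and comparing the mean with the median is used as a rule-of-thumb hint about skew.

```python
from statistics import mean, median

data = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4]  # data from slide 13

# Center: the median, the middle of the ordered values (an assumed choice).
center = median(data)

# Spread: the lowest and highest data values.
spread = (min(data), max(data))

# Shape hint (rule of thumb only): a long tail pulls the mean toward it,
# so a mean below the median hints at left skew, above it at right skew.
average = mean(data)
if average < center:
    shape_hint = "possible left skew"
elif average > center:
    shape_hint = "possible right skew"
else:
    shape_hint = "roughly symmetric"

print(center, spread, average, shape_hint)
```

The hint is no substitute for looking at the histogram itself.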

  16. Identifying the traits • Unemployment Data (pgs 10-11)

  17. Keeping the data: Stemplots • The stemplot keeps the data visible while giving the benefits of a histogram • However, you should only use stemplots for data sets with fewer than 100 observations • If the data has a large number of digits, you may wish to use only a few significant digits • Ex: 9.54234, 10.12341 become 9.5, 10.1
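
The shortening in the example above can be done with Python's built-in `round`. Whether to round or simply drop the extra digits is not specified on the slide; rounding is assumed here, and both approaches give the same result for these two values.

```python
# Values from the slide, kept to one decimal place before building a stemplot.
raw = [9.54234, 10.12341]
shortened = [round(value, 1) for value in raw]
print(shortened)
```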

  18. The Stemplot • Creating a stemplot • 1) Categorize each individual/observation into • A stem: all but the final digit • The leaf: the final digit • 2) List the stems vertically starting with the smallest value at the top • 3) Place the leaves in the rows within each stem category
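
The three steps can be sketched in Python as below. The function name `stemplot` and the `rates` values are hypothetical, made up for illustration. The sketch assumes non-negative values recorded to one decimal place, and it keeps empty stems between the smallest and largest so gaps in the data stay visible.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows for values recorded to one decimal place.

    The stem is everything but the final digit; the leaf is the final digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # 2.3 -> 23
        stem, leaf = divmod(tenths, 10)   # 23 -> stem 2, leaf 3
        rows[stem].append(leaf)

    lines = []
    # Stems listed vertically, smallest at the top, empty stems included.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


# Hypothetical rates (invented for illustration).
rates = [1.5, 2.0, 2.3, 2.7, 3.1, 3.1, 5.2]
for line in stemplot(rates):
    print(line)
```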

  19. The Stemplot • The unemployment data • The stem is the ones digit while the leaf is the tenths digit • Breakdown: • 1.5% becomes 1|5 • 2.0% becomes 2|0

  20. Expanding the Stemplot • The split-stemplot • The last stemplot had some very long categories (not good) • So, we can split the stems into sub-categories • When split, each stem will appear twice • Ex: • 1.0 to 1.9 becomes 1.0 to 1.4 and 1.5 to 1.9 • 2.0 to 2.9 becomes 2.0 to 2.4 and 2.5 to 2.9 • Etc.
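
A split stemplot can be sketched the same way, with each stem appearing twice: once for leaves 0 to 4 and once for leaves 5 to 9. The function name `split_stemplot` and the `rates` values are hypothetical. The sketch assumes non-negative values recorded to one decimal place and leaves off empty rows before the first and after the last observation.

```python
def split_stemplot(values):
    """Return stemplot rows with each stem split into leaves 0-4 and 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(round(v * 10), 10)
        half = 0 if leaf <= 4 else 1      # 0: leaves 0-4, 1: leaves 5-9
        rows.setdefault((stem, half), []).append(leaf)

    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            # Skip rows that fall before the first or after the last data row.
            if first <= (stem, half) <= last:
                leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
                lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


# Hypothetical rates (invented for illustration).
rates = [1.5, 2.0, 2.3, 2.7, 3.1, 3.1]
for line in split_stemplot(rates):
    print(line)
```

With these values the row for 1.0 to 1.4 is empty and sits before the first observation, so the plot starts at the 1.5 to 1.9 row.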

  21. The split stemplot • Unemployment data split-stem stemplot • The category 1.0 to 1.9 became 1.5 to 1.9 (1.0 to 1.4 was empty) • 2.0 to 2.9 became 2.0 to 2.4 and 2.5 to 2.9

  22. Examining Quantitative Data • Henry Cavendish's measurements of the density of the earth, as a multiple of the density of water. • The ordered data

  23. Cavendish does dirt (cont.) • Step 1? Categorize!

  24. Cavendish plays with dirt (cont) • Step 2? Count the individuals in each category!

  25. Cavendish’s dirt again • Step 3: Graphical analysis. Histogram or stemplot time • The histogram: • Shape: Skewed left • Center: 5.46 • Spread: 4.07 to 5.86 → Is 4.07 an outlier? Perhaps…

  26. Cavendish’s dirt on a stemplot • Time for a stemplot! • What’s wrong with this stemplot?

  27. Cavendish’s dirt on a stemplot • A better stemplot (split stemplot): • Aside from the split of the numbers, what else is different in this stemplot?

  28. An overview of Time Plots • At times, data are based on time. For example: • Stock prices at different times of the day or at closing across days • Temperatures in Ames, Iowa by day across an entire year • Your GPA by semester until you finish college

  29. Time plots defined • Time plot traits: • The x-axis (horizontal) is always some unit of time (days, weeks, semesters, etc.) • The y-axis (vertical) is the variable of interest with an appropriate scale • The customary time plot has connected points to better see trends • If the intervals are consistent, then you have a time series (a fairly complex area of statistics)
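
The traits above can be sketched in Python with a quick check for consistent intervals and a crude text version of a time plot. The helper name `has_consistent_intervals` and the GPA values are hypothetical, invented for illustration.

```python
def has_consistent_intervals(times):
    """Return True when every gap between consecutive time points is equal."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return len(set(gaps)) <= 1


# Hypothetical GPA by semester number (invented for illustration).
semesters = [1, 2, 3, 4, 5]
gpa = [3.1, 3.3, 3.0, 3.4, 3.6]

# Time runs down the page here; in a drawn time plot it runs along the x-axis.
for semester, value in zip(semesters, gpa):
    print(f"semester {semester}: {'*' * round(value * 10)} {value}")

print("consistent intervals:", has_consistent_intervals(semesters))
```

A drawn time plot would also connect the points so that trends are easier to see.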

  30. Time plots (and time series) • In time series, we may find seasonal patterns • Shopping revenues spike during the holiday seasons • Time series may also show trends • Energy prices over the last decade show a general uptrend

  31. Time plots • A time plot of seasonally adjusted unemployment data from 1948 to 1993:

  32. Section 1.1 Summary • Datasets contain: • Individuals: The objects described by the data, including people, prices, or any object of interest • Variables: A characteristic of an individual or set of individuals, such as a budget or salary • Variables: • Categorical: Places individuals into categories, such as undergraduate or graduate student • Quantitative: A variable with a native numerical measure, such as the price of gasoline

  33. Section 1.1 Summary • Exploratory data analysis (graphical): • Categorical: Bar graph, Pareto graph, pie graph • Quantitative: Histogram, stemplot • When examining graphics, look for • Shape, center, and spread, as well as outliers. • Remember to use time plots for data that occur over time (e.g., stock prices at closing over the past month)
