Misconception – Jessica Utts, UC Irvine

Presentation Transcript


  1. Misconception – Jessica Utts, UC Irvine • “Statistics is a boring subject and has little relevance in daily life, so it does not matter if you remember anything you learn about it.” – Anonymous Student (not from Stat 226!)

  2. Statistics affects our lives in ways most people never realize. • Cost of insurance for your car • When new movies will be released • Whether a store has your size in stock • Best way to grow, process, ship and sell food • Risk assessment for a credit card company • Determining if email is spam (i.e. spam filters) • What type of calling plan for your phone will appeal to you and make money for the company • Which incoming students are most likely to succeed at ISU (based on High School GPA, ACT/SAT, etc.)

  3. Chapter 1 Examining Distributions

  4. Introduction • Statistics is the science of collecting, organizing, and interpreting data in the presence of variation • The fact that variation exists is the reason why you are taking this class (in addition to “my advisor told me to”) • Statistics aids us in finding the truth.

  5. Introduction • Steps for Statistical Problem Solving • Question Formulation: Articulate a research question or a hypothesis to be tested • Data Production: Collect defensible and relevant data. • Data Summarization: Graph data and compute numerical summaries. • Statistical Inference: Draw conclusions about how results apply in a broader context. (We will talk about this in later chapters)

  6. Definitions • Measurement: The value of a variable obtained and recorded on an individual • Examples: 145 recorded as a person’s weight; 65 recorded as the height of a tree; “purple” as the color you dyed your dog’s hair • Data is a set of measurements made on a group of individuals

  7. The Three W’s • Any set of data is accompanied by background information that helps us understand the data • Three questions to ask when planning a statistical study or exploring data from someone else’s work 1) Who?: Individuals 2) What?: Variables 3) Why?: Purpose

  8. Definitions • Individuals are the objects described by a set of data • Employees, lab mice, states… • A variable is any characteristic of an individual • Age, salary, weight, location… • The purpose is why you have the data (or why someone is giving you money to do research) • Cure cancer, marketing survey, time machine…

  9. Two Types of Variables • A categorical variable places an individual into one of several groups or categories • A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense

  10. The distribution of a variable tells us what values the variable takes and how often it takes those values • Ex. • The way we visualize the distribution depends on the type of variable...

  11. Recognizing the type • Age: • Gender: • Race: • Salary: • Job Type:

  12. Section 1.1 Displaying Distributions with Graphs

  13. Introduction • Statistical tools and ideas help us examine data in order to describe their main features • This is called exploratory data analysis • We first want to simply describe what we see • Basic strategy to help us organize our exploration • Step 1: Examine variables one by one, • Step 2: Look at the relationships among variables • In both steps, we begin by visualizing (with graphs/pictures), then we focus on specific aspects of the data using the appropriate numerical summaries

  14. Categorical Variables • The values of a categorical variable are labels for the categories • Examples: Gender → male or female; Location → NE, SE, SW, NW or MW • The distribution of a categorical variable lists the categories and gives either the count (frequency) or percent (aka proportion or relative frequency) of individuals who fall into each category • Three types of graphs • Bar, Pie, and Pareto
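
As a minimal sketch of what "the distribution of a categorical variable" means in practice, the Python below tabulates the count and relative frequency of each category. The function name `categorical_distribution` and the region data are hypothetical and do not come from the slides.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, relative_frequency)} for a list of labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical data: region labels for ten individuals (not from the slides).
regions = ["NE", "SE", "SW", "NW", "MW", "NE", "NE", "SE", "MW", "NE"]
dist = categorical_distribution(regions)

for cat, (count, rel) in sorted(dist.items()):
    print(f"{cat}: count={count}, relative frequency={rel:.2f}")
```

The counts add up to the number of individuals and the relative frequencies add up to 1, which is a quick check that every individual landed in exactly one category.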

  15. Example—CEO’s

  16. Summary Table • We summarize categorical data with a table. Note that proportions (percentages written in decimal form) are often called relative frequencies.

  17. Bar Graph • The bar graph quickly compares the degrees of the four groups • The heights of the four bars show the counts for the four degree categories

  18. Pie Chart • The pie chart helps us see what part of the whole each group forms • To make a pie chart, you must include all the categories that make up the whole • What if I don’t have color?

  19. Pareto Chart • A bar graph whose categories are ordered from most frequent to least frequent is called a Pareto chart • Pareto charts identify the “vital few” categories that contain most of our observations • Many categories
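
The ordering behind a Pareto chart can be sketched in a few lines of Python. The cumulative percent column is a common companion to the chart and is added here as an assumption; the function name `pareto_table` and the complaint data are hypothetical.

```python
from collections import Counter

def pareto_table(values):
    """Rows of (category, count, cumulative_percent), most frequent first."""
    ordered = Counter(values).most_common()   # sorted by count, descending
    total = sum(n for _, n in ordered)
    rows, running = [], 0
    for cat, n in ordered:
        running += n
        rows.append((cat, n, 100 * running / total))
    return rows

# Hypothetical data: ten customer complaints by type (not from the slides).
complaints = ["late"] * 5 + ["damaged"] * 3 + ["wrong item"] + ["other"]
rows = pareto_table(complaints)

for cat, n, cum in rows:
    print(f"{cat:<12}{n:>3}{cum:>8.1f}%")
```

In this made-up example the first two categories already account for 80% of the observations, which is the "vital few" pattern the chart is meant to expose.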

  20. Brand Preference Example The distribution:

  21. Graphs from Brand Example

  22. Summary for Categorical Variables • Bar graphs, pie charts, and Pareto charts help an audience grasp a distribution quickly • A bar graph is nearly always preferable to a pie chart. It is easier to compare bars than slices of pie • (although comparing pie is much tastier...) • These graphs are of limited use for data analysis because it is usually easy to understand categorical data on a single variable without a graph

  23. Quantitative Variables: • Graphical Summary • Histogram • Stem plot • Time plot

  24. Creating a Histogram • A histogram is the most common way to graph a quantitative variable • Step 1: • Divide the range of the data into classes of equal width • Be sure to specify the classes precisely so that each individual falls into exactly one class • Step 2: • Count the number of individuals in each class • Step 3: • Draw the histogram by making the heights of the bars for each class equal to the number of individuals that fall in that class

  25. Creating a Histogram • The vertical axis contains the scale of counts (or percents), and each bar represents a class • The base of the bar covers the class, and the bar height is the class count

  26. The bars of a histogram should cover the entire range of values of a variable, with no space between bars unless a class is empty. • When the possible values of a variable have gaps between them, extend the bases of the bars to meet halfway between two adjacent possible values. • Ex. pants sizes

  27. CAUTION!! • A few cautions about choosing classes: • Too few classes will give a skyscraper graph, with all values in a few classes with tall bars • Too many classes will produce a pancake graph, with most classes having one or no observations

  28. Example: Unemployment

  29. Example continued… The classes are: 1.0 ≤ rate < 1.5; 1.5 ≤ rate < 2.0; 2.0 ≤ rate < 2.5; 2.5 ≤ rate < 3.0; 3.0 ≤ rate < 3.5; 3.5 ≤ rate < 4.0; 4.0 ≤ rate < 4.5; 4.5 ≤ rate < 5.0; 5.0 ≤ rate < 5.5; 5.5 ≤ rate < 6.0; 6.0 ≤ rate < 6.5
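
Counting observations into the eleven classes listed above (Step 2 of creating a histogram) can be sketched as follows. The unemployment data are not reproduced in this transcript, so the `rates` list is hypothetical, as is the function name `class_counts`. Each class includes its lower bound and excludes its upper bound, so every value falls into exactly one class.

```python
import math

def class_counts(data, start, width, n_classes):
    """Count observations in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = math.floor((x - start) / width)   # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[k] += 1
    return counts

# Hypothetical unemployment rates in percent (not the data from the slides).
rates = [1.2, 2.5, 2.7, 3.0, 3.4, 3.4, 3.9, 4.1, 4.6, 6.2]
counts = class_counts(rates, start=1.0, width=0.5, n_classes=11)

# Step 3, drawn sideways as text: one '#' per individual in the class.
for k, c in enumerate(counts):
    low = 1.0 + 0.5 * k
    print(f"{low:.1f} <= rate < {low + 0.5:.1f} | {'#' * c}")
```

Floating-point division can misplace a value that sits exactly on a class boundary when the width is not exactly representable in binary; a width of 0.5 avoids that problem here.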

  30. Interpreting Histograms • The purpose of the graph is to help us understand the data • After you make a graph, always ask, “What do I see?” • Once you have displayed a distribution, you can see its important features

  31. Interpreting Histograms • Examining a distribution • You can describe the overall pattern of a histogram by its Shape, Center, and Spread • An important kind of deviation is an outlier, an individual value that falls outside the overall pattern • Concentrate on the main features • look for rough symmetry or clear skewness • look for major peaks • look for clear outliers

  32. Shapes • Symmetric: the right and left sides are approximately mirror images of each other • Skewed right: the right side extends farther out than the left side • Skewed left: the left side extends farther out than the right side

  33. …Shape • Some types of data regularly produce distributions that are symmetric or skewed • symmetric example: • IQ scores • right skewed example: • Income • left skewed example: • year found on a penny • Some types of data produce distributions that are neither symmetric nor skewed • Stat 226 Test scores…

  34. Center and Spread • Center • For now, we can describe the center by its midpoint • Where are most of the values found? • Spread • For now, we can describe the spread by giving the smallest and largest values
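
A minimal sketch of these two provisional descriptions, reading "midpoint" as the median (an assumption; the slide does not define the term) and spread as the smallest and largest values. The data are hypothetical.

```python
import statistics

# Hypothetical unemployment rates in percent (not the data from the slides).
rates = [2.5, 3.0, 3.4, 4.1, 6.2]

center = statistics.median(rates)      # midpoint: half the values lie on each side
spread = (min(rates), max(rates))      # smallest and largest values

print(f"center = {center}, spread = {spread[0]} to {spread[1]}")
```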

  35. Unemployment revisited… • Shape • Center • Spread • Outliers

  36. Stem (and leaf) plots • For small data sets (fewer than 100 observations), a stemplot is quicker to make and presents more detailed information about a quantitative variable • When the observed values have many digits, it is best to round the numbers to just a few digits before making a stemplot

  37. Stem (and leaf) plots • Step 1--Order the data from least to greatest • Step 2--Separate each observation into a stem, consisting of all digits except the final one, and a leaf, the final digit • Step 3--Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column • Step 4--Write each leaf in the row to the right of its stem, in increasing order out from the stem • Step 5--Label
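
The five steps above can be sketched in Python for non-negative integers. The function name `stemplot` and the data are hypothetical; values with decimals would first be rounded and rescaled so that the leaf is a single digit.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers; the leaf is the final digit."""
    leaves = defaultdict(list)
    for v in sorted(values):              # Step 1: order the data
        stem, leaf = divmod(v, 10)        # Step 2: split off the final digit
        leaves[stem].append(leaf)
    lines = []
    # Step 3: stems in a column, smallest on top, keeping empty stems.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 4: leaves in increasing order out from the stem.
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical data (not from the slides).
data = [12, 25, 27, 30, 34, 34, 39, 41, 46, 62]
lines = stemplot(data)

print("\n".join(lines))
print("Key: 3 | 4 means 34")              # Step 5: label
```

Empty stems are kept on purpose so that gaps in the data stay visible in the plot.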

  38. Unemployment Rate: Stemplot:

  39. You can also split stems to double the number of stems when all the leaves would otherwise fall on just a few stems • each stem then appears twice • leaves 0 to 4 go on the upper stem • leaves 5 to 9 go on the lower stem • The greater number of stems might give a clearer picture of the distribution
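
Splitting stems changes only how leaves are grouped: each stem gets an upper row for leaves 0 to 4 and a lower row for leaves 5 to 9. A sketch under the same assumptions as an ordinary stemplot (non-negative integers, single-digit leaves), with a hypothetical function name and hypothetical data:

```python
def split_stemplot(values):
    """Stemplot lines with each stem shown twice: leaves 0-4, then leaves 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # The key records the stem and whether the leaf belongs on the lower row.
        rows.setdefault((stem, leaf >= 5), []).append(leaf)
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for lower in (False, True):       # upper row first, then lower row
            row = "".join(str(d) for d in rows.get((stem, lower), []))
            lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical data whose leaves would otherwise sit on just two stems.
data = [30, 31, 34, 34, 35, 37, 39, 40, 41, 42, 46, 48]
lines = split_stemplot(data)
print("\n".join(lines))
```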

  40. …Unemployment split-stem stemplot

  41. Histogram vs. Stemplot

  42. Histogram vs. Stemplot • Advantages • Stemplot: Able to see individual data values • Histogram: Quicker and neater with a large data set • Disadvantages • Stemplot: With a large number of data points, the plot quickly becomes messy • Histogram: Not able to see individual data values

  43. Complete Histogram Example • Henry Cavendish (1798) • 29 measurements of the density of the earth as a multiple of the density of water • Ordered data:

  44. Step 2 • Class Count • 1) 1 • 2) 0 • 3) 0 • 4) 1 • 5) 1 • 6) 13 • 7) 9 • 8) 4
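
A quick check of the class counts above: they should account for all 29 measurements. The sketch below also converts the counts to percents and draws a text version of the histogram. The class boundaries are not given in this transcript, so classes are identified only by number.

```python
# Class counts exactly as listed in Step 2 (classes 1 through 8).
counts = [1, 0, 0, 1, 1, 13, 9, 4]

total = sum(counts)                                   # should match the 29 measurements
percents = [round(100 * c / total, 1) for c in counts]

for k, (c, p) in enumerate(zip(counts, percents), start=1):
    print(f"class {k}: {'#' * c:<13} {c:>2} ({p}%)")
```

Classes 6 and 7 together hold 22 of the 29 measurements, and the single value in class 1 is separated from the rest by two empty classes.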

  45. Step 3 • Shape: • Center: • Spread: • Outliers: • Histogram:

  46. Regular Stem plot

  47. Split-Stem plot (Notice: These values have been rounded to one decimal place)

  48. Time Plots • Many variables are measured at intervals over time: --Closing stock prices (each day) --Number of hurricanes (each year) --Unemployment rates
