
Statistics 100 Lecture Set 5


Presentation Transcript


  1. Statistics 100 Lecture Set 5

  2. Read chapters 8, 9 and 10 • Suggested problems: • Chapter 8: 8.3, 8.9, 8.11, 8.17, 8.19, 8.23, 8.25 • Chapter 9: 9.3, 9.7, 9.15, 9.19, 9.23, 9.27 • Chapter 10: 10.3, 10.5, 10.11, 10.23, 10.25

  3. Measurement • Important question facing researchers: what data should I collect? • The text gives the example of a study that tries to answer the question “Are people with larger brains more intelligent?” What to do?

  4. Measurement • Question for your instructors: How do you measure how much a student knows? • Hard to do

  5. Measurement • Measurement

  6. Measurement • So, would like the measurements you take to inform the problem at hand • What good are surveys and experiments when people often can’t measure the property they want to learn about?

  7. How is the variable defined? • Example: Citizens for Public Justice released a report Bearing the Brunt: How the 2008-2009 Recession Created Poverty for Canadian Families • What is poverty and how is it measured?

  8. Valid or invalid measurement? • Measurement is valid if it is a relevant representation of the property under study • Example: Number of alcohol-related deaths in BC since the proliferation of private liquor stores • Example: Using number of deaths to measure severity of a hurricane/typhoon • Example: Measuring the amount of snow to describe the severity of the winter

  9. Valid or invalid measurement? • Valid or Invalid? • Sometimes the problem is a matter of scaling: • “Most accidents occur within 10 km of home” • Supposing this is true…why? • Counts are frequently influenced by how much

  10. Accuracy and Precision • If you want to know your weight, what do you do? • Does it always give you the right answer? • Different scales • Different clothes • Recent meals • Recent…un-meals • “Random” variation

  11. Accuracy and Precision • Measurement = truth + bias + random error
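
The model on this slide can be simulated. The following is a minimal sketch, not part of the original lecture: the true weight, the bias and the size of the random error are all hypothetical values chosen for illustration. Bias shifts every reading the same way (an accuracy problem), while random error scatters the readings (a precision problem).

```python
import random
import statistics

def measure(truth, bias, error_sd, rng):
    """Return one simulated reading: truth + bias + random error."""
    return truth + bias + rng.gauss(0, error_sd)

rng = random.Random(100)  # fixed seed so the sketch is repeatable
truth = 70.0      # hypothetical true weight in kg
bias = 1.5        # hypothetical scale that reads 1.5 kg heavy
error_sd = 0.4    # hypothetical size of the random error in kg

readings = [measure(truth, bias, error_sd, rng) for _ in range(1000)]

# Bias shows up as a shift in the average reading (accuracy).
estimated_bias = statistics.mean(readings) - truth
# Random error shows up as spread around that average (precision).
estimated_spread = statistics.stdev(readings)

print(f"average reading: {statistics.mean(readings):.2f} kg")
print(f"estimated bias:  {estimated_bias:.2f} kg")
print(f"spread (SD):     {estimated_spread:.2f} kg")
```

In practice the truth is unknown, so bias cannot be read off this way; the sketch only separates the two components because the simulation sets them.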

  12. Accuracy and Precision • How would you decrease variability?
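
One common answer is to take several measurements and average them. The sketch below is an illustration under assumed values (a hypothetical true weight and error size, and no bias); it compares the spread of single readings with the spread of averages of four readings.

```python
import random
import statistics

rng = random.Random(5)          # fixed seed so the sketch is repeatable
truth, error_sd = 70.0, 0.4     # hypothetical values; no bias in this sketch

def average_of(n):
    """Average of n simulated readings of the same quantity."""
    return statistics.mean([truth + rng.gauss(0, error_sd) for _ in range(n)])

single = [average_of(1) for _ in range(2000)]
avg_of_4 = [average_of(4) for _ in range(2000)]

sd_single = statistics.stdev(single)
sd_avg4 = statistics.stdev(avg_of_4)

print(f"SD of single readings:  {sd_single:.3f}")
print(f"SD of averages of four: {sd_avg4:.3f}")
```

Averaging four readings roughly halves the spread here. It does nothing about bias: a scale that reads heavy still reads heavy on average.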

  13. Chapter 9 • Basically, this chapter deals with thinking about what you are being told

  14. Chapter 9 • Interpreting numbers properly is hard. Why?

  15. Chapter 9 • How should you approach reported numbers and summaries?

  16. Chapter 9 • Consider a Coquitlam Now article on Alcohol related deaths: • “A recent study linking the number of alcohol-related deaths and illnesses to the proliferation of privately run liquor stores has those in the labour movement calling on the province for answers.”

  17. Chapter 9 • Consider the Coquitlam Now article on Alcohol related deaths: • “A recent study linking the number of alcohol-related deaths and illnesses to the proliferation of privately run liquor stores has those in the labour movement calling on the province for answers.” • During the study period, the number of private stores jumped from 727 in 2003 to 977 in 2008, while the number of government stores dropped from 222 in 2003 to 199 in 2008. There was little to no increase in the amount of bars and restaurants during that time, though the number of alcohol-related deaths rose: 1,937 in 2003; 1,983 in 2004; 2,016 in 2005; 2,086 in 2006; 2,074 in 2007; and 2,011 in 2008.
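
A quick way to put the quoted numbers in context is to turn them into percent changes. This sketch uses only the figures quoted above; the helper name `pct_change` is just an illustrative choice.

```python
# Figures as quoted in the article excerpt above.
private_stores = {2003: 727, 2008: 977}
government_stores = {2003: 222, 2008: 199}
deaths = {2003: 1937, 2004: 1983, 2005: 2016,
          2006: 2086, 2007: 2074, 2008: 2011}

def pct_change(old, new):
    """Percent change from old to new."""
    return 100 * (new - old) / old

total_2003 = private_stores[2003] + government_stores[2003]
total_2008 = private_stores[2008] + government_stores[2008]

private_change = pct_change(private_stores[2003], private_stores[2008])
total_change = pct_change(total_2003, total_2008)
deaths_change = pct_change(deaths[2003], deaths[2008])
peak_year = max(deaths, key=deaths.get)

print(f"private stores: {private_change:+.1f}%")   # +34.4%
print(f"all stores:     {total_change:+.1f}%")     # +23.9%
print(f"deaths:         {deaths_change:+.1f}%")    # +3.8%
print(f"deaths peaked in {peak_year}")             # 2006
```

The number of stores rose far faster than the number of deaths, and deaths fell after 2006 while the store count was still higher than in 2003. These are raw counts; the excerpt gives no population figures, so rates per person cannot be computed from it.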

  18. Chapter 9 • Comparisons: • Look for key phrases • One of the best… • No one has more/better/lower… • Compared to a leading brand…

  19. Example • If the presenter stands to gain, expect bias

  20. Chapter 9 • Other ways to mislead:

  21. Example • From a NY Times editorial about immigrant children: • Immigrant children lagged in mastering standard academic English, the passport to college and to brighter futures. Whereas native-born children’s language skills follow a bell curve, immigrants’ children were crowded in the lower ranks: More than three-quarters of the sample scored below the 85th percentile in English proficiency.
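
The quoted figure is worth checking against what a percentile means. The sketch below assumes the percentile is defined relative to a bell-shaped reference group (the excerpt does not say which group), and the mean and spread of the simulated scores are hypothetical.

```python
import random
import statistics

rng = random.Random(85)  # fixed seed so the sketch is repeatable

# Hypothetical bell-shaped scores for a reference group.
scores = [rng.gauss(100, 15) for _ in range(10_000)]

# 85th percentile of the reference group: the 85th of the 99 cut points.
cutoff = statistics.quantiles(scores, n=100)[84]

share_below = sum(s < cutoff for s in scores) / len(scores)
print(f"share of reference group below its 85th percentile: {share_below:.2f}")
```

By definition about 85% of the reference group scores below its own 85th percentile, so "more than three-quarters below the 85th percentile" does not by itself show crowding in the lower ranks.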

  22. Example • 10-day weather forecast for Seattle • “Is this right?” • “What do they want me to conclude?” • “Are there other explanations for this besides what they want me to think?” • “Is the reporting incomplete?”

  23. Descriptive Statistics • Statistics deals with tools for collecting and understanding data • Have discussed ways to collect data • How do we deal with the data we collect? • Begin by summarizing the data • Want to describe or summarize data in a clear and concise way • Will first focus on descriptive statistics (graphical and numeric) in chapters 10-14

  24. Recall … • Interested in something about a population • Population is a collection of individuals • Describe individuals with data • Data sets contain information/facts relating to individuals • Variables are attributes of an individual (e.g., hair color, pain severity, ...) • Distribution of a variable gives the values the variable can take and how often it takes on each value

  25. Types of Variables • Two types of variables: • Categorical Variables: each individual falls into a category (ethnicity, machine works or does not, …) • A special type of categorical data is ordered categorical (ordinal) • Categories are ordered in a natural way • Can apply ideas of >, < (ordering) • Quantitative Variables take on numeric values for which addition and averaging make sense (height, weight, income,…).
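
A small sketch of which operations make sense for each type. All data values here are made up for illustration, and the ordering of pain levels is an assumed example of an ordinal scale.

```python
import statistics
from collections import Counter

# Categorical: each individual falls into a category; counting is meaningful.
machine_status = ["works", "works", "fails", "works"]
status_counts = Counter(machine_status)

# Ordered categorical (ordinal): categories have a natural order,
# so comparisons with < and > make sense, but averaging does not.
PAIN_ORDER = ["none", "mild", "moderate", "severe"]

def pain_rank(level):
    """Position of a pain level in the natural ordering."""
    return PAIN_ORDER.index(level)

is_worse = pain_rank("severe") > pain_rank("mild")

# Quantitative: numeric values for which addition and averaging make sense.
heights_cm = [160.0, 172.5, 181.0, 168.5]
mean_height = statistics.mean(heights_cm)

print(status_counts)   # counts per category
print(is_worse)        # True
print(mean_height)     # 170.5
```

Numeric codes do not make a variable quantitative: if categories are merely labelled 1, 2, 3, the average of the labels has no meaning.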

  26. Types of Variables – Which type? • Hair color: • Color preference (red=1, blue=2, green=3): • Length of time slept: • Height of an individual: • Level of education (Some HS, HS Grad, Some post HS, Associate’s Degree, Bachelor’s Degree, Graduate Degree)

  27. Chapter 10: Descriptive Statistics • Want to describe or summarize data in a clear and concise way • Two basic methods: graphical and numerical

  28. Graphical Descriptions of Data • Often, a picture tells the entire story of the data • Have different plots for the different sorts of variables • A graph (or graphic) is any visual display of numbers • The goal of a graph is to • Summarize information from a set of data into a picture that is easy to understand • Help to highlight a specific story or point within the data (sometimes)

  29. Graphical Descriptions of Data • Many ways to do this: • Tables • Pie Charts • Bar Charts • Histograms • Time plots • Line graphs • Scatter-plots • Custom-made graphics • …

  30. Graphical Descriptions of Data • Recall: • Data are values of variables that we observe in a sample • Sample was drawn from a population • We are trying to find out something about the values of the variable in the population

  31. Graphical Descriptions of Data • Distribution of a variable gives the values the variable can take and how often it takes on each value • A population distribution is a distribution for a population of values • Also called a probability distribution • An empirical distribution is a distribution for a sample • We have this information in the data • So in a graph, we use summaries of an empirical distribution to learn about a population distribution

  32. Graphical Descriptions for Categorical Data • For categorical data of any kind, we can summarize the distribution easily: • Identify all of the values the variable can take • Count the number of times each value is observed • Count is often called a frequency • Often compute percentages from the counts • Can display in a table or a chart

  33. Graphical Descriptions for Categorical Data • A table of the distribution is just a list of values and corresponding counts and/or percentages. • Tables are great for detail • Takes time to scan and digest
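
The steps for summarizing a categorical variable can be sketched as follows. The survey responses are hypothetical; only the procedure (list the values, count them, convert counts to percentages) comes from the slides.

```python
from collections import Counter

# Hypothetical responses to "How do you get to campus?"
responses = ["bus", "car", "bike", "car", "walk",
             "car", "bus", "car", "bike", "bus"]

counts = Counter(responses)        # frequency of each observed value
total = sum(counts.values())

# One row per value: (value, count, percent), listed alphabetically.
table = [(value, n, 100 * n / total) for value, n in sorted(counts.items())]

print(f"{'Value':<8}{'Count':>6}{'Percent':>9}")
for value, n, pct in table:
    print(f"{value:<8}{n:>6}{pct:>8.1f}%")
```

The percentages add to 100 because every response falls into exactly one category.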

  34. Bar Charts • Variable values are the category labels (typically placed along the x-axis) • Height of each bar is the count (or percentage) of values falling in that category. • Note bars are the same width! • Usually start axis at zero …. WHY?

  35. Comments • Ordering of categories: • Often done alphabetically. • Not necessarily the best! • Good when there are many bars: categories easy to find • Sometimes done in order of heights • Sometimes called a Pareto Plot • Good for making comparisons among bars. • Individual categories can be hard to locate • Do what makes sense for you and the reader • Start axes at reasonable values … do not try to mislead
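
The two orderings can be compared with a rough text-only bar chart. This is a sketch with hypothetical counts; `text_bar_chart` is an illustrative helper, and each bar's length is proportional to its count, so every bar starts at zero.

```python
# Hypothetical category counts.
counts = {"bus": 3, "car": 4, "bike": 2, "walk": 1}

def text_bar_chart(items):
    """Draw one row per (label, count); bar length equals the count."""
    return "\n".join(f"{label:<5}| {'#' * n} {n}" for label, n in items)

# Alphabetical: categories are easy to find.
alphabetical = sorted(counts.items())

# By height (Pareto-style): easy to compare bars, largest first.
by_height = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print("Alphabetical order")
print(text_bar_chart(alphabetical))
print()
print("Ordered by height")
print(text_bar_chart(by_height))
```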

  36. Example (retirement savings) • A USA Today (Jan. 4, 2000) poll asked Americans who earn $35,000 or less how they expected to accumulate a $500,000 retirement nest-egg. • The results are summarized in the frequency table below:

  37. Pie Charts • Variable values are the category labels • Each category must appear on the plot • Percentage of the pie’s area covered by a slice is the relative frequency (or percent) of values falling in that category. • Can easily see percentage for each category • Note: Less flexible than bar chart
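
Since each slice's share of the pie equals the category's relative frequency, the slice angles follow directly from the counts. A minimal sketch with hypothetical counts:

```python
# Hypothetical category counts.
counts = {"bus": 3, "car": 4, "bike": 2, "walk": 1}
total = sum(counts.values())

# For each category: (relative frequency, slice angle in degrees).
slices = {label: (n / total, 360 * n / total) for label, n in counts.items()}

for label, (rel_freq, angle) in slices.items():
    print(f"{label:<5} {rel_freq:>5.0%} {angle:>6.1f} degrees")

total_angle = sum(angle for _, angle in slices.values())
```

The angles add to 360 degrees only when every category is included, which is why each category must appear on the plot.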

  38. Comments • Bar charts are more flexible than pie charts • Frequencies of categories are easier to compare in a bar chart than in a pie chart • Comparisons between datasets are easier using a bar chart than a pie chart • A pie chart must be labeled with the same data as the table to allow precise comparisons

  39. Comments

  40. Plots for Quantitative Variables • Can summarize quantitative data using plots • Most common plots: time-plots, histograms and box-plots • Will introduce box-plots later

  41. Time-plots (line graph) • If measuring a variable across time, plot against time • That is, plot your observations on the y-axis versus the time on the x-axis

  42. Apple stock prices in the past year

  43. Which stock would you buy?

  44. More Comments • Include in a graph only things that describe the data • Beware missing axis labels • Beware moving axis labels • See Example 6/Figure 10.7 in book for great example of messing with the axes to tell different stories • Graph is a compromise between summary and detail
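
The effect of moving an axis can be put in numbers. This sketch uses two hypothetical values and an illustrative helper, `drawn_height_ratio`, which gives how many times taller the second bar is drawn than the first for a given axis starting point.

```python
def drawn_height_ratio(a, b, axis_start):
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Hypothetical values that differ by 4%.
a, b = 50, 52

honest = drawn_height_ratio(a, b, axis_start=0)
truncated = drawn_height_ratio(a, b, axis_start=49)

print(f"axis starts at 0:  second bar drawn {honest:.2f} times as tall")
print(f"axis starts at 49: second bar drawn {truncated:.2f} times as tall")
```

With the axis starting at zero, the drawn ratio matches the real ratio of 1.04. Starting the axis at 49 makes the same 4% difference look like a threefold difference.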

  45. Example • http://www.excelcharts.com/blog/minard-tufte-kosslyn-godin-napoleon/

  46. Example • http://www.excelcharts.com/blog/minard-tufte-kosslyn-godin-napoleon/
