Statistics 100 Lecture Set 5
Read chapters 8, 9 and 10 • Suggested problems: • Chapter 8: 8.3, 8.9, 8.11, 8.17, 8.19, 8.23, 8.25 • Chapter 9: 9.3, 9.7, 9.15, 9.19, 9.23, 9.27 • Chapter 10: 10.3, 10.5, 10.11, 10.23, 10.25
Measurement • Important question facing researchers: what data should I collect? • Text gives example of wanting to do a study attempting to address the question “Are people with larger brains more intelligent?” What to do?
Measurement • Question for your instructors: How do you measure how much a student knows? • Hard to do
Measurement • So, would like the measurements you take to inform the problem at hand • What good are surveys and experiments when people often can’t measure the property they want to learn about?
How is the variable defined? • Example: Citizens for Public Justice released a report Bearing the Brunt: How the 2008-2009 Recession Created Poverty for Canadian Families • What is poverty and how is it measured?
Valid or invalid measurement? • A measurement is valid if it is a relevant representation of the property under study • Example: Number of alcohol-related deaths in BC since the proliferation of private liquor stores • Example: Using number of deaths to measure severity of a hurricane/typhoon • Example: Measuring the amount of snow to describe the severity of the winter
Valid or invalid measurement? • Valid or Invalid? • Sometimes the problem is a matter of scaling: • “Most accidents occur within 10 km of home” • Supposing this is true…why? • Counts are frequently influenced by how much
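The scaling point can be sketched with a small rate calculation. The trip and accident counts below are made-up numbers for illustration only, not data from the lecture or the text. The idea is that a raw count tends to be large wherever exposure is large, so dividing by exposure can tell a different story.

```python
# Hypothetical numbers, invented for illustration only.
# Most driving happens close to home, so most accidents can too.
trips = {"within 10 km of home": 900, "farther than 10 km": 100}
accidents = {"within 10 km of home": 18, "farther than 10 km": 4}

total_accidents = sum(accidents.values())

# Count-based view: share of all accidents in each zone.
shares = {zone: accidents[zone] / total_accidents for zone in accidents}
# Rate-based view: accidents per trip in each zone.
rates = {zone: accidents[zone] / trips[zone] for zone in accidents}

for zone in trips:
    print(f"{zone}: {shares[zone]:.0%} of accidents, {rates[zone]:.3f} accidents per trip")
```

With these invented numbers, most accidents happen near home even though the per-trip rate is lower there.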
Accuracy and Precision • If you want to know your weight, what do you do? • Does it always give you the right answer? • Different scales • Different clothes • Recent meals • Recent…un-meals • “Random” variation
Accuracy and Precision • Measurement = truth + bias + random error
Accuracy and Precision • How would you decrease variability?
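One common answer is to take several measurements and average them. The simulation below is a minimal sketch of that idea under the model measurement = truth + bias + random error. The true weight, the bias, and the size of the random error are all assumed values chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_WEIGHT = 70.0   # hypothetical true value, in kg
BIAS = 0.5           # hypothetical: the scale reads 0.5 kg heavy
NOISE_SD = 1.0       # hypothetical spread of the random error

def measure():
    """One reading: truth + bias + random error."""
    return TRUE_WEIGHT + BIAS + random.gauss(0, NOISE_SD)

def average_of(n):
    """Average of n independent readings."""
    return statistics.mean(measure() for _ in range(n))

singles = [measure() for _ in range(2000)]
averages = [average_of(10) for _ in range(2000)]

sd_single = statistics.stdev(singles)
sd_average = statistics.stdev(averages)
mean_of_averages = statistics.mean(averages)

print(f"spread of single readings:  {sd_single:.2f}")
print(f"spread of 10-reading means: {sd_average:.2f}")
print(f"center of the averages:     {mean_of_averages:.2f} (truth is {TRUE_WEIGHT})")
```

In this sketch averaging shrinks the random error, but the averages still center on truth + bias. Averaging improves precision, not accuracy.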
Chapter 9 • Basically, this chapter deals with thinking about what you are being told
Chapter 9 • Interpreting numbers properly is hard. Why?
Chapter 9 • How should you approach reported numbers and summaries?
Chapter 9 • Consider a Coquitlam Now article on alcohol-related deaths: • “A recent study linking the number of alcohol-related deaths and illnesses to the proliferation of privately run liquor stores has those in the labour movement calling on the province for answers.”
Chapter 9 • The same article reports: • During the study period, the number of private stores jumped from 727 in 2003 to 977 in 2008, while the number of government stores dropped from 222 in 2003 to 199 in 2008. There was little to no increase in the amount of bars and restaurants during that time, though the number of alcohol-related deaths rose: 1,937 in 2003; 1,983 in 2004; 2,016 in 2005; 2,086 in 2006; 2,074 in 2007; and 2,011 in 2008.
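One way to approach the reported numbers is to put them on a common scale. The sketch below computes percent changes from the figures quoted in the article; the helper name `percent_change` is ours. It checks arithmetic only. It says nothing about cause, and it does not adjust for changes in population over the period.

```python
# Figures as quoted in the article excerpt above.
deaths = {2003: 1937, 2004: 1983, 2005: 2016, 2006: 2086, 2007: 2074, 2008: 2011}
private_stores = {2003: 727, 2008: 977}
government_stores = {2003: 222, 2008: 199}

def percent_change(old, new):
    """Percent change from old to new."""
    return 100 * (new - old) / old

deaths_change = percent_change(deaths[2003], deaths[2008])
private_change = percent_change(private_stores[2003], private_stores[2008])
government_change = percent_change(government_stores[2003], government_stores[2008])
peak_year = max(deaths, key=deaths.get)  # year with the highest count

print(f"deaths, 2003 to 2008:            {deaths_change:+.1f}%")
print(f"private stores, 2003 to 2008:    {private_change:+.1f}%")
print(f"government stores, 2003 to 2008: {government_change:+.1f}%")
print(f"year with the most deaths:       {peak_year}")
```

Private stores rose by about a third while deaths rose by under 4 percent, and the count peaked in 2006 and then fell. Whether "rose" is a fair summary is the kind of question this chapter asks.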
Chapter 9 • Comparisons: • Look for key phrases • One of the best… • No one has more/better/lower… • Compared to a leading brand…
Example • If the presenter stands to gain, expect bias
Chapter 9 • Other ways to mislead:
Example • From a NY Times editorial about immigrant children: • Immigrant children lagged in mastering standard academic English, the passport to college and to brighter futures. Whereas native-born children’s language skills follow a bell curve, immigrants’ children were crowded in the lower ranks: More than three-quarters of the sample scored below the 85th percentile in English proficiency.
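The percentile claim can be checked against the definition. By definition, about 85% of a reference group scores below that group's 85th percentile. The sketch below uses a made-up reference group of 1,000 scores (the mean of 100 and spread of 15 are assumptions) and assumes the editorial's percentile refers to such a reference group.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical reference group of 1,000 test scores, invented for illustration.
scores = sorted(random.gauss(100, 15) for _ in range(1000))

# Score at the 85th percentile: 85% of the group falls below it.
cutoff = scores[len(scores) * 85 // 100]
share_below = sum(s < cutoff for s in scores) / len(scores)

print(f"share of the reference group below its 85th percentile: {share_below:.0%}")
```

Under that assumption, "more than three-quarters below the 85th percentile" does not by itself show crowding in the lower ranks, since 85% of an ordinary group is below that mark too.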
Example • 10-day weather forecast for Seattle • “Is this right?” • “What do they want me to conclude?” • “Are there other explanations for this besides what they want me to think?” • “Is the reporting incomplete?”
Descriptive Statistics • Statistics deals with tools for collecting and understanding data • Have discussed ways to collect data • How do we deal with the data we collect? • Begin by summarizing the data • Want to describe or summarize data in a clear and concise way • Will first focus on descriptive statistics (graphical and numeric) in chapters 10-14
Recall … • Interested in something about a population • Population is a collection of individuals • Describe individuals with data • Data sets contain information/facts relating to individuals • Variables are attributes of an individual (e.g., hair color, pain severity, ...) • Distribution of a variable gives the values the variable can take and how often it takes on each value
Types of Variables • Two types of variables: • Categorical Variables: each individual falls into a category (ethnicity, machine works or does not, …) • A special type of categorical data is ordered categorical (ordinal) • Categories are ordered in a natural way • Can apply ideas of >, < (ordering) • Quantitative Variables take on numeric values for which addition and averaging make sense (height, weight, income,…).
Types of Variables – Which type? • Hair color: • Color preference (red=1, blue=2, green=3): • Length of time slept: • Height of an individual: • Level of education (Some HS, HS Grad, Some post HS, Associate’s Degree, Bachelor’s Degree, Graduate Degree)
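A hint for the color-preference item: numeric codes do not make a variable quantitative. The sketch below uses six made-up responses and two codings, the one on the slide and an equally arbitrary relabeling, to show that the "average" depends on the labels chosen.

```python
import statistics

# Hypothetical responses from six people, invented for illustration.
responses = ["red", "green", "green", "blue", "red", "green"]

coding_a = {"red": 1, "blue": 2, "green": 3}   # the coding on the slide
coding_b = {"green": 1, "red": 2, "blue": 3}   # an equally arbitrary relabeling

mean_a = statistics.mean(coding_a[r] for r in responses)
mean_b = statistics.mean(coding_b[r] for r in responses)
most_common = statistics.mode(responses)  # does not depend on the coding

print(f"mean under coding A: {mean_a:.2f}")
print(f"mean under coding B: {mean_b:.2f}")
print(f"most common answer:  {most_common}")
```

The mean changes when the labels change, so averaging is not meaningful here. The most common category is the same under either coding.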
Chapter 10: Descriptive Statistics • Want to describe or summarize data in a clear and concise way • Two basic methods: graphical and numerical
Graphical Descriptions of Data • Often, a picture tells the entire story of the data • Have different plots for the different sorts of variables • A graph (or graphic) is any visual display of numbers • The goal of a graph is to • Summarize information from a set of data into a picture that is easy to understand • Help to highlight a specific story or point within the data (sometimes)
Graphical Descriptions of Data • Many ways to do this: • Tables • Pie Charts • Bar Charts • Histograms • Time plots • Line graphs • Scatter-plots • Custom-made graphics • …
Graphical Descriptions of Data • Recall: • Data are values of variables that we observe in a sample • Sample was drawn from a population • We are trying to find out something about the values of the variable in the population
Graphical Descriptions of Data • Distribution of a variable gives the values the variable can take and how often it takes on each value • A population distribution is a distribution for a population of values • Also called a probability distribution • An empirical distribution is a distribution for a sample • We have this information in the data • So in a graph, we use summaries of an empirical distribution to learn about a population distribution
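The relationship between the two kinds of distribution can be sketched by simulation. The population proportions below are assumed values, and `empirical_distribution` is a helper name introduced for this example. In practice the population distribution is unknown and only the sample is available.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population distribution of a categorical variable (invented).
population = {"A": 0.5, "B": 0.3, "C": 0.2}

def empirical_distribution(n):
    """Draw a sample of size n and return the observed proportions."""
    sample = random.choices(list(population), weights=list(population.values()), k=n)
    counts = Counter(sample)
    return {value: counts[value] / n for value in population}

small = empirical_distribution(20)
large = empirical_distribution(20000)

for value, p in population.items():
    print(f"{value}: population {p:.2f}, n=20 sample {small[value]:.2f}, "
          f"n=20000 sample {large[value]:.2f}")
```

The empirical proportions from the larger sample typically sit closer to the population values than those from the small one.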
Graphical Descriptions for Categorical Data • For categorical data of any kind, we can summarize the distribution easily: • Identify all of the values the variable can take • Count the number of times each value is observed • Count is often called a frequency • Often compute percentages from the counts • Can display in a table or a chart
Graphical Descriptions for Categorical Data • A table of the distribution is just a list of values and corresponding counts and/or percentages. • Tables are great for detail • Takes time to scan and digest
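The counting steps can be sketched in a few lines. The hair-color sample below is invented for illustration.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (hair color), invented for illustration.
hair = ["brown", "black", "brown", "blond", "red",
        "brown", "black", "blond", "brown", "black"]

counts = Counter(hair)   # frequency of each observed value
n = len(hair)
percents = {value: 100 * count / n for value, count in counts.items()}

# Print the table: value, count, percentage.
for value, count in counts.most_common():
    print(f"{value:<6} {count:>3} {percents[value]:>5.0f}%")
```

The counts always add up to the sample size, and the percentages to 100 (up to rounding).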
Bar Charts • Variable values are the category labels (typically placed along the x-axis) • Height of each bar is the count (or percentage) of values falling in that category • Note bars are the same width! • Usually start axis at zero …. WHY?
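One way to see why the axis usually starts at zero is to compare how tall the bars are drawn under different starting points. The two counts and the helper `drawn_height` below are invented for illustration.

```python
# Hypothetical counts for two categories, invented for illustration.
counts = {"A": 50, "B": 55}

def drawn_height(value, axis_start):
    """Height of the bar as drawn when the axis begins at axis_start."""
    return value - axis_start

true_ratio = counts["B"] / counts["A"]
ratio_from_zero = drawn_height(counts["B"], 0) / drawn_height(counts["A"], 0)
ratio_from_45 = drawn_height(counts["B"], 45) / drawn_height(counts["A"], 45)

print(f"true ratio B/A:                 {true_ratio:.1f}")
print(f"bar ratio, axis starting at 0:  {ratio_from_zero:.1f}")
print(f"bar ratio, axis starting at 45: {ratio_from_45:.1f}")
```

With the axis at zero the bars are in the same proportion as the counts. Starting at 45 makes B look twice as tall as A although it is only 10% larger.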
Comments • Ordering of categories: • Often done alphabetically. • Not necessarily the best! • Good when there are many bars: categories easy to find • Sometimes done in order of heights • Sometimes called a Pareto Plot • Good for making comparisons among bars. • Individual categories can be hard to locate • Do what makes sense for you and the reader • Start axes at reasonable values … do not try to mislead
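The two orderings can be compared with a quick text sketch. The categories and counts are invented for illustration.

```python
# Hypothetical category counts, invented for illustration.
counts = {"bus": 12, "car": 30, "bike": 7, "walk": 21}

alphabetical = sorted(counts)                             # easy to look up a category
by_height = sorted(counts, key=counts.get, reverse=True)  # easy to compare bars

# Crude text bar chart in order of height (Pareto-style ordering).
for category in by_height:
    print(f"{category:<5} {'#' * counts[category]} {counts[category]}")
```

Here `alphabetical` is bike, bus, car, walk, while `by_height` is car, walk, bus, bike.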
Example (retirement savings) • A USA Today (Jan. 4, 2000) poll asked Americans who earn $35,000 or less how they expected to accumulate a $500,000 retirement nest-egg. • The results are summarized in the frequency table below:
Pie Charts • Variable values are the category labels • Each category must appear on the plot • Percentage of the pie's area covered by each slice is the relative frequency (or percent) of values falling in that category • Can easily see percentage for each category • Note: Less flexible than bar chart
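The size of each slice follows from the relative frequency: a category's share of the pie is its count divided by the total, which corresponds to that fraction of 360 degrees. The counts below are invented for illustration.

```python
# Hypothetical category counts, invented for illustration.
counts = {"A": 10, "B": 25, "C": 15}
total = sum(counts.values())

relative_frequency = {k: v / total for k, v in counts.items()}
angle_degrees = {k: 360 * v / total for k, v in counts.items()}

for k in counts:
    print(f"{k}: {relative_frequency[k]:.0%} of the pie, {angle_degrees[k]:.0f} degrees")
```

Because the slices must add up to the whole pie, every category has to appear on the chart.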
Comments • Bar charts more flexible than pie charts • Bar charts easier to compare frequencies of categories than pie charts • Comparisons between datasets are easier using the bar chart than a pie chart • Pie chart must have same data as table to make precise comparisons
Plots for Quantitative Variables • Can summarize quantitative data using plots • Most common plots: time-plots, histograms and box-plots • Will introduce box-plots later
Time-plots (line graph) • If measuring a variable across time, plot against time • That is, plot your observations on the y-axis versus the time on the x-axis
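As a rough sketch, the yearly alcohol-related death counts quoted earlier in these slides can be laid out as a crude text time-plot, with years along the x-axis and counts rounded to the nearest 50 on the y-axis. The grid spacing and layout are choices made for this example. The y-axis here does not start at zero, so the changes look larger than they are in relative terms.

```python
# Yearly counts quoted earlier in these slides (alcohol-related deaths in BC).
series = {2003: 1937, 2004: 1983, 2005: 2016, 2006: 2086, 2007: 2074, 2008: 2011}

years = sorted(series)             # time goes on the x-axis
levels = range(2100, 1899, -50)    # y-axis grid, top row first

rows = []
for level in levels:
    # Mark a year in this row if its value rounds to this grid level.
    cells = ["*" if round(series[y] / 50) * 50 == level else " " for y in years]
    rows.append(f"{level:>5} | " + "    ".join(cells))
rows.append("      +" + "-" * (5 * len(years)))
rows.append("        " + " ".join(str(y) for y in years))

print("\n".join(rows))
```

Each year gets exactly one mark, and reading left to right shows the rise to 2006 followed by a decline.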
More Comments • Include in a graph only things that describe the data • Beware missing axis labels • Beware moving axis labels • See Example 6/Figure 10.7 in book for great example of messing with the axes to tell different stories • Graph is a compromise between summary and detail
Example • http://www.excelcharts.com/blog/minard-tufte-kosslyn-godin-napoleon/