250 likes | 265 Views
Chapter 1: Exploring Data. Sec. 1.1 Analyzing Categorical Data. Data Analysis: Making Sense of Data. IDENTIFY the individuals and variables in a set of data CLASSIFY variables as categorical or quantitative DISPLAY categorical data with a bar graph
E N D
Chapter 1: Exploring Data Sec. 1.1 Analyzing Categorical Data
Data Analysis: Making Sense of Data • IDENTIFY the individuals and variables in a set of data • CLASSIFY variables as categorical or quantitative • DISPLAY categorical data with a bar graph • IDENTIFY what makes some graphs of categorical data deceptive • CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table • CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table • DESCRIBE the association between two categorical variables
Data Analysis Statisticsis the science of data. Data Analysis is the process of organizing, displaying, summarizing, and asking questions about data.
Data Analysis • Individuals • objects described by a set of data (don’t necessarily have to be people) • Variable • any characteristic of an individual • Categorical Variable • places an individual into one of several groups or categories. • Quantitative Variable • takes numerical values for which it makes sense to find an average.
Example, p. 3 CensusAtSchool is an international project that collects data about primary and secondary school students using surveys. Hundreds of thousands of students from Australia, Canada, New Zealand, South Africa, and the United Kingdom have taken part in the project since 2000. Data from the surveys are available at the project’s Web site (www.censusatschool.com). We used the site’s “Random Data Selector” to choose 10 Canadian students who completed the survey in a recent year. The table displays the data.
Example, p. 3 • Who are the individuals in this data set? • What variables were measured? Identify each as categorical or quantitative. • Describe the individual in the highlighted row.
Distribution • tells us what values a variable takes and how often it takes those values. Data Analysis How to Explore Data Start by determining the variable of interest. Display the data in a graph. Add Numerical Summaries.
Categorical Variables Categorical variables place individuals into one of several groups or categories. Variable Values Count Percent
Count vs. Percent • Frequency Table – displays counts in each category • Relative Frequency Table – displays the percent in each category (% found by count in category / total count) • Usually percents are rounded. If the rounded percents do not add up to exactly 100%, it may be due to round-off error.
Displaying Categorical Data – Bar Graph • Categorical data • Bars don’t touch and must be equally wide • Can rearrange categories • Scale may be in counts or percents
Displaying Categorical Data – Pie Chart • Categorical Data • Must be Out of a Whole (100%) Read more about Pie Charts on p. 8.
Graphs: Good and Bad • There are two important lessons to keep in mind: • beware the pictograph, and • watch those scales. Who Buys iMacs?
Two-Way Tables and Marginal Distributions A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable. Typically, we put the explanatory variables in the columns and the response variables in the rows.
Example, p. 12 I’m Gonna Be Rich! A survey of 4826 randomly selected young adults (aged 19 to 25) asked, “What do you think the chances are you will have much more than a middle-class income at age 30?” The table below shows the responses. What are the variables described by this two-way table? How many young adults were surveyed?
Two-Way Tables and Marginal Distributions The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. Note: Percents are often more informative than counts, especially when comparing groups of different sizes. • How to examine a marginal distribution: • Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals. (row or column total / table total) • Make a graph to display the marginal distribution.
Examine the marginal distribution of chance of getting rich. Example, p. 13
Relationships Between Categorical Variables A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. • How to examine or compare conditional distributions: • Select the row(s) or column(s) of interest. • Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s). • Make a graph to display the conditional distribution. • Use a bar graph to compare distributions.
Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion. Example, p. 15
Example, p. 15 Can we say there is an association between gender and opinion in the population of young adults? Making this determination requires formal inference, which will have to wait a few chapters. Caution! Even a strong association between two categorical variables can be influenced by other variables lurking in the background.
Data Analysis: Making Sense of Data • DISPLAY categorical data with a bar graph • IDENTIFY what makes some graphs of categorical data deceptive • CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table • CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table • DESCRIBE the association between two categorical variables • A dataset contains information on individuals. • For each individual, data give values for one or more variables. • Variables can be categorical or quantitative. • The distribution of a variable describes what values it takes and how often it takes them. • Inference is the process of making a conclusion about a population based on a sample set of data.
Homework – Due Block Day • P. 6 # 3 & 6 • P. 20 # 9, 16, 19, 21