Learn about different types of data, graphs, and numerical summaries essential for comprehensive statistical analysis and interpretation. Understand variables, variation, and methods to summarize data effectively.
Chapter 2: Exploring Data with Graphs and Numerical Summaries • Learn … • The Different Types of Data • The Use of Graphs to Describe Data • The Numerical Methods of Summarizing Data
Section 2.1 What are the Types of Data?
In Every Statistical Study: • Questions are posed • Characteristics are observed
Characteristics are Variables A Variable is any characteristic that is recorded for subjects in the study
Variation in Data • The term "variable" highlights the fact that data values vary.
Example: Students in a Statistics Class • Variables: • Age • GPA • Major • Smoking Status • …
Data values are called observations • Each observation can be: • Quantitative • Categorical
Categorical Variable • Each observation belongs to one of a set of categories • Examples: • Gender (Male or Female) • Religious Affiliation (Catholic, Jewish, …) • Place of residence (Apt, Condo, …) • Belief in Life After Death (Yes or No)
Quantitative Variable • Observations take numerical values • Examples: • Age • Number of siblings • Annual Income • Number of years of education completed
Graphs and Numerical Summaries • Describe the main features of a variable • For Quantitative variables: key features are center and spread • For Categorical variables: key feature is the percentage in each of the categories
Quantitative Variables • Discrete Quantitative Variables and • Continuous Quantitative Variables
Discrete • A quantitative variable is discrete if its possible values form a set of separate numbers such as 0, 1, 2, 3, …
Examples of discrete variables • Number of pets in a household • Number of children in a family • Number of foreign languages spoken
Continuous • A quantitative variable is continuous if its possible values form an interval
Examples of Continuous Variables • Height • Weight • Age • Amount of time it takes to complete an assignment
Frequency Table • A method of organizing data • Lists all possible values for a variable along with the number of observations for each value
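The frequency table described above can be sketched in a few lines of Python. This is only an illustration: the list `responses` is invented data (hypothetical Yes/No answers), and `frequency_table` is a helper name chosen here, not something from the slides. It reports the count, proportion (count divided by total), and percentage (proportion times 100) for each category.

```python
from collections import Counter

# Hypothetical Yes/No responses, invented purely for illustration.
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"]


def frequency_table(values):
    """Map each category to (count, proportion, percentage)."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, c / n, 100 * c / n) for cat, c in counts.items()}


table = frequency_table(responses)
for category, (count, proportion, percent) in table.items():
    print(f"{category:<5} count={count:>2}  proportion={proportion:.2f}  percent={percent:.1f}%")
```

The proportions across all categories add up to 1, and the percentages add up to 100.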
Example: Shark Attacks • What is the variable? • Is it categorical or quantitative? • How is the proportion for Florida calculated? • How is the % for Florida calculated?
Example: Shark Attacks • Insights – what the data tells us about shark attacks
Identify the following variable as categorical or quantitative: Choice of diet (vegetarian or non-vegetarian): • Categorical • Quantitative
Identify the following variable as categorical or quantitative: Number of people you have known who have been elected to political office: • Categorical • Quantitative
Identify the following variable as discrete or continuous: The number of people in line at a box office to purchase theater tickets: • Continuous • Discrete
Identify the following variable as discrete or continuous: The weight of a dog: • Continuous • Discrete
Section 2.2 How Can We Describe Data Using Graphical Summaries?
Graphs for Categorical Data • Pie Chart: A circle having a “slice of pie” for each category • Bar Graph: A graph that displays a vertical bar for each category
Pie Chart vs. Bar Chart • Which graph do you prefer? • Why?
Graphs for Quantitative Data • Dot Plot: shows a dot for each observation • Stem-and-Leaf Plot: portrays the individual observations • Histogram: uses bars to portray the data
Dot Plot for Sodium in Cereals • Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
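As a rough text-only sketch of a dot plot for the sodium data, the following Python prints one row per distinct value with one dot per observation. A real dot plot places the dots above a number line; this sideways layout and the helper name `dot_plot_rows` are simplifications assumed here.

```python
from collections import Counter

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def dot_plot_rows(values):
    """Return one text row per distinct value, with one dot per observation."""
    counts = Counter(values)
    return [f"{value:>4} | {'.' * counts[value]}" for value in sorted(counts)]


rows = dot_plot_rows(sodium)
print("\n".join(rows))
```

Every observation appears as its own dot, so the individual data values can still be read off the plot.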
Stem-and-Leaf Plot for Sodium in Cereal Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
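A stem-and-leaf plot for the same data can be sketched as follows. The split used here is an assumption: the final digit is the leaf and the remaining digits form the stem, so 125 becomes stem 12, leaf 5. The function name `stem_and_leaf` is also chosen only for this illustration.

```python
from collections import defaultdict

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def stem_and_leaf(values):
    """Return text rows 'stem | leaves' for non-negative integers.

    Assumed split: leaf = last digit, stem = all preceding digits.
    Stems with no observations are kept so gaps remain visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {leaf_text}".rstrip())
    return rows


rows = stem_and_leaf(sodium)
print("\n".join(rows))
```

Each leaf is one observation, so the original values can be rebuilt by joining a stem with each of its leaves.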
Frequency Table Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
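For a quantitative variable, a frequency table usually groups the values into intervals, and a histogram draws one bar per interval. The sketch below counts the sodium values in intervals of width 40; that width and the helper name `interval_counts` are assumptions made for illustration, and other widths are equally valid.

```python
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def interval_counts(values, width):
    """Count non-negative integers in intervals [0, width-1], [width, 2*width-1], ...

    Returns a list of (lower, upper, count) tuples.
    """
    n_bins = max(values) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[v // width] += 1
    return [(i * width, (i + 1) * width - 1, c) for i, c in enumerate(counts)]


table = interval_counts(sodium, width=40)
for lower, upper, count in table:
    print(f"{lower:>3} to {upper:>3}: {count}")
```

With this choice of intervals, the 200 to 239 interval holds the most cereals (7 of the 20).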
Which Graph? • Dot plot and stem-and-leaf plot: • More useful for small data sets • Data values are retained • Histogram • More useful for large data sets • Most compact display • More flexibility in defining intervals
Shape of a Distribution • Overall pattern • Clusters? • Outliers? • Symmetric? • Skewed? • Unimodal? • Bimodal?
Consider a data set containing IQ scores for the general public: What shape would you expect a histogram of this data set to have? • Symmetric • Skewed to the left • Skewed to the right • Bimodal
Consider a data set of the scores of students on a very easy exam in which most score very well but a few score very poorly: What shape would you expect a histogram of this data set to have? • Symmetric • Skewed to the left • Skewed to the right • Bimodal
Section 2.3 How Can We Describe the Center of Quantitative Data?
Mean • The sum of the observations divided by the number of observations
Median • The midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest)
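The two definitions above translate directly into code. This sketch applies them to the cereal sodium data from Section 2.2; the function names `mean` and `median` are written out by hand here to mirror the definitions, although Python's `statistics` module offers the same calculations.

```python
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the ordered observations.

    With an even number of observations, average the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print("mean:", mean(sodium))
print("median:", median(sodium))
```

There are 20 observations, so the median is the average of the 10th and 11th ordered values.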
Find the mean and median of the CO2 pollution levels in the 8 largest nations, measured in metric tons per person: 2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2 • Mean = 4.6 Median = 1.5 • Mean = 4.6 Median = 5.8 • Mean = 1.5 Median = 4.6
Outlier • An observation that falls well above or below the overall set of data • The mean can be highly influenced by an outlier • The median is resistant: not affected by an outlier
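To see how differently the mean and median react to an outlier, one option is to compute both with and without the extreme value. The sketch below uses the eight CO2 values from the previous slide and treats the largest value, 19.7, as the outlier; singling out that value is an assumption made for illustration.

```python
import statistics

co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Drop the largest observation, treated here as the outlier.
co2_without = [x for x in co2 if x != max(co2)]

mean_with = statistics.mean(co2)
mean_without = statistics.mean(co2_without)
median_with = statistics.median(co2)
median_without = statistics.median(co2_without)

print(f"mean:   {mean_with:.2f} with outlier, {mean_without:.2f} without")
print(f"median: {median_with:.2f} with outlier, {median_without:.2f} without")
```

Removing the single largest value cuts the mean from 4.6 to about 2.44, while the median moves only from 1.5 to 1.2.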
Mode • The value that occurs most frequently. • The mode is most often used with categorical data
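A short sketch of finding the mode, using `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency. It is applied here to the cereal sodium data from Section 2.2 as one illustration of why the mode tends to be less informative for quantitative data.

```python
import statistics
from collections import Counter

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

# All values that share the highest frequency.
modes = statistics.multimode(sodium)
highest_count = max(Counter(sodium).values())

print("modes:", sorted(modes))
print("each occurs", highest_count, "times")
```

For this data set several values tie for most frequent, so the mode does not single out one typical value.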
Section 2.4 How Can We Describe the Spread of Quantitative Data?
Measuring Spread: Range • Range: difference between the largest and smallest observations
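As a minimal sketch, the range of the cereal sodium data from Section 2.2 can be computed like this; the function name `data_range` is chosen here only to avoid clashing with Python's built-in `range`.

```python
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def data_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


print("range:", data_range(sodium))
```

Because it uses only the two most extreme observations, the range is strongly affected by outliers.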