Chapter 1 Section 1 Introduction to the Practice of Statistics
Chapter 1 – Section 1 • The science of statistics is • Collecting • Organizing • Summarizing • Analyzing information to draw conclusions or answer questions
Chapter 1 – Section 1 • Organizing and summarizing the information is descriptive statistics (chapters 2 through 4) • Drawing conclusions or generalizations from the information is inferential statistics (chapters 9 through 11)
Chapter 1 – Section 1 • A population • Is the group to be studied • Includes all of the individuals in the group • A sample • Is a subset of the population • Is often used in analyses because getting access to the entire population is impractical
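As a minimal sketch of the population/sample distinction, the following draws a subset from a made-up population. The population of 1,000 ID numbers, the sample size of 50, and the use of simple random sampling are all assumptions for illustration; the slides do not prescribe a sampling method.

```python
import random

# Hypothetical population: ID numbers for 1,000 individuals (invented for illustration).
population = list(range(1, 1001))

# Draw 50 individuals without replacement; a fixed seed keeps the sketch repeatable.
rng = random.Random(0)
sample = rng.sample(population, k=50)

# Every sampled individual belongs to the population, so the sample is a subset.
print(len(sample), "of", len(population), "individuals sampled")
```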
Chapter 1 – Section 1 • Characteristics of the individuals under study are called variables • Some variables have values that are attributes or characteristics … those are called qualitative or categorical variables • Some variables have values that are numeric measurements … those are called quantitative variables • The suggested approaches to analyzing problems vary by the type of variable
Chapter 1 – Section 1 • Examples of qualitative variables • Gender • Zip code • Blood type • States in the United States • Brands of televisions • Qualitative variables have category values … those values cannot be added, subtracted, etc.
Chapter 1 – Section 1 • Examples of quantitative variables • Temperature • Height and weight • Sales of a product • Number of children in a family • Points achieved playing a video game • Quantitative variables have numeric values … those values can be added, subtracted, etc.
Chapter 1 – Section 1 • Quantitative variables can be either discrete or continuous • Discrete variables • Variables that have a finite or a countable number of possibilities • Frequently variables that are counts • Continuous variables • Variables that have an infinite but not countable number of possibilities • Frequently variables that are measurements
Chapter 1 – Section 1 • Examples of discrete variables • The number of heads obtained in 5 coin flips • The number of cars arriving at a McDonald’s between 12:00 and 1:00 • The number of students in class • The number of points scored in a football game • The possible values of discrete variables can be listed
Chapter 1 – Section 1 • Examples of continuous variables • The distance that a particular model car can drive on a full tank of gas • Heights of college students
Summary: Chapter 1 – Section 1 • The process of statistics is designed to collect and analyze data to reach conclusions • Variables can be classified by their type of data • Qualitative or categorical variables • Discrete quantitative variables • Continuous quantitative variables
Chapter 2 Organizing and Summarizing Data
Chapter 2 Sections • Sections in Chapter 2 • Organizing Qualitative Data • Organizing Quantitative Data • Graphical Misrepresentations of Data
Chapter 2 Section 1 Organizing Qualitative Data
Chapter 2 – Section 1 • Qualitative data values can be organized by a frequency distribution • A frequency distribution lists • Each of the categories • The frequency for each category
Chapter 2 – Section 1 • A simple data set is blue, blue, green, red, red, blue, red, blue • A frequency table for this qualitative data is • Blue: 4 • Green: 1 • Red: 3 • The most commonly occurring color is blue
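The frequency table for the simple color data can be tallied mechanically. This is one possible sketch using Python's standard `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# The simple data set from the slide.
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

# Count how many times each category occurs.
frequency = Counter(colors)

# List each category with its frequency, in alphabetical order.
for category, count in sorted(frequency.items()):
    print(f"{category:<6} {count}")

# The category with the highest frequency.
most_common_color = frequency.most_common(1)[0][0]
```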
Chapter 2 – Section 1 • The relative frequencies are the proportions (or percents) of the observations out of the total • A relative frequency distribution lists • Each of the categories • The relative frequency for each category
Chapter 2 – Section 1 • A relative frequency table for this qualitative data is • Blue: 0.5 • Green: 0.125 • Red: 0.375 • A relative frequency table can also be constructed with percents (50%, 12.5%, and 37.5% for the above table)
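A short sketch of the relative frequency calculation for the same color data: each frequency is divided by the total number of observations. The names are illustrative, not part of the original slides.

```python
from collections import Counter

colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

counts = Counter(colors)
total = len(colors)

# Relative frequency = frequency of the category / total number of observations.
relative_frequency = {category: n / total for category, n in counts.items()}

# Show each category as a proportion and as a percent.
for category, proportion in sorted(relative_frequency.items()):
    print(f"{category:<6} {proportion:.3f} {proportion:.1%}")
```

The relative frequencies of all categories add up to 1 (or 100%).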
Chapter 2 – Section 1 • Bar graphs for our simple data (using Excel) • Frequency bar graph • Relative frequency bar graph
Chapter 2 – Section 1 • A Pareto chart is a particular type of bar graph • A Pareto chart differs from a bar chart only in that the categories are arranged in decreasing order of frequency • The category with the highest frequency is placed first (on the extreme left) • The second highest category is placed second • Etc. • Pareto charts are often used when there are many categories but only the top few are of interest
Chapter 2 – Section 1 • A Pareto chart for our simple data (using Excel)
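The ordering behind a Pareto chart can be sketched without any charting tool. The example below sorts the color categories from highest to lowest frequency and prints a rough text version of the bars; the slides use Excel, so this is only an illustration of the ordering step.

```python
from collections import Counter

colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

# most_common() returns (category, frequency) pairs, highest frequency first.
pareto_order = Counter(colors).most_common()

# One text "bar" per category, in Pareto order.
for category, count in pareto_order:
    print(f"{category:<6} {'#' * count}")
```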
Chapter 2 – Section 1 • An example side-by-side bar graph comparing educational attainment in 1990 versus 2003
Chapter 2 – Section 1 • An example of a pie chart
Chapter 2 Section 2 Organizing Quantitative Data
Chapter 2 – Section 2 • Consider the following data • We would like to compute the frequencies and the relative frequencies
Chapter 2 – Section 2 • The resulting frequencies and the relative frequencies
Chapter 2 – Section 2 • Example of histograms for discrete data • Frequencies • Relative frequencies
Chapter 2 – Section 2 • Continuous data cannot be put directly into frequency tables since they do not have any obvious categories • Categories are created using classes, or intervals of numbers • The continuous data is then put into the classes
Chapter 2 – Section 2 • For ages of adults, a possible set of classes is 20 – 29, 30 – 39, 40 – 49, 50 – 59, and 60 and older • For the class 30 – 39 • 30 is the lower class limit • 39 is the upper class limit • The class width is the difference between consecutive lower class limits • For the class 30 – 39, the class width is 40 – 30 = 10
Chapter 2 – Section 2 • All the classes have the same width, except for the last class • The class “60 and older” is an open-ended class because it has no upper limit • Classes with no lower limit are also called open-ended classes
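The class width calculation for the age classes can be checked with a few lines. This sketch follows the 40 – 30 = 10 example, taking the width as the gap between one lower class limit and the next; the list name is illustrative.

```python
# Lower class limits of the age classes 20-29, 30-39, 40-49, 50-59, 60 and older.
lower_limits = [20, 30, 40, 50, 60]

# Width of each closed class = next lower limit minus this lower limit.
# The open-ended last class has no upper limit, so it gets no width.
widths = [nxt - cur for cur, nxt in zip(lower_limits, lower_limits[1:])]

print(widths)
```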
Chapter 2 – Section 2 • The classes and the number of values in each can be put into a frequency table • In this table, there are 1147 subjects between 30 and 39 years old
Chapter 2 – Section 2 • Good practices for constructing tables for continuous variables • The classes should not overlap • The classes should not have any gaps between them • The classes should have the same width (except for possible open-ended classes at the extreme low or extreme high ends) • The class boundaries should be “reasonable” numbers • The class width should be a “reasonable” number
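As a rough sketch of putting continuous data into classes, the following assigns ages to the classes 20-29 through 60 and older and counts each class. The ages are invented for illustration, and the helper `age_class` is hypothetical. Because every age of 20 or more maps to exactly one class, the classes neither overlap nor leave gaps.

```python
from collections import Counter

def age_class(age):
    """Return the class label for an adult's age (assumes age >= 20)."""
    if age < 20:
        raise ValueError("this sketch only defines classes for ages 20 and up")
    if age >= 60:
        return "60 and older"  # open-ended class
    lower = (age // 10) * 10   # lower class limit: 20, 30, 40, or 50
    return f"{lower}-{lower + 9}"

# Hypothetical ages, not taken from the slides.
ages = [23, 35, 31, 47, 52, 68, 29, 39, 40, 71, 30, 58]

class_counts = Counter(age_class(a) for a in ages)
for label in ["20-29", "30-39", "40-49", "50-59", "60 and older"]:
    print(f"{label:<13} {class_counts[label]}")
```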
Chapter 2 – Section 2 • Just as for discrete data, a histogram can be created from the frequency table • Instead of individual data values, the categories are the classes – the intervals of data
Chapter 2 – Section 2 • A stem-and-leaf plot is a different way to represent data that is similar to a histogram • To draw a stem-and-leaf plot, each data value must be broken up into two components • The stem consists of all the digits except for the rightmost one • The leaf consists of the rightmost digit • For the number 173, for example, the stem would be “17” and the leaf would be “3”
Chapter 2 – Section 2 • In the stem-and-leaf plot below • The smallest value is 56 • The largest value is 180 • The second largest value is 178
Chapter 2 – Section 2 • To draw a stem-and-leaf plot • Write all the values in ascending order • Find the stems and write them vertically in ascending order • For each data value, write its leaf in the row next to its stem • The resulting leaves will also be in ascending order • The list of stems with their corresponding leaves is the stem-and-leaf plot
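The steps for drawing a stem-and-leaf plot can be sketched as follows. The data values are made up for illustration (they are not the values from the plot in the slides), the function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its leaves, with leaves in ascending order."""
    plot = defaultdict(list)
    for value in sorted(values):          # step 1: ascending order
        stem, leaf = divmod(value, 10)    # leaf = rightmost digit, stem = the rest
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data values.
data = [173, 152, 168, 159, 161, 173, 150, 165]
plot = stem_and_leaf(data)

# Write the stems vertically in ascending order, each followed by its leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Sorting the values first is what guarantees the leaves in each row come out in ascending order.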
Chapter 2 – Section 2 • Modifications to stem-and-leaf plots • Sometimes there are too many values with the same stem … we would need to split the stems (such as having 10-14 in one stem and 15-19 in another) • If we wanted to compare two sets of data, we could draw two stem-and-leaf plots using the same stem, with leaves going left (for one set of data) and right (for the other set)
Chapter 2 – Section 2 • A dot plot is a graph where a dot is placed over the observation each time it is observed • The following is an example of a dot plot
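A text-only sketch of a dot plot, assuming whole-number observations: one mark is stacked above each value for every time it occurs. The observations and the helper name `dot_plot` are invented for illustration.

```python
from collections import Counter

def dot_plot(observations):
    """Return the rows of a text dot plot, top row first, axis labels last."""
    counts = Counter(observations)
    xs = range(min(observations), max(observations) + 1)
    rows = []
    # Build from the tallest stack down so the plot prints right side up.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(" * " if counts[x] >= level else "   " for x in xs))
    rows.append("".join(f"{x:^3}" for x in xs))  # horizontal axis
    return rows

# Hypothetical observations.
observations = [2, 3, 3, 4, 4, 4, 5, 5, 7]
rows = dot_plot(observations)
print("\n".join(rows))
```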
Chapter 2 – Section 2 • A useful way to describe a variable is by the shape of its distribution • Some common distribution shapes are • Uniform • Bell-shaped (or normal) • Skewed right • Skewed left
Chapter 2 – Section 2 • A variable has a uniform distribution when • Each of the values tends to occur with the same frequency • The histogram looks flat
Chapter 2 – Section 2 • A variable has a bell-shaped distribution when • Most of the values fall in the middle • The frequencies tail off to the left and to the right • It is symmetric
Chapter 2 – Section 2 • A variable has a skewed right distribution when • The distribution is not symmetric • The tail to the right is longer than the tail to the left • The arrow from the middle to the long tail points right
Chapter 2 – Section 2 • A variable has a skewed left distribution when • The distribution is not symmetric • The tail to the left is longer than the tail to the right • The arrow from the middle to the long tail points left
Summary: Chapter 2 – Section 2 • Quantitative data can be organized in several ways • Histograms based on data values are good for discrete data • Histograms based on classes (intervals) are good for continuous data • The shape of a distribution describes a variable … histograms are useful for identifying the shapes
Chapter 2 Section 3 Graphical Misrepresentations of Data
Chapter 2 – Section 3 • The two graphs show the same data … the difference seems larger for the graph on the left • The vertical scale is truncated on the left
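The effect of a truncated vertical scale can be illustrated numerically. In this sketch, the two values (50 and 55) and the truncated axis start (45) are hypothetical, as is the helper `apparent_ratio`; it compares how tall the second bar is drawn relative to the first.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights when the vertical axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Hypothetical values being compared.
value_a, value_b = 50.0, 55.0

full_scale = apparent_ratio(value_a, value_b)        # axis starts at 0
truncated = apparent_ratio(value_a, value_b, 45.0)   # axis starts at 45

print(f"full scale: bar b is drawn {full_scale:.2f} times as tall as bar a")
print(f"truncated:  bar b is drawn {truncated:.2f} times as tall as bar a")
```

With the axis starting at 0 the bars differ by 10%, but with the axis starting at 45 the second bar is drawn twice as tall, even though the data are the same.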
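Chapter 2 – Section 3 • The gazebo on the right is twice as large in each dimension as the one on the left • However, it appears much more than twice as large, since doubling both the width and the height quadruples the area of the picture • [Figure: the original gazebo and the “twice” as large gazebo]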
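The arithmetic behind the gazebo example, as a small sketch: scaling every dimension by the same factor multiplies the area of a flat picture by the square of that factor, and the volume of a three-dimensional object by its cube. The variable names are illustrative.

```python
# Each dimension of the picture is doubled.
scale = 2

area_factor = scale ** 2     # width and height both double
volume_factor = scale ** 3   # width, height, and depth all double

print(f"dimensions x{scale}: area x{area_factor}, volume x{volume_factor}")
```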
Summary: Chapter 2 – Section 1 • Qualitative data can be organized in several ways • Tables are useful for listing the data, its frequencies, and its relative frequencies • Charts such as bar graphs, Pareto charts, and pie charts are useful visual methods for organizing data • Side-by-side bar graphs are useful for comparing two sets of qualitative data