Section 1.1 Informed Decisions Using Data
Vocabulary • Data: • Information we gather with experiments and with surveys. • Examples: • People's heights • Number of points by which a drug lowers cholesterol • Favorite type of pet • Answers to a “Yes” or “No” question
Vocabulary • Statistics • The “art and science” of: • Designing studies • Analyzing resultant data • Translating data into knowledge and understanding
The Statistical Method • Design: Planning how to obtain data • Description: Summarizing the data • Inference: Making decisions and predictions
Design: • Design questions: • How to conduct the experiment, or • How to select “people” for the survey to ensure trustworthy results
Description • Descriptive Statistics: • Summaries of the data collected from the design stage. • Numerical summaries • Average • Sum • Proportion • Graphical summaries • Pie Charts • Bar graphs • Histograms
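As a rough illustration of the numerical summaries listed above, the Python sketch below computes a sum, an average, and a proportion. The cholesterol values are made up for illustration and do not come from any study.

```python
# Hypothetical data: points by which a drug lowered cholesterol for 8 subjects.
drops = [12, 8, 15, 10, 0, 9, 14, 12]

total = sum(drops)                 # sum of all observations
average = total / len(drops)       # average (mean)

# Proportion of subjects whose cholesterol dropped by at least 10 points.
proportion = sum(1 for d in drops if d >= 10) / len(drops)

print(total, average, proportion)
```

The sum and average summarize the numerical values themselves, while the proportion summarizes how many subjects meet a condition.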
Inference • Methods of making decisions or predictions about a population based on information obtained from a sample. • “Conclusions”
More Vocab: Section 1.2 • Subjects: • The entities that we measure in a study • Population: • All subjects of interest • Sample: • Subset of the population from which we will gather data
Example: • In California in 2003, a special election was held to consider whether Governor Gray Davis should be recalled from office. • An exit poll sampled 3160 of the 8 million people who voted. Define the sample and the population for this exit poll. • The population was the 8 million people who voted in the election. • The sample was the 3160 voters who were interviewed in the exit poll.
Descriptive vs. Inferential Statistics • Descriptive statistics refers to methods for summarizing the data. Summaries consist of graphs and numbers such as averages and percentages. • Inferential statistics refers to methods of making decisions or predictions about a population based on data obtained from a sample of that population.
Example: • By surveying 1000 likely voters, we find a sample proportion of 39% who approve of the job Congress is doing. (Descriptive) • We are 95% confident that the population proportion of likely voters who approve of the job Congress is doing is between 36% and 42%. (Inferential)
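One common way to arrive at an interval like this is the large-sample formula: sample proportion plus or minus 1.96 standard errors. The slides do not say which method produced the 36% to 42% interval, so the sketch below is an assumption, and the function name `proportion_interval` is ours.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a population proportion.

    Assumes the large-sample (normal approximation) formula; z=1.96
    corresponds to 95% confidence.
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Sample proportion and sample size taken from the example above.
low, high = proportion_interval(0.39, 1000)
print(f"{low:.0%} to {high:.0%}")
```

Under this assumption the margin of error is about three percentage points, which matches the interval quoted in the example.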
Parameters • A statistic is a numerical summary based on a sample. • Descriptive statistics give information on that sample. • Inferential statistics use the sample data to draw conclusions about the population. • They are still based on the sample. • A parameter is a numerical summary based on the population. • Parameter values are rarely known in practice, since we seldom have data on the entire population. • We will use different symbols for statistics and parameters.
Randomness: • A simple random sample: • Every possible sample of a given size has the same chance of being selected, so each subject in the population has the same chance of being part of the sample. • This is the goal of the design stage. • When this is not the case, the results may suffer from sampling bias.
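A minimal sketch of drawing a simple random sample in Python, assuming the population can be listed in full. The population of 8,000 ID numbers and the sample size of 100 are hypothetical.

```python
import random

# Hypothetical population: ID numbers for 8,000 subjects.
population = list(range(1, 8001))

rng = random.Random(1)                    # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=100)    # drawn without replacement

# Every subject appears at most once in the sample.
print(len(sample), len(set(sample)))
```

`random.sample` draws without replacement and treats every subject alike, which is what the definition above asks for.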
Sample Size: • Because our data will vary from subject to subject, and from sample to sample, larger samples produce more reliable results. • Why not sample the whole population? • How large is large enough? • “Thirty”, sometimes
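The claim that larger samples give more reliable results can be checked with a small simulation. This is a sketch under stated assumptions: the true proportion of 0.39, the sample sizes, and the function name are all chosen for illustration.

```python
import random
import statistics

def spread_of_sample_proportions(n, true_p=0.39, repeats=200, seed=0):
    """Standard deviation of the sample proportion over repeated samples of size n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repeats):
        # Simulate n yes/no answers, each "yes" with probability true_p.
        hits = sum(1 for _ in range(n) if rng.random() < true_p)
        proportions.append(hits / n)
    return statistics.stdev(proportions)

small = spread_of_sample_proportions(10)
large = spread_of_sample_proportions(1000)
print(small, large)
```

The sample proportions from samples of 1000 cluster far more tightly around the true value than those from samples of 10.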
Variables: • Variables are characteristics observed from the subjects of a study. There are two main types. Categorical: Categorical variables record non-numerical information. Quantitative: Quantitative variables record numerical information.
Categorical Data • Categorical data can be sorted into “classes”, with each observation belonging to exactly one category. • Examples: • Favorite type of pet • Does the subject smoke? • Blood type • Political affiliation
Categorical Data • When using categorical data we are usually concerned with the proportion or percentage of the total observations within each category. • Example: • Dogs (32%) • Cats (28%) • Birds (13%) • Reptiles (8%) • Other (19%) • When there are only two categories, categorical data is called binary (yes-or-no questions).
Finding proportions: • The proportion of the observations that fall in a certain category is the frequency (count) of observations in that category divided by the total number of observations: $\text{proportion} = \dfrac{\text{frequency of that class}}{\text{sum of all frequencies}}$ • The percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies.
Example: • You ask 50 people what their favorite type of pet is and sixteen tell you it's dogs. What proportion of people consider dogs their favorite type of pet? $\dfrac{\text{frequency of that class}}{\text{sum of all frequencies}} = \dfrac{16}{50} = \dfrac{8}{25} = 0.32$ • So, the proportion is 0.32. • To find the percentage: 0.32 × 100 = 32% • Either value can be called the relative frequency.
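The same calculation can be sketched in Python for a whole frequency table at once. The helper name `relative_frequencies` is ours. Only the 16-out-of-50 count comes from the example; the remaining 34 answers are lumped into a hypothetical "not dogs" category.

```python
def relative_frequencies(counts):
    """Map each category to its proportion of all observations."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# 16 of 50 people chose dogs (from the example above); the other 34
# answers are grouped into one hypothetical "not dogs" category.
counts = {"dogs": 16, "not dogs": 34}

proportions = relative_frequencies(counts)
percentages = {category: p * 100 for category, p in proportions.items()}

print(proportions["dogs"], round(percentages["dogs"]))
```

Because every observation belongs to exactly one category, the proportions across all categories add up to 1.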
Example 2: • Given the frequency table for blood types in the USA: Find the proportion for each category.
Quantitative Data • When the data collected consists of numerical information, we call it quantitative data. Quantitative data comes in two “flavors”. • Discrete: • Data whose possible values form a set of separate numbers (often counts), like {1, 2, 3, 4, 5, …} • Continuous: • Data that can be any value within a range, like the weight of a penny.
Quantitative Data: • When using quantitative data we are usually concerned with the central tendency and spread (variability) of the observations. • Central Tendency: • Mean • Median • Mode (??) • Spread: • Standard Deviation • Interquartile Range
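The measures listed above can be computed with Python's standard `statistics` module. The data values below are hypothetical. The sketch uses the sample standard deviation and the default quartile method of `statistics.quantiles`; other textbooks and tools may define quartiles slightly differently.

```python
import statistics

# Hypothetical quantitative observations.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Sample standard deviation (divides by n - 1).
std_dev = statistics.stdev(data)

# Quartiles via the default "exclusive" method; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, std_dev, iqr)
```

The mean, median, and mode describe where the data is centered, while the standard deviation and interquartile range describe how spread out it is.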