Statistics-MAT 150
Chapter 1: Introduction to Statistics
Prof. Felix Apfaltrer
fapfaltrer@bmcc.cuny.edu
Office: N518 Phone: x7421
Chapter 1 • Overview • Nature of data • Skills needed in statistics
Overview
Statistics:
• Descriptive: analyze the nature of data from surveys, experiments, and observations.
• Inferential: draw conclusions from the analyses about the population.
Survey: a tool to collect data from a smaller group that is part of a larger group, in order to learn something about the larger group.
Key goal of statistics: learn about a large group (population) using data from a smaller subgroup (sample).
Overview
Definitions:
• Data: observations collected (measurements, gender, answers, …)
• Statistics: a collection of methods to analyze data
• Population: the complete collection of elements (scores, measurements, subjects, …)
• Sample: a subcollection of members selected from the population
• Census: a collection of data from every member of the population
Overview 2
Example:
• Poll: 1087 adults are asked whether or not they drink alcoholic beverages.
• Sample: the 1087 adults polled.
• Population: all US adults (150 million).
• Census: every 10 years, the Census Bureau tries to collect information from every member of the US population.
  • Impossible!
  • Very expensive!
• Use sample data to draw conclusions about the whole population: inferential statistics!
Types of data
Parameter:
• A numerical measurement describing some characteristic of the population.
• Lincoln was elected with 39.82% of the 1,865,908 votes counted.
• 39.82% is a parameter, because it is computed from all of the votes.
Statistic:
• A numerical measurement describing some characteristic of the sample.
• Based on a sample of 877 surveyed executives, 45% would not hire an applicant with a typographical error in the application.
• 45% is a statistic, because it is computed from a sample rather than from all executives.
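The difference between a parameter and a statistic can be sketched in a few lines of Python. Everything here is hypothetical and not from the slides: the made-up population of 10,000 yes/no answers, the sample size of 500, and the helper name `proportion`. It is only meant to show that the parameter is fixed while the statistic depends on which sample happens to be drawn.

```python
import random

def proportion(values):
    """Fraction of True values in a collection of booleans."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical population: 10,000 people, exactly 40% of whom answer "yes".
population = [True] * 4000 + [False] * 6000

# Parameter: computed from every member of the population.
parameter = proportion(population)

# Statistic: computed from a random sample of 500 members.
rng = random.Random(1)  # fixed seed (an arbitrary choice) so the run is repeatable
sample = rng.sample(population, 500)
statistic = proportion(sample)

print(f"parameter = {parameter:.2%}")
print(f"statistic = {statistic:.2%}")
```

Changing the seed changes the statistic but never the parameter, which is the reason inferential statistics is needed to reason from one to the other.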
Types of data 2
Quantitative data: numbers representing counts or measurements.
• Weights of supermodels.
Qualitative data: nonnumerical.
• Gender of an athlete.
Discrete vs. continuous data:
• Number of people in a household (discrete) vs. temperatures in May (continuous).
Nominal level of measurement: names, labels, categories; no ordering.
• Yes/No/Undecided responses, colors.
Ordinal level of measurement: the data can be ordered, but differences between values are meaningless or nonexistent.
• Course grades A, B, C, D, F. “Livability rank of a city”.
Interval level of measurement: ordered with meaningful differences, but zero is missing or arbitrary.
• Temperature (°F or °C), year.
Ratio level of measurement: as interval, but with a meaningful zero.
• Weights, prices (non-negative).
Basic skills
Samples:
• Representative:
  • “39 of 40 people polled vote for A.” Sampled in A’s headquarters!
• Not too small:
  • CDF published “among HS students suspended, 67% suspended more than 3 times”. Sample size: 3!
Graphs: In which one does red do better? (Graphs not reproduced here.)
Percentage of:
• 6% of 1200 = 6 / 100 * 1200 = 72
Fraction >>> percentage:
• 3/4 = 0.75 >>> 0.75 * 100% = 75%
Percentage >>> decimal:
• 27.3% = 27.3 / 100 = 0.273
Decimal >>> percentage:
• 0.852 >>> 0.852 * 100% = 85.2%
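The four conversions on this slide can be checked with a short Python sketch. The function names are illustrative only and do not come from the course. Note that 6% of 1200 is the number 72, not a percentage.

```python
def percent_of(percent, amount):
    """Return `percent` percent of `amount`, e.g. 6% of 1200."""
    return percent / 100 * amount

def fraction_to_percent(numerator, denominator):
    """Convert a fraction such as 3/4 to a percentage."""
    return numerator / denominator * 100

def percent_to_decimal(percent):
    """Convert a percentage such as 27.3% to its decimal form."""
    return percent / 100

def decimal_to_percent(decimal):
    """Convert a decimal such as 0.852 to a percentage."""
    return decimal * 100

# The slide's four examples; results are rounded for display because
# floating-point arithmetic can leave tiny rounding residue.
print(round(percent_of(6, 1200), 6))        # 6% of 1200
print(round(fraction_to_percent(3, 4), 6))  # 3/4 as a percentage
print(round(percent_to_decimal(27.3), 6))   # 27.3% as a decimal
print(round(decimal_to_percent(0.852), 6))  # 0.852 as a percentage
```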
Basic skills 2 Calculator:
Design
Observational study: observe and measure characteristics without trying to modify the subjects.
• Gallup poll.
• Cross-sectional: data are observed and measured at one point in time.
• Retrospective: data are collected from the past (records).
• Prospective: data are collected going forward from groups (e.g. smokers/nonsmokers).
Experiment: apply a treatment, then observe and measure its effects.
• Clinical trial for Lipitor.
• Control: blinding (placebo), double-blinding, blocks.
• Replication: ability to repeat the experiment.
• Randomization: data need to be collected in an appropriate (random) way, otherwise they are completely useless!
• Random sample: members of the population are selected so that each individual member has the same chance of being selected.
• Simple random sample of size n: every possible sample of size n has the same chance of being chosen.
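A simple random sample can be drawn with Python's standard `random` module, whose `sample` method selects without replacement. This is a minimal sketch: the population of ID numbers 1 to 100, the sample size of 10, and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: ID numbers 1..100.
population = range(1, 101)
sample = simple_random_sample(population, 10, seed=42)
print(sorted(sample))
```

Passing a seed is only for reproducibility in class; leaving it as `None` gives a different sample on each run.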
Design 2
Sampling:
• Systematic: select a starting point, then choose every kth member.
• Convenience: use data that are easy to get.
• Stratified: subdivide the population into at least 2 subgroups that share a common characteristic (e.g. gender or age) and draw a sample from each.
• Cluster: divide the population into areas (clusters) and draw samples from the clusters.
Sampling error: the difference between a sample result and the true population result; it results from chance sample fluctuations.
Nonsampling error: occurs when data are incorrectly collected, measured, recorded, or analyzed.
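The sampling methods and the idea of sampling error can be sketched in Python as follows. All names and numbers are hypothetical (IDs 1 to 100, strata defined by odd/even ID, five clusters of 20). Cluster sampling is shown in one common form, keeping every member of a few randomly chosen clusters, which is an assumption that goes beyond the slide's wording.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum_of(member), then draw n_per_stratum from each group."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)            # fixed seed (arbitrary) for repeatability
population = list(range(1, 101))  # hypothetical IDs 1..100

systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, lambda m: m % 2, 5, rng)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clustered = cluster_sample(clusters, 2, rng)

# Sampling error: sample result minus the true population result.
population_mean = sum(population) / len(population)
sample_mean = sum(systematic) / len(systematic)
sampling_error = sample_mean - population_mean
print(f"sampling error of the systematic sample mean: {sampling_error:+.2f}")
```

Here the sampling error is nonzero only because of which members happened to be drawn; a nonsampling error, by contrast, would come from something like mis-recording the IDs.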