Learn about types of data, variables, histograms, scatterplots, and sampling methods for statistical analysis in MATH/STAT.352 Spring 2007 lectures at UNR. Study male patients' survival post-heart attack.
Probability and Statistics, MATH/STAT 352, Spring 2007. Lecture 2: Types of Data, Simple Graphs, Sampling. UNR, MATH/STAT 352, Spring 2007
Types of data
MATH/STAT 352: Quiz 0
A subset of the data from a study of a series of male patients from Greenlane Hospital in Auckland after a heart attack. Goal of the study: how long will the patient live after the heart attack?
Types of variables
• Quantitative (age, time, probability, etc.)
  • Continuous: may take any value from some interval (probability)
  • Discrete: may take values from some grid (age in years)
• Qualitative (color, surgery outcome, smoking, etc.)
  • Categorical: no order (surgery outcome)
  • Ordinal: order (letter grade)
The type of a variable is determined by the data you have and the problem you consider.
• Aging of a person is a continuous process: age is a quantitative, continuous variable.
• Age in years (18, 25, 63, …) is quantitative, discrete.
• Age as (Kid, Young, Middle-age, Senior) is qualitative, ordinal.
(Axis in each panel: Time)
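As a rough illustration of the three views of age, the sketch below derives a discrete and an ordinal variable from one continuous measurement. The group boundaries in `AGE_GROUPS` are hypothetical; the slide names the groups but does not say where one ends and the next begins.

```python
import math

# Hypothetical upper bounds for the ordinal groups (not given in the lecture).
AGE_GROUPS = [(13, "Kid"), (40, "Young"), (65, "Middle-age"), (math.inf, "Senior")]

def age_in_years(exact_age: float) -> int:
    """Quantitative, discrete: completed years of age."""
    return math.floor(exact_age)

def age_group(exact_age: float) -> str:
    """Qualitative, ordinal: first group whose upper bound exceeds the age."""
    for upper_bound, label in AGE_GROUPS:
        if exact_age < upper_bound:
            return label
    raise ValueError("age must be a real number")

exact_age = 25.37  # quantitative, continuous: any value from an interval
print(exact_age, age_in_years(exact_age), age_group(exact_age))
```

The same person yields three different variable types, depending only on how the measurement is recorded.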
How to proceed? We need some tools for reducing the masses of figures to simple entities that we can manipulate. A picture is worth a thousand words!
Simple plots: dot plot
Original data: {8, 3, 6, 4.5, 4, 4.5}
Sorted data: {3, 4, 4.5, 4.5, 6, 8}
Simple graph: dot plot
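A minimal text sketch of the dot plot for the six values on this slide. It is only an approximation of the graph: it stacks one mark per observation next to each distinct value, and it does not place the values on a scaled axis, so gaps between values are not visible. The function name `dot_plot` is made up for this example.

```python
from collections import Counter

def dot_plot(data):
    """Return a text dot plot: one row per distinct value, one '*' per observation."""
    counts = Counter(data)
    rows = []
    for value in sorted(counts):
        rows.append(f"{value:>4} | " + "*" * counts[value])
    return "\n".join(rows)

original = [8, 3, 6, 4.5, 4, 4.5]
print(sorted(original))      # the sorted data from the slide
print(dot_plot(original))    # the repeated value 4.5 gets two marks
```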
Simple plots: dot plot. Interesting features of the data emphasized by the dot plot
Simple plots: dot plot. Exploiting gaps and clusters:
Simple plots: dot plot. Figure labels: body of data, outliers. Data from Quiz 0
Simple plots: histogram. The histogram is the most widely used statistical graph. (Axis: Measurement)
Simple plots: histogram
• Divide the observational interval into subintervals (also called bins or class intervals).
• Count the number of observations within each bin (n1, n2, …, n10 in the figure).
• Draw a rectangle with height = number of observations = frequency.
• Relative frequency is the number of observations within a bin divided by the total number of observations.
(Axis: Measurement)
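The binning steps above can be sketched in a few lines. This is one possible convention, not necessarily the one used for the lecture's figures: bins have equal width, each bin includes its left edge, and the last bin also includes its right edge. The data are the six values from the dot plot slide, and the choice of k = 5 bins is an assumption made for the example.

```python
def histogram(data, k):
    """Split [min, max] into k equal-width bins.

    Returns (edges, frequencies, relative frequencies).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    if width == 0:
        raise ValueError("all observations are equal; cannot form bins")
    edges = [lo + i * width for i in range(k + 1)]
    freq = [0] * k
    for x in data:
        # The last bin is closed on the right so the maximum is counted.
        i = min(int((x - lo) / width), k - 1)
        freq[i] += 1
    n = len(data)
    rel = [f / n for f in freq]
    return edges, freq, rel

data = [8, 3, 6, 4.5, 4, 4.5]
edges, freq, rel = histogram(data, k=5)
print(edges)
print(freq)
print(rel)
```

Frequencies always add up to the sample size, and relative frequencies add up to 1, whatever bins are chosen.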
Frequency vs. relative frequency
• Frequency: number of observations per bin. n = 20 (sample size), k = 8 (number of bins). Bin counts: 1, 2, 3, 4, 3, 1, 2, 4.
• Relative frequency: fraction of observations per bin. n = 20 (sample size), k = 8 (number of bins). Bin fractions: .05, .1, .15, .2, .15, .05, .1, .2.
(Axis: Measurement)
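As a quick arithmetic check, the eight bin counts on this slide can be converted to relative frequencies by dividing each one by the sample size. The variable names are chosen for this example only.

```python
frequencies = [1, 2, 3, 4, 3, 1, 2, 4]  # bin counts from the slide, k = 8 bins

n = sum(frequencies)                     # sample size recovered from the counts
relative = [f / n for f in frequencies]  # fraction of observations in each bin

print(n)
print(relative)
```

The counts sum to the stated sample size n = 20, and the fractions sum to 1.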
Frequency vs. relative frequency. Figure label (both panels): 080 – right answer. Data from Quiz 0
Simple plots: histogram
Simple plots: histogram (UCLA, Stats 14, Fall 2004)
Simple plots: histogram. Bimodal. Data from Quiz 0
Simple plots: histogram. Data from Quiz 0
Simple plots: histogram. ~ uniform? Data from Quiz 0
Simple plots: histogram. Data from Quiz 0
Simple plots: histogram. Unimodal, right skewed; outliers. Data from Quiz 0
Simple plots: histogram. Spike. WRB number. Data from Quiz 0
Simple plots: pie chart http://static.deliaonline.com/images/originals/cc444-apple-blackberry-pie-18775.jpg
Simple plots: scatterplot
Simple plots: histogram. Central values and spread
• x0 is the central value (characteristic value).
• Spread: most of the observed values.
(Axis: Measurement)
Sampling (the science of choosing the data)
Definitions:
• A population is the entire collection of objects or outcomes about which information is sought.
• A sample is a subset of a population, containing the objects or outcomes that are actually observed.
• A simple random sample (SRS) of size n is a sample chosen by a method in which each collection of n population items is equally likely to comprise the sample, just as in a lottery.
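A minimal sketch of drawing a simple random sample with Python's standard library. `random.sample` selects n distinct items without replacement, so every set of n items is equally likely to be chosen. The population of 40 numbered items and the seed are hypothetical, chosen only to make the example reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every set of n items is equally likely."""
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    return rng.sample(list(population), n)

population = list(range(1, 41))  # hypothetical: 40 numbered students
sample = simple_random_sample(population, n=5, seed=352)
print(sample)
```

Changing or omitting the seed gives a different, equally valid sample from the same population.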
Definition: A sample of convenience is a sample that is not drawn by a well-defined random method.
Things to consider with convenience samples:
• They may differ systematically in some way from the population.
• Use them only when it is not feasible to draw a random sample.
Simple Random Sampling
• An SRS is not guaranteed to reflect the population perfectly.
• SRSs always differ in some ways from each other; occasionally a sample is substantially different from the population.
• Two different samples from the same population will vary from each other as well.
• This phenomenon is known as sampling variation.
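Sampling variation can be seen by drawing several simple random samples from one population and comparing a summary of each. The sketch below uses a simulated population of 1000 scores; the population, its parameters, the sample size of 20, and the seed are all assumptions made for illustration, not data from the course.

```python
import random
from statistics import mean

rng = random.Random(2007)  # fixed seed so the run is reproducible

# Hypothetical tangible population: 1000 simulated scores.
population = [rng.gauss(70, 10) for _ in range(1000)]

# Five independent simple random samples of size 20, summarized by their means.
sample_means = [mean(rng.sample(population, 20)) for _ in range(5)]

print(round(mean(population), 2))
print([round(m, 2) for m in sample_means])
```

The five sample means differ from one another and from the population mean, even though every sample comes from the same population by the same method.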
Populations
Definitions:
• A tangible population is a finite population that consists of actual objects. Examples: people in our class, buildings in Reno.
• A conceptual population consists of items that are not actual objects. Examples: all possible shots from a rifle, all possible tosses of a coin, all possible results of weighing a rock sample.