Stat 501 Experimental Statistics I
Data, Data, Data, all around us! • We use data to answer research questions • What evidence does data provide? • How do I make sense of these numbers without some meaningful summary?
Example 2 • Study to assess the effect of exercise on cholesterol levels. One group exercises and the other does not. Is cholesterol reduced in the exercise group? • people have naturally different levels • respond differently to the same amount of exercise (e.g. genetics) • may vary in adherence to the exercise regimen • diet may have an effect • exercise may affect other factors (e.g. appetite, energy, schedule)
What is statistics? • Recognize the randomness: the variability in data. • …“the science of understanding data and making decisions in the face of variability” Three steps to the process of statistics: • Design the study • Analyze the collected data • Discover what the data is telling you…
Section 1.2 Displaying Distributions with Graphs
Individuals and Variables • Individuals – objects described by a set of data • people, animals, things • also called Cases • called Subjects if they are human • Variable – characteristic of an individual; takes different values for different individuals. • The three questions to ask: • Why: Purpose of study? • Who: Members of the sample, how many? • What: What did we measure (the variables) and in what units?
Key Characteristics of a Data Set • Every data set is accompanied by important background information. In a statistical study, always ask the following questions: • Who? What cases do the data describe? How many cases does a data set have? • What? How many variables does the data set have? How are these variables defined? What are the units of measurement for each variable? • Why? What purpose do the data have? Do the data contain the information needed to answer the questions of interest?
Categorical and Quantitative Variables • A categorical variable places each case into one of several groups, or categories. • A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. • The distribution of a variable tells us the values that a variable takes and how often it takes each value.
Distribution of a Variable To examine a single variable, we graphically display its distribution. • The distribution of a variable tells us what values it takes and how often it takes these values. • Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable. • Categorical variable: pie chart, bar graph • Quantitative variable: histogram, stemplot
Categorical Variables • The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category. • Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories. You have to know the whole pie: every category that makes up the total must be included. • Bar graphs represent categories as bars whose heights show the category counts or percents. They are more flexible, because the categories do not have to make up a whole.
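As a minimal sketch of what "the distribution of a categorical variable" means in practice, the Python snippet below tabulates counts and percents per category. The function name `categorical_distribution` and the group labels are hypothetical, made up for illustration; they are not data from the course.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical labels for eight subjects (illustration only, not real data).
groups = ["exercise", "control", "exercise", "exercise",
          "control", "control", "exercise", "exercise"]

dist = categorical_distribution(groups)
for cat, (n, pct) in sorted(dist.items()):
    # One row per category: label, count, percent of all subjects.
    print(f"{cat:10s} {n:3d} {pct:5.1f}%")
```

The percents add up to 100 only because every subject falls into exactly one listed category, which is the "whole pie" condition a pie chart needs. A bar graph could show the same counts without that requirement.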
Quantitative Variables • The distribution of a quantitative variable tells us what values the variable takes on and how often it takes those values. • Histograms show the distribution of a quantitative variable by using bars. The height of a bar represents the number of individuals whose values fall within the corresponding class. • Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable. • Time plots plot each observation against the time at which it was measured.
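The idea that a histogram bar's height is the number of individuals in a class can be sketched in a few lines of Python. The data are the 23 Hank Aaron home run counts used in the stemplot example elsewhere in these slides. The class width of 10, the starting value of 10, and the helper name `class_counts` are assumptions chosen for illustration.

```python
def class_counts(values, start, width, n_classes):
    """Count the values falling in each class [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Hank Aaron's home runs per season (from the stemplot example in these slides).
home_runs = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
             44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

counts = class_counts(home_runs, start=10, width=10, n_classes=4)
for k, n in enumerate(counts):
    lo = 10 + 10 * k
    # A text "bar" whose length is the class count.
    print(f"{lo:2d}-{lo + 9:2d} | " + "*" * n)
```

Choosing a different class width changes the picture, so it is worth trying more than one before describing the shape.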
Stemplots • To construct a stemplot: • Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number). • Write the stems in a vertical column; draw a vertical line to the right of the stems. • Write each leaf in the row to the right of its stem; order leaves if desired.
Stemplots • If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems. • Example: If all of the data values are between 150 and 179, then we may choose to use the following stems: 15, 15, 16, 16, 17, 17. Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).
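Splitting stems can be sketched as follows, assuming whole-number data where the last digit is the leaf. The function name `split_stemplot` and the twelve values are hypothetical, invented only to fall between 150 and 179 as in the example.

```python
def split_stemplot(values):
    """Stemplot with every stem split in two: leaves 0-4 on the upper row, 5-9 on the lower."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)       # last digit is the leaf
        half = 0 if leaf <= 4 else 1     # 0 = upper stem, 1 = lower stem
        rows.setdefault((stem, half), []).append(leaf)
    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in (0, 1):
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical measurements between 150 and 179 (illustration only).
data = [151, 153, 156, 158, 160, 162, 162, 165, 167, 171, 174, 178]

lines = split_stemplot(data)
print("\n".join(lines))
```

With these values the plot has six rows instead of three, which makes the shape easier to see when the range is narrow.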
Example: Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues: 13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10
Step 1: Identify all the stems: 1, 2, 3, 4 • Step 2: Write the stems in increasing order (usually from top to bottom): 1, 2, 3, 4
Step 3: Draw a line next to the stem and write the leaves against the stem
1 | 3 2 0
2 | 7 6 4 9 0
3 | 0 9 4 2 9 8 4
4 | 4 0 5 4 4 4 7 0
Step 4: Rewrite the stemplot rearranging the leaves in ascending order (this can be done simultaneously with step 3):
1 | 0 2 3
2 | 0 4 6 7 9
3 | 0 2 4 4 8 9 9
4 | 0 0 4 4 4 4 5 7
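Steps 1 through 4 can be sketched in Python as below, assuming two-digit whole numbers with the tens digit as the stem and the ones digit as the leaf. The function name `stemplot` is hypothetical; the data are the Hank Aaron home run counts from the example.

```python
def stemplot(values):
    """Return the rows of a stemplot with leaves in ascending order."""
    rows = {}
    for v in sorted(values):             # sorting first gives ordered leaves (step 4)
        stem, leaf = divmod(v, 10)       # split into stem and leaf
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # Include every stem from smallest to largest, even one with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

home_runs = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
             44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

lines = stemplot(home_runs)
print("\n".join(lines))
```

Keeping empty stems in the output matters: dropping them would hide gaps in the distribution.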
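Back-to-Back stemplot • Compare the numbers of Hank Aaron to those of Barry Bonds. Bonds's home runs: 5 16 19 24 25 25 26 28 33 33 34 34 37 37 40 42 45 45 46 46 49 73 • Aaron's leaves are on the left, Bonds's on the right:
          Aaron |   | Bonds
                | 0 | 5
          3 2 0 | 1 | 6 9
      9 7 6 4 0 | 2 | 4 5 5 6 8
  9 9 8 4 4 2 0 | 3 | 3 3 4 4 7 7
7 5 4 4 4 4 0 0 | 4 | 0 2 5 5 6 6 9
                | 5 |
                | 6 |
                | 7 | 3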
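A back-to-back stemplot shares one column of stems between two data sets. The sketch below is one possible way to build it in Python, assuming two-digit-style whole numbers with the ones digit as the leaf; the function name `back_to_back` is hypothetical. Left-hand leaves are written in reverse so that they increase outward from the stem.

```python
def back_to_back(left, right):
    """Rows of a back-to-back stemplot: left leaves | stem | right leaves."""
    def leaves_by_stem(values):
        rows = {}
        for v in sorted(values):
            stem, leaf = divmod(v, 10)
            rows.setdefault(stem, []).append(leaf)
        return rows

    l_rows, r_rows = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l_rows), min(r_rows)), max(max(l_rows), max(r_rows)) + 1)
    # Reverse the left leaves so the smallest leaf sits next to the stem.
    left_text = {s: " ".join(str(x) for x in reversed(l_rows.get(s, []))) for s in stems}
    width = max(len(t) for t in left_text.values())
    lines = []
    for s in stems:
        right_text = " ".join(str(x) for x in r_rows.get(s, []))
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines

aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]
bonds = [5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34, 34,
         37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

lines = back_to_back(aaron, bonds)
print("\n".join(lines))
```

The empty stems 5 and 6 are printed on purpose: they show how far Bonds's 73 sits from the rest of his seasons.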
Examining distributions • Describe the pattern • Shape • How many modes (peaks)? • Symmetric or skewed in one direction? • Center – midpoints? • Mean/average; median • Spread • range between the smallest and the largest values, standard deviation, 5-number summary, quartiles • Look for outliers – individual values that do not match the overall pattern.
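The measures of center and spread listed above can be computed with Python's standard `statistics` module, shown here on the Hank Aaron home run data from these slides. The quartile rule used (the median of the observations below and above the overall median) is one common textbook convention and is an assumption here; software packages often use slightly different rules. The helper name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the two halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # With an odd number of observations, leave the median out of both halves.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

home_runs = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
             44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

mean = statistics.mean(home_runs)            # center: average
median = statistics.median(home_runs)        # center: midpoint
data_range = max(home_runs) - min(home_runs) # spread: largest minus smallest
std_dev = statistics.stdev(home_runs)        # spread: sample standard deviation
summary = five_number_summary(home_runs)

print(f"mean={mean:.1f} median={median} range={data_range} sd={std_dev:.1f}")
print("five-number summary:", summary)
```

Here the mean falls a little below the median, which is consistent with the longer lower tail visible in the stemplot of the same data.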
What do you see? • Shape: Somewhat symmetric, unimodal • Center: about 110 or 115 • Spread: values between 80 and 150 • Remember! • Histograms are only meaningful for quantitative data
Quantitative Example • Breaking strength of connections for electronic components: • Need to discuss variation • How to group these items with so many different values?
Outliers • Check for recording errors • Check for violations of experimental conditions • Discard an outlier only if there is a valid practical or statistical reason, not blindly!
Time Series or Time plots • We care about two important parts • Trend – persistent, long-term rise or fall • Seasonal variation – a pattern that repeats itself at known regular intervals of time. • Mississippi data: • Increasing trend • Large seasonal variations – there is usually a large spike every few years
Summary • Categorical and Quantitative variables • Graphical tools for categorical variables • Bar Chart • Pie Chart • Graphical tools for quantitative variables • Stem and leaf plot • Histogram • Maybe timeplot if appropriate • Distributions • Describe: Shape, center, spread • Watch for patterns and/or deviations from patterns.