Chapter 1: Exploring Data Review • Vaishali Saseenthar • Christine Kil • Aditi Bellur
The BIG Idea • Data Production • Data Analysis • Probability • Statistical Inference • Presenting the statistics: • Plotting the data: Dotplot, stemplot, histogram. • Interpreting the data: SOCS (shape, outliers, center, spread) • Numerical summary: mean and SD, five-number summary
When do YOU use this?? • We use graphs such as histograms, stemplots, pie graphs, and bar graphs to organize and present the data. • Data analysis is used to draw conclusions from our data. • Examining distributions for shape, center, spread, and deviations helps us understand the data. • Data analysis is used in practice in economics, politics, journalism, and other fields.
Vocab to know • Statistical inference- drawing conclusions about a large group based on a smaller group. • Surveys- popular ways to gauge public opinion; ask the individuals in the sample some questions and record their responses. • Observational study- observing individuals and measuring variables but not influencing the responses. • Experiment- deliberately influencing individuals to observe their responses. • Individuals- objects described by the data. • Variable- any characteristic of an individual. • Categorical variable- places an individual in one of several groups. • Quantitative variable- takes numerical values for which adding and averaging make sense. • Mean- average value, x-bar. It is sensitive to the influence of a few extreme observations. • Outlier- individual value that falls outside the overall pattern.
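The claim that the mean is sensitive to a few extreme observations can be seen in a small sketch. The salary figures below are made up for illustration, and the median is included only as a contrast (it is a resistant measure of center).

```python
from statistics import mean, median

# Hypothetical data: five salaries, in thousands of dollars.
salaries = [30, 32, 35, 38, 40]
# The same data with one extreme observation added.
with_outlier = salaries + [400]

mean_before = mean(salaries)
median_before = median(salaries)
mean_after = mean(with_outlier)      # pulled far upward by the single extreme value
median_after = median(with_outlier)  # moves only slightly

print("without outlier:", mean_before, median_before)
print("with outlier:   ", round(mean_after, 2), median_after)
```

One extreme value moves the mean from 35 to about 95.8, while the median moves only from 35 to 36.5.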
Vocab to know dos • pth percentile of a distribution- the value such that p% of the observations fall at or below it. • First quartile Q₁- median of the observations to the left of the overall median in the ordered list. • Third quartile Q₃- median of the observations to the right of the overall median in the ordered list. • Five-number summary of a set of observations- the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written from smallest to largest. • Boxplot- graph of the five-number summary. • Interquartile range (IQR)- the distance between the quartiles, Q₃ − Q₁; it is the range of the center half of the data and is more resistant than the full range. • Standard deviation- measures spread by looking at how far the observations are from their mean; it is the square root of the variance. • Variance of a set of observations- the average of the squares of the deviations of the observations from their mean (the sum of squared deviations divided by n − 1). • Data analysis- art of describing data using graphs and numerical summaries.
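A minimal sketch of the five-number summary and IQR, following the quartile definitions on this slide (Q₁ and Q₃ as medians of the lower and upper halves of the ordered list). The function name and the scores are hypothetical. Software packages often use slightly different quartile rules, so their output may not match exactly.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # observations to the left of the median's position
    upper = xs[half + n % 2:]  # observations to the right (skips the median when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical quiz scores.
scores = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
summary = five_number_summary(scores)
iqr = summary[3] - summary[1]  # Q3 - Q1

print(summary)  # (7, 36, 40.5, 43, 49)
print(iqr)      # 7
```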
Key Topics Covered in this Chapter • Graphs for Categorical Variables • Stemplots • Histograms • Examining Distributions • Dealing with Outliers • Relative Frequency and Cumulative Frequency • Time Plots • Measuring Center: The Mean • Mean versus Median • Measuring Spread: The Quartiles • The Five-Number Summary and Boxplots • The 1.5 × IQR Rule for Suspected Outliers • Measuring Spread: The Standard Deviation • Properties of Standard Deviation • Choosing Measures of Center and Spread • Changing the Unit of Measurement • Comparing Distributions
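The 1.5 × IQR rule named in this list flags an observation as a suspected outlier when it falls more than 1.5 × IQR below Q₁ or above Q₃. A short sketch under that rule follows. The function name and data are hypothetical, and the quartiles are supplied by hand (here the medians of the lower and upper halves of the ordered data).

```python
def suspected_outliers(data, q1, q3):
    """Return the observations outside the 1.5 x IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical quiz scores with Q1 = 36 and Q3 = 43.
scores = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
flagged = suspected_outliers(scores, q1=36, q3=43)

print(flagged)  # [7, 15]: both fall below the lower fence at 25.5
```

The rule only marks values as suspected outliers; whether to keep or investigate them is still a judgment call.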
Calculator Key Strokes • Calculator Boxplots and Numerical Summaries • Enter the first set of data in L1 (list 1) and the second set in L2 (list 2). • Set up two statistics plots: Plot 1 to show a modified boxplot of the data in list 1 and Plot 2 to show a modified boxplot of the data in list 2. • Use the calculator's zoom feature to display the side-by-side boxplots. • Calculate numerical summaries for each set of data. • Notice the down-arrow on the left side of the display. Press down to see other statistics.
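Formulas You Should Know • Mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum x_i$ • Variance: $s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$ • Standard Deviation: $s = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$ • Linear transformation: $x_{\text{new}} = a + bx$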
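The formulas on this slide can be checked numerically. The sketch below assumes the sample variance divides by n − 1 and uses a made-up Celsius-to-Fahrenheit conversion (a = 32, b = 9/5) as the linear transformation. It illustrates that a linear transformation changes the mean to a + b·x-bar and multiplies the standard deviation by |b|.

```python
from math import sqrt, isclose

def mean(xs):
    """x-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def variance(xs):
    """s^2: the sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    """s: the square root of the variance."""
    return sqrt(variance(xs))

# Hypothetical temperatures in degrees Celsius.
temps_c = [10, 15, 20, 25, 30]

# Linear transformation x_new = a + b*x (Celsius to Fahrenheit).
a, b = 32, 9 / 5
temps_f = [a + b * x for x in temps_c]

print(mean(temps_c), variance(temps_c), std_dev(temps_c))
print(mean(temps_f), std_dev(temps_f))

# The center shifts and rescales; the spread only rescales.
print(isclose(mean(temps_f), a + b * mean(temps_c)))
print(isclose(std_dev(temps_f), abs(b) * std_dev(temps_c)))
```

Adding the constant a shifts every observation by the same amount, so it moves the mean but leaves the spread unchanged.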
Helpful HINTS • Stemplots do not work well for large data sets, where each stem must hold a large number of leaves. • Use histograms of percents to compare several distributions with different numbers of observations. • When examining a distribution, look for the overall pattern and for striking deviations from that pattern. • Look for outliers that are clearly apart from the body of the data, not just the most extreme observations. • The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread.
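The hint about histograms of percents can be sketched as follows: converting class counts to percents puts groups of different sizes on the same scale. The function name, class boundaries, and data are all hypothetical, and observations outside the boundaries are simply not counted in any class.

```python
def percent_histogram(data, bin_edges):
    """Percent of observations in each class [edge_i, edge_{i+1})."""
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    n = len(data)
    return [100 * c / n for c in counts]

edges = [0, 10, 20, 30]
group_a = [2, 5, 12, 14, 18, 25, 27, 8, 11, 21]  # 10 observations
group_b = [3, 15, 16, 22]                        # 4 observations

print(percent_histogram(group_a, edges))  # [30.0, 40.0, 30.0]
print(percent_histogram(group_b, edges))  # [25.0, 50.0, 25.0]
```

Raw counts would make the larger group look taller in every class; the percents show that the two groups have similar shapes.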