1 / 55

EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES. Chapter 2. 2.1 What Are the Types of Data?. Variable. A variable is any characteristic that is recorded for the subjects in a study Examples: Marital status, Height, Weight, IQ A variable can be classified as either Categorical or

gretel
Download Presentation

EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2

  2. 2.1 What Are the Types of Data?

  3. Variable A variable is any characteristic that is recorded for the subjects in a study • Examples: Marital status, Height, Weight, IQ • A variable can be classified as either • Categorical or • Quantitative • Discrete or • Continuous www.thewallstickercompany.com.au

  4. Categorical Variable A variable is categorical if each observation belongs to one of a set of categories. • Examples: • Gender (Male or Female) • Religion (Catholic, Jewish, …) • Type of residence (Apt, Condo, …) • Belief in life after death (Yes or No) www.post-gazette.com

  5. Quantitative Variable A variable is called quantitative if observations take numerical values for different magnitudes of the variable. • Examples: • Age • Number of siblings • Annual Income

  6. Quantitative vs. Categorical • For Quantitative variables, key features are the center (a representative value) and spread (variability). • For Categorical variables, a key feature is the percentage of observations in each of the categories .

  7. Discrete Quantitative Variable • A quantitative variable is discrete if its possible values form a set of separate numbers: 0,1,2,3,…. • Examples: • Number of pets in a household • Number of children in a family • Number of foreign languages spoken by an individual upload.wikimedia.org

  8. Continuous Quantitative Variable • A quantitative variable is continuous if its possible values form an interval • Measurements • Examples: • Height/Weight • Age • Blood pressure www.wtvq.com

  9. Proportion & Percentage (Rel. Freq.) Proportions and percentages are also called relative frequencies.

  10. Frequency Table A frequency table is a listing of possible values for a variable, together with the number of observations or relative frequencies for each value.

  11. 2.2 Describe Data Using Graphical Summaries

  12. Graphs for Categorical Variables Use pie charts and bar graphs to summarize categorical variables • Pie Chart: A circle having a “slice of pie” for each category • Bar Graph: A graph that displays a vertical bar for each category wpf.amcharts.com

  13. Pie Charts • Summarize categorical variable • Drawn as circle where each category is a slice • The size of each slice is proportional to the percentage in that category

  14. Bar Graphs • Summarizes categorical variable • Vertical bars for each category • Height of each bar represents either counts or percentages • Easier to compare categories with bar graph than with pie chart • Called Pareto Charts when ordered from tallest to shortest

  15. Graphs for Quantitative Data • Dot Plot: shows a dot for each observation placed above its value on a number line • Stem-and-Leaf Plot: portrays the individual observations • Histogram: uses bars to portray the data

  16. Which Graph? • Dot-plot and stem-and-leaf plot: • More useful for small data sets • Data values are retained • Histogram • More useful for large data sets • Most compact display • More flexibility in defining intervals content.answers.com

  17. Dot Plots To construct a dot plot • Draw and label horizontal line • Mark regular values • Place a dot above each value on the number line Sodium in Cereals

  18. Stem-and-leaf plots • Summarizes quantitative variables • Separate each observation into a stem (first part of #) and a leaf(last digit) • Write each leaf to the right of its stem; order leaves if desired Sodium in Cereals

  19. Histograms Graph that uses bars to portray frequencies or relative frequencies of possible outcomes for a quantitative variable

  20. Constructing a Histogram • Divide into intervals of equal width • Count # of observations in each interval Sodium in Cereals

  21. Constructing a Histogram • Label endpoints of intervals on horizontal axis • Draw a bar over each value or interval with height equal to its frequency (or percentage) • Label and title Sodium in Cereals

  22. Interpreting Histograms • Assess where a distribution is centered by finding the median • Assess the spreadof a distribution • Shape of a distribution: roughly symmetric, skewed to the right, or skewed to the left Left and right sides are mirror images

  23. Examples of Skewness

  24. Shape and Skewness Consider a data set containing IQ scores for the general public. What shape? • Symmetric • Skewed to the left • Skewed to the right • Bimodal botit.botany.wisc.edu

  25. Shape and Skewness Consider a data set of the scores of students on an easy exam in which most score very well but a few score poorly. What shape? • Symmetric • Skewed to the left • Skewed to the right • Bimodal

  26. Shape: Type of Mound

  27. Outlier An outlier falls far from the rest of the data

  28. Time Plots • Display a time series, data collected over time • Plots observation on the vertical against time on the horizontal • Points are usually connected • Common patterns should be noted Time Plot from 1995 – 2001 of the # worldwide who use the Internet

  29. 2.3 Describe the Center of Quantitative Data

  30. Mean • The mean is the sum of the observations divided by the number of observations • It is the center of mass

  31. Median Midpoint of the observations when ordered from least to greatest • Order observations • If the number of observations is: • Odd, the median is the middle observation • Even, the median is the average of the two middle observations

  32. Comparing the Mean and Median • Mean and median of a symmetric distribution are close • Mean is often preferred because it uses all • In a skewed distribution, the mean is farther out in the skewed tail than is the median • Median is preferred because it is better representative of a typical observation

  33. Resistant Measures • A measure is resistant if extreme observations (outliers) have little, if any, influence on its value • Median is resistant to outliers • Mean is not resistant to outliers www.stat.psu.edu

  34. Mode • Value that occurs most often • Highest bar in the histogram • Mode is most often used with categoricaldata

  35. 2.4 Describe the Spread of Quantitative Data

  36. Range Range = max - min The range is strongly affected by outliers.

  37. Each data value has an associated deviation from the mean, A deviation is positive if it falls above the mean and negative if it falls belowthe mean The sum of the deviations is always zero Standard Deviation

  38. Standard Deviation Standard deviation gives a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations: Find mean Find each deviation Square deviations Sum squared deviations Divide sum by n-1 Take square root

  39. Standard Deviation Metabolic rates of 7 men (calories/24 hours)

  40. Properties of Sample Standard Deviation • Measures spread of data • Only zero when all observations are same; otherwise, s > 0 • As the spread increases, s gets larger • Same units as observations • Not resistant • Strong skewness or outliers greatly increase s

  41. Empirical Rule: Magnitude of s

  42. 2.5 How Measures of Position Describe Spread

  43. Percentile The pth percentile is a value such that p percent of the observations fall below or at that value

  44. Finding Quartiles Splits the data into four parts • Arrange data in order • The median is the second quartile, Q2 • Q1 is the median of the lower half of the observations • Q3 is the median of the upper half of the observations

  45. Measure of Spread: Quartiles Quartiles divide a ranked data set into four equal parts: 25% of the data at or below Q1 and 75% above 50% of the obs are above the median and 50% are below 75% of the data at or below Q3 and 25% above Q1= first quartile = 2.2 M = median = 3.4 Q3= third quartile = 4.35

  46. Calculating Interquartile Range The interquartile range is the distance between the thirdand first quartile, giving spread of middle 50% of the data:IQR = Q3 - Q1

  47. Criteria for Identifying an Outlier An observation is a potential outlier if it falls more than 1.5 x IQR below the first or more than 1.5 x IQR above the third quartile.

  48. 5 Number Summary The five-number summary of a dataset consists of: • Minimum value • First Quartile • Median • Third Quartile • Maximum value

  49. Boxplot • Box goes from the Q1 to Q3 • Line is drawn inside the box at the median • Line goes from lower end of box to smallest observation not a potential outlier and from upper end of box to largest observation not a potential outlier • Potential outliers are shown separately, often with * or +

More Related