Welcome to BUAD 310

Welcome to BUAD 310. Instructor: Kam Hamidieh. Lecture 3, Wednesday, January 22, 2014. Agenda: finish up Chapter 4, Describing Numerical Data. Homework 1 has been assigned: it is due at or before 5 PM, Wednesday, January 29, 2014. Start early; no extensions will be given.

Presentation Transcript


  1. Welcome to BUAD 310

    Instructor: Kam Hamidieh. Lecture 3, Wednesday, January 22, 2014
  2. Agenda: Finish up Chapter 4, Describing Numerical Data. Homework 1 has been assigned: it is due at or before 5 PM, Wednesday, January 29, 2014. Start early; no extensions will be given. Reading for Chapters 1–4: look at my slide 2 from last lecture. (There are a few questions on HW 1 that are based only on the reading.) Quick StatCrunch demo (time permitting!).
  3. About HW 1 Make sure you have access to www.mystatlab.com. You have two options: (1) Buy the book at the bookstore, get the code inside the book, and register at the above site. You’ll have access to the electronic version of the text. (2) Go directly to the site above and buy the electronic version. You will *not* get the hard copy. Watch this for the registration process: http://www.youtube.com/watch?v=DpdGQb36-Ic (ignore his comment about Blackboard!). I’ve had much better luck with Google Chrome and Firefox than with Internet Explorer! Watch my hints for using StatCrunch on HW 1 online at: http://www.youtube.com/watch?v=JKkKA9AVXyk
  4. StatCrunch This is the StatCrunch channel: StatCrunch YouTube Channel. 3-minute intro: StatCrunch Introduction. 3-minute histogram video (older version): Creating Histograms in StatCrunch.
  5. From Last Time: Tools for EDA
    Categorical data
      Graphical: bar charts, pie charts
      Tables: frequency tables, relative frequency tables
    Numerical data
      Graphical: histograms, boxplots, scatter plots, time plots
      “Tables”: numerical summaries such as SD, mean, median, etc.
  6. From Last Time Histograms: Put your numerical values into categories (AKA classes, bins, or intervals) and create a bar chart. A histogram shows the distribution of the numerical variable. Look at the shape of the distribution: Symmetrical? Skewed? Uniform? Where is the center? How large is the spread? Any outliers?
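The binning step behind a histogram can be sketched in a few lines. This is a minimal illustration, not how StatCrunch does it: it assumes equal-width bins of 10 minutes that include their left edge and exclude their right edge, assumes all values are at or above the starting edge, uses a hypothetical helper name `bin_counts`, and borrows the North Carolina travel times from slide 19.

```python
from collections import Counter

def bin_counts(values, width, start=0):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Assumes every value is >= start (hypothetical helper for illustration).
    """
    counts = Counter((v - start) // width for v in values)  # bin index per value
    top = max(counts)                                      # highest occupied bin
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(top + 1)]                       # empty bins count 0

# North Carolina travel times (minutes) from slide 19
nc_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]

# Print a sideways text histogram: one row per bin, one star per value
for low, high, count in bin_counts(nc_times, width=10):
    print(f"[{low:2d}, {high:2d}) {'*' * count}")
```

Changing `width` changes the picture, which is why the same data can look more or less skewed depending on the bins chosen.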
  7. Salary Example (http://www.sportscity.com/nfl/salaries/seattle-seahawks-salaries/)
        Min.   1st Qu.   Median     Mean   3rd Qu.      Max.
      0.2541    0.4928   0.6250   2.0160    2.0460   11.0000
  8. Remember Percentiles? A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. Admission test scores for colleges and universities are frequently reported in terms of percentiles. For example, if your professor tells you that your exam score corresponds to the 83rd percentile, this means that you did as well as or better than about 83% of the class (and about 17% did better).
  9. Quartiles, Max, Min Split the ordered values into the half that is below the median and the half that is above the median. Q2 is just the median = 50th percentile. Q1 = lower quartile = median of the data values below the median = 25th percentile. Q3 = upper quartile = median of the data values above the median = 75th percentile. Min, Q1, median, Q3, and max split the data into four parts, each holding about 25% of the values:
    Min |-- 25% --| Q1 |-- 25% --| Med |-- 25% --| Q3 |-- 25% --| Max
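The "median of each half" rule above can be sketched as follows. This is an illustration under one assumption the slide leaves open: when n is odd, the overall median is left out of both halves by default, and `include_median=True` puts it in both halves instead (the two conventions agree for even n). The function names are hypothetical. Library routines such as Python's `statistics.quantiles` interpolate between values and may report slightly different quartiles.

```python
from statistics import median

def quartiles(values, include_median=False):
    """Return (Q1, Q2, Q3) by the median-of-halves rule (needs at least 2 values).

    For odd n the overall median is excluded from both halves by default;
    include_median=True counts it in both halves instead.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(data), median(upper)

def five_number_summary(values, include_median=False):
    """Return (Min, Q1, Median, Q3, Max)."""
    q1, q2, q3 = quartiles(values, include_median)
    return min(values), q1, q2, q3, max(values)

# Data from slide 10 (n = 12)
scores = [271, 275, 285, 288, 288, 289, 292, 294, 295, 305, 313, 332]
print(five_number_summary(scores))  # (271, 286.5, 290.5, 300.0, 332)
```

The exact Q1 = 286.5 and Q2 = 290.5 match the slide 10 values 287 and 291 after rounding to whole numbers.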
  10. Examples of Quartiles, Max, Min Quartiles: lower quartile Q1 = 25th percentile; median Q2 = 50th percentile; upper quartile Q3 = 75th percentile. Example: Compute the quartiles for the following numbers, n = 12: {271, 275, 285, 288, 288, 289, 292, 294, 295, 305, 313, 332}. Min = 271, Max = 332, Q1 = (285 + 288)/2 = 286.5 ≈ 287, Q2 = (289 + 292)/2 = 290.5 ≈ 291, Q3 = (295 + 305)/2 = 300.
  11. In Class Exercise 1 Suppose you have the following data, n = 10: { 7, 3, 3, 6, 4, 8, 6, 4, 4, 5 } Answer the questions: What is the minimum? What is the maximum? What is the median? Find Q1 and Q3 of the data.
  12. Importance of Variability Suppose 20 people take an exam. Possible scores go from 0 to 100. The average score is 87. Bob got an 88. How well do you think he did? [Figure: Case I and Case II] In Case I, Bob is doing fine. In Case II, Bob is not doing so well. Just knowing the mean or the median is not enough; we also need to know something about the spread and shape of the data.
  13. Measures of Variability Just as the mean, median, and mode measure the location of the data, there are various ways to describe the variability in the data. We will look at the following: range, interquartile range, variance, and standard deviation.
  14. Range Range = largest value – smallest value. This is one of the easiest ways to summarize the variability in your data. Suppose you have the following data: {-100, 10, 112, 291, 300}. The range is 300 – (-100) = 400.
  15. Interquartile Range Interquartile range (IQR) = Q3 – Q1. The IQR is generally used when reporting the median. Recall: {271, 275, 285, 288, 288, 289, 292, 294, 295, 305, 313, 332}, with Q1 = 286.5 (≈ 287) and Q3 = 300. IQR = 300 – 286.5 = 13.5 (about 13 with the rounded Q1 = 287).
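Both spread measures on slides 14 and 15 are simple arithmetic on the sorted data. A minimal sketch, assuming the median-of-halves quartile rule from slide 9 with the median excluded from the halves when n is odd (that choice does not matter for the even n = 12 data used here); the helper names are hypothetical.

```python
from statistics import median

def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

def iqr(values):
    """IQR = Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                # median of the lower half
    q3 = median(data[len(data) - half:])    # median of the upper half
    return q3 - q1

print(data_range([-100, 10, 112, 291, 300]))  # 400
print(iqr([271, 275, 285, 288, 288, 289, 292, 294, 295, 305, 313, 332]))  # 13.5
```

The exact IQR is 13.5; rounding Q1 to 287 before subtracting gives 13.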
  16. Boxplots A boxplot is a graphical way to summarize your data. A boxplot shows the five-number summary: minimum, first quartile, second quartile (median), third quartile, and maximum. It is a useful tool for detecting outliers and comparing different groups. It was created by John Tukey.
  17. Boxplots
    Step 1: Label either a vertical axis or a horizontal axis with numbers from the min to the max of the data.
    Step 2: Draw a box with its lower end at Q1 and its upper end at Q3.
    Step 3: Draw a line through the box at the median.
    Step 4: Draw a line from the Q1 end of the box to the smallest data value that is not further than 1.5 × IQR from Q1. Draw a line from the Q3 end of the box to the largest data value that is not further than 1.5 × IQR from Q3.
    Step 5: Mark data points further than 1.5 × IQR from either edge of the box with an asterisk or some other symbol. Points represented with asterisks or other symbols are considered to be outliers. (More on this a bit later.)
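The numbers behind Steps 2 to 5 can be computed before drawing anything. The sketch below is an illustration with a hypothetical function name `boxplot_stats`; it assumes the median-of-halves quartiles from slide 9, leaving the median out of both halves when n is odd, so for odd n it can give different quartiles than a convention that counts the median in both halves.

```python
from statistics import median

def boxplot_stats(values):
    """Quartiles, 1.5 x IQR fences, whisker ends, and flagged outliers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                # median of the lower half
    q3 = median(data[len(data) - half:])    # median of the upper half
    step = 1.5 * (q3 - q1)                  # 1.5 x IQR
    low_fence, high_fence = q1 - step, q3 + step
    # Whiskers reach the most extreme values that are still inside the fences
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "Q1": q1, "median": median(data), "Q3": q3,
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Data from slide 18 (all eight values)
print(boxplot_stats([-17, -12, -2, 0, 1, 4, 16, 40]))
```

For the slide 18 data this flags 40 as the only outlier, with whiskers running from -17 to 16.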
  18. Example Suppose you have the data {-17, -12, -2, 0, 1, 4, 16, 40}. With all data: min = -17, Q1 = -7, median = 0.5, Q3 = 10, max = 40, IQR = 17, 1.5 × IQR = 25.5, Q1 – 25.5 = -32.5, Q3 + 25.5 = 35.5, so 40 is flagged as an outlier. With the 40 taken out (n = 7; these quartiles count the median 0 in both halves): min = -17, Q1 = -7, median = 0, Q3 = 2.5, max = 16, IQR = 9.5, 1.5 × IQR = 14.25, Q1 – 14.25 = -21.25, Q3 + 14.25 = 16.75. [Figure: boxplots "With all data" and "40 taken out"]
  19. Example Suppose travel times in minutes for 15 workers in North Carolina are collected: 5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60. Five-number summary: Min = 5, Q1 = 10, M = 20, Q3 = 30, Max = 60. Suppose travel times for 20 workers in New York are also collected: 5, 10, 10, 15, 15, 15, 15, 20, 20, 20, 25, 30, 30, 40, 40, 45, 60, 60, 65, 85. Five-number summary: Min = 5, Q1 = 15, M = 22.5, Q3 = 42.5, Max = 85. Together we have:
  20. Example
  21. Outliers (Read on your own.) Outlier: a data point that does not seem to fit with the bulk of the data; it is either much too large or much too small. Remarks: Look for them via graphs; I recommend boxplots. Outliers can have a big influence on conclusions. They can cause complications in some statistical analyses. They cannot be discarded without solid justification.
  22. Outliers (Read on your own.) It is a BAD idea to exclude outliers automatically: NASA launched the Nimbus 7 satellite to record atmospheric data. Years later, in 1985, a few scientists observed a large decrease in ozone over the Antarctic. It was found later that the NASA data processors were automatically throwing away very small ozone readings, assuming they were mistakes. Had this been known earlier, perhaps the CFC phase-out would have been implemented sooner.
  23. More on Outliers (Read on your own.) Outliers may be exactly the points you are looking for! Credit card fraud: very high activity may be associated with a stolen card. Extreme weather: for example, if you are building a levee system, you care about the extreme (outlier) water levels. Extreme financial losses: watch out if your stock goes down a lot really fast! Outliers may sometimes be due to errors in data entry. Example: you have height data for people and the minimum height shows up as 2 inches! That can’t be right!
  24. Salaries
        Min.   1st Qu.   Median     Mean   3rd Qu.      Max.
      0.2541    0.4928   0.6250   2.0160    2.0460   11.0000
  25. In Class Exercise 2 Draw the boxplot corresponding to the data on slide 10.
  26. Describing Spread via Standard Deviation The standard deviation measures variability by summarizing how far individual data values are from the mean of the data. Think of the standard deviation as roughly the average distance of the values from the overall mean. It is in the same units as the data. What we will learn in the next few slides is actually the sample standard deviation, but for now we’ll drop the word "sample".
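  27. Computing the SD Formula for the standard deviation:
    $$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
    The value of $s^2$ is called the variance. An equivalent formula, easier to compute, is:
    $$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}}$$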
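One common shortcut form replaces the sum of squared deviations with $\sum x_i^2 - n\bar{x}^2$. The identity behind it follows from expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{aligned}
```

Dividing both sides by $n - 1$ and taking the square root turns the definition of $s$ into the shortcut form.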
  28. Computing the SD
    Step 1: Calculate $\bar{x}$, the sample mean.
    Step 2: For each observation, calculate the difference between the data value and the mean, $x_i - \bar{x}$.
    Step 3: Square each difference from Step 2.
    Step 4: Sum the squared differences from Step 3, and then divide this sum by n – 1.
    Step 5: Take the square root of the value from Step 4.
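The five steps translate directly into code. This is a teaching sketch with hypothetical function names, assuming at least two data values; it also computes the shortcut form $\sum x_i^2 - n\bar{x}^2$ as a check that both give the same answer on the four numbers from slide 29.

```python
import math

def sample_sd(values):
    """Sample standard deviation, following Steps 1-5 (needs at least 2 values)."""
    n = len(values)
    mean = sum(values) / n                      # Step 1: sample mean
    deviations = [x - mean for x in values]     # Step 2: differences from the mean
    squared = [d * d for d in deviations]       # Step 3: square each difference
    variance = sum(squared) / (n - 1)           # Step 4: sum, then divide by n - 1
    return math.sqrt(variance)                  # Step 5: square root

def sample_sd_shortcut(values):
    """Same quantity via (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt((sum(x * x for x in values) - n * mean * mean) / (n - 1))

data = [62, 68, 74, 76]
print(round(sample_sd(data), 4))           # 6.3246
print(round(sample_sd_shortcut(data), 4))  # 6.3246
```

The shortcut is easier by hand, but on a computer the step-by-step version is numerically safer when the values are large relative to their spread, because subtracting two nearly equal large sums loses precision.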
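  29. Simple Example Consider just four numbers: 62, 68, 74, 76.
    Step 1: $\bar{x} = (62 + 68 + 74 + 76)/4 = 280/4 = 70$.
    Steps 2 and 3: the differences from the mean are -8, -2, 4, 6; their squares are 64, 4, 16, 36.
    Step 4: $(64 + 4 + 16 + 36)/(4 - 1) = 120/3 = 40$, so the variance is $s^2 = 40$.
    Step 5: $s = \sqrt{40} \approx 6.3$.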
  30. WHY? Why divide by n-1? Why square the differences? What is the difference between SD and variance?
  31. Some Comments about SD What does s = 0 mean? s is sensitive to large values: for 62, 68, 74, 76 we had s = 6.3. If we replace 76 with 100 (62, 68, 74, 100), we get s = 16.7. If your data are approximately symmetric, then s and the mean are good summary numbers to report; otherwise, report the five-number summary and the IQR as a measure of spread. True or False: s is always positive or zero.
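The sensitivity of s to one large value is easy to check with Python's standard library, whose `statistics.stdev` computes the sample standard deviation (dividing by n - 1). As a contrast that supports the advice above, the sketch also shows that the median of these two small data sets does not move at all.

```python
from statistics import median, stdev  # stdev: sample SD, divides by n - 1

original = [62, 68, 74, 76]
with_large_value = [62, 68, 74, 100]  # 76 replaced by 100

print(round(stdev(original), 1))          # 6.3
print(round(stdev(with_large_value), 1))  # 16.7
print(median(original), median(with_large_value))  # 71.0 71.0
```

One changed value more than doubles s, while the median stays at 71, which is why the five-number summary and IQR are preferred for skewed data or data with outliers.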
  32. Next Time Skip Chapter 5 for now, do a bit of Chapter 6 on Scatter Plots and start Chapter 7 on Probability.