
Statistics and Data Analysis

Professor William Greene, Stern School of Business, IOMS Department; Department of Economics. Part 1: Data Presentation. Agenda: data and data types; representing data with pie charts and bar charts.


Presentation Transcript


  1. Statistics and Data Analysis. Professor William Greene, Stern School of Business, IOMS Department; Department of Economics

  2. Statistics and Data Analysis, Part 1: Data Presentation

  3. Data Presentation Agenda
     • Data and Data Types
     • Representing Data: pie chart, bar chart
     • Summarizing Data: box plot, histogram
     • Central tendency
     • Spread
     • Distribution (shape)

  4. Data = A Set of Facts: a picture of some aspect of the world. (Chart: Pizza Sales by Type.) What do the data tell you? How can you use the information? What additional information would make these data more informative?

  5. Data Types and Measurement
     • Quantitative
        • Discrete = count: number of car accidents by city by time
        • Continuous = measurement: housing prices
     • Qualitative
        • Categorical: shopping mall, car brand, trip mode
        • Ordinal: survey data on attitudes, e.g., “How do you feel about…?” (Strongly disagree, Disagree, Neutral, Agree, Strongly agree); Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on
     • Frameworks
        • Cross section
        • Time series
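One way to see what separates ordinal data from merely categorical data is to encode it. The minimal Python sketch below uses hypothetical names (`Likert`, `MOODYS_ORDER`, `better_rating`) chosen for illustration. An `IntEnum` lets attitude responses be compared by order, and a ranked list does the same for the upper Moody's long-term grades. The integer codes record order only; they do not claim that the step from Disagree to Neutral is the same size as the step from Neutral to Agree.

```python
from enum import IntEnum

# Hypothetical encoding of a five-point attitude scale.
# The integer values record ORDER only; the spacing between them is not measured.
class Likert(IntEnum):
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5

# Upper Moody's long-term grades, listed best to worst.
MOODYS_ORDER = ["Aaa", "Aa", "A", "Baa", "Ba", "B"]

def better_rating(r1: str, r2: str) -> str:
    """Return the higher-quality of two Moody's grades (earlier in MOODYS_ORDER)."""
    return r1 if MOODYS_ORDER.index(r1) < MOODYS_ORDER.index(r2) else r2

# Categorical (unordered) data: a plain set carries no ranking at all.
TRIP_MODES = {"Air", "Bus", "Train", "Car"}

if __name__ == "__main__":
    print(Likert.AGREE > Likert.NEUTRAL)   # True: order comparisons are meaningful
    print(better_rating("Baa", "Aa"))      # Aa
```

The design choice is deliberate: ordered types get a comparison, unordered ones (the trip modes) get none.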

  6. Discrete Data: US Crime Statistics; Counts of Occurrences

  7. Continuous Data: Housing Prices and Incomes

  8. Unordered Qualitative Data: Travel Mode Between Sydney and Melbourne, 210 Travelers

  9. Ordered Qualitative Data: German Health Satisfaction Survey, 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?

  10. Ordered Qualitative Outcomes: Bond Ratings, Movie Ratings

  11. Problem with Ordered Survey Response Data: 61 Stern Students’ Ranking of Subway Safety (1994)* on the scale Very Unsatisfactory, Unsatisfactory, OK, Satisfactory, Very Satisfactory. Is there an objective meaning to “3” on some standard scale? Does everyone’s “1” or “2” or “3” … mean the same thing? (* Jeff Simonoff: Data Presentation and Summary, pp. 3-4)

  12. Quantitative vs. Qualitative Data. Qualitative data: no units of measurement, so arithmetic manipulation is usually meaningless (the average of Air and Bus is not Train). Quantitative data: units of measurement make sense, so arithmetic computations make sense.
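The rule that arithmetic makes sense only for quantitative data can be sketched with Python's standard `statistics` module. For qualitative data the natural summary is the most frequent category (the mode); the mean is reserved for numeric data that carry units. The function name `typical_value` and the sample values are hypothetical.

```python
from statistics import mean, mode

def typical_value(values):
    """Summarize a list: mode for qualitative (string) data, mean for quantitative data."""
    if all(isinstance(v, str) for v in values):
        # No units of measurement: the only sensible "average" is the most common category.
        return mode(values)
    # Units of measurement exist, so arithmetic (here, the mean) is meaningful.
    return mean(values)

if __name__ == "__main__":
    trips = ["Air", "Bus", "Car", "Car", "Train"]   # hypothetical travel-mode choices
    print(typical_value(trips))                     # Car
    try:
        mean(trips)  # Python refuses to average category labels
    except TypeError:
        print("mean() is undefined for categories")
```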

  13. Cross Section Data: Housing Prices and Incomes

  14. Time Series Data: Car Thefts

  15. Representing Data • In raw form • Transformed to a visual form • Summarized graphically • Summarized statistically

  16. Pie Chart: Pizza Pies Sold, by Type

  17. Data Representation: Bar Chart vs. Pie Chart. Same data. Which is easier to understand?
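A pie chart and a bar chart encode the same counts differently. The pie maps each category's share of the total to an angle; the bar chart maps each count to a length on a common scale, which is usually easier to compare by eye. The minimal Python sketch below shows both encodings. The function names are hypothetical, and the pizza counts in the usage lines are invented for illustration; they are not the data on the slide.

```python
def pie_angles(counts: dict) -> dict:
    """Degrees of the circle assigned to each category (share of total x 360)."""
    total = sum(counts.values())
    return {label: 360 * n / total for label, n in counts.items()}

def text_bar_chart(counts: dict, width: int = 40) -> str:
    """Horizontal bars on a common scale, longest first."""
    largest = max(counts.values())
    rows = []
    for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        rows.append(f"{label:<10} {'#' * round(width * n / largest)} {n}")
    return "\n".join(rows)

if __name__ == "__main__":
    sales = {"Cheese": 30, "Pepperoni": 50, "Veggie": 20}   # hypothetical counts
    print(pie_angles(sales))   # {'Cheese': 108.0, 'Pepperoni': 180.0, 'Veggie': 72.0}
    print(text_bar_chart(sales))
```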

  18. 2013 data. Source: Bloomberg

  19. Raw Data on Housing Prices and Incomes

  20. A Box Plot Describes the Distribution of Values in a Set of Data (Figure: Hawaii Box and Whisker Plot for House Price Listings)

  21. Making a Box Plot for Per Capita Income
     • Maximum = 31136
     • 3rd quartile = 24933
     • Median = 22610
     • 1st quartile = 21677
     • Minimum = 17043
     • Interquartile range: IQR = 24933 - 21677 = 3256
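The five numbers on this slide and the IQR can be computed with Python's standard `statistics.quantiles`. Software packages use several slightly different quartile definitions, so this sketch, which assumes the `method="inclusive"` rule, may not reproduce a particular textbook's quartiles exactly on small samples. The function name is hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Min, Q1, median, Q3, max, and IQR = Q3 - Q1."""
    # With n=4, quantiles() returns the three cut points Q1, median, Q3.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": med,
            "q3": q3, "max": max(data), "iqr": q3 - q1}

if __name__ == "__main__":
    print(five_number_summary([1, 2, 3, 4, 5]))
    # IQR from the slide's per capita income quartiles:
    print(24933 - 21677)   # 3256
```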

  22. Box and Whisker Plot. What is an outlier? Why do we believe a particular point is an outlier? Plot labels, top to bottom:
     • Outliers
     • Upper whisker end: smaller of (Maximum, Median + 1.5 IQR)
     • 75th percentile
     • Interquartile range = IQR
     • Median
     • 25th percentile
     • Lower whisker end: larger of (Minimum, Median - 1.5 IQR)
     (HOG, pp. 39-43)
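The whisker rule on this slide can be written as a short Python function. This sketch implements the rule as stated (fences at Median ± 1.5 IQR, capped at the observed extremes) and, for comparison, Tukey's more widely used convention, which places the fences at Q1 - 1.5 IQR and Q3 + 1.5 IQR. Points beyond the whisker ends are flagged as outliers. The function names and the `rule` parameter are hypothetical, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, one of several quartile conventions.

```python
from statistics import median, quantiles

def whisker_ends(data, rule="slide"):
    """Return (lower, upper) whisker ends for a box and whisker plot."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if rule == "slide":
        # As stated on the slide: fences at Median -/+ 1.5 IQR.
        m = median(data)
        low_fence, high_fence = m - 1.5 * iqr, m + 1.5 * iqr
    elif rule == "tukey":
        # Tukey's convention: fences at Q1 - 1.5 IQR and Q3 + 1.5 IQR.
        low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    # Larger of (Minimum, lower fence); smaller of (Maximum, upper fence).
    return max(min(data), low_fence), min(max(data), high_fence)

def outliers(data, rule="slide"):
    """Observations lying beyond the whisker ends."""
    lower, upper = whisker_ends(data, rule)
    return [x for x in data if x < lower or x > upper]

if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 100]      # hypothetical data with one wild value
    print(whisker_ends(sample))        # (1, 7.25)
    print(outliers(sample))            # [100]
```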

  23. A Frequency Distribution

  24. Histogram for House Price Listings. A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings. (HOG, pp. 16-18)
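A histogram is a picture of a frequency distribution: the range of the data is cut into bins and the observations falling in each bin are counted. A minimal sketch with equal-width bins follows. The binning rule (equal widths, last bin closed on the right so the maximum is counted) and the function names are assumptions; real tools offer other binning choices.

```python
def frequency_distribution(data, num_bins):
    """Counts of observations in num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    if hi == lo:                      # all values identical: one bin holds everything
        return [lo, hi], [len(data)]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Bins are [lo, lo+width), [lo+width, lo+2*width), ...; the last bin also
        # includes hi, so the maximum is not pushed past the final edge.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

def text_histogram(data, num_bins):
    """One row per bin: left edge, then a bar of '*' whose length is the count."""
    edges, counts = frequency_distribution(data, num_bins)
    return "\n".join(f"{edges[i]:>10.1f} | {'*' * c}" for i, c in enumerate(counts))

if __name__ == "__main__":
    print(text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], 4))
```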

  25. Distribution of House Price Listings. Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.
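The asymmetry described here can also be quantified. One common measure, used in this sketch as an assumption since several variants exist, is the moment coefficient of skewness g1 = m3 / m2^(3/2), where m_k is the average k-th power of the deviations from the mean. A long upper tail, like the long top whisker, makes g1 positive and typically pulls the mean above the median. The function name is hypothetical.

```python
from statistics import mean, median

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    xbar = mean(data)
    n = len(data)
    m2 = sum((x - xbar) ** 2 for x in data) / n   # average squared deviation
    m3 = sum((x - xbar) ** 3 for x in data) / n   # average cubed deviation
    if m2 == 0:
        return 0.0                                # constant data: no asymmetry
    return m3 / m2 ** 1.5

if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]
    right_skewed = [1, 1, 1, 2, 10]   # hypothetical: one long upper tail
    print(skewness(symmetric))        # 0.0
    print(skewness(right_skewed) > 0, mean(right_skewed) > median(right_skewed))
```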

  26. A Caution About Graphical Data Summaries. Graphical tools can be very badly behaved when (1) the data have only a few observations, or (2) there are wild observations in the data set. In the example shown, the box and whisker plot is distorted (and dominated) by one wildly errant observation.
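The distortion can be seen numerically. In the hypothetical data below, one value is mistyped (16 entered as 1600). The median and IQR do not change, but the range grows from 6 to 1590. If the plot's axis spans exactly the minimum to the maximum (an assumption of this sketch), the box shrinks from half of the plot to a sliver. The names and numbers are illustrative, and quartiles use `statistics.quantiles` with `method="inclusive"`.

```python
from statistics import mean, median, quantiles

def describe(data):
    """Location and spread summaries, plus the box's share of the plotted range."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    rng = max(data) - min(data)
    # Assumes the plot axis runs exactly from min to max.
    share = iqr / rng if rng else float("nan")
    return {"mean": mean(data), "median": median(data), "iqr": iqr,
            "range": rng, "box_share_of_plot": share}

if __name__ == "__main__":
    clean = [10, 11, 12, 13, 14, 15, 16]
    with_error = [10, 11, 12, 13, 14, 15, 1600]   # hypothetical typo: 16 -> 1600
    for name, data in [("clean", clean), ("with error", with_error)]:
        print(name, describe(data))
```

The mean moves with the wild value while the median and IQR stay put, which is why the box itself survives but the plot's scale is dominated by one point.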

  27. Summary
     • What story does the data presentation tell?
     • Data in raw form tell no story.
     • Visual representation of data tells something about the data.
     • Data reduction and summary representation: what do we learn?
        • Location
        • Spread
        • Shape of the distribution
     • What tool is most informative?
        • Reduction to a small number of features
        • Visual displays of data: pie chart, box and whisker plots, histograms, time series plots
     “There are lies, damned lies and statistics.” (Benjamin Disraeli)

  28. The Visual Data Do Tell the Story: Napoleon’s March to Moscow

  29. Source: Bloomberg. August 2013

  30. Source: Bloomberg. August 2013

  31. Probability of Survival to Age 50, Female at Birth: U.S. and 20 Other Wealthy Countries
