
Statistics & Data Analysis



Presentation Transcript


  1. Statistics & Data Analysis Course Number B01.1305 Course Section 60 Meeting Time Monday 6-9:30 pm CLASS #1

  2. Class #1 Outline • Introduction to the instructor • Introduction to the class • Review of syllabus • Introduction to statistics • Class Goals • Types of data • Graphical and numerical methods for univariate series • Minitab Tutorial

  3. Professor Balkin’s Info • Ph.D. in Business Administration, Penn State • Masters in Statistics, Penn State • Mathematics/Economics and Music, Lafayette College • Employment • Pfizer Inc. • Management Science Group; Sept. 2001 – current • Ernst & Young • Quantitative Economics and Statistics Group; June 1999 – August 2001

  4. What is Statistics? • STATISTICS: A body of principles and methods for extracting useful information from data, for assessing the reliability of that information, for measuring and managing risk, and for making decisions in the face of uncertainty. • POPULATION: the set of measurements corresponding to the entire collection of units • SAMPLE: a set of measurements collected from a population • OBJECTIVES: • To make inferences about a population from a sample, including the extent of uncertainty • To design the data collection process to facilitate drawing valid inferences

  5. Reasons for Sampling • Typically due to the prohibitive cost of contacting millions of people or performing costly experiments • Election polls query about 2,000 voters to make inferences about how all voters cast their ballots • Sometimes the sampling process is destructive • Sampling wine quality

  6. Statistics in Everyday Life • Monthly Unemployment Rates (BLS) • Consumer Price Index • Presidential Approval Rating • Quality and Productivity Improvement • Scientific Inquiry • Training effectiveness • Advertising impact

  7. Interesting Statistical Perspectives • “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” • (H. G. Wells) • “There are three kinds of lies: lies, damned lies, and statistics.” • (Benjamin Disraeli) • “You’ve got to know when to hold ’em, know when to fold ’em.” • (Kenny Rogers, in The Gambler) • “The average U. S. household has 2.75 people in it.” • (U. S. Census Bureau, 1980) • “4 out of 5 dentists surveyed recommended Trident Sugarless Gum for their patients who chew gum.” • (Advertisement for Trident)

  8. Semester Overview • Understanding data • Intro to descriptive statistics, interpreting data, and graphical methods • Dealing with and quantifying uncertainty • Random variables and probability • Using samples to make generalizations about populations • Assessing whether a change in data is beyond random variation • Modeling relationships and predicting • Using sample data to create models that give predictions for all values of a population

  9. Goals for this Class • To gain an understanding of descriptive statistics, probability, statistical inference, and regression analysis so that you can apply them in your job • To be able to identify when statistical procedures are required to support your business decision making • To be able to identify both good and poor uses of statistics in business

  10. Goals for Me • To teach you statistics and data analysis effectively • To improve my effectiveness as an instructor

  11. My Promise To You • I will not teach you anything in this class that is not regularly used in business and industry • If you ask, “Where is this used?” I will have a real example for you

  12. Types of Data

  13. Example: Data Types • Business Horizons (1993) conducted a comprehensive survey of 800 CEOs who run the country's largest global corporations. Some of the variables measured are given below. Classify them as quantitative or qualitative. • State of birth • Age • Educational Level • Tenure with Firm • Total Compensation • Area of Expertise • Gender

  14. How Much Data

  15. CHAPTER 2 Summarizing Data about One Variable

  16. Introduction • An unorganized mass of numbers is difficult to interpret • The first task in understanding data is summarizing it • Graphically • Numerically

  17. Chapter Goals • Distinguish between qualitative and quantitative variables • Learn graphic representations of univariate data • Learn numerical representations of univariate data • Investigate data acquired over time

  18. Distribution of Values • A distribution describes how often each possible data value occurs in a set of data. • Methods for displaying distributions • Qualitative data • Frequency table • Bar charts • Quantitative data • Histograms • Stem-Leaf diagrams • Boxplots

  19. Example: Qualitative Data • Background: A question on a market research survey asked 17 respondents the size of their households • Data: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,6 • Frequency Table
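A frequency table just counts how many times each value occurs. The sketch below is a minimal illustration using Python's standard `collections.Counter`; the function name `frequency_table` is hypothetical and not part of the course materials, and adding relative frequencies is an illustrative extra.

```python
from collections import Counter

# Household sizes reported by the 17 survey respondents (from the slide)
household_sizes = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 6]

def frequency_table(values):
    """Return (value, count, relative frequency) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]

print(f"{'Size':>4} {'Count':>6} {'Rel. freq':>10}")
for value, count, rel in frequency_table(household_sizes):
    print(f"{value:>4} {count:>6} {rel:>10.3f}")
```

Only observed values appear, so household size 5 (no respondents) has no row.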

  20. Example: Qualitative Data (cont.) • Bar chart: plot of how often each category occurs in the data set
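Without a plotting package, a bar chart can be approximated in text: one bar per category, with bar length equal to the frequency. This is only a sketch; the helper `text_bar_chart` and the `#` symbol are illustrative choices, not anything from the slides.

```python
from collections import Counter

household_sizes = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 6]

def text_bar_chart(values, symbol="#"):
    """Build one line per category; the bar length equals that category's frequency."""
    counts = Counter(values)
    return [f"{category} | {symbol * counts[category]}" for category in sorted(counts)]

for line in text_bar_chart(household_sizes):
    print(line)
```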

  21. Example: Quantitative Data • Background: Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than $5 million and less than $350 million. Firms were ranked by five-year average return on investment. The data are the annual salaries of the chief executive officers of the first 60 ranked firms. • Data (in $ thousands): 145 621 262 208 362 424 339 736 291 58 498 643 390 332 750 368 659 234 396 300 343 536 543 217 298 1103 406 254 862 204 206 250 21 298 350 800 726 370 536 291 808 543 149 350 242 198 213 296 317 482 155 802 200 282 573 388 250 396 572

  22. Example: Quantitative Data (cont.) • Histograms are constructed in the same way as bar charts except: • User must create classes to count frequencies • Bars are adjacent instead of separated with space
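The key extra step for a histogram is choosing classes. A minimal sketch, assuming adjacent classes of width 100 (in $ thousands) starting at 0; the width is an arbitrary illustrative choice, and `histogram_counts` is a hypothetical helper. Note that the salary list as transcribed on the slide contains 59 values.

```python
# CEO salaries in $ thousands, as listed on the slide (59 values)
salaries = [145, 621, 262, 208, 362, 424, 339, 736, 291, 58, 498, 643, 390, 332, 750,
            368, 659, 234, 396, 300, 343, 536, 543, 217, 298, 1103, 406, 254, 862, 204,
            206, 250, 21, 298, 350, 800, 726, 370, 536, 291, 808, 543, 149, 350, 242,
            198, 213, 296, 317, 482, 155, 802, 200, 282, 573, 388, 250, 396, 572]

def histogram_counts(data, width, start=0):
    """Count observations in adjacent classes [start + k*width, start + (k+1)*width)."""
    n_classes = (max(data) - start) // width + 1
    counts = [0] * n_classes
    for x in data:
        counts[(x - start) // width] += 1
    return counts

WIDTH = 100  # class width in $ thousands (illustrative choice)
counts = histogram_counts(salaries, WIDTH)
for k, c in enumerate(counts):
    low = k * WIDTH
    print(f"{low:>5}-{low + WIDTH - 1:<5} {'*' * c}")
```

Empty classes (here 900-999 and 1000-1099) are kept, so the bars stay adjacent as a histogram requires.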

  23. Example: Quantitative Data (cont.)

  24. Example: Quantitative Data (cont.) • Questions: • What is the typical value of CEO salary? • How much variability is there around this value? • What is the general shape of the data? • Histogram characteristics: • Central tendency • Variability • Skewness • Modality • Outliers

  25. Skewness

  26. Modality

  27. Outliers

  28. Example: Stem-Leaf Diagram • Background: A telecom company wants to analyze the time, measured in hours, needed to complete new service orders • Data: 42 21 46 69 87 29 34 59 81 97 64 60 87 81 69 77 75 47 73 82 91 74 70 65 86 87 67 69 49 57 55 68 74 66 81 90 75 82 37 94 • Diagram (stem = tens digit, leaf = units digit):
      2 | 19
      3 | 47
      4 | 2679
      5 | 579
      6 | 045678999
      7 | 0344557
      8 | 111226777
      9 | 0147
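The stem-leaf diagram can be rebuilt mechanically: split each value into a tens digit (stem) and a units digit (leaf), then list the sorted leaves next to each stem. A minimal sketch; the function name `stem_and_leaf` is hypothetical and the layout only mimics the slide.

```python
from collections import defaultdict

# Service-order completion times in hours (from the slide)
times = [42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77, 75, 47, 73, 82,
         91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68, 74, 66, 81, 90, 75, 82, 37, 94]

def stem_and_leaf(data):
    """Stem = tens digit, leaf = units digit; leaves appear in ascending order."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    low, high = min(leaves), max(leaves)
    # Every stem in the range gets a row, even if it has no leaves
    return [f"{stem} | {''.join(leaves[stem])}" for stem in range(low, high + 1)]

for row in stem_and_leaf(times):
    print(row)
```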

  29. Measures of Central Tendency • Mode: Value or category that occurs most frequently • Median: Middle value when the data are sorted • Mean: Sum of measurements divided by the number of measurements

  30. Example: Mode • Background: A question on a market research survey asked 17 respondents the size of their households • Data: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,6 • Frequency table • Mode = 3 (the most frequent household size, reported by 6 of the 17 respondents)
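Finding the mode amounts to reading the largest count off the frequency table. This sketch returns every value tied for the top count, since a data set can have more than one mode; `modes` is a hypothetical helper, and `statistics.multimode` (Python 3.8+) is the standard-library equivalent.

```python
import statistics
from collections import Counter

household_sizes = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 6]

def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes(household_sizes))                 # [3]
print(statistics.multimode(household_sizes))  # [3]
```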

  31. Example: Median • Background: A question on a market research survey asked 17 respondents the size of their households • Data: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,6 • Since there are n = 17 observations, • the median is the (n+1)/2 = 9th observation in sorted order, which is 3

  32. Example: Mean • Background: A cable company wants to know how long an installer spends at each stop. One employee performed five installations in one day and recorded how many minutes she was at each location. • Data: 45, 23, 36, 29, 52 • Mean = (45+23+36+29+52) / 5 = 37 minutes
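The mean calculation on the slide translates directly into code. This is a one-line sketch; the name `mean` is illustrative, and `statistics.mean` would give the same result.

```python
install_minutes = [45, 23, 36, 29, 52]

def mean(values):
    """Sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)

print(mean(install_minutes))  # 37.0
```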

  33. Example: Back to the CEO Salaries • Mean = 404.1695 • Median = 350 • WHY THE DIFFERENCE?
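The slide's two figures can be reproduced from the salary list as transcribed (59 values). This is a sketch using Python's `statistics` module. The few very large salaries lie in a long right tail and pull the mean up, while the median depends only on the middle of the sorted data.

```python
import statistics

# CEO salaries in $ thousands, as listed on the slide (59 values)
salaries = [145, 621, 262, 208, 362, 424, 339, 736, 291, 58, 498, 643, 390, 332, 750,
            368, 659, 234, 396, 300, 343, 536, 543, 217, 298, 1103, 406, 254, 862, 204,
            206, 250, 21, 298, 350, 800, 726, 370, 536, 291, 808, 543, 149, 350, 242,
            198, 213, 296, 317, 482, 155, 802, 200, 282, 573, 388, 250, 396, 572]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
print(f"n = {len(salaries)}, mean = {mean_salary:.4f}, median = {median_salary}")
# The largest values sit far above the median and drag the mean upward
print("three largest:", sorted(salaries)[-3:])
```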

  34. Measures of Variation • Variability is a primary reason for using statistics • If there were no variability, we would not need statistics • Examples: • Worker productivity • Stock market • Promotional expenditures • Measures: • Standard deviation: variation around the mean • Range: distance between the smallest and largest observations
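The range is the simplest of these measures. This sketch applies it to the installer times from the mean example. The helper is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
install_minutes = [45, 23, 36, 29, 52]

def data_range(values):
    """Distance between the smallest and largest observations."""
    return max(values) - min(values)

print(data_range(install_minutes))  # 29
```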

  35. Standard Deviation • Standard deviation: summarizes how far from the mean the data values typically are • Calculation: • Find the deviations by subtracting the mean from each data value • Square these deviations, add them up, and divide by n-1 • Take the square root of this number
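The three calculation steps map one-to-one onto code. This sketch reuses the installer times from the mean example (an illustrative choice) and checks the result against `statistics.stdev`, which also divides by n-1.

```python
import math
import statistics

install_minutes = [45, 23, 36, 29, 52]

def sample_std(values):
    """Sample standard deviation following the slide's three steps."""
    n = len(values)
    m = sum(values) / n
    deviations = [x - m for x in values]                  # step 1: subtract the mean
    variance = sum(d * d for d in deviations) / (n - 1)  # step 2: square, sum, divide by n-1
    return math.sqrt(variance)                            # step 3: square root

s = sample_std(install_minutes)
print(round(s, 4), round(statistics.stdev(install_minutes), 4))  # both about 11.726
```

Here the squared deviations sum to 550, so s = sqrt(550 / 4) = sqrt(137.5).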

  36. Example: Standard Deviation • Background: Your firm spends $19 Million per year on advertising, and management is wondering if that figure is appropriate. Other firms in your industry have a mean advertising expenditure of $22.3 Million per year.

  37. Example: Standard Deviation (cont.)

  38. Example: Standard Deviation (cont.) • Difference from peer group average is $3.3 Million • This difference is smaller than the industry standard deviation of $9.18 Million • Conclusion: Your advertising budget, while slightly below the industry average, is typical compared with your industry peers
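The comparison above divides the gap from the peer mean by the industry standard deviation. The result is commonly called a z-score, although the slide does not use that term. This is a small sketch using the slide's figures; a magnitude well below 1 supports the "typical" conclusion.

```python
# Figures from the slide, in $ millions
own_spend = 19.0
industry_mean = 22.3
industry_sd = 9.18

difference = own_spend - industry_mean  # about -3.3
z = difference / industry_sd            # gap measured in standard deviations
print(f"difference = {difference:.1f}, z = {z:.2f}")  # difference = -3.3, z = -0.36
```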

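  39. Empirical Rule • If the histogram for a given sample is unimodal and symmetric (mound-shaped), then the following rule of thumb may be applied: • Let $\bar{x}$ represent the sample mean and $s$ the sample standard deviation. Then approximately 68% of the observations fall within $\bar{x} \pm s$, approximately 95% within $\bar{x} \pm 2s$, and almost all (about 99.7%) within $\bar{x} \pm 3s$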

  40. Example: Stock Market Volatility • Description: Stock market returns are supposed to be unpredictable. Let’s see whether the empirical rule holds • Data: S&P-500 daily returns; Jan 01, 1998 – May 17, 2002 • Mean = 0.0002 • St. Dev. = 0.0128 • 72.8% (95.3%) of the returns fall between the sample mean plus and minus one (two) st. dev.
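This kind of check can be run on any data set by counting the share of observations inside $\bar{x} \pm ks$. The sketch below is minimal. The S&P-500 series is not reproduced here, so `daily_returns` appears only as a hypothetical, commented-out placeholder, and a tiny toy list stands in for the demonstration.

```python
import statistics

def fraction_within(data, k):
    """Share of observations inside mean +/- k sample standard deviations."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(m - k * s <= x <= m + k * s for x in data) / len(data)

# Hypothetical usage with real returns (compare with about 0.68 and 0.95):
# daily_returns = [...]
# print(fraction_within(daily_returns, 1), fraction_within(daily_returns, 2))

print(fraction_within([1, 2, 3, 4, 5], 1))  # 0.6
```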

  41. Inter-Quartile Range • The inter-quartile range (IQR) provides an alternative approach to measuring variability • Computation: • Sort the data and find the median • Divide the data into bottom and top halves • Find the median of each half. These are the 25th and 75th percentiles • IQR = 75th percentile – 25th percentile • Outlier measure: any value outside the inner fences is an outlier candidate • Lower inner fence = 25th percentile – 1.5 IQR • Upper inner fence = 75th percentile + 1.5 IQR
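The computation above can be applied to the service-order times from the stem-leaf example. This sketch assumes that, for odd n, the overall median is left out of both halves. Software such as Minitab may use a different interpolation rule, so its quartiles can differ slightly. The helper names are hypothetical.

```python
import statistics

# Service-order completion times in hours (from the stem-leaf slide)
times = [42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77, 75, 47, 73, 82,
         91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68, 74, 66, 81, 90, 75, 82, 37, 94]

def quartiles(data):
    """25th and 75th percentiles as the medians of the bottom and top halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2                   # for odd n, the middle value is in neither half
    lower, upper = ordered[:half], ordered[n - half:]
    return statistics.median(lower), statistics.median(upper)

def inner_fences(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

q1, q3 = quartiles(times)
low, high = inner_fences(times)
candidates = [x for x in times if x < low or x > high]
print(q1, q3, q3 - q1, low, high, candidates)  # 58.0 81.5 23.5 22.75 116.75 [21]
```

With this convention the 21-hour order falls just below the lower inner fence and is flagged as an outlier candidate.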

  42. Box-Plot – S&P-500 Example • Data: S&P-500 daily returns; Jan 01, 1998 – May 17, 2002 • [Figure: box plot marking the upper inner fence, outliers, 75th percentile, median, 25th percentile, and lower inner fence]

  43. Minitab Tutorial

  44. Why Use Minitab? • The goal of the course is to learn statistical concepts • Most statistical analyses are performed using computers • Each company may use a different statistical package • Yes, Minitab is used in business! • Typically in quality control and design of experiments • Excel has very limited statistical functionality and is considerably more difficult to use than Minitab • There are many statistical packages (SAS, SPSS, Systat, S-PLUS, R, Statistica, Mathematica, etc.) • Minitab is the easiest program to use right away • Excellent help facilities • Built-in statistical glossary

  45. Minitab Tutorial – Case Study 1 • A hotel kept records over time of the reasons why guests requested room changes. The frequencies were as follows: • Room not clean 2 • Plumbing not working 1 • Wrong type of bed 13 • Noisy location 4 • Wanted nonsmoking 18 • Didn’t like view 1 • Not properly equipped 8 • Other 6
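Before charting these counts in Minitab, it can help to tabulate them. This sketch sorts the reasons from most to least frequent and adds percentages and cumulative percentages. The Pareto-style ordering is an illustrative choice common in quality control, not something the case study prescribes.

```python
# Room-change reasons and frequencies from the hotel case study
reasons = {
    "Room not clean": 2,
    "Plumbing not working": 1,
    "Wrong type of bed": 13,
    "Noisy location": 4,
    "Wanted nonsmoking": 18,
    "Didn't like view": 1,
    "Not properly equipped": 8,
    "Other": 6,
}

total = sum(reasons.values())
cumulative = 0
table = []
# Most frequent first; sorted() is stable, so ties keep their original order
for reason, count in sorted(reasons.items(), key=lambda item: item[1], reverse=True):
    cumulative += count
    table.append((reason, count, 100 * count / total, 100 * cumulative / total))

for reason, count, pct, cum_pct in table:
    print(f"{reason:<22} {count:>3} {pct:6.1f}% {cum_pct:6.1f}%")
```

The top two reasons (nonsmoking room, bed type) account for 31 of the 53 requests.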

  46. Minitab Tutorial – Case Study 2 • Exercise 2.8 in book • Produce graphics • Produce descriptive statistics

  47. Minitab Tutorial – Case Study 3 • Diversification? • Data: S&P-500 and IBM daily returns from Jan 01, 1998 through May 17, 2002

  48. Next Time • Probability and Probability Distributions
