
Basic Statistics

Basic Statistics. What is Statistics? Shilling - “Statistics is communicating information from data.” H.G. Wells - “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”


Presentation Transcript


  1. Basic Statistics

  2. What is Statistics? Shilling - “Statistics is communicating information from data.” H.G. Wells - “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” Statistics is a tool to organize data into meaningful information for understanding the past, making informed decisions and predicting the future.

  3. Game Show Exercise. This exercise is based on the game show “Let’s Make a Deal”. The purpose of this exercise is to analyze the class’s decision-making patterns and then relate those patterns to probability. Read the following background for the exercise: This exercise is based on actual events; in fact, there was much controversy about it after it was published in Parade Magazine’s “Ask Marilyn” column. The game show “Let’s Make a Deal” has done this very exercise with its guests hundreds of times. In the game show, a person from the audience (you) is given an opportunity to win a valuable prize. To win the prize, you must select one of three doors. Behind each door is a prize. Only one of the doors has a valuable prize behind it, and only the game show host knows which door that is. The other two doors have booby prizes. After you have made your selection of door 1, 2 or 3, the game show host will give you an option to change your decision. To do so, the host will open one of the remaining two doors, and the door he opens will never have the valuable prize behind it. The host will now ask you if you want to stay with your original selection or switch to the other remaining unopened door. Task 1: Decide if you would stay with your original selection or switch to the remaining unopened door. You will have just 10 seconds to decide (the TV show was on air, and broadcast time is valuable). Check the appropriate decision box below. _____ Stay with my original selection _____ Change my mind and switch to the remaining door. Task 2: The instructor will poll the class and discuss the data.

  4. Game Show Exercise Discussion: Should you change? Does it matter? If you change, will it improve your chance to win the car? How would you analyze this situation using data to determine the best choice?

  5. Analyzing the Possibilities. [Figure: for each of three scenarios (I, II and III) with doors 1, 2 and 3, the WIN or LOSE outcome under “Do Not Change” versus “Change”.] When intuition leads us down the wrong path, the use of statistical tools and data can set us straight!
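
One way to analyze the game with data is to simulate it many times. The following is a minimal Python sketch, not part of the original slides. It assumes the host always opens a non-prize door other than the contestant's pick, choosing at random when two such doors are available. The function names `play` and `win_rate`, the seed and the number of rounds are illustrative choices.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant wins the valuable prize."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch: bool, rounds: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated rounds won under the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

Under these assumptions, staying wins about one time in three and switching wins about two times in three, which is why switching is the better choice.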

  6. Ask Marilyn – Parade Magazine Feb. 1991 “You are utterly incorrect about the game-show question, and I hope this controversy will call some public attention to the serious national crisis in mathematical education. If you can admit your error, you will have contributed constructively toward the solution of a deplorable situation. How many irate mathematicians are needed to get you to change your mind?” E. Ray Bobo, Ph.D., Georgetown University “You are in error - and you have ignored good counsel - but Albert Einstein earned a dearer place in the hearts of the people after he admitted his errors.” Frank Rose, Ph.D., University of Michigan “Your logic is in error, and I am sure you will receive many letters on this topic from high school and college students. Perhaps you should keep a few addresses for help with future columns.” W. Robert Smith, Ph.D., Georgia State University

  7. Ask Marilyn – Parade Magazine Feb. 1991 “Maybe women look at math problems differently than men.” Don Edwards, Sunriver, Or. “You’re wrong, but look on the positive side. If all those Ph.D.’s were wrong, the country would be in very serious trouble.” Everett Harman, Ph.D., U.S. Army Research Institute “You are indeed correct. My colleagues at work had a ball with this problem, and I dare say that most of them - including me at first - thought you were wrong!” Seth Kaleon, Ph.D., Massachusetts Institute of Technology

  8. Levels of Analysis
     1. Using previous experience. A decision is made based on the similarity of a previous event or occurrence.
     2. Collecting data and then looking at the numbers. Data is collected, looked at and “sized” based on the apparent patterns, and a decision is made.
     3. Grouping data so as to form charts and graphs. Generally the data is put into an Excel spreadsheet, graphs are looked at and a decision is made.
     4. Using census data with descriptive statistics. Data is collected on 100% of the items under review, and statistics like average, minimum and maximum are used to make decisions.
     5. Using sample data with descriptive statistics. Data is collected on a subset of the items under review, and statistics like average, minimum and maximum are used to make decisions.
     6. Using sample data with inferential statistics. Data is collected on a subset of the items under review, and the analysis allows inferences about larger populations with known risks.

  9. Exercise - Usage of Data. The purpose of this exercise is to estimate how often your company uses each of the levels of analysis to make decisions and to run the business. Be as honest as you can; we will put all of the data from the class together and present it back to you immediately upon completion of the exercise. Read the following background for the exercise: Most companies have not thought about the typical level of data analysis they use. You and your colleagues in this class will form a “sample” of the organizations you are a part of. This exercise will be used to establish a reference that you can build from in your organizational pursuit of improvement. Task 1: Estimate to the best of your ability what percentage of the time your company tends to use each level of analysis. Your total should add up to 100%. Task 2: The instructor will then collect each individual’s estimates and sum them together using an Excel spreadsheet tool to show the results.

  10. Exercise – Usage of Data: Levels of Analysis % Use
     Instructions: Estimate from your personal experience the percent of time your organization uses these various levels of data analysis to either make decisions or to manage process performance. Try to allocate among the six levels to obtain a total of 100%. We will collect the data from the class and make a Histogram.
     1. We use previous experience
     2. We collect data and then look at the numbers
     3. We group data so as to form charts and graphs
     4. We use census data with descriptive statistics
     5. We use sample data with descriptive statistics
     6. We use sample data with inferential statistics
     Total: 100%

  11. Knowledge is in the Data. Lord Kelvin (William Thomson), 1824 – 1907, Scottish mathematician and physicist who contributed to many branches of physics: “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” As the level of analysis increases, so does our success.

  12. Types of Data
     Attribute (Qualitative) Data
       • Yes, no
       • Go, no go
       • Acceptable, unacceptable
       • Pass, fail
     Continuous (Quantitative) Data
       • Discrete (count) Data - patients, bottles of medicine, late deliveries, system lock-ups
       • Continuous Data - dimension, volume, time (decimal subdivisions are meaningful)

  13. Understanding Variation. Each day you aim to arrive at work at 8:00 AM, but do you always arrive at exactly 8:00 AM? By measuring the arrival times, we notice that they are not exactly the same.

  14. Visualizing Variation - Histogram. The histogram represents the behavior of the variable of interest for every time it was measured. [Figure: histogram of data measured for arrival time at work; horizontal axis from 7:52 to 8:10 in 3-minute steps, vertical axis Frequency (Number of Occurrences) from 0 to 6.] Histograms can be drawn for any variable, such as volumes in ml, errors per request, time in hours, etc. The Normal Distribution: the shape that forms for many types of data we deal with daily.
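
A histogram like the one on this slide can be tallied in a few lines of code. The sketch below is illustrative only. The arrival times are hypothetical values chosen to land on the slide's axis labels (7:52 to 8:10), since the measured data is not in the transcript, and the helper name `label` is an assumption.

```python
from collections import Counter

# Hypothetical arrival times, in minutes relative to 8:00 AM (negative = early).
arrival_offsets = [-8, -5, -5, -2, -2, -2, 1, 1, 1, 1, 1, 4, 4, 4, 7, 7, 10]


def label(offset: int) -> str:
    """Convert an offset from 8:00 into a clock label such as '7:52'."""
    total = 8 * 60 + offset
    return f"{total // 60}:{total % 60:02d}"


# Count how often each arrival time occurred, then print one row of X's per time.
counts = Counter(arrival_offsets)
for offset in sorted(counts):
    print(f"{label(offset):>5} | {'X' * counts[offset]}")
```

The tallest row sits near the target time and the rows shrink on either side, which is the bell-like shape the slide describes.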

  15. Variation and its Source. Each time you call a company for customer service, you get a different level of service and you experience a different level of satisfaction. Each time you hit a golf ball, it goes a different direction and distance. Some golfers are more accurate than others, but no one is perfect. Each time you provide a product or a service to your customer, it varies. If it varies too much, your customer will not accept it.

  16. Variation is Everywhere and Affects Everything. You should always construct a histogram when working with data, either mentally or on paper. [Figure: histogram of arrival times (7:52 to 8:10) against Frequency (0 to 6), surrounded by the labels Color, Time, Weight, Height, Length, Width, Shape, Speed and Temperature.]

  17. Variation and its Source. SO WHY DOES EVERY OUTPUT VARY? Because all the inputs, the X’s, vary. Remember, the output Y is a function of the input X’s. TIP: To control the variation in an outcome you are interested in, you will have to control the variation of the inputs that affect it.

  18. Six Sigma View of a Process: Y = f(X). Process: “A Blending of Inputs to Achieve Some Desired Output/Result”. The X’s (Inputs): X1, X2, X3, X4, X5 (Materials, People, Equipment, etc.). The Y’s (Outputs): Y1, Y2, Y3 (the things you measure as an indication of the success of the process). Critical X - Any input variable that exerts an undue influence on one or more of the important outputs of a process. CTQ = Critical to Quality - Any output variable of a process which exerts an undue influence on the success of the process or customer needs. Outputs that do not meet requirements create defects and generate additional costs called “Cost of Poor Quality” or COPQ.

  19. Pictorial Representation of Y = f(X) + e. [Figure: the Xs feed “The Process: The Blending of Inputs”, which produces the Ys (Process Outputs: things that we or the customer want); Y is a “function of” X. Input groups shown: Controllable Inputs, Noise Inputs and Unknown Inputs. Example inputs shown: Process Temperature, Information or Data, Time, Room Temperature, Humidity, People, Machines. The term e is labeled Variation – Signal – Noise – Error.]

  20. Averages and Variation. [Figures: the first three measurements for filling out a purchase order form; 100 measurements of elapsed time for filling out a purchase order form.]

  21. Averages and Variation. [Figures: the first three measurements for filling out a purchase order form; 100 measurements of elapsed time for filling out a purchase order form.]

  22. Calculating the Average. Mathematically, the process for calculating the mean is written as $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $\bar{X}$ (pronounced “x bar”) is the symbol representing the calculated average; $X_i$ represents each of the individual measurement values; the Greek letter $\Sigma$ (sigma) tells you to sum (add) all the individual measurements; and $n$ is the number of individual measurements (100 in this example) in your data set.
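
The calculation described here (add up all the individual measurements, then divide by how many there are) can be sketched in Python. The five values below are hypothetical, because the 100 measured times are not included in the transcript. They were chosen so that the result matches the 50-second average quoted on the next slide.

```python
# Hypothetical elapsed times in seconds (the slide's 100 values are not available).
times = [44, 47, 50, 53, 56]

n = len(times)          # number of individual measurements
x_bar = sum(times) / n  # sum of all measurements divided by n
print(x_bar)  # 50.0
```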

  23. Calculating the Average. [Figure: 100 measurements of elapsed time for filling out a purchase order form.] Average = 50 seconds.

  24. Variation and Average. We would experience the average 15 times out of 100, or 15% of the time. We would experience some other value 85 times out of 100, or 85% of the time. You and your customers rarely feel the average; most often you feel the variation.

  25. Variation and Average. Scenario 1: 25 homes are in the neighborhood. The average price is $125,000. Scenario 2: the same 25 homes are in the neighborhood, plus 5 additional homes valued at $400,000 each. The average price is now $170,000. GREAT!!

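  26. Variation and Average. [Figure: dot plot of home prices on an axis from $100,000 to $400,000 in $50,000 steps. Scenario 1: 25 homes in a symmetric cluster (column counts 1, 3, 5, 7, 5, 3, 1) with an average of $125,000. Scenario 2: the same cluster plus 5 homes at $400,000, which moves the average to $170,000.]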
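
The jump in the average between the two scenarios is simple arithmetic, sketched below in Python using only the figures given on the slides. The variable names are illustrative.

```python
homes_before = 25
avg_before = 125_000      # average price in Scenario 1
new_homes = 5
new_home_price = 400_000  # each additional home in Scenario 2

# Total value of all 30 homes, then the new average.
total_value = homes_before * avg_before + new_homes * new_home_price
avg_after = total_value / (homes_before + new_homes)
print(f"${avg_after:,.0f}")  # $170,833
```

The exact figure is about $170,833, which the slides round to $170,000. None of the original 25 homes changed in value; five extreme values were enough to move the average by more than $45,000.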

  27. Variation and Average. “The Rio Grande, on average, is only 4 feet deep. So let’s wade across!” Well, the variation is from 1 inch to 20 feet! Could you have a problem? A salesperson arrives at work, on average, at 8:00 AM. Some mornings he is there as early as 7:30, others as late as 8:30. Customers needing his support sometimes reach him in the morning just after 8:00; other times they get his voicemail. Some customers call the competition! You will get paid, on average, once every two weeks. But sometimes the check is three weeks late. Would this be a problem?

  28. Example - Averages and Variation. You tell your customers that your delivery time is two days, on average. You have just set a customer expectation for a two-day delivery. Because of the variation in your delivery process, 20 percent of the time packages take more than two days to deliver. How do you tell the one in five customers who are unhappy that they are just victims of the averages?

  29. Common Statistical Metrics
     Average (Mean, μ, Xbar) - The arithmetic average of a set of values. Uses the quantitative value of each data point. Is strongly influenced by extreme values.
     Median (M) - The number that reflects the 50% mark of a set of values. Can be easily identified as the center number after all values are sorted from high to low. Is largely unaffected by extreme values.
     Mode - The value that appears most frequently.
     Range (R) - The spread of the data from lowest to highest, calculated by subtracting the minimum value from the maximum value.
     Sigma - A value that measures the amount of variation in a population.
     For a Normal Distribution, the Mean, Median and Mode are the same.
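
These metrics can be computed with Python's standard `statistics` module. The sketch below uses a hypothetical data set with one extreme value (not from the slides) to show how the mean is pulled toward the extreme while the median is not. Treating Sigma as the population standard deviation (`pstdev`) is an assumption, since the slide does not give a formula.

```python
import statistics

# Hypothetical data set with one extreme value (not from the slides).
values = [4, 5, 5, 6, 7, 8, 35]

mean = statistics.mean(values)           # arithmetic average
median = statistics.median(values)       # middle value after sorting
mode = statistics.mode(values)           # most frequent value
value_range = max(values) - min(values)  # maximum minus minimum
sigma = statistics.pstdev(values)        # assumed: population standard deviation

print(f"mean={mean} median={median} mode={mode} range={value_range}")
print(f"sigma={sigma:.2f}")
```

Here the single value of 35 lifts the mean to 10, while the median stays at 6 and the mode at 5.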

  30. Usage of Data Exercise. The purpose of this exercise is to demonstrate how easy it is to collect a data set and turn it into a Histogram. Read the following background for the exercise: Not everyone is the same height, not even in this class. As a class we will plot a Histogram on a flipchart for the distribution of everyone’s height in this classroom. Data will be gathered from the class for the purpose of creating a histogram. Use the post-it notes supplied to record your height. Task 1: Write your height in inches, rounded to the nearest inch. Task 2: Pass the note to the instructor. Task 3: The instructor will have one of the students read the values. Task 4: The instructor will generate a Histogram of the data.

  31. Exercise – Developing a Histogram. Observations: Is there variation in the data? What is the Range (max value – min value)? What is the average height (sum of all heights / number of students)? Does the data create a specific shape (which distribution)? Is it symmetrical or is it skewed in one direction? What is the Mode?

  32. Understanding Probability. Probability is the likelihood of an event occurring in the future: the weatherman predicts an 80% chance of rain tomorrow. Probabilities come from facts (data) and statistics: wind direction, altitude and velocity; temperature, humidity and barometric pressure. Probabilities can be used to predict the outcome of single events or combinations of events: the probability that it will rain (P1), that it will rain 2 or more inches (P2) and that the temperature will be 73 degrees (P3).

  33. Understanding Probability. Other examples of probability: I flip a coin and pick heads as my choice. What are the chances of getting a head? One in two, or 50%. I buy a lottery ticket. There is one winner and 1,000 tickets are sold. What are my chances of winning? One in one thousand, or 0.001 (0.1%). Over a 6-month period you discover that consistently 20 percent of the products you have shipped to customers are defective. Your best performance has been 18% defective and your worst has been 22.5% defective. If you take no action to improve, what is the probability that the next 2 months of shipments will average between 15% and 25% defective? Essentially 100%; more precisely, 99.99…%.

  34. Understanding Probability. The sum of all probabilities always equals 100%. This makes statistical problems easier to solve because one side of the equation is always known. Certainty + Uncertainty = 100%; Known + Unknown = 100%; Belief + Disbelief = 100%; Confidence + Risk = 100%; Yield + Defect Rate = 100%. Remember: the sum of probabilities is always 100%.

  35. Understanding the Die Probabilities. Questions: What’s the probability of rolling a 1? What’s the probability of rolling a 2? What’s the probability of rolling a 6? What’s the sum of the six probabilities of rolling a 1, a 2, a 3, a 4, a 5, a 6? Remember: the probabilities of all possible outcomes always sum to 100%.

  36. Understanding the Dice Probabilities. To understand the probabilities of all the combinations in the roll of two dice, you simply have to construct a matrix of all the combinations. Each cell is the sum of the value from Die 1 (across) and the value from Die 2 (down):
     Die 2 \ Die 1   1   2   3   4   5   6
                 1   2   3   4   5   6   7
                 2   3   4   5   6   7   8
                 3   4   5   6   7   8   9
                 4   5   6   7   8   9  10
                 5   6   7   8   9  10  11
                 6   7   8   9  10  11  12
     Note: There are 36 possible combinations in the roll of two dice with six numbers on each die.

  37. Understanding the Dice Probabilities. What’s the probability of rolling an 8 with 2 dice? Using the matrix of two-dice sums, we see there are 5 combinations that add up to a total of 8: (Die 1, Die 2) = (2, 6), (3, 5), (4, 4), (5, 3) and (6, 2).

  38. Understanding the Dice Probabilities. What’s the probability of rolling an 8 with 2 dice? Each of the 36 cells in the matrix has a 1/36 chance, so the 5 combinations that total 8 give 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 5/36 ≈ 0.139, or 13.9%.

  39. Understanding the Dice Probabilities. What’s the probability of rolling a 7 with 2 dice? Six combinations in the matrix total 7: (Die 1, Die 2) = (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and (6, 1). 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 6/36 ≈ 0.167, or 16.7%.
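
The matrix counting used on these slides can be checked by enumerating all 36 combinations in code. This is a minimal Python sketch, assuming two fair six-sided dice. The function name `prob_of_sum` is illustrative.

```python
from fractions import Fraction
from itertools import product


def prob_of_sum(target: int) -> Fraction:
    """Exact probability that two fair six-sided dice add up to target."""
    rolls = list(product(range(1, 7), repeat=2))  # all 36 (die 1, die 2) pairs
    hits = sum(1 for a, b in rolls if a + b == target)
    return Fraction(hits, len(rolls))


p8 = prob_of_sum(8)
p7 = prob_of_sum(7)
print(p8, round(float(p8), 3))  # 5/36 0.139
print(p7, round(float(p7), 3))  # 1/6 0.167
```

`Fraction` keeps the results exact, so 6/36 is reported in lowest terms as 1/6.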

  40. Exercise – Meeting Customer Expectations. The purpose of this exercise is to demonstrate how a simple process will generate data and to demonstrate what probability is. Read the following background for the exercise: This exercise involves the process of rolling two dice. Each die has six sides numbered 1 through 6. After rolling the dice a number of times, the output (Y) will be a range of numbers between 2 and 12. This output is called the VOP (Voice of the Process). The process has a customer who will only accept outcomes between 3 and 11. They will not accept a 2 or a 12. This is called the VOC (Voice of the Customer). The lower limit of 3 is called the LSL (Lower Spec Limit) and the upper limit of 11 is called the USL (Upper Spec Limit). Task 1: Break into teams of two. One person will roll the dice 50 times while the other records the data on the following page. The team members will then switch jobs and repeat the process. Task 2: The data will be recorded directly into a histogram. Each time a number is thrown, add an “X” in the appropriate numbered column. Task 3: Calculate the percentage of times your process was unable to meet the requirements of the customer. Task 4: When everyone is finished, your team will report your data to the Instructor for further evaluation by the class.

  41. Exercise – Meeting Customer Expectations? Place an X into the column representing the value of each throw. Each person will toss the dice 50 times for a total of 100 tosses for the team. [Figure: blank histogram grid; vertical axis: Quantity of Times a Number is Thrown; horizontal axis: Total Value of the Dice Throw.] Count the number of X’s that appear in the 2 and 12 columns. Since you made 100 tosses, the combined number is the percentage of times you failed to meet the VOC.
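
If no dice are at hand, the exercise can be approximated with a simulation. The Python sketch below is illustrative. It assumes fair dice, and the function name `run_exercise` and the seed are arbitrary choices. The spec limits of 3 and 11 come from the exercise.

```python
import random
from collections import Counter

LSL, USL = 3, 11  # lower and upper spec limits from the exercise


def run_exercise(tosses=100, seed=1):
    """Roll two dice `tosses` times; return the tally and the percent outside spec."""
    rng = random.Random(seed)
    tally = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(tosses))
    defects = sum(n for total, n in tally.items() if total < LSL or total > USL)
    return tally, 100 * defects / tosses


tally, defect_pct = run_exercise()
# Print the histogram the way the worksheet records it: one X per throw.
for total in range(2, 13):
    print(f"{total:>2} | {'X' * tally[total]}")
print(f"outside spec: {defect_pct:.0f}%")
```

Changing the seed plays the role of a different team: each run gives a slightly different histogram and defect percentage, which is itself an example of variation.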

  42. Back to Meeting Our Customer’s Needs. Remember, the customer required an outcome that totals 3, 4, 5, 6, 7, 8, 9, 10 or 11. What is the probability of meeting customer expectations? Is our process, the process of tossing two dice, capable of meeting the VOC? In the matrix of two-dice sums, only two of the 36 combinations fall outside the spec limits: 2 (1 + 1) and 12 (6 + 6), marked X. 34 / 36 = 0.944, which is 94.4%, or 5.6% defects. The Process is Not Capable!
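
The 34 / 36 figure can be confirmed by counting the combinations that fall inside the spec limits. This is a minimal Python sketch, assuming two fair six-sided dice and the limits LSL = 3 and USL = 11 from the exercise. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

LSL, USL = 3, 11  # spec limits from the exercise

rolls = list(product(range(1, 7), repeat=2))  # all 36 (die 1, die 2) pairs
in_spec = sum(1 for a, b in rolls if LSL <= a + b <= USL)

p_meet = Fraction(in_spec, len(rolls))  # probability of meeting the VOC
p_defect = 1 - p_meet                   # yield + defect rate = 100%
print(f"{float(p_meet):.1%} {float(p_defect):.1%}")  # 94.4% 5.6%
```

The defect rate is obtained as the complement of the yield, the same “Yield + Defect Rate = 100%” relationship used earlier in the presentation.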
