
Chap 2 Introduction to Statistics


ginger

Presentation Transcript


  1. Chap 2 Introduction to Statistics. This chapter gives an overview of statistics, including histogram construction, measures of central tendency, and measures of dispersion.

  2. INTRODUCTION TO STATISTICS • Statistics – deriving relevant information from data • Deals with • Collection of data – census, GDP, football, accidents, no. of employees (male, female, department, etc.) • Collection, tabulation, analysis, interpretation, and presentation of quantitative data – can draw conclusions about the sample or population studied and make decisions on quality

  3. INTRODUCTION TO STATISTICS • Use of statistics in quality deals with the second meaning (inductive statistics) • Examples: • What can we learn from the data? • What conclusions can be drawn? • What do the data tell us about our process and product performance? etc.

  4. INTRODUCTION TO STATISTICS • Understanding the use of statistics is vital in business • to make decisions based on facts • in conducting business improvements • in controlling and monitoring process, product, or service performance • Applying statistics to real-life problems, such as quality problems, results in improved organizational performance

  5. Collection of data • Collect data – by direct observation, or indirectly through written or verbal questions (market research, opinion polls) • Direct observation – measurement or visual checking; the data are classified as variables or attributes • Variables data – measurable quality characteristics • Attributes – characteristics not measured but classified as conforming or non-conforming

  6. Collection of data • Data are collected with a purpose • Find out process conditions • For improvement • Variables – quality characteristics that are measurable or countable • CONTINUOUS – dimensions, weight, height, etc. (meter, gallon, p.s.i., etc.) • DISCRETE – countable numbers that exhibit gaps (no. of defective parts, no. of defects/car; whole numbers 1, 2, 3, …, 100)

  7. Collection of data • Attributes – quality characteristics that are non-measurable, or that we choose not to measure • Examples: surface appearance, color; acceptable / non-acceptable; conforming / non-conforming • Data are collected in the form of discrete values • Variables (e.g. weight of sugar) CAN be classified as attributes: • weight within limits – number of conforming • weight outside limits – number of non-conforming

  8. Summarizing Data • Consider this data set on the number of daily billing errors • Data in this form are • Meaningless • Not effective • Difficult to use

  9. Need to summarize data in the form of: • Graphical – Freq. Dist., Histogram, Graphs, Charts, Diagrams • Analytical – Measures of central tendency, Measure of dispersion

  10. Frequency Distribution (FD) • Summary of how data (observations) occur within each subdivision or group of observed values • Helps visualize the distribution of the data • Shows how the total frequency is distributed • Two types: • Ungrouped data – listing of observed values • Grouped data – observed values lumped together

  11. FD - Ungrouped Data • Establish an array, arranged in ascending or descending order (as in column 1) • Tabulate the frequency – place tally marks in column 2 • Present in graphical form – histogram, relative frequency distribution

  12. FD – Ungrouped data • 4 graphical representations • Frequency histogram • Relative frequency histogram • Cumulative frequency histogram • Relative cumulative frequency histogram • [Figure: frequency histogram; vertical axis: Frequency]
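The four representations listed on slide 12 all derive from the same tally. The sketch below shows one way to compute them; the `errors` list is hypothetical and is not the billing-error data set from the slides, which did not survive in the transcript.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical daily billing-error counts (NOT the data set from the slides).
errors = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

counts = Counter(errors)
values = sorted(counts)                    # the array, in ascending order
freq = [counts[v] for v in values]         # frequency
n = sum(freq)
rel_freq = [f / n for f in freq]           # relative frequency
cum_freq = list(accumulate(freq))          # cumulative frequency
rel_cum_freq = [c / n for c in cum_freq]   # relative cumulative frequency

for row in zip(values, freq, rel_freq, cum_freq, rel_cum_freq):
    print("{:>5} {:>4} {:>6.2f} {:>4} {:>6.2f}".format(*row))
```

Each column, plotted against `values`, gives one of the four histograms.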

  13. Frequency Distribution For Grouped Data • Data for a continuous variable need grouping • Steps • 1. Collect data and construct tally sheet • Make tally – coded if necessary • Too many data – group into cells • Simplifies presentation of the distribution • Too many cells – distort the true picture • Too few cells – too concentrated • No. of cells – judgment by analyst – trial and error • Generally 5–20 cells • Fewer than 100 data – use 5 to 9 cells • 100–500 data – use 8 to 17 cells • More than 500 data – use 15 to 20 cells

  14. CELL NOMENCLATURE • [Figure: a cell, labelled with cell interval (i), midpoint, and upper boundary]

  15. 2. Determine the range • $R = X_H - X_L$ • R = range • $X_H$ = highest value of data • $X_L$ = lowest value of data • Example: if the highest number is 2.575 and the lowest number is 2.531, then $R = X_H - X_L = 2.575 - 2.531 = 0.044$

  16. 3. Determine the cell interval • Cell interval = distance between adjacent cell midpoints. If possible, use odd interval values, e.g. 0.001, 0.07, 0.5, 3, so that midpoint values have the same number of decimal places as the data values. • Use Sturges' rule: $i = R/(1 + 3.322 \log_{10} n)$, where n = number of observations • Or use trial and error: $h = R/i$, where h = number of cells (classes) • Assume i = 0.003: h = 0.044/0.003 ≈ 15 cells • Assume i = 0.005: h = 0.044/0.005 ≈ 9 cells • Assume i = 0.007: h = 0.044/0.007 ≈ 6 cells • A cell interval of 0.005 with 9 cells gives the best presentation of the data. Use the guidelines in step 1.
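The trial-and-error arithmetic on slide 16 can be checked with a short sketch. The range comes from the slide's example; the sample size `n = 100` passed to Sturges' rule is a hypothetical value, since the slides do not state n.

```python
import math

def sturges_interval(data_range, n):
    """Cell interval suggested by Sturges' rule: i = R / (1 + 3.322 log10 n)."""
    return data_range / (1 + 3.322 * math.log10(n))

R = 2.575 - 2.531  # range from the slide's example, 0.044

# Trial and error: number of cells h = R / i, rounded to a whole cell count.
cells = {i: round(R / i) for i in (0.003, 0.005, 0.007)}
print(cells)

# n = 100 is an assumed sample size, used only to exercise the formula.
print(round(sturges_interval(R, 100), 4))
```

Sturges' rule only gives a starting point; the candidate interval is then adjusted to a convenient odd value and checked against the cell-count guidelines in step 1.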

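  17. 4. Determine cell midpoints • $MP_L = X_L + i/2$ (do not round) • $= 2.531 + 0.005/2 = 2.5335$, recorded as 2.533 (the extra digit is dropped, not rounded up) • The first cell has 5 different values (as do the other cells): 2.531, 2.532, 2.533, 2.534, 2.535 • Midpoints of the first two cells: 2.533 and 2.538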

  18. 5. Determine cell boundaries • Limit values of a cell: lower and upper • Used to avoid ambiguity when placing data in cells • Boundary values have one more decimal place (or significant figure) than the observed values • Upper boundary: add 0.0005 to the highest value in the cell • Lower boundary: subtract 0.0005 from the lowest value in the cell

  19. 6. Tabulate cell frequency • Post the number of observations that fall in each cell • Result: the frequency distribution table
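Steps 4 to 6 (midpoints, boundaries, and cell frequencies) can be sketched together. The function name `tabulate` and the `data` list are hypothetical; the slides do not list the raw observations. The sketch assumes the data are recorded to `decimals` places and that the interval is a whole number of those units, and it works in integer units so that values such as 2.535 land in the right cell.

```python
from collections import Counter

def tabulate(data, lowest, interval, decimals=3):
    """Count observations per cell; cells start at `lowest` and are `interval` wide."""
    scale = 10 ** decimals
    lo = round(lowest * scale)     # integer units avoid floating-point drift
    step = round(interval * scale)
    counts = Counter((round(x * scale) - lo) // step for x in data)
    table = []
    for k in range(max(counts) + 1):
        first = lo + k * step      # lowest observable value in the cell
        last = first + step - 1    # highest observable value in the cell
        table.append({
            "lower_boundary": (first - 0.5) / scale,
            "midpoint": (first + last) / 2 / scale,
            "upper_boundary": (last + 0.5) / scale,
            "frequency": counts.get(k, 0),
        })
    return table

# Hypothetical observations (NOT the data from the slides).
data = [2.531, 2.534, 2.535, 2.536, 2.540, 2.541, 2.543, 2.538]
table = tabulate(data, lowest=2.531, interval=0.005)
for row in table:
    print(row)
```

With the slide's values (lowest value 2.531, interval 0.005), the first cell runs from boundary 2.5305 to 2.5355 with midpoint 2.533, matching slides 17 and 18.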

  20. A frequency distribution gives a better view of the central value, and of how the data are dispersed, than the unorganized data sheet • Histogram – describes variation in the process • Used to • solve problems • determine process capability • compare with specifications • suggest the shape of the distribution • indicate data discrepancies, e.g. gaps

  21. Characteristics Of Frequency Distribution • Symmetry, number of modes (one, two or multiple), peakedness of data • [Figure panels: symmetrical; skewed right; skewed left; bimodal; platykurtic (flatter); leptokurtic (very peaked)]

  22. Characteristics of Frequency Distribution • An FD can give sufficient information to provide a basis for decision making • Distributions are compared with regard to: • Shape • Spread • Location

  23. Descriptive Statistics • Analytical methods allow comparison between data sets • 2 main analytical methods for describing data • Measures of central tendency • Measures of dispersion • Measure of central tendency of a distribution – a numerical value that describes the central position of the data • 3 common measures • mean • median • mode

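  24. Measure of Central Tendency • Mean – most common measure used • What is the middle value? What is the average number of rejects, errors, or dimension of a product? • Mean for ungrouped (unarranged) data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $\bar{x}$ (x bar) = average, $x_i$ = observed value, n = number of observed values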

  25. Mean Example • A QA engineer inspects the tread depth (mm) of 5 tyres. What is the mean tread depth? • $x_1 = 12.3$, $x_2 = 12.5$, $x_3 = 12.0$, $x_4 = 13.0$, $x_5 = 12.8$ • $\bar{x} = (12.3 + 12.5 + 12.0 + 13.0 + 12.8)/5 = 62.6/5 = 12.52$ mm
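As a quick check of the slide 25 example, the mean of the five depths can be computed with Python's standard library. The variable names are illustrative only.

```python
from statistics import mean

# Depths (mm) from the slide 25 example.
depths_mm = [12.3, 12.5, 12.0, 13.0, 12.8]

x_bar = mean(depths_mm)
print(round(x_bar, 2))  # 12.52
```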

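  26. Mean - Grouped Data • When data are already grouped in a frequency distribution: $\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{\sum_{i=1}^{h} f_i}$ • $\sum f_i$ = sum of frequencies (= n, the total number of observations) • $f_i$ = frequency in the ith cell • h = number of cells/classes • $x_i$ = midpoint of the ith cell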

  27. Mean - Grouped Data • $\bar{x} = \sum f_i x_i / \sum f_i = 2700/50 = 54$
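The grouped-mean calculation weights each cell midpoint by its frequency. The table behind the slide's 2700/50 did not survive in the transcript, so the midpoints and frequencies below are hypothetical, chosen only so that the totals reproduce 2700 and 50.

```python
# Hypothetical frequency table (NOT the slide's table); totals chosen to
# reproduce the slide's sum(f*x) = 2700 and sum(f) = 50.
midpoints = [34, 44, 54, 64, 74]
freqs = [5, 10, 20, 10, 5]

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
n = sum(freqs)
grouped_mean = sum_fx / n
print(sum_fx, n, grouped_mean)
```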

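  28. Weighted average • Tensile tests on an aluminium alloy were conducted with a different number of samples each time. Results are as follows: • 1st test: $\bar{x}_1$ = 207 MPa, n = 5 • 2nd test: $\bar{x}_2$ = 203 MPa, n = 6 • 3rd test: $\bar{x}_3$ = 206 MPa, n = 3 • $\bar{x}_w = \frac{\sum w_i \bar{x}_i}{\sum w_i}$, where $\bar{x}_w$ = weighted average and $w_i$ = weight of the ith average • Using the sample sizes as weights: $\bar{x}_w = [(5)(207) + (6)(203) + (3)(206)]/14 = 2871/14 \approx 205.07$ MPa • Or use weights that sum to 1.00: $w_1$ = 5/(5+6+3) = 0.36, $w_2$ = 6/(5+6+3) = 0.43, $w_3$ = 3/(5+6+3) = 0.21, total = 1.00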
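Both forms of the weighted average on slide 28 (sample sizes as weights, or weights normalized to sum to 1.00) give the same result, as this sketch illustrates. Variable names are illustrative.

```python
# Test means (MPa) and sample sizes from slide 28.
means_mpa = [207, 203, 206]
sizes = [5, 6, 3]

# Form 1: sample sizes as weights, divided by the sum of the weights.
total = sum(sizes)
weighted = sum(n * x for n, x in zip(sizes, means_mpa)) / total

# Form 2: weights normalized so that they sum to 1.00.
weights = [n / total for n in sizes]
weighted_alt = sum(w * x for w, x in zip(weights, means_mpa))

print(round(weighted, 2), round(weighted_alt, 2))
```

The slide's two-decimal weights (0.36, 0.43, 0.21) are rounded; carrying the unrounded fractions avoids a small rounding error in the result.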

  29. Median – Ungrouped Data • Median – the value that divides the ordered observations into 2 equal parts • Ungrouped data – 2 possibilities: the total number of data (N) is a) odd or b) even • If N is odd, the ((N+1)/2)th value is the median • e.g. 3 4 5 6 8: (N+1)/2 = 6/2 = 3, so the median is the 3rd number, 5 • If N is even, the median is the average of the two middle values • e.g. 3 5 7 9: ½ of (5+7) = 6 • NOTE: ORDER THE NUMBERS FIRST!
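The odd and even cases from slide 29 can be written as one small function. This is a minimal sketch; Python's `statistics.median` does the same job.

```python
def median(values):
    """Median of ungrouped data: order first, then take the middle value(s)."""
    ordered = sorted(values)  # NOTE: order the numbers first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # the ((n + 1) / 2)th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values

print(median([3, 4, 5, 6, 8]))  # 5
print(median([9, 3, 7, 5]))     # 6.0
```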

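  30. Median – Grouped Data • Find the cell/class containing the middle value and interpolate within that cell using $M_d = L_m + \left(\frac{n/2 - cf_m}{f_m}\right) i$ • $L_m$ = lower boundary of the cell containing the median • $cf_m$ = cumulative frequency of all cells below $L_m$ • $f_m$ = frequency of the cell where the median occurs • i = cell interval • n = total number of observations • Example: $M_d$ = 40.5 + (…)(10) = 53.5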
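The interpolation for a grouped median can be sketched as follows, assuming the usual form Md = Lm + ((n/2 - cfm)/fm) i. The function name `grouped_median` and the frequency table are hypothetical; the table behind the slide's example is not in the transcript.

```python
def grouped_median(lower_boundaries, freqs, interval):
    """Interpolate the median inside the cell that contains the n/2-th observation."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    cum = 0  # cumulative frequency of all cells below the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cum + f >= n / 2:
            return lower + (n / 2 - cum) / f * interval
        cum += f

# Hypothetical table (NOT the slide's data): 5 cells of width 10.
lower_boundaries = [20.5, 30.5, 40.5, 50.5, 60.5]
freqs = [4, 8, 20, 12, 6]

md = grouped_median(lower_boundaries, freqs, interval=10)
print(md)
```

Here n = 50, so the median is the 25th observation; 12 observations lie below 40.5, and the remaining 13 of the 20 in that cell place the median at 40.5 + (13/20)(10) = 47.0.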

  31. Measures of dispersion • Describe how the data are spread out or scattered on each side of the central value • Both measures of central tendency and dispersion are needed to describe data • Exam results • Class 1 – avg.: 60.0 marks • highest: 95 • lowest: 25 • Class 2 – avg.: 60.0 marks • highest: 100 • lowest: 15 • Same average, but Class 2 is more spread out (range 85 vs. 70)

  32. Measures of dispersion • Main types – range, standard deviation, and variance • Range – difference between highest and lowest value • $R = X_H - X_L$ • Standard deviation • Variance – standard deviation squared • A large value shows greater variability or spread

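  33. Standard deviation • For ungrouped data: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ • s = sample std. dev., $x_i$ = observed value, $\bar{x}$ = average, n = no. of observed values • or use $s = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$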
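The two ungrouped forms of the sample standard deviation (the definition with n - 1 in the denominator, and the computational shortcut) are algebraically equivalent. The sketch below checks this on the five depth values from slide 25; the function names are illustrative.

```python
import math

def std_dev_definition(xs):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

def std_dev_shortcut(xs):
    """s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )"""
    n = len(xs)
    return math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))

depths_mm = [12.3, 12.5, 12.0, 13.0, 12.8]  # data from the slide 25 example
s1 = std_dev_definition(depths_mm)
s2 = std_dev_shortcut(depths_mm)
print(round(s1, 3), round(s2, 3))
```

The shortcut form avoids computing the mean first, which suits hand calculation, but it subtracts two large, nearly equal numbers, so intermediate sums should not be rounded.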

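  34. Standard deviation – grouped data • $s = \sqrt{\frac{n\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)}}$, where $n = \sum f_i$, $f_i$ = frequency of the ith cell, $x_i$ = midpoint of the ith cell • NOTE: DO NOT ROUND OFF $\sum f_i x_i$ AND $\sum f_i x_i^2$; ACCURACY IS AFFECTED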
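For grouped data, the same shortcut form is applied with each midpoint weighted by its cell frequency. The table below is hypothetical (the slide's own table is not in the transcript), and the function name is illustrative.

```python
import math

def grouped_std_dev(midpoints, freqs):
    """s = sqrt( (n*sum(f*x^2) - (sum(f*x))^2) / (n*(n-1)) ), with n = sum(f)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    # Keep sum_fx and sum_fx2 unrounded, as the slide warns.
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Hypothetical frequency table (NOT the slide's data).
midpoints = [34, 44, 54, 64, 74]
freqs = [5, 10, 20, 10, 5]

s = grouped_std_dev(midpoints, freqs)
print(round(s, 2))
```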

  35. Concept Of Population and Sample • Population – e.g. total daily production of steel shafts, a year's production volume of calculators • Sample – compute $\bar{x}$ and s (sample statistics) • True population parameters: μ and σ • Why sample? • not possible to measure the whole population • costs involved • 100% manual inspection – accuracy/error

  36. Concept Of Population and Sample • Population parameters: μ – mean, σ – std. dev. • Sample statistics: $\bar{x}$, s

  37. Normal Distribution (ND) • Also called the Gaussian distribution • Symmetrical, unimodal, bell-shaped distribution with mean, median, and mode at the same value • Population curve – as the sample size increases and the cell interval decreases, the polygon becomes a smooth curve

  38. Normal Distribution • Much of the variation in nature and industry follows the ND • Variation in height of humans, weight of elephants, casting weights, size of piston rings • Electrical properties, material properties (tensile strength, etc.)

  39. Example - ND

  40. Characteristics of ND • Can have different mean but same standard deviation

  41. Different standard deviation but same mean

  42. Relationship between std deviation and area under curve
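The figure for slide 42 is not in the transcript. As a sketch of the relationship it names, the fraction of a normal distribution lying within k standard deviations of the mean can be computed from the error function: area = erf(k/√2). The function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Fraction of a normal distribution within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
```

This gives roughly 68.27%, 95.45%, and 99.73% of the area for ±1σ, ±2σ, and ±3σ; printed tables may differ slightly in the last digit.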

  43. Normal Distribution Example • Need estimates of the mean and standard deviation, and the Normal Table • Example: • From past experience a manufacturer concludes that the burnout time of a particular light bulb follows a normal distribution. A sample has been tested and the average ($\bar{x}$) found to be 60 days with a standard deviation (σ) of 20 days. How many bulbs can be expected to be still working after 100 days?

  44. Solution • The problem is to find the area under the curve beyond 100 days • Sketch the normal distribution and shade the area needed (μ = 60, σ = 20, x = 100) • Calculate the z value corresponding to the x value: $z = (x_i - \mu)/\sigma = (100 - 60)/20 = +2.00$ • Look in the Normal Table for z = +2.00 – gives area under curve as 0.9773 • But we want x > 100, i.e. z > 2.00. Therefore area = 1.000 – 0.9773 = 0.0227, i.e. a 2.27% probability that the life of a light bulb is > 100 days
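The table lookup in the light-bulb solution can be reproduced with `statistics.NormalDist` from Python's standard library (available from Python 3.8). This is a check of the arithmetic under the slide's assumption that μ = 60 and σ = 20.

```python
from statistics import NormalDist

# Burnout time in days, assumed normal with the slide's estimates.
burnout = NormalDist(mu=60, sigma=20)

z = (100 - burnout.mean) / burnout.stdev  # standardized value
p_beyond = 1 - burnout.cdf(100)           # area under the curve beyond 100 days

print(z, round(p_beyond, 4))
```

The computed tail area is about 0.0228; the slide's 0.0227 comes from subtracting the table value 0.9773, which is already rounded to four places.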

  45. Test For Normality • To determine whether data are normally distributed • Probability plot – plot the data on normal probability paper • Steps • 1. Order the data • 2. Rank the observations • 3. Calculate the plotting position (i = rank, n = sample size, PP = plotting position in %) • 4. Label the data scale • 5. Plot the points on normal probability paper • 6. Attempt to fit a 'best line' by eye • 7. Determine normality
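Steps 1 to 3 of the probability plot can be sketched in code. The plotting-position formula did not survive in the transcript, so the sketch assumes one common convention, PP = 100(i - 0.5)/n; other conventions exist. The data values and the function name are hypothetical.

```python
from statistics import NormalDist

def plotting_positions(data):
    """Order the data, rank it, and attach PP = 100 * (rank - 0.5) / n (assumed convention)."""
    ordered = sorted(data)
    n = len(ordered)
    return [(x, 100 * (rank - 0.5) / n) for rank, x in enumerate(ordered, start=1)]

# Hypothetical observations (NOT the data from the slide 46 example).
data = [52, 47, 61, 55, 58]
points = plotting_positions(data)

for x, pp in points:
    # The z value is where this PP falls on the probability scale of normal paper.
    z = NormalDist().inv_cdf(pp / 100)
    print(x, pp, round(z, 2))
```

Plotting each observation against its z value replaces normal probability paper: points that fall close to a straight line suggest the data are approximately normal.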

  46. Example
