
CHEE320

CHEE320. Module 1: Graphical Methods for Analyzing Data, and Descriptive Statistics. Graphical Methods for Analyzing Data. What is the pattern of variability? Techniques histograms dot plots stem and leaf plots box plots quantile plots. Histogram.


Presentation Transcript


  1. CHEE320 Module 1: Graphical Methods for Analyzing Data, and Descriptive Statistics J. McLellan

  2. Graphical Methods for Analyzing Data. What is the pattern of variability? Techniques: • histograms • dot plots • stem and leaf plots • box plots • quantile plots

  3. Histogram • summary of frequency with which certain ranges of values occur • ranges - “bins” • choosing bin size - influences ability to recognize pattern • too large - data clustered in a few bins - no indication of spread of data • too small - data distributed with a few points in each bin - no indication of concentration of data • there are quantitative rules for choosing the number of bins - typically automated in statistical software • not automated in Excel!
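
As a rough illustration of such a quantitative rule, the sketch below uses Sturges' rule. This is one common choice and not necessarily the rule the slides have in mind. The function names are hypothetical, and the data are the tooth discoloration values listed on the stem and leaf slide.

```python
import math

def sturges_bin_count(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1

def histogram_counts(data, k):
    """Count observations in k equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    if width == 0:
        # All values identical: everything falls in the first bin.
        return [len(data)] + [0] * (k - 1)
    counts = [0] * k
    for x in data:
        # The maximum value is placed in the last bin, not in a (k+1)-th one.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return counts

# Tooth discoloration data from the stem and leaf slide
discoloration = [12, 10, 14, 20, 18, 18, 25, 21, 36, 44,
                 11, 15, 22, 21, 27, 25, 18, 21, 18, 20]

k = sturges_bin_count(len(discoloration))
counts = histogram_counts(discoloration, k)
print(k, counts)
```

Changing `k` by hand and re-running shows the effect described above: very few bins hide the spread, very many bins hide the concentration.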

  4. Histogram - Important Features • symmetry? • number of peaks • tails? - extreme data points • max, min data values - range of values • spread in the data • centre of gravity

  5. Dot Plots • similar to histogram • plot data by value on horizontal axis • stack repeated values vertically • look for similar shape features as for histogram • e.g., data set for solder thickness: {0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1} [Dot plot: horizontal axis from 0.06 to 0.13 in steps of 0.01]
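
A minimal text-only sketch of the stacking idea, using the solder thickness data from the slide. The stacks are drawn sideways (one row per value) for simplicity, and the variable names are illustrative.

```python
from collections import Counter

solder = [0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1]

# Count how many times each value occurs; repeated values form a stack.
stacks = Counter(solder)

for value in sorted(stacks):
    # One dot per observation at this value.
    print(f"{value:.2f} " + "." * stacks[value])
```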

  6. Stem and Leaf Plots • illustrate variability pattern using the numerical data itself • choose base division - “stem” • build “leaves” by taking digit next to base division
     Data (Tooth Discoloration by Fluoride): 12.00 10.00 14.00 20.00 18.00 18.00 25.00 21.00 36.00 44.00 11.00 15.00 22.00 21.00 27.00 25.00 18.00 21.00 18.00 20.00
     Decimal point is 1 place to the right of the colon
     Stems : Leaves
     1 : 0124      (10-14)
     1 : 58888     (15-19)
     2 : 001112    (20-24)
     2 : 557       (25-29)
     3 :
     3 : 6
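
The construction can be sketched in a few lines. This is an illustrative version that assumes integer-valued data, a stem of tens, and stems split into lower (leaves 0-4) and upper (leaves 5-9) halves as in the display above; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by (tens digit, upper-half flag) and join the units digits."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        # Leaves 0-4 go to the lower half of the stem, 5-9 to the upper half.
        rows[(stem, leaf >= 5)].append(str(leaf))
    return {key: "".join(leaves) for key, leaves in rows.items()}

discoloration = [12, 10, 14, 20, 18, 18, 25, 21, 36, 44,
                 11, 15, 22, 21, 27, 25, 18, 21, 18, 20]

rows = stem_and_leaf(discoloration)
for stem, upper in sorted(rows):
    print(f"{stem} : {rows[(stem, upper)]}")
```

Rows with no leaves are not printed by this sketch, and the value 44 produces a row (`4 : 4`) beyond those shown in the display above.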

  7. Stem and Leaf Plots - Solder example • numbers viewed as 0.070, 0.080, 0.090, 0.100, 0.110,… • decision - what is the stem? • considerations similar to histogram - size of bins
     Decimal point is 2 places to the left of the colon
     7 : 0
     8 :
     9 : 00
     10 : 000
     11 : 0
     12 : 0
     13 : 0

  8. Box Plots • graphical representation of “quartile” information and extreme data values • quartiles - describe how data occurs - ordering • 1st quartile - separates bottom 25% of data • 2nd quartile (median) - separates bottom 50% of data • 3rd quartile - separates bottom 75% of data • add “whiskers” - extend from box to the most extreme data points within • upper quartile + 1.5 * interquartile range • lower quartile - 1.5 * interquartile range • interquartile range (IQR) = Q3 - Q1 • plot outliers - data points outside Q3 + 1.5*IQR, Q1 - 1.5*IQR
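
The quantities behind a box plot can be computed directly. The sketch below is one possible implementation, not the one used to produce the plots in the slides. It relies on `statistics.quantiles` with `method="exclusive"`, which interpolates at positions based on N+1; `box_plot_summary` is a hypothetical helper name.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, fences, whisker ends and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

solder = [0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1]
summary = box_plot_summary(solder)
print(summary)
```

For the solder data no point falls outside the fences, so the whiskers simply run to the minimum and maximum values.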

  9. Interpretation (Box Plot - for solder data) • no outliers • relatively symmetric distribution • longer tails on both sides • fairly tightly clustered about centre

  10. Interpretation (Box Plot - for teeth discoloration) • no outliers • asymmetric distribution - long lower tail • some tails on both sides • fairly tightly clustered at higher range of discoloration

  11. Quantile Plots • plot cumulative progression of data • values vs. cumulative fraction of data • comparison to standard distribution shapes • e.g., normal distribution, lognormal distribution, … • can be plotted on special axes • analogous to semi-log graphs to provide visual test for closeness to given distribution • e.g., test to see if data are normally distributed
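
One way to obtain the points for a normal quantile plot is sketched below. The plotting position (i - 0.5)/n is an assumed convention (several others are in use), and `normal_quantile_points` is a hypothetical helper name. Plotting the returned pairs on ordinary linear axes is equivalent to plotting values against cumulative fraction on the special normal-probability axes.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each ordered value with the standard normal quantile of its cumulative fraction."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        fraction = (i - 0.5) / n  # assumed plotting position
        points.append((std_normal.inv_cdf(fraction), value))
    return points

discoloration = [12, 10, 14, 20, 18, 18, 25, 21, 36, 44,
                 11, 15, 22, 21, 27, 25, 18, 21, 18, 20]

points = normal_quantile_points(discoloration)
for z, value in points:
    print(f"{z:6.2f}  {value}")
```

If the data were normally distributed, the printed pairs would fall close to a straight line.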

  12. Interpretation (Quantile Plot - teeth discoloration) • data don’t follow linear progression • underlying distribution not normal? Note the irregular spacing of the axis - similar to “semi-log” paper - cumulative points should follow linear progression on this scale if distribution is normal.

  13. Graphical Methods for Quality Investigations • primary purpose - help organize information in quality investigation Examples: • Pareto Charts • Fishbone diagrams - Ishikawa diagrams

  14. Pareto Chart • used to rank factors • typically presented as a bar chart, listing in descending order of significance • significance can be determined by • number count - e.g., of defects attributed to specific causes • by size of effect - e.g., based on coefficients in regression model

  15. Example - Circuit Defects (data from Montgomery)
      Number of Defects Attributed to:
      Stamping_Oper_ID        1
      Stamping_Missing        1
      Sold._Short             1
      Wire_Incorrect          1
      Raw_Cd_Damaged          1
      Comp._Extra_Part        2
      Comp._Missing           2
      Comp._Damaged           2
      TST_Mark_White_Mark     3
      Tst._Mark_EC_Mark       3
      Raw_CD_Shroud_Re.       3
      Sold._Splatter          5
      Comp._Improper_1        6
      Sold._Opens             7
      Sold._Cold_Joint       20
      Sold._Insufficient     40

  16. Pareto Chart • for circuit defect data
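
The ranking behind a Pareto chart amounts to a sort plus a running total. The sketch below applies this to the circuit defect counts from the previous slide; the cumulative percentage column is a common addition that the slides do not explicitly mention, and `pareto_table` is a hypothetical helper name.

```python
defects = {
    "Stamping_Oper_ID": 1, "Stamping_Missing": 1, "Sold._Short": 1,
    "Wire_Incorrect": 1, "Raw_Cd_Damaged": 1, "Comp._Extra_Part": 2,
    "Comp._Missing": 2, "Comp._Damaged": 2, "TST_Mark_White_Mark": 3,
    "Tst._Mark_EC_Mark": 3, "Raw_CD_Shroud_Re.": 3, "Sold._Splatter": 5,
    "Comp._Improper_1": 6, "Sold._Opens": 7, "Sold._Cold_Joint": 20,
    "Sold._Insufficient": 40,
}

def pareto_table(counts):
    """Rank causes by count (descending) and attach the cumulative percentage."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows

table = pareto_table(defects)
for cause, count, cumulative in table:
    print(f"{cause:<22}{count:>4}{cumulative:>8.1f}%")
```

For these data the two largest categories, insufficient solder and cold joints, account for 60 of the 98 recorded defects.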

  17. Fishbone Diagrams • organize causes in analysis • have spine, with cause types branching from spine, and sub-groups branching further Example - factors influencing poor conversion in reactive extrusion. [Diagram labels: catalyst used - metallocene/Ziegler-Natta; half-life; initiator type; polymer grade; poor conversion (the effect); barrel temperature; temperature distribution along barrel; temperature control]

  18. Graphical Methods for Analyzing Data. Looking for time trends in data... • Time sequence plot • look for • jumps and ramps to new values - both indicate a shift in mean operation • meandering - indicates time correlation in data • large amount of variation about general trend - indication of large variance

  19. Time Sequence Plot - for naphtha 90% point - indicates amount of heavy hydrocarbons present in gasoline range material. Plot annotations: • excursion - sudden shift in operation • meandering about average operating point - time correlation in data
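
Meandering can also be checked numerically. A simple option, which the slides do not prescribe, is the lag-1 sample autocorrelation: values well above zero suggest that successive points tend to stay on the same side of the average. The two series below are made up purely for illustration and are not the naphtha data.

```python
def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of a sequence of measurements."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom

# Hypothetical series for illustration only
meandering = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

r_meander = lag1_autocorrelation(meandering)
r_alternate = lag1_autocorrelation(alternating)
print(r_meander, r_alternate)
```

The slowly drifting series gives a positive value and the rapidly alternating one a negative value.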

  20. Graphical Methods for Analyzing Data. Monitoring process operation • Quality Control Charts • time sequence plots with added indications of variation • account for fluctuations in values associated with natural process noise • look for significant jumps - shifts - that exceed normal range of variation of values • if significant shift occurs, stop and look for “assignable causes” • essentially graphical “hypothesis tests” • can plot - measurements, sample averages, ranges, standard deviations, ...

  21. Example - Monitoring Process Mean • is the average process operation constant? • collect samples at time intervals, compute average, and plot in time sequence plot • indication of process variation - standard deviation estimated from prior data • propagates through sample average calculation • if “s” is the sample standard deviation and “n” is the number of observations in each sample, calculated averages will lie within $\pm 3s/\sqrt{n}$ of the historical average about 99% of the time (99.7% if the averages are normally distributed) if the mean operation has NOT shifted • values outside this range suggest that a shift in the mean operation has occurred - alarm - “something has happened”

  22. Example - Monitoring Process Mean • time sequence plot with these alarm limits is referred to as a “Shewhart X-bar Chart” • X-bar - sample mean of X. Chart annotations: • centre-line or target line - indicates mean when process is operating properly • upper and lower control limits • no points exceed limits - process is in a state of statistical control

  23. Example - Monitoring Process Mean • X-bar chart. Chart annotation: point exceeds region of natural variation - significant shift has occurred
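
The alarm limits of an X-bar chart follow directly from the 3 s / sqrt(n) rule. The sketch below computes them and flags sample averages outside the limits. The target, standard deviation, sample size and averages are invented for illustration, and the function names are hypothetical.

```python
import math

def xbar_limits(target, s, n):
    """Lower and upper control limits: target -/+ 3 s / sqrt(n)."""
    half_width = 3 * s / math.sqrt(n)
    return target - half_width, target + half_width

def out_of_control(sample_means, target, s, n):
    """Return (index, mean) for every sample average outside the control limits."""
    lcl, ucl = xbar_limits(target, s, n)
    return [(i, m) for i, m in enumerate(sample_means) if m < lcl or m > ucl]

# Hypothetical values for illustration only
target = 50.0   # historical average
s = 2.0         # standard deviation estimated from prior data
n = 4           # observations per sample
sample_means = [50.4, 49.1, 51.2, 53.6, 50.0]

limits = xbar_limits(target, s, n)
alarms = out_of_control(sample_means, target, s, n)
print(limits, alarms)
```

With these made-up numbers the limits are 47 and 53, so only the fourth average (53.6) raises an alarm.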

  24. Graphical Methods for Analyzing Data. Visualizing relationships between variables. Techniques • scatterplots • scatterplot matrices • also referred to as “casement plots”

  25. Scatterplots … are also referred to as “x-y diagrams” • plot values of one variable against another • look for systematic trend in data • nature of trend • linear? • exponential? • quadratic? • degree of scatter - does spread increase/decrease over range? • indication that variance isn’t constant over range of data

  26. Scatterplots - Example • tooth discoloration data - discoloration vs. fluoride • trend - possibly nonlinear?

  27. Scatterplot - Example • tooth discoloration data - discoloration vs. brushing • significant trend? - doesn’t appear to be present

  28. Scatterplot - Example • tooth discoloration data - discoloration vs. brushing • variance appears to decrease as # of brushings increases

  29. Scatterplot matrices … are a table of scatterplots for a set of variables. Look for - • systematic trend between “independent” variable and dependent variables - to be described by estimated model • systematic trend between supposedly independent variables - indicates that these quantities are correlated • correlation can negatively influence model estimation results • not independent information • scatterplot matrices can be generated automatically with statistical software, manually using Excel
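
A trend between two supposedly independent variables can be quantified with the sample correlation coefficient, computed here from its definition. The two variables below are invented for illustration; the tooth data's fluoride and brushing values are not given in the transcript.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical "independent" variables that in fact move together
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r = pearson_r(x1, x2)
print(r)
```

A value close to +1 or -1, as here, signals that the two variables carry largely the same information.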

  30. Scatterplot Matrices - tooth data

  31. Describing Data Quantitatively. Approach - describe the pattern of variability using a few parameters • efficient means of summarizing Techniques • average - (sample “mean”) • sample standard deviation and variance • median • quartiles • interquartile range • ...

  32. Sample Mean - “Average” Given “n” observations $x_i$: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ Notes - • sensitive to extreme data values - outliers - value can be artificially raised or lowered

  33. Sample Variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ • sum of squared deviations about the average • squaring - notion of distance (squared) • average - is the centre of gravity • sample variance provides a measure of dispersion - spread - about the centre of gravity. Note - there is an alternative form of this equation which is more convenient for computation. Note that we divide by “n-1”, and NOT “n” - degrees of freedom argument
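
The alternative form mentioned in the note is presumably the expanded sum of squares. It follows from the definition by expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2
  = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\quad\Longrightarrow\quad
s^2 = \frac{1}{n-1}\left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)
```

This form needs only the running sums of $x_i$ and $x_i^2$, which is why it is convenient for hand or single-pass calculation. In floating-point arithmetic it can lose precision when the mean is large relative to the spread.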

  34. Sample Standard Deviation … is simply $s = \sqrt{s^2}$ • sample standard deviation provides a more direct link to dispersion • e.g., for Normal distribution • about 95% of values lie within 2 standard devn’s of the mean • over 99% of values (about 99.7%) lie within 3 standard devn’s of the mean
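
As a check on the definitions, the sketch below computes the sample mean, variance and standard deviation of the solder data from the formulas and compares them with Python's `statistics` module, whose `variance` and `stdev` also divide by n - 1. The helper names are illustrative.

```python
import math
import statistics

def sample_mean(x):
    """Average: sum of the observations divided by n."""
    return sum(x) / len(x)

def sample_variance(x):
    """Sum of squared deviations about the average, divided by n - 1."""
    xbar = sample_mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

solder = [0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1]

xbar = sample_mean(solder)
s2 = sample_variance(solder)
s = math.sqrt(s2)

print(xbar, s2, s)
print(statistics.mean(solder), statistics.variance(solder), statistics.stdev(solder))
```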

  35. Range • provides a measure of spread in the data • defined as maximum data value - minimum data value • can be sensitive to extreme data points • is often monitored in quality control charts to see if process variance is changing

  36. “Order” Statistics … summarize the progression of observations in the data set. Quartiles • divide the data in quarters Deciles • divide the data in tenths ...

  37. Quartiles • order data - N data points {$y_i$}, i=1,…,N • if N is odd, • median is observation $y_{(N+1)/2}$ • if N is even, • median is $\frac{1}{2}\left(y_{N/2} + y_{N/2+1}\right)$ • i.e., midpoint between two middle points

  38. Quartiles - Q1 and Q3 • Q1: Compute (N+1)/4 = A.B • Q3: Compute 3(N+1)/4 = A.B • in each case A is the integer part and B the fractional part (e.g., 2.5 gives A = 2, B = 0.5), and the quartile is $y_A + B\,(y_{A+1} - y_A)$, e.g., $Q_1 = y_A + B\,(y_{A+1} - y_A)$ • i.e., interpolate between adjacent points • Note - there are other conventions as well - e.g., for Q1, take bottom half of data set, and take midpoint between middle two points if there are an even number of points...

  39. Quartiles - Example • solder data set • observations: 0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1 • ordered: 0.07, 0.09, 0.09, 0.1, 0.1, 0.1, 0.11, 0.12, 0.13 • 9 points --> median is 5th observation: 0.1 • Q1: (N+1)/4 = 2.5 • Q1 = 0.09 + 0.5*(0.09-0.09) = 0.09 • Q3: 3(N+1)/4 = 7.5 • Q3 = 0.11 + 0.5*(0.12-0.11) = 0.115
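
The (N+1)/4 interpolation rule can be written as a short function. This is a sketch of that one convention only, with a hypothetical helper name; positions that fall before the first or after the last observation are clamped to the end points, which is an assumption the slides do not address.

```python
def quantile_by_position(ordered, fraction):
    """Interpolate at position fraction * (N + 1) in an ordered data set (1-based)."""
    position = fraction * (len(ordered) + 1)
    a = int(position)      # integer part A
    b = position - a       # fractional part B
    if a < 1:
        return ordered[0]
    if a >= len(ordered):
        return ordered[-1]
    # y_A + B * (y_{A+1} - y_A), with 1-based A mapped to 0-based indexing
    return ordered[a - 1] + b * (ordered[a] - ordered[a - 1])

solder = sorted([0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1])

q1 = quantile_by_position(solder, 0.25)
median = quantile_by_position(solder, 0.50)
q3 = quantile_by_position(solder, 0.75)
print(q1, median, q3)
```

For the solder data this gives Q1 = 0.09, a median of 0.1 and Q3 = 0.115 (up to floating-point rounding), so the interquartile range is 0.025.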

  40. Robustness … refers to whether a given descriptive statistic is sensitive to extreme data points. Examples • sample mean • is sensitive to extreme points - extreme value pulls average toward the extreme • sample variance • sensitive to extreme points - large deviation from the sample mean leads to inflated variance • median, quartiles • relatively insensitive to extreme data points

  41. Robustness - Solder Data Example • replace 0.13 by 0.5 - output from Excel [Excel output table with columns “With 0.13” and “With 0.5”]
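
The Excel output is not part of the transcript, but the comparison is easy to repeat. The sketch below recomputes a few of the statistics with Python's `statistics` module; it is a re-calculation, not a reproduction of the original table, and `describe` is a hypothetical helper name.

```python
import statistics

original = [0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.13, 0.1]
# Same data with the largest value, 0.13, replaced by 0.5
altered = [0.5 if x == 0.13 else x for x in original]

def describe(data):
    """Mean, sample standard deviation and median of a data set."""
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "median": statistics.median(data),
    }

with_013 = describe(original)
with_05 = describe(altered)
print(with_013)
print(with_05)
```

The mean rises from roughly 0.101 to roughly 0.142 and the standard deviation grows several-fold, while the median stays at 0.1.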

  42. Robustness • Other robust statistics • “m-estimator” - involves iterative filtering out of extreme data values, based on data distribution • trimmed mean - other bases for eliminating extreme data point effect • median absolute deviation
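
Two of these statistics are simple enough to sketch directly. The definitions used here are common ones but are assumptions as far as the slides are concerned: the trimmed mean drops the same fraction of points from each end of the ordered data, and the median absolute deviation is the unscaled median of absolute deviations from the median.

```python
import statistics

def trimmed_mean(data, proportion):
    """Mean after dropping int(n * proportion) points from each end of the ordered data."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)

def median_absolute_deviation(data):
    """Median of the absolute deviations from the median (no scaling factor)."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

# Solder data with 0.13 replaced by the extreme value 0.5
altered = [0.1, 0.12, 0.09, 0.07, 0.09, 0.11, 0.1, 0.5, 0.1]

tm = trimmed_mean(altered, 0.2)          # drops one point from each end
mad = median_absolute_deviation(altered)
print(tm, mad)
```

With one point trimmed from each end, the extreme value 0.5 no longer influences the average, and the median absolute deviation stays near 0.01 despite the outlier.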
