Ch.9 Data Analysis - Sumit Ghosh, Saurabh Vishal

Presentation Transcript


  1. Ch.9 Data Analysis - Sumit Ghosh, Saurabh Vishal

  2. Definition • Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.

  3. Rotary Clinker kiln

  4. Rotary Clinker kiln
  • The aim of the stoker is to keep the kiln in a "proper" state.
  • Kiln revolutions (KR)
  • Coal worm revolutions (CWR)
  • Burning zone temperature (BZT)
  • Burning zone color (BZC)
  • Clinker granulation (CG)
  • Kiln inside color (KIC)

  5. Attributes
  • The condition attributes:
  • a - burning zone temperature (BZT)
  • b - burning zone color (BZC)
  • c - clinker granulation (CG)
  • d - kiln inside color (KIC)
  • The decision attributes:
  • e - kiln revolutions (KR)
  • f - coal worm revolutions (CWR)

  6. Domain of Attributes
  • Burning zone temperature (BZT): 1 - (1380-1420 °C), 2 - (1421-1440 °C), 3 - (1441-1480 °C), 4 - (1481-1500 °C)
  • Burning zone color (BZC): 1 - scarlet, 2 - dark pink, 3 - bright pink, 4 - very bright pink, 5 - rose white
  • Clinker granulation (CG): 1 - fines, 2 - fines with small lumps, 3 - granulation, 4 - lumps
  • Kiln inside color (KIC): 1 - dark streaks, 2 - indistinct dark streaks, 3 - lack of dark streaks
  • Kiln revolutions (KR): 1 - 0.9 rpm, 2 - 1.22 rpm
  • Coal worm revolutions (CWR): 1 - 0 rpm, 2 - 15 rpm, 3 - 30 rpm, 4 - 40 rpm
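  To make the coding concrete, here is a minimal Python sketch (our own illustration; the function name encode_bzt is hypothetical and not from the slides) of how a raw burning-zone temperature reading could be mapped to the coded BZT value above:

```python
# Hypothetical helper: map a raw burning-zone temperature reading (degrees C)
# to the coded BZT value 1-4 listed on this slide. The boundaries follow the
# domains above; the function itself is only an illustration.
def encode_bzt(temp_c: float) -> int:
    if 1380 <= temp_c <= 1420:
        return 1
    if 1421 <= temp_c <= 1440:
        return 2
    if 1441 <= temp_c <= 1480:
        return 3
    if 1481 <= temp_c <= 1500:
        return 4
    raise ValueError(f"temperature {temp_c} C is outside the observed range")

print(encode_bzt(1455))  # -> 3
```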

  7. Stoker's Observations (Table 1)
  TIME  a  b  c  d  e  f      TIME  a  b  c  d  e  f
    1   3  2  2  2  2  4        27   4  4  3  3  2  2
    2   3  2  2  1  2  4        28   4  4  3  3  2  2
    3   3  2  2  1  2  4        29   4  4  3  3  2  2
    4   2  2  2  1  1  4        30   4  4  3  3  2  2
    5   2  2  2  2  1  4        31   4  4  3  2  2  2
    6   2  2  2  1  1  4        32   4  4  3  2  2  2
    7   2  2  2  1  1  4        33   4  3  3  2  2  2
    8   2  2  2  1  1  4        34   4  3  3  2  2  2
    9   2  2  2  2  1  4        35   4  3  3  2  2  2
   10   2  2  2  2  1  4        36   4  2  3  2  2  2
   11   2  2  2  2  1  4        37   4  2  3  2  2  2
   12   3  2  2  2  2  4        38   3  2  2  2  2  4
   13   3  2  2  2  2  4        39   3  2  2  2  2  4
   14   3  2  2  3  2  3        40   3  2  2  2  2  4
   15   3  2  2  3  2  3        41   3  3  2  2  2  4
   16   3  2  2  3  2  3        42   3  3  2  2  2  4
   17   3  3  2  3  2  3        43   3  3  2  2  2  4
   18   3  3  2  3  2  3        44   3  3  2  3  2  3
   19   3  3  2  3  2  3        45   3  3  2  3  2  3
   20   4  3  2  3  2  3        46   4  3  2  3  2  3
   21   4  3  2  3  2  3        47   4  3  2  3  2  3
   22   4  3  2  3  2  3        48   4  3  2  3  2  2
   23   4  3  3  3  2  3        49   4  3  3  3  2  2
   24   4  3  3  3  2  2        50   4  4  3  3  2  2
   25   4  3  3  3  2  2        51   4  4  3  2  2  2
   26   4  4  3  3  2  2        52   4  4  3  3  2  2
  (a = BZT, b = BZC, c = CG, d = KIC; e = KR, f = CWR)

  8. After elimination of identical rows (Table 2)
   U   a  b  c  d  e  f
   1   3  3  2  2  2  4
   2   3  2  2  2  2  4
   3   3  2  2  1  2  4
  --------------------------------------
   4   2  2  2  1  1  4
   5   2  2  2  2  1  4
  --------------------------------------
   6   3  2  2  3  2  3
   7   3  3  2  3  2  3
   8   4  3  2  3  2  3
  --------------------------------------
   9   4  3  3  3  2  2
  10   4  4  3  3  2  2
  11   4  4  3  2  2  2
  12   4  3  3  2  2  2
  13   4  2  3  2  2  2
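  The step on this slide is plain row de-duplication. A minimal Python sketch, assuming rows are stored as (a, b, c, d, e, f) tuples; only the first few of the 52 observations from Table 1 are listed:

```python
# Collapse identical observation rows of Table 1 into distinct decision rules.
observations = [
    (3, 2, 2, 2, 2, 4),  # time 1
    (3, 2, 2, 1, 2, 4),  # time 2
    (3, 2, 2, 1, 2, 4),  # time 3 (duplicate of time 2)
    (2, 2, 2, 1, 1, 4),  # time 4
]

def distinct_rows(rows):
    """Keep the first occurrence of every row, preserving order."""
    seen, result = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

print(distinct_rows(observations))
# [(3, 2, 2, 2, 2, 4), (3, 2, 2, 1, 2, 4), (2, 2, 2, 1, 1, 4)]
```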

  9. Removing attribute a (Table 3)
   U   b  c  d  e  f
   1   3  2  2  2  4
   2   2  2  2  2  4
   3   2  2  1  2  4
  --------------------------------------
   4   2  2  1  1  4
   5   2  2  2  1  4
  --------------------------------------
   6   2  2  3  2  3
   7   3  2  3  2  3
   8   3  2  3  2  3
  --------------------------------------
   9   3  3  3  2  2
  10   4  3  3  2  2
  11   4  3  2  2  2
  12   3  3  2  2  2
  13   2  3  2  2  2

  10. Inconsistent (Table 3)
  • Table 3 is inconsistent because the following pairs of decision rules
  (i) b2c2d1 → e2f4 (rule 3) and b2c2d1 → e1f4 (rule 4)
  (ii) b2c2d2 → e2f4 (rule 2) and b2c2d2 → e1f4 (rule 5)
  are inconsistent.

  11. Removing attribute b (Table 4)
   U   a  c  d  e  f
   1   3  2  2  2  4
   2   3  2  2  2  4
   3   3  2  1  2  4
  --------------------------------------
   4   2  2  1  1  4
   5   2  2  2  1  4
  --------------------------------------
   6   3  2  3  2  3
   7   3  2  3  2  3
   8   4  2  3  2  3
  --------------------------------------
   9   4  3  3  2  2
  10   4  3  3  2  2
  11   4  3  2  2  2
  12   4  3  2  2  2
  13   4  3  2  2  2
  • It is easily seen that all decision rules in the table are consistent, hence attribute b is superfluous.
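  The consistency test behind slides 9-15 can be sketched as follows (our own illustration, assuming the row encoding below): an attribute is dispensable if, after projecting it out, no two rows agree on the remaining condition values while disagreeing on the decision values.

```python
# Table 2 as a dict: row -> ((a, b, c, d), (e, f)).
TABLE2 = {
    1:  ((3, 3, 2, 2), (2, 4)),  2:  ((3, 2, 2, 2), (2, 4)),
    3:  ((3, 2, 2, 1), (2, 4)),  4:  ((2, 2, 2, 1), (1, 4)),
    5:  ((2, 2, 2, 2), (1, 4)),  6:  ((3, 2, 2, 3), (2, 3)),
    7:  ((3, 3, 2, 3), (2, 3)),  8:  ((4, 3, 2, 3), (2, 3)),
    9:  ((4, 3, 3, 3), (2, 2)), 10:  ((4, 4, 3, 3), (2, 2)),
    11: ((4, 4, 3, 2), (2, 2)), 12:  ((4, 3, 3, 2), (2, 2)),
    13: ((4, 2, 3, 2), (2, 2)),
}
COND = "abcd"

def is_dispensable(table, attr):
    """True if dropping condition attribute `attr` keeps the table consistent."""
    keep = [i for i, name in enumerate(COND) if name != attr]
    seen = {}
    for cond, dec in table.values():
        reduced = tuple(cond[i] for i in keep)
        if reduced in seen and seen[reduced] != dec:
            return False          # two reduced rules collide -> inconsistency
        seen[reduced] = dec
    return True

for attr in COND:
    print(attr, "dispensable" if is_dispensable(TABLE2, attr) else "indispensable")
# only b turns out to be dispensable, matching slides 11 and 16
```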

  12. Removing attribute c (Table 5)
   U   a  b  d  e  f
   1   3  3  2  2  4
   2   3  2  2  2  4
   3   3  2  1  2  4
  --------------------------------------
   4   2  2  1  1  4
   5   2  2  2  1  4
  --------------------------------------
   6   3  2  3  2  3
   7   3  3  3  2  3
   8   4  3  3  2  3
  --------------------------------------
   9   4  3  3  2  2
  10   4  4  3  2  2
  11   4  4  2  2  2
  12   4  3  2  2  2
  13   4  2  2  2  2

  13. Inconsistent (Table 5)
  • Table 5 is inconsistent because the following pair of decision rules
  a4b3d3 → e2f3 (rule 8)
  a4b3d3 → e2f2 (rule 9)
  is inconsistent.

  14. Removing attribute d (Table 6)
   U   a  b  c  e  f
   1   3  3  2  2  4
   2   3  2  2  2  4
   3   3  2  2  2  4
  --------------------------------------
   4   2  2  2  1  4
   5   2  2  2  1  4
  --------------------------------------
   6   3  2  2  2  3
   7   3  3  2  2  3
   8   4  3  2  2  3
  --------------------------------------
   9   4  3  3  2  2
  10   4  4  3  2  2
  11   4  4  3  2  2
  12   4  3  3  2  2
  13   4  2  3  2  2

  15. Inconsistent (Table 6)
  • Table 6 is inconsistent because the following pairs of decision rules
  (i) a3b3c2 → e2f4 (rule 1) and a3b3c2 → e2f3 (rule 7)
  (ii) a3b2c2 → e2f4 (rule 3) and a3b2c2 → e2f3 (rule 6)
  are inconsistent.

  16. Result
  • Thus without any one of the attributes a, c or d Table 2 becomes inconsistent, whereas without attribute b it remains consistent. Attribute b can therefore be dropped from the table.
  • Recall that if there are two or more identical decision rules in a table, we should drop all but one arbitrary representative.

  17. After removing attribute b (Table 4)
   U   a  c  d  e  f
   1   3  2  2  2  4
   2   3  2  2  2  4
   3   3  2  1  2  4
  --------------------------------------
   4   2  2  1  1  4
   5   2  2  2  1  4
  --------------------------------------
   6   3  2  3  2  3
   7   3  2  3  2  3
   8   4  2  3  2  3
  --------------------------------------
   9   4  3  3  2  2
  10   4  3  3  2  2
  11   4  3  2  2  2
  12   4  3  2  2  2
  13   4  3  2  2  2

  18. Check duplicate rows
   U   a  c  d  e  f
   1   3  2  2  2  4
   2   3  2  2  2  4
   3   3  2  1  2  4
  --------------------------------------
   4   2  2  1  1  4
   5   2  2  2  1  4
  --------------------------------------
   6   3  2  3  2  3
   7   3  2  3  2  3
   8   4  2  3  2  3
  --------------------------------------
   9   4  3  3  2  2
  10   4  3  3  2  2
  11   4  3  2  2  2
  12   4  3  2  2  2
  13   4  3  2  2  2
  • Rows 1-2, 6-7, 9-10 and 11-13 are identical decision rules, so only one representative of each group is kept.

  19. After removing duplicate rules (Table 7)
   U   a  c  d  e  f
   1   3  2  2  2  4
   2   3  2  1  2  4
  ----------------------------------
   3   2  2  1  1  4
   4   2  2  2  1  4
  ----------------------------------
   5   3  2  3  2  3
   6   4  2  3  2  3
  ----------------------------------
   7   4  3  3  2  2
   8   4  3  2  2  2

  20. Substitute decisions
  • In this decision table there are four kinds of possible decisions, specified by the following pairs of values of the decision attributes e and f:
  • (e2, f4) → I
  • (e1, f4) → II
  • (e2, f3) → III
  • (e2, f2) → IV
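  A small sketch of this substitution in Python, assuming Table 7 is stored as a mapping from row number to ((a, c, d), (e, f)):

```python
# Replace each (e, f) pair of decision values by a single class label I-IV.
DECISION_CLASS = {(2, 4): "I", (1, 4): "II", (2, 3): "III", (2, 2): "IV"}

# Table 7: row -> ((a, c, d), (e, f)).
TABLE7 = {
    1: ((3, 2, 2), (2, 4)), 2: ((3, 2, 1), (2, 4)),
    3: ((2, 2, 1), (1, 4)), 4: ((2, 2, 2), (1, 4)),
    5: ((3, 2, 3), (2, 3)), 6: ((4, 2, 3), (2, 3)),
    7: ((4, 3, 3), (2, 2)), 8: ((4, 3, 2), (2, 2)),
}

TABLE8 = {u: (cond, DECISION_CLASS[dec]) for u, (cond, dec) in TABLE7.items()}
print(TABLE8[5])   # ((3, 2, 3), 'III')
```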

  21. After substituting decisions (Table 8)
   U   a  c  d  decision
   1   3  2  2  I
   2   3  2  1  I
  ----------------------------------
   3   2  2  1  II
   4   2  2  2  II
  ----------------------------------
   5   3  2  3  III
   6   4  2  3  III
  ----------------------------------
   7   4  3  3  IV
   8   4  3  2  IV

  22. Removing superfluous values
  • Now we remove superfluous values of condition attributes from the table.
  • For this purpose we have to compute which attribute values are dispensable or indispensable with respect to each decision class, and find the core values and reduct values for each decision rule. That means we are looking only for those attribute values which are necessary to distinguish all decision classes, i.e. which preserve the consistency of the table.

  23. Calculating the core for rule 1
  • Let us compute the core values and reduct values for the first decision rule
  a3c2d2 → e2f4 (rule 1) in Table 8.
  • The values a3 and d2 are indispensable in this rule, since the following pairs of rules are inconsistent:
  (i) c2d2 → e2f4 (rule 1) and c2d2 → e1f4 (rule 4)
  (ii) a3c2 → e2f4 (rule 1) and a3c2 → e2f3 (rule 5)
  whereas the attribute value c2 is dispensable, since the decision rule a3d2 → e2f4 is consistent. Thus a3 and d2 are the core values of the decision rule a3c2d2 → e2f4.

  24. Computing the core using Proposition 7.1
  • To this end we have to check whether the following inclusions
  |c2d2| ⊆ |e2f4|, |a3d2| ⊆ |e2f4| and |a3c2| ⊆ |e2f4|
  are valid or not. Because we have
  • |c2d2| = {1, 4},
  • |a3c2| = {1, 2, 5},
  • |a3d2| = {1} and
  • |e2f4| = {1, 2},
  only the decision rule a3d2 → e2f4 is true, and consequently the core values of the first decision rule are a3 and d2.
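  The inclusion test can be reproduced with a few lines of Python (a sketch that uses the class labels I-IV from slide 20 in place of the decision-value pairs; matching and decision_class are our own helper names):

```python
# Table 8: row -> ((a, c, d), class). |X| below is the set of rows matching
# the given attribute values; a reduced rule is true iff |condition| <= |class|.
TABLE8 = {
    1: ((3, 2, 2), "I"),   2: ((3, 2, 1), "I"),
    3: ((2, 2, 1), "II"),  4: ((2, 2, 2), "II"),
    5: ((3, 2, 3), "III"), 6: ((4, 2, 3), "III"),
    7: ((4, 3, 3), "IV"),  8: ((4, 3, 2), "IV"),
}
ATTR = {"a": 0, "c": 1, "d": 2}

def matching(**values):
    """Rows whose condition part matches all the given attribute values."""
    return {u for u, (cond, _) in TABLE8.items()
            if all(cond[ATTR[k]] == v for k, v in values.items())}

def decision_class(label):
    return {u for u, (_, cls) in TABLE8.items() if cls == label}

print(matching(c=2, d=2))                          # {1, 4}
print(matching(a=3, c=2))                          # {1, 2, 5}
print(matching(a=3, d=2))                          # {1}
print(matching(a=3, d=2) <= decision_class("I"))   # True: a3d2 -> I is a true rule
```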

  25. Core values (Table 9)
   U   a  c  d  decision
   1   3  -  2  I
   2   3  -  1  I
  ----------------------------------
   3   2  -  -  II
   4   2  -  -  II
  ----------------------------------
   5   -  -  3  III
   6   -  2  -  III
  ----------------------------------
   7   -  3  -  IV
   8   -  -  -  IV

  26. Reducts for classes I and II
  • It can easily be seen that in decision classes I and II the sets of core values of each decision rule are also reducts, because the rules
  a3d2 → e2f4
  a3d1 → e2f4
  a2 → e1f4
  are true.

  27. Reducts for classes III and IV
  • For decision classes III and IV, however, the core values do not form value reducts. For example, the decision rules
  d3 → e2f3 (rule 5)
  d3 → e2f2 (rule 7)
  are inconsistent, and so are the decision rules
  c2 → e2f3 (rule 6)
  c2 → e1f4 (rule 4)
  hence, according to the definition, they do not form reducts.

  28. Reduct values (Table 10)
   U    a  c  d  decision
   1    3  X  2  I
   2    3  X  1  I
  ----------------------------------
   3    2  X  X  II
   4    2  X  X  II
  ----------------------------------
   5    X  2  3  III
   5'   3  X  3  III
   6    4  2  X  III
   6'   X  2  3  III
  ----------------------------------
   7    X  3  X  IV
   8    4  3  X  IV
   8'   X  3  2  IV
   8''  4  X  2  IV
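  A brute-force sketch (our own illustration, not the procedure used in the text) of how value reducts such as those in Table 10 can be enumerated: for each rule, keep every minimal subset of its condition values whose matching rows all fall inside the rule's decision class. Since it keeps only minimal subsets, its output for class IV is a subset of the rows listed in Table 10.

```python
from itertools import combinations

# Table 8: row -> ({attribute: value}, class)
TABLE8 = {
    1: ({"a": 3, "c": 2, "d": 2}, "I"),   2: ({"a": 3, "c": 2, "d": 1}, "I"),
    3: ({"a": 2, "c": 2, "d": 1}, "II"),  4: ({"a": 2, "c": 2, "d": 2}, "II"),
    5: ({"a": 3, "c": 2, "d": 3}, "III"), 6: ({"a": 4, "c": 2, "d": 3}, "III"),
    7: ({"a": 4, "c": 3, "d": 3}, "IV"),  8: ({"a": 4, "c": 3, "d": 2}, "IV"),
}

def value_reducts(rule_no):
    """Minimal value subsets of the rule's condition that keep the rule true."""
    cond, cls = TABLE8[rule_no]
    target = {u for u, (_, c) in TABLE8.items() if c == cls}
    reducts = []
    for size in range(1, len(cond) + 1):
        for names in combinations(sorted(cond), size):
            sub = {n: cond[n] for n in names}
            rows = {u for u, (c, _) in TABLE8.items()
                    if all(c[n] == v for n, v in sub.items())}
            if rows <= target and not any(r.items() <= sub.items() for r in reducts):
                reducts.append(sub)
    return reducts

print(value_reducts(5))  # [{'a': 3, 'd': 3}, {'c': 2, 'd': 3}] -> rows 5' and 5
print(value_reducts(8))  # [{'c': 3}, {'a': 4, 'd': 2}]
```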

  29. Minimal solutions
  • It is easy to see that there are no superfluous decision rules in classes I and II. For decision class III we have two minimal solutions:
  c2d3 → e2f3
  and
  a4c2 → e2f3
  a3d3 → e2f3
  and for class IV we have one minimal solution:
  c3 → e2f2

  30. Minimal algorithms
  • Hence we have the following two minimal decision algorithms:
  a3d2 → e2f4
  a3d1 → e2f4
  a2 → e1f4
  c2d3 → e2f3
  c3 → e2f2
  and
  a3d2 → e2f4
  a3d1 → e2f4
  a2 → e1f4
  a3d3 → e2f3
  a4c2 → e2f3
  c3 → e2f2

  31. Combined forms
  • The combined forms of these algorithms are
  a3d1 ∨ a3d2 → e2f4
  a2 → e1f4
  c2d3 → e2f3
  c3 → e2f2
  and
  a3d1 ∨ a3d2 → e2f4
  a2 → e1f4
  a3d3 ∨ a4c2 → e2f3
  c3 → e2f2
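  As a usage sketch, the first combined algorithm can be coded directly as a rule list (our own illustration; the state dictionary keys follow the attribute letters used above):

```python
# First combined minimal algorithm: conditions over a (BZT), c (CG), d (KIC);
# decisions are the (e, f) settings for kiln and coal-worm revolutions.
RULES = [
    (lambda s: s["a"] == 3 and s["d"] in (1, 2), {"e": 2, "f": 4}),  # a3d1 v a3d2 -> e2f4
    (lambda s: s["a"] == 2,                      {"e": 1, "f": 4}),  # a2 -> e1f4
    (lambda s: s["c"] == 2 and s["d"] == 3,      {"e": 2, "f": 3}),  # c2d3 -> e2f3
    (lambda s: s["c"] == 3,                      {"e": 2, "f": 2}),  # c3 -> e2f2
]

def decide(state):
    """Return the decision of the first rule whose condition matches the state."""
    for condition, action in RULES:
        if condition(state):
            return action
    return None  # state not covered by the algorithm

print(decide({"a": 3, "c": 2, "d": 2}))  # {'e': 2, 'f': 4}
print(decide({"a": 4, "c": 3, "d": 3}))  # {'e': 2, 'f': 2}
```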

  32. Another approach
  • Example of cement kiln control (cf. Sandness (1986)), in which the actions of the stoker are based not on the kiln state but on the quality of the cement produced.

  33. Described by the following attributes
  • a - Granularity
  • b - Viscosity
  • c - Color
  • d - pH level
  which are assumed to be condition attributes. Again there are two decision (action) attributes:
  • e - Rotation Speed
  • f - Temperature

  34. Interesting note
  • This table is not obtained from observation of the stoker's actions and does not represent the stoker's knowledge.
  • Instead, it contains the prescription which the stoker should follow in order to produce cement of the required quality.

  35. Dispensable attribute
  • We find that attribute b is again dispensable with respect to the decision attributes,
  • which means that viscosity is a superfluous condition that can be dropped without affecting the decision procedure.

  36. After re-numeration the decision rules can be simplified.

  37. Compute the core values.

  38. The case of inconsistent data
  • When a decision table is the result of observations or measurements, it may happen that the table is inconsistent:
  • some observed or measured data can be conflicting.
  • This leads to only a partial dependency between the decision and condition attributes.
  • Although we are mostly interested in consistent data, inconsistent data can sometimes also be of interest.


  40. Condition and decision attributes
  • Condition attributes:
  • a - temperature
  • b - dry cough
  • c - headache
  • d - muscle pain
  • Decision attribute:
  • e - influenza

  41. Decision rules 4 and 5 are inconsistent
  • Rule 4: if (temperature, subfeb) and (dry cough, present) and (muscle pain, absent) then (influenza, absent)
  • Rule 5: if (temperature, subfeb) and (dry cough, present) and (muscle pain, absent) then (influenza, present)
  • The same holds for rules 7 and 8. The remaining five decision rules are true.

  42. Dependency
  • So the dependency between the decision and condition attributes is 5/9.
  • This means the condition attributes are not sufficient to decide whether every patient has influenza or not.
  • Using the consistent decision rules, however, we can still classify the patients they cover.
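  The 5/9 figure is the degree of dependency: (number of rows whose condition values determine the decision uniquely) / (total number of rows). Since the influenza table itself appears only as an image in the slides, the sketch below demonstrates the computation on Table 3 of the kiln example, where it gives 9/13; applied to the influenza table it would give the 5/9 quoted here.

```python
from collections import defaultdict
from fractions import Fraction

# Table 3: row -> ((b, c, d), (e, f))
TABLE3 = {
    1: ((3, 2, 2), (2, 4)),  2: ((2, 2, 2), (2, 4)),  3: ((2, 2, 1), (2, 4)),
    4: ((2, 2, 1), (1, 4)),  5: ((2, 2, 2), (1, 4)),  6: ((2, 2, 3), (2, 3)),
    7: ((3, 2, 3), (2, 3)),  8: ((3, 2, 3), (2, 3)),  9: ((3, 3, 3), (2, 2)),
    10: ((4, 3, 3), (2, 2)), 11: ((4, 3, 2), (2, 2)), 12: ((3, 3, 2), (2, 2)),
    13: ((2, 3, 2), (2, 2)),
}

def dependency_degree(table):
    """Fraction of rows whose condition values map to exactly one decision."""
    decisions = defaultdict(set)
    for cond, dec in table.values():
        decisions[cond].add(dec)
    consistent = sum(1 for cond, _ in table.values() if len(decisions[cond]) == 1)
    return Fraction(consistent, len(table))

print(dependency_degree(TABLE3))  # 9/13: rows 2, 3, 4, 5 are conflicting
```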

  43. Decompose the decision table
  • The table can be decomposed into a consistent and an inconsistent part.
  • The inconsistent part consists of rules 4, 5, 7 and 8.
  • The remaining rules form the consistent part.
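  A sketch of this decomposition (split_table is our own helper; the demo rows reuse the condition values of rules 4, 5 and 9 as quoted on the neighbouring slides, with headache omitted because its values are not shown in the transcript):

```python
from collections import defaultdict

def split_table(table):
    """Split rows into a consistent and an inconsistent part."""
    decisions = defaultdict(set)
    for cond, dec in table.values():
        decisions[cond].add(dec)
    consistent = {u: r for u, r in table.items() if len(decisions[r[0]]) == 1}
    inconsistent = {u: r for u, r in table.items() if len(decisions[r[0]]) > 1}
    return consistent, inconsistent

# Conditions are (temperature, dry cough, muscle pain); decision is influenza.
demo = {
    4: (("subfeb", "present", "absent"), ("absent",)),
    5: (("subfeb", "present", "absent"), ("present",)),
    9: (("high", "present", "present"), ("present",)),
}
good, bad = split_table(demo)
print(sorted(bad))   # [4, 5] -> the conflicting pair
print(sorted(good))  # [9]
```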

  44. Consistent rules in the table
  • Rule 1: if (temperature, normal) and (dry cough, absent) and (muscle pain, absent) then (influenza, absent)
  • Rule 2: if (temperature, normal) and (dry cough, absent) and (muscle pain, present) then (influenza, absent)

  45. Consistent rules in the table
  • Rule 3: if (temperature, subfeb) and (dry cough, absent) and (muscle pain, present) then (influenza, present)
  • Rule 6: if (temperature, high) and (dry cough, absent) and (muscle pain, absent) then (influenza, absent)

  46. Consistent rules in the table
  • Rule 9: if (temperature, high) and (dry cough, present) and (muscle pain, present) then (influenza, present)
  • Next we have to compute the core of the condition attributes.

  47. Shorten the decision and condition attributes using tabular notation.

  48. Note that attributes c and d are equivalent, so we can drop one of them.

  49. Compute the core of the attributes
  • Removing attribute a, rules 2 and 3 become inconsistent, which would change the consistent part of the decision algorithm; so a is indispensable.
