610 likes | 700 Views
Ch.9 Data Analysis. - Sumit Ghosh Saurabh Vishal. Definition. Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Rotary Clinker kiln.
E N D
Ch.9 Data Analysis -SumitGhoshSaurabhVishal
Definition • Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Chap 9. Data Analysis
Rotary Clinker kiln Chap 9. Data Analysis
Rotary Clinker kiln • The aim of the stoker is to keep the kiln in a "proper" state. • Kiln revolutions(KR) • Coal worm revolutions(CWR) • Burning zone temperature (BZT) • Burning zone color (BZC) • Clinker granulation(CG) • Kiln inside color(KIC) Chap 9. Data Analysis
Attributes • The condition attributes: • a - burning zone temperature(BZT) • b - burning zone color(BZC) • c - clinker granulation(CG) • d - kiln inside color(KIC) • The decision attributes: • e - kiln revolutions(KR) • f - coal worm revolutions(CWR) Chap 9. Data Analysis
Domain of Attributes • Burning zone temperature(BZT) • 1 - (1380-1420 C) • 2 - (1421-1440 C) • 3 - (1441-1480 C) • 4 - (1481-1500 C) • Burning Zone Color(BZC) • 1 - scarlet • 2 - dark pink • 3 - bright pink • 4 - very bright pink • 5 - rose white • Clinker Granulation(CG) • 1 - fines • 2 - fines with small lumps • 3 - granulation • 4 - lumps • Kiln Inside Color(KIC) • 1 - dark streaks • 2 - indistinct dark streaks • 3 - lack of dark streaks • Kiln Revolutions(KR) • 1 - 0,9 rpm • 2 - 1,22 rpm • Coal Worm Revolutions(CWR) • 1 - 0 rpm • 2 - 15 rpm • 3 - 30 rpm • 4 - 40 rpm Chap 9. Data Analysis
Stroker‘s Observation (table 1) • TIME BZT BZC CG KIC KR CWR • a b c d e f • 1 3 2 2 2 2 4 • 2 3 2 2 1 2 4 • 3 3 2 2 1 2 4 • 4 2 2 2 1 1 4 • 5 2 2 2 2 1 4 • 6 2 2 2 1 1 4 • 7 2 2 2 1 1 4 • 8 2 2 2 1 1 4 • 9 2 2 2 2 1 4 • 10 2 2 2 2 1 4 • 11 2 2 2 2 1 4 • 12 3 2 2 2 2 4 • 13 3 2 2 2 2 4 • 14 3 2 2 3 2 3 • 15 3 2 2 3 2 3 • 16 3 2 2 3 2 3 • 17 3 3 2 3 2 3 • 18 3 3 2 3 2 3 • 19 3 3 2 3 2 3 • 20 4 3 2 3 2 3 • 21 4 3 2 3 2 3 • 22 4 3 2 3 2 3 • 23 4 3 3 3 2 3 • 24 4 3 3 3 2 2 • 25 4 3 3 3 2 2 • 26 4 4 3 3 2 2 a b c d e f • 27 4 4 3 3 2 2 • 28 4 4 3 3 2 2 • 29 4 4 3 3 2 2 • 30 4 4 3 3 2 2 • 31 4 4 3 2 2 2 • 32 4 4 3 2 2 2 • 33 4 3 3 2 2 2 • 34 4 3 3 2 2 2 • 35 4 3 3 2 2 2 • 36 4 2 3 2 2 2 • 37 4 2 3 2 2 2 • 38 3 2 2 2 2 4 • 39 3 2 2 2 2 4 • 40 3 2 2 2 2 4 • 41 3 3 2 2 2 4 • 42 3 3 2 2 2 4 • 43 3 3 2 2 2 4 • 44 3 3 2 3 2 3 • 45 3 3 2 3 2 3 • 46 4 3 2 3 2 3 • 47 4 3 2 3 2 3 • 48 4 3 2 3 2 2 • 49 4 3 3 3 2 2 • 50 4 4 3 3 2 2 • 51 4 4 3 2 2 2 • 52 4 4 3 3 2 2 Chap 9. Data Analysis
After elemination of identical rows (table2) • U a b c d e f • 13 3 2 2 2 4 • 2 3 2 2 2 2 4 • 3 3 2 2 1 2 4 -------------------------------------- • 4 2 2 2 1 1 4 • 5 2 2 2 2 1 4 -------------------------------------- • 63 2 2 3 2 3 • 73 3 2 3 2 3 • 84 3 2 3 2 3 -------------------------------------- • 94 3 3 3 2 2 • 104 4 3 3 2 2 • 114 4 3 2 2 2 • 124 3 3 2 2 2 • 134 2 3 2 2 2 Chap 9. Data Analysis
Removing attribute a (table 3) • U b c d e f • 1 3 2 2 2 4 • 2 2 2 2 2 4 • 3 2 2 1 2 4 -------------------------------------- • 4 2 2 1 1 4 • 5 2 2 2 1 4 -------------------------------------- • 6 2 2 3 2 3 • 7 3 2 3 2 3 • 8 3 2 3 2 3 -------------------------------------- • 9 3 3 3 2 2 • 10 4 3 3 2 2 • 11 4 3 2 2 2 • 12 3 3 2 2 2 • 13 2 3 2 2 2 Chap 9. Data Analysis
Inconsistant (table 3) • Table 3 is inconsistent because the following pairs of decision rules • (i)b2c2d1 →e2f4(rule3) b2c2d1 → e1f4(rule4) • (ii)b2c2d2 → e2f4(rule2) b2c2d2 → e1f4(rule5) • are inconsistent. Chap 9. Data Analysis
Removing attribute b (table 4) • U a c d e f • 13 2 2 2 4 • 2 3 2 2 2 4 • 3 3 2 1 2 4 -------------------------------------- • 4 2 2 1 1 4 • 5 2 2 2 1 4 -------------------------------------- • 63 2 3 2 3 • 73 2 3 2 3 • 84 2 3 2 3 -------------------------------------- • 94 3 3 2 2 • 104 3 3 2 2 • 114 3 2 2 2 • 124 3 2 2 2 • 134 3 2 2 2 • It is easily seen that all decision rules in the table are consistent, hence the attribute b is superfluous. Chap 9. Data Analysis
Removing attribute c (table 5) • U a b d e f • 13 3 2 2 4 • 2 3 2 2 2 4 • 3 3 2 1 2 4 -------------------------------------- • 4 2 2 1 1 4 • 5 2 2 2 1 4 -------------------------------------- • 63 2 3 2 3 • 73 3 3 2 3 • 84 3 3 2 3 -------------------------------------- • 94 3 3 2 2 • 104 4 3 2 2 • 114 4 2 2 2 • 124 3 2 2 2 • 134 2 2 2 2 Chap 9. Data Analysis
Inconsistant table 5 • Table 5 is inconsistent because the following pairs of decision rules • a4b3d3 → e2f3(rule8) • a4b3d3 → e2f2(rule9) • are inconsistent. Chap 9. Data Analysis
Removing attribute d (table 6) • U a b c e f • 13 3 2 2 4 • 2 3 2 2 2 4 • 3 3 2 2 2 4 -------------------------------------- • 4 2 2 2 1 4 • 5 2 2 2 1 4 -------------------------------------- • 63 2 2 2 3 • 73 3 2 2 3 • 84 3 2 2 3 -------------------------------------- • 94 3 3 2 2 • 104 4 3 2 2 • 114 4 3 2 2 • 124 3 3 2 2 • 134 2 3 2 2 Chap 9. Data Analysis
Inconsistent table 6 • Table 6 is inconsistent because the following pairs of decision rules • (i)a3b3c2 → e2f4(rule1) a3b3c2 → e2f3(rule7) • (ii)a3b2c2 → e2f4(rule3) a3b2c2 → e2f3(rule6) • are inconsistent. Chap 9. Data Analysis
Result • Thus without one of the attributes a,c or d Table 2 becomes inconsistent, and without the attribute b the table remains consistent. Attribute b can be dropped from the table. • We recall that, if there are two or more identical decision rules in a table we should drop all but one, arbitrary representative. Chap 9. Data Analysis
After removing attribute b (table 4) • U a c d e f • 13 2 2 2 4 • 2 3 2 2 2 4 • 3 3 2 1 2 4 -------------------------------------- • 4 2 2 1 1 4 • 5 2 2 2 1 4 -------------------------------------- • 63 2 3 2 3 • 73 2 3 2 3 • 84 2 3 2 3 -------------------------------------- • 94 3 3 2 2 • 104 3 3 2 2 • 114 3 2 2 2 • 124 3 2 2 2 • 134 3 2 2 2 Chap 9. Data Analysis
Check duplicate rows • U a c d e f • 1 3 2 2 2 4 • 2 3 2 2 2 4 • 3 3 2 1 2 4 -------------------------------------- • 4 2 2 1 1 4 • 5 2 2 2 1 4 -------------------------------------- • 6 3 2 3 2 3 • 7 3 2 3 2 3 • 84 2 3 2 3 -------------------------------------- • 9 4 3 3 2 2 • 104 3 3 2 2 • 114 3 2 2 2 • 124 3 2 2 2 • 134 3 2 2 2 Chap 9. Data Analysis
After removing duplicate rules (table 7) • U a c d e f • 13 2 2 2 4 • 2 3 2 1 2 4 ---------------------------------- • 3 2 2 1 1 4 • 4 2 2 2 1 4 ---------------------------------- • 53 2 3 2 3 • 64 2 3 2 3 ---------------------------------- • 74 3 3 2 2 • 8 4 3 2 2 2 Chap 9. Data Analysis
Substitute decisions • In this decision table there are four kinds of possible decision, which are specified by the following pairs of values of decision attributes e and f : • (e2,f4)→ I, • (e1,f4) → II, • (e2,f3) → III and • (e2,f2) → IV Chap 9. Data Analysis
After substituting decision (table 8) • U a c d e f • 13 2 2 I • 2 3 2 1 ---------------------------------- • 3 2 2 1 II • 4 2 2 2 ---------------------------------- • 53 2 3 III • 64 2 3 ---------------------------------- • 74 3 3 IV • 8 4 3 2 Chap 9. Data Analysis
Removing superfluous values • Now removing superfluous values of condition attributes from the table. • For this purpose we have to compute which attribute values are dispensable or indispensable with respect to each decision class and find out more core values and reductvalues for each decision rule. That means we are looking only for those attributes values which are necessary to distinguish all decision classes, i.e. preserving consistency of the table. Chap 9. Data Analysis
Calculating core for rule 1 • Let us compute core values and reduct values for the first decision rule • a3c2d2 → e2f4 (rule1) in Table 8 • Values a and d are indispensable in the rule, since the following pairs of rules are inconsistent. • (i)c2d2 → e2f4(rule1) c2d2 → e1f4(rule4) • (ii)a3c2 → e2f4(rule1) a3c2 → e1f3(rule5) • whereas the attribute value c2 is dispensable, since the decision rule a3d2 → e2f4 is consistent. Thus a3and d2 are core values of the decision value a3c2d2 → e2f4. Chap 9. Data Analysis
Compute core using proposition 7.1. • To this end we have to check whether the following inclusions • |c2d2| ⊆ |e2f4| ; |a3d2| ⊆ |e2f4| and |a3c2| ⊆ |e2f3| are valid or not. Because we have • |c2d2| = {1, 4}, • |a3c2| = {1, 2, 5}, • |a3d2| = {1} and • |e2f4| = {1, 2}, • hence only the decision rule a3d2 → e2f4 is true, and consequently the core values of the first decision rule are a3 and d2. Chap 9. Data Analysis
Core values (table 9) • U a c d e f • 13 - 2 I • 2 3 - 1 ---------------------------------- • 3 2 - - II • 4 2 - - ---------------------------------- • 5- - 3 III • 6- 2 - ---------------------------------- • 7 - 3 - IV • 8 - - - Chap 9. Data Analysis
Reduct for I and II • It can be easily seen that in the decision classes I and II sets of core values of each decision rule are also reducts, because rules • a3d2 → e2f4 • a3d1 → e2f4 • a2 → e1f4 • are true. Chap 9. Data Analysis
for III and IV • For the decision classes III and IV however core values do not form value reducts. For example decision rules • d3 → e2f3(rule5) • d3 → e2f2(rule7) • are inconsistent, and so are decision rules • c2 → e2f3(rule6) • c2 → e1f4(rule4) • hence, according to the definition, they do not form reducts. Chap 9. Data Analysis
Reduct values (table 10) • U a c d e f • 1 3 X 2 I • 2 3 X 1 ---------------------------------- • 3 2 X X II • 4 2 X X ---------------------------------- • 5 X 2 3 III • 5’ 3 X 3 • 6 4 2 X • 6’ X 2 3 ---------------------------------- • 7 X 3 X IV • 8 4 3 X • 8’ X 3 2 • 8’’ 4 X 2 Chap 9. Data Analysis
Minimal solution • It is easy to see that there are not superfluous decision rules in class I and II. For decision class III we have two minimal solutions • c2d3 → e2f3 • and • a4c2 → e2f3 • a3d3 → e2f3 • and for class IV we have one minimal solution • c3 → e2f2 Chap 9. Data Analysis
Minimal algorithm • hence we have following two decision minimal algorithms • a3d2 → e2f4 • a3d1 → e2f4 • a2 → e1f4 • c2d3 → e2f3 • c3 → e2f2 • and • a3d2 → e2f4 • a3d1 → e2f4 • a2 → e1f4 • a3d3 → E2f3 • a4c2 → e2f3 • c3 → e2f2 Chap 9. Data Analysis
Combined forms • The combined forms of these algorithms are • a3d1 V a3d2 → e2f4 • a2 → e1f4 • c2d3 → e2f3 • c3 → e2f2 • and • a3d1 V a3d2 → e2f4 • a2 → e1f4 • a3d3 V a4c2 → e2f3 • c3 → e2f2 Chap 9. Data Analysis
Another Approach • Example of cement kiln control (cf, Sandness (1986)) • In which actions of a stoker are based not on the kiln state but on the quality of the cement produced Chap 9. Data Analysis
Described by the followingattributes • a - Granularity • b - Viscosity • c - Color • d - pH level which are assumed to be condition attributes Again there are two decision(action) attributes • e - Rotation Speed • f - Temperature Chap 9. Data Analysis
Interesting Note • The table is not obtained as a result of the stoker's actions observation, and does not represent the stoker's knowledge • But it contains the prescription which the stoker should follow in order to produce cement of required quality. Chap 9. Data Analysis
Dispensable Attribute • we find out that attribute b is again dispensable with respect to the decision attributes, • which means that the viscosity is a superfluous condition, which can be dropped without affecting the decision procedure. Chap 9. Data Analysis
Re-numeration of decision rules can be simplified Chap 9. Data Analysis
Compute the Core values Chap 9. Data Analysis
The case of Inconsistent Data • when a decision table is the result of observations or measurements • It may happen that the table is inconsistent • Some observed or measured data can be conflicting. • This finally leads to partial dependency of decision and condition attributes • But we are more interested in consistent data some times inconsistent data could also be interested Chap 9. Data Analysis
Condition and Decision attributes • Condition attributes • a – temperature • b – Dry-cough • c – headache • d – Muscle pain • Decision attributes • e - influenza Chap 9. Data Analysis
Decision rule 4 and 5 is inconsistent • Rule 4- if (temperature,subfeb) and (dry cough,present) and (muscle pain,absnet) then (influenza,absent) • Rule 5 -if (temperature,subfeb) and (dry cough,present) and (muscle pain,absnet) then (influenza,present) Similar with rule 7 and rule 8. remaining 5 decision rules are true, Chap 9. Data Analysis
Dependency • So the dependency between decision and condition attributes is • 5/9 • This means the condition attributes are not sufficient to decide whether a patient has influenza or not. • But in consistent decision rule we can classify patient having influenza Chap 9. Data Analysis
Decompose the decision table • Into Consistent and Inconsistent tables • Inconsistent consists of rule 4,5,7 and 8 • Rest of the rules consist of consistent parts Chap 9. Data Analysis
Consistent rules in the Table • Rule 1- if (temperature,normal) and (dry cough,absent) and (muscle pain,absnet) then (influenza,absent) • Rule 2 -if (temperature,normal) and (dry cough,absent) and (muscle pain,present) then (influenza,absent) Chap 9. Data Analysis
Consistent rules in the Table • Rule 3- if (temperature,subfeb) and (dry cough,absent) and (muscle pain,present) then (influenza,present) • Rule 6 -if (temperature,high) and (dry cough, absent) and (muscle pain,absent) then (influenza, absent) Chap 9. Data Analysis
Consistent rules in the Table • Rule 9 –if (temperature,high) and (dry cough, present) and (muscle pain,present) then (influenza, present) • We have to compute the core of the condition attributes Chap 9. Data Analysis
Shorten decision and condition attributes using tabular notation Chap 9. Data Analysis
Note attribute c and d are equivalent, we can drop one of them Chap 9. Data Analysis
Compute Core of attributes removing ‘a’ Rule 2 & rule 3 are inconsistent, which will change the consistent rules of decision algorithm. so ‘a ‘is indispensable Chap 9. Data Analysis