Industrial Diagnosis by Hyper Space Data Mining. Presented at AAAI 99 Spring Symposium on Equipment Diagnosis, Stanford University, March 23, 1999.
Industrial Diagnosis by Hyper Space Data Mining
Presented at AAAI 99 Spring Symposium on Equipment Diagnosis
Stanford University, March 23, 1999
Dr. Dongping (Daniel) Zhu
Zaptron Systems, Inc.
Mountain View, CA 94043
Tel: 650-966-8700, Fax: 650-966-8780
E-mail: zhu@zaptron.com
http://www.zaptron.com
Zaptron, 1999
OUTLINE
• Diagnosis overview: applications & technologies
• Hyperspace data mining
• Diagnostic examples
  • product quality control (steel making)
  • resolving a bottleneck (gasoline production)
  • improving yield (chemical plant)
• Conclusions
• MasterMiner™ demo
Diagnosis & Trouble-Shooting
• Cost of support for products/services
• Customer satisfaction
• Key issues
  • how best to approach the same problem next time
  • how to use historical information: data mining
  • how to update the KB
• Solutions
  • on-line help
  • web-based, remote diagnostics
  • knowledge management tools
  • data mining (historical data are available)
A Web-based Diagnostic System
[Diagram: data mining (D) and knowledge updating (K) combined as KD (D+K); components: call centers, service teams, support teams, data collecting mechanisms, standardization, data management, product delivery mechanisms, training tools, web-based diagnosis, on-line help software, remote repairs, factor analysis, KB management]
Rule-based Diagnostic Process
[Diagram components: history database, fault physics, primary cases, cause analysis, fix fault, diagnose, rule base, diagnostic matrix, self learning, query, new data & cases, update database]
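The slide names a diagnostic matrix and a self-learning loop fed by new cases, but it does not give the scoring rule. The minimal Python sketch below assumes a simple design. The matrix counts how often each symptom accompanied each confirmed fault, and the faults are ranked with a naive-Bayes score. The class name, the fault names, and the symptom names are all hypothetical illustrations, not the author's system.

```python
import numpy as np


class DiagnosticMatrix:
    """Hypothetical fault-by-symptom matrix learned from confirmed history cases."""

    def __init__(self, faults, symptoms):
        self.faults = list(faults)
        self.symptoms = list(symptoms)
        # hits[i, j]: confirmed cases of fault i in which symptom j was present
        self.hits = np.zeros((len(self.faults), len(self.symptoms)))
        # cases[i]: number of confirmed cases of fault i
        self.cases = np.zeros(len(self.faults))

    def learn(self, fault, observed):
        """Self-learning step: record one confirmed (fault, observed symptoms) case."""
        i = self.faults.index(fault)
        self.cases[i] += 1
        for s in observed:
            self.hits[i, self.symptoms.index(s)] += 1

    def diagnose(self, observed):
        """Rank faults by naive-Bayes log-likelihood of the observed symptom set."""
        # Laplace-smoothed estimate of P(symptom j present | fault i), always inside (0, 1)
        p = (self.hits + 1) / (self.cases[:, None] + 2)
        x = np.array([s in observed for s in self.symptoms], dtype=float)
        loglik = np.log(p) @ x + np.log(1 - p) @ (1 - x)
        return [self.faults[i] for i in np.argsort(-loglik)]


# Hypothetical history: three confirmed cases per fault
dm = DiagnosticMatrix(
    faults=["worn bearing", "low oil", "sensor drift"],
    symptoms=["vibration", "high temp", "noisy signal"],
)
for _ in range(3):
    dm.learn("worn bearing", {"vibration", "high temp"})
    dm.learn("low oil", {"high temp"})
    dm.learn("sensor drift", {"noisy signal"})

print(dm.diagnose({"vibration", "high temp"}))  # ['worn bearing', 'low oil', 'sensor drift']
```

Each newly confirmed case passed to `learn` updates the counts, so later diagnoses reflect the growing case base. This matches the "new data & cases" feedback loop in the diagram only in spirit.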
Expert System Architecture
[Diagram components: database {a, b}; KB {M_ij}; web users; interviewer (f_i, h_j); web GUI; K collector (a_ijl, b_ikl); analyzer, visualizer; KB builder (M_ijk); problem solver (search engine); self learner (r_ijk)]
Evolution of Diagnostic Techniques
• Equipment and processes
• Sensors
• Data
• Databases
• Data models
• Data patterns (behavior in space)
• Data fusion, sensor fusion
• Data mining
• Data ……
Data Mining: Techniques
• Correlation/association analysis
• Factor analysis
• Trend prediction & forecasting
• Neural networks
• Genetic algorithms
• Fuzzy logic, expert systems
• Uncertainty reasoning (Dempster-Shafer (DS), rough sets)
• Bayesian networks
• Hyperspace data mining:
  • finds the data pattern first
  • makes no model assumption
  • provides solutions for failure isolation/recognition
Hyper Space Data Mining
• Introduction
• Diagnosis: an optimization problem
• A hyperspace technology
• Application examples
• Software: MasterMiner™
A General Issue
• For any system: find a model that describes the relationships, which are nonlinear, highly noisy, and multivariate (no model known)
• Data: operating data records, in situ sensor reports, raw material composition, design/operating process parameters
• Targets: failures & faults, bottlenecks, energy use, cost/risk, quality, yield/returns, reliability, productivity
A Catch-22 Problem
Data pattern <--?--> data model
Questions:
• what type of data to collect
• which data to use in modeling
Solution:
• Hyperspace data mining
To Start - A Real Case
Aluminum production problem
Target: optimize the leaching rate of Al2O3
Factors:
• a1: Fe/Al ratio in the ore
• a2: sodium ratio Na/(Al2O3 + Fe2O3)
• a3: leaching temperature
• a4: lime ratio (CaO)/(SiO2 - TiO2)
Two solutions:
• Principal component analysis (PCA) by SAS JMP or RS/1: bad result
• Hyperspace data mining by Zaptron MasterMiner™: good result
Can you see the pattern?
• If not, use data mining to separate the data into subspaces
A Real Case - PCA Result: no separation
A Real Case - MasterMiner: good separation
MasterMiner, 2nd step: complete separation
A Real Case - MasterMiner: building a model
Steps in Data Mining
• History data
• Separability test
• Pretreatment: local view, delete outliers
• Data mining: linearity, topological type, correlation, association, best matching point, NN points
• Feature selection: feature reduction (entropy, voting)
• Modeling (PH, MREC, ANN, GA): inequalities, equations, PLS, sensitivity, advisory
• State diagnosis using current operating data
• Extrapolation to the optimal zone for max yield
• Proposing an optimal operating condition or new materials
• Equations as criteria for optimal control
• Map description of cross-sections of the normal operating zone & failure zones
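The slide does not define the separability test at the start of the pipeline. One common stand-in, assumed here, is leave-one-out nearest-neighbor agreement, which also draws on the "NN points" listed above. It measures the share of samples whose nearest other sample belongs to the same class. The function name and the synthetic data are hypothetical, and MasterMiner's own test may differ.

```python
import numpy as np


def nn_separability(X, y):
    """Leave-one-out 1-NN agreement: share of samples whose nearest neighbor has the same class."""
    # Standardize each factor so that no single unit of measure dominates the distance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Pairwise Euclidean distances between all samples
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not count as its own neighbor
    return float(np.mean(y[d.argmin(axis=1)] == y))


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(nn_separability(X, y))                   # 1.0 for these well-separated classes
print(nn_separability(X, rng.permutation(y)))  # roughly 0.5 once labels are shuffled
```

A score near 1 suggests the classes occupy distinct regions of factor space. A score near 0.5 (for two balanced classes) suggests the current factors carry little class information.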
Clustering - Data Separation
• PCA: projection onto the directions of maximum variance
• Fisher: projection onto the line that maximizes the distance between clusters relative to their spread
• MREC: projective geometry, better than either
[Diagram: database, data mining, data patterns: one-sided (voting), inclusive (entropy), exclusive, sandwich]
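MREC itself is not described in enough detail to reproduce. The contrast between PCA and Fisher projection drawn on this slide can still be seen on synthetic data. The NumPy sketch below builds two classes that differ only along a low-variance factor. PCA picks the high-variance factor and fails to separate them, while Fisher's discriminant finds the separating direction. The data, the function names, and the midpoint threshold are illustrative assumptions.

```python
import numpy as np


def pca_direction(X):
    """First principal axis: the direction of maximum total variance (class labels ignored)."""
    # np.linalg.eigh returns eigenvalues in ascending order, so the last column is the top axis
    _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, -1]


def fisher_direction(X, y):
    """Fisher's discriminant: maximizes between-class separation relative to within-class scatter."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)


def separation_rate(X, y, w):
    """Share of samples split correctly by a threshold midway between the projected class means."""
    z = X @ w
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    pred_class1 = (z > (m0 + m1) / 2) if m1 > m0 else (z < (m0 + m1) / 2)
    return float(np.mean(pred_class1 == (y == 1)))


rng = np.random.default_rng(0)
n = 100
# A large spread along factor 0 dominates the variance; the classes differ only along factor 1
X = np.column_stack([
    rng.uniform(-10, 10, 2 * n),
    np.r_[np.full(n, -1.0), np.full(n, 1.0)] + rng.normal(0, 0.3, 2 * n),
])
y = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]
print(separation_rate(X, y, pca_direction(X)))        # near 0.5: PCA follows factor 0
print(separation_rate(X, y, fisher_direction(X, y)))  # near 1.0
```

The point is that variance and class separability are different criteria. A direction with most of the variance can carry no class information at all.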
Software Architecture
[Diagram components: database, pattern recognition, GUI, knowledge base, artificial neural nets, genetic algorithm]
MasterMiner™ Functions
MasterMiner™ Tools
• Data loading, editing, sorting, calculation
• Preprocessing: statistics, feature selection, folding
• Factor analysis
  • target-factor analysis
  • factor-factor analysis
• Projections
  • Fisher, LMAP, PCA, PLS, MREC
• Modeling
  • envelope, auto-box, sphere, KL, ANN (training, estimation, sensitivity)
• Extrapolation
  • PLS vector (linear), simplex, appending
Virtual Mining Tools for Convex and Concave Spaces
• Virtual mining in hyperspace
  • hidden projection: tunnel model
  • envelope: generates a convex polyhedron
  • "auto-box" for concave polyhedrons of samples
  • interchange of data classes
  • folding transform (changes the data pattern in space)
• Virtual mining of data samples
  • divide into multiple segments
  • convert a concave polyhedron into convex ones
  • build a model for each subspace
  • separability went from 31% to 96% in one case
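The slide names the envelope and "auto-box" tools without describing them. The hypothetical sketch below illustrates why dividing samples into segments helps with a concave region. It uses axis-aligned bounding boxes as a simplified stand-in for convex polyhedra, applied to an L-shaped set of "good" samples. A single box around all samples wrongly covers the empty corner, while one box per segment along factor 0 excludes it. This is not MasterMiner's algorithm; the function names and data are illustrative.

```python
import numpy as np


def bounding_box(X):
    """Axis-aligned box (per-factor min and max) enclosing a set of samples."""
    return X.min(axis=0), X.max(axis=0)


def segmented_boxes(X, feature, k):
    """Sort samples along one factor, split them into k equal-count segments, box each segment."""
    order = np.argsort(X[:, feature])
    return [bounding_box(X[idx]) for idx in np.array_split(order, k)]


def inside_any(point, boxes):
    """True if the point lies inside at least one of the boxes."""
    p = np.asarray(point, dtype=float)
    return any(bool(np.all((lo <= p) & (p <= hi))) for lo, hi in boxes)


# Hypothetical "good" samples on an L-shaped (concave) region
vertical = [(x, y) for x in (0.0, 0.5, 1.0) for y in np.arange(0.0, 3.01, 0.5)]
horizontal = [(x, y) for x in (1.5, 2.0, 2.5, 3.0) for y in (0.0, 0.5, 1.0)]
good = np.array(vertical + horizontal)

one_box = [bounding_box(good)]
three_boxes = segmented_boxes(good, feature=0, k=3)
print(inside_any((2.0, 2.0), one_box))      # True: the single box covers the empty corner
print(inside_any((2.0, 2.0), three_boxes))  # False: the segmented boxes exclude it
```

Coarse segments still over-cover. Here the middle box also contains (1.4, 2.5), which lies outside the L, so finer segmentation or true convex hulls would tighten the approximation.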
Virtual Mining Methods
(a) Tunnel model to separate data samples in hyperspace
(b) The envelope-boxing method
(c) Generating convex polyhedrons from a concave one
Iterative Feature Selection/Reduction
• Data patterns are classified into 2 topological classes:
  • "one-sided class"
  • "inclusive class"
• Hidden projections are applied
  • projected factors are orthogonal in hyperspace
• Feature selection method (highly effective):
  • the entropy method is used for inclusive patterns
  • the voting method is used for one-sided patterns
• Reducing the number of features reduces noise & complexity
  • e.g., a good result based on 5 features out of 500
• The reduced feature set must pass the separation test
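The entropy method is named but not specified. A widely used entropy-based ranking, assumed here, is information gain. Each feature is cut into quantile bins, and features are ranked by how much knowing the bin reduces the entropy of the class label. The voting method for one-sided patterns is not sketched. The function names, bin count, and synthetic data are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, bins=4):
    """Entropy reduction when the labels are split by quantile bins of one feature."""
    edges = np.quantile(feature, np.linspace(0, 1, bins + 1)[1:-1])
    bucket = np.digitize(feature, edges)
    conditional = sum(
        np.mean(bucket == b) * entropy(labels[bucket == b]) for b in np.unique(bucket)
    )
    return entropy(labels) - conditional


def top_features(X, y, k, bins=4):
    """Indices of the k features with the highest information gain."""
    gains = np.array([information_gain(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k]


rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 3] + X[:, 7] > 0).astype(int)  # only features 3 and 7 carry class information
print(sorted(int(j) for j in top_features(X, y, k=2)))  # expected: [3, 7]
```

Ranking features one at a time is cheap, which matters when starting from hundreds of candidates. However, it can miss features that are informative only in combination, so the reduced set should still be checked with a separability test.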
MREC - Map Recognition Method
• MREC: projection in the best direction, complete separation in 2 steps
• PCA: no separation
We have improved the quality of:
• alloy steels
• carbon-fiber-reinforced, resin-based composite materials
• Bi2O3-containing high-Tc superconductors
• rare-earth-containing phosphors
• electrode materials of Ni/H batteries
• VPTC ceramic semiconductors
• high-temperature, SiC-based structural ceramics
• high polymers: PVC, synthetic fiber & rubber, polyethylene, ...
• high-energy materials
• semiconductor devices
• MOCVD method for III-V compound films
We have applied MasterMiner™ to Industrial Optimization & Diagnosis
• Petrochemical industry
  • distillation
  • hydro-cracking
  • vapor recovery
  • platinum reforming
  • delayed coking
  • de-waxing
  • vinyl acetate
  • polypropylene
  • jet fuel (Union Oil recipe, yield 87% -> 94%, +6,000 tons/yr)
  • increased catalyst life in a polyvinyl plant (catalyst cost $1.2MM)
  • etc.
We have applied MasterMiner™ to Industrial Optimization & Diagnosis
• Metallurgical industry
  • blast furnace
  • casting
  • improving alloy steel quality (60% -> 80%)
  • energy saving in aluminum production
• Automobile industry
  • electroplating
  • heat treatment
• Chemical industry
  • PVC, polyformaldehyde
  • butadiene rubber
Application Areas
[Diagram: data mining; process optimization; materials design; equipment/process diagnosis; petrochemical, metallurgical, and semiconductor industries]
• GOAL: optimal control of complex processes involving
  • heat transfer
  • mass transfer
  • fluid flow
  • chemical reactions
Pattern Recognition Methods
• Linear regression (LS): "forced fitting"
  • LS fitting coefficients as model parameters, the "best wish"
• PCA: principal component analysis
  • projection in the "best" direction, select two directions, LS
• LMAP: linear mapping
• NN: neural nets
  • blind learning, over-fitting, forced fitting
  • origin at the cluster center, covered with an ellipsoid, PCA
• MREC: map recognition (nonlinear)
  • polyhedrons, hidden projections, separation, back-mapping
• NNREC: neural nets + MREC
Comparison of Various Methods
CONDITION: METHOD TO USE
1. Mechanism known (in some cases): rule-based expert systems
2. Linear, without noise (in 20% of cases): linear regression, statistical methods
3. Highly noisy, multivariate, non-Gaussian (in most cases): hyperspace data mining
Why not Principal Component Analysis (PCA)?
[Figures: no separation (PCA); good separation (MasterMiner)]
PCA vs. data mining by MasterMiner:
• linear vs. nonlinear, hierarchical
• Gaussian vs. non-Gaussian
• low noise vs. high noise
• uses all data in modeling vs. uses a subset of the data in modeling
• 20 projections vs. 2 projections
Why not Least Squares Only?
PLS applies when PRESS < 0.3 (1/4 of cases in our practice)
PROJECT: PRESS (error)
• synthetic rubber: 0.2052 (can use PLS)
• steel plate for ship building: 0.6419 (cannot use PLS)
• rare earth phosphor: 0.3067
• Baoshan Iron & Steel: 0.3441
• Ni/H battery: 0.7389
• Ni/H materials: 0.1932
• propylene recovery (noisy data): 0.7755
• propylene recovery: 0.3752
• solvent oil: 0.3975
• VPTC: 0.1330
• hydro-cracking plant: 0.2055
• methanol production: 0.8255
• casting for car: 0.9157
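The table reports PRESS on a roughly 0-to-1 scale but does not say how it was normalized. The sketch below assumes PRESS divided by the total sum of squares of the target. For brevity it uses an ordinary least-squares fit rather than PLS, which makes the exact leave-one-out shortcut through the hat matrix available. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def normalized_press(X, y):
    """Leave-one-out PRESS of an ordinary least-squares fit, divided by the total sum of squares."""
    A = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    H = A @ np.linalg.pinv(A)                  # hat matrix: fitted values = H @ y
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))     # exact leave-one-out residuals for a linear fit
    return float(np.sum(loo_resid**2) / np.sum((y - y.mean()) ** 2))


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y_linear = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 50)
y_noise = rng.normal(size=50)
print(normalized_press(X, y_linear))  # far below 0.3: a linear model is adequate
print(normalized_press(X, y_noise))   # roughly 1: a linear model explains almost nothing
```

Under this normalization, 0 means perfect leave-one-out prediction and values near or above 1 mean the model predicts held-out samples no better than the mean. On that scale, the slide's 0.3 cut-off marks where a linear model stops being useful.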
Why not Neural Networks (GA) Only?
• Over-fitting problem with NN (GA)
• Industrial records are not complete
• e.g., the leaching rate problem at an aluminum company
  • Leaching rate = f(a, b, c, T)
  • A cross-section of the optimal zone:
    • by ANN: too large
    • by our Yield Master™: smaller
[Figure: (c) wrong zone by ANN; (b) zone by MasterMiner]
Applications in Diagnosis
• Equipment setup
  • steel making (roller distance)
  • oil refinery (bottleneck in gasoline production)
  • chemical plants (cooling pipe length, inlet position)
• Process optimization
  • drug fermentation
  • environmental emission control
  • materials manufacturing
E.g. 1: Steel Making
Process stages: blast furnace, steel making, casting, hot rolling, cold rolling
ST14 steel plate for auto bodies
• German equipment, output 10,000 tons/yr
• Problem: "deep pressing" property
• 100 = 5 x 20 factors in 5 stages
• 2 major factors:
  • N2: nitrogen content should be reduced
  • d1/d2: the distance ratio of the cold rollers should be increased
• Benefit: wasted steel reduced by a factor of 5
2nd issue: QC in ST14 Steel Plate Making
[Diagram: feed of scrap, CaO, MgO, iron ore; O2 blower; ladle]
Problem Background
• After each batch, samples were taken in a 3-min test for QC
• The amount of O2 blown and scrap added must be controlled
• Japanese case-based reasoning software: 65% separability
• Problem: ST14 quality is off-spec
• We used MasterMiner to build a model for QC
  • Target: FC (C content in steel, 17-30% by customer spec)
  • 13 factors
• The model was built and used to control product quality
• Result: 100% separability, products are on-spec
Feature Selection
FEATURE SELECTED: PROPERTY
• LY: age of O2 gun (years)
• PLH: height of O2 gun
• DYSLT: O2 amount (m3) before sampling
• DYCD: C content at sampling time (10^-2 %)
• DYTEMP: liquid iron temperature at sampling (°C)
• PCAO: amount of CaO used
• PMGO: amount of MgO added
• PORE: amount of iron ore added
• WCH: total charge of the converter (tons)
• TOIRON: total liquid iron
• SCAPT: amount of scrap
• LDLIFE: life of the ladle used to transport liquid iron
• QO2: amount of O2 blown after sampling
114 Data Samples
Target-Feature Maps
Data Separation by MasterMiner: 100%
Data Separation by PCA: 30%
Feature Selection (1) - Principal component regression
Feature Selection (2) - PLS (partial least squares)
Feature Selection (3) - KW method (linear)
Tunnel Models: 32 Inequalities
Quality Control Issue
• Solve the set of 32 inequalities
• or use the "appending" operation:
  • assign values to the uncontrollable factors
  • add N random samples
  • project them into the factor space
  • select those falling into the optimal space
• Result:
  • the C content of ST14 products is on-spec
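The "appending" operation can be sketched under one assumption: the tunnel model's inequalities are linear, in the form A x <= b. The uncontrollable factors are held at their current values, random candidates are drawn for the controllable factors, and only candidates that satisfy every inequality are kept. The three-factor setup and the inequality coefficients below are made up for illustration; a real model would supply its own 32 inequalities. The function name is hypothetical.

```python
import numpy as np


def append_and_filter(A, b, fixed, free_idx, lo, hi, n, rng):
    """Hypothetical 'appending' step.

    Holds the uncontrollable factors at the values in `fixed`, draws n random
    candidates for the controllable factors `free_idx` uniformly in [lo, hi], and
    keeps only the candidates that satisfy every inequality A @ x <= b.
    """
    X = np.tile(np.asarray(fixed, dtype=float), (n, 1))
    X[:, free_idx] = rng.uniform(lo, hi, size=(n, len(free_idx)))
    inside = np.all(X @ A.T <= b, axis=1)
    return X[inside]


# Three factors: x0 is uncontrollable (held at 1.0); x1 and x2 are controllable in [0, 1].
# Made-up optimal zone: x1 + x2 <= 1, x1 >= 0.2, x0 + x2 <= 1.5
A = np.array([[0.0, 1.0, 1.0],
              [0.0, -1.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, -0.2, 1.5])
rng = np.random.default_rng(42)
candidates = append_and_filter(A, b, fixed=[1.0, 0.0, 0.0], free_idx=[1, 2],
                               lo=[0.0, 0.0], hi=[1.0, 1.0], n=1000, rng=rng)
print(len(candidates), candidates[:3])
```

Each surviving row is a candidate operating point for the controllable factors under current conditions. Sampling avoids solving the inequality system explicitly, at the cost of wasting the candidates that fall outside the zone.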
Add Random Samples (green)