
Process Data Mining

Process Data Mining. Geoff C Jones. Data Mining using modern techniques. The truth is out there - in our plant data! But how can we find it, and how do we get to understand it? Is my process running normally? Is that QC result correct for my operating conditions?


Presentation Transcript


  1. Process Data Mining Geoff C Jones

  2. Data Mining using modern techniques
     • The truth is out there - in our plant data!
     • But how can we find it, and how do we get to understand it?
     • Is my process running normally?
     • Is that QC result correct for my operating conditions?
     • Can I predict when that column needs cleaning?

  3. What do we mean by Data Mining?
     • Knowledge discovery from human activity (‘information from data’)
     • Consumer surveys, loyalty cards, credit history
     • Modelling behaviour - identifying relationships & trends
     • Prediction of buying habits, risk, etc.
     How does Process Data Mining differ?
     • Knowledge discovery from systems behaviour
     • Data capture (on-line process & QC data)
     • Modelling of normal behaviour (exploratory data analysis)
     • Monitoring for abnormal behaviour
     • Prediction of quality, yield, fouling, catalyst deactivation, etc.

  4. How Can It Benefit My Business?
     • It identifies the relationships between process and performance variables that influence:
        • Yields & usages
        • Processing times, speed, throughput
        • Product impurities
     • Cost is low compared with the benefits

  5. How does it differ from Conventional Process Control & SPC?
     • Closed-loop control:
        • deals with dynamic behaviour
        • requires a sampling rate much higher than the system frequency response
        • requires cause-effect relationships to be known
     • SPC:
        • (generally) deals with steady-state systems
        • sampling rate determined by cost of data collection
        • identifies non-random system behaviour (variation due to assignable causes)
        • usually configured for univariate monitoring
     • Process Data Mining:
        • identifies relationships and abnormalities in multivariate systems for process optimisation and troubleshooting
        • three stages: exploration, modelling, validation

  6. What do I need to achieve this?
     • For modelling normal behaviour:
        • Access to historic time-stamped process & QC data
           • Example: BASF Seal Sands plant, with over 2 full years of PI data for >11,000 variables, sampled every 30 seconds
        • Some knowledge of what variables to monitor
        • Some expectation of what is to be achieved
        • Understanding the data
     • For monitoring abnormal behaviour:
        • A model of normal behaviour
        • Real-time data access
        • Software to identify causes of abnormal behaviour
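To give a feel for the scale of the Seal Sands example, the arithmetic below estimates a lower bound on the number of stored values. It is only a rough sketch: it assumes exactly 2 years, exactly 11,000 variables and regular, uncompressed 30-second sampling, whereas the slide says "over" 2 years and ">11,000", and a historian would normally compress the data.

```python
# Rough lower bound on the size of the historian extract quoted on the slide.
# Assumptions (not from the slide): 365-day years, no compression, no gaps.
SECONDS_PER_YEAR = 365 * 24 * 3600

years = 2            # "over 2 full years"
variables = 11_000   # ">11,000 variables"
interval_s = 30      # one value every 30 seconds

samples_per_variable = years * SECONDS_PER_YEAR // interval_s
total_values = samples_per_variable * variables

print(samples_per_variable)  # 2102400
print(total_values)          # 23126400000
```

At roughly 23 billion values, the data set is too large for manual inspection, which is part of why pre-screening and projection techniques matter.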

  7. Integrated network access to historic time-stamped process & QC data
     • Majority of process control systems (DCS & PLC) have interfaces to external systems:
        • normally offer ‘firewall’ security to prevent unauthorised write-back
        • normally offer network connectivity
           • Ethernet, TCP/IP, browser-configurable
     • Major data historian packages have standard interfaces to these systems:
        • client - server
        • data compression comes as standard
        • can import QC data
        • clients run on same platform as business systems
           • GUI, Excel add-in, API & web browser
     • Data historian systems may be plant-dedicated, shared over a business WAN, or shared over the Internet.

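  8. Internet Service Delivery [Figure: network diagram comparing Internet service delivery with business network service delivery. Labelled components: DCS, I/F server, PC and the customer's business LAN; a LAN firewall, DMZ and Internet firewall on each side of the Internet; the service provider's LAN with PI server and web server.]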

  9. Some expectation of what is to be achieved:
     • High Profile Targets:
        • % Yield Improvement
        • % Give-away reduction
        • % Utility usage reduction
        • % Rate or batch-speed increase
     • Side benefits:
        • Reduced ‘Grey’ material when changing grades
        • Reduced downtime
        • Reduced inventories awaiting analysis - sell from line
        • On-line monitoring of catalyst deactivation and process fouling

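  10. [Figure: process variable against time, shown relative to a constraint. Under poor control the mean sits well away from the constraint; improved control (£) reduces the variation, which allows the mean to move closer to the constraint (£+).]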

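  11. Quantifying the improvement: [Figure: BEFORE and AFTER distributions of the process variable, each with a 5% violation of the constraint; the mean shifts by Δx.] $\Delta x = 1.65\,(\sigma_{\text{new}} - \sigma_{\text{old}})$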
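The factor 1.65 on this slide is the rounded one-sided 95% point of the standard normal distribution (about 1.645): if 5% of values may violate the constraint, the mean must sit about 1.65 standard deviations away from it. The sketch below works one example through. It assumes normally distributed variation, and the two standard deviations are made-up illustrative numbers, not plant data.

```python
from statistics import NormalDist

def margin_to_constraint(sigma, violation=0.05):
    """Distance the mean must keep from a one-sided constraint so that
    only `violation` of normally distributed values exceed it."""
    z = NormalDist().inv_cdf(1.0 - violation)   # about 1.645 for 5%
    return z * sigma

sigma_old = 2.0   # hypothetical standard deviation under poor control
sigma_new = 1.2   # hypothetical standard deviation under improved control

# How much closer to the constraint the mean can run after the improvement.
shift = margin_to_constraint(sigma_old) - margin_to_constraint(sigma_new)
print(round(shift, 2))  # 1.32
```

The size of the shift depends only on the change in standard deviation, so halving the variation is worth the same move towards the constraint whatever the set point is.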

  12. Understanding the data:
     • To deal with abnormalities in the data:
        • Pre-Screening
        • Visualisation
        • Time-shift correction
        • Missing values
        • Outliers
     • Exploratory data analysis:
        • Projection Techniques (Chemometrics)
        • Clustering
        • Principal Component Analysis (PCA)
        • Regression
        • Partial Least Squares (PLS)
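As an illustration of the missing-value and outlier steps, here is a minimal pre-screening sketch. The slide does not say which methods were used, so both choices are assumptions: gaps are forward-filled with the last valid value, and outliers are flagged with a median-based rule (points more than three robust standard deviations from the median). The readings are invented.

```python
from statistics import median

def fill_missing(values):
    """Replace None with the last valid value (forward fill).
    A gap at the very start has nothing to copy and stays None."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled

def flag_outliers(values, threshold=3.0):
    """Flag points more than `threshold` robust standard deviations
    from the median, using the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad   # MAD rescaled to match sigma for normal data
    return [abs(v - med) > threshold * scale for v in values]

raw = [50.1, 50.3, None, 49.9, 50.2, 75.0, 50.0, 49.8]  # hypothetical readings
clean = fill_missing(raw)
flags = flag_outliers(clean)
print(clean)
print(flags)   # only the 75.0 reading is flagged
```

A median-based rule is used here because a single large spike inflates the ordinary standard deviation and can hide itself; whether a flagged point is a bad measurement or a real process event still needs a look at the plant record.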

  13. Time-shift correction [Figure: with the raw data, the cross-correlation shows no identifiable relationship; with the time-shifted data, the cross-correlation makes the relationship obvious.]
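Time-shift correction can be sketched as a search for the lag at which two signals correlate most strongly, after which the downstream signal is shifted back by that lag. The example below is an assumption about how this might be done, not the method behind the slide; the signals are synthetic and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Return (lag, r) for the lag at which y[t + lag] correlates
    most strongly with x[t], searching lags 0..max_lag."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        r = pearson(x[:len(x) - lag], y[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Synthetic upstream signal, and a downstream copy delayed by 7 samples.
x = [math.sin(0.37 * t) + 0.5 * math.sin(0.11 * t) for t in range(200)]
delay = 7
y = [0.0] * delay + x[:-delay]

lag, r = best_lag(x, y, max_lag=20)
print(lag, round(r, 3))   # 7 1.0
```

On real plant data the delay often varies with throughput, so a single fixed lag is a simplification.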

  14. [Figure: samples plotted against Variable 1, Variable 2 and Variable 3, with the first and second PCs drawn through the data. A sample with large T² shows unusual variation inside the model; a sample with large Q (or SPE) shows unusual variation outside the model. Acknowledgement to Eigenvector Research Inc.]
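The two statistics in this figure can be illustrated with a very small PCA model. The sketch below is a simplification under stated assumptions: it uses two variables and one principal component (the figure shows three variables and two PCs), finds the component by power iteration, and uses invented training data. T² is the squared score scaled by the component's variance; Q is the squared distance from the sample to the model line. No control limits are computed.

```python
import math

def fit_one_pc(rows):
    """Fit a one-component PCA model to mean-centred data by power iteration."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - mean[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(200):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return mean, v, eigenvalue

def t2_and_q(sample, mean, loading, eigenvalue):
    """Return (T2, Q) of one sample for a one-component model."""
    c = [s - m for s, m in zip(sample, mean)]
    score = sum(a * b for a, b in zip(c, loading))
    t2 = score ** 2 / eigenvalue                 # variation inside the model
    residual = [a - score * b for a, b in zip(c, loading)]
    q = sum(e * e for e in residual)             # variation outside the model
    return t2, q

# Hypothetical training data: two variables that move together.
train = [[float(t), t + (0.2 if t % 2 else -0.2)] for t in range(-5, 6)]
mean, loading, eigenvalue = fit_one_pc(train)

t2_far, q_far = t2_and_q([12.0, 12.0], mean, loading, eigenvalue)
t2_off, q_off = t2_and_q([2.0, -2.0], mean, loading, eigenvalue)
print(round(t2_far, 1), round(q_far, 3))   # large T2, small Q
print(round(t2_off, 3), round(q_off, 1))   # small T2, large Q
```

The first test sample follows the usual relationship between the variables but lies far beyond the training range, so only T² is large. The second breaks the relationship itself, so only Q is large.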

  15. Multivariate data analysis - correlation & clustering [Figure: plots of highly correlated data, with two outliers marked.]

  16. Real-time Modelling
     • Modelling for monitoring:
        • A subset of historic data is chosen which represents:
           • ‘common-cause’ process behaviour (no outliers)
           • the process operating within ‘control’
        • The model will then be sensitive to outliers & out-of-control data
     • Easier to say than it is to do:
        • many processes are ‘multi-modal’
        • rate & grade changes - known as ‘co-variates’
        • co-variates must be ‘orthogonalised’ first, to give model sensitivity
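One way to read "orthogonalised" is that the part of each process variable explained by a co-variate such as production rate is removed before modelling, so that the model sees only the remaining variation. The slide does not give the method, so the least-squares approach below is an assumption, and the rate and temperature values are invented.

```python
def regress_out(y, covariate):
    """Return the residuals of y after removing its linear dependence
    on the covariate (ordinary least squares, one covariate)."""
    n = len(y)
    mc, my = sum(covariate) / n, sum(y) / n
    slope = (sum((c - mc) * (v - my) for c, v in zip(covariate, y))
             / sum((c - mc) ** 2 for c in covariate))
    intercept = my - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(covariate, y)]

# Hypothetical two-mode operation: five samples at rate 10, five at rate 20.
rate = [10.0] * 5 + [20.0] * 5
wobble = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.2, -0.2]
temperature = [100.0 + 2.0 * r + w for r, w in zip(rate, wobble)]

residual = regress_out(temperature, rate)

spread_raw = max(temperature) - min(temperature)
spread_residual = max(residual) - min(residual)
print(round(spread_raw, 1), round(spread_residual, 1))
```

Before the correction the rate change dominates the temperature signal; afterwards only the small within-mode variation remains, which is what a monitoring model needs to be sensitive to.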

  17. Real-time modelling
     • Modelling for prediction:
        • Can include ‘richer’ process data for better prediction away from normal operation (but excluding outliers)
        • Must be validated on unseen data
     • Model drift is a problem, as processes change with time:
        • Difference between predicted & actual used for monitoring fouling & catalyst deactivation
        • Can be used to validate QC data
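The two ideas on this slide, validation on unseen data and using the prediction error as a monitoring signal, can be sketched with a deliberately simple model. A one-variable straight-line fit stands in for the PLS or similar models a real application would use, and all data here are synthetic.

```python
import math

def fit_line(x, y):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def rmse(x, y, slope, intercept):
    """Root-mean-square prediction error of the line on (x, y)."""
    return math.sqrt(sum((b - (slope * a + intercept)) ** 2
                         for a, b in zip(x, y)) / len(x))

# Synthetic history: quality = 3 * flow + 5 plus small scatter.
x = [float(i % 10) for i in range(60)]
y = [3.0 * xi + 5.0 + 0.2 * math.sin(1.7 * i) for i, xi in enumerate(x)]

# Fit on the first 40 samples, validate on the 20 the model has not seen.
slope, intercept = fit_line(x[:40], y[:40])
rmse_train = rmse(x[:40], y[:40], slope, intercept)
rmse_test = rmse(x[40:], y[40:], slope, intercept)
print(round(rmse_train, 3), round(rmse_test, 3))

# Later operation with a slow drift of 0.05 per sample (e.g. fouling).
drift_y = [3.0 * xi + 5.0 + 0.05 * i for i, xi in enumerate(x)]
residuals = [b - (slope * a + intercept) for a, b in zip(x, drift_y)]
print(round(residuals[0], 2), round(residuals[-1], 2))
```

A validation error close to the training error suggests the model generalises. A residual that grows steadily over time, as in the last line, is the kind of signal that can be tracked for fouling or catalyst deactivation, and a single QC result far from its prediction is a prompt to recheck the analysis.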

  18. A selection of available software
     • Excel
        • Data analysis add-in - Moving Average, StDev, MLR
        • VBA programmable - on-line applications are possible
     • Matlab (statistics toolboxes including PCA & PLS)
        • http://www.ncl.ac.uk/inpact/
        • http://www.eigenvector.com/
        • can also be programmed for on-line monitoring
     • MSPC+ (off & on-line - uses embedded PI database)
        • http://www.mdctech.com/mspc.htm
     • Pirouette (exploratory data analysis) & InStep (prediction)
        • http://www.infometrix.com/ (demo available to download)
     • SIMCA-P (modelling) & SIMCA 4000 (on-line)
        • http://www.umetrics.com/ (demo available to download)

  19. Thank you for your attention
