Visualization of high-dimensional data by use of genetic programming – application to on-line infrared spectroscopy based process monitoring
Tibor Kulcsár, János Abonyi
University of Pannonia, Department of Process Engineering
„The perfect is not good enough!” (Carl Benz)
Preconditions
• Online analyzers are widely used in the oil industry to predict product properties such as density, cloud point, etc.
• These properties cannot be described by linear models.
• Visualization of the high-dimensional spectral database is needed for model development and process monitoring.
• A cost function and a tool for equation discovery are needed to obtain a compact and interpretable mapping of high-dimensional data.
Similar spectra – similar property
[Figure: a sphere around a spectrum in the spectral space; R_sphere = 3, the percentage of D_max corresponding to the radius of the sphere]
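A minimal sketch of the "similar spectra" selection, assuming each spectrum is a vector of absorbances, similarity is Euclidean distance, and D_max is the largest pairwise distance in the database (the slide does not define D_max). The function names `max_pairwise_distance` and `sphere_neighbors` are hypothetical.

```python
import math
from itertools import combinations

def max_pairwise_distance(spectra):
    """D_max (assumption): the largest Euclidean distance between two spectra."""
    return max(math.dist(a, b) for a, b in combinations(spectra, 2))

def sphere_neighbors(spectra, query_index, r_sphere_percent):
    """Indices of spectra inside the sphere around spectra[query_index].

    The sphere radius is r_sphere_percent percent of D_max.
    """
    radius = r_sphere_percent / 100.0 * max_pairwise_distance(spectra)
    center = spectra[query_index]
    return [i for i, s in enumerate(spectra)
            if i != query_index and math.dist(center, s) <= radius]
```

With R_sphere = 3, `sphere_neighbors(spectra, i, 3)` returns the spectra lying within 3% of D_max of spectrum i; these are the candidates whose properties are expected to be similar.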
Finding similar spectra
[Figure: spectrum X and its similar spectra S1 to S6]
• Prediction model
  • Nearest-neighbors algorithm
  • The neighborhood is the basis of the prediction
• 2D mapping
  • Defines the range of validity of the local models
  • The mapped plane should follow the original spectral space
• Quality measure
  • Measures the quality of the mapping
  • Measures how well neighborhoods are preserved
Property of X = f(Prop[S1, S2, S3, S4, S5, S6])
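The prediction step can be sketched as a k-nearest-neighbor estimate. The slide only states that the property of X is a function f of the properties of its similar spectra S1 to S6; this sketch assumes f is the plain mean over the k = 6 nearest library spectra under Euclidean distance. `knn_predict` is a hypothetical name.

```python
import math

def knn_predict(library_spectra, library_props, query, k=6):
    """Predict a property of `query` from its k nearest library spectra.

    Returns (prediction, neighbor_indices), nearest first. Assumption: f is
    the plain mean of the neighbors' property values, distance is Euclidean.
    """
    if not 0 < k <= len(library_spectra):
        raise ValueError("k must be between 1 and the library size")
    # Sort library indices by distance to the query spectrum.
    order = sorted(range(len(library_spectra)),
                   key=lambda i: math.dist(query, library_spectra[i]))
    neighbors = order[:k]
    prediction = sum(library_props[i] for i in neighbors) / k
    return prediction, neighbors
```

A distance-weighted mean would be an equally plausible choice of f; the plain mean keeps the sketch short.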
Chemical information – interpretable?
[Figure: absorbance vs. wavenumber (4000–4800 cm⁻¹), with spectral regions labeled aromatic, olefinic/ethylenic, linear, branched/cyclic, and saturated]
Aggregates – need for explicit mapping
[Figure: 2D mapping of the spectra using two aggregates, Aggregate 1 vs. Aggregate 2]
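An explicit 2D mapping makes the "forward and backward" neighborhood preservation measurable. The sketch below is one simple way to quantify it, not necessarily the measure used by the authors: neighborhoods are spheres of R percent of D_max in each space (D_max assumed to be the largest pairwise distance), forward preservation is the share of a point's spectral neighbors that stay neighbors in 2D, and backward preservation is the share of its 2D neighbors that were spectral neighbors. The aggregate functions are hypothetical placeholders.

```python
import math
from itertools import combinations

def sphere_sets(points, r_percent):
    """For each point, indices of the other points within r_percent % of D_max."""
    d_max = max(math.dist(a, b) for a, b in combinations(points, 2))
    radius = r_percent / 100.0 * d_max
    return [{j for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius}
            for i, p in enumerate(points)]

def _mean(values):
    # Convention (assumption): if no point has a neighborhood, report 1.0.
    return sum(values) / len(values) if values else 1.0

def neighborhood_preservation(spectra, mapped, r_percent=3.0):
    """Return (forward, backward) neighborhood preservation, each in [0, 1]."""
    original = sphere_sets(spectra, r_percent)
    in_2d = sphere_sets(mapped, r_percent)
    # Points with an empty neighborhood are skipped in that direction.
    forward = [len(a & b) / len(a) for a, b in zip(original, in_2d) if a]
    backward = [len(a & b) / len(b) for a, b in zip(original, in_2d) if b]
    return _mean(forward), _mean(backward)

# Hypothetical aggregates: explicit functions of the spectral variables.
def aggregate_1(spectrum):
    return spectrum[0] + spectrum[1]

def aggregate_2(spectrum):
    return spectrum[2] - spectrum[0]
```

Usage: `mapped = [(aggregate_1(s), aggregate_2(s)) for s in spectra]`, then `neighborhood_preservation(spectra, mapped)`. A score like this could serve as the cost function that the genetic programming search optimizes.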
Representation of Aggregates
• One of the most popular methods for representing structures is the binary tree.
  • Non-terminal nodes:
    • Operators: +, -, *, /
    • Functions: exp(), cos()
  • Terminal nodes:
    • Variables: x1, x2
    • Parameters: p1, p2
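A minimal sketch of such a tree, using the node labels from the slide. Binary operators take two children, the functions exp() and cos() use only the left child, and terminals are looked up in a variable or parameter dictionary. The `Node`, `evaluate`, and `to_infix` names, and the protected division that returns 1.0 for a zero divisor, are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Protected division (assumption): a zero divisor yields 1.0 instead of an error.
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
          "*": lambda a, b: a * b, "/": lambda a, b: a / b if b != 0 else 1.0}
UNARY = {"exp": math.exp, "cos": math.cos}

@dataclass
class Node:
    label: str                      # operator, function, variable or parameter name
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def evaluate(node, x, p):
    """Evaluate the tree for variables x={'x1': ...} and parameters p={'p1': ...}."""
    if node.label in BINARY:
        return BINARY[node.label](evaluate(node.left, x, p), evaluate(node.right, x, p))
    if node.label in UNARY:
        return UNARY[node.label](evaluate(node.left, x, p))
    return x[node.label] if node.label in x else p[node.label]

def to_infix(node):
    """Readable equation string for the tree."""
    if node.label in BINARY:
        return f"({to_infix(node.left)} {node.label} {to_infix(node.right)})"
    if node.label in UNARY:
        return f"{node.label}({to_infix(node.left)})"
    return node.label
```

For example, `Node("-", Node("x1"), Node("/", Node("p1"), Node("x2")))` encodes the aggregate x1 - p1/x2.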
Genetic Operators: Mutation
[Figure: expression trees before and after mutation, built from x1, x2, p1 and the operators -, +, *, /]
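Subtree mutation, a standard genetic programming mutation operator, is one way to realize this step; the slide does not state which variant the authors used. In this sketch trees are nested tuples `(operator, left, right)` with terminal names as strings, and a randomly chosen node is replaced by a freshly grown random subtree. All names here are hypothetical.

```python
import random

OPERATORS = ["+", "-", "*", "/"]
TERMINALS = ["x1", "x2", "p1", "p2"]

def random_tree(rng, depth):
    """Grow a random tree: a tuple (op, left, right) or a terminal string."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(OPERATORS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def node_paths(tree, path=()):
    """All node positions as paths of child indices (1 = left, 2 = right)."""
    paths = [path]
    if isinstance(tree, tuple):
        paths += node_paths(tree[1], path + (1,))
        paths += node_paths(tree[2], path + (2,))
    return paths

def replace_at(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], subtree)
    return tuple(parts)

def mutate(tree, rng, max_depth=2):
    """Subtree mutation: replace one random node with a new random subtree."""
    path = rng.choice(node_paths(tree))
    return replace_at(tree, path, random_tree(rng, max_depth))
```

Because tuples are immutable, `mutate` returns a new tree and leaves the parent untouched, so the parent can still be copied unchanged by direct reproduction.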
Genetic Operators: Crossover
[Figure: expression trees before and after crossover, built from x1, x2, p1 and the operators -, +, /]
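Subtree crossover, the standard genetic programming crossover, can be sketched as follows: one random node is chosen in each parent and the two subtrees rooted there are swapped, giving two children. Trees are nested tuples as in a typical GP setup (an assumption); the helper names are hypothetical. Iterating over `tree[1:]` also handles one-argument functions such as `("exp", "x1")`.

```python
import random

def node_paths(tree, path=()):
    """All node positions as paths of child indices."""
    paths = [path]
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            paths += node_paths(tree[i], path + (i,))
    return paths

def subtree_at(tree, path):
    """The subtree rooted at path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], subtree)
    return tuple(parts)

def crossover(parent_a, parent_b, rng):
    """Subtree crossover: swap a random subtree of parent_a with one of parent_b."""
    path_a = rng.choice(node_paths(parent_a))
    path_b = rng.choice(node_paths(parent_b))
    child_a = replace_at(parent_a, path_a, subtree_at(parent_b, path_b))
    child_b = replace_at(parent_b, path_b, subtree_at(parent_a, path_a))
    return child_a, child_b
```

The swap conserves the total number of nodes across the two children, but an individual child can grow; GP implementations usually cap tree depth to limit this bloat.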
Scheme of Genetic Programming
1. Creation of initial population
2. Parameter optimization
3. Evaluation (fitness value)
4. End? If yes: End
5. Selection
6. Crossover, direct reproduction, or mutation
7. New generation (repeat from parameter optimization)
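The loop above can be sketched as a small symbolic-regression run. This is an illustration under stated assumptions, not the authors' implementation: the cost function is mean squared error (the slides use a mapping-quality measure instead), parameter optimization is a crude random search over a single parameter p1, selection is tournament selection, the operator probabilities (0.6 crossover, 0.3 mutation, 0.1 direct reproduction) and the depth cap are arbitrary, and all names are hypothetical.

```python
import math
import random

# Protected division returns 1.0 for a near-zero divisor (assumption).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0}
TERMS = ["x1", "x2", "p1"]          # hypothetical terminals: two variables, one parameter

def grow(rng, depth):
    """Random tree: tuple (op, left, right) or a terminal name."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    return (rng.choice(list(OPS)), grow(rng, depth - 1), grow(rng, depth - 1))

def evaluate(t, x, p1):
    if isinstance(t, tuple):
        return OPS[t[0]](evaluate(t[1], x, p1), evaluate(t[2], x, p1))
    return p1 if t == "p1" else x[t]

def depth(t):
    return 1 + max(depth(t[1]), depth(t[2])) if isinstance(t, tuple) else 0

def paths(t, path=()):
    """Positions of all nodes as tuples of child indices."""
    if not isinstance(t, tuple):
        return [path]
    return [path] + paths(t[1], path + (1,)) + paths(t[2], path + (2,))

def get(t, path):
    for i in path:
        t = t[i]
    return t

def put(t, path, sub):
    """Copy of t with the node at path replaced by sub."""
    if not path:
        return sub
    parts = list(t)
    parts[path[0]] = put(t[path[0]], path[1:], sub)
    return tuple(parts)

def error(t, p1, data):
    """Cost function (assumption): mean squared error over (x, y) samples."""
    try:
        return sum((evaluate(t, x, p1) - y) ** 2 for x, y in data) / len(data)
    except OverflowError:
        return math.inf

def optimise_p1(t, data, rng, tries=20):
    """Crude parameter optimization: random search for p1 in [-5, 5]."""
    best_p, best_e = 1.0, error(t, 1.0, data)
    for _ in range(tries):
        cand = rng.uniform(-5.0, 5.0)
        e = error(t, cand, data)
        if e < best_e:
            best_p, best_e = cand, e
    return best_p, best_e

def tournament(scored, rng, k=3):
    """Selection: the best tree among k randomly drawn (error, tree) pairs."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]

def run_gp(data, pop_size=40, generations=15, max_depth=6, tol=1e-9, seed=0):
    rng = random.Random(seed)
    population = [grow(rng, 3) for _ in range(pop_size)]   # initial population
    best = (math.inf, None, None)                           # (error, tree, p1)
    for _ in range(generations):
        scored = []
        for t in population:                  # parameter optimization + evaluation
            p1, e = optimise_p1(t, data, rng)
            scored.append((e, t))
            if e < best[0]:
                best = (e, t, p1)
        if best[0] < tol:                     # "End?" check
            break
        new_pop = []
        while len(new_pop) < pop_size:
            parent = tournament(scored, rng)  # selection
            r = rng.random()
            if r < 0.6:                       # crossover
                donor = tournament(scored, rng)
                child = put(parent, rng.choice(paths(parent)),
                            get(donor, rng.choice(paths(donor))))
            elif r < 0.9:                     # mutation
                child = put(parent, rng.choice(paths(parent)), grow(rng, 2))
            else:                             # direct reproduction
                child = parent
            new_pop.append(child if depth(child) <= max_depth else parent)
        population = new_pop                  # new generation
    return best
```

For example, `run_gp([({"x1": a, "x2": b}, a * b + a) for a in range(1, 4) for b in range(1, 4)])` returns the best `(error, tree, p1)` found. Replacing `error` with a neighborhood-preservation score would turn the same loop into a search for mapping equations.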
Results
[Figure panels: best pair from the original set; best equation and an optimized pair; search for a better pair]
Conclusion
• The quality of the mapping is measurable
  • Neighborhood preservation (forward and backward)
  • Discrimination of operational regimes
• Aggregate-based mapping
  • Interpretable chemical information
  • Building aggregates requires a lot of experience (divination)
• Genetic programming
  • A controlled method for generating new equations
  • Needs a proper cost function (one that measures the quality of the mapping)
• Visual representation of models
  • Aggregate → 2D plot → dashboard graph
  • Information about the model structure
Questions? …
In case of any questions or remarks, please contact us: kulcsart@fmt.uni-pannon.hu
ACKNOWLEDGMENT
The financial support of the TAMOP-4.2.2/B-10/1-2010-0025 project is acknowledged.