Artificial Intelligence for Data Mining in the Context of Enterprise Systems

Thesis Presentation by Real Carbonneau

Overview • Background • Research Question • Data Sources • Methodology • Implementation • Results • Conclusion

Background • Information distortion in the supply chain • Difficult for manufacturers to forecast

Current solutions • Exponential Smoothing • Moving Average • Trend • etc. • Wide range of software forecasting solutions • M3 Competition research tested most forecasting solutions and found that the simplest work best
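As a rough illustration of two of the traditional baselines listed above, the following Python sketch computes one-step-ahead forecasts with a moving average and with simple exponential smoothing. The function names, the window size, and the smoothing constant `alpha` are assumptions chosen for illustration; the experiment itself was programmed in MATLAB and its settings are not given on the slides.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or len(series) < window:
        raise ValueError("series must contain at least `window` observations")
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level.

    The final level is used as the forecast for the next period.
    `alpha` is an illustrative default, not a value from the thesis.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level
```

A larger `alpha` makes the forecast react faster to recent demand, while a smaller one smooths out more of the noise.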

Artificial Intelligence • Universal Approximators • Artificial Neural Networks (ANN) • Recurrent Neural Networks (RNN) • Support Vector Machines (SVM) • Theoretically should be able to match or outperform any traditional forecasting approach.

Neural Networks • Learns by adjusting weights of connections • Based on empirical risk minimization • Generalization can be improved by: • Cross Validation based early stopping • Levenberg-Marquardt with Bayesian Regularization

Support Vector Machine • Learns by separating data in a different feature space with support vectors • Feature space can have a higher or lower dimensionality than the input space • Based on structural risk minimization • Optimality guaranteed • Complexity constant controls the power of the machine

Support Vector Machine CV • 10-fold Cross Validation based optimization of Complexity Constant • More effective than NN because of guaranteed optimality

SVM Complexity Example • SVM Complexity Constant optimization based on 10-Fold Cross Validation
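The cross-validation procedure named above can be sketched in Python as follows. This is a minimal sketch under several assumptions: the fold layout (contiguous folds), the error measure (mean absolute error), and all function names are hypothetical. To stay self-contained it uses a one-dimensional ridge regression as a stand-in model, not an SVM; note that a ridge penalty works in the opposite direction to an SVM complexity constant, since a larger penalty gives a less flexible fit.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_constant(xs, ys, candidates, fit, predict, k=10):
    """Return (best_candidate, cv_error) with the lowest mean absolute CV error."""
    best, best_err = None, float("inf")
    for c in candidates:
        total = 0.0
        for fold in k_fold_indices(len(xs), k):
            held_out = set(fold)
            train_x = [x for i, x in enumerate(xs) if i not in held_out]
            train_y = [y for i, y in enumerate(ys) if i not in held_out]
            model = fit(train_x, train_y, c)
            total += sum(abs(predict(model, xs[i]) - ys[i]) for i in fold)
        err = total / len(xs)
        if err < best_err:
            best, best_err = c, err
    return best, best_err


# Stand-in model (NOT an SVM): 1-D ridge regression through the origin.
def fit_ridge(xs, ys, penalty):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def predict_ridge(weight, x):
    return weight * x
```

In an actual SVM setting, `fit` and `predict` would wrap the SVM trainer and `candidates` would be a grid of complexity constants.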

Research Question • For a manufacturer at the end of the supply chain who is subject to demand distortion: • H1: Are AI approaches better on average than traditional approaches (by error)? • H2: Are AI approaches better than traditional approaches (by rank)? • H3: Is the best AI approach better than the best traditional approach?

Data Sources • Chocolate Manufacturer (ERP) • Toner Cartridge Manufacturer (ERP) • Statistics Canada Manufacturing Survey

Methodology • Experiment • Using top 100 from 2 manufacturers and random 100 from StatsCan • Comparison based on out-of-sample testing set
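An out-of-sample comparison of this kind can be sketched in Python as below. The slides do not state the split ratio, the error measure, or whether forecasts were rolled forward, so the 20% test fraction, the mean absolute error, and the one-step-ahead rolling scheme are all assumptions, as are the function names.

```python
def split_series(series, test_fraction=0.2):
    """Hold out the last part of the series as the testing set."""
    n_test = max(1, int(len(series) * test_fraction))
    return series[:-n_test], series[-n_test:]


def out_of_sample_mae(series, forecast, test_fraction=0.2):
    """Mean absolute error of one-step-ahead forecasts over the held-out tail.

    `forecast` is any function that maps a history (list of past values)
    to a prediction for the next period.
    """
    train, test = split_series(series, test_fraction)
    history = list(train)
    total = 0.0
    for actual in test:
        total += abs(forecast(history) - actual)
        history.append(actual)  # reveal the actual value before the next step
    return total / len(test)
```

Because every method is scored on the same held-out periods, the resulting errors can be compared directly across forecasting techniques.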

Implementation • Experiment programmed in MATLAB • Used existing toolboxes where possible (e.g., NN, ARMA) • Programmed the missing methods • SVM implemented using mySVM called from MATLAB

Experimental Groups

Super Wide model • Time series are short • Very noisy because of supply chain distortion • Super Wide model combined data from many products • Much larger amount of data to learn from • Assumes similar patterns occur in the group of products.
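The pooling idea behind the Super Wide model can be sketched in Python as follows. This is an assumption-laden illustration: the slides do not say how observations were built, so the sketch assumes each training row is a window of the previous `n_lags` demand values with the next value as the target. The function names and the default of three lags are hypothetical.

```python
def lagged_rows(series, n_lags):
    """Turn one series into (inputs, target) pairs from the previous n_lags values."""
    return [
        (list(series[t - n_lags:t]), series[t])
        for t in range(n_lags, len(series))
    ]


def super_wide_dataset(series_by_product, n_lags=3):
    """Pool the lagged rows of every product into one training set."""
    inputs, targets = [], []
    for name in sorted(series_by_product):  # sorted for a reproducible row order
        for x, y in lagged_rows(series_by_product[name], n_lags):
            inputs.append(x)
            targets.append(y)
    return inputs, targets
```

A single model trained on the pooled rows sees far more observations than a model trained on any one short series, which is the benefit the slide describes. Series on very different scales would probably need to be normalised before pooling.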

Results Table (Chocolate)

Results Table (Toner)

Results Table (StatsCan)

Results Discussion • AI provides a lower forecasting error on average. (H1=Yes) • However, this is only because of the extremely poor performance of trend based forecasting • Traditional ranked better than AI. (H2=No) • Extreme trend error has no impact on rank. • SVM Super Wide performed better than the best traditional (ES). (H3=Yes) • However, exponential smoothing was found to be the best and no non-super-wide AI technique reliably performed better.
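The difference between comparing by average error (H1) and by rank (H2) can be illustrated with a small Python sketch. The numbers in the usage note are made up for illustration and are not results from the thesis; the function names are hypothetical and ties are not handled specially.

```python
def mean_error(errors_by_method):
    """Average error of each method across all series."""
    return {m: sum(e) / len(e) for m, e in errors_by_method.items()}


def mean_rank(errors_by_method):
    """Average rank of each method across series (1 = lowest error)."""
    methods = list(errors_by_method)
    n_series = len(errors_by_method[methods[0]])
    totals = {m: 0 for m in methods}
    for i in range(n_series):
        ordered = sorted(methods, key=lambda m: errors_by_method[m][i])
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: totals[m] / n_series for m in methods}


# Made-up errors for three hypothetical methods on four series.
example = {
    "method_a": [2, 2, 2, 2],
    "method_b": [1, 1, 1, 100],  # one extreme failure
    "method_c": [3, 3, 3, 3],
}
```

With these invented numbers, `method_b` has by far the worst average error because of its single extreme failure, yet it still has the best average rank, since a rank only records the ordering on each series and not the size of the error.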

Results SVM Super Wide details • SVM Super Wide performed better than all others • Isolated to SVM / Super Wide combination only • Other Super Wide models did not reliably perform better than ES • Other SVM models did not perform better than ES • Dimensionality augmentation/reduction (non-linearity) is important • Super Wide SVM performed better than Super Wide MLR

Conclusion • When unsure, use Exponential Smoothing: it is the simplest and second best. • Super Wide SVM provides the best performance • Cost-benefit analysis by a manufacturer should help decide if the extra effort is justified. • If implementations of this technique prove useful in practice, it should eventually be built into ERP systems, since it may not be feasible for SMEs to build it themselves.

Implications • Useful for forecasting models which should include more information sources / more variables (Economic indicators, product group performances, marketing campaigns) because: • Super Wide = More observations • SVM+CV = Better Generalization • Not possible with short and noisy time series on their own.
