This thesis explores an algorithm for constructing Bayesian networks to predict economic and development indicators of 248 countries. It addresses challenges like missing values, dimensionality, and interpretability. The study compares related algorithms and discusses the limitations of manual network building. Leveraging domain knowledge and automatic methods can optimize model performance. The proposed algorithm integrates domain knowledge, statistical structure learning, and resource restrictions to streamline network construction and evaluation. By categorizing variables and considering dependencies, the model aims to provide a reliable framework for economic prediction.
An Algorithm for the Automatic Construction of Bayesian Networks with Limited Domain Knowledge, as Applied to the Prediction of Economic and Development Indicators of 248 Countries and World Regions. Presented to the Faculty of the Graduate School at the University of Missouri-Columbia in Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science. By Fernando Javier Torre-Mora. Dr. Yi Shang, Thesis Supervisor. May 2016
The Problem Computational Motivation Related algorithms
Interpretability problem • García, Fernández, Luengo, and Herrera 2009 • Casillas, Cordón, Herrera, and Magdalena 2013 • Liu, Cocea, and Gegov 2016 Trustworthiness Meaningfulness Useful knowledge
Bayesian Network PrevCompensation Prev Capital Formation Inflation Prev Capital PrevConsumption PrevInvestment Capital-labor ratio Capital Formation Compensation Interest Exogenous Capital Workers Consumption Investment GDP
Bayesian Network drawbacks • Usually built manually, then trained • Requires full knowledge of how domain works • Automatic methods exist • Order dependent (Cooper and Herskovits 1992) • Real example: 2016 Inflation 2015 Wages 2016 Wages
Bayesian Network drawbacks • Usually built manually • Requires full knowledge of how domain works • Automatic methods exist • Order dependent (Cooper and Herskovits 1992) • Real example: 2016 Wages 2016 Inflation 2015 Wages
Bayesian Network drawbacks • Usually built manually • Requires full knowledge of how domain works • Automatic methods exist • Order dependent (Cooper and Herskovits 1992) • Real example: • Counter-intuitive results • Genetic Algorithm (Chickering and Heckerman 1997) • Random ordering • Counter-intuitiveness may be optimized • Statistical structure learning (Friedman et al. 1999) • Evaluates all possible candidates
Statistical structure learning: all vs all n = 16 ⇒ 20,922,789,888,000 comparisons, approx. O(n!) PrevCompensation Prev Capital Formation Inflation Prev Capital PrevConsumption PrevInvestment Capital-labor ratio Capital Formation Compensation Interest Exogenous Capital Workers Consumption Investment GDP
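The figure quoted on this slide equals 16!. A minimal check, assuming "all vs all" means one evaluation per possible ordering of the n variables (the function name is illustrative, not from the thesis):

```python
import math


def all_vs_all_comparisons(n: int) -> int:
    """Number of orderings of n variables, i.e. n! (assumed reading of the slide)."""
    return math.factorial(n)


print(all_vs_all_comparisons(16))  # 20922789888000
```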
Statistical structure learning: categories PrevCapFrom PrevCompens Inflation Prev Capital Capital Form Prev Invest PrevConsump Cap-labor ratio Compensation Interest Capital GDP Investment Exogenous Workers Consumption
Statistical structure learning: groups Resource Restrictions Smoothing Estimation Parameter Stickiness Resource Restrictions Estimation Parameters Economy
Grouped all vs all n=16⇒ 1,213,440 comparisons O(c!n22c) • Estimation parameter stickiness • Resource Restriction smoothing • Resource restrictions • Estimation parameters • Economy
Friedman et al. 1999 main drawback PrevCompensation Prev Capital Formation Inflation Prev Capital PrevConsumption PrevInvestment Capital-labor ratio Capital Formation Compensation Interest Exogenous Capital Workers Consumption Investment GDP
Proposed Algorithm Basic idea Representation Pipeline Applicability
Basic idea: Domain Knowledge graph PrevCapFrom PrevCompens Inflation Prev Capital Capital Form Prev Invest PrevConsump Cap-labor ratio Compensation Interest Capital GDP Investment Exogenous Workers Consumption
Proposed Smets-Wouters Domain Knowledge Resource Restrictions Smoothing n = 16 ⇒ 306 comparisons O Estimation Parameter Stickiness Resource Restrictions Estimation Parameters Economy
Proposed UNESCO Domain Knowledge Previous Year Economy Innovation Education Production Economy
UNESCO variables Previous Year Economy PreviousGDP PreviousGrowth Innovation Education Journal Assistance Secondary Primary Tertiary Government Trademarks Production Economy Services Industry Manufacturing Agriculture Growth GDP Unemployment
UNESCO sample generated network PreviousGrowth PreviousGDP Primary Secondary Tertiary Assistance Journal Government Trademarks Manufacturing Unemployment Agriculture Services Industry GDP Growth
UNESCO Domain Knowledge model representation Previous Year Economy Categorization hashmap Category relation Graph Prev GDP Prev Year Econ Prev Growth Primary Innovation Educ Secondary Education Tertiary Industry Prod Services Unemployment Innov Journal Economy Production Trademarks Government Econ Prev GDP Prev Growth
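The two-part representation named on this slide (a categorization hashmap plus a category relation graph) can be sketched as two dictionaries. This is only an illustration: the variable-to-category assignments are one reading of the slide labels, and the graph edges are assumptions, since the slide text does not list them. The helper `candidate_arcs` is hypothetical.

```python
from itertools import product

# Categorization hashmap: variable -> category (one reading of the slide labels).
CATEGORY_OF = {
    "PreviousGDP": "Previous Year Economy",
    "PreviousGrowth": "Previous Year Economy",
    "Primary": "Education",
    "Secondary": "Education",
    "Tertiary": "Education",
    "Industry": "Production",
    "Services": "Production",
    "GDP": "Economy",
    "Growth": "Economy",
}

# Category relation graph: parent category -> child categories.
# ASSUMPTION: these edges are illustrative; the slides do not spell them out.
CATEGORY_GRAPH = {
    "Previous Year Economy": ["Economy"],
    "Education": ["Production"],
    "Production": ["Economy"],
}


def candidate_arcs(category_of, category_graph):
    """Return the (parent, child) variable pairs allowed by the category graph."""
    members = {}
    for variable, category in category_of.items():
        members.setdefault(category, []).append(variable)
    arcs = []
    for parent_cat, child_cats in category_graph.items():
        for child_cat in child_cats:
            arcs.extend(
                product(members.get(parent_cat, []), members.get(child_cat, []))
            )
    return arcs


arcs = candidate_arcs(CATEGORY_OF, CATEGORY_GRAPH)
print(len(arcs))  # 14
```

Only variable pairs whose categories are linked in the graph are ever evaluated, which is how the category structure cuts down the number of comparisons.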
Basic Dataflow Bayesian Network Construction Training Data Test data Bayesian Network Evaluation
Basic Dataflow Bayesian Network Construction Training Data Test data Bayesian Network Evaluation Domain Knowledge variable categories Dependency evaluation Domain Knowledge model • Weighted Domain Knowledge model Domain Knowledge graph
Dependency evaluation Compensation Inflation Interest PrevCompens Capital Form Prev Cap From Cap-labor ratio
Arc Evaluation 1− > min • Standard Error for Least Squares Linear Regression (STE) • Consistent with Correlation Coefficient • Consistent with Mutual Information Coefficient • Non-Symmetric 1− Capital Form ? Prev Cap From Weight
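A rough sketch of an STE-based arc weight, assuming the weight is 1 minus the standard error of the estimate of a simple least-squares regression of the child on the parent, with both variables already scaled to [0, 1]. The function names and the scaling assumption are illustrative, not taken from the thesis.

```python
import math


def standard_error_of_estimate(x, y):
    """Standard error of the least-squares line predicting y from x.

    Requires more than two samples and a non-constant x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


def arc_weight(parent, child):
    """Weight of the arc parent -> child, taken here as 1 - STE (assumed form)."""
    return 1.0 - standard_error_of_estimate(parent, child)
```

Because the residuals are measured in the units of the predicted variable, `arc_weight(a, b)` and `arc_weight(b, a)` generally differ, which matches the "Non-Symmetric" bullet.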
Dependency evaluation (cont.) Compensation Inflation Interest PrevCompens Capital Form Prev Cap From Cap-labor ratio Weight
Dependency evaluation (cont.) PrevCompens Prev Cap From Inflation Compensation Inflation Interest PrevCompens Compensation Capital Form Capital Form Interest Cap-labor ratio Prev Cap From Cap-labor ratio Threshold
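The thresholding step suggested by this slide can be sketched as a filter over weighted arcs. The weights below are hypothetical placeholders for illustration, not results from the thesis, and `prune_arcs` is an illustrative name.

```python
def prune_arcs(weighted_arcs, threshold):
    """Keep only the arcs whose weight is strictly above the threshold."""
    return {arc: w for arc, w in weighted_arcs.items() if w > threshold}


# Hypothetical weights, for illustration only.
weights = {
    ("PrevCompens", "Compensation"): 0.91,
    ("Inflation", "Interest"): 0.78,
    ("Inflation", "Capital Form"): 0.42,
}
kept = prune_arcs(weights, threshold=0.5)
print(sorted(kept))
```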
Basic Dataflow (cont) Bayesian Network Construction Training Data Test data Bayesian Network Evaluation Domain Knowledge variable categories • Shenoy-Shafer algorithm CPT computation Dependency evaluation Domain Knowledge model • Weighted Domain Knowledge model • Untrained Bayesian Network • Trained Bayesian Network • Accuracy Domain Knowledge graph
Sample applicability 1 Background Eyes Foreground Vegetation Detection
Sample applicability 2 Molecular function Eyes Biological process Cellular Component Phenotype
Sample applicability 3 Environment Drugs Activity Health Mood
Sample applicability 3 Temp Precip Humid Coffee Speed Alcohol illegal Posture Talking Heart Mood Survey Body Temp Breath
Dataset Dataset Motivation Related work in the domain Missing value problem Domain Knowledge models
The economic prediction problem Dot com crash Great Recession Thai Dong collapse
Machine Learning economic prediction (manually designed, algorithmically trained) Shaaf, Mohamed, 2000 Gonzalez, Steven, 2000 US Treasury Bond Prev. year GDP Growth rate Prev Employ Cons. Confid. Real return Curr employ Gov spend. Curr. Year GDP Curr. Year GDP
The Smets-Wouters economic model for the United States, 2007 PrevCompensation Prev Capital Formation Inflation Prev Capital PrevConsumption PrevInvestment Capital-labor ratio Capital Formation Compensation Interest Exogenous Capital Workers Consumption Investment GDP
Data availability for each Smets-Wouters variable, United States PrevCompensation Prev Capital Formation Inflation Prev Capital PrevConsumption PrevInvestment Capital-labor ratio Capital Formation Compensation Interest Exogenous Capital Workers Consumption Investment GDP
Data availability for each Smets-Wouters variable, global average PrevCompensation Prev Capital Formation Inflation Prev Capital PrevConsumption PrevInvestment Capital-labor ratio Capital Formation Compensation Interest Exogenous Capital Workers Consumption Investment GDP
Smets-Wouters categories Resource Restrictions Smoothing PrevCapFrom PrevCompens Inflation Estimation Parameter Stickiness Prev Capital Resource Restrictions Capital Form Prev Invest PrevConsump Cap-labor ratio Compensation Interest Estimation Parameters Economy Capital Investment Exogenous GDP Consumption Workers
Proposed Smets-Wouters Domain Knowledge (revisited) Resource Restrictions Smoothing Estimation Parameter Stickiness Resource Restrictions Estimation Parameters Economy
Smets-Wouters Domain Knowledge model representation (revisited) Resource Restrictions Smoothing Categorization hashmap Category relation Graph Inflation Res RestrSmth PrevCompens Prev Cap From PrevConsump Estimation Parameter Stickiness Est Para Sticki Resource Restrictions Prev Invest Prev Capital Res Restr Compensation Interest Capital Form Est Para Cap-labor ratio Consumption Estimation Parameters Economy Investment Econ Workers Capital GDP
Proposed UNESCO Domain Knowledge (revisited) Previous Year Economy Innovation Education Production Economy
UNESCO Domain Knowledge model representation (revisited) Previous Year Economy Categorization hashmap Category relation Graph Prev GDP Prev Year Econ Prev Growth Primary Innovation Educ Secondary Education Tertiary Industry Prod Services Unemployment Innov Journal Economy Production Trademarks Government Econ Prev GDP Prev Growth