Explore the role of machine learning in economic statistics transformation, challenges faced, applications for statistics, neural networks insights, model comparisons, validation techniques, and practical examples using trade and country data sets.
Machine learning – Applications to economic analysis
Andrew Banks
Economic Statistics Transformation Programme
Office for National Statistics
Machine learning – the background
• Rapid developments in computing power and in algorithms
• Vast increase in open data
• Open source machine learning libraries
Machine learning – the challenges
• Most algorithms focus on three key areas: regression, classification and clustering
• Best suited to prediction, not to understanding causal relationships
• Acquiring adequate data
• Identifying meaningful, novel aspects in the data
Machine learning – applications to statistics
• Predictive models with great out-of-sample accuracy
• Missing data imputation
• Change and deviation detection – uncovering data records that suspiciously diverge from the pattern of their peers (see the sketch below)
• Numerous classification and regression applications (images, text, etc.)
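As an illustration of deviation detection, the sketch below flags records whose values diverge from their peers using scikit-learn's IsolationForest. The data is synthetic, not the ONS trade data, and the variable names are placeholders.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic records standing in for economic data; five rows are shifted
# so that they diverge suspiciously from the pattern of their peers.
rng = np.random.default_rng(0)
records = rng.normal(size=(500, 4))
records[:5] += 6.0

# IsolationForest labels anomalous rows with -1 and typical rows with 1.
flags = IsolationForest(random_state=0).fit_predict(records)
print("records flagged as anomalous:", int((flags == -1).sum()))
```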
Practical example – Trade data
• Country data: size, area and income (GDP); distance from the UK; world development indicators (e.g. airports per sq km)
• Commodity data: world import demand; average tariff; commodity description
Country characteristics
• Distance from the UK (gravity model)
• Size of country (area, GDP, GDP per capita)
• Other characteristics (language, continent)
Commodity information
• Conventional rate of duty
• World trade in those commodities
• Key words / features
Figure: gravity model of UK trade in goods, natural log scale (specification sketched below)
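The gravity model referenced above is usually estimated in natural logs. A minimal sketch of the specification, assuming the standard size-and-distance form (the exact variable set used in the ONS model is not reproduced here):

$$\ln(\text{Exports}_{ij}) = \beta_0 + \beta_1 \ln(\text{GDP}_j) + \beta_2 \ln(\text{Distance}_{ij}) + \varepsilon_{ij}$$

Trade is expected to rise with the partner country's economic size and fall with its distance from the UK.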
Overview of neural networks
• Various input feeds
• A set of weights
• An 'activation function' that determines whether a neuron 'fires'
• A single-layer 'perceptron' is the most straightforward neural network (shown right; a minimal sketch follows below)
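A minimal sketch of a single-layer perceptron, using made-up input values and weights, to illustrate the weighted sum and 'firing' step described above.

```python
import numpy as np

# Activation function: the neuron 'fires' (returns 1) if the weighted sum is positive.
def step(z):
    return 1 if z > 0 else 0

x = np.array([1.0, 0.5, -0.2])   # input feeds (illustrative values)
w = np.array([0.4, -0.1, 0.8])   # set of weights (illustrative values)
b = 0.1                          # bias term

# Weighted sum of the inputs, passed through the activation function.
output = step(np.dot(w, x) + b)
print(output)
```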
‘Deep learning’ > 1 layer = Deep neural network Feed forward or backward Mainly used to classify images or in large recommendation models. Uses backpropagation and gradient descent to arrive at a local optimum solution
Advantages
• OLS is a single weighted sum of features and cannot deal with non-linearities
• Polynomial regression can control for this, but requires assumptions about the model structure
• Neural networks can model non-linearities automatically
• Can handle continuous and categorical variables together easily
• Far better out-of-sample prediction accuracy!
Disadvantages
• A network of units connected by weighted links is difficult for humans to interpret
• More than one local optimum solution
• Requires the tuning of various parameters, such as the number of hidden neurons, layers and iterations to solve (a tuning sketch follows below)
• More sensitive to how features are scaled
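One common way to tune these parameters is a grid search. The sketch below uses scikit-learn's GridSearchCV on synthetic stand-in data (the real ONS features are not reproduced), varying the hidden layer sizes and the number of training iterations.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the trade features and log export values.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Scale the features first, since neural networks are sensitive to feature scaling.
pipe = make_pipeline(StandardScaler(), MLPRegressor(activation="logistic", random_state=0))

# Candidate settings for the parameters listed above.
param_grid = {
    "mlpregressor__hidden_layer_sizes": [(50,), (100,), (100, 50)],
    "mlpregressor__max_iter": [500, 1000],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```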
Validation – using a test / train split (90%, 10%)
Figure: training dataset and test dataset, each plotted against log distance (split sketched below)
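A minimal sketch of the 90% / 10% split with scikit-learn's train_test_split; the small DataFrame and its column names are hypothetical stand-ins for the country-level trade table.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the country-level trade table used on the slides.
df = pd.DataFrame({
    "log_distance": [7.1, 8.3, 6.9, 9.0, 7.7, 8.8],
    "log_gdp_pc":   [9.5, 8.1, 10.2, 7.4, 9.9, 8.6],
    "log_exports":  [5.2, 3.1, 6.0, 2.4, 5.5, 3.0],
})

# Hold out 10% of the rows for testing, as on the slide.
train, test = train_test_split(df, test_size=0.1, random_state=0)
print(len(train), len(test))
```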
OLS, training data, R^2 = 0.307
Figure: training dataset and filled values on the training dataset, plotted against log distance
Variables used: distance from the UK, GDP per capita, and size of country
OLS, test data, R^2 = 0.172
Figure: test dataset and filled values on the test dataset, plotted against log distance
Variables used: distance from the UK, GDP per capita, and size of country (OLS benchmark sketched below)
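A sketch of the OLS benchmark, fitted with scikit-learn's LinearRegression on synthetic data standing in for distance, GDP per capita and country size; the R^2 values it prints will not match the slide's figures.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the three country-level features.
X, y = make_regression(n_samples=200, n_features=3, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
print("train R^2:", ols.score(X_train, y_train))   # in-sample fit
print("test R^2:", ols.score(X_test, y_test))      # out-of-sample fit
```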
Machine learning libraries – Python
• TensorFlow – machine learning Python library, maintained by Google
• Scikit Learn – open source, wider breadth of tools
• Network used: 100 hidden layers, logistic activation function (a sketch follows below)
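A hedged sketch of how such a network could be fitted with scikit-learn's MLPRegressor; the slides mention both TensorFlow and Scikit Learn without giving code, and hidden_layer_sizes=(100,) below gives a single hidden layer of 100 neurons with a logistic activation, which is one plausible reading of the network described. The data is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data; the slides' own features are distance, GDP per capita and size.
X, y = make_regression(n_samples=500, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Scaling plus a neural network with a logistic activation function.
dnn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                 max_iter=2000, random_state=0),
)
dnn.fit(X_train, y_train)
print("test R^2:", dnn.score(X_test, y_test))
```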
DNN, training dataset, R^2 = 0.993
Figure: training dataset and filled values on the training dataset, plotted against log distance
Variables: distance from the UK, GDP per capita, and country size
• Network used: 100 hidden layers, logistic activation function
DNN, test dataset, R^2 = 0.994
Figure: test dataset and filled values on the test dataset, plotted against log distance
Variables: distance from the UK, GDP per capita, and country size
• Network used: 100 hidden layers, logistic activation function
Practical example – Larger scale dataset
Full country by commodity dataset (280,000 samples)
Figure: scatter plot of UK exports, country by commodity (x-axis = log world import demand for the commodity, y-axis = log export value, colour = country)
Practical example – Larger scale dataset
Full country by commodity dataset (280,000 samples)
• Example commodities: parrots, parakeets, macaws and cockatoos; throat pastilles and cough drops; wooden furniture for bedrooms (excl. seats); brass wind instruments
• Commodities × all countries = 290,000 combinations
• New variables: world import demand of the commodity; tariff on exporting a specific commodity to a particular country
OLS, R^2 = 0.170 (50% train/test split)
Variables used: distance from the UK; GDP per capita; size of country; import demand for the commodity; tariff rate on the commodity
Figure: filled values on the test dataset, plotted against log predicted value
DNN, R^2 = 0.991 (50% train/test split)
Variables used: distance from the UK; GDP per capita; size of country; import demand for the commodity; tariff rate on the commodity
Figure: filled values on the test dataset, plotted against log predicted value
Machine learning and causality
• Coefficients on specific neurons are difficult to interpret
• In addition, the behaviour of certain neurons may not adhere to any behaviour one might expect
• However, new tools are being developed to enforce certain constraints within neural networks (e.g. monotonicity conditions)
Figure: estimated values of trade with a fake set of countries (x-axis = predicted value of exports (£), y-axis = distance from the UK, natural log)
Machine learning and causality
• While DNNs have the benefit of excellent predictive power, this can be a consequence of capturing spurious relationships, so causal relationships cannot be inferred from the results.
• The machine learning community is also looking at how results can be used to infer causal relationships.