Datamining Methods for Demand Forecasting at National Grid Transco. David Esp A presentation to the Royal Statistical Society local meeting of 24 February 2005 at the University of Reading, UK. Contents. Introduction National Grid Transco The Company Gas Demand & Forecasting Datamining
Datamining Methods for Demand Forecasting at National Grid Transco. David Esp. A presentation to the Royal Statistical Society local meeting of 24 February 2005 at the University of Reading, UK.
Contents • Introduction • National Grid Transco • The Company • Gas Demand & Forecasting • Datamining • Especially Adaptive Logic Networks • Datamining for Gas Demand Forecasting • Framing the Problem • Data Cleaning • Model Inputs • Model Production • Scope for Improvement • Conclusions
National Grid Transco (NGT) • Part of the NGT Group (www.ngtgroup.com) • NGT Group has interests around the globe, particularly the US • NGT-UK consists of: • National Grid (NG): Electricity transmission (not generation or distribution) • Transco (T): Gas transmission
Introduction to: Gas Demand and its Forecasting at National Grid Transco
Breakdown of Demand • National Transmission System (NTS) • Many Large industrials • Large industrials • Gas-fired power stations • 13 Local Distribution Zones (LDZs) • Mostly domestic • The presentation will focus on models for this level only.
Forecasting Horizons • Within day - at five different times • Day Ahead • Up to one week ahead
What Factors Drive Gas Demand? • Weather • Thermostats • Heat leakage from buildings • Heat distribution in buildings (hot air rises) • Gas-powered plant efficiencies • Consumer Behaviour • Season (e.g. stay indoors when dark) • Holidays • Weather-Influenced Consumer Behaviour • Perception of weather (actual or forecast) • Adjustment of thermostats
Weather • Temperature (a change of 1ºC = 5 to 6% change in demand) • Wind (above 10 knots, a change of 1 knot = 0.5% change in demand) • Cooling Power - wind-chill (a function of wind and temperature) • (+ Straight, delayed and moving-average derivations of all the above).
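The "delayed and moving average derivations" mentioned above can be sketched as follows. This is a minimal illustration only: the function names, the lag of one step, the three-step window and the temperature values are all assumptions, not details from the presentation.

```python
def lagged(series, lag):
    """Return the series delayed by `lag` steps; the first `lag` entries are None."""
    return [None] * lag + list(series[:len(series) - lag])


def moving_average(series, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


# Made-up daily temperatures (degrees C), oldest first.
temps = [4.0, 6.0, 5.0, 3.0, 2.0]
temp_lag1 = lagged(temps, 1)         # yesterday's temperature
temp_ma3 = moving_average(temps, 3)  # mean of the last three days
print(temp_lag1)
print(temp_ma3)
```

Each derived series could then be offered to a model as a candidate input alongside the straight (undelayed) value.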
Demand Temperature Effects [Figure: Demands plotted against Temperatures, with annotated sensitivities of 1ºC = 3%, 1ºC = 6% and 1ºC = 2% on different parts of the curve.]
Seasonal Temperature Sensitivity of Gas Demand [Figure: axes labelled Percentage Change in Demand and Millions of Cubic Metres (mcm).]
Consumer Behaviour • Seasonal Transitions (Autumn and Spring) • Bank Holidays (Typically -5 to -20% variation) • Adjust thermostats & timers in (delayed) response to weather. • e.g. protracted or extreme cold spells • Weather Forecast Effects • Special Events
Datamining • A generally accepted definition: • “The non-trivial extraction of implicit, previously unknown and potentially useful information from data”[Frawley, Piatetsky-Shapiro & Metheus] • In practice: • The use of novel computational tools (algorithms & machine power). • “Information” may include models, such as neural networks. • A higher-level concept, of which Datamining forms a (key) part: • Knowledge Discovery from Databases (KDD) • Relationship: Knowledge > Information > Data
Datamining Techniques • What are they? • Relatively novel computer-based data analysis & modelling algorithms. • Examples: neural nets, genetic algorithms, rule induction, clustering. • In existence since the 1960s, popular since 1995. • What advantages have they over traditional methods? • More automatic • Less reliance on forecasting expertise. • Fewer man-hours (more computer-hours) • Potentially more accurate • New kinds of model, more accurate than existing ones • Greater accuracy overall, when used in combination with existing models • Knowledge discovery might lead to improvements in existing models.
Core Methods & Tools • Data Cleaning • Self-Organizing Map • Used to highlight atypical demand profiles and cluster typical ones • Adaptable (Nonlinear Nonparametric) Modelling • Adaptive Logic Network (ALN) • Automatically produces models from data. • “Better than a Neural Network” • Input Selection • Genetic Algorithm (GA) • Selects best combination of input variables for model • Also optimizes an ALN training parameter - learning rate
Experience • 1995-1999: Financial, electrical & chemical problems. • 1999: Diagnosis of Oil-Filled Equipment (e.g. supergrid transformers) by Kohonen SOM. • 2000: Electricity Demand Forecasting • Encouraging results • Business need disappeared • 2001-2: EUNITE Datamining competitions • 2003: Gas Demand Forecasting Experiments • 2004: Gas Demand Forecasting models in service • 2005: More gas models, also focusing on wind power.
Introduction to Datamining: Nonlinear Nonparametric Models. The core datamining method applied to gas demand forecasting.
Some Types of Problem • Linear - e.g. y=mx+c • Non-Linear and Smooth • Monotonic - e.g. y=x^3 • Non-Monotonic - e.g. y=x^2 • Discontinuous (in gradient) - e.g. y=max(0,x) • We might not know the type of function in advance.
Parametric Modelling [Figure panels: Linear (1st Order Polynomial) Fit; 3rd Order Polynomial Fit]
Non-Parametric Modelling [Figure panels: One Linear Segment; Two Linear Segments] • Linear Segmentation is not the only non-parameterised technique. • The key feature is growth - hence no constraint on degrees of freedom.
Non-Parametric Modelling [Figure panels: Three Linear Segments; Four Linear Segments] • No need for prior knowledge of the nature of the underlying function. • The underlying function does not have to be smooth, monotonic etc.
Parametric Modelling Method • A formula is known or at least assumed • Typically a polynomial (e.g. linear). • May be any kind of formula. • Can be discontinuous. • Model complexity is constrained • Tends to make the training process robust and data-thrifty. • A model of complexity exactly as required by the problem should be slightly more accurate than a non parametric model, which can only approximate this degree of complexity. • Specialist regression tools can be applied for different classes of function • linear (or linearizable), smooth, discontinuous...
Parametric Modelling Method: e.g. Multiple Linear Regression • Advantages: • Extremely fast both to ‘train’ and use • If well-tailored to the problem, should give optimal results. • Disadvantages: • Requires uncorrelated inputs • Assumptions about data distributions
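As a rough illustration of how quickly a multiple linear regression can be 'trained', the sketch below fits one by least squares with NumPy. The data are synthetic and the coefficients (100, -5, 0.5) are invented for the example; nothing here is taken from the NGT models.

```python
import numpy as np

# Made-up, noise-free data: demand = 100 - 5 * temperature + 0.5 * wind
temperature = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 8.0])
wind = np.array([12.0, 3.0, 8.0, 15.0, 5.0, 20.0])
demand = 100.0 - 5.0 * temperature + 0.5 * wind

# Design matrix: a column of ones for the intercept, then one column per input.
X = np.column_stack([np.ones_like(temperature), temperature, wind])

# Ordinary least squares fit; 'training' is a single linear-algebra call.
coeffs, *_ = np.linalg.lstsq(X, demand, rcond=None)
intercept, b_temp, b_wind = coeffs
print(intercept, b_temp, b_wind)
```

Because the synthetic data are exactly linear, the fit recovers the invented coefficients to within rounding error; real demand data would not behave so neatly.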
Non Parametric Modelling: Benefits • Advance knowledge of the problem is not required • Domain-specific knowledge, though helpful, is not vital. • No assumptions about population density or independence of inputs. • Model complexity is unconstrained • Advantage: Model may capture unimagined subtleties. • Disadvantages • Training demands greater time, data volume and quality. • Model may grow to become over-complex, e.g. fitting every data point. • Additional possibilities: • Feasibility Study • Determine if any model is possible at all. • Knowledge Discovery • Analyze the model to determine an equivalent parametric model.
Non-Parametric Modelling: Issues • Might not be completely flexible; learning algorithm may have limitations. • We may need to partition the problem manually. • The model might not generalize to the extent theoretically possible. • Much greater need for training data. • Can over-fit (resulting in errors): Extra measures needed to prevent this. • Longer training time (may not be an issue).
Introduction to Datamining: Nonlinear Nonparametric Models: Under, Optimal and Over Fitting. This section applies to many nonlinear nonparametric modelling methods, not just neural networks.
Example: Underlying (2-D) Function. A privileged view - we would not normally know what the function looked like... z = 1000 sin(0.125 x) cos(4 / (0.2 y + 1))
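The example function above can be written down directly, which is handy for generating synthetic training data. In this sketch the sampling grid (x and y from 0 to 40 in steps of 10) is an assumption; the slide does not state the ranges used.

```python
import math


def underlying(x, y):
    """z = 1000 sin(0.125 x) cos(4 / (0.2 y + 1)); undefined at y = -5."""
    return 1000.0 * math.sin(0.125 * x) * math.cos(4.0 / (0.2 * y + 1.0))


# Sample the surface on a coarse, assumed grid of (x, y) points.
samples = [
    (x, y, underlying(x, y))
    for x in range(0, 50, 10)
    for y in range(0, 50, 10)
]
print(len(samples))
```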
Undertrained Model. ALN model with 24 segments, i.e. planes. Too angular (from privileged knowledge).
Optimally Trained Model. ALN model with 300 planes. Looks very similar to our defined function.
Overtrained Model. An ALN with 1500 planes “joins the dots” of the data instead of generalising.
Determining Optimality of Fit • The function is not known in advance • Might be smooth, might be wrinkly - we don’t know. • What are our requirements on the model? • What degree of accuracy is needed? • Any constraints on shape or rates-of-change? • How do we assess the model’s quality? • Test against a held-back set of data • Analyze the model’s characteristics • Assumes we know what to require or expect. • e.g. Sensitivity to inputs (at various parts of the data space) • e.g. Cross-sections (of each variable, for different set-points of the other variables)
Traditional Cross-Validation • Validate on data that is randomly or systematically selected from the same period as the training data. • Train on the training data (grey) until error is least on the cross-validation data (blue). • Actual use will be in the future (green), on data which is not yet available. [Figure label: Future data (unavailable)]
Back-Validation • Validate on data that, relative to the training data, is as old as the future is new. • Train on the training data (grey) until error is least on the back-validation data (blue). • Reason: like the future data, the back-validation data is an edge. • This method has been proven by experiment to be superior to traditional cross validation for both gas and electricity problems. [Figure labels: Back-val. data; Training (regression) data; Future data (unavailable)]
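One way to picture the back-validation split is as a purely chronological cut, with the oldest block held out rather than a random sample. The sketch below assumes the records are already sorted oldest first; the function name and the block size are illustrative choices, not details from the presentation.

```python
def back_validation_split(records, n_backval):
    """Split time-ordered records (oldest first) into (back-val, training).

    The oldest `n_backval` records form the back-validation set, so they sit
    at the old edge of the training data just as future data sits at the new edge.
    """
    if not 0 < n_backval < len(records):
        raise ValueError("n_backval must be between 1 and len(records) - 1")
    return records[:n_backval], records[n_backval:]


# Made-up daily records labelled by day index, oldest first.
days = list(range(1, 11))
backval, training = back_validation_split(days, 3)
print(backval)   # oldest block, used only to decide when to stop training
print(training)  # more recent block, used to fit the model
```

Choosing the back-validation block to be about as long as the intended forecasting period would follow the "as old as the future is new" idea, though that sizing rule is an interpretation rather than something the slide spells out.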
Optimal and Over Training This is deliberate over-training. The optimum point is where the (purple) Back-Validation (Backval) error curve is at a minimum, namely Epoch 30. This agrees well with that of the Holdback (pseudo future) data.
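Picking the optimum point from a back-validation error curve amounts to finding the epoch with the lowest error. The sketch below uses an invented six-epoch curve purely for illustration; it is not the curve from the slide, where the minimum falls at Epoch 30.

```python
def best_epoch(backval_errors):
    """Return the 1-based epoch at which the back-validation error is lowest."""
    best_index = min(range(len(backval_errors)), key=backval_errors.__getitem__)
    return best_index + 1


# Invented errors per epoch: falling at first, then rising as over-training sets in.
backval_errors = [9.0, 6.5, 5.1, 4.8, 5.3, 6.0]
stop_at = best_epoch(backval_errors)
print(stop_at)
```

In practice the model state saved at that epoch would be the one kept, with later (over-trained) states discarded.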
Introduction to Datamining: Nonlinear Nonparametric Models: Example Algorithms
Machine Learning / Natural Computing / Basis Function Techniques • Derive models more from data (examples) than from knowledge. • Roots in nature and philosophy, e.g. artificial intelligence & robotics, but converging with traditional maths & stats. • Many types of algorithm. • Evolutionary / Genetic Algorithms • Neural Network (e.g. MLP-BP or RBF) - popular • Support Vector Machine - fashionable • Adaptive Logic Network - experience • Regression Tree • Rule Induction • Instance (Case) and Cluster Based
Introduction to Datamining: Nonlinear Nonparametric Models: Example Algorithms: Neural Networks (ANNs). Focussing on the Multi Layer Perceptron (MLP)
Neural Networks - Brief Overview (1) But how many neurons or layers? Repeatedly experiment (grow, prune…)
Neural Networks - Brief Overview (2) • Inspired by nature (and used to test it). • Output is sum of many (basis-) functions, typically S-shaped. • Each function is offset and scaled by a different amount. • Very broadly analogous to Fourier etc. • Given data, produce its underlying model.
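The description of the output as a sum of offset and scaled S-shaped functions can be made concrete for a single-input network with one hidden layer. This is a minimal sketch: the logistic sigmoid is one common choice of S-shaped function, and all weights below are invented rather than trained.

```python
import math


def sigmoid(u):
    """Logistic S-shaped basis function."""
    return 1.0 / (1.0 + math.exp(-u))


def mlp_output(x, units, bias):
    """Sum of S-shaped functions, each offset and scaled by a different amount.

    `units` is a list of (output_weight, input_scale, input_offset) triples.
    """
    return bias + sum(w * sigmoid(s * x + o) for w, s, o in units)


# Invented parameters for three hidden units.
units = [(2.0, 1.0, 0.0), (-1.5, 0.5, 1.0), (0.8, 3.0, -2.0)]
print(mlp_output(0.5, units, bias=1.0))
```

Training would adjust the triples and the bias to fit data; the number of units is the quantity the previous slide says must be found by repeated experiment.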
Introduction to Datamining: Nonlinear Nonparametric Models: Example Algorithms: Adaptive Logic Networks (ALNs)
Main Advantages over ANNs • Theoretical • No need to define anything like a number of neurons or layers • ALNs automatically grow to the required extent. • No need for ‘outer loop’ of experimentation (e.g. pruning) • Basis functions are more independent, hence: • easier and faster learning • greater accuracy • faster execution. • Less “black-box” - can be understood. • Function inversion - can run backwards.
Main Advantages over ANNs • Observed • Better accuracy: sharper detail. • Better training: faster, more reliable and more controllable.
What is an ALN? • A proprietary technique developed by William Armstrong, formerly of University of Alberta, founder of Dendronic Decision Limited in Canada. • WWW.DENDRONIC.COM • A combined set of Linear Forms (LFs) • An LF: y=offset+a1x1+a2x2+... • An ALN initially has one LF - making it the same as normal linear regression • After optimizing its own fit, each LF can divide into independent LFs. • ALNs are generated in a descriptive form that can be translated into various programming languages (e.g. VBA, C or Matlab).
Minimum (Min) & Maximum (Max) Operators in ALNs • y = Min(a,b,c) - lines cut down • y = Max(a,b,c,d) - lines cut up [Diagram labels: Inputs: ...; Linear Forms (regressions): ...; Min; Max; Output]
Min & Max Combined • LeftHump = Min(a,b,c) • RightHump = Min(d,e,f,g) • y = Max(LeftHump,RightHump) [Diagram labels: Inputs: ...; Linear Forms: ...; Min (LeftHump); Min (RightHump); Max; Output]
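The two-hump structure above can be evaluated with a few lines of code. This sketch only illustrates how Min and Max combine linear forms into a piecewise-linear output; the coefficients of the seven linear forms are invented, a single input is assumed, and this is not Dendronic's implementation or its training procedure.

```python
def linear_form(offset, weights):
    """Build an LF: y = offset + a1*x1 + a2*x2 + ..."""
    def lf(inputs):
        return offset + sum(w * x for w, x in zip(weights, inputs))
    return lf


# Invented one-input linear forms for the left hump...
a = linear_form(0.0, [1.0])    # y = x
b = linear_form(2.0, [0.0])    # y = 2
c = linear_form(6.0, [-1.0])   # y = 6 - x
# ...and for the right hump.
d = linear_form(-6.0, [1.0])   # y = x - 6
e = linear_form(3.0, [0.0])    # y = 3
f = linear_form(14.0, [-1.0])  # y = 14 - x
g = linear_form(26.0, [-2.0])  # y = 26 - 2x


def aln(inputs):
    """y = Max(LeftHump, RightHump), each hump being a Min of linear forms."""
    left_hump = min(lf(inputs) for lf in (a, b, c))
    right_hump = min(lf(inputs) for lf in (d, e, f, g))
    return max(left_hump, right_hump)


for x in (3.0, 6.0, 10.0, 12.5):
    print(x, aln([x]))
```

Each Min produces a concave (hump-shaped) piece, and the Max joins the humps, which is how a network of plain linear regressions can represent a non-monotonic function.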