Systems Research Institute, Polish Academy of Sciences
Koelpinsee 2010: Modelierung und Simulation
Forecasting of the hydraulic load of communal wastewater networks
Jan Studzinski, Marcin Stachura
IBS PAN Warszawa
studzins@ibspan.waw.pl
Contents:
• Introduction
• Time series methods
• Neural networks
• Fuzzy set models
• Results of modeling
• Final conclusions
Introduction
• A water and sewage system usually consists of three basic objects: the water supply system, the wastewater network and the sewage treatment plant.
• They are connected in series, and the operation of each one affects the functioning of the next:
  • the water production for the water network influences the hydraulic load of the wastewater network, which in turn determines the raw sewage inflow entering the sewage treatment plant;
  • the sewage inflow affects the quality of sewage purification, and fast, large inflow changes make the treatment plant harder to control.
• It is therefore important to know inflow changes in advance, so that the plant controllers can be prepared for oncoming events.
• One way to predict sewage inflow changes is to model them mathematically.
• An integrated information system for the complex management of communal waterworks is under development at the Systems Research Institute of the Polish Academy of Sciences.
• It consists of three subsystems: for the water supply system, the wastewater network and the sewage treatment plant.
• Each subsystem has a modular structure; its component modules are GIS, SCADA, optimization algorithms and mathematical models that improve the management of the basic waterworks objects.
• These mathematical models are the hydraulic models of the water and wastewater networks, the physical model of the sewage treatment plant, and the models for forecasting the hydraulic loads of the water and wastewater networks.
• To improve the control of the sewage treatment plant, models that can predict the raw sewage inflow entering the plant are recommended.
Time series methods
The main methods for time series modeling are based on the classical least squares method. Its advantages are simplicity, efficiency and a clear mathematical description. In general, the computational task of the time series methods consists of solving a system of linear algebraic equations for the model parameters. Three time series methods were used in the modeling calculations: the Kalman least squares method, the Clarke generalized least squares method and the maximum likelihood method.
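The least squares computation described above reduces to the normal equations ΦᵀΦ c = Φᵀy. As a minimal sketch, assuming the regressor matrix Φ (one row per measurement) and the output vector y are already assembled as plain Python lists, the system can be solved by Gaussian elimination. The function names `solve_linear` and `least_squares` are illustrative, not taken from the original work.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b]; copied so the caller's data is not modified.
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(Phi, y):
    """Least squares estimate c = (Phi^T Phi)^{-1} Phi^T y via the normal equations."""
    p = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * y_n for row, y_n in zip(Phi, y)) for i in range(p)]
    return solve_linear(A, b)


# Usage: three measurements, two parameters; the data satisfy y = 2*x1 + 3*x2 exactly.
print(least_squares([[1, 0], [0, 1], [1, 1]], [2, 3, 5]))  # [2.0, 3.0]
```

Forming ΦᵀΦ explicitly keeps the sketch short; for badly conditioned regressors a QR decomposition of Φ would be the more robust choice.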
The investigated process and its model are described, respectively, by [equations], with n = 1, 2, ..., N, where N is the number of measurement data points. The process and model equations can be written in matrix form: [equations]
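For orientation only, one common form such equations take is sketched below. It assumes a single-input ARX structure of order k with lags starting at 1; the symbols aᵢ, bⱼ, Φ and c₀ (the true parameter vector) are introduced here for illustration and are not the authors' original notation.

```latex
\begin{align}
  % process: true parameters a_i, b_j and correlated noise v
  y(n) &= \sum_{i=1}^{k} a_i\, y(n-i) + \sum_{j=1}^{k} b_j\, u(n-j) + v(n) \\
  % model: estimated parameters; the residual \hat{v}(n) estimates v(n)
  y(n) &= \sum_{i=1}^{k} \hat{a}_i\, y(n-i) + \sum_{j=1}^{k} \hat{b}_j\, u(n-j) + \hat{v}(n) \\
  % matrix form; row n of \Phi is [y(n-1) ... y(n-k), u(n-1) ... u(n-k)],
  % and \mathbf{c} = [\hat{a}_1 ... \hat{a}_k, \hat{b}_1 ... \hat{b}_k]^T
  \mathbf{y} &= \Phi\, \mathbf{c}_0 + \mathbf{v}, \qquad
  \mathbf{y} = \Phi\, \mathbf{c} + \hat{\mathbf{v}}
\end{align}
```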
The estimator c of the process parameters is calculated by minimizing the sum of squared residuals [equation], in which the residuals are estimates of the correlated noise v. The resulting Kalman estimator is [equation]; it is asymptotically biased. It would be asymptotically unbiased if [condition], i.e. for uncorrelated noise.
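The bias statement can be checked numerically. Substituting a process equation of the form y = Φc₀ + v into the least squares estimator gives c = c₀ + (ΦᵀΦ)⁻¹Φᵀv, so the error vanishes asymptotically only if the regressors in Φ are uncorrelated with v. Because Φ contains past outputs, this holds for white noise but not for correlated noise. The sketch below assumes a first-order process y(n) = a·y(n−1) + b·u(n−1) + v(n) with illustrative values a = 0.5, b = 1 and AR(1) noise v(n) = ρ·v(n−1) + e(n); none of these numbers come from the Rzeszow data.

```python
import random


def simulate(n_samples, a, b, rho, seed):
    """Simulate y(n) = a*y(n-1) + b*u(n-1) + v(n), where v(n) = rho*v(n-1) + e(n).

    rho = 0 gives white (uncorrelated) noise; rho close to 1 gives strongly correlated noise.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    y = [0.0]
    v = 0.0
    for n in range(1, n_samples):
        v = rho * v + rng.gauss(0.0, 0.5)
        y.append(a * y[n - 1] + b * u[n - 1] + v)
    return y, u


def ls_first_order(y, u):
    """Least squares estimate of (a, b) with regressor row [y(n-1), u(n-1)]."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for n in range(1, len(y)):
        p, q = y[n - 1], u[n - 1]
        s_yy += p * p
        s_yu += p * q
        s_uu += q * q
        r_y += p * y[n]
        r_u += q * y[n]
    det = s_yy * s_uu - s_yu ** 2
    # Cramer's rule on the 2x2 normal equations Phi^T Phi c = Phi^T y
    a_hat = (r_y * s_uu - s_yu * r_u) / det
    b_hat = (s_yy * r_u - s_yu * r_y) / det
    return a_hat, b_hat


a_white, _ = ls_first_order(*simulate(5000, a=0.5, b=1.0, rho=0.0, seed=1))
a_corr, _ = ls_first_order(*simulate(5000, a=0.5, b=1.0, rho=0.9, seed=1))
print(f"true a = 0.5 | white noise: {a_white:.3f} | correlated noise: {a_corr:.3f}")
```

With white noise the estimate lands close to 0.5; with ρ = 0.9 it is pulled well above it, however many samples are used.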
The Clarke method is the least squares method applied to the process equation with an additional description of the noise correlation in the form of the relation [equation], which in matrix form reads [equation]. The idea of this method is to transform the model equation so that the correlated noise v in it is replaced by uncorrelated noise. The parameter estimator is then asymptotically unbiased.
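The transformation can be sketched with shift-operator polynomials, assuming an ARX process written as A(z⁻¹)y(n) = B(z⁻¹)u(n) + v(n) with A(z⁻¹) = 1 − a₁z⁻¹ − ... − a_k z⁻ᵏ and B(z⁻¹) = b₁z⁻¹ + ... + b_k z⁻ᵏ. If the correlated noise is assumed to obey an autoregressive relation F(z⁻¹)v(n) = e(n) with white e, filtering all data through F yields an equation with white noise; F, fᵢ and the tilde signals are illustrative symbols, not the original notation.

```latex
\begin{align}
  A(z^{-1})\, y(n) &= B(z^{-1})\, u(n) + v(n), \qquad
  F(z^{-1})\, v(n) = e(n) \\
  % F(z^{-1}) = 1 + f_1 z^{-1} + ... + f_m z^{-m}; apply F to both sides (the operators commute)
  A(z^{-1})\, \tilde{y}(n) &= B(z^{-1})\, \tilde{u}(n) + e(n), \qquad
  \tilde{y}(n) = F(z^{-1})\, y(n), \quad \tilde{u}(n) = F(z^{-1})\, u(n)
\end{align}
```

Since F is unknown, it is usually estimated iteratively: fit A and B by least squares, fit F to the residuals, filter y and u through F, and refit until the estimates settle.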
The maximum likelihood method is the least squares method applied to the process equation with an additional description of the correlated noise v that differs from the one used in the Clarke method: the following noise equation is added to the process equation: [equation]
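In the usual maximum likelihood (ARMAX) formulation, assumed here for illustration, the correlated noise is modelled as a moving average of white Gaussian noise e, rather than through an autoregressive filter as in the Clarke sketch. With A(z⁻¹) = 1 − Σ aᵢz⁻ⁱ and B(z⁻¹) = Σ bⱼz⁻ʲ, and white Gaussian e, maximizing the likelihood is equivalent to minimizing the sum of squared prediction errors ε(n), which are computed recursively. D(z⁻¹) and dᵢ are hypothetical symbols (c is already used for the parameter estimator).

```latex
\begin{align}
  A(z^{-1})\, y(n) &= B(z^{-1})\, u(n) + v(n), \qquad
  v(n) = D(z^{-1})\, e(n) = e(n) + \sum_{i=1}^{m} d_i\, e(n-i) \\
  % prediction errors, computed recursively from the data
  \varepsilon(n) &= A(z^{-1})\, y(n) - B(z^{-1})\, u(n) - \sum_{i=1}^{m} d_i\, \varepsilon(n-i) \\
  V(\mathbf{a}, \mathbf{b}, \mathbf{d}) &= \frac{1}{2} \sum_{n=1}^{N} \varepsilon^2(n) \;\to\; \min
\end{align}
```

Because ε(n) depends nonlinearly on the dᵢ, V has to be minimized iteratively (for example with a Newton-type method), unlike the closed-form least squares estimate.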
Neural networks
Artificial neural networks try to imitate the operation of the biological neural networks of human beings. The analogy is very rough because of the large quantitative limitations of artificial networks. The human brain consists of more than 10 billion neurons, each connected to several thousand others. An artificial network usually has no more than several hundred neurons, with no more than several dozen connections per neuron. Another essential difference between a real and an artificial neural network is that the latter is divided into layers on which the neurons are placed. This structure simplifies the mathematical description of the network.
Each neuron in a layer receives the signals from the neurons in the preceding layer. The incoming signals are multiplied by weight coefficients and summed. If the resulting sum exceeds the threshold value assigned to the neuron, the neuron fires. The summed signal is converted by the transition (activation) function of the neuron, and the value computed by this function is the output signal of the neuron. The most commonly used transition function in neural networks is the nonlinear sigmoidal function.
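The computation of a single neuron and of a whole layer can be sketched as follows, assuming the common formulation in which the threshold is subtracted from the weighted sum before the sigmoidal transition function is applied. With a sigmoid, "firing" is gradual: the output exceeds 0.5 exactly when the sum exceeds the threshold. The function names are illustrative.

```python
import math


def sigmoid(s):
    """Logistic (sigmoidal) transition function, written to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def neuron_output(inputs, weights, threshold):
    """Weighted sum of the signals from the preceding layer, shifted by the threshold, then sigmoid."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(s - threshold)


def layer_output(inputs, weight_rows, thresholds):
    """One layer: every neuron receives all signals from the preceding layer."""
    return [neuron_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


print(neuron_output([1.0, 2.0], [0.5, 0.25], threshold=1.0))  # sum equals threshold -> 0.5
```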
When a neural network is computed, a learning process is carried out during which the final structure of the network is formed: inter-neuron connections whose weight coefficients turn out to be zero or close to zero are eliminated from the network. During the learning process, the data from the learning set are used to fit the network so that the error it generates is minimized. This error is calculated with an error function, usually the sum of squared residuals, which is typically minimized with a gradient optimization algorithm.
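A minimal sketch of this learning step: batch gradient descent on the sum of squared residuals for a single linear output neuron, followed by elimination of near-zero weights. The linear neuron, the learning rate, the number of epochs and the pruning tolerance are assumptions chosen for illustration; in a full MLP the same gradient would be propagated back through the hidden layers.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sse(X, y, w):
    """Error function: sum of squared residuals over the learning set."""
    return sum((y_n - predict(w, x_n)) ** 2 for x_n, y_n in zip(X, y))


def train_gradient_descent(X, y, lr=0.05, epochs=500):
    """Batch gradient descent on the SSE of a single linear output neuron."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x_n, y_n in zip(X, y):
            residual = y_n - predict(w, x_n)
            for i, xi in enumerate(x_n):
                grad[i] -= 2.0 * residual * xi  # d(residual^2)/dw_i
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def prune(w, tol=1e-3):
    """Eliminate connections whose weights are zero or close to zero."""
    return [0.0 if abs(wi) < tol else wi for wi in w]


X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
y = [2, 0, -1, 1]  # generated by weights (2, 0, -1): the second input is irrelevant
w = prune(train_gradient_descent(X, y))
print("SSE before:", sse(X, y, [0.0, 0.0, 0.0]), "after:", sse(X, y, w), "weights:", w)
```

The second connection converges to a weight of essentially zero and is removed by `prune`, mirroring the elimination of insignificant connections described above.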
• An essential problem in training a neural network is giving it the ability to generalize, i.e. to make correct forecasts on other data sets.
• Very often a network that is correct on the learning set gives wrong results on other data.
• This effect is called overlearning (overfitting).
• A neural network with a large number of weight coefficients or hidden layers usually tends to adjust excessively to the data instead of ignoring their insignificant variations.
• Complex neural networks almost always reach smaller errors on the learning data than simpler ones, but this usually indicates overlearning rather than better model quality.
• To avoid overlearning, a validation approach is used.
• It consists of running a simulation with the model obtained in a learning iteration on another data set (the validation set).
• If the quality of the model obtained in the learning process and the quality of the validation results are similar, the model is correct.
• When the learning error decreases in successive iterations while the validation error increases, the network is becoming overlearned.
• It then has to be simplified by reducing the number of its hidden neurons or hidden layers.
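The validation rule above can be expressed as an early-stopping check: keep the iteration with the lowest validation error and stop scanning once that error has not improved for a few consecutive iterations. The `patience` parameter and the error values below are hypothetical, not taken from the original study.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch with the best validation error, stopping the scan once the error
    has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error keeps rising: the net is starting to overlearn
    return best_epoch


def diverging(learn_errors, val_errors):
    """True if the learning error fell and the validation error rose over the last iteration."""
    return learn_errors[-1] < learn_errors[-2] and val_errors[-1] > val_errors[-2]


learn_err = [9.0, 6.0, 4.0, 3.0, 2.5, 2.1, 1.8, 1.6]  # keeps falling
val_err = [9.5, 6.5, 4.8, 4.1, 4.3, 4.6, 5.0, 5.5]    # turns upward after epoch 3
print(early_stopping_epoch(val_err), diverging(learn_err, val_err))  # 3 True
```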
• To improve the evaluation of the neural network, a third data set, the testing set, is separated from the initial measurement data.
• The model computed with the learning set and verified with the validation set is then additionally tested with the testing set.
• The test calculation is done only once, after the whole learning process is finished.
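One way to separate the three sets while keeping the time order of a daily series (such as the 974 daily measurements from Rzeszow) is a chronological split. The 50/25/25 proportions are an assumption; the original does not state them for the neural models.

```python
def chronological_split(series, learn_frac=0.5, valid_frac=0.25):
    """Split a time series into consecutive learning, validation and testing segments."""
    n = len(series)
    n_learn = int(n * learn_frac)
    n_valid = int(n * valid_frac)
    learning = series[:n_learn]
    validation = series[n_learn:n_learn + n_valid]
    testing = series[n_learn + n_valid:]  # used only once, after learning has finished
    return learning, validation, testing


days = list(range(974))  # stand-in for 974 daily measurements
learning, validation, testing = chronological_split(days)
print(len(learning), len(validation), len(testing))  # 487 243 244
```

A chronological rather than random split prevents the network from being validated or tested on days that lie between its learning days.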
Fuzzy set models
• Takagi-Sugeno-Kang (TSK) fuzzy models were applied for the modeling.
• The modeling algorithm consists of three steps:
  • fuzzification of the input data
  • fuzzy inference
  • defuzzification of the output signal
• In the fuzzification step, membership functions are used; their values range from 0 to 1, and they usually have a trapezoidal form.
• In the defuzzification step, classical linear time series models are used.
• For the modeling, the initial data are divided into two equal sets: the learning set and the testing set.
• Tasks currently performed by the system's programs (continued):
  • Failure localization (SCADA)
  • Failure localization (SCADA + MOSUW, variant I)
  • Computing node elevations (GIS + KRIPOS)
  • Calibration of the hydraulic model (MOSUW + REH)
  • Plotting maps of pressure and flow distributions (MOSUW + KRIPOS)
  • Selection of monitoring points (MOSUW)
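The three steps can be sketched for a first-order TSK model: trapezoidal membership functions for fuzzification, rule firing strengths (product of memberships) as the inference step, and a weighted average of the rules' linear models as defuzzification. The rule base, breakpoints and coefficients below are invented for illustration; the membership function assumes a < b ≤ c < d.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear on the slopes."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def tsk_predict(x_inputs, rules):
    """First-order TSK output for one input vector.

    Each rule is (membership parameters per input, linear coefficients [c0, c1, ..., ck]).
    """
    num = den = 0.0
    for mf_params, coeffs in rules:
        strength = 1.0
        for x, params in zip(x_inputs, mf_params):
            strength *= trapezoid(x, *params)  # fuzzy AND as a product
        local = coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], x_inputs))
        num += strength * local
        den += strength
    if den == 0.0:
        raise ValueError("no rule fires for this input")
    return num / den


# Two hypothetical rules on one (scaled) input: "low" and "high".
rules = [([(-1, 0, 2, 6)], [1.0, 0.5]), ([(2, 6, 10, 11)], [3.0, 1.0])]
print(tsk_predict([4.0], rules))  # both rules fire at 0.5 -> (3 + 7) / 2 = 5.0
```

For an autoregressive model, the input vector would simply contain delayed values of the output and of the inputs.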
Results of modeling
The data used for the calculations were obtained from the waterworks in Rzeszow. They are daily measurement series of:
• the water production for the communal water network,
• the raw sewage inflow reaching the sewage treatment plant,
• the rainfall for the city of Rzeszow, and
• the water level in the river flowing through the city.
Each data series contains 974 measurements.
The following criteria were used to evaluate the models: [criteria formulas]
• In the first stage of modeling, the time series methods of Kalman (K), Clarke (C) and maximum likelihood (ML) were used.
• The following models were developed:
  • with three inputs: water production (WP), rainfall (R) and river water level (WL);
  • with two inputs: WP and WL, or WP and R;
  • with one input: WP, WL or R.
• In all cases, the single model output was the raw sewage inflow.
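For the multi-input variants, each row of the regression matrix stacks the delayed output and the delayed values of every input. A sketch, assuming the same order k for all signals and lags starting at 1 (the exact structure used by the authors is not shown); `build_regressors` and the toy series are hypothetical.

```python
def build_regressors(output, inputs, k):
    """Return (Phi, target): rows [y(n-1..n-k), u1(n-1..n-k), u2(n-1..n-k), ...] and targets y(n)."""
    Phi, target = [], []
    for n in range(k, len(output)):
        row = [output[n - i] for i in range(1, k + 1)]
        for u in inputs:
            row.extend(u[n - i] for i in range(1, k + 1))
        Phi.append(row)
        target.append(output[n])
    return Phi, target


# Toy series standing in for the raw sewage inflow (output) and WP, R, WL (inputs).
inflow = [5.0, 6.0, 7.0, 8.0]
wp, r, wl = [1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 0.0, 0.0], [2.0, 2.1, 2.2, 2.3]
Phi, target = build_regressors(inflow, [wp, r, wl], k=2)
print(len(Phi), len(Phi[0]))  # 2 8  (2 rows, 2 * (1 + 3) columns)
```

Dropping an input from the list gives the two-input and one-input variants; the resulting Φ and target can be passed to any least squares routine.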
Table 1. Time series models with three inputs.
Table 2. Time series models with two inputs.
Table 3. Kalman models with one input.
Modeling results for the best Kalman, Clarke and maximum likelihood models.
First conclusions: The best modeling results were obtained with the sixth-order Kalman model using all three inputs. The simplest method, Kalman's, thus gives the best model, but the models obtained with the other methods are in general not much worse, both quantitatively (in terms of the criteria values) and qualitatively (in terms of their plots).
In the second stage of modeling, MLP neural networks were used. Here too, models with one input (WP, WL or R), two inputs (WP and WL, or WP and R) and three inputs (WP, WL and R) were tested. Different time delays were applied to the data series fed to the inputs; the number of delays corresponds to the order of the difference operators in the time series methods. The networks always have a single output, the raw sewage inflow to the sewage treatment plant. In the following, a network labeled MLP/1/3/3-6-1 denotes an MLP with a data delay (shift) of 1 day, 3 inputs, 3 neurons in the input layer, 6 neurons in the hidden layer and 1 neuron in the output layer.
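The labelling convention can be made concrete. In both labels quoted in these slides (MLP/1/3/3-6-1 and MLP/4/3/12-5-1) the input layer size equals shift × number of inputs, which suggests, as an assumption, that each input contributes one neuron per delayed sample. The sketch parses a label, checks that relation and runs one forward pass with sigmoid hidden neurons and a linear output neuron; the activation choices and the random weights are hypothetical.

```python
import math
import random


def parse_label(label):
    """Split a label such as 'MLP/4/3/12-5-1' into (shift, n_inputs, layer_sizes)."""
    kind, shift, n_inputs, layers = label.split("/")
    if kind != "MLP":
        raise ValueError(f"unsupported network type: {kind}")
    return int(shift), int(n_inputs), [int(size) for size in layers.split("-")]


def sigmoid(s):
    # Logistic function written via tanh, which cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * s))


def forward(x, weights, biases):
    """Forward pass: sigmoid on hidden layers, linear (identity) on the output layer."""
    last = len(weights) - 1
    for layer, (W, b) in enumerate(zip(weights, biases)):
        s = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]
        x = s if layer == last else [sigmoid(v) for v in s]
    return x


shift, n_inputs, sizes = parse_label("MLP/4/3/12-5-1")
assert sizes[0] == shift * n_inputs  # 12 input neurons = 4 delayed days x 3 inputs (WP, R, WL)

rng = random.Random(0)  # hypothetical untrained weights, for illustration only
weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [[0.0] * n_out for n_out in sizes[1:]]
window = [rng.random() for _ in range(sizes[0])]  # flattened, scaled input window
print(forward(window, weights, biases))  # one output value: the (scaled) inflow
```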
Second conclusions: The neural model with the full input set (MLP/4/3/12-5-1) turned out to be the best. In this model, the measurement data in the input series are shifted by four days, which corresponds to a difference operator of order 4 in the time series methods.
• In the third stage of modeling, fuzzy set models were used.
• Here, models with two inputs (WP and WL, or WP and R) and with three inputs (WP, WL and R) were tested.
• Like the time series and neural models, these models have an autoregressive structure.
Figure. The best neural network model, MLP/4/3/12-5-1.
Figure. The fuzzy set model TSK-WP-R-WL for the learning and testing data.
The fuzzy set model TSK-WP-WL for the learning and testing data.
The fuzzy set model TSK-WP-R for the learning and testing data.
Third conclusions:
• The neural model with the full input set (MLP/4/3/12-5-1) turned out to be the best.
• In this model, the measurement data in the input series are shifted by four days, which corresponds to a difference operator of order 4 in the time series methods.
Final results
Table 7. Comparison of the best time series, neural network and fuzzy set models.
The results of modeling the raw sewage inflow into a sewage treatment plant have been presented. The models were developed with time series methods, neural networks and fuzzy set methods. The results show that the simplest method, Kalman's, yields better models than the other, more complicated time series methods. The Kalman method also outperforms the more complex neural network and fuzzy set methods. However, the differences between the results of the different methods are not large. The sewage inflow models are intended for forecasting. They are to be included in an information system that improves the control of the sewage treatment plant and the management of the communal waterworks.