Outline • Time series prediction • Find k-nearest neighbors • Lag selection • Weighted LS-SVM
Time series prediction • Suppose we have a univariate time series x(t) for t = 1, 2, …, N. We want to predict the value of x(N + p). • If p = 1, this is called one-step prediction. • If p > 1, it is called multi-step prediction.
Find k-nearest neighbors • Assume the current time index is 20. • First we reconstruct the query as the delay vector of the most recent lagged values, e.g. q = (x(18), x(19), x(20)) for three lags. • Then the distance between the query and each historical delay vector is computed.
Find k-nearest neighbors • If k = 3 and the three closest neighbors are t14, t15, and t16, then we can construct a smaller data set from those neighbors and their targets.
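A minimal sketch of the neighbor search just described, assuming an embedding dimension of 3 lags and a Euclidean distance (the slide's exact query construction is not shown); the function name and the toy series are illustrative.

```python
# Sketch: build delay vectors, take the latest one as the query,
# and keep the k historical vectors closest to it.
import numpy as np

def knn_training_set(x, d=3, k=3):
    """Return the k delay vectors closest to the current query, with their targets."""
    x = np.asarray(x, dtype=float)
    # Delay vectors (x(t-d+1), ..., x(t)) with one-step-ahead target x(t+1).
    vectors = np.array([x[t - d + 1:t + 1] for t in range(d - 1, len(x) - 1)])
    targets = x[d:]                       # x(t+1) aligned with each vector
    query = x[-d:]                        # most recent d values, e.g. x(18), x(19), x(20)
    dist = np.linalg.norm(vectors - query, axis=1)   # Euclidean distance to the query
    nearest = np.argsort(dist)[:k]        # indices of the k closest neighbors
    return vectors[nearest], targets[nearest], query

# Toy usage: a noisy sine series with N = 20 samples.
rng = np.random.default_rng(0)
series = np.sin(0.3 * np.arange(1, 21)) + 0.01 * rng.standard_normal(20)
X_small, y_small, q = knn_training_set(series, d=3, k=3)
print(X_small.shape, y_small.shape)       # (3, 3) (3,) -- the smaller data set
```

The k returned rows, together with their one-step-ahead targets, form the smaller data set used to train the local model.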
Lag selection • Lag selection is the process of selecting a subset of relevant lagged inputs (features) for use in model construction. • Why do we need lag selection? • Lag selection is like feature selection, not feature extraction.
Lag selection • Usually, lag selection methods can be divided into two broad classes: filter methods and wrapper methods. • The lag subset is chosen by an evaluation criterion, which measures the relationship of each subset of lags with the target output.
Wrapper method • The best lag subset is selected according to the performance of the prediction model itself. • Lag selection is thus a part of the learning process.
Filter method • In this method, we need a criterion that measures the correlation or dependence between candidate lags and the output. • For example: correlation, mutual information, … .
Lag selection • Which is better? • The wrapper method addresses the actual prediction problem but needs more computation time. • The filter method is faster but may yield a lag subset with worse predictive performance. • We use the filter method because of our architecture.
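As a sketch of the filter idea only (not necessarily the exact procedure used here), the snippet below scores candidate lags of a toy series by their estimated mutual information with the one-step-ahead target, using scikit-learn's k-NN-based estimator; the series and the number of candidate lags are assumptions.

```python
# Sketch of filter-style lag selection: rank candidate lags by estimated MI
# with the one-step-ahead target.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
t = np.arange(500)
x = np.sin(0.2 * t) + 0.1 * rng.standard_normal(500)   # toy univariate series

max_lag = 6
# Lag matrix: column j holds x(t - (j+1)); the target is x(t).
rows = [x[max_lag - lag - 1:len(x) - lag - 1] for lag in range(max_lag)]
X_lags = np.column_stack(rows)
target = x[max_lag:]

scores = mutual_info_regression(X_lags, target)
ranking = np.argsort(scores)[::-1] + 1     # lags ordered from most to least informative
print(ranking, np.round(scores, 3))
```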
Entropy • The entropy is a measure of the uncertainty of a random variable. • The entropy of a discrete random variable X is defined by H(X) = −Σ_x p(x) log p(x), with the convention 0 log 0 = 0.
Entropy • Example: let X have a given probability distribution p(x); then H(X) follows directly from the definition above (a numerical sketch is given below).
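A minimal numerical sketch of the entropy definition above; the example distributions are illustrative assumptions.

```python
# Discrete entropy H(X) = -sum_x p(x) log2 p(x), with the convention 0 log 0 = 0.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # drop zero-probability outcomes (0 log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 outcomes -> 2.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))  # skewed distribution     -> 1.75 bits
print(entropy([1.0, 0.0]))                 # certain outcome         -> 0.0 bits
```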
Joint entropy • Definition: The joint entropy of a pair of discrete random variables (X, Y) is defined as H(X, Y) = −Σ_x Σ_y p(x, y) log p(x, y).
Conditional entropy • Definition: The conditional entropy is defined as H(Y | X) = −Σ_x Σ_y p(x, y) log p(y | x). • And the chain rule gives H(X, Y) = H(X) + H(Y | X).
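A small sketch computing the joint and conditional entropies from a joint probability table and checking the chain rule; the table values are an illustrative assumption.

```python
# Joint and conditional entropy from a joint table, with chain-rule check.
import numpy as np

p_xy = np.array([[0.25, 0.25],        # rows index X, columns index Y
                 [0.40, 0.10]])

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

H_xy = H(p_xy.ravel())                          # joint entropy H(X, Y)
H_x = H(p_xy.sum(axis=1))                       # marginal entropy H(X)
# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x)
p_y_given_x = p_xy / p_xy.sum(axis=1, keepdims=True)
H_y_given_x = float(-np.sum(p_xy * np.log2(p_y_given_x)))

print(H_xy, H_x + H_y_given_x)                  # equal, as the chain rule requires
```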
Mutual information • The mutual information is a measure of the amount of information one random variable contains about another. • It extends the notion of entropy: I(X; Y) = H(X) − H(X | Y) = H(X) + H(Y) − H(X, Y). • Definition: The mutual information of two discrete random variables X and Y is I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ].
Mutual information • Definition: The mutual information of two continuous random variables X and Y is I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy. • The problem is that the joint probability density function of X and Y is hard to estimate from data.
Binned Mutual information • The most straightforward and widespread approach for estimating MI consists in partitioning the supports of X and Y into bins of finite size and approximating I(X; Y) by the corresponding finite sum over the bins.
Binned Mutual information • For example, consider a set of 5 bivariate measurements z_i = (x_i, y_i), i = 1, 2, …, 5, whose values are (0, 1), (0.5, 5), (1, 3), (3, 4) and (4, 1) (the same points used in the k-NN example later); a sketch of the binned estimate follows below.
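A sketch of the binned estimate for these five points, assuming an equal-width partition with 2 bins per axis (the bin count is an illustrative choice, not given on the slide).

```python
# Binned MI estimate for the five example points.
import numpy as np

points = np.array([[0.0, 1.0], [0.5, 5.0], [1.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
x, y = points[:, 0], points[:, 1]

# Joint histogram over 2 x 2 bins, normalized to a probability table.
counts, _, _ = np.histogram2d(x, y, bins=2)
p_xy = counts / counts.sum()
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

# I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) ), skipping empty bins.
mask = p_xy > 0
mi = float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))
print(mi)   # binned MI estimate in bits
```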
Estimating Mutual information • Another approach estimates mutual information directly from k-nearest-neighbor distances. Consider the case with two variables: the 2-dimensional space Z is spanned by X and Y, and we compute the distance between every pair of points in Z.
Estimating Mutual information • Let us denote by ε(i) the distance from z_i to its k-th nearest neighbor, and by ε_x(i) and ε_y(i) the same distances projected into the X and Y subspaces. • Then we can count the number n_x(i) of points x_j whose distance from x_i is strictly less than ε(i), and similarly n_y(i) for y instead of x.
Estimating Mutual information • The estimate for MI is then I(X; Y) ≈ ψ(k) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩ + ψ(N), where ψ is the digamma function and ⟨·⟩ denotes the average over all points. • Alternatively, in the second algorithm, we replace n_x(i) and n_y(i) by the number of points with distance ≤ ε_x(i) and ≤ ε_y(i) respectively, and use I(X; Y) ≈ ψ(k) − 1/k − ⟨ψ(n_x) + ψ(n_y)⟩ + ψ(N).
Estimating Mutual information • For the same example, k = 2 • For the point p1(0, 1) • For the point p2(0.5,5)
Estimating Mutual information • For the point p3(1,3) • For the point p4(3,4)
Estimating Mutual information • For the point p5(4,1) • Then
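The computation sketched on the last few slides can be reproduced as follows for the five points with k = 2; the maximum norm and the digamma-based formula of the first algorithm follow the standard form of this k-NN estimator and are assumptions about the exact setup used here.

```python
# k-NN MI estimate (first algorithm) for the five example points, k = 2.
import numpy as np
from scipy.special import digamma

points = np.array([[0.0, 1.0], [0.5, 5.0], [1.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
N, k = len(points), 2
x, y = points[:, 0], points[:, 1]

nx = np.zeros(N, dtype=int)
ny = np.zeros(N, dtype=int)
for i in range(N):
    # Max-norm distances in the joint (X, Y) space; ignore the point itself.
    dz = np.max(np.abs(points - points[i]), axis=1)
    dz[i] = np.inf
    eps = np.sort(dz)[k - 1]                     # distance to the k-th nearest neighbor
    # Count neighbors strictly closer than eps in each marginal subspace.
    nx[i] = np.sum(np.abs(x - x[i]) < eps) - 1   # -1 removes the point itself
    ny[i] = np.sum(np.abs(y - y[i]) < eps) - 1

mi = digamma(k) + digamma(N) - np.mean(digamma(nx + 1) + digamma(ny + 1))
print(nx, ny, mi)    # per-point counts and the MI estimate (in nats)
```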
Estimating Mutual information • Example • a=rand(1,100) • b=rand(1,100) • c=a*2 • Then the estimated mutual information between a and c is large, while that between a and b is close to zero, since a and b are independent.
Estimating Mutual information • Example • a=rand(1,100) • b=rand(1,100) • d=2*a + 3*b • Then both MI(a, d) and MI(b, d) are clearly positive, since d depends on both a and b (see the sketch below).
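A hedged sketch reproducing these two random-signal examples with scikit-learn's k-NN-based MI estimator (mutual_info_regression); the exact numbers vary with the random draw, and only the qualitative comparison matters.

```python
# MI estimates for the two random-signal examples above.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
a = rng.random(100)
b = rng.random(100)
c = 2 * a                 # c depends only on a
d = 2 * a + 3 * b         # d depends on both a and b

print(mutual_info_regression(np.column_stack([a, b]), c))  # large MI for a, ~0 for b
print(mutual_info_regression(np.column_stack([a, b]), d))  # clearly positive for both
```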
Model • Now that we have a training data set containing the k selected records, we need a model to make the prediction.
Instance-based learning • The points that are close to the query have large weights, and the points far from the query have small weights. • Locally weighted regression • General Regression Neural Network (GRNN)
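A minimal sketch of this instance-based weighting in the GRNN / Nadaraya-Watson style, where closer instances get larger Gaussian weights; the bandwidth and toy data are illustrative assumptions.

```python
# GRNN-style prediction: Gaussian weights that decay with distance to the query.
import numpy as np

def grnn_predict(X, y, query, sigma=1.0):
    d2 = np.sum((X - query) ** 2, axis=1)          # squared distances to the query
    w = np.exp(-d2 / (2.0 * sigma ** 2))           # closer points -> larger weights
    return float(np.sum(w * y) / np.sum(w))        # weighted average of the targets

# Toy usage: three stored instances and a query between the first two.
X = np.array([[0.0], [1.0], [5.0]])
y = np.array([0.0, 1.0, 5.0])
print(grnn_predict(X, y, np.array([0.4]), sigma=0.5))   # close to 0.4
```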
Weighted LS-SVM • The goal of the standard LS-SVM is to minimize the risk function J(w, e) = (1/2)‖w‖² + (γ/2) Σ_i e_i², subject to y_i = wᵀφ(x_i) + b + e_i for i = 1, …, N, • where γ is the regularization parameter.
Weighted LS-SVM • The modified risk function of the weighted LS-SVM is J(w, e) = (1/2)‖w‖² + (γ/2) Σ_i v_i e_i², • and the constraints y_i = wᵀφ(x_i) + b + e_i remain the same, where v_i is the weight of the i-th training sample.
Weighted LS-SVM • The weights v_i are designed so that, as in the instance-based view above, training points close to the query receive large weights and distant points receive small weights; a sketch follows below.
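To make this concrete, here is a hedged sketch that solves the usual weighted LS-SVM linear system with per-sample weights v_i and an RBF kernel; the distance-based weight design, kernel choice, γ, σ, and the toy data are assumptions, not necessarily the exact choices made here.

```python
# Weighted LS-SVM regression: solve the KKT system with per-sample weights v_i.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def weighted_lssvm_fit(X, y, v, gamma=10.0, sigma=1.0):
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    # KKT system: [[0, 1^T], [1, K + diag(1/(gamma v))]] [b; alpha] = [0; y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                     # alpha, b

def weighted_lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy usage: the k neighbors found earlier would play the role of (X, y);
# the weights favor training points close to the query q.
X = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
y = np.sin(X).ravel()
q = np.array([[1.2]])
v = np.exp(-np.sum((X - q)**2, axis=1) / (2 * 0.5**2))   # larger weight near q
alpha, b = weighted_lssvm_fit(X, y, v)
print(weighted_lssvm_predict(X, alpha, b, q))            # prediction for the query
```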