Linear regression By gradient descent (with thanks to Prof. Ng’s machine learning course)
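Extending single-variable to multivariate linear regression
Single variable: $h_\Theta(x) = \Theta_0 + \Theta_1 x$
Multiple variables: $h_\Theta(x) = \Theta_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \dots + \Theta_n x_n$
For example, start with house prices versus square footage, then move to house prices versus square footage, number of bedrooms, and age of the house.
With $x_0 = 1$: $h_\Theta(x) = \Theta_0 x_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \dots + \Theta_n x_n = \Theta^T x$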
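A minimal Octave sketch of the hypothesis with the x0 = 1 convention may help here. The feature values and the parameter vector are made up for illustration; only the structure (a column of ones, then one column per feature) comes from the slide.

```octave
% Made-up data for illustration: [sqft, bedrooms, age in years]
features = [2104 3 45;
            1416 2 40;
            1534 3 30];
m = size(features, 1);          % number of training examples

% Prepend the x0 = 1 column so theta(1) acts as the intercept
X = [ones(m, 1), features];

% Arbitrary parameter vector [theta0; theta1; theta2; theta3]
theta = [50; 0.1; 20; -1];

% Vectorized hypothesis: one prediction per row of X
h = X * theta;

% The same prediction written out term by term for the first example
h1 = theta(1) + theta(2)*features(1,1) + theta(3)*features(1,2) ...
     + theta(4)*features(1,3);
```

Both forms give the same number for the first example; the matrix product simply evaluates the sum for every row at once.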
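Cost function
$J(\Theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient descent
Repeat { $\Theta_j = \Theta_j - \alpha \, \partial J(\Theta) / \partial \Theta_j$ } for all $j$ simultaneously, which gives
$\Theta_j = \Theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
Written out for the first few parameters:
$\Theta_0 = \Theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$, where $x_0^{(i)} = 1$
$\Theta_1 = \Theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$
$\Theta_2 = \Theta_2 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}$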
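The factor multiplying α in the update comes from differentiating the cost function. A short derivation, assuming the linear hypothesis $h_\Theta(x) = \Theta^T x$ from the previous slide, so that the derivative of the hypothesis with respect to $\Theta_j$ is just $x_j$:

```latex
\begin{aligned}
\frac{\partial J(\Theta)}{\partial \Theta_j}
  &= \frac{\partial}{\partial \Theta_j} \, \frac{1}{2m} \sum_{i=1}^{m}
     \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2 \\
  &= \frac{1}{m} \sum_{i=1}^{m}
     \left( h_\Theta(x^{(i)}) - y^{(i)} \right)
     \frac{\partial h_\Theta(x^{(i)})}{\partial \Theta_j} \\
  &= \frac{1}{m} \sum_{i=1}^{m}
     \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}
```

The 1/2 in the cost function cancels the 2 from the square, which is why it is included in the definition of J.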
What the Equations Mean
The matrices: y and x
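One way to lay out the matrices, assuming m training examples, n features, and the x0 = 1 convention. This is a reconstruction from the surrounding slides, not the original figure:

```latex
X = \begin{bmatrix}
1 & x_1^{(1)} & \cdots & x_n^{(1)} \\
1 & x_1^{(2)} & \cdots & x_n^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_1^{(m)} & \cdots & x_n^{(m)}
\end{bmatrix},
\quad
y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix},
\quad
\Theta = \begin{bmatrix} \Theta_0 \\ \Theta_1 \\ \vdots \\ \Theta_n \end{bmatrix}

J(\Theta) = \frac{1}{2m} (X\Theta - y)^T (X\Theta - y),
\qquad
\nabla J(\Theta) = \frac{1}{m} X^T (X\Theta - y)
```

With this layout, X is m by (n+1), and the vector XΘ - y holds one residual per training example. These are the quantities that the Octave functions in this deck compute as `X*theta - y` and `A'*X`.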
Feature Scaling
We would like all features to fall roughly in the range $-1 \le x_i \le +1$.
Replace $x_i$ with $(x_i - \mu_i)/s_i$, where $\mu_i$ is the mean of feature $i$ and $s_i$ is its range (maximum minus minimum); alternatively, use the standard deviation for $s_i$.
Don't scale $x_0$.
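As a sketch of the mean-and-range variant described above (the feature values are made up, and the variable names are chosen only for this example):

```octave
% Made-up feature matrix without the x0 column: [sqft, bedrooms]
X = [2104 3;
     1416 2;
     1534 3;
      852 1];
m = size(X, 1);

mu = mean(X);                    % per-column mean
s  = max(X) - min(X);            % per-column range (max minus min)

% Subtract the mean and divide by the range, column by column
X_scaled = (X - repmat(mu, m, 1)) ./ repmat(s, m, 1);

% Add the x0 = 1 column only after scaling, so it is left untouched
X_scaled = [ones(m, 1), X_scaled];
```

After this step every scaled feature has zero mean and spans a range of exactly 1, so all values lie within [-1, 1].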
Learning Rate and Debugging
With a small enough α, J should decrease on every iteration; this is the first test.
An α that is too large can overshoot the minimum and climb the other side of the curve.
With an α that is too small, convergence is too slow.
Try a series of α values, say 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
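The test described above can be sketched as a sweep over candidate learning rates. The data set is a made-up toy problem, and the value 3 is appended to the list only to show a rate that overshoots on this particular data; which rates diverge depends on the data and its scaling.

```octave
% Tiny made-up data set: y = 1 + 2x exactly, with x already in [-1, 1]
x = [-1; -0.5; 0; 0.5; 1];
y = 1 + 2*x;
m = length(y);
X = [ones(m, 1), x];

alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1 3];
num_iters = 20;
decreasing = false(size(alphas));   % did J go down on every iteration?
final_J = zeros(size(alphas));

for k = 1:numel(alphas)
  theta = zeros(2, 1);
  J = zeros(num_iters, 1);
  for iter = 1:num_iters
    err = X*theta - y;
    theta = theta - (alphas(k)/m) * (X' * err);   % simultaneous update
    err = X*theta - y;
    J(iter) = (err' * err) / (2*m);               % cost after the update
  end
  decreasing(k) = all(diff(J) < 0);
  final_J(k) = J(end);
  fprintf('alpha = %5.3f  J(end) = %10.4g  decreasing: %d\n', ...
          alphas(k), final_J(k), decreasing(k));
end
```

On this toy problem the smallest rates decrease J steadily but leave it far from zero after 20 iterations, while the largest rate makes J grow on every step.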
Feature Normalization
function [X_norm, mu, sigma] = featureNormalize(X)
  % Scale each column of X to zero mean and unit standard deviation.
  % mu and sigma are returned so the same scaling can be applied to new data.
  m = size(X, 1);               % number of training examples
  mu = mean(X);                 % 1 x n row of column means
  sigma = std(X);               % 1 x n row of column standard deviations
  X_norm = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);
end
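The function returns mu and sigma because a new input has to be scaled with the training statistics before a prediction is made. A sketch with made-up numbers, repeating the scaling inline so that the block runs on its own:

```octave
% Made-up training features: [sqft, bedrooms]
X = [2104 3; 1416 2; 1534 3; 852 1];
m = size(X, 1);

% Same scaling as featureNormalize, written inline
mu = mean(X);
sigma = std(X);
X_norm = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);

% A new house to predict for (hypothetical values)
x_new = [1650 3];

% Reuse the training mu and sigma; do not recompute them from x_new
x_new_norm = (x_new - mu) ./ sigma;

% Add x0 = 1 last, so the intercept term is never scaled
x_query = [1, x_new_norm];
```

The row `x_query` can then be multiplied by a fitted theta to obtain the prediction.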
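Gradient Descent
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  % Batch gradient descent for linear regression with any number of features.
  m = length(y);                        % number of training examples
  J_history = zeros(num_iters, 1);      % cost after each iteration
  for iter = 1:num_iters
    A = X*theta - y;                    % m x 1 vector of residuals
    deltatheta = (alpha/m) * (A' * X);  % 1 x (n+1) step, one entry per parameter
    theta = theta - deltatheta';        % update all parameters simultaneously
    J_history(iter) = computeCostMulti(X, y, theta);
  end
end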
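To see the loop recover known parameters, here is a sketch on noise-free made-up data with two features. The loop body is repeated inline so the block runs on its own, and the learning rate and iteration count are assumptions that happen to suit this toy problem.

```octave
% Made-up, already scaled features; targets follow y = 1 + 2*x1 - 3*x2
x1 = [-1; -1;  1; 1];
x2 = [-1;  1; -1; 1];
y  = 1 + 2*x1 - 3*x2;
m  = length(y);
X  = [ones(m, 1), x1, x2];      % x0 = 1 column first

theta = zeros(3, 1);            % start from all zeros
alpha = 0.5;                    % assumed learning rate
num_iters = 100;
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  A = X*theta - y;                          % residuals
  theta = theta - (alpha/m) * (X' * A);     % equals theta - ((alpha/m)*(A'*X))'
  A = X*theta - y;
  J_history(iter) = (A' * A) / (2*m);       % cost after the update
end

disp(theta');                   % should be close to [1 2 -3]
```

Plotting `J_history` against the iteration number is a quick way to confirm that the cost keeps falling.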
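Cost Function
function J = computeCostMulti(X, y, theta)
  % Squared-error cost J(theta) = (1/(2m)) * sum((X*theta - y).^2)
  m = length(y);                  % number of training examples
  A = X*theta - y;                % m x 1 vector of residuals
  J = (1/(2*m)) * (A' * A);       % A'*A is the sum of squared residuals
end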
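A small worked check, using made-up data, that the vectorized expression matches the element-by-element sum in the definition of J:

```octave
% Made-up data: three examples, one feature plus the x0 = 1 column
X = [1 -1; 1 0; 1 1];
y = [-1; 1; 3];
theta = [0; 0];
m = length(y);

A = X*theta - y;                          % residuals: [1; -1; -3]
J_vec = (1/(2*m)) * (A' * A);             % vectorized form, as in computeCostMulti

% Element-by-element sum from the definition of J
J_sum = 0;
for i = 1:m
  J_sum = J_sum + (X(i,:)*theta - y(i))^2;
end
J_sum = J_sum / (2*m);
```

With theta at zero the squared residuals are 1, 1 and 9, so both forms give 11/6.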
Polynomials
$h_\Theta(x) = \Theta_0 + \Theta_1 x + \Theta_2 x^2 + \Theta_3 x^3$
Replace $x$ with $x_1$, $x^2$ with $x_2$, and $x^3$ with $x_3$.
Scale the $x$, $x^2$, $x^3$ values.
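A sketch of building the polynomial features from a single made-up input and scaling them. Mean and standard deviation scaling is used here as one of the two options from the feature scaling slide.

```octave
x = (1:10)';                       % made-up single input variable
P = [x, x.^2, x.^3];               % new features x1 = x, x2 = x^2, x3 = x^3

% Before scaling, the column ranges differ by orders of magnitude
raw_range = max(P) - min(P);       % [9 99 999] for this x

% Mean / standard deviation scaling, column by column
m = size(P, 1);
mu = mean(P);
sigma = std(P);
P_norm = (P - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);

X = [ones(m, 1), P_norm];          % x0 = 1 column is added unscaled
```

The resulting X can be passed to the same gradient descent routine as any other multivariate problem, because the model is still linear in the parameters.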
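Normal Equations
$\Theta = (A^T A)^{-1} A^T y$
The last column of A is all ones:
A(:, n+1) = ones(length(x), 1, class(x));   % constant term
For a polynomial of degree n, fill the remaining columns with descending powers of x:
for j = n:-1:1
  A(:, j) = x .* A(:, j+1);                 % column j holds x.^(n+1-j)
end
W = A' * A;
Y = A' * y;
theta = W \ Y;                              % solves W*theta = Y without forming the inverse
% With this column order, theta(1) multiplies x^n and theta(n+1) is the constant term.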
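A sketch of the normal equations on made-up data generated from a known quadratic, with the built-in `polyfit` used as a cross-check. `polyfit` also returns coefficients in descending powers, so the two results are directly comparable.

```octave
% Made-up data from a known quadratic: y = 3x^2 - 2x + 1
x = (0:4)';
y = 3*x.^2 - 2*x + 1;
n = 2;                                   % polynomial degree

% Build A with descending powers of x and a final column of ones
A = zeros(length(x), n+1);
A(:, n+1) = ones(length(x), 1);
for j = n:-1:1
  A(:, j) = x .* A(:, j+1);
end

theta = (A' * A) \ (A' * y);             % normal equations
p = polyfit(x, y, n);                    % built-in least-squares fit, row vector
```

Because the data are noise-free, `theta` should come out close to [3; -2; 1], matching `p'`.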