Regression

Regression Usman Roshan CS 698 Machine Learning

Regression • Same problem as classification except that the target variable yi is continuous. • Popular solutions • Linear regression (perceptron) • Support vector regression • Logistic regression (for regression)

Linear regression • Suppose target values are generated by a function yi = f(xi) + ei • We will estimate f(xi) by g(xi,θ). • Suppose each ei is generated by a Gaussian distribution with mean 0 and variance σ² (the same variance for all ei). • Now we can ask what is the probability of yi given the input xi and parameters θ, denoted p(yi|xi,θ) • This is normally distributed with mean g(xi,θ) and variance σ².
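The noise model above can be made concrete with a minimal sketch. It assumes a hypothetical linear form for g and illustrative values for θ and σ (none of these come from the slides) and evaluates the normal density of yi given xi.

```python
import math

def g(x, theta):
    """Hypothetical linear model g(x, theta) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def gaussian_density(y, x, theta, sigma):
    """Density p(y | x, theta): normal with mean g(x, theta), variance sigma**2."""
    z = (y - g(x, theta)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

theta = (1.0, 2.0)  # assumed parameters, for illustration only
sigma = 0.5         # assumed noise standard deviation

# Density when y sits exactly on the model, and when it is one unit away.
peak = gaussian_density(g(3.0, theta), 3.0, theta, sigma)
off = gaussian_density(g(3.0, theta) + 1.0, 3.0, theta, sigma)
```

The density is largest when yi equals g(xi,θ) and falls off as the residual grows, which is what ties the likelihood to squared error.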

Linear regression • Apply maximum likelihood to estimate g(x, θ) • Assume each (xi,yi) i.i.d. • Then probability of data given model (likelihood) is P(X|θ) = p(x1,y1)p(x2,y2)…p(xn,yn) • Each p(xi,yi)=p(yi|xi)p(xi) • Maximizing the log likelihood gives us least squares (linear regression)
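The step from log likelihood to least squares can be written out as follows. This is a sketch under the slide's assumptions (i.i.d. data, Gaussian noise with fixed variance σ², and p(xi) not depending on θ); n denotes the number of training points.

```latex
\begin{align*}
\log P(X \mid \theta)
  &= \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) + \sum_{i=1}^{n} \log p(x_i) \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \bigl(y_i - g(x_i,\theta)\bigr)^2
     + \sum_{i=1}^{n} \log p(x_i) \\
\arg\max_{\theta} \log P(X \mid \theta)
  &= \arg\min_{\theta} \sum_{i=1}^{n} \bigl(y_i - g(x_i,\theta)\bigr)^2
\end{align*}
```

Only the middle term depends on θ, so maximizing the log likelihood is the same as minimizing the sum of squared residuals.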

Logistic regression • Similar to linear regression derivation • Minimize sum of squares between predicted and actual value • However • predicted is given by sigmoid function and • yi is constrained in the range [0,1]
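As a minimal sketch of the idea on this slide, the code below fits a one-feature sigmoid model by plain gradient descent on the sum of squared errors. The function names, learning rate, step count, and toy data are assumptions made for illustration, not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(xs, ys, w, w0):
    """Sum of squares between sigmoid predictions and targets in [0, 1]."""
    return sum((sigmoid(w * x + w0) - y) ** 2 for x, y in zip(xs, ys))

def fit_sigmoid_least_squares(xs, ys, lr=0.5, steps=5000):
    """Gradient descent on the mean squared error of a sigmoid model."""
    w, w0 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_w0 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + w0)
            # Chain rule: d/dz (p - y)^2 = 2 (p - y) p (1 - p)
            common = 2.0 * (p - y) * p * (1.0 - p)
            grad_w += common * x
            grad_w0 += common
        w -= lr * grad_w / n
        w0 -= lr * grad_w0 / n
    return w, w0

# Toy data (hypothetical): targets constrained to [0, 1].
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.1, 0.2, 0.5, 0.8, 0.9]
w, w0 = fit_sigmoid_least_squares(xs, ys)
```

Logistic regression is more commonly fit by maximizing a Bernoulli likelihood (cross-entropy); the squared-error objective is used here only because that is how the slide states it.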

Support vector regression • Makes no assumptions about the probability distribution of the data and output (like the support vector machine). • Change the loss function in the support vector machine problem to the ε-insensitive loss to obtain support vector regression
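The ε-insensitive loss can be sketched in a few lines. The function name and the example numbers are illustrative assumptions; the definition itself (zero inside a tube of half-width ε around the target, linear outside it) is the standard one.

```python
def epsilon_insensitive_loss(y, prediction, epsilon):
    """Zero if |y - prediction| <= epsilon, otherwise the excess over epsilon."""
    return max(0.0, abs(y - prediction) - epsilon)

inside_tube = epsilon_insensitive_loss(1.0, 1.05, 0.1)   # error 0.05, within the tube
outside_tube = epsilon_insensitive_loss(1.0, 1.5, 0.1)   # error 0.5, outside the tube
```

Points that fall inside the tube contribute nothing to the objective, which is why only the points on or outside it end up as support vectors.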

Support vector regression • Solved by applying Lagrange multipliers like in SVM • Solution w is given by a linear combination of support vectors (like in SVM) • The solution w can also be used for ranking features. • From regularized risk minimization the loss would be
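For reference, a commonly used textbook form of the regularized risk with the ε-insensitive loss is shown below. It is given as an assumption and is not necessarily the exact expression on the original slide; C is a regularization trade-off constant and n is the number of training points.

```latex
\min_{w,\,w_0} \;\; \frac{1}{2}\lVert w \rVert^2
  + C \sum_{i=1}^{n} \max\bigl(0,\; \lvert y_i - w^{T}x_i - w_0 \rvert - \varepsilon \bigr)
```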

Application • Prediction of continuous phenotypes in mice from genotype (Predicting unobserved phen…) • Data are vectors xi where each feature takes on values 0, 1, and 2 to denote number of alleles of a particular single nucleotide polymorphism (SNP) • Output yi is a phenotype value. For example coat color (represented by integers), chemical levels in blood

Mouse phenotype prediction from genotype • Rank SNPs by Wald test • First perform linear regression y = wx + w0 • Calculate p-value on w using t-test • t-test: (w − wnull)/stderr(w) • wnull = 0 • t-test: w/stderr(w) • stderr(w) given by sqrt( [Σi(yi − wxi − w0)² / (n − 2)] / Σi(xi − mean(x))² ) • Rank SNPs by p-values • OR by Σi(yi − wxi − w0)² • Rank SNPs by support vector regression (w vector in SVR) • Perform linear regression on top k ranked SNPs under cross-validation.
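The Wald-test ranking can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: function names and toy data are hypothetical, and SNPs are ranked by |t| rather than by p-value, which gives the same order when every SNP is measured on the same n samples (the two-sided p-value decreases monotonically in |t| for fixed degrees of freedom).

```python
import math

def wald_t_statistic(xs, ys):
    """t = w / stderr(w) for the simple regression y = w*x + w0 (wnull = 0)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0  # monomorphic SNP: slope undefined, treat as uninformative
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    rss = sum((y - w * x - w0) ** 2 for x, y in zip(xs, ys))
    stderr = math.sqrt(rss / (n - 2) / sxx)
    if stderr == 0.0:
        return math.copysign(math.inf, w) if w != 0.0 else 0.0
    return w / stderr

def rank_snps(genotypes, phenotype):
    """Return SNP indices ordered from strongest to weakest |t|.

    genotypes: list of SNP columns, each a list of 0/1/2 allele counts.
    """
    scores = [abs(wald_t_statistic(col, phenotype)) for col in genotypes]
    return sorted(range(len(genotypes)), key=lambda j: -scores[j])

# Toy data (hypothetical): SNP 0 tracks the phenotype, SNP 1 mostly does not.
phenotype = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]
genotypes = [
    [0, 0, 1, 1, 2, 2],
    [0, 2, 1, 0, 2, 1],
]
ranking = rank_snps(genotypes, phenotype)
```

The top k indices from such a ranking would then be used as features for the final regression, with the ranking recomputed inside each cross-validation fold so that test data never influences feature selection.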

Prediction of MCH in mouse

Prediction of CD8 in mouse
