Lecture 3: Review of Linear Algebra and simple least squares. Nine things you need to remember from Linear Algebra.
Number 1: rule for vector and matrix multiplication
u = Mv, i.e. u_i = Σ_{k=1}^N M_ik v_k
P = QR, i.e. P_ij = Σ_{k=1}^N Q_ik R_kj
The name of the index in the sum is irrelevant; you can call it anything (as long as you're consistent). The sum is always over the nearest-neighbor indices.
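As a quick numerical check of this component rule, here is a minimal numpy sketch (the matrix and vector values are arbitrary illustrations, not from the lecture):

```python
import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([5.0, 6.0])

# matrix-vector product using the library routine
u = M @ v

# the same thing written out as the component sum u_i = sum_k M_ik v_k
u_manual = np.array([sum(M[i, k] * v[k] for k in range(len(v)))
                     for i in range(M.shape[0])])

print(u, u_manual)  # both give [17. 39.]
```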
Number 2: transposition
Rows become columns and columns become rows: (A^T)_ij = A_ji
Rule for transposition of products: (AB)^T = B^T A^T. Note the reversal of order.
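A small numpy sketch (with arbitrary example matrices) confirming the order reversal in the transpose of a product:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# (AB)^T equals B^T A^T, note the reversal of order
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.allclose(lhs, rhs))  # True
```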
Number 3: rule for the dot product
a·b = a^T b = Σ_{i=1}^N a_i b_i
Note that a·a is the sum of the squared elements of a, the square of "the length of a".
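A minimal numpy sketch of the dot-product rule, using arbitrary example vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# three equivalent ways to compute a . b = sum_i a_i b_i
print(a @ b, np.dot(a, b), np.sum(a * b))   # all 32.0

# a . a is the sum of squared elements, i.e. the squared length of a
print(a @ a, np.linalg.norm(a) ** 2)        # both 14.0
```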
Number 4: the inverse of a matrix
A^-1 A = I and A A^-1 = I
(the inverse exists only when A is square and non-singular)
I is the identity matrix, e.g. for N = 3:
I = [1 0 0
     0 1 0
     0 0 1]
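A short numpy sketch (with an arbitrary invertible example matrix) showing that multiplying by the inverse on either side gives the identity:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
Ainv = np.linalg.inv(A)

# A^-1 A and A A^-1 both give the identity matrix
print(np.allclose(Ainv @ A, np.eye(2)))  # True
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```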
Number 6: multiplication by the identity matrix
M = IM = MI
In component notation, I_ij = δ_ij (the Kronecker delta, just a name), so
Σ_{k=1}^N δ_ik M_kj = M_ij
To see this, cross out the sum, cross out δ_ik, and change k to i in the rest of the equation.
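A small numpy sketch of this rule, using a random example matrix; `np.eye` builds the identity matrix whose entries are δ_ij:

```python
import numpy as np

M = np.random.rand(3, 3)
I = np.eye(3)          # I_ij = delta_ij

# multiplying by the identity on either side leaves M unchanged
print(np.allclose(I @ M, M), np.allclose(M @ I, M))  # True True
```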
Number 7: inverse of a 2×2 matrix
A = [a b
     c d]
A^-1 = 1/(ad - bc) * [ d -b
                      -c  a]
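A minimal numpy check of the closed-form 2×2 inverse against the general routine (the entries are arbitrary example values):

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 1.0, 3.0
A = np.array([[a, b], [c, d]])

# closed-form 2x2 inverse: (1 / (ad - bc)) * [[d, -b], [-c, a]]
Ainv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

print(np.allclose(Ainv, np.linalg.inv(A)))  # True
```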
Number 8: inverse of a diagonal matrix
A = [a 0 0 … 0
     0 b 0 … 0
     0 0 c … 0
     …
     0 0 0 … z]
A^-1 = [1/a  0   0  … 0
        0   1/b  0  … 0
        0    0  1/c … 0
        …
        0    0   0  … 1/z]
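A small numpy sketch of the diagonal-inverse rule, with arbitrary example diagonal entries:

```python
import numpy as np

A = np.diag([2.0, 4.0, 5.0])

# the inverse of a diagonal matrix just inverts each diagonal entry
Ainv = np.diag(1.0 / np.diag(A))

print(np.allclose(Ainv, np.linalg.inv(A)))  # True
```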
Number 9: rule for taking a derivative
Use component notation and treat every element as an independent variable.
Remember that, since the elements are independent,
dx_i / dx_j = δ_ij (the identity matrix).
Example: suppose y = Ax. How does y_i vary as we change x_j? (That's the meaning of the derivative dy_i/dx_j.)
First write the i-th component of y: y_i = Σ_{k=1}^N A_ik x_k. (We're already using i and j, so use a different letter, say k, in the summation.)
Then (d/dx_j) y_i = (d/dx_j) Σ_{k=1}^N A_ik x_k = Σ_{k=1}^N A_ik dx_k/dx_j = Σ_{k=1}^N A_ik δ_kj = A_ij.
So the derivative dy_i/dx_j is just A_ij. This is analogous to the scalar case, where the derivative of y = ax is just dy/dx = a.
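The same result can be checked numerically with finite differences; this is a minimal numpy sketch with an arbitrary example matrix A and point x:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([0.5, -1.0])
h = 1e-6

# numerical derivative dy_i/dx_j for y = Ax, which should equal A_ij
J = np.zeros_like(A)
for j in range(len(x)):
    dx = np.zeros_like(x)
    dx[j] = h
    J[:, j] = (A @ (x + dx) - A @ x) / h

print(np.allclose(J, A))  # True
```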
Best-fitting line: the combination of a_pre and b_pre that has the smallest sum of squared errors. Find it by exhaustive search ("grid search").
Fitting a line to noisy data: y_obs = a + bx. The observations are the vector y_obs.
Guess values for a and b: y_pre = a_guess + b_guess x (e.g. a_guess = 2.0, b_guess = 2.4).
Prediction error = observed minus predicted: e = y_obs - y_pre.
Total error = sum of squared prediction errors: E = Σ e_i² = e^T e.
Systematically examine combinations of (a, b) on a 101×101 grid. [Figure: the error surface E(a_pre, b_pre); the minimum total error E is marked. Note that E is not zero at the minimum.]
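A minimal Python sketch of this grid search; the synthetic data, noise level, and grid ranges are hypothetical choices for illustration, not the values used in the lecture figures:

```python
import numpy as np

# synthetic noisy data (hypothetical values, for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y_obs = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=x.size)

# 101 x 101 grid of trial (a_pre, b_pre) values
a_grid = np.linspace(0.0, 2.0, 101)
b_grid = np.linspace(1.0, 3.0, 101)

E = np.zeros((a_grid.size, b_grid.size))
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)   # prediction error
        E[i, j] = e @ e           # total error e^T e

# location of the minimum on the error surface
i_min, j_min = np.unravel_index(np.argmin(E), E.shape)
print(a_grid[i_min], b_grid[j_min], E[i_min, j_min])
```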
Note that E_min is not zero. The best-fitting a and b are at the minimum of the error surface. [Figure: error surface with the best-fitting (a, b) marked, and the corresponding best-fitting line.]
Note that there is some range of values where the error is about the same as the minimum value E_min: all a's in this range and b's in this range have pretty much the same error. [Figure: error surface with the region of near-minimum error around E_min marked.]
Moral: the shape of the error surface controls the accuracy with which (a, b) can be estimated.
What controls the shape of the error surface? Let's examine the effect of increasing the error in the data.
The minimum error increases, but the shape of the error surface is pretty much the same. [Figure: with data error = 0.5, E_min = 0.20; with data error = 5.0, E_min = 23.5.]
What controls the shape of the error surface? Let's examine the effect of shifting the x-position of the data.
There is a big change from simply shifting the x-values of the data: the region of low error is now tilted. High b with low a has low error, and low b with high a has low error, but (high b, high a) and (low a, low b) have high error.
Meaning of the tilted region of low error: the errors in (a_pre, b_pre) are correlated.
Uncorrelated estimates of intercept and slope: when the data straddle the origin, if you tweak the intercept up, you can't compensate by changing the slope. [Figure: best-fit line compared with a line having an erroneous intercept.]
Negative correlation of intercept and slope: when the data are all to the right of the origin, if you tweak the intercept up, you must lower the slope to compensate. [Figure: best-fit line compared with a low-slope line having an erroneous intercept.]
Positive correlation of intercept and slope: when the data are all to the left of the origin, if you tweak the intercept up, you must raise the slope to compensate. [Figure: best-fit line compared with a line having an erroneous intercept and the same slope as the best-fit line.]
Data near the origin: possibly good control on the intercept, but lousy control on the slope. [Figure: data clustered around x = 0, roughly -5 to 5.]
Data far from the origin: lousy control on the intercept, but possibly good control on the slope. [Figure: data clustered far from x = 0, roughly 50 to 100.]
Set up for standard least squares: d = Gm, with y_i = a + b x_i. Here d = [y_1, y_2, …, y_N]^T, m = [a, b]^T, and G is the N×2 matrix
G = [1 x_1
     1 x_2
     …  …
     1 x_N]
Standard least-squares solution: m_est = [G^T G]^-1 G^T d
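A minimal numpy sketch of this setup and solution; the (x, y) data values are hypothetical, for illustration only. The explicit normal-equations formula is shown alongside `np.linalg.lstsq`, which solves the same problem more stably:

```python
import numpy as np

# hypothetical (x, y) data, for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# build G: each row is [1, x_i], so d = G m with m = [a, b]^T
G = np.column_stack([np.ones_like(x), x])
d = y

# normal-equations solution m_est = (G^T G)^-1 G^T d
m_est = np.linalg.inv(G.T @ G) @ G.T @ d
print(m_est)                                   # [a_est, b_est]

# same answer from the library least-squares solver
print(np.linalg.lstsq(G, d, rcond=None)[0])
```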
Derivation: use the fact that the minimum is at dE/dm_i = 0.
E = Σ_k e_k e_k = Σ_k (d_k - Σ_p G_kp m_p)(d_k - Σ_q G_kq m_q)
  = Σ_k d_k d_k - 2 Σ_k d_k Σ_p G_kp m_p + Σ_k Σ_p G_kp m_p Σ_q G_kq m_q
dE/dm_i = 0 - 2 Σ_k d_k Σ_p G_kp (dm_p/dm_i) + Σ_k Σ_p G_kp (dm_p/dm_i) Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq (dm_q/dm_i)
        = -2 Σ_k d_k Σ_p G_kp δ_pi + Σ_k Σ_p G_kp δ_pi Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq δ_qi
        = -2 Σ_k d_k G_ki + Σ_k G_ki Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p G_ki
        = -2 Σ_k G_ki d_k + 2 Σ_q [Σ_k G_ki G_kq] m_q = 0
or -2 G^T d + 2 [G^T G] m = 0, or m = [G^T G]^-1 G^T d.
Why least squares? Why not least absolute value? Or something else?
[Figure: best-fitting lines from the two methods. Least squares: a = 0.94, b = 2.02. Least absolute value: a = 1.00, b = 2.02.]
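For comparison, here is a minimal numpy sketch that fits both ways; the data (including the deliberate outlier) are hypothetical, and the least-absolute-value fit is approximated with iteratively reweighted least squares rather than whatever method produced the lecture figure:

```python
import numpy as np

# hypothetical data with one outlier, for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 9.1, 20.0])   # last point is an outlier

G = np.column_stack([np.ones_like(x), x])

# least-squares (L2) fit
m_l2 = np.linalg.lstsq(G, y, rcond=None)[0]

# least-absolute-value (L1) fit, approximated by iteratively
# reweighted least squares with weights ~ 1/|residual|
m_l1 = m_l2.copy()
for _ in range(50):
    r = y - G @ m_l1
    w = 1.0 / np.maximum(np.abs(r), 1e-8)
    W = np.diag(w)
    m_l1 = np.linalg.solve(G.T @ W @ G, G.T @ W @ y)

print("L2 fit (a, b):", m_l2)
print("L1 fit (a, b):", m_l1)   # much less affected by the outlier
```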