Lecture 3: Review of Linear Algebra and simple least squares. Nine things you need to remember from Linear Algebra.
Number 1: rule for vector and matrix multiplication
u = Mv, i.e. u_i = Σ_{k=1}^N M_ik v_k
P = QR, i.e. P_ij = Σ_{k=1}^N Q_ik R_kj
The name of the index in the sum is irrelevant; you can call it anything (as long as you're consistent). The sum is always over the nearest-neighbor indices.
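As a quick numerical check of this component rule, here is a minimal numpy sketch (the matrix and vector values are arbitrary illustrations, not from the lecture):

```python
import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([5.0, 6.0])

# matrix-vector product using the library routine
u = M @ v

# the same thing written out as the component sum u_i = sum_k M_ik v_k
u_manual = np.array([sum(M[i, k] * v[k] for k in range(len(v)))
                     for i in range(M.shape[0])])

print(u, u_manual)  # both give [17. 39.]
```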
Number 2: transposition
Rows become columns and columns become rows: (A^T)_ij = A_ji
Rule for transposition of products: (AB)^T = B^T A^T. Note the reversal of order.
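A small numpy sketch (with arbitrary example matrices) confirming the order reversal in the transpose of a product:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# (AB)^T equals B^T A^T, note the reversal of order
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.allclose(lhs, rhs))  # True
```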
Number 3: rule for the dot product
a·b = a^T b = Σ_{i=1}^N a_i b_i
Note that a·a is the sum of the squared elements of a, the square of "the length of a".
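A minimal numpy sketch of the dot-product rule, using arbitrary example vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# three equivalent ways to compute a . b = sum_i a_i b_i
print(a @ b, np.dot(a, b), np.sum(a * b))   # all 32.0

# a . a is the sum of squared elements, i.e. the squared length of a
print(a @ a, np.linalg.norm(a) ** 2)        # both 14.0
```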
Number 4: the inverse of a matrix
A^-1 A = I and A A^-1 = I
(the inverse exists only when A is square and non-singular)
I is the identity matrix, e.g. for N = 3:
I = [1 0 0
     0 1 0
     0 0 1]
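A short numpy sketch (with an arbitrary invertible example matrix) showing that multiplying by the inverse on either side gives the identity:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
Ainv = np.linalg.inv(A)

# A^-1 A and A A^-1 both give the identity matrix
print(np.allclose(Ainv @ A, np.eye(2)))  # True
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```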
Number 6: multiplication by the identity matrix
M = IM = MI
In component notation, I_ij = δ_ij (the Kronecker delta, just a name), so
Σ_{k=1}^N δ_ik M_kj = M_ij
To see this, cross out the sum, cross out δ_ik, and change k to i in the rest of the equation.
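A small numpy sketch of this rule, using a random example matrix; `np.eye` builds the identity matrix whose entries are δ_ij:

```python
import numpy as np

M = np.random.rand(3, 3)
I = np.eye(3)          # I_ij = delta_ij

# multiplying by the identity on either side leaves M unchanged
print(np.allclose(I @ M, M), np.allclose(M @ I, M))  # True True
```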
Number 7: inverse of a 2×2 matrix
A = [a b
     c d]
A^-1 = 1/(ad - bc) * [ d -b
                      -c  a]
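A minimal numpy check of the closed-form 2×2 inverse against the general routine (the entries are arbitrary example values):

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 1.0, 3.0
A = np.array([[a, b], [c, d]])

# closed-form 2x2 inverse: (1 / (ad - bc)) * [[d, -b], [-c, a]]
Ainv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

print(np.allclose(Ainv, np.linalg.inv(A)))  # True
```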
Number 8: inverse of a diagonal matrix
A = [a 0 0 … 0
     0 b 0 … 0
     0 0 c … 0
     …
     0 0 0 … z]
A^-1 = [1/a  0   0  … 0
        0   1/b  0  … 0
        0    0  1/c … 0
        …
        0    0   0  … 1/z]
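A small numpy sketch of the diagonal-inverse rule, with arbitrary example diagonal entries:

```python
import numpy as np

A = np.diag([2.0, 4.0, 5.0])

# the inverse of a diagonal matrix just inverts each diagonal entry
Ainv = np.diag(1.0 / np.diag(A))

print(np.allclose(Ainv, np.linalg.inv(A)))  # True
```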
Number 9: rule for taking a derivative
Use component notation and treat every element as an independent variable.
Remember that, since the elements are independent,
dx_i / dx_j = δ_ij (the identity matrix).
Example: suppose y = Ax. How does y_i vary as we change x_j? (That's the meaning of the derivative dy_i/dx_j.)
First write the i-th component of y: y_i = Σ_{k=1}^N A_ik x_k. (We're already using i and j, so use a different letter, say k, in the summation.)
Then (d/dx_j) y_i = (d/dx_j) Σ_{k=1}^N A_ik x_k = Σ_{k=1}^N A_ik dx_k/dx_j = Σ_{k=1}^N A_ik δ_kj = A_ij.
So the derivative dy_i/dx_j is just A_ij. This is analogous to the scalar case, where the derivative of y = ax is just dy/dx = a.
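The same result can be checked numerically with finite differences; this is a minimal numpy sketch with an arbitrary example matrix A and point x:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([0.5, -1.0])
h = 1e-6

# numerical derivative dy_i/dx_j for y = Ax, which should equal A_ij
J = np.zeros_like(A)
for j in range(len(x)):
    dx = np.zeros_like(x)
    dx[j] = h
    J[:, j] = (A @ (x + dx) - A @ x) / h

print(np.allclose(J, A))  # True
```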
Best-fitting line: the combination of a_pre and b_pre that has the smallest sum of squared errors. Find it by exhaustive search ("grid search").
Fitting a line to noisy data: y_obs = a + bx. The observations are the vector y_obs.
Guess values for a and b: y_pre = a_guess + b_guess x (e.g. a_guess = 2.0, b_guess = 2.4).
Prediction error = observed minus predicted: e = y_obs - y_pre.
Total error = sum of squared prediction errors: E = Σ e_i² = e^T e.
Systematically examine combinations of (a, b) on a 101×101 grid. [Figure: the error surface E(a_pre, b_pre); the minimum total error E is marked. Note that E is not zero at the minimum.]
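A minimal Python sketch of this grid search; the synthetic data, noise level, and grid ranges are hypothetical choices for illustration, not the values used in the lecture figures:

```python
import numpy as np

# synthetic noisy data (hypothetical values, for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y_obs = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=x.size)

# 101 x 101 grid of trial (a_pre, b_pre) values
a_grid = np.linspace(0.0, 2.0, 101)
b_grid = np.linspace(1.0, 3.0, 101)

E = np.zeros((a_grid.size, b_grid.size))
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)   # prediction error
        E[i, j] = e @ e           # total error e^T e

# location of the minimum on the error surface
i_min, j_min = np.unravel_index(np.argmin(E), E.shape)
print(a_grid[i_min], b_grid[j_min], E[i_min, j_min])
```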
Note that E_min is not zero. The best-fitting a and b are at the minimum of the error surface. [Figure: error surface with the best-fitting (a, b) marked, and the corresponding best-fitting line.]
Note that there is some range of values where the error is about the same as the minimum value E_min: all a's in this range and b's in this range have pretty much the same error. [Figure: error surface with the region of near-minimum error around E_min marked.]
Moral: the shape of the error surface controls the accuracy with which (a, b) can be estimated.
What controls the shape of the error surface? Let's examine the effect of increasing the error in the data.
The minimum error increases, but the shape of the error surface is pretty much the same. [Figure: with data error = 0.5, E_min = 0.20; with data error = 5.0, E_min = 23.5.]
What controls the shape of the error surface? Let's examine the effect of shifting the x-position of the data.
There is a big change from simply shifting the x-values of the data: the region of low error is now tilted. High b with low a has low error, and low b with high a has low error, but (high b, high a) and (low a, low b) have high error.
Meaning of the tilted region of low error: the errors in (a_pre, b_pre) are correlated.
Uncorrelated estimates of intercept and slope: when the data straddle the origin, if you tweak the intercept up, you can't compensate by changing the slope. [Figure: best-fit line compared with a line having an erroneous intercept.]
Negative correlation of intercept and slope: when the data are all to the right of the origin, if you tweak the intercept up, you must lower the slope to compensate. [Figure: best-fit line compared with a low-slope line having an erroneous intercept.]
Positive correlation of intercept and slope: when the data are all to the left of the origin, if you tweak the intercept up, you must raise the slope to compensate. [Figure: best-fit line compared with a line having an erroneous intercept and the same slope as the best-fit line.]
Data near the origin: possibly good control on the intercept, but lousy control on the slope. [Figure: data clustered around x = 0, roughly -5 to 5.]
Data far from the origin: lousy control on the intercept, but possibly good control on the slope. [Figure: data clustered far from x = 0, roughly 50 to 100.]
Set up for standard least squares: d = Gm, with y_i = a + b x_i. Here d = [y_1, y_2, …, y_N]^T, m = [a, b]^T, and G is the N×2 matrix
G = [1 x_1
     1 x_2
     …  …
     1 x_N]
Standard least-squares solution: m_est = [G^T G]^-1 G^T d
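A minimal numpy sketch of this setup and solution; the (x, y) data values are hypothetical, for illustration only. The explicit normal-equations formula is shown alongside `np.linalg.lstsq`, which solves the same problem more stably:

```python
import numpy as np

# hypothetical (x, y) data, for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# build G: each row is [1, x_i], so d = G m with m = [a, b]^T
G = np.column_stack([np.ones_like(x), x])
d = y

# normal-equations solution m_est = (G^T G)^-1 G^T d
m_est = np.linalg.inv(G.T @ G) @ G.T @ d
print(m_est)                                   # [a_est, b_est]

# same answer from the library least-squares solver
print(np.linalg.lstsq(G, d, rcond=None)[0])
```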
Derivation: use the fact that the minimum is at dE/dm_i = 0.
E = Σ_k e_k e_k = Σ_k (d_k - Σ_p G_kp m_p)(d_k - Σ_q G_kq m_q)
  = Σ_k d_k d_k - 2 Σ_k d_k Σ_p G_kp m_p + Σ_k Σ_p G_kp m_p Σ_q G_kq m_q
dE/dm_i = 0 - 2 Σ_k d_k Σ_p G_kp (dm_p/dm_i) + Σ_k Σ_p G_kp (dm_p/dm_i) Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq (dm_q/dm_i)
        = -2 Σ_k d_k Σ_p G_kp δ_pi + Σ_k Σ_p G_kp δ_pi Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq δ_qi
        = -2 Σ_k d_k G_ki + Σ_k G_ki Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p G_ki
        = -2 Σ_k G_ki d_k + 2 Σ_q [Σ_k G_ki G_kq] m_q = 0
or -2 G^T d + 2 [G^T G] m = 0, or m = [G^T G]^-1 G^T d.
Why least squares? Why not least absolute value? Or something else?
[Figure: best-fitting lines from the two methods. Least squares: a = 0.94, b = 2.02. Least absolute value: a = 1.00, b = 2.02.]
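For comparison, here is a minimal numpy sketch that fits both ways; the data (including the deliberate outlier) are hypothetical, and the least-absolute-value fit is approximated with iteratively reweighted least squares rather than whatever method produced the lecture figure:

```python
import numpy as np

# hypothetical data with one outlier, for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 9.1, 20.0])   # last point is an outlier

G = np.column_stack([np.ones_like(x), x])

# least-squares (L2) fit
m_l2 = np.linalg.lstsq(G, y, rcond=None)[0]

# least-absolute-value (L1) fit, approximated by iteratively
# reweighted least squares with weights ~ 1/|residual|
m_l1 = m_l2.copy()
for _ in range(50):
    r = y - G @ m_l1
    w = 1.0 / np.maximum(np.abs(r), 1e-8)
    W = np.diag(w)
    m_l1 = np.linalg.solve(G.T @ W @ G, G.T @ W @ y)

print("L2 fit (a, b):", m_l2)
print("L1 fit (a, b):", m_l1)   # much less affected by the outlier
```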