Linear Algebra Review (with a Small Dose of Optimization) Hristo Paskov CS246
Outline • Basic definitions • Subspaces and Dimensionality • Matrix functions: inverses and eigenvalue decompositions • Convex optimization
Vectors and Matrices • Vector $x \in \mathbb{R}^n$ is an ordered list of $n$ real numbers, written as a column • May also write $x = (x_1, x_2, \dots, x_n)^T$
Vectors and Matrices • Matrix $A \in \mathbb{R}^{m \times n}$ has $m$ rows and $n$ columns • Written in terms of rows or columns: $A = [a_1 \; a_2 \; \cdots \; a_n]$ with columns $a_j \in \mathbb{R}^m$, or as stacked row vectors
Multiplication • Vector-vector: $x^T y = \sum_{i=1}^n x_i y_i$ (inner product, a scalar) • Matrix-vector: $Ax = \sum_{j=1}^n x_j a_j$, a linear combination of the columns of $A$
Multiplication • Matrix-matrix: $C = AB$ with $C_{ij} = \sum_k A_{ik} B_{kj}$; requires $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, giving $C \in \mathbb{R}^{m \times p}$
Multiplication • Matrix-matrix: entry $C_{ij}$ is the inner product of row $i$ of $A$ and column $j$ of $B$
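A quick NumPy sketch of these products (an illustration with made-up matrices, not part of the original slides):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.arange(6.0).reshape(2, 3)   # 2 x 3
B = np.arange(12.0).reshape(3, 4)  # 3 x 4

print(x @ y)   # vector-vector: inner product, a scalar (32.0)
print(A @ x)   # matrix-vector: length-2 vector, a combination of A's columns
print(A @ B)   # matrix-matrix: 2 x 4; entry (i, j) = row i of A dot column j of B
```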
Multiplication Properties • Associative: $(AB)C = A(BC)$ • Distributive: $A(B + C) = AB + AC$ • NOT commutative: in general $AB \neq BA$ • Dimensions may not even be conformable: $AB$ can be defined while $BA$ is not
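A minimal numerical check of these properties, again using hypothetical random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

print(np.allclose((A @ B) @ C, A @ (B @ C)))    # associative: True
print(np.allclose(A @ (B + C), A @ B + A @ C))  # distributive: True
print(np.allclose(A @ B, B @ A))                # commutative? generally False
```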
Useful Matrices • Identity matrix $I$: ones on the diagonal, zeros elsewhere; $AI = IA = A$ • Diagonal matrix $D = \mathrm{diag}(d_1, \dots, d_n)$: nonzero entries only on the diagonal
Useful Matrices • Symmetric: $A = A^T$ • Orthogonal: $A^T A = A A^T = I$ • Columns/rows are orthonormal • Positive semidefinite: $x^T A x \ge 0$ for all $x$ • Equivalently, there exists $B$ such that $A = B^T B$
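A short sketch showing how such matrices can be constructed and verified in NumPy (the matrices are illustrative assumptions):

```python
import numpy as np

I = np.eye(3)                   # identity matrix
D = np.diag([1.0, 2.0, 3.0])    # diagonal matrix

B = np.random.default_rng(1).standard_normal((3, 3))
A = B.T @ B                     # positive semidefinite by construction (A = B^T B)
print(np.allclose(A, A.T))                       # symmetric: True
print(np.all(np.linalg.eigvalsh(A) >= -1e-10))   # eigenvalues nonnegative: True

Q, _ = np.linalg.qr(B)          # Q has orthonormal columns
print(np.allclose(Q.T @ Q, np.eye(3)))           # orthogonal: True
```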
Outline • Basic definitions • Subspaces and Dimensionality • Matrix functions: inverses and eigenvalue decompositions • Convex optimization
Norms • Quantify “size” of a vector • Given $x \in \mathbb{R}^n$, a norm $\|\cdot\|$ satisfies: $\|x\| \ge 0$ with equality iff $x = 0$; $\|\alpha x\| = |\alpha|\,\|x\|$; $\|x + y\| \le \|x\| + \|y\|$ • Common norms: • Euclidean $\ell_2$-norm: $\|x\|_2 = \sqrt{\sum_i x_i^2}$ • $\ell_1$-norm: $\|x\|_1 = \sum_i |x_i|$ • $\ell_\infty$-norm: $\|x\|_\infty = \max_i |x_i|$
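For concreteness, a tiny NumPy sketch evaluating these three norms on a hypothetical vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, 2))       # Euclidean norm: 5.0
print(np.linalg.norm(x, 1))       # l1 norm: 7.0
print(np.linalg.norm(x, np.inf))  # l-infinity norm: 4.0
```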
Linear Subspaces • Subspace $\mathcal{S} \subseteq \mathbb{R}^n$ satisfies $0 \in \mathcal{S}$ • If $x, y \in \mathcal{S}$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha x + \beta y \in \mathcal{S}$ • Vectors $v_1, \dots, v_k$ span $\mathcal{S}$ if every $x \in \mathcal{S}$ can be written as $x = \sum_i \alpha_i v_i$
Linear Independence and Dimension • Vectors $v_1, \dots, v_k$ are linearly independent if $\sum_i \alpha_i v_i = 0$ only when $\alpha_1 = \cdots = \alpha_k = 0$ • Every linear combination of the $v_i$ is unique • $\dim(\mathcal{S}) = k$ if $v_1, \dots, v_k$ span $\mathcal{S}$ and are linearly independent • If $v_1, \dots, v_m$ span $\mathcal{S}$ then $m \ge \dim(\mathcal{S})$ • If $m > \dim(\mathcal{S})$ then $v_1, \dots, v_m$ are NOT linearly independent
Matrix Subspaces • Matrix $A \in \mathbb{R}^{m \times n}$ defines two subspaces • Column space: span of the columns of $A$, a subspace of $\mathbb{R}^m$ • Row space: span of the rows of $A$, a subspace of $\mathbb{R}^n$ • Nullspace of $A$: $\{x : Ax = 0\}$ • Analog for the column space: left nullspace $\{y : y^T A = 0\}$
Matrix Rank • $\mathrm{rank}(A)$ gives the dimensionality of the row and column spaces (they are equal) • If $A \in \mathbb{R}^{m \times n}$ has rank $r$, can decompose $A$ into a product of $m \times r$ and $r \times n$ matrices: $A = UV$
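A minimal sketch of this rank-$r$ factorization idea for $r = 1$ (the vectors $u$ and $v$ are made up for illustration):

```python
import numpy as np

u = np.array([[1.0], [2.0]])      # 2 x 1
v = np.array([[3.0, 4.0, 5.0]])   # 1 x 3
A = u @ v                         # 2 x 3 matrix of rank 1
print(np.linalg.matrix_rank(A))   # 1: row and column spaces are both one-dimensional
```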
Properties of Rank • For $A \in \mathbb{R}^{m \times n}$: $\mathrm{rank}(A) \le \min(m, n)$ • $A$ has full rank if $\mathrm{rank}(A) = \min(m, n)$ • If $\mathrm{rank}(A) < m$, the rows are not linearly independent • Same for the columns if $\mathrm{rank}(A) < n$
Outline • Basic definitions • Subspaces and Dimensionality • Matrix functions: inverses and eigenvalue decompositions • Convex optimization
Matrix Inverse • $A \in \mathbb{R}^{n \times n}$ is invertible iff $\mathrm{rank}(A) = n$ • Inverse $A^{-1}$ is unique and satisfies $A^{-1} A = A A^{-1} = I$ • If $A$ is invertible then $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$
Systems of Equations • Given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, wish to solve $Ax = b$ • A solution exists only if $b$ lies in the column space of $A$ • Possibly infinite number of solutions • If $A$ is invertible then $x = A^{-1} b$ • Notational device, do not actually invert matrices • Computationally, use solving routines like Gaussian elimination
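A small sketch of solving a system with a library routine rather than an explicit inverse (the system here is hypothetical):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)          # LU-based solve; never forms A^{-1} explicitly
print(x, np.allclose(A @ x, b))    # [2. 3.] True
```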
Systems of Equations • What if $b$ is not in the column space of $A$, so no exact solution exists? • Find $x$ that makes $Ax$ closest to $b$: minimize $\|Ax - b\|_2$ • Solution $\hat{x} = (A^T A)^{-1} A^T b$, assuming $A^T A$ is invertible; $A\hat{x}$ is the projection of $b$ onto the column space of $A$, and $A(A^T A)^{-1} A^T$ is the projection matrix • Also known as least-squares regression
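A minimal least-squares sketch, assuming a tall random $A$ with invertible $A^T A$ (data is synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))   # tall matrix: more equations than unknowns
b = rng.standard_normal(10)

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solution
x_ne = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations (A^T A assumed invertible)
print(np.allclose(x_ls, x_ne))                 # True
```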
Eigenvalue Decomposition • Eigenvalue decomposition of symmetric $A$ is $A = Q \Lambda Q^T$ • $\Lambda$ is diagonal and contains the eigenvalues of $A$ • $Q$ is orthogonal and its columns contain the eigenvectors of $A$ • If $A$ is not symmetric but diagonalizable, $A = Q \Lambda Q^{-1}$ • $\Lambda$ is diagonal but possibly complex • $Q$ not necessarily orthogonal
Characterizations of Eigenvalues • Traditional formulation: $Ax = \lambda x$ with $x \neq 0$ • Leads to the characteristic polynomial $\det(A - \lambda I) = 0$ • Rayleigh quotient (symmetric $A$): $\lambda_{\max}(A) = \max_{x \neq 0} \frac{x^T A x}{x^T x}$
Eigenvalue Properties • For $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1, \dots, \lambda_n$: $\mathrm{tr}(A) = \sum_i \lambda_i$ and $\det(A) = \prod_i \lambda_i$ • When $A$ is symmetric • Eigenvalue decomposition coincides with the singular value decomposition (up to signs) • Eigenvectors for nonzero eigenvalues give an orthogonal basis for the column/row space of $A$
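A short NumPy sketch checking the symmetric eigendecomposition and the trace/determinant identities on a hypothetical matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                   # symmetrize to get a symmetric matrix

lam, Q = np.linalg.eigh(A)          # eigenvalues (ascending) and orthonormal eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))    # A = Q Lambda Q^T
print(np.isclose(lam.sum(), np.trace(A)))        # trace = sum of eigenvalues
print(np.isclose(lam.prod(), np.linalg.det(A)))  # det = product of eigenvalues
```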
Simple Eigenvalue Proof • Why does $\mathrm{rank}(A) < n$ imply $\det(A) = 0$? • Assume $A$ is symmetric and not full rank • Then $Ax = 0$ for some $x \neq 0$, so an eigenvalue of $A$ is 0 • Since $\det(A)$ is the product of the eigenvalues, one of the terms is 0, so the product is 0
Outline • Basic definitions • Subspaces and Dimensionality • Matrix functions: inverses and eigenvalue decompositions • Convex optimization
Convex Optimization • Find the minimum of a function subject to solution constraints • Business / economics / game theory • Resource allocation • Optimal planning and strategies • Statistics and machine learning • All forms of regression and classification • Unsupervised learning • Control theory • Keeping planes in the air!
Convex Sets • A set $C$ is convex if for all $x, y \in C$ and $\theta \in [0, 1]$, $\theta x + (1 - \theta) y \in C$ • Line segment between points in $C$ also lies in $C$ • Examples: • Intersection of halfspaces • Norm balls • Intersection of convex sets
Convex Functions • A real-valued function $f$ is convex if its domain $\mathcal{D}$ is convex and for all $x, y \in \mathcal{D}$ and $\theta \in [0, 1]$, $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ • Graph of $f$ is upper bounded by the line segment between points on the graph
Gradients • Differentiable convex $f$ with gradient $\nabla f(x) = \left( \tfrac{\partial f}{\partial x_1}, \dots, \tfrac{\partial f}{\partial x_n} \right)$ • Gradient at $x$ gives the linear approximation $f(y) \approx f(x) + \nabla f(x)^T (y - x)$, which lower-bounds $f$ when $f$ is convex
Gradient Descent • To minimize $f$, move down the gradient: $x \leftarrow x - \eta \nabla f(x)$ • But not too far! • Optimum when $\nabla f(x) = 0$ • Given $f$, learning rate $\eta$, starting point $x_0$: repeat $x \leftarrow x - \eta \nabla f(x)$ until $\nabla f(x) \approx 0$
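A minimal gradient descent sketch, assuming the objective is supplied via its gradient (the `gradient_descent` helper and the quadratic example are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, tol=1e-8, max_iter=10_000):
    """Step down the gradient with a fixed learning rate until the gradient is small."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - eta * g
    return x

# Example: minimize f(x) = ||x - c||^2, whose gradient is 2(x - c); the optimum is x = c.
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2 * (x - c), np.zeros(2)))
```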
Stochastic Gradient Descent • Many learning problems have extra structure: the objective is a sum over training examples, $f(w) = \sum_{i=1}^n f_i(w)$ • Computing the full gradient requires iterating over all $n$ points, which can be too costly • Instead, compute the gradient at a single training example
Stochastic Gradient Descent • Given $f = \sum_i f_i$, learning rate $\eta$, starting point $w_0$: repeat until $w$ is nearly optimal: for $i$ in random order, $w \leftarrow w - \eta \nabla f_i(w)$ • Finds a nearly optimal $w$
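A minimal SGD sketch for a least-squares objective $f(w) = \sum_i (a_i^T w - b_i)^2$ (the helper `sgd_least_squares` and its synthetic data are hypothetical, used only to illustrate the update):

```python
import numpy as np

def sgd_least_squares(A, b, eta=0.01, epochs=50, seed=0):
    """SGD on f(w) = sum_i (a_i^T w - b_i)^2, one random example per step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(b)):          # pass over examples in random order
            g_i = 2 * (A[i] @ w - b[i]) * A[i]     # gradient of the i-th term only
            w = w - eta * g_i
    return w

rng = np.random.default_rng(4)
A = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true
print(sgd_least_squares(A, b))   # approaches w_true
```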