Learn about Principal Component Analysis (PCA), a technique used in machine learning for dimension reduction. Understand the concepts, optimization objectives, and steps involved in PCA.
Visual Computing: Machine Learning Basics. Weiteng Xie
Learning Map (figure). Scenario: Supervised Learning, Semi-supervised Learning, Transfer Learning, Unsupervised Learning, Reinforcement Learning. Task: Regression, Classification, Structured Learning. Method: Linear Model; Non-linear Model (Deep Learning; SVM, decision tree, K-NN…).
Bias v.s. Variance (figure: error from bias, error from variance, and the error observed). Large bias with small variance: underfitting. Small bias with large variance: overfitting.
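A small simulation can make the two error sources concrete. This is a sketch under stated assumptions, not the lecture's experiment: the ground truth `sin(2*pi*x)`, the noise level, and the polynomial models are all hypothetical choices. A degree-1 fit plays the large-bias model and a degree-7 fit plays the large-variance model; each trial refits on a fresh noisy training set.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    # Hypothetical ground-truth function used only for this simulation.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=500, n_points=20, noise=0.3, seed=0):
    """Estimate (bias^2, variance) of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)        # fixed training inputs
    x_test = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Each trial draws a fresh noisy training set and refits the model.
        y = true_f(x) + rng.normal(0.0, noise, n_points)
        preds[t] = Polynomial.fit(x, y, degree)(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)   # error from bias
    variance = np.mean(preds.var(axis=0))                  # error from variance
    return bias_sq, variance

for degree in (1, 7):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

The simple model should show the larger bias term and the flexible model the larger variance term, matching the underfitting/overfitting labels on the slide.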
Dimension Reduction: data that looks like 3-D may actually be 2-D.
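One way to see "looks like 3-D, actually 2-D" numerically, taking the simplest, linear case as an assumption: hypothetical 3-D points are generated from 2 underlying coordinates through a made-up 2x3 linear map. After centering, only two singular values are non-negligible, i.e. only two directions carry any variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 points driven by 2 hidden coordinates ...
latent = rng.normal(size=(200, 2))
# ... mapped linearly into 3-D (this 2x3 matrix is an arbitrary choice).
mapping = np.array([[1.0, 0.5, -0.3],
                    [0.2, -1.0, 0.8]])
points = latent @ mapping                  # shape (200, 3): "looks like 3-D"

centered = points - points.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
print(singular_values)  # the third value is ~0: only 2 directions carry variance
```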
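Why? In essence, PCA projects high-dimensional data onto a lower-dimensional space through a linear transformation. Its main purposes can be summarized in two points: 1. noise reduction; 2. removal of redundancy.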
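1. Vector representation and change of basis
X = (3,2). To describe a vector precisely, first fix a basis, then give the vector's projection onto the line of each basis vector. For a unit basis vector, that projection value is simply the inner product of the vector with the basis vector; in the standard basis (1,0), (0,1) the projections of X are 3 and 2.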
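The same vector X = (3, 2) can be re-expressed in any other orthonormal basis by taking inner products. The 45-degree rotated basis below is an illustrative choice, not one fixed by the slide; each basis vector is stored as a row.

```python
import numpy as np

x = np.array([3.0, 2.0])  # X = (3, 2) in the standard basis

# Illustrative orthonormal basis rotated by 45 degrees; one basis vector per row.
B = np.array([[1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2)

# Coordinate along each unit basis vector = inner product with x.
coords = B @ x
print(coords)  # approximately (5/sqrt(2), -1/sqrt(2)) = (3.5355, -0.7071)

# Because B is orthonormal, B.T maps the new coordinates back to x.
print(B.T @ coords)
```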
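To transform M N-dimensional vectors into a new space represented by R N-dimensional basis vectors, stack the basis vectors as the rows of one matrix and the data records as the columns of another:
$$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_R \end{pmatrix} \begin{pmatrix} a_1 & a_2 & \cdots & a_M \end{pmatrix} = \begin{pmatrix} p_1 a_1 & p_1 a_2 & \cdots & p_1 a_M \\ p_2 a_1 & p_2 a_2 & \cdots & p_2 a_M \\ \vdots & \vdots & \ddots & \vdots \\ p_R a_1 & p_R a_2 & \cdots & p_R a_M \end{pmatrix}$$
where $p_i$ is a row vector representing the i-th basis vector and $a_j$ is a column vector representing the j-th original data record.
Note that R can be smaller than N, and R determines the dimensionality of the transformed data. In other words, we can transform N-dimensional data into a lower-dimensional space, and the resulting dimension depends on the number of basis vectors.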
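In matrix form the whole change of basis is a single product. This sketch uses hypothetical data with N = 2 fields and M = 4 records, and a single basis vector (R = 1 < N), so the output is 1-dimensional.

```python
import numpy as np

# Hypothetical data: M = 4 records of N = 2 fields, one record per column of A.
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 3.0, 2.0, 4.0]])

# R = 1 basis vector stored as a row of P, so R < N.
P = np.array([[1.0, 1.0]]) / np.sqrt(2)

Y = P @ A        # entry (i, j) is the inner product p_i . a_j
print(Y.shape)   # (1, 4): 4 records, now with 1 coordinate each
```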
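2. Optimization objective
Which basis is optimal? Put differently: given a set of N-dimensional vectors that we want to reduce to K dimensions (K < N), how should we choose the K basis vectors so that as much of the original information as possible is retained? First, center the data (subtract each field's mean so that every field has mean 0). Then we want the projected values to be as spread out as possible.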
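3. Variance
As noted above, we want the projected values to be as spread out as possible, and this spread can be expressed mathematically by the variance. Because the data have been centered, every field has mean 0, so over $m$ records the variance of a field $a$ is simply $\mathrm{Var}(a) = \frac{1}{m}\sum_{i=1}^{m} a_i^2$. The problem above is thus formalized as: find a one-dimensional basis such that, after all data are expressed as coordinates on this basis, the variance is maximized.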
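For two fields the search for the best 1-D basis can be done by brute force: center the data, then scan unit directions $(\cos\theta, \sin\theta)$ and keep the one whose coordinates have the largest variance. The 2x5 data matrix is hypothetical, and the angle scan is only an illustration; the eigenvector method derived later finds the same direction directly.

```python
import numpy as np

# Hypothetical data: 2 fields (rows) x 5 records (columns).
X = np.array([[1.0, 1.0, 2.0, 4.0, 2.0],
              [1.0, 3.0, 3.0, 4.0, 4.0]])
m = X.shape[1]

# Centering: subtract each field's mean so that every field has mean 0.
Xc = X - X.mean(axis=1, keepdims=True)

def projected_variance(theta):
    """Variance of the coordinates on the unit basis (cos theta, sin theta)."""
    u = np.array([np.cos(theta), np.sin(theta)])
    coords = u @ Xc                      # mean is 0 because Xc is centered
    return np.sum(coords ** 2) / m       # Var = (1/m) * sum of squared coordinates

# Scan directions in [0, pi]; u and -u give the same variance.
thetas = np.linspace(0.0, np.pi, 1801)
variances = np.array([projected_variance(t) for t in thetas])
best = thetas[np.argmax(variances)]
print(np.degrees(best), variances.max())  # about 45 degrees, variance about 2.0
```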
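4. Covariance
For reducing 2-D to 1-D, it suffices to find the direction that maximizes the variance. But what about higher dimensions? If the second basis vector were again chosen purely for maximum variance, it would almost coincide with the first, and the two new fields would carry largely the same information. So we also want the fields to be uncorrelated, which is measured by the covariance; for centered fields $a$ and $b$, $\mathrm{Cov}(a,b) = \frac{1}{m}\sum_{i=1}^{m} a_i b_i$. A covariance of 0 means the two fields are linearly uncorrelated (zero covariance does not, in general, imply that they are independent). To make the covariance 0, the second basis vector is chosen only among directions orthogonal to the first, so the two directions finally selected are orthogonal.
This gives the optimization objective for dimension reduction: to reduce a set of N-dimensional vectors to K dimensions (0 < K < N), choose K unit (norm 1) orthogonal basis vectors such that, after the original data are transformed onto this basis, the pairwise covariance between fields is 0 and the variance of each field is as large as possible (under the orthogonality constraint, take the K largest variances).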
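The covariance formula is easy to check directly, and a tiny hypothetical example shows why "covariance 0" should be read as "uncorrelated" rather than "independent": below, `b` is completely determined by `a`, yet their covariance is 0, while a linearly related field gets a non-zero covariance.

```python
import numpy as np

def cov(a, b):
    """Covariance (1/m) * sum((a_i - mean a) * (b_i - mean b))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean((a - a.mean()) * (b - b.mean()))

a = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
b = a ** 2                 # b is a deterministic function of a ...
print(cov(a, b))           # ... yet the covariance is 0
print(cov(a, 2 * a + 1))   # linear relation: covariance 2 * Var(a) = 4.0
```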
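5. Covariance matrix
Suppose we have only two fields, $a$ and $b$ (both centered). Arrange them as the rows of a matrix $X$:
$$X = \begin{pmatrix} a_1 & a_2 & \cdots & a_m \\ b_1 & b_2 & \cdots & b_m \end{pmatrix}$$
Then multiply $X$ by its transpose and scale by $1/m$:
$$C = \frac{1}{m} X X^\top = \begin{pmatrix} \frac{1}{m}\sum_{i=1}^{m} a_i^2 & \frac{1}{m}\sum_{i=1}^{m} a_i b_i \\ \frac{1}{m}\sum_{i=1}^{m} a_i b_i & \frac{1}{m}\sum_{i=1}^{m} b_i^2 \end{pmatrix}$$
The diagonal entries are the variances of $a$ and $b$, and the off-diagonal entries are their covariance; the same holds for any number of fields.
If $Y = PX$, the covariance matrix of $Y$ is $D = \frac{1}{m} Y Y^\top = P C P^\top$. From the derivation above, achieving the optimization objective is therefore equivalent to diagonalizing the covariance matrix: making every element off the diagonal 0 and arranging the diagonal elements from largest to smallest, top to bottom. Because $C$ is real and symmetric, this is possible with an orthonormal $P$ whose rows are unit eigenvectors of $C$; the diagonal of $D$ then holds the corresponding eigenvalues.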
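The diagonalization can be checked numerically. This sketch assumes hypothetical, already-centered two-field data; it builds $C = \frac{1}{m} X X^\top$, takes the eigenvectors from `np.linalg.eigh` (which suits symmetric matrices and returns eigenvalues in ascending order with eigenvectors as columns), and verifies that $P C P^\top$ is diagonal.

```python
import numpy as np

# Hypothetical centered data: fields a and b as the rows of X (m = 5 records).
X = np.array([[-1.0, -1.0, 0.0, 2.0, 0.0],
              [-2.0, 0.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

C = X @ X.T / m   # [[1.2, 0.8], [0.8, 1.2]]: variances on the diagonal, covariance off it

# C is real symmetric: eigh gives orthonormal eigenvectors (columns), ascending eigenvalues.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]   # largest eigenvalue first
P = eigvecs[:, order].T             # one eigenvector per row

D = P @ C @ P.T                     # covariance matrix of Y = P X
Y = P @ X
print(np.round(D, 6))               # approximately diag(2.0, 0.4)
```

Computing $\frac{1}{m} Y Y^\top$ directly gives the same $D$, which is the identity $D = P C P^\top$ from the text.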
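7. Summary of the PCA algorithm
Given m records of n-dimensional data:
1. Arrange the original data as the columns of an n-row, m-column matrix X.
2. Zero-mean each row of X (each row represents one attribute field), i.e., subtract that row's mean.
3. Compute the covariance matrix $C = \frac{1}{m} X X^\top$.
4. Compute the eigenvalues of the covariance matrix and their corresponding eigenvectors.
5. Arrange the eigenvectors as the rows of a matrix, from top to bottom in order of decreasing eigenvalue, and take the first k rows to form the matrix P.
6. $Y = PX$ is the data reduced to k dimensions.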
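The six steps map almost line for line onto NumPy. This is a minimal sketch, assuming one record per column as in step 1 and the $1/m$ scaling used in the slides (using $1/(m-1)$ instead would rescale the eigenvalues but leave the eigenvectors, and therefore P, unchanged). The function name `pca` and the sample data are illustrative.

```python
import numpy as np

def pca(X, k):
    """Reduce an n x m data matrix X (one record per column) to k dimensions.

    Follows the six steps listed above; returns (Y, P) with Y = P @ X_centered.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    # Step 2: zero-mean each row (each attribute field).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Step 3: covariance matrix C = (1/m) X X^T.
    C = Xc @ Xc.T / m
    # Step 4: eigenvalues/eigenvectors; eigh is appropriate because C is symmetric.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 5: eigenvectors as rows, sorted by decreasing eigenvalue; keep the first k.
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:k]].T
    # Step 6: project onto the new basis.
    Y = P @ Xc
    return Y, P

X = np.array([[1, 1, 2, 4, 2],
              [1, 3, 3, 4, 4]])
Y, P = pca(X, k=1)
print(Y)  # ±[-3, -1, 0, 3, 1] / sqrt(2); the sign of an eigenvector is arbitrary
```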
PCA on MNIST images with 30 components: eigen-digits (figure)
PCA on face images with 30 components: eigen-faces (figure; source: http://www.cs.unc.edu/~lazebnik/research/spring08/assignment3.html)
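Weakness of PCA • Unsupervised: PCA ignores class labels, so it may project different classes on top of each other (figure compares PCA with LDA) • Linear: non-linear dimension reduction will be covered in the following lectures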
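Distributed Representation • Clustering: an object must belong to exactly one cluster, e.g., "小傑 (Gon) is an Enhancer (強化系)" • Distributed representation: "小傑 is …" described by a vector of values over all the categories (figure) • Dimension Reduction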
Word Embedding
• 1-of-N Encoding: apple = [1 0 0 0 0], bag = [0 1 0 0 0], cat = [0 0 1 0 0], dog = [0 0 0 1 0], elephant = [0 0 0 0 1]
• Word Class: Class 1: dog, cat, bird; Class 2: ran, jumped, walk; Class 3: flower, tree, apple
• Word Embedding (figure): dog, rabbit, run, jump, cat, tree, flower
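Using the five-word vocabulary from the slide, a short sketch shows why 1-of-N encoding carries no notion of meaning: every pair of distinct words has inner product 0, so "cat" is exactly as far from "dog" as from "bag". The helper `one_hot` is an illustrative name.

```python
import numpy as np

vocab = ["apple", "bag", "cat", "dog", "elephant"]

def one_hot(word):
    """1-of-N encoding: all zeros except a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

E = np.stack([one_hot(w) for w in vocab])  # row i encodes vocab[i]
# All distinct pairs have inner product 0: no word is "closer" to any other.
print(E @ E.T)  # the 5 x 5 identity matrix
```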
Word Embedding • Machines learn the meaning of words from reading a lot of documents without supervision (figure: tree, flower, dog, rabbit, run, jump, cat placed in the embedding space)
Word Embedding • Generating word vectors is unsupervised: the training data is a lot of text, and a neural network maps a word such as "Apple" to its vector (figure)
Word Embedding • A word can be understood by its context: "You shall know a word by the company it keeps." • 蔡英文 (Tsai Ing-wen) and 馬英九 (Ma Ying-jeou) are something very similar, because both appear in contexts such as "馬英九 520宣誓就職" and "蔡英文 520宣誓就職" ("sworn into office on May 20")
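How to exploit the context? • Count based • If two words $w_i$ and $w_j$ frequently co-occur, $V(w_i)$ and $V(w_j)$ should be close to each other • E.g., GloVe vectors: the inner product $V(w_i) \cdot V(w_j)$ should track $N_{i,j}$, the number of times $w_i$ and $w_j$ appear in the same document (GloVe itself fits $\log N_{i,j}$ together with bias terms)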
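A count-based sketch under explicit assumptions: the four-document corpus is hypothetical, and instead of GloVe's weighted least-squares objective it uses a simpler, LSA-style stand-in, a rank-2 truncated SVD of the co-occurrence matrix $N$, so that $N \approx V V^\top$. Words that co-occur end up with nearly parallel vectors.

```python
import itertools
import numpy as np

# Hypothetical toy corpus: each string is one "document".
docs = ["dog cat run", "dog cat jump", "tree flower", "tree flower apple"]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# N[i, j] = number of documents in which w_i and w_j both appear (i != j).
N = np.zeros((len(vocab), len(vocab)))
for d in docs:
    for wi, wj in itertools.permutations(set(d.split()), 2):
        N[index[wi], index[wj]] += 1

# LSA-style stand-in for GloVe: keep the top-2 singular directions of N.
U, S, _ = np.linalg.svd(N)
V = U[:, :2] * np.sqrt(S[:2])   # row i is the 2-D vector V(w_i)

def cosine(w1, w2):
    a, b = V[index[w1]], V[index[w2]]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine("dog", "cat"), cosine("dog", "tree"))  # close to 1 vs. close to 0
```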
Prediction-based
• Input: the 1-of-N encoding of the word $w_{i-1}$ in "…… $w_{i-2}$ $w_{i-1}$ ____" (e.g., [0 1 0 …])
• Output: the probability of each word being the next word $w_i$
• Take out the inputs of the neurons in the first layer ($z_1$, $z_2$, …)
• Use them to represent a word $w$: word vector, word embedding feature $V(w)$
(Figure: tree, flower, dog, rabbit, run, jump, cat plotted in the $(z_1, z_2)$ plane.)
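A forward-pass sketch of this architecture, assuming one linear first layer of 2 units ($z_1$, $z_2$) followed by a softmax output; the vocabulary and the untrained random weights `W_in`, `W_out` are hypothetical. It shows why "the input of the first-layer neurons" is a word vector: multiplying a 1-of-N vector by `W_in` just selects one row of `W_in`.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["dog", "cat", "rabbit", "run", "jump", "tree", "flower"]
n_vocab, n_hidden = len(vocab), 2          # 2 first-layer units: z1, z2

# Hypothetical, untrained weights.
W_in = rng.normal(scale=0.1, size=(n_vocab, n_hidden))
W_out = rng.normal(scale=0.1, size=(n_hidden, n_vocab))

def forward(prev_word):
    """Return (z, p): first-layer input z and P(w_i = each word | w_{i-1})."""
    x = np.zeros(n_vocab)
    x[vocab.index(prev_word)] = 1.0        # 1-of-N encoding of w_{i-1}
    z = x @ W_in                           # inputs of the first-layer neurons
    logits = z @ W_out
    p = np.exp(logits - logits.max())      # numerically stable softmax
    return z, p / p.sum()

z, p = forward("cat")
# A 1-of-N vector times W_in selects one row, so V("cat") is that row.
print(np.allclose(z, W_in[vocab.index("cat")]), p.sum())
```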
Prediction-based
• You shall know a word by the company it keeps.
• Training text: "…… 蔡英文 宣誓就職 ……" and "…… 馬英九 宣誓就職 ……"
• Whether the input $w_{i-1}$ is 蔡英文 or 馬英九, "宣誓就職" ("sworn into office") should have a large probability as the next word $w_i$.
• To produce the same output, the two words must be mapped to nearby points $(z_1, z_2)$ in the first layer, so their word vectors end up close.
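A tiny training sketch of this argument. Everything here is hypothetical: the (w_{i-1}, w_i) pairs (the "dog"/"cat" pairs are made-up contrast words), the 2-unit linear first layer, plain full-batch gradient descent on cross entropy, and a small L2 weight decay added for stability. Because 蔡英文 and 馬英九 receive the same training target, their rows of `W_in` are pulled toward the same region.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (w_{i-1}, w_i) training pairs.
pairs = [("蔡英文", "宣誓就職"), ("馬英九", "宣誓就職"),
         ("dog", "barks"), ("cat", "meows")]
vocab = sorted({w for pair in pairs for w in pair})
idx = {w: i for i, w in enumerate(vocab)}
n_vocab, n_hidden = len(vocab), 2

W_in = rng.normal(scale=0.1, size=(n_vocab, n_hidden))   # rows = word vectors V(w)
W_out = rng.normal(scale=0.1, size=(n_hidden, n_vocab))
prev = np.array([idx[a] for a, _ in pairs])
nxt = np.array([idx[b] for _, b in pairs])
lr, decay = 0.5, 1e-3          # illustrative learning rate and L2 weight decay

for step in range(3000):
    Z = W_in[prev]                                  # first-layer inputs, one row per pair
    logits = Z @ W_out
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)               # softmax over possible next words
    G = P.copy()
    G[np.arange(len(pairs)), nxt] -= 1.0            # d(cross entropy) / d(logits)
    G /= len(pairs)
    grad_out = Z.T @ G + decay * W_out
    grad_in = decay * W_in
    np.add.at(grad_in, prev, G @ W_out.T)           # route gradients to the input rows
    W_out -= lr * grad_out
    W_in -= lr * grad_in

def vec(word):
    return W_in[idx[word]]

d_same = np.linalg.norm(vec("蔡英文") - vec("馬英九"))
d_diff = np.linalg.norm(vec("蔡英文") - vec("dog"))
print(d_same, d_diff)   # the two names end up much closer than 蔡英文 and dog
```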
Prediction-based: Various Architectures
• Continuous bag of words (CBOW) model: "…… $w_{i-1}$ ____ $w_{i+1}$ ……"; the neural network takes $w_{i-1}$ and $w_{i+1}$ and outputs $w_i$ (predicting the word given its context)
• Skip-gram: "…… ____ $w_i$ ____ ……"; the neural network takes $w_i$ and outputs $w_{i-1}$ and $w_{i+1}$ (predicting the context given a word)
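The two architectures differ in how training examples are cut from the same text. This sketch assumes a window of one word on each side, as in the slide's $w_{i-1}$, $w_{i+1}$, and a hypothetical tokenization of a Chinese phrase; words at the edges simply get fewer context words.

```python
def cbow_pairs(tokens, window=1):
    """CBOW examples: (context words, center word), i.e. predict the word from its context."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in range(i - window, i + window + 1)
                   if j != i and 0 <= j < len(tokens)]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=1):
    """Skip-gram examples: (center word, one context word), i.e. predict the context."""
    return [(center, c) for context, center in cbow_pairs(tokens, window)
            for c in context]

tokens = ["潮水", "退了", "就", "知道", "誰"]   # hypothetical tokenization
print(cbow_pairs(tokens)[2])        # (['退了', '知道'], '就')
print(skipgram_pairs(tokens)[:3])   # [('潮水', '退了'), ('退了', '潮水'), ('退了', '就')]
```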
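Prediction-based: Training (Continuous bag of words (CBOW) model)
• Collect data: "潮水 退了 就 知道 誰 …" ("when the tide goes out, you know who …"), "不爽 不要 買 …" ("don't like it? don't buy it …"), "公道價 八萬 一 …" ("a fair price is 81,000 …"), ………
• (Figure) The neural network is trained on neighboring words from this text: 潮水 → 退了, 退了 → 就, 就 → 知道, 知道 → 誰.
• Training minimizes the cross entropy between the network's output distribution and the actual target word.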
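A minimal CBOW training sketch following the model named in the slide title: the vectors of $w_{i-1}$ and $w_{i+1}$ are averaged, a softmax layer predicts $w_i$, and gradient descent minimizes the average cross entropy. The tokenization, the 3-unit hidden size, the learning rate, and the step count are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["潮水", "退了", "就", "知道", "誰"]   # hypothetical tokenization of the collected text
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
# Training examples: (indices of w_{i-1} and w_{i+1}, index of w_i).
data = [([idx[tokens[j]] for j in (i - 1, i + 1) if 0 <= j < len(tokens)], idx[w])
        for i, w in enumerate(tokens)]

n_vocab, n_hidden = len(vocab), 3
W_in = rng.normal(scale=0.1, size=(n_vocab, n_hidden))   # rows = word vectors
W_out = rng.normal(scale=0.1, size=(n_hidden, n_vocab))

def loss_and_grads():
    """Average cross entropy of predicting each w_i from its context, plus gradients."""
    total = 0.0
    g_in, g_out = np.zeros_like(W_in), np.zeros_like(W_out)
    for ctx, center in data:
        h = W_in[ctx].mean(axis=0)          # CBOW: average the context word vectors
        logits = h @ W_out
        p = np.exp(logits - logits.max())
        p /= p.sum()
        total -= np.log(p[center])          # cross entropy with the 1-of-N target
        d_logits = p.copy()
        d_logits[center] -= 1.0
        g_out += np.outer(h, d_logits)
        d_h = W_out @ d_logits
        for c in ctx:
            g_in[c] += d_h / len(ctx)
    n = len(data)
    return total / n, g_in / n, g_out / n

initial_loss = loss_and_grads()[0]
for step in range(1000):
    _, g_in, g_out = loss_and_grads()
    W_in -= 0.5 * g_in
    W_out -= 0.5 * g_out
final_loss = loss_and_grads()[0]
print(initial_loss, final_loss)  # cross entropy decreases during training
```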
Word Embedding Source: http://www.slideshare.net/hustwj/cikm-keynotenov2014
Word Embedding (figure). Fu, Ruiji, et al. "Learning semantic hierarchies via word embeddings." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: Long Papers. Vol. 1. 2014.
Word Embedding • Characteristics (figure) • Solving analogies: Rome : Italy = Berlin : ? Compute $V(\text{Berlin}) - V(\text{Rome}) + V(\text{Italy})$, then find the word $w$ whose $V(w)$ is closest to the result.
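The analogy procedure in code, using hypothetical hand-made 2-D vectors in which the country-to-capital offset is roughly the same for every pair (real embeddings are learned and much higher-dimensional). "Closest" is measured here by cosine similarity, one common choice, and the three query words are excluded from the candidates.

```python
import numpy as np

# Hypothetical 2-D word vectors chosen by hand for illustration.
vectors = {
    "Rome":    np.array([1.0, 3.0]),
    "Italy":   np.array([1.0, 1.0]),
    "Berlin":  np.array([4.0, 3.2]),
    "Germany": np.array([4.0, 1.1]),
    "Paris":   np.array([7.0, 3.0]),
    "France":  np.array([7.0, 1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Solve a : b = c : ? by finding the word closest to V(c) - V(a) + V(b)."""
    target = vectors[c] - vectors[a] + vectors[b]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("Rome", "Italy", "Berlin"))  # Germany
```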
Multi-lingual Embedding Bilingual Word Embeddings for Phrase-Based Machine Translation, Will Zou, Richard Socher, Daniel Cer and Christopher Manning, EMNLP, 2013
Multi-domain Embedding Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng, Zero-Shot Learning Through Cross-Modal Transfer, NIPS, 2013
Document Embedding • Map word sequences of different lengths to vectors of the same length • The vector represents the meaning of the word sequence • A word sequence can be a document or a paragraph
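One of the simplest ways to get a fixed-length vector from a sequence of any length is to average its word vectors. This is a swapped-in baseline for illustration, not the method the lecture develops, and the random 4-D word vectors are hypothetical stand-ins for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["white", "blood", "cells", "destroying", "an", "infection"]
# Hypothetical 4-D word vectors (random here; a real system would learn them).
word_vec = {w: rng.normal(size=4) for w in vocab}

def embed(sequence):
    """Average the word vectors: a sequence of any length -> one 4-D vector."""
    return np.mean([word_vec[w] for w in sequence.split()], axis=0)

doc_short = embed("an infection")
doc_long = embed("white blood cells destroying an infection")
print(doc_short.shape, doc_long.shape)  # (4,) (4,)
```

Averaging discards word order ("an infection" and "infection an" get the same vector), which is the same limitation that the "Beyond Bag of Word" slide raises.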
Semantic Embedding: the input is the bag-of-words representation of a document (figure). Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
Beyond Bag of Words • To understand the meaning of a word sequence, the order of the words cannot be ignored. • "white blood cells destroying an infection" (positive) and "an infection destroying white blood cells" (negative) have exactly the same bag-of-words but different meanings.
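The slide's example can be checked directly: counting words while discarding their order produces identical bags for the two sentences, even though their meanings are opposite. `bag_of_words` is an illustrative helper built on the standard-library `Counter`.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count each word, discarding word order."""
    return Counter(sentence.split())

s1 = "white blood cells destroying an infection"   # positive
s2 = "an infection destroying white blood cells"   # negative
print(bag_of_words(s1) == bag_of_words(s2))  # True: identical bags, opposite meanings
```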
Transfer Learning • Task considered: a dog/cat classifier • Data not directly related to the task considered (figure): elephant and tiger images (similar domain, different tasks); cat and dog images from a different domain (different domains, same task)
Why transfer learning? (image sources: http://www.bigr.nl/website/structure/main.php?page=researchlines&subpage=project&id=64, http://www.spear.com.hk/Translation-company-Directory.html)
• Speech recognition: task considered: Taiwanese; data not directly related: English, Chinese, ……
• Image recognition: task considered: medical images
• Text analysis: task considered: a specific domain; data not directly related: webpages