Classification & Clustering -- Parametric and Nonparametric Methods 魏志達 Jyh-Da Wei Introduction to Machine Learning (Chap. 4, 5, 7, 8), E. Alpaydin
Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map
Bayes’ Rule • Posterior = likelihood × prior / evidence: P(C|x) = p(x|C) P(C) / p(x) • Since p(x) is the same for every class once x is given, the evidence term only normalizes the posteriors.
Bayes’ Rule: K > 2 Classes • P(Ci|x) = p(x|Ci) P(Ci) / p(x), where p(x) = Σk p(x|Ck) P(Ck) • Choose Ci if P(Ci|x) = maxk P(Ck|x); since p(x) is the same for every class once x is given, it can be dropped when comparing.
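As a concrete illustration, here is a minimal numpy sketch of this computation; the priors and likelihood values below are made-up numbers, not from the slides.

```python
import numpy as np

# Hypothetical priors P(C_i) and likelihoods p(x|C_i) for one fixed observation x
priors = np.array([0.5, 0.3, 0.2])        # P(C1), P(C2), P(C3)
likelihoods = np.array([0.1, 0.4, 0.05])  # p(x|C1), p(x|C2), p(x|C3)

evidence = np.sum(priors * likelihoods)       # p(x) = sum_k P(C_k) p(x|C_k)
posteriors = priors * likelihoods / evidence  # P(C_i|x) by Bayes' rule

print(posteriors, posteriors.sum())           # posteriors sum to 1
print("choose class", np.argmax(posteriors) + 1)
```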
Gaussian (Normal) Distribution • p(x) = N(μ, σ²) • Estimate μ and σ² from the sample {x^t}: m = (1/N) Σt x^t, s² = (1/N) Σt (x^t − m)²
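A minimal numpy sketch of these maximum-likelihood estimates, assuming a synthetic 1-D sample (the sample and its parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic sample from N(2, 1.5^2)

m = x.mean()                 # m  = (1/N) * sum_t x^t
s2 = np.mean((x - m) ** 2)   # s^2 = (1/N) * sum_t (x^t - m)^2  (ML estimate)

print(m, s2)
```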
If P(C1) = P(C2) and the variances are equal, there is a single decision boundary, halfway between the two means.
If P(C1) = P(C2) but the variances differ, there are two decision boundaries.
Multivariate Normal Distribution • Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations) • Bivariate case: d = 2
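A minimal numpy sketch of the Mahalanobis distance in the bivariate case; the mean, covariance, and query point below are assumed values for illustration only.

```python
import numpy as np

# Assumed bivariate (d = 2) mean and covariance
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

x = np.array([1.0, 1.5])
diff = x - mu
d2 = diff @ np.linalg.inv(Sigma) @ diff   # (x - mu)^T Sigma^{-1} (x - mu)
print("squared Mahalanobis distance:", d2)
```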
With only two classes, the decision boundary falls exactly where the posterior equals 0.5; the discriminant is P(C1|x) = 0.5. (Figure: the likelihoods and the posterior for C1.)
Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map
Parametric vs. Nonparametric • Parametric Methods • Advantage: they reduce the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters. • Disadvantage: this assumption does not always hold, and we may incur a large error if it does not. • Nonparametric Methods • Keep the training data; "let the data speak for itself" • Given x, find a small number of closest training instances and interpolate from these • Nonparametric methods are also called memory-based or instance-based learning algorithms.
Density Estimation • Given the training set X = {x^t}t drawn iid (independent and identically distributed) from p(x); x^t denotes the t-th sample in the set • Divide the data into bins of size h • Histogram estimator (figure on next page): p̂(x) = #{x^t in the same bin as x} / (N h) • Extreme case: p̂(x) = 1/h, i.e., the estimate does no more than consult the sample itself.
Density Estimation • Given the training set X = {x^t}t drawn iid from p(x) • x is always taken as the center of a bin of size 2h • Naive estimator (figure on next page): p̂(x) = #{|x − x^t| < h} / (2 N h), or equivalently p̂(x) = (1/(N h)) Σt w((x − x^t)/h), so each x^t casts a vote • w(u): a proximity-based vote worth 1/2 whenever |u| ≤ 1, so that w integrates to 1 over [−1, 1].
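A minimal sketch of the naive estimator for 1-D data; the function name and the synthetic sample are my own, used only to illustrate the formula above.

```python
import numpy as np

def naive_estimator(x, sample, h):
    """p_hat(x) = #{ x^t : |x - x^t| < h } / (2 N h)."""
    N = len(sample)
    return np.sum(np.abs(x - sample) < h) / (2 * N * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)               # hypothetical training set from N(0, 1)
print(naive_estimator(0.0, sample, h=0.5))  # estimate of p(0); true value is about 0.40
```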
Figure: naive estimator with h = 1, 0.5, and 0.25.
Kernel Estimator • Kernel function, e.g., the Gaussian kernel: K(u) = (1/√(2π)) exp(−u²/2) • Kernel estimator (Parzen windows), figure on next page: p̂(x) = (1/(N h)) Σt K((x − x^t)/h) • If K is Gaussian, then p̂ is smooth, having derivatives of all orders • K(u): a proximity-based score that integrates to 1 over the reals.
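A minimal sketch of the Parzen-window estimate with a Gaussian kernel, again on an assumed synthetic sample; the function name is hypothetical.

```python
import numpy as np

def kernel_estimator(x, sample, h):
    """p_hat(x) = 1/(N h) * sum_t K((x - x^t)/h) with a Gaussian kernel K."""
    u = (x - sample) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return K.sum() / (len(sample) * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)                # hypothetical training set from N(0, 1)
print(kernel_estimator(0.0, sample, h=0.5))  # smooth estimate of p(0)
```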
Generalization to Multivariate Data • Kernel density estimator in d dimensions: p̂(x) = (1/(N h^d)) Σt K((x − x^t)/h), with the requirement that ∫ K(x) dx = 1 • Multivariate Gaussian kernel: spheric, K(u) = (1/(2π)^(d/2)) exp(−‖u‖²/2); ellipsoid, K(u) = (1/((2π)^(d/2) |S|^(1/2))) exp(−(1/2) uᵀ S⁻¹ u), where S is the kernel covariance.
k-Nearest Neighbor Estimator • Instead of fixing the bin width h and counting the number of instances, fix the number of nearest instances (neighbors) k and adapt the bin width: p̂(x) = k / (2 N d_k(x)), where d_k(x) is the distance from x to its k-th closest instance.
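A minimal sketch of the 1-D k-nn density estimate, using the same assumed synthetic sample as before; the function name is hypothetical.

```python
import numpy as np

def knn_density(x, sample, k):
    """p_hat(x) = k / (2 N d_k(x)), where d_k(x) is the distance
       from x to its k-th nearest sample."""
    d = np.sort(np.abs(x - sample))
    return k / (2 * len(sample) * d[k - 1])

rng = np.random.default_rng(0)
sample = rng.normal(size=200)           # hypothetical training set from N(0, 1)
print(knn_density(0.0, sample, k=10))
```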
Nonparametric Classification (kernel estimator) • Kernel estimate of the class-conditional density: p̂(x|Ci) = (1/(Ni h^d)) Σt K((x − x^t)/h) r_i^t, with prior estimate P̂(Ci) = Ni/N, where r_i^t is 0/1 according to whether x^t belongs to Ci • Discriminant: g_i(x) = p̂(x|Ci) P̂(Ci) = (1/(N h^d)) Σt K((x − x^t)/h) r_i^t • We may ignore the common coefficient and keep only the sum: it accumulates a score from each training "committee member", a positive real value determined by proximity • Originally we want to compare the posteriors P(Ci|x) = p(x, Ci)/p(x); but given x, p(x) is the same for every class, so everyone omits it here and the expression is cleaner.
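A minimal sketch of this kernel-based discriminant; the 2-D data, class locations, and function name are assumed for illustration, and the common 1/(N h^d) factor is dropped as the slide suggests.

```python
import numpy as np

def kernel_discriminants(x, X, r, h):
    """g_i(x) proportional to sum_t r_i^t K((x - x^t)/h); the shared
       1/(N h^d) factor is omitted since it does not change the argmax."""
    u = np.linalg.norm(X - x, axis=1) / h   # ||x - x^t|| / h
    K = np.exp(-0.5 * u ** 2)               # Gaussian kernel (unnormalized)
    return r.T @ K                          # one accumulated score per class

# Hypothetical 2-D training set: class 0 near (0,0), class 1 near (3,3)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
labels = np.repeat([0, 1], 50)
r = np.eye(2)[labels]                       # r_i^t as one-hot rows

g = kernel_discriminants(np.array([2.5, 2.8]), X, r, h=1.0)
print("predicted class:", np.argmax(g))
```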
Nonparametric Classification k-nn estimator (1) • For the special case of the k-nn estimator: p̂(x|Ci) = ki / (Ni Vk(x)), where ki is the number of neighbors out of the k nearest that belong to Ci, and Vk(x) is the volume of the d-dimensional hypersphere centered at x with radius d_k(x), the distance to the k-th closest instance: Vk(x) = d_k(x)^d · c_d • c_d: the volume of the unit sphere in d dimensions; for example, c_1 = 2, c_2 = π, c_3 = 4π/3.
Nonparametric Classification k-nn estimator (2) • From p̂(x|Ci) = ki / (Ni Vk(x)), P̂(Ci) = Ni/N, and p̂(x) = k / (N Vk(x)) • Then P̂(Ci|x) = p̂(x|Ci) P̂(Ci) / p̂(x) = ki / k • Meaning: keep collecting samples until k of them have been found, then pick the class that shows up most often among them • We want to compare the posteriors P(Ci|x) = p(x, Ci)/p(x); although p(x) is the same given x, here everyone writes it out, and the derived expression is cleaner.
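A minimal sketch of the resulting k-nn classifier (majority vote among the k nearest neighbors); the 2-D data and function name are assumed for illustration.

```python
import numpy as np

def knn_classify(x, X, labels, k):
    """P_hat(C_i|x) = k_i / k: choose the majority class among the k nearest samples."""
    d = np.linalg.norm(X - x, axis=1)
    nearest = labels[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

# Hypothetical 2-D training set: class 0 near (0,0), class 1 near (3,3)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
labels = np.repeat([0, 1], 50)

print(knn_classify(np.array([2.5, 2.8]), X, labels, k=5))
```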
Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map
Classes vs. Clusters • Supervised: X = {x^t, r^t}t; classes Ci, i = 1,...,K, where p(x|Ci) ~ N(μi, Σi); Φ = {P(Ci), μi, Σi}, i = 1,...,K • Unsupervised: X = {x^t}t; clusters Gi, i = 1,...,k, where p(x|Gi) ~ N(μi, Σi); Φ = {P(Gi), μi, Σi}, i = 1,...,k • Labels r_i^t ?
k-Means Clustering • Find k reference vectors (prototypes / codebook vectors / codewords) that best represent the data • Reference vectors mj, j = 1,...,k • Use the nearest (most similar) reference: b_i^t = 1 if ‖x^t − m_i‖ = minj ‖x^t − m_j‖, and 0 otherwise • Reconstruction error: E({m_i} | X) = Σt Σi b_i^t ‖x^t − m_i‖²; we want the cluster centers that make this total deviation as small as possible.
k-Means Clustering (algorithm) 1. Winner takes all: each sample is assigned only to its nearest center 2. Rather than incremental per-sample corrections, each center is recomputed in one batch step as the mean of its assigned samples 3. An example follows on the next slide; a counterexample will be given in class (samples near the front line "defecting" to the other cluster). A sketch of the batch procedure is given below.
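A minimal numpy sketch of this batch k-means loop, assuming synthetic 2-D data with two well-separated groups; the function name and initialization scheme are my own.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Batch k-means: assign each x^t to its nearest center (winner takes all),
       then recompute each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), k, replace=False)]       # initial reference vectors
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        b = d.argmin(axis=1)                          # b^t: index of nearest center
        new_m = np.array([X[b == j].mean(axis=0) if np.any(b == j) else m[j]
                          for j in range(k)])
        if np.allclose(new_m, m):                     # converged: centers stopped moving
            break
        m = new_m
    return m, b

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
centers, assignments = k_means(X, k=2)
print(centers)
```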
EM in Gaussian Mixtures • z_i^t = 1 if x^t belongs to Gi, 0 otherwise (playing the role of the labels r_i^t in supervised learning); assume p(x|Gi) ~ N(μi, Σi) • E-step: h_i^t = E[z_i^t | x^t] = P(Gi | x^t) = P(Gi) p(x^t|Gi) / Σj P(Gj) p(x^t|Gj) • M-step: P(Gi) = Σt h_i^t / N, mi = Σt h_i^t x^t / Σt h_i^t, Si = Σt h_i^t (x^t − mi)(x^t − mi)ᵀ / Σt h_i^t • Use the estimated soft labels in place of the unknown labels • With P(Gi) as backup, we need not fear the "defecting soldiers" problem of hard assignments.
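A minimal sketch of these E- and M-steps on assumed synthetic 2-D data; the function name, initialization, and the small regularization term added to the covariances are my own choices, not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture: soft memberships h_i^t in the E-step,
       weighted priors/means/covariances in the M-step."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(k, 1.0 / k)                       # P(G_i)
    mu = X[rng.choice(N, k, replace=False)]        # initial means
    S = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i | x^t)
        dens = np.column_stack([pi[i] * multivariate_normal.pdf(X, mu[i], S[i])
                                for i in range(k)])
        h = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters with the soft counts
        Ni = h.sum(axis=0)
        pi = Ni / N
        mu = (h.T @ X) / Ni[:, None]
        for i in range(k):
            diff = X - mu[i]
            S[i] = (h[:, i, None] * diff).T @ diff / Ni[i] + 1e-6 * np.eye(d)
    return pi, mu, S

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
print(em_gmm(X, k=2)[1])   # estimated component means
```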
Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map
Agglomerative Clustering • Start with N groups, each containing one instance, and merge the two closest groups at each iteration • Distance between two groups Gi and Gj: • Single-link: d(Gi, Gj) = min over x^r ∈ Gi, x^s ∈ Gj of d(x^r, x^s) • Complete-link: d(Gi, Gj) = max over x^r ∈ Gi, x^s ∈ Gj of d(x^r, x^s) • Average-link: the mean pairwise distance; centroid distance: the distance between the group means.
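A minimal usage sketch of single-link agglomerative clustering with SciPy's hierarchical-clustering routines; the 2-D data are assumed, and switching to complete-link only requires method='complete'.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D data in two loose groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(4, 0.5, (10, 2))])

# Single-link: the distance between two groups is their minimum pairwise distance
Z = linkage(X, method='single', metric='euclidean')
labels = fcluster(Z, t=2, criterion='maxclust')   # cut the dendrogram into 2 clusters
print(labels)
```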
Example: Single-Link Clustering • Species clustered: human, bonobo (pygmy chimpanzee), gorilla, macaque, chimpanzee, gibbon • The resulting dendrogram lets clusters be formed dynamically by cutting it at different levels.