Artificial Intelligence 15-381 Unsupervised Machine Learning Methods


Presentation Transcript


  1. Artificial Intelligence 15-381: Unsupervised Machine Learning Methods Jaime Carbonell, 1-November-2001 OUTLINE: What is unsupervised learning? Similarity computations Clustering algorithms Other kinds of unsupervised learning

  2. Unsupervised Learning • Definition of Unsupervised Learning: learning useful structure without labeled classes, an optimization criterion, a feedback signal, or any other information beyond the raw data

  3. Unsupervised Learning • Examples: • Find natural groupings of Xs (X = human languages, stocks, gene sequences, animal species, …) as a prelude to discovering underlying properties • Summarize the news for the past month: cluster first, then report centroids • Sequence extrapolation: e.g., predict cancer incidence over the next decade; predict the rise in antibiotic-resistant bacteria • Methods • Clustering (n-link, k-means, GAC, …) • Taxonomy creation (hierarchical clustering) • Novelty detection ("meaningful" outliers) • Trend detection (extrapolation from multivariate partial derivatives)

  4. Similarity Measures in Data Analysis • General Assumptions • Each data item is a tuple (vector) • Values of a tuple are nominal, ordinal, or numerical • Similarity = (Distance)⁻¹ • Pure Numerical Tuples • Inner product: Sim(d_i, d_j) = Σ_k d_{i,k} · d_{j,k} • Cosine similarity: Sim(d_i, d_j) = cos(d_i, d_j) = (d_i · d_j) / (‖d_i‖ ‖d_j‖) • …and many more (slide after next)
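
A minimal Python sketch of the two numerical measures above (the function names and example vectors are mine, not the slide's):

```python
import math

def dot_sim(di, dj):
    """Inner-product similarity: Sim(di, dj) = sum_k di[k] * dj[k]."""
    return sum(x * y for x, y in zip(di, dj))

def cosine_sim(di, dj):
    """Cosine similarity: the inner product of the length-normalized vectors."""
    norm_i = math.sqrt(sum(x * x for x in di))
    norm_j = math.sqrt(sum(y * y for y in dj))
    return dot_sim(di, dj) / (norm_i * norm_j)

print(dot_sim([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))     # 10.0
print(cosine_sim([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))  # 1.0 (parallel vectors)
```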

  5. Similarity Measures in Data Analysis • For Ordinal Values • E.g., "small," "medium," "large," "X-large" • Convert to numerical values assuming constant intervals on a normalized [0,1] scale, where max(v) = 1, min(v) = 0, and the other values interpolate • E.g., "small" = 0, "medium" = 0.33, "large" = 0.67, "X-large" = 1 • Then use the numerical similarity measures • Or use a similarity matrix (see next slide)
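
For instance, a sketch of the constant-interval conversion, assuming the four-value scale above (names are illustrative):

```python
# Hypothetical ordinal scale from the slide, assumed evenly spaced on [0, 1].
SCALE = ["small", "medium", "large", "X-large"]

def ordinal_to_numeric(value, scale=SCALE):
    """Map an ordinal value to [0, 1]: min(v) -> 0.0, max(v) -> 1.0, linear in between."""
    return scale.index(value) / (len(scale) - 1)

print(ordinal_to_numeric("small"))   # 0.0
print(ordinal_to_numeric("medium"))  # 0.333... (the slide rounds this to 0.33)
```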

  6. Similarity Measures (cont.) • For Nominal Values • E.g., "Boston", "LA", "Pittsburgh"; or "male", "female"; or "diffuse", "globular", "spiral", "pinwheel" • Binary rule: if d_{i,k} = d_{j,k}, then sim = 1, else 0 • Use an underlying semantic property: e.g., Sim(Boston, LA) = 1/dist(Boston, LA), or Sim(Boston, LA) = 1/|size(Boston) − size(LA)| • Use a similarity matrix

  7. Similarity Matrix

            tiny  little  small  medium  large  huge
    tiny     1.0     0.8    0.7     0.5    0.2   0.0
    little           1.0    0.9     0.7    0.3   0.1
    small                   1.0     0.7    0.3   0.2
    medium                          1.0    0.5   0.3
    large                                  1.0   0.8
    huge                                         1.0

  • Diagonal must be 1.0 • Monotonicity property must hold • Triangle inequality must hold • Transitive property need *not* hold
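
One way to implement such a matrix is a symmetric lookup table storing only the upper triangle; a sketch using the values above (structure and names are mine):

```python
# Upper triangle of the size-word similarity matrix above,
# stored once and looked up symmetrically.
UPPER = {
    ("tiny", "little"): 0.8, ("tiny", "small"): 0.7, ("tiny", "medium"): 0.5,
    ("tiny", "large"): 0.2, ("tiny", "huge"): 0.0,
    ("little", "small"): 0.9, ("little", "medium"): 0.7,
    ("little", "large"): 0.3, ("little", "huge"): 0.1,
    ("small", "medium"): 0.7, ("small", "large"): 0.3, ("small", "huge"): 0.2,
    ("medium", "large"): 0.5, ("medium", "huge"): 0.3,
    ("large", "huge"): 0.8,
}

def matrix_sim(a, b):
    """Symmetric lookup; the diagonal is always 1.0."""
    if a == b:
        return 1.0
    return UPPER.get((a, b), UPPER.get((b, a)))

print(matrix_sim("large", "tiny"))  # 0.2, same as matrix_sim("tiny", "large")
```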

  8. Document Clustering Techniques • Similarity or Distance Measure: Alternative Choices • Cosine similarity • Euclidean distance • Kernel functions (e.g., Gaussian or polynomial kernels) • Language modeling: P(y | model_x), where x and y are documents

  9. Document Clustering Techniques • Kullback-Leibler distance ("relative entropy"): D(P ‖ Q) = Σ_x P(x) log [P(x) / Q(x)], e.g., over the word distributions P and Q of two documents
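
A sketch of the formula above in Python (the example distributions are hypothetical):

```python
import math

def kl_distance(p, q):
    """D(P || Q) = sum_x P(x) * log(P(x) / Q(x)).
    Asymmetric, so only loosely a 'distance'; q must be nonzero wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical unigram word distributions for two documents (each sums to 1).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_distance(p, q))  # ~0.025 nats
```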

  10. Incremental Clustering Methods Given n data items D: D1, D2, …, Di, …, Dn and a minimal similarity threshold Smin, cluster the data incrementally as follows (a runnable sketch follows below):

    Procedure SingleLink(D)
      Let CLUSTERS = {{D1}}
      For i = 2 to n
        Let Dc = Argmax_j [Sim(Di, Dj)], j < i
        If Sim(Di, Dc) > Smin, add Di to Dc's cluster
        Else Append(CLUSTERS, {Di})  ;; new singleton cluster
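
A runnable Python version of the corrected procedure; the inverse-distance similarity in the example is an illustrative choice of Sim, and all names are mine:

```python
def single_link_cluster(docs, sim, s_min):
    """Incremental single-link clustering: one pass over the data,
    O(n^2) similarity computations, O(n) space."""
    clusters = [[docs[0]]]              # CLUSTERS = {{D1}}
    for d in docs[1:]:
        # Dc = the single most similar already-clustered item.
        best_cluster, best_sim = None, float("-inf")
        for cluster in clusters:
            for member in cluster:
                s = sim(d, member)
                if s > best_sim:
                    best_cluster, best_sim = cluster, s
        if best_sim > s_min:
            best_cluster.append(d)      # join Dc's cluster
        else:
            clusters.append([d])        # new singleton cluster
    return clusters

# Example: 1-D points with inverse-distance similarity.
points = [[1.0], [1.2], [5.0], [5.1]]
print(single_link_cluster(points, lambda x, y: 1 / (abs(x[0] - y[0]) + 1e-9), s_min=2.0))
# [[[1.0], [1.2]], [[5.0], [5.1]]]
```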

  11. Incremental Clustering (cont.)

    Procedure AverageLink(D)
      Let CLUSTERS = {{D1}}
      For i = 2 to n
        Let C* = Argmax_C [Sim(Di, centroid(C))], C in CLUSTERS
        If Sim(Di, centroid(C*)) > Smin, add Di to cluster C*
        Else Append(CLUSTERS, {Di})  ;; new singleton cluster

  • Observations • A single pass over the data → easy to cluster new data incrementally • Requires an arbitrary Smin threshold • O(N²) time, O(N) space
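
The centroid-based variant, sketched in the same style; cosine similarity over numeric vectors is my choice of Sim:

```python
import math

def cosine_sim(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def average_link_cluster(docs, s_min):
    """Incremental average-link clustering: compare each new item to
    cluster centroids rather than to every individual member."""
    def centroid(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    clusters = [[docs[0]]]
    for d in docs[1:]:
        best = max(clusters, key=lambda c: cosine_sim(d, centroid(c)))
        if cosine_sim(d, centroid(best)) > s_min:
            best.append(d)              # add Di to cluster C*
        else:
            clusters.append([d])        # new singleton cluster
    return clusters
```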

  12. Document Clustering Techniques • Example: group documents based on similarity, starting from a pairwise similarity matrix (shown as a figure on the original slide). Thresholding at a similarity value of 0.9 yields: • a complete graph C1 = {1,4,5}, namely complete linkage • a connected graph C2 = {1,4,5,6}, namely single linkage • For clustering we need three things: • A similarity measure for pairwise comparison between documents • A clustering criterion (complete-link, single-link, …) • A clustering algorithm

  13. Document Clustering Techniques • Clustering Criterion: Alternative Linkages • Single-link ("nearest neighbor"): sim(A, B) = max over x∈A, y∈B of sim(x, y) • Complete-link: sim(A, B) = min over x∈A, y∈B of sim(x, y) • Average-link ("group-average clustering", GAC): sim(A, B) = mean over x∈A, y∈B of sim(x, y)

  14. Non-hierarchical Clustering Methods • A Single-Pass Algorithm (essentially the SingleLink/AverageLink procedures of slides 10-11; see the sketches there) • Treat the first document as the first (singleton) cluster. • Compare each subsequent document to all the clusters processed so far. • Add the new document to the closest cluster if the intercluster similarity is above a predetermined similarity threshold; otherwise, leave the new document as a new singleton cluster. • Repeat Steps 2 and 3 until all the documents are processed. • O(n²) time and O(n) space (worst-case complexity)

  15. Non-hierarchical Methods (cont.) • Multi-pass K-means ("reallocation method"; see the sketch below) • Select K initial centroids (the "seeds") • Assign each document to the closest centroid, resulting in K clusters • Recompute the centroid of each of the K clusters • Repeat Steps 2 and 3 until the centroids stabilize • O(nK) time and O(K) space per pass
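
A sketch of the reallocation loop, assuming numeric document vectors; the optional seeds parameter is my addition so later slides' heuristics can reuse this function:

```python
import random

def kmeans(docs, k, n_iter=20, seeds=None):
    """Multi-pass K-means reallocation: O(nK) time and O(K) space per pass."""
    def dist2(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    # Step 1: pick K seeds (random here, unless the caller supplies them).
    centroids = seeds if seeds is not None else random.sample(docs, k)
    for _ in range(n_iter):
        # Step 2: assign each document to its closest centroid.
        clusters = [[] for _ in range(k)]
        for d in docs:
            nearest = min(range(k), key=lambda i: dist2(d, centroids[i]))
            clusters[nearest].append(d)
        # Step 3: recompute centroids (keep the old one if a cluster emptied).
        new = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:            # centroids have stabilized
            break
        centroids = new
    return clusters, centroids
```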

  16. Hierarchical Agglomerative Clustering Methods • Generic Agglomerative Procedure (Salton '89), which results in nested clusters via iteration (a sketch follows below): • Compute all pairwise document-document similarity coefficients • Place each of the n documents into a class of its own • Merge the two most similar clusters into one: replace the two clusters by the new cluster and recompute the intercluster similarity scores w.r.t. the new cluster • Repeat the merge step until only one cluster is left
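
A naive sketch of the procedure with average-link merging; roughly O(n³) as written, fine for illustration but not for large collections:

```python
def hac(docs, sim):
    """Naive group-average HAC; returns the merge trace, which doubles
    as a binary taxonomy (see slide 18)."""
    def cluster_sim(a, b):
        # Average-link: mean pairwise similarity between the two clusters.
        return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))

    clusters = [[d] for d in docs]      # each document starts as its own class
    trace = []
    while len(clusters) > 1:
        # Find and merge the two most similar clusters.
        i, j = max(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        trace.append((clusters[i], clusters[j], merged))
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)]
        clusters.append(merged)
    return trace
```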

  17. Hierarchical Agglomerative Clustering Methods (cont.) • Heuristic Approaches to Speedy Clustering: • Reallocation methods with k selected seeds (O(kn) time), where k is the desired number of clusters and n is the number of documents • Buckshot: random sampling of √(kn) documents plus global HAC (sketched below) • Fractionation: divide and conquer
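
A hedged sketch of Buckshot, reusing the hac() and kmeans() sketches from slides 15-16; the trace-replay detail is my own reconstruction, not from the slides:

```python
import math, random

def buckshot(docs, k, sim):
    """Buckshot sketch: run HAC over a random sample of ~sqrt(k*n) documents
    to get k seed centroids, then one K-means pass reallocates all n documents.
    Assumes k <= n and reuses hac() and kmeans() from the earlier sketches."""
    sample = random.sample(docs, int(math.sqrt(k * len(docs))))
    # Replay the HAC merge trace until only k clusters of the sample remain.
    clusters = [[d] for d in sample]
    for a, b, merged in hac(sample, sim)[: len(sample) - k]:
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    seeds = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    return kmeans(docs, k, seeds=seeds)
```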

  18. Creating Taxonomies • Hierarchical Clustering • A GAC trace creates a binary hierarchy • Incremental-link, hierarchical version: • Cluster the data with a high Smin → first hierarchical level • Decrease Smin (stop at Smin = 0) • Treat the cluster centroids as data tuples and recluster, creating the next level of the hierarchy; then repeat steps 2 and 3 • K-means, hierarchical version (hierarchical k-means; see the sketch below): • Cluster the data with a large k • Decrease k (stop at k = 1) • Treat the cluster centroids as data tuples and recluster, creating the next level of the hierarchy; then repeat steps 2 and 3
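
A sketch of hierarchical k-means as described above, reusing kmeans() from slide 15's sketch; halving k at each level is an illustrative schedule, since the slide only says "decrease k, stop at k = 1":

```python
def hierarchical_kmeans(docs, k):
    """Build taxonomy levels bottom-up: cluster, then treat the centroids
    as the next level's data tuples and recluster with a smaller k."""
    levels = []
    points = docs
    while k >= 1:
        clusters, centroids = kmeans(points, k)   # kmeans() from slide 15's sketch
        levels.append(clusters)
        points = centroids        # centroids become the next level's data tuples
        k //= 2                   # illustrative schedule for "decrease k"
    return levels                 # levels[0] is the finest (leaf) level
```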

  19. Taxonomies (cont.) • Postprocess Taxonomies • Eliminate "no-op" levels • Agglomerate "skinny" levels • Label meaningful levels manually or with a centroid summary
