
Weighted Chinese Restaurant Process for clustering barcodes

Javier Cabrera, John Lau, Albert Lo (DIMACS, Bristol U, and HKUST)


Presentation Transcript


  1. Weighted Chinese Restaurant Process for clustering barcodes. Javier Cabrera, John Lau, Albert Lo (DIMACS, Bristol U, and HKUST)

  2. Cluster Analysis
     • Group the observations into k distinct natural groups.
     • Non-Bayesian cluster analysis:
       - Hierarchical clustering: build a hierarchical tree.
         - Similarity (inter-point distance): Euclidean, Manhattan, …
         - Inter-cluster distance: single linkage, complete, average, Ward
       - Non-hierarchical clustering:
         - K-means
         - Divisive
         - PAM
         - Model-based
         - Many other methods
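
As an illustration of the hierarchical approach listed on this slide, here is a minimal pure-Python sketch of agglomerative clustering with Manhattan inter-point distance and single linkage. The function names, the stopping rule (merge until k clusters remain) and the toy points are assumptions made for this example, not something taken from the presentation.

```python
from itertools import combinations

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def single_linkage(points, k, dist=manhattan):
    """Merge the two closest clusters until only k clusters remain.

    Inter-cluster distance is single linkage: the smallest distance
    between any point of one cluster and any point of the other.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        # combinations() guarantees a < b, so deleting b leaves a in place.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Toy data (invented for illustration): two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
toy_clusters = single_linkage(toy_points, k=2)
print(toy_clusters)
```

Note that k has to be chosen in advance here, whereas the Bayesian approach in the later slides treats the number of groups as part of the unknown partition.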

  3. Hierarchical Clustering [Figure: hierarchical clustering of Specimens 1 to 7]

  4. Weighted Chinese Restaurant Process
     1. The restaurant is full of tables.
     2. Customers are seated at tables according to a seating rule.
     3. Customers are allowed to move from one table to another, or to a new empty one.
     • Partition: each seating arrangement of all the customers in the restaurant.
     [Figure: customers 1 to 7]
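
The slide does not spell out the seating rule. The sketch below assumes the form commonly used for weighted Chinese restaurant processes: an existing table gets weight (table size) × (predictive weight of the customer given the occupants), and a new table gets weight alpha × (predictive weight of the customer alone). The predictive weight used here (independent sites with a symmetric Dirichlet prior over the four bases), the parameters `a` and `alpha`, and the toy barcodes are all assumptions for illustration.

```python
def predictive(seq, table_seqs, a=0.5, alphabet="ACGT"):
    """Predictive weight of `seq` given the sequences already at a table.

    Assumed kernel (not from the slides): sites are independent, each with a
    symmetric Dirichlet(a) prior over the alphabet. An empty table gives
    1 / len(alphabet) per site.
    """
    m = len(table_seqs)
    weight = 1.0
    for site, base in enumerate(seq):
        matches = sum(1 for other in table_seqs if other[site] == base)
        weight *= (a + matches) / (len(alphabet) * a + m)
    return weight

def seating_probabilities(seq, tables, alpha=1.0):
    """Probabilities of joining each existing table, then a new table (last entry)."""
    weights = [len(t) * predictive(seq, t) for t in tables]
    weights.append(alpha * predictive(seq, []))  # open a new table
    total = sum(weights)
    return [w / total for w in weights]

# Toy barcodes (invented for illustration): two occupied tables.
tables = [["ACGT", "ACGA"], ["TTTT"]]
probs = seating_probabilities("ACGT", tables)
print([round(p, 3) for p in probs])
```

In this toy case the new customer is most likely to join the first table, whose occupants it resembles; opening a new table is more likely than joining the dissimilar second table.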

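  5. Partitions
     • $p$: partition of specimens into species.
     • $p \in \mathcal{P}$: the space of all possible partitions, i.e. all arrangements of specimens into species.
     Bayes basics:
     • Prior distribution: $\pi(p)$
     • Likelihood: $f(x \mid p) = \prod_{1 \le i \le n(p)} k(x_j,\ j \in C_i)$, where $n(p)$ is the number of blocks in $p$ and $C_i$ is the $i$-th block.
     • Posterior: $\pi(p \mid \text{data}) \propto f(x \mid p)\,\pi(p)$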
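
For a handful of specimens, the posterior on this slide can be computed exactly by enumerating every partition and normalising likelihood × prior. The sketch below does this under two assumptions that the slides do not state: the prior is the Chinese restaurant process prior, π(p) ∝ alpha^n(p) × ∏ (|C_i| − 1)!, and the kernel k is a Dirichlet-multinomial marginal with independent sites. The barcodes and parameter values are invented.

```python
from math import lgamma, exp, log

def partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + part

def log_kernel(block, data, a=0.5, alphabet="ACGT"):
    """log k(x_j, j in C): assumed Dirichlet-multinomial marginal, independent sites."""
    m, K = len(block), len(alphabet)
    total = 0.0
    for site in range(len(data[block[0]])):
        total += lgamma(K * a) - lgamma(K * a + m)
        for base in alphabet:
            count = sum(1 for j in block if data[j][site] == base)
            total += lgamma(a + count) - lgamma(a)
    return total

def log_prior(part, alpha=1.0):
    """Unnormalised log CRP prior (assumed): alpha^n(p) * prod (|C_i| - 1)!."""
    return sum(log(alpha) + lgamma(len(block)) for block in part)

def posterior(data, alpha=1.0):
    """Exact posterior over all partitions, normalised by brute-force summation."""
    scored = []
    for part in partitions(list(range(len(data)))):
        logp = log_prior(part, alpha) + sum(log_kernel(b, data) for b in part)
        key = tuple(sorted(tuple(sorted(b)) for b in part))
        scored.append((key, logp))
    top = max(lp for _, lp in scored)  # subtract the max for numerical stability
    norm = sum(exp(lp - top) for _, lp in scored)
    return {key: exp(lp - top) / norm for key, lp in scored}

# Toy barcodes (invented): specimens 0 and 1 are similar, as are 2 and 3.
toy = ["AAAAAA", "AAAAAC", "GGGGGG", "GGGGGT"]
post = posterior(toy)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Enumeration is only feasible for very small data sets, because the number of partitions (the Bell number) grows extremely fast with the number of specimens; that is the motivation for approximating the posterior by simulation.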

  6. Weighted Chinese Restaurant Process
     • Approximate the posterior distribution with the WCRP:
       - Run the process for a while and obtain a frequency table of the partitions visited.
       - Estimate the final partition with the posterior mode.
       - Compare the posterior probabilities of the most probable partitions.
     • New specimens:
       - Placed at an existing table, or
       - Open a new table => new species.
     [Figure: customers 1 to 7]
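
The procedure on this slide can be sketched as a small simulation: repeatedly reseat each customer, tabulate the partitions visited, and take the most frequent one as the estimate of the posterior mode. The seating weights (table size × predictive weight, alpha × prior predictive for a new table), the Dirichlet-based predictive, the burn-in length, and the rule for placing a new specimen (pick the largest seating weight) are assumptions made for this example, not the authors' stated algorithm.

```python
import random
from collections import Counter

def predictive(seq, others, a=0.5, K=4):
    """Assumed kernel: independent sites, symmetric Dirichlet(a) over K bases."""
    w = 1.0
    for site, base in enumerate(seq):
        matches = sum(1 for o in others if o[site] == base)
        w *= (a + matches) / (K * a + len(others))
    return w

def reseat(j, labels, data, alpha, rng):
    """Remove customer j, then reseat it at an existing table or a new one."""
    labels[j] = None
    tables = sorted({t for t in labels if t is not None})
    members = [[data[i] for i, t in enumerate(labels) if t == tab] for tab in tables]
    weights = [len(m) * predictive(data[j], m) for m in members]
    weights.append(alpha * predictive(data[j], []))  # new table
    new_label = max(tables, default=-1) + 1
    labels[j] = rng.choices(tables + [new_label], weights=weights)[0]

def run_wcrp(data, sweeps=2000, burn_in=200, alpha=1.0, seed=0):
    """Run the process and count how often each partition is visited."""
    rng = random.Random(seed)
    labels = list(range(len(data)))  # start with every customer alone
    visits = Counter()
    for sweep in range(sweeps):
        for j in range(len(data)):
            reseat(j, labels, data, alpha, rng)
        if sweep >= burn_in:
            blocks = {}
            for i, t in enumerate(labels):
                blocks.setdefault(t, []).append(i)
            visits[tuple(sorted(tuple(b) for b in blocks.values()))] += 1
    return visits

def place_new_specimen(seq, partition, data, alpha=1.0):
    """Index of the best-fitting block in `partition`, or None for a new table."""
    weights = [len(b) * predictive(seq, [data[i] for i in b]) for b in partition]
    weights.append(alpha * predictive(seq, []))
    best = max(range(len(weights)), key=weights.__getitem__)
    return None if best == len(partition) else best

# Toy barcodes (invented for illustration).
toy = ["AAAAAA", "AAAAAC", "GGGGGG", "GGGGGT"]
visits = run_wcrp(toy)
total = sum(visits.values())
for partition, count in visits.most_common(3):
    print(partition, round(count / total, 3))  # estimated posterior probabilities
mode = visits.most_common(1)[0][0]
```

Calling `place_new_specimen("CCCCCC", mode, toy)` returns `None` for a barcode unlike either group, which corresponds to opening a new table, i.e. a candidate new species.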

  7. Future Work
     • WCRP algorithm for barcode data
     • Data visualization:
       - Final partition => similarities => Euclidean representation
       - Multidimensional scaling
       - Multivariate data visualization (used in taxonomy)
       - Projection pursuit
       - Entropy scanning
     • References: Lo (1984); Ishwaran and James (2003b); Cabrera, Lau, Lo (2006)
     • Contact:
       - Javier Cabrera, cabrera@stat.rutgers.edu
       - John Lau, john.lau@bristol.edu.uk
       - Albert Lo, imaylo@ust.hk
