Weighted Chinese Restaurant Process for Clustering Barcodes
Javier Cabrera, John Lau, Albert Lo
DIMACS, Bristol U, and HKUST
Cluster Analysis:
• Group the observations into k distinct natural groups.
• Non-Bayesian cluster analysis:
  • Hierarchical clustering: build a hierarchical tree
    - Similarity (inter-point distance): Euclidean, Manhattan…
    - Inter-cluster distance: single linkage, complete, average, Ward
  • Non-hierarchical clustering:
    - K-means
    - Divisive
    - PAM
    - Model-based
    - Many other methods
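The hierarchical branch of the list above can be illustrated with a small agglomerative procedure. This is a minimal sketch, not the authors' code: the function names `manhattan` and `agglomerate`, the toy points, and the choice to stop at k clusters are all assumptions made for illustration. Ward linkage is omitted to keep the example short.

```python
def manhattan(p, q):
    """Inter-point distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Inter-cluster distance as a function of all pairwise point distances.
LINKAGES = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}

def agglomerate(points, k, linkage="single", dist=manhattan):
    """Merge the two closest clusters until only k remain.

    Returns the final clusters (as sorted index lists) and the merge
    history, which is the information a hierarchical tree would display.
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ds = [dist(points[i], points[j])
                      for i in clusters[a] for j in clusters[b]]
                d = combine(ds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters), merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerate(points, k=3, linkage="single")
```

Swapping `linkage="single"` for `"complete"` or `"average"` changes only how the distance between two clusters is summarized, which is the design choice the slide lists under inter-cluster distance.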
Hierarchical Clustering
[Figure: hierarchical clustering of Specimens 1 to 7]
Weighted Chinese Restaurant Process
1. The restaurant is full of tables.
2. Customers are seated at tables by a seating rule.
3. Customers are allowed to move from one table to another or to a new, empty one.
• Partition: each seating arrangement of all the customers in the restaurant.
[Figure: items 1 to 7 arranged into groups]
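The slide does not spell out the seating rule. The sketch below assumes the usual weighted form: a customer joins an occupied table with weight proportional to the table size times the predictive density of the customer's data given the occupants, and opens a new table with weight proportional to alpha times the marginal density of the customer's data alone. The per-site Dirichlet-multinomial kernel `log_marginal`, the parameters `a` and `alpha`, and the toy sequences are hypothetical choices for illustration, not taken from the talk.

```python
import math

BASES = "ACGT"

def log_marginal(seqs, a=0.5):
    """Assumed kernel: log k(x_j, j in C) for aligned sequences over ACGT.

    Each alignment column is treated as an independent
    Dirichlet-multinomial sample with symmetric parameter a.
    """
    n = len(seqs)
    total = 0.0
    for site in zip(*seqs):  # one tuple of bases per column
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + site.count(b)) - math.lgamma(a)
                     for b in BASES)
    return total

def seating_probabilities(x, tables, alpha=1.0):
    """Probability of seating sequence x at each table.

    `tables` is a list of non-empty lists of sequences. The last entry
    of the result is the probability of opening a new table.
    """
    logw = []
    for seqs in tables:
        predictive = log_marginal(seqs + [x]) - log_marginal(seqs)
        logw.append(math.log(len(seqs)) + predictive)
    logw.append(math.log(alpha) + log_marginal([x]))
    top = max(logw)  # subtract the maximum before exponentiating
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

tables = [["AAAAAAAA", "AAAAAAAT"], ["GGGGCCCC"]]
probs = seating_probabilities("GGGGCCCA", tables)
```

Here `probs` has three entries: one for each occupied table and one for a new table. The weights are kept in log space because the marginal density of a realistic barcode, hundreds of sites long, would underflow as a plain product.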
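Partitions:
• $p$: a partition of the specimens into species.
• $p \in P$: the space of all possible partitions, i.e. all arrangements of specimens into species.
Bayes basics:
• Prior distribution: $\pi(p)$
• Likelihood: $f(x \mid p) = \prod_{1 \le i \le n(p)} k(x_j,\ j \in C_i)$, where $n(p)$ is the number of clusters in $p$ and $C_i$ is the $i$-th cluster.
• Posterior: $\pi(p \mid \text{data}) \propto f(x \mid p)\,\pi(p)$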
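For a handful of specimens the posterior over partitions can be computed exactly by enumerating every partition, which makes the prior, likelihood, and posterior relation concrete. This is a sketch under stated assumptions: the prior is taken to be the Chinese restaurant process with parameter `alpha`, the kernel k is the same hypothetical per-site Dirichlet-multinomial `log_marginal` with parameter `a`, and the four toy sequences are invented.

```python
import math

BASES = "ACGT"

def log_marginal(seqs, a=0.5):
    """Assumed kernel: log k(x_j, j in C), per-site Dirichlet-multinomial."""
    n = len(seqs)
    total = 0.0
    for site in zip(*seqs):
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + site.count(b)) - math.lgamma(a)
                     for b in BASES)
    return total

def all_partitions(items):
    """Yield every set partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in all_partitions(rest):
        for i in range(len(part)):  # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part      # or give `first` its own block

def log_prior(partition, alpha=1.0):
    """Assumed prior: CRP, log of alpha^n(p) * prod (|C_i| - 1)!, unnormalized."""
    return sum(math.log(alpha) + math.lgamma(len(block)) for block in partition)

def posterior(data, alpha=1.0):
    """Exact posterior pi(p | data), proportional to f(x | p) * pi(p)."""
    scores = {}
    for part in all_partitions(list(range(len(data)))):
        log_lik = sum(log_marginal([data[j] for j in block]) for block in part)
        key = tuple(sorted(tuple(sorted(block)) for block in part))
        scores[key] = log_lik + log_prior(part, alpha)
    top = max(scores.values())
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

data = ["AAAAAAAA", "AAAAAAAT", "GGGGCCCC", "GGGGCCCA"]
post = posterior(data)
mode = max(post, key=post.get)
```

Enumeration is only feasible for very small data sets, because the number of partitions grows as the Bell numbers (15 partitions for 4 specimens). That growth is the reason for approximating the posterior by sampling.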
Weighted Chinese Restaurant Process
• Approximate the posterior distribution with the WCRP:
  - Run the process for a while and obtain a frequency table of the partitions visited.
  - Estimate the final partition with the posterior mode.
  - Compare the posterior probabilities of the most probable partitions.
• New specimens:
  - Placed at an existing table, or
  - Open a new table => new species
[Figure: items 1 to 7 arranged into groups]
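The steps above can be sketched as a short sampler that reseats one customer at a time, tallies the partitions visited, and reports the most frequent one. As before, this is an illustration under assumptions rather than the authors' implementation: the seating weights, the Dirichlet-multinomial kernel `log_marginal`, the parameters `a`, `alpha`, `sweeps`, and `seed`, and the toy sequences are all hypothetical.

```python
import math
import random
from collections import Counter

BASES = "ACGT"

def log_marginal(seqs, a=0.5):
    """Assumed kernel: log k(x_j, j in C), per-site Dirichlet-multinomial."""
    n = len(seqs)
    total = 0.0
    for site in zip(*seqs):
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + site.count(b)) - math.lgamma(a)
                     for b in BASES)
    return total

def seating_probabilities(x, tables, alpha=1.0):
    """Probabilities for each table of sequences; last entry = new table."""
    logw = [math.log(len(s)) + log_marginal(s + [x]) - log_marginal(s)
            for s in tables]
    logw.append(math.log(alpha) + log_marginal([x]))
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def run_wcrp(data, sweeps=300, alpha=1.0, seed=0):
    """Reseat every customer once per sweep; count the partitions visited."""
    rng = random.Random(seed)
    tables = [[j] for j in range(len(data))]  # start with one table each
    visits = Counter()
    for _ in range(sweeps):
        for j in range(len(data)):
            # Customer j stands up; drop the table if it is now empty.
            tables = [[m for m in t if m != j] for t in tables]
            tables = [t for t in tables if t]
            probs = seating_probabilities(
                data[j], [[data[m] for m in t] for t in tables], alpha)
            choice = rng.choices(range(len(probs)), weights=probs)[0]
            if choice == len(tables):
                tables.append([j])
            else:
                tables[choice].append(j)
        visits[tuple(sorted(tuple(sorted(t)) for t in tables))] += 1
    return visits

data = ["AAAAAAAA", "AAAAAAAT", "GGGGCCCC", "GGGGCCCA"]
visits = run_wcrp(data)
mode, count = visits.most_common(1)[0]       # posterior mode estimate
top_two = visits.most_common(2)              # compare leading partitions

# New specimen: seat it against the tables of the estimated partition.
new_probs = seating_probabilities(
    "GGGGCCCC", [[data[j] for j in block] for block in mode])
```

Dividing each count in `visits` by the number of sweeps gives the estimated posterior probability of that partition. In `new_probs`, a large final entry would suggest that the new specimen does not belong to any existing group.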
Future Work
• WCRP algorithm for barcode data
• Data visualization:
  - Final partition => similarities => Euclidean representation
  - Multidimensional scaling
  - Multivariate data visualization (used in taxonomy)
  - Projection pursuit
  - Entropy scanning
References: Lo (1984); Ishwaran and James (2003b); Cabrera, Lau, and Lo (2006)
Contact:
• Javier Cabrera cabrera@stat.rutgers.edu
• John Lau john.lau@bristol.edu.uk
• Albert Lo imaylo@ust.hk