CLUTO: A Clustering Toolkit
By Roseline Antai
What is CLUTO?
CLUTO is a software package for clustering high-dimensional datasets and for analyzing the characteristics of the resulting clusters.
Algorithms of CLUTO
• vcluster
• scluster
Major difference: input format
• vcluster: takes the actual multidimensional representation of the objects to be clustered.
• scluster: takes the similarity matrix (or graph) between these objects.
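To make the input difference concrete, here is a minimal Python sketch that turns an object-by-feature matrix (the kind of data vcluster works on) into an object-by-object cosine similarity matrix (the kind of data scluster works on). The function name and the toy data are hypothetical, and the sketch does not reproduce CLUTO's file formats.

```python
import math

def cosine_similarity_matrix(rows):
    """Return the object-by-object cosine similarity matrix for a list of feature vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # similarity involving an all-zero row is left at 0.0
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Three objects described by two features each (vcluster-style input).
objects = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
# The resulting 3x3 matrix is scluster-style input.
similarities = cosine_similarity_matrix(objects)
```

Objects 0 and 2 point in the same direction, so their similarity is 1, while object 1 is orthogonal to both.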
Calling Sequence
vcluster [optional parameters] MatrixFile NClusters
scluster [optional parameters] GraphFile NClusters
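The calling sequence can be sketched as a small Python helper that assembles the argument list in the order shown: program, optional parameters, input file, number of clusters. The helper name and the file name data.mat are hypothetical; the list could be handed to subprocess.run if the CLUTO binaries are on the PATH.

```python
def build_cluto_command(program, input_file, n_clusters, options=()):
    """Assemble argv in the order: program, optional parameters, input file, cluster count."""
    if program not in ("vcluster", "scluster"):
        raise ValueError("program must be 'vcluster' or 'scluster'")
    if n_clusters < 1:
        raise ValueError("n_clusters must be a positive integer")
    return [program, *options, input_file, str(n_clusters)]

# data.mat is a placeholder file name, not one from the slides.
argv = build_cluto_command("vcluster", "data.mat", 4, ["-clmethod=rb", "-sim=cos"])
```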
Optional Parameters
• Standard specification: -paramname or -paramname=value
• Three categories:
  • Clustering algorithm parameters
  • Reporting and analysis parameters
  • Cluster visualization parameters
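As an illustration of the two parameter forms, the following Python sketch converts a dictionary into flags: a value of True produces a bare switch (-paramname), and any other value produces -paramname=value. The helper is hypothetical and is only one possible way to organize options.

```python
def format_options(options):
    """Turn {'name': value} into CLUTO-style flags.

    True yields a bare switch (-name); False or None omits the option;
    any other value yields -name=value.
    """
    flags = []
    for name, value in options.items():
        if value is True:
            flags.append(f"-{name}")
        elif value is False or value is None:
            continue  # switches that are off are simply left out
        else:
            flags.append(f"-{name}={value}")
    return flags

flags = format_options({"clmethod": "rb", "sim": "cos", "fulltree": True, "showtree": False})
```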
Clustering algorithm parameters
• Control how CLUTO computes the clustering solution.
• Examples:
  • -clmethod=string (rb, agglo, direct, graph, etc.)
  • -sim=string (cos, corr, dist, jacc)
  • -crfun=string (i1, i2, etc.)
  • -fulltree
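To give a feel for what the -sim choices measure, here is a Python sketch of three textbook similarity functions: cosine, the correlation coefficient, and the extended Jaccard coefficient. These are the standard definitions; that CLUTO's cos, corr, and jacc settings compute exactly these formulas is an assumption here, and the distance-based dist option is not covered.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def correlation(x, y):
    """Pearson correlation: the cosine of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

def extended_jaccard(x, y):
    """Extended Jaccard: dot / (|x|^2 + |y|^2 - dot), for non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

# y is x scaled by 2: same direction, different magnitude.
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
```

For these two vectors, cosine and correlation both treat them as identical in direction, while the extended Jaccard coefficient is lower because it also penalizes the difference in magnitude.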
Reporting and Analysis Parameters
• Control the amount of information that vcluster and scluster report about the clusters, as well as the analysis performed on the discovered clusters.
• Examples:
  • -clustfile=string (file that stores the clustering solution; the default is MatrixFile.clustering.NClusters, or GraphFile.clustering.NClusters for scluster)
  • -clabelfile=string (name of the file that stores the labels of the columns; used when -showfeatures, -showsummaries, or -labeltree are used)
  • -rlabelfile=string (name of the file that stores the labels of the rows, i.e. the objects to be clustered)
  • -rclassfile=string (name of the file that stores the class labels of the rows)
  • -showtree
  • -showfeatures (shows the descriptive and discriminating features of each cluster)
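One use of a class file is to compare the discovered clusters against known classes. The Python sketch below builds a cluster-by-class count table from two in-memory lists. It assumes the clustering output gives one cluster id per object in row order and the class file gives one label per object in the same order; that layout, the function name, and the sample data are assumptions, not taken from the slides.

```python
from collections import Counter

def cluster_class_table(cluster_ids, class_labels):
    """Count how many objects of each class fall into each cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("need exactly one class label per clustered object")
    table = {}
    for cluster, label in zip(cluster_ids, class_labels):
        table.setdefault(cluster, Counter())[label] += 1
    return table

# Hypothetical data: cluster id and class label for six objects, in row order.
clusters = [0, 0, 1, 1, 1, 0]
classes = ["Imp", "Imp", "Deo", "Deo", "Imp", "Sem"]
table = cluster_class_table(clusters, classes)
```

A cluster dominated by a single class label suggests that the clustering agrees with the known classes for those objects.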
Cluster Visualization Parameters
• Produce simple plots of the original input matrix that show how the different objects (rows) and features (columns) are clustered together.
• Examples:
  • -plottree=string: gives a graphical representation of the entire hierarchical tree.
  • -plotmatrix=string: shows how the rows of the original matrix are clustered together.
A practical example
../cluto/Linux/vcluster -clmethod=rb -sim=cos -fulltree -rlabelfile=Final_Results/rlabelfile -rclassfile=Final_Results/classfile -showtree -plotformat=gif -plottree=Final_Results/Images/PT-Final10d -plotmatrix=Final_Results/Images/PM-Final10d -plotclusters=Final_Results/Images/PC-Final10d -showfeatures Final_Results/FinalOutput10d-Vt.mat 4
Here the input matrix is Final_Results/FinalOutput10d-Vt.mat and the requested number of clusters is 4.
Classfile and rlabelfile
Classfile: Evo Sem Imp Imp Deo Deo Imp Imp Deo Deo Imp Deo Deo Imp Sem Deo Sem Imp Imp Evo
rlabelfile: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
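A minimal Python sketch of producing such files, assuming each file holds one label per line in the same order as the rows of the input matrix (an assumption about the layout, which the slide does not state). The helper names are hypothetical, and the files are written to a temporary directory rather than to Final_Results.

```python
import os
import tempfile

def write_label_file(path, labels):
    """Write one label per line, in the same order as the rows of the input matrix."""
    with open(path, "w", encoding="utf-8") as handle:
        for label in labels:
            handle.write(f"{label}\n")

def read_label_file(path):
    """Read the labels back, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

out_dir = tempfile.mkdtemp()
class_path = os.path.join(out_dir, "classfile")
rlabel_path = os.path.join(out_dir, "rlabelfile")
write_label_file(class_path, ["Evo", "Sem", "Imp", "Imp"])  # first four class labels on the slide
write_label_file(rlabel_path, [0, 1, 2, 3])                 # first four row labels on the slide
```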
The plot uses red to denote positive values and green to denote negative values. Bright red/green indicate large positive/negative values, whereas colors close to white indicate values close to zero.
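The color scheme can be sketched in Python as a mapping from a matrix value to an RGB color. The linear fade toward white and the max_abs scaling parameter are assumptions made for illustration; the slides do not say how CLUTO scales its colors.

```python
def value_to_rgb(value, max_abs):
    """Map a matrix value to (R, G, B): white at zero, red/green at +/- max_abs."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    # Clamp the magnitude to [0, 1] so out-of-range values saturate.
    strength = min(abs(value) / max_abs, 1.0)
    fade = round(255 * (1.0 - strength))  # level of the channels being faded out
    if value > 0:
        return (255, fade, fade)   # toward bright red
    if value < 0:
        return (fade, 255, fade)   # toward bright green
    return (255, 255, 255)         # exactly zero is white
```

Small values stay close to white in either direction, which matches the description of near-zero entries.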