What’s new in JKlustor

Miklós Vargyas, UGM 2006

Presentation Transcript


  1. What’s new in JKlustor • Miklós Vargyas • UGM 2006

  2. Overview • An introduction to JKlustor • Brief history of the product • Main features • Usage examples • Performance • LibMCS, an alternative approach to clustering chemical structures • Concepts, motivation • Features • Performance • Future of JKlustor

  3. Brief history of JKlustor • First discovery tool in the JChem package • Jarp released in version 1.5.2 (March 22, 2001) • Compr 1.5.7 (May 27, 2001) • Ward 1.5.9 (Jun 25, 2001) • API released in JChem 1.6.2 (May 16, 2002) • Experimental LibMCS first released in JChem 3.0 (Dec 1, 2004) • New JKlustor GUI to be released in JChem 3.?

  4. JKlustor features • Similarity-based clustering • ChemAxon’s topological fingerprint • External data points, arbitrary dimension • Tanimoto, weighted Euclidean • Hierarchical clustering: Ward • Reciprocal nearest neighbor algorithm • Kelley method • Non-hierarchical clustering: Jarvis-Patrick • Diversity calculation: Compr • Structure-based clustering: LibMCS
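
The two metrics named on this slide can be illustrated with a minimal Python sketch. It is not ChemAxon's implementation: it assumes binary fingerprints stored as Python integers used as bitsets, and per-dimension weights supplied by the caller (how JKlustor chooses its weights is not described here). The function names `tanimoto` and `weighted_euclidean` are hypothetical.

```python
from math import sqrt
from typing import Sequence


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity |A AND B| / |A OR B| of two binary fingerprints.

    Assumption: fingerprints are ints used as bitsets. Two empty
    fingerprints are treated as identical (similarity 1.0) by convention.
    """
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 1.0
    return bin(fp_a & fp_b).count("1") / union


def weighted_euclidean(x: Sequence[float], y: Sequence[float],
                       w: Sequence[float]) -> float:
    """Euclidean distance with one non-negative weight per dimension
    (weights are an input here, not JKlustor's own scheme)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("vectors and weights must have equal length")
    return sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


if __name__ == "__main__":
    a = 0b10110010  # 4 bits set
    b = 0b10010011  # 4 bits set, 3 of them shared with a
    print(tanimoto(a, b))                                  # 3 / 5 = 0.6
    print(1.0 - tanimoto(a, b))                            # Tanimoto distance
    print(weighted_euclidean([0, 0], [3, 4], [1, 1]))      # 5.0
```

Clustering programs usually work with the distance 1 - similarity, so that a threshold such as 0.1 means "at most 10% dissimilar".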

  5. JKlustor usage • Command line tools • Pipelining commands • Option flags • Structure file/database input • Manual creation of cluster views
     (Figure: pipeline Input SDFile → GenerateMD → NNeib → JarvisPatrick → CreateView → MarvinView)

  6. JKlustor usage
     • Prepare data and run clustering
         generatemd c input.sdf -k CF -c cfp.xml -D -o fingerprints.txt
         nneib -f 512 -t 0.1 -g -i fingerprints.txt -o neighborlists.txt
         jarp -c 0.2 -y -i neighborlists.txt -o clusters.txt
     • View the first cluster
         crview -i id -c "clid=1" -s input.sdf -t clusters.txt -o jarp_cluster1.sdf
         mview -c 3 -r 3 jarp_cluster1.sdf
     • View centroids, displaying cluster ID and size
         crview -i "centr:2" -c "size>=20" -d "clid:size" -s input.sdf -t clusters.txt -o jarp_centroids.sdf
         mview -c 3 -r 3 -f "clid:size" jarp_centroids.sdf
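
Jarvis-Patrick clustering, as commonly described, works on exactly the kind of nearest-neighbor lists produced in the step before jarp: two structures fall in the same cluster when each appears in the other's k-nearest-neighbor list and the two lists share at least k_min members. The minimal Python sketch below implements that textbook rule with a caller-supplied distance function. The parameters `k` and `k_min` and the helper names are hypothetical; they are not a mapping of the nneib/jarp flags above, whose exact meaning the slide does not give. The brute-force neighbor search here is O(n^2), unlike the O(n^1.5) quoted for JKlustor.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def nearest_neighbors(items: Sequence[T], dist: Callable[[T, T], float],
                      k: int) -> List[List[int]]:
    """Indices of the k nearest neighbours of each item (brute force, O(n^2))."""
    lists = []
    for i, a in enumerate(items):
        ranked = sorted((dist(a, b), j) for j, b in enumerate(items) if j != i)
        lists.append([j for _, j in ranked[:k]])
    return lists


def jarvis_patrick(nn: List[List[int]], k_min: int) -> List[int]:
    """Cluster label per item. Items i and j are linked when each is in the
    other's neighbour list and the two lists share at least k_min members;
    clusters are the connected components of these links (union-find)."""
    parent = list(range(len(nn)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    neighbour_sets = [set(lst) for lst in nn]
    for i, lst in enumerate(nn):
        for j in lst:
            shared = len(neighbour_sets[i] & neighbour_sets[j])
            if i in neighbour_sets[j] and shared >= k_min:
                parent[find(i)] = find(j)

    # Number clusters 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(nn))]


if __name__ == "__main__":
    points = [0, 1, 2, 50, 51, 52]
    nn = nearest_neighbors(points, lambda a, b: abs(a - b), k=2)
    print(jarvis_patrick(nn, k_min=1))  # [0, 0, 0, 1, 1, 1]
```

Raising `k_min` to 2 on the same data leaves six singletons, which shows how strongly the result depends on the shared-neighbor threshold.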

  7. JKlustor usage

  8. JKlustor performance • Memory: O(n) • Time: Jarvis-Patrick O(n^1.5), Ward O(n^2)
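
The slide does not say how these bounds are reached. One standard way to run Ward clustering in O(n) memory and O(n^2) time is the reciprocal nearest neighbor idea in its nearest-neighbor-chain form: follow a chain of nearest neighbors until two clusters are each other's nearest neighbor, then merge them. The sketch below assumes numeric descriptor vectors and keeps each cluster only as a centroid and a size, using the Ward merge cost |A||B| / (|A| + |B|) * ||c_A - c_B||^2. It illustrates the technique and is not JKlustor's code; `ward_nn_chain` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def ward_nn_chain(points: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Ward clustering via reciprocal nearest neighbours (nearest-neighbour chain).

    Clusters are stored only as (centroid, size), so memory is O(n) for a fixed
    dimension. Returns merges as (cluster_a, cluster_b, ward_cost); the input
    points are clusters 0..n-1 and the k-th merge creates cluster n+k.
    Merges are listed in the order found, not sorted by cost.
    """
    n = len(points)
    centroid: Dict[int, List[float]] = {i: [float(x) for x in p]
                                        for i, p in enumerate(points)}
    size: Dict[int, int] = {i: 1 for i in range(n)}

    def cost(a: int, b: int) -> float:
        # Increase in total within-cluster sum of squares if a and b merge.
        d2 = sum((x - y) ** 2 for x, y in zip(centroid[a], centroid[b]))
        return size[a] * size[b] / (size[a] + size[b]) * d2

    merges: List[Tuple[int, int, float]] = []
    chain: List[int] = []
    next_id = n
    while len(centroid) > 1:
        if not chain:
            chain.append(next(iter(centroid)))  # start a new chain anywhere
        top = chain[-1]
        prev: Optional[int] = chain[-2] if len(chain) > 1 else None
        # Nearest neighbour of `top`; on ties prefer `prev`, so reciprocal
        # pairs are always detected and the chain cannot cycle.
        nearest = min((c for c in centroid if c != top),
                      key=lambda c: (cost(top, c), c != prev))
        if nearest != prev:
            chain.append(nearest)
            continue
        # `top` and `prev` are reciprocal nearest neighbours: merge them.
        del chain[-2:]
        sa, sb = size[top], size[nearest]
        merged = [(sa * x + sb * y) / (sa + sb)
                  for x, y in zip(centroid[top], centroid[nearest])]
        merges.append((min(top, nearest), max(top, nearest), cost(top, nearest)))
        for c in (top, nearest):
            del centroid[c], size[c]
        centroid[next_id], size[next_id] = merged, sa + sb
        next_id += 1
    return merges


if __name__ == "__main__":
    print(ward_nn_chain([[0, 0], [0, 1], [10, 0], [10, 1]]))
```

Each nearest-neighbor search touches every active cluster once, and the chain grows by O(n) steps in total, which gives the quadratic running time while storing nothing larger than one centroid per cluster. For the four example points the two close pairs are merged first, each at cost 0.5, and the final merge costs 100.0.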

  9. What is MCS? • The Maximum Common Substructure of two chemical structures
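
Finding an MCS is computationally hard in general, which is why dedicated algorithms matter. The toy Python sketch below only makes the definition concrete. It assumes hydrogen-suppressed molecular graphs with element labels and integer bond orders, and it searches exhaustively for the largest connected common induced substructure (same elements, same bonds, same non-bonds). Aromaticity, stereochemistry and ring constraints are ignored and the run time is exponential, so this is not the LibMCS algorithm; `mcs` and the `Mol` layout are hypothetical.

```python
from typing import Dict, List, Tuple

# A molecule as a hydrogen-suppressed graph: atom labels plus, for each atom,
# a dict {neighbour index: bond order}. (Assumed representation.)
Mol = Tuple[List[str], List[Dict[int, int]]]


def mcs(m1: Mol, m2: Mol) -> Dict[int, int]:
    """Largest connected common induced substructure of m1 and m2, returned
    as an atom mapping {atom in m1: atom in m2}. Exhaustive backtracking:
    exponential time, usable only for very small molecules."""
    labels1, adj1 = m1
    labels2, adj2 = m2
    best: Dict[int, int] = {}

    def compatible(a: int, b: int, mapping: Dict[int, int]) -> bool:
        # Same element, and the same bond (or absence of a bond) to every
        # already-mapped atom on both sides.
        return labels1[a] == labels2[b] and all(
            adj1[a].get(i) == adj2[b].get(j) for i, j in mapping.items())

    def grow(mapping: Dict[int, int]) -> None:
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        used = set(mapping.values())
        # Only extend by atoms bonded to the current match, keeping it connected.
        frontier = {n for i in mapping for n in adj1[i] if n not in mapping}
        for a in frontier:
            for b in range(len(labels2)):
                if b not in used and compatible(a, b, mapping):
                    mapping[a] = b
                    grow(mapping)
                    del mapping[a]

    for a in range(len(labels1)):
        for b in range(len(labels2)):
            if labels1[a] == labels2[b]:
                grow({a: b})
    return best


if __name__ == "__main__":
    ethanol = (["C", "C", "O"], [{1: 1}, {0: 1, 2: 1}, {1: 1}])
    propanol = (["C", "C", "C", "O"],
                [{1: 1}, {0: 1, 2: 1}, {1: 1, 3: 1}, {2: 1}])
    print(mcs(ethanol, propanol))  # {0: 1, 1: 2, 2: 3}: ethanol's whole C-C-O
```

Because bond orders must match, ethanol and acetaldehyde (C-C=O) share only the two-carbon fragment under this definition; a real tool would expose such choices as search options.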

  10. Clustering by MCS? • Find the MCS of a group of structures

  11. Very brief history of LibMCS • Reaction automapper, based on Maximum Common Subgraph Search • MCS class API made public • Customer requested MCS based clustering • More intuitive than similarity based • Focused set analysis • screens: 2000 – 10000 structures • lead optimization: 3000 – 5000 structures • Should be hierarchical (outliers) • Ultimate goal: cluster 5000 compounds in 5 seconds

  12. LibMCS features • MCS-based hierarchical clustering • Flexible search options • Hierarchy browser • Filtering by chemical properties • Cluster statistics • No size limitation • Fast operation

  13. LibMCS – Dendrogram view

  14. LibMCS – Molecule view

  15. LibMCS – Table view

  16. LibMCS – Statistics

  17. LibMCS – Selections

  18. LibMCS – Property filters

  19. LibMCS – Output files

  20. LibMCS – Output files
     CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)CN(C2CCCC2)C1=O 0 21 0
     CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)C2CCCN2C1=O 0 21 0
     OC(=O)C1CCCN1C(=O)CCS CC(CS)C(=O)N1CCCC1C(O)=O 0 19 0
     OC(=O)C1CCCN1C(=O)CCS [H]C1(CCCN1C(=O)CCS)C(O)=O 0 19 0
     OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCC2SC(=O)C3=CC=CC=C3 0 19 0
     OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCCC2S 0 19 0
     CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1SC(=O)C2(C)CC3=CC=CC=C3CN2C1=O 0 20 0
     CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1CSC(=O)C2CC3=C(CN2C1=O)C=CC=C3 0 20 0
     CC1SC(=O)C2CCCN2C1=O CC1SC(=O)C2CCCN2C1=O 0 30 0
     CC1SC(=O)CNC1=O CC1SC(=O)CNC1=O 0 29 0
     OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 0 31 0
     CC(S)C(=O)NCC(O)=O CC(S)C(=O)NCC(O)=O 0 24 0
     CCC1=CC=CC=C1 CC(NC(CCC1=CC=CC=C1)C(O)=O)C(=O)N2CCCC2C(O)=O 0 22 0
     CCC1=CC=CC=C1 CCOC(=O)C(CC1=CC=CC=C1)NC(=O)NC(CC2=CC=CC=C2)C(=O)OCC 0 22 0
     OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 0 23 0
     C\C(Cl)=N/OC(N)=O C\C(Cl)=N/OC(N)=O 0 27

     > <Cluster_ID>
     1163
     > <Element_count>
     1
     > <Parent_ID>
     1
     $$$$
     Marvin 05290619172D
     23 24 0 0 0 0 999 V2000
     2.4230 -0.3587 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
     3.1375 0.0538 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     3.1375 0.8788 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
     -0.4349 -1.1837 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
     -1.1494 -1.5962 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     -1.8638 -1.1837 0.0000 N 0 0 3 0 0 0 0 0 0 0 0 0

  21. LibMCS – RGroup decomposition

  22. LibMCS – RGroup decomposition

  23. LibMCS – Performance • Depends on • average structure size • total diversity • minimal required MCS size • atom/bond constraints • Scales linearly • Maximum speed achieved • 1 000 structures in 3 seconds • Memory requirements • 100 000 structures occupy 200 MB
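
Since the slide states that LibMCS scales linearly, the two quoted figures can be extrapolated with simple arithmetic: 3 s per 1 000 structures is about 3 ms per structure, and 200 MB per 100 000 structures is about 2 kB per structure. The hypothetical helper below does this extrapolation. It assumes strictly linear scaling and inherits the caveat that the 3 s figure is the best case ("maximum speed achieved"), so real runs on larger or more diverse structures will differ.

```python
from typing import Tuple

# Figures quoted on the slide; the speed figure is the best case reported.
SECONDS_PER_1000_STRUCTURES = 3.0
MB_PER_100000_STRUCTURES = 200.0


def estimate_libmcs_cost(n_structures: int) -> Tuple[float, float]:
    """Hypothetical helper: (seconds, megabytes) for n structures, assuming
    run time and memory both grow strictly linearly with n."""
    seconds = n_structures * SECONDS_PER_1000_STRUCTURES / 1000
    megabytes = n_structures * MB_PER_100000_STRUCTURES / 100000
    return seconds, megabytes


if __name__ == "__main__":
    for n in (5000, 1000000):
        print(n, estimate_libmcs_cost(n))
```

For 5 000 compounds this gives about 15 s and 10 MB, three times the 5-second target mentioned in the history slide; for a one-million-compound library it gives roughly 3 000 s (50 minutes) and 2 GB.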

  24. LibMCS – Performance

  25. LibMCS – Further applications • Find the MCS of existing clusters • Data retrieval • Assay analysis • Compound acquisition • Combinatorial library profiling

  26. Development plans • Disconnected MCS • Multi-group clustering • More chemical sense (e.g. avoid opening rings, consider chirality) • Performance tuning (e.g. NN) • Integrate Ward/Jarp into new GUI • Additive clustering • Clustering million compound libraries • Integrate Chemical Terms • Integrate molecular descriptors, optimized metrics

  27. Summary • New tool in JKlustor based on MCS • More plausible grouping • Hierarchical with dendrogram browser • Statistics • Filtering, coloring, selection

  28. Acknowledgements • Developers • Ferenc Csizmadia, Árpád Tamási, András Volford, Szilárd Doránt • Péter Vadász, Nóra Máté • Special thanks