ChemAxon’s Chemical Fingerprints-Based Clustering to Assess AurSCOPE Databases Chemical Diversity. The Aureus Pharma System. Knowledge Base. Integration Platform. Query Interface. Analysis/Display Applications. AurSCOPE Statistics: March 2006. AurQUEST.
ChemAxon’s Chemical Fingerprints-Based Clustering to Assess AurSCOPE Databases Chemical Diversity
The Aureus Pharma System
• Knowledge Base
• Integration Platform
• Query Interface
• Analysis/Display Applications
AurQUEST
• Query management software for AurSCOPE
• Web-based application integrating ChemAxon technology
• Powerful Query Builder
• Biological and Chemical Queries
• Structural search using ChemAxon tools
• Efficient Navigation
• Different Export Formats (SDF, RDF, …)
Data Preprocessing
[Flow diagram, steps 1 to 4]
• AurSCOPE database
• Filter: Counterions, MW > 700, Inorg, NAS
• Filter: Stereo-duplicates, identical mol. but different salts, …
• 2D unique structures
AurSCOPE Ion Channels: Retrieving Active Molecules
• Protocols: Binding or Electrophysiology
• Target: All
• Target type: Wild
• Parameter filter: Ki, EC50, IC50 < 300 nM
• 11519 molecules(*) (9897 uniques)
(*) November 2005
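The parameter filter above can be sketched in a few lines. This is only an illustration, not the AurQUEST query engine: the `Activity` record, its field names, the unit table, and the sample values are all hypothetical, and only the parameter names (Ki, EC50, IC50) and the 300 nM cutoff come from the slide.

```python
from dataclasses import dataclass

# Conversion factors to nanomolar. The unit labels are assumptions.
TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}


@dataclass(frozen=True)
class Activity:
    """One measured activity (hypothetical record layout)."""
    molecule_id: str
    parameter: str   # e.g. "Ki", "EC50", "IC50"
    value: float
    unit: str        # a key of TO_NM


def is_active(rec, cutoff_nm=300.0, parameters=("Ki", "EC50", "IC50")):
    """True if the record uses an accepted parameter and is below the cutoff."""
    return rec.parameter in parameters and rec.value * TO_NM[rec.unit] < cutoff_nm


def active_molecules(records, cutoff_nm=300.0):
    """Unique molecule ids with at least one qualifying activity."""
    return {r.molecule_id for r in records if is_active(r, cutoff_nm)}


# Made-up sample data, for illustration only.
records = [
    Activity("m1", "Ki", 12.0, "nM"),
    Activity("m1", "IC50", 0.5, "uM"),   # 500 nM, above the cutoff
    Activity("m2", "IC50", 0.5, "uM"),   # 500 nM, above the cutoff
    Activity("m3", "EC50", 0.25, "uM"),  # 250 nM, below the cutoff
    Activity("m4", "Kd", 1.0, "nM"),     # parameter not in the filter
]
print(sorted(active_molecules(records)))
```

Returning a set mirrors the distinction the slide makes between retrieved molecules and unique ones: a molecule with several qualifying measurements is counted once.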
AurSCOPE Ion Channels: Activity Distribution
[Bar chart; axis from 0 to 4000 in steps of 500; categories: IP3, P2X, 5HT3, NMDA, GABA, AMPA/KA, Sodium Channel, Glycine receptor, Calcium Channel, Vanilloid receptor, Chloride Channel, Potassium Channel, Ryanodine receptor, Acid Sensing Ion Channel, Nicotinic Acetylcholine receptor]
Encoding Chemical Space and Clustering
• Standardization of molecules.
• Generating Chemical Fingerprints (CF).
• Optimization of different CF parameters.
• CF-based Jarvis-Patrick clustering with various adjusted parameters.
Parameters for Generating Hashed Chemical Fingerprints
• Fingerprint length
  - The number of bits in the bit string.
  - A bigger fingerprint increases the capacity for storing information on molecules.
• Maximum pattern length
  - The maximum length of the linear atom paths that are considered during the fragmentation of the molecule. (The length of cyclic patterns is not limited.)
  - Longer and more numerous patterns hold more information on the molecule.
• Bits to be set for patterns
  - After detecting a pattern, some bits of the bit string are set to "1". The number of bits used to code each pattern is constant.
  - A higher number of bits increases the coded information from a pattern.
• Darkness of the fingerprint
  - The percentage of "1" digits in the bit string. Fingerprints with more ones are considered "darker" than those with fewer ones.
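To make the roles of these parameters concrete, here is a minimal sketch of a hashed fingerprint and its darkness. It is not ChemAxon's implementation: the SHA-256 based bit selection, the function names, and the pattern strings are assumptions, and the fragmentation step (which is where the maximum pattern length applies) is assumed to have already produced the patterns.

```python
import hashlib


def pattern_bits(pattern, length, bits_per_pattern):
    """Map one pattern string to at most `bits_per_pattern` bit positions.

    Assumption: positions come from salted SHA-256 digests. Two salts may
    collide on the same position, so fewer distinct bits can result.
    """
    positions = set()
    for salt in range(bits_per_pattern):
        digest = hashlib.sha256(f"{salt}:{pattern}".encode()).digest()
        positions.add(int.from_bytes(digest[:8], "big") % length)
    return positions


def fingerprint(patterns, length=2048, bits_per_pattern=5):
    """Build the bit string as a Python int with bits set for every pattern."""
    fp = 0
    for pattern in patterns:
        for pos in pattern_bits(pattern, length, bits_per_pattern):
            fp |= 1 << pos
    return fp


def darkness(fp, length):
    """Percentage of '1' digits in a fingerprint of the given length."""
    return 100.0 * bin(fp).count("1") / length


# Hypothetical linear-path patterns for a small molecule.
patterns = ["C", "O", "C-C", "C-O", "C-C-O", "C=O", "C-C=O"]
for length in (512, 1024, 2048):
    fp = fingerprint(patterns, length=length, bits_per_pattern=5)
    print(length, round(darkness(fp, length), 2))
```

With the pattern set held fixed, a longer bit string spreads roughly the same number of set bits over more positions, which is the trend in average darkness that the parameter table on the next slide reports.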
Chemical Fingerprints: Effect of Parameters

FP length   Max #bonds   Max #bits   Aver. Darkness (%)   Max. Darkness (%)
   512          7            3             68.5                 97.5
   512          7            4             82.2                 99.4
   512          7            5             84.9                 99.4
   512          8            3             76.1                 99.2
   512          8            4             87.7                 99.4
   512          8            5             89.8                 99.4
  1024          7            3             46.1                 83.3
  1024          7            4             61.5                 94.8
  1024          7            5             65.5                 95.9
  1024          8            3             54.8                 91.9
  1024          8            4             70.2                 98.5
  1024          8            5             73.8                 98.9
  2048          7            3             26.8                 58.6
  2048          7            4             39.1                 78.6
  2048          7            5             42.4                 81.6
  2048          8            3             33.4                 73.7
  2048          8            4             47.5                 89.6
  2048          8            5             50.9                 91.6
CF-based Jarvis-Patrick Clustering
For each structure, collect the set of nearest neighbors that have a dissimilarity (distance) less than a threshold value T. Two structures cluster together if:
1. They are in each other's list of nearest neighbors.
2. They have at least Rmin of their nearest neighbors in common, where Rmin is a ratio of the length of the shorter list.
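The two conditions can be sketched as follows. This is a toy version under stated assumptions, not JKlustor's code: fingerprints are Python ints, the dissimilarity is taken to be 1 minus the Tanimoto coefficient, each structure is counted in its own neighbor list, and linked pairs are merged with a union-find. All function names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - Tanimoto coefficient of two bit strings stored as ints (assumed metric)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def jarvis_patrick(fps, threshold=0.15, r_min=0.5):
    """Return (clusters, singletons) as lists of indices into `fps`."""
    n = len(fps)
    # Assumption: every structure belongs to its own neighbor list.
    neighbors = [{i} for i in range(n)]
    for i, j in combinations(range(n), 2):
        if tanimoto_distance(fps[i], fps[j]) < threshold:
            neighbors[i].add(j)  # the distance is symmetric, so i and j
            neighbors[j].add(i)  # end up in each other's list (condition 1)

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in neighbors[i]:
            if j <= i:
                continue
            shorter = min(len(neighbors[i]), len(neighbors[j]))
            common = len(neighbors[i] & neighbors[j])
            if common >= r_min * shorter:  # condition 2
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    ordered = sorted(groups.values())
    clusters = [g for g in ordered if len(g) > 1]
    singletons = [g[0] for g in ordered if len(g) == 1]
    return clusters, singletons


# Toy fingerprints: three near-identical bit strings and one unrelated one.
a = (1 << 20) - 1           # bits 0..19
b = (1 << 19) - 1           # bits 0..18
c = a & ~1                  # bits 1..19
d = ((1 << 20) - 1) << 40   # bits 40..59
print(jarvis_patrick([a, b, c, d], threshold=0.15, r_min=0.5))
```

The pairwise loop is quadratic in the number of structures, which is acceptable for a sketch but not for a production run on thousands of molecules.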
CF-based Jarvis-Patrick Clustering
• Chemical fingerprint length in bits: 2048
• Maximum number of bonds in patterns: 7
• Maximum number of bits to set for each pattern: 5

T      Rmin   # Clusters   # Singletons
0.15   0.2       932          1663
0.15   0.3       938          1663
0.15   0.4       945          1663
0.15   0.5       977          1663
0.16   0.3       865          1499
0.16   0.5       910          1500
0.17   0.3       819          1372
0.17   0.5       860          1373
0.18   0.3       787          1238
0.18   0.5       826          1238
0.19   0.3       752          1140
0.19   0.5       780          1141
0.20   0.3       722          1051
0.20   0.5       752          1051
CF-based Jarvis-Patrick Clustering
Similarity threshold = 0.85(*)
(*) Martin Y.C. et al. Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-4358.
Most Populated Clusters: Biological "Projection"
[Figure: most populated clusters, each labeled with a target, in slide order]
• Gamma aminobutyric acid A receptor
• Voltage-gated calcium channel
• Nicotinic acetylcholine receptor
• Gamma aminobutyric acid A receptor
• Nicotinic acetylcholine receptor
• Gamma aminobutyric acid A receptor
• Gamma aminobutyric acid A receptor
• Potassium channel
• Gamma aminobutyric acid A receptor
• Gamma aminobutyric acid A receptor
• Voltage-gated calcium channel
• Nicotinic acetylcholine receptor
• 5-HT3
• Gamma aminobutyric acid A receptor
Conclusions
• JKlustor integrates computationally rapid and efficient clustering tools.
• Shortcomings to be addressed to deal with artificial singletons.
• Future work: combination with Maximum Common Substructure approach (LibMCS).
• Other algorithms (Ward, …)