Data mining of toxic chemicals & database-based toxicity prediction
Jiansuo Wang & Luhua Lai
Institute of Physical Chemistry, Peking University, P. R. China
Our goal: to introduce risk assessment of chemicals in the early stage of drug design. Candidates generated by computer-aided design -> initial screening of chemical toxicity -> leads that are a bit “safer”.
Characteristics and difficulties of the problem in computer-aided drug design, beyond the complexity of toxicity itself: • The virtually generated molecules are numerous. • The molecules designed as drugs may be structurally diverse. • Little or no information is available about the molecules other than their chemical structure.
How to evaluate the bioactivity (toxicity) of a large number of molecules from their structure alone? • In terms of structure-activity rules: expert systems. • In terms of statistical models: QSAR (Qualitative/Quantitative Structure-Activity Relationship).
How to extract rules/models from the database of toxic chemicals to aid toxicity assessment?
• Structural features of toxic chemicals: statistical analysis, similarity analysis and cluster analysis, applied to the RTECS database.
• QSAR models of toxic chemicals: QSAR combined with cluster analysis, applied to the RTECS database.
What features characterize toxic chemicals? • Molecular weight • Atomic composition of molecules • Groups of molecules • Rings of molecules. An initial database analysis shows no distinct difference between toxic chemicals and drugs in these basic molecular features.
Classification of toxic substances according to action modes: 1) substances that exhibit extremes of acidity, basicity, dehydrating ability, or oxidizing power; 2) reactive substances that contain functional groups prone to react with biomolecules in a damaging way; 3) heavy metals; 4) lipid-soluble compounds; 5) species that bind to biomolecules, reversibly or irreversibly, and alter their normal function; and so on. (Manahan, S. E., Toxicological Chemistry)
Structure patterns: a concept that considers the molecule as an integral whole and the specificity of action modes among molecules. A molecular structure pattern is defined as a template comprising a given framework and some given groups. It represents the common structural features shared by a series of molecules that may act in a toxicologically similar manner.
How to get molecular structure patterns? • Dissect the molecules • Similarity comparison • Cluster analysis
Do structure patterns really exist in the database of toxic chemicals? The underlying idea of structure patterns: specificity of action modes implies structural correlation among molecules with a similar action mode. The embodiment of structure patterns in the database: • Structural similarity among the molecules in the database converges as the database grows from small to large (parallel analysis). • A large enough database has some predictive power for new toxic chemicals (cross analysis).
Curve of coverage rate vs. database size, with 0.6 as the similarity limit. The figure shows that, for a given prediction accuracy, the predictive ability of the database tends to converge once the database is large enough. This indicates that structure patterns may exist in the database.
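The coverage rate on this slide can be read as the fraction of test molecules that have at least one database molecule at or above the similarity limit. The sketch below illustrates that reading only. The slides do not state which similarity measure was used, so the Tanimoto coefficient over feature sets, the function names and the toy molecules are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coverage_rate(database, queries, limit=0.6):
    """Fraction of query molecules with at least one database neighbour at or above the limit."""
    if not queries:
        return 0.0
    covered = sum(
        1 for q in queries
        if any(tanimoto(q, d) >= limit for d in database)
    )
    return covered / len(queries)


# Hypothetical toy molecules, each described by a set of structural features.
database = [
    frozenset({"ring6", "Cl"}),
    frozenset({"NO2", "ring6", "OH"}),
    frozenset({"ring6", "C=O", "NH"}),
]
queries = [
    frozenset({"ring6", "C=O", "NH", "CH3"}),
    frozenset({"P=O", "F"}),
]

# Coverage as the database grows, mirroring the coverage-vs-size curve.
for size in range(1, len(database) + 1):
    print(size, coverage_rate(database[:size], queries))
```

Plotting the printed values against database size would give a curve of the same kind as the one on the slide, although the real analysis was run on RTECS with the authors' own similarity measure.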
The systematic analysis of the database indicates that structure patterns are likely to exist, and that searching for them is both necessary and feasible.
The representative molecules of some structure patterns of toxic chemicals
Data mining of toxic chemicals: QSAR combined with structure patterns. A two-step strategy to explore noncongeneric toxic chemicals in the database: screening of structure patterns, followed by generation of a detailed relationship between structure and activity. First, an efficient similarity comparison is used to screen structure patterns for further QSAR analysis. Then, a QSAR study of each structure pattern provides an estimate of the activity as well as the detailed relationship between activity and structure.
An example of the implementation. Representative molecule of the structure pattern (WLN: T6VMVMV FHJ F2Y&1 F2U1; CAS number: 115-44-6): • Select one structure pattern. • Computing molecular similarity gives 189 chemicals from the RTECS database whose similarity to the representative molecule is higher than 0.6. • According to species observed and route of exposure, the chemicals fall mainly into five major categories. • Build CoMFA models relating structure to LD50 values for three series of chemicals.
Rabbit-intravenous: cross-validated and final fit CoMFA analysis with five components; 37 chemicals, q2 = 0.608, r2 = 0.981, F = 323.
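For readers unfamiliar with the two statistics, r2 measures how well the final model fits the training data, while the cross-validated q2 replaces each fitted value with a prediction from a model built without that chemical. The sketch below shows the leave-one-out form of that calculation. CoMFA fits a partial least squares model over field descriptors; here a single-descriptor least-squares line stands in for it, and the data are invented, so only the definitions of r2 and q2 carry over.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r_squared(xs, ys):
    """Final-fit r2: 1 - (residual sum of squares) / (total sum of squares)."""
    slope, intercept = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out q2: 1 - PRESS / (total sum of squares)."""
    my = mean(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without chemical i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Invented descriptor/activity pairs, not data from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print("r2 =", r_squared(xs, ys))
print("q2 =", q_squared_loo(xs, ys))
```

For a least-squares fit, q2 can never exceed r2, which is consistent with the gap between the two values reported on the slide.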
Rabbit-intravenous: contour map of the final CoMFA model. For steric effects, more bulk near green and less bulk near yellow favors higher activity; for electrostatic effects, more positive charge near blue and more negative charge near red favors higher activity.
The performance of the overall procedure demonstrates that: • Such a stepwise scheme is a feasible and effective way to mine a database of toxic chemicals. • The scheme takes account of the structural diversity of toxic chemicals. • The scheme is a compromise between speed and accuracy.
dbToxPre: database-based toxicity predictor of chemicals
Inquiry molecule + database of toxic chemicals -> ShapeAnal -> structure-related set, which feeds two analyses:
• Field-based similarity analysis -> close molecules & similarity-activity
• Flexible CoMFA analysis -> CoMFA model & activity prediction
dbToxPre. The program includes four main parts: 1) a fast and efficient clustering selection of molecules based on molecular shape; 2) field-based similarity computation of molecular structure based on the shape cluster; 3) flexible CoMFA analysis of molecules based on the shape cluster; 4) a database of toxic chemicals suitable for such a procedure. Characteristics of the program: fast; efficient; dynamically combined with the database.
ShapeAnal: fast & efficient shape analysis of molecules. Inquiry molecule -> marking of atoms in the molecule -> structure description (dimension, ring systems, relative orientation of ring-system atoms) -> alignment of molecule shapes -> structure-related set.
Molecular field • Concept: continuous property fields around the molecule, produced by its atoms. • Similarity analysis of molecular fields (Carbo index) • Comparative Molecular Field Analysis (CoMFA)
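The Carbo index compares two molecular fields by dividing their overlap integral by the geometric mean of their self-overlaps, so it depends on the shape of the fields and not on their overall magnitude. The sketch below evaluates it on fields sampled at the same grid points, with sums replacing the integrals. The grid sampling, the function name and the sample values are assumptions for illustration; the slides do not show how dbToxPre evaluates the index.

```python
import math


def carbo_index(field_a, field_b):
    """Carbo similarity index between two fields sampled on the same grid points.

    R_AB = sum(a * b) / sqrt(sum(a^2) * sum(b^2)), a discrete stand-in for the
    overlap integrals in the continuous definition.
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    overlap = sum(a * b for a, b in zip(field_a, field_b))
    norm = math.sqrt(sum(a * a for a in field_a) * sum(b * b for b in field_b))
    if norm == 0.0:
        raise ValueError("a field that is zero everywhere has no defined index")
    return overlap / norm


# Hypothetical field values at three grid points.
reference = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]       # same shape, twice the magnitude
inverted = [-1.0, -2.0, -3.0]  # same shape, opposite sign

print(carbo_index(reference, scaled))
print(carbo_index(reference, inverted))
```

The index ranges from -1 to 1 for signed fields such as electrostatic potentials, and identical shapes score 1 regardless of scale.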
Evolutionary algorithm: considering the flexibility of molecules • Community/Population: structure-related set • Species/Chromosome: combination of rotatable single bonds in the molecules • Convergence: steady state of sorting • Procedure: parent generation -> congeneric mutation -> child generation
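The bullets above can be sketched as a small mutation-only evolutionary loop. Each chromosome is a tuple of torsion angles, one per rotatable single bond; every parent produces one mutated child; parents and children are sorted together and the best are kept; the run stops when the sorted population stops changing. The scoring function, the parameter values and all names are hypothetical, since the slides do not give the authors' implementation, and in dbToxPre the score would presumably come from the field similarity or the CoMFA fit.

```python
import random


def wrap(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def evolve(score, n_bonds, pop_size=20, max_gen=200, patience=10, step=30.0, seed=0):
    """Mutation-only evolutionary search over torsion angles (higher score is better)."""
    rng = random.Random(seed)
    population = [
        tuple(rng.uniform(-180.0, 180.0) for _ in range(n_bonds))
        for _ in range(pop_size)
    ]
    population.sort(key=score, reverse=True)
    stale = 0
    for _ in range(max_gen):
        children = []
        for parent in population:
            child = list(parent)
            i = rng.randrange(n_bonds)  # mutate one rotatable bond
            child[i] = wrap(child[i] + rng.gauss(0.0, step))
            children.append(tuple(child))
        # Sort parents and children together and keep the best pop_size.
        merged = sorted(population + children, key=score, reverse=True)[:pop_size]
        # "Steady state of sorting": the ranked population no longer changes.
        stale = stale + 1 if merged == population else 0
        population = merged
        if stale >= patience:
            break
    return population[0]


# Hypothetical score: closeness to an invented target conformation.
TARGET = (60.0, -60.0)


def toy_score(torsions):
    return -sum(wrap(a - t) ** 2 for a, t in zip(torsions, TARGET))


best = evolve(toy_score, n_bonds=2)
print(best, toy_score(best))
```

Because the best individuals always survive the merge, the top score never decreases from one generation to the next.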
Fast field-based similarity analysis: structure-related set -> molecular alignment based on framework shape -> EA: conformation mutation & similarity comparison -> similarity analysis & activity prediction.
Flexible CoMFA • The procedure of CoMFA: structure-related set -> molecular alignment based on framework shape -> EA: conformation mutation & CoMFA -> CoMFA model & activity prediction • Characteristics: considers conformational flexibility & a hydrophobic field
Rebuilding of the toxic-chemical database
• Selection of DBMS, following Michael Stonebraker’s classification:
• simple data & no query: file system
• complex data & no query: object-oriented DBMS
• simple data & query: relational DBMS
• complex data & query: object-relational DBMS (PostgreSQL)
• Sketch map of the design of Toxdb
Database-based toxicity prediction of chemicals assesses the activity of the inquiry molecule from a series of related molecules in the database. The purposes: • To make the best use of available knowledge of related chemicals. • To offset the uncertainty of any single data point by mutual correction among a series of molecules.
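One simple way to picture mutual correction among related molecules is a similarity-weighted average of their known activities, so that no single measurement dominates the estimate. The sketch below shows that idea only. The weighting scheme, the function name and the numbers are assumptions made for illustration; the slides describe similarity analysis and CoMFA models, and do not state that dbToxPre averages activities this way.

```python
def predict_activity(neighbours, limit=0.6):
    """Similarity-weighted mean activity of the related molecules.

    neighbours: list of (similarity, activity) pairs for database molecules.
    Molecules below the similarity limit are ignored; returns None if none remain.
    """
    kept = [(s, a) for s, a in neighbours if s >= limit]
    if not kept:
        return None
    total_weight = sum(s for s, _ in kept)
    return sum(s * a for s, a in kept) / total_weight


# Invented (similarity, activity) pairs, e.g. activity on a log scale.
related = [(0.9, 2.1), (0.8, 2.4), (0.7, 1.8), (0.3, 5.0)]
print(predict_activity(related))
```

The last molecule falls below the 0.6 limit and is excluded, so its outlying activity does not affect the estimate.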
Conclusion
• Initial analysis of the toxic-chemical database supports the concept of structure patterns of toxic chemicals.
• QSAR combined with structure patterns provides an alternative way to explore noncongeneric toxic chemicals in the database.
• Database-based toxicity prediction works dynamically with the database to assist risk assessment of chemicals.
• Data mining & toxicity prediction: visualization computation; storage computation, i.e. effective computation integrated into reasonable data storage.
Reference & paper:
1. Data mining of toxic chemicals: structural patterns and QSAR, Jiansuo Wang, Luhua Lai, Youqi Tang, J. Mol. Modelling, 1999, 252-262.
2. Predictive toxicology of toxic chemicals and database mining, Jiansuo Wang, Luhua Lai, Youqi Tang, Chinese Science Bulletin, 2000, 45, 12, 1093-1097.
3. Structural features of toxic chemicals for specific toxicity, Jiansuo Wang, Luhua Lai, Youqi Tang, J. Chem. Inf. Comput. Sci., 1999, 39, 6, 1173-1189.
Acknowledgements: Prof. Luhua Lai, Prof. Youqi Tang, Mr. Alan Gelberg …...