Ontology, RDF, SW for Chemical Structures. T N Bhat & J. Barkley, NIST. Bhat@nist.gov. Query tool. Use Case. Publications.
Major Features, Goal – to Reduce User Frustration • We have established a use case on the HCLS website: chemical taxonomies • Combining rule-based terms with vocabulary-based terms to define the elements of RDF • Organizing the elements of RDF into a predictable ontology using concepts from use cases • Developing tools and techniques to present the information in familiar database environments • Allowing easier portability and implementation of the information by the community • Illustrating the concept using high-profile data such as AIDS inhibitors and Protein Data Bank contents
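As a rough illustration of combining a rule-based term with vocabulary-based terms in RDF, the sketch below describes one compound by its InChI plus a human-readable label and synonym, written out as N-Triples. The `rdfs:label` and `skos:altLabel` properties are standard W3C vocabulary; the `http://example.org/chem/` namespace and the `hasInChI` predicate are hypothetical placeholders, not the vocabulary used in this work. The InChI shown is alanine without its stereo layer.

```python
from urllib.parse import quote

# Standard RDF Schema and SKOS property URIs.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
# Hypothetical namespace and predicate, for illustration only.
BASE = "http://example.org/chem/"
HAS_INCHI = BASE + "hasInChI"


def literal(text: str) -> str:
    """Quote text as an N-Triples string literal (escapes backslash and double quote only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def compound_triples(local_id: str, inchi: str, label: str, synonyms: list[str]) -> list[str]:
    """Describe one compound: a rule-based identifier plus vocabulary-based names."""
    subject = f"<{BASE}{quote(local_id)}>"
    triples = [
        f"{subject} <{HAS_INCHI}> {literal(inchi)} .",
        f"{subject} <{RDFS_LABEL}> {literal(label)} .",
    ]
    triples += [f"{subject} <{SKOS_ALT_LABEL}> {literal(s)} ." for s in synonyms]
    return triples


for line in compound_triples(
    "ALA",
    "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)",  # alanine, stereo layer omitted
    "alanine",
    ["Ala"],
):
    print(line)
```

The machine-oriented InChI and the human-oriented names hang off the same subject, so a query can start from either side.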
Combining Rule-based with Vocabulary-based Elements to Define RDF • Chemical structures are definable by atomic connectivity, so they are suitable for identification using graph theory (InChI) • Suitable for machine reasoning • Graphs are hard for humans to digest; therefore, the proposal is to combine InChI with familiar vocabulary terms such as Ala, Phenyl, Adenine • Also include synonyms in the vocabulary for greater coverage among diverse users • Vocabularies make it easier for humans to recognize the information
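Why connectivity alone can identify a structure: if every possible atom numbering of a molecular graph is reduced to one canonical form, two drawings of the same molecule get the same label. The toy sketch below does this by brute force. It is not the InChI algorithm (InChI has its own canonical numbering and many more layers); it ignores hydrogens, bond orders, charges and stereochemistry, and is only practical for a handful of atoms.

```python
from itertools import permutations


def canonical_label(atoms: list[str], bonds: list[tuple[int, int]]) -> str:
    """Return one string for a molecular graph regardless of how its atoms are numbered.

    atoms: element symbol per heavy atom; bonds: pairs of atom indices.
    Tries every numbering and keeps the lexicographically smallest
    (elements, edge list) pair, so it is O(n!) and only usable for tiny graphs.
    """
    edges = {frozenset(b) for b in bonds}
    best = None
    for order in permutations(range(len(atoms))):
        # order[k] is the original index of the atom placed at position k
        pos = {old: new for new, old in enumerate(order)}
        elements = tuple(atoms[old] for old in order)
        adjacency = tuple(sorted(tuple(sorted((pos[i], pos[j]))) for i, j in edges))
        if best is None or (elements, adjacency) < best:
            best = (elements, adjacency)
    elements, adjacency = best
    return "".join(elements) + "/" + ",".join(f"{i}-{j}" for i, j in adjacency)


# Ethanol numbered two ways, and dimethyl ether (same formula, different graph).
print(canonical_label(["C", "C", "O"], [(0, 1), (1, 2)]))  # -> CCO/0-1,0-2
print(canonical_label(["O", "C", "C"], [(0, 1), (1, 2)]))  # -> CCO/0-1,0-2
print(canonical_label(["C", "O", "C"], [(0, 1), (1, 2)]))  # -> CCO/0-2,1-2
```

The resulting labels are exact but unfriendly, which is the point of pairing them with names such as Ala or Phenyl.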
InChI – a Scalable URI • InChI is generated by software that decodes the chemical connectivity information into layers, such as chirality, ring structure and atom type, and then re-encodes them to form a text string • InChI is a naming standard for chemicals recommended by IUPAC
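The layered text string can be taken apart again with a simple split on `/`. The sketch below is a simplification, not a full InChI parser: it treats the first two fields as version and formula and keys every later layer by its one-letter prefix (`c` connections, `h` hydrogens, and so on). The example string is our decoding of the escaped InChI on the next slide, so treat it as an assumption.

```python
def inchi_layers(inchi: str) -> dict[str, str]:
    """Split an InChI string into its '/'-separated layers (simplified sketch).

    The first two fields are the version and the chemical formula; each later
    layer is keyed by its one-letter prefix (c = connections, h = hydrogens,
    t/m/s = stereo, ...). A prefix that occurs twice, as in fixed-H
    sub-layers, overwrites the earlier entry in this sketch.
    """
    body = inchi.removeprefix("InChI=")  # Python 3.9+
    version, formula, *rest = body.split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        if layer:  # skip empty fields
            layers[layer[0]] = layer[1:]
    return layers


example = "1/C10H11NO2/c11-10(12)13-9-5-7-3-1-2-4-8(7)6-9/h1-4,9H,5-6H2,(H2,11,12)"
for name, value in inchi_layers(example).items():
    print(f"{name:8} {value}")
```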
InChI – a rule-based URI • InChI • _1_2FC10H11NO2_2Fc11-10_2812_2913-9-5-7-3-1-2-4-8_287_296-9_2Fh1-4_2C9H_2C5-6H2_2C_28H2_2C11_2C12_29
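The string on this slide appears to be an InChI escaped for use inside a URI: `_2F` stands for `/`, `_28`/`_29` for parentheses and `_2C` for a comma, while letters, digits and `-` pass through. The sketch below implements that reading as a reversible codec. The escape rule is inferred from the string itself, not a documented NIST scheme; the leading `_` before the version is unexplained on the slide and is left out. Escaping `_` itself (as `_5F`) is our addition so that decoding is unambiguous.

```python
import re
import string

# Characters kept as-is; everything else becomes '_' + two uppercase hex digits.
# Assumption inferred from the slide's string, not a documented scheme.
SAFE = set(string.ascii_letters + string.digits + "-")


def escape_inchi(inchi: str) -> str:
    """Make an (ASCII) InChI safe to embed in a URI or RDF local name."""
    return "".join(ch if ch in SAFE else f"_{ord(ch):02X}" for ch in inchi)


def unescape_inchi(token: str) -> str:
    """Reverse escape_inchi by decoding every '_XX' hex escape."""
    return re.sub(r"_([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), token)


inchi = "1/C10H11NO2/c11-10(12)13-9-5-7-3-1-2-4-8(7)6-9/h1-4,9H,5-6H2,(H2,11,12)"
token = escape_inchi(inchi)
print(token)
print(unescape_inchi(token) == inchi)
```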
Vocabulary-based Definitions • For decades, scientists have been developing names to identify structures and their images • Simple names • His • Ala • DNA • ATP • Semi-rule-based IUPAC names • 2-amino-3-methylpentanamide • 4-amino-3-hydroxy-6-methylheptanoic_acid • 1-[(Benzenesulfonyl-methyl-amino)-phenyl-butyl]-piperidin-4-yl}-propyl-carbamic acid, naphthalen-1-ylmethyl ester • Names facilitate text-based queries for desired components • Names, when used together with InChI, integrate machine and human needs more smoothly
Use Case for SW: Treatment for AIDS Is a Work in Progress • Treatments for AIDS are of two types • Prevention (the most effective) • Containment • Drugs to contain and reduce the viral load • The majority of the drugs (~17) target either HIV protease or reverse transcriptase (RT) • Complete suppression of either of these viral enzymes could cure AIDS • But drug resistance leads to only partial suppression of the enzymes • All the drug design efforts for AIDS are based on structures • Data needed for drug design are scattered over many Web resources, and users often wade through the data manually • Therefore, AIDS drug design is an ideal target for the Semantic Web and novel database technologies • SW connection between NIST and the NIAID AIDS database • Choose the problem that matters • Website
Annotation Technique/Developing Structural Ontology • Define compounds using chemical features of interest to use cases • Fragment, subgroup, class • [Figure labels: 000503, 030798, 1A8K, 000505]
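One way to picture the fragment, subgroup, class organization is a small hierarchy in which each term points to its broader term, so a query for a class also finds compounds annotated only with narrower fragments. Everything in the sketch below (term names, hierarchy, compound IDs) is hypothetical, chosen only to echo fragments named earlier; it is not the ontology built in this work, and it assumes the hierarchy has no cycles.

```python
# Hypothetical hierarchy: term -> broader term (fragment -> subgroup -> class). Must be acyclic.
BROADER = {
    "phenyl": "aromatic ring",
    "naphthyl": "aromatic ring",
    "aromatic ring": "ring",
    "piperidine": "saturated heterocycle",
    "saturated heterocycle": "ring",
    "carbamate": "functional group",
}

# Hypothetical annotations: compound id -> fragments it contains.
ANNOTATIONS = {
    "cmpd-1": {"phenyl", "carbamate"},
    "cmpd-2": {"naphthyl", "piperidine"},
    "cmpd-3": {"carbamate"},
}


def ancestors(term: str) -> list[str]:
    """Return the term followed by all of its broader terms."""
    chain = []
    current = term
    while current is not None:
        chain.append(current)
        current = BROADER.get(current)
    return chain


def compounds_with(term: str) -> list[str]:
    """Compounds annotated with the term itself or with any narrower term."""
    return sorted(
        cid for cid, frags in ANNOTATIONS.items()
        if any(term in ancestors(f) for f in frags)
    )


print(compounds_with("ring"))
print(compounds_with("carbamate"))
```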
Web Tools • Structures are different from text-based information • Structures are not amenable to text-based query and rendering techniques • Most users of structural data have never heard (nor want to hear!) of SPARQL, the query language for RDF • The commonly preferred and expected way to query is by 'click' • A Semantic Web for structures needs new Web tools that allow navigation by clicking on structural features
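Navigation by clicking can be modeled without exposing any query language: each click adds a structural feature to a selection, the result set is the compounds containing every selected feature, and the next clickable options are the features still present among those results. The sketch below shows that faceted drill-down on hypothetical data; it is an illustration of the idea, not how Chem-BLAST or the NIST tools are implemented.

```python
# Hypothetical data: compound id -> structural features it contains.
FEATURES = {
    "cmpd-1": {"phenyl", "carbamate", "piperidine"},
    "cmpd-2": {"phenyl", "sulfonamide"},
    "cmpd-3": {"naphthyl", "carbamate"},
}


def click(selected: set[str], feature: str) -> set[str]:
    """Record one click by adding the feature to the current selection."""
    return selected | {feature}


def matching(selected: set[str]) -> list[str]:
    """Compounds that contain every selected feature."""
    return sorted(cid for cid, feats in FEATURES.items() if selected <= feats)


def next_options(selected: set[str]) -> list[str]:
    """Features still worth offering: present in some match, not yet selected."""
    options: set[str] = set()
    for cid in matching(selected):
        options |= FEATURES[cid]
    return sorted(options - selected)


sel = click(set(), "carbamate")
print(matching(sel), next_options(sel))
sel = click(sel, "phenyl")
print(matching(sel), next_options(sel))
```

Because options are drawn only from the current results, every click the user is offered leads to a non-empty answer.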
Chem-BLAST for Structural Semantic Web • http://bioinfo.nist.gov/SemanticWeb_pr3d/chemblast.do • Prasanna et al., PROTEINS 60, 1-4 (2005). • Prasanna et al., PROTEINS 63(4), 907-917 (2006). • Download publications
Future Plans • Extend the work to chemical structures from the Protein Data Bank • If there is interest, hold a workshop at NIST; proposed dates: last two weeks of March 2008 • The workshop will be held in conjunction with the NIST-wide Ontology week • Possible collaboration with IUPAC (International Union of Pure and Applied Chemistry) and ChEBI • Contact: Colin Batchelor, BatchelorC@rsc.org • RSC Publishing, Royal Society of Chemistry • Community participation is essential for further development • Contact: bhat@nist.gov, 301 975 5448 (US)