Semantic Web Technologies for Analysis of Transcriptome. Rose Dieng-Kuntz (1), Khaled Khelif (1), Olivier Corby (1), Pascal Barbry (2). (1) INRIA - Sophia Antipolis, ACACIA project, http://www.inria.fr/acacia, http://www.inria.fr/acacia/corese. (2) IPMC, Sophia Antipolis, http://www.ipmc.fr
Outline • Context: Memory of Biochip Experiments • The MEAT Project • Semi-automatic generation of semantic annotations • Conclusions: Requirements for Semantic Web
Context: Biochip experiments • DNA microarrays (gene chips, biochips) make it possible to measure simultaneously the expression level and transcription rate of many genes in an organism. • Applications in biology, medicine, pharmacology…: • Gene discovery • Disease diagnosis or prognosis • Drug discovery: Pharmacogenomics • Toxicological research: Toxicogenomics
Towards Biochip Experiment Memory (diagram: Biologist, Experiment sheets, Experiment DB, Documents, Domain Ontologies) • Need for knowledge management for a community of biologists: a biochip experiment memory • Need for support in validating & interpreting the results of biochip experiments
The MEAT Project • MEDIANTE • MEAT-Annot&Search • MEAT-Miner • MEAT-Onto: UMLS, Gene Onto…
Phases: before experiment • Order slides in order to launch a new biochip experiment • Submission of journal articles on genes presumed to be of interest • Constitution of an electronic document corpus • Creation of semantic annotations on these articles with MEAT-Annot • Biologist checks & validates the probes available on the biochip & selects a subset
Phases: after experiment • Statistical analysis of results with MEAT-Miner • Interpretation of results, using further bibliographical searches • Addition of new semantic annotations on the experiment • Storage of the experiment description and of its results in MEDIANTE, according to Array Express format
MEAT-Annot&Search • MEAT-Annot: Annotation Acquisition Tool: automatic generation of annotations from a corpus; manual annotation editor • MEAT-Search: CORESE search engine; MEAT-dedicated query interface; result browsing interface • BRIGENE: annotation base: article annotation base; result annotation base; general knowledge base • ARRAY-EXPRESS: experiment description; result description
MEATAnnot: Technical Choices • NLP tools: term extractor + relation extractor • Extraction, from texts, of terms corresponding to UMLS ontology concepts • Extraction, from texts, of the relations between these terms • Automatic generation of a semantic annotation, represented in RDF
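The term-extraction step above can be sketched as a dictionary lookup. This is only an illustration, not the MEAT-Annot implementation: the real tool queries UMLS, whereas `LEXICON` below is a hypothetical two-entry stand-in built from the example sentence used later in these slides, and `extract_terms` is a name introduced here.

```python
import re

# Hypothetical mini-lexicon (assumption): term -> UMLS concept label.
# The two entries come from the example sentence in these slides.
LEXICON = {
    "hgf": "Amino Acid, Peptide or Protein",
    "lung development": "Organ or Tissue Function",
}


def extract_terms(sentence, lexicon=LEXICON):
    """Return (surface form, concept) pairs in order of appearance."""
    lowered = sentence.lower()
    hits = []
    for term, concept in lexicon.items():
        # Whole-word match so that a short term is not found inside a longer word.
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((match.start(), sentence[match.start():match.end()], concept))
    hits.sort()  # order by position in the sentence
    return [(surface, concept) for _, surface, concept in hits]


terms = extract_terms("HGF plays an important role in lung development")
```

A real extractor would also need to handle inflection, synonyms and overlapping terms, which a plain lookup like this ignores.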
Relationship extraction • Syntex (Bourigault D. 2000): corpus syntactic analyser, applied to a test corpus • Used to reveal the « verb syntagms » commonly used in the biochip domain
Relationship extraction • Choosing the potential relationships revealed by Syntex • Writing a relationship extraction grammar using JAPE: {Tag.lemme == "play"} {SpaceToken} ({Token.string == "a"} | {Token.string == "an"})? ({SpaceToken})? ({Token.string == "vital"} | {Token.string == "important"} | {Token.string == "critical"} | {Token.string == "some"} | {Token.string == "unexpected"} | {Token.string == "multifaceted"} | {Token.string == "major"})? ({SpaceToken})? {Tag.lemme == "role"}
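To make the pattern above easier to read, here is a rough regular-expression approximation of it. It is a sketch only: JAPE matches over GATE token and tag annotations, while this version works on raw text and imitates the lemma tests with a short list of inflected forms (an assumption). The names `ADJECTIVES`, `PLAY_ROLE` and `find_play_role` are introduced here for illustration.

```python
import re

# The optional modifiers listed in the JAPE rule.
ADJECTIVES = ["vital", "important", "critical", "some",
              "unexpected", "multifaceted", "major"]

# Approximation: explicit inflected forms stand in for the lemma tests
# {Tag.lemme == "play"} and {Tag.lemme == "role"}.
PLAY_ROLE = re.compile(
    r"\b(?:play|plays|played|playing)\s+"          # lemma "play" + space
    r"(?:an?\s+)?"                                 # optional "a" / "an"
    r"(?:(?:" + "|".join(ADJECTIVES) + r")\s+)?"   # optional modifier
    r"roles?\b",                                   # lemma "role"
    re.IGNORECASE,
)


def find_play_role(sentence):
    """Return the matched 'play ... role' phrase, or None if absent."""
    match = PLAY_ROLE.search(sentence)
    return match.group(0) if match else None


phrase = find_play_role("HGF plays an important role in lung development")
```

The terms found on either side of the matched phrase are then the candidate arguments of the « play role » relation.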
System architecture (diagram: Documents, UMLS Knowledge server, JAPE grammars such as the "play … role" rule and a variant of it ending in {Tag.lemme == "effects"}, Gate API, MeatAnnot, RDF Annotations, Biologist)
Example: « HGF plays an important role in lung development » The information extracted from this sentence is: • HGF: an instance of the concept « Amino Acid, Peptide or Protein » • lung development: an instance of the concept « Organ or Tissue Function » • HGF play role lung development: an instance of the relation « play role » between the two terms
RDF Annotation Generated
<rdf:RDF
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:m='http://www.inria.fr/acacia/meat#'
    xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
  <m:Amino_Acid_Peptide_or_Protein rdf:about='HGF#'>
    <m:play_role>
      <m:Organ_or_Tissue_Function rdf:about='lung_development#'/>
    </m:play_role>
  </m:Amino_Acid_Peptide_or_Protein>
</rdf:RDF>
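An annotation of this shape can be produced from the extracted terms and relation with a few lines of code. The sketch below uses Python's standard `xml.etree.ElementTree`; it is not how MEAT-Annot itself generates RDF, and the helpers `to_name` and `build_annotation` are hypothetical names introduced here. Only the two namespaces actually used by the elements are emitted.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEAT_NS = "http://www.inria.fr/acacia/meat#"


def to_name(label):
    """Turn a label into an XML-friendly name: drop commas, join words with '_'."""
    return "_".join(label.replace(",", " ").split())


def build_annotation(subject, subject_type, relation, obj, obj_type):
    """Serialise one (subject, relation, object) statement as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("m", MEAT_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    subj = ET.SubElement(
        root, "{%s}%s" % (MEAT_NS, to_name(subject_type)),
        {"{%s}about" % RDF_NS: to_name(subject) + "#"},
    )
    rel = ET.SubElement(subj, "{%s}%s" % (MEAT_NS, to_name(relation)))
    ET.SubElement(
        rel, "{%s}%s" % (MEAT_NS, to_name(obj_type)),
        {"{%s}about" % RDF_NS: to_name(obj) + "#"},
    )
    return ET.tostring(root, encoding="unicode")


xml_text = build_annotation(
    "HGF", "Amino Acid, Peptide or Protein",
    "play role",
    "lung development", "Organ or Tissue Function",
)
```

Building the tree through an XML library rather than by string concatenation keeps the output well-formed even when a term contains characters that need escaping.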
CORESE Semantic search engine • Sources: ontologies, documents, XML, legacy systems, users • XML document, e.g. <accident> <date> 19 Mai 2000 </date> <description> … </description> </accident> • Schema in RDFS, e.g. <rdfs:Class rdf:ID="thing"/> <rdfs:Class rdf:ID="person"> <rdfs:subClassOf rdf:resource="#thing"/> </rdfs:Class> • Annotations in RDF formed by instances of the schema in RDFS, e.g. <ns:article rdf:about="http://intranet/articles/ecai.doc"> <ns:title>MAS and Corporate Semantic Web</ns:title> <ns:author> <ns:person rdf:about="http://intranet/employee/id109" /> </ns:author> </ns:article> • Semantic Web server: RDF/S, queries, rules; query, answer, push • Web stack: Unicode, URI, XML, namespaces, RDF, RDFS, ontology, rules, queries • Conceptual graph (CG) counterparts: CG support (RDFS), CG base (RDF), CG rules (rules), CG query (queries), CG results, obtained through projection and inferences
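The benefit of querying annotations through an RDFS schema is that a query on a class also retrieves instances of its subclasses. The sketch below illustrates only that idea, in plain Python; it does not use Corese or its query language, and Corese itself works by projecting conceptual graphs. The `person` subclass of `thing` and the two resources come from the slide; declaring `article` a subclass of `thing` is an assumption made for the example, as are the function names.

```python
# Schema: child class -> parent class.
SUBCLASS_OF = {
    "person": "thing",   # from the slide's RDFS fragment
    "article": "thing",  # assumption, for illustration only
}

# Annotations: resource -> declared class.
TYPES = {
    "http://intranet/articles/ecai.doc": "article",
    "http://intranet/employee/id109": "person",
}


def ancestors(cls, subclass_of):
    """Return cls followed by all of its superclasses."""
    chain = [cls]
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in chain:  # guard against a cyclic schema
            break
        chain.append(cls)
    return chain


def instances_of(cls, types, subclass_of):
    """Resources whose declared class is cls or one of its subclasses."""
    return sorted(
        resource for resource, declared in types.items()
        if cls in ancestors(declared, subclass_of)
    )


people = instances_of("person", TYPES, SUBCLASS_OF)
things = instances_of("thing", TYPES, SUBCLASS_OF)
```

Here a query for `thing` returns both resources although neither is declared as a `thing` directly.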
Ontology-based query • Biologists formulate queries through the interface • The interface submits the queries to Corese, which loads the annotation base and UMLS • The interface returns the results to the biologists
Semantic Web requirements • Adaptation of the Corese semantic search engine to OWL • Corese query language vs SPARQL • Contextual annotations: need to express multiple contexts / viewpoints • Temporal queries on the base of past biochip experiments + temporally evolving ontologies & annotations • Scalability of NLP tools: articles stemming from scientific watch on the open (semantic) Web…
Many thanks to • ACACIA team: in particular Khaled Khelif, Laurent Alamarguy, Olivier Corby, Alain Giboin… • IPMC: Pascal Barbry, Kevin Le Brigand, Hélène, Chimène, Yves • Bayer Crop Science: Rémi Bars • Didier Bourigault (ERSS), developer of Syntex • The developers of GATE (Sheffield Univ.)
Support to health network • Documents (patient record, best practices guide…), e.g. <dossierPatient> <date> 19 Mai 2000 </date> <donneesAdministratives> <Patient><nom>Dupont</nom> <prenom> Michel </prenom> </Patient> </donneesAdministratives> … • Medical Ontology • Semantic Annotations • Translator • Life Line • Corese search engine • Virtual Staff • Member of the health network • Nautilus DB