Semantic Relations for Interpreting DNA Microarray Data and for Novel Hypotheses Generation. Dimitar Hristovski (1), PhD; Andrej Kastrin (2); Borut Peterlin (2), MD PhD; Thomas C Rindflesch (3), PhD. (1) Institute of Biomedical Informatics, Medical Faculty, University of Ljubljana, Slovenia; (2) Institute of Medical Genetics, University Medical Centre, Ljubljana, Slovenia; (3) National Library of Medicine, National Institutes of Health, Bethesda, MD, U.S.A. E-mail: dimitar.hristovski@mf.uni-lj.si
Introduction Microarray experiments: • great potential to support progress in biomedical research, • results NOT EASY to interpret, • information about functions and relations of relevant genes needs to be extracted from the vast biomedical literature
Related Work • Text mining and microarray analysis • Literature-based Discovery
Proposed Solution • Computerized text analysis system • Extract semantic relations from literature • SemRep • Integrate with microarray experiments • Develop tools for: • Interpretation • Novel hypotheses generation
Overall Design [Figure: MEDLINE is processed by SemRep (semantic relation extraction) into semantic relations; GEO microarrays are processed by R/Bioconductor scripts; both feed an integrated database (semantic relations + microarrays) that underlies the interpretation & discovery tools.]
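The integration step in the design can be illustrated with a small relational sketch. It assumes a hypothetical schema (tables `predication` and `expression`, with the column names shown) and uses Python's built-in `sqlite3` in place of the authors' actual database and R/Bioconductor loading scripts; it only shows how literature relations can be joined to the genes measured on an array.

```python
import sqlite3

# Hypothetical schema: the slides do not give table or column names.
SCHEMA = """
CREATE TABLE predication (
    subject   TEXT NOT NULL,  -- subject concept, e.g. a gene symbol
    predicate TEXT NOT NULL,  -- SemRep predicate, e.g. INHIBITS
    object    TEXT NOT NULL,  -- object concept
    pmid      TEXT NOT NULL   -- citation the relation was extracted from
);
CREATE TABLE expression (
    dataset TEXT NOT NULL,    -- microarray dataset identifier
    gene    TEXT NOT NULL,    -- gene symbol measured on the array
    log_fc  REAL NOT NULL     -- log fold change (positive = upregulated)
);
"""


def build_db(predications, expression_rows):
    """Load literature relations and microarray results into one in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO predication VALUES (?, ?, ?, ?)", predications)
    con.executemany("INSERT INTO expression VALUES (?, ?, ?)", expression_rows)
    return con


def relations_for_dataset(con, dataset):
    """Distinct relations whose subject or object is a gene measured in `dataset`."""
    query = """
        SELECT DISTINCT p.subject, p.predicate, p.object
        FROM predication AS p
        JOIN expression AS e ON e.gene IN (p.subject, p.object)
        WHERE e.dataset = ?
        ORDER BY p.subject, p.predicate, p.object
    """
    return con.execute(query, (dataset,)).fetchall()


# Toy data for illustration only (placeholder PMIDs, dataset name and fold changes).
con = build_db(
    [("MBD1", "INTERACTS_WITH", "HTR2C", "PMID-A"),
     ("quercetin", "INHIBITS", "HSPB1", "PMID-B"),
     ("levodopa", "TREATS", "Parkinson Disease", "PMID-C")],
    [("demo", "HSPB1", 1.2), ("demo", "HTR2C", -0.8)],
)
print(relations_for_dataset(con, "demo"))
# [('MBD1', 'INTERACTS_WITH', 'HTR2C'), ('quercetin', 'INHIBITS', 'HSPB1')]
```

The relation about a disease is not returned because neither of its arguments is a gene on the array; keeping relations and expression data in one store lets such joins be written as ordinary queries.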
SemRep • Extracts semantic relations from biomedical text (implemented in Prolog) • Based on UMLS Metathesaurus and Semantic Network • <MetaConc> SEMNET RELATION <MetaConc> • Database of relations extracted from MEDLINE • 6.7M citations (01/01/1999 through 03/31/2009) • 43M sentences • 21M relation instances • 7M relation types
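The distinction between relation instances and relation types can be made concrete with two small data structures. This is a sketch under an assumption: a "relation type" is taken to be a distinct subject-predicate-object triple, and an "instance" is one occurrence of that triple in a specific sentence; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """A relation type: <MetaConc> SEMNET_RELATION <MetaConc>."""
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Instance:
    """One occurrence of a predication in a specific sentence (hypothetical fields)."""
    predication: Predication
    pmid: str
    sentence: int


def count_instances_and_types(instances):
    """Return (number of relation instances, number of distinct relation types)."""
    return len(instances), len({inst.predication for inst in instances})


p = Predication("MBD1", "INTERACTS_WITH", "HTR2C")
q = Predication("MBD1", "CAUSES", "Autistic Disorder")
instances = [Instance(p, "PMID-A", 1), Instance(p, "PMID-B", 3), Instance(q, "PMID-A", 2)]
print(count_instances_and_types(instances))  # (3, 2)
```

Because the same triple can be asserted in many sentences, the number of types is smaller than the number of instances.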
Semantic Relations Extracted • Wide range of relations in: clinical medicine, molecular genetics, pharmacogenomics • Genetic Etiology: associated_with, predisposes, causes • Substance Relations: interacts_with, inhibits, stimulates • Pharmacological Effects: affects, disrupts, augments • Clinical Actions: administered_to, manifestation_of, treats • Organism Characteristics: location_of, part_of, process_of • Co-existence: co-exists_with
Examples • “… the loss of Mbd1 could lead to autism-like behavioral phenotypes …” Relation: MBD1 causes Autistic Disorder • “… Mbd1 can directly regulate the expression of Htr2c, one of the serotonin receptors, …” Relation: MBD1 interacts_with HTR2C
Interpretation of Microarrays Find known facts from the literature: • Disease-related: • Associated genes • Current treatments • … • Microarray genes: • Relations between genes (INHIBITS, STIMULATES, …) • Relations between the genes and anything else
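The two kinds of interpretation queries can be sketched as simple filters over (subject, predicate, object) triples. The triples below are toy data, and the grouping of genetic-etiology predicates is an assumption based on the relation list above, not the tool's actual query logic.

```python
# Toy (subject, predicate, object) triples standing in for extracted relations.
TRIPLES = [
    ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"),
    ("levodopa", "TREATS", "Parkinson Disease"),
    ("pramipexole", "STIMULATES", "NR4A2"),
    ("MBD1", "INTERACTS_WITH", "HTR2C"),
    ("MBD1", "CAUSES", "Autistic Disorder"),
]

# Assumed grouping of genetic-etiology predicates.
ETIOLOGY = {"ASSOCIATED_WITH", "PREDISPOSES", "CAUSES"}


def disease_facts(triples, disease):
    """Genes and treatments linked to `disease` in the literature."""
    genes = sorted({s for s, p, o in triples if o == disease and p in ETIOLOGY})
    treatments = sorted({s for s, p, o in triples if o == disease and p == "TREATS"})
    return {"associated_genes": genes, "treatments": treatments}


def gene_facts(triples, array_genes):
    """Split relations into gene-gene (both arguments on the array) and gene-other (exactly one)."""
    genes = set(array_genes)
    between = [t for t in triples if t[0] in genes and t[2] in genes]
    other = [t for t in triples if (t[0] in genes) != (t[2] in genes)]
    return between, other


print(disease_facts(TRIPLES, "Parkinson Disease"))
# {'associated_genes': ['NR4A2'], 'treatments': ['levodopa']}
```

Separating gene-gene relations from gene-other relations mirrors the two bullets under "Microarray genes".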
Novel Hypotheses Generation • Based on discovery patterns • Discovery patterns: • search templates that have a higher likelihood of returning a new discovery • Specific discovery patterns for specific discovery tasks
Discovery Patterns • Inhibit the upregulated: • Search for substances, genes, ... which, according to the literature, inhibit the top N (e.g. 300) genes that are upregulated on a given microarray • Such substances, genes, … might be used to regulate the upregulated genes • Stimulate the downregulated: • Search for substances, genes, ... which, according to the literature, stimulate the top N (e.g. 300) genes that are downregulated on a given microarray • Such substances, genes, … might be used to regulate the downregulated genes
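The two discovery patterns can be sketched as a query that selects the top N up- or downregulated genes and then looks for literature relations with the matching predicate. The slides do not say how the top N genes are ranked; this sketch assumes ranking by log fold change, and the expression values and the `compound_x`, `GENE_A`, `GENE_B` names are hypothetical toy data.

```python
def top_n(expression, n, direction):
    """Top-n genes by log fold change: most positive for 'up', most negative for 'down'."""
    if direction == "up":
        genes = [g for g, fc in expression.items() if fc > 0]
        genes.sort(key=lambda g: expression[g], reverse=True)
    elif direction == "down":
        genes = [g for g, fc in expression.items() if fc < 0]
        genes.sort(key=lambda g: expression[g])
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return genes[:n]


def discovery_pattern(triples, expression, n, direction):
    """'Inhibit the upregulated' (up) or 'stimulate the downregulated' (down).

    Returns {candidate substance/gene: sorted list of targeted top-n genes}.
    """
    predicate = "INHIBITS" if direction == "up" else "STIMULATES"
    targets = set(top_n(expression, n, direction))
    hits = {}
    for subject, pred, obj in triples:
        if pred == predicate and obj in targets:
            hits.setdefault(subject, set()).add(obj)
    return {s: sorted(objs) for s, objs in sorted(hits.items())}


expression = {"HSPB1": 2.0, "GENE_A": 0.5, "NR4A2": -1.5, "GENE_B": -0.2}
triples = [
    ("paclitaxel", "INHIBITS", "HSPB1"),
    ("quercetin", "INHIBITS", "HSPB1"),
    ("pramipexole", "STIMULATES", "NR4A2"),
    ("compound_x", "INHIBITS", "GENE_B"),
]
print(discovery_pattern(triples, expression, 1, "up"))    # {'paclitaxel': ['HSPB1'], 'quercetin': ['HSPB1']}
print(discovery_pattern(triples, expression, 1, "down"))  # {'pramipexole': ['NR4A2']}
```

Note that `compound_x` is not returned: inhibiting a downregulated gene does not fit either pattern.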
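Discovery Patterns – Graphical View [Figure: Disease X is studied on a microarray, which yields upregulated genes Y1 and downregulated genes Y2. In the literature, Drug Z1 (or substance) inhibits genes Y1 and Drug Z2 (or substance) stimulates genes Y2, suggesting the hypotheses Z1 Maybe_Treats X (Maybe_Treats1?) and Z2 Maybe_Treats X (Maybe_Treats2?).]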
Results – Inhibit the Upregulated • Parkinson microarray GSE8397 • HSP27 (HSPB1) gene is upregulated on the microarray • We identified paclitaxel and quercetin as substances that inhibit the expression of this gene
Results – Stimulate the Downregulated • NR4A2 is downregulated on the microarray • We found that: • Pramipexole stimulates expression of NR4A2 • NR4A2 is associated with Parkinson disease
Evaluation • Method based on [Masseroli, BMC Bioinformatics 2006] (P = precision, R = recall) • Extracting known facts, baseline precision on 2,042 extracted relations: • Gene – Disease (causes, assoc_with, …) P=74.2% • Gene – Gene (inhibits, stimulates, …) P=41.95% • Proposed argument-predicate distance filter (Gene – Gene): • At distance no more than 1: P=70.75%; R=43.6% • At distance no more than 2: P=55.88%; R=66.28% • We use argument-predicate distance to rank semantic relations, so that relations more likely to be correct are shown first.
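Filtering and ranking by argument-predicate distance, and computing precision and recall against a gold standard, can be sketched as follows. The slides do not define how the distance is computed, so this sketch assumes it is a precomputed non-negative integer for each argument (for example, a count of intervening phrases) and scores a relation by its farther argument; the relations and gold set are placeholders, not evaluation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extracted:
    subject: str
    predicate: str
    object: str
    subj_distance: int  # hypothetical: distance between subject and predicate indicator
    obj_distance: int   # hypothetical: distance between object and predicate indicator

    @property
    def distance(self):
        # Assumption: a relation is scored by its farther argument.
        return max(self.subj_distance, self.obj_distance)

    @property
    def triple(self):
        return (self.subject, self.predicate, self.object)


def filter_by_distance(relations, max_distance):
    """Keep relations whose arguments are all within `max_distance` of the predicate."""
    return [r for r in relations if r.distance <= max_distance]


def rank_by_distance(relations):
    """Closest (likely more reliable) relations first; ties keep input order."""
    return sorted(relations, key=lambda r: r.distance)


def precision_recall(predicted, gold):
    """Precision and recall of predicted triples against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


rels = [
    Extracted("A", "INHIBITS", "B", 0, 1),
    Extracted("C", "STIMULATES", "D", 2, 0),
    Extracted("E", "INHIBITS", "F", 0, 0),
]
kept = filter_by_distance(rels, 1)
gold = {("A", "INHIBITS", "B"), ("C", "STIMULATES", "D")}
print(precision_recall([r.triple for r in kept], gold))  # (0.5, 0.5)
```

Tightening the distance threshold tends to trade recall for precision, which is the pattern the distance-1 and distance-2 results above show.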
Conclusion • A new bioinformatics tool for interpretation and novel hypotheses generation • Based on integration of semantic relations extracted from literature with microarrays • Available at: • http://sembt.mf.uni-lj.si
Syntactic Processing “Mbd1 can directly regulate the expression of Htr2c” • MedPost tagger and shallow parser [ NP[head([… inputmatch(mbd1),tag(noun)])], ... [verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],... NP[… head([… inputmatch(htr2c),tag(noun)])] ]
Semantic Processing • Identify concepts: MetaMap and ABGene [ NP[head([… semtype(gngm),entrez(MBD1,4152)])], ... [verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],... NP[… head([… semtype(gngm),entrez(HTR2C,3358)])] ] • Match semantic type patterns to ontology: <gngm> INTERACTS_WITH <gngm> • Apply indicator rule: Verb(regulate) INTERACTS_WITH • Substitute concepts for semantic types: MBD1 INTERACTS_WITH HTR2C
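The semantic processing steps can be illustrated with a toy interpreter that applies an indicator rule, checks the resulting semantic type pattern against an ontology, and substitutes concepts for semantic types. The rule tables, the `Concept` type and the `interpret` function are hypothetical; SemRep itself is implemented in Prolog and uses far larger rule sets.

```python
from dataclasses import dataclass

# Hypothetical, tiny rule tables; SemRep's actual indicator rules and ontology are much larger.
INDICATOR_RULES = {("verb", "regulate"): "INTERACTS_WITH"}
ONTOLOGY = {("gngm", "INTERACTS_WITH", "gngm")}


@dataclass(frozen=True)
class Concept:
    name: str     # normalized concept name, e.g. an Entrez Gene symbol
    semtype: str  # UMLS semantic type abbreviation, e.g. "gngm"


def interpret(subject, indicator, obj):
    """Return (subject, predicate, object) or None if no rule or pattern applies.

    `indicator` is the (tag, lexmatch) pair of the predicate word, e.g. ("verb", "regulate").
    """
    predicate = INDICATOR_RULES.get(indicator)  # apply indicator rule
    if predicate is None:
        return None
    if (subject.semtype, predicate, obj.semtype) not in ONTOLOGY:  # semantic type pattern check
        return None
    return (subject.name, predicate, obj.name)  # substitute concepts for semantic types


mbd1 = Concept("MBD1", "gngm")
htr2c = Concept("HTR2C", "gngm")
print(interpret(mbd1, ("verb", "regulate"), htr2c))  # ('MBD1', 'INTERACTS_WITH', 'HTR2C')
```

The ontology check is what blocks relations whose argument types do not fit the predicate, even when an indicator word is present.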