Data Integration and Extraction over Molecular Biological Data • Cui Tao • Supported by NSF
Motivation • Online biological data: • Highly diverse in granularity and variety • Various formats • Different terminologies, ID systems, units
How to Build a Gene Extraction Ontology? • Concepts • Relationship sets • Constraints • Data Frames
How to Build a Gene Extraction Ontology? • Example data frame patterns • DNA sequence: (G*A*T*C*)* • RNA sequence: (G*A*U*C*)*
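The two sequence patterns on this slide can be tried out with a short Python sketch. It assumes the patterns are meant as recognizers for nucleotide strings (T appears only in DNA, U only in RNA). The function name `sequence_kinds` is hypothetical, and the character classes `[GATC]+` and `[GAUC]+` are a simplification that accepts the same non-empty strings as the slide's patterns while avoiding nested quantifiers.

```python
import re
from typing import List

# Simplified equivalents of the slide's (G*A*T*C*)* and (G*A*U*C*)* patterns.
# Assumption: empty strings should not count as sequences, hence "+" not "*".
DNA_PATTERN = re.compile(r"[GATC]+")
RNA_PATTERN = re.compile(r"[GAUC]+")


def sequence_kinds(text: str) -> List[str]:
    """Return which sequence kinds the whole string could be.

    A string with neither T nor U (for example "GACA") is ambiguous and
    is reported as both DNA and RNA.
    """
    candidate = text.strip().upper()
    kinds = []
    if DNA_PATTERN.fullmatch(candidate):
        kinds.append("DNA")
    if RNA_PATTERN.fullmatch(candidate):
        kinds.append("RNA")
    return kinds


if __name__ == "__main__":
    for sample in ["GATTACA", "GAUUACA", "GACA", "hello"]:
        print(sample, sequence_kinds(sample))
```

A pattern this permissive also matches short ordinary words made only of these letters, so a real data frame would likely need context keywords or a minimum length as well.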
Knowledge Sources • Gene Ontology • Thousands of terms (Molecular Function, Biological Process, Cellular Component) • All Species Toolkit • 1,231,935 species names • Protein Databases • Thousands of protein names
Extraction Rules • Statistical NLP • Machine learning • Naïve Bayes • Hidden Markov Models • Decision Trees
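To make one of the listed techniques concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, written with the standard library only. It is not the system described in these slides: the task (deciding whether a few context words point to a gene or a protein mention), the training examples, and all names (`train`, `predict`, `TOY_EXAMPLES`) are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy, invented training data: (context tokens, label).
TOY_EXAMPLES = [
    ("gene encodes sequence".split(), "gene"),
    ("mutant gene locus".split(), "gene"),
    ("protein binds kinase".split(), "protein"),
    ("protein function enzyme".split(), "protein"),
]


def train(examples):
    """Count labels and per-label token frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        for token in tokens:
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TOY_EXAMPLES)

if __name__ == "__main__":
    print(predict(MODEL, ["kinase", "enzyme"]))
    print(predict(MODEL, ["mutant", "locus"]))
```

With only four training examples the scores are driven almost entirely by single word counts, so this illustrates the mechanics rather than realistic extraction accuracy.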
Integration • Information Hidden behind Links
Query-based Extraction • Query the gene extraction ontology • Find applicable resources • Fill out forms • Extract information
Query-based Extraction • Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.” • [Figure: ontology diagram; node labels include Gene, Gene Name, Sequence, Protein, Function, Mutant]
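The first step of query-based extraction, mapping a user's query onto ontology concepts, might look roughly like the sketch below. It assumes simple keyword patterns stand in for the ontology's recognizers. The concept names are taken from the slide, while the patterns, `parse_query`, and the result layout are hypothetical.

```python
import re

# Hypothetical keyword recognizers for concepts named on the slide.
CONCEPT_PATTERNS = [
    ("Sequence", re.compile(r"\bsequence\b", re.IGNORECASE)),
    ("Protein Function",
     re.compile(r"\bprotein(?:'s)?\s+function\b", re.IGNORECASE)),
    ("Mutant", re.compile(r"\bmutants?\b", re.IGNORECASE)),
]

# Assumption: the gene name is the word directly before the word "gene".
GENE_NAME_PATTERN = re.compile(r"\b(\w+)\s+gene\b", re.IGNORECASE)


def parse_query(query):
    """Return the gene name (if any) and the concepts the query asks for."""
    match = GENE_NAME_PATTERN.search(query)
    gene_name = match.group(1) if match else None
    concepts = [name for name, pattern in CONCEPT_PATTERNS
                if pattern.search(query)]
    return {"gene_name": gene_name, "concepts": concepts}


QUERY = ("Find the alfR gene, its sequence, its protein's function, "
         "and any mutant that inhibits this gene.")

if __name__ == "__main__":
    print(parse_query(QUERY))
```

The resulting structure could then drive the later steps (choosing resources, filling out forms, extracting values), which are not sketched here because the slides do not describe how they work.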
Contribution • Provides a way to automatically integrate online biological data from different sources • Provides an approach that finds suitable online resources, fills out their forms, and extracts data based on the user's query