Ontology Generation Based on a User-Specified Ontology Seed. Cui Tao, Data Extraction Research Group, Department of Computer Science, Brigham Young University. Supported by NSF.
Introduction • Motivation: • Traditional search engines return documents • Ontology-based data extraction returns information • Problem: • Build extraction ontologies that meet users' needs • Goal: • Automatically build ontologies for users' needs www.deg.byu.edu
Example • A biologist is interested in information about large human proteins and their functions • Possible queries: • Find human proteins larger than 20 kDa • Find all human proteins that serve as receptors • ... • Information sources: various online databases • NCBI • Gene Cards • The Gene Ontology • GPM Proteomics Database • …
Extraction Ontology: Molecular Weight • Regular expression: ^\d{1,5}(\.\d{1,2})? • Unit: kilodaltons?|kdas?|kds?|das?|daltons?
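The value and unit patterns on this slide can be exercised with Python's `re` module. The sketch below is an assumption about how the two patterns might be combined to recognize a molecular weight such as "29175 Daltons"; the helper names `recognize_molecular_weight` and `to_kilodaltons` are hypothetical, and the unit pattern reads the garbled `kds|?das?` as `kds?|das?`.

```python
import re

# Value pattern from the slide: 1-5 integer digits, optional 1-2 decimal digits.
# fullmatch() below anchors both ends, so the leading "^" is redundant but harmless.
MW_VALUE = re.compile(r"^\d{1,5}(\.\d{1,2})?")

# Unit pattern from the slide, reading the garbled "kds|?das?" as "kds?|das?".
MW_UNIT = re.compile(r"kilodaltons?|kdas?|kds?|das?|daltons?", re.IGNORECASE)


def recognize_molecular_weight(text: str):
    """Return (value, unit) if text is '<number> <unit>', else None (hypothetical helper)."""
    parts = text.split()
    if len(parts) != 2:
        return None
    number, unit = parts
    if MW_VALUE.fullmatch(number) is None or MW_UNIT.fullmatch(unit) is None:
        return None
    return float(number), unit


def to_kilodaltons(value: float, unit: str) -> float:
    """Normalize a recognized weight to kDa (assumes every 'k...' unit means kilodaltons)."""
    return value if unit.lower().startswith("k") else value / 1000.0
```

With the sample value from the filled-in form, `recognize_molecular_weight("29175 Daltons")` returns `(29175.0, 'Daltons')`, and `to_kilodaltons` turns that into about 29.175 kDa, the figure a query such as "proteins larger than 20 kDa" would compare against. A six-digit value like "123456 Da" is rejected because the value pattern allows at most five integer digits.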
User Interface: Select a title for the forms
User Interface: Binary Relationship [Figure: Protein form with a Name field]
User Interface: Binary Relationship [Figure: Protein form with Name and Molecular Weight fields]
User Interface: N-ary Relationship [Figure: Chromosome location subform with Chromosome number, Start, End, and Orientation fields]
User Interface: N-ary Relationship [Figure: GO subform with GO ID and GO term fields; label: GO phrase]
Overall Form [Figure: Protein form with Name, Molecular Weight, Chromosome location (Chromosome number, Start, End, Orientation), and GO (GO ID, GO term)]
Ontology View [Figure: ontology diagram with object sets Protein, Name, Molecular weight, Chromosome location, Chromosome number, Start, End, Orientation, GO, GO ID, and GO phrase]
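The slides show the nested form turning into an ontology view: simple fields of the Protein form appear as binary relationships (for example, Protein and Name), while grouped fields such as Chromosome location appear as n-ary relationships. One plausible rule for that derivation is sketched below; the `Form` class and `relationship_sets` function are hypothetical and are not taken from the source. Each field of the top-level form is linked to it by a binary relationship, and each subform becomes one n-ary relationship over the subform and its fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Form:
    """A form or subform in a user-built ontology seed (hypothetical model)."""
    title: str
    fields: list[Form] = field(default_factory=list)

    def is_atomic(self) -> bool:
        return not self.fields


def relationship_sets(form: Form, top_level: bool = True) -> list[tuple[str, ...]]:
    """Derive relationship sets from a nested form under an assumed rule:
    top-level fields -> binary relationships with the form;
    each subform -> one n-ary relationship over the subform and its fields.
    """
    rels: list[tuple[str, ...]] = []
    if top_level:
        rels.extend((form.title, f.title) for f in form.fields)
    else:
        rels.append((form.title, *(f.title for f in form.fields)))
    # Recurse into subforms; atomic fields need no further relationships.
    for f in form.fields:
        if not f.is_atomic():
            rels.extend(relationship_sets(f, top_level=False))
    return rels


# The Protein seed assembled on the "Overall Form" slide.
protein = Form("Protein", [
    Form("Name"),
    Form("Molecular Weight"),
    Form("Chromosome location",
         [Form("Chromosome number"), Form("Start"), Form("End"), Form("Orientation")]),
    Form("GO", [Form("GO ID"), Form("GO term")]),
])
```

For the Protein seed this yields four binary relationships from Protein, a 5-ary Chromosome location relationship, and a ternary GO relationship. The actual generator's rule, and its treatment of labels such as GO phrase, may differ.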
Fill in the Form [Figure: Protein form with Name, Molecular Weight, Chromosome location (Chromosome number, Start, End, Orientation), and GO (GO ID, GO term)]
Fill in the Form [Figure: the Protein form filled in with sample values] • Name: 14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E • Molecular Weight: 29175 Daltons • Chromosome location: Chromosome number 17, Start 1,194,558, End 1,250,267, Orientation minus • GO: GO:0019899 (enzyme binding); GO:0019904 (protein domain specific binding)
Mapping [Figure: sample Name values: 14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E]
Mapping [Figure: Name]
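One simple way to map a form field to a source database, assumed here for illustration rather than taken from the slides, is to compare the user's sample values with the values in each column of a source table and rank the columns by overlap. The function `rank_columns`, the column names, and the table contents below are hypothetical; the sample names are the ones entered in the form.

```python
def rank_columns(samples: list[str], columns: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank source columns by the fraction of sample values they contain.

    Comparison is case-insensitive and ignores surrounding whitespace.
    Columns that contain none of the samples are dropped.
    """
    wanted = {s.strip().lower() for s in samples if s.strip()}
    if not wanted:
        return []
    scores = []
    for name, values in columns.items():
        have = {v.strip().lower() for v in values}
        score = len(wanted & have) / len(wanted)
        if score > 0:
            scores.append((name, score))
    # Highest overlap first; ties broken by column name for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


names = [
    "14-3-3 protein epsilon",
    "Mitochondrial import stimulation factor L subunit",
    "Protein kinase C inhibitor protein-1",
    "KCIP-1",
    "14-3-3E",
]
# Hypothetical source table: column name -> values found in that column.
source = {
    "aliases": ["KCIP-1", "14-3-3E"],
    "full_name": ["14-3-3 protein epsilon"],
    "chromosome": ["17"],
}
```

Here `rank_columns(names, source)` returns `[('aliases', 0.4), ('full_name', 0.2)]`, which suggests the Name concept may be spread over more than one source column. A real mapper would also need to handle partial string matches and source-specific formatting.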
Data Frame Generation • Choose from the data frame library • Data frames for basic values • Numbers within different ranges • Integers, floats, etc. • Emails, phone numbers, addresses, etc. • Domain-specific values (e.g., DNA sequences) • Units • Build lexicon files
Data Frame Generation • Find the best-matching data frame in the library • Find the correct units
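The two steps on this slide can be sketched as a search over an ordered library of patterns: the first data frame whose pattern fully matches every sample value is taken as the best match, and the unit token is matched against unit patterns the same way. The library contents and the helper names `best_frame` and `split_value_unit` are hypothetical; the slides list only the kinds of data frames (integers, floats, emails, units, and so on).

```python
import re

# Hypothetical data frame library, ordered from most to least specific so that
# the first frame matching every sample is also the tightest fit.
VALUE_FRAMES = [
    ("integer_with_commas", re.compile(r"\d{1,3}(,\d{3})+")),
    ("integer", re.compile(r"\d+")),
    ("float", re.compile(r"\d+\.\d+")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
]

# Hypothetical unit library; the dalton pattern follows the Extraction Ontology slide.
UNIT_FRAMES = [
    ("dalton", re.compile(r"kilodaltons?|kdas?|kds?|das?|daltons?", re.IGNORECASE)),
]


def best_frame(samples, frames):
    """Return the name of the first frame that fully matches every sample, or None."""
    for name, pattern in frames:
        if samples and all(pattern.fullmatch(s) for s in samples):
            return name
    return None


def split_value_unit(sample: str):
    """Split '29175 Daltons' into ('29175', 'Daltons'); the unit is '' if absent."""
    value, _, unit = sample.strip().partition(" ")
    return value, unit.strip()
```

Ordering the library does the real work here: "17" matches only the plain integer frame, while "1,194,558" and "1,250,267" match the comma-grouped frame first, so the Start and End fields get a tighter frame than Chromosome number. Splitting "29175 Daltons" gives an integer value and a unit that matches the dalton frame.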
Build Lexicon Files [Figure: Name]
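A lexicon file can be thought of as a list of known strings for a concept such as Name. A minimal sketch, assuming the entries come from the user's sample values (and, in practice, from the mapped source column): deduplicate them case-insensitively, render one entry per line, and compile a case-insensitive pattern that tries longer entries first. The functions `build_lexicon`, `lexicon_file_text`, and `lexicon_pattern` are hypothetical.

```python
import re


def build_lexicon(values):
    """Deduplicate case-insensitively (keeping the first spelling), longest entries first."""
    seen = {}
    for v in values:
        v = v.strip()
        if v and v.lower() not in seen:
            seen[v.lower()] = v
    return sorted(seen.values(), key=lambda s: (-len(s), s))


def lexicon_file_text(lexicon):
    """Render the lexicon as file contents: one entry per line."""
    return "".join(entry + "\n" for entry in lexicon)


def lexicon_pattern(lexicon):
    """Compile an alternation that tries longer entries before shorter ones."""
    if not lexicon:
        raise ValueError("an empty lexicon would match every position")
    return re.compile("|".join(re.escape(e) for e in lexicon), re.IGNORECASE)


lexicon = build_lexicon([
    "14-3-3 protein epsilon",
    "Mitochondrial import stimulation factor L subunit",
    "Protein kinase C inhibitor protein-1",
    "KCIP-1",
    "14-3-3E",
    "kcip-1",  # duplicate of KCIP-1 in a different case
])
```

Sorting longest-first matters because regex alternation takes the first alternative that matches: if a short entry were tried before a longer one that shares its prefix, the longer name could never be recognized in full.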
Contribution • Automatically generates ontologies based on users' requests • Provides a tool that lets users easily specify ontology seeds • Automatically generates ontology views from ontology seeds • Automatically maps ontology concepts to source databases