Infrastructure for Peer-Based Knowledge Sharing • Peter Mork • University of Washington, Seattle • 21-Sep-14
Motivating Example • Microarray Experiment • Information from public databases • ?? • ICAT Experiment
Outline • Integration Systems • From Data to Knowledge (Metadata) • Metadata Management • From Local to Peer • Evaluation • Declarative vs. Descriptive Mappings • Complete vs. Minimal Configurations • Conclusions
Overview of Integration Systems • + Schema • + Mappings • + Annotations • Source API
Sources: OMIM, HUGO, Swiss-Prot, GO, GeneClinics, LocusLink, Entrez, GEO • Mediated Schema: Entity, Sequenceable Entity, Structured Vocabulary, Experiment, Phenotype, Gene, Nucleotide Sequence, Microarray Experiment, Protein
BioMediator • Phenotype • Maintenance: Push, Limited Journal Pull; Validation: Internal; Creation: Human • Maintenance: Push, Yearly Expert Review; Validation: External; Creation: Human • Maintenance: Push; Validation: None; Creation: Human, Algorithm • Sources: OMIM, GeneClinics, Entrez
Demo • Start with 6 Proteins and 6 Sequences • Find simple correspondences • Find biologically relevant clusters
Necessary Metadata • Class Hierarchy • Concepts (e.g., Protein, Gene) • Property Hierarchy • Relationships (e.g., codes-for, causes) • Mappings • Source schema to mediated schema • Mapping Annotations • Information about maintenance and authority
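The four kinds of metadata listed above could be held in a structure like the one below. This is a minimal sketch, not BioMediator's actual representation: the names `Mapping` and `Metadata`, their fields, the parent/child links between classes, and the use of `AssociatedWith` as a parent property are all illustrative assumptions. Only the concept, property, and source names come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Links one source-schema element to one mediated-schema class (hypothetical)."""
    source: str           # e.g. "OMIM"
    source_element: str   # element in the source's own schema
    mediated_class: str   # class in the mediated schema
    annotations: dict = field(default_factory=dict)  # maintenance / authority info


@dataclass
class Metadata:
    """Container for the four kinds of metadata (illustrative only)."""
    class_parent: dict = field(default_factory=dict)     # class -> parent class
    property_parent: dict = field(default_factory=dict)  # property -> parent property
    mappings: list = field(default_factory=list)

    def ancestors(self, cls):
        """Walk up the class hierarchy from cls to the root."""
        chain = []
        while cls in self.class_parent:
            cls = self.class_parent[cls]
            chain.append(cls)
        return chain

    def sources_for(self, cls):
        """Sources that have a direct mapping onto cls."""
        return sorted({m.source for m in self.mappings if m.mediated_class == cls})


# Assumed hierarchy: the slides name these classes but not their exact parentage.
meta = Metadata(
    class_parent={
        "Gene": "Sequenceable Entity",
        "Protein": "Sequenceable Entity",
        "Sequenceable Entity": "Entity",
        "Phenotype": "Entity",
    },
    property_parent={"codes-for": "AssociatedWith", "causes": "AssociatedWith"},
    mappings=[
        Mapping("OMIM", "Record", "Phenotype", {"maintenance": "Push"}),
        Mapping("Entrez", "Protein", "Protein", {"maintenance": "Push"}),
    ],
)

print(meta.ancestors("Gene"))
print(meta.sources_for("Protein"))
```

Keeping annotations on the mapping rather than on the source lets the same source carry different maintenance or authority information for different mediated classes, which is one plausible reading of "Mapping Annotations".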
Schema 1 • Schema 2 • Schema 3 • Classes: Entity, Sequenceable Entity, Structured Vocabulary, Experiment, Phenotype, Gene, Nucleotide Sequence, Microarray Experiment, Protein • Sources: OMIM, HUGO, Swiss-Prot, GO, GeneClinics, LocusLink, Entrez, GEO
Centralized Metadata Mgmt • Classes: Entity, Sequenceable Entity, Phenotype, Gene, Nucleotide Sequence, Protein • Sources: GeneClinics, OMIM, Entrez, LocusLink
Declarative Peer Metadata Mgmt • GeneClinics: Phenotype, Gene, Protein • OMIM: Record • Q3 • Q2 • Gene Record • Entrez: Protein, Nucleotide Seq. • LocusLink: Phenotype, Gene, Protein • Equivalent • Q1
Superset • Descriptive Peer Metadata Mgmt • OMIM_Record = Phenotype ⊔ Gene • Domain(AssociatedWith) = NucleotideSequence ⊔ Gene ⊓
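One way to read a descriptive mapping such as OMIM_Record = Phenotype ⊔ Gene is extensionally: the instances of the left-hand class are exactly the union of the instances of the right-hand classes. The sketch below illustrates only that reading. It is not an OWL reasoner, the instance identifiers are made up, and the helper names `union_class` and `is_subclass` are hypothetical.

```python
# Made-up instance identifiers, for illustration only.
phenotype = {"p1", "p2"}
gene = {"g1"}


def union_class(*extensions):
    """Extension of a class defined as the union (⊔) of other classes."""
    result = set()
    for ext in extensions:
        result |= ext
    return result


def is_subclass(sub, sup):
    """Under the extensional reading, sub is a subclass of sup if every
    instance of sub is also an instance of sup."""
    return sub <= sup


# OMIM_Record = Phenotype ⊔ Gene
omim_record = union_class(phenotype, gene)

print(sorted(omim_record))
print(is_subclass(gene, omim_record))       # every Gene is an OMIM_Record
print(is_subclass(omim_record, phenotype))  # but not every OMIM_Record is a Phenotype
```

Under this reading, a query for OMIM records can be answered from any peer that exposes Phenotype or Gene instances, without a mapping written for that specific pair of peers.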
Experimental Setup • Centralized BioMediator = Gold Standard • Mapping Languages • PPL: Declarative • OWL: Descriptive • Peer Architectures • Complete • Minimal
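The slides do not define the two peer architectures, so the sketch below rests on an assumption: "Complete" is taken to mean a mapping between every pair of peers, and "Minimal" to mean just enough mappings to keep all peers connected (here, a simple chain). Mappings are also assumed to be usable in both directions. The function names are hypothetical; the peer names come from the earlier slides.

```python
from collections import deque
from itertools import combinations

peers = ["GeneClinics", "OMIM", "Entrez", "LocusLink"]


def complete_config(peers):
    """Assumed 'Complete' architecture: one mapping per pair of peers."""
    return list(combinations(peers, 2))


def minimal_config(peers):
    """Assumed 'Minimal' architecture: a chain that keeps the peers connected."""
    return list(zip(peers, peers[1:]))


def reachable(start, mappings):
    """Peers reachable from start by following mappings transitively."""
    neighbours = {}
    for a, b in mappings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


for name, config in [("complete", complete_config(peers)),
                     ("minimal", minimal_config(peers))]:
    print(name, len(config), sorted(reachable("GeneClinics", config)))
```

With n peers, the assumed complete configuration needs n(n-1)/2 mappings and the assumed minimal one needs n-1, yet both reach every peer when mappings compose transitively.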
Conclusions • More sources accessible • More power per mapping • Additional ‘redundant’ mappings provide little benefit • Less work maintaining mappings • Hidden cost: Logical mappings harder to write correctly • May interact in unforeseen ways
Acknowledgements • Funding • NLM training grant T15LM07442 • NHGRI grant R01HG02288 • BioMediator Team • Advisors • Alon Halevy • Peter Tarczy-Hornoch • Wendy Kramer (grant administrator)