
Existing Life Science Links




Labeling and Enhancing Life Science Links

S. Heymann*, F. Naumann*, L. Raschid+, P. Rieger*
* Humboldt Universität zu Berlin
+ University of Maryland

[Figure: Existing Links: No Link Labels. Nodes: OMIM, Locuslink, UniProt, BRENDA, LCOMPOUND.]
[Figure: Links Enhanced with Link Labels. Nodes: OMIM, Locuslink, UniProt, BRENDA, LCOMPOUND. Labels: Has disease-related mutations; Is translated to; Is a substrate; Is an enzyme.]

Existing Life Science Links

An abundance of Web-accessible life sciences data sources contain data about scientific entities such as genes, sequences, proteins and citations. The sources are diverse in content and computational capability, they are richly interconnected, and they have varying levels of overlap. The scientific exploration of relationships between objects involves the traversal of links and paths (concatenations of links).

[Figure: Four NCBI data sources (red arrows) as nodes in a steadily growing convolute of coarsely cross-referenced primary and secondary life science data compilations (1).]

Interactive Navigation Aid: GeneViator Tool - available upon request (2).

Mapping from logical classes/categories to physical Web accessible collections

Primary repository of sequences >> GenBank, EMBL, DDBJ
Annotated genome data >> ENSEMBL
Hand curated protein sequences >> UniProt (= SwissProt + PIR)
Hand curated hereditary diseases >> OMIM
Frames of reference >> GO, Taxonomy, HSAGENES
... >> ...

Why Enrich Links?

Existing links are poor with respect to both syntax and semantics. They are syntactically poor because the origin and the target of a link are specified only at the level of the database entry or object. They are semantically poor because they carry no explicit meaning.

• Links are added for various reasons:
    • A link may represent the result of an experimental protocol to test a hypothesis.
    • Data curators may add links following domain-specific conventions.
    • A link may have been predicted by some software.
• Biologists can usually infer the meaning of a link, but search engines and mediators cannot.

Current links cannot capture or differentiate these desirable properties.

How to Enrich Links

enh-links: links enhanced with Label, Origin of Link and Target of Link.

We propose to enrich the current link implementation so as to support more meaningful queries over enh-links. Enrichment should include semantic label descriptors (matching an appropriate ontology) and a more precise identification of the link's source and target elements (within a data entry). One can then traverse paths and compare them in a way that is meaningful to the biologist.

Model and query language for Labeled Life Science Links

• A data model to capture enriched link semantics will include:
    • LT: A set of link types
    • LS: A set of published links implemented in the sources
    • LL: Pairs of link types that represent a meaningful link concatenation
    • LE: Link and path equivalencies
• Tools that support semi-automatic annotation and enrichment of existing links.
• A query language for a scientist to exploit LT, LS and LL in expressing navigational queries.
• A scientist-friendly interface:
    • To specify properties LS, LL and LE.
    • To rank the paths that satisfy some query.

References

(1) T. Etzold, A. Ulyanov, P. Argos: SRS: Information Retrieval System for Molecular Biology Data Banks. Methods in Enzymology 266: 114-128, 1996.
(2) S. Heymann, K. Tham, A. Kilian, G. Wegner, P. Rieger, D. Merkel, J.C. Freytag: Viator - A Tool Family for Graphical Networking and Data View Creation. 28th International Conference on Very Large Data Bases, Hong Kong, 2002, Proceedings, pp. 1067-1070.

Acknowledgements

This research is partially supported by NSF Grants IIS0219909 and EIA0130422 (LR), and by DFG Grant FR1142/1-3 (SH).
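
Illustrative sketch of the enriched link model

The data model outlined above (LT, LS, LL) could be prototyped roughly as follows. This is a minimal sketch, not the authors' implementation: the class names Element and EnhLink, the helper functions, the entry identifiers (GENE_1, PROT_1, ...), the field names and the direction of each labeled link are all assumptions made for illustration. Only the data source names and the four link labels come from the poster. LE (link and path equivalencies) is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Element:
    source: str  # data source, e.g. "UniProt"
    entry: str   # entry identifier (hypothetical placeholder IDs below)
    field: str   # element inside the entry where the link starts or ends


@dataclass(frozen=True)
class EnhLink:
    label: str       # link type; must be a member of LT
    origin: Element  # origin of link
    target: Element  # target of link


def check_labels(ls: List[EnhLink], lt: Set[str]) -> None:
    """Raise ValueError if a published link uses a label that is not in LT."""
    for link in ls:
        if link.label not in lt:
            raise ValueError(f"unknown link type: {link.label}")


def meaningful_paths(
    ls: List[EnhLink],
    ll: Set[Tuple[str, str]],
    start: Tuple[str, str],
    max_len: int = 3,
) -> Iterator[List[EnhLink]]:
    """Yield paths from the entry `start` = (source, entry) in which every
    pair of consecutive link labels is listed in LL."""

    def extend(path: List[EnhLink], node: Tuple[str, str]):
        if path:
            yield path
        if len(path) >= max_len:
            return
        for link in ls:
            if (link.origin.source, link.origin.entry) != node:
                continue  # link does not start at the current entry
            if path and (path[-1].label, link.label) not in ll:
                continue  # concatenation is not declared meaningful
            if link in path:
                continue  # do not reuse a link within one path
            yield from extend(path + [link], (link.target.source, link.target.entry))

    yield from extend([], start)


# LT: link types taken from the labeled figure (identifier spelling is ours).
LT = {"has_disease_related_mutations", "is_translated_to",
      "is_an_enzyme", "is_a_substrate"}

# LS: example links; IDs, fields and directions are assumed for illustration.
LS = [
    EnhLink("is_translated_to",
            Element("Locuslink", "GENE_1", "gene"),
            Element("UniProt", "PROT_1", "sequence")),
    EnhLink("is_an_enzyme",
            Element("UniProt", "PROT_1", "sequence"),
            Element("BRENDA", "ENZ_1", "enzyme")),
    EnhLink("has_disease_related_mutations",
            Element("Locuslink", "GENE_1", "gene"),
            Element("OMIM", "DIS_1", "variants")),
    EnhLink("is_a_substrate",
            Element("LCOMPOUND", "CPD_1", "compound"),
            Element("BRENDA", "ENZ_1", "substrate")),
]

# LL: one assumed meaningful concatenation (gene -> protein -> enzyme).
LL = {("is_translated_to", "is_an_enzyme")}

check_labels(LS, LT)
labels = [[link.label for link in path]
          for path in meaningful_paths(LS, LL, ("Locuslink", "GENE_1"))]
print(labels)
```

Because each link records its label and the element it originates from, a navigational query can reject concatenations that are not listed in LL, which is not possible with unlabeled entry-level links.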
