A flexible graph-based controlled vocabulary engine
Johann Visagie <johann@egenetics.com>
Background
• Implementation of a controlled vocabulary engine
• Basis for a more complex profiling system that will aid in the identification of disease gene candidates by integrating:
  • transcript information
  • standardised controlled vocabulary of expression terms
  • genomic sequence
  • genetic mapping information
Structure of the Vocabulary
• Orthogonal set of hierarchical schemas (trees)
• Each schema describes an expression domain, e.g.:
  • Anatomical site, Pathology, Development Stage, Cell Type
• A tree's nodes are associated with terms describing expression states in that tree's domain
• Mapped 6937 cDNA libraries (incl. dbEST, SAGE), each to one or more nodes in as many trees as possible
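The structure described on this slide can be pictured with a small sketch. It is illustrative only and assumes a hypothetical `Node` class, made-up terms, and made-up library IDs (`lib_001` and so on); none of these names come from the engine itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One term in a schema tree (hypothetical structure, for illustration)."""
    term: str
    children: list[Node] = field(default_factory=list)
    libraries: set[str] = field(default_factory=set)  # cDNA library IDs mapped here

    def add(self, child: Node) -> Node:
        """Attach `child` below this node and return it for chaining."""
        self.children.append(child)
        return child


# Two orthogonal schemas, each a separate tree with its own root.
anatomy = Node("anatomical site")
digestive = anatomy.add(Node("digestive system"))
liver = digestive.add(Node("liver"))
stomach = digestive.add(Node("stomach"))

pathology = Node("pathology")
cancer = pathology.add(Node("cancer"))
normal = pathology.add(Node("normal"))


def annotate(library_id: str, *nodes: Node) -> None:
    """Map one library to nodes in as many trees as annotation allows."""
    for node in nodes:
        node.libraries.add(library_id)


annotate("lib_001", liver, cancer)    # annotated in both schemas
annotate("lib_002", stomach, normal)
annotate("lib_003", liver)            # no pathology annotation available
```

Because the schemas are independent trees, a library can be placed in one without constraining where it sits in another.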
Graph-based implementation
• 2nd iteration
• Python modules implementing hierarchical data structures, based on generalised graph library
• Flexible enough for future experimentation (different data structures, multiple relationship types, etc.)
• All operations in-memory
• Overcomes most limitations of prior implementation
  • Forced unique terms, limited to pure trees, speed issues, database-centricity
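To make the contrast with a pure-tree design concrete, here is a minimal sketch of an in-memory directed graph with typed edges. It is not the graph library the slides refer to: the `Graph` class, the relation names `parent_of` and `adjacent_to`, and the example terms are all assumptions chosen for illustration.

```python
from collections import defaultdict


class Graph:
    """Minimal in-memory directed graph with typed edges (illustrative only)."""

    def __init__(self):
        self.labels = {}               # node id -> term; terms need not be unique
        self.edges = defaultdict(set)  # (source id, relation) -> set of target ids

    def add_node(self, node_id, term):
        self.labels[node_id] = term

    def add_edge(self, source, target, relation="parent_of"):
        self.edges[(source, relation)].add(target)

    def descendants(self, start, relation="parent_of"):
        """Return every node reachable from `start` along `relation` edges."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for nxt in self.edges.get((current, relation), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


g = Graph()
g.add_node(1, "digestive system")
g.add_node(2, "liver")
g.add_node(3, "endocrine system")
g.add_node(4, "pancreas")
g.add_node(5, "epithelium")  # same term as node 6, but a distinct node
g.add_node(6, "epithelium")

g.add_edge(1, 2)
g.add_edge(1, 4)
g.add_edge(3, 4)             # pancreas has two parents, so this is not a pure tree
g.add_edge(2, 5)
g.add_edge(4, 6)
g.add_edge(2, 4, relation="adjacent_to")  # a second, hypothetical relationship type
```

Identifying nodes by ID rather than by term is what lets the same term appear in several places, and keying edges by relation keeps each relationship type separately traversable.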
Query language
• Parser for a simplistic Boolean query language:
  • pathology:cancer AND (anatomy:liver OR anatomy:stomach)
• Implicit "query sets"
• Tool for the power user
• Each query term resolves to set of nodes in a tree (the node matching the term, and all its children), which maps to set of cDNA libraries
• Note: Multiple orthogonal classification domains allow for construction of almost arbitrary query resolution
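The resolution step described on this slide can be sketched as a small recursive-descent evaluator that turns each `schema:term` into a set of libraries and combines the sets with intersection and union. This is a sketch under stated assumptions, not the engine's parser: the toy data, the names `resolve` and `evaluate`, and the choice that AND binds tighter than OR are all hypothetical.

```python
import re

# Hypothetical toy data: per schema, child -> parent links and term -> libraries.
PARENT = {
    "anatomy": {"liver": "digestive_system", "stomach": "digestive_system"},
    "pathology": {"carcinoma": "cancer"},
}
LIBRARIES = {
    "anatomy": {"liver": {"lib1", "lib2"}, "stomach": {"lib3"},
                "digestive_system": {"lib4"}},
    "pathology": {"cancer": {"lib4"}, "carcinoma": {"lib1", "lib3"}},
}


def resolve(schema, term):
    """Libraries mapped to `term` or to any node below it in `schema`."""
    def under(t):
        while t is not None:           # walk up towards the root
            if t == term:
                return True
            t = PARENT[schema].get(t)
        return False
    return {lib for t, libs in LIBRARIES[schema].items() if under(t) for lib in libs}


def evaluate(query):
    """Evaluate a Boolean query to a set of library IDs."""
    tokens = re.findall(r"\(|\)|\w+:\w+|AND|OR", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():                    # lowest precedence: set union
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():                   # assumed to bind tighter: set intersection
        result = parse_atom()
        while peek() == "AND":
            take()
            result = result & parse_atom()
        return result

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        if tok is None or ":" not in tok:
            raise ValueError(f"expected schema:term, got {tok!r}")
        schema, term = tok.split(":", 1)
        return resolve(schema, term)

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


hits = evaluate("pathology:cancer AND (anatomy:liver OR anatomy:stomach)")
```

With this toy data, `pathology:cancer` also picks up libraries mapped to `carcinoma`, since a term resolves to its node and everything below it.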
Interfaces
• Python API
• Under development:
  • SOAP v1.1
  • DAS v1.5 (under investigation)
  • wxPython-based GUI
    • Curation
    • Query interface for users
Application
• Proved its worth in a number of SANBI research projects
• Components of controlled vocabulary system are in use by a number of groups
Acknowledgements
• Soraya Bardien-Kruger
• Alan Christoffels
• Tania Hide
• Winston Hide
• Paul Hüsler
• Janet Kelso
• Damian Smedley