PathCase: A Web-Based Exploratory Querying and Visualization Tool for Biological Pathways
Z. Meral Özsoyoğlu, Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio 44106
“Digital” Biology
Biology and the life sciences have become increasingly “data rich” over the past decade. The rapid growth of biological data (distributed and heterogeneous) is due to:
• investments in public and private resources,
• significant advances in data generation, storage, analysis, web-based availability, and sharing technologies,
• emerging large-scale biological data-gathering technologies.
More growth in amount and diversity
Huge investments have gone into developing large biological information resources and assembling this information in public databases. Many such resources and tools are available: NCBI’s GenBank, PubMed, and BLAST; MGI’s tools and databases; Physiome; etc. Continued explosive growth in the amount and diversity of biological and biochemical data is expected in the next century.
Medical informatics and physiological data are also very large, diverse, non-standard, and distributed.
Biological Data Challenges
In addition to being large, diverse, and distributed, biological data has three important characteristics:
• complexity,
• heterogeneity, and
• evolution of both the data and the schema.
Biological Data is Complex
It is very rich in metadata, requiring metadata management techniques. It is large, temporal, and historical, requiring special knowledge warehouse design and management techniques. It has inherently deeply nested hierarchical structures (e.g., ontologies), or is best modeled as graph structures at the conceptual level (e.g., metabolic pathways or signaling pathways).
Biological Data is Heterogeneous
It involves a wide array of data types, including text, images, and sequence data, as well as streaming data (e.g., medical sensor data), temporal data, and incomplete or missing data. It also comes from heterogeneous sources in heterogeneous formats.
Biological Data is very Dynamic
Existing data management techniques handle dynamic data content effectively. Dynamic schema evolution, however, poses challenges for data management applications: software tools are built on the schema, so they must be updated and changed as the schema evolves.
Research
Using off-the-shelf data management software tools alone will not be sufficient for the data management needs of “digital biology”. Both integration of the existing technologies for biological data and development of new data management techniques are needed.
NIH BISTI workshop on Digital Biology: http://www.bisti.nih.gov/2003meeting/
PathCase: Case Pathways DataBase System
An integrated software tool for
• storing,
• visualizing,
• querying, and
• analyzing
biological pathways at different levels of genetic, molecular, biochemical, and organismal detail.
http://nashua.case.edu/pathways
Data Model
• Graph-structured database (hypergraph):
  - nodes: substrates and products,
  - hyperedges: processes (reactions).
• Represented using a relational database.
• Querying and visualization are based on the graph conceptual view.
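The data model above can be sketched as a hypergraph stored in relational tables: each process (hyperedge) is linked to its substrate and product nodes through an incidence table. This is a minimal illustration using Python's built-in `sqlite3`; the table and column names (`molecular_entity`, `process`, `process_entity`, `role`) and the helpers `build_db` and `hyperedge` are hypothetical and are not taken from PathCase's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration; not PathCase's actual tables.
SCHEMA = """
CREATE TABLE molecular_entity (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE process (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Incidence table: one row per (process, entity, role); the rows sharing a
-- process_id together form one hyperedge.
CREATE TABLE process_entity (
    process_id INTEGER NOT NULL REFERENCES process(id),
    entity_id  INTEGER NOT NULL REFERENCES molecular_entity(id),
    role       TEXT NOT NULL CHECK (role IN ('substrate', 'product')),
    PRIMARY KEY (process_id, entity_id, role)
);
"""


def build_db(reactions):
    """Load {process_name: (substrates, products)} into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for proc, (subs, prods) in reactions.items():
        pid = conn.execute("INSERT INTO process(name) VALUES (?)", (proc,)).lastrowid
        for role, names in (("substrate", subs), ("product", prods)):
            for name in names:
                # Nodes are shared: insert each entity once, then look up its id.
                conn.execute(
                    "INSERT OR IGNORE INTO molecular_entity(name) VALUES (?)", (name,))
                (eid,) = conn.execute(
                    "SELECT id FROM molecular_entity WHERE name = ?", (name,)).fetchone()
                conn.execute("INSERT INTO process_entity VALUES (?, ?, ?)", (pid, eid, role))
    conn.commit()
    return conn


def hyperedge(conn, process_name):
    """Return (substrates, products) of one process, each sorted by name."""
    rows = conn.execute(
        """SELECT pe.role, me.name
           FROM process p
           JOIN process_entity pe ON pe.process_id = p.id
           JOIN molecular_entity me ON me.id = pe.entity_id
           WHERE p.name = ?
           ORDER BY me.name""",
        (process_name,),
    ).fetchall()
    return ([n for r, n in rows if r == "substrate"],
            [n for r, n in rows if r == "product"])


# Illustrative data: two glycolysis steps.
conn = build_db({
    "hexokinase": (["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),
    "phosphoglucose isomerase": (["glucose 6-phosphate"], ["fructose 6-phosphate"]),
})
print(hyperedge(conn, "hexokinase"))
# (['ATP', 'glucose'], ['ADP', 'glucose 6-phosphate'])
```

The incidence table is what lets an ordinary relational database hold hyperedges: a reaction with any number of substrates and products is simply a set of rows sharing one `process_id`, and a shared entity such as glucose 6-phosphate is stored only once.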
Other systems and resources
Reactome, KEGG, Pathway Tools, PATIKA, Cytoscape, BioCarta, and others.
Data Model
• Pathway: an interconnected arrangement of processes (representing the functional roles of genes in the genome).
• Process: a reaction (or step) in a pathway involving one genetically unique gene product. (In this perspective, the substrates, products, cofactors, inhibitors, and activators of a reaction are all molecular entities.)
• Molecular Entity: the general name given to any entity participating in a process, such as a basic molecule, protein, enzyme, gene, or amino acid.
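As a rough sketch, the three concepts of pathway, process, and molecular entity could map onto Python classes as follows. The class names, the `Role` enum, the `kind` field, and the helper methods are illustrative assumptions rather than PathCase's code; participants are stored as (entity, role) pairs so that one entity may play more than one role in the same process.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Roles a molecular entity can play in a process."""
    SUBSTRATE = "substrate"
    PRODUCT = "product"
    COFACTOR = "cofactor"
    INHIBITOR = "inhibitor"
    ACTIVATOR = "activator"


@dataclass(frozen=True)  # frozen makes instances hashable, so they can go in sets
class MolecularEntity:
    name: str
    kind: str = "basic molecule"  # e.g. "protein", "enzyme", "gene", "amino acid"


@dataclass
class Process:
    name: str
    gene_product: MolecularEntity  # the one genetically unique gene product
    # (entity, role) pairs: a list rather than a dict, so one entity may hold several roles
    participants: list[tuple[MolecularEntity, Role]] = field(default_factory=list)

    def with_role(self, role: Role) -> set[MolecularEntity]:
        return {e for e, r in self.participants if r is role}


@dataclass
class Pathway:
    name: str
    processes: list[Process] = field(default_factory=list)

    def molecular_entities(self) -> set[MolecularEntity]:
        """All entities in the pathway, including each process's gene product."""
        found: set[MolecularEntity] = set()
        for p in self.processes:
            found.add(p.gene_product)
            found.update(e for e, _ in p.participants)
        return found


# Illustrative data: one glycolysis step.
hk = MolecularEntity("hexokinase", "enzyme")
glc, atp, g6p, adp, mg = (MolecularEntity(n) for n in
                          ["glucose", "ATP", "glucose 6-phosphate", "ADP", "Mg2+"])
step = Process("glucose phosphorylation", hk, [
    (glc, Role.SUBSTRATE), (atp, Role.SUBSTRATE),
    (g6p, Role.PRODUCT), (adp, Role.PRODUCT), (mg, Role.COFACTOR),
])
glycolysis = Pathway("glycolysis", [step])
print(len(glycolysis.molecular_entities()))  # 6
```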
PathCase usage statistics
Hits from 62 countries. (Figure: user statistics.)
Database content
• Metabolic pathways (39):
  - 37 from [Michal, G. Biochemical Pathways, John Wiley & Sons Inc., 1999],
  - 2 (Folate and Homocysteine) for human and mouse, by Joe Nadeau and Toshimori Kitami.
• 876 processes (for different organisms).
• Organisms: human, mouse, animals, prokarya, plants & yeasts, unspecified.
[Figure: Web-based Pathways Query and Visualization sub-system, Server/Client Architecture. Labeled components: Server, Client, Database, Rich Client, Web Service, Windows User, Data Object Classes Interface, SOAP Object Access, XML Interfaces for Access/Edit, Graph Windows, Basic Queries, Control, SQL Queries, Advanced Queries, XML Graph Access, Graph Web Browser, XML Graphing/Layout, Web Site, Graph Applet, Graph Generation, HTML User Interface, HTML Display, HTML Graph Caching, Doc.]
Exploratory Querying and Visualization
• Viewing the whole network of pathways
• Viewing at multiple levels of abstraction
• Querying specific properties of any pathway component at any level of granularity
• Path queries
• Neighborhood queries
• Different forms of queries and query outputs:
  - textual and graphical queries,
  - built-in and parameterized queries,
  - tabular and graphical query outputs,
  - an advanced query interface.
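A neighborhood query can be illustrated with a breadth-first search over the pathway hypergraph: starting from one molecular entity, collect every entity reachable through at most k processes. This is a sketch under stated assumptions: the input format `{process: (substrates, products)}`, the function name `neighborhood`, and the choice to ignore reaction direction are all hypothetical, and the glycolysis steps below are illustrative data, not PathCase content.

```python
from collections import deque


def neighborhood(reactions, start, k):
    """Entities reachable from `start` through at most k processes, with hop counts.

    reactions: {process_name: (substrates, products)}; direction is ignored here.
    """
    # Index each entity to the processes it takes part in.
    touching = {}
    for proc, (subs, prods) in reactions.items():
        for entity in (*subs, *prods):
            touching.setdefault(entity, set()).add(proc)

    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first, so the first visit gives the fewest hops
        entity = queue.popleft()
        if dist[entity] == k:
            continue  # do not expand beyond k hops
        for proc in touching.get(entity, ()):
            subs, prods = reactions[proc]
            for nxt in (*subs, *prods):
                if nxt not in dist:
                    dist[nxt] = dist[entity] + 1
                    queue.append(nxt)
    return dist


# Illustrative data: the first three steps of glycolysis.
REACTIONS = {
    "hexokinase": (["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),
    "phosphoglucose isomerase": (["glucose 6-phosphate"], ["fructose 6-phosphate"]),
    "phosphofructokinase": (["fructose 6-phosphate", "ATP"],
                            ["fructose 1,6-bisphosphate", "ADP"]),
}
print(neighborhood(REACTIONS, "glucose", 1))
# {'glucose': 0, 'ATP': 1, 'glucose 6-phosphate': 1, 'ADP': 1}
```

Hub metabolites such as ATP connect many processes, so neighborhoods can grow quickly as k increases; a practical tool would probably need a way to exclude such entities.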
Calling the query interface to find the paths between two molecular entities.
Query interface for the “Find paths between two molecular entities” query.
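The “Find paths between two molecular entities” query can be approximated by a depth-first enumeration of simple paths in which each step is a process that consumes the current entity as a substrate and produces the next one. The function `find_paths`, its `max_steps` bound, and the input format are assumptions made for this sketch; PathCase's own path semantics may differ.

```python
def find_paths(reactions, source, target, max_steps=5):
    """Enumerate simple directed paths from source to target.

    Each step is a process that consumes the current entity (as a substrate)
    and produces the next one. A path alternates entity, process, entity, ...
    reactions: {process_name: (substrates, products)}.
    """
    # Index: substrate -> [(process, product), ...]
    successors = {}
    for proc, (subs, prods) in reactions.items():
        for s in subs:
            for p in prods:
                successors.setdefault(s, []).append((proc, p))

    paths = []

    def dfs(node, path, seen):
        if node == target:
            paths.append(path)
            return
        if (len(path) - 1) // 2 >= max_steps:  # number of processes used so far
            return
        for proc, nxt in successors.get(node, []):
            if nxt not in seen:  # simple paths only: never revisit an entity
                dfs(nxt, path + [proc, nxt], seen | {nxt})

    dfs(source, [source], {source})
    return paths


# Illustrative data: the first three steps of glycolysis.
REACTIONS = {
    "hexokinase": (["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),
    "phosphoglucose isomerase": (["glucose 6-phosphate"], ["fructose 6-phosphate"]),
    "phosphofructokinase": (["fructose 6-phosphate", "ATP"],
                            ["fructose 1,6-bisphosphate", "ADP"]),
}
for path in find_paths(REACTIONS, "glucose", "fructose 1,6-bisphosphate"):
    print(" -> ".join(path))
```

Because ATP is a substrate of both hexokinase and phosphofructokinase, the same query issued from ATP returns two paths; bounding the length with `max_steps` keeps the enumeration tractable on larger networks.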
Visualizing Pathways
Connected pathways are displayed.
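One plausible way to decide which pathways to draw as connected is to link any two pathways that share a molecular entity. The sketch below assumes that rule, a hypothetical `connected_pathways` function, and simplified illustrative entity sets; it is not PathCase's documented algorithm. The optional `ignore` set shows why hub metabolites are often filtered out: without it, glycolysis and the TCA cycle are linked only through NAD+.

```python
from itertools import combinations


def connected_pathways(pathways, ignore=frozenset()):
    """Link every pair of pathways that shares a molecular entity.

    pathways: {pathway_name: set of entity names}.
    ignore: entities (e.g. hub metabolites) that should not create links.
    Returns {(name_a, name_b): shared entities}, names in sorted order.
    """
    links = {}
    for a, b in combinations(sorted(pathways), 2):
        shared = (pathways[a] & pathways[b]) - ignore
        if shared:
            links[(a, b)] = shared
    return links


# Illustrative, heavily simplified entity sets.
PATHWAYS = {
    "glycolysis": {"glucose", "glucose 6-phosphate", "pyruvate", "ATP", "NAD+"},
    "pentose phosphate pathway": {"glucose 6-phosphate", "ribose 5-phosphate", "NADP+"},
    "TCA cycle": {"acetyl-CoA", "citrate", "NAD+"},
}
print(connected_pathways(PATHWAYS))
# {('TCA cycle', 'glycolysis'): {'NAD+'}, ('glycolysis', 'pentose phosphate pathway'): {'glucose 6-phosphate'}}
print(connected_pathways(PATHWAYS, ignore=frozenset({"ATP", "NAD+"})))
# {('glycolysis', 'pentose phosphate pathway'): {'glucose 6-phosphate'}}
```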