PathCase: A Web-Based Exploratory Querying and Visualization Tool for Biological Pathways
Z. Meral Özsoyoğlu
Case Western Reserve University, Cleveland, Ohio 44106
“Digital” Biology
• Biology and the life sciences have become increasingly “data rich” over the past decade.
• Rapid growth of biological data (distributed and heterogeneous), due to:
  • investments in public and private resources,
  • significant advances in data generation, storage, analysis, web-based availability, and sharing technologies,
  • emerging large-scale biological data gathering technologies.
More growth in amount and diversity
• Huge investments in:
  • developing large biological information resources,
  • assembling this information in public databases.
• Many such resources and tools are available:
  • NCBI’s GenBank, PubMed, BLAST, MGI’s tools and databases, etc.
• Continued explosive growth in the amount and diversity of biological and biochemical data is expected in the next century.
Medical informatics and physiological data
• Also very large, diverse, non-standard, and distributed.
Biological Data Challenges
• In addition to being large, diverse, and distributed, biological data has three important characteristics:
  • complexity,
  • heterogeneity, and
  • evolution of both the data and the schema.
Biological Data is Complex
• Very rich in metadata; requires metadata management techniques.
• Large, temporal, and historical; requires special knowledge warehouse design and management techniques.
• Has inherently deeply nested hierarchical structures (e.g., ontologies), or is best modeled as graph structures at the conceptual level (e.g., metabolic or signaling pathways).
Biological Data is Heterogeneous
• It involves a wide array of data types, including text, image, and sequence data, as well as streaming data (e.g., medical sensor data), temporal data, and incomplete and missing data.
• Sources and formats are also heterogeneous.
Biological Data is Very Dynamic
• Data management techniques effectively handle dynamic data content.
• But dynamic schema evolution poses challenges for data management:
  • applications and software tools are built on the schema and must be updated as the schema evolves.
Research
• Off-the-shelf data management software tools will not be sufficient for the data management needs of “digital biology”.
• Integration of existing technologies for biological data and development of new data management techniques are needed.
• NIH BISTI workshop on Digital Biology: http://www.bisti.nih.gov/2003meeting/
PathCase: Case Pathways DataBase System
• An integrated software tool for
  • storing,
  • visualizing,
  • querying, and
  • analyzing
  biological pathways at different levels of genetic, molecular, biochemical, and organismal detail.
• http://nashua.case.edu/pathways
Data Model
• Graph-structured database (hypergraph)
  • nodes: substrates and products
  • hyperedges: processes (reactions)
  • represented using a relational database
• Querying and visualization
  • based on the graph conceptual view.
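One way to picture a hypergraph stored in a relational database is an incidence table with one row per (process, entity, role) triple. The sketch below is only an illustration under that assumption: the table names, column names, and helper functions are hypothetical and are not taken from PathCase's actual schema. The example reaction (hexokinase: glucose + ATP → glucose 6-phosphate + ADP) is used purely as sample data.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not PathCase's design.
SCHEMA = """
CREATE TABLE molecular_entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE process (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One row per (hyperedge, node) incidence; role says which side of the
-- reaction the node is on.
CREATE TABLE process_entity (
    process_id INTEGER NOT NULL REFERENCES process(id),
    entity_id  INTEGER NOT NULL REFERENCES molecular_entity(id),
    role       TEXT NOT NULL CHECK (role IN ('substrate', 'product')),
    PRIMARY KEY (process_id, entity_id, role)
);
"""


def build_example_db():
    """Create an in-memory database holding a single hyperedge (one reaction)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO molecular_entity (id, name) VALUES (?, ?)",
        [(1, "glucose"), (2, "ATP"), (3, "glucose 6-phosphate"), (4, "ADP")],
    )
    conn.execute("INSERT INTO process (id, name) VALUES (1, 'hexokinase reaction')")
    conn.executemany(
        "INSERT INTO process_entity (process_id, entity_id, role) VALUES (?, ?, ?)",
        [(1, 1, "substrate"), (1, 2, "substrate"), (1, 3, "product"), (1, 4, "product")],
    )
    conn.commit()
    return conn


def entities_of(conn, process_name, role):
    """Return the names of the entities attached to a process in the given role."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM process_entity AS pe
        JOIN process AS p ON p.id = pe.process_id
        JOIN molecular_entity AS e ON e.id = pe.entity_id
        WHERE p.name = ? AND pe.role = ?
        ORDER BY e.name
        """,
        (process_name, role),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_example_db()
    print(entities_of(db, "hexokinase reaction", "substrate"))
    print(entities_of(db, "hexokinase reaction", "product"))
```

Keeping the incidences in one table lets a single hyperedge connect any number of substrates and products, which an ordinary two-column edge table could not express.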
Other systems and resources
• Reactome
• KEGG
• BioCyc & Pathway Tools
• PATIKA
• Cytoscape
• BioCarta
• and others.
Data Model
• Pathway: an interconnected arrangement of processes (representing the functional role of genes in the genome).
• Process: a reaction (or step) in a pathway involving one genetically unique gene product. (Substrates, products, co-factors, inhibitors, and activators of a reaction are all molecular entities in this perspective.)
• Molecular Entity: the general name given to any entity participating in a process, such as a basic molecule, protein, enzyme, gene, or amino acid.
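The three concepts above can be sketched as plain data classes. This is a minimal illustration, assuming a simple in-memory representation; the class names follow the slide's terms, but every field and method name is a hypothetical choice and not PathCase's actual object model.

```python
from dataclasses import dataclass, field

# Roles named on the slide; any entity filling one of them is a molecular entity.
ROLES = {"substrate", "product", "co-factor", "inhibitor", "activator"}


@dataclass(frozen=True)
class MolecularEntity:
    name: str
    kind: str  # e.g. "basic molecule", "protein", "enzyme", "gene", "amino acid"


@dataclass
class Process:
    name: str
    gene_product: MolecularEntity  # the one genetically unique gene product
    participants: dict = field(default_factory=dict)  # role -> list of entities

    def add(self, role, entity):
        """Attach an entity to this process in one of the known roles."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.participants.setdefault(role, []).append(entity)


@dataclass
class Pathway:
    name: str
    processes: list = field(default_factory=list)

    def entities(self):
        """Return the set of distinct molecular entities used anywhere in the pathway."""
        seen = set()
        for process in self.processes:
            seen.add(process.gene_product)
            for members in process.participants.values():
                seen.update(members)
        return seen


if __name__ == "__main__":
    glucose = MolecularEntity("glucose", "basic molecule")
    g6p = MolecularEntity("glucose 6-phosphate", "basic molecule")
    hexokinase = MolecularEntity("hexokinase", "enzyme")

    step = Process("glucose phosphorylation", gene_product=hexokinase)
    step.add("substrate", glucose)
    step.add("product", g6p)

    pathway = Pathway("toy pathway", [step])
    print(sorted(e.name for e in pathway.entities()))
```

Making `MolecularEntity` frozen keeps it hashable, so the same entity can appear in several processes and still be counted once.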
User statistics
• PathCase usage statistics: hits from 62 countries.
Database content
• Metabolic pathways (39)
  • 37 from [Michal, G. Biochemical Pathways, John Wiley & Sons Inc., 1999]
  • 2 (Folate and Homocysteine) for human and mouse, by Joe Nadeau and Toshimori Kitami
• 876 processes (for different organisms)
• Organisms: human, mouse, animals, prokarya, plants & yeasts, unspecified
Architecture
[Figure: server/client architecture of the web-based pathways query and visualization sub-system. Labels recoverable from the diagram, in extraction order: Server, Client, Database, Rich Client, Web Service, Windows User Interface, Data Object Classes, SOAP, Object Access, XML, Interfaces for Access/Edit, Graph Windows Control, Basic Queries, SQL Queries, Advanced Queries, Graph Access, Graph, Web Browser, Graphing/Layout, Web Site, Graph Applet, Graph Generation, HTML User Interface, HTML Display, Graph Caching, Doc.]
Exploratory Querying and Visualization
• Viewing the whole network of pathways
• Viewing at multiple levels of abstraction
• Querying specific properties of any pathway component at any level of granularity
• Path queries
• Neighborhood queries
• Different forms of queries and output displays:
  - textual / graphical queries
  - built-in / parametrized queries
  - tabular / graphical query outputs
  - advanced query interface
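A neighborhood query can be pictured as a breadth-first search that starts at one molecular entity and collects everything within a given number of process steps. The sketch below is a minimal illustration, assuming an in-memory hypergraph and ignoring reaction direction; the toy data, the function name, and the parameter names are all hypothetical, and this is not PathCase's query implementation.

```python
from collections import deque

# Toy hypergraph: process name -> (set of substrates, set of products).
PROCESSES = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B", "X"}, {"C"}),
    "r3": ({"C"}, {"D"}),
}


def neighborhood(processes, start, max_steps):
    """Map each entity within max_steps processes of start to its distance.

    Two entities are neighbors when they take part in the same process,
    on either side of the reaction.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_steps:
            continue  # do not expand beyond the requested radius
        for substrates, products in processes.values():
            members = substrates | products
            if current not in members:
                continue
            for other in members:
                if other not in distance:
                    distance[other] = distance[current] + 1
                    queue.append(other)
    return distance


if __name__ == "__main__":
    print(neighborhood(PROCESSES, "B", 1))
    print(neighborhood(PROCESSES, "B", 2))
```

Treating every participant of a shared process as a neighbor is a design choice for the sketch; a direction-aware variant would follow only substrate-to-product links.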
Calls the query interface for finding the paths between two molecular entities
Query interface for “Find paths between two molecular entities” query
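The “find paths between two molecular entities” query can be sketched as a depth-first enumeration of simple paths in which each step goes from a substrate, through a process, to one of its products. This is only an illustration under stated assumptions: the toy data, the function name, and the `max_steps` bound are hypothetical, and a step is allowed whenever the current entity is any one substrate of the process, which simplifies true hyperedge semantics.

```python
# Toy hypergraph: process name -> (set of substrates, set of products).
TOY_PROCESSES = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B", "X"}, {"C"}),
    "r3": ({"C"}, {"D"}),
    "r4": ({"A"}, {"C"}),
}


def find_paths(processes, source, target, max_steps=5):
    """Return all simple directed paths from source to target.

    Each path is a list alternating entity and process names, for example
    ["A", "r4", "C", "r3", "D"]. Paths use at most max_steps processes.
    """
    paths = []

    def extend(path, visited):
        current = path[-1]
        if current == target:
            paths.append(list(path))
            return
        if (len(path) - 1) // 2 >= max_steps:
            return  # step budget used up
        for name in sorted(processes):
            substrates, products = processes[name]
            if current not in substrates:
                continue
            for nxt in sorted(products):
                if nxt in visited:
                    continue  # keep paths simple (no repeated entity)
                visited.add(nxt)
                path.extend([name, nxt])
                extend(path, visited)
                del path[-2:]
                visited.discard(nxt)

    extend([source], {source})
    return paths


if __name__ == "__main__":
    for found in find_paths(TOY_PROCESSES, "A", "D"):
        print(" -> ".join(found))
```

Bounding the search with `max_steps` keeps the enumeration tractable on larger networks, where the number of simple paths can grow very quickly.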