PhyQL: A Phylogenetic Visual Query Engine. Shahriyar Hossain, Munirul Islam, Jesmin, Hasan M Jamil. Integration Informatics Laboratory, Computer Science, Wayne State University; Department of Genetic Engineering and Biotechnology, University of Dhaka, Bangladesh. BIBM 2008.
What is a Phylogenetic Tree? Integration Informatics Research Group
Queries:
• Least Common Ancestor

<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
      . . .
    </inode>
  </inode>
</root>

for $root in doc("tree.xml")//root
return <span> <h1> { $root/node/text() } </h1> </span>
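The least-common-ancestor query over this XML encoding can be sketched in Python as follows. This is only an illustration, not the PhyQL implementation: the helper names (`path_to_leaf`, `least_common_ancestor`, `leaf_names`) are hypothetical, and the `lizards` leaf is an assumed stand-in for the part of the tree that the slide elides with ". . .".

```python
import xml.etree.ElementTree as ET

# Toy tree in the slide's encoding: <node> is a leaf, <inode> an internal node.
# "lizards" is an assumed placeholder for the elided part of the original tree.
TREE_XML = """
<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
      <node>lizards</node>
    </inode>
  </inode>
</root>
"""


def path_to_leaf(element, name, trail=()):
    """Return the tuple of elements from `element` down to the leaf `name`."""
    trail = trail + (element,)
    if element.tag == "node" and (element.text or "").strip() == name:
        return trail
    for child in element:
        found = path_to_leaf(child, name, trail)
        if found:
            return found
    return None


def least_common_ancestor(root, name_a, name_b):
    """Deepest element that lies on the root paths of both leaves."""
    path_a = path_to_leaf(root, name_a)
    path_b = path_to_leaf(root, name_b)
    if path_a is None or path_b is None:
        return None
    lca = None
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        lca = a
    return lca


def leaf_names(element):
    """Names of all leaves below `element`, in document order."""
    return [n.text.strip() for n in element.iter("node")]


root = ET.fromstring(TREE_XML.strip())
print(leaf_names(least_common_ancestor(root, "frogs", "lungfish")))
```

The two root-to-leaf paths are compared from the top; the last element they share is the least common ancestor, and its leaves form the clade.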
Phylogenetic Query Language:
• Select: select the subset of trees that match given criteria
• Join: join two trees based on a pair of nodes
• Subset: retrieve part of a given tree
Tree Join Using Path Operators
SubTree Projection
PhyQL: Visual Query Interface
[Architecture diagram: User; SELECT, JOIN, and SUBTREE query operators; Translator; XSB; Wrappers; DB; XML/NEXUS from user / interoperable databases]
Why XSB?
• Eliminates the left-recursion problem:
    path(X,Z) :- path(X,Y), edge(Y,Z).
• Stores intermediate results (tabling)
• Model-based (the order in which rules are written doesn't matter):
    path(X,Y) :- edge(X,Y).
    path(X,Z) :- path(X,Y), edge(Y,Z).
• Its in-memory database queries are an order of magnitude faster than systems such as tuProlog.

:- odbc_import(conn, 'tbl_treeinfo'('rootId', 'author'), tree).
:- odbc_import(conn, 'tbl_nodeinfo'('nodeId', 'nodename'), node).
:- odbc_import(conn, 'tbl_edge'('parentId', 'childId'), edge).
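Why storing intermediate results makes the left-recursive path rule terminate can be illustrated with a small Python sketch. It is not XSB's evaluation strategy; it only mimics the effect of a table by computing the path relation bottom-up until no new facts appear. The function name `reachable_pairs` and the sample edges are assumptions made for the example.

```python
def reachable_pairs(edges):
    """Least fixpoint of:
         path(X,Y) :- edge(X,Y).
         path(X,Z) :- path(X,Y), edge(Y,Z).
    The `path` set plays the role of the table: a fact already stored
    is never derived again, so evaluation stops even on cyclic data.
    """
    path = set(edges)        # first rule: every edge is a path
    frontier = set(edges)    # facts derived in the previous round
    while frontier:
        new = set()
        for (x, y) in frontier:
            for (y2, z) in edges:
                # second rule, applied only to newly derived path facts
                if y == y2 and (x, z) not in path:
                    new.add((x, z))
        path |= new
        frontier = new
    return path


# Hypothetical parent -> child edges of a small tree.
tree_edges = {("root", "a"), ("a", "b"), ("a", "c")}
print(sorted(reachable_pairs(tree_edges)))

# A cycle would make a naive top-down evaluation loop forever;
# with the stored results the computation still terminates.
cycle_edges = {(1, 2), (2, 3), (3, 1)}
print(len(reachable_pairs(cycle_edges)))
```

In XSB itself the same effect is requested declaratively with a tabling directive such as `:- table path/2.` placed before the rules.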
<tree author="stern">
  <node type="*">
    <node type="?">
      <node> Stanhopea_gibbosa </node>
      <node> Stanhopea_vasquezii </node>
    </node>
    <node> Stanhopea_shuttleworthii </node>
  </node>
</tree>

node(Y1, 'Stanhopea_shuttleworthii'), node(Y2, 'Stanhopea_gibbosa'), node(Y3, 'Stanhopea_vasquezii'),
edge(Y4,Y2), edge(Y4,Y3), lca(Y0,Y4,Y1), edge(Y0,Y1)
Summary
• PhyQL offers a simple web-based visual query interface
• Logic-based tree query operations
• Modifying the query tools requires only changes to the logic rules
• The proposed architecture can also be applied to protein-protein interaction networks, metabolic pathways, etc.

Future Work:
• Database Interoperability – allow retrieving and integrating phylogenetic data during query submission
• ReQuery – query on the result set
• Tree Similarity Estimation
Thank You!
me: http://homopan.wayne.edu/PhD Students/Munirul Islam/index.htm
Uses of Phylogenetic Trees:
• Date the divergence events of species
• Find the most recent common ancestor of all living species
• Identify the geographic origins of new disease outbreaks
Crimson
• Uses nested subtrees to avoid long label strings
• Zheng, Y., S. Fisher, S. Cohen, S. Guo, J. Kim, and S. B. Davidson. 2006. Crimson: A Data Management System to Support Evaluating Phylogenetic Tree Reconstruction Algorithms. 32nd International Conference on Very Large Data Bases, ACM, pp. 1231-1234.
Dewey system:
[Figure: tree with leaves A to E, each node labelled with its Dewey path]
Leaves: A = 0.1.1, B = 0.1.2, C = 0.2.1.1, D = 0.2.1.2, E = 0.2.2
Internal nodes: 0 (root), 0.1, 0.2, 0.2.1
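A minimal sketch of how such Dewey labels can be assigned, assuming the tree is written as nested Python tuples with strings as leaves. The function name `dewey_labels` and the tuple encoding are illustrative choices, not part of the original system.

```python
def dewey_labels(tree, label="0", out=None):
    """Map each Dewey path to a leaf name (None for internal nodes).

    The root is labelled "0"; the i-th child of a node labelled P
    gets the label "P.i", with children numbered from 1.
    """
    if out is None:
        out = {}
    if isinstance(tree, str):          # leaf
        out[label] = tree
    else:                              # internal node
        out[label] = None
        for i, child in enumerate(tree, start=1):
            dewey_labels(child, f"{label}.{i}", out)
    return out


# The five-leaf tree from the slide, written as nested tuples.
TREE = (("A", "B"), (("C", "D"), "E"))

for path, name in dewey_labels(TREE).items():
    print(path, name or "(internal)")
```

Because every label starts with the label of its parent, the ancestors of a node can be read directly off its path.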
Find the clade for: Z = (C, D)
Find the common pattern (path prefix) starting from the left

SELECT * FROM nodes WHERE (path LIKE '0.2.1%');
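The prefix lookup can be tried end to end with Python's built-in `sqlite3` module. The sketch below assumes a `nodes(path, name)` table holding the Dewey labels of the five-leaf example tree; the table layout and the helper names `common_prefix` and `clade` are assumptions. It also matches on the prefix followed by a dot, since a bare `LIKE '0.2.1%'` would wrongly include a sibling such as `0.2.10` in a wider tree.

```python
import sqlite3

# Dewey paths of the example tree; internal nodes have no name.
ROWS = [
    ("0", None), ("0.1", None), ("0.1.1", "A"), ("0.1.2", "B"),
    ("0.2", None), ("0.2.1", None), ("0.2.1.1", "C"),
    ("0.2.1.2", "D"), ("0.2.2", "E"),
]


def common_prefix(path_a, path_b):
    """Longest shared run of path components, compared from the left."""
    shared = []
    for a, b in zip(path_a.split("."), path_b.split(".")):
        if a != b:
            break
        shared.append(a)
    return ".".join(shared)


def clade(conn, path_a, path_b):
    """All nodes in the smallest clade containing both paths."""
    prefix = common_prefix(path_a, path_b)
    return conn.execute(
        "SELECT path, name FROM nodes "
        "WHERE path = ? OR path LIKE ? ORDER BY path",
        (prefix, prefix + ".%"),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", ROWS)

print(clade(conn, "0.2.1.1", "0.2.1.2"))   # clade of C and D
```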
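Depth-first traversal numbering each node with a left and right ID
[Figure: tree with leaves A to E; left and right IDs run from 1 to 18]
Leaves (left, right): A (3, 4), B (5, 6), C (10, 11), D (12, 13), E (15, 16)
Internal nodes (left, right): (2, 7), (9, 14), (8, 17); root (1, 18)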
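The numbering can be reproduced with a short depth-first traversal. This sketch assumes the tree is given as nested Python tuples with strings as leaves; `number_nodes` is a hypothetical helper name.

```python
def number_nodes(tree):
    """Assign (left, right) IDs by depth-first traversal.

    A counter is incremented once when a node is entered (left ID)
    and once when it is left again (right ID), so the interval of a
    node encloses the intervals of all its descendants.
    Returns (name, left, right) rows; name is None for internal nodes.
    """
    rows = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        if not isinstance(node, str):      # internal node: descend
            for child in node:
                visit(child)
        counter += 1
        name = node if isinstance(node, str) else None
        rows.append((name, left, counter))

    visit(tree)
    return rows


TREE = (("A", "B"), (("C", "D"), "E"))
for row in number_nodes(TREE):
    print(row)
```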
Minimum Spanning Clade of Node 5

SELECT *
FROM nodes
INNER JOIN nodes AS include
  ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
WHERE include.node_id = 5;
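The self-join can be exercised against an in-memory SQLite database. In this sketch the `node_id` values are assumed (numbered in pre-order, so that node 5 is the internal node with interval 8 to 17); the slide does not say which node carries ID 5. The column list is narrowed from `SELECT *` only to keep the output readable.

```python
import sqlite3

# (node_id, name, left_id, right_id); node_id values are an assumption.
ROWS = [
    (1, None, 1, 18),
    (2, None, 2, 7),
    (3, "A", 3, 4),
    (4, "B", 5, 6),
    (5, None, 8, 17),
    (6, None, 9, 14),
    (7, "C", 10, 11),
    (8, "D", 12, 13),
    (9, "E", 15, 16),
]


def clade_of(conn, node_id):
    """Every node whose left ID falls inside the interval of `node_id`."""
    return conn.execute(
        """
        SELECT nodes.node_id, nodes.name
        FROM nodes
        INNER JOIN nodes AS include
          ON nodes.left_id BETWEEN include.left_id AND include.right_id
        WHERE include.node_id = ?
        ORDER BY nodes.left_id
        """,
        (node_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, name TEXT, "
    "left_id INTEGER, right_id INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", ROWS)

print(clade_of(conn, 5))
```

A single range comparison retrieves the whole subtree, with no recursion and no string matching.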