Tree Pattern Matching in Phylogenetic Trees: Automatic Search for Orthologs or Paralogs in Homologous Gene Sequence Databases • By: Jean-François Dufayard, Laurent Duret, Simon Penel, Manolo Gouy, François Rechenmann, and Guy Perrière • Presented by: Jean Yeh
Background Information • The authors have created three databases that gather genes into homologous families • HOVERGEN – vertebrates • HOBACGEN – prokaryotes • HOGENOM – completely sequenced organisms • Among homologous genes, need to be able to differentiate orthologs from paralogs
Homologous Sequences • Homologs: Two genes related by descent from a common ancestral DNA sequence • Orthologs: Two genes in different species; evolved from a single ancestral gene by speciation • Paralogs: Two genes related by duplication within a genome
Orthologs and Paralogs http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
Gene Function • Gene function tends to change after gene duplication • Orthologs are more reliable predictors of gene function than paralogs • Evolutionary distance also plays a role • Closely related paralogs are probably more similar in function than distantly related orthologs
Goal • Create algorithms that allow automatic searching for orthologs or paralogs in the authors' databases • One algorithm for tree reconciliation • One algorithm for tree pattern matching • Implement both within the architecture already used to query the databases
Tree Reconciliation • Infers speciation and duplication events • Compares gene tree G with species tree S to give a reconciled tree R • Algorithm: • R = S • Step through G and R simultaneously • If nodes are incongruent, insert duplication node in R and annotate gene losses
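The reconciliation idea can be illustrated with a minimal Python sketch. It uses the standard LCA-mapping approach, which is a swapped-in textbook technique and not necessarily the authors' exact procedure. Assumptions: trees are nested tuples, each gene leaf is written as the name of its species, and gene losses are not annotated. The names `clusters`, `lca`, and `reconcile` are hypothetical.

```python
# Sketch of reconciliation by LCA mapping (assumed technique, not the authors' code).
# A tree is a nested tuple; a leaf is a species name (str).

def clusters(tree):
    """Return the leaf set of every node; the tree's own leaf set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]  # last entry is the child's own leaf set
    out.append(leaves)
    return out

def lca(species_clusters, species_set):
    """Smallest species-tree node (as a leaf set) that contains species_set."""
    return min((c for c in species_clusters if species_set <= c), key=len)

def reconcile(gene_tree, species_clusters, events=None):
    """Map each gene-tree node to a species-tree node and label internal nodes."""
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return lca(species_clusters, frozenset([gene_tree])), events
    child_maps = [reconcile(c, species_clusters, events)[0] for c in gene_tree]
    here = lca(species_clusters, frozenset().union(*child_maps))
    # A node mapped to the same species node as one of its children is a duplication.
    kind = "duplication" if here in child_maps else "speciation"
    events.append((sorted(here), kind))
    return here, events

SPECIES_TREE = (("human", "mouse"), "chicken")
GENE_TREE = (("human", "mouse"), ("human", "chicken"))  # two human gene copies

_, EVENTS = reconcile(GENE_TREE, clusters(SPECIES_TREE))
for species, kind in EVENTS:
    print(kind, species)
```

In this toy input the root of the gene tree is labelled a duplication because both of its subtrees contain a human gene; the two inner nodes are labelled speciations.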
Tree Pattern Matching • A tree pattern is a particular tree structure with taxonomic and evolutionary parameters contained in nodes and leaves • Can be considered a subtree • Want to match to a target tree • E.g. pattern (X, Y, Z) matches ((X, Y), Z), (X, (Y, Z)), and ((X, Z), Y)
Tree Pattern Matching • Uses a recursive algorithm that takes into account different taxonomic levels as well as the specific branch constraints • Cuts down on run time by first comparing the number of leaves in the pattern and the target tree • Allows users to search for orthologs/paralogs
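The (X, Y, Z) example can be reproduced with a small Python sketch. It makes a simplifying assumption: a pattern matches a target when both have the same leaves and every group of leaves in the pattern is also a group in the target. Taxonomic and evolutionary constraints, and matching against a subtree of a larger tree, are left out. The names `leaf_sets` and `matches` are hypothetical.

```python
# Sketch: an unresolved pattern matches every target tree that refines it.
# A tree is a nested tuple; a leaf is a label (str).

def leaf_sets(tree):
    """Return (leaves of the tree, leaf sets of all its internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, groups = frozenset(), set()
    for child in tree:
        child_leaves, child_groups = leaf_sets(child)
        leaves |= child_leaves
        groups |= child_groups
    groups.add(leaves)
    return leaves, groups

def matches(pattern, target):
    """True if target has the same leaves and keeps every grouping of pattern."""
    p_leaves, p_groups = leaf_sets(pattern)
    t_leaves, t_groups = leaf_sets(target)
    return p_leaves == t_leaves and p_groups <= t_groups

PATTERN = ("X", "Y", "Z")
TARGETS = [(("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")]

print([matches(PATTERN, t) for t in TARGETS])        # [True, True, True]
print(matches((("X", "Y"), "Z"), ("X", ("Y", "Z")))) # False
```

A resolved pattern such as ((X, Y), Z) is stricter: under this assumption it only matches targets that also group X with Y.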
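The leaf-count check can be sketched as a cheap filter applied before any full matching. The assumption here, not stated on the slide, is that a target tree with fewer leaves than the pattern cannot contain it and can be skipped. `count_leaves` and `candidate_trees` are hypothetical names.

```python
# Sketch of a leaf-count prefilter (assumed reading of the slide).
# A tree is a nested tuple; a leaf is a label (str).

def count_leaves(tree):
    """Number of leaves in a nested-tuple tree."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)

def candidate_trees(pattern, trees):
    """Keep only the target trees with at least as many leaves as the pattern."""
    needed = count_leaves(pattern)
    return [tree for tree in trees if count_leaves(tree) >= needed]

PATTERN = ("X", "Y", "Z")
DATABASE = [
    ("X", "Y"),                  # too small, skipped
    (("X", "Y"), "Z"),           # kept
    ((("X", "Y"), "Z"), "W"),    # kept
]

print(len(candidate_trees(PATTERN, DATABASE)))  # 2
```

Counting leaves is linear in the size of each tree, so the filter costs little compared with the recursive matching it avoids.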
FamFetch Interface • User interface to access the databases • Incorporates both algorithms • Pattern editor has two frames: tool and pattern • Pattern frame – interactive editor to construct, load, save, and match patterns with a tree database • Tool frame – tools used in pattern frame
Tree Rooting • For tree reconciliation, the trees must be rooted • Authors use their reconciliation algorithm to find the most parsimonious rooting: the one that requires the fewest gene duplications • The reconciliation algorithm is relatively fast
Tree Pattern Search • By framing their algorithm as a tree pattern search, the authors increased the range of queries available to users • Can search for gene duplication or speciation events, not just orthologs and paralogs • Also a relatively fast algorithm, though it loses the flexibility of human pattern matching
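Parsimonious rooting can be sketched in Python by trying every branch as the root and keeping the rooting with the fewest inferred duplications. Assumptions: duplications are counted with the standard LCA mapping (a swapped-in technique, not necessarily the authors' code), trees are nested tuples, and each gene leaf is written as its species name. All function names are hypothetical.

```python
import itertools

def clusters(tree):
    """Leaf set of every node; the tree's own leaf set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]
    out.append(leaves)
    return out

def lca(species_clusters, species_set):
    """Smallest species-tree node (as a leaf set) containing species_set."""
    return min((c for c in species_clusters if species_set <= c), key=len)

def count_duplications(gene_tree, species_clusters):
    """Return (species node this gene node maps to, duplications in the subtree)."""
    if isinstance(gene_tree, str):
        return lca(species_clusters, frozenset([gene_tree])), 0
    maps, dups = [], 0
    for child in gene_tree:
        mapped, below = count_duplications(child, species_clusters)
        maps.append(mapped)
        dups += below
    here = lca(species_clusters, frozenset().union(*maps))
    return here, dups + (here in maps)

def to_graph(tree):
    """Undirected adjacency map of the tree, with a binary root suppressed."""
    adj, label, ids = {}, {}, itertools.count()
    def add(node):
        nid = next(ids)
        adj[nid] = []
        if isinstance(node, str):
            label[nid] = node
        else:
            for child in node:
                cid = add(child)
                adj[nid].append(cid)
                adj[cid].append(nid)
        return nid
    root = add(tree)
    if len(adj[root]) == 2:  # join the root's two children directly
        a, b = adj.pop(root)
        adj[a].remove(root)
        adj[b].remove(root)
        adj[a].append(b)
        adj[b].append(a)
    return adj, label

def subtree(adj, label, node, parent):
    """Nested-tuple subtree hanging from node, looking away from parent."""
    if node in label:
        return label[node]
    return tuple(subtree(adj, label, n, node) for n in adj[node] if n != parent)

def rootings(tree):
    """Yield one rooted tree per branch of the unrooted tree."""
    adj, label = to_graph(tree)
    for a in adj:
        for b in adj[a]:
            if a < b:
                yield (subtree(adj, label, a, b), subtree(adj, label, b, a))

def best_rooting(gene_tree, species_tree):
    """Rooting with the fewest duplications, with its duplication count."""
    sc = clusters(species_tree)
    best = min(rootings(gene_tree), key=lambda t: count_duplications(t, sc)[1])
    return best, count_duplications(best, sc)[1]

SPECIES_TREE = (("human", "mouse"), "chicken")
MISROOTED = ("human", ("mouse", "chicken"))
BEST_TREE, BEST_DUPS = best_rooting(MISROOTED, SPECIES_TREE)
print(BEST_TREE, BEST_DUPS)
```

For this toy gene tree, the rooting that separates chicken from the human and mouse genes needs no duplication, so it is preferred over the input rooting, which needs one.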
Automatic Search for Orthologs • Previously done with pairwise BLAST searches and reciprocal hits • Requires the complete gene set of each species; if genes are missing or wrong, the results may be wrong • Classifying genes into clusters of orthologs depends on the evolutionary distance between species
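The earlier reciprocal-hit approach can be sketched as follows. This is a generic illustration, not the authors' method: it assumes similarity scores have already been computed (for example from BLAST output) and stored in dictionaries. The gene names and scores below are invented for illustration, and `reciprocal_best_hits` is a hypothetical name.

```python
# Sketch of reciprocal best hits between two genomes, A and B.
# a_vs_b maps each gene of A to {gene of B: similarity score}; b_vs_a is the reverse.

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = {a: max(hits, key=hits.get) for a, hits in a_vs_b.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in b_vs_a.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Invented scores, for illustration only.
A_VS_B = {
    "A1": {"B1": 250.0, "B2": 90.0},
    "A2": {"B1": 120.0, "B2": 80.0},
}
B_VS_A = {
    "B1": {"A1": 245.0, "A2": 118.0},
    "B2": {"A1": 88.0, "A2": 85.0},
}

print(reciprocal_best_hits(A_VS_B, B_VS_A))  # [('A1', 'B1')]
```

Here A2 and B2 get no partner because neither is the other's best hit. If the true ortholog of a gene were absent from the data, its best hit could instead be a paralog, which is the weakness the slide points out.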
Possible Improvement • Have program estimate reliability of reconciliation • While it allows for easier comparative sequence analysis, it was designed solely for databases the authors had already created • Might be improved if it could be generalized for more databases