Screen gene markers for different taxonomic groups
Dongying Wu
Eugene, 06.24.2010
• Build gene families for a phylum (blastp and MCL clustering)
• Build phylogenetic trees for all the families
• Automatically parse the trees for clades to identify sub-families that are:
  • Highly universal
  • Evenly distributed across the sampled genomes
  • Distinct (can be separated by hmmbuild/hmmsearch)
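The slides do not show how blastp hits are fed to MCL. Below is a minimal sketch of that glue step, assuming blastp was run with tabular output (-outfmt 6, where the E-value is the 11th column) and that edges are weighted by -log10(E-value). The function name blast_to_abc, the 1e-5 E-value cutoff and the weight cap are illustrative assumptions, not taken from the presentation.

```python
import math

def blast_to_abc(blast_lines, evalue_cutoff=1e-5, max_weight=200.0):
    """Convert blastp tabular hits (-outfmt 6) into MCL "abc" edge lines.

    Each output line is "query<TAB>subject<TAB>weight" with weight = -log10(E-value).
    Self-hits and hits weaker than evalue_cutoff are dropped (hypothetical cutoff);
    an E-value of exactly 0 is capped at max_weight (hypothetical cap).
    """
    edges = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if query == subject or evalue > evalue_cutoff:
            continue  # self-hits carry no clustering signal; weak hits are discarded
        weight = max_weight if evalue == 0 else min(max_weight, -math.log10(evalue))
        edges.append(f"{query}\t{subject}\t{weight:.2f}")
    return edges
```

The edge lines can then be written to a file and clustered with MCL's label-based input mode, for example `mcl hits.abc --abc -I 2.0 -o families.txt`; the inflation value here is only illustrative and controls how finely the graph is split into families.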
Ni: the number of members of the gene family in genome i
Nm: the median of Ni over all genomes that have the family
Ng: the number of genomes that have the family

$$\text{Universality} = 100 \times \frac{\text{Number of genomes covered by the family}}{\text{Total number of genomes}}$$
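As a worked illustration of these definitions, the sketch below computes Ng, Nm and universality for one family from a per-genome member count. The function name family_stats and the input layout (a dict from genome id to Ni) are assumptions for illustration; the sketch stops at Ng and Nm and does not attempt the evenness score itself.

```python
from statistics import median

def family_stats(members_per_genome, total_genomes):
    """Return (Ng, Nm, universality) for one gene family.

    members_per_genome maps genome id -> Ni (family members in that genome);
    genomes lacking the family may be omitted or given a count of 0.
    """
    present = [n for n in members_per_genome.values() if n > 0]
    ng = len(present)                         # genomes covered by the family
    nm = median(present) if present else 0    # median Ni over genomes with the family
    universality = 100.0 * ng / total_genomes
    return ng, nm, universality
```

For example, a family found once in g1 and g2, twice in g3 and absent from g4, out of 5 sampled genomes, gives Ng = 3, Nm = 1 and a universality of 60.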
Make sure the marker candidates still belong to the same clade in trees that include homologous sequences from other phyla:
• Build a phylum-specific HMM
• Run hmmsearch against genes from other phyla
• Pick the top 300 hits and build a tree with them and the marker family members
• Are the marker family members monophyletic?
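The "top 300 hits" step can be sketched as below, assuming hmmsearch was run with `--tblout` (whitespace-separated columns, comment lines starting with "#", target name in column 1 and full-sequence bit score in column 6) and that hits are ranked by that bit score. The ranking criterion and the function name top_hits are assumptions; the slides only say "top 300 hits".

```python
def top_hits(tblout_lines, n=300):
    """Return the n best-scoring target names from hmmsearch --tblout output.

    Hits are ranked by full-sequence bit score (6th column); each target is kept once.
    """
    best = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.split()
        target, score = fields[0], float(fields[5])
        if score > best.get(target, float("-inf")):
            best[target] = score
    ranked = sorted(best, key=lambda t: best[t], reverse=True)
    return ranked[:n]
```

The returned sequence IDs, together with the marker family members, would then go into the alignment and tree-building step used to test monophyly.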
Monophyletic Analysis
A list of taxa that is assumed to be monophyletic may in fact be divided into separate clades in the tree. The monophyletic value is designed to estimate quantitatively whether a given list of taxa is monophyletic; it is scaled by 100 and based on the Shannon entropy of how the taxa are distributed among clades.
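The entropy ingredient can be illustrated as follows: count how many of the listed taxa fall into each clade, then take the Shannon entropy of those proportions. This is a sketch under the assumption that the clade partition has already been extracted from the tree; the function name clade_entropy and the use of base-2 logarithms are choices made here, and the mapping from entropy onto the 0 to 100 monophyletic value is not reproduced.

```python
import math

def clade_entropy(clade_sizes):
    """Shannon entropy (bits) of how a list of taxa is split among clades.

    clade_sizes holds, for each clade containing any of the listed taxa,
    how many of those taxa it contains. 0 means all taxa sit in one clade.
    """
    total = sum(clade_sizes)
    h = 0.0
    for size in clade_sizes:
        if size > 0:
            p = size / total
            h -= p * math.log2(p)
    return h
```

A perfectly monophyletic list such as `clade_entropy([12])` gives 0, an even split into two clades such as `clade_entropy([5, 5])` gives 1 bit, and more fragmented splits give larger values.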
Keep only the families with: Universality × Evenness × Monophyly ≥ 90 × 90 × 90 (= 729,000)
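A direct transcription of this cutoff, with each score on a 0 to 100 scale; the names THRESHOLD and passes_marker_filter are illustrative.

```python
THRESHOLD = 90 * 90 * 90  # 729,000

def passes_marker_filter(universality, evenness, monophyly):
    """True if the product of the three scores (each 0-100) reaches 90*90*90."""
    return universality * evenness * monophyly >= THRESHOLD
```

Because the criterion is a product rather than three separate cutoffs, one score may fall below 90 if the others are high: 100 × 100 × 78.84 = 788,400 still passes, whereas any single score below 72.9 fails even when the other two are 100.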
209 families are identified that satisfy Universality × Evenness × Monophyly ≥ 90 × 90 × 90 for at least 5 taxonomic groups. Example:
PMPROK00023: ribosome recycling factor
LIST:ARCH UNIVERSALITY:NA EVENNESS:NA MONOPHYLY:NA
LIST:BACT UNIVERSALITY:99.67 EVENNESS:98.68 MONOPHYLY:NA
LIST:ACTINO UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:78.84
LIST:BARIO UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:59.78
LIST:CHLAM UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:100.00
LIST:CHLOFL UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:60.37
LIST:CYANO UNIVERSALITY:100.00 EVENNESS:81.04 MONOPHYLY:100.00
LIST:FIRM UNIVERSALITY:99.06 EVENNESS:100.00 MONOPHYLY:85.98
LIST:SPIRO UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:53.75
LIST:THERMI UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:100.00
LIST:THERMO UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:100.00
LIST:PROTEO UNIVERSALITY:99.69 EVENNESS:100.00 MONOPHYLY:44.61
LIST:ALPHA UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:63.18
LIST:BETAGAMMA UNIVERSALITY:99.45 EVENNESS:100.00 MONOPHYLY:97.47
LIST:BETA UNIVERSALITY:98.21 EVENNESS:100.00 MONOPHYLY:100.00
LIST:GAMMA UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:79.67
LIST:DELTA UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:88.17
LIST:EPSI UNIVERSALITY:100.00 EVENNESS:100.00 MONOPHYLY:100.00
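Records in the LIST:… format can be checked against the 90 × 90 × 90 cutoff with a small parser. This is a sketch: the function names are hypothetical, and treating any group with an NA score as not passing is an assumption, since the slides do not say how NA values are handled.

```python
import re

# One record: LIST:<group> UNIVERSALITY:<x> EVENNESS:<y> MONOPHYLY:<z>
RECORD = re.compile(
    r"LIST:(\S+)\s+UNIVERSALITY:(\S+)\s+EVENNESS:(\S+)\s+MONOPHYLY:(\S+)"
)

def parse_records(text):
    """Map group -> (universality, evenness, monophyly); 'NA' becomes None."""
    out = {}
    for group, *scores in RECORD.findall(text):
        out[group] = tuple(None if s == "NA" else float(s) for s in scores)
    return out

def passing_groups(records, threshold=90 ** 3):
    """Groups with all three scores present whose product meets the threshold."""
    return sorted(
        g for g, s in records.items()
        if None not in s and s[0] * s[1] * s[2] >= threshold
    )
```

Applied to the full PMPROK00023 listing, this yields 11 passing groups (ACTINO, BETA, BETAGAMMA, CHLAM, CYANO, DELTA, EPSI, FIRM, GAMMA, THERMI, THERMO), so `len(passing_groups(records)) >= 5` holds for this family.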
PhyML tree of 1043 bacterial/archaeal genomes built from 15 concatenated markers:
• ribosomal protein S5
• ribosomal protein L6
• ribosomal protein L1
• ribosomal protein L22
• ribosomal protein S2
• ribosomal protein L11
• ribosomal protein L4/L1e
• ribosomal protein L2
• ribosomal protein S9
• ribosomal protein L5
• ribosomal protein S7
• translation initiation factor IF-2
• ribosomal protein L16
• phenylalanyl-tRNA synthetase, beta subunit
• phenylalanyl-tRNA synthetase, alpha subunit
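Concatenating markers means joining each genome's per-marker alignment rows, in a fixed marker order, into one long row (a supermatrix). The sketch below assumes each marker has already been aligned and that a genome missing a marker is padded with gaps; the function name and the gap-padding choice are assumptions, not details from the slides.

```python
def concatenate_markers(alignments, gap="-"):
    """Build a supermatrix from per-marker alignments.

    alignments: non-empty {genome: aligned_sequence} dicts, one per marker,
    in a fixed marker order. A genome missing a marker is padded with gaps
    of that marker's alignment length.
    """
    genomes = sorted({g for aln in alignments for g in aln})
    rows = {g: [] for g in genomes}
    for aln in alignments:
        length = len(next(iter(aln.values())))  # rows of one alignment share a length
        for g in genomes:
            rows[g].append(aln.get(g, gap * length))
    return {g: "".join(parts) for g, parts in rows.items()}
```

The concatenated rows could then be written out in PHYLIP format, which PhyML reads, to infer a single tree from all markers at once.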