Legend
• Site marker: this site is under selection, Pr(w>1) >= 0.95.
• At the left corner: the aligners whose alignment of the gene had at least 1 site inferred to be under selection, Pr(w>1) >= 0.95 (12 species alignments only).
• Sites inferred to be under selection with Pr(w>1) > 0.5 are based on PAML’s BEB analysis (Clark et al. 2007).
• When more than 1 alignment is shown for a gene, the star indicates which alignment the shown sites belong to (for example, TC = T-Coffee).
• The reported probabilities are based on BEB or NEB analysis, as labeled (melanogaster group alignments only; all 12 species alignments are based on BEB analysis).
• Labels used on the slides: Amap, TC (T-Coffee), BEB, NEB.
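The two thresholds in the legend can be read as a simple filter over per-site posterior probabilities. The following is a minimal sketch, assuming the Pr(w>1) values have already been parsed from the PAML output into a dictionary keyed by site position. The function name `sites_above` and the example values are hypothetical and are not taken from the slides.

```python
def sites_above(posteriors, threshold, inclusive=True):
    """Return sorted site positions whose Pr(w>1) passes the threshold.

    posteriors: dict mapping site position -> Pr(w>1) (assumed input format).
    """
    if inclusive:
        return sorted(s for s, p in posteriors.items() if p >= threshold)
    return sorted(s for s, p in posteriors.items() if p > threshold)


# Hypothetical posteriors (site position -> Pr(w>1)); not real data.
example = {12: 0.97, 40: 0.89, 303: 0.53, 614: 0.30}

# Sites marked as under selection in the legend: Pr(w>1) >= 0.95
strong = sites_above(example, 0.95)

# Sites displayed at all: Pr(w>1) > 0.5
shown = sites_above(example, 0.5, inclusive=False)
```

Here `strong` holds only the sites that would carry the selection marker, while `shown` holds every site that would be displayed.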
12 species alignments examples
Working definitions
• “Correct”: the codons of at least 1 of the inferred sites under selection (Pr(w>1) > 0.95) are most likely correctly aligned.
• “Misaligned”: there is no inferred positively selected site where the codons are most likely correctly aligned.
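The two working definitions above amount to a small decision rule per gene. The sketch below is one way to express it, assuming each site carries a posterior and a manual judgment of whether its codons look correctly aligned. The `Site` type, its field names, and `classify_gene` are hypothetical illustrations, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int
    pr_w_gt_1: float           # posterior Pr(w>1) for this site
    codons_well_aligned: bool  # manual judgment (assumed to be supplied)


def classify_gene(sites, threshold=0.95):
    """Apply the 12 species working definitions to one gene's sites."""
    selected = [s for s in sites if s.pr_w_gt_1 > threshold]
    if not selected:
        return None  # no inferred sites under selection; definitions do not apply
    if any(s.codons_well_aligned for s in selected):
        return "Correct"
    return "Misaligned"


# Hypothetical example values, not real data.
gene_a = [Site(10, 0.99, False), Site(25, 0.96, True)]
gene_b = [Site(10, 0.99, False), Site(25, 0.60, True)]
label_a = classify_gene(gene_a)
label_b = classify_gene(gene_b)
```

In this sketch a single well aligned site above the threshold is enough for “Correct”, which mirrors the “at least 1” wording of the definition.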
[FBgn0036058] Correct
Aligners listed at the left corner: Amap, Clustal, Muscle, Probcons, T-Coffee. Alignment shown: T-Coffee.
All 5 aligners produced the same alignment in the shown region, and in all cases this site was inferred to be under selection.
[FBgn0040696] Misaligned: end of CDS problems
Aligners listed at the left corner: Amap, Clustal, Muscle, Probcons, T-Coffee. Alignments shown: T-Coffee, Muscle. Starred alignment: TC (T-Coffee).
This gene has at least 1 site with Pr(w>1) = 1 for all 5 aligners, all in this same region.
[FBgn0022960] Misaligned: start of CDS problems + gross misalignment
Aligners listed at the left corner: Amap, Clustal, Muscle, Probcons, T-Coffee. Alignments shown: T-Coffee, Amap. Starred alignment: A (Amap).
This gene has at least 1 site with Pr(w>1) > 0.99 for all 5 aligners, all in the shown region. The underlined sequences are almost 100% identical; however, T-Coffee did not align them correctly.
[FBgn0031478] Misaligned: fast evolving region
Aligners listed at the left corner: Muscle, Probcons, T-Coffee. Alignments shown: Clustal, Probcons. Starred alignment: P (Probcons).
For Clustal, all sites in this region have Pr(w>1) < 0.5 except the site corresponding to Probcons’ 303 K (Pr = 0.53).
[FBgn0034434] Misaligned: “repeats” (H, Q)
Aligners listed at the left corner: Muscle, Probcons. Alignments shown: Muscle, Clustal. Starred alignment: M (Muscle).
For Clustal, Pr(w>1) < 0.5 at this site.
[FBgn0002932] Misaligned: 2 different transcripts
Aligners listed at the left corner: Clustal, Muscle, T-Coffee. Alignments shown: Clustal, Probcons. Starred alignment: C (Clustal).
[FBgn0004380] Misaligned
Aligners listed at the left corner: T-Coffee. Alignments shown: T-Coffee, Muscle. Starred alignment: T (T-Coffee).
Only the alignment with T-Coffee has Pr(w>1) > 0.95.
[FBgn0039025] Misaligned: indels and “repeats”
Aligners listed at the left corner: Muscle, Probcons, T-Coffee. Alignments shown: Muscle, Clustal. Starred alignment: M (Muscle).
[The shown region is followed by a very conserved 200 aa sequence.]
[FBgn0037580] Misaligned
Aligners listed at the left corner: Probcons. Alignment shown: Probcons.
There is no reason why the R at position 40 in D. pseudoobscura should be placed before rather than after the gap. Exactly the same column, but without the R, in the Amap alignment resulted in Pr(w>1) = 0.89.
melanogaster group examples
Working definitions
• “Correct”: the codons “causing” positive selection are most likely correctly aligned.
• “Misaligned”: the codons “causing” positive selection are likely incorrectly aligned.
• “Significant?”: partial misalignments, which are likely to significantly affect the statistical significance of the PAML LRT/FDR results.
[FBgn0033942] Correct
Alignment shown: T-Coffee (NEB and BEB).
• All 4 BEB analysis sites have well aligned codons.
• [1] Example with 1 of the 4 sites.
[FBgn0031155] (likely) Correct
Alignment shown: T-Coffee (BEB).
• [1], [2]: 2 well aligned sites
• BEB analysis
• Total sites in the w>1 category: 122 (28% of all sites)
• At least 6 of the sites with Pr > 0.9 are well aligned; at least 4 are not
• No sites with Pr >= 0.95
[FBgn0032627, part 1] Misaligned: not due to lack of information
Alignments shown: T-Coffee, Amap (BEB). Starred alignment: T (T-Coffee).
• [1] Region that is misaligned with T-Coffee, but not with Amap
• [2] The start codon, aligned with a non-start codon, is inferred to be under selection (Pr 0.943)
• Start/end problems seem common; the mel sequence is often, but not always, missing
[FBgn0032627, part 2] Misaligned, with an attempt to mask
Alignments shown: T-Coffee, Amap (BEB). Starred alignment: T (T-Coffee).
• [1] A region that is not well masked
• X: masked sites
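The slide does not say how the masking was performed. As an illustration only, the sketch below shows one simple way to mask a block of alignment columns with X while leaving gap characters in place. The function `mask_columns`, the decision to keep gaps, and the toy sequences are all assumptions and are not taken from the original analysis.

```python
def mask_columns(alignment, start, end, mask_char="X"):
    """Replace residues in columns [start, end) with mask_char.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Gap characters ("-") are kept as gaps (an assumption of this sketch).
    """
    masked = {}
    for name, seq in alignment.items():
        region = "".join(c if c == "-" else mask_char for c in seq[start:end])
        masked[name] = seq[:start] + region + seq[end:]
    return masked


# Toy aligned sequences; hypothetical, not real data.
toy = {"dmel": "MKT-LLQ", "dana": "MRTALLQ"}
masked = mask_columns(toy, 1, 4)
```

Masking by column keeps every sequence the same length, so site positions outside the masked block stay comparable before and after masking.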
[FBgn0025815] Misaligned: fast evolving region
Alignment shown: T-Coffee (BEB).
Unreliable at the codon level, though the region is clearly evolving faster than the rest of the gene.
[FBgn0036686] Misaligned: repeats
Alignments shown: T-Coffee (T), Probcons (P); analyses: NEB, BEB.
• T-Coffee compared to Probcons: the highest BEB Pr(w>1) with Probcons is only 0.6
[FBgn0036195] Misaligned: alternative splicing and/or annotation and/or non amino acid level polymorphism
Alignment shown: T-Coffee (BEB).
• [1] These 2 sites (RR) are the only fast evolving sites in a very well conserved gene
• NCBI search on the left:
  • the sequence observed in dmel, ending with RR, can also be found in dana, also ending with RR
  • dsec, dsim and dyak get similar hits
[FBgn0050166, part 1 of 2] Misaligned: end of CDS issues
Alignment shown: T-Coffee (BEB).
• [1] This region accounts for 35 sites with Pr(w>1) > 0.99
[FBgn0050166, part 2 of 2] Misaligned: different sequence in dana + indel
Alignment shown: T-Coffee (BEB).
[FBgn0030998] Significant?
Alignment shown: T-Coffee (NEB and BEB).
• [1] Well aligned (site 614)
• [2] Likely misaligned (sites 25*, 17, 19)
The remaining sites, not shown, have dubious alignments (and lower Pr(w>1)). There are only 10 sites in total in the w > 1 class.
[FBgn0034295] Significant?
Alignment shown: T-Coffee (NEB and BEB).
• [1] Sites with good alignments (2, 48, 49, 81*, 82; all with BEB Pr(w>1) > 0.9)
• [2] Simple repeats region; in dana the repeat is different
[FBgn0033607] Misaligned: alternative splicing
Alignment shown: T-Coffee (BEB).
This is the end of the CDS. This is the only site in the w>1 PAML M8 class in the gene, as well as the only site with Pr(w>1) after NEB analysis (with Pr(w>1) = 1.000).