Nonparametric estimation of phylogenetic tree distributions. Ruriko Yoshida. Finding outlier gene trees via Kernel density estimation.
Nonparametric estimation of phylogenetic tree distributions Ruriko Yoshida
Finding outlier gene trees via kernel density estimation
• Here an outlier gene tree is a gene tree affected by events in genome evolution such as gene duplication, lateral gene transfer between species, retention of ancestral polymorphisms by balancing selection, or accelerated evolution by neofunctionalization.
• Using the estimated density over tree space, we classify trees with small estimated probability as outliers.
• Choice of distances: path difference dP, quartet distance dQ, Robinson-Foulds distance (or splits distance) dS, and matching splits distance dM.
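To make one of the listed distances concrete, here is a minimal sketch of the Robinson-Foulds (splits) distance for unrooted trees. It assumes each tree is given as a list of clades (sets of taxon names); the function names `splits` and `robinson_foulds` and the five-taxon example are hypothetical and are not taken from the talk.

```python
def splits(clades, taxa):
    """Return the set of nontrivial splits (bipartitions) induced by the clades.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so the same bipartition always has the same representation.
    """
    taxa = frozenset(taxa)
    ref = min(taxa)  # arbitrary but fixed reference taxon
    result = set()
    for clade in clades:
        side = frozenset(clade)
        if ref in side:
            side = taxa - side
        # Skip trivial splits (empty side, or one leaf against the rest).
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)
    return result


def robinson_foulds(clades_a, clades_b, taxa):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(clades_a, taxa) ^ splits(clades_b, taxa))


# Hypothetical example on five taxa.
taxa = ["A", "B", "C", "D", "E"]
tree_1 = [{"A", "B"}, {"D", "E"}]  # ((A,B),C,(D,E))
tree_2 = [{"A", "C"}, {"D", "E"}]  # ((A,C),B,(D,E))
print(robinson_foulds(tree_1, tree_2, taxa))  # 2
```

The two example trees share the split DE|ABC and differ in one split each, so the distance is 2. Some authors halve this count; the sketch reports the raw symmetric difference.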
Goals
• τ denotes all of tree space on n taxa (either with or without branch lengths).
• Given: tree estimates T = {t1, ..., tn} for n genes across the genome.
• Problem: estimate the distribution f from which “most” trees in T were sampled.
• Identify outliers in the distribution, i.e., estimate the distribution f and a subset Tout ⊂ T, assuming T \ Tout was sampled from f.
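The goal above can be sketched as a two-step procedure: score every tree by a kernel density estimate built from the other trees, then report the lowest-scoring trees as the candidate set Tout. This is only an illustration under stated assumptions: pairwise tree distances are already available as a matrix, the kernel is Gaussian with a single fixed bandwidth, and the names `kde_scores` and `flag_outliers` are hypothetical. The scores are left unnormalized, so they are useful for ranking trees but are not probabilities.

```python
import math


def kde_scores(dist, bandwidth):
    """Leave-one-out Gaussian kernel score for each tree.

    dist is a symmetric matrix (list of lists) of pairwise tree distances.
    A tree far from all others receives a score near 0.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = sum(
            math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def flag_outliers(dist, bandwidth, k):
    """Return the indices of the k trees with the smallest density score."""
    scores = kde_scores(dist, bandwidth)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])


# Hypothetical distance matrix: trees 0-3 are mutually close, tree 4 is far away.
dist = [
    [0, 2, 2, 2, 10],
    [2, 0, 2, 2, 10],
    [2, 2, 0, 2, 10],
    [2, 2, 2, 0, 10],
    [10, 10, 10, 10, 0],
]
print(flag_outliers(dist, bandwidth=2.0, k=1))  # [4]
```

Choosing k (or, alternatively, a score threshold) is a modelling decision that this sketch leaves to the user.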
Kernel methods
• Regard trees as points in space, t ↦ Φ(t) in R^D for some D (possibly infinite).
• The kernel is denoted K(t1, t2), which is the inner product ⟨Φ(t1), Φ(t2)⟩.
• Sometimes for statistics applications we assume that the integral of K(t1, t2) over t2 equals 1. We won’t assume this here.
• In kernel methods we work with K and T, which implicitly means linear computations with Φ(T) in R^D.
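As an illustration of working with K alone, the sketch below assumes one particular feature map that the slides do not specify: Φ(t) is the 0/1 indicator vector of the splits contained in t. Then K(t1, t2) is the number of shared splits, and squared distances in feature space follow from the Gram matrix without ever building Φ(t). The split labels and function names are hypothetical.

```python
def split_kernel(splits_a, splits_b):
    """K(t1, t2) = <Phi(t1), Phi(t2)> for split-indicator vectors.

    With 0/1 indicator features, the inner product is the number of
    splits the two trees have in common.
    """
    return len(set(splits_a) & set(splits_b))


def gram_matrix(trees, kernel):
    """Matrix of kernel values K(ti, tj) for all pairs of trees."""
    return [[kernel(a, b) for b in trees] for a in trees]


def squared_feature_distance(gram, i, j):
    """||Phi(ti) - Phi(tj)||^2 computed from kernel values only."""
    return gram[i][i] + gram[j][j] - 2 * gram[i][j]


# Hypothetical trees on taxa A-E, each given by its nontrivial splits.
t1 = {"AB|CDE", "ABC|DE"}
t2 = {"AC|BDE", "ABC|DE"}
gram = gram_matrix([t1, t2], split_kernel)
print(gram)                                  # [[2, 1], [1, 2]]
print(squared_feature_distance(gram, 0, 1))  # 2
```

For this specific feature map the squared feature-space distance equals the number of splits found in exactly one of the two trees, that is, the Robinson-Foulds distance. Other feature maps give other kernels and other geometries.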
Variations
• Kernel
  • Uniform
  • Gaussian
  • Epanechnikov
  • …
• Bandwidth
  • Fixed for every data point
  • Variable according to the data pattern
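The kernel shapes named above can be written as functions of a scaled distance u = d / h, where h is the bandwidth. The sketch below uses the standard one-dimensional normalizations, which is an assumption: in tree space the normalizing constants would differ. The k-nearest-neighbour rule in `knn_bandwidths` is one common way to obtain a variable bandwidth and is a hypothetical choice here, not necessarily the rule used in the talk.

```python
import math


def uniform(u):
    """Uniform (boxcar) kernel: constant on [-1, 1], zero outside."""
    return 0.5 if abs(u) <= 1 else 0.0


def gaussian(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def epanechnikov(u):
    """Epanechnikov kernel: parabolic on [-1, 1], zero outside."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def knn_bandwidths(dist, k):
    """Variable bandwidth: distance from each point to its k-th nearest neighbour.

    dist is a symmetric distance matrix given as a list of lists.
    """
    return [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(dist)]


# Hypothetical distances between three trees.
dist = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
print(knn_bandwidths(dist, k=1))  # [1, 1, 2]
print(epanechnikov(0.0))          # 0.75
```

With a fixed bandwidth every tree uses the same h. With the variable rule, trees in sparse regions get a larger h and therefore a smoother local estimate.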
Fairy wren data set
• There are four species: Red-backed Fairy Wren (RBFW), White-winged Fairy Wren (WWFW), Splendid Fairy Wren (SFW), and Variegated Fairy Wren (VFW).
• Each species has up to four alleles (1a, 1b, 2a, 2b; the number indicates the individual, with alleles a and b). A complete gene has 16 sequences: 4 species, 4 alleles per species.
• A total of 39 genes.
Questions? Thank you for your attention!
Joint work with P. Huggins, D. Haws and G. Weyenberg