
Translating DNA data tables into quasi-median networks Hans-Jürgen Bandelt

Translating DNA data tables into quasi-median networks Hans-Jürgen Bandelt (Dept. of Mathematics, University of Hamburg) & Arne Dür (Institute of Mathematics, University of Innsbruck). Hamming spaces are the natural host spaces of DNA data sets, that is, tables of aligned DNA sequences.


Presentation Transcript


  1. Translating DNA data tables into quasi-median networks. Hans-Jürgen Bandelt (Dept. of Mathematics, University of Hamburg) & Arne Dür (Institute of Mathematics, University of Innsbruck)

  2. Hamming spaces are the natural host spaces of DNA data sets, that is, tables of aligned DNA sequences. For the purpose of visualisation, a data table does not require the full Hamming space but only a (tailored) retract, which is then represented as a network (graph). Every DNA data table can thus be turned into its Ploščica dual, the quasi-median network, which faithfully represents the data.

  3. In extreme cases the data table may generate the full Hamming space as its quasi-median network, where each position determines its own sub-alphabet of {A,G,C,T,–}. [Figure: the table ACC, CCT, GTC, TTT and the product of its sub-alphabets {A,C,G,T} & {C,T} & {C,T}.]
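The sub-alphabets and their product can be enumerated directly. The following is a minimal sketch, assuming the data table is given as a list of equal-length strings; the function names `sub_alphabets` and `hamming_space` are illustrative and do not come from the presentation.

```python
from itertools import product

def sub_alphabets(table):
    """Return, for each aligned position, the sorted list of states observed there."""
    return [sorted(set(column)) for column in zip(*table)]

def hamming_space(table):
    """Enumerate the product of the per-position sub-alphabets as strings."""
    return ["".join(states) for states in product(*sub_alphabets(table))]

table = ["ACC", "CCT", "GTC", "TTT"]   # the four sequences of slide 3
print(sub_alphabets(table))            # [['A', 'C', 'G', 'T'], ['C', 'T'], ['C', 'T']]
print(len(hamming_space(table)))       # 16 = 4 * 2 * 2
```

For this table the host space has 16 nodes, of which only four are observed sequences.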

  4. The quasi-median network representing a data table is determined by the pairs of positions: either a pair is strongly compatible and represented by the union of two fibres in the corresponding Hamming space, or a pair is NOT strongly compatible and represented by the product of two fibres in the corresponding Hamming space.

  5. Contrast between the quasi-median networks generated by a pair of positions that are strongly compatible and NOT strongly compatible

  6. Positions I and II are strongly compatible; positions I and III, as well as positions II and III, are NOT strongly compatible.

     taxa   I   II  III
     1      A   A   A
     2      G   G   A
     3      G   C   G
     4      C   A   G

     [Figure: network drawings with nodes labelled by taxa 1 to 4 and edges labelled by positions I, II, III.]
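The pairwise criterion can be checked mechanically. The sketch below reads "represented by the union of two fibres" as follows: there is a state a at the first position and a state b at the second such that every taxon carries a or b. That reading is an assumption made for illustration, and the function name `strongly_compatible` is hypothetical.

```python
def strongly_compatible(col_a, col_b):
    """True if some state a of col_a and some state b of col_b exist
    such that every taxon has a at the first or b at the second position."""
    for a in set(col_a):
        for b in set(col_b):
            if all(x == a or y == b for x, y in zip(col_a, col_b)):
                return True
    return False

taxa = ["AAA", "GGA", "GCG", "CAG"]                     # rows 1-4 of the slide 6 table
pos_i, pos_ii, pos_iii = ("".join(col) for col in zip(*taxa))

print(strongly_compatible(pos_i, pos_ii))    # True  (a = G, b = A covers all four taxa)
print(strongly_compatible(pos_i, pos_iii))   # False
print(strongly_compatible(pos_ii, pos_iii))  # False
```

Under this reading the check reproduces the three statements made on the slide.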

  7. The quasi-median network can be really huge: the following network, comprising 868 nodes, is based on real data (Brown et al., J. Mol. Evol. 1982) and constitutes the worst case for four sequences.

  8. ... and drawn to scale (with character weights = number of positions merged into characters)

  9. The Steiner problem in Hamming space is referred to by biologists as the parsimony problem. For this problem one seeks to connect a given subset of the Hamming space by a tree of shortest length within this space.

  10. Note that there is a subtle difference between the two concepts of a Steiner minimal tree (from mathematics) and a most parsimonious tree (from biology). In the latter case, only the tree topology is specified that leads to a Steiner minimal tree with an optimal labeling, that is, an embedding into the reference space. Thus, a Steiner minimal tree is a most parsimonious reconstruction of a most parsimonious tree in the language of biologists. [Figure: two panels, "A most parsimonious tree" (node labels CCT, ACC, TTT, GTC) and "A Steiner minimal tree" (node labels CCT, ACC, GCT, GCC, TTT, GTC).]
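To make the role of the topology concrete, the length of a shortest labeling for a fixed topology can be computed with Fitch's algorithm. This is a standard method that the slides do not name; it is used here only as an illustrative sketch. Trees are assumed to be rooted binary trees written as nested tuples of leaf sequences, and the names `fitch_site` and `parsimony_length` are hypothetical.

```python
def fitch_site(node, i):
    """Return (candidate state set, number of changes) for position i below `node`."""
    if isinstance(node, str):                 # leaf: an observed sequence
        return {node[i]}, 0
    left, cost_l = fitch_site(node[0], i)
    right, cost_r = fitch_site(node[1], i)
    common = left & right
    if common:                                # children can agree: no extra change
        return common, cost_l + cost_r
    return left | right, cost_l + cost_r + 1  # children disagree: one change

def parsimony_length(tree, n_positions):
    """Total number of changes over all positions for the given topology."""
    return sum(fitch_site(tree, i)[1] for i in range(n_positions))

topologies = {
    "(ACC,CCT)|(GTC,TTT)": (("ACC", "CCT"), ("GTC", "TTT")),
    "(CCT,TTT)|(ACC,GTC)": (("CCT", "TTT"), ("ACC", "GTC")),
    "(ACC,TTT)|(CCT,GTC)": (("ACC", "TTT"), ("CCT", "GTC")),
}
for name, tree in topologies.items():
    print(name, parsimony_length(tree, 3))    # lengths 6, 6 and 7 respectively
```

For the four sequences of the slide, two of the three binary topologies reach length 6 and are therefore most parsimonious; each Steiner minimal tree is one such topology together with a concrete labeling of its interior nodes.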

  11. Theorem. Let T be any tree for which the leaves (pendant nodes) are labelled by the sequences of a data table that is condensed, that is, positions inducing the same partition of the taxa are merged into one (weighted) character. Then every shortest realization of T in the corresponding Hamming space (and, a fortiori, every Steiner minimal tree) is included in the quasi-median network generated by the sequences.
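The condensation step in the theorem can be sketched as follows, assuming the table is a list of equal-length strings. Positions are grouped by the partition of taxa they induce, and the weight of a character is the number of positions merged into it. The three-sequence table and the names `partition_of` and `condense` are hypothetical and serve only as an illustration.

```python
from collections import Counter

def partition_of(column):
    """Canonical form of the partition of taxa (by row index) induced by one column."""
    blocks = {}
    for taxon, state in enumerate(column):
        blocks.setdefault(state, []).append(taxon)
    return tuple(sorted(tuple(block) for block in blocks.values()))

def condense(table):
    """Merge positions inducing the same partition; map each partition to its weight."""
    return Counter(partition_of(column) for column in zip(*table))

toy_table = ["ATA", "GCA", "GCG"]   # hypothetical: positions 1 and 2 split the taxa identically
print(dict(condense(toy_table)))
# {((0,), (1, 2)): 2, ((0, 1), (2,)): 1}
```

Here the first two positions both separate taxon 0 from taxa 1 and 2, so they collapse into one character of weight 2.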

  12. Conclusion: Quasi-median networks are therefore canonical effective solution spaces for the Steiner problem in Hamming space.

  13. In order to be efficient as a visualisation tool, quasi-median networks are best applied to sections of a data set, for example, either representing the variation in a clade of closely related sequences, or representing the variation within a small window of the sequencing range (window analysis), or displaying the variation after removing all fast or medium-rate mutations from the data beforehand (filter analysis).
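Window and filter analysis both amount to selecting columns of the data table before the network is built. The sketch below assumes 0-based column indices and treats a filter as a set of excluded positions, which simplifies the slide's notion of removing particular mutations. The toy alignment and the names `window` and `filter_positions` are hypothetical.

```python
def window(table, start, end):
    """Keep only alignment columns start..end-1 (0-based, end exclusive)."""
    return [seq[start:end] for seq in table]

def filter_positions(table, excluded):
    """Drop every alignment column whose 0-based index is in `excluded`."""
    keep = [i for i in range(len(table[0])) if i not in excluded]
    return ["".join(seq[i] for i in keep) for seq in table]

toy_table = ["ACCTA", "CCTTA", "GTCTG", "TTTTG"]   # hypothetical toy alignment
print(window(toy_table, 0, 3))              # ['ACC', 'CCT', 'GTC', 'TTT']
print(filter_positions(toy_table, {0, 3}))  # ['CCA', 'CTA', 'TCG', 'TTG']
```

Either reduced table can then be condensed and translated into a network in place of the full data set.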

  14. For human mitochondrial DNA data, for example, one can remove all mutations observed in >1600 complete mtDNA sequences (sampled worldwide from several labs) that can conveniently be retrieved from a website (mtDB database): mtDB2005 filter

  15. Result of the mtDB2005 filter analysis applied to the Caucasus data set of Nasidze & Stoneking (1999): although only rare mutations survive after filtering, these rare mutations occur in great number, and many are even recurrent in this data set.

  16. Comparison with two highest-quality data sets: Alps and Kenya.

  17. The cause of this pattern is massive failure of the electrophoresis carried out, which generated numerous phantom mutations. The issue of this mis-sequenced Caucasian data set is now being debated in the Annals of Human Genetics. Nasidze & Stoneking are rejecting the criticism, thereby making false statements and providing disinformation. Thus, this has become a case of scientific misconduct.
