
Time Travel


patriciac

Presentation Transcript


  1. Evolution Time Travel

  2. Evolution Genome ? Evolution Time Travel Me

  3. Decoding the Past Lecture 17

  4. Reference Genome

  5. Reference Genome Healthy Healthy Sick Sick Healthy Compare variants/Comparative Genomics. Statistics, Signal Processing/Data Science/Machine Learning/Big Data, etc.

  6. Claude Shannon Victim of Information Theory Effect Evolution Channel Evolution Channel Evolution Channel Evolution Channel Snapshot Evolution Channel No history information First Order!

  7. Evolution Channel. For physical traits like race, the history information may not be relevant within a single lifetime. For mutation-based diseases like cancer, the history information is critical even within a single lifetime! Controlling factors: hereditary, stochastic, environmental.

  8. Can we use a single genome to gather information about the evolution channel?

  9. Mutational Events

  10. Mutational Events

  11. Mutational Events

  12. Mutational Events

  13. Mutational Events

  14. Mutational Events

  15. Evolution Model AGATACTATTAGGGCCCCATACGTTGACTA Some sequence Mutations ACGTC Seed

  16. Evolution Model AGATACTATTAGGGCCCCATACGTTGACTA There is always a path Some sequence Unconstrained Sub, Ins, Del ACGTC Seed

  17. Evolution Model AGATACTATTAGGGCCCCATACGTTGACTA Some sequence Duplications ACGTC Constrained Seed

  18. Evolution Model AGATACTATTAGGGCCCCATACGTTGACTA Some sequence Tandem Duplications ACGTC Constrained Seed

  19. Tandem Duplication Example (Tandem Repeat). Seed = AC → ACAC → ACACCAC → ACACCACAC → ACACCACACACCACAC → ACACACACCACACACCACAC → ACACACACCACACACCACACCACACCACAC → ACACACACCAACCACACACCACACCACACCACAC
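The single step used in this chain can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function name `tandem_duplicate` and its `start`/`length` parameters are assumptions chosen here. It copies a block of the string and inserts the copy immediately after the original, then replays the first four steps of the slide's chain.

```python
def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Return s with the block s[start:start+length] repeated right after itself."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("duplicated block must lie inside the string")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


# Replay the first steps of the slide's chain, starting from the seed AC.
step1 = tandem_duplicate("AC", 0, 2)      # duplicate "AC"
step2 = tandem_duplicate(step1, 1, 3)     # duplicate "CAC"
step3 = tandem_duplicate(step2, 5, 2)     # duplicate the trailing "AC"
step4 = tandem_duplicate(step3, 2, 7)     # duplicate "ACCACAC"
print(step1, step2, step3, step4)
```

Each step only ever lengthens the string, which is why the chain on the slide grows monotonically.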

  20. Example We can generate any string that starts with and ends with . Duplications of length z Duplications of length s Seed = , Tandem duplications of length and .

  21. Example We can generate any string that starts with and ends with . Proof: appears times : times z Pick the -th block of s : : times times Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  22. Definitions [F. Farnoud, M. Schwartz, J. Bruck, ISIT’14] Tandem Duplication String System (S) : Alphabet, : seed, : Tandem Duplication Rule Tandem Duplication Rules : Tandem duplications of length only : Tandem duplications of length at most

  23. Example We can generate any string that starts with and ends with . z s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  24. Examples (tree of strings generated from 012): 012 → 0012, 0112, 0122, 01012, 01212; 01012 → 0101012, 0101212, ...; 01212 → 0101212, 0121212, ...
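A tree like the one on this slide can be enumerated mechanically. The sketch below is an illustration under assumptions: the names `children` and `descendants` are invented here, and the `exact` flag is this sketch's way of switching between the two rule families from the definitions slide (duplications of one fixed length versus duplications of length at most k). Because a duplication never shortens a string, a breadth-first search capped at `max_len` finds every reachable string up to that length.

```python
from collections import deque


def children(s, k, exact=False):
    """Strings reachable from s by one tandem duplication.

    exact=True  -> duplicated block has length exactly k
    exact=False -> duplicated block has length 1..k
    """
    lengths = [k] if exact else range(1, k + 1)
    out = set()
    for block in lengths:
        for i in range(len(s) - block + 1):
            out.add(s[:i + block] + s[i:i + block] + s[i + block:])
    return out


def descendants(seed, k, max_len, exact=False):
    """All strings of length <= max_len generated from seed (seed included)."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        s = queue.popleft()
        for t in children(s, k, exact):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


print(sorted(children("012", 2)))
```

With seed 012 and block length at most 2, the first level matches the five strings shown on the slide.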

  25. Diversity S. Jain, F. Farnoud, J. Bruck, “Capacity and Expressiveness of Genomic Tandem Duplication,” IEEE IT 2017 Tandem Duplications seed

  26. Capacity z What does it mean? s [F. Farnoud, M. Schwartz, J. Bruck, ISIT’14] Definitions is the count of sequences of length n that can be generated in .
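The count of generated sequences of each length can be computed by brute force for small n. The sketch below is a hedged illustration: the names `count_by_length` and `growth_rate` are invented here, and since the slide's formula did not survive extraction, the normalization in `growth_rate` (logarithm to the base of the alphabet size, divided by n) is an assumption about the usual way such a capacity is defined, not a quotation of the lecture.

```python
import math


def count_by_length(seed, k, max_len):
    """Map n -> number of distinct strings of length n generated from seed
    using tandem duplications of length at most k."""
    by_len = {n: set() for n in range(len(seed), max_len + 1)}
    by_len[len(seed)].add(seed)
    # Process lengths in increasing order: every string of length n has
    # already been produced from shorter strings by the time we reach n.
    for n in range(len(seed), max_len + 1):
        for s in by_len[n]:
            for block in range(1, k + 1):
                if n + block > max_len:
                    break
                for i in range(n - block + 1):
                    by_len[n + block].add(s[:i + block] + s[i:i + block] + s[i + block:])
    return {n: len(strings) for n, strings in by_len.items()}


def growth_rate(count, n, alphabet_size):
    """Finite-n estimate of the exponential growth rate (assumed normalization)."""
    return math.log(count, alphabet_size) / n


counts = count_by_length("01", 1, 6)
print(counts)
```

With seed 01 and only length-1 duplications, the generated strings are a run of 0s followed by a run of 1s, so there are n - 1 of them at length n and the growth rate tends to zero. Allowing longer blocks changes the counts, which is the comparison the capacity is meant to capture.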

  27. Example We can generate any string that starts with and ends with . z s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  28. Capacity z Proof Idea Finite Automata [S. Jain, F. Farnoud, J. Bruck, IEEE IT’17]

  29. Finite Automata Example Sequence of parties of last 10 different American Presidents Useful way to model transitions between states in a sequence

  30. Strongly connected component Perron-Frobenius Theory [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]
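To illustrate how Perron-Frobenius theory turns a finite automaton into a growth rate, here is a small sketch. It is not the automaton from the cited paper: the two-state graph below is the standard textbook example that accepts binary strings with no two consecutive 1s, and the function name `spectral_radius` is an assumption of this sketch. For a strongly connected, aperiodic graph, the number of length-n paths grows like the n-th power of the largest eigenvalue of the adjacency matrix, so the base-2 logarithm of that eigenvalue gives the growth rate in bits per symbol.

```python
import math


def spectral_radius(adj, iters=200):
    """Largest eigenvalue of a nonnegative square matrix by power iteration.

    Assumes the graph is strongly connected and aperiodic, so that the
    iteration converges to the Perron eigenvalue.
    """
    n = len(adj)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam


# Textbook example (not from the slides): state 0 = "last symbol was 0",
# state 1 = "last symbol was 1"; a 1 may not follow a 1.
golden_mean = [[1, 1],
               [1, 0]]
lam = spectral_radius(golden_mean)
print(lam, math.log2(lam))
```

For this example the eigenvalue is the golden ratio, so the growth rate is its base-2 logarithm.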

  31. Capacity z Proof Idea Finite Automata [S. Jain, F. Farnoud, J. Bruck, IEEE IT’17]

  32. Capacity z Proof Idea Finite Automata [S. Jain, F. Farnoud, J. Bruck, IEEE IT’17]

  33. [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]

  34. Arbitrary Seed [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]

  35. Arbitrary Seed [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]

  36. Expressiveness Seed

  37. Expressiveness Seed

  38. Example We can generate any string that starts with and ends with . z s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  39. Example We can generate any string that starts with and ends with . z s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  40. Example We can generate any string that starts with and ends with . z s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  41. Example We can generate any string that starts with and ends with . is expressive! z For any s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  42. Expressiveness z Proof uses Thue’s repeat-free result. [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]
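For readers unfamiliar with the term, a repeat-free (square-free) string contains no block immediately followed by a copy of itself, i.e. no tandem repeat. The sketch below only illustrates that notion and the contrast Thue's result rests on; it is not the proof from the paper, and the function names are assumptions of this sketch. Over two letters, square-free strings stop at length 3, while over three letters they exist at every length.

```python
from itertools import product


def is_square_free(s):
    """True if s contains no substring of the form xx with x nonempty."""
    n = len(s)
    for block in range(1, n // 2 + 1):
        for i in range(n - 2 * block + 1):
            if s[i:i + block] == s[i + block:i + 2 * block]:
                return False
    return True


def square_free_words(alphabet, n):
    """All square-free strings of length n over the given alphabet (brute force)."""
    return ["".join(w) for w in product(alphabet, repeat=n) if is_square_free("".join(w))]


print(square_free_words("01", 3))          # the longest binary square-free strings
print(len(square_free_words("01", 4)))     # none survive at length 4
print(len(square_free_words("012", 4)))
```

A square-free string is exactly one that cannot be shortened by undoing a tandem duplication, which is why such strings act as the irreducible starting points in this setting.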

  43. Example We can generate any string that starts with and ends with . Generation is slow z Number of steps required to generate each string s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .

  44. Example We can generate any string that starts with and ends with . z Does allowing all duplication lengths make generation faster? s Total Number of strings of length that can be generated:

  45. Duplication Distance N. Alon, J. Bruck, F. Farnoud, S. Jain, “Duplication Distance to the Root for Binary Sequences,” IEEE IT 2017 Shortest Path Length? seed Answer: NO! Duplication distance = , for all but an exponentially small fraction of sequences. Takeaway: Short duplication lengths play the main role in generating diversity!

  46. Seed = 01 → 0101 → 0101101 → 01011001 = Sequence

  47. Think Reverse! Seed = 01 → 0101 → 0101101 → 01011001 = Sequence

  48. Example: 01201212

  49. Example: 01201212 → 01212
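The "think reverse" idea can be sketched as a search that undoes duplications: wherever a block is immediately followed by a copy of itself, delete one copy, and repeat until the seed is reached. The code below is a brute-force illustration for short strings, not the method of the cited paper; the names `parents` and `duplication_distance` are assumptions of this sketch. Breadth-first search over these reverse steps returns the fewest duplications needed to build the sequence from the seed, with duplications of any length allowed.

```python
from collections import deque


def parents(s):
    """All strings obtained from s by deleting one copy of a tandem repeat xx."""
    out = set()
    n = len(s)
    for block in range(1, n // 2 + 1):
        for i in range(n - 2 * block + 1):
            if s[i:i + block] == s[i + block:i + 2 * block]:
                out.add(s[:i + block] + s[i + 2 * block:])
    return out


def duplication_distance(seq, seed):
    """Fewest tandem duplications turning seed into seq, or None if unreachable."""
    dist = {seq: 0}
    queue = deque([seq])
    while queue:
        s = queue.popleft()
        if s == seed:
            return dist[s]
        for p in parents(s):
            if len(p) >= len(seed) and p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    return None


print(duplication_distance("01011001", "01"))
print(duplication_distance("01201212", "012"))
```

Searching backward keeps the state space small, since every reverse step shortens the string, whereas a forward search from the seed would branch over every possible block at every step.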
