Evolution Time Travel
Evolution Genome ? Evolution Time Travel Me
Decoding the Past Lecture 17
Reference Genome Healthy Healthy Sick Sick Healthy Compare variants/Comparative Genomics. Statistics, Signal Processing/Data Science/Machine Learning/Big Data, etc.
Claude Shannon Victim of Information Theory Effect. Evolution Channel. Snapshot: no history information. First order!
Evolution Channel: For physical traits like race, the history information may not be relevant in a single lifetime. For mutation-based diseases like cancer, the history information is critical even in a single lifetime! Controlling factors: hereditary, stochastic, environmental.
Can we use a single genome to gather information about the evolution channel?
Evolution Model: a seed (ACGTC) evolves into some sequence (AGATACTATTAGGGCCCCATACGTTGACTA) through mutations.
Evolution Model, unconstrained: with substitutions, insertions, and deletions there is always a path from the seed to the sequence.
Evolution Model, constrained: duplications only.
Evolution Model, constrained: tandem duplications only.
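Tandem Duplication Example: Seed = AC -> ACAC -> ACACCAC -> ACACCACAC -> ACACCACACACCACAC -> ACACACACCACACACCACAC -> ACACACACCACACACCACACCACACCACAC -> ACACACACCAACCACACACCACACCACACCACAC. Each step copies a substring and inserts the copy immediately after the original, creating a tandem repeat.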
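The chain in this example can be replayed mechanically. The following is a minimal sketch, not code from the lecture: the function name `tandem_duplicate` and the (start, length) positions are assumptions, chosen so that each step reproduces the next string of the chain.

```python
def tandem_duplicate(s, start, length):
    """Copy s[start:start+length] and insert the copy right after the original."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("duplicated substring must lie inside the string")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


# Hypothetical (start, length) choices that replay the chain from the example.
steps = [(0, 2), (1, 3), (5, 2), (0, 7), (0, 4), (10, 10), (6, 4)]

chain = ["AC"]  # the seed
for start, length in steps:
    chain.append(tandem_duplicate(chain[-1], start, length))

for s in chain:
    print(s)
```

Other position choices may produce the same strings; the positions above are only one way to read the example.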
Example We can generate any string that starts with and ends with . Duplications of length z Duplications of length s Seed = , Tandem duplications of length and .
Example We can generate any string that starts with and ends with . Proof: appears times : times z Pick the -th block of s : : times times Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .
Definitions [F. Farnoud, M. Schwartz, J. Bruck, ISIT’14] Tandem Duplication String System (S) : Alphabet, : seed, : Tandem Duplication Rule Tandem Duplication Rules : Tandem duplications of length only : Tandem duplications of length at most
Example We can generate any string that starts with and ends with . z s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .
Examples (derivation tree): 012 -> 0012, 0112, 0122, 01012, 01212, ...; 01012 -> 0101012, 0101212, ...; 01212 -> 0121212, 0101212, ...
Diversity S. Jain, F. Farnoud, J. Bruck, “Capacity and Expressiveness of Genomic Tandem Duplication,” IEEE IT 2017 Tandem Duplications seed
Capacity: What does it mean? [F. Farnoud, M. Schwartz, J. Bruck, ISIT’14] Definitions: is the count of sequences of length n that can be generated in .
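The count of sequences of length n that a system can generate can be checked by brute force for small n. The sketch below is an illustration under stated assumptions: the seed "01" and the allowed duplication lengths 1 and 2 are values picked for the example (the symbols on the slide did not survive extraction), and the helper names are hypothetical.

```python
def duplicate_once(s, lengths):
    """All strings reachable from s by one tandem duplication of an allowed length."""
    out = set()
    for k in lengths:
        for i in range(len(s) - k + 1):
            out.add(s[:i + k] + s[i:i + k] + s[i + k:])
    return out


def count_by_length(seed, lengths, max_len):
    """Count generated strings of each length up to max_len (breadth-first search)."""
    seen = {seed}
    frontier = {seed}
    while frontier:
        nxt = set()
        for s in frontier:
            for t in duplicate_once(s, lengths):
                # Duplications only lengthen strings, so pruning by length is safe.
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    nxt.add(t)
        frontier = nxt
    counts = {}
    for s in seen:
        counts[len(s)] = counts.get(len(s), 0) + 1
    return counts


# Assumed example system: seed "01", tandem duplications of length 1 and 2.
counts = count_by_length("01", (1, 2), 10)
for n in sorted(counts):
    print(n, counts[n])
```

For these assumed parameters the counts come out as 1, 2, 4, 8, ... for n = 2, 3, 4, 5, ..., that is, they double with every extra symbol. Capacity measures this exponential growth rate of the count in n.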
Capacity Proof Idea: Finite Automata [S. Jain, F. Farnoud, J. Bruck, IEEE IT’17]
Finite Automata Example: the sequence of parties of the last 10 different American presidents. A finite automaton is a useful way to model transitions between states in a sequence.
Strongly connected component Perron-Frobenius Theory [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]
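To make the link between finite automata and growth rates concrete, here is a small sketch. It is not the automaton from the cited paper: the two-state graph below (binary strings with no two consecutive 1s) is a hypothetical stand-in, and the function names are assumptions. By Perron-Frobenius theory, for a strongly connected graph the number of walks grows like the n-th power of the largest eigenvalue of the adjacency matrix, and the base-2 logarithm of that eigenvalue gives the growth rate in bits per symbol.

```python
import math


def count_walks(adj, n_vertices):
    """Number of walks visiting n_vertices states (one string per walk here)."""
    v = [1] * len(adj)
    for _ in range(n_vertices - 1):
        v = [sum(adj[i][j] * v[j] for j in range(len(adj))) for i in range(len(adj))]
    return sum(v)


def spectral_radius(adj, iterations=200):
    """Largest eigenvalue by power iteration (assumes a primitive matrix)."""
    n = len(adj)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)               # v is scaled so its largest entry is 1
        v = [x / lam for x in w]
    return lam


# Hypothetical automaton: state = last symbol; the edge 1 -> 1 is forbidden.
adj = [[1, 1],
       [1, 0]]

print([count_walks(adj, n) for n in range(1, 6)])
lam = spectral_radius(adj)
print(lam, math.log2(lam))
```

For this stand-in graph the counts follow the Fibonacci numbers and the largest eigenvalue is the golden ratio, so the growth rate is about 0.694 bits per symbol.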
Arbitrary Seed [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]
Expressiveness Seed
Example We can generate any string that starts with and ends with . is expressive! z For any s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .
Expressiveness: Proof uses Thue’s repeat-free result. [S. Jain, F. Farnoud, J. Bruck, IEEE IT 2017]
Example We can generate any string that starts with and ends with . Generation is slow z Number of steps required to generate each string s Total Number of strings of length that can be generated: Seed = , Tandem duplications of length and .
Example We can generate any string that starts with and ends with . z Does allowing all duplication lengths make generation faster? s Total Number of strings of length that can be generated:
Duplication Distance N. Alon, J. Bruck, F. Farnoud, S. Jain, “Duplication Distance to the Root for Binary Sequences,” IEEE IT 2017 Shortest Path Length? seed Answer: No! Duplication distance = , for all but an exponentially small fraction of sequences. Takeaway: Short duplication lengths play the main role in generating diversity!
Seed = 01 -> 0101 -> 0101101 -> Sequence = 01011001. Think reverse!
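Thinking in reverse means undoing duplications: each backward step removes one copy of a tandem repeat, and the duplication distance is the smallest number of such removals needed to reach a string with no repeats left. The sketch below is a brute-force illustration for short strings, not the method of the cited paper; the function names are assumptions.

```python
from collections import deque


def deduplications(s):
    """All strings obtained from s by removing one copy of a tandem repeat xx."""
    out = set()
    n = len(s)
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            if s[i:i + length] == s[i + length:i + 2 * length]:
                out.add(s[:i + length] + s[i + 2 * length:])
    return out


def duplication_distance(s):
    """Breadth-first search backwards; returns (repeat-free string, number of steps)."""
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        current, steps = queue.popleft()
        shorter = deduplications(current)
        if not shorter:            # no tandem repeat left: this is a root
            return current, steps
        for t in shorter:
            if t not in seen:
                seen.add(t)
                queue.append((t, steps + 1))


print(duplication_distance("01011001"))
```

The search visits strings in order of distance, so the first repeat-free string it reaches is at the minimum number of steps. It is exponential in general and is only meant for examples of this size.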
Example: 01201212 -> 01212
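The reduction in this example can be continued until no tandem repeat is left. The sketch below is an assumption-laden illustration: it always removes the shortest, leftmost repeat first, which is one of several possible orders, and the function names are hypothetical. Over alphabets with three or more symbols, different orders are not guaranteed to end at the same string in general.

```python
def remove_one_repeat(s):
    """Remove one copy of the shortest, leftmost tandem repeat; None if there is none."""
    n = len(s)
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            if s[i:i + length] == s[i + length:i + 2 * length]:
                return s[:i + length] + s[i + 2 * length:]
    return None


def reduce_to_root(s):
    """Return the list of strings visited while removing repeats one at a time."""
    path = [s]
    while True:
        shorter = remove_one_repeat(path[-1])
        if shorter is None:
            return path
        path.append(shorter)


path = reduce_to_root("01201212")
print(" -> ".join(path))
```

With this removal order the sketch passes through 012012 rather than 01212, but both routes end at 012 for this string.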