Analyzing human population genetic history through the study of genetic variation
Mark Mata
Mentor: Eleazar Eskin
UCLA ZarLab
SoCalBSI 2009
Background
• To study human population genetic history is to study part of human evolution
• How humans evolved is one of the fundamental questions in science
• We ask ourselves questions like:
  • Where do we come from?
  • Why are we all different?
  • How are we all different?
Background
• The ZarLab studies the most recent events in human evolution:
  • Given modern humans, what variations have arisen in our genes since our ancient African ancestors?
• To answer this question, our group examines human variation to produce a genetic history of these changes
Why do we care?
• Many diseases are caused by variations that have arisen over our genetic history
• A better understanding of our genetic history and human variation may eventually lead to better treatment plans
• Personalized medicine:
  • “The right drug, in the right dose, to the right person, at the right time.”
PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps
Human Variation
• Modern humans share 99.9% of their DNA
• The remaining 0.1% accounts for the variation between humans
• Of this, 80% of the variation is the result of SNPs
  • SNP (single-nucleotide polymorphism): a position in the genome where two different bases are present in the population. The base at a SNP on a chromosome is referred to as the “allele”
  • A haplotype is the sequence of alleles along a chromosome
• The other 20% comes from deletions or insertions in the genome
PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps
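A toy illustration of the SNP and haplotype definitions. The three aligned sequences and the function names are made up for this example and are not HapMap data; the sketch simply treats a column with exactly two distinct bases as a SNP.

```python
def snp_positions(sequences):
    """Return the column indices where exactly two different bases occur."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) == 2]


def haplotype(sequence, positions):
    """Read off the allele at each SNP position to form the haplotype."""
    return "".join(sequence[i] for i in positions)


# Three hypothetical aligned chromosome copies (not real data).
sequences = ["ACGTAC", "ACGTTC", "ATGTAC"]
positions = snp_positions(sequences)
haplotypes = [haplotype(s, positions) for s in sequences]
print(positions)   # [1, 4]
print(haplotypes)  # ['CA', 'CT', 'TA']
```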
International HapMap Project
• Study done by the International HapMap Consortium
• “…create a public, genome-wide database of common human sequence variation…”
• Identified SNPs and compiled the SNP alleles into a database of haplotypes for four different populations (Phase 1)
• The population used here was a group of 60 Mormons in Utah
  • Have been widely studied in the past
  • Western and Northern European descent
  • Have very detailed records
• Used their chromosome 19
“A haplotype map of the human genome” by The International HapMap Consortium. Nature. Published 27 October 2005
My Project Goals
• Reconstruct human genetic history
  • This is a very difficult problem
• Sub-problem: identify recent genetic events
  • Assume that these new genetic events are rare, i.e. few in number
  • They are easier to classify and relate to one another than older, more common haplotypes
  • These new events are important because they identify shared recent ancestry
  • Disease-causing variations could stem from recent events
Identifying Recent Genetic Events
• Select a region in a haplotype and find the frequency of variation
• Group variations into common and rare
• Find recent point mutations
• Find recent recombination events
Workflow
• Individual’s Haplotypes (15 SNPs each): TTTTTTTTTTTTTTT, AAAAAAAAAAAAAAA, AAAAAAAAATTTTTT, AATTTTTTTTTTTTT, TTTTTTATTTTTTTT
• Frequency of Variation (first 10 SNPs):
  • AAAAAAAAAA – 49%
  • TTTTTTTTTT – 48%
  • AAAAAAAAAT – 1%
  • AATTTTTTTT – 1%
  • TTTTTTATTT – 1%
• Identify Events:
  • Common: AAAAAAAAAA, TTTTTTTTTT
  • Rare: AAAAAAAAAT*, AA|TTTTTTTT, TTTTTTA*TTT
Frequency of Variation
Individual’s Haplotype    Region
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAATTTTTT           AAAAAAAAAT
AATTTTTTTTTTTTT           AATTTTTTTT
TTTTTTATTTTTTTT           TTTTTTATTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
AAAAAAAAAAAAAAA           AAAAAAAAAA
How Many
AAAAAAAAAA - 59
TTTTTTTTTT - 58
AAAAAAAAAT - 1
AATTTTTTTT - 1
TTTTTTATTT - 1
Frequency of Variation
Individual’s Haplotype: a “|” marks the end of the 10-SNP region, e.g. AAAAAAAAAT|TTTTT
Region        How Many    Frequency of Variation
AAAAAAAAAA    59/120      ~49%
TTTTTTTTTT    58/120      ~48%
AAAAAAAAAT    1/120       ~1%
AATTTTTTTT    1/120       ~1%
TTTTTTATTT    1/120       ~1%
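The counting step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `region_frequencies` is hypothetical, and the toy data is rebuilt from the counts on the slide (59, 58, 1, 1, 1 out of 120 haplotypes).

```python
from collections import Counter


def region_frequencies(haplotypes, start, length):
    """Map each distinct subsequence in the region to (count, frequency)."""
    regions = [h[start:start + length] for h in haplotypes]
    counts = Counter(regions)
    total = len(regions)
    return {seq: (n, n / total) for seq, n in counts.items()}


# Toy data rebuilt from the slide: 120 haplotypes of 15 SNPs each.
haplotypes = (
    ["A" * 15] * 59
    + ["T" * 15] * 58
    + ["AAAAAAAAATTTTTT", "AATTTTTTTTTTTTT", "TTTTTTATTTTTTTT"]
)
freqs = region_frequencies(haplotypes, start=0, length=10)
for seq, (n, f) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{seq} - {n}/{len(haplotypes)} ~{f:.0%}")
```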
Grouping Variations
• Each subsequence is classified as either a common or a rare haplotype
• Assume that new genetic events are rare, i.e. few in number
• A cutoff of 5% frequency was used: subsequences at or above it are common, those below it are rare
• The 5% figure comes from the International HapMap Consortium study
“A haplotype map of the human genome” by The International HapMap Consortium. Nature. Published 27 October 2005
Grouping Variations
Individual’s Genes: a “|” marks the end of the 10-SNP region, e.g. AAAAAAAAAT|TTTTT
Frequency of Variation    Group
AAAAAAAAAA – 49%          Common
TTTTTTTTTT – 48%          Common
AAAAAAAAAT – 1%           Rare
AATTTTTTTT – 1%           Rare
TTTTTTATTT – 1%           Rare
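A minimal sketch of the grouping step, assuming the 5% cutoff is applied to each subsequence's frequency within the region. The function name `split_common_rare` and the toy data are illustrative only.

```python
from collections import Counter

CUTOFF = 0.05  # 5% threshold quoted from the HapMap study


def split_common_rare(regions, cutoff=CUTOFF):
    """Split distinct subsequences into (common, rare) by frequency."""
    counts = Counter(regions)
    total = len(regions)
    common = sorted(s for s, n in counts.items() if n / total >= cutoff)
    rare = sorted(s for s, n in counts.items() if n / total < cutoff)
    return common, rare


# Toy regions matching the slide's counts (120 haplotypes).
regions = (
    ["AAAAAAAAAA"] * 59
    + ["TTTTTTTTTT"] * 58
    + ["AAAAAAAAAT", "AATTTTTTTT", "TTTTTTATTT"]
)
common, rare = split_common_rare(regions)
print("Common:", common)
print("Rare:", rare)
```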
Recent Events
• Make comparisons to identify two forms of variation:
  • Point mutations
  • Recombination events
Common: AAAAAAAAAA, TTTTTTTTTT
Rare: AAAAAAAAAT, AATTTTTTTT, TTTTTTATTT
Point Mutations
[Figure: the workflow diagram (Individual’s Haplotypes, Frequency of Variation, Identify Events) with the point mutation highlighted: rare TTTTTTATTT compared against common TTTTTTTTTT and marked TTTTTTA*TTT]
Recent Events
• Point mutations
  • Found by comparing a common haplotype with a rare haplotype
  • A difference at exactly one position shows that the rare haplotype is a point mutation of the common haplotype
  • Marked by a “*” next to the mutated base
Common: TTTTTTTTTT
Rare: TTTTTTATTT
Marked: TTTTTTA*TTT
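The comparison described here amounts to a Hamming-distance test. The sketch below is one way to write it, not the original implementation; `find_point_mutation` and `mark_point_mutation` are hypothetical names.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_point_mutation(rare, commons):
    """Return (common, position) if rare differs from a common haplotype
    at exactly one position, otherwise None."""
    for common in commons:
        if hamming(rare, common) == 1:
            pos = next(i for i, (x, y) in enumerate(zip(rare, common)) if x != y)
            return common, pos
    return None


def mark_point_mutation(rare, pos):
    """Insert '*' right after the mutated base, as on the slide."""
    return rare[:pos + 1] + "*" + rare[pos + 1:]


commons = ["AAAAAAAAAA", "TTTTTTTTTT"]
hit = find_point_mutation("TTTTTTATTT", commons)
print(hit)                                      # ('TTTTTTTTTT', 6)
print(mark_point_mutation("TTTTTTATTT", hit[1]))  # TTTTTTA*TTT
```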
Recombination
[Figure: the workflow diagram (Individual’s Haplotypes, Frequency of Variation, Identify Events) with the recombination event highlighted: rare AATTTTTTTT compared against common AAAAAAAAAA and TTTTTTTTTT and marked AA|TTTTTTTT]
Recent Events
• Recombination
  • Combine portions of two common haplotypes and see if they form a rare haplotype
Common: AAAAAAAAAA, TTTTTTTTTT
Possible Recombinations:
AA|TTTTTTTT
AAA|TTTTTTT
AAAA|TTTTTT
AAAAA|TTTTT
AAAAAA|TTTT
AAAAAAA|TTT
AAAAAAAA|TT
Rare Mutations
• Marked by a “|” at the border between one haplotype and another haplotype
Possible Recombinations: AA|TTTTTTTT, AAA|TTTTTTT, AAAA|TTTTTT, AAAAA|TTTTT, AAAAAA|TTTT, AAAAAAA|TTT, AAAAAAAA|TT
Actual Recombinations: AA|TTTTTTTT
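One way to sketch the recombination test is to try every breakpoint between every ordered pair of common haplotypes. The function names are hypothetical. Note that a rare haplotype differing only at the edge of the region (such as AAAAAAAAAT) also passes this test, so this sketch assumes point mutations are checked first.

```python
def find_recombination(rare, commons):
    """Return (left, right, breakpoint) such that
    rare == left[:breakpoint] + right[breakpoint:], otherwise None."""
    for left in commons:
        for right in commons:
            if left == right:
                continue
            for bp in range(1, len(rare)):
                if left[:bp] + right[bp:] == rare:
                    return left, right, bp
    return None


def mark_recombination(rare, bp):
    """Insert '|' at the border between the two source haplotypes."""
    return rare[:bp] + "|" + rare[bp:]


commons = ["AAAAAAAAAA", "TTTTTTTTTT"]
hit = find_recombination("AATTTTTTTT", commons)
print(hit)                                        # ('AAAAAAAAAA', 'TTTTTTTTTT', 2)
print(mark_recombination("AATTTTTTTT", hit[2]))   # AA|TTTTTTTT
```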
Sample input and output
chr-haplotypes.txt      new_chr-haplotypes.txt
Indv1                   Indv1
TTTTTTTTTTTTTTT         T T T T T T T T T T
Indv1                   Indv1
AAAAAAAAATTTTTT         A A A A A A A A A T*
Indv2                   Indv2
AATTTTTTTTTTTTT         A A|T T T T T T T T
Indv2                   Indv2
TTTTTTATTTTTTTT         T T T T T T A*T T T
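The output rows above appear to be space-separated alleles in which a “*” or “|” takes the place of the separator. The sketch below reproduces that layout under this assumption; the function `annotate` and its parameters are hypothetical and inferred only from the four sample rows.

```python
def annotate(region, point_pos=None, breakpoint=None):
    """Format a region as space-separated alleles, replacing the separator
    with '*' after a mutated base or '|' at a recombination border."""
    out = []
    last = len(region) - 1
    for i, allele in enumerate(region):
        out.append(allele)
        if point_pos == i:
            out.append("*")
        elif i == last:
            break
        elif breakpoint == i + 1:
            out.append("|")
        else:
            out.append(" ")
    return "".join(out)


print(annotate("TTTTTTTTTT"))                 # T T T T T T T T T T
print(annotate("AAAAAAAAAT", point_pos=9))    # A A A A A A A A A T*
print(annotate("AATTTTTTTT", breakpoint=2))   # A A|T T T T T T T T
print(annotate("TTTTTTATTT", point_pos=6))    # T T T T T T A*T T T
```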
Expanding to the Whole Chromosome
• Now that we can look for variations in one region of a chromosome, we can extend the technique to the whole chromosome
• We used a technique of overlapping windows
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
|AAAAAAAAAA|
     |AAAAAAAAAA|
          |AAAAAAAAAA|
               |AAAAAAAAAA|
                    |AAAAAAAAAA|
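The window layout can be sketched with a simple generator. The window size of 10 and step of 5 are taken from the slides' examples; the function name is hypothetical, and how the original code handled a trailing partial window is not stated, so this sketch simply drops it.

```python
def overlapping_windows(length, size=10, step=5):
    """Yield (start, end) index pairs for windows of `size` SNPs that
    advance by `step`; a trailing partial window is dropped."""
    for start in range(0, length - size + 1, step):
        yield start, start + size


windows = list(overlapping_windows(30))
print(windows)  # [(0, 10), (5, 15), (10, 20), (15, 25), (20, 30)]
```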
Overlapping Windows
[Figure: the workflow diagram (Individual’s Haplotypes, Frequency of Variation, Identify Events) repeated for the sliding-window step, with AAAAAAAAAT* highlighted]
Overlapping
• Recombination events that looked like point mutations
Common: AAAAAAAAAAAAAAA, TTTTTTTTTTTTTTT
Rare: AAAAAAAAATTTTTT
First 10:
  Common: AAAAAAAAAA
  Rare: AAAAAAAAAT*
Slide over 5 and next 10:
  Common: AAAAAAAAAA, TTTTTTTTTT
  Rare: AAAA|TTTTTT
Whole haplotype: AAAAAAAAA|T*TTTTT, resolved to AAAAAAAAA|TTTTTT
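The ambiguity can be reproduced with a small per-window classifier. This is a sketch under two assumptions: a one-site difference is tested before a recombination, and both common haplotypes are available in every window. The function name is hypothetical, and the rule used to reconcile the two windows is not given on the slide, so it is not implemented here.

```python
def annotate_window(rare, commons):
    """Annotate one window: '*' after a one-site difference from a common
    haplotype, else '|' at a junction of two common haplotypes."""
    for common in commons:
        diffs = [i for i, (r, c) in enumerate(zip(rare, common)) if r != c]
        if len(diffs) == 1:
            i = diffs[0]
            return rare[:i + 1] + "*" + rare[i + 1:]
    for left in commons:
        for right in commons:
            for bp in range(1, len(rare)):
                if left != right and left[:bp] + right[bp:] == rare:
                    return rare[:bp] + "|" + rare[bp:]
    return rare


commons = ["A" * 15, "T" * 15]
rare = "AAAAAAAAATTTTTT"
results = {}
for start in (0, 5):
    window = slice(start, start + 10)
    results[start] = annotate_window(rare[window], [c[window] for c in commons])
    print(start, results[start])
```

The first window reports AAAAAAAAAT* and the second AAAA|TTTTTT, which is why the overlap is needed to recognize the event as a recombination.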
Applying to a Population’s Chromosome
• We now have a technique to look for new variations in a whole chromosome
• We can apply it to a population and identify regions where recent genetic events took place
Identified Recent Genetic Events
In chromosome 19:
• Unique point mutations = 13723
• Unique recombination events = 4065
• Total unique events = 15697
• Total point mutations = 46072
• Total recombination events = 11381
• Total number of events = 57453
• Average point mutations per individual = 383
• Average recombination events per individual = 94
• Average events per individual = 478
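As a quick arithmetic check, the reported averages match each total divided by 120 and rounded down. The divisor is an inference from the 120 haplotypes used in the earlier frequency example (60 individuals with two haplotypes each); the slide itself does not state it.

```python
HAPLOTYPES = 120  # assumption: 60 individuals x 2 haplotypes each

totals = {"point mutations": 46072, "recombination events": 11381}
totals["events"] = sum(totals.values())

# Integer division reproduces the rounded-down averages on the slide.
averages = {name: count // HAPLOTYPES for name, count in totals.items()}
for name, value in averages.items():
    print(f"Average {name} = {value}")
```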
[Figure: Point Mutations. Axes: Number of Events, SNP Position in the Haplotype]
[Figure: Recombination Events. Axes: Number of Events, SNP Position in the Haplotype, Haplotype]
[Figure: Point Mutations and Recombination Events. Axes: Number of Events, SNP Position in the Haplotype, Haplotype]
Conclusion
• We have developed an algorithm for identifying recent genetic events in an individual
• More point mutations were identified than recombination events
• Some regions of the genome contain many recent genetic events, while others contain few
Future Work
• Run the algorithm over the whole genome
• Extend the algorithm to multiple populations
• Identify recent events that are unique to a population vs. ones that are shared
• Identify genetic relations between common haplotypes
• Create a chronological order of recent events in an individual
• Adapt the algorithm for high-throughput sequencing data
UCLA ZarLab
• Dr. Eleazar Eskin
• All the lab people
SoCalBSI
• Dr. Jamil Momand
• Dr. Sandra Sharp
• Dr. Nancy Warter-Perez
• Dr. Wendie Johnston
• Dr. Beverly Krilowicz
• Dr. Silvia Heubach
• Dr. Jennifer Faust
• Ronnie Cheng
Funded By:
• SoCalBSI 2009 Interns
Determining ancestors
• The other ancestors are determined through SNP differences of 2 or more
My Project
• Red line
  • Point mutation
• Blue line
  • Ancestor to common relationship
• Black dashed line
  • Haplotype resulting from a crossover (recombination) event
Graph
• The graph is generated with Graphviz, a graph visualization program
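A graph with this legend could be described in Graphviz's DOT language along the following lines. This is a sketch only: the function `to_dot`, the edge directions, and the example haplotypes are assumptions, not the project's actual graph.

```python
def to_dot(point_mutations, ancestor_links, recombinations):
    """Build DOT text: red edges for point mutations, blue edges for
    ancestor-to-common links, black dashed edges for recombinations."""
    lines = ["digraph haplotypes {"]
    for common, rare in point_mutations:
        lines.append(f'  "{common}" -> "{rare}" [color=red];')
    for ancestor, common in ancestor_links:
        lines.append(f'  "{ancestor}" -> "{common}" [color=blue];')
    for left, right, rare in recombinations:
        for parent in (left, right):
            lines.append(f'  "{parent}" -> "{rare}" [color=black, style=dashed];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    point_mutations=[("TTTTTTTTTT", "TTTTTTATTT")],
    ancestor_links=[],  # no ancestor data in the slide example
    recombinations=[("AAAAAAAAAA", "TTTTTTTTTT", "AATTTTTTTT")],
)
print(dot)
```

Saving the text to a file such as `graph.dot` and running `dot -Tpng graph.dot -o graph.png` renders it with the Graphviz command-line tool.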