Personal Genomics & Watson’s Genome Scott Bray, Jaimie Barkley, Rachel Blumhagen, Kristy Theodorson
• First human genome sequenced using NEXTGEN technology • Identified novel genes, SNPs, CNVs and indel polymorphisms • Results consistent with those from the traditional methods used to sequence Venter’s genome • Pilot project for personalized genome sequencing
NEXTGEN Pros • Less time: two months • Less expensive: approximately 1/100 of the cost of traditional capillary electrophoresis • More efficient: DNA is amplified in a cell-free system, which avoids the loss of genomic sequence that occurs with bacterial cloning
Quicker, smaller, cheaper M. Wadman Nature 452, 788 (2008).
How’d they do it? • Genomic DNA extracted from white blood cells • Nebulization • 454 pyrosequencing • 234 runs @ 105 Mb per run • Assemble?
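As a rough check on the run figures above, the total raw yield and an approximate fold coverage can be worked out directly. This is only a back-of-the-envelope sketch: the haploid genome size of 3.1 Gb is an assumption that does not come from the slides, and the function names are made up for illustration.

```python
# Back-of-the-envelope yield and coverage from the run figures on the slide.
RUNS = 234              # from the slide
MB_PER_RUN = 105        # from the slide
GENOME_SIZE_GB = 3.1    # ASSUMPTION: approximate haploid human genome size


def raw_yield_gb(runs, mb_per_run):
    """Total sequenced bases in Gb, before any filtering or alignment."""
    return runs * mb_per_run / 1000.0


def fold_coverage(total_gb, genome_gb):
    """Average number of times each base is covered."""
    return total_gb / genome_gb


total_gb = raw_yield_gb(RUNS, MB_PER_RUN)
raw_cov = fold_coverage(total_gb, GENOME_SIZE_GB)
print(f"raw yield: {total_gb:.2f} Gb, raw coverage: {raw_cov:.1f}X")
```

Under this assumed genome size the raw figure comes out a little above the 7.4X reported on the following slides, which is what one would expect if some reads fail to align to the reference.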
No Assembly required!! (ok, a little) • “Reference” sequence (Build 36) to align reads • official reference genome assembly • includes both WGS and BAC sequence data assemblies • additional genomic sequences incorporated
Reads were aligned to a reference sequence with 7.4X coverage • Unmapped reads (1.5 million) were WGS assembled
7.4X Coverage • X chromosome • Why would they have lower coverage on the X chromosome?
Single Nucleotide Polymorphisms • 14 million initially found • After filtering: 3.3 million • After further filtering and alignment: 2.7 million matched “known” SNPs from dbSNP and 0.61 million were deemed “novel” • 10,425 did not match dbSNP (unlikely to be a third allele or an error in dbSNP; 0.38% false discovery rate) • “Known” SNPs were 50% homozygous and 50% heterozygous, but “novel” SNPs were mostly heterozygous. Why? Does this result support the hypothesis that the SNPs are “novel”?
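The SNP counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers from the slide; the variable names are illustrative, and because 2.7 million is itself rounded, the computed rate only approximates the quoted 0.38%.

```python
# Cross-check of the SNP funnel using the rounded counts from the slide.
known_snps = 2_700_000     # matched dbSNP ("known")
novel_snps = 610_000       # deemed "novel"
allele_mismatch = 10_425   # known SNPs whose allele did not match dbSNP

total_snps = known_snps + novel_snps             # should be close to 3.3 million
novel_fraction = novel_snps / total_snps         # share of calls that are novel
false_discovery = allele_mismatch / known_snps   # mismatches taken as false positives

print(f"total SNPs: {total_snps:,}")
print(f"novel fraction: {novel_fraction:.1%}")
print(f"estimated false discovery rate: {false_discovery:.2%}")
```

The known and novel counts add up to roughly the 3.3 million figure, and the mismatch rate lands close to the false discovery rate stated on the slide.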
Traditional Sequencing: Venter’s Genome • 7.5-fold coverage, using WGSA method with Sanger sequencing • Similar “novel” SNP results between NEXTGEN and traditional sequencing
Verification of SNP Identification • “Known” SNPs identified by sequencing were compared with experimental genotyping of the subject’s DNA using a microarray • Microarray of reference sequence hybridized to Watson’s DNA • 494,713 markers successfully genotyped • Watson’s DNA sequence showed high agreement for homozygous reference and homozygous variant genotypes, but relatively low agreement for heterozygous genotypes. Why?
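The comparison described above amounts to computing agreement separately for each microarray genotype class. The following is a minimal sketch of that bookkeeping; the marker names, genotype labels and calls are entirely hypothetical and are not data from the study.

```python
from collections import defaultdict


def agreement_by_class(array_calls, seq_calls):
    """Fraction of markers where the sequencing call equals the array call,
    grouped by the array genotype class."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for marker, array_gt in array_calls.items():
        if marker not in seq_calls:
            continue  # marker not covered by sequencing
        totals[array_gt] += 1
        if seq_calls[marker] == array_gt:
            matches[array_gt] += 1
    return {cls: matches[cls] / totals[cls] for cls in totals}


# HYPOTHETICAL toy data: six markers, three genotype classes.
array_calls = {"m1": "hom_ref", "m2": "het", "m3": "het",
               "m4": "hom_var", "m5": "het", "m6": "hom_ref"}
seq_calls = {"m1": "hom_ref", "m2": "het", "m3": "hom_ref",
             "m4": "hom_var", "m5": "hom_var", "m6": "hom_ref"}

result = agreement_by_class(array_calls, seq_calls)
print(result)
```

In the toy data the heterozygous markers are the ones miscalled as homozygous, which is the pattern the slide asks the audience to explain.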
Accuracy of SNP Identification • 13-fold coverage required to detect 99% of all heterozygous SNPs • Coverage is key
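Why coverage matters so much for heterozygous sites can be illustrated with a toy sampling model. The sketch below is not the method used in the study: it assumes read depth is Poisson distributed, that each read comes from either allele with equal probability, and that a heterozygous call needs at least two reads of each allele (an arbitrary threshold chosen for illustration).

```python
import math


def poisson_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def p_detect_het(coverage, min_reads_per_allele=2):
    """Probability that both alleles of a heterozygous site are each seen
    at least `min_reads_per_allele` times at the given mean coverage.
    ASSUMPTION: each allele's read count is an independent Poisson variable
    with mean coverage / 2."""
    per_allele = coverage / 2.0
    return poisson_at_least(min_reads_per_allele, per_allele) ** 2


for cov in (7.4, 13.0, 20.0):
    print(f"{cov:>5.1f}X -> P(detect het) = {p_detect_het(cov):.3f}")
```

Under these assumptions the detection probability is about 0.78 at 7.4X and about 0.98 at 13X. The model will not reproduce the 99% figure exactly, but it shows the same trend: heterozygous sites are the first to be missed when coverage is low.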
Insertions-Deletions (Indels) • Identified 222,718 • Size range of 2-38,896 bp • Why do they not have data on the length of insertions? • Deletion frequency decreases as deletion size increases
Do the indels cause a frame shift? • 345 indels found in coding regions • Primers were designed for 111 of them, followed by Sanger sequencing • 78 indels validated • 66 of them had lengths that were multiples of 3 (no frame shift) • 65 were found as heterozygotes
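The multiple-of-3 rule above follows from the three-base codon: an insertion or deletion shifts the reading frame unless its length is divisible by 3. A minimal sketch of that check follows; the list of indel lengths is hypothetical and is not data from the study.

```python
def causes_frameshift(indel_length_bp):
    """True if an indel of this length inside coding sequence shifts the
    reading frame, i.e. its length is not a multiple of the codon size."""
    if indel_length_bp <= 0:
        raise ValueError("indel length must be a positive number of bases")
    return indel_length_bp % 3 != 0


# HYPOTHETICAL indel lengths in bp, for illustration only.
example_lengths = [3, 4, 6, 1, 9, 12, 2]
in_frame = [n for n in example_lengths if not causes_frameshift(n)]
frameshift = [n for n in example_lengths if causes_frameshift(n)]

print("in frame:", in_frame)
print("frame shift:", frameshift)
```

By this rule a 4-base deletion, like the one reported in SGEF on the next slide, is a frame-shifting change.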
Interesting Find… • They found a homozygous 4-base deletion in exon 11 of Watson’s SGEF gene • SGEF is highly conserved in vertebrates • Guanine nucleotide exchange factor thought to regulate membrane dynamics in promotion of vesicle formation What does this suggest???
Copy Number Variations • CNVs: local gains or losses of regions in the genome because of duplication or deletion • associated with genetic disease • detectable by variation in the average DNA sequence coverage of the region • Comparative genomic hybridization (CGH) used • Examine relative fluorescence intensity in wells • Microarray revealed 23 CNV regions
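The idea that CNVs show up as variation in average sequence coverage can be sketched as a simple windowed depth comparison. This illustrates the coverage-based idea only, not the CGH microarray analysis mentioned on the slide; the window depths and the log2 thresholds are hypothetical values chosen for the example.

```python
import math
from statistics import median


def log2_ratios(window_depths):
    """log2 of each window's depth relative to the median depth."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [math.log2(d / baseline) if d > 0 else float("-inf")
            for d in window_depths]


def call_cnv(window_depths, gain=0.4, loss=-0.6):
    """Label each window 'gain', 'loss' or 'normal'.
    ASSUMPTION: the thresholds are illustrative, not taken from the study."""
    calls = []
    for ratio in log2_ratios(window_depths):
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


# HYPOTHETICAL mean read depth in ten consecutive genomic windows.
depths = [7, 8, 7, 11, 12, 7, 3, 4, 8, 7]
calls = call_cnv(depths)
print(list(zip(depths, calls)))
```

With low average coverage, per-window depth is noisy, so a real analysis would need larger windows or a statistical model rather than fixed cut-offs.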
CNVs • CNVs are polymorphic: - segregate as alleles with varying frequency, - depends on the reference genome • None of the CNV regions has been linked to any known phenotype. • However, 34 genes are predicted to be affected. These genes include: two olfactory receptor groups, several with possible roles in prostate, breast, and colon cancer, a gene from the HLA-D locus, and two proteins involved in RNA editing.
Experimental Conclusions • 3.3 million SNPs identified • 8,996 were non-synonymous ‘known’ SNPs • 1,573 were ‘novel’ • Of the non-synonymous known SNPs, 342 alleles matched mutations found in the Human Gene Mutation Database (HGMD) • 32 disease-causing
Experimental Conclusions • 10 out of 12 alleles are highly penetrant, Mendelian recessive disease-causing alleles • 7 out of 10 were heterozygous; the other three exhibited only one allele • Subject does not have the diseases.
• 1.5 million unaligned reads • 65% matched known repeats • 110K contigs, 29 Mb of sequence • 33 cDNAs with no map location • Protein prediction: 60 significant matches to 49 proteins
Criticism • “It’s a new standard of sequencing technology,” says Venter. “But I don’t think it’s a new standard of genome coverage and independent assembly.” • Good if reference seq. is available, if not? • Dealing with repeats with small reads (no mate pairs, can coverage compensate?) • Still haven't learned to read “the book of life”
Personal Genomics • “My Genome, My Self” - Steven Pinker • Jan. 11, 2009 • Personal Genome Project (PGP-10) • Publicly available for association studies • Personal genomics is important for understanding the associations between human genetic variation, physiology and disease risk
Pros • Personalized medicine, customized to patient’s biochemistry • Better genetic testing for screening and prevention in at-risk patients • Creation of a dataset that can be referenced for association studies • Useful for evolution studies Cons • “Genes of Doom” • Insurance and employment discrimination (GINA) • Direct-to-consumer testing (bypasses health professionals to test for breast cancer alleles or even mutations linked to cystic fibrosis) • Genetic determinism
“Genetic Determinism”: Examples of Single Gene Disorders • Autosomal recessive: • Cystic fibrosis (CF) • Phenylketonuria (PKU) • Sickle cell anemia • ADA deficiency, a rare immunodeficiency disorder ("bubble boy" disease) • Autosomal dominant: • Familial hypercholesterolemia • Huntington's disease • X-linked recessive: • Duchenne muscular dystrophy • Hemophilia A • X-linked dominant: • only a few, very rare disorders are classified as X-linked dominant • hypophosphatemic rickets (vitamin D-resistant rickets)
All else is in the numbers…or better yet the genes • “Geno’s Paradox”: single genes are not very informative • Traits are typically the result of many genes, each having a small effect • Correlating genes with some traits is (currently) too complex • A test for a gene can identify ONE contributor to a trait, but the observance of a trait
Pinker’s Results • FALSE Results • http://fire.biol.wwu.edu/young/470/stuff/steven_pinker_2.html • Contradictory and confusing “If you want to know whether you’re at risk for high cholesterol, have your cholesterol measured; if you want to know whether you are good at math, take a math test.” –Steven Pinker
Common Types of Genetic Testing • Newborn Screening: to identify disorders that can be treated in early stages of development • PKU is treated by a low-phenylalanine diet for the child • Diagnostic: to confirm or rule out a specific genetic or chromosomal condition, typically after symptoms are present • Carrier: to identify whether an individual carries a copy of a mutated gene, typically done for prospective parents • Predictive: presymptomatic, to assess the probability of having a genetic disorder that may appear later in life
Nature versus Nurture… versus Chance? • Environment and life experience • Stochastic events (chance) • e.g. identical twins • same genetic makeup, same environment • Behavioral Genetics: …WHO AM I? • Personality traits • Behavioral traits • Decision-making traits
Genetic Information Nondiscrimination Act (2008) • Prohibits insurers from refusing coverage of a healthy individual or charging that person higher premiums based on their genetic predisposition to developing a disease • Prohibits employers from using genetic information to discriminate against individuals in hiring, firing, job placement, etc. • “[GINA] is necessary to ensure that biomedical research continues to advance… such legislation is necessary so that patients are comfortable availing themselves to genetic diagnostic tests.” - NHGRI
Ethics • One copy of the APOE E4 variant triples the risk of developing Alzheimer’s • Should your genome be public or private? • Should genetic counseling be required? • Third party complications • Ex: Pinker found he carries a gene for familial dysautonomia and knew to have his nieces and nephews tested
Conclusions • Sequencing and interpretation of personal genomes will become more accurate as more individuals are sequenced • Pro-active approach to ethical issues • NEXTGEN Sequencing: $100,000 genome • http://www.knome.com/home/ • NEXT-NEXTGEN Sequencing: $1,000 genome • In 2004, NHGRI awarded $38 million in grants
References • Ellerbroek et al. 2004. SGEF, a RhoG guanine nucleotide exchange factor that stimulates macropinocytosis. Mol Biol Cell 15(7): 3309-19. • Pinker, S. 2009. My Genome, My Self. NYT, Jan 2009, pp 23-31. • Levy, S. et al. 2007. The diploid genome sequence of an individual human. PLoS Biol 5: e254. • Wheeler et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452: 872-877. • Olson, M. 2008. Dr. Watson’s base pairs. Nature 452: 819-820. • Wadman, M. 2008. James Watson’s genome sequenced at high speed. Nature 452: 788.