
Chapter 6: Structural Variation and Medical Genomics


Presentation Transcript


  1. Chapter 6: Structural Variation and Medical Genomics CS-6293 Bioinformatics Instructor: Dr. Jianhua Ruan Presented by: Nesthor Perez

  2. Outline

  3. Outline

  4. 1. Introduction • Every human has a different genome. • Each genome carries its own predisposition to disease traits. • Genome-wide association studies (GWAS) have identified common germline DNA variants associated with diabetes, heart disease, and other diseases. • GWAS explain only a fraction of the heritability of these traits.

  5. 1. Introduction Every person has a different genome sequence, and that genome influences the person's susceptibility to each disease.

  6. 1. Introduction • Cancer genome sequencing studies have identified somatic mutations associated with cancer progression. • These mutations are very heterogeneous. • Few mutations are common between patients. • This makes it hard to associate mutations with the causes of cancer. • Comprehensive studies must consider all variants, so an individual genome is required for each case.

  7. 1. Introduction • GWAS focus on single nucleotide polymorphisms (SNPs); every human genome is unique. • Germline variants occur across a range of scales of DNA sequence, from SNPs to structural variants (SVs). • Examples of SVs: • Duplications. • Deletions. • Inversions. • Translocations.

  8. 1. Introduction • GWAS then identified common SNPs: • Common SNPs for common diseases (similarities). • Common variants between diseases (differences). • Main purpose: disease association and cancer genetics studies. • In the last 5 years, next-generation DNA sequencing technology has become commercially available from companies such as: • Illumina • Life Technologies • Complete Genomics

  9. 1. Introduction Chromosome components:

  10. 1. Introduction Differences from a reference genome range from SNPs to structural variants:

  11. 1. Introduction In the last 5 years, these companies have developed sequencing technology; consequently, the cost of DNA sequencing has decreased.

  12. 1. Introduction • Consequently, the cost of DNA sequencing has decreased. • With low-cost sequencing, the study of all variants is possible. • All variants: • Germline. • Somatic. • SNPs (single nucleotide polymorphisms). • SVs (structural variants). • This chapter discusses these sequencing technologies, with a focus on structural variants (SVs).

  13. Outline

  14. Outline

  15. 2.1 Germline Structural Variation • A major goal of human genetics: identify the DNA sequence differences that make each genome unique. • Approaches: • Identifying common SNPs (HapMap project). • Whole-genome sequencing and microarray measurements found common SVs of these types: • Duplications • Deletions • Inversions • Common SVs have since been linked to: • Autism • Schizophrenia

  16. 2.1 Germline Structural Variation Goal of human genetics: identify the DNA sequence differences that make each genome unique. Steps: (1) Identify common SNPs. (2) Whole-genome sequencing and microarray measurements of large DNA sequences found common SVs: - Duplications - Deletions - Inversions

  17. 2.2 Somatic Structural Variation • Cancer is driven by somatic mutations accumulated over a lifetime: a "micro-evolutionary process". • Early studies focused on leukemia and lymphoma. • They identified "recurrent chromosomal rearrangements". • These rearrangements are present in many patients with the same cancer. • Next-generation DNA sequencing reconstructs how cancer genomes are organized at single-nucleotide resolution.

  18. 2.3 Mechanisms of Structural Variation • Based on the amount of sequence similarity (homology) at the breakpoints of SVs, there are two mechanisms: • NHEJ (non-homologous end joining): • Little or no sequence similarity. • NAHR (non-allelic homologous recombination): • High sequence similarity.
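
The homology criterion on this slide can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not a published classifier: the function names `identity` and `classify_mechanism`, the position-by-position comparison, and the thresholds `min_identity=0.9` and `min_length=20` are all hypothetical choices made for illustration. Real analyses would align the breakpoint flanks rather than compare them position by position.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter of the two sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def classify_mechanism(flank_a: str, flank_b: str,
                       min_identity: float = 0.9,
                       min_length: int = 20) -> str:
    """Label a pair of breakpoint flanks as 'NAHR' or 'NHEJ'.

    Assumption: long, highly similar flanks suggest NAHR; anything else is
    labeled NHEJ. Both thresholds are illustrative, not established cutoffs.
    """
    long_enough = min(len(flank_a), len(flank_b)) >= min_length
    if long_enough and identity(flank_a, flank_b) >= min_identity:
        return "NAHR"
    return "NHEJ"


# Two copies of the same 40 bp repeat flank the breakpoints: high similarity.
repeat = "ACGTTGCAAGGCTTACGGAT" * 2
print(classify_mechanism(repeat, repeat))

# Unrelated flanks: little or no similarity.
print(classify_mechanism("ACGTACGTACGTACGTACGTACGT", "T" * 24))
```

The two-way split mirrors the slide; a flank pair that fails either threshold falls into the NHEJ bucket by default, which is itself an assumption of this sketch.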

  19. 2.3 Mechanisms of Structural Variation Cytogenetic Techniques: Chromosome Painting:

  20. 2.3 Mechanisms of Structural Variation Cytogenetic Techniques:

  21. 2.3 Mechanisms of Structural Variation Cytogenetic Techniques: Fluorescent in Situ Hybridization (FISH):

  22. (FISH)

  23. Outline

  24. Outline

  25. 3. Technologies for Measurement of Structural Variation • SV features are based on: • Size. • Complexity. • Range: from hundreds of nucleotides to large-scale chromosomal rearrangements. • Cytogenetic techniques: • Chromosome painting. • Spectral karyotyping (SKY). • Fluorescent in situ hybridization (FISH).

  26. 3. Technologies for Measurement of Structural Variation • Large SVs can be observed on chromosomes:

  27. 3.1 Microarrays • This technology was used for the first genome-wide survey in 2004. • The technique applies the concept of array comparative genomic hybridization (aCGH). • The test and reference genomes are labeled with different fluorescent dyes, and the ratio of the two signals at each probe estimates copy number. • Hundreds of thousands of probes are now available. • Since individual copy number ratios are subject to experimental error, computational techniques are required to analyze aCGH data.
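
To make the last bullet concrete, here is a minimal Python sketch of one way noisy per-probe ratios might be cleaned up before calling gains and losses. It is not the method used by any particular aCGH package: the moving-median smoother, the window size, the `gain`/`loss` thresholds of +/-0.3, and the function names are all assumptions chosen for illustration, and the intensity values are made up.

```python
import math
from statistics import median


def log2_ratios(test, reference):
    """Per-probe log2(test / reference) intensity ratio."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def moving_median(values, window=3):
    """Smooth a series with a centered moving median (window shrinks at the edges)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def call_copy_number(smoothed, gain=0.3, loss=-0.3):
    """Label each probe as 'gain', 'loss', or 'neutral' (thresholds are illustrative)."""
    calls = []
    for value in smoothed:
        if value >= gain:
            calls.append("gain")
        elif value <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical intensities: one noisy probe (index 2) and a three-probe deletion.
test_signal = [100, 100, 400, 50, 50, 50, 100, 100]
ref_signal = [100] * 8

ratios = log2_ratios(test_signal, ref_signal)
calls = call_copy_number(moving_median(ratios))
print(calls)
```

The single outlier probe is smoothed away because its neighbors disagree with it, while the run of three consistent low ratios survives as a loss. Dedicated segmentation methods do this more rigorously than a fixed-window median.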

  28. 3.1 Microarrays

  29. 3.1 Microarrays • aCGH can measure both germline SVs in normal genomes and somatic SVs in cancer genomes. • aCGH was initially developed for cancer genomics applications. • aCGH is now also used to detect copy number variants in large numbers of genomes at low cost. • aCGH limitations: • Detects only copy number variants. • Requires that genomic probes from the reference genome lie in non-repetitive regions.

  30. 3.2 Next-generation DNA Sequencing Technologies • As DNA sequencing technology has grown more sophisticated, the cost of DNA analysis has dropped substantially. • One limitation is the length of the DNA molecule that can be sequenced. • Short reads range from 30 to 1000 nucleotides, or base pairs (bp).

  31. 3.2 Next-generation DNA Sequencing Technologies • Some DNA sequencing technologies use a paired-end sequencing protocol to increase the effective read length. • In earlier Sanger sequencing protocols, the DNA fragment size depended on the cloning vector. • In next-generation technologies, several techniques have been used to generate paired reads. • Current techniques produce paired reads from fragments ranging from a few hundred bp to 2-3 kb.

  32. 3.2 Next-generation DNA Sequencing Technologies • Next-generation sequencing technologies have limited read lengths and limited insert sizes in comparison to Sanger sequencing. • Two approaches to detect SVs using next-generation DNA sequencing: • De novo assembly: • Sophisticated algorithms reconstruct genome sequences from overlaps between reads. • The resulting human genome assemblies are highly fragmented.
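
The idea of reconstructing a sequence from overlaps between reads can be sketched in a few lines of Python. This is a toy greedy merger written for illustration only: the function names, the `min_len` parameter, and the example reads are hypothetical, and production assemblers rely on overlap graphs or de Bruijn graphs rather than this quadratic loop.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs; more than one contig means the
    assembly is fragmented.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left, so the contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


print(greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"]))
print(greedy_assemble(["AAAA", "CCCC"]))
```

The second call shows fragmentation in miniature: reads that share no overlap stay as separate contigs, which is what happens at scale when short reads cannot span repeats.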

  33. 3.2 Next-generation DNA Sequencing Technologies • Two approaches to detect SVs using next-generation DNA sequencing: • Resequencing: • Differences are found between an individual genome and a closely related reference genome. • These differences are inferred from differences between the aligned reads and the reference sequence.
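
As a rough illustration of the resequencing idea, the Python sketch below compares already-aligned reads with a reference and keeps only the differences seen in more than one read. Everything here is an assumption for illustration: the function names, the `min_support` threshold, the gap-free alignments given as `(position, read)` pairs, and the toy sequences. It handles single-base differences only, which is far simpler than SV detection.

```python
from collections import Counter


def read_differences(reference: str, read: str, pos: int):
    """List (position, reference_base, read_base) for each mismatch.

    Assumes the read aligns without gaps, starting at 0-based `pos`.
    """
    return [(pos + i, reference[pos + i], base)
            for i, base in enumerate(read)
            if reference[pos + i] != base]


def candidate_variants(reference: str, alignments, min_support: int = 2):
    """Keep differences supported by at least `min_support` reads.

    Requiring support from several reads is a simple guard against
    sequencing errors; the threshold of 2 is illustrative.
    """
    counts = Counter()
    for pos, read in alignments:
        counts.update(read_differences(reference, read, pos))
    return sorted(diff for diff, n in counts.items() if n >= min_support)


reference = "ACGTACGTAC"
alignments = [(2, "GTTC"), (3, "TTCG"), (0, "ACGA")]
print(candidate_variants(reference, alignments))
```

Two reads agree on a change at position 4, so it is reported; the lone mismatch at position 3 is treated as a likely sequencing error and dropped.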

  34. 3.2 Next-generation DNA Sequencing Technologies Advantages: from earlier DNA sequencing generations to new sequencing technology: Disadvantages: limited length of the DNA molecule that can be sequenced. Today's technologies produce short sequences of DNA. Range: 30 to 1000 nucleotides. To increase read length, these DNA sequencing technologies use paired-end or mate-pair protocols.

  35. 3.2 Next-generation DNA Sequencing Technologies There are two approaches to detect SVs:

  36. 3.3 New DNA Sequencing Technologies • Previous DNA sequencing technologies face several limitations. • For example: • SV breakpoints in highly repetitive sequences. • Third-generation and single-molecule technologies offer additional advantages for SV detection: • Longer read lengths. • Easier sample preparation. • Lower input DNA requirements. • Higher throughput.

  37. 3.3 New DNA Sequencing Technologies • Expected improvements from third-generation technologies: • Paired reads: more than two reads from a single DNA fragment. • Long-range sequence information with low input DNA requirements. • Sequencing technologies continue to develop rapidly thanks to improvements in: • Chemistry. • Imaging. • Manufacturing technology.

  38. 3.3 New DNA Sequencing Technologies • Further improvements are expected in: • Read lengths. • Insert lengths. • Throughput. • A new sequencing technology is nanopore sequencing, which directly reads the nucleotides of long DNA molecules, a dramatic advance. • Nanopore sequencing generates extremely long reads (tens of kb).

  39. 3.3 New DNA Sequencing Technologies New features: Longer read lengths: Higher throughput:

  40. 3.3 New DNA Sequencing Technologies New features: Easier sample preparation

  41. 3.3 New DNA Sequencing Technologies New features: Lower input DNA requirements:

  42. 3.3 New DNA Sequencing Technologies Active development continues thanks to improvements in: Chemistry: Image processing: Data processing:

  43. Outline

  44. Outline

  45. 4. Resequencing Strategies for Structural Variation • Purpose: predict SVs from alignments of sequence reads to the reference genome. • Steps: • Alignment of reads. • Prediction of SVs from the alignments. • Resequencing is straightforward in principle, but detecting SVs in human genomes is hard. • Some types of SVs are easy to detect; others are very difficult.

  46. 4. Resequencing Strategies for Structural Variation Step 1: Alignment of reads:

  47. 4. Resequencing Strategies for Structural Variation Step 2: Prediction of SVs from alignments:

  48. 4. Resequencing Strategies for Structural Variation • Some SVs are hard to detect due to technological limitations and biological features. • Technological limitations: • Sequencing errors. • Limited read lengths. • Limited insert sizes. • Biological features of SVs: • Breakpoints enriched for repetitive sequences. • Overlapping variants with multiple states or complex architectures. • Recurrent variants at the same locus.

  49. 4. Resequencing Strategies for Structural Variation • Therefore, aligning reads and predicting SVs are not easy tasks. • Effective algorithms are required for highly sensitive and specific SV prediction. • Three approaches identify SVs from aligned reads: • Split reads. • Depth of coverage analysis. • Paired-end mapping.
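
Of the three approaches, paired-end mapping is the easiest to sketch. The Python example below classifies a single mapped read pair by comparing its mapped distance and orientation with what the library preparation would predict. The `MappedPair` record, the `classify_pair` function, the expected insert size of 300 bp, and the 100 bp tolerance are hypothetical values chosen for illustration, and the sketch assumes a library in which concordant pairs map to opposite strands of the same chromosome.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    chrom1: str
    pos1: int      # leftmost mapped coordinate of the first read
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost mapped coordinate of the second read
    strand2: str


def classify_pair(pair: MappedPair,
                  mean_insert: int = 300,
                  tolerance: int = 100) -> str:
    """Suggest the SV type implied by one read pair (illustrative rules only)."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"      # ends map to different chromosomes
    if pair.strand1 == pair.strand2:
        return "inversion"          # one end has flipped orientation
    distance = abs(pair.pos2 - pair.pos1)
    if distance > mean_insert + tolerance:
        return "deletion"           # reference has extra sequence between the ends
    if distance < mean_insert - tolerance:
        return "insertion"          # sample has extra sequence between the ends
    return "concordant"


pairs = [
    MappedPair("chr1", 1000, "+", "chr1", 1300, "-"),
    MappedPair("chr1", 1000, "+", "chr1", 5000, "-"),
    MappedPair("chr1", 1000, "+", "chr1", 1050, "-"),
    MappedPair("chr1", 1000, "+", "chr1", 1300, "+"),
    MappedPair("chr1", 1000, "+", "chr2", 1300, "-"),
]
for p in pairs:
    print(classify_pair(p))
```

A single discordant pair is weak evidence; in practice, predictions would be based on clusters of pairs that agree with each other. The sketch also ignores other signatures, such as pairs whose strands are swapped relative to the expected order.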

  50. 4.1 Read Alignment • This is one of the most researched problems in bioinformatics. • The specialized task of aligning millions to billions of individual short reads is done by software such as: • Maq. • BWA. • Bowtie/Bowtie2. • BFAST. • mrsFAST.
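
To give a feel for what short-read alignment involves, here is a toy seed-and-extend aligner in Python. It is not how the tools listed above are implemented (BWA and Bowtie, for example, are built on Burrows-Wheeler transform indexes); the k-mer dictionary, the function names, the seed length `k`, and the `max_mismatches` limit are assumptions made for this sketch, and it allows mismatches but no gaps.

```python
from collections import defaultdict


def build_index(reference: str, k: int):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index, k: int,
               max_mismatches: int = 1):
    """Seed with the read's first k-mer, then count mismatches over the full read.

    Returns (position, mismatches) for the best hit, or None if no candidate
    position is within the mismatch limit.
    """
    best = None
    for pos in index.get(read[:k], []):
        if pos + len(read) > len(reference):
            continue  # read would run off the end of the reference
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "ACGTTGCAAGGCTTACGGAT"
index = build_index(reference, k=4)
print(align_read("GGCTTACG", reference, index, k=4))
print(align_read("GGCTTAGG", reference, index, k=4))
print(align_read("TTTTTTTT", reference, index, k=4))
```

Because only the first k-mer is used as a seed, a mismatch inside the seed makes the read unalignable here; real aligners avoid this with multiple seeds or inexact index search, which is part of why the problem is so heavily studied.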
