
Functional Genomics with Next-Generation Sequencing



Presentation Transcript


  1. Functional Genomics with Next-Generation Sequencing Jen Taylor Bioinformatics Team CSIRO Plant Industry

  2. Capacity and Resolution • Next generation sequencing • Increasing capacity leads to increased resolution Eric Lander, Broad Institute CSIRO. INI Meeting July 2010 - Tutorial - Applications

  3. How a Genome Works? Parts Description • Function? • Interconnectedness? Comparisons • Population-level • Between genomes

  4. Application domains • Reference genome • No Reference Genome: Partially sequenced / UNsequenced (“PUN Genomes”)

  5. Impact of a Reference Genome [Diagram labels: Sequence Data, Assembly, Contigs, Alignment, Genome, Read Density, Characterisation]

  6. Applications of Next Generation Sequencing Profiling of Variation • Genetic variation • Transcript variation • Epigenetic variation • Metagenomic variation Discovery • Novel genomes • Novel genes • Novel transcripts • Small / long non-coding RNA Today • RNA Sequencing (RNASeq) • Coding and non-coding transcript profiling • Dynamic and context dependent • Epigenomics • Genome-wide protein-DNA interactions, DNA modifications • Heritable and reversible regulation of gene expression

  7. RNASeq • Qualitative – transcript diversity • Quantitative – transcript abundance • Impact of NGS • Observation of transcript complexity • Transcript discovery • Small / long non-coding RNA • Analytical challenges • Transcript complexity • Compositional properties

  8. RNASeq • Sample: Total RNA, PolyA RNA, Small RNA • Library Construction • Sequencing • Base calling & QC • Analysis • Reference: Mapping to Genome • PUN: Assembly to Contigs • Digital “Counts”: Reads per kilobase per million (RPKM) • Transcript structure • Secondary structure • Targets or Products
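
The RPKM measure named on this slide can be written out as a short calculation. The sketch below is only an illustration: the function name and the example numbers are hypothetical, and it assumes the usual definition (reads on a transcript, divided by transcript length in kilobases and by library size in millions of mapped reads).

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


# Hypothetical numbers: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
print(example)  # 25.0
```

Dividing by length makes long and short transcripts comparable within a sample; dividing by library size makes samples of different depth comparable.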

  9. RNASeq – Transcript Complexity • Mapping: • Reads with multiple locations • Conserved domains? • Sequencing error? • Reads spanning exons • Gapped alignments? • Sequencing error? ERANGE pipeline: Mortazavi et al., Nature Methods 5(7), July 2008

  10. RNASeq – Compositional properties Depth of Sequence • Sequence count ≈ Transcript Abundance • Majority of the data can be dominated by a small number of highly abundant transcripts • Ability to observe transcripts of smaller abundance is dependent upon sequence depth
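
The dependence on sequence depth can be made concrete with a back-of-the-envelope calculation. This is a simplified sketch, not part of the original slides: it assumes each read is drawn independently and that a transcript contributes a fixed fraction of the library, so the chance of seeing it at least once is 1 - (1 - fraction)^depth. The function name and the fraction used are hypothetical.

```python
def detection_probability(fraction, depth):
    """Probability of sampling at least one read from a transcript that
    makes up `fraction` of the library, given `depth` independent reads."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return 1.0 - (1.0 - fraction) ** depth


# A hypothetical rare transcript contributing one read in a million.
for depth in (10_000, 100_000, 1_000_000, 10_000_000):
    p = detection_probability(1e-6, depth)
    print(f"{depth:>10,d} reads: P(observed) = {p:.3f}")
```

Under these assumptions the rare transcript is almost never seen in a shallow run and almost always seen in a deep one, which is the point of the last bullet.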

  11. RNASeq – Compositional properties Composition • Sequence counts are a composition of a fixed number of total sequence reads • Therefore they are sum-constrained and not independent • Large variations in component numbers and sizes can produce artefacts [Figure panels: True, Reads, RPKM]
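
A toy calculation shows how the sum constraint produces artefacts. Everything in this sketch is hypothetical (gene names, abundances, read budget), and it assumes all transcripts have equal length so that expected reads are simply proportional to abundance.

```python
def expected_counts(abundances, total_reads):
    """Share a fixed read budget among transcripts in proportion to abundance."""
    total = sum(abundances.values())
    return {name: total_reads * a / total for name, a in abundances.items()}


# Hypothetical true abundances: only gene D changes between conditions.
condition_1 = {"A": 100, "B": 100, "C": 100, "D": 100}
condition_2 = {"A": 100, "B": 100, "C": 100, "D": 700}

reads_1 = expected_counts(condition_1, total_reads=1_000_000)
reads_2 = expected_counts(condition_2, total_reads=1_000_000)

for gene in condition_1:
    print(gene, reads_1[gene], reads_2[gene])
```

Genes A, B and C are unchanged in truth, yet their expected counts fall from 250,000 to 100,000 because gene D takes a larger share of the same one million reads.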

  12. RNASeq - Correspondence • Good correspondence with: • Expression Arrays • Tiling Arrays • qRT-PCR • Range of up to 5 orders of magnitude • Better detection of low abundance transcripts • Greater power to detect • Transcript sequence polymorphism • Novel trans-splicing • Paralogous genes • Individual cell type expression

  13. Reference Genome - RNASeq

  14. Reference Genome - RNASeq • Human Exome • Number of exons targeted: ~180,000 (CCDS database) • plus 700+ miRNA (Sanger v13) • 300+ ncRNA

  15. Epigenome • Protein-DNA interactions [ChIPSeq] • Nucleosome positioning • Histone modification • Transcription factor interactions • Methylation [MethylSeq] • Impact of NextGen • Whole genome profiling • Resolution • Analytical challenges • Systematic bias • Unambiguous mapping • Robust event calling Image: ClearScience

  16. ChIPSeq MNase Linker Digest Remove Nucleosomes Sequence & Align

  17. ChIPSeq Sequence & Align MNase Digest Remove Nucleosomes

  18. ChIPSeq methods Pepke et al., 2009

  19. MethylSeq using Bisulfite conversion • Cytosine → (Bisulfite conversion) → Uracil → (PCR) → Thymine • 5-methylcytosine → (Bisulfite conversion) → 5-methylcytosine → (PCR) → Cytosine

  20. Limited publications from BS-Seq • Mammals • Methylation predominantly occurs at CpG sites • Several publications in human • One publication in mouse • Plants • Methylation occurs at CG, CHH, CHG sites • Two publications in Arabidopsis H = A, C or T
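
The three plant methylation contexts can be told apart from the two bases that follow a cytosine. The sketch below is illustrative only: the function name is hypothetical, it looks at the forward strand alone, and it uses the IUPAC code H, meaning A, C or T (any base except G).

```python
def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index i on the forward strand.

    Returns 'CG', 'CHG' or 'CHH', or None when the downstream bases are
    missing or ambiguous (for example at the end of the sequence).
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in "ACT":
        return None
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in "ACT":
        return "CHH"
    return None


for seq, i in (("CGA", 0), ("CAG", 0), ("CTT", 0), ("CCG", 0), ("CCG", 1)):
    print(seq, i, cytosine_context(seq, i))
```

Cytosines on the reverse strand would be classified the same way after taking the reverse complement.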

  21. Problems of mapping BS-seq reads • Reduced sequence complexity Watson >> A Cm G T T C T C C A G T C >> Bisulfite conversion >> A Cm G T T T T T T A G T T >> >> A C G T T T T T T A G T T >> Cm: methylated C; C: un-methylated C
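
The conversion shown on this slide is easy to simulate. The following is a minimal sketch with a hypothetical function name; it assumes complete conversion (every unmethylated C becomes T) and takes the methylated positions as a set of 0-based indices.

```python
def bisulfite_convert(strand, methylated):
    """Turn every unmethylated C into T; methylated Cs stay as C.

    `methylated` holds the 0-based indices of 5-methylcytosines.
    """
    converted = []
    for i, base in enumerate(strand.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


watson = "ACGTTCTCCAGTC"  # Watson strand from the slide; index 1 is Cm
converted = bisulfite_convert(watson, methylated={1})
print(converted)  # ACGTTTTTTAGTT
print(watson.count("C"), "->", converted.count("C"))  # 5 -> 1
```

After conversion the read is built almost entirely from three letters, which is why short converted reads match many more genomic locations than unconverted ones.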

  22. Problems of mapping BS-seq reads • Increased search space Watson >> A Cm G T T C T C C A G T C >> Crick << T G Cm A A G A G G T C A G << Bisulfite conversion BSW >> A Cm G T T T T T T A G T T >> BSC << T G Cm A A G A G G T T A G << PCR BSW >> A Cm G T T T T T T A G T T >> BSWR << T G C A A A A A A T C A A << BSC << T G Cm A A G A G G T T A G << BSCR >> A C G T T C T C C A A T C >>
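
The four strands that appear after bisulfite conversion and PCR can be derived from the Watson strand on this slide. This is a sketch under stated assumptions: helper names are hypothetical, conversion is complete, and every strand is written aligned to Watson coordinates (complemented but not reversed), the layout the slide uses.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, kept in the same left-to-right alignment."""
    return strand.translate(COMPLEMENT)


def bisulfite_convert(strand, methylated):
    """Turn every unmethylated C into T; `methylated` holds 0-based indices."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


watson = "ACGTTCTCCAGTC"
crick = complement(watson)               # aligned under Watson

bsw = bisulfite_convert(watson, {1})     # Cm at index 1 on Watson
bsc = bisulfite_convert(crick, {2})      # Cm at index 2 on Crick
bswr = complement(bsw)                   # PCR copy of converted Watson
bscr = complement(bsc)                   # PCR copy of converted Crick

for name, strand in (("BSW", bsw), ("BSWR", bswr), ("BSC", bsc), ("BSCR", bscr)):
    print(f"{name:<5}{strand}")
```

Before conversion the two strands carry the same information, so one reference suffices. Afterwards all four sequences differ, and a read may derive from any of them.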

  23. ELAND • Mapping reads to genome sequences • Mapping reads to two converted genome sequences • Cross match for reads mapping to multiple positions in converted genomes • Mapping results were combined to generate methylation information • ELAND only allows 2 mismatches. Lister et al. Cell (2008)
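
The idea of mapping against a converted genome can be illustrated with a toy example. This is not ELAND or the pipeline of Lister et al.: it is a sketch that handles only the C→T converted forward strand (the G→A converted genome would be treated analogously), uses exact matching rather than allowing mismatches, and discards reads with more than one hit. All names and the toy genome are hypothetical.

```python
def c_to_t(sequence):
    """In-silico conversion: replace every C with T."""
    return sequence.replace("C", "T")


def map_bs_read(read, genome):
    """Map a bisulfite read via C->T converted sequences.

    Returns (position, calls) for a uniquely mapped read, where calls maps
    each genomic C covered by the read to True (methylated) or False.
    Returns None when the read is unmapped or maps to several positions.
    """
    conv_read, conv_genome = c_to_t(read), c_to_t(genome)
    hits = []
    start = conv_genome.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_genome.find(conv_read, start + 1)
    if len(hits) != 1:
        return None
    pos = hits[0]
    calls = {}
    for offset, (ref_base, read_base) in enumerate(zip(genome[pos:], read)):
        if ref_base == "C":
            # A C that survived conversion was protected by methylation.
            calls[pos + offset] = read_base == "C"
    return pos, calls


genome = "GGATACGTTCTCCAGTCAAG"   # toy reference containing the slide's Watson strand
read = "ACGTTTTTTAGTT"            # converted read with one retained C
print(map_bs_read(read, genome))
```

Matching happens in the converted space, where methylation state cannot cause mismatches; the original read and genome are compared only afterwards to call each cytosine.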

  24. BSMAP • Based on a hash table seeding algorithm Xi and Li, BMC Bioinformatics (2009)
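
Hash table seeding in general can be sketched in a few lines. The code below is a generic illustration and not BSMAP's implementation: it indexes every k-mer of a C→T converted reference in a dictionary, looks up each k-mer of the converted read, and lets the seed hits vote for a read start position. The function names, the seed length and the choice to seed on converted sequence are assumptions made for this example.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read, index, k):
    """Let each seed hit vote for the read start position it implies."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            votes[hit - offset] += 1
    return dict(votes)


k = 4
reference = "GGATACGTTCTCCAGTCAAG".replace("C", "T")  # converted toy reference
read = "ACGTTTTTTAGTT".replace("C", "T")              # converted toy read

index = build_seed_index(reference, k)
votes = candidate_positions(read, index, k)
best = max(votes, key=votes.get)
print(best, votes[best])  # 4 10
```

Only the best-supported candidates need a full base-by-base comparison, which is what makes seeding fast. Low-complexity seeds such as TTTT hit several positions, an effect the reduced complexity of bisulfite reads makes more common.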

  25. Re-mapping of Lister’s data using BSMAP Lister et al. Cell (2008)

  26. Methylation pattern throughout chromosomes [Figure: Arabidopsis chromosome 3; methylation level per 50 kb against position, with Watson and Crick strands shown for the CG, CHG and CHH contexts]
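
A methylation level per 50 kb window, as plotted on this slide, could be computed along the following lines. The slide does not give the exact definition, so this sketch assumes the level is the pooled fraction of methylated reads over all cytosines of one context in a window. The function name, the input layout and the data are hypothetical.

```python
from collections import defaultdict


def windowed_methylation(calls, window=50_000):
    """Pool per-cytosine read counts into fixed windows, per context.

    `calls` yields (position, context, methylated_reads, total_reads).
    Returns {(window_start, context): methylated / total}.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for position, context, m, n in calls:
        key = ((position // window) * window, context)
        methylated[key] += m
        total[key] += n
    return {key: methylated[key] / total[key] for key in total if total[key] > 0}


# Hypothetical per-cytosine counts on one strand.
calls = [
    (100, "CG", 8, 10),
    (200, "CG", 2, 10),
    (60_000, "CHH", 1, 20),
]
levels = windowed_methylation(calls)
print(levels)
```

Running this separately for Watson and Crick calls gives one track per strand and context, matching the panel layout described on the slide.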

  27. Partially / Unsequenced Genomes Options for dealing with partial or unsequenced genomes • Wait for or generate the genome sequence • ‘Borrow’ a reference genome from a phylogenetic neighbour • Take a deep breath and ‘do de novo’ • De novo Genome • De novo Transcriptome [Diagram labels: DNA or RNA Sequence Data, Partial Assembly, Partial Sequence Database, Gene Annotation, Genetic Variation, Transcript Variation, Non-coding RNA]

  28. Plant Genomes – Haploid Size Human, Arabidopsis, Rice, Potato, Sugarcane, Cotton, Barley, Wheat. Diameter proportional to haploid genome size

  29. Plant Genomes – Total Size Human, Cotton, Barley, Sugarcane, Wheat

  30. De novo RNA Seq • Why transcriptome? • Large genome sizes with high repeat content are difficult to assemble • Transcriptomes are more constant in size • Enriched for functional content • Aims: • Transcript discovery • Small / long non-coding RNA profiling • Analytical challenges • Assembly – ABySS, Velvet, Euler-SR • Comparisons between non-discrete, overlapping transcripts • Annotation • Ploidy

  31. Summary – Impacts and Challenges • RNASeq • Increased resolution • Increased power for transcript complexity and variation • Analytical challenges – transcript complexity, compositional bias • Large gains in small and long non-coding RNA profiling • Epigenomics • ChIPSeq and MethylSeq • Genome-wide with resolution • Robust event calling is challenging • De novo transcriptomics • Attractive option for large, repeat-rich genomes

  32. Acknowledgements CSIRO PI Bioinformatics Team: Andrew Spriggs, Stuart Stephen, Emily Ying, Jose Robles, Michael James. CSIRO Biostatistics: David Lovell
