
The Transcriptome

The Transcriptome. Gene Discovery Quantitation of Gene Expression. Reading: Ch 15.1. BIO520 Bioinformatics Jim Lund. WHY?. The genes (proteins) expressed determine the state of the cell. Signaling. Metabolic capabilities. Differentiation state (cell type).


Presentation Transcript


  1. The Transcriptome Gene Discovery Quantitation of Gene Expression Reading: Ch 15.1 BIO520 Bioinformatics Jim Lund

  2. WHY? • The genes (proteins) expressed determine the state of the cell. • Signaling. • Metabolic capabilities. • Differentiation state (cell type). • Response to changes in environment. • Verifies gene predictions. • Transcriptional regulation • Normal vs. abnormal • Conditional expression

  3. Transcriptome Analysis • Gene (transcript) discovery • transcripts • alternative splicing/processing • Transcript assays • Promoter analysis • Transcription Factors • Cellular control networks

  4. Gene Discovery • Inference from genomic DNA • Prokaryotes & fungi OK • cDNA characterization • EST • SAGE

  5. EST (Expressed Sequence Tag) • Sequence cDNA libraries • proportional libraries • subtracted or normalized libraries • Which end? • 5’ or 3’ or Whole

  6. [Diagram: cDNA library choices] • Tissue • Primer: dT vs random • Library type: “regular” (proportional), subtracted, or normalized • Miss alternate transcripts

  7. Ideal cDNAs

  8. “Real” cDNAs

  9. Which end? • Whole cDNA • BEST & HARDEST (Long) • 3’ end • Consistent technically, limited information • 5’ end • Coding “identity” highest • 5’ AND 3’ • Good, but technical & informatic challenge

  10. EST Data Analyses • Clustering Analysis • Assemble ESTs into genes. • Alternative splicing forms • Find coding SNPs. • Truncated, unspliced, and junk ESTs can be misleading • Project: Unigene • Program: stackPACK • Frequency analysis • Digital Differential Display • DDD is a computational method for comparing sequence-based gene representation profiles among individual cDNA libraries or pools of libraries.
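The frequency analysis idea on this slide (comparing how often a gene's ESTs appear in two cDNA libraries) can be sketched with a Fisher exact test, one common choice for comparing counts of this kind. This is a minimal illustration under stated assumptions, not the DDD implementation: the function names and the library counts are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, total_a, total_b, hits):
    """Probability that k of `hits` ESTs for a gene come from library A,
    if the gene is equally represented in both libraries."""
    return comb(total_a, k) * comb(total_b, hits - k) / comb(total_a + total_b, hits)

def fisher_two_sided(hits_a, total_a, hits_b, total_b):
    """Two-sided Fisher exact p-value for one gene in two EST libraries."""
    hits = hits_a + hits_b
    observed = hypergeom_pmf(hits_a, total_a, total_b, hits)
    k_min = max(0, hits - total_b)
    k_max = min(hits, total_a)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        pk = hypergeom_pmf(k, total_a, total_b, hits)
        # Sum every outcome at least as unlikely as the observed one.
        if pk <= observed * (1 + 1e-9):
            p_value += pk
    return min(p_value, 1.0)

# Hypothetical counts: 30 of 1,000 ESTs in library A vs 5 of 1,000 in library B.
p = fisher_two_sided(30, 1000, 5, 1000)
print(f"p = {p:.2e}")
```

A small p-value suggests the gene is represented differently in the two libraries; with thousands of genes tested, some correction for multiple testing would also be needed.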

  11. EST Results (old) • Known genes (30%) • Similarities to other ORFs, ESTs (30%) • Infer Function? • Novel Class (30%,  w/ time)

  12. Typical Progress/Results • Humans • 6,694,833 ESTs • 124,179 clusters (“sets”) • 29,000 sets contain EST and mRNA seqs. • CGAP EST library “plateau” broken by: • different tissues, different states • normalized libraries

  13. Data Quality Considerations • 99% correct data (1% errors!). • Frameshifts: effects depend on tools • BLASTX tool to “find” frameshifts • How sensitive? • TBLASTX, TBLASTN to “use” in other projects • How sensitive?
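To see what a 1% error rate means for a single EST read, the arithmetic can be sketched as below. The 400-base read length and the assumption that errors are independent and uniform along the read are illustrative choices, not figures from the slides.

```python
def expected_errors(read_length, error_rate=0.01):
    """Expected number of miscalled bases in one read."""
    return read_length * error_rate

def prob_error_free(read_length, error_rate=0.01):
    """Chance that a read has no errors at all, assuming independent errors."""
    return (1.0 - error_rate) ** read_length

read_length = 400  # assumed read length, for illustration only
print(expected_errors(read_length))            # about 4 errors per read
print(round(prob_error_free(read_length), 3))  # under 2% of reads are error-free
```

Under these assumptions nearly every read carries at least one error, which is why frameshift-tolerant tools matter when translating ESTs.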

  14. Gene Expression Assays • EST (Poor method) • SAGE • Microarray Hybridization • Next Gen Sequencing. • Transcriptional Fusions • GFP, LacZ fusions

  15. Serial Analysis of Gene Expression (SAGE) • Collect mRNA • Isolate short oligomers from each transcript. • Ligate together the oligomers and clone them. • Sequence thousands of clones. • Map the 1×10⁴ – 1×10⁵ oligomers to their genes. • Find which genes are transcribed and their relative expression levels. • http://www.sagenet.org (Vogelstein at JHU)
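The last two steps above (mapping tags to genes and estimating relative expression) amount to counting. A minimal sketch, assuming a hypothetical tag-to-gene lookup table and made-up tag sequences:

```python
from collections import Counter

def tag_expression(tags, tag_to_gene):
    """Count sequenced tags per gene and return each gene's share of all tags."""
    counts = Counter(tag_to_gene.get(tag, "unmapped") for tag in tags)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical lookup table and tag list, for illustration only.
tag_to_gene = {"ACGTACGTAA": "geneA", "TTGGCCAATT": "geneB"}
tags = ["ACGTACGTAA"] * 6 + ["TTGGCCAATT"] * 3 + ["GGGGGGGGGG"]

for gene, fraction in sorted(tag_expression(tags, tag_to_gene).items()):
    print(gene, fraction)
```

Tags with no match are kept in an "unmapped" bin here rather than dropped, so the fractions still sum to one; that is a design choice of this sketch.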

  16. SAGE technique • Prepare biotin labeled cDNA • Cleave with anchoring enzyme (NlaIII)

  17. SAGE technique • Ligate on linkers • Cleave with tagging enzyme (BsmFI)

  18. SAGE technique • Ligate, PCR, and gel purify ditags (102bp). • Recleave with anchoring enzyme (NlaIII), ligate to form concatemers. • Size select, clone and sequence concatemers.
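The enzyme steps on the three slides above determine which part of each transcript becomes a tag: the bases next to the 3’-most anchoring-enzyme site. A rough in silico version, assuming the NlaIII site CATG and a 10-base tag (the tag length is an assumption here; the slides do not state it):

```python
def extract_sage_tag(cdna, anchor_site="CATG", tag_length=10):
    """Return the bases just downstream of the 3'-most anchor site, or None."""
    position = cdna.upper().rfind(anchor_site)
    if position == -1:
        return None  # transcript has no anchoring-enzyme site
    start = position + len(anchor_site)
    tag = cdna.upper()[start:start + tag_length]
    # A transcript that ends too soon after the site gives no full-length tag.
    return tag if len(tag) == tag_length else None

# Made-up cDNA with two CATG sites; only the 3'-most one defines the tag.
print(extract_sage_tag("AAACATGTTTTCATGACGTACGTAAGGGG"))
```

Predicting tags this way from known cDNA sequences is one way to build the tag-to-gene table needed to interpret sequenced concatemers.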

  19. Colon cancer vs. normal colon epithelium (SAGE)

  20. Microarray Hybridization • Determine gene expression by parallel hybridization of labeled cDNA to DNA attached to a fixed support. • http://cmgm.stanford.edu/pbrown/

  21. Microarray Hybridization • Producing chips • Producing probes / reading arrays • Analyzing and interpreting data

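  22. Transcriptional Array [Diagram: ORFs (orf 1, orf 2, orf 3, ...) spotted as a numbered grid (spots 1 to 9 shown) on a 3 cm × 3 cm support; 200 × 200 spots = 40,000 dots/9 cm², or > all human genes; mRNA from Condition 1 and Condition 2 is hybridized to the array.]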

  23. Transcriptional Array-1 [Diagram: ORF grid (spots 1 to 9 shown) on a 3 cm × 3 cm support, 200 × 200 spots = 40,000 dots/9 cm², or > all human genes; mRNA from Condition 1 and Condition 2; spots 1, 2, 6, and 8 marked.]

  24. Transcriptional Array-2 [Diagram: ORF grid (spots 1 to 9 shown) on a 3 cm × 3 cm support, 200 × 200 spots = 40,000 dots/9 cm², or > all human genes; mRNA from Condition 1 and Condition 2; spots 1, 2, 3, 6, 7, and 8 marked.]

  25. Microarray Technologies • Spotted arrays (Brown et al.) • Spot arrays on glass slides • PCR fragments • Long (50-70bp) oligo arrays • Synthesis • Affymetrix (www.affymetrix.com) • High density array of 25 bp oligos • Made using light directed oligonucleotide synthesis and photolithography • Agilent, CombiMatrix • Made using light directed oligonucleotide synthesis and mirrors.

  26. Spotted Arrays

  27. Print Quill

  28. Spotted microarray image

  29. Affymetrix photolithographic technology • Lithographic masks are used to either block or transmit light onto specific locations of the array. • The surface is then flooded with a solution containing either adenine, thymine, cytosine, or guanine, and coupling occurs only in those regions on the glass that have been deprotected through illumination. • The coupled nucleotide also bears a light-sensitive protecting group, so the cycle can be repeated. • Microarray is built as the probes are synthesized through repeated cycles of deprotection and coupling. • Synthesis typically ends at 25 bases. • Current arrays have 1.3 million unique features per array.
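The deprotect-and-couple cycle described above can be mimicked with a toy simulation. This is only a sketch: the fixed A, C, G, T flooding order, the function name, and the three short probes are assumptions for illustration, and real mask design is far more involved.

```python
def synthesize(probes, base_order="ACGT"):
    """Grow all probes in parallel, one flooded nucleotide per cycle."""
    if any(base not in base_order for probe in probes for base in probe):
        raise ValueError("probes may only contain bases from base_order")
    grown = [""] * len(probes)
    cycles = 0
    while grown != list(probes):
        base = base_order[cycles % len(base_order)]
        for i, probe in enumerate(probes):
            done = len(grown[i])
            # The "mask" lets light through only where this base is needed next.
            if done < len(probe) and probe[done] == base:
                grown[i] += base
        cycles += 1
    return grown, cycles

grown, cycles = synthesize(["ACG", "GAT", "TTA"])
print(grown, cycles)
```

With four bases offered in rotation, a probe of length n needs at most 4n cycles, so 25-base probes need at most 100 cycles no matter how many features are on the array.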

  30. GeneChip Expression Assay Design

  31. Affymetrix GeneChips: Expression Analysis • Available for humans and model organisms. • Made only by Affymetrix. • Chip designs change slowly. • GeneChips: • Human: 50,000 RefSeq genes and ESTs • C. elegans: 22,500 genes (12/00 genome annotation) • Rat 230: 30,000 genes, ESTs • Yeast: 6100 gene set • Tiling arrays for model organisms • http://affymetrix.com

  32. Quantitation of fluorescence signals (Image to data) • Hybridization, scan in chip image. • Gridding • Determine where the spots are. • Spot intensity and local background determination. • Normalization • Adjust to make the red and green total signal intensities the same. • Gene expression ratio. • Red channel/green channel. • Programs: • ScanAlyze, http://rana.lbl.gov/EisenSoftware.htm • GenePix, http://www.moleculardevices.com/pages/instruments/microarray_main.html
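The background, normalization, and ratio steps listed above can be sketched as follows. This assumes the simplest scheme named on the slide (rescale one channel so the two totals match); the function name and intensity values are hypothetical, and it is not how ScanAlyze or GenePix are implemented.

```python
def normalize_and_ratio(red, green, red_bg, green_bg):
    """Subtract local background, equalize channel totals, return red/green ratios."""
    red_net = [max(r - b, 0.0) for r, b in zip(red, red_bg)]
    green_net = [max(g - b, 0.0) for g, b in zip(green, green_bg)]
    # Scale the red channel so its total signal equals the green total.
    scale = sum(green_net) / sum(red_net)
    red_norm = [r * scale for r in red_net]
    ratios = [r / g if g > 0 else None for r, g in zip(red_norm, green_net)]
    return red_norm, green_net, ratios

# Made-up spot intensities and local backgrounds for three spots.
red_norm, green_net, ratios = normalize_and_ratio(
    red=[1100, 300, 500], green=[300, 150, 500],
    red_bg=[100, 100, 100], green_bg=[50, 50, 50],
)
print(ratios)
```

Total-intensity scaling assumes most genes do not change between the two samples; when that does not hold, other normalization schemes are usually preferred.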

  33. Microarray data • Big tables of numbers!

  34. Viewing microarray data • Clustergram • Scatter plot: log(ch1) vs log(ch2) • M vs A: expression change (M) vs expression level (A) • Volcano plot: log(expression ratio) vs p-value
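For the M vs A view, the two plotted quantities are usually computed from the normalized channel intensities as M = log2(R/G) and A = ½·log2(R·G). A minimal sketch with made-up intensities (the function name is hypothetical):

```python
from math import log2

def ma_values(red, green):
    """Return per-spot M (log ratio) and A (mean log intensity) values."""
    m = [log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * log2(r * g) for r, g in zip(red, green)]
    return m, a

# Three made-up spots: up in red, up in green, unchanged.
m, a = ma_values([8, 2, 4], [2, 8, 4])
print(m)  # log2 fold changes
print(a)  # average log2 intensities
```

Plotting M against A turns the diagonal of the log(ch1) vs log(ch2) scatter plot into a horizontal line at M = 0, which makes intensity-dependent bias easier to spot.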
