Big Data Challenges in Genomics

Big Data Challenges in Genomics. Dr. Kameswara Rao Kottapalli Research Associate Professor. http://www.orgs.ttu.edu/biotechnologyandgenomics/. CBG “Model”. “Three Research Centers in one”! Proteomics/Mass Spectrometry Facility Genomics/Sequencing Facility Bioinformatics Facility

Presentation Transcript


  1. Big Data Challenges in Genomics. Dr. Kameswara Rao Kottapalli, Research Associate Professor. http://www.orgs.ttu.edu/biotechnologyandgenomics/

  2. CBG “Model”: “Three Research Centers in one”! • Proteomics/Mass Spectrometry Facility • Genomics/Sequencing Facility • Bioinformatics Facility. PLUS: MS in Biotechnology. Unique educational experiences = TRANSDISCIPLINARY

  3. Research Projects (project, partner): Bacterial RNA-Seq (ICFIE/HSC-BRC); Bacterial Whole Genome (ICFIE); Drought Func. Genomics (USDA); Epigenetics/small RNA Regu.-Obesity (Human Nutrition); Soil Microflora Metagenomics (PSS); Cancer Cell population & SC-T (HSC-Abilene); Bovine Gut Microbiome & Immune response (AFS)

  4. Big Data. Big Data in Genomics is nothing close to the data generated in particle physics: Large Hadron Collider collision data amount to 200 petabytes. One petabyte is enough to store the DNA of the entire population of the USA – and then clone them, twice (source: http://www.computerweekly.com/feature/What-does-a-petabyte-look-like)

  5. Genomics Big Data HUMAN GENOME = 3 billion bases (ATGC)
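
To put the 3-billion-base figure in storage terms, here is a back-of-the-envelope sketch. The encodings are assumptions, not something stated on the slide: 2 bits per base (enough for four letters) versus 1 byte per base as in a plain text file. The function name is illustrative.

```python
def genome_size_bytes(n_bases: int, bits_per_base: int) -> float:
    """Storage needed for n_bases at a fixed number of bits per base."""
    return n_bases * bits_per_base / 8


HUMAN_GENOME_BASES = 3_000_000_000  # figure quoted on the slide

packed = genome_size_bytes(HUMAN_GENOME_BASES, 2)  # A, T, G, C fit in 2 bits (assumption)
text = genome_size_bytes(HUMAN_GENOME_BASES, 8)    # one ASCII character per base (assumption)

print(f"2-bit packed: {packed / 1e9:.2f} GB")
print(f"plain text:   {text / 1e9:.2f} GB")
```

This covers a single reference sequence only; raw sequencing reads, with coverage depth and per-base quality scores, take considerably more space.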

  6. Genomics Big Data. HUMAN GENOME CHALLENGES: Sequencing of the masses in personalized medicine; $1000 for sequencing a human genome, pinpointing medical problems & identifying treatments. Jay Flatley, CEO of Illumina Inc.: “Increasing Clinical knowledge of the variation in Genome”. It’s one thing to say, ‘Here is the genetic variation’. It’s another to say, ‘Here is what the variation means’.

  7. Genomics Big Data: Solutions. Finding variations: writing algorithms & scripting. Bioinformatics solutions: Cufflinks assembly and differential expression; whole-genome sequencing. (?) Interpreting the biological significance of variation

  8. Genomics Big Data. Texas: Cattle, Cotton

  9. Genomics Big Data. CATTLE GENOME = 3 billion bases (ATGC). CHALLENGES: calf mortality; increasing immunity and reducing disease incidence

  10. Genomics Big Data. COTTON GENOME = 2.83 billion bases (ATGC), estimated

  11. Patterson et al. 2012 Nature

  12. Systems Biology

  13. Transcriptomics. CHALLENGES: splicing to mature RNA; the gene coding for the Titin protein has 312 exons; exon rearrangements & exon shuffling

  14. Alternate Splicing [figure]. mRNA-1: 999 bp, 273 bp, 798 bp; labels DBD, HR-A/B, NLS, H3, H3, AHA1, AHA2, NES. mRNA-2: 951 bp, 48 bp, 273 bp, 798 bp; labels DBD, STOP, H3, H3. Sequence shown: cuuuguaguGUUAAAUUUUCAGUCAUGCUCUGACUGUGAAAAACAAAUGUCCCCAGggauuucg

  15. Transcriptomics Reference transcriptome is a myth

  16. Transcriptomics. “According to the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF), in any given cell type or tissue, only a subset of the proteins will ever be expressed. In addition, the particular cells in question may synthesize sequence and/or splice variants that differ from the canonical sequence that a protein database has selected to represent a given protein. In such cases, RNA-Seq of the tissue followed by whole proteomics will identify key proteins involved in a particular tissue in response to a treatment.” 2013, 2014 plenary talk on rat proteomics: Prof. Albert Heck, Netherlands Proteomics Centre

  17. Workflow [figure] (Kottapalli et al. 2013 JPR). Transcript side: Stage 4 Control and Stage 4 Stressed samples → sequencing & de novo assembly → CAP3 assembly (in-house Python script) → peanut pod transcriptome → 6-frame translation (in-house Perl script) → peanut pod-specific proteome. Protein side: 16 fractions, tryptic digested → LC-MS/MS (whole proteome) → spectral counting → differentially expressed proteins → annotation & pathway mapping by MapMan
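
The 6-frame translation step can be sketched in a few lines. This is not the in-house Perl script from the slide, which is not shown; it is a minimal Python illustration that assumes the standard genetic code, marks stop codons as `*`, and marks any codon it cannot resolve (for example one containing N) as `X`. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Translate a transcript in three forward and three reverse frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translation("ATGGCC").items():
        print(frame, peptide)
```

Translating every assembled contig this way yields a sample-specific protein database against which the LC-MS/MS spectra can be searched, which is the role the peanut pod-specific proteome plays in the workflow.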

  18. APPLICATION

  19. Drought is a global problem threatening the human race. In 2011, the Texas drought resulted in losses exceeding $4 billion

  20. Ogallala Aquifer • Water consumption is 10 to 40 times recharge • Estimated depletion by 2050, based on pumping forecasts • The Ogallala supplies water for eastern New Mexico, much of west Texas, as well as parts of Colorado, Oklahoma, Kansas, Nebraska, Wyoming and South Dakota • Approximately 170,000 wells draw water from the aquifer • Supplies 25% of water used for irrigation in the United States • Contains 3.3 billion acre-feet of water (1 acre-foot = 326,000 gallons)
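
The aquifer volume quoted on this slide can be converted to gallons with the slide's own conversion factor. The short sketch below only does that arithmetic; the litre conversion uses the US gallon definition (3.785411784 L), which is not on the slide, and the variable names are illustrative.

```python
ACRE_FEET_IN_AQUIFER = 3.3e9      # from the slide
GALLONS_PER_ACRE_FOOT = 326_000   # rounded factor from the slide
LITRES_PER_US_GALLON = 3.785411784  # US gallon definition (not from the slide)

total_gallons = ACRE_FEET_IN_AQUIFER * GALLONS_PER_ACRE_FOOT
total_litres = total_gallons * LITRES_PER_US_GALLON

print(f"{total_gallons:.3e} gallons")
print(f"{total_litres:.3e} litres")
```

That is on the order of a quadrillion gallons, which gives a sense of scale for the depletion forecast.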

  21. Aerial Image [figure]: pivot, with areas labeled 1dpi through 7dpi; approx. 24 h

  22. Objectives • Major objective of our Group • Identify crop varieties for abiotic stress tolerance • Heat stress • Water-deficit stress What are the complex (?) mechanisms of abiotic stress tolerance and the genetic basis of plant adaptation to drought stress?

  23. Introduction. SYSTEMS APPROACH to the CELL: Genomics, Transcriptomics, Epigenomics, Interferomics, Metabolomics, Proteomics, Phenomics

  24. Phenomics. U.S. Peanut Mini-core Collection (Holbrook and Dong, 2005). Crop Science, Vol. 47:1718-1727, 2007

  25. Phenomics: heat stress experiment (COC041, COC166). Conviron program [figure]: temperature from 28°C to 45°C; time axis 7.0, 13.0, 15.0, 21.0 h. 45-day-old plants were subjected to 45 ± 2°C thermal stress in Convirons. Leaf samples were collected at 0, 12, 24 and 48 hours after induction of stress. Phenotype shown 14 days after stress

  26. Phenomics: water-deficit stress experiment (COC041, COC166). 45-day-old plants were subjected to water-deficit stress for 7 days. Leaf and root samples were collected at 0, 3, 5 and 7 days after cessation of irrigation. Phenotype shown 5 days post water-deficit stress

  27. Phenomics [figure]: seed yield (g/plant) for COC041 and COC166 under 70% PETR and full irrigation, with p-values (*)

  28. Transcriptomics. Gene expression analysis by sequencing: RNA-Seq (Wang et al. 2009)
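
Expression analysis by RNA-Seq starts from read counts per gene, which are usually normalized for gene length and sequencing depth before comparison. The slide does not say which normalization was used; the sketch below assumes RPKM (reads per kilobase of transcript per million mapped reads) as one common choice, with hypothetical gene names and counts.

```python
def rpkm(counts: dict, lengths_bp: dict) -> dict:
    """Reads per kilobase per million mapped reads for each gene.

    counts:     mapped read count per gene
    lengths_bp: transcript length in base pairs per gene
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_reads)
        for gene, count in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical example: geneB has 3x the reads but is 3x longer.
    example_counts = {"geneA": 100, "geneB": 300}
    example_lengths = {"geneA": 1000, "geneB": 3000}
    print(rpkm(example_counts, example_lengths))
```

In the example both genes end up with the same RPKM, since the longer gene collects proportionally more reads at the same expression level.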

  29. Transcriptomics. MapMan annotation with the Mercator tool: 43,520 genes; complete EST sequences in FASTA format

  30. Transcriptomics 0 day Tolerant

  31. Transcriptomics 2 day Tolerant

  32. Proteomics. Gel-LC-MS/MS: 16 fractions, tryptic digested, LC-MS/MS (whole proteome). Gel-MALDI/TOF: 1-D (chip) gel [figure] with kDa scale and lanes L, S1, S1, S2, S2, S3, S3, S4, S4 (total protein); 2-D gel [figure] labeled 4 and 7

  33. Shotgun Quantitative Proteomics

  34. Integration of Omics data: PGK, KAR, KAS, SAD, PDC, Oleosin, ACCase

  35. Hypothetical Model [figure]. Water-deficit/heat-stress signal. Labels: ARG2; HSP; Calmodulin-binding protein, SRP 5, SAPK3, myo-inositol, protein kinases; Lectin; DREB; Proline; MYB; nucleus; G-Taurine; TPR; Acetyl CoA carboxylase; Malonyl CoA; Acetyl CoA; Chloroplast; Mitochondria; Short chain PUFA; Flavonoid; Long FA/Glycerophospholipids; Oxidative phosphorylation; Endoplasmic Reticulum; Lignin monomer; RuBisCO; Lipoxygenase; Antioxidants; Methionine synthase; Jasmonic Acid; Methylated lignin monomer; Chlorophyll a/b; Thick cuticular wax; Lignified cell walls; Cytochrome b/c

  36. Interferomics [figure, panels a and b]: Col-0; heat-sensitive mutant (KO); heat-tolerant mutant; lines 1-11, 1-03, 2-05, 2-06; tolerate 117°F for 30 min

  37. Thank You
