Big Data Challenges in Genomics
Dr. Kameswara Rao Kottapalli, Research Associate Professor
http://www.orgs.ttu.edu/biotechnologyandgenomics/
CBG “Model”: “Three Research Centers in one”!
• Proteomics/Mass Spectrometry Facility
• Genomics/Sequencing Facility
• Bioinformatics Facility
PLUS: MS in Biotechnology
Unique educational experiences = TRANSDISCIPLINARY
Research Projects
• Bacterial RNA-Seq: ICFIE/HSC-BRC
• Bacterial Whole Genome: ICFIE
• Drought Functional Genomics: USDA
• Epigenetics/small RNA Regulation (Obesity): Human Nutrition
• Soil Microflora Metagenomics: PSS
• Cancer Cell Population & SC-T: HSC-Abilene
• Bovine Gut Microbiome & Immune Response: AFS
Big Data in Genomics
Genomics data is nothing close to the data generated in particle physics: Large Hadron Collider collision data amount to 200 petabytes.
One petabyte is enough to store the DNA of the entire population of the USA, and then clone them, twice (http://www.computerweekly.com/feature/What-does-a-petabyte-look-like).
Genomics Big Data: HUMAN GENOME = 3 billion bases (A, T, G, C)
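A rough sketch of why even one genome is sizeable: with four possible bases, each base can be packed into 2 bits, so 3 billion bases need about 750 MB. Stored as plain text at one byte per base, as in FASTA, the same sequence takes about 3 GB. This estimate deliberately ignores the second (diploid) copy, quality scores and sequencing coverage, all of which make real files much larger. The function name is illustrative.

```python
def raw_genome_bytes(num_bases: int, alphabet_size: int = 4) -> int:
    """Bytes needed to store num_bases symbols using the fewest whole bits per symbol."""
    bits_per_base = max(1, (alphabet_size - 1).bit_length())  # 4 symbols -> 2 bits
    total_bits = num_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


HUMAN_GENOME_BASES = 3_000_000_000  # approximate size quoted on the slide

size = raw_genome_bytes(HUMAN_GENOME_BASES)
print(f"{size:,} bytes (~{size / 1e6:.0f} MB)")  # 750,000,000 bytes (~750 MB)
```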
Genomics Big Data: HUMAN GENOME CHALLENGES
Sequencing of the masses in personalized medicine: a $1000 human genome sequence for pinpointing medical problems and identifying treatments.
Jay Flatley, CEO of Illumina Inc.: “Increasing Clinical knowledge of the variation in Genome”
It’s one thing to say, ‘Here is the genetic variation.’ It’s another to say, ‘Here is what the variation means.’
Genomics Big Data Solutions
• Finding variations: writing algorithms and scripting
• Cufflinks: assembly and differential expression
• Bioinformatics solutions
• Whole-genome sequencing
• ? Interpreting the biological significance of variation
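The "finding variations" step can be illustrated with a deliberately simplified sketch. It compares a sample sequence, assumed to be already aligned base-for-base and without gaps to a reference, and reports single-nucleotide differences. Real pipelines instead work from read alignments with quality and coverage filters. The `call_snvs` and `SNV` names are hypothetical.

```python
from typing import List, NamedTuple


class SNV(NamedTuple):
    position: int  # 1-based position in the reference
    ref: str
    alt: str


def call_snvs(reference: str, sample: str) -> List[SNV]:
    """Report positions where a gap-free aligned sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for i, (r, s) in enumerate(zip(reference.upper(), sample.upper())):
        if s != r and s != "N":  # skip ambiguous 'N' calls in the sample
            variants.append(SNV(i + 1, r, s))
    return variants


print(call_snvs("ATGCATGC", "ATGAATGT"))
# [SNV(position=4, ref='C', alt='A'), SNV(position=8, ref='C', alt='T')]
```

Finding the difference is the easy half. As the previous slide notes, interpreting what each variant means is the harder problem.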
Genomics Big Data: Texas Cattle and Cotton
Genomics Big Data: CATTLE GENOME = 3 billion bases (A, T, G, C)
CHALLENGES: calf mortality; increasing immunity and reducing disease incidence
Genomics Big Data: COTTON GENOME = 2.83 billion bases (A, T, G, C), estimated
Transcriptomics CHALLENGES
• Splicing of pre-mRNA into mature RNA
• The gene coding for the titin protein has 312 exons
• Exon rearrangements and exon shuffling
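To give a feel for the combinatorics, the sketch below computes a deliberately naive upper bound. It assumes the first and last exons are always kept and every internal exon may independently be included or skipped. It ignores splice-site constraints, mutually exclusive exons and reading frame, so real isoform numbers are far smaller. It only shows how quickly the possibilities grow for a gene with the 312 exons quoted for titin.

```python
def max_exon_skipping_isoforms(num_exons: int) -> int:
    """Naive bound: terminal exons fixed, each internal exon optional (2 choices each)."""
    if num_exons < 2:
        return 1
    return 2 ** (num_exons - 2)


titin_bound = max_exon_skipping_isoforms(312)  # exon count quoted on the slide
print(len(str(titin_bound)), "digits")  # 94 digits
```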
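Alternative Splicing (figure)
mRNA-1: segments of 999 bp, 273 bp and 798 bp; domain labels DBD, HR-A/B, NLS, H3, H3, AHA1, AHA2, NES
mRNA-2: segments of 951 bp, 48 bp, 273 bp and 798 bp; labels DBD, STOP, H3, H3
Splice-site sequence: cuuuguaguGUUAAAUUUUCAGUCAUGCUCUGACUGUGAAAAACAAAUGUCCCCAGggauuucg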
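The splice-site sequence on this slide can be handled programmatically. The sketch assumes that the uppercase block marks the intron and the lowercase flanks are exonic. That reading is consistent with the block beginning with GU and ending with AG, the canonical spliceosomal intron boundaries. The sketch checks the GU-AG rule and builds the spliced product, while the unmodified string stands for an isoform that retains the intron. The helper names are hypothetical.

```python
from typing import Tuple

# Sequence from the slide; the uppercase block is ASSUMED to be the intron.
PRE_MRNA = "cuuuguaguGUUAAAUUUUCAGUCAUGCUCUGACUGUGAAAAACAAAUGUCCCCAGggauuucg"


def intron_span(pre_mrna: str) -> Tuple[int, int]:
    """Return (start, end) of the uppercase block, end exclusive."""
    upper = [i for i, base in enumerate(pre_mrna) if base.isupper()]
    if not upper:
        raise ValueError("no uppercase intron block found")
    return upper[0], upper[-1] + 1


def follows_gu_ag_rule(intron: str) -> bool:
    """Canonical spliceosomal introns start with GU and end with AG."""
    intron = intron.upper()
    return intron.startswith("GU") and intron.endswith("AG")


start, end = intron_span(PRE_MRNA)
intron = PRE_MRNA[start:end]
spliced = PRE_MRNA[:start] + PRE_MRNA[end:]  # intron removed; PRE_MRNA itself = intron retained

print(len(intron), follows_gu_ag_rule(intron))  # 47 True
print(spliced.upper())  # CUUUGUAGUGGAUUUCG
```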
Transcriptomics: a reference transcriptome is a myth
Transcriptomics
“According to the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF), in any given cell type or tissue, only a subset of the proteins will ever be expressed. In addition, the particular cells in question may synthesize sequence and/or splice variants that differ from the canonical sequence that a protein database has selected to represent a given protein. In such a case, RNA-Seq of the tissue followed by whole proteomics will identify key proteins involved in a particular tissue in response to a treatment.”
2013, 2014. Plenary talk on rat proteomics: Prof. Albert Heck, Netherlands Proteomics Center
Workflow (Kottapalli et al. 2013, JPR)
• Sequencing and de novo assembly: Stage 4 control and Stage 4 stressed; CAP3 assembly (in-house Python script) → peanut pod transcriptome
• 6-frame translation (in-house Perl script) → peanut pod-specific proteome
• Whole proteome: tryptic digested, 16 fractions, LC-MS/MS
• Spectral counting → differentially expressed proteins
• Annotation and pathway mapping by MapMan
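The 6-frame translation step turns an assembled transcriptome into a searchable protein database. The authors used an in-house Perl script, and the following is not that script. It is an independent Python sketch of the same idea, using the standard genetic code (NCBI translation table 1). It translates the three forward reading frames and the three frames of the reverse complement.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon (e.g. with N)."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Return forward frames +1..+3 and reverse-complement frames -1..-3."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, peptide in six_frame_translation("ATGGCCTAA").items():
    print(frame, peptide)
```

In practice, each frame would typically be split at stop codons and short fragments discarded before the sequences are written to the search database.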
Drought is a global problem threatening the human race. In 2011, the Texas drought resulted in losses exceeding $4 billion.
Ogallala Aquifer
• Water consumption is 10 to 40 times recharge; depletion is estimated by 2050 based on pumping forecasts
• The Ogallala supplies water for eastern New Mexico, much of west Texas, and parts of Colorado, Oklahoma, Kansas, Nebraska, Wyoming and South Dakota
• Approximately 170,000 wells draw water from the aquifer
• It supplies 25% of the water used for irrigation in the United States
• It contains 3.3 billion acre-feet of water (1 acre-foot ≈ 326,000 gallons)
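The aquifer figures can be checked with simple arithmetic. At the rounded factor of 326,000 gallons per acre-foot, 3.3 billion acre-feet is roughly 1.08 quadrillion gallons. Separately, if consumption is N times recharge, a fraction of 1 − 1/N of every gallon pumped is not replaced. That works out to 90% at N = 10 and 97.5% at N = 40. The function names are illustrative.

```python
ACRE_FOOT_IN_GALLONS = 326_000      # rounded conversion factor from the slide
OGALLALA_ACRE_FEET = 3_300_000_000  # storage figure from the slide


def acre_feet_to_gallons(acre_feet: int, gallons_per_acre_foot: int = ACRE_FOOT_IN_GALLONS) -> int:
    return acre_feet * gallons_per_acre_foot


def unreplenished_fraction(consumption_to_recharge: float) -> float:
    """Share of withdrawals not offset by recharge when consumption is N times recharge."""
    return 1 - 1 / consumption_to_recharge


total_gallons = acre_feet_to_gallons(OGALLALA_ACRE_FEET)
print(f"{total_gallons:,} gallons")  # about 1.08 quadrillion
for ratio in (10, 40):
    print(ratio, f"{unreplenished_fraction(ratio):.1%}")
```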
Aerial image (figure): pivot field with sectors labeled 1 dpi to 7 dpi; approx. 24 h
Objectives
• Major objective of our group: identify crop varieties for abiotic stress tolerance
  • Heat stress
  • Water-deficit stress
What are the complex (?) mechanisms of abiotic stress tolerance, and what is the genetic basis of plant adaptation to drought stress?
Introduction: SYSTEMS APPROACH
Diagram: the CELL, linked to Genomics, Transcriptomics, Epigenomics, Interferomics, Metabolomics, Proteomics and Phenomics
Phenomics: U.S. Peanut Mini-core Collection (Holbrook and Dong, 2005)
Crop Science, Vol. 47: 1718-1727, 2007
Phenomics: heat stress experiment (COC041 and COC166)
Figure: Conviron program, temperature (28°C to 45°C) versus time (h), with time marks at 7.0, 13.0, 15.0 and 21.0 h
Forty-five-day-old plants were subjected to 45 ± 2°C thermal stress in Convirons. Leaf samples were collected at 0, 12, 24 and 48 hours after induction of stress. Phenotype at 14 days after stress.
Phenomics: water-deficit stress experiment (COC041 and COC166)
Forty-five-day-old plants were subjected to water-deficit stress for 7 days. Leaf and root samples were collected at 0, 3, 5 and 7 days after cessation of irrigation. Phenotype at 5 days post water-deficit stress.
Phenomics: seed yield (g/plant) of COC041 and COC166 under full irrigation and 70% PETR (* marks significance, p-value)
Transcriptomics: gene expression analysis by sequencing, RNA-Seq (Wang et al. 2009)
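RNA-Seq read counts are usually normalized before expression is compared across genes. The sketch shows one common scheme, transcripts per million (TPM), chosen for illustration. The slides do not state which normalization this study used. Counts are first divided by gene length in kilobases and then rescaled so that all values sum to one million. The gene names and numbers are toy values, not data from the study.

```python
from typing import Dict


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: divide counts by length in kb, then rescale to sum to 1e6."""
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Toy numbers for illustration only, not data from the study.
print(tpm({"geneA": 100, "geneB": 100}, {"geneA": 1_000, "geneB": 2_000}))
```

In the toy example, both genes receive the same number of reads. The gene half as long therefore gets twice the TPM, because it was sampled more densely per kilobase.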
Transcriptomics: MapMan annotation with the Mercator tool; 43,520 genes; complete EST sequences in FASTA format
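Sequences are passed to annotation tools in FASTA format. The minimal, generic parser below shows the structure of that format: a header line starting with ">" followed by one or more sequence lines. It could be used, for example, to count or filter records before upload. The record identifiers in the demo are hypothetical.

```python
import io
from typing import Iterator, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


demo = io.StringIO(">est_1 hypothetical\nATGGCC\nTAA\n>est_2\nGGGTTT\n")
records = list(read_fasta(demo))
print(len(records), records[0])
```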
Transcriptomics: 0 day, Tolerant
Transcriptomics: 2 day, Tolerant
Proteomics (figure)
• Gel-LC-MS/MS: total protein, tryptic digested, 16 fractions; LC-MS/MS (whole proteome)
• Gel-MALDI/TOF: 1-D (chip) gel with lanes L, S1, S1, S2, S2, S3, S3, S4, S4 (kDa scale); 2-D gel (4 to 7)
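Tryptic digestion is what makes whole-proteome LC-MS/MS searchable. Trypsin cuts on the C-terminal side of lysine (K) or arginine (R), classically except when the next residue is proline (P). The sketch applies that simplified rule in silico with no missed cleavages. Search engines typically also allow missed cleavages and filter peptides by length or mass, and those options are not shown here. The function name is illustrative.

```python
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # trailing peptide without a C-terminal K/R
        peptides.append(protein[start:])
    return peptides


print(tryptic_digest("MAKPLRGEKRAS"))  # ['MAKPLR', 'GEK', 'R', 'AS']
```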
Integration of omics data (figure labels: PGK, KAR, KAS, SAD, PDC, Oleosin, ACCase)
Hypothetical Model (figure labels): Water-deficit/Heat-stress signal; ARG2; HSP; Calmodulin-binding protein; SRP 5; SAPK3; myo-inositol; protein kinases; Lectin; DREB; Proline; MYB; nucleus; G-Taurine; TPR; Acetyl CoA carboxylase; Malonyl CoA; Acetyl CoA; Chloroplast; Mitochondria; Short-chain PUFA; Flavonoid; Long FA/Glycerophospholipids; Oxidative phosphorylation; Endoplasmic Reticulum; Lignin monomer; RuBisCO; Lipoxygenase; Antioxidants; Methionine synthase; Jasmonic Acid; Methylated lignin monomer; Chlorophyll a/b; Thick cuticular wax; Lignified cell walls; Cytochrome b/c
Interferomics (figure): Col-0, heat-sensitive mutant, KO, heat-tolerant mutant; lines 1-11, 1-03, 2-05 and 2-06 (labels a, b). Tolerate 117°F (about 47°C) for 30 min.