Oklahoma and Kansas BRICNET: a Bioinformatics Research Inspired Cyber NETwork. Presenter: Rakesh Kaundal, Oklahoma State University. Oklahoma EPSCoR Track II, Oklahoma State Regents for Higher Education Office, Oklahoma City. Wednesday, November 16, 2011.
Oklahoma EPSCoR CI (Status and Goals) • RII Track-1: Building Oklahoma’s Leadership Role in Cellulosic Bioenergy • RII Track-2: A cyberCommons for Ecological Forecasting (OK, KS collaboration) • RII Cyber Connectivity (C2): Enhancement of inter-campus and intra-campus cyber connectivity and broadband access within an EPSCoR jurisdiction: Oklahoma Optical Initiative • Oklahoma Cyberinfrastructure Initiative (OCII): MoU between OU & OSU, OneNet connectivity • NSF Major Research Instrumentation (MRI) • Deploying Oklahoma PetaStore (PI: H. Neeman, OU) • New HPC cluster COWBOY (PI: D. Brunson, OSU)
Existing Cyberinfrastructure: Many universities/institutes with intersecting bioinformatics needs • Education: OK universities • Biology, Computational Science, Engineering & other STEM: faculty at most OK schools • High Performance Computing: HPCC (OSU), OSCER (OU), TCS (TU); faculty at OU, OSU, TU • High Performance Networking: OneNet, NLR, Internet2 • Sensor Networks: Mesonet; CASA; CyberCommons, Eddy Flux towers, Biosensor projects; faculty at OU, OSU, TU • Grid Computing: OSCER (Computing), OCHEP (Physics), LEAD (Meteorology); BRICNET (Bioinformatics)? OK-BioGRID (Bioinformatics)?
Bioinformatics: Why it’s useful… • All of the information needed to build an organism is contained in its DNA. If we could understand it, we would know how life works. • Preventing and curing diseases like cancer (which is caused by mutations in DNA) and inherited diseases. • Curing infectious diseases (everything from AIDS and malaria to the common cold). If we understand how a microorganism works, we can figure out how to block it. • Understanding genetic and evolutionary relationships between species. • Understanding genetic relationships between humans. Projects exist to understand human genetic diversity. • Similarly, other eukaryotes, including plants, are being sequenced, e.g. to understand plant diseases and tolerance under stress conditions. • Prokaryotes, metagenome sequencing, etc. (Image labels: Susceptible, Resistant, Abiotic stress)
Bioinformatics: Why it’s useful… Complete Understanding of a System. Image courtesy: Center for Biological Sequence Analysis, DTU, Denmark
Why Bioinformatics CI is needed: The sequencing pace… • Nucleic acid sequences • GenBank (April 2011) http://www.ncbi.nlm.nih.gov/genbank/ • 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions • 191,401,393,188 bases in 62,715,288 sequence records in the Whole Genome Shotgun (WGS) division • Entire genomes • GOLD Release V.2 (Oct 2011) contains ~2000 completely sequenced genomes. • http://www.genomesonline.org/gold_statistics.htm • Protein sequences • Essentially obtained by translation of putative genes in nucleic acid sequences (almost no direct protein sequencing). • UniProtKB/TrEMBL (2011) contains 17 million protein sequences. • http://www.ebi.ac.uk/swissprot/sptr_stats/index.html
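To put the GenBank figures above in perspective, the short sketch below derives the mean record length per division and the combined base count. The numbers are copied from the slide; the dictionary layout, the function names, and the one-byte-per-base storage estimate are illustrative assumptions, not part of the presentation.

```python
# GenBank (April 2011) figures as quoted on the slide.
# The dictionary keys are hypothetical labels chosen for this sketch.
GENBANK_APRIL_2011 = {
    "traditional": {"bases": 126_551_501_141, "records": 135_440_924},
    "wgs": {"bases": 191_401_393_188, "records": 62_715_288},
}


def mean_record_length(division):
    """Average number of bases per sequence record in one division."""
    return division["bases"] / division["records"]


def total_bases(divisions):
    """Sum of bases over all divisions."""
    return sum(d["bases"] for d in divisions.values())


def approx_size_gb(n_bases, bytes_per_base=1.0):
    """Rough uncompressed size in gigabytes.

    Assumption: one byte per base (plain text); real storage formats differ.
    """
    return n_bases * bytes_per_base / 1e9


if __name__ == "__main__":
    for name, division in GENBANK_APRIL_2011.items():
        print(f"{name}: {mean_record_length(division):.1f} bases per record")
    combined = total_bases(GENBANK_APRIL_2011)
    print(f"combined: {combined:,} bases, about {approx_size_gb(combined):.0f} GB as plain text")
```

Under these assumptions, the raw sequence text alone is on the order of hundreds of gigabytes, before annotations, quality scores, or derived analysis products are counted.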
Data Explosion! Biological data production is in the terabytes and increasing every day.
Multidisciplinarity: Bioinformatics & Computational Biology draws on molecular biology, genomics, genetics, biochemistry, biophysics, evolution, mathematics, statistics, numerical analysis, algorithmics, image analysis, and data management.
What is Cyberinfrastructure? Resources and capabilities that enable high-end computing & communications for science, engineering and technology that solve real-world problems. • Education: Postsecondary, K-12 • High Performance Computing • High Performance Networking • Computational Science & Engineering • Grid Computing • Scientific Visualization • Sensor Networks • Shared Instruments • Shared Databases
What is CI for Bioinformatics? Resources and capabilities that enable high-end computing & communications for bioinformatics & computational biology that solve real biological problems. • Education: Postsecondary, K-12 • High Performance Computing • High Performance Networking • Computational Science & Engineering • Grid Computing (BioinfoGRID) • Scientific Visualization (gene regulatory networks, host-pathogen) • Sensor Networks (disease prediction) • Shared Instruments • Shared Databases
BRICNET Collaborative Cyber-enabled Scientific Themes (relevant to Oklahoma and of national importance)
Global Warming and Rhizosphere Metagenomics • The Role of Microbes in Maintaining Atmospheric Carbon Dioxide Balances
Global Warming and the Rhizosphere Community • The rhizosphere, the region of soil immediately surrounding the plant root, is directly influenced by root exudates and associated microorganisms. • Because CO2 plays an important role in global climate change, understanding rhizosphere microbial community dynamics is fundamental to plant health and the fate of carbon.
Switchgrass Rhizosphere Community Metagenome (leverage current EPSCoR Track 1 on Cellulosic Bioenergy) • We will grow switchgrass under varied CO2 levels in controlled environments to measure impacts on microbial diversity and functional capacity using metagenome analysis. • Massive sequence information will be generated. Bioinformatics tools will be developed; these are critical for analysis and comparison of sequence data.
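The slide does not say how microbial diversity would be quantified. One common choice is the Shannon index, sketched below as an assumption rather than as the project's method. The function names and the taxon counts in the usage block are hypothetical and are not experimental data.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


def pielou_evenness(counts):
    """Evenness J = H / ln(S), where S is the number of observed taxa."""
    observed = sum(1 for n in counts.values() if n > 0)
    if observed < 2:
        return 0.0  # evenness is not informative for fewer than two taxa
    return shannon_index(counts) / math.log(observed)


if __name__ == "__main__":
    # Hypothetical read counts per taxon for two CO2 treatments (made-up values).
    ambient = {"taxon_a": 400, "taxon_b": 350, "taxon_c": 250}
    elevated = {"taxon_a": 800, "taxon_b": 150, "taxon_c": 50}
    for label, sample in (("ambient", ambient), ("elevated", elevated)):
        print(label, round(shannon_index(sample), 3), round(pielou_evenness(sample), 3))
```

Comparing such indices across CO2 treatments is only a first summary; functional capacity would need annotation of the reads, which this sketch does not attempt.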
Bioinformatics for Gene – Phenotype: BRICNET for Bioenergy (Diagram linking the gene level to the phenotype level. Labels: GENOME, CELL, CROP; DNA SEQUENCES, MOLECULAR MARKERS, QTL, CROP PHENOTYPE; TRANSCRIPTS, PROTEINS, METABOLITES; GENE FUNCTION AND NETWORK; CELL MODEL (Pathway Level), CROP MODEL (Process Level); G X E M X; Quantitative Genetics, Marker-assisted selection, Marker Technology, Molecular Genetics, Functional Genomics, Phenomics.)
Gaps in current CI (Example: current Track 1 on Bioenergy) DNA sequence → Phenotype • Outsourcing of Data Analysis • Sequence, Microarray, Genomic-SSRs, EST-SSRs, miRNA analysis • miRNA/siRNA: 20-30 million reads • Washington University, St. Louis China • Sequence processing and analysis • Switchgrass ESTs: 1 million reads (454 sequencing) • Oregon University Danforth Center, St. Louis (MO) • Assembly and annotation • Lack of infrastructure / trained bioinformatics personnel
Gaps in current CI DNA sequence → Phenotype • Outsourcing of Data Analysis • Metagenomics • Data assembly, classification, annotation: • Pittsburgh Supercomputing Center • Current resources are overwhelmed • Lack of integration of data streams from several disciplines • Time lag – Weeks to months and queuing at peak demand
Bioinformatics for Biosecurity: A bioinformatics approach to understanding host-pathogen interactions. Effectors: Bacterial proteins that are injected into the host cell through a type III secretion system to manipulate it. Ralstonia solanacearum is one of the world’s most important plant pathogens; it has a very wide host range and causes significant losses to agriculture. Burkholderia pseudomallei is a broad-host-range bacterium that causes melioidosis in humans and various livestock. Interestingly, it can also infect a few plant species. Goals: Use a bioinformatics (machine learning) approach to identify host proteins that could potentially interact with bacterial effectors. Experimentally validate the potential interactions using techniques such as yeast two-hybrid analysis and bimolecular fluorescence complementation.
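The slide names machine learning as the approach but does not specify features or a model. As one possible first step, the sketch below encodes a candidate host-effector protein pair as a fixed-length vector of amino acid composition, a simple and widely used sequence encoding. This choice of features, the function names, and the example sequences are all assumptions made for illustration.

```python
# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (e.g. X) are ignored, so the fractions sum to 1
    over the standard residues that are present.
    """
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def pair_features(host_seq, effector_seq):
    """Concatenate host and effector compositions into one 40-value vector."""
    return composition(host_seq) + composition(effector_seq)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real host or effector proteins.
    host = "MKTAYIAKQRQISFVKSHFSRQ"
    effector = "MSLNQPTTGAVLLDE"
    vector = pair_features(host, effector)
    print(len(vector), [round(v, 2) for v in vector[:5]])
```

Vectors like this could be passed to any standard classifier trained on known interacting and non-interacting pairs; predicted interactions would still need the experimental validation described in the goals.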
Integration of Scientific Themes into CI Framework: Development of BCB resources for Research, Education and OutREACH (Diagram of the BRICNET Collaborative. Labels, by apparent group: Cyber-enabled research themes: Disease Forecasting, Bioenergy, Biosecurity, Biomedicine, Systems Biology. Data sources: Sensor Networks (Micro Climate), Eddy Flux (Macro climate), Microbiome sequencing, Sequence Data (ACGT, OU), Sequence Data (OMRF), Sequence Data, cyberCommons. Infrastructure: OSUHPCC, OSCER, OK BioGRID, SR NF, Storage Server, Servers. Deliverables: Decision Support Tools, Software, Algorithms, Visualization Tools, Databases. Outreach: Researchers, End Users, App Developers, Community, Higher Ed.)
BRICNET Participants (Oklahoma) • University of Oklahoma (8): Botany: L.E. Bartley; Chemistry & Biochemistry: B.A. Roe, F. Najar, S.W. Clifton; Computer Information Sciences: H. Neeman; Microbiology: T. Conway, J. Grissom; Oklahoma Climatological Survey: J.B. Basara • Samuel Roberts Noble Foundation (2): Bioinformatics: P.X. Zhao; Plant Biology: K.S. Mysore • Oklahoma Medical Research Foundation (1): Clinical Immunology: J.D. Wren • Cameron University (1): Biology: L. Peal • Oklahoma State U, Stillwater (15): Biochemistry: R. Kaundal, U. Melcher, P. Hoyt, M. Mahalingam; Botany: M. Palmer; Computer Science: S. Kak; Industrial Engineering: B. Balasundaram, S. Bukkapatnam; Information Technology: D. Brunson; Microbiology: B. Fathepure; Plant Pathology: J. Fletcher, S. Marek; Plant & Soil Sciences: G. Kakani, M. Anderson; Statistics: M. Payton • Oklahoma State U, Tulsa (1): Center for Health Sciences: R. Kaul • Langston U (3): Biology: K.J. Abraham, G. Naidoo; Computer Information Sciences: P.F. Tiako • Oklahoma City U (1): Computer Science: K. Sha
OUTREACH: Develop BioREACH program • Overcome • Understanding • Training • Researchers • End users • App developers • Community • Higher education
Summer Schools • Summer 2013, 2014, 2015 • 1 week @ each participating institute (rotation-wise) • Lectures plus hands-on exercises to students • Students of differing backgrounds (Bio + CS), minorities • Reaching a wider audience • Lectures, exercises, video, on web • More tutorials, 3-4/year • Students, postdocs, scientists • Agency specific tutorials
Summary: CI for Bioinformatics • Enables large, lasting improvements in education, research, intrastate collaboration and economic development across Oklahoma • Leverages existing resources across Oklahoma • Aligns with objectives of individual researchers, teams, the state, other OK RII themes, and the NSF • Spreads the capabilities without diluting the focus • Lots of NSF funding opportunities
Thank You for your Attention! BRICNET? ECFN