Bioinformatics Topics Not Covered in this Course (BMI 730) • Kun Huang • Department of Biomedical Informatics, Ohio State University
Non-coding RNA • MicroRNA Related Bioinformatics Issues • MicroRNA prediction and recognition • Secondary structure prediction • Target prediction • Microbial Related Bioinformatics • Metagenomics • Other Omics • Other Informatics
Non-coding RNA • Non-coding DNA • Junk DNA • Pseudogenes • Retrotransposons, e.g., human endogenous retroviruses (HERVs) • C-value enigma (e.g., the Amoeba dubia genome has more than 670 billion base pairs; the pufferfish genome is about 1/10 the size of the human genome) • Findings from ENCODE – the genome is pervasively transcribed (the majority of its bases appear in primary transcripts)
Non-coding RNA (ncRNA) • Any RNA molecule that is not translated into a protein • Related terms: sRNA, npcRNA, nmRNA, snmRNA, fRNA • Includes tRNA, rRNA, snoRNA, microRNA (miRNA), siRNA, piRNA, long ncRNAs (e.g., Xist), shRNA • Note the difference between siRNA and miRNA
Non-coding RNA (ncRNA) • RNA-induced silencing complex (RISC) • RNA-induced transcriptional silencing (RITS)
MicroRNA (miRNA) • Another level of regulation
MicroRNA (miRNA) • [Figure, panels a–c: the mir-17-92 cluster (miR-17-5p, 17-3p, 18a, 19a, 20a, 19b, 92-1) and its interactions with Myc and E2F1/E2F2/E2F3] • Reviewed by: Coller et al. (2007), PLoS Genet 3(8): e146 • Figures from Dr. Baltz Agula
Non-coding RNA • MicroRNA Related Bioinformatics Issues • Secondary structure prediction • MicroRNA prediction and recognition • Target prediction • Databases
Secondary structure prediction • Applications • RNA folding dynamics • ncRNA discovery • Microarray probe validation/comparison • Wang et al. Genome Biology 2004 5:R65
Secondary structure prediction • Physics-based models • Free-energy minimization by dynamic programming or other optimization schemes • Parameters come from empirical studies of RNA structural energetics (e.g., nearest-neighbor interactions of stacked base pairs, measured with synthesized oligonucleotides) • Limited by what experimental procedures can measure • Simplified scoring models are used • Most ignore the sequence dependence of hairpin, bulge, internal, and multi-branch loop energies • Multi-branch loop energies rely on ad hoc scores • Still among the top-performing methods • Mfold, ViennaRNA, PKnots, RDfold, etc.
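The dynamic-programming idea behind these tools can be sketched with the simpler Nussinov algorithm, which maximizes the number of base pairs instead of minimizing a nearest-neighbor free energy. This is a stand-in for illustration only: Mfold and ViennaRNA score stacks and loops with empirically derived energies, not pair counts. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 nt are assumptions of this sketch.

```python
# Nussinov base-pair maximization: a simplified stand-in for free-energy
# minimization. It counts base pairs; it does not use nearest-neighbor
# energies as Mfold or ViennaRNA do.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closed by a hairpin


def nussinov(seq: str) -> tuple[int, str]:
    """Return (maximum number of pairs, dot-bracket structure)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive);
    # spans too short to close a hairpin stay at 0.
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                # case 1: j is unpaired
            for k in range(i, j - MIN_LOOP):      # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: rebuild one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`. Replacing the "+1 per pair" score with loop- and stack-dependent energies, and max with min, is the step that turns this skeleton into a free-energy method.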
Secondary structure prediction • Probabilistic approach • Stochastic context-free grammars (SCFGs), e.g., QRNA • Grammar rules induce a joint probability distribution over possible RNA structures and sequences • Parameters are easily learned from data, without thermodynamic experiments • Parameters may lack physical meaning • Performance has been inferior to physics-based methods • Extension: conditional log-linear models (CLLMs), e.g., CONTRAfold • Integrate the learning procedure with energy-based scoring systems
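To make "grammar rules induce a joint probability distribution" concrete, here is a minimal sketch of the inside algorithm for a toy, unambiguous SCFG with three rules (unpaired base, base pair, end). The grammar and every probability in it are hypothetical, chosen only so the distribution is proper; this is not the grammar used by QRNA or CONTRAfold.

```python
# Inside algorithm for a toy, unambiguous stochastic context-free grammar:
#   S -> x S        (x unpaired)          prob P_UNPAIRED * EMIT1[x]
#   S -> x S y S    (x pairs with y)      prob P_PAIR * EMIT2[x + y]
#   S -> (empty)                          prob P_END
# All numbers are hypothetical; real SCFG tools learn them from known RNAs.
from itertools import product

BASES = "ACGU"
P_UNPAIRED, P_PAIR, P_END = 0.6, 0.15, 0.25  # sum to 1; 0.6 + 2*0.15 < 1 keeps it proper
EMIT1 = {b: 0.25 for b in BASES}
EMIT2 = {}
for x, y in product(BASES, repeat=2):
    if x + y in ("AU", "UA", "GC", "CG"):
        EMIT2[x + y] = 0.2
    elif x + y in ("GU", "UG"):
        EMIT2[x + y] = 0.05
    else:
        EMIT2[x + y] = 0.01  # the 10 non-canonical pairs share the remaining 0.1


def inside_probability(seq: str) -> float:
    """P(seq) under the toy grammar, summed over all secondary structures."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # inside[i][j] = P(S derives seq[i:j]) for half-open intervals, i <= j
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = P_END
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S: seq[i] unpaired, S derives seq[i+1:j]
            total = P_UNPAIRED * EMIT1[seq[i]] * inside[i + 1][j]
            # S -> x S y S: seq[i] pairs with seq[k]
            for k in range(i + 1, j):
                total += (P_PAIR * EMIT2[seq[i] + seq[k]]
                          * inside[i + 1][k] * inside[k + 1][j])
            inside[i][j] = total
    return inside[0][n]
```

Because each structure has exactly one derivation, summing over derivations is summing over structures. Replacing the sums with max gives the CYK (Viterbi) variant, which returns the single most probable structure instead of the total probability.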
Secondary structure prediction • [Figure: pknotsRG and CONTRAfold]
Secondary structure prediction • Comparative approach • Single-sequence prediction methods (physics-based, SCFG) have difficulty searching all possible configurations • Structures that have been conserved by evolution are far more likely to be the functional form
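One widely used comparative signal is covariation: if two alignment columns change together across species (G-C in one, A-U in another), they are likely base-paired. The sketch below computes mutual information between two columns of a multiple alignment; treating gaps as an ordinary symbol is a simplifying assumption, and this is one classical covariation measure, not the specific scoring of any tool named above.

```python
# Mutual information between two alignment columns, a classic covariation
# signal for comparative structure prediction. Columns that vary together
# score high; a fully conserved column scores 0 because it has no variation.
from collections import Counter
from math import log2


def column_mutual_information(col_i: str, col_j: str) -> float:
    """MI in bits between two equally long alignment columns (one char per sequence)."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi
```

With four sequences whose columns read `"GACU"` and `"CUGA"` (every species keeps a Watson-Crick pair, but the pair itself changes), the score is 2 bits; with `"GGGG"` and `"CCCC"` it is 0, even though those positions may well pair. That is why covariation complements, rather than replaces, energy-based folding.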
MicroRNA prediction and discovery • Experimental approaches: cloning • MicroRNA arrays (OSU microarray facility) • Massively parallel sequencing • Size-select RNA fragments of 20–25 nt • Sequence on Solexa/SOLiD platforms • Map reads to the genome • Enrichment analysis / peak calling • Experimental validation
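The sequencing workflow above (size selection, mapping, peak calling) can be sketched in a few functions. This is a toy under stated assumptions: forward strand only, exact string matching, reads and genome in the same alphabet, and a hypothetical `MIN_READS` threshold. Real pipelines use short-read aligners that tolerate mismatches and both strands, plus statistical peak callers.

```python
# Toy small-RNA read pipeline: size selection, exact-match mapping, and
# naive "peak calling" by counting reads per genomic start position.
from collections import Counter

MIN_LEN, MAX_LEN = 20, 25   # size-selection window from the slide (nt)
MIN_READS = 3               # hypothetical support threshold for a peak


def size_select(reads):
    """Keep reads whose length falls in the miRNA size window."""
    return [r for r in reads if MIN_LEN <= len(r) <= MAX_LEN]


def map_exact(reads, genome):
    """Count reads at every genome position where they match exactly (forward strand)."""
    starts = Counter()
    for read in reads:
        pos = genome.find(read)
        while pos != -1:
            starts[pos] += 1
            pos = genome.find(read, pos + 1)
    return starts


def call_peaks(start_counts):
    """Return sorted (position, read count) pairs with at least MIN_READS reads."""
    return sorted((p, c) for p, c in start_counts.items() if c >= MIN_READS)
```

Usage: `call_peaks(map_exact(size_select(reads), genome))` yields candidate loci, which would then go to folding and experimental validation.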
MicroRNA prediction and discovery • Bioinformatics / machine learning approach • Wang et al. Genome Biology 2004 5:R65
MicroRNA prediction and discovery • Bioinformatics / machine learning approach • Using evolutionary information Nam, J.-W. et al. Nucl. Acids Res. 2005 33:3570-3581; doi:10.1093/nar/gki668
MicroRNA prediction and discovery • Bioinformatics / machine learning approach • Support vector machines (SVMs), which require features • Features: • Sequence features • Nucleotide frequency counts • Total G/C content • Folding features • Pairing propensity • Minimum free energy (MFE) • Topological features • Packing ratio
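A minimal sketch of turning a candidate hairpin into a feature vector of the kind listed above. The dot-bracket structure and MFE are assumed to come from an external folding program; the fraction of paired bases is used here as a simple stand-in for pairing propensity, and the feature names are illustrative. Topological features such as the packing ratio are left out because their exact definition depends on the tool.

```python
# Sequence and folding features for a candidate pre-miRNA hairpin, the kind
# of vector an SVM would be trained on. Structure and MFE are inputs from an
# external folding tool (assumption); feature names are illustrative only.


def hairpin_features(seq: str, dot_bracket: str, mfe: float) -> dict[str, float]:
    seq = seq.upper().replace("T", "U")
    if not seq or len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must be non-empty and equally long")
    n = len(seq)
    # Sequence features: per-nucleotide frequencies and total G/C content
    features = {f"freq_{b}": seq.count(b) / n for b in "ACGU"}
    features["gc_content"] = (seq.count("G") + seq.count("C")) / n
    # Folding features: fraction of paired positions and MFE (raw and per nt)
    features["paired_fraction"] = sum(ch in "()" for ch in dot_bracket) / n
    features["mfe"] = mfe
    features["mfe_per_nt"] = mfe / n
    return features
```

Vectors like this, computed for known pre-miRNAs and for negative hairpins, would form the training set; normalizing MFE by length keeps long and short candidates comparable.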
MicroRNA target prediction • Experimental / bioinformatics approach • BLAST can identify thousands of potential targets; how do we pin down the real ones?
MicroRNA target prediction • Computational / bioinformatics approach • Mutually exclusive expression patterns between a miRNA and its targets • Microarray screening • Presence of a complementary sequence • Context scores computed from site features • Machine learning approaches (e.g., SVM, regression) • Bartel, D. P. "MicroRNAs: Target Recognition and Regulatory Functions." Cell 136(2):215–233, 23 January 2009
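The "complementary sequence" criterion is usually applied to the miRNA seed (nucleotides 2–8). The sketch below scans a 3′ UTR for the canonical site types described in the Bartel (2009) review cited above. It deliberately ignores 3′ supplementary pairing, context features, and conservation, which real target predictors add on top.

```python
# Scan a 3' UTR for sites complementary to the miRNA seed, classified into
# the canonical site types:
#   8mer    = match to miRNA nt 2-8, plus an A opposite nt 1
#   7mer-m8 = match to nt 2-8
#   7mer-A1 = match to nt 2-7, plus an A opposite nt 1
#   6mer    = match to nt 2-7 only
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def find_seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (0-based start of site in the UTR, site type) for each seed match."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp(mirna[1:7])      # mRNA bases pairing with miRNA nt 2-7
    m8 = COMPLEMENT[mirna[7]]       # mRNA base pairing with miRNA nt 8
    sites = []
    for p in range(len(utr) - len(core) + 1):
        if utr[p:p + 6] != core:
            continue
        has_m8 = p > 0 and utr[p - 1] == m8            # extends the match 5' on the mRNA
        has_a1 = p + 6 < len(utr) and utr[p + 6] == "A"  # A opposite miRNA nt 1
        if has_m8 and has_a1:
            sites.append((p - 1, "8mer"))
        elif has_m8:
            sites.append((p - 1, "7mer-m8"))
        elif has_a1:
            sites.append((p, "7mer-A1"))
        else:
            sites.append((p, "6mer"))
    return sites
```

For let-7a (`UGAGGUAGUAGGUUGUAUAGUU`) the 7mer-m8 site is `CUACCUC`, so `find_seed_sites(let7a, "AAACUACCUCAGGG")` reports one 8mer at position 3. A seed scan alone still returns many candidates, which is why context scores and machine learning are layered on top.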
Databases • MicroRNA.org: http://www.microrna.org/microrna/getMirnaForm.do • miRBase: http://microrna.sanger.ac.uk • … • Target prediction • miRDB • TargetScan (http://targetscan.org) • PicTar (http://pictar.bio.nyu.edu) • miRanda (part of the Sanger database) • MirTarget • … • Software • List at http://en.wikipedia.org/wiki/List_of_RNA_structure_prediction_software
Metagenomics • Study of genetic material recovered directly from environmental samples • A community of species, e.g., the microbes in a cow's stomach • Challenges: • Who is there? • How many? • 16S rRNA: highly conserved, amplified with universal primers, used for profiling • forward: AGA GTT TGA TCC TGG CTC AG • reverse: ACG GCT ACC TTG TTA CGA CTT • Next-generation sequencing – more genes (chicken-and-egg) • Community metabolism – identify metabolic pathways within the community • New challenges: comparative studies
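To show how the universal primers above delimit the region used for profiling, here is a minimal sketch that finds both primer sites in a 16S sequence and extracts the amplicon between them. Simplifying assumptions: exact matching only, no degenerate (IUPAC) bases, and the input is the sense strand, so the reverse primer appears there as its reverse complement.

```python
# Locate the slide's 16S rRNA primer sites in a sense-strand sequence and
# extract the amplicon between them (primers included). Exact matching only;
# real pipelines allow mismatches and degenerate primer bases.
from typing import Optional

FORWARD = "AGAGTTTGATCCTGGCTCAG"
REVERSE = "ACGGCTACCTTGTTACGACTT"
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna))


def extract_amplicon(sequence: str) -> Optional[str]:
    """Return the forward-to-reverse primer region, or None if either site is missing."""
    sequence = sequence.upper()
    start = sequence.find(FORWARD)
    if start == -1:
        return None
    rev_site = reverse_complement(REVERSE)  # how the reverse primer site reads on the sense strand
    end = sequence.find(rev_site, start + len(FORWARD))
    if end == -1:
        return None
    return sequence[start:end + len(rev_site)]
```

Applied to every read or assembled contig in a sample, the extracted amplicons are what would then be clustered and classified to answer "who is there?" and "how many?".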