Oryza
Arjan van Zeijl, Claire Lessa Alvim Kamei, Robert van Loo, Ruud Heshof
BIF-30806, 8-3-2013

Goal
• Generate a platform to analyze gene expression of Saccharomyces cerevisiae using RNAseq data.
• Compare highly expressed genes vs. lowly expressed genes on exon-intron length, GC content, and codon usage.

MoSCoW
• Must: TopHat, Cufflinks
• Should: exon-intron length, GC content
• Could: GO annotation, codon usage, palindromes
• Would: chemostat analysis, Cytoscape

Pipeline
[Flowchart. Node labels: RNAseq data (trimmed, untrimmed); TopHat; Cufflinks; exon-intron length; GC content; sequence retrieval; GO terms; NCBI data; palindrome; codon usage; validation]

Data output
RNAseq data: selected the top 100 genes, by FPKM value, from each 20% batch of the total genes.

Class    Batch     Genes selected
Perc 1   0-20%     100
Perc 2   20-40%    100
Perc 3   40-60%    100
Perc 4   60-80%    100
Perc 5   80-100%   100

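
The selection step on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `top_genes_per_batch` and the input mapping are hypothetical, and the sketch assumes that class 1 (0-20%) holds the most highly expressed genes, which the slide does not state explicitly.

```python
def top_genes_per_batch(fpkm_by_gene, n_batches=5, top_n=100):
    """Rank genes by FPKM, split the ranking into equal batches,
    and keep the top_n genes of each batch.

    fpkm_by_gene: dict mapping gene ID -> FPKM value (hypothetical input).
    Returns a dict mapping class number (1..n_batches) -> list of gene IDs.
    """
    # Assumption: class 1 is the highest-expressed 20% of the genes.
    ranked = sorted(fpkm_by_gene, key=fpkm_by_gene.get, reverse=True)
    batch_size = len(ranked) // n_batches
    batches = {}
    for i in range(n_batches):
        start = i * batch_size
        # The last batch absorbs any remainder genes.
        end = len(ranked) if i == n_batches - 1 else start + batch_size
        batches[i + 1] = ranked[start:end][:top_n]
    return batches
```

With roughly 6,000 yeast genes, each 20% batch would hold well over 100 genes, so every class contributes a full list of 100.
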
NCBI data
LOCUS       NP_014825                 63 aa            linear   PLN 25-FEB-2013
DEFINITION  ribosomal 40S subunit protein S30B [Saccharomyces cerevisiae S288c].
ACCESSION   NP_014825
VERSION     NP_014825.3  GI:398365605
DBSOURCE    REFSEQ: accession NM_001183601.3
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae S288c
  ORGANISM  Saccharomyces cerevisiae S288c
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Saccharomycetaceae;
            Saccharomyces.
...

Exon-intron length
Ribosomal 40S subunit protein S30B

ID       SHORT   EXON  INTRON  FPKM     CDS  GC_CDS  L_PALIN  GC_PALIN
YOR182C  RPS30B  192   412     15623.7  189  41.27   0        -

GC content
Does more GC mean more mRNA?
Claire

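
To compare GC content between expression classes, a GC percentage per coding sequence is needed. The sketch below shows one straightforward way to compute it and to average it per class; the function names and the input layout are assumptions for illustration, not the authors' code.

```python
def gc_content(seq):
    """Return the GC percentage of a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def mean_gc_by_class(cds_by_class):
    """Average GC percentage per expression class.

    cds_by_class: dict mapping class label -> list of CDS strings
    (hypothetical layout). Classes with no sequences are skipped.
    """
    return {
        cls: sum(gc_content(s) for s in seqs) / len(seqs)
        for cls, seqs in cds_by_class.items()
        if seqs
    }
```
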
Palindrome
• Inverted repeat (IR): at least 6 bp long, with a spacer of at most 10 bp
• Conservation: the IR must be identical, the spacer need not be
Reference: Humphrey-Dixon EL, Sharp R, Schuckers M, Lock R (2011). Comparative genome analysis suggests characteristics of yeast inverted repeats that are important for transcriptional activity. Genome 54(11):934-42.

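
The IR criteria above (arm of at least 6 bp, spacer of at most 10 bp) can be illustrated with a naive scan. This is a minimal sketch, not the method of the cited paper or of the authors: it only reports seed arms of exactly `min_arm` bp, so a longer repeat shows up as several overlapping hits, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, min_arm=6, max_spacer=10):
    """Return a list of (start, left_arm, spacer) tuples.

    A hit is a left arm of min_arm bp whose reverse complement occurs
    downstream, separated by a spacer of 0..max_spacer bp.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * min_arm + 1):
        arm = seq[start:start + min_arm]
        # Skip arms containing ambiguous bases such as N.
        if set(arm) - set("ACGT"):
            continue
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            right = start + min_arm + spacer
            if seq[right:right + min_arm] == target:
                hits.append((start, arm, seq[start + min_arm:right]))
    return hits
```

The conservation filter (identical IR in all four species) would be a separate step on top of this scan and is not shown here.
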
Palindrome
• Comparative analysis in 4 Saccharomyces genomes: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus
• IRs in S. cerevisiae that are conserved in the 4 species
• Crossed the top 100 gene lists with the palindrome list to create 3 hash tables using the gene ID as keys: %gene_palin, %gene_palinseq, %GC_palin

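
The crossing step can be sketched with Python dictionaries in place of the three hash tables. What each table stores is an assumption here (palindrome length, palindrome sequence, and palindrome GC percentage, inferred from the L_PALIN and GC_PALIN columns shown earlier), and the sketch assumes at most one palindrome per gene.

```python
def cross_with_palindromes(top_genes, palindromes):
    """Build three gene-keyed tables for genes that have a palindrome.

    top_genes: iterable of gene IDs from a top 100 list.
    palindromes: dict mapping gene ID -> palindrome sequence
    (hypothetical layout, one palindrome per gene).
    """
    gene_palin, gene_palinseq, gc_palin = {}, {}, {}
    for gene_id in top_genes:
        seq = palindromes.get(gene_id)
        if not seq:
            continue  # gene has no palindrome in the list
        upper = seq.upper()
        gene_palin[gene_id] = len(seq)     # assumed content: length
        gene_palinseq[gene_id] = seq       # assumed content: sequence
        gc_palin[gene_id] = 100.0 * (upper.count("G") + upper.count("C")) / len(seq)
    return gene_palin, gene_palinseq, gc_palin
```
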
Palindrome length
[Figure: palindrome length by percentile class]

Codon usage
• Previous studies indicated a more extreme codon usage preference in highly expressed genes (Sharp, 1986; Plotkin, 2011)
• Codon usage bias was shown to correlate with tRNA abundance (Sharp, 1986)
• Non-optimal codons might slow down translation to allow correct protein folding (Pechmann, 2013)
• Hot topic: 2 papers in Nature this week
• Non-optimal codon usage is important for circadian clock rhythms

Codon usage
• Measure: Relative Synonymous Codon Usage (RSCU)
• Took the mean RSCU over the genes in the top 100 for each class
• Annotation problem: CDS length is not always divisible by three

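
RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a single CDS using the standard genetic code. Dropping a trailing partial codon is just one possible workaround for the divisibility problem mentioned above, not necessarily what was done in the project, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return a dict codon -> RSCU for one coding sequence.

    Stop codons and amino acids absent from the CDS are left out.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # assumption: ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    codons_by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        codons_by_aa[aa].append(codon)

    values = {}
    for aa, codons in codons_by_aa.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        expected = total / len(codons)  # count under equal synonymous usage
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values
```

A per-class mean would then average these per-gene values codon by codon, skipping genes in which the corresponding amino acid does not occur.
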
GO term enrichment
• Long list for the top 100
• Basically two groups of processes, components, and functions:
  • Ribosome and translation related
  • Glycolysis/gluconeogenesis related
• Zoom in on part of the table

Validation
• Technical validation using 4 paired-end RNA-seq reads
• Created multiple copies (200 in total, 25% each)
• Ran the pipeline: 5 hits found (one read maps to two homologous genes on two chromosomes)
• FPKM values are not equal, which is expected given the large length differences

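
Why equal read counts still give unequal FPKM values follows from the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below uses that simple formula; Cufflinks estimates FPKM with a more elaborate statistical model, and the transcript lengths in the usage lines are made-up illustration values, not data from the project.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: two transcripts each receive 50 of 200 fragments,
# but one is four times longer than the other.
short_transcript = fpkm(50, 500, 200)
long_transcript = fpkm(50, 2000, 200)
ratio = short_transcript / long_transcript  # the shorter transcript scores higher
```

Because the fragment count is divided by transcript length, the same number of fragments yields a higher FPKM on a shorter transcript.
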
Conclusion
• Highly expressed genes have a high chance of containing introns.
• There is a correlation between palindrome length and gene expression.
• There is a codon usage preference in highly expressed genes.
• Highly expressed genes are richer in GC content and are shorter.
• Large differences exist in GC content, intron/exon length, palindromes, and GO terms between the top 100 and the rest.