Geuvadis WP4: RNA sequencing Progress, Aims and Data

Tuuli Lappalainen, University of Geneva
Geuvadis Analysis Group Meeting, April 16, 2012, Geneva

Genomics, meet transcriptomics
• RNA sequencing of ~500 individuals from the 1000 Genomes populations FIN, GBR, CEU, TSI and YRI
• Integrated haplotypes of SNPs, indels and structural variants (~13M variants in total) + mRNAseq + miRNAseq

Why are we doing this?
• We might want to do RNAseq on a big scale. What do we get out of it? How should we do it?
  • At least we did lots of cool science. This is how we created the data and analyzed it.
• I have all these variants from my sequencing study but I don't know what's functional.
  • Here's a pretty good catalogue of regulatory variants. We can also start to predict functional consequences of novel variants based on their properties.
• I want to use 1000g data in my research, but is there any functional data available?
  • Yes: this is the largest genome+transcriptome reference dataset thus far. You can use it in your own research (after our paper is out).

Samples
• Transformed lymphoblastoid cell lines from Coriell & UNIGE
• Cell culture at ECACC: cell pellets for RNA isolation + cell banks for all the partners
• RNA extracted at UNIGE
• Sequencing in 7 partner labs
• Randomization of the sample processing
• Samples per lab: UU 48; ICMB 72; LUMC and MPIMG 72 and 48; HMGU 48; UNIGE 116+168; CRG/CNAG/USC 96

Sequencing
• mRNAseq: 2 x 75bp, minimum of 20M mapping reads per sample
  • total ~15 billion mapping reads
• miRNAseq: 1 x 36bp, minimum of 3M total reads per sample
  • total ~1 billion mapping reads
• All sequencing on HiSeq with the latest TruSeq kits
  • standardization of the methods as much as possible
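A minimal sketch of how the per-sample minimums above could be checked, assuming read counts are already available as a mapping from sample ID to count. The function name, the dictionary layout and the example counts are hypothetical and not part of the WP4 pipeline; only the two thresholds come from the slide.

```python
# Per-sample minimums from the slide: 20M mapping reads for mRNAseq,
# 3M total reads for miRNAseq.
MIN_READS = {"mRNA": 20_000_000, "miRNA": 3_000_000}


def samples_below_minimum(read_counts, library):
    """Return the sorted sample IDs whose read count is below the minimum.

    read_counts: dict of sample ID -> read count (mapping reads for mRNA,
                 total reads for miRNA, as on the slide).
    library:     "mRNA" or "miRNA".
    """
    threshold = MIN_READS[library]
    return sorted(s for s, n in read_counts.items() if n < threshold)


# Hypothetical counts, for illustration only.
example_counts = {
    "sample_A": 31_500_000,
    "sample_B": 18_200_000,
    "sample_C": 20_000_000,  # exactly at the minimum, so not flagged
}
print(samples_below_minimum(example_counts, "mRNA"))
```

Samples flagged this way would be candidates for additional sequencing or exclusion, depending on what the consortium decides.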

Progress and timeline
[Timeline figure, 2010 to 2012: study design and sample selection; cell line shipments and growing; RNA extraction pilot; 2010 pilot of 5 samples in 7 labs; RNA extraction; sequencing; mapping and QC (Thomas / Tuuli); paper submission 10/12]

Documentation: wiki
http://www.geuvadis.org/group/geuvadis/wikis
Tech support from Gabrielle (gabrielle.bertier@crg.eu)
Contents of the WP4 Wiki
• Analysis
  • Analysis results, methods, etc.
• Data storage
  • Locations and descriptions of data files found in EBI (ENA/ArrayExpress or FTP site)
  • The Wiki is only for sharing small result files, not actual data
• Partners and contact info
  • WP4 participants
• Protocols
  • Protocols, from cells to fastq files
• Samples
  • Information on the samples included in the project, including sample lists for sequencing
• Teleconference minutes
• Presentations
  • Presentation slides and abstracts
Documentation of any analysis that is used by the consortium is obligatory.

Data storage: ftp
ftp://ftp-private.ebi.ac.uk/upload/geuvadis/wp4_rnaseq/main_project/
Tech support from Natalja (natalja@ebi.ac.uk)

Status of the data: mRNA
• Fastqs
  • All filtered, uploaded to ftp, sample information sheets sorted out, checksums OK
  • 464 samples in total (1 failed sequencing QC)
• Mapping
  • bwa (Tuuli/Ismael): all done and uploaded to the ftp site
  • GEM (Micha/Thasso/Paolo): GEM files are done, bam conversion coming
• Quantifications
  • Exon quantifications
    • bwa: all done and uploaded to the ftp site
    • GEM, deconvoluted from flux: ready to upload?
    • GEM, read counts: once the bams are done
  • Transcript quantifications from flux: ready to upload?
• QC and normalization
  • No sample swaps
  • 5 samples show signs of cross-contamination
  • Expression outliers: soon
  • QTL analysis needs normalization to remove technical variation
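A minimal sketch of the kind of check behind "checksums OK", assuming MD5 checksums and a mapping from file path to expected hex digest. The slide does not say which checksum algorithm or file layout was used, so both are assumptions, and the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large fastq files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected):
    """Return the paths whose computed MD5 differs from the expected one.

    expected: dict of file path -> expected MD5 hex digest (assumed format).
    An empty result means every file matched.
    """
    return [
        path
        for path, md5 in expected.items()
        if md5_of_file(path) != md5.lower()
    ]
```

Calling verify_checksums on the uploaded fastq files and getting an empty list back would correspond to the "checksums OK" status.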

mRNA quality statistics

mRNA quality statistics: replicates (reference: HG00117, lab1_batch1)

mRNA quality statistics: all full-coverage samples

Status of the data: miRNA
• Fastqs
  • All except 48 from Kiel uploaded to ftp, sample information sheets sorted out, checksums OK
• Processing of the data ongoing (Marc F)
  • trimming, mapping, QC

Status of the data: genotypes
• 422 individuals from 1000g Phase 1 are OK
  • genotypes in the final format uploaded to the ftp site
• imputation of the Phase 2 individuals
  • issues either with the input haplotypes from 1000g or filtering of the reference panel…
• annotation of the variants
  • most of the information from 1000g Functional Interpretation Group + additional info by Tuuli and Manny
  • will be included in the vcf files, format customized from VAT and documented in the wiki
Example annotation (value: field name)
  VA=1: AlleleNumber
  C1orf159: GeneName
  ENSG00000131591.12: GeneID
  -: Strand
  nonsynonymous: Type
  2/8: FractionOfTranscriptsAffected
  C1orf159-201: TranscriptName
  ENST00000294576.5: TranscriptID
  23468_23597: ExonStartPosGenomic_ExonEndPosGenomic
  3/7: ExonNumber/TotalExonNumberInTranscript
  1035_944_315_R->Q_1035: TranscriptLength_PositionOfVariantInTranscript_PositionOfAminoAcidInPeptide_AminoAcidChange_AltAlleleTranscriptLength
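A rough sketch of how the annotation shown on this slide could be read programmatically. It assumes the values are joined with ":" in the order listed, as in VAT-style annotations; the actual delimiter and layout of the customized format are documented in the wiki, not here, so treat that as an assumption. The parse_va function and the TranscriptDetails placeholder name are hypothetical.

```python
# Field names in the order shown on the slide. "TranscriptDetails" is a
# hypothetical placeholder for the final underscore-joined group.
VA_FIELDS = [
    "AlleleNumber",
    "GeneName",
    "GeneID",
    "Strand",
    "Type",
    "FractionOfTranscriptsAffected",
    "TranscriptName",
    "TranscriptID",
    "ExonStartPosGenomic_ExonEndPosGenomic",
    "ExonNumber/TotalExonNumberInTranscript",
    "TranscriptDetails",
]

# Sub-fields of the final group, also in slide order.
DETAIL_FIELDS = [
    "TranscriptLength",
    "PositionOfVariantInTranscript",
    "PositionOfAminoAcidInPeptide",
    "AminoAcidChange",
    "AltAlleleTranscriptLength",
]


def parse_va(value):
    """Split one VA annotation value into a dict of named fields.

    Assumes ":" between fields and "_" inside the last group.
    """
    parts = value.split(":")
    if len(parts) != len(VA_FIELDS):
        raise ValueError(f"expected {len(VA_FIELDS)} fields, got {len(parts)}")
    record = dict(zip(VA_FIELDS, parts))
    details = record.pop("TranscriptDetails").split("_")
    if len(details) != len(DETAIL_FIELDS):
        raise ValueError(f"expected {len(DETAIL_FIELDS)} detail fields")
    record.update(zip(DETAIL_FIELDS, details))
    return record


# The example from the slide, joined under the assumed ":" delimiter.
example = (
    "1:C1orf159:ENSG00000131591.12:-:nonsynonymous:2/8:C1orf159-201:"
    "ENST00000294576.5:23468_23597:3/7:1035_944_315_R->Q_1035"
)
record = parse_va(example)
print(record["GeneName"], record["Type"], record["AminoAcidChange"])
```

Keeping the field order in one list makes it easy to adjust if the documented format turns out to differ from this assumption.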

(Some of the) questions that we should address
• How to do transcriptomics on a big scale?
  • technical covariates, batch effects, replicates
  • low-level data processing
  • SNP calling from RNAseq data
• How does the transcriptome vary and interact?
  • quantitative/qualitative mRNA variation
  • population variation of miRNAs
  • interactions (mRNA-miRNA), coexpression networks
• Catalogue of genetic variants in 1000g that affect transcriptome variation
  • common eQTLs, sQTLs, variation QTLs, loss of function variants…
• What are the mechanisms underlying regulatory variants?
  • Functional annotation of regulatory variants
  • Mapping of causal regulatory variants
• Interpretation: population and evolutionary genetic analysis, disease aspects…
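For readers new to eQTLs, a generic sketch of the basic idea: regress a gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele) across individuals and look at the slope. This is a textbook illustration under simple assumptions (one variant, one gene, no covariates, already normalized expression), not the consortium's analysis method. The function name and the example values are hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson correlation of expression on dosage.

    dosages:    genotype dosages per individual (0, 1 or 2).
    expression: normalized expression values for the same individuals.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx               # expression change per alternative allele
    r = sxy / (sxx * syy) ** 0.5    # Pearson correlation
    return slope, r


# Made-up values for illustration only.
slope, r = eqtl_slope([0, 1, 2, 0, 1, 2], [1.0, 2.1, 2.9, 1.2, 1.9, 3.1])
print(round(slope, 3), round(r, 3))
```

A real analysis would add technical covariates and multiple-testing correction, which is why the slide lists normalization and batch effects as open questions.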

The consortium
• UNIGE (Geneva): Manolis Dermitzakis, Stylianos Antonarakis, Tuuli Lappalainen, Thomas Giger, Emilie Falconnet, Luciana Romano, Alexandra Planchon, Ismael Padioleau, Alisa Yurovsky
• CRG/CNAG/USC (Barcelona): Xavier Estivill, Ivo Gut, Roderic Guigo, Angel Carracedo Alvarez, Gabrielle Bertier, Micha Sammeth, Thasso Griber, Paolo Ribeca, Pedro Ferreira, Jean Monlong, Esther Lizano, Marc Friedländer, Marta Gut, Sergi Bertran Agullo
• ICMB (Kiel): Stefan Schreiber, Philip Rosenstiel, Matthias Barann
• MPIMG (Berlin): Hans Lehrach, Ralf Sudbrak, Marc Sultan, Vyacheslav Amstislavskiy
• LUMC (Leiden): Gert-Jan van Ommen, Peter ‘t Hoen, Irina Pulyakhina
• UU (Uppsala): Ann-Christine Syvänen, Olof Karlberg, Jonas Almlöf, Mathias Brännvall
• HMGU (Munich): Thomas Meitinger, Tim Strom, Thomas Wieland, Thomas Schwarzmayr
• EBI: Alvis Brazma, Natalja Kurbatova
• Oxford University: Manuel Rivas
• Massachusetts General Hospital: Daniel McArthur
• ECACC: Bryan Bolton, Karen Ball, Edward Burnett, Jim Cooper
Who is missing??
