Geuvadis WP4: RNA sequencing
Progress, Aims and Data
Tuuli Lappalainen, University of Geneva
Geuvadis Analysis Group Meeting, April 16, 2012, Geneva
Genomics, meet transcriptomics
RNA sequencing of ~500 individuals from the 1000 Genomes
Populations: FIN, GBR, CEU, TSI, YRI
Integrated haplotypes of SNPs, indels and structural variants (~13M variants in total) + mRNAseq + miRNAseq
Why are we doing this?
• We might want to do RNAseq on a large scale. What do we get out of it? How should we do it?
  • At least we did lots of cool science. This is how we created the data and analyzed it.
• I have all these variants from my sequencing study but I don’t know what’s functional.
  • Here’s a pretty good catalogue of regulatory variants. We can also start to predict functional consequences of novel variants based on their properties.
• I want to use 1000g data in my research, but is there any functional data available?
  • Yes: this is the largest genome+transcriptome reference dataset thus far. You can use it in your own research (after our paper is out).
Samples
• Transformed lymphoblastoid cell lines from Coriell & UNIGE
• Cell culture at ECACC: cell pellets for RNA isolation + cell banks for all the partners
• RNA extracted at UNIGE
• Sequencing in 7 partner labs
• Randomization of the sample processing
Samples per lab (figure labels): UU 48, ICMB 72, LUMC / MPIMG 72 / 48, HMGU 48, UNIGE 116+168, CRG/CNAG/USC 96
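The randomization step can be illustrated with a minimal sketch. This is not the consortium's actual procedure: the function name, the sample IDs and the lab capacities below are hypothetical, and the only idea taken from the slide is that samples are assigned to sequencing labs at random so that lab effects are not confounded with sample properties.

```python
import random


def randomize_samples(sample_ids, lab_capacity, seed=0):
    """Shuffle samples and deal them out to labs, up to each lab's capacity."""
    if sum(lab_capacity.values()) < len(sample_ids):
        raise ValueError("total lab capacity is smaller than the number of samples")
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    allocation = {}
    start = 0
    for lab, capacity in lab_capacity.items():
        allocation[lab] = shuffled[start:start + capacity]
        start += capacity
    return allocation


# Hypothetical sample IDs and capacities, for illustration only.
samples = [f"sample_{i:03d}" for i in range(10)]
capacity = {"lab_A": 4, "lab_B": 3, "lab_C": 3}
allocation = randomize_samples(samples, capacity, seed=42)
for lab, assigned in allocation.items():
    print(lab, assigned)
```

A real design would probably also stratify by population before shuffling, so that each lab receives a similar population mix; that refinement is left out here.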
Sequencing
• mRNAseq: 2 x 75bp, minimum of 20M mapping reads per sample
  • total ~15 billion mapping reads
• miRNAseq: 1 x 36bp, minimum of 3M total reads per sample
  • total ~1 billion mapping reads
• All sequencing in HiSeq with the latest TruSeq kits
  • standardization of the methods as much as possible
Progress and timeline
[Timeline figure, 2010 to 2012. Recoverable labels: 2010: pilot of 5 samples, 7 labs; study design; sample selection; cell line shipments and growing; RNA extraction pilot; RNA extraction; sequencing; mapping, QC; Thomas / Tuuli; 10/1212 paper submission]
Documentation: wiki
http://www.geuvadis.org/group/geuvadis/wikis
Tech support from Gabrielle (gabrielle.bertier@crg.eu)
Contents of the WP4 Wiki
• Analysis
  • Analysis results, methods, etc.
• Data storage
  • Locations and descriptions of data files found in EBI (ENA/ArrayExpress or FTP site)
  • The Wiki is only for sharing small result files, not actual data
• Partners and contact info
  • WP4 participants
• Protocols
  • Protocols, from cells to fastq files
• Samples
  • Information on the samples included in the project, including sample lists for sequencing
• Teleconference minutes
• Presentations
  • Presentation slides and abstracts
Documentation of any analysis that is used by the consortium is obligatory.
Data storage: ftp
ftp:ftp-private.ebi.ac.uk/upload/geuvadis/wp4_rnaseq/main_project/
Tech support from Natalja (natalja@ebi.ac.uk)
Status of the data: mRNA
• Fastqs
  • All filtered, uploaded to ftp, sample information sheets sorted out, checksums OK
  • 464 samples in total (1 failed sequencing QC)
• Mapping
  • bwa (Tuuli/Ismael)
    • All done and uploaded to the ftp site
  • GEM (Micha/Thasso/Paolo)
    • GEM files are done. Bam conversion coming
• Quantifications
  • Exon quantifications
    • bwa: all done and uploaded to the ftp site
    • GEM
      • deconvoluted from flux: ready to upload?
      • read counts: once the bams are done
  • Transcript quantifications from flux: ready to upload?
• QC and normalization
  • No sample swaps. 5 samples show signs of cross-contamination. Expression outliers: soon
  • QTL analysis needs normalization to remove technical variation
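For partners downloading the fastq files, a checksum check along the following lines may be useful. This is only a sketch: it assumes MD5 checksums and a plain mapping from file path to expected digest, neither of which is specified on the slide, and the function names are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected):
    """Return the paths whose computed MD5 differs from the expected digest.

    `expected` maps file path -> expected MD5 hex digest (an assumed layout).
    """
    return [path for path, md5 in expected.items() if md5sum(path) != md5.lower()]
```

An empty list returned by `verify_checksums` means every listed file matched its expected digest.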
mRNA quality statistics: replicates
[Figure; reference: HG00117, lab1_batch1]
Status of the data: miRNA
• Fastqs
  • All except 48 from Kiel uploaded to ftp, sample information sheets sorted out, checksums OK
• Processing of the data ongoing (Marc F)
  • trimming, mapping, QC
Status of the data: genotypes
• 422 individuals from 1000g Phase 1 are OK
  • genotypes in the final format uploaded to the ftp site
• imputation of the Phase 2 individuals
  • issues either with the input haplotypes from 1000g or filtering of the reference panel…
• annotation of the variants
  • most of the information from 1000g Functional Interpretation Group + additional info by Tuuli and Manny
  • will be included in the vcf files, format customized from VAT and documented in the wiki
Example annotation (value: field)
VA=1: AlleleNumber
C1orf159: GeneName
ENSG00000131591.12: GeneID
-: Strand
nonsynonymous: Type
2/8: FractionOfTranscriptsAffected
C1orf159-201: TranscriptName
ENST00000294576.5: TranscriptID
23468_23597: ExonStartPosGenomic_ExonEndPosGenomic
3/7: ExonNumber/TotalExonNumberInTranscript
1035_944_315_R->Q_1035: TranscriptLength_PositionOfVariantInTranscript_PositionOfAminoAcidInPeptide_AminoAcidChange_AltAlleleTranscriptLength
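The annotation fields above could be read back with a small parser such as the sketch below. It rests on assumptions not confirmed by the slide: that the values are joined with colons in the order shown (as in standard VAT output), that the last value is split on underscores, and that an entry describes a single transcript. The wiki documentation remains the authoritative description of the format, and `parse_va` is a hypothetical helper name.

```python
# Field order as listed on the slide; the colon-joined layout is an assumption.
VA_FIELDS = [
    "AlleleNumber",
    "GeneName",
    "GeneID",
    "Strand",
    "Type",
    "FractionOfTranscriptsAffected",
    "TranscriptName",
    "TranscriptID",
    "ExonStartPosGenomic_ExonEndPosGenomic",
    "ExonNumber/TotalExonNumberInTranscript",
    "TranscriptDetails",  # placeholder name for the final underscore-joined value
]

DETAIL_FIELDS = [
    "TranscriptLength",
    "PositionOfVariantInTranscript",
    "PositionOfAminoAcidInPeptide",
    "AminoAcidChange",
    "AltAlleleTranscriptLength",
]


def parse_va(value):
    """Parse one single-transcript VA entry into a dict of named fields."""
    if value.startswith("VA="):
        value = value[3:]
    parts = value.split(":")
    if len(parts) != len(VA_FIELDS):
        raise ValueError(f"expected {len(VA_FIELDS)} fields, got {len(parts)}")
    record = dict(zip(VA_FIELDS, parts))
    details = record.pop("TranscriptDetails").split("_")
    if len(details) != len(DETAIL_FIELDS):
        raise ValueError(f"expected {len(DETAIL_FIELDS)} detail fields, got {len(details)}")
    record.update(zip(DETAIL_FIELDS, details))
    return record


# Example string assembled from the values on the slide (assumed colon-joined).
example = (
    "VA=1:C1orf159:ENSG00000131591.12:-:nonsynonymous:2/8:"
    "C1orf159-201:ENST00000294576.5:23468_23597:3/7:1035_944_315_R->Q_1035"
)
record = parse_va(example)
for field, field_value in record.items():
    print(f"{field}: {field_value}")
```

Entries that cover several transcripts or several genes would need extra handling, which this sketch does not attempt.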
(Some of the) questions that we should address
• How to do transcriptomics on a large scale?
  • technical covariates, batch effects, replicates
  • low-level data processing
  • SNP calling from RNAseq data
• How does the transcriptome vary and interact?
  • quantitative/qualitative mRNA variation
  • population variation of miRNAs
  • interactions (mRNA-miRNA), coexpression networks
• Catalogue of genetic variants in 1000g that affect transcriptome variation
  • common eQTLs, sQTLs, variation QTLs, loss of function variants…
• What are the mechanisms underlying regulatory variants?
  • Functional annotation of regulatory variants
  • Mapping of causal regulatory variants
• Interpretation: population and evolutionary genetic analysis, disease aspects…
The consortium
• UNIGE (Geneva): Manolis Dermitzakis, Stylianos Antonarakis, Tuuli Lappalainen, Thomas Giger, Emilie Falconnet, Luciana Romano, Alexandra Planchon, Ismael Padioleau, Alisa Yurovsky
• CRG/CNAG/USC (Barcelona): Xavier Estivill, Ivo Gut, Roderic Guigo, Angel Carracedo Alvarez, Gabrielle Bertier, Micha Sammeth, Thasso Griber, Paolo Ribeca, Pedro Ferreira, Jean Monlong, Esther Lizano, Marc Friedländer, Marta Gut, Sergi Bertran Agullo
• ICMB (Kiel): Stefan Schreiber, Philip Rosenstiel, Matthias Barann
• MPIMG (Berlin): Hans Lehrach, Ralf Sudbrak, Marc Sultan, Vyacheslav Amstislavskiy
• LUMC (Leiden): Gert-Jan van Ommen, Peter ’t Hoen, Irina Pulyakhina
• UU (Uppsala): Ann-Christine Syvänen, Olof Karlberg, Jonas Almlöf, Mathias Brännvall
• HMGU (Munich): Thomas Meitinger, Tim Strom, Thomas Wieland, Thomas Schwarzmayr
• EBI: Alvis Brazma, Natalja Kurbatova
• Oxford University: Manuel Rivas
• Massachusetts General Hospital: Daniel McArthur
• ECACC: Bryan Bolton, Karen Ball, Edward Burnett, Jim Cooper
Who is missing??