Geuvadis Analysis Meeting
16/02/2012
Micha Sammeth, CNAG – Barcelona
Quantification of Splice-Forms and Variants
- Quantified 615 datasets based on the Gencode v7 annotation
- Sensitivity is a function of sequencing depth
- For every transcript:
  • normalized RPKM values
  • number of deconvoluted reads
- Correlation coeff. 0.87 (Pearson and Spearman)
- Discussion at the end: whether and what to do before uploading
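The slide does not spell out how the RPKM values were computed. As a minimal sketch, assuming the standard definition (reads per kilobase of transcript per million mapped reads), the per-transcript value could be derived as below. The function name and the example numbers are hypothetical; the read count is a float because deconvoluted reads can be fractional.

```python
def rpkm(read_count: float, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Standard definition, assumed here; the slides do not state the exact
    normalization that was applied.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(rpkm(read_count=500, transcript_length_bp=2000, total_mapped_reads=10_000_000))
```

Because the value is divided by the total number of mapped reads, lowly expressed transcripts drop below detection in shallow datasets, which is one way to read "sensitivity is a function of sequencing depth".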
LoF Definitions [MacArthur et al. 2012]
LoF = loss of function of a complete transcript
LoF types:
- SNP that (directly) introduces a stop codon
- Indels that disrupt/shift the reading frame
- SNP that disrupts a splice site
- Larger deletions that remove the 1st exon or >50% of the transcript
LoF scope:
- "partial" LoF affects just some protein-coding transcripts in a locus
- "full" LoF affects all annotated protein-coding transcripts
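The definitions above can be sketched as a few small predicates. This is an illustration under stated assumptions, not the classification used by MacArthur et al. or by the Geuvadis pipeline: all function names are hypothetical, exons are assumed to be half-open (start, end) intervals listed in 5' to 3' order, and ">50% of the transcript" is interpreted as more than half of the summed exon length.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end), assumed convention


def is_frameshift(ref: str, alt: str) -> bool:
    """An indel shifts the reading frame when the length change is not a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0


def _overlap(a: Interval, b: Interval) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_large_deletion_lof(deletion: Interval, exons: List[Interval]) -> bool:
    """True if the deletion removes the whole 1st exon or >50% of the exonic length.

    Assumes `exons` is ordered 5' to 3', so exons[0] is the first exon.
    """
    first = exons[0]
    removes_first_exon = deletion[0] <= first[0] and deletion[1] >= first[1]
    total = sum(end - start for start, end in exons)
    removed = sum(_overlap(deletion, exon) for exon in exons)
    return removes_first_exon or removed > 0.5 * total


def lof_scope(affected: Iterable[str], protein_coding: Iterable[str]) -> str:
    """Classify a LoF as 'full', 'partial' or 'none' for one locus."""
    coding = set(protein_coding)
    hit = coding & set(affected)
    if not hit:
        return "none"
    return "full" if hit == coding else "partial"


if __name__ == "__main__":
    exons = [(100, 200), (300, 400)]  # hypothetical transcript
    print(is_frameshift("A", "AT"))                  # 1-bp insertion
    print(is_large_deletion_lof((50, 250), exons))   # covers the first exon
    print(lof_scope({"t1"}, {"t1", "t2"}))           # one of two transcripts hit
```

Only protein-coding transcripts enter the scope decision, so a variant that hits every coding transcript plus a non-coding one still counts as "full".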
LoF Estimates [MacArthur et al. 2012]
[Figure: LoF estimates by type (stop, frameshift indel, splice, large deletion), shown across populations and in a single individual]
Compare RNA-Seq evidence to LoF predictions
- Main difference Geuvadis vs. 1000 Genomes: RNA-Seq vs. DNA-Seq
- Frameshift indel, large deletion: directly from mappings / coverage by mappings
- Predicted disruption of splice site: indirectly called from mappings
Confirmation of LoF SNPs in Geuvadis: Stop
- Take phase1 samples where polymorphisms have been found by exome sequencing
- Additionally call SNPs by RNA-Seq (excessive mappings)
Example (not Geuvadis):
- >2 million genotype calls possible in both experiments (sufficient coverage in DNA and sufficient coverage in RNA)
- ~5000 differences, i.e. on average >2 out of 1000 calls differ
- ~1000 cases where RNA is homozygous and DNA is not: could be explained by allele-specific expression
- ~4000 cases where DNA is homozygous and RNA is not (!!!): remove FPs from computational or experimental artifacts (PCR artifacts?)
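The comparison in the example can be sketched as a tally over sites that were genotyped in both experiments. This is a minimal illustration, not the SNP-calling pipeline used for the talk: the function names, the category labels, and the "A/G" genotype string format are assumptions.

```python
from collections import Counter
from typing import Dict


def _alleles(genotype: str) -> list:
    """Split an unphased genotype such as 'A/G' into a sorted allele list."""
    return sorted(genotype.split("/"))


def _is_homozygous(genotype: str) -> bool:
    a, b = genotype.split("/")
    return a == b


def compare_calls(dna: Dict[str, str], rna: Dict[str, str]) -> Counter:
    """Tally concordance between DNA and RNA genotype calls at shared sites."""
    counts = Counter()
    for site in dna.keys() & rna.keys():  # only sites callable in both experiments
        d, r = dna[site], rna[site]
        if _alleles(d) == _alleles(r):
            counts["concordant"] += 1
        elif _is_homozygous(r) and not _is_homozygous(d):
            counts["rna_hom_dna_het"] += 1  # candidate allele-specific expression
        elif _is_homozygous(d) and not _is_homozygous(r):
            counts["dna_hom_rna_het"] += 1  # candidate artifact, needs filtering
        else:
            counts["other"] += 1
    return counts


def per_thousand(differences: int, total: int) -> float:
    """Discordant calls per 1000 genotype calls."""
    return 1000 * differences / total


if __name__ == "__main__":
    # Hypothetical genotypes, for illustration only.
    dna = {"s1": "A/A", "s2": "A/G", "s3": "G/G"}
    rna = {"s1": "A/A", "s2": "A/A", "s3": "A/G"}
    print(compare_calls(dna, rna))
    # Figures from the slide: ~5000 differences in ~2 million shared calls.
    print(per_thousand(5000, 2_000_000))
```

With the approximate figures from the slide, the rate comes out at 2.5 per 1000, consistent with ">2 out of 1000 calls differ".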
Allele-specific RNA Processing [Montgomery 2010 dataset]
[Figure: relative abundance distributions (0% to 100%) of the 1st and 2nd form for the genotypes A/A, A/G and G/G, i.e. homozygote common allele, heterozygote, homozygote minority allele]
LoF and Alternative Splicing (AS)
"28.7% LoF events in a single individual affect only a subset of the known transcripts from the affected gene, emphasizing the need to consider alternative splicing" [MacArthur et al. 2012]
(1) classification of AS influences on LoF based on a certain annotation
(2) extension of an annotation by RNA-Seq evidence
[Figure: 5' and 3' reading frames at splice sites; activation of latent splice sites]
(1) Classification of AS: AStalavista
[Figure: splicing graph built from transcripts 1 to 7, with edges labeled by the transcripts that support them (e.g. 1,2,3,6); one bubble is highlighted]
(2) AS discovery by RNA-Seq
- Novel exon junctions supported by RNA-Seq add to the graph
- Novel events extend annotated CDSs
[Figure: the same splicing graph built from transcripts 1 to 7, extended by novel junctions]
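Adding RNA-Seq-supported junctions to an annotated graph can be sketched as a set operation. This is an illustration only and not the AStalavista implementation: junctions are modeled as hypothetical (donor, acceptor) coordinate pairs, and the `min_reads` support threshold is an assumption that the slides do not specify.

```python
from typing import Dict, Set, Tuple

Junction = Tuple[int, int]  # (donor position, acceptor position), assumed encoding


def extend_junctions(
    annotated: Set[Junction],
    observed: Dict[Junction, int],
    min_reads: int = 2,  # hypothetical support threshold
) -> Tuple[Set[Junction], Set[Junction]]:
    """Return (extended junction set, novel junctions) given RNA-Seq read support."""
    novel = {
        junction
        for junction, reads in observed.items()
        if reads >= min_reads and junction not in annotated
    }
    return annotated | novel, novel


if __name__ == "__main__":
    # Hypothetical coordinates and read counts, for illustration only.
    annotated = {(200, 300), (400, 500)}
    observed = {(200, 300): 40, (200, 500): 7, (250, 300): 1}
    extended, novel = extend_junctions(annotated, observed)
    print(sorted(novel))
    print(sorted(extended))
```

Junctions below the threshold are left out rather than flagged, so a stricter or looser cutoff changes which novel events reach the later classification step.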
My Points
• Quantifications: do you want a normalization before uploading, or is this the responsibility of the analyzing group?
• Timeline for studies: main paper Oct to end of the year.
• Separate publications possible if there is sufficient material for a separate story?
• What would be the constraints for a separate publication on Geuvadis data?
Acknowledgements
- Thasso Griebel (PhD): Error Models, Pipelining
- Paolo Ribeca (PhD), Santiago Marco: GEM mapper + conversion
- Emanuele Raineri (PhD): SNP calling