Explore the comparison between the ENCODE and GTEx projects in the EN-TEx dataset, including regulatory element annotations, personal genome reconstructions, and gene expression analysis.
ENCODE updates Anna Vlasova 08/02/2017 Group meeting
Outline • EN-TEx project • Comparison with the GTEx • Personal genomes • ENCODE Encyclopedia v.3 • Assay matrix • Annotations, candidate regulatory elements • Data access 2
EN-TEx project • A collaboration between the GTEx and ENCODE projects • Sequencing: 4 donors x ~20 tissues • EN-TEx assays: • total RNAseq • small RNAs, microRNAs • RAMPAGE • histone marks • ATAC-Seq • Genotyping arrays • Methylation arrays • DNA-Seq: Illumina, PacBio, 10x Genomics 3
EN-TEx project EN-TEx datasets Comparison with GTEx Regulatory elements annotation Personal genome reconstruction 5
EN-TEx project.GTEx comparison • EN-TEx: total RNAseq, stranded • GTEx: polyA+, non-stranded • Non-polyA+ RNA patterns in different tissues • Circular RNAs • Antisense expression • Retained introns • Novel isoforms 6
EN-TEx project.GTEx comparison PCA plot for EN-TEx and GTEx v.7 samples, RPKMs, 46,093 long genes 7
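A PCA of this kind can be sketched as follows. This is a minimal illustration, not the analysis behind the slide: the log2 transform with a pseudocount of 1, the function name `pca_scores` and the toy matrix are all assumptions.

```python
import numpy as np

def pca_scores(rpkm, n_components=2):
    """Project samples (rows) onto the top principal components.

    rpkm: array of shape (n_samples, n_genes) with non-negative RPKM values.
    The log2(x + 1) transform is an assumption made for this sketch.
    """
    logged = np.log2(np.asarray(rpkm, dtype=float) + 1.0)
    centered = logged - logged.mean(axis=0)          # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)            # variance fraction per PC
    return scores, explained[:n_components]

# Toy matrix (made up): 4 samples x 3 genes, two groups of samples
rpkm = np.array([
    [10.0, 0.0, 5.0],
    [12.0, 1.0, 4.0],
    [0.0, 20.0, 5.0],
    [1.0, 22.0, 6.0],
])
scores, explained = pca_scores(rpkm)
print(scores.shape, explained)
```

In the toy matrix the first component separates the first two samples from the last two, which is the kind of grouping a PCA plot of tissues or projects makes visible.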
EN-TEx project.GTEx comparison PCA clustering after batch correction with limma. Actual EN-TEx vs GTEx differences might also be removed. Differences between tissues dominate. 8
Gene expression examples: ENSG00000259001.2, gene_name=RPPH1, gene_type=antisense, biggest RC value. Panels: before batch correction; after batch correction with limma. The high expression level of this gene in the total RNAseq samples was previously reported. Variability between the protocols (total RNA vs polyA+) was removed because it completely overlaps with the batch. 16
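The confounding problem can be illustrated with a small sketch. It does not use limma: it swaps in plain per-batch mean centering as a simplified stand-in, and the function name `center_by_batch` and the toy values are assumptions. When the protocol (total RNA vs polyA+) coincides exactly with the batch, removing the batch effect also removes the protocol difference.

```python
import numpy as np

def center_by_batch(log_expr, batches):
    """Subtract each batch's per-gene mean, then add back the overall per-gene mean.

    A simplified stand-in for batch correction, not limma's implementation.
    log_expr: (n_samples, n_genes) log-scale expression; batches: one label per sample.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    batches = np.asarray(batches)
    grand_mean = log_expr.mean(axis=0)
    corrected = log_expr.copy()
    for b in np.unique(batches):
        mask = batches == b
        corrected[mask] -= log_expr[mask].mean(axis=0)
    return corrected + grand_mean

# Toy data (made up): gene 0 is high only in batch A, gene 1 is flat
log_expr = np.array([
    [8.0, 2.0],   # batch A, e.g. total RNA
    [9.0, 2.2],
    [3.0, 2.1],   # batch B, e.g. polyA+
    [4.0, 1.9],
])
batches = ["A", "A", "B", "B"]

corrected = center_by_batch(log_expr, batches)
gap_before = log_expr[:2, 0].mean() - log_expr[2:, 0].mean()
gap_after = corrected[:2, 0].mean() - corrected[2:, 0].mean()
print(gap_before, gap_after)
```

The between-batch gap for gene 0 disappears after centering, while the variation within each batch is kept. Whether that gap was technical or biological cannot be decided from such data alone.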
EN-TEx project.GTEx comparison Differential gene expression analysis between EN-TEx and 80 samples from GTEx. Distribution of the upregulated genes by categories (EN-TEx, GTEx). Does it make sense to compare datasets in such a way? 10
EN-TEx project.Personal genome reconstruction 60x 55x 35x from Michael Schatz 11
EN-TEx project.Personal genome from Alex Dobin 12
EN-TEx project.Personal genome from Alex Dobin 13
Diploid annotation Transcript ENST00000513158.1 (gene ENSG00000251314.2) differs significantly between the two haplotypes. Tracks: Haplotype 2, Haplotype 1, Reference annotation 14
Diploid annotation
chr5_1 HAVANA gene 95266864 95889235 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENSG00000251314.2_1";
chr5_1 HAVANA transcript 95266864 95684208 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000502645.2_1";
chr5_1 HAVANA transcript 95266865 95623568 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000511775.1_1";
chr5_1 HAVANA transcript 95888695 95889235 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000513158.1_1";
chr5_2 HAVANA gene 95264535 95933349 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENSG00000251314.2_2";
chr5_2 HAVANA transcript 95264535 95933349 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000502645.2_2";
chr5_2 HAVANA transcript 95264536 95621206 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000511775.1_2";
chr5_2 HAVANA transcript 95886762 95933264 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000513158.1_2";
15
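The haplotype difference for ENST00000513158.1 can be read directly from these records. The sketch below parses the two transcript lines shown above and compares their genomic spans (end - start + 1, not spliced length). The helper names are hypothetical, and the `_1`/`_2` suffix handling assumes the naming scheme visible on the slide.

```python
import re

# Two records copied from the slide (whitespace-separated here;
# a real GTF file is tab-delimited)
RECORDS = """\
chr5_1 HAVANA transcript 95888695 95889235 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000513158.1_1";
chr5_2 HAVANA transcript 95886762 95933264 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000513158.1_2";
"""

def parse_record(line):
    """Split one GTF-like line into its fixed fields and attribute pairs."""
    fields = line.split(None, 8)  # 8 fixed columns, then the attribute string
    attrs = dict(re.findall(r'(\S+) "([^"]*)";', fields[8]))
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        **attrs,
    }

def spans_by_haplotype(text, transcript_base):
    """Return {haplotype suffix: genomic span} for one transcript ID."""
    spans = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = parse_record(line)
        base, _, hap = rec["transcript_id"].rpartition("_")
        if rec["feature"] == "transcript" and base == transcript_base:
            spans[hap] = rec["end"] - rec["start"] + 1
    return spans

spans = spans_by_haplotype(RECORDS, "ENST00000513158.1")
print(spans)  # {'1': 541, '2': 46503}
```

The same transcript spans 541 bp on haplotype 1 and 46,503 bp on haplotype 2, which matches the difference highlighted on the previous slide.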
Diploid quantification Proportion of allelic expression within an isoform; haplotype _1 or _2 (maternal/paternal). Proportion of allelic expression within a gene.
Diploid quantification Gene quantification = sum of all reads assigned to all isoforms in both haplotypes 17
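The two quantities on these slides can be written out as a short sketch. The read counts, isoform names and function names below are made up for illustration; only the definitions (the gene total as a sum over isoforms and haplotypes, and allelic proportions within an isoform or a gene) come from the slides.

```python
def gene_total(isoform_reads):
    """Sum reads over all isoforms and both haplotypes of one gene."""
    return sum(sum(haps.values()) for haps in isoform_reads.values())

def allelic_proportions(hap_counts):
    """Fraction of reads assigned to each haplotype; None if there are no reads."""
    total = sum(hap_counts.values())
    if total == 0:
        return None
    return {hap: n / total for hap, n in hap_counts.items()}

def gene_level_counts(isoform_reads):
    """Collapse isoform-level counts to per-haplotype counts for the gene."""
    counts = {}
    for haps in isoform_reads.values():
        for hap, n in haps.items():
            counts[hap] = counts.get(hap, 0) + n
    return counts

# Hypothetical read counts for one gene with two isoforms
isoform_reads = {
    "isoformA": {"hap1": 30, "hap2": 10},
    "isoformB": {"hap1": 5, "hap2": 15},
}

total = gene_total(isoform_reads)                                   # 60
within_isoform = allelic_proportions(isoform_reads["isoformA"])     # hap1: 0.75
within_gene = allelic_proportions(gene_level_counts(isoform_reads)) # hap1: 35/60
print(total, within_isoform, within_gene)
```

In this toy case the two isoforms favor opposite haplotypes, so the isoform-level proportions (0.75 and 0.25 for hap1) are more skewed than the gene-level one (about 0.58).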
EN-TEx project.Personal genome This data is not ready yet! Another personal genome is available for training and analysis: GM12878 Mark Gerstein 18
Diploid annotation.GM12878 Number of genomic features calculated for chr1-22 and chrX; chrY, chrM and scaffolds are excluded. Figure labels: hap1, hap2, 5, 4, 9 19
Diploid annotation.GM12878 Transcript lengths Gene lengths Exon lengths 20
Gene expression.GM12878, total RNAseq x 2 replicates Correlation between diploid and reference mappings Correlation between replicates in each mapping Replicate 1 Replicate 2 21
Gene expression difference.GM12878 There are 147 DE genes (edgeR, log(FC)>=2, FDR<=0.01) Up-regulated in diploid quantifications Up-regulated in reference quantifications 22
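The thresholds on this slide can be applied to a table of test results as sketched below. This is not edgeR itself: the rows, gene names and field names are made up, the threshold is assumed to apply to the absolute log fold change, and a positive logFC is assumed to mean higher expression in the diploid quantification.

```python
def split_de_genes(rows, min_abs_logfc=2.0, max_fdr=0.01):
    """Split significant genes by direction of change.

    rows: dicts with hypothetical keys "gene", "logFC" and "fdr".
    Returns (up in diploid, up in reference) under the assumed sign convention.
    """
    up_diploid, up_reference = [], []
    for row in rows:
        if row["fdr"] > max_fdr or abs(row["logFC"]) < min_abs_logfc:
            continue  # fails one of the two cutoffs
        if row["logFC"] > 0:
            up_diploid.append(row["gene"])
        else:
            up_reference.append(row["gene"])
    return up_diploid, up_reference

# Made-up results table
rows = [
    {"gene": "geneA", "logFC": 3.1, "fdr": 0.001},
    {"gene": "geneB", "logFC": -2.4, "fdr": 0.005},
    {"gene": "geneC", "logFC": 2.5, "fdr": 0.2},     # fails the FDR cutoff
    {"gene": "geneD", "logFC": 0.7, "fdr": 0.0001},  # fails the fold-change cutoff
]
up_diploid, up_reference = split_de_genes(rows)
print(up_diploid, up_reference)  # ['geneA'] ['geneB']
```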
Allele specific gene expression.GM12878 Allele specific expression statistics 23
ENCODE Encyclopedia v.3 26 https://www.encodeproject.org/
ENCODE Encyclopedia v.3 • In total there are >13,000 experiments • ENCODE, modENCODE, Roadmap (REMC), Genomics of Gene Regulation (GGR) and encyclopedia of regulatory networks (modERN) • Human, Mouse, Drosophila, C. elegans • Different assays, >40 types • Among others: single cell experiments, 3D chromatin interaction, shRNA/CRISPR genome editing, ... • Uniform data processing • Pipelines are available on GitHub 27
ENCODE Encyclopedia v.3 https://www.encodeproject.org/data/annotations/ 28
ENCODE Encyclopedia v.3. Middle level • Number of annotation file sets: expression matrices, promoter/enhancer-like regions, blacklisted regions • Registry of Candidate Regulatory Elements (cREs) • List of candidate enhancers/promoters based on DNase and H3K27ac/H3K4me3 signals • ~2.6M human cREs and ~1.6M mouse cREs • Cell-type specific, data is in the BED format • Web-based tool to access cREs: SCREEN http://zlab-annotations-v4.umassmed.edu/ 29
ENCODE Encyclopedia v.3. Top level • Complete set of chromatin states for well characterized epigenomes • Human cell types and mouse embryonic tissues • ChromHMM models for >260 experiments • Segway tool: semi-automated genomic annotation • Cell type-specific annotations and encyclopedia (164 human cell types) • Contiguous regions of high functionality score • Functional labels: inactive region, transcribed, promoter, bivalent, ... http://noble.gs.washington.edu/proj/encyclopedia/ 30
ENCODE Encyclopedia v.3. Segway annotation A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types, https://doi.org/10.1101/086025 31
ENCODE Encyclopedia v.3 • Metadata for the files can be found • in the ENCODE portal • in Julien’s index file: index=/users/rg/jlagarde/projects/encode/scaling/whole_genome/3ncod3_production_files/files_local_fullpath_dcc_list.txt 32
Thank you! 33