Summary of Preliminary Analysis of the Genes and Transcripts Group
ENCODE Analysis Workshop, July 14-17, 2005, Bethesda, Md.
Central Goal: Identification and characterization of all transcribed regions within the ENCODE regions.

Aims of the Analyses:
1. Identify all sites of transcription and relate these sites to the most current set of genome annotations and to predictions of functional elements associated with the detected regions.
2. Determine the structural characteristics of the detected transcripts, including:
   2.1 Possible 5’ and 3’ termini of transcribed regions
   2.2 Strand of origin of the transcribed regions
   2.3 Length and genomic locations of transcribed regions
   2.4 Estimated protein-coding potential and splicing characteristics
3. Evaluate the evolutionary characteristics of the transcribed regions.
4. Determine the location of the sites of sequence variation (SNPs, indels, etc.) associated with the detected transcribed regions.
5. Measure the correlation of detected transcribed regions with other functional elements (regulatory, replicative, chromatin structural variation, etc.).
6. Measure the association of transcribed regions with predictions of protein-coding and non-coding transcripts.
Genes and Transcripts Group Members:
1. Yale
2. GIS Singapore
3. IMIM Barcelona
4. RIKEN Japan
5. Affymetrix

Data Sets:
1. Transfrags/TARs (AFFX/Yale): from 11 cell sources
2. CAGE tags (RIKEN): from 24 human tissues
3. Ditags (GIS): from 2 cell sources
4. Manually curated genes, mostly from mRNA sequences (IMIM, Sanger, Geneva)

Important Considerations Concerning Maps:
1. Types of RNA used to generate the data sets (whole cell/tissue total, whole cell/tissue poly A+, cytosolic poly A+)
2. For tags, multiple tags at the same locus give high confidence. There are two types of tag clusters: groups of tags separated by <100 bp in one region, and clusters of these groups that belong to the same transcript.
3. Transfrags/TARs carry no strand information. Maps of transfrags are formed using thresholds for intensity (based on a 5% FP rate for detection of bacterial controls), minimum length, and allowable gap length (50:50).
4. Two sets of gene predictions: Gencode and EGASP (Hinxton meeting 5/2005)
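The two map-building rules above (grouping tags that lie <100 bp apart, and joining above-threshold probes into transfrags subject to a minimum length and an allowable gap) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the function names, the input layout, and in particular the reading of "50:50" as a 50 bp minimum length and a 50 bp maximum gap are assumptions made for illustration, not the groups' actual pipelines. The intensity threshold is taken as given rather than derived from the bacterial controls.

```python
from typing import List, Tuple


def cluster_tags(positions: List[int], max_gap: int = 100) -> List[List[int]]:
    """Group tag positions so that consecutive tags in a group are < max_gap bp apart."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)   # close enough to the previous tag
        else:
            clusters.append([pos])     # start a new group
    return clusters


def call_transfrags(
    probes: List[Tuple[int, float]],
    threshold: float,
    min_length: int = 50,   # assumed meaning of the first "50"
    max_gap: int = 50,      # assumed meaning of the second "50"
) -> List[Tuple[int, int]]:
    """Join above-threshold probes into (first_pos, last_pos) fragments.

    probes holds (genomic position, intensity) pairs. Positive probes at most
    max_gap bp apart are merged; fragments spanning < min_length bp are dropped.
    """
    positive = sorted(pos for pos, intensity in probes if intensity >= threshold)
    transfrags: List[Tuple[int, int]] = []
    if not positive:
        return transfrags
    start = last = positive[0]
    for pos in positive[1:]:
        if pos - last <= max_gap:
            last = pos                          # extend the current fragment
        else:
            if last - start >= min_length:
                transfrags.append((start, last))
            start = last = pos                  # begin a new fragment
    if last - start >= min_length:
        transfrags.append((start, last))
    return transfrags
```

For example, `cluster_tags([10, 50, 149, 400, 420])` yields two groups, because the step from 149 to 400 exceeds the 100 bp limit. Neither function uses strand, which matches the note that transfrags/TARs carry no strand information; a strand-aware tag clustering would run `cluster_tags` once per strand.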
Maps of Transcribed Loci Across ENCODE Regions Based on Data Sets
1.) 13 individual maps for all of the ENCODE regions, based on cell types and platforms (http://genome.cse.ucsc.edu/ENCODE/)
2.) Maps of empirically and computationally derived sets of exons (http://genome.cse.ucsc.edu/ENCODE/)
3.) Summary maps (J. Rozowsky, F. Denoeud, A. Sandel, A. Shahab, S. Dike)
   - Union of all loci covered by AFFX transfrags and Yale TARs.
   - Union of all loci covered by Ditag and CAGE data: 5’ and 3’ boundaries.
   - Union of all loci consisting of unconnected transcribed bases derived from transfrags/TARs, CAGE, Ditags and annotated exons. Each base in the ENCODE regions has an associated score (0-4) that indicates how many of these data types support transcription at that position.
   - Union of all loci consisting of connected transcribed bases derived from transfrags/TARs, CAGE, Ditags and annotated exons, based on strand information of tags and annotations.
(For details of all maps, see http://hgwdev.cse.ucsc.edu/ENCODE/june2005freeze/encodeSummaries/)
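The per-base score (0-4) described for the unconnected summary map can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to build the actual summary tracks: the function name, the track names, and the half-open `(start, end)` interval convention are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (assumed convention)


def support_scores(region_length: int, tracks: Dict[str, List[Interval]]) -> List[int]:
    """Count, for each base, how many data types cover it.

    Each data type contributes at most 1 per base, so with four tracks
    the score ranges from 0 to 4.
    """
    scores = [0] * region_length
    for intervals in tracks.values():
        covered = [False] * region_length
        for start, end in intervals:
            # Clip to the region so out-of-range intervals are harmless.
            for i in range(max(start, 0), min(end, region_length)):
                covered[i] = True
        for i, is_covered in enumerate(covered):
            if is_covered:
                scores[i] += 1
    return scores


# Toy 10-base region with hypothetical intervals for the four data types.
example_tracks = {
    "transfrags_tars": [(0, 5)],
    "cage": [(3, 4)],
    "ditags": [(3, 8)],
    "annotated_exons": [(2, 6), (4, 7)],
}
example_scores = support_scores(10, example_tracks)
```

Overlapping intervals within one data type (the two exon intervals above) still add only 1 to a base, which is what keeps the maximum at the number of data types. The union map is then the set of bases with a score of at least 1.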
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 4 (inhibits cholera-induced intestinal fluid secretion) Chrom 2
Average % of nucleotides covered by annotated exons, TARs/transfrags, CAGE tags or ditags: 12%
Number of Transcribed Nucleotides: a) Annotated nucleotides b) Unannotated nucleotides
[Figure panels: transfrags, CAGE, ditags]
*CAGE (10.5%) and ditags (5.5%) occur in RepeatMasked regions that are not interrogated by arrays.
Annotated Transcription at 11 Cell Lines/Time Points for Transfrags/TARs (based on numbers of TF/TARs)
While the % of annotated transcription is similar across cell types, the proportion of transcription occurring in introns vs. exons shifts depending on the RNA used and the cell type.
Intersection of Pseudogenes with Transcription Data
Deyou Zheng, et al. (Yale)
• By random chance, 20-30 Yale pseudogenes will intersect with TARs.
• ~40% of ENCODE pseudogenes intersect with TARs.
• An overlap of >50 bp with a transfrag/TAR constitutes an intersection.
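The ">50 bp overlap" criterion can be made concrete with a short sketch. The function names and coordinates below are hypothetical, intervals are assumed to be half-open `(start, end)` pairs on the same chromosome, and the sketch assumes the 50 bp must be reached against a single transfrag/TAR rather than summed over several; the slide does not say which was used.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), same chromosome (assumed)


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def intersecting_pseudogenes(
    pseudogenes: Dict[str, Interval],
    tars: List[Interval],
    min_overlap: int = 50,
) -> List[str]:
    """Return names of pseudogenes overlapping any single TAR by more than min_overlap bp."""
    return [
        name
        for name, interval in pseudogenes.items()
        if any(overlap_bp(interval, tar) > min_overlap for tar in tars)
    ]


# Hypothetical coordinates for illustration only.
pseudogenes = {"pg1": (100, 400), "pg2": (1000, 1200), "pg3": (2000, 2300)}
tars = [(340, 500), (1150, 1260), (2100, 2151)]
hits = intersecting_pseudogenes(pseudogenes, tars)
```

Here `pg2` overlaps a TAR by exactly 50 bp and is therefore not counted, since the criterion is strictly greater than 50 bp. Running the same function on randomly repositioned TARs would be one way to estimate a chance expectation of the kind quoted above, though the slide does not describe how that figure was obtained.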
Planned Analyses by Genes and Transcripts Group at this Workshop
1) Correlation of transcription with evolutionarily conserved sequences
2) Creation of a comprehensive list of 5’ and 3’ termini using various data types (transfrags/TARs, ditags, CAGE tags, and site-specific regulatory elements)
3) Correlation of transcriptional regulatory elements with sites of transcription, and correlation across the various data types, to increase confidence in unannotated transcribed regions
4) Determination of the percentage of transcribed regions that show cell/tissue specificity
5) Determination of sites of transcription that correlate with sites of stable and conserved structured RNAs (snoRNAs, miRNAs, etc.)