Computational Analysis of Transcript Identification Using GenBank. Slides by Terry Clark. Differentiation of hematopoietic cells. Genome-wide gene expression. SAGE (Serial Analysis of Gene Expression). Figure 1 Schematic illustration of the SAGE process.
Figure 1. Schematic illustration of the SAGE process (Jes Stollberg et al., Genome Res. 2000; 10: 1241-1248)
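In SAGE, each transcript is represented by a short tag taken next to an anchoring-enzyme site. The sketch below shows in-silico tag extraction. It assumes NlaIII (recognition site CATG) as the anchoring enzyme and takes the 10 bases immediately 3' of the most 3' CATG site. The function name `extract_sage_tag` is hypothetical, and real pipelines differ in details such as whether CATG is reported as part of the tag.

```python
from typing import Optional

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII)
TAG_LEN = 10     # 10-base tags, matching the tag set discussed in these slides


def extract_sage_tag(transcript: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Hypothetical helper: return the tag_len bases immediately 3' of the
    most 3' CATG site, or None if there is no site or too few bases follow it."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)  # start of the most 3' anchoring site
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_len else None


if __name__ == "__main__":
    # Two CATG sites: only the 3'-most one defines the tag.
    print(extract_sage_tag("CATGTTTTTTTTTTCATGACGTACGTACAAAA"))
```

Using only the 3'-most site mirrors the way SAGE captures the 3' end of each transcript, so each transcript maps to at most one tag under this sketch.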
What is the chance of duplicate tags? • We can assume we are drawing tags randomly from the set of all sequences of the given tag length over the 4-letter alphabet {A, C, G, T} • This is the same problem as ensuring unique overlaps in the contig-matching problem for shotgun sequencing
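Under the uniform random model there are 4^L possible tags of length L, and the duplicate question becomes a birthday problem. The sketch below computes, under that assumption only, the chance that a given tag is shared by no other tag, the chance that any duplicate occurs, and the expected number of distinct tags. The function names are hypothetical, and the 878,938-tag figure is reused purely as an illustrative input; the results describe the idealized model, not real transcripts.

```python
import math


def tag_space(tag_len: int) -> int:
    # Number of distinct tags of this length over {A, C, G, T}.
    return 4 ** tag_len


def p_tag_unique(n_tags: int, tag_len: int) -> float:
    # Probability that one tag differs from each of the other n_tags - 1 tags.
    n_space = tag_space(tag_len)
    return (1.0 - 1.0 / n_space) ** (n_tags - 1)


def p_any_duplicate(n_tags: int, tag_len: int) -> float:
    # Birthday problem: 1 - prod_{i=0}^{n-1} (1 - i/N), summed in log space.
    n_space = tag_space(tag_len)
    if n_tags > n_space:
        return 1.0  # pigeonhole: a duplicate is certain
    log_p_none = sum(math.log1p(-i / n_space) for i in range(n_tags))
    return 1.0 - math.exp(log_p_none)


def expected_distinct(n_tags: int, tag_len: int) -> float:
    # Expected number of distinct tags among n_tags uniform draws.
    n_space = tag_space(tag_len)
    return n_space * (1.0 - (1.0 - 1.0 / n_space) ** n_tags)


if __name__ == "__main__":
    n, length = 878_938, 10
    print("possible tags:", tag_space(length))
    print("P(given tag is unique):", p_tag_unique(n, length))
    print("expected distinct tags:", expected_distinct(n, length))
```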
The random model does not reflect the biological process • Genes evolve by duplication as well as by point mutation • Many motifs are repeated • Functional widgets at work? • The result is a strong bias in observed biological sequences, not the uniform distribution the simple model assumes. • Here are some numbers.
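To see why duplication breaks the uniform model, consider a toy simulation (hypothetical, not data from these slides). It compares the fraction of shared 10-base tags when tags are drawn uniformly with the fraction when tags come from gene families, where each member is a copy of a random founder tag with independent point mutations at an assumed per-base rate. The family model, its parameters, and all function names are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List

BASES = "ACGT"
TAG_LEN = 10


def random_tag(rng: random.Random, length: int = TAG_LEN) -> str:
    return "".join(rng.choice(BASES) for _ in range(length))


def point_mutate(tag: str, rate: float, rng: random.Random) -> str:
    # Each position independently changes to a different base with probability `rate`.
    out = []
    for base in tag:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def uniform_tags(n: int, rng: random.Random) -> List[str]:
    # The simple model: every tag is an independent uniform draw.
    return [random_tag(rng) for _ in range(n)]


def family_tags(n_families: int, family_size: int, rate: float,
                rng: random.Random) -> List[str]:
    # Hypothetical duplication model: members are mutated copies of one founder.
    tags: List[str] = []
    for _ in range(n_families):
        founder = random_tag(rng)
        tags.extend(point_mutate(founder, rate, rng) for _ in range(family_size))
    return tags


def shared_fraction(tags: List[str]) -> float:
    # Fraction of tags whose sequence occurs more than once in the set.
    counts = Counter(tags)
    return sum(c for c in counts.values() if c > 1) / len(tags)


if __name__ == "__main__":
    rng = random.Random(1)
    print("uniform:", shared_fraction(uniform_tags(1000, rng)))
    print("families:", shared_fraction(family_tags(200, 5, 0.05, rng)))
```

With the same total number of tags, the family model yields far more shared tags than the uniform model, which is the bias the slide describes.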
SAGE tags match many genes (tags from Hashimoto S, et al. Blood 94:837, 1999)
Tag Frequency Groups for the 10-base Tag Set Containing 878,938 Tags for UniGene Human
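A tag frequency table groups tags by how many times each tag sequence occurs in the set. The sketch below builds such a table from a flat list of tags, assuming one tag per reference sequence. The function name and the toy tags are hypothetical; the slide's actual grouping boundaries are not reproduced here.

```python
from collections import Counter
from typing import Dict, List


def frequency_groups(tags: List[str]) -> Dict[int, int]:
    """Map occurrence count -> number of distinct tags with that count."""
    per_tag = Counter(tags)            # tag sequence -> occurrences
    groups = Counter(per_tag.values())  # occurrences -> number of distinct tags
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    toy = ["AAAAAAAAAA", "AAAAAAAAAA", "CCCCCCCCCC",
           "GGGGGGGGGG", "GGGGGGGGGG", "GGGGGGGGGG"]
    groups = frequency_groups(toy)
    print(groups)
    # Consistency check: occurrences summed over groups equals the tag count.
    print(sum(freq * n for freq, n in groups.items()) == len(toy))
```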
Myeloid Tag Matches with UniGene Human SAGE Tag Reference Database
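Matching observed tags against a reference database reduces to a lookup from tag to the set of genes that share it. The sketch below assumes the reference is available as (tag, gene ID) pairs and sorts each observed tag into no match, a unique match, or multiple matches. The function names and the `Hs.`-style IDs in the example are hypothetical placeholders, not real database records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_reference(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    # Collapse (tag, gene) pairs into tag -> set of genes carrying that tag.
    ref: Dict[str, Set[str]] = defaultdict(set)
    for tag, gene in pairs:
        ref[tag].add(gene)
    return dict(ref)


def classify_matches(observed: Iterable[str],
                     ref: Dict[str, Set[str]]) -> Dict[str, int]:
    # Count observed tags by how many reference genes they match.
    summary = {"no_match": 0, "unique": 0, "multiple": 0}
    for tag in observed:
        genes = ref.get(tag, set())
        if not genes:
            summary["no_match"] += 1
        elif len(genes) == 1:
            summary["unique"] += 1
        else:
            summary["multiple"] += 1
    return summary


if __name__ == "__main__":
    ref = build_reference([("AAAAAAAAAA", "Hs.1"),
                           ("CCCCCCCCCC", "Hs.2"),
                           ("CCCCCCCCCC", "Hs.3")])
    print(classify_matches(["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"], ref))
```

Keeping every gene per tag, rather than the first hit, is what exposes the multiple-match cases discussed in these slides.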
Conspirators: Terry Clark, Andrew Huntwork, Josef Jurek, L. Ridgway Scott, Sanggyu Lee, Janet D. Rowley, San Ming Wang