This study annotates regions of mouse draft genomic sequences by comparing them to human sequences, focusing on gene content, repeat elements, and transcription factor binding sites. The study orders and orients draft sequence contigs to identify conserved regions and under/over-represented TF sites, paving the way for genome annotation advancements.
Annotation of Mouse Draft Sequence by Comparative Analysis to Human Sequence
M. Bucan, T. Wiltshire, A. Lengeling, L. Tarantino, S. Kanes, CNB, University of Pennsylvania
J. Crabtree, J. Schug, C. Overton, C. Stoeckert, CBIL, University of Pennsylvania
J. Lehoczky, K. Dewar, B. Birren and the Whitehead/MIT Center for Genome Research
The Gabrg1-Gabra2-Gabrb1-Txk-Tec-Gsh2-Pdgfra-Kit-Kdr(Flk1)-Clock BAC contigs on Chr. 5
The Gabrg1-Gabra2-Gabrb1-Txk-Tec-Gsh2-Pdgfra-Kit-Kdr(Flk1)-Clock BAC contigs on Chr. 5 [figure: two regions labeled "Sequence available"]
Annotation Overview
• Order and orient draft sequence contigs
• Perform framework sequence annotation
  • repeat content, gene content
  • MARs, CpG islands, BAC ends
  • TF binding sites
• Find regions conserved with human
• Identify over/under-represented TF sites
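The slides do not say how CpG islands were called in the framework annotation. As a minimal sketch, the following assumes the commonly cited Gardiner-Garden and Frommer thresholds (window of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); the function names and defaults are illustrative, not taken from the presentation.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island_window(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to a single candidate window."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

In practice such a check would be slid along each ordered contig, with overlapping passing windows merged into a single island call.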
Annotation of Tec-Txk draft sequence (65i8): Ordering and orienting pieces using known genes
Annotation of Tec-Txk draft sequence (65i8): Ordering and orienting pieces using BAC ends
Annotation of Kit draft sequence (232h18): Ordering and orienting using known genes & BAC ends
Annotation of Kit draft sequence (232h18): Ordering and orienting pieces using conserved regions
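One way to picture ordering and orienting pieces from conserved regions is sketched below. It assumes each mouse piece has one or more alignment hits against an already ordered human sequence, given as hypothetical `(contig, human_start, human_end, strand)` tuples. Pieces are ordered by the median midpoint of their hits and oriented by whichever strand covers more aligned bases. This is an illustration of the idea only, not the procedure used in the presentation.

```python
from collections import defaultdict
from statistics import median


def order_and_orient(hits):
    """Order and orient contigs from hits against a reference sequence.

    hits: iterable of (contig, human_start, human_end, strand),
          where strand is "+" or "-".
    Returns a list of (contig, orientation) in inferred reference order.
    """
    by_contig = defaultdict(list)
    for contig, h_start, h_end, strand in hits:
        by_contig[contig].append((h_start, h_end, strand))

    placed = []
    for contig, matches in by_contig.items():
        # Anchor each contig at the median midpoint of its hits.
        anchor = median((s + e) / 2 for s, e, _ in matches)
        # Orient by the strand that covers more aligned bases.
        plus = sum(e - s for s, e, strand in matches if strand == "+")
        minus = sum(e - s for s, e, strand in matches if strand == "-")
        orientation = "+" if plus >= minus else "-"
        placed.append((anchor, contig, orientation))

    placed.sort()
    return [(contig, orientation) for _, contig, orientation in placed]
```

Pieces with no hits stay unplaced, which is consistent with only part of a draft sequence being ordered by any single source of evidence.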
Annotation of Kit draft sequence (232h18): Transcription Element Search System analysis
TESS Analysis
• Searched entire human and mouse syntenic sequences with all TESS matrices.
• Identified binding sites over/under-represented in the conserved regions.
• Conserved sites dispersed over 150kb.
• Over-represented factors include AP2, Pax-6, S8, Oct-1, E2A, E2F-DRTF, TAL1-/E47, CdxA, Ubx, AbdB-r, Engrailed, Hairy, DFD
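The slides do not state which statistic was used to call a factor over- or under-represented. A minimal sketch, assuming a simple binomial model in which each predicted site falls in a conserved region with probability equal to the conserved fraction of the sequence, could look like this. The function names are hypothetical and the numbers in the usage note are made up for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def representation(sites_in_conserved, sites_total, conserved_bp, total_bp):
    """Return (observed/expected ratio, p_over, p_under) for one factor.

    p_over  = P(X >= observed), small when the factor is over-represented.
    p_under = P(X <= observed), small when the factor is under-represented.
    """
    p = conserved_bp / total_bp
    expected = sites_total * p
    ratio = sites_in_conserved / expected if expected else float("nan")
    p_over = binomial_tail(sites_in_conserved, sites_total, p)
    p_under = 1.0 - binomial_tail(sites_in_conserved + 1, sites_total, p)
    return ratio, p_over, p_under
```

For example, `representation(10, 20, 15000, 150000)` describes a factor with 10 of its 20 predicted sites inside conserved regions that make up 10% of the sequence, which gives a fivefold excess over the 2 sites expected by chance. Because every TESS matrix is tested, any p-value threshold would need a multiple-testing correction.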
Conclusions
• Order & orient up to 87% of draft sequence using genes, BAC ends, conserved regions.
• Discovery of 3 novel genes.
• TF binding site analysis alone is not informative (see TESS graphs!)
• Third organism (chicken?)
• Higher-order patterns (i.e., co-occurrence)
Future work
• Methods for identifying conserved regions
  • local versus global alignments
  • favor short/high identity or long/low identity?
• Extend TESS TF site analysis using draft human sequence.
• Automate annotation/analysis procedure; provide an on-line resource for BAC annotation.
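The local-versus-global question can be made concrete with a small scoring sketch. The function below computes either a global (Needleman-Wunsch style) or a local (Smith-Waterman style) alignment score with a linear gap penalty. The scoring values are arbitrary assumptions chosen for illustration and are not parameters from the presentation.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Score-only dynamic programming alignment of strings a and b.

    local=False: global alignment, every base of a and b must be aligned.
    local=True:  local alignment, best-scoring pair of substrings.
    """
    # Row 0: leading gaps cost nothing locally, j * gap globally.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]
```

With these parameters, a short conserved element such as `ACGT` embedded in unrelated flanking sequence scores well locally but poorly globally, which is the trade-off behind favoring short, high-identity matches over long, low-identity ones.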
Annotation of Tec-Txk draft sequence (65i8): Starting material: 23 unordered pieces