Annotation of Mouse Draft Sequence by Comparative Analysis to Human Sequence
M. Bucan, T. Wiltshire, A. Lengeling, L. Tarantino, S. Kanes (CNB, University of Pennsylvania)
J. Crabtree, J. Schug, C. Overton, C. Stoeckert (CBIL, University of Pennsylvania)
J. Lehoczky, K. Dewar, B. Birren and the Whitehead/MIT Center for Genome Research
The Gabrg1-Gabra2-Gabrb1-Txk-Tec-Gsh2-Pdgfra-Kit-Kdr(Flk1)-Clock BAC contigs on Chr. 5
[Figure label: Sequence available]
Annotation Overview
• Order and orient draft sequence contigs
• Perform framework sequence annotation
    • repeat content, gene content
    • MARs, CpG islands, BAC ends
    • TF binding sites
• Find regions conserved with human
• Identify over/under-represented TF sites
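One item in the framework annotation, CpG islands, can be illustrated with a minimal sketch. The slides do not say which tool or thresholds were used; the window size and cutoffs below (200 bp window, GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6) are commonly used defaults and are assumptions here, as are the function names.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one DNA window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")            # observed CpG dinucleotides
    gc_fraction = (c + g) / n
    expected = c * g / n             # expected CpG count given base composition
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def cpg_island_windows(seq, window=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Return (start, end) of windows passing both thresholds (assumed defaults)."""
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc >= min_gc and ratio >= min_ratio:
            hits.append((start, start + window))
    return hits
```

Overlapping windows that pass would normally be merged into a single island; that step is left out to keep the sketch short.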
Annotation of Tec-Txk draft sequence (65i8): Ordering and orienting pieces using known genes
Annotation of Tec-Txk draft sequence (65i8): Ordering and orienting pieces using BAC ends
Annotation of Kit draft sequence (232h18): Ordering and orienting using known genes & BAC ends
Annotation of Kit draft sequence (232h18): Ordering and orienting pieces using conserved regions
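The idea common to these slides, placing unordered draft pieces by means of anchors (known genes, BAC ends, conserved regions), can be sketched roughly as follows. This is not the authors' pipeline: it assumes each anchor hit has already been reduced to a hypothetical tuple `(contig_id, position_in_contig, position_on_reference)`, where the reference might be a gene's mRNA, a BAC map, or the syntenic human sequence.

```python
from collections import defaultdict
from statistics import median


def order_and_orient(hits):
    """Order contigs by median reference position and infer strand.

    hits: iterable of (contig_id, contig_pos, ref_pos) anchor matches.
    Returns a list of (contig_id, orientation), orientation in {"+", "-", "?"}.
    """
    by_contig = defaultdict(list)
    for contig, contig_pos, ref_pos in hits:
        by_contig[contig].append((contig_pos, ref_pos))

    placed = []
    for contig, pairs in by_contig.items():
        pairs.sort()  # walk along the contig from left to right
        ref_median = median(ref for _, ref in pairs)
        if len(pairs) < 2:
            orient = "?"  # a single anchor places the contig but cannot orient it
        else:
            # Reference coordinate rising along the contig means same strand.
            orient = "+" if pairs[-1][1] >= pairs[0][1] else "-"
        placed.append((ref_median, contig, orient))

    placed.sort()
    return [(contig, orient) for _, contig, orient in placed]
```

Pieces with no anchor at all stay unplaced, which is one reason a draft may be only partly ordered even when several anchor types are combined.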
Annotation of Kit draft sequence (232h18): Transcription Element Search System analysis
TESS Analysis
• Searched entire human and mouse syntenic sequences with all TESS matrices.
• Identified binding sites over/under-represented in the conserved regions.
• Conserved sites dispersed over 150 kb.
• Over-represented factors include AP2, Pax-6, S8, Oct-1, E2A, E2F-DRTF, TAL1-/E47, CdxA, Ubx, AbdB-r, Engrailed, Hairy, DFD
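A simple way to express over- or under-representation is the ratio of sites observed inside conserved regions to the number expected if sites were spread uniformly along the sequence. The sketch below assumes that null model and a hypothetical input layout (site positions per factor, conserved regions as non-overlapping half-open intervals); the slides do not state which statistic was actually used.

```python
def site_enrichment(sites, conserved, seq_len):
    """Compare observed and expected site counts in conserved regions.

    sites: {factor_name: [site positions]}
    conserved: list of non-overlapping half-open (start, end) intervals
    seq_len: total length of the searched sequence
    Returns {factor_name: (observed, expected, observed / expected)}.
    """
    conserved_len = sum(end - start for start, end in conserved)
    conserved_fraction = conserved_len / seq_len

    result = {}
    for factor, positions in sites.items():
        observed = sum(
            any(start <= pos < end for start, end in conserved)
            for pos in positions
        )
        # Uniform null model: sites fall in conserved sequence in
        # proportion to its share of the total length.
        expected = len(positions) * conserved_fraction
        ratio = observed / expected if expected else float("nan")
        result[factor] = (observed, expected, ratio)
    return result
```

A ratio well above 1 suggests over-representation and well below 1 under-representation; with small counts a significance test (for example a binomial test) would be needed before drawing conclusions.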
Conclusions
• Order & orient up to 87% of draft sequence using genes, BAC ends, and conserved regions.
• Discovery of 3 novel genes.
• TF binding site analysis alone is not informative (see TESS graphs!)
• Third organism (chicken?)
• Higher-order patterns (i.e., co-occurrence)
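The "higher-order patterns" point can be made concrete with a small co-occurrence count: how often two different factors have predicted sites within some distance of each other. This is only an illustrative sketch; the input layout and the `max_gap` parameter are assumptions, not something specified in the slides.

```python
from collections import Counter


def cooccurring_pairs(sites, max_gap):
    """Count pairs of different factors whose sites lie within max_gap bp.

    sites: list of (position, factor_name) predicted binding sites.
    Returns a Counter keyed by alphabetically sorted (factor_a, factor_b).
    """
    ordered = sorted(sites)
    counts = Counter()
    for i, (pos_i, factor_i) in enumerate(ordered):
        for pos_j, factor_j in ordered[i + 1:]:
            if pos_j - pos_i > max_gap:
                break  # later sites are even farther away
            if factor_i != factor_j:
                counts[tuple(sorted((factor_i, factor_j)))] += 1
    return counts
```

Comparing these counts between conserved and non-conserved sequence, or between human and mouse, would be one way to look for factor pairs that occur together more often than single-site analysis reveals.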
Future work
• Methods for identifying conserved regions
    • local versus global alignments
    • favor short/high-identity or long/low-identity matches?
• Extend TESS TF site analysis using draft human sequence.
• Automate annotation/analysis procedure; provide an on-line resource for BAC annotation.
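The trade-off between short/high-identity and long/low-identity matches can be explored with a sliding-window identity scan over an existing pairwise alignment. The sketch below assumes two gapped, equal-length aligned strings as input and uses hypothetical parameters `window` and `min_identity`; it does not compute the alignment itself and is not tied to any particular alignment program.

```python
def conserved_segments(aln_a, aln_b, window, min_identity):
    """Return merged (start, end) alignment columns where windowed identity passes.

    aln_a, aln_b: aligned sequences of equal length, "-" marking gaps.
    window: number of alignment columns per window.
    min_identity: minimum fraction of identical, non-gap columns per window.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")

    match = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]

    segments = []
    for start in range(len(match) - window + 1):
        identity = sum(match[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if segments and start <= segments[-1][1]:
                # Overlaps or touches the previous segment: extend it.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments
```

Running the same alignment with a short window and a high threshold, then a long window and a lower threshold, shows directly how the choice changes which regions are called conserved.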
Annotation of Tec-Txk draft sequence (65i8): Starting material: 23 unordered pieces