Comparative Analysis of Human Chromosome 22q11.1-q12.3 with Syntenic Regions in the Chimpanzee, Baboon, Bovine, Mouse, Pufferfish and Zebrafish Genomes
Dr. Bruce A. Roe, George Lynn Cross Research Professor
Advanced Center for Genome Technology, Department of Chemistry and Biochemistry, University of Oklahoma
broe@ou.edu | www.genome.ou.edu
LXVIII CSHL Symposium “The Genome of Homo sapiens”, May 28 - June 3, 2003
“The joy of science is the people you meet along the way and how they influence your life”
• Jochanan Stenesh and Lilian Myers at Western Michigan University
• Bernie Dudock at SUNY Stony Brook
• Bart Barrell and Alan Coulson (originally at the MRC, Hills Road, Cambridge) and Ian Dunham, now at the Sanger Institute
• Bev Emanuel at Children’s Hospital of Philadelphia
• Watson and Crick
• Fred Sanger
Human Chromosome 22 Sequence Features
• 39% of the sequence is occupied by genes, including their introns and 5’ and 3’ untranslated regions.
• 3% of the complete sequence encodes the protein products of these genes.
• 42% of the sequence is composed of repetitive sequences, compared to 46% for the entire genome.
• Only slightly over half of the genes predicted for human chromosome 22 can be experimentally validated.*
* Shoemaker DD, et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922-7 (2001).
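Figures such as “39% of the sequence is occupied by genes” are coverage fractions: the annotated intervals (genes, repeats, coding exons) are merged so that overlapping features are counted once, and the merged length is divided by the sequence length. A minimal sketch, assuming half-open [start, end) coordinates; the helper name and the toy intervals are hypothetical, not chromosome 22 data.

```python
def covered_fraction(intervals, seq_length):
    """Fraction of a sequence covered by the union of half-open [start, end)
    intervals; overlapping or touching intervals are merged so no base is
    counted twice."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            # Overlaps or touches the current merged block: extend it.
            cur_end = max(cur_end, end)
        else:
            # Starts a new block: add the finished block's length first.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / seq_length


# Hypothetical gene spans on a 100 bp toy sequence; (0, 10) and (5, 20) overlap.
toy_genes = [(0, 10), (5, 20), (30, 40)]
print(covered_fraction(toy_genes, 100))
```

Sorting first means each interval is visited once after the sort, so the same routine scales to genome-wide annotation tables.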
An Individual’s Genome Differs from the DNA of:
• Siblings by 1 to 2 million bases (~99.98% identical, with coding regions 99.99999% identical)
• Unrelated humans by 6 million bases (~99.8% identical overall, with coding regions 99.9999% identical)
• Chimpanzees by about 100 million base pairs (~98% identical)
• Baboons by about 300 million base pairs (~92% identical)
• Mice by about 2.8 billion bases, but coding regions are ~90% identical
• Leaf spinach by about 2.9 billion bases, but coding regions are ~40% identical
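The percentages on this slide relate a count of differing bases to a genome size: identity = 1 - differences / genome size. A minimal sketch of that arithmetic, assuming a haploid human genome of roughly 3 billion bases; the function names are hypothetical. With those inputs, 6 million differences gives the ~99.8% quoted for unrelated humans. The slide’s other rows are rounded and need not reproduce exactly from this one formula.

```python
def percent_identity(differences, genome_size):
    """Percent of positions that are identical, given a count of differing bases."""
    return 100.0 * (1.0 - differences / genome_size)


def differences_for_identity(identity_pct, genome_size):
    """Inverse: number of differing bases implied by a percent identity."""
    return round(genome_size * (1.0 - identity_pct / 100.0))


GENOME_SIZE = 3_000_000_000  # assumed ~3 Gb haploid human genome
print(percent_identity(6_000_000, GENOME_SIZE))
print(differences_for_identity(99.8, GENOME_SIZE))
```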
Differences between individuals
AGCCACACAGTGTCCACCGGATGGTTGATTTTGAAGCAGAGTTAGCTTGTCACCTGCCTCCCTTTCCCGGGACAACAGAAGCTGACCTCTTTGNTCTCTTGCGCAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAGTAAGTACTGGTTTGGGGAGNAGGGTTGCAGCGGCNGAGCCAGGGTCTCCACCCAGGAAGGACTNATCGGGCAGGGTGTGGGGAAACAGGGAGGTTGTTCAGATGACCACGGGACACCTTTGACCCTGGCCGCTGTGGAGTGTTTGTGCTGGTTGATGCCTTCTGGGTGTGGAATTGTTTTTCCCGGAGTGGCCTCTGCCCTCTCCCCTAGCCTGTCTCAGATCCTGGGAGCTGGTGAGCTGCCCCCTGCAGGTGGATCGAGTAATTGCAGGGGTTTGGCAAGGACTTTGACAGACATCCCCAGGGGTGCCCGGGAGTGTGGGGTCCNAGCCAG
The yellow underlined sequence is the first exon of the BCR gene, which is involved in leukemia. Only 5 bases (marked N) differ, in non-gene regions.
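In the sequence above, positions where the individuals differ are written as N. A minimal sketch of producing such a consensus, assuming two gap-free sequences already aligned to the same length; the helper `mark_differences` is hypothetical, and real comparisons would start from an alignment that may contain gaps.

```python
def mark_differences(seq_a, seq_b):
    """Return a consensus with 'N' at every position where two aligned,
    equal-length sequences disagree, plus the 0-based differing positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    consensus = "".join(a if a == b else "N" for a, b in zip(seq_a, seq_b))
    positions = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    return consensus, positions


consensus, diffs = mark_differences("ACGTAC", "ACGAAG")
print(consensus, diffs)
```

Counting `N` in the consensus (`consensus.count("N")`) then gives the number of differing bases.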
Human Chromosome 22 Single Nucleotide Polymorphisms*
Number of overlaps: 335
Size of overlaps: 13,203,147 bp
Number of SNPs: 11,116 (~1 per 1,000 bp)
Number of substitutions: 9,123 (82%)
Number of insertions/deletions: 1,993 (18%)
Only 48 of the 11,116 SNPs were in coding regions, a density ~10-fold lower than in non-coding sequence.
* E. Dawson, et al. A SNP Resource For Human Chromosome 22: Extracting Dense Clusters of SNPs from the Genomic Sequence. Genome Research 11, 170-178 (2001).
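The density and percentages in the SNP table follow directly from the raw counts. A minimal sketch, assuming (as the 82% / 18% split implies) that every SNP is either a substitution or an insertion/deletion, so the indel count is the remainder; the function name is hypothetical.

```python
def snp_summary(n_snps, n_substitutions, bases_surveyed):
    """Summarize a SNP survey. Assumes every SNP is either a substitution
    or an insertion/deletion, so indels are the remainder."""
    n_indels = n_snps - n_substitutions
    return {
        "bp_per_snp": bases_surveyed / n_snps,
        "substitution_pct": 100.0 * n_substitutions / n_snps,
        "indels": n_indels,
        "indel_pct": 100.0 * n_indels / n_snps,
    }


# Counts taken from the table: 11,116 SNPs, 9,123 substitutions, 13,203,147 bp.
summary = snp_summary(11_116, 9_123, 13_203_147)
print(summary)
```

The resulting density, one SNP per roughly 1,190 bp, is the basis for the table’s “~1 per 1,000 bp”.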
“We each are like a different symphony orchestra” “All playing the same instruments slightly differently”
Good news and Bad news
• Bad news
• 2-4 times as many proteins as other species, due to extensive alternative splicing in humans.
• Good news
• <40,000 genes (counting dark space?)
• We only know the function of about half the predicted genes.
• Likely >1 million different gene products, based on alternative splicing and post-translational modifications.
Where we stand now
• We essentially have the ‘dictionary’ with all the words (genes) spelled correctly, but only slightly more than half of the words (genes) have definitions.
• Slightly over half of the genes predicted for human chromosome 22 have been experimentally validated.
• Through comparative genomic sequencing we can annotate the human genome based on evolutionarily conserved gene sequences, and use model systems to study gene expression.
Chimpanzee and Baboon Genomic Sequencing
• Medically important model eukaryotic organisms
• The chimpanzee is our nearest evolutionary relative, with a genome that has ~98% sequence identity with the human genome
• The baboon genome has ~92% sequence identity with the human genome
PIP plot of a region of human chr22 compared to syntenic regions of baboon and mouse (annotated: human-specific repeat regions; a questionable gene present in primates but not in rodents)
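A PIP (percent identity plot) shows, along the human sequence, how similar each aligned stretch of another species is. Tools such as PipMaker plot the percent identity of gap-free aligned segments; the minimal sketch below approximates that with fixed sliding windows over a gapped pairwise alignment, skipping gap columns. The window and step sizes and the function name are assumptions for illustration.

```python
def windowed_identity(aln_a, aln_b, window=100, step=50):
    """Percent identity in sliding windows over a gapped pairwise alignment.
    Columns where either sequence has a gap ('-') are not counted."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    last_start = max(len(aln_a) - window, 0)
    for start in range(0, last_start + 1, step):
        cols = [
            (a, b)
            for a, b in zip(aln_a[start:start + window], aln_b[start:start + window])
            if a != "-" and b != "-"
        ]
        if cols:  # skip windows that are entirely gaps
            matches = sum(a == b for a, b in cols)
            results.append((start, 100.0 * matches / len(cols)))
    return results


print(windowed_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5))
```

Plotting each `(start, identity)` pair against the human coordinate gives a PIP-like track; human-specific repeats show up as stretches with no aligned signal.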
Variations in the regions syntenic to the human chr 22 immunoglobulin light chain region from chimp, baboon, rat and mouse
Exons in one copy of a duplicated zebrafish gene have 75% homology to human, but are greatly diverged (<50% homology) in the other copy
Instance of a rare Alu deletion in chimp, and of a gene with very low homology in fish
Conclusions from the analysis of vertebrate genomic sequences
• Approximately 40% of the genome is transcribed into hnRNA, which is processed into ~10-fold smaller mature mRNA with extensive alternative splicing (1 gene --> multiple proteins).
• Repeat sequence density is approximately 40%.
• Conservation of coding sequences, promoters, enhancers and exon spacing tracks evolutionary distance from a common ancestor.
• The human genome contains additional endogenous retroviral and Alu sequences, and some regions are not present in other vertebrates.
• Sequence drift occurs in duplicated gene families.
• About half of the predicted genes have yet to be assigned any known function.
“Zebrafish are small people that swim in the water and breathe through gills” Han Wang, Dept. Zoology and Director of the University of Oklahoma Zebrafish Facility
How much of the ~1.7 Gbp zebrafish genome has been sequenced so far?
The whole genome shotgun project comprises roughly 11.6 million traces so far. With an average quality-clipped trace length of 517 bp, this adds up to 6 Gb in total, so the genome is covered about 3.5 times.
The new assembly, Zv2, is built on 11.7 million traces with an average trace length of 651 bp, adding up to 7.64 Gbp (4.5x coverage).
The current Sanger Institute in-house statistics for clone sequencing are:
* 322,712,747 bp unfinished
* 112,494,895 bp finished
* 435,207,642 bp total
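The coverage figures above are the total sequenced bases (number of traces times mean trace length) divided by the genome size. A minimal sketch of that calculation; the function name is hypothetical, and the inputs are the shotgun numbers quoted on the slide.

```python
def shotgun_coverage(n_traces, mean_trace_len_bp, genome_size_bp):
    """Return (total sequenced bases, fold coverage) for a shotgun read set."""
    total_bp = n_traces * mean_trace_len_bp
    return total_bp, total_bp / genome_size_bp


# 11.6 million traces, 517 bp average clipped length, ~1.7 Gbp genome.
total, fold = shotgun_coverage(11_600_000, 517, 1_700_000_000)
print(f"{total / 1e9:.2f} Gb sequenced, {fold:.1f}x coverage")
```

Fold coverage is an average: some bases are covered many times and others not at all, which is why assembly quality improves as coverage rises.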
Zebrafish Developmental Stages (hpf = hours post-fertilization)
Period (hpf): Description
• Zygote period (0-3/4 h): The newly fertilized egg is in the zygote period until the first cleavage occurs.
• Cleavage period (3/4-2 1/4 h): After the first cleavage, blastomeres divide at approximately 15-minute intervals.
• Blastula period (2 1/4-5 1/4 h): Begins at the 128-cell stage, or 8th zygotic cell cycle. The embryo enters the midblastula transition (MBT), the onset of zygotic transcription. The period ends at the onset of gastrulation.
• Gastrula period (5 1/4-10 1/3 h): Morphogenetic cell movements of involution, convergence, and extension occur, producing the primary germ layers and the embryonic axis.
• Segmentation period (10 1/3-24 h): Somites develop, the rudiments of the primary organs become visible, the tail bud becomes more prominent and the embryo elongates. The first cells differentiate morphologically, and the first body movements appear.
• Pharyngula period (24-48 h): The embryo develops to the phylotypic stage, when it possesses the classic vertebrate bauplan. The posterior lateral line primordium migrates. Rapid organogenesis continues.
• Hatching period (48-72 h): Individuals within a single developing clutch hatch sporadically during the whole period.
Kimmel CB, et al. Stages of embryonic development of the zebrafish. Dev Dyn 203, 253-310 (1995).
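Because each period is defined by a time window, assigning a period to an embryo fixed at a given hpf is a sorted-boundary lookup. A minimal sketch using Python’s `bisect` module, with the period start times taken from the Kimmel et al. staging table (cleavage starting at 3/4 h, where the zygote period ends); the names `PERIOD_STARTS` and `period_at` are hypothetical, and a time exactly on a boundary is assigned to the period that begins then.

```python
from bisect import bisect_right

# Period start times in hours post-fertilization (Kimmel et al. 1995).
PERIOD_STARTS = [0.0, 0.75, 2.25, 5.25, 10 + 1 / 3, 24.0, 48.0]
PERIOD_NAMES = ["Zygote", "Cleavage", "Blastula", "Gastrula",
                "Segmentation", "Pharyngula", "Hatching"]


def period_at(hpf):
    """Name of the developmental period at a given hpf (0-72 h covered)."""
    if not 0 <= hpf <= 72:
        raise ValueError("the staging table covers 0-72 hpf only")
    # bisect_right counts the start times <= hpf; subtract 1 for the index.
    return PERIOD_NAMES[bisect_right(PERIOD_STARTS, hpf) - 1]


for t in (24, 48, 72):  # the time points used in the in situ experiments
    print(t, period_at(t))
```

Actual staging is done by morphology rather than clock time, since development rate varies with temperature; a lookup like this only gives the nominal period.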
Gene Expression in Zebrafish
• Created and sequenced 10,000 clones from a zebrafish brain and eye cDNA library.
• After a BLAST search against human chromosome 22, obtained the set of zebrafish cDNA clones corresponding to several predicted human chromosome 22 genes.
• Picked an EST whose expression profile matched that of a hypothetical protein with an EST from a human fetal brain library.
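Selecting zebrafish clones that correspond to chromosome 22 genes amounts to keeping, for each query EST, its best significant BLAST hit. A minimal sketch, assuming results in the 12-column tabular format produced by BLAST+ with `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The E-value cutoff, the function name and the sample records are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-10):
    """Best hit (highest bit score) per query from BLAST tabular output,
    ignoring hits above the E-value cutoff."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


# Hypothetical records: est1 has two hits; est2's only hit is not significant.
sample = [
    "est1\tgeneA\t85.0\t200\t30\t0\t1\t200\t1000\t1199\t1e-40\t180.0",
    "est1\tgeneB\t70.0\t150\t45\t0\t1\t150\t500\t649\t1e-12\t90.0",
    "est2\tgeneC\t60.0\t80\t32\t0\t1\t80\t10\t89\t0.5\t30.0",
]
print(best_hits(sample))
```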
[Figure: probe 1b6 whole-mount in situ hybridization at 24, 48 and 72 hpf]
Gene Expression in Zebrafish (cont.)
• An antisense RNA hybridization probe was generated by in vitro transcription in the presence of dig-UTP after cloning into an expression vector.
• Whole-mount in situ hybridization was performed on 24, 48, and 72 hour post-fertilization zebrafish embryos.
• Hybridization was detected with an anti-dig antibody.
Probe 1b6 shows hybridization in the brain from 24 hours onward and in the eye from 48 hours onward.
1b6: AP000557.1.mRNA, chr22 position 18495442-18504448, KIAA1020 hypothetical protein; matches EST b6n20zf
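An antisense probe is complementary to the mRNA, so its sequence is the reverse complement of the sense (mRNA-like) strand; a sense probe has the same sequence as the mRNA and serves as a control. A minimal sketch, assuming the input is a sense-strand DNA sequence; the function name is hypothetical, and `rna=True` simply writes T as U, as in an in vitro transcribed RNA probe.

```python
_DNA_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def antisense(sense_dna, rna=False):
    """Reverse complement of a sense-strand DNA sequence. With rna=True,
    T is written as U, as in an in vitro transcribed RNA probe."""
    probe = sense_dna.translate(_DNA_COMPLEMENT)[::-1]
    return probe.replace("T", "U").replace("t", "u") if rna else probe


print(antisense("ATGCCA"))
print(antisense("ATGCCA", rna=True))
```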
Exon-specific gene expression in zebrafish embryos during development, in a format amenable to automation
• Incorporated mouse in situ methods for zebrafish that:
  • shorten probes from 1000 bp to 100 bp, making exon-specific probes possible,
  • run hybridizations in a 96-well multiplex microtiter plate format,
  • generate digoxigenin-labeled ssDNA probes by asymmetric, single-primer amplification of PCR products (eliminating subcloning of each PCR product into T3/T7 expression vectors), and
  • eliminate spurious labeling of the eye by introducing glycine as the reagent of choice to rapidly inhibit the proteinase K used to increase the permeability of the embryos.
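Running hybridizations in 96-well plates means each probe needs a well address. A minimal sketch of a plate map, assuming row-major filling (A1 through A12, then B1, and so on); liquid-handling setups may instead fill column by column, and the probe IDs below are hypothetical.

```python
import string


def well_name(index, n_rows=8, n_cols=12):
    """Row-major well name for a 0-based index on an 8 x 12 (96-well) plate."""
    if not 0 <= index < n_rows * n_cols:
        raise ValueError(f"index {index} is outside a {n_rows * n_cols}-well plate")
    row, col = divmod(index, n_cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def plate_layout(probe_ids):
    """Assign probes to wells A1, A2, ..., A12, B1, ... in order."""
    if len(probe_ids) > 96:
        raise ValueError("more probes than wells; split across plates")
    return {well_name(i): probe for i, probe in enumerate(probe_ids)}


print(plate_layout(["exon1_probe", "exon2_probe", "no_probe_control"]))
```

Keeping a no-probe control well on every plate fits the control strategy described on the following slides.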
Whole-mount in situ hybridization with a digoxigenin-labeled ssDNA probe made from a PCR product: brain-specific expression of this mRNA during embryonic development.
[Figure panels: antisense probe, sense probe, no probe; 72-hour post-fertilization embryo]
The importance of a “no probe” antibody staining control: it shows whether any probe-independent antibody staining occurs in the lens. Typically only the antisense probe hybridizes and is therefore stained by the anti-dig antibody, with some probe-independent staining in the eye.
One last experiment with a surprise ending: a probe to the unique 3’ UTR when there are multiple paralogs
Hybridization probe a8h24, unique to the 3’ UTR of zebrafish gene 2, based on our zebrafish EST sequence
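When several paralogs exist, a gene-specific probe has to come from sequence the paralogs do not share, which is why the 3’ UTR is targeted. A minimal sketch of finding such regions, assuming exact k-mer matching as a stand-in for cross-hybridization (real hybridization also tolerates mismatches, so this is only a first filter). The function names, the value of k and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target, paralogs, k=20):
    """Half-open (start, end) spans of target covered only by k-mers that
    occur in none of the paralog sequences."""
    shared = set()
    for p in paralogs:
        shared |= kmers(p, k)
    spans = []
    run_start = None
    for i in range(len(target) - k + 1):
        unique = target[i:i + k] not in shared
        if unique and run_start is None:
            run_start = i  # a run of unique k-mers begins
        elif not unique and run_start is not None:
            # last unique k-mer started at i - 1, so the span ends at i - 1 + k
            spans.append((run_start, i - 1 + k))
            run_start = None
    if run_start is not None:
        spans.append((run_start, len(target)))
    return spans


print(unique_regions("AAAACCCCGGGG", ["AAAACCCC"], k=4))
```

The longest returned span is a natural place to put a probe; in practice k would be set near the probe-length scale and candidates checked against the whole transcriptome.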
[Figure panels: antisense probe, sense probe, no probe]
One too many controls sometimes results in a surprise observation: both the antisense and sense probes hybridized in the brain of 72-hour post-fertilization embryos. Does this indicate RNA transcribed from the opposite, non-coding strand?
What’s next for our Genome Center?
• Participate in sequencing the mouse, chimp, baboon, lemur, bovine, dog, cat, chicken and zebrafish genomes, concentrating on:
  • regions of high biological interest and
  • regions orthologous to human chromosome 22
• Sequence the Medicago truncatula (a close relative of alfalfa) genome using a mapped BAC-based approach, concentrating on coding regions
• Continue sequencing selected pathogenic bacteria
• Investigate the function of predicted genes of unknown function in the zebrafish system, first by whole-mount in situ hybridization and then by expression knockdown experiments with morpholino oligos.
Laboratory Organization
Bruce Roe, PI
Support Teams (Reagents & Equip. Maint., Informatics, Production, DNA Synthesis, Administration): Phoebe Loh*, Sulan Qi, Bart Ford*, Mounir Elharam*, Doug White, Clayton Powell**, KayLynn Hale, Dixie Wishnuck, Tami Womack, Mary Catherine Williams, Rose Morales-Diaz*, Mounir Elharam*, Steve Shaull**, Doug White, Work-study undergraduate students**, Jim White, Steve Kenton, Hongshing Lai, Sean Qian***
Research Teams: Doris Kupfer, Julia Kim*, Sun So, Graham Wiley**, Limei Yang, Angie Prescott*, Audra Wendt**, Mandi Aycock**, Ziyun Yao***, Steve Shaull*, Youngju Yoon****, Jami Milam****, Sara Downard**, Ging Sobhraksha**, ShaoPing Lin***, Honggui Jia, Hongming Wu, Baifang Qin, Peng Zhang, Fares Najar***, Chunmei Qu, Keqin Wang, Shuling Li, Stephan Deschamps***, Shelly Oommen****, Christopher Lau****, Trang Do, Anh Do, Lily Fu, Yang Ye**, Tessa Manning**, Fu Ying, Liping Zhou, Ruihua Shi****, Junjie Wu****, Phoebe Loh*, Sulan Qi, Bart Ford*, Lin Song****, Ying Ni, Huarong Jiang, Axin Hua***, Weihong Xu****, Yanhong Li
* Previous undergraduate research student; ** Present undergraduate research student; *** Previous graduate student; **** Present graduate student
Funding from the NHGRI, Noble Foundation, DOE, NSF (pending)
Collaborators at Sanger, CWRU, CHOP, Keio, UIUC and RIKEN