Why Mice?
Fleischman et al. (1991) PNAS 88:10885-10889. Both individuals carry mutations in the c-Kit gene.
Humans
- rich phenotypic variation
- genome sequence
- biomedical benefits
Mice
- inbred strains
- dense, accurate genetic maps
- genome manipulation
- mutagenesis
Comparative Genomics
Humans and mice diverged 80-100 Myr ago.
What is ‘Draft’ Sequence?
1. Human genomic DNA is cloned into a vector as a BAC clone.
2. Library construction: the BAC clone is broken into smaller pieces.
3. Sequence the pieces. With all pieces sequenced but their order unknown, the result is draft sequence (phase 1).
4. Order the pieces (phase 2). Gaps remain.
5. Fill in the gaps. With all pieces sequenced and placed in order, the result is finished sequence (phase 3).
gaps Some Assembly Required...
Some Assembly Required...
BAC Sequence Fragments -> Assemble -> Order -> NCBI Contig
• Sequence Layout
  • Curated Finished Regions
  • MegaBLAST
  • BAC chromosome assignment: annotation, STS markers, personal communication
  • Remove conflicting overlaps, redundant BACs
• Sequence Building
  • Consider fragment:fragment sequence overlaps for each BAC pair in layout
  • Meld overlapping sequence
  • Order and Orient (o+o): alignments (mRNA, EST), BAC annotation, paired plasmid reads
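The "meld overlapping sequence" step can be illustrated with a toy sketch. It assumes an exact, error-free suffix-to-prefix overlap between two fragments; the function names and the `min_overlap` parameter are hypothetical, and a real build would rely on alignment-based overlaps rather than exact string matches.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` equal to a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def meld(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two fragments at their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[size:]


melded = meld("ACGTTGCA", "TGCAGGT")
print(melded)
```

Requiring a minimum overlap keeps short chance matches from joining unrelated fragments; the threshold of 3 here is only for the toy input.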
The Human Sequence
Build 21: Dec. 6, 2001 data freeze
Build 22: Apr. 1, 2001 data freeze

Input Data:
           Phase 1                 Phase 2                 Phase 3
           Number   Length (bp)    Number   Length (bp)    Number   Length (bp)
Build 21   19,268   3.20 x 10^9    1,632    0.24 x 10^9    8,674    1.00 x 10^9
Build 22   19,132   3.19 x 10^9    1,923    0.29 x 10^9    10,204   1.18 x 10^9

Output, Total Contigs:
              Number   Length
Build 21      5499     2.87 x 10^9
Build 22      4727     2.84 x 10^9
Golden Path   4227*    2.84 x 10^9
*normalized, sequence overlap contigs

Stretches of Contiguous Sequence:
                   <300 Kb   <1 Mb   <5 Mb   >5 Mb
GenBank            706170    111     0       0
Build 22 Contigs   113310    1216    151     10
Novel gene? Sequence -> Gene Discovery
Genes, Genes, Genes???
mRNA entries in LocusLink: 15,131
  placed on the human genome: 12,186
ESTs that align to the Human Genome: 2,200,575
Models predicted by GenomeScan: 116,803
Lots of things we don't know about...
The power of inbreeding
- Accurate Maps
- Isolate Mutations
[Figure labels: Strain A, Strain B]
Genomic Power Tools - Genotype Driven Approach
Transgenesis
- Mark cells
- Overexpression analysis
- Non-invasive genotyping: GFP inserted into mouse embryos
[Figure: B5/EGFP/+ intercross progeny; TG/TG animals marked with *]
Genomic Power Tools - Genotype Driven Approach
Gene Knock-out: Loss of function...
[Figure labels: Gene of interest, ATG, marker, Targeting vector]
Genomic Power Tools - Genotype Driven Approach
Gene Knock-In
- Hypomorph
- Hypermorph
- Loss of function
[Figure labels: Gene of interest, Targeting vector, G, R]
Genomic Power Tools - Genotype Driven Approach
Cre recombinase acting on loxP sites: Deletion, Inversion
[Figure labels: loxP, Neor, Puror, neo, puro, Pr, MA RK]
Genomic Power Tools- Phenotype Driven Approach http://www.mouse-genome.bcm.tmc.edu/ENU/ENUexperiment.asp Monica Justice ENU Mutagenesis
Genomic Power Tools- Phenotype Driven Approach http://cmhd.mshri.on.ca/sub/genetrap/paradigm.htm Gene-Trap Mutagenesis Bill Stanford
Genomic Power Tools - Phenotype Driven Approach
- Mouse Models of Human Disease - Harwell, UK
- DHGP Mutagenesis Project - Germany
- Centre For Modeling Human Disease - Toronto
- Baylor College of Medicine - Mouse Chr 11
- McLaughlin Research Institute - APP/PrP models
- Tennessee Mouse Consortium
Genomic Power Tools - Phenotype Driven Approach
Review of Harwell and GSF Centers: Monica Justice, Nat. Rev Gen (2000), 1:109-115
Haematological, clinical chemistry and allergy defects; Sensory, neurological and neuromuscular phenotypes
Confirmed mutant phenotypes, by category: Skin/Coat, Sensory organs, Neurological/behavioral, Clinical Chemistry, Skeletal, Size, Haematological, Allergy, Nociceptive
27 26 15 17 34 0 0 6 Harwell 0 GSF 31 18 40 4 21 15 9 8 9
Human-Mouse Conserved Synteny - Finding Orthologs
H. sapiens: ~20,000 mRNAs in LocusLink
M. musculus: ~12,000 mRNAs in LocusLink
7,137 Orthologous Pairs
Donna Maglott, Lana Dracheva, The Jackson Labs
Human-Mouse Conserved Synteny - Finding Orthologs
Using BLAST to Find Orthologs - Reciprocal Best Hits
1. Hs mRNA (Query) vs Mm mRNA (Database)
   Hs mRNA 1 - Mm mRNA 1
   Hs mRNA 1 - Mm mRNA 2
   Hs mRNA 2 - Mm mRNA 3
   Hs mRNA 3 - Mm mRNA 4
   Hs mRNA 3 - Mm mRNA 5
2. Mm mRNA (Query) vs Hs mRNA (Database)
   Mm mRNA 1 - Hs mRNA 1
   Mm mRNA 2 - Hs mRNA 1
   Mm mRNA 3 - Hs mRNA 2
   Mm mRNA 4 - Hs mRNA 6
   Mm mRNA 4 - Hs mRNA 3
   Mm mRNA 5 - Hs mRNA 3
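The reciprocal-best-hit rule can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the map: it assumes each hit list is ordered best-first per query (an assumption about how the slide lists its hits), and the function names and short identifiers such as `Hs1` and `Mm1` are hypothetical stand-ins for the mRNAs on the slide.

```python
def best_hits(hits):
    """Map each query to its top-ranked subject; `hits` is assumed to be best-first."""
    best = {}
    for query, subject in hits:
        # setdefault keeps the first (best) subject seen for each query.
        best.setdefault(query, subject)
    return best


def reciprocal_best_hits(forward, reverse):
    """Return (query, subject) pairs that are each other's best hit."""
    forward_best = best_hits(forward)
    reverse_best = best_hits(reverse)
    return sorted(
        (query, subject)
        for query, subject in forward_best.items()
        if reverse_best.get(subject) == query
    )


# Hit lists transcribed from the slide, assuming listed order reflects rank.
hs_vs_mm = [("Hs1", "Mm1"), ("Hs1", "Mm2"), ("Hs2", "Mm3"),
            ("Hs3", "Mm4"), ("Hs3", "Mm5")]
mm_vs_hs = [("Mm1", "Hs1"), ("Mm2", "Hs1"), ("Mm3", "Hs2"),
            ("Mm4", "Hs6"), ("Mm4", "Hs3"), ("Mm5", "Hs3")]

pairs = reciprocal_best_hits(hs_vs_mm, mm_vs_hs)
print(pairs)
```

Under that ranking assumption, Hs3 and Mm4 are not paired: Hs3's best hit is Mm4, but Mm4's best hit is Hs6.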
Human-Mouse Conserved Synteny - Finding Orthologs
Why is this not always straightforward?
1. mRNA resources have temporal and spatial biases
2. Duplication after divergence
3. Deletion after divergence
[Figure labels: Ancestral Locus, Human Locus, Mouse Locus, Gene A, Gene A']
Building a Conserved Synteny Map
Donna Maglott, Lana Dracheva, Andei Shkeda
1. Identify Orthologous Pairs
2. Locate Genes within Their Respective Genomes
   For human: NCBI Genome Assembly or Golden Path
   For mouse: MGD Integrated Genetic Map or Whitehead-MRC Integrated RH Map
3. Identify Conserved Synteny Breakpoints
   Use Human Map as the backbone
   At least two genes from the same region of the mouse genome must be together to define a conserved synteny bin
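The binning rule in step 3 can be sketched as follows. This is a simplified illustration under stated assumptions: "same region of the mouse genome" is reduced to "same mouse chromosome" (the actual maps use cM or cR positions), the input is already sorted along the human map, and the gene names and the `synteny_bins` function are hypothetical.

```python
from itertools import groupby


def synteny_bins(ordered_pairs):
    """Split (human_gene, mouse_chromosome) pairs, given in human map order,
    into conserved segments (runs of two or more genes) and singletons."""
    segments, singletons = [], []
    # groupby collects consecutive genes whose orthologs share a mouse chromosome.
    for mouse_chr, run in groupby(ordered_pairs, key=lambda pair: pair[1]):
        genes = [human_gene for human_gene, _ in run]
        if len(genes) >= 2:
            segments.append((mouse_chr, genes))
        else:
            singletons.append((mouse_chr, genes[0]))
    return segments, singletons


# Toy data, not taken from any real map.
toy_map = [("geneA", "Mmu11"), ("geneB", "Mmu11"),
           ("geneC", "Mmu13"), ("geneD", "Mmu13"),
           ("geneE", "Mmu6"),
           ("geneF", "Mmu11"), ("geneG", "Mmu11")]

segments, singletons = synteny_bins(toy_map)
print(segments)
print(singletons)
```

In this sketch a singleton splits the surrounding run, so geneF and geneG form a new Mmu11 segment rather than extending the first one; a real map builder might instead bridge over an interrupting singleton.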
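Building a Conserved Synteny Map
Hsa 17
Mouse gene   Human gene   Mouse location
Ppm1D        PPM1D        Mmu11 49-65 cM
Tbx2         TBX2         Mmu11 49-65 cM
Tbx4         TBX4         Mmu11 49-65 cM
Cyb561       CYB561       Mmu11 49-65 cM
Ace          ACE          Mmu11 49-65 cM
Pl1          PL1          Mmu13 14 cM
Pl2          PL2          Mmu13 14 cM
Gh           GH1          Mmu11 63-68 cM
Igb          CD79B        Mmu11 63-68 cM
Scn4a        SCN4A        Mmu11 63-68 cM
Icam2        ICAM2        Mmu11 63-68 cM
Pecam1       PECAM1       Mmu6 31.5 cM (singleton)
Apoh         APOH
Pkca         PRKCA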
Building a Conserved Synteny Map Possible Origins of Singletons 1. Inappropriate orthology relationship 2. Undiscovered small region of conserved synteny 3. Mapping Errors
Building a Conserved Synteny Map
                      GP_oo27   Build 21   Build 22
Loci                  3261      3275       3314
Conserved Segments    183       246        232
Singletons            207       232        223
Genes per Segment     18        13         14
Mean Segment Length   116 kb    55 kb      73 kb
Building a Conserved Synteny Map Most Recent Map: Total Loci: 6736 (2967 based on sequence) 217 Segments: 300 (187 based only on sequence) Singletons: 1251 Cytogenetic only: 664 Cytogenetic Integrated: Virtually Mapped Mouse Genes: 3188 http://www.ncbi.nlm.nih.gov/Homology/
Building a Conserved Synteny Map
Integrating Cytogenetic Data
[Figure: Hsa 1 cytogenetic bands 1p36.33, 1p36.32, 1p36.31 and 1p36.23 with base-pair coordinates, aligned against conserved synteny Bin 1 and Bin 2]
http://www.ncbi.nlm.nih.gov/Homology/
[Screenshot labels: Alignments, UniSTS, LocusLink, Map Viewer, Integrated Cytogenetic Locus, Virtual Map]
Building a Conserved Synteny Map
Human-Mouse Conserved Synteny Map built using the same Human Map (Oct 7) and two different mouse maps.
                     MGD Genetic Map   Whitehead-MRC RH Map
Loci                 3241              2992 (unique CIDs)     Only 30% in common
Conserved Segments   200               187
Singletons           202               306
Increased by improper UniGene clustering
11 conserved synteny bins not on RH map (4 as singletons)
29 conserved synteny bins not on MGD Map (15 as singletons)
Building a Conserved Synteny Map
Human-Mouse Conserved Synteny Map built using two different human genome assemblies and the same mouse map (MGD)
                     NCBI Build22 (April 2 data)   Golden Path oo23 (Dec 12 data)
Loci                 3314                          3261                             100% in common
Conserved Segments   232                           183
Singletons           223 (80 unique)               207 (64 unique)
143 Singletons common to both maps
Using a Conserved Synteny Map
Example 1: PPIA Singleton on NCBI Map
[Figure: NCBI Map vs. GoldenPath. Labels: CCN2D1, GCK, CAMK2B, SEMA3E, SEMA3A, PPIA (7p13), VIPR2 (7q36), IGFBP1, IGFBP3, SGCE, TFPl1; mouse regions Prox 5, Prox 11, Dist 12; distances ~200 cR and ~680 cR]
Using a Conserved Synteny Map
Example 2: Adcy1 Singleton on the GoldenPath
[Figure: GoldenPath vs. NCBI Map. Labels: LOC58498, AMPH, POLD2, RALA, BLVRA, ADCY1, INHBA, GLI3, IGFBP3, IGFBP1, PSMA2; mouse regions Prox. 13, Dist 2, Prox. 11]
Building a Conserved Synteny Map Future Directions 1. Integrate EST data 2. Construct new homology map with each genome assembly 3. Start using genomic sequence alignments -BAC ends -Draft BAC Clones
http://greengenes.llnl.gov/mouse/ Lisa Stubbs Can the Human Sequence Help the Mouse Genome?
Annotating the Human Sequence Can mouse sequence help us find human genes? mRNA ESTs Genomic Sequence
Mouse mRNA Human mRNA Annotating the Human Sequence Can you align mouse mRNA to human sequence?
Mouse mRNA Human mRNA Alternate splice? Annotating the Human Sequence New Information...
Mouse mRNA Human mRNA Unannotated Human genes? Annotating the Human Sequence
Annotating the Human Sequence
Mouse Genomic Sequence Resources:
- Mouse Finished Sequence: 38.9 Mb
- Mouse Draft Sequence: 313 Mb
- Mouse Whole Genome Shotgun (WGS) Reads: 15,733,570 reads (~9 Gb, or 3X coverage of the mouse genome)
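The WGS figures imply a mean read length and a genome size that the slide does not state. The short calculation below derives both from the quoted numbers only; the variable names are illustrative, and the results are rough because "~9 Gb" and "3X" are themselves approximations.

```python
# Figures quoted on the slide.
read_count = 15_733_570
total_bases = 9e9        # "~9 Gb" of whole genome shotgun sequence
coverage = 3             # "3X coverage of the mouse genome"

# Coverage is total sequenced bases divided by genome size, so:
implied_genome_size = total_bases / coverage
mean_read_length = total_bases / read_count

print(f"implied genome size: {implied_genome_size:.2e} bp")
print(f"mean read length: {mean_read_length:.0f} bp")
```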
No Annotated Gene in this Region Annotated Gene Annotating the Human Sequence
Annotating the Human Sequence
Coding vs. Non-Coding?
Greg Schuler, John Spouge
- Alignments in coding regions should have more synonymous substitutions than non-synonymous substitutions (under negative selection)
- Alignments in coding regions should have fewer stop codons
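The two expectations above can be checked on a toy alignment by classifying each aligned codon pair. The sketch below is only an illustration: it assumes an ungapped, in-frame alignment of uppercase A/C/G/T using the standard genetic code, counts a stop whenever the second (mouse) codon is a stop, and reports raw counts rather than Ks and Ka, which would also normalize by the number of synonymous and non-synonymous sites. The function name and sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codons(human: str, mouse: str) -> dict:
    """Count identical, synonymous, non-synonymous and stop codons
    across an ungapped, in-frame pair of aligned sequences."""
    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0, "stop": 0}
    usable = min(len(human), len(mouse)) // 3 * 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        h_codon = human[start:start + 3]
        m_codon = mouse[start:start + 3]
        if CODON_TABLE[m_codon] == "*":
            counts["stop"] += 1
        elif h_codon == m_codon:
            counts["identical"] += 1
        elif CODON_TABLE[h_codon] == CODON_TABLE[m_codon]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Toy sequences: ATG GCT AAA GAT vs ATG GCC AGA TAA
result = classify_codons("ATGGCTAAAGAT", "ATGGCCAGATAA")
print(result)
```

Dividing each count by the number of codons compared gives fractions of the kind used to contrast coding and non-coding alignments.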
Annotating the Human Sequence
Human Coding vs. Mouse Coding Alignments
[Figure: histogram of Log Ks/Ka (x axis, -6 to 6; y axis, 0 to 1200) for All Alignments, First Alignments and Orthologous Alignments, annotated ~45% > 0, ~65% > 0 and ~95% > 0, with AF10 and POLR2A labeled]
Annotating the Human Sequence
CACNA1S - Calcium channel, voltage dependent
Human mRNA   Mouse mRNA   Pct_ID   LOG(Ks/Ka)
NM_000069    NM_014193    89.9      1.6
NM_000069    NM_009781    74.6      0.1
NM_000069    NM_019582    73.2      0.2
NM_000069    NM_019582    73.2      0.1
NM_000069    NM_019582    73.2      0.1
NM_000069    NM_007579    69.8     -0.2
NM_000069    NM_007579    69.8     -0.4
NM_000069    NM_007579    69.8     -0.5
NM_000069    NM_009872    64.3     -0.7
Annotating the Human Sequence
                                Traces vs. CDS   Masked Hsa22 CDS vs. Traces
Identical Codons:               .5978            .5353
Synonymous Substitutions:       .2375            .0906
Non-synonymous Substitutions:   .1583            .2977
Stop Codons:                    .0013            .0377
- More Synonymous substitutions in CDS
- Ks/Ka greater
- Fewer stop codons in CDS
Annotating the Human Sequence Future Directions: 1. Compare a well annotated region of human genome to finished mouse sequence in a region of known orthology. Can we distinguish coding from non-coding hits based on the ratio of Ks/Ka? 2. Compare a well annotated region of the human genome to mouse traces. Will paralogy make identifying CDSs too difficult? 3. Add assessment of coding propensity