SNPs Mutations Haplotypes Sequence variations http://www.ebi.ac.uk/mutations/
How much variation in the human genome? • 3000 Mb; one SNP per kb = 3 million • Underestimate? • Rare variants might be the most interesting • 40 000 genes x 100 variants = 4 million • slow build-up • Bottlenecks in human population history!
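The two back-of-envelope estimates on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes, as the slide does, a 3000 Mb genome with one SNP per kb on average and 40 000 genes carrying about 100 variants each; the constant and function names are illustrative only.

```python
# Back-of-envelope estimates of human sequence variation (slide assumptions).
GENOME_SIZE_BP = 3_000 * 1_000_000  # 3000 Mb
SNP_SPACING_BP = 1_000              # assumed: one SNP per kb on average
GENES = 40_000
VARIANTS_PER_GENE = 100


def snp_estimate(genome_bp: int, spacing_bp: int) -> int:
    """Number of SNPs if one occurs every `spacing_bp` bases."""
    return genome_bp // spacing_bp


def gene_variant_estimate(genes: int, per_gene: int) -> int:
    """Number of variants if each gene carries `per_gene` variants."""
    return genes * per_gene


if __name__ == "__main__":
    print(f"SNPs: {snp_estimate(GENOME_SIZE_BP, SNP_SPACING_BP):,}")
    print(f"Gene variants: {gene_variant_estimate(GENES, VARIANTS_PER_GENE):,}")
```

Both figures land in the same range (3 and 4 million), which is why the slide asks whether the per-kb figure is an underestimate.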
Variations • SNPs (Single Nucleotide Polymorphisms) • Indels, dinucleotide mutations • Mutations, polymorphisms • Chromosomal rearrangements • inversions • translocations • indels
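The small-scale variation types listed above can be told apart from the reference and alternative alleles alone. The function below is a hypothetical sketch, not a rule from any of the databases discussed here; it assumes alleles are given as plain base strings on the same strand, with "-" or an empty string standing for an empty allele. Chromosomal rearrangements cannot be detected this way and are out of scope.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Roughly classify a small variant from its reference and alternative alleles.

    Assumption: '-' (or '') denotes an empty allele, as in many mutation notations.
    """
    ref = ref.replace("-", "").upper()
    alt = alt.replace("-", "").upper()
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        # Same length: count substituted positions.
        diffs = sum(r != a for r, a in zip(ref, alt))
        if diffs == 1:
            return "SNP"
        if len(ref) == 2:
            return "dinucleotide"  # both adjacent bases changed
        return "substitution"      # longer multi-base substitution
    # Different lengths: one allele contained at an end of the other.
    if alt.startswith(ref) or alt.endswith(ref):
        return "insertion"
    if ref.startswith(alt) or ref.endswith(alt):
        return "deletion"
    return "indel"  # complex insertion/deletion


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("AC", "GT"), ("-", "T"), ("AT", "A"), ("AC", "GTT")]:
        print(ref, alt, classify_variant(ref, alt))
```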
Mutations: levels of biological data • DNA • RNA • Polypeptide • Protein structure • Protein function • Protein interaction pathways • Cellular dynamics • Tissue interactions • Organism (phenotype) • Population dynamics
Uses of sequence variations • Gene structure & function • Diagnosis of inherited disorders • Evolution & ecology • Transplantation • Genetic mapping • Tissue typing • Forensics • Epidemiology • Insurance evaluation? • Association studies • Carrier screening • Pharmacogenetics • Users: scientists, biomed students, healthcare professionals, "general public"
Data sources • Nucleotide sequences: EMBL/GenBank/DDBJ, non-redundant human sequence • Amino acid sequences: SWISS-PROT • SNPs • Central variation databases • Sequence alignments • Population studies • Direct submissions • Literature / publishers • Blood Cells, Molecules and Diseases • Single locus databases • HGVS
DNA Mutation Checker v.2 • Bio::LiveSeq • Bio::Variation • http://bio.perl.org/
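A mutation checker of this kind answers one core question: what does a base change do to the encoded protein? Bio::LiveSeq and Bio::Variation are BioPerl modules; the Python below does not reproduce their API. It is an independent, minimal sketch that assumes a spliced coding sequence starting at the A of ATG (position 1), a single-base substitution, and the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' = stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def point_mutation_effect(cds: str, pos: int, alt: str) -> dict:
    """Effect of substituting `alt` at 1-based coding position `pos` of `cds`."""
    cds = cds.upper()
    if not 1 <= pos <= len(cds):
        raise ValueError("position outside coding sequence")
    idx = pos - 1
    start = idx - idx % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("incomplete codon at end of sequence")
    offset = idx - start                       # 0, 1 or 2 within the codon
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "silent"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop loss"
    else:
        effect = "missense"
    return {"codon": start // 3 + 1, "ref_codon": ref_codon, "alt_codon": alt_codon,
            "ref_aa": ref_aa, "alt_aa": alt_aa, "effect": effect}


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"  # toy sequence: Met-Ala-Trp-Stop
    print(point_mutation_effect(cds, 8, "A"))
```

Insertions, deletions and splice effects would need the full transcript model, which is what the BioPerl modules keep track of.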
SNPs • dbSNP • main repository • HGVbase • clean subset • TSC (The SNP Consortium) • verified SNPs • allele frequency project • National SNP projects • Japan, China, ...
SNPs • dbSNP #29 • 2,673,925 (414,853 masked) • HGVbase #13 • 1,451,426 • TSC #10 • 1,389,655 • 1,062,212 mapped
HGVbase • Human Genome Variation database http://hgvbase.cgb.ki.se/ • formerly HGBASE • Three-party collaboration between Tony Brookes (KI), Heikki Lehväslaiho (EBI) and Peer Bork (EMBL) • text and homology searches • Distributions: SQL dump, XML, flat file, FASTA
SNP synchronization • Ensembl • dbSNP • HGVbase
HGBASE update • Assays • Flanking sequence retrieval • Effects on predicted genes • Chromosomal locations • Similarity scoring • Haplotypes • WOW extensions http://hgbase.cgb.ki.se/
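Flanking sequence retrieval, one of the update items above, amounts to slicing a window around the variant position in the reference sequence. The sketch below is hypothetical (function names and the default window width of 25 bases are assumptions, not HGBASE settings). It also renders the result in the common left[ref/alt]right bracket notation.

```python
def flanking_sequence(seq: str, pos: int, width: int = 25) -> tuple:
    """Return (left flank, reference base, right flank) around 1-based `pos`.

    Flanks are clipped at the sequence ends, so they can be shorter than `width`.
    """
    if not 1 <= pos <= len(seq):
        raise ValueError("position outside sequence")
    idx = pos - 1
    left = seq[max(0, idx - width):idx]
    right = seq[idx + 1:idx + 1 + width]
    return left, seq[idx], right


def bracket_notation(seq: str, pos: int, alt: str, width: int = 25) -> str:
    """Format a SNP with its flanks as left[ref/alt]right."""
    left, ref, right = flanking_sequence(seq, pos, width)
    return f"{left}[{ref}/{alt}]{right}"


if __name__ == "__main__":
    print(bracket_notation("ACGTACGTAC", 5, "G", width=3))
```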
Mutation numbering options • Reference sequence: cDNA DB entry, gDNA DB entry or genomic gene sequence • Numbering schema: coding region starts at +1; the preceding base is -1
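In the HGVS coding DNA convention, +1 is the A of the ATG start codon, -1 is the base immediately 5' of it, and there is no position 0. Converting between that numbering and a plain 0-based index in a reference entry therefore needs a one-base correction on the positive side only. A minimal sketch, assuming a sequence with no introns (e.g. a cDNA entry) and a known 0-based index `cds_start` of the A of ATG; it keeps counting positively past the stop codon and does not implement the HGVS `*` notation for the 3' UTR.

```python
def index_to_coding_number(index: int, cds_start: int) -> int:
    """Map a 0-based sequence index to coding numbering (no position 0)."""
    offset = index - cds_start
    return offset + 1 if offset >= 0 else offset


def coding_number_to_index(number: int, cds_start: int) -> int:
    """Inverse mapping: coding number (+1, -1, ...) to a 0-based index."""
    if number == 0:
        raise ValueError("there is no position 0 in this numbering")
    return cds_start + number - 1 if number > 0 else cds_start + number


if __name__ == "__main__":
    cds_start = 10  # hypothetical: the A of ATG is at index 10
    for i in (8, 9, 10, 11):
        print(i, index_to_coding_number(i, cds_start))
```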
HGVS • Human Genome Variation Society • http://www.hgvs.org/ • formerly HUGO MDI (Mutation Database Initiative) • Society journal: Human Mutation • “The Society aims to foster discovery and characterization of genomic variations including population distribution and phenotypic associations. We will promote collection, documentation and free distribution of genomic variation information and associated clinical variations and endeavor to foster the development of the necessary methodology and informatics.”
WOW: WayStation, Office, Warehouse • A human sequence variation database emphasizing data quality and a broad spectrum of data sources • Jamie Cuticchia, Dick Cotton, Heikki Lehväslaiho, Tony Brookes
WOWarehouse plans • Expansion of the HGBASE design • Use of the Ensembl framework • Data flow from WayStation • Novel mutations • Direct parsing • Existing resources (SRS) • Need to get the data in quickly • Haplotype & genotype descriptions • Phenotype description
WOW structure (diagram) • Components: Submitter, WayStation, Editor, Warehouse, Interfaces • Sources: LSDBs (locus-specific databases), PubMed, Human Mutation, other sources • Flows: submission, peer review, ID assignment, updates, downloads, correction requests
Reference Sequence • Strive to use genomic coordinates • Use Ensembl to visualise all variants in genomic context • Ensembl is now using NCBI genome builds => only one, up-to-date reference sequence • Easy way to transform gene coordinates into genomic coordinates
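Transforming gene coordinates into genomic coordinates means walking the exons of the transcript in order until the requested position falls inside one. Ensembl's own tools perform this mapping against their gene models; the function below is an independent, minimal sketch, assuming exons are given as 1-based inclusive genomic (start, end) pairs listed in transcript order (5' to 3' of the transcript) and strand encoded as 1 or -1.

```python
def transcript_to_genomic(tpos: int, exons: list, strand: int) -> int:
    """Map a 1-based transcript position to a 1-based genomic coordinate.

    exons: (start, end) genomic pairs, 1-based inclusive, start <= end,
           listed in transcript order (for strand -1, highest coordinates first).
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if tpos < 1:
        raise ValueError("transcript positions start at 1")
    remaining = tpos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            # Forward strand counts up from the exon start,
            # reverse strand counts down from the exon end.
            return start + remaining - 1 if strand == 1 else end - remaining + 1
        remaining -= length
    raise ValueError("position beyond end of transcript")


if __name__ == "__main__":
    exons = [(100, 109), (200, 219)]  # hypothetical two-exon gene, + strand
    print([transcript_to_genomic(p, exons, 1) for p in (1, 10, 11, 30)])
```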
Haplotype representation • Haplotype = list of Marker/AlleleIDs & HaplotypeIDs. • No ordering of IDs in Haplotype definition – taken care of by Marker definition. • No reference haplotypes • Genotype: >2 Haplotypes
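The representation described above can be sketched as a few immutable records. This is a hypothetical model, not the HGVbase schema: a haplotype holds unordered sets of allele members and nested haplotype members, carries no ordering of its own, and has no reference haplotype; order is recovered from the marker definitions (here assumed to be a chromosome name plus a genomic position).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    marker_id: str
    chrom: str
    position: int  # genomic coordinate; the only source of allele ordering


@dataclass(frozen=True)
class Allele:
    allele_id: str
    marker: Marker
    base: str


@dataclass(frozen=True)
class Haplotype:
    haplotype_id: str
    alleles: frozenset = frozenset()         # unordered Allele members
    sub_haplotypes: frozenset = frozenset()  # unordered nested Haplotype members

    def all_alleles(self) -> set:
        """Alleles of this haplotype, including those of nested haplotypes."""
        result = set(self.alleles)
        for sub in self.sub_haplotypes:
            result |= sub.all_alleles()
        return result

    def ordered_alleles(self) -> list:
        """Alleles sorted by their markers' chromosome and position."""
        return sorted(self.all_alleles(),
                      key=lambda a: (a.marker.chrom, a.marker.position))


@dataclass(frozen=True)
class Genotype:
    genotype_id: str
    haplotypes: tuple  # the haplotypes that make up this genotype
```

Keeping the haplotype unordered means a marker's position can be corrected once, in the marker record, without touching any haplotype that uses it.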
Haplotypes • Chr21, 6 chromosomes • David Cox (Patil et al., Science) • whole human genome coming • Chr22, >200 individuals • Ian Dunham, in preparation • Haplotype Blocks! • HapMap (Eric Lander, NIH)
Phenotype • Pragmatic approach! • Ideas: • Based on extended GO terminology developed at Jackson Laboratory • Phenotype = modifier + traits • OMIM compatible • US NLM anatomy vocabulary subset?
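The "Phenotype = modifier + traits" idea can be sketched as a small record that keeps the modifier separate from the trait terms, so the traits can later be mapped onto a controlled vocabulary. The field names and the example terms below are hypothetical illustrations, not terms from the Jackson Laboratory vocabulary or OMIM.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Phenotype:
    traits: Tuple[str, ...]          # trait terms, ideally from a controlled vocabulary
    modifier: Optional[str] = None   # optional qualifier, e.g. onset or severity

    def describe(self) -> str:
        """Human-readable form: modifier (if any) followed by the traits."""
        body = ", ".join(self.traits)
        return f"{self.modifier} {body}" if self.modifier else body


if __name__ == "__main__":
    print(Phenotype(("cataract",), "early-onset").describe())
```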