Summarise the online resources you would use when asked to establish an assay to confirm the presence of a specific missense mutation found by a research group.
Richard Barber, Richard.Barber@bwhct.nhs.uk
December 20th
Overview
• Find the gDNA sequence and define the exon boundaries, reading frame and known variation in the gene
• Use correct HGVS numbering of nucleotides and amino acids
• Determine the testing method and design suitable primers
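Once the exon boundaries are known, locating a coding position on the genomic sequence is simple arithmetic over the coding exons. The Python sketch below assumes you have already extracted, from RefSeq or Ensembl, the coding part of each exon as 1-based, inclusive genomic (start, end) pairs listed in transcript order. The helper name `c_to_genomic` and the coordinates in the example are hypothetical, and only positions inside the CDS are handled, not intronic (e.g. c.123+5) or UTR (e.g. c.-10) positions.

```python
from typing import List, Tuple

# Coding part of one exon: (start, end), 1-based, inclusive, genomic coordinates.
Segment = Tuple[int, int]


def c_to_genomic(c_pos: int, cds_segments: List[Segment], strand: str = "+") -> Tuple[int, int]:
    """Map HGVS coding position c.<c_pos> to (coding segment number, genomic coordinate).

    Hypothetical helper: cds_segments must be listed in transcript order,
    i.e. in descending genomic order for a gene on the minus strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if c_pos < 1:
        raise ValueError("only CDS positions (c.1 = A of ATG onwards) are handled")
    remaining = c_pos
    for number, (start, end) in enumerate(cds_segments, start=1):
        length = end - start + 1
        if remaining <= length:
            # Count from the 5' end of this segment in transcript orientation.
            if strand == "+":
                return number, start + remaining - 1
            return number, end - remaining + 1
        remaining -= length
    raise ValueError(f"c.{c_pos} lies beyond the end of the CDS")


# Hypothetical two-segment CDS on the plus strand: 10 bp, then 20 bp.
example = [(1000, 1009), (2000, 2019)]
print(c_to_genomic(11, example))  # -> (2, 2000), first base of the second segment
```

Working from coding segments rather than whole exons keeps c.1 anchored at the ATG; UTR sequence in the first and last exons has to be trimmed off before the segments are passed in.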
What Gene?
• SDHC
• OMIM *602413 (http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM&itool=toolbar)
• Part of Complex II, an important enzyme complex in both the tricarboxylic acid cycle and the aerobic respiratory chains of mitochondria
• Complex II has four subunits: SDHA, SDHB, SDHC and SDHD
• Chromosomal location: 1q21
• Mutations cause familial and sporadic paraganglioma
• Paraganglioma is characterized by the development of benign, vascularized tumours in the head and neck, with the carotid body as the major tumour site
NCBI’s GenBank
• GenBank is a collection of publicly available, annotated nucleotide sequences, including mRNA sequences, segments of genomic DNA and ribosomal RNA gene clusters.
• GenBank is specifically intended to be an archive of primary sequence data.
• The sequencing must have been conducted by the submitter. NCBI does some quality-control checks and will notify a submitter if something appears amiss, but it does not curate the data; the author has the final say on the sequence and annotation placed in the GenBank record.
• Authors are encouraged to update their records with new sequence or annotation data, but in practice records are seldom updated.
NCBI’s RefSeq
• A curated collection of DNA, RNA and protein sequences built by NCBI. Unlike GenBank, RefSeq provides only one record for each natural biological molecule, for major organisms ranging from viruses to eukaryotes.
• For each model organism, RefSeq aims to provide linked records for the genomic DNA, the gene transcripts and the proteins arising from those transcripts.
• RefSeq is limited to major organisms for which sufficient data are available, while GenBank includes sequences for any organism submitted.
• To produce RefSeq records, NCBI culls the best available information on each molecule and updates the records as more information emerges.
• A commonly used analogy: if GenBank is akin to the primary research literature, RefSeq is akin to the review literature.
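RefSeq records can also be retrieved programmatically through NCBI's E-utilities. The sketch below builds an `efetch` request for a nucleotide record in plain-text FASTA and parses the single record that comes back. The helper names `efetch_url`, `parse_fasta` and `fetch_fasta` are hypothetical, the code assumes one accession per request, and it does not add the `email`/`tool` parameters or the request-rate limiting that NCBI asks regular users to observe.

```python
from typing import Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore") -> str:
    """Build an E-utilities efetch URL returning one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input is not FASTA")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    return lines[0][1:].strip(), "".join(lines[1:]).upper()


def fetch_fasta(accession: str) -> Tuple[str, str]:
    """Download and parse one record (requires network access)."""
    with urlopen(efetch_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

For example, `fetch_fasta("NM_003001")` should return the SDHC mRNA RefSeq record; check the version suffix (`.N`) against the one the research group used. The FASTA covers the whole mRNA, so c.1 is not position 1: the CDS start has to be taken from the record's annotation (e.g. `rettype=gb`) before numbering.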
Ensembl (http://www.ensembl.org/index.html) is a joint project between the EMBL European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI) to develop a software system that produces and maintains automatic annotation of selected eukaryotic genomes.
The user can, to some degree, define how the gDNA, cDNA or protein sequence is annotated. Polymorphisms are linked to dbSNP.
Ensembl
• Ensembl is designed to give a more visual depiction of sequence information, is highly annotated, and its data can be exported
• Uses data from dbSNP to highlight polymorphisms
• Can be used with microarray data to show the positions of deleted/duplicated clones
• The data are updated every 2 months, so be aware of the date of the latest build
• A standard interface makes it easy for others to build custom applications on top of Ensembl data
• Online tutorials are available: http://www.ensembl.org/info/using/website/tutorials/index.html
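Ensembl data can also be exported programmatically. The slide predates it, but Ensembl now offers a REST API; the sketch below swaps that in, using the `lookup/symbol` endpoint with `expand=1` to pull a gene's transcripts and return the exon coordinates of the transcript Ensembl flags as canonical. The helper names are hypothetical, and the JSON field names used here (`Transcript`, `Exon`, `is_canonical`, `strand`) are assumptions to verify against the current REST documentation. The canonical transcript may not be the one the research group numbered against.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SERVER = "https://rest.ensembl.org"


def lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """URL for an Ensembl REST lookup of a gene symbol, expanded to transcripts and exons."""
    return f"{SERVER}/lookup/symbol/{species}/{symbol}?{urlencode({'expand': 1})}"


def fetch_gene(symbol: str) -> Dict[str, Any]:
    """Download the gene record as JSON (requires network access)."""
    request = Request(lookup_url(symbol), headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def canonical_exons(gene: Dict[str, Any]) -> Tuple[str, List[Tuple[int, int]]]:
    """Return the canonical transcript ID and its exons in transcript order.

    Exons are sorted by genomic position and reversed for minus-strand
    transcripts, so the first tuple is always the 5'-most exon.
    """
    for transcript in gene.get("Transcript", []):
        if transcript.get("is_canonical") == 1:
            exons = sorted((exon["start"], exon["end"]) for exon in transcript["Exon"])
            if transcript["strand"] == -1:
                exons.reverse()
            return transcript["id"], exons
    raise LookupError("no transcript is flagged as canonical")
```

Calling `canonical_exons(fetch_gene("SDHC"))` would list whole exons, including UTR sequence, so the coding boundaries still need the translation start and end before c. numbering can be applied.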
Reference Sequence Nomenclature
• Compare your reference sequence with the research group's
• Are you using the same accession numbers?
• Is HGVS nomenclature being used? The A of the ATG start codon is c.1 and the initiating methionine is p.1
• See www.hgvs.org for guidelines
• There are usually historical reasons why numbering differs
• Are there alternative exons with alternative numbering?
• Does the literature agree with HGVS?
• Are commonly known names already in existence? e.g. ΔF508 (p.Phe508del in HGVS nomenclature)
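Because the A of ATG is c.1 and the initiating methionine is p.1, codon n covers c.(3n-2) to c.3n. The sketch below applies that arithmetic to name a single-nucleotide substitution at both DNA and predicted protein level, a quick way to check that the research group's description matches your reference sequence. It assumes `cds` is the coding sequence only, starting at the ATG, and uses the standard genetic code; `describe_substitution` is a hypothetical helper that handles substitutions only, not deletions such as p.Phe508del.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln", "E": "Glu",
    "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe",
    "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val", "*": "Ter",
}


def describe_substitution(cds: str, c_pos: int, ref: str, alt: str) -> str:
    """Return 'c.<pos><ref>><alt> p.(<ref aa><codon><alt aa>)' for a single-base change."""
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if not cds.startswith("ATG"):
        raise ValueError("cds must start with the ATG start codon (c.1)")
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base A, C, G or T")
    if c_pos < 1:
        raise ValueError("c_pos must be 1 or greater")
    codon_number = (c_pos - 1) // 3 + 1   # the p. position
    codon_start = (codon_number - 1) * 3  # 0-based index of the codon's first base
    if codon_start + 3 > len(cds):
        raise ValueError(f"c.{c_pos} is outside the coding sequence")
    if cds[c_pos - 1] != ref:
        raise ValueError(f"reference mismatch: c.{c_pos} is {cds[c_pos - 1]}, not {ref}")
    ref_codon = cds[codon_start:codon_start + 3]
    offset = (c_pos - 1) % 3              # position within the codon (0, 1 or 2)
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    protein = f"{ref_aa}{codon_number}=" if ref_aa == alt_aa else f"{ref_aa}{codon_number}{alt_aa}"
    # Parentheses mark the protein change as predicted rather than shown experimentally.
    return f"c.{c_pos}{ref}>{alt} p.({protein})"


# Hypothetical three-codon CDS: ATG CGT TAA (Met-Arg-Ter).
print(describe_substitution("ATGCGTTAA", 4, "C", "T"))  # -> c.4C>T p.(Arg2Cys)
```

The reference-base check is deliberate: if the base at the quoted c. position does not match, the two groups are most likely numbering against different transcripts or a legacy (non-HGVS) scheme.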
Mutation-specific databases
• http://chromium.liacs.nl/lovd_sdh/variants.php?select_db=SDHC&action=view_unique
• TCA Cycle Gene Mutation Database (formerly SDH Complex database)
To Sequence or Not to Sequence
• Sequencing is still the gold standard
• Prices are going down as throughput goes up
• You will get a definite answer
• What sample type: blood, tumour or prenatal?
• How many samples, and how urgent?
• Other options include:
  • Pyrosequencing
  • dHPLC
  • MALDI-TOF
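Whatever the platform, the last step is reading the genotype at the mutation position. Sequence-analysis software commonly reports a heterozygous position with an IUPAC ambiguity code (for example R for A/G). The sketch below is a hypothetical helper, not part of any named package, that interprets a single base call against the expected reference and variant alleles; it assumes the call has already been extracted at the correct c. position and is no substitute for inspecting the trace itself.

```python
# IUPAC nucleotide codes for one or two alleles (three-allele codes are not handled).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def interpret_call(observed: str, ref: str, alt: str) -> str:
    """Interpret one base call at the mutation position against the expected alleles."""
    alleles = IUPAC.get(observed.upper())
    if alleles is None:
        return "uninterpretable"  # e.g. N or a three-allele code: review the trace
    ref, alt = ref.upper(), alt.upper()
    if alleles == {ref}:
        return "homozygous reference"
    if alleles == {alt}:
        return "homozygous variant"
    if alleles == {ref, alt}:
        return "heterozygous"
    return "inconsistent with expected alleles"


print(interpret_call("R", "A", "G"))  # -> heterozygous
```

An apparently homozygous variant where a heterozygous germline missense mutation is expected is a cue to look for allele dropout under a primer before reporting.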
Primers
• Are you looking for this one missense mutation, or do you need to check the entire exon?
• Could the exon harbour other relevant mutations?
• Are there any known pathogenic intronic mutations that the PCR product needs to include?
• Primer design programs such as:
  • http://primer3.sourceforge.net/
  • www.invitrogen.com/oligos
• You could use the research group's primers, but check them against your reference sequence
• SNP-check primers using the Manchester SNPCheck software to avoid allele dropout: http://ngrl.man.ac.uk/snpcheck/index.html
• Positive control DNA from the research lab is vital
• Hopefully the result will be confirmed; if not, a fresh sample may be necessary. Beware of samples from research labs!
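Checking a primer pair against your own reference sequence can be partly automated. The sketch below, with hypothetical helper names and a toy sequence, finds each primer's binding site on the reference, confirms the mutation lies between the primers rather than under one, and lists any known SNP positions falling within a primer together with their distance from that primer's 3' end (mismatches nearest the 3' end are the most likely to cause allele dropout). It assumes 1-based positions on the forward-primer strand and searches only the supplied reference, so it replaces neither SNPCheck, which screens primers against SNP databases, nor a genome-wide specificity check.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def _unique_site(reference: str, site: str, name: str) -> int:
    """0-based index of the only exact match of `site` in `reference`."""
    index = reference.find(site)
    if index == -1:
        raise ValueError(f"{name} primer does not match the reference exactly")
    if reference.find(site, index + 1) != -1:
        raise ValueError(f"{name} primer matches the reference more than once")
    return index


def check_primer_pair(reference: str, forward: str, reverse: str,
                      variant_pos: int, snp_positions: List[int]) -> Dict[str, object]:
    """Check a primer pair against a reference; all positions are 1-based."""
    ref = reference.upper()
    f_start = _unique_site(ref, forward.upper(), "forward") + 1
    f_end = f_start + len(forward) - 1
    # The reverse primer binds the opposite strand, so search for its reverse complement.
    r_start = _unique_site(ref, revcomp(reverse.upper()), "reverse") + 1
    r_end = r_start + len(reverse) - 1
    if f_end >= r_start:
        raise ValueError("primers overlap or face away from each other")

    snp_warnings: List[Tuple[str, int, int]] = []
    for pos in snp_positions:
        if f_start <= pos <= f_end:
            snp_warnings.append(("forward", pos, f_end - pos))   # forward 3' end is f_end
        elif r_start <= pos <= r_end:
            snp_warnings.append(("reverse", pos, pos - r_start))  # reverse 3' end is r_start

    return {
        "amplicon_length": r_end - f_start + 1,
        "variant_covered": f_end < variant_pos < r_start,
        "snp_warnings": snp_warnings,
    }
```

With the toy reference `GGACGTTGCACCCCCACCCCTGCATGGAGG`, forward primer `ACGTTGCA`, reverse primer `TCCATGCA` and a mutation at position 16, the function reports a 26 bp amplicon with the mutation covered, and a SNP at position 22 is flagged as 1 base from the reverse primer's 3' end.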
• Entrez-NCBI Databases: search engine provided by the NCBI for obtaining data on DNA sequences, protein sequences and structures, genome/chromosome maps and related bibliographic information.
• GENATLAS: compiles the information relevant to the mapping efforts of the Human Genome Project. This information is collected from original articles in the literature or from the proceedings of the Human Gene Mapping and Single Chromosome Workshops.
• GenBank: National Institutes of Health genetic sequence database, a collection of publicly available DNA sequences maintained by the NCBI.
• GeneCards: an electronic encyclopedia integrating information about genes, their products and biomedical applications, from the Weizmann Institute of Science Genome and Bioinformatics.
• GeneMap99: "A new gene map of the human genome" from the National Center for Biotechnology Information.
• GeneSNPs: web resource integrating gene, sequence and polymorphism data.
• GENLINK: provides linkage mapping information and software tools that facilitate the integration of physical and genetic linkage data to produce unified maps of the human genome.
• HGBASE (Human Genic Bi-Allelic SEquences): a database of intra-genic sequence polymorphism.
• HGVbase (the Human Genome Variation Database): this database seeks to provide an accurate and comprehensive catalogue of normal human gene and genome variation, useful as a research tool to help define the genetic component of human phenotypic variation.
• Human Genome Browser Gateway: the browser, from the University of California, Santa Cruz, stacks annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. The user can look at a whole chromosome, open a specific cytogenetic band to see a positionally mapped disease-gene candidate, or zoom in to a particular gene.
• Human SNP Database: from the Whitehead Institute for Biomedical Research/MIT Center for Genome Research.
• LocusLink: provides a single query interface to information about genetic loci, from the NCBI.
• MITOMAP: Human Mitochondrial Genome Database.
• Nuclear Protein Database (NPD): the Nuclear Protein Database is a searchable database of information on
• Online Searchable Directory of Genetic Resources
• PUBGENE: provides information on gene and protein relationships from the literature, gene relationships from your gene expression experiments, pointers to pathway information and other metadata to maximize analysis throughput, and helps build sequence homology networks and relate them to the literature.
• Rockefeller University Laboratory of Statistical Genetics Database of Genetic Analysis Software: contains mainly computer software for genetic linkage analysis, marker mapping and pedigree drawing.
• The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation: from the National Library of Medicine.