
MUTATION NOMENCLATURE

MUTATION NOMENCLATURE. Summarise the current recommendations for mutation nomenclature at the DNA and protein level. Include a range of examples (missense, frameshift, whole exon deln etc.). Why is it important to have a uniform approach to nomenclature? ELAINE WHITFIELD.


Presentation Transcript


  1. MUTATION NOMENCLATURE • Summarise the current recommendations for mutation nomenclature at the DNA and protein level. • Include a range of examples (missense, frameshift, whole exon deln etc.). • Why is it important to have a uniform approach to nomenclature? • ELAINE WHITFIELD

  2. General points on nomenclature • Two papers published in 1993 initiated the development of a uniform and unequivocal description of sequence variation at the DNA and protein level • The current rules were established in 2000 by den Dunnen and Antonarakis to include more complex variations • Beaudet & Tsui, Hum Mutat. 1993;2(4):245-8 • Beutler, Am J Hum Genet. 1993 Sep;53(3):783-5 • den Dunnen & Antonarakis, Hum Mutat. 2000;15:7-12

  3. Terminology • In some disciplines the term "mutation" is used to indicate "a change", while in other disciplines it is used to indicate "a disease-causing change". • Similarly, the term "polymorphism" is used to indicate either "a non-disease-causing change" or "a change found at a frequency of 1% or higher in the population". • To prevent confusion, neutral terms like "sequence variant", "alteration" and "allelic variant" may be used instead of the terms mutation and polymorphism (including SNP or Single Nucleotide Polymorphism). • The term "mutation" has also developed a negative connotation • Human Mutation, Vol. 19(1), 2002

  4. Recommendations • The basic recommendation is to use unequivocal descriptions and systematic names to describe each sequence variation. • All variants should be described at the most basic level, i.e. the DNA level. • Descriptions should always be in relation to a reference sequence, either a genomic or a coding DNA reference sequence. • Although theoretically a genomic reference sequence seems best, in practice a coding DNA reference sequence is preferred because it overcomes difficult cases, including multiple transcription initiation sites (promoters), alternative splicing, the use of different poly-A addition signals, multiple translation initiation sites (ATG-codons) and the occurrence of length variations. • When the entire genomic sequence is not known, a cDNA reference sequence should be used.

  5. To avoid confusion, the description of a variant should be preceded by a letter indicating the type of reference sequence used. Several different reference sequences can be used: • "c." for a coding DNA sequence (like c.76A>T) • "g." for a genomic sequence (like g.476A>T) • "m." for a mitochondrial sequence (like m.8993T>C) • "r." for an RNA sequence (like r.76a>u) • "p." for a protein sequence (like p.Lys76Asn) • When describing genes / proteins, only official HGNC (HUGO Gene Nomenclature Committee) gene symbols should be used (www.genenames.org) • The DNA reference sequence used should preferably be from the RefSeq database, listing both database accession and version number (like NM_004006.2) www.ncbi.nlm.nih.gov/RefSeq/
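
The prefix rule above can be sketched as a small lookup. This is only an illustration: the function name `reference_type` is hypothetical, and the handling of an "accession.version:" part assumes the colon-separated form seen in descriptions such as NM_004006.1:c.123_678.

```python
import re

# Prefixes listed on the slide and the reference sequence type each one implies.
PREFIXES = {
    "c": "coding DNA",
    "g": "genomic",
    "m": "mitochondrial",
    "r": "RNA",
    "p": "protein",
}


def reference_type(description: str) -> str:
    """Return the reference sequence type implied by a description's prefix.

    An optional 'accession.version:' part (e.g. 'NM_004006.2:') is skipped.
    """
    variant = description.split(":")[-1]
    match = re.match(r"([cgmrp])\.", variant)
    if match is None:
        raise ValueError(f"no recognised prefix in {description!r}")
    return PREFIXES[match.group(1)]


print(reference_type("NM_004006.2:c.76A>T"))
print(reference_type("p.Lys76Asn"))
```

A description without one of the five prefixes raises an error, which mirrors the recommendation that the reference type must always be stated.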

  6. DNA level: in capitals, starting with a number referring to the first nucleotide affected (like c.76A>T or g.476A>T) • RNA level: in lower case, starting with a number referring to the first nucleotide affected (like r.76a>u) • Protein level: in capitals, starting with the letters referring to the first amino acid affected (like p.Lys76Asn)

  7. Coding DNA reference sequence • Nucleotide numbering: • there is no nucleotide 0 • nucleotide 1 is the A of the ATG translation initiation codon • the nucleotide 5' of the ATG translation initiation codon is -1, the previous -2, etc., e.g. in the 5' UTR: c.-12G>A • the nucleotide 3' of the translation stop codon is *1, the next *2, etc., e.g. in the 3' UTR: c.*70T>A
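
As a rough sketch of this numbering scheme, the function below converts a 0-based index in a transcript sequence into a coding DNA position. The function name and its parameters (`atg_index`, `stop_end_index`) are assumptions made for the example, not part of the recommendations.

```python
def cdna_position(index: int, atg_index: int, stop_end_index: int) -> str:
    """Label a 0-based transcript index in coding DNA ("c.") numbering.

    atg_index      -- 0-based index of the A of the ATG initiation codon
    stop_end_index -- 0-based index of the last nucleotide of the stop codon
    """
    if index < atg_index:
        # 5' of the ATG: -1, -2, ... (there is no nucleotide 0)
        return str(index - atg_index)
    if index <= stop_end_index:
        # coding region: the A of the ATG is nucleotide 1
        return str(index - atg_index + 1)
    # 3' of the stop codon: *1, *2, ...
    return f"*{index - stop_end_index}"


# Toy transcript: ATG starts at index 10, stop codon ends at index 21.
for i in (8, 9, 10, 21, 22):
    print(i, "->", "c." + cdna_position(i, 10, 21))
```

Because the label jumps from -1 straight to 1, the position 0 can never be produced.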

  8. Coding DNA reference sequence • Intronic nucleotides: • beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, like c.88+2T>G • end of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, like c.89-1G>T • in the middle of the intron, numbering changes from "c.77+.." to "c.78-.."; for introns with an odd number of nucleotides the central nucleotide is the last described with a "+"
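
The switch from "+" to "-" numbering in the middle of an intron can be illustrated as follows. This is a minimal sketch; the function name is hypothetical and it assumes the flanking exon positions are ordinary positive coding positions.

```python
def intron_labels(last_exon_nt: int, intron_length: int) -> list[str]:
    """Return the coding DNA label of every nucleotide in an intron, 5' to 3'.

    last_exon_nt is the number of the last nucleotide of the preceding exon.
    """
    # For an odd length, the central nucleotide is the last one given a "+".
    plus_count = (intron_length + 1) // 2
    labels = []
    for i in range(1, intron_length + 1):
        if i <= plus_count:
            labels.append(f"{last_exon_nt}+{i}")
        else:
            # count backwards from the first nucleotide of the following exon
            labels.append(f"{last_exon_nt + 1}-{intron_length - i + 1}")
    return labels


print(intron_labels(77, 5))
```

For a five-nucleotide intron after nucleotide 77, the third (central) nucleotide is 77+3 and the next one is 78-2.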

  9. genomic Reference Sequence • nucleotide numbering is purely arbitrary and starts with 1 at the first nucleotide of the database reference file • no +, - or other signs are used • the sequence should include all nucleotides covering the sequence (gene) of interest and should start well 5' of the promoter of a gene • when the complete genomic sequence is not known, a coding DNA reference sequence should be used

  10. Specific Changes • ">" indicates a substitution at DNA level (like c.76A>T) • "_" (underscore) indicates a range of affected residues, separating the first and last residue affected (like c.76_78delACT) • "del" indicates a deletion (like c.76delA) • "dup" indicates a duplication (like c.76dupA); duplicating insertions are described as duplications, not as insertions, e.g. the change from ACTTTGTGCC to ACTTTGTGGCC is described as c.8dupG (not as c.8_9insG) • "ins" indicates an insertion (like c.76_77insG) • "inv" indicates an inversion (like c.76_83inv)
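
The "duplication, not insertion" rule can be sketched in code. The function below is an illustration under simplifying assumptions: the name `describe_insertion` is hypothetical, the reference is treated as a coding DNA sequence numbered from 1, and no attempt is made to normalise the position inside longer repeats.

```python
def describe_insertion(ref: str, after: int, inserted: str) -> str:
    """Describe `inserted` placed between positions `after` and `after + 1`.

    Positions are 1-based on the reference sequence `ref`.
    """
    n = len(inserted)
    if after >= n and ref[after - n:after] == inserted:
        # the inserted bases copy the bases immediately 5' of the insertion point
        start, end = after - n + 1, after
    elif ref[after:after + n] == inserted:
        # the inserted bases copy the bases immediately 3' of the insertion point
        start, end = after + 1, after + n
    else:
        return f"c.{after}_{after + 1}ins{inserted}"
    span = str(start) if start == end else f"{start}_{end}"
    return f"c.{span}dup{inserted}"


print(describe_insertion("ACTTTGTGCC", 8, "G"))
print(describe_insertion("ACTTTGTGCC", 2, "G"))
```

With the slide's example, the extra G after position 8 is reported as a duplication, while a G placed between positions 2 and 3 copies neither neighbour and stays an insertion.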

  11. Specific Changes - complex • "con" indicates a conversion (like c.123_678conNM_004006.1:c.123_678) • "[]" indicates an allele (like c.[76A>T]) • "()" is used when the exact position of a change is not known; the range of the uncertainty is described as precisely as possible and listed between parentheses (like c.(67_70)insG) • Whole exon deletions: c.781-?_1392+?del = exon 3 to 6 deletion, breakpoints not sequenced; c.13-23_301-143del = exon 2 to 4 deletion, breakpoints sequenced • Recessive disease: the two alleles are listed between square brackets with a "+" between them, like c.[123A>G]+[456C>T] • More than one change per allele: c.[123A>G; 456C>T]
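
A short sketch of how the allele and whole-exon-deletion notations above could be assembled from their parts. Both function names are hypothetical, and the input format (lists of change strings without the "c." prefix) is an assumption made for the example.

```python
def format_alleles(alleles: list[list[str]]) -> str:
    """Join the changes on each allele with '; ' and the alleles with '+'."""
    parts = ["[" + "; ".join(changes) + "]" for changes in alleles]
    return "c." + "+".join(parts)


def exon_deletion_unknown_breakpoints(first_nt: int, last_nt: int) -> str:
    """Describe a deletion of whole exons whose intronic breakpoints are unknown.

    first_nt / last_nt are the first and last coding nucleotides deleted.
    """
    return f"c.{first_nt}-?_{last_nt}+?del"


print(format_alleles([["123A>G"], ["456C>T"]]))   # two alleles, one change each
print(format_alleles([["123A>G", "456C>T"]]))     # one allele, two changes
print(exon_deletion_unknown_breakpoints(781, 1392))
```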

  12. RNA level • Designations at RNA-level, similar to those at protein level, describe the consequence and not the nature of the mutation. Sequence changes at RNA level are basically described as those at the DNA level with the following modifications/additions; • an "r." is used to indicate that a change is described at RNA-level • nucleotides are designated by the bases (in lower case); a (adenine), c (cytosine), g (guanine) and u (uracil); • r.78u>a denotes that at nucleotide 78 a U is changed to an A
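
The lower-case and uracil conventions can be illustrated by rewriting a coding DNA description in RNA form. This sketch uses a hypothetical function name and assumes the change has no effect on splicing or transcript processing, which in practice has to be shown experimentally; it also does not handle intronic positions, which have no RNA-level counterpart.

```python
def predicted_rna_description(dna_description: str) -> str:
    """Rewrite a 'c.' description with RNA-level bases (lower case, u for T)."""
    if not dna_description.startswith("c."):
        raise ValueError("expected a coding DNA ('c.') description")
    body = dna_description[2:]
    # Only upper-case bases are translated, so keywords such as
    # 'del', 'dup' and 'ins' are left unchanged.
    return "r." + body.translate(str.maketrans("ACGT", "acgu"))


print(predicted_rna_description("c.76A>T"))
```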

  13. Protein Level • Designations at protein level describe the consequence and not the nature of the mutation. The protein reference sequences should represent the primary translation product, not a processed mature protein. Sequence changes at protein level are basically described as those at the DNA level with the following modifications/additions: • the three-letter amino acid code is preferred, with "X" designating a translation termination codon • the translation initiator Methionine is numbered as +1

  14. Substitutions • missense changes · p.Trp26Cys denotes that amino acid Tryptophan-26 (Trp, W) is changed to a Cysteine (Cys) • nonsense changes · p.Trp26X denotes that amino acid Tryptophan-26 (Trp, W) is changed to a stop codon (X) • initiating methionine (Met1) · p.Met1? denotes that amino acid Methionine-1 (Met, M) is changed and that it is unclear what the consequence of this change is. When experimental data show that no protein is made, the description p.0 is recommended
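
The substitution types above can be told apart from the description alone. The classifier below is a minimal sketch with a hypothetical function name; it recognises only the simple forms shown on the slide (three-letter codes, "X", "?" and p.0).

```python
import re

# reference amino acid, position, then a new amino acid, X (stop) or ? (unknown)
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|X|\?)$")


def classify_protein_substitution(description: str) -> str:
    """Classify a simple protein-level substitution description."""
    if description == "p.0":
        return "no protein produced"
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    reference, _position, new = match.groups()
    if new == "X":
        return "nonsense"
    if new == "?":
        return "unknown consequence"
    if new == reference:
        return "no amino acid change"
    return "missense"


for example in ("p.Trp26Cys", "p.Trp26X", "p.Met1?", "p.0"):
    print(example, "->", classify_protein_substitution(example))
```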

  15. Deletions and Duplications • deletions - are designated by "del" after the first and last amino acid(s) deleted; • · p.Lys2del in the sequence MKMGHQQQCC denotes a deletion of amino acid Lysine-2 (Lys, K) to MMGHQQQCC • · p.Cys28_Met30del denotes a deletion of three amino acids, from Cysteine-28 to Methionine-30 • · if a deletion creates a new amino acid at the deletion junction the change is described as an insertion/deletion, e.g. p.Cys28_Met30delinsTrp (see indels) • duplications - are designated by "dup" after the first and last amino acid affected by the duplication; • · p.Gly4_Gln6dup in the sequence MKMGHQQQCC denotes a duplication of amino acids Glycine-4 (Gly, G) to Glutamine-6 (Gln, Q) (i.e. MKMGHQGHQQQCC) • · duplicating insertions in single amino acid stretches (or short tandem repeats) are described as a duplication, e.g. a duplicating HQ insertion in the HQ-tandem repeat sequence of MKMGHQHQCC to MKMGHQHQHQCC is described as p.His7_Gln8dup (not p.Gln8_Cys9insHisGln)
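
The deletion and duplication examples above can be checked by applying each description to the example sequence. This is an illustrative sketch: the function name `apply_del_dup` is hypothetical, and it handles only plain "del" and "dup" descriptions written with three-letter codes.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

DEL_DUP = re.compile(
    r"^p\.([A-Z][a-z]{2})(\d+)(?:_([A-Z][a-z]{2})(\d+))?(del|dup)$"
)


def apply_del_dup(sequence: str, description: str) -> str:
    """Apply a protein-level 'del' or 'dup' to a one-letter sequence."""
    match = DEL_DUP.match(description)
    if match is None:
        raise ValueError(f"unsupported description: {description!r}")
    first_aa, first, last_aa, last, kind = match.groups()
    first = int(first)
    last = int(last) if last else first
    last_aa = last_aa or first_aa
    # The named residues must agree with the reference sequence.
    if (sequence[first - 1] != THREE_TO_ONE[first_aa]
            or sequence[last - 1] != THREE_TO_ONE[last_aa]):
        raise ValueError("description does not match the reference sequence")
    if kind == "del":
        return sequence[:first - 1] + sequence[last:]
    # dup: repeat the affected stretch directly after itself
    return sequence[:last] + sequence[first - 1:last] + sequence[last:]


print(apply_del_dup("MKMGHQQQCC", "p.Lys2del"))
print(apply_del_dup("MKMGHQQQCC", "p.Gly4_Gln6dup"))
print(apply_del_dup("MKMGHQHQCC", "p.His7_Gln8dup"))
```

Checking the named residues against the sequence catches descriptions written against the wrong reference, one of the errors a uniform nomenclature is meant to prevent.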

  16. Insertions & Ins/Dels • insertions - are designated by "ins" after the amino acids flanking the insertion site, followed by the amino acids inserted. Duplicating insertions should be described as duplications; • · p.Lys2_Met3insGlnSerLys denotes that the sequence GlnSerLys (QSK) was inserted between amino acids Lysine-2 (Lys, K) and Methionine-3 (Met, M), changing MKMGHQQQCC to MKQSKMGHQQQCC • · if an insertion creates a new amino acid at the insertion junction the change is described as an insertion/deletion, e.g. p.Cys28delinsTrpVal • insertion/deletions (indels) - are described as a deletion followed by an insertion after the amino acids affected; • · p.Cys28_Lys29delinsTrp denotes a 3 bp deletion affecting the codons for Cysteine-28 and Lysine-29, replacing them with a codon for Tryptophan • · p.Cys28delinsTrpVal denotes a 3 bp insertion in the codon for Cysteine-28, generating codons for Tryptophan (Trp, W) and Valine (Val, V)

  17. Frame shifting mutations • frame shifting mutations - are designated by "fs" after the amino acid(s) affected by the change. Descriptions use either a short ("fs" only) or a long ("fsX#") notation. In "fsX#", "X#" indicates at which codon position (here #) the shifted reading frame ends in a stop codon (X). The description should include the change occurring at the site of the frame shift. NOTE: the shifted reading frame is thus open for #-1 amino acids. • · p.Arg97ProfsX23 (short p.Arg97fs) denotes a frame shifting change with Arginine-97 as the first affected amino acid, changing into a Proline, and the shifted reading frame ending in a stop at codon position 23. • · p.Leu30_Cys42delinsSerfsX3 (short p.Leu30fs) denotes a frame shifting change that deletes amino acids Leucine-30 to Cysteine-42, replacing them with a Serine at the deletion junction, and the shifted reading frame ending in a stop at codon position 3
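
The relation between the long and short frameshift notations, and the "#-1" note, can be sketched as follows. The function name is hypothetical and the pattern covers only long-form descriptions ending in "fsX#", as in the two examples above.

```python
import re

# first affected amino acid with its position, anything in between, then fsX#
LONG_FRAMESHIFT = re.compile(r"^p\.(?P<first>[A-Z][a-z]{2}\d+).*fsX(?P<stop>\d+)$")


def summarise_frameshift(description: str) -> tuple[str, int]:
    """Return the short form and the number of amino acids the shifted frame stays open."""
    match = LONG_FRAMESHIFT.match(description)
    if match is None:
        raise ValueError(f"not a long-form frameshift: {description!r}")
    stop_position = int(match.group("stop"))
    # The stop is at codon position #, so the frame is open for # - 1 amino acids.
    return f"p.{match.group('first')}fs", stop_position - 1


print(summarise_frameshift("p.Arg97ProfsX23"))
print(summarise_frameshift("p.Leu30_Cys42delinsSerfsX3"))
```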

  18. Why is a uniform approach necessary? • To ensure efficient & accurate reporting • To ensure the increase in sequence variation detected is quality controlled, documented and stored correctly to enable future availability

  19. Refs • den Dunnen JT & Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: A discussion. Hum Mutat 15:7-12 • Horaitis O & Cotton RGH (2004) The Challenge of Documenting Mutation Across the Genome: the Human Genome Variation Society Approach. Hum Mutat 23:447-452 • den Dunnen JT & Paalman MH (2003) Standardizing mutation nomenclature: Why bother? Hum Mutat 22:181-182 • www.hgvs.org/mutnomen/ • http://www.hgvs.org/mutnomen/ESHG2007_W12_JdD.pdf (den Dunnen presentation given at the ESHG 2007 meeting) • http://leedsdna.info/HUGO/2004/Lecture_Notes
