Now you see it, now you don’t

Sneddon TP¹, Bruford EA¹, Eyre TA¹, Khodiyar VK¹, Lovering RC¹, Lush MJ¹, Sneddon KMB¹, Talbot Jr. CC², Wright MW¹ and Povey S¹

HUGO Gene Nomenclature Committee (HGNC), ¹Department of Biology, University College London, Wolfson House, 4 Stephenson Way, London, NW1 2HE, UK; ²The Johns Hopkins School of Medicine, Institute of Genetic Medicine, The Johns Hopkins University, Baltimore, MD, 21205-2196, USA.

Email: nome@galton.ucl.ac.uk
URL: http://www.gene.ucl.ac.uk/nomenclature/

The work of the HGNC is supported by NHGRI grant P41 HG003345, the UK Medical Research Council and the Wellcome Trust.
Introduction

The HGNC has to date approved over 22,000 unique symbols and names, the majority of which are for ‘genes’, i.e. genomic segments that are transcribed and translated into functional proteins. What happens when these genes, initially thought to be single copy in the human genome, turn out to vary in copy number between individuals? This is the case for an increasing number of genes, including members of the well-established amylase and defensin gene families [1-3]. It has yet to be determined to what extent these additional gene copies are functional, what effect they have on genome stability and disease susceptibility, and how they contribute to individual phenotypic differences. Whatever their impact, they will inevitably need to be discussed in the literature, preferably using a meaningful and systematic nomenclature that is globally accepted.
Examples of variable copy number genes

1. Tandem repeats

A gene may be organised in a tandem array and present in zero to multiple copies, e.g. the amylase genes. The cartoon below illustrates copy number variation of the tandemly duplicated GeneC in two different people.

[Figure: copy number variation of GeneC in two genomes]
Genome 1: GeneA GeneB GeneD (no tandem repeats)
Genome 1: GeneA GeneB GeneC GeneC GeneC GeneD (3x tandem repeats)
Genome 2: GeneA GeneB GeneC GeneD (1x tandem repeat)
Genome 2: GeneA GeneB GeneC GeneC GeneD (2x tandem repeats)
Amylase (AMY) genes

A ~100 kb repeat on 1p21.1, present in 0-8 copies per human diploid genome, contains the variable copy number AMY1A, AMY1B and AMYP1 genes. Individual haplotypes can be described by AMY2B-AMY2A-[AMY1A-AMY1B-AMYP1]n-AMY1C (where n=0-4) [1,4,5].

[Figure: BLAT analysis of the AMY1A NM_001008221 sequence (UCSC Genome Browser on Human May 2004 Assembly), with labels AMYP1, AMY2B, AMY2A, AMY1A, AMY1B and AMY1C and the ~100 kb tandemly repeated region marked.]

As shown above, the current human genome build includes one copy of the AMY2A, AMY2B, AMY1A, AMY1B, AMYP1 and AMY1C genes (n=1). The HGNC has a single entry and an approved symbol and name for each of these genes. Should we ‘tag’ genes that have been identified as variable copy number, e.g. AMY1A#, AMY1B# and AMYP1#?
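The haplotype notation above can be expanded mechanically, which may help when checking how a given value of n translates into per-gene copy numbers. The following is a minimal sketch only: the function names are hypothetical and are not part of any HGNC tool, and it assumes the simple model stated in the text (one repeat array per chromosome, n between 0 and 4).

```python
# Illustrative sketch; function names are hypothetical, not an HGNC API.
# Model taken from the text: AMY2B-AMY2A-[AMY1A-AMY1B-AMYP1]n-AMY1C, n = 0-4.

REPEAT_UNIT = ["AMY1A", "AMY1B", "AMYP1"]


def amylase_haplotype(n):
    """Return the gene order on one chromosome for n copies of the repeat."""
    if not 0 <= n <= 4:
        raise ValueError("n is expected to lie between 0 and 4")
    return ["AMY2B", "AMY2A"] + REPEAT_UNIT * n + ["AMY1C"]


def diploid_gene_count(gene, n_first, n_second):
    """Count copies of one gene across both haplotypes of a diploid genome."""
    first = amylase_haplotype(n_first)
    second = amylase_haplotype(n_second)
    return first.count(gene) + second.count(gene)


if __name__ == "__main__":
    # n = 1 corresponds to the arrangement in the current genome build.
    print("-".join(amylase_haplotype(1)))
    # Two haplotypes with n = 4 give the upper bound of 8 repeat copies.
    print(diploid_gene_count("AMY1A", 4, 4))
```

Under this model the genes outside the bracket (AMY2B, AMY2A, AMY1C) stay at two copies per diploid genome whatever the value of n, which is why only the bracketed genes would be candidates for a ‘#’ tag.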
2. Segmental duplication

A group of genes may be organised within a segmental duplication (duplicon) and present in two or more copies at a single locus or different loci and in various orientations in the human genome, e.g. the defensin beta genes. The cartoon below illustrates copy number variation of GeneA, GeneB and GeneC contained within a segmental duplication at two separate loci on two different chromosomes.

[Figure: Chromosome A carries two copies of the segmental duplication containing GeneA, GeneB and GeneC, together with GeneD and GeneE; Chromosome B carries GeneA, GeneB, GeneC, GeneD and GeneE with no segmental duplication.]
Defensin beta (DEFB) genes

A >240 kb segmentally duplicated region on 8p23.1 contains the DEFB4P, DEFB4, DEFB103A, SPAG11 (sperm associated antigen 11), DEFB104A, DEFB106A and DEFB105A genes. This entire region is present in 2-12 copies per human diploid genome [6-8].

[Figure: BLAT analysis of the DEFB4 NM_004942 sequence (UCSC Genome Browser on Human May 2004 Assembly), showing two >240 kb segmentally duplicated regions, a possible further duplicated region, two DEFB4P labels and an endogenous retrovirus.]

As shown above, the current human genome build includes two copies of the segmental duplication, one in reverse orientation. The HGNC has a single entry and approved gene symbol and name for each of the genes in this duplicon. Should we name variable copy number genes based on their gene order from the p-arm telomere (ptel) to the q-arm telomere (qtel), e.g. DEFB4#1 (~7.3 Mb) and DEFB4#2 (~7.8 Mb)? Should the numbering be based on the current assembly? If so, what happens if an additional DEFB4 gene is subsequently identified in the assembly gap at ~7.5 Mb?
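The instability raised by the last question can be made concrete with a short sketch. It assumes the proposed scheme simply numbers copies by ascending coordinate from ptel to qtel; the function names are hypothetical, the positions are the approximate values quoted above, and the ~7.5 Mb copy is the hypothetical gap copy from the question, not an observed gene.

```python
# Illustrative sketch of assembly-order numbering; names are hypothetical.

def number_copies(symbol, positions_mb):
    """Assign symbol#k to each copy in ascending coordinate order (ptel to qtel)."""
    ordered = sorted(positions_mb)
    return {f"{symbol}#{k}": pos for k, pos in enumerate(ordered, start=1)}


def reassigned_names(before, after):
    """List names that exist in both schemes but now point at a different copy."""
    return sorted(name for name in before
                  if name in after and after[name] != before[name])


if __name__ == "__main__":
    current = number_copies("DEFB4", [7.3, 7.8])
    # Hypothetical: a further copy is found in the assembly gap at ~7.5 Mb.
    revised = number_copies("DEFB4", [7.3, 7.5, 7.8])
    print(current)
    print(revised)
    print(reassigned_names(current, revised))
```

In this sketch the copy at ~7.8 Mb would change from DEFB4#2 to DEFB4#3 while DEFB4#2 would come to denote the new copy, so any earlier publication using DEFB4#2 would become ambiguous. Numbering by order of discovery would avoid the renaming but would lose the positional meaning.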
3. Complex loci

A group of genes in a cluster may each be independently present in zero to multiple copies at a single locus, with no constraint on their location within the cluster, e.g. the defensin alpha genes. The cartoon below illustrates copy number variation of GeneA, GeneB, GeneC, GeneD and GeneE contained within a cluster at a single locus on two different chromosomes.

[Figure: copy number variation within a gene cluster]
Chromosome A: GeneA GeneB GeneB GeneC GeneD GeneE
Chromosome B: GeneA GeneE GeneD GeneE GeneC
Defensin, alpha 1 and alpha 3, variable copy number locus

The DEFA1A3 locus at 8p23.1 comprises ~19 kb tandem repeats, each of which contains either the DEFA1, DEFA3 or DEFT1P gene. The repeats are present in 4-11 copies per human diploid genome (every chromosome contains at least one copy of DEFA1 and DEFT1P, although some individuals do not possess any DEFA3 copies) and can differ in location within the repeat array with respect to each other [7-9].

[Figure: BLAT analysis of the DEFT1P NG_005042 sequence (UCSC Genome Browser on Human May 2004 Assembly), showing the DEFA1A3 locus as five ~19 kb repeat units labelled DEFA1, DEFA1, DEFA3 and two DEFT1P pseudogenes.]

As shown above, the current human genome build includes two copies of the DEFA1 and DEFT1P genes and one copy of the DEFA3 gene. The HGNC has a single entry and approved symbol and name for the DEFA1A3 locus and for the DEFA1, DEFA3 and DEFT1P genes. As the DEFA1 and DEFA3 gene products differ by one amino acid, should they remain as separate entries or should they be merged into the DEFA1A3 gene record?
Summary

As illustrated above, there is currently no consensus for naming variable copy number genes and, indeed, it may not be feasible to apply a standard nomenclature system to all cases. We would greatly appreciate your opinion and thoughts regarding the potential problems of the current nomenclature systems, benefits and pitfalls of a standard nomenclature, and ideas for a new nomenclature scheme for variable copy number genes. Please visit us at Booth 1128 in the Exhibit Area or email nome@galton.ucl.ac.uk to discuss your views.

References

1. Iafrate AJ et al. (2004) Nat. Genet. 36(9):949-51.
2. Sebat J et al. (2004) Science 305(5683):525-8.
3. Sharp AJ et al. (2005) Am. J. Hum. Genet. 77(1):78-88.
4. Groot PC et al. (1989) Genomics 5(1):29-42.
5. Groot PC, Mager WH and Frants RR (1991) Genomics 10(3):779-85.
6. Hollox EJ, Armour JA and Barber JC (2003) Am. J. Hum. Genet. 73(3):591-600.
7. Linzmeier RM and Ganz T (2005) Genomics 86(4):423-30.
8. Taudien S et al. (2004) BMC Genomics 5(1):92.
9. Aldred PM, Hollox EJ and Armour JA (2005) Hum. Mol. Genet. 14(14):2045-52.

Acknowledgements

The HGNC would like to thank Drs Anthony Brookes, Evan Eichler, John Armour, Rose Linzmeier and Stefan Taudien for their helpful suggestions regarding variable copy number gene nomenclature.