Opinion: Standardizing gene product nomenclature—a call to action

The current lack of a standardized nomenclature system for gene products (e.g., proteins) has resulted in a haphazard, counterproductive system of labeling. Different names are often used for the same gene product, and the same name is sometimes used for unrelated gene products. Such ambiguity not only risks harm to patients, whose treatments increasingly rely on laboratory tests for multiple gene products, but also causes miscommunication and inefficiency, both of which hinder progress across broad scientific fields. To mitigate this confusion, we recommend standardizing human protein nomenclature through the use of a Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) gene symbol accompanied by its unique HGNC ID. We call for action across all biomedical communities and scientific and medical journals to standardize nomenclature of gene products using HGNC gene symbols to enhance accuracy in scientific and public communication.
We call on all biomedical communities and scientific and medical journals to standardize nomenclature of gene products to enhance accuracy in scientific and public communication. Image credit: Shutterstock/greenbutterfly.
Use of gene symbols designated by the HGNC [www.genenames.org (1)] is nearly universal. DNA- and RNA-level sequence variation nomenclature has been standardized to use HGNC gene symbols, Single Nucleotide Polymorphism Database (dbSNP) IDs, and variant nomenclature defined by the Human Genome Variation Society (HGVS), which together designate variants unambiguously. In striking contrast to these universal identifiers for genes and gene variants, there are no universal identifiers for the peptides and proteins that these genes encode.
Many gene products have multiple names in widespread use, and many common names are used for multiple gene products. For example, “PD-1” is used for the products of three unrelated genes: PDCD1, SNCA, and SPATA2. The PDCD1 (PD-1) protein is a well-known target for cancer immunotherapy, yet the term “anti–PD-1” could refer to an antibody against the product of PDCD1, SNCA, or SPATA2. Similarly, “TTF-1” (an abbreviation of “thyroid transcription factor 1”) is used for the protein encoded by NKX2-1, which invites confusion with a different gene whose official symbol is TTF1 (transcription termination factor 1). Using “TTF-1” for the NKX2-1 gene product is therefore confusing and potentially harmful, particularly if a drug specifically targeting either of these gene products becomes available.
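To make the one-to-many problem concrete, the following Python sketch resolves a label against a toy alias table containing only the examples above. The table, the `resolve` function, and the hyphen-stripping normalization (which mimics how a reader or naive search might conflate “TTF-1” with TTF1) are illustrative assumptions, not HGNC data or any HGNC service; real alias and previous-symbol data should be taken from www.genenames.org.

```python
# Toy alias table built only from the examples in the text; it is NOT
# HGNC's alias data, which should be obtained from www.genenames.org.
ALIAS_TO_SYMBOLS = {
    "PD-1": ["PDCD1", "SNCA", "SPATA2"],
    "TTF-1": ["NKX2-1"],
}

# Official HGNC symbols that appear in the examples.
OFFICIAL_SYMBOLS = {"PDCD1", "SNCA", "SPATA2", "NKX2-1", "TTF1"}


def normalize(label: str) -> str:
    """Uppercase and drop hyphens, so 'TTF-1' and 'TTF1' compare equal."""
    return label.upper().replace("-", "")


def resolve(label: str) -> list[str]:
    """Return every official symbol that a label could plausibly refer to."""
    candidates = list(ALIAS_TO_SYMBOLS.get(label, []))
    key = normalize(label)
    # A label that looks like an official symbol also collides with it.
    for symbol in sorted(OFFICIAL_SYMBOLS):
        if normalize(symbol) == key and symbol not in candidates:
            candidates.append(symbol)
    return candidates


for label in ["PD-1", "TTF-1", "PDCD1"]:
    hits = resolve(label)
    status = "ambiguous" if len(hits) > 1 else "unique"
    print(f"{label}: {hits} ({status})")
```

A label that maps to more than one official symbol, as PD-1 and TTF-1 do here, cannot be interpreted safely without further context, whereas an official symbol such as PDCD1 resolves only to itself.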
Nonstandardized nomenclature can clearly lead to miscommunication (see Table 1 and SI Appendix, Table 1). At best, the current situation results in inefficient communication; at worst, it causes harm through misinterpreted test results and inappropriate therapeutic choices. Names matter. A naming convention that eliminates this type of ambiguity is needed.
Table 1. Examples of gene products with common ambiguous and/or confusing nomenclature
Our goal is to prevent the harms caused by ambiguous gene product nomenclature. The National Academy of Medicine (2, 3) and the Institute for Safe Medication Practices [ISMP (4)] report alarming data on the causes and impact of medical errors in the United States, with more than 100,000 medical errors reported annually. To mitigate errors caused by misreading medical abbreviations, ISMP and Davis et al. are working on recommendations to avoid uncommon or ambiguous abbreviations (5, 6). We assert that standardizing gene product nomenclature will complement these efforts to reduce medical errors.
Nomenclature for Genes and Gene Products
The 1979 official guidelines for human gene naming remain the basis of the gene nomenclature assigned by the HGNC today (7, 8). The resulting universal, unambiguous nomenclature system for genes has been widely adopted and has been immensely valuable in facilitating communication. In parallel, subcommittees of the International Union of Immunological Societies (IUIS) are responsible for naming, for example, Cluster of Differentiation (CD) molecules, immunoglobulins, T cell receptors, and interleukins (9). The Enzyme Commission [EC (10)] designates “accepted names” for enzymes, and the Nomenclature Committee of the International Union of Basic and Clinical Pharmacology (NC-IUPHAR) names biological targets (11). The UniProt Knowledgebase (UniProtKB) (12) provides recommended protein names together with functional, taxonomic, and structural information.
However, current protein naming systems, which are based on different aspects of structure, function, and cellular localization, can readily conflict with one another, and a name may become misleading when new functions, alternative structures, or novel cellular localizations are discovered. Although HGNC actively collaborates with all of these protein naming committees when assigning gene nomenclature, the absence of a single, universally accepted protein nomenclature, combined with multiple independent groups creating naming systems according to their own preferences, is both startling and potentially dangerous. We believe it is critical that protein nomenclature avoid this pitfall.
Fig. 1. Roadmap to universal implementation of a gene product nomenclature system. Steps to implement an advanced gene product nomenclature system: (i) form a working group consisting of experts from diverse fields, (ii) develop a standardized nomenclature system and usage guidelines, (iii) implement the system, (iv) receive feedback, and (v) improve the standardized nomenclature system and usage guidelines.
Official Gene Symbols
Several groups, both private (www.biosciencewriters.com/Guidelines-for-Formatting-Gene-and-Protein-Names.aspx) and public (www.ncbi.nlm.nih.gov/genome/doc/internatprot_nomenguide/), have previously recommended using HGNC-approved gene symbols to identify gene products. HGNC further recommends the use of italics for symbols denoting genes, mRNAs, and alleles to differentiate them from proteins. This approach, which has not been universally adopted, is attractive for several reasons.
HGNC gene symbols are already widely adopted across all major genomic resources. These symbols are standardized by HGNC rules, and HGNC collaborates with UniProt and all of the external protein nomenclature resources mentioned herein (www.genenames.org/help/symbol-report/). Each gene is paired one-to-one with its official symbol and a unique HGNC ID, leaving no room for ambiguity or confusion. The central dogma of molecular biology holds that coding DNA is transcribed into RNA, which is translated into peptides and proteins. Although the central dogma does not capture all of biology, it provides a strong rationale for naming gene products with HGNC-approved gene symbols. HGNC IDs remain stable even when HGNC gene symbols are updated. Writing the HGNC gene symbol followed, in brackets, by the common colloquial name (if any) and the HGNC ID (e.g., PDCD1 protein [PD-1; HGNC: 8760]) can eliminate nomenclature ambiguity.
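Because the convention is fixed, such labels can be generated mechanically. Below is a minimal Python sketch, assuming a hypothetical helper `format_gene_product` that takes an official symbol, an HGNC ID, and an optional colloquial name. Plain strings cannot carry italics, so the gene/protein distinction is conveyed here only by the word “protein.”

```python
from typing import Optional


def format_gene_product(symbol: str, hgnc_id: int,
                        colloquial: Optional[str] = None) -> str:
    """Build a label such as 'PDCD1 protein [PD-1; HGNC: 8760]'.

    The official symbol always comes first; a colloquial name is only
    ever shown inside the brackets, never on its own.
    """
    if not symbol:
        raise ValueError("an official HGNC symbol is required")
    parts = [colloquial] if colloquial else []
    parts.append(f"HGNC: {hgnc_id}")
    return f"{symbol} protein [{'; '.join(parts)}]"


print(format_gene_product("PDCD1", 8760, "PD-1"))
# PDCD1 protein [PD-1; HGNC: 8760]
print(format_gene_product("PDCD1", 8760))
# PDCD1 protein [HGNC: 8760]
```

Requiring the symbol as the first argument, and rejecting an empty one, means a colloquial name such as PD-1 can never be emitted without the official symbol.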
For a specific gene product isoform, a unique UniProt isoform ID can be added after the HGNC ID (e.g., PDCD1 protein [PD-1; HGNC: 8760; UniProt: Q15116-1]). UniProt assigns each isoform a unique ID composed of the primary UniProt accession followed by a dash and a number. This strict combination of the gene symbol with its HGNC ID and UniProt ID avoids confusion for little effort, and it takes advantage of the HGNC's work to ensure unique gene identification. In this scheme, italics signify a gene symbol, whereas nonitalics signify the encoded protein(s). Thus, the italic term PDCD1 indicates the PDCD1 gene, whereas the nonitalic term PDCD1 indicates the PDCD1 protein, which may be followed by nonofficial names such as PD-1 in brackets. To eliminate ambiguity, nonofficial names such as PD-1 should never be used without the official symbol.
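The extended label is regular enough to check automatically. The sketch below parses a label of the form shown above and validates the optional UniProt isoform ID. The accession pattern is adapted from the format UniProt documents for its accessions, the isoform suffix follows the description above, and `parse_label` and its label grammar are hypothetical. The check is purely structural: it does not confirm that a symbol or ID is actually approved.

```python
import re

# UniProt accession pattern (6 or 10 characters), adapted from UniProt's
# documented format, plus an optional isoform suffix: a dash and a number.
UNIPROT_ID = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?:-\d+)?$"
)

# Hypothetical label grammar:
#   SYMBOL protein [optional colloquial name; HGNC: n; optional UniProt: id]
LABEL = re.compile(
    r"^(?P<symbol>[A-Za-z0-9-]+) protein \["
    r"(?:(?P<colloquial>[^;\]]+); )?"
    r"HGNC: ?(?P<hgnc>\d+)"
    r"(?:; UniProt: (?P<uniprot>[^\]]+))?"
    r"\]$"
)


def parse_label(label: str) -> dict:
    """Split a gene product label into its parts, or raise ValueError."""
    match = LABEL.match(label)
    if match is None:
        # A bare nonofficial name such as 'PD-1' ends up here.
        raise ValueError(f"not in 'SYMBOL protein [...; HGNC: n]' form: {label!r}")
    fields = match.groupdict()
    uniprot = fields["uniprot"]
    if uniprot is not None and not UNIPROT_ID.match(uniprot):
        raise ValueError(f"malformed UniProt ID: {uniprot!r}")
    fields["hgnc"] = int(fields["hgnc"])
    return fields


print(parse_label("PDCD1 protein [PD-1; HGNC: 8760; UniProt: Q15116-1]"))
```

Rejecting a bare “PD-1” at the parsing stage mirrors the rule that nonofficial names should not appear without the official symbol.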
Furthermore, the Vertebrate Gene Nomenclature Committee (VGNC, vertebrate.genenames.org/), a sister project of the HGNC, assigns standardized nomenclature to genes in key vertebrate species that lack a nomenclature authority. The VGNC coordinates closely with all existing vertebrate gene nomenclature committees, namely the mouse, rat, chicken, Xenopus, and zebrafish committees (see SI Appendix, Table 2), to ensure that vertebrate genes are named in line with their human homologs. The VGNC also approves gene nomenclature for other selected vertebrates (vertebrate.genenames.org/about/species-list/). Beyond vertebrates, important model organisms, including the invertebrates Drosophila melanogaster (fruit fly) and Caenorhabditis elegans (nematode) and the yeasts Saccharomyces cerevisiae (baker’s yeast) and Schizosaccharomyces pombe (fission yeast), also have established naming committees associated with their model organism databases (see SI Appendix, Table 2). Therefore, the concept of using the approved gene symbol, along with a database ID (and UniProt ID where required), for unambiguous gene product naming is clearly applicable to numerous other species and should be encouraged wherever possible.
Gene Symbol Versus Gene Name
Each gene has one unique long “gene name” and one unique short “gene symbol,” both of which are officially approved by the HGNC. Typically, the gene symbol is an abbreviation of the gene name.
We recommend symbols over names for several reasons. First, gene symbols are much easier to remember than gene names (e.g., PIK3CA is the symbol for “phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha”). Second, gene names have historically changed more often than symbols, and continuity is a valuable means of reducing confusion. Third, a space, hyphen, or comma within a gene name may create ambiguity about where the name ends. Fourth, names may contain words that are spelled differently in British and American English, e.g., “oestrogen receptor 1” (nonofficial name) versus “estrogen receptor 1” (official gene name), whereas there is only one HGNC-approved symbol: ESR1. Lastly, gene names may be mistranslated by computer software and artificial intelligence. For all of these reasons, using official HGNC gene symbols to represent protein products promotes accurate, efficient communication.
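The third reason can be demonstrated with a naive comma-separated parser of the kind a simple script or spreadsheet import might use. The `split_list` helper below is hypothetical; the gene names and symbols are the ones cited in the text.

```python
def split_list(text: str) -> list[str]:
    """Naively split a comma-separated list and trim surrounding spaces."""
    return [item.strip() for item in text.split(",")]


# The same two genes, listed once by symbol and once by full name.
symbols = "PIK3CA, ESR1"
names = ("phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha, "
         "estrogen receptor 1")

print(split_list(symbols))
# ['PIK3CA', 'ESR1']
print(split_list(names))
# ['phosphatidylinositol-4', '5-bisphosphate 3-kinase catalytic subunit alpha',
#  'estrogen receptor 1']
```

The two symbols survive the split intact, whereas the comma inside “4,5-bisphosphate” breaks one gene name into two spurious items.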
More Nomenclature Challenges
The biology of peptides and proteins is complex. Many protein complexes consist of multiple different peptides, often together with other biomolecules. One gene may encode multiple isoforms, which may have structural variants or undergo posttranscriptional or posttranslational modifications. Several peptides (e.g., the insulin A and B chains and C-peptide) are derived from the product of INS (HGNC: 6081); however, UniProt IDs do not currently discriminate between these different entities. Furthermore, prefixes added to established symbols, such as “sCD14” for “soluble CD14” or “pSTAT3” for “phosphorylated STAT3,” can cause confusion. Although dealing with this complexity requires effort beyond the simple use of gene symbols, naming multiple products derived from one gene on the basis of the gene symbol provides a feasible starting point for an unequivocal nomenclature system. We propose developing rules for the unique identification of gene product variants as the next step toward standardized gene product nomenclature.
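Prefixed forms also defeat exact-match lookups. The sketch below assumes a small hypothetical set of official symbols and a single-lowercase-letter prefix heuristic (both illustrative; a real tool would load the full HGNC symbol list, and the heuristic can misfire on ordinary words), and flags tokens such as sCD14 and pSTAT3 that are not themselves official symbols but wrap one.

```python
import re

# Illustrative subset of official symbols mentioned in the text;
# a real tool would load the complete HGNC symbol list.
KNOWN_SYMBOLS = {"CD14", "STAT3", "INS"}

# Tokens made of letters, digits, and hyphens (symbols such as NKX2-1).
TOKEN = re.compile(r"[A-Za-z0-9-]+")


def classify_tokens(text: str) -> dict[str, str]:
    """Report official symbols and lowercase-prefixed variants of them."""
    result = {}
    for token in TOKEN.findall(text):
        if token in KNOWN_SYMBOLS:
            result[token] = "official symbol"
        elif len(token) > 1 and token[0].islower() and token[1:] in KNOWN_SYMBOLS:
            # e.g. 'sCD14' or 'pSTAT3': an exact-match lookup would miss these.
            result[token] = f"prefixed form of {token[1:]}"
    return result


print(classify_tokens("Levels of sCD14 and pSTAT3 rose; STAT3 mRNA was unchanged."))
# {'sCD14': 'prefixed form of CD14', 'pSTAT3': 'prefixed form of STAT3',
#  'STAT3': 'official symbol'}
```

Flagging such tokens only detects the problem; deciding what “s” or “p” denotes still requires the kind of variant-naming rules proposed above.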
Although some scientists, healthcare providers, and patients may be unfamiliar with certain gene symbols, widespread use of gene symbols to describe gene products in the literature and in medical records should eventually make them familiar. Endorsement of a standard nomenclature system by the World Association of Medical Editors and the International Committee of Medical Journal Editors will help educate authors, reviewers, and readers. It is also crucial that journals implement and enforce this standard through their editorial boards and copyeditors. Considering the impact of published scientific data on the scientific process, one could argue that editors have an enhanced obligation to help diminish confusion and misstatements in scientific findings. The potential harms of not adopting such a system far outweigh the short-term discomfort of learning a new naming convention. Additionally, we assert that implementing standardized gene product nomenclature will accelerate clinical and epidemiological research using electronic health records and other healthcare information systems.
Seeking Solutions
We propose the universal use of HGNC-approved gene symbols, along with HGNC IDs, UniProt IDs, and common colloquial names (when appropriate), to unambiguously identify gene products and reduce the potential for serious harm arising from mistaken identification of gene products. The lack of a single unifying protein nomenclature system continues to confuse scientists, clinicians, and the public. This confusion impedes communication, data sharing, and scientific progress. Given the wealth of data on alterations of specific gene products in various diseases and increasing importance of treatment approaches targeting proteins and peptides, the use of nonstandardized names for gene products must end. We believe that regular and strict use of gene symbols along with nonofficial legacy names will provide clarity with minimal difficulty.
Ultimately, a more complex protein nomenclature system must be developed to unequivocally distinguish gene product variants emanating from one gene, as well as complexes of biomolecules containing multiple gene products. We propose to form a working group of experts from diverse fields to develop a robust and complete gene product nomenclature system and guidelines for its use (Fig. 1). Support from journals and publishers, together with clinical and public health leaders, is critical for effective implementation. Standardized use of the HGNC-approved gene symbols along with HGNC IDs to identify gene products is an important first step to achieve this ambitious goal.
Acknowledgments
This work was supported in part by grants from the US National Institutes of Health (U24 HG003345 to E.A.B., R35 CA197735 to S.O., R01 CA151993 to S.O., R21 CA230873 to S.O., R01 CA248857 to S.O.) and from the Wellcome Trust UK (208349/Z/17/Z to E.A.B.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
- 1 To whom correspondence may be addressed. Email: elspeth@ebi.ac.uk, timothy.oleary@va.gov, or sogino@bwh.harvard.edu.
Author contributions: S.O. conceived and designed the overall project. K.F. and S.O. made the initial draft and created the table. All authors edited the manuscript, provided constructive feedback, and approved the final version. K.F., E.A.B., P.M., and C.L.S. contributed equally as co-first authors. S.M.M., T.L.P., H.F., H.E.W., and S.O. contributed equally as co-last authors.
Competing interest statement: C.L.S. is employed through Torreyana Corp, San Diego, CA; Blackhawk Genomics, Concord, CA; and Advagenix, Rockville, MD. Volunteer activities of C.L.S. include those for College of American Pathologists, Northfield, IL, and Clinical Laboratory Standards Institute, Wayne, PA. T.J.O. is a Member, Scientific Advisory Committee, MioDx and Integrated Nano-Technologies. A.W.I.L.’s laboratory receives sponsorships from AstraZeneca (HK) Ltd. and MSD (HK) Ltd. for providing selected companion diagnostic tests free to public patients. N.C. is an employee at Quest Diagnostics, Inc. F.A.M. is an employee and stock option holder at Castle Biosciences, Inc. A.B.C. is paid teaching faculty for the American Medical Informatics Association Clinical Informatics Board Review Course and receives small honoraria as well as travel reimbursement to speak at multiple scientific and professional medical society meetings. T.L.P. has been consulted (compensated) for Bio-Rad, Inc. and is Director of Pathology Strategies for the Sturge Weber Foundation (compensated). H.E.W. has employment through Viapath, a majority National Health Service-owned independent pathology service provider; has been a paid faculty member at Kingston University, accepted paid accommodation and subsistence as an invited speaker to Cytocell User Group meeting for the United Kingdom and Ireland, and accepted paid event registration as an invited speaker to Digital Pathology/Global Engage meeting. H.E.W.’s laboratory received scholarship funds from The International Council for Standardization in Haematology for providing JAK2 testing. The other authors (K.F., E.A.B., P.M., N.R.P., K.P.P., B.S., M.S., M.L.G., S.M.M., H.F., and S.O.) do not have any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this article.
Any opinions, findings, conclusions, or recommendations expressed in this work are those of the authors and have not been endorsed by the National Academy of Sciences.
The opinions are those of the authors and are not to be construed as official or as representing the views of their institutions or governments.
Use of Standardized Official Symbols: We use HGNC (HUGO Gene Nomenclature Committee) approved symbols and root symbols for genes and gene families, including ABL1, ACE, ACE2, ACTA2, ARHGEF7, ARPC5, ASCC1, BCR, CD2, CD14, CD40, CDKN1A, CDKN1B, CDKN2A, CKAP4, CKB, CKM, CLN6, CLN8, COX8A, DCTN2, EDEM, EREG, ESR1, ESR2, FLNB, H3P16, IFI27, IL2RA, INS, ISG20, KLK3, KMT, KMT2B, KMT2D, KRT, MT-CO2, MTTP, NFKB1, NKX2-1, NPEPPS, NSG1, NXF1, PDCD1, PGR, PIK3CA, POLD2, PRH2, PSAT1, PSMD9, PTGS2, SEC14L2, SMN1, SNCA, SPATA2, STAT3, TAP1, TCEAL1, TMEM37, TPRG1, TP63, TTF1, USO1, and ZNRD2; all of which are described at www.genenames.org. The official gene symbols are italicized to differentiate from nonitalicized gene product names, gene root/stem symbols, and nonofficial names.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2025207118/-/DCSupplemental.
Published under the PNAS license.
References
1.
2. L. T. Kohn, J. M. Corrigan, M. S. Donaldson
3. Consumer Reports Health
4. The Institute for Safe Medication Practices
5. N. M. Davis
6. N. M. Davis
7. E. A. Bruford et al.
8.
9.
10.
11. S. P. H. Alexander et al.; CGTP Collaborators
12.
Article Classifications
- Biological Sciences
- Genetics