APPLIED BIOLOGICAL SCIENCES
Building a human kinase gene repository: Bioinformatics, molecular cloning, and functional validation



*Harvard Institute of Proteomics, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 320 Charles Street, Cambridge, MA 02141; and
Department of Systems Biology, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02115
Contributed by Ed Harlow, April 21, 2005
Abstract
Kinases catalyze the phosphorylation of proteins, lipids, sugars, nucleosides, and other important cellular metabolites and play key regulatory roles in all aspects of eukaryotic cell physiology. Here, we describe the mining of public databases to collect the sequence information of all identified human kinase genes and the cloning of the corresponding ORFs. We identified 663 genes: 511 encoding protein kinases and 152 encoding nonprotein kinases. We describe the successful cloning and sequence verification of 270 of these genes. Subcloning this gene set into mammalian expression vectors and using it in high-throughput cell-based screens allowed us to validate expression of the clones and to identify previously uncharacterized modulators of the survivin promoter. Moreover, expression of the kinase genes in bacteria, followed by autophosphorylation assays, identified 21 protein kinases with autocatalytic activity. The work described here will facilitate functional assays of this important gene family in phenotypic screens and its use in biochemical and structural studies.
kinome | autophosphorylation | cell-based screens | high-throughput cloning | survivin
Kinases are enzymes that transfer the γ-phosphate from nucleoside triphosphates to a large number of molecules, including proteins, sugars, nucleosides, and lipids, thereby affecting the activity and fate of those molecules and of the cell. Phosphorylation is a common posttranslational modification of proteins and plays a key role in protein structure and function and in all aspects of cell physiology. Protein kinases contain well conserved motifs and constitute one of the largest protein families in the human genome (1–3). Mutations of protein kinases are involved in carcinogenesis and several other pathological conditions (4–6). Phosphorylation of other biomolecules also plays a critical role in the physiology and pathology of cells. Lipid kinases such as the phosphoinositide-3 kinase family members are key modulators of the cellular response to growth factors, hormones, and neurotransmitters and are involved in cancer (7). Nucleotide and nucleoside kinases regulate the intracellular levels of phosphate donors and nucleic acid precursors and are involved in the cellular response to damage and ischemia (8, 9). Sugar kinases regulate the rates of sugar metabolism, energy generation, and transcription activation and are involved in cellular transformation and apoptosis (10–12).
The near completion of the Human Genome Project, the ongoing annotation projects, and the availability of sequence databases have allowed genome-scale searches for members of different gene families by using sequence information as well as structural or functional annotations (2, 3, 13–15). However, systematic cloning, sequence analysis, and functional validation of any of these gene sets has remained challenging. Indeed, a major goal for experimental biology in this postgenomic era is the creation and use of state-of-the-art clone collections that exploit the newly obtained genome sequence and gene annotation. In the most useful collections, clones would represent fully sequence-verified ORFs, make use of recombination-based cloning techniques, and be arrayed in high-density formats in which every position is fully annotated (16, 17). Together, these properties allow high-throughput (HT) subcloning of the genes in these collections and facilitate both experimentation (in any in vivo or in vitro system) and data collection and analysis (of both positive and negative data).
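As a minimal illustration of what a "fully annotated" arrayed format means in practice, the sketch below assigns ORFs to consecutive 96-well plates and records the identity of every occupied position. The 96-well layout (rows A to H, columns 1 to 12) is standard; the plate naming scheme (`KIN001`, ...), the ORF identifiers, and the function names are hypothetical and do not describe the authors' actual system.

```python
from string import ascii_uppercase

ROWS, COLS = 8, 12            # standard 96-well plate: rows A-H, columns 1-12
WELLS_PER_PLATE = ROWS * COLS


def well_name(position: int) -> str:
    """Convert a 0-based, row-major position into a well ID such as 'A1' or 'H12'."""
    if not 0 <= position < WELLS_PER_PLATE:
        raise ValueError(f"position {position} is outside a {WELLS_PER_PLATE}-well plate")
    row, col = divmod(position, COLS)
    return f"{ascii_uppercase[row]}{col + 1}"


def array_orfs(orf_ids, plate_prefix="KIN"):
    """Assign ORFs to consecutive plates and wells.

    Returns a plate map {(plate_id, well_id): orf_id}, so every occupied
    position is annotated. The plate naming scheme is hypothetical.
    """
    plate_map = {}
    for i, orf in enumerate(orf_ids):
        plate_no, position = divmod(i, WELLS_PER_PLATE)
        plate_id = f"{plate_prefix}{plate_no + 1:03d}"
        plate_map[(plate_id, well_name(position))] = orf
    return plate_map


if __name__ == "__main__":
    orfs = [f"ORF{i:04d}" for i in range(100)]   # hypothetical ORF identifiers
    layout = array_orfs(orfs)
    print(layout[("KIN001", "H12")])   # ORF0095, the last well of the first plate
    print(layout[("KIN002", "A1")])    # ORF0096, the first well of the second plate
```

Because the map is keyed by plate and well, it can be inverted to find where any ORF sits, which is what makes HT subcloning from an arrayed collection traceable.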
In this study, we describe the construction and proof-of-principle use of such a collection for the human kinase genes. We describe the mining of public databases to identify all annotated human kinases (both protein and nonprotein kinases) and the generation of a sequence-verified clone collection for this gene set by using the CREATOR (BD Biosciences Clontech) cloning platform. We further validated the expression of these clones, successfully screened their activity en masse in three independent cell-based assays, and confirmed enzymatic activity for some of the encoded proteins in Escherichia coli. As we demonstrate here, this human kinase clone set will facilitate the study of this important gene class in both in vivo and in vitro settings.
Materials and Methods
Molecular Cloning. PCR amplification and cloning were carried out with a highly automated pipeline supported by a laboratory information management system (LIMS) and based on CREATOR recombination-based cloning technology (BD Biosciences Clontech) (see Supporting Materials and Methods).
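The LIMS itself is not described here. The following is a minimal sketch, under assumed stage names, of the kind of per-clone state tracking such a system provides: each ORF record can only move forward through allowed pipeline steps (or be marked failed), and every transition is logged. The stage names, `CloneRecord`, and the transition table are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical pipeline stages; the authors' actual LIMS states are not given.
    TARGETED = auto()
    PCR_AMPLIFIED = auto()
    CLONED = auto()        # ORF captured in the pDNR-Dual master vector
    SEQUENCED = auto()
    ACCEPTED = auto()      # sequence verified and accepted into the collection
    FAILED = auto()


# Allowed moves: each in-progress stage advances one step or drops to FAILED.
ALLOWED = {
    Stage.TARGETED: {Stage.PCR_AMPLIFIED, Stage.FAILED},
    Stage.PCR_AMPLIFIED: {Stage.CLONED, Stage.FAILED},
    Stage.CLONED: {Stage.SEQUENCED, Stage.FAILED},
    Stage.SEQUENCED: {Stage.ACCEPTED, Stage.FAILED},
    Stage.ACCEPTED: set(),
    Stage.FAILED: set(),
}


@dataclass
class CloneRecord:
    gene: str
    stage: Stage = Stage.TARGETED
    history: list = field(default_factory=list)   # (from_stage, to_stage, note) tuples

    def advance(self, new_stage: Stage, note: str = "") -> None:
        """Move to new_stage, rejecting transitions the pipeline does not allow."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(f"{self.gene}: {self.stage.name} -> {new_stage.name} not allowed")
        self.history.append((self.stage, new_stage, note))
        self.stage = new_stage


if __name__ == "__main__":
    rec = CloneRecord("KINASE_X")   # hypothetical gene name
    for step in (Stage.PCR_AMPLIFIED, Stage.CLONED, Stage.SEQUENCED, Stage.ACCEPTED):
        rec.advance(step)
    print(rec.stage.name, len(rec.history))   # ACCEPTED 4
```

Refusing skipped steps is the design point: a clone cannot be marked accepted unless it passed through sequencing, which is the property a sequence-verified collection depends on.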
Generating Expression-Ready Libraries. ORFs were subcloned from the pDNR-Dual master vector into mammalian or bacterial expression vectors. For mammalian expression, the pLP-CMVneo, pLP-EGFP-C1, and pLPS-3'EGFP vectors (BD Biosciences Clontech) were chosen to produce native, N-terminal EGFP-tagged, and C-terminal EGFP-tagged versions of each kinase, respectively. For bacterial expression, pGEX2tk (Amersham Pharmacia Biotech) was adapted for recombinational cloning (see Supporting Materials and Methods).
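Because each ORF is transferred into three mammalian vectors, the mammalian library holds three constructs per ORF; for 233 kinases this gives the 699 constructs screened in the c-myc assay reported in Results. The sketch simply enumerates ORF-by-vector pairs. The vector names come from the text; the record layout, function name, and placeholder kinase names are illustrative assumptions, and the bacterial pGEX2tk vector is left out.

```python
from itertools import product

# Mammalian destination vectors named in Materials and Methods, keyed by the
# form of the kinase each one produces.
MAMMALIAN_VECTORS = {
    "native": "pLP-CMVneo",
    "N-terminal EGFP": "pLP-EGFP-C1",
    "C-terminal EGFP": "pLPS-3'EGFP",
}


def expression_constructs(orfs):
    """Return one construct record per (ORF, destination vector) pair."""
    return [
        {"orf": orf, "form": form, "vector": vector}
        for orf, (form, vector) in product(orfs, MAMMALIAN_VECTORS.items())
    ]


if __name__ == "__main__":
    kinases = [f"KIN{i:03d}" for i in range(233)]   # hypothetical placeholders
    print(len(expression_constructs(kinases)))      # 699
```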
Mammalian Expression and Cell-Based Screens. Cotransfections of expression clones and reporter constructs were performed with FuGene6 (Roche Molecular Biochemicals) in a 96-well format. Reporter activity was measured with luciferase reporter assay and Great EscAPe SEAP detection kits (BD Biosciences Clontech) (see Supporting Materials and Methods).
Bacterial Expression and Autophosphorylation. For more information on bacterial expression and autophosphorylation, see Supporting Materials and Methods.
Results and Discussion
Our most recent version of the human kinase gene set, based on the June 2004 analysis, consists of 663 genes. Of these, 511 (77%) encode protein kinases and 152 (23%) encode nonprotein kinases (Table 1). Genes encoding protein kinases were further classified into groups according to the extended classification of protein kinases (3). The nonprotein kinases form heterogeneous groups of enzymes in terms of substrate specificity, gene sequence, and protein fold (ref. 21; Gene Ontology Consortium, www.godatabase.org). Data Set 1, which is published as supporting information on the PNAS web site, contains all relevant information for each of the 663 identified genes.
The number of genes accepted after this first pass of cloning and sequence analysis was 270: 186 protein kinases and 84 nonprotein kinases. Coverage of the protein kinase groups ranged from 24% (TK group) to 58% (CK1 group). For the nonprotein kinases, we successfully cloned 55% of the targeted genes. GenBank accession numbers AY335555–AY335786 have been obtained for the new clones, and the clone collection is now available from multiple distributors, including RZPD, MRC, and the Dana-Farber/Harvard Cancer Center DNA Resource Core.
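The percentages above follow directly from the reported counts. The short calculation below reproduces them, assuming that coverage is computed against all identified genes in each class. That denominator matches the reported 55% for nonprotein kinases (84 of 152), but it is an assumption for the protein kinases, whose targeted set is not stated here.

```python
# Counts reported in the text (June 2004 gene set and first cloning pass).
IDENTIFIED = {"protein kinases": 511, "nonprotein kinases": 152}
ACCEPTED = {"protein kinases": 186, "nonprotein kinases": 84}


def summarize(identified, accepted):
    """Return {class: (share of the gene set, fraction accepted)} plus an overall row.

    Assumes every identified gene was targeted, which may not hold for every class.
    """
    total = sum(identified.values())
    rows = {cls: (n / total, accepted[cls] / n) for cls, n in identified.items()}
    rows["overall"] = (1.0, sum(accepted.values()) / total)
    return rows


if __name__ == "__main__":
    for cls, (share, cloned) in summarize(IDENTIFIED, ACCEPTED).items():
        print(f"{cls}: {share:.0%} of the set, {cloned:.0%} accepted")
```

Run as written, this prints 77% and 23% for the composition of the set, 55% accepted for nonprotein kinases, about 36% for protein kinases, and about 41% (270 of 663) overall, under the stated denominator assumption.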
Expression of Kinases in Mammalian Cells. To validate the expression and activity of the kinase clones in cells, we transferred the kinase genes into three mammalian expression vectors (see Materials and Methods). We then used the C-terminal EGFP-tagged kinase library to analyze the expression levels of 223 clones, based on the fluorescence of the tagged kinases. HEK 293T cells were transfected in triplicate in 96-well format with the kinase constructs, and, as measured with a plate reader, 94% of the clones were positive for GFP fluorescence relative to the empty C-terminal EGFP vector (Fig. 1). The clones scored positive by the plate reader also scored positive upon microscopic analysis (see Fig. 1 for representative samples). GFP fluorescence differed reproducibly among clones (up to 86-fold), with no significant correlation between ORF size and GFP fluorescence. Furthermore, microscopic analysis of the transfected cells allowed us, in most cases, to identify the subcellular distribution of the recombinant kinases (data not shown). For example, the recombinant forms of Src family members (CSK, LYN, and YES1) and Tec family members (ITK, BTK, and BMX) showed cytoplasmic or plasma membrane localization, as expected (22, 23). Likewise, recombinant PIP5K1A and B, ACVR1, and GPRK5 were mainly localized at the cell membrane (24, 25) (Fig. 5, which is published as supporting information on the PNAS web site). Recombinant PLK, CHEK1, and CHEK2 were localized in the nucleus, consistent with the temporal association of these proteins with this cellular compartment (26, 27). GFP fluorescence is only an indirect measure of recombinant kinase expression. However, for a subset of genes identified in the screen described below, comparison of the relative levels of fluorescence and immunoblot signal (with anti-GFP antibodies) indicated a good correlation between fluorescence signal and protein levels (Fig. 6, which is published as supporting information on the PNAS web site).
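The text reports GFP-positive calls relative to the empty C-terminal EGFP vector but does not state the decision rule. One simple rule, used here purely as an assumption, is to call a clone positive when its triplicate mean exceeds the empty-vector mean by a fixed number of empty-vector standard deviations; the same readings also give the fold spread among positive clones, the kind of quantity behind the "up to 86-fold" observation. The function names, the 3-SD cutoff, and the example readings are hypothetical.

```python
from statistics import mean, stdev


def gfp_calls(readings, empty_vector, n_sd=3.0):
    """Call clones GFP-positive against the empty-vector background.

    readings: {clone_id: [triplicate plate-reader values]}
    empty_vector: replicate readings for the empty C-terminal EGFP vector
    n_sd: hypothetical cutoff, in background standard deviations
    """
    cutoff = mean(empty_vector) + n_sd * stdev(empty_vector)
    calls = {clone: mean(values) > cutoff for clone, values in readings.items()}
    return calls, cutoff


def fold_spread(readings, calls):
    """Ratio of the brightest to the dimmest positive clone (triplicate means)."""
    positives = [mean(values) for clone, values in readings.items() if calls[clone]]
    return max(positives) / min(positives)


if __name__ == "__main__":
    empty = [100.0, 110.0, 90.0]   # hypothetical background wells
    plate = {
        "clone_A": [500.0, 520.0, 480.0],
        "clone_B": [120.0, 125.0, 115.0],
        "clone_C": [1000.0, 990.0, 1010.0],
    }
    calls, cutoff = gfp_calls(plate, empty)
    print(cutoff, calls)   # 130.0 {'clone_A': True, 'clone_B': False, 'clone_C': True}
    print(fold_spread(plate, calls))                       # 2.0
    print(f"{sum(calls.values()) / len(calls):.0%} positive")   # 67% positive
```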
Screens of Clones in Cell-Based Assays. Having validated the expression of the library in mammalian cells, we screened the human kinase expression clone sets in two independent HT cell-based reporter gene assays. In the first screen, we looked for kinase genes able to modulate the survivin promoter. Survivin (BIRC5) inhibits apoptosis, and its expression is up-regulated in most cancer cells (28). Although survivin expression is regulated transcriptionally in a cell cycle-restricted manner (29), mounting evidence indicates that several oncogenic pathways might also regulate its transcription (30). Each of the 223 C-terminal EGFP-tagged kinase clones analyzed above was cotransfected into HEK 293T cells, in triplicate, with a luciferase reporter construct containing the survivin promoter region (-1430 to -1), pLuc-1430c (29). Normalized luciferase activity data from two independent experiments are shown in Fig. 2A. A scatter plot of the z scores from the two independent assays shows good correlation and a high degree of reproducibility of the kinase-induced effect (r = 0.92; Fig. 2B).
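The reproducibility check described above (z scores from two independent runs, then a correlation coefficient) can be sketched as follows. The sketch assumes the luciferase values have already been normalized (the normalization procedure is in the supporting material and is not reproduced here), standardizes each run across all clones, and computes Pearson's r over the clones present in both runs. Function names and the toy data are hypothetical; on Python 3.10 or later, `statistics.correlation` would give the same r.

```python
from statistics import mean, pstdev


def z_scores(activity):
    """Standardize one run: z = (x - mean) / SD, taken across all clones in the run."""
    values = list(activity.values())
    mu, sd = mean(values), pstdev(values)
    return {clone: (x - mu) / sd for clone, x in activity.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def replicate_agreement(run1, run2):
    """Correlate the z scores of clones measured in both runs."""
    z1, z2 = z_scores(run1), z_scores(run2)
    shared = sorted(z1.keys() & z2.keys())
    return pearson_r([z1[c] for c in shared], [z2[c] for c in shared])


if __name__ == "__main__":
    # Hypothetical normalized luciferase activities for four clones in two runs.
    run1 = {"k1": 1.0, "k2": 1.2, "k3": 5.0, "k4": 0.6}
    run2 = {"k1": 0.9, "k2": 1.3, "k3": 4.5, "k4": 0.7}
    print(round(replicate_agreement(run1, run2), 3))
```

Standardizing each run separately removes run-to-run differences in overall signal, so the correlation reflects whether the same clones stand out in both experiments.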
We then selected the eight genes that showed the highest positive modulatory activity (ADK, ATR, MAPK1, MAP2K5, PFKM, PRKR, STK10, and STK22C), as well as four genes showing inhibitory activity (BLK, HRI, MAP3K7, and PIM1), for further analysis. All eight activating kinases significantly up-regulated survivin promoter activity in the confirmatory experiments, with 3- to 32-fold induction (Fig. 2C). Likewise, the four inhibitory kinases reproducibly down-regulated basal survivin promoter activity. The inhibition induced by these kinases was uniform and modest, limited to 0.6-fold (Fig. 2C). Importantly, consistent with the antiapoptotic role of survivin, expression of every activator kinase except PFKM protected cells from TNF-related apoptosis-inducing ligand (TRAIL)-induced apoptosis (data not shown). Consistent with our results, both MAP2K5 and MAPK1 have been found to inhibit TRAIL-induced apoptosis (31, 32), and MAP3K7 (TAK1) has been reported to induce apoptosis through JNK (33) and p38 activation (34). Our initial screen and follow-up experiments also identified genes, including, unexpectedly, the nucleoside kinase ADK, that clearly regulate the survivin promoter and protect cells from TRAIL-induced apoptosis and that had not been reported before.
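Selecting activators and inhibitors for follow-up amounts to ranking clones by the fold change of normalized reporter activity relative to a control. A minimal sketch, assuming an empty-vector control and illustrative cutoffs of 3-fold (up) and 0.7-fold (down); neither cutoff is stated as a selection rule in the text. Read as a fold change, a value of 0.6 means the promoter retains 60% of its control activity. The clone names and activity values are hypothetical.

```python
def fold_changes(activity, control):
    """Fold change of each clone's normalized reporter activity over the control value."""
    return {clone: value / control for clone, value in activity.items()}


def select_hits(folds, up=3.0, down=0.7, top_n=None):
    """Split clones into activators (fold >= up) and inhibitors (fold <= down).

    Activators are sorted strongest first, inhibitors most inhibitory first.
    The cutoffs are hypothetical, not the criteria used in the study.
    """
    activators = sorted((c for c, f in folds.items() if f >= up), key=lambda c: -folds[c])
    inhibitors = sorted((c for c, f in folds.items() if f <= down), key=lambda c: folds[c])
    if top_n is not None:
        activators = activators[:top_n]   # e.g. keep only the strongest activators
    return activators, inhibitors


if __name__ == "__main__":
    control = 2.0   # hypothetical empty-vector activity
    activity = {"kin_a": 64.0, "kin_b": 6.0, "kin_c": 2.2, "kin_d": 1.2}
    folds = fold_changes(activity, control)   # 32x, 3x, 1.1x, 0.6x
    print(select_hits(folds))                 # (['kin_a', 'kin_b'], ['kin_d'])
```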
In a second, independent cell-based screen, we used three different forms of the kinases (native, N-terminal EGFP-tagged, and C-terminal EGFP-tagged) and tested them for their ability to up-regulate the T cell factor (TCF)/lymphoid enhancing factor-binding region of the c-myc promoter (see Materials and Methods), a well characterized responsive element of the WNT signaling pathway (35). As shown in Fig. 3A, 5 of the 699 constructs tested (233 kinases in three different expression vectors) significantly up-regulated the normalized activity of the c-myc promoter-driven secreted alkaline phosphatase (SEAP) reporter. Of the genes behind these five strongest hits, only one was represented by more than one construct (CK1-); the other two genes uncovered in the screen (CK1- and PRKR) were each identified by only one of the three alleles generated for that gene. These results highlight the importance of generating and testing genes with alternative tags, and thus different alleles of each gene, to increase the chances of detecting a gene's involvement in a given pathway. To test whether the activity of these constructs was specific to the TCF/lymphoid enhancing factor responsive element, we compared their activity, and that of some negative controls, on the wild-type element and on a point-mutated, scrambled TCF responsive element, FOP (35). All hits except the C-terminal EGFP-tagged PRKR specifically activated the TCF responsive element (Fig. 3B). Our findings are consistent with reports that overexpression of CK1- or - is sufficient to activate WNT signaling, whereas CK1-, CK1, and CK2- could not induce any significant activation of a β-catenin reporter (36–38).
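The two follow-up steps in this screen, checking that a hit is specific to the intact TCF element rather than the mutated FOP element and asking how many of a gene's three alleles scored, can be sketched as below. The ratio cutoff, the data layout keyed by (gene, form), and the gene names are hypothetical; the study's actual specificity criterion is not stated here.

```python
from collections import defaultdict


def specific_hits(wild_type, fop, min_ratio=2.0):
    """Keep constructs whose reporter activity on the wild-type TCF element is at
    least min_ratio times their activity on the mutated FOP element.

    Both dicts are keyed by (gene, form); min_ratio is a hypothetical cutoff.
    """
    return {key for key, wt in wild_type.items() if wt / fop[key] >= min_ratio}


def alleles_per_gene(hits):
    """Group construct-level hits by gene, listing which tagged forms scored."""
    by_gene = defaultdict(set)
    for gene, form in hits:
        by_gene[gene].add(form)
    return dict(by_gene)


if __name__ == "__main__":
    # Hypothetical normalized SEAP activities for three constructs.
    wt = {("gene1", "native"): 10.0, ("gene1", "C-EGFP"): 8.0, ("gene2", "C-EGFP"): 6.0}
    fop = {("gene1", "native"): 1.0, ("gene1", "C-EGFP"): 2.0, ("gene2", "C-EGFP"): 5.0}
    hits = specific_hits(wt, fop)
    print(alleles_per_gene(hits))   # gene1 scores with two forms; gene2 fails the FOP check
```

Keeping hits at the construct level until the last step preserves the allele information the text emphasizes: a gene detected by only one of its three forms would be missed if only a single tagged version had been screened.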
Summary. Full exploitation of the human genome sequence requires state-of-the-art, comprehensive ORF repositories at both the gene-family and genome scales. Such repositories serve not only as validation tools but also as unique HT discovery tools, in the form of clones (for cell-based screens) and proteins (for biochemical and immunological assays). In this study, we have concentrated on defining the human kinase gene set and on the initial cloning and characterization of this important clone collection. HT expression of the kinase set in both in vivo and in vitro assays has validated the kinase constructs and demonstrated the value of this gene set for cell-based and biochemical screens. Some of the hits identified in the two cell-based validation screens are consistent with the published literature. Furthermore, some of the hits in the screen for regulators of the survivin promoter point to new regulators that also affect the apoptotic response of cells. Some of these genes encode sugar and nucleoside kinases and require further analysis.
Use of the clones in biochemical assays also allowed the identification of 21 kinase genes that possess autocatalytic activity when expressed in bacteria. These kinases could prove important for use in in vitro studies to define substrate specificities (by using peptide libraries) and to profile kinase inhibitors.
In addition to the wild-type kinase collection described here, we recognized the value of generating both a short-hairpin RNA (shRNA) library and a "dominant negative" library for the human kinase gene set. Accordingly, the information generated here contributed to the construction of the shRNA library described by Paddison et al. (43), in which 80% of the human kinase genes identified here are covered by multiple shRNA constructs. Another useful step would be to generate dominant negative (kinase-dead) alleles for the protein kinases whose wild-type alleles have been captured so far. Coordinated use of the wild-type, dominant negative, and shRNA collections will facilitate the identification and understanding of the roles of the human kinase genes in any phenotypic assay. Furthermore, constructing wild-type and alternatively tagged forms of the genes creates alleles with different activities, further increasing the chances of identifying hits in cell-based gene overexpression screens.
The kinase clones described here add to the growing number of recently generated human ORF and arrayed cDNA clone collections, which allow indexed experimentation at the gene-family or subgenome scale (38, 44, 45). Future challenges in this area are the completion of relevant gene-family ORF collections (13–15), including all genes in a given gene family and all alternatively spliced forms of every gene, and the production of clone isolates with full-length sequence verification (such as the clones in this study) for all predicted human ORFs.
Footnotes
Abbreviations: HT, high throughput; MGC, Mammalian Gene Collection; shRNA, short-hairpin RNA; TCF, T cell factor.
Data Deposition: The gene constructs reported in this paper have been deposited in the GenBank database (accession nos. AY335555–AY335786).
To whom correspondence should be addressed. E-mail: lbrizuela{at}hms.harvard.edu.
© 2005 by The National Academy of Sciences of the USA