Structure of a PE–PPE–EspG complex from Mycobacterium tuberculosis reveals molecular specificity of ESX protein secretion

Significance
Mycobacterium tuberculosis (Mtb) infects nearly a third of the global population, and understanding how Mtb establishes infection and evades host responses is key to development of improved therapies. Two mysterious protein families, called Pro-Glu motif–containing (PE) and Pro-Pro-Glu motif–containing (PPE) proteins, are highly expanded in Mtb and have been linked to virulence, but their function remains unknown. We have determined the crystal structure of a PE-PPE protein dimer bound to ESAT-6 secretion system (ESX) secretion-associated protein G (EspG), a component of the secretion system that translocates PE-PPE proteins to the bacterial cell surface. This structure reveals how each of the four EspGs in Mtb interacts with a different subset of the ∼100 PE and ∼70 PPE proteins, directing specific classes of PE-PPE “effector” proteins through separate secretory pathways.

Nearly 10% of the coding capacity of the Mycobacterium tuberculosis genome is devoted to two highly expanded and enigmatic protein families called PE and PPE, some of which are important virulence/immunogenicity factors and are secreted during infection via a unique alternative secretory system termed “type VII.” How PE-PPE proteins function during infection and how they are translocated to the bacterial surface through the five distinct type VII secretion systems [ESAT-6 secretion system (ESX)] of M. tuberculosis is poorly understood. Here, we report the crystal structure of a PE-PPE heterodimer bound to ESX secretion-associated protein G (EspG), which adopts a novel fold. This PE-PPE-EspG complex, along with structures of two additional EspGs, suggests that EspG acts as an adaptor that recognizes specific PE–PPE protein complexes via extensive interactions with PPE domains, and delivers them to ESX machinery for secretion. Surprisingly, secretion of most PE-PPE proteins in M. tuberculosis is likely mediated by EspG from the ESX-5 system, underscoring the importance of ESX-5 in mycobacterial pathogenesis. Moreover, our results indicate that PE-PPE domains function as cis-acting targeting sequences that are read out by EspGs, revealing the molecular specificity for secretion through distinct ESX pathways.

Tuberculosis is a major public health challenge, and new interventions are needed to control emerging, highly drug-resistant strains (1). Sequencing of the Mycobacterium tuberculosis (Mtb) genome revealed the presence of two mysterious, highly expanded protein families in pathogenic mycobacteria (2), named PE and PPE, due to the presence of N-terminal domains with conserved Pro-Glu (PE) and Pro-Pro-Glu (PPE) sequence motifs. The Mtb genome encodes ∼100 PE and ∼70 PPE genes, accounting for ∼10% of the genome's coding capacity (2), whereas nonpathogenic mycobacteria harbor relatively few PE and PPE genes. Both protein families are highly polymorphic, localize to the cell surface or are secreted (3,4), and are expressed during infection (5), leading to hypotheses that they are involved in virulence, antigenic diversity, or immune evasion (6). Although many PE-PPE proteins are recognized by the immune system during infection (7), it remains unclear whether they are involved in antigenic variation (8). Outside the N-terminal core PE or PPE domain, the C-terminal segments vary widely (2,9), sometimes encoding putative enzymatic domains or large peptide repeat arrays (several greater than 1,000 aa in length). PE and PPE genes often form operons, suggesting PE and PPE proteins interact with each other, and the crystal structure of a PE-PPE protein complex showed directly that they form a heterodimer (10). However, the PE and PPE domains have no detectable sequence or structural homology to other protein families, and their function remains unknown.
Intriguingly, secretion of PE and PPE proteins has been linked to a set of related, specialized protein export pathways of mycobacteria called the 6-kDa early secreted antigenic target (ESAT-6) secretion system (ESX) (11,12). "Type VII" secretion systems distantly related to mycobacterial ESX systems have been identified in numerous other Gram-positive bacteria (13-17). The protein components of type VII secretion systems differ considerably between bacterial species but generally include the following: (i) one or more small helical proteins of the WXG100 protein family (e.g., ESAT-6, YukE), (ii) an FtsK/SpoIII-type ATPase that is thought to drive protein secretion (e.g., EccC, YukB), and (iii) a multipass transmembrane protein that may form the pore of the translocon (e.g., EccD, YueB). Along with additional species/lineage-specific factors, such as the mycobacterial PE and PPE proteins, these components have been proposed to assemble in the plasma membrane, where they translocate specific protein substrates to the cell surface (or to the periplasmic space of mycobacteria) (figure 4B of ref. 18). The Mtb genome encodes five distinct but evolutionarily related type VII/ESX systems (ESX-1 to ESX-5) at different loci around the genome, and the primary attenuating mutation in the Mycobacterium bovis bacillus Calmette-Guérin vaccine strain is a deletion of a large segment of the ESX-1 locus (19). Each ESX system is thought to secrete a distinct complement of proteins, including the cognate ESAT-6 and 10-kDa culture filtrate protein (CFP-10) homologs encoded in each specific locus (19,20). All ESX gene clusters, with the exception of ESX-4, also encode at least one PE-PPE gene pair, and all of the PE-PPE proteins tested so far are secreted in an ESX-dependent manner (11,21), suggesting that PE-PPE secretion may be an important function of the ESX, although many PE-PPE proteins are encoded outside of ESX loci.
Recently, PPE proteins were reported to interact with ESX secretion-associated protein G (EspG) (22), another ESX-encoded protein of unknown structure and function. However, the molecular basis for this interaction and its functional importance in PPE secretion remain unclear.
Here, we report the crystal structures of three EspGs from Mtb and Mycobacterium smegmatis (Msmeg), and a ternary complex between an Mtb PE-PPE pair and its cognate EspG. These structures define the key elements of the PPE-EspG interaction and, coupled with bioinformatics and biochemical interaction studies, suggest that EspGs function as adaptors to deliver PE-PPE complexes to their cognate ESX system for translocation across the plasma membrane. Moreover, our work shows that the vast majority of PE-PPE proteins in Mtb interact with EspG from the ESX-5 secretion system, which has been hypothesized to play a major role in PE-PPE protein secretion (9,20,23).

Results and Discussion
Structure of the EspG Encoded in the ESX-3 Gene Cluster of Mtb Defines a Novel Protein Fold. To better understand the role of EspG in PE-PPE protein secretion, we determined the crystal structure of the EspG paralog from the ESX-3 system (EspG 3 ) of Mtb (EspG 3Mt ) at a resolution of 2.85 Å (SI Appendix, Table S1). Because no structural homologs of EspG could be identified based upon the primary sequence, crystals of Msmeg EspG 3 (EspG 3Ms ) derivatized with selenomethionine were phased by multiwavelength anomalous dispersion (MAD) (24) (SI Appendix, Fig. S1 and Table S2), and the resulting model was used to phase the EspG 3Mt dataset by molecular replacement. Here, we focus on the structure of EspG 3Mt and will return to discuss EspG 3Ms subsequently.
EspG 3Mt adopts a novel, mixed α/β-fold consisting of a 10-stranded β-sheet and eight α-helices (Fig. 1A). The large sheet is kinked in the middle of the eighth β-strand (β3′) by a nonclassical bulge due to a two-residue insertion, resulting in a slight bend of the sheet. The sheet is flanked on each end by a three-helix bundle (α1-α3 and α1′-α3′; Fig. 1A), and two additional helices (α4 and α4′) pack against the back face of the β-sheet. In contrast, a large portion of the opposite face of the sheet is solvent-exposed, forming the bottom of a shallow basin lined by nearby loops and helices. The C-terminal 18 residues, including several large hydrophobic residues, are largely disordered in the electron density map but may serve as an ideal docking site for EspG onto the larger ESX apparatus anchored in the plasma membrane.
Intriguingly, the overall EspG fold is approximately twofold symmetrical, with each half containing five strands and four helices ( Fig. 1 B and C). The two halves of EspG have the same topology and connectivity, despite little detectable sequence similarity between the two halves. Thus, the extant EspG fold may have arisen from the duplication and fusion of an ancestral subdomain that has been heavily diversified over the course of evolution. Indeed, whereas a search of the Dali Database (25) was unable to identify proteins with a structure similar to the complete EspG, searches using either of the two subdomains identified several distantly related proteins with a similar core topology, although they were generally missing one or more helices and had additional elements not present in EspG (Fig. 1D).
Although the two halves of EspG bear striking resemblance to one another, there are two main regions of divergence between the two subdomains. First, helix α2 from the N-terminal subdomain is rotated ∼70° from the position of helix α2′ in the C-terminal subdomain (Fig. 1C, Upper vs. Lower). In addition, the short linker connecting helices α1′ and α2′ in the C-terminal subdomain forms a much longer, V-shaped "tongue" in the N-terminal subdomain between α1 and α2, and wraps around α1 to interact with the convex surface of the β-sheet nearly 20 Å away. The second major difference between the subdomains lies in strands β2′ and β3′ of the C-terminal subdomain, which are much longer and distorted by the bulge in β3′ described above (Fig. 1C, Upper vs. Lower).

EspG Encoded in the ESX-5 Gene Cluster of Mtb Interacts Exclusively with the PPE Subunit of PE25-PPE41. EspG has been reported to interact with PPE proteins and is required for PE-PPE secretion (22,26). To understand how EspG recognizes specific PE-PPE complexes and mediates their secretion through the ESX, we determined the crystal structure of a ternary complex of the EspG encoded in the ESX-5 gene cluster of Mtb (EspG 5Mt ) bound to the PE25-PPE41 dimer at a resolution of 2.45 Å (SI Appendix, Fig. S2 and Table S1). In addition, we determined a higher resolution structure of the unbound PE25-PPE41 dimer at a resolution of 1.95 Å (SI Appendix, Table S1). Our unbound PE25-PPE41 dimer is very similar to the previously reported structure (10) (rmsd of ∼0.34 Å over 250 Cα atoms). The PE25-PPE41 dimer is entirely α-helical and forms an elongated, cigar-shaped rod ∼110 Å long and ∼20 Å thick (10) (Fig. 2A). The PPE subunit spans the full length of the molecule, whereas the smaller PE domain is more compact and binds only to one end of the PPE protein. EspG 5Mt binds to the opposite end of PPE41, distal from the PE25 binding site (Fig. 2 A and B), resulting in only minor structural changes in PE25-PPE41 upon binding (SI Appendix, Fig. S3). No direct contacts are made between EspG 5Mt and PE25, with ∼10 Å separating the closest pair of residues. It is noteworthy that the C termini of both the PE and PPE subunits also reside at the EspG-distal end of the PE25-PPE41 rod. Because many PE and PPE proteins encode large C-terminal domains of unknown function, binding of EspG far from their C termini may prevent steric clashes with these additional domains.
The EspG-PPE interface is expansive, including 42 residues on EspG and 29 on PPE protein, with a total of 3,500 Å 2 of solvent accessible surface buried upon binding (Fig. 2 C and D), which is nearly two- to threefold the buried surface area in a typical antibody/antigen interaction (27-32). Almost the entire EspG contact surface on PPE41 is derived from a single polypeptide segment (residues 116-155), which forms a helix-turn-helix motif at one end of the PE25-PPE41 rod (Fig. 2A). The tip of this helix-turn-helix is inserted into a shallow basin formed on one face of the EspG β-sheet (Fig. 2D), but the interface extends well outside this depression, with interactions running for eight helical turns along the α5 helix of PPE41. A small number of interactions are also made between EspG 5Mt and three N-terminal PPE41 residues (His2, Glu4, and Pro7), although these glancing contacts are predicted to make little contribution to specificity or the overall binding energy. On EspG 5Mt , four major elements are involved in PPE recognition (Fig. 2E): (i) the convex surface of the β-sheet; (ii) helix α1′ from the C-terminal EspG subdomain; (iii) an open, extended loop between strands β2 and β3; and (iv) the long, "tongue-shaped" α1-α2 loop. The α1-α2 loop alone accounts for ∼40% of the interactions with PPE protein (Fig. 2F).
PPE Binding Surface on EspG Differs Between Paralogs. The Mtb genome encodes four EspG paralogs, each sharing less than ∼25% pairwise sequence identity (SI Appendix, Fig. S4), and nearly 70 highly variable PPE proteins. The crystal structure of the PE-PPE-EspG complex defines the interacting surfaces between EspG and PPE proteins, allowing for an analysis of how sequence and structural variation in these regions may be involved in determining the specificity of EspG-PPE interactions. Comparison of EspG 3Mt and EspG 5Mt provides clues into how specificity may be mediated. Additionally, we solved a crystal structure of EspG 3Ms at a resolution of 2.80 Å ( Fig. 3A and SI Appendix, Tables S1 and S2).
However, structural deviations in two key regions significantly alter the overall shape of the PPE binding surface and may have important implications for PPE binding specificity. First, a number of residues that interact with PPE protein reside in loops of variable length and structure (SI Appendix, Fig. S4), such as the β2-β3 loop, which ranges in length from approximately seven residues in EspG 3Ms and homology models of EspG 2Mt to 12 residues in EspG 3Mt and 26 residues in EspG 5Mt (Fig. 3C). Second, the α2 helix and the α1-α2 loop differ considerably between EspG 5Mt and EspG 3Mt (Fig. 3D). In EspG 3Mt and also EspG 3Ms , α2 is approximately five turns long and would clash with a PPE protein bound in the orientation observed in the EspG 5Mt -PE-PPE complex. In contrast, the α2 helix in EspG 5Mt is broken after ∼2.5 turns by a Pro at position 46 (Pro46), which is strictly conserved among mycobacterial EspG 5 s (Fig. 3D and SI Appendix, Table S3). The Pro46 kink reconfigures the adjacent, long α1-α2 tongue from an open V-shape in the EspG 3 structures to a more closed U-shape in EspG 5Mt . The resulting EspG 5Mt tongue cradles PPE41's α5 helix over six helical turns (Fig. 2F), accounting for over 40% of the PPE-EspG interface. The vast majority of EspG 1 and EspG 3 sequences lack a proline in the vicinity of position 46, and consequently may form longer α2 helices with an α1-α2 loop structure that is more similar to the α1-α2 loop structure in EspG 3Mt and EspG 3Ms (SI Appendix, Table S3). In contrast, 100% of EspG 2 sequences analyzed have Pro46, and the α2 helix and α1-α2 loop of EspG 2 may consequently adopt a conformation similar to the conformation of EspG 5Mt . Consistent with this prediction, the sequence of the α1-α2 loop tongue from EspG 2Mt is similar to that of EspG 5Mt , particularly at positions that contact PPE protein in the PE25-PPE41-EspG 5Mt crystal structure (SI Appendix, Fig. S5), whereas the EspG 3 tongue sequence is divergent. 
Taken together, variation in the PPE binding surface on EspG suggests that each EspG paralog may interact with distinct subsets of the PE-PPE repertoire.
EspG-Interacting Surface on PPE Proteins Is Conserved. In contrast to the variability of the EspGs, sequence analysis of the ∼70 PPE proteins from Mtb Erdman reveals that the core of the EspG-interacting surface on PPE protein is surprisingly well-conserved, particularly in light of the high degree of sequence diversity among PPE proteins (mean pairwise identity of ∼47%) (SI Appendix, Fig. S6). Twenty-nine PPE residues make at least one van der Waals contact with EspG, but many of these residues make just a few interactions per residue, appear somewhat flexible, and are loosely packed at the interface. Consequently, many of these residues probably make only small individual contributions to the binding energy, although in aggregate, they may have a modest effect on binding.
Fig. 2. [...] EspG 5Mt bound. The EspG 5Mt contacts on PPE41 are colored red, with key residues indicated. (C) Detailed view of the EspG 5Mt contact surface on PPE41 (red), with key interacting residues labeled. The structure is rotated 90° with respect to the representation in B. (D) Contact surface between PPE41 and EspG 5Mt , oriented to show the shallow basin on EspG 5Mt (white surface with contact residues in purple), which binds PPE41 (red). (E) Four distinct elements on EspG 5Mt that make up the PPE binding surface. EspG 5Mt is oriented similarly in D and E. (F) Interaction between the long tongue of EspG 5Mt (peach) and the α5 helix from PPE41 (red), which accounts for nearly half of the EspG 5Mt -PPE41 interface.
Three contact residues are well-conserved across Mtb PPE proteins, despite making relatively minor interactions with EspG. However, these residues play critical roles in stabilizing the helix-turn-helix motif of the PPE tip, suggesting that the EspG-binding surface of most PPE proteins adopts a similar conformation. First, the Asn122 side chain caps the C-terminal end of PPE helix α4, and ∼91% of Mtb PPE proteins have a helix-capping residue at this position (SI Appendix, Fig. S7A). Second, the Asn123 side chain reaches across the helix-turn-helix motif and forms a hydrogen bond with the Phe128 backbone, stabilizing the conformation of the connecting loop (SI Appendix, Fig. S7B). Asn123 is highly conserved, with ∼97% of Mtb PPE proteins carrying an Asn residue at this position. Third, residue Gly126 lies at the tip of the helix-turn-helix motif and adopts a positive phi angle (SI Appendix, Fig. S7B). This region of the Ramachandran plot gives strong preference to Gly, and Gly126 is almost universally conserved (64 of 65 Mtb PPE sequences). The strong conservation of these three residues suggests that the overall conformation of the helix-turn-helix motif is similar in most Mtb PPE proteins, allowing us to model the interaction between various PPE-EspG pairs and make predictions about affinity and specificity, which we subsequently tested.
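Per-position conservation values such as those quoted above (for example, an Asn at position 123 in ∼97% of sequences) are column frequencies in a multiple sequence alignment. The following Python sketch illustrates that calculation on a small, made-up alignment. The sequences, the function name `column_conservation`, and the column indices are hypothetical placeholders chosen for illustration; they are not data or code from this study.

```python
from collections import Counter


def column_conservation(aligned_seqs, column):
    """Return (most common residue, fraction) for one alignment column.

    Gap characters ('-') are ignored when computing the fraction.
    """
    residues = [seq[column] for seq in aligned_seqs if seq[column] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Hypothetical toy alignment (NOT real PPE sequences); all rows have equal length.
toy_alignment = [
    "ANALG",
    "SNVFG",
    "ANLLG",
    "TN-FW",
]

# Column indices are 0-based positions in the toy alignment, not PPE41 numbering.
for col in (1, 4):
    residue, fraction = column_conservation(toy_alignment, col)
    print(f"column {col}: {residue} in {fraction:.0%} of sequences")
```

In a real analysis, alignment columns would first need to be mapped onto PPE41 residue numbering, and the set of residues counted as equivalent (for instance, which side chains qualify as helix-capping) would have to be defined explicitly.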
After excluding residues that make no side-chain interactions (i.e., with little sequence constraint at that position) or relatively low-quality contacts, 10 PPE residues remain that likely have the greatest impact on binding affinity and specificity (Fig. 2C). Many of these residues make buried hydrophobic interactions with EspG, such as Ala124, Leu125, Trp143, and Gly147. Trp143, in particular, is well-packed at the interface, making a total of ∼22 van der Waals contacts and a cation-pi interaction with Arg104 from EspG. In contrast, Ala124, Leu125, and Gly147 are each "underpacked" (SI Appendix, Fig. S8), potentially leaving space for the accommodation of larger hydrophobic side chains at these positions. Indeed, PPE41 has a smaller residue at each of these positions compared with other PPE proteins: Ala124 instead of Val, Leu, Phe, or Trp; Leu125 instead of Phe; and Gly147 instead of Ala or Val. Several other positions make key interactions or may be otherwise conserved due to constraints imposed by the binding interface (details are provided in SI Appendix, Fig. S9). Thus, the EspG-interacting surface on PPE41 is moderately conserved, and most of the observed variation is predicted to have only a modest impact on binding to EspG 5Mt .
Most PPE Proteins in Mtb Bind to EspG 5 . Variation among EspG paralogs from Mtb suggests that each EspG will interact with distinct subsets of the PPE repertoire, yet the general conservation of the EspG binding surface on PPE protein suggests that most of the PPE proteins in Mtb will interact with a single EspG. Based upon the structure of the EspG 5Mt -PE25-PPE41 complex and conservation of the PPE contact surface between PPE41 and most other PPE proteins, we predict that most PPE proteins in Mtb will interact with EspG 5 , consistent with the immunological results of Sayes et al. (7). However, the degree of sequence variation on both sides of the interface makes it difficult to rule out other models, such as each EspG interacting with numerous PPE proteins and facilitating PPE secretion from each ESX cluster. To explore the specificity of the PPE-EspG interaction further, we selected a panel of eight diverse PPE proteins from the Mtb genome for additional study (SI Appendix, Fig. S10A). This panel included six proteins encoded outside of ESX clusters (PPE14, PPE17, PPE20, PPE36, PPE41, and PPE60) and two proteins encoded within the ESX-1 and ESX-2 loci (PPE68 ESX-1 and PPE69 ESX-2 ). Despite moderate variation in the EspG-binding surface of these PPE proteins, all but one were predicted to bind EspG 5 . The remaining sequence, PPE69 ESX-2 , is a clear outlier among all of the PPE sequences in Mtb, with several polymorphisms predicted to disrupt binding to EspG 5 because it (i) lacks a helix-capping residue at position 122, (ii) has a Gly126Trp  substitution that is too bulky to be accommodated at the interface and unable to adopt a positive phi angle, and (iii) has a 2-aa insertion in the vicinity of the 122-126 loop.
Assaying PPE-EspG binding is complicated by the fact that PPE proteins are predicted to interact with specific PE partners and function as obligate heterodimers (33); yet, aside from PE25-PPE41, very few pairs of interacting PE and PPE proteins have been identified (10,34). Indeed, recombinant expression of PPE proteins alone typically yields nonnative, aggregated, or insoluble material (10). To overcome these technical challenges, we constructed chimeric PPE41s by replacing the key EspG-interacting region of PPE41 with the corresponding sequence from another PPE protein of interest (Fig. 4A). The resulting chimeras are referred to as "PPE14-like," "PPE17-like," etc. (SI Appendix, Fig. S10B). The chimeric PPE proteins were coexpressed with PE25 to produce soluble PE-PPE heterodimers. Using this approach, we produced sufficient quantities of seven of the eight PPE proteins selected for the panel, and only the PPE36 chimera failed to express well. Binding between EspG and PPE proteins was assayed by biolayer interferometry. Binding of the PE25-PPE41 heterodimer to EspG 5Mt was robust, yielding a dissociation constant (K d ) of 1.3 nM (Fig. 4B). In contrast, binding of PE25-PPE41 to EspG 3Mt was undetectable (Fig. 4C and SI Appendix, Fig. S11). Remarkably, five of the six soluble PPE chimeras also bound specifically to EspG 5Mt with high affinity, including the PPE14-like, PPE17-like, PPE20-like, PPE60-like, and PPE68-like chimeras (Fig. 4C and SI Appendix, Fig. S11). In contrast, the remaining PPE69-like chimera failed to bind EspG 5Mt , consistent with its unusual sequence features in the EspG binding region. Instead, the PPE69-like chimera bound specifically to EspG 3Mt (Fig. 4C and SI Appendix, Fig. S11).
Taken together, our structural, bioinformatic, and biochemical data support a model wherein EspGs specifically recognize PPE domains and the vast majority of PPE proteins in Mtb interact with EspG 5Mt , whereas a small number of more specialized proteins interact with the remaining EspGs.
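To put the measured affinity in context, a simple 1:1 binding model relates the equilibrium dissociation constant to the kinetic rate constants as K d = k off / k on, and gives the fraction of receptor occupied at free ligand concentration [L] as [L] / ([L] + K d ). The Python sketch below is illustrative only. It assumes a 1:1 model with ligand in excess; the function names and the example concentrations are hypothetical, and only the 1.3 nM K d is taken from the text. It is not the fitting procedure applied to the biolayer interferometry data.

```python
def dissociation_constant(k_off, k_on):
    """K_d (M) from the off-rate k_off (1/s) and the on-rate k_on (1/(M*s))."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def fraction_bound(ligand_conc, k_d):
    """Equilibrium fractional occupancy for 1:1 binding with ligand in excess.

    ligand_conc and k_d must be expressed in the same concentration units.
    """
    if ligand_conc < 0 or k_d <= 0:
        raise ValueError("ligand_conc must be >= 0 and k_d must be > 0")
    return ligand_conc / (ligand_conc + k_d)


KD_NM = 1.3  # K_d reported in the text for PE25-PPE41 binding to EspG 5Mt, in nM

# Hypothetical free PE-PPE concentrations (nM), chosen only to span the K_d.
for conc_nm in (0.13, 1.3, 13.0, 130.0):
    print(f"{conc_nm:7.2f} nM -> {fraction_bound(conc_nm, KD_NM):.2f} of EspG bound")
```

Under these assumptions, occupancy is half-maximal when the ligand concentration equals K d and approaches saturation at concentrations well above it.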
Model for EccA-Mediated Dissociation of the EspG-PE-PPE Complex.
In contrast to PE and PPE proteins, which are translocated through the ESX, EspG is not secreted (22). In light of the nanomolar K d between PPE protein and EspG, this observation raises questions about how the PE-PPE dimer is released from the complex for secretion. The length and conformation of the β2-β3 loop is arguably the most distinctive feature of EspG 5Mt compared with the shorter loops in the EspG 3 structures (Fig.  3C). However, despite the large size and striking conformation of the β2-β3 loop, it makes only a moderate number of contacts with PPE protein and much of the loop remains solvent-exposed. Intriguingly, this loop lies in close proximity to the highly conserved Pro-Pro-Glu motif (Fig. 5 A-C) that gave rise to the "PPE" protein family name (2), but it makes only glancing contacts with Pro7 and is unlikely to explain its broad conservation. The PPE surface surrounding the Pro-Pro-Glu motif is well-conserved (Fig. 5 A-C), encompassing ∼500 Å 2 of solvent accessible surface area and including residues 6, 10, 109, 113, 139, and 143. This conserved patch is part of a larger, generally hydrophobic surface (Fig. 5 A-C), consistent with the hypothesis that this region may be important for protein-protein interactions. Thus, along with the adjacent β2-β3 loop from EspG, the conserved PPE surface is poised to act as a docking site for other ESX components, such as ESX conserved component A (EccA). EccA is an ATPase that may be involved in dissociating the EspG-PPE interaction because (i) EccA proteins are only encoded in ESX gene clusters that also encode PE and PPE proteins, (ii) EccA mutants accumulate PPE-bound EspG (22), and (iii) EccA interacts with both PPE protein (35) and EspG (SI Appendix, Fig. S12) in yeast two-hybrid assays. Because the β2-β3 loop is one of the most variable regions of EspG, this loop may potentially allow for specific recruitment of the cognate EccA paralog (e.g., recruitment of EccA 5Mt by EspG 5Mt , etc.). 
Binding across the interface near the conserved site on PPE protein and the β2-β3 loop on EspG might allow EccA to dissociate the EspG-PE-PPE complex, passing the PE-PPE proteins off to the rest of the ESX system for secretion and recycling EspG to recruit additional PE-PPE proteins from the cytoplasm (Fig. 5D). Because EccA is an active ATPase (36), ATP hydrolysis is likely involved in the dissociation of the EspG-PE-PPE complex, although future studies will be required to understand this process better, as well as the relative contributions of the ATPase and tetratricopeptide repeat (TPR) (37) domains.
Model of PE-PPE Secretion by ESX1-ESX5 in Mtb. Of the five ESX clusters in the Mtb genome, ESX-1, ESX-2, ESX-3, and ESX-5 encode EspG, PE, and PPE paralogs. In addition, numerous other PE and PPE genes are scattered throughout the genome. Because each cluster is thought to encode a complete, independent ESX system, it is generally assumed that PE, PPE, and EspG proteins from a given cluster will interact specifically with each other, and only associate with other components from the same ESX cluster (Fig. 6). However, our results suggest that some cross-talk may occur between clusters. For example, PPE68 is encoded within the ESX-1 locus in Mtb and Mycobacterium marinum, and it was previously shown to interact with EspG 1 from the same cluster (22). However, the PPE68-like chimera also binds to EspG 5Mt with high affinity (Figs. 4C and 6). Similarly, PPE69 is encoded within the ESX-2 cluster and presumably interacts with EspG 2Mt ; however, in addition, our PPE69-like chimera binds EspG 3Mt (Fig.  4C). Although binding of the ESX-3-encoded PPE protein to EspG 3Mt has not been directly demonstrated, the Msmeg ESX-3 orthologs form a stable complex on gel filtration (SI Appendix, Fig.  S13), and the contact surfaces on PPE protein and EspG 3 are highly conserved between Msmeg and Mtb.
Taken together, our data support a model consisting of two classes of PPE-EspG interactions (Fig. 6). First, the PE, PPE, and EspG proteins from each ESX cluster interact with one another, accounting for a relatively small segment of the overall PPE repertoire. This intracluster association probably resembles the situation in nonpathogenic mycobacteria, such as Msmeg, where the ESX-1 and ESX-3 clusters each encode a single PE, PPE, and EspG protein and no other PE or PPE genes are found elsewhere in the genome. In some cases, EspG and PPE proteins may cross-react with components from another cluster, although it remains unclear if this cross-reactivity is functionally relevant or merely arises from incomplete orthogonalization of the different ESX clusters. Second, the non-ESX-encoded PPE proteins interact with EspG 5 , which includes the vast majority of the PPE sequences (Fig. 6). The dramatic expansion and diversification of the PE and PPE protein families, including the distribution of most of these genes throughout the genome and outside ESX clusters, appears to coincide with the acquisition of ESX-5 in pathogenic mycobacteria, which is consistent with our finding that most PPE proteins interact with EspG 5 and are likely secreted in an ESX-5-dependent manner (23).
Fig. 6. Model for the network of interactions between the PPE protein and EspG families in Mtb. PPE proteins encoded within a given ESX cluster generally interact with the EspG from the same cluster (blue box, black lines) and are secreted through the cognate ESX. However, some cluster-encoded PPE proteins can cross-react with EspGs from other clusters, at least in vitro. In contrast, non-ESX-encoded PPE proteins, which account for the majority of PPE genes in Mtb, interact preferentially with EspG 5Mt (pink box, black lines). In all, ∼95% of all PPE proteins from Mtb are predicted to interact with EspG 5Mt , likely leading to their secretion to the cell surface through the ESX-5 secretion system.
Although the interaction with EspG sheds light on how PE and PPE proteins interface with ESXs, their molecular functions remain to be defined. One intriguing possibility is that the PE and PPE domains serve as targeting modules, directing any cargo domains attached to their C termini to the ESX for translocation and anchoring to the cell surface (38), a process that is important for modulating interactions with the host during infection (39). In this model, EspG would play a role akin to the signal recognition particle in the conventional secretory pathway, by binding to nascent PE-PPE complexes and directing them to ESX machinery in the plasma membrane for further processing. Consistent with this mechanism, EspG binds ∼100 Å away from the PE and PPE C termini, thereby avoiding steric clashes or specific interactions with the putative cargo domains. However, it is also possible that the PE-PPE subunit may have additional functions apart from directing proteins to the ESX, and further studies are required to explore the relative importance of the PE-PPE vs. C-terminal "cargo" domains in Mtb virulence.

Materials and Methods
Recombinant proteins were expressed in Escherichia coli and purified by standard procedures. All diffraction data were collected at the Advanced Light Source beamline 8.3.1, and structures were solved by MAD or molecular replacement. EspG and PE-PPE binding and K d values were measured using an Octet RED instrument (ForteBio). Further detailed information is provided in SI Appendix, SI Materials and Methods.