Research Article

Regulation of genetic flux between bacteria by restriction–modification systems

Pedro H. Oliveira, Marie Touchon, and Eduardo P. C. Rocha
PNAS May 17, 2016 113 (20) 5658-5663; first published May 2, 2016; https://doi.org/10.1073/pnas.1603257113
Pedro H. Oliveira (a, b), Marie Touchon (a, b), and Eduardo P. C. Rocha (a, b)
aMicrobial Evolutionary Genomics, Institut Pasteur, 75015 Paris, France;
bCNRS, UMR 3525, 75015 Paris, France
For correspondence (Pedro H. Oliveira): pcphco@gmail.com
Edited by W. Ford Doolittle, Dalhousie University, Halifax, NS, Canada, and approved April 5, 2016 (received for review March 2, 2016)


Significance

The role of restriction–modification (R-M) as bacteria's innate immune system, and as a barrier to sexual exchange, has often been challenged. Recent work has suggested that the diversification of these systems might have driven the evolution of highly virulent bacterial lineages. Here, we show that R-M systems are more abundant in species enduring more DNA exchanges and that the within-species flux of genetic material is higher when cognate systems are present. Presumably, bacteria enduring frequent infections by mobile elements select for the presence of more numerous R-M systems, but rapid diversification of R-M systems leads to varying patterns of sexual exchanges between bacterial lineages.

Abstract

Restriction–modification (R-M) systems are often regarded as bacteria's innate immune systems, protecting cells from infection by mobile genetic elements (MGEs). Their diversification has been recently associated with the emergence of particularly virulent lineages. However, we have previously found more R-M systems in genomes carrying more MGEs. Furthermore, it has been suggested that R-M systems might favor genetic transfer by producing recombinogenic double-stranded DNA ends. To test whether R-M systems favor or disfavor genetic exchanges, we analyzed their frequency with respect to the inferred events of homologous recombination and horizontal gene transfer within 79 bacterial species. Genetic exchanges were more frequent in bacteria with larger genomes and in those encoding more R-M systems. We created a recognition target motif predictor for Type II R-M systems that identifies genomes encoding systems with similar restriction sites. We found more genetic exchanges between these genomes, independently of their evolutionary distance. Our results reconcile previous studies by showing that R-M systems are more abundant in promiscuous species, wherein they establish preferential paths of genetic exchange within and between lineages with cognate R-M systems. Because the repertoire and/or specificity of R-M systems in bacterial lineages vary quickly, the preferential fluxes of genetic transfer within species are expected to constantly change, producing time-dependent networks of gene transfer.

  • homologous recombination
  • horizontal gene transfer
  • bacterial evolution

Prokaryotes evolve rapidly by acquiring genetic information from other individuals, often through the action of mobile genetic elements (MGEs) such as plasmids or phages (1). In bacterial population genetics, the events of gene transfer are usually termed horizontal gene transfer (HGT) when they result in the acquisition of new genes and homologous recombination (HR) when they result in allelic replacements. The distinction between the two evolutionary mechanisms (HGT and HR) is not always straightforward: incoming DNA may integrate into the host genome by double crossovers at homologous regions, leading to allelic replacements in these regions and to the acquisition of novel genes in the intervening ones. HR takes place only between highly similar sequences, typically within species (2). As a result, it usually involves the exchange of few polymorphisms, possibly in multiple regions, between cells (3). It may also result in no change if the recombining sequences are identical, which leaves no traces and cannot be detected by sequence analysis. HGT may occur between distant species, resulting in the acquisition of many genes in a single event. The replication and maintenance of MGEs have fitness costs to the bacterial host and have led to the evolution of cellular defense systems. These systems can sometimes be counteracted by MGEs, leading to evolutionary arms races.

Restriction–modification (R-M) systems are some of the best known and the most widespread bacterial defense systems (4). They encode a methyltransferase (MTase) function that modifies DNA at specific target recognition sites and a restriction endonuclease (REase) function that cleaves these sites when they are unmethylated (5). R-M systems are traditionally classified into three main types. Type II systems are by far the most abundant and the best studied (6). With the exception of subtype IIC, they comprise MTase and REase functions that are encoded by separate genes and can operate independently of each other. R-M systems severely diminish the infection rate by MGEs and have been traditionally seen as bacteria's innate immune systems (7). However, successful infection of a few cells generates methylated MGEs immune to restriction that can invade the bacterial population (8). Hence, R-M systems are effective as defense systems for short periods of time and especially when they are diverse across a population (9, 10). In particular, it has been suggested that they might facilitate colonization of new niches (11). Type II R-M systems are also addictive modules that can propagate selfishly in populations (12). Both roles of R-M systems, as defense or selfish systems, may explain why they are very diverse within species (13, 14). Accordingly, R-M systems endure selection for diversification and are rapidly replaced (15, 16).

Several recent large-scale studies of population genomics have observed more frequent HR within than between lineages (17, 18). This suggests that HR might favor the generation of cohesive population structures within bacterial species (19). Specific lineages of important pathogens that have recently changed their R-M repertoires show higher sexual isolation, such as Neisseria meningitidis, Streptococcus pneumoniae, Burkholderia pseudomallei, and Staphylococcus aureus (20–22). For example, a Type I R-M system decreased transfer to and from a major methicillin-resistant S. aureus lineage (23). Diversification of R-M target recognition sites could thus reduce transfer between lineages with different systems while establishing preferential gene fluxes between those with R-M systems recognizing the same target motifs (cognate R-M). However, these results can be confounded by evolutionary distance: closely related genomes are more likely to encode similar R-M systems, inhabit the same environments (facilitating transfer between cells), and have similar sequences (that recombine at higher rates). The advantages conferred by new genes might be higher when transfer takes place between more similar genetic backgrounds.

Here, we aimed at testing the effect of R-M systems on the genetic flux in bacterial populations. We concentrated on Type II R-M systems because they are the best studied, very frequent, and those for which we could predict sequence specificity. We inferred genome-wide counts of HR and HGT and tested their association with the frequency of R-M systems encoded in the genomes. We then made a more precise test of the key hypothesis that bacteria carrying similar R-M systems establish highways of gene transfer, independently of phylogenetic proximity and clade-specific traits.

Results

Quantification of Homologous Recombination, HGT, and Their Covariates.

We analyzed a dataset of 79 core genomes and pangenomes (SI Methods) corresponding to a total of 884 complete genomes. These clades were based on taxonomy, i.e., the genomes of a named species were put together. They spanned many different bacterial phyla (Fig. 1A and SI Methods). The pangenomes varied between 466 and 18,302 gene families (Dataset S1), and correlated with genome size (Spearman’s ρ = 0.89, P < 10⁻⁴) and phylogenetic depth, defined as the average root-to-tip distances in the clade phylogenetic tree (SI Methods and Dataset S2) (Spearman’s ρ = 0.42, P < 10⁻⁴). Hence, our dataset represents a large diversity of bacteria in terms of taxonomy, genome size, and intraspecies diversity.

Fig. 1.

Analysis of HR and HGT events. (A) 16S rRNA phylogenetic tree of the 79 bacterial species. The tree was drawn using the iTOL server (itol.embl.de/index.shtml) (40). The innermost circle layer indicates the species and associated clade. The six subsequent layers correspond (moving outward) to the average number of HGT events per genome computed using Count; the number of recombined genes per genome given by NSS, MaxChi, and PHI; and the number of recombination events per genome given by Geneconv and CFML (outermost layer), respectively. These values are given in Dataset S1. (B) Distribution of the average number of horizontal gene transfer (HGT) events and homologous recombination (HR) events (inferred by Geneconv) per clade according to genome size (GS). Spearman’s ρHGT = 0.65, PHGT < 10⁻⁴; Spearman’s ρGeneconv = 0.32, PGeneconv < 10⁻². Data obtained with the remaining recombination inference tools are shown in Fig. S1.

HR is notoriously difficult to quantify accurately (24). We used five different programs to detect HR in the core genome (SI Methods). These programs detect different types of signals, and together they should provide a thorough assessment of HR. Among the 79 core genomes, we found an average of 329 (NSS), 374 (MaxChi), 264 (PHI), 504 (Geneconv), and 1,035 (ClonalFrameML, CFML) HR events per core genome (Datasets S1 and S3). Even if the different methods provided different numbers of events, their results were highly correlated (average Spearman’s ρ = 0.84, all comparisons P < 10⁻⁴). Accordingly, we focused our analysis on the results of Geneconv, which provides the positions of recombination tracts and directions of transfer necessary for the last part of this study.
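As an illustration of how agreement between two HR inference tools can be quantified, the following minimal sketch computes Spearman's ρ as the Pearson correlation of average ranks. The per-clade counts are invented placeholder values, not data from Dataset S1, and the function names are ours rather than part of any tool used in the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical HR counts for five clades from two inference tools.
geneconv_counts = [12, 150, 40, 900, 300]
phi_counts = [5, 30, 60, 400, 210]
print(round(spearman_rho(geneconv_counts, phi_counts), 2))
```

Working on ranks rather than raw counts makes the comparison insensitive to the different scales on which the tools report events.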

We used Count (25) to infer the events of HGT from the patterns of presence and absence of gene families in the species’ trees (SI Methods and Dataset S4). We identified 236,894 events of gene transfer in the 79 pangenomes (Dataset S1). These events were very unevenly distributed among clades, from close to none in the genomes of obligate endosymbionts to 1,538 events per genome in Rhodopseudomonas palustris (Fig. 1A).

The frequencies of HR and HGT were expected to depend on a number of variables, including the following: (i) genome size; (ii) phylogenetic depth (deeper lineages accumulate more events of exchange); and (iii) the number of genomes in the clade (larger samples capture more past events). We built stepwise linear models to assess the role of these variables in explaining the variance in HGT and HR (Table S1, part A). These showed that genome size had a strong direct effect on HR and HGT (Fig. 1B and Fig. S1). The remaining variables had significant, but less important, explanatory roles. HR also depended weakly on core genome size (Table S1, part B). Hence, studying the effect of R-M systems on HR and HGT requires control for phylogenetic depth, the number of genomes in the clade, and especially the genome size.

Table S1.

Most informative attributes affecting horizontal gene transfer (HGT) and homologous recombination (HR) events

Fig. S1.

Association between genetic flux and genome size. Distribution of the average homologous recombination (HR) events per clade computed using NSS (A), MaxChi (B), PHI (C), and CFML (D) as a function of genome size (GS, given in megabases). As with Geneconv (Fig. 1B), we observe positive associations between HR + 1 and GS (Spearman's ρNSS = 0.42, PNSS = 10⁻⁴; Spearman's ρMaxChi = 0.48, PMaxChi < 10⁻⁴; Spearman's ρPHI = 0.40, PPHI < 10⁻³; Spearman's ρCFML = 0.48, PCFML < 10⁻⁴).

Association Between R-M Systems and Genetic Transfer.

We identified 1,352 R-M systems among the 79 clades using a previously published methodology (4) (SI Methods and Dataset S1), including 233 Type II R-M systems (excluding Type IIC). The number of HGT events was higher in genomes with more R-M systems (Fig. 2A), and especially in those with Type II systems (Fig. 2B). The number of HR events increased with the number of R-M systems (Fig. 2C) and especially in the presence of Type II R-M systems (Fig. 2D). Similar results were obtained for the remaining HR inference tools (Fig. S2).

Fig. 2.

Association between gene transfer and R-M systems. Distribution of the average HGT events (A) and homologous recombination (HR) events inferred by Geneconv (C) per clade according to the total number of R-M systems. Spearman's ρHGT = 0.43, Spearman's ρGeneconv = 0.62; both P < 10⁻⁴. Distribution of the average HGT (B) and Geneconv HR events (D) per clade according to the presence (Yes)/absence (No) of Type II R-M systems (both P < 10⁻⁴; Mann–Whitney–Wilcoxon test). We obtained qualitatively similar results with the remaining recombination inference tools (Fig. S2).

Fig. S2.

Association between gene transfer and R-M systems. Distribution of the average HR per clade computed using NSS (A), MaxChi (B), PHI (C), and CFML (D) as a function of the total number of R-M systems. Positive associations were observed in all cases (Spearman's ρNSS = 0.50, Spearman's ρMaxChi = 0.55, Spearman's ρPHI = 0.53, Spearman's ρCFML = 0.60; all P < 10⁻⁴). Also shown are the average HR per clade computed using NSS (E), MaxChi (F), PHI (G), and CFML (H) as a function of the presence (Yes) or absence (No) of Type II R-M systems (all P < 10⁻⁴; Mann–Whitney–Wilcoxon test).

We then tested the effect of R-M systems on the number of HGT events and the rates of HR, while controlling for their covariates mentioned above. A stepwise regression showed that the numbers of Type II R-M systems were not significant predictors of HGT when the three previous variables were already introduced in the regression (the latter explaining ∼76% of all variance; Table S1, part C). An analogous analysis for the frequency of HR showed that genome size and the number of Type II R-M systems were both significant predictors of HR (R² = 0.42, both variables P < 10⁻⁴; Table S1, part C). These results show that genomes carrying more R-M systems acquire more genetic material by both HR and HGT, even if the latter association might be the result of clade-specific traits such as genome size.

Evolution of Target Motifs and Identification of Cognate R-M Systems.

To test the hypothesis that R-M systems affect the genetic flux between genomes, one needs to identify the systems recognizing the same target recognition motif. Such systems are cognates, i.e., DNA methylation by one system will protect from the other. We could not find a published method to identify cognate R-M systems. Hence, we created one based on the sequence conservation of MTases and REases. For this, we used the “gold-standard” component of REBASE (26) and plotted the frequency with which MTases or REases of a given type recognized the same motif (SI Methods) for a given bin of sequence similarity. Only nearly identical homologs of Types I and III MTases and REases recognized the same motifs (Fig. 3 A and B). The analysis of the Specificity (S) and target recognition domains (TRDs) led to similar conclusions (SI Methods and Fig. S3A). The small number of such systems in the REBASE gold standard resulted in low statistical power for this analysis, but adding more recent data from the REBASE PacBio database did not change these conclusions (SI Methods and Fig. S3 B–E). The rapid evolution of sequence target specificity precludes the identification of systems with similar restriction sites from the alignment of REases or MTases in both Type I and Type III R-M systems.

Fig. 3.

Relation between target specificity and protein similarity in R-M components. Percentage of equal target motifs recognized by Types I, II, and III MTases (A) and REases (B) according to their pairwise protein sequence similarity. (C) Plot of all pairwise similarities of Type II MTases versus the cognate Type II REases of the REBASE gold standard. Blue dots correspond to equal target motifs, red dots to unequal target motifs, and green dots to nested motifs. The dashed horizontal and vertical lines indicate the threshold similarity limits for MTases and REases. (D) The same dataset was used to plot the corresponding receiver operating characteristic (ROC) curves. These curves depict the Sensitivity (true-positive rate) versus 1-Specificity (false-positive rate) for several values of percentage similarity of Type II MTases and REases. We selected the cutoff values of similarity that maximized the true-positive rate and minimized the false-positive rate. Details on the number of R-M proteins of each type can be found in Table S2. ROC data including curve-fitting equations can be found in Table S3.
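One common way to pick a cutoff that jointly favors a high true-positive rate and a low false-positive rate is Youden's J statistic (J = TPR − FPR). The legend does not state which criterion was used, so the sketch below is an assumption, and the similarity values and labels are invented for illustration.

```python
def roc_points(similarities, same_motif, cutoffs):
    """For each cutoff, predict 'same motif' when similarity >= cutoff and
    return (cutoff, true-positive rate, false-positive rate)."""
    positives = sum(same_motif)
    negatives = len(same_motif) - positives
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for s, y in zip(similarities, same_motif) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(similarities, same_motif) if not y and s >= cutoff)
        points.append((cutoff, tp / positives, fp / negatives))
    return points


def best_cutoff(points):
    """Pick the cutoff maximizing Youden's J = TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[2])[0]


# Hypothetical pairwise similarities (%) and whether the pair shares a motif.
sims = [95, 88, 72, 61, 58, 47, 40, 33, 25, 20]
same = [True, True, True, True, False, True, False, False, False, False]
points = roc_points(sims, same, cutoffs=range(20, 100, 10))
print(best_cutoff(points))
```

The sketch assumes the input contains at least one pair with equal motifs and one with unequal motifs; otherwise the rates are undefined.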

Table S3.

Values of Sensitivity (true-positive rate) versus 1-Specificity (false-positive rate) for several values of percentage similarity of Type II MTases and REases

Fig. S3.

Assessing the robustness of the recognition target motif predictor. (A) Percentage of equal target motifs recognized by Type I Specificity domains and Type III TRDs according to their pairwise sequence similarity. (B–E) Reanalysis of the data from Fig. 3 including PacBio data. Percentage of equal target motifs recognized by Types I, II, and III MTases (B) and REases (C) according to their pairwise protein sequence similarity. (D) Plot of all pairwise similarities of Type II MTases versus the cognate Type II REases of the gold standard of REBASE. Blue dots correspond to equal target motifs, red dots to unequal target motifs, and green dots to nested motifs. The dashed horizontal and vertical lines indicate the threshold similarity limits for MTases and REases. (E) The same dataset was used to plot the corresponding ROC curves. These curves depict the Sensitivity (true-positive rate) versus 1-Specificity (false-positive rate) for several values of percentage similarity of Type II MTases and REases.

In contrast, homologs of Type II REases and MTases, which are much more numerous in the database, have different target motifs only when their sequence similarity is low (typically less than 50% for MTases and 55% for REases; Fig. 3). We used these thresholds to estimate the probability that two homologous systems recognize the same target recognition motif, and restricted our subsequent analyses to Type II systems.
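The thresholds quoted above suggest a simple decision rule. As a minimal sketch (the function and constant names are hypothetical, and the rule is a hard cutoff on both components rather than the probability estimate described in the text), two Type II systems could be flagged as putative cognates as follows:

```python
MTASE_MIN_SIMILARITY = 50.0  # percent; threshold quoted in the text for MTases
REASE_MIN_SIMILARITY = 55.0  # percent; threshold quoted in the text for REases


def putative_cognates(mtase_similarity, rease_similarity):
    """Return True when both components are similar enough that the two
    Type II systems plausibly recognize the same target motif.
    Requiring BOTH thresholds is an assumption of this sketch."""
    return (mtase_similarity >= MTASE_MIN_SIMILARITY
            and rease_similarity >= REASE_MIN_SIMILARITY)


def genomes_share_cognate_system(pairwise_similarities):
    """pairwise_similarities: iterable of (mtase_sim, rease_sim) tuples, one
    per pair of systems (one system taken from each genome)."""
    return any(putative_cognates(m, r) for m, r in pairwise_similarities)


print(putative_cognates(82.0, 77.5))  # both components above threshold
print(putative_cognates(82.0, 40.0))  # REase too divergent
```

A pair of genomes would then count as cognate if at least one pair of their systems passes the rule.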

R-M Systems Promote Preferential Genetic Transfer Fluxes.

The observation of higher genetic fluxes in the presence of R-M systems might seem unexpected in the light of the role of the latter in degrading exogenous DNA. To explain these results, we put forward three hypotheses.

  • Hypothesis 1: The relative abundance of R-M systems in a clade results from the selective pressure imposed by the abundance of MGEs in that clade. Selection for multiple R-M systems is expected to be stronger for clades enduring infections by many MGEs. R-M systems have limited efficiency and might not completely prevent MGE infection and transfer (8). This results in a weak positive association between transfer of genetic information and the abundance of R-M systems.

  • Hypothesis 2: R-M systems favor transfer of genetic material between cells by generating restriction breaks that stimulate recombination between homologous sequences.

  • Hypothesis 3: Type II R-M systems encoded in MGEs favor genetic transfer by selfishly stabilizing the element's presence in the new host (16). Genomes enduring more transfer would have more R-M systems if they were carried by MGEs. This last hypothesis is unlikely to explain our results, because we have shown that R-M systems are rare in MGEs (4). Furthermore, the association between genetic transfer and number of R-M systems remained significant when we excluded Type II R-M systems from the analysis (those more likely to act as selfish elements; Fig. S4). This fits recent findings that R-M systems occur and recombine in genomes in ways that are independent of the presence of MGEs (5).

Fig. S4.

Association between gene transfer and R-M systems excluding all Type II R-M systems (IIC included). Distribution of the average HGT events per clade computed with Count (A) and homologous recombination (HR) events per clade computed with NSS (B), MaxChi (C), PHI (D), Geneconv (E), and CFML (F) according to the total number of Type I and III R-M systems and Type IV REases. Positive associations were observed in all cases (Spearman's ρHGT = 0.41, Spearman's ρNSS = 0.43, Spearman's ρMaxChi = 0.51, Spearman's ρPHI = 0.46, Spearman's ρCFML = 0.54; all P < 10⁻⁴ with the exception of HGT for which P < 10⁻³).

To distinguish between the first two hypotheses, we analyzed the genetic flux between pairs of genomes with cognate Type II R-M systems. Because methylation by one system protects DNA from restriction by its cognate, DNA exchanged between such genomes escapes restriction. If R-M systems predominantly prevent genetic transfer (hypothesis 1), then the flux of genetic material between genomes encoding cognate R-M systems should be higher than between the other genomes. If R-M systems predominantly stimulate genetic transfer (hypothesis 2), then pairs of genomes encoding cognate R-M systems should show lower than average genetic flux.

We tested the two hypotheses for HR and HGT separately. We selected the HR events that took place between terminal branches in the phylogenetic trees of the clades. Each terminal branch was then associated with the respective focal genome (the tip), which was labeled in terms of the target recognition motifs of the R-M systems encoded in the focal genome. We excluded HR or HGT occurring in the internal branches of the tree because of the high uncertainty in the inference of ancestral R-M systems (Fig. S5). We then computed the number of HR events between terminal branches associated with genomes encoding cognate R-M systems and compared it with the other pairs of genomes encoding R-M systems. Similar analyses were performed for HGT events that simultaneously affected pairs of terminal branches, i.e., for genes transferred to two terminal branches in two independent events. In both cases, we observed that lineages represented by genomes encoding cognate R-M systems coexchanged more genetic information (Fig. S6 A and B).

Fig. S5.

Analysis of the rate of turnover of R-M systems in the clades and how that relates to the length of the tips of the tree. (A) Schematic of the analysis. We calculated the frequency of Type II R-M systems shared by the genomes of two taxa (R). For this, we computed the number of systems in the genomes, while grouping together in a family those that are part of the same family of the pangenome (e.g., duplicated systems X and X″ are put together with X′ when they are all more than 80% identical in protein sequence). We then computed the number of families with members in both genomes (one in the example: X, X′, and X″), divided by the total number of families (with members in at least one of the two genomes, three in the example: the family X, X′, and X″ and the families W and Z). The values of R are in general small. In more than 50% of the comparisons, R < 0.1. Note that two R-M systems can be cognate and not be put in the same family of the pangenome (if they are not sufficiently similar, e.g., because they were acquired independently from another species). The Count model can be used to analyze the evolution of orthologous families, but not of cognate families because the dataset is not large enough to parameterize the model. (B) Distribution of the patristic distances (d) between genomes with R < 1 (i.e., at least one R-M system not in common). (C) Distribution of the sizes of tips. The comparison between B and C shows that the length of the tips is, on average, smaller than the patristic distances between genomes with different R-M systems. Therefore, the R-M system found in the tip is likely to have been in the lineage for most if not all of the time since the split with the closest neighbor of the taxa in the tree. The comparison also shows that the length of the largest tips is close to the patristic distances for which one starts finding noncognate genomes. Hence, one cannot reliably assume that a given R-M system is present in most of the internal branches because the trait evolves fast.
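The frequency R described in this legend is the fraction of R-M families present in both genomes among those present in at least one of them (a Jaccard index). A minimal sketch follows; the family labels come from the example in the legend, whereas the function name and the assignment of W and Z to particular genomes are our assumptions.

```python
def shared_family_fraction(families_a, families_b):
    """Fraction of R-M families with members in both genomes, out of the
    families with members in at least one of the two genomes."""
    a, b = set(families_a), set(families_b)
    union = a | b
    if not union:
        return 0.0  # assumption: define R as 0 when neither genome has a system
    return len(a & b) / len(union)


# Each system is first mapped to its pangenome family, so duplicated copies
# (X and X'' in the legend) collapse into a single family label.
genome_1 = ["X", "X", "W"]  # systems X, X'' and W
genome_2 = ["X", "Z"]       # systems X' and Z
print(shared_family_fraction(genome_1, genome_2))
```

With one shared family out of three, the example yields R = 1/3, matching the counts given in the legend.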

Fig. S6.

Gene flux in bacteria encoding R-M systems. Unlike in Fig. 4 of the main text, no clade was filtered out of this analysis: all 79 clades are represented. (A) Histogram of patristic distances (colored by quartiles) between bacteria with Type II R-M systems. (B) Median values of HGT and recombination events for each quartile (Q) and for the full dataset (All) between terminal branches of bacteria with Type II R-M systems recognizing (or not) the same target motif. (C) Correlation between Wagner parsimony gene family gains and maximum likelihood (ML) gains for values of posterior probability (PP) between 0.2 and 0.9. Spearman ρ values are indicated in each graph, and in all cases, P < 10⁻⁴; *P < 0.05; **P < 0.01; and ***P < 0.001.

Next, we restricted our analysis to clades having at least 10 comparisons between genomes encoding cognate R-M systems and 10 comparisons between genomes lacking cognate systems (but encoding R-M systems). This avoids the confounding effect of putting together in the same analysis clades with few R-M systems or with little diversity in these systems. This restricted our dataset to eight clades: Bacillus amyloliquefaciens, Bifidobacterium longum, Escherichia coli, Haemophilus influenzae, Listeria monocytogenes, N. meningitidis, Salmonella enterica, and S. pneumoniae. Within this restricted dataset, the results were qualitatively identical: lineages associated with genomes encoding cognate R-M systems coexchanged more genetic information (Fig. 4 A–C). We confirmed that these results were insensitive to uncertainties in phylogenetic reconstruction and to the effects of HR in phylogenetic inference (SI Methods and Fig. S7). The results on HR might be strongly affected by the ability of bacteria to engage in natural transformation. We restricted our analysis to the five naturally transformable species, following ref. 27, and found similar results (P < 10⁻³).

Fig. 4.

Gene flux in bacteria encoding R-M systems. (A) We analyzed the patterns of HR and HGT in the tree of each clade, comparing the flux between tips ending in cognate (similar recognition motifs) or noncognate (different motifs) extant taxa. (B) Histogram of patristic distances (colored by quartiles) between bacteria with Type II R-M systems. (C) Median values of HGT and recombination events for each quartile (Q) and for the full dataset (All) between terminal branches of bacteria with Type II R-M systems recognizing (or not) the same target motif. We analyzed Bacillus amyloliquefaciens, Bifidobacterium longum, Escherichia coli, Haemophilus influenzae, Listeria monocytogenes, Neisseria meningitidis, Salmonella enterica, and Streptococcus pneumoniae. *P < 0.05; **P < 0.01; ***P < 0.001 (see Fig. S6 A and B for the data including all clades). (D) Genetic flux as a function of time and the presence of R-M systems. As lineages diverge and R-M systems change (circles indicate such changes), the lineages with cognate R-M systems (same color) share more genetic material than the other lineages. For example, lineage B changes R-M systems twice since the last common ancestor (LCA). Initially transfer is favored with all lineages, then with the sister lineage A, and finally with the distantly related lineage C.

Fig. S7.

Distribution of ΔmedianHGT in the 100 bootstrap experiments (boxplot on Top and histogram on Bottom). The red dashed line indicates the (null) expectation if the flux between R-M cognate genomes was similar to that of noncognate ones.

We then tested whether the clade-associated traits covarying with HR and HGT—phylogenetic depth, average genome size, and number of genomes—were affecting our conclusions by making the comparisons on each clade separately. We observed more HGT and HR among pairs of genomes encoding cognate R-M systems in seven of the eight clades, which was statistically significant (each P = 0.035, binomial test, P = 0.01 for the combined test). One species (L. monocytogenes) was an exception to the general trend both concerning HR and HGT. This species showed very low rates of HGT and HR, and the differences in HR and HGT between R-M cognate and R-M noncognate genomes were not significant.
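The reported P = 0.035 can be checked with a short calculation. With one exception among eight clades, seven clades follow the trend. The Python sketch below assumes a one-sided binomial test with a null probability of 0.5 per clade; the source states only "binomial test", so the sidedness and the null probability are assumptions, and the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Seven of eight clades show more exchange between cognate genomes.
p_value = binomial_upper_tail(7, 8)
print(round(p_value, 3))  # 0.035 (exactly 9/256)
```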

We mentioned in the Introduction that closely related taxa are expected to exchange more genetic information independently of the R-M systems they encode. To verify that the presence of cognate R-M systems is associated with increased genetic exchange independently of evolutionary distance, we binned the comparisons between events occurring in terminal branches in terms of the phylogenetic distance between pairs of genomes. We then ran the same analysis in each bin separately. These analyses showed more cotransfer between genomes encoding cognate R-M systems in nearly all bins, even if this analysis had lower statistical power (fewer comparisons per bin) (Fig. 4 B and C for the eight clades and Fig. S6 A and B for all of the data). Importantly, this difference was always significant for the most distant pairs of genomes. Hence, pairs of genomes encoding cognate R-M systems were associated with more frequent HR and HGT, independently of the evolutionary distances between them.
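As an illustration of the binning step, the following Python sketch splits pairwise comparisons into quartiles of patristic distance and reports the median number of shared events for cognate and noncognate pairs in each bin. The tuple layout, the function name, and the toy numbers are hypothetical, and the statistical test applied within each bin is not shown.

```python
from statistics import median, quantiles

def median_flux_by_quartile(pairs):
    """pairs: iterable of (patristic_distance, is_cognate, n_shared_events).

    Returns {quartile: {"cognate": median or None, "noncognate": median or None}}.
    """
    pairs = list(pairs)
    q1, q2, q3 = quantiles([d for d, _, _ in pairs], n=4)

    def quartile(d):
        return 1 if d <= q1 else 2 if d <= q2 else 3 if d <= q3 else 4

    bins = {q: {"cognate": [], "noncognate": []} for q in (1, 2, 3, 4)}
    for d, cognate, events in pairs:
        bins[quartile(d)]["cognate" if cognate else "noncognate"].append(events)
    return {
        q: {k: (median(v) if v else None) for k, v in groups.items()}
        for q, groups in bins.items()
    }

# Toy data (hypothetical): (distance, cognate?, shared events)
toy = [(1, True, 10), (2, False, 4), (3, True, 8), (4, False, 3),
       (5, True, 6), (6, False, 2), (7, True, 5), (8, False, 1)]
result = median_flux_by_quartile(toy)
```

Comparing the two medians within each quartile, rather than over the pooled data, is what separates the effect of cognate R-M systems from the effect of evolutionary distance.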

SI Methods

Data.

Curated reference protein sequences of restriction endonucleases (REases) and methyltransferases (MTases) belonging to Types I, II, IIC, and III R-M systems and Type IV REases were downloaded from the dataset “gold standards” of REBASE (26) (last accessed in October 2014). For purposes of building a recognition target predictor of R-M systems, we excluded R-M proteins having more than one recognition motif or recognition motifs smaller than 4 bp, control proteins involved in the regulation of expression of certain REases, nicking enzymes, homing endonucleases, truncated proteins, and orphan MTases for which no R-M type could be assigned either by searching REBASE or the literature. We searched for R-M systems in 884 bacterial genomes from 79 bacterial species (Dataset S1). The sequences and corresponding annotations of these genomes were retrieved from GenBank RefSeq (ftp://ftp.ncbi.nih.gov/genomes; last accessed in February 2014) (41). We excluded genes indicated in the GenBank files as partial genes, as well as those lacking a stop codon or having a stop codon within the reading frame. The clades analyzed were based on taxonomy, i.e., the genomes of a named species were put together. In some rare cases, like Escherichia coli and Bacillus cereus, this does not exactly define monophyletic clades (because of Shigella spp. and Bacillus anthracis, respectively).

Identification of Core Genomes and Pangenomes.

We built core genomes for species with at least four complete genomes available in GenBank RefSeq (Dataset S1). We used a methodology previously published (42). Briefly, a preliminary list of orthologs was identified as reciprocal best hits using end-gap free global alignment, between the proteome of a pivot (typically the first completely sequenced isolate) and each of the other strains’ proteomes. Hits with less than 80% similarity in amino acid sequence or more than 20% difference in protein length were discarded. This list of orthologs was then refined for every pairwise comparison using information on the conservation of gene neighborhood. Because (i) few genome rearrangements are observed at these short evolutionary distances, and (ii) HGT is frequent (28), genes outside conserved blocks are likely to be xenologs or paralogs. Thus, positional orthologs were defined as bidirectional best hits adjacent to at least four other pairs of bidirectional best hits within a neighborhood of 10 genes (5 upstream and 5 downstream). These parameters (four genes being less than one-half of the diameter of the neighborhood) allow retrieving orthologs on the edge of rearrangement breakpoints and therefore render the analysis robust to the presence of rearrangements. The core genome of each species was defined as the intersection of pairwise lists of positional orthologs.
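The neighborhood criterion can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline (42): it assumes genes are identified by their rank along a linear replicon, that the neighborhood is measured in both genomes, and that circularity is ignored. The function and parameter names are hypothetical.

```python
def positional_orthologs(bbh, min_neighbors=4, flank=5):
    """Filter bidirectional best hits (BBH) by conserved gene neighborhood.

    bbh maps the position of a gene in genome A to the position of its BBH
    in genome B. A pair is kept when at least `min_neighbors` other BBH
    pairs lie within `flank` genes of it in both genomes.
    """
    kept = {}
    for a, b in bbh.items():
        support = 0
        for offset in range(-flank, flank + 1):
            if offset == 0:
                continue
            b_neighbor = bbh.get(a + offset)
            if b_neighbor is not None and abs(b_neighbor - b) <= flank:
                support += 1
        if support >= min_neighbors:
            kept[a] = b
    return kept

# Ten collinear genes, one of which (position 4) hits a distant locus.
hits = {i: i + 100 for i in range(10)}
hits[4] = 900
core_candidates = positional_orthologs(hits)
```

In this toy case the gene at position 4 is discarded as a likely xenolog or paralog, whereas the genes at the edges of the block (positions 0 and 9) are retained because four supporting neighbors suffice.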

Pangenomes are the full complement of genes in the species and were built by clustering homologous proteins into families for each of the 79 species. We determined the lists of putative homologs between pairs of genomes (including plasmids) with BLASTP v.2.2.28+ (default parameters) and used the e values (<10⁻⁴) to cluster them using SILIX (v1.2.8, lbbe.univ-lyon1.fr/SiLiX) (43). SILIX parameters were set such that a protein was homologous to another in a given family if the aligned part had at least 80% identity and if it included more than 80% of the smallest protein.
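The family-building rule can be illustrated with a small union-find sketch in Python. It assumes single-linkage grouping of all pairs that pass the identity and coverage filters, and it is not the SILIX implementation; the input layout and function name are hypothetical.

```python
def cluster_families(proteins, hits, min_identity=80.0, min_coverage=0.8):
    """Group proteins into families by single linkage.

    proteins: {protein_id: length in residues}
    hits: iterable of (query, subject, percent_identity, aligned_length)
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, aligned_length in hits:
        shortest = min(proteins[query], proteins[subject])
        if identity >= min_identity and aligned_length > min_coverage * shortest:
            parent[find(query)] = find(subject)

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return sorted(families.values(), key=lambda fam: sorted(fam))

lengths = {"a": 100, "b": 100, "c": 100, "d": 50}
blast_hits = [("a", "b", 90.0, 95), ("b", "c", 85.0, 90),
              ("c", "d", 95.0, 30), ("a", "d", 70.0, 50)]
families = cluster_families(lengths, blast_hits)
```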

Phylogenetic Analyses.

We built a tree to display the phylogenetic distribution of our dataset using the 16S rRNA sequences of the sequenced type strain of the 79 analyzed bacterial species (Fig. 1A and Dataset S2A). We made a multiple alignment of the 16S sequences with MAFFT, followed by manual correction with SEAVIEW. The tree was computed by maximum likelihood with PHYML v3.0 (44) under the general time reversible (GTR)+Γ(4)+I model. This tree is never used in the calculations; it is only used in Fig. 1A to display the relative position of each species in the phylogeny of bacteria.

We made core genome trees for each clade using a concatenate of the multiple alignments of the core genes (available upon request). Each species tree was computed with RAxML v8.00 (45) under the GTR model and a gamma correction (GAMMA) for variable evolutionary rates. All trees are shown in Dataset S2B. We performed 100 bootstrap experiments on the concatenated alignments of each clade to assess the robustness of the topology of the tree. The vast majority of nodes were supported with bootstrap values higher than 90% (Dataset S2B). We inferred the root of each phylogenetic tree using the midpoint-rooting approach of the R package “phangorn” v1.99.14 (46).

The phylogenetic depth was defined as the average root-to-tip distance, and was computed as the diagonal mean of the phylogenetic variance–covariance matrix of each tree, using the vcv.phylo function in the R package “ape.” We retrieved the patristic distances between taxa using the cophenetic.phylo function from the same R package.
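The relation between the variance–covariance matrix and the two distance measures can be illustrated with a short Python sketch (not the ape implementation). In this matrix, the diagonal entry of a tip is its root-to-tip distance and the off-diagonal entry of two tips is the length of the path they share from the root, so the patristic distance between tips i and j equals V[i][i] + V[j][j] − 2·V[i][j]. The tree encoding and function names are hypothetical.

```python
def branches_to_root(tree, tip):
    """Return {node: branch length} for every branch between tip and root."""
    branches = {}
    node = tip
    while node in tree:
        parent, length = tree[node]
        branches[node] = length
        node = parent
    return branches

def vcv_matrix(tree, tips):
    """V[a][b] is the summed length of the branches shared by tips a and b."""
    paths = {t: branches_to_root(tree, t) for t in tips}
    return {
        a: {b: sum(l for n, l in paths[a].items() if n in paths[b]) for b in tips}
        for a in tips
    }

def phylogenetic_depth(v):
    """Average root-to-tip distance, i.e., the mean of the diagonal."""
    return sum(v[t][t] for t in v) / len(v)

def patristic(v, a, b):
    """Path length between two tips through their last common ancestor."""
    return v[a][a] + v[b][b] - 2 * v[a][b]

# Toy rooted tree ((A:1,B:1)X:2,C:3), encoded as child -> (parent, length).
toy_tree = {"A": ("X", 1.0), "B": ("X", 1.0), "X": ("root", 2.0), "C": ("root", 3.0)}
v = vcv_matrix(toy_tree, ["A", "B", "C"])
```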

Inference of HR.

We inferred HR on the multiple alignments of the core genes of each species using several programs. Multiple alignments were made on protein sequences using MUSCLE v3.8 with default parameters (47), and then backtranslated to DNA, to increase accuracy. They were not edited, to avoid biasing the inference of HR rates. We computed for each multiple alignment the neighbor similarity score (NSS) (48), the maximum χ2 (MaxChi) method (49), and the pairwise homoplasy index (PHI) test using PhiPack (50) (downloaded in February 2015) with 10,000 permutations. The Geneconv v1.81 program (51) was used with the options /w123 to initialize the program’s internal random number generator, /lp to compute the local pairwise P values, and the parameter gscale = 2 to allow for mismatches between fragments. Geneconv outputs “inner” fragments that have arisen by gene conversion events between ancestors of aligned sequences, and “outer” fragments that represent either conversion events involving an ancestor outside the alignment or events that have been eroded by subsequent mutations. Only inner fragments were considered in this study. In addition, only gene conversion tracts with P < 0.05 in the global analysis were used. To analyze the exchanges between specific taxa within clades, we used the fragments of Geneconv, because it is the only program producing this information. We also identified events of HR with ClonalFrameML (CFML) v10.7.5 (3) with a predefined tree, default priors R/θ = 10⁻¹, 1/δ = 10⁻³, and ν = 10⁻¹ and 100 pseudobootstrap replicates, as previously suggested (3). Mean branch lengths were computed with the R package “ape” v3.3 (52), and transition/transversion ratios were computed with the R package “PopGenome” v2.1.6 (53). The priors estimated by this mode were used as initialization values to rerun CFML under the “per-branch model” mode with a branch dispersion parameter of 0.1.
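Back-translation of a protein alignment amounts to threading each coding sequence onto its aligned protein, codon by codon. The Python sketch below illustrates the idea under the assumption that each coding sequence is supplied without its stop codon; the function name is hypothetical and the sketch is not the tool used in this study.

```python
def backtranslate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue is replaced by its codon and each gap by three gap characters.
    """
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    next_codon = iter(codons)
    return "".join(
        gap * 3 if aa == gap else next(next_codon) for aa in aligned_protein
    )

print(backtranslate("M-K", "ATGAAA"))  # ATG---AAA
```

Aligning at the protein level and then restoring codons keeps gaps in frame, which is the accuracy gain alluded to above.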

Reconstruction of the Evolution of Gene Families.

We assessed the dynamics of gene family repertoires using Count (25) (downloaded in April 2015). This program offers the most general phylogenetic birth-and-death models currently available in the literature. It models the gains of novel families, as well as expansion and contraction of existing families, while accommodating rate variations across phylogenetic lineages and across families. The analysis with Count starts with the estimation of the parameters of the model by maximum likelihood using the pangenome matrix of gene presence and absence. Count then uses these parameters to calculate the expected size of each family in every internal node of the species tree. It also computes the expected number of gain, loss, expansion, and contraction events along each branch.

Rates were computed with default parameters, assuming the Poisson family size distribution at the tree root, and uniform gain, loss, and duplication rates. One hundred rounds of rate optimization were computed with a convergence threshold of 10⁻³. After optimization of the branch-specific parameters of the model, we performed ancestral reconstructions by computing the branch-specific posterior probabilities of evolutionary events, and inferred the gains in the terminal branches of the tree.

The manual of Count provides no guidelines on which threshold probabilities to use in a branch. We consulted the literature and found very different values (ranging from 0.2 to 0.95) and methods to compute these (54, 55). To have an objective way of defining this threshold, we computed the gain/loss scenarios using Wagner parsimony (same parameters, relative penalty of gain with respect to loss of 1). We then made a correlation analysis of the number of HGT events inferred by maximum likelihood and under Wagner parsimony using posterior probability thresholds of 0.2, 0.3, 0.5, and 0.9. The correlations were always very high (Spearman’s ρ > 0.87), and maximal for 0.2 (Spearman’s ρ = 0.96, P < 10⁻⁴) (Fig. S6C). Hence, the posterior probability matrix was converted into a binary matrix of presence/absence of HGT events using a threshold of 0.2. At a given terminal branch, the expected total number of acquisitions was computed by summing all family-specific gene gains obtained from the posterior probability binary matrix. Transfer co-occurring in pairs of terminal branches in a tree was computed by summing all common pairwise gene acquisitions.
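The conversion from posterior probabilities to counts of acquisitions and coacquisitions can be sketched as follows in Python. The nested-dictionary layout and function names are hypothetical, and treating the threshold as inclusive (≥ 0.2) is an assumption, because the text does not specify how ties are handled.

```python
from itertools import combinations

def binarize(posteriors, threshold=0.2):
    """posteriors: {terminal branch: {gene family: P(gain)}}.

    Returns {terminal branch: set of families inferred as gained}.
    """
    return {
        branch: {fam for fam, p in fams.items() if p >= threshold}
        for branch, fams in posteriors.items()
    }

def total_gains(gains):
    """Number of acquisitions inferred on each terminal branch."""
    return {branch: len(fams) for branch, fams in gains.items()}

def cotransfers(gains):
    """Number of families gained in both branches, for every pair of branches."""
    return {
        (a, b): len(gains[a] & gains[b])
        for a, b in combinations(sorted(gains), 2)
    }

# Toy posterior probabilities (hypothetical values).
toy = {
    "t1": {"f1": 0.9, "f2": 0.25, "f3": 0.1},
    "t2": {"f1": 0.5, "f2": 0.05, "f3": 0.3},
    "t3": {"f2": 0.8},
}
gains = binarize(toy)
shared = cotransfers(gains)
```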

To control for the effects of the choices made in the definition of our model, we made three complementary analyses. First, we verified that genetic flux was higher for pairs of genomes encoding cognate R-M systems than for the others using different thresholds for the branch-specific posterior probabilities of evolutionary events (0.5: P < 0.01, Mann–Whitney–Wilcoxon test; 0.9: P < 0.05, Mann–Whitney–Wilcoxon test). Second, we allowed for different gain–loss and duplication–loss rates within branches and found very similar distributions of gene transfer events (always Spearman's ρ > 0.90, P < 10⁻⁴). Third, we allowed rates to follow a gamma distribution with four categories and found very similar values of gene transfer events (Spearman's ρ > 0.88, P < 10⁻⁴).

Identification of R-M Systems.

Identification of R-M systems was performed as previously described (4). Briefly, all-against-all searches were performed for REase and MTase standard protein sequences retrieved from REBASE using BLASTP v2.2.28+ (default settings, e value < 10⁻³). The resulting e values were log transformed and used for clustering into protein families by Markov Clustering (MCL) v10-201 (56). Each protein family was aligned with MAFFT v7.205 (57) using the E-INS-i option, 1,000 cycles of iterative refinement, and offset 0. Alignments were visualized in SEAVIEW v4.5.3 (58) and manually trimmed to remove poorly aligned regions at the extremities. Hidden Markov model (HMM) profiles were then built from each multiple sequence alignment using the hmmbuild program from the HMMER v3.0 suite (59) (default parameters). Type II MTases were retrieved using the PFAM-A profiles PF01555.12, PF02086.9, PF00145.1, and PF07669.5 (last accessed in February 2013). Both Type II and Type IV REases diverge rapidly, resulting in sequences that produce poor multiple alignments (6) that cannot be used to build protein profiles. In these cases, BLASTP was used to scan the genomes for homologs (default settings, e value < 10⁻³, and minimum coverage alignment of 50%). Types I, II, and III R-M systems were identified by searching for genes encoding the MTase and REase components less than four genes apart. The output was subsequently curated to eliminate multiple occurrences of the same R-M system (this occurs when two REase or MTase genes are encoded in the same locus). R-M systems containing more than one Specificity (S) gene were considered as a single system. Situations involving ambiguous identification may also occur, for example, between REases of Types II and IV, or between Type IIC systems and other MTases or REases. In these cases, the R-M type was defined on the basis of the corresponding genomic context (presence or not of a linked REase or MTase) and on the output of the analysis of the system using REBASE. Type IIC R-M systems were defined as those including a gene encoding both an MTase and a REase function with similarity to Type IIC MTases and REases.

Robustness of the Recognition Target Motif Predictor.

We tested the robustness of our recognition target motif predictor in two ways.

Type I and III systems showed rapid change of sequence recognition in terms of sequence divergence. This precluded the inference of cognate systems in these types. We made an additional analysis to check whether these problems were due to the analysis of all of the sequences of the MTases and REases, which may evolve fast and recombine. We therefore focused our analysis on the evolution of the target recognition domains (TRDs), which are directly involved in the recognition of the site (Fig. S3A). Type I Specificity domain sequences were retrieved from the REBASE gold standard database (Table S2), whereas Type III TRDs were selected from our dataset of Type III mod genes. The latter are typically flanked by the classical MTase specific motifs IV–VIII at the N terminus of the mod core and the I–III and X motifs at the C terminus (60).

Table S2.

Number of Types I, II, IIC, and III R-M proteins analyzed, and corresponding number of proteins recognizing different target motifs (including nested), or recognizing nested motifs only

Because the previous test confirmed our earlier results, we made a second analysis where we increased the size of the initial dataset by retrieving additional MTase and REase sequences from the REBASE PacBio database (rebase.neb.com/rebase/rebpbe.html) (Fig. S3 B–E). At the time it was last accessed (January 2016), REBASE PacBio had information on 642 organisms corresponding to a total of 2,446 PacBio records. More than 50% of the recognition sequences had not yet been assigned to known enzymes, more than 26% of the 642 sequencing projects were still not classified as complete (still at “shotgun” phase), and many MTases (and cognate REases) had not been thoroughly characterized from the biochemical point of view (i.e., were not classified by REBASE as gold standards). To minimize the risk of adding a large number of potentially poorly characterized protein sequences and/or recognition motifs, we retrieved only the sequences pertaining to the eight clades analyzed in Fig. 4, excluding data not yet assigned to known enzymes and genomes from unfinished sequencing projects. This resulted in 213 Type I, II, and III MTase and REase sequences from 54 genomes. Importantly, this increased the dataset of MTases by ∼38% and of REases by ∼70% (for Type I and Type III). Nevertheless, the results remained unchanged (Fig. S3).

Robustness of the Count Analysis to Phylogenetic Reconstruction.

The robustness of HGT analysis using Count depends on the quality of the phylogenetic reconstruction. We checked the robustness of the results of Count in the light of phylogenetic uncertainty in two ways.

  • i) We used the 100 bootstrap trees mentioned above (SI Methods, Phylogenetic Analyses) of each of the eight clades analyzed in Fig. 4. We then analyzed the matrix of presence and absence of genes in the pangenome with Count for every single bootstrap tree. This analysis was done exactly as the one of Fig. 4. For each set of bootstrap experiments (eight bootstrap trees, one per clade), we computed the difference between the median HGT gains among R-M cognate genomes and R-M noncognate genomes (ΔmedianHGT). The analysis of this distribution showed that, in the 100 experiments, there was no single value smaller than zero (Fig. S7). This means that all sets of bootstrap experiments showed an overrepresentation of HGT between cognate genomes (relative to noncognate). Hence, uncertainty in the phylogenetic reconstruction does not affect our conclusions.

  • ii) The phylogenetic trees that we have used in our work have not been modified to account for distortions associated with recombination. Hence, we also tested the robustness of the method by running Count using the trees obtained from ClonalFrame (CF) v1.2 (61). This program provides rooted trees purged from recombination; hence it makes it possible to test the effect of the use of sequences not purged from recombination and of midpoint rooting. For each core genome of our eight-clade dataset, we ran CF (200,000 iterations with the first half discarded as burn-in, and the second half sampled at each iteration). This tree was then used in the Count analysis. For each clade we ran at least three more independent runs of CF to check convergence and mixing between runs. This was then manually checked. We then reran Count using the CF rooted trees (using the same method as for Fig. 4). The results were qualitatively similar, because R-M cognate genomes had twice the frequency of HGT as the others (an average of 10 and 5 events, respectively; P < 0.001).
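The ΔmedianHGT statistic of test (i) can be sketched in Python as follows. The input layout (one pair of count lists per set of bootstrap trees), the function names, and the toy numbers are hypothetical; the Count analysis that produces the counts is not reproduced here.

```python
from statistics import median

def delta_median_hgt(cognate_counts, noncognate_counts):
    """Median HGT among cognate pairs minus median among noncognate pairs."""
    return median(cognate_counts) - median(noncognate_counts)

def summarize_bootstrap(replicates):
    """replicates: list of (cognate_counts, noncognate_counts), one per
    set of bootstrap trees.

    Returns the list of deltas and whether none of them is below zero.
    """
    deltas = [delta_median_hgt(c, n) for c, n in replicates]
    return deltas, min(deltas) >= 0

# Two toy replicates (hypothetical counts of cotransferred genes).
toy = [([10, 12, 8], [5, 4, 6]), ([9, 11, 7], [6, 5, 5])]
deltas, never_negative = summarize_bootstrap(toy)
```

A distribution of deltas lying entirely at or above zero corresponds to the situation described for Fig. S7, in which the null expectation of equal flux is never crossed.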

Discussion

Genome size is the result of the balance between accretion and deletion events moderated by natural selection. Larger bacterial genomes are expected to engage in more frequent HGT because this is the dominant mechanism of genetic accretion (28). However, there are remarkably few studies demonstrating an association between HGT and genome size (29). Here, we found that larger genomes exchange DNA at higher rates, both by HGT and by HR. This association is not just caused by sexually isolated endosymbiotic bacteria with very small genomes—e.g., Chlamydiae, Buchnera, or Spirochaetes (Fig. 1 and Dataset S1)—because it remains significant for genomes larger than 2 Mb, which include few obligatory endosymbionts. Many reasons might explain the association between HGT, HR, and genome size: bacteria with larger genomes might have more diverse lifestyles, select for more diverse types of genes, inhabit more environments, or accommodate more MGEs. Even if the test of these different hypotheses falls outside the scope of this work, this association is important and must be accounted for when assessing the impact of R-M systems in genetic fluxes. The higher frequency of HR and HGT among larger genomes suggests that the latter are more targeted by MGEs. Accordingly, larger bacterial genomes encode more transposable elements (30), more prophages (31), and more conjugative elements (32). If MGEs targeting bacteria with larger genomes are more abundant, they might lead to strong selection for R-M systems in their bacterial hosts. This might explain why we found more R-M systems in larger genomes (4). It might also explain the positive association between the frequencies of HR and HGT and the abundance of R-M systems (Fig. 2).

R-M systems have a well-known inhibitory effect on the transfer of genetic information (9). However, whether this trait is an important driver of their evolution has remained controversial (12, 33, 34). Our results contribute to the clarification of these two issues. R-M systems can function as a barrier to MGE infection when encoded in the chromosome or other MGE. They can also stabilize the presence of MGEs in cells by preventing infections by other competing MGEs. Our previous observation that MGEs encode few R-M systems and many solitary MTases (4) suggests that R-M systems are more frequently a chromosomal-encoded barrier to MGEs than an MGE-encoded tool for cell infection. The coassociation of MGEs, bacterial genome size, and R-M systems might thus result from increased selection for R-M systems in the face of abundant MGEs in large genomes.

Contrary to the popular view that R-M systems limit the flux of genetic material (9), it has been proposed that restriction actually favors evolvability by producing DNA double-stranded ends that are recombinogenic (33, 34). This hypothesis is compatible with the observation that genomes enduring more HGT and HR have more R-M systems. However, it is not in agreement with the observation that pairs of genomes encoding cognate R-M systems coexchange more DNA. It is also hardly reconcilable with the notorious deleterious effect of R-M systems on bacterial genetic transformation in the laboratory (35). Although R-M systems have been shown to favor intragenomic HR events (12), the overall effect of R-M systems on genetic exchange is to decrease both HR and HGT between bacteria encoding noncognate R-M systems.

Our statistical analyses could not explicitly account for the presence of the many other systems affecting genetic transfer between cells. Some of them facilitate transfer, e.g., MGEs or competence for natural transformation, and we checked that all of the clades in Fig. 4 have known phages and conjugative elements. Restricting the analysis to the five naturally transformable bacteria did not change our results. Importantly, all of these clades encoded the key enzymes involved in RecA-mediated homologous recombination, including the presynaptic pathways RecBCD/AddAB and RecOR (36). Hence, there is little ground to think that our results are strongly biased by lack of mechanisms for gene transfer. Some systems disfavor transfer between bacteria, including CRISPRs, abortive infection, or other R-M systems. It is not possible to account for all these factors in a statistical model, because of the lack of quantitative data. Nevertheless, we could verify that cognate genomes did not have fewer R-M systems than the other genomes. Even if other barriers to DNA exchange are certainly present in these species, our use of a diverse set of well-known species, numerous alternative analyses, and focus on intraspecies comparisons (in which lifestyles and other general traits are much less variable) suggest that our results are robust.

Our work shows that noncognate genomes have reduced DNA exchanges. This decreases the power of natural selection and increases the effect of drift, potentially leading to the accumulation of deleterious mutations. Importantly, R-M systems’ diversification at the origin of a lineage may increase its genetic cohesion by disfavoring exchanges with the closest related ones, as previously suggested for some pathogens (20–22). Interestingly, diversification can also increase the genetic flux between distant bacteria encoding cognate R-M systems with which there were previously few genetic exchanges. Hence, R-M systems might shape population structure in complex ways depending on the repertoire of R-M systems in the other lineages.

The study of the flux of genetic information among bacteria using network-based approaches is rising in importance (37–39). Our work shows that R-M systems may carve preferential routes of DNA exchange between certain bacterial subpopulations. Their rapid diversification constantly changes these preferences, thereby producing complex patterns of genetic exchange with time.

Methods

Details on the data used, identification of core genomes and pangenomes, phylogenetic analyses, inference of HR, reconstruction of the evolution of gene families, identification of R-M systems, robustness of the target motif predictor, and robustness of the Count analysis to phylogenetic reconstruction can be found in SI Methods.

Acknowledgments

We thank Vincent Daubin (Université de Lyon) for the suggestion to use Count. We also thank Florent Lassale (University College London) and the anonymous reviewers for critically reviewing the manuscript. This work was supported by European Research Council Grant EVOMOBILOME 281605.

Footnotes

  • 1To whom correspondence should be addressed. Email: pcphco{at}gmail.com.
  • Author contributions: P.H.O. and E.P.C.R. designed research; P.H.O., M.T., and E.P.C.R. analyzed data; and P.H.O., M.T., and E.P.C.R. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603257113/-/DCSupplemental.

Freely available online through the PNAS open access option.


References

  1. ↵
    1. Frost LS,
    2. Leplae R,
    3. Summers AO,
    4. Toussaint A
    (2005) Mobile genetic elements: The agents of open source evolution. Nat Rev Microbiol 3(9):722–732
    .
    OpenUrlCrossRefPubMed
  2. ↵
    1. Vulić M,
    2. Dionisio F,
    3. Taddei F,
    4. Radman M
    (1997) Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria. Proc Natl Acad Sci USA 94(18):9763–9767
    .
    OpenUrlAbstract/FREE Full Text
  3. ↵
    1. Didelot X,
    2. Wilson DJ
    (2015) ClonalFrameML: Efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol 11(2):e1004041
    .
    OpenUrlCrossRefPubMed
  4. ↵
    1. Oliveira PH,
    2. Touchon M,
    3. Rocha EP
    (2014) The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res 42(16):10618–10631
    .
    OpenUrlAbstract/FREE Full Text
  5. ↵
    1. Mruk I,
    2. Kobayashi I
    (2014) To be or not to be: Regulation of restriction-modification systems and other toxin-antitoxin systems. Nucleic Acids Res 42(1):70–86
    .
    OpenUrlAbstract/FREE Full Text
  6. ↵
    1. Pingoud A,
    2. Wilson GG,
    3. Wende W
    (2014) Type II restriction endonucleases—a historical perspective and more. Nucleic Acids Res 42(12):7489–7527
    .
    OpenUrlAbstract/FREE Full Text
  7. ↵
    1. Vasu K,
    2. Nagamalleswari E,
    3. Nagaraja V
    (2012) Promiscuous restriction is a cellular defense strategy that confers fitness advantage to bacteria. Proc Natl Acad Sci USA 109(20):E1287–E1293
    .
    OpenUrlAbstract/FREE Full Text
  8. ↵
    1. Korona R,
    2. Korona B,
    3. Levin BR
    (1993) Sensitivity of naturally occurring coliphages to type I and type II restriction and modification. J Gen Microbiol 139(Pt 6):1283–1290
    .
    OpenUrlCrossRefPubMed
  9. ↵
    1. Thomas CM,
    2. Nielsen KM
    (2005) Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol 3(9):711–721
    .
    OpenUrlCrossRefPubMed
  10. ↵
    1. Labrie SJ,
    2. Samson JE,
    3. Moineau S
    (2010) Bacteriophage resistance mechanisms. Nat Rev Microbiol 8(5):317–327
    .
    OpenUrlCrossRefPubMed
  11. ↵
    1. Korona R,
    2. Levin BR
    (1993) Phage-mediated selection for restriction-modification. Evolution 47(2):565–575
    .
    OpenUrl
  12. ↵
    1. Kobayashi I
    (2001) Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res 29(18):3742–3756
    .
    OpenUrlAbstract/FREE Full Text
  13. ↵
    1. Xu Q,
    2. Morgan RD,
    3. Roberts RJ,
    4. Blaser MJ
    (2000) Identification of type II restriction and modification systems in Helicobacter pylori reveals their substantial diversity among strains. Proc Natl Acad Sci USA 97(17):9671–9676
    .
    OpenUrlAbstract/FREE Full Text
  14. ↵
    1. Jeltsch A,
    2. Pingoud A
    (1996) Horizontal gene transfer contributes to the wide distribution and evolution of type II restriction-modification systems. J Mol Evol 42(2):91–96
    .
    OpenUrlCrossRefPubMed
  15. ↵
    1. Seshasayee AS,
    2. Singh P,
    3. Krishna S
    (2012) Context-dependent conservation of DNA methyltransferases in bacteria. Nucleic Acids Res 40(15):7066–7073
    .
    OpenUrlAbstract/FREE Full Text
  16. ↵
    1. Kusano K,
    2. Naito T,
    3. Handa N,
    4. Kobayashi I
    (1995) Restriction-modification systems as genomic parasites in competition for specific sequences. Proc Natl Acad Sci USA 92(24):11095–11099
    .
    OpenUrlAbstract/FREE Full Text
  17. ↵
    1. Didelot X, et al.
    (2011) Recombination and population structure in Salmonella enterica. PLoS Genet 7(7):e1002191
    .
    OpenUrlCrossRefPubMed
  18. ↵
    1. Doroghazi JR,
    2. Buckley DH
    (2010) Widespread homologous recombination within and between Streptomyces species. ISME J 4(9):1136–1143
    .
    OpenUrlCrossRefPubMed
  19. ↵
    1. Fraser C,
    2. Hanage WP,
    3. Spratt BG
    (2007) Recombination and the nature of bacterial speciation. Science 315(5811):476–480
    .
    OpenUrlAbstract/FREE Full Text
  20. ↵
    1. Budroni S, et al.
    (2011) Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc Natl Acad Sci USA 108(11):4494–4499
    .
    OpenUrlAbstract/FREE Full Text
  21. ↵
    1. Croucher NJ, et al.
    (2014) Diversification of bacterial genome content through distinct mechanisms over different timescales. Nat Commun 5:5471
    .
    OpenUrlCrossRefPubMed
  22. ↵
    1. Nandi T, et al.
    (2015) Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles. Genome Res 25(1):129–141, and erratum (2015) 25(4):608
    .
  23. ↵
    1. Roberts GA, et al.
    (2013) Impact of target site distribution for Type I restriction enzymes on the evolution of methicillin-resistant Staphylococcus aureus (MRSA) populations. Nucleic Acids Res 41(15):7472–7484
    .
    OpenUrlAbstract/FREE Full Text
  24. ↵
    1. Chan CX,
    2. Beiko RG,
    3. Ragan MA
    (2006) Detecting recombination in evolving nucleotide sequences. BMC Bioinformatics 7:412
    .
    OpenUrlCrossRefPubMed
  25. ↵
    1. Csurös M
    (2010) Count: Evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26(15):1910–1912
    .
    OpenUrlAbstract/FREE Full Text
  26. ↵
    1. Roberts RJ,
    2. Vincze T,
    3. Posfai J,
    4. Macelis D
    (2010) REBASE—a database for DNA restriction and modification: Enzymes, genes and genomes. Nucleic Acids Res 38(Database issue):D234–D236
    .
  27.
    1. Johnston C,
    2. Martin B,
    3. Fichant G,
    4. Polard P,
    5. Claverys JP
    (2014) Bacterial transformation: Distribution, shared mechanisms and divergent control. Nat Rev Microbiol 12(3):181–196
    .
  28.
    1. Treangen TJ,
    2. Rocha EP
    (2011) Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet 7(1):e1001284
    .
  29.
    1. Cordero OX,
    2. Hogeweg P
    (2009) The impact of long-distance horizontal gene transfer on prokaryotic genome size. Proc Natl Acad Sci USA 106(51):21748–21753
    .
  30.
    1. Touchon M,
    2. Rocha EP
    (2007) Causes of insertion sequences abundance in prokaryotic genomes. Mol Biol Evol 24(4):969–981
    .
  31.
    1. Bobay LM,
    2. Rocha EP,
    3. Touchon M
    (2013) The adaptation of temperate bacteriophages to their host genomes. Mol Biol Evol 30(4):737–751
    .
  32.
    1. Guglielmini J,
    2. Quintais L,
    3. Garcillán-Barcia MP,
    4. de la Cruz F,
    5. Rocha EP
    (2011) The repertoire of ICE in prokaryotes underscores the unity, diversity, and ubiquity of conjugation. PLoS Genet 7(8):e1002222
    .
  33.
    1. Arber W
    (2000) Genetic variation: Molecular mechanisms and impact on microbial evolution. FEMS Microbiol Rev 24(1):1–7
    .
  34.
    1. Vasu K,
    2. Nagaraja V
    (2013) Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol Mol Biol Rev 77(1):53–72
    .
  35.
    1. Corvaglia AR, et al.
    (2010) A type III-like restriction endonuclease functions as a major barrier to horizontal gene transfer in clinical Staphylococcus aureus strains. Proc Natl Acad Sci USA 107(26):11954–11958
    .
  36.
    1. Rocha EP,
    2. Cornet E,
    3. Michel B
    (2005) Comparative and evolutionary analysis of the bacterial homologous recombination systems. PLoS Genet 1(2):e15
    .
  37.
    1. Halary S,
    2. Leigh JW,
    3. Cheaib B,
    4. Lopez P,
    5. Bapteste E
    (2010) Network analyses structure genetic diversity in independent genetic worlds. Proc Natl Acad Sci USA 107(1):127–132
    .
  38.
    1. Skippington E,
    2. Ragan MA
    (2011) Lateral genetic transfer and the construction of genetic exchange communities. FEMS Microbiol Rev 35(5):707–735
    .
  39.
    1. Popa O,
    2. Hazkani-Covo E,
    3. Landan G,
    4. Martin W,
    5. Dagan T
    (2011) Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res 21(4):599–609
    .
  40.
    1. Letunic I,
    2. Bork P
    (2011) Interactive Tree of Life v2: Online annotation and display of phylogenetic trees made easy. Nucleic Acids Res 39(Web Server issue):W475–W478
    .
  41.
    1. Pruitt KD,
    2. Tatusova T,
    3. Maglott DR
    (2007) NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35(Database issue):D61–D65
    .
  42.
    1. Touchon M, et al.
    (2009) Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5(1):e1000344
    .
  43.
    1. Miele V,
    2. Penel S,
    3. Duret L
    (2011) Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinformatics 12:116
    .
  44.
    1. Guindon S, et al.
    (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol 59(3):307–321
    .
  45.
    1. Stamatakis A
    (2014) RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313
    .
  46.
    1. Schliep KP
    (2011) phangorn: Phylogenetic analysis in R. Bioinformatics 27(4):592–593
    .
  47.
    1. Edgar RC
    (2004) MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113
    .
  48.
    1. Jakobsen IB,
    2. Easteal S
    (1996) A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput Appl Biosci 12(4):291–295
    .
  49.
    1. Smith JM
    (1992) Analyzing the mosaic structure of genes. J Mol Evol 34(2):126–129
    .
  50.
    1. Bruen TC,
    2. Philippe H,
    3. Bryant D
    (2006) A simple and robust statistical test for detecting the presence of recombination. Genetics 172(4):2665–2681
    .
  51.
    1. Sawyer S
    (1989) Statistical tests for detecting gene conversion. Mol Biol Evol 6(5):526–538
    .
  52.
    1. Paradis E,
    2. Claude J,
    3. Strimmer K
    (2004) APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20(2):289–290
    .
  53.
    1. Pfeifer B,
    2. Wittelsbürger U,
    3. Ramos-Onsins SE,
    4. Lercher MJ
    (2014) PopGenome: An efficient Swiss army knife for population genomic analyses in R. Mol Biol Evol 31(7):1929–1936
    .
  54.
    1. Wolf YI,
    2. Makarova KS,
    3. Yutin N,
    4. Koonin EV
    (2012) Updated clusters of orthologous genes for Archaea: A complex ancestor of the Archaea and the byways of horizontal gene transfer. Biol Direct 7:46
    .
  55.
    1. Cohen O,
    2. Pupko T
    (2010) Inference and characterization of horizontally transferred gene families using stochastic mapping. Mol Biol Evol 27(3):703–713
    .
  56.
    1. Enright AJ,
    2. Van Dongen S,
    3. Ouzounis CA
    (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30(7):1575–1584
    .
  57.
    1. Katoh K,
    2. Standley DM
    (2014) MAFFT: Iterative refinement and additional methods. Methods Mol Biol 1079:131–146
    .
  58.
    1. Gouy M,
    2. Guindon S,
    3. Gascuel O
    (2010) SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 27(2):221–224
    .
  59.
    1. Finn RD,
    2. Clements J,
    3. Eddy SR
    (2011) HMMER Web server: Interactive sequence similarity searching. Nucleic Acids Res 39(Web Server issue):W29–W37
    .
  60.
    1. Furuta Y,
    2. Kobayashi I
    (2012) Mobility of DNA sequence recognition domains in DNA methyltransferases suggests epigenetics-driven adaptive evolution. Mob Genet Elements 2(6):292–296
    .
  61.
    1. Didelot X,
    2. Falush D
    (2007) Inference of bacterial microevolution using multilocus sequence data. Genetics 175(3):1251–1266
    .
Regulation of genetic flux between bacteria by restriction–modification systems
Genetic flux and restriction–modification systems
Pedro H. Oliveira, Marie Touchon, Eduardo P. C. Rocha
Proceedings of the National Academy of Sciences May 2016, 113 (20) 5658-5663; DOI: 10.1073/pnas.1603257113


Article Classifications

  • Biological Sciences
  • Evolution

Copyright © 2021 National Academy of Sciences. Online ISSN 1091-6490