Cryo-EM structure and in vitro DNA packaging of a thermophilic virus with supersized T=7 capsids

Significance
Understanding molecular events during virus assembly and genome packaging is important for understanding viral life cycles, and the functioning of other protein–nucleic acid machines. The model system developed for the thermophilic bacteriophage P23-45 offers advantages over other systems. Cryo-EM reconstructions reveal modifications to a canonical capsid protein fold, resulting in capsids that are abnormally large for this virus class. Structural information on the portal protein, through which the genome is packaged, demonstrates that the capsid influences the portal’s conformation. This has implications for understanding how processes inside and outside the capsid can be coordinated.


Bacteriophage propagation and capsid isolation
A 400 ml volume of TB medium (0.8 % w/v Tryptone, 0.4 % w/v NaCl, 0.2 % w/v Yeast extract) was inoculated with 15 ml of an overnight (∼16 hour) culture of Thermus thermophilus HB8 and grown for approximately 3 hours at 70 °C and 200 rpm, until an A600 of 0.2 had been reached. The culture was infected with P23-45 by addition of 400 µl of a 2 × 10^9 pfu/ml stock, and after a further 2 hours the culture had fully lysed, with an A600 of 0.05. The lysate was moved to [...] Concentrations were estimated by A280 measurements and sample quality was assessed by negative-staining transmission electron microscopy, SDS-PAGE, and mass spectrometry.

Portal protein purification
The portal protein gene for bacteriophage G20c encoding residues 1-438 was cloned into vector pET22b between the NdeI and XhoI restriction sites, with a C-terminal His-tag downstream. Protein was expressed in E. coli BL21 (DE3) at 16 °C for 18 hours by induction with 1 mM IPTG. Cells were lysed in 20 mM imidazole, 50 mM Tris-HCl pH 7.5, 1 M NaCl, 5 % v/v glycerol. The lysate was applied to a 5 ml HisTrap FF column and eluted with a linear gradient to 500 mM imidazole. Fractions containing the target protein were pooled and loaded onto a Superose 6 10/30 column in 20 mM Tris-HCl pH 7.5, 1 M NaCl, 5 % v/v glycerol. Fractions containing the target protein were pooled and concentrated in a 100 kDa MWCO centrifugal concentrator to 20 mg/ml (39.8 μM).

Large terminase purification
The P23-45 large terminase gp85 construct was fused to an N-terminal SUMO-His6 tag. The gene was codon-optimised for expression in E. coli, synthesized by ThermoFisher Scientific, and cloned into vector pET151-DTOPO between TOPO sites, downstream of a T7 promoter and [...] Concentrations were estimated by A280 measurement. Whilst enzyme activity was ultimately confirmed by the DNA packaging assay, nuclease activity was initially confirmed by digestion of pUC18 plasmid DNA visualised by agarose gel electrophoresis, and ATPase activity was confirmed using the EnzChek phosphate assay kit as per the manufacturer's instructions.
Image processing was performed using RELION 2.10 (2). Movie frame alignment was performed with MotionCor2 1.0.2 (3), using 5 × 5 patches and a B-factor of 150, without grouping or binning frames. CTF estimation was performed using CTFFIND4.1 (4) without dose-weighting, and particles were extracted from dose-weighted motion-corrected micrographs. The box size for the procapsid was 972 pixels and for the expanded capsid was 1032 pixels. Particles were downsampled by a factor of 1.5, resulting in a pixel size of 1.5975 Å/pixel. Particles were subsequently 2D classified and subsets were selected for 3D reconstructions. An initial reference model was generated from particles with randomised orientations, resulting in a spherical model. Subsequent cycles of refinement used an icosahedral capsid reference low-pass filtered to 60 Å. Refinement was carried out in RELION 2.10 using 3D auto-refine, imposing icosahedral symmetry. A total of 38,044 particles contributed to the icosahedral procapsid reconstruction, and 2,372 particles contributed to the icosahedral expanded capsid reconstruction. For the expanded capsid, CTF estimates were additionally refined using CTFRefine in RELION 3.0.
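The sampling figures quoted for particle extraction follow from simple arithmetic. The sketch below assumes that the quoted box sizes refer to the original (not yet downsampled) pixels, which the text does not state explicitly; the function names are illustrative and are not part of RELION.

```python
"""Sampling arithmetic for downsampled cryo-EM particles (illustrative only)."""

DOWNSAMPLE_FACTOR = 1.5        # downsampling factor quoted in the text
DOWNSAMPLED_PIXEL_A = 1.5975   # pixel size after downsampling (Å/pixel), from the text
BOX_SIZES_PX = {"procapsid": 972, "expanded capsid": 1032}  # from the text


def original_pixel_size(downsampled_pixel_a, factor):
    """Pixel size before downsampling, in Å/pixel."""
    return downsampled_pixel_a / factor


def downsampled_box(box_px, factor):
    """Box edge in pixels after downsampling (assumes box_px is the original box)."""
    return round(box_px / factor)


def nyquist_limit(pixel_a):
    """Finest resolution representable at a given pixel size, in Å."""
    return 2.0 * pixel_a


if __name__ == "__main__":
    pixel = original_pixel_size(DOWNSAMPLED_PIXEL_A, DOWNSAMPLE_FACTOR)
    print(f"original pixel size: {pixel:.4f} Å/pixel")
    for name, box in BOX_SIZES_PX.items():
        small = downsampled_box(box, DOWNSAMPLE_FACTOR)
        print(f"{name}: {box} px -> {small} px ({box * pixel:.1f} Å across)")
    print(f"Nyquist after downsampling: {nyquist_limit(DOWNSAMPLED_PIXEL_A):.3f} Å")
```

Under this assumption the Nyquist limit after downsampling sits just below 3.2 Å, so the downsampling does not by itself cap the reported resolutions.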
Postprocessing was carried out using a soft-edged mask, supplying the MTF file of the detector.
The postprocessing resolution of the procapsid was 4.39 Å and for the expanded capsid was 3.74 Å, according to the 0.143 FSC criterion (5). Data collection and refinement statistics are summarized in the table below (Table S1). Figures were generated in UCSF Chimera (6).
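For readers unfamiliar with the 0.143 FSC criterion, a minimal NumPy sketch of a Fourier shell correlation between two half-maps is given here. It is not RELION's postprocessing implementation: masking, MTF correction and phase randomisation are all omitted, and the function names are assumptions made for illustration.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation per integer shell for two cubic half-maps."""
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)

    # Integer shell index (radius in Fourier pixels) for every voxel.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2
    keep = shell < n_shells          # discard the corners beyond Nyquist
    idx = shell[keep]

    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep], minlength=n_shells)
    power1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    power2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)

    denom = np.sqrt(power1 * power2)
    # Shells with no power are reported as 0 rather than dividing by zero.
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (Å) of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size      # never drops: limited by Nyquist
    first_shell = below[0] + 1
    return box_size * pixel_size / first_shell
```

Shell `s` corresponds to a spatial frequency of `s / (box_size * pixel_size)`, which is why the resolution is the reciprocal of that quantity.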

Reconstructions depicting the portal protein
In the icosahedral reconstructions calculated using RELION a faint density of the portal protein was apparent at each vertex. The relion_particle_symmetry_expand function was used to generate 60 icosahedral symmetry-related orientations per particle. Particles were re-extracted with downsampling by a factor of 6 and subjected to 3D classification into 10 classes without changing orientations or imposing symmetry (C1), with a high resolution limit of 12 Å, using a soft-edged mask centered at one vertex. A soft-edged mask was generated using the portal protein crystal structure applying a low-pass filter of 60 Å and expanding the mask edge at a low threshold. After 3D classification, one class contained density for the portal protein. Particles from this class were re-extracted, downsampling by a factor of 1.5 from the original micrographs.
Subsequent rounds of 3D classification without changing orientations, with a high-resolution limit of 5-8 Å, resolved the symmetry mismatch between the portal and procapsid vertex. In the expanded capsid, C5 symmetry was imposed in the final reconstruction. Where more than one symmetry-related orientation of the same original particle remained, only the particle with the highest LogLikeliContribution value was retained in a new version of the ".star" file. Particles were split into half-sets and reconstructed without changing orientations, using C1 symmetry for the procapsid and imposing C5 symmetry for the expanded capsid. A total of 16,758 particles contributed to the C1 procapsid reconstruction, and 902 particles contributed to the C5 expanded capsid reconstruction. Postprocessing was carried out using a soft-edged mask, supplying the MTF file of the detector. The postprocessing resolution for the asymmetric procapsid was 9.33 Å and for the C5 expanded capsid was 9.56 Å, according to the 0.143 FSC criterion (5) (Table S1).
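The step of retaining a single symmetry-related orientation per original particle can be sketched as follows. The sketch assumes that the ".star" file has already been parsed into one dictionary per row and that symmetry-expanded copies of a particle share the same rlnImageName; star-file parsing and the function name are not taken from the original work.

```python
def keep_best_orientation(rows,
                          id_key="rlnImageName",
                          score_key="rlnLogLikeliContribution"):
    """Keep, for each original particle, the row with the highest score.

    rows: iterable of dicts, one per star-file row (parsing not shown).
    Returns the surviving rows in order of first appearance of each particle.
    """
    best = {}
    for row in rows:
        particle = row[id_key]
        current = best.get(particle)
        if current is None or float(row[score_key]) > float(current[score_key]):
            best[particle] = row
    return list(best.values())


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = [
        {"rlnImageName": "000001@mic_a.mrcs", "rlnLogLikeliContribution": "10.5"},
        {"rlnImageName": "000001@mic_a.mrcs", "rlnLogLikeliContribution": "12.0"},
        {"rlnImageName": "000002@mic_a.mrcs", "rlnLogLikeliContribution": "9.1"},
    ]
    for row in keep_best_orientation(example):
        print(row["rlnImageName"], row["rlnLogLikeliContribution"])
```

Keeping only one copy per particle prevents the same image from contributing to a half-map more than once.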

Capsid structure modelling and analysis
A section of the expanded capsid map was extracted and segmented in UCSF Chimera using Segger (6, 7) for initial modelling. An atomic model was initially built using Phenix Map to Model, with manual rebuilding in Coot (8) followed by refinement using Phenix Real Space Refine (9).
The resulting model for the expanded capsid was fitted into the procapsid map and manually adjusted in Coot, followed by refinement with Phenix. Refinement statistics are summarized in the table below (Table S1). For internal capsid volume measurement, capsid maps were inverted in EMAN2 (10) and low-pass filtered to 20 Å, and the capsid core was segmented using [...] the PDB was first converted to a volume (.mrc) in UCSF Chimera.

Portal protein crystal structure determination
For crystallization, the sitting-drop vapor diffusion method was used. Protein concentration was 20 mg/ml (39.8 μM). In addition, a 42-residue synthesized peptide (NovoPro), corresponding to the C-terminal segment of the large terminase protein gp85, was added to a final concentration of 2 mg/ml (413 μM) in drops, at a ratio of 100 nl protein : 50 nl peptide : 150 nl reservoir solution. Crystals formed in 40 % v/v MPD, 200 mM NaH2PO4. X-ray data were collected at Diamond Light Source, Oxfordshire, UK (Table S2). The structure was determined by molecular replacement using the CCP4 suite of programs (11). Molecular replacement was carried out using the coordinates available for an N-terminally truncated version of the portal protein (PDB code 4ZJN). The structure was refined by REFMAC, with manual rebuilding performed in Coot (Table S2). Data collection and refinement statistics are summarized in the table below.

Portal protein Normal Mode Analysis
Normal modes were computed using an Elastic Network Model (ENM) in which only the motions of the Cα atoms of the protein backbone were considered; all other atoms were removed from the atomic coordinates beforehand. An 8 Å cut-off was used for the interactions, and the 100 lowest-frequency modes were computed. Modes of interest occurred within the 3 lowest-frequency modes.
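As an illustration of this kind of calculation, the sketch below builds an anisotropic network model from Cα coordinates and diagonalises its Hessian. The choice of an anisotropic model with a uniform spring constant is an assumption (the text specifies only an ENM with an 8 Å cut-off), and the function name is hypothetical.

```python
import numpy as np


def enm_normal_modes(ca_coords, cutoff=8.0, n_modes=100, gamma=1.0):
    """Lowest-frequency non-rigid modes of an anisotropic network model.

    ca_coords: (N, 3) array-like of C-alpha coordinates in Å.
    Returns (eigenvalues, eigenvectors); eigenvectors has shape (3N, k),
    with k = min(n_modes, 3N - 6).
    """
    coords = np.asarray(ca_coords, dtype=float)
    n = coords.shape[0]
    hessian = np.zeros((3 * n, 3 * n))

    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue                      # no spring beyond the cut-off
            block = -gamma * np.outer(d, d) / dist2
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, sj] = block           # off-diagonal super-elements
            hessian[sj, si] = block
            hessian[si, si] -= block          # diagonal = minus the row sum
            hessian[sj, sj] -= block

    eigvals, eigvecs = np.linalg.eigh(hessian)   # ascending eigenvalues
    # The first six modes are rigid-body translations and rotations
    # (zero eigenvalue, assuming a connected and rigid network): skip them.
    return eigvals[6:6 + n_modes], eigvecs[:, 6:6 + n_modes]
```

The double loop is quadratic in the number of residues, which is acceptable for a sketch; a neighbour search would be preferable for a large oligomeric assembly.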

Bacteriophage Genome Termini Analysis
DNA was isolated from a concentrated and purified high-titer P23-45 stock by incubation in 0.5 % w/v SDS, 20 mM EDTA and 50 µg/ml proteinase K at 56 °C for 1-3 h. The DNA was extracted with phenol-chloroform and precipitated with ethanol supplemented with sodium acetate (12).
Genome sequencing was performed on an Illumina MiSeq in paired-end mode with 250-bp reads, according to the manufacturer's protocols (13). The generated reads were analyzed with the PhageTerm software (14) on a Galaxy-based server (https://galaxy.pasteur.fr) with default parameters.

Densitometric analysis of agarose gels
Gel images were analyzed using ImageJ software (imagej.nih.gov/ij/). Regions containing bands were boxed in separate lanes using the rectangular selection tool, and analyzed using the 'Plot lanes' tool. Areas under the curve were measured and normalized against the input DNA (control) lane.
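The normalization step can be written out explicitly. The sketch below is not ImageJ code: it assumes that each lane's intensity profile has been exported as a list of numbers, integrates it with the trapezoid rule, and divides by the area of the input DNA lane. Lane names and profile values are made up for illustration.

```python
def area_under_curve(profile, baseline=0.0):
    """Trapezoid-rule area of a lane profile above a flat baseline (unit spacing)."""
    values = [max(v - baseline, 0.0) for v in profile]
    return sum((a + b) / 2.0 for a, b in zip(values, values[1:]))


def normalise_to_input(profiles, input_lane, baseline=0.0):
    """Return each lane's band area as a fraction of the input (control) lane."""
    areas = {lane: area_under_curve(p, baseline) for lane, p in profiles.items()}
    reference = areas[input_lane]
    if reference <= 0:
        raise ValueError("input lane has no measurable band")
    return {lane: area / reference for lane, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical profiles: intensity along each boxed lane region.
    profiles = {
        "input": [0, 2, 4, 2, 0],
        "sample": [0, 1, 2, 1, 0],
    }
    for lane, fraction in normalise_to_input(profiles, "input").items():
        print(f"{lane}: {fraction:.2f} of input")
```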