On the direct determination of three-dimensional crystallographic phases at low resolution: Crambin at 6 Å

Communicated by Herbert Hauptman, Hauptman-Woodward Medical Research Institute, Buffalo, NY (received for review July 29, 1999)
Abstract
Using a pseudoatom approach, the three-dimensional crystallographic phases for the protein crambin (a = 40.76, b = 18.49, c = 22.33 Å, β = 90.61°, space group P2₁) were determined to 6 Å by direct methods. First, the centrosymmetric h0ℓ set was assigned phases by symbolic addition, and the initial solution was refined by Fourier methods. Phase values of strong reflections were then permuted, and the decision to change the phase values of two of these was made by consulting a cross-correlation of the experimental density histogram with the theoretical or known histogram for the protein. The two-dimensional basis was then extended into three dimensions by the Sayre equation, after assigning a phase to a third allowed hkℓ origin-defining reflection and an algebraic value to an axial reflection. The correct solution was again identified by the histogram correlation, yielding a solution in which the mean phase error was 61.5° for all 98 reflections and 23.1° for the 21 most intense reflections. A parallel study with another protein indicates that this method may have general utility.
Significant strides have recently been made in solving the crystal structures of proteins at atomic resolution by direct methods of crystallographic phase determination. "Atomic resolution" may denote either the true resolution of all atomic components of the polypeptide backbone (1) or the resolution of just the heavy atoms used, e.g., to obtain an anomalous scattering signal (2), because what seems to matter is only that the Rayleigh criterion for resolving these scattering entities is satisfied.
There is also considerable interest in attacking the phase problem for macromolecules at low resolution. One reason is that an accurate phase assignment to relatively high-resolution data can sometimes lead to an ambiguous definition of the molecular envelope boundaries (3), particularly if the low-resolution diffraction maxima are unrecorded or not assigned accurate phase values. Although conventional direct phasing methods seem to retain their validity within the first scattering intensity envelope (4) (e.g., to a resolution near 6 Å, where the average intensity has a nodal value near zero), the phase problem in this region remains a challenge. For example, at low resolution there are no conservative geometric rules relating density regions in the macromolecule that are comparable to the chemical bonding constraints between atoms. Thus, one cannot readily identify which trial solution is "correct" just from its appearance.
Partial success in two-dimensional ab initio phase determination at low resolution (e.g., with electron crystallographic data) has resulted from several approaches to the problem, including maximum entropy and likelihood techniques (5, 6). Globular approximations have also been successful in two ways. In x-ray crystallography, random glob generators coupled with a suitable figure of merit have determined molecular envelopes from three-dimensional data after clustered trial solutions were averaged (7). A reciprocal space approach has also been explored: from a pseudoatomic distribution of density in the protein, the unit cell was rescaled so that an atomic scattering factor could be used to normalize the observed intensity data, following a suggestion made by Harker (8). This approach has also been found effective for phasing two-dimensional electron diffraction data sets from proteins with a large α-helix content (9–12), where the atomistic assumption has an obvious application. In this paper, the extension of this methodology to three dimensions is explored with x-ray data for the phase determination of crambin at 6 Å.
Data and Methods
Diffraction Data.
Calculated structure factor amplitudes and phases for the protein crambin (13) ($M_r$ = 6,287.82) were taken from an atomic model that accurately simulates the hydration as well as the protein structure, so that the density of the crystal is well modeled (1, 14). The space group is P2₁, with a = 40.76, b = 18.49, c = 22.33 Å, β = 90.61°. There are 98 unique hkℓ diffraction maxima within the 6 Å resolution limit used in this study. In a parallel study, 6 Å data (104 reflections) from monoclinic rubredoxin (15) (P2₁: a = 19.97, b = 41.45, c = 24.41 Å, β = 108.39°, $M_r$ = 7,465.23), a model with accurately predicted solvent structure (14), were treated similarly.
It was assumed that the distribution of density could be simulated by pseudoatomic globs and that, as shown earlier (9–12), the Fourier transform of these globular subunits could be modeled by an atomic scattering factor after a 10-fold reduction of the unit cell dimensions. (For the [010] projection, a pseudoatom model with coordinates chosen at identified glob centers fit the published phases with a mean error of 28.4°; for the [001] projection, the mean error of 15.3° was somewhat better. For rubredoxin, the corresponding values are 18.0° for [010] and 29.1° for [001].) In this case, the electron scattering factor for carbon (16) was used as the model for the glob transform. Normalized structure factors (17) were therefore generated from $|E_h|^2 = I_h/(\varepsilon \sum f_c^2)$. The atomic scattering factor was also used for (zonal) structure factor calculations during Fourier refinement.
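The normalization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in this study: all function names are hypothetical, and the single-Gaussian `gaussian_glob_factor` merely stands in for the tabulated carbon electron scattering factor (16), whose coefficients are not reproduced here. The monoclinic 1/d² expression (b unique) and the P2₁ ε factor (2 for 0k0 reflections on the twofold axis, 1 otherwise) are standard; the number of globs `n_globs` is an assumed input.

```python
import math

def inv_d_squared(h, k, l, a, b, c, beta_deg):
    """1/d^2 (in 1/Å^2) for a monoclinic cell with b as the unique axis."""
    beta = math.radians(beta_deg)
    in_plane = (h * h / (a * a) + l * l / (c * c)
                - 2.0 * h * l * math.cos(beta) / (a * c))
    return in_plane / math.sin(beta) ** 2 + k * k / (b * b)

def epsilon_p21(h, k, l):
    """Epsilon factor for P2_1: 2 for 0k0 reflections on the twofold axis, else 1."""
    return 2 if h == 0 and l == 0 and k != 0 else 1

def gaussian_glob_factor(s, b_glob=10.0):
    """HYPOTHETICAL stand-in for the carbon scattering factor: exp(-b_glob * s^2).
    The value of b_glob is arbitrary and only for illustration."""
    return math.exp(-b_glob * s * s)

def normalized_e(intensity, hkl, cell, n_globs, shrink=10.0, f=gaussian_glob_factor):
    """|E_h| from |E_h|^2 = I_h / (epsilon * sum f^2), f evaluated in the shrunken cell."""
    h, k, l = hkl
    a, b, c, beta = cell
    # Shrinking every cell edge by `shrink` multiplies 1/d, and hence
    # s = sin(theta)/lambda = 1/(2d), by the same factor.
    s = shrink * math.sqrt(inv_d_squared(h, k, l, a, b, c, beta)) / 2.0
    sum_f_sq = n_globs * f(s) ** 2  # assumes all globs share one scattering factor
    return math.sqrt(intensity / (epsilon_p21(h, k, l) * sum_f_sq))
```

For example, `normalized_e(I, (3, 0, 3), (40.76, 18.49, 22.33, 90.61), n_globs=N)` would use the crambin cell given above, with `I` and `N` supplied by the user. At 6 Å, the 10-fold reduction places s near 0.83 Å⁻¹, a range covered by tabulated atomic scattering factors.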
Phase Determination.
The Σ₁ phase relationships and Σ₂ three-phase structure invariants were then generated from the three-dimensional normalized structure factors. For space group P2₁ (b axis unique), the phase relationships of symmetry-equivalent reflections are given in the International Tables for X-ray Crystallography (18). The minimal information required for a basis set (19) from which phases could be assigned to all other reflections was assessed by a convergence test (20). By symbolic addition (21), the centrosymmetric (h0ℓ) data were then assigned phase values [including two permissible origin-defining reflections (19), accepting some Σ₁ estimates, and assigning some algebraic values], followed by Fourier refinement. The most intense reflections in the final phase list were then permuted, and several figures of merit (see below) were used to evaluate the need for further changes to this zonal set. After completion, the strongest reflections were used as a basis for expansion into the three-dimensional data set via the Sayre equation (22): $F_h = (\theta/V)\sum_k F_k F_{h-k}$. A third origin-defining reflection (19) was then fixed, as well as an algebraic phase term to be permuted. For the Sayre expansions, a correct value was given for $F_{000}$, but an estimate that minimized the negative density of the ensuing electron density maps could be used just as well to stabilize the convolution (23).
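The symmetry relations and the Sayre convolution used for the expansion can be illustrated with a short sketch. It assumes the 2₁ axis lies along b through x = z = 0, so that ϕ(h̄kℓ̄) = ϕ(hkℓ) + kπ, combined with Friedel's law ϕ(h̄k̄ℓ̄) = −ϕ(hkℓ); phases are in radians, θ/V is treated as a single constant `theta_over_v`, and all helper names are hypothetical. This is not the authors' code.

```python
import cmath
import math

def p21_mates(hkl, phi):
    """Phases (radians) of reflections related to (h, k, l) in P2_1 (b unique).

    Assumes the screw axis along b passes through x = z = 0, so that
    phi(-h, k, -l) = phi(h, k, l) + k*pi; Friedel's law gives the other two.
    For h0l reflections the phases must be centric (0 or pi).
    """
    h, k, l = hkl
    shift = math.pi * (k % 2)
    return {
        (h, k, l): phi,
        (-h, k, -l): phi + shift,
        (-h, -k, -l): -phi,
        (h, -k, l): -phi - shift,  # Friedel mate of (-h, k, -l)
    }

def fill_sphere(basis):
    """Expand {hkl: (|F|, phase)} into complex F for all symmetry and Friedel mates."""
    full = {}
    for hkl, (amp, phi) in basis.items():
        for mate, p in p21_mates(hkl, phi).items():
            full[mate] = amp * cmath.exp(1j * p)
    return full

def sayre_estimate(target, full, theta_over_v=1.0):
    """F_h ~ (theta/V) * sum_k F_k F_{h-k}, over pairs k, h - k both present in `full`."""
    total = 0j
    for k_vec, f_k in full.items():
        diff = tuple(t - kk for t, kk in zip(target, k_vec))
        if diff in full:
            total += f_k * full[diff]
    return theta_over_v * total
```

With a positive `theta_over_v`, the estimated phase is `cmath.phase(sayre_estimate(hkl, full))`; the constant affects only the magnitude. A real $F_{000}$ can be placed in the basis as `(0, 0, 0): (F000, 0.0)`, where it contributes whenever the target itself is already in the basis.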
Figures of Merit.
It was often necessary to pick an optimal phase solution from two or more choices, and several figures of merit were evaluated for this purpose. One that has often been used is the Luzzati (24) test for electron density map flatness, $\langle \Delta\rho^4 \rangle$, where $\Delta\rho = \rho - \bar\rho$. (Ideally, the best phase solution minimizes this figure of merit.) When $F_{000} = 0$ is used to calculate these maps, the average term $\bar\rho$ is also zero. With an atomistic estimate of glob centers used for a structure factor calculation, a Patterson correlation coefficient (25), $\sum m_o m_c/(\sum m_o^2 \sum m_c^2)^{1/2}$, where $m_o = |F_o|^2 - \langle |F_o|^2 \rangle$ (and similarly for $m_c$), was also evaluated. (The subscripts o and c denote "observed" and "calculated" values, respectively.) Finally, it was assumed that the density histogram (26) $v(t)$ for this protein could also be determined a priori (4, 7, 27). Its expected appearance at low resolution (7, 26, 27) is shown in Fig. 1a. Given an observed density histogram $v(t)$ from the electron density map calculated for any phase solution, a figure of merit is immediately suggested: the cross-correlation function (28) $\psi_{12}(\tau) = \int v_1(t+\tau)\,v_2(t)\,dt$ of an experimental histogram (trial phase set) with the expected distribution should approach the autocorrelation function $\psi(\tau) = \int v(t+\tau)\,v(t)\,dt$ (Fig. 1b) as the phase error decreases. Although the autocorrelation function is maximally peaked at zero displacement, a test for skewness of the cross-correlation function, comparing the normalized sum of differences between $\psi(\tau)$ and $\psi(-\tau)$, was more useful. The minimum skewness was sought because this sum of differences is zero for the mirror-symmetric autocorrelation function. This criterion is the reverse of the one used to predict the correctness of an experimental density histogram, where skewness is a desirable property (29).
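The first two figures of merit can be written in a few lines. The sketch below assumes that a map is supplied as a flat list of density values sampled on a grid and that the amplitudes are given as matched lists of |F_o| and |F_c|; the function names are hypothetical.

```python
import math

def luzzati_fourth_moment(rho):
    """<(rho - rho_bar)^4> over the map grid points (the flatness measure in the text)."""
    rho_bar = sum(rho) / len(rho)
    return sum((r - rho_bar) ** 4 for r in rho) / len(rho)

def patterson_cc(f_obs, f_calc):
    """sum(m_o m_c) / sqrt(sum m_o^2 * sum m_c^2), with m = |F|^2 - <|F|^2>."""
    i_obs = [abs(f) ** 2 for f in f_obs]
    i_calc = [abs(f) ** 2 for f in f_calc]
    mean_o = sum(i_obs) / len(i_obs)
    mean_c = sum(i_calc) / len(i_calc)
    m_o = [x - mean_o for x in i_obs]
    m_c = [x - mean_c for x in i_calc]
    num = sum(a * b for a, b in zip(m_o, m_c))
    return num / math.sqrt(sum(a * a for a in m_o) * sum(b * b for b in m_c))
```

Because the correlation coefficient is insensitive to an overall scale on either intensity set, no relative scaling of $|F_o|$ and $|F_c|$ is needed before it is evaluated.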
The appearance of the histograms for maps generated with different amounts of phase error has been depicted by Lunin (27): increasing phase error leads to a more symmetric, Gaussian-like distribution. The histogram for crambin based on phased 6 Å x-ray data actually resembled a theoretical case in which slight phase error was present (27).
Results
For the 10 h0ℓ Σ₂ phase invariant sums with the largest values of $A = (2/\sqrt{N})|E_h||E_k||E_{-h-k}|$, the average value of $\varphi = \phi_h + \phi_k + \phi_{-h-k}$ is 72°, whereas the value 0° is expected (17). Nevertheless, the origin was defined by setting ϕ(303) = 0 and ϕ(20 3̄) = 0. Four Σ₁ estimates were also accepted: namely, a 0° estimate for (402) and a 180° estimate for reflections (400), (20 2̄), and (40 2̄). Two reflections, (40 3̄) and (302), were assigned algebraic values. This generated four phase sets with 16 terms, from which electron density maps were calculated while monitoring the Luzzati (24) figure of merit for density flatness. Coordinates of globs in the maps from the two solutions with the lowest values of $\langle \Delta\rho^4 \rangle$ lay near one another, so these were averaged for the structure factor calculation to estimate the complete h0ℓ phase list. This was followed by three cycles of Fourier refinement. In this phase set, 11 of the 38 reflections had incorrect phase assignments [including the shift of the weak ϕ(303) term], most of them medium- or weak-intensity reflections. The phases of the most intense reflections were then individually permuted, and the asymmetry of the test cross-correlation functions $\psi_{12}(\tau)$ was evaluated after histograms were obtained for the ensuing h0ℓ maps (including the $F_{000}$ term in the calculation). A separate symbolic phasing of hk0 data had also been carried out. Because it indicated an ambiguous assignment for an intense reflection common to both projections, the ϕ(300) phase term was shifted from 0 to 180° to test for the best solution based on the properties of $\psi_{12}(\tau)$. After this test, which resulted in a phase shift, the next reflection most likely to require a phase shift was identified in the same way: i.e., the ϕ(401) value was changed from 0 to 180°. As shown in Table 1, no phase errors remained for reflections with $|F_h| \geq 200.0$.
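Ranking Σ₂ triples by A and averaging the magnitudes of their triplet phase sums, as was done for the 10 strongest h0ℓ invariants above, might look like the following. The pseudoatom count N (`n_atoms`) is an assumed input, phases are in degrees, and the helper names are illustrative only.

```python
import math

def sigma2_a(e_h, e_k, e_mhk, n_atoms):
    """Cochran's A = (2 / sqrt(N)) |E_h| |E_k| |E_-h-k|."""
    return 2.0 / math.sqrt(n_atoms) * abs(e_h) * abs(e_k) * abs(e_mhk)

def triplet_sum(phi_h, phi_k, phi_mhk):
    """phi_h + phi_k + phi_-h-k in degrees, wrapped into (-180, 180]."""
    s = (phi_h + phi_k + phi_mhk) % 360.0
    return s - 360.0 if s > 180.0 else s

def mean_abs_triplet(triplets, n_atoms, top=10):
    """Mean |triplet sum| for the `top` triplets ranked by A.

    Each triplet is ((E_h, phi_h), (E_k, phi_k), (E_-h-k, phi_-h-k)).
    """
    ranked = sorted(triplets,
                    key=lambda t: sigma2_a(t[0][0], t[1][0], t[2][0], n_atoms),
                    reverse=True)[:top]
    return sum(abs(triplet_sum(t[0][1], t[1][1], t[2][1])) for t in ranked) / len(ranked)
```

A value near 0° would indicate that the strongest invariants are reliable; a mean of 72°, as found here, shows that many of them are not.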
These 12 largest phased structure factors were reserved for a basis set to be expanded by the Sayre equation. The electron density maps for this projection are compared in Fig. 2. (For rubredoxin, symbolic addition finds 13 entirely correct phases for the h0ℓ data; Fourier refinement yields only 3 errors for all 20 predicted phases, all associated with weak reflections.)
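Error counts and mean phase errors of the kind quoted here and in Table 2 compare a trial phase list with the published phases. A minimal sketch, assuming both lists are dictionaries keyed by (h, k, ℓ) with phases in degrees; the names are hypothetical.

```python
def phase_diff(a, b):
    """|a - b| in degrees, folded into [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d

def mean_phase_error(trial, reference):
    """Mean phase error over reflections phased in both {hkl: phase_deg} dicts."""
    common = [hkl for hkl in trial if hkl in reference]
    return sum(phase_diff(trial[h], reference[h]) for h in common) / len(common)

def count_errors(trial, reference, tolerance=90.0):
    """Number of common reflections whose phase is off by more than `tolerance` degrees.
    For centric (0 or 180 deg) phases, any error is exactly 180 deg."""
    return sum(1 for h in trial
               if h in reference and phase_diff(trial[h], reference[h]) > tolerance)
```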
For the three-dimensional phase expansion via the Sayre equation (22), the value of a third origin-defining phase, ϕ(312) = −158.9°, as permitted by the space group (19), was included. [This value, taken from the previous x-ray determination (1), was used only to facilitate comparison with the published phase values.] In addition, an algebraic term for ϕ(020), whose actual phase value is 141.1°, was permuted through ±45° and ±135° to generate four three-dimensional phase sets. Selection of the best solution was again based on the optimal value of the cross-correlation function $\psi_{12}(\tau)$, using density histograms constructed from sections at y/b = 0.0 and 0.333. The minimum of the cross-correlation asymmetry $S = \sum_i |\psi(\tau_i) - \psi(-\tau_i)|/\psi(0)$ again defined the best solution unequivocally, as shown in Table 2, which also summarizes the mean phase errors for different classes of reflections. Examples of the experimental density histograms and their cross-correlation functions are shown in Fig. 1. The best phase accuracy is clearly found for the strongest reflections, but the mean phase errors for other classes of reflections are again much better than the random estimate and in accord with other favorable phase predictions for proteins at low resolution (3). Electron density maps calculated with the most intense reflections reveal that much of the polypeptide backbone is correctly covered by an electron density envelope (Fig. 3). (For rubredoxin, a similar expansion gives an overall mean error of 75.6° for all 104 phases, but only 24.8° and 56.3° for the 14 and 26 most intense reflections, respectively.)
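The histogram test can be sketched for discrete, equal-width bins, with the lag τ measured in bins and the peak origin taken at zero lag (the Discussion notes that locating this origin was not straightforward). The bin limits, the number of bins, and all function names are assumptions for illustration.

```python
def histogram(rho, n_bins, lo, hi):
    """Normalized density histogram v(t) of map values lying in [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for r in rho:
        if lo <= r <= hi:
            # values equal to `hi` go into the last bin
            counts[min(int((r - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cross_correlation(v1, v2, max_lag):
    """psi_12(tau) = sum_t v1(t + tau) v2(t) at integer bin lags -max_lag..max_lag."""
    psi = {}
    for tau in range(-max_lag, max_lag + 1):
        psi[tau] = sum(v1[t + tau] * v2[t]
                       for t in range(len(v2)) if 0 <= t + tau < len(v1))
    return psi

def asymmetry(psi, max_lag):
    """S = sum_i |psi(tau_i) - psi(-tau_i)| / psi(0); zero for a mirror-symmetric psi.
    Assumes psi(0) is nonzero."""
    return sum(abs(psi[i] - psi[-i]) for i in range(1, max_lag + 1)) / psi[0]
```

Applied to one histogram correlated with itself, `asymmetry` returns zero, matching the statement that the autocorrelation function is mirror-symmetric; a trial histogram displaced from the expected one gives a positive S.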
Discussion
The phase determination outlined above was not entirely straightforward. First, it is probably unrealistic to expect the globular model to be accurate for three-dimensional potential distributions; the best fit of a glob model would be anticipated for individual projections. Nevertheless, when the h0ℓ Σ₂ phase invariant sums were evaluated with published phase values, 4 of the 10 with the largest values of A had $\varphi = \phi_h + \phi_k + \phi_{-h-k}$ differing by 180° from the expected value. The evaluation of the hk0 reflections was somewhat more favorable: although two values of φ near 180° appeared among the top-ranked triples, these were the eighth- and ninth-ranked invariants. Among the 10 top-ranked hkℓ phase invariants, however, 5 had φ values near 180°, including the triple invariant with the largest A value. It was therefore somewhat surprising to find that the four most probable Σ₁ estimates were correct.
The second problem encountered in this phase determination is the difficulty of finding a truly robust figure of merit for identifying the best phase solution among several. This problem has also been noted by other investigators (4, 30). Although the Luzzati criterion for density flatness is often qualitatively useful, it cannot be relied upon for fine distinctions, e.g., when individual phases are being permuted. Despite favorable indications in earlier two-dimensional determinations (9–12), the Patterson correlation coefficient was also unreliable: in this study, it was not suitable for picking out the best phase solution when the h0ℓ set was refined by Fourier methods. In fact, the Luzzati figure of merit was a better criterion.
In this study, the cross-correlation of the observed density histogram with the expected one has been the most reliable means of determining the best phase set. Although an experimental histogram for crambin was used in the initial discrimination of phase solutions via cross-correlation with the trial histograms, subsequent evaluations of the three-dimensional phase sets with the ideal histogram in Fig. 1a, or with one containing a slight amount of phase error (see ref. 27), did not change the identification of the best solution. However, there was an additional problem with the evaluation of skewness in the cross-correlation functions, because it was not altogether clear where to locate the peak origin. Following the definition of correlation coefficients in signal analysis (31), the origin problem might be resolved by using the summed intensities of the Fourier transform of a correlation function, $U(f) = \sum_i I(f_i)$, in a figure of merit $F = U_{AB}(f)/U_{AA}(f)$, where $I(f_i)$ are the intensities of the frequency components at $f_i$ of either the cross-correlation function (AB) or the autocorrelation function (AA). As shown in Table 2, the maximum value of this ratio is also useful for detecting the best phase set. (A maximum value of F greater than 1.00 indicates some error in the Fourier transform calculation.)
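A discrete version of this ratio is sketched below, assuming that both correlation functions are sampled at the same set of lags and that $I(f_i)$ is the squared magnitude of the discrete Fourier transform. By Parseval's theorem, the sum of squared DFT magnitudes equals n times the sum of squares of the sampled function, so U is unchanged by a circular shift of the lag origin; this is consistent with the motivation given above. The function names are hypothetical.

```python
import cmath
import math

def spectral_power(signal):
    """U = sum_m |X(f_m)|^2, where X is the discrete Fourier transform of `signal`."""
    n = len(signal)
    total = 0.0
    for m in range(n):
        x_m = sum(signal[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
        total += abs(x_m) ** 2
    return total

def spectral_merit(psi_ab, psi_aa):
    """F = U_AB / U_AA for correlation functions sampled on the same lag grid."""
    return spectral_power(psi_ab) / spectral_power(psi_aa)
```

A direct O(n²) transform is used here for clarity; for histograms with only tens of bins its cost is negligible.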
Seemingly, the best tactic for three-dimensional phase determination at low resolution is to take great pains, first of all, to obtain the best possible phase set for a zonal projection. For three-dimensional phase determination, the Sayre equation is preferable to a symbolic addition approach because it averages over several possible contributors to a given phase; despite the inaccuracy of individual three-phase invariants, it seems to provide a useful result, especially for the strongest reflections in the data set. It may also be preferable to retain the structure factor magnitudes in this convolution rather than their normalized values, because the shape of the phenomenological scattering factor is only approximately valid, leading to inaccurate prediction of $|E_h|$ magnitudes. (This problem with the amplitude transform of the globular model is indicated by the relatively high crystallographic residuals for the h0ℓ and hk0 data sets when the carbon scattering factor approximation is used: R = 0.54 and 0.40, respectively.)
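The residuals quoted here follow the usual crystallographic definition $R = \sum ||F_o| - k|F_c|| / \sum |F_o|$. A minimal sketch, assuming a single linear scale factor $k = \sum |F_o| / \sum |F_c|$ (the scaling actually used is not stated); the function name is hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - k|Fc| | / sum |Fo|, with k = sum|Fo| / sum|Fc| (assumed scaling)."""
    sum_o = sum(abs(f) for f in f_obs)
    k = sum_o / sum(abs(f) for f in f_calc)
    return sum(abs(abs(o) - k * abs(c)) for o, c in zip(f_obs, f_calc)) / sum_o
```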
It appears, therefore, that ab initio phase determination for macromolecules at low resolution may not be an impossible goal. Tests of other representative structures in other space groups must be carried out to determine whether there are general truths to be found rather than episodically favorable outcomes; the experimental amplitudes from these proteins should also be evaluated in future work. Appraisal of optimal figures of merit, seemingly the greatest challenge now facing us for the identification of best solutions, is also a prime consideration. The evaluation of density histograms is certainly worth pursuing further. For example, a recent study (32) demonstrating that a three-dimensional low-resolution structure determination of trigonal rubredoxin might be feasible from observed x-ray structure factor amplitudes was somewhat surprising, because the low-resolution data were strongly affected by the high ammonium sulfate concentration in the solvent space. When the crystallographic phases from the x-ray model (33) were applied to the 6 Å amplitudes, the map density histogram again followed the distribution expected for other proteins. This result indicates that the same end point could be exploited as a figure of merit for phase determination, perhaps explaining why phasing attempts with these experimental data were so promising.
Acknowledgments
Thanks are due to Prof. M. M. Teeter for providing to the Hauptman-Woodward Medical Research Institute the structure factor magnitudes and phases used for this determination, and to Dr. M. P. McCourt for generating three-dimensional electron density maps. This research was funded by a grant from the National Institute of General Medical Sciences (Grant GM46733), which is gratefully acknowledged.
Footnotes

* To whom reprint requests should be addressed. E-mail: dorset@hwi.buffalo.edu.

Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.060019197.

Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.060019197
 Received July 29, 1999.
 Accepted January 18, 2000.
 Copyright © The National Academy of Sciences
References
Podjarny A D, Urzhumtsev A G
Fortier S
Gilmore C J, Nicholson W V
Dorset D L
Doyle P A, Turner P S
Hauptman H A
Henry N F M, Lonsdale K
Ladd M F C, Palmer R A
Rogers D
Luzzati V, Mariani P, Delacroix H
Drenth J
Zhang K Y J, Main P
Gaskill J D
Mason S J, Zimmermann H J
Dorset D L, McCourt M P (2000) Z. Kristallogr., in press.
Classification: Biological Sciences; Biophysics