TY - JOUR
T1 - Optimizing model representation for integrative structure determination of macromolecular assemblies
JF - Proceedings of the National Academy of Sciences
JO - Proc Natl Acad Sci USA
SP - 540
LP - 545
DO - 10.1073/pnas.1814649116
VL - 116
IS - 2
AU - Viswanath, Shruthi
AU - Sali, Andrej
Y1 - 2019/01/08
UR - http://www.pnas.org/content/116/2/540.abstract
N2 - Macromolecular structures are increasingly determined by an integrative approach, relying on diverse types of data. Recognizing its importance, the Worldwide Protein Data Bank created an archive for these structures. The choice of representation of the modeled structure in integrative structure determination is an example of a model selection problem in statistics. Representation is generally specified ad hoc, selecting from a range of atomic and coarse-grained representations. We introduce the concept of objectively optimizing representation, based on varying amounts of information available for different parts of the structure. The optimized representation facilitates exhaustive sampling and therefore can produce a more accurate model and a more accurate estimate of its uncertainty for larger structures than were possible previously. Integrative structure determination of macromolecular assemblies requires specifying the representation of the modeled structure, a scoring function for ranking alternative models based on diverse types of data, and a sampling method for generating these models. Structures are often represented at atomic resolution, although ad hoc simplified representations based on generic guidelines and/or trial and error are also used. In contrast, we introduce here the concept of optimizing representation. To illustrate this concept, the optimal representation is selected from a set of candidate representations based on an objective criterion that depends on varying amounts of information available for different parts of the structure. Specifically, an optimal representation is defined as the highest-resolution representation for which sampling is exhaustive at a precision commensurate with the precision of the representation. Thus, the method does not require an input structure and is applicable to any input information.
We consider a space of representations in which a representation is a set of nonoverlapping, variable-length segments (i.e., coarse-grained beads) for each component protein sequence. We also implement a method for efficiently finding an optimal representation in our open-source Integrative Modeling Platform (IMP) software (https://integrativemodeling.org/). The approach is illustrated by application to three complexes of two subunits and a large assembly of 10 subunits. The optimized representation facilitates exhaustive sampling and thus can produce a more accurate model and a more accurate estimate of its uncertainty for larger structures than were possible previously.
ER -