Nucleosome positions alone can be used to predict domains in yeast chromosomes

Significance DNA is packaged into chromosomes, which are further organized into domains: Regions of the genome which are more likely to self-interact. Domains have been observed in species ranging from bacteria to humans and are thought to play an important role in gene regulation. Yet the mechanisms of domain formation are not fully understood. Here we use computer simulations to investigate domain formation in yeast. Our model reproduces the experimentally observed domains using only nucleosome positioning information as an input, implying that (unlike in higher eukaryotes) domain boundary locations are largely determined at this level. Our results reveal how irregular nucleosome spacing impacts the 3D chromosome organization, pointing to a direct link between nucleosome positioning and genome regulation at the large scale.

where ri,i+1 = |ri−ri+1| is the separation of the beads, and the first term is the Weeks-Chandler-Andersen (WCA) potential which represents a steric interaction preventing adjacent beads from overlapping. In Eq. (2) dij is the mean of the diameters of beads i and j. The diameter of the DNA beads is a natural length scale with which to parametrize the system; we denote this by σ, and use this to measure all other length scales. The second term in Eq. (1) gives the maximum extension of the bond, R0; throughout this work we use R0 = 1.6 σ, and set the bond energy K FENE = 30 kBT for linker DNA beads. The bending rigidity of the polymer is introduced via a Kratky-Porod potential for every three adjacent DNA beads where θ is the angle between the three beads as given by and K BEND is the bending energy. The persistence length in units of σ is given by lp = K BEND /kBT . Finally, steric interactions between non-adjacent DNA beads are also given by the WCA potential [Eq. (2)]. Figure 1. In our first simple model, as depicted in Fig. 1a, nucleosomes are represented by 10 nm (4 σ) diameter beads, connected to linker DNA beads using FENE bonds according to Eq. (1) but with the appropriate choice for R0 (i.e. R0 = 3.6 σ for the bond between a DNA bead and a nucleosome bead, or R0 = 5.6 σ for the bond between two nucleosome beads); as before steric interactions between nucleosome beads, and between nucleosome and DNA beads are given by the WCA potential, with dij being the mean of the diameters of the two beads.

B. Nucleosome model of
C. More detailed geometry nucleosome model of Figure 4. In the more detailed model, as depicted in Fig. 4a, nucleosomes are represented by a rigid body composed of five component beads including four 5 nm (2 σ) beads and a 2.5 nm (σ) connector bead (see Fig. 4a). The four core beads are arranged with their centres on the corners of a square of size 4.2 nm (1.68 σ); the connector bead is positioned 5.75 nm (2.3 σ) from the centre of the square. Linker DNA beads are connected to nucleosome connector beads using harmonic springs with the associated potential U HARM (ri,i+1) = K HARM (ri,i+1 − R0) 2 , [5] where ri,i+1 = |ri − ri+1| is the separation of the beads, and R0 = 1.1 σ is the equilibrium separation.
To constrain the entry-exit angle for linkers emerging from a nucleosome, a bending interaction between three connected DNA-connector-DNA beads is given by [6] where θ is the angle between the three beads, and θ0 is the desired equilibrium angle, set at θ0 =72 • , so as to match the entry-exit angle measured from the canonical nucleosome crystal structure (15). We set the interaction energy to be the same as that used for the linker DNA beads.

Simulation Method
In our coarse grained molecular dynamics simulations, the position of the ith bead ri changes in time according to the Langevin equation where mi is the mass of bead i, γi is the friction it feels due to an implicit aqueous solvent, while η i is a vector representing random uncorrelated noise which obeys the following relations The noise variance is scaled by the thermal energy, given by the Boltzmann factor kB multiplied by the temperature of the system T , taken to be 310 K for a cell. The potential Ui is a sum of interactions between bead i and all other beads, as described above. For simplicity we assume that all beads in the system have the same mass and friction mi ≡ m, and γi ≡ γ. Eq. (7) is solved using the LAMMPS software (16) which uses a standard velocity-Verlet algorithm; we use a time step of ∆t = 0.005 τ (where τ is the simulation time unit, as defined in the next section). Our simulations are initialized such that the nucleosome and DNA bead positions follow a random walk; we then evolve the dynamics for 122 τ to obtain an equilibrium polymer conformation. We simulated for a further 50×10 3 τ and saved configurations every 250 τ ; this was repeated 20 times for each region using a different set of random numbers to generate the noise η i (t) each time. We use periodic boundary conditions, with a simulation box size of 400×400×400 σ, meaning that the system is dilute (though, as detailed in the next section, the diffusivity of the chromatin is mapped to in vivo measurements meaning this effectively takes into account slow-down of the dynamics due to macromolecular crowding). To map between simulation and real time units, we first note that the above defined length, energy and mass units lead to a natural time unit τ = mσ 2 /kBT . Another important time scale is the Brownian time τ B = σ 2 /Di, which is the time scale over which a DNA bead diffuses across its own diameter σ. Here Di is the diffusion constant for bead i, given through the Einstein relation by Di = kBT /γi. With our choice of γi = 1 this means τ B = τ . To map to real times we measure the mean squared displacement (MSD) for all beads; we then find the value of τ B which gives the best fit to experimental results from Ref. (17), who measured the MSD for various chromatin loci in live yeast cells. This results in τ B = 80 µs, meaning that each 10 7 time step simulation run represents approximately 4 s of real time.

Analysis of MNase data
A number of methods have been developed to obtain nucleosome positioning information in vivo, and the most extensively employed is micrococcal nuclease (MNase) digestion combined with high-resolution microarrays or high-throughput sequencing. The premise is that MNase will digest any DNA which is not protected by its association with proteins, and that mapping of undigested fragments reveals which DNA regions were associated with nucleosomes. For most of this work we used MNase-seq data, but in section 10 below we consider some alternative methods.
We used MNase-seq data published in Ref.
(1) (available at GEO:GSM53721). Paired-end reads were aligned to the Saccharomyces cerevisiae reference genome (SacCer3 build) using bowtie2 (18); duplicates and any reads with mapping quality less than 30 were removed. Nucleosome coverage maps (e.g. blue bar plots in Fig. 1c and Fig. S1) were then obtained by piling-up the centre points of each paired-end read.
Since MNase-seq data is obtained from a population of cells, and it is expected that nucleosomes positioning will show cell-to-cell variability, obtaining an average or "most likely" set of nucleosome positions requires a statistical approach (and we note that this is a distinct problem from understanding the factors which drive nucleosome positioning in vivo, e.g. DNA sequence and chromatin remodelling factors (19)). A number of methods for obtaining such positions have been presented in the literature, and these tend to fall into two categories: methods which infer an effective nucleosome binding energy landscape from the data, and methods which find peaks from the pile-up of fragments. The NucPosSimulator software (2) is a popular tool which falls into the first category. In short, it takes the centre point of each paired-end read and builds a frequency count profile, which is then used to generate an effective potential landscape for nucleosome binding. A Metropolis Monte-Carlo algorithm is then used to add, remove and move nucleosomes within this landscape. We use NucPosSimulator in "simulated annealing" mode, where the system starts at a high temperature (where nucleosomes will be highly dynamic) before being slowly cooled. During this process the nucleosomes settle into their most likely positions. Full details of the software are given in Ref. (2).
A popular tool which instead uses peak finding to extract nucleosome positions from MNase data is DANPOS2 (3). Comparing the output generated by DANPOS2 and NucPosSimulator we found that over 90% of nucleosome positions changed by less than 22 bp between the two maps, but NucPosSimulator tended to identify more nucleosomes than DANPOS2 (4.9% of nucleosomes found by NucPosSimulator were not found by DANPOS2, and 1.9% of nucleosomes found by DANPOS2 were not found by NucPosSimulator). These results are summarized in Fig. S2. This indicates that there is only a small difference in nucleosomes identified by these two software tools; DANPOS2 tended to find more nucleosome free regions than NucPosSimulator, leading to a slightly larger mean nucleosome spacing. We decided to use the positions generated by NucPos-Simulator in our simulations because the average nucleosome spacing it generated was closer to that quoted in the literature, and there was a better visual comparison between the raw MNase pile-up and the extracted positions (Fig. S2d).

Analysis of MicroC and MicroC XL data
MicroC data are obtained from Ref. (8) (available at GEO: GSE68016), while MicroC XL data are obtained from Ref. (7) (available at GEO:GSE85220, specifically we used samples GSM2262329, GSM2262330 and GSM2262331). We follow the same processing procedure for each data set, and this is similar to that described in Ref. (8). After trimming adapters from the paired-end data we use bowtie2 (18) to align reads to the Saccharomyces cerevisiae reference genome (SacCer3 build); the data were treated as single-end reads, since ligation fragments will not form proper pairs when aligned. Duplicates were removed, and reads filtered to retain those where both pairs attained a mapping quality of at least 30. Following Ref. (8) we further filter reads according to the strand each read in the pair maps to, in order to avoid including reads resulting from runs of undigested nucleosomes. Interactions can then either be binned to obtain standard interaction maps, or can be mapped onto specific nucleosomes to obtain a nucleosomenucleosome interaction map.
To generate nucleosome level interaction maps we use the nucleosome positions generated by NucPosSimulator from the MNase-seq data as detailed above. Treating each of the pair from each MicroC read separately, reads which overlap with a single nucleosome are unambiguous; reads which do not overlap with a nucleosome, but map to a position where their centre point is within 200 bp of the centre of one or more nucleosomes are assigned to their closest nucleosome. Reads which overlap with more than one nucleosome are assigned to that with which they have the largest overlap. Reads which do not map within 200 bp of a nucleosome, or which overlap with two nucleosomes by the same amount are discarded. Only read pairs where both members of the pair are assigned to nucleosomes are retained as informative interactions. For the MicroC data, across 20 replicates, starting with 73,943,603 interactions, we were able to assign 73,803,602 of these unambiguously to pairs of nucleosomes; i.e. less than 1% of read pairs were discarded as it was ambiguous as to which nucleosomes they represented. For the MicroC XL data, across 3 replicates, starting with 130,958,525 interactions, we were able to assign 114,652,179 of these unambiguously to pairs of nucleosomes, this time discarding less then 0.05% of read pairs.

Generating Nucleosome interaction maps from simulations
From our simulations we obtain the positions of all beads in 2000 independent configurations for each region as detailed above. In a MicroC experiment nucleosomes are cross-linked, unprotected DNA is digested, and then protected DNA fragments are ligated: we expect that the probability that DNA from two nucleosomes are ligated together is a function of their 3-D separation. To mimic this process in silico we pick two nucleosomes at random from a simulated configuration, we then accept this as an interaction with a probability P (r) which is a function of the nucleosomes separation r, and reject it otherwise. For simplicity we choose an exponential function P (r) = e −r/lc , with interaction (or "crosslinking") length scale lc. For a simulation with N nucleosomes, we perform this operation N 2 times, and then repeat this for each configuration. To obtain a number of simulated interactions which is similar to the number of MicroC reads for a given region we can repeat this entire process multiple times. Finally, to obtain a map which is comparable to the data, we scale all interaction counts by a factor γ which results in there being the same total number of reads in the simulated and experimental interaction maps. We find that the observed domain pattern is largely insensitive to the value of the parameter lc, but this strongly affects the mean interactions vs. genomic separation (as detailed in the main text, and section 8 below).
To compare with experiments we generated simulated interaction maps using different values of lc in the range 3-30 σ, and then used the value which gave the best fit to the experimental interactions vs. separation plot (Fig. S8c-d; this was different for the MicroC and MicroC XL data, as detailed in the main text). In Fig. S7b an alternative method is used to generate the map, where instead of picking two nucleosomes at random, any two beads (nucleosome or DNA) can be chosen. The rest of the procedure follows as above, but then the interactions are sorted into bins with a fixed width in bp. Since each bin can contain a mixture of DNA and nucleosome beads, they can contain different numbers of beads, so interactions are further normalized by the number of beads per bin.
To ensure that the contact maps are generated from a sample of configurations which are representative of equilibrium conformations, we tracked the radius of gyration, Rg, as a function of time (Fig. S3a), ensuring that we use configurations at a point after this had stopped systematically changing. This quantity is defined by where ri is the position of the ith bead in the fibre, and rmean is the mean over all bead positions. We further checked that the configurations were not changing in time by generating contact maps from two different time windows (Fig. S3b). There was no appreciable difference between the maps (at least close to the diagonal, where we make measurements to compare with data). We also checked that considering 20 independent simulations was enough to ensure a representative sample by generating contact maps from each of two sets of 10 independent simulations (Fig. S3c). Again there was no appreciable difference between the maps. Finally we compared mean reads vs. genomic separation plots for each of these contact maps, showing no appreciable difference within the standard error (Fig. S3d).

Calling Boundaries from Interaction Maps
To find domain boundaries for both MicroC data and the simulated interaction maps we use a "sliding box" algorithm.
Essentially a square box is placed off the diagonal of the interaction map (with its corner on i, i + 1), and we sum the values for nucleosome interactions falling with the box. We then slide the box along the diagonal, nucleosome-bynucleosome to obtain a boundary signal as a function of box position. Symbolically the signal at nucleosome k (used to determine if there is a boundary between nucleosomes k and k + 1) is given by where xij is the number of interactions between the ith and jth nucleosome, N is the number of nucleosomes in the region and we use d = 10. At the edges of a region (i.e. k < d or k > N − d) we use the same function but reduce d, e.g. for k = 4 we set d = 3. This is essentially the same as the algorithm used in Ref. (8), where for each nucleosome the number of upstream to downstream interactions (within some range) is counted. This signal gives a measure the number of interactions between regions either side of a given nucleosome (the lower the value the fewer crossing interactions), so minima in s k are potential boundaries. To call a boundary at a minima we require that value of s k is smaller than its local average by at least some threshold factor; i.e. we definē and then if s k <γs k we call the minima a boundary. We tune the value of the factor γ by visual inspection of the called boundaries and the interaction map; we choose this separately for the MicroC and MicroC XL data, but keep the same values for calling boundaries from the corresponding simulation maps. Specifically for MicroC data we set γ = 0.85 and for MicroC XL data γ = 0.97. We note that due to noise in the data, occasionally the boundaries found by the algorithm are not the same as would be expected from visual inspection of the interaction maps: this highlights the difficulty in unambiguously defining domains and boundaries within 3C based interaction maps. We find that the algorithm works better for the MicroC data than for the MicroC XL, though visually the same domains are clearly present in the contact maps for each data set. We deem any boundary found in the simulated interaction map to be a "correct prediction" when it is within one nucleosome of the position of an experimental boundary. As detailed in the main text, we denote a boundary which is found in the simulations but not the MicroC data an "extra boundary", and any boundary found in the MicroC but not in the simulations is a "missing boundary". To obtain a pvalue for the level of agreement between a set of simulated boundaries and the experimental data, we generated a set of random boundary positions which have the same statistics (i.e. the same mean boundary separation), counted the number of correct boundaries, and then repeated this many times. In this way we estimate the probability that the boundaries could be correctly positioned due to random chance (the probability that the null hypothesis is true).
To obtain an insulation signal at each nucleosome we scale s k by its mean value across all regions and take the negative so that the more interactions which are observed between regions either side of nucleosome k, the lower the insulation signal. In Fig. S5 we plot the u k found from simulations vs. that found from the MicroC data. As a quantitative comparison we calculate the Spearman rank correlation coefficient, which would give the same value independent of the logarithm and scaling. The insulation signal at a domain boundary can be viewed as a measure of the strength of that boundary (Fig. S5b).

Interactions versus genomic separation: genomewide analysis
In Fig. 3b in the main text we plot the mean number of MicroC (or MicroC XL) interactions as a function of genomic separation averaged over the regions which we simulated. As discussed there the plot is different for the two different experimental data sets, and by comparison with our simulations (Fig. 3d in the main text and also Fig. S8c-d) we found that the slope of this plot depends on the "crosslinker length". In Fig. S8ab we show a similar plot of the genome-wide average for the two experimental data sets. Noting that linear regions on a log-log plot imply a power law relationship, we find that there are three different linear regimes. In Fig. S8a we observe that for the MicroC data the number of reads initially decreases steeply with genomic separation (with an exponent close to 2), but then this becomes shallower, and there is another linear regime with exponent close to 1. There is a plateau at large separations -as discussed in Ref. (8) the original MicroC method fails to capture long range interactions. The MicroC XL data shows a much shallower slope for small separation, with an exponent less than 1; it then becomes steeper with a linear regime with a slope close to 1. At long ranges (separation s > 500) there is again a slope close to 1. It is unclear what the origin of these different regimes is, though we note that HiC data typically shows an exponent close to 1 for large separations. Here we plot separations in units of nucleosomes, but similar exponents are observed if we instead plot separations as DNA length (Fig. S8b).

Further model refinements also fail to show improved agreement with data
Given the surprising result that including a more detailed nucleosome geometry in the model does not give an improved agreement with the data, we considered two further refinements to the model described in Fig. 4. First we included a short ranged attractive interaction between nucleosomes to model direct nucleosome-nucleosome interactions which could be mediated by surface charges or histone tail interactions, see Fig. S11. To do this we included a Lennard-Jones interaction between any pair of nucleosomes, given by U LJcut (r) = U LJ (r) − U LJ (rcut), r < rcut 0, otherwise, [9] where U LJ (r) = 4 n dn r where r is the separation between the centre points of the nucleosomes (a point at the geometric centre of the four large beads making up the rigid body), n is the interaction energy, dn = 2.0 σ, and rcut = 3.0 σ is the range of the interaction. We found no improvement in agreement with the data compared to the simple model (comparing the boundary calls, insulation signal, and reads vs. genomic separation plots, Fig. S11). Since in Ref. (8) the authors noted that nucleosomes within inactive genes tended to interact more, we reasoned that the presence or lack of certain histone modifications could be used to identify nucleosome which might interact attractivelyhowever a model where the nucleosome-nucleosome interaction was present only in some regions of the simulated fibre also did not lead to an improvement in agreement with data. Second, we considered that certain histone modification might lead to partial unwrapping of DNA from the histone octamer. Although our model is too simple to include this in detail, we reasoned that it could be included in a simple way by removing the angle constraint on the entry/exit DNA for a subset of nucleosomes with acetylation modifications. Again we saw no improvement in agreement with the MicroC data.

Genome-wide nucleosome spacing: further analysis
In the main text we present some analysis of nucleosome linker lengths based on MNase-seq data. We note here that similar results are obtained using data obtained from site-directed DNA cleavage experiments (from Ref. (11)), which offer higher resolution. Specifically the method uses an Saccharomyces cerevisiae mutant with a unique cysteine in histone H4, which allows chemical cleavage of DNA at precise locations near the nucleosome dyad. Linker lengths can be obtained by two approaches. First, the fragments that the experiment yields are stretches of DNA between two nucleosome centres; these can be run on an agarose gel to reveal bands corresponding to DNA fragments from pairs, triplets etc. of nucleosomes. Purifying the lowest molecular weight band and performing paired end sequencing gives directly the separation of pairs of nucleosomes as they appear in single cells; however, this purification selects against longer linkers (fragments representing linker lengths between ∼ −22 and 53 bp are enriched for; here a negative linker length means that the nucleosomes overlap). Fig. S12a shows the linker length distribution obtained in this way, alongside that from the MNase data and NucPosSimulator discussed in the main text. Two features of note are that there are negative length values, which indicates overlapping nucleosomes, and that peaks at ∼ 10 bp intervals are visible; we discuss these points in detail below. Though we cannot use the chemical cleavage data to analyse long linkers, we can still count the number short (< 3 bp) and medium (4-50 bp) length linkers in different genomic regions. Fig. S12b-c shows that, consistent with the MNase based positioning (Fig. 5e-f), both within gene bodies and within the region upstream of their TSS, the proportion of linkers which are shorter than 3 bp is higher for active than for inactive genes. The fact that a similar distribution, and similar trends for different genomic regions, can be observed implies that the features discussed in the main text are not artefacts arising because nucleosome positions are based on a population level map.
The second approach to obtaining linker lengths from the chemical cleavage data, as detailed in Ref. (11), is to pile-up the strand-dependent cleavage points across the genome to form a nucleosome position map. Due to the specific pattern of cleavage positions on opposite strands, nucleosome centres can be identified from the strand-specific shape of the peaks (since the method is sensitive to the shape of the peak and not the height, the purification bias discussed above does not have an effect (11)). Here we used the "unique nucleosome map" provided as supplementary material in Ref. (11); this does not allow nucleosomes to overlap by more than 40 bp (taking the positions with the highest score where larger overlaps occur). We note that, like MNase data, this map gives a "population" picture, and may not represent the situation within individual single cells. From this map we obtain the proportions of short (< 3 bp), medium (4-50 bp), and long (51-200 bp) linkers as in Fig. 5, and again find that short linkers appear more frequently within active than inactive genes and promoter regions (Fig. S12e-f), and that long linkers (NDRs) appear more frequently in promoter regions than genome wide (though the difference between active and inactive promoters was not statistically significant, Fig. S12f).
As detailed in Ref. (11), the higher-resolution obtained by the chemical cleavage method reveals that a significant proportion of nucleosome overlap with the territory of their neighbours, and that there is a strong ∼ 10.5 bp periodicity in the linker length distribution. The former manifests itself in the data as nucleosome centre-centre DNA fragments shorter than 147 bp (negative linker lengths); this has been studied using an in vitro positioning system (20) -there the authors observed that partial unwrapping of DNA from the histone octamer allows nucleosomes to invade their neighbour's territory and that extreme overlapping coincides with a loss of an H2A-H2B dimer from one nucleosome. In Ref. (21) a statistical mechanics model which constructed a nucleosome-DNA binding free energy based on the assumption of a strong DNA-histone interaction at points where the minor groove contacts the protein could reproduce the observed distribution of (positive and negative) linker lengths -as DNA unwraps there is an energy increase because hydrogen bonds are broken, but at the same time the entropy of the unwrapped section increases, leading to an oscillatory free energy. This model (which does not have a sequence dependent component) also explains the linker length periodicity through a DNA-nucleosome free energy profile which extends beyond the 147 bp footprint -this could arise through DNA-histone tail interactions, steric interactions between neighbouring nucleosomes (the bp separation of nucleosomes determines their relative orientation), or interactions with other proteins such as H1 linker histone (HHO1p in yeast). Intriguingly if we use the NucPosSimulator software with the chemical cleavage pile-up data as an input to obtain the most likely nucleosome positions, the linker periodicity is lost; this is despite the other reported trends being retained, and the positions of over 80% nucleosome centres being with 40 bp of those of the "unique map" obtained from peak finding. NucPosSimulator assumes a "hard" steric nucleosome-nucleosome interaction, and gives similar linker distributions even if the nucleosome footprint is reduced to allow overlaps; this suggests that a "soft" nucleosome-nucleosome interaction would be required to retain the linker periodicity (as implied by Ref. (21)). At any rate, a highly irregular pattern of linker lengths is prevalent across all of these nucleosome mapping methods. Furthermore, we do not expect the periodicity in linker lengths to affect our results, since the small (<5bp) changes in nucleosomes positions are below the resolution of our coarse-grained model (where DNA is represented by a chain of beads with size ∼ 7.35 bp).

Domain patterns can also be predicted analytically
The observation that nucleosome positions so strongly predict domain structure implies that the properties of chromatin fibres in yeast are highly dependant on the linker DNA between nucleosomes. To show this more explicitly we use some analytical concepts from polymer physics. First, we imagine taking our model chromatin fibre, and replacing all nucleosome beads with DNA beads, such that the chain is completely homogeneous. We then consider generating a contact map, but only using the beads which were originally nucleosomes; rather than doing a simulation, we instead assume that the level of interaction between beads i and i + s has a power-law dependence ∼ s −1 (using an exponent close to that measured in Fig. S8). This reveals a domain pattern strikingly similar to the MicroC data ( Fig. S13; actually using a different exponent, e.g. for a random or self-avoiding walk, gives similar results). The presence of DNA wrapped into nucleosomes means that the chromatin fibre has real domains of increased self-interaction. Additionally, in experimental and simulation (but not analytical) maps, domains persist when interactions are scaled by the "expected interactions" as common in HiC (Fig. S13b). This shows that the heterogeneous physical structure of the fibre in the nucleosomes and nucleosome depleted regions (present in experiments and simulations, but not represented in the analytics) leads to additional enrichment of contacts within a domain.
We can use this analytical approach to examine different patterns of linker lengths, to understand how these might lead to domains. For example, a possible chromatin structure might be for nucleosomes to be spaced regularly, but with occasional long linkers (nucleosome depleted regions or NDRs). Fig. S13c shows a map generated assuming nucleosomes are spaced with ∼ 30 bp linkers, with occasional ∼ 100 bp gaps (these lengths might be inferred from values of the nucleosome repeat length commonly quoted in the literature); we observe only a very weak domain-like pattern. The pattern can be strengthened by placing the regularly spaced nucleosomes closer together and increasing the length of the NDRs. However these domains look uniform and featureless compared to the pattern obtained with real nucleosome positions. From this we conclude that the observed domain patterns arise due to the whole distribution of long and short linkers -the former leading to boundaries, and the latter further increasing interactions with the domains.

Cell-to-cell variability in nucleosome positions
Above and in the main text we have detailed simulations where a single set of nucleosome positions is used for each region investigated. This was the set of "most likely" nucleosome positions generated from MNase data using the NucPosSimulator software. Here we investigate the effect of cell-to-cell variability of nucleosome positions. NucPosSimulator can also generate an ensemble of nucleosome position profiles which are consistent with the MNase data (see Ref. (2) for details; the software generates a single "most likely" set of positions when used in "simulated annealing" mode, or an ensemble of configurations by default). By varying the parameters used by the software the level of variation between sets of positions can be controlled, and one might expect this to affect the output. Here we consider two cases: high and low variability. In both cases the mean linker length is kept consistent with values quoted in the literature (e.g. see Fig. S2c).
In Fig. S14a 20 sets of nucleosome positions (green) are shown alongside the "most likely" positions (blue) and the MNase-seq data from which these are obtained: this is the high variability case. We run a simulation for each set, and contact maps (where each row or column represents one nucleosome) generated from three of these are shown in Fig. S14b. Since there can be a different number of nucleosomes in each set of positions, the contact maps have different sizes, and so are not aligned vertically. Nevertheless it can be seen that some domains or boundaries are present across all three examples, whereas others are not. Since each map comes from a single simulation (averaged over time) they are noisier than a map generated from several simulations. To see how simulation-tosimulation variation affects a combined contact map, we must first convert the nucleosome-nucleosome based interactions into bp based interactions (since there is no one-to-one mapping of nucleosomes between the simulations). The top panel in Fig. S14c shows this combined map, with the other panels showing similar bp based maps from the MicroC data and a set of simulations based on the "most likely" nucleosome positions. We note here the poor agreement between the "cell-to-cell variation" simulations and the data (compared to the "most likely" simulations): the interaction map does not show clear domains.
In Fig. S15 similar results are shown, but here there is much lower variability between the different sets of nucleosome positions. In Fig. S15c there is clearly a much better visual agreement between the simulations (top) and data (middle); in fact the visual agreement is arguably better than that of the "most likely nucleosome position" simulations (bottom). This result suggests that in reality for yeast there is a low level of variability between cells within a population.
A number of new experimental methods for probing nucleosome position in singe cells are currently being developed (22,23), and as more data becomes available in the future, it would be interesting to further investigate domain formation in single cells and how cell-to-cell variation affects population measures of 3-D structure. It would also be interesting to study cell-to-cell variation in other organisms, since, for example, nucleosome spacing and cell-to-cell variability of positions is expected to be much higher in higher eukaryotes.