## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# A priori calculations of the free energy of formation from solution of polymorphic self-assembled monolayers

Contributed by Noel S. Hush, September 15, 2015 (sent for review July 6, 2015; reviewed by Markus Lackinger, Todd J. Martinez, and H. Fritz Schaefer)

## Significance

First-principles free energy calculations, characterizing polymorphism of self-assembled monolayers (SAMs) of porphyrin molecules formed from solution onto graphite, are performed using efficient methods previously applied only to small-molecule reactivity. SAM structures are typically optimized in the absence of solvent using density functional theory embodying explicit dispersion corrections. Added then are dispersion-dominated implicit solvation energies and SAM formation entropies derived from both molecular and phonon vibration frequencies. Scanning tunneling microscopy (STM) images are measured, and polymorph formation free energies are approximated. Close parallels between experiment and theory support the hypothesis that the first seconds of SAM formation are under thermodynamic control, despite formed SAMs being kinetically trapped. Polymorphism is associated with large opposing changes to entropy and substrate−molecule and solvent−molecule interaction energies.

## Abstract

Modern quantum chemical electronic structure methods typically applied to localized chemical bonding are developed to predict atomic structures and free energies for *meso*-tetraalkylporphyrin self-assembled monolayer (SAM) polymorph formation from organic solution on highly ordered pyrolytic graphite surfaces. Large polymorph-dependent dispersion-induced substrate−molecule interactions (e.g., −100 kcal mol^{−1} to −150 kcal mol^{−1} for tetratrisdecylporphyrin) are found to drive SAM formation, opposed nearly completely by large polymorph-dependent dispersion-induced solvent interactions (70–110 kcal mol^{−1}) and entropy effects (25–40 kcal mol^{−1} at 298 K) favoring dissolution. Dielectric continuum models of the solvent are used, facilitating consideration of many possible SAM polymorphs, along with quantum mechanical/molecular mechanical and dispersion-corrected density functional theory calculations. These predict and interpret newly measured and existing high-resolution scanning tunnelling microscopy images of SAM structure, rationalizing polymorph formation conditions. A wide range of molecular condensed matter properties at room temperature now appear suitable for prediction and analysis using electronic structure calculations.

A priori calculations of the free energies of chemical reactions using density functional theory (DFT) and/or ab initio methods are now well established for gas-phase processes (1, 2), gas surface reactions (3), and, using continuum self-consistent reaction field (SCRF) methods, for condensed phase processes also (4). Over the last few years, a major advance in computational methods has occurred, however, allowing for rapid and accurate evaluation of intermolecular dispersion interactions. This makes feasible similar SCRF calculations for physisorption, macromolecule structuring, and other self-assembly processes in condensed phases. We present the first application of this type, to our knowledge, considering the free energy of formation of various polymorphs of tetraalkylporphyrin self-assembled monolayers (SAMs) at solid/liquid interfaces on highly ordered pyrolytic graphite (HOPG) surfaces. Calculated structures and free energies are used to interpret new and existing high-resolution scanning tunneling microscopy (STM) images, focusing on the critical roles played by the dispersion interaction in driving SAM formation and by entropy and dispersion-based desolvation effects that oppose it.

Calculating a priori free energies for SAM formation from solution is a significant challenge. Accurate representations of the substrate−molecule energies, the intermolecular energies, the solvent interaction energies, and the effects of solvent structure are required, a set of tasks that, with a few exceptions (see e.g., refs. 5 and 6), has remained prohibitive a priori. Alternatively, model calculations have been useful for identifying key qualitative features (7⇓⇓⇓–11), whereas some full molecular dynamics simulations of the free energy using empirical force fields have been revealing for SAMs (12⇓⇓–15) and also for macromolecules (16), and DFT molecular dynamics approaches are appearing (17). A priori computational methods have significant advantages in that they do not require parameterization, they treat interactions both generally and accurately, and, through use of implicit solvation models, they can avoid extensive liquid structure simulations. However, applications to self-assembly processes have historically been prevented by difficulties in treating discrete and continuum van der Waals dispersive forces both quantitatively and economically. Modern methods (18⇓⇓⇓⇓⇓⇓–25) can now describe discrete interactions well, and self-assembly processes such as molecular crystal formation (22) and host−guest chemistry (26) have now been successfully modeled. Dispersive interactions with a solvent continuum (18, 27) are also now included in many codes, including Gaussian (28) and, very recently, VASP (29). Like discrete dispersion interactions, the best method for including continuum solvent dispersion is a topic of considerable current interest (30⇓–32). In both cases, many options are available, and consideration of SAM formation offers avenues for finding the most robust approaches. Herein we consider three possible approaches obtained by combining commonly used options for different aspects of the calculations, as described in *Methods*. In overview, our computational approach is to (*i*) systematically search through conformational options to establish sets of possibly feasible SAM structures; (*ii*) optimize these SAM structures typically in the absence of solvent; (*iii*) determine SAM solvation energy contributions at these optimized structures for comparison with bare-surface and free-molecule results; and (*iv*) determine zero-point energy, entropy, and enthalpy contributions to the free energy of SAM formation, including evaluating both the molecular and phonon vibration frequencies in situ in the SAM as well as the vibration frequencies of the gas-phase monomer molecules.

To be useful, a new computational strategy must address relevant experimental challenges, and indeed the measurement of free energies for SAM formation is in itself a significant challenge facing modern nanotechnology (33). A critical experiment, the first measurement of the temperature dependence of composition of SAM components, has recently been reported (34), opening this field to future quantitative analysis. Indeed, in total, only a few studies of SAM temperature dependence have ever been performed, as recently reviewed (12, 34, 35). An important conclusion drawn is that the nature of observed SAM composition can be controlled by either kinetic and/or thermodynamic effects (9, 12, 14, 15, 33⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓–44). SAMs in thermodynamic equilibrium can have free energies determined directly from dynamics observed in the STM (17). More generally, although methods for measuring enthalpies of SAM formation are becoming available (45), it was recently concluded that “The Achilles’ heel of [the current] approach is the semi-theoretical evaluation of the dewetting enthalpy which at this point necessarily relies on plausible assumptions” (46). A priori calculations can not only directly calculate the required quantities but are also relevant to this field of endeavor as they can directly examine postulates concerning kinetic versus thermodynamic control. They can also reveal the factors that control observed free energies, providing insight into SAM synthetic strategies and functional properties.

Herein our techniques for a priori SAM simulation are applied to understand the factors controlling 2D polymorphism. This is attractive to study as it involves only one type of adsorbate molecule, simplifying analysis. We consider room temperature *meso*-tetraalkylporphyrin (M-C_{m}P) SAMs on HOPG as they show controllable polymorphism and synthetic diversity, considering (see Fig. 1) C_{13}P and Co−C_{13}P, presenting their syntheses and STM measurements, C_{12}P, reprocessing observed images (47, 48) using recently (49) developed internal calibration techniques to provide improved quantitative analysis, Zn−C_{12}P (47), C_{11}P (50), and Cu−C_{11}P (50, 51), as well as, briefly, C_{19}P (49).

Different 2D polymorphs of Cu−C_{11}P have been produced by varying the porphyrin concentration in the supernatant solution, with higher-coverage polymorphs forming at higher solution concentrations (51, 52). Increasing the porphyrin concentration once a SAM has formed brings about dynamic changes to the SAM only around defects, however, indicating that the SAMs are kinetically trapped, and this trapping takes only seconds to manifest after first application of the solution to the substrate. Similar results have recently been observed for porphyrin SAMs on Au(111) (34). This kinetic trapping makes thermodynamic quantities difficult to measure and sets a key challenge for future experimental work. As what happens in those first few seconds qualitatively reflects thermodynamics (high-coverage polymorphs develop at high porphyrin concentrations), we introduce the simplistic hypothesis that indeed thermodynamics completely controls this aspect.

Experimentally testing this hypothesis requires quantitative information (7, 8, 33), as has been measured, e.g., for codeposition of single SAMs containing mixed porphyrins (40) to reveal that molecules inside SAM phases are not in equilibrium with those in solution. However, only limited data are available concerning relative polymorph propensities. Here, we test the hypothesis by comparing observed structures and crudely estimated formation free energies to calculated ones. Although the uncertainties in the experimentally derived free energies are large, they are obtained using new and existing data of the type commonly available from STM measurements. Our method therefore has widespread application, providing starting points for quantitative analyses of the factors driving SAM formation.

The computational studies also focus on the factors that control free energies of SAM formation and polymorphism. For *meso*-tetraalkylporphyrin SAMs, chain length variation can be exploited to obtain understanding of the entropy/enthalpy balance (39), and we consider three different chain lengths, C_{11} to C_{13}. Solvent dependence can also provide direct means of modulating free energies of polymorphism (43), but herein we do not dwell on this effect, considering only solvents that are likely to have similar critical dielectric and density parameters, ignoring the differences between them.

## Results

### STM Images and Their Calibration.

For Cu−C_{11}P, various polymorphs have been observed at the HOPG/1−octanoic acid interface, and we consider only the low-density (L) and medium-density (M) forms (51). Fig. 2 *A* and *C* shows sample images, and Fig. 2*B* shows an interface region (I) containing both motifs. Fig. 2*D* shows the observed (27, 28) M polymorph for C_{12}P in *n*-tetradecane (no L polymorph has so far been observed in this system). Also shown are the first observed STM images of C_{13}P: in Fig. 2*F*, an L polymorph for C_{13}P measured in 1-phenyloctane, and in Fig. 2*E* a corresponding M polymorph and also an L−M interface. All of these images are scaled and skewed to an orthogonal coordinate system, as determined by our internal calibration procedures (see *Materials and Methods*); the corresponding raw images are shown in *SI Appendix*, *Details of STM Imaging*, where images for Co−C_{13}P are also shown. SAM geometrical properties, averaged over 10–20 similar images for Cu−C_{11}P, Co−C_{13}P, and C_{13}P, are given in Table 1. Of particular note, the observed lattice properties of Co−C_{13}P and C_{13}P are statistically equivalent, a result also observed (48) for Zn−C_{12}P and C_{12}P and for (at least) the M polymorphs of Cu−C_{11}P and C_{11}P (50), simplifying analyses. No coadsorption of solvent molecules is observed in any of these SAMs.

### Choice of Representative Computational Schemes.

A wide range of methods is available for treating dispersion interactions in electronic structure calculations (21, 23, 53) and we consider four alternatives. Three present different flavors of dispersion-corrected DFT; these are labeled PBE (Perdew, Burke, and Ernzerhof)-D3/PAW, PBE-DRSLL (Dion, Rydberg, Schröder, Langreth, and Lundqvist)/DZP, and PW91-D3/USPP. These combinations explore the choice of the density functional [PBE (54) vs. PW91 (55)], the choice of dispersion correction [D3 (19, 20) vs. DRSLL (18)], and the choice of basis set and core potential [plane waves with ultrasoft pseudopotentials (USPP) (56) or projector augmented wave (PAW) (57) or the double-zeta plus polarization (DZP) atomic basis set with basis set superposition error (BSSE) corrections applied to all calculated energies]. In addition, an older quantum mechanical/molecular mechanical (QM/MM) scheme is also used that combines (49) a traditional quality method for representing the intramolecular aspects of porphyrin macrocycle and kinked chain structure, B3LYP/6-31G* (58, 59), with the well-established AMBER intermolecular force field (60) and a specialized HOPG−alkane force field that has been fitted (61) to experimental thermodynamic data. The advantages of this approach are in terms of both computational speed and agility as well as the conceptual one that each of its parts has a solid foundation so that quality results are expected. Of particular significance, this method easily treats incommensurate SAM and substrate lattices whereas the other methods demand that calculations be performed at only certain discrete geometries at which the lattices become commensurate (see *Materials and Methods*). Its disadvantage is the inclusion of an empirically derived aspect that would need to be expanded to treat many related systems.

### Simulation of Possible SAM Polymorph Structures.

As *meso*-tetraalkylporphyrins are extremely conformationally flexible and must take on locally distorted kinks to form SAMs (49), very many atomic structures are feasible. These kinks involve conformation changes to the chain torsional angles specifying the locations of the third, fourth, and fifth C atoms along the chain; five low-energy kink patterns leading to stable L polymorphs have been previously identified and related to experimental STM images (49). We consider only these five possibilities for the structures of C_{11}P and C_{13}P.

For the M polymorphs, alternate alkyl-chain conformations are required so as to direct chains either into solution or else, e.g., to lie on top of the base porphyrin layer. Many seemingly independent molecular structural features combine to produce many different ways in which the alkyl chains can lift off the surface. Although, in principle, molecular dynamics simulations can be performed to identify structures, this is not feasible for kinetically trapped SAMs. Instead, we use a systematic conformational search procedure, constructing all possible variations among a set of critical structural variables. Most of the structures so produced can be easily eliminated, however, as rotation of the porphyrin macrocycles around their centers always accompanies changes in the structures of the alkyl chains, and only very restricted combinations of the identified structural elements lead to viable SAMs. As an example of what can be achieved using systematic conformational searching, for alkanethiol SAMs on Au(111), analogous methods identified over 10^{9} possible conformers, of which all but a few hundred could be immediately eliminated (62, 63). For alkylporphyrin SAMs, nine different conformations were chosen for the chains not lying flat on the surface and then these were combined with the set of five feasible kinked structures for the flat-lying chains identified for the L polymorphs. This is done by scanning all possible porphyrin ring angular orientations, eliminating structures with overlapping nonbonded atoms. As a result, a set of 40 initial SAM structures was prepared for each of the C_{11}P, C_{12}P, and C_{13}P SAMs. Optimization of these initial structures resulted in the identification for each molecule of, at most, 22 unique M SAM polymorphs. These structures were then assessed based on their calculated free energies of formation at 298 K and compared with the observed STM images.

### Identification of SAM Atomic Structures by Comparison of Observed and Calculated STM Images.

Comparison of the observed STM images and lattice parameters both for pure phases and for interfaces with those from the 27 unique optimized structures (5 for L, 22 for M) reveals that for each observed image, only one optimized structures matches. The matching structure is always either the one of calculated lowest free energy or else the second-lowest-energy one, as discussed quantitatively in *Comparison of Observed and Calculated Free Energies of SAM Formation*. These are overlaid on the observed images in Fig. 2, highlighted in Fig. 3, and specified in full in *SI Appendix*, *Optimized Atomic Coordinates from QM/MM* and *Optimized Atomic Coordinates from PBE-D3/PAW*, and calculated and observed lattice parameters are compared in Table 1. The simulated images are prepared using the Tersoff−Hamann approximation (64), displaying surfaces at a constant value of the electron density chosen such that resolved features parallel those resolved in the observed constant-current images. *SI Appendix*, *PBE/PAW Simulated STM Images* compares the actual observed and calculated images, indicating that the calculations offer explanation for the key resolved internal structural features. This verifies that the feature assignments implied by the simplistic atomic coordinate overlay shown in Fig. 2 are indeed valid.

Overviewing the results shown in Fig. 2, the structures for the L polymorph have all chains lying flat on the surface whereas the M polymorphs have two chains so positioned and two chains dangling into solution. Interface molecules between the two polymorphs typically have one chain in solution only. Across interfaces, the orientations of the porphyrin macrocycle rings and two of the alkyl chains do not change significantly, meaning that just changes in the conformations of the other two alkyl chains to/from their flat-lying orientations distinguish the atomic structures of the molecules in the abutted L, I, and M regions. The same structural motif is preserved for the L polymorphs of C_{11}P and C_{13}P and corresponds to that observed (49) for C_{19}P, but the M polymorphs of C_{11}P, C_{12}P, and C_{13}P all take on different forms. In particular, the chains not lying on the surface can orient in a variety of ways, producing complex 3D patterns above the surface, as highlighted in Fig. 3. These patterns are controlled equally by the interactions with the completed SAM layer, interactions between different dangling chains, conformational energy costs, and, most important, solvation effects.

Considering the observed STM images in more detail, we note that metalloporphyrins may generate significant STM signal from their metal centers, but otherwise images are dominated by the kinks in the alkyl chains at their β and γ positions (49). Sometimes, atomically resolved tunneling is also seen among the long flat-lying chain regions, dominated by the chain hydrogen atoms (65, 66). Naively, one hydrogen atom per methylene group is expected to be observed, and indeed sometimes is, but, more often, the images reveal only a hydrogen atom on every second group. This arises owing to the chains not sitting perfectly flat at room temperature and to consequences of the SAM lattice being incommensurate with that of HOPG (65). In all cases, the alkyl chains are calculated by QM/MM and observed to lie parallel or nearly parallel to the HOPG substrate

Considering the comparison (Table 1) between observed and calculated lattice parameters in more detail, we note that agreement is excellent except for two outliers. For the L polymorphs, differences of 1 Å are found for the parameters *a* and *c* (see Table 1 and Fig. 2), with corresponding deviations in the associated angles of a few degrees. However, the observed interface structures for both C_{11}P and C_{13}P (see Table 1) portray L lattice parameters that are sightly distorted and actually much closer to those calculated for the pure L polymorphs, indicating that the L polymorphs are quite pliable. Indeed, calculated electronic interaction energies between the surface and the SAM increase by only up to 5 kcal mol^{−1} when the porphyrin rings are forcibly rotated about the surface normal by 15°, driving large changes to the optimized lattice parameters. Hence the difficulty that the calculations perceive in predicting accurately these lattice parameters leads to an explanation of a significant unexpected observed property of the L SAMs. At first observation, the large variation observed between structures was believed sufficient to warrant the introduction of a second label “B” depicting interacting interface molecules (51), whereas we are now inclined to interpret “B” as being instead just a form of “L” distorted by the boundary conditions imposed on the interface.

The other outlier is for the M polymorph of C_{13}P and seems to be the opposite in nature: The observed structure appears to be very unique and quite strictly defined. For this, the observed STM images display some very unusual features and, of all of the optimized structures for this polymorph, only one, the structure presented, reproduces them; these features include the very short distance between chain kinks and the critical observation that the macrocycle ring has the same angular orientation in both the L and M polymorphs, with the chain kinks pointing directly in the *b′* direction (see Fig. 2*F*). The calculations therefore indicate that this structural motif is intrinsically one of high free energy and that its stabilization for C_{13}P is fortuitous. It is also an unintuitive structure, as it fails to cover significant portions of the HOPG surface (black regions in Fig. 3), unlike almost all of the other optimized structures for C_{13}P. This highlights the general importance of the specific size-dependent effects on the structure and properties of medium-density polymorphs and the need for a thorough scan of plausible atomic structures when interpreting even high-resolution STM images such as those obtained.

### Estimating Free Energies of Formation from Observed SAM Properties.

Although formed SAMs are kinetically trapped (34, 52), our hypothesis explaining the formation of higher density polymorphs at higher solution porphyrin concentrations is that equilibrium thermodynamics controls the first few seconds of SAM production. We label the porphyrin molecules as either P_{(L)}, P_{(M)}, or P_{(S}_{)} depending on whether they are in the L polymorph, in the M polymorph, or in solution, respectively, and represent the (nonintegral) number of carbon atoms of HOPG per porphyrin molecule in a SAM as *x*. The raw (but solvated) surface is therefore labeled as HOPG_{x}, the L polymorph as HOPG_{x}P_{(L)}, and the M polymorph as HOPG_{x′}P_{(M)} (note *x*′ ≠ *x* as the coverages differ). Formation at low concentration of the SAM of the L polymorph is then expressed as*κ* is the ratio of the surface areas of regions containing the SAM to regions of uncovered HOPG. The standard Gibbs free energy change for SAM formation for *n* moles of adsorbed porphyrin is then given by*C*^{0} specifies a standard state of 1 M. This relationship allows estimates of SAM free energies of formation based on the SAM formation conditions provided that some (crude) estimate is available for *κ*. Under our experimental conditions, dense highly ordered SAMs cover most of the surface, but disordered regions that are presumably also quite dense are observed as well. The fraction of seemingly uncovered surface was never observed to be greater than 1% (*κ* = 10^{2}) and here we take a rough estimate of *κ* = 10^{3}, but larger values are plausible. Estimated free energies of formation change by −2.303*RT* ≈ −1.4 kcal mol^{−1} per decade increase in *κ*. Although uncertainties of this order are large compared with what accurate experimental measurements can achieve, they are tiny compared with the magnitudes of the competitive forces controlling SAM production (see *The Factors Controlling SAM Binding and Polymorphism*) and so facilitate commencement of quantitative analysis.

For Cu−C_{11}P, stable SAMs of the L polymorph are observed (51) at ^{−7} M in 1-octanoic acid, implying a SAM formation free energy of Δ*G*_{L}/*n* = −13.7 kcal mol^{−1} assuming *κ* = 10^{3}. Here we find stable SAMs for the L polymorph of C_{13}P in 1-phenyloctane at ^{−5} M, implying ^{−1}. These crude estimates are reported in Table 2.

Reporting free energy changes in the form of the associated chemical potential changes Δ*G*/*n* is common for the consideration of the thermodynamics of solids, liquids, and gases, and we report such quantities so that SAM formation may be readily understood in terms of traditional chemical analyses. However, the total Gibbs free energy of a SAM is proportional to the available substrate surface area, making the thermodynamically most stable SAM that which has the most negative value of Δ*G*/*A*, where *A* is the surface area covered by the SAM per molecule (*A* = *ab* sin *θ*; see Fig. 2 and Table 1) (3). These values are also reported in Table 2. Only Δ*G*/*A*, not the chemical potential, is a thermodynamic state property whose value can be simply manipulated during SAM composition changes. Therefore, although the free energy of production of the M polymorph can be described by analogy to Eq. **1** using**2** has no analog. The chemical potential of the M polymorph can be expressed in terms of that of L, the covered surface areas per molecule *A*_{M} and *A*_{L}, and the porphyrin solution concentration at which both polymorphs are equally represented as_{11}P and C_{13}P, the two phases are stable at porphyrin concentrations ^{−5} M and 10^{−4} M, respectively, therefore implying Δ*G*_{M}/*n ≈* −12.0 kcal mol^{−1} and −10.9 kcal mol^{−1}. These values are listed in Table 2, as are the associated values of *N*_{A}Δ*G*/*A* (where *N*_{A} is Avagadros’ number), which, as one would intuitively expect, are more negative for M than for L. Improved estimates could be obtained by systematically varying the solution concentration and recording many images, but the estimates we obtain using only known information are accurate to on the order of 1 kcal mol^{−1} and can be readily applied to interpret many previously observed images.

### Comparison of Observed and Calculated Free Energies of SAM Formation.

Table 2 compares the observed (Δ*G*^{OBS}) and calculated (Δ*G*^{QM/MM}, Δ*G*^{PBE-D3/PAW}, Δ*G*^{PBE-DRSLL/DZP}, and Δ*G*^{PW91-D3/USPP}) free energies of SAM formation from solution. The computational methods predict Δ*G*/*n* values that differ by up to 40 kcal mol^{−1} from each other, a large range that would not be expected if this computational field was mature. The a priori methods used have been optimized for the treatment of small van der Waals dimers, not for condensed matter properties such as SAM formation, and improved methods are likely to be developed in the future. Two of the computational methods, the QM/MM scheme and PBE-D3/PAW, agree qualitatively with experiment in that the variations in Δ*G* with alkane chain length and with polymorph variations are small; both methods underestimate the binding strengths by ∼6 kcal mol^{−1}, however. The other two methods, PBE-DRSLL/DZP and PW91-D3/PAW, significantly overestimate the attraction and cannot account for the observed chemical variations; hence these methods are not preferred. The variable nature of the PBE-DRSLL/DZP results most likely stem from the use of the small double-zeta DZP basis set and the imprecision of BSSE corrections. However, D3 corrected PW91 also yields erratic results during the fitting of the two D3 parameters to the test data set (see *SI Appendix*, *Derivation of PW91-D3 Parameters*). Although this functional often gives results better than any other similar one when dispersion is excluded, including dispersion appears to expose its weaknesses.

For the conserved L polymorph of C_{11}P, C_{13}P, and C_{19}P, five possible structure types have been considered, as discussed previously (49) for C_{19}P; this set explores various ways in which the chains can lie on the surface (the plane through the carbon−carbon bonds can be parallel or perpendicular), and various ways of making the required chain kinks. The lowest-energy structure calculated for each of the porphyrins is indeed the one that matches the experimental observations, with all other structures being calculated by QM/MM to be at least Δ*G*/*n* = 7 kcal mol^{−1} (0.3 eV) higher in energy. For the M polymorphs of each porphyrin, approximately four structures had a calculated free energy at most 5 kcal mol^{−1} higher than the lowest-energy structure, with either the lowest-energy structure or the second-lowest-energy one (typically 1 kcal mol^{−1} higher in energy) matching the experimental images. The calculations are therefore useful for predicting the polymorph atomic structures as well as for identifying them.

### The Factors Controlling SAM Binding and Polymorphism.

As the calculations are able to predict the geometry and free energy of the SAMs and their chemical variation with chain length and polymorph, it is anticipated that they also provide a realistic description of the driving forces. A detailed breakdown of the energy components calculated by all 4 computational methods is given in Table 2, and key features are indicated in Fig. 4 for the L and M polymorphs of C_{13}P evaluated using QM/MM and PBE-D3/PAW (an expanded figure showing similar trends predicted by the PBE-DRSLL/DZP and PW91-D3/USPP methods is provided in *SI Appendix*, *Expansions of Figure 4*). The most striking feature of Fig. 4 is the energy scale, which ranges over 200 kcal mol^{−1} (9 eV). Although individual interatomic dispersion interactions are weak, their collective strengths driving the solvent interactions, the intermolecular interactions and the molecule−substrate interactions are very large indeed.

Fig. 4*A* views the total free energies of SAM formation of C_{13}P L and C_{13}P M in terms of the solvation, thermal, and binding contributions listed in Table 2. The thermal contributions opposing SAM formation of Δ*G*_{corr}/*n* = 37 kcal mol^{−1} and 25 kcal mol^{−1} for L and M, respectively, are dominated by the entropy contributions at 298 K of −*T*Δ*S/n* = 38 kcal mol^{−1} and 26 kcal mol^{−1}, respectively. Also opposing SAM formation are differential solvation energy contributions of Δ*G*_{s}/*n* = 106 kcal mol^{−1} and 70 kcal mol^{−1}, respectively. Driving SAM formation are the binding energy contributions of Δ*E*^{QM/MM}/*n* = −149 and −103 kcal mol^{−1}, respectively, calculated by QM/MM, or of Δ*E*^{PBE-D3/PAW}/*n* = −148 kcal mol^{−1} and −100 kcal mol^{−1} calculated by PBE-D3/PAW. Near-perfect cancellation of these opposing large effects results in very similar free energies for the L and M polymorphs (the maximum calculated L−M difference is just 2 kcal mol^{−1}). Table 2 show that this cancellation is modulated as the alkyl chain length changes from 11 to 13 carbon atoms to control structure.

The net solvation energy Δ*G*_{s} itself arises from the differences between the large solvation energies Δ*G*_{s,P}, Δ*G*_{s,HOPG}, and Δ*G*_{s,SAM} of the molecules in solution, the bare HOPG surfaces (which differ in area for L and M), and the SAMs on these surfaces, respectively. Details are given in Table 2 and depicted diagrammatically in Fig. 4*B*.

Similarly, Table 2 and Fig. 4*D* show that the QM/MM binding energies Δ*G*^{QM/MM} can be decomposed into small distortion energies required to kink the alkyl chains Δ*E*_{dist}, significant alkylporphyrin to alkylporphyrin intermolecular interactions between neighboring molecules in the SAMs Δ*E*_{P-P}, and strong attraction between the alkylporphryins and HOPG Δ*E*_{P-HOPG}, made up of roughly equal contributions coming from alkyl chain and central porphyrin ring interactions. Although the distortion energies in absolute magnitude are quite high at 4–6 kcal mol^{−1}, this effect is dwarfed by the magnitudes of the other contributing terms, explaining how a wide range of chain conformers can be stabilized. Also, as the intermolecular interactions and substrate interactions are large, we see that subtle modulation effects could have a profound impact on relative polymorph free energies, explaining the diverse nature of observed SAM properties.

Although dispersion is naively expected to be critical to binding, Fig. 4*C* shows an important result: The raw PBE density functional predicts that the molecules in the gas phase would be repelled by HOPG. Use of dispersion correction is essential to even the most qualitative description of the binding, contributing *ca*. −175 kcal mol^{−1} and −115 kcal mol^{−1} for the L and M polymorphs of C_{13}P, respectively (Table 2). This effect is well known for small-molecule interactions in gas-phase dimers and in molecular solids (18⇓⇓⇓⇓⇓–24), and also for SAMs of small molecules (69, 70). However, the magnitude of the effect we see for SAMs of large molecules is unprecedented, not just in absolute value but also in terms of its variations between feasible polymorphs. This stresses the criticality of dispersion forces to the self-assembly processes—not only for oligoporphyrin molecules but also for natural and synthetic polymers as well.

## Discussion

This work presents, to our knowledge, the first use of modern electronic structure computational methods to determine a priori the free energy of formation of a SAM from solution. It shows that computational techniques standard for the interpretation of gas-phase chemistry and small-molecule solution chemistry also depict complex self-assembly processes of large molecular systems. Calculations can reveal and/or predict the atomic structure of partially resolved SAMs as well as expose the key chemical features controlling SAM production and polymorphism.

Our application is specifically for tetraalkylporphyrin SAMs on HOPG for which the interplay between kinetic and thermodynamic effects in controlling the properties of hydrophobic SAMs is of considerable current interest. As our calculated free energies parallel those implied by observed SAM compositions, support is given to the hypothesis that thermodynamics controls the critical first few seconds of SAM growth before the SAMs becoming kinetically trapped. This hypothesis therefore warrants the development of new techniques permitting further experimental and computational investigation.

However, our methods are of general use and can be applied to all types of self-assembly processes on surfaces from solution, as well as to general processes involving macromolecular self-assembly. As an example of the general importance of the concepts discussed, one critical feature found here for hydrophobic SAMs, the importance of intermolecular dispersion interactions in controlling SAM structure (and chirality), has recently been demonstrated also for gold−sulfur SAMs with strong head−group interactions (62, 63). In terms of specific details, in a fully general method, Eq. **6** would need to be replaced with one of the many available solvation energy protocols embodying the use of arbitrary atoms and also electrostatic contributions. Such protocols may need slight modification to support the required 2D boundary conditions of the SAM and/or solvent-excluded surfaces.

Some of the computational and experimental techniques combined in this work will be essential to many subsequent studies. These include advances in theoretical methods to treat the entropy of a SAM as opposed to standard gas-phase entropies, and methods to treat incommensurate lattices (see *Materials and Methods*). Our advances in STM calibration and data analysis allow for accurate lattice parameters and molecular shapes to be determined, as we demonstrate using either commercial equipment (at University of Sydney), or purpose-built apparatus (at Nijmegen), or by reprocessing already published images. These advances will all help drive state-of-the-art modern experiments aimed at determining the thermodynamics and kinetics of SAM production, allowing for greatly enhanced control procedures of widespread applicability through nanotechnology and molecular electronics.

Many used experimental and computational techniques require significant improvement, however. Our method for crudely estimating free energies of SAM formation provides a means to engage in quantitative thermodynamics discussions, as it can be used to reprocess a wide range of already published data. Its accuracy is below that needed in the future, however, and well below that already achieved in pioneering studies performed over the last few years. New and accurate experimental techniques need to be developed. The four computational methods used were chosen to explore very different ways of completing the calculations. If essentially the same results had been predicted independent of method, then this would have provided a powerful statement of confidence in the methods, as would be expected in a mature research field. Instead, two very different and yet reasoned highly applicable methods, QM/MM and PBE-D3/PAW, encouragingly gave very similar results, whereas PBE-DRSLL/DZP and PW91-D3/USPP showed anticipatable flaws. From this, we conclude that plane wave basis sets are much more suited to SAM calculations than are atomic basis sets, owing to the criticality of BSSE, and that methods focusing on correctly representing chemical interactions are preferred to those that can give good results by cancellation of internal errors. The PW91 method was for a long time the preferred method for calculations involving intermolecular interactions, as its poor representation of covalent bonds gave more useful results than methods that described them better, such as PBE. When dispersion is evaluated more rigorously, as in PBE-D3 and PW91-D3, only PBE-D3 gives realistic results, however. PBE-D3 is still somewhat crude, given the wide range of improved hybrid, double-hybrid, and meta-GGA (generalized gradient approximation) density functionals now available, including those with correct behavior of the long-range potential. Our study shows how important it is to do accurate calculations in which likely errors are reduced to below 1 kcal mol^{−1}. This is a difficult task for just the interaction energy, let alone the required additional solvation and thermal energies. The obtained results imply that the D3 explicit dispersion correction and the Floris−Tomasi−Pascual Ahuir solvation dispersion interaction are finely balanced. This is a profound result, as these terms were determined independently and are usually applied independently. It requires further investigation.

Many challenges thus remain, yet the benefits are high. Calculations using implicit solvent require dramatically less resources than do molecular dynamics simulations, allow much higher-quality methods to be used for evaluating energies, can be readily applied to consider chemical diversity (e.g., herein to *ca*. 150 self-assembly processes in solution), and directly reveal the thermodynamic quantities of greatest interest to the understanding of new phenomena.

## Materials and Methods

C_{13}P and Co−C_{13}P were synthesized in gram amounts by the methods of Crossley et al. (71), and established STM imaging and calibration techniques were used, as described in *SI Appendix*, *Synthesis* and *Details of STM Imaging*, respectively.

The QM/MM calculations were performed using a method previously applied to *meso*-tetraalkylporphyrin SAMS (49). A limitation of this approach is that it explicitly treats the HOPG surface as being rigid. However, it is justified, as PBE-D3/PAW calculations indicate that substrate relaxation contributes less than 1 kcal mol^{−1} to binding energies. The Gaussian (28) EXTERNAL command is used to include the nonstandard force field elements into an ONIOM (72) QM/MM force field. Long-range intermolecular interactions are treated using a real-space sum over replicated images of the incommensurate HOPG and SAM lattices, with the unit cell parameters optimized externally to Gaussian using analytical derivatives. All atomic geometry optimizations are performed using initial analytical second derivatives for the Hessian matrix. All optimized structures are confirmed to be local minima by second-derivative analysis. Entropy is evaluated using the standard procedures applied by Gaussian (73) except that gas-phase translational and rotational contributions are not included for the SAMs and are replaced by vibrational contributions derived from calculated phonon mode frequencies; the phonon modes and their interactions with the molecular vibrations are evaluated using a real-space sum over all vibrations of a 5 × 5 grid of SAM unit cells. As the HOPG surface is taken to be frozen, it does not contribute to the free energy. The net “thermal” correction to the free energy is therefore taken to be the sum of the entropy, enthalpy, and zero-point energy differences calculated for the SAM and for the adsorbate molecule in the gas phase. This is then combined with a solvation entropy correction, needed for conversion of the gas-phase standard state of 1 atm concentration to the solution conditions, of 1 M of −1.89 kcal mol^{−1} (73, 74). Changes in entropy completely dominate the net thermal correction, reflecting the confinement that the molecules experience when the SAM forms. During these calculations, the geometry optimizations are always performed so as to minimize the binding energy only. Results from PBE-D3/PAW calculations also justify this approach, as they have been performed optimizing both this binding energy and Δ*G*/*A*, but no significant geometrical differences were found.

The PBE-D3/PAW calculations were performed using the VASP package (74, 75) to optimize the molecular geometries using 12–15 commensurate lattices with lattice vectors similar to the QM/MM optimized lattices. Periodic imaging of the dispersion interactions is not performed in the direction normal to the SAM surfaces, and not performed at all for “isolated” molecule calculations. All calculations are performed at the Γ-point of the Brillouin zone using the default precision (plane wave cuttoff 300 eV); this basis set is sufficiently close to the complete-basis-set limit so that BSSE is not manifest in these calculations. Results are reported for only the commensurate lattice that produces the lowest value of Δ*G*/*A*. The correct orientation of the alkyl chains with respect to the HOPG lattice is not preserved in this process and as a result of this and the use of discrete lattice points, calculated free energies are likely to be too positive by up to 2–3 kcal mol^{−1}. STM images are simulated using the Tersoff−Hamann approximation (64) at a bias voltage of 0.8 V.

The PW91-D3/USPP calculations were similarly formed using VASP, whereas the PBE-DRSLL/DZP calculations were performed using SIESTA (76). In both cases, only the VASP/PAW-D3 optimized incommensurate lattice was considered. For the PW91 density functional, the parameters in the D3 functional form (19) with Becke−Johnson damping (20) were determined to be s6 = 1, s8 = 1.9598, a1 = 0.6319, and a2 = 4.5718 by fitting data for 130 noncovalent energies (see *SI Appendix*, *Derivation of PW91-D3 Parameters*), but the fit was unusually poor, indicative of the poor quality of the PW91 functional. Dispersion-corrected PBE is thought to be always preferable to dispersion-corrected PW91.

For the PBE-D3/PAW, PBE-DRSLL/DZP, and PW91-D3/USPP calculations, QM/MM values of the thermal corrections are added to the native calculated binding energies. Although, in principle, such terms can be evaluated directly (3), the size of the system treatable in this fashion remains small, and the use of more approximate methods for thermochemical analysis is standard chemical practice (1, 2).

Solvation energies are evaluated at geometries calculated by each computational method. However, the variations in these quantities as a function of the associated geometry changes is found to be small, too small to be apparent in Fig. 3. In most cases, geometries are optimized using the surface adsorbate energy and forces only, as is often a common practice. Solvation can affect geometry, however (77), and we find that this applies to SAMs with tight internal cavities like the identified structure for C_{12}P M (Fig. 3). Only for this structural motif were the calculations performed alternatively, optimizing the energy in the presence of the solvent. Note that an implementation of this type of optimization (without a solvent-excluded-surface option) has just become available in VASP (29).

To obtain the net contribution of solvation energy to the free energy of SAM formation, individual energies are evaluated for the adsorbate molecules in free solution, for the raw HOPG surface, and for the SAMs. This process fully includes the effects of wetting/dewetting surfaces and dangling chains and compliments the entropic contributions previously included within the thermal correction.

Each solvation energy is determined using an interpolating function. The parameters in this function were determined by calculating the dispersion-dominated solvation energies of gas-phase molecules by Gaussian using a dielectric constant of 2.5, solvent density of toluene, and a refractive index of 1.49 to mimic all three solvents of interest: 1-phenyloctane, 1-octanoic acid, and *n-*tetradecane. In this process, the method of Floris et al. (27) is used to depict the dispersive contributions to the solvation energy, a method in widespread current use that is reported to be a useful component in calculations that predict thermochemical properties of small molecules to chemical accuracy, *ca*. 2 kcal mol^{−1} (78). This approach expresses the dispersion contribution to the solvation free energy as being proportional to the exposed molecular surface area (27, 29), and so we express the change in solvation energy on SAM formation as

where Δ*E*_{s,SAM}, Δ*E*_{s,P}, and Δ*E*_{s,HOPG} are the solvation energies of the SAM, the porphyrin in solution, and a solvated bare HOPG surface, respectively, and *A*_{SAM}, *A*_{P}, and *A*_{HOPG} are the corresponding exposed surface areas. The proportionality constant *α* = 0.0866 kcal mol^{−1} Å^{−2} is fitted to B3LYP/6-31G* calculated data for the optimized structures of C_{11}P and C_{19}P. All exposed surface areas are evaluated using the solvent-excluded surface obtained using van der Waals radii of 1.55 Å, 1.7 Å, and 1.2 Å for N, C, and H, respectively, and the solvent “radius” is taken to be 2.2 Å. This procedure ensures that only the SAM surface area is included next to which one of the alkyl chains from the 1-phenyloctane, 1-octanoic acid, or *n*-tetradecane solvent molecules can be placed, eliminating any small internal cavities from consideration. These parameters depict the exposed surface area for HOPG as being *ca*. 3.3% larger than the substrate lattice surface area *A*, owing to the atomic structure of HOPG.

## Acknowledgments

We thank the Australian Research Council for funding this research (Grants LP0455238, DP12010259, and DE140100550) and National Computational Infrastructure and INTERSECT for provision of computing resources. J.A.A.W.E. thanks the Council for the Chemical Sciences of the Netherlands Organization for Scientific Research for a Vidi Grant (700.58.423), and the European Research Council (ERC) for an ERC Starting Grant (NANOCAT-259064), and the Ministry of Education, Culture and Science (Gravity Program 024.001.035).

## Footnotes

- ↵
^{1}To whom correspondence may be addressed. Email: reimers{at}shu.edu.cn, noel.hush{at}sydney.edu.au, jeffrey.reimers{at}uts.edu.au, or maxwell.crossley{at}sydney.edu.au.

Author contributions: J.R.R., J.A.A.W.E., N.S.H., and M.J.C. designed research; J.R.R., D.P., J.V., Y.C., C.T., L.G., M.J.F., M.S., T.-J.S., M.J.J.C., and B.L.M.H. performed research; and J.R.R. wrote the paper.

Reviewers: M.L., Technische Universität München; T.J.M., Stanford University; and H.F.S., University of Georgia.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1516984112/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- ↵
- ↵
- ↵.
- Reuter K,
- Scheffler M

_{2}(110) as a function of oxygen pressure. Phys Rev B 65(3):035406 - ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Mammen M,
- Shakhnovich EI,
- Whitesides GM

- ↵.
- Mammen M,
- Shakhnovich EI,
- Deutch JM,
- Whitesides GM

- ↵
- ↵
- ↵
- ↵.
- Meier C, et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Sure R,
- Antony J,
- Grimme S

- ↵
- ↵.
- Frisch MJ, et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Kresse G,
- Joubert D

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Hentschke R,
- Schuermann BL,
- Rabe JP

- ↵
- ↵
- ↵
- ↵
- ↵.
- Dapprich S,
- Komaromi I,
- Byun KS,
- Morokuma K,
- Frisch MJ

- ↵.
- Ochterski JW

- ↵
- ↵.
- Kresse G,
- Furthmüller J

- ↵.
- Soler JM, et al.

*N*materials simulation. J Phys Condens Matter 14(11):2745–2780 - ↵
- ↵