BioChemical matters: computational methods

Showing posts with label computational methods. Show all posts

Wednesday, July 8, 2015

Gamess (US) frequently asked questions. Part 7: How to distinguish alpha from beta orbitals in the $VEC deck

Each line in a $VEC group contains the coefficients of five basis functions for a given orbital. These are formatted in a special way, with seven numbers in each line. These numbers are:

1st) the number of the orbital to which the coefficients belong (written with at most two characters, so that 1 means orbital 1, .. , 99 means orbital 99, 00 means orbital 100) . This number is repeated in the beginning of every line, until all coefficients for that orbital have been written

2nd) this number tells the program how to assign the coefficients to the basis functions. "1" means that the coefficients are for basis functions 1-5, "2" means that the coefficients are for basis functions 5-10, etc. In general , that number "n" directs the program to assign the five coefficients present in the line to basis functions 5*(n-1)+1 to 5*n.

3rd to 7th) coefficients of five basis functions

BETA orbitals are punched as a group immediately after all ALPHA orbitals.

This format entails that in molecules with more than 100 orbitals the $VEC group contains several blocks with the same 1st number. For example, in a molecule with 200 orbitals, alpha orbital 27 is described by the first block of lines beginning with "27", and alpha orbital 127 is described by the SECOND block of lines beginning with "27".

I usually find the beginning of the BETA orbitals by repeating a search for the string " 1 1" : if that string is preceded by a block beginning with "00 1", it usually refers to orbitals 101, or 201, etc. (the exception being those systems with exactly 100, 200, etc. orbitals). If string " 1 1" is NOT preceded by a block beginning with "00 1", you are sure to have found the beginnning of the BETA orbitals

Thursday, March 19, 2015

My new preprint is up

As part of their undergraduate training, our students are required to write a short thesis. Usually, due to the paucity of research funding, their theses take the format of a literature review. A few years ago, however, I proposed a computational study to the student I had been assigned. Despite no previous acquaintace with the subject, she eagerly took the task and performed some computations on possible reaction mechanisms of the organomercurial lyase MerB. She only had the time to compute a few of the possible pathways and therefore, after she had written her thesis with the data she had managed to gather, I completed the analysis of the other pathways we had thought of at the time, and a few that we had not envisaged. Writing it as a paper took me much longer than I had anticipated, mostly because I kept postponing it due to the thrill of running computations on other enzymes and projects. I have now managed to finish it and submitted it to PeerJ, where it is undergoing review. I have made it available as a Preprint, and would be thankful for any comments about it.

Addendum: the paper has been published

Monday, September 8, 2014

Making good on my "Open Access" pledge

My most recent paper has just been published in PeerJ . It was a LONG time in the making, to the point that my 12-yo daughter once told me (only half-in-jest), that I should "cut my losses and forget about it". I am quite happy about how it turned out: besides describing an analysis of a reaction mechanism and the influence of the redox state of a hard-to-converge Fe-S cluster , it also contains the first computations including the weighed contributions of 1.2*10¹³ protonations states of a protein on the reaction it catalyzes. The computational approach described here is relatively simple to perform provided that one has a good estimate of the relative abundances of those protonation states, which can be obtained through Monte Carlo sampling once the site-site interactions have been computed with a Poisson-Boltzmann solver. To my mind, this is clearly superior to the usual approach of considering only the "most likely" protonation state (which may often not be the state with the most significant influence on the electrostatic field surrounding the active site). What do you think of it?

Programs needed to use this approach:
MCRP, by Baptista et al., ITQB, Lisbon
MEAD, by Don Bashford, currently at St. Jude Children's research hospital
Any molecular mechanics code, to compute the change of the total electrostatic energy as each individual amino acid is protonated/deprotonated

Thursday, April 24, 2014

Gamess (US) frequently asked questions part 6: Obtaining proper SCF convergence (Anti-)ferromagnetic coupled Fe-S clusters

Obtaining SCF convergence of FeS clusters is a very demanding task.

The problem in FeS clusters is the arrangement of spins on the Fe atoms: if you have a cluster with 4 Fe atoms, each of them with 5 up-spins, and a total spin of zero, the arrangement of spins on the atoms could be

Fe1 and Fe2 up-spin, Fe3 and Fe4 down-spin; or
Fe1 and Fe4 up-spin, Fe2 and Fe3 down-spin; or
Fe1 and Fe3 up-spin, Fe2 and Fe4 down-spin;

The problem is compounded if you have a mixture of Fe²⁺ and Fe³⁺, which may lead to 12 (or more) different spin arrangements, depending on the number of Fe²⁺ atoms. However, if you have a good guess SCF for one instance instance, you may simply substitute the coordinates of Fe2 with those of Fe4 to get a comparably good guess for the second instance, and so forth... This is the approach suggested by Greco, Fantucci, Ryde, de Gioia (2011) Int. J. Quantum Chem. 111, 3949-3960. Obtaining the guess for one of the instances is in itself quite difficult, and I usually follow the approach outlined by Szilagyi, R. K. and Winslow, M. A. (2006) J. Comput. Chem., 27: 1385–1397 .

It goes like this:

- obtain orbitals for bare Fe²⁺, Fe³⁺, S^2-, and isolated ligands, with proper spins on the Fe atoms (5/2 for Fe³⁺, 2 for Fe²⁺)

- Manually split the "alpha/up" and "beta/down" portions of the resulting $VEC groups. For example, assuming you have a system with three Fe atoms (two Fe²⁺ and one Fe³⁺) with total spin S=5/2 and the $VEC groups for bare Fe²⁺ and bare Fe³⁺, you should cut the $VEC groups of Fe²⁺ and Fe³⁺ as:

$VEC for the alpha (up) electrons of Fe²⁺   (let's call it "Fe2+_5_d_electrons")
$VEC for the alpha (up) electrons of Fe³⁺   (let's call it "Fe3+_5_d_electrons")
$VEC for the beta (down) electrons of Fe²⁺   (let's call it "Fe2+_1_d_electron")
$VEC for the beta (down) electrons of Fe³⁺   (let's call it "Fe3+_0_d_electrons")

The total spin S=5/2 in this sample problem implies that both Fe²⁺ atoms spins should annull each other, i.e., one Fe²⁺ is mostly "up" and the other is mostly "down". Building the new guess for the "up" electrons should therefore include:

"Fe2+_5_d_electrons" for one of the Fe²⁺ ions,
"Fe2+_1_d_electrons" for the other Fe²⁺,
"Fe3+_5_d_electrons" for the Fe³⁺

Building the new guess for the "down" electrons should include:
"Fe2+_1_d_electrons" for the FIRST Fe²⁺ ions,
"Fe2+_5_d_electrons" for the other Fe²⁺,
"Fe3+_0_d_electrons" for the Fe³⁺

- combine the orbitals using the small utility called combo, which you may obtain from Alex Granovsky's Firefly website.

- Manually paste the "alpha" and "beta" guesses into a single $vec group, which would be the proper guess.

- cross all your fingers and toes, and expect it to converge into the proper state. If it does not converge, change convergers (SOSCF=.T. DIIS=.F.), onset of SOSCF (SOGTOL=1e-3) , etc.

- after SCF optimization using this guess, manually scramble the ordering of Fe atoms in your input, to ascertain whether a lower energy solution can be obtained with a different spin distribution.

Good Luck!

Friday, July 5, 2013

Gamess (US) frequently asked questions Part 5: "THE VIBRATIONAL ANALYSIS IS NOT VALID"

Gamess (US) and Firefly by default assume geometric convergence has been achieved when the maximum gradient is below 1e-4 and the RMS gradient is smaller than 1/3 of the maximum gradient. This convergence criterion may be changed by the user with

$STATPT OPTTOL=<your desired convergence criterion> $END

It is well known that the vibrational analysis is strictly valid mathematically when the Hessian is computed in true stationary points (i.e when the gradient is exactly equal to zero). If the maximum gradient is sufficiently close to zero, the vibrational analysis (although not absolutely correct) is still close enough to the "true" solution for all practical purposes.

This introduction brings us to today's FAQ. A recurring question in both the Gamess-US list and the Firefly forums concerns the message often printed by the program after a vibrational analysis:

*THIS IS NOT A STATIONARY POINT ON THE MOLECULAR PES THE VIBRATIONAL ANALYSIS IS NOT VALID*

This message arises from the way gradients are analyzed by Gamess: gradients are originally computed in one set of coordinates (cartesian coordinates, I believe) , and then transformed into the coordinate system specified by the user. Optimizations stop when the "transformed gradient" lies below OPTTOL, but Gamess uses the original, non-transformed, gradient to decide whether to consider the geometry as a stationary point on the molecular PES. Therefore, if the geometry is converged, the scary message in capital letters above may be safely disregarded. When in doubt, simply decrease your OPTTOL value, continue the optimization and re-compute the hessian.

Wednesday, June 19, 2013

Gamess (US) frequently asked questions Part 1: SCF convergence

In spite of the very high quality of the Gamess(US) documentation, the Gamess(US) list is very often flooded with requests from new users regarding the lack of convergence of the SCF procedure. A few words of advice:

When your SCF does not converge, you should re-run the job including a $guess guess=moread $end line, as well as the complete $VEC group present in the output PUNCH file (usually called <jobname>.dat, and present in you scratch directory).

Addendum:

Whenever you read a $VEC group from a UHF run you must assign NORB in the $GUESS group. An additional problem is that by default the $VEC group only includes the occupied orbitals, and this means that in UHF runs the $VEC group does not include equal numbers of alpha and beta orbitals (e.g., a run with 41 electrons and MULT=2) will have 21 alpha orbitals and 20 beta orbitals. Therefore, if you include

$guess guess=moread NORB=21 $end

Gamess will crash because there are not 21 beta orbitals, and if you input

$guess guess=moread NORB=20 $end

there will be another error, since there are more than 20 alpha orbitals. In these cases, you should check the number of alpha and beta orbitals. Then , copy the coefficients of the extra alpha orbitals to the end of the beta orbitals. In my example above

$guess guess=moread NORB=21 $end

will yield no problems, since the modification of the VEC group yields equal numbers of alpha and beta orbitals. There is also an option to PUNCH every orbital (occupied+virtuals) at every step. In this case, Gamess always punches a full $VEC group, making it very easy to assign NORB as one can simply inspect the output file to learn the number of orbitals. However, this yields gigantic PUNCH files, and may therefore not be feasible.

You should also experiment with changing convergers, damping, etc. Some systems are notoriously hard to converge, and may require several re-iterations of the whole process.

Thursday, March 15, 2012

QM/MM vs. QM-only studies of large cluster models

How large must a quantum model of an enzyme active site be to achieve optimum results? Proponents of the so-called "cluster model" argue that, most often, good results may be obtained even with small models (< 100 atoms). Fahmi Himo has repeatedly shown that fully including the first layer of aminoacids surrounding the reacting substrate (i.e. to about 150 atoms) yields results that are insensitive to the inclusion of a polarizable-continuum solvent field, and has concluded from these data that such models are sufficient to capture all the relevant enzymatic effexts on catalysis.

Walter Thiel has now published a QM/MM analysis of the reaction mechanism of acetylene hydratase (previously studied by Fahmi Himo using increasingly large QM-only models). Inclusion of the surrounding protein dramatically changed the results for the largest model studied by Himo, due to the absence (in the "cluster model") of two negatively charged phosphate groups adjacent to the active site. Although these charges are quite "shielded" from the active site because of neighbouring positively-charged amino acids, they originate local charge assymmetries that interact differently with the active site during each step of the catalytic cycle. This effect is quite similar to the major influence of the internal protein dipoles on enzyme catalysis expounded by Arieh Warshel, and should be kept in mind by all of us who tend to prefer the QM-only approach: a polarizable-continuum model assumes a homogeneous environment surrounding the QM system, and in proteins "it ain't necessarily so".

Tuesday, November 29, 2011

The limits of homology modeling

The computational prediction of three-dimensional structures of protein sequences may be performed using a wide variety of techniques, such as homology modeling or threading. In threading, the correct fold is searched for by evaluating the energy of the intended sequence when it is "forced" to adopt each of the known folding patterns. In homology modeling, one looks for a high-similarity protein sequence with experimentally-determined 3D structure, and mutates it in silico until the desired sequence is obtained. Many different programs and web-servers are now available for these tasks, differing among themselves in the forcefields used, alignment algorithms, etc. Performance is usually quite good when templates with similarity >40% are used.

Recently, two small proteins with very high homology (>95%) but widely differing structure have been designed and studied. Starting from a pair of proteins with < 20 % identity and different 3D structures, the authors gradually mutated one sequence into the other, and ended up generating two sequences differing only in one amino acid, but with different folds. Attempts to unravel the precise mechanisms governing the selection of one fold over the other have however been inconclusive, because current molecular dynamics protocols and force fields are not accurate enough to measure the small energy differences involved.

Monday, October 17, 2011

Limitations of PCM

A new paper claims to compute the pKa of nitrous acidium ion from gas phase DFT computations followed by estimation of solvation effects by a Polarizable Continuum Method (PCM). It is true that most often geometries do not change too much when going from gas phase to solution, but I strongly doubt the results are as accurate as they could be: PCM does not include the contribuiton from hydrogen bonds between the solute and the solvent, and I would expect that effect to be quite different in neutral HONO and protonated H₂ONO⁺

Tuesday, September 20, 2011

QM molecular dynamics

In classical molecular dynamics simulations, we follow the evolution of a system of particles that interact with each other according to newtonian mechanics. The correct description of chemical bonds, angles and torsions in classical mechanics can only be achieved by introducing carefully parameterized expressions that represent the change in electronic energy upon stretching/compressing a bond, or bending an angle. These parameterized force fields (AMBER, CHARMM, GROMOS, YASARA, OPLS) allow the simulation of very large systems (>10000 atoms) for long simulation times (>20 ns) with an obvious drawback: the quality of the simulations is only as good as the quality of the parameterized expressions, and therefore one is limited to the simulation of specific classes of previously characterized molecules/functional groups. Simulating chemical reactions is generally not possible without special protocols (like thermodynamic integration).

Ab initio molecular simulations (e.g. Car-Parrinello MD) are much more expensive, and are generally limited to (at most) a few dozen atoms and <100 ps. Two papers from Prof. Shogo Sakai's group show that QM molecular simulations can be performed with considerable time-savings if the system is partitioned into several smaller systems. They have not yet developed the theory to the point where one can attempt bond-breaking, but theirs seems a fruitful approach to the problem.

Thursday, July 14, 2011

Fe-S clusters

Biological Fe-S clusters come in many sizes and flavours:

2Fe-2S clusters ligated by four cysteines

2Fe-2S clusters ligated by three cysteines and one aspartate

2Fe-2S clusters ligated by cysteines and histidines (the so-called Rieske clusters)

3Fe-4S clusters ligated by three cysteines

4Fe-4S clusters ligated by four cysteines

4Fe-4S clusters ligated by three cysteines and one aspartate

the hideously complex cluster present in hybrid cluster protein (also known as fuscoredoxin or "prismane protein")

the P-cluster in nitrogenase

etc., etc., etc.
The large number of electrons in Fe and the complexity of the possible couplings between spin states make the theoretical analysis of the electronic structures in Fe-S clusters quite difficult.
Takano et al. have recently published a paper on the differences between a Cys₃Asp ligated 4Fe-4S cluster and the "regular" (all Cys) 4Fe-4S cluster. The authors nicely analyze the influence of the Asp (and other) ligands on the electronic structure of the 4Fe-4S cluster, observe a -0.10 V difference in redox potential (vs. normal 4Fe-4S) in high dielectric constants, and offer this observation as the reason for the low potential of this cluster.
I do not accept this last conclusion for two reasons:

redox potentials of Cys-ligated 4Fe-4S clusters may differ by >0.4 V from each other, which shows that the influence of the charge distribution of the protein is much more important than the small difference observed by the authors

the 0.1 V difference found amounts to ca. 2.3 kcal/mol, which is well within the error range of the computational methods used.

Thursday, July 1, 2010

Computing redox potentials

First-principles computations of redox potentials in solution is a difficult task due to the large number of solvent molecules that must be included. As the computational cost increases steeply with the number of basis functions, a common approach consists of performing a geometry optimization of the reduced and oxidized species in vacuo, and then computing the energy of these species with a larger basis set and a continuum method that represents the influence of the solvent on the solute electron distribution. Besides the error introduced by assuming that the geometry does not change upon solvation, this approach includes two main sources of errors:
a) the intrinsic error of the theoretical level used to compute the electronic energies
b) the error associated with the continuum method itself.

Whereas the first error may be rigorously quantified by comparison with experimental gas phase values and made very small with the choice of an appropriate basis set/theory level combination , most continuum methods yield less predictable errors (especially when the redox-active portion of the solute is present in a very heterogenous environment, like an enzyme active site).

Dejun Si and Hui Li have now improved the continuum solvation methods by including the possibility of assigning different dielectric constants to different parts of the solute cavity surface, thus improving the description of heterogeneous environments. These authors have also shown this approach to correctly predict the relative redox potentials of the type I copper centers (optimized in vacuo) in eleven different proteins with maximum errors < 0.1 V (provided that the systems include approximately 100 protein atoms around the Cu Center). The error can be minimized to < 0.05 V by optimizing the geometries using the newly-developed heterogenuous polarizable continuum.

This new continuum method is implemented in the latest release of GAMESS, a free and very powerful quantum chemistry package available from Mark Gordon's group, at Iowa State University.

BioChemical matters