Department of Chemistry, Faculty of Science, University of Kurdistan

Protein Setup for Molecular Dynamics Simulations

Molecular Dynamics (MD) simulations are widely used in computational chemistry to study the structural and dynamical behavior of biomolecules. The technique has been extensively applied to enzymes, the biological catalysts that accelerate chemical reactions in living organisms.

Three-dimensional structures of enzymes are available from the Protein Data Bank (PDB) (https://www.rcsb.org/) in .pdb format. However, raw PDB files are not directly suitable for simulations. They often contain incomplete residues, alternate conformations, and crystallographic artifacts (e.g., buffer ions and cryoprotectants) that must be corrected before starting an MD simulation.

This tutorial describes a standardized setup protocol for preparing protein structures for MD simulations, using Aphid Myrosinase (PDB ID: 1WCG) as a case study. The procedure is optimized for the AMBER software package but can be adapted for other molecular mechanics programs.

For some setup steps, we use our PDBtoORCA toolkit. Be sure to download it from its GitHub repository (https://github.com/iranimehdi/pdbtoorca).

1. Preliminary Steps

1.1 Gather Structural Information

Before modification, inspect the PDB file to understand:

· Number of chains and completeness of the structure

· Presence of missing residues or atoms (REMARK 465 lists missing residues; REMARK 470 lists missing atoms)

· Metal centers and Cys–Cys disulfide bonds

· Non-standard molecules (check HET and HETNAM lines)

· Experimental pH and crystallographic details

It is highly recommended to read the associated publication describing your PDB structure.
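Several of these checks can be done from the command line before any file is modified. The sketch below is a hypothetical bash function, pdb_summary (not part of PDBtoORCA), which assumes standard fixed-column PDB records: it counts residues per chain from ATOM records, counts distinct HETATM residues by name, and counts REMARK 465/470 lines. Those REMARK counts include the header lines of each block, so treat them only as a hint to look closer.

```bash
# Hypothetical helper: quick inventory of a PDB file (fixed-column format assumed).
pdb_summary() {
  awk '
    /^REMARK 465/ { r465++ }
    /^REMARK 470/ { r470++ }
    /^ATOM  / {
      key = substr($0, 22, 6)                     # chain + resSeq + iCode
      if (!(key in res)) { res[key] = 1; nres[substr($0, 22, 1)]++ }
    }
    /^HETATM/ {
      key = substr($0, 18, 10)                    # resName + chain + resSeq + iCode
      if (!(key in het)) { het[key] = 1; nhet[substr($0, 18, 3)]++ }
    }
    END {
      for (c in nres) printf "chain %s: %d residues\n", c, nres[c]
      for (h in nhet) printf "HETATM %s: %d residues\n", h, nhet[h]
      printf "REMARK 465 lines: %d, REMARK 470 lines: %d\n", r465, r470
    }' "$1"
}
# Usage: pdb_summary 00-1WCG.pdb
```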

1.2 Setting Up the Working Directory

mkdir MD-1WCG

cd MD-1WCG

wget https://files.rcsb.org/download/1WCG.pdb

mv 1WCG.pdb 00-1WCG.pdb

Maintain sequential filenames (e.g., 00-, 01-, 02-) for each step to ensure reproducibility and clarity.

2. Structure Cleanup

2.1 Remove Unnecessary Lines

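Keep only the coordinate records (ATOM, HETATM) and the chain-termination records (TER). Header records such as REMARK and SSBOND are discarded in this step, so consult 00-1WCG.pdb whenever you need them later.

grep -E "^(ATOM|HETATM|TER)" 00-1WCG.pdb > 01-tidy_up.pdb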

2.2 Remove Buffer Ions

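Example: remove sulfate ions (SO₄²⁻, residue name SO4). Note that sed deletes every line that contains the pattern anywhere, so choose patterns that cannot match other records.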

sed '/SO4/d' 01-tidy_up.pdb > 02-no_buffer_ions.pdb

2.3 Remove Irrelevant Molecules

Example: remove glycerol molecules (GOL)

sed '/GOL/d' 02-no_buffer_ions.pdb > 03-no_GOL.pdb

2.4 (Optional) Remove Crystal Waters

Removing crystal waters is useful before docking or during initial preparation. After docking, you can restore the crystal water molecules, but remove those that are in short contact with the docked ligand.

sed '/HOH/d' 03-no_GOL.pdb > 04-no_HOH.pdb
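A slightly stricter alternative to the sed patterns in this section compares only the residue-name field (columns 18–20 of ATOM/HETATM records). The hypothetical bash function remove_residues sketches this: one call can strip several residue types, and an ion name such as CA cannot accidentally match a Cα atom name. It assumes standard fixed-column PDB records.

```bash
# Hypothetical helper: drop ATOM/HETATM records whose residue name matches
# any of the names given after the file name (e.g. SO4 GOL HOH).
remove_residues() {
  local pdb=$1; shift
  awk -v names="$*" '
    BEGIN { n = split(names, list, " "); for (i = 1; i <= n; i++) drop[list[i]] = 1 }
    /^(ATOM  |HETATM)/ {
      rn = substr($0, 18, 3); gsub(/ /, "", rn)   # strip padding, e.g. " CA" -> "CA"
      if (rn in drop) next
    }
    { print }' "$pdb"
}
# Usage: remove_residues 01-tidy_up.pdb SO4 GOL > 03-no_GOL.pdb
```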

3. Select a Representative Chain

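If the structure contains several identical chains (copies of the same subunit), you may select one of them (e.g., chain A) to reduce computational cost, provided that the region of interest, such as the active site, does not lie at a chain interface.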

This can be done manually or with the PDBtoORCA toolkit using the command:

pdbtoorca <<EOF
04-no_HOH.pdb
Chain
05-chainA.pdb
EOF
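For a manual selection, only the chain identifier in column 22 is needed. The hypothetical function select_chain below is a sketch that keeps the ATOM, HETATM, and TER records of one chain and passes other lines (e.g., END) through unchanged. TER lines without a chain ID are dropped, and waters or ligands assigned to other chains are removed as well.

```bash
# Hypothetical helper: keep only the records of one chain (chain ID in column 22).
select_chain() {
  awk -v ch="$2" '
    /^(ATOM  |HETATM|TER)/ && substr($0, 22, 1) != ch { next }
    { print }' "$1"
}
# Usage: select_chain 04-no_HOH.pdb A > 05-chainA.pdb
```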

4. Handling Missing Residues or Atoms

Check the REMARK 465 (missing residues) and REMARK 470 (missing atoms) records in the original file, 00-1WCG.pdb.
If side chains are incomplete, open the file in Swiss-PdbViewer, which automatically reconstructs missing atoms (highlighted in pink).
Save the repaired structure afterward. In our case (1WCG), no residues or atoms were missing.

It is usually unnecessary to model missing residues, especially when they are located at the start or end of a chain.
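Chain breaks can also be detected directly from the coordinates, which is handy once the REMARK records are gone. The hypothetical function find_gaps sketches one way: it walks through the Cα atoms of each chain and reports a break whenever the residue number jumps by more than one or the distance between consecutive Cα atoms exceeds 4.2 Å (an assumed threshold; trans-peptide Cα–Cα distances are about 3.8 Å). Non-sequential numbering in the deposited file will also be reported, so each hit should be checked.

```bash
# Hypothetical helper: report chain breaks between consecutive CA atoms.
find_gaps() {
  awk '
    /^ATOM  / && substr($0, 13, 4) == " CA " {
      ch = substr($0, 22, 1); rs = substr($0, 23, 4) + 0
      x = substr($0, 31, 8) + 0; y = substr($0, 39, 8) + 0; z = substr($0, 47, 8) + 0
      if (ch == pch && rs != prs) {             # same chain, new residue
        d = sqrt((x - px)^2 + (y - py)^2 + (z - pz)^2)
        if (rs > prs + 1 || d > 4.2)
          printf "chain %s: break between residues %d and %d (CA-CA %.2f A)\n", ch, prs, rs, d
      }
      pch = ch; prs = rs; px = x; py = y; pz = z
    }' "$1"
}
# Usage: find_gaps 05-chainA.pdb
```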

5. Managing Alternative Locations

Atoms or residues with alternative conformations carry an alternate-location indicator (column 17, e.g., A or B) and have occupancies below 1.00.

Keep the conformation with the highest occupancy and delete the others.

Automate this with: pdbtoorca occ
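The selection rule can be illustrated with a two-pass awk script. The hypothetical function keep_best_altloc assumes fixed-column PDB records: for each residue (chain, residue number, insertion code) it picks the alternate-location letter of the atom with the highest occupancy, drops the other conformers, and blanks column 17. Ties go to the first letter encountered, and coupled alternate conformations of neighbouring residues are not considered, so it is worth comparing the result with the output of pdbtoorca occ.

```bash
# Hypothetical helper: keep the highest-occupancy alternate location per residue.
keep_best_altloc() {
  awk '
    # Pass 1: per residue (columns 22-27), remember the altLoc of the
    # atom with the highest occupancy (columns 55-60).
    NR == FNR {
      if ($0 ~ /^(ATOM  |HETATM)/ && substr($0, 17, 1) != " ") {
        res = substr($0, 22, 6); occ = substr($0, 55, 6) + 0
        if (!(res in best) || occ > bestocc[res]) { best[res] = substr($0, 17, 1); bestocc[res] = occ }
      }
      next
    }
    # Pass 2: drop the other conformers and blank the altLoc column.
    /^(ATOM  |HETATM)/ && substr($0, 17, 1) != " " {
      if (substr($0, 17, 1) != best[substr($0, 22, 6)]) next
      $0 = substr($0, 1, 16) " " substr($0, 18)
    }
    { print }' "$1" "$1"
}
# Usage: keep_best_altloc 05-chainA.pdb > 06-no_altloc.pdb
```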

6. Checking for Short Contacts

Short contacts may arise from overlapping residues, water molecules, or alternate locations.

Identify them using: pdbtoorca shortcon

Remove or adjust problematic atoms accordingly.
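To illustrate what such a check does, the hypothetical function short_contacts computes all pairwise distances (an O(N²) loop that can take a while for large proteins) and lists atom pairs from different residues closer than a cutoff. The default of 2.2 Å is an assumption. Covalent neighbours in consecutive amino acids are skipped, but disulfide SG–SG pairs (about 2.05 Å) and metal–ligand contacts may still be listed and should be recognized as expected.

```bash
# Hypothetical helper: list atom pairs from different residues closer than a cutoff.
short_contacts() {
  awk -v cut="${2:-2.2}" '
    /^(ATOM  |HETATM)/ {
      n++
      x[n] = substr($0, 31, 8) + 0; y[n] = substr($0, 39, 8) + 0; z[n] = substr($0, 47, 8) + 0
      rec[n] = substr($0, 1, 6); ch[n] = substr($0, 22, 1); rs[n] = substr($0, 23, 4) + 0
      res[n] = substr($0, 18, 10)                  # resName + chain + resSeq + iCode
      lab[n] = substr($0, 13, 4) " " res[n]
    }
    END {
      c2 = cut * cut
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          if (res[i] == res[j]) continue           # same residue
          d = rs[i] - rs[j]; if (d < 0) d = -d
          # consecutive amino acids are joined by a peptide bond
          if (rec[i] == "ATOM  " && rec[j] == "ATOM  " && ch[i] == ch[j] && d == 1) continue
          d2 = (x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2
          if (d2 < c2) printf "%s -- %s : %.2f A\n", lab[i], lab[j], sqrt(d2)
        }
    }' "$1"
}
# Usage: short_contacts 05-chainA.pdb 2.2
```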

7. Assigning Protonation States

Assigning correct protonation states is essential for accurate electrostatics and catalytic modeling.

Key principles:

· Charged residues should generally be located on the protein surface to maintain solubility and realistic electrostatic distribution.

Buried charges should be carefully examined. They are acceptable only if:

o participating in metal coordination,

o forming an ionic pair, or

o acting as a catalytic residue (e.g., a nucleophile).

7.1 Calculate pKa Values Using PROPKA

Upload your PDB file to the PROPKA server to estimate residue pKa values and identify surface/buried charges.
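PROPKA 3 can also be run locally; for example, after pip install propka, the command propka3 05-chainA.pdb writes a 05-chainA.pka report. The hypothetical function propka_flags scans the SUMMARY OF THIS PREDICTION block of such a report, assuming each line lists residue name, residue number, chain, predicted pKa, and model pKa, and prints Lys, Arg, Asp, Glu, and His residues whose predicted pKa lies on the unusual side of 7 according to the rules in 7.2–7.4. It only flags candidates; each one still needs visual inspection.

```bash
# Hypothetical helper: flag residues with unusual predicted pKa values
# in the SUMMARY block of a PROPKA 3 .pka file (layout assumed, see lead-in).
propka_flags() {
  awk '
    /SUMMARY OF THIS PREDICTION/ { insum = 1; next }
    insum && /^-/ { insum = 0 }                     # dashed line ends the block
    insum && NF >= 5 && $4 ~ /^[0-9.]+$/ {
      name = $1; pka = $4 + 0
      if ((name == "LYS" || name == "ARG") && pka < 7) why = "low pKa: buried? check for an ion pair, else consider LYN"
      else if ((name == "ASP" || name == "GLU") && pka > 7) why = "high pKa: consider ASH/GLH"
      else if (name == "HIS" && pka > 7) why = "high pKa: consider HIP"
      else next
      printf "%s %s %s pKa=%.2f  %s\n", $1, $2, $3, pka, why
    }' "$1"
}
# Usage: propka_flags 05-chainA.pka
```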

7.2 Protonation States of Arginine (Arg) and Lysine (Lys)

Typically charged (pKa > 7).

· If pKa < 7, they may be buried.

o Check for ionic-pair partners (Asp or Glu) using RasMol; for example, restrict within(3.5, 26) displays only the atoms within 3.5 Å of residue 26.

· If no ionic partner is present, neutralize Lys → LYN.

(AMBER provides no default parameters for neutral Arg.)
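The RasMol check can be complemented by a quick distance scan. The hypothetical function ion_pairs lists Lys NZ and Arg NE/NH1/NH2 atoms that lie within a cutoff of Asp OD1/OD2 or Glu OE1/OE2 atoms. The default of 4.0 Å is an assumed, commonly used N–O criterion for salt bridges; pass a second argument (e.g., 3.5, to match the RasMol example) to change it.

```bash
# Hypothetical helper: candidate Lys/Arg - Asp/Glu ion pairs within a cutoff.
ion_pairs() {
  awk -v cut="${2:-4.0}" '
    /^ATOM  / {
      rn = substr($0, 18, 3); an = substr($0, 13, 4)
      pos = (rn == "LYS" && an == " NZ ") || (rn == "ARG" && (an == " NE " || an == " NH1" || an == " NH2"))
      neg = (rn == "ASP" && (an == " OD1" || an == " OD2")) || (rn == "GLU" && (an == " OE1" || an == " OE2"))
      if (!pos && !neg) next
      nm = an; gsub(/ /, "", nm)
      n++; isp[n] = pos
      lab[n] = rn "-" (substr($0, 23, 4) + 0) ":" substr($0, 22, 1) " " nm
      x[n] = substr($0, 31, 8) + 0; y[n] = substr($0, 39, 8) + 0; z[n] = substr($0, 47, 8) + 0
    }
    END {
      for (i = 1; i <= n; i++) {
        if (!isp[i]) continue                      # i: positive nitrogen
        for (j = 1; j <= n; j++) {
          if (isp[j]) continue                     # j: carboxylate oxygen
          d = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2)
          if (d <= cut) printf "%s -- %s : %.2f A\n", lab[i], lab[j], d
        }
      }
    }' "$1"
}
# Usage: ion_pairs 05-chainA.pdb 3.5
```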

7.3 Protonation States of Glutamate (Glu) and Aspartate (Asp)

Usually negatively charged (pKa < 7).

· If pKa > 7, they may be protonated (neutral):

o Asp → ASH

o Glu → GLH

· Always retain the charged form when:

o Coordinated to metal ions (e.g., Zn²⁺)

o Acting as a nucleophile in catalysis

o Example: Glu-374 in Aphid Myrosinase (nucleophile [1]) → GLU

o Example: Glu-167 (acid/base role) → GLH

⚠️ In AMBER, the GLH/ASH residue templates place the proton on OE2/OD2. To protonate OE1/OD1 instead, swap the coordinates of the two oxygens before running tleap.
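Swapping the coordinates of the two oxygens is equivalent to swapping their atom names, which is easy to script. The hypothetical function swap_carboxylate_o renames OD1↔OD2 and OE1↔OE2 for one residue selected by chain ID and residue number, leaving all coordinates untouched. It assumes fixed-column PDB records.

```bash
# Hypothetical helper: swap carboxylate oxygen names of one residue so that
# the proton of the AMBER ASH/GLH template ends up on the other oxygen.
swap_carboxylate_o() {
  awk -v ch="$2" -v num="$3" '
    /^(ATOM  |HETATM)/ && substr($0, 22, 1) == ch && substr($0, 23, 4) + 0 == num {
      an = substr($0, 13, 4)
      if      (an == " OD1") an = " OD2"
      else if (an == " OD2") an = " OD1"
      else if (an == " OE1") an = " OE2"
      else if (an == " OE2") an = " OE1"
      $0 = substr($0, 1, 12) an substr($0, 17)
    }
    { print }' "$1"
}
# Usage: swap_carboxylate_o 05-chainA.pdb A 167 > 06-swapped.pdb
```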

7.4 Protonation States of Histidine (His)

· pKa > 7: doubly protonated (HIP), particularly if the residue is solvent-exposed or buried near negatively charged residues such as Asp or Glu

· pKa < 7: Neutral (HID or HIE)

· For neutral His, examine hydrogen-bonding interactions to choose the tautomer:

o ND1 near a backbone carbonyl → HID

o NE2 near a backbone nitrogen → HID

o NE2 near a backbone carbonyl → HIE

o ND1 near a backbone nitrogen → HIE

· If coordinated to a metal ion (e.g., Zn²⁺ via NE2), protonate the other nitrogen:

o Example: His-54 in 1QIN [2] → HID

⚠️ If the protein carries too many negative charges, surface-exposed His residues can be assigned as HIP.
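The hydrogen-bond rules of this section can be screened automatically. The hypothetical function his_hbonds lists, for every residue named HIS, the backbone N and O atoms of other residues within 3.5 Å (an assumed cutoff) of ND1 or NE2, and applies the rules above: ND1 near O or NE2 near N suggests HID; NE2 near O or ND1 near N suggests HIE. Every backbone N is treated as an N–H donor, which is not true for proline, and side-chain partners and metal ions are ignored, so conflicting suggestions must be resolved by inspection.

```bash
# Hypothetical helper: suggest HID/HIE from backbone contacts of ND1/NE2.
his_hbonds() {
  awk -v cut="${2:-3.5}" '
    /^ATOM  / {
      an = substr($0, 13, 4); key = substr($0, 22, 6)
      lab = substr($0, 18, 3) "-" (substr($0, 23, 4) + 0) ":" substr($0, 22, 1)
      x = substr($0, 31, 8) + 0; y = substr($0, 39, 8) + 0; z = substr($0, 47, 8) + 0
      if (substr($0, 18, 3) == "HIS" && (an == " ND1" || an == " NE2")) {
        h++; hn[h] = an; hk[h] = key; hl[h] = lab; hx[h] = x; hy[h] = y; hz[h] = z
      } else if (an == " O  " || an == " N  ") {
        b++; bn[b] = an; bk[b] = key; bl[b] = lab; bx[b] = x; by[b] = y; bz[b] = z
      }
    }
    END {
      for (i = 1; i <= h; i++)
        for (j = 1; j <= b; j++) {
          if (bk[j] == hk[i]) continue             # skip the backbone of the same His
          d = sqrt((hx[i] - bx[j])^2 + (hy[i] - by[j])^2 + (hz[i] - bz[j])^2)
          if (d > cut) continue
          # ND1 near O, or NE2 near N -> HID; NE2 near O, or ND1 near N -> HIE
          t = ((hn[i] == " ND1") == (bn[j] == " O  ")) ? "HID" : "HIE"
          printf "%s %s near %s %s  %.2f A -> %s\n", hl[i], substr(hn[i], 2), bl[j], substr(bn[j], 2, 1), d, t
        }
    }' "$1"
}
# Usage: his_hbonds 05-chainA.pdb
```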

8. Protonation States of Cysteine (Cys)

· Default: CYS (protonated)

· Metal coordination → CYM

· Disulfide bonds → CYX (check the SSBOND records in the original file, 00-1WCG.pdb)
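Disulfide bridges can also be cross-checked against the geometry. The hypothetical function disulfides lists pairs of Cys SG atoms closer than 2.5 Å (an assumed cutoff; a typical S–S bond is about 2.05 Å), which can be compared with the SSBOND records. In tleap, CYX residues usually also need an explicit bond command between the two SG atoms.

```bash
# Hypothetical helper: list CYS/CYX SG-SG pairs closer than a cutoff.
disulfides() {
  awk -v cut="${2:-2.5}" '
    /^ATOM  / && substr($0, 18, 3) ~ /^CY[SX]$/ && substr($0, 13, 4) == " SG " {
      n++
      lab[n] = substr($0, 18, 3) "-" (substr($0, 23, 4) + 0) ":" substr($0, 22, 1)
      x[n] = substr($0, 31, 8) + 0; y[n] = substr($0, 39, 8) + 0; z[n] = substr($0, 47, 8) + 0
    }
    END {
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          d = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2)
          if (d <= cut) printf "%s -- %s : %.2f A\n", lab[i], lab[j], d
        }
    }' "$1"
}
# Usage: disulfides 05-chainA.pdb
```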

9. Finalizing the Structure

After assigning all protonation states:

· Modify residue names manually (e.g., ASP→ASH, GLU→GLH).

· Or automate histidine adjustments:

o pdbtoorca his

This finalized structure now represents the biologically relevant state for parameterization and solvation in the AMBER software.
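Manual renaming becomes error-prone when many residues change. The hypothetical function rename_residue sketches a scripted alternative: it rewrites the residue-name field (columns 18–20) of every ATOM, HETATM, and TER record of one residue, selected by chain ID and residue number. Residues that share the same number but differ in insertion code are renamed together.

```bash
# Hypothetical helper: rename one residue, e.g. GLU -> GLH, LYS -> LYN, CYS -> CYX.
rename_residue() {
  awk -v ch="$2" -v num="$3" -v new="$4" '
    /^(ATOM  |HETATM|TER)/ && substr($0, 22, 1) == ch && substr($0, 23, 4) + 0 == num {
      $0 = substr($0, 1, 17) sprintf("%3s", new) substr($0, 21)
    }
    { print }' "$1"
}
# Usage: rename_residue 05-chainA.pdb A 167 GLH > 06-protonated.pdb
```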

10. References

[1] S. Jafari, U. Ryde, M. Irani, QM/MM study of the catalytic reaction of aphid myrosinase, Int. J. Biol. Macromol. 262 (2024) 130089. https://doi.org/10.1016/j.ijbiomac.2024.130089.

[2] S. Jafari, N. Kazemi, U. Ryde, M. Irani, Higher Flexibility of Glu-172 Explains the Unusual Stereospecificity of Glyoxalase I, Inorg. Chem. 57 (2018) 4944–4958. https://doi.org/10.1021/acs.inorgchem.7b03215.