Department of Chemistry, Faculty of Science, University of Kurdistan

Analyzing AMBER files after MD simulations

On this page, I explain performing some analyses using the CPPTRAJ module [1] of the AMBER program. We need topology and trajectory files for most analyses. The tleap module of the program generates topology files. The trajectory file can be generated by MD simulations, which are done by the sander or pmemd modules of the AMBER program.

1. Cutting a few frames of an AMBER trajectory file

To cut the first 1000 frames of an AMBER trajectory file (input.crd) and write them in a new trajectory file (output.crd) you can execute the following command. In the following, prmtop in the name of the corresponding topology file. To cut other frames, only change ‘1-1000’ to your desired range of frames.

cpptraj <<EOF

parm prmtop

trajin input.crd

trajout output.crd onlyframes 1-1000

run

quit

EOF

2. RMSF

The following reads a topology (prmtop) and a trajectory (mdcrd5) file and generates the RMSF.out file containing RMSF values of each residue, neglecting the hydrogen atoms.

cpptraj <<EOF

parm prmtop

trajin mdcrd5

rms first

average crdset MyAvg

run

rms ref MyAvg

atomicfluct out RMSF.out !@H= byres

EOF

The following calculates the RMSF of CA atoms of each residue.

cpptraj <<EOF

parm prmtop

trajin mdcrd5

rms first

average crdset MyAvg

run

rms ref MyAvg

atomicfluct out RMSF-CA.out @CA byres

EOF

The following calculates the RMSF of the backbone atoms of each residue except the backbone hydrogens.

cpptraj <<EOF

parm prmtop

trajin mdcrd5

rms first

average crdset MyAvg

run

rms ref MyAvg

atomicfluct out RMSF-backbone.out @CA,C,O,N byres

EOF

3. B-factor

B-factor and RMSF are related by RMSF=((3/8π²)B-factor)^1/2. The following calculates B-factor values.

cpptraj <<EOF

parm prmtop

trajin mdcrd5

rms first

average crdset MyAvg

run

rms ref MyAvg

atomicfluct out B-factor-backbone.out @CA,C,O,N byres bfactor

EOF

4. RMSD

The following performs mass-weighted RMSD analysis and the starting structure as the reference.

# generate a PDB file (md0.pdb) from the topology (prmtop) and starting coordinate (prmcrd) files as the reference of the RMSD analysis.

ambpdb -p prmtop -c prmcrd >md0.pdb

# RMSD analysis bu cpptraj

cpptraj <<EOF

#read the topology file

parm prmtop

#read the trajectory file

trajin mdcrd5

# read the referece file

reference md0.pdb

#Protein

rms 0 :1-252&!@H= reference out Protein.agr mass

#Studied residues, Chain A

rms 1 :115&!@H= reference out 115-ASP-A.agr mass

#Studied residues, Chain B

rms 2 :241&!@H= reference out 115-ASP-B.agr mass

#Studied residues and all residues with at least one atom within 3.5 Å, Chain A

rms 3 :19-20,114-121&!@H= reference out Ne115-ASP-A.agr mass

#Studied residues and all residues with at least one atom within 3.5 Å, Chain B

rms 4 :145-146,240-247&!@H= reference out Ne115-ASP-B.agr mass

#Studied residues, Chain A

rms 5 :56&!@H= reference out 56-GLU-A.agr mass

#Studied residues, Chain B

rms 6 :182&!@H= reference out 56-GLU-B.agr mass

#Studied residues and all residues with at least one atom within 3.5 Å, Chain A

rms 7 :5-8,41,55-57,74&!@H= reference out Ne56-GLU-A.agr mass

#Studied residues and all residues with at least one atom within 3.5 Å, Chain B

rms 8 :131-134,167,181-183,200&!@H= reference out Ne56-GLU-B.agr mass

#Studied residues, Chain A

rms 9 :122&!@H= reference out 122-GLU-A.agr mass

#Studied residues, Chain B

rms 10 :248&!@H= reference out 122-GLU-B.agr mass

#Studied residues and all residues with at least one atom within 3.5 Å, Chain A

rms 11 :5,74-77,111-112,121-123&!@H= reference out Ne122-GLU-A.agr mass

#Studied residues and all residues with at least one atom within 3.5 Å, Chain B

rms 12 :131,200-203,237-238,247-249&!@H= reference out Ne122-GLU-B.agr mass

rms reference

run

writedata Protein.dat 0

writedata 115-ASP-A.dat 1

writedata 115-ASP-B.dat 2

writedata Ne115-ASP-A.dat 3

writedata Ne115-ASP-B.dat 4

writedata 56-GLU-A.dat 5

writedata 56-GLU-B.dat 6

writedata Ne56-GLU-A.dat 7

writedata Ne56-GLU-B.dat 8

writedata 122-GLU-A.dat 9

writedata 122-GLU-B.dat 10

writedata Ne122-GLU-A.dat 11

writedata Ne122-GLU-B.dat 12

quit

EOF

\rm *.agr

######################################################################

#remove the first line of the RMSD files

sed -i -e "1d" Protein.dat

sed -i -e "1d" 115-ASP-A.dat

sed -i -e "1d" 115-ASP-B.dat

sed -i -e "1d" Ne115-ASP-A.dat

sed -i -e "1d" Ne115-ASP-B.dat

sed -i -e "1d" 56-GLU-A.dat

sed -i -e "1d" 56-GLU-B.dat

sed -i -e "1d" Ne56-GLU-A.dat

sed -i -e "1d" Ne56-GLU-B.dat

sed -i -e "1d" 122-GLU-A.dat

sed -i -e "1d" 122-GLU-B.dat

sed -i -e "1d" Ne122-GLU-A.dat

sed -i -e "1d" Ne122-GLU-B.dat

#cut the RMSD column of each file

cut -c13-21 Protein.dat > 0

cut -c13-21 115-ASP-A.dat > 1

cut -c13-21 115-ASP-B.dat > 2

cut -c13-21 Ne115-ASP-A.dat > 3

cut -c13-21 Ne115-ASP-B.dat > 4

cut -c13-21 56-GLU-A.dat > 5

cut -c13-21 56-GLU-B.dat > 6

cut -c13-21 Ne56-GLU-A.dat > 7

cut -c13-21 Ne56-GLU-B.dat > 8

cut -c13-21 122-GLU-A.dat > 9

cut -c13-21 122-GLU-B.dat > 10

cut -c13-21 Ne122-GLU-A.dat > 11

cut -c13-21 Ne122-GLU-B.dat > 12

#paste the columns in a single file

echo 'Frame GLN-39 GLN-39-backbone GLN-39-side HID-141 HID-141-backbone HID-141-side ASN-186 ASN-186-backbone ASN-186-side GLN-187 GLN-187-backbone GLN-187-side ARG-259 ARG-259-backbone ARG-259-side TYR-330 TYR-330-backbone TYR-330-side GLU-409 GLU-409-backbone GLU-409-side GLU-464 GLU-464-backbone GLU-464-side CGT-503 Protein Protein-backbone Protein-side' >Result

paste 1 2 3 4 5 6 7 8 9 10 11 12 0 > Result

\rm 1 2 3 4 5 6 7 8 9 10 11 12 0 *.dat

5. Radius of gyration

A protein's radius of gyration (Rg) describes its general size. It indicates global changes in the protein's shape. To perform Rg analysis, first, create a PDB file from the initial structure as the reference.

ambpdb -p prmtop -c prmcrd > md0.pdb

Then execute something like the following.

cpptraj <<EOF

parm prmtop

trajin mdcrd5

reference md0.pdb

# calculate only the mass-weighted radius of gyration (not the maximum) of the non-hydrogen atoms for all heavy atoms of residues 1-1181

radgyr :1-1181&!@H= out RoG-complex.dat mass nomax

run

quit

EOF

6. Extract specific snapshots from a trajectory file and save them as individual PDB files

cpptraj.MPI <<EOF

parm prmtop

trajin mdcrd.last400ns

#trajout mdcrd.last400ns-wrap.00001 onlyframes 1 restart

trajout mdcrd.last400ns-wrap.04000 onlyframes 4000 restart

trajout mdcrd.last400ns-wrap.08000 onlyframes 8000 restart

trajout mdcrd.last400ns-wrap.12000 onlyframes 12000 restart

trajout mdcrd.last400ns-wrap.16000 onlyframes 16000 restart

trajout mdcrd.last400ns-wrap.20000 onlyframes 20000 restart

trajout mdcrd.last400ns-wrap.24000 onlyframes 24000 restart

trajout mdcrd.last400ns-wrap.28000 onlyframes 28000 restart

trajout mdcrd.last400ns-wrap.32000 onlyframes 32000 restart

trajout mdcrd.last400ns-wrap.36000 onlyframes 36000 restart

trajout mdcrd.last400ns-wrap.40000 onlyframes 40000 restart

# determine the central residue of your protein

image origin center familiar com :27

quit

EOF

# convert the created trajectro files to PDBs

ambpdb -p prmtop -c mdcrd.last400ns-wrap.04000 > 04000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.08000 > 08000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.12000 > 12000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.16000 > 16000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.20000 > 20000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.24000 > 24000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.28000 > 28000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.32000 > 32000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.36000 > 36000.pdb

ambpdb -p prmtop -c mdcrd.last400ns-wrap.40000 > 40000.pdb

7. Hydrogen bonds

To analyze hydrogen bonds from a trajectory file, you can use the CPPTRAJ module, which can find and track hydrogen bonds throughout a trajectory. Here we focus on the analysis of intra-molecular hydrogen bonds. For a thorough tutorial visit the following page.

https://amberhub.chpc.utah.edu/hydrogen-bond-analysis-within-a-protein/

Here we analyze hydrogen bonds between a ligand and protein residues. The residue ID and residue number of our ligand are PIO and 263, respectively. We will use the hbond command and will use a geometric consideration (distance of 3.0 Å and an angle cutoff of 135°) to consider whether a hydrogen bond is formed.

cpptraj <<EOF > anal.log

parm prmtop

trajin mdcrd5 1 last 1

# Analyse hdrogen bonds between residues 1-263

hbond contacts :1-263 avgout avg.dat series uuseries hbond.gnu nointramol dist 3.0 angle 135.0

lifetime contacts[solutehb] out contacts-lifetime.dat

quit

EOF

Use grep to extract the hydrogen bonds involving your desired residue from the output file.

grep "PIO" avg.dat > PIO.dat

8. Principal Component Analysis

Principal Component Analysis (PCA) is a widely utilized technique for analyzing conformational dynamics of proteins. It offers valuable insight into the essential motions and structural variations present in protein systems. By reducing the dimensionality of complex MD trajectories, PCA enables the identification of the most significant collective motions that govern the conformational landscape of a protein [2]. The following are example scripts that we use in our works to calculate histograms and projection of PC1 on PC2

8.1 PC Projection

cpptraj.cuda <<EOF

parm prmtop

trajin mdcrd5

rms first :1-274&!@H=

average crdset AVG

run

rms ref AVG :1-274&!@H=

matrix covar name MyMatrix :1-274&!@H=

createcrd CRD1

run

runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs

crdaction CRD1 projection evecs MyEvecs :1-274&!@H= out project.dat beg 1 end 2

quit

EOF

8.2 Histogram

cpptraj.MPI -p prmtop -i cPCA_FEL.ptraj

Contents of the cPCA_FEL.ptraj file:

#read trajectory

trajin mdcrd5

#get coords into memory

createcrd ALIGNED

run

#create avg structure

crdaction ALIGNED average avg.pdb

#align

parm avg.pdb

reference avg.pdb parm avg.pdb

crdaction ALIGNED rms reference :1-274

#covariance matrix

crdaction ALIGNED matrix covar name COVAR :1-274

#write eigenvectors

runanalysis diagmatrix COVAR out covar.dat vecs 30 name EVECTORS

#project to trajectory

crdaction ALIGNED projection PROJ modes covar.dat beg 1 end 3 out projection.dat :1-274

#1D histograms (open with xmgrace)

hist PROJ:1 free 310 bins 50 out hists.agr name PC1

hist PROJ:2 free 310 bins 50 out hists.agr name PC2

hist PROJ:3 free 310 bins 50 out hists.agr name PC3

#2D histograms (open with gnuplot)

hist PROJ:1 PROJ:2 bins 50 out hists_1-2.gnu name PC12 free 300

hist PROJ:1 PROJ:2 bins 50 out hists_1-3.gnu name PC13 free 300

hist PROJ:1 PROJ:2 bins 50 out hists_2-3.gnu name PC23 free 300

9. References

[1] D.R. Roe, T.E. Cheatham, PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data, J. Chem. Theory Comput. 9 (2013) 3084–3095. https://doi.org/10.1021/ct400341p.

[2] C.C. David, D.J. Jacobs, Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins, in: D.R. Livesay (Ed.), Humana Press, Totowa, NJ, 2014: pp. 193–226. https://doi.org/10.1007/978-1-62703-658-0_11.