Department of Chemistry, Faculty of Science, University of Kurdistan |
Analyzing AMBER files after MD simulations On this page, I explain performing some analyses using the CPPTRAJ module [1] of the AMBER program. We need topology and trajectory files for most analyses. The tleap module of the program generates topology files. The trajectory file can be generated by MD simulations, which are done by the sander or pmemd modules of the AMBER program. 1. Cutting a few frames of an AMBER trajectory fileTo cut the first 1000 frames of an AMBER trajectory file (input.crd) and write them in a new trajectory file (output.crd) you can execute the following command. In the following, prmtop in the name of the corresponding topology file. To cut other frames, only change ‘1-1000’ to your desired range of frames. cpptraj <<EOF parm prmtop trajin input.crd trajout output.crd onlyframes 1-1000 run quit EOF
2. RMSFThe following reads a topology (prmtop) and a trajectory (mdcrd5) file and generates the RMSF.out file containing RMSF values of each residue, neglecting the hydrogen atoms.
cpptraj <<EOF parm prmtop trajin mdcrd5 rms first average crdset MyAvg run rms ref MyAvg atomicfluct out RMSF.out !@H= byres EOF
The following calculates the RMSF of CA atoms of each residue. cpptraj <<EOF parm prmtop trajin mdcrd5 rms first average crdset MyAvg run rms ref MyAvg atomicfluct out RMSF-CA.out @CA byres EOF
The following calculates the RMSF of the backbone atoms of each residue except the backbone hydrogens.
cpptraj <<EOF parm prmtop trajin mdcrd5 rms first average crdset MyAvg run rms ref MyAvg atomicfluct out RMSF-backbone.out @CA,C,O,N byres EOF
3. B-factorB-factor and RMSF are related by RMSF=((3/8π2)B-factor)1/2. The following calculates B-factor values. cpptraj <<EOF parm prmtop trajin mdcrd5 rms first average crdset MyAvg run rms ref MyAvg atomicfluct out B-factor-backbone.out @CA,C,O,N byres bfactor EOF
4. RMSDThe following performs mass-weighted RMSD analysis and the starting structure as the reference.
# generate a PDB file (md0.pdb) from the topology (prmtop) and starting coordinate (prmcrd) files as the reference of the RMSD analysis. ambpdb -p prmtop -c prmcrd >md0.pdb # RMSD analysis bu cpptraj cpptraj <<EOF #read the topology file parm prmtop #read the trajectory file trajin mdcrd5 # read the referece file reference md0.pdb #Protein rms 0 :1-252&!@H= reference out Protein.agr mass #Studied residues, Chain A rms 1 :115&!@H= reference out 115-ASP-A.agr mass #Studied residues, Chain B rms 2 :241&!@H= reference out 115-ASP-B.agr mass #Studied residues and all residues with at least one atom within 3.5 Å, Chain A rms 3 :19-20,114-121&!@H= reference out Ne115-ASP-A.agr mass # #Studied residues and all residues with at least one atom within 3.5 Å, Chain B rms 4 :145-146,240-247&!@H= reference out Ne115-ASP-B.agr mass # #Studied residues, Chain A rms 5 :56&!@H= reference out 56-GLU-A.agr mass #Studied residues, Chain B rms 6 :182&!@H= reference out 56-GLU-B.agr mass #Studied residues and all residues with at least one atom within 3.5 Å, Chain A rms 7 :5-8,41,55-57,74&!@H= reference out Ne56-GLU-A.agr mass # #Studied residues and all residues with at least one atom within 3.5 Å, Chain B rms 8 :131-134,167,181-183,200&!@H= reference out Ne56-GLU-B.agr mass # #Studied residues, Chain A rms 9 :122&!@H= reference out 122-GLU-A.agr mass #Studied residues, Chain B rms 10 :248&!@H= reference out 122-GLU-B.agr mass #Studied residues and all residues with at least one atom within 3.5 Å, Chain A rms 11 :5,74-77,111-112,121-123&!@H= reference out Ne122-GLU-A.agr mass # #Studied residues and all residues with at least one atom within 3.5 Å, Chain B rms 12 :131,200-203,237-238,247-249&!@H= reference out Ne122-GLU-B.agr mass # rms reference run writedata Protein.dat 0 writedata 115-ASP-A.dat 1 # writedata 115-ASP-B.dat 2 # writedata Ne115-ASP-A.dat 3 # writedata Ne115-ASP-B.dat 4 # writedata 56-GLU-A.dat 5 writedata 56-GLU-B.dat 6 writedata Ne56-GLU-A.dat 7 writedata Ne56-GLU-B.dat 8 writedata 122-GLU-A.dat 9 writedata 122-GLU-B.dat 10 writedata Ne122-GLU-A.dat 11 writedata Ne122-GLU-B.dat 12 quit EOF \rm *.agr ###################################################################### #remove the first line of the RMSD files sed -i -e "1d" Protein.dat sed -i -e "1d" 115-ASP-A.dat sed -i -e "1d" 115-ASP-B.dat sed -i -e "1d" Ne115-ASP-A.dat sed -i -e "1d" Ne115-ASP-B.dat sed -i -e "1d" 56-GLU-A.dat sed -i -e "1d" 56-GLU-B.dat sed -i -e "1d" Ne56-GLU-A.dat sed -i -e "1d" Ne56-GLU-B.dat sed -i -e "1d" 122-GLU-A.dat sed -i -e "1d" 122-GLU-B.dat sed -i -e "1d" Ne122-GLU-A.dat sed -i -e "1d" Ne122-GLU-B.dat #cut the RMSD column of each file cut -c13-21 Protein.dat > 0 cut -c13-21 115-ASP-A.dat > 1 cut -c13-21 115-ASP-B.dat > 2 cut -c13-21 Ne115-ASP-A.dat > 3 cut -c13-21 Ne115-ASP-B.dat > 4 cut -c13-21 56-GLU-A.dat > 5 cut -c13-21 56-GLU-B.dat > 6 cut -c13-21 Ne56-GLU-A.dat > 7 cut -c13-21 Ne56-GLU-B.dat > 8 cut -c13-21 122-GLU-A.dat > 9 cut -c13-21 122-GLU-B.dat > 10 cut -c13-21 Ne122-GLU-A.dat > 11 cut -c13-21 Ne122-GLU-B.dat > 12
#paste the columns in a single file
echo 'Frame GLN-39 GLN-39-backbone GLN-39-side HID-141 HID-141-backbone HID-141-side ASN-186 ASN-186-backbone ASN-186-side GLN-187 GLN-187-backbone GLN-187-side ARG-259 ARG-259-backbone ARG-259-side TYR-330 TYR-330-backbone TYR-330-side GLU-409 GLU-409-backbone GLU-409-side GLU-464 GLU-464-backbone GLU-464-side CGT-503 Protein Protein-backbone Protein-side' >Result paste 1 2 3 4 5 6 7 8 9 10 11 12 0 > Result
\rm 1 2 3 4 5 6 7 8 9 10 11 12 0 *.dat
5. Radius of gyrationA protein's radius of gyration (Rg) describes its general size. It indicates global changes in the protein's shape. To perform Rg analysis, first, create a PDB file from the initial structure as the reference. ambpdb -p prmtop -c prmcrd > md0.pdb
Then execute something like the following. cpptraj <<EOF parm prmtop trajin mdcrd5 reference md0.pdb # calculate only the mass-weighted radius of gyration (not the maximum) of the non-hydrogen atoms for all heavy atoms of residues 1-1181 radgyr :1-1181&!@H= out RoG-complex.dat mass nomax run quit EOF
6. Extract specific snapshots from a trajectory file and save them as individual PDB files
cpptraj.MPI <<EOF parm prmtop trajin mdcrd.last400ns #trajout mdcrd.last400ns-wrap.00001 onlyframes 1 restart trajout mdcrd.last400ns-wrap.04000 onlyframes 4000 restart trajout mdcrd.last400ns-wrap.08000 onlyframes 8000 restart trajout mdcrd.last400ns-wrap.12000 onlyframes 12000 restart trajout mdcrd.last400ns-wrap.16000 onlyframes 16000 restart trajout mdcrd.last400ns-wrap.20000 onlyframes 20000 restart trajout mdcrd.last400ns-wrap.24000 onlyframes 24000 restart trajout mdcrd.last400ns-wrap.28000 onlyframes 28000 restart trajout mdcrd.last400ns-wrap.32000 onlyframes 32000 restart trajout mdcrd.last400ns-wrap.36000 onlyframes 36000 restart trajout mdcrd.last400ns-wrap.40000 onlyframes 40000 restart # determine the central residue of your protein image origin center familiar com :27 go quit EOF
# convert the created trajectro files to PDBs ambpdb -p prmtop -c mdcrd.last400ns-wrap.04000 > 04000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.08000 > 08000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.12000 > 12000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.16000 > 16000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.20000 > 20000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.24000 > 24000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.28000 > 28000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.32000 > 32000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.36000 > 36000.pdb ambpdb -p prmtop -c mdcrd.last400ns-wrap.40000 > 40000.pdb
7. Hydrogen bondsTo analyze hydrogen bonds from a trajectory file, you can use the CPPTRAJ module, which can find and track hydrogen bonds throughout a trajectory. Here we focus on the analysis of intra-molecular hydrogen bonds. For a thorough tutorial visit the following page. https://amberhub.chpc.utah.edu/hydrogen-bond-analysis-within-a-protein/ Here we analyze hydrogen bonds between a ligand and protein residues. The residue ID and residue number of our ligand are PIO and 263, respectively. We will use the hbond command and will use a geometric consideration (distance of 3.0 Å and an angle cutoff of 135°) to consider whether a hydrogen bond is formed.
cpptraj <<EOF > anal.log parm prmtop trajin mdcrd5 1 last 1 # Analyse hdrogen bonds between residues 1-263 hbond contacts :1-263 avgout avg.dat series uuseries hbond.gnu nointramol dist 3.0 angle 135.0 go lifetime contacts[solutehb] out contacts-lifetime.dat go quit EOF
Use grep to extract the hydrogen bonds involving your desired residue from the output file. grep "PIO" avg.dat > PIO.dat
8. Principal Component AnalysisPrincipal Component Analysis (PCA) is a widely utilized technique for analyzing conformational dynamics of proteins. It offers valuable insight into the essential motions and structural variations present in protein systems. By reducing the dimensionality of complex MD trajectories, PCA enables the identification of the most significant collective motions that govern the conformational landscape of a protein [2]. The following are example scripts that we use in our works to calculate histograms and projection of PC1 on PC2 8.1 PC Projection
cpptraj.cuda <<EOF parm prmtop trajin mdcrd5 rms first :1-274&!@H= average crdset AVG run rms ref AVG :1-274&!@H= matrix covar name MyMatrix :1-274&!@H= createcrd CRD1 run runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs crdaction CRD1 projection evecs MyEvecs :1-274&!@H= out project.dat beg 1 end 2 quit EOF
8.2 Histogram
cpptraj.MPI -p prmtop -i cPCA_FEL.ptraj
Contents of the cPCA_FEL.ptraj file: #read trajectory trajin mdcrd5 #get coords into memory createcrd ALIGNED run #create avg structure crdaction ALIGNED average avg.pdb #align parm avg.pdb reference avg.pdb parm avg.pdb crdaction ALIGNED rms reference :1-274 #covariance matrix crdaction ALIGNED matrix covar name COVAR :1-274 #write eigenvectors runanalysis diagmatrix COVAR out covar.dat vecs 30 name EVECTORS #project to trajectory crdaction ALIGNED projection PROJ modes covar.dat beg 1 end 3 out projection.dat :1-274
#1D histograms (open with xmgrace) hist PROJ:1 free 310 bins 50 out hists.agr name PC1 hist PROJ:2 free 310 bins 50 out hists.agr name PC2 hist PROJ:3 free 310 bins 50 out hists.agr name PC3 #2D histograms (open with gnuplot) hist PROJ:1 PROJ:2 bins 50 out hists_1-2.gnu name PC12 free 300 hist PROJ:1 PROJ:2 bins 50 out hists_1-3.gnu name PC13 free 300 hist PROJ:1 PROJ:2 bins 50 out hists_2-3.gnu name PC23 free 300
9. References[1] D.R. Roe, T.E. Cheatham, PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data, J. Chem. Theory Comput. 9 (2013) 3084–3095. https://doi.org/10.1021/ct400341p.
[2] C.C. David, D.J. Jacobs, Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins, in: D.R. Livesay (Ed.), Humana Press, Totowa, NJ, 2014: pp. 193–226. https://doi.org/10.1007/978-1-62703-658-0_11.
|
Mehdi Irani Teaching duties Methods |