VMD+and+PDB+Editing

=Using VMD and AWK to edit and reorder PDB files=


 * Problem:** You've generated or obtained a PDB of a system containing many copies of a solute particle -- now you're interested in changing some of these solute particles into others. Making matters worse, these molecules are made of 17, 20, and 23 (this is the one in your PDB now) atoms. We'll call these systems A, B, and C respectively. I want to generate a TCL script which in conjunction with a shell script will read in a fraction of A, B, and C and then randomly reassign some C molecules into the desired fraction of A and B. We're lucky that our current PDB file is populated by the biggest of these molecules, else this process is made significantly more difficult! Let's see one path to resolve this issue (NOTE: may not be the best, most efficient, etc. -- but it works).

Let's begin by glancing over the TCL script I managed to produce (I am by no means an expert with TCL). For this example, X is an atom common to all molecules C. The variable i is currently assigned a value XXYYZZ which is sed in using the shell script! In my expression to set the value of e, the +687 refers to the number of solvent and counterions before residue with resname C begin. Shift this value correspondingly. The if-then test here checks if the string has been assigned to molecule B yet (there is a separate and similar file for swapping in A molecules). In setting the variable k, I don't think this works as intended, but the X2* here refers to an atom name (e.g., O24 or Br25). Renaming the atoms I wanted to exclude using the variable k to DUD and then excluding all atoms with name DUD at the end does the trick though! I remove atoms and rename the tail to be identical to that of molecule C (because it is made of the same atoms and it makes for a nicer looking .itp file for Gromacs). Variable d computes a rounded integer to decide the number of swaps and is placed as the target value in the while loop -- so this script iterates until d unique structures have been mutated from C to B. As I said above, a similar TCL script is used to mutate C to A (you just hack off more atoms!) and using the C->B and C->A scripts you can create 2-body and 3-body composition coordinates. Of course, once you print this PDB file and look at it -- oops! the structures are all out of order and Gromacs won't even begin to fiddle with it. So let's take a look at a renumbering script using AWK. code format="tcl" set a [atomselect top "resname C and type X"] set b [$a get {resid}] set c [llength $b] set i XXYYZZ set d [expr {round($c*$i)}] set j 0 while {$j < $d} { set e [expr {round(rand*$c)+687}] set f [atomselect top "resid $e"] set g [$f get {resname}] if {$g=="C C C C C C C C C C C C C C C C C C C C C C C"} then { $f set resname "B B B B B B B B B B B B B B B B B B B B B B B" set k [atomselect top "all not (resid $e and (name X24 or (name X25 or (name X26))))"] set r12 [atomselect top "resid $e and name X12"] set r13 [atomselect top "resid $e and name X13"] set r14 [atomselect top "resid $e and name X14"] set r15 [atomselect top "resid $e and name X15"] set r16 [atomselect top "resid $e and name X16"] set r17 [atomselect top "resid $e and name X17"] set r18 [atomselect top "resid $e and name X18"] set r19 [atomselect top "resid $e and name X19"] set r20 [atomselect top "resid $e and name X20"] set r21 [atomselect top "resid $e and name X21"] set r22 [atomselect top "resid $e and name X22"] set r23 [atomselect top "resid $e and name X23"] set r24 [atomselect top "resid $e and name X24"] set r25 [atomselect top "resid $e and name X25"] set r26 [atomselect top "resid $e and name X26"] $r12 set name "X15" $r13 set name "X16" $r14 set name "X17" $r15 set name "X18" $r16 set name "X19" $r17 set name "X20" $r18 set name "X21" $r19 set name "X22" $r20 set name "X23" $r21 set name "X24" $r22 set name "X25" $r23 set name "X26" $r24 set name "DUD" $r25 set name "DUD" $r26 set name "DUD" incr j } } set name [format "B.pdb"] set sel [atomselect top "all not name DUD"] $sel writepdb $name

exit;

code To renumber the PDB files, I use a simple loop in a shell script to pull out molecules B and A to separate text files, then append them to a new temporary PDB which lacks an END card (this is appended by the AWK script). The AWK script does not insert the header, so I sed it in at the end. Run this and Gromacs will adore you for your ability to satisfy its silly requirements! code format="bash" for i in *.pdb; do x=${i%.pdb}; file=`echo ${x##*/}` header=`head -n1 $file.pdb` egrep -v "B|A" $file.pdb | sed '/END/d' > all.txt egrep "B|A" $file.pdb > tmp.txt; egrep "B" tmp.txt > B.txt; egrep "A" tmp.txt > A.txt; rm tmp.txt cat all.txt B.txt A.txt > $file.renumb; rm all.txt B.txt A.txt awk -f renumberPDB.awk $file.renumb | sed "1i\ $header" > $file.pdb # note, this overwrites (I know it works for me, you should check yours first)! rm $file.renumb done code code format="bash"
 * 1) File to renumber (atom number and residue number) a PDB file
 * 2) having only ONE chain (a single TER atom entry) to make the numbers continuous
 * 3) Usage:
 * 4)  awk -f
 * 5) Created by:
 * 6)  Sandeep Somani
 * 7) Last updated:
 * 8)   01 Sept, 2004 -- added the END section   <-- Guess who didn't write this code but totally scavenged it!
 * 1) Last updated:
 * 2)   01 Sept, 2004 -- added the END section   <-- Guess who didn't write this code but totally scavenged it!
 * 1)   01 Sept, 2004 -- added the END section   <-- Guess who didn't write this code but totally scavenged it!

BEGIN { resid=0 i=0 }


 * 1) print the headers
 * 2) !/^ATOM/ && !/^TER/ {
 * 3) print $0
 * }

/^ATOM/ || /^END/{ i=i+1 if (i==1) {      resid = resid + 1 resid_old = substr($0,23,4) }    else {      resid_curr =  substr($0,23,4) if (resid_old != resid_curr) {         resid = resid + 1 resid_old = resid_curr }    }
 * 1) process the atoms

first = substr($0,1,6) second= substr($0,12,11) last = substr($0,27,54)

printf("%s%5d%s%4d%s\n",first,atomno+i,second,resid,last) }

END { printf("END\n") }

code

TPP 06/10/2013