bioshell.bioinformatics cookbook

This page provides plenty of handy one-liners that solve common bioinformatics problems, such as crmsd calculation, sequence and structure handling, basic calculations and many others. All these commands use only BioShell programs, so Jython is not required.
Data conversion - sequence related

• To extract a FASTA sequence from a PDB file (result will be printed on the screen):
java apps.Seqc -ip=2gb1.pdb -of
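For orientation, the conversion behind this recipe amounts to reading the residue name of each C-alpha record and mapping it to a one-letter code. The following is a minimal sketch in shell and awk, not BioShell's implementation: the helper name pdb_ca_to_seq and the toy ATOM records are made up for illustration, only the 20 standard residues are mapped (anything else becomes X), and no FASTA header line is printed.

```shell
# Hypothetical helper (not a BioShell program): read PDB text on stdin and
# print the one-letter sequence of residues that have a C-alpha ATOM record.
pdb_ca_to_seq() {
  awk '
    BEGIN {
      codes = "ALA A ARG R ASN N ASP D CYS C GLN Q GLU E GLY G HIS H ILE I"
      codes = codes " LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V"
      n = split(codes, t, " ")
      for (i = 1; i < n; i += 2) one[t[i]] = t[i + 1]
    }
    # PDB is a fixed-width format: record name in columns 1-6,
    # atom name in columns 13-16, residue name in columns 18-20.
    substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " {
      r = substr($0, 18, 3)
      seq = seq ((r in one) ? one[r] : "X")
    }
    END { print seq }'
}

# Made-up records, generated with printf so the fixed columns line up.
atom() { printf 'ATOM  %5d %-4s %3s %s%4d\n' "$1" "$2" "$3" "$4" "$5"; }
{
  atom 1  ' N  ' MET A 1
  atom 2  ' CA ' MET A 1
  atom 10 ' CA ' THR A 2
  atom 17 ' CA ' TYR A 3
} | pdb_ca_to_seq
```

Matching on fixed columns rather than whitespace-separated fields matters because PDB fields can run together without spaces in large structures.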
Data conversion - secondary structure

• To convert a DSSP file into a FASTA-style string that contains the secondary structure in HEC code:
java apps.Seqc -id=2gb1.dssp -of -seqc.show_ss
• To convert a DSSP file into SS2 file format:
java apps.Seqc -in.dssp=2gb1.dssp -out.ss2=2gb1.dssp.ss2
The SS2 file format consists of three columns that provide the probability of each secondary structure type (H, E or C) at each sequence position. In the case of DSSP the secondary structure is defined by the structure itself, so the probability values are either 1.0 or 0.0. The long option -in.dssp can be abbreviated as -id.
• To count residues by secondary structure: H, E or C (Helix, Extended or Coil):
java apps.Seqc -id=2gb1.dssp -of -seqc.show_ss | tail -1 | fold -w 1 | sort | uniq -c
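The 1.0/0.0 probabilities described above for DSSP-derived SS2 files amount to a one-hot encoding of the HEC string. A minimal sketch of that idea follows; the helper name hec_to_onehot is hypothetical, and the column order used here (position, code, then the values for H, E and C) is illustrative only and may differ from the layout that Seqc actually writes.

```shell
# Hypothetical helper (not part of BioShell): turn a HEC string into one row
# per position with one-hot "probabilities" for H, E and C.
# The column order is illustrative, not the real SS2 layout.
hec_to_onehot() {
  printf '%s\n' "$1" | awk '{
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      # Each comparison evaluates to 1 or 0, printed as 1.0 or 0.0.
      printf "%d %s %.1f %.1f %.1f\n", i, c, (c == "H"), (c == "E"), (c == "C")
    }
  }'
}

hec_to_onehot "CEEHHC"
```

Exactly one of the three values is 1.0 in every row, which is what distinguishes a DSSP-derived file from a predictor's output with fractional probabilities.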
Superpositions and other crmsd-related calculations

Recipes that compare two structures, superimpose them, and compute distances and scores.
• To calculate the crmsd distance between two structures having the same number of residues; the value will be computed on alpha-carbons only:
java apps.RmsCalc -qp=model.pdb -tp=2gb1.pdb -rms
• To calculate crmsd, drmsd, GDT and LCS distances (or scores) between two protein structures having the same number of residues; the values will be computed on alpha-carbons only:
java apps.RmsCalc -qp=model.pdb -tp=2gb1.pdb
• To superimpose one protein structure (-qp) on the other (-tp); the transformation (rotation + translation) is derived from the alpha-carbons only, but it is applied to all atoms of the query structure:
java apps.RmsCalc -qp=model.pdb -tp=2gb1.pdb -op
• To calculate the crmsd distance between models in a subdirectory and a reference structure; when the file mask is omitted, all the files from the subdirectory are used for the calculations. Omit the -rms flag to compute crmsd, drmsd, GDT, TMscore and LCS rather than just crmsd:
java apps.RmsCalc -align.query.pdbdir=./models/ -in.pdb.file_mask=2*.pdb -tp=2gb1.pdb -rms
• To calculate pairwise crmsd between all models in a directory; note that in this case no reference structure is provided:
java apps.RmsCalc -align.query.pdbdir=./models/ -calc.crmsd.all_pairs
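As a note on what the drmsd value reported alongside crmsd measures: it compares the two structures' internal distance matrices, so no superposition is involved. The sketch below assumes the common definition (root mean square difference over all i<j pairs of atom-atom distances); the normalization used by RmsCalc is not stated on this page and may differ. The helper name drmsd and the plain "x y z" input files are hypothetical.

```shell
# Hypothetical helper (not a BioShell program): distance RMSD between two
# coordinate sets, each given as a text file with "x y z" on every line.
# Assumes the common definition; RmsCalc may normalize differently.
drmsd() {
  awk '
    # First file: store coordinates of structure A.
    NR == FNR { ax[FNR] = $1; ay[FNR] = $2; az[FNR] = $3; n = FNR; next }
    # Second file: store coordinates of structure B.
              { bx[FNR] = $1; by[FNR] = $2; bz[FNR] = $3; m = FNR }
    END {
      if (n != m || n < 2) { print "error: need two sets of equal size"; exit 1 }
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          da = sqrt((ax[i]-ax[j])^2 + (ay[i]-ay[j])^2 + (az[i]-az[j])^2)
          db = sqrt((bx[i]-bx[j])^2 + (by[i]-by[j])^2 + (bz[i]-bz[j])^2)
          sum += (da - db)^2
          pairs++
        }
      printf "%.3f\n", sqrt(sum / pairs)
    }' "$1" "$2"
}

# Made-up coordinates: the second set is the first one scaled by two.
tmp=$(mktemp -d)
printf '0 0 0\n3 0 0\n0 4 0\n' > "$tmp/a.xyz"
printf '0 0 0\n6 0 0\n0 8 0\n' > "$tmp/b.xyz"
drmsd "$tmp/a.xyz" "$tmp/b.xyz"
rm -r "$tmp"
```

Because only internal distances enter the sum, rotating or translating either input leaves the result unchanged, which is the practical difference from crmsd.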
Computing structural properties of proteins: Phi, Psi and chi angles, distances, etc

• To compute Phi, Psi dihedrals (results are in radians by default):
java apps.StrCalc -ip=2gb1.pdb -strcalc.phi_psi
• To compute Phi, Psi dihedrals with results in degrees:
java apps.StrCalc -ip=2gb1.pdb -strcalc.phi_psi -strcalc.use_degrees
• Create data points for a Ramachandran map for Lysine (and similarly, for any other amino acid); the command assumes that there is a subdirectory called ./pdb/, full of input PDB files:
java apps.StrCalc -in.pdb.dir=./pdb/ -strcalc.phi_psi -strcalc.use_degrees | grep LYS | awk '{print $4,$5}'

Computing structural properties of proteins: distance and contact maps

• Create a distance map based on Cα atoms for a given protein structure. The last column of the output provides the distance in Angstroms between any two Cα atoms in the provided structure:
java apps.StrCalc -ip=2gb1.pdb -strcalc.strcalc.distmap.ca
• Create a distance map based on all heavy atoms for a given protein structure. The last column of the output provides the minimum distance between two given residues (in Angstroms):
java apps.StrCalc -ip=2gb1.pdb -strcalc.strcalc.distmap.minres
• Create a contact map for a given protein structure, where a contact is defined by any two heavy atoms closer than 4.5 Å to each other:
java apps.StrCalc -ip=2gb1.pdb -strcalc.strcalc.distmap.minres | awk '{if($3<4.5) print $0}'
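The contact-map recipe above boils down to thresholding the distance column of a distance map. Here is a minimal sketch on made-up rows, assuming only what the recipes state (the distance is the last column of each row); the helper name contacts is hypothetical, and the leading columns in the sample rows are placeholders rather than StrCalc's actual output layout. Using $NF instead of a fixed column index keeps the filter independent of how many columns precede the distance.

```shell
# Hypothetical helper: keep rows whose last column (the distance) is below
# the cutoff given as the first argument (default 4.5).
contacts() {
  awk -v cutoff="${1:-4.5}" 'NF && $NF + 0 < cutoff + 0'
}

# Made-up rows: two placeholder columns followed by a distance in Angstroms.
printf '%s\n' \
  '1 2 3.8' \
  '1 3 5.9' \
  '1 4 4.4' \
  '2 4 7.2' | contacts 4.5
```

Passing the cutoff as an argument makes it easy to try other contact definitions, for example a looser cutoff for Cα-based maps.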
