bioshell.bioinformatics package
The bioinformatics package of the BioShell suite provides several staple utility programs for manipulating the most common data formats, such as FASTA or PDB. Some of the programs can also perform the most typical calculations. More advanced tasks can be solved by calling the numerous subroutines gathered in the software library. This page briefly reviews the programs and summarizes the library modules.
If you want to see the package in action, browse the tutorials posted on BioShell's blog! You may also want to browse the BioShell Cookbook.
I. The programs
To invoke any of the executables (e.g. Strc), try:
(to see the full manual page) or just java
to glance at the available command-line flags. To read more about the command-line option system, have a look at this blog post.
| ACorr | ACorr is a tool for computing various correlations and autocorrelations on time series. The program reads a file with data observations in a flat format: several columns and many rows. By default it calculates an autocorrelation function for a given column (the first one unless specified otherwise) for time steps in the range [0, t_max], where t_max can be set with the option -acorr.tmax. It is also possible to calculate the cross-correlation between two columns or the autocorrelation of vectors. |
| BBQ | BBQ reconstructs protein backbone atoms based on CA coordinates |
| Clust | Clust finds groups of similar objects (clusters) by means of a hierarchical clustering algorithm |
| Hist | Hist calculates a 1D or 2D histogram from one-dimensional or two-dimensional data. The application reads a file with multiple columns, then prepares a histogram from the data found in a specified column. |
| PRAline | Using the PRAline (Profile Alignment) tool one can calculate optimal and sub-optimal alignments between two amino acid sequences or sequence profiles. In general, with PRAline one can: (1) align two amino acid sequences, using one of the flags [-qp -qf -qs] and one of [-tp -tf -ts] to provide the query and template sequences, respectively; (2) align two amino acid profiles, using the -qb and -tb options to provide the input profiles; (3) align two amino acid sequences (or profiles) with secondary structure information, in which case the secondary structure must be provided in SEQ files (-qs and -ts options). |
| PsiBlastSearch | PsiBlastSearch runs the PsiBlast program. It is possible to scan a whole range of PsiBlast input parameters. The jobs may be distributed among several threads. |
| PsiBlastAnalyse | PsiBlastAnalyse is a parser and filter for PsiBlast results. The program is intended to parse the results from the PsiBlastSearch tool, although it can read any output from the Blast or PsiBlast programs, provided that it is in format "0" (i.e. -m 0 was used to run Blast). The program can read in and combine several Blast output files. Results may be filtered in several different ways. |
| RmsCalc | RmsCalc is an extremely flexible program for crmsd and drmsd calculations. It can compare a reference structure against one or more target structures. It is also possible to point RmsCalc to a directory of PDB files. File names may be filtered by a regular expression. The program can calculate crmsd, drmsd, GDT, TM_score and MaxSub scores. These parameters may be evaluated solely on C-alpha atoms, on the protein backbone, or on all atoms that are common to the two structures being compared. It is also possible to compare structures based on a sequence alignment. |
| Seqc | Seqc is a tool for manipulating protein sequences. Seqc reads files in the following formats: PDB, FASTA, DSSP, SEQ, Blast (sequence profile), and PsiPred (secondary structure prediction output). The information may be written in one of the following formats: FASTA and SEQ |
| Strc | Strc is a tool for manipulating protein structures. The program can read a protein structure from PDB or DSSP files. It is also possible to combine XYZ coordinates with an amino acid sequence from a FASTA or a SEQ file. The program can also change the protein representation (e.g. from all-atom to Rosetta, CABS or Refin models). |
| StrCalc | StrCalc performs simple calculations on protein structures. The program can calculate various distances and angles. It is also possible to change the protein representation into a reduced model based on a config file provided by the user. |
| Trac | Trac is a tool for manipulating files in TRA and PDB formats, containing multiple structures obtained during a simulation. In the case of PDB format, separate structures (frames) are stored as separate MODEL entries (see PDB file format specification for details). |
| WHAM | The WHAM program reads a file with multiple columns and computes the density of states. |
| StatPhys | StatPhys calculates various thermodynamic properties as a function of temperature from a density of states and measured observables |
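
To make the autocorrelation calculation described for ACorr more concrete, here is a minimal sketch in plain Java. It does not use the BioShell library: the class name AutocorrelationSketch and its methods are hypothetical, the input array stands in for one column of a data file, and the normalized (biased) estimator shown is one common choice that may differ from what ACorr actually computes.

```java
import java.util.Locale;

public class AutocorrelationSketch {

    /**
     * Normalized autocorrelation of a series at a single lag.
     * Assumption: the common biased estimator,
     * C(t) = sum_i (x_i - m)(x_{i+t} - m) / sum_i (x_i - m)^2.
     */
    public static double autocorrelation(double[] x, int lag) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        // Sum of squared deviations (lag-0 term) used for normalization
        double denominator = 0.0;
        for (double v : x) denominator += (v - mean) * (v - mean);

        // Sum of products of deviations separated by 'lag' steps
        double numerator = 0.0;
        for (int i = 0; i + lag < n; i++) {
            numerator += (x[i] - mean) * (x[i + lag] - mean);
        }
        return numerator / denominator;
    }

    /** Autocorrelation function for all lags in [0, tMax]. */
    public static double[] autocorrelationFunction(double[] x, int tMax) {
        double[] acf = new double[tMax + 1];
        for (int t = 0; t <= tMax; t++) acf[t] = autocorrelation(x, t);
        return acf;
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for one column of an input file
        double[] column = {1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0};
        double[] acf = autocorrelationFunction(column, 4);
        for (int t = 0; t < acf.length; t++) {
            System.out.printf(Locale.ROOT, "%d %.4f%n", t, acf[t]);
        }
    }
}
```

With this normalization the value at lag 0 is always 1. A constant column has zero variance, so the sketch returns NaN for it; a production tool would need to handle that case explicitly.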
II. The library
Like any other Java library, bioshell.bioinformatics has a tree-like structure that stems from jbcl (Java BioComputing Library). Each branch collects modules (classes) of some particular functionality. The whole list is given below.
| jbcl | The root of the Java BioComputing Library (jbcl) hierarchy |
| jbcl.algorithms | Some standard algorithms |
| jbcl.algorithms.graphs | Graph data structure and algorithms |
| jbcl.algorithms.patterns | Base classes implementing some of the programming patterns used in BioShell |
| jbcl.algorithms.trees | Tree data structures and algorithms |
| jbcl.calc | The package contains modules for various calculations |
| jbcl.calc.alignment | A general package for calculating alignments |
| jbcl.calc.alignment.scoring | Methods for scoring sequences and profiles |
| jbcl.calc.bbq | Classes related to the BBQ program (see also apps.BBQ) |
| jbcl.calc.clustering | Provides classes related to data clustering |
| jbcl.calc.enm | Classes related to Elastic Network Model |
| jbcl.calc.numeric | Provides necessary numerical methods |
| jbcl.calc.numeric.algebra | Staple algebra: matrix operations, SVD, QR, eigenvalues, etc |
| jbcl.calc.numeric.functions | Provides various functions as objects |
| jbcl.calc.numeric.minimization | Delivers tools for numeric minimization of functions |
| jbcl.calc.rotamers | Provides utilities for side chain rebuilding and assessment |
| jbcl.calc.statistics | Provides various statistics tools |
| jbcl.calc.statistics.kernels | Kernel functions for kernel estimators |
| jbcl.calc.statphys | Provides means for describing a canonical system on the grounds of statistical physics |
| jbcl.calc.structural.properties | Provides classes that measure various structural properties |
| jbcl.calc.structural.transformations | Several basic transformations, e.g. from global to local coordinates, spherical to Cartesian, etc |
| jbcl.chemistry | Deals with chemical entities, e.g. provides bond definitions, molecule-type operations etc |
| jbcl.data | Provides classes for handling and managing various data types and file formats |
| jbcl.data.basic | Provides basic generic data types such as tuples and arrays |
| jbcl.data.dict | Provides dictionaries with various molecular and chemical properties for amino acids and proteins |
| jbcl.data.formats | These classes provide I/O operations for file formats common in bioinformatics |
| jbcl.data.formats.alignments | I/O methods for sequence alignments |
| jbcl.data.types | Provides all the BioShell related data types |
| jbcl.data.types.selectors | Provides means for selecting atoms, residues, chains etc |
| jbcl.external | Provides classes that interact with external programs such as PsiBlast |
| jbcl.external.blast | Utilities for running PsiBlast and for parsing its results |
| jbcl.external.gnuplot | Simple gnuplot wrapper |
| jbcl.graphics | Simple utility to plot 3D data as a heat map |
| jbcl.util | Utility classes that do not fit into other packages but are useful for BioShell |
| jbcl.util.exceptions | Several exceptions that may be thrown by jbcl classes |
| jbcl.util.options | The system of command-line options used by BioShell programs and jbcl scripts |
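
As an illustration of the kind of operation listed under jbcl.calc.structural.transformations, the following is a minimal plain-Java sketch of a spherical-to-Cartesian conversion and its inverse. It does not call any jbcl class: SphericalToCartesianSketch and its methods are hypothetical names, and the angle convention (theta as the polar angle measured from the +z axis, phi as the azimuth, both in radians) is an assumption that may differ from the one used in the library.

```java
import java.util.Locale;

public class SphericalToCartesianSketch {

    /**
     * Converts (r, theta, phi) to {x, y, z}.
     * Assumed convention: theta is the polar angle from +z, phi is the azimuth; radians.
     */
    public static double[] toCartesian(double r, double theta, double phi) {
        double x = r * Math.sin(theta) * Math.cos(phi);
        double y = r * Math.sin(theta) * Math.sin(phi);
        double z = r * Math.cos(theta);
        return new double[]{x, y, z};
    }

    /** Converts {x, y, z} back to {r, theta, phi} under the same convention. */
    public static double[] toSpherical(double x, double y, double z) {
        double r = Math.sqrt(x * x + y * y + z * z);
        // At the origin the angles are undefined; 0 is returned by choice
        double theta = (r == 0.0) ? 0.0 : Math.acos(z / r);
        double phi = Math.atan2(y, x);
        return new double[]{r, theta, phi};
    }

    public static void main(String[] args) {
        // Round trip for an arbitrary point
        double[] s = toSpherical(1.0, 1.0, 1.0);
        double[] c = toCartesian(s[0], s[1], s[2]);
        System.out.printf(Locale.ROOT, "r=%.4f theta=%.4f phi=%.4f%n", s[0], s[1], s[2]);
        System.out.printf(Locale.ROOT, "x=%.4f y=%.4f z=%.4f%n", c[0], c[1], c[2]);
    }
}
```

Using atan2 for the azimuth keeps the correct quadrant for all signs of x and y, which a plain arctangent of y/x would not.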

