bioshell.bioinformatics package

The bioinformatics package of the BioShell suite provides several staple utility programs for manipulating the most common data formats, such as FASTA or PDB. Some of the programs can also perform the most typical calculations. More advanced tasks can be solved by calling the numerous subroutines gathered in the software library. This page quickly reviews the programs and gives a brief summary of the library modules.

If you want to see the package in action, browse the tutorials posted on BioShell's blog! You may also want to browse the BioShell Cookbook.

I. The programs

To invoke any of the executables (e.g. Strc), try:

java apps.Strc -help

(to see the full manual page) or just

java apps.Strc -h

to glance at the available command-line flags. To read more about the command-line option system, have a look at this blog post.

ACorr

ACorr is a tool for computing various correlations and autocorrelations on time series. The program reads a file with data observations in a flat format: several columns and many rows. By default it calculates the autocorrelation function of a given column (the first one unless specified otherwise) for time steps in the range [0, t_max], where t_max can be set with the -acorr.tmax option. It is also possible to calculate the cross-correlation between two columns or the autocorrelation of vectors.
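The autocorrelation described above can be sketched in plain Java. This is only an illustration of the calculation, not ACorr's code: the class and method names are hypothetical, and the normalization (mean and variance taken over the whole series, so that C(0) = 1) is an assumption, since this page does not state which estimator ACorr uses.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of BioShell. */
class AutocorrelationSketch {

    /**
     * Normalized autocorrelation C(t) for t = 0..tMax.
     * Assumption: mean and variance are computed over the whole series.
     */
    static double[] autocorrelation(double[] x, int tMax) {
        int n = x.length;
        if (tMax < 0 || tMax >= n) {
            throw new IllegalArgumentException("tMax must be in [0, n-1]");
        }
        double mean = Arrays.stream(x).average().orElse(0.0);
        double variance = 0.0;
        for (double v : x) {
            variance += (v - mean) * (v - mean);
        }
        variance /= n;
        if (variance == 0.0) {
            throw new IllegalArgumentException("constant series has no autocorrelation");
        }
        double[] c = new double[tMax + 1];
        for (int t = 0; t <= tMax; t++) {
            double sum = 0.0;
            // Average the product of deviations over all pairs separated by t steps
            for (int i = 0; i + t < n; i++) {
                sum += (x[i] - mean) * (x[i + t] - mean);
            }
            c[t] = sum / ((n - t) * variance);
        }
        return c;
    }

    public static void main(String[] args) {
        // One data column, e.g. an observable recorded along a simulation
        double[] column = {1, -1, 1, -1, 1, -1, 1, -1};
        double[] c = autocorrelation(column, 3);
        for (int t = 0; t < c.length; t++) {
            System.out.printf("%d %.4f%n", t, c[t]);
        }
    }
}
```

For the strictly alternating series in main, the function oscillates between 1 and -1, which is a quick sanity check of the lag indexing.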

BBQ

BBQ reconstructs protein backbone atoms from C-alpha (CA) coordinates.

Clust

The Clust program finds groups of similar objects (clusters) by means of a hierarchical clustering algorithm.
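As a rough illustration of hierarchical (agglomerative) clustering, the sketch below repeatedly merges the two closest clusters until the smallest inter-cluster distance exceeds a cutoff. The single-linkage merging rule, the cutoff-based stopping criterion and all names are assumptions made for this example; the page does not say which linkage Clust uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of BioShell. */
class SingleLinkageSketch {

    /** Pairwise distances for points on a line (stand-in for any distance measure). */
    static double[][] distanceMatrix(double[] points) {
        int n = points.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                d[i][j] = Math.abs(points[i] - points[j]);
            }
        }
        return d;
    }

    /** Single linkage: the distance between the two closest members. */
    static double linkage(double[][] d, List<Integer> a, List<Integer> b) {
        double min = Double.POSITIVE_INFINITY;
        for (int i : a) {
            for (int j : b) {
                min = Math.min(min, d[i][j]);
            }
        }
        return min;
    }

    /** Merges the closest pair of clusters until none is closer than the cutoff. */
    static List<List<Integer>> cluster(double[][] d, double cutoff) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < d.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > 1) {
            int bestA = -1;
            int bestB = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double dist = linkage(d, clusters.get(a), clusters.get(b));
                    if (dist < best) {
                        best = dist;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            if (best > cutoff) {
                break;
            }
            // bestB > bestA, so removing bestB does not shift bestA
            List<Integer> merged = clusters.remove(bestB);
            clusters.get(bestA).addAll(merged);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 0.5, 1.0, 10.0, 10.4};
        System.out.println(cluster(distanceMatrix(points), 1.0));
    }
}
```

This naive version rescans all cluster pairs at every step, which is fine for a handful of objects but scales poorly; it is meant to show the idea only.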

Hist

Hist calculates a 1D or 2D histogram from one-dimensional or two-dimensional data. The application reads a file with multiple columns and then prepares a histogram from the data found in a specified column.
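The 1D case can be sketched as follows: pick one column from whitespace-separated rows and count how many values fall into each fixed-width bin. The names and the binning convention (values outside the range are silently skipped) are assumptions for this example and do not describe Hist's actual options.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of BioShell. */
class HistogramSketch {

    /** Extracts one column (0-based index) from whitespace-separated rows. */
    static double[] column(String[] rows, int index) {
        double[] values = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            String[] tokens = rows[i].trim().split("\\s+");
            values[i] = Double.parseDouble(tokens[index]);
        }
        return values;
    }

    /** Counts values in nBins bins of equal width, starting at min. */
    static int[] histogram(double[] values, double min, double binWidth, int nBins) {
        int[] counts = new int[nBins];
        for (double v : values) {
            int bin = (int) Math.floor((v - min) / binWidth);
            // Assumption: out-of-range observations are ignored
            if (bin >= 0 && bin < nBins) {
                counts[bin]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] rows = {
            "1 0.1", "2 0.4", "3 0.6", "4 1.2", "5 1.9", "6 2.5", "7 3.7"
        };
        double[] data = column(rows, 1);
        System.out.println(Arrays.toString(histogram(data, 0.0, 1.0, 4)));
    }
}
```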

PRAline

Using the PRAline (Profile Alignment) tool, one can calculate optimal and sub-optimal alignments between two amino acid sequences or sequence profiles. In general, with PRAline one can:

- align two amino acid sequences: use one of the following flags: [-qp -qf -qs] and one of these: [-tp -tf -ts] to provide query and template sequences, respectively

- align two amino acid profiles: use -qb and -tb options to provide input profiles

- align two amino acid sequences (or profiles) with secondary structure information. In this case the secondary structure must be provided in SEQ files (-qs and -ts options).
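To give an idea of what an optimal alignment of two sequences means computationally, here is a minimal dynamic-programming sketch of a global alignment score (Needleman-Wunsch with a linear gap penalty). It is not PRAline's algorithm: the simple match/mismatch scoring stands in for a real amino acid substitution matrix, sub-optimal alignments and profiles are not covered, and every name is hypothetical.

```java
/** Illustrative sketch only; not part of BioShell. */
class GlobalAlignmentSketch {

    /** Optimal global alignment score with a linear gap penalty. */
    static int score(String query, String template, int match, int mismatch, int gap) {
        int n = query.length();
        int m = template.length();
        int[][] s = new int[n + 1][m + 1];
        // Aligning a prefix against nothing costs one gap per residue
        for (int i = 1; i <= n; i++) {
            s[i][0] = i * gap;
        }
        for (int j = 1; j <= m; j++) {
            s[0][j] = j * gap;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = query.charAt(i - 1) == template.charAt(j - 1);
                int diagonal = s[i - 1][j - 1] + (same ? match : mismatch);
                int gapInTemplate = s[i - 1][j] + gap;
                int gapInQuery = s[i][j - 1] + gap;
                s[i][j] = Math.max(diagonal, Math.max(gapInTemplate, gapInQuery));
            }
        }
        return s[n][m];
    }

    public static void main(String[] args) {
        // Two matches and one gap: 1 + 1 - 1
        System.out.println(score("ACD", "AD", 1, -1, -1));
    }
}
```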

PsiBlastSearch

PsiBlastSearch runs the PsiBlast program. It can scan a whole range of PsiBlast input parameters, and the jobs may be distributed among several threads.

PsiBlastAnalyse

PsiBlastAnalyse is a parser and filter for PsiBlast results. The program is intended to parse the results from the PsiBlastSearch tool, although it can read any output from the Blast or PsiBlast program, provided that it is in format "0" (i.e. -m 0 was used to run Blast). The program can read and combine several Blast output files. Results may be filtered in several different ways.

RmsCalc

RmsCalc is an extremely flexible program for crmsd and drmsd calculations. It can compare a reference structure against one or more target structures. It is also possible to point RmsCalc to a directory of PDB files; file names may be filtered by a regular expression. The program can calculate crmsd, drmsd, GDT, TM-score and MaxSub scores. These measures may be evaluated solely on C-alpha atoms, on the protein backbone, or on all atoms that are common to the two structures being compared. It is also possible to compare structures based on a sequence alignment.
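Of the measures listed above, drmsd is the simplest to illustrate because it needs no superposition: it compares the internal distances of the two structures. The sketch below assumes the common definition, the root mean square difference over all atom pairs i < j, and assumes both coordinate sets list the same atoms in the same order. The names are hypothetical and the code is not taken from RmsCalc.

```java
/** Illustrative sketch only; not part of BioShell. */
class DrmsdSketch {

    /** Euclidean distance between two points given as {x, y, z}. */
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Distance RMSD over all pairs i < j of corresponding atoms. */
    static double drmsd(double[][] reference, double[][] target) {
        int n = reference.length;
        if (n != target.length || n < 2) {
            throw new IllegalArgumentException("need two equally sized sets of at least 2 atoms");
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double diff = distance(reference[i], reference[j])
                            - distance(target[i], target[j]);
                sum += diff * diff;
            }
        }
        int pairs = n * (n - 1) / 2;
        return Math.sqrt(sum / pairs);
    }

    public static void main(String[] args) {
        double[][] reference = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}};
        // The same triangle, rotated by 90 degrees about z and shifted
        double[][] moved = {{5, 5, 5}, {5, 6, 5}, {4, 5, 5}};
        System.out.println(drmsd(reference, moved));
    }
}
```

Because only internal distances enter the formula, a rigidly rotated and translated copy of a structure gives a drmsd of zero, as the example in main shows. Coordinate RMSD (crmsd), by contrast, requires an optimal superposition first, which is left out here.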

Seqc

Seqc is a tool for manipulating protein sequences. Seqc reads files in the following formats: PDB, FASTA, DSSP, SEQ, Blast (sequence profile), and PsiPred (secondary structure prediction output). The information may be written in one of the following formats: FASTA and SEQ.

Strc

Strc is a tool for manipulating protein structures. The program can read a protein structure from PDB or DSSP files. It is also possible to combine XYZ coordinates with an amino acid sequence from a FASTA or a SEQ file. The program can also change the protein representation (e.g. from all-atom to Rosetta, CABS or Refin models).

StrCalc

StrCalc performs simple calculations on protein structures. The program can calculate various distances and angles. It is also possible to change the protein representation into a reduced model based on a config file provided by the user.

Trac

Trac is a tool for manipulating files in TRA and PDB formats, containing multiple structures obtained during a simulation. In the case of PDB format, separate structures (frames) are stored as separate MODEL entries (see PDB file format specification for details).
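The MODEL-based frame layout can be illustrated with a short sketch that splits the lines of a multi-model PDB file into frames, using the MODEL and ENDMDL records of the PDB format as delimiters. The class and method names are hypothetical, the TRA format is not covered, and this is not how Trac itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of BioShell. */
class PdbFrameSplitter {

    /** Returns one list of ATOM/HETATM lines per MODEL ... ENDMDL block. */
    static List<List<String>> splitModels(List<String> lines) {
        List<List<String>> frames = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.startsWith("MODEL")) {
                current = new ArrayList<>();          // a new frame begins
            } else if (line.startsWith("ENDMDL")) {
                if (current != null) {
                    frames.add(current);              // the frame is complete
                }
                current = null;
            } else if (current != null
                    && (line.startsWith("ATOM") || line.startsWith("HETATM"))) {
                current.add(line);                    // coordinates are kept as raw text
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "MODEL        1",
            "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00           C",
            "ATOM      2  CA  GLY A   2      13.407   7.729  -3.933  1.00  0.00           C",
            "ENDMDL",
            "MODEL        2",
            "ATOM      1  CA  ALA A   1      11.352   6.020  -6.311  1.00  0.00           C",
            "ATOM      2  CA  GLY A   2      13.115   7.904  -4.102  1.00  0.00           C",
            "ENDMDL",
            "END");
        List<List<String>> frames = splitModels(lines);
        System.out.println(frames.size() + " frames, "
                + frames.get(0).size() + " atoms in the first one");
    }
}
```

The coordinates in main are made-up values used only to exercise the splitter.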

WHAM

WHAM program reads a file with multiple columns and computes the density of states.

StatPhys

StatPhys calculates various thermodynamic properties as a function of temperature from a density of states and measured observables.
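The kind of calculation involved can be sketched for two properties, the mean energy and the heat capacity, using the standard canonical-ensemble formulas: each energy level E is weighted by g(E) exp(-E/T), with g the density of states. The sketch assumes reduced units (Boltzmann constant equal to 1) and a density of states given as ln g on a discrete set of energies; the names are hypothetical and the code is not taken from StatPhys.

```java
/** Illustrative sketch only; not part of BioShell. */
class CanonicalAverages {

    /**
     * Returns {mean energy, heat capacity} at temperature t (reduced units, k_B = 1).
     * energies[i] and lnG[i] describe one level and the log of its density of states.
     */
    static double[] energyAndHeatCapacity(double[] energies, double[] lnG, double t) {
        // Shift all exponents by their maximum so that exp() cannot overflow
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < energies.length; i++) {
            max = Math.max(max, lnG[i] - energies[i] / t);
        }
        double z = 0.0;
        double sumE = 0.0;
        double sumE2 = 0.0;
        for (int i = 0; i < energies.length; i++) {
            double w = Math.exp(lnG[i] - energies[i] / t - max);
            z += w;
            sumE += w * energies[i];
            sumE2 += w * energies[i] * energies[i];
        }
        double meanE = sumE / z;
        double meanE2 = sumE2 / z;
        // Heat capacity from energy fluctuations: (<E^2> - <E>^2) / T^2
        double cv = (meanE2 - meanE * meanE) / (t * t);
        return new double[] {meanE, cv};
    }

    public static void main(String[] args) {
        // Two-level toy system with equal degeneracy
        double[] energies = {0.0, 1.0};
        double[] lnG = {0.0, 0.0};
        for (int k = 1; k <= 4; k++) {
            double t = 0.5 * k;
            double[] r = energyAndHeatCapacity(energies, lnG, t);
            System.out.printf("T=%.1f <E>=%.4f Cv=%.4f%n", t, r[0], r[1]);
        }
    }
}
```

Working with ln g rather than g itself is a deliberate choice here: densities of states typically span many orders of magnitude, and subtracting the largest exponent keeps the sums numerically stable.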

II. The library

Like any other Java library, bioshell.bioinformatics has a tree-like structure that stems from jbcl (Java BioComputing Library). Each branch collects modules (classes) of some particular functionality. The whole list is given below.

jbcl The root of the hierarchy of the Java BioComputing Library (i.e. jbcl)
jbcl.algorithms Some standard algorithms
jbcl.algorithms.graphs Graph data structure and algorithms
jbcl.algorithms.patterns Base classes implementing some of the programming patterns used in BioShell
jbcl.algorithms.trees Tree data structures and algorithms
jbcl.calc The package contains modules for various calculations
jbcl.calc.alignment A general package for calculating alignments
jbcl.calc.alignment.scoring Methods for scoring sequences and profiles
jbcl.calc.bbq Classes related to the BBQ program (see also apps.BBQ)
jbcl.calc.clustering Provides classes related to data clustering
jbcl.calc.enm Classes related to Elastic Network Model
jbcl.calc.numeric Provides necessary numerical methods
jbcl.calc.numeric.algebra Staple algebra: matrix operations, SVD, QR, eigenvalues, etc
jbcl.calc.numeric.functions Provides various functions as objects
jbcl.calc.numeric.minimization Delivers tools for numeric minimization of functions
jbcl.calc.rotamers Provides utilities for side chain rebuilding and assessment
jbcl.calc.statistics Provides various statistics tools
jbcl.calc.statistics.kernels Kernel functions for kernel estimators
jbcl.calc.statphys Provides means for describing a canonical system on the grounds of statistical physics
jbcl.calc.structural.properties Provides classes that measure various structural properties
jbcl.calc.structural.transformations Several basic transformations, e.g. from global to local coordinates, spherical to Cartesian, etc
jbcl.chemistry Deals with chemical entities, e.g. provides bond definitions, molecule-type operations etc
jbcl.data Provides classes for handling and managing various data types and file formats
jbcl.data.basic Provides basic generic data types such as tuples and arrays
jbcl.data.dict Provides dictionaries with various molecular and chemical properties for amino acids and proteins
jbcl.data.formats These classes provide I/O operations for file formats common in bioinformatics
jbcl.data.formats.alignments I/O methods for sequence alignments
jbcl.data.types Provides all the BioShell related data types
jbcl.data.types.selectors Provides means for selecting atoms, residues, chains etc
jbcl.external Provides classes that interact with external programs such as PsiBlast
jbcl.external.blast Utilities for running PsiBlast and for parsing its results
jbcl.external.gnuplot Simple gnuplot wrapper
jbcl.graphics Simple utility to plot 3D data as a heat map
jbcl.util Utility classes that do not fit to other packages but are useful for BioShell
jbcl.util.exceptions Several exceptions that may be thrown by jbcl classes
jbcl.util.options The system of command-line options used by BioShell programs and jbcl scripts
