This is a partial list of Softberry products. All programs available
for online testing are also available for licensing, even if they are not in
this list. A complete list of products is available upon request. We license our
products on a subscription basis, with licensing fees payable annually. Typically,
we offer only two types of licenses: personal single-user and site licenses.
Fees for academic single-user licenses vary from several hundred to a few thousand
dollars per year. For instance, such a license for EST_Map or Prot_Map is $1,000
per year, and for FGENESH it is $1,440 for the first year and $960 for each annual renewal.
Site licenses usually cost 2.5x as much. Commercial licenses are more
expensive. Please inquire for details. Most of our programs are available for
all major Unix platforms, including Linux and Mac OSX.
Gene Prediction Programs
The algorithm is based on pattern recognition of different types of exons, promoters and polyA signals. The optimal combination of these features is then found by dynamic programming, and a set of gene models is constructed along the given sequence. One of the most popular gene finders available.
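The dynamic-programming step can be illustrated with a toy sketch. It assumes that candidate exons are already available as (start, end, score) tuples and simply selects the highest-scoring chain of non-overlapping exons. The function name `best_exon_chain` and the scores are hypothetical, and the real program's scoring, reading-frame and splice-site compatibility rules are not modeled here.

```python
from bisect import bisect_left

def best_exon_chain(exons):
    """Return (total_score, chain) for the best set of non-overlapping exons.

    exons: list of (start, end, score) with inclusive coordinates.
    Toy weighted-interval DP; an assumption, not the actual scoring scheme.
    """
    exons = sorted(exons, key=lambda e: e[1])  # sort by end coordinate
    ends = [e[1] for e in exons]
    # best[i] = (score, chain) using only the first i exons
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # k = number of earlier exons that end strictly before this one starts
        k = bisect_left(ends, start, 0, i)
        take = (best[k][0] + score, best[k][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]

candidates = [(1, 100, 5.0), (50, 150, 6.0), (120, 200, 4.0), (160, 220, 2.0)]
total, chain = best_exon_chain(candidates)
print(total, chain)
```

In this example the two compatible exons (1-100 and 120-200) are preferred over the single higher-scoring exon 50-150, which overlaps both.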
FGENES variant that outputs several alternative gene structures. Useful for predicting alternative splicing or providing several gene variants for subsequent experimental testing.
Prediction of multiple variants of potential genes in Homo_sapiens genomic DNA.
A version of the FGENESH program that accounts for the noncanonical GC dinucleotide in donor splice sites is available for online use.
This program is useful for analyzing alternative gene structures, where non-standard splice sites are often found (see also the FGENES-M program for predicting alternative gene variants), and for creating a set of genes and proteins that are absent from standard gene predictions.
Hidden Markov Model (HMM)-based gene prediction program. The fastest and most
accurate gene predictor available - click here
for more details. Available with parameters for dozens of different genomes/taxonomic
To try FGENES, FGENES-M, FGENESM, FGENESH_GC or FGENESH, click here.
FGENESH variant that uses information on homologous existing proteins for more accurate gene assembly from predicted exons.
Add-on to FGENESH that uses information on homologous cDNA/EST for more accurate gene assembly from predicted exons.
Variant of FGENESH that uses two homologous genomic DNA sequences, such as human
and mouse, for more accurate gene prediction.
To try FGENESH+, FGENESH_C or FGENESH-2, click here.
Prot_Map: Mapping entire protein DB to a genome
Prot_Map is more accurate and hundreds of times faster than GeneWise (for accuracy
comparison, click here).
It can be used as a stand-alone gene prediction program or in conjunction with
FGENESH+ as a very accurate genefinder. It can also be used for finding pseudogenes.
Try it here.
FGENESH++C: The ultimate genome annotation tool
This set of programs produces fully automated genome annotations of a quality
indistinguishable from manual genome annotation. It includes mapping of RefSeq
sequences and the NR protein DB to a genome, two rounds of ab initio FGENESH
and two rounds of FGENESH+ evidence-based gene prediction. FGENESH++C is not
available for online testing due to its heavy use of computer resources.
SPL and SPLM
Programs for splice site identification. SPLM also finds non-standard
splice sites such as GC-donor sites or AT-AC sites.
To try SPL or SPLM, click here.
Find splice sites in genomic DNA.
To try FSPLICE, click here.
FGENESB: Bacterial operon and gene finding suite
FGENESB is the fastest and most accurate bacterial gene finder - click here
for more details. It is also the easiest to train on a new genome - all it takes
is ten minutes and a genomic sequence. FGENESB also maps tRNA and rRNA
genes, predicts bacterial promoters using the BPROM program and terminators using
BTERM, predicts operons based on promoter and terminator locations, densities
of ORFs in a given genome and frequencies of different genes being linked to
each other, and annotates predicted genes based on homology with known proteins
from public databases.
To try genefinding part of FGENESB and see annotations of more than a dozen
bacterial genomes, click here.
Recognition of E. coli promoters and transcription start sites. BPROM is a bacterial promoter prediction program, introduced as part of our bacterial genome analysis suite to support operon and gene prediction by the FGENESB program.
To try BPROM, click here.
Separating archaeal and bacterial genomes.
To try AbSplit, click here.
FGENESV: Gene finder for viral genomes
FGENESV can be used with generic parameters, suitable for smaller viral genomes,
or automatically trained on a query genome sequence within minutes. It supports
alternative genetic codes that can sometimes be found in viruses.
To try FGENESV, click here.
FCLUST: Fast EST clustering and alignment program
FCLUST options include accounting for known mRNAs, repeat masking and many others.
An example of mouse EST clustering can be accessed here.
Promoter and functional Site Prediction Programs
Human promoter prediction. The algorithm predicts potential transcription start positions using a linear discriminant function that combines characteristics describing functional motifs and the oligonucleotide composition of these sites.
These two programs for predicting PolII promoter regions use a linear discriminant
function combining characteristics describing functional motifs and oligonucleotide
composition of transcription start sites. TSSW uses elements of an older release
of the Transfac® database and may require a corresponding license, while TSSG
uses data of Dan Prestridge. TSSG is the most accurate promoter prediction
program available - click here
This variant of TSSW/TSSG is designed for predicting PolII promoter regions
in plants. It uses regulatory motif data from Softberry's plant RegSite database.
PromH takes into account, in addition to the features implemented in TSSW/TSSG, conservation
features of major promoter functional components, such as the transcription start
point, TATA boxes and regulatory motifs, in pairs of orthologous sequences aligned
by the Scan2 program. That results in an additional 20% accuracy improvement, especially
pronounced on TATA+ promoters.
PolyA site prediction program. Uses linear discriminant functions combining
characteristics describing various contextual features of polyA sites.
NSITE is a program for analysis of regulatory regions and the composition of their
functional motifs, based on statistical estimation of the expected number of occurrences of a nucleotide
consensus pattern in a given sequence. It uses a data file consisting of an older
release of Transfac® sequences and a set of additional functional motifs, and
may require a Transfac license.
This variant of NSITE is designed to work on plant sequences using functional
motifs from Softberry's own RegSite database.
NSITEM and NSITEM-PL
This version of NSITE can process multiple sequences and find regulatory elements
(REs) that occur in at least one copy in X% or more of the analyzed DNA sequences.
REs can be taken from different databases or defined by the user.
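The "X% or more" criterion can be sketched as follows. This is only an illustration under simplifying assumptions: motifs are matched as exact strings rather than consensus patterns, no statistical significance is estimated, and the function name `motifs_in_fraction` is hypothetical.

```python
def motifs_in_fraction(sequences, motifs, min_percent):
    """Return {motif: percent} for motifs present in >= min_percent of sequences.

    A motif counts once per sequence, however many copies that sequence holds.
    Exact string matching is an assumption made for this sketch.
    """
    if not sequences:
        return {}
    upper_seqs = [seq.upper() for seq in sequences]
    result = {}
    for motif in motifs:
        hits = sum(1 for seq in upper_seqs if motif.upper() in seq)
        percent = 100.0 * hits / len(upper_seqs)
        if percent >= min_percent:
            result[motif] = percent
    return result

seqs = ["ACGTATAAAGG", "TTTATAAACC", "GGGCCC", "CCTATAAA"]
print(motifs_in_fraction(seqs, ["TATAAA", "GGGCCC"], min_percent=50))
```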
Search for conserved motifs of regulatory elements (REs) that are present in both of two aligned (in a special format) homologous (orthologous) DNA sequences
(maximum length 100,000 nt). The REs are taken both from a collection of thousands of REs (from human and animal or plant species)
created by us and from a collection of REs supplied by the user.
To try FPROM, TSSG, TSSP, POLYAH, PromH, NSITE-PL, NSITEM_PL and NSITEH, click
The program searches for significant patterns in the set of sequences.
ScanWM-PL searches for functional motifs described by weight matrices of plant regulatory sequences.
The weight matrices used with the program are built for a subset of plant regulatory sequences from the RegSite Database developed by Softberry Inc. using published data on transcription regulation of plant genes.
An assumption used in the program is that if a pattern found in a sequence has a weight greater than a cut-off value for a corresponding weight matrix, it can be expected that the pattern is a functional motif, and the sequence analysed possesses the pattern's function.
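The cut-off assumption can be illustrated with a minimal weight-matrix scan. The matrix values, the cut-off and the function name `scan_weight_matrix` below are made up for the example; they are not taken from RegSite or from the program itself.

```python
def scan_weight_matrix(sequence, matrix, cutoff):
    """Return (position, window, weight) for every window scoring above cutoff.

    matrix: list of dicts, one per motif position, mapping base -> weight.
    """
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        # Unknown bases (e.g. N) get the lowest weight of the column.
        weight = sum(col.get(base, min(col.values()))
                     for col, base in zip(matrix, window))
        if weight > cutoff:
            hits.append((pos, window, weight))
    return hits

# Hypothetical 4-column matrix favoring the pattern TATA.
toy_matrix = [
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
]
print(scan_weight_matrix("GGTATACC", toy_matrix, cutoff=5))
```

Raising the cut-off trades sensitivity for fewer false positives, which is why each matrix would normally carry its own cut-off value.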
The program is intended to search for CpG islands in sequences.
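The criteria used by the program are not stated here. As a rough sketch, the commonly cited Gardiner-Garden and Frommer thresholds (window of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) can be applied to sliding windows. The function names and default parameters below are assumptions made for illustration.

```python
def cpg_stats(window):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def find_cpg_windows(sequence, size=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows that pass both thresholds."""
    starts = []
    for start in range(0, len(sequence) - size + 1, step):
        gc, ratio = cpg_stats(sequence[start:start + size])
        if gc > min_gc and ratio > min_ratio:
            starts.append(start)
    return starts

print(find_cpg_windows("CG" * 100 + "AT" * 100, size=200, step=100))
```

Overlapping windows that pass would normally be merged into islands; that step is omitted to keep the sketch short.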
To try PATTERN, ScanWM-PL and CpGfinder, click
Motif and promoter visualization.
To try Motif Explorer, click here.
Database Search, Genome Comparison and RNA/EST/oligo Mapping Programs
DBSCAN searches a database with a DNA sequence in a BLAST-like manner, except that it can handle multimegabyte-sized sequences.
SCAN2 performs alignment of two multimegabyte sequences.
Both programs have Java viewer for easy visualization and parsing of results.
RNA_MAP is a fast algorithm for accurately mapping an mRNA sequence
to a genomic sequence, taking into account the splice sites flanking intron sequences.
Mapping a 300 bp mRNA to 52 MB of unmasked Y chromosome takes about 19 sec; for
7300 bp the time is about 47 sec (both strands checked, one DEC Alpha processor
To try SCAN2, DBSCAN and RNA_MAP, click here.
EST_SMAP maps a whole set of sequences to a chromosome sequence. For example, 11000 full mRNA sequences from the NCBI reference set are mapped to 52 MB of unmasked Y chromosome in about 18-25 min (depending on computer memory size).
OLIGO_MAP is designed to map a set of oligonucleotides used for microarray production. The program maps 300000 oligos 25-30 bp long to 49 MB of unmasked Chromosome 22 in 8 min. The program is useful for checking the location of oligos and their uniqueness in the genome.
FMAP is a program for very fast (less than a second) mapping
of a sequence to a genome.
To try EST_MAP and FMAP, click here.
The program maps a short sequence to a specified genome and outputs the list of the chromosome regions where the sequence of interest is found at a certain degree of homology.
To try SMAP, click here.
Genomes Match: Fast alignment of two eukaryotic chromosomes
or bacterial genomes.
The program finds all possible significant alignments between genome fragments.
Alignment of two bacterial genomes takes 20-40 seconds on a typical one-processor
Genomes Match can be tried here.
MALI: Multiple alignment of nucleotide or protein sequence
and tree construction.
This powerful multiple alignment program includes an alignment editor, a tree
builder and a tree editor.
Try it here
(nucleotide sequences) or here
Softberry Genome Explorer is a powerful viewer of genomic sequences
with capability for sequence homology search (one second on human genome using
FMAP algorithm), simple pattern, feature ID or word search, retrieval of nucleotide
and amino acid sequences of features, expression data on individual genes and
Follow these links to see
human, mouse, rat
genomes in Genome Explorer.
ProtComp: A program for identification
of sub-cellular localization of eukaryotic proteins.
The program is based
on complex neural-network recognizers, which estimate the probability of subcellular
localization in Nuclear, Plasma membrane, Extracellular, Cytoplasmic, Mitochondrial,
Chloroplast, Endoplasmic reticulum, Peroxisomal, Lysosomal or Golgi compartments.
Its accuracy reaches 90% for major compartments.
ProtCompB combines several methods of protein localization prediction: Linear Discriminant Function-based prediction; direct comparison with databases of homologous proteins of known localization; comparison of pentamer distributions calculated for query and DB sequences; and prediction of certain functional peptide sequences, such as signal peptides and transmembrane segments. This means that the program handles correctly only complete sequences, containing signal sequences, anchors, and other functional peptides, if any.
The program predicts putative cytotoxic T lymphocyte (CTL) epitopes in protein sequences. These polypeptides are known as potential candidates for vaccine design.
The sequence length for predicted epitopes is 9.
Program for identification of significant Prosite patterns.
You can try ProtComp, ProtCompB, CTL-epitope-Finder and Psite here.
Protein Sequence and Structure Analysis Programs
SSENVID: Protein secondary structure
and environment assignment from atomic coordinates.
SSENVID recognizes secondary
structural elements in proteins from their atomic coordinates. It performs the same
tasks as DSSP by Kabsch and Sander (1983) or STRIDE by Frishman & Argos (1995),
analyzing both hydrogen bonds and mainchain dihedral angles, as well as some
probabilistic measures. SSENVID also computes accessible surface area, polarity
and environment classes as defined by Bowie, Luthy, Eisenberg (1991).
You can try SSENVID here.
GETATOMS: Computing side chain conformations
by simulated annealing with frozen main chain atoms.
GETATOMS is a program
for modeling atomic coordinates of a protein with unknown 3D structure. It uses
main chain coordinates from the 3D structure of a similar protein, whose sequence
is aligned with the query protein. It also has an option to provide coordinates
You can try GETATOMS here.
3D-comp: Superposition of two PDB 3D-structures
using alignment of their sequences.
You can try 3D-comp here.
SoftPM: a suite of protein analysis programs that includes
programs for secondary structure prediction, fold recognition, molecular mechanics
and molecular dynamics modeling. The suite consists of seven programs, two of
which are SSENVID and SSPAL described above, and a powerful Java-based visualization
tool with the ability to correlate structure and sequence information.
Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments.
To try NNSSP, click here.
Program predicts ordered and disordered regions in protein sequences. Minimum required sequence length is 40.
To try PDISORDER, click here.
The program performs the first step in locating disulphide bridges in proteins: prediction of the SS-bonding states of cysteines.
To try CYS_REC, click here.
The Program MDynSB is designed to perform multiple tasks with protein structure:
1) Optimization of a protein structure via MD simulation in an implicit water solvent.
2) Optimization and folding of a protein via (the user defined) simulated annealing protocol in an implicit water solvent.
3) Optimization of predefined protein loops while the non-loop parts of the protein molecule are kept fixed in the course of the loop optimization.
To try MdynSB, click here.
Energy minimization program using molecular mechanics. In the current version of the program, the input data is a PDB file with the coordinates of the atoms in a protein. The coordinates may be retrieved from a file or from the PDB database. For the computation, indicate the chain identifier given in the PDB file.
To try Hmod3dMM, click here.
The program is intended for calculating 3D structure of proteins, provided that 3D structures of individual parts (fragments) of the protein are known, while phi and psi angles between the fragments should be found. This problem may arise when constructing a protein structure from fragments, whose structures were obtained using the search for homology of their primary sequences.
To try AbIni3D, click here.
SelTag: A Tool for Analysis of Expression Data.
SelTag is one of the most elegant tools for analysis of expression
data. It can analyze all or marked groups of genes or tissues, select tissue-specific
genes based on complex criteria, provide visual representation of expression
data, identify genes correlatively expressed in a given set of tissues, select
disease-specific genes with particular characteristics, such as receptors or
secreted proteins. Available in Windows and cross-platform Java version, which
can be tried here.
First Human-Mouse-Rat Synteny Alignment Based on Draft Genomic Sequences
Softberry has developed a human-mouse-rat synteny server which
provides information about 18915 human genes mapped to the mouse genome draft and
18464 mouse genes mapped to the human genome draft, among them 14504 orthologous
gene pairs. This is the most comprehensive data about homology between human,
mouse and rat genomic regions. Compared to NCBI homology maps, the Softberry map
contains significantly more genes and is directly linked to genomic sequences.
The data were generated by Softberry programs for gene prediction, EST/RNA mapping
and genomic sequence comparison - some of them can be tested at this site.
You can view human-mouse-rat synteny here.
RNA structure computing
The program predicts RNA secondary structure using Zuker's algorithm of energy minimization. Energy calculation is made using energy rules which are similar to those of mfold 3.0.
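Zuker's algorithm minimizes free energy with detailed energy rules, which is too long to reproduce here. The dynamic-programming idea behind it can be shown with the much simpler Nussinov recursion, which maximizes the number of nested base pairs instead of minimizing energy. This is a swapped-in, simplified technique and not the program's method; the function name and the minimum loop size of 3 are assumptions.

```python
def max_base_pairs(rna, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in rna.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence rna[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in pairs:  # case: base k pairs with j
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(max_base_pairs("GGGAAACCC"))
```

An energy-based method replaces the "+1 per pair" score with stacking and loop energies, but fills a table over subsequences in the same way.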
The program searches for a given number of the best (most stable) palindromes - hairpin-like, "linear" structures, which can contain bulges or interior loops.
The program searches for bacterial terminators in DNA sequences using a set of conditions that can be modified by the user. The conditions are stored in the config file (FindTerm.cfg) or any other user-defined config file, or can be given on the command line without a config file.
You can try FoldRNA, BestPal, and FindTerm here.
Protein/DNA 3D-Visual Works
Visual work with the 3D structure of a molecule.
Visualization of two structures compared by CE.
Comparing 3D structures of two proteins.
Find structural alignments by real time search in the PDB database.
You can try 3D-Explorer, 3D-Comparison, 3D-Match and 3D-MatchDB here.
Seqman performs a set of manipulations on sequences: loading and designing sequences, searching for motifs in a sequence, and amino acid translation of sequences. Seqman can also save results and print sequences and results in different formats.
To try Seqman, click here.