



Main Products

This is a partial list of Softberry products. All programs available for online testing are also available for licensing, even if they are not in this list. A complete list of products is available upon request. We license our products on a subscription basis, with licensing fees payable annually. Typically, we offer only two types of licenses: personal single-user and site license. Fees for academic single-user licenses vary from several hundred to a few thousand dollars per year. For instance, an academic single-user license for EST_Map or Prot_Map is $1,000 per year, and for FGENESH it is $1,640 for the first year and $1,120 for each annual renewal. Site licenses usually cost 2.5x as much. Commercial licenses are more expensive. Please inquire for details. Most of our programs are available for all major Unix platforms, including Linux and Mac OS X.


Gene Prediction Programs

The algorithm is based on pattern recognition of different types of exons, promoters and polyA signals. The optimal combination of these features is then found by dynamic programming, and a set of gene models is constructed along the given sequence. One of the most popular gene finders available.
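The dynamic-programming step described above can be illustrated with a minimal sketch. It assumes each candidate exon is a hypothetical (start, end, score) tuple and selects the highest-scoring set of non-overlapping exons. The tuple layout, the scores and the function name are illustrative assumptions, not the program's actual model, which would also have to respect reading frames, splice-site compatibility, promoters and polyA signals.

```python
from bisect import bisect_left

def best_exon_chain(exons):
    """Return (total_score, chain) for the best set of non-overlapping exons.

    exons: list of (start, end, score) tuples with inclusive coordinates.
    This is weighted interval scheduling, used here as a stand-in for
    the exon-assembly step of a gene finder.
    """
    exons = sorted(exons, key=lambda e: e[1])   # order by end position
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end strictly before this one starts
        j = bisect_left(ends, start, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]

if __name__ == "__main__":
    candidates = [(1, 100, 5.0), (50, 150, 8.0), (160, 300, 4.0), (120, 200, 3.0)]
    print(best_exon_chain(candidates))
```

Sorting by end position lets each exon look up, with one binary search, the best chain that finishes before it begins.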

FGENES variant that outputs several alternative gene structures. Useful for predicting alternative splicing or providing several gene variants for subsequent experimental testing.

Prediction of multiple variants of potential genes in Homo_sapiens genomic DNA.

A version of the FGENESH program that includes the noncanonical GC dinucleotide in donor splice sites is available for online use. This program is useful for analyzing alternative gene structures, where non-standard splice sites are often found (see also the FGENES-M program for predicting alternative gene variants), and for creating a set of genes and proteins absent from standard gene predictions.

Hidden Markov Model (HMM)-based gene prediction program. The fastest and most accurate gene predictor available - click here for more details. Available with parameters for dozens of different genomes/taxonomic groups.


FGENESH variant that uses information on homologous existing proteins for more accurate gene assembly from predicted exons.

Add-on to FGENESH that uses information on homologous cDNA/EST for more accurate gene assembly from predicted exons.

Variant of FGENESH that uses two homologous genomic DNA sequences, such as human and mouse, for more accurate gene prediction.

To try FGENESH+, FGENESH_C or FGENESH-2, click here.

Prot_Map: Mapping entire protein DB to a genome
Prot_Map is more accurate and hundreds of times faster than GeneWise (for accuracy comparison, click here). It can be used as a stand-alone gene prediction program or in conjunction with FGENESH+ as a very accurate gene finder. It can also be used for finding pseudogenes. Try it here.

FGENESH++C: The ultimate genome annotation tool
This set of programs produces fully automated genome annotations of a quality indistinguishable from manual genome annotation. It includes mapping of RefSeq sequences and the NR protein DB to a genome, two rounds of ab initio FGENESH and two rounds of FGENESH+ evidence-based gene prediction. FGENESH++C is not available for online testing due to its heavy use of computer resources.

Programs for splice site identification. SPLM also finds non-standard splice sites, such as GC-donor sites or AT-AC sites.
To try SPL or SPLM, click here.
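As a small illustration of the splice-site classes named above, the sketch below labels an intron by its terminal dinucleotides only. The function name and the labels are assumptions made for this example. Real splice-site predictors score the surrounding sequence context rather than just these four bases.

```python
def classify_intron(intron):
    """Classify an intron sequence by its first and last two bases."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron must be at least 4 nt long")
    ends = (intron[:2], intron[-2:])
    classes = {
        ("GT", "AG"): "GT-AG (standard)",
        ("GC", "AG"): "GC-AG (non-standard GC donor)",
        ("AT", "AC"): "AT-AC (non-standard)",
    }
    return classes.get(ends, "other")

if __name__ == "__main__":
    for seq in ("GTAAGTTTTTCAG", "GCAAGTTTTTCAG", "ATATCCTTTTAC"):
        print(seq, "->", classify_intron(seq))
```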

Find splice sites in genomic DNA.
To try FSPLICE, click here.

FGENESB: Bacterial operon and gene finding suite
FGENESB is the fastest and most accurate bacterial gene finder - click here for more details. It is also the easiest to train on a new genome: all it takes is ten minutes and a genomic sequence. FGENESB also maps tRNA and rRNA genes, predicts bacterial promoters using the BPROM program and terminators using BTERM, predicts operons based on promoter and terminator locations, densities of ORFs in a given genome and frequencies of different genes being linked to each other, and annotates predicted genes based on homology with known proteins from public databases.
To try the gene-finding part of FGENESB and see annotations of more than a dozen bacterial genomes, click here.

Recognition of E. coli promoters and transcription start sites. As part of the bacterial genome analysis suite of programs, and to reinforce operon and gene prediction by the FGENESB program, we introduce BPROM, a bacterial promoter prediction program.
To try BPROM, click here.

Separating archaeal and bacterial genomes.
To try AbSplit, click here.

FGENESV: Gene finder for viral genomes
FGENESV can be used with generic parameters, suitable for smaller viral genomes, or automatically trained on the query genome sequence within minutes. It supports alternative genetic codes that can sometimes be found in viruses.
To try FGENESV, click here.

FCLUST: Fast EST clustering and alignment program
FCLUST options include accounting for known mRNAs, repeat masking and many others.
An example of mouse EST clustering can be accessed here.



Promoter and Functional Site Prediction Programs

Human promoter prediction. The algorithm predicts potential transcription start positions using a linear discriminant function that combines characteristics describing functional motifs and oligonucleotide composition of these sites.

These two programs for predicting PolII promoter regions use a linear discriminant function combining characteristics describing functional motifs and oligonucleotide composition of transcription start sites. TSSW uses elements of an older release of the Transfac® database and may require a corresponding license, while TSSG uses data of Dan Prestridge. TSSG is the most accurate promoter prediction program available - click here for details.

This variant of TSSW/TSSG is designed for predicting PolII promoter regions in plants. It uses regulatory motif data from Softberry's plant RegSite database.

In addition to the features implemented in TSSW/TSSG, PromH takes into account conservation of major promoter functional components, such as the transcription start point, TATA boxes and regulatory motifs, in pairs of orthologous sequences aligned by the Scan2 program. This results in an additional 20% accuracy improvement, especially pronounced on TATA+ promoters.

PolyA site prediction program. Uses linear discriminant functions combining characteristics describing various contextual features of polyA sites.

NSITE is a program for analysis of regulatory regions and the composition of their functional motifs, based on statistical estimation of the expected number of occurrences of a nucleotide consensus pattern in a given sequence. It uses a data file consisting of an older release of Transfac® sequences and a set of additional functional motifs, and may require a Transfac license.
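The idea of an expected number of occurrences of a consensus pattern can be sketched as follows. This is a simplified illustration that assumes independent nucleotide frequencies and counts one strand only; it is not the program's actual statistical model, and the function name and default frequencies are assumptions. Consensus letters are interpreted as standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes and the bases each one matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expected_occurrences(consensus, seq_length, freqs=None):
    """Expected number of matches of a consensus in a random sequence.

    Assumes each position is drawn independently with the given base
    frequencies (uniform 0.25 by default).
    """
    if freqs is None:
        freqs = {base: 0.25 for base in "ACGT"}
    match_prob = 1.0
    for code in consensus.upper():
        match_prob *= sum(freqs[base] for base in IUPAC[code])
    positions = max(seq_length - len(consensus) + 1, 0)
    return positions * match_prob

if __name__ == "__main__":
    print(expected_occurrences("TATAAA", 1005))
    print(expected_occurrences("TATAWA", 1005))
```

A motif found far more often than this expectation is a candidate for a statistically significant site.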

This variant of NSITE is designed to work on plant sequences using functional motifs from Softberry's own RegSite database.

nsiteM and NSITEM-PL
This version of NSITE can process multiple sequences and find regulatory elements (REs) that occur in at least one copy in X% or more of the analyzed DNA sequences. REs can be taken from different databases or defined by the user.
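The "at least one copy in X% or more of sequences" criterion can be sketched in a few lines. The example below uses exact substring matching and made-up motif names purely for illustration; the real program's motif representation and matching rules are not described here, so treat every name in the code as an assumption.

```python
def motifs_in_fraction(sequences, motifs, min_percent):
    """Return {motif_name: percent} for motifs found at least once
    in min_percent or more of the sequences.

    motifs: dict mapping a motif name to an exact uppercase pattern.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    upper = [s.upper() for s in sequences]
    hits = {}
    for name, pattern in motifs.items():
        count = sum(1 for s in upper if pattern in s)
        percent = 100.0 * count / len(upper)
        if percent >= min_percent:
            hits[name] = percent
    return hits

if __name__ == "__main__":
    seqs = ["ACGTATAAAG", "TTTATAAACC", "GGGGCCCC", "CCAATCT"]
    res = {"TATA": "TATAAA", "CAAT": "CCAAT", "GC": "GGGCGG"}
    print(motifs_in_fraction(seqs, res, 50))
```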

Search for conserved motifs of regulatory elements (REs) that are present in both of two aligned homologous (orthologous) DNA sequences (aligned in a special format; maximum length 100,000 nt). REs are taken both from our collection of thousands of REs (from human and animal or plant species) and from a collection supplied by the user.

To try FPROM, TSSG, TSSP, POLYAH, PromH, NSITE-PL, NSITEM_PL and nsiteH, click here.

The program searches for significant patterns in the set of sequences.

ScanWM-PL searches for functional motifs described by weight matrices of plant regulatory sequences. The weight matrices used with the program are built for a subset of plant regulatory sequences from the RegSite Database, developed by Softberry Inc. using published data on transcription regulation of plant genes. The program assumes that if a pattern found in a sequence has a weight greater than the cut-off value for the corresponding weight matrix, the pattern can be expected to be a functional motif, and the analysed sequence possesses the pattern's function.
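The cut-off rule described above can be sketched as a sliding-window scan. The matrix layout (one dict of base weights per motif position), the treatment of unknown bases as weight 0.0 and the toy matrix are all assumptions for illustration; they are not taken from RegSite or from the program itself.

```python
def scan_weight_matrix(seq, matrix, cutoff):
    """Return (position, window, score) for every window scoring above cutoff.

    matrix: list of dicts, one per motif position, mapping base -> weight.
    Bases missing from a column contribute 0.0 (an assumption).
    """
    width = len(matrix)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(col.get(base, 0.0) for col, base in zip(matrix, window))
        if score > cutoff:
            hits.append((i, window, score))
    return hits

if __name__ == "__main__":
    # Toy matrix that rewards the pattern TATA, one point per matching base
    toy = [{"T": 1.0}, {"A": 1.0}, {"T": 1.0}, {"A": 1.0}]
    print(scan_weight_matrix("GGTATAGG", toy, cutoff=3.5))
```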

The program is intended to search for CpG islands in sequences.
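A common way to search for CpG islands is a sliding window that checks GC content and the observed/expected CpG ratio. The sketch below uses widely cited default thresholds (window of 200 bp, GC fraction of at least 0.5, observed/expected ratio of at least 0.6); these values and the function names are assumptions for illustration and are not necessarily those used by the program.

```python
def cpg_stats(window):
    """Return (gc_fraction, observed/expected CpG ratio) for a window."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # expected CpG count under independence is c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def cpg_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Return start positions of windows that meet both thresholds."""
    seq = seq.upper()
    starts = []
    for i in range(0, len(seq) - window + 1, step):
        gc, obs_exp = cpg_stats(seq[i:i + window])
        if gc >= min_gc and obs_exp >= min_obs_exp:
            starts.append(i)
    return starts

if __name__ == "__main__":
    demo = "CG" * 100 + "AT" * 100
    print(cpg_windows(demo, window=200, step=100))
```

Overlapping qualifying windows would normally be merged into a single island; that step is left out to keep the sketch short.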

To try PATTERN, ScanWM-PL and CpGfinder, click here.

Motif Explorer
Motif and promoter visualization.
To try Motif Explorer, click here.


Database Search, Genome Comparison and RNA/EST/oligo Mapping Programs

DBSCAN searches a database with a DNA sequence in a BLAST-like manner, except that it can handle multimegabyte-sized sequences.

SCAN2 performs alignment of two multimegabyte sequences.

Both programs have a Java viewer for easy visualization and parsing of results.

RNA_MAP is a fast algorithm for accurately mapping an mRNA sequence to a genomic sequence, taking into account the splice sites flanking intron sequences. Mapping a 300 bp mRNA to 52 MB of unmasked Y chromosome takes about 19 sec; for a 7300 bp mRNA it takes about 47 sec (both strands checked, on one 500 MHz DEC Alpha processor).

To try SCAN2, DBSCAN and RNA_MAP, click here.

EST_SMAP is for mapping a whole set of sequences to a chromosome sequence. For example, 11000 full mRNA sequences from the NCBI reference set are mapped to 52 MB of unmasked Y chromosome in about 18-25 min (depending on computer memory size).

OLIGO_MAP is designed to map a set of oligonucleotides used for microarray production. The program maps 300000 oligos 25-30 bp long to 49 MB of unmasked chromosome 22 in 8 min. It is useful for checking the location of oligos and their uniqueness in the genome.

FMAP is a program for very fast (less than a second) mapping of a sequence to a genome.

To try EST_MAP and FMAP, click here.

The program maps a short sequence to a specified genome and outputs the list of the chromosome regions where the sequence of interest is found at a certain degree of homology. To try SMAP, click here.

Genomes Match: Fast alignment of two eukaryotic chromosomes or bacterial genomes.
The program finds all possible significant alignments between genome fragments. Alignment of two bacterial genomes takes 20-40 seconds on a typical one-processor computer.
Genomes Match can be tried here.

MALI: Multiple alignment of nucleotide or protein sequences and tree construction.
This powerful multiple alignment program includes an alignment editor, a tree builder and a tree editor.
Try it here (nucleotide sequences) or here (protein sequences).


GenomeSequence Explorer

Softberry GenomeSequence Explorer is a powerful viewer of genomic sequences with capability for sequence homology search (one second on human genome using FMAP algorithm), simple pattern, feature ID or word search, retrieval of nucleotide and amino acid sequences of features, expression data on individual genes and many others.

Follow these links to see human, mouse, rat and bacterial genomes in GenomeSequence Explorer.


Protein location

ProtComp: A program for identification of sub-cellular localization of eukaryotic proteins.
The program is based on complex neural-network recognizers, which estimate the probability of subcellular localization in the Nuclear, Plasma membrane, Extracellular, Cytoplasmic, Mitochondrial, Chloroplast, Endoplasmic reticulum, Peroxisomal, Lysosomal or Golgi compartments. Its accuracy reaches 90% for major compartments.

ProtCompB combines several methods of protein localization prediction: linear discriminant function-based prediction; direct comparison with databases of homologous proteins of known localization; comparison of pentamer distributions calculated for query and DB sequences; and prediction of certain functional peptide sequences, such as signal peptides and transmembrane segments. This means that the program handles correctly only complete sequences, containing signal sequences, anchors and other functional peptides, if any.

The program predicts putative cytotoxic T lymphocyte (CTL) epitopes in protein sequences. These polypeptides are known as potential candidates for vaccine design. The sequence length for predicted epitopes is 9.

Program for identification of significant Prosite patterns.

You can try ProtComp, ProtCompB, CTL-epitope-Finder and Psite here.


Protein Sequence and Structure Analysis Programs

SSENVID: Protein secondary structure and environment assignment from atomic coordinates.
SSENVID recognizes secondary structural elements in proteins from their atomic coordinates. It performs the same tasks as DSSP by Kabsch and Sander (1983) or STRIDE by Frishman & Argos (1995), analyzing both hydrogen bonds and main-chain dihedral angles, as well as some probabilistic measures. SSENVID also computes accessible surface area, polarity and environment classes as defined by Bowie, Luthy, Eisenberg (1991).
You can try SSENVID here.

GETATOMS: Computing side chain conformations by simulated annealing with frozen main chain atoms.
GETATOMS is a program for modeling the atomic coordinates of a protein with unknown 3D structure. It uses main chain coordinates from the 3D structure of a similar protein whose sequence is aligned with the query protein. It also has an option to provide coordinates of H-atoms.
You can try GETATOMS here.

3D-comp: Superposition of two PDB 3D-structures using alignment of their sequences.
You can try 3D-comp here.

SoftPM: a suite of protein analysis programs that includes programs for secondary structure prediction, fold recognition, molecular mechanics and molecular dynamics modeling. The suite consists of seven programs, two of which are SSENVID (described above) and SSPAL, and a powerful Java-based visualization tool with the ability to correlate structure and sequence information.

Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments.
To try NNSSP, click here.

The program predicts ordered and disordered regions in protein sequences. The minimum required sequence length is 40.
To try PDISORDER, click here.

The program performs the first step in locating disulphide bridges in proteins: prediction of the SS-bonding states of cysteines.
To try CYS_REC, click here.

The program MDynSB is designed to perform multiple tasks with protein structure:
1) Optimization of a protein structure via MD simulation in an implicit water solvent.
2) Optimization and folding of a protein via a user-defined simulated annealing protocol in an implicit water solvent.
3) Optimization of predefined protein loops while the non-loop parts of the protein molecule are kept fixed during the loop optimization.
To try MDynSB, click here.

Energy minimization program based on molecular mechanics. In the current version of the program, the input is a PDB file with the coordinates of the atoms in a protein. The coordinates may be retrieved from a file or from the PDB database. For the computation, specify the chain identifier given in the PDB file.
To try Hmod3dMM, click here.

The program is intended for calculating the 3D structure of a protein when the 3D structures of its individual parts (fragments) are known and the phi and psi angles between the fragments have to be found. This problem may arise when constructing a protein structure from fragments whose structures were obtained by homology search on their primary sequences.
To try AbIni3D, click here.


SelTag: A Tool for Analysis of Expression Data.

SelTag is one of the most elegant tools for analysis of expression data. It can analyze all or marked groups of genes or tissues, select tissue-specific genes based on complex criteria, provide visual representation of expression data, identify genes correlatively expressed in a given set of tissues, and select disease-specific genes with particular characteristics, such as receptors or secreted proteins. Available in Windows and cross-platform Java versions; the Java version can be tried here.


First Human-Mouse-Rat Synteny Alignment Based on Draft Genomic Sequences

Softberry has developed a human-mouse-rat synteny server, which provides information about 18915 human genes mapped to the mouse genome draft and 18464 mouse genes mapped to the human genome draft, among them 14504 orthologous gene pairs. This is the most comprehensive data about homology between human, mouse and rat genomic regions. Compared to NCBI homology maps, the Softberry map contains significantly more genes and is directly linked to genomic sequences. The data were generated by Softberry programs for gene prediction, EST/RNA mapping and genomic sequence comparison - some of them can be tested at this site.

You can view human-mouse-rat synteny here.


RNA structure computing

The program predicts RNA secondary structure using Zuker's algorithm of energy minimization. Energy calculation is made using energy rules which are similar to those of mfold 3.0.

The program searches for a given number of best (most stable) palindromes - hairpin-like, "linear" structures, which can contain bulge or interior loops.

The program searches for bacterial terminators in DNA sequences using a set of conditions that can be modified by the user. The conditions are stored in the config file (FindTerm.cfg) or any other user-defined config file, or can be given on the command line without a config file.

You can try FoldRNA, BestPal, and FindTerm here.


Protein/DNA 3D-Visual Works

Interactive 3D visualization of a structure.

Visualization of two structures compared by CE.

Comparing 3D structures of two proteins.

Find structural alignments by real time search in the PDB database.

You can try 3D-Explorer, 3D-Comparison, 3D-Match and 3D-MatchDB here.



Seqman performs a set of manipulations on sequences: loading and designing sequences, searching for motifs in a sequence and amino acid translation of sequences. Seqman can also save results and print sequences and results in different formats.
To try Seqman, click here.


© 2023 www.softberry.com