TSSG - Recognition of human PolII promoter region and start of transcription
TSSG is the most accurate mammalian promoter prediction program. The following table shows the results of promoter searches on genes with known mRNAs by different promoter-finding programs, reproduced with changes from Liu and States (2002) Genome Research 12:462-469. It shows that TSSG has by far the fewest false positive predictions.
The algorithm predicts potential transcription start positions using a linear discriminant function that combines characteristics describing functional motifs and the oligonucleotide composition of these sites. TSSG uses the promoter.dat file of selected factor binding sites (TFD, Ghosh, 1993), developed by Dan Prestridge, to calculate the density of functional sites as in J. Mol. Biol., 1995, 249, 923-932.
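As a rough illustration of how a linear discriminant function turns several characteristics into one score, here is a minimal Python sketch. The feature names, weights and candidate values are hypothetical and chosen only for the example; the actual TSSG features and coefficients are not given in this document. The threshold of 4.00 is taken from the example output shown later on this page.

```python
# Hypothetical feature names and weights, for illustration only;
# TSSG's real features and coefficients are not given in this document.
WEIGHTS = {
    "tata_box_score": 2.0,
    "site_density": 1.5,
    "oligo_composition": 1.0,
}
THRESHOLD = 4.0  # same value as "Threshold for LDF- 4.00" in the example output


def ldf_score(features, weights=WEIGHTS):
    """Weighted sum of feature values (a linear discriminant score)."""
    return sum(weights[name] * value for name, value in features.items())


def predict_tss(candidates, threshold=THRESHOLD):
    """Return (position, score) for candidates scoring above the threshold."""
    scored = [(pos, ldf_score(feats)) for pos, feats in candidates.items()]
    return [(pos, score) for pos, score in scored if score > threshold]


# Made-up feature values for two candidate positions.
candidates = {
    1820: {"tata_box_score": 3.0, "site_density": 2.0, "oligo_composition": 1.0},
    2500: {"tata_box_score": 0.5, "site_density": 1.0, "oligo_composition": 0.5},
}
print(predict_tss(candidates))
```

In this sketch only the first candidate passes, because its weighted sum exceeds the threshold while the second one falls below it.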
At a true promoter recognition level of approximately 50-55%, the TSSG program gives one false positive prediction per about 5000 bp. This accuracy is similar to that of Prestridge's method on the test sequences. We estimate the accuracy of finding the TSS position on ten test genes where both our and Prestridge's algorithms found the promoter region to be as follows (numbers show the distance between actual and predicted TSS):
Another Softberry promoter recognition program, TSSW, is based on a similar approach but uses data from an older release of Biobase's Transfac® database (E. Wingender, J. Biotech., 1994, 35, 273-280).
TSSG output example:
HSCALCAC 7637 bp DNA PRI 14-MAR-1995
Length of sequence- 7637
Threshold for LDF- 4.00
1 promoter(s) were predicted
Pos.: 1820 LDF- 16.65 TATA box predicted at 1804
Transcription factor binding sites:
for promoter at position - 1820
1764 (-) S00098 AACCAAT
1608 (-) S01152 AAGTGA
1741 (+) S01153 AARKGA
1608 (-) S01153 AARKGA
1657 (+) S01090 AATGA
1617 (-) S01027 ACGCCC
1577 (+) S00534 ACGTCA
1580 (-) S00534 ACGTCA
1580 (-) S01257 ACGTCAT
..............................
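For downstream processing, a report like the one above can be read with a couple of regular expressions. The following Python sketch is not part of TSSG; the function name `parse_tssg` and the dictionary keys are hypothetical, and the patterns are inferred only from this single example, so other reports may need adjustments.

```python
import re

# Shortened copy of the example report (first three binding sites only).
REPORT = """HSCALCAC 7637 bp DNA PRI 14-MAR-1995
Length of sequence- 7637
Threshold for LDF- 4.00
1 promoter(s) were predicted
Pos.: 1820 LDF- 16.65 TATA box predicted at 1804
Transcription factor binding sites:
for promoter at position - 1820
1764 (-) S00098 AACCAAT
1608 (-) S01152 AAGTGA
1741 (+) S01153 AARKGA
"""

# Promoter line: position, LDF score and, if present, the TATA box position.
PROMOTER_RE = re.compile(
    r"Pos\.:\s*(\d+)\s+LDF-\s*([\d.]+)(?:\s+TATA box predicted at\s+(\d+))?"
)
# Binding site entry: position, strand, site identifier, consensus.
SITE_RE = re.compile(r"(\d+)\s+\(([+-])\)\s+(S\d+)\s+([A-Za-z]+)")


def parse_tssg(text):
    """Extract predicted promoters and binding sites from a report string."""
    promoters = [
        {"pos": int(p), "ldf": float(s), "tata": int(t) if t else None}
        for p, s, t in PROMOTER_RE.findall(text)
    ]
    sites = [
        {"pos": int(p), "strand": strand, "site_id": sid, "consensus": cons}
        for p, strand, sid, cons in SITE_RE.findall(text)
    ]
    return promoters, sites


promoters, sites = parse_tssg(REPORT)
print(promoters)
print(sites[0])
```

Because the patterns tolerate any whitespace between fields, the same sketch should also handle a report whose line breaks have been collapsed into spaces.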
First line of the output - name of your sequence.
Lower-case letters denote non-conserved nucleotides in the site consensus.
Letters other than A, T, G, C denote ambiguous positions in a given DNA sequence motif, where a single character may represent more than one nucleotide according to the standard IUPAC nucleotide code.
See TABLE at http://www.yeastract.com/help/help_searchbydnamotif.php#Ref1
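To make the ambiguity codes concrete, the Python sketch below expands a consensus such as AARKGA into a regular expression and lists where it occurs in a sequence. The helper names `iupac_to_regex` and `find_sites` and the short test sequence are made up for illustration. Lower-case letters are treated the same as upper-case here, which is an assumption of this sketch, and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus into a regular expression string."""
    parts = []
    for letter in motif.upper():
        bases = IUPAC[letter]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(sequence, motif):
    """Return 1-based start positions of all (overlapping) motif matches."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


print(iupac_to_regex("AARKGA"))                  # AA[AG][GT]GA
print(find_sites("CCAAGTGATTAAATGA", "AARKGA"))  # [3, 11]
```

Here R stands for A or G and K for G or T, so both AAGTGA and AAATGA count as occurrences of AARKGA.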
© 2020 www.softberry.com