PromH(G) Recognition of human and animal Pol II promoters
(Transcription Start Site and TATA-box)
Method description
To further improve on the promoter identification accuracy achieved by the TSSG
program, we developed a new program, PromH(G), by extending the TSSG feature
set. In addition to the features implemented in TSSG, the linear discriminant
functions of PromH take into account the conservation of major promoter
functional components, such as transcription start points, TATA-boxes and regulatory
motifs, in pairs of orthologous genes aligned by the SCAN2 program.
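As a rough illustration of what a linear discriminant function does, the sketch below scores a candidate site as a weighted sum of feature values and compares the score with a threshold. It is a generic sketch only: the function name `ldf_score`, the feature names, the weights and the threshold are all hypothetical and are not the features or parameters actually used by PromH(G) or TSSG.

```python
def ldf_score(features, weights, bias=0.0):
    """Return bias + sum(weight * feature value) over all weighted features."""
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return bias + sum(weights[name] * features[name] for name in weights)


# Hypothetical weights and feature values, chosen only for illustration.
weights = {"tata_score": 1.0, "tss_conservation": 2.0, "motif_conservation": 1.5}
features = {"tata_score": 0.5, "tss_conservation": 1.0, "motif_conservation": 0.75}

threshold = 3.0  # hypothetical decision threshold
score = ldf_score(features, weights)
is_predicted_promoter = score > threshold
```

In this picture, the conservation features add extra terms to the sum, so a candidate whose TSS, TATA-box and motifs are conserved in the aligned orthologous sequence receives a higher score.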
PromH(G) output
The output file begins with a description of the program's purpose, the abbreviations
used and the search parameters (lines 1-10). The next two lines give the name and length of
the first query sequence and the number of predicted promoter regions. Then the
positions of the predicted sites, their "weights" and the TATA-box position (for TATA
promoters) are given. After that, functional motifs are listed for every predicted
region; (+) and (-) denote the direct or complementary strand, and S... is the
identifier of a particular motif in the Transcription Factors Database, TFD (Ghosh, Nucleic
Acids Res., 1993, 21, 3117-3118). The same information is then given for the second
query sequence.
Example of output file
Program promHG (Softberry Inc.)
Search for TATA+/TATA- promoters in 2 aligned DNA sequences
NOTE: PHa - Homology Level of Aligned Sequences in LOCAL Search Area (-100,TSS+40)
PHs - Homology Level of Aligned Sequences around TSS
PHss - Homology Level of Aligned Sequences to Right from TSS
PHt - Homology Level of TATA-boxes in Aligned Sequences
PHr - Mean Homology Level of Regulatory Elements in LOCAL Search Area
Initial / Final Thresholds - 2.00 / 6.00
======================================================================
>H-NPPA/AL021155/[33199:35843/
Length of sequence- 2645
1 promoter(s) have been predicted
Promoter Pos: 2549 (Weight - 16.00) TATA box at: 2517 (Weight - 218.33)
PHa - 78% PHs - 100% PHss - 74% PHt - 100% PHr - 80%
Transcription factor binding sites:
for promoter at position - 2549
2462 (+) S01152 AAGTGA
2378 (+) S00922 AGAGG
2525 (+) S00922 AGAGG
2306 (-) S00922 AGAGG
2499 (-) S00395 CACGCW
..............
--------------------------------------------------
>R-NPPA/J03267/[1638:3722]/-2000:+85/CDS: 3723, premRNA: 3638
Length of sequence- 2087
2 promoter(s) have been predicted
Promoter Pos: 2000 (Weight - 15.59) TATA box at: 1970 (Weight - 217.73)
PHa - 78% PHs - 100% PHss - 77% PHt - 100% PHr - 89%
Promoter Pos: 1662 (Weight: 6.37)
PHa - 76% PHs - 88% PHss - 72% PHr - 74%
Transcription factor binding sites:
for promoter at position - 2000
1915 (+) S01152 AAGTGA
1773 (-) S00922 AGAGG
1716 (+) S00392 AGGAAG
1999 (-) S02113 CCAGCTG
1713 (+) S01003 CCCAG
...........
for promoter at position - 1662
1504 (+) S01090 AATGA
1610 (+) S01013 ACAGCTG
1484 (+) S00922 AGAGG
1505 (+) S01444 ATGAATCAG
...........
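The layout shown in the example above is regular enough to be read by a small script. The following is a minimal sketch, assuming the output looks exactly like the example; the function name `parse_promhg` and the dictionary keys are hypothetical and are not part of the program.

```python
import re

# Patterns written against the example output; other versions may differ.
PROMOTER_RE = re.compile(
    r"Promoter Pos:\s*(\d+)\s*\(Weight\s*[-:]\s*([\d.]+)\)"
    r"(?:\s*TATA box at:\s*(\d+)\s*\(Weight\s*[-:]\s*([\d.]+)\))?"
)
LENGTH_RE = re.compile(r"Length of sequence\s*-\s*(\d+)")
HOMOLOGY_RE = re.compile(r"(PH[a-z]+)\s*-\s*(\d+)%")
SITES_FOR_RE = re.compile(r"for promoter at position\s*-\s*(\d+)")
SITE_RE = re.compile(r"^(\d+)\s+\(([+-])\)\s+(\S+)\s+(\S+)$")


def parse_promhg(text):
    """Return one record per query sequence found in promHG output text."""
    records = []
    current = None   # record of the sequence being read
    target = None    # promoter that the following motif lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            current = {"name": line[1:], "length": None, "promoters": []}
            records.append(current)
            target = None
            continue
        if current is None:
            continue  # header lines before the first sequence
        m = LENGTH_RE.match(line)
        if m:
            current["length"] = int(m.group(1))
            continue
        m = PROMOTER_RE.search(line)
        if m:
            pos, weight, tata_pos, tata_weight = m.groups()
            current["promoters"].append({
                "pos": int(pos),
                "weight": float(weight),
                # TATA fields stay None for TATA-less promoters
                "tata_pos": int(tata_pos) if tata_pos else None,
                "tata_weight": float(tata_weight) if tata_weight else None,
                "homology": {},
                "sites": [],
            })
            continue
        pairs = HOMOLOGY_RE.findall(line)
        if pairs and current["promoters"]:
            # Homology percentages follow the promoter line they describe
            current["promoters"][-1]["homology"] = {k: int(v) for k, v in pairs}
            continue
        m = SITES_FOR_RE.search(line)
        if m:
            pos = int(m.group(1))
            target = next(
                (p for p in current["promoters"] if p["pos"] == pos), None
            )
            continue
        m = SITE_RE.match(line)
        if m and target is not None:
            # (position, strand, TFD identifier, motif)
            target["sites"].append(
                (int(m.group(1)), m.group(2), m.group(3), m.group(4))
            )
    return records
```

Each returned record holds the sequence name, its length and a list of promoters; each promoter carries its position, weight, optional TATA-box data, the PH* homology percentages and the motifs listed for it. Lines of dots, which mark truncated motif lists, are skipped.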