PSITE Program Description
PSITE - Search for PROSITE patterns with statistical estimation
Method description:
The method is based on a statistical estimate of the expected number of occurrences of a PROSITE pattern in a given sequence. It uses the PROSITE database of functional motifs (author: Amos Bairoch, 1995). If a pattern is found whose expected number is significantly less than 1, the analyzed sequence can be supposed to possess the function associated with that pattern. The present version 1 is the simplest one: it searches for patterns without allowing any deviation from the given PROSITE consensus. This possibility will be included in a following version. For each pattern found, the PSITE output shows the PROSITE pattern, its position in the sequence, and its accession number, ID and description in the PROSITE database, as well as the number of the document in which the characteristics of the pattern are outlined. Note that in this version patterns anchored to the beginning or end of a protein sequence are searched for along the whole sequence. This may be useful for the analysis of ORFs or 6-frame translated sequences.
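The idea of an expected number can be illustrated with a minimal sketch. It is not PSITE's actual estimator, which is not specified here. The sketch assumes that residues occur independently with the frequencies observed in the analyzed sequence, and that the pattern has a fixed length, so ranges such as x(2,4) are not handled. Under these assumptions the expected number is the number of possible start positions multiplied by the probability of a match at one position. The function names parse_pattern and expected_number are hypothetical.

```python
from collections import Counter


def parse_pattern(pattern):
    """Expand a fixed-length PROSITE pattern into (kind, residues) pairs."""
    positions = []
    for element in pattern.rstrip(".").split("-"):
        repeat = 1
        if element.endswith(")"):
            # "x(2)" -> element "x", repeat 2; a range like "x(2,4)" raises ValueError
            element, count = element[:-1].split("(")
            repeat = int(count)
        if element == "x":
            position = ("any", "")            # any residue
        elif element.startswith("["):
            position = ("in", element[1:-1])  # one of the listed residues
        elif element.startswith("{"):
            position = ("not", element[1:-1])  # any residue except those listed
        else:
            position = ("in", element)        # a single fixed residue
        positions.extend([position] * repeat)
    return positions


def expected_number(pattern, sequence):
    """Expected occurrences under an independent-residue model (an assumption)."""
    if not sequence:
        return 0.0
    positions = parse_pattern(pattern)
    counts = Counter(sequence)
    total = len(sequence)
    probability = 1.0
    for kind, residues in positions:
        share = sum(counts[r] for r in residues) / total
        if kind == "any":
            share = 1.0
        elif kind == "not":
            share = 1.0 - share
        probability *= share
    # Number of windows in which the pattern could start
    windows = max(total - len(positions) + 1, 0)
    return windows * probability


if __name__ == "__main__":
    print(expected_number("S-G-x-G.", "SGSG"))  # 0.125
```

Because the background model is an assumption, values computed this way will not necessarily agree with the "Expected number" column that PSITE prints.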
Acknowledgments: We acknowledge Ilgam Shahmuradov and Igor Rogozin, who took part in developing some applications of this method for searching nucleotide consensuses, and Asya Salihova for protein site searching on the IBM PC.
Example of PSITE output:
PSITE V1 - search for Prosite patterns
        10        20        30        40        50        60
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLAKTFIDQGKLI
        70        80        90       100       110       120
PDDVMTRLVLHELKN*TQYNWLLDGFPRTLPQAEALDRAYQIDTVINLNVPFEVIKQRLT
       130       140       150       160       170       180
ARWIHPGSGRVYNIEFNPPKTMGIDDLTGEPLVQREDDRPETVVKRLKAYEAQTEPVLEY
       190       200       210       220       230       240
YRKKGVLETFSYTETNKIWPHVYAFLQTKLPDANKDDALDQREWSAAAAWLAAAAALDLN
       250       260       270       280       290       300
AGCPAAALAAAAAGSAACAAAAAFAAAAAACCAACAAAAAAACAAAADAACGAYAYACAP
ID GLYCOSAMINOGLYCAN; RULE.
AC PS00002;
DE Glycosaminoglycan attachment site.
DO PDOC00002;
PA S-G-x-G.
Sites found: 1 Expected number: 0.0272 95% confidential interval: 0
# Start End Expected Site sequence
1 12 15 0.0272 SGKG
ID EF_HAND; PATTERN.
AC PS00018;
DE EF-hand calcium-binding domain.
DO PDOC00018;
PA D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
PA [DE]-[LIVMFYW].
Sites found: 1 Expected number: 0.0004 95% confidential interval: 0
# Start End Expected Site sequence
1 212 224 0.0004 DANKDDALDQREW
ID ADENYLATE_KINASE; PATTERN.
AC PS00113;
DE Adenylate kinase signature.
DO PDOC00104;
PA [LIVMFYW](3)-D-G-[FY]-P-R-x(3)-[NQ].
Sites found: 1 Expected number: 0.0000 95% confidential interval: 0
# Start End Expected Site sequence
1 81 92 0.0000 WLLDGFPRTLPQ
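The Start, End and Site sequence columns above can be reproduced with a short sketch that translates a PROSITE pattern into a regular expression and scans the sequence for exact matches. This is an illustration only, not PSITE's own code: the function names prosite_to_regex and find_sites are hypothetical, and the anchors "<" and ">" are simply dropped, following the note that anchored patterns are searched for along the whole sequence.

```python
import re


def prosite_to_regex(pattern):
    """Translate PROSITE pattern syntax into Python regular expression syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.strip("<>")  # ignore N- and C-terminal anchors
        element = element.replace("x", ".")  # x = any residue
        element = element.replace("{", "[^").replace("}", "]")  # {AB} = not A or B
        element = element.replace("(", "{").replace(")", "}")   # (n) = repeat count
        parts.append(element)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (start, end, site) tuples with 1-based, inclusive positions."""
    # The lookahead lets overlapping sites be reported as well
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(sequence)
    ]


if __name__ == "__main__":
    # First 60 residues of the example sequence
    seq = "RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLAKTFIDQGKLI"
    print(find_sites("S-G-x-G.", seq))  # [(12, 15, 'SGKG')]
```

A pattern that spans two PA lines, such as the EF-hand pattern, would need to be joined into a single string before being passed to this sketch.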
Reference:
Solovyev V.V., Kolchanov N.A. (1994) Search for functional sites using consensus. In: Computer Analysis of Genetic Macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific, pp. 16-21.