FGENES-M Program Description
FGENES-M 1.5 - Pattern-based prediction of multiple variants of gene structure
There are two reasons to predict several sub-optimal variants of gene structure, instead of only one:
1) Gene prediction algorithms for long genomic sequences are only 70-80% accurate on average, so the real gene structure may score slightly lower than the predicted optimal variant. FGENES-M lets you see alternative structures that you might otherwise never see; and
2) Alternative splicing is quite common in mammalian genes, so relying on a single optimal prediction, even one supported by experimental data, may miss real gene structures.
Of course, thousands of alternative gene structures can be predicted, and there is currently no established way to distinguish true variants from false ones.
FGENES-M (and its older version, FGENEM) has proved useful in providing a set of candidate gene structures for further experimental testing in commercial gene hunting.
Method description:
The algorithm outputs several suboptimal variants of the predicted gene structure (up to 15, though this number can be changed). Like FGENES, it is based on pattern recognition of different types of exons, promoters and polyA signals, and on finding their optimal combination by dynamic programming. A set of gene models along the given sequence is then constructed.
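To make the dynamic-programming idea concrete, here is a minimal toy sketch in Python. It is not the FGENES-M scoring scheme: it only picks the highest-scoring chain of non-overlapping candidate exons (a weighted interval scheduling problem) and ignores reading frame, promoters, polyA signals and suboptimal variants. The function name best_exon_chain and the tuple layout are assumptions made for illustration.

```python
from bisect import bisect_left

def best_exon_chain(candidates):
    """Toy DP: choose non-overlapping exons with the maximum total weight.

    candidates: list of (start, end, weight) tuples, inclusive coordinates.
    Returns (total_weight, chain), where chain is ordered along the sequence.
    Hypothetical helper for illustration; not part of FGENES-M.
    """
    exons = sorted(candidates, key=lambda e: e[1])  # order by end position
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons
    best = [(0.0, [])]
    for i, (start, end, weight) in enumerate(exons):
        # j = number of earlier exons that end strictly before this one starts
        j = bisect_left(ends, start, 0, i)
        take_score = best[j][0] + weight
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [(start, end, weight)]))
        else:
            best.append(best[i])  # skipping this exon is at least as good
    return best[-1]

# Candidate exons taken from the sample output: three alternatives that
# start at 3558, plus the last coding segment at 4131 - 4247.
candidates = [
    (3558, 3797, 4.35),
    (3558, 3870, 0.78),
    (3558, 3668, 0.99),
    (4131, 4247, 2.09),
]
score, chain = best_exon_chain(candidates)
```

With these four candidates the sketch keeps the 3558 - 3797 exon (weight 4.35) followed by 4131 - 4247 (weight 2.09), which is the combination that appears in the first variant of the sample output.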
You can compare the reliability of predicted variants using the GENE WEIGHT parameter. If alternative variants have similar GENE WEIGHT values, it is reasonable to consider all of them.
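As a rough aid for this comparison, the following Python sketch pulls the variant number and GENE WEIGHT out of each report header and lists the variants from highest to lowest weight, together with the gap to the best one. The regular expression is based only on the header lines shown in the sample output; the function names are hypothetical, and what counts as a "similar" weight is left to the reader.

```python
import re

# Matches header lines such as:
#   Predicted genes and exons in var: 1 Max var= 10 GENE WEIGHT: 24.1
HEADER_RE = re.compile(
    r"in var:\s*(\d+)\s+Max var=\s*\d+\s+GENE WEIGHT:\s*([-+]?\d+(?:\.\d+)?)"
)

def variant_weights(report_text):
    """Return [(variant_number, gene_weight), ...], highest weight first."""
    found = [(int(m.group(1)), float(m.group(2)))
             for m in HEADER_RE.finditer(report_text)]
    return sorted(found, key=lambda pair: pair[1], reverse=True)

def gaps_to_best(weights):
    """Return [(variant_number, gap)], where gap = best weight - this weight."""
    if not weights:
        return []
    best = max(w for _, w in weights)
    return [(variant, round(best - w, 1)) for variant, w in weights]

sample = """\
Predicted genes and exons in var: 1 Max var= 10 GENE WEIGHT: 24.1
Predicted genes and exons in var: 2 Max var= 10 GENE WEIGHT: 15.1
Predicted genes and exons in var: 3 Max var= 10 GENE WEIGHT: 14.3
"""
weights = variant_weights(sample)
gaps = gaps_to_best(weights)
```

In the sample run, variants 2 and 3 have weights close to each other but well below variant 1, so they are natural candidates to examine side by side.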
FGENES-M output:
G - predicted gene number, starting from start of sequence;
Str - DNA strand (+ for direct or - for complementary);
Feature - type of coding sequence: CDSf - first coding segment (starting with the start codon), CDSi - internal exon, CDSl - last coding segment (ending with the stop codon);
TSS - position of transcription start (the same line also reports the TATA-box position and score);
TATA - position of TATA-box;
wTATA - Discriminant function score for TATA box;
Start and End - Position of the Feature;
Weight - Discriminant function score for the feature;
ORF-start and ORF-end - positions where the first complete codon of the feature starts and the last complete codon ends.
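The fields above can be read back from a report with a short script. The Python sketch below parses the CDS lines of one variant, checks that each ORF span covers a whole number of codons, and estimates the protein length from the total coding length. It assumes 1-based inclusive coordinates and that the last coding segment ends with a stop codon; the regular expression and the function names are hypothetical and based only on the sample output that follows.

```python
import re

# Matches exon lines such as:  1 + 1 CDSf 521 - 641 1.23 521 - 640
# Only the three feature types described above (CDSf, CDSi, CDSl) are handled.
CDS_RE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS[fil])\s+(\d+)\s*-\s*(\d+)"
    r"\s+([-+]?\d+(?:\.\d+)?)\s+(\d+)\s*-\s*(\d+)\s*$"
)

def parse_cds_lines(text):
    """Return one dict per CDS line; TSS, PolA and header lines are skipped."""
    exons = []
    for line in text.splitlines():
        m = CDS_RE.match(line)
        if m:
            exons.append({
                "gene": int(m.group(1)),
                "strand": m.group(2),
                "exon": int(m.group(3)),
                "feature": m.group(4),
                "start": int(m.group(5)),
                "end": int(m.group(6)),
                "weight": float(m.group(7)),
                "orf_start": int(m.group(8)),
                "orf_end": int(m.group(9)),
            })
    return exons

def orf_is_whole_codons(exon):
    """True if the ORF span has a length divisible by 3."""
    return (exon["orf_end"] - exon["orf_start"] + 1) % 3 == 0

def protein_length(exons):
    """Amino acids encoded by the exons, assuming the final codon is a stop."""
    coding = sum(e["end"] - e["start"] + 1 for e in exons)
    return coding // 3 - 1

variant_1 = """\
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
"""
exons = parse_cds_lines(variant_1)
length = protein_length(exons)
```

For the first variant this gives six exons with a total coding length of 1110 bases, that is 370 codons including the stop codon, or 369 amino acids, in agreement with the protein header line of that variant.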
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1 Date: 19981005
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 1 Max var= 10 GENE WEIGHT: 24.1
G Str Feature Start End Weight ORF-start ORF-end
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 4247 369 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDGSELSSTSRTEVSS
VSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1 Date: 19981005
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 2 Max var= 10 GENE WEIGHT: 15.1
G Str Feature Start End Weight ORF-start ORF-end
1 + 1 CDSf 218 - 321 1.01 218 - 319
1 + 2 CDSi 984 - 1023 1.94 986 - 1021
1 + 3 CDSi 1860 - 2028 1.49 1862 - 2026
1 + 4 CDSi 2675 - 2802 1.00 2676 - 2801
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 218 - 4247 265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1 Date: 19981005
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 3 Max var= 10 GENE WEIGHT: 14.3
G Str Feature Start End Weight ORF-start ORF-end
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3870 0.78 3558 - 3869
1 + 6 CDSl 4857 - 5131 2.37 4859 - 5128
1 + PolA 5187 0.77
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 5131 446 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQVIFCVPKWTVTGLARRVQKREGCMVFTGA
RECIEGGQEEEKFVPRGVCASAKSNALNLNSVESGHDSDTGRTNETQHDPPRSLQGLCAS
SQHGSTGTILYIVFDTKACCVPGTSS
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1 Date: 19981005
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 6 In +chain: 6 In -chain: 0
Predicted genes and exons in var: 4 Max var= 10 GENE WEIGHT: 13.9
G Str Feature Start End Weight ORF-start ORF-end
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3668 0.99 3558 - 3668
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
1 + PolA 4650 3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 4247 326 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTFRNCIMQLFGKK
VDDGSELSSTSRTEVSSVSNSSVSPA
FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1 Date: 19981005
Seq name: ACU08131
Length of sequence: 5392 GC content: 0.46 Zone: 2
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 5 In +chain: 5 In -chain: 0
Predicted genes and exons in var: 5 Max var= 10 GENE WEIGHT: 13.0
G Str Feature Start End Weight ORF-start ORF-end
1 + TSS 355 7.43 TATA 327 wTATA 21.08 LDF 0.56
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSl 3558 - 3875 2.10 3558 - 3872
1 + PolA 4650 3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 3875 356 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQVIFCVPKWTVTGLARRVQKREGCMG