OLIGS2

OLIGS2 searches for oligos (4-nucleotide oligos) that occur often in the 1st file and whose counts differ significantly between the 1st and 2nd sequence files.

Input data

The input files should be in FASTA format and may contain several sequences.

Alphabet. The allowed symbols are "ACGTUacgtu" and "NnyYrRBbDdHhKkWwSsMmVv". The symbols to be skipped are "0123456789; \n\r\t\0-". All other symbols are not allowed.

Algorithm
For the 1st input file, the oligs program searches for the most frequent oligos at deviation multiplier = 0.0.
The result is saved in a temporary file.
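The counting step can be illustrated with a minimal Python sketch. It is not the oligs implementation: the names `read_fasta` and `count_oligos` are hypothetical, and mapping U to T and skipping windows that contain ambiguity symbols are assumptions made only for this example. The alphabet sets come from the "Input data" section, and the overlapping windows follow the `seq_step=1` setting shown in the sample output.

```python
from collections import Counter

NUCLEOTIDES = set("ACGTUacgtu")
AMBIGUOUS = set("NnyYrRBbDdHhKkWwSsMmVv")
SKIPPED = set("0123456789; \n\r\t\0-")


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA text, enforcing the alphabet."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
            continue
        for ch in line:
            if ch in SKIPPED:
                continue  # digits, whitespace, ';' and '-' are ignored
            if ch not in NUCLEOTIDES and ch not in AMBIGUOUS:
                raise ValueError(f"symbol {ch!r} is not allowed")
            # Assumption: fold case and treat U as T.
            chunks.append(ch.upper().replace("U", "T"))
    if header is not None:
        yield header, "".join(chunks)


def count_oligos(sequences, length):
    """Count oligos of the given length on the direct strand.

    Returns two counters: total occurrences, and the number of
    sequences in which each oligo occurs at least once.
    """
    total, in_seqs = Counter(), Counter()
    for seq in sequences:
        seen = set()
        for i in range(len(seq) - length + 1):  # overlapping windows, step 1
            olig = seq[i:i + length]
            # Assumption: windows with ambiguity symbols are not counted.
            if set(olig) <= set("ACGT"):
                total[olig] += 1
                seen.add(olig)
        in_seqs.update(seen)
    return total, in_seqs


sample = ">s1\nACGT ACG\n>s2\nacgn\n"
seqs = [seq for _, seq in read_fasta(sample)]
total, in_seqs = count_oligos(seqs, 2)
print(total["AC"], in_seqs["AC"])  # 3 2
```

The two counters correspond loosely to the "total olig counter" and "unique sequences counter" columns of the program output.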
The ratio of the numbers of nucleotides in the two files is denoted div_sum_len.
Output data

Example for program output:

    Oligs2 1.1 Copyright (c) 2005-2006 Softberry
    Num seqs=11 Nucleotides=12191 Average seq length=1108.3
    A=25.4% C=23.9% G=25.0% T=25.1% N=0.623411% Other=0.000000%
    Output most frequent oligs, direction=direct, seq_shift=0, seq_step=1
    deviation multiplier=0.000000
    Num seqs=17 Nucleotides=13702 Average seq length=806.0
    A=28.8% C=21.4% G=21.8% T=28.0% N=0.000000% Other=0.000000%
    Output most frequent oligs, direction=direct, seq_shift=0, seq_step=1
    all by distant
    #olig,total olig counter1,expected number1,unique sequences counter1,total olig counter2, unique sequences counter2,norm deviate1,norm deviate 2,sorter
    Length 2
    TG 899 764.6 11 954 17 0.073743 0.069625 4627.9
    CA 873 738.4 11 927 17 0.071610 0.067654 4582.5
    GC 832 727.2 11 830 17 0.068247 0.060575 3538.7
    TT 871 768.9 11 1296 17 0.071446 0.094585 2905.0
    AA 875 784.0 11 1414 17 0.071774 0.103197 2522.1
    GA 842 772.1 11 759 17 0.069067 0.055393 2459.4
    TC 788 731.2 11 744 17 0.064638 0.054299 1898.7
    AT 804 776.4 11 1067 17 0.065950 0.077872 742.5
    AG 786 772.1 11 755 17 0.064474 0.055101 426.4
    Length 3
    CTG 260 182.5 11 210 17 0.021327 0.015326 1803.2
    TTT 278 193.0 11 482 17 0.022804 0.035177 1420.5
    CAG 247 184.3 11 207 17 0.020261 0.015107 1358.9
    CCA 237 176.3 11 232 17 0.019441 0.016932 1171.0
    TGC 242 182.5 11 261 17 0.019851 0.019048 1087.2
    TGG 246 190.9 11 242 17 0.020179 0.017662 1054.1
    AAA 268 198.7 11 568 17 0.021983 0.041454 1025.3
    GGA 239 192.7 11 183 17 0.019605 0.013356 1002.7
    TCC 222 174.6 11 167 17 0.018210 0.012188 996.6
    TTC 235 183.6 11 236 17 0.019277 0.017224 946.2
    GCA 234 184.3 11 236 17 0.019194 0.017224 915.3
    GAA 243 195.7 11 239 17 0.019933 0.017443 885.2
    AGC 229 184.3 11 207 17 0.018784 0.015107 847.7
    GCT 227 182.5 11 222 17 0.018620 0.016202 805.0
    ATC 223 185.4 11 204 17 0.018292 0.014888 695.8
    CAT 224 185.4 11 233 17 0.018374 0.017005 675.8
    GAG 223 192.7 11 161 17 0.018292 0.011750 627.2
    CAA 228 187.2 11 315 17 0.018702 0.022989 620.2
    ATG 226 193.8 11 247 17 0.018538 0.018027 527.2
    AAG 227 195.7 11 273 17 0.018620 0.019924 505.0
    GCC 202 173.6 11 215 17 0.016570 0.015691 456.8
    TCA 210 185.4 11 210 17 0.017226 0.015326 401.4
    GAT 214 193.8 11 204 17 0.017554 0.014888 349.7
    CGA 202 184.3 11 184 17 0.016570 0.013429 293.3
    ATT 216 194.9 11 341 17 0.017718 0.024887 277.3
    CTT 202 183.6 11 245 17 0.016570 0.017881 272.4
    GTG 207 190.9 11 205 17 0.016980 0.014961 265.2
    TGA 207 193.8 11 206 17 0.016980 0.015034 220.4
    TTG 206 191.9 11 292 17 0.016898 0.021311 184.7
    TGT 204 191.9 11 245 17 0.016734 0.017881 177.7
    AGG 198 192.7 11 161 17 0.016241 0.011750 94.3
    CGC 177 173.6 11 160 17 0.014519 0.011677 59.6
    ACA 190 187.2 11 248 17 0.015585 0.018100 35.4
    AAT 200 196.8 11 340 17 0.016406 0.024814 33.2
    GGC 183 181.5 11 202 17 0.015011 0.014742 18.5

Detailed description for output data:

The program name and version are shown in the first line:

    Oligs2 1.1 Copyright (c) 2005-2006 Softberry
    Num seqs=11 Nucleotides=12191 Average seq length=1108.3
    A=25.4% C=23.9% G=25.0% T=25.1% N=0.623411% Other=0.000000%
    Output most frequent oligs, direction=direct, seq_shift=0, seq_step=1
    deviation multiplier=0.000000
The lines after the version line form the header of the first program run. They describe the 1st input file.

    Num seqs=17 Nucleotides=13702 Average seq length=806.0
    A=28.8% C=21.4% G=21.8% T=28.0% N=0.000000% Other=0.000000%
    Output most frequent oligs, direction=direct, seq_shift=0, seq_step=1
    all by distant

The lines above are the header of the second program run. They describe the 2nd input file.

    #olig,total olig counter1,expected number1,unique sequences counter1,total olig counter2, unique sequences counter2,norm deviate1,norm deviate 2,sorter

The line above is the legend for the columns of the oligo table.

    Length 3
    CTG 260 182.5 11 210 17 0.021327 0.015326 1803.2
    TTT 278 193.0 11 482 17 0.022804 0.035177 1420.5
    CAG 247 184.3 11 207 17 0.020261 0.015107 1358.9

Finally comes the table itself, as in the excerpt above, sorted in descending order by the 9th column (sorter).
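The table rows can be read back with a few lines of Python. The sketch below is an illustration only: the field names and the helper `parse_row` are hypothetical, chosen to mirror the column legend. It also checks an observation about the sample output, namely that columns 7 and 8 equal the oligo counts divided by the total nucleotide counts printed in the two headers (12191 and 13702). Whether the program computes them this way is an assumption.

```python
ROWS = """\
TTT 278 193.0 11 482 17 0.022804 0.035177 1420.5
CAG 247 184.3 11 207 17 0.020261 0.015107 1358.9
CTG 260 182.5 11 210 17 0.021327 0.015326 1803.2
"""

# Total nucleotide counts taken from the two run headers of the sample output.
NUCLEOTIDES_1 = 12191
NUCLEOTIDES_2 = 13702


def parse_row(line):
    """Split one table row into named fields (names are illustrative)."""
    p = line.split()
    return {
        "olig": p[0],
        "count1": int(p[1]),
        "expected1": float(p[2]),
        "seqs1": int(p[3]),
        "count2": int(p[4]),
        "seqs2": int(p[5]),
        "norm1": float(p[6]),
        "norm2": float(p[7]),
        "sorter": float(p[8]),
    }


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Reproduce the table order: descending by the 9th column.
rows.sort(key=lambda r: r["sorter"], reverse=True)

for r in rows:
    # Observed relation in the sample: norm deviate = count / total nucleotides.
    ok1 = abs(r["count1"] / NUCLEOTIDES_1 - r["norm1"]) < 5e-6
    ok2 = abs(r["count2"] / NUCLEOTIDES_2 - r["norm2"]) < 5e-6
    print(r["olig"], r["sorter"], ok1, ok2)
```

The three rows are deliberately listed out of order, so that the sort restores the order used in the program output.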
© 2023 www.softberry.com