Introduction to SpliceDB

A set of 43337 splice junction pairs was extracted from mammalian genes annotated in GenBank, 22489 of which are supported by EST sequences. Of these, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively, and 0.56% carry non-canonical GC-AG splice site pairs. The remaining 0.73% are spread over many small groups (the largest accounting for 0.05%). Examining these groups, we observe that many of them contain splicing dinucleotides shifted by one position from the annotated splice junction. After a close examination of such cases, we present a new classification consisting of only 8 observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of splice sites, but many non-canonical cases may be due to sequencing errors in the intron. This idea is strongly supported when we compare the sequences of human genes with non-canonical splice sites against the genomic sequences deposited in GenBank by high-throughput genome sequencing projects (HTG). 156 of the 171 human non-canonical, EST-supported splice site sequences had a clear match in the human HTG data. After correction, they can be classified as: 79 GC-AG pairs (1 of which was an error corrected to GC-AG), 61 errors corrected to canonical GT-AG pairs, 6 AT-AC pairs (2 of which were errors corrected to AT-AC), 1 case produced by a non-existent intron, 7 cases found in HTG entries that had been deposited in GenBank, and finally only 2 remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation holds for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC, and only 0.02% could consist of other types of non-canonical splice sites. We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs.
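The dinucleotide classification described above can be illustrated with a short Python sketch. It assumes (hypothetically) that each annotated intron is given as 0-based, half-open coordinates (start, end) on a genomic string, so the donor dinucleotide is the first two intron bases and the acceptor dinucleotide the last two; with four bases this gives 16 × 16 = 256 possible pair labels. The helper names (pair_type, spliced, shifted_canonical, summarize) are illustrative and not part of SpliceDB. The one-position shift check moves both boundaries together and accepts a shift only when the spliced sequence stays identical, i.e. when the annotation is ambiguous and an EST alignment could not tell the two positions apart. This is an assumed reading of the "shifted" cases, not the authors' exact procedure.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def pair_type(genome: str, start: int, end: int) -> str:
    """Donor-acceptor dinucleotide pair of the intron genome[start:end], e.g. 'GT-AG'."""
    intron = genome[start:end].upper()
    return f"{intron[:2]}-{intron[-2:]}"


def spliced(genome: str, start: int, end: int) -> str:
    """Sequence left after removing the intron genome[start:end]."""
    return genome[:start] + genome[end:]


def shifted_canonical(genome: str, start: int, end: int) -> Optional[int]:
    """Return -1 or +1 if shifting both boundaries by that amount yields GT-AG
    while producing the same spliced sequence; otherwise return None."""
    original = spliced(genome, start, end)
    for d in (-1, 1):
        s, e = start + d, end + d
        if s < 0 or e > len(genome):
            continue  # the shifted intron would run off the sequence
        if pair_type(genome, s, e) == "GT-AG" and spliced(genome, s, e) == original:
            return d
    return None


def summarize(genome: str, introns: List[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of each splice site pair type, most frequent first."""
    counts = Counter(pair_type(genome, s, e) for s, e in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.most_common()}


if __name__ == "__main__":
    # Toy sequence: exon "AAAG", intron "GTCCCCAG", exon "GTTT".
    g = "AAAGGTCCCCAGGTTT"
    print(pair_type(g, 4, 12))          # GT-AG (true intron)
    print(pair_type(g, 3, 11))          # GG-CA (annotated one base too early)
    print(shifted_canonical(g, 3, 11))  # 1
    pct = summarize(g, [(4, 12), (4, 12), (3, 11)])
    print({p: round(v, 2) for p, v in pct.items()})  # {'GT-AG': 66.67, 'GG-CA': 33.33}
```

In the toy example the annotated junction (3, 11) looks like a non-canonical GG-CA pair, but because the base at both boundaries is the same G, moving the intron one position to the right gives the identical spliced product and a canonical GT-AG pair.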
We also present a set of EST-verified canonical splice sites roughly 37 times larger than the previously available one (22199 entries vs. about 600), and finally a set of 290 EST-supported non-canonical splice sites. Both sets should be valuable for future studies of the splicing mechanism.
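How the weight matrices for the major splice site groups are built is not specified on this page. As a hedged illustration, the Python sketch below builds a standard log-odds position weight matrix from aligned, equal-length site sequences (for example, windows around EST-verified donor sites) and scores candidate sequences with it. The pseudocount of 0.5, the uniform background of 0.25 per base, the 9-base window, and the toy donor sequences are assumptions chosen for illustration; they are not taken from SpliceDB.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds matrix: one {base: score} column per aligned position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount  # pseudocounts keep unseen bases finite
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds; ambiguous bases such as N contribute 0."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix")
    return sum(col.get(b, 0.0) for col, b in zip(pwm, seq.upper()))


if __name__ == "__main__":
    # Hypothetical toy donor windows (3 exon bases + 6 intron bases), not real data.
    donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
    pwm = build_pwm(donors)
    print(round(score(pwm, "CAGGTAAGT"), 2))
    print(round(score(pwm, "CAGCTAAGT"), 2))  # lower: G->C at the first intron base
```

With four sequences and a pseudocount of 0.5, a base present at every position of a column (such as the G of GT) scores log2(3) ≈ 1.58 and an unseen base scores log2(1/3). In practice a matrix would be estimated from a much larger collection, such as the EST-verified set described above.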

© 2023 www.softberry.com