SNP-select is an ANNOVAR-like program. Its main concept, like that of ANNOVAR itself (see http://www.openbioinformatics.org/annovar/), is stepwise filtering of an input set of mutations to discard those that are unlikely to determine a particular hereditary disease or any other "mutational" feature. The user can specify the set of filtering steps.
The SNP-select program reads input files in which every line corresponds to a single mutation in a genome: a single or block substitution, an insertion, or a deletion. The first 5 tab-separated values in a line are the chromosome index (optionally prefixed with 'chr'), the start and end positions of the mutation, the reference nucleotides, and the alternative nucleotides. Additional columns may follow; they are copied unchanged to the output files. If the reference nucleotide is unknown, '0' is specified in its place. In insertions and deletions, the '-' symbol can be used to indicate missing nucleotides. Table 1 shows a number of examples.
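The line format described above can be read with a few lines of Python. This is only an illustrative sketch, not SNP-select's own parser: the names `Variant` and `parse_variant_line` are hypothetical, stripping the 'chr' prefix is an assumed normalization, and the insertion/deletion properties assume the ANNOVAR convention ('-' as reference for an insertion, '-' as alternative for a deletion). The sample line is made up and is not taken from Table 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variant:
    """One input line: five mandatory fields plus any extra columns."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str
    extra: List[str] = field(default_factory=list)

    @property
    def ref_unknown(self) -> bool:
        # '0' marks an unknown reference nucleotide.
        return self.ref == "0"

    @property
    def is_insertion(self) -> bool:
        # Assumption: ANNOVAR convention, '-' in the reference column.
        return self.ref == "-"

    @property
    def is_deletion(self) -> bool:
        # Assumption: ANNOVAR convention, '-' in the alternative column.
        return self.alt == "-"


def parse_variant_line(line: str) -> Variant:
    """Parse one tab-separated mutation line (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError(
            f"expected at least 5 tab-separated fields, got {len(fields)}"
        )
    chrom, start, end, ref, alt = fields[:5]
    # The chromosome index may carry a 'chr' prefix; drop it (assumed normalization).
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    # Extra columns are kept verbatim so they can be written out unchanged.
    return Variant(chrom, int(start), int(end), ref, alt, fields[5:])


# Made-up example line with one additional column.
example = parse_variant_line("chr1\t12345\t12345\tA\tG\tsample1\n")
```

Keeping the extra columns as an untouched list makes it easy to append them again when a line is written to an output file.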
To assess variants with regard to their functional effects, SNP-select uses a number of databases that are loaded and checked in advance (Table 2). Some filters can be omitted at the user's discretion. Every variant (mutation) is compared with the content of the databases selected as filters and is discarded if it hits particular genome regions or exceeds a user-specified score threshold.
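The two kinds of decision mentioned above, a hit in a genome region and a score above a threshold, can be sketched as two small predicates. This is a simplified illustration and not the program's actual logic: the function names are hypothetical, region coordinates are assumed to be inclusive on both ends, and a variant with no score in a database is assumed to be kept.

```python
from typing import Iterable, Optional, Tuple

# (chromosome, start, end); both ends inclusive (assumption).
Region = Tuple[str, int, int]


def hits_region(chrom: str, start: int, end: int,
                regions: Iterable[Region]) -> bool:
    """True if the variant overlaps any region on the same chromosome."""
    return any(
        chrom == r_chrom and start <= r_end and r_start <= end
        for r_chrom, r_start, r_end in regions
    )


def exceeds_threshold(score: Optional[float], threshold: float) -> bool:
    """True if the database score is strictly above the user threshold.

    A missing score (None) never triggers the filter (assumption).
    """
    return score is not None and score > threshold
```

A region-based filter would discard a variant when `hits_region` returns True, and a score-based filter when `exceeds_threshold` returns True; which databases use which kind of test is not specified here.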
At every step, two files are written to the folder specified in the initialization file: 'step_i', with the mutations that passed the filter at the i-th step, and 'step_i.dropped', with the mutations dropped at that step. The last value in the tab-delimited line of a mutation is a comment on the feature checked by the current filter. As a rule, it is a score value. For the first filter, genomic annotation, it gives the location of the mutation: exonic or splicing for accepted mutations; noncoding_exonic, intronic, 3-prim noncoding, 5-prim noncoding, or intergenic for filtered-out mutations; and '-' for discarded mutations that were not found in the annotation database (that is, located outside the annotated part of the genome).
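The per-step output described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SNP-select's implementation: `run_steps` and `annotation_filter` are hypothetical names, steps are assumed to be numbered from 1, only the comment of the current filter is appended to each line, and the small dictionary stands in for the annotation database.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Sequence, Tuple

# A filter receives the fields of one mutation line and returns
# (keep, comment); the comment becomes the last column of the output line.
Filter = Callable[[Sequence[str]], Tuple[bool, str]]

# Locations accepted by the first (genomic annotation) filter.
ACCEPTED_LOCATIONS = {"exonic", "splicing"}


def run_steps(rows: Sequence[Sequence[str]], filters: Sequence[Filter],
              out_dir: str) -> List[List[str]]:
    """Apply filters in order, writing step_i and step_i.dropped files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    current = [list(row) for row in rows]
    for i, flt in enumerate(filters, start=1):  # numbering from 1 (assumption)
        passed = []
        with open(out / f"step_{i}", "w") as kept_f, \
                open(out / f"step_{i}.dropped", "w") as dropped_f:
            for row in current:
                keep, comment = flt(row)
                line = "\t".join(row + [comment]) + "\n"
                if keep:
                    kept_f.write(line)
                    passed.append(row)
                else:
                    dropped_f.write(line)
        current = passed  # only survivors go on to the next step
    return current


# Hypothetical stand-in for the annotation database: (chrom, start) -> location.
DEMO_LOCATIONS: Dict[Tuple[str, str], str] = {
    ("1", "100"): "exonic",
    ("1", "200"): "intronic",
}


def annotation_filter(row: Sequence[str]) -> Tuple[bool, str]:
    # '-' marks a mutation that is not found in the annotation database.
    location = DEMO_LOCATIONS.get((row[0], row[1]), "-")
    return location in ACCEPTED_LOCATIONS, location


demo_rows = [
    ["1", "100", "100", "A", "G"],
    ["1", "200", "200", "C", "T"],
    ["2", "300", "300", "G", "A"],
]
demo_dir = tempfile.mkdtemp()
survivors = run_steps(demo_rows, [annotation_filter], demo_dir)
```

With the made-up data above, only the first mutation is written to 'step_1'; the other two go to 'step_1.dropped' with the comments 'intronic' and '-'.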
Once all filters have been applied, the remaining mutations form the list of those most promising in terms of association with a particular feature or disease.
*) Can be changed by the user.
© 2021 www.softberry.com