Some principles and examples related to evaluation of sequence similarities with help of length equivalent measures (ELEMS). Jaroslav Kubrycht and Karel Sigler. Prague, 30 November 2006.
Examples and kinds of column identities derived by ELEMS
Minimum aa numbers limiting ELEMS(RDA)-derived levels, given for CCBE aa, high occurrence aa, template motif aa and questionable aa. Cysteine exhibits the same numbers for both template motif and questionable aa (see our pdf file).
Examples of amino acid similarities and their contradictory dissimilarities in sequence block columns
Questionable amino acids A and V, which are convertible via a single triplet mutation and present in the same column (cooperating pairs), achieve the mixed high occurrence level. On the other hand, collocated template amino acids A and G, which have no mutation relationship, form contradictory pairs, which in fact diminish the overall extent of aa similarities in their block.
The probability of the amino acids present in the left column can be represented by a complete column similarity of non-integer height, i.e. by the vertical length equivalent of the column (LEA). In the given case, ELEMS(RDA) determines the high occurrence level of aa similarity, with LEA = 3.095.
In addition to LEA, we also define the mean compressed height of whole sequence blocks, i.e. LETM. Both of these height-related (vertical) length equivalents are restricted by the same number limits in ELEMS, which distinguish different kinds of similarities.
A similar compression principle is also used to process gapped sequence blocks. We thus obtain a compressed block whose columns contain only identical/similar aa and whose non-integer height is given by LETM. However, the first floor of the given oblong block belongs to a random chain (in light orange) of the template motif. Only the upper area determines the HLE value. This means that: HLE = (LETM − 1) × n.
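The relation HLE = (LETM − 1) × n can be illustrated by a minimal Python sketch. It assumes that n is the number of columns of the compressed block and that LETM is at least 1 (the first floor being the random chain); neither assumption is stated explicitly on the slide. The function name `hle` and the input values are hypothetical and serve only as an illustration.

```python
def hle(letm: float, n: int) -> float:
    """Return HLE = (LETM - 1) * n.

    letm: mean compressed height of the sequence block (LETM).
    n:    assumed here to be the number of columns of the compressed block.
    """
    # Assumption: the first floor (height 1) is the random chain,
    # so a meaningful LETM cannot be smaller than 1.
    if letm < 1:
        raise ValueError("LETM is assumed to be at least 1")
    if n < 0:
        raise ValueError("n must not be negative")
    # Only the area above the first floor contributes to HLE.
    return (letm - 1.0) * n


# Hypothetical example values, not taken from the presentation.
print(hle(3.5, 8))  # 20.0
```

A block whose height equals exactly one floor (LETM = 1) yields HLE = 0 in this sketch, i.e. nothing remains above the random chain.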
Mild modification in case of double sequence similarity
Double sequence similarity uses only a single value of LEA (LEA = 2), which follows from the presence of only two chains in the corresponding sequence block. Since this similarity has no alternative chain, the corresponding alignment is accompanied by an increased frequency of losses of column similarities in comparison with multiple sequence alignments. This, together with LEA values higher than necessary, induced us to avoid restrictions of the mean length equivalent (LETM) value in double sequence similarity, while still keeping the HLE evaluation. In spite of this, some agreement between BLAST and ELEMS is demonstrated in WP3.2.2.
Alternatively, we can represent HLE as a single chain of non-integer HLE length. This raises the question of the minimal length of a chain exhibiting a mean aa probability (or score) identical with that of the template motif related to HLE. The corresponding minimum value of non-integer length (SL, i.e. specific limit) can be determined using several statistical procedures. An HLE chain of sufficient length thus satisfies HLE > SL.
The ratio of HLE to SL is independent of any probability differences. Moreover, this ratio provides a simple and illustrative insight into the difference from the minimum significant value. Consequently, we suppose that such a value may represent an interesting density-related parameter, which may complement the bit score evaluation. The given ratio was named relative block similarity (RBS). RBS is thus determined by the formula: RBS = HLE/SL
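The ratio and the sufficiency condition HLE > SL can be sketched in Python as follows. The sketch takes SL as a given input, since the statistical procedures used to determine it are not described here. The function names and the numbers are hypothetical.

```python
def rbs(hle_value: float, sl: float) -> float:
    """Return the relative block similarity, RBS = HLE / SL."""
    if sl <= 0:
        raise ValueError("SL must be positive")
    return hle_value / sl


def is_sufficient(hle_value: float, sl: float) -> bool:
    """Return True when the HLE chain is longer than the specific limit."""
    return hle_value > sl


# Hypothetical example values, not taken from the presentation.
print(rbs(20.0, 8.0))            # 2.5
print(is_sufficient(20.0, 8.0))  # True
```

In this form, RBS > 1 is equivalent to HLE > SL, so the ratio directly expresses how many times the chain exceeds the minimum significant length.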
Thank you for visiting our web page. If you have any questions, our e-mail addresses are: jkub@post.cz, sigler@biomed.cas.cz. You are welcome to write to us.