Bioinformatics (Lec-8). Ayesha M. Khan, 21st March, 2012. Some statistics of local sequence comparison (BLAST).
Some statistics of local sequence comparison (BLAST)
• Once BLAST has found a sequence in the database that is similar to the query, it is helpful to know whether the alignment is "good" and reflects a possible biological relationship, or whether the observed similarity can be attributed to chance alone.
• BLAST uses statistical theory to produce a bit score and an expect value (E-value) for each aligned pair (query to hit).
BLAST Results: Scores and Values
Max score = highest alignment score (bit score) between the query sequence and a database sequence segment.
Total score = sum of the alignment scores of all segments from the same database sequence that match the query sequence. It differs from the max score when several parts of the database sequence match different parts of the query sequence.
Query coverage = percentage of the query length included in the aligned segments, calculated over all segments.
E-value = number of alignments with a particular score or better that are expected to occur by chance.
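The relationship between max score, total score and query coverage can be sketched in a few lines of Python. This is a minimal illustration that assumes each aligned segment is given as a bit score plus 1-based, inclusive start and end coordinates on the query; the `HSP` record, the `summarize` function and the example numbers are hypothetical, and NCBI BLAST's own bookkeeping may differ in details.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One aligned segment between the query and a database sequence (hypothetical record)."""
    bit_score: float
    q_start: int  # 1-based, inclusive start on the query
    q_end: int    # 1-based, inclusive end on the query


def summarize(hsps, query_length):
    """Return (max score, total score, query coverage in %) for one database sequence."""
    max_score = max(h.bit_score for h in hsps)
    total_score = sum(h.bit_score for h in hsps)

    # Merge overlapping query intervals so a residue covered by two segments counts once.
    intervals = sorted((h.q_start, h.q_end) for h in hsps)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1

    return max_score, total_score, 100.0 * covered / query_length


# Usage with made-up segments on a 200-residue query.
segments = [HSP(120.0, 1, 80), HSP(45.5, 150, 190), HSP(30.0, 70, 100)]
print(summarize(segments, query_length=200))  # (120.0, 195.5, 70.5)
```

Query positions 70 to 80 are covered by two segments but counted once, which is why coverage is computed from the union of intervals rather than by adding segment lengths, while the total score simply adds all segment scores.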
Some details: Bit score
• The bit score indicates how good the alignment is; the higher the score, the better the alignment.
• In general terms, it is derived from the raw alignment score, which adds up the scores of aligned similar or identical residues and subtracts penalties for any gaps introduced to align the sequences.
• The key element of this calculation is the substitution matrix, which assigns a score to every possible pair of aligned residues.
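The raw score that the bit score is derived from can be sketched as follows. This assumes an affine gap cost in which a gap of length k costs an existence penalty plus k times an extension penalty, with 11 and 1 assumed here as typical BLASTP values. Only a handful of BLOSUM62 entries are included, and `raw_score` is a hypothetical helper, not part of any BLAST API.

```python
# A few BLOSUM62 entries (the matrix is symmetric, so each pair is stored once).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
}
GAP_OPEN = 11   # assumed gap existence penalty
GAP_EXTEND = 1  # assumed penalty per gapped position


def pair_score(a, b):
    """Substitution score for an aligned residue pair, using the matrix's symmetry."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def raw_score(top, bottom):
    """Raw score of a pairwise alignment given as two equal-length rows ('-' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_row = None  # which row the current run of gaps is in, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != gap_row:
                score -= GAP_OPEN  # a new gap is opened
            score -= GAP_EXTEND    # every gapped position pays the extension cost
            gap_row = row
        else:
            score += pair_score(a, b)
            gap_row = None
    return score


print(raw_score("KIV", "RLV"))            # 2 + 2 + 4 = 8
print(raw_score("KIL--VLK", "KILRRVLK"))  # 5 + 4 + 4 - (11 + 2) + 4 + 4 + 5 = 13
```

Similar but non-identical pairs such as K/R or I/L still contribute positive scores, which is exactly what the substitution matrix adds over simple identity counting.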
Bit score (contd.)
• The BLOSUM62 matrix is the default for most BLAST programs. The exceptions are blastn, megaBLAST and discontiguous megaBLAST, which perform nucleotide–nucleotide comparisons and therefore do not use protein-specific matrices.
• Bit scores are normalized using the statistical parameters of the scoring system (λ and K), so bit scores from different alignments can be compared even if different scoring matrices were used.
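The normalization can be written as a conversion from the raw score S to the bit score S' = (λS − ln K) / ln 2. The sketch below assumes λ = 0.267 and K = 0.041, values commonly reported by NCBI BLAST for gapped BLOSUM62 with gap costs 11/1; treat them as illustrative. The function name `bit_score` is hypothetical.

```python
import math

# Assumed Karlin-Altschul parameters (gapped BLOSUM62, gap costs 11/1); illustrative only.
LAMBDA = 0.267
K = 0.041


def bit_score(raw, lam=LAMBDA, k=K):
    """Convert a raw alignment score into a normalized bit score: (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


for raw in (50, 100, 200):
    print(raw, round(bit_score(raw), 1))
```

Because λ and K are specific to each scoring system, two alignments scored with different matrices are placed on the same scale only after this conversion; comparing their raw scores directly would be meaningless.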
Some details: E-value
The E-value indicates the statistical significance of a given pairwise alignment; it reflects the size of the database and the scoring system used. The lower the E-value, the more significant the hit. An E-value of 0.05 means that about 0.05 alignments with this score or better are expected to occur by chance alone in a search of a database of this size; for small E-values this is close to a 5 in 100 (1 in 20) probability of such a match arising by chance.
$E = K m n e^{-\lambda S}$
where
m, n = size of the search space (n is the length of the query sequence, m is the length of the database)
K = a scale parameter for the size of the search space
λ = a scale parameter for the scoring system
S = the raw alignment score (expressed with the bit score S', the same relation is $E = m n 2^{-S'}$)
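The formula can be checked numerically. The sketch assumes the same illustrative λ and K (gapped BLOSUM62, gap costs 11/1), a hypothetical database of 10^8 residues and a hypothetical 250-residue query, and it ignores the effective-length (edge-effect) corrections that BLAST applies. It also converts E into the probability of at least one chance hit, P = 1 − e^(−E), which is why an E-value of 0.05 is close to a 5 in 100 chance.

```python
import math

LAMBDA = 0.267  # assumed, gapped BLOSUM62 with gap costs 11/1
K = 0.041       # assumed, same scoring system


def e_value_from_raw(raw, m, n, lam=LAMBDA, k=K):
    """E = K * m * n * exp(-lambda * S), where S is the raw score."""
    return k * m * n * math.exp(-lam * raw)


def bit_score(raw, lam=LAMBDA, k=K):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """Equivalent form in bits: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


def p_value(e):
    """Probability of at least one chance alignment when E are expected."""
    return 1.0 - math.exp(-e)


m, n = 10**8, 250  # m: database length, n: query length (hypothetical values)
for raw in (60, 80, 100):
    e_raw = e_value_from_raw(raw, m, n)
    e_bits = e_value_from_bits(bit_score(raw), m, n)
    print(raw, f"{e_raw:.3g}", f"{e_bits:.3g}")  # both forms agree

print(round(p_value(0.05), 4))  # 0.0488
```

Higher raw scores give exponentially smaller E-values, and doubling the database size doubles every E-value, which is how the E-value "reflects the size of the database".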
Multiple Sequence Alignment
Why do we need to carry out multiple sequence alignments?
• To make connections between more than two family members
• To reveal conserved family characteristics
An MSA is a 2D table in which rows represent individual sequences and columns represent residue positions.
Absolute position: a property of the sequence (where a residue lies in its own ungapped sequence)
Relative position: a property of the alignment (the column the residue occupies)
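The distinction between absolute and relative positions can be illustrated with a toy alignment. In this sketch, rows are gapped strings of equal length, '-' marks a gap, absolute positions are 1-based indices into each ungapped sequence and columns are 0-based; the alignment and the function names are hypothetical.

```python
# Hypothetical three-sequence alignment: rows are sequences, columns are positions.
msa = [
    "MK-LV",
    "MKALV",
    "M--LI",
]
assert len({len(row) for row in msa}) == 1, "all rows must have the same length"


def column(msa, j):
    """Residues in alignment column j (a relative position, property of the alignment)."""
    return [row[j] for row in msa]


def absolute_positions(row):
    """For each column, the residue's 1-based position in its own ungapped
    sequence (an absolute position, property of the sequence); None for gaps."""
    positions, pos = [], 0
    for ch in row:
        if ch == "-":
            positions.append(None)
        else:
            pos += 1
            positions.append(pos)
    return positions


print(column(msa, 3))  # ['L', 'L', 'L']
for row in msa:
    print(row, absolute_positions(row))
```

Column 3 holds a conserved leucine, yet it is residue 3, 4 and 2 of the three sequences respectively: the column index belongs to the alignment, while the residue numbers belong to each sequence.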