Patterns, Profiles, and Multiple Alignment
OUTLINE • Profiles and Sequence Logos • Profile Hidden Markov Models • Aligning Profiles • Multiple Sequence Alignments by Gradual Sequence Addition • Other Ways of Obtaining Multiple Alignments • Sequence Pattern Discovery
Profiles and Sequence Logos • Previously we covered techniques for aligning two sequences. • Very often, similar sequences (those sharing a common ancestral sequence) have similar properties at equivalent regions. • Goal: align a sequence to a set of similar sequences. • This requires a way to represent the general properties of the set of sequences.
Profiles and Sequence Logos A new sequence can be aligned to this representation. Such a representation is called a PROFILE.
Profiles and Sequence Logos Position-specific scoring matrices are an extension of substitution scoring matrices. In the techniques covered previously we applied a substitution scoring matrix: aligning residues $a$ and $b$ always receives the score $s_{a,b}$, regardless of where in the sequences the pair occurs.
Profiles and Sequence Logos One common use of database search is to discover all known sequences that belong to the same sequence family as the query sequence. To find all family members: discover an initial set of family members with a first database search, then include the position-specific preferences of this set in the scoring scheme, giving a PSSM (position-specific scoring matrix).
Profiles and Sequence Logos A PSSM is a matrix that is specific to one family of sequences; its scores are weighted according to the diversity observed within the family of interest.
Profiles and Sequence Logos Generating a PSSM requires a set of aligned sequences. Suppose we have an alignment of $N_{seq}$ sequences with $L_{aln}$ positions (alignment columns); the PSSM for this alignment will also have $L_{aln}$ columns, one per alignment position, and one row per residue type.
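Profiles and Sequence Logos Generating a PSSM: given a multiple alignment of $N_{seq}$ sequences, let $n_{u,b}$ be the number of residues of type $b$ in column $u$, and $m_{u,a}$ the score associated with row $a$ and column $u$. The simplest PSSM averages the substitution scores over the residues observed in the column:
$$m_{u,a} = \sum_{b} \frac{n_{u,b}}{N_{seq}}\, s_{a,b}$$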
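The averaging PSSM, $m_{u,a} = \sum_b (n_{u,b}/N_{seq})\, s_{a,b}$, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the alignment is gap-free, the alphabet is DNA (`ACGT`) to keep the output small, and `toy_score` is a hypothetical substitution matrix (+2 for identical residues, -1 otherwise) standing in for a real matrix such as BLOSUM62.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; use the 20 amino acids for proteins


def toy_score(a: str, b: str) -> float:
    """Hypothetical substitution score s_{a,b}: +2 for identity, -1 otherwise."""
    return 2.0 if a == b else -1.0


def average_score_pssm(alignment: list[str], score=toy_score) -> list[dict[str, float]]:
    """Return one dict per alignment column u, mapping residue a to m_{u,a}."""
    n_seq = len(alignment)
    l_aln = len(alignment[0])
    if any(len(seq) != l_aln for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    pssm = []
    for u in range(l_aln):
        counts = Counter(seq[u] for seq in alignment)  # n_{u,b} for each residue b
        pssm.append({
            a: sum(n_ub / n_seq * score(a, b) for b, n_ub in counts.items())
            for a in ALPHABET
        })
    return pssm


pssm = average_score_pssm(["ACGT", "ACGA", "ACGT", "TCGT"])
print(len(pssm))     # 4
print(pssm[0]["A"])  # 1.25
```

Column 0 holds three A and one T, so $m_{0,A} = \tfrac{3}{4}(+2) + \tfrac{1}{4}(-1) = 1.25$: the score for A is pulled down slightly by the T that also occurs there.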
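Profiles and Sequence Logos Generating a PSSM (cont.): To give extra support to conserved residues, a logarithmic form of weighting can be used:
$$m_{u,a} = \sum_{b} \frac{\log\left(1 - \frac{n_{u,b}}{N_{seq}+1}\right)}{\log\left(\frac{1}{N_{seq}+1}\right)}\, s_{a,b}$$
The ratio of the logs varies between 0 (residue absent from the column) and 1 (residue present in every sequence), but in this case residues present in a smaller fraction of the sequences are relatively under-weighted.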
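One logarithmic weight with the properties described (0 for a residue absent from the column, 1 for a residue present in every sequence, below the linear fraction in between) is $w = \log(1 - n_{u,b}/(N_{seq}+1)) / \log(1/(N_{seq}+1))$, used in place of $n_{u,b}/N_{seq}$. The sketch below assumes this weight; `toy_score` is again a hypothetical +2/-1 substitution score on a DNA alphabet.

```python
import math
from collections import Counter


def log_weight(n_ub: int, n_seq: int) -> float:
    """w = log(1 - n/(N+1)) / log(1/(N+1)), computed in the algebraically
    equivalent form log((N+1)/(N+1-n)) / log(N+1): 0 when n = 0, 1 when n = N."""
    return math.log((n_seq + 1) / (n_seq + 1 - n_ub)) / math.log(n_seq + 1)


def toy_score(a: str, b: str) -> float:
    """Hypothetical substitution score: +2 for identity, -1 otherwise."""
    return 2.0 if a == b else -1.0


def log_weighted_column(column: str, alphabet: str = "ACGT") -> dict[str, float]:
    """m_{u,a} = sum_b w(n_{u,b}) * s_{a,b} for a single alignment column."""
    n_seq = len(column)
    counts = Counter(column)
    return {a: sum(log_weight(n, n_seq) * toy_score(a, b) for b, n in counts.items())
            for a in alphabet}


# Compare the linear fraction n/N with the log weight for N = 9 sequences.
for n in (1, 5, 9):
    print(n, round(n / 9, 3), round(log_weight(n, 9), 3))
```

For $N_{seq} = 9$, a residue seen in one sequence gets weight about 0.046 instead of $1/9 \approx 0.111$, and one seen in five sequences gets about 0.301 instead of about 0.556, while a fully conserved residue keeps weight 1. This is the relative under-weighting of less frequent residues noted above.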
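Profiles and Sequence Logos Generating a PSSM (cont.): Using a log-odds ratio: similar to the construction of common substitution matrices, if sufficient data is available the score can be computed as
$$m_{u,a} = \log\left(\frac{q_{u,a}}{p_a}\right)$$
where $q_{u,a} = n_{u,a}/N_{seq}$ is the frequency of residue $a$ in column $u$ and $p_a$ is the background frequency of residue $a$.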
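Profiles and Sequence Logos Generating a PSSM (cont.): Using a log-odds ratio: $q_{u,a}$ will cause problems when residue $a$ is never observed in column $u$: then $q_{u,a} = 0$ and the log-odds score is $-\infty$. With only a few sequences, many residues are absent from a column simply by chance.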
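The zero-frequency problem is easy to reproduce in a minimal sketch of $m_{u,a} = \log_2(q_{u,a}/p_a)$ with $q_{u,a} = n_{u,a}/N_{seq}$. Assumptions for illustration: base-2 logarithms (scores in bits), a DNA alphabet, and uniform background frequencies $p_a = 0.25$; real PSSMs use background frequencies estimated from sequence data.

```python
import math
from collections import Counter


def log_odds_column(column: str, background: dict[str, float]) -> dict[str, float]:
    """m_{u,a} = log2(q_{u,a} / p_a) with q_{u,a} = n_{u,a} / N_seq (no pseudocounts)."""
    n_seq = len(column)
    counts = Counter(column)
    scores = {}
    for a, p_a in background.items():
        q_ua = counts[a] / n_seq
        # An unobserved residue has q_ua == 0, so its log-odds score is minus infinity.
        scores[a] = math.log2(q_ua / p_a) if q_ua > 0 else float("-inf")
    return scores


uniform = {b: 0.25 for b in "ACGT"}  # assumed background frequencies
scores = log_odds_column("AAAT", uniform)
print(scores["T"], scores["C"])  # 0.0 -inf
```

A scores $\log_2 3 \approx 1.585$ bits and T scores 0, but C and G are forbidden outright: any new sequence with C or G at this position would receive a total score of $-\infty$.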
Profiles and Sequence Logos Generating a PSSM (cont.): Using a log-odds ratio: SOLUTION: pseudocounts, i.e., small artificial counts added to the observed counts $n_{u,a}$ so that no $q_{u,a}$ is zero.
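Pseudocounts can be sketched with the simplest scheme, adding one count per residue type (Laplace's rule), so that $q_{u,a} = (n_{u,a} + 1)/(N_{seq} + K)$ for an alphabet of $K$ residue types. This particular scheme is an assumption chosen for illustration; more refined schemes distribute the pseudocounts according to background frequencies or substitution-matrix similarities. Background frequencies are again assumed uniform over DNA.

```python
import math
from collections import Counter


def pseudocount_frequencies(column: str, alphabet: str, pseudo: float = 1.0) -> dict[str, float]:
    """q_{u,a} = (n_{u,a} + pseudo) / (N_seq + pseudo * K): never zero, sums to 1."""
    counts = Counter(column)
    denom = len(column) + pseudo * len(alphabet)
    return {a: (counts[a] + pseudo) / denom for a in alphabet}


def log_odds_with_pseudocounts(column: str, background: dict[str, float],
                               pseudo: float = 1.0) -> dict[str, float]:
    """m_{u,a} = log2(q_{u,a} / p_a) using pseudocount-corrected frequencies."""
    alphabet = "".join(background)
    q = pseudocount_frequencies(column, alphabet, pseudo)
    return {a: math.log2(q[a] / p_a) for a, p_a in background.items()}


uniform = {b: 0.25 for b in "ACGT"}  # assumed background frequencies
scores = log_odds_with_pseudocounts("AAAT", uniform)
print(scores)  # {'A': 1.0, 'C': -1.0, 'G': -1.0, 'T': 0.0}
```

Unobserved residues now receive a finite penalty (-1 bit here) instead of $-\infty$. The pseudocounts matter less as $N_{seq}$ grows, because the observed counts then dominate the numerator and denominator.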
Profiles and Sequence Logos Generating a PSSM (cont.): Using a log-odds ratio: A better formula:
Profiles and Sequence Logos Representing a profile as a logo: Entropy is a measure of uncertainty. The term usually refers to Shannon entropy (information theory), which quantifies the expected value of the information contained in a message, usually in units such as bits.
Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: suppose that there are $n$ alternative events, each labeled $x_i$ and each with probability $P(x_i)$, where $\sum_i P(x_i) = 1$. Then:
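Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Shannon entropy is defined as
$$H = -\sum_{i=1}^{n} P(x_i)\,\log_2 P(x_i)$$
measured in bits when base-2 logarithms are used (terms with $P(x_i) = 0$ contribute zero).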
Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Example: consider the amino acid that occurs at a particular position in a protein sequence. The possible events are the 20 amino acids, and the uncertainty about which event will occur depends on the probabilities $P(x_i)$.
Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Example (cont.): if one of the possible events has a probability of 1 (it is certain to occur) and all others have a probability of zero, the entropy is zero.
Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: Example (cont.): if all the events have equal probability ($P(x_i) = 1/20$), the entropy is at its maximum, $\log_2 20 \approx 4.32$ bits.
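Both limiting cases can be checked numerically with a direct implementation of $H = -\sum_i P(x_i)\log_2 P(x_i)$. This is a minimal sketch; the function name `shannon_entropy` is illustrative only.

```python
import math


def shannon_entropy(probs: list[float]) -> float:
    """H = -sum_i P(x_i) log2 P(x_i) in bits, computed as sum_i P(x_i) log2(1/P(x_i));
    events with P(x_i) = 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


certain = [1.0] + [0.0] * 19   # one amino acid always occurs
uniform = [1 / 20] * 20        # all 20 amino acids equally likely
print(shannon_entropy(certain))            # 0.0
print(round(shannon_entropy(uniform), 3))  # 4.322
```

Any other distribution over the 20 amino acids falls between these two values.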
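Profiles and Sequence Logos Representing a profile as a logo (cont.): Entropy: The information present in the pattern at position $u$, $I_u$, is the reduction in uncertainty from the maximum:
$$I_u = \log_2 20 - H_u = \log_2 20 + \sum_{a} q_{u,a} \log_2 q_{u,a}$$
where $H_u$ is the entropy of column $u$ and $q_{u,a}$ is the frequency of residue $a$ in that column (for DNA, $\log_2 4 = 2$ replaces $\log_2 20$).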
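A minimal sketch of $I_u = \log_2 K - H_u$ for one alignment column, where $K$ is the alphabet size (20 for proteins, 4 for DNA). The frequencies $q_{u,a}$ are taken directly from the column, with no pseudocounts and no small-sample correction; both are simplifying assumptions. The function also returns letter heights following the usual sequence-logo convention: the stack height of a column is $I_u$ and each residue's letter height is $q_{u,a} I_u$.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> tuple[float, dict[str, float]]:
    """Return (I_u, letter heights) for one alignment column."""
    n = len(column)
    q = {a: c / n for a, c in Counter(column).items()}   # observed q_{u,a}
    h_u = sum(p * math.log2(1 / p) for p in q.values())   # entropy H_u in bits
    i_u = math.log2(alphabet_size) - h_u                  # information I_u
    heights = {a: p * i_u for a, p in q.items()}          # logo letter heights
    return i_u, heights


i_u, heights = column_information("LLLLLLLLIV")  # protein column, mostly leucine
print(round(i_u, 2), {a: round(h, 2) for a, h in heights.items()})
```

For this column (80% L, 10% I, 10% V) the information is $I_u = 3.4$ bits, so in a logo the L would be drawn 2.72 bits tall with small I and V letters stacked beneath it. A fully conserved column would reach the maximum of $\log_2 20 \approx 4.32$ bits.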
References • M. Zvelebil and J. O. Baum, “Understanding Bioinformatics”, Garland Science, 2008. • A. D. Baxevanis and B. F. F. Ouellette, “Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins”, Wiley, 2001.