Chapter 6 - Profiles
Assume we have a family of sequences. To search for other sequences in the family we can use:
• Search with a sequence from the family
• Search with several sequences from the family together
• Consensus sequences (regular expressions)
  Regular expression, e.g. A-[FR]-X(2,3)-M, which matches GARCCMH and LCAFARLMLMA
• Weight matrices or position-specific scoring matrices (not considering gaps)
• Profiles
• Profiles as Hidden Markov Models
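The regular-expression example can be checked mechanically. The sketch below translates a PROSITE-style pattern into a Python regular expression and tests it on the two example sequences from the slide. The function name prosite_to_regex is a hypothetical helper, and only the syntax elements used here (single residues, [..] sets, {..} exclusions, X wildcards and repeat counts) are handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (e.g. A-[FR]-X(2,3)-M) to a Python regex.

    Hypothetical helper: covers only residues, [..] sets, {..} exclusions,
    the X wildcard and (n) / (n,m) repeat counts.
    """
    pieces = []
    for element in pattern.split("-"):
        # Separate the symbol from an optional repeat count such as (2) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        symbol, lo, hi = m.group(1), m.group(2), m.group(3)
        if symbol.upper() == "X":          # X means any amino acid
            symbol = "."
        elif symbol.startswith("{"):       # {AB} means anything except A or B
            symbol = "[^" + symbol[1:-1] + "]"
        if lo is not None:
            symbol += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(symbol)
    return "".join(pieces)

regex = prosite_to_regex("A-[FR]-X(2,3)-M")
for seq in ("GARCCMH", "LCAFARLMLMA"):
    hit = re.search(regex, seq)
    print(seq, hit.group() if hit else None)
```

A pattern of this kind either matches or does not match; it gives no graded score, which is one motivation for the weight matrices and profiles listed after it.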
Search with a family of sequences
• Align the sequences (multiple alignment)
• Make a profile from part of the alignment
• Search in the database with the profile
• As an option, revise the profile, and search again (iteratively)
Multiple alignments and profiles
What weight does amino acid a have in position r in the profile?
Example
Clustal X (1.64b) multiple sequence alignment

XENLA1  ALVSGPQD------NELDG--MQL
XENLA2  AQVNGPQD------NELDG--MQF
MOUSE1  PQVEQLEL------GGSP---GDL
RAT1    PQVPQLEL------GGGPEA-GDL
MOUSE2  PQVAQLEL------GGGPGA-GDL
RAT2    PQVAQLEL------GGGPGA-GDL   Removed
CRILO   PQVAQLEL------GGGPGA-DDL
RABIT   LQVGQAEL------GGGPGA-GGL
BOVIN   PQVGALEL------AGGPG-----
SHEEP   PQVGALEL------AGGPG-----   Removed
PIG     PQAGAVEL------GGGLGG---L
CANFA   LQVRDVEL------AGAPGE-GGL
HUMAN   LQVGQVEL------GGGPGA-GSL
CHICK   P-LVSSPL------RGEAGV-LPF
ORENI   LLGFLPPKAGGAVVQGGEN---EV
VERMO   LLGFLPAKSGGAAAGG-ENEVAEF
        12345678******567890*234
* means removed

   Cons   A   B   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   X   Y   Z Gap  Le
 1 P      1   0 -18 -17 -12 -14 -21 -13  -3 -10   1  -2 -15  26  -6 -12  -3  -2  -1 -32   0 -18   0 100 100
 2 q     -4   0 -18  -5   2 -10 -17   2  -3   3   0   1  -3  -7  11   3  -4  -3  -4 -17   0 -10   0  50 100
 3 V      1   0  -5 -23 -17  -6 -15 -17  15 -15   9   7 -17 -16 -13 -17  -7  -3  18 -26   0 -14   0 100 100
 4 G      0   0 -12  -8  -7 -14   0  -5 -13  -6 -14 -10  -2  -9  -5  -6  -1  -3  -8 -22   0 -11   0 100 100
 5 Q      2   0 -15   1   1 -25   4  -3 -17  -1 -15 -11   1  -7   3  -2   3  -1 -12 -30   0 -20   0 100 100
 6 P      1   0 -13 -17 -11 -14 -21 -13   0 -10   0  -1 -13  18  -7 -13  -1   0   3 -32   0 -17   0 100 100
 7 E      0   0 -29  12  19 -36 -10   0 -25   7 -24 -19   3  20  13   2   2   0 -17 -41   0 -26   0 100 100
 8 L     -8   0 -20 -15 -10  -1 -29 -10   7  -7  14   9 -13 -17  -6 -10 -12  -8   3 -20   0  -8   0 100 100
 5 g      3   0 -16   5   2 -36  21   0 -28   3 -28 -21  10  -8   4   5   4  -2 -20 -32   0 -25   0  34  34
 6 G      4   0 -21   6   0 -49  51 -10 -41  -6 -40 -32   4 -13  -4  -7   3  -9 -30 -40   0 -37   0 100 100
 7 G      3   0 -16  -3  -4 -31  23 -11 -22  -8 -20 -16  -2 -12  -5  -9   0  -6 -16 -33   0 -27   0 100 100
 8 P      3   0 -24   7   6 -32 -10  -5 -21  -1 -20 -17   0  27   2  -6   2   0 -14 -43   0 -25   0 100 100
 9 g      3   0 -19   5  -2 -45  49  -8 -39  -6 -38 -30   9 -13  -5  -6   4  -7 -28 -37   0 -33   0  50  78
 0 a      5   0  -3  -2   0 -12   0  -5  -3  -3  -6  -3  -2  -3  -1  -4   1   0   0 -19   0 -12   0  50  78
 2 g     -1   0 -11  -9  -9 -12   7  -9  -6  -9  -4   0  -6 -13  -7 -10  -4  -6  -6 -18   0 -14   0  50  78
 3 q      0   0 -22  13  11 -33   4   0 -26   3 -25 -19   6   6   7   0   3   0 -19 -36   0 -23   0  50  78
 4 L    -12   0 -10 -37 -28  28 -42 -13  22 -22  29  21 -27 -24 -17 -23 -20 -12  15   1   0  10   0 100 100
 *       17   0   0  10  17   3  52   0   0   1  36   2   4  22  21   2   5   0  16   0   0   0   0
What to take into account when creating a profile?
1. The observed amino acids in position r in the alignment.
2. The number of independent ‘observations’ that have been used for constructing the alignment of position r (for example, the number of different a.a. in the column).
3. The similarity of a to the amino acids observed in column r, to allow for not yet observed amino acids. Amino acid a is more likely to occur in unknown family members if there are many amino acids similar to a in the known sequences. Thus a ‘background’ scoring matrix should be used.
4. The background (a priori) distribution of the amino acids.
5. The diversity and similarity of the sequences, resulting in the importance (or weight) of each sequence. The known sequences are normally not uniformly distributed in the ‘family space’, and should have different weights in the calculation.
6. The number of gaps over column r and the neighbouring columns.
These points are not independent. How these aspects are treated varies with the different methods for profile construction.
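As a rough illustration of points 1 and 4 only, the sketch below turns one alignment column into log-odds scores. It is not the method that produced the profile in the example: the uniform background distribution, the pseudocount of 1 and the function name column_log_odds are all assumptions made for this illustration, and points 2, 3, 5 and 6 (similarity, sequence weights, gaps) are ignored.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_log_odds(column, background=None, pseudocount=1.0):
    """Log-odds score (base 2) of each amino acid in one alignment column.

    Assumptions for this sketch: a uniform background unless one is given,
    and a pseudocount spread over the amino acids in proportion to the
    background so that unobserved residues get a finite score.
    """
    if background is None:
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    # Gap characters and anything else outside the alphabet are skipped.
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values()) + pseudocount
    scores = {}
    for a in AMINO_ACIDS:
        freq = (counts[a] + pseudocount * background[a]) / total
        scores[a] = math.log2(freq / background[a])
    return scores

# Column 1 of the example alignment, without the two removed sequences.
scores = column_log_odds("AAPPPPLPPLLPLL")
for a in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(a, round(scores[a], 2))
```

With this column the highest score goes to P, followed by L and A, while residues never observed in the column get negative scores.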
Database search with a profile
Notations
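A minimal sketch of the ungapped case: slide the profile along a database sequence and add up the column scores for each window. The three columns below are copied from positions 1 to 3 of the example profile, restricted to the residues A, L, P, Q and V. The function name scan, the test sequence and the fallback score for residues missing from the excerpt are assumptions for this illustration, and the Gap and Le columns of the profile are not used.

```python
# Excerpt of the example profile: positions 1-3, residues A, L, P, Q, V only.
PROFILE = [
    {"A": 1,  "L": 1, "P": 26,  "Q": -6,  "V": -1},
    {"A": -4, "L": 0, "P": -7,  "Q": 11,  "V": -4},
    {"A": 1,  "L": 9, "P": -16, "Q": -13, "V": 18},
]

def scan(profile, sequence, missing=-5):
    """Score every ungapped window of the sequence against the profile.

    Returns (score, start) pairs, best score first. `missing` is a
    hypothetical fallback for residues absent from the excerpt above.
    """
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, missing) for col, res in zip(profile, window))
        hits.append((score, start))
    return sorted(hits, reverse=True)

hits = scan(PROFILE, "ALPQVAL")
print(hits[0])  # best window is PQV, starting at index 2
```

Allowing gaps would replace the sliding window with a dynamic-programming alignment that uses the position-specific gap penalties.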
Position weight
No sequence weights are considered for now.
• All a.a. in the column count equally
• A.a. occurring many times are favored
• A.a. occurring many times are ‘punished’
PSI-BLAST
Hidden Markov Model