GROUP MEMBERS: MUHAMMAD KHAIRULANWAR IZZAT BIN HUSSIN (AC100076), MURNIYANTI BINTI MALIK (AC100078), NG SHEE TING (AC100079), SCHEE XIN LIN (AC100086), AW MEI YEE (AC100062)

INTRODUCTION @Ng Shee Ting

INTRODUCTION(cont..) @Ng Shee Ting

PURPOSES @Ng Shee Ting

@Ng Shee Ting

WHAT IS PSSM?? @Ng Shee Ting

PSSM CONT.. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. *Note: A profile is a table of observed frequencies of amino acids (or nucleotides) at each position in a multiple alignment. @Ng Shee Ting

PSSM CONT.. • The PSI-BLAST PSSM is derived from local alignments. • Only positions present in the query sequence are used. • If the query has length L, the PSSM also has L positions, giving a 20 × L matrix (20 amino acids by L positions). @Ng Shee Ting

Basic concept of the calculation (this example counts nucleotides) @Izzat

CALCULATION cont… Row Column (Positions) @Izzat

CALCULATION cont… Refer back to Table A. Shading indicates the fraction of occurrences of that base at that position: red (1.0), orange (0.8), yellow (0.6). @Izzat

CALCULATION cont… Note c: The background frequencies used to calculate the scores are A = T = 0.32 and C = G = 0.18. Table 1D was calculated with the default scoring system used by the Gibbs Sampler. @Izzat

CALCULATION cont… • In the example shown in Table 1D, the score for an adenine in position one is calculated as: • Score(position 1, A) = [3 + √5 × 0.32] / [5 + √5] = 0.51 @Izzat

CALCULATION cont… • Score(position 1, A) = [3 + 0.1 × 0.32] / [5 + 0.1] = 0.59 • Note c: The background frequencies used to calculate the scores are A = T = 0.32 and C = G = 0.18. Table 1E used the default scoring system of Meme. @Izzat
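The two score calculations share one form: (count + weight × background) / (number of sequences + weight). A minimal Python sketch follows. The function name is hypothetical, and reading √5 as "square root of the number of sequences" is an assumption based on the five-sequence example, not something the slides state.

```python
import math

def pseudocount_frequency(count, n_seqs, background, weight):
    """Return (count + weight * background) / (n_seqs + weight)."""
    return (count + weight * background) / (n_seqs + weight)

BG_A = 0.32  # background frequency of A, taken from the slides
N_SEQS = 5   # number of aligned sequences in the example
COUNT_A = 3  # occurrences of A at position 1 in the example

# Weight sqrt(5), as in the Table 1D example (assumed to mean sqrt(N)).
gibbs_score = pseudocount_frequency(COUNT_A, N_SEQS, BG_A, math.sqrt(N_SEQS))
# Weight 0.1, as in the Table 1E example.
meme_score = pseudocount_frequency(COUNT_A, N_SEQS, BG_A, 0.1)

print(round(gibbs_score, 2), round(meme_score, 2))
```

A larger weight pulls the score toward the background frequency; a smaller weight keeps it closer to the observed fraction 3/5.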

CALCULATION cont… Note d: Each element of the table is equal to the negative base-10 logarithm (-log10) of the corresponding element of Table 1E. @Izzat
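The negative-log transform can be sketched in a few lines of Python. The table layout (a dict of base to list of per-position values) and the sample numbers are illustrative assumptions, not the contents of Table 1E.

```python
import math

def neg_log10_table(freq_table):
    """Replace every element p of the table with -log10(p)."""
    return {base: [-math.log10(p) for p in row]
            for base, row in freq_table.items()}

# Hypothetical two-position table; values must be greater than zero.
example = {"A": [0.59, 0.01], "T": [0.06, 0.97]}
for base, row in neg_log10_table(example).items():
    print(base, [round(x, 2) for x in row])
```

Frequencies close to 1 map to values near 0, and rare bases map to large positive values.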

EXAMPLE: 20 × L matrix [Figure: example alignment with L positions; Position 1 and Position 15 are marked, and Y appears twice at the highlighted position.] @Izzat

PSSM CALCULATION • Column 1: frequency(A, 1) = 0 / 5 = 0; frequency(G, 1) = 5 / 5 = 1; ... • Column 2: frequency(A, 2) = 0 / 5 = 0; frequency(H, 2) = 5 / 5 = 1; ... • Column 15: frequency(A, 15) = 2 / 5 = 0.4; frequency(C, 15) = 1 / 5 = 0.2; ... • Some frequencies are equal to 0 because of the small number of sequences in the multiple alignment. Such a frequency could lead to the "exclusion" of the amino acid involved at this position. @Izzat

CONT.. One way around this is to add a small value to all observed frequencies. This small "non-observed frequency" is called a "pseudo-count". In the previous example, with a pseudo-count of 1: • Column 1: f'(A, 1) = (0 + 1) / (5 + 20) = 0.04; f'(G, 1) = (5 + 1) / (5 + 20) = 0.24; ... • Column 2: f'(A, 2) = (0 + 1) / (5 + 20) = 0.04; f'(H, 2) = (5 + 1) / (5 + 20) = 0.24; ... • Column 15: f'(A, 15) = (2 + 1) / (5 + 20) = 0.12; f'(C, 15) = (1 + 1) / (5 + 20) = 0.08; ... @Izzat
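The pseudo-count correction can be sketched in Python as follows. The function name is hypothetical, and the column strings are assumptions reconstructed from the counts on the slides (column 1 is all G; column 15 has A twice, C once and, as assumed here, Y twice).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def column_frequencies(column, pseudocount=1):
    """Corrected frequency f' = (count + pseudocount) / (N + 20 * pseudocount)."""
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (column.count(aa) + pseudocount) / total for aa in AMINO_ACIDS}

col1 = column_frequencies("GGGGG")   # five sequences, all G at position 1
col15 = column_frequencies("AACYY")  # assumed residues at position 15

print(col1["A"], col1["G"])    # 1/25 and 6/25
print(col15["A"], col15["C"])  # 3/25 and 2/25
```

With the correction, no residue has a frequency of exactly zero, and the 20 corrected frequencies in a column still sum to 1.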

Table of fully calculated f' values

PSSM CONT.. The frequency of each amino acid determined at each position is compared with the frequency at which that amino acid is expected in a random sequence. It is assumed that each amino acid is observed with the same frequency in a random sequence. $\mathrm{Score}_{ij} = \log(f'_{ij} / q_i)$ where: • $\mathrm{Score}_{ij}$ is the score for residue i at position j • $f'_{ij}$ is the relative frequency of residue i at position j, corrected by the pseudo-count • $q_i$ is the relative frequency expected for residue i in a random sequence @Izzat
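A minimal Python sketch of the log-odds score follows. The slides do not state the base of the logarithm, so `base=10` is an assumption, as is the default `q = 1/20`, which encodes the "same frequency for every amino acid" assumption.

```python
import math

def log_odds(f_corrected, q=1 / 20, base=10):
    """Score = log(f' / q); the base and the default q are assumptions."""
    return math.log(f_corrected / q, base)

# Corrected frequencies taken from the pseudo-count example.
for f in (0.04, 0.08, 0.12, 0.24):
    print(f, round(log_odds(f), 3))
```

A residue seen more often than expected (f' > q) gets a positive score, and one seen less often gets a negative score; changing the base only rescales the scores.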

PSSM: fully calculated $\mathrm{Score}_{ij}$ @Izzat

Exercise: The fully calculated scores and f' values are given in the tables above. Calculate $q_i$ using the formula $\mathrm{Score}_{ij} = \log(f'_{ij} / q_i)$. @Izzat

Solution • Rearranging the formula (taking the logarithm as base 10) gives $q_i = f'_{ij} / 10^{\mathrm{Score}_{ij}}$. • For any entry with score -0.2 (f' = 0.04): $q_i$ = 0.0634 • For any entry with score 2.3 (f' = 0.24): $q_i$ = 1.203 × 10^(-3) • For any entry with score 0.7 (f' = 0.08): $q_i$ = 0.016 • For any entry with score 1.3 (f' = 0.12): $q_i$ = 6.014 × 10^(-3) @Izzat
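The rearranged formula can be checked with a short Python sketch. The function name is hypothetical, and pairing each score with a corrected frequency (for example -0.2 with 0.04) is inferred from the pseudo-count example rather than read from the table.

```python
def background_from_score(f_corrected, score, base=10):
    """Invert Score = log_base(f' / q) to get q = f' / base**score."""
    return f_corrected / base ** score

# (f', score) pairs; the pairing is an assumption.
pairs = [(0.04, -0.2), (0.24, 2.3), (0.12, 1.3)]
for f, s in pairs:
    print(s, f"{background_from_score(f, s):.4g}")
```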

Why PSSM? • A PSSM is a matrix used for biological sequence data; its main role in a PSI-BLAST search is to increase the sensitivity of the results. • The profile is used to perform a second BLAST search, and the results of each "iteration" are used to refine the profile. • The PSSM is thus used to search the database for new matches and is updated in subsequent iterations with the newly detected sequences. • This iterative searching strategy results in increased sensitivity. @Izzat

E Value? • An abbreviated term for "Expected Value" or "Expectation Value". • A parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. • The E value applies to the longest run of matches in an alignment of length L. @Schee Xin Lin

E Value cont.. • The E value decreases exponentially with the score (S) assigned to a match between two sequences. • Shorter sequences have a higher probability of occurring in the database purely by chance. @Schee Xin Lin

E Value cont For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance. This means that the lower the E-value, or the closer it is to "0" the more "significant" the match is. @Schee Xin Lin

EQUATION $E = K m n \, e^{-\lambda S}$ • This is the equation for calculating the E value. • m: the length of the query sequence • n: the length of the database (total residues) • S: the score • K and λ are constants that depend on the scoring system. @Schee Xin Lin

Example of calculation • Constants: λ = 0.219, K = 0.082 • S = 103, m = 100, n = 2 × 10^8 • λS = 0.219 × 103 = 22.6 • $e^{-\lambda S}$ = 1.6 × 10^-10 • $K m n \, e^{-\lambda S}$ = 0.082 × 100 × 2 × 10^8 × 1.6 × 10^-10 = 0.2624 @Schee Xin Lin
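The worked example can be reproduced with a few lines of Python. The function name `e_value` is hypothetical; the numbers are the ones on the slide. Because the slide rounds the exponential to 1.6 × 10^-10 before multiplying, the unrounded result differs slightly in the third decimal place.

```python
import math

def e_value(K, m, n, lam, S):
    """E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * S)

E = e_value(K=0.082, m=100, n=2e8, lam=0.219, S=103)
print(f"E = {E:.3f}")  # about 0.26
```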

In a typical current database search, a protein of length 250 might be compared to a protein database of 50 000 000 total residues. @Schee Xin Lin

• Doubling the length of either sequence will double the expected number of HSPs. • Increasing the score S reduces the expected number of HSPs exponentially (the higher the score, the lower the expected number of HSPs). • Thus, we anticipate that E is proportional to mn and also proportional to $e^{-\lambda S}$. @Schee Xin Lin
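Both proportionalities can be checked numerically. This is a minimal sketch that reuses the constants from the worked example as defaults; the function name is hypothetical.

```python
import math

def expected_hsps(m, n, S, K=0.082, lam=0.219):
    """Expected number of HSPs: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * S)

baseline = expected_hsps(m=100, n=2e8, S=103)
ratio_length = expected_hsps(m=200, n=2e8, S=103) / baseline  # query twice as long
ratio_score = expected_hsps(m=100, n=2e8, S=206) / baseline   # score twice as high

print(ratio_length)  # E scales linearly with m (and with n)
print(ratio_score)   # E shrinks by a factor of exp(-lambda * 103)
```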

[Figures: relationship between E and mn; relationship between E and $e^{-\lambda S}$] @Schee Xin Lin

HOW DOES PSI-BLAST WORK? @Aw Mei Yee

PSI-BLAST FLOW CHART [Figure: flow chart with steps 1 to 4] @Aw Mei Yee

PRINCIPLES PSI-BLAST principle: 1. A standard BLAST search is performed against a database using a substitution matrix (e.g. BLOSUM62). 2. A PSSM is constructed automatically from a multiple alignment of the highest-scoring hits of the initial BLAST search. Highly conserved positions receive high scores and weakly conserved positions receive low scores. @Aw Mei Yee

PRINCIPLES cont.. 3. The PSSM replaces the initial matrix (e.g. BLOSUM62) to perform a second BLAST search. 4. Steps 2 and 3 can be repeated, with the newly found sequences included to build a new PSSM. 5. We say that PSI-BLAST has converged if no new sequences are included in the last cycle. @Aw Mei Yee
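The iteration and convergence logic can be sketched as a small Python loop. This is a conceptual toy, not PSI-BLAST itself: `search` and `build_pssm` are hypothetical callables supplied by the caller, and the toy versions below only imitate a search that stops finding new sequences.

```python
def psi_blast_loop(query, search, build_pssm, max_iterations=5):
    """Iterate search -> PSSM -> search until no new sequences are found."""
    model = None      # None stands for the initial substitution matrix
    included = set()  # sequences accepted so far
    for iteration in range(1, max_iterations + 1):
        hits = set(search(query, model))
        new_hits = hits - included
        if not new_hits:
            return included, iteration, True  # converged
        included |= new_hits
        model = build_pssm(query, included)   # refine the profile
    return included, max_iterations, False

# Toy stand-ins: the "model" is just the number of included sequences.
def toy_search(query, model):
    rounds = {None: {"s1", "s2"}, 2: {"s1", "s2", "s3"}, 3: {"s1", "s2", "s3"}}
    return rounds[model]

def toy_build_pssm(query, included):
    return len(included)

found, n_iter, converged = psi_blast_loop("QUERY", toy_search, toy_build_pssm)
print(sorted(found), n_iter, converged)
```

In the toy run the third search returns nothing new, so the loop reports convergence at iteration 3.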

@Aw Mei Yee Sequence in FASTA format. Example of FASTA format:
>gi|18892811|gb|AAL80910.1| transposase [Pyrococcus furiosus DSM 3638]
MVVLSFQRKILIIKSEIYPIVSKHYPKNTRREVISLYDLITFAILAHLHFNGVYKHAYRVLIEEMKLFPK
IRYNKLTERLNRHEKLLLLAQEELFKKHAREYVRILDSKPIQTKELARKNRKDKEGSSEVISEKPAVGFV
PSKKKFYYGYKLTCYSDGNLLALLSVDPANKHDVSVVREKFWVIVEEFSGCFLFLDKGYVSRGLEEEFLR
FGVVYTPVKRGNQISNLEEKKFYKYLSDFRRRIETLFSKFSEFLLRPSRSVSLRGLAVRILGAILAVNLD
RLYNFTGGGN
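A FASTA record is a header line starting with ">" followed by one or more sequence lines. A minimal Python parser sketch follows; the function name and the short sample record are illustrative assumptions, not part of the slides.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

sample = ">example_id hypothetical protein\nMVVLS\nFQRKI\n"
for name, seq in parse_fasta(sample):
    print(name, len(seq), seq)
```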

Peptide Sequence Databases: try choosing refseq @Aw Mei Yee

Choose PSI BLAST @Aw Mei Yee

PSI-BLAST USES TWO E-VALUES: • the threshold E-value for the initial BLAST search. • the inclusion E-value for accepting sequences into the PSSM construction (default is 0.005). @Aw Mei Yee
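The roles of the two thresholds can be sketched in Python. The hit list is made up, and the reporting threshold of 10 is an assumed value for illustration; only the inclusion default of 0.005 comes from the slide.

```python
def select_hits(hits, report_threshold=10.0, inclusion_threshold=0.005):
    """Split (name, e_value) hits into reported hits and PSSM-included hits."""
    reported = [h for h in hits if h[1] <= report_threshold]
    included = [h for h in reported if h[1] <= inclusion_threshold]
    return reported, included

# Hypothetical hits as (sequence name, E-value).
hits = [("seqA", 1e-40), ("seqB", 0.003), ("seqC", 0.2), ("seqD", 50.0)]
reported, included = select_hits(hits)
print([name for name, _ in reported])  # shown in the output
print([name for name, _ in included])  # used to build the PSSM
```

Lowering the inclusion threshold (for example to 0.0001) makes the PSSM more conservative, because fewer borderline hits contribute to it.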

Try setting the threshold to 0.0001. The threshold (cut-off) can be changed as desired for the next iteration. Lastly, click BLAST to start the search. @Aw Mei Yee

OUTPUT @Aw Mei Yee

FIRST ITERATION Click Go for 2nd iteration @Aw Mei Yee

SECOND ITERATION Click Go for 3rd iteration @Aw Mei Yee

THIRD ITERATION Click Go for 4th iteration @Aw Mei Yee

FOURTH ITERATION @Aw Mei Yee

From the second iteration onward, PSI-BLAST E values are not directly comparable to those calculated by BLAST. • This is because BLAST scores the query against each database sequence using a substitution matrix (e.g. BLOSUM62) that contains a fixed value for each amino acid pair, whereas PSI-BLAST uses a position-specific matrix (PSSM) that changes with each iteration. @Aw Mei Yee

[Figure legend: sequences carried over from the previous iteration; newly found homologous sequences in the current iteration] @Aw Mei Yee

SAMPLE ALIGNMENT (HIT TABLE) • In the line between the query and the database sequence, identical residues are shown by the residue letter, and a "+" marks a conservative substitution (a positive score). • Gaps are shown with a "-" symbol. • The hit sequence is presented in the Sbjct: line and the query sequence in the Query: line. @Aw Mei Yee
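How the middle line of a BLAST alignment is formed can be sketched in Python: the residue letter for an identity, "+" for a positively scoring substitution, and a space otherwise. `POSITIVE_PAIRS` is a tiny illustrative subset chosen for this sketch, not a full substitution matrix, and the aligned strings are made up.

```python
# Illustrative subset of conservative (positively scoring) pairs only.
POSITIVE_PAIRS = {frozenset("IV"), frozenset("KR"), frozenset("DE")}

def midline(query, sbjct):
    """Build the line shown between the Query: and Sbjct: rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for q, s in zip(query, sbjct):
        if "-" in (q, s):
            out.append(" ")  # gap in either sequence
        elif q == s:
            out.append(q)    # identical residue
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            out.append("+")  # conservative substitution
        else:
            out.append(" ")
    return "".join(out)

query, sbjct = "MKV-LD", "MRIALE"
print("Query:", query)
print("      ", midline(query, sbjct))
print("Sbjct:", sbjct)
```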
