
Bioinformatics Approaches to Identifying Candidate Effector Molecules of S. typhimurium

Matthew Sylvester 12/1/03


Presentation Transcript


  1. Bioinformatics Approaches to Identifying Candidate Effector Molecules of S. typhimurium Matthew Sylvester 12/1/03

  2. [Figure: Endocytic Trafficking. Labels: Lamp-1, H+ ATPase, Cathepsins, Transferrin R, Man. 6-PR, SPI-1, SPI-2, ?, Salmonella-containing vacuole, bacterial effector proteins (SseJ, SifA, SseXs, and several others), Lysosome]

  3. Selection of S. typhimurium Proteins • Salmonella effectors are secreted into the host cell via either the Salmonella pathogenicity island 1 (SPI1) or SPI2 type three secretion system (TTSS) • We chose only those proteins shown experimentally in the literature to go out through one or both of these systems (see PubMed at http://ncbi.nlm.nih.gov) • The seventeen identified SPI1 and SPI2-associated effectors were considered as one group for subsequent analysis • As the N-terminal 150 amino acids have been shown to contain conserved sequences for several SPI2 effectors, we compared this region (Miao and Miller, 2000)

  4. Alignment of SPI-2 Effector Proteins Miao E and Miller S. A conserved amino acid sequence directing intracellular type III secretion by Salmonella typhimurium. PNAS. 2000, 97(13). Pp. 7539-7544. Published alignment of known and putative SPI2 effectors identified by a BLAST (Basic Local Alignment Search Tool) search and then aligned using ClustalW. Note the presence of the WEK(I/M)XXFF motif from approx. aa 31-38.

  5. BLAST • Tries to find the most “similar” proteins • Compares a query to each sequence in a database and gives each comparison a score (higher scores mean greater similarity) • Substitution-based scoring matrices assign a score to each aligned residue pair based on the probability of that substitution • Gaps are penalized with negative scores • The alignment score is the sum of the scores at each position • The significance of the overall alignment is given as a p-value or an E-value • E-value = expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
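
The scoring bullets above can be sketched in a few lines of Python. This is only an illustration of "sum the per-position scores, with negative scores for gaps"; the score table, the flat gap penalty, and the default mismatch score of -1 are made-up values, not a real BLOSUM matrix or BLAST's actual parameters.

```python
# Toy substitution scores (hypothetical values, not a real BLOSUM matrix).
TOY_SCORES = {
    ("A", "A"): 4, ("S", "S"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("A", "S"): 1, ("D", "E"): 2,
}
GAP_PENALTY = -4  # assumed flat penalty per gap position


def pair_score(a: str, b: str) -> int:
    """Score one aligned column; pairs missing from the table score -1."""
    if a == "-" or b == "-":
        return GAP_PENALTY
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), -1))


def alignment_score(query: str, subject: str) -> int:
    """Sum the per-position scores of two aligned, equal-length strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(query, subject))


# A/S + D/D + E/E + gap + A/A = 1 + 6 + 5 - 4 + 4
print(alignment_score("ADE-A", "SDEEA"))
```

Real BLAST also distinguishes gap opening from gap extension; the single flat penalty here is a simplification.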

  6. Building Substitution Matrices: Part I
     Blocks: local ungapped alignments with rows = protein segments and columns = amino acid positions

          1   A D E P Q D A
          2   A C E P D D A
          …   …
         10   S D E P Q D A

     New sequence: A D E P Q R A
     - Count the number of matches and mismatches between the new sequence and every other sequence in the block.
     - In position 1 we have 9 AA matches and 1 AS mismatch.
     Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. PNAS (1992). pp. 10915-10919.

  7. Building Substitution Matrices: Part II Next, sum the results for each column, store them in a table, and add the new sequence to the group. By successively adding new sequences, we get a table with all possible pairs. If we have 9 A’s and 1 S in the first column, we get 1 + 2 + … + 8 = 36 AA pairs, 9 AS (or SA) pairs, and 0 SS pairs. If w = width of the block in amino acid positions and s = number of sequences, we have w*s*(s-1)/2 total possible pairs. Here, we have 36 + 9 = 45, or 1*10*9/2 = 45.
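
As a check on the arithmetic, the pair counting can be sketched as follows. The function names are illustrative, and the column of 9 A's and 1 S is the slide's own example.

```python
from collections import Counter
from itertools import combinations


def count_column_pairs(column: str) -> Counter:
    """Count unordered residue pairs in one column of an ungapped block."""
    pairs: Counter = Counter()
    for a, b in combinations(column, 2):
        pairs["".join(sorted((a, b)))] += 1
    return pairs


def count_block_pairs(block: list[str]) -> Counter:
    """Sum the pair counts over every column of a block (one string per row)."""
    total: Counter = Counter()
    for column in zip(*block):
        total += count_column_pairs("".join(column))
    return total


column = "A" * 9 + "S"  # first column of the example: 9 A's and 1 S
pairs = count_column_pairs(column)
print(pairs["AA"], pairs["AS"], sum(pairs.values()))
```

For a block of width w with s rows, `sum(count_block_pairs(block).values())` equals w*s*(s-1)/2, which is the total quoted above.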

  8. Calculating the Lod (log-odds) Matrix • Let fij be the total number of i,j amino acid pairs in the frequency table (1<=j<=i<=20) • Then the observed proportion for each amino acid pairing is: $q_{ij} = f_{ij} / \sum_{i=1}^{20} \sum_{j=1}^{i} f_{ij}$ • We have fAA=36 and fAS=9, so qAA=36/45 and qAS=9/45

  9. Calculating the Lod Matrix II • Now we need the expected probabilities of occurrence for each amino acid pair • If we assume that the observed frequencies of each amino acid are the population frequencies, we have $p_i = q_{ii} + \sum_{j \neq i} q_{ij} / 2$ • For our example, pA=36/45+(9/45)/2=0.9 and pS=(9/45)/2=0.1 • Then the expected probability of occurrence (eij) is pipj for i=j and pipj+pjpi=2pipj for i!=j • We have expected probability of AA=0.9*0.9=0.81, AS=2*0.9*0.1=0.18, SS=0.1*0.1=0.01

  10. Calculating the Lod Matrix III • Then we calculate the log-odds score in bits as sij=log2(qij/eij), so if we see more pairs than expected, sij>0; if we see as many as expected, sij=0; and if we see fewer than expected, sij<0 • Multiplying sij by 2 (half-bit units) and rounding to the nearest integer, we obtain our values for the block substitution matrix (BLOSUM)
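
The three log-odds steps (observed proportions q, expected probabilities e, scores s) can be worked through on the example counts. This is a minimal sketch using the slides' definitions; the function name and the two-letter dictionary keys are illustrative choices.

```python
import math


def log_odds_scores(pair_counts: dict[str, int]) -> dict[str, float]:
    """Return s_ij = log2(q_ij / e_ij) in bits for every observed residue pair.

    Keys are two-letter strings such as "AA" or "AS" (unordered pairs).
    """
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # p_i = q_ii + sum over j != i of q_ij / 2
    p: dict[str, float] = {}
    for (a, b), q_ab in q.items():
        if a == b:
            p[a] = p.get(a, 0.0) + q_ab
        else:
            p[a] = p.get(a, 0.0) + q_ab / 2
            p[b] = p.get(b, 0.0) + q_ab / 2

    scores: dict[str, float] = {}
    for (a, b), q_ab in q.items():
        e_ab = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        if q_ab > 0:  # unobserved pairs have no finite log-odds score
            scores[a + b] = math.log2(q_ab / e_ab)
    return scores


scores = log_odds_scores({"AA": 36, "AS": 9})
half_bits = {pair: round(2 * s) for pair, s in scores.items()}  # BLOSUM-style units
print(scores, half_bits)
```

With only one column and two residue types, the observed and expected proportions are nearly equal (0.8 vs 0.81 for AA, 0.2 vs 0.18 for AS), so both scores round to 0. Scores only become informative once many blocks have been pooled.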

  11. Clustering • To prevent “double-counting” amino acid contributions from closely related proteins, sequences are clustered and each cluster is counted as a single sequence when counting amino acids • Thus, if two sequences are identical at >X% of their aligned positions, their contributions are averaged • In our example, if we were to cluster 8 of our sequences with A in the first position, we would now have 2 A’s and 1 S • These matrices are denoted BLOSUM X, such as BLOSUM 62
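
One way to picture the clustering step is to weight each sequence by 1/(cluster size) and skip pairs that fall inside a cluster. The sketch below assumes the cluster assignments are already known (how clusters are formed from the X% threshold is not shown), and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def weighted_column_pairs(column: str, clusters: list[int]) -> Counter:
    """Count residue pairs in one column with each sequence weighted by 1/cluster size.

    clusters[k] is the cluster id of sequence k; pairs inside one cluster are skipped.
    """
    sizes = Counter(clusters)
    pairs: Counter = Counter()
    for i, j in combinations(range(len(column)), 2):
        if clusters[i] == clusters[j]:
            continue
        weight = 1.0 / (sizes[clusters[i]] * sizes[clusters[j]])
        pairs["".join(sorted((column[i], column[j])))] += weight
    return pairs


# 9 A's and 1 S, with 8 of the A sequences placed in one cluster (id 0).
column = "A" * 9 + "S"
clusters = [0] * 8 + [1, 2]
clustered = weighted_column_pairs(column, clusters)
print(clustered)
```

The column now behaves like three sequences (2 A's and 1 S): one AA pair and two AS pairs, for 3*2/2 = 3 pairs in total instead of 45.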

  12. Substitution Matrix (log-odds) Based on observed frequencies of substitutions in related proteins; identical amino acids are given high positive scores, frequently observed substitutions get lower positive scores, and seldom observed substitutions get negative scores.

  13. Related Calculations • Relative entropy measures the average information, in bits per residue pair, that distinguishes an alignment from chance: $H = \sum_{i=1}^{20} \sum_{j=1}^{i} q_{ij} s_{ij}$ • Expected score in bit units

  14. Bioinformatics Approaches:Primary Structure

  15. Primary Sequence Search Methodology Hmmer search of aligned sequences: • Hmmer uses hidden Markov models to build a profile (a probability matrix of amino acids) from aligned sequences • The profile is searched against the appropriate genome database TRVI search allowing for gaps and substitutions: • A motif is developed by allowing for a flexible number of gaps wherever there are gaps in the alignment • Substitutions of amino acids with similar properties are allowed • The motif is searched against the appropriate genome database MEME/MAST search of unaligned sequences: • Identifies a specified number of domains (probability matrices) across a subset of the input sequences • The domains are searched against the appropriate genome database
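
A motif search that tolerates substitutions and a flexible gap can be approximated with regular expressions. This is not the TRVI tool itself: the substitution classes and the 2-3 residue spacer in the relaxed pattern are assumptions for illustration, and the test sequences are invented strings, not real proteins.

```python
import re

# WEK(I/M)XXFF written as a regular expression; X is any residue.
STRICT_MOTIF = re.compile(r"WEK[IM]..FF")

# Relaxed variant: hypothetical classes of similar residues at each position
# and a spacer of 2 or 3 residues instead of exactly 2.
RELAXED_MOTIF = re.compile(r"[WFY][ED][KR][IMLV].{2,3}[FYW][FYW]")


def find_motif(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


toy_exact = "MSTAWEKIQHFFGT"     # invented sequence containing the exact motif
toy_variant = "MSTAFDRLQHTYWGT"  # invented sequence with substitutions and a longer spacer

print(find_motif(STRICT_MOTIF, toy_exact))
print(find_motif(STRICT_MOTIF, toy_variant))
print(find_motif(RELAXED_MOTIF, toy_variant))
```

Loosening the pattern recovers variants that the strict motif misses, at the cost of more chance hits when scanning a whole genome.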

  16. How Hmmer Works:Profile Hidden Markov Models for Protein Sequence Analysis • http://hmmer.wustl.edu/

  17. Hmmer Architecture • Squares are match states (consensus positions), diamonds are insertions, circles are deletions and beginning/end. Arrows indicate state transitions.

  18. Hidden Markov Model Background From PMMB—Sandrine Dudoit See also http://www.ai.mit.edu/~murphyk/Bayes/rabiner.pdf

  19. More Hidden Markov Model Background

  20. Still More Background

  21. Hmmer Intro • Each set of M/D/I states is a node; the nodes are determined by the data and the multiple sequence alignment • Each M state aligns with a single amino acid and carries a vector of 20 probabilities determined by the proportion of times that each amino acid has shown up at that position in the multiple sequence alignment • Capable of handling gapped alignments • At each node either the M (amino acid aligned) or D state is used; I states occur between nodes and can self-transition • Arrows are transition probabilities and are estimated from the residues in each column of the multiple sequence alignment • S, N, C, T, J are “special states” that are algorithm-dependent and controlled externally
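
The match-state emission vectors described above are, at their simplest, per-column residue proportions. The sketch below computes only those raw proportions from a small alignment; Hmmer's handling of priors, sequence weighting, and its choice of which columns become match states are not reproduced, and the function name is illustrative.

```python
from collections import Counter


def match_emissions(alignment: list[str]) -> list[dict[str, float]]:
    """Return, for each alignment column, the proportion of each residue (gaps ignored)."""
    emissions = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        emissions.append({r: n / total for r, n in counts.items()} if total else {})
    return emissions


# The three rows shown explicitly in the block example from slide 6.
rows = ["ADEPQDA", "ACEPDDA", "SDEPQDA"]
emissions = match_emissions(rows)
print(emissions[0])  # first column: A twice, S once
```

Only residues seen in a column get a probability here; a real profile assigns nonzero probability to all 20 amino acids.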

  22. Intermediate Hmmer • Want to calculate P(S|M), where the sum over the space of all sequences should be 1 • …The rules of the HMM allow us to do this • This implies that insertion lengths follow a geometric distribution • From a multiple sequence alignment “seed”, Hmmer builds a consensus profile and searches databases with it
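
The geometric distribution of insertion lengths follows from the insert state's self-transition: each extra inserted residue costs one more self-loop. A small sketch, where the self-transition probability of 0.3 is an arbitrary illustrative value and the length is conditioned on the insert state having been entered:

```python
def insert_length_probability(k: int, self_transition: float) -> float:
    """P(insert run has length k) for k >= 1, given the insert state was entered.

    The run stays in the insert state k - 1 more times, then leaves once.
    """
    if k < 1:
        raise ValueError("length must be at least 1")
    return self_transition ** (k - 1) * (1.0 - self_transition)


t = 0.3  # assumed I -> I transition probability
probabilities = [insert_length_probability(k, t) for k in range(1, 200)]
print(probabilities[:3], sum(probabilities))
```

The probabilities fall by a constant factor t with each added residue and sum to 1 over all lengths, which is the normalization the first bullet asks for.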

  23. Hmmer Results

  24. ClustalW Alignment of SPI1 Effectors

  25. ClustalW Alignment of All Known Effectors

  26. Analysis of TRVI-Putative Cytoplasmic Proteins • Literature search • YciE not found • YciF classified as a putative structural protein by Blattner et al. • BLAST searches • STM0274 is almost exactly SciI (S. typhimurium); other homologies to ImpC and ImpD (Rhizobium leguminosarum) and to conserved hypotheticals; no literature on SciI, ImpC, or ImpD • YciF has homologies to other putative structural proteins in Shigella and E. coli. Also homologous to several conserved hypotheticals • YciE has homologies to YciE from E. coli and other putative cytoplasmic/structural proteins in other species (YciE and YciF do not hit each other) • STM3767 is homologous to a 4-hydroxy-2-oxoglutarate aldolase and several hypothetical proteins • STM4192 is homologous to a nucleoprotein/polynucleotide-associated enzyme, hypothetical protein YaiL from E. coli, and hypotheticals (YaiL not in literature)

  27. Analysis of TRVI-Microarray Proteins • SseJ and YciE show up • fruF is part of the phosphoenolpyruvate: fructose phosphotransferase system • STM1181 is a putative flagella basal body part

  28. S. typhimurium MEME Motif Summary

  29. MEME MAST Analysis
      MEME search results using MAST and searched by domain:
      Domain 1: SseI, SlrP, SopA (putative effector proteins), YebE
      Domain 2: SseI, SlrP, YeeY, YeaH (putative cytoplasmic protein)
      Domain 3: SseI, HepA/RapA, putative inner membrane protein (STM1698)
      Domain 4: YfeC, putative periplasmic proteins (STM3783 and STM3605)
      Domain 5: RffG, OmpR (regulatory protein), PrpA, SirC (invasion regulator)
      Domain 6: SseI, SlrP, YadF, YaiB, PrpC (protein phosphatase), InvB (part of needle complex)
      Domain 7: CitC (citrate carrier), YcfN, YjeQ, STM0611, STM2406
      Domain 8: DdlA (D-alanine ligase), GlyS, PgtA (phosphoglycerate transporter), STM4502
      • Domains 1, 3, and 5 look to be important for SPI2 secretion
      • The other domains are important for small, related subsets of proteins

  30. MEME Including Putative Cytoplasmic Proteins

  31. S. typhimurium Search Results Summary
      Hmmer search of aligned sequences: Only the input sequences (+ 2 theoretically secreted proteins) were returned. SPI1 and SPI2 effectors both have significant e-values from a combined matrix.
      TRVI search allowing for gaps and substitutions: 56 hits returned. Possibly interesting hits include SseI, 5 LysR family proteins, 5 putative cytoplasmic proteins, 1 putative periplasmic protein, 2 inner membrane proteins, and 3 flagellar proteins. 4 proteins (FruF, SseJ, YciE, and a putative flagellar protein) were also identified in a DNA microarray screen under SPI2-inducing conditions with cholesterol.
      MEME search results using MAST and searched by domain:
      Domain 1: SseI, SlrP, SopA (putative effector proteins), YebE
      Domain 2: SseI, SlrP, YeeY, YeaH (putative cytoplasmic protein)
      Domain 3: SseI, HepA/RapA, putative inner membrane protein (STM1698)
      Domain 4: YfeC, putative periplasmic proteins (STM3783 and STM3605)
      Domain 5: RffG, OmpR (regulatory protein), PrpA, SirC (invasion regulator)
      Domain 6: SseI, SlrP, YadF, YaiB, PrpC (protein phosphatase), InvB (part of needle complex)
      Domain 7: CitC (citrate carrier), YcfN, YjeQ, STM0611, STM2406
      Domain 8: DdlA (D-alanine ligase), GlyS, PgtA (phosphoglycerate transporter), STM4502

  32. Primary Structure Conclusions • The best lead may be YciE, a putative cytoplasmic protein found with two different search methods • The methods did not give the same output • Hypothetical proteins found in the literature such as SipD, SptP (SPI1) and SpiC, SrfJ, SseB,C,D (SPI2) were not found • Not all proteins that go out via SPI2 have the WEK(I/M)XXFF motif • There is no clear SPI1 motif

  33. Secondary Structure Prediction • Psipred structure prediction server used • Predictions made by two feed-forward neural networks based on PSI-BLAST output • N-terminal motif (MEME 3)—random coil in all SPI2 proteins • First SPI2 motif at aa 31-38 (MEME 1)—examples are SseJ, SifA, SifB(+F), SlrP(+F), SseI, SspH1(+F) • Second SPI2 motif at aa 105-120 (no MEME)—entirely random coil except for a small segment of SspH2

  34. Secondary Structure Prediction of SifA

  35. Alpha-helical Wheel (SifA, SifB) WEK(I/M)XXFF is the conserved motif among SPI2 effectors from aa 34-41 (positions 1,2,3,4,7). All show this profile except SseJ (position 7 is polar, but the face is still hydrophobic).
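
A helical wheel places successive residues 100 degrees apart around the helix axis (an ideal alpha helix has 3.6 residues per turn). The sketch below computes only those angles for the motif template; it does not reproduce the slide's figure or judge hydrophobicity, and the function name is illustrative.

```python
def helical_wheel_angles(segment: str, degrees_per_residue: float = 100.0):
    """Return (position, residue, angle in degrees) for each residue on the wheel."""
    return [
        (i + 1, residue, (i * degrees_per_residue) % 360.0)
        for i, residue in enumerate(segment)
    ]


# Motif template from the slide, with I chosen at position 4 and X for any residue.
wheel = helical_wheel_angles("WEKIXXFF")
for position, residue, angle in wheel:
    print(position, residue, angle)
```

Residues whose angles fall close together sit on the same face of the helix, which is what the wheel diagrams for SifA, SifB, and SspH2 are used to show.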

  36. SspH1 Secondary Structure

  37. SspH2 Alpha-Helical Wheel
