Common parameters • At the beginning, one needs to set up the search parameters. • http://human.thegpm.org
Common parameters • Most important: the input experimental spectra. • Self-explanatory.
Common parameters • Taxon and database • Self-explanatory. • E.g. samples from human cells should be queried against a human protein database. • Sometimes protein sequence libraries are available.
Common parameters • Parent mass tolerance • If it is much smaller than the optimum: • the correct peptide can be eliminated from the search space, • execution time decreases.
Common parameters • Parent mass tolerance • If it is much larger than the optimum: • the significance of the scores decreases, • execution time increases.
Common parameters • Parent mass tolerance • Usually around 1 Da.
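The effect of the parent mass tolerance can be illustrated with a minimal sketch. The function names `peptide_mass` and `candidates_within_tolerance` are hypothetical and not taken from any search engine; the residue masses are standard monoisotopic values, and the measured mass is made up for the example.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def candidates_within_tolerance(parent_mass, peptides, tol_da):
    """Keep only peptides whose mass lies within +/- tol_da of the parent mass."""
    return [p for p in peptides if abs(peptide_mass(p) - parent_mass) <= tol_da]


peptides = ["AELDLNMTR", "LLHGDPGEEDK", "SAEDLEADK", "MEICRGLR"]
# Hypothetical measurement: the true peptide is AELDLNMTR, measured 0.3 Da off.
measured = peptide_mass("AELDLNMTR") + 0.3

wide = candidates_within_tolerance(measured, peptides, tol_da=1.0)
narrow = candidates_within_tolerance(measured, peptides, tol_da=0.1)
print(wide, narrow)
```

With the 0.1 Da window the correct peptide falls outside the search space, which is the failure mode described for a tolerance that is too small.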
Common parameters • Fragment ion match tolerance • Depends on the instrument accuracy. • If it is much smaller than the optimum, matches will be lost.
Common parameters • Fragment ion match tolerance • If it is much smaller than the optimum: • correctly matched peaks will be lost, • the FDR and the number of false negatives increase, and sensitivity decreases.
Common parameters • Fragment ion match tolerance • If it is much larger than the optimum: • many theoretical peaks will match an experimental peak, • random scores increase and statistical significance decreases.
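A small sketch of peak matching shows both failure modes. The peak lists below are invented for illustration, and `count_matches` is a hypothetical helper, not the scoring function of any particular search engine.

```python
def count_matches(theoretical, observed, tol):
    """Count theoretical peaks that have at least one observed peak within +/- tol (Da)."""
    return sum(
        1 for t in theoretical
        if any(abs(t - o) <= tol for o in observed)
    )


# Hypothetical m/z values: three true fragment peaks measured with some error,
# plus two noise peaks (443.10 and 500.00) that belong to nothing.
theoretical = [175.12, 246.16, 345.22, 444.29]
observed = [175.30, 246.00, 345.60, 443.10, 500.00]

print(count_matches(theoretical, observed, tol=0.4))   # 3: the true matches
print(count_matches(theoretical, observed, tol=0.05))  # 0: true matches are lost
print(count_matches(theoretical, observed, tol=2.0))   # 4: a noise peak matches too
```

The too-large tolerance does not lose matches, but it lets a noise peak count as a match, which is how random scores rise.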
Fragment ion tolerance (T) • T = 0.4 Da (correct) • T = 0.05 Da (too small) • T = 2.0 Da (too large)
Fragment ion tolerance (T)
T                     Proteins   Homologs   Proteins (total)
0.4 Da (correct)      217        713        930
0.05 Da (too small)   132        406        538
2.0 Da (too large)    197        589        786
Common parameters • Instrument • Some database search programs allow you to select the instrument type, e.g. ESI-QUAD or Quad-TOF. • This fine-tunes the search engine by determining which fragment ion series are used for scoring. • E.g.: immonium ions, a-series ions, b-, c-, x-, a-NH3, z+H, y-H2O series, etc.
Common parameters • Enzyme • The enzyme used for digestion during biological sample preparation. • It is also used for the in silico digestion of protein sequences to generate peptides.
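In silico digestion can be sketched as follows. This assumes trypsin with the commonly used rule "cleave after K or R, but not before P"; real search engines let you configure the rule, and `tryptic_digest` is a hypothetical name. The sequence is the start of the P01127 entry shown on a later slide.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from an in silico tryptic digest.

    Assumed rule: cleave after K or R unless the next residue is P.
    Peptides spanning up to `missed_cleavages` uncut sites are included.
    """
    # Positions where the chain is cut (index of the first residue after the cut).
    sites = [
        i + 1
        for i, aa in enumerate(protein[:-1])
        if aa in "KR" and protein[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


seq = "MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIR"
print(tryptic_digest(seq))
print(tryptic_digest(seq, missed_cleavages=1))
```

Allowing one missed cleavage adds peptides such as MNRTFGQVVAR to the candidate list, so the search space grows with this parameter.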
Common parameters • E-value cut off
Common parameters • Ion mass search type • Monoisotopic (default) • More accurate. • Average • Might need a larger fragment ion tolerance.
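To see why the two mass types are not interchangeable, the sketch below computes both masses for the peptide TFGQVVAR. The residue masses are standard values, but only the residues needed for this one peptide are listed, and `peptide_mass` is a hypothetical helper.

```python
# Residue masses in Da: (monoisotopic, average). Subset for this example only.
RESIDUES = {
    "T": (101.04768, 101.1051),
    "F": (147.06841, 147.1766),
    "G": (57.02146, 57.0519),
    "Q": (128.05858, 128.1307),
    "V": (99.06841, 99.1326),
    "A": (71.03711, 71.0788),
    "R": (156.10111, 156.1875),
}
WATER = (18.01056, 18.01528)  # (monoisotopic, average)


def peptide_mass(seq, average=False):
    """Neutral peptide mass using monoisotopic or average residue masses."""
    k = 1 if average else 0
    return sum(RESIDUES[aa][k] for aa in seq) + WATER[k]


mono = peptide_mass("TFGQVVAR")
avg = peptide_mass("TFGQVVAR", average=True)
print(round(mono, 4), round(avg, 4), round(avg - mono, 4))
```

For this eight-residue peptide the two values already differ by roughly half a dalton, and the gap grows with peptide size.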
Common parameters • Charge state • Allowing too high a charge state increases the FDR.
Common parameters • Decoy search • Includes a reversed version of the database in the peptide identification. • Provides more accurate p-value and FDR estimation. • Can double the search time.
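A minimal sketch of a decoy search follows. It assumes the simple estimator FDR = (decoy hits) / (target hits) above a score threshold, which is one common choice and not necessarily what a given search engine reports. The scores are invented, and both function names are hypothetical.

```python
def reversed_decoy(sequence):
    """Build a decoy entry by reversing a target protein sequence."""
    return sequence[::-1]


def fdr_at_threshold(psms, threshold):
    """Estimate FDR among matches scoring >= threshold.

    psms: list of (score, is_decoy) pairs, one per peptide-spectrum match.
    Assumed estimator: number of decoy hits / number of target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical search results against a combined target + decoy database.
psms = [(32, False), (15, False), (5, True), (4, False), (4, True), (3, False)]

print(reversed_decoy("AELDLNMTR"))
print(fdr_at_threshold(psms, threshold=10))  # strict threshold
print(fdr_at_threshold(psms, threshold=4))   # permissive threshold
```

Because every target entry gets a decoy counterpart, the database is about twice as large, which is why the search time can double.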
Common parameters • Error tolerant search • A large number of spectra remain without a significant score. • A reasonable number of fragment ion peaks might not have matched. • Possible causes: • Underestimated mass measurement error (should be visible in the peptide view graphs), • Incorrect determination of the precursor charge state, • Peptide sequence is not in the database, • Missed cleavage and unexpected cleavage, • Unexpected chemical and post-translational modifications.
Input data • Peptide assignment • Validation • Protein inference • Interpretation • Quantitation
Input data: experimental spectra, protein sequence DB
Scores (candidate number. score): 13. 15 • 6. 4 • 1. 4 • 9. 3 • 4. 3 • 3. 2 • 7. 2 • 11. 2 • 8. 1 • 10. 1 • 2. 1 • 5. 1 • 12. 1
• Score: 32, Peptide: SHLITLLLFLFHSETICR, Cn = (32-4)/32 = 0.875
• Score: 4, Peptide: AELDLNMTR, Cn = (4-4)/4 = 0
• Score: 3, Peptide: MEICRGLR, Cn = (3-3)/3 = 0
• Score: 15, Peptide: LLHGDPGEEDK, Cn = (15-4)/15 = 0.733
• Score: 4, Peptide: MDHPEDESHSEK
• Score: 5, Peptide: SAEDLEADK
• Score: 3, Peptide: SIEAKLTLR
Keep the peptide assignment that exceeds a certain limit.
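The Cn values on this slide follow the pattern (score - other score) / score. The sketch below assumes that the second number is the score of the next-best candidate for the same spectrum, which the slide does not state explicitly; `delta_cn` and the cut-off of 0.5 are hypothetical.

```python
def delta_cn(best_score, next_best_score):
    """Normalised gap between the best and the next-best candidate score."""
    return (best_score - next_best_score) / best_score


# (peptide, best score, assumed next-best score), taken from the slide.
assignments = [
    ("SHLITLLLFLFHSETICR", 32, 4),
    ("AELDLNMTR", 4, 4),
    ("MEICRGLR", 3, 3),
    ("LLHGDPGEEDK", 15, 4),
]

CUTOFF = 0.5  # hypothetical limit, for illustration only
kept = [pep for pep, best, nxt in assignments if delta_cn(best, nxt) > CUTOFF]
print(kept)
```

With this cut-off only the two assignments whose best score clearly stands out from the runner-up are kept.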
Peptide assignment • Unexpected cleavages • Input data: experimental spectra, protein sequence DB • Scores: 1. 2
Candidate peptides: TFGQVVAR • FGQVVAR • GQVVAR • QVVAR • VVAR • VAR • AR • TFGQVVA • TFGQVV • TFGQV • TFGQ • TFG • TF
Spectra comparison against the protein sequence DB:
>IPI:IPI00000044.1|SWISS-PROT:P01127
MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA
Peptide assignment • Missed cleavages • Input data: experimental spectra, protein sequence DB • Scores: 1. 2 • 2. 2 • 3. 1
Spectra comparison against the protein sequence DB:
>IPI:IPI00000044.1|SWISS-PROT:P01127
MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA
Common parameters • Automatic error tolerant search • Chemical and post-translational modifications (PTMs) • Fixed modifications (always change the mass of the amino acid) • Variable modifications (may or may not change the mass of a residue) • Search engines iteratively try all combinations of the possible PTMs.
Common parameters • Automatic error tolerant search • More peptides can be identified. • Greatly enlarges the search space. • Increases the execution time. • Decreases the statistical significance and increases the FDR.
Common parameters • Automatic error tolerant search • To reduce the search space, a two-pass approach is applied. • 1st pass: • Identification of perfect peptides (no PTMs, perfect digestion). • 2nd pass: • Only proteins with at least one peptide identified in the 1st pass are passed on. • Extensive search in the reduced set of protein sequences, including missed and unexpected cleavages, PTMs, point mutations, etc.
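The growth of the search space with variable modifications can be sketched by enumerating every modified form of a peptide. The mass shifts are standard values for oxidation (+15.9949 Da on M) and phosphorylation (+79.9663 Da on S, T, Y); the choice of these two modifications and the name `modified_forms` are assumptions made for the example.

```python
from itertools import product

# Variable modifications considered: residue -> mass shift in Da.
VARIABLE_MODS = {
    "M": 15.9949,  # oxidation
    "S": 79.9663,  # phosphorylation
    "T": 79.9663,
    "Y": 79.9663,
}


def modified_forms(peptide, variable_mods=VARIABLE_MODS):
    """Enumerate all combinations of modified / unmodified sites.

    Each form is a tuple of per-residue mass shifts (0.0 means unmodified).
    A peptide with n modifiable residues yields 2**n forms.
    """
    options = [
        (0.0, variable_mods[aa]) if aa in variable_mods else (0.0,)
        for aa in peptide
    ]
    return list(product(*options))


for pep in ["AELDLNMTR", "MDHPEDESHSEK"]:
    print(pep, len(modified_forms(pep)))
```

Every extra modifiable site doubles the number of candidate forms per peptide, which is why the execution time rises and the scores lose significance.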
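The two-pass idea can be sketched with a toy model in which "identified peptides" are plain sequences rather than spectra. Everything here is a simplification: the digestion rule ignores proline, the second pass uses substring matching as a stand-in for an extensive search, and the protein names and sequences are invented.

```python
def tryptic_peptides(protein):
    """Perfect digestion: cut after every K or R (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def two_pass_search(observed, database):
    """Return {protein name: observed peptides found in it} after two passes."""
    observed = set(observed)
    # 1st pass: keep proteins with at least one perfectly digested peptide.
    first_pass = [
        name for name, seq in database.items()
        if observed & set(tryptic_peptides(seq))
    ]
    # 2nd pass: relaxed search (any subsequence), but only in the kept proteins.
    return {
        name: sorted(p for p in observed if p in database[name])
        for name in first_pass
    }


database = {"P1": "MNRTFGQVVARLVSAEGDK", "P2": "AAAGQVVARCCCK"}
observed = ["TFGQVVAR", "GQVVAR", "LVSAEG"]
result = two_pass_search(observed, database)
print(result)
```

GQVVAR also occurs inside P2, but P2 has no perfect peptide in the first pass, so the relaxed search never visits it. That is the reduction in search space the slide describes.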
Common parameters • Output parameters • Mainly about formatting the result files: what to show and in how much detail.
Common parameters • Other program-specific parameters • Different for X!Tandem, Mascot, SEQUEST, etc.
Good spectrum, good score, bad annotation • Rare if the p-value is significant. • Good spectrum, bad score, bad annotation • The peptide might be modified, imperfectly digested, or not in the database.
Trans-Proteomic Pipeline (TPP) • Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for the analysis of LC/MS/MS proteomics data. • TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. • The XML backbone of this pipeline enables a uniform analysis for LC/MS/MS data generated by a wide variety of mass spectrometer types, and assigned peptides using a wide variety of database search engines.
Summary • Protein identification from MS/MS data is not a black box. • Always look at the results and understand how it