Peptide-assisted annotation of the Mlp genome
Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin
Objective
• Use peptide libraries to validate the in silico prediction of gene models (GM)
• Assumption: « if a peptide is detected, then there must be a gene that encodes it »
• Mapping peptides onto a translated genome sequence provides the « correct frames of translation »
Methodology (hardware)
• Extraction: proteins extracted from urediniospores (3729) and separated by 1D SDS-PAGE
• Slicing: gel cut into 64 slices
• Digestion and elution: in-gel trypsin digestion and peptide elution on a Waters MassPREP station
• LC-MS/MS: peptide MS/MS data acquisition on a ThermoElectron LTQ, followed by bioinformatic analysis
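Search engines such as Mascot and Sequest compare the acquired spectra with peptides generated by an in silico digestion of each protein database. A minimal Python sketch of that digestion, assuming the simplified trypsin rule (cleave after K or R unless the next residue is P); the function name, the minimum length and the missed-cleavage parameter are illustrative, not settings used in this study:

```python
import re


def tryptic_digest(protein: str, min_len: int = 6, max_missed: int = 0) -> list[str]:
    """Hypothetical helper: in silico trypsin digestion.

    Simplified rule: cleave after K or R unless the next residue is P.
    min_len and max_missed are illustrative defaults, not study settings.
    """
    # Zero-width split after every K or R that is not followed by P
    # (splitting on empty matches requires Python 3.7+); drop empty pieces.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to max_missed following fragments to model missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


print(tryptic_digest("MAKPLRGEKAAAR", min_len=1, max_missed=1))
# ['MAKPLR', 'MAKPLRGEK', 'GEK', 'GEKAAAR', 'AAAR']
```

The K in "MAKP" is not cleaved because it is followed by P; allowing one missed cleavage adds the joined fragments "MAKPLRGEK" and "GEKAAAR".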
Methodology (bioinformatics)
• Protein databases built from:
  – the Gene catalog (16694 GM)
  – the 6-frame translation of the genome
• Spectral identification by sequence database searching, with Mascot and Sequest against each database
• Statistical validation of peptide identifications
• 1 - Comparison of the results from both databases
• 2 - Comparison of peptides and GM (validation/correction of genome annotations)
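The 6-frame translation database, and the mapping of a peptide back to its genomic frame and coordinates, can be sketched as below. This is a minimal illustration assuming the standard genetic code (NCBI translation table 1) and a single DNA sequence; the function names are hypothetical and this is not the pipeline used to build the actual database:

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate complete codons; stops become '*', codons with other bases 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def locate_peptide(dna: str, peptide: str) -> list[tuple[str, int, int]]:
    """Find every occurrence of `peptide` in the six frames.

    Returns (frame, start, end) in 1-based, inclusive forward-strand coordinates.
    """
    n = len(dna)
    span = 3 * len(peptide)
    hits = []
    for frame, protein in six_frame_translation(dna).items():
        offset = int(frame[1]) - 1
        i = protein.find(peptide)
        while i != -1:
            first = offset + 3 * i      # 0-based first base on the translated strand
            last = first + span - 1
            if frame[0] == "+":
                hits.append((frame, first + 1, last + 1))
            else:                       # map reverse-complement positions to the forward strand
                hits.append((frame, n - last, n - first))
            i = protein.find(peptide, i + 1)
    return hits


print(six_frame_translation("ATGGCCTAA")["+1"])   # MA*
print(locate_peptide("ATGGCCTAA", "LGH"))          # [('-1', 1, 9)]
```

Stop codons appear as '*' in each frame; one option when building a search database is to split each frame at stops into ORF-like segments (not shown).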
Mlp proteomic results so far
• 691 000 MS/MS spectra obtained from the total proteins
• Unique peptides identified:
  – Gene catalog db (Mascot + Sequest): 10980
  – 6-frame translation db (Mascot only): 4699 peptides matching GM, plus 352 unique peptides that do not match any GM of the Gene catalog
• False discovery rate below 1.6%
Peptide frequency distribution on GM
• The 10980 + 4699 peptides represent assignments for nearly 10% of the Gene catalog, i.e. 1659 GM
• Mean: 9 peptides covering 134 AA per GM
[Figure: histogram of the number of gene models (y axis) by number of peptides per gene model (x axis)]
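The per-GM statistics summarised above (peptides per GM, residues covered per GM, and the histogram) can be computed from peptide-to-GM assignments. A minimal sketch, assuming each assignment is a hypothetical (gm_id, start, end) tuple in 1-based protein coordinates; overlapping peptides are merged so that shared residues are counted only once:

```python
from collections import Counter, defaultdict
from statistics import mean


def peptide_coverage(hits: list[tuple[str, int, int]]) -> dict[str, tuple[int, int]]:
    """Return {gm_id: (number of peptides, number of residues covered)}."""
    by_gm = defaultdict(list)
    for gm_id, start, end in hits:
        by_gm[gm_id].append((start, end))
    stats = {}
    for gm_id, intervals in by_gm.items():
        intervals.sort()
        covered = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:        # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:                           # gap: close the current block, open a new one
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        stats[gm_id] = (len(intervals), covered)
    return stats


def summarize(stats: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Mean peptides and mean covered residues over GM with at least one peptide."""
    return mean(n for n, _ in stats.values()), mean(aa for _, aa in stats.values())


def histogram(stats: dict[str, tuple[int, int]]) -> Counter:
    """Number of GM for each number of peptides per GM (the distribution plotted above)."""
    return Counter(n for n, _ in stats.values())


stats = peptide_coverage([("GM1", 1, 10), ("GM1", 5, 15), ("GM1", 20, 25), ("GM2", 1, 8)])
print(stats)             # {'GM1': (3, 21), 'GM2': (1, 8)}
print(summarize(stats))  # (2, 14.5)
```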
Automated classification of the peptides with no hit (352) on the Gene catalog
• 5’ extension of a predicted GM: peptide(s) located within 1000 bp upstream of the predicted GM start codon
• 3’ extension of a predicted GM: peptide(s) located within 1000 bp downstream of the predicted GM stop codon
• 5’ and 3’ extension of a predicted GM: peptides located both within 1000 bp upstream of the start codon and within 1000 bp downstream of the stop codon
• Internal extension of a predicted GM: peptide(s) located within the GM
• New GM: no predicted GM in the vicinity of the peptide(s)
Corrections/additions to the Gene catalog
• Mapping the peptides with no hit onto the genome led to the following modifications (total: 172)
Manual curation – Internal extension
• EuGene’s prediction is OK
Summary – Peptide-assisted genome annotation
With little resources ($6000 worth of materials and services, and a few weeks of labour), our proteomic analysis:
• Validated 10% of the predicted GM
• Corrected/found > 170 GM
According to the manual curation accomplished so far, it appears that EuGene had predicted most of the > 170 corrected/found GM
Perspectives
• Analyse the Sequest output obtained from the 6-frame translation database: Mascot identified 5051 peptides (352 with no hits on the Gene catalog); Sequest: to be determined
• A quantitative proteomic approach (iTRAQ) will be used to compare protein complexes from urediniospores, germinated urediniospores and haustoria
Available material
• Our set of peptide spectra from urediniospore proteins is available to validate new GM predictions
• The peptide GFF files will be made available to the Melampsora community
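A peptide GFF file of the kind mentioned above could be written in GFF3 format (nine tab-separated columns after a `##gff-version 3` header). A minimal sketch; the input dictionary keys, the "Mascot" source label and the "polypeptide" feature type are assumptions, not the format of the files actually released:

```python
from urllib.parse import quote


def peptide_gff3(hits: list[dict], source: str = "Mascot", feature_type: str = "polypeptide") -> str:
    """Render peptide genomic hits as GFF3 text.

    hits: hypothetical dicts with keys contig, start, end (1-based, inclusive),
    strand ('+' or '-'), pep_id and sequence.
    """
    lines = ["##gff-version 3"]
    for h in hits:
        # Percent-encode attribute values so reserved characters (; = , &) stay safe.
        attrs = f"ID={quote(h['pep_id'], safe='')};Name={quote(h['sequence'], safe='')}"
        columns = [h["contig"], source, feature_type, str(h["start"]), str(h["end"]),
                   ".", h["strand"], ".", attrs]   # score and phase left as '.'
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


print(peptide_gff3([{"contig": "scaffold_1", "start": 100, "end": 129, "strand": "+",
                     "pep_id": "pep1", "sequence": "LVNELTEFAK"}]), end="")
```

A peptide spanning an intron would need one line per exonic segment sharing the same ID; this sketch assumes contiguous hits.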
Finding the peptides on the different model prediction sets
[Table: Model prediction set | Total GM | GM validated | %]
• Do we need to perform a new spectra search on all the model prediction sets?
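The table columns (total GM, GM validated, %) can be filled from the set of GM that received at least one peptide. A minimal sketch with hypothetical inputs, a dictionary of GM identifiers per prediction set and a set of validated identifiers:

```python
def validation_table(prediction_sets: dict[str, list[str]],
                     validated_ids: set[str]) -> list[tuple[str, int, int, float]]:
    """Rows of (model prediction set, total GM, GM validated, % validated)."""
    rows = []
    for name, gm_ids in prediction_sets.items():
        total = len(gm_ids)
        validated = len(set(gm_ids) & validated_ids)
        percent = 100.0 * validated / total if total else 0.0
        rows.append((name, total, validated, round(percent, 1)))
    return rows


sets = {"setA": [f"a{i}" for i in range(10)], "setB": ["b1", "b2", "b3", "b4"]}
print(validation_table(sets, {"a1", "a2", "b1"}))
# [('setA', 10, 2, 20.0), ('setB', 4, 1, 25.0)]
```

For the Gene catalog, 1659 validated GM out of 16694 gives 100 × 1659 / 16694 ≈ 9.9%, the "nearly 10%" reported earlier.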