
Guidelines for sequence reports


aderyn


  1. Guidelines for sequence reports

  2. Outline • Summary • Results & Discussion • Sequence identification • Function assignment • Fold assignment • Identification of functional residues • Methods • Web tools: list which ones you used • References • E.g. functional characterization • Maximum length: ten pages

  3. Sequence identification • What is the source of the query sequence? • Database search tools • Blast, PSI-Blast, PFAM-search • Superfamily, GTG, PFAM-squared • Dali (structures only) • All tools give a list of similar sequences. • → Nearest neighbour indicates species or taxon. • → Some sequences are classified in families.

  4. Function assignment • Many proteins are hypothetical; look further down the list for informative functional annotations • If sequence neighbour list shows many different functions at similar distances, build a sequence-tree to see if query sequence groups with one particular function.
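The "look further down the list" advice can be sketched in a few lines of Python. This is only an illustration: the function name `first_informative` and the list of uninformative keywords are assumptions, and the two hits are the GTG Blast neighbours quoted in the sequence_9A example of this deck.

```python
# Words treated as "no functional information"; this list is an assumption
# and should be adapted to the annotations you actually see.
UNINFORMATIVE = ("hypothetical", "uncharacterized", "unknown function", "unnamed protein")


def first_informative(hits):
    """Return the first hit whose description carries a functional annotation.

    `hits` is a list of (description, percent_identity) tuples, already
    sorted best-first, as a database search tool would report them.
    Returns None if every description looks uninformative.
    """
    for description, identity in hits:
        text = description.lower()
        if not any(word in text for word in UNINFORMATIVE):
            return description, identity
    return None


# Neighbour list for sequence_9A, best hit first.
hits = [
    ("HYPOTHETICAL 39.5 KDA PROTEIN", 62),
    ("5-oxo-L-prolinase, putative", 61),
]
print(first_informative(hits))
```

The top hit is skipped because it is only "hypothetical"; the second hit, at nearly the same identity, supplies a candidate function that still needs to be checked against the rest of the neighbour list.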

  5. Fold assignment • Structures are conserved within families. E.g. PFAM family identification allows you to transfer fold (if no direct hit to PDB). • Sometimes the link may be at clan level (still homologous, conserved fold). • Not all homologous relationships are classified as such in databases. Evidence for remote homology: common fold, common conserved residues, similarity of function

  6. Identification of functional residues • Multiple sequence alignment (MSA) of sufficiently diverse sequences highlights functional residues. Use structural model to identify sites. • It is often difficult to make a good MSA between distant sequences. • Structural alignments show sharp signatures & functional sites • Compare well-aligned set and its secondary structure prediction to structural alignment. If you find conserved SSEs and conserved residues in proper succession, it strengthens the hypothesis of homology.
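One simple way to see which MSA columns stand out is to compute, per column, the fraction of sequences sharing the most common residue. The sketch below assumes the alignment is already available as equal-length strings with '-' for gaps; the function names and the toy alignment are hypothetical, not taken from any of the tools listed in this deck.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, (most common residue, fraction of sequences with it).

    `alignment` is a list of equal-length aligned sequences; '-' marks a gap.
    Gaps count against conservation, so a gappy column scores low.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result


def conserved_positions(alignment, threshold=1.0):
    """1-based alignment columns whose top residue reaches `threshold`."""
    scores = column_conservation(alignment)
    return [i + 1 for i, (_, frac) in enumerate(scores) if frac >= threshold]


# Toy alignment, invented for illustration only.
toy = [
    "MHADLK",
    "MHGDIK",
    "LHSD-K",
    "MHTDVR",
]
print(conserved_positions(toy))        # fully conserved columns
print(conserved_positions(toy, 0.75))  # columns conserved in at least 3 of 4 sequences
```

Columns that remain fully conserved across a sufficiently diverse set are the candidates to map onto the structural model. With closely related sequences nearly every column is conserved and the signal disappears, which is why diversity matters.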

  7. Example: sequence_9A • Nearest neighbours • GTG server’s Blast search (old database): • 601959|AAG04994|AAG04994 HYPOTHETICAL 39.5 KDA PROTEIN at 62 % identity, alignment score 441 bits, evalue e-123 • 1270496|AAN69757|AAN69757 5-oxo-L-prolinase, putative at 61 % identity, alignment score 429 bits, evalue e-119 • NCBI’s Blast gives a closer match at 66 % identity to a protein from Pseudomonas mendocina ymp. • Conclusion: no perfect match, bacterial sequence, related to Pseudomonas (the query sequence was actually taken from the global ocean sampling survey).

  8. Family membership • GTG matched PFAM families: • PF04909 (best score >22000) • PF02126, PF01026, PF07969 (scores in the range 500-1000) • PF01979, PF0962 (scores below 500) • PF04909 is the Amidohydro_2 family which belongs to the Amidohydrolase (CL0034) clan. The other families found above are also members of this clan. Two PFAM families which are members of the clan but were not found by GTG are PF01244 and PF02811. • The clan was first described by Holm & Sander (1997).

  9. Function assignment • The neighbour list by NCBI’s Blast has many sequences annotated amidohydrolase_2 (very general description) and some annotated as 5-oxo-prolinase. • → A phylogenetic tree will tell whether the query sequence groups with 5-oxo-prolinases or with some other function(s) of amidohydrolases.
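Before building a proper tree, a rough first look at grouping can be had from pairwise distances alone. The sketch below is a cruder stand-in for a phylogenetic tree, not a replacement for one: it computes the p-distance (fraction of differing aligned positions) from the query to each annotated sequence and reports the functional group with the smallest mean distance. All sequences and the function names `p_distance` and `closest_function` are invented for illustration.

```python
def p_distance(a, b):
    """Fraction of differing positions over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns in common")
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_function(query, annotated):
    """Return (function, mean distance) for the group nearest to `query`.

    `annotated` maps a function label to a list of sequences aligned to `query`.
    """
    means = {
        function: sum(p_distance(query, s) for s in seqs) / len(seqs)
        for function, seqs in annotated.items()
    }
    function = min(means, key=means.get)
    return function, means[function]


# Toy aligned sequences, invented for illustration only.
query = "MHADLKGE"
annotated = {
    "5-oxo-prolinase": ["MHADIKGE", "MHGDLKGE"],
    "other amidohydrolase": ["LHSDVRAE", "LHTDVRAD"],
}
print(closest_function(query, annotated))
```

If the mean distances to several groups are similar, this shortcut is uninformative and only a real tree, ideally with support values, can show where the query branches.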

  10. Fold assignment • There are known structures in PF04909 (e.g. 2ffi PUTATIVE 2-PYRONE-4,6-DICARBOXYLIC ACID HYDROLASE) → homology modeling is possible. • Quality of model • Partial alignment (bad!) • Manual alignment is difficult → check conservation of SSEs and conserved residues versus superfamily • Conservation mapping: many structures!

  11. Checkpoint (Day 13) • Filling the following fields in Excel sheet: • Sequence id (e.g. Sequence_1A) • Protein identification • (best match in protein database, description line) • Protein family • (e.g. PFAM family name) • Superfamily • (e.g. PFAM clan name) • PDB template found in family / superfamily / not found • Function assignment strategy • (e.g. analysing MSA, or phylogenomic approach) • 3D modelling strategy • (e.g. Swissmodel, manual MSA refinement, or threading)
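If you prefer to collect the checkpoint fields in a script before pasting them into the Excel sheet, one row can be held in a small record and written out as CSV. This is a minimal sketch: the class name, field names and CSV layout are assumptions (the course sheet itself defines the real columns), and the example values are assembled from the sequence_9A slides purely for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CheckpointRow:
    """One row of the checkpoint sheet; field names are hypothetical."""
    sequence_id: str
    protein_identification: str  # best match in protein database, description line
    protein_family: str          # e.g. PFAM family name
    superfamily: str             # e.g. PFAM clan name
    pdb_template: str            # "family", "superfamily" or "not found"
    function_strategy: str
    modelling_strategy: str


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CheckpointRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Illustrative values taken from the sequence_9A example slides.
row = CheckpointRow(
    sequence_id="Sequence_9A",
    protein_identification="HYPOTHETICAL 39.5 KDA PROTEIN",
    protein_family="Amidohydro_2",
    superfamily="Amidohydrolase",
    pdb_template="family",
    function_strategy="phylogenetic tree",
    modelling_strategy="homology modelling",
)
print(to_csv([row]))
```

The resulting text can be saved with a `.csv` extension and opened directly in Excel.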

  12. Returning the reports • Reports must be printed on paper • Send/deliver to • L. Holm, P.O. Box 56 (Viikinkaari 5) • Pigeonhole on floor 4 in Biocenter 2 (wing D) • Deadline: Monday 14 December, 2009
