Guidelines for sequence reports
Outline • Summary • Results & Discussion • Sequence identification • Function assignment • Fold assignment • Identification of functional residues • Methods • Web tools: list which ones you used • References • E.g. functional characterization • Maximum length: ten pages
Sequence identification • What is the source of the query sequence? • Database search tools • Blast, PSI-Blast, PFAM-search • Superfamily, GTG, PFAM-squared • Dali (structures only) • All tools give a list of similar sequences. • Nearest neighbour indicates species or taxon. • Some sequences are classified in families.
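As a rough illustration of reading a neighbour list, the sketch below parses BLAST tabular output and picks the top-scoring hit. It assumes the hits were saved in the standard 12-column tabular format of BLAST+ (`-outfmt 6`). The function names and the two sample rows (`hitA`, `hitB`, and all their coordinates) are invented for illustration and are not real search results.

```python
import csv
import io

# Standard 12 columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(text):
    """Parse tab-separated BLAST hits into a list of dicts."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def nearest_neighbour(hits):
    """Return the hit with the highest bit score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda h: h["bitscore"])


# Hypothetical sample rows, made up for this sketch.
SAMPLE = (
    "query1\thitA\t62.0\t350\t133\t0\t1\t350\t5\t354\t1e-123\t441\n"
    "query1\thitB\t61.0\t350\t136\t1\t1\t350\t3\t352\t1e-119\t429\n"
)

if __name__ == "__main__":
    best = nearest_neighbour(parse_hits(SAMPLE))
    print(best["sseqid"], best["pident"], best["evalue"])
```

The species or taxon of the nearest neighbour still has to be looked up from the hit's database entry; the tabular columns used here do not carry it.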
Function assignment • Many proteins are hypothetical; look further down the list for informative functional annotations • If sequence neighbour list shows many different functions at similar distances, build a sequence-tree to see if query sequence groups with one particular function.
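Looking further down the list can be sketched as a simple filter over hit descriptions. The keyword list and the function names below are assumptions chosen for illustration, not a standard vocabulary; the two example hits are the GTG hits quoted in the sequence_9A example of these slides.

```python
# Words that mark a description as carrying no functional information.
# This keyword list is an assumption; extend it for your own hit lists.
UNINFORMATIVE = ("hypothetical", "uncharacterized", "unknown function")


def is_informative(description):
    """True if the description contains none of the uninformative keywords."""
    text = description.lower()
    return not any(word in text for word in UNINFORMATIVE)


def first_informative(hits):
    """hits: (description, bit score) pairs in ranked order.

    Returns the first pair with an informative description, or None.
    """
    for description, score in hits:
        if is_informative(description):
            return description, score
    return None


# The two GTG hits from the sequence_9A example, best hit first.
HITS = [
    ("HYPOTHETICAL 39.5 KDA PROTEIN", 441),
    ("5-oxo-L-prolinase, putative", 429),
]

if __name__ == "__main__":
    print(first_informative(HITS))
```

A "putative" annotation is kept here because it still names a function, but it should be treated as a hypothesis to check, for example with a sequence tree.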
Fold assignment • Structures are conserved within families. E.g. PFAM family identification allows you to transfer fold (if no direct hit to PDB). • Sometimes the link may be at clan level (still homologous, conserved fold). • Not all homologous relationships are classified as such in databases. Evidence for remote homology: common fold, common conserved residues, similarity of function
Identification of functional residues • Multiple sequence alignment (MSA) of sufficiently diverse sequences highlights functional residues. Use structural model to identify sites. • It is often difficult to make a good MSA between distant sequences. • Structural alignments show sharp signatures & functional sites • Compare well-aligned set and its secondary structure prediction to structural alignment. If you find conserved SSEs and conserved residues in proper succession, it strengthens the hypothesis of homology.
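One simple way to see which MSA columns stand out is to compute, per column, the fraction of sequences sharing the most common residue. The sketch below does only that: it is a minimal conservation count, not a substitute for inspecting the structure, and the toy alignment and function names are invented for illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """For each column return (most common residue, fraction of sequences with it).

    Gaps ('-') never count as the conserved residue, but they do count in the
    denominator, so gappy columns score low.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = [c for c in column if c != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result


def conserved_positions(alignment, threshold=1.0):
    """Return (column index, residue) for columns at or above the threshold."""
    return [(i, residue)
            for i, (residue, fraction) in enumerate(column_conservation(alignment))
            if fraction >= threshold]


# Toy alignment, made up for this sketch.
TOY = [
    "MHAD-KL",
    "MHGDQKV",
    "LHSDERL",
    "MHTD-KI",
]

if __name__ == "__main__":
    print(conserved_positions(TOY))  # columns 1 (H) and 3 (D) are invariant
```

Invariant columns are only informative if the sequences are sufficiently diverse; in a set of near-identical sequences almost every column passes the threshold.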
Example: sequence_9A • Nearest neighbours • GTG server’s Blast search (old database): • 601959|AAG04994|AAG04994 HYPOTHETICAL 39.5 KDA PROTEIN at 62% identity, alignment score 441 bits, E-value 1e-123 • 1270496|AAN69757|AAN69757 5-oxo-L-prolinase, putative at 61% identity, alignment score 429 bits, E-value 1e-119 • NCBI’s Blast gives a closer match at 66% identity to a protein from Pseudomonas mendocina ymp. • Conclusion: no perfect match; the sequence is bacterial and related to Pseudomonas (the query sequence was actually taken from the global ocean sampling survey).
Family membership • GTG matched PFAM families: • PF04909 (best score >22000) • PF02126, PF01026, PF07969 (scores in the range 500-1000) • PF01979, PF0962 (scores below 500) • PF04909 is the Amidohydro_2 family which belongs to the Amidohydrolase (CL0034) clan. The other families found above are also members of this clan. Two PFAM families which are members of the clan but were not found by GTG are PF01244 and PF02811. • The clan was first described by Holm & Sander (1997).
Function assignment • The neighbour list from NCBI’s Blast has many sequences annotated as amidohydrolase_2 (a very general description) and some annotated as 5-oxo-prolinase. • A phylogenetic tree will tell whether the query sequence groups with the 5-oxo-prolinases or with some other amidohydrolase function.
Fold assignment • There are known structures in PF04909 (e.g. 2ffi, PUTATIVE 2-PYRONE-4,6-DICARBOXYLIC ACID HYDROLASE), so homology modeling is possible. • Quality of model • Partial alignment (bad!) • Manual alignment is difficult: check conservation of SSEs and conserved residues versus the superfamily • Conservation mapping: many structures!
Checkpoint (Day 13) • Fill in the following fields in the Excel sheet: • Sequence id (e.g. Sequence_1A) • Protein identification (best match in protein database, description line) • Protein family (e.g. PFAM family name) • Superfamily (e.g. PFAM clan name) • PDB template: found in family / superfamily / not found • Function assignment strategy (e.g. analysing MSA, or phylogenomic approach) • 3D modelling strategy (e.g. Swissmodel, manual MSA refinement, or threading)
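For anyone who prefers to draft the checkpoint row outside Excel, the fields can be sketched as a small record written out as CSV, which Excel can open. The class name, field names and CSV layout are assumptions made for this sketch; the actual sheet defines the real column names. The example values are taken from the sequence_9A slides.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CheckpointRow:
    """One row of the checkpoint sheet (field names are illustrative)."""
    sequence_id: str
    protein_identification: str
    protein_family: str
    superfamily: str
    pdb_template: str
    function_strategy: str
    modelling_strategy: str


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    names = [f.name for f in fields(CheckpointRow)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Values taken from the sequence_9A example in these slides.
EXAMPLE = CheckpointRow(
    sequence_id="Sequence_9A",
    protein_identification="HYPOTHETICAL 39.5 KDA PROTEIN",
    protein_family="Amidohydro_2 (PF04909)",
    superfamily="Amidohydrolase (CL0034)",
    pdb_template="found in family",
    function_strategy="phylogenetic tree",
    modelling_strategy="homology modelling",
)

if __name__ == "__main__":
    print(to_csv([EXAMPLE]))
```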
Returning the reports • Reports must be printed on paper • Send/deliver to • L. Holm, P.O. Box 56 (Viikinkaari 5) • Pigeonhole on floor 4 in Biocenter 2 (wing D) • Deadline: Monday 14 December, 2009