
Bioinformatics and Evolutionary Genomics


leane

Presentation Transcript


  1. Bioinformatics and Evolutionary Genomics

  2. Request • We have a small group • that is also heterogeneous with respect to previous knowledge • PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, or when I make jumps/conclusions that seem obvious and 100% logical to me but erratic to you; please point out my implicit assumptions regarding what everybody knows

  3. Lectures and computer exercises • Homology, trees • Genomic context, genome evolution, pathway evolution • High-throughput (HTP) data • Eukaryotic genome evolution, tree of life • Exercises: basic abilities, plus an impression of what is possible / how this type of research is done (albeit on a larger scale)

  4. Literature Discussion • Each (set of) article(s) will be introduced (= presentation) by one or two people; the presentation should last approximately half an hour, followed by a discussion • What to discuss • What are the articles actually saying? What have the authors done? (so that everybody knows) • What does this mean in a larger context? (e.g. a discussion of the discussion)

  5. Homology and Domains

  6. Gene / protein sequence evolution: what is homology • Definition of homology (biology) • Structures are said to be homologous if they are alike because of shared ancestry. • Classic: arms ~ bird wings ~ bat wings • Genes/proteins/stretches of DNA: sequence similarity because they derive from the same ancestral sequence • As opposed to analogy: sequences can converge, but this is thought to be limited to specific cases (e.g. coiled-coil, regulatory motifs); function, however, does show analogy, e.g. analogous enzymes

  7. Why are we interested in homology • Function prediction → Homologous proteins tend to have similar functions • Evolutionary dynamics → Tracing the evolution of genes (duplication, gene trees, origin of new gene families)

  8. How do we detect homology • Similarity of: • 3D structure → the most conserved aspect, yet not all structures are available. Structures are compared and classified by “eye” and by software packages (Dali). (NB classical homology); criterion: shared “idiosyncratic” features that are not strictly necessary for function, plus sequence features • Sequence → less conserved, but many sequences are available. Homology determination is mainly based on models of sequence evolution and on the likelihood that, when you compare a sequence to a database, you will find a sequence of at least that similarity by chance. • NB Manually curated databases of 3D structure similarity are used as a benchmark for detection of homology by sequence similarity (SCOP, Blundels Bus).
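
The "likelihood of finding a sequence of at least that similarity" can be made concrete with the Karlin-Altschul expectation, E = K · m · n · e^(−λS), which underlies BLAST E-values. The sketch below is only an illustration: the function names are made up here, and the default λ and K are assumed values of the kind BLAST reports for BLOSUM62 with gap costs 11/1, not parameters taken from the slides.

```python
import math

def expected_hits(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance hits scoring at least raw_score.

    Implements E = K * m * n * exp(-lambda * S). The defaults for lam and k
    are assumed, illustrative values; real ones depend on the scoring scheme.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

def p_at_least_one(e_value):
    """Probability of one or more chance hits, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-e_value)

if __name__ == "__main__":
    # Hypothetical search: 300-residue query against 10^8 database residues.
    for s in (40, 80, 120):
        e = expected_hits(s, 300, 100_000_000)
        print(s, e, p_at_least_one(e))
```

The point to take away is the exponential dependence on the score: a modest increase in S lowers the number of hits expected by chance by orders of magnitude, while a larger database raises it only linearly.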

  9. Gene / protein evolution: beyond blast, “distant homology” • Not obvious by blast • Substantial divergence, due to time and/or speed • Use a “profile” (HMMer or PSI-BLAST) • In general these work better because: [figure: example sequences ECGHR, ECNHN, TCQQL, SIGNL; residues C and G singled out]

  10. Gene / protein evolution: beyond blast, “distant homology” • PSI-BLAST: a multiple sequence alignment is generated on the fly to detect which residues/positions characterize the family. • OR use CDD, PFAM or SMART • Experts have collected representative and divergent members of a gene family, and HMMer or RPS-BLAST is used to see whether your query sequence belongs to this gene family (i.e. is homologous to the members) • Clearer/cleaner than PSI-BLAST or blast.
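
To illustrate how a profile captures "which residues/positions characterize the family", here is a minimal sketch of a position-specific scoring matrix built from an ungapped toy alignment. It is not how PSI-BLAST or HMMer are implemented: the uniform background, the flat pseudocount, the function names and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one dict per column mapping residue -> log-odds score in bits.

    Assumes an ungapped alignment of equal-length sequences and a uniform
    background frequency (a simplification).
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in ALPHABET
        })
    return profile

def score(profile, sequence):
    """Sum the per-position scores of an ungapped sequence as long as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column[res] for column, res in zip(profile, sequence))

if __name__ == "__main__":
    # Hypothetical family with a conserved C at position 2 and G at position 4.
    family = ["ACDGK", "SCEGR", "TCDGH"]
    profile = build_profile(family)
    print(score(profile, "MCWGY"))  # keeps the conserved C and G
    print(score(profile, "MAWLY"))  # lacks them, so it scores lower
```

A divergent sequence that keeps only the conserved positions still scores well against the profile, whereas a pairwise comparison to any single family member would weight all positions equally.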

  11. How to detect very distant homology / superfamilies • When two protein families are homologous but the homology is not obvious, they are part of the same so-called superfamily • How to detect: • In-depth PSI-BLAST • Reciprocal • Use of the right seed • “Hopping” (homology is by definition transitive)
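
Because homology is transitive, "hopping" amounts to following chains of significant hits from a seed. The sketch below treats significant hits as edges of a graph and collects everything reachable from the seed; the family names and the hit list are hypothetical, and deciding which hits count as significant is assumed to have been done beforehand.

```python
from collections import defaultdict, deque

def superfamily_of(seed, significant_hits):
    """Collect every family reachable from seed through chains of significant hits.

    significant_hits is an iterable of (family_a, family_b) pairs.
    """
    graph = defaultdict(set)
    for a, b in significant_hits:
        graph[a].add(b)
        graph[b].add(a)  # homology is symmetric
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

if __name__ == "__main__":
    # Hypothetical hits: famA-famB and famB-famC are detectable, famA-famC is not.
    hits = [("famA", "famB"), ("famB", "famC"), ("famX", "famY")]
    print(sorted(superfamily_of("famA", hits)))
```

Transitivity only holds when each hop concerns the same region of the protein; chaining hits through different domains of a multidomain protein would link unrelated families.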

  12. Gene / protein evolution: Distant homology • Alignment-vs-alignment, profile-vs-profile, HMM-vs-HMM comparison (whereas HMMer and PSI-BLAST compare a profile to a single sequence) • Unfortunately the statistics are still poor • “Works” because: [figure: example sequences ACRNG, ACGNR, TCQQL, TFQQI, TCILL; residue C singled out]

  13. Gene / protein evolution: Distant homology • 3D structure comparison/alignment plus visual inspection of multiple sequence alignment by Alexey Murzin • The results of this are stored in the SCOP database • (Blundel’s bus)

  14. Structural alignment • Secondary structure elements • Alpha-helices • Beta strands (beta sheets) • Loops • Fold vs superfamily?

  15. An example of distant homology • E.g. superfamily P-loop containing nucleoside triphosphate hydrolase • In humans: AAA 130, ABC_tran 182, SMC_N 29 • Zot; UPF0079; TraG; SMC_N; SKI; Sigma54_activat; Rep_fac_C; Rad17; NACHT; Mg_chelatase; MCM; KTI12; IstB; GSPII_E; DUF853; DNA_pol3_delta; Bac_DnaA; APS_kinase; ABC_tran; AAA_PrkA; AAA_5; AAA_3; AAA_2; AAA;

  16. Apart from sequence and structural features: conservation of basic molecular function

  17. Distant Homology: Applications to function prediction • Bacterial protein of unknown function (DUF853) • Member of the P-loop containing nucleoside triphosphate hydrolase superfamily • Thus thought to be an ATPase

  18. Relevance of homology for function prediction: “Similar function” • What is function? • Various levels of description • Sequence similarity / homology has the largest relevance for molecular function. This is the aspect of protein function that is best conserved; protein sequence and structure can often be interpreted in terms of function.

  19. Using distant homology for function prediction: example from (just) before PSI-BLAST & HMMer Secreted Fringe-like Signaling Molecules May Be Glycosyltransferases.  Cell. 1997 Jan 10;88(1):9-11. Y. Yuan, J. Schultz, M. Mlodzik, P. Bork

  20. Distant Homology: Application to evolution • Invention vs (duplication and) divergence • First determine homology before putting sequences into multiple sequence alignment & tree building software • Two (or more) protein families that are present in all three kingdoms of life and that can be determined to be homologous to each other: information from before the Last Universal Common Ancestor, i.e. information about very early evolution

  21. Protein domains: structural definition: separate in structure • a structural domain ("domain") is an element of overall structure that is self-stabilizing and often folds independently of the rest of the protein chain

  22. Protein domains: sequence/evolutionary definition: Separate in “evolution” • Homologous parts of proteins that occur with different “partners” • Mobile • Modules • Almost always same as structural definition

  23. Implications of domains for homology: • The shared ancestry is not a property of the whole gene but only of part of the gene. • When studying the evolution of gene families, consider fusions / domain combinations (also when making trees etc.)

  24. Domain repeats. Homology? • Blast homology vs the “real” homology unit • Q8TKV1 (Methanosarcina acetivorans) • ?

  25. Q8TKV1

  26. Ramifications for function prediction & understanding of cellular processes: “one domain one (molecular) function” (in contrast to one gene one function) • This bit does this and that bit does that • E.g. • multidomain enzymes • Transcriptional regulators

  27. Example multidomain enzyme: TrpG E.coli

  28. Ramifications for function prediction when doing blast: mind the domains • Protein B is wrongly annotated as having the function of domain 1, based on homology with the multidomain protein A, although B is not homologous to A's domain 1 (the multi-domain architecture problem when annotating proteins via blast) • [figure: domains 1 and 2; proteins A and B]

  29. Ramifications for function prediction when doing blast: mind the domains • Protein B is incompletely annotated as having only the function of domain 2, based on homology with the single-domain protein A; the second domain of B is missed in the annotation • [figure: domains 1 and 2; proteins A and B]
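
One way to guard against both annotation errors is to check which domains of the annotated protein the blast alignment actually covers before transferring any function. The sketch below assumes domain coordinates are already known (e.g. from PFAM/SMART/CDD); the function name, the 80% coverage cut-off and the example coordinates are hypothetical.

```python
def covered_domains(domains, hit_start, hit_end, min_fraction=0.8):
    """Return names of the domains that the aligned region covers sufficiently.

    domains is a list of (name, start, end) with 1-based inclusive coordinates
    on the annotated protein; hit_start and hit_end delimit the aligned region
    on that same protein. min_fraction is an arbitrary, assumed cut-off.
    """
    covered = []
    for name, start, end in domains:
        overlap = min(end, hit_end) - max(start, hit_start) + 1
        if overlap > 0 and overlap / (end - start + 1) >= min_fraction:
            covered.append(name)
    return covered

if __name__ == "__main__":
    # Hypothetical two-domain protein A; protein B aligns only to residues 125-250.
    protein_a = [("domain1", 1, 100), ("domain2", 120, 250)]
    print(covered_domains(protein_a, 125, 250))
```

Only the function associated with the covered domains should be transferred; the same check run in the other direction (domains of B covered by A) reveals domains that the annotation would otherwise miss.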

  30. Ramifications for function prediction when doing blast: do PSI-BLAST, CDD / PFAM instead. • Rather than discovering the domain structure by blast yourself, use e.g. SMART / PFAM / CDD to do it for you • NB CDD

  31. Domains and distant homologies • Promiscuous domains (i.e. domains that are present in many proteins) are often quite diverged and thus need sensitive homology detection tools in order to be recognized. • Moreover, it is often only the most general functional property of the domain that is conserved over such long evolutionary distances • Over long evolutionary distances genes are often only homologous in the sense that they share a domain, rather than being full-length homologous • We THUS use PFAM/SMART etc. for • The domains • And to improve upon BLAST / be cleaner than PSI-BLAST • And because most of the sequences are covered by these databases. No need to reinvent the wheel. The ones that are not covered are often “non-globular”, recent inventions, or very fast evolving

  32. Disclaimer: non-globular regions • Low complexity • Unstructured, elongated (as opposed to globular) • Many polar/charged residues; few hydrophobic residues • Parts of proteins that do not possess a clear 3D structure • Convergence • Do not obey PAM or BLOSUM
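
Low-complexity stretches can be flagged before a similarity search by measuring how few distinct residues a window contains. The sketch below uses Shannon entropy over a sliding window; it is a simplified stand-in and not the algorithm of any particular filtering program, and the window length, threshold and example sequence are assumed values.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return 0-based start positions of windows with entropy below threshold.

    window and threshold are illustrative assumptions, not tuned values.
    """
    return [
        i
        for i in range(len(sequence) - window + 1)
        if shannon_entropy(sequence[i:i + window]) < threshold
    ]

if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q stretch followed by a diverse stretch.
    seq = "QQQQQQQQQQQQ" + "ACDEFGHIKLMN"
    print(low_complexity_windows(seq))
```

Hits that fall entirely within flagged windows deserve suspicion, since similarity there may reflect biased composition or convergence rather than shared ancestry.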

  33. Disclaimer: Coiled coil • All alpha: thought to arise independently (convergence) • Hypothesis: reservoir for “new” folds: all alpha folds (Koonin EV) • E.g. ras / rho / rab / ran / -GAPs

  34. Disclaimer: Other protein motifs • Signal peptides • Lipid anchoring • Convergence yet still important to predict • Trans-membrane?

  35. Interesting result on protein evolution regarding domains and duplications: neutral? • [figure legend] Black: observed • Blue: model with recombination & duplication separate • Red: model that also includes duplication of combinations
