BMMB597E Protein Evolution: Protein classification
Protein families • The first protein structures determined by X-ray crystallography, myoglobin and haemoglobin, were solved (in 1959–60) before their amino acid sequences were determined • It came as a surprise that the two structures were quite similar • Soon it became clear, on the basis of both sequences and structures, that proteins form families
Fifty years earlier, there were some hints … • E.T. Reichert & A.P. Brown. The differentiation and specificity of corresponding proteins and other vital substances in relation to biological classification and organic evolution: the crystallography of hemoglobins. (Carnegie Institution of Washington, 1909) • Crystallography three years before the discovery of X-ray diffraction (1912)?
Reichert and Brown studied interfacial angles in haemoglobin crystals • Steno's law (1669): different crystals of the same substance may have different sizes and shapes, but the angles between faces are constant for each substance • They found that the angles differed from species to species • Similarities in the values of interfacial angles were consistent with the classical taxonomic tree • They even found differences between oxy- and deoxyhaemoglobin
Most premature scientific result ever? • These results implied: • That proteins adopted (or at least could adopt) unique structures, to form a crystal • That protein structures varied between species • That this variation was parallel with the evolution of the species • That proteins could change structure as a result of changes in state of ligation • In 1909!
M.O. Dayhoff • Pioneer of bioinformatics • Collected protein sequences • First curated 'database' (the Atlas of Protein Sequence and Structure) • Recognized that proteins form families, on the basis of amino acid sequences • Computational sequence alignments • Early evolutionary trees built from protein sequences • First amino-acid substitution matrices (PAM, later largely superseded by BLOSUM)
Can relationships among proteins be extended beyond families? • Families = sets of proteins with such obvious similarities that we assume that they are related • One question: how much similarity do we need to believe in a relationship? • How far can evolution go? • Convergent evolution? • Cautionary tale: chymotrypsin / subtilisin
Chymotrypsin and subtilisin • Both are proteolytic enzymes • Chymotrypsin is mammalian • Subtilisin is from Bacillus subtilis • Both have Ser-His-Asp catalytic triads • Same function, same mechanism • Sequences only 12% similar (near noise level) • However, their structures show them to be unrelated: a case of convergent evolution
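A percentage like the 12% above comes from aligning two sequences and counting matching positions. Even unrelated sequences share some identities by chance once gaps are allowed, which is why such low values sit near the noise level. The following is a minimal sketch, assuming a textbook Needleman-Wunsch global alignment with toy scores (+1 match, -1 mismatch, -2 per gap) instead of a real substitution matrix such as BLOSUM62; the function names are hypothetical, and the percent-identity convention used (identical pairs divided by alignment length) is only one of several in use.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). The toy scores are assumptions;
    real protein aligners use a substitution matrix and affine gap costs.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), S[n][m]


def percent_identity(aln_a, aln_b):
    """Identical residue pairs as a percentage of alignment columns."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


aln_a, aln_b, score = global_align("ACGT", "AGT")
print(aln_a, aln_b, score)             # ACGT A-GT 1
print(percent_identity(aln_a, aln_b))  # 75.0
```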
How can we classify proteins that belong to families? • Align sequences • Calculate a phylogenetic tree (there are various ways to do this, all depending on the sequence alignment) • Usually, the phylogenetic tree of homologous proteins from different species follows the tree based on classical taxonomy • That is reassuring • But what happens as divergence proceeds?
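One of the simplest of those "various ways" is UPGMA (average-linkage clustering) applied to pairwise distances derived from the alignment, for example 1 minus fractional identity. Here is a minimal sketch, assuming those distances are already computed; the names `upgma`, `names` and `dist` are hypothetical, the four sequences and their distances are invented, and only the tree topology is returned (no branch lengths). UPGMA assumes roughly constant evolutionary rates, which is why methods such as neighbour-joining or maximum likelihood are often preferred in practice.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a rooted tree (nested tuples) built by UPGMA.

    names: leaf labels (must be unique)
    dist:  {frozenset({x, y}): distance} for every pair of leaves
    Only the topology is returned; branch lengths are omitted for brevity.
    """
    # Each current cluster is keyed by its subtree; the value is its leaf count.
    size = {name: 1 for name in names}
    d = dict(dist)
    while len(size) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Distance to the new cluster = size-weighted mean of the old distances.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Invented distances for four hypothetical sequences A-D.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 6.0, frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 6.0, frozenset(("B", "D")): 10.0, frozenset(("C", "D")): 10.0,
}
print(upgma(names, dist))  # ('D', ('C', ('A', 'B')))
```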
How can we classify proteins that do not obviously belong to families? • Base the classification on structure rather than sequence • As divergence proceeds, structural similarities are maintained better than sequence similarities • For closely related proteins, expect no difference between sequence-based and structure-based classification • How far can classification be extended?
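A common way to put a number on structural similarity is the root-mean-square deviation (RMSD) between corresponding atoms (typically Cα) after the two structures have been optimally superposed. Below is a minimal NumPy sketch of Kabsch superposition, assuming the residue-to-residue correspondence is already known; the function name is hypothetical. Real structure-comparison tools must also find that correspondence, and DALI, for instance, works by comparing intramolecular distance matrices instead of this procedure.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual per atom.
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because translation and rotation are factored out, an identical structure placed elsewhere in space scores an RMSD of zero.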
SCOP: Structural Classification of Proteins • Idea of A.G. Murzin, building on earlier work by C. Chothia and M. Levitt • Even if two proteins are not obviously homologous, they may share structural features to a greater or lesser degree • For instance, the secondary structure of some proteins consists only of α-helices • Others have β-sheets but no α-helices
SCOP • SCOP is a database that gives a hierarchical classification of all protein domains of known structure • Recall that a domain is a compact subunit of a protein structure that 'looks as if' it would have independent stability • [Figure: fragment of fibronectin]
Dissection of structure into domains • It is not always obvious how to divide a protein into domains • There is some (though not much) room for argument • Note that sometimes the chain passes back and forth between domains • In these cases one or both domains do not consist entirely of a consecutive set of residues
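Because a domain need not be one consecutive stretch of residues, a domain assignment is naturally recorded as a list of residue segments instead of a single range. Below is a minimal sketch, assuming a hypothetical 250-residue chain whose two domains have invented boundaries; the data layout and the helper names are illustrative assumptions, not the file format of SCOP or CATH.

```python
# Hypothetical two-domain chain (boundaries invented for illustration):
# the chain runs through D1, crosses into D2, then returns to finish D1.
domains = {
    "D1": [(1, 100), (201, 250)],
    "D2": [(101, 200)],
}


def merge_segments(segments):
    """Sort segments and merge any that overlap or abut (end + 1 == next start)."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_contiguous(segments):
    """A domain is contiguous if its residues form a single consecutive run."""
    return len(merge_segments(segments)) == 1


def domain_of(residue, domains):
    """Return the name of the domain containing a residue number, or None."""
    for name, segments in domains.items():
        if any(start <= residue <= end for start, end in segments):
            return name
    return None


print(is_contiguous(domains["D1"]), is_contiguous(domains["D2"]))  # False True
```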
SCOP, CATH and the DALI Database classify protein structures • SCOP (Structural Classification of Proteins) • CATH (Class, Architecture, Topology, Homologous superfamily) • DALI Database • These web sites have many useful features: • information-retrieval engines, including search by keyword or sequence • presentation of structure pictures • links to other related sites, including bibliographical databases
SCOP http://www.scop.mrc-lmb.cam.ac.uk • SCOP organizes protein structures in a hierarchy according to evolutionary origin and structural similarity • Domains are extracted from Protein Data Bank entries • Sets of domains are grouped into families: sets of domains for which similarities in structure, function and sequence imply a common evolutionary origin
The SCOP hierarchy • Families that share a common structure, or even a common structure and a common function, but lack adequate sequence similarity – so that the evidence for evolutionary relationship is suggestive but not compelling – are grouped into superfamilies • Superfamilies that share a common folding topology, for at least a large central portion of the structure, are grouped as folds. • Finally, each fold group falls into one of the general classes.
Major classes in SCOP • α: secondary structure all helical • β: secondary structure all sheet • α/β: helices and sheets alternating, mostly parallel sheets built from β-α-β supersecondary structure • α+β: helices and sheets, but in different parts of the structure • 'Small proteins': often have little secondary structure and are held together by disulphide bridges or ligands (for instance, wheat-germ agglutinin)
Summary of SCOP hierarchy • Class • Fold • Superfamily • Family • Domain
SCOP classification of flavodoxin • Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520] • Lineage: • Root: scop • Class: Alpha and beta proteins (a/b) [51349], mainly parallel beta sheets (beta-alpha-beta units) • Fold: Flavodoxin-like [52171], 3 layers, a/b/a; parallel beta-sheet of 5 strands, order 21345 • Superfamily: Flavoproteins [52218] • Family: Flavodoxin-related [52219], binds FMN • Protein: Flavodoxin [52220] • Species: Clostridium beijerinckii [TaxId: 1520] [52226] • PDB entry domains: • 5nul (complexed with fmn; mutant), chain a [31191] • 2fax (complexed with fmn; mutant), chain a [31194] • … many others
Flavodoxin and NADPH-cytochrome P450 reductase: same superfamily, different family
Flavodoxin and CheY: same fold, different superfamily
Flavodoxin and spinach ferredoxin reductase: same class, different folds
Flavodoxin in the SCOP hierarchy • To illustrate the kinds of similarity expressed by the different levels of the hierarchy: • Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome P450 reductase are in the same superfamily, but different families • Flavodoxin and the signal-transduction protein CheY are in the same fold, but different superfamilies • Flavodoxin and spinach ferredoxin reductase are in the same class, α/β, but have different folds
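Each of these comparisons amounts to asking how far down the hierarchy two lineages agree. This can be expressed as a short Python sketch. The flavodoxin lineage is taken from the SCOP entry above, while the other three lineages encode only the relationships stated on this slide: every name in angle brackets is a hypothetical placeholder, not a real SCOP name, and the helper `deepest_shared_level` is illustrative.

```python
LEVELS = ("class", "fold", "superfamily", "family", "protein")

# Lineage taken from the SCOP entry for flavodoxin.
flavodoxin = ("Alpha and beta proteins (a/b)", "Flavodoxin-like",
              "Flavoproteins", "Flavodoxin-related", "Flavodoxin")

# These encode only the stated relationships; <...> entries are placeholders.
p450_reductase = ("Alpha and beta proteins (a/b)", "Flavodoxin-like",
                  "Flavoproteins", "<another family>",
                  "NADPH-cytochrome P450 reductase")
chey = ("Alpha and beta proteins (a/b)", "Flavodoxin-like",
        "<another superfamily>", "<another family>", "CheY")
ferredoxin_reductase = ("Alpha and beta proteins (a/b)", "<another fold>",
                        "<another superfamily>", "<another family>",
                        "Spinach ferredoxin reductase")


def deepest_shared_level(a, b):
    """Walk down the hierarchy and return the last level at which a and b agree."""
    shared = None
    for level, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = level
    return shared


for other in (p450_reductase, chey, ferredoxin_reductase):
    print(other[-1], "->", deepest_shared_level(flavodoxin, other))
```

Stopping at the first disagreement matters. A shared name at a deeper level means nothing once a higher level differs, since each level is defined within its parent.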
CATH presents a classification scheme similar to that of SCOP • CATH = Class, Architecture, Topology, Homologous superfamily, the levels of its hierarchy • In CATH, proteins with very similar structures, sequences and functions are grouped into sequence families • A homologous superfamily contains proteins for which similarity of sequence and structure gives evidence of common ancestry • A topology or fold family comprises sets of homologous superfamilies that share the spatial arrangement and connectivity of helices and strands • Architectures are groups of proteins with similar arrangements of helices and sheets, but with different connectivity. For instance, four-helix bundles with different connectivities would share the same architecture but not the same topology in CATH • The general classes in CATH are: mainly α, mainly β, α-β (subsuming the α/β and α+β classes of SCOP), and domains of low secondary structure content
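CATH encodes this hierarchy in dotted numeric identifiers of the form Class.Architecture.Topology.Homologous-superfamily, so each prefix names one level. The sketch below splits such a code into its levels and finds the deepest level two codes share. The function names are hypothetical, and the codes in the example are illustrative of the format only, not assignments for any protein discussed here. The class names follow CATH's four general classes listed above.

```python
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

# CATH's four general classes, numbered 1-4.
CLASS_NAMES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}


def parse_cath(code):
    """Split a dotted CATH code into its cumulative levels (each level is a prefix)."""
    parts = code.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1-4 dot-separated fields, got {code!r}")
    return {level: ".".join(parts[: i + 1])
            for i, level in enumerate(LEVELS[: len(parts)])}


def shared_cath_level(a, b):
    """Deepest CATH level at which two codes agree, or None if the class differs."""
    shared = None
    for level, x, y in zip(LEVELS, a.split("."), b.split(".")):
        if x != y:
            break
        shared = level
    return shared


code = "3.40.50.10"  # illustrative code, format only
print(parse_cath(code))
print(CLASS_NAMES[parse_cath(code)["class"]])  # Alpha Beta
```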
Do different classification schemes agree? • To classify protein structures (or any other set of objects) you need to be able to measure the similarities among them. • The measure of similarity induces a tree-like representation of the relationships. • CATH, SCOP, DALI and the others agree, for the most part, on what is similar, and the tree structures of their classifications are therefore also similar. • However, even an objective measure of similarity does not specify how to define the different levels of the hierarchy. • These are interpretative decisions, and apparent differences in the names and distinctions between levels disguise the underlying general agreement about what is similar and what is different.
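A toy example shows why a similarity measure alone does not fix the levels. If you group the same pairwise distances at different cut-offs, you get different numbers of groups, and nothing in the distances says which cut-off should be called a "family" or a "fold". The sketch uses single-linkage grouping via union-find. The domain names P-T and their distances are invented, and the function `groups_at_threshold` is hypothetical. Neither SCOP nor CATH is built this way; SCOP in particular is curated largely by hand.

```python
def groups_at_threshold(names, dist, threshold):
    """Single-linkage grouping: items join the same group if a chain of pairs
    with distance <= threshold connects them (implemented with union-find)."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented distances between five hypothetical domains; unlisted pairs are "far".
names = ["P", "Q", "R", "S", "T"]
dist = {("P", "Q"): 1.0, ("R", "S"): 1.5, ("Q", "R"): 4.0, ("S", "T"): 8.0}

for t in (2.0, 5.0, 10.0):
    print(t, groups_at_threshold(names, dist, t))
```

With these numbers, the cut-offs 2.0, 5.0 and 10.0 give three groups, two groups and one group. The data are the same in every case, and only the choice of level changes.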