Protein structures in the PDB
Domains • proteins can be modular • a single chain may be divisible into smaller, independent units of tertiary structure called domains • domains are the basic unit of structure classification • different domains in a protein are often associated with different functions carried out by the protein
Definition of domain • “A polypeptide or part of a polypeptide chain that can independently fold into a stable tertiary structure...” (from Introduction to Protein Structure, by Branden & Tooze) • “Compact units within the folding pattern of a single chain that look as if they should have independent stability.” (from Introduction to Protein Architecture, by Lesk) • [Figure placeholder: MBP]
Motif (Supersecondary Structure) • certain favored arrangements of multiple secondary structure elements recur again and again in proteins; these are known as motifs or supersecondary structures • a motif is usually smaller than a domain but can encompass an entire domain • [Figure: example motifs: Greek key, beta-alpha-beta]
Protein Taxonomy: The CATH Hierarchy
1. Divide PDB structure entries into domains using domain recognition algorithms (the domain is the fundamental unit of structure classification).
2. Classify each domain according to a five-level hierarchy: Class, Architecture, Topology, Homologous Superfamily, Sequence Family.
• the top three levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships
• the bottom two levels include some phyletic classification as well: groupings according to putative common ancestry, based on structural similarity, functional similarity, and sequence similarity
• protein evolution is not well understood; there is to date no purely phyletic classification system
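The hierarchy above maps naturally onto a dotted identifier with one number per level. The sketch below is illustrative only: it assumes the common four-number C.A.T.H form of CATH identifiers (sequence-family levels omitted) and the usual class numbering 1 to 4. The names `CathCode`, `parse`, and `shared_levels` are hypothetical helpers, not part of any CATH software, and the example codes are used only as sample strings.

```python
from dataclasses import dataclass

# Assumed class numbering; the names follow the four classes in the Class slide.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha & beta",
    4: "few secondary structures",
}


@dataclass(frozen=True)
class CathCode:
    """One domain's position in the top four CATH levels (hypothetical helper)."""
    klass: int          # Class
    architecture: int   # Architecture
    topology: int       # Topology (fold)
    superfamily: int    # Homologous superfamily

    @classmethod
    def parse(cls, text):
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {text!r}")
        return cls(*(int(p) for p in parts))

    def shared_levels(self, other):
        """Number of leading levels (0 to 4) on which two domains agree."""
        mine = (self.klass, self.architecture, self.topology, self.superfamily)
        theirs = (other.klass, other.architecture, other.topology, other.superfamily)
        depth = 0
        for a, b in zip(mine, theirs):
            if a != b:
                break
            depth += 1
        return depth


a = CathCode.parse("3.40.50.720")
b = CathCode.parse("3.40.50.300")
print(CLASS_NAMES[a.klass])   # alpha & beta
print(a.shared_levels(b))     # 3: same class, architecture and topology
```

Two domains that agree on three levels share a fold but sit in different homologous superfamilies, which is the situation the later slides on superfolds describe.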
Class • In the CATH hierarchy, Class simply describes what type of secondary structure is present. • There are only four classes: mainly alpha, mainly beta, alpha & beta, and few secondary structures. • 90% of structures are trivial to assign at this level.
Architecture • Architecture is hard to define precisely • In CATH it is defined broadly as describing “general features of protein shape” such as arrangements of secondary structure in 3D space • It does not define connectivities between secondary structural elements--that’s what the topology level does. It does not even explicitly define directionality of secondary structure, e.g. parallel or antiparallel beta-sheets. • in CATH, architectures are presently assigned manually, by visual inspection. • let’s look at some architectures!
Topology (Fold) • if two proteins have the same topology, it means they have the same number and arrangement of secondary structures, and the connectivities between these elements are the same. • this is also sometimes called the fold of a protein. • in CATH, automated structure alignment is used to group proteins according to topology. We will discuss this later. • we will now look at some examples which illustrate differences in topology.
Topology: differences in connectivity • example: a four-stranded antiparallel beta-sheet can have many different topologies based on the order in which the four beta-strands are connected. • [Figure: “up-and-down” and “Greek key” topologies]
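To make "many different topologies" concrete, here is a small counting sketch. It assumes a deliberately simplified model: strands are numbered 1 to 4 in sequence order, a topology is just their left-to-right spatial order in the sheet, and an order and its reverse count as the same sheet. Strand direction and handedness are ignored, so this is an illustration rather than a full topology description. The function names are hypothetical.

```python
from itertools import permutations


def sheet_orders(n_strands=4):
    """Distinct spatial orders of sequentially numbered strands in one sheet.

    An order and its reverse are treated as the same arrangement, so only the
    smaller of the two tuples is kept.
    """
    seen = set()
    for order in permutations(range(1, n_strands + 1)):
        seen.add(min(order, order[::-1]))
    return sorted(seen)


def hairpin_connections(order):
    """Count sequence-consecutive strands that are also spatial neighbours."""
    position = {strand: i for i, strand in enumerate(order)}
    return sum(
        1
        for strand in range(1, len(order))
        if abs(position[strand] - position[strand + 1]) == 1
    )


orders = sheet_orders(4)
print(len(orders))                        # 12
print(hairpin_connections((1, 2, 3, 4)))  # 3: "up-and-down", every link is a hairpin
print(hairpin_connections((4, 1, 2, 3)))  # 2: "Greek key", strand 4 returns next to strand 1
```

Even under this coarse model, four strands give twelve distinct connectivities, of which the up-and-down and Greek key arrangements are two.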
Topology: differences in handedness • example: in a beta-alpha-beta motif, if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands.
Visualizing protein topology: TOPS cartoons • up triangles = up-facing beta strands • down triangles = down-facing beta strands • horizontal rows of triangles = beta sheets (a beta barrel would be a ring of triangles) • circles = helices • lines = loops • if a loop enters from the top, the line is drawn to the center of the symbol • if a loop enters from the bottom, the line is drawn to the boundary of the symbol • [Figure: example TOPS cartoon] the fold shown is clearly an antiparallel beta-sandwich
Visual summary of the top three levels of the CATH hierarchy • [Figure: CLASS, ARCHITECTURE, TOPOLOGY]
Discovery of New Folds • structural taxonomy reveals that although structures are being solved more rapidly than ever, fewer and fewer of them have new folds! Will we get them all soon?
Homologous superfamily/Sequence family • The lowest two levels in the CATH hierarchy relate to common ancestry • some, but not all, proteins with the same fold show evidence of common ancestry • the surest sign of common ancestry is that two proteins have sequences that are more than roughly 30% identical (sequence family level) • if protein sequences are not that similar, common ancestry may still be inferred from a combination of structural and functional similarity, and possibly weak sequence similarity (homologous superfamily level)
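The roughly 30% identity rule of thumb can be sketched as a simple calculation over an existing pairwise alignment. This is a minimal illustration, not CATH's procedure: it assumes the two sequences are already aligned to equal length with "-" for gaps, and it divides by the number of ungapped columns, which is only one of several identity definitions in use. The function names and the toy sequences are made up for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def suggested_level(identity, threshold=30.0):
    """Map an identity value onto the two lowest levels described above."""
    if identity > threshold:
        return "sequence family"
    return "homologous superfamily at best; needs structural/functional evidence"


# Toy alignment: 7 ungapped columns, 5 of them identical.
identity = percent_identity("MKTAYIAK", "MKSAY-AR")
print(round(identity, 1))          # 71.4
print(suggested_level(identity))   # sequence family
```

Below the threshold the function deliberately does not claim homology, mirroring the slide: weak sequence similarity has to be backed by structural and functional similarity.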
Multifunctional “Superfolds” • some architectures have many folds; these can be called “superarchitectures” • some folds have many homologous superfamilies, which means they are used for a variety of functions; these are called “superfolds”
“Common core” • structures need not share exactly the same number, type and connectivity of secondary structural elements to be grouped into a single fold type. • in fact, evolutionarily related proteins often share a common core of structurally related elements but may differ in presence or absence of a secondary structure element or two.
Problems in Fold Classification • “Structure space” has a continuous aspect, especially in certain types of folds, which makes clustering structures into fold families difficult. This is an inherent problem for any classification method based on hierarchical clustering. • It seems reasonable to group as having the same fold proteins which share some common core but differ in addition/subtraction of a few secondary structure elements. • But this can lead to unnaturally large and diverse fold families via the Russian doll effect and motif overlap.
Russian Doll Effect • A continuous range of slight size differences will lead to clustering proteins of very different size: small --> medium --> large.
Motif Overlap • Sometimes two proteins will share a common core, but one of them will share a slightly different (but not necessarily larger) common core with a third protein. • A continuous range of overlapping common cores (AB --> BC --> CD) will lead to grouping proteins that have no common core.
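The AB --> BC --> CD chain can be reproduced with a toy single-linkage clustering. This is a sketch under stated assumptions: each protein is reduced to a set of labeled structural elements, two proteins are linked if they share at least one element, and linked proteins are merged transitively. The function `single_linkage`, the protein names P1 to P3, and the element labels are all hypothetical; neither SCOP nor CATH is claimed to work this way.

```python
def single_linkage(cores, min_shared=1):
    """Merge proteins transitively whenever two of them share enough elements."""
    names = list(cores)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(cores[a] & cores[b]) >= min_shared:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


cores = {
    "P1": {"A", "B"},
    "P2": {"B", "C"},
    "P3": {"C", "D"},
}
clusters = single_linkage(cores)
print(len(clusters))               # 1: all three proteins end up together
print(cores["P1"] & cores["P3"])   # set(): P1 and P3 share no common core
```

The same transitive merging drives the Russian doll effect if "shares an element" is replaced by "differs only slightly in size".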
Comparison of SCOP and CATH Hierarchies
SCOP            CATH
class           class
                architecture
fold            topology
superfamily     homologous superfamily
family          sequence family
domain          domain
• CATH is more directed toward structural classification; SCOP pays more attention to evolutionary relationships
Another SCOP/CATH difference • in CATH, there is one class to represent mixed alpha-beta • in SCOP there are two: alpha/beta (a/b), in which the beta structure is largely parallel and made of beta-alpha-beta motifs, and alpha+beta (a+b), in which alpha and beta structure are segregated to different parts of the structure
SCOP and CATH • they have in common that they are hierarchical and based on abstractions • they both include some manual aspects and are curated by experts in the field of protein structure • are there automated methods for structure classification/comparison?