A computational study of protein folding pathways: reducing the computational complexity of the folding process using the building block folding model. Nurit Haspel, Chung-Jung Tsai, Haim Wolfson and Ruth Nussinov.
The building blocks model (Chung-Jung Tsai)
• Protein folding is a hierarchical process.
• A protein is constructed from hydrophobic folding units (HFUs).
• HFU - the result of a combinatorial assembly of building blocks.
• Building block - a contiguous, highly populated fragment.
• The building block model makes it possible to illustrate the protein folding pathway.
An outline of the building blocks algorithm
• Scoring function - measures the relative stability of a candidate building block.
• Three ingredients:
  • Compactness
  • Degree of isolation
  • Hydrophobicity
• The result - an “anatomy tree” that illustrates the most probable folding route.
The Scoring Function
Z - Compactness
H - Hydrophobicity
I - Isolation
Compactness, Hydrophobicity and Isolation definitions
• Compactness -
• Hydrophobicity -
• Isolation -
The Cutting Procedure
• Locating a basket of candidate building blocks (relatively stable contiguous fragments):
  • Assign a stability score to all the candidate fragments.
  • Collect the local minima in the “fragment map” (best score in a given radius).
• Recursively splitting the protein top-down:
  • Search the “basket” for a set of fragments that together cover the whole parent fragment, allowing a short overlap (7 residues) and a gap of up to 15 residues.
  • Minimum building block size - 15 residues.
  • No node can have only one child (except for the root).
  • Stop when the node cannot be split any further.
• In this work, building blocks are considered up to level 6.
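The splitting constraints above can be sketched as a validity check on one candidate split. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Fragment` and `is_valid_split` are hypothetical, residue indices are 0-based with inclusive ends, the 15-residue gap limit is assumed to apply at the parent's two termini as well, and the root exception to the single-child rule is not modelled.

```python
from typing import List, NamedTuple

MIN_SIZE = 15     # minimum building block size (from the slide)
MAX_OVERLAP = 7   # residues two neighbouring fragments may share
MAX_GAP = 15      # residues that may stay uncovered between neighbours


class Fragment(NamedTuple):
    start: int  # first residue index, inclusive
    end: int    # last residue index, inclusive

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def is_valid_split(parent: Fragment, children: List[Fragment]) -> bool:
    """Return True if `children` is an acceptable split of a non-root `parent`."""
    if len(children) < 2:  # a non-root node may not have a single child
        return False
    ordered = sorted(children)  # sorts by (start, end)
    for child in ordered:
        if child.size < MIN_SIZE:
            return False
        if child.start < parent.start or child.end > parent.end:
            return False
    # Assumption: uncovered residues at either terminus count as a gap.
    if ordered[0].start - parent.start > MAX_GAP:
        return False
    if parent.end - ordered[-1].end > MAX_GAP:
        return False
    for left, right in zip(ordered, ordered[1:]):
        step = right.start - left.end - 1  # > 0 is a gap, < 0 is an overlap
        if step > MAX_GAP or -step > MAX_OVERLAP:
            return False
    return True
```

A recursive cutting procedure would call such a check on candidate sets drawn from the basket and recurse into each accepted child until no valid split remains.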
Usefulness of the anatomy tree
• It is possible to see whether a protein folds through a single route or multiple routes.
• These routes can be observed by inspecting the fragment map (there can be more than one way to construct a tree).
• Sequential versus non-sequential folding.
  • Sequential – contacts are made only between consecutive building blocks.
  • A binary anatomy tree indicates a sequential folder.
• Fast versus slow folding.
  • Sequential folding proteins usually fold faster.
• Climbing up the tree allows us to illustrate the folding process.
Critical building blocks (Sandeep Kumar)
• Some building blocks may be considered critical for correct folding.
• A critical building block is in contact with other building blocks in the protein.
• It is likely to be inserted between sequentially connected building blocks.
• Without it, the other building blocks are likely to mis-associate.
• The structure and sequence of a critical building block are more likely to be conserved.
Critical building block algorithm
• For each building block:
  • Compute its diff. contacting surface area.
  • Compute its critical building block index (CIndex):
  • Compute its Z-score:
Critical building blocks (cont.)
A building block is critical if:
• It is found at most levels below the hydrophobic folding unit level.
• It has a consistently high CIndex at different levels.
• Its CIndex is significant by at least 2 standard deviations in at least one level of the protein anatomy.
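The third criterion (a CIndex at least 2 standard deviations above the mean in at least one level) can be sketched as follows. This is a hedged illustration only: the CIndex formula itself is not reproduced here, so the input values are taken as given, the function names are hypothetical, and the use of the population standard deviation within one level is an assumption.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def z_scores(cindex: Dict[str, float]) -> Dict[str, float]:
    """Standard score of each building block's CIndex within one level."""
    values = list(cindex.values())
    if not values:
        return {}
    mu, sigma = mean(values), pstdev(values)  # assumption: population std
    if sigma == 0:
        return {name: 0.0 for name in cindex}
    return {name: (value - mu) / sigma for name, value in cindex.items()}


def significant_in_any_level(levels: List[Dict[str, float]],
                             threshold: float = 2.0) -> Set[str]:
    """Names whose Z-score reaches `threshold` in at least one level."""
    hits: Set[str] = set()
    for cindex in levels:
        hits.update(name for name, z in z_scores(cindex).items()
                    if z >= threshold)
    return hits
```

The other two criteria (presence at most levels and a consistently high CIndex) would need the full anatomy tree and are not covered by this sketch.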
The goals of my research
• Clustering the building blocks according to their 3-D structures, using a rigid matching algorithm.
• Analyzing the building blocks: sequence, stability distribution, size.
• Analyzing the clusters: size, stability score distribution, sequence conservation, criticalness conservation.
The goals of my research (cont.)
• Analyzing the critical building blocks: position within the protein, relative stability, sequence and structure conservation.
• Developing an algorithm that assigns a set of building blocks to a protein sequence, using sequence similarity, relative stability and other information.
Clustering the building blocks
• Each cluster has one or more representative members.
• For each building block structure:
  • Go over the clusters.
  • Match with the cluster representative(s).
  • If it matches (1.5 Å RMSD, 70% size) - join the building block to the cluster.
  • If no match is found - open a new cluster with this building block as its representative.
• Problem - O(n²) comparisons, where n is the number of clusters.
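The clustering loop above can be sketched as a greedy pass over the building blocks. This is a minimal sketch under stated assumptions rather than the method actually used: `greedy_cluster` is a hypothetical name, each cluster keeps a single representative (its first member), and the rigid structural match (1.5 Å RMSD, 70% size) is abstracted into a caller-supplied `matches` predicate.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def greedy_cluster(items: List[T],
                   matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Assign each item to the first cluster whose representative it matches.

    clusters[i][0] is the representative of cluster i (assumption: one
    representative per cluster).
    """
    clusters: List[List[T]] = []
    for item in items:
        for cluster in clusters:
            if matches(item, cluster[0]):
                cluster.append(item)  # join the first matching cluster
                break
        else:
            clusters.append([item])   # no match: open a new cluster
    return clusters


if __name__ == "__main__":
    # Toy stand-in for structures: numbers "match" if they differ by <= 1.5.
    toy = [0.0, 1.0, 5.0, 5.5, 0.5, 9.0]
    print(greedy_cluster(toy, lambda a, b: abs(a - b) <= 1.5))
```

The result depends on the order in which the items are visited, and every new item may be compared against every existing representative, which is where the quadratic cost noted above comes from.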
Clustering of the building blocks
[Figure: Cluster 1, Cluster 2, …, Cluster n]
Making clustering more efficient
• Dividing the building blocks into SCOP families (proteins from the same family usually produce the same building blocks).
• Clustering each family separately and then merging all the clusters reduces the number of clusters handled at any one time.
Sequence analysis of the clusters
• Sequence clustering of each structural cluster (using BLAST).
• Creating a non-redundant sequence dataset.
• Goal - finding a connection between (short) sequences and structures.
Statistical analysis of the clusters and of the critical building blocks
• Stability score distribution among cluster members.
• Criticalness score distribution among cluster members.
• Position distribution of the critical building blocks.
• Stability score as a function of criticalness score.
Distribution of the position inside the protein - all-alpha, level 3
Stability score of critical and non-critical building blocks (histogram; series: non-critical, critical)
Final goal
Given a sequence and the information accumulated so far, is there a way of matching a set of building blocks to it?
The building block assignment algorithm
• Perform sequence alignment of the protein sequence against the building block sequence database.
• Construct a directed acyclic graph:
  • Each matching building block is a graph vertex and is assigned a score that depends on the sequence alignment score, building block stability and other parameters.
  • Directed edges connect fragments that match consecutive regions of the protein sequence, allowing short overlaps and small gaps.
  • Edge score – the average score of the two connected vertices.
The building block assignment algorithm (cont.)
• Add fictitious “start” and “target” vertices.
• Connect start to all starting vertices.
• Connect all ending vertices to target.
• Find the shortest path from start to target using a single-source shortest-path algorithm.
• The path is an “optimal” building block assignment covering the protein sequence.
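The graph construction and shortest-path search described above can be sketched as a dynamic program over matches sorted by start position. This is a hedged sketch, not the authors' code. The names `Match` and `assign_blocks` are hypothetical, and several details are assumptions: each vertex carries a cost where lower is better, edges from "start" and to "target" are weighted by the cost of the real vertex they touch, the overlap and gap limits (7 and 15) are borrowed from the cutting procedure, and the gap limit also applies at the sequence termini.

```python
from typing import List, NamedTuple, Optional


class Match(NamedTuple):
    start: int   # first covered residue, inclusive
    end: int     # last covered residue, inclusive
    cost: float  # assumption: lower is better


def assign_blocks(seq_len: int, matches: List[Match],
                  max_overlap: int = 7,
                  max_gap: int = 15) -> Optional[List[Match]]:
    """Cheapest chain of matches covering residues 0..seq_len-1, or None."""
    order = sorted(matches)  # edges only go to larger starts: topological order
    n = len(order)
    inf = float("inf")
    best = [inf] * n         # best[i]: cheapest cost of a path start -> i
    prev = [-1] * n
    for i, m in enumerate(order):
        if m.start <= max_gap:          # edge from the fictitious start vertex
            best[i] = m.cost
    for i, u in enumerate(order):
        if best[i] == inf:
            continue
        for j in range(i + 1, n):
            v = order[j]
            step = v.start - u.end - 1  # > 0 is a gap, < 0 is an overlap
            if (v.start > u.start and v.end > u.end
                    and -max_overlap <= step <= max_gap):
                weight = (u.cost + v.cost) / 2  # average of the two vertices
                if best[i] + weight < best[j]:
                    best[j] = best[i] + weight
                    prev[j] = i
    last, total = -1, inf
    for i, m in enumerate(order):       # edges into the fictitious target
        if seq_len - 1 - m.end <= max_gap and best[i] + m.cost < total:
            last, total = i, best[i] + m.cost
    if last == -1:
        return None
    path = []
    while last != -1:
        path.append(order[last])
        last = prev[last]
    return path[::-1]
```

Because every edge points from a smaller start position to a larger one, a single left-to-right pass suffices and a general-purpose algorithm such as Dijkstra's is not required for this sketch.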
Suggestions for future work
• Improving the algorithm and adding new parameters to it (secondary structure alignment, trying other building blocks from the same cluster as the matching building blocks, etc.).
• Combinatorial assembly – Yuval’s work.
• Further cluster analysis – inquiring into sequence conservation.
• Conformation stability measurements (molecular dynamics…).
Conclusions
Using the hierarchical folding model, it may be possible to reduce the folding complexity by first assigning local substructures and then assembling them.