This article explores the construction of protein sequence space and the use of fractals to represent patterns in multidimensional sequence space. It discusses how sequence length sets the number of dimensions, and how larger alphabets can be accommodated either with more values per dimension or with additional dimensions. It also states a conjecture about the fractal representation of sequence space as sequence length grows.
Ways to construct protein space

Construction of sequence space from Eigen et al. (1988), illustrating how a high-dimensional sequence space is built. Each additional sequence position adds another dimension, doubling the diagram for the shorter sequence. Shown is the progression from a single sequence position (a line) to a tetramer (a hypercube). A four-letter (or twenty-letter) code can be accommodated either by allowing four (or twenty) values in each dimension (Rechenberg 1973; Casari et al. 1995) or by adding dimensions (Eigen and Winkler-Oswatitsch 1992).

References:
Casari, G., C. Sander and A. Valencia (1995). "A method to predict functional residues in proteins." Nat Struct Biol 2(2): 171-8.
Eigen, M. and R. Winkler-Oswatitsch (1992). Steps Towards Life: A Perspective on Evolution. Oxford; New York, Oxford University Press.
Eigen, M., R. Winkler-Oswatitsch and A. Dress (1988). "Statistical geometry in sequence space: a method of quantitative comparative sequence analysis." Proc Natl Acad Sci U S A 85(16): 5913-7.
Rechenberg, I. (1973). Evolutionsstrategie; Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart-Bad Cannstatt, Frommann-Holzboog.
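The doubling construction can be sketched in a few lines of Python. This is an illustration only, not code from Eigen et al. (1988): it assumes a two-letter alphabet written as '0' and '1' and joins two sequences by an edge whenever they differ at exactly one position (Hamming distance 1). The function name `sequence_space` is hypothetical.

```python
from itertools import product

def sequence_space(n, alphabet="01"):
    """Vertices: all sequences of length n over the alphabet.
    Edges: pairs of sequences that differ at exactly one position."""
    vertices = ["".join(p) for p in product(alphabet, repeat=n)]
    edges = [
        (a, b)
        for i, a in enumerate(vertices)
        for b in vertices[i + 1:]
        if sum(x != y for x, y in zip(a, b)) == 1
    ]
    return vertices, edges

# Each extra position doubles the diagram: two copies of the shorter
# space, joined by one new edge per vertex along the new dimension.
for n in range(1, 5):
    v, e = sequence_space(n)
    print(n, len(v), len(e))
```

The loop prints 1 2 1, 2 4 4, 3 8 12 and 4 16 32: a line, a square, a cube and the 4D hypercube, with 2^n vertices and n*2^(n-1) edges.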
One symbol -> 1D: the single coordinate is the pattern length.
Two symbols -> dimension = pattern length. Pattern length 1 = 1D:
Two symbols -> dimension = pattern length. Pattern length 2 = 2D: the dimensions correspond to positions, with two possibilities along each dimension. Note: this is a possible bifurcation; a larger alphabet could instead be represented as more choices along each position's axis!
Two symbols -> dimension = pattern length. Pattern length 3 = 3D:
Two symbols -> dimension = pattern length. Pattern length 4 = 4D, a.k.a. hypercube:
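The alternative noted at pattern length 2, more choices along each position's axis (the approach credited to Rechenberg 1973 and Casari et al. 1995), can be sketched as follows. This assumes the letters are numbered in an arbitrary order; the names `grid_space` and `one_mutant_neighbours` and the example alphabet ACGT are illustrative only.

```python
from itertools import product

def grid_space(n, alphabet):
    """One axis per position, one integer value per letter (0..len(alphabet)-1)."""
    index = {letter: k for k, letter in enumerate(alphabet)}
    return {"".join(s): tuple(index[c] for c in s) for s in product(alphabet, repeat=n)}

def one_mutant_neighbours(seq, alphabet):
    """All sequences one substitution away: change one coordinate to any other value."""
    return [
        seq[:i] + c + seq[i + 1:]
        for i in range(len(seq))
        for c in alphabet
        if c != seq[i]
    ]

space = grid_space(2, "ACGT")
print(len(space))                                 # 16 points
print(space["GT"])                                # (2, 3)
print(len(one_mutant_neighbours("GT", "ACGT")))   # 2 * (4 - 1) = 6
```

Because the numbering of letters is arbitrary, only "differs in one coordinate" carries meaning here; the numeric gap along an axis (G to A is 2 - 0) does not.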
Four symbols: with an alphabet of 4, we already reach a hypercube (4D) at a pattern size of 2, provided each dimension stays binary: each letter then needs two binary dimensions, so 2 positions x 2 = 4 dimensions.
Hypercubes for alphabets of 2 and 4 characters: 2-character alphabet, pattern size 4; 4-character alphabet, pattern size 2.
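Keeping each dimension binary amounts to giving every letter a two-bit code. The sketch below uses a hypothetical assignment A=00, C=01, G=10, T=11 (any other assignment would do) and checks that the 16 two-letter words occupy exactly the 16 vertices of the 4D hypercube, the same vertex set as the 2-character alphabet at pattern size 4.

```python
from itertools import product

# Hypothetical 2-bit code for a four-letter alphabet; the assignment is arbitrary.
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}

def to_binary(seq):
    """Concatenate 2-bit codes: a length-n word becomes a vertex of the 2n-cube."""
    return "".join(CODE[c] for c in seq)

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

vertices = {to_binary("".join(p)) for p in product("ACGT", repeat=2)}
binary_words = {"".join(p) for p in product("01", repeat=4)}

print(len(vertices))              # 16
print(vertices == binary_words)   # True: same vertex set as 2 letters, length 4

print(hamming(to_binary("AA"), to_binary("CA")))  # A->C: 00 vs 01, 1 edge
print(hamming(to_binary("AA"), to_binary("TA")))  # A->T: 00 vs 11, 2 edges
```

One consequence worth keeping in mind: with two bits per letter, a single substitution is one hypercube edge for some letter pairs and two edges for others, so distance in this hypercube is not the substitution count.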
3-character alphabet fractal: enlarge, then fill in. The outer pattern repeats the inner pattern = self-similar = fractal.
3-character alphabet, pattern size 4: fractal. Conjecture: for n -> infinity, the fractal might fill a 2D triangle. Note: check Mandelbrot.
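The "enlarge, then fill in" step can be made concrete by computing a point for every sequence. A minimal sketch, assuming the three letters sit at the corners of an equilateral triangle and each position contributes half as much as the one before it; the slides give no scale factor, so 1/2 is an assumption, and the letters A, B, C are placeholders. Prepending a letter then shrinks the whole inner pattern by half and moves it toward that letter's corner, which is the self-similarity described above.

```python
import math
from itertools import product

# Placeholder letters at the corners of an equilateral triangle (an assumption).
CORNERS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, math.sqrt(3) / 2)}

def point(seq):
    """Position k (1-based) moves the point by corner / 2**k, so the first
    letter picks the coarsest sub-triangle and the last letter the finest."""
    x = y = 0.0
    for k, letter in enumerate(seq, start=1):
        cx, cy = CORNERS[letter]
        x += cx / 2 ** k
        y += cy / 2 ** k
    return x, y

def pattern(n):
    """Points for every length-n sequence over the three-letter alphabet."""
    seqs = ("".join(s) for s in product(CORNERS, repeat=n))
    return {s: point(s) for s in seqs}

pts = pattern(4)
print(len(pts))  # 81 = 3**4 sequences, each at its own point
```

Under this particular halving assumption the limiting set is the Sierpinski triangle, whose similarity dimension is log 3 / log 2 ≈ 1.585, so testing the conjecture hinges on the scale factor the figure actually uses.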
Same for a 4-character alphabet: 1 position, 2 positions, 3 positions.
4-character alphabet, continued (with cheating: I didn’t actually add beads): 4 positions
4-character alphabet, continued (with cheating: I didn’t actually add beads): 5 positions
4-character alphabet, continued (with cheating: I didn’t actually add beads): 6 positions
4-character alphabet, continued (with cheating: I didn’t actually add beads): 7 positions
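With four letters the same halving construction behaves differently: four half-size copies tile the square exactly, so the points can be computed for every length rather than drawn. The sketch below assumes the letters A, C, G, T sit at the corners of a unit square (a placeholder assignment) and uses integer coordinates so the check is exact. The arithmetic is closely related to the chaos game representation of DNA sequences, named here only for comparison.

```python
from itertools import product

# Placeholder corner assignment for a four-letter alphabet (unit square).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def grid_point(seq):
    """Integer form of the halving construction: position k (1-based) of a
    length-n sequence contributes corner * 2**(n - k). Dividing by 2**n gives
    the point inside the unit square."""
    n = len(seq)
    x = sum(CORNERS[c][0] << (n - k) for k, c in enumerate(seq, start=1))
    y = sum(CORNERS[c][1] << (n - k) for k, c in enumerate(seq, start=1))
    return x, y

for n in range(1, 8):  # 1 to 7 positions, as on the slides
    side = 2 ** n
    points = {grid_point("".join(s)) for s in product(CORNERS, repeat=n)}
    # 4**n distinct points on a side x side grid: every cell is hit exactly once
    print(n, len(points) == side * side)
```

Each line prints True: the 4^n sequences fill the 2^n x 2^n grid without gaps or overlaps.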
Alignment of the ATP-binding subunits (SU) of V-, F- and A-ATPases (catalytic and non-catalytic SU).
UPGMA tree of the ATP-binding SU of V-, F- and A-ATPases, with a line dropped to partition (and colour) the 4 SU types (V/A catalytic and non-catalytic, F catalytic and non-catalytic). Note that the details of the tree are $%#&@.
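UPGMA is average-linkage clustering: repeatedly merge the two closest clusters and set the distance from the merged cluster to every other cluster to the size-weighted mean of the old distances. A minimal pure-Python sketch follows, together with the "line dropped" step, i.e. cutting the tree at a chosen height to obtain groups. The distance matrix and labels are made up, and this is not the program that produced the tree on the slide.

```python
def upgma(dist, labels):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns merges as (members_a, members_b, height) in the order they happen;
    height is half the average distance between the two merged clusters.
    """
    n = len(labels)
    clusters = {i: [labels[i]] for i in range(n)}
    d = {frozenset((i, j)): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], d[pair] / 2))
        merged = clusters.pop(a) + clusters.pop(b)
        for c in clusters:                # size-weighted mean of old distances
            d[frozenset((next_id, c))] = (
                size_a * d.pop(frozenset((a, c))) + size_b * d.pop(frozenset((b, c)))
            ) / (size_a + size_b)
        del d[pair]
        clusters[next_id] = merged
        next_id += 1
    return merges

def cut(merges, labels, threshold):
    """'Drop a line' at threshold: keep only merges below it, return the groups."""
    group_of = {x: frozenset([x]) for x in labels}
    for members_a, members_b, height in merges:
        if height >= threshold:
            break  # UPGMA heights never decrease, so later merges are higher
        joined = frozenset(members_a) | frozenset(members_b)
        for x in joined:
            group_of[x] = joined
    return sorted(sorted(g) for g in set(group_of.values()))

# Made-up distances for four hypothetical sequences forming two pairs.
labels = ["a1", "a2", "b1", "b2"]
dist = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
merges = upgma(dist, labels)
print(cut(merges, labels, threshold=3.0))  # [['a1', 'a2'], ['b1', 'b2']]
```

Moving the threshold up or down changes how many groups the dropped line produces, which is how the four SU types were separated and coloured.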
PCA of the ATP-binding SU of V-, F- and A-ATPases, using colours from the UPGMA tree.
Same PCA of the ATP-binding SU of V-, F- and A-ATPases, using colours from the UPGMA tree, but rotated slightly. (Giardia A SU selected in grey.)
Same PCA of the ATP-binding SU of V-, F- and A-ATPases, using colours from the UPGMA tree, but with the 5th axis replacing the 1st. (Eukaryotic A SU selected in grey.)
Same PCA of the ATP-binding SU of V-, F- and A-ATPases, using colours from the UPGMA tree, but with the 6th axis replacing the 1st. (Eukaryotic B SU selected in grey; forgot rice.)
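A PCA of sequences first needs each sequence turned into a numeric vector. The sketch below assumes a simple one-hot encoding of an alignment (one 0/1 slot per column and residue, gaps included) and uses NumPy's SVD; this encoding is a swapped-in illustration and not necessarily how Jalview computes its PCA. The alignment is a made-up toy. Replacing the 1st axis with the 5th then simply means plotting a different pair of score columns.

```python
import numpy as np

RESIDUES = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap symbol

def one_hot(alignment):
    """One 0/1 slot per (alignment column, residue); rows are sequences."""
    index = {r: i for i, r in enumerate(RESIDUES)}
    width = len(alignment[0]) * len(RESIDUES)
    X = np.zeros((len(alignment), width))
    for row, seq in enumerate(alignment):
        for col, residue in enumerate(seq):
            X[row, col * len(RESIDUES) + index[residue]] = 1.0
    return X

def pca_scores(X):
    """Coordinates of each row on the principal axes, largest variance first."""
    centred = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centred, full_matrices=False)
    return U * S  # column k holds the scores on axis k + 1

# Toy, hypothetical alignment of equal-length sequences.
alignment = ["MKV-LAG", "MKVELAG", "MRVELSG", "QRIEVSA", "QRIDVSA", "MKIDLSA", "QKVELAG"]
scores = pca_scores(one_hot(alignment))

axes_1_2 = scores[:, [0, 1]]  # the default view
axes_5_2 = scores[:, [4, 1]]  # 1st axis replaced by the 5th
```

With n sequences, centring leaves at most n - 1 informative axes, which limits how far down the axis list such alternative views can go.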
Problems
• Jalview’s approach requires an alignment, so only homologous sequences can be depicted in the same space.
• Solution: one could use the presence/absence of patterns as coordinates.
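Using pattern presence/absence as coordinates needs no alignment: each possible pattern gets one axis, and a sequence sits at 1 or 0 on it. A minimal sketch, assuming the patterns are contiguous k-letter words (k-mers) and comparing sequences by Jaccard distance; both choices, the function names and the toy sequences are illustrative assumptions. Each sequence becomes a vertex of a binary hypercube, echoing the two-symbol construction of sequence space.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def presence_vector(seq, k=2, alphabet=AMINO_ACIDS):
    """1/0 for every possible k-letter pattern, in a fixed order, depending on
    whether it occurs anywhere in seq. Sequences of any length map to vectors
    of the same size, len(alphabet) ** k."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return [1 if "".join(p) in present else 0 for p in product(alphabet, repeat=k)]

def jaccard_distance(u, v):
    """1 - (patterns in both) / (patterns in either)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1 - both / either if either else 0.0

# Two unaligned toy sequences of different length.
u = presence_vector("MKVLA")
v = presence_vector("MKVLLSA")
print(len(u), sum(u), sum(v))   # 400 4 6
print(jaccard_distance(u, v))   # about 0.571 (3 shared of 7 patterns)
```

The resulting vectors or distances could then feed the same PCA or UPGMA steps, without requiring the sequences to be homologous.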