My Research Work and Clustering. Bernard Chen 2009. Outline. Introduction Experimental Setup Clustering Future Works. Central Dogma of Molecular Biology. Amino Acids, the subunit of proteins. Protein Primary, Secondary, and Tertiary Structure. Protein 3D Structure.
My Research Work and Clustering Bernard Chen 2009
Outline • Introduction • Experimental Setup • Clustering • Future Works
Protein Sequence Motif • Although there are 20 amino acids, protein primary structure is not built by choosing among them at random • Sequence Motif: One of a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.
Protein Sequence Motif • These biologically significant regions or residues are usually: • Enzyme catalytic sites • Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…) • Amino acids involved in binding a metal ion • Cysteines involved in disulfide bonds • Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)
Goal of our group • The main purpose is to obtain and extract protein sequence motifs that are universally conserved across protein family boundaries. • Discuss the relation between protein primary structure and tertiary structure
Representation of Segment • Sliding window size: 9 • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus the nine corresponding secondary structure labels obtained from DSSP. • More than 560,000 segments (413 MB) are generated by this method. • DSSP: the source of the secondary structure information
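The windowing step above can be sketched as follows. This is a minimal illustration, not the group's code: the slide does not say what the 20 columns hold, so a one-hot encoding over the 20 amino acids is assumed as a placeholder, and the sequence, the structure labels, and the names `one_hot` and `segments` are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 9  # sliding window size stated on the slide


def one_hot(residue):
    """Return a 20-value row with 1.0 at the residue's index (placeholder encoding)."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def segments(sequence, structure, window=WINDOW):
    """Yield (matrix, labels): a window x 20 matrix plus `window` structure labels."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    for start in range(len(sequence) - window + 1):
        chunk = sequence[start:start + window]
        matrix = [one_hot(res) for res in chunk]
        yield matrix, structure[start:start + window]


# Hypothetical 12-residue sequence with made-up per-residue structure labels.
seq = "MKTAYIAKQRQI"
ss = "CCHHHHHHHEEC"
segs = list(segments(seq, ss))
print(len(segs))  # 12 - 9 + 1 = 4 windows
```

A sequence of length n yields n - 9 + 1 overlapping segments, which is how a protein database can grow into several hundred thousand segments.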
Clustering Algorithms • We used two clustering algorithms in our approach: • K-means Clustering • Fuzzy C-means Clustering
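The practical difference between the two algorithms is the assignment step: K-means gives each point to exactly one cluster, while Fuzzy C-means gives it a degree of membership in every cluster. The sketch below shows only that step, using the standard textbook formulas with Euclidean distance and fuzzifier m = 2; the distance measure and parameters the group actually used are not stated on the slide, and the function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_assign(point, centroids):
    """K-means: index of the single nearest centroid (hard assignment)."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


def fcm_memberships(point, centroids, m=2.0):
    """Fuzzy C-means: membership in every cluster (soft assignment), summing to 1."""
    dists = [distance(point, c) for c in centroids]
    hits = dists.count(0.0)
    if hits:  # point coincides with a centroid: share full membership among them
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical 2-D example: the point is three times closer to the first centroid.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
hard = kmeans_assign(point, centroids)
soft = fcm_memberships(point, centroids)
print(hard, [round(u, 3) for u in soft])
```

In this example the hard assignment is cluster 0, while the soft memberships come out near 0.9 and 0.1, so the point still belongs partly to the second cluster.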
Granular Computing Model • Original dataset → Fuzzy C-Means Clustering → Information Granule 1 ... Information Granule M → K-means Clustering on each information granule → Join Information → Final Sequence Motifs
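The flow of the model can be sketched as below. The slide does not say how the FCM output is turned into information granules, so this sketch assumes a membership threshold, which lets one segment fall into several granules. The per-granule clustering is passed in as `cluster_fn`, and the demo uses a trivial stand-in rather than real K-means. All names and values here are hypothetical.

```python
def split_into_granules(memberships, threshold):
    """Put item i into every granule whose membership is >= threshold.

    memberships[i][g] is the FCM membership of item i in granule g, so an
    item may land in several granules (they overlap) or in none.
    """
    n_granules = len(memberships[0])
    granules = [[] for _ in range(n_granules)]
    for i, row in enumerate(memberships):
        for g, u in enumerate(row):
            if u >= threshold:
                granules[g].append(i)
    return granules


def granular_model(data, memberships, threshold, cluster_fn):
    """Cluster each granule separately with cluster_fn, then join the results."""
    joined = []
    for g, members in enumerate(split_into_granules(memberships, threshold)):
        subset = [data[i] for i in members]
        if subset:  # skip empty granules
            for cluster in cluster_fn(subset):
                joined.append((g, cluster))
    return joined


# Hypothetical demo: four segments, two granules, assumed threshold of 0.4.
data = ["s0", "s1", "s2", "s3"]
memberships = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
one_cluster = lambda subset: [subset]  # stand-in for a per-granule K-means run
result = granular_model(data, memberships, 0.4, one_cluster)
print(result)
```

Because each K-means run sees only the segments of its own granule, the runs work on smaller inputs and are independent of one another.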
Reduce Space-complexity • Table 1: Summary of results obtained by FCM
Reduce Time-complexity • Wei’s method: 1285968 sec (15 days) * 6 = 7715568 sec (90 days) • Granular Model: 154899 sec (FCM exe time) + 231720 sec (2.7 days) * 6 = 1545219 sec (18 days)
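The timing comparison can be re-derived from the per-run figures on the slide. The variable names below are one reading of those figures, not labels from the source. The product 1285968 × 6 evaluates to 7715808 sec, 240 sec more than the total printed on the slide; which of the two numbers carries the typo is not clear from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Per-run figures copied from the slide; the names are this sketch's reading of them.
wei_run_sec = 1285968          # one run of Wei's method
fcm_sec = 154899               # FCM execution time, paid once
granular_run_sec = 231720      # one K-means pass in the granular model
runs = 6

wei_total = wei_run_sec * runs
granular_total = fcm_sec + granular_run_sec * runs

print(wei_total, round(wei_total / SECONDS_PER_DAY, 1))
print(granular_total, round(granular_total / SECONDS_PER_DAY, 1))
print(round(wei_total / granular_total, 2))  # speed-up factor
```

On these figures the granular model takes roughly one fifth of the time, even after paying the FCM cost up front.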