Yeast Dataset Analysis • Hongli Li • 91.580 Final Project • Computer Science Department, UMass Lowell
Outline • Gene Ontology Annotation • Data Preprocessing • Cluster • Results • Conclusion
GO Annotations • Total number of genes: 799 • Genes with GO at level 3 of Biological Process: 327 • Genes with GO but not at level 3: 272 • Genes without GO: 200
GO Annotation • Of the 327 genes with GO at level 3: • 170 genes belong to GO:0008152, metabolism • 90 genes belong to GO:0007049, cell cycle • 81 genes belong to GO:0016043, cell organization and biogenesis • 51 genes belong to GO:0006810, transport
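The counts on the two GO annotation slides can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative and not from the original project. It confirms that the three groups add up to 799 and shows that the four listed categories account for more memberships than there are level-3 genes, so some genes must carry more than one level-3 term.

```python
# Counts copied from the two GO annotation slides.
total_genes = 799
level3 = 327          # genes with a GO term at level 3 of Biological Process
go_not_level3 = 272   # genes with GO terms, but none at level 3
no_go = 200           # genes without any GO term

# The three groups partition the dataset.
assert level3 + go_not_level3 + no_go == total_genes

# Level-3 category memberships among the 327 annotated genes.
categories = {
    "GO:0008152 metabolism": 170,
    "GO:0007049 cell cycle": 90,
    "GO:0016043 cell organization and biogenesis": 81,
    "GO:0006810 transport": 51,
}
memberships = sum(categories.values())

# More memberships than genes means some genes belong to several categories.
surplus = memberships - level3

for name, count in categories.items():
    print(f"{name}: {count} genes ({count / level3:.1%} of level-3 genes)")
print(f"memberships = {memberships}, genes = {level3}, surplus = {surplus}")
```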
Data Preprocessing • Dataset: 799 cell cycle regulated genes • Filter: keep genes with at least 85% existing (non-missing) values • Impute missing values using KNN • Standardize patterns (mean = 0 and standard deviation = 1)
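The three preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the tool actually used for the project: missing values are represented as `None`, the filter is read as "at least 85% of time points present", the KNN distance is computed over the columns two genes share, the value of `k` is arbitrary, and each gene profile (row) is standardized with the population standard deviation. The function names and toy data are hypothetical.

```python
import math

def filter_genes(rows, min_present=0.85):
    """Keep rows whose fraction of non-missing values is >= min_present."""
    return [r for r in rows
            if sum(v is not None for v in r) / len(r) >= min_present]

def knn_impute(rows, k=3):
    """Fill each missing value with the mean of that column over the k nearest rows."""
    def dist(a, b):
        # Root-mean-square difference over columns present in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        present = [v for v in row if v is not None]
        row_mean = sum(present) / len(present)  # fallback when no neighbour exists
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate neighbours: other rows that have a value in column j.
            cands = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [val for _, val in cands[:k]]
            new_row[j] = sum(nearest) / len(nearest) if nearest else row_mean
        filled.append(new_row)
    return filled

def standardize(rows):
    """Scale each row to mean 0 and (population) standard deviation 1."""
    out = []
    for r in rows:
        mean = sum(r) / len(r)
        sd = math.sqrt(sum((v - mean) ** 2 for v in r) / len(r))
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in r])
    return out

# Toy data: 4 hypothetical genes x 8 time points.
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [1.1, 2.1, None, 4.1, 5.1, 6.1, 7.1, 8.1],    # 87.5% present: kept
    [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    [None, None, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],   # 75% present: dropped
]
kept = filter_genes(toy)
imputed = knn_impute(kept, k=1)
profiles = standardize(imputed)
```

Filtering before imputation keeps genes with too many gaps from being filled in mostly by guesswork, and standardizing last ensures the imputed values are scaled together with the measured ones.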
Cluster • SOTA – Self-Organizing Tree Algorithm • Euclidean Distance • Variability Threshold: 80%
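One consequence of combining standardized patterns with Euclidean distance is worth making explicit: for two profiles of length n scaled to mean 0 and population standard deviation 1, the squared Euclidean distance equals 2n(1 - r), where r is their Pearson correlation. Clustering by Euclidean distance on such profiles therefore groups genes by the shape of their expression pattern rather than its magnitude. The sketch below checks the identity numerically on two made-up profiles; it is not part of SOTA itself, and the function names are illustrative.

```python
import math

def zscore(profile):
    """Scale a profile to mean 0 and population standard deviation 1."""
    mean = sum(profile) / len(profile)
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / len(profile))
    return [(v - mean) / sd for v in profile]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Two hypothetical expression profiles over six time points.
g1 = [0.2, 1.5, 0.9, -0.4, -1.1, 0.3]
g2 = [0.1, 1.0, 1.2, 0.0, -0.9, -0.5]

n = len(g1)
d = euclidean(zscore(g1), zscore(g2))
r = pearson(g1, g2)

# The two quantities printed here agree up to floating-point error.
print(d ** 2, 2 * n * (1 - r))
```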
Cluster 61 Result
Cluster 61 • 67 of the 799 genes fall in Cluster 61 • 24 of the 67 genes have GO • 10 of the 24 genes belong to metabolism • 14 belong to cell cycle • 8 belong to S phase of mitotic cell cycle • 8 belong to DNA replication • 4 belong to G1/S transition of mitotic cell cycle • Only one gene that belongs to metabolism is not in cell cycle
Cluster 60 • 33 genes in this cluster • 11 of the 33 have GO • 4 of the 11 genes are in M-phase specific microtubule process, which belongs to cell cycle • 7 are in organelle organization and biogenesis, which belongs to cell growth and/or maintenance • 8 in total are in cell cycle
Cluster 59 • 38 genes in this cluster • 15 genes have annotation • 7 in metabolism • 5 in cell cycle • M phase of mitotic cell cycle has 3 • Nuclear division has 3 • No gene is shared between these two classes
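The counts reported for clusters 59 to 61 can be put side by side with the dataset-wide figure of 90 cell cycle genes among the 327 genes annotated at level 3. The sketch below computes, for each cluster, the fraction of annotated genes that fall in cell cycle and its ratio to that background. It assumes that "have GO" on the cluster slides refers to the same level-3 annotation as the background count, which the slides do not state explicitly; the dictionary layout and names are illustrative.

```python
# Background from the GO annotation slides: 90 cell cycle genes
# among 327 genes annotated at level 3 of Biological Process.
background = 90 / 327

# Per-cluster counts copied from the cluster slides.
clusters = {
    61: {"genes": 67, "with_go": 24, "cell_cycle": 14},
    60: {"genes": 33, "with_go": 11, "cell_cycle": 8},
    59: {"genes": 38, "with_go": 15, "cell_cycle": 5},
}

fractions = {}
ratios = {}
for cid, c in clusters.items():
    # Fraction of the cluster's annotated genes that are in cell cycle.
    fractions[cid] = c["cell_cycle"] / c["with_go"]
    # How that fraction compares with the dataset-wide fraction.
    ratios[cid] = fractions[cid] / background
    print(f"cluster {cid}: {c['cell_cycle']}/{c['with_go']} annotated genes "
          f"in cell cycle = {fractions[cid]:.1%} "
          f"({ratios[cid]:.2f}x background {background:.1%})")
```

With so few annotated genes per cluster, these fractions are rough indicators only; a formal enrichment test would be needed before drawing firm conclusions.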
Conclusion & Future Work • Cluster #61 has the strongest relation to cell cycle, followed by clusters #60 and #59 • Sub-cluster clusters #59, #60, and #61 • Analyze the gene expression data of the genes known to have GO cell cycle annotations • Analyze the other clusters • Repeat the same analysis on the 6000-gene dataset
References • http://gepas.bioinfo.cnio.es/index.html • P. T. Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, vol. 9, pp. 3273-3297, 1998. • R. J. Cho et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, vol. 2, pp. 65-73, 1998. • J. Herrero, A. Valencia, and J. Dopazo. A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, vol. 17, no. 2, pp. 126-136, 2001. • O. Alter et al. Singular value decomposition for genome-wide expression data processing and modeling. PNAS, vol. 97, pp. 10101-10106, 2000. • http://www.cellsalive.com/cell_cycle.htm • http://www.geneontology.org/ • http://fatigo.bioinfo.cnio.es/htdocs/helpFatiGO.html