Single cell gene expression clustering. Moshe Levy. Advisors: Dr. Eran Treister, Dr. Tal Shay.
60 seconds on biology • Every cell has a gene expression profile. • Modern gene sampling techniques such as single-cell RNA sequencing (scRNA-seq) allow more precise measurements but bring new challenges for data analysis. • Gene clustering results can be analyzed further to benefit biological and medical research.
Goals • Use unsupervised learning methods to cluster the genes. • Explore known and novel clustering and preprocessing methods for single-cell and bulk sequencing data. • Usage examples: • Comparing samples from different patients. • Comparing the same patient before and after a treatment.
Data • Single-cell RNA sequencing (scRNA-seq) • 2D table: genes by cells • Taken from real patients at Soroka as part of MDS research. • Myelodysplastic syndrome (MDS) can lead to leukemia, and there is currently no method to predict in which patients. • Challenge: sparse data, with a sparsity ratio of ~0.75
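A minimal sketch of how a sparsity ratio could be measured on a genes-by-cells table. It assumes the ratio means the fraction of zero entries, which the slide does not state explicitly; the function name `sparsity_ratio` and the toy counts are hypothetical and are not the Soroka data.

```python
def sparsity_ratio(matrix):
    """Fraction of entries equal to zero in a genes-by-cells table.

    Assumption: "sparse ratio" means the share of zero entries.
    """
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


# Toy genes x cells count table (hypothetical values).
counts = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 5, 0, 0],
]
ratio = sparsity_ratio(counts)
print(f"sparsity ratio: {ratio:.2f}")
```

In this toy table 12 of the 16 entries are zero, so the ratio matches the order of magnitude quoted on the slide.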
Progression • Step 1: Studying basic cell biology • Step 2: Reviewing and experimenting with existing frameworks • Step 3: Building a framework of my own • Step 4: Running on Soroka MDS data
Current state of research • Single cell research, tools and articles:
Framework • Data preprocessing: cleaning, filtering and normalizing • Cluster the chosen dataset using 3 different algorithms • Score each run using the Silhouette score • Plots: • Heatmap of the data, sorted by the clusters • Correlation matrix of the data, sorted by the clusters
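The scoring step can be illustrated with a small sketch of the mean Silhouette coefficient. It assumes Euclidean distance between gene vectors and uses only the Python standard library; the function names and toy points are hypothetical and do not come from the framework itself.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: two well-separated groups (hypothetical values).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good = silhouette_score(points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # mixes the groups
print(good, bad)
```

A score near 1 indicates compact, well-separated clusters, while a negative score suggests that points sit closer to another cluster than to their own, which is why the score is useful for comparing runs of different algorithms.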
Best algorithm • After many tests on various datasets, the algorithm that produced the best results overall was Fuzzy K-means (fuzzy c-means, FCM), a soft-assignment extension of the K-means algorithm. • Initialize random cluster centers. • While not converged (step k), repeat: • For each gene, softly assign membership to each cluster by its metric distance from that cluster's center. • Update the cluster centers. • Return hard assignments.
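The steps on this slide can be sketched roughly as follows. This is a textbook-style fuzzy c-means, not the framework's actual implementation: the fuzzifier `m=2.0`, the Euclidean metric, the stopping rule based on center movement (`tol`), and all names are assumptions, since the slide's convergence condition did not survive extraction.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Textbook fuzzy c-means on a list of equal-length vectors (one per gene)."""
    rng = random.Random(seed)
    # Initialize cluster centers at randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    memberships = []
    for _ in range(max_iter):
        # Soft assignment: membership proportional to distance^(-2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
                row = [w / sum(weights) for w in weights]
            memberships.append(row)
        # Center update: mean of all points weighted by membership^m.
        new_centers = []
        for j in range(n_clusters):
            w = [row[j] ** m for row in memberships]
            new_centers.append([
                sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                for d in range(len(points[0]))
            ])
        # Assumed stopping rule: stop once no center moves more than tol.
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Hard assignment: the cluster with the highest membership for each gene.
    hard = [max(range(n_clusters), key=row.__getitem__) for row in memberships]
    return centers, memberships, hard


# Toy "genes": two groups of expression vectors (hypothetical values).
genes = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, memberships, hard = fuzzy_c_means(genes, n_clusters=2)
print(hard)
```

Unlike K-means, each gene keeps a membership weight for every cluster during the iterations, and the hard labels are only derived at the end.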
Results • Figure: expression matrix heatmap (genes × cells; color scale from low to high expression) and gene-gene Pearson correlation matrix (genes × genes; color scale from -1 to 1).
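A minimal sketch of how a cluster-sorted gene-gene Pearson correlation matrix like the one in the figure could be computed. The helper names and the toy expression values are hypothetical, and treating constant genes as having correlation 0 is an assumption made only to keep the example well defined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant gene: correlation is undefined, report 0 here
    return cov / (sx * sy)


def sorted_correlation_matrix(expression, labels):
    """Reorder genes by cluster label, then correlate every pair of genes."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    rows = [expression[i] for i in order]
    return order, [[pearson(a, b) for b in rows] for a in rows]


# Toy genes x cells matrix with cluster labels (hypothetical values).
expression = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [2, 4, 6, 8],
    [8, 6, 4, 2],
]
labels = [0, 1, 0, 1]
order, corr = sorted_correlation_matrix(expression, labels)
for row in corr:
    print([round(v, 2) for v in row])
```

After sorting, genes of the same cluster sit next to each other, so a good clustering shows up as blocks of high correlation along the diagonal.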
Findings • The clustering is feasible. • FCM produced the best clustering. • None of the preprocessing techniques tested (MAGIC, Seurat) improved cluster quality.
Future directions • Process MDS research clustering results. • Novel methods can be tested using the framework infrastructure and preprocessing. • Investigate novel imputation methods.