Gene Discovery from Microarray Images 陳朝欽、高成炎、張春梵 ARCNTU, NTU-Hospital cchen@cs.nthu.edu.tw cykao@csie.ntu.edu.tw Project#: 93-EC-17-A-19-S1-0016
Motivation and Data Acquisition • Part of our current work aims to discover “a subset of genes” related to specific diseases, such as hepatoma and gastric cancer, through microarray experiments. To this end, we collect “spot signal intensities” from cDNA microarray images produced by a sequence of biological experiments.
Outline • Microarray Image Data Acquisition • Gridding for Image Segmentation • Normalization from MA-Plot • Finding Differentially Expressed Genes • Finding Discriminative Genes • Performance Evaluation by Dendrogram and K-means Algorithms
Spot Feature Computation
Cy3 (for Column 1): 639, 54879, 5980, 1984, 324, 910, 2153, 236
Cy5 (for Column 6): 104, 52858, 567, 189, 36, 1489, 5083, 407
Pre-Processing / Normalization • Because of the measurement process and other unavoidable factors, the “raw data” collected directly from experiments may contain noise, may be on different scales, or may have missing items. A pre-processing step that filters out inappropriate data, or a normalization step, may therefore be applied.
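As a rough illustration of the filtering step, the sketch below drops spots whose intensity is missing or too low to take a logarithm of. The function name `filter_spots`, the `min_intensity` cutoff, and the sample values are assumptions made for this example; the slides do not say which filtering rules were actually used.

```python
def filter_spots(spots, min_intensity=1.0):
    """Keep (cy3, cy5) pairs where both channels are present and usable.

    min_intensity is a hypothetical cutoff, not a value from the slides.
    """
    kept = []
    for cy3, cy5 in spots:
        # Missing measurement in either channel: discard the spot
        if cy3 is None or cy5 is None:
            continue
        # Intensities below the cutoff cannot be log-transformed reliably
        if cy3 < min_intensity or cy5 < min_intensity:
            continue
        kept.append((cy3, cy5))
    return kept


# Illustrative input: one complete spot, one missing value, one zero signal
raw = [(639, 104), (None, 50), (0, 20), (324, 36)]
print(filter_spots(raw))
```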
Spot Features for Gene Discovery
Cy3 / Cy5 values: 201, 67, 520, 153, 28276, 21747, 4072, 6324, 14807, 690, 1058, 1451, 572, 524
M = log2(Cy3) − log2(Cy5)
A = (log2(Cy3) + log2(Cy5)) / 2
Programs: compustt.c computes the spot features, pieceline.c performs the normalization, and maplot.c draws the M-A plot.
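The M and A definitions can be checked on a single spot with a few lines of Python. This is only a sketch of the formulas, not a port of compustt.c or maplot.c; the function name `ma_values` and the sample intensities are made up for illustration.

```python
import math


def ma_values(cy3, cy5):
    """Return (M, A) for one spot from its Cy3 and Cy5 intensities."""
    log_cy3 = math.log2(cy3)
    log_cy5 = math.log2(cy5)
    m = log_cy3 - log_cy5          # M: log2 ratio of the two channels
    a = (log_cy3 + log_cy5) / 2.0  # A: average log2 intensity
    return m, a


# Illustrative intensities (powers of two keep the arithmetic easy to follow)
for cy3, cy5 in [(8, 2), (4, 16), (32, 32)]:
    m, a = ma_values(cy3, cy5)
    print(f"Cy3={cy3:3d} Cy5={cy5:3d} M={m:+.1f} A={a:.1f}")
```

A spot with equal intensity in both channels gives M = 0, so points far from the M = 0 line on the M-A plot are the candidates of interest after normalization.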
Microarray Pattern Analysis • Each microarray contains 13574 effective genes out of the 18564 on a chip, with tumor tissue dyed in Cy3 and normal tissue dyed in Cy5 • Patients: 12 HCV, 27 HBV, 1 HCV+HBV, 4 neither HCV nor HBV • A gene is considered differentially expressed if log2(Lowess-normalized ratio Cy3/Cy5) is greater than T (↑, up-regulated) or less than −T (↓, down-regulated)
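The thresholding rule can be sketched as follows. The input is assumed to be the log2 of the already Lowess-normalized Cy3/Cy5 ratio (the normalization itself is not shown), and the value T = 1.0 is a hypothetical choice corresponding to a two-fold change; the slides do not state the T that was used.

```python
def regulation(log_ratio, t):
    """Classify one gene from its normalized log2(Cy3/Cy5) ratio.

    Returns "up" if log_ratio > t, "down" if log_ratio < -t,
    and "unchanged" otherwise.
    """
    if log_ratio > t:
        return "up"
    if log_ratio < -t:
        return "down"
    return "unchanged"


T = 1.0  # hypothetical threshold (two-fold change), not taken from the slides
for value in (1.5, -2.0, 0.3):
    print(value, regulation(value, T))
```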
Feature Selection/Extraction (1) • Given a set of N patterns from K categories (K = 2 is a dichotomy problem), with N_i patterns belonging to category i, 1 ≤ i ≤ K, each pattern consists of M possibly redundant features. For example, a microarray can be represented as a pattern of 13574 features corresponding to the 13574 effective genes. The goal of selection is to choose a small subset of features for “recognition”.
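One common way to select features in a two-category problem is to rank each feature by Fisher's ratio, which this deck names later as the criterion behind the Y48 gene set. The sketch below assumes the usual form (difference of class means squared, divided by the sum of class variances); the exact variant used by the authors is not given, and the function names and toy data are illustrative only.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one feature."""
    num = (mean(values_a) - mean(values_b)) ** 2
    den = pvariance(values_a) + pvariance(values_b)
    if den == 0:
        # Degenerate case: no spread within either class
        return float("inf") if num > 0 else 0.0
    return num / den


def rank_features(class_a, class_b, top_m):
    """Return the indices of the top_m features with the largest ratio.

    class_a and class_b are lists of patterns; each pattern is a list
    of M feature values.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        col_a = [pattern[j] for pattern in class_a]
        col_b = [pattern[j] for pattern in class_b]
        scores.append((fisher_ratio(col_a, col_b), j))
    # Highest ratio first; ties broken by feature index
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_m]]


# Toy data: 2 patterns per class, 3 features; feature 0 separates the classes
class_a = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9]]
class_b = [[3.0, 5.0, 0.5], [3.2, 4.9, 0.4]]
print(rank_features(class_a, class_b, top_m=2))
```

With real microarray data each pattern would hold 13574 values and `top_m` would be a small number such as 16 or 32.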
Feature Selection/Extraction (2) • Given a set of N patterns from K categories (K = 2 is a dichotomy problem), with N_i patterns belonging to category i, 1 ≤ i ≤ K, the goal of extraction is to transform an M-dimensional pattern into an m-dimensional pattern, with m << M, for classification. A selected feature keeps its original meaning, whereas an extracted feature usually does not.
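The difference between selection and extraction can be seen on a tiny example. The sketch below assumes a linear extraction y = Wx with a made-up weight matrix; the slides do not say which transformation the authors used, and the function names `select` and `extract` are hypothetical.

```python
def select(pattern, indices):
    """Feature selection: keep a subset of the original features unchanged."""
    return [pattern[i] for i in indices]


def extract(pattern, weights):
    """Feature extraction: project an M-dimensional pattern onto m directions.

    weights is an m x M matrix given as a list of rows; each output value
    is a weighted mix of all original features.
    """
    return [sum(w * x for w, x in zip(row, pattern)) for row in weights]


pattern = [1.0, 2.0, 3.0, 4.0]          # M = 4 illustrative feature values
weights = [[0.5, 0.5, 0.0, 0.0],        # hypothetical 2 x 4 projection
           [0.0, 0.0, 0.5, 0.5]]

print(select(pattern, [0, 3]))    # two original features, meaning preserved
print(extract(pattern, weights))  # two new features, each a mixture
```

A selected value can still be traced back to a single gene, which is why selection suits gene discovery; an extracted value blends several genes and has no single accession number.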
16 Most Discriminative Genes to Distinguish HCV from HBV [YCT39]
Index   Accession#    Index   Accession#
13796   U35376        16144   AK024601
7197    BG259957      16496   Y00083
2918    BI520001      17213   BC007437
8495    AJ012159      14579   BC011568
11189   AB008549      587     AF386492
11087   BC006496      113     Y16961
9443    CAC51145      17215   AF195766
9546    X52125        16760   AI022747
Next 16 Most Discriminative Genes to Distinguish HCV from HBV
Index   Accession#    Index       Accession#
5947    BG207354      7353        AF070641
4885    AK021818      5434        AB050785
11291   AF155110      (missing)   AB062987
1262    BI861005      14993       AA974308
8055    AJ224741      4182        AI970531
10965   AAF36120      5341        X65882
4164    NM_000423     10052       AB011542
8088    BC000187      8140        AK026068
K-means Clustering Results by Using the 32 Best Discriminative Genes
• G45 from Genasia: distortion 341.26
  1222221222 2211111111 111111111111111111
• X47 from C. Chen: distortion 302.33
  1222221222 2211111111 112111111111111111
• Y48 by Fisher’s Ratio on YCT39: distortion 307.49
  1222221222 2211111111 112111111111111111
• PY50 by Chuang+Kao’s on YCT39: distortion 290.06
  2222222222 2211211111 112111111111111111
Leave-one-out errors by 1-NN: 4, 3, 2, 1 (/39)
Leave-one-out errors by Fisher: 15, 7, 8, 9 (/39)
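The two evaluation figures reported here can be sketched in a few lines. The slide does not define “distortion”; the code assumes the usual K-means objective, the sum of squared Euclidean distances from each pattern to the centroid of its cluster. The leave-one-out 1-NN count classifies each pattern by its nearest neighbour among the remaining patterns. Function names and the toy data are illustrative, not the authors' programs or results.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two patterns."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distortion(patterns, labels):
    """Sum of squared distances from each pattern to its cluster centroid."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, l in zip(patterns, labels) if l == cluster]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def loo_1nn_errors(patterns, labels):
    """Count leave-one-out errors of the 1-nearest-neighbour rule."""
    errors = 0
    for i, p in enumerate(patterns):
        # Nearest neighbour among all patterns except the held-out one
        nearest = min((j for j in range(len(patterns)) if j != i),
                      key=lambda j: sq_dist(p, patterns[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors


# Toy data: two well-separated groups of two patterns each
patterns = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [1, 1, 2, 2]
print(distortion(patterns, labels))      # 4.0
print(loo_1nn_errors(patterns, labels))  # 0
```

In the setting of this slide, `patterns` would hold 39 patients described by the 32 selected genes, and `labels` would be either the K-means assignment (for the distortion) or the clinical HCV/HBV label (for the leave-one-out count).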
Up (Down) Regulated Genes for Gastric Cancers • Data: 5 advanced-stage and 5 early-stage patients with gastric cancer • We find the following genes, which completely discriminate “Advanced Stage” patients from “Early Stage” patients as determined by clinical diagnosis
Top 16 Discriminative Genes for Advanced and Early Stages
Index   Accession#    Index   Accession#
15843   AF316855      8728    AL591713
12994   BF868865      494     AB014526
18370   BC002996      10990   L77570
2070    AK021788      342     BC007848
1118    BC000249      10425   BG745129
9661    AP000350      6052    AF073362
2017    U53530        170     AK000278
1128    AF035281      1016    BF526386
Thank You • http://www.bioinfo.ntu.edu.tw • http://www.cs.nthu.edu.tw/~cchen • Tel: (02) 2312 3456 ~ 5917 • Tel: (02) 2362 5336 ~ 418 • Tel: (03) 573 1078