
Predictor discovery in training set



Presentation Transcript


  1. FIGURE 1A. Analysis workflow in five numbered steps.
     Column headings: MASS-Conductor; Prediction analysis of microarray (PAM); Cluster analysis; Platform validation.
     Stages: predictor discovery in training set; predictor confirmation in testing set; pattern analysis in all set.
     Sample sets: training set (10 AR, 10 STA, 6 BK); testing set (10 AR, 10 STA, 4 BK); all set (20 AR, 20 STA, 10 BK, 10 NS, 10 HC).
     MASS-Conductor: LCMS raw spectra; peak finding, peak alignment, feature extraction; normalization; remove background signals; 20937 unique features.
     PAM: classifier training; six-fold cross-validation; predictors of 2 ~ 630 features; class prediction algorithm; calculate estimates of predicted class probabilities; analysis of goodness of class separation; predictors of 53 features; classify AR, STA, BK.
     Cluster analysis: 2d hierarchical clustering; heatmap plotting; AR, STA, BK, NS, HC.
     Platform validation: 2 peptide biomarkers; MRM assay development; MRM assay on training + testing samples; correlation analysis, LC-MALDI → MRM.

  2. FIGURE 1B. Hypothesis of molecular mechanism: study design in four numbered steps.
     Starting point: observation of abundance patterns of urine peptide biomarkers in AR.
     Analyses: exploration analysis; confirmation analysis; validation analysis. Discovery → mechanism biomarkers.
     Data sets: exploration data set⁶ (TGCG); confirmation data set (Stanford); validation data set (Stanford).
     Platforms and samples:
     • Quantitative RT-PCR: AR (BX, n=14); STA (BX, n=10); HC (BX, n=10)
     • Affymetrix HU-133: AR (BX, n=37); HC (BX, n=23)
     • Affymetrix HG-U95Av2: AR (PBL, n=6; BX, n=7); STA (PBL, n=9; BX, n=10); NR (PBL, n=8; BX, n=5); HC (PBL, n=8; BX, n=9)
     Hypotheses, each followed by confirmation and validation:
     • Hypothesis 1: gene expression alteration in AR. Expression analysis of the peptide biomarkers' corresponding precursor genes.
     • Hypothesis 2: protease expression alteration in AR. Expression analysis of metzincin superfamily genes.
     • Hypothesis 3: protease inhibitor expression alteration in AR. Expression analysis of protease inhibitor genes.

  3. FIGURE 2A. Goodness of class separation (Δ probability) for AR, STA and BK in the training and testing sets, plotted against the number of features in the predictor: 2, 4, 7, 9, 14, 23, 33, 53, 84, 139, 226, 394, 630.

  4. FIGURE 2B-E. Predicted class probability by sample ID, and agreement with clinical diagnosis.
     Sample counts: training set n = 26 (10 AR, 10 STA, 6 BK); testing set n = 24 (10 AR, 10 STA, 4 BK); data set n = 70 (20 AR, 20 STA, 10 BK, 10 NS, 10 HC).

     Set (method)      | Clinical AR: called AR / NON-AR | Clinical NON-AR: called AR / NON-AR | Percent agreement (AR, NON-AR) | Overall agreement | Overall P
     Training (PAM)    | 9 / 1 (n = 10)                  | 0 / 16 (n = 16)                     | 90%, 100%                      | 96%               | 3.2 × 10^-6
     Testing (PAM)     | 8 / 2 (n = 10)                  | 2 / 12 (n = 14)                     | 80%, 85%                       | 83%               | 0.0027
     All (2d cluster)  | 19 / 1 (n = 20)                 | 0 / 50 (n = 50)                     | 95%, 100%                      | 98.5%             | 3.1 × 10^-16
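The agreement percentages and overall P values on this slide can be re-derived from the 2x2 counts (19/0/1/50, 9/0/1/16 and 8/2/2/12). The sketch below is only a plausibility check: the slide does not name the statistical test, so the use of a two-sided Fisher's exact test is an assumption, and the function and variable names are illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: the slide does not say which test produced its P values.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def overall_agreement(a, b, c, d):
    """Fraction of samples where the call matches the clinical diagnosis."""
    return (a + d) / (a + b + c + d)

# Rows: called AR, called NON-AR. Columns: clinical AR, clinical NON-AR.
tables = {
    "training (PAM)": (9, 0, 1, 16),
    "testing (PAM)": (8, 2, 2, 12),
    "all (2d cluster)": (19, 0, 1, 50),
}

results = {
    name: (overall_agreement(*t), fisher_exact_two_sided(*t))
    for name, t in tables.items()
}

for name, (agree, p) in results.items():
    print(f"{name}: agreement {agree:.1%}, P = {p:.3g}")
```

Under that assumption the computed values agree with the slide to within rounding (about 96%, 83% and 98.6% agreement, with P values close to 3.2 × 10^-6, 0.0027 and 3.15 × 10^-16).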

  5. FIGURE 3 (panels A and B). Urine peptides: source protein, mass value, sequence.
     • THP 982.59 VLNLGPITR
     • THP 1047.48 SGSVIDQSRV
     • THP 1211.66 DQSRVLNLGPI
     • THP 1225.69 SRVLNLGPITR
     • THP 1324.76 IDQSRVLNLGPI
     • THP 1423.83 VIDQSRVLNLGPI
     • THP 1468.82 DQSRVLNLGPITR
     • THP 1510.87 SVIDQSRVLNLGPI
     • THP 1567.91 GSVIDQSRVLNLGPI
     • THP 1581.91 IDQSRVLNLGPITR
     • THP 1654.91 SGSVIDQSRVLNLGPI
     • THP 1680.98 VIDQSRVLNLGPITR
     • THP 1755.96 SGSVIDQSRVLNLGPIT
     • THP 1768.01 SVIDQSRVLNLGPITR
     • THP 1912.07 SGSVIDQSRVLNLGPITR
     • THP 2040.16 SGSVIDQSRVLNLGPITRK
     • COL1A1 1235.56 APGDRGEPGPPGP
     • COL1A1 1251.55 APGDRGEPGPPGP
     • COL1A1 1322.57 APGDRGEPGPPGPA
     • COL1A1 1316.59 DAGPVGPPGPPGPPG
     • COL1A1 1409.66 GPPGPPGPPGPPGPPS
     • COL1A1 2048.92 NGDDGEAGKPGRPGERGPPGP
     • COL1A1 2064.91 NGDDGEAGKPGRPGERGPPGP
     • COL1A1 2192.97 NGDDGEAGKPGRPGERGPPGPQ
     • COL1A1 2362.12 GKNGDDGEAGKPGRPGERGPPGPQ
     • COL1A1 2378.10 GKNGDDGEAGKPGRPGERGPPGPQ
     • COL1A1 2645.24 GPPGKNGDDGEAGKPGRPGERGPPGPQ
     • COL1A1 1709.79 PPGEAGKPGEQGVPGDLG
     • COL1A1 2031.95 PPGEAGKPGEQGVPGDLGAPGP
     • COL1A1 2221.97 ADGQPGAKGEPGDAGAKGDAGPPGP
     • COL1A1 2205.99 ADGQPGAKGEPGDAGAKGDAGPPGP
     • COL1A1 2277.01 ADGQPGAKGEPGDAGAKGDAGPPGPA
     • COL1A1 2293.01 ADGQPGAKGEPGDAGAKGDAGPPGPA
     • COL1A1 2617.15 GPPGADGQPGAKGEPGDAGAKGDAGPPGPA
     • COL1A1 2086.93 EGSPGRDGSPGAKGDRGETGPA
     • COL1A1 2157.96 AEGSPGRDGSPGAKGDRGETGPA
     • COL1A1 3014.41 ESGREGAPGAEGSPGRDGSPGAKGDRGETGPA
     • COL1A1 1266.58 SPGPDGKTGPPGPA
     • COL1A1 2129.99 DGKTGPPGPAGQDGRPGPPGPPG
     • COL1A1 2017.93 GRPGEVGPPGPPGPAGEKGSPG
     • COL1A2 2081.94 DGPPGRDGQPGHKGERGYPG
     • COL1A2 2195.99 NDGPPGRDGQPGHKGERGYPG
     • COL2A1 1861.85 SNGNPGPPGPPGPSGKDGPK
     • COL3A1 1738.76 NDGAPGKNGERGGPGGPGP
     • COL3A1 2008.93 DGESGRPGRPGERGLPGPPG
     • COL3A1 2079.92 DAGAPGAPGGKGDAGAPGERGPPG
     • COL3A1 2565.18 GAPGQNGEPGGKGERGAPGEKGEGGPPG
     • COL3A1 2743.24 KNGETGPQGPPGPTGPGGDKGDTGPPGPQG
     • COL4A1 1424.66 PGQQGNPGAQGLPGP
     • COL4A2 1126.51 GLPGLPGPKGFA
     • COL4A3 1161.52 GEPGPPGPPGNLG
     • COL4A4 1218.55 GLPGPPGPKGPRG
     • COL4A5 1144.52 GPPGPPGPLGPLG
     • COL4A5 1269.53 PGLDGMKGDPGLP
     • COL4A5 1733.76 GIKGEKGNPGQPGLPGLP
     • COL4A6 1158.52 GLPGPPGPPGPPS
     • COL5A1 1748.82 KGPQGKPGLAGMPGANGPP
     • COL7A1 1690.80 PGLPGQVGETGKPGAPGR
     • COL9A1 1732.84 KRPDSGATGLPGRPGPPG
     • COL11A1 1441.64 GPPGPPGLPGPQGPKG
     • COL11A1 1828.84 DGPPGPPGERGPQGPQGPV
     • COL17A1 1368.62 LPGPPGPPGSFLSN
     • COL18A1 1142.51 GPPGPPGPPGPPS
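The number next to each sequence is not labeled on the slide. For several THP entries it is consistent with the monoisotopic mass of the singly protonated, unmodified peptide ([M+H]+), which the following sketch checks. That interpretation is an assumption, the helper name `mh_plus` is illustrative, and the residue table covers only the amino acids that occur in the THP peptides. Identical collagen sequences listed with different values (for example COL1A1 1235.56 and 1251.55) would need modification masses that this sketch does not model.

```python
# Monoisotopic residue masses in Da, restricted to residues found in the THP peptides
RESIDUE = {
    "G": 57.02146, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056   # one H2O added for the free N- and C-termini
PROTON = 1.00728   # charge carrier for the singly protonated ion

def mh_plus(sequence):
    """Monoisotopic [M+H]+ of an unmodified peptide (assumed meaning of the slide value)."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + PROTON

# Values copied from the slide for a few THP entries
listed = {
    "VLNLGPITR": 982.59,
    "VIDQSRVLNLGPITR": 1680.98,
    "SGSVIDQSRVLNLGPITR": 1912.07,
    "SGSVIDQSRVLNLGPITRK": 2040.16,
}

differences = {seq: mh_plus(seq) - value for seq, value in listed.items()}

for seq, diff in differences.items():
    print(f"{seq}: computed {mh_plus(seq):.2f}, listed {listed[seq]:.2f}, difference {diff:+.3f}")
```

For the four entries checked here the computed and listed values differ by less than 0.03; other entries were not verified.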

  6. FIGURE 4A. Sample: urine peptides. Signal intensity/100 (axis 0 to 2.5) in AR, STA, BK, NS and HC for two peptides:
     • 1: THP 1680.98 VIDQSRVLNLGPITR
     • 2: THP 1912.07 SGSVIDQSRVLNLGPITR
     Correlation coefficient analysis between MRM and LC-MALDI data sets:
     Method   | P-value
     Pearson  | 5.149 × 10^-8
     Kendall  | 2.363 × 10^-6
     Spearman | 5.368 × 10^-6

  7. FIGURE 4B. ROC curves (sensitivity against 1 - specificity) for urine peptides THP 1680.98 VIDQSRVLNLGPITR and THP 1912.07 SGSVIDQSRVLNLGPITR. Panels: AR versus BK and AR versus STA. AUC values shown: 0.83, 0.92, 0.74, 0.83.
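An AUC like those reported on this slide can be read as the probability that a randomly chosen sample from one group scores higher than a randomly chosen sample from the other. The sketch below computes it by direct pairwise comparison. The slide does not say how its AUCs were calculated, and the intensity values here are invented purely for illustration; they are not data from the presentation.

```python
def roc_auc(group_a, group_b):
    """AUC as P(score in group_a > score in group_b), counting ties as 0.5.

    This equals the area under the empirical ROC curve when group_a is
    treated as the positive class.
    """
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    return wins / (len(group_a) * len(group_b))

# Hypothetical signal intensities, made up for this example only
group_one = [3.0, 4.0, 5.0]
group_two = [1.0, 2.0, 3.5]

auc = roc_auc(group_one, group_two)
print(f"AUC = {auc:.2f}")
```

If a marker is lower rather than higher in the group of interest, the same function returns a value below 0.5, and 1 minus that value gives the equivalent discrimination.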

  8. FIGURE 5A. log (signal ratio (AR/HC)), axis from -3 to 3, for three series: urine peptides; biopsies RNA microarray; biopsies RNA RT-PCR. Genes on the x-axis: COL1A1, COL1A2, COL2A1, COL3A1, COL4A1, COL4A2, COL4A3, COL4A4, COL4A5, COL4A6, COL7A1, COL9A1, COL11A1, COL17A1, COL18A1, THP.

  9. FIGURE 5B. Signal ratio (AR/HC) by RT-PCR and by microarray, and microarray signal Δ (AR-HC)/100, for gene groups ADAM, MEPRIN, SERPING1, TIMP and MMP. Labeled points: MMP7, SERPING1, TIMP1.

  10. FIGURE 6. Schematic of allograft disease and renal peptides. Labels: renal proteins; endoproteases (ADAM, MMP, MEPRIN …); precursor peptides; exoproteases (unknown); fragment peptides; protease inhibitors (TIMP1, SERPING1).
