High throughput urine biomarker discovery and integrative analysis for translational medicine. Bruce Ling, Ph.D.
Biomarker A molecular indicator of a specific biological property; a biochemical feature or facet that can be used to measure the progress of disease or the effects of treatment (NIH, 2002)
Biomarker examples • Small molecules • Glucose (diabetes) • Serum cholesterol (cardiovascular disease) • Proteins • PSA (prostate cancer) • HER2 (IHC) (breast cancer, Herceptin therapy) • hCG (pregnancy test) • RNA/DNA • HER2 (FISH) (breast cancer) • Oncotype DX (Genomic Health, breast cancer)
Pediatric Diseases • Kidney transplant acute rejection • Kawasaki disease • Systemic juvenile idiopathic arthritis • Necrotizing enterocolitis • Inflammatory bowel disease • Glioblastoma multiforme • Preterm labor
Where to look for biomarkers • Disease tissue • Proximal/distal fluids • Plasma/serum, urine, amniotic, synovial fluid, CSF, saliva, tears, etc.
Why Urine? • Patient consenting • Non-invasive • Easy to collect for time course analysis • Abundant and stable
Urine is a rich resource for biomarker discovery • Filtration of plasma • 900 liters daily • Urine proteome • > 1500 proteins, ~30 mg/day • 30% from circulation • 70% from urogenital tract • Urine peptidome • > 100,000 naturally occurring peptides, ~20 mg/day
Urine Peptidome: a fertile ground for biomarker discovery • Roughly equal masses of protein and peptide in urine translate into at least a ten-fold greater molar abundance of peptides than proteins, because peptides are much smaller • Urine peptide analysis is not hampered by the masking effect of highly abundant proteins • A one-hour, one-dimensional HPLC separation is sufficient for the analysis of more than 100,000 urine peptides, enabling high-throughput biomarker discovery
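The mass-to-mole argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the daily masses (~20 mg peptide, ~30 mg protein) come from the slides, but the average molecular weights (2 kDa for peptides, 30 kDa for proteins) are assumptions chosen for illustration and do not appear in the original.

```python
# Assumed average molecular weights in daltons (illustrative, not from the slides).
ASSUMED_PEPTIDE_MW_DA = 2_000.0
ASSUMED_PROTEIN_MW_DA = 30_000.0


def molar_ratio(peptide_mass_mg, protein_mass_mg,
                peptide_mw=ASSUMED_PEPTIDE_MW_DA,
                protein_mw=ASSUMED_PROTEIN_MW_DA):
    """Return moles of peptide per mole of protein for the given masses."""
    peptide_moles = peptide_mass_mg / peptide_mw   # units cancel in the ratio
    protein_moles = protein_mass_mg / protein_mw
    return peptide_moles / protein_moles


# Daily excretion figures quoted in the slides: ~20 mg peptide, ~30 mg protein.
ratio = molar_ratio(peptide_mass_mg=20.0, protein_mass_mg=30.0)
print(f"peptide:protein molar ratio is about {ratio:.1f}")
```

Under these assumed molecular weights the ratio comes out near ten; a smaller average peptide or a larger average protein would push it higher.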
Challenges of Urine Analysis • Dilution factor causing concentration variations • Solution: content normalization • Creatinine; housekeeping (abundant) urine peptides; equal peptide mass • Peptide content can be complicated by • Diet, exercise, circadian rhythm, circulatory levels of hormones • Solution: careful experimental design to avoid these confounding issues, e.g., • Cohorts of patients of similar demographics • Multi-center sample collection and validation
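As a minimal sketch of the creatinine option for content normalization, the function below divides each peptide signal by the sample's creatinine concentration. The function name, the dictionary layout, and the numbers are hypothetical; the slides do not describe how normalization was actually implemented.

```python
def normalize_to_creatinine(peak_intensities, creatinine_mg_dl):
    """Divide every peptide signal by the sample's creatinine concentration.

    peak_intensities: dict mapping peptide m/z -> raw signal (hypothetical layout).
    creatinine_mg_dl: urine creatinine for the same sample.
    """
    if creatinine_mg_dl <= 0:
        raise ValueError("creatinine concentration must be positive")
    return {mz: signal / creatinine_mg_dl
            for mz, signal in peak_intensities.items()}


# Two made-up samples with the same underlying excretion but a 4x dilution difference.
concentrated = normalize_to_creatinine({982.59: 400.0}, creatinine_mg_dl=100.0)
dilute = normalize_to_creatinine({982.59: 100.0}, creatinine_mg_dl=25.0)
print(concentrated, dilute)
```

After normalization both samples report the same value for the peptide, which is the point of correcting for dilution before comparing cohorts.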
Biomarker HTS Flow • Sample peptides (Class 1: 1, 2, 3…; Class 2: 1, 2, 3…; Class 3: 1, 2, 3…) • RP-HPLC • Collect 120 fractions on MALDI plates • MALDI-TOF MS on each fraction • MASS-Conductor® machine learning feature discovery and classification • Candidate biomarkers: 987.62, 1027.51, 1098.55, etc.
Biomarker Confirmation/Validation [Diagram: stages labeled Exploration, Testing, and Validation; elements shown: identify differentiating markers; protein ID by MS/MS; quantitative MS; higher throughput quantitative methods; immunoassay; new sample sets; new center sample sets; new longitudinal sample sets]
Data Challenges in Urine Peptide Biomarker Discovery • Data tracking and storage • Patient demographics • Peptide profiles in various fractions/samples • Dimension reduction and data reduction • Multi-dimensional data sets • Huge data sets and lots of noise • A project of 40 samples produced 241.5 GB of raw data in a MySQL database • Data dimensions: peptide mass, patient ID, HPLC fraction, patient demographics, peptide signal
Decode the Urine Peptidome • Peak finding in each fraction for each sample • Align the peaks across the samples • Create common peak index
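The three steps above (peak finding, alignment, common peak index) can be sketched as follows. This is a deliberately simple illustration, not the MASS-Conductor algorithm: the local-maximum peak picker, the greedy m/z clustering, the tolerance value, and all names and numbers are assumptions.

```python
def find_peaks(mz, intensity, min_intensity):
    """Peak finding: keep local maxima whose signal is at least min_intensity."""
    peaks = []
    for i in range(1, len(mz) - 1):
        if (intensity[i] >= min_intensity
                and intensity[i] > intensity[i - 1]
                and intensity[i] >= intensity[i + 1]):
            peaks.append((mz[i], intensity[i]))
    return peaks


def build_peak_index(sample_peaks, tol):
    """Common peak index: greedily chain sorted m/z values closer than tol."""
    all_mz = sorted(m for peaks in sample_peaks.values() for m, _ in peaks)
    clusters = []
    for m in all_mz:
        if clusters and m - clusters[-1][-1] <= tol:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return [sum(c) / len(c) for c in clusters]  # one centroid per cluster


def align(sample_peaks, index, tol):
    """Alignment: map each sample's peaks onto the index (0.0 where absent)."""
    table = {}
    for sample_id, peaks in sample_peaks.items():
        row = [0.0] * len(index)
        for m, signal in peaks:
            j = min(range(len(index)), key=lambda k: abs(index[k] - m))
            if abs(index[j] - m) <= tol:
                row[j] = max(row[j], signal)
        table[sample_id] = row
    return table


# Made-up spectrum for sample A; sample B's peaks are given directly.
peaks_a = find_peaks([982.0, 982.3, 982.6, 982.9, 983.2],
                     [1.0, 5.0, 40.0, 6.0, 1.0], min_intensity=10.0)
sample_peaks = {"A": peaks_a + [(1027.50, 12.0)], "B": [(982.62, 35.0)]}
index = build_peak_index(sample_peaks, tol=0.1)
table = align(sample_peaks, index, tol=0.1)
print(index, table)
```

The result is a samples-by-peaks matrix, the form that later feature selection and classification steps typically expect. A production pipeline would also need baseline correction and per-fraction handling, which are omitted here.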
Data mining issues in Biomarker Discovery • Peak number >> sample number • False discovery in multiple hypothesis testing • Multi-class classification and validation • Discovery of biomarker signature
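To illustrate why false discovery matters when the peak number far exceeds the sample number, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is a standard FDR control method chosen for illustration; the slides do not say this particular procedure was used, and the p-values below are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha (BH step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if p_values[i] <= alpha * rank / m:
            largest_rank = rank
    return sorted(order[:largest_rank])


# Made-up p-values for six candidate peaks.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
naive_hits = [i for i, p in enumerate(p_values) if p < 0.05]
bh_hits = benjamini_hochberg(p_values, alpha=0.05)
print("uncorrected:", naive_hits, "BH-corrected:", bh_hits)
```

With these numbers the uncorrected threshold keeps four peaks while the corrected procedure keeps two, which shows how quickly apparent "markers" shrink once multiplicity is accounted for.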
MASS-Conductor® Platform Supports Urine Peptide Biomarker Discovery • Robust loading and tracking of high-volume proteomic data • Robust reduction of raw data sets, enabling efficient and accurate peak finding, alignment and indexing • Robust, automated high-throughput computing for computationally expensive algorithms • Integration of FDR analysis and multi-class classification algorithms to obtain statistically differentiating feature panels • Automatic generation of data reports with graphics
Kidney Transplant Rejection • Most effective treatment for end stage renal disease • 16,000 per year in US • Grafts monitored by biopsy • Unmet needs: • Less invasive and more frequent monitoring • Acute rejection vs. stable graft • Acute rejection vs. BK virus
Allograft Acute Rejection Urine Biomarker Discovery • 1. LCMS data reduction: peak finding, peak alignment, peak indexing • 2. Supervised data mining: feature selection, training, testing • 3. Unsupervised data mining: 2D clustering • 4. Quantitative LCMS: validation
Urine THP Peptide Biomarkers Fall into a Tight Cluster in C-Terminus [Figure: THP domain map from NH2 to COOH showing EGF-like Domain I, EGF-like Domain II, EGF-like Domain III, and the ZP-domain; residue positions labeled: 28, 64, 65, 107, 108, 149, 334, 585] 1. R.VLNLGPITR.K 2. G.SVIDQSRVLNLGPI.T 3. I.DQSRVLNLGPITR.K 4. R.SGSVIDQSRVLNLGPI.T 5. S.VIDQSRVLNLGPITR.K 6. R.SGSVIDQSRVLNLGPIT.R 7. G.SVIDQSRVLNLGPITR.K 8. R.SGSVIDQSRVLNLGPITR.K
ROC Analysis of THP Peptide Biomarkers Quantified by MRM [Figure: two ROC panels, AR versus BK and AR versus STA; axes: sensitivity versus 1 - specificity; sample: urine peptides; curves for THP 1680.98 VIDQSRVLNLGPITR and THP 1912.07 SGSVIDQSRVLNLGPITR; AUC values as labeled on the plots: 0.83, 0.92, 0.74, 0.83]
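For readers unfamiliar with the AUC figures quoted on this slide, the sketch below computes an ROC AUC as the fraction of (positive, negative) pairs ranked correctly, which is equivalent to the area under the ROC curve. The scores are invented for illustration and are not MRM measurements from the study; which direction a peptide moves in AR is also not assumed here.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for two groups of four samples each.
ar_scores = [0.9, 0.8, 0.6, 0.4]
sta_scores = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(ar_scores, sta_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, so values such as 0.83 or 0.92 indicate good but imperfect discrimination.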
AR Urine Biomarkers are Collagen and THP Peptides THP peptide biomarkers: • THP 982.59 VLNLGPITR • THP 1047.48 SGSVIDQSRV • THP 1211.66 DQSRVLNLGPI • THP 1225.69 SRVLNLGPITR • THP 1324.76 IDQSRVLNLGPI • THP 1423.83 VIDQSRVLNLGPI • THP 1468.82 DQSRVLNLGPITR • THP 1510.87 SVIDQSRVLNLGPI • THP 1567.91 GSVIDQSRVLNLGPI • THP 1581.91 IDQSRVLNLGPITR • THP 1654.91 SGSVIDQSRVLNLGPI • THP 1680.98 VIDQSRVLNLGPITR • THP 1755.96 SGSVIDQSRVLNLGPIT • THP 1768.01 SVIDQSRVLNLGPITR • THP 1912.07 SGSVIDQSRVLNLGPITR • THP 2040.16 SGSVIDQSRVLNLGPITRK Collagen peptide biomarkers: • COL1A1 1235.56 APGDRGEPGPPGP • COL1A1 1251.55 APGDRGEPGPPGP • COL1A1 1322.57 APGDRGEPGPPGPA • COL1A1 1316.59 DAGPVGPPGPPGPPG • COL1A1 1409.66 GPPGPPGPPGPPGPPS • COL1A1 2048.92 NGDDGEAGKPGRPGERGPPGP • COL1A1 2064.91 NGDDGEAGKPGRPGERGPPGP • COL1A1 2192.97 NGDDGEAGKPGRPGERGPPGPQ • COL1A1 2362.12 GKNGDDGEAGKPGRPGERGPPGPQ • COL1A1 2378.10 GKNGDDGEAGKPGRPGERGPPGPQ • COL1A1 2645.24 GPPGKNGDDGEAGKPGRPGERGPPGPQ • COL1A1 1709.79 PPGEAGKPGEQGVPGDLG • COL1A1 2031.95 PPGEAGKPGEQGVPGDLGAPGP • COL1A1 2221.97 ADGQPGAKGEPGDAGAKGDAGPPGP • COL1A1 2205.99 ADGQPGAKGEPGDAGAKGDAGPPGP • COL1A1 2277.01 ADGQPGAKGEPGDAGAKGDAGPPGPA • COL1A1 2293.01 ADGQPGAKGEPGDAGAKGDAGPPGPA • COL1A1 2617.15 GPPGADGQPGAKGEPGDAGAKGDAGPPGPA • COL1A1 2086.93 EGSPGRDGSPGAKGDRGETGPA • COL1A1 2157.96 AEGSPGRDGSPGAKGDRGETGPA • COL1A1 3014.41 ESGREGAPGAEGSPGRDGSPGAKGDRGETGPA • COL1A1 1266.58 SPGPDGKTGPPGPA • COL1A1 2129.99 DGKTGPPGPAGQDGRPGPPGPPG • COL1A1 2017.93 GRPGEVGPPGPPGPAGEKGSPG • COL1A2 2081.94 DGPPGRDGQPGHKGERGYPG • COL1A2 2195.99 NDGPPGRDGQPGHKGERGYPG • COL2A1 1861.85 SNGNPGPPGPPGPSGKDGPK • COL3A1 1738.76 NDGAPGKNGERGGPGGPGP • COL3A1 2008.93 DGESGRPGRPGERGLPGPPG • COL3A1 2079.92 DAGAPGAPGGKGDAGAPGERGPPG • COL3A1 2565.18 GAPGQNGEPGGKGERGAPGEKGEGGPPG • COL3A1 2743.24 KNGETGPQGPPGPTGPGGDKGDTGPPGPQG • COL4A1 1424.66 PGQQGNPGAQGLPGP • COL4A2 1126.51 GLPGLPGPKGFA • COL4A3 1161.52 GEPGPPGPPGNLG • COL4A4 1218.55 GLPGPPGPKGPRG • COL4A5 1144.52 GPPGPPGPLGPLG
• COL4A5 1269.53 PGLDGMKGDPGLP • COL4A5 1733.76 GIKGEKGNPGQPGLPGLP • COL4A6 1158.52 GLPGPPGPPGPPS • COL5A1 1748.82 KGPQGKPGLAGMPGANGPP • COL7A1 1690.80 PGLPGQVGETGKPGAPGR • COL9A1 1732.84 KRPDSGATGLPGRPGPPG • COL11A1 1441.64 GPPGPPGLPGPQGPKG • COL11A1 1828.84 DGPPGPPGERGPQGPQGPV • COL17A1 1368.62 LPGPPGPPGSFLSN • COL18A1 1142.51 GPPGPPGPPGPPS
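The numbers listed next to each sequence can be related to the sequences themselves. As a sketch, the code below computes the singly protonated monoisotopic mass, [M+H]+, of an unmodified peptide from standard monoisotopic residue masses. For the THP peptides checked here the result agrees with the listed value to within a few hundredths of a dalton, which suggests, though the slides do not state it, that the listed values are [M+H]+ masses. The code does not handle modified residues, so it should not be expected to reproduce every collagen entry.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # charge carrier for the singly protonated ion


def mh_plus(sequence):
    """Monoisotopic [M+H]+ of an unmodified peptide given as one-letter codes."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


for listed, seq in [(982.59, "VLNLGPITR"), (1912.07, "SGSVIDQSRVLNLGPITR")]:
    print(f"{seq}: computed {mh_plus(seq):.2f}, listed {listed:.2f}")
```

A check like this is a quick way to catch transcription errors in a peptide list before database searching or MRM assay design.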
Hypotheses of Molecular Mechanisms for AR Urine Biomarkers • Hypothesis 1: Gene expression alteration in AR • Hypothesis 2: Protease expression alteration in AR • Hypothesis 3: Protease inhibitor expression alteration in AR
Transcriptome Analysis of Allograft Biopsies Validation Analysis Confirmation Analysis Exploration Analysis Exploration data set6 (TGCG) Confirmation data set (Stanford) Validation data set (Stanford) 3 2 1 Quantitative RT-PCR (AR: BX, n=14) (STA: BX, n=10) (HC: BX, n=10) Affymetrix HU-133 (AR: BX, n=37) (HC: BX, n=23) Affymetrix HG-U95Av2 (AR: PBL, n=6; BX, n=7) (STA: PBL, n=9; BX, n=10) (NR: PBL, n=8; BX, n=5) (HC: PBL, n=8; BX, n=9) Confirmation Validation Expression analysis of peptide biomarkers’ corresponding precursor genes Expression analysis of metzincin superfamily genes Discovery mechanism biomarkers Expression analysis of protease inhibitor genes
Parental Protein Expression Analysis of Allograft Biopsies Contrasting Urine Peptide Biomarker Changes
Genome-wide Protease and Protease Inhibitor Expression Analysis of Allograft Biopsies Revealed Up Regulation of MMP7, SERPING1, TIMP1
Allograft Biopsies Expression Biomarkers Effectively Classified AR [Figure: signal intensity of COL1A2, COL3A1, MMP7, SERPING1, TIMP1, and UMOD in AR, HC, and STA biopsies; ROC curve of sensitivity versus 1 - specificity, mean AUC: 0.98]
Proposed Underlying Mechanisms for the AR Urine Peptide Biomarkers
Hypothesis: Collagen Breakdown and Deposition in AR (Integrated Analysis) • Biopsy gene expression (GSE 14328): increased TIMP1 (collagenase inhibitor) in AR; increased MMP7 in AR; increased collagen expression in AR • Decreased collagenase activity in AR tissue • Decreased collagen breakdown in AR • Urine peptidomics (urine peptide analysis by MS): decreased collagen peptides in AR urine • Renal biopsy: increased collagen deposition in AR • More graft fibrosis after an AR episode?
Unmet Medical Needs in Necrotizing Enterocolitis Necrotizing enterocolitis (NEC) is a medical condition primarily seen in premature infants, where portions of the bowel undergo necrosis (tissue death). Despite decades of research, the pathogenesis of NEC remains obscure, the diagnostic parameters are unclear, and both treatment and prevention strategies remain inadequate and dated. There is a real need for better molecular identification of NEC in order to help alter its onset and progression.
Clinical parameters do not adequately predict outcome in Necrotizing Enterocolitis
Clinical Parameters Based Model Stratifies Necrotizing Enterocolitis Patients [Figure: rate of NEC-S occurrence (% patients) versus NEC score for NEC-M and NEC-S patients, divided into Low, Intermediate, and High Risk Groups; group sizes as labeled on the plot: M: n = 16, S: n = 10; M: n = 26, S: n = 0; M: n = 2, S: n = 15]
NEC Urine Naturally Occurring Peptide Biomarker Discovery • 1. LCMS data reduction: peak finding, peak alignment, peak indexing • 2. Supervised data mining: feature selection, training, testing • 3. Unsupervised data mining: 2D clustering
Permutation based FDR analysis of the biomarker signature
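A permutation-based FDR estimate can be sketched as follows: count how many features pass a threshold with the true class labels, then repeat the count many times with shuffled labels to estimate how many would pass by chance. The statistic (absolute difference of class means), the threshold, and the synthetic data are all assumptions made for illustration; the slides do not specify the statistic or the number of permutations used.

```python
import random


def mean_diff(values, labels):
    """Absolute difference between the class-1 and class-0 means of one feature."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_fdr(features, labels, threshold, n_perm=200, seed=0):
    """Estimate FDR as (mean count passing under shuffled labels) / (observed count)."""
    rng = random.Random(seed)
    observed = sum(mean_diff(f, labels) >= threshold for f in features)
    if observed == 0:
        return 0.0, 0
    shuffled = list(labels)
    null_total = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association with class
        null_total += sum(mean_diff(f, shuffled) >= threshold for f in features)
    expected_false = null_total / n_perm
    return min(1.0, expected_false / observed), observed


# Synthetic data: 50 noise features plus 5 features shifted in class 1.
data_rng = random.Random(1)
labels = [1] * 10 + [0] * 10
features = [[data_rng.gauss(0.0, 1.0) for _ in labels] for _ in range(50)]
features += [[data_rng.gauss(5.0 if lab == 1 else 0.0, 1.0) for lab in labels]
             for _ in range(5)]
fdr, n_selected = permutation_fdr(features, labels, threshold=2.0)
print(f"{n_selected} features selected, estimated FDR {fdr:.2f}")
```

Because the truly different features are shuffled along with the noise, this simple estimate tends to be conservative; more refined schemes exist but are beyond what the slide describes.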
Proposed Ensemble Approach to Diagnose Necrotizing Enterocolitis Patients
Discovery set: n = 34 NEC patients; clinical diagnosis: M = 17, S = 17. Medical NEC scoring with the clinical model assigns NEC risk groups: Low n=7, Intermediate n=15, High n=9, N/A n=3. The urine peptide based classification is then applied within each risk group.
NEC risk group                         Low            Intermediate     High
Clinical diagnosis                     M      S       M       S        M      S
Patients                               7      0       9       6        0      9
NEC risk: diagnosed as M               7      0       4       3        0      1
NEC risk: diagnosed as S               0      0       5       3        0      8
Urine biomarkers: classified as M      7      0       8       1        0      0
Urine biomarkers: classified as S      0      0       1       5        0      9
Percent agreement with clinical dx     100 %  100 %   88.9 %  83.3 %   100 %  100 %
NEC diagnosis (per risk group)         100 %          86.1 %           100 %
P = 0.01
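The percent-agreement figures for the intermediate risk group can be reproduced from the counts on this slide. In the sketch below, the counts (8 of 9 clinically M patients classified as M, 5 of 6 clinically S patients classified as S) are read from the slide; the observation that 86.1 % matches the unweighted mean of the two class-level agreements is an inference, since the slide does not state how that figure was derived.

```python
def percent_agreement(n_agree, n_total):
    """Percentage of patients whose classification matches the clinical diagnosis."""
    return 100.0 * n_agree / n_total


# Intermediate risk group counts read from the slide.
agree_m = percent_agreement(n_agree=8, n_total=9)   # clinically M, classified as M
agree_s = percent_agreement(n_agree=5, n_total=6)   # clinically S, classified as S
mean_agreement = (agree_m + agree_s) / 2.0          # unweighted mean of the two classes

print(f"M: {agree_m:.1f} %, S: {agree_s:.1f} %, mean: {mean_agreement:.1f} %")
```

Averaging the two class-level rates weights M and S equally regardless of group size, which differs slightly from pooling all 15 patients.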
Proposed Underlying Mechanisms of Urine Naturally Occurring Peptide Biomarkers
Prediction of drug response in SJIA [Figure: panels A, B, and C; treatments shown: Enbrel and Anakinra; response labels: CR and PR]
Urine peptide biomarkers: the discovery process • Sample peptides (Class 1: 1, 2, 3…; Class 2: 1, 2, 3…; Class 3: 1, 2, 3…) • SCX/RP-HPLC • Collect 100 fractions on MALDI plates • MALDI-TOF MS for each sample (LC fraction, m/z, abundance) • MASS-Conductor® machine learning feature discovery and classification • Biomarker panels • MS/MS protein ID • Prospective validation with quantitative mass spec (MRM)