A biased look at Biomarkers
Definition
Biomarker: a substance used as an indicator of a biological state, such as:
• The existence of living organisms or of a biological process
• A particular disease state
Types of biomarker:
• Proteins
• Nucleic acids
• Metabolites: carbohydrates, lipids, small molecules
Detection of biomarker
Detection of a biomarker supports diagnosis. A marker can be detected through:
• Its own properties, e.g. enzymatic activity
• Antibodies: IHC, ELISA
Detection can be:
• Quantitative: a link between the quantity of the marker and disease
• Qualitative: a link between the presence of a marker and disease
Biomarker & Diagnosis
Ideal marker for diagnosis
• Should have great sensitivity, specificity, and accuracy in reflecting total disease burden.
• A tumor marker should also be prognostic of outcome and treatment.
Biomarker for screening
• The marker must be highly specific, minimizing false positives and false negatives.
• The marker must clearly reflect the different stages of the disease, including early stages.
• The marker must be easily detected without complicated medical procedures. Disease markers released into serum and urine are good targets for early screening.
• The screening method should be cost-effective.
Samples for biomarker detection
• Blood, urine, or other body fluid samples
• Tissue samples
Prostate Cancer marker: PSA
• PSA (prostate-specific antigen) is a protein normally made in the prostate gland, in the ductal cells that make some of the semen.
• PSA helps to keep the semen liquid.
• PSA, also known as kallikrein III, seminin, semenogelase, γ-seminoprotein and P-30 antigen, is a glycoprotein and a serine protease.
Prostate Cancer Diagnosis with PSA
• Cancer of the prostate does not cause any symptoms until it is locally advanced or metastatic.
• There is a correlation between elevated PSA and prostate cancer.
• Detection of PSA is a surrogate for early detection of prostate cancer.
• Large screening trials have shown that PSA nearly doubles the rate of detection when combined with other methods.
• Based on these data, PSA testing was approved by the US FDA for the screening and early detection of prostate cancer.
• PSA is also found in the cytoplasm of benign prostate cells.
"I never dreamed that my discovery four decades ago would lead to such a profit-driven public health disaster." - Richard Ablin (inventor of the PSA test)
PSA screening generates ~$1.7 billion annually in the U.S. alone.
Sensitivity = the ability of the test to detect the disease when it is present (true positive rate)
Specificity = the likelihood that your test will be normal if you are disease-free (true negative rate)
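As a rough illustration of the two definitions, both rates can be computed from the four outcome counts of a test. The counts below are hypothetical and are not taken from any study mentioned in these slides.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased people the test flags (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free people the test clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only.
tp, fn = 20, 80    # 100 people with the disease
tn, fp = 940, 60   # 1000 people without the disease

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```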
A brief aside about Statistics and Probability
- Statistics are the formalization of common sense
- Because they have to handle many different situations, they can be really complicated
- They should make you feel really good or really bad about your data
- People are inherently bad at statistics and probability
Case study:
- Rate of being HIV positive: 1:10000
- False positive rate of the HIV test: 1:1000
If I test positive, what is the chance that I am really HIV negative?
Case study, continued (rate of being HIV positive 1:10000, false positive rate 1:1000): given a positive test, what is the chance that I am HIV negative?
Choices: 0.0001, 0.001, 0.01, 0.1, 0.9, 0.99, 0.9999
Answer: among 10,000 people tested there will be 1 true positive and about 10 false positives, so my chance of being negative is 10/11 ≈ 0.9.
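The counting argument can be checked with a few lines of Python. This is a minimal sketch that, like the slide, assumes the test catches every real case; the function name and the population size of 10,000 are illustrative choices.

```python
def chance_negative_given_positive(prevalence: float,
                                   false_positive_rate: float,
                                   population: int = 10_000) -> float:
    """Share of positive results that come from disease-free people."""
    # Assumption (as on the slide): every real case tests positive.
    true_positives = prevalence * population
    false_positives = false_positive_rate * (population - true_positives)
    return false_positives / (false_positives + true_positives)


hiv = chance_negative_given_positive(prevalence=1 / 10_000,
                                     false_positive_rate=1 / 1_000)
print(f"chance of being negative after a positive test: {hiv:.3f}")
```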
How about the PSA test?
- Rate is 15:10000
- False positive rate is 60:1000
- For every 15 true positives, there will be 600 false positives!
- Chance of being negative: 600/615 ≈ 0.976
- Chance of being positive: 15/615 ≈ 0.024 (before the test the chance was 0.0015)
- Is this true?
How about the PSA test? (continued)
- The figures above assume that every true positive is detected. In fact the test will miss 80% of the true positives (sensitivity = 20%), so only 3 of the 15 true positives will be detected, against 600 false positives.
- Chance of being negative: 600/603 = 0.995
- Chance of being a true positive: 3/603 = 0.005
- Follow-up for a positive HIV test is another blood test. Follow-up for a positive PSA test is a tissue biopsy.
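The same reasoning, with sensitivity included, is Bayes' rule for the positive predictive value. The sketch below uses the rates quoted on the slide; the function name is an illustrative choice.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Probability of having the disease given a positive test (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# PSA figures from the slide: rate 15:10000, false positive rate 60:1000.
psa_perfect = positive_predictive_value(15 / 10_000, 1.00, 60 / 1_000)
psa_actual = positive_predictive_value(15 / 10_000, 0.20, 60 / 1_000)

print(f"if no cases were missed: {psa_perfect:.3f}")
print(f"with 20% sensitivity:    {psa_actual:.3f}")
```

With perfect sensitivity the chance of disease after a positive test is about 0.024; with 20% sensitivity it drops to about 0.005, matching the 3/603 figure.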
How good does a Biomarker have to be?
- By age 65 the rate of prostate cancer climbs to 8:1000 and the test performs much better.
- For every 8 true positives, there will be 60 false positives!
- Chance of being negative: 60/68 = 0.88
- Chance of being positive: 8/68 = 0.12 (before the test the chance was 0.008)
How good does a Biomarker have to be?
- Prostate cancer is one of the most frequent cancers (15:10000). Most cancers are much less frequent (1:10000 to 1:50000), so a biomarker for them would have to be much better than the PSA test.
- It is currently believed that a new biomarker would need sensitivity and specificity better than 95%.
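To get a feel for how demanding low prevalence is, one can turn the calculation around and ask what specificity is needed for a positive result to be right a given fraction of the time. This is a sketch only: the target of 50% correct positives is an assumed value chosen for illustration, not a figure from the slides.

```python
def required_specificity(prevalence: float,
                         sensitivity: float,
                         target_ppv: float) -> float:
    """Specificity needed so that a positive result is correct with probability target_ppv."""
    true_pos = sensitivity * prevalence
    # PPV = TP / (TP + FP)  =>  FP = TP * (1 - PPV) / PPV
    allowed_false_pos = true_pos * (1 - target_ppv) / target_ppv
    max_false_positive_rate = allowed_false_pos / (1 - prevalence)
    return 1 - max_false_positive_rate


# Prevalences quoted on the slide; target_ppv=0.5 is an illustrative assumption.
for prevalence in (15 / 10_000, 1 / 10_000, 1 / 50_000):
    spec = required_specificity(prevalence, sensitivity=0.95, target_ppv=0.5)
    print(f"prevalence {prevalence:.5f}: specificity needed {spec:.6f}")
```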
Early proteomics-based biomarker work was based on SELDI (surface-enhanced laser desorption/ionization).
• SELDI can detect 200-300 features in a sample.
• It has been used to look for biomarkers in everything from blood to tears.
Early Biomarker work has largely been discredited
- Biomarkers with similar masses kept being rediscovered
- When the proteins were identified, they were abundant serum proteins, and the different markers were from the same proteins
- Multi-center studies failed to validate the biomarkers in a "clinical" setting
- Realization that serum and other biofluids are incredibly complex
- Realization that serum and other biofluids are incredibly variable and "fragile"
- Some strong "biomarkers":
  - blood collection tube
  - number of freeze-thaw cycles
  - diet
• A typical biomarker discovery study will take 50 samples per condition.
• It typically takes 10 samples per condition to have a 90% chance of finding 2-fold differences.
• Validation will take 1000s of samples.
• Finally, the assay will have to be converted to something that can be done in a clinical lab.
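The sample-size statement can be explored with a standard normal-approximation power calculation for comparing two group means on a log2 scale. This is a sketch under stated assumptions: the slides do not give the measurement variability, so the standard deviation of 0.68 log2 units and the two-sided significance level of 0.05 are hypothetical values picked for illustration.

```python
import math
from statistics import NormalDist


def samples_per_condition(log2_fold_change: float,
                          log2_sd: float,
                          power: float = 0.90,
                          alpha: float = 0.05) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_power) * log2_sd / log2_fold_change) ** 2
    return math.ceil(n)


# A 2-fold difference is 1.0 on the log2 scale; log2_sd=0.68 is an assumed value.
n_needed = samples_per_condition(log2_fold_change=1.0, log2_sd=0.68)
print(n_needed)
```

Noisier measurements raise the requirement quickly: doubling the standard deviation roughly quadruples the number of samples.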
Conclusions
- Biomarker discovery is difficult
  - biofluids are complex
  - biofluids have a high dynamic range
  - biomarkers are usually low abundance
  - even taking "proximal" fluids typically does not help
  - there is a lot of person-to-person variability
- Most biomarkers will never become clinically relevant
  - statistical standards for diagnostic tools are very high
  - the more prevalent the disease, the "better" the biomarker will perform
- An MS-based biomarker assay is unlikely, due to the greater analytical performance of antibody-based methods.
- For a biomarker workflow to be meaningful it must be quantitative!
Quantitative Approaches
Stable isotope labeling methods
- add heavy isotopes to one sample so that chemically identical compounds are mass shifted
- labels can be added to the peptides/proteins using reactive groups
- labels can be added to the proteins in vivo using heavy amino acids
- can be multiplexed
Label-free methods
- extracted ion chromatograms
- spectral counting
ISOTOPE-CODED AFFINITY TAG (ICAT):
• Label protein samples with heavy and light reagent
• Reagent contains an affinity tag and heavy or light isotopes
The reagent has three parts:
• Chemically reactive group: forms a covalent bond to the protein or peptide
• Isotope-labeled linker: heavy or light, depending on which isotope is used
• Affinity tag: enables the protein or peptide bearing an ICAT to be isolated by affinity chromatography in a single step
Example of an ICAT Reagent
• Affinity tag: biotin; binds tightly to streptavidin-agarose resin
• Reactive group: thiol-reactive group that will bind to Cys
• Linker: the heavy version has deuteriums at *; the light version has hydrogens at *
How ICAT works
[Figure: ICAT workflow. Lyse and label two samples (light and heavy), mix, proteolysis (e.g. trypsin), affinity isolation on streptavidin beads, then quantification by MS of the light/heavy peak pair and identification by MS/MS (example peptide NH2-EACDPLR-COOH). Axes: relative intensity vs. m/z.]
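The quantification step can be sketched as finding light/heavy peak pairs separated by the label's mass shift and taking their intensity ratio. The 8 Da shift used below is an assumed value for illustration (the slides do not state the number of labeled positions), and the peak list and function name are hypothetical.

```python
def find_icat_pairs(peaks, label_mass_shift, charge=1,
                    labeled_cysteines=1, tolerance=0.02):
    """Return (light_mz, heavy_mz, heavy/light ratio) for each matching peak pair.

    peaks: list of (mz, intensity) tuples from one MS scan.
    """
    # The m/z gap shrinks with charge and grows with the number of labeled Cys.
    expected_gap = labeled_cysteines * label_mass_shift / charge
    pairs = []
    for light_mz, light_intensity in peaks:
        for heavy_mz, heavy_intensity in peaks:
            gap_matches = abs((heavy_mz - light_mz) - expected_gap) <= tolerance
            if gap_matches and light_intensity > 0:
                pairs.append((light_mz, heavy_mz, heavy_intensity / light_intensity))
    return pairs


# Hypothetical singly charged peaks; label_mass_shift=8.0 is an assumption.
peaks = [(550.0, 100.0), (558.0, 50.0), (570.0, 80.0), (590.0, 20.0)]
pairs = find_icat_pairs(peaks, label_mass_shift=8.0)
print(pairs)
```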
ICAT: Advantages vs. Disadvantages
Advantages
• Estimates relative protein levels between samples with a reasonable level of accuracy (within 10%)
• Can be used on complex mixtures of proteins
• Cys-specific label reduces sample complexity
• Can set up the mass spectrometer to fragment only those peaks with a certain ratio
Disadvantages
• Yield and non-specificity
• Slight chromatography differences
• Expensive
• Tag fragmentation
• Meaning of relative quantification information
• Proteins with no cysteine residues, or with cysteines not accessible to the ICAT reagent, are missed
iTRAQ™ Reagent Design
Isobaric tag (total mass = 145), made of:
• Reporter group: mass 114-117 (retains charge)
• Balance group: mass 31-28 (neutral loss)
• Peptide reactive group (PRG): amine-specific (NHS)
Properties:
• Gives strong signature ion in MS/MS
• Gives good b- and y-ion series
• Maintains charge state
• Maintains ionization efficiency of peptide
• Amine specific
• Balance changes in concert with reporter mass to maintain total mass of 145
• The balance group is lost as a neutral fragment in MS/MS
[Figure: tag structure showing the charged reporter, the balance (neutral loss), and the PRG, with the MS/MS fragmentation site marked.]
Reference: Ross, P.L., et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 2004 3: 1154-1169.
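The "isobaric" property is just arithmetic on the nominal masses given above: each reporter is paired with a balance group so that the two always add up to 145. A small check, using only the numbers from the slide:

```python
# Nominal masses from the slide: reporters 114-117, balance groups 31-28.
REPORTER_MASSES = (114, 115, 116, 117)
BALANCE_MASSES = (31, 30, 29, 28)

totals = []
for reporter, balance in zip(REPORTER_MASSES, BALANCE_MASSES):
    total = reporter + balance
    totals.append(total)
    print(f"reporter {reporter} + balance {balance} = {total}")
```

Because every tag has the same total mass, a peptide labeled in any of the four samples appears at a single m/z in the MS scan; the samples are only separated once the reporter is released in MS/MS.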
Isobaric Tagging - General Method (4-Plex)
• Samples S1-S4 are denatured and digested in parallel, labeled with reporter/balance pairs 114/31, 115/30, 116/29, and 117/28, then mixed.
• MS: reporter-balance-peptide is intact, so the 4 samples have identical m/z.
• MS/MS: peptide fragments (b- and y-ions) are equal; reporter ions (114, 115, 116, 117) are different.
[Figure: MS spectrum of the combined precursor (m/z 1352.84) and its MS/MS spectrum with the reporter-ion region enlarged. Axes: % intensity vs. mass (m/z).]
Spotfire K-means Clustering of Protein-level Ratios
[Figure: clustered protein-level ratios; columns labeled S, G1L, and PM, three of each.]
MS/MS Spectra of a Singly-charged Peptide *-TPHPALTEAK-*
[Figure: MS/MS spectrum with b- and y-ion series and enlarged insets, including the reporter-ion region with peaks at m/z 114.1, 115.1, 116.1, and 117.1. Axes: % intensity vs. mass (m/z).]
Reporter Group Placement: Selection of 'Quiet Region'
[Figure: summed ion intensity across ~75,000 spectra.]
Example: Time course labeling
Simplified workflow (one extra step):
• Samples: Control, Test 1, Test 2, Test 3
• Trypsin digestion
• Label with iTRAQ™ reagents 114, 115, 116, 117 (1 hr, RT, single addition)
• Mix
• SCX, then LC-MS/MS analysis: a single 2D LC analysis for the combined samples (4-plex)
• ID and quant from the MS/MS spectra
Differential Expression using iTRAQ™ Reagent Approach
Overexpression of Chaperonin 10, a non-cysteine-containing protein
[Figure: MS/MS spectrum of *VLQATVVAVGSGS*K (* = iTRAQ-labeled residue) showing b- and y-ions, with the quantitation region (reporters 114-117 m/z) comparing cancer and normal samples. Axes: intensity vs. m/z (amu).]
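Quantitation from reporter ions can be sketched as dividing each channel's intensity by a reference channel, then combining the peptides of a protein. The intensities below are made up, and both the choice of 114 as the reference and the use of the median to combine peptides are assumptions for illustration.

```python
from statistics import median


def reporter_ratios(intensities: dict, reference: int = 114) -> dict:
    """Ratio of every reporter channel to the reference channel for one spectrum."""
    ref = intensities[reference]
    return {channel: value / ref for channel, value in intensities.items()}


def protein_ratios(spectra: list, reference: int = 114) -> dict:
    """Median ratio per channel across all spectra assigned to one protein."""
    per_spectrum = [reporter_ratios(s, reference) for s in spectra]
    return {ch: median(r[ch] for r in per_spectrum) for ch in per_spectrum[0]}


# Hypothetical reporter intensities for three peptides of the same protein.
spectra = [
    {114: 1000.0, 115: 1100.0, 116: 2000.0, 117: 500.0},
    {114: 400.0, 115: 400.0, 116: 880.0, 117: 180.0},
    {114: 250.0, 115: 300.0, 116: 450.0, 117: 125.0},
]
ratios = protein_ratios(spectra)
print(ratios)
```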
iTRAQ: Advantages vs. Disadvantages
Advantages
• Estimates relative protein levels between samples with a reasonable level of accuracy (> 10%)
• Can be used on complex mixtures of proteins
• Isobaric, so the tag is only visible in the MS/MS, keeping the precursor scans as clean as possible.
• The abundances of the peptides from all samples sum together, making analysis of low-abundance peptides easier.
• Replicates are analyzed in the same LC-MS/MS run, minimizing run-to-run variability.
Disadvantages
• Reagent not completely specific
• Expensive
• Does not work on ion trap instruments
• Reporters tend to dominate the spectra
• You have to fragment everything and sort out the iTRAQ reporters later, so the mass spec spends a lot of time analyzing peptides with no quantitative differences.
SILAC (stable isotope labeling by amino acids in cell culture): Advantages vs. Disadvantages
Advantages
• Estimates relative protein levels between samples with a high level of accuracy (< 5%)
• Can be used on complex mixtures of proteins
• Can set up the mass spectrometer to fragment only those peaks with a certain ratio
• Extremely flexible and can be adapted to many systems.
Disadvantages
• Labeling may be incomplete
• Urea cycle may cause incorporation of heavy isotopes into other amino acids
• Expensive
• Works best on high-resolution instruments.
Label-Free Quantitation
• All approaches so far require purchase of isotopically labeled reagents (can be expensive).
• What if you want to compare large numbers of samples (10+)?
• What if you can't afford lots of reagents?
Options:
• Peak/spectral counting
• Peak area comparison (extracted ion chromatograms)
Spectral Counting
• Count the number of peptides identified from a protein in each sample.
• Typically do not count repeat identifications of the same peptide.
• Not accurate at quantifying the magnitude of a change, but can be used to determine whether there is a difference.
• In general, need a spectral count difference of about 4 peptides in order to be confident of a difference being real.
• Most proteins in complex mixtures are identified by fewer than 4 peptides.
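The counting rule above can be sketched in a few lines: count each peptide once per sample, then flag a protein only when the counts differ by about 4 or more. The peptide sequences and function names are hypothetical, and the threshold of 4 is the rule of thumb from the slide rather than a formal statistical test.

```python
def peptide_count(peptide_identifications) -> int:
    """Number of distinct peptides; repeat identifications are counted once."""
    return len(set(peptide_identifications))


def counts_differ(count_a: int, count_b: int, min_difference: int = 4) -> bool:
    """Rule of thumb from the slide: trust a difference of about 4 peptides or more."""
    return abs(count_a - count_b) >= min_difference


# Hypothetical identifications for one protein in two samples.
sample_a = ["LVNELTEFAK", "YLYEIAR", "AEFVEVTK", "QTALVELLK",
            "LVNELTEFAK", "HLVDEPQNLIK", "DAFLGSFLYEYSR"]
sample_b = ["YLYEIAR", "YLYEIAR"]

count_a = peptide_count(sample_a)
count_b = peptide_count(sample_b)
print(count_a, count_b, counts_differ(count_a, count_b))
```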
EIC (Extracted Ion Chromatogram), also written XIC
• Measure the intensity of a peak during its elution off the HPLC column and into the mass spectrometer.
• Measure the area of the peak in the XIC.
• More accurate than taking the peak intensity from one given scan.
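A minimal sketch of the idea: pull the signal for one m/z out of every scan, then integrate it over retention time. The scan data, the m/z tolerance of 0.02, and the use of the trapezoidal rule for the area are assumptions for illustration; real software also has to find the peak boundaries and subtract background.

```python
def extract_xic(scans, target_mz, tolerance=0.02):
    """Build a chromatogram for one m/z.

    scans: list of (retention_time, [(mz, intensity), ...]) in elution order.
    """
    times, intensities = [], []
    for retention_time, peaks in scans:
        # Sum the signal of all peaks within the m/z window in this scan.
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        times.append(retention_time)
        intensities.append(signal)
    return times, intensities


def peak_area(times, intensities):
    """Trapezoidal area under the chromatogram."""
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += 0.5 * (intensities[k] + intensities[k - 1]) * width
    return area


# Hypothetical scans (retention time in minutes).
scans = [
    (10.0, [(521.70, 0.0), (600.20, 30.0)]),
    (10.1, [(521.70, 50.0)]),
    (10.2, [(521.71, 100.0), (700.00, 5.0)]),
    (10.3, [(521.70, 50.0)]),
    (10.4, [(521.69, 0.0)]),
]
times, intensities = extract_xic(scans, target_mz=521.70)
area = peak_area(times, intensities)
print(intensities, round(area, 3))
```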
emPAI (Exponentially Modified Protein Abundance Index)
• emPAI = 10^{PAI} - 1
• where PAI = N_observed / N_observable
• What is an 'observable' peptide? Peptides with a precursor mass between 800 and 2400 Da.
• There is a roughly linear relationship between log protein concentration and the ratio of observed to 'observable' peptides, in the range of 3-500 fmol.
• If you know how much total protein you analyzed, you can derive absolute abundances.
Ishihama et al. Mol Cell Proteomics (2005) 4(9): 1265-1272
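The formula is simple enough to compute directly. In the sketch below the peptide counts are hypothetical, and the conversion to a molar percentage (each protein's emPAI divided by the sum over all proteins) is how the index is commonly normalized; treat it as an assumption about usage rather than something stated on this slide.

```python
def empai(n_observed: int, n_observable: int) -> float:
    """emPAI = 10^PAI - 1, with PAI = observed / observable peptides."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


def molar_percentages(empai_by_protein: dict) -> dict:
    """Each protein's share of the summed emPAI, as a percentage (assumed normalization)."""
    total = sum(empai_by_protein.values())
    return {name: 100 * value / total for name, value in empai_by_protein.items()}


# Hypothetical (observed, observable) peptide counts for three proteins.
counts = {"protein_A": (10, 10), "protein_B": (5, 10), "protein_C": (2, 20)}
indices = {name: empai(obs, possible) for name, (obs, possible) in counts.items()}
shares = molar_percentages(indices)
for name in counts:
    print(f"{name}: emPAI = {indices[name]:.3f}, share = {shares[name]:.1f}%")
```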
MRM (Multiple Reaction Monitoring)
Look for a component of a specific mass that, when fragmented, forms a fragment of another specific mass.
• Transition: precursor m/z 521.7 → fragment m/z 757.6
• Very sensitive and specific.
MRM
• Best performed on a triple quadrupole instrument.
• Scans are very fast, so multiple transition scans can be performed on a chromatographic time-scale.
• Requires a lot of optimization:
  - Verify that transitions are reproducible; typically want 2-3 transitions/peptide and 3-4 peptides/protein.
  - Determine the retention time to maximize the number of peptides that can be analyzed per run.
• It is possible to analyze 100s of transitions per hour.
• MRM coupled to isotopically labeled peptides allows for very high sensitivity and high accuracy analysis and can give absolute quantification.
• Once optimized, 1000s of samples can be run in a short time frame.
• Not for discovery! You must already know what you are looking for; sometimes referred to as targeted proteomics.
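Absolute quantification with a labeled standard can be sketched as follows: a known amount of the heavy peptide is spiked in, and the endogenous (light) amount is the light/heavy peak-area ratio times that known amount. The peak areas, the 50 fmol spike, and averaging across transitions are all illustrative assumptions.

```python
from statistics import mean


def amount_from_transition(light_area: float, heavy_area: float,
                           heavy_spiked_fmol: float) -> float:
    """Endogenous amount implied by one transition's light/heavy area ratio."""
    return light_area / heavy_area * heavy_spiked_fmol


def peptide_amount(transitions, heavy_spiked_fmol: float) -> float:
    """Average the estimates from the 2-3 transitions monitored for a peptide."""
    estimates = [amount_from_transition(light, heavy, heavy_spiked_fmol)
                 for light, heavy in transitions]
    return mean(estimates)


# Hypothetical (light_area, heavy_area) pairs for three transitions of one peptide.
transitions = [(2000.0, 1000.0), (4100.0, 2000.0), (1900.0, 1000.0)]
amount = peptide_amount(transitions, heavy_spiked_fmol=50.0)
print(f"estimated endogenous peptide: {amount:.1f} fmol")
```

Transitions whose individual estimates disagree strongly are worth inspecting, since interference in one transition is a common reason for a poor result.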
Issues with MS Quantitation Analysis
• Should you use all data for quantitation?
• Minimum peak intensity?
  - Peaks near the noise level will have much higher variability in quantitation accuracy.
  - Very intense peaks may be saturated.
• Proteins identified by a single peptide are probably not accurately quantified.
• It is best to ignore sequences with more than one form: PTMs, missed cleavages, etc.
• Multiple charge states should be summed.
Results are normally reported with a mean and standard deviation.
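These filtering rules can be sketched as a small function that drops unreliable peptide measurements and then reports a mean and standard deviation. The intensity limits, the field names, and the example values are hypothetical; appropriate cut-offs depend on the instrument.

```python
from statistics import mean, stdev


def summarize_protein(peptides, min_intensity, max_intensity):
    """Mean and standard deviation of peptide ratios after basic filtering.

    peptides: list of dicts with 'ratio', 'intensity', and 'multiple_forms' keys.
    Returns None when fewer than two usable peptides remain.
    """
    usable = [
        p["ratio"] for p in peptides
        if min_intensity <= p["intensity"] <= max_intensity  # not noisy, not saturated
        and not p["multiple_forms"]                          # skip PTMs, missed cleavages
    ]
    if len(usable) < 2:
        return None  # single-peptide proteins are probably not accurately quantified
    return mean(usable), stdev(usable)


# Hypothetical peptide measurements for one protein.
peptides = [
    {"ratio": 2.0, "intensity": 5e4, "multiple_forms": False},
    {"ratio": 2.2, "intensity": 8e4, "multiple_forms": False},
    {"ratio": 1.8, "intensity": 3e4, "multiple_forms": False},
    {"ratio": 5.0, "intensity": 2e2, "multiple_forms": False},  # near the noise
    {"ratio": 0.7, "intensity": 6e4, "multiple_forms": True},   # modified form
]
summary = summarize_protein(peptides, min_intensity=1e3, max_intensity=1e6)
print(summary)
```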
Conclusions
• There are many different ways to quantitate proteomics data.
• Quantitative studies need to be approached carefully, because it is easy to make mistakes.
• No one strategy is best.
• MRM is the most sensitive and accurate, but requires the most optimization and cannot be used for discovery.