CS177 Lecture 11: Experimental Methods (PCR, X-ray crystallography, Microarrays). Tom Madej, 11.22.04
Lecture overview • Polymerase chain reaction (PCR) and its applications. • X-ray crystallography and the Protein Data Bank (PDB). • Microarrays and applications.
Polymerase Chain Reaction (PCR) • A method that allows us to generate a relatively large amount of a particular DNA sequence, even from an extremely small sample. • Exquisitely sensitive; even the DNA from a single cell may suffice! • Numerous applications in biotechnology.
PCR: main ideas • You need to know what you are looking for, e.g. the DNA sequence for a particular gene (the target). • Sample, primers, nucleotides to build new DNA strands, and Taq polymerase are mixed together. • The mixture is subjected to repeated cycles of heating, cooling, and reheating, each cycle taking on the order of a few minutes. • If the target is present in the initial sample, the amount of it in the mixture will grow exponentially with the number of cycles.
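As a rough illustration of the exponential growth described above, the sketch below assumes the target doubles in every cycle. The per-cycle `efficiency` parameter is a hypothetical knob, not something from the lecture, and the model ignores the plateau that real reactions reach as reagents run out.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate the number of target copies after a number of PCR cycles.

    Assumption: every cycle multiplies the target by (1 + efficiency),
    so efficiency=1.0 means ideal doubling. Plateau effects are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Starting from a single molecule, as in the single-cell case.
    for n in (10, 20, 30):
        print(n, "cycles:", f"{pcr_copies(1, n):.3g}", "copies")
```

Under ideal doubling, 30 cycles turn one molecule into about a billion copies, which is why such small samples can suffice.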
[Figure: double-stranded DNA (ds-DNA) target with primers.] The primers are complementary to opposite ends of the target sequence.
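To make the "opposite ends" idea concrete, here is a minimal sketch that derives a primer pair from a known target, written 5' to 3' on one strand. The function names and the default primer length are illustrative assumptions; real primer design also weighs melting temperature, GC content, and self-complementarity, none of which is modeled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (read 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primers(target, length=18):
    """Return (forward, reverse) primers for a target given 5' to 3'.

    The forward primer copies the start of the given strand, so it anneals
    to the opposite strand. The reverse primer is the reverse complement of
    the end of the given strand, so it anneals to the given strand itself.
    """
    target = target.upper()
    if len(target) < 2 * length:
        raise ValueError("target is too short for two primers of this length")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Toy target and a very short primer length, for illustration only.
    print(design_primers("ATGCCCGGGTTTAC", length=4))
```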
PCR cycle • Mixture is heated to about 90-95°C for 1-2 minutes to separate the DNA strands (denature). • Temperature is dropped to 50°-60°C so that primers can anneal to complementary regions. • Temperature is raised to about 70°C for 1-2 minutes to allow Taq polymerase to synthesize new DNA strands, starting at the primers; synthesis proceeds from 5’ to 3’ on both strands. • Note: Taq polymerase is a DNA polymerase from Thermus aquaticus, a bacterium that lives in hot springs.
PCR notes • Primer selection is critical. The primers should be at least 15-20 bases long to ensure specificity. • If you are unsure of the exact sequence, you can use “degenerate” primers, i.e. a mixture of primers (typically varying at the third codon position). • Note that after a number of cycles almost all of the product is exactly the target sequence you want, i.e. with flush ends.
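A degenerate primer can be thought of as shorthand for a set of concrete primers. The sketch below expands such a primer, assuming it is written with the standard IUPAC nucleotide ambiguity codes (for example N for any base, R for A or G); that notation is an assumption here, since the lecture does not specify one.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete primer represented by a degenerate primer.

    Raises KeyError if the primer contains a character that is not an
    IUPAC nucleotide code.
    """
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Lysine (AAR) followed by glycine (GGN): only third positions vary.
    mixture = expand_degenerate("AARGGN")
    print(len(mixture), "primers:", mixture)
```

The size of the mixture is the product of the number of choices at each position, so degeneracy grows quickly with primer length.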
PCR applications • Making a lot of protein! Use RT-PCR, “reverse transcriptase” PCR, to create DNA with introns removed and then insert it into bacteria to clone the gene. E.g. to make proteins for X-ray crystallography. • Medical diagnosis: e.g. detect HIV nucleic acid (viral RNA or proviral DNA) long before AIDS symptoms arise; or rapid tuberculosis test. • Forensics: detect trace amounts of DNA at a crime scene.
Methods to determine protein structures • X-ray crystallography (most important; over 80% of structures in the PDB are obtained this way). • NMR spectroscopy (Nuclear Magnetic Resonance). • Electron microscopy; uses a beam of electrons to create images (sample preparation and resolution can be limiting when it is applied to protein structure determination).
Protein crystallography steps • Grow crystals of the protein that diffract well (a difficult step, can take from weeks to years!). • Obtain the X-ray diffraction data. • Compute electron density maps. • Refinement: calculate an atomic model to fit electron density; compare the diffraction data computed from the model with the actual data; refine the model to fit the data (iterate).
Protein crystals http://www-structure.llnl.gov/crystal_lab/Crys_lab.html
Protein crystal [Figure labels: molecule, crystal.] The unit cell is the basic unit of symmetry in the crystal.
Facts about protein crystals • In contrast to e.g. salt or quartz crystals, protein crystals are mostly water (due to the irregular shape of the molecule) and therefore fragile. • Since they are mostly water, the protein structures obtained are expected to be similar to their conformations in vivo. • To preserve the crystal in the X-ray beam, it is kept at a very low temperature (100 K).
X-ray diffraction • The incident beam of X-rays is diffracted by the electrons in the protein molecules in the crystal. • Some of the diffracted waves will interfere constructively, and others will interfere destructively. • This results in a diffraction pattern of spots of varying intensity on the detector.
Illustration of diffraction http://www.eserc.stonybrook.edu/ProjectJava/Bragg/index.html
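The condition for constructive interference is commonly summarized by Bragg's law, n·λ = 2·d·sin(θ), which the slides do not state explicitly. The sketch below solves it for the angle θ; the function name and the example values are illustrative assumptions.

```python
import math


def bragg_angle_deg(wavelength, spacing, order=1):
    """Return the Bragg angle theta in degrees, or None if no solution.

    Solves order * wavelength = 2 * spacing * sin(theta). Wavelength and
    plane spacing must use the same length unit (e.g. angstroms).
    """
    if wavelength <= 0 or spacing <= 0 or order < 1:
        raise ValueError("wavelength and spacing must be positive, order >= 1")
    sin_theta = order * wavelength / (2.0 * spacing)
    if sin_theta > 1.0:
        return None  # this order of reflection cannot occur
    return math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # Hypothetical values: 1.0 angstrom X-rays and planes 1.0 angstrom apart.
    print(bragg_angle_deg(1.0, 1.0))
```

Closely spaced planes diffract to larger angles, so the spots far from the center of the detector carry the fine detail.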
Analysis of the diffraction pattern • The diffraction pattern is analyzed by mathematical/computational methods (Fourier analysis) to produce an electron density map. • This gives a 3-dimensional image of the molecule that will be subjected to further processing and analysis.
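The Fourier step can be illustrated with a one-dimensional toy "crystal". The sketch below computes structure factors from a few point atoms and then sums them back into a density profile. It assumes both amplitudes and phases are known, whereas a real experiment records only spot intensities; all names and values are hypothetical.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Structure factors F(h) for point atoms in a 1D unit cell.

    atoms: list of (fractional position, scattering weight) pairs.
    """
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(factors, x):
    """Fourier synthesis: density at fractional coordinate x."""
    total = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
    return total.real


def density_map(factors, n_points=100):
    """Sample the density on an evenly spaced grid across the unit cell."""
    return [density(factors, i / n_points) for i in range(n_points)]


if __name__ == "__main__":
    # Two toy atoms; the heavier one sits at x = 0.7.
    atoms = [(0.2, 6.0), (0.7, 8.0)]
    grid = density_map(structure_factors(atoms, h_max=10))
    print("highest density at x =", grid.index(max(grid)) / len(grid))
```

Raising `h_max` includes more reflections and sharpens the peaks, which loosely mirrors how higher-resolution data give a more detailed map.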
Electron density maps at different resolutions http://www-structure.llnl.gov/Xray/101index.html
Refinement • Refinement is an iterative process; one constructs an atomic model based on the electron density, then computes diffraction data from the model, which is compared to the actual diffraction data. • The crystallographic R-factor is a measure of how well the model fits the diffraction data. • Model building can be subject to error! For example, the electron density for certain pairs of amino acid residues is extremely similar.
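The R-factor is conventionally defined as the summed absolute difference between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The slides do not give the formula, so the sketch below follows that conventional definition and omits the scale factor that is normally applied to the calculated amplitudes.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor.

    R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|), taken over reflections.
    Assumption: the two lists are already on the same scale and in the
    same reflection order.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical amplitudes for three reflections.
    print(round(r_factor([10, 20, 30], [12, 18, 33]), 3))
```

A perfect fit gives R = 0, and refinement aims to drive the value down while keeping the model chemically sensible.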
X-ray crystallography summary http://www.bnl.gov/discover/Spring_04/crystallography.asp
NMR • Based on magnetic moments of atomic nuclei. • NMR spectra give information about distances between atoms in the molecule. • Applied to protein molecules in solution (no crystals needed!). • Only works well for smaller proteins, e.g. roughly 100 residues or fewer. • A different set of mathematical/computational tools is involved. • Note: The different “models” represent different structures compatible with the distance constraints, not actual conformations of the molecule.
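One way to picture "compatible with the distance constraints" is to check a candidate model against a list of upper distance bounds. The sketch below does only that; the atom names, coordinates, and limits are made up, and real NMR structure calculation involves far more than this check.

```python
import math


def constraint_violations(coords, constraints, tolerance=0.0):
    """Return the constraints that a model violates.

    coords: dict mapping atom name -> (x, y, z) in angstroms.
    constraints: list of (atom_a, atom_b, max_distance) upper bounds.
    Each violation is reported as (atom_a, atom_b, actual_distance).
    """
    violations = []
    for atom_a, atom_b, limit in constraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > limit + tolerance:
            violations.append((atom_a, atom_b, distance))
    return violations


if __name__ == "__main__":
    # Hypothetical three-atom model and two upper-bound constraints.
    model = {"H1": (0, 0, 0), "H2": (3, 4, 0), "H3": (0, 0, 2)}
    bounds = [("H1", "H2", 6.0), ("H1", "H3", 1.5)]
    print(constraint_violations(model, bounds))
```

Many different models can pass such a check, which is why an NMR entry usually reports an ensemble rather than a single structure.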
PDB File: Header
HEADER    ISOMERASE/DNA                           01-MAR-00   1EJ9
TITLE     CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: DNA TOPOISOMERASE I;
COMPND   3 CHAIN: A;
COMPND   4 FRAGMENT: C-TERMINAL DOMAIN, RESIDUES 203-765;
COMPND   5 EC: 5.99.1.2;
COMPND   6 ENGINEERED: YES;
COMPND   7 MUTATION: YES;
COMPND   8 MOL_ID: 2;
COMPND   9 MOLECULE: DNA (5'-
COMPND  10 D(*C*AP*AP*AP*AP*AP*GP*AP*CP*TP*CP*AP*GP*AP*AP*AP*AP*AP*TP*
COMPND  11 TP*TP*TP*T)-3');
COMPND  12 CHAIN: C;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: DNA (5'-
COMPND  16 D(*C*AP*AP*AP*AP*AP*TP*TP*TP*TP*TP*CP*TP*GP*AP*GP*TP*CP*TP*
COMPND  17 TP*TP*TP*T)-3');
COMPND  18 CHAIN: D;
COMPND  19 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 EXPRESSION_SYSTEM_COMMON: BACULOVIRUS EXPRESSION SYSTEM;
SOURCE   4 EXPRESSION_SYSTEM_CELL: SF9 INSECT CELLS;
SOURCE   5 MOL_ID: 2;
SOURCE   6 SYNTHETIC: YES;
SOURCE   7 MOL_ID: 3;
SOURCE   8 SYNTHETIC: YES
KEYWDS    PROTEIN-DNA COMPLEX, TYPE I TOPOISOMERASE, HUMAN
REMARK   1
REMARK   2
REMARK   2 RESOLUTION. 2.60 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : X-PLOR 3.1
REMARK   3   AUTHORS     : BRUNGER
…
REMARK 280
REMARK 280 CRYSTALLIZATION CONDITIONS: 27% PEG 400, 145 MM MGCL2, 20
REMARK 280 MM MES PH 6.8, 5 MM TRIS PH 8.0, 30 MM DTT
REMARK 290
...
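Header records like these are straightforward to read by program. The sketch below pulls the ID code, deposition date, classification, title, and resolution out of three lines taken from the 1EJ9 header shown above. It splits on whitespace rather than relying on the fixed column positions of the PDB format, which is a simplifying assumption, and the function name is hypothetical.

```python
import re

# Three records from the 1EJ9 header; exact column spacing is approximate.
SAMPLE = """\
HEADER    ISOMERASE/DNA                           01-MAR-00   1EJ9
TITLE     CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX
REMARK   2 RESOLUTION. 2.60 ANGSTROMS.
"""


def parse_header(text):
    """Extract a few fields from PDB header records."""
    info = {}
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "HEADER":
            fields = line[6:].split()
            if len(fields) >= 2:
                # Assumes the last two fields are the date and the ID code.
                info["classification"] = " ".join(fields[:-2])
                info["date"], info["id"] = fields[-2], fields[-1]
        elif record == "TITLE":
            # Skip the record name and continuation field (first 10 chars).
            piece = line[10:].strip()
            info["title"] = (info.get("title", "") + " " + piece).strip()
        elif record == "REMARK":
            match = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if match:
                info["resolution"] = float(match.group(1))
    return info


if __name__ == "__main__":
    print(parse_header(SAMPLE))
```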
From Coordinates to Models 1EJ9: Human topoisomerase I
Annotating Secondary Structure 1EJ9: Human topoisomerase I [Figure legend: α-helices, β-strands, coils/loops.]
Creating 3D Domains 3D Domain 0: 1EJ9A0 = entire polypeptide
Creating 3D Domains [Figure: 3D domains 1EJ9A1, 1EJ9A2, 1EJ9A3, 1EJ9A4, 1EJ9A5; one label reads "< 3 Secondary Structure Elements".]
Microarrays • Used to study gene expression levels in cells. • Cells can differ dramatically in the amounts of various proteins that they synthesize; e.g. due to different cell types or different external/internal conditions. • In fact, in higher level organisms only a fraction of the genes in a cell are expressed at a given time, and that subset depends on the cell type. • Via microarrays it is possible to study the expression levels of tens of thousands of genes simultaneously.
Microarray technology • Physically, a microarray is just a glass slide with spots of DNA on it; each spot is a probe (or target). • The DNA is single-stranded cDNA (complementary DNA) and may consist of an entire gene or part of one (an oligonucleotide consisting of 50 bases or so). • If the microarray is exposed to a solution containing mRNA, then the mRNA molecules will bind to those probes to which they are complementary.
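The binding rule in the last bullet can be sketched as a string check: an mRNA binds a DNA probe if it contains the probe's reverse complement, written with U in place of T. The sketch assumes exact, full-length matching, which is stricter than real hybridization, and the sequences are made up.

```python
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complementary_rna(probe_dna):
    """RNA sequence (5' to 3') that pairs with a DNA probe given 5' to 3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(probe_dna.upper()))


def hybridizes(probe_dna, mrna):
    """True if the mRNA contains an exact complement of the whole probe."""
    return complementary_rna(probe_dna) in mrna.upper()


if __name__ == "__main__":
    # Hypothetical six-base probe and two toy transcripts.
    print(hybridizes("TACGGA", "AAUCCGUAGG"))
    print(hybridizes("TACGGA", "AAAAAAAA"))
```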
Microarray probes [Figure: ssDNA probes, either gene sequences or oligos.]
Microarray technology • Thousands of probes can fit on a single slide. • The slides can be spotted by robots. • Of course, what genes you can study with a given microarray depends on the collection of probes on it. • There are a number of commercial manufacturers; e.g. Affymetrix, Agilent, Amersham. • They’re expensive!
Microarray experiments • Start with two cell types, e.g. “healthy” and “diseased”. • Isolate mRNA from each cell type, generate cDNA with fluorescent dyes attached, e.g. green for healthy and red for diseased. • Mix the cDNA samples and incubate with the microarray. • After incubation the cDNA in the samples has had a chance to bind (hybridize) with the probes on the chip. • The chip is read by a scanner that uses lasers to excite the fluorescent tags; the intensity levels of the dyes are recorded for each probe gene and stored in a computer.
Microarray data representation • There is a “standard” color scale representation, as follows. • Red means the gene produced more mRNA in the experimental condition; green means the gene produced more mRNA in the control. • Black means equal amounts of mRNA for both experiment and control. • If e.g. there were 5 times as much mRNA for the experimental condition compared to the control, we would say there was a 5-fold induction; 1/5 as much would be a 5-fold repression. • The data is recorded numerically as the log base 2 of the expression ratio.
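The conventions above translate directly into a small calculation. The sketch below turns a pair of intensities into a log base 2 ratio and a description using the color scale just given; the function names and intensity values are hypothetical, and no background correction or normalization is attempted.

```python
import math


def log2_ratio(experiment, control):
    """Log base 2 of the expression ratio (experiment / control)."""
    if experiment <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experiment / control)


def describe(experiment, control):
    """Describe a spot as fold induction or repression, with its color."""
    if experiment <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    if experiment > control:
        return f"{experiment / control:g}-fold induction (red)"
    if experiment < control:
        return f"{control / experiment:g}-fold repression (green)"
    return "no change (black)"


if __name__ == "__main__":
    for exp, ctl in [(500, 100), (100, 500), (250, 250)]:
        print(describe(exp, ctl), "log2 ratio =", round(log2_ratio(exp, ctl), 2))
```

One reason to work with the logarithm is symmetry: a 4-fold induction and a 4-fold repression become +2 and -2.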
Microarray data analysis • Since there are typically so many genes, it is useful to cluster the genes based on similar expression patterns. • Different clustering algorithms may be used, e.g. hierarchical with different metrics, or k-means, k-medians. • It may also be useful to cluster the samples (we’ll see this shortly). • Other statistical methods may be useful, e.g. support vector machines (SVM).
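As one concrete instance of the clustering mentioned above, here is a bare-bones k-means sketch over gene expression profiles. The gene names and log ratios are made up, Euclidean distance is just one possible metric, and taking the first k genes (sorted by name) as starting centroids is an arbitrary deterministic choice made for this illustration.

```python
import math


def kmeans(profiles, k, iterations=20):
    """Cluster genes by expression profile with plain k-means.

    profiles: dict mapping gene name -> list of log2 expression ratios.
    Returns a dict mapping gene name -> cluster index (0 .. k-1).
    """
    genes = sorted(profiles)
    if not 1 <= k <= len(genes):
        raise ValueError("k must be between 1 and the number of genes")
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign every gene to its nearest centroid.
        new_assignment = {
            g: min(range(k), key=lambda c: math.dist(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


if __name__ == "__main__":
    data = {
        "geneA": [2.0, 2.1, 1.9],
        "geneB": [-2.0, -1.8, -2.2],
        "geneC": [1.8, 2.2, 2.0],
        "geneD": [-1.9, -2.1, -2.0],
    }
    print(kmeans(data, k=2))
```

Clustering the samples instead of the genes only requires transposing the data so that each sample becomes a profile.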
Acute Lymphoblastic Leukemia (ALL) • Constitutes 75% of annual diagnoses of childhood leukemia. • Long-term outlook has improved dramatically since about 1970. At that time the long-term disease-free survival (LTDFS) rate was under 10%; at present it is over 80%. • There is still a risk of relapse in 20% of patients.
ALL (cont.) • The LTDFS rate improved because it was recognized that ALL is heterogeneous, and that the therapy should be tailored to the subtype so as to improve the odds of a successful treatment (e.g. bone marrow transplant vs. chemotherapy). • Important subtypes include: T-ALL, E2A-PBX1, BCR-ABL, TEL-AML1, MLL rearrangement, and hyperdiploid (> 50 chromosomes).