Biophysics 101 Genomics and Computational Biology. Schedule: Tue Sep 19 DNA 1: Life & computers; comparative genomics, databases; model utility Tue Sep 26 DNA 2: Polymorphisms, populations, statistics, pharmacogenomics
Biophysics 101 Genomics and Computational Biology
Schedule:
Tue Sep 19 DNA 1: Life & computers; comparative genomics, databases; model utility
Tue Sep 26 DNA 2: Polymorphisms, populations, statistics, pharmacogenomics
Tue Oct 03 DNA 3: Dynamic programming, Blast, Multi-alignment, Hidden Markov Models
Tue Oct 10 RNA 1: Microarrays, library sequencing & quantitation concepts
Tue Oct 17 RNA 2: Clustering by gene or condition & other regulon data sources
Tue Oct 24 RNA 3: Nucleic acid motifs; the nature of biological "proofs".
Tue Oct 31 Protein 1: 3D structural genomics, homology, dynamics, function & drug design
Tue Nov 07 Protein 2: Mass spectrometry, post-synthetic modifications,
Tue Nov 14 Protein 3: Quantitation of proteins, metabolites, & interactions
Tue Nov 21 Network 1: Metabolic kinetic & flux balance optimization methods
Tue Nov 28 Network 2: Molecular computing, self-assembly, genetic algorithms, neural nets
Tue Dec 05 Network 3: Cellular, developmental, social, ecological & commercial models
Tue Dec 12 Team Project presentations
Tue Dec 19 Project Presentations
Tue Jan 02 Project Presentations
Tue Jan 09 Project follow-up & course synthesis
101 Section meetings
Tue 3:00 - 4:00 Haley HMS MEC 342
Wed 7:00 - 8:00 pm Jason HMS MEC 342
Thu 12:00 - 1:00 Dan HMS MEC 342 (Except on 12-Oct & 9-Nov he will use MEC 338)
Thu 12:00 - 1:00 Nick HMS MEC 340
Tue 7:30 - 9:00 pm Doug Science Cntr 110
Tue 7:30 - 8:30 pm Allegra Science Cntr 101B
Tue 7:30 - 8:30 pm Yonatan Science Cntr 102B
Wed 6:00 - 7:00 pm Peter Science Cntr 112
Thu 8:00 - 9:00 pm Adnan Science Cntr 209
Despite recruitment of new TFs, the sections are crowded, so there are no auditor sections. Anyone registered who did not receive email should check the list at the break. (Email to schedule another "Biology tutorial" for Math/CS experts)
Last week's take home lessons
Life & computers: Self-assembly
Math: be suspicious of approximations
Catalysis by RNA & proteins
"The Code": treasure (but don't memorize) exceptions
Replication
Differential equation: dx/dt = kx
Mutation & the single molecule: Noise is overcome
Human disease: SNPs <1 ppb & 1.5 fold dosage
Directed graphs & pedigrees
Bell curve statistics: Binomial & Poisson
Selection
Today's story, logic & goals
Types of mutants
Mutation, drift, selection
Binomial & exponential dx/dt = kx
Association studies, χ² statistic
Linked and causative alleles
Haplotypes
Computing the first genome, the second ...
New technologies
Random and systematic errors
Types of Mutants
Null: PKU
Dosage: Trisomy 21
Conditional (e.g. temperature or chemical)
Gain of function: HbS
Altered ligand specificity
Altered specificity mutants
A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity.
A mutant Escherichia coli sigma 70 subunit of RNA polymerase with altered promoter specificity.
A mutant of Escherichia coli with altered inducer specificity for the fad regulon.
A mutation in the xanthine dehydrogenase (purine hydroxylase I) of Aspergillus nidulans resulting in altered specificity. Implications for the
A point mutation in the gamma2 subunit of gamma-aminobutyric acid type A receptors results in altered benzodiazepine binding site
A point mutation leads to altered product specificity in beta-lactamase catalysis.
A site-specific endonuclease derived from a mutant Trp repressor with altered DNA-binding specificity.
A spontaneous point mutation in the aac(6')-Ib' gene results in altered substrate specificity of aminoglycoside 6'-N-acetyltransferase of a
A streptavidin mutant with altered ligand-binding specificity.
A structural model for the HIV-1 Rev-RRE complex deduced from altered-specificity rev variants isolated by a rapid genetic strategy.
A technique for the isolation of yeast alcohol dehydrogenase mutants with altered substrate specificity.
A U1 small nuclear ribonucleoprotein particle with altered specificity induces alternative splicing of an adenovirus E1A mRNA precursor.
Amino acid substrate specificity of Escherichia coli phenylalanyl-tRNA synthetase altered by distinct mutations.
An altered specificity mutation in the lambda repressor induces global reorganization of the protein-DNA interface.
An altered-specificity mutation in a human POU domain demonstrates functional analogy between the POU-specific subdomain and
Analysis of estrogen response element binding by genetically selected steroid receptor DNA binding domain mutants exhibiting altered
Antiprotease targeting: altered specificity of alpha 1-antitrypsin by amino acid replacement at the reactive centre.
AraC proteins with altered DNA sequence specificity which activate a mutant promoter in Escherichia coli.
Assessment of the role of an omega loop of cholesterol oxidase: a truncated loop mutant has altered substrate specificity.
Butyramide-utilizing mutants of Pseudomonas aeruginosa 8602 which produce an amidase with altered substrate specificity.
Carboxyl-terminal domain dimer interface mutant 434 repressors have altered dimerization and DNA binding specificities.
Characterization of the nuclear protein import mechanism using Ran mutants with altered nucleotide binding specificities.
Computational method for the design of enzymes with altered substrate specificity.
Crystallographic analysis of trypsin-G226A. A specificity pocket mutant of rat trypsin with altered binding and catalysis.
Designing zinc-finger ADR1 mutants with altered specificity of DNA binding to T in UAS1 sequences.
Dinitrogenase with altered substrate specificity results from the use of homocitrate analogues for in vitro synthesis of the iron-molybdenum
Dissecting Fas signaling with an altered-specificity death-domain mutant: requirement of FADD binding for apoptosis but not Jun
DNA-binding-defective mutants of the Epstein-Barr virus lytic switch activator Zta transactivate with altered specificities.
E461H-beta-galactosidase (Escherichia coli): altered divalent metal specificity and slow but reversible metal inactivation.
EcoRV-T94V: a mutant restriction endonuclease with an altered substrate specificity towards modified oligodeoxynucleotides.
Engineering proteases with altered specificity.
Engrailed (Gln50-->Lys) homeodomain-DNA complex at 1.9 A resolution: structural basis for enhanced affinity and altered specificity.
Enhanced activity and altered specificity of phospholipase A2 by deletion of a surface loop.
Escherichia coli hemolysin mutants with altered target cell specificity.
Evidence for an altered operator specificity: catabolite repression control of the leucine operon in Salmonella typhimurium.
Evidence that HT mutant strains of bacteriophage P22 retain an altered form of substrate specificity in the formation of transducing
Ferrichrome transport in Escherichia coli K-12: altered substrate specificity of mutated periplasmic FhuD and interaction of FhuD with the
Generation of estrogen receptor mutants with altered ligand specificity for use in establishing a regulatable gene expression system.
Altered specificity mutants (continued)
Genetic strategy for analyzing specificity of dimer formation: Escherichia coli cyclic AMP receptor protein mutant altered in dimerization
Immunoglobulin V region variants in hybridoma cells. I. Isolation of a variant with altered idiotypic and antigen binding specificity.
In vitro selection for altered divalent metal specificity in the RNase P RNA.
In vitro selection of zinc fingers with altered DNA-binding specificity.
In vivo selection of basic region-leucine zipper proteins with altered DNA-binding specificities.
Isolation and properties of Escherichia coli ATPase mutants with altered divalent metal specificity for ATP hydrolysis.
Isolation of altered specificity mutants of the single-chain 434 repressor that recognize asymmetric DNA sequences containing TTAA
Mechanisms of spontaneous mutagenesis: clues from altered mutational specificity in DNA repair-defective strains.
Molecular basis of altered enzyme specificities in a family of mutant amidases from Pseudomonas aeruginosa.
Mutants in position 69 of the Trp repressor of Escherichia coli K12 with altered DNA-binding specificity.
Mutants of eukaryotic initiation factor eIF-4E with altered mRNA cap binding specificity reprogram mRNA selection by ribosomes in
Mutational analysis of the CitA citrate transporter from Salmonella typhimurium: altered substrate specificity.
Na+-coupled transport of melibiose in Escherichia coli: analysis of mutants with altered cation specificity.
Nuclease activities of Moloney murine leukemia virus reverse transcriptase. Mutants with altered substrate specificities.
Probing the altered specificity and catalytic properties of mutant subtilisin chemically modified at position S156C and S166C in the S1
Products of alternatively spliced transcripts of the Wilms' tumor suppressor gene, wt1, have altered DNA binding specificity and regulate
Proline transport in Salmonella typhimurium: putP permease mutants with altered substrate specificity.
Random mutagenesis of the substrate-binding site of a serine protease can generate enzymes with increased activities and altered
Redesign of soluble fatty acid desaturases from plants for altered substrate specificity and double bond position.
Selection and characterization of amino acid substitutions at residues 237-240 of TEM-1 beta-lactamase with altered substrate specificity
Selection strategy for site-directed mutagenesis based on altered beta-lactamase specificity.
Site-directed mutagenesis of yeast eEF1A. Viable mutants with altered nucleotide specificity.
Structure and dynamics of the glucocorticoid receptor DNA-binding domain: comparison of wild type and a mutant with altered specificity.
Structure-function analysis of SH3 domains: SH3 binding specificity altered by single amino acid substitutions.
Sugar-binding and crystallographic studies of an arabinose-binding protein mutant (Met108Leu) that exhibits enhanced affinity & altered
T7 RNA polymerase mutants with altered promoter specificities.
The specificity of carboxypeptidase Y may be altered by changing the hydrophobicity of the S'1 binding pocket.
The structural basis for the altered substrate specificity of the R292D active site mutant of aspartate aminotransferase from E. coli.
Thymidine kinase with altered substrate specificity of acyclovir resistant varicella-zoster virus.
U1 small nuclear RNAs with altered specificity can be stably expressed in mammalian cells and promote permanent changes in
Use of altered specificity mutants to probe a specific protein-protein interaction in differentiation: the GATA-1:FOG complex.
Use of Chinese hamster ovary cells with altered glycosylation patterns to define the carbohydrate specificity of Entamoeba histolytica
Using altered specificity Oct-1 and Oct-2 mutants to analyze the regulation of immunoglobulin gene transcription.
Variants of subtilisin BPN' with altered specificity profiles.
Yeast and human TFIID with altered DNA-binding specificity for TATA elements.
From genomics to public health
Vaccines, drugs, lifestyle, public health measures
Pharmacogenomics
Targets (proteins or phenotypes)
Chemical diversity
Gene therapy, DNA vaccines, ribozymes, nutrition
High-throughput screening of compounds
Animal testing
Clinical trials phase 1, 2, 3
Formulation: Bioavailability
Toxicity
Delivery: time release, feedback
Marketing and societal priorities
Pharmacogenomics Gene/Enzyme Drug Quantitative effect Examples of clinically relevant genetic polymorphisms influencing drug metabolism and effects. Additional data
Diversity Databases
45 genomes completed & 324 started: 216.190.101.28/GOLD
(DBCat & NAR) 513 bio-databases plus List of SNP databases: ariel.ucs.unimelb.edu.au:80/~cotton/mdi.htm
803557 human SNPs: www.ncbi.nlm.nih.gov/SNP
296990 mapped: snp.cshl.org
21591 SNPs in genes: http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
A significant basepair
aggtcatctgagGtcaggagttca
ANALYSIS: ALU repeat found upstream of Iodothyronine deiodinase, Myeloperoxidase, Keratin K18, HoxA1, etc.
"-463 G creates a stronger SP1 binding site & retinoic acid response element (RARE) in the allele... overrepresented in acute promyelocytic leukemia" Piedrafita FJ, et al. 1996 JBC 271: 14412
Critique of a basepair
1. 97% of the genome is noncoding.
2. Even repeats have regulatory & health relevance.
3. H. sapiens as a model system: Saturation mutagenesis screen of 6 x 10^9 heterozygotes; many hits per basepair on average.
4. One key basepair may be too reductionistic. Whole genome, whole population, whole network analyses are becoming increasingly feasible.
Where do allele frequencies come from?
Mutation (T), Migration (M), Drift (D), Selection (S), …

$$T_j = S_j + \sum_{i=0}^{j-1} \left( S_i F_{j-i} - S_j R_{j-i} \right) + \sum_{i=j+1}^{N} \left( S_i R_{i-j} - S_j F_{i-j} \right)$$
$$M_j = T_j + \text{(terms analogous to the above)}$$
$$D_j = \sum_{i=0}^{N} M_i \, B(N, j, i/N)$$
$$S_j = D_j \, w$$
(w = relative fitness of i mutants to N-i original)
__________________________________
T_i, M_i, D_i, S_i = frequency of i mutants in a population of size N
F_i = forward rate = B(N, i, P_F); R_i = reverse rate
$$B(N, i, p) = \binom{N}{i} \, p^i \, (1-p)^{N-i}$$
(binomial) (ref)
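The drift step, D_j = sum over i of M_i * B(N, j, i/N), can be sketched as one generation of binomial resampling applied to a probability distribution over the number of mutant copies. This is only an illustration: the function names, the population size of 10 and the 20 generations are assumptions made for the example, not values from the lecture.

```python
from math import comb

def binom_pmf(n, k, p):
    """B(n, k, p) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def drift_step(m):
    """Apply D_j = sum_i M_i * B(N, j, i/N) to a distribution m over 0..N mutants."""
    n = len(m) - 1
    return [sum(m[i] * binom_pmf(n, j, i / n) for i in range(n + 1))
            for j in range(n + 1)]

# Hypothetical example: N = 10 gene copies, starting with exactly 5 mutants.
N = 10
start = [0.0] * (N + 1)
start[5] = 1.0

after = start
for _ in range(20):
    after = drift_step(after)

# Probability that the mutant allele has been lost (0 copies) or fixed (N copies).
lost_or_fixed = after[0] + after[N]
print(f"P(lost or fixed after 20 generations) = {lost_or_fixed:.3f}")
```

Rerunning with a larger N shows the probability mass reaching 0 or N far more slowly, which is the population-size dependence of drift.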
Random Genetic Drift very dependent upon population size
Directional & Stabilizing Selection
• codominant mode of selection (genic selection)
  • fitness of heterozygote is the mean of the fitness of the two homozygotes: AA = 1; Aa = 1 + s; aa = 1 + 2s
  • always increases frequency of one allele at expense of the other
• overdominant mode
  • heterozygote has highest fitness: AA = 1; Aa = 1 + s; aa = 1 + t, where s > 0 and s > t
  • reaches equilibrium where two alleles coexist
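The two modes can be compared with the standard one-locus, two-allele recursion for the frequency q of allele a under random mating. This is a minimal sketch: the recursion and the equilibrium q* = s / (2s - t) are textbook results rather than something stated on the slide, and the values of s, t, the starting frequency and the generation counts are assumptions chosen for illustration.

```python
def next_freq(q, w_AA, w_Aa, w_aa):
    """Frequency of allele a after one generation of viability selection."""
    p = 1.0 - q
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (q * q * w_aa + p * q * w_Aa) / mean_w

def iterate(q, fitness, generations):
    """Apply next_freq repeatedly; fitness is the tuple (w_AA, w_Aa, w_aa)."""
    for _ in range(generations):
        q = next_freq(q, *fitness)
    return q

s, t = 0.1, 0.05  # hypothetical selection coefficients with s > 0 and s > t

# Codominant: AA = 1, Aa = 1 + s, aa = 1 + 2s -> allele a sweeps toward fixation.
q_codominant = iterate(0.01, (1.0, 1 + s, 1 + 2 * s), 500)

# Overdominant: AA = 1, Aa = 1 + s, aa = 1 + t -> stable intermediate frequency.
q_overdominant = iterate(0.01, (1.0, 1 + s, 1 + t), 2000)
q_equilibrium = s / (2 * s - t)

print(f"codominant:   q = {q_codominant:.4f}")
print(f"overdominant: q = {q_overdominant:.4f} (predicted {q_equilibrium:.4f})")
```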
Fixation Times
(K = rate of substitution, µ = mutation rate, N = population size, s = selective advantage)
• for neutral mutations, K = µ
• for advantageous mutations, K = 4Nsµ
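Both rates follow from the same product: the number of new mutations per generation in a diploid population (2Nµ) times the probability that any one of them is eventually fixed. For a neutral mutation that probability is 1/(2N), giving K = µ. For an advantageous one it is approximately 2s, giving K = 4Nsµ. This reasoning is the standard population-genetics approximation for small s and large N, not something spelled out on the slide, and the numbers below are hypothetical.

```python
def substitution_rate_neutral(mu):
    """K = 2N*mu * 1/(2N) = mu: independent of population size."""
    return mu

def substitution_rate_advantageous(n, s, mu):
    """K = 2N*mu * 2s = 4*N*s*mu (approximation for small s > 0, large N)."""
    return 4 * n * s * mu

# Hypothetical values: N = 10^4, s = 1%, mu = 10^-8 per bp per generation.
N, s, mu = 1e4, 0.01, 1e-8
k_neutral = substitution_rate_neutral(mu)
k_advantageous = substitution_rate_advantageous(N, s, mu)
print(f"neutral K = {k_neutral:.1e}, advantageous K = {k_advantageous:.1e}")
print(f"ratio = {k_advantageous / k_neutral:.0f}x")
```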
Role of Genetic Exchange • Effect on distribution of fitness in the whole population • Can accelerate rate of evolution at high cost (50%)
Network genomics Environment Metabolites Interactions RNA DNA Protein Growth rate Expression stem cells cancer cells viruses organisms
64 Conditions 48 (to 600) Strains Intensity calibrated to strain abundance (selection coefficient)
Ratio of strains over environments, e, times, t_e, selection coefficients, s_e:
R = R_0 exp[-s_e t_e]
80% of 34 random yeast insertions have s < -0.3% or s > 0.3% for t = 160 generations, e = 1 (rich media); ~50% for t = 15, e = 7.
Should allow comparisons with population allele models.
Other multiplex competitive growth experiments:
Thatcher, et al. (1998) PNAS 95:253.
Link AJ (1994) thesis; (1997) J Bacteriol 179:6228.
Smith V, et al. (1995) PNAS 92:6479.
Shoemaker D, et al. (1996) Nat Genet 14:450.
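The relation R = R_0 exp[-s t] can be inverted to estimate a selection coefficient from the strain ratio before and after t generations of competitive growth. A minimal sketch, with hypothetical function names and example numbers (only the 0.3% threshold and the 160 generations come from the slide):

```python
from math import exp, log

def ratio_after(r0, s, t):
    """Strain ratio after t generations: R = R0 * exp(-s * t)."""
    return r0 * exp(-s * t)

def selection_coefficient(r0, r, t):
    """Invert R = R0 * exp(-s * t) to get s = -ln(R / R0) / t."""
    return -log(r / r0) / t

# A strain with s = 0.3% per generation, followed for 160 generations.
r_end = ratio_after(1.0, 0.003, 160)
s_est = selection_coefficient(1.0, r_end, 160)
print(f"R/R0 after 160 generations = {r_end:.3f}; recovered s = {s_est:.4f}")
```

Under these assumptions a 0.3% per-generation difference changes the ratio by roughly 40% over 160 generations, which suggests why long competitions make such small coefficients measurable.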
Caution: phases of human genetics
Monogenic vs. Polygenic dichotomy
Method | Problems
Mendelian Linkage | need large families
Common direct (causative) | 3% coding + ?non-coding
Common indirect (LD) | recombination & new alleles
All alleles (causative) | expensive
LD = linkage disequilibrium = non-random association of k alleles
Electron magnetic moment to Bohr magneton ratio: μe/μB = 1.0011596521869 (41), Ur = 4.1 x 10^-12. Peter J. Mohr and Barry N. Taylor, CODATA & Reviews of Modern Physics, Vol. 72, No. 2, 2000. physics.nist.gov/cuu/Constants
"99.5%…to accept unambiguously that the Higgs has been spotted, the chances … have to be reduced to one in ten million" Nature 407: 118
Number of genes in the human genome: 34,000 to 120,000. Nature Genetics July 2000
But what if we test more than one locus? The future of genetic studies of complex human diseases. ref
How many "new" polymorphisms?
G = generations of exponential population growth = 5000
N' = population size = 6 x 10^9 now; N = 10^4 pre-G
μ = mutation rate per bp per generation = 10^-8 to 10^-9 (ref)
L = diploid genome = 6 x 10^9 bp
e^(kG) = N'/N; so k = 0.0028
Av # new mutations < $\sum_{t=1}^{5000} L \, e^{kt} \mu$ = 4 x 10^3 to 4 x 10^4 per genome
Take home: "High genomic deleterious mutation rates in hominids" accumulate over 5000 generations & confound LD.
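The growth constant k follows from solving e^(kG) = N'/N for k. The sketch below does only that arithmetic, plus the per-generation mutation input L times the mutation rate; the variable names are assumptions. With the slide's inputs the computed k is about 0.0027, close to the quoted 0.0028.

```python
from math import log

G = 5000        # generations of exponential growth
N_now = 6e9     # N', population size now
N_pre = 1e4     # N, population size before the expansion
L = 6e9         # diploid genome size in bp

# e^(k*G) = N'/N  =>  k = ln(N'/N) / G
k = log(N_now / N_pre) / G
print(f"k = {k:.5f} per generation")

# New mutations per diploid genome per generation for the two quoted rates.
for mu in (1e-8, 1e-9):
    print(f"mu = {mu:.0e}: about {L * mu:.0f} new mutations per genome per generation")
```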
How well linked?
G = generations of exponential population growth = 5000
N' = population size = 6 x 10^9 now; N = 10^4 pre-G
For each haplotype H:
frequency of H on the variant gametes = n_vH/n_v
frequency of H on the + gametes = n_+H/n_+
linkage disequilibrium: $d^2 = \left( n_{vH}/n_v - n_{+H}/n_+ \right)^2$ = 0 to 1
θ = marker separation; 1% recomb = 1 Mbp
If S = sample size needed to detect variant & disease assoc., then approx. S/d^2 is required for the LD marker. (Kruglyak ref)
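The d^2 measure and the S/d^2 rule of thumb are simple enough to compute directly from gamete counts. A minimal sketch; the function names and the counts in the example are hypothetical.

```python
def ld_d2(n_vH, n_v, n_plusH, n_plus):
    """d^2 = (n_vH/n_v - n_+H/n_+)^2, between 0 (no association) and 1."""
    return (n_vH / n_v - n_plusH / n_plus) ** 2

def marker_sample_size(s_direct, d2):
    """Approximate sample size needed when testing the LD marker: S / d^2."""
    return s_direct / d2

# Hypothetical counts: haplotype H is on 80 of 100 variant gametes
# and on 30 of 100 "+" gametes.
d2 = ld_d2(80, 100, 30, 100)
needed = marker_sample_size(500, d2)   # S = 500 is an assumed direct-test sample size
print(f"d^2 = {d2:.2f}; sample size for the LD marker is about {needed:.0f}")
```

With these numbers d^2 is 0.25, so the indirect (marker) test needs roughly four times the sample of a direct test of the causative variant.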
LD as a function of marker spacing & population expansion times
[Figure panels: Variant at 50%; Variant at 10%]
Finding & Creating mutants Isogenic Proof of causality: Find > Create a copy > Revert Caution: Effects on nearby genes Aneuploidy (ref)
Pharmacogenomics Example 5-hydroxytryptamine transporter Lesch KP, et al Science 1996 274:1527-31 Association of anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region. Pubmed
Caution: phases of human genetics
Monogenic vs. Polygenic dichotomy
Method | Problems
Mendelian Linkage (300bp) | need large families
Common indirect/LD (10^6 bp) | recombination & new alleles
Common direct (causative) | 3% coding + ?non-coding
All alleles (10^9) | expensive ($0.20 per SNP)
New Genotyping & haplotyping technologies
• de novo sequencing > scanning > selected sequencing > diagnostic methods
• Sequencing by synthesis
  • 1-base Fluorescent, isotopic or Mass-spec* primer extension (Pastinen97)
  • 30-base extension Pyrosequencing (Ronaghi99)*
  • 700-base extension, capillary arrays dideoxy* (Tabor95, Nickerson97, Heiner98)
• SNP & mapping methods
  • Sequencing by hybridization on arrays (Hacia98, Gentalen99)*
  • Chemical & enzymatic cleavage: (Cotton98)
  • SSCP, D-HPLC (Gross 99)*
• Femtoliter scale reactions (10^5 molecules)
  • 20-base restriction/ligation MPSS (Gross 99)
  • 30-base fluorescent in situ amplification sequencing (Mitra 1999)
• Single molecule methods (not production)
  • Fluorescent exonuclease (Davis91)
  • Patch clamp current during ss-DNA pore transit (Kasianowicz96)
  • Electron, STM, optical microscopy (Lagutina96, Lin99)
Fluorescent primers or ddNTPs
Anal Biochem 1997 Oct 1;252(1):78-88 Optimization of spectroscopic and electrophoretic properties of energy transfer primers. Hung SC, Mathies RA, Glazer AN
http://www.pebio.com/ab/apply/dr/dra3b1b.html
Ewing, Hillier, Wendl, & Green 1998 Indel=I+D Total= I+D+N+S
What are examples of random & systematic errors? For (clone) template isolation? For sequencing? For assembly?
Examples of systematic errors
For (clone) template isolation: restriction sites, repeats
For sequencing: hairpins, tandem repeats
For assembly: repeats, errors, polymorphisms, chimeric clones, read mistracking
Project completion % vs coverage redundancy Whole-genome shotgun X= mean coverage (see Roach 1995)
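The general shape of a completion-versus-redundancy curve can be sketched with the common Poisson approximation, in which reads land uniformly at random and a base is missed with probability e^(-X). This is an assumed simplification for illustration, not necessarily the model used in the cited reference, and the function name is hypothetical.

```python
from math import exp

def fraction_covered(x):
    """Expected fraction of bases in at least one read at mean coverage x (Poisson assumption)."""
    return 1.0 - exp(-x)

for x in (1, 2, 5, 8, 10):
    print(f"X = {x:2d}: {100 * fraction_covered(x):.3f}% of bases expected in at least one read")
```

The diminishing returns are the point: each extra unit of coverage closes a smaller share of the remaining gaps, and repeats or unclonable regions (systematic errors) are not captured by this random model at all.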
Systematic errors
[Figure, panels B and B': conventional dideoxy gel with 2 hairpins, compared with sequential dNTP addition (pyrosequencing)]
Sequential dNTP addition (pyrosequencing): > 30 base reads; no hairpin artefacts
Use of DNA Chips for SNP ID & Scoring
• already used for mutation detection with HIV-1, BRCA1, mitochondria
• higher detection rate than gel-based assays
• higher throughput and potential for automation
• ID of > 2000 SNPs in 2 Mb of human DNA
• can multiplex reactions
Wang et al., Science 280 (1998): 1077
Use of Mass Spec for Analysis and Scoring Haff and Smirnov, Genome Research 7 (1997): 378 A single nucleotide primer extension assay
Mass Spectrometry for Analysis and Scoring Haff and Smirnov, Genome Res. 7 (1997): 378 Use mass spec to score which base was added Can also multiplex as long as primer masses are known
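Searching for Perls (If only finding mutations were as easy as finding words.)
    #!/usr/local/bin/perl
    undef $/;                                        # slurp mode: read the whole input at once
    $dnatext = <>;                                   # file(s) named on the command line, or STDIN
    $dnatext =~ s/\>.+?\n//g;                        # strip FASTA header lines (">...")
    $mutation = $dnatext =~ s/mutation/mutation/gi;  # count case-insensitive matches of "mutation"
    print " found: $mutation\n";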