CSE280a: Projects. Vineet Bafna
Project Logistics • Research project (70%) • Work individually, or in groups of 2 • Two presentations: • Introductory presentation: first week of February (20 minutes) (20% of grade) • Describe the goals of the project • Describe your (computational) formulation • Summarize/critique reading assignment • Present an algorithm • Constructive criticism of other projects • One-on-one meeting with instructor (end of February) (10% of grade) • Discuss preliminary results • Final presentation (last 2-3 classes) (30% of grade) • Submit a final report • Final presentation
Project 1: disease gene mapping • Recall linkage disequilibrium (LD) • In the absence of recombination: • Correlation between columns • The joint probability Pr[A=a,B=b] differs from Pr[A=a]Pr[B=b] • With extensive recombination: • Pr[A=a,B=b] = Pr[A=a]Pr[B=b]
Measures of LD • Consider two bi-allelic sites with alleles marked 0 and 1 • Define • P00 = Pr[Allele 0 at locus 1, and allele 0 at locus 2] • P0* = Pr[Allele 0 at locus 1] (P*0 is defined analogously for locus 2) • Linkage equilibrium if P00 = P0* P*0 • D = abs(P00 - P0* P*0) = abs(P01 - P0* P*1) = …
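The definition of D can be tried out on a small haplotype table. The sketch below is only an illustration: it assumes haplotypes are given as rows of 0/1 alleles, and the function name `ld_d` and the toy data are made up for this example.

```python
def ld_d(haplotypes, i, j):
    """Return D = |P00 - P0* P*0| for loci i and j.

    haplotypes: list of rows, each a list of 0/1 alleles (assumed input format).
    """
    n = len(haplotypes)
    p00 = sum(1 for h in haplotypes if h[i] == 0 and h[j] == 0) / n
    p0_star = sum(1 for h in haplotypes if h[i] == 0) / n
    p_star0 = sum(1 for h in haplotypes if h[j] == 0) / n
    return abs(p00 - p0_star * p_star0)

# Toy data: two perfectly correlated loci, then two independent loci.
linked = [[0, 0], [0, 0], [1, 1], [1, 1]]
independent = [[0, 0], [0, 1], [1, 0], [1, 1]]

d_linked = ld_d(linked, 0, 1)            # P00 = 0.5, P0* P*0 = 0.25
d_independent = ld_d(independent, 0, 1)  # P00 = 0.25, P0* P*0 = 0.25
```

In the linked table D is 0.25, the largest value possible when both alleles have frequency 0.5; in the independent table D is 0.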
LD can be used to map disease genes • LD decays with distance from the disease allele. • By plotting LD, one can shortlist the region containing the disease gene. [Figure: LD plot; status labels D N N D D N with marker alleles 0 1 1 0 0 1]
Multiple loci • In complex diseases, multiple loci interact to confer disease susceptibility. [Figure: LD plot; status labels D N N D D N with alleles 0 0 1 0 0 1 at one locus and 0 1 1 0 0 0 at a second locus]
Testing for multiple loci • Assume a SNP matrix with n individuals and m loci. Testing all sets of 5 SNPs requires a huge number of computations. • Can you come up with computational strategies to speed it up?
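To get a feel for "huge", one can count the candidate sets directly. The value m = 1000 below is an assumed example size, not a figure from the slides.

```python
from math import comb

def num_snp_sets(m, k=5):
    """Number of distinct k-subsets of m loci that an exhaustive test would visit."""
    return comb(m, k)

# Assumed example: m = 1000 loci, sets of 5 SNPs.
count = num_snp_sets(1000, 5)
print(f"{count:,} sets of 5 SNPs among 1000 loci")
```

For 1000 loci this is already about 8.25 × 10^12 sets, before multiplying by the cost of each association test over n individuals.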
Speeding up multiple locus computations • A filtering strategy? • Input: a SNP matrix with one or more pairs that interactively associate • Output: a set of SNP pairs that includes the interacting pair(s). • Method should be fast, and should NOT consider all pairs.
Speeding up the computations • Correlated SNPs should have low Hamming distance. • Random SNPs should have high Hamming distance. • Strategy: select k individuals at random. • Hash each SNP restricted to those k individuals • Correlated SNPs should fall in the same bin with high probability [Figure: example SNP matrix (0 0 1 0 0 1 1 1 0 0 1 1 1 0 1 0 1 1), k=2]
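One possible reading of the hashing strategy is sketched below: each SNP (column) is hashed by its alleles in k randomly sampled individuals, and only SNPs that share a bin become candidate pairs. The function name `candidate_pairs`, the `rounds` parameter, and the toy matrix are assumptions made for illustration.

```python
import random
from collections import defaultdict

def candidate_pairs(snp_matrix, k, rounds=1, seed=0):
    """Return SNP pairs that collide when hashed on k random individuals.

    snp_matrix[i][j] is the 0/1 allele of individual i at SNP j (assumed layout).
    """
    rng = random.Random(seed)
    n, m = len(snp_matrix), len(snp_matrix[0])
    pairs = set()
    for _ in range(rounds):
        sample = rng.sample(range(n), k)       # k individuals chosen at random
        bins = defaultdict(list)
        for j in range(m):
            key = tuple(snp_matrix[i][j] for i in sample)
            bins[key].append(j)                # SNPs with equal keys share a bin
        for cols in bins.values():             # pairs are formed only inside a bin
            for a in range(len(cols)):
                for b in range(a + 1, len(cols)):
                    pairs.add((cols[a], cols[b]))
    return pairs

# Toy matrix: SNPs 0 and 1 are identical, SNP 2 is their complement.
toy = [[0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
pairs = candidate_pairs(toy, k=2, rounds=3)
```

Larger k makes bins more selective but increases the chance that a truly correlated pair is split by a single mismatch; repeating over several rounds is one way to trade these off.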
Project 2: mtDNA phylogeny • In the absence of recombination, the history of mitochondrial DNA can be expressed by a tree. • The goal of this project is to build a robust phylogeny using a heuristic modification of the perfect phylogeny.
The Genographic Project • The Genographic Project aims to trace the geographic origins of the human race using mitochondrial DNA. https://www3.nationalgeographic.com/genographic/atlas.html
Without recurrent mutations • A unique tree can explain the evolutionary history
    1 2 3 4 5
A   1 1 0 0 0
B   0 0 1 0 0
C   1 1 0 1 0
D   0 0 1 0 1
E   1 0 0 0 0
[Figure: tree rooted at r; edges are labeled with mutations 1 to 5 and the leaves are A to E]
With recurrent mutations • Adding another individual F destroys perfect phylogeny • Why? • It is not so easy to place F • Can you suggest a strategy?
    1 2 3 4 5
A   1 1 0 0 0
B   0 0 1 0 0
C   1 1 0 1 0
D   0 0 1 0 1
E   1 0 0 0 0
F   0 1 0 0 0
[Figure: tree rooted at r with leaves A to F; mutation 1 appears on two edges]
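A standard way to see which columns break perfect phylogeny is the pairwise column test: assuming the root carries allele 0 at every site, a 0/1 matrix admits a perfect phylogeny exactly when, for every pair of columns, the sets of individuals carrying allele 1 are disjoint or nested. The sketch below applies that test to the two matrices from the slides; the function name is made up, and the all-zero root is an assumption.

```python
def conflicting_columns(matrix):
    """Return 1-based column pairs whose sets of 1-carriers overlap without nesting."""
    m = len(matrix[0])
    ones = [{r for r, row in enumerate(matrix) if row[c] == 1} for c in range(m)]
    conflicts = []
    for a in range(m):
        for b in range(a + 1, m):
            s, t = ones[a], ones[b]
            if s & t and not (s <= t or t <= s):
                conflicts.append((a + 1, b + 1))
    return conflicts

rows_a_to_e = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
rows_with_f = rows_a_to_e + [[0, 1, 0, 0, 0]]  # F

before = conflicting_columns(rows_a_to_e)
after = conflicting_columns(rows_with_f)
```

Without F there are no conflicts. With F, columns 1 and 2 conflict: A and C carry both mutations, E carries only mutation 1, and F carries only mutation 2, so neither set of carriers contains the other.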
Tests of Selection • In class, we have discussed alleles that can be selectively neutral, or under active selection • Active selection may be positive or negative • How do we identify regions under positive or negative selection? • Balancing selection: sometimes it is helpful for a population to …
Adaptive Selection • Selection leads to loss of heterozygosity (will be explained in detail in the next lecture). • Can you come up with a test for selection?
Balancing selection • Sometimes both alleles are useful in a population, and it helps to have both around • A simple example is when diversity is important (the two variants help maintain diversity) • Bipolar disorder genes could be under balancing selection • High creativity might confer some selective/reproductive advantage. • Depression confers a disadvantage • If so, the tests for this disorder might be tricky. • How can we identify regions under balancing selection?
Testing for Balancing Selection • Adaptive selection leads to loss of heterozygosity (will be explained in detail in the next lecture). • Balancing selection leads to two dominant haplotypes • Can you come up with a test for balancing selection?
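As a starting point only, and not a statistical test, the two signals named on the slides can be turned into summary statistics over a window of haplotypes: haplotype heterozygosity (1 minus the sum of squared haplotype frequencies) and the share of the sample covered by the two most common haplotypes. Both function names and the toy samples are assumptions for illustration.

```python
from collections import Counter

def haplotype_heterozygosity(haplotypes):
    """1 - sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(tuple(h) for h in haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def top_two_share(haplotypes):
    """Fraction of the sample carried by the two most common haplotypes."""
    n = len(haplotypes)
    counts = Counter(tuple(h) for h in haplotypes)
    return sum(c for _, c in counts.most_common(2)) / n

# Toy windows (made up): one haplotype only, two equally common, four distinct.
swept = [[0, 1, 0]] * 8
balanced = [[0, 1, 0]] * 4 + [[1, 0, 1]] * 4
diverse = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]]

h_swept = haplotype_heterozygosity(swept)        # 0.0
h_balanced = haplotype_heterozygosity(balanced)  # 0.5
h_diverse = haplotype_heterozygosity(diverse)    # 0.75
```

A real test would still need a null distribution (for example from neutral simulations) to decide when a window's values are unusual.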
Project: Primer design for cancer genomics
The Science behind Gleevec • Fusions observed in leukemia, lymphoma, and sarcomas • “Philadelphia translocation” • Drugs target this fusion protein
Fluorescence in situ hybridization • Cancer genomes show extensive structural variation
Assaying for tumor variants • Most tumors start off with a single cell, which then proliferates. • Drugs like Gleevec are used well after cancer has taken hold. • Can we detect the cancer early by detecting the genomic abnormality? • If only a very few cells in the person are cancerous, can we still detect it? • Can we track a patient through treatment?
Cancer genomics • In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes • In the early stages, only a few cells will show this deletion
Polymerase Chain Reaction • PCR is a technique for amplifying and detecting a specific portion of the genome • Amplification takes place if the primers are an ‘appropriate’ distance apart (<2 kb)
Assaying for Rare Variants • PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells [Figure: extract genomic DNA, then PCR, then detection; in normal cells the distance is too large for amplification, in the tumor cell it is not]
Variant Variants • What if the variant is the minority in the cell population? • What if deletion boundaries are uncertain? [Figure: deletions with different boundaries in Patient A, Patient B, and Patient C]
Observed variation in deletion size • Sizes of homozygous deletions in cell lines from different human cancers (scale is in megabases).
Primer Approximation Multiplex PCR (PAMP) • Multiple primers are optimally spaced, flanking a breakpoint of interest • Upstream of breakpoint, forward primers • Downstream of breakpoint, reverse primers • The primers are run in a multiplex PCR reaction • Any pair can form a viable product [Figure: deletions in Patient B and Patient C]
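The PAMP idea can be illustrated with a toy coordinate model. The sketch assumes primers are given as genomic positions, that a deletion removes the interval [del_start, del_end), and that a product forms only when the remaining forward-to-reverse distance is under 2 kb (the limit quoted on the PCR slide). The function name and all coordinates are made up.

```python
def amplifying_pairs(forward, reverse, del_start, del_end, max_product=2000):
    """Return (forward, reverse) primer pairs brought within range by a deletion."""
    deleted = del_end - del_start
    pairs = []
    for f in forward:
        if f >= del_start:
            continue  # primer site is deleted or not upstream of the deletion
        for r in reverse:
            if r < del_end:
                continue  # primer site is deleted or not downstream of the deletion
            if r - f - deleted < max_product:
                pairs.append((f, r))
    return pairs

# Made-up layout: forward primers every 1 kb upstream, reverse primers downstream.
forward = [0, 1000, 2000, 3000, 4000]
reverse = [10000, 11000, 12000, 13000]

with_deletion = amplifying_pairs(forward, reverse, 3500, 10500)
no_deletion = amplifying_pairs(forward, reverse, 5000, 5000)
```

With the 7 kb deletion only the pair closest to the breakpoint, (3000, 11000), yields a product; without a deletion every pair is too far apart.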
Experimental Design (500 kb region) • 10 sets of 25 primers: upstream and downstream • 250 upstream • 250 downstream • Primer pairs closest to breakpoint amplified • Assay by oligo array • Goal: computational selection of an ‘optimal’ primer set
Goal • Input: a collection of primers • Identify a subset of primers that do not cross-hybridize, are unique, yet cover the region completely • Use combinatorial optimization, simulated annealing, integer linear programming…
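Before reaching for simulated annealing or integer linear programming, a greedy baseline can make the constraints concrete. The sketch below is a deliberately simple heuristic, not one of the methods named on the slide: it treats one strand only, takes cross-hybridizing pairs as a given set, and reads "cover the region" as "no gap larger than max_gap". All names and numbers are hypothetical, and the greedy choice can fail even when a valid subset exists.

```python
def greedy_primer_cover(positions, conflicts, region_end, max_gap):
    """Pick primers left to right, always taking the farthest compatible one.

    conflicts: set of frozenset({p, q}) pairs that cross-hybridize (assumed given).
    Returns the chosen positions, or None if the greedy walk gets stuck.
    """
    chosen = []
    last = 0  # region is assumed to start at coordinate 0
    candidates = sorted(positions)
    while region_end - last > max_gap:
        reachable = [
            p for p in candidates
            if last < p <= last + max_gap
            and all(frozenset((p, q)) not in conflicts for q in chosen)
        ]
        if not reachable:
            return None
        last = max(reachable)
        chosen.append(last)
    return chosen

# Hypothetical candidates; primers at 500 and 1000 are assumed to cross-hybridize.
positions = [100, 400, 500, 900, 1000, 1400]
conflicts = {frozenset((500, 1000))}
selection = greedy_primer_cover(positions, conflicts, region_end=1500, max_gap=500)
```

Here the walk picks 500, skips 1000 because of the conflict, and finishes with 900 and 1400. An optimization formulation would instead search over subsets to minimize uncovered sequence subject to the same conflict constraints.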
Spectral Networks: Algorithms for De Novo Interpretation of Tandem Mass Spectra. Nuno Bandeira, Ph.D., Department of Computer Science and Engineering, University of California, San Diego. ProtIG seminar series, September 21, 2007
Proteins and their modifications • Proteins are fundamental players in the regulation of biological processes. [Figure: DNA "encodes for" proteins; proteins "regulate"] • Knowing proteins involves knowing many things. This dissertation focuses on: • Identification • Sequencing • Post-translational modifications
Protein sequences and modifications • From a computational perspective, a protein can be represented as a string over a weighted alphabet. Protein sequence: …AFSRLEMILGF… • Subsequences are called peptides (obtained via enzymatic digestion): AFSRL, SRLEMILGF, EMILG • Modifications change amino acid masses (a modified residue is marked with *): Mass(M)=131, Mass(*)=16, Mass(M*)=147 • Mass(SRLEMILGF)=1047, Mass(SRLEM*ILGF)=1063
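The weighted-alphabet view translates directly into code. The sketch below sums standard monoisotopic residue masses without adding a terminal water, which reproduces the slide's numbers after rounding; the function name and the `mods` argument (0-based position mapped to a mass offset) are assumptions for illustration.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def peptide_mass(peptide, mods=None):
    """Sum of residue masses plus any modification offsets (no terminal water)."""
    mods = mods or {}
    return sum(RESIDUE_MASS[aa] for aa in peptide) + sum(mods.values())

plain = peptide_mass("SRLEMILGF")
# +16 on the methionine at 0-based position 4, as on the slide.
modified = peptide_mass("SRLEMILGF", mods={4: 16.0})
```

Rounded to integers, `plain` is 1047 and `modified` is 1063, matching Mass(SRLEMILGF) and Mass(SRLEM*ILGF) above.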
Nobel prize in chemistry, 2002
What is mass spectrometry? http://nobelprize.org/chemistry/laureates/2002/chemadv02.pdf
Tandem Mass Spectrometry (MS/MS) • Protein sequence: …THISISAVERYLARGESAMPLEPRTEINSEQENCE… • Peptide: LARGE • Modified peptide: LARG*E • Modification: any event that changes the mass at a specific site. [Figure: MS/MS spectra of the peptide and the modified peptide, with b and y ions and PM marked]
Example of a real MS/MS spectrum [Figure: spectrum with peaks b10 and y12 marked as symmetric]
Tandem Mass Spectrometry (MS/MS) • Proteins → enzymatic digestion → peptides → tandem mass spectrometry → large set of MS/MS spectra • Database search: database of known peptides (MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, SEQUENCE, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..) • De novo sequencing • Example peptide: SEQUENCE
Mixture spectra • Sometimes the instrument generates a single spectrum from two or more peptides: Peptide A: NLAFFQLR, Peptide B: ALDDILNLK [Figure: mixture spectrum]
How to identify mixture spectra?
Proposed approach • When identifying a mixture spectrum of peptides A and B, assume you have non-mixture spectra for the same peptides. • Compare the non-mixture spectra of known peptides to putative mixture spectra to determine peptide identifications.
Project description • Implement an algorithm to identify mixture spectra from pairs of peptides by combining previously identified spectra from isolated peptides. • Test the above implementation by simulating mixture spectra using an existing database of spectra from isolated peptides. • Propose a scoring procedure to separate correct from false identifications. • Nuno Bandeira (bandeira@ucsd.edu)
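A minimal sketch of the simulate-and-identify loop is given below, under strong assumptions: spectra are dictionaries from a binned m/z value to intensity, a mixture is simulated by adding two library spectra, and a candidate pair is scored by cosine similarity between its combined spectrum and the mixture. The peak values are invented placeholders, not real spectra, and cosine similarity is just one possible scoring choice.

```python
from itertools import combinations
from math import sqrt

def combine(spec_a, spec_b, weight_b=1.0):
    """Add two binned spectra; weight_b scales the second peptide's intensity."""
    mixed = dict(spec_a)
    for mz, inten in spec_b.items():
        mixed[mz] = mixed.get(mz, 0.0) + weight_b * inten
    return mixed

def cosine(s, t):
    """Cosine similarity between two binned spectra."""
    dot = sum(v * t.get(mz, 0.0) for mz, v in s.items())
    norm = sqrt(sum(v * v for v in s.values())) * sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0

def best_pair(mixture, library):
    """Return (score, peptide_a, peptide_b) for the best-matching library pair."""
    scored = (
        (cosine(mixture, combine(library[a], library[b])), a, b)
        for a, b in combinations(sorted(library), 2)
    )
    return max(scored)

# Invented peak lists (binned m/z -> intensity), for illustration only.
library = {
    "NLAFFQLR": {175: 1.0, 288: 0.6, 416: 0.8},
    "ALDDILNLK": {147: 0.9, 260: 0.7, 374: 0.5},
    "GVFGSVLRA": {161: 0.8, 274: 0.4, 387: 0.9},
}
mixture = combine(library["NLAFFQLR"], library["ALDDILNLK"], weight_b=0.5)
score, pep_a, pep_b = best_pair(mixture, library)
```

Enumerating all pairs is quadratic in the library size, so a realistic implementation would need a filtering step, and the score distribution of wrong pairs would be needed to separate correct from false identifications.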