Bioinformatics Challenge Day Peter Carr 2/2/2013 This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
Bioinformatics Challenge Days The problem: drowning in complex data that is very hard to make sense of • Approach: a one-day hack-a-thon • Innovate: tackle huge challenges in bioinformatics • Educate: bring in specialists from diverse fields and participants with DoD bioinformatics interests • Investigate: what this short format can accomplish • Aggregate: bring people together • The Challenges: • Can you determine the cause of an infection? • Can you invent a new way to visualize complex bioinformatics data? • Can you spot the signs of genetic engineering? Can you figure out what an engineered organism does? • [Figures: DNA sequencing; MAGE engineering] • Sponsor: Defense Threat Reduction Agency (DTRA) • Organizer: MIT Lincoln Laboratory (MIT LL)
Cast of Characters Darrell Ricke (MIT Lincoln Laboratory) • Bioinformatics Peter Carr (MIT Lincoln Laboratory) • Synthetic Biology, Biochemistry Anna Shcherbina (MIT Lincoln Laboratory) • Bioengineering, Electrical Engineering Nancy Burgess (Defense Threat Reduction Agency) • Chemical and Biological Defense
Some Big Hammers • Sequencing • Complete genome sequences • Mixed populations • Expression (RNA species) • Interaction (ChIP-seq) • Mass spectrometry • Protein/peptide fingerprinting • Metabolites • Interaction (cross-linking) • Other tools • Microarrays • High-throughput screening (e.g. fluorescence)
Now and Future • Data galore: Omics approaches are generating massive amounts of increasingly complex measurement data • How do we best make sense of this information? • Some fundamental development areas • Processing • Visualizing/analyzing • Storing/accessing
The Challenges • Metagenomic Visual: developing visualization methods to facilitate analysis of metagenomic data with unknown numbers of genomes at varying concentrations • Genome Assembly for the Clinic: performing de novo assembly from clinical samples, with an emphasis on pathogen identification • Genetic Engineering: identifying and interpreting the signatures of genetic engineering
What can your efforts today produce? • Analysis, answers to questions • Heuristics, algorithms • Specific software tools • Roadmap for future work
What to get out of this? • A deeper understanding of the field • Tools • Approaches • Concerns/challenges • Ideas and experiences that may motivate future work • Connection to others with similar interests
What We Hope to See From You • Creativity (innovative ideas and efforts) • Energy (intensity and focus) • Communication (results, feedback)
Theme: Flexibility • You can work alone, come with a team, or team up on-site • You can use any of the resources we have provided, any you have access to (including tools you code yourself ahead of time or today) • You keep what you make (DTRA and MIT LL make no claims to what you produce)
Schedule • 8:00 AM Breakfast/check-in • 9:00 AM Welcome (Pete) • 9:15 AM Overview and logistics (Pete) • 9:45 AM The Challenges: 1. Metagenomic Visual (Anna) 2. Genome Assembly for the Clinic (Darrell) 3. Genetic Engineering (Pete) • 10:45 AM Coffee; break into project groups • 12:30 PM Lunch served (groups can continue to work) • 3:30 PM Snack (groups can continue to work) • 6:30 PM Progress updates ready by dinnertime • 6:30 PM Dinner and progress reports • 8:00 PM+ Groups can continue to work
Getting Started • On the USB sticks: • Data for the three challenges (FASTA, FASTQ, CSV) • Software (Mac, Windows, Linux) • Local wifi access • Teaming
Challenge 3: Genetic Engineering • Background: a sample has been dug from the back of a lab freezer, and subjected to Ion Torrent sequencing • We would like to know what it is: • Simple or complex? • Natural or engineered? • If engineered, how? (what techniques) • For what purpose? • Will the design work? • [No surprise: yes, there is an (in silico) engineered component. Find it! And figure out as much as you can about it.] • We have a lot of great questions, but may not have all the answers
What Do We Design For? • Investigation (answer a biological question) • Production (make a drug, a fuel) • Serve a specialized role • Protect against infection • Detect dangerous chemicals • Environmental remediation • Creatively explore an interesting design space
Getting DNA In • Transformation/transfection can be via natural, chemical, or electrical methods
Old School: Conjugation • Transfer “in vivo” protects fragile DNA • An entire genome can be transferred • Transfer to other species • Requires an origin of transfer (oriT) and pilus proteins • [Figure: donor (sender) and recipient (receiver) cells]
Old School: Phage Transduction • Phage/virus can replicate independently, or integrate into genome • DNA or RNA, single- or double-stranded • Examples: • Lentivirus (mammalian) • Lambda, T4, T7, P1, M13 (E. coli)
Old School: Mutagenesis • Natural mutation rates (mutations accumulate slowly over time) • Exposure to damaging effects (chemicals, radiation) • Mutator strains: cells defective for one or more natural repair mechanisms
Revolution 1: Restriction Enzymes • Specific sites: often 6 bp, but can be longer or shorter • “Outside cutters” cut some distance away from recognition site • Homing nucleases (longer ~30 bp sites, can be unique in a genome) • Multiple Cloning Site (MCS) often engineered into cloning vector
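Scanning a sequence for recognition sites is often a first step when looking at a cloning vector or an unknown construct. The sketch below assumes exact, top-strand matching of a small, illustrative enzyme table (EcoRI, BamHI, HindIII; all palindromic 6-bp sites, so one strand suffices). It does not handle degenerate sites or the offset cuts of "outside cutters". Enzymes that cut exactly once are the kind a Multiple Cloning Site is built from, so the helper `unique_cutters` (a hypothetical name) lists them.

```python
def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site on the top strand.

    Palindromic sites such as GAATTC read the same on both strands,
    so a single top-strand scan finds every occurrence.
    """
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # advance by 1 so overlapping hits are kept
        hits[name] = positions
    return hits


def unique_cutters(hits: dict[str, list[int]]) -> list[str]:
    """Enzymes with exactly one site: candidates for a multiple cloning site."""
    return [name for name, positions in hits.items() if len(positions) == 1]


if __name__ == "__main__":
    enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
    hits = find_sites("GAATTCAAGGATCCAAGAATTC", enzymes)
    print(hits)                  # {'EcoRI': [0, 16], 'BamHI': [8], 'HindIII': []}
    print(unique_cutters(hits))  # ['BamHI']
```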
Plasmids • Circular • Contain an origin of replication • Copy number ranges from single copy to high copy (hundreds) • Selection gene (1 or more) • MCS and other features common • Extension: BACs and YACs
Selection and Screening • Almost all approaches give a mix of successes and failures • Screening searches for what you want • Selection kills off what you don’t want
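Because most approaches yield a mix of successes and failures, a common planning question for screening is how many colonies to check. A minimal sketch, assuming each colony is independently correct with a known probability f (in practice f is only an estimate): the chance of at least one hit among N colonies is 1 - (1 - f)^N. The function names are hypothetical.

```python
def prob_at_least_one(success_fraction: float, n_colonies: int) -> float:
    """Chance that at least one of n independently picked colonies is correct."""
    return 1 - (1 - success_fraction) ** n_colonies


def colonies_to_screen(success_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies giving at least `confidence` odds of one hit."""
    if not 0 < success_fraction <= 1:
        raise ValueError("success_fraction must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    n = 1
    # Count upward rather than using a log formula, which can round badly at exact boundaries.
    while prob_at_least_one(success_fraction, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # If roughly 1 in 10 clones is correct, about 29 colonies give 95% odds of a hit.
    print(colonies_to_screen(0.1))  # 29
```

Selection changes the picture: by killing off most failures it raises f, which shrinks the number of colonies that must be screened.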
Revolution 2: PCR (Polymerase Chain Reaction) • A simple scheme made it possible to manipulate DNA in new ways • Used not just to make more DNA, but to modify it • Dependent on oligonucleotide synthesis and an enzyme (DNA polymerase)
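Predicting which fragment a primer pair would amplify is a simple way to reason about PCR on a known sequence. The sketch below assumes a linear template given as the top strand (5' to 3') and exact, full-length primer matches; it ignores mismatches, 5' tails used to modify DNA, and circular templates. The forward primer appears verbatim in the top strand, while the reverse primer's reverse complement does. Function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the predicted amplicon (top strand), or None if a primer does not bind."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)  # where the reverse primer anneals, read on the top strand
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    template = "GGG" + "ATGCATGC" + "TTTTTTTT" + "CAGTCAGT" + "GGG"
    print(in_silico_pcr(template, "ATGCATGC", "ACTGACTG"))  # ATGCATGCTTTTTTTTCAGTCAGT
```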
Site-Directed Mutagenesis • Performed on DNA in vitro (higher background error rates than in vivo) • Employs a synthetic oligo and an enzyme (polymerase) • Users typically screen clones with PCR or restriction digests, then sequencing • The rest of the plasmid is typically not re-sequenced
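The synthetic oligo carries the desired change flanked by sequence that matches the template. A minimal sketch of one common design, a complementary primer pair with a single substitution in the middle (QuikChange-style), is shown below. The symmetric flank length is an illustrative assumption; real designs also weigh melting temperature, GC content, and secondary structure. `mutagenic_primers` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mutagenic_primers(template: str, pos: int, new_base: str, flank: int = 15) -> tuple[str, str]:
    """Return (forward, reverse) primers with a point substitution at 0-based `pos`.

    The new base sits between `flank` template-matching bases on each side;
    the reverse primer is the exact reverse complement of the forward primer.
    """
    template = template.upper()
    new_base = new_base.upper()
    if not flank <= pos < len(template) - flank:
        raise ValueError("not enough flanking sequence around pos")
    if template[pos] == new_base:
        raise ValueError("new_base matches the template; nothing to mutate")
    fwd = template[pos - flank:pos] + new_base + template[pos + 1:pos + 1 + flank]
    return fwd, reverse_complement(fwd)


if __name__ == "__main__":
    # Change the C at position 7 to T, keeping 3 matching bases on each side.
    print(mutagenic_primers("AAAACCCCGGGGTTTT", 7, "T", flank=3))  # ('CCCTGGG', 'CCCAGGG')
```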
Gibson Assembly • Can bring together many pieces of DNA at once • Based on identical sequence overlaps • 3-enzyme reaction (exonuclease, DNA polymerase, DNA ligase) • Intrinsically scar-less • Often relies on PCR (and thus oligos) to produce each segment http://www.youtube.com/watch?v=WCWjJFU1be8
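At the sequence level, joining fragments by identical overlaps means each shared overlap appears once in the product and no extra bases are added, which is why the method is scar-less. The sketch below checks and merges fragments in a given order, assuming a fixed overlap length and exact matches; it models only the expected product, not the enzymology. `gibson_join` and `gibson_close` are hypothetical names.

```python
def gibson_join(fragments: list[str], overlap: int) -> str:
    """Merge ordered linear fragments whose adjacent ends share `overlap` identical bases."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    if not fragments:
        return ""
    product = fragments[0].upper()
    for frag in fragments[1:]:
        frag = frag.upper()
        if product[-overlap:] != frag[:overlap]:
            raise ValueError(f"overlap mismatch: {product[-overlap:]} vs {frag[:overlap]}")
        product += frag[overlap:]  # keep the shared overlap only once
    return product


def gibson_close(product: str, overlap: int) -> str:
    """Circularize: the product's end must match its start; drop the duplicated overlap."""
    if product[-overlap:] != product[:overlap]:
        raise ValueError("ends do not overlap; cannot circularize")
    return product[:-overlap]


if __name__ == "__main__":
    print(gibson_join(["AAAACCCC", "CCCCGGGG", "GGGGTTTT"], 4))  # AAAACCCCGGGGTTTT
    linear = gibson_join(["AAAACCCC", "CCCCGGGG", "GGGGAAAA"], 4)
    print(gibson_close(linear, 4))  # AAAACCCCGGGG
```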
Golden Gate Assembly • “Outside cutter” restriction enzymes • Little or no scar at joining point • Segments may or may not be produced by PCR
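With an "outside cutter", the overhang that directs joining comes from the bases next to the recognition site, not from the site itself. The sketch below uses BsaI as an example enzyme (the slide does not name one): BsaI recognizes GGTCTC and cuts 1 nt downstream on that strand and 5 nt downstream on the other, leaving a 4-nt overhang. A site on the bottom strand reads GAGACC on the top strand and cuts upstream. The function reports overhangs in top-strand coordinates; checking that all overhangs in a design are distinct is left to the reader. `bsai_overhangs` is a hypothetical name.

```python
BSAI_SITE = "GGTCTC"     # BsaI recognition site as read on the top strand
BSAI_SITE_RC = "GAGACC"  # the same site lying on the bottom strand


def bsai_overhangs(seq: str) -> list[tuple[int, str]]:
    """Return (start, 4-nt overhang) for each BsaI cut, in top-strand coordinates."""
    seq = seq.upper()
    results = []
    # Forward sites: skip 1 base after GGTCTC, the next 4 bases form the overhang.
    i = seq.find(BSAI_SITE)
    while i != -1:
        start = i + len(BSAI_SITE) + 1
        if start + 4 <= len(seq):
            results.append((start, seq[start:start + 4]))
        i = seq.find(BSAI_SITE, i + 1)
    # Reverse sites: mirror image, so the overhang lies 2..5 bases upstream of GAGACC.
    i = seq.find(BSAI_SITE_RC)
    while i != -1:
        start = i - 5
        if start >= 0:
            results.append((start, seq[start:start + 4]))
        i = seq.find(BSAI_SITE_RC, i + 1)
    return sorted(results)


if __name__ == "__main__":
    # A toy part flanked by inward-facing BsaI sites releases AATG and GCTT overhangs.
    part = "GGTCTC" + "A" + "AATG" + "CCCCCC" + "GCTT" + "A" + "GAGACC"
    print(bsai_overhangs(part))  # [(7, 'AATG'), (17, 'GCTT')]
```

Because the recognition sites sit outside the released part, they are cut away, and only the 4-nt overhangs remain at each junction.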
Recombination • Site-specific • attB (Gateway) • Cre/lox • Homologous • Natural (B. subtilis, RecA) • Engineered (lambda red) • Directed by double-strand break repair • Zn finger nucleases • TALENs • CRISPRs
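Site-specific recombination systems such as Cre/lox act on defined target sequences, so a leftover site in a genome is one possible clue when looking for signs of engineering. A minimal sketch, assuming exact matches to the canonical 34-bp loxP site (13-bp inverted repeat, 8-bp asymmetric spacer, 13-bp inverted repeat); variant lox sites and other recombinase targets are not covered. Because the spacer is asymmetric, the sketch also reports the site's orientation. `find_loxp` is a hypothetical name.

```python
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"  # repeat + spacer + repeat (34 bp)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_loxp(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact loxP match on either strand."""
    seq = seq.upper()
    hits = []
    for strand, site in (("+", LOXP), ("-", reverse_complement(LOXP))):
        i = seq.find(site)
        while i != -1:
            hits.append((i, strand))
            i = seq.find(site, i + 1)
    return sorted(hits)


if __name__ == "__main__":
    construct = "GGG" + LOXP + "CCC" + reverse_complement(LOXP)
    print(find_loxp(construct))  # [(3, '+'), (40, '-')]
```

The orientation matters biologically: two sites in the same orientation direct excision of the DNA between them, while inverted sites direct inversion.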
DNA Synthesis to Genome Assembly • Oligo synthesis (building blocks) using organic chemistry • Assemble into genes using biochemistry (in vitro) • Assemble into genomes (small ones for starters) using biology (in vivo) • Each of these processes can carry its own error signature, but errors can be counteracted by sequencing-based screening, subsequent repair, etc.
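One reason errors compound at each level is simple arithmetic. In chemical oligo synthesis, an n-nt oligo needs n - 1 coupling steps, so with per-step coupling efficiency c only about c^(n-1) of molecules reach full length. Likewise, with an independent per-base error rate e, about (1 - e)^L of L-base molecules are error-free. The numbers in the example below (99% coupling, 1 error per 1000 bases) are illustrative assumptions, not measured values, and both functions are hypothetical helpers.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Fraction of oligos reaching full length: one coupling per base after the first."""
    return coupling_efficiency ** (length - 1)


def error_free_fraction(length: int, error_rate_per_base: float) -> float:
    """Fraction of molecules with no errors, assuming independent per-base errors."""
    return (1 - error_rate_per_base) ** length


if __name__ == "__main__":
    print(round(full_length_fraction(100, 0.99), 3))   # about 0.37 of 100-mers are full length
    print(round(error_free_fraction(1000, 0.001), 3))  # about 0.37 of 1-kb genes are error-free
```

This is why screening by sequencing, or repairing errors after synthesis, becomes more important as constructs get longer.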
MAGE: Multiplexed Automatable Genome Engineering • Generation of genome edits at many targeted chromosomal locations • Much like site-directed mutagenesis, but on a chromosome • Wang, Isaacs, Carr et al. (2009) Nature 460(7257):894-8
MAGE • A lot like site-directed mutagenesis, but on the genome of living cells • Uses long oligos • Does not require selection markers (but can use them) • Other than the desired change (as small as a single DNA base, as large as a multi-gene deletion), there is no obvious sign of engineering • BUT there can be secondary signs: • Oligo-mediated defects within 50-100 bp of the edited site • Higher background mutation rates (mismatch repair deactivated)
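The secondary signs above suggest one heuristic for spotting a possible MAGE edit: look for substitutions against a reference genome that cluster within roughly 100 bp of each other. A minimal sketch, assuming the sample and reference are already aligned to equal length with substitutions only (real data needs read alignment and indel handling). The 100-bp window mirrors the slide's 50-100 bp range, and a cluster is only a hint, not proof of engineering. Function names are hypothetical.

```python
def substitutions(ref: str, sample: str) -> list[int]:
    """0-based positions where two pre-aligned, equal-length sequences differ."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (a, b) in enumerate(zip(ref.upper(), sample.upper())) if a != b]


def cluster_positions(positions: list[int], window: int = 100) -> list[list[int]]:
    """Group positions; a new cluster starts when the gap to the previous one exceeds `window`."""
    clusters: list[list[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


if __name__ == "__main__":
    ref = "A" * 600
    sample = list(ref)
    sample[10], sample[40], sample[500] = "G", "C", "T"  # two nearby changes, one isolated
    diffs = substitutions(ref, "".join(sample))
    print(cluster_positions(diffs))  # [[10, 40], [500]]
```

A genome-wide substitution count well above what is expected for the strain could, in the same spirit, hint at the elevated background rate of a mismatch-repair-deficient strain.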
CAGE: Conjugative Assembly Genome Engineering • Conjugation now employed with controlled precision • But DNA crossover points not always perfectly defined Isaacs, Carr, Wang, ... (2011) Science
Genetic Circuits: DNA Parts • Make use of DNA “parts” libraries to construct more advanced genetic designs • Fundamental concept in synthetic biology, inspired by electrical engineering • Basis of the iGEM competition (International Genetically Engineered Machine)
Genetic Circuits: Bacteria • Repressilator: an early example of a synthetic biology circuit • Three inverters in series, connected in a ring, made a ring oscillator • Elowitz and Leibler (2000) Nature
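The ring of three inverters can be explored numerically. The sketch below integrates a commonly used dimensionless form of the repressilator model: the mRNA of gene i is produced at a rate repressed by the protein of the previous gene in the ring, and each protein follows its own mRNA. The parameter values (alpha, alpha0, beta, n) and the simple fixed-step Euler integrator are illustrative assumptions chosen so that the symmetric steady state is unstable; they are not values taken from the paper.

```python
def repressilator(alpha=1000.0, alpha0=1.0, beta=5.0, n=2.0, dt=0.01, t_end=100.0):
    """Euler-integrate a dimensionless three-gene repressilator; return protein levels per step.

    dm_i/dt = -m_i + alpha / (1 + p_{i-1}**n) + alpha0   (gene i repressed by protein i-1)
    dp_i/dt = -beta * (p_i - m_i)
    """
    m = [0.0, 0.0, 0.0]
    p = [1.0, 2.0, 3.0]  # asymmetric start so the ring does not sit at the symmetric state
    trace = []
    for _ in range(int(t_end / dt)):
        # p[i - 1] wraps around for i = 0, closing the ring of inverters.
        dm = [-m[i] + alpha / (1 + p[i - 1] ** n) + alpha0 for i in range(3)]
        dp = [-beta * (p[i] - m[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        trace.append(tuple(p))
    return trace


if __name__ == "__main__":
    trace = repressilator()
    late = [levels[0] for levels in trace[len(trace) // 2:]]
    print(f"protein 1 ranges from {min(late):.1f} to {max(late):.1f} in the second half")
```

A wide, persistent range in the second half of the run indicates sustained oscillation rather than decay to a steady state; for rigorous work, an adaptive ODE solver would be a better choice than fixed-step Euler.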
Genetic Circuits: Yeast • Adapted a signaling system from plants • Used to engineer communication between yeast cells • Basic features can be installed in a variety of organisms Chen and Weiss (2005) Nature Biotechnology
Genetic Circuits: Mammalian • Concept: insert a DNA circuit into cells to identify cancer cells and/or kill them • [Figure: DNA for a classifier circuit delivered to a cancer cell (match: cell death) and a normal cell (no match: no effect)] • Xie et al. (2011) Science (Weiss, Benenson labs)
Increasingly Alien • Codon usage • Adapt how often codons are used to match target organism • New amino acids (Tirrell, Schultz) • New genetic codes (Church, Carr) • Minimal life • Engineering by subtraction (Blattner) • Compose from the ground up (Forster/Church) • New DNA bases • Alternate hydrogen-bonding (Benner) • Hydrophobic bases (Schultz) • Mirror-image life
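Codon usage is straightforward to measure from a coding sequence, and a gene recoded for a different host can stand out from its neighbors. The sketch below counts codons in frame and reports the fraction drawn from a chosen "rare" set. The example set (AGA and AGG, arginine codons commonly cited as rare in E. coli) is an illustrative assumption; a real comparison would use a full codon usage table for the target organism or an index such as the Codon Adaptation Index, which this sketch does not implement. Function names are hypothetical.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons in a coding sequence whose length is a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def rare_codon_fraction(cds: str, rare: set[str]) -> float:
    """Fraction of codons in `cds` that belong to the `rare` set."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return sum(counts[codon] for codon in rare) / total if total else 0.0


if __name__ == "__main__":
    cds = "ATGAGAAGGCTGTAA"  # ATG AGA AGG CTG TAA
    print(codon_counts(cds))
    print(rare_codon_fraction(cds, {"AGA", "AGG"}))  # 0.4
```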