51:123. Bioinformatics Techniques Terry Braun, Ph. D. (genetics), M.S. (EE) Administration Syllabus Webpage Text Office hours tabraun@eng.uiowa.edu. Syllabus. Webpage. http://pdb.eng.uiowa.edu/~tabraun/biotech/2008 Icon: https://icon.uiowa.edu
51:123 • Bioinformatics Techniques • Terry Braun, Ph. D. (genetics), M.S. (EE) • Administration • Syllabus • Webpage • Text • Office hours tabraun@eng.uiowa.edu
Webpage • http://pdb.eng.uiowa.edu/~tabraun/biotech/2008 • Icon: https://icon.uiowa.edu • Please check and make sure you have access.
Textbook • “Beginning Perl for Bioinformatics,” J. Tisdall, O’Reilly, 2001. • Orders? 39.95 vs 23.95
Literature Review • This year, students will read and assemble a presentation on a paper • The rest of the class must read the paper, write a question pertaining to the paper, and submit their question • Also, you will evaluate your peers (and submit your evaluations) • More details on papers to come
Previous year's text (2006): Discovering Genomics, Proteomics, & Bioinformatics, second edition, by A. Malcolm Campbell and Laurie J. Heyer • Comments • web companion: http://www.aw-bc.com/geneticsplace/ • Discovery questions (links on web companion) • I really like this text as it has so many examples and additional links to external material. It should allow you to perform additional learning as needed. • This book has a biology emphasis.
Previous years' texts (2004) • “Programming Perl (3rd edition),” L. Wall, T. Christiansen, and J. Orwant, O'Reilly, 2000. • Another good one is: “Learning Perl,” Schwartz, O’Reilly, 2001 • These books clearly have a Perl emphasis.
Assumptions • No programming background or • No biology or genetics background (I'm assuming you have at least one). • This will probably change in future versions of this course. • Previously there was a bias towards Linux/Unix/vi • open source • favorite development platform in my lab • in this version of the course -- I will try to provide examples from many different platforms
Motivation of this course • Intro to Informatics (in CS) • Intro to Bioinformatics (51:121) • provides a first exposure to some available computational techniques and resources • however, the emphasis is on utilization • In this course (51:123) -- I try to emphasize tools and techniques that you would use to go about developing your own computational resources (software, systems, tools, etc). • Computational Methods in Molecular Biology (51:122 -- Casavant, Scheetz, Xing) • advanced topics
More Motivation • In 2003, "bioinformatics" was not a searchable term on "monster.com" • 2008 -- you will find many "hits" • 21 for July and August
Programming • This course will go through the Perl language at a fairly introductory level. However, there are some basic concepts common to all programming languages that are not covered in this course. • If you are new to programming, then you will be responsible for the additional work of learning basic programming concepts • I will attempt to provide as much help as possible
Editorial • This is a fairly difficult course to teach just because of the changes in technologies and techniques (for example, PHP, web services, webStart, Ruby, SNP chips, SNPlex, pyrophosphate sequencing etc. are fairly recent developments). • However, whether you go on to develop your own applications, or never write another line of code again -- your knowledge and understanding of this field will be a benefit to your career.
Bioinformatics What is it?
Many different things to many different people? • Generally, so broad, hard to define • Bio . informatics =?= biology . informatics • Informatics • problem solving with computers (or custom hardware) and software • can you really use a computer without software?
Bioinformatics • Multiple dimensions upon which to carve up bioinformatics • One dimension • software tool use • software tool development
Examples: Tool Use • Is amino acid 269 in LRP5 conserved in vertebrates? • How many SNPs are in ABCA4? • I have treated tumor cells with a new compound in an experiment and observe 32 genes that appear to be up-regulated. Is there a commonality between these 32 genes that is significant? • I find a mutation in a gene in an individual with AMD. Does that gene's protein interact with any other known proteins?
Examples: Tool Building • Given a novel genome sequence, find all genes and p-genes. • I want to design "sequence capture" probes for the exons of 40 genes that cause RP. Obtain the exonic sequence, with at least 100 nt of flanking sequence, and 1000 nt of the promoter from the transcription start. • I propose a new way to find disease-causing mutations in humans. I want to only look in genes that have regions that 1) are highly conserved across species, 2) have known functional protein domains (ex. transmembrane domains), and 3) have mRNA secondary structure. Is this a good idea?
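The "sequence capture" example above comes down to interval arithmetic on exon coordinates. The following is a minimal Python sketch of that step, not a probe-design tool. The function names, the toy coordinates, and the reading of "promoter" as the 1000 nt immediately upstream of the transcription start are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def flank_exons(exons: List[Interval], flank: int = 100) -> List[Interval]:
    """Pad each exon by `flank` nt on both sides and merge overlapping targets."""
    padded = sorted((max(1, start - flank), end + flank) for start, end in exons)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous target: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def promoter_region(tss: int, strand: str, length: int = 1000) -> Interval:
    """Return the `length` nt immediately upstream of the transcription start.

    "Upstream" depends on strand: lower coordinates on '+', higher on '-'.
    """
    if strand == "+":
        return (max(1, tss - length), tss - 1)
    if strand == "-":
        return (tss + 1, tss + length)
    raise ValueError("strand must be '+' or '-'")


# Hypothetical toy gene on the forward strand (coordinates are made up).
exons = [(5000, 5200), (5350, 5500), (9000, 9100)]
targets = flank_exons(exons)
promoter = promoter_region(tss=5000, strand="+")
print(targets)   # [(4900, 5600), (8900, 9200)]
print(promoter)  # (4000, 4999)
```

The first two exons merge into one target because their 100 nt flanks overlap. In a real tool the coordinates would come from a genome annotation rather than a hard-coded list.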
Major Areas of Bioinformatics • Genomics • sequencing • model organisms • annotation • discovery and understanding • epigenomics • Genetics • mapping • Disease • discovery • treatment/therapeutics • Proteomics • Genetic Engineering • Gene therapy • Clinic/patient/translational • TCGA • "Systems Biology" • e-QTLA mapping • New technology • 454-sequencing • Basic science • model organisms • structure and function • discovery (number of genes, alternative splicing, alternative transcription start, miRNAs)
A Biology "Dilemma" • There are students in high-school today that think they like biology because it is not quantitative • Their surprise is that biology is increasingly becoming quantitative.
Growing and Evolving Fields of Genetics, Genomics, and Computational Biology • “Explosion”, “Avalanche”, “Tsunami” of data • Complexity of data • "genome is 106.2 feet tall" (not 555 and 1/3 feet). • 30,000 “genes” • structure, function unknown • pathways (circuits) unknown • so much of biology that we do not know • chromosomal packaging, transcriptional regulation, post-translational modification, evolutionary questions – introns early/late, etc. • Stone Age of medicine for determining drugs and treatment? • Multiple model organism genomes • Requires the basics of genetics and biology • Also represents an increasing need in computational expertise
Biomedical information tsunami • overwhelming volume of data • multitude of sources Taken from Ken Buetow, NCI
Incredible developments in biomedical information generation Taken from Ken Buetow, NCI
Treatment of Disease Accelerated by Human Genome Project • (diagram, progression over time) • Disease with genetic component • ID genes • Diagnostics • Understanding basic biological defect • Preventative medicine • Gene therapy • Drug therapy • Pharmacogenomics • Adapted from Francis Collins
Stone Age of Medicine • (flowchart) • Try compound on model organism • Does it work? (Yes/No) • Does it harm/kill? (Yes/No) • Try compound on different model organism or for other result • Seek approval, human testing, clinical trials, FDA, 16 years, millions of dollars • Done • Knowledge from the genome is moving us away from this.
Former BME student “The ability to program is a must in this day of technology. As data is collected at higher and higher rates for more accuracy, the tasks of processing data has become a must. Through CIE, I learned the basics of C programming as well as digital to analog and analog to digital conversions which I use consistently. My programming capabilities have allowed me to write programs that process data as needed for my particular experiment and does not limit my capabilities." The bottom line -- whether you end up in industry or academia, you can be wildly successful without ever programming a computer. But those with even a basic understanding will be much better prepared for the challenges of the future.
Genomes (NIH/Ensembl 2003) • Human, Drosophila, mosquito, malaria parasite, various microbial genomes (112), mouse, rat, retroviruses (50 – HIV, etc), zebrafish, fugu, plants (Arabidopsis thaliana, barley, corn, cotton, potato, tomato, rice, wheat, others)
What’s in a Genome? • Stone age of medicine? • new drugs, treatments, procedures (genetic engineering), diagnosis • New materials (drugs, proteins, enzymes embedded – contact lenses, prosthetics, etc) • Tissue engineering • organ replacement, improving rejection response, new materials • neuronal • Genome discoveries: RNAi, haplotype blocks, SNPs, the number of genes, the number of pseudogenes, etc. • Cells • Systems/pathways • Imaging (functional, cellular, molecular) • New devices and technology • automated sequencers, gene chips, SNPlex, proteomics ….Opportunity
First Assignment • Obtain access to a computer • Examples: • home computer with Linux, Windows, or MacOS • Lab computer with Windows, MacOS, or Linux • CSS account • http://css.engineering.uiowa.edu (or go to 1256 SC) • We will utilize (or install) perl on this computer at a later time.
Introduction to Genetics and Genomics 51:123 Bioinformatics Techniques Terry Braun
Outline • Basic Mendelian Genetics • Mendel’s laws • independent assortment • independent segregation • mitosis and meiosis • dominant/recessive and pedigrees • alleles • Basic molecular genetics • DNA • RNA • proteins • Central Dogma • genes and gene structure • cells and chromosomes • Principles of Genetics, Tamarin; Human Molecular Genetics 2, Strachan and Read
Outline • Basic Genomics • Genome • human • others • molecular genetics and genomics • clones, contigs, libraries
Mendelian Genetics • Humans have 22 pairs (diploid) of chromosomes • plus XX or XY
Genome Lexicon Overview (3 Bb) • Adenine, Thymine, Guanine, Cytosine (ATGC) • purines: A, G • pyrimidines: C, T • www.ensembl.org
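As a small illustration of this lexicon, the Python sketch below tallies purines and pyrimidines in a DNA string. The function name and the example sequence are made up for illustration. Any symbol other than A, C, G, or T (for example the ambiguity code N) is counted as "other".

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_classes(seq: str) -> Counter:
    """Count purines, pyrimidines, and other symbols in a DNA sequence."""
    counts: Counter = Counter()
    for base in seq.upper():
        if base in PURINES:
            counts["purine"] += 1
        elif base in PYRIMIDINES:
            counts["pyrimidine"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical example sequence (not taken from any real gene).
example = "ATGCGGATTN"
classes = base_classes(example)
print(classes["purine"], classes["pyrimidine"], classes["other"])  # 5 4 1
```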
Mendelian Genetics • Rule of Segregation • offspring receive ONE allele (genetic material) from the pair of alleles possessed by EACH parent (offspring receives 2 of 4 possible) • a gamete receives only one allele from the pair of alleles possessed by an organism • fertilization (union of 2 gametes) reestablishes the double number
Mendelian Genetics • Rule of Independent Assortment • alleles of one gene can segregate independently of alleles of other genes • (Linkage Analysis relies on the violation of Independent Assortment Rule)
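The two rules can be sketched as a short enumeration in Python. Segregation means each gamete carries one allele per gene. Independent assortment means every combination of those choices across genes is equally likely. The function names and the dihybrid example are illustrative assumptions. The sketch assumes single-letter alleles, with uppercase standing for the dominant allele, and it ignores linkage.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """List all gametes for a genotype given as one allele pair per gene.

    Segregation: one allele is drawn from each pair.
    Independent assortment: every combination across genes is produced.
    """
    return ["".join(combo) for combo in product(*genotype)]


def cross(parent1, parent2):
    """Count offspring genotypes from all pairings of the parents' gametes."""
    counts = Counter()
    for g1 in gametes(parent1):
        for g2 in gametes(parent2):
            # Sort within each gene so "aA" and "Aa" are counted together.
            offspring = "".join("".join(sorted(pair)) for pair in zip(g1, g2))
            counts[offspring] += 1
    return counts


def phenotype(genotype_str):
    """Take the first allele of each gene; after sorting, a dominant
    (uppercase) allele comes first whenever one is present."""
    return "".join(genotype_str[i] for i in range(0, len(genotype_str), 2))


# Hypothetical dihybrid cross: AaBb x AaBb.
dihybrid = [("A", "a"), ("B", "b")]
offspring = cross(dihybrid, dihybrid)
phenotypes = Counter()
for genotype_str, n in offspring.items():
    phenotypes[phenotype(genotype_str)] += n

print(gametes(dihybrid))  # ['AB', 'Ab', 'aB', 'ab']
print(dict(phenotypes))
```

The phenotype counts come out in the classic 9:3:3:1 ratio. Linked genes would depart from that ratio, and linkage analysis exploits that departure.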
mitosis • cell duplication (duplicate genetic material) • DNA synthesis (broad bean) • S phase (40%), Gap2 (25%), Mitosis (10%), Gap1 (25%) • DNA duplicates in S phase (engineering marvel) • mitosis • prophase, metaphase, anaphase, telophase • prophase • chromosomes coalesce (shorten, thicken – analogous to “packaging”) • each “chromosome” is now a pair of sister “chromatids” • other structural activities (formation of the spindle – microtubules that are the structural mechanism for separating sister chromatids; the centrosome divides [individual centrioles]) • nuclear membrane breaks down
mitosis • metaphase • microtubules attached to centromeres • chromosomes (each a pair of sister chromatids) are lined up • anaphase • physical separation of sister chromatids • microtubules consumed • telophase • sister chromatids are separated (end of anaphase) and pulled to opposite poles of cell • nuclear membranes reform • cell constricts and separates • chromosomes uncoil and protein synthesis resumes
Significance of mitosis • two daughter cells (“clones”?) • identical genetic material to parent cell (assuming perfect fidelity of copy mechanism)
meiosis • gamete formation (halving of genetic material, diploid to haploid) • but also duplicating (cell divides in 2 phases, meiosis I, and meiosis II) • prophase I • chromosomes more spread out (relative to mitosis) • identical pairs matched • homologous pairs match up (called a bivalent) • crossing over can now occur • as chromatids shorten, and thicken, they are called “tetrads” • “chiasmata” – regions where crossover occurs • virtually all tetrads form at least one “chiasma” • thought to stabilize the tetrad • see Holliday structure for “homologous recombination”
meiosis I • metaphase I • tetrads line up • microtubules attach to sister chromosomes • anaphase I • sister chromatids are pulled to the same pole (in mitosis, sister chromatids were pulled apart) • telophase I – cell divides • meiosis I separates maternal and paternal chromosome pairs • meiosis II separates sister chromatids
meiosis II • metaphase II • sister pairs line up • anaphase II • sister pairs are pulled apart • telophase II • cell constricts and divides
meiosis • significance • four cells formed • diploid to haploid • randomness of chromosome separation • very large number of different chromosomal combinations • gamete can get either the maternal or the paternal chromosome of each pair: 2^23 (> 8 million) combinations • more combinations of alleles because of recombination • recombination – new arrangements of alleles due to either crossing over or independent segregation of homologous pairs • 30,000 genes: 2^30,023 combinations • each gamete receives only one chromosome of each homologous pair (rule of segregation) • anaphase I – direction of separation of each tetrad is independent of the other tetrads (rule of independent assortment)
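The counts on this slide can be checked with a few lines of Python. This is a sketch of the arithmetic only. It reads the slide's figures as 2^23 for chromosome assortment and 2^30,023 for the gene-level figure, and that reading of the flattened exponents is an assumption. The second number is far too large to print, so only its number of decimal digits is computed.

```python
import math

N_PAIRS = 23  # human chromosome pairs (22 autosomal pairs plus XX or XY)

# Each pair contributes either its maternal or its paternal chromosome,
# independently of the other pairs, so a gamete has 2^23 possibilities.
gamete_combinations = 2 ** N_PAIRS
print(gamete_combinations)  # 8388608

# Two independently formed gametes unite at fertilization.
zygote_combinations = gamete_combinations ** 2
print(zygote_combinations)  # 70368744177664

# Number of decimal digits in 2^30,023, computed with logarithms
# instead of building the full integer as a string.
EXPONENT = 30_023
digits = math.floor(EXPONENT * math.log10(2)) + 1
print(digits)  # 9038
```

Crossing over is not modeled here, so these figures count only whole-chromosome assortment and understate the true number of allele combinations.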
Molecular Genetics • Not covered • chemical structure of nucleotides and DNA • molecular details of DNA duplication • continuous replication, discontinuous, Okazaki fragments, etc.