1 / 58

Bioinformatics Fall 2017 Syllabus and Project Overview

Get introduced to the Bioinformatics course for Fall 2017, including syllabus details, course objectives, assignments, and grading policies. Explore the class project on Chloroplast Coiled-coil Proteins in Arabidopsis thaliana. Learn about coiled-coil motifs and computational predictions in bioinformatics. Start your knowledge base contributions and prepare for your class project effectively.

ctinsley
Download Presentation

Bioinformatics Fall 2017 Syllabus and Project Overview

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Welcome to Bioinformatics! Fall 2017

  2. Introductions Who am I? Who are you?

  3. 3 Fun Facts About Dr. Rose… 1) I’m a legal alien!

  4. 3 Fun Facts About Dr. Rose… 2) I don’t do mornings…

  5. 3 Fun Facts About Dr. Rose… 3) I was born to scribble illegible notes on blackboards

  6. Whitney Bond • Advisor:Dr. Karatan

  7. Garrett Bonds • Advisor: ?

  8. Kitt Franse • Advisor:Dr. Kinkel

  9. Katrina Godsey • Advisor: ?

  10. Katie Hahn • Advisor:Dr. Bouldin

  11. Young Jeon • Advisor: ?

  12. Sandipty Kayastha • Advisor:Dr. Rose

  13. Brewer Logan • Advisor:Dr. Seals

  14. Victor Nasr • Advisor:

  15. James Seward • Advisor:Dr. Mowa &Dr. Ahmed

  16. Megan Tennant • Advisor:Dr. Zerucha

  17. Adam Willits • Advisor: ?

  18. Erin Young • Advisor:Dr. Karatan

  19. Syllabus • AsULearn website • Contact information & Office Hours • Course Objectives • Textbook • Policies • Schedule • Assignments and Grading

  20. Homework etc. • First half of semester, one per week • Graded activities could be: • Take-home assignments on paper • Online activities and quizzes • Group work/quizzes in class • Midterm quiz (taken online in class) will be the "final exam" for the first half of the semester

  21. Knowledge Base • Each student should make the following contributions: • at least 5 tutorials for bioinformatics websites (e.g. programs, databases) or common tasks (e.g. finding genes in a DNA sequence) • at least 5 literature summaries relevant to bioinformatics (e.g. database publications, papers comparing different programs) • at least 2 feedback comments/suggestions on other people’s tutorials

  22. Knowledge Base • Deadlines: • start adding tutorials/bibliography entries this week, edit and make corrections as you become more familiar with the methods • complete your 10 entries by Fall Break • add feedback on other people’s tutorials any time during September and October • all contributions need to be complete by October 31 to be assessed for grading

  23. Class Project • Find something that interests you that can be investigated using bioinformatics • List of examples on AsULearn • Ideally tied to your masters project – talk to your advisor about ideas! • You should have a rough idea what you’re going to do by the end of September

  24. Chloroplast Coiled-coil Proteins in Arabidopsis thaliana

  25. Arabidopsis 2010 Project Arabidopsis thaliana – Mouse Ear Cress • Genome sequence completed in 2000 (Nature 408, 796-815) • 2010 project initiated by the National Science Foundation in 2001 • Goal: To determine the function of 25,000 genes in Arabidopsis thaliana by the year 2010 (Haha!) The Arabidopsis Information Resource http://www.arabidopsis.org/

  26. e g f f a g g d e c g d b f a f a g d g e c g d b f a (abcdefg)n Heptad repeat: a g c d a + d = hydrophobic residues e + g = charged residues f d e b a The Coiled-Coil Motif = two or more α-helices winding around each other in a supercoil Repeats = ideal for computational predictions!

  27. Database of known coiled-coil proteins (myosins, tropomyosins, intermediate filament proteins) amino acid frequency in the seven positions of the heptad repeat (e.g. occurrence of hydrophobic residues at positions a and d) Coiled-coil prediction using MultiCoil used to build a scoring table with “pairwise probabilities” 1.0 0.8 0.6 0.4 Coiled-coil Probability 0.2 coiled-coil scores computed for 30 residue-windows 0 AtMFP1 Computational Prediction of Coiled-Coil Structures MultiCoil – prediction of two-stranded and three-stranded coiled-coils Wolf et al.. 1997, Protein Sci. 6:1179-1189 (abcdefg)n Heptad repeat: cut-off

  28. 2 1 3 4 T.a. M.j. A.f. S.s. 5 6 7 M.g. M.t. B.s. 8 9 10 11 C.p. H.p. B.b. S.sp. 14 12 13 E.c. C.v. A.tu. 15 16 S.p. S.c. 17 18 19 20 D.m. C.e. M.m. H.s. 21 22 A.t. O.s. Percentages of Long Coiled-Coil Proteins Per Genome B C D E F A A = archaea S.s. 8 CC total 6 B = gram+ bacteria M.j. 4 % per genome 2 C = gram– bacteria B.s. 3 E.c. H.p. Long CC 2 S.sp. 1 D = yeast C.e. CC >=250 E = metazoa 0.3 0.2 S.p. 0.1 F = plants 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 H.s. A.t. Rose et al., 2005, BMC Evol Biol 5:66 images from Genome News Network (http://www.genomenewsnetwork.org/)

  29. Methods • Sequence data and pre-processing • Coiled-coil prediction and post-processing • Masking of coiled-coil domains • Sequence similarity comparison • Clustering analysis • Cluster alignments and phylogenetic tree generation

  30. How to get started • Review the literature, websites, and do some testing • Would it be a suitable/interesting subject for a bioinformatics class? • You should use at least three different publicly available methods in your project - can you find three? • Is it too big? Try to limit it in scope – this is a term project, not your masters thesis.

  31. Make a plan • Draft up a proposal for your project • Formulate your research question • Briefly explain its significance, including how it ties into your masters research (if it does) • List the websites/programs/steps you are going to use to answer your question • Estimate how much time it would take and propose a 2-3 week timeline for completion Proposals are due on October 10

  32. Write up your project • Work on your project! • October 17 – 31 • Turn in a paper drafton October 31: • include your rationale/objective for your project • include enough detail in the methods for another student to repeat your analysis and verify results • include any results you have produced • include interpretation of the results • it is ok for it to be a rough draft, but there should be enough material for another student to review

  33. Peer review paper drafts • Peer-review two projects • Test out the methods and verify results • Check for knowledge base tutorials for unusual methods • Provide feedback on the paper draft • Check guidelines & grading rubric • Peer reviews are due November 16

  34. Present your project • Prepare a 10-15 minute presentation on your project and pick a date • Nov. 28, Nov. 30 • Instructor & peer evaluations • Edit your paper based on reviews • Final papersare due December 5 (last day of class)

  35. Other Assignments • 2 short essays – due Oct. 19, Dec. 5 • Bioinformatics methods you are interested in • Case study presentations – November • Could be: special databases/methods not covered in class, research applications, bioinformatics in society/medicine/forensics, legislation related to bioinformatics etc. • Movie and Debates – Halloween, Nov. 16 • Fun stuff!

  36. What is Bioinformatics? Textbook Definition: Bioinformatics involves computer technology to store, retrieve, manipulate and distribute information related to biological macromolecules such as DNA, RNA, and proteins. = computational molecular biology

  37. Brief History of Bioinformatics • 1965 First protein sequence database (Dayhoff) • 1970 First sequence alignment algorithm (Needleman-Wunsch) • 1970s Protein Data Bank established • 1974 First protein structure prediction algorithm (Chou-Fasman) • 1980s GenBank established, FASTA, BLAST (Pearson, Altschul) • 1990 Start of the human genome project

  38. ACTGATCGATGACGTCAGCTAGTCGATCGAGCATGCTGACTGATCCGACTGTACGATATTAGCCGCTATCGACGCGATGCTACGATCGTACCGCGCTAGCTACATGACTCGATCGTACGATGATGCTAGCTAGCTCAGCGTGTGGCAGCATCGACGTCAGTCGACCGACATGTACGTGCATGCATGCTACGTCAGTCGATCGATCGATGCATCGATCAGTCGATGCTAGACTATCATCGTATATACTCGTACGTAGTGTGCTGACTAGACTAGTCGTACGGACAACTATATAATAACGTCGTGCATCGTAATATCGATCGTAGCTGGCGCTACGTAGATAGCTACTAGCTAGTCGTGCGTCGTACGCTACGTACGTAGTAGTGCAGCTACGTCGTCGACATACTTCATCGATGCTAGCATACGACTGCTGACGTACAGACTTTAATACGACTGCGCGATCCGAAAACGTACGTAACGAGATCGTAGCTGATCGATCCCCATCGATGCTAGTGTTCGATCGGGCGCGCATCGTAGCGAGCATGTAGCGATTACTGCTAGCTGCATGCATGCAATCGATCGTATGATCGATGACTGCTCCCCCACGCTAACATCAGCATGACCAGTACGACCACATCGGCAGATCAGCTCGTAGCTGATGCTATACTACGTCATCGTACACTACACTGATCCACTCGAATATCGACGTCCATGCGGCCCTCGTCTAGTTTACGTACGTAGCTGTAGTTAAACTAGCTGATCGTCGCTCTATGCTCTAGCACTGCTATCGATCGCATGACGTACGTCGATCGATCTACGTCGATTACGTGCATCGTATCATAGCTGCAGCAGCAATCTACGTAGCTAATTAACTGATCGATCGATCGTACCGTACAGTCGTAGATCGATCTGATGCTAGACTCGATGCTCAGTCGTCAGCAGATCTGATCAGTCGTAGTCGTCAGCATGCTGCATGCATCTGCTATCAATCACTGATCGATGACGTCAGCTAGTCGATCGAGCATGCTGACTGATCCGACTGTACGATATTAGCCGCTATCGACGCGATGCTACGATCGTACCGCGCTAGCTACATGACTCGATCGTACGATGATGCTAGCTAGCTCAGCGTGTGGCAGCATCGACGTCAGTCGACCGACATGTACGTGCATGCATGCTACGTCAGTCGATCGATCGATGCATCGATCAGTCGATGCTAGACTATCATCGTATATACTCGTACGTAGTGTGCTGACTAGACTAGTCGTACGGACAACTATATAATAACGTCGTGCATCGTAATATCGATCGTAGCTGGCGCTACGTAGATAGCTACTAGCTAGTCGTGCGTCGTACGCTACGTACGTAGTAGTGCAGCTACGTCGTCGACATACTTCATCGATGCTAGCATACGACTGCTGACGTACAGACTTTAATACGACTGCGCGATCCGAAAACGTACGTAACGAGATCGTAGCTGATCGATCCCCATCGATGCTAGTGTTCGATCGGGCGCGCATCGTAGCGAGCATGTAGCGATTACTGCTAGCTGCATGCATGCAATCGATCGTATGATCGATGACTGCTCCCCCACGCTAACATCAGCATGACCAGTACGACCACATCGGCAGATCAGCTCGTAGCTGATGCTATACTACGTCATCGTACACTACACTGATCCACTCGAATATCGACGTCCATGCGGCCCTCGTCTAGTTTACGTACGTAGCTGTAGTTAAACTAGCTGATCGTCGCTCTATGCTCTAGCACTGCTATCGATCGCATGACGTACGTCGATCGATCTACGTCGATTACGTGCATCGTATCATAGCTGCAGCAGCAATCTACGTAGCTAATTAACTGATCGATCGATCGTACCGTACAGTCGTAGATCGATCTGATGCTAGACTCGATGCTCAGTCGTCAGCAGATCTGATCAGTCGTAGTCGTCAGCATGCTGCATGCATCTGCTATCAATC ?

  39. Challenge #1 Where are the genes? When and where are they turned on? • Gene prediction • Regulatory element prediction • Expression profiling There are about 40,000 genes in the human genome.

  40. Challenge #2 What do the genes code for? What functions do the gene products have? • RNA – alternative splicing, editing etc. • Proteins – translation, post-translational modifications, activation/inactivation, stability • Proteomics analysis Alternative splicing turns 40,000 genes into 500,000 different messages. Protein modifications turn 500,000 messages into millions of different proteins.

  41. Challenge #3 How do these puzzle pieces interact? • Subcellular localizations • Interaction partners • Metabolic pathways • Regulatory loops and networks Protein function is context-sensitive! Millions of proteins create hundreds of millions of metabolic pathways and billions of different phenotypes through interactions.

  42. Goal of Bioinformatics Better understanding of cellular functions on a molecular level by: • Decoding genetic information • Predicting structure from sequence • Predicting function based on sequence and structure • Prediction of phenotype from genotype

  43. Gene prediction, code translation Protein sequence letters Structure/motif prediction, homologs syllables Secondary structure and motifs Targeting signals, conserved domains Functional features and activities words Protein expression (timing, location, amount) Interactions, modifications, networks sentences Functional significance for the cell/organism A story! From Sequence to Function DNA sequence code

  44. Bioinformatics Approach to Gene Function Discovery • the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms • Phylogenetics Database mining and sequence alignments:

  45. Things to learn in this course: • How to compare DNA and protein sequences • How to determine homology between sequences • Choosing the right sequence for the task • Choosing the right program for the task • How to interpret and evaluate sequence “hits” in database searches • How to find conserved domains in proteins • How to build phylogenetic trees and how to evaluate them • How to find promoters/regulatory sequences • How to visualize and predict protein structures • …

  46. Homework for Thursday • Familiarize yourself with the NCBI website • How do you navigate the NCBI resources? • How do you use PubMed to find literature? • How do you find DNA or protein sequences? • Thursday: • Sequence and Data Retrieval Exercises

  47. Bioinformatics Resources • The National Center for Biotechnology Information (NCBI): GenBank, BLAST and many more • The European Bioinformatics Institute (EBI): SWISS-PROT • The Protein Data Bank (PDB)

  48. Data Retrieval Most popular retrieval system: Entrez (www.ncbi.nlm.nih.gov/gquery): - developed and maintained by the NCBI

More Related