
Introductory Biological Sequence Analysis Through Spreadsheets

Stephen J. Merrill and Sandra E. Merrill, Marquette University, Milwaukee, WI





Presentation Transcript


  1. Introductory Biological Sequence Analysis Through Spreadsheets. Stephen J. Merrill and Sandra E. Merrill, Marquette University, Milwaukee, WI. ICTCM 2000

  2. Teaching Mathematics to Students of Biology • Need to make the math in the courses correlate with the math needed in that discipline • The most important “math” needed is statistics • The molecular biology revolution presents data in a form (sequences of letters) on which calculus has little impact

  3. The Nature of Biological Sequence Data • The primary structures of DNA, RNA, and proteins are sequences of letters: 4 letters for DNA (ATGC) and RNA (AUGC), and 20 letters, representing the amino acids, for proteins • Secondary and tertiary structures (bending, folding, and twisting) determine function; hints of these structures can be seen in the primary structure

  4. Use of Spreadsheets in This Setting • Commonly found and used in biology labs for data acquisition, storage, organization, and analysis • Commonly present on student computers and in computer labs • Unlike calculators, able to handle data sets typical of real-world applications • R.F. Murphy at CMU has developed a set of worksheets for sequence analysis

  5. Meaningful Questions & Problems 1. Measuring the similarity between two strings (“alignment” or “homology”) 2. Finding instances of a pattern in a string 3. Describing the composition and properties of a string 4. Graphing the evolutionary process and constructing phylogenetic trees

  6. Measuring the Similarity between Strings • Given a gene, suggest the function of the protein it codes for by finding a similar sequence (possibly in another species) • Simple homology assigns a “1” for agreement and a “0” for disagreement at each site, then sums over all sites • Homology is this sum expressed as a percentage of the highest possible score
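
The scoring rule on slide 6 can be sketched in a few lines. This is a minimal illustration, not the authors' worksheet: it assumes the two sequences are already aligned, have equal length, and contain no gaps. In a spreadsheet the per-site score would typically be a helper column such as =IF(A2=B2,1,0); the function name `simple_homology` and the example sequences are hypothetical.

```python
def simple_homology(seq1: str, seq2: str) -> float:
    """Percent homology under the simple 1/0 scoring rule.

    Assumes pre-aligned, equal-length sequences with no gaps (an assumption;
    the slides do not discuss alignment with gaps).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length (no gaps assumed)")
    if not seq1:
        raise ValueError("sequences must be non-empty")
    # Score 1 for agreement and 0 for disagreement at each site,
    # like a spreadsheet column of =IF(A2=B2,1,0)
    scores = [1 if a == b else 0 for a, b in zip(seq1.upper(), seq2.upper())]
    # The highest possible score is one point per site
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical sequences: 6 of 8 sites agree
    print(simple_homology("ATGCATGC", "ATGGATCC"))  # 75.0
```

Because every site carries equal weight, the result depends only on the count of matching positions, which is what makes the measure easy to reproduce with a single SUM over the helper column.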

  7. Spreadsheet #1: Simple Homology

  8. Spreadsheet #1 (cont.): Comparing Random Sequences
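
Comparing random sequences gives a baseline for judging whether an observed homology is meaningful. The sketch below assumes each site is drawn independently and uniformly from A, C, G, and T; under that assumption two sites agree with probability 1/4, so unrelated sequences should average about 25% homology. The helper names and the default length, trial count, and seed are hypothetical choices, not values from the slides.

```python
import random

ALPHABET = "ACGT"


def random_dna(length: int, rng: random.Random) -> str:
    # Each site drawn independently and uniformly (a modeling assumption)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def percent_homology(seq1: str, seq2: str) -> float:
    # Simple 1/0 scoring summed over sites, as a percentage of the maximum
    matches = sum(1 for a, b in zip(seq1, seq2) if a == b)
    return 100.0 * matches / len(seq1)


def average_random_homology(length: int = 100, trials: int = 1000,
                            seed: int = 2000) -> float:
    """Mean percent homology over many pairs of unrelated random sequences."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    total = 0.0
    for _ in range(trials):
        total += percent_homology(random_dna(length, rng), random_dna(length, rng))
    return total / trials


if __name__ == "__main__":
    avg = average_random_homology()
    print(f"mean homology of random pairs: {avg:.1f}% (theory: 25%)")
```

Shortening `length` makes individual pairs scatter more widely around 25%, which is a useful point to explore: a high score between short sequences is much weaker evidence of relatedness than the same score between long ones.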

  9. Finding Instances of a Particular Pattern in a String • Locating genes involves finding regions of DNA sequence that contain patterns resembling those of known genes • Identifying sites on DNA where a restriction enzyme can cleave it; the sizes of the resulting fragments are also of interest • Identifying regions of RNA that correspond to particular features (e.g., loops), which may be splice sites
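
The restriction-site task on slide 9 amounts to finding every occurrence of a short pattern and measuring the pieces between cuts. As an illustration, the sketch uses the EcoRI recognition site GAATTC, which is cut between the G and the first A (G^AATTC). It assumes a linear molecule and searches one strand only, which suffices here because GAATTC reads the same on the complementary strand; a circular plasmid or a non-palindromic site would need extra handling. The function names and example sequence are hypothetical.

```python
from typing import List


def find_sites(sequence: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of `site` (overlaps allowed)."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)  # step by 1 to catch overlaps
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting a linear DNA sequence at every site.

    cut_offset is the number of bases of the site that stay on the left
    fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "ATGAATTCGGCCGAATTCTA"  # hypothetical 20-base sequence
    print(find_sites(dna, "GAATTC"))           # [2, 12]
    print(fragment_lengths(dna, "GAATTC", 1))  # [3, 10, 7]
```

The fragment lengths always sum to the length of the original sequence, a quick check that is also easy to build into a spreadsheet.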

  10. Describing the Composition and Properties of a String • Counting the frequencies of particular letters because of their properties (e.g., regions rich in G and C, or in A and T, in DNA) • Properties of proteins (e.g., charge or hydrophobicity), which depend on the nature and frequencies of the amino acids present
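
Letter counts and G+C-rich regions can be computed with a count over the whole string plus a sliding window. The sketch below slides a fixed-size window one base at a time and reports the fraction of G and C in each window; the window size is a free parameter (an assumption, since the slides do not specify one), and the function names and example are hypothetical.

```python
from collections import Counter
from typing import List


def composition(seq: str) -> Counter:
    """Count how often each letter occurs in the sequence."""
    return Counter(seq.upper())


def gc_fraction(seq: str) -> float:
    """Fraction of positions that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int) -> List[float]:
    """G+C fraction of each run of `window` consecutive bases (step of 1)."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    dna = "ATATGCGCAT"  # hypothetical: A/T-rich start, G/C-rich middle
    print(composition(dna))
    print(gc_profile(dna, 4))  # [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5]
```

Plotting the profile against window position shows G/C-rich and A/T-rich stretches as peaks and valleys; a larger window smooths the curve at the cost of blurring short features.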

  11. Spreadsheet #2: Hydropathy Plot

  12. Spreadsheet #2 (cont.)
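
A hydropathy plot assigns each amino acid a hydrophobicity value and averages those values over a sliding window. The slides do not say which scale or window size Spreadsheet #2 uses, so this sketch assumes the widely used Kyte-Doolittle scale and a default window of 9 residues; both choices, the function name, and the example peptide are hypothetical. Letters outside the 20 standard amino acids raise a `KeyError`.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (assumed scale; the slides do not name one)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> List[Tuple[int, float]]:
    """Average hydropathy over each window of `window` consecutive residues.

    Returns (center, score) pairs, where center is the 1-based position of the
    window's middle residue (exact for odd window sizes).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the protein length")
    profile = []
    for i in range(len(values) - window + 1):
        center = i + (window + 1) // 2
        profile.append((center, sum(values[i:i + window]) / window))
    return profile


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch flanked by charged residues
    peptide = "KKDERILVALLIVAGKKEDR"
    for center, score in hydropathy_profile(peptide):
        print(f"{center:3d} {score:6.2f}")
```

Charting the scores against the center positions reproduces the shape of a hydropathy plot: sustained positive stretches suggest hydrophobic segments, such as candidate membrane-spanning regions.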

  13. Graphing Evolution and Phylogenetic Trees • The evolutionary distance between two DNA sequences is used to trace how the sequences changed over time (e.g., the evolution of HIV or influenza viruses) • Trees are constructed to express the relationships among related sequences; distance in the tree is a monotone function of homology
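
One way to turn homology into a distance is to use the fraction of differing sites, p = 1 - (homology / 100), either directly or through a correction for repeated changes at the same site. The sketch below uses the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), a standard choice that is swapped in here for illustration; the slides only require that distance be a monotone function of homology. It assumes aligned, equal-length sequences; the function names and example sequences are hypothetical.

```python
import math
from typing import Dict, Tuple


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of sites that differ (that is, 1 - homology/100)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and the same length")
    mismatches = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return mismatches / len(seq1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; increases monotonically with p."""
    if p >= 0.75:
        return math.inf  # saturated: no more similar than random sequences
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Corrected distance for every ordered pair of named sequences."""
    return {(a, b): jukes_cantor(p_distance(seqs[a], seqs[b]))
            for a in seqs for b in seqs}


if __name__ == "__main__":
    seqs = {  # hypothetical aligned sequences
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTTC",
        "s3": "ACGAACTTTC",
    }
    for (a, b), d in distance_matrix(seqs).items():
        if a < b:
            print(f"{a}-{b}: p = {p_distance(seqs[a], seqs[b]):.1f}, d = {d:.3f}")
```

The corrected distance is always at least p and grows faster as p approaches 0.75, which reflects that multiple mutations at one site are invisible in a simple site-by-site comparison. A matrix like this is the usual input to tree-building methods.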

  14. Spreadsheet #3: Mutation and Evolution

  15. Spreadsheet #3 (cont.): To study the evolution of a sequence, we randomly pick a site for mutation and then change its letter
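
The mutation step on slide 15 can be simulated by repeatedly choosing a random site and replacing its letter, while tracking the homology between the evolving sequence and its ancestor. This sketch assumes the new letter is always different from the old one and is chosen uniformly from the other three bases; the spreadsheet may handle this differently. The function names, seed, and ancestor sequence are hypothetical.

```python
import random
from typing import List, Optional, Tuple

ALPHABET = "ACGT"


def mutate(seq: str, rng: random.Random) -> str:
    """Pick one site at random and change its letter to a different base."""
    site = rng.randrange(len(seq))
    new_letter = rng.choice([b for b in ALPHABET if b != seq[site]])
    return seq[:site] + new_letter + seq[site + 1:]


def percent_homology(a: str, b: str) -> float:
    # Simple 1/0 scoring summed over sites, as a percentage of the maximum
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def evolve(ancestor: str, steps: int,
           seed: Optional[int] = None) -> Tuple[str, List[float]]:
    """Apply `steps` single-site mutations; record homology to the ancestor."""
    rng = random.Random(seed)
    seq = ancestor
    history = [100.0]
    for _ in range(steps):
        seq = mutate(seq, rng)
        history.append(percent_homology(ancestor, seq))
    return seq, history


if __name__ == "__main__":
    final, history = evolve("ACGT" * 25, steps=400, seed=1)
    for step in range(0, 401, 50):
        print(step, history[step])
```

Because the same site can be hit more than once, homology falls quickly at first and then levels off; under this model it should drift down toward roughly 25%, the level seen when comparing unrelated random sequences.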

  16. Conclusion • Spreadsheets make possible an experimental approach to introducing the mathematics of sequence analysis • Spreadsheets allow the use of real-world data and present the computational tool in a meaningful context • The importance of these topics to all educated individuals suggests that they be included in many liberal arts mathematics courses
