
Challenges for computer science as a part of Systems Biology

Challenges for computer science as a part of Systems Biology. Benno Schwikowski, Institute for Systems Biology, Seattle, WA.


Presentation Transcript


  1. Challenges for computer science as a part of Systems Biology. Benno Schwikowski, Institute for Systems Biology, Seattle, WA

  2. Towards integrative models
     Species, Conditions/time, Genes
     • DNA: Sequence, Genomic locus, Domain content, Intron/exon structure, Regulatory motifs, Chemical modifications, SNPs, Splice variants, Accessibility, Variation
     • mRNA: Abundance, Regulatory information, Initiation/termination signals
     • Protein interaction: Interaction partner, Direct/indirect, Affinity, Effect
     • Protein: Abundance, State, Localization, 3D structure, Functional characterization, Half-life, Active sites, Biochemical function, Cellular role

  3. Challenge: Integrative models
     • …Across genes and proteins: Many genes involved (e.g., multifactorial diseases)
     • …Across model systems: Lack of experimental platforms in target system
     • …Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation)
     • …Across experiments: Robustness against errors in mass spectrometry, mRNA measurements
     • …Across timescales

  4. Challenge: Capturing evolutionary constraints
     DNA, RNA, Proteins, Modules, Organelles, Cells, Organs, Individuals, Populations, Ecologies
     "Nothing in biology makes sense except in the light of evolution." Theodosius Dobzhansky

  5. Challenge: Which tools and experiments to use

  6. Challenge: Choosing experiments
     • Machine Learning: Determine the most likely classification/parameterization on the basis of a randomly sampled dataset.
     • Active Learning: Allow an algorithm to query selected data points, using the results of previous queries.
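
The contrast between learning from a random sample and active learning can be illustrated with a toy problem that is not from the talk: estimating an unknown threshold on the interval [0, 1] from yes/no "experiments". The function names, the oracle, and the threshold value below are all hypothetical; this is a minimal sketch under those assumptions, not a model of any real experimental design.

```python
import random


def passive_learn(oracle, n_queries, rng):
    """Estimate a threshold from randomly sampled query points."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()          # the query point is chosen at random
        if oracle(x):
            hi = min(hi, x)       # threshold is at or below x
        else:
            lo = max(lo, x)       # threshold is above x
    return (lo + hi) / 2


def active_learn(oracle, n_queries):
    """Estimate a threshold by choosing each query from earlier answers."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = (lo + hi) / 2         # query the midpoint of the remaining interval
        if oracle(x):
            hi = x
        else:
            lo = x
    return (lo + hi) / 2


# Hypothetical "experiment": reports whether x is at or above a hidden threshold.
THETA = 0.3141


def oracle(x):
    return x >= THETA


passive_estimate = passive_learn(oracle, 20, random.Random(0))
active_estimate = active_learn(oracle, 20)
print("passive error:", abs(passive_estimate - THETA))
print("active error: ", abs(active_estimate - THETA))
```

With the same budget of 20 queries, the active learner halves the remaining interval at every step, so its error is bounded by 2^-21, whereas the random sampler only narrows the interval when a sample happens to land near the threshold.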

  7. Challenge: Relations between system variables can be quite complex (Yuh, Bolouri, Davidson, Science, 1998)

  8. Challenge: Relations between system variables can be quite complex (Yuh, Bolouri, Davidson, Science, 1998)

  9. Challenge: Develop models that allow extremely efficient algorithms
     AGTCGTACGTGAC...
     AGTAGACGTGCCG...
     ACGTGAGATACGT...
     GAACGGAGTACGT...
     TCGTGACGGTGAT...

  10. CLUSTALW (1.74) multiple sequence alignment

     Cotton     ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
     Pea        GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
     Tobacco    TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
     Ice-plant  TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
     Turnip     ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
     Wheat      TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
     Duckweed   TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
     Larch      TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

     Cotton     CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
     Pea        C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
     Tobacco    AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
     Ice-plant  ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
     Turnip     CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
     Wheat      GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
     Duckweed   ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
     Larch      TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

     Cotton     ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
     Pea        GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
     Tobacco    GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
     Ice-plant  GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
     Turnip     CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
     Wheat      CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
     Duckweed   TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
     Larch      CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

     Cotton     T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
     Pea        TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
     Tobacco    CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
     Ice-plant  TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
     Larch      TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
     Turnip     TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
     Wheat      GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
     Duckweed   CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG

  11. Challenge: Developing models that allow extremely efficient algorithms
     AGTCGTACGTGAC...
     AGTAGACGTGCCG...
     ACGTGAGATACGT...
     GAACGGAGTACGT...
     TCGTGACGGTGAT...
     ACGT ACGT ACGT ACGG
     Parsimony score: 1
     J. Comp Biol. 2002

  12. An Exact Algorithm (generalizing Sankoff and Rousseau 1975)
     $W_u[s] = \sum_{v:\ \text{child of } u} \min_t \left( W_v[t] + d(s, t) \right)$
     $W_u[s]$ = best parsimony score for the subtree rooted at node $u$, if $u$ is labeled with string $s$. Each table has $4^k$ entries.
     Leaf sequences: AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC, TCGTGACGGTG
     Table entries shown in the slide figure: leaf tables "… ACGG: +∞, ACGT: 0 …" and "… ACGG: 0, ACGT: +∞ …"; internal-node tables "… ACGG: 1, ACGT: 0 …", "… ACGG: 2, ACGT: 1 …", "… ACGG: 1, ACGT: 1 …", "… ACGG: 0, ACGT: 2 …"
     J. Comp Biol. 2002
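
The table-filling scheme on this slide (for each node u and candidate string s, add up over the children of u the cheapest combination of a child label t and the distance d(s, t)) can be sketched in a few lines of Python. This is an illustrative sketch, not the published implementation. It assumes that d is the Hamming distance, that a leaf scores 0 for every k-mer occurring in its sequence and +∞ otherwise, and that the tree is given as nested tuples. The function names and the tree topology are hypothetical, and the leaf strings are only the visible prefixes of the five example sequences shown in the slides.

```python
import math
from itertools import product


def hamming(s, t):
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(s, t))


def all_kmers(k, alphabet="ACGT"):
    """All 4^k strings of length k over the DNA alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def leaf_table(seq, k):
    """W[s] = 0 if s occurs in the leaf sequence, +infinity otherwise."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {s: (0 if s in present else math.inf) for s in all_kmers(k)}


def parsimony_table(tree, k):
    """Bottom-up table W_u for a tree given as nested tuples of leaf strings."""
    if isinstance(tree, str):
        return leaf_table(tree, k)
    child_tables = [parsimony_table(child, k) for child in tree]
    kmers = all_kmers(k)
    table = {}
    for s in kmers:
        # Sum over children of the cheapest child label t plus d(s, t).
        table[s] = sum(
            min(w_v[t] + hamming(s, t) for t in kmers) for w_v in child_tables
        )
    return table


# Hypothetical topology over the visible prefixes of the example sequences.
tree = (
    ("AGTCGTACGTGAC", "AGTAGACGTGCCG"),
    ("ACGTGAGATACGT", ("GAACGGAGTACGT", "TCGTGACGGTGAT")),
)
root = parsimony_table(tree, 4)
best_score = min(root.values())
print("best parsimony score:", best_score)
print("score with root labeled ACGT:", root["ACGT"])
```

Each internal node compares every candidate string s with every child label t, so the work per node grows with 4^k · 4^k; this sketch recomputes all distances naively and makes no attempt at the efficiency the talk is concerned with.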

  13. What are good challenges to tackle?
     • Biological/medical questions asked
     • Experimental technologies to acquire a lot of relevant data
     • Available datasets with a formalized notion of “data quality”

  14. Memory complexity: $O(k \cdot 4^{2k})$ per node. Time complexity: total time $O(nk(4^{2k} + l))$, where $n$ is the number of species, $k$ the motif length, and $l$ the average sequence length. J. Comp Biol. 2002

  15. Technology-based challenges: Universal DNA Tag Systems
     • Existing applications in high-throughput technologies
     • Universal DNA arrays
     • Padlock probes
     • LYNX mRNA technology

  16. Formalization
     Define: weight(A/T) = 1, weight(C/G) = 2
     weight(AACTTG) = 1+1+2+1+1+2 = 8
     melting temperature(AACTTG) = 2·weight(AACTTG)
     l-u code problem: Given two integers l < u, find the largest set of tags such that
     • Each tag has weight u
     • Each string of weight ≥ l occurs at most once
     J. Comp Biol. 2000 & 2003
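
The weight definition and the two tag-set conditions can be checked mechanically. The sketch below is an illustration, not the authors' code. It assumes A and T weigh 1, C and G weigh 2, the melting temperature is twice the weight, every tag must have weight exactly u, and every substring of weight at least l may occur at most once across the whole tag set. The comparison symbols did not survive in the transcript, so "exactly u" and "at least l" are readings rather than confirmed definitions. The names `weight`, `melting_temperature`, and `is_valid_code` are hypothetical.

```python
from collections import Counter

# Assumed base weights: A/T = 1, C/G = 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s):
    """Sum of the base weights of a DNA string."""
    return sum(WEIGHT[base] for base in s)


def melting_temperature(s):
    """Melting temperature modeled as twice the weight."""
    return 2 * weight(s)


def is_valid_code(tags, l, u):
    """Check the two tag-set conditions under the assumptions stated above."""
    # Condition 1 (assumed reading): every tag has weight exactly u.
    if any(weight(tag) != u for tag in tags):
        return False
    # Condition 2 (assumed reading): each substring of weight >= l
    # occurs at most once, counted over all positions in all tags.
    counts = Counter()
    for tag in tags:
        for i in range(len(tag)):
            for j in range(i + 1, len(tag) + 1):
                sub = tag[i:j]
                if weight(sub) >= l:
                    counts[sub] += 1
    return all(c <= 1 for c in counts.values())


print(weight("AACTTG"))               # 1+1+2+1+1+2
print(melting_temperature("AACTTG"))  # 2 * weight
print(is_valid_code(["AACTTG", "CAACTT"], l=7, u=8))
print(is_valid_code(["AACTTG", "CAACTT"], l=3, u=8))
```

In the usage lines, the two tags share the substring AACTT (weight 6), so the pair passes the check when only substrings of weight 7 or more must be unique and fails once the bound is lowered to 3.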

  17. Challenge: Visualization (Andrea Weston et al. @ ISB & Cytoscape)

  18. Challenge: Visualization (Cytoscape, pre-release 2.0)

  19. A computer scientist’s perspective
     “Biology is so digital, and incredibly complicated […] I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.” Donald Knuth, 7 Dec 1993
