Bioinformatics, Autumn School of Computational Chemistry, Prague 2006. Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic (Ústav organické chemie a biochemie AV ČR). Bioinformatics is informatics applied to biological molecules (data). Bioinformatics extracts a molecular information system for molecular biology.
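Bioinformatics
Autumn School of Computational Chemistry, Prague 2006
Jiří Vondrášek
Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic (Ústav organické chemie a biochemie AV ČR)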
Bioinformatics
Informatics applied to biological molecules (data).
Bioinformatics extracts a molecular information system for molecular biology.
Bioinformatics is molecular biology conceptualized in the physico-chemical sense, to which informatics (derived from mathematical informatics and statistics) is applied.
Applications: theory, biotechnology, pharmacy, medicine, genetic engineering
Bioinformatics
sequences, contigs, genes, structure, function, metabolism (everything)
structured data (databases), hypotheses
experimental data
computational analysis
Genome sizes

Organism                    Chromosomes   Genome size
Mycoplasma genitalium       -             0.58 Mbp
Escherichia coli            -             4.6 Mbp
Saccharomyces cerevisiae    16            11.2 Mbp
Arabidopsis thaliana        5             115.4 Mbp
Drosophila melanogaster     5             ~137.0 Mbp
Homo sapiens                24            ~3.3 Gbp
Central dogma of molecular genetics
DNA -> DNA: replication
DNA -> RNA: transcription
RNA -> protein: translation
RNA -> DNA: reverse transcription
information -> function
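The transcription and translation steps of the central dogma can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a coding-strand DNA input read in frame 0; the function names (`transcribe`, `reverse_transcribe`, `translate`) are illustrative and do not come from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids are listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """mRNA -> coding-strand DNA (U is replaced by T)."""
    return rna.upper().replace("U", "T")


def translate(rna: str) -> str:
    """Translate mRNA in frame 0 until the first stop codon or the end.

    Assumes only A, C, G, U; any other symbol raises KeyError.
    """
    dna = reverse_transcribe(rna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGCTGTAA")
    print(mrna, translate(mrna))
```

Building the codon table from a 64-letter string keeps the sketch short; a real analysis would normally rely on an established library rather than a hand-written table.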
DNA
evolutionary relationships between genes and organisms
function
genes
structure
proteins
sekvence >jana (4797 nt) GAATTCGCCGCGGGGCTGCGCATCACCGATGCCGCCACCATCGAGATCGTCGAGATGGTACTGGCCGGCTCGATCAACAAGCAGCTCGTCGGCTACATCA ACGAAGCGGGCGGCAAGGCCGTCGGCCTGTGCGGCAAGGACGGCAACATGGTGTCCGCCACCAAGGCGACGCGCACCATGGTCGATCCGGATTCGCGGAT CGAAGAGGTGATCGACCTCGGTTTCGTCGGCGAGCCGGAGAAGGTCGACCTCACCCTGCTCAACCAGCTGATCGGCCACGAGTTGATCCCGGTGCTGGCG CCGCTGGCGACCTCCGCGTCGGGCCAGACCTTCAACGTCAATGCCGACACCTTTGCAGGTGCGGTTGCCGGTGCGCTGCGGGCCAAGCGCCTGCTGCTGC TGACCGACGTGCCGGGCGTGCTCGACCAGAACAAGAAGCTGATCCCCGAACTGTCGATCAAGGATGCCCGCAAGCTGATCGCAGACGGCACCATCTCGGG CGGCATGATCCCCAAGGTCGAGACCTGCATCTACGCGCTCGAACAGGGCGTCGAAGGCGTCGTCATCCTCGACGGCAAGGTCCCGCACGCAGTGCTGCTC GAATTGTTCACCAACCAGGGCACCGGCACGCTGATCCACAAGTGATGCGAGGCTGCGGCGACAACATCCGTCATGGCCGGGCTCGTCCCGGCCATCCACG TCTTTCCGGCGGTTTTCTCAGCAAGACGTGGATGCCCGGCACAAGGCCGGGCATGACGGGGTGGAGATCGCGCGCCCTCGCCGCCATTGTCACCACCCTC GCCCTCACCTCCGCCGCCCACGCCGACCTCAAGCTCTGCAACCGCATGAGCTACGTGGTCGAGACGGCGATCGGGGTCGATTCCAACGGCACCACCGCCT CGCGCGGATGGCTGCGGATTGATCCGGCGCAATGCCGGGTCGTGGTGCAAGGCGCGCTCAACGCCGACCGCATCATGCTGAATGCCCGCGCGCTGGCGGT GTACGGCGTCTCGCCGCTGCCGCAGAACGGCACTGACCGGCTGTGCATTGCCGAAGACAATTTCGTCATCGCCGCCGCGCGGCAATGCCGCGGCGGCCAA ACGCTCGCCGCCTTCACCGAGATCAAGCCCACCGACACCGAGGACGGCAACAAGATCGCTTATCTGGCGGAAGACTCCGGCTACGACGACGAACAGGCCA AACTCGCCGCGATCCAGCGGCTGCTGGTGATCGCCGGTTACGACGCCTCGCCGATCGACGGCGTCGACGGCCCGAAGACGCAGGCCGCGCTGTCCGCCTT CCTCAAGAGCCGAGGCCTGAAGCCCGAGATCGTCGATGCGCCGGATTTCTTCGACGTGATGATCAAGGCAGTGCAGCAGCCGTCCGGCAGCGGGCTGACC TGGTGCAACGACACCAAGTACAAGATCATGGCGGCCGTCGGCGAAGACGACGGCAAGACTGTCACCAGCCGCGGCTGGTACGGTGTTGCGCCCGGCCAAT GCCTGCGCCCCGACCTCGGCGCACAGCCGAAGCGGGTGTTCAGCTTCGCCGAAGCGGTCGACGGCAGCGGCAGGCCGGTGACCATCAAGGGCCGTGCGCT GAACTGGGGCGGCGGCGTGACGCTGTGCACGCGTGACAGCAAGTTCGAGATCGGCGAGCAAGGCGATTGCGCGGCGCGCGGCCTCGCCGCCACCGGCTTC GCCGCCGTCGATCTCAGTAGCGGCAAGACATTGAGGTTGTCCGCCCCATGATGCAGCTCGGCAAACGCGGCTTCGATCACGTCGAGACCTGGGTGTTCGA TCTCGACAACACGCTGTACCCGCATCACCTCAACCTATGGCAGCAGGTCGATGCGCGGATCCGCGACTTCGTCGCCGACTGGCTGAAGGTTTCGCCGGAA 
GAAGCCTTCCGTATCCAGAAGGATTACTACAAGCGCTACGGCACCACGATGCGCGGGATGATGACCGAGCACGGCGTTCACGCCGACGACTACCTGGCTT ATGTCCACGCCATCGACCATTCGCCGCTGCAGCCGAATCCGGCGATGGGCGATGCGATCGAGCGACTGCCGGGCCGCAAGCTGATCCTGACCAACGGCTC GACCGCCCATGCGGGCAAGGTGCTGGAGCGGCTCGGCATCGGCCATCATTTCGAGGCGGTGTTCGACATCATTGCGGCCGACCTCGAGCCGAAGCCGGCG CCGCAGACCTACCGCCGTTTTCTCGATCGCCATGGTGTCGACCCGGCCCGCGCCGCGATGTTCGAAGACCTCGCCCGCAACCTCACCGTGCCGCACCAGC TCGGCATGACCACCGTGCTGGTGGTGCCTGACGATAGCCAGGACGTGGTCCGCGAAGATTGGGAGCTTGAAGGCCGCGACGCCGCCCACGTCGATCACGT GACTGATGATTTGACAGGGTTCTTGGGGAAGCTGAGTTCGCTGTAGGCCGGGGACGCCTCCCAAGCGTCAATCGTCATCGCCGCCGGATGCAAGGCGGCT AGGTATTGCGGAGCGCTCGCGATCTTCCGTCCAATGCCCTGGGATACTGGATCGCCCGGACGAGCCGGGCGACGACGTTGAAGAGAGATGACGTGGCGTC ACCACATCCCCCGCCGTCATCGCCCGCGCAGGCGGGCGATGACTTGGCGGACGGGGCGGCGCCTTGACTCCGACCCGGCGAATCCGGACAACACTCCGCA AAACTCTCCCTGAAATCAGCCTCCCAAGGACCCGTCGATGCCGCTCACCGCCCTGGAATCTACCATCAACGCCGCTTTCGACGCGCGCGACACCGTTACC GCGGCGACGCAGGGCGAGATTCGTCAGGCCGTCGAGGATGCGCTCGATCTGCTCGACCAGGGCAAGGTGCGGGTGGCGCGGCGCGACGACTCCGGCGCCT GGACGGTCAATCAGTGGCTGAAGAAAGCAGTGCTGCTGTCGTTCCGGCTCAACGACATGGGCGTGATCGCCGGCGGCCCGGGCGGCGCCAACTGGTGGGA CAAGGTGCCGTCGAAGTTCGAGGGCTGGGGTGAGAACCGCTTCCGCGAGGCCGGCTTCCGCGCCGTGCCGGGCCGATCGTCGCGCGTCGGCCTTTATCGC CAAGACGCGGTACTGATCCGTCCTTCGTCAATCTCGGCGCTTACGTCGATGAAAGCACCATGGTCGAACACCTGGGCGACCGTCGGCTCCTGCGCCCAGA TCGGCAAGCGCGTGCACATCTCCGGCGGTGCCGGCATCGGCGGCGTGCTCGAGCCGCTGCAGGCCGGCCCGGTGATCATCGAGGACGACTGCTTCATCGG CGCCCGCTCCGAAGTCGCCGAAGGCGTGATCGTGCGCAAGGGTGCGGTGCTGGCGATGGGCGTTTTCCTCGGCGCCTCGACCAAGATCGTCGACCGCGAG ACCGGCGAAATCTTCGTCGGCGAAGTGCCGGAATATGCCGTGCTGGTGCCCGGCACCCTGCCCGGCAAGCCGATGAAGAACGGCGCCCCCGGCCCAGCCA CCGCCTGCGCGGTGATCGTCAAGCGCGTCGACGAGCGCACCCGTTCCAAGACCTCGATCAACGAATTGCTGCGGGACTGACACCTGTAGGAGGCGCGAAT GGACTGGACCACGCTGTTCTTCAGCTTTCGAGGTCGGATCAATCGCGCCAAATACTGGCTGGTCGGACTGATCTACGTCGCCGCCTGGATGG ….
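A sequence in this FASTA-like form (a `>` header line followed by nucleotide lines) can be read and summarized with a few lines of Python. The sketch below is a minimal illustration, not a full FASTA parser; the function names are hypothetical, and the example input uses only the first 40 nucleotides of the sequence shown on the slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Whitespace inside sequence lines is dropped and letters are upper-cased.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(seq):
    """Fraction of G and C among all symbols of the sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Fragment only: the first 40 nt of the slide's 4797-nt sequence.
    example = ">jana (4797 nt)\nGAATTCGCCGCGGGGCTGCG CATCACCGATGCCGCCACCA\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq), round(gc_content(seq), 3))
```

Length and GC content are only the simplest descriptors; they are often the first sanity check before gene finding or database searches.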
General analysis
What can be found in DNA?
structural and organizational elements
evolutionary relationships
genes
promoters and other regulatory elements
"foreign" DNA
Genes
How can genes be found?
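One elementary approach to this question is to scan for open reading frames (ORFs): stretches that begin with a start codon and run in frame to a stop codon. The slides do not state which method they have in mind, so the sketch below is only an assumed illustration. It treats ATG as the only start codon, uses the standard stop codons, and the `min_codons` threshold is an arbitrary parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string (only A, C, G, T are complemented)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=30):
    """Return (strand, start, end, sequence) for ORFs in all six frames.

    An ORF starts at the first ATG after the previous stop and ends at the
    next in-frame stop codon (included in the returned sequence).
    min_codons counts codons from ATG up to, but not including, the stop.
    Coordinates on the "-" strand refer to the reverse-complemented string.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, seq[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC", min_codons=3):
        print(orf)
```

Real gene finders add statistical signals such as codon usage and promoter motifs, because long ORFs also occur by chance, especially in GC-rich genomes.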
Genes
Leucine codon usage

Codon   Rhodobacter capsulatus   Rhodobacter capsulatus   Escherichia coli
        count                    %                        %
CUA     3                        <1                       4
CUC     119                      16                       9
CUG     458                      60                       52
CUU     157                      20                       10
UUA     0                        0                        11
UUG     27                       3                        13
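A codon usage table of this kind can be produced by counting in-frame codons in known coding sequences. The following sketch is an assumed illustration: the function name and the toy input are hypothetical, and it counts only the six leucine codons listed on the slide.

```python
from collections import Counter

LEUCINE_CODONS = ("CUA", "CUC", "CUG", "CUU", "UUA", "UUG")


def leucine_codon_usage(coding_dna):
    """Count leucine codons in frame 0 of a coding sequence.

    Returns {codon: (count, percent of all leucine codons)}.
    Percentages are 0.0 when the sequence contains no leucine codon.
    """
    rna = coding_dna.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    counts = Counter(c for c in codons if c in LEUCINE_CODONS)
    total = sum(counts.values())
    return {
        codon: (counts[codon], 100.0 * counts[codon] / total if total else 0.0)
        for codon in LEUCINE_CODONS
    }


if __name__ == "__main__":
    # Toy coding sequence: ATG CTG CTG CTC TTG TAA
    for codon, (count, percent) in leucine_codon_usage("ATGCTGCTGCTCTTGTAA").items():
        print(codon, count, f"{percent:.0f}%")
```

Because codon preferences differ between organisms, as the two columns of percentages show, such counts can help judge whether a candidate ORF resembles the genes of the host genome.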
Alignment
Which proteins do the genes encode?
Alignment
1:1   Dot plot, SSEARCH, BLITZ
        Dotplot
        SSEARCH   ftp://ftp.virginia.edu/pub/fasta
        BLITZ ... http://www.ebi.ac.uk
1:n   FASTA, BLAST
n:n   PSI-BLAST, HMMER
n     ClustalW, MultAlign
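The dot plot is the simplest of the 1:1 methods: every position of one sequence is compared with every position of the other, and matches are marked in a matrix, so that similar regions appear as diagonals. The sketch below is a minimal illustration using exact matches; the function names and the `window` parameter are assumptions, not part of the slides.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return a matrix of booleans: rows follow seq_b, columns follow seq_a.

    A cell is True when the windows of length `window` starting at the two
    positions are identical. Larger windows suppress random single matches.
    """
    rows = []
    for j in range(len(seq_b) - window + 1):
        row = []
        for i in range(len(seq_a) - window + 1):
            row.append(seq_a[i:i + window] == seq_b[j:j + window])
        rows.append(row)
    return rows


def render(matrix):
    """Draw the matrix as text, with '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    print(render(dot_plot("GATTACA", "GATCACA", window=2)))
```

Insertions and deletions show up as shifts between diagonals, and repeats as parallel diagonals, which is why the plot is useful before running a scored alignment.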
Alignment
1:1   Dot plot, SSEARCH, BLITZ
1:n   FASTA, BLAST
        FASTA   http://www.ebi.ac.uk
        BLAST   http://ncbi.nlm.nih.gov/blast
n:n   PSI-BLAST, HMMER
n     ClustalW, MultAlign
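Searching one query against many database sequences (1:n) is usually made fast by first looking for short shared words and only then aligning the promising candidates. The sketch below illustrates just this word-seeding idea in a strongly simplified form; it is not the actual FASTA or BLAST algorithm, and all names and the word length `k` are assumptions.

```python
from collections import defaultdict


def build_kmer_index(database, k=3):
    """Map every k-letter word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def seed_hits(query, index, k=3):
    """Count shared words per (sequence name, diagonal).

    The diagonal is database position minus query position; many hits on one
    diagonal suggest an ungapped region of similarity worth aligning properly.
    """
    hits = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], ()):
            hits[(name, dpos - qpos)] += 1
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical toy database of two short protein sequences.
    database = {"s1": "MKVLAAGIV", "s2": "GGGGGG"}
    index = build_kmer_index(database, k=3)
    print(seed_hits("KVLAA", index, k=3))
```

Only exact word matches are counted here; production tools also score similar words and extend the seeds into full alignments with statistical significance estimates.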
Alignment
1:1   Dot plot, SSEARCH, BLITZ
1:n   FASTA, BLAST
n:n   PSI-BLAST, HMMER
        PSI-BLAST   http://ncbi.nlm.nih.gov
        HMMER
n     ClustalW, MultAlign
        ClustalW
        MultAlign
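PSI-BLAST and HMMER are profile-based: they summarize a set of aligned sequences position by position and score new sequences against that summary. The sketch below shows the basic idea with a simple log-odds profile; it assumes an ungapped comparison, a uniform background, and a fixed pseudocount, so it is far simpler than what either program actually does. All names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Per-column log2-odds scores against a uniform background.

    `alignment` is a list of equal-length aligned sequences; symbols outside
    the alphabet (for example the gap character '-') are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return profile


def score(profile, seq):
    """Sum of column scores for an ungapped sequence as long as the profile."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(column[aa] for column, aa in zip(profile, seq))


if __name__ == "__main__":
    profile = build_profile(["MKV", "MKV", "MRV"])
    print(score(profile, "MKV"), score(profile, "AAA"))
```

A sequence that agrees with the conserved columns scores higher than an unrelated one, which is what lets profile methods detect more distant relatives than a single-sequence search.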