Bioinformatics PhD course
Xavier Messeguer Peypoch (http://www.lsi.upc.es/~alggen)
LSI, Dep. de Llenguatges i Sistemes Informàtics
BSC, Barcelona Supercomputing Center
Universitat Politècnica de Catalunya
Contents
1. Biological introduction
2. Comparison of short sequences (up to 10,000 bp)
   Dot matrix
   Pairwise alignment
   Multiple alignment
   Hash algorithms
3. Comparison of large sequences (more than 10,000 bp)
   Data structures
   Suffix trees
   MUMs
4. String matching
   Exact
   Extended
   Approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, …
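Pairwise alignment
Recall that with two strings S1 and S2 of length n, the dynamic programming matrix has n^2 cells and each cell is computed from 2^2 - 1 = 3 neighbouring cells, so the cost is O(n^2).
[Figure: dynamic programming matrix with S1 and S2 on the axes; example entry for C against A with score -1]
And with 3 strings?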
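The O(n^2) recurrence recalled above can be sketched as follows. This is a minimal illustration and not the course's own code: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_alignment_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of s1 and s2.

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(s1), len(s2)
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # s1[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # s2[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            # Each cell looks at its 2^2 - 1 = 3 neighbouring cells.
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # align s1[i-1] with s2[j-1]
                score[i - 1][j] + gap,       # s1[i-1] against a gap
                score[i][j - 1] + gap,       # s2[j-1] against a gap
            )
    return score[n][m]
```

The two nested loops visit every cell once and do constant work per cell, which is where the O(n^2) cost comes from.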
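Multiple alignment
What happens with three strings S1, S2 and S3? Let n be their length. The matrix becomes a cube with n^3 cells, each computed from 2^3 - 1 = 7 neighbouring cells, so the cost becomes O(n^3).
[Figure: alignment cube with S1, S2 and S3 on the axes]
And with k strings? O(n^k 2^k k^2): n^k cells, 2^k - 1 neighbouring cells per cell, and k^2 pairwise comparisons to score a column.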
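The growth to O(n^k 2^k k^2) can be made concrete with a small sketch of exact alignment of k strings. It assumes sum-of-pairs column scoring with illustrative values (match +1, mismatch -1, gap -1, gap against gap 0); the names `column_score` and `multiple_alignment_score` are hypothetical, and the code is only practical for very short inputs.

```python
from itertools import product


def column_score(column, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column (None marks a gap)."""
    total = 0
    k = len(column)
    for a in range(k):                 # about k^2 / 2 pairs per column
        for b in range(a + 1, k):
            x, y = column[a], column[b]
            if x is None and y is None:
                continue               # assumption: gap against gap scores 0
            if x is None or y is None:
                total += gap
            else:
                total += match if x == y else mismatch
    return total


def multiple_alignment_score(seqs, **scoring):
    """Best sum-of-pairs score of a global alignment of all strings in seqs."""
    k = len(seqs)
    # The 2^k - 1 moves: every non-zero 0/1 vector says which strings
    # contribute a character to the new column (the others get a gap).
    moves = [d for d in product((0, 1), repeat=k) if any(d)]
    best = {(0,) * k: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # (componentwise smaller or equal) has been filled before it is needed.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if not any(cell):
            continue                   # origin already initialised
        candidates = []
        for d in moves:
            prev = tuple(c - x for c, x in zip(cell, d))
            if min(prev) < 0:
                continue               # move would step outside the matrix
            column = [s[c - 1] if x else None
                      for s, c, x in zip(seqs, cell, d)]
            candidates.append(best[prev] + column_score(column, **scoring))
        best[cell] = max(candidates)
    return best[tuple(len(s) for s in seqs)]
```

With k = 2 this reduces to the pairwise case; each extra string multiplies the number of cells by n and doubles the number of moves per cell, which is why exact multiple alignment quickly becomes impractical.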
Multiple alignment programs
• Malig (progressive alignment) http://alggen.lsi.upc.edu
• Clustal (progressive alignment) http://www.ebi.ac.uk/clustalw
• TCoffee (progressive alignment + databases) http://igs-server.cnr-mrs.fr/Tcoffee_cgi/index.cgi
• HMM (Hidden Markov Models)
Multiple progressive alignment
• Run the alggen program Malig (progressive alignment) http://alggen.lsi.upc.edu
• Run Clustal (progressive alignment) http://www.ebi.ac.uk/clustalw