Introduction to Bioinformatics. Burkhard Morgenstern, Institute of Microbiology and Genetics, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, March 2004
Introduction to Bioinformatics Bioinformatics in Göttingen: • Dep. of Bioinformatics (UKG), Edgar Wingender • Dep. of Bioinformatics (IMG), BM • Inst. Num. and Applied Mathematics, Stephan Waack • Dep. of Genetics (Hans Fritz, IMG), Rainer Merkl
Introduction to Bioinformatics Definition: Bioinformatics = development and application of software tools for Molecular Biology
Bioinformatics: Topics: • Sequence Analysis (Gene finding …) • Structure Analysis (RNA, Protein) • Gene Expression Analysis • Metabolic Pathways, Virtual Cell
Bioinformatics: Areas of work: • Application of software tools for data analysis in (Molecular) Biology • Computing infrastructure, database development, support • Development of algorithms and software tools
Information flow in the cell Idea: Sequence -> Structure -> Function
Information flow in the cell • Lots of data available at the sequence level • Fewer data at the structure and function level
Topics of lecture: • Databases (SwissProt, GenBank) • Pair-wise sequence comparison • Database searching • Multiple sequence alignment • Gene prediction
Protein databases • Sanger and Tuppy: protein-sequencing methods (1951) • Margaret Dayhoff: Atlas of Protein Sequence and Structure (1972); later: Protein Identification Resource (PIR) as international collaboration: (a) organize proteins into families; (b) amino acid substitution frequencies • Amos Bairoch: SwissProt (1986)
DNA databases • Maxam and Gilbert; Sanger: DNA sequencing methods (1977) • GenBank DNA database (1979), now run by NCBI • Collaboration with EMBL (1982), DDBJ (1984) • Translated DNA sequences stored in protein databases (PIR, TrEMBL)
Most important tool for sequence analysis: • Sequence comparison
The dot plot [Figure: dot matrix built up step by step over the residues Y Q E W T Y I V A R E A Q Y E and C I V M R E Q Y; an X marks each pair of identical residues]
The dot plot [Figure: second dot matrix, built up step by step over the residues Y Q E W T Y Q E V R E Y Q E I and C I V M R Y Q E; an X marks each pair of identical residues]
The dot plot Advantages: • Various types of similarity detectable (repeats, inversions) • Useful for large-scale analysis
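The dot plot described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the function names `dot_plot` and `render` are made up for this example, and the two sequences are the ones used on the alignment slides. Each cell is marked when the residue on the vertical axis equals the residue on the horizontal axis.

```python
def dot_plot(seq_x, seq_y):
    """Return a matrix (list of rows) that is True where seq_y[i] == seq_x[j].

    seq_x runs along the horizontal axis, seq_y along the vertical axis.
    """
    return [[a == b for b in seq_x] for a in seq_y]


def render(seq_x, seq_y):
    """Render the dot plot as text: 'X' for identical residues, '.' otherwise."""
    matrix = dot_plot(seq_x, seq_y)
    lines = ["  " + " ".join(seq_x)]  # header row with the horizontal sequence
    for residue, row in zip(seq_y, matrix):
        cells = " ".join("X" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Example sequences taken from the alignment slides.
    print(render("TYIVAREAQYE", "CIVMREQY"))
```

Runs of X along a diagonal correspond to stretches that the two sequences share; a repeated segment shows up as several parallel diagonals.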
Pair-wise sequence alignment Evolutionarily or structurally related sequences: • Alignment possible • Sequence homologies represented by inserting gaps
Pair-wise sequence alignment
  T Y I V A R E A Q Y E
C . . . . . . . . . . .
I . . X . . . . . . . .
V . . . X . . . . . . .
M . . . . . . . . . . .
R . . . . . X . . . . .
E . . . . . . X . . . X
Q . . . . . . . . X . .
Y . X . . . . . . . X .
Pair-wise sequence alignment
T Y I V A R E A Q Y E
C I V M R E Q Y
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
Global alignment: sequences aligned over the entire length
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
Basic task: Find best alignment of two sequences = alignment that reflects structural and evolutionary relations
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
Questions: • What is a good alignment? • How to find the best alignment?
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
Problem: • Astronomical number of possible alignments
Pair-wise sequence alignment
T Y I V A R E A Q Y E
C I - V M R E - Q Y -
Problem: • Astronomical number of possible alignments
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
Problem: • Astronomical number of possible alignments • Stupid computer has to find out: which alignment is best?
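To give a feeling for how quickly the number of possible alignments grows, here is a small counting sketch. It is not part of the lecture; the function name `count_alignments` is made up, and it assumes one particular counting convention: every alignment column is either residue/residue, residue/gap, or gap/residue, and two alignments that differ only in the order of adjacent gaps count as different. Under that assumption the count follows a simple three-way recurrence.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Count global alignments of two sequences of lengths m and n.

    Assumed convention: the last column is either a residue pair,
    a residue over a gap, or a gap over a residue.
    """
    if m == 0 or n == 0:
        # Only one way left: pad the remaining residues with gaps.
        return 1
    return (count_alignments(m - 1, n - 1)   # last column: residue / residue
            + count_alignments(m - 1, n)     # last column: residue / gap
            + count_alignments(m, n - 1))    # last column: gap / residue


if __name__ == "__main__":
    # Lengths of the two example sequences from the slides (11 and 8 residues).
    print(count_alignments(11, 8))
```

Even for two sequences of length 4 this gives 321 alignments, so trying every alignment one by one is not a realistic strategy for real proteins.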
Pair-wise sequence alignment
T Y I V A R E A Q Y E
- C I V M R E - Q Y -
First (simplified) rules: • Minimize number of mismatches • Maximize number of matches
Pair-wise sequence alignment
T Y I V A R E A Q Y E
C I - V M R E - Q Y -
First (simplified) rules: • Minimize number of mismatches • Maximize number of matches
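The simplified rules above can be checked by simply counting columns. The following sketch is an illustration only, with a made-up helper name `count_columns`; it assumes both alignment rows have the same length and that "-" is the gap character. It compares the two candidate alignments shown on the slides.

```python
def count_columns(row_a, row_b, gap="-"):
    """Return (matches, mismatches, gap_columns) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gaps += 1            # column contains a gap
        elif a == b:
            matches += 1         # identical residues
        else:
            mismatches += 1      # different residues
    return matches, mismatches, gaps


if __name__ == "__main__":
    first = count_columns("TYIVAREAQYE", "-CIVMRE-QY-")
    second = count_columns("TYIVAREAQYE", "CI-VMRE-QY-")
    print(first)   # (6, 2, 3)
    print(second)  # (5, 3, 3)
```

Under these first rules the first alignment is preferred: it has one more match and one fewer mismatch, with the same number of gap columns.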