Introduction to Bioinformatics Alexandra M Schnoes Univ. California San Francisco Alexandra.Schnoes@ucsf.edu
What is Bioinformatics? • Intersection of Biology and Computers • Broad field • Often means different things to different people • Personal Definition: • The utilization of computation for biological investigation and discovery—the process by which you unlock the biological world through the use of computers.
What does one do in Bioinformatics? (a small sample) • Our Lab: Understanding Protein (Enzyme) Function
What does one do in Bioinformatics? (a small sample) • Discover new drug targets: computational docking • Atreya, C. E. et al. J. Biol. Chem. 2003;278:14092-14100 • Shoichet, B. K. Nature. 2004;432:862-865
What does one do in Bioinformatics? (a small sample) • Systems Biology • sbw.kgi.edu/ • www.sbi.uni-rostock.de/research.html
This lab: Nucleotide & Protein Informatics • Sequence analysis • Finding similar sequences • Multiple sequence alignment • Phylogenetic analysis
Process of Evolution • Sequences change due to • Mutation • Insertion • Deletion
Use Evolutionary Principles to Analyze Sequences • If sequence A and sequence B are similar • A and B are likely evolutionarily related • If sequences A, B and C are all similar, but A and B are more similar to each other than either is to C • A and B are more closely evolutionarily related to each other than to C
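The A/B/C comparison can be tried out with a very crude similarity measure. The following is a minimal sketch, assuming the sequences are already aligned to the same length and using percent identity as the measure; `seq_c` is a made-up sequence for illustration, and `percent_identity` and `most_similar_pair` are hypothetical helper names.

```python
from itertools import combinations

def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions where the two residues are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def most_similar_pair(seqs: dict) -> tuple:
    """Return the pair of names whose sequences have the highest percent identity."""
    return max(
        combinations(sorted(seqs), 2),
        key=lambda pair: percent_identity(seqs[pair[0]], seqs[pair[1]]),
    )

# Toy data: A and B are the example sequences from these slides,
# C is an invented sequence used only for illustration.
seqs = {
    "A": "TASSWSYIVE",
    "B": "TATSFSYLVG",
    "C": "MGTQWAYKPE",
}

for x, y in combinations(sorted(seqs), 2):
    print(x, y, percent_identity(seqs[x], seqs[y]))
print("closest pair:", most_similar_pair(seqs))
```

Percent identity ignores how chemically similar two different residues are, which is the problem substitution matrices address.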
Extremely Powerful Idea • Start with unknown sequence • Find what the unknown is similar to • Use information about the known to make predictions about the unknown
How do you know when sequences are similar? • Align two sequences and score their similarity, e.g. TASSWSYIVE with TATSFSYLVG • Use substitution matrices to score the alignment
Substitution Matrices Give a Score for Each Mutation • Many different matrices available • BLOSUM matrices are standard in the field • BLOSUM62 scoring matrix: http://www.carverlab.org/images/
Scoring: Add Up the Positional Scores • TASSWSYIVE / TATSFSYLVG: score of 30 • The same two sequences aligned at a different offset: score of 1
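The score of 30 can be reproduced by summing BLOSUM62 values column by column. This is a minimal sketch: the dictionary holds only the handful of BLOSUM62 entries needed for this one pair, and `pair_score` and `alignment_score` are hypothetical helper names.

```python
# Subset of the BLOSUM62 matrix: only the residue pairs that occur in this example.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4,
    ("W", "F"): 1, ("Y", "Y"): 7, ("I", "L"): 2, ("V", "V"): 4,
    ("E", "G"): -2,
}

def pair_score(x: str, y: str) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (x, y) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(x, y)]
    return BLOSUM62_SUBSET[(y, x)]  # KeyError if the pair is not in the subset

def alignment_score(a: str, b: str) -> int:
    """Sum the positional scores of an ungapped alignment of equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped scoring needs sequences of equal length")
    return sum(pair_score(x, y) for x, y in zip(a, b))

print(alignment_score("TASSWSYIVE", "TATSFSYLVG"))  # 30
```

Identical residues do not all score the same (Y/Y is worth 7, A/A only 4), and conservative changes such as I to L still score positively.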
Additional issues… • Gaps (insertions/deletions) • Have scoring penalties for opening and continuing a gap • Without gaps: TASSWSYIVE / TATSFLVG • With a gap: TASSWSYIVE / TATSF--LVG
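Separate penalties for opening and for continuing a gap can be sketched as follows. The penalty values (10 to open, 1 for each further gap position) are assumptions chosen for illustration, not values from the slides; the matrix is again a small BLOSUM62 subset, and `gapped_alignment_score` is a hypothetical helper name.

```python
# Subset of BLOSUM62 covering only the residue pairs in this example.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4,
    ("W", "F"): 1, ("I", "L"): 2, ("V", "V"): 4, ("E", "G"): -2,
}

def pair_score(x: str, y: str) -> int:
    """Symmetric lookup in the substitution matrix subset."""
    if (x, y) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(x, y)]
    return BLOSUM62_SUBSET[(y, x)]

def gapped_alignment_score(a: str, b: str, gap_open: int = 10, gap_extend: int = 1) -> int:
    """Score a gapped alignment: substitution scores minus gap penalties.

    The first position of a gap costs gap_open; each following position
    of the same gap costs gap_extend (illustrative values).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    gap_side = None  # which sequence currently has an open gap, if any
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            total -= gap_extend if side == gap_side else gap_open
            gap_side = side
        else:
            total += pair_score(x, y)
            gap_side = None
    return total

# Substitutions sum to 19; the two-position gap costs 10 + 1 = 11.
print(gapped_alignment_score("TASSWSYIVE", "TATSF--LVG"))  # 8
```

Making the opening penalty much larger than the continuation penalty favors one long gap over several short ones.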
How do we find similar sequences? • Start at the National Center for Biotechnology Information • http://www.ncbi.nlm.nih.gov/
How do we find similar sequences? • Nucleotide Sequence Databases
How do we find similar sequences? • Protein Sequence Databases
How do we find similar sequences? • Similarity Search: BLAST • Basic Local Alignment Search Tool
BLAST is very quick but … • Reports only local alignments • Alignments are heuristic, so not guaranteed to be optimal • Only pairwise alignments (the query against one database sequence at a time)
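To see what "local alignment" means, here is a minimal sketch of the Smith-Waterman dynamic programming algorithm, the exact method that BLAST approximates with faster heuristics. This is not how BLAST itself is implemented. The match, mismatch and gap values are assumptions for illustration, and the two DNA strings are invented.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1];
    resetting to 0 lets an alignment start anywhere, which makes it local.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best

# Invented sequences that share only the core "ACGT".
print(smith_waterman_score("GGGACGTCCC", "TTTACGTAAA"))  # 8
```

Only the shared region contributes to the score; the unrelated flanking residues are ignored, so a local hit says nothing about the rest of either sequence.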
Want better alignments … • Multiple alignment • Multiple sequences • Better signal to noise • More Sequences = Better alignment • More accurate reflection of evolution • ClustalW • Commonly used • Easy to use
Use the Alignment to Calculate Evolutionary Distances • See ‘how close’ sequences are to each other • Best way to tell what is ‘most similar’ • Can calculate a simple tree from ClustalW • Taubenberger et al., Nature 437, 889-893, 2005
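One simple way to turn a multiple alignment into distances is the p-distance: the fraction of compared columns in which two sequences differ. This sketch assumes that measure and skips any column where either sequence has a gap. The three aligned sequences and their names are toy data, and this is not ClustalW's own distance calculation.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no columns without gaps to compare")
    differences = sum(1 for x, y in compared if x != y)
    return differences / len(compared)

def distance_table(alignment: dict) -> dict:
    """Pairwise p-distances for every pair of sequences in the alignment."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(sorted(alignment), 2)
    }

# Toy multiple alignment (rows are the same length; '-' marks a gap).
alignment = {
    "seq1": "TASSWSYIVE",
    "seq2": "TATSFSYLVG",
    "seq3": "TAT-FAYLPG",
}

table = distance_table(alignment)
for pair, dist in table.items():
    print(pair, round(dist, 3))
print("closest:", min(table, key=table.get))
```

A tree-building method then groups the closest sequences first; the table above is only the input to that step.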
Caveats! • In reality • Sequences (even parts of sequences) can evolve at different rates • We don’t have a good understanding of the relationship between sequence and function • High sequence identity does not always mean the same function • Getting good alignments and good trees can be very hard
Bioinformatics: Sequence Analysis • Start with unknown sequence • Find similar sequences • Create alignment • Create phylogenetic tree • Use information about knowns to make predictions about unknown
Mini Virus Intro • Viruses are often considered ‘not alive’ • Extremely small (much smaller than a cell) • Cellular parasites • Have a genome but can only reproduce inside a host cell
Different Viruses • RNA & DNA viruses • Both single and double-stranded • Influenza Virus
Influenza Virus (flu) • Small genome: 8 RNA segments • Evolves quickly: antigenic drift, antigenic shift
Influenza Virus (flu) • Sequencing: Genomic RNA → Reverse Transcriptase → DNA → Sequencing → Nucleotide Sequence
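The reverse transcriptase step can be mimicked on strings: the enzyme builds a DNA strand complementary to the RNA template. This is a minimal sketch; the RNA fragment is invented, and `reverse_transcribe` is a hypothetical helper name.

```python
# RNA base -> complementary DNA base
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA (cDNA) strand, written 5' to 3'.

    The complement pairs antiparallel to the template, so the
    complemented string is reversed to read 5' to 3'.
    """
    complement = [RNA_TO_CDNA[base] for base in rna.upper()]
    return "".join(reversed(complement))

print(reverse_transcribe("AUGGCU"))  # AGCCAT
```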
Influenza Pandemics • 1918 Flu • Killed an estimated 50-100 million people worldwide • Considered one of the deadliest pandemics • Killed many of the young and healthy • Influenza A, subtype H1N1 • Thought to have derived from avian influenza • Recently reconstituted from recovered human samples • Considerable ethical debate
Avian Influenza • Current fear of pandemic • High mortality rate (including young and healthy) • Current concern is Influenza A, subtype H5N1 • Still transmitted to humans only through contact with birds • Is now in Asia and Eastern and Western Europe
This lab: Nucleotide & Protein Informatics • Sequence analysis • Finding similar sequences • Multiple sequence alignment • Phylogenetic analysis