
Introduction to Bioinformatics CPSC 265


ajay

Presentation Transcript


  1. Introduction to Bioinformatics CPSC 265

  2. What is bioinformatics? • Interface of biology and computer science • Analysis of proteins, genes and genomes using computer algorithms and computer databases • Genome informatics: making sense of the billions of base pairs of DNA that are sequenced by genomics projects • Mostly, it’s about protein and DNA sequences

  3. What do bioinformatics researchers do? • Process large data outputs from new technologies • Turn sequence data into whole-genome sequences • Interpret genome sequences in terms of genes and their expression • Find genes that control crop and animal traits, disease, etc. • Model evolution in genomes and proteins • Model and predict 3D structures of proteins

  4. Growth of GenBank [Figure (Fig. 2.1, page 17): base pairs of DNA (billions) and sequences (millions) by year, 1982 to 2002. Updated 8-12-04: >40 billion base pairs]

  5. Cost of sequencing is falling exponentially

  6. DNA sequence analysis • Sequences could be like those from our experiment last week • Or a lot bigger, like the whole human genome • Some have chromatogram or “quality” data, some don’t

  7. DNA makes RNA makes protein • Hard to sequence RNA • Very hard to sequence protein • We can deduce RNA sequence from DNA (in bacteria, it is as easy as turning Ts to Us; in eukarya, we also need to figure out where the introns are) • We can deduce protein sequence from RNA, using the Universal Genetic Code
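The bacterial case on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the input is the coding (sense) strand written 5' to 3'. The function name `transcribe` is made up for this example, and the sketch does nothing about introns.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


# First 16 bases of the example strand used on the reading-frame slides.
rna = transcribe("CATCAATGACAACTCC")
print(rna)  # CAUCAAUGACAACUCC
```

For a eukaryotic gene the same call would only be correct after the introns have been located and removed, which this sketch does not attempt.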

  8. Conceptual Translation • In a computer, take each set of three RNA letters and figure out which amino acid it codes for • Professional biologists use the SINGLE LETTER CODE
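A minimal sketch of conceptual translation in Python follows. It assumes the standard genetic code and builds the 64-codon table from a compact string of single-letter amino acid codes, with `*` marking a stop codon. The names `CODON_TABLE` and `translate` are chosen for this example only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order (UUU, UUC, UUA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string three letters at a time; '*' marks a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):  # leftover bases at the end are ignored
        protein.append(CODON_TABLE[rna[i:i + 3]])
    return "".join(protein)


print(translate("AUGACAACUCCAAAGACA"))  # MTTPKT
```

The sketch translates straight through stop codons rather than halting at them, so the caller can see where each `*` falls.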

  9. DNA potentially encodes six proteins
     5’ CAT CAA
     5’ ATC AAC
     5’ TCA ACT
     5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
     3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
     5’ GTG GGT
     5’ TGG GTA
     5’ GGG TAG

  10. We call these READING FRAMES
      5’ CAT CAA
      5’ ATC AAT
      5’ TCA ATG
      5’ CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
      3’ GTAGTTACTGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
      5’ GTG GGT
      5’ TGG GTA
      5’ GGG TAG
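The six reading frames can be listed with a short Python sketch: three offsets on the strand as given, and three on its reverse complement. The frame labels (`+1` to `-3`) and the function names are conventions picked for this example, not something the slides define.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(strand: str, offset: int) -> list[str]:
    """Split a strand into complete codons, starting at the given offset."""
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


def six_frames(dna: str) -> dict[str, list[str]]:
    """Map a frame label ('+1'..'+3', '-1'..'-3') to its list of codons."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = codons(strand, offset)
    return frames


# Top strand from the READING FRAMES slide.
SLIDE_SEQUENCE = "CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
frames = six_frames(SLIDE_SEQUENCE)
for label, frame in frames.items():
    print(label, " ".join(frame[:2]))
```

The first two codons printed for each frame match the ones drawn on the slide: CAT CAA, ATC AAT and TCA ATG on the top strand, and GTG GGT, TGG GTA and GGG TAG on the bottom strand.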

  11. Nearly all proteins start with M (ATG) • TAG, TAA and TGA are all STOP • This can help narrow it down
      5’ CAT CAA
      5’ ATC AAT
      5’ TCA ATG
      5’ CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
      3’ GTAGTTACTGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
      5’ GTG GGT
      5’ TGG GTA
      5’ GGG TAG
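The start and stop rule can be turned into a simple search for open reading frames. The Python sketch below is a naive illustration, assuming a single strand read 5' to 3': it reports every ATG that is followed by an in-frame stop codon. The function name `find_orfs` and the toy sequence are hypothetical, and real gene finders do considerably more than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ATG followed by an in-frame stop.

    Positions are 0-based and the end is exclusive. An ATG inside a longer
    open reading frame is reported as its own, shorter entry.
    """
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the ATG until a stop codon appears.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                orfs.append((start, i + 3, dna[start:i + 3]))
                break
    return orfs


# Toy sequence, made up for this example: ATG AAA TAG with flanking bases.
print(find_orfs("CCATGAAATAGGG"))  # [(2, 11, 'ATGAAATAG')]

# Top strand from the slide: the ATG in the third frame has no in-frame stop
# within these 50 bases, so nothing is reported.
print(find_orfs("CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"))  # []
```

To cover all six frames, the same function could be run a second time on the reverse complement of the sequence.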

  12. Once you know the sequence of the protein, you can figure out if it has been studied already. You may even be able to track down a likely structure

  13. There are three major public DNA databases • EMBL: housed at EBI (European Bioinformatics Institute) • GenBank: housed at NCBI (National Center for Biotechnology Information) • DDBJ: housed in Japan (page 16)

  14. www.ncbi.nlm.nih.gov

  15. PubMed is… • National Library of Medicine's search service • 12 million citations in MEDLINE • links to participating online journals • PubMed tutorial (via “Education” on side bar)

  16. BLAST is… • Basic Local Alignment Search Tool • NCBI's sequence similarity search tool • supports analysis of DNA and protein databases • 80,000 searches per day

  17. TaxBrowser is… • browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses) • taxonomy information such as genetic codes • molecular data on extinct organisms

  18. From the NCBI home page, type “lectin” and hit “Search”

  19. PubMed PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966. Page 35

  20. BLAST • BLAST looks for similarity between your favorite query sequence and other known protein or DNA sequences. • Applications include: • identifying homologs (orthologs and paralogs) • discovering new genes or proteins • discovering variants of genes or proteins • investigating expressed sequence tags (ESTs) • exploring protein structure and function (page 88)

  21. Four components to a BLAST search (1) Obtain the sequence (query) (2) Select the BLAST program (3) Enter sequence (4) Choose optional parameters Then click “BLAST” page 88

  22. Step 2: Choose the BLAST program • blastn (nucleotide BLAST) • blastp (protein BLAST) • tblastn (translated BLAST) • blastx (translated BLAST) • tblastx (translated BLAST)

  23. DNA potentially encodes six proteins
      5’ CAT CAA
      5’ ATC AAC
      5’ TCA ACT
      5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
      3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
      5’ GTG GGT
      5’ TGG GTA
      5’ GGG TAG

  24. Choose the BLAST program
      Program   Input     Database   Comparisons
      blastn    DNA       DNA        1
      blastp    protein   protein    1
      blastx    DNA       protein    6
      tblastn   protein   DNA        6
      tblastx   DNA       DNA        36
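The choice on this slide amounts to a lookup on the type of the query and the type of the database. The Python sketch below encodes the slide's table. The function name and the `compare_as_protein` flag are assumptions made for this example (the flag separates blastn from tblastx, which both take DNA against DNA), and nothing here calls BLAST itself.

```python
# (query type, database type, compare as protein?) -> (program, frame comparisons)
BLAST_PROGRAMS = {
    ("DNA", "DNA", False): ("blastn", 1),
    ("protein", "protein", True): ("blastp", 1),
    ("DNA", "protein", True): ("blastx", 6),    # query translated in 6 frames
    ("protein", "DNA", True): ("tblastn", 6),   # database translated in 6 frames
    ("DNA", "DNA", True): ("tblastx", 36),      # 6 query frames x 6 database frames
}


def choose_blast_program(query_type: str, database_type: str,
                         compare_as_protein: bool) -> tuple[str, int]:
    """Look up the BLAST program and its number of frame comparisons."""
    key = (query_type, database_type, compare_as_protein)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program for {key}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("DNA", "protein", True))  # ('blastx', 6)
```

The frame counts explain why the translated searches are slower: tblastx compares every one of the six query frames with every one of the six database frames.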

  25. Step 3: choose the database • nr = non-redundant protein (most general database) • You can also search specific organisms, and DNA rather than protein (although ALL DNA is going to take a long time…)

  26. filtering

  27. So now you can • Find any sequence in the database • Find relevant publications • Match DNA to protein sequence • Find database matches to DNA or protein • Find conserved domains in protein • Find the 3D structure of a protein …Without doing any experiments!
