
How Bioinformatics can change your life: Basic Concepts of Bioinformatics

How Bioinformatics can change your life: Basic Concepts of Bioinformatics. M. Alroy Mascrenghe MBCS, MIEEE, MIT, mark_ai@yahoo.com. A lecture given for the BCS Wolverhampton Branch at the University of Wolverhampton. http://www.geocities.com/mark_ai/


Presentation Transcript


  1. How Bioinformatics can change your life: Basic Concepts of Bioinformatics • M. Alroy Mascrenghe MBCS, MIEEE, MIT • mark_ai@yahoo.com • A lecture given for the BCS Wolverhampton Branch at the University of Wolverhampton • http://www.geocities.com/mark_ai/

  2. TOC • Introduction • Basic concepts in molecular biology • Bioinformatics techniques • Areas in bioinformatics • Applications • Related computer technology • Conference in Glasgow • Acknowledgements • References M.Alroy Mascrenghe

  3. Introduction……

  4. 2000 • A major event happened that was to change the course of human history • It was a joint British and American effort • Nothing to do with Iraq! • It was a race: who would finish first? • Race test: not whether they had taken drugs but whether they could produce them! • The human genome was sequenced

  5. A Situation… somewhere in the near future • A virus (not the ‘I love you’ virus) creates an epidemic • Geneticists and bioinformaticians roll up their sleeves • The genetic material of the virus is compared with the existing base of known genetic material from other viruses • The characteristics of those other viruses are already known • From the genetic material, computer programs derive the proteins the virus needs to survive • Once the protein (sequence and structure) is known, medicines can be designed

  6. What is bioinformatics? • The marriage between computer science and molecular biology • Algorithms and techniques from computer science are used to solve problems faced by molecular biologists • ‘Information technology applied to the management and analysis of biological data’ • Storage and analysis are two of the important functions; bioinformaticians build tools for each

  7. Bioinformatics draws on • Biology • Chemistry • Computer Science • Statistics

  8. What is.. • This is the age of information technology • However, storing information is nothing new • Information on the scale of the Encyclopaedia Britannica is stored in each of our cells • ‘Bioinformatics tries to determine what info is biologically important’

  9. Basics of Molecular Biology….

  10. DNA & Genes • DNA is where the genetic information is stored • Traits such as blond hair and blue eyes are inherited through it • Gene: the basic unit of heredity • There are genes for characteristics, e.g. a gene for blond hair • Genes contain their information as a sequence of nucleotides • Genes are abstract concepts, like lines of longitude and latitude, in the sense that you cannot see them separately • Genes are made up of nucleotides


  12. Nucleotide (nt) • Each nt is made up of • Sugar • Phosphate group • Base • The base is the only difference between one nt and another • There are 4 different bases • G(uanine), A(denine), T(hymine), C(ytosine) • The information lies in the order of the nucleotides: the order is the info • Genes can be many thousands of nt long • The complete set of genetic instructions is called the genome

  13. Chromosomes • DNA strings make up chromosomes • Analogy • Letters: nt • Sentences: genes • Individual volumes of the Encyclopaedia Britannica: chromosomes • All volumes together: the genome

  14. Double Helix • DNA is a double helix • Each strand carries complementary information • Each base in one strand is bonded to a particular base in the other strand • G - C • A - T • For example: • AATGC (one strand) • TTACG (other strand)
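
The pairing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `complement` is made up for this example, and it pairs bases position by position exactly as the slide does, ignoring strand direction.

```python
# Base pairing from the slide: G pairs with C, A pairs with T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the complementary strand, base by base (hypothetical helper)."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("AATGC"))  # TTACG, the slide's own example
```

Applying `complement` twice gives back the original strand, which is why either strand is enough to rebuild the other.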

  15. Proteins • Proteins are very important biological molecules • Proteins are made up of amino acids (aa) • There are 20 different amino acids • The function of a protein depends on the order of its amino acids

  16. Proteins… • The information required to assemble amino acids (aa) into proteins is stored in DNA • DNA sequence determines amino acid sequence • Amino acid sequence determines protein structure • Protein structure determines protein function • A substance called RNA carries the info stored in the DNA, which in turn is used to make proteins • Storage: DNA • Information transfer: RNA • RNA is the messenger boy!

  17. Central dogma • DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein


  19. Proteins….. • There are 20 amino acids to encode, so one nt cannot correspond to one aa (only 4 possibilities), and neither can a pair of nt (only 16) • So protein information is carried in triplet codes, called codons (64 possibilities) • The codons that do not correspond to an amino acid are stop codons: UAA, UAG, UGA (RNA has U instead of T) • AUG is used as the start codon and also codes for methionine
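
The codon argument, together with transcription and translation, can be sketched as follows. This is a simplified illustration, not a full implementation: the names `transcribe` and `translate` are invented here, the input DNA is assumed to be the coding strand, and the codon table is deliberately partial (a handful of the 64 codons from the standard genetic code).

```python
# How many distinct "words" can 1, 2 or 3 nucleotides spell?
codon_space = {k: 4 ** k for k in (1, 2, 3)}
print(codon_space)  # {1: 4, 2: 16, 3: 64}; only triplets cover 20 amino acids

# Partial codon table for illustration only; None marks a stop codon.
CODONS = {
    "AUG": "M",  # methionine, also the start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def transcribe(dna: str) -> str:
    """DNA coding strand -> RNA: same letters, with U instead of T."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read codons from the first AUG until a stop codon is reached."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODONS[rna[i:i + 3]]  # KeyError if outside the partial table
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("CCATGTTTGGCTAAGG")))  # MFG
```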

  20. Protein Structure • Shows a wide variety, as opposed to DNA, whose structure is uniform • X-ray crystallography or Nuclear Magnetic Resonance (NMR) is used to work out the structure • Structure is related to function, or rather structure determines function • Although proteins are created as a linear chain of aa, they fold into a 3D structure • If you stretch them and let go, they return to this structure: this is the native structure of a protein • A protein functions well only in its native structure • Even after translation is over, the protein goes through some changes to its structure

  21. Gene Expression • Gene expression: the process of transcribing DNA into RNA and translating the RNA to make a protein • Where do the genes begin in a chromosome? • How does RNA polymerase identify the beginning of a gene? • A single nt cannot mark the beginning of a gene, as each nt occurs far too frequently • But a particular combination of nucleotides can • Promoter sequences: the order of nt that marks the beginning of a gene
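
Why a combination of nucleotides works as a marker where a single nt does not can be shown with a short sketch. The slide names no specific promoter, so the motif `TATAAT` and the test sequence below are purely illustrative, and the chance estimate assumes all four bases are equally likely.

```python
def chance_frequency(k: int) -> float:
    """Probability that a fixed k-letter pattern occurs at a given position by chance."""
    return 0.25 ** k


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return every start position of `motif` in `sequence` (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


print(chance_frequency(1))  # 0.25: a single nt matches one position in four
print(chance_frequency(6))  # 0.000244140625: roughly one position in 4096
print(find_motif("GGCTATAATGCGTATAATAC", "TATAAT"))  # [3, 12]
```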

  22. Bioinformatics Techniques…..

  23. Prediction and Pattern Recognition • The two main areas of bioinformatics are • Pattern recognition • ‘A particular sequence or structure has been seen before’, and a particular characteristic can be associated with it • Prediction • From a sequence (what we know) we can predict the structure and function (what we don’t know)

  24. Dot plots…. • A simple way of evaluating similarity between two sequences • One sequence is placed along one axis of a grid and the other sequence along the other axis • Wherever the two sequences match, the grid is marked
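
A dot plot is simple enough to sketch directly. The following is a minimal illustration with invented function names (`dot_plot`, `render`); it marks every single-character match, without the windowing or filtering that practical tools usually add.

```python
def dot_plot(seq_a: str, seq_b: str) -> list[list[bool]]:
    """grid[i][j] is True where seq_a[i] matches seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a: str, seq_b: str) -> str:
    """Draw the grid as text: '*' for a match, '.' for no match."""
    grid = dot_plot(seq_a, seq_b)
    lines = ["  " + " ".join(seq_b)]
    for a, row in zip(seq_a, grid):
        lines.append(a + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("TTACTATA", "TAGATA"))
```

Runs of matches along a diagonal are what the eye looks for: they correspond to stretches where the two sequences agree.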


  26. Alignments • An alignment matches up the characters of two or more sequences to show their similarity • E.g. • TTACTATA • TAGATA • There are many ways to align these two sequences • 1. • TTACTATA • TAGATA • 2. • TTACTATA • TAGATA • 3. • TTACTATA • TAGATA • So which one do we choose, and on what basis? • The solution is to assign a match score and a mismatch score
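
To make the match/mismatch idea concrete, here is a small sketch that slides the shorter sequence along the longer one and scores each position. The score values (+1 for a match, -1 for a mismatch) and the function names are assumptions for illustration; the slide does not give specific values at this point.

```python
def score_at_offset(long_seq: str, short_seq: str, offset: int,
                    match: int = 1, mismatch: int = -1) -> int:
    """Score the ungapped alignment of short_seq placed at `offset` in long_seq."""
    window = long_seq[offset:offset + len(short_seq)]
    return sum(match if a == b else mismatch for a, b in zip(window, short_seq))


def score_all_offsets(long_seq: str, short_seq: str) -> dict[int, int]:
    """Score every placement where short_seq fits entirely inside long_seq."""
    last = len(long_seq) - len(short_seq)
    return {o: score_at_offset(long_seq, short_seq, o) for o in range(last + 1)}


print(score_all_offsets("TTACTATA", "TAGATA"))  # {0: 0, 1: -2, 2: 0}
```

With these assumed values two placements tie, which shows why simple match and mismatch scores alone do not always settle the choice.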

  27. Gaps • Introduce gaps, and a penalty score for gaps • TTACTATA • T_A_GATA • In gap scoring, a single indel two characters long is preferred to two indels each one character long • However, not all gaps are bad • TTGCAATCT • CAA • How do we align? • ---CAA--- • These end gaps are not biologically significant • Semi-global alignments do not penalise them
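
One common way to prefer a single long indel over several short ones is an affine gap penalty: a large cost to open a gap and a small cost to extend it. The sketch below assumes illustrative values (open -5, extend -1, match +1, mismatch -1) that are not from the slides, and it treats any unbroken run of gap columns as one gap.

```python
def gap_penalty(length: int, gap_open: int = -5, gap_extend: int = -1) -> int:
    """Affine penalty: pay gap_open once, then gap_extend per extra character."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap_char: str = "_") -> int:
    """Score two already-aligned, equal-length strings column by column."""
    score = 0
    run = 0  # length of the gap run currently being read
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            run += 1
            continue
        score += gap_penalty(run)  # close any gap run that just ended
        run = 0
        score += match if a == b else mismatch
    return score + gap_penalty(run)


print(gap_penalty(2))      # -6: one indel of length 2
print(2 * gap_penalty(1))  # -10: two indels of length 1
print(alignment_score("TTACTATA", "T_A_GATA"))  # -6
```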

  28. Scoring Matrix • For DNA/protein sequence alignment we create a matrix of scores for each pair of characters • A aligned with A scores 1 • A aligned with T scores -5 • A aligned with C scores -1

  29. Dynamic Programming • As the query sequences get longer, and as the difference in length between the two sequences grows, more gaps have to be inserted in various places • We cannot perform an exhaustive search • Combinatorial explosion occurs: too many combinations to search • Dynamic programming avoids the explosion by building the best alignment from the best alignments of shorter prefixes, so each sub-problem is solved only once
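
As a rough sketch of the idea, the classic global alignment recurrence (usually credited to Needleman and Wunsch) fills a table in which each cell holds the best score for aligning two prefixes. The scores used here (match +1, mismatch -1, gap -2 per character) are assumed for illustration, and the function returns only the score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] = best score for aligning a[:i] with b[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap  # a[:i] aligned against nothing but gaps
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # a[i-1] against a gap
                table[i][j - 1] + gap,       # b[j-1] against a gap
            )
    return table[-1][-1]


print(global_alignment_score("TTACTATA", "TAGATA"))  # 0
```

The table has only (len(a) + 1) × (len(b) + 1) cells, so the work grows with the product of the two lengths rather than with the number of possible alignments.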

  30. Databases • Sequence info is stored in databases • So that it can be manipulated easily • The db (next slide) are located in different places • They exchange info daily so that they stay up to date and in sync • Primary db: sequence data

  31. Major Primary DB

  32. Composite DB • As there are many db, which one should we search? Some are strong in some aspects and weak in others • A composite db is the answer: it uses several db as its base data • Searches on these db are indexed and streamlined so that the same stored sequence is not searched twice in different db

  33. Composite DB • OWL has these as its primary db • SWISS-PROT (top priority) • PIR • GenBank • NRL-3D

  34. Secondary db • Store secondary structure info or the results of searches of the primary db

  35. Database Searches • Many genes have been sequenced and identified, so we know what they do • The sequences are stored in databases • So if we find a new gene in the human genome, we compare it with the known genes stored in the databases • Since the databases are so large, we cannot do a full sequence alignment against each and every sequence • So heuristics must be used
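
One widely used kind of heuristic is to index short words (k-mers) from every database sequence, then fully align the query only against sequences that share words with it. The sketch below illustrates that idea in its simplest form; the database contents, sequence names and function names are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-letter word to the names of the sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidates(query: str, index: dict[str, set[str]], k: int) -> list[str]:
    """Rank database sequences by how many k-mers they share with the query."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    return sorted(hits, key=lambda name: (-hits[name], name))


# Hypothetical toy database
database = {
    "seq1": "TTACTATAGGC",
    "seq2": "CCCGGGCCCGG",
    "seq3": "GGTACTATCCA",
}
index = build_kmer_index(database, k=4)
print(candidates("ACTATA", index, k=4))  # ['seq1', 'seq3']
```

Only the returned candidates would then go through a full alignment; sequences sharing no word with the query are skipped, which is where the speed-up comes from, at the cost of occasionally missing a distant relative.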

  36. Areas in Bioinformatics…

  37. Genomics • In a multicellular organism each cell type expresses genes in a different way, although every cell has the same genetic content • I.e. all the information a liver cell needs to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them

  38. Genomics - Finding Genes • A gene in sequence data is a needle in a haystack • However, whereas a needle looks different from the hay, genes do not look different from the rest of the sequence data • In the whole array of nt we try to find a set of nt and mark its borders as a gene • This is one of the challenges of bioinformatics • Neural networks and dynamic programming are being employed

  39. Proteomics • The proteome is the sum total of an organism’s proteins • More difficult than genomics • Building blocks: 4 (genomics) vs 20 (proteomics) • Chemical makeup: simple vs complex • Duplication: can duplicate vs can’t • We are entering the ‘post-genome era’ • Meaning much has been done with genes, not that it is all over

  40. Proteomics….. • The RNA and the protein it codes for are usually very different • Proteins change after translation • So the aa sequence tells us nothing about the post-translation changes • Proteins are not active until they are combined into a larger complex or moved to the relevant location inside or outside the cell • So the aa sequence only hints at these things • Proteins must also be handled more carefully in labs, as they tend to change on contact with unsuitable materials

  41. Protein Structure Prediction • One of the biggest challenges of bioinformatics and especially biochemistry • There is as yet no algorithm that consistently predicts the structure of proteins

  42. Structure Prediction methods • Comparative modeling • The target protein is compared with related proteins • Proteins with similar sequences are searched for known structures

  43. Phylogenetics • The taxonomical system reflects evolutionary relationships • Phylogenetic trees depict evolutionary relationships as a picture/graph • Rooted trees have a single common ancestor • Unrooted trees just show the relationships • Phylogenetic tree reconstruction algorithms are also an area of research

  44. Applications….

  45. Medical Implications • Pharmacogenomics • Not all drugs work on all patients; some good drugs cause death in some patients • So by doing a gene analysis before treatment, the offending drugs can be avoided • Also, drugs that cause death in most patients can be used on the minority whose genes suit that drug well: volunteers wanted! • Customized treatment • Gene therapy • Replace or supply the defective or missing gene • E.g. insulin, and Factor VIII for haemophilia • Bioweapons (??)

  46. Diagnosis of Disease • Identifying the genes that cause a disease helps detect it at an early stage, e.g. Huntington’s disease • Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment • Death in 10-15 years • The gene responsible for the disease has been identified • It contains excessively repeated sections of CAG • So once the genes have been analyzed, the couple can be counseled
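
Counting repeated CAG sections is a simple pattern search. The sketch below finds the longest uninterrupted run of CAG in a sequence; the function name and the test sequence are made up for illustration, and no clinical threshold is implied.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the number of CAG units in the longest back-to-back CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


print(longest_cag_run("TTCAGCAGCAGGACAGCAGTT"))  # 3
```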

  47. Drug Design • Can take up to 15 years and $700 million • One of the goals of bioinformatics is to reduce the time and cost involved • The process • Discovery • Computational methods can improve this • Testing

  48. Discovery: Target identification • Identifying the molecule on which the germ relies for its survival • Then we develop another molecule, i.e. a drug, which will bind to the target • So the germ will not be able to interact with the target • Proteins are the most common targets

  49. Discovery… • For example, HIV produces HIV protease, a protein which in turn cuts up other proteins • HIV protease has an active site where it binds to other molecules • So an HIV drug will go and bind to that active site • Easier said than done!
