BIOINFORMATICS AND GENE DISCOVERY

UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL. Bioinformatics Tutorials. Iosif Vaisman. 1998.

Presentation Transcript


  1. UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL. Bioinformatics Tutorials. BIOINFORMATICS AND GENE DISCOVERY. Iosif Vaisman, 1998

  2. From genes to proteins

  3. From genes to proteins: DNA (PROMOTER ELEMENTS) -> TRANSCRIPTION -> RNA (SPLICE SITES) -> SPLICING -> mRNA (START CODON, STOP CODON) -> TRANSLATION -> PROTEIN
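The flow on this slide can be illustrated with a toy sketch. It assumes the sequence is written in the DNA alphabet (T rather than U), that exon coordinates are already known, and that translation starts at the first ATG; the function names `splice` and `translate` and the example sequence are made up for illustration and do not come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def splice(pre_mrna, exons):
    """Join the exon intervals (half-open, 0-based) and drop the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Translate from the first start codon (ATG) to the first stop codon."""
    start = mrna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical transcript: exon 1, one intron (GT...AG), exon 2.
pre_mrna = "CCATGGCC" + "GTAAGTCAG" + "AAATGAGG"
mrna = splice(pre_mrna, [(0, 8), (17, 25)])
protein = translate(mrna)
```

Real gene finders have to infer the exon coordinates that this sketch takes as given.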

  4. From genes to proteins

  5. Comparative Sequence Sizes (bases)
     • Yeast chromosome 3: 350,000
     • Escherichia coli (bacterium) genome: 4,600,000
     • Largest yeast chromosome now mapped: 5,800,000
     • Entire yeast genome: 15,000,000
     • Smallest human chromosome (Y): 50,000,000
     • Largest human chromosome (1): 250,000,000
     • Entire human genome: 3,000,000,000

  6. Low-resolution physical map of chromosome 19

  7. Chromosome 19 gene map

  8. Computational Gene Prediction
     • Where are genes unlikely to be located?
     • How do transcription factors know where to bind a region of DNA?
     • Where are the transcription, splicing, and translation start and stop signals?
     • What do coding regions do (that non-coding regions do not)?
     • Can we learn from examples?
     • Does this sequence look familiar?

  9. Artificial Intelligence in Biosciences
     • Neural Networks (NN)
     • Genetic Algorithms (GA)
     • Hidden Markov Models (HMM)
     • Stochastic context-free grammars (SCFG)

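  10. Information Theory: a choice between two equally likely alternatives (0 or 1) carries 1 bit

  11. Information Theory: a choice among four alternatives (00, 01, 11, 10) takes two binary choices of 1 bit each

  12. Information Theory (figure): 1 bit, 1 bit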
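The bit counts on the three slides above can be checked with a short sketch. It assumes equally likely alternatives for `bits_for_choices` and uses observed symbol frequencies for `shannon_entropy`; both function names are illustrative and not from the slides.

```python
import math
from collections import Counter


def bits_for_choices(n_alternatives):
    """Bits needed to pick one of n equally likely alternatives."""
    return math.log2(n_alternatives)


def shannon_entropy(sequence):
    """Average information per symbol, in bits, from observed frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


one_choice = bits_for_choices(2)      # 0 or 1
two_choices = bits_for_choices(4)     # 00, 01, 11, 10
dna_uniform = shannon_entropy("ACGT") # four equally frequent bases
```

Under these assumptions a DNA base drawn uniformly from A, C, G, T carries 2 bits, and a sequence made of a single repeated base carries none.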

  13. Scientific Models (diagram): Physical models -- Mathematical models. Stochastic models; Mechanistic models. Diagram labels: Mechanism, Black box, Predictive power, Elegance, Consistency, Predictive power. Hidden Markov models: stochastic mechanism

  14. Neural Networks
     • an interconnected assembly of simple processing elements (units or nodes)
     • the functionality of a node is similar to that of an animal neuron
     • processing ability is stored in the inter-unit connection strengths (weights)
     • weights are obtained by a process of adaptation to, or learning from, a set of training patterns
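As a minimal sketch of "weights obtained by learning from training patterns", the following trains a single unit with the classic perceptron rule on the logical AND function. The choice of the perceptron rule, the AND task, and all names here are assumptions made for illustration; the slides do not specify a network or a training method.

```python
def step(activation):
    """Threshold unit: fires (1) when the weighted input is non-negative."""
    return 1 if activation >= 0 else 0


def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Adjust the connection strengths after every misclassified pattern."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training patterns for logical AND (linearly separable, so the rule converges).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
outputs = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
```

After training, everything the unit "knows" about AND is held in `weights` and `bias`, which is the point made in the third bullet.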

  15. Genetic Algorithms: search or optimization methods using simulated evolution. A population of potential solutions is subjected to natural selection, crossover, and mutation.
        choose initial population
        evaluate each individual's fitness
        repeat
            select individuals to reproduce
            mate pairs at random
            apply crossover operator
            apply mutation operator
            evaluate each individual's fitness
        until terminating condition
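The loop above can be sketched in a few lines. This version makes several assumptions that are not in the slides: individuals are bit lists, fitness is the number of 1s (the "OneMax" toy problem), selection is a two-way tournament, the terminating condition is a fixed number of generations, and the best individual seen so far is remembered separately.

```python
import random


def fitness(individual):
    return sum(individual)  # toy objective: count of 1 bits


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))  # random crossover point
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual, rate, rng):
    return [1 - gene if rng.random() < rate else gene for gene in individual]


def run_ga(length=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    # choose initial population
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):  # until terminating condition
        # select individuals to reproduce (tournament of two)
        parents = [max(rng.sample(population, 2), key=fitness)
                   for _ in range(pop_size)]
        rng.shuffle(parents)  # mate pairs at random
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = children
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
        history.append(fitness(best))
    return best, history


best, history = run_ga()
```

`history` records the best fitness seen after each generation, so it can only stay level or rise.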

  16. Crossover and Mutation (diagram). Crossover: Parent A, Parent B, crossover point, Child AB, Child BA. Mutation
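The two operators in this diagram can be shown on strings with a fixed crossover point, which makes the result easy to read. The function names and the example parents are hypothetical.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length parents at the crossover point."""
    child_ab = parent_a[:point] + parent_b[point:]
    child_ba = parent_b[:point] + parent_a[point:]
    return child_ab, child_ba


def point_mutation(individual, position, new_value):
    """Replace the single element at the given position."""
    return individual[:position] + new_value + individual[position + 1:]


child_ab, child_ba = single_point_crossover("AAAAAA", "BBBBBB", 2)
mutant = point_mutation(child_ab, 0, "C")
```

Child AB takes its head from Parent A and its tail from Parent B; Child BA is the mirror image.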

  17. Markov Model (or Markov Chain). Sequence: A T C T A G. The probability of each character depends only on the few characters that precede it in the sequence. Number of preceding characters = order of the Markov model. Probability of a sequence: P(s) = P[A] P[A,T] P[A,T,C] P[T,C,T] P[C,T,A] P[T,A,G]
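The product on this slide can be sketched as a second-order model estimated by counting. This reads each factor such as P[A,T,C] as the probability of C given the preceding characters A, T, which is an interpretation of the slide's notation; the training sequences and function names are invented for illustration.

```python
from collections import defaultdict


def train_markov(sequences, order=2):
    """Estimate P(next character | up to `order` preceding characters)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, char in enumerate(seq):
            context = seq[max(0, i - order):i]  # shorter at the sequence start
            counts[context][char] += 1
    return {
        context: {char: n / sum(following.values())
                  for char, n in following.items()}
        for context, following in counts.items()
    }


def sequence_probability(seq, model, order=2):
    """Multiply the conditional probabilities along the sequence."""
    probability = 1.0
    for i, char in enumerate(seq):
        context = seq[max(0, i - order):i]
        probability *= model.get(context, {}).get(char, 0.0)
    return probability


model = train_markov(["ATCTAG", "ATGTAG"])
p_seen = sequence_probability("ATCTAG", model)
p_unseen = sequence_probability("ATTTAG", model)
```

A transition never seen in training gets probability zero here; practical models usually add pseudocounts to avoid that.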

  18. Hidden Markov Models. States -- well-defined conditions. Edges -- transitions between the states. Example sequences: ATGAC, ATTAC, ACGAC, ACTAC. Each transition is assigned a probability. Probability of the sequence: single path with the highest probability -- Viterbi path; sum of the probabilities over all paths -- forward algorithm (the sum used in Baum-Welch training)

  19. Hidden Markov Model of Biased Coin Tosses
     • States (Si): two biased coins {C1, C2}
     • Outputs (Oj): two possible outputs {H, T}
     • p(Output Oij): p(C1, H), p(C1, T), p(C2, H), p(C2, T)
     • Transitions: from state X to Y {A11, A22, A12, A21}
     • p(Initial Si): p(I, C1), p(I, C2)
     • p(End Si): p(C1, E), p(C2, E)
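A minimal sketch of the two-coin model follows. All probability values are made up for illustration, and the explicit end-state probabilities p(C1, E) and p(C2, E) from the slide are left out for brevity. `forward` sums over all state paths, and `viterbi` returns the single most probable path.

```python
STATES = ("C1", "C2")
START = {"C1": 0.5, "C2": 0.5}                      # p(I, C1), p(I, C2)
TRANS = {"C1": {"C1": 0.9, "C2": 0.1},              # A11, A12
         "C2": {"C1": 0.1, "C2": 0.9}}              # A21, A22
EMIT = {"C1": {"H": 0.5, "T": 0.5},                 # fair coin
        "C2": {"H": 0.9, "T": 0.1}}                 # coin biased toward heads


def forward(observations):
    """Probability of the observations, summed over all state paths."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for symbol in observations[1:]:
        alpha = {s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())


def viterbi(observations):
    """Most probable state path and its probability."""
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for symbol in observations[1:]:
        step = {}
        for s in STATES:
            prob, path = max(
                ((p * TRANS[r][s], path_r) for r, (p, path_r) in best.items()),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMIT[s][symbol], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


path_probability, path = viterbi("HHHH")
total_probability = forward("HHHH")
```

With these assumed numbers a run of heads is best explained by staying on the biased coin C2, and the Viterbi probability is never larger than the forward probability because it is one term of that sum.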

  20. Hidden Markov Model for Exon and Stop Codon (VEIL Algorithm)

  21. GRAIL gene identification program (diagram): POSSIBLE EXONS, REFINED EXON POSITIONS, FINAL EXON CANDIDATES

  22. Suboptimal Solutions for the Human Growth Hormone Gene (GeneParser)

  23. Measures of Prediction Accuracy: Nucleotide Level. Each nucleotide is coding (c) or non-coding (nc) in REALITY and in the PREDICTION, giving TP, FP, TN, and FN counts. Sensitivity: Sn = TP / (TP + FN). Specificity: Sp = TP / (TP + FP)
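The nucleotide-level measures can be computed directly from two aligned annotation strings. The encoding ("C" for coding, "N" for non-coding) and the example strings are assumptions for illustration. Note that Sp as defined on this slide, TP / (TP + FP), is what other fields call precision.

```python
def nucleotide_accuracy(actual, predicted):
    """Return (Sn, Sp) from per-nucleotide coding / non-coding labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == "C" and p == "C")
    fn = sum(1 for a, p in pairs if a == "C" and p == "N")
    fp = sum(1 for a, p in pairs if a == "N" and p == "C")
    sensitivity = tp / (tp + fn)   # Sn = TP / (TP + FN)
    specificity = tp / (tp + fp)   # Sp = TP / (TP + FP)
    return sensitivity, specificity


# Hypothetical annotations, one character per nucleotide.
reality    = "NNCCCCNNNN"
prediction = "NCCCNNNNCN"
sn, sp = nucleotide_accuracy(reality, prediction)
```

This sketch assumes at least one real and one predicted coding nucleotide; otherwise the denominators are zero.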

  24. Measures of Prediction Accuracy: Exon Level. Sensitivity: Sn = number of correct exons / number of actual exons. Specificity: Sp = number of correct exons / number of predicted exons. Diagram (REALITY vs. PREDICTION): MISSING EXON, WRONG EXON, CORRECT EXON
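A sketch of the exon-level measures follows. It assumes exons are (start, end) intervals, that a "correct" exon matches both boundaries exactly, that a "missing" exon is a real exon overlapping no prediction, and that a "wrong" exon is a predicted exon overlapping no real exon. These definitions and the coordinates are assumptions, since the slide gives only the labels.

```python
def overlaps(a, b):
    """True when two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def exon_accuracy(actual, predicted):
    correct = set(actual) & set(predicted)      # both boundaries match
    sensitivity = len(correct) / len(set(actual))
    specificity = len(correct) / len(set(predicted))
    missing = [e for e in actual if not any(overlaps(e, p) for p in predicted)]
    wrong = [p for p in predicted if not any(overlaps(p, e) for e in actual)]
    return sensitivity, specificity, missing, wrong


# Hypothetical exon coordinates.
actual_exons = [(100, 200), (300, 400), (500, 600)]
predicted_exons = [(100, 200), (310, 400), (700, 800)]
sn, sp, missing, wrong = exon_accuracy(actual_exons, predicted_exons)
```

The second predicted exon overlaps a real one but has a shifted boundary, so under these assumptions it counts as neither correct, missing, nor wrong.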

  25. GeneMark Accuracy Evaluation

  26. Bibliography: http://linkage.rockefeller.edu/wli/gene/list.html and http://www-hto.usc.edu/software/procrustes/fans_ref/. Gene Discovery Exercise: http://metalab.unc.edu/pharmacy/Bioinfo/Gene