Bioinformatics

Bioinformatics. Gil McVean, Department of Statistics. What is it to be a human? What is it to be an individual? Photos from UN photo gallery www.un.org/av/photo. Is it your genes? Is it your transcripts? Is it your proteins? Is it your protein interactions? Is it your systems?

Presentation Transcript


  1. Bioinformatics
     Gil McVean
     Department of Statistics

  2. What is it to be a human?

  3. What is it to be an individual?
     Photos from UN photo gallery www.un.org/av/photo

  4. Is it your genes?

  5. Is it your transcripts?

  6. Is it your proteins?

  7. Is it your protein interactions?

  8. Is it your systems?

  9. Bioinformatics and genome biology
     • Bioinformatics is the analytical wing of genome biology
     • It concerns itself with large amounts of data (more than you can look at!)
     • It uses computers and efficient algorithms
     • It is
        • Data assembly
        • Data summary
        • Data modelling
        • Data analysis

  10. The raw material

  11. The output

  12. Classical bioinformatics I: DNA and protein sequence alignment

  13. Classical bioinformatics II: Genome assembly

  14. Classical bioinformatics III: Gene finding

  15. Classical bioinformatics IV: Protein structure prediction

  16. Bioinformatics of genetic variation
     • An area of considerable current attention is human genetic variation
     • The aim of current experiments is to map the genetic basis of human phenotypic variation
        • Disease susceptibility
        • Normal variation
     • It is challenging because of
        • The scale of the data
        • The structure of the data
        • The underlying processes that shape variation
     • Bioinformatics is needed to
        • Assemble, collate, check and summarise data
        • Model the data
        • Make inferences

  17. What does the data look like?
     • Single Nucleotide Polymorphisms (SNPs)
     • Insertion-Deletion Polymorphisms (INDELs)

        TGCTTGGCAGGGCAGACTGACTGT
        TGCTTGGCAGGGCAGACTGACTGT
        TGCATGGCAGGGCAG-CTGACTGT
        TGCATGGCAGGGCAG-CTGACTGT
        TGCATGGCAGGGCAGACTGACTGT
        TGCATGGCAGGGCAGACTGACTGT
           ^           ^
           SNP         INDEL
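The two variant types in the slide's alignment can be picked out column by column. The sketch below is only an illustration, not a method from the lecture: the function name `classify_columns` is hypothetical, and it assumes the sequences are already aligned and that a `-` marks a gap.

```python
from collections import Counter

# The six aligned sequences shown on the slide ("-" marks a gap).
ALIGNMENT = [
    "TGCTTGGCAGGGCAGACTGACTGT",
    "TGCTTGGCAGGGCAGACTGACTGT",
    "TGCATGGCAGGGCAG-CTGACTGT",
    "TGCATGGCAGGGCAG-CTGACTGT",
    "TGCATGGCAGGGCAGACTGACTGT",
    "TGCATGGCAGGGCAGACTGACTGT",
]


def classify_columns(alignment):
    """Return (1-based position, kind, allele counts) for each variable column."""
    variants = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        if len(counts) == 1:
            continue  # every sequence agrees: not a polymorphism
        # A column containing a gap is treated as an INDEL, otherwise as a SNP.
        kind = "INDEL" if "-" in counts else "SNP"
        variants.append((index + 1, kind, dict(counts)))
    return variants


VARIANTS = classify_columns(ALIGNMENT)
for position, kind, counts in VARIANTS:
    print(position, kind, counts)
```

On this alignment the sketch reports a SNP at column 4 (T in two sequences, A in four) and an INDEL at column 16 (A in four sequences, a gap in two). Real variant calling works from sequencing reads and quality scores rather than a clean alignment, so this shows the definitions only.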

  18. Collections of SNPs
     (Figure labels: HCB, JPT, YRI, CEU, SNP)

  19. Engineering challenges
     • Identifying SNPs
     • Working out which SNPs will work on a given platform
     • Controlling the genotyping work-flow
     • Controlling the output quality
     • Performing quality-assurance exercises
     • Identifying problems, gaps and inconsistencies

  20. A Bioinformatics problem: How small is my P-value?
     • The basic idea of association studies is to look for genetic differences between groups
     It is easy to ask the question “Is there a significant difference in the frequency of a mutation between groups?”
     (Figure labels: Cases (D), Controls (C), Locus of interest)
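The question of whether a mutation's frequency differs between cases and controls is often put to a 2x2 table of allele counts. The slides do not name a test, so the sketch below swaps in Pearson's chi-square with one degree of freedom as an assumption. The function name and the counts in the usage lines are hypothetical, and the code assumes every row and column total is non-zero.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic and P-value (1 d.f.) for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele frequency were equal in both groups.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 d.f., written with erfc.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 60 of 200 case alleles and 40 of 200 control alleles carry the mutation.
STATISTIC, P_VALUE = allele_chi_square(60, 140, 40, 160)
print(f"chi-square = {STATISTIC:.2f}, P = {P_VALUE:.3f}")
```

A single test like this is the easy part. It ignores the multiple-testing and population-structure problems that the following slides raise.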

  21. The problems
     • In a study of several hundred thousand mutations (or even millions) it is unlikely that we have actually typed the causal variant(s)
     • In a study of several hundred thousand mutations (or even millions), even if NONE of them are causal, a lot of them will show significance at the 5%, 1% or even 0.01% level
     • Differences in the frequency of disease incidence between groups (for example African Americans and European Americans) will be associated with ANY genetic difference between them
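The second point can be made concrete with a little arithmetic. If no mutation is causal, each test's P-value is uniform on [0, 1], so a fraction alpha of the tests is expected to fall below any threshold alpha. The sketch below assumes a study of 500,000 mutations, a figure picked to match "several hundred thousand" and not taken from the slides. It also shows the Bonferroni per-test threshold, a standard correction that the slides do not mention.

```python
N_TESTS = 500_000                 # assumed study size, not a figure from the slides
LEVELS = [0.05, 0.01, 0.0001]     # the 5%, 1% and 0.01% levels


def expected_false_positives(n_tests, alpha):
    """Expected number of tests with P < alpha when no mutation is causal."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below family_alpha."""
    return family_alpha / n_tests


for alpha in LEVELS:
    count = expected_false_positives(N_TESTS, alpha)
    print(f"alpha = {alpha:g}: about {count:,.0f} false positives expected")

print(f"Bonferroni per-test threshold: {bonferroni_threshold(N_TESTS):.1e}")
```

Under these assumptions, roughly 25,000, 5,000 and 50 mutations pass the three levels by chance alone. The expectation holds whether or not the tests are independent, but Bonferroni becomes conservative when neighbouring mutations are correlated.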

  22. What we really want to ask
     • “Does any of the genome show an association with disease over and above any effect I might expect from the correlation between genotype and environmental risk?”
     • “If so, what is the most likely position for the causal mutation(s)?”
     • Answering these questions is difficult, but a natural way to approach the problem is to model the process

  23. Modelling genetic variation
     (Diagram labels: Evolutionary parameters; MODEL: Stochastic evolutionary process; Population; MODEL: Stochastic sampling process; Sample; Inference)
     • Selection
     • Mutation
     • Genetic drift
     • Recombination
     • Migration

        ATGCATGGGCTATTGGACCT
        ATGGATGGGCTATTGCACCT
        ATGCATGGGCAATTGCACCT
        ATGCATGGGCAATTGGACCT
        ATGGATGGGCTATTGCACCT

  24. Genes in populations
     (Figure label: Present day)

  25. Ancestry of current population
     (Figure label: Present day)

  26. Ancestry of sample
     (Figure label: Present day)

  27. The coalescent: samples in populations
     (Figure labels: Most recent common ancestor (MRCA), coalescence, Ancestral lineages, Present day, time)
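The picture of ancestral lineages merging back to an MRCA can be simulated directly. The sketch below assumes the standard neutral coalescent for a constant-size population without recombination, which is more specific than anything the slide states. Time is measured in units of 2N generations for a diploid population of size N. The function names are hypothetical.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Simulate the time back to the MRCA of n_samples lineages.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k(k-1)/2 (time in units of 2N generations).
    """
    time = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0
        time += rng.expovariate(rate)  # each coalescence merges two lineages
    return time


def expected_tmrca(n_samples):
    """Closed-form expectation under the same model: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n_samples)


rng = random.Random(1)  # fixed seed so the run is repeatable
N_SAMPLES = 10
REPLICATES = 20_000
MEAN_TMRCA = sum(simulate_tmrca(N_SAMPLES, rng) for _ in range(REPLICATES)) / REPLICATES
print(f"simulated mean TMRCA: {MEAN_TMRCA:.3f}")
print(f"expected TMRCA:       {expected_tmrca(N_SAMPLES):.3f}")
```

The simulated mean should sit close to the closed-form value. The last two lineages alone account for an expected 1 time unit, more than half of the total, so the deepest part of the tree dominates its height.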

  28. How does this help us to think about mapping disease?
     • Individuals are related to each other through their genealogical history
     • Two nearby points on the genome will have similar genealogical histories, so mutations at these positions will also be correlated
     • Understanding how genealogical history changes along the genome (through recombination) and between populations (through historical demography) will allow us to
        • Construct more powerful tests for disease association
        • Localise disease-associated mutations
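Correlation between mutations at two positions is commonly summarised by the squared correlation coefficient r² between alleles (linkage disequilibrium). The slides do not name a statistic, so r² is an assumption here. The sketch applies it to the five sample sequences from the "Modelling genetic variation" slide. The function names are hypothetical, and five haplotypes are far too few to draw conclusions from, so this shows the mechanics only.

```python
from itertools import combinations

# The five sample sequences from the "Modelling genetic variation" slide.
HAPLOTYPES = [
    "ATGCATGGGCTATTGGACCT",
    "ATGGATGGGCTATTGCACCT",
    "ATGCATGGGCAATTGCACCT",
    "ATGCATGGGCAATTGGACCT",
    "ATGGATGGGCTATTGCACCT",
]


def segregating_sites(haplotypes):
    """0-based indices of columns that carry exactly two alleles."""
    return [i for i, col in enumerate(zip(*haplotypes)) if len(set(col)) == 2]


def r_squared(haplotypes, i, j):
    """Squared allelic correlation between biallelic sites i and j."""
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]  # arbitrary reference alleles
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


SITES = segregating_sites(HAPLOTYPES)
LD = {(i + 1, j + 1): r_squared(HAPLOTYPES, i, j) for i, j in combinations(SITES, 2)}
for (site_i, site_j), value in LD.items():
    print(f"sites {site_i} and {site_j}: r^2 = {value:.3f}")
```

The value of r² does not depend on which allele is taken as the reference at each site, which is why the first haplotype's alleles can be used arbitrarily.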

  29. The bioinformatics module
     • Genomic technologies
     • Annotating genomes
     • Modelling gene evolution
     • Mapping disease genes
     • Measuring gene and protein expression
     • Predicting protein structure
