Introduction to Bioinformatics 2. Genetics Background Course 341 Department of Computing Imperial College, London © Simon Colton
Coursework • 1 coursework – worth 20 marks • Work in pairs • Retrieving information from a database • Using Perl to manipulate that information
The Robot Scientist • Performs experiments • Learns from results • Using machine learning • Plans more experiments • Saves time and money • Team member: • Stephen Muggleton
Biological Nomenclature • Need to know the meaning of: • Species, organism, cell, nucleus, chromosome, DNA • Genome, gene, base, residue, protein, amino acid • Transcription, translation, messenger RNA • Codons, genetic code, evolution, mutation, crossover • Polymer, genotype, phenotype, conformation • Inheritance, homology, phylogenetic trees
Substructure and Effect (Top Down/Bottom Up) • [Diagram] Substructure chain: Species, Organism, Cell, Nucleus, Chromosome, DNA strand, Gene, Base • Also shown: Protein, Amino Acid • Arrow labels: Prescribes, Folds into, Affects the Function of, Affects the Behaviour of
Cells • Basic unit of life • Different types of cell: • Skin, brain, red/white blood • Different biological function • Cells produced by cells • Cell division (mitosis) • 2 daughter cells • Eukaryotic cells • Have a nucleus
Nucleus and Chromosomes • Each cell has a nucleus • Rod-shaped particles inside • Are chromosomes • Which we think of in pairs • Different number for species • Human (46), tobacco (48) • Goldfish (94), chimp (48) • Usually paired up • X & Y Chromosomes • Humans: Male (XY), Female (XX) • Birds: the pattern is reversed (sex chromosomes named Z and W): Male (ZZ), Female (ZW)
DNA Strands • Chromosomes are the same in every cell of an organism • Supercoiled DNA (deoxyribonucleic acid) • Take a human, take one cell • Determine the structure of all chromosomal DNA • You’ve just read the human genome (for 1 person) • Human Genome Project • 13 years, 3.2 billion chemicals (bases) in human genome • Other genomes being/been decoded: • Pufferfish, fruit fly, mouse, chicken, yeast, bacteria
DNA Structure • Double Helix (Crick & Watson) • 2 coiled matching strands • Backbone of sugar phosphate pairs • Nitrogenous Base Pairs • Roughly 20 atoms in a base • Adenine Thymine [A,T] • Cytosine Guanine [C,G] • Weak bonds (can be broken) • Form long chains called polymers • Read the sequence on 1 strand • GATTCATCATGGATCATACTAAC
Differences in DNA • DNA differentiates: • Species/race/gender • Individuals • We share DNA with • Primates, mammals • Fish, plants, bacteria • Genotype • DNA of an individual • Genetic constitution • Phenotype • Characteristics of the resulting organism • Nature and nurture • [Figure: shared genetic material; labels: "tiny", "2%", "Roughly 4%"]
Genes • Chunks of DNA sequence • Between 600 and 1200 bases long • 32,000 human genes, 100,000 genes in tulips • Large percentage of human genome • Is “junk”: does not code for proteins • “Simpler” organisms such as bacteria • Are much more evolved (have hardly any junk) • Viruses have overlapping genes (zipped/compressed) • Often the active part of a gene is split into exons • Separated by introns
The Synthesis of Proteins • Instructions for generating Amino Acid sequences • (i) DNA double helix is unzipped • (ii) One strand is transcribed to messenger RNA • (iii) RNA acts as a template • ribosomes translate the RNA into the sequence of amino acids • Amino acid sequences fold into a 3d molecule • Gene expression • Every cell has every gene in it (has all chromosomes) • Which ones produce proteins (are expressed) & when?
Transcription • Take one strand of DNA • Write out the counterparts to each base • G becomes C (and vice versa) • A becomes T (and vice versa) • Change Thymine [T] to Uracil [U] • You have transcribed DNA into messenger RNA • Example: Start: GGATGCCAATG Intermediate: CCTACGGTTAC Transcribed: CCUACGGUUAC
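The rule on this slide can be sketched in a few lines of Python (the course itself uses Perl; Python is used here only for illustration). The function name `transcribe` is hypothetical, and the sketch follows the slide's simplified convention: it complements the given strand base by base and ignores strand direction.

```python
# Complement of each DNA base, as listed on the slide.
COMPLEMENT = {"G": "C", "C": "G", "A": "T", "T": "A"}

def transcribe(dna: str) -> str:
    """Transcribe one DNA strand into messenger RNA (slide's simplified rule)."""
    # Step 1: write out the counterpart of each base.
    intermediate = "".join(COMPLEMENT[base] for base in dna.upper())
    # Step 2: RNA uses uracil (U) where DNA has thymine (T).
    return intermediate.replace("T", "U")

print(transcribe("GGATGCCAATG"))  # CCUACGGUUAC, matching the slide's example
```

A base other than A, C, G or T raises a `KeyError`, which is a deliberate choice in this sketch so that malformed input is not silently passed through.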
Genetic Code • How the translation occurs • Think of this as a function: • Input: triples of base letters (codons) • Output: amino acid • Example: ACC becomes threonine (T) • Gene sequences end with a stop codon: • TAA, TAG or TGA
Genetic Code • A = Ala = Alanine • C = Cys = Cysteine • D = Asp = Aspartic acid • E = Glu = Glutamic acid • F = Phe = Phenylalanine • G = Gly = Glycine • H = His = Histidine • I = Ile = Isoleucine • K = Lys = Lysine • L = Leu = Leucine • M = Met = Methionine • N = Asn = Asparagine • P = Pro = Proline • Q = Gln = Glutamine • R = Arg = Arginine • S = Ser = Serine • T = Thr = Threonine • V = Val = Valine • W = Trp = Tryptophan • Y = Tyr = Tyrosine
Example Synthesis • TCGGTGAATCTGTTTGAT Transcribed to: • AGCCACUUAGACAAACUA Translated to: • SHLDKL
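The translation step of this example can be checked with a short Python sketch (Python is an illustrative choice; the course uses Perl). The names `CODON_TABLE` and `translate` are hypothetical. The table is the standard genetic code written compactly, with codons enumerated in U, C, A, G order and `*` standing for a stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# "*" marks the three stop codons (UAA, UAG, UGA in mRNA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate mRNA into one-letter amino acid codes, stopping at a stop codon."""
    protein = []
    # Step through the sequence one codon (three bases) at a time.
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

print(translate("AGCCACUUAGACAAACUA"))  # SHLDKL, matching the slide
```

Any trailing bases that do not fill a complete codon are ignored in this sketch.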
Proteins • DNA codes for • Strings of amino acids • Amino acid strings • Fold up into complex 3D molecules • 3D structures: conformations • Between 200 & 400 “residues” • Folds are proteins • Residue sequences • Always fold to same conformation • Proteins play a part • In almost every biological process
Evolution of Genes: Inheritance • Evolution of species • Caused by reproduction and survival of the fittest • But actually, it is the genotype which evolves • Organism has to live with it (or die before reproduction) • Three mechanisms: inheritance, mutation and crossover • Inheritance: properties from parents • Embryo has cells with 23 pairs of chromosomes • Each pair: 1 chromosome from father, 1 from mother • Most important factor in offspring’s genetic makeup
Evolution of Genes: Mutation • Genes alter (slightly) during reproduction • Caused by errors, from radiation, from toxicity • 3 possibilities: deletion, insertion, substitution (alteration) • Deletion: ACGTTGACTC → ACGTGACTC • Insertion: ACGTTGACTC → AGCGTTGACTC • Substitution: ACGTTGACTC → ACGATGACTT • Mutations are almost always deleterious • A single change can have a massive effect on translation • E.g., an insertion or deletion shifts every codon that follows • Causes a different protein conformation
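The three kinds of mutation can be sketched as simple string operations in Python (an illustrative choice; the course uses Perl). The function names and the 0-based positions are assumptions, chosen here only so that the calls reproduce the three example sequences on the slide.

```python
def delete(seq: str, pos: int) -> str:
    """Remove the base at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]

def insert(seq: str, pos: int, base: str) -> str:
    """Insert a base before 0-based position pos."""
    return seq[:pos] + base + seq[pos:]

def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]

original = "ACGTTGACTC"
print(delete(original, 3))        # ACGTGACTC
print(insert(original, 1, "G"))   # AGCGTTGACTC
# The slide's substitution example changes two bases (positions 3 and 9).
print(substitute(substitute(original, 3, "A"), 9, "T"))  # ACGATGACTT
```

Deletion and insertion change the sequence length, so every later codon boundary moves; substitution leaves the length unchanged.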
Evolution of Genes: Crossover (Recombination) • DNA sections are swapped • From male and female genetic input to offspring DNA
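As a rough illustration of swapping sections, the Python sketch below performs a single-point crossover on two equal-length strings. This is a deliberate simplification (the slide does not specify how many sections are swapped or where), the function name `crossover` is hypothetical, and the two parent sequences are made up so that the swap is easy to see.

```python
from typing import Tuple

def crossover(parent_a: str, parent_b: str, point: int) -> Tuple[str, str]:
    """Swap everything after 0-based position `point` between two sequences."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes equal-length sequences")
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

# Made-up parent sequences, chosen only to make the swapped sections visible.
print(crossover("AAAAAAAA", "CCCCCCCC", 3))  # ('AAACCCCC', 'CCCAAAAA')
```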
Bioinformatics Application #1: Phylogenetic Trees • Understand our evolution • Genes are homologous • If they share a common ancestor • By looking at DNA sequences • For particular genes • See who evolved from whom • Example: • Mammoth most related to • African or Indian Elephants? • LUCA: • Last Universal Common Ancestor • Roughly 4 billion years ago
Genetic Disorders • Disorders have fuelled much genetics research • Remember that genes have evolved to function • Not to malfunction • Different types of genetic problems • Down's syndrome: three copies of chromosome 21 • Cystic fibrosis: • Single base-pair mutation disables a protein • Restricts the flow of ions into certain lung cells • Lung is less able to expel fluids
Bioinformatics Application #2: Predicting Protein Structure • Proteins fold to set up an active site • Small, but highly effective (sub)structure • Active site(s) determine the activity of the protein • Remember that translation is a function • Always same structure given same set of codons • Is there a set of rules governing how proteins fold? • No one has found one yet • “Holy Grail” of bioinformatics
Protein Structure Knowledge • Both protein sequence and structure • Are being determined at an exponential rate • 1.3+ Million protein sequences known • Found with projects like Human Genome Project • 20,000+ protein structures known • Found using techniques like X-ray crystallography • Takes between 1 month and 3 years • To determine the structure of a protein • Process is getting quicker
Sequence versus Structure • [Figure: number of known protein sequences and protein structures by year, 1985 to 2000; vertical axis from 0 to 500,000]
Database Approaches • Slow(er) rate of finding protein structure • Still a good idea to pursue the Holy Grail • Structure is much more conservative than sequence • 1.3m genes, but only 2,000 – 10,000 different conformations • First approach to structure prediction: • Store [sequence, structure] pairs in a database • Find ways to score similarity of residue sequences • Given a new sequence, find closest matches • A good match will possibly mean similar protein shape • E.g., sequence identity > 35% will give a good match • Rest of the first half of the course is about these issues
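The database idea can be sketched in Python as follows (an illustrative choice; the course uses Perl). Everything named here is an assumption: `percent_identity`, `closest_match`, the toy `DATABASE` and its structure labels are hypothetical, and the sequences are made up. The sketch also assumes the sequences are already aligned and of equal length, whereas realistic scoring has to deal with gaps and with substitutions between similar residues. Only the 35% threshold comes from the slide.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes non-empty, equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

def closest_match(query, database):
    """Return (sequence, structure, identity) for the best-scoring entry."""
    seq, structure = max(database, key=lambda pair: percent_identity(query, pair[0]))
    return seq, structure, percent_identity(query, seq)

# Hypothetical [sequence, structure] pairs; the labels stand in for real structures.
DATABASE = [("SHLDKL", "structure_1"), ("MKVLAA", "structure_2")]

seq, structure, identity = closest_match("SHIDKL", DATABASE)
if identity > 35.0:  # threshold quoted on the slide
    print(f"{structure} is a candidate shape ({identity:.1f}% identity to {seq})")
```

A linear scan over the whole database is the simplest possible search; it is used here only to make the scoring step visible.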
Potential (Big) Payoffs of Protein Structure Prediction • Protein function prediction • Protein interactions and docking • Rational drug design • Inhibit or stimulate protein activity with a drug • Systems biology • Putting it all together: “E-cell” and “E-organism” • In-silico modelling of biological entities and processes
Further Reading • Human Genome Project at Sanger Centre • http://www.sanger.ac.uk/HGP/ • Talking glossary of genetic terms • http://www.genome.gov/glossary.cfm • Primer on molecular genetics • http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html