Fundamentals of Bioinformatics Lecture Series

chul

Presentation Transcript


  1. Fundamentals of Bioinformatics, Lecture 2: Biology in Bioinformatics

  2. Biological Background • Basic biological concepts • Biological macromolecules • Biological processes • Experimental techniques in biology • Sequencing: macromolecule sequences • NMR/X-ray techniques: macromolecule structures • Microarrays: gene expression • Yeast two-hybrid (Y2H): protein-protein interactions • Biological data • Sequence data: DNA/RNA/protein sequences • Structure data: 2D/3D structures of proteins and complexes • Expression data: microarray/ChIP-chip data • Biological databases • Literature: PubMed • Nucleic acids: GenBank/EMBL/DDBJ • Genomes: UCSC Genome Browser • Proteins: Genprot/Swiss-Prot • Structures: PDB • Expression: GEO/SMD • Pathways: KEGG • Other specialized databases: FlyBase/MGI/WormBase

  3. Biological Nomenclature • Need to know the meaning of: • Species, organism, cell, nucleus, chromosome, DNA • Genome, gene, base, residue, protein, amino acid • Transcription, translation, messenger RNA • Codons, genetic code, evolution, mutation, crossover • Polymer, genotype, phenotype, conformation • Inheritance, homology, phylogenetic trees

  4. Substructure and Effect [diagram: nested substructures Species > Organism > Cell > Nucleus > Chromosome > DNA strand > Gene > Base; a gene prescribes amino acids, which fold into a protein, which affects the function of the cell and the behaviour of the organism]

  5. Cells • Basic unit of life • Different types of cell: • Skin, brain, red/white blood • Different biological function • Cells produced by cells • Cell division (mitosis) • 2 daughter cells • Eukaryotic cells • Have a nucleus

  6. Nucleus and Chromosomes • Each (eukaryotic) cell has a nucleus • Rod-shaped particles inside it • Are chromosomes • Which we think of in pairs • The number differs between species • Human (46), tobacco (48) • Goldfish (94), chimp (48) • Usually paired up • Sex chromosomes • Humans: male XY, female XX • Birds: male ZZ, female ZW

  7. DNA Strands • Chromosomes are the same in every cell of an organism • They consist of supercoiled DNA (deoxyribonucleic acid) • Take a human, take one cell • Determine the sequence of all its chromosomal DNA • You have just read the human genome (for 1 person) • Human Genome Project • Took 13 years; the human genome has about 3.2 billion bases • Other genomes being/been decoded: • Pufferfish, fruit fly, mouse, chicken, yeast, bacteria

  8. DNA Structure • Double helix (Crick & Watson) • 2 coiled, complementary strands • Backbone of alternating sugar and phosphate groups • Nitrogenous base pairs • Roughly 15 atoms in a base • Adenine pairs with Thymine [A-T] • Cytosine pairs with Guanine [C-G] • Weak hydrogen bonds between the pairs (can be broken) • Each strand is a long chain (a polymer) • Read the sequence on 1 strand • GATTCATCATGGATCATACTAAC
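The pairing rule above (A with T, C with G) is mechanical enough to express as a function. A minimal Python sketch, assuming plain strings over A, C, G and T; `complement` and `reverse_complement` are illustrative names, not a library API. Because the two strands of the helix run in opposite directions, the partner strand is conventionally written as the reverse complement.

```python
# Watson-Crick pairing: A <-> T, C <-> G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Pair every base with its partner, keeping the left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Complement read in the opposite direction (the antiparallel partner strand)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    strand = "GATTCATCATGGATCATACTAAC"  # example strand from the slide
    print(complement(strand))          # CTAAGTAGTACCTAGTATGATTG
    print(reverse_complement(strand))  # GTTAGTATGATCCATGATGAATC
```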

  9. Differences in DNA • DNA differs between: • Species/race/sex • Individuals • We share DNA with • Primates, mammals • Fish, plants, bacteria • Genotype • The DNA of an individual • Its genetic constitution • Phenotype • Characteristics of the resulting organism • Shaped by nature and nurture [diagram labels: "Share Material", "tiny 2%", "Roughly 4%"]

  10. Genes • Chunks of DNA sequence • Coding regions are often quoted at 600 to 1,200 bases, though gene lengths vary widely • About 20,000 protein-coding human genes (older estimates: 32,000); 100,000 genes in tulips • A large percentage (roughly 98%) of the human genome • Does not code for proteins and is often called “junk” • “Simpler” organisms such as bacteria • Have far more compact genomes (hardly any junk) • Viruses have overlapping genes (zipped/compressed) • Often the active part of a gene is split into exons • Separated by introns
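Removing the introns and joining the exons (splicing) amounts to concatenating selected slices of the gene. The sketch below assumes the exon coordinates are already known, as 0-based half-open `(start, end)` intervals; the toy gene and the `splice` helper are hypothetical, with introns written in lower case so their removal is visible.

```python
from typing import List, Tuple


def splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exon segments in positional order, dropping the introns between them."""
    return "".join(gene[start:end] for start, end in sorted(exons))


if __name__ == "__main__":
    # Hypothetical toy gene: upper case = exons, lower case = introns.
    gene = "ATGGCC" + "gtaagtctcag" + "TTTGAA" + "gtgagcattag" + "CGTTAA"
    exons = [(0, 6), (17, 23), (34, 40)]
    print(splice(gene, exons))  # ATGGCCTTTGAACGTTAA
```

Sorting the intervals means the exons can be listed in any order; real pipelines get these coordinates from gene annotations rather than by hand.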

  11. The Synthesis of Proteins • Instructions for generating amino acid sequences • The DNA double helix is unzipped • One strand is transcribed to messenger RNA • The RNA acts as a template • Ribosomes translate the RNA into a sequence of amino acids • Amino acid sequences fold into a 3D molecule • Gene expression • Every cell has every gene in it (has all chromosomes) • Which genes produce proteins (are expressed), and when?

  12. Transcription • Take one strand of DNA • Write out the counterparts to each base • G becomes C (and vice versa) • A becomes T (and vice versa) • Change Thymine [T] to Uracil [U] • You have transcribed DNA into messenger RNA • Example: Start: GGATGCCAATG Intermediate: CCTACGGTTAC Transcribed: CCUACGGUUAC
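The slide's recipe (complement each base, then swap T for U) fits in a few lines of Python. This is a sketch of the slide's simplified model only: in the cell the mRNA is built antiparallel to the template strand, which this left-to-right version ignores. `transcribe` is an illustrative name.

```python
# Slide rule: G <-> C, A <-> T, applied base by base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(dna: str) -> str:
    """Complement each DNA base, then replace thymine (T) with uracil (U)."""
    intermediate = dna.upper().translate(COMPLEMENT)
    return intermediate.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("GGATGCCAATG"))  # CCUACGGUUAC, matching the slide
```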

  13. Genetic Code • How the translation occurs • Think of this as a function: • Input: triples of base letters (codons) • Output: an amino acid • Example: ACC becomes Threonine (T) • Protein-coding sequences end with a stop codon: • TAA, TAG or TGA
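Treating the genetic code as a function suggests a lookup table from codons to amino acids. A minimal sketch, assuming the standard genetic code and a reading frame that starts at the first base; the 64-letter string lists the amino acids for codons enumerated in T, C, A, G order at each position, with `*` marking the stop codons TAA, TAG and TGA. `CODON_TABLE` and `translate` are illustrative names.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code as one-letter amino acid codes; '*' marks a stop codon.
# Entry k belongs to the k-th codon when codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...), i.e. the third base varies fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Map successive codons to amino acids, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(CODON_TABLE["ACC"])         # T (threonine), as on the slide
    print(translate("ATGACCTAAGGG"))  # MT: ATG, ACC, then stop at TAA
```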

  14. Genetic Code A=Ala=Alanine C=Cys=Cysteine D=Asp=Aspartic acid E=Glu=Glutamic acid F=Phe=Phenylalanine G=Gly=Glycine H=His=Histidine I=Ile=Isoleucine K=Lys=Lysine L=Leu=Leucine M=Met=Methionine N=Asn=Asparagine P=Pro=Proline Q=Gln=Glutamine R=Arg=Arginine S=Ser=Serine T=Thr=Threonine V=Val=Valine W=Trp=Tryptophan Y=Tyr=Tyrosine

  15. Example Synthesis • TCGGTGAATCTGTTTGAT Transcribed to: • AGCCACUUAGACAAACUA Translated to: • SHLDKL
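Chaining the two steps reproduces the slide's example end to end. The sketch assumes the slide's simplified transcription (complement each base, with A pairing to U) and the standard genetic code over the RNA alphabet; translation stops at the first stop codon, if any. `synthesize` is a hypothetical helper name.

```python
from itertools import product
from typing import Tuple

# Standard genetic code over the RNA alphabet; codons enumerated in U, C, A, G order.
CODE = dict(
    zip(
        ("".join(c) for c in product("UCAG", repeat=3)),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
)

# Slide-style transcription: complement each DNA base, with A pairing to U.
DNA_TO_MRNA = str.maketrans("ACGT", "UGCA")


def synthesize(dna: str) -> Tuple[str, str]:
    """Return (mRNA, protein) for a DNA strand read from its first base."""
    mrna = dna.upper().translate(DNA_TO_MRNA)
    residues = "".join(CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))
    return mrna, residues.split("*")[0]  # keep residues before the first stop codon


if __name__ == "__main__":
    mrna, protein = synthesize("TCGGTGAATCTGTTTGAT")
    print(mrna)     # AGCCACUUAGACAAACUA
    print(protein)  # SHLDKL
```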

  16. Proteins • DNA codes for • Strings of amino acids • Amino acid strings • Fold up into complex 3D molecules • 3D structures: conformations • Typically between 200 and 400 “residues” • The folded molecules are proteins • A given residue sequence • Always folds to the same conformation • Proteins play a part • In almost every biological process

  17. Evolution of Genes: Inheritance • Evolution of species • Caused by reproduction and survival of the fittest • But actually, it is the genotype which evolves • The organism has to live with it (or die before reproducing) • Three mechanisms: inheritance, mutation and crossover • Inheritance: properties from parents • A human embryo's cells have 23 pairs of chromosomes • Each pair: 1 chromosome from the father, 1 from the mother • Most important factor in the offspring’s genetic makeup
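Inheritance as described above, one chromosome of each pair from each parent, can be sketched as a random draw. This toy model assumes each parent is a list of chromosome pairs with made-up labels, and it deliberately ignores crossover (covered two slides on), so each chromosome is passed on intact. `inherit` is an illustrative name.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def inherit(mother: List[Pair], father: List[Pair], rng: random.Random) -> List[Pair]:
    """Build the child's pairs: one chromosome from the mother's pair, one from the father's."""
    if len(mother) != len(father):
        raise ValueError("parents must have the same number of chromosome pairs")
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]


if __name__ == "__main__":
    # Hypothetical labels: "M3a" is the mother's first copy of chromosome 3, etc.
    mother = [(f"M{i}a", f"M{i}b") for i in range(1, 24)]
    father = [(f"F{i}a", f"F{i}b") for i in range(1, 24)]
    child = inherit(mother, father, random.Random(0))
    print(len(child), child[0])
```

Passing in a seeded `random.Random` keeps runs reproducible, which helps when comparing simulated offspring.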

  18. Evolution of Genes: Mutation • Genes alter (slightly) during reproduction • Caused by copying errors, radiation or toxic chemicals • 3 possibilities: deletion, insertion, substitution • Deletion: ACGTTGACTC → ACGTGACTC • Insertion: ACGTTGACTC → AGCGTTGACTC • Substitution: ACGTTGACTC → ACGATGACTT • Mutations within genes are often deleterious • A single change can have a massive effect on translation • And cause a different protein conformation
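Splitting the slide's sequences into codons shows why a single insertion or deletion is so disruptive: it shifts the reading frame, so every later codon changes, whereas a single substitution alters only the codon it sits in. A minimal sketch reading from the first base; the helper names are illustrative. In the deletion example the shifted frame even produces TGA, one of the stop codons listed on the genetic-code slide.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> List[str]:
    """Split a sequence into complete codons, starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def changed_codons(original: str, mutant: str) -> int:
    """Count codon positions (in the shared frame) whose triplet differs."""
    return sum(a != b for a, b in zip(codons(original), codons(mutant)))


def first_stop(seq: str) -> Optional[int]:
    """Index of the first stop codon in the frame, or None if there is none."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


if __name__ == "__main__":
    original = "ACGTTGACTC"
    for name, mutant in [("deletion", "ACGTGACTC"),
                         ("insertion", "AGCGTTGACTC"),
                         ("substitution", "ACGATGACTT")]:
        print(name, codons(mutant), changed_codons(original, mutant), first_stop(mutant))
```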

  19. Evolution of Genes: Crossover (Recombination) • DNA sections are swapped • From male and female genetic input to offspring DNA
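At its simplest, crossover swaps the tails of two aligned sequences at some point. The sketch below is that simplification (single-point crossover, the form also used in genetic algorithms); real recombination happens between homologous chromosomes during meiosis, often at several points. `crossover` and the toy inputs are illustrative.

```python
from typing import Tuple


def crossover(parent_a: str, parent_b: str, point: int) -> Tuple[str, str]:
    """Single-point crossover: exchange everything after `point` between two equal-length sequences."""
    if len(parent_a) != len(parent_b):
        raise ValueError("sequences must have the same length")
    if not 0 <= point <= len(parent_a):
        raise ValueError("crossover point out of range")
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


if __name__ == "__main__":
    print(crossover("AAAAAAAA", "CCCCCCCC", 3))  # ('AAACCCCC', 'CCCAAAAA')
```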

  20. Phylogenetic trees • Understand our evolution • Genes are homologous • If they share a common ancestor • By looking at DNA seqs • For particular genes • See who evolved from who • Example: • Mammoth most related to • African or Indian Elephants? • LUCA: • Last Universal Common Ancestor • Roughly 4 billion years ago
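Comparing DNA sequences of a particular gene usually starts with a distance between aligned sequences. A minimal sketch, assuming the sequences are already aligned to equal length: the p-distance is the fraction of differing positions, and the closest relative is the sequence with the smallest distance. The toy sequences are hypothetical, not real mammoth or elephant data; real analyses also correct for multiple substitutions (e.g. Jukes-Cantor) and then build a full tree.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_relative(query: str, others: Dict[str, str]) -> str:
    """Name of the sequence with the smallest p-distance to the query."""
    return min(others, key=lambda name: p_distance(query, others[name]))


if __name__ == "__main__":
    # Hypothetical aligned sequences for illustration only.
    query = "ACGTACGTAC"
    others = {"taxon_1": "ACGTACGTTC", "taxon_2": "ACCTTCGAAC"}
    for name, seq in others.items():
        print(name, p_distance(query, seq))
    print("closest:", closest_relative(query, others))
```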

  21. Genetic Disorders • Disorders have fuelled much genetics research • Remember that genes have evolved to function • Not to malfunction • Different types of genetic problems • Down syndrome: three copies of chromosome 21 (trisomy 21) • Cystic fibrosis: • A small mutation (most commonly a 3-base deletion removing one amino acid) disables the CFTR protein • Restricts the flow of chloride ions across the membranes of certain lung cells • The lungs are less able to clear the resulting thick mucus

  22. Predicting Protein Structure • Proteins fold to set up an active site • Small, but highly effective (sub)structure • Active site(s) determine the activity of the protein • Remember that translation is a function • Always same structure given same set of codons • Is there a set of rules governing how proteins fold? • No one has found one yet • “Holy Grail” of bioinformatics

  23. Protein Structure Knowledge • Both protein sequence and structure • Are being determined at an exponential rate • 1.3+ Million protein sequences known • Found with projects like Human Genome Project • 20,000+ protein structures known • Found using techniques like X-ray crystallography • Takes between 1 month and 3 years • To determine the structure of a protein • Process is getting quicker

  24. Sequence versus Structure [chart: number of known protein sequences and protein structures by year, 1985 to 2000]

  25. Database Approaches • Protein structures are found at a slower rate than sequences • Still a good idea to pursue the Holy Grail • Structure is much more conserved than sequence • 1.3 million protein sequences, but only 2,000 to 10,000 different conformations • First approach to structure prediction: • Store [sequence, structure] pairs in a database • Find ways to score the similarity of residue sequences • Given a new sequence, find the closest matches • A good match will possibly mean a similar protein shape • E.g., sequence identity > 35% will give a good match • The rest of the first half of the course covers these issues
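The database approach can be sketched as a nearest-match lookup. Assumptions to flag: similarity is scored as ungapped percent identity over the overlapping positions (real tools align the sequences first, e.g. with BLAST or Smith-Waterman), the 35% cut-off is the slide's rule of thumb, and the database entries, structure labels and helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues, compared position by position without gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def predict_structure(query: str,
                      database: List[Tuple[str, str]],
                      threshold: float = 35.0) -> Optional[str]:
    """Return the structure of the closest [sequence, structure] pair if identity exceeds threshold."""
    best_score, best_structure = 0.0, None
    for sequence, structure in database:
        score = percent_identity(query, sequence)
        if score > best_score:
            best_score, best_structure = score, structure
    return best_structure if best_score > threshold else None


if __name__ == "__main__":
    # Hypothetical [sequence, structure] pairs; the labels stand in for stored 3D structures.
    database = [("MKTAYIAKQR", "structure_A"), ("GSHMLEDPVA", "structure_B")]
    print(predict_structure("MKTAYIAKQK", database))  # structure_A (90% identity)
    print(predict_structure("WWWWWWWWWW", database))  # None (no match above 35%)
```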

  26. Potential (Big) Payoffs of Protein Structure Prediction • Protein function prediction • Protein interactions and docking • Rational drug design • Inhibit or stimulate protein activity with a drug • Systems biology • Putting it all together: “E-cell” and “E-organism” • In-silico modelling of biological entities and processes

  27. Further Reading • Human Genome Project at Sanger Centre • http://www.sanger.ac.uk/HGP/ • Talking glossary of genetic terms • http://www.genome.gov/glossary.cfm • Primer on molecular genetics • http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html
