Introductory Bioinformatics Lecture Series, Lecture 2: Biology in Bioinformatics
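Biology Fundamentals • Basic biological concepts • Biological macromolecules • Biological processes • Biological experimental techniques • Sequencing: macromolecular sequences • NMR/X-ray: macromolecular structures • Microarray: gene expression • Yeast two-hybrid (Y2H): protein-protein interactions • Biological data • Sequence data: DNA/RNA/protein sequences • Structure data: 2D/3D structures of proteins and complexes • Expression data: microarray/ChIP-chip data • Biological databases • Literature database: PubMed • Nucleic acid databases: GenBank/EMBL/DDBJ • Genome database: UCSC Genome Browser • Protein databases: Genprot/Swissprot • Structure database: PDB • Expression databases: GEO/SMD • Pathway database: KEGG • Other specialized databases: FlyBase/MGI/WormBase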
Biological Nomenclature • Need to know the meaning of: • Species, organism, cell, nucleus, chromosome, DNA • Genome, gene, base, residue, protein, amino acid • Transcription, translation, messenger RNA • Codons, genetic code, evolution, mutation, crossover • Polymer, genotype, phenotype, conformation • Inheritance, homology, phylogenetic trees
Substructure and Effect [diagram] • Substructure, largest to smallest: Species, Organism, Cell, Nucleus, Chromosome, DNA strand, Gene, Base • Effect nodes: Amino Acid, Protein • Effect arrow labels: "Prescribes", "Folds into", "Affects the Function of", "Affects the Behaviour of"
Cells • Basic unit of life • Different types of cell: • Skin, brain, red/white blood • Different biological function • Cells produced by cells • Cell division (mitosis) • 2 daughter cells • Eukaryotic cells • Have a nucleus
Nucleus and Chromosomes • Each eukaryotic cell has a nucleus • Rod-shaped particles inside • Are chromosomes • Which we think of in pairs • Different number for each species • Human (46), tobacco (48) • Goldfish (94), chimp (48) • Usually paired up • X & Y Chromosomes • Humans: Male (XY), Female (XX) • Birds: the pattern is reversed and the sex chromosomes are called Z and W: Male (ZZ), Female (ZW)
DNA Strands • Chromosomes are the same in every cell of an organism • Supercoiled DNA (deoxyribonucleic acid) • Take a human, take one cell • Determine the structure of all chromosomal DNA • You’ve just read the human genome (for 1 person) • Human Genome Project • 13 years, 3.2 billion chemical units (base pairs) in the human genome • Other genomes being/been decoded: • Pufferfish, fruit fly, mouse, chicken, yeast, bacteria
DNA Structure • Double Helix (Crick & Watson) • 2 coiled matching strands • Backbone of sugar-phosphate units • Nitrogenous base pairs • Roughly 20 atoms in a base • Adenine pairs with Thymine [A, T] • Cytosine pairs with Guanine [C, G] • Paired bases are held by weak hydrogen bonds (can be broken) • Strands form long chains called polymers • Read the sequence on 1 strand • GATTCATCATGGATCATACTAAC
Differences in DNA • DNA differentiates: • Species/race/gender • Individuals • We share DNA with • Primates, mammals • Fish, plants, bacteria • Genotype • DNA of an individual • Genetic constitution • Phenotype • Characteristics of the resulting organism • Nature and nurture • [Figure labels: "Share Material", "Roughly 4%", "tiny 2%"]
Genes • Chunks of DNA sequence • Typically quoted as between 600 and 1200 bases long • 32,000 human genes (an early estimate; current counts are closer to 20,000 protein-coding genes), 100,000 genes in tulips • Large percentage of human genome • Is “junk”: does not code for proteins • “Simpler” organisms such as bacteria • Have much more compact genomes (hardly any junk) • Viruses have overlapping genes (zipped/compressed) • Often the active part of a gene is split into exons • Separated by introns
The Synthesis of Proteins • Genes are instructions for generating amino acid sequences • DNA double helix is unzipped • One strand is transcribed to messenger RNA • RNA acts as a template • Ribosomes translate the RNA into a sequence of amino acids • Amino acid sequences fold into a 3D molecule • Gene expression • Every cell has every gene in it (has all chromosomes) • Which ones produce proteins (are expressed) & when?
Transcription • Take one strand of DNA • Write out the counterpart to each base • G becomes C (and vice versa) • A becomes T (and vice versa) • Change Thymine [T] to Uracil [U] • You have transcribed DNA into messenger RNA • Example: • Start: GGATGCCAATG • Intermediate: CCTACGGTTAC • Transcribed: CCUACGGUUAC
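The recipe on this slide can be written down almost literally. The following is a minimal Python sketch, not part of the original lecture; the names `DNA_COMPLEMENT`, `complement` and `transcribe` are illustrative, and like the slide it ignores strand direction and everything else about real transcription.

```python
# Base pairing as listed on the slide: G <-> C and A <-> T.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna: str) -> str:
    """Write out the counterpart of each base (the slide's 'Intermediate')."""
    return "".join(DNA_COMPLEMENT[base] for base in dna)


def transcribe(dna: str) -> str:
    """Complement the strand, then change Thymine (T) to Uracil (U)."""
    return complement(dna).replace("T", "U")


start = "GGATGCCAATG"
print("Start:       ", start)
print("Intermediate:", complement(start))
print("Transcribed: ", transcribe(start))
```

Run on the slide's start sequence, the two functions give the same intermediate and transcribed strings as the slide.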
Genetic Code • How the translation occurs • Think of this as a function: • Input: triples of base letters (codons) • Output: amino acid • Example: ACC becomes Threonine (T) • Gene sequences end with a stop codon: • TAA, TAG or TGA (UAA, UAG or UGA in the messenger RNA)
Genetic Code: amino acid codes • A = Ala = Alanine • C = Cys = Cysteine • D = Asp = Aspartic acid • E = Glu = Glutamic acid • F = Phe = Phenylalanine • G = Gly = Glycine • H = His = Histidine • I = Ile = Isoleucine • K = Lys = Lysine • L = Leu = Leucine • M = Met = Methionine • N = Asn = Asparagine • P = Pro = Proline • Q = Gln = Glutamine • R = Arg = Arginine • S = Ser = Serine • T = Thr = Threonine • V = Val = Valine • W = Trp = Tryptophan • Y = Tyr = Tyrosine
Example Synthesis • TCGGTGAATCTGTTTGAT • Transcribed to: AGCCACUUAGACAAACUA • Translated to: SHLDKL
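The worked example can be reproduced with a short Python sketch that treats the genetic code as a lookup function from codons to one-letter amino acid codes. This is an illustration rather than the lecture's own code: the table is the standard genetic code written compactly in U, C, A, G order, and the names `CODON_TABLE`, `transcribe` and `translate` are assumptions made for this example.

```python
# Standard genetic code, listed in U, C, A, G order for each of the
# three codon positions; "*" marks the stop codons UAA, UAG and UGA.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Slide-style transcription: complement each base, writing U for T."""
    return dna.translate(str.maketrans("ACGT", "UGCA"))


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time and stop at a stop codon."""
    residues = []
    for pos in range(0, len(mrna) - len(mrna) % 3, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


mrna = transcribe("TCGGTGAATCTGTTTGAT")
print(mrna)             # AGCCACUUAGACAAACUA
print(translate(mrna))  # SHLDKL
```

Trailing bases that do not fill a whole codon are ignored here, which is a simplification chosen for the sketch.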
Proteins • DNA codes for • Strings of amino acids • Amino acid strings • Fold up into a complex 3D molecule • 3D structures: conformations • Typically between 200 & 400 “residues” • The folded strings are proteins • Residue sequences • Always fold to the same conformation • Proteins play a part • In almost every biological process
Evolution of Genes: Inheritance • Evolution of species • Caused by reproduction and survival of the fittest • But actually, it is the genotype which evolves • Organism has to live with it (or die before reproduction) • Three mechanisms: inheritance, mutation and crossover • Inheritance: properties from parents • Embryo has cells with 23 pairs of chromosomes • Each pair: 1 chromosome from father, 1 from mother • Most important factor in offspring’s genetic makeup
Evolution of Genes: Mutation • Genes alter (slightly) during reproduction • Caused by copying errors, radiation, toxicity • 3 possibilities: deletion, insertion, substitution (alteration) • Deletion: ACGTTGACTC → ACGTGACTC • Insertion: ACGTTGACTC → AGCGTTGACTC • Substitution: ACGTTGACTC → ACGATGACTT • Mutations that have an effect are usually deleterious • A single change can have a massive effect on translation • Can cause a different protein conformation
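The three mutation types are simple string edits. A small Python sketch, assuming 0-based positions and using illustrative function names that do not come from the lecture, reproduces the three examples on this slide:

```python
def delete_at(seq: str, pos: int) -> str:
    """Deletion: remove the base at position pos (0-based)."""
    return seq[:pos] + seq[pos + 1:]


def insert_at(seq: str, pos: int, base: str) -> str:
    """Insertion: put a new base in front of position pos."""
    return seq[:pos] + base + seq[pos:]


def substitute_at(seq: str, pos: int, base: str) -> str:
    """Substitution: replace the base at position pos."""
    return seq[:pos] + base + seq[pos + 1:]


original = "ACGTTGACTC"
print(delete_at(original, 3))        # ACGTGACTC
print(insert_at(original, 1, "G"))   # AGCGTTGACTC
# The slide's substitution example changes two positions.
print(substitute_at(substitute_at(original, 3, "A"), 9, "T"))  # ACGATGACTT
```

Because codons are read in threes, a single deletion or insertion shifts every codon after it, which is one reason a small change can have a large effect on translation.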
Evolution of Genes: Crossover (Recombination) • DNA sections are swapped • From male and female genetic input to offspring DNA
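As a rough illustration of swapping DNA sections, the sketch below performs a single-point crossover between two equal-length sequences. The single crossover point, the function name `crossover` and the toy parent sequences are all assumptions made for this example; the slide does not specify how many sections are swapped or where.

```python
def crossover(parent_a: str, parent_b: str, point: int) -> tuple[str, str]:
    """Swap everything after `point` between two equal-length sequences."""
    if len(parent_a) != len(parent_b):
        raise ValueError("sequences must have the same length")
    if not 0 <= point <= len(parent_a):
        raise ValueError("crossover point is outside the sequence")
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


# Toy sequences chosen so the swapped sections are easy to see.
print(crossover("AAAAAAAA", "CCCCCCCC", 3))  # ('AAACCCCC', 'CCCAAAAA')
```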
Phylogenetic trees • Understand our evolution • Genes are homologous • If they share a common ancestor • By looking at DNA sequences • For particular genes • See who evolved from whom • Example: • Mammoth most related to • African or Indian Elephants? • LUCA: • Last Universal Common Ancestor • Roughly 4 billion years ago
Genetic Disorders • Disorders have fuelled much genetics research • Remember that genes have evolved to function • Not to malfunction • Different types of genetic problems • Down's syndrome: three copies of chromosome 21 • Cystic fibrosis: • A small mutation (most commonly a three-base-pair deletion in the CFTR gene) disables a protein • Restricts the flow of chloride ions across the membranes of certain lung cells • Lung is less able to expel fluids
Predicting Protein Structure • Proteins fold to set up an active site • Small, but highly effective (sub)structure • Active site(s) determine the activity of the protein • Remember that translation is a function • Always same structure given same set of codons • Is there a set of rules governing how proteins fold? • No one has found one yet • “Holy Grail” of bioinformatics
Protein Structure Knowledge • Both protein sequence and structure • Are being determined at an exponential rate • 1.3+ million protein sequences known • Found with projects like the Human Genome Project • 20,000+ protein structures known • Found using techniques like X-ray crystallography • Takes between 1 month and 3 years • To determine the structure of a protein • Process is getting quicker
Sequence versus Structure • [Plot: number of known protein sequences and known protein structures by year; x-axis: Year, 1985 to 2000; y-axis: Number, 0 to 500,000]
Database Approaches • Slow(er) rate of finding protein structure • Still a good idea to pursue the Holy Grail • Structure is much more conserved than sequence • 1.3 million genes, but only 2,000 to 10,000 different conformations • First approach to structure prediction: • Store [sequence, structure] pairs in a database • Find ways to score similarity of residue sequences • Given a new sequence, find closest matches • A good match will possibly mean similar protein shape • E.g., sequence identity > 35% will give a good match • The rest of the first half of the course is about these issues
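The database approach can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: sequences are compared position by position and must have equal length (real methods align sequences first), and the database entries, structure labels and function names below are made up for the example. Only the 35% identity threshold comes from the slide.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions where two equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def closest_matches(query, database, threshold=35.0):
    """Return (identity, sequence, structure) hits above the threshold,
    best match first. `database` is a list of (sequence, structure) pairs."""
    hits = []
    for sequence, structure in database:
        if len(sequence) != len(query):
            continue  # no alignment in this sketch, so skip other lengths
        identity = percent_identity(query, sequence)
        if identity > threshold:
            hits.append((identity, sequence, structure))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Hypothetical [sequence, structure] pairs; the labels are placeholders.
toy_database = [("SHLDKL", "fold_A"), ("SHLEKL", "fold_A"), ("MWPGTY", "fold_B")]

for identity, sequence, structure in closest_matches("SHLDKV", toy_database):
    print(f"{sequence}  {identity:.1f}%  {structure}")
```

With this toy data the query matches two of the three stored sequences above the threshold, and both point to the same placeholder structure, which is the kind of evidence the slide describes as suggesting a similar protein shape.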
Potential (Big) Payoffs of Protein Structure Prediction • Protein function prediction • Protein interactions and docking • Rational drug design • Inhibit or stimulate protein activity with a drug • Systems biology • Putting it all together: “E-cell” and “E-organism” • In-silico modelling of biological entities and processes
Further Reading • Human Genome Project at Sanger Centre • http://www.sanger.ac.uk/HGP/ • Talking glossary of genetic terms • http://www.genome.gov/glossary.cfm • Primer on molecular genetics • http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html