Genome biology
Topics • Definitions • The structure of the genome • The function of the genome • Methods of genomics
Definitions - 1
Genome:
  definition 1. The information coded in the hereditary material of an organism
  definition 2. The haploid DNA (of a cell) of an organism
    1. Nuclear genome
    2. Mitochondrial and chloroplast genomes
Transcriptome:
  1. Full transcriptome: the full set of mRNAs of an organism
  2. Cellular transcriptome: the full set of mRNAs of a cell of an organism in a given experimental situation
Proteome:
  1. Full proteome: the full set of proteins of an organism
  2. Cellular proteome: the full set of proteins of a cell in a given experimental situation
Definitions - 2
• Genomics (genome biology)
• 1. Structural genomics, def: genetic mapping and comparison of individuals
    a. determination of the genomic sequence (human, mouse, chimpanzee, etc.)
    b. genome variability: intraspecific polymorphism
    c. genome evolution: interspecific polymorphism
• 2. Functional genomics, def 1: examination of the transcriptome; def 2: examination of the function of the genes
    2/1 Functional genomics I: transcriptomics
    2/2 Functional genomics II: proteomics
• Scope:
    - collecting cDNAs
    - measuring differences in mRNA expression: transcriptomics
    - measuring differences in protein expression: proteomics
Definitions- 3 • Alternative grouping: • Genomics • Functional genomics (transcriptomics) • Proteomics
Definitions - 4
• Other „omics”:
• Phosphorylomics: the interactions between kinases and their substrates
• Methylomics: the methylation marks of the whole DNA (3-5% in mammals)
• Metabolomics: the interactions between enzymes and their substrates involved in metabolism
• Interactomics: interactions between genes
• Lipidomics: the collection of lipids
• Omics: a systems-biology approach
The phosphorylome of yeast
[Figure: interaction network. Red dots: kinases; blue dots: substrates; green lines: connections]
2. The structure of the genome 2a. The structure of the DNA 2b. The variation of the DNA 2c. The evolution of the DNA
2a. The structure of the DNA • Genome programs • The human genome
Genome programs: history
• 1990 The genomes of the viruses
• 1995 The first prokaryotic genome: H. influenzae
• 1996 The first eukaryotic genome: yeast
• 1998 The first multicellular genome: C. elegans (roundworm)
• 2000 Drosophila melanogaster, Arabidopsis thaliana (thale cress)
• 2001 Human genome: draft version (90%): 30-35,000 genes
• 2002 Mouse genome: draft version
• 2004 Human genome: full version (99%): 20-25,000 genes
• 2005 Chimpanzee genome: draft version
Genome programs: active (300)
a. Non-mammals:
  Many viruses: small genomes
  E. coli: model organism
  Other bacteria: H. influenzae, etc.
  Amoeba: different genome
  Roundworm (C. elegans): model organism
  Fruit fly: model organism
  Bee: livestock, intelligent insect
  3 wasp species
  Trypanosoma + malaria mosquito: health care
  Tribolium castaneum: pest, model animal for beetles
  Sea star: model animal
  Thale cress (Arabidopsis): model plant; rice + coffee: agriculture
b. Mammals:
  human: vanity, self-study, health care
  mouse, rat: model organisms
  bovine: livestock
  dog: huge number of genetic variants, inbred breeds
  chimpanzee: relative
  orangutan, rhesus monkey, wallaby (kangaroo), marmoset (monkey)
Genome programs: the competition
Francis Collins: director, NIH National Human Genome Research Institute
Craig Venter: head of Celera Genomics
Bill Clinton: President of the USA
The set-up of the human genome: what did they find?
I. Not 100,000-150,000 genes, but 20,000-25,000: barely more than the fruit fly and C. elegans, but the proteome is ~10x as big
II. The larger part of the genome is non-coding: waste or selfish DNA? Or maybe functional?
III. Nearly all insect and roundworm genes are present in us as well.
IV. The converse is not true: immunity genes (antibodies, MHC, cytokines), apoptotic genes, etc.
V. Numerous proteins from one gene: a human gene codes for an average of 2.6 proteins (alternative splicing)
VI. More transcription factors
VII. Huge enhancer regions
VIII. More complex domain structure
The human genome
[Pie chart: total genome]
• Protein-coding sequences: 1.2%
• Introns + UTR: 25%
• Transposable elements: 45% (LINE 21%, SINE 13%, LTR retrotransposons 8%, DNA transposons 3%)
• Other intergenic sequences: 20.7%
• Large duplications: 5%
• Simple repeats (microsatellites; VNTRs): 3%
• Repetitive sequences in total: 53%
Classes of transposable elements: LTR retrotransposons, non-LTR retrotransposons, DNA transposons
LTR: long terminal repeat (regulatory role); LINE: long interspersed elements; SINE: short interspersed elements
The human genome
[Pie chart: total genome]
• Protein-coding sequences: 1.2%
• Introns + UTR: 25%
• Transposable elements: 45% (LINE 21%, SINE 13%, LTR retrotransposons 8%, DNA transposons 3%)
• Other intergenic sequences: 20.7%
• Large duplications: 5%
• Simple repeats: 3%
Retrotransposons: „copy and paste”
  - Non-LTR retrotransposons (degenerated viral genes): 850,000 LINE, 1,500,000 SINE
  - LTR retrotransposons (retroviruses and other functioning retroposons): 450,000 copies
DNA transposons: „cut and paste”
LINE: long interspersed elements; SINE: short interspersed elements
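The percentages on the pie chart can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers read off the slide; the dictionary keys are labels chosen here for illustration.

```python
# Top-level slices of the human genome, in percent, as read off the slide.
composition = {
    "protein-coding sequences": 1.2,
    "introns + UTR": 25.0,
    "transposable elements": 45.0,
    "other intergenic sequences": 20.7,
    "large duplications": 5.0,
    "simple repeats": 3.0,
}

# Breakdown of the transposable-element slice.
transposable = {
    "LINE": 21.0,
    "SINE": 13.0,
    "LTR retrotransposons": 8.0,
    "DNA transposons": 3.0,
}

total = sum(composition.values())
te_total = sum(transposable.values())
# "Repetitive sequences" = transposable elements + large duplications + simple repeats.
repetitive = (composition["transposable elements"]
              + composition["large duplications"]
              + composition["simple repeats"])

print(f"all slices:            {total:.1f}%")
print(f"transposable elements: {te_total:.1f}%")
print(f"repetitive sequences:  {repetitive:.1f}%")
```

The slices add up to 99.9% (rounding on the slide), the four transposon classes reproduce the 45% slice, and the three repetitive categories give the 53% quoted for repetitive sequences.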
The human genome: transposable elements
Class I: retrotransposons
  I/1. LTR transposons
  I/2. Non-LTR transposons
    I/2.1 LINEs
    I/2.2 SINEs
    I/2.3 Retrogenes
Class II: DNA transposons
[Diagram: structure of the elements]
• Retroviruses (1%): LTR, gag, pol, env, LTR
• LTR retrotransposons (7%): LTR, gag, pol, LTR
• Non-LTR retrotransposons (33%):
    - LINEs (e.g. L1): gag?, pol (RT, RNase H), polyA
    - SINEs (Alu): A, B, polyA
    - Retrogenes: ORF, polyA
• DNA transposons (3%): IR, transposase, IR
Encoded proteins: gag: capsid (CP), nucleocapsid (NC); pol: proteinase (Pr), reverse transcriptase (RT), integrase (Int), ribonuclease H (RNase H); env: envelope
LTR: long terminal repeat; IR: inverted repeat
The human genome: retroviruses
[Diagram: LTR, gag, pol, env, LTR]
gag: capsid (CP), nucleocapsid (NC) (structural elements)
pol: polymerase: reverse transcriptase (RT), integrase (Int), proteinase (Pr), RNase H
env: envelope (structural elements)
Low copy number (10-1,000 copies); human endogenous retroviruses make up 1% of the genome
LTR: long terminal repeat
The human genome - Retroviral infection
The human genome: retrotransposons
Class I: retrotransposons
  I/1. LTR transposons
  I/2. Non-LTR transposons
    I/2.1 LINEs
    I/2.2 SINEs
    I/2.3 Retrogenes
Class II: DNA transposons
LTR retrotransposons: derived from human endogenous retroviruses, 10-1,000 copies
LINEs: in humans L1 is the most common; it is present in 100,000 copies, but many of them are degenerated pseudogenes (imperfect reverse transcription). Of the 3,500 full-length (6.1 kb) L1s, 1% have a promoter and two intact ORFs. LINEs are mobilised in germ-line and somatic cells alike.
SINEs: 500,000-900,000 Alu copies (the most successful transposon in humans). All Alu elements derive from a 280 bp 7SL RNA gene containing a pol III promoter. They contain AluI restriction enzyme recognition sites.
All class I elements spread by reverse transcription!
[Diagram: class I elements. LTR retrotransposons: LTR, gag, pol, LTR; LINEs (e.g. L1): gag?, pol (RT, RNase H), polyA; SINEs (Alu): A, B, polyA; retrogenes: ORF, polyA]
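Reverse transcription, the step all class I elements share, copies an RNA into complementary DNA. A minimal sketch of the base-pairing rule only; the function name and the toy transcript are illustrative assumptions, not part of the lecture.

```python
# Base pairing used when an RNA template is copied into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') of an RNA written 5'->3'.

    The cDNA is antiparallel to its template, so the RNA is read backwards
    while each base is replaced by its DNA partner.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


rna = "AUGGCCUAA"            # hypothetical transcript
cdna = reverse_transcribe(rna)
print(cdna)                  # TTAGGCCAT
```

A real retrotransposition cycle also needs priming, second-strand synthesis and integration; none of that is modelled here.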
The human genome: DNA transposons
Class I: retrotransposons
  I/1. LTR transposons
  I/2. Non-LTR transposons
    I/2.1 LINEs
    I/2.2 SINEs
    I/2.3 Retrogenes
Class II: DNA transposons
• DNA transposons:
    - The transposase is responsible for the jump: so how do they multiply?
    - More than 60 families: Charlie, mariner, Tigger, THE1, etc.
    - The mariner family is similar to the transposons present in insects: horizontal gene transfer?
[Pie-chart detail: retrotransposons: LINE 21%, SINE 13%, LTR retrotransposons 8%; DNA transposons 3%]
[Diagram: DNA transposon (class II): IR, transposase, IR]
IR: inverted repeat
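The "IR, transposase, IR" layout means that the two ends of a DNA transposon are reverse complements of each other. A small sketch of that check; the element below is a made-up toy sequence, and the function names are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element: str, ir_len: int) -> bool:
    """True if the last ir_len bases are the reverse complement of the first ir_len."""
    element = element.upper()
    if len(element) < 2 * ir_len:
        return False
    return element[-ir_len:] == reverse_complement(element[:ir_len])


# Toy element: left IR + stand-in for the transposase gene + right IR.
toy_transposon = "CAGGTC" + "ATGAAACCC" + "GACCTG"
print(has_terminal_inverted_repeats(toy_transposon, 6))   # True

# A direct repeat (same sequence at both ends) is not an inverted repeat.
print(has_terminal_inverted_repeats("CAGGTC" + "ATG" + "CAGGTC", 6))   # False
```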
The human genome: microsatellites, minisatellites, macrosatellites
[Pie chart: total genome. Exons 1.2%; introns + UTR 25%; transposable elements 45%; other intergenic sequences 20.7%; large duplications (minisatellites and macrosatellites) 5%; simple repeats (microsatellites; VNTRs) 3%]
Satellites: highly repetitive sequences
Duplications: important in evolution
DNA satellites
• Microsatellites: repeats of a short unit, 4 base pairs long or shorter (arrays are typically tens to hundreds of base pairs long)
    - CA/TG repeats make up 0.5% of the genome; their function is not yet known; they arise by „replication slippage”
    - AAAA and TTTT runs
    - trinucleotide repeats, e.g. CAA (Gln), GCA (Ala): neuronal disorders; transcription factors in dogs
• Minisatellites: 1-15 kb repeats, e.g. the telomere: 15 kb of the TTAGGG hexamer, which telomerase adds to the ends of the chromosomes
• Macrosatellites: repeats of several hundred kb
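Microsatellites of the kind listed above (a unit of at most 4 bp repeated in tandem, such as CA repeats) can be located with a regular-expression backreference. This is a minimal sketch, not a production repeat finder; the thresholds `max_unit` and `min_repeats` and the test sequence are assumptions chosen for illustration.

```python
import re


def find_microsatellites(seq: str, max_unit: int = 4, min_repeats: int = 4):
    """Return (start, unit, copies) for each tandem run of a 1..max_unit bp unit.

    The lazy group picks the shortest repeating unit, and the backreference
    \\1 requires at least min_repeats consecutive copies of it.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


sequence = "GGT" + "CA" * 6 + "TTGACG"   # hypothetical sequence with a (CA)6 run
print(find_microsatellites(sequence))     # [(3, 'CA', 6)]
```

Real genome scans also have to tolerate imperfect repeats, which a plain backreference does not.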
The human genome: exons and introns
Exons: protein-coding DNA sequences + UTRs
Introns:
    - cut out
    - alternative splicing, other alternative processes
    - are they functional?
[Pie chart: total genome. Protein-coding sequences 1.3%; introns + UTR 25%; transposable elements 45%; other intergenic sequences 20.7%; large duplications 5%; simple repeats 3%]
[Diagram: pre-mRNA with exons E1, E2, E3 and introns I1, I2; leader (5'-UTR), AUG, coding sequence, stop, trailer (3'-UTR), polyA signal]
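The E1-I1-E2-I2-E3 layout of the pre-mRNA can be turned into a toy splicing example: introns are cut out, and leaving out an exon gives an alternative transcript. The sequences are invented for illustration, and exon skipping is only one of several kinds of alternative splicing.

```python
def splice(pre_mrna: str, exons):
    """Join the given exon intervals (0-based, end-exclusive) in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA; introns are written in lower case for readability.
parts = [
    ("E1", "GGAUGGCU"),
    ("I1", "guaaguuuuucag"),
    ("E2", "GAGCCA"),
    ("I2", "guaugacuuacag"),
    ("E3", "UUCUAAGG"),
]
pre_mrna = "".join(seq for _, seq in parts)

# Record where each exon sits in the pre-mRNA.
exon_coords = {}
position = 0
for name, seq in parts:
    if name.startswith("E"):
        exon_coords[name] = (position, position + len(seq))
    position += len(seq)

full_mrna = splice(pre_mrna, [exon_coords["E1"], exon_coords["E2"], exon_coords["E3"]])
skipped_mrna = splice(pre_mrna, [exon_coords["E1"], exon_coords["E3"]])  # E2 skipped

print(full_mrna)      # GGAUGGCUGAGCCAUUCUAAGG
print(skipped_mrna)   # GGAUGGCUUUCUAAGG
```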
The human genome: other intergenic sequences
• Unidentifiable degenerated transposons
• Pseudogenes: 2 types (reverse-transcribed RNA, duplicated DNA)
• Regulatory elements: promoters, enhancers, silencers
• others
[Pie chart: total genome. Protein-coding DNA sequences 1.3%; introns + UTR 25%; transposable elements 45%; other intergenic sequences 20.7%; large duplications 5%; simple repeats 3%]
2b. The variability of the DNA: intraspecific variability • Human genome diversity programs • The genetic basis of the phenotypic variability: coding vs regulatory sequences
Human genome diversity programs
• Since 1990: programs to map the polymorphism of the human genome. Importance: genealogical, medical
• Since 2005: Genographic Project (National Geographic)
Genetic markers:
• mtDNA
• Y chromosome
Uses of the data:
    - Two theories on the origin of Homo sapiens (from Homo erectus): multiregional theory vs. African origin (Mitochondrial Eve)
    - The migrations of modern humans
Inheritance
[Figure: inheritance of the somatic chromosomes (autosomes), the Y chromosome and the mitochondrial DNA]
Genetic markers on the Y chromosome
[Figure: map of the Y chromosome showing genes and STRs]
STR: short tandem repeat
Genes on the mitochondrial DNA
[Figure: map of the mitochondrial DNA, 16,569 nucleotides, with the hypervariable region marked]
„Common” origin
Hypotheses: Multiregional Origin vs. Out of Africa
[Figure: tree with African Homo erectus, European Homo erectus and Asian Homo erectus (1.8 million years ago), Homo neanderthalensis, and Homo sapiens (100,000 years ago)]
African origin
[Map with date labels: 130,000 yr; 67,000 yr; 40-60,000 yr; 40,000 yr; 20,000 yr; 13,000 yr]
Comparison of mitochondrial DNAs: the „Out of Africa” hypothesis wins over the „Multiregional Origin” hypothesis.
The genetic basis of the phenotypic variability: coding vs regulatory sequences • Genes and proteins: functional variance • The theory of neutrality • Intraspecific variability in the regulatory regions • Variance in the coding region of the regulatory genes
Genes and proteins: functional variance
Traditional concept: different gene products, differing in efficiency and function, are responsible for the phenotypic variance
The theory of neutrality (Motoo Kimura)
The gene variants (alleles) are functionally the same!
• The majority of nucleotide substitutions in genes do not cause amino-acid changes (synonymous changes)
• The vast majority of amino-acid substitutions do not change the function of the protein (chemically similar amino-acid substitutions: conservative changes)
• The function of the genes did not change through evolution; gene variability does not cause phenotypic variability. This is true for most genes,
    - except for defective genes
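Whether a substitution is synonymous can be read off the genetic code. The sketch below builds the standard codon table in its usual compact T/C/A/G ordering and classifies a single codon change; the example codons are chosen here for illustration, and judging whether an amino-acid change is "conservative" would need a chemical similarity scale that is not modelled.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as synonymous or nonsynonymous."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    if aa_before == aa_after:
        return "synonymous"
    return f"nonsynonymous ({aa_before}->{aa_after})"


print(classify_substitution("CAA", "CAG"))   # synonymous (both glutamine)
print(classify_substitution("GAA", "GAT"))   # nonsynonymous (E->D), chemically similar
print(classify_substitution("GAA", "GTA"))   # nonsynonymous (E->V)
```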
Intraspecific variability in the regulatory regions
Of Mice and Men
• Significant polymorphism in the regulatory sequences
• Variability: expression level and tissue specificity
Intraspecific variability in the regulatory regions
The gene regulation theory: variability in gene expression
Variants of gene „A” with identical function but differing control regions
[Diagram: individuals 1-4, each carrying gene A with its promoter (P) and a different set of enhancers]
Intraspecific variability in the regulatory regions
Harold Garner and John W. Fondon
Revival of the gene function theory: variability in the sequence of transcription factors
[Figure: bull terrier, 1931 and 1976; runx-2 gene alleles Q19A14 and Q19A13]
Q Q Q A A A (Q: glutamine, A: alanine)
. . . CAACAAGCACAAGCAGCA . . .
Coding variance: number of triplet repeats (number of glutamine and alanine repetitions)
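Allele names such as Q19A14 are a run-length description of the repeat tract: 19 glutamines followed by 14 alanines. A minimal sketch of that notation; the tracts are built artificially here rather than taken from a real runx-2 sequence.

```python
from itertools import groupby


def repeat_notation(tract: str) -> str:
    """Run-length encode an amino-acid tract, e.g. 'QQQAA' -> 'Q3A2'."""
    return "".join(f"{aa}{len(list(run))}" for aa, run in groupby(tract))


# Hypothetical repeat tracts matching the two allele names on the slide.
allele_1 = "Q" * 19 + "A" * 14
allele_2 = "Q" * 19 + "A" * 13

print(repeat_notation(allele_1))   # Q19A14
print(repeat_notation(allele_2))   # Q19A13
```

The two alleles differ by a single alanine codon, which is the kind of coding variance the slide refers to.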
2c. The evolution of the DNA - interspecific variability Differences between genomes The chimpanzee in us
The differences between genomes: genome size
Example       Size (bp)    Length (m)
HIV-1         9.8x10^3     10^-6
λ phage       4.8x10^4     10^-5
T4 phage      1.7x10^5     10^-4
E. coli       4.6x10^6     10^-3
Drosophila    1.8x10^8     10^-1
Mouse         3.5x10^9     1
Dog           3.4x10^9     1
Horse         3.3x10^9     1
Human         3.4x10^9     1
Corn          5.0x10^9     1
Lily          3.6x10^10    10
Amoeba        2.9x10^11    100
Genome size differs between species and is not related to phenotypic complexity
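The "Length (m)" column follows from the size in base pairs: in B-form DNA each base pair adds about 0.34 nm to the double helix. The slide does not state how its lengths were obtained, so this is a sketch under that assumption, using a few of the sizes from the table.

```python
RISE_PER_BP_M = 0.34e-9   # approximate rise per base pair in B-DNA, in metres


def dna_length_m(size_bp: float) -> float:
    """Contour length of a linear double helix of size_bp base pairs."""
    return size_bp * RISE_PER_BP_M


# Genome sizes (bp) taken from the slide.
genomes_bp = {
    "HIV-1": 9.8e3,
    "E. coli": 4.6e6,
    "Drosophila": 1.8e8,
    "Human": 3.4e9,
    "Amoeba": 2.9e11,
}

for name, size_bp in genomes_bp.items():
    print(f"{name:<12}{size_bp:>10.1e} bp  {dna_length_m(size_bp):.2e} m")
```

For the human genome this gives roughly 1.2 m, matching the order of magnitude on the slide.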
The differences between genomes: number of genes compared with the total number of cells
Species       Number of genes    Number of cells
Human         20,000-25,000      10^14
Fruit fly     13,500             -
C. elegans    19,100             959
Differences between genomes: structural and functional differences
DNA similarity: human-chicken 60%; human-mouse 88%; human-chimpanzee >98%
Same proteins: ascidian-human 80%; human-fruit fly 40%; fruit fly-human 61%; C. elegans-human 43%; yeast-human 46%
Same domains: human-fruit fly, C. elegans >90%
Exon shuffling in humans, twice as many genes
Rather a „copy and paste” than a „cut and paste” mechanism
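A figure such as "human-mouse 88%" is a percent identity over aligned sequence. The slide does not say how its values were computed, so the following is only a sketch of the simplest definition: identical positions divided by aligned (gap-free) positions. The sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both sequences agree.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100 * matches / len(columns)


# Hypothetical aligned fragments differing at one position out of ten.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))   # 90.0
```

Genome-scale comparisons depend heavily on how the alignment is built and on whether insertions and deletions are counted, which is one reason published similarity values differ.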
The chimp in us
• Human genome vs. chimp genome
• 1. Chromosome number: 23 vs. 24 (pairs)
• 2. Genetic alterations:
    - Point mutations (complete genome): 1.23%
    - Point mutations (coding sequences): 1%
    - Duplications: 2.7%
    - Insertions, deletions: 3.0%
    - Several chromosomal rearrangements
• 3. More Alu and L1 sequences in humans (short repetitive sequences)
• 4 theories about the differences:
• 1. Evolution of regulatory proteins
    a. FOXP2 gene: mutation causes a disorder of speech and articulation; 2 amino-acid differences between human and chimp
    b. ASPM and MCPH1 genes: mutation causes microcephaly; their expression levels are higher in neural precursor cells
• 2. Evolution of regulatory sequences
    A general increase in the expression level of genes in the brain; it is difficult to detect at the level of DNA
• 3. Retention of juvenile characters
    Lack of body fur, higher brain/body weight ratio, the shape of our skull is similar to that of a juvenile chimp
• 4. Neoteny theory
    Compared to the rest of the body, the development of the head accelerated
3. The function of the genome 3a. Gene expression 3b. Genome expression 3c. An astonishing RNA world
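Gene expression
[Diagram: in the nucleus, RNA polymerase II (pol-II) transcribes DNA into pre-mRNA, which is processed into mRNA with a cap and a polyA tail; in the cytoplasm, ribosomes translate the mRNA into protein; ER and Golgi are also shown]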
Alternative gene usage: instrument of complexity
• Alternative...
    - ...promoter usage
    - ...splicing
    - ...polyadenylation
    - ...gene expression
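Alternative promoter usage: epidermal cells
[Diagram: gene with Promoter 1 (P1), Promoter 2 (P2), coding region (Ex1, I1, Ex2, I2, Ex3) and terminator (T); RNA polymerase (pol) produces the pre-mRNA, which is processed into mRNA and translated by the ribosome into protein]
Ex: exon; I: intron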
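Alternative promoter usage: neuronal cells
[Diagram: gene with Promoter 1 (P1), Promoter 2 (P2), coding region (Ex1, I1, Ex2, I2, Ex3) and terminator (T); RNA polymerase (pol) produces the pre-mRNA, which is processed into mRNA and translated into protein]
Ex: exon; I: intron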