The GENOME

The GENOME: structure, function & evolution

Definitions

From genome to cell biochemistry
GENOME
• Full genome: total amount of DNA of an organism
• Cellular genome: haploid DNA content of a cell of an organism
→ Transcription
TRANSCRIPTOME
• Full transcriptome: total amount of RNA of an organism
• Cellular transcriptome: total amount of RNA of a cell of an organism
→ Translation
PROTEOME
• Full proteome: total amount of proteins of an organism
• Cellular proteome: total amount of proteins of a cell of an organism
→ Proteome activity
Metabolome, lipidome, phosphorylome, methylome, etc.: the biochemistry of the cell

Disciplines of genome biology
• GENOME: genomics (structural genomics)
• Transcription → TRANSCRIPTOME: transcriptomics (functional genomics)
• Translation → PROTEOME: proteomics (functional genomics)
• Proteome activity → metabolome, lipidome, methylome, phosphorylome, interactome, etc.: metabolomics, lipidomics, methylomics, phosphorylomics… (functional genomics)

Remarks
• Structural genomics collects the elements of the DNA, the full transcriptome and the proteome, but does not deal with their functions.
• The scope of functional genomics:
  (A) Changes of the transcriptome and proteome
    (1) in different cell types
    (2) in healthy/diseased tissues and cells
    (3) in treated/untreated tissues and cells
  (B) Interactions between the elements of the transcriptome and proteome: interaction maps

Yeast protein interaction map
Each dot represents a protein; connecting lines indicate interactions between pairs of proteins.
• Red dots: essential proteins (an inactivating mutation is lethal)
• Green dots: non-essential proteins (mutation is non-lethal)
• Orange dots: non-essential proteins (mutation leads to slow growth)

Yeast protein interaction map
Each oval represents a protein complex; connections are shown between complexes that share at least one protein.
• Red: cell cycle
• Dark green: signaling
• Dark blue: transcription, chromatin structure
• Pink: protein and RNA transport
• Orange: RNA metabolism
• Light green: protein synthesis and turnover
• Brown: cell polarity
• Violet: intermediate or energy metabolism
• Light blue: membrane biogenesis and/or traffic

Yeast protein interaction map - the complete network
• Hubs: proteins with many interactions. A much larger number of proteins have only a few individual connections.
• This architecture is thought to minimize the effect on the proteome of mutations that might inactivate individual proteins.
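The notion of a hub can be made concrete by counting interaction partners per protein. The sketch below is only an illustration: the protein names, the toy interaction list and the `min_partners` threshold are hypothetical and are not taken from the yeast data set shown on the slide.

```python
from collections import defaultdict


def degree_table(interactions):
    """Count distinct interaction partners of each protein (undirected network)."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this toy model
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}


def find_hubs(interactions, min_partners=5):
    """Return proteins with at least `min_partners` partners (threshold is an assumption)."""
    degrees = degree_table(interactions)
    return sorted(p for p, d in degrees.items() if d >= min_partners)


# Hypothetical toy network: one highly connected protein, many sparsely connected ones.
toy_interactions = [
    ("H1", "P1"), ("H1", "P2"), ("H1", "P3"), ("H1", "P4"), ("H1", "P5"),
    ("P5", "P6"), ("P6", "P7"),
]
toy_degrees = degree_table(toy_interactions)
toy_hubs = find_hubs(toy_interactions, min_partners=5)
print(toy_degrees)
print(toy_hubs)
```

In this toy network a single protein carries most of the connections, while the others have one or two partners each, which mirrors the architecture described on the slide.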

Yeast protein interaction map - removal of party hubs
Party hubs interact with all their partners simultaneously. Their removal has little effect on the overall structure of the network.

Yeast protein interaction map - removal of date hubs
Date hubs interact with different partners at different times. Their removal breaks the network into small subnetworks.
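The effect of removing a hub can be checked by counting connected components before and after the removal. The following is a minimal sketch on a hypothetical toy graph: node `P` sits inside a densely connected module (party-hub-like), node `D` bridges otherwise separate modules (date-hub-like). The graph and node names are assumptions for illustration, and the timing of the interactions is not modeled.

```python
def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def remove_node(nodes, edges, target):
    """Return copies of the node and edge lists without `target`."""
    kept_nodes = [n for n in nodes if n != target]
    kept_edges = [(a, b) for a, b in edges if target not in (a, b)]
    return kept_nodes, kept_edges


# Hypothetical toy network.
toy_nodes = ["P", "A1", "A2", "A3", "D", "B1", "B2", "B3"]
toy_edges = [
    ("P", "A1"), ("P", "A2"), ("P", "A3"),     # P: hub inside a dense module
    ("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
    ("D", "A1"), ("D", "B1"), ("D", "B2"),     # D: hub bridging separate modules
    ("B2", "B3"),
]

n_full = len(connected_components(toy_nodes, toy_edges))
n_without_p = len(connected_components(*remove_node(toy_nodes, toy_edges, "P")))
n_without_d = len(connected_components(*remove_node(toy_nodes, toy_edges, "D")))
print(n_full, n_without_p, n_without_d)
```

Removing `P` leaves the toy network in one piece, whereas removing `D` splits it into several subnetworks.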

Phosphorylome of yeast
• Red dots: kinases
• Blue dots: substrates
• Green lines: connections

Gene network
(Figure labels: extracellular space, cytoplasm, mitochondrion, ER, nucleus)
In a strict sense, only transcription factors are components of the gene network.

Structure and operation of the genome

Two genomes in a cell
• Nuclear genome
• Mitochondrial genome

The mitochondrial genome
(Figure: mitochondrial DNA strands)
16,569 nucleotides

The mitochondrial genome
• Mitochondrion: arose 1.5 × 10^9 years ago from a purple bacterium species by endosymbiosis
• Mammalian mitochondrion:
  - most of the genes have been lost or transferred to the chromosomes
  - 13 polypeptides (all of them belong to the enzymes of oxidative phosphorylation)
  - 12S and 16S rRNA genes
  - 22 tRNA genes
• Mitochondrial DNA: several thousand copies per cell
• Deviations from the universal code:

  codon   normal amino acid   amino acid in mitochondrion
  UGA     stop                Trp (mammals, insects, yeast, fungi)
  AGA     Arg                 stop (mammals, insects)
  AGG     Arg                 stop (mammals)
  AUA     Ile                 Met (mammals, insects, yeast)
  CUN     Leu                 Thr (yeast)
  CGG     Arg                 Trp (maize)

• Remark: in some species there are deviations in the code of the genomic DNA, too (prokaryotes and eukaryotes)
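The practical consequence of these deviations is that the same RNA sequence is read differently in the cytoplasm and in the mitochondrion. The sketch below builds the universal code, applies the four mammalian mitochondrial reassignments (UGA, AGA, AGG, AUA) and translates one sequence with both tables. The example RNA is hypothetical and was chosen only to contain the affected codons.

```python
from itertools import product

BASES = "UCAG"
# Amino acids of the universal code in UCAG x UCAG x UCAG order ("*" = stop).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Mammalian mitochondrial deviations from the universal code.
MAMMALIAN_MITO_CHANGES = {
    "UGA": "W",  # stop -> Trp
    "AGA": "*",  # Arg  -> stop
    "AGG": "*",  # Arg  -> stop
    "AUA": "M",  # Ile  -> Met
}
MAMMALIAN_MITO_CODE = {**STANDARD_CODE, **MAMMALIAN_MITO_CHANGES}


def translate(rna, code):
    """Translate from the first base, codon by codon, until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = code[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequence: AUG AUA UGA GCU AGA UUU
example_rna = "AUGAUAUGAGCUAGAUUU"
universal_reading = translate(example_rna, STANDARD_CODE)
mitochondrial_reading = translate(example_rna, MAMMALIAN_MITO_CODE)
print(universal_reading, mitochondrial_reading)
```

With the universal code, translation stops at UGA after two residues; with the mitochondrial table, UGA is read as Trp and translation continues until AGA.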

The nuclear genome
(Figure: chromatin)

Genome size

The ratio of noncoding sequences
(Figure: percentage of noncoding sequences, with the categories prokaryotes, unicellular organisms, plants/fungi, invertebrates, protochordata, vertebrates, human)

The human genome project
• Human genome: 2001: draft version (90%); 2004: full version (99%)
• The missing 1%: repetitive sequences near the centromere
• (Pictured: Bill Clinton, Craig Venter, Francis Collins)
The human variome project
• Collection of variable sequences from different individuals; primary focus on medical application
• (Pictured: Richard Cotton)

Genome programs
• Relatives of human: chimpanzee, orangutan
• Model organisms of science: E. coli, yeast, C. elegans, fruit fly, Arabidopsis, mouse
• Pathogens and their vectors: viruses, bacteria, Plasmodium malariae + malaria mosquito
• Agricultural animals and plants: wheat, chicken, cow, pig
• Pets: dog
• Others: archaebacteria, amoeba, wallaby, etc.
Ascertaining the sequence of DNA is not enough to understand its operation!

DNA sequencing (Frederick Sanger)
(Figure labels: template DNA, initiation of DNA synthesis, base (A), "A" dideoxynucleotide)
1. The 3'-OH group of dATP is replaced by -H → ddATP
2. Synthesis is terminated upon incorporation of ddATP
3. A dATP/ddATP mixture (with less ddATP) is added for the synthesis; therefore synthesis stops at the "T"s of the template, at a different "T" in different molecules
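The chain-termination principle can be illustrated with a short simulation: every position where a given base is incorporated yields one fragment length, and sorting all fragments by length recovers the sequence. This is a simplified sketch, not a model of a real sequencing run. The template is hypothetical, the primer is omitted, and fragment lengths are counted from the first incorporated nucleotide.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5):
    """New strand written 5'->3' for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_lengths(new_strand, dd_base):
    """Lengths of all fragments that end with the given dideoxynucleotide."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_sequence(new_strand):
    """Rebuild the sequence from the four termination ladders, sorted by length."""
    ladder = {}
    for base in "ACGT":
        for length in termination_lengths(new_strand, base):
            ladder[length] = base
    return "".join(ladder[n] for n in sorted(ladder))


# Hypothetical template, written 3'->5'.
toy_template = "TACGTTAGCT"
toy_new_strand = synthesized_strand(toy_template)
dda_fragments = termination_lengths(toy_new_strand, "A")
recovered = read_sequence(toy_new_strand)
print(toy_new_strand, dda_fragments, recovered)
```

The ddATP reaction produces one fragment for each "T" of the template, and the fragment lengths give the positions of those "T"s.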

DNA sequencing
• The different ddNTPs are labeled with distinct colors.
• Synthesis is followed by gel electrophoresis and detection.
(Figure labels: 10 nucleotides, detector, 50 nucleotides)

Human genome: 3.2 Gb (3.2 billion base pairs)
(Figure labels: LINEs, SINEs, LTR retrotransposons, DNA transposons, simple repeats, large duplications, miscellaneous heterochromatin, miscellaneous unique sequences, exons, introns)
LINE: long interspersed nuclear elements
SINE: short interspersed nuclear elements

Human genome

  Coding sequences          1.5%     48 Mb
  Gene-related sequences    36%      1152 Mb
  Intergenic sequences      62.5%    2000 Mb

(Additional figure label: non-coding RNA-coding "genes")
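The megabase values follow directly from the percentages and the genome size of 3.2 billion base pairs (3200 Mb). A minimal arithmetic check, using only the figures from the slides (the variable and function names are illustrative):

```python
GENOME_SIZE_MB = 3200  # 3.2 billion base pairs

FRACTIONS_PERCENT = {
    "coding sequences": 1.5,
    "gene-related sequences": 36.0,
    "intergenic sequences": 62.5,
}


def fraction_to_mb(percent, genome_size_mb=GENOME_SIZE_MB):
    """Convert a percentage of the genome into megabases."""
    return genome_size_mb * percent / 100


sizes_mb = {name: fraction_to_mb(p) for name, p in FRACTIONS_PERCENT.items()}
for name, size in sizes_mb.items():
    print(f"{name}: {size:.0f} Mb")
```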

Human genome
Coding sequences 1.5%, gene-related sequences 36%, intergenic sequences 62.5%
Gene-related sequences (36%): pseudogenes, gene fragments, introns, UTRs (figure values: 24%, 1.5%, 10.5%)

Human genome
Coding sequences 1.5%, gene-related sequences 36%, intergenic sequences 62.5%
Gene-related sequences (36%): pseudogenes, gene fragments, introns, UTRs (figure values: 24%, 1.5%, 10.5%)
Intergenic sequences (62.5%):
• Repeated sequences 51.5%
  - transposons: retroposons 41%, DNA transposons 2.8%
  - simple repeats 2.8%
  - large repeats 5%
• Others 11%

Coding sequences
Coding sequences: the amino acid-coding parts of exons
• pre-mRNA: 5'-UTR (leader) - E1 - I1 - E2 - I2 - E3 - 3'-UTR (trailer), with AUG, stop codon and polyA signal
• mRNA: 5'-UTR (leader) - E1 - E2 - E3 - 3'-UTR (trailer), with AUG, stop codon and polyA signal
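The relation between pre-mRNA, mRNA and coding sequence can be sketched in a few lines: splicing joins the exons, and the coding sequence is the part of the mRNA from the AUG to the first in-frame stop codon. The sequences below are hypothetical, and the first AUG is assumed to be the start codon, which is a simplification.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(pre_mrna, exon_ranges):
    """Join the exons given as (start, end) index pairs; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exon_ranges)


def coding_sequence(mrna):
    """Return the sequence from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return ""  # no in-frame stop codon found


# Hypothetical pre-mRNA: E1 (with 5'-UTR), I1, E2, I2, E3 (with 3'-UTR).
parts = [
    ("exon", "GGCCAUGGCU"),
    ("intron", "GUAAGUUUCAG"),
    ("exon", "GAUCCG"),
    ("intron", "GUGAGUCCAG"),
    ("exon", "AAAUAACCGAAUAAAGG"),
]
toy_pre_mrna = "".join(seq for _, seq in parts)

# Derive the exon coordinates from the part lengths.
exon_ranges = []
position = 0
for kind, seq in parts:
    if kind == "exon":
        exon_ranges.append((position, position + len(seq)))
    position += len(seq)

toy_mrna = splice(toy_pre_mrna, exon_ranges)
toy_cds = coding_sequence(toy_mrna)
print(toy_mrna)
print(toy_cds)
```

The UTRs remain in the mRNA but lie outside the coding sequence, as in the diagram.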

10,000-12,000 genes; the functions of the remaining 10,000 genes are unknown!

Introns and UTRs
UTR: regulation of translation and of the half-life of mRNAs
Intron:
1. genetic junk
2. it can contain regulatory elements
3. in the case of alternative splicing it can serve as an exon

Pseudogenes & gene fragments
Gene fragments
• Genetic junk: "fossils in the genetic cemetery"
Pseudogenes
• Two types:
  1. intron-containing: chromosomal segment duplication
  2. intronless: reverse transcription, then reinsertion
• Function:
  1. in some cases, regulation of the original gene by means of antisense interaction
  2. genetic junk

Coding sequences 1.5%, gene-related sequences 36%, intergenic sequences 62.5%
Intergenic sequences (62.5%):
• Repetitive sequences 51.5%
  - transposons: retroposons 41%, DNA transposons 2.8%
  - simple repeats 2.8%
  - large repeats 5%
• Others 11%

Transposable elements in the human genome
(Table columns: class, family, copy number, occurrence %; first class listed: retrotransposons)

Transposable elements
• Class I: retrotransposons
  I/1. LTR transposons
  I/2. Non-LTR transposons
    I/2.1. LINEs
    I/2.2. SINEs
• Class II: DNA transposons
(Figure, class I)
• Endogenous retroviruses (all inactive): LTR - gag - pol - env - LTR (figure value: 1%)
• LTR retrotransposons: LTR - gag - pol - LTR (figure values: 8%, 7%)
• LINEs: gag? - pol (RT, RNaseH) - polyA
• SINEs: A and B boxes - polyA (figure value for non-LTR elements: 33%)
(Figure, class II)
• DNA transposons: IR - transposase - IR (figure value: 3%)
Legend: gag: capsid (CP), nucleocapsid (NC); pol: protease (Pr), reverse transcriptase (RT), ribonuclease H (RNaseH), integrase (Int); env: envelope; IR: inverted repeat

Transposons
Repeated sequences: DNA transposons 2.8%, simple repeats 2.8%, large repeats 5%, retroposons 41%
• DNA transposons ("cut and paste"): colonized the genome by horizontal gene transfer; the vector is unknown
• Non-LTR retrotransposons ("copy and paste"): 850,000 LINEs (20%) and 1,500,000 SINEs (13%); SINEs are derived from the 7S RNA "gene"
• LTR retrotransposons (8%): endogenous retroviruses (more than 20 families; 450,000 copies), degenerated virus genes
LTR: long terminal repeat
LINE: long interspersed nuclear elements
SINE: short interspersed nuclear elements

Retrovirus infection
(Figure labels: virus, RNA, envelope, capsid)

Human endogenous retroviruses (HERVs) & LTR transposons
Structure: LTR - gag - pol - env - LTR
• gag: capsid (structural element); CP: capsid, NC: nucleocapsid
• pol: polymerase: reverse transcriptase (RT), integrase (Int), protease (Pr), RNase H
• env: envelope (structural element)
• LTR (long terminal repeat): promoter
LTR retrotransposons make up 8% of the genome, but only 1% of them have a structure similar to that of retroviruses; the others are degenerated. All of them are mutant: they are not able to form infective virions, but some of them can move with the help of the enzymes of other elements. The genomes of the chimpanzee and other primates contain infective retroviruses.

Retroviruses and their fossils
• Wild-type retroviruses
• Human endogenous retroviruses
• ... and their fossils: solitary LTR

The effect of endogenous retroviruses on gene expression
(Figure: a cellular gene next to a HERV)
1. No effect
2. Transcription from the LTR (the HERV splice donor site can also be active)
3. The activity of the LTR can be modulated: polymorphism, methylation, cell-specific activation/inhibition
HERV: human endogenous retroviruses

Non-LTR retrotransposons
• LTR retrotransposons (for comparison): LTR - gag (CP, NC) - pol (Pr, RT, RNaseH, Int) - LTR
• LINEs, autonomous transposons: gag? - pol (RT, RNaseH) - polyA
• SINEs, non-autonomous transposons: A and B boxes - polyA

LINEs
(Figure: LINE DNA with promoter, ORF1 (gag?), IRES, ORF2 (pol: RT, RNaseH) and polyA; the RNA with a ribosome; further label: protease)
• 21% of the human genome (850,000 copies); L1: 17% (500,000 copies); 10,000 full-length copies (6.1 kb), but only 50-100 are functional
• Some of the rest can jump with the help of the enzymes of the intact ones.
• LINE mobilization occurs both in the germ line and in somatic cells.
LINE: long interspersed nuclear elements
SINE: short interspersed nuclear elements
IRES: internal ribosome entry site

LINE-1 "propagation"
Outcomes of copying: perfect copy; 5'-deleted copy; 5'-deleted + inverted copy

The effect of LINE-1 on the genome: formation of pseudogenes
(Figure: gene → intronless pseudogene)

The effect of LINE-1 on the genome: gene inactivation
(Figure: insertion of L1 mRNA into a gene, either into an exon or into an intron; *: stop codon)

The effect of LINE-1 on the genome: transduction
• The polyA signal of the LINE is weak → readthrough into an exon of the adjacent gene "A"
• The mRNA therefore contains the LINE and an exon of gene "A"
• Insertion of a piece of the LINE and the exon of gene "A" into gene "B", or of only the exon of gene "A"

SINEs
(Figure: Alu domain, S domain; SINE with A and B boxes and polyA)
• 13% of the genome, 11% Alu sequences; non-protein-coding
• AluI restriction enzyme recognition site
• An average SINE repeat unit is 100-400 bp (Alu: 300 bp = 280 bp + pol III promoter)
• More than 1 million copies: the most successful transposon in humans
• Ancestor: the RNA component (7SL RNA) of the SRP (signal recognition particle; a ribonucleoprotein)

The hyperparasite Alu sequences

DNA transposons
(Figure: IR - transposase - IR; IR: inverted repeat)
• The infection mechanism is not known: what could be the vector?
• The transposase executes the jumping by a "cut and paste" mechanism: how do they multiply?
• More than 60 families: Charlie, mariner, Tigger, THE1, etc.
• The mariner family resembles the transposons of insects: horizontal gene transfer?

Defense by the host
1. Heterochromatinization (methylation): inhibition of transcription
2. RNA interference: inhibition of transcription & translation
3. Local increase of the mutation rate: inactivation

Benefits of the host from the transposons
1. Variability of the genes encoding antibodies and T cell receptors
2. Genome plasticity
