Human genome organisation

Human genome organisation

Humangenome The human genome is the term to describe the total genetic information in human cell. Human genome = all the DNA present in the cell It really comprises two genoms: • a complex nuclear genome with • about 30 000 genes; • a very simple mitochondrial genome • with 37 genes.

The nuclear genome provides • the great bulk of essential • genetic information, most • of which specifies • polypeptide synthesis on • cytoplasmic ribosomes. • The mitochondrial genome • specifies only a very small • portion of the specific • mitochondrial functions. • The bulk of the mitochondrial polypeptides are • encoded by nuclear genes and are synthesized • on cytoplasmic ribosomes, before being imported • into the mitochondria.

Mitochondrial genome Mitochondria possess their own ribosomes, however the very few polypeptide-encoding genes in the mitochondrial genome produce mRNA which are translated on the mitochondrial ribosomes.

General structure of mitochondrial genome The human mitochondrial genome is defined by a single type of circular double-stranded DNA. • The human mitochondrial genome complete nucleotide sequence has been established in 1981 and could be found on Mitomap mitochondrial genome database (www.mitomap.org). • It is 16 565 bp in length and has two strands- heavy (H) and light (L). • Although the mitochondrial DNA is double stranded, a small section shows triple- DNA strand structure due to the repetitive synthesis of a short segment of H strand DNA, 7S DNA.

General structure of mitochondrial genome The mitochondrial DNA is rich in G+C nucleotides- 44%. The heavy strand is rich in guanines (G). The light strand is rich in cytosines (C).

Mitochondrial genome A single mitochondrion contains 2 to 10 mtDNA copies. A single somatic cell (containing only two chromosome copies) has 100 - 10 000 mtDNAs. The number of mtDNA can vary considerably in different cell types. Lymphocytes have about 1000 mtDNA. Certain cells, such as terminally differentiated skin cells lack any mitochondria and so have no mtDNA. • The gametes are unusual: • sperm cells have a few hundred copies of mtDNA • oocytes have about 100 000 copies, • accounting for over 30% of the oocyte DNA.

During zygote formation, a sperm cell contributes its nuclear genome, but not its mitochondrial genome, to the egg cell. Mitochondrial genome is maternally inherited: males and females both inherit their mitochondria from their mother. Males do not transmit their mitochondria to subsequent generations. During mitotic cell division, the mtDNA molecules of the dividing cell segregate in a purely random way to the two daughter cells.

The human mitochondrial genome contains 37 genes. Mitochondrial genes A total of 24 genes specify a mature RNA product: • 22 mitochondrial tRNA molecules • 2 mitochondrial rRNA molecules: • a component of the large subunit • of the mt ribosomes- 23S rRNA; • a component of the small subunit • of the mt ribosomes- 16S rRNA. • the remainig 13 genes encode polypeptides which are synthesized on mitochondrial ribosomes: • coding genes involved in mitochondrial respiratory complexes; • the enzymes of oxidative phosphorylation which • are engaged in the production of ATP.

Coding and non-coding DNA Mitochondrial genes The human mitochondrial genome is extremely compact and approximately 93% of the DNA sequence is coding. All 37 genes lack introns and they are tightly packed (on average 1 per 0.45 kb). The coding sequences of some genes (subunits of ATPase) show some overlap. The coding sequences of neighboring genes are contiguous or separated by 1 or 2 non-codong bases.

Coding and non-coding DNA Mitochondrial genes The only region lacking any known coding DNA is the displacement (D) loop region. D loop is the region in which a triple- stranded DNA structure is generated by duplicate synthesis of a short piece of the H- strand DNA (7S DNA). The D loop contains the predominant promoter for transcription of both H and L strands. Transcription of the mtDNA starts from D loop promoters and continues, in opposing directions for the H and L strands, round the circle to generate large multigenic transcripts.

Nuclear genome The nucleus of a human cell tipically contains more than 99% of the cellular DNA. DNA is structured in long strands that are wrapped around protein complexes called nucleosomes that consist of proteins -histones. Such structured DNA constitute a chromosome. The human cell has 46 chromosomes: • 22 pairs of autosomes • 1 pair of sex chromosomes, X and Y

Number of chromosomes are not the same in different species.

The haploid human genome contains approximately 3 billion base pairs of DNA packaged into 23 chromosomes. • Most cells in the body are diploid, that makes a total of 6 billion base pairs of DNA per cell. • Because each base pair is around 0.34 nanometers long (a nanometer is one-billionth of a meter), each diploid cell therefore contains about 2 meters of DNA [(0.34 × 10-9) × (6 × 109)].

It is estimated that human body contains about 50 trillion cells—which works out to 100 trillion meters of DNA per human. • The Sun is 150 billion meters from Earth. This means that each of us has enoughDNA to go from here to the Sun and back more than 300 times, or around Earth's equator 2.5 million times! How is this possible?

DNA packs into mitotic chromosome. Certain proteins (histones) compact chromosomal DNA into the microscopic space of the eukaryotic nucleus. Histones are positively charged proteins that strongly adhere to negatively-charged DNA and form complexes called nucleosomes. The resulting DNA-protein complex is called chromatin.

Human genome sequence was published in 2001. • International Human Genome Sequence Consortium (IHGSC) • public funding, free access, started earlier • Celera Genomics • private funding

Human gene number • The total number of genes in the human genome is now thought to be in the • 25000- 35 000 range. • 1400 genes per chromosome on average. There are general difficulties to estimate the precise gene number. • When the draft genome sequences were published in 2001, about • 11 000 genes could be identified with confidence. • Many thousands of genes were predicted by computer- based analysis • of the sequence: • Prediction of polypeptide-coding genes has been helpful, but is • not always reliable- false positives and inaccuracy in genuine exons • identification. • Prediction of RNA genes is particularly poor.

1998 Human gene number A comparatively low number of human genes was a suprise. • Very simple , 1mm long roundworm, Caenorhabditis elegans: • consists of 959 somatic cells • a genome only 1/30 of the human genome • contain 19 099 polypeptide- encoding genes • contain over 1000 RNA genes

Genome complexity • Genome complexity might not always • parallel biological complexity: • Drosophila melanogaster has • substantially fewer genes than • the simpler C.elegans. Invertebrate genomes (insects, roundworm) 14 000 – 20 000 genes. Vertebrate genomes (human, mouse, putterfish) 30 000 – 35 000 genes. The unexpected low gene number in complex genoms has been rationalized on the bases of • increased transcriptional complexity; • increased frequency of alternative splicing.

Human gene distribution Gene density varies substantially between chromosomal regions

Gene size diversity • Genes in simple organisms such as bacteria are comparatively similar in size, and usually very short. • Human genes show enormous variation in size and internal organization. • There is a direct correlation between gene and product sizes, but there are some anomalies. Dystrophin Apolipoprotein B 3685 amino acids long 4563 amino acids long encoded by 45- kb gene encoded by 2.4 Mb gene

Genes vary in size and exon content

Genes vary in size and exon content A very small minority of human genes lack introns, and generally small in size. Intronless genes: • Interferon genes • Histone genes • Many ribonuclease genes • Heat shock protein genes • Many G-protein coupled receptors • Various neurotransmitters receptors and hormone receptors

Genes vary in size and exon content • The majority of the genes have an introns. • There is an inverse correlation between gene size and fraction of coding DNA. • There is huge variation in intron lengths. Large genes tend to have very large introns. • Natural selection favors short introns in highly expressed genes, since transcription oflong introns is costly in time and energy. • The average exone size in human genes are less than 200 bp.

Base composition in the human genome 41% GC nucleotides in average The base composition vary considerably between chromosomes: • 38% GC for Ch. 4 and Chr. 13 • 49% GC for Chr. 19. • It also varies considerably between the lengths of chromosomes: • The distal 10.3 Mb part of Ch. 17 has 50% GC, but • the adjacent 3.9 Mb part has only 38% GC. Gene density correlates with higher GC content

Gene clusters • Genes encoding identical products or sequence related are often found in one or more clusters. These clusters may be dispersed on several chromosomes. • A very few human polypeptides are known to be encoded by two or more identical gene copies. Often, these are by recently duplicated genes in a gene cluster, such as alpha- globin genes.

Very occassionally some genes on different chromosomes encode identical polypeptides. Histone genes 86 different histone sequences distributed over 10 clusters in different chromosomes. Some subfamily members are identical althougt encoded by genes on different chromosomes.

Gene clusters Genes encoding identical products or sequence related are often found in one or more clusters. These clusters may be dispersed on several chromosomes.

Functionally related genes Some genes encode products which may not be so closely related in sequence, but are clearly functionally related. • Subunits of the same protein • (α- globin and β- globin) or • macromolecular structure. • Components of the same • metabolic or signalling pathway • (JAK1 and STAT1). • Ligand plus associated receptor • (insulin and insuli receptor). • Immune system proteins Such genes are not clustered and are usually found on different chromosomes.

Genes within genes Several polypeptide- encoding genes are located within the introns of the larger genes. Neurofibromatosis type I gene (NF1) Three small internal genes transcribed from the opposite strand: OGMP- oligodendrocyte myelin glycoprotein, EV12B and EVI2A – homologues of murine ecotropical viral integration sites.

Genes within genes Several polypeptide- encoding genes are located within the introns of the larger genes. The majority small nucleolar RNA (snoRNA) genes are located within other genes, often ones which encode a ribosome-associated protein or a nucleolar protein. Possibly this arrangement has been maintained to permit co-ordinate production of protein and RNA components of the ribosome.

Polycistronic (multigenic) transcription units Polycistronis transcription units are found in human mitochondrial genome and the major rRNA gene clusters. There are some rare examples of polypeptide encoding bicistronic transcription units in the nuclear genome. The A and B chains of insulin (related functionally). The UBA52 and UBA80 genes generate ubiquitin and a ribosomal protein (functionally distinct).

Organisation and distribution of human RNA genes Non- coding RNA The minority of nuclear genes specify noncoding (untranslated) RNA genes. There are probably about 3000 RNA genes, accounting for close to 10% of the total gene number. They have not been taken into account in gene count. The mitochondrial genome is exceptional in that 65% (24 from 37) of the genes specify mature RNA molecules.

Organisation and distribution of human RNA genes. Ribosomal RNA (rRNA) • Structural component of ribosome • There are approximately 700- 800 human rRNA genes. • Mostly organized in tandemly repeated clusters. • Many related pseudogenes.

Organisation and distribution of human RNA genes Transfer RNA (tRNA) • Involved in translation process. • There are 497 nuclear genes encoding cytoplasmic tRNA molecules. • They can be grouped into 49 families according to their anticodon specificities. • tRNA genes appear to be dispersed throughout the genome and clustered: • found on all chromosomes except Chr. 22 and Y). • more than half of them (280 tRNA genes) reside on either Chr. 6, • or on Chr.1. • There are also 324 tRNA- derived putative pseudogenes.

Organisation and distribution of human RNA genes. Small nuclear RNA (snRNA) • Involved in assisting general gene expression. • Many snRNA are uridine-rich. • Named as U3 snRNA which means • the third uridine-rich small nucleolar RNA • to be classified. • They are encoded close to 100 genes • More than 70 of these genes specify snRNA used • in the major spliceosome (removes the introns during • splicing process). • 44 genes specify U6 snRNA and 16 genes- U1 snRNA. • There are a large number of related nonfunctional sequences (pseudogenes).

Organisation and distribution of human RNA genes Small nucleolar RNA (snoRNA) • Mostly employed in the nucleolus • to direct site- specific base modifications of rRNA; • to carry out base modifications on snRNA. • Some snoRNA are involved in the processing of pre-rRNAS • rather than nucleotide modification. • They are 2 subfamilies of snoRNA: • C/D box snoRNA • H/ACA snoRNA • nucleolus- the site of ribosome • synthesis and assembly

Small nucleolar RNA (snoRNA) C/D box snoRNA The C/D box Box C/D snoRNAs direct the 2'-O-methylation of rRNA nucleotides. Contain two short sequence motifs. C/D box snoRNAs show conserved boxes termed C (UGAUGA) and D (CUGA), positioned near their 5' and 3' termini. Sometimes two additional, less conserved, boxes called C' and D' are present. Antisense nucleotides (upstream of the box D or D’) are complementary to a specific site of rRNA.

Small nucleolar RNA (snoRNA) C/D box snoRNA Antisense nucleotides (upstream of the box D or D’, 10-21 nt) are complementary to a specific site of rRNA. The methyl group is added onto nucleotide at the fifth position upstream from this box. While the snoRNA component directs the snoRNP complex to the appropriate rRNA location, it is a protein enzyme- methyltransferase that actually catalyzes the methylation reaction.

Small nucleolar RNA (snoRNA) C/D box snoRNA During ribosome biosynthesis, the pre-rRNAs must undergo several modifications mainly 2'-O-ribose methylation and pseudouridylation The 2'-O methylation of nucleotides may • protect the RNA from hydrolytic degradation, • enhance hydrophobic surfaces for interaction, • stabilize helical stems. The methyl group is added onto nucleotide at the fifth position upstream from this box.

Small nucleolar RNA (snoRNA) H/ACA box snoRNA Box H/ACA snoRNAs form a secondary structure made up to two large hairpin domains connected by a hinge region, followed by a short tail. • The conserved box motifs include: • the H (5'ANANNA3' where N is any nucleotide); • the ACA trinucleotide always found three nucleotides away from the 3' end of the snoRNA. Generally there is 14-16 nucleotide distance between the box motifs and the site to be modified by the pseudouridine synthase- dyskerin.

Small nucleolar RNA (snoRNA) H/ACA box snoRNA In pseudouridylations uridine is isomerized to give pseudouridine, the most common modified base. Dyskerin is the pseudouridine synthase which catalyzes the conversion of uridine to pseudouridine. Pseudouridines have increased flexibility in their C-C glycosyl bonds and therefore may allow for increased capacity for hydrogen bond formation, contributing to RNA tertiary structure.

Small nucleolar RNA (snoRNA) snoRNA genes often found within the introns of other genes. Most of the snoRNA genes appear to be single copy and dispersed. Some large clusters are known (in SNURF-SNRPN transcription unit).

MicroRNAs (miRNA) • A new class of non-coding RNA gene • They are 19-25 bases long RNAs • They derived from larger up to 70- nucleotide long precursors containing an inverted repeat which permits ds hairpin RNA formation. • Hairpin precursor RNAs are cleaved by a ribonuclease III- Dicer.

MicroRNAs (miRNA) • The human genome has about 1000 distinct miRNAs that regulate at least 1/3 of the protein- encoding genes. • They act as antisense regulators by binding to • complementary sequences in the 3’ UTR of • mRNA. • Block translation or result in degradation • of target mRNA

Pseudogenes Genes are frequently characterized by defective copies- pseudogenes. Non-functional copy of a gene • Nonprocessed pseudogene • Nonfunctional copies of the genomic DNA sequence of a gene. • Contains exons, introns and promoter regions of the functional genes, • but recognized to be defective by the presence of innapropriate • termination codons (such as in α-globin and β-globin clusters).

Pseudogenes Genes are frequently characterized by defective copies- pseudogenes. • Processed pseudogene • Nonfunctional copies of the exonic sequences of a gene. • Contain at one end poly(A) tail • No introns • No 5’ promoter regions • Reverse transcriptases transcribe mRNA • into cDNA which can then integrate • into chromosomal DNA.

Pseudogenes Genes are frequently characterized by defective copies- pseudogenes. • Processed pseudogene are typically not expressed, but some examples are • known of expressed processed genes. • The cDNA has integrated into a chromosomal DNA site, which happens, • by chance, to be adjascent to a promoter which can drive expression of the processed • gene copy. Both types of pseudogenes include events (frameshifts, stop codons) that make the gene nonfunctional. The 20-30% of all genomic sequence predictions could be pseudogene. We assume pseudogenes have no function, but we really don’t know!

Human genome organisation

Human genome organisation

Presentation Transcript

Human Genome Project

Human Genome Project

Human Genome Project

HUMAN GENOME

Organisation of human genome

The Human Genome

Human Genome Project

Human Genome Lecture

Human Genome Project

PHAR2811: Genome Organisation

THE HUMAN GENOME

The Human Genome

Human Genome

Genome Organisation II

HUMAN GENOME

Human Genome Resources

The Human Genome

HUMAN GENOME

HUMAN GENOME PROJECT

Human Genome

$Human Genome$

The Human Genome