
Human Genome Project


Presentation Transcript


  1. Human Genome Project

  2. Basic Strategy • Goal: determine the sequence of the roughly 3 billion base pairs of the human genome. The project started in 1990. • Various side projects: genetic diseases, variations between individuals, ethnic variation, comparison to other species. • Strategy: • 1. A physical map relating specific DNA markers to their chromosomal positions. • 2. An overlapping set of cloned DNAs (contigs). • 3. Sequencing and assembly. • 4. Finding the genes in the sequence. • 5. Annotation of gene function.

  3. Genetic mapping • Genetic mapping asks where genes are located on the chromosomes. • Simply put, we need to locate genes within the total genome. • A genetic map uses recombination (crossing over during meiosis) to determine how frequently two genes (or markers) are inherited together. • Genes → genotypes → phenotypes.

  4. Gene map • There are two kinds of gene map: the linkage map and the physical map. • A linkage map tells you whether two genes on a chromosome are close together or far apart, but it gives no physical location.

  5. Linkage map • Genetic linkage is the tendency of genes that are located close to each other on a chromosome to be inherited together during meiosis. • Genes whose loci are nearer to each other are less likely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be genetically linked. • In other words, the nearer two genes are on a chromosome, the lower the chance of a swap occurring between them, and the more likely they are to be inherited together.

  6. Chromosome Theory of Linkage • Morgan, along with Castle, formulated the chromosome theory of linkage. It has the following postulates: • 1. Genes are arranged in a linear manner in the chromosomes. • 2. Genes which exhibit linkage are located on the same chromosome. • 3. Genes generally tend to stay in parental combination, except in cases of crossing over. • 4. The distance between linked genes in a chromosome determines the strength of linkage. Genes located close to each other show stronger linkage than those located far from each other, since the former are less likely to be separated by crossing over.

  7. However, crossing over does not occur between linked genes in every meiotic event, especially when the positions of the genes on the chromosome are very near one another. • The frequency with which crossing over occurs between any two linked genes is roughly proportional to the distance between the loci along the chromosome.

  8. 1. At very small distances, crossover is very rare, and most gametes are parental. • 2. As the distance between two genes increases, crossover frequency increases. More recombinant gametes, fewer parental gametes. • 3. When genetic loci are very far apart on the same chromosome, crossing over nearly always occurs, and the frequency of recombinant gametes approaches 50 percent.
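
The relationship between distance and recombinant gametes described on slides 7 and 8 can be sketched numerically. The following is a minimal illustration, not part of the original slides: it computes a recombination frequency from offspring counts and converts it to a map distance with Haldane's mapping function, which assumes crossovers occur independently (no interference). The function names are hypothetical.

```python
import math


def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of gametes (or offspring) that are recombinant."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring counted")
    return recombinant / total


def haldane_map_distance_cm(r: float) -> float:
    """Map distance in centimorgans from recombination frequency r.

    Assumption: Haldane's mapping function (no crossover interference).
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in the range [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_recombination(d_cm: float) -> float:
    """Inverse of the function above: expected r for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


if __name__ == "__main__":
    r = recombination_frequency(parental=90, recombinant=10)
    print(r, haldane_map_distance_cm(r))
    # For very distant loci the recombinant fraction levels off near 0.5.
    print(haldane_recombination(1000.0))
```

For small distances the recombinant fraction and the map distance are nearly equal (1% recombination is about 1 cM); for large distances the fraction saturates near 50 percent, matching point 3 on slide 8.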

  9. What is a molecular marker? • A DNA sequence used to mark a particular location on a particular chromosome.

  10. Genetic markers • Modern genetic markers: SNPs • A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species. • It can be described as a variation (which may arise due to mutation or alteration in the genomic loci) that can be observed. • A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like

  11. What are they? • Variable sites in the genome. What are their uses? • Finding disease genes. • Testing/estimating relationships. • Studying population differences.

  12. Physical mapping • Cytogenetic mapping: a cytogenetic map is the visual appearance of a chromosome when stained and examined under a microscope. • Particularly important are visually distinct regions, called light and dark bands, which give each of the chromosomes a unique appearance. • This feature allows a person's chromosomes to be studied in a clinical test known as a karyotype, which allows scientists to look for chromosomal alterations.

  13. Physical Maps • A physical map determines where a given DNA marker is located on the DNA of the chromosome. • Genetic and physical maps are (supposed to be) colinear: all the genes appear in the same order in both maps. However, the distances are quite different: there is very little recombination near the centromeres, so large DNA distances there correspond to very short recombination distances. • Genetic maps using microsatellite (SSR) markers were used to develop physical maps: the appropriate SSR sites were expected to be found on the corresponding cloned DNA.

  14. Sequence Tagged Sites • Expressed sequence tags are produced by sequencing cDNA made from RNA, which in turn is transcribed from genes. • The RNA present in a tissue reflects the genes that are turned on in that tissue. • They are called “tags” because they are not complete gene sequences; each is only partially sequenced.

  15. Sequence Tagged Sites • A sequence tagged site (STS) is a short sequence that is unique in the genome. • You obtain the sequence information from cloned DNA, and then locate it in the genome. • Using PCR, it is then possible to determine whether your STS is present in any other clone or cell line. • Obtaining an STS: sequence the ends of large cloned DNAs (BACs or YACs, for example). • Uniqueness: use the cloned DNA from the STS as a probe on a Southern blot of genomic DNA; if the STS is unique, only 1 band will hybridize. • Repetitive DNA is very common in the human genome, and many DNA sequences are not unique. • A good source of unique DNA is EST clones: cDNA made from messenger RNA.
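
The uniqueness test on slide 15 is done in the lab with a Southern blot. Once sequence data are available, the same question can be asked in software. The sketch below is an illustrative assumption, not a method from the slides: it counts exact matches of a candidate STS on both strands of a toy genome string. The function names and the toy sequence are made up for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_occurrences(genome: str, query: str) -> int:
    """Count exact matches of query in genome, including overlapping ones."""
    count = 0
    start = genome.find(query)
    while start != -1:
        count += 1
        start = genome.find(query, start + 1)
    return count


def is_unique_sts(genome: str, sts: str) -> bool:
    """True if the candidate STS matches exactly once, on either strand."""
    genome = genome.upper()
    sts = sts.upper()
    hits = count_occurrences(genome, sts)
    rc = reverse_complement(sts)
    if rc != sts:  # avoid double-counting palindromic sequences
        hits += count_occurrences(genome, rc)
    return hits == 1


if __name__ == "__main__":
    toy_genome = "AAGCTTGGGATTACACCCATATATATGG"  # hypothetical sequence
    print(is_unique_sts(toy_genome, "GATTACA"))  # single-copy candidate
    print(is_unique_sts(toy_genome, "ATAT"))     # lies in a simple repeat
```

Exact string matching is a simplification: a real probe or PCR primer tolerates some mismatches, so near-identical repeats would also need to be considered.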

  16. Somatic Cell Hybrids • Human and mouse (or hamster) cultured cells can be fused together using polyethylene glycol. • The resulting fused cell is a heterokaryon: it has 2 nuclei from different species. • If the heterokaryon undergoes mitosis, the nuclei fuse. • Human chromosomes are unstable in a mixed nucleus, and most of them are randomly lost. The mouse chromosomes all stay. • Different cell lines can be established that contain different combinations of human chromosomes. • You can identify which human chromosomes remain using chromosome banding techniques. • This is a good way to determine which chromosome a DNA sequence is on, and sometimes also which chromosome carries a gene product or phenotype.
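
The assignment logic behind a somatic cell hybrid panel can be sketched as a concordance test: a marker sits on the chromosome whose presence or absence matches the marker's presence or absence in every cell line. The panel contents and names below are hypothetical, and the sketch assumes error-free typing of whole chromosomes.

```python
def assign_chromosome(panel: dict[str, set[str]],
                      marker_present: dict[str, bool]) -> list[str]:
    """Return chromosomes whose retention pattern matches the marker.

    panel: cell line name -> set of human chromosomes retained in that line.
    marker_present: cell line name -> whether the marker was detected.
    """
    all_chromosomes = set().union(*panel.values())
    candidates = []
    for chrom in sorted(all_chromosomes):
        concordant = all(
            (chrom in panel[line]) == present
            for line, present in marker_present.items()
        )
        if concordant:
            candidates.append(chrom)
    return candidates


if __name__ == "__main__":
    # Hypothetical panel of four hybrid lines.
    panel = {
        "lineA": {"1", "2", "7"},
        "lineB": {"2", "X"},
        "lineC": {"1", "7", "X"},
        "lineD": {"7"},
    }
    marker = {"lineA": True, "lineB": False, "lineC": True, "lineD": True}
    print(assign_chromosome(panel, marker))
```

If more than one chromosome is returned, the panel does not contain enough distinct combinations to separate them and more cell lines are needed.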

  17. Radiation Hybrids • Standard somatic cell fusions contain entire human chromosomes. To locate a gene more closely, you need to use chromosome fragments. • Start by irradiating human cells with a controlled dose of X-rays: chromosomes break up. Then, fuse the cells to mouse cells. The human chromosome fragments get integrated into the mouse chromosomes. • Create a panel of mouse/human hybrid cell lines. • The current standard panels contain about 100 cell lines. • Each line contains about 32% of the human genome • Average size of human genome fragment = 25 kbp • More radiation = smaller fragments • Mapping: the hybrid cell lines contain random human chromosome fragments, but closely linked sites are usually in the same cell line (same basic principle as recombination mapping). • Until you have located some of the markers on the chromosomes, radiation hybrid mapping only gives you information about whether any two sequences are close together on the chromosome.
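
The principle that closely linked sites tend to be retained in the same cell lines can be illustrated with a simple discordance count. This is a rough sketch only: real radiation hybrid mapping fits a statistical model of breakage and retention, whereas the code below just measures how often two markers disagree across a panel. The marker names and retention strings are invented for the example.

```python
from itertools import combinations


def discordance(a: str, b: str) -> float:
    """Fraction of cell lines in which exactly one of two markers is retained.

    Each string holds one character per cell line: '1' retained, '0' absent.
    """
    if len(a) != len(b):
        raise ValueError("retention patterns must cover the same cell lines")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def rank_marker_pairs(retention: dict[str, str]) -> list[tuple[float, str, str]]:
    """All marker pairs, sorted from most to least similar retention."""
    pairs = [
        (discordance(retention[m1], retention[m2]), m1, m2)
        for m1, m2 in combinations(sorted(retention), 2)
    ]
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical typing results for three markers on a 10-line panel.
    retention = {
        "STS_A": "1100101100",
        "STS_B": "1100101000",
        "STS_C": "0011010011",
    }
    for score, m1, m2 in rank_marker_pairs(retention):
        print(m1, m2, score)
```

A low discordance suggests the two markers are rarely separated by a radiation-induced break and so lie close together; as the slide notes, this gives relative proximity only until some markers are anchored to known chromosomal positions.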

  18. Contigs • A contig is a set of partially overlapping clones, a contiguous set of clones. No gaps between them. • Contigs allow you to build up the sequence of the chromosome over much larger regions than any single clone. • The first reasonably complete physical map of the human genome involved contigs generated by YACs (yeast artificial chromosomes). • Initially, you have a collection of clones with no information about how they are ordered on the chromosome. • Contigs are built up by using PCR to identify unique sequences (STS or EST) on each clone, and then looking for overlaps between the clones.
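
Contig building from STS content can be sketched as a graph problem: two clones that share an STS must overlap, and each connected group of overlapping clones forms one contig. The clone and STS names below are hypothetical, and the sketch only groups clones; it does not order them along the chromosome.

```python
from collections import defaultdict


def build_contigs(clone_sts: dict[str, set[str]]) -> list[list[str]]:
    """Group clones into contigs based on shared STS markers.

    clone_sts: clone name -> set of STS markers detected on it (e.g. by PCR).
    """
    # Index clones by the STS markers they carry.
    clones_by_sts = defaultdict(set)
    for clone, markers in clone_sts.items():
        for sts in markers:
            clones_by_sts[sts].add(clone)

    # Two clones are neighbours if they share at least one STS.
    neighbours = {clone: set() for clone in clone_sts}
    for clones in clones_by_sts.values():
        for clone in clones:
            neighbours[clone] |= clones - {clone}

    # Each connected component of the overlap graph is one contig.
    assigned = set()
    contigs = []
    for start in sorted(clone_sts):
        if start in assigned:
            continue
        component = set()
        stack = [start]
        while stack:
            clone = stack.pop()
            if clone in component:
                continue
            component.add(clone)
            stack.extend(neighbours[clone] - component)
        assigned |= component
        contigs.append(sorted(component))
    return contigs


if __name__ == "__main__":
    clones = {
        "YAC1": {"sts1", "sts2"},
        "YAC2": {"sts2", "sts3"},
        "YAC3": {"sts3", "sts4"},
        "YAC4": {"sts7"},
        "YAC5": {"sts7", "sts8"},
    }
    print(build_contigs(clones))
```

In this toy input YAC1 to YAC3 chain together through shared markers, while YAC4 and YAC5 form a second contig with a gap between the two groups.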

  19. Sequencing Strategy • Once a contig map of the genome was obtained, it was necessary to sequence each individual clone. • Most of the actual human genome sequencing was done on BAC clones, which are less prone to rearrangement than YAC clones. BACs are about 100-200 kbp long. • Large clones are generally sequenced by shotgun sequencing: the large cloned DNA is randomly broken up into a series of small fragments (less than 1 kb). These fragments are cloned and sequenced. A computer program then assembles them based on overlaps between the sequences of each clone. • To ensure that every bit has been covered, you need to sequence random clones until you have covered each spot 5-10 times on average.
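
The reason for 5-10 fold coverage can be made concrete with a standard idealized model that the slides do not spell out: if reads start at uniformly random positions, the expected fraction of bases left uncovered at average coverage c is about e^(-c). The sketch below assumes that model, ignores edge effects and cloning bias, and uses hypothetical function names and example sizes.

```python
import math


def coverage(n_reads: int, read_length: int, target_length: int) -> float:
    """Average number of times each base is sequenced."""
    return n_reads * read_length / target_length


def fraction_uncovered(c: float) -> float:
    """Expected fraction of bases with no read, assuming random read starts."""
    return math.exp(-c)


def reads_needed(target_length: int, read_length: int, desired_coverage: float) -> int:
    """Number of reads required to reach the desired average coverage."""
    return math.ceil(desired_coverage * target_length / read_length)


if __name__ == "__main__":
    bac_length = 150_000   # a BAC in the 100-200 kbp range
    read_length = 500      # assumed usable read length in bp
    for c in (1, 5, 8, 10):
        n = reads_needed(bac_length, read_length, c)
        missing = fraction_uncovered(c) * bac_length
        print(f"{c}x: {n} reads, about {missing:.0f} bp expected uncovered")
```

Under this model 1x coverage leaves more than a third of the clone unsequenced, while 8x leaves only a few dozen bases of a 150 kbp BAC, which is why several-fold redundancy is needed before assembly.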

  20. Whole Genome Shotgun Sequencing • Why bother with creating a large-scale physical map: all that YAC and BAC cloning, radiation hybrids, STS comparisons, etc.? Why not just fragment the whole genome into 1 kb pieces, sequence them all, and let the computer assemble the whole genome? • In practice, the genome is cloned into large fragments first, and then each large fragment is broken up for shotgun sequencing. But the large fragments are not ordered: no physical map or set of contigs is created. • Requires a lot of overlapping coverage. • Also requires good software. • Very successful for prokaryotic genomes (10 Mbp or less). • But the human genome is 300 times larger. • Big problem: repeat sequence DNA, which is everywhere, and especially near the centromere. To find overlaps between clones, you need unique regions. • It remains unclear whether whole genome shotgun sequencing will work if there is no other information available to provide order. It has not been widely adopted for eukaryotic projects (so far).

  21. EST (expressed sequence tag): • A unique stretch of DNA within a coding region of a gene that is useful for identifying full-length genes and serves as a landmark for mapping. • An EST is a sequence tagged site (STS) derived from cDNA. • An STS is a short segment of DNA which occurs but once in the genome and whose location and base sequence are known. STSs are detectable by the polymerase chain reaction (PCR), are helpful in localizing and orienting mapping and sequence data, and serve as landmarks in the physical map of the genome.

  22. Expressed-sequence tags (ESTs) • are cDNA sequences that have been sequenced from either the 5’ or 3’ ends. • They may contain all or part of a particular cDNA coding sequence, • and are useful for identifying unknown genes, mapping their positions within a genome, • and as a potential source for genetic material when a full-length cDNA is not available for a specific gene of interest.

  23. Gene Detection • The best evidence that a given DNA sequence is expressed is to find an EST (cDNA copy of mRNA) that matches it. • Large numbers of EST libraries have been constructed and sequenced. • The primary result of this was to determine that many genes have several different intron splicing patterns: sequences are exons in some tissues but introns in others.

  24. Gene Detection • Homology searches, using BLAST, are a good way to find genes. If a DNA sequence closely matches a sequence from another organism, it has been evolutionarily conserved, and that usually means that it is an expressed gene. • Exon prediction: exons need to be open reading frames (no stop codons), and they display patterns of nucleotide usage different from random DNA. Several different programs exist, and they give somewhat varying results. • “Hypothetical genes” are genes whose existence has been predicted by computer but which lack any experimental or cross-species data to confirm them. • A “conserved hypothetical gene” is a sequence that matches other species even though there is no EST or other experimental evidence for its expression.
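
The open reading frame requirement on slide 24 can be illustrated with a very small scanner. This is a simplified sketch, not one of the exon prediction programs the slide refers to: it reports stretches without stop codons in the three forward reading frames only, and it ignores the reverse strand, splice sites, and nucleotide usage statistics. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Find stop-free stretches in the three forward frames.

    Returns (frame, start, end) tuples; seq[start:end] contains no stop codon
    in that frame and is at least min_codons codons long.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    found.append((frame, run_start, i))
                run_start = i + 3
        # Close the final run at the last complete codon of this frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            found.append((frame, run_start, end))
    return found


if __name__ == "__main__":
    toy = "ATGAAACCCGGGTAAATGTTTTAG"  # hypothetical sequence
    for frame, start, end in open_reading_frames(toy):
        print(frame, start, end, toy[start:end])
```

Short stop-free stretches occur by chance in every frame, which is one reason real predictors combine this test with length thresholds and codon usage patterns and still give varying results.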

  25. Genome annotation • The process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. • Once a genome is sequenced, it needs to be annotated to make sense of it.

  26. Gene Annotation • There is a big problem of too much information not uniformly coded or maintained. The scientific literature contains numerous examples of the same gene or protein with several different names, and getting common definitions of functions is even harder. • To counter this, the Gene Ontology Consortium (GO) has created a controlled vocabulary of about 11,000 terms. • Every gene product (protein) can be annotated into three general categories: • molecular function: what the protein actually does, such as “kinase activity” • biological process: what cellular process the protein participates in, such as “signal transduction” • cellular component: where the protein is found in the cell, such as “integral to the plasma membrane” • Each gene product can have multiple descriptive terms. • The terms are hierarchical: more specific terms are contained within less specific terms. • But, a given term can have more than one parent and more than one child term.
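
Because a term can have more than one parent, the Gene Ontology is a directed acyclic graph rather than a tree. The sketch below shows one way to collect every less specific term implied by an annotation. The small hierarchy is a simplified toy loosely modeled on molecular function terms; it is not the real GO graph, and the function name is hypothetical.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Collect all less specific terms reachable from a term via its parents."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


if __name__ == "__main__":
    # Toy hierarchy: child term -> list of parent terms (illustrative only).
    toy_parents = {
        "protein kinase activity": ["kinase activity", "protein-modifying activity"],
        "kinase activity": ["transferase activity"],
        "transferase activity": ["catalytic activity"],
        "protein-modifying activity": ["catalytic activity"],
        "catalytic activity": ["molecular function"],
    }
    for name in sorted(ancestors("protein kinase activity", toy_parents)):
        print(name)
```

Annotating a gene product with a specific term therefore implies all of that term's ancestors, so a query for a general term should also return products annotated only with its more specific descendants.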

  27. GO Example
