Explore the core aims of genomics science, including database development, genetic mapping, gene sequencing, gene identification, DNA sequence diversity, and linkage analysis.
Genome Science Ka-Lok Ng Dept. of Bioinformatics Asia University
The Core Aims of Genomics Science (1) An integrated web-based database and research interface • access to the enormous volume of data • web interfaces • Relational databases • Generic Model Organism Database (GMOD) • project http://www.gmod.org/ to develop reusable components suitable for creating new community databases of biology
The Core Aims of Genomics Science (2) To assemble physical and genetic maps • location of genes in a genome • physical distance, and relative position defined by recombination frequencies • the map is crucial for comparing the genomes of related species • related phenotypic and genetic data • used in animal and plant breeding • extend to more species with greater accuracy
The Core Aims of Genomics Science (3) To generate and order genomic and expressed gene sequences • High-volume sequencing • The basic technique was developed by Fred Sanger • “Shotgun” approach: reads are assembled into contigs, then scaffolds (sets of contigs), then whole chromosomes • mRNA is unstable • Coding parts: cDNA clones, cloned from mRNA transcripts • Expressed sequence tags (ESTs) • Obtaining full-length cDNA is not easy because of mRNA structure
The Core Aims of Genomics Science (3) To generate and order genomic and expressed gene sequences • mRNA → (reverse transcription) → cDNA → EST → whole genome reconstruction • ESTs are partial cDNA sequences, sequenced from either the 5' or the 3' end • Because of alternative splicing, there is not a one-to-one correspondence between ESTs and genes
The Core Aims of Genomics Science (4) Identify and annotate the complete set of genes encoded within a genome • From the complete sequence of a genome → gene identification • Alignment of cDNA, DNA and protein sequences – BLAST • Gene finding software – ORFs, transcription start and termination sites, exon/intron boundaries • Then gene annotation: linking sequence to genetic function, expression, locus information, and comparative data from homologous proteins
The Core Aims of Genomics Science (5) To characterize DNA sequence diversity • Single-nucleotide polymorphisms (SNPs) • About 90 percent of human genome variation comes in the form of single-nucleotide polymorphisms (most are neither harmful nor beneficial) • Theoretically, a SNP could have four possible forms, or alleles (alternative sequences), since there are four types of bases in DNA. In reality, most SNPs have only two alleles. For example, if some people have a T at a certain place in their genome while everyone else has a G, that place in the genome is a SNP with a T allele and a G allele. • The human genome contains more than 10 million SNPs, about one in every 100 to 300 bp • Find associations between SNP variation and phenotypic variation, e.g. the SNP mutation underlying sickle-cell anemia
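The SNP bullets above can be illustrated with a minimal sketch. It assumes two already-aligned sequences of equal length; the function names, the example sequences, and the round genome size of 3 billion bp are illustrative assumptions, not values taken from the slides.

```python
def snp_positions(seq_a, seq_b):
    """List (index, base_a, base_b) for every mismatch between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def mean_snp_spacing(genome_size_bp, snp_count):
    """Average number of base pairs per SNP."""
    return genome_size_bp / snp_count


# Hypothetical aligned fragments from two individuals: one T/G difference.
print(snp_positions("ACGTTG", "ACGTGG"))
# Assumed round numbers: ~3 billion bp and 10 million SNPs.
print(mean_snp_spacing(3_000_000_000, 10_000_000))
```

With these assumed round numbers the average spacing falls at the upper end of the 100 to 300 bp range quoted on the slide.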
Sickle-cell anemia and SNP http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/R/RFLPs.html
The Core Aims of Genomics Science (5) To characterize DNA sequence diversity • Characterize the level of haplotype structure due to linkage disequilibrium (LD) • haplotype = a set of adjacent polymorphisms found on a single chromosome • LD = the tendency of groups of closely linked alleles to be inherited together; it can be used to map human disease genes very accurately • Knowledge of LD is used for disease locus mapping • In the human genome, haplotypes tend to be approximately 60,000 bp in size and therefore contain up to 60 SNPs that travel as a group.
The Core Aims of Genomics Science Mendel's Laws enable the outcome of genetic crosses to be predicted. Genes A and B are on different chromosomes.
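One common way to quantify LD between two biallelic sites is the coefficient D and the squared correlation r². The sketch below uses these standard population-genetics definitions, which are not spelled out on the slide; the function name and the haplotype counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci from counts of the four haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B
    d = p_AB - p_A * p_B           # deviation from independent association
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Hypothetical counts: alleles always travel together (complete LD).
print(linkage_disequilibrium(50, 0, 0, 50))
# Hypothetical counts: all four haplotypes equally common (no LD).
print(linkage_disequilibrium(25, 25, 25, 25))
```

An r² near 1 means one SNP predicts the other almost perfectly, which is what allows a group of SNPs to "travel as a group" within a haplotype.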
The Core Aims of Genomics Science Genes on the same chromosome should display linkage. Genes A and B are on the same chromosome and so should be inherited together. Mendel's Second Law should therefore not apply to the inheritance of A and B, but holds for the inheritance of A and C, or B and C. Mendel did not discover linkage because the seven genes that he studied were each on a different pea chromosome.
Partial linkage. Partial linkage was discovered in the early 20th century. The cross shown here was carried out by Bateson, Saunders and Punnett in 1905 with sweet peas. The parental cross gives the typical dihybrid result (see figure on the right), with all the F1 plants displaying the same phenotype, indicating that the dominant alleles are purple flowers and long pollen grains. The F1 cross gives unexpected results, as the progeny show neither a 9 : 3 : 3 : 1 ratio (expected for genes on different chromosomes) nor a 3 : 1 ratio (expected if the genes are completely linked). An unusual ratio is typical of partial linkage.
The Core Aims of Genomics Science (5) To characterize DNA sequence diversity • the farther apart two genes are, the more they tend to assort independently (randomly), so recombination frequency ↑ • Higher recombination frequency = genes farther apart • Figure label: vermilion (a bright red color)
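Whether F2 counts depart from the 9 : 3 : 3 : 1 ratio can be checked with a chi-square goodness-of-fit statistic. The sketch below is a plain-Python illustration; the counts are invented for demonstration and are not the figures from the 1905 cross, and 7.815 is the standard 5% critical value for 3 degrees of freedom.

```python
def chi_square_9331(observed):
    """Chi-square statistic of four phenotype counts against a 9:3:3:1 expectation.

    `observed` is ordered: both dominant, first dominant only,
    second dominant only, both recessive.
    """
    total = sum(observed)
    expected = [total * ratio / 16 for ratio in (9, 3, 3, 1)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_5_PERCENT_DF3 = 7.815  # chi-square critical value, 3 degrees of freedom

# Hypothetical counts with an excess of the two parental classes.
illustrative_counts = [300, 20, 20, 60]
statistic = chi_square_9331(illustrative_counts)
print(statistic, statistic > CRITICAL_5_PERCENT_DF3)
```

A statistic far above the critical value rejects independent assortment; an excess of the parental classes that still falls short of a 3 : 1 ratio is the pattern the slide calls partial linkage.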
The Core Aims of Genomics Science (6) To compile atlases of gene expression • analyzing profiles of transcription and protein synthesis • traditional methods: Northern blots, hybridization • modern technology – microarrays • relative level of expression (differential expression) • patterns of covariation in gene expression → clues to unknown gene function (guilt by association)
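"Guilt by association" can be sketched as a search for the annotated gene whose expression profile correlates best with that of an uncharacterized gene. The example below uses Pearson correlation as one possible similarity measure; the gene names and expression values are hypothetical, and real microarray analyses involve normalization and many more conditions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_partner(unknown_profile, known_profiles):
    """Name of the annotated gene whose profile covaries most with the unknown one."""
    return max(known_profiles, key=lambda g: pearson(unknown_profile, known_profiles[g]))


# Hypothetical expression levels across four conditions.
known = {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1]}
print(best_partner([2, 4, 6, 8], known))
```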
The Core Aims of Genomics Science (7) To accumulate functional data, including biochemical and phenotypic properties of genes • Near-saturation mutagenesis (screening hundreds of thousands of mutants to identify genes that affect traits as diverse as embryogenesis, immunology, and behavior) • high-throughput reverse genetics (methods to systematically and specifically inactivate individual genes). • Yeast Genome Deletion Project http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html • Mouse http://www.bioscience.org/knockout/knochome.htm • Proteomics – detecting protein expression and protein-protein interactions • Pharmacogenomicists – study the interactions between small molecules (i.e. potential drugs) and proteins • Functional genomics – a crucial component is to study various model organisms • Clone library – collections of DNA fragments that are cloned into a vector
The Core Aims of Genomics Science Site-directed mutagenesis. With Smith's site-directed mutagenesis the researchers can study in detail how proteins function and how they interact with other biological molecules. Site-directed mutagenesis can be used, for example, to systematically change amino acids in enzymes, in order to better understand the function of these important biocatalysts. The researchers can also analyze how a protein is folded into its biologically active three-dimensional structure. The method can also be used to study the complex cellular regulation of the genes and to increase our understanding of the mechanism behind genetic and infectious diseases, including cancer. Example: GTC (valine) → GCC (alanine)
The Core Aims of Genomics Science (8) To provide the resources for comparison with other genomes. • Comparative maps allow genetic data from one species to be used in the other species • Comparative maps: local gene order along a chromosome tends to be conserved, i.e. synteny (e.g. human and mouse genomes) • Even without synteny, the conservation of gene function is known (say from fly to primate) • Gene order conservation (GOC)
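The valine-to-alanine example can be written out as a single-base substitution followed by translation. This is only an in-silico illustration of the codon change; the helper names are made up, and the codon table is a two-entry excerpt of the standard genetic code rather than a full table.

```python
# Two-entry excerpt of the standard genetic code (illustration only).
CODON_TABLE = {"GTC": "Val", "GCC": "Ala"}


def point_mutate(dna, position, new_base):
    """Return a copy of `dna` with the base at 0-based `position` replaced."""
    return dna[:position] + new_base + dna[position + 1:]


def translate(dna):
    """Translate complete codons; codons missing from the excerpt become '?'."""
    return [CODON_TABLE.get(dna[i:i + 3], "?") for i in range(0, len(dna) - 2, 3)]


wild_type = "GTC"
mutant = point_mutate(wild_type, 1, "C")  # change the middle base T to C
print(wild_type, translate(wild_type), "->", mutant, translate(mutant))
```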
Mapping Genomes – Genetic Maps Genetic map – the relative order of genetic markers in linkage groups, in which the distance between markers is expressed in units of recombination • Genetic markers – sequence tags, repeats, restriction enzyme polymorphisms (cutting sites) • In diploid organisms (those with two sets of chromosomes), genetic maps are assembled from data on the co-segregation of genetic markers, either in pedigrees or in the progeny of controlled crosses. • Genetic distance unit: centimorgan (cM) • 1 cM = 1% recombination frequency • In humans, 1 cM ~ 1 Mbp • 100 cM → on average 1 crossover per chromosome per generation • Markers on different chromosomes have a 50-50 chance of co-segregation → 50 cM (0.5 crossover per generation)
Mapping Genomes – Genetic Maps (A) A pair of different parental chromosomes (green and blue colors). (B) A table showing the frequency of recombinants between each pair of markers. A larger number indicates that the genes are farther apart. (C) The most likely genetic map inferred from the entire data set. In this hypothetical example, two linkage groups are inferred; the top one is longer than 50 cM. Recombination frequency → genetic distance: 0.11 → 11 cM, 0.22 → 21 cM, 0.25 → 24 cM, 0.33 → 33 cM (Figure 1.1)
Mapping Genomes – Genetic Maps • Software for the assembly of genetic maps http://linkage.rockefeller.edu/soft/list.html • Multiple factors lead to high variation in the correspondence between physical and genetic distances • Recombination rate varies along a chromosome (centromeres and telomeres are less recombinogenic than general euchromatin) → hot spots and cold spots of recombination
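The relation "1 cM = 1% recombination frequency" can be written as a small conversion. The sketch also includes Haldane's mapping function, a standard correction for multiple crossovers that the slide does not mention and that is added here purely as an assumption for comparison; the function names are illustrative.

```python
import math


def simple_map_distance_cm(recomb_fraction):
    """Direct conversion used on the slide: 1% recombination = 1 cM."""
    return 100.0 * recomb_fraction


def haldane_map_distance_cm(recomb_fraction):
    """Haldane's mapping function (assumes crossovers occur without interference)."""
    if not 0.0 <= recomb_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * recomb_fraction)


for r in (0.01, 0.11, 0.33):
    print(r, simple_map_distance_cm(r), round(haldane_map_distance_cm(r), 1))
```

The two conversions agree closely for tightly linked markers and diverge as the recombination fraction approaches 0.5, the value expected for unlinked markers.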
Exercise 1.1 (Part 1) Constructing a genetic map – four recessive loci: thickskin, reddish, sour, petite. After identifying two true-breeding trees, one completely wild-type and the other mutant for all four loci, the breeder crosses them and then plants an orchard of F2 (second-generation) trees. Q. Based on the following frequencies of mutant classes, determine which loci are likely to be on the same chromosome and which are the most closely linked.
Exercise 1.1 (Part 2) Constructing a genetic map – four recessive loci: thickskin, reddish, sour, petite. Q. Determine which loci are likely to be on the same chromosome and which are the most closely linked. Answer: The total number of trees is 968. Assuming independent assortment, each recessive phenotype should appear in about ¼ of the trees (968/4 = 242): the counts are 242 petite (127+42+38+12+10+8+3+2), 249 reddish, 247 sour and 236 thickskin. Unlinked loci would segregate independently, so each double-mutant class is expected to contain ~60 trees (1/4 × 1/4 × 968).
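The expectation used in the answer can be reproduced in a few lines. The helper names are illustrative, and the observed count passed to `excess_over_expectation` is a made-up placeholder, since the table of observed double-mutant classes is not reproduced in this text.

```python
def expected_double_mutants(total_trees, freq_a=0.25, freq_b=0.25):
    """Expected number of F2 trees mutant at two unlinked recessive loci."""
    return total_trees * freq_a * freq_b


def excess_over_expectation(observed, expected):
    """Ratio well above 1 suggests the two loci are linked."""
    return observed / expected


expected = expected_double_mutants(968)
print(expected)                                  # 1/4 * 1/4 * 968
print(excess_over_expectation(121, expected))    # 121 is a hypothetical count
```

Pairs of loci whose double-mutant class is far larger than this expectation are candidates for being on the same chromosome, and the largest excess points to the most tightly linked pair.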
Exercise 1.1 (Part 2) Constructing a genetic map Approximate solution (figure): map of the loci s, r, t, p
Mapping Genomes – Physical Maps Physical maps • A physical map is an assembly of contiguous stretches of chromosomal DNA – contigs – in which the distance between landmark sequences of DNA is expressed in kilobases • the ultimate physical map is the complete sequence Applications (1) provide a scaffold upon which polymorphic markers can be placed (2) facilitate finer-scale linkage mapping (3) confirm linkages inferred from recombination frequencies (4) resolve ambiguities about the order of closely linked genes (5) enable detailed comparisons of regions of synteny between genomes
Mapping Genomes – Physical Maps Two strategies used to assemble contigs (1) Alignment of randomly isolated clones based on shared restriction fragment length profiles • YAC – ~1Mbp long fragments • BAC – ~100kbp long fragments • Plasmid – ~ kbp long fragments • Automatic restriction profiling (Ch. 2) → assemble contigs (short for "contiguous sequences").
Genomic clone library Unlike the case of φX174, no large genome could be completely sequenced without an extra round of fragmentation into manageable-sized chunks. In other words, it had to be transferred into one or more clone libraries from which individual clones were picked to be "subcloned" in M13 for sequencing. The general outline of the procedure is shown at right. You can see that φX174 bypassed the first stage, the construction of a clone library from the target genome. cDNA library – made from RNA that has been reverse transcribed into cDNA; such libraries are used for EST sequencing projects.
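Aligning clones by shared restriction fragment profiles can be sketched as a comparison of fragment-length sets. This is a deliberately simplified illustration: real fingerprinting tolerates measurement error in fragment sizes and uses statistical scoring, and the clone names, fragment lengths, and the 0.5 threshold below are all hypothetical.

```python
def shared_fraction(fingerprint_a, fingerprint_b):
    """Fraction of the smaller fingerprint's fragment lengths found in the other."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def overlapping_pairs(fingerprints, threshold=0.5):
    """Clone pairs sharing enough fragments to be placed in the same contig."""
    names = sorted(fingerprints)
    return [
        (m, n)
        for i, m in enumerate(names)
        for n in names[i + 1:]
        if shared_fraction(fingerprints[m], fingerprints[n]) >= threshold
    ]


# Hypothetical restriction fragment lengths (bp) for three clones.
clones = {
    "bac1": [1200, 3400, 5600, 800],
    "bac2": [5600, 800, 2100, 4300],
    "bac3": [2100, 4300, 950, 7700],
}
print(overlapping_pairs(clones))
```

In this toy data bac1 overlaps bac2 and bac2 overlaps bac3, so the three clones would be ordered into one contig even though bac1 and bac3 share no fragments.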
Mapping Genomes – Physical Maps (2) Hybridization-based approaches – chromosome walking Chromosome walking is used as a means of finding adjacent genes (positional cloning), or parts of a gene which are missing in the original clone as well as to analyze long stretches of eukaryotic DNA. This task requires finding a set of overlapping fragments of DNA that spans the distance between the marker and the gene. Genomic DNA is shown in blue. Selected clones from a library of cloned genomic DNA fragments are shown in red. The initial probe, probe a, is specific to gene A or exon A and allows identification of clones 1 and 2. A new probe, probe b, is prepared from one end of clone 2 and used to isolate new clones 3 and 4 from the genomic library. Probe c, prepared from clone 4 is used to identify clone 5, etc. The orientation of the clones is determined by restriction mapping of the clones. Clone 6 contains the desired gene B or exon B.
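The walking procedure described above can be mimicked in a toy model where each clone is an interval on the genome and a probe "hybridizes" to every clone that contains its position. The coordinates, clone names, and the rule of always taking the clone that extends farthest are assumptions made for this sketch; a real walk works from hybridization signals and restriction maps, not known coordinates.

```python
def chromosome_walk(clones, start_pos, target_pos):
    """Walk rightwards from start_pos until a clone covering target_pos is found.

    `clones` maps a clone name to its (start, end) interval.
    Returns the list of clones used, or None if the walk hits a gap.
    """
    path = []
    probe = start_pos
    while True:
        # Clones not yet used that contain the probe position.
        hits = {
            name: span
            for name, span in clones.items()
            if span[0] <= probe <= span[1] and name not in path
        }
        if not hits:
            return None  # no overlapping clone: the walk cannot continue
        best = max(hits, key=lambda name: hits[name][1])  # extends farthest
        path.append(best)
        if clones[best][0] <= target_pos <= clones[best][1]:
            return path
        probe = clones[best][1]  # next probe comes from the far end of this clone


# Hypothetical library: gene A near position 20, gene B near position 200.
library = {
    "clone1": (0, 40), "clone2": (10, 60), "clone3": (50, 100),
    "clone4": (55, 120), "clone5": (110, 170), "clone6": (160, 220),
}
print(chromosome_walk(library, 20, 200))
```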
Mapping Genomes – Cytogenetic Maps Historically – aid in the alignment of physical and genetic maps. Cytogenetic maps are the banding patterns observed through a microscope on stained chromosome spreads. Traditional preparations – salivary gland polytene chromosomes of insects (greatly enlarged relative to their usual condition) and Giemsa-banded mammalian metaphase karyotypes http://book.tngs.tn.edu.tw/database/scientieic/content/1970/00100010/images/0053b.jpg Chromosomes carry the genetic material; phenotypes or medical conditions correlate with the deletion or rearrangement of chromosome sections. Cytogenetic maps are aligned with the physical map through in situ hybridization – a cloned fragment is annealed to a single location on the cytogenetic map. NCBI Genomic Biology http://www.ncbi.nlm.nih.gov/Genomes/ Keyword: HOX AND homo[ORGN] Karyotypes
Mapping Genomes – Cytogenetic Maps Alignment of cytological, physical, and genetic maps. Cytological map – a representation of a chromosome based on the pattern of staining of bands Physical map – the location of transcripts and sites of insertions and deletions Genetic map – recombination rates vary along a chromosome, typically reduced near the telomere and centromere Distances between genetic, physical and cytological markers are not uniform How to search for genes on a genome map ? See my lecture notes on Bioinformatics class.
Comparative Genomics Synteny – conservation of gene order between chromosome segments of two or more organisms. Homologs – highly conserved loci derived from a common ancestral locus. Orthologs – homologous genes in different species that diverged as a result of speciation. Paralogs – homologous genes that arose as a result of duplication. • Conservation of gene order is an inverse function of the time since divergence from the ancestral locus. • Note – rates of divergence vary considerably at all taxonomic levels. • Japanese pufferfish – its genome is 7.5 times smaller than the human genome, yet it shows extensive gene order similarity with humans: around 50% – 80% of genes are in the same order as in the human genome
Comparative Genomics • Chromosome painting – used to define regions of synteny, covering regions of ~0.1 of a chromosome arm • Each chromosome of one species is labeled with a set of fluorescent dyes and hybridized to chromosome spreads of the other genome. • Uses the fluorescent in situ hybridization (FISH) technique to detect DNA sequences in metaphase spreads of animal cells. The fluorescently labeled hybrid karyotype is shown at the bottom.
Comparative Genomics Synteny between cat and human genomes. Ideograms for each of the 24 human chromosomes, shown on the right in each pair, are aligned against color-coded representations of the corresponding cat chromosomes. Cat – six groups (A – F) of 2 – 4 chromosomes each. Top row – 12 autosomes that are essentially syntenic along their length, except for some rearrangements. Bottom row – 10 autosomes that have at least one major rearrangement. The two sex chromosomes are essentially syntenic between cat and human.
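Gene order conservation between two genomes can be given a simple numerical form: the fraction of neighbouring gene pairs in one genome that are also neighbours in the other. This is one possible definition chosen for illustration, not necessarily the measure behind the 50% – 80% figure above, and the ortholog names are hypothetical.

```python
def gene_order_conservation(order_a, order_b):
    """Fraction of adjacent gene pairs in genome A that are also adjacent in genome B.

    Each argument is a list of ortholog identifiers in chromosomal order;
    adjacency is counted regardless of orientation.
    """
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    pairs_in_a = list(zip(order_a, order_a[1:]))
    if not pairs_in_a:
        return 0.0
    conserved = sum(frozenset(pair) in adjacent_in_b for pair in pairs_in_a)
    return conserved / len(pairs_in_a)


# Hypothetical segment: the last two genes are swapped in species B.
species_a = ["g1", "g2", "g3", "g4", "g5"]
species_b = ["g1", "g2", "g3", "g5", "g4"]
print(gene_order_conservation(species_a, species_b))
```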
Comparative Genomics • Sequence conservation = functional importance • High-resolution comparative physical mapping – found ~1Mbp synteny region between human and mouse • May contain hundreds of genes, local inversions and insertions/deletions involving one or a few genes • Families of genes organized in tandem clusters • Considerable size variation in intergenic “junk” DNA
Comparative Genomics • Identifying genes and regulatory regions in sequenced genomes is challenging • Open reading frames (ORFs) are usually a good indication of genes • However, it is difficult to determine which ORFs belong to a gene • Many mammalian genes have small exons and large introns • Regulatory sequences are even more difficult to identify
Comparative Genomics • Computer programs analyze genomic sequence • GRAIL • GeneFinder • Look for ORFs, splice sites, poly A addition sites, etc. • Predict gene structure • Frequently wrong • Usually miss exons at beginning or end of gene • Sometimes predict exon when one doesn’t really exist
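The "look for ORFs" step can be illustrated with a minimal scanner. This is not how GRAIL or GeneFinder work internally; it is a simplified sketch that reads only the forward strand, ignores introns and splice sites, and uses a hypothetical example sequence and an arbitrary minimum length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) of forward-strand ORFs, from ATG through the stop codon.

    Coordinates are 0-based and end-exclusive; `min_codons` counts the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return orfs


example = "CCATGAAATTTGGGTAACC"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

Because a scanner like this knows nothing about exon and intron boundaries, it shows why ORF-based prediction alone frequently misses or invents exons in mammalian genes.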
Comparative Genomics • When comparing genomes of different species, the genes normally have the same exon–intron structure • Look for conserved ORFs in both genomes • Frequently permit accurate identification of genes • Fugu–human comparison found >1,000 genes • Mouse–human comparison indicates only 25,000 genes in genome
Example of sequence comparison • Comparison of the human and mouse spermidine synthase genes revealed an additional intron in the human gene that is not found in the mouse homologue (Figure: human and mouse genes; scale 5,500 bp)
The Human Genome Project (HGP) Objectives • Generation of high-resolution genetic and physical maps that will help in the localization of disease-associated genes. • The attainment of sequence benchmarks, leading to generation of a complete genome sequence by the year 2005. (A draft version was achieved in May 2000, but finished sequence required an error rate of less than 1 in 10,000 bp.) • Identification of each and every gene in the genome by a combination of bioinformatic identification of open reading frames (ORFs), generation of voluminous EST databases, and collation of functional data, including comparative data from other animal genome projects. • Compilation of exhaustive polymorphism databases, in particular of SNPs, to facilitate integration of genomic and clinical data, as well as studies of human diversity and evolution.
The Human Genome Project (HGP) Table 1.1 Initial Goals of the HGP From the First 5-Year Plan: 1993-1998 Table 1.2 A Blueprint for the Future of the HGP 15 Grand Challenges in the Third 5-Year Plan: 2003 – 2005 HGP budget – part is set aside for research on the ethical, legal, and social implications of genetic research (the ELSI project)
The Human Genome Project The architecture of the Human Genome Project in the twenty-first century. Three major themes for future genome research are founded on six pillars of genome resources.
ELSI Box 1.1 The Ethical, Legal, and Social Implications of the HGP Funding – The National Human Genome Research Institute (NHGRI) commits 5% of its annual budget to ELSI, funding three types of activities: regular research grants, education grants, and intramural programs at the NIH campus Web sites: http://www.genome.gov/10001618 http://www.ornl.gov/sci/techresources/Human_Genome/research/elsi.html 4 major objectives 4 main subject areas
ELSI A great concern is the privacy and confidentiality of genetic information, especially in Iceland (located between Greenland and Norway; http://www.tita.org.tw/view/iceland.html) and Estonia (the Republic of Estonia; http://www.suntravel.com.tw/zone/Europe/Estonia-136.htm), where government-sponsored databases of medical records have been supplied to medical research companies. • Psychological impact and potential for stigmatization inherent in the generation of genetic data • Racial mistrust and socioeconomic differences in the gathering of and access to genetic information • Reproductive issues • Potential moral (and possibly legal) obligations once data have been obtained • Philosophical discussions – human responsibility, the human right to “play God” with genetic material, the meaning of free will in relation to genetically influenced behaviors • Genetically Modified Organisms (GMOs) 1998 – Five new major aims