Genomics An introduction
Aims of genomics I • Establishing integrated databases that are far more than mere storage • Linking genomic and expressed gene sequences (cDNA)
Aims of genomics II • Describing every gene: • function/expression data/relationships/phenotype • 3-D structure and features (introns/exons, domains, repeats) • similarities to other genes • Characterizing sequence diversity in the population
Genomics can be: • Structural • where is it? • Functional • what does it do? • DNA microarrays • Comparative • finding important fragments
Mapping genomes • Past • Genetic maps: distance between simple markers, expressed in units of recombination • Cytological maps: stained chromosomes, observable under a microscope • Present • Physical maps: distance between nucleotides, expressed in bases • Comparative maps: detection of corresponding genes; detection of regulatory sequences
Genetic differences among humans • Goals • Genetic diseases • Identifying criminals • Methods • Genetic markers (fingerprints) and DNA sequence. Repeats: • Microsatellites (repeats of 1-12 nucleotides) • Minisatellites (> 12) • Other types of variation • Genome rearrangements • Single nucleotide mutations
Microsatellites and disease • Huntington’s disease • Huntingtin gene of unknown (!) function • Number of repeats: 6-35 normal; 36-120 disease • Friedreich’s ataxia • GAA repeat in a non-coding (intron) region • Number of repeats: 7-34 normal; 35 and up disease • Repeat expansion reduces expression of the frataxin gene
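The repeat-count ranges above can be turned into a small illustration. The sketch below counts the longest run of back-to-back GAA units in a sequence and compares it with the Friedreich's ataxia ranges from the slide. The function names and the toy sequence are hypothetical, and this is not a diagnostic procedure, only a way to make the thresholds concrete.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


def classify_gaa_count(count: int) -> str:
    """Compare a GAA repeat count with the ranges on the slide (7-34, 35 and up)."""
    if count >= 35:
        return "disease range"
    if count >= 7:
        return "normal range"
    # The slide does not say anything about counts below 7.
    return "below the ranges listed on the slide"


# Hypothetical toy sequence: 9 GAA units flanked by unrelated bases.
toy_sequence = "CCT" + "GAA" * 9 + "TTC"
count = longest_repeat_run(toy_sequence, "GAA")
print(count, classify_gaa_count(count))
```

The same `longest_repeat_run` helper works for any repeat unit, so it could be reused for other microsatellites by changing the `unit` argument.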
SNP - Single Nucleotide Polymorphism • Definition • SNP and phenotype • Occurrence in genome • Rarity of most SNPs (agrees with neutral molecular evolutionary theory) • SNPs in human population: • High variance in genome! • Detection of SNPs: Hybridization
Sickle cell anemia • A sickle cell looks like this: [figure] • Caused by a SNP in the beta-globin gene; the sickle allele is recessive: • 2 faulty copies: red blood cells change shape under stress, causing anemia • 1 faulty copy: red blood cells change shape only under heavy stress, but the carrier gains resistance to the malaria parasite
SNPs and haplotypes Passengers and their evolutionary vehicles
SNP - Phase inference • In genome sequencing data the chromosome of origin of each SNP allele is scrambled • Observed genotypes at two SNP sites in ...CT_AC_GT...: GT and GA • Possibility 1: chromosomes ...CTGACGGT... and ...CTTACAGT... • Possibility 2: chromosomes ...CTGACAGT... and ...CTTACGGT... • Which SNP alleles are on the same chromosome (are in phase)?
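The ambiguity on this slide can be enumerated mechanically. The sketch below lists every pair of haplotypes consistent with a set of unphased genotypes; the function name `possible_phasings` is hypothetical, and the input is the slide's two genotypes, GT and GA.

```python
from itertools import product


def possible_phasings(genotypes):
    """List the unordered haplotype pairs consistent with unphased genotypes.

    `genotypes` holds one two-letter string per SNP, e.g. ["GT", "GA"].
    """
    phasings = set()
    # For each SNP, try both ways of assigning its two alleles to the two chromosomes.
    per_site_options = [(g[0] + g[1], g[1] + g[0]) for g in genotypes]
    for choice in product(*per_site_options):
        hap1 = "".join(site[0] for site in choice)
        hap2 = "".join(site[1] for site in choice)
        # Sorting makes (hap1, hap2) and (hap2, hap1) count as the same phasing.
        phasings.add(tuple(sorted((hap1, hap2))))
    return sorted(phasings)


print(possible_phasings(["GT", "GA"]))
# [('GA', 'TG'), ('GG', 'TA')]
```

With k heterozygous sites there are 2^(k-1) distinct phasings, which is why sequencing alone cannot settle the phase once more than one site is heterozygous.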
SNP - phase inference: determining the parent of origin for each SNP • [Figure: the fragment ...CT AC GT... shown three times, with genotype labels GA, CT, GT, CG, AA, GA at the SNP sites] • In this case: GG, TA • Phase inference is the reason why much SNP sequencing is done for a child and both parents
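A minimal sketch of how parental genotypes can resolve the phase of a child is given below. The trio genotypes in the example are hypothetical (they are not recovered from the slide's figure), and the function names are illustrative. The rule is simple: at each site, an allele assignment is kept only if the paternal allele is present in the father and the maternal allele is present in the mother.

```python
def phase_child_site(child, father, mother):
    """Return (paternal_allele, maternal_allele) for one SNP, or None if unresolved."""
    x, y = child[0], child[1]
    options = set()
    if x in father and y in mother:
        options.add((x, y))
    if y in father and x in mother:
        options.add((y, x))
    # Exactly one consistent assignment means the site is phased.
    return options.pop() if len(options) == 1 else None


def phase_child(child_sites, father_sites, mother_sites):
    """Phase all sites; return (paternal_haplotype, maternal_haplotype) or None."""
    phased = [
        phase_child_site(c, f, m)
        for c, f, m in zip(child_sites, father_sites, mother_sites)
    ]
    if any(p is None for p in phased):
        return None
    paternal = "".join(p[0] for p in phased)
    maternal = "".join(p[1] for p in phased)
    return paternal, maternal


# Hypothetical trio: child is GT and GA at the two sites.
print(phase_child(["GT", "GA"], ["GG", "GA"], ["GT", "AA"]))
# ('GG', 'TA')
```

A site stays unresolved when the child and both parents are all heterozygous for the same two alleles, since both assignments are then consistent.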
Linkage Disequilibrium, intro: how hard is it to break a chromosome? • [Figure: the four possible haplotypes A B, A b, a B, a b] • Alleles (traits/SNPs) A and a occupy the same position in the genome (locus), so a single chromosome of an individual can carry either of them, but not both • fA: frequency of allele A in the population • fa = 1 - fA • fB and fb = 1 - fB are the frequencies of B and b • Frequencies of both alleles occurring on the same chromosome: fAB, fAb, faB, fab • LD and genomic recombination
Linkage Disequilibrium, calculation • When the alleles are not correlated, we expect them to occur together by chance alone: fAB = fA fB; fAb = fA fb; faB = fa fB; fab = fa fb • But if A and B occur together more often (a state of disequilibrium), we can write: fAB = fA fB + D; fAb = fA fb - D; faB = fa fB - D; fab = fa fb + D • where D is called the measure of disequilibrium • From the definitions above it follows that D = fAB - fA fB
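The definition D = fAB - fA fB can be checked on a small sample. The sketch below estimates the frequencies from a list of observed two-locus haplotypes and returns D; the function name and the sample counts are hypothetical.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Estimate D = fAB - fA * fB from two-letter haplotypes 'AB', 'Ab', 'aB', 'ab'."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    f_AB = counts["AB"] / n
    f_A = (counts["AB"] + counts["Ab"]) / n  # A occurs with either B or b
    f_B = (counts["AB"] + counts["aB"]) / n  # B occurs with either A or a
    return f_AB - f_A * f_B


# Hypothetical sample of 10 chromosomes in which A and B travel together.
sample = ["AB"] * 4 + ["Ab"] * 1 + ["aB"] * 1 + ["ab"] * 4
print(round(linkage_disequilibrium(sample), 3))
# 0.15
```

In this sample fA = fB = 0.5, so chance alone would give fAB = 0.25; the observed fAB = 0.4 exceeds it by D = 0.15, and fAb = 0.1 = 0.25 - D, as the equations on the slide require.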
How can we use it? • Phase inference tells us how SNPs are organized on a chromosome • Linkage disequilibrium measures the correlation between SNPs
Back to SNPs Daly et al (2001), Figure 1
Haplotypes - vehicles for SNPs • Daly et al. (2001) were able to infer offspring haplotypes largely from parents. They say that “it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity” • The haplotype blocks: • Up to 100 kb • 5 or more SNPs • For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes
Haplotypes on the genome fragment • Observed haplotypes, with dotted lines wherever the probability of switching to another line is > 2% • Percentage of chromosomes explained by the haplotypes • Contribution of specific haplotypes
Another genetic test: do haplotypes exist? • Each row represents a SNP • Blue dot = major allele • Yellow dot = minor allele • Each column represents a single chromosome • The 147 SNPs are divided into 18 blocks, delimited by black lines • The expanded box on the right is a block of 26 SNPs spanning 19 kb of genomic DNA. The 4 most common of the 7 different haplotypes account for 80% of the chromosomes and can be distinguished with 2 SNPs
How many SNPs can we ignore… and still predict haplotypes with high accuracy?
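One way to make this question concrete is to search for the smallest set of SNP positions whose alleles still tell the known haplotypes apart. The sketch below does this by brute force; the function name and the four toy haplotypes (0 = major allele, 1 = minor allele) are hypothetical, and exhaustive search is only practical for small blocks.

```python
from itertools import combinations


def smallest_tag_set(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes."""
    n_snps = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    # Try subsets in order of increasing size, so the first hit is a smallest one.
    for size in range(n_snps + 1):
        for positions in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == n_distinct:
                return positions
    return None


# Hypothetical block: 4 haplotypes typed at 6 SNPs.
toy_haplotypes = ["000000", "110011", "001111", "111100"]
print(smallest_tag_set(toy_haplotypes))
# (0, 2)
```

Here 2 of the 6 SNPs are enough to identify each of the 4 haplotypes, so the remaining 4 could be skipped without losing the ability to tell these haplotypes apart.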
Literature • Gibson, Muse. A Primer of Genome Science. • N. Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294 (2001): 1719-1723. • M. J. Daly et al. High-resolution haplotype structure in the human genome. Nat. Genet. 29 (2001): 229-232.