BI820 – Seminar in Quantitative and Computational Problems in Genomics. Biology and Bioinformatics. Gabor T. Marth. Department of Biology, Boston College marth@bc.edu. The animal cell. DNA – the carrier of the genetic code. DNA organization – chromosomes. Translation of genetic information.
DNA sequencing informatics
Sequence variations • the Human Genome Project produced a reference genome sequence that is 99.9% common to each human being • sequence variations make our genetic makeup unique • single-nucleotide polymorphisms (SNPs) are most abundant, but other types of variations exist and are important
Why do we care about variations? • phenotypic differences • inherited diseases • demographic history
How do we find polymorphisms? • look at multiple sequences from the same genome region • diverse sequence resources can be used (EST, WGS, BAC) • diversion: sequencing informatics
SNP discovery – Methods • Sequence clustering • Cluster refinement • Multiple alignment • SNP detection
SNP discovery – Mining Projects • ~ 30,000 clones • 25,901 clones (7,122 finished, 18,779 draft with base quality values) • 21,020 clone overlaps (124,356 fragment overlaps) • 507,152 high-quality candidate SNPs (validation rate 83-96%) • Marth et al., Nature Genetics 2001
[Figure: clone sequences in FASTA form (>CloneX ACGTTGCAACGT GTCAATGCTGCA, >CloneY ACGTTGCAACGT GTCAATGCTGCA) and an overlap with one mismatch: ACCTAGGAGACTGAACTTACTG vs. ACCTAGGAGACCGAACTTACTG]
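The last step of the pipeline, spotting a mismatch in an overlap, can be sketched on the pair of overlapping sequences shown on this slide. This is only a naive illustration: the function name `candidate_snps` is hypothetical, the sequences are assumed to be already aligned and of equal length, and base quality values (which the mining project relied on) are ignored entirely.

```python
def candidate_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two
    equal-length, already-aligned sequences. Positions are 0-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        # Skip gaps and ambiguity codes; only compare unambiguous bases.
        if a != b and a in "ACGT" and b in "ACGT"
    ]


# The two overlapping fragments from the slide.
overlap_1 = "ACCTAGGAGACTGAACTTACTG"
overlap_2 = "ACCTAGGAGACCGAACTTACTG"

print(candidate_snps(overlap_1, overlap_2))  # [(11, 'T', 'C')]
```

A mismatch found this way is only a candidate: without quality values there is no way to tell a true polymorphism from a sequencing error.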
SNP databases and characteristics • access to variation data • SNP properties • reliability of information • characterizing known polymorphic sites in sample collections – genotyping
Where do variations come from? • sequence variations are the result of mutation events • mutations are propagated down through generations
[Figure: genealogy descending from the MRCA, with the sequences TAAAAAT and TAACAAT at its nodes]
Mutation rate • higher mutation rate (µ) gives rise to more SNPs
[Figure: two genealogies descending from the MRCA; sample sequences accgctatgtaga, accgttatgtaga, accgctatataga, actgttatgtaga]
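One way to make the link between µ and the number of SNPs concrete is Watterson's expectation for the number of segregating sites, E[S] = θ · (1 + 1/2 + ... + 1/(n-1)) with θ = 4Nµ. The sketch below assumes the standard neutral model with a constant-size diploid population, which the slide does not state; the function name and the parameter values are hypothetical.

```python
def expected_segregating_sites(n_samples, pop_size, mu_per_site, n_sites):
    """Expected number of segregating sites (SNPs) in a sample.

    Assumes a neutral, constant-size diploid population:
    theta = 4 * N * mu over the whole region, E[S] = theta * a_n.
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled sequences")
    theta = 4 * pop_size * mu_per_site * n_sites
    a_n = sum(1.0 / i for i in range(1, n_samples))  # harmonic sum to n-1
    return theta * a_n


# Hypothetical values: 10 sequences, N = 10,000, a 10 kb region.
low = expected_segregating_sites(10, 10_000, 1e-8, 10_000)
high = expected_segregating_sites(10, 10_000, 2e-8, 10_000)
print(low, high)
```

Under these assumptions the expectation is linear in µ, so doubling the mutation rate doubles the expected number of SNPs.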
Recombination
[Figure: repeated copies of the sequence accgttatgtaga illustrating recombination]
Demographic history • different world populations have varying long-term effective population sizes (e.g. African N is larger than European)
[Figure: large (effective) population size N vs. small (effective) population size N]
Modeling
[Figure: population-size history from past to present under four models: stationary, expansion, collapse, bottleneck; panels labeled MD (simulation) and AFS (direct form)]
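Assuming AFS here stands for the allele frequency spectrum, the observed spectrum that such models are compared against can be tabulated directly from aligned sample sequences. The sketch below computes a folded (minor-allele) spectrum, since the slides give no ancestral states; it reuses the four sample sequences from the Mutation rate slide, and the function name `folded_afs` is hypothetical.

```python
from collections import Counter


def folded_afs(sequences):
    """Folded allele frequency spectrum of aligned sequences.

    Entry i-1 of the result is the number of biallelic sites whose
    minor allele is carried by exactly i of the sampled sequences.
    """
    n = len(sequences)
    if n < 2 or len({len(s) for s in sequences}) != 1:
        raise ValueError("need at least two sequences of equal length")
    spectrum = [0] * (n // 2)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) != 2:
            continue  # monomorphic or multi-allelic site: not counted
        minor_count = min(counts.values())
        spectrum[minor_count - 1] += 1
    return spectrum


samples = ["accgctatgtaga", "accgttatgtaga", "accgctatataga", "actgttatgtaga"]
print(folded_afs(samples))  # [2, 1]
```

For these four sequences, two sites have a minor allele seen once and one site has a minor allele seen twice. Fitting a demographic model to such counts is a separate step that this sketch does not attempt.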
Ancestral inference • bottleneck • modest but uninterrupted expansion
The signatures of selection • selective mutations influence the genealogy itself; in the case of neutral mutations the processes of mutation and genealogy are decoupled
Association and haplotype structure • “linkage disequilibrium” • “haplotype blocks”
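Linkage disequilibrium between two sites is commonly summarized by D = p(AB) - p(A)·p(B) and by r² = D² / (p(A)(1 - p(A)) p(B)(1 - p(B))). A minimal sketch, assuming phased two-site haplotypes are available; the function name and the toy haplotypes are hypothetical and not taken from the slides.

```python
def linkage_disequilibrium(haplotypes, allele_1, allele_2):
    """Return (D, r_squared) for two sites.

    `haplotypes` is a list of (site1_allele, site2_allele) pairs;
    `allele_1` and `allele_2` name the reference allele at each site.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(h[0] == allele_1 for h in haplotypes) / n
    p_b = sum(h[1] == allele_2 for h in haplotypes) / n
    p_ab = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either site is monomorphic in the sample.
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Toy data: alleles always travel together vs. all four combinations.
linked = [("A", "B"), ("A", "B"), ("a", "b"), ("a", "b")]
unlinked = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
print(linkage_disequilibrium(linked, "A", "B"))
print(linkage_disequilibrium(unlinked, "A", "B"))
```

In the first toy sample the two sites are perfectly correlated (r² = 1); in the second all four haplotypes are equally frequent and D = 0. Runs of neighboring sites with high pairwise r² are what the slide calls haplotype blocks.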
Medical utility?
[Figure: molecular markers, functional understanding, and clinical phenotype, joined by a "?"]
Mapping disease-causing loci • genetic linkage • association between allele and phenotype
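An association between an allele and a phenotype is often tested with a chi-square statistic on a 2x2 table of allele counts in cases and controls. The sketch below is one simple way to do this, not necessarily the method used in the course; the function name and the counts are hypothetical.

```python
def allele_association_chi2(case_counts, control_counts):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for a 2x2 table of allele counts.

    Each argument is (count_of_allele_1, count_of_allele_2).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical allele counts: cases carry allele 1 more often than controls.
chi2 = allele_association_chi2(case_counts=(60, 40), control_counts=(40, 60))
print(chi2)
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05. A significant result shows association only; the tested SNP may simply be in linkage disequilibrium with the causal variant.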