Statistical Genetics. School of Bioinformatics Science and Technology, Harbin Medical University. LECTURE 1: Introduction.
LECTURE 1: Introduction • Course Organization • History Review • Basic Concepts • Study Design • Analysis Methods • Statistical Genetics in the Genomic Era • Course Outline
Course Organization • Chief Instructor • Ruijie Zhang • Office: Room 106, MBB • Tel: 86650712-106 • Email: Zhangruijie2002@yahoo.com.cn • Grading • Homework • 8 assignments • 1 project • Test • 1 final
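Charles Darwin (1809-1882) Main academic contributions: The founder of evolutionary theory, and of the theory of natural selection in particular. On the Origin of Species (1859); The Descent of Man (1871). Marriage and family: Married his cousin. 10 children, 3 of whom died young; several others suffered from illness and frailty. Darwin feared this was a consequence of consanguineous marriage.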
Mendel • An Austrian monk, Gregor Mendel, developed the fundamental principles that would become the modern science of genetics. Mendel demonstrated that heritable properties are parceled out in discrete units, independently inherited. These eventually were termed genes.
Genetic Terms • Gene - a unit of inheritance that usually is directly responsible for one trait or character. • Allele - an alternate form of a gene. Usually there are two alleles for every gene, sometimes as many as three or four. • Homozygous - when the two alleles are the same. • Heterozygous - when the two alleles are different; in such cases the dominant allele is expressed.
Genetic Terms • Dominant - a term applied to the trait (allele) that is expressed regardless of the second allele. • Recessive - a term applied to a trait that is only expressed when the second allele is the same (e.g. short plants are homozygous for the recessive allele). • Phenotype - the physical expression of the allelic composition for the trait under study. • Genotype - the allelic composition of an organism.
Mendel’s first law Characters are controlled by pairs of genes which separate during the formation of the reproductive cells (meiosis)
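The segregation described by the first law can be illustrated with a small enumeration. This is only a sketch: the function name `cross` and the single-letter allele notation ("A" dominant, "a" recessive) are assumptions made for the example, not part of the lecture. Each parent passes on one allele of its pair with equal probability, so listing all allele combinations gives the expected offspring genotype proportions.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return expected offspring genotype proportions for a one-locus cross.

    Each parent is a two-character string such as "Aa"; each of its two
    alleles is assumed to be transmitted with probability 1/2.
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the pair so that "aA" and "Aa" are counted as one genotype.
        genotype = "".join(sorted((allele1, allele2)))
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Cross between two heterozygotes.
print(cross("Aa", "Aa"))
```

For two heterozygous parents the enumeration yields the 1:2:1 genotype ratio (AA : Aa : aa), which corresponds to the 3:1 phenotype ratio when "A" is dominant.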
Hardy-Weinberg principle: a basic principle of population genetics, independently discovered in 1908 by Wilhelm Weinberg (1862–1937) and Godfrey Harold Hardy (1877–1947).
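The slide names the principle without stating it. In its usual form, for a locus with two alleles at frequencies p and q = 1 - p in a large, randomly mating population, the expected genotype frequencies are p², 2pq and q². The sketch below simply evaluates these expressions; the function name `hardy_weinberg` and the genotype labels are assumptions chosen for the example.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at a biallelic locus.

    p is the frequency of allele "A"; the frequency of "a" is q = 1 - p.
    Assumes random mating and no selection, mutation, migration or drift.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequency, for illustration only.
print(hardy_weinberg(0.3))
```

The three frequencies always sum to 1, since p² + 2pq + q² = (p + q)².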
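Ronald Fisher (1890–1962) Founder of modern statistical science and of statistical genetics. Main academic contributions: Genetical contribution: The Correlation Between Relatives. Statistical contribution: maximum likelihood; ANOVA; concepts of sufficiency, ancillarity; Fisher's linear discriminant; Fisher information; F distribution … Academic controversy: Denied that smoking can cause lung cancer. Yates and Mather criticized Fisher for being employed by tobacco companies.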
JBS HALDANE (1892-1964) One of the three great founders of population genetics. Contribution: A Mathematical Theory of Natural and Artificial Selection -> Direction and rates of changes of gene frequencies -> Interaction of natural selection with mutation and with migration -> Using maximum likelihood for estimation of human linkage maps -> Mutational load
SEWALL WRIGHT (1889-1988) One of the three great founders of population genetics. Contribution: Theoretical population genetics -> Inbreeding, mating systems, and genetic drift -> Inventor of the inbreeding coefficient -> Wright's F-statistics -> Wright's statistical method of path analysis
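Ching Chun Li (李景均) (1912-2003) An outstanding master of human population genetics -> "An Introduction to Population Genetics": the first clear and accessible exposition of the vast and enduring works of R. A. Fisher, J. B. S. Haldane, S. Wright and others; these masters were to a large extent indebted to Li for making their great works understandable to a wider audience -> 1953: Li questioned Fisher's compensation theory concerning the higher frequency of the Rh-negative genotype -> 1954: proposed the ITO method, "using stochastic matrices to derive the joint distribution and correlation between relatives" -> 1960s: studies of Mendelian segregation of human traits, proposing that "the singles method is the only method needed in simple segregation analysis."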
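Pioneers of statistical genetics in China. Wu Zhongxian (吴仲贤) (1911-2007): One of the founders of animal quantitative genetics in China. In 1935 he went to the University of Edinburgh, UK, to specialize in animal genetics and received a doctorate. -> "Statistical Genetics", the first monograph in the history of quantitative genetics in China -> The concept of "hybrid heritability". Ma Yuhua (马育华) (1912-): One of the founders of plant quantitative genetics in China. Received a doctorate from the University of Illinois in 1950. -> Soybean germplasm resources and breeding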
Allele: A sequence of DNA bases. • Locus: Physical location of an allele on a chromosome. • Linkage: Proximity of two alleles on a chromosome. • Marker: An allele of known position on a chromosome. • Distance: Number of base pairs between two alleles. • centiMorgan: Probabilistic distance between two alleles. • Phenotype: An outward, observable character (trait). • Genotype: The internally coded, inheritable information. [Figure labels: Allele 1, Allele 2, Locus 1, Locus 2]
Distances • Physical distances between alleles are measured in base pairs, but the recombination frequency per base pair is not constant along the genome. • A more useful measure of distance is based on the probability of recombination: the Morgan. • A distance of 1 centiMorgan (cM) between two loci means that they have a 1% chance of being separated by recombination in a single meiosis. • A genetic distance of 1 cM corresponds roughly to a physical distance of 1 million base pairs (1 Mb).
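The 1% reading of a centiMorgan holds for short distances; over longer distances multiple crossovers make the recombination fraction grow more slowly and level off at 50%. One common way to express this is Haldane's map function, which is not mentioned on the slide and is used here as an assumption (it presumes crossovers occur without interference). The function name `recombination_fraction` is likewise only illustrative.

```python
import math


def recombination_fraction(distance_cm):
    """Convert a map distance in centiMorgans to a recombination fraction.

    Uses Haldane's map function r = 0.5 * (1 - exp(-2d)), with d in Morgans,
    which assumes no crossover interference.
    """
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Short distances are close to 1% per cM; long distances approach 0.5.
for cm in (1, 10, 50, 200):
    print(cm, round(recombination_fraction(cm), 4))
```

Under this function a distance of 1 cM gives a recombination fraction just under 0.01, consistent with the rule of thumb above, while very distant loci behave as if unlinked.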
Physical maps: Maps in base pairs. • Human autosomal physical map: about 3000 Mb (megabases). • Linkage maps: Maps in centiMorgans. • Human male map length: 2851 cM. • Human female map length: 4296 cM. • Correspondence between maps (3000 Mb divided by the map length): • Male cM ~ 1.05 Mb; Female cM ~ 0.70 Mb. • Cosegregation: Alleles (or traits) transmitted together; the tendency for closely linked genes and genetic markers to segregate (be inherited) together.
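The correspondence between the two kinds of map can be checked by dividing the physical length by each linkage map length. The sketch below uses only the figures quoted on the slide (3000 Mb, 2851 cM, 4296 cM); the names `PHYSICAL_MB`, `MAP_CM` and `mb_per_cm` are assumptions made for the example. With these figures the ratios come out at roughly 1.05 Mb per cM for the male map and roughly 0.70 Mb per cM for the female map.

```python
# Map lengths as quoted on the slide.
PHYSICAL_MB = 3000.0
MAP_CM = {"male": 2851.0, "female": 4296.0}


def mb_per_cm(physical_mb, map_cm):
    """Average physical distance (Mb) covered by one centiMorgan."""
    return physical_mb / map_cm


for sex, length_cm in MAP_CM.items():
    print(sex, round(mb_per_cm(PHYSICAL_MB, length_cm), 2))
```

These are genome-wide averages; as noted for distances in general, the local rate of recombination per base pair varies along the chromosome.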
Phenotype and Genotype • Task: Find the basis (genotype) of diseases (phenotype). • Marker: Flags genomic regions in linkage disequilibrium. • Problem: The real genotype is not observable. • Strategy: Use the marker as a genotype proxy. • Condition: Linkage disequilibrium. • Dependency: An observable measure of dependency between marker and phenotype.
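Linkage disequilibrium, the condition required for a marker to act as a proxy, can be quantified from haplotype and allele frequencies. The sketch below uses two standard measures, D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)); neither formula appears on the slide, and the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical frequencies: independence (D = 0) versus complete association.
print(linkage_disequilibrium(0.25, 0.5, 0.5))
print(linkage_disequilibrium(0.50, 0.5, 0.5))
```

When D is zero the marker carries no information about the second locus; as r² approaches 1 the marker becomes an increasingly good proxy for the unobserved genotype.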
Genetic Markers • One of the most celebrated findings of the human genome project is that humans share most of their DNA. • Still, there are subtle variations: • Simple Sequence Repeats (SSR): Stretches of 1 to 6 nucleotides repeated in tandem. • Microsatellite: Short tandem repeat (e.g. GATA) varying in number between individuals. • Single nucleotide polymorphism (SNP): Single base variation with at least 1% incidence in the population.
Single Nucleotide Polymorphisms • Variations of a single base between individuals: ... ATGCGATCGATACTCGATAACTCCCGA ... ... ATGCGATCGATACGCGATAACTCCCGA ... • A SNP must occur in at least 1% of the population. • SNPs are the most common type of variation. • Unlike microsatellites or RFLPs, SNPs may occur in coding regions: • cSNP: SNP occurring in a coding region. • rSNP: SNP occurring in a regulatory region. • sSNP: Coding SNP that does not change the amino acid.
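The two example sequences on this slide differ at a single base. A minimal sketch of how such a difference could be located is shown below; the helper name `find_mismatches` is an assumption, and the approach only works for sequences that are already aligned and of equal length.

```python
def find_mismatches(seq1, seq2):
    """List (position, base1, base2) for every position where two aligned,
    equal-length sequences differ. Positions are 0-based."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq1, seq2))
        if a != b
    ]


# The two sequences shown on the slide.
individual_1 = "ATGCGATCGATACTCGATAACTCCCGA"
individual_2 = "ATGCGATCGATACGCGATAACTCCCGA"
print(find_mismatches(individual_1, individual_2))
```

A single mismatch between two individuals is only a candidate: by the definition above, the variant counts as a SNP only if it is present in at least 1% of the population.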
Reading SNP Maps [Figure: SNP map; legend: common, heterozygote, rare]
Complex Traits • Problem: Traits don't always follow single-gene models. • Complex Trait: Phenotype/genotype interaction. • Multiple cause: Multiple genes create the phenotype. • Multiple effect: A gene causes more than one phenotype. • Caveat: Some Mendelian traits are in fact complex. • Sickle cell anemia: A classic Mendelian recessive. • Pattern: Identical alleles at the beta-globin locus. • Complexity: Patients show different clinical courses, from early mortality to unrecognizable conditions. • Source: X-linked locus and early hemoglobin gene.
Reasons for Complex Traits • Incomplete Penetrance: Some individuals with the genotype do not manifest the trait. Breast cancer / BRCA1 locus. • Genetic Heterogeneity: Mutation of more than one gene can cause the trait. Difficult to resolve in a non-experimental setting. • Retinitis pigmentosa: from any of 14 mutations. • Polygenic cause: The trait requires more than one gene. • Hirschsprung disease: Ret, Sox10, Gfra-1, etc.
Characterizing Phenotypes • Simple phenotypes: Mendelian diseases usually also have the advantage of simple (binary) phenotypes. • Complex diseases: Twenty years of AI in medicine show that diseases often do not obey these patterns. • Dissecting phenotypes: As critical as dissecting gene expression patterns. • Animal models: A QTL strategy may be an answer. • Opportunity: Better clinical definition of disease states. • Clinical data: With dropping costs of sequencing, good clinical data about patients are the real wealth.
Correlating Marker and Phenotype [Figure: three diagrams: M and Disease; M, X and Disease; M, Pop'n (population) and Disease]
Approaches to population genetics • Theoretical • Empirical • Experimental
Theoretical population genetics • General theoretical models predict the evolution of gene frequencies and related population quantities • Highly dependent upon assumptions • May or may not be realistic • Mathematically satisfying
Empirical population genetics • Apply statistical models to real data to infer underlying processes • Adequate sampling is necessary to achieve statistical power • Empirical population genetics is the emphasis in this course • because of the enormous volume of genetic data now available
Empirical population genetics: Detecting individual genes underlying phenotypic traits
Experimental population genetics • Test hypotheses in population genetics using controlled experiments • Design experiments so that alternative outcomes are possible, some of which can reject the hypothesis • Need controls, replicates and adequate sample size • Usually restricted to model organisms • Drosophila, Neurospora, some crop plants
Study Design • Classification by sampling strategy: • Pedigrees: Traditional studies focused on heredity. • Large pedigree: One family across generations. • Triads: Sets of nuclear families (parents/child). • Sib-pairs: Sets of pairs of siblings. • Case/control: Unrelated subjects with or without the phenotype. • Classification by experimental strategy: • Double sided: Case/control studies. • Single sided: e.g. triads of affected children.
Analysis Methods • Study designs and analysis methods interact. • We review five main analysis types: • Linkage analysis: Traditional analysis of pedigrees. • Allele-sharing: Find patterns better than random. • Association studies: Case/control association. • TDT: transmission disequilibrium test. • Experimental crosses: Crosses in controlled setting. • Typically, these collections are hypothesis driven. • The challenge is to collect data so that the resulting analysis will have enough power.
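As one concrete instance of the methods listed, the transmission disequilibrium test can be sketched in a few lines. The slide only names the test; the statistic used here is the usual McNemar-type form, (b - c)² / (b + c), where b and c count how often heterozygous parents did or did not transmit a given allele to an affected child, compared against a chi-square distribution with 1 degree of freedom. The function names and the counts in the example are hypothetical.

```python
import math


def chi2_1df_pvalue(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    transmitted / untransmitted: number of heterozygous parents who did /
    did not pass the allele to their affected child.
    Returns (chi-square statistic, p-value).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("need at least one heterozygous parent")
    statistic = (transmitted - untransmitted) ** 2 / n
    return statistic, chi2_1df_pvalue(statistic)


# Hypothetical counts from a set of triads.
statistic, p_value = tdt(60, 40)
print(round(statistic, 2), round(p_value, 4))
```

Because the statistic depends only on the number of informative (heterozygous) parents, the sample size needed for adequate power is driven by how many such transmissions the study design can collect.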