Applications of Homozygosity Haplotype in the Study of Human Genetic Diseases with High Density SNP Genotype

Haiyan Jiang1, Mark Samuels1,2, Duane Guernsey1, Andrew Orr1,3

Departments of 1Pathology and 3Ophthalmology and Visual Sciences, Dalhousie University, Halifax, NS, Canada
2Department of Medicine, University of Montreal, Montreal, QC, Canada

Abstract

In a large family with a specific disease, patients usually share the identity-by-descent (IBD) haplotype linked to the disease susceptibility genes. Although many haplotype analysis methods have been developed to detect the shared interval, it is still very difficult to reconstruct haplotypes on a genome-wide basis. A non-parametric method, Homozygosity Haplotype (HH), was recently proposed for the genome-wide search for shared autosomal segments using high-density SNP genotypes. Rather than phasing the haplotype, HH uses a form of haplotype described by the homozygous SNPs only, which allows it to perform a genome-wide search with high efficiency. The applicability and effectiveness of HH in identifying the candidate region of a causative gene were studied with Illumina 550k genotype data from the affected members of a large family with Schnyder crystalline corneal dystrophy (SCCD, MIM 121800), a rare autosomal dominant disease. HH successfully detected the ~1 Mb shared segment with a minimum set of three samples. We propose that HH can be applied to screen known causative genes or loci by searching for the homozygosity haplotype shared by patients who have inherited a susceptibility gene from a common ancestor.
A new strategy for the genome-wide screening of known causative genes or loci with high-density SNP genotype data was developed. It has the potential to serve as an efficient alternative to sequencing or microsatellite-based fine mapping in genetic disease research and clinical diagnosis.

2. Use HH to identify the candidate loci for Schnyder crystalline corneal dystrophy

3. Application to the screening of known causative genes

Assuming that patients who have inherited the disease susceptibility gene from a common ancestor also share a haplotype in the surrounding genomic interval, the HH approach can be applied to screen known causative genes or loci by searching for a shared homozygosity haplotype around each gene. If patients do not share a significant RCHH around a known gene, that gene can be excluded.

Impact of genotyping errors

Genotyping errors are difficult to identify when only a few affected individuals in a family are available for genotyping, so an approach was developed to calculate the probability that observed mismatches are due to error:
1. Replace the mismatched compSNPs with concordant SNPs to create a consistent homozygosity haplotype.
2. Run Monte Carlo (MC) simulations of genotyping errors on the modified genotypes, using the selected error model and error rate.
3. Analyze the distribution of the number of mismatched compSNPs created by the simulated genotyping errors using a Poisson distribution.
4. Calculate the probability of getting N mismatched compSNPs introduced by genotyping error.

Background

Schnyder crystalline corneal dystrophy (SCCD) is a rare genetic disorder presenting with opacification of the cornea caused by aberrant intracellular cholesterol storage, with possible systemic dyslipidemia. AMGGI ascertained a large Nova Scotia family segregating SCCD. Mapping analysis confirmed linkage of the Nova Scotia SCCD family to a previously published locus at chromosome 1p34.
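The steps listed under "Impact of genotyping errors" can be illustrated with a minimal Python sketch. It is not the authors' program. The genotype coding (`"AA"`, `"AB"`, `"BB"`), all function names, and the uniform error model (a miscalled SNP is reported as one of the other two genotypes with equal probability) are assumptions made for illustration. The text does not say whether the P value is P(X = N) or P(X ≥ N), so both helpers are provided.

```python
import math
import random

GENOTYPES = ("AA", "AB", "BB")  # hypothetical biallelic coding; "AB" is heterozygous


def add_errors(calls, error_rate, rng):
    """Return a copy of calls in which each call is miscalled with probability error_rate.

    Assumed error model: a miscalled SNP is reported as one of the other
    two genotypes, chosen with equal probability.
    """
    noisy = []
    for g in calls:
        if rng.random() < error_rate:
            g = rng.choice([x for x in GENOTYPES if x != g])
        noisy.append(g)
    return noisy


def count_mismatched_compsnps(a, b):
    """Count SNPs that are homozygous in both subjects but for different alleles."""
    return sum(1 for x, y in zip(a, b) if x != "AB" and y != "AB" and x != y)


def mean_simulated_mismatches(a, b, error_rate, n_rounds=1000, seed=0):
    """Monte Carlo estimate of the mean number of error-induced mismatches."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rounds):
        total += count_mismatched_compsnps(
            add_errors(a, error_rate, rng), add_errors(b, error_rate, rng)
        )
    return total / n_rounds


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_tail(n, lam):
    """P(X >= n) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n))
```

In this sketch the simulated mean would serve as the Poisson rate: after building the consistent genotypes `a` and `b`, one would compute `lam = mean_simulated_mismatches(a, b, error_rate)` and then `poisson_tail(N, lam)` for the observed number N of mismatched compSNPs.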
Lincoln and Lander error model

In this model, O is the observed genotype of a SNP, G is the underlying genotype, and E is the genotype error rate. The error rate is the same for all possible underlying genotypes. (Lincoln SE, Lander ES. Genomics 1992; 14:604–610.)

The P value, that is, the probability of getting N mismatched compSNPs introduced by genotyping error, is calculated with a Poisson distribution.

DNA resequencing identified a heterozygous segregating missense variant in the gene UBIAD1 at chr1:11,255,866-11,268,929 in the NS family, and four other missense variants in four additional small families with SCCD ascertained nationally and internationally. The presumptive pathogenic variants, at positions 102, 112, 119, 175 and 232, are all in highly conserved residues and lie near each other in a 3-D model of protein folding. UBIAD1 encodes a potential prenyltransferase and may participate in biosynthesis or regulation of intracellular cholesterol trafficking. Thus UBIAD1 is a potential novel therapeutic target for treatment of hyperlipidemia, a significant risk factor for cardiovascular disease. Taken from Orr et al. PLoS One 2: e685 (2007).

The whole-genome screening approach was validated using a family with Myoclonus dystonia (MIM 159900). The known causative genes are SGCE, DRD2, and DYT1. A published causative mutation, c.304C>T (R102X) in the SGCE gene, has been detected in the affected family members by sequencing. The validation tested whether the proposed screening approach can correctly exclude non-causative genes. Four patients from the family were genotyped with Illumina HumanHap550 beadchips, and HH was run to identify RCHHs shared by the four patients with a cutoff of 3.0 cM.

Figure: Genome-wide mapping of RCHHs shared by four patients from a Canadian family with Myoclonus dystonia
Known gene position: DRD2: chr11:112,785,527-112,851,211

Results

1. Homozygosity Haplotype Method

An HH is a haplotype described by only homozygous SNPs and is obtained by the deletion of heterozygous SNPs.
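The HH definition above can be sketched in a few lines of Python. The input format (a dictionary from SNP id to a two-letter call), the function name, and the treatment of no-calls are assumptions of this sketch, not details taken from the HH program.

```python
def homozygosity_haplotype(genotypes):
    """Return {snp_id: allele} for homozygous calls; heterozygous SNPs are deleted.

    `genotypes` maps a SNP id to a two-letter call such as "AA" or "AG".
    Calls with characters outside A/C/G/T (for example a no-call "--") are
    also dropped; that handling is an assumption of this sketch.
    """
    hh = {}
    for snp_id, call in genotypes.items():
        if len(call) == 2 and call[0] == call[1] and call[0] in "ACGT":
            hh[snp_id] = call[0]
    return hh
```

Because no phasing is involved, the step is a single pass over each subject's genotypes.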
A compSNP is a SNP that is homozygous in both of two subjects.

An RCHH is a run of compSNPs matched for allelic type, the genetic length of which is longer than the cutoff value. An RCHH is bounded by either a mismatched compSNP(s) or by the end(s) of an autosome.

Region from a common ancestor (RCA): an RCA is an autosomal region where subjects share a chromosomal segment derived from a common ancestor (i.e. IBD). The presence of an RCA is predicted through the RCHH.

Miyazawa H, et al. Homozygosity haplotype allows a genomewide search for the autosomal segments shared among patients. Am J Hum Genet. 2007 Jun; 80(6):1090-102.

HH program
1. Read in genotypes of all affected subjects.
2. Delete heterozygous SNPs and generate the HH for each subject.
3. Generate the list of RCHHs for each pair of subjects with a cutoff value.
4. Determine the RCHHs shared by multiple subjects.

Features of HH method
• Non-parametric
• High efficiency
  • Complexity O(n²), n: number of subjects
  • For Marfan syndrome, with Affymetrix 500k SNP genotypes of 9 subjects, the computational time is 6 s on a laptop.
• Both dominant and recessive disease loci can be detected
• HH analysis may provide an advantage when 6 ≤ m+n ≤ 50 (m and n are the numbers of generations by which two subjects are removed from a common ancestor), where haplotype analysis or linkage analysis is difficult to perform.
• HH is well suited to the local population in the Atlantic region, with m+n < 20.

Sample selection: select distantly related individuals because they share fewer RCAs.

Cutoff value selection: ratio of RCA to the total genetic length shared by two descendants from a common ancestor, in which m and n are the numbers of generations by which the two subjects are removed from the common ancestor.

I. An RCHH at chr1:10,679,786-11,639,887 was identified by the HH method with the genotype data of 10 patients. The 10 patients were genotyped with Illumina 550K bead chips, and HH analyses were run with a cutoff of 3.0 cM.

Figure: RCHHs identified by HH method with the 550K genotypes of 10 patients

II. Minimal subset required to identify the interval

HH successfully detected the ~1 Mb shared segment on Chr1 with the genotypes of three patients.

Figure: RCHHs identified by HH method with the 550K genotypes of patients 1351 and 1425

Figure: Results of genotyping error simulation

Known gene positions: DYT1: chr9:131,616,072-131,626,199; SGCE: chr7:94,052,472-94,123,457

• The two genes DRD2 and DYT1 can be excluded because no RCHH was detected around them. The genotyping error simulations gave P=0, which suggests the genotype data are reliable.
• The largest RCHH, at chr7:93,168,493-130,965,632 with a size of 37 Mb, includes the gene SGCE (chr7:94,052,472-94,123,457).
• The study of Myoclonus dystonia demonstrated that the proposed screening approach successfully excluded all non-causative genes and, at the same time, identified the potential linkage of SGCE.

Conclusions

Our study of the HH approach with Illumina 550k SNP genotype data from a series of monogenic disease projects demonstrates that the HH method is very efficient and effective in identifying disease-linked regions. Based on the idea of the homozygosity haplotype, we developed a new approach for the genome-wide screening of known causative genes or loci using high-density SNP genotype data. The successful application to a family with a known causative mutation supports the method's potential as an efficient alternative to sequencing or microsatellite-based fine mapping for the research and clinical diagnosis of genetic diseases.

July 2008, ISMB 2008
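The pairwise RCHH step of the HH program and the screening of a known gene can be sketched as follows. This is an illustrative simplification, not the published implementation. The tuple layout, the function names, and the choice to measure run length from the first to the last matching compSNP are assumptions, and no-calls are assumed to have been removed beforehand.

```python
def is_homozygous(call):
    """True for a two-letter call with identical alleles, e.g. "AA"."""
    return len(call) == 2 and call[0] == call[1]


def find_rchhs(subject_a, subject_b, cutoff_cm=3.0):
    """Find RCHHs between two subjects on one autosome.

    Each subject is a list of (position_cm, position_bp, call) tuples covering
    the same SNPs in the same map order. Returns (start_bp, end_bp, length_cm)
    for each run of matching compSNPs longer than cutoff_cm.
    """
    rchhs = []
    run = []  # (cM, bp) of the matching compSNPs in the current run

    def close_run():
        if run and run[-1][0] - run[0][0] > cutoff_cm:
            rchhs.append((run[0][1], run[-1][1], run[-1][0] - run[0][0]))
        run.clear()

    for (cm, bp, call_a), (_, _, call_b) in zip(subject_a, subject_b):
        if not (is_homozygous(call_a) and is_homozygous(call_b)):
            continue  # not a compSNP: heterozygous in at least one subject
        if call_a == call_b:
            run.append((cm, bp))
        else:
            close_run()  # a mismatched compSNP bounds the run
    close_run()  # the end of the autosome also bounds the run
    return rchhs


def gene_in_rchh(rchh, gene_start_bp, gene_end_bp):
    """True if the gene interval overlaps the RCHH (same chromosome assumed)."""
    start_bp, end_bp, _ = rchh
    return gene_start_bp <= end_bp and gene_end_bp >= start_bp
```

With the coordinates reported above, `gene_in_rchh((93168493, 130965632, 0.0), 94052472, 94123457)` is true for SGCE inside the largest chr7 RCHH (the length field is a placeholder here, since the poster reports only the physical size). A known gene with no overlapping RCHH would be excluded.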