Imputing HLA Alleles from SNPs CSCI 8980 Dave Roe Mar 18, 2011
Overview • HLA • SNPs • Allele imputation • LDMhc algorithm • Conclusions and Applications
HLA • Human Leukocyte Antigen • Major Histocompatibility Complex (MHC) • Gene names: A, B, C, DR, DP, DQ • Allele names: A*02, B*07:02:01, etc. • More digits imply greater resolution (higher coverage of the gene) Image source: Wikipedia
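The naming scheme above can be made concrete with a small parser. This is only a sketch: the function names are hypothetical, and it assumes that each colon-separated group of digits counts as one step of resolution, which is one reading of "more digits imply greater resolution".

```python
def parse_allele(name):
    """Split an allele name such as 'B*07:02:01' into (locus, list of fields)."""
    locus, _, rest = name.partition("*")
    fields = rest.split(":") if rest else []
    return locus, fields


def resolution(name):
    """Number of colon-separated fields; more fields means higher resolution."""
    return len(parse_allele(name)[1])


def truncate(name, n_fields):
    """Reduce an allele name to its first n_fields fields (lower resolution)."""
    locus, fields = parse_allele(name)
    return f"{locus}*{':'.join(fields[:n_fields])}"


print(resolution("A*02"))          # one field
print(resolution("B*07:02:01"))    # three fields
print(truncate("B*07:02:01", 2))   # the same allele at two-field resolution
```

Truncation is the easy direction; imputation, the subject of the later slides, tries to go the other way and recover the fields that a low-resolution typing leaves out.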
HLA (cont.) • Genes encode proteins that present molecules (antigens) on the surface of cells to immune system cells (leukocytes) Image source: Carolyn Hurley (Georgetown)
HLA (cont.) • Help immune system recognize viruses, parasites, bacteria, etc. • Cancer • Autoimmune diseases [Figure labels: T-cell, TCR, HLA, Infected Cell, Healthy Cell] Image source: Steven Mack (CHORI)
HLA (cont.) • Most human genes have only a few (5-10) variants (alleles) • The HLA region is the most polymorphic region of the human genome [Figure: map of the loci DP, DQ, DR (class II loci) and B, C, A (class I loci); distance labels: 400 kb, 50 kb, 1100 kb, 100 kb, 1270 kb; numeric labels in figure order: 141, 112, 809, 1800, 829, 1193] Source: Steven Mack (CHORI)
HLA (cont.) • Alleles (gene variants) on the same chromosome can be inherited together as a haplotype • Consider the number of possible protein variants for each HLA gene: ~4 trillion possible unique A-C-B-DR haplotypes • Since everyone has two copies of each chromosome: ~16 trillion trillion unique A-C-B-DR haplotype pairs (genotypes) • But it isn’t that complicated, because inheritance occurs in haplotypes [Figure labels: DP, DQ, DR, B, C, A] Source: Steven Mack (CHORI)
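The step from ~4 trillion to ~16 trillion trillion is a squaring: one haplotype is drawn for each of the two chromosome copies. The sketch below reproduces that arithmetic with the slide's round figures. The variable names are illustrative, and the unordered count is an added comparison that the slide does not state.

```python
# Figures taken from the slide; treated as round numbers for illustration.
haplotypes = 4 * 10**12   # ~4 trillion unique A-C-B-DR haplotypes

# Counting (copy 1, copy 2) as an ordered pair gives the slide's figure.
ordered_pairs = haplotypes ** 2

# If the parent of origin is ignored, each pair is counted once.
unordered_pairs = haplotypes * (haplotypes + 1) // 2

print(f"ordered pairs:   {ordered_pairs:.1e}")
print(f"unordered pairs: {unordered_pairs:.1e}")
```

Either way the space is far too large to enumerate, which is why the observation that alleles are inherited together as haplotypes matters: only a small fraction of these combinations occurs in practice.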
HLA (cont.) • Genotype: A*03:01, A*03:01, B*08:01, B*35:02, C*04:01, C*07:01, DR*03:01, DR*11:04 • Haplotypes: 03:01 08:01 04:01 03:01 03:01 11:04 07:01 35:02 [Figure labels, each shown twice: DP, DQ, DR, B, C, A] Source: adapted from Steven Mack (CHORI)
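A genotype lists the two alleles at each locus without saying which alleles sit together on the same chromosome. The sketch below enumerates every haplotype pair that is consistent with the genotype on this slide. It illustrates the phase ambiguity only; it makes no claim about which pair is the true one, and the function name is hypothetical.

```python
from itertools import product

# Unphased genotype from the slide: two alleles per locus.
genotype = {
    "A": ("03:01", "03:01"),
    "C": ("04:01", "07:01"),
    "B": ("08:01", "35:02"),
    "DR": ("03:01", "11:04"),
}


def possible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype."""
    loci = list(genotype)
    pairs = set()
    # For each locus, choose which of its two alleles goes on haplotype 1.
    for choice in product((0, 1), repeat=len(loci)):
        hap1 = tuple(f"{l}*{genotype[l][c]}" for l, c in zip(loci, choice))
        hap2 = tuple(f"{l}*{genotype[l][1 - c]}" for l, c in zip(loci, choice))
        pairs.add(frozenset((hap1, hap2)))  # order of the two copies is ignored
    return pairs


for pair in possible_haplotype_pairs(genotype):
    print(" + ".join("-".join(hap) for hap in pair))
```

With one homozygous locus (A) and three heterozygous loci, four distinct pairs remain. Knowledge of which haplotypes are common in a population is what allows one of them to be preferred.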
HLA (cont.) Image source: http://en.wikipedia.org/wiki/File:Migration_map4.png
SNPs: Single Nucleotide Polymorphisms • Allele-level gene typing: all SNPs in a gene • Relatively cheap • Used as markers for: imputing higher (allelic) resolution information; finding case-control differences (e.g., GWAS: genome-wide association studies) • Example sequences: CCTGTAATGTCCCCCCTTGTACGTTAAATTT and CGTGTAATGCGCCCCCTTGTACGTCAAATTT
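The two sequences on this slide differ at a handful of single positions, and those positions are the SNPs. A minimal sketch of finding them, assuming the sequences are already aligned and of equal length (the function name is illustrative):

```python
def differing_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


seq_1 = "CCTGTAATGTCCCCCCTTGTACGTTAAATTT"
seq_2 = "CGTGTAATGCGCCCCCTTGTACGTCAAATTT"

for pos in differing_positions(seq_1, seq_2):
    print(pos, seq_1[pos], seq_2[pos])
```

Typing only the variable positions is what makes SNP assays relatively cheap compared with typing every position in the gene.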
SNPs (cont.) Image source: http://www.iavireport.org/archives/2007/Pages/IAVI-Report-11%284%29-perspective.aspx
LDMhc Approach • Imputation of allele-level typings from SNPs • Optimized for HLA • Reference set: Collections of SNPs from samples with known HLA alleles • Select most informative SNPs • Type those SNPs on experiment samples • Associate SNP typings with reference set to impute HLA alleles
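The step "select most informative SNPs" can be illustrated by scoring each SNP on how much it says about the HLA allele in the reference set. The sketch below uses mutual information as the score. That choice is an assumption made for illustration; this slide does not state the selection criterion, and the toy reference panel and function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between two discrete variables, from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def rank_snps(reference):
    """reference: list of (snp_haplotype_string, allele). Most informative SNP index first."""
    n_snps = len(reference[0][0])
    scores = [
        mutual_information([(snps[i], allele) for snps, allele in reference])
        for i in range(n_snps)
    ]
    return sorted(range(n_snps), key=lambda i: scores[i], reverse=True)


# Hypothetical toy reference panel: phased SNP haplotype with its known allele.
reference = [
    ("ACG", "C*06:02"),
    ("ACT", "C*06:02"),
    ("GCG", "C*07:01"),
    ("GCT", "C*07:01"),
]

print(rank_snps(reference))
```

In this toy panel only the first SNP tracks the allele; the second never varies and the third varies independently of the allele, so typing the first SNP alone would be enough.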
LDMhc: Statistical Model • Probability that a haplotype carries an allele at a locus • Goal is to optimize selected SNPs (SL)
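As a rough illustration of "probability that a haplotype carries an allele", the sketch below estimates that probability by counting reference haplotypes that match the observed SNPs at a selected set of positions. This counting estimate is a simplified stand-in and not the model from the cited papers; the panel, the function name, and the `selected` parameter (playing the role of the selected SNP set) are assumptions.

```python
from collections import Counter


def allele_posterior(reference, observed, selected):
    """Empirical P(allele | observed SNPs at `selected`) from a reference panel.

    reference: list of (snp_haplotype_string, allele)
    observed:  SNP haplotype string for the sample being imputed
    selected:  indices of the SNPs that were actually typed
    """
    key = tuple(observed[i] for i in selected)
    matches = Counter(
        allele
        for snps, allele in reference
        if tuple(snps[i] for i in selected) == key
    )
    total = sum(matches.values())
    return {allele: count / total for allele, count in matches.items()} if total else {}


# Hypothetical toy reference panel.
reference = [
    ("ACG", "C*06:02"),
    ("ACT", "C*06:02"),
    ("GCG", "C*07:01"),
    ("GCT", "C*07:01"),
]

print(allele_posterior(reference, "ACT", selected=[0]))  # informative SNP
print(allele_posterior(reference, "ACT", selected=[2]))  # uninformative SNP
```

The two calls differ only in which SNP is used, which is why the choice of selected SNPs is the quantity being optimized: a well-chosen set concentrates the probability on one allele, while a poor one leaves it spread out.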
Reference Data Set • 2500 samples (each w/2 haplotypes) • 7733 SNPs per haplotype • Provides (phased) SNP haplotypes • Provides allele to SNP haplotype associations
LDMhc: SNP Selection via HMM • States/Transitions: SNPs along the chromosome that define haplotypes Source: Dilthey et al. 2011
LDMhc: Validation of SNP Selection • Applied new SNP selection method to an earlier experiment • Threshold on certainty of calls (e.g., 90%) • Improvement of 44%, due to call rate more than accuracy • Helps, but increasing the size of the reference panel helps more (more samples, I think, not more SNPs)
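The distinction between call rate and accuracy can be made concrete. With a certainty threshold, an imputed allele is only "called" when its probability reaches the threshold: call rate is the share of samples called, and accuracy is the share of called samples that are correct. The data and function name below are hypothetical.

```python
def call_rate_and_accuracy(predictions, threshold=0.9):
    """predictions: list of (imputed_allele, certainty, true_allele).

    Returns (call_rate, accuracy); accuracy is None when nothing is called.
    """
    called = [(imp, true) for imp, certainty, true in predictions if certainty >= threshold]
    call_rate = len(called) / len(predictions)
    accuracy = sum(imp == true for imp, true in called) / len(called) if called else None
    return call_rate, accuracy


# Hypothetical imputation results for four samples.
predictions = [
    ("C*06:02", 0.99, "C*06:02"),
    ("C*07:01", 0.95, "C*07:01"),
    ("C*04:01", 0.80, "C*04:01"),
    ("C*06:02", 0.60, "C*07:01"),
]

print(call_rate_and_accuracy(predictions, threshold=0.9))
print(call_rate_and_accuracy(predictions, threshold=0.5))
```

A method can therefore improve by making more calls above the threshold without changing how often the called alleles are right, which is how the improvement on this slide is described.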
LDMhc: Validation of Imputation • Split reference set • 2/3: training/reference • 1/3: validation Source: Dilthey et al. 2011
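A 2/3 to 1/3 split like the one above can be sketched as follows. The shuffling, the seed, and the function name are assumptions; the slide does not say how samples were assigned to each part. The sample count of 2500 is taken from the reference data set slide.

```python
import random


def split_reference(samples, train_fraction=2 / 3, seed=0):
    """Shuffle samples and split them into (training/reference, validation) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


sample_ids = range(2500)  # stand-ins for the reference samples
training, validation = split_reference(sample_ids)
print(len(training), len(validation))
```

The validation samples have known HLA alleles that are hidden during imputation, so the imputed alleles can be compared with the truth afterwards.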
Application to Disease Association • Applied to previous psoriasis study • C*06:02 is key risk factor • Recreated the result • More significant than any single SNP • Results • Aggregation/synergy creates information Source: Dilthey et al. 2011
Software Application • Local GUI for input preparation and QC • Submit to remote server for imputation Source: Dilthey et al. 2011
Conclusions (1/2) • Provides accurate, high-resolution imputation of HLA • Weaknesses • Most important information is itself imputed: phasing of SNPs; association of alleles to SNPs • Can be improved and might lead to even greater accuracy • Race specific (plans to expand)
Conclusions (2/2) • Application to transplantation • Transplantation registry needs • Large donor pool • High resolution allelic typings • Potential use for recruitment typings
Acknowledgements • Dilthey, A. T., Moutsianas, L., Leslie, S., McVean, G. (2011): "HLA*IMP - An integrated framework for imputing classical HLA alleles from SNP genotypes." Bioinformatics Advance Access, doi: 10.1093/bioinformatics/btr061. • Leslie, S. et al. (2008): "A statistical method for predicting classical HLA alleles from SNP data." Am J Hum Genet 82(1): 48-56. • Application: https://oxfordhla.well.ox.ac.uk/hla/tool/main • Slides/images • Steven Mack, Children’s Hospital Oakland • Carolyn Hurley, Georgetown University • Loren Gragert, NMDP