High dimensional genomic data, identifiability, and query-response
Haixu Tang
School of Informatics and Computing, Indiana University, Bloomington
“Big Data” in Personal Genomics
• Genomics is a key component of personalized medicine
• Massive
  • Large research-oriented projects: from 1000 genomes to 10^6
  • Genome sequencing for all newborns?
  • Open data projects, e.g., the Personal Genome Project (PGP)
• Heterogeneous
  • Genomic sequence (variations)
  • Constant, dynamic monitoring
  • Transcriptomics, proteomics, metabolomics, microbial communities, etc. (as demonstrated by iPOP)
Challenges in Personal Genomics
Challenges: Speed, Storage, Scalability, Security
Solution: cloud, hybrid cloud, bring computing to the data!
Privacy Enhancing Technologies
Database security approaches: access control, query auditing, differential privacy
Cryptographic protocols: secure multiparty computation (SMC), homomorphic computation, functional encryption
Ethics studies, informed consent, policy
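As an illustration of the differential privacy item above, here is a minimal sketch of the standard Laplace mechanism applied to a counting query (how many participants carry a given allele). It is not taken from the slides: the function names `laplace_noise` and `private_carrier_count`, the 0/1 carrier encoding, and the choice of a counting query are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_carrier_count(carriers, epsilon, rng=None):
    """Release a noisy count of allele carriers (hypothetical helper).

    carriers: list of 0/1 flags, one per participant (assumed encoding).
    epsilon:  privacy budget; smaller values add more noise.
    Adding or removing one participant changes the count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(carriers)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    flags = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
    print(private_carrier_count(flags, epsilon=0.5))
```

The same budget would have to be split across every released statistic, which is one reason the balance between security and utility is hard for data with millions of dimensions.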
What is specific to genomic data?
• Challenges
  • Genome technologies evolve very fast!
  • Genomic data are extremely high dimensional
    • Millions of SNPs make individuals easily identifiable
  • Balance between data security and utility
  • Not only the data, but also analysis results need to be protected
    • Allele frequencies or test statistics (e.g., Homer’s attack)
• Special properties
  • Different dimensions are NOT independent
    • Genetic structures (e.g., linkage disequilibrium)
  • Specific genomic research focuses on a small number of dimensions (e.g., disease-associated SNPs)
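To make the point about released allele frequencies concrete, the following is a rough sketch in the spirit of the membership test from Homer's attack: for each SNP it compares how far an individual's genotype is from a reference population frequency versus the published study-pool frequency, then combines the per-SNP differences into a t-like statistic. The slides do not give the formula; the names `homer_statistic` and `simulate`, the pool and reference sizes, and the simulated data are assumptions for illustration. The simulation also treats SNPs as independent, which real genomes are not (linkage disequilibrium).

```python
import math
import random
import statistics


def homer_statistic(individual, pool_freq, ref_freq):
    # Per-SNP distance: |y - reference| - |y - pool|.
    # Positive values mean the individual sits closer to the pool.
    d = [abs(y - r) - abs(y - m)
         for y, m, r in zip(individual, pool_freq, ref_freq)]
    # One-sample t-like statistic over all SNPs.
    return statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def sample_genotype(p, rng):
    # Two independent allele draws; returns an allele dosage of 0, 0.5 or 1.
    return ((rng.random() < p) + (rng.random() < p)) / 2


def simulate(num_snps=10000, pool_size=25, ref_size=25, seed=0):
    # Hypothetical setup: independent SNPs with frequencies in (0.1, 0.9).
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(num_snps)]
    pool = [[sample_genotype(p, rng) for p in freqs] for _ in range(pool_size)]
    ref = [[sample_genotype(p, rng) for p in freqs] for _ in range(ref_size)]
    outsider = [sample_genotype(p, rng) for p in freqs]
    # Only these aggregate frequencies would be published.
    pool_freq = [statistics.fmean(col) for col in zip(*pool)]
    ref_freq = [statistics.fmean(col) for col in zip(*ref)]
    t_member = homer_statistic(pool[0], pool_freq, ref_freq)
    t_outsider = homer_statistic(outsider, pool_freq, ref_freq)
    return t_member, t_outsider


if __name__ == "__main__":
    t_member, t_outsider = simulate()
    print("pool member:", t_member, "outsider:", t_outsider)
```

In this toy setting the pool member's statistic should be clearly positive while the outsider's stays near zero, which is why aggregate results alone can reveal participation.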