European Genotype Archive Technical Feasibility Study Update
Goal The goal of the Technical Feasibility Study (TFS) is to assess the feasibility of the policies and technical systems developed by the European Genotype Archive (now the European Genome-phenome Archive, EGA) to ensure compliance with all applicable European privacy and medical regulations governing the collection, storage and use of genotype data.
Outline • EGA Overview • TFS Interim Results • New scientific results • Worldwide policy changes for GWAS data • EGA responses • New delivery date for final TFS report
EGA Mission • Provide data security appropriate for individually identifiable data • Data access limitations at three levels: public and open to all; authorized access for approved users; pre-publication (consortium access) • Share data with similar databases worldwide, subject to approval of the data access committee • Create community value through regular release of anonymous data summaries • Summary data to be incorporated in Ensembl or other genome browsers • Amount and type of summary data is currently being determined in consultation with other policy stakeholders
EGA Description • A database within the EBI core infrastructure • Publication and permanent repository for all types of genotype and personally identifiable data: cohort and genome-wide association studies, population studies, phenotype data • Technological solution for worldwide data sharing: as transparent as possible to the data generators and accessible to other researchers • Accepts array- and sequence-based genotype data • Features an API (Application Programming Interface) for consistent access • Support for multiple underlying data types (SNP arrays, CNV data, sequence data) • Optimised database design for each data type
EGA Data Acceptance and Access • All data with a defined release policy will be accepted • Raw data file types and investigator-defined genotypes • All deposited phenotype data will be stored
EGA Data Acceptance and Access • Access decisions will remain with the data-generating body • Distributed model, transparent to the data generators • The EGA manages the access granted • Users can also be restricted to particular collections within a study • The EGA is the European peer database to dbGaP (NCBI) • dbGaP has adopted a more centralised model of data access decisions • Data exchange discussions are ongoing to increase data discoverability • Working toward a common application for both databases to lower the administrative burden
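The access rules described on this slide and on the mission slide (three access levels, grants decided by the data-generating body, optional restriction to particular collections within a study) can be pictured as a small permission check. This is a hypothetical sketch only: the class names, the `"*"` wildcard for whole-study grants and the `can_access` function are illustrative assumptions, not the EGA's actual API or data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    PUBLIC = "public"          # open to all
    AUTHORIZED = "authorized"  # approved users only
    CONSORTIUM = "consortium"  # pre-publication, consortium members only


@dataclass(frozen=True)
class Dataset:
    study: str
    collection: str
    level: AccessLevel


@dataclass
class User:
    name: str
    # (study, collection) pairs approved by the data-generating body;
    # collection "*" stands for every collection in the study (hypothetical convention).
    grants: set = field(default_factory=set)
    # Studies whose pre-publication (consortium) data this user may see.
    consortia: set = field(default_factory=set)


def can_access(user: User, ds: Dataset) -> bool:
    """Hypothetical check: the archive enforces decisions made by the data owner."""
    if ds.level is AccessLevel.PUBLIC:
        return True
    if ds.level is AccessLevel.CONSORTIUM:
        return ds.study in user.consortia
    # AUTHORIZED: either the whole study or this specific collection was granted.
    return (ds.study, "*") in user.grants or (ds.study, ds.collection) in user.grants


alice = User("alice", grants={("study-A", "collection-1")})
print(can_access(alice, Dataset("study-A", "collection-1", AccessLevel.AUTHORIZED)))  # True
print(can_access(alice, Dataset("study-A", "collection-2", AccessLevel.AUTHORIZED)))  # False
```

Keeping the grant set on the user, rather than a single yes/no flag, is what allows a user to be approved for one collection of a study but not another.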
Feasibility Assessment part 1 • Collection of the regulations and procedures already in place at other European biobanking facilities • We held a series of meetings with data providers, including: • The Wellcome Trust Case Control Consortium • The UK DNA Banking Network • Cancer Genome Project (WTSI) • Nordic Controls data set (Institute for Molecular Medicine) • International Human Microbiome Consortium • Taken together, these data providers have significantly different requirements • Incorporating information from all of them into our design provides maximal future flexibility
Feasibility Assessment part 2 • Development and testing of anonymous and secure data transfer technology, and of policies for linking these data with the EGA and with publicly available data • Transfer technology development and assessment is essentially complete • Policies for linking data, and the level of aggregate data available for public release, are not yet settled • Low-resolution visualisation of data (at the EGA) is available • High-resolution visualisation of data (at Ensembl) is not available
EGA Physical Data Security and Transfer • All data within the EGA is stored and distributed in encrypted form • We provide the key for data decryption by postal mail • EGA disks are physically attached to only one machine, and EGA staff are the only people at the EBI with permission to log into that machine • Users are given passwords and accounts with specific data set restrictions • Raw data transfer (e.g. CEL files) is still difficult for most users • These files are provided by hard drive or by Aspera high-speed data transfer technology
EGA Experience and Update • We have over 1,000 researchers approved for data access • Raw data was previously provided only by mailing hard drives • Aspera transfer of EGA data is now available and will likely replace hard drive mailing for all but a small number of users • The EGA includes data for approximately 40,000 individuals • This number is expected to grow tenfold by the end of 2010 • Data submission systems through AIMS are in place • Other submission methods are in development • We are working with other groups to create phenotype data models
Ensembl View of GWAS Summary Data • GWAS summary data currently unavailable
Limitations for Summary Level Data Release • A new method allows accurate determination of whether a given individual is a member of a set when the only thing known about the set is its allele frequencies • dbGaP and the EGA were notified in advance of publication and moved the summary allele frequency data into the restricted-access areas of their sites • Essentially all public access portals (Ensembl, UCSC, etc.) pulled GWAS data pending further review
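The kind of membership test described here can be sketched in a few lines. The slide does not name the method; this sketch assumes it is the distance statistic published by Homer et al. (2008), in which each SNP j contributes D_j = |Y_j - Pop_j| - |Y_j - M_j| (Y: the individual's allele frequency, Pop: an independent reference sample's frequency, M: the published group's frequency), and the per-SNP values are combined into a one-sample t statistic. All data are simulated; the SNP count, group size, seed and function names are illustrative assumptions.

```python
import math
import random
import statistics


def membership_t(y, pop, mix):
    """One-sample t statistic over per-SNP distances (sketch after Homer et al. 2008).

    y   -- the individual's allele frequency at each SNP (0, 0.5 or 1)
    pop -- allele frequencies of an independent reference sample
    mix -- published allele frequencies of the study group (the "mixture")
    A large positive value suggests the individual contributed to the mixture.
    """
    # D_j > 0 when the individual is closer to the mixture than to the reference.
    d = [abs(yj - pj) - abs(yj - mj) for yj, pj, mj in zip(y, pop, mix)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def simulate(n_snps=10_000, group_size=25, seed=1):
    """Simulated data only; parameter values are illustrative."""
    rng = random.Random(seed)
    # Hypothetical true allele frequencies, one per independent SNP.
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]

    def person():
        # Two allele draws per SNP, expressed as an allele frequency (0, 0.5 or 1).
        return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]

    def allele_freqs(group):
        # Per-SNP average: the only quantity a summary-level release exposes.
        return [sum(column) / len(column) for column in zip(*group)]

    cases = [person() for _ in range(group_size)]      # published study group
    reference = [person() for _ in range(group_size)]  # independent reference sample
    outsider = person()                                # in neither group
    mix, pop = allele_freqs(cases), allele_freqs(reference)
    return membership_t(cases[0], pop, mix), membership_t(outsider, pop, mix)


member_t, outsider_t = simulate()
print(f"member: {member_t:.1f}  outsider: {outsider_t:.1f}")
```

Because the reference sample and the published group are drawn the same way, an outsider's score is centred on zero, whereas a contributor's own genotypes pull the group frequencies toward them and push the score well above zero once thousands of SNPs are combined. This illustrates why releasing allele frequencies alone was no longer treated as anonymous.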
EGA Response • After the release of this result, a large worldwide policy effort has been undertaken, involving the EGA and other collaborative partners • One focus of this effort has been to fully understand the implications of the result • We have spent considerable research effort to analyze and understand the power and limits of this computational technique
Consultation • The EGA is contracting with an external security consultant to assess our procedures • The EGA will consult with its Scientific and Ethical Advisory Board in the second half of 2009 • Such consultations were envisioned in the original TFS plan
Conclusion • The EGA is working in a rapidly changing policy environment • Significant policy decisions on the use and sharing of aggregate GWAS data are expected in the second half of 2009 • In light of this, the final report for TFS 13.5 will now be issued in ELIXIR month 28