The European Radiobiological Archives (ERA)
Paul Schofield (Cambridge U) & Jonathan Bard (Edinburgh U)
Supported by European Commission contracts: FIR1-CT-2000-20097 & FI6R – SSA -2006 - 028275
Background
• In the 1940s, 50s & 60s, a great deal of research was done to understand the scientific basis and effects of irradiation
• This work was done across the world on a wide range of animals
• The closed set of USA, European and Japanese data was archived during the 1980s in an old and (apparently) poor version of ACCESS
• The archiving used disease, pathological and anatomical terminology that is inconsistent across labs and animals
• The EU has funded a small group led by Paul Schofield (Cambridge) to make the database useful
The International Radiobiology Archives
• Set of HTML files on a Web site describing all individual studies, covering > 300 000 individuals. Links allow a search for specific studies by radiation or other exposure, or by strain/species
• Database in ACCESS of all studies, plus data from ~200K individuals on survival, pathology, clinical findings etc. (~350MB)
• Description of the ACCESS DB listing all tables and their fields, as well as the forms and their use, including the underlying computer code
• Database of selected references in ENDNOTE
ERA ACCESS Database
• Relational database with a hierarchical structure
• Forms provided with Visual Basic code allow a user to
  • Browse through data
  • Search for groups with specific characteristics
  • Select groups for further study
  • Perform preliminary statistics and/or export selected data for other statistics
• Re- or meta-analysis of different studies would be facilitated by
  • Combining (control) groups
  • Lumping diseases for analysis into
    • Families (e.g. all malignant liver tumors)
    • Classes (e.g. all systemic tumors)
Lumping diseases is crucial for comparing studies from different laboratories that use different pathological criteria.
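The lumping idea can be illustrated with a minimal sketch. The diagnosis labels, the family and class names other than the two quoted on the slide, and the `lump` helper are all hypothetical; the actual ERA forms are written in Visual Basic and are not reproduced here.

```python
from collections import Counter

# Hypothetical look-up table: diagnosis -> (family, class).
# Only "malignant liver tumors" and "systemic tumors" come from the slide;
# every other label is an illustrative assumption.
LUMPING = {
    "hepatocellular carcinoma": ("malignant liver tumors", "solid tumors"),
    "cholangiocarcinoma": ("malignant liver tumors", "solid tumors"),
    "lymphoma": ("lymphomas", "systemic tumors"),
    "myeloid leukemia": ("leukemias", "systemic tumors"),
}

LEVELS = {"family": 0, "class": 1}


def lump(diagnoses, level):
    """Count diagnoses after lumping them at the requested level."""
    idx = LEVELS[level]
    counts = Counter()
    for diagnosis in diagnoses:
        entry = LUMPING.get(diagnosis.lower())
        # Diagnoses missing from the table are counted, not silently dropped.
        counts[entry[idx] if entry else "unmapped"] += 1
    return counts


# Two laboratories reporting with different levels of detail.
lab_a = ["Hepatocellular carcinoma", "Lymphoma"]
lab_b = ["Cholangiocarcinoma", "Myeloid leukemia", "Lymphoma"]

by_family = lump(lab_a + lab_b, "family")
by_class = lump(lab_a + lab_b, "class")
```

Once both laboratories are counted at the family or class level, their groups can be compared even though the individual diagnoses differ.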
Hierarchical Structure of the ACCESS Database
[Diagram of the table hierarchy. Legend: Common tables; Dogs only; Man only; Rodents only.]
Tables shown: Pers_Status, Lab, Person, Ref_Pers, Status, Referenc, Study_Pers, Study, Group_Con, Study_Ref, RefKey, Strain, Group_Treat, Keyword, Group, Gender, Treat_Type, Applicat, Unit, KeywordCat, AgeCat, Treat_Class, PathDone, Ind_Main, Ind_Treat, ICD7, Meas_Type, DeathMode, Ind_Measur, ICD8, ICD9, Ind_PathMan, KDS, Ind_PathRod, Ind_Pedigree, Ind_PathDog
Number of Individuals in the database
• Some individual group data are unobtainable from several Japanese institutes and one US institute (~60K individuals)
• The US archives include ~40K beagles; a beagle study runs for 20 years, compared with 5 years for a rodent study
Problems
• Current classifications are a set of simple controlled vocabularies
• Granularity is highly variable
• Cannot separate anatomy & disease for some codes, e.g. ICD9 has terms like pulmonary cancer
• Cross-species computing is very complex
• No interoperability with other resources
• Uses high-level Controlled Vocabularies for Disease and Family. These are reasonable but unwieldy
• Nothing is standardised across the database!!
• It is not easy to use
Need for keeping legacy Radiobiology Archives
• Most scientists involved in such studies are retired or dead
• Few institutions (perhaps 4) now do this work
• Substantial risks exist that irreplaceable data will be discarded
• Budgetary, legal and political limitations and popular opposition to animal studies make new animal studies unlikely
• Old animal studies represent a substantial investment
  • A study on 50-100 dogs or 5000 mice costs about €10M
  • The studies in the IRA database cost > 2 × 10^9 €
• Several studies are not yet fully evaluated
• Old data can, with modern statistical methods, yield information on
  • Dose, dose-rate and radionuclide effects
  • Ageing, cancer & diseases
ERA-PRO funds the current work
• Migration of data to Oracle database
• Checking of data with original input sources - validation/curation
• Improving access and functionality
  • An easy, intuitive search facility for the user
  • Interoperability with other databases
  • Compliance with current ontology and other data standards
• Coding of pathology diagnoses
Migration of data to Oracle database
• This is being done by a database group in Germany
• Professional advice is to produce a DBMS-free version (in XML?)
• Load this version into Oracle
Validation of data
• Hand checking of randomly selected datasets against the original submission
• The few errors are mainly systematic phase shifts or a keystroke error
Better access, functionality & interoperability
• This involves using modern coding to access the data
• Our policy is not to touch the data itself but to use look-up tables
• This means
  • Starting with standard ontologies for pathology and anatomy
  • Making links that allow them to handle all relevant species
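The look-up table policy can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in for the planned Oracle database, and the table names (`ind_path`, `code_lookup`), columns, codes and terms are all hypothetical placeholders, not the real ERA schema.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with legacy data and a separate look-up table."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Legacy data: stored exactly as archived, never edited.
        CREATE TABLE ind_path (individual_id INTEGER, legacy_code TEXT);
        INSERT INTO ind_path VALUES (1, 'L01'), (2, 'L02'), (3, 'L01');

        -- Look-up table: maps each legacy code to a standard term.
        CREATE TABLE code_lookup (legacy_code TEXT PRIMARY KEY, standard_term TEXT);
        INSERT INTO code_lookup VALUES ('L01', 'term A'), ('L02', 'term B');
        """
    )
    return con


def individuals_with(con, standard_term):
    """Find individuals by standard term, resolved through the look-up table."""
    rows = con.execute(
        """
        SELECT p.individual_id
        FROM ind_path AS p
        JOIN code_lookup AS k ON k.legacy_code = p.legacy_code
        WHERE k.standard_term = ?
        ORDER BY p.individual_id
        """,
        (standard_term,),
    )
    return [row[0] for row in rows]


con = build_demo()
hits = individuals_with(con, "term A")
```

Because the mapping lives in its own table, a coding mistake is corrected by editing one look-up row; the archived records stay untouched.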
Bio-Ontologies
• There are now various codings that can be used to annotate biological terms
• Some are formal ontologies (sets of structured knowledge) - others are less well organised
• Anatomy
  • FMA (human), MA (mouse embryo and adult)
  • Others for C. elegans, zebrafish and Drosophila
• Pathology
  • ICD9 (human): every known disease (10^5 terms?)
  • SNOVET (animals): every animal disease
  • MPATH (mouse): mouse pathologies (600 terms)
  • There are others, such as EULEP codes, that are inaccessible
• Some have mixed anatomy and pathology (pulmonary tumour)
The ERA only uses old ones!
Current disease coding in ERA ? Suggested approach
Pathology terminology
• Could handle some pathology by implementing
  • ICD9 ontology for human and adding anatomy codes?
  • MPATH for rodents with high-level anatomical codes?
• This excludes dogs, and SNOVET is not an ontology
• Unlikely to find a solution with a common detailed pathology ontology for all three species
• Many of the problems of unifying pathology/disease terms depend on disaggregating pathology from anatomy, and there is no common anatomy with adequate spatial detail
Anatomy ontology resources
Many: resources, formats, philosophies, purposes, variable content
Handling anatomy
Options
• Attempt full anatomy and pathology mappings for each species - NO
• Use dynamic mapping facilities of cross-species anatomies - NO
• Modify/generalise an existing model anatomy - possible
Current approach
• Use the adult mouse ontology (cut down to the necessary level of detail)
• Abstract it to be species-independent
• Map the ERA anatomy terms (TOPO) and disease anatomy terms (DIS-ROD) onto it
• Use a second code to define species
• Link the dual ontology to the data via a look-up table
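A minimal sketch of the dual coding described above, in which one species-independent anatomy term is paired with a separate species code. The look-up entries, the `AnatomyCode` type and the `annotate` helper are hypothetical illustrations, not the ERA tables themselves.

```python
from typing import NamedTuple, Optional


class AnatomyCode(NamedTuple):
    term: str     # species-independent anatomy term
    species: str  # second code that names the species


# Hypothetical look-up table: (source list, legacy term) -> generic anatomy term.
ANATOMY_LOOKUP = {
    ("TOPO", "heart"): "Heart",
    ("DIS-ROD", "myocardium"): "Myocardium",
    ("DIS-ROD", "pleural cavity"): "Pleural cavity",
}


def annotate(source_list: str, legacy_term: str, species: str) -> Optional[AnatomyCode]:
    """Translate a legacy term into the dual code, or None if it is unmapped."""
    generic = ANATOMY_LOOKUP.get((source_list, legacy_term.lower()))
    if generic is None:
        return None
    return AnatomyCode(generic, species)


dog_heart = annotate("TOPO", "Heart", "dog")
mouse_heart = annotate("TOPO", "Heart", "mouse")
```

The two results share the same anatomy term and differ only in the species code, which is what lets a single search term reach data from every species.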
The draft anatomy look-up table
Legend
• TOPO: term from the Topology list
• DIS-ROD: term from the Disease-Rodent list
• NEW: term invented to improve organisation
Adult tissues (SYN: whole body) (TOPO)
  Body cavities (NEW)
    Pleural cavity (DIS-ROD)
      Pleural mesothelium (DIS-ROD)
    Peritoneal cavity (DIS-ROD)
      Peritoneal mesothelium (DIS-ROD)
    Mesothelium (DIS-ROD) [DAG: dual parentage]
      Peritoneal mesothelium (DIS-ROD)
      Pleural mesothelium (NEW)
  Cardiovascular system (TOPO)
    Blood vessels (TOPO) (DIS-ROD)
    Heart (TOPO) (DIS-ROD)
      Myocardium (DIS-ROD)
      Pericardium (DIS-ROD)
    Lymphatic vessels (TOPO) SYN: system (DIS-ROD)
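The "DAG, dual parentage" note in the draft table means a term can sit under more than one parent. A minimal sketch, assuming the parent links below (read off the draft table, so the exact nesting is an assumption) and hypothetical helper names:

```python
# Child -> list of parents. A term with two parents makes this a DAG, not a tree.
# The nesting is an assumed reading of the draft table.
PARENTS = {
    "Adult tissues": [],
    "Body cavities": ["Adult tissues"],
    "Pleural cavity": ["Body cavities"],
    "Peritoneal cavity": ["Body cavities"],
    "Mesothelium": ["Body cavities"],
    "Pleural mesothelium": ["Pleural cavity", "Mesothelium"],
    "Peritoneal mesothelium": ["Peritoneal cavity", "Mesothelium"],
}


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def descendants(term):
    """Return every term that has `term` among its ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}


pleural_up = ancestors("Pleural mesothelium")
mesothelium_down = descendants("Mesothelium")
```

A search on either "Pleural cavity" or "Mesothelium" then finds records annotated with "Pleural mesothelium", which a strict tree could not offer.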
Handling pathology
• The mix of SNOMED, SNOVET, ICD9 etc. is just too cumbersome
• The added problem is the mix of pathology and anatomy
• The 600 MPATH terms cover everything
Current approach
• Use MPATH for the search page
• Where there is added anatomy, use an additional MA code
• Link MPATH to data via look-up tables
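Splitting a mixed legacy diagnosis into a pathology term plus an optional anatomy term might look like the sketch below. The term strings are plain-word placeholders rather than real MPATH or MA identifiers, and `PATHOLOGY_LOOKUP` and `search` are hypothetical names.

```python
# Hypothetical look-up table: legacy diagnosis -> (pathology term, anatomy term or None).
# Placeholder words stand in for MPATH and MA codes.
PATHOLOGY_LOOKUP = {
    "pulmonary tumour": ("tumour", "lung"),
    "liver tumour": ("tumour", "liver"),
    "tumour": ("tumour", None),  # no anatomy carried by the legacy term
}


def search(legacy_diagnoses, pathology, anatomy=None):
    """Return the legacy diagnoses that match a pathology term and, optionally, an anatomy term."""
    hits = []
    for diagnosis in legacy_diagnoses:
        mapped = PATHOLOGY_LOOKUP.get(diagnosis.lower())
        if mapped is None:
            continue  # unmapped diagnoses never match
        path_term, anat_term = mapped
        if path_term == pathology and (anatomy is None or anat_term == anatomy):
            hits.append(diagnosis)
    return hits


records = ["Pulmonary tumour", "Liver tumour", "Tumour", "Nephritis"]
all_tumours = search(records, "tumour")
lung_tumours = search(records, "tumour", anatomy="lung")
```

Keeping the two codes apart lets the same pathology term serve every organ, instead of needing one combined term per organ.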
In two years' time ...
• The user will have a search page based around ontologies and controlled vocabularies for
  • Species
  • General anatomy terms (based around the mouse)
  • Pathology terms (based around the mouse)
• We will provide an underlying Oracle DB and a series of look-up tables and links that will allow a user to
  • Identify the experiments that include data on their search terms
  • Extract individual animal data that meet the search criteria
At least, that is the plan!
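The two planned search steps (identify the experiments, then extract the individual animals) can be sketched in a few lines. The `Animal` record, its fields and the sample rows are invented for illustration and do not reflect the planned Oracle schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    animal_id: int
    study: str
    species: str
    anatomy: str
    pathology: str


# Invented sample rows standing in for already-annotated individual records.
ANIMALS = [
    Animal(1, "study-A", "mouse", "lung", "tumour"),
    Animal(2, "study-A", "mouse", "liver", "tumour"),
    Animal(3, "study-B", "dog", "lung", "tumour"),
    Animal(4, "study-C", "rat", "heart", "inflammation"),
]


def matching_animals(animals, **criteria):
    """Step 2: extract the individual animals that meet every search criterion."""
    return [a for a in animals
            if all(getattr(a, field) == value for field, value in criteria.items())]


def matching_studies(animals, **criteria):
    """Step 1: identify the studies holding at least one matching animal."""
    return sorted({a.study for a in matching_animals(animals, **criteria)})


studies = matching_studies(ANIMALS, anatomy="lung", pathology="tumour")
mice = matching_animals(ANIMALS, species="mouse", anatomy="lung")
```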