
The European Radiobiological Archives (ERA)

Paul Schofield (Cambridge U) & Jonathan Bard (Edinburgh U). Supported by European Commission contracts: FIR1-CT-2000-20097 & FI6R – SSA -2006 - 028275.


Presentation Transcript


  1. The European Radiobiological Archives (ERA)
     Paul Schofield (Cambridge U) & Jonathan Bard (Edinburgh U)
     Supported by European Commission contracts: FIR1-CT-2000-20097 & FI6R – SSA -2006 - 028275

  2. Background
     • In the 1940s, 50s and 60s, a great deal of research was done to understand the scientific basis and effects of irradiation
     • This work was done across the world on a wide range of animals
     • The closed set of USA, European and Japanese data was archived during the 1980s in an old and (apparently) poor version of ACCESS
     • The archiving used disease, pathological and anatomical terminology that is inconsistent across labs and animals
     • The EU has funded a small group led by Paul Schofield (Cambridge) to make the database useful

  3. The International Radiobiology Archives
     • Set of HTML files on a website describing all individual studies, covering > 300 000 individuals. Links allow searching for specific studies by radiation or other exposure, or by strain/species
     • ACCESS database of all studies plus data from ~200K individuals on survival, pathology, clinical data etc. (~350 MB)
     • Description of the ACCESS DB listing all tables and their fields, as well as the forms and their use, including the underlying computer code
     • Database of selected references in ENDNOTE

  4. ERA ACCESS Database
     • Relational database with a hierarchical structure
     • Forms provided with Visual Basic code allow a user to
       • Browse through data
       • Search for groups with specific characteristics
       • Select groups for further study
       • Perform preliminary statistics and/or export selected data for other statistics
     • Re- or meta-analysis of different studies would be facilitated by
       • Combining (control) groups
       • Lumping diseases for analysis into
         • Families (e.g. all malignant liver tumors)
         • Classes (e.g. all systemic tumors)
     Lumping diseases is crucial for comparing studies from different laboratories using different pathological criteria.
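
The lumping of diagnoses into families and classes can be sketched as a simple look-up. This is a minimal illustration only: the diagnosis labels, the family and class names, and the `lump` helper are all hypothetical and are not taken from the ERA tables or its Visual Basic forms.

```python
from collections import Counter

# Hypothetical look-up: diagnosis label -> (family, class).
# Labels and groupings are illustrative, not the ERA's own lists.
LUMPING = {
    "hepatocellular carcinoma": ("malignant liver tumors", "solid tumors"),
    "cholangiocarcinoma": ("malignant liver tumors", "solid tumors"),
    "lymphoma": ("lymphoid tumors", "systemic tumors"),
    "myeloid leukaemia": ("myeloid tumors", "systemic tumors"),
}


def lump(diagnoses, level):
    """Count diagnoses at the 'family' or 'class' level."""
    index = {"family": 0, "class": 1}[level]
    counts = Counter()
    for diagnosis in diagnoses:
        group = LUMPING.get(diagnosis)
        # Unmapped labels stay visible instead of being silently dropped.
        counts[group[index] if group else "unmapped"] += 1
    return counts


# Two laboratories that used different diagnostic labels.
lab_a = ["hepatocellular carcinoma", "lymphoma"]
lab_b = ["cholangiocarcinoma", "myeloid leukaemia", "osteosarcoma"]

family_counts = lump(lab_a + lab_b, "family")
class_counts = lump(lab_a + lab_b, "class")
```

Counting at the family level puts the two laboratories' liver tumours in one group even though the original labels differ, which is the point of lumping before a meta-analysis.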

  5. Hierarchical Structure of the ACCESS Database
     [Diagram] Tables shown: Pers_Status, Lab, Person, Ref_Pers, Status, Referenc, Study_Pers, Study, Group_Con, Study_Ref, RefKey, Strain, Group_Treat, Keyword, Group, Gender, Treat_Type, Applicat, Unit, KeywordCat, AgeCat, Treat_Class, PathDone, Ind_Main, Ind_Treat, ICD7, Meas_Type, DeathMode, Ind_Measur, ICD8, ICD9, Ind_PathMan, KDS, Ind_PathRod, Ind_Pedigree, Ind_PathDog
     Legend: Common tables; Dogs only; Man only; Rodents only

  6. Number of Individuals in the database
     • Individual data for some groups could not be obtained from several Japanese institutes and one US institute (~60K individuals)
     • The US archives include ~40K beagles; a beagle study runs for 20 years, compared with 5 years for a rodent study

  7. Species and Individuals in Different Archives

  8. A typical (opaque) webpage

  9. Problems
     • Current classifications are a set of simple controlled vocabularies
     • Granularity is highly variable
     • Anatomy and disease cannot be separated for some codes, e.g. ICD9 has terms like pulmonary cancer
     • Cross-species computing is very complex
     • No interoperability with other resources
     • Uses high-level controlled vocabularies for Disease and Family; these are reasonable but unwieldy
     • Nothing is standardised across the database!!
     • It is not easy to use

  10. Need for keeping legacy Radiobiology Archives
      • Most scientists involved in such studies are retired or dead
      • Few institutions (perhaps 4) now do this work
      • There is a substantial risk that irreplaceable data will be discarded
      • Budgetary, legal and political limitations and popular opposition to animal studies make new animal studies unlikely
      • Old animal studies represent a substantial investment
        • A study on 50-100 dogs or 5000 mice costs about €10M
        • The studies in the IRA database cost more than €2 × 10^9
      • Several studies are not yet fully evaluated
      • Old data can, with modern statistical methods, yield information on
        • dose, dose-rate and radionuclide effects
        • ageing, cancer & diseases

  11. ERA-PRO funds the current work
      • Migration of data to Oracle database
      • Checking of data with original input sources: validation/curation
      • Improving access and functionality
      • An easy, intuitive search facility for the user
      • Interoperability with other databases
      • Compliance with current ontology and other data standards
      • Coding of pathology diagnoses

  12. Migration of data to Oracle database
      • This is being done by a database group in Germany
      • Professional advice is to produce a DBMS-free version (in XML?)
      • Load this version into Oracle
      Validation of data
      • Hand checking of randomly selected datasets against the original submission
      • The few errors are mainly systematic phase shifts or keystroke errors
      Better access, functionality & interoperability
      • This involves using modern coding to access the data
      • Our policy is not to touch the data itself but to use look-up tables
      • This means
        • Starting with standard ontologies for pathology and anatomy
        • Making links that allow them to handle all relevant species
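
The policy of leaving the data untouched and attaching modern codes through look-up tables can be sketched as a join. This is only an illustration under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for Oracle, and the table names, column names, legacy codes and `TERM:` identifiers are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pathology (animal_id INTEGER, legacy_code TEXT);
CREATE TABLE lookup (legacy_code TEXT PRIMARY KEY, ontology_term TEXT);

-- Archived records keep their original codes (hypothetical values).
INSERT INTO pathology VALUES (1, 'L-017'), (2, 'L-203'), (3, 'L-017'), (4, 'L-999');

-- The look-up table is the only place where new coding is added.
INSERT INTO lookup VALUES ('L-017', 'TERM:0001'), ('L-203', 'TERM:0002');
""")

# LEFT JOIN keeps every archived record, even if it has no mapping yet.
rows = conn.execute("""
SELECT p.animal_id, p.legacy_code, l.ontology_term
FROM pathology AS p
LEFT JOIN lookup AS l ON p.legacy_code = l.legacy_code
ORDER BY p.animal_id
""").fetchall()

# The archived table itself is never updated.
archived_count = conn.execute("SELECT COUNT(*) FROM pathology").fetchone()[0]
```

Because the mapping lives in its own table, a wrong or missing mapping can be corrected without rewriting any archived record, and unmapped codes show up as `None` rather than disappearing.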

  13. Bio-Ontologies
      • There are now various codings that can be used to annotate biological terms
      • Some are formal ontologies (sets of structured knowledge); others are less well organised
      • Anatomy
        • FMA (human), MA (mouse embryo and adult)
        • Others for C. elegans, zebrafish and Drosophila
      • Pathology
        • ICD9 (human): every known disease (10^5 terms?)
        • SNOVET (animals): every animal disease
        • MPATH (mouse): mouse pathologies (600 terms)
        • There are others, such as EULEP codes, that are inaccessible
        • Some have mixed anatomy and pathology (pulmonary tumour)
      The ERA only uses old ones!

  14. Current disease coding in ERA → Suggested approach

  15. Pathology terminology
      • Could handle some pathology by implementing
        • ICD9 ontology for human and adding anatomy codes?
        • MPATH for rodents with high-level anatomical codes?
      • This excludes dogs, and SNOVET is not an ontology
      • A common detailed pathology ontology for all three species is unlikely to be found
      • Many of the problems of unifying pathology/disease terms depend on the disaggregation of pathology and anatomy, and there is no common anatomy with adequate spatial detail

  16. Anatomy ontology resources
      Many: resources, formats, philosophies, purposes, variable content

  17. Handling anatomy
      Options
      • Attempt full anatomy and pathology mappings for each species: NO
      • Use dynamic mapping facilities of cross-species anatomies: NO
      • Modify/generalise an existing model anatomy: possible
      Current approach
      • Use the adult mouse ontology (cut down to the necessary level of detail)
      • Abstract it to be species-independent
      • Map onto it the ERA anatomy terms (TOPO) and disease anatomy terms (DIS-ROD)
      • Use a second code to define species
      • Link the dual ontology to the data via a look-up table
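
One way to picture the dual coding is a record that pairs a species-independent anatomy term with a separate species code. The sketch below is an assumption about how that could look, not the project's implementation: the `DualCode` class, the `annotate` helper and the look-up entries are hypothetical, with the anatomy terms borrowed from the draft look-up table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualCode:
    anatomy: str  # species-independent anatomy term
    species: str  # second code that names the species


# Hypothetical look-up: (source list, legacy term) -> generalised anatomy term.
ANATOMY_LOOKUP = {
    ("TOPO", "heart"): "Heart",
    ("DIS-ROD", "heart"): "Heart",
    ("DIS-ROD", "myocardium"): "Myocardium",
}


def annotate(source_list, legacy_term, species):
    """Map a legacy anatomy term to a DualCode without altering the term."""
    key = (source_list, legacy_term.strip().lower())
    anatomy = ANATOMY_LOOKUP.get(key)
    if anatomy is None:
        raise KeyError(f"no mapping for {key}")
    return DualCode(anatomy, species)


mouse_heart = annotate("TOPO", "Heart", "mouse")
dog_heart = annotate("DIS-ROD", "heart ", "dog")
```

Both records resolve to the same anatomy term, so a search on "Heart" finds them together, while the species code keeps them distinguishable.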

  18. The draft anatomy look-up table
      Legend
      • TOPO: term from the Topology list
      • DIS-ROD: term from the Disease-Rodent list
      • NEW: term invented to improve organisation
      Adult tissues (SYN: whole body) (TOPO)
        Body cavities (NEW)
          Pleural cavity (DIS-ROD)
            Pleural mesothelium (DIS-ROD)
          Peritoneal cavity (DIS-ROD)
            Peritoneal mesothelium (DIS-ROD)
        Mesothelium (DIS-ROD) [DAG: dual parentage]
          Peritoneal mesothelium (DIS-ROD)
          Pleural mesothelium (NEW)
        Cardiovascular system (TOPO)
          Blood vessels (TOPO) (DIS-ROD)
          Heart (TOPO) (DIS-ROD)
            Myocardium (DIS-ROD)
            Pericardium (DIS-ROD)
          Lymphatic vessels (TOPO) SYN: system (DIS-ROD)
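
The "DAG, dual parentage" note means a term may sit under more than one parent. A minimal sketch of such a structure follows; the parent links are an assumed reading of the draft table (for example, that pleural mesothelium belongs to both the pleural cavity and the mesothelium), and the `ancestors` and `descendants` helpers are hypothetical.

```python
# Assumed parent links; a term may list more than one parent (a DAG, not a tree).
PARENTS = {
    "Adult tissues": [],
    "Body cavities": ["Adult tissues"],
    "Pleural cavity": ["Body cavities"],
    "Peritoneal cavity": ["Body cavities"],
    "Mesothelium": ["Adult tissues"],
    "Pleural mesothelium": ["Pleural cavity", "Mesothelium"],
    "Peritoneal mesothelium": ["Peritoneal cavity", "Mesothelium"],
}


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def descendants(term):
    """Return every term that has `term` among its ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}


pleural_ancestors = ancestors("Pleural mesothelium")
mesothelium_terms = descendants("Mesothelium")
```

With dual parentage, a query for everything under "Body cavities" and a query for everything under "Mesothelium" both return the pleural mesothelium, which a strict tree could not do.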

  19. Handling pathology
      • The mix of SNOMED, SNOVET, ICD9 etc. is just too cumbersome
      • The added problem is the mix of pathology and anatomy
      • The 600 MPATH terms cover everything
      Current approach
      • Use MPATH for the search page
      • Where there is added anatomy, use an additional MA code
      • Link MPATH to data via look-up tables

  20. In two years' time ...
      • The user will have a search page based around ontologies and controlled vocabularies for
        • Species
        • General anatomy terms (based around the mouse)
        • Pathology terms (based around the mouse)
      • We will provide an underlying Oracle DB and a series of look-up tables and links that will allow a user to
        • Identify the experiments that include data on their search terms
        • Extract individual animal data that meet the search criteria
      At least, that is the plan!
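
The two search outcomes listed above (finding the experiments, then extracting the matching animals) can be sketched with a small in-memory filter. Everything here is hypothetical: the `Finding` record, the `search` function, the study names and the `PATH:` identifiers are placeholders, and a real search page would query the Oracle database rather than a Python list.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    study: str
    animal_id: int
    species: str
    anatomy: str
    pathology: str


# Hypothetical annotated findings, already mapped to search terms.
FINDINGS = [
    Finding("study-A", 1, "mouse", "Heart", "PATH:0001"),
    Finding("study-A", 2, "mouse", "Blood vessels", "PATH:0002"),
    Finding("study-B", 7, "dog", "Heart", "PATH:0001"),
    Finding("study-C", 9, "mouse", "Heart", "PATH:0001"),
]


def search(findings, species=None, anatomy=None, pathology=None):
    """Return (matching studies, matching findings); None means 'any'."""
    wanted = {"species": species, "anatomy": anatomy, "pathology": pathology}
    hits = [
        f for f in findings
        if all(v is None or getattr(f, k) == v for k, v in wanted.items())
    ]
    studies = sorted({f.study for f in hits})
    return studies, hits


studies, animals = search(FINDINGS, species="mouse", anatomy="Heart")
```

Leaving a criterion as `None` widens the search, so the same function serves both a broad query across species and a narrow one for a single species and tissue.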
