IWIR-CRIS '06. Data retrieval in PURE: data retrieval in the 4-year-old PURE CRIS project at 9 universities. Agenda. Overview Retrieval Validated manual data gathering Dynamic integration to local back-end systems Aggregation, enrichment and import of historic data


Presentation Transcript

  1. IWIR-CRIS '06 • Data retrieval in PURE • Data retrieval in the 4-year-old PURE CRIS project at 9 universities

  2. Agenda • Overview • Retrieval • Validated manual data gathering • Dynamic integration to local back-end systems • Aggregation, enrichment and import of historic data • Experiments with automated imports of historic data • Exposure • Two web services • OAI • Z39.50 • Reports • Portal framework • Archiving • Near future

  3. Overview • Brief overview • … in order to discuss ingestion, integration, conversion and import in a specific context

  4. Overview • Brief overview • History • Development began in 2002 • Users • 9 universities (DK+SE), several hospitals + other research institutions • Platform and architecture • J2EE enterprise application • Release management: All users have instances of the same release version, same code-base • Business model • Commercial software licenses, powerful user group, shared budgets • Modular • Basic module, Reporting module, Student thesis module, External publications module, Bibliometrics module, Press module.

  5. Overview

  6. Retrieval • Manual data gathering • User roles/rights + workflow: • = de-centralized data gathering • = validated data gathering • = continuous data gathering • GUI example • Management focus is necessary • Reports and statistics, KPI-management, etc. • Adding value to researchers is necessary • Instantly in Google indexes, instantly updated personal websites, instantly updated CV, increased citations (source in paper), etc.

  7. Retrieval • Dynamic integration • Dynamic integration to local back-end systems: • Personnel systems, payroll systems (for data retrieval) • LDAPs, Active Directories (for data retrieval + authentication) • Single sign-on systems (for authentication) • … to automatically create object types such as “person” or “organization” • … and yes, PURE hosts data, too • We need complete objects according to the meta-data model • Plug-in architecture in PURE: • Pro = individually adapted integration • Con = individually programmed plug-in necessary • Future = GUI, standardized plug-ins
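
The plug-in idea on this slide could be sketched roughly as below. This is only an illustration in Python, not PURE's actual plug-in API (PURE itself is a J2EE application); the names `Person`, `PersonSource`, `PayrollPlugin` and `sync_persons`, and the record fields, are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical 'person' object; the real meta-data model is richer."""
    employee_id: str
    name: str
    organization: str


class PersonSource(ABC):
    """Hypothetical plug-in contract that each back-end adapter implements."""

    @abstractmethod
    def fetch_persons(self) -> list[Person]:
        ...


class PayrollPlugin(PersonSource):
    """Individually programmed adapter for one (made-up) payroll export."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def fetch_persons(self) -> list[Person]:
        # Map the source system's field names onto the shared Person type.
        return [
            Person(row["emp_no"], row["full_name"].strip(), row["dept"])
            for row in self._rows
        ]


def sync_persons(plugins: list[PersonSource]) -> dict[str, Person]:
    """Merge persons from all plug-ins, keyed by employee id (last one wins)."""
    merged: dict[str, Person] = {}
    for plugin in plugins:
        for person in plugin.fetch_persons():
            merged[person.employee_id] = person
    return merged


rows = [{"emp_no": "42", "full_name": " Ada Example ", "dept": "Physics"}]
persons = sync_persons([PayrollPlugin(rows)])
```

The "pro" and "con" from the slide both show up here: each adapter can be tailored to its source system, but each one has to be written by hand.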

  8. Retrieval • Import • Historic data • Many sources • More or less useful data • More or less consistent use of formats :-) • The PXA format • PURE XML Archive format - .zip based • Meta-data, relations between entities, binary files • Aggregation > enrichment > conversion > import • The process is external to PURE
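
A .zip-based archive carrying meta-data, relations and binary files might be assembled along the following lines. The slide does not describe the internal layout of PXA, so the entry names used here (`metadata.xml`, `relations.xml`, `files/`) and the XML snippets are assumptions made purely for illustration.

```python
import io
import zipfile


def build_archive(metadata_xml: str, relations_xml: str,
                  binaries: dict[str, bytes]) -> bytes:
    """Pack meta-data, relations and binary files into one .zip archive.

    The entry names are assumptions, not the real PXA layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("metadata.xml", metadata_xml)
        archive.writestr("relations.xml", relations_xml)
        for name, payload in binaries.items():
            archive.writestr(f"files/{name}", payload)
    return buffer.getvalue()


data = build_archive(
    "<publications><publication id='p1'/></publications>",
    "<relations><relation from='p1' to='person-42'/></relations>",
    {"p1.pdf": b"%PDF-1.4 placeholder"},
)

# Re-open the archive to list what an importer would see.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    entries = sorted(archive.namelist())
```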

  9. Retrieval • Experiments • Experiments with automated imports of historic data from specific, identified sources • [source format] > PXA conversion > import > enrichment/validation • Very poor data quality demands the concept of “draft objects” in PURE
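
One way to picture "draft objects" is an import step that accepts incomplete records but marks them for later enrichment/validation instead of rejecting them. The sketch below is a guess at the idea, not PURE's implementation; `REQUIRED_FIELDS`, `ImportedObject` and `import_record` are hypothetical names.

```python
from dataclasses import dataclass, field

# Assumed minimum set of fields for a valid object, for illustration only.
REQUIRED_FIELDS = ("title", "year", "authors")


@dataclass
class ImportedObject:
    data: dict
    status: str = "draft"
    problems: list[str] = field(default_factory=list)


def import_record(record: dict) -> ImportedObject:
    """Import a converted record; keep it as a draft if it is incomplete."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    status = "draft" if problems else "valid"
    return ImportedObject(data=record, status=status, problems=problems)


good = import_record({"title": "A paper", "year": 2004, "authors": ["A. Example"]})
poor = import_record({"title": "Untitled scan", "year": None, "authors": []})
```

Here `poor` is still imported, but as a draft with a list of problems that a human validator can work through.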

  10. Exposure • Web services • RPC/encoded + document/literal • Rich libraries of methods • Including format-specific methods: APA, MLA, HARVARD, VANCOUVER and CBE • Free and near-instant adding of methods • WS code example (if time)

  11. Exposure • OAI support • OAI-PMH data provider • OAI-PMH formats • DC • DDF-MXD (Danish national format) • SVEP (Swedish national format) • … more to come • Also used to harvest other PURE-repositories for “external publications”
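
Harvesting from an OAI-PMH data provider comes down to an HTTP request with a `verb` and a `metadataPrefix`, followed by parsing the XML response. The sketch below builds a `ListRecords` request for the standard `oai_dc` prefix and reads Dublin Core titles from a sample response; the base URL `https://pure.example.org/oai` and the sample record are made up, and the prefixes PURE uses for DDF-MXD or SVEP are not given on the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_records(response_xml: str) -> list[tuple[str, str]]:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        results.append((identifier, title))
    return results


# Hand-written sample response, so the sketch runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://pure.example.org/oai")
records = extract_records(SAMPLE)
```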

  12. Exposure • Z39.50 • Enabling of searches in PURE from library systems • SRW/SRU
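
For the SRU side, a search is an ordinary HTTP GET carrying a CQL query. A minimal sketch of building such a request follows; the endpoint `https://pure.example.org/sru`, the CQL index `dc.title` and the choice of SRU version 1.1 are assumptions, since the slide does not say which version or indexes PURE supports.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url: str, cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve request URL (version 1.1 assumed)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return f"{base_url}?{urlencode(params)}"


url = sru_search_url("https://pure.example.org/sru", 'dc.title = "research"')

# Decode the query string again to check that the CQL survived URL encoding.
decoded = parse_qs(urlsplit(url).query)
```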

  13. Exposure • Reports • PURE reporting module • GUI example

  14. Exposure • Reference manager • Export of data to local Reference Manager installation • Using RM-formatted export file • Promotes registering to the repository rather than in RM • GUI example

  15. Exposure • Portal framework • PUREportal – free PURE-specific framework for custom development of research exhibition portals • Online example • Typical cost scenario € 20,000 • Typical delivery time 1 month • Little need for requirements specification • Automatic PURE-API maintenance

  16. Archiving • Data archiving – 2 levels • SQL environment • Meta-data and relations • Binary files just stored in server file system • FEDORA via connector (not PURE-specific, Open Source) • Facilitates: • Higher quality archival of binary files • Long term preservation in general • Adoption of PURE in institutions’ general FEDORA strategies

  17. Near future • The near future regarding data retrieval • More automated imports using increasingly advanced converters • Automated data delivery (push and harvest) to: • Industry-specific search services (e.g. PubMed, Nordicom) • Documentary data collections (such as clinicaltrials.org) and national collections (such as DDF (DK), ForskDok (NO), etc.) • Temporary import objects • When imported data are not of sufficient quality to create valid objects • When data cannot be properly related to other objects upon import
