GEODE: Grid Enabled Occupational Data Environment
Paul Lambert and Larry Tan
University of Stirling
www.geode.stir.ac.uk
GEODE - NeSC workshop, Oct 2006

‘The Grid’ and New Technologies of Data Collection
‘The Grid’ and ‘eScience’:
• Online coordination of electronic resources and collaborations
• (Distributed computing)
• Large scale
• Collaborative
• Heterogeneous
• Standard protocols / information management systems
UK eSocial Science:
• Investment in assessing / implementing technology
• Computationally demanding data analysis
• Qualitative and quantitative data collection technologies
• **Data sharing, processing and access**

GEODE: Survey records’ occupational data
The importance of occupational micro-data
Collecting occupational data
• Initial occupational records (textual description)
• Processing occupational records:
• Text descriptions
• →(1) Standardised occupational index (e.g. unit group: OUG)
• →(2) Substantive occupational summary (e.g. social class code)
Good practice:
• Preservation of original, OUG and substantive variables
• NSIs favour transparent occupational data coding (1) and translation systems (2)

Occupational data collection and processing
(1) Text records → OUG data
Currently:
• Text coding software (e.g. CASCOT)
• Manual look-up
GEODE:
• Linkage to existing resources
• Further facilities possible but not planned (users typically have adequate resources)
(2) OUG data → summary indicators
Currently:
• Numerous aggregate occupational information resources
• Bespoke data programming requirements
GEODE:
• Core provision: management and access of these data resources
• Service to large volumes of users

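The two processing steps on this slide can be pictured as a pair of look-ups. The following is only a minimal sketch: the job titles, OUG codes, and class labels are invented placeholders (not entries from any real classification or from CASCOT), and the function names are hypothetical.

```python
# Step (1): text record -> OUG code; step (2): OUG code -> summary indicator.
# All titles, codes, and labels are made-up placeholders for illustration.

TEXT_TO_OUG = {
    "primary school teacher": "2301",
    "bus driver": "8201",
}

OUG_TO_CLASS = {
    "2301": "Class A",
    "8201": "Class B",
}


def code_text(description):
    """Step (1): manual look-up of a normalised text description."""
    key = " ".join(description.lower().split())
    return TEXT_TO_OUG.get(key)  # None when the text cannot be coded


def summarise(oug):
    """Step (2): translate an OUG code into a summary indicator."""
    return OUG_TO_CLASS.get(oug)


def process(description):
    """Keep the original text, the OUG, and the derived variable together."""
    oug = code_text(description)
    return {
        "text": description,
        "oug": oug,
        "summary": summarise(oug) if oug is not None else None,
    }


record = process("  Bus   Driver ")
print(record)
```

Returning all three values in one record mirrors the good-practice point about preserving the original text, the OUG, and the substantive variable.
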
Some illustrative occupational information resources

What’s the problem?
Indexed mainly by Occupational Unit Group (OUG). But…
• Numerous alternative occupational data files (time; country; format)
• Alternative OUG schemes; other index factors (‘employment status’)
• Inconsistent translations to social classifications – ‘by file or by fiat’
• Dynamic updates to occupational data resources
• Low uptake of existing occupational information resources
• Strict security constraints on users’ micro-social survey data

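One way to picture the indexing problem: a look-up keyed by OUG alone becomes ambiguous once several OUG schemes and extra index factors such as employment status are involved. The sketch below assumes a composite key of (scheme, OUG, employment status); the scheme names, codes, and scores are invented for illustration.

```python
# Hypothetical occupational information resource indexed by a composite key.
# Scheme names, codes, statuses, and scores are invented for illustration.

SCORES = {
    ("SCHEME_X", "1234", "employee"): 41.0,
    ("SCHEME_X", "1234", "self-employed"): 47.5,
    ("SCHEME_Y", "1234", "employee"): 63.2,  # same code, different scheme
}


def lookup(scheme, oug, status):
    """Return the score for a fully specified key, or raise KeyError."""
    try:
        return SCORES[(scheme, oug, status)]
    except KeyError:
        raise KeyError(
            f"no entry for scheme={scheme!r}, oug={oug!r}, status={status!r}"
        ) from None


def candidates(oug):
    """List every entry that shares an OUG code, exposing the ambiguity."""
    return sorted(key for key in SCORES if key[1] == oug)


print(lookup("SCHEME_X", "1234", "self-employed"))
print(candidates("1234"))
```

The same code "1234" resolves to three different entries here, which is why a translation that ignores the scheme or the employment status can silently return the wrong value.
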
GEODE: Grid Enabled Occupational Data Environment
Strategy:
1) Occupational data index service (depository)
• Semantic data curation (DDI)
• Data storage (OGSA-DAI)
• Data indexing / access (OGSA-DAI)
2) User-friendly ‘portal’ access
• Entry to an international virtual organisation for data depositors and users (GridSphere, GT4, OGSA-DAI)
• Facilitate linking occupational information to users’ datasets (OGSA-DAI)
(initial focus on CAMSIS resources)

Occupational information depository
1.1) Semantic curation of occupational information
• Establish a ‘GEODE-M’ meta-data subset (.xml)
• Founded on Michigan Data Documentation Initiative
• Minimise curation requirements
• Web proforma entry
• [via Portal using GridSphere]

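To illustrate how a short web proforma could produce a small .xml meta-data record, here is a minimal sketch. The element names and field values are hypothetical stand-ins: they are not taken from the DDI specification or from the actual GEODE-M subset.

```python
import xml.etree.ElementTree as ET

# Element names here are hypothetical stand-ins; they are not taken from
# the DDI specification or from the actual GEODE-M subset.


def build_record(title, oug_scheme, index_factors, output_measure):
    """Build a small metadata record from web-form style field values."""
    root = ET.Element("occupationalResource")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "ougScheme").text = oug_scheme
    factors = ET.SubElement(root, "indexFactors")
    for name in index_factors:
        ET.SubElement(factors, "factor").text = name
    ET.SubElement(root, "outputMeasure").text = output_measure
    return root


record = build_record(
    title="Example score file",
    oug_scheme="SCHEME_X",
    index_factors=["oug", "employment status"],
    output_measure="stratification score",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Round-trip: parse the text back and read a field.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("ougScheme"))
```

Keeping the record to a handful of fields reflects the aim of minimising what a depositor has to curate.
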
Technical Objectives
• Create a virtual community of occupational information researchers
• Gateway for occupational information
• Data abstraction
• Uniform access to resources
• Accessible via a portal
• Occupational data curation
• Annotation of data using DDI
• Occupational matching services
• e.g. Linking surveyed data to CAMSIS scores

GEODE - Architecture
• VO members can deploy own data services, also occupational matching services
• Scalable
• Distributed
• Possible application for other types of social science data
• Annotation with DDI
• Custom services can be deployed

GEODE - Prototype
• Simple occupational matching services
• VO of Occupational Data Resources
• Portal for searching external resources

GEODE - Prototype

GEODE - Prototype
• Windows environment
• Java
• GridSphere Portal Framework
• Globus Toolkit 4
• Index Service (Virtual Organization)
• OGSA-DAI WSRF (Data Access Middleware)
• Custom OGSA-DAI resources and activities
• Accesses CSV, Relational data resources

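The idea of reaching CSV and relational resources through one interface can be sketched in plain Python. This is an illustration of the data-abstraction principle only: it does not use or imitate the OGSA-DAI API, and the class names, column names, and values are invented.

```python
import csv
import io
import sqlite3

# Plain-Python illustration of uniform access; it does not use or imitate
# the OGSA-DAI API. Column names and values are invented.


class CsvResource:
    def __init__(self, text):
        self._text = text

    def rows(self):
        """Yield each record as a dict keyed by column name."""
        yield from csv.DictReader(io.StringIO(self._text))


class SqlResource:
    def __init__(self, connection, table):
        self._conn = connection
        self._table = table  # trusted, fixed table name in this sketch

    def rows(self):
        """Yield each table row as a dict keyed by column name."""
        cursor = self._conn.execute(f"SELECT * FROM {self._table}")
        names = [col[0] for col in cursor.description]
        for values in cursor:
            yield dict(zip(names, values))


def find_by_oug(resource, oug):
    """The same query code works for either kind of resource."""
    return [row for row in resource.rows() if str(row["oug"]) == oug]


csv_res = CsvResource("oug,score\n1234,41.0\n5678,55.5\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (oug TEXT, score REAL)")
conn.execute("INSERT INTO scores VALUES ('1234', 63.2)")
sql_res = SqlResource(conn, "scores")

print(find_by_oug(csv_res, "1234"))
print(find_by_oug(sql_res, "1234"))
```

Both resources expose a single `rows()` method, so the caller never needs to know which storage format sits behind it.
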
GEODE - Prototype
• Data Documentation Initiative
• Annotates the data resources
• Occupational Matching Grid Services
• Checks if the DDI of the target resource is compatible (e.g. category specified matches requirement)
• Maps occupational unit group to data
• Returns mapped/matched results
• Demonstration of prototype

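The matching steps listed on this slide (compatibility check, mapping, returning results) might look roughly like the sketch below. It is not the prototype's code: the metadata dict merely stands in for a DDI annotation, and its keys, the scheme names, and the scores are all invented.

```python
# Hypothetical matching routine. The metadata dict stands in for a DDI
# annotation; its keys, the scheme names, and the scores are invented.

RESOURCE = {
    "metadata": {"oug_scheme": "SCHEME_X", "measure": "score"},
    "data": {"1234": 41.0, "5678": 55.5},
}


def is_compatible(metadata, required_scheme):
    """Check that the resource is indexed by the scheme the user's data uses."""
    return metadata.get("oug_scheme") == required_scheme


def match(records, resource, required_scheme):
    """Attach the resource's value to each record; None marks no match."""
    if not is_compatible(resource["metadata"], required_scheme):
        raise ValueError("resource is not indexed by " + required_scheme)
    measure = resource["metadata"]["measure"]
    return [
        {**rec, measure: resource["data"].get(rec["oug"])}
        for rec in records
    ]


survey = [{"id": 1, "oug": "1234"}, {"id": 2, "oug": "9999"}]
matched = match(survey, RESOURCE, "SCHEME_X")
print(matched)
```

Unmatched codes are returned with `None` rather than dropped, so the caller can see which survey records could not be linked.
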
Future Work
• Possible extension of VO to other social science related datasets
• With services
• Variety of occupational data analysis services