ODaF Europe 2009
Virtual Research and Collaborative Center
Pascal Heus, Open Data Foundation
Tim Mulcahy, National Opinion Research Center
http://www.opendatafoundation.org
info@opendatafoundation.org
Background
• Demand for socio-economic data has grown dramatically in the past decade
  • Connectivity / network speed
  • Globalization / economic crisis
• Access to microdata has improved
  • Better archiving / preservation
  • Adoption of metadata standards such as DDI and related practices
• But many challenges remain:
  • Discovery and access remain an issue (lack of visibility)
  • Usability: documentation is still an issue, and the complexity of datasets is a barrier
  • No community knowledge
  • Datasets are still typically made available through simple / static web-based interfaces
  • There is a lack of researcher tools that leverage metadata
Putting some ideas together…
• Internet technologies
  • Community-driven virtual spaces are now very common
  • Social networking is widely accepted
  • User-driven knowledge management works (for large groups)
• Social science
  • A large number of public datasets are available
  • Surveys can now easily be documented using the Data Documentation Initiative (DDI)
  • Metadata-related XML technologies can significantly automate tasks and maintain linkages across the life cycle
• Researcher
  • User needs differ from the producer’s: researchers have a custom view of the data (their project)
  • Outputs should be preserved / captured / shared (not limited to a paper)
  • Need a community space to foster dialog / share knowledge (within and outside research projects)
A Virtual Research and Collaborative Center
• Go beyond the static web site to provide dynamic, virtual research within a collaborative environment
• Leverage Internet / XML technologies and metadata standards
• Provide virtual access to public use data (global)
  • Web-based remote access: for discovery, analysis, publication
  • Enhanced analytical tools: data and documentation customization
  • Advanced collaboration, communication and dissemination tools: community knowledge capture, collaboration, social networking, information sharing/reuse
• Approach
  • New tools based on DDI metadata and related standards
  • Leverage Web 2.0 technologies
  • Provide a research-oriented environment
  • Build upon open source solutions
Home: Welcome, background information, contact, simple access to public data and documentation
Researcher Services
• My Datasets: Create a custom view of the data for use in a project or for sharing with the community.
• My Projects: Bring together researchers in a virtual environment to share research ideas, data, documentation, and scripts.
• My Publications: Package research outputs (papers, documents, scripts/programs, secondary data) for preservation, dissemination and sharing.
• My Profile: Provide individual background information and research interests, set privacy options and configure notification services.
Collaborative Space
• Wiki: Capture knowledge surrounding the data. Initial content will be seeded with survey metadata.
• Library: Searchable libraries of papers/references/documentation, scripts/programs, primary and secondary data. Most of the content is extracted automatically from the research space.
• Communication: Events and news, community-driven discussion groups, FAQ/Answers, chat.
• Services: Researcher Directory, Project Directory, Call for collaboration, Notification, Support, Training.
Infrastructure: Primary and researcher data and metadata storage, databases, security (access, backups), web services
Admin Services: System and data usage reports, data/metadata management, user administration, etc.
General features
• Everything is publicly available (read only)
• Registered users can manage research projects and contribute content
• Registration will likely be based on OpenID (no need to create a new account)
• Users will optionally provide (with privacy control):
  • Demographics: name, nickname, email, social networks
  • Affiliations: institutions, memberships
  • Academic background
  • Research interests
Analytical Tool: My Datasets
• Researchers rarely use the full set of variables available in a single survey
• Instead, they derive a “virtual” dataset from one or more data sources
• The description of a virtual dataset can be captured using DDI-like metadata
• Scripts to generate that particular view can then be created automatically for various statistical packages
• Benefits
  • Hides the complexity of merging, filtering, recoding files
  • Independent of statistical package
  • Customized documentation can be produced dynamically
  • Virtual datasets can be versioned, shared with others, refreshed with new data, etc.
  • This also provides valuable usage information to data providers
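The idea of generating package-specific scripts from one description of a virtual dataset can be sketched as follows. This is only an illustration under stated assumptions: the `VirtualDataset` class, its fields, the file names and the variable names are all hypothetical and are not taken from the project or from the DDI specification.

```python
from dataclasses import dataclass


@dataclass
class VirtualDataset:
    """Hypothetical, DDI-like description of a researcher's view of the data."""
    name: str              # name of the derived (virtual) dataset
    source_file: str       # primary data file, without extension (assumed layout)
    variables: list[str]   # subset of variables to keep
    row_filter: str = ""   # optional condition such as "age >= 18"


def to_stata(vd: VirtualDataset) -> str:
    """Render the view as a Stata do-file."""
    lines = [f'use "{vd.source_file}.dta", clear']
    if vd.row_filter:
        lines.append(f"keep if {vd.row_filter}")
    lines.append("keep " + " ".join(vd.variables))
    lines.append(f'save "{vd.name}.dta", replace')
    return "\n".join(lines)


def to_r(vd: VirtualDataset) -> str:
    """Render the same view as an R script."""
    cols = ", ".join(f'"{v}"' for v in vd.variables)
    lines = [f'data <- readRDS("{vd.source_file}.rds")']
    if vd.row_filter:
        lines.append(f"data <- subset(data, {vd.row_filter})")
    lines.append(f"data <- data[, c({cols})]")
    lines.append(f'saveRDS(data, "{vd.name}.rds")')
    return "\n".join(lines)


# One description, several target packages.
GENERATORS = {"stata": to_stata, "r": to_r}


def generate(vd: VirtualDataset, package: str) -> str:
    """Produce the script for the requested statistical package."""
    return GENERATORS[package](vd)
```

Because the description is independent of any statistical package, supporting another package would only require adding one more generator function to `GENERATORS`.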
Analytical Tools: My Projects
• Provide a virtual space for the research team
• Brings together virtual datasets, documents, scripts, outputs, collaborative tools
• Primary Investigator can bring in collaborators
• Knowledge exchange tools: blog, IM, optional wiki
• File sharing tools:
  • Documents: referenced, research, outputs
  • Citations: within and outside the project
  • Scripts: shared research processes
  • Secondary data: microdata and aggregates
• Files can be marked for preservation / dissemination (see My Publications)
• Can draw from community libraries
• The project description contains topics that provide valuable metadata for usage and collaboration
Dissemination Tools: My Publications
• Typically, research output is a PDF
  • This is insufficient to meet Gary King’s Replication Standard
  • Leads to poor preservation and reuse
• Need a tool to package outputs as enhanced publications
  • For preservation: contains everything that needs to be archived (from My Projects)
  • For dissemination: contains all necessary information to reproduce the research process (not just the paper)
• Files in projects can be marked for archiving and/or dissemination
  • Extra metadata can be provided for each file (Dublin Core citation, etc.)
  • Archived files will be stored for several years
  • The dissemination package will be made available on the web
• Research paper
  • Can be circulated for peer review
  • Will be shared with the community, can be automatically sent to libraries, citation repositories, integrated into printed publications, etc.
• Scripts can be automatically tagged with header, author, etc.
• Data can be marked as intermediate, final, public, etc.
• Public usage, comments, ratings will be reported to the PI
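As a rough sketch of the packaging step described above, the snippet below selects the project files marked for archiving or dissemination and writes a manifest that carries their Dublin Core style metadata. The `ProjectFile` class, the manifest layout and the file names are assumptions made for illustration; the slides do not specify a package format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ProjectFile:
    """Hypothetical record for a file held in a research project."""
    path: str
    dc: dict = field(default_factory=dict)  # Dublin Core style fields, e.g. title, creator
    archive: bool = False                   # marked for preservation
    disseminate: bool = False               # marked for public dissemination


def build_manifest(files: list[ProjectFile], purpose: str) -> str:
    """Return a JSON manifest listing the files marked for the given purpose."""
    if purpose not in ("archive", "disseminate"):
        raise ValueError(f"unknown purpose: {purpose}")
    selected = [f for f in files if getattr(f, purpose)]
    manifest = {
        "purpose": purpose,
        "files": [{"path": f.path, "dc": f.dc} for f in selected],
    }
    return json.dumps(manifest, indent=2)
```

A file can carry both flags, so the same project can yield a complete archival package and a smaller public dissemination package.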
Discovery Tools: My Profile
• Looking for data or documents takes significant effort for researchers
• A metadata-driven system can greatly alleviate this by bringing the information to the user (rather than the other way around)
• The researcher profile will provide various subscription and notification tools based on research interests
• Examples:
  • A document becomes available on a specific topic or from a particular author/group
  • New or updated data becomes available on a specific topic
  • A new research paper is published using a specific dataset
  • A research project is looking for collaborators or reviewers
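The subscription examples above amount to matching incoming events against the interests stored in researcher profiles. A minimal sketch, assuming hypothetical `Subscription` and `Event` records (none of these names or fields come from the project), could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical notification rule taken from a researcher profile."""
    researcher: str
    kind: str          # "document", "data", "paper" or "project"
    topic: str = ""    # if set, the event must carry this topic
    dataset: str = ""  # if set, the event must concern this dataset


@dataclass(frozen=True)
class Event:
    """Hypothetical event raised when new content enters the center."""
    kind: str
    title: str
    topics: frozenset = frozenset()
    dataset: str = ""


def matches(sub: Subscription, event: Event) -> bool:
    """True when every criterion set on the subscription is met by the event."""
    if sub.kind != event.kind:
        return False
    if sub.topic and sub.topic not in event.topics:
        return False
    if sub.dataset and sub.dataset != event.dataset:
        return False
    return True


def recipients(subs: list[Subscription], event: Event) -> list[str]:
    """Researchers to notify about the event, without duplicates."""
    return sorted({s.researcher for s in subs if matches(s, event)})
```

Empty criteria act as wildcards here, so a subscription with only `kind="paper"` and a dataset name is notified of every new paper that uses that dataset.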
Collaborative: Catalogs
• The center’s community space will contain several catalogs, libraries, directories
• Content will be derived automatically from research projects or contributed by users/providers
  • Data catalog: simple and complex search for datasets / variables based on survey, time, geography, topics, etc.
  • Document library: searchable collections of research papers, survey documentation, references/methodologies, etc.
  • Script library: statistical programs shared by projects/users, searchable by dataset, language, etc.
  • Researcher directory: look up other researchers by interest, profile, expertise, etc.
  • Project directory: completed, ongoing and future research projects. Also a place to advertise research opportunities
Collaborative: Tools
• Wiki: classic community-driven knowledge capture
  • Some of the content will be seeded automatically from DDI metadata to create pages per survey, file, variable, etc.
• Classic tools: FAQ, news, events/calendar, chat, discussion forums
• Collaborative tagging:
  • Folksonomies to capture researcher perspective/feedback at the survey, dataset, variable level
  • Rating/comments on papers, datasets, etc.
• And likely more…
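Seeding wiki pages from survey metadata could work roughly as sketched below. The XML sample is a simplified, namespace-free fragment that only borrows a few element names in the style of a DDI codebook (`var`, `labl`, `titl`); real DDI instances are namespaced and far richer, and the page naming scheme is an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified, DDI-like sample (assumed structure, no namespaces).
SAMPLE = """<codeBook>
  <stdyDscr><citation><titlStmt><titl>Example Survey 2008</titl></titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="age"><labl>Age of respondent</labl></var>
    <var name="educ"><labl>Highest year of school completed</labl></var>
  </dataDscr>
</codeBook>"""


def seed_pages(xml_text: str) -> dict[str, str]:
    """Build initial wiki pages: one per survey and one per variable."""
    root = ET.fromstring(xml_text)
    title = root.findtext("stdyDscr/citation/titlStmt/titl", default="Untitled survey")
    pages = {title: f"Survey: {title}\n(Seeded from metadata; please edit.)"}
    for var in root.iter("var"):
        name = var.get("name")
        label = var.findtext("labl", default="")
        # Variable pages are keyed under the survey title (hypothetical scheme).
        pages[f"{title}/{name}"] = f"Variable: {name}\nLabel: {label}"
    return pages
```

The community would then edit these seeded pages, adding the knowledge that the formal metadata does not capture.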
Administration
• Various management tools will be implemented
• Reporting
  • User demographics
  • Data usage: most used variables, popular research topics, quality feedback, etc.
  • System usage: hits/visits, number of active projects, new papers, secondary datasets, etc.
• Management
  • Data / metadata maintenance
  • User/group management
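A data usage report such as "most used variables" could be derived directly from the virtual datasets that researchers define. The sketch below assumes usage is available as a mapping from a virtual dataset name to its list of variables, which is a hypothetical structure chosen for illustration.

```python
from collections import Counter


def variable_usage(virtual_datasets: dict[str, list[str]]) -> Counter:
    """Count in how many virtual datasets each variable appears."""
    counts = Counter()
    for variables in virtual_datasets.values():
        # set() ensures a variable is counted once per virtual dataset.
        counts.update(set(variables))
    return counts
```

Calling `variable_usage(...).most_common(10)` would then give a ranked list for the report.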
Implementation strategy
• Based on metadata standards
• Build as an open source product (and leverage OSS)
• Web service based architecture
• Virtual / cloud server environment to ensure scalability (processing and storage)
• Modular system to allow for incremental development
• Build upon other ongoing initiatives
• Not only a technological challenge: organizational / legal issues also need to be addressed
Status / Next steps
• Project at initial stage (concept note)
• Partnership between NORC, ODaF and other agencies
• Will likely start at NORC using the General Social Survey (GSS) and possibly other public use files
• In discussion with other producers
• Prototype planned for Q4 2009
• Other options being considered:
  • Use for non-public datasets
  • Add harmonization/comparability features
  • Extend functionality to aggregate data (SDMX)
  • Link to geography (ISO 19115 and others)
  • Integrate a statistical engine
  • Integrate disclosure control features
Conclusion
• Proposal to build innovative tools that provide a dynamic environment for research on survey microdata
  • Based on metadata and open technology standards to ensure a generic solution
  • Promotes sharing and reuse
  • Facilitates preservation and dissemination of research outputs
  • Fosters collaboration and supports a community-driven knowledge base
  • Provides a better understanding of how the data are used
• For further information, contact
  • Tim Mulcahy, National Opinion Research Center (NORC), Mulcahy-Tim@norc.org
  • Pascal Heus, Open Data Foundation (ODaF), pheus@opendatafoundation.org
XML metadata specifications for socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
  • Macrodata, time series, indicators, registries
  • http://www.sdmx.org
• Data Documentation Initiative (DDI)
  • Microdata (surveys, studies)
  • http://www.ddialliance.org
• ISO 11179
  • Semantic modeling, concepts, registries
  • http://metadata-standards.org/11179/
• ISO 19115
  • Geography
  • http://www.isotc211.org/
• Dublin Core
  • Resources (documentation, images, multimedia)
  • http://www.dublincore.org
The Data Documentation Initiative (DDI)
• International XML-based specification for the documentation of social and behavioral data
• Started in 1995, now driven by the DDI Alliance (30+ members)
• Became an XML specification in 2000 (v1.0)
• Current version is 2.1, with a focus on archiving (survey/codebook)
• New version 3.0 (2008)
  • Focus on the entire survey “Life Cycle”
  • Provides comprehensive metadata on the entire survey process and usage
  • Aligned with other metadata standards (DC, MARC, ISO 11179, SDMX, …)
  • Includes machine-actionable elements to facilitate processing, discovery and analysis
• http://www.ddialliance.org