Virtual Center for Collaborative Research (ViCtoR). IASSIST 2010 – Session D3: Virtual Research Environments. Pascal Heus, Metadata Technology North America, pascal.heus@metadatatechnology.com, http://www.metadatatechnology.com
Background
• Demand for socio-economic data has grown dramatically in the past decade
  • Connectivity / network speed
  • Globalization / economic crisis
• Access to microdata has improved
  • Better archiving / preservation
  • Adoption of metadata standards such as DDI and related practices
• But many challenges remain:
  • Discovery and access remain significant issues (lack of visibility)
  • Usability: documentation is still a problem, and the complexity of datasets is a barrier
  • Little or no community knowledge is typically available
  • Datasets are still typically delivered through simple, static web-based interfaces
  • There is a lack of researcher tools that leverage metadata
Putting some ideas together…
• Internet technologies
  • Community-driven virtual spaces are now very common
  • Social networking is widely accepted
  • User-driven knowledge management works (for large groups)
• Social science
  • A large number of public datasets is available
  • Surveys can now easily be documented using the Data Documentation Initiative (DDI)
  • Metadata-related XML technologies can significantly automate tasks and maintain linkages across the life cycle
• Researcher
  • User needs differ from the producer’s: researchers have a custom view of the data (their project)
  • Outputs should be preserved / captured / shared (not limited to a paper)
  • A community space is needed to foster dialog and share knowledge (within and outside research projects)
A Virtual Research and Collaborative Center
• Go beyond the static web site to provide dynamic, virtual research within a collaborative environment
• Leverage Internet / XML technologies and metadata standards
• Provide virtual access to public use data (global)
  • Web-based remote access: for discovery, analysis, publication
  • Enhanced analytical tools: data and documentation customization
  • Advanced collaboration, communication and dissemination tools: community knowledge capture, collaboration, social networking, information sharing / reuse
• Approach
  • New tools based on DDI metadata and related standards
  • Leverage Web 2.0 technologies
  • Provide a research-oriented environment
  • Build upon open source solutions
Home: Welcome, background information, contact, simple access to public data and documentation
Researcher Services
• My Datasets: Create a custom view of the data for use in a project or for sharing with the community
• My Projects: Bring together researchers in a virtual environment to share research ideas, data, documentation, and scripts
• My Publications: Package research outputs (papers, documents, scripts/programs, secondary data) for preservation, dissemination and sharing
• My Profile: Provide individual background information and research interests, set privacy options, and configure notification services
Collaborative Space
• Wiki: Capture knowledge surrounding the data. Initial content will be seeded with survey metadata
• Library: Searchable libraries of papers/references/documentation, scripts/programs, primary and secondary data. Most of the content is extracted automatically from the research space
• Communication: Events and news, community-driven discussion groups, FAQ/Answers, chat
• Services: Researcher Directory, Project Directory, Call for collaboration, Notification, Support, Training
Infrastructure: Primary and researcher data and metadata storage, databases, security (access, backups), web services
Admin Services: System and data usage reports, data/metadata management, user administration, etc.
General features
• Everything is publicly available (read only)
• Registered users can manage research projects and contribute to the content
• Users can optionally provide (with privacy control):
  • Demographics: name, nickname, email, social networks
  • Affiliations: institutions, memberships
  • Academic background
  • Research interests
Analytical Tool: My Datasets
• Researchers rarely use the full set of variables available in a single survey
• Instead they derive a “virtual” dataset from one or more data sources
• The description of a virtual dataset can be captured using DDI-like metadata
• Scripts to generate that particular view can then be created automatically for various statistical packages
• Benefits
  • Hides the complexity of merging, filtering, recoding files
  • Independent of statistical package
  • Customized documentation can be produced dynamically
  • Virtual datasets can be versioned, shared with others, refreshed with new data, etc.
  • This also provides valuable usage information to the data provider
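A minimal sketch of how one virtual dataset description might drive script generation for two statistical packages. The `VirtualDataset` class, its fields, the file name and the two generator functions are hypothetical illustrations, not part of ViCtoR or DDI. The row filter is assumed to use syntax that is valid in both Stata and R and to reference only selected variables.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VirtualDataset:
    """Hypothetical package-independent description of a custom data view."""
    name: str
    source_file: str        # assumed path to a Stata-format source file
    variables: List[str]    # variables the researcher wants to keep
    row_filter: str = ""    # optional filter, e.g. "age >= 18"


def to_stata(vd: VirtualDataset) -> str:
    """Render the view as a Stata do-file fragment."""
    lines = [
        f"* Virtual dataset: {vd.name}",
        f'use {" ".join(vd.variables)} using "{vd.source_file}", clear',
    ]
    if vd.row_filter:
        lines.append(f"keep if {vd.row_filter}")
    return "\n".join(lines)


def to_r(vd: VirtualDataset) -> str:
    """Render the same view as an R script fragment (reads via the haven package)."""
    cols = ", ".join(f'"{v}"' for v in vd.variables)
    lines = [
        f"# Virtual dataset: {vd.name}",
        f'df <- haven::read_dta("{vd.source_file}")',
        f"df <- df[, c({cols})]",
    ]
    if vd.row_filter:
        lines.append(f"df <- subset(df, {vd.row_filter})")
    return "\n".join(lines)


# Hypothetical example view: adults only, three variables.
example = VirtualDataset(
    name="adults_income",
    source_file="gss_extract.dta",
    variables=["age", "sex", "income"],
    row_filter="age >= 18",
)

if __name__ == "__main__":
    print(to_stata(example))
    print(to_r(example))
```

Because the description is held separately from any package syntax, supporting another package would only require one more rendering function.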
Analytical Tools: My Projects
• Provide a virtual space for the research team
• Brings together virtual datasets, documents, scripts, outputs, collaborative tools
• The Primary Investigator can bring in collaborators
• Knowledge exchange tools: blog, IM, optional wiki
• File sharing tools:
  • Documents: referenced, research, outputs
  • Citations: within and outside the project
  • Scripts: shared research processes
  • Secondary data: microdata and aggregates
  • Can be marked for preservation / dissemination (see My Publications)
  • Can draw from community libraries
• The project description contains topics that provide valuable metadata for usage and collaboration
Dissemination Tools: My Publications
• Typically, research output is a PDF
  • This is insufficient to meet Gary King’s Replication Standard
  • Leads to poor preservation and reuse
• A tool is needed to package outputs as enhanced publications
  • For preservation: contains everything that needs to be archived (from My Projects)
  • For dissemination: contains all the information necessary to reproduce the research process (not just the paper)
• Files in projects can be marked for archiving and/or dissemination
  • Extra metadata can be provided for each file (Dublin Core citation, etc.)
  • Archived files will be stored for several years
  • The dissemination package will be made available on the web
• Research paper
  • Can be circulated for peer review
  • Will be shared with the community, and can be automatically sent to libraries, citation repositories, integrated into printed publications, etc.
• Scripts can be automatically tagged with header, author, etc.
• Data can be marked as intermediate, final, public, etc.
• Public usage, comments, ratings will be reported to the PI
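The marking and tagging steps above could be sketched as follows. This is only an illustration under stated assumptions: the `ProjectFile` class, its flags and the helper functions are hypothetical, and the manifest uses a handful of Dublin Core element names (`dc:title`, `dc:creator`, `dc:type`, `dc:identifier`) with DCMI type terms rather than any format ViCtoR actually defines.

```python
from dataclasses import dataclass
from typing import Dict, List

# Mapping from a hypothetical file kind to a DCMI Type vocabulary term.
DCMI_TYPE = {"paper": "Text", "script": "Software", "data": "Dataset"}


@dataclass
class ProjectFile:
    """Hypothetical record for a file held in a research project."""
    path: str
    title: str
    creator: str
    kind: str                  # "paper", "script" or "data"
    archive: bool = False      # marked for preservation
    disseminate: bool = False  # marked for public dissemination


def dublin_core(f: ProjectFile) -> Dict[str, str]:
    """Describe one file with a minimal set of Dublin Core elements."""
    return {
        "dc:title": f.title,
        "dc:creator": f.creator,
        "dc:type": DCMI_TYPE[f.kind],
        "dc:identifier": f.path,
    }


def build_manifest(files: List[ProjectFile], purpose: str) -> List[Dict[str, str]]:
    """List the files marked for the given purpose: 'archive' or 'disseminate'."""
    if purpose not in ("archive", "disseminate"):
        raise ValueError("purpose must be 'archive' or 'disseminate'")
    return [dublin_core(f) for f in files if getattr(f, purpose)]


def tag_script(source: str, f: ProjectFile, comment: str = "#") -> str:
    """Prepend a header comment carrying the title and author to a script."""
    header = f"{comment} Title: {f.title}\n{comment} Author: {f.creator}\n"
    return header + source
```

Keeping the archive and dissemination flags separate lets the same project produce a complete preservation package and a smaller public one.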
Discovery Tools: My Profile
• Looking for data or documents is a significant effort for researchers
• A metadata-driven system can greatly alleviate this by bringing the information to the user (rather than the other way around)
• The researcher profile will provide various subscription and notification tools based on research interests
• Examples:
  • A document becomes available on a specific topic or from a particular author/group
  • New or updated data becomes available on a specific topic
  • A new research paper is published using a specific dataset
  • A research project is looking for collaborators or reviewers
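The subscription idea can be illustrated with a small matching routine. This is a sketch under assumptions: the `Subscription` and `Event` classes, their fields and the sample names are hypothetical, and a real system would match against richer metadata than plain sets of strings.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subscription:
    """Hypothetical notification preferences taken from a researcher profile."""
    researcher: str
    topics: Set[str] = field(default_factory=set)
    authors: Set[str] = field(default_factory=set)
    datasets: Set[str] = field(default_factory=set)


@dataclass
class Event:
    """Hypothetical description of something newly available in the center."""
    kind: str                  # e.g. "document", "data", "paper", "call"
    title: str
    topics: Set[str] = field(default_factory=set)
    author: str = ""
    dataset: str = ""


def matches(sub: Subscription, ev: Event) -> bool:
    """True if the event shares a topic, author or dataset with the subscription."""
    return (
        bool(sub.topics & ev.topics)
        or (ev.author != "" and ev.author in sub.authors)
        or (ev.dataset != "" and ev.dataset in sub.datasets)
    )


def recipients(subs: List[Subscription], ev: Event) -> List[str]:
    """Names of the researchers who should be notified about the event."""
    return [s.researcher for s in subs if matches(s, ev)]


if __name__ == "__main__":
    subs = [
        Subscription("alice", topics={"health"}),
        Subscription("bob", datasets={"GSS"}),
    ]
    new_paper = Event(kind="paper", title="Work and wages", topics={"labor"}, dataset="GSS")
    print(recipients(subs, new_paper))
```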
Collaborative: Catalogs
• The center’s community space will contain several catalogs, libraries, directories
• Content will be derived automatically from research projects or contributed by users/providers
• Data catalog: simple and complex search for datasets / variables based on survey, time, geography, topics, etc.
• Document library: searchable collections of research papers, survey documentation, references/methodologies, etc.
• Script library: statistical programs shared by projects/users, searchable by dataset, language, etc.
• Researcher directory: look up other researchers by interest, profile, expertise, etc.
• Project directory: completed, ongoing and future research projects. Also a place to advertise research opportunities
Collaborative: Tools
• Wiki: classic community-driven knowledge capture
  • Some of the content will be seeded automatically from DDI metadata to create pages per survey, file, variable, etc.
• Classic tools: FAQ, news, events/calendar, chat, discussion forums
• Collaborative tagging:
  • Folksonomies to capture researcher perspective/feedback at the survey, dataset, variable level
  • Ratings/comments on papers, datasets, etc.
• Google Wave
• And likely more…
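Seeding wiki pages from survey metadata might look like the sketch below. It assumes a simplified, namespace-free fragment that borrows the `codeBook`, `dataDscr`, `var` and `labl` element names from DDI Codebook; real DDI documents are namespaced and far richer. The page title scheme and placeholder text are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Simplified, namespace-free sample loosely modelled on DDI Codebook (assumption).
SAMPLE_METADATA = """<codeBook>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent</labl></var>
    <var name="SEX"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>"""


def seed_variable_pages(xml_text: str) -> Dict[str, str]:
    """Return one starter wiki page (title -> body) per variable in the metadata."""
    root = ET.fromstring(xml_text)
    pages = {}
    for var in root.iter("var"):
        name = var.get("name")
        if not name:
            continue  # skip variables without a usable name
        label = (var.findtext("labl") or "").strip()
        body = f"{label}\n\nSeeded from survey metadata. Add usage notes below."
        pages[f"Variable:{name}"] = body
    return pages


if __name__ == "__main__":
    for title, body in seed_variable_pages(SAMPLE_METADATA).items():
        print(title)
        print(body)
```

The same loop could be repeated at the survey and file level to create the other page types mentioned above.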
Administration
• Various management tools will be implemented
• Reporting
  • User demographics
  • Data usage: most used variables, popular research topics, quality feedback, etc.
  • System usage: hits/visits, number of active projects, new papers, secondary datasets, etc.
• Management
  • Data / metadata maintenance
  • User / group management
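As an illustration of the data usage report, the sketch below counts how many virtual datasets reference each variable. The input shape (a mapping from a virtual dataset name to its variable list) and the function name are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_used_variables(
    virtual_datasets: Dict[str, List[str]], top: int = 3
) -> List[Tuple[str, int]]:
    """Rank variables by the number of virtual datasets that include them.

    Ties are broken alphabetically so the report is deterministic.
    """
    counts = Counter()
    for variables in virtual_datasets.values():
        counts.update(set(variables))  # count each variable once per dataset
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    # Hypothetical usage data: three researchers' virtual datasets.
    usage = {
        "adults_income": ["age", "sex", "income"],
        "education_gap": ["age", "educ", "sex"],
        "aging_study": ["age", "health"],
    }
    print(most_used_variables(usage, top=2))
```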
Implementation strategy
• Based on metadata standards
• Leverage open source software (OSS)
• Web-service-based architecture
• Virtual / cloud server environment to ensure scalability (processing and storage)
• Modular system to allow for incremental development
• Build upon other ongoing initiatives
• Not only a technological challenge: organizational / legal issues also need to be addressed
Status / Next steps
• Partnership: NORC, Metadata Technology, ODaF
• Initial version will use the General Social Survey (GSS) and likely public-use files (PUFs) from NORC Data Enclave datasets
  • Allows comparison between PUF and Data Enclave (DE) versions
• Project transitioned from concept note to prototype at the end of 2009
• Technologies: Google Web Toolkit, J2EE, Spring, SQL, BaseX (native XML DB), Tomcat
• Development currently on hold…
• Other features being considered
  • Use for non-public datasets
  • Harmonization / comparability features
  • Extending functionality to aggregate data (SDMX)
  • Links to geography (ISO 19115 and others)
  • Integrated statistical engine
  • Integrated disclosure control features
Conclusion
• Work in progress to build an innovative platform supporting a dynamic environment for research on survey microdata
• Based on metadata and open technology standards to ensure a generic solution
• Promotes sharing and reuse
• Facilitates preservation and dissemination of research outputs
• Fosters collaboration and supports a community-driven knowledge base
• Provides a better understanding of how the data is used
• For further information, contact
  • Tim Mulcahy, National Opinion Research Center (NORC), Mulcahy-Tim@norc.org
  • Pascal Heus, Metadata Technology (pascal.heus@opendatafoundation.org)