Data Curation Profiles Workshop (Sort of, not really) Jake Carlson Data Services Specialist Workshop hash tag: #dcptoolkit
Outline • What are Data Curation Profiles? • Preparation • Interviewing • Building a Profile
The Data Curation Profile Toolkit is a means to determine: • Information about a particular data set and its lifecycle. • What a researcher is doing to manage / curate the data set. • What a researcher would like to do with the data.
Characteristics of the DCP • Tells “the story” of the data • Focused on a specific data set – provides depth not breadth • Interview based • Meant to be “discipline neutral” and widely applicable to different types of data • Modular – allows for flexibility and tailoring to specific situations and uses
Characteristics of the DCP • Represents the researcher’s needs and perspectives • A concise, structured document suitable for sharing and annotation. • A resource for Librarians, Archivists, IT Professionals, Data Managers, and others.
DCP Sections • Information about a Data Set and its Context • Overview of the Research • Focus • Intended Audience • Funding • Data Kinds and Stages • Data Narrative (data lifecycle) • Target Data for Sharing • Use/re-use Value • Contextual Narrative
What is a data set? • The data collected and analyzed for a specific project or problem • Primary = data generated, analyzed to achieve project results • Ancillary = additional data which further adds to project
Define curation • Curation is the activity of managing and promoting the use of data, starting from the point of creation, to ensure its fitness for contemporary purposes and availability for discovery and re-use. • Archiving is a curation activity which ensures that data is properly selected and stored, can be easily accessed and that its logical and physical integrity is maintained over time. • Preservation is an archiving activity in which specific items of data are maintained over time so that they can still be accessed and understood through succession and obsolescence of technologies. Lord, Macdonald, Lyon & Giaretta (2004) "From data deluge to data curation." Proceedings of the UK e-Science All Hands Meeting 2004, 31st August - 3rd September, Nottingham UK.
The “story” of a research data set • “Data from both studies in this project consist primarily of field data and plant samples. Variables gathered include the yield and overall health of the plot, the physical characteristics of the plant sample, and the amount of selected nutrients present in the sample.” • “The scientist studies real-time traffic signal performance measures in which he measures the movement of traffic, specifically the number of vehicles passing through an intersection and the amount of time they spend at an intersection on a movement-by-movement basis over a 24 hour period.”
Understanding data lifecycle • Researchers talk about their data in different phases, stages or levels • Helpful to understand distinctions because • It is often how they identify their work • Helps in talking about process/methods • Sometimes differences determine what someone is willing/able to share • Only some data may be curated
Data lifecycle General data lifecycle indicating stages or levels of data Humphrey, Charles. “e-Science and the Life Cycle of Research” (2006) Retrieved 4/20/10: http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
More DCP Sections • Information about Needs • Tools • Interoperability • Measuring Impact • Data Management • Preservation • Intellectual Property • Organization and description of data • Ingest • Access • Discovery
The DCP Toolkit The Data Curation Profile Toolkit consists of 4 components: • User Guide • Interviewer’s Manual • Interview Worksheet • DCP Template Photo from: http://www.flickr.com/photos/neilt/2517652/sizes/m/in/photostream/
The User Guide • Describes the rationale for the DCP • Describes the process through which a DCP is generated • Stage 1 – Preparation • Stage 2 – Worksheet & Interviews • Stage 3 – Constructing the Profile • Provides guidance & advice
Interview Worksheet and Manual • Meant to be used in tandem • The Interview Worksheet is given to the researcher to fill out. • The Interviewer’s Manual contains follow up questions for the interviewer to ask once the researcher has filled out a module.
Interviewing • Using the Interview Manual & Worksheet • Read any introductory statement listed in the “Interviewer’s Manual” (if any) • Then have the researcher complete the list of questions for the module in the “Interview Worksheet”. • Review the responses and ask the questions listed in the “Interviewer’s Manual” as appropriate. • Ask any follow up questions you feel are needed. • Move on to the next module.
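The per-module routine above can be sketched as a simple loop. This is only an illustrative sketch, not part of the Toolkit: the `Module` class, its field names, and the `answer` callback (a stand-in for the researcher) are all hypothetical, and the ad hoc follow-up questions an interviewer adds on the spot are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Module:
    """One interview module. All field names are hypothetical."""
    name: str
    intro: str = ""  # introductory statement from the Interviewer's Manual, if any
    worksheet_questions: List[str] = field(default_factory=list)
    manual_questions: List[str] = field(default_factory=list)


def run_interview(modules: List[Module],
                  answer: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Walk through the modules in order and return a (speaker, text) transcript."""
    transcript: List[Tuple[str, str]] = []
    for module in modules:
        # Read the introductory statement, if the manual lists one.
        if module.intro:
            transcript.append(("interviewer", module.intro))
        # The researcher completes the worksheet questions for this module.
        for question in module.worksheet_questions:
            transcript.append(("worksheet", question))
            transcript.append(("researcher", answer(question)))
        # The interviewer reviews the responses and asks the manual's questions.
        for question in module.manual_questions:
            transcript.append(("interviewer", question))
            transcript.append(("researcher", answer(question)))
        # The loop then moves on to the next module.
    return transcript
```

In a real interview the manual questions are asked "as appropriate" rather than all of them every time; the sketch asks all of them to keep the control flow visible.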
Data Curation Profile Template • Describes the structure of the Data Curation Profile • Each section or sub-section contains a brief definition of the information that is needed to populate a Data Curation Profile
Connections Between Components Worksheet Manual Template
Worksheet Mod.13 Q2 Worksheet
Connections Between Components Section 13.1 of the Completed Data Curation Profile
How to Develop a DCP A Data Curation Profile is developed through 3 stages: • Stage 1 – Preparation • Stage 2 – Interviews • Stage 3 – Constructing the Profile
Stage 1 – Preparing • Schedule the interview. • At a time / place convenient for the researcher • Select the data set to be profiled. • The researcher may negotiate this if there is a particular project they want to discuss, or one that is “easier” to discuss • Be sure to let the researcher know that you will be recording the interview.
Stage 1 – Preparing • Sending out the “Interview Worksheet” in advance. • Pros and Cons… • Filling out the worksheet—we have found that people are not reluctant to fill them out, just that they either forget, or can’t find time • Ask for any materials that may help familiarize you with the selected data set. • An article, report, S.O.P., or other documentation
Stage 1 – Preparing • Do your homework - Investigate researchers’ work and use of data • Faculty publications • Faculty’s website • Press release / News Article • Review of awarded grants
Exercise Read the article and ID the data
In-depth: Scheduling • Schedule a time & place to meet; a quiet spot with room to spread out is best (some offices are packed!) • Send a gentle reminder to fill out the Worksheet, if you sent it beforehand. (If they don’t, try to get them to do so during the interview, pointing out that their notes will help in compiling the Profile)
Stage 1 – Preparing • Modifications to the DCP • How many of the profile modules do you want to include? • Time • Purpose • Core Modules • Are there additional modules or questions that you want to include? Photo from: http://www.flickr.com/photos/julia_manzerova/2190216162/sizes/m/in/photostream/
Stage 2 – Interviewing • Introduction to the Interview • The need for two interviews • Time required • Coverage Image from: http://www.flickr.com/photos/terryhart/2890904949/sizes/m/in/photostream/
Stage 2 – Interviewing • Types of Questions: Worksheet • Free text • Short answer (text) • Selecting from a range of possible responses • Yes/No • Likert Scale Image from: http://www.flickr.com/photos/valeriebb/3006348550/sizes/m/in/photostream/
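If worksheet responses are captured electronically, the five question types listed above could be checked with something like the following. This is a rough sketch under stated assumptions: the 200-character cap on short answers and the 1-to-5 Likert range are placeholders chosen for illustration, not values taken from the Toolkit, and `QuestionType` and `is_valid` are hypothetical names.

```python
from enum import Enum


class QuestionType(Enum):
    FREE_TEXT = "free text"
    SHORT_ANSWER = "short answer"
    CHOICE = "selecting from a range of possible responses"
    YES_NO = "yes/no"
    LIKERT = "Likert scale"


def is_valid(qtype, response, choices=(), likert_points=5, short_limit=200):
    """Return True if `response` is acceptable for the given question type.

    Assumed, not from the Toolkit: `short_limit` caps short answers and the
    Likert scale runs from 1 to `likert_points`.
    """
    if qtype is QuestionType.FREE_TEXT:
        return isinstance(response, str)
    if qtype is QuestionType.SHORT_ANSWER:
        return isinstance(response, str) and len(response) <= short_limit
    if qtype is QuestionType.CHOICE:
        return response in choices
    if qtype is QuestionType.YES_NO:
        return response in ("Yes", "No")
    if qtype is QuestionType.LIKERT:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(response, int)
                and not isinstance(response, bool)
                and 1 <= response <= likert_points)
    return False
```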
Stage 2 – Interviewing • Types of Questions: Manual • Explanatory – “Tell me why you selected ‘x’ as your response” • Clarifying – “Could you explain what you mean by ‘x’?” • Probing – “Could you tell me more about ‘x’?” • Relational – “Could you tell me how ‘x’ relates to your earlier response of ‘y’?” • Summarization – “So ‘x’ leads to ‘y’ is that right? Then what happens?”
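The five follow-up patterns above are essentially templates with slots for ‘x’ and ‘y’. A minimal sketch of that idea, with the wording copied from the slide and the names `FOLLOW_UPS` and `follow_up` invented for illustration:

```python
# Follow-up question templates, keyed by question type.
FOLLOW_UPS = {
    "explanatory": "Tell me why you selected '{x}' as your response.",
    "clarifying": "Could you explain what you mean by '{x}'?",
    "probing": "Could you tell me more about '{x}'?",
    "relational": ("Could you tell me how '{x}' relates to your earlier "
                   "response of '{y}'?"),
    "summarization": "So '{x}' leads to '{y}', is that right? Then what happens?",
}


def follow_up(kind, x, y=None):
    """Fill in the template for `kind`; `y` is required only where the template uses it."""
    template = FOLLOW_UPS[kind]
    if "{y}" in template and y is None:
        raise ValueError(f"'{kind}' questions need both x and y")
    # str.format ignores keyword arguments the template does not use.
    return template.format(x=x, y=y)
```

For example, `follow_up("probing", "raw sensor logs")` produces the probing question with the researcher's own phrase dropped in.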
Exercise Conducting the Interview • What did the researcher say about sharing the data set? • What follow up questions would you ask?
Transcribing the Interview • Full:
Transcribing the Interview • Indexed
Exercises • Compose a subsection of a Data Curation Profile from a completed interview worksheet and transcript.
Exercise: Building a Profile 2.2 - Intended audiences The information needed to populate this sub-section will likely come from Module 3 – Sharing. This sub-section is meant to identify who the potential audiences for the data (not the research as a whole) are or might be according to the data client. The audience types listed may be specific (“Researchers studying the effects of climate change on plant growth during the Mesozoic era”) or broad (“Climate Change Researchers”) as dictated by the data client. Audience types may be those with whom the data client is currently sharing his or her data or audience types the data client imagines would be interested in this data.
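The link between worksheet modules and profile sub-sections can be pictured as a lookup table. The sketch below encodes only the two connections these slides mention (sub-section 2.2 drawing on Module 3, and section 13.1 drawing on Module 13); the `SOURCES` table, the `material_for` helper, and the `(module, question)` keying of responses are hypothetical conveniences, not part of the Toolkit.

```python
# Profile sub-section -> worksheet module that is likely to feed it.
SOURCES = {
    "2.2": 3,    # Intended audiences <- Module 3 (Sharing)
    "13.1": 13,  # <- Module 13, per the "Connections Between Components" slides
}


def material_for(subsection, responses):
    """Collect the responses relevant to one profile sub-section.

    `responses` maps (module_number, question_number) to the researcher's text.
    Results come back in question order.
    """
    module = SOURCES[subsection]
    return [text
            for (mod, _question), text in sorted(responses.items())
            if mod == module]
```

The person compiling the profile would still read the transcript for context; the lookup only narrows down where to start.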