BEER* workshop
1300 – Raymond Pollard – Being a Data Scientist is FUN!
~1345 – Robert Groman – Has data management gone mainstream?
~1430 – Gwen Moncoiffé – Data integration made easier
~1515 – break
~1530 – Todd O’Brien – Better data, better science
~1600 – Discussion session – your input to the cookbook and the way forward
*Being Efficient and Environmentally Responsible
IMBER BEER Imbizo • Welcome • Times and discussion - ask (or write down) pertinent questions - this is a workshop • Tea/coffee/BEER • Who are we CROZEX and Crozet (Possession Island)
IMBER Data Management • Data Management Committee • Arrange data by Project • First task is to engage researchers and show them how good organization of data will benefit them • Before, during and after the project field phase
The bottom line • DM cannot be an afterthought • If you give DM some thought when you first plan a project, it will be • relatively straightforward • not too much effort • remarkably useful to all participants • valuable to those who come after
DM topics (data management) • Cookbook (http://planktondata.net/imber/) • Recognition for DM • Data Scientist • Best Practice (e.g. BCO-DMO*) • Data and Metadata (e.g. CSR, Cruise Summary Report) • Data Centers – national (e.g. BODC) • Data Centers – specialist (e.g. OBIS, CCHDO, COPEPOD) • IMBER Data Portal *Biological and Chemical Oceanography Data Management Office
Writing papers • Writing papers is an essential part of a researcher’s job • Writing papers is time consuming • Writing papers is tedious/boring • Writing papers needs attention to detail • Publications are a legacy of your research
Data management • Data management is an essential part of a researcher’s job • Data management is time consuming • Data management is tedious/boring • Data management needs attention to detail • Data sets are a legacy of your research
So why do we accept that we must write papers, but treat DM as the poor relation? • Because everybody else does! • Because we get recognition for publishing • But not for DM - seek to change this • But in fact: • Our published interpretation may be wrong • A good data set can be reinterpreted (..Fe) • So the data set is a more objective legacy of a cruise (say) which cost a huge amount and cannot easily be repeated
Recognition for DM • Carrots and sticks • SCOR is considering how to allocate DOIs (Digital Object Identifiers) to data sets • At what level? • Quality control? • Put it on your CV • Act as Data Scientist to a project/cruise • Breadth of interest • Management experience • Contribute to promotion/pay rise
Being a Data Scientist is FUN! Raymond Pollard
So, what is a Data Scientist? • The Data Scientist is someone who helps and advises the project/cruise Principal Scientist and researchers to document their data sets so that they are properly described • The DS also interacts with PIs and Data Specialists to calibrate, validate, save and archive data • Why is it FUN? - because you learn so much yourself by having to talk to people • Can be full or part-time; paid or unpaid; hire, cajole or volunteer
Key role 1 - talking to people • Find out what they do and how they document it - methods, accuracy, … • What do they need from others - positions, water temperature, … • How do they store and back up their data? Do they back it up??! • What do they do with the data - calibrate, compare, sort, …
Range of data • Be aware of huge range of data types and quantities. People are blinkered by their own experience • E.g. volumes: • Nutrients - 24 values per CTD cast • T&S - 5,000 to 100,000 values per cast • Turbulence - millions • Storage • Nutrients - PC spreadsheet • T&S, navigation - central workstation • Turbulence - dedicated workstation
Key role 2 - helping PIs • back up their data • paper copies • copy to central server • document their data, e.g. • help with metadata • create forms for them • obtain data from others for them • by masterminding an Event Log
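The "copy to central server" step above is only useful if the copy really matches the original. A minimal sketch of a verified copy, assuming the server is reachable as an ordinary directory; the function names `sha256_of` and `backup_file` are made up for illustration and are not from the workshop:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy one file into backup_dir and confirm the copy is identical."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target
```

Comparing checksums after the copy is one simple way to catch a truncated or corrupted backup before the original is lost.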
Key role 3 - documentation • Document as much as possible yourself • Take copies of PIs’ handwritten records • Use USB stick to copy their spreadsheets • be diplomatic • assure them you will NOT pass copies on to others • emphasize the value of duplication • Create your own summary spreadsheets
Key role 4 - assist Principal Scientist • Help PS enforce unique referencing • Maintain and post an Event Log • of stations occupied • accurate station times and positions, etc • Quietly advise PS if a PI is not coping • with data rate • documentation • Prepare or help PS prepare CSR
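An Event Log with unique referencing can be as simple as a table that issues one identifier per event. A minimal sketch, assuming a hypothetical ID scheme (cruise code plus a running number) and hypothetical column names; neither is prescribed by the workshop:

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import datetime, timezone


@dataclass
class Event:
    event_id: str    # unique reference, e.g. "CRUISE01-0001" (hypothetical scheme)
    station: str
    instrument: str  # e.g. "CTD"
    time_utc: str    # ISO 8601 timestamp
    lat: float       # decimal degrees, north positive
    lon: float       # decimal degrees, east positive
    pi: str          # responsible PI


class EventLog:
    def __init__(self, cruise):
        self.cruise = cruise
        self.events = []

    def add(self, station, instrument, time_utc, lat, lon, pi):
        """Record one event and return its unique identifier."""
        if not -90 <= lat <= 90 or not -180 <= lon <= 180:
            raise ValueError("position out of range")
        # IDs are issued in sequence, so no two events share a reference.
        event_id = f"{self.cruise}-{len(self.events) + 1:04d}"
        self.events.append(
            Event(event_id, station, instrument, time_utc.isoformat(), lat, lon, pi)
        )
        return event_id

    def to_csv(self):
        """Return the whole log as CSV text, ready to post or archive."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Event)])
        writer.writeheader()
        for event in self.events:
            writer.writerow(asdict(event))
        return buf.getvalue()


# Example with made-up values:
log = EventLog("CRUISE01")
log.add("S001", "CTD", datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
        -46.0, 51.5, "A. Researcher")
```

Every PI can then quote the same event ID in their own records, which makes it easier to match samples to station times and positions later.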
Why can’t the PS do most DS tasks? • Not his priority (optimize cruise program) • Maybe not his forte • Too much work
Possible role 5 - primary data • Scientists often seem to assume that universally required data (time, navigation, CTD depth, temp, surface and met data) appear out of thin air • In fact, those data need careful calibration • DS may need to do this, if no other person is responsible – at least check it • e.g. WHPO => CCHDO GEOTRACES (Chris Measures)
What does the DS gain? • Broadening your experience, learning from other PIs • Advancing your own DM skills • Great management training! (listening to others, looking for problems) • Looks great on your CV • You might even get paid