
BEER* workshop




  1. BEER* workshop 1300 – Raymond Pollard – Being a Data Scientist is FUN! ~1345 – Robert Groman – Has data management gone mainstream? ~1430 – Gwen Moncoiffé – Data integration made easier ~1515 – break ~1530 – Todd O’Brien – Better data, better science ~1600 – Discussion session – your input to the cookbook and the way forward *Being Efficient and Environmentally Responsible

  2. IMBER BEER Imbizo • Welcome • Times and discussion - ask (or write down) pertinent questions - this is a workshop • Tea/coffee/BEER • Who are we? • CROZEX and Crozet (Possession Island)

  3. IMBER Data Management • Data Management Committee • Arrange data by Project • First task is to engage researchers and show them how good organization of data will benefit them • Before, during and after the project field phase

  4. The bottom line • DM cannot be an afterthought • If you give DM some thought when you first plan a project, it will be • relatively straightforward • not too much effort • remarkably useful to all participants • valuable to those who come after

  5. DM (data management) topics • Cookbook (http://planktondata.net/imber/) • Recognition for DM • Data Scientist • Best Practice (e.g. BCO-DMO*) • Data and Metadata (e.g. CSR, Cruise Summary Report) • Data Centers – national (e.g. BODC) • Data Centers – specialist (e.g. OBIS, CCHDO, COPEPOD) • IMBER Data Portal *Biological and Chemical Oceanography Data Management Office

  6. Writing papers • Writing papers is an essential part of a researcher’s job • Writing papers is time consuming • Writing papers is tedious/boring • Writing papers needs attention to detail • Publications are a legacy of your research

  7. Data management • Data management is an essential part of a researcher’s job • Data management is time consuming • Data management is tedious/boring • Data management needs attention to detail • Data sets are a legacy of your research

  8. So why do we accept that we must write papers, but treat DM as the poor relation? • Because everybody else does! • Because we get recognition for publishing • But not for DM - seek to change this • But in fact: • Our published interpretation may be wrong • A good data set can be reinterpreted (..Fe) • So the data set is a more objective legacy of a cruise (say) which cost a huge amount and cannot easily be repeated

  9. Recognition for DM • Carrots and sticks • SCOR is considering how to allocate DOIs (Digital Object Identifiers) to data sets • At what level? • Quality control? • Put it on your CV • Act as Data Scientist to a project/cruise • Breadth of interest • Management experience • Contribute to promotion/pay rise

  10. Being a Data Scientist is FUN! Raymond Pollard

  11. So, what is a Data Scientist? • The Data Scientist is someone who helps and advises the project/cruise Principal Scientist and researchers to document their data sets so that they are properly described • The DS also interacts with PIs and Data Specialists to calibrate, validate, save and archive data • Why is it FUN? - because you learn so much yourself by having to talk to people • Can be full or part-time; paid or unpaid; hire, cajole or volunteer

  12. Key role 1 - talking to people • Find out what they do and how they document it - methods, accuracy, … • What do they need from others - positions, water temperature, … • How do they store and back up their data. Do they back it up??! • What do they do with the data - calibrate, compare, sort, …

  13. Range of data • Be aware of the huge range of data types and quantities. People are blinkered by their own experience • E.g. volumes: • Nutrients - 24 values per CTD cast • T&S - 5,000 to 100,000 values per cast • Turbulence - millions • Storage • Nutrients - PC spreadsheet • T&S, navigation - central workstation • Turbulence - dedicated workstation
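To make the spread of volumes on this slide concrete, here is a rough arithmetic sketch. The value counts come from the slide; the 8 bytes per value and the use of one million as a stand-in for "millions" are assumptions made only for illustration.

```python
# Rough storage estimate per CTD cast for the data types named on the slide.
# Assumption (not from the slides): each value is stored as a 64-bit float.
BYTES_PER_VALUE = 8

values_per_cast = {
    "nutrients": 24,           # from the slide
    "T&S (low)": 5_000,        # from the slide
    "T&S (high)": 100_000,     # from the slide
    "turbulence": 1_000_000,   # slide says "millions"; one million is a lower bound
}


def bytes_per_cast(n_values, bytes_per_value=BYTES_PER_VALUE):
    """Return the raw size in bytes of n_values stored at a fixed width."""
    return n_values * bytes_per_value


for name, n_values in values_per_cast.items():
    print(f"{name:12s} {bytes_per_cast(n_values):>12,d} bytes per cast")
```

Even under these crude assumptions the data types differ by several orders of magnitude, which is why a spreadsheet suits nutrients while turbulence data needs a dedicated workstation.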

  14. Key role 2 - helping PIs • back up their data • paper copies • copy to central server • document their data, e.g. • help with metadata • create forms for them • obtain data from others for them • by masterminding an Event Log
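The "copy to central server" step could be sketched as below. This is only an illustration, not a procedure described in the talk: the function names `sha256_of` and `back_up` and the directory layout are hypothetical, and the checksum comparison is an added safeguard that the slides do not mention.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir):
    """Copy one file into backup_dir and verify that the copy is identical.

    backup_dir stands in for a directory on the central server
    (a hypothetical location, not one named in the slides).
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target
```

Calling `back_up("nutrients.xls", "/mnt/server/cruise_backup")` (both paths made up) would return the path of the verified copy or raise an error if the copy differs.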

  15. Key role 3 - documentation • Document as much as possible yourself • Take copies of PI’s handwritten records • Use USB stick to copy their spreadsheets • be diplomatic • assure them you will NOT copy to others • emphasize the value of duplication • Create your own summary spreadsheets

  16. Key role 4 - assist Principal Scientist • Help PS enforce unique referencing • Maintain and post an Event Log • of stations occupied • accurate station times and positions, etc • Quietly advise PS if a PI is not coping • with data rate • documentation • Prepare or help PS prepare CSR
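One possible shape for an Event Log with unique referencing is sketched below. The field names, the `EventLog` class and the sample values are all hypothetical; the slides only say that the log records stations occupied with accurate times and positions.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str   # unique reference for the event (hypothetical format)
    station: str
    time_utc: str   # ISO 8601 timestamp
    lat: float      # decimal degrees, north positive
    lon: float      # decimal degrees, east positive
    activity: str   # e.g. "CTD cast"


class EventLog:
    """Keeps events in the order added and refuses duplicate event ids."""

    def __init__(self):
        self._events = {}

    def add(self, event):
        if event.event_id in self._events:
            raise ValueError(f"duplicate event id: {event.event_id}")
        self._events[event.event_id] = event

    def to_csv(self):
        """Return the whole log as CSV text, ready to post or print."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["event_id", "station", "time_utc", "lat", "lon", "activity"])
        for e in self._events.values():
            writer.writerow([e.event_id, e.station, e.time_utc, e.lat, e.lon, e.activity])
        return buf.getvalue()


# Usage with made-up values (not real cruise data).
log = EventLog()
log.add(Event("EV001", "ST01", "2004-11-10T08:15:00Z", -46.0, 51.5, "CTD cast"))
try:
    log.add(Event("EV001", "ST02", "2004-11-10T12:40:00Z", -46.2, 51.9, "Net haul"))
except ValueError as err:
    print(err)  # the repeated id is rejected, so every reference stays unique
```

Rejecting a repeated id at entry time is one simple way to enforce unique referencing; a shared spreadsheet with a checked id column would serve the same purpose.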

  17. Why can’t the PS do most DS tasks? • Not his priority (optimize cruise program) • Maybe not his forte • Too much work

  18. Possible role 5 - primary data • Scientists often seem to assume that universally required data (time, navigation, CTD depth, temp, surface and met data) appear out of thin air • In fact, those data need careful calibration • DS may need to do this if no other person is responsible, or at least check it • e.g. WHPO => CCHDO GEOTRACES (Chris Measures)

  19. What does the DS gain? • Broadening your experience, learning from other PIs • Advancing your own DM skills • Great management training! (listening to others, looking for problems) • Looks great on your CV • You might even get paid
