(I have no title) • Joe Hourclé • ESSI Workshop 2010-08-02
Note: When reading this presentation at home, view it with the ‘Notes’ window visible; I have talking points and other comments in there.
About Me • Programmer for the Virtual Solar Observatory (VSO) • Sysadmin & DBA for the Solar Data Analysis Center (SDAC) • Likes to complain about things • Has been working for the last 18+ months on integrating SDO data into the VSO.
So, the problem ... • Scientists either don’t know about, or don’t care about, informatics issues • We need to work with the scientists to educate them on how to make their work (data, systems, catalogs, etc.) useful to as wide an audience as possible • We need to stop having every data system designed from the ground up
Ignored Issues in e-Science: Collaboration, Provenance and the Ethics of Data
Provenance can't be a bolt-on. It must be part of the data system from the beginning of the mission. Otherwise, people can cast doubt on the data to refute research they don't like. • Uncertainties in some data are not straightforward to include in data files. Software should be seen as an alternative source of uncertainty information. • It is impossible to tell in detail exactly how the data was produced: what assumptions were made, what artifacts were introduced, what the absolute accuracy is. • In sensor networks, we need annotation of when sensors are swapped out and of other discontinuities.
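As a rough illustration of the sensor-network point above, one way to annotate discontinuities is to keep a small log of events next to the data and use it to split a time series into segments that are safe to compare. This is only a sketch: the `Discontinuity` record, its field names, and the helper functions are hypothetical and do not come from any existing data system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Discontinuity:
    """Hypothetical annotation record for one event at one station."""
    station: str
    when: datetime
    kind: str          # e.g. "sensor_swap" or "recalibration" (assumed vocabulary)
    note: str = ""


def discontinuities_in(log: List[Discontinuity], station: str,
                       start: datetime, end: datetime) -> List[Discontinuity]:
    """Return the station's annotations with start <= when <= end, oldest first."""
    hits = [d for d in log if d.station == station and start <= d.when <= end]
    return sorted(hits, key=lambda d: d.when)


def split_at_discontinuities(times: List[datetime], log: List[Discontinuity],
                             station: str) -> List[List[datetime]]:
    """Split sorted timestamps into segments; a new segment starts at each event."""
    if not times:
        return []
    cuts = [d.when for d in discontinuities_in(log, station, times[0], times[-1])]
    segments: List[List[datetime]] = [[]]
    next_cut = 0
    for t in times:
        # Consume every cut at or before this sample.
        while next_cut < len(cuts) and t >= cuts[next_cut]:
            if segments[-1]:          # avoid creating empty segments
                segments.append([])
            next_cut += 1
        segments[-1].append(t)
    return segments
```

Downstream code can then refuse to compute trends across a segment boundary unless the annotation says the instruments were cross-calibrated.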
How you describe / document time series data is fundamentally different from how you describe images & spectra – Collections are hard to define when there isn't a synoptic campaign. • Software engineering point of view for data:
Need ways to measure how interoperable systems are; types of interoperability and levels of compliance. • IRL: Interoperability Readiness Levels. Join the NASA Tech Infusion Working Group. • IPY is working on a cookbook.
Create reward systems for scientists that reward re-usability. (See the Townhall on Thursday evening.) • Different users have different requirements – do you cater to the general user or to all the specific cases? Quick search vs. advanced search. • How do we determine the value of data? Data value increases if we can reduce uncertainty or increase interoperability with other data. • Scale of software – when do you need to bring in a programmer, or a whole team, to make it a full project?
(suggestion) YourBadData.org – name and shame the problem data sets. • Need automated methods to process NEXRAD data products: extract only certain grids from a time series of files, by geographic coordinate and/or location, and transform the files to readable formats (txt, shp, ...). • Author identities – using pseudonyms to publish fringe work (blogs) ... might later want to merge identities, or might try to disassociate them when trying to get a new job.
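The NEXRAD suggestion above (pull only certain grid cells out of a time series of files, selected by geographic coordinate, and write them in a readable text format) might look roughly like the sketch below. It assumes each file has already been decoded into a regular latitude/longitude grid by some other tool; the frame layout `(timestamp, lats, lons, values)`, the function names, and the CSV column order are all assumptions made for illustration, and shapefile output is not covered.

```python
from typing import List, Sequence, Tuple

# Hypothetical decoded frame: (timestamp, lats, lons, values[row][col]).
Frame = Tuple[str, Sequence[float], Sequence[float], Sequence[Sequence[float]]]
# Bounding box as (south, north, west, east) in degrees (assumed convention).
BBox = Tuple[float, float, float, float]


def subset_grid(lats, lons, values, bbox: BBox):
    """Keep only the rows/columns whose coordinates fall inside the box."""
    south, north, west, east = bbox
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    return (
        [lats[i] for i in rows],
        [lons[j] for j in cols],
        [[values[i][j] for j in cols] for i in rows],
    )


def to_text_rows(timestamp: str, lats, lons, values) -> List[str]:
    """Flatten one grid into 'time,lat,lon,value' lines."""
    return [
        f"{timestamp},{lat},{lon},{values[i][j]}"
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
    ]


def extract_series(frames: Sequence[Frame], bbox: BBox) -> List[str]:
    """Subset every frame in the series and return CSV-style text lines."""
    lines = ["time,lat,lon,value"]
    for timestamp, lats, lons, values in frames:
        sub_lats, sub_lons, sub_values = subset_grid(lats, lons, values, bbox)
        lines.extend(to_text_rows(timestamp, sub_lats, sub_lons, sub_values))
    return lines
```

Writing the result out is then a matter of `"\n".join(lines)` into a `.txt` file; decoding the radar products themselves is deliberately left to whatever reader the data provider supports.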
Conclusion • We need to raise the informatics issues in ways that the scientists care about • They care about error bars; how can we improve their error tracking? • We need simple guidelines / best practices for good data systems • We need data & system specialists as stakeholders on new data system projects