Data at Work: Supporting Sharing in Science and Engineering (Birnholtz & Bietz, 2003)
Adam Worrall
LIS 6269 Seminar in Information Science
3/30/2010
Data and data sharing
• Information science needs “a better understanding of the use of data in practice” (p. 339)
• Data fundamentally “different from documents” (p. 339)
• Data sharing important (pp. 339-340)
  • “Openness” of scientific process
  • Confirm findings, replicate results
  • Build on previous work
  • Large data sets require distributed collaboration
    • Collaboratories, e-science
Data sharing problems
• Collaborating and sharing of data should be encouraged
• But it “is not easy” to do so (p. 340)
• Why?
  • Lack of willingness to share, trust others
    • Competition for “revenue” (p. 345)
    • Restrictions imposed by commercial interests
    • Trust of sources
    • Trust of others; will they use data well? (see also Van House, 2003)
Data sharing problems
• Reasons (continued)
  • Problems with finding shared data
    • Negotiate access
  • Difficulties interpreting and using shared data
    • How collected?
    • How analyzed?
    • What format?
  • Metadata
    • Format, encoding, controlled vocabularies, etc.
  • Data quality (see also Stvilia et al., 2008; Wand & Wang, 1996)
  • “Tacit” knowledge of data (p. 340)
Methodology
• Three disciplines
  • Earthquake engineering
  • HIV / AIDS research
  • Space physics
• Observation and interviews of all three, surveys of earthquake engineers
• Inductive, grounded approach
  • Claimed they made “no assumptions about the purpose of data” (p. 340)
Data dimensions
• Two dimensions identified (p. 341)
  • “news” vs. “confirmation”
    • Confirm existing or expected results
    • Something unexpected needing further exploration
    • Something not fitting expected / prevailing model
  • “streams” vs. “events”
    • Longitudinal vs. cross-sectional
    • Context for data may change
    • Rate of data different
• Different disciplines, different data use
Data’s role in scientific communities
• Defines boundaries between communities
  • Experimental, deductive
    • More possessive of data
  • Theoretical, inductive
    • More interested in sharing data
    • More interested in using shared data
  • Increasing blurring of boundaries in some fields
• Provides gateway into communities
  • Access to data, knowledge about data is “valuable resource” (p. 343)
  • Those who control data and knowledge, and access to it, act as “gatekeepers of the field” (p. 343)
Data’s role in scientific communities
• Indicates status in community
  • Using one’s own data “seen as ‘better’” than using public data (p. 344)
  • “Analyzing somebody else’s data … arguably ‘counts’ for less” (p. 344)
  • Higher quality data means better reputation
    • For researchers, research groups, and institutions
• Enables indoctrination into community
  • Students often work with collecting, managing data
  • Degree of sharing of responsibilities differs between fields, sometimes by seniority in field
Categories of data uses (p. 345)
• Identified with an eye to “revenue” from use
  • Benefits: reputation, publications, funding, etc.
• “A scientist’s data set is her [or his] castle”
  • Researcher wants to and is able to use data to solve a particular problem or question
  • Will increase revenue
• “With a little help from my friends”
  • Researcher wants to use data, but needs to collaborate with others in order to do so successfully
  • Data can be shared privately
  • Limited risk (but still some risk)
  • Will increase revenue
Categories of data uses (p. 345)
• “One scientist’s junk is another one’s treasure”
  • Researcher has no interest in using the data for a particular problem, but others do have interest
  • Sharing data will slightly increase revenue
  • May not be worth risk of losing other revenues
• “D’oh!”
  • Researcher has not thought of a use, but it would be relevant to them and help them with a problem or question
  • Sharing data could be embarrassing, decrease revenue
Categories of data use
• Researchers will be less willing to share data unless incentives high, risks low
• Data sharing follows social networks
• Provide facilities for communication around abstractions of data sets
  • Encourage sharing and collaboration (category 2)
    • Extend researcher’s social network
  • Reduce risks of embarrassment (category 4)
    • Preliminary abstractions allow questions / comments before they are embarrassing
  • Increase incentives and benefits (categories 2 & 3)
    • Beyond boundaries of researcher’s community
Recommendations and conclusions
• Efforts to support “social interaction around data abstractions and the data themselves” should be made (p. 346)
• Metadata should be augmented through “the sharing of supplementary materials” (i.e. abstractions) (p. 346)
• Consideration of the “social and scientific roles of data” and how to support them necessary in future research (p. 346)
• Better understanding of data abstractions needed (p. 347)
Issues with study and article
• Bias towards natural sciences
  • Social scientists may use, share data differently
  • Only 3 disciplines studied, others may differ further
• Generally coherent, but some parts hard to follow
  • Indoctrination examples appeared similar, despite what authors termed “critical” distinction (p. 344)
  • Promised “three aspects of the way data are used” but only discussed two dimensions (p. 341)
• Limitations only discussed briefly
Questions, comments?