Data Sharing in Zooarchaeology: Challenges and Promises
Sarah Whitcher Kansa
The Alexandria Archive Institute
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Why are we publishing data as part of this project?
• What is data publishing?
• Why is it good for you?
Open Context
• Started in 2007
• Publishes archaeological data (open access / open data)
• Archiving by California Digital Library
• Prioritizes access & reuse
Data reuse is hard
• Needs documentation (esp. methods), ideally with data creator
• “Standards” applied, recorded in different ways
• Expand on this initial study
“Data sharing as publishing” model
… but need concrete examples of reuse!
- NEH projects
- EOL project
EOL Computable Data Challenge (Ben Arbuckle, Sarah W. Kansa, Eric Kansa)
Anatolia Zooarchaeology Case Study: Aims
• Collaborative research paper(s)
- Drawing on integrated datasets
- Linked to published data
- Example of research potential of data publishing
- Eventually fill in the gaps (spatial and temporal)
• Data publications
- Lasting outcome, not just one-time integration
- Edited, verified data
- Linked data for future research opportunities
EOL Computable Data Challenge
• 14 different sites
• 34+ zooarchaeologists
• Decoding, cleanup, metadata documentation
• 220,000+ specimens
• 450 entities linked to 143 EOL concepts
• Collaborative analysis
• Parceled out to you because the dataset is so large
Data are challenging!
• Decoding takes 10x longer
• More work needed modeling research methods (esp. sampling)
• Requires lots of back-and-forth with data authors
• Tension between modeling needs and familiarity with tools (Excel)
Archiving is not enough! NEED data editing!
“Distal epiphysis unfused”
http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058
Project-specific notations for this one concept:
• uf. dist., f. prox.
• d. uf.
• 30
• Distal epiph. unfused
• Distal end unf.
• dist. unfused
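The many-notations-to-one-concept relationship for “Distal epiphysis unfused” can be sketched as a simple lookup. This is only an illustration, not Open Context's actual tooling: the function names and the normalization rule are assumptions, and the way the notations are split into separate entries is one reading of the slide.

```python
from typing import Optional

# Concept URI as given on the slide.
ZOO_0058 = "http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058"

# Notations listed on the slide. How the slide text splits into separate
# entries is an assumption, and the slide does not say which project used which.
DISTAL_UNFUSED_VARIANTS = [
    "uf. dist., f. prox.",
    "d. uf.",
    "30",
    "Distal epiph. unfused",
    "Distal end unf.",
    "dist. unfused",
]


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(label.lower().split())


# Hypothetical lookup table: normalized local notation -> shared concept URI.
CONCEPT_BY_LABEL = {normalize(v): ZOO_0058 for v in DISTAL_UNFUSED_VARIANTS}


def reconcile(label: str) -> Optional[str]:
    """Return the concept URI for a known notation, or None if unrecognized."""
    return CONCEPT_BY_LABEL.get(normalize(label))


if __name__ == "__main__":
    for raw in ["  DIST.   unfused ", "d. uf.", "fused"]:
        print(repr(raw), "->", reconcile(raw))
```

A bare code such as "30" only means something inside the project that defined it, so a real workflow would key the lookup by project rather than use one global table. Unrecognized notations return None so that an editor can review them instead of having them guessed.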
Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, "This is ridiculous."… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)
A long way to go before we get usable, intelligible data
Open Context Entity Reconciliation
• Authors / editors relate project-specific terminologies to global terminologies
• Many project-specific terms related to global terminologies
• Editorial workflow helps annotate data for interoperability
“Ovis aries”
http://eol.org/pages/311906/
Project-specific terms for this one concept:
• Sheep
• Code: 16
• Schaf
• Domestic sheep
• O. aries
• Code: 70
• Ovis aries
• Code: 15
• Code: 14
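Because the same animal is recorded as 14 in one dataset and 16 or “Schaf” in another, reconciliation has to be keyed by project as well as by term. The sketch below shows one way that could look and why it matters for integrated analysis. The project names, record layout, and function names are hypothetical; the slide lists the terms but not which project used each one.

```python
# EOL page for sheep, as given on the slide.
EOL_SHEEP = "http://eol.org/pages/311906/"

# Keys are (project, local term). The project names are placeholders,
# since the slide does not record which project used which term.
LOCAL_TO_GLOBAL = {
    ("project_a", "14"): EOL_SHEEP,
    ("project_b", "16"): EOL_SHEEP,
    ("project_c", "Schaf"): EOL_SHEEP,
    ("project_d", "O. aries"): EOL_SHEEP,
}


def annotate(records):
    """Add a taxon URI to each record while keeping the original term intact."""
    annotated = []
    for rec in records:
        key = (rec["project"], str(rec["taxon"]).strip())
        annotated.append({**rec, "taxon_uri": LOCAL_TO_GLOBAL.get(key)})
    return annotated


def count_by_concept(records):
    """Count specimens per shared concept across projects."""
    counts = {}
    for rec in annotate(records):
        uri = rec["taxon_uri"] or "unreconciled"
        counts[uri] = counts.get(uri, 0) + 1
    return counts


if __name__ == "__main__":
    specimens = [
        {"project": "project_a", "taxon": 14},
        {"project": "project_c", "taxon": "Schaf"},
        {"project": "project_b", "taxon": "14"},  # 14 is not defined for project_b
    ]
    print(count_by_concept(specimens))
```

Keeping the original term next to the added URI means the annotation can be checked or corrected later with the data author, and a code that is valid in one project is not silently applied to another.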
Why is linked data important for this project?
• Foundation for future work, much of which we can’t even imagine.
• Disambiguates terms at the outset, allowing for future informed uses of the data.
• Growing movement that allows data to be part of the web (not just on the web).
Questions for this project (and in collaboration with DIPIR):
• How was the data reuse experience for you?
• Your thoughts on data publication
• Feedback on EOL concepts