Linked Data: Principles and Practice
Joe Futrelle, Woods Hole Oceanographic Institution, jfutrelle@whoi.edu
WHOI / BCO-DMO, July 11, 2011
Grand challenge: whole systems
• Observation and modelling of multiple systems at multiple scales
• Linking data from different disciplines
• to get useful global results!
“... modelling complex systems will be a major research challenge for the 21st century” - National Science Foundation
Building up current practices isn't working
• Heterogeneous tools, data formats
• Can’t get everyone in one workgroup
• Funding goes to science, not stewardship
M.C. Escher, “Tower of Babel” (1928)
Proposed solutions aren't working
• e-Journals: not machine-interpretable
• Collaboration tools
  • everyone falls back on email & other p2p
• Portals and repositories, typically:
  • centralized
  • domain-specific
• “The Grid”: can orchestrate complex processing jobs, but that's not science
Only networks work at scale
• Single researcher (Desktop)
  • Ad hoc data management, single-user apps
• Community (Workgroup)
  • Community tools, resources, control
• Global (Network)
  • No global practice, tools, control
Or to put it another way … Ted Nelson, Computer Lib / Dream Machines (1974)
Data is the network
linkeddata.org (2009)
There is no boundary, center, or locus of control, … so it scales
“If you can’t tweet your dataset, it doesn’t exist”
• Links are the global currency of the internet
• The more people link to you, the more you matter (e.g., PageRank)
• If nobody can link to your data, they will choose data they can link to instead
• If someone links to your data, someone will link to them, and thus to you
• The lowest entry barrier wins
Don’t drink the Kool-Aid
• Semantic web “layer cake”
• Where do we do actual work?
• User interface?
• Applications?
• “Semantic Grid” (D. De Roure, C. Goble)
(source: World Wide Web Consortium)
Semantics = what they hear
• Shared semantics are minimal
• Maximal semantics emerge when multiple nodes act on partial information
• Validating each exchange doesn’t scale
Gary Larson (1983)
Design data for network effects
• Global, persistent identification
• Open models (tolerate incompleteness)
• Transparent protocols (pass-through)
• “Graceful degradation” (cf. Dublin Core)
• Data outlives code, so data should control code, not the other way around
• Semantics matter, so they must be explicit and machine-readable (not a side effect of running code)
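One way to picture “open models” and “graceful degradation” is a consumer that uses the properties it recognizes and passes over the rest without failing. The following is a minimal sketch, not a reference implementation: the `example.org` URIs, the `salinityProfile` property, and the `describe` helper are all hypothetical, and only the Dublin Core terms namespace is real.

```python
# Sketch of a tolerant consumer: missing or unknown properties never raise.
DCTERMS = "http://purl.org/dc/terms/"

# Properties this particular consumer happens to understand.
KNOWN = {
    DCTERMS + "title": "title",
    DCTERMS + "creator": "creator",
}


def describe(subject, triples):
    """Build a partial record for `subject` from (s, p, o) triples."""
    record = {"id": subject, "unrecognized": 0}
    for s, p, o in triples:
        if s != subject:
            continue  # statement about some other entity
        label = KNOWN.get(p)
        if label is None:
            record["unrecognized"] += 1  # degrade gracefully: count and move on
        else:
            record.setdefault(label, []).append(o)
    return record


# Hypothetical data: one known property, one unknown, and no creator at all.
dataset = "http://example.org/dataset/ctd-cast-42"
triples = [
    (dataset, DCTERMS + "title", "CTD cast 42"),
    (dataset, "http://example.org/vocab/salinityProfile", "http://example.org/profile/7"),
]
record = describe(dataset, triples)
```

Here the record still carries the identifier and title even though the creator is absent and one property is unknown to this consumer.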
Practices that grow the network
• Give everything a portable identifier
• Link entities via properties = network
• Reuse existing ontologies and only build the partial ontologies that fill in the gaps (e.g., don’t re-develop Dublin Core terms)
• Emit metadata early and often; don’t assume curators will do it later (who? $?)
• “Not building a wall; building a brick” (Oblique Strategies, 1975)
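These practices can be sketched in a few lines: give each entity a URI, link entities through properties, reuse Dublin Core terms, and emit the result at the moment the data is produced. This is an illustrative sketch only. The `example.org` URIs, the dataset and cruise names, and the helper functions are assumptions made for the example; the output format is N-Triples, written by hand with the standard library only.

```python
# Sketch: emit linked metadata as N-Triples, reusing Dublin Core terms.
DCTERMS = "http://purl.org/dc/terms/"  # real, widely reused vocabulary


def uri(value):
    """Wrap an identifier in N-Triples angle brackets."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical portable identifiers for a dataset and the cruise it came from.
dataset = "http://example.org/dataset/ctd-cast-42"
cruise = "http://example.org/cruise/kn201"

triples = [
    (uri(dataset), uri(DCTERMS + "title"), literal("CTD cast 42")),
    (uri(dataset), uri(DCTERMS + "creator"), literal("J. Researcher")),
    # A property whose value is another URI is what turns records into a network.
    (uri(dataset), uri(DCTERMS + "isPartOf"), uri(cruise)),
]
doc = to_ntriples(triples)
```

Each line of `doc` is a self-contained statement, so a partial file is still usable and new statements can be appended later without rewriting earlier ones.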