LinkSphere: P2P Cross Database Search -- Architecture and Issues • Hugo Mills, University of Reading
LinkSphere • Linking Researchers and their Data • Social networking for researchers • Cross-database search • Mostly Arts and Humanities datasets • “Promoting serendipity” • Access by and presentation of datasets to wider audiences
Datasets • Museums Archives • Archaeology: Silchester Excavation, IADB • Ure Museum of Classical Archaeology • CentAUR: ePrints • Library • Beckett Collection • Cole Museum of Zoology • Film Collection • Herbarium • Typography Collections
Tycho • Fully asynchronous peer-to-peer communications framework • Written in Java • Fully distributed • Robust • “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport) • Has a simple distributed data store (“Virtual Registry”) for client metadata
Tycho • (Relatively) lightweight • 3MiB for a fully functional system • Fast • Flexible, Extensible • Bootstrap handlers • Additional message types • VR extensions • Alternative communication protocols • Discovery of core mediators via Bonjour/ZeroConf
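The slides describe the Virtual Registry only as "a simple distributed data store" for client metadata and do not say how entries are placed. The sketch below illustrates the general idea under one assumption: each mediator holds a slice of the registry, and a key's owner is chosen by hashing. `MediatorNode` and `VirtualRegistry` are hypothetical names. Tycho itself is Java, and its real placement and replication scheme may differ.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MediatorNode:
    """Hypothetical mediator holding one slice of the registry."""
    name: str
    store: Dict[str, Dict[str, str]] = field(default_factory=dict)


class VirtualRegistry:
    """Toy hash-partitioned registry spread over several mediators.

    Assumption: keys are placed by hashing the client id. This is an
    illustration only, not Tycho's actual placement scheme.
    """

    def __init__(self, nodes: List[MediatorNode]) -> None:
        if not nodes:
            raise ValueError("need at least one mediator")
        self.nodes = nodes

    def _owner(self, key: str) -> MediatorNode:
        # A stable digest, unlike built-in hash(), which is salted per process,
        # so every participant computes the same owner for the same key.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def register(self, client_id: str, metadata: Dict[str, str]) -> None:
        # Store a copy so later changes by the caller do not leak in.
        self._owner(client_id).store[client_id] = dict(metadata)

    def lookup(self, client_id: str) -> Optional[Dict[str, str]]:
        return self._owner(client_id).store.get(client_id)
```

Because every node computes the same owner for a key, any participant can find an entry without a central index. Simple modulo placement reshuffles most keys when a mediator joins or leaves, which is one reason real systems often prefer consistent hashing or replication.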
XDB System Architecture • [Diagram, top to bottom: search applications → REST search API → Tycho core → per-repository metadata (Meta) components with Virtual Registry (VR) entries → connectors (JDBC, ..., Web API, SPARQL) → repositories]
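The diagram shows one query fanning out through the Tycho core to many repositories, each behind its own connector. The following is a minimal sketch of that fan-out. It assumes each connector can be wrapped as an async `search(query)` call returning record titles. `RepositoryAdapter` and `federated_search` are hypothetical names, not part of LinkSphere. Slow or failing repositories are dropped rather than failing the whole search, in the spirit of the robustness goal on the Tycho slides.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# Hypothetical connector signature: query text in, matching record titles out.
SearchFn = Callable[[str], Awaitable[List[str]]]


@dataclass
class RepositoryAdapter:
    repo_id: str
    search: SearchFn  # e.g. wraps a JDBC, SPARQL or Web API backend


async def federated_search(
    adapters: List[RepositoryAdapter], query: str, timeout: float = 2.0
) -> Dict[str, List[str]]:
    """Send one query to every repository concurrently and merge the replies.

    Repositories that raise an error or exceed the timeout are skipped.
    """

    async def ask(adapter: RepositoryAdapter) -> Tuple[str, Optional[List[str]]]:
        try:
            hits = await asyncio.wait_for(adapter.search(query), timeout)
            return adapter.repo_id, hits
        except Exception:  # covers asyncio.TimeoutError and backend errors
            return adapter.repo_id, None

    pairs = await asyncio.gather(*(ask(a) for a in adapters))
    return {repo_id: hits for repo_id, hits in pairs if hits is not None}
```

Usage: `asyncio.run(federated_search(adapters, "amphora"))` returns a dict keyed by repository id containing only the repositories that answered in time.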
User Interface • Main UI is web-based • Uses AJAX • Currently embedded within the LinkSphere project site • Will ultimately move to the social networking site (SNS) • Any UI is possible using the REST API
Issues • Getting the data is hard • Implementation problems • Maintenance problems • Admin problems • Social problems • Legal problems
“Muddling along” • Archive of material for intra-departmental use only • Some legal issues involved • Group of technicians administering the data • Poor quality data • Excel spreadsheet(!) • Reluctant to have index of material made public
“Not ready yet” • Big university projects • New systems, (potentially) large data sets • MERL museums archive (AdLib) • Data all loaded from previous systems • Access modules not yet installed • CentAUR publications archive (ePrints 3) • Very little data available yet
“Works For Me” • Custom web application • PHP, sophisticated • External developer • No documentation • MySQL underneath
“It works, but...” (part 1) • Non-technical users • Admins are Mac-only, desktop-only people • FileMaker Pro • DB structure and UI developed externally • No documentation • This has bad implications
“It works, but...” (part 2) • Completely custom application • External developer • No documentation (again) • Large lump of write-only Perl • Custom data store • Not SQL. Not XML. Not RDF. • No external access
Unreachable data • Uncommunicative systems • Custom applications • Developers/administrators AWOL • Custom data models • Lost passwords • Excel spreadsheets • See also “Uncommunicative”
Unreachable data • Private data • Legal issues • Possessive owners • Internal use only • Poor quality • No data!
Conclusions • Building the software is easy • There is still lots of hard-to-reach data out there • Issues are largely not technical • More outreach to A&H areas needed
Acknowledgements and thanks • LinkSphere team: Mark Baker, Shirley Williams, Pat Parslow (Reading), Claire Warwick, Melissa Terras, Claire Ross (UCL) • Repository owners at Reading: Amy Smith (Ure Museum), Guy Baxter (University Archivist), Mary Dyson, Hadj Messelles (Typography), Jonathan Bignell (Film Studies), Alison Sutton (CentAUR), Mike Fulford, Amanda Clarke (Silchester) • JISC VRE 3 programme
Tycho Architecture • [Diagram: a network of clients (C), mediators (M) and Virtual Registry (VR) nodes]
REST Interface • /api/query • POST to start a new query asynchronously • /api/query/query_id • GET for query metadata • DELETE to cancel the query (otherwise it times out naturally) • /api/query/query_id/start/finish • GET a range of results from the query • Feedback API coming soon
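A small client for the query endpoints might look like the sketch below. The paths and HTTP methods come from the slide. Everything else is an assumption: the host (`BASE_URL`), the JSON request body with a `query` field, a JSON reply containing an `id`, and whether `finish` is exclusive. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, Optional, Tuple

# Hypothetical server address; the slides do not give one.
BASE_URL = "http://localhost:8080"


def query_url(query_id: Optional[str] = None, start: Optional[int] = None,
              finish: Optional[int] = None, base: str = BASE_URL) -> str:
    """Build /api/query, /api/query/<id> or /api/query/<id>/<start>/<finish>."""
    path = "/api/query"
    if query_id is not None:
        path += "/" + urllib.parse.quote(query_id, safe="")
        if start is not None and finish is not None:
            path += f"/{start}/{finish}"
    return base + path


def page_ranges(total: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, finish) windows; assumes finish is exclusive."""
    for start in range(0, total, page_size):
        yield start, min(start + page_size, total)


def _send(req: urllib.request.Request):
    # Decode a JSON body if the server sent one; DELETE may return nothing.
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def start_query(text: str, base: str = BASE_URL) -> str:
    # Assumed wire format: JSON {"query": ...} in, JSON {"id": ...} out.
    data = json.dumps({"query": text}).encode("utf-8")
    req = urllib.request.Request(query_url(base=base), data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    return _send(req)["id"]


def query_metadata(query_id: str, base: str = BASE_URL):
    return _send(urllib.request.Request(query_url(query_id, base=base)))


def fetch_range(query_id: str, start: int, finish: int, base: str = BASE_URL):
    return _send(urllib.request.Request(query_url(query_id, start, finish, base)))


def cancel_query(query_id: str, base: str = BASE_URL) -> None:
    _send(urllib.request.Request(query_url(query_id, base=base), method="DELETE"))
```

Because queries run asynchronously, a caller would typically start a query, poll its metadata, page through results with `page_ranges`, and issue the DELETE only if it abandons the query early rather than waiting for the time-out.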
REST Interface • /api/repository • GET list of repositories currently online • /api/repository/repo_id • GET for repository metadata • Link to repository itself • Link to LinkSphere description of it
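The repository endpoints can be wrapped in the same way. This is a sketch under stated assumptions. The slide says only that repository metadata includes a link to the repository itself and a link to its LinkSphere description. The JSON field names (`id`, `name`, `repository`, `description`), the host, and the helper names here are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Hypothetical server address; the slides do not give one.
BASE_URL = "http://localhost:8080"


@dataclass
class RepositoryInfo:
    repo_id: str
    name: str
    repository_link: str   # link to the repository itself
    description_link: str  # link to LinkSphere's description of it


def parse_repository(doc: dict) -> RepositoryInfo:
    # Field names are assumptions; fall back to the id if no name is given.
    return RepositoryInfo(
        repo_id=doc["id"],
        name=doc.get("name", doc["id"]),
        repository_link=doc["repository"],
        description_link=doc["description"],
    )


def list_repositories(base: str = BASE_URL) -> list:
    """GET /api/repository: repositories currently online (assumed JSON array)."""
    with urllib.request.urlopen(f"{base}/api/repository") as resp:
        return json.load(resp)


def get_repository(repo_id: str, base: str = BASE_URL) -> RepositoryInfo:
    """GET /api/repository/<repo_id> and parse the metadata."""
    url = f"{base}/api/repository/{urllib.parse.quote(repo_id, safe='')}"
    with urllib.request.urlopen(url) as resp:
        return parse_repository(json.load(resp))
```

Keeping parsing separate from the HTTP call lets a UI built on the REST API test its handling of repository metadata without a live server.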