Welcome
First SCAPE Developers’ Workshop
Andy Jackson, The British Library
SCAPEdev1, AIT, Vienna, 6th – 7th June 2011
SCAPEdev1 – The Goals • Get to know each other. • Get familiar with the major platforms. • Outline and discuss the initial Preservation Component (a.k.a. tool) integration plan.
Getting to know me • Andrew Jackson, at the British Library • Technical Coordinator for SCAPE, which means… • Someone to go to when you get stuck • Someone who will look for cross-work-package confusion or integration problems • Someone who will propose solutions if necessary • Chair of the Technical Coordination Committee • Raising and resolving technical integration issues, etc. • Open – let me know if you want to sit in or raise an issue • Will meet via Skype monthly
Getting to know each other • Round-the-room introductions • Let’s get them out of the way… • And then… • Talk together • Debug together • Eat together • And finally, put your picture on the SCAPE wiki and/or SharePoint… • …if you don’t mind.
Where are Preservation Components used? • SCAPE Testbeds (TB) • Workflows designed in Taverna run the tools. • SCAPE Platform (PT) • Executes tools and workflows at scale. • Preservation Planning & Watch (PW) • PLATO runs tools on sample files during planning. • And beyond… • Integration into other tools, repositories, institutions, command-line interfaces (CLI) & scripting, etc…
Testbeds & Taverna • The SCAPE Testbeds are generating scenarios and the preservation workflows that explore them • They use Taverna to build workflows that invoke our tools • WSDL/SOAP calls are mature • The RESTful style is maturing rapidly (a minimal client sketch follows below) • The CLI plugin is integrated into the Taverna core as of version 2.3 • We’ll play with it this morning
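As a taste of the RESTful style, here is a minimal sketch in plain Java of a client calling an identification tool behind a REST endpoint. The URL, the identify path and the file parameter are invented for illustration; they are not part of any agreed SCAPE API.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical REST client: the endpoint and its parameters are invented
    // for illustration; a Taverna workflow step would make an equivalent call.
    public class RestToolCall {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8080/scape/identify?file=sample.tiff");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/xml");

            // Read and print the tool's response; a real client would parse it.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
            conn.disconnect();
        }
    }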
The SCAPE Platform • Initially, vanilla Hadoop, HDFS and HBase • Pure Java preferred • Local CLI okay too • Later on, running Taverna workflows • Pure Java, local CLI • Or locally deployed web services if necessary • Later on, may invoke services from inside VMs • e.g. when tools need a particular OS • Web services would make integration easier
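To make the “pure Java, local CLI” option concrete, here is a hypothetical Hadoop mapper that shells out to an imaginary identify-tool command for each input record. The tool name, and the assumption that each record is a file path visible on the task node, are illustrative only.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: each input record is assumed to be a path to one
    // file, and "identify-tool" is an imaginary CLI installed on every node.
    public class ToolMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String path = value.toString();
            // Run the tool locally on the task node and keep its first output line.
            Process p = Runtime.getRuntime().exec(new String[] { "identify-tool", path });
            BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String result = out.readLine();
            out.close();
            context.write(new Text(path), new Text(result == null ? "" : result));
        }
    }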
Why HBase? • Hadoop + HDFS provide a massive, fault-tolerant file system and processing framework • But HDFS does not cope well with lots of ‘small’ files • See http://www.cloudera.com/blog/2009/02/the-small-files-problem • The HBase architecture is a good fit for our use cases • HBase is used in many places… • Including web archiving at SCAPE partners • We’ll play with Hadoop and HBase this afternoon (a small client sketch follows below)
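For a flavour of the HBase data model, the sketch below stores and reads back one archived resource using the HBase Java client. The table name, column families and URL row-key scheme are assumptions made for this example, not a SCAPE schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical schema: one row per archived resource, keyed by URL, with
    // "content" and "meta" column families (invented for this example).
    public class HBaseSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "webarchive");

            // Many small records live inside one table rather than as HDFS files.
            Put put = new Put(Bytes.toBytes("http://example.org/page"));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("raw"),
                    Bytes.toBytes("<html>...</html>"));
            put.add(Bytes.toBytes("meta"), Bytes.toBytes("mime"),
                    Bytes.toBytes("text/html"));
            table.put(put);

            // Random reads by row key are cheap, unlike opening many tiny files.
            Result result = table.get(new Get(Bytes.toBytes("http://example.org/page")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("meta"), Bytes.toBytes("mime"))));
            table.close();
        }
    }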
PLATO & Wider Integration • PLATO Planning Tool (PW) • Uses CC, PA and QA components during the planning process • Stable web service APIs required (WSDL or REST; a hypothetical facade is sketched below) • Repository Integration • Local CLI for some (e.g. the ePrints–PLATO integration [*]) • RESTful for others • Web Integration • RESTful style preferred
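To show what a “stable API” wrapper might look like, here is a hypothetical JAX-RS resource exposing a characterisation call over REST. The path, query parameter and response format are invented for illustration, not the agreed SCAPE interface.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;

    // Hypothetical JAX-RS resource: a stable REST facade over a characterisation
    // tool, so planning tools such as PLATO need not know the underlying CLI.
    @Path("/characterise")
    public class CharacterisationResource {

        @GET
        @Produces("application/xml")
        public String characterise(@QueryParam("uri") String uri) {
            // A real service would invoke the tool and map its output to an
            // agreed, versioned XML schema; this stub just echoes the request.
            return "<characterisation uri=\"" + uri + "\"/>";
        }
    }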
The Challenges • Reproducible tool invocation across all contexts (one possible abstraction is sketched below): • CLI, Java, SOAP/REST • But ease of development and deployment is critical • Interoperable data formats and consistent semantics across contexts where required • So clients can understand tool outputs correctly • Tomorrow we’ll explore the proposed integration plan
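One way to picture the reproducibility challenge: a single Java abstraction that every context (CLI wrapper, in-process library, SOAP/REST client) could implement, so the same inputs yield comparable outputs everywhere. The names below are a hypothetical sketch, not a proposed SCAPE interface.

    import java.io.File;
    import java.util.Map;

    // Hypothetical abstraction: one invocation contract that could be backed
    // by a local CLI call, an in-process Java library, or a SOAP/REST client.
    public interface ToolInvocation {

        /** Run the tool on one input file with the given parameters. */
        ToolResult invoke(File input, Map<String, String> parameters) throws Exception;

        // A minimal, context-independent result: exit status plus named outputs,
        // so any client can interpret tool output the same way.
        public static class ToolResult {
            public final int exitCode;
            public final Map<String, String> outputs;

            public ToolResult(int exitCode, Map<String, String> outputs) {
                this.exitCode = exitCode;
                this.outputs = outputs;
            }
        }
    }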