Exploring ‘Workspaces’. Tom Visser, SARA compute and networking services, Amsterdam. Garching Workshop, 21st September 2010. Background. Overview of cases. Technical possibilities. Opportunities and risks. Expected results. Proposed approach. The CLARIN-NL connection.
Exploring ‘Workspaces’ Tom Visser, SARA compute and networking services, Amsterdam Garching Workshop 21st September 2010
Background • Overview of cases • Technical possibilities • Opportunities and risks • Expected results • Proposed approach
The CLARIN-NL connection • Seeking to create an infrastructure for language resources • Providing access to tools and technologies • CLARIN-NL and BiG Grid are exploring possibilities • The WHOLE pipeline • Creating • Curation • Collecting • DO SCIENCE • Depositing
Already • SARA has developed a client implementation of a Persistent Identifier Service (HANDLE) and has become an EPIC consortium member • Instance of the service currently hosted at SARA • BiG Grid / SURFNET pilot with a short-lived credential service • Activities with Computational Linguistics (e.g. Named Entity Recognition) & the forthcoming Computational Humanities institute (KNAW) • Series of workshops to find common ground between BiG Grid and the CLARIN infrastructure
Questions of today • When is a user workspace service? • Why do we need user workspaces? • What are their characteristics in a distributed environment? • How do we support processing chains in distributed environments driven by community environments? • Are there generic frameworks for the execution of distributed processing chains and deployment of web services?
Core problems • Where to store • How to store • How to access • How to foster collaboration amongst people • How to support: Data discovery, exploration and exploitation • How to realize such a service • What SLA / service description / responsibilities
What it should be • A temporary storage place (days, weeks, years) • Global home / global scratch • A ‘logical mount point’ • Accessible by web services • Meaningfully accessible by a human • Autonomy to communities • Instantiate • Content • Control • Identifiable • Store digital objects and metadata • Journaling (register interactions)
Create • Read • Write • Update • Grant access to (Authorization) • List contents • Search contents • Adopting & offering known best practices and services in the ecosystem • …
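The operations listed above (create, read, write, update, grant access, list, search, plus journaling and an identifier) can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the design proposed in the talk: the names `Workspace`, `DigitalObject`, `grant_read` and the single-owner permission model are all hypothetical, and a random UUID stands in for a real persistent identifier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class DigitalObject:
    """A stored object together with its metadata."""
    content: bytes
    metadata: dict


@dataclass
class Workspace:
    """Hypothetical in-memory workspace; every name here is illustrative."""
    owner: str
    # Assumption: a random UUID stands in for a persistent identifier.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    objects: dict = field(default_factory=dict)
    readers: set = field(default_factory=set)
    journal: list = field(default_factory=list)

    def _log(self, user, action, name=None):
        # Journaling: register every interaction with a UTC timestamp.
        self.journal.append((datetime.now(timezone.utc), user, action, name))

    def _check(self, user, write=False):
        # Assumed policy: the owner may do everything, readers may only read.
        allowed = user == self.owner or (not write and user in self.readers)
        if not allowed:
            mode = "write to" if write else "read"
            raise PermissionError(f"{user} may not {mode} this workspace")

    def grant_read(self, user, grantee):
        """Grant read access (authorization) to another user."""
        self._check(user, write=True)
        self.readers.add(grantee)
        self._log(user, "grant_read", grantee)

    def write(self, user, name, content, **metadata):
        """Create a new object or update an existing one."""
        self._check(user, write=True)
        self.objects[name] = DigitalObject(content, dict(metadata))
        self._log(user, "write", name)

    def read(self, user, name):
        self._check(user)
        self._log(user, "read", name)
        return self.objects[name].content

    def list_contents(self, user):
        self._check(user)
        self._log(user, "list")
        return sorted(self.objects)

    def search(self, user, **criteria):
        """Return names of objects whose metadata matches all criteria."""
        self._check(user)
        self._log(user, "search")
        return sorted(
            name for name, obj in self.objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


# Usage: the owner stores an object, then grants a collaborator read access.
ws = Workspace(owner="alice")
ws.write("alice", "minutes.txt", b"example content", language="nl")
ws.grant_read("alice", "bob")
print(ws.list_contents("bob"))
print(ws.search("bob", language="nl"))
```

A real service would replace the dictionary with a storage back end and the owner/reader sets with an external identity provider; the sketch only shows how the listed operations relate to each other.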
Considered technical possibilities • iRODS • Cloud platform (SNIA/CDMI) • Hadoop implementation • Amazon S3 / OpenCloud / Azure /
Risks and opportunities • Creating something that is only generic - specific • Looking uphill, but what will you know when you’ve climbed the hill? • Knowledge of the community • Epistemological problems • Bootstrapping • Trust • Process focus: we are starting a small-scale pilot within 1 month, with short iterations, keeping everyone involved.
Approach: BiG Grid and Dutch partners • Many interesting addressable cases • Keyword extraction from Dutch audio and film institute • MPI video repository annotations • City of Den Haag government proceedings: minutes and video alignment (feature extraction) • OCR & machine learning on Dutch handwriting • Expected results • Common understanding of a workspace service • Bootstrap implementation vertically crossing all layers
When is a user workspace service? • When it is used and has become an indispensable tool • Why do we need user workspaces? • To be able to work flexibly with data • Initiate collaborations • Have a trustworthy storage resource available • What are their characteristics in a distributed environment? • Clear core functionality, many service providers, integration with identity providers • How do we support processing chains in distributed environments driven by community environments? • By having open, known, and easily accessible services • Are there generic frameworks for the execution of distributed processing chains and deployment of web services? • Yes!