PAWN Test III
Producer – Archive Workflow Network (PAWN)
• Distributed and secure ingestion of digital objects into the archive.
• Use of web/grid technologies – platform independent.
• Ease of integration with data grids or digital libraries.
• XML representation of metadata and bitstream.
• Self-describing bitstream submissions.
• Accountability of transfer and guarantee of data integrity.
Previous workflow
• Negotiate Submission Agreement.
• Create Submission Information Package (SIP).
• Submit SIP metadata for approval.
• Transfer SIPs to receiving servers after approval.
• Validate the SIP transfer.
• Organize data into collections and transfer it into the distributed archive.
Changes
• Role of the manager is much more significant.
• Client requirements reduced.
• No separate submit-and-approve interaction.
• Multiple domains per management server.
• Stateless receiving server; the manager can review submissions.
• New model for representing record schedules and file plans.
Workflow Overview: Producer
• Producer has records that need to be archived.
• PAWN presents the producer with the list of record sets the producer is authorized to archive.
• Producer selects the relevant record set and selects the list of data to be archived.
• PAWN builds package and sends it to the receiving server.
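The package-building step, together with the goals of XML representation and guaranteed data integrity, could be sketched roughly as follows. This is a minimal illustration, not PAWN's actual package format: the element names (`package`, `file`, `sha256`, `size`), the choice of SHA-256, and the functions `build_manifest` and `verify_manifest` are all assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_manifest(record_set, files):
    """Build an XML manifest for a package.

    `files` maps a relative path to (category, content bytes).
    Element and attribute names are hypothetical, not PAWN's schema.
    """
    root = ET.Element("package", {"recordSet": record_set})
    for path in sorted(files):
        category, content = files[path]
        item = ET.SubElement(root, "file", {"path": path, "category": category})
        # A checksum lets the receiving server verify integrity after transfer.
        ET.SubElement(item, "sha256").text = hashlib.sha256(content).hexdigest()
        ET.SubElement(item, "size").text = str(len(content))
    return ET.tostring(root, encoding="unicode")


def verify_manifest(manifest_xml, files):
    """Return the paths whose received content does not match the manifest."""
    mismatched = []
    for item in ET.fromstring(manifest_xml).findall("file"):
        path = item.get("path")
        received = files.get(path)
        content = received[1] if received else None
        if content is None or hashlib.sha256(content).hexdigest() != item.findtext("sha256"):
            mismatched.append(path)
    return mismatched
```

In this sketch the producer side would call `build_manifest` before sending, and the receiving side would call `verify_manifest` on what arrived; an empty list means every file matched its recorded checksum.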
Example: Producer
• Works at the Patuxent Wildlife Research Center (domain).
• Research scientist for the migratory bird population databases.
• Available record sets:
  • Breeding Bird Census
  • Breeding Bird Survey
  • Avian Point Counts
  • Breeding Bird Atlas Explorer
Example: Record Submission
• Submission of the Avian Point Count data published for 2001 (record set).
• Presented with the following categories:
  • Raw Tabular Data
  • Summarized Tabular Data
  • FGDC Metadata
• Producer selects the relevant files and directories for each category.
Authorities
• Categories map to authorities in a record schedule.
• Data Layers by USGS (1201-01c)
  • Raw Tabular Data
  • Summarized Tabular Data
• Documentation (1201-01e)
  • FGDC Metadata
• Producer is not required to know (but can know) the mapping between authorities and categories.
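The category-to-authority mapping can be pictured as a simple lookup table. The sketch below uses the authority codes and category names from the example; the dictionary layout and the functions `authority_for` and `group_by_authority` are hypothetical and only illustrate how a producer can work with categories while the system resolves authorities.

```python
# Authorities from the record schedule example.
AUTHORITIES = {
    "1201-01c": "Data Layers by USGS",
    "1201-01e": "Documentation",
}

# Category -> authority mapping for one record set, as a manager might define it.
CATEGORY_TO_AUTHORITY = {
    "Raw Tabular Data": "1201-01c",
    "Summarized Tabular Data": "1201-01c",
    "FGDC Metadata": "1201-01e",
}


def authority_for(category):
    """Resolve the authority code and title for a producer-facing category."""
    code = CATEGORY_TO_AUTHORITY[category]  # KeyError if the category is unmapped
    return code, AUTHORITIES[code]


def group_by_authority(selection):
    """Group a producer's {category: [paths]} selection by authority code."""
    grouped = {}
    for category, paths in selection.items():
        code, _ = authority_for(category)
        grouped.setdefault(code, []).extend(paths)
    return grouped
```

The producer only ever supplies category names; the lookup happens on the system side, which matches the point that the producer need not know the mapping.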
Workflow Overview: Managers
• Create record schedules with authorities.
• Create accounts for producers authorized to archive.
• Create record sets to limit producers to specific archive duties.
• Map record set categories to record schedule authorities.
• Review packages submitted by producers.
Domains
• Logical unit of administration and delegation.
• Domain contains its own set of:
  • Producers
  • Record schedules
  • Record sets
  • Managers
• Each producer belongs to a domain.
• Each manager controls actions within a specific domain.
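One way to picture a domain is as a container that scopes producers, managers, and record sets. The sketch below is an assumed data model, not PAWN's implementation: the `Domain` class, its fields, and its methods are hypothetical, record schedules are left out for brevity, and the account names in the usage example are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """One unit of administration with its own producers, managers, and record sets."""
    name: str
    producers: set = field(default_factory=set)
    managers: set = field(default_factory=set)
    # record set name -> producers allowed to submit to it
    record_sets: dict = field(default_factory=dict)

    def add_record_set(self, manager, record_set, allowed_producers):
        """Let a manager of this domain create a record set for some producers."""
        if manager not in self.managers:
            raise PermissionError(f"{manager} does not manage {self.name}")
        unknown = set(allowed_producers) - self.producers
        if unknown:
            raise ValueError(f"not producers in {self.name}: {sorted(unknown)}")
        self.record_sets[record_set] = set(allowed_producers)

    def record_sets_for(self, producer):
        """List the record sets a producer is authorized to archive."""
        return sorted(rs for rs, who in self.record_sets.items() if producer in who)
```

For example, with hypothetical accounts `alice` (producer) and `bob` (manager):

```python
pwrc = Domain("Patuxent Wildlife Research Center", producers={"alice"}, managers={"bob"})
pwrc.add_record_set("bob", "Avian Point Counts", ["alice"])
pwrc.record_sets_for("alice")  # the list PAWN would present to this producer
```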
Overview: Administrators
• Create organization hierarchy and domains.
• Create managers for domains.
• Can perform the activities of any manager.
Overview: Security
• Separate realms of security:
  • Archive
  • Each management server
• Trust between management server and archive.
• Management server issues a SAML assertion to the client software that contains:
  • Domain and identity
  • Current role (admin, producer, manager, etc.)
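The way a receiving component might use the assertion's contents can be sketched as below. This is deliberately not real SAML: the `Assertion` class is a simplified stand-in for the claims listed above (domain, identity, role), the `issuer` and `not_after` fields and the `may_submit` function are assumptions, and signature verification, which a real deployment would need, is omitted.

```python
from dataclasses import dataclass
import time


@dataclass(frozen=True)
class Assertion:
    """Simplified stand-in for the claims carried by a SAML assertion."""
    issuer: str        # management server that issued the assertion
    domain: str
    identity: str
    role: str          # e.g. "admin", "producer", "manager"
    not_after: float   # expiry as a Unix timestamp (assumed field)


def may_submit(assertion, trusted_issuers, package_domain, now=None):
    """Decide whether the bearer may submit a package for `package_domain`."""
    now = time.time() if now is None else now
    return (
        assertion.issuer in trusted_issuers      # archive trusts this management server
        and assertion.not_after > now            # assertion has not expired
        and assertion.role == "producer"         # simplification: only producers submit
        and assertion.domain == package_domain   # producers act only within their domain
    )
```

Because the decision depends only on the assertion and a list of trusted issuers, the checking side needs no per-user state, which fits a design where the management server and the archive are separate security realms linked by trust.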
Overview: Receiving Server
• Mostly disposable; configuration stored on scheduler.
• Can migrate data between central resource (SRB) and local cache.
• Can handle packages from multiple producing sites.
• Packages contain authorization information.
Test Collections
• NARA Collections
• Model current record schedules with focus on producer ingestion
• ARL collection
• Bush library
• Test to see how PAWN works with non-scheduled data sets.