DIGITAL ANTIQUITY: Planning a Digital Information Infrastructure for Archaeology
Grant Period: 7/1/2007-6/30/2008
ARCHAEOINFORMATICS.ORG

Steering Committee
Keith Kintigh, Convener, Arizona State University
Jeffrey Altschul, SRI Foundation
Tim Kohler, Washington State University
Fred Limp, University of Arkansas
Julian Richards, University of York
Dean Snow, The Pennsylvania State University
also
John Howard, Arizona State University
C. Lee Giles, The Pennsylvania State University

DIGITAL ANTIQUITY: Planning a Digital Information Infrastructure for Archaeology
Planning Grant In Progress (7 months)
Goals: Preservation, Discovery & Access, & Data Integration
Scope:
• Newly Created and Legacy Data
• CRM & Academic Archaeology
• Documents, Databases, and Images, + Plan for Geospatial & Exotic
• Initial Context: Americanist Archaeology
Outline:
• Needs and Vision
• Organization
• Assessment
• Prototyping Platform & Tools
• Jump-starting Case Studies
• Current State of Planning: Technical, Social, Financial

Disciplinary Needs for Broad-based Information Infrastructure
• Explosion of digital information
  • $1 Billion/year in Archaeology (US)
  • 50,000-100,000 reports/year, 1000s of databases (US)
• Discovery & Full Access
  • Lack of availability (on-line or otherwise) of information resources
  • Absence of intelligent discovery tools
  • Problem of data standardization
  • Lack of tools to enable semantic integration
• Digital Preservation Problems
  • Absence of existing facilities for preservation
  • Media degradation & software obsolescence
  • Degradation & loss of data semantics (metadata)

Professional Ethics
SAA Principle No. 5: Intellectual Property
“Intellectual property, as contained in the knowledge and documents created through the study of archaeological resources, is part of the archaeological record. As such it should be treated in accord with the principles of stewardship rather than as a matter of personal possession.”
SAA Principle No. 7: Records and Preservation
“Archaeologists should work actively for the preservation of, and long term access to, archaeological collections, records, and reports.”
Society for American Archaeology - Principles of Archaeological Ethics 1996

Archaeoinformatics.org Vision
• Sustainable, general-purpose infrastructure
• New and legacy digital archaeological data
• Goals for the infrastructure
  • Preservation (with Versioning and Persistent Cite-ability)
  • Discovery & Access
  • Data Integration
• Interoperability: discovery and download with related infrastructures
• Registration of Extensive, Machine Processable Metadata
• Integrated in workflows of those generating the data
• Initial focus on delivering tools for:
  • Text (Gray Literature)
  • Databases
  • Images
• Work with ADS and others on metadata standards
  • Fostering interoperability
  • For Photogrammetry/Geospatial/CAD/Remote Sensing, LiDAR, Laser Scanning (HDS), Geophysical Data

Outcomes
Advance humanistic understandings of the past
• Sustainable and reliable digital data & metadata preservation
• Ability to re-evaluate hypotheses and arguments
• Improved research through increased reuse of existing data
• Enable large-scale & synthetic research
• Time & cost savings
• More effective use of research $
• Expanded availability to the broader public

Organization: Steering Committee
Keith Kintigh, Past President SAA, Arizona State University
Jeff Altschul, Past Treasurer SAA, SRI Foundation
Tim Kohler, Past Editor, American Antiquity, Washington State University
Fred Limp, Past Treasurer SAA, University of Arkansas
Julian Richards, University of York (UK)
Dean Snow, President SAA, The Pennsylvania State University

Board of Directors
Brian Crane, DOD/Versar, Inc.
Katherine Emery, Florida Museum of Natural History, University of Florida
Sebastian Heath, Archaeological Institute of America
Eric Kansa, University of California at Berkeley, Alexandria Archive Institute
Francis McManamon, Chief Archaeologist, National Park Service
Worthy Martin, University of Virginia
Fraser Neiman, Thomas Jefferson Foundation, University of Virginia
Vincas Steponaitis, Past President SAA, Research Laboratories of Anthropology, University of North Carolina
Herbert Van de Sompel, Los Alamos National Laboratory
Phillip Walker, Past President AAPA, University of California at Santa Barbara
Willeke Wendrich, University of California at Los Angeles
Thomas Whitley, Brockington & Associates
Mellon Funded Project Participants

NSF-Access Grid Virtual Lectures
• Eric C. Kansa - Executive Director of the Alexandria Archive Institute
  • "Open Context: Community Tools for Publishing Research Data on the Web"
• Chaitan Baru - Director of Science Research and Development at the San Diego Supercomputer Center
  • "GEON: Geosciences Network"
• Michael J. Halm (& John Yoo) - Senior Strategist and Manager for the Special Project activities for the Teaching and Learning with Technology group, Penn State University
  • "LionShare: Secure P2P File Sharing and Collaboration"
• Mark Gahegan (& Chaitan Baru, Boyan Brodaric) - Professor of Geography and affiliate professor of Information Science and Technology at the Pennsylvania State University
  • "Sharing our resources, sharing our understanding: Cyberinfrastructure for Archaeology"
• Fred Limp - Leica Chair and Director, Center for Advanced Spatial Technologies, University of Arkansas
  • "Interoperability and net-centric architectures: lessons for archaeoinformatics from the Open Geospatial Consortium"
• Mark Schildhauer - National Center for Ecological Analysis and Synthesis, Santa Barbara
  • "Ecological informatics: challenges and approaches, and potential relevance for archaeology"
• Julian D Richards - Professor of Archaeology, University of York and Director, Archaeology Data Service
  • "Current challenges for digital preservation and delivery"
• Ian Johnson - Archaeological Computing Laboratory, University of Sydney
  • "ECAI: The snowball still survives"
• Katherine Skinner - Digital Projects Librarian at the Emory University Libraries
  • "Collaborative Adventures in Distributed Digital Preservation: The MetaArchive Cooperative and the Educopia Institute"

Assessment: On-Line Survey
User Needs and Attitudes
• 270 responses primarily from members of the SAA’s Digital Data Interest Group
• 94% responded that documentation of the archaeological record is being lost
• 94% responded that they would use electronic data more if it were accessible
• 90% responded that it is the responsibility of a project sponsor to fund and ensure curation of databases
• More than 60% responded that users should not be charged access fees

Phase I - Case Studies
Goal: Demonstrate research value to the community
Criteria for implementation case studies
• driven by compelling research questions
• executed by multi-institutional cooperatives
• at least one international
• at least one with large component of legacy data
• at least one with large component of recent CRM data
Southwest
• Dolores Archaeological Project
• Bandelier Archaeological Excavation Project
Central & West Mexico
• Teotihuacan Mapping Project
• La Quemada, Zacatecas
Fauna – Southwest & Midwest (NSF)

Strategic Emphases
• Access & Discovery
• Preservation & Archiving
• Interoperability
• Leveraging existing open source initiatives
• Incorporating Web 2.0 characteristics
• Building on ADS and European experience
• Develop next generation shared infrastructure
• Prototypes allow us to assess potential designs
  • Platform
  • Tools
  • Institutional Structure

Staged Implementation - Level 1
• Interoperable gateway
  • Development/adoption of publish and discovery specifications and tools for federated search
  • Package & search high-level metadata (not looking inside resources)
• Creation of preservation archives
• Development/adoption of best practices for workflows
  • Building collaboratively on ADS “Guides to good practice”
• “Test bed” pilot projects
  • Focused on “high value” data sets & information
• Investigate automated search and ontology development tools/strategies
  • e.g., Lagoze & Van de Sompel (interoperability)

Staged Implementation: Longer Term - Level 2
• Address complex semantic & ontology issues
• Engage expert and institutional groups
  • Workshops, etc.
• Develop/adopt semantic web tools to assist semantic mapping & ontology development
• Integrate complex document search
• Semantic database integration
• Image and location search

Publish and Discovery
• Interoperate using core metadata specs including ability to access actual data
  • e.g., what, where, when, permissions/control
• Adopt existing metadata discovery & aggregation toolkits
  • English Heritage Gateway
  • ARENA - Archaeological Record of Europe Networked Access
  • MIDAS/CIDOC (ISO 21127) Harmonization
  • Consistent w/ OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
• Assign unique and persistent addresses for resources
• Institutional adoption of specifications & best practices
  • e.g., Fedora, DSpace,…?
• Community Building
  • Workshops and evangelization

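As a rough illustration of what harvesting over OAI-PMH involves, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `set`, and `from` arguments come from the OAI-PMH specification; the endpoint `repository.example.org` and the helper name `list_records_url` are hypothetical and are not part of the project described here.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    oai_dc (unqualified Dublin Core) is the metadata format every
    OAI-PMH repository is required to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set
    if from_date:
        params["from"] = from_date    # optional: selective harvesting by date
    return base_url + "?" + urlencode(params)


# Ask for all Dublin Core records added or changed since 1 January 2008.
url = list_records_url(BASE_URL, from_date="2008-01-01")
print(url)
```

A harvester would fetch this URL, parse the XML response, and follow any `resumptionToken` the repository returns until the list is complete.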
Tentative Archive Architecture
• Base Platform
  • Central metadata catalog
  • Search & discovery
  • Trusted repository for information resources
  • Expanded access to distributed resources if registered
• Federated Repositories (Branding)
  • running same software stack
• Discovery & Access through Other Repositories
  • Based on metadata sharing standard (e.g., OAI)

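One way to picture the central metadata catalog is as an index of high-level records (what, where, when, permissions) that points at resources held in any registered repository. The following is a minimal sketch under that assumption; the class names, fields, and sample records are invented for illustration and do not describe the planned system.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    identifier: str   # persistent address of the resource
    repository: str   # federated repository that holds the file
    what: str         # kind of resource / subject
    where: str        # region
    start_year: int   # earliest date covered
    end_year: int     # latest date covered
    public: bool = True  # permissions/control flag


class MetadataCatalog:
    """Central catalog: searches metadata only, never resource contents."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, what=None, where=None, year=None):
        hits = []
        for r in self._records:
            if not r.public:
                continue  # private records are not discoverable
            if what and what.lower() not in r.what.lower():
                continue
            if where and where.lower() != r.where.lower():
                continue
            if year is not None and not (r.start_year <= year <= r.end_year):
                continue
            hits.append(r)
        return hits


# Hypothetical sample records from two federated repositories.
catalog = MetadataCatalog()
catalog.register(CatalogRecord("rec-1", "repo-a", "faunal database", "Southwest", 600, 1300))
catalog.register(CatalogRecord("rec-2", "repo-b", "excavation report", "Southwest", 1100, 1400))
catalog.register(CatalogRecord("rec-3", "repo-b", "faunal database", "Midwest", 900, 1200, public=False))

print([r.identifier for r in catalog.search(where="southwest", year=1200)])
```

Because the search touches only the catalog, a repository can join the federation by registering records without moving its files.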
Prototyping: Archive Platform
• Prototype platform for an open source, Internet accessible archaeological information infrastructure (NSF Funded)
• tDAR already provides basic preservation, discovery and access functions
• tDAR provides concept-oriented access and semantic integration across datasets
• tDAR focuses on databases but also processes text and images
• One element of an international federated structure

tDAR’s Approach
• Users: Anyone may register; approval for contributors
• Register Project & Resources to tDAR Metadata Catalog
  • Goal: Preserve the original semantics of data
  • Project & Resource metadata (extended Dublin Core)
  • Extensive machine processable metadata at the level of data tables, columns, and values
• Upload Files (or Point to Distributed Resources)
  • Text files in ASCII or PDF; images in JPG and TIFF
  • Databases ingested as Access®, Excel®, or CSV files, then converted to PostgreSQL for search integration & maintenance
• Search: metadata or resource content (db or text)
  • Add ontology-driven concept-oriented search
  • Add search & download to other infrastructures, such as ADS or OpenContext
• Download: Resources for further analysis
  • Add semantic integration across databases, output integrated databases
  • Complete citation information

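To make "metadata at the level of data tables, columns, and values" concrete, here is a small sketch of how such metadata could be modeled so that coded values keep their original meaning. The class names, the Dublin Core subset, and the sample coding sheet are assumptions for illustration; this is not tDAR's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    description: str
    # Coding sheet: maps each coded value to its documented meaning.
    value_labels: dict = field(default_factory=dict)


@dataclass
class TableMetadata:
    name: str
    columns: dict = field(default_factory=dict)

    def add_column(self, column):
        self.columns[column.name] = column

    def decode_row(self, row):
        """Replace coded values with their documented meanings."""
        decoded = {}
        for col, value in row.items():
            meta = self.columns.get(col)
            labels = meta.value_labels if meta else {}
            decoded[col] = labels.get(value, value)  # uncoded values pass through
        return decoded


@dataclass
class ResourceMetadata:
    # A few Dublin Core elements; a real record would carry more.
    title: str
    creator: str
    date: str
    tables: list = field(default_factory=list)


# Hypothetical faunal table with a coded taxon column.
fauna = TableMetadata("fauna")
fauna.add_column(ColumnMetadata("TAXON", "Taxonomic identification",
                                {"1": "deer", "2": "jackrabbit"}))
fauna.add_column(ColumnMetadata("COUNT", "Number of identified specimens"))
resource = ResourceMetadata("Example faunal database", "Example Project", "2008", [fauna])

print(fauna.decode_row({"TAXON": "2", "COUNT": 14}))
```

Keeping the coding sheet with the table means a later user can read `"2"` as `"jackrabbit"` without the original analyst's paper key.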
tDAR: Semantic Integration
• tDAR will reconcile the semantic demands of a query with the semantic content of the available datasets (rather than global reconciliation of data sources)
• tDAR uses query-driven, ad-hoc data integration in which, given a query, it will
  • identify relevant data sources
  • reason with potentially incomplete or inconsistent information
  • perform interactive, on-the-fly metadata matching to align key portions of the metadata
  • interact, as necessary, with the user
• Expands on ADS capabilities
• Open source code available for reuse

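The query-driven idea can be sketched in a few lines: rather than reconciling every dataset up front, only the sources that can answer a given query are aligned, and values with no known mapping are handed back for the user to resolve. The function `integrate`, the dataset layout, and the sample codes below are hypothetical; this is a toy under stated assumptions, not tDAR's algorithm.

```python
def integrate(query_concept, datasets):
    """Align only the sources relevant to one query concept.

    Each dataset carries its own mapping from local codes to shared
    concepts. Returns (matched rows, unmapped local values).
    """
    results, unmapped = [], []
    for ds in datasets:
        mapping = ds["mapping"]
        # Step 1: identify relevant sources for this query.
        if query_concept not in mapping.values():
            continue
        for row in ds["rows"]:
            value = row[ds["column"]]
            concept = mapping.get(value)
            if concept is None:
                # Step 2: incomplete metadata is reported, not silently dropped,
                # so the user can supply a mapping interactively.
                unmapped.append((ds["name"], value))
            elif concept == query_concept:
                results.append({"source": ds["name"],
                                "concept": concept,
                                "count": row["count"]})
    return results, unmapped


# Hypothetical datasets that code the same taxa differently.
datasets = [
    {"name": "site_a", "column": "taxon",
     "mapping": {"LEPUS": "jackrabbit", "SYLV": "cottontail"},
     "rows": [{"taxon": "LEPUS", "count": 12},
              {"taxon": "SYLV", "count": 30},
              {"taxon": "UNK", "count": 2}]},
    {"name": "site_b", "column": "species_code",
     "mapping": {1: "jackrabbit", 2: "deer"},
     "rows": [{"species_code": 1, "count": 5},
              {"species_code": 2, "count": 9}]},
    {"name": "site_c", "column": "taxon",
     "mapping": {"ODO": "deer"},
     "rows": [{"taxon": "ODO", "count": 7}]},
]

rows, needs_review = integrate("jackrabbit", datasets)
print(rows)
print(needs_review)
```

Here `site_c` is never touched because it cannot answer the query, and the unmapped `UNK` code from `site_a` is surfaced for review.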
Integrated search engine for archaeology
• Searches text, citations, maps, tables, locations, time
• Prototype data: 8,000 documents from JSTOR archaeology journals
• Leverages other open source projects - Lucene indexer
  • JSTOR metadata used for metadata extraction and indexing
  • ChemXSeer, chemistry, table extraction and indexing (at Penn State)
  • Will use aspects of CiteSeerX ingestion, indexing and crawling
• Table search and data extraction
  • Extract data from tables in an XML OAI format
  • For use in other experiments or data aggregation
  • Provide open source extraction tools for other systems
• Progress to date from a 6-month effort
• http://cxs02.ist.psu.edu:8080/archseer/

Implementation Case Studies
• Expand Planning Grant Case Studies
  • Spread of agricultural societies in southwestern and southeastern US
• New Case Studies
  • Arkansas Archaeological Survey
    • Exemplar of US comprehensive system
  • North Carolina Gray Literature
  • Open Context - Çatalhöyük, Petra, etc.
  • UCLA Encyclopedia of Egyptology
  • Global History of Health
  • SRI human skeletal scan data
  • Others?

Social Demands on a Sustainable Digital Infrastructure
• Credible Organizational Structure
  • Multi-institutional Board of Directors & Executive Committee
• Buy-in from CRM and academic communities
  • Ease of use
  • Address confidentiality of archaeological site locations
  • Allow data in the infrastructure to remain private for a time
• Buy-in from funding, reviewing, or permitting bodies
  • Assist in meeting accountability and management needs
  • Integrate registration in grant or compliance contract workflows
  • Automated checks of consistency & completeness
  • Project is not complete until Agency signs off on deposit
  • Strengthen US regulations (36CFR79) and formal guidance

Additional Social Demands on a Sustainable Digital Infrastructure
• Work with professional societies
  • Establishment of “industry standards” will help federal agencies mandate use
  • Require publication of digital data with journal articles
• Work with museums with responsibilities as digital data repositories
• Ensure proper credit is given to contributors
  • Citations with downloads
  • Usage statistics
  • Optional peer review
• Training: on-line and in person

Professional Society Buy-In
A related vision for databases developed by an NSF workshop and published in American Antiquity (Kintigh 2006) was endorsed by:
• Society for American Archaeology
• American Association of Physical Anthropologists
• Society for Historical Archaeology

SAA Digital Data Interest Group
Purpose: To promote the preservation and sharing of archaeological data maintained in digital form.
• The long-term conservation and protection of the archaeological record demands that we preserve digital documents, images, and databases, and make them available to other scholars in order to advance archaeological understandings of the past.
• The interest group will foster the development of shared digital archives of archaeological data. It will promote data sharing and preservation to the broader archaeological community and enhance communication and collaboration among data sharing initiatives.
• Contact: Eric Kansa (UC Berkeley)
• 796 Members to Date (>10% of SAA membership)

Financial Dimensions of a Sustainable Digital Infrastructure
• Infrastructure development and startup from grants
• Revenues to maintain cyberinfrastructure from:
  • contracts with federal and state agencies to maintain and to provide access to publicly-funded archaeological data
  • disintermediation - capture savings from academic and CRM projects (e.g., 0.5% of $1 billion, or $5 million)
• Fundraising to develop a long-term endowment to support the cyberinfrastructure
• To the extent possible, user fees will not be employed
• Time to operational solvency: 5-6 years?

Acknowledgments
Support from
• The Andrew W. Mellon Foundation
• NSF Grant IIS 0624341
• Steering Committee Institution Teams
• Disciplinary & Technical Advisory Board Members
Partners