
Delivering Data For New Generations of Research

Delivering Data For New Generations of Research: Strategies and Challenges. Jeremy York, NISO/BISG Forum, ALA 2010.

Presentation Transcript


  1. Delivering Data For New Generations of Research: Strategies and Challenges • Jeremy York • NISO/BISG Forum, ALA 2010

  2. Introduction • Digital Repository • Initial focus on digitized book and journal content • “Light” archive • Collections and Collaboration • Comprehensive collection • Shared strategies • Local services • Public Good

  3. Content Distribution • 6,173,575 – Total • 1,177,667 – Public Domain • * As of June 15, 2010

  4. Language Distribution (1) * As of June 15, 2010

  5. Language Distribution (2) The next 40 languages make up ~13% of total * As of June 15, 2010

  6. Originating Institution * As of June 15, 2010

  7. Content over time * As of June 15, 2010

  8. Content Growth

  9. Data Distribution & APIs • OAI-PMH • Metadata files • Bibliographic API • Data API
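A minimal sketch of how a client might build requests against the OAI-PMH interface listed on slide 9. The verb and argument names (`ListRecords`, `metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH standard; the endpoint URL and the set name are assumptions made up for illustration and do not appear in the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not taken from the slides.
BASE_URL = "https://repository.example.org/oai"


def build_oai_request(verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


def list_records_url(metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build a ListRecords URL, either a first request or a continuation."""
    # In OAI-PMH, resumptionToken is an exclusive argument: a continuation
    # request carries only the verb and the token.
    if resumption_token is not None:
        return build_oai_request("ListRecords", resumptionToken=resumption_token)
    params = {"metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return build_oai_request("ListRecords", **params)


# "public_domain" is a made-up set name for the example.
print(list_records_url(set_spec="public_domain"))
```

A harvester would fetch the first URL, read any `resumptionToken` in the response, and repeat with the continuation form until no token is returned.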

  10. Extended Services • Community Development Environment • Non-Google Ingest • Non-Book/Non-Journal Ingest • Computational Research

  11. Strategies for Computational Research • Data distribution • Protocol-based access • Research Center

  12. SEASR Architecture • [Diagram; labels as extracted:] Visualizations User Interfaces Apps Plugins Web Apps Services Meandre Workbench Repositories Data Analysis Components Flows Meandre Data-Intensive Flows Components Developer Tools Data Analytics Visualization Component Repository Component Discovery Meandre Infrastructure Virtualization Infrastructure Cloud Computing

  13. SEASR @ Work – Tag Cloud • Count tokens • Filter options supported • Stem words
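The three tag-cloud steps on slide 13 (count tokens, filter, stem) can be sketched as follows. This is not the SEASR component itself: the stop-word list is a small illustrative subset, and `naive_stem` is a crude suffix stripper standing in for whatever stemmer the real flow uses.

```python
import re
from collections import Counter

# Illustrative subset of stop words; a real filter list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}


def naive_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag_cloud_counts(text, stop_words=STOP_WORDS, stem=True):
    """Tokenize, drop stop words, optionally stem, and count what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in stop_words]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return Counter(tokens)


counts = tag_cloud_counts("The knights walked and the knight walks in the castle")
print(counts.most_common())
```

The resulting counts would drive the font size of each term in the cloud; stemming merges "knights" with "knight" and "walked" with "walks" so variants are not counted separately.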

  14. SEASR @ Work – Entity Mash-up • Entity Extraction with OpenNLP or Stanford NER • Locations viewed on Google Map • Dates viewed on Simile Timeline

  15. SEASR @ Work – Entities To Network • Identify entities • Define relationships between entities within same sentence
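A rough sketch of the second step on slide 15, relating entities that occur in the same sentence. It assumes the entities have already been identified (here they are simply passed in as a set of names), uses a naive regular-expression sentence splitter, and matches entities by substring; all of these are simplifications for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, entities):
    """Count how often each pair of known entities shares a sentence."""
    edges = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = sorted(e for e in entities if e in sentence)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


text = (
    "Waverley met Fergus near Edinburgh. "
    "Fergus wrote to Flora. "
    "Waverley and Fergus rode north."
)
edges = cooccurrence_edges(text, {"Waverley", "Fergus", "Flora", "Edinburgh"})
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each counted pair becomes a weighted edge in the entity network; sorting the names within a pair keeps the edges undirected.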

  16. SEASR @ Work – Text Clustering • Clustering of Text by token counts • Filtering options for stop words, Part of Speech • Dendrogram Visualization
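To make "clustering of text by token counts" concrete, here is a small sketch using single-linkage agglomerative clustering with cosine distance. The slides do not say which clustering method or distance SEASR uses, so both are assumptions, and the stop-word and part-of-speech filters are omitted. The returned list of merges is the information a dendrogram would draw.

```python
import math
import re
from collections import Counter


def token_counts(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 minus the cosine similarity of two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(docs):
    """Single-linkage agglomerative clustering; returns the list of merges."""
    vectors = {name: token_counts(text) for name, text in docs.items()}
    clusters = [(name,) for name in docs]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    cosine_distance(vectors[x], vectors[y])
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy documents invented for the example.
docs = {
    "a": "castle knight castle sword",
    "b": "castle knight sword sword",
    "c": "ship sea storm sea",
}
merges = agglomerate(docs)
for left, right, distance in merges:
    print(left, right, distance)
```

Documents "a" and "b" share vocabulary and merge first at a small distance; "c" shares no tokens with either and joins last.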

  17. SEASR @ Work – Audio Analysis • NEMA: Executes a SEASR flow for each run • Loads audio data • Extracts features for every 10 sec moving window of audio • Loads and applies the models • Sends results back to the WebUI • NESTER: Annotation of Audio via Spectral Analysis
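The "extract features for every 10 sec moving window" step on slide 17 can be illustrated with a plain sliding window. The 5-second hop between windows and the choice of RMS energy as the feature are assumptions for this sketch; the slides specify only the 10-second window length.

```python
import math


def moving_windows(samples, sample_rate, window_sec=10.0, hop_sec=5.0):
    """Yield (start_time_sec, window) pairs over a list of samples."""
    size = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    # Only full-length windows are produced; a trailing partial window is skipped.
    for start in range(0, len(samples) - size + 1, hop):
        yield start / sample_rate, samples[start:start + size]


def rms(window):
    """Root-mean-square energy, a simple stand-in for real audio features."""
    return math.sqrt(sum(x * x for x in window) / len(window))


# Synthetic 30-second, 5 Hz sine tone at a low sample rate, invented for the example.
sample_rate = 100
samples = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(30 * sample_rate)]
features = [(t, rms(w)) for t, w in moving_windows(samples, sample_rate)]
for start_time, value in features:
    print(start_time, round(value, 4))
```

In a flow like the one described, each window's feature vector would then be passed to the loaded models for classification.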

  18. SEASR @ Work – Zotero • Plugin to Firefox • Zotero manages the collection • Launch SEASR Analytics • Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR • Zotero Export to Fedora through SEASR • Saves results from SEASR Analytics to a Collection • Launch MONK Processing • MONK DB Ingestion Workflow
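As an illustration of ranking authors in a citation network, the sketch below runs a basic power-iteration PageRank over (citing, cited) author pairs. The slides say only that JUNG's network importance algorithms are used, not which one, so PageRank is chosen here as one example of such a measure. The author names are invented, and parsing the RDF export from Zotero is left out.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (citing, cited) author pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by authors who cite nobody is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical citation pairs: (citing author, cited author).
citations = [("Ames", "Cole"), ("Baker", "Cole"), ("Cole", "Dunn"), ("Baker", "Dunn")]
ranking = sorted(pagerank(citations).items(), key=lambda kv: kv[1], reverse=True)
for author, score in ranking:
    print(author, round(score, 4))
```

Authors cited by well-cited authors rise to the top, which is why "Dunn", cited by the already well-cited "Cole", outranks "Cole" in this toy network.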

  19. SEASR @ Work – Emotion Tracking • Goal: a visualization of this type that tracks emotions across a text document (leveraging flare.prefuse.org)

  20. Sentiment Analysis: Visualization

  21. Person Extraction: Scott's Waverley, Ivanhoe, and The Heart of Midlothian

  22. Location Extraction • Top: Walter Scott's Waverley • Bottom: Maria Edgeworth's Castle Rackrent

  23. Thank you! • hathitrust-info@umich.edu • jjyork@umich.edu
