1 / 18

Maristella Feustle MOUG - Portland, Oregon January 30, 2018

The Life-Changing Magic of OpenRefine. The Open-Source Art of Data Decluttering and Organizing. Maristella Feustle MOUG - Portland, Oregon January 30, 2018. Today’s data sets. http:// goo.gl/P4zKkE. Why OpenRefine?.

sammy
Download Presentation

Maristella Feustle MOUG - Portland, Oregon January 30, 2018

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Life-Changing Magic of OpenRefine The Open-Source Art of Data Decluttering and Organizing Maristella Feustle MOUG - Portland, Oregon January 30, 2018

  2. Today’s data sets http://goo.gl/P4zKkE

  3. Why OpenRefine? Establishing meaningful relationships within and between datasets requires that all of the data are being read as intended. Use OpenRefine to: • Clean up data - standardize, correct, rearrange • Automate tedious editing • Match with outside controlled vocabularies • Prep and export data for use in programs like MarcEdit, Tableau, Gephi, Raw, CartoDB, Google FusionTables, and more

  4. Before and after

  5. A brief history Freebase Gridworks Google Refine OpenRefine GREL - General Refine Expression Language Many versions to choose from. Even “release candidate” versions are useful.

  6. Creating a project Data can be as simple as an Excel spreadsheet inventory, or even a text file Data “lives” on your computer - security, preservation pros and cons MarcEdit can export TSV or JSON files to OpenRefine. Memory issues with manipulations of very large files

  7. http://www.thinkgeek.com/product/6806/

  8. Then what? How you proceed depends on: • The final form you want your data to have • How much intervention your data requires to get there

  9. *Disclaimer Maristella mainly works in archival descriptive standards. She has not been evaluated by the FDA (FRBR’s Dedicated Aficionados) and is not intended to diagnose, treat, cure, or prevent any RDA-related maladies. Consult your cataloging professional for advice.

  10. Project 1: Tabular data Columns are the basis of organization in OpenRefine. If your data is already in rows and columns (Excel, CSV, TSV, etc.), there is little translation to do in importing a project: “What you see is what you get.” Check encoding for special characters. Specify a first row as header, if applicable.

  11. Basic transformations Moving things around Editing data Demonstration: Column order, sorting, facets, clustering.

  12. Lather, rinse, repeat Can extract command history to reuse on other datasets.

  13. Fun with GREL Many alternatives exist to writing code from scratch for common transformations: https://data-lessons.github.io/library-openrefine/04-basic-functions-II/ https://github.com/OpenRefine/OpenRefine/wiki/Recipes http://arcadiafalcone.net/GoogleRefineCheatSheets.pdf

  14. Project 2: MARCEdit Exports Source: http://www.dlib.vt.edu/projects/OAI/marcxml/marcxml.html Use MARCEdit / MARCBreaker to parse raw MARC. MARCEdit exports to OpenRefine as JSON and TSV. Use Facet function to isolate MARC fields.

  15. Project 3: Bibframe 1.0 Source: http://kcoyle.net/bibframe/book.rdf.xml

  16. Export TSV and other formats.

  17. Other resources: Terry Reese: MarcEdit and OpenRefine: http://blog.reeset.net/archives/1873 OpenRefine.org, book and videos: http://openrefine.org/ Free Your Metadata: http://freeyourmetadata.org/ OpenRefine Reconciliation services: http://refine.codefork.com/ More reconcilable data sources: https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources Overdue Ideas: “A Worked Example of Fixing Problem MARC Data”: http://www.meanboyfriend.com/overdue_ideas/?s=a+worked+example

  18. Thank you! Maristella.Feustle@unt.edu If you can read this, you don’t need glasses.

More Related