Processing Non-English Content

Presentation Transcript


  1. Processing Non-English Content • Andrew Weidner, NDNP New Mexico

  2. Overview • Vendors • Workflow • QR Tools • Alternatives

  3. Vendors • Communication: start early, ask questions

  4. Vendors • Communication: start early, ask questions • One language vs. Multiple languages

  5. Vendors • Communication: start early, ask questions • One language vs. Multiple languages • Processing Level

  6. Vendors • Communication: start early, ask questions • One language vs. Multiple languages • Processing Level (one language = title)

  7. Vendors • Communication: start early, ask questions • One language vs. Multiple languages • Processing Level (one language = title; multiple languages = title, reel, issue, page, article)

  8. Vendors • Communication: start early, ask questions • One language vs. Multiple languages • Processing Level (one language = title; multiple languages = title, reel, issue, page, article) • Pricing / Rework

  9. Workflow • Know your content: MARC record, essay research

  10. Workflow • Know your content: MARC record, essay research • Microfilm evaluation: confirmation / discovery

  11. Workflow • Know your content: MARC record, essay research • Microfilm evaluation: confirmation / discovery (best to find new content during film eval)

  12. Workflow • Know your content: MARC record, essay research • Microfilm evaluation: confirmation / discovery (best to find new content during film eval) • Batch QR: characterize content / check OCR quality

  13. Workflow • Know your content: MARC record, essay research • Microfilm evaluation: confirmation / discovery (best to find new content during film eval) • Batch QR: characterize content / check OCR quality (QR discovery = OCR rework)

  14. QR Tools • Command Line: discover new content

  15. QR Tools • Command Line: discover new content find . -name "*.xml" -exec grep -Hil "aviso" {} \;
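
The single-word search on slide 15 can be widened to several words at once. What follows is a minimal sketch, assuming a POSIX shell and a grep that supports -w (GNU and BSD grep both do). The function name `find_spanish_candidates` and the word list are illustrative assumptions, not part of the presentation.

```shell
#!/bin/sh
# Hypothetical helper: list the XML files under a directory that contain
# any of a few common Spanish words, with the number of matching lines.
# The word list is an illustrative assumption; adjust it to the content.
# Assumes file paths contain no ':' (the sort key depends on that).
find_spanish_candidates() {
    find "${1:-.}" -name "*.xml" \
        -exec grep -EHicw "aviso|para|los|del|una" {} + |
        grep -v ':0$' |          # drop files with no matching lines
        sort -t: -k2,2nr         # highest line counts first
}
```

Calling `find_spanish_candidates path/to/batch` (a placeholder path) prints one `file:count` line per file with at least one hit. The count is the number of matching lines, not the number of matching words, so it is only a rough signal for which pages deserve a closer look.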

  18. QR Tools • Command Line: locate & quantify encoded content

  19. QR Tools • Command Line: locate & quantify encoded content find . -name "*.xml" -exec grep -Ho "language=\"spa\"" {} \; | uniq -c
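
The pipeline on slide 19 reports how often `language="spa"` occurs in each file. To see every language code present in a batch, a variant can tally all values at once. This is a sketch under the same shell assumptions; `count_language_codes` is a hypothetical name, and the pattern assumes lowercase letter codes written as `language="xxx"`, as on the slide.

```shell
#!/bin/sh
# Hypothetical helper: tally every language="xxx" value found in the
# XML files under a directory, most frequent first.
count_language_codes() {
    find "${1:-.}" -name "*.xml" \
        -exec grep -oh 'language="[a-z]*"' {} + |  # one line per occurrence
        sort |                                      # group identical values
        uniq -c |                                   # count each group
        sort -rn                                    # largest count first
}
```

Sorting before `uniq -c` matters here because `uniq` only merges adjacent identical lines; without it, the same code coming from different files would be counted separately.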

  21. QR Tools • Web browser: check OCR accuracy

  24. Alternatives • ASCII Text Editor: edit pages

  25. Alternatives • ASCII Text Editor: edit pages • Find & Replace: edit entire issues/reels

  26. Alternatives • ASCII Text Editor: edit pages • Find & Replace: edit entire issues/reels (language="spa" → language="eng")

  27. Alternatives • ASCII Text Editor: edit pages • Find & Replace: edit entire issues/reels (language="spa" → language="eng") • Unencoded non-English content already on ChronAm?

  28. Alternatives • ASCII Text Editor: edit pages • Find & Replace: edit entire issues/reels (language="spa" → language="eng") • Unencoded non-English content already on ChronAm? Reprocess OCR & deliver overwrite content.

  29. Alternatives • ASCII Text Editor: edit pages • Find & Replace: edit entire issues/reels (language="spa" → language="eng") • Unencoded non-English content already on ChronAm? Reprocess OCR & deliver overwrite content. Unencoded content is discoverable in basic search.

  30. Alternatives • ASCII Text Editor: edit pages • Find & Replace: edit entire issues/reels (language="spa" → language="eng") • Unencoded non-English content already on ChronAm? Reprocess OCR & deliver overwrite content. Unencoded content is discoverable in basic search. Only encoded content is discoverable with language-specific Advanced Search.
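
The Find & Replace step from the Alternatives slides can also be run from the command line across a whole issue or reel directory. This is a minimal sketch, assuming a sed that accepts `-i.bak` (GNU and BSD sed both do). The function name `relabel_language` and its argument order are hypothetical, and the direction of the change is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: rewrite language="FROM" to language="TO" in every
# XML file under a directory. Each original file is kept beside the
# edited one with a .bak suffix so the change can be reviewed or undone.
# FROM and TO are assumed to be plain letter codes such as spa or eng.
relabel_language() {
    [ $# -eq 3 ] || { echo "usage: relabel_language DIR FROM TO" >&2; return 2; }
    find "$1" -name "*.xml" \
        -exec sed -i.bak "s/language=\"$2\"/language=\"$3\"/g" {} +
}
```

For example, `relabel_language path/to/issue spa eng` (a placeholder path) changes every `language="spa"` attribute in that directory tree. It is worth diffing a few files against their `.bak` copies before deleting the backups.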

  31. Questions?
