
FAMILY HISTORY TECHNOLOGY WORKSHOP February 3, 2012



Presentation Transcript


  1. Improving Indexing Efficiency & Quality: Comparing A-B-Arbitrate and Peer Review. FAMILY HISTORY TECHNOLOGY WORKSHOP, February 3, 2012. Derek Hansen, Jake Gehring, Patrick Schone, and Matthew Reid

  2. FamilySearch

  3. FamilySearch indexing

  4. A-B-Arbitrate process (A-B-ARB) [Diagram: indexers A and B key independently; disagreements go to ARB]
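The A-B-ARB flow on this slide can be sketched as follows. This is a minimal illustrative model, not FamilySearch's implementation; the function and variable names are hypothetical.

```python
# Sketch of the A-B-Arbitrate (A-B-ARB) workflow: two indexers key the same
# field independently; an arbitrator resolves any disagreement.

def ab_arbitrate(value_a: str, value_b: str, arbitrator) -> str:
    """Return the accepted value for one field.

    If indexers A and B agree, their shared value is accepted as-is;
    otherwise the experienced arbitrator (ARB) decides between them.
    """
    if value_a == value_b:
        return value_a
    return arbitrator(value_a, value_b)

# Example with a trivial arbitrator that always sides with indexer A.
result = ab_arbitrate("Smith", "Smyth", lambda a, b: a)
```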

  5. The problem

  6. Our approach • Historical Data Analysis • Field Experiment comparing quality control models

  7. Historical data analysis
     • Quality (estimated based on A-B agreement)
       – Measures difficulty more than actual quality
       – Underestimates quality, since an experienced arbitrator reviews all A-B disagreements
       – Good at capturing differences across people, fields, and projects
     • Time (calculated using keystroke-logging data)
       – Idle time is tracked separately, making actual time measurements more accurate
       – Outliers removed
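The two measures above can be sketched in a few lines. This is an illustrative reconstruction under assumed data shapes (lists of A/B value pairs and per-record keying times); the outlier threshold is a made-up parameter, not one stated in the deck.

```python
# Sketch of the slide's two historical measures: A-B agreement as a quality
# proxy, and mean keying time with outliers removed.

def ab_agreement_rate(pairs):
    """Fraction of (A, B) keyings that match exactly.

    As the slide notes, this proxies difficulty more than true quality,
    and it underestimates quality because an arbitrator later resolves
    every A-B disagreement.
    """
    matches = sum(1 for a, b in pairs if a == b)
    return matches / len(pairs)

def mean_active_time(times_seconds, max_seconds=600):
    """Mean keying time after dropping outliers above a cutoff.

    Idle time is assumed to have been excluded already by the
    keystroke logger; max_seconds is an illustrative threshold.
    """
    kept = [t for t in times_seconds if t <= max_seconds]
    return sum(kept) / len(kept)

rate = ab_agreement_rate([("Smith", "Smith"), ("Smyth", "Smith")])  # 0.5
```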

  8. A-B Agreement by field

  9. A-B agreement by language: 1871 Canadian Census • English: Given Name 79.8%, Surname 66.4% • French: Given Name 62.7%, Surname 48.8%

  10. A-B agreement by experience [Chart: Birth Place, all U.S. Censuses; axes: A and B experience (novice ↔ expert)]

  11. A-B agreement by experience [Chart: Given Name, all U.S. Censuses; axes: A and B experience (novice ↔ expert)]

  12. A-B agreement by experience [Chart: Surname, all U.S. Censuses; axes: A and B experience (novice ↔ expert)]

  13. A-B agreement by experience [Chart: Gender, all U.S. Censuses; axes: A and B experience (novice ↔ expert)]

  14. A-B agreement by experience [Chart comparing Canada (English), U.S. (English), Mexico (Spanish), and Canada (French)]

  15. Time & keystroke by experience

  16. Time & Keystroke of ARB

  17. A new approach? (A-R-ARB) • Peer review model • Efficiency ++ • Quality ?

  18. Peer review process (A-R-ARB) [Diagram: A → R → ARB; A's values are already filled in for the reviewer R; the ARB step is marked "Optional?"]
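The peer-review flow on this slide can be sketched as follows, assuming the reviewer R sees indexer A's value pre-filled and either keeps or corrects it, and that arbitration only runs when configured (the slide marks it "Optional?"). The names are hypothetical, not FamilySearch code.

```python
# Sketch of the A-R-ARB (peer review) workflow: one indexer keys the field,
# a reviewer checks the pre-filled value, and arbitration is optional.

def a_r_arb(value_a: str, review, arbitrator=None) -> str:
    """Return the accepted value for one field.

    review(value_a) returns the reviewer's (possibly corrected) value.
    If R keeps A's value, it is accepted directly. If R changes it and
    an arbitrator is configured, ARB decides; otherwise R's correction
    stands.
    """
    value_r = review(value_a)
    if value_r == value_a:
        return value_a                 # A and R agree; no arbitration
    if arbitrator is not None:         # optional ARB step
        return arbitrator(value_a, value_r)
    return value_r

# Example: reviewer corrects "Smyth" to "Smith"; no arbitrator configured.
accepted = a_r_arb("Smyth", lambda v: "Smith")
```

Compared with A-B-ARB, only disagreements require a second full keying decision, which is the efficiency gain the previous slide anticipates.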

  19. Field experiment • Develop a truth set of 2,000 images from the 1930 Census • Use historical A-B-ARB data • Create a new A-R-ARB dataset by having new indexers review and arbitrate • Compare quality & efficiency • Qualitatively identify types of errors
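The comparison step of the experiment can be sketched as a simple scoring function, assuming each model yields one final value per field that can be checked against the truth set. The record keys and values here are invented for illustration.

```python
# Sketch of scoring a quality-control model's output against a truth set:
# quality is the fraction of final field values that match the truth.

def score_against_truth(final_values: dict, truth: dict) -> float:
    """Fraction of final values that exactly match the truth set."""
    correct = sum(1 for key, v in final_values.items() if truth.get(key) == v)
    return correct / len(final_values)

# Hypothetical example: one of two fields keyed correctly.
truth = {"img1.surname": "Smith", "img1.given": "John"}
ab_arb_output = {"img1.surname": "Smith", "img1.given": "Jon"}
quality = score_against_truth(ab_arb_output, truth)  # 0.5
```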

  20. Discussion
     IMPLICATIONS
     • Transition users from novice to expert
     • Recruit foreign-language indexers
     • Intelligent matching based on expertise (in A-B-ARB and/or A-R-ARB)
     FUTURE POSSIBILITIES
     • Peer review by algorithms?
     • Initial indexing by algorithms?
