The Certainty of Citations

lucus

Presentation Transcript


  1. The Certainty of Citations: A proposal for an objective method of measuring certainty

  2. Genealogy Background. Notice the light at the top of the picture.

  3. The FM Bobo Story • Grandmother • Grandfather of grandmother

  4. 1860 Census

  5. 1870 Census

  6. Marriage Record: Carroll County, Arkansas, Marriage Records, Eastern District, Grooms Index, 1869-1930. Note the 3-year gap in age.

  7. 1880 Census

  8. Remember Jarrett for later

  9. 1920 Census

  10. Jarrett’s Funeral Book

  11. Record Summary

  12. Let’s talk about that … Note the person partially in the picture.

  13. The Information Flow Diagram • Event – an association of an action, place, time, and person(s). Diagram: EVENT (Dick Eastman at GENTECH2, January 1994)

  14. The Information Flow Diagram • Reporter – a person who creates a record about an event. • We can measure the reporter’s confidence or bias. Diagram: EVENT, REPORTER (John Wylie, president of GENTECH for 5 years)

  15. The Information Flow Diagram • Record – a report about an event, which may not be complete or accurate. • Measure granularity. Diagram: EVENT, RECORD, REPORTER

  16. What’s Granularity?

  17. Granularity Examples

  18. The Information Flow Diagram: “ER Gap” (the gap between EVENT and RECORD) • Reviewer – a person who reviews records and draws conclusions. • Evaluate the ER Gap; evaluate the Reporter. Diagram: EVENT, RECORD, REPORTER, REVIEWER (Tony Burroughs, NGS 2001, Portland, OR)

  19. The Information Flow Diagram • Conclusion – a statement by a reviewer about a collection of records related to an event. • Report – a collection of conclusions. Diagram: EVENT, RECORD, REPORT, REPORTER, REVIEWER
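The Information Flow Diagram on slides 13 to 19 amounts to a small data model: events are reported in records, reviewers draw conclusions from records, and conclusions are collected into reports. A minimal sketch of that model as Python dataclasses follows; every class and field name here is a hypothetical choice for illustration, not something the talk defines, and the measurements (granularity, ER Gap, confidence) are left out.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model for the Information Flow Diagram.
# Names and fields are illustrative assumptions, not taken from the talk.


@dataclass
class Event:
    """An association of an action, place, time, and person(s)."""
    action: str
    place: str
    time: str
    persons: List[str]


@dataclass
class Reporter:
    """A person who creates a record about an event."""
    name: str


@dataclass
class Record:
    """A report about an event; it may be incomplete or inaccurate."""
    event: Event
    reporter: Reporter
    text: str


@dataclass
class Reviewer:
    """A person who reviews records and draws conclusions."""
    name: str


@dataclass
class Conclusion:
    """A reviewer's statement about a collection of records for one event."""
    reviewer: Reviewer
    records: List[Record]
    statement: str


@dataclass
class Report:
    """A collection of conclusions."""
    conclusions: List[Conclusion] = field(default_factory=list)
```

Keeping the reporter on each record, rather than on the event, matches the diagram: the same event can be reported by several people, each with their own bias.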

  20. ER Gap (scale from Far to Near) • “All Records about my family” – 0 • “Secondary” Record – 1 • “Secondary” Record – 1 • “Primary” Record – 2

  21. Features of EVIDENCE: The Record • Granularity • “Mind the Gap” - ER Gap • Reporter

  22. CONCLUSION – Rate It • 1 – Believe • 2 – Know • 3 – Can Prove • 0 – No claim • Negative numbers: -1, -2, -3

  23. TRUST: The Report • Do this like eBay, with a trust rating built from feedback
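Slide 23 proposes rating trust “like eBay,” and slide 24 names a positive feedback ratio as the trust number. A minimal sketch, assuming the ratio is positive ratings divided by positive plus negative ratings with neutral ratings ignored (similar in spirit to eBay’s positive-feedback percentage); the function name is hypothetical.

```python
def trust_ratio(positive: int, negative: int) -> float:
    """Share of positive ratings among positive + negative ratings.

    Neutral ratings are not counted. Returning 0.0 when there is no
    feedback yet is an assumed convention; an unrated report could
    equally be treated as unknown.
    """
    total = positive + negative
    if total == 0:
        return 0.0
    return positive / total
```

For example, `trust_ratio(3, 1)` returns `0.75`.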

  24. So many formulas … • … so few examples. • Record granularity measurement – 3 to 9 • ER Gap – 0, 1, or 2 • Reviewer evaluation of reporter – -1 to 10 • Reviewer confidence – -3 to 3 • Trust number – positive feedback ratio • [Granularity / 5] + [ER Gap] + [Reporter Eval / 5] + [Reviewer Confidence] + [Trust ratio / 0.5]
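Slide 24 gives the combining formula but, as it admits, no worked example. The sketch below evaluates it term by term, treating the evaluation term as the reviewer’s evaluation of the reporter (-1 to 10) and assuming the trust ratio lies between 0 and 1. The function name and the decision to reject out-of-range inputs are assumptions added for illustration.

```python
def certainty_score(granularity: float, er_gap: int, reporter_eval: float,
                    reviewer_confidence: int, trust: float) -> float:
    """Combine the five measurements with the slide's formula:

    granularity/5 + ER gap + reporter evaluation/5
        + reviewer confidence + trust ratio/0.5
    """
    # Ranges as listed on the slide; raising on bad input is an added assumption.
    if not 3 <= granularity <= 9:
        raise ValueError("granularity must be between 3 and 9")
    if er_gap not in (0, 1, 2):
        raise ValueError("ER gap must be 0, 1, or 2")
    if not -1 <= reporter_eval <= 10:
        raise ValueError("reporter evaluation must be between -1 and 10")
    if not -3 <= reviewer_confidence <= 3:
        raise ValueError("reviewer confidence must be between -3 and 3")
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust ratio must be between 0 and 1")
    return (granularity / 5 + er_gap + reporter_eval / 5
            + reviewer_confidence + trust / 0.5)
```

For instance, `certainty_score(5, 2, 5, 2, 0.5)` is 1 + 2 + 1 + 2 + 1 = 7.0. With the slide’s ranges, the total runs from -2.6 to 10.8, and the divisors keep each measured term at roughly 2 points or less, so reviewer confidence (up to 3) carries the most weight.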

  25. The Death Certificate • Demographic info • Medical info

  26. It’s “What-if” Time. What if we could make the future however we like?

  27. Mechanical Certainty: Finding Needles in Really Big Haystacks

  28. Record Linking • Building Indices • Finding larger patterns

  29. Where: • x indicates the identifier and its value on the record from the file initiating the search (record A); • y indicates the identifier and its value on the record from the file being searched (record B); • LINKED pairs may refer either to all linked pairs, or to a defined subset of these; and • UNLINKABLE pairs may refer either to all unlinkable pairs, or to a defined subset, provided the linked and the unlinkable sets (or subsets) are otherwise strictly comparable with each other.
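The formula these definitions belong to does not survive in the transcript. Assuming it is the usual log-odds weight from Newcombe-style record linkage, the weight for one comparison outcome x·y is log2 of its frequency among LINKED pairs divided by its frequency among UNLINKABLE pairs. A minimal sketch under that assumption; the function name and the outcome labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def outcome_weights(linked: Iterable[str], unlinkable: Iterable[str]) -> Dict[str, float]:
    """Weight for each comparison outcome x.y:
    log2(frequency among LINKED pairs / frequency among UNLINKABLE pairs).

    Each input holds one outcome label per record pair.
    """
    linked_counts = Counter(linked)
    unlinkable_counts = Counter(unlinkable)
    n_linked = sum(linked_counts.values())
    n_unlinkable = sum(unlinkable_counts.values())
    weights: Dict[str, float] = {}
    for outcome, count in linked_counts.items():
        u = unlinkable_counts.get(outcome, 0)
        if u == 0:
            # Outcomes missing from either set are skipped here;
            # a real system would smooth the counts instead.
            continue
        p_linked = count / n_linked
        p_unlinkable = u / n_unlinkable
        weights[outcome] = math.log2(p_linked / p_unlinkable)
    return weights


# Toy example: initials agree in 9 of 10 linked pairs but only 1 of 10 unlinkable pairs.
linked = ["initials agree"] * 9 + ["initials disagree"]
unlinkable = ["initials agree"] + ["initials disagree"] * 9
weights = outcome_weights(linked, unlinkable)
# Agreement gets log2(9), about +3.17; disagreement gets about -3.17.
```

Positive weights favour a link and negative weights count against it; in this style of scheme, the weights for the separate identifiers of a candidate pair are summed and compared with a threshold.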

  30. Examples • FIRST INITIALS: AGREEMENT; DISAGREEMENT; LETTER “Q” • YEAR OF BIRTH: SIMILARITY (difference = 1 year); DISSIMILARITY (difference = 11+ years) • GIVEN NAMES: SIMILARITY (first 3 letters agree, none disagree, e.g. Sam vs Samuel); SIMILARITY + DISSIMILARITY (first 3 letters agree, 4th disagrees, e.g. Samuel vs Sampson) • DIFFERENT BUT LOGICALLY RELATED IDENTIFIERS: PLACE OF WORK vs PLACE OF DEATH (Provo vs Salt Lake City)
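The given-name and year-of-birth examples on slide 30 can be read as small classification rules. A minimal sketch, assuming exact matches count as AGREEMENT and that a first disagreement at the 4th letter or later is the SIMILARITY + DISSIMILARITY case (the slide only shows the 4th). The DISAGREEMENT and UNCLASSIFIED labels for cases the slide does not cover, and all function names, are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading letters two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compare_given_names(a: str, b: str) -> str:
    """Classify a pair of given names following the slide's examples."""
    a, b = a.lower(), b.lower()
    if a == b:
        return "AGREEMENT"
    shared = common_prefix_len(a, b)
    if shared < 3:
        return "DISAGREEMENT"  # assumed label; not covered by the slide
    if shared == min(len(a), len(b)):
        return "SIMILARITY"  # first 3 agree, none disagree: Sam vs Samuel
    return "SIMILARITY + DISSIMILARITY"  # later letter disagrees: Samuel vs Sampson


def compare_birth_years(y1: int, y2: int) -> str:
    """Classify a pair of birth years following the slide's examples."""
    diff = abs(y1 - y2)
    if diff == 0:
        return "AGREEMENT"
    if diff == 1:
        return "SIMILARITY"
    if diff >= 11:
        return "DISSIMILARITY"
    return "UNCLASSIFIED"  # hypothetical label for 2-10 years, which the slide leaves open
```

Each label would then be looked up in a weight table like the one built from LINKED and UNLINKABLE pairs on slide 29.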

  31. Some more examples

  32. Discrimination • A lookup table containing the frequencies of values for identifiers, as they appear in the file being searched. • SURNAMES: Brown (0.39), Aube (0.014), and Skuda (0.00004). • FIRST NAMES: John (5.30), Axel (0.020), and Ulder (0.0045).
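The frequency table matters because agreeing on a rare value is stronger evidence than agreeing on a common one. A minimal sketch, assuming the slide’s numbers are percentages of the searched file and that true links nearly always agree on the value. Under those assumptions, the odds in favour of a link when both records share a value are about 1 divided by that value’s frequency, since that frequency is roughly the chance an unrelated record carries it. The table and function names are hypothetical.

```python
import math

# Frequencies from the slide, read here as percentages (an assumption).
SURNAME_FREQ_PCT = {"Brown": 0.39, "Aube": 0.014, "Skuda": 0.00004}
FIRST_NAME_FREQ_PCT = {"John": 5.30, "Axel": 0.020, "Ulder": 0.0045}


def agreement_weight(value: str, freq_pct: dict) -> float:
    """Approximate log2 odds in favour of a link when both records share `value`.

    Assumes true links almost always agree on the value, so the odds are
    about 1 / (chance an unrelated record has the same value) = 1 / frequency.
    """
    f = freq_pct[value] / 100.0  # convert percent to a proportion
    return math.log2(1.0 / f)
```

Under that reading, a shared surname of Skuda is worth about 21 bits of evidence and Brown about 8, which is the slide’s point: rare values discriminate better.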

  33. Competing Hypotheses

  34. The Digital Research Assistant • Search for records on the Internet • Evaluate their relevance to the assignment • Evaluate their granularity, confidence, etc. • Evaluate patterns, such as families • Report matches • Let me set the knobs for the parameters

  35. The DRA will have ... • A hierarchy of useful comparison algorithms • A method of searching across the Internet, and paying for it • A method of documenting the source of each search that satisfies the rules for preserving intellectual property and for academic research

  36. Who knows what the formula will be? • We are asking which dragons must be slain, but we aren’t saying how it must happen. • We are talking about possible ways to accomplish our goal. • That goal is connecting to new information, with confidence.

  37. Summary • Any type of review • Measurements of Records • Measurement of conclusions • Rating of publishers • Mechanical searches • Record Linking • Smart Searches • Groupwork and Rights

  38. Never forget to have fun
