The Certainty of Citations. A proposal for an objective method of measuring certainty. Genealogy Background. Notice the light at the top of the picture. The FM Bobo Story. Grandmother. Grandfather of grandmother. 1860 Census. 1870 Census.
The Certainty of Citations A proposal for an objective method of measuring certainty
Genealogy Background Notice the light at the top of the picture.
The FM Bobo Story Grandmother Grandfather of grandmother
Marriage Record Carroll County Arkansas Marriage Records Eastern District Grooms Index 1869-1930 Note 3 year gap in age.
Let’s talk about that … Note person partially in picture.
The Information Flow Diagram • Event – an association of an action, place, time, and person(s) EVENT Dick Eastman at GENTECH2, January 1994
The Information Flow Diagram • Reporter – a person who creates a record about an event. • We can measure confidence or bias. EVENT John Wylie, president of GENTECH for 5 years REPORTER
The Information Flow Diagram • Record – a report about an event, which may not be complete or accurate • Measure granularity. EVENT RECORD REPORTER
The Information Flow Diagram “ER Gap” • Reviewer – a person who reviews records and draws conclusions. • Evaluate ER Gap, evaluate Reporter. EVENT RECORD REPORTER REVIEWER Tony Burroughs, NGS 2001, Portland OR
The Information Flow Diagram • Conclusion – a statement by a reviewer about a collection of records related to an event • Report – a collection of conclusions. EVENT RECORD REPORT REPORTER REVIEWER
ER Gap (diagram axis runs from Far to Near) • All Records about my family – 0 • “Secondary” Record – 1 • “Primary” Record – 2
Features of EVIDENCE: The Record • Granularity • “Mind the Gap” - ER Gap • Reporter
CONCLUSION – Rate It • 1 - Believe • 2 - Know • 3 - Can Prove • 0 – No claim • Negative numbers -1, -2, -3
TRUST: The Report • Do this like eBay
So many formulas … • … so few examples. • Record granularity measurement – 3 to 9 • ER Gap – 0, 1, or 2 • Reviewer evaluation of reporter -1 to 10 • Reviewer confidence - -3 to 3 • Trust number, positive feedback ratio • [Granularity / 5] + [ER Gap] + [Report Eval / 5] + [Reviewer Confidence] + [Trust ratio / 0.5]
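Since the slide asks for examples, here is a minimal sketch of the formula above in Python. The function name, the argument names, and the range checks are illustrative assumptions. "Report Eval" is read here as the reviewer's evaluation of the reporter, and the trust ratio is assumed to be a positive-feedback fraction between 0 and 1; the slide states neither explicitly.

```python
def certainty_score(granularity, er_gap, reporter_eval,
                    reviewer_confidence, trust_ratio):
    """Combine the five measurements with the formula from the slide.

    Hypothetical helper: names and range checks are assumptions.
    """
    if not 3 <= granularity <= 9:
        raise ValueError("granularity is expected in 3..9")
    if er_gap not in (0, 1, 2):
        raise ValueError("ER Gap is expected to be 0, 1, or 2")
    # The slide's range text is ambiguous; -1..10 is the literal reading.
    if not -1 <= reporter_eval <= 10:
        raise ValueError("reporter evaluation is expected in -1..10")
    if not -3 <= reviewer_confidence <= 3:
        raise ValueError("reviewer confidence is expected in -3..3")
    # Assumption: trust ratio is a fraction of positive feedback.
    if not 0 <= trust_ratio <= 1:
        raise ValueError("trust ratio is assumed to lie in 0..1")

    return (granularity / 5
            + er_gap
            + reporter_eval / 5
            + reviewer_confidence
            + trust_ratio / 0.5)


# A "Primary" record (ER Gap 2), a fully trusted reporter and report,
# and a reviewer who "Can Prove" (3) the conclusion.
print(certainty_score(5, 2, 10, 3, 1.0))  # prints 10.0
```

Under these assumptions each term contributes roughly 0 to 3 points, so no single measurement dominates the total.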
The Death Certificate Demographic Info Medical Info
It’s “What-if” Time What if we could make the future however we like?
Mechanical Certainty Finding Needles in Really Big Haystacks
Record Linking • Building Indices • Finding larger patterns
Where: • x indicates the identifier and its value on the record from the file initiating the search (record A); • y indicates the identifier and its value on the record from the file being searched (record B); • LINKED pairs may refer either to all linked pairs, or to a defined subset of these; and • UNLINKABLE pairs may refer either to all unlinkable pairs, or to a defined subset, provided the linked and the unlinkable sets (or subsets) are otherwise strictly comparable with each other.
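The formula these definitions belong to did not survive on this slide. One reading that fits the definitions, assumed here and not confirmed by the slide, is a frequency ratio: how often a comparison outcome (x, y) occurs among LINKED pairs, divided by how often it occurs among UNLINKABLE pairs. A sketch under that assumption, with a hypothetical function name and made-up sample pairs:

```python
from collections import Counter


def frequency_ratio(outcome, linked_pairs, unlinkable_pairs):
    """Relative frequency of `outcome` among linked pairs divided by
    its relative frequency among unlinkable pairs.

    Each pair is an (x, y) tuple: x from record A, y from record B.
    Assumed reading of the missing formula, not taken from the slide.
    """
    if not linked_pairs or not unlinkable_pairs:
        raise ValueError("both sets of pairs must be non-empty")
    p_linked = Counter(linked_pairs)[outcome] / len(linked_pairs)
    p_unlinkable = Counter(unlinkable_pairs)[outcome] / len(unlinkable_pairs)
    if p_unlinkable == 0:
        return float("inf")  # outcome never seen among unlinkable pairs
    return p_linked / p_unlinkable


# Made-up first-initial comparisons, for illustration only.
linked = [("J", "J"), ("J", "J"), ("J", "J"), ("J", "S")]
unlinkable = [("J", "J"), ("J", "S"), ("J", "S"), ("J", "S")]
print(frequency_ratio(("J", "J"), linked, unlinkable))  # prints 3.0
```

A ratio above 1 argues for a link and a ratio below 1 argues against it, which is why the linked and unlinkable sets must be comparable.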
Examples • FIRST INITIALS • AGREEMENT • DISAGREEMENT • LETTER “Q” • YEAR OF BIRTH • SIMILARITY (difference = 1 year) • DISSIMILARITY (difference = 11+ years) • GIVEN NAMES • SIMILARITY (first 3 letters agree, none disagree – eg Sam vs Samuel) • SIMILARITY + DISSIMILARITY (first 3 letters agree, 4th disagrees – eg Samuel vs Sampson) • DIFFERENT BUT LOGICALLY RELATED IDENTIFIERS • PLACE of WORK vs PLACE of DEATH (Provo vs Salt Lake City)
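The comparison outcomes listed above can be sketched as small classifier functions. The function names are hypothetical, and the labels for cases the slide does not name (an exact match, a year gap of 2 to 10) are assumptions marked in the comments.

```python
def compare_first_initial(a, b):
    """AGREEMENT or DISAGREEMENT on the first letter, ignoring case."""
    return "AGREEMENT" if a[:1].upper() == b[:1].upper() else "DISAGREEMENT"


def compare_year_of_birth(a, b):
    """Classify two birth years by the size of their difference."""
    diff = abs(a - b)
    if diff == 0:
        return "AGREEMENT"      # assumption: not named on the slide
    if diff == 1:
        return "SIMILARITY"     # slide: difference = 1 year
    if diff >= 11:
        return "DISSIMILARITY"  # slide: difference = 11+ years
    return "UNNAMED"            # 2..10 years: the slide gives no label


def compare_given_names(a, b):
    """Classify two given names by their leading letters."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return "AGREEMENT"      # assumption: not named on the slide
    if min(len(a), len(b)) < 3 or a[:3] != b[:3]:
        return "DISAGREEMENT"   # assumption: not named on the slide
    shorter, longer = sorted((a, b), key=len)
    if longer.startswith(shorter):
        return "SIMILARITY"     # e.g. Sam vs Samuel
    if a[3] != b[3]:
        return "SIMILARITY + DISSIMILARITY"  # e.g. Samuel vs Sampson
    return "UNNAMED"            # disagreement after the 4th letter


print(compare_given_names("Sam", "Samuel"))
print(compare_given_names("Samuel", "Sampson"))
print(compare_year_of_birth(1860, 1861))
```

Keeping each outcome as a named label, not a yes/no match, is what lets a later step assign a different weight to each one.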
Discrimination • A lookup table containing the frequencies of values for identifiers, as they appear in the file being searched. • SURNAMES Brown (0.39), Aube (0.014), and Skuda (0.00004). • FIRST NAMES John(5.30), Axel (0.020), and Ulder (0.0045).
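A sketch of how such a lookup table might be used. Two assumptions are made that the slide does not state: the figures are percentages of the file being searched, and the discriminating power of an agreement is scored as the base-2 logarithm of the inverse frequency, so that rarer values score higher. The table and function names are hypothetical.

```python
import math

# Frequencies copied from the slide, assumed to be percentages.
SURNAME_FREQ_PCT = {"Brown": 0.39, "Aube": 0.014, "Skuda": 0.00004}
FIRST_NAME_FREQ_PCT = {"John": 5.30, "Axel": 0.020, "Ulder": 0.0045}


def agreement_weight(value, table):
    """Score an agreement on `value`: the rarer the value, the higher
    the score. log2(1 / frequency) is an assumed scoring rule."""
    fraction = table[value] / 100.0  # percent to fraction
    return math.log2(1.0 / fraction)


for name in ("Brown", "Aube", "Skuda"):
    print(name, round(agreement_weight(name, SURNAME_FREQ_PCT), 1))
```

Under this rule, two records that agree on "Skuda" are far stronger evidence of a link than two records that agree on "Brown".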
The Digital Research Assistant • Search for records on internet • Evaluate their relevance to assignment • Evaluate their granularity, confidence, etc • Evaluate patterns, such as families • Report matches • Let me set the knobs for the parameters
The DRA will have ... • A hierarchy of useful comparison algorithms • A method of searching across the Internet - and paying for it • A method of documenting the source of that search that satisfies the rules of preserving intellectual property and academic research
Who knows what the formula will be? • We are asking which dragons must be slain, but we aren’t saying how it must happen. • We are talking about possible ways to accomplish our goal. • That goal is connecting to new information, with confidence.
Summary • Any type of review • Measurements of Records • Measurement of conclusions • Rating of publishers • Mechanical searches • Record Linking • Smart Searches • Groupwork and Rights