Research team developing CLARIN-supported open-source web applications for mining historical data in public media archives. Explore public debates on drugs, addiction, and eugenics in Dutch newspapers (1900-1945) using semantic document selection tools.
WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media
WAHSP/BILAND Research team: Stephen Snelders (UU), Pim Huijnen (UU), Daan Odijk (ISLA, UvA), Fons Laan (ISLA), Maarten de Rijke (ISLA), Toine Pieters (UU)
Research Creating big-data resources
National Library of the Netherlands, Digital Newspaper Archive: > 1200 titles; > 30.000.000 articles; > 10.000.000 pages; 1618 - 1995. Still growing...
Sampling: Dutch press on Germany, Frank van Vree (1989): 4.000 articles, 4 titles, 1930 - 1939. Digital archive: > 1200 titles, > 31.000.000 articles, 1618 - 1995.
WE NEED: A semi-automatic and interactive open-source application. An application that does not replace, but supports, the intuition and insights of the historical researcher with expert knowledge of a specific topic or domain. An application that is user-friendly.
Research Problem: Context and background of Dutch drug and eugenics debates in time. Aim: Understanding and evaluation of public debates around drugs, addiction and eugenics in the Netherlands, 1900-1945. Research question: What are the dynamics (in terms of patterns and trends) of public debates and sentiments around drugs, addiction and eugenics in Dutch newspapers in the first half of the twentieth century?
Poe’s detective finds the truth by using data in those newspaper articles that do not concern the murder. In a similar way, we will find terms and sentiments in those newspaper articles that may seem irrelevant, but are not.
Information-extraction • Recognize structure in text • Part of speech: noun, verb, … • Entities: people, organisations, locations, temporal expressions, … • Relations: who, what, with whom, how, why E-everything
Information-extraction (2) E-everything
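To make the entity step above concrete, here is a minimal sketch in Python. It is not the WAHSP/BILAND pipeline: the function name `extract` and both regular expressions are assumptions made for illustration, and a real system would rely on trained part-of-speech and named-entity taggers rather than hand-written rules.

```python
import re

# Hypothetical toy patterns (assumptions, not the project's actual rules).
YEAR = re.compile(r"\b(1[6-9]\d{2})\b")                    # years 1600-1999
CAPITALISED = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*")  # runs of capitalised words


def extract(text):
    """Return candidate temporal expressions and candidate named entities."""
    years = YEAR.findall(text)
    entities = []
    for match in CAPITALISED.finditer(text):
        before = text[:match.start()].rstrip()
        # Crude heuristic: ignore a capitalised run that only starts a sentence.
        # This also drops genuine entities in sentence-initial position.
        sentence_initial = before == "" or before.endswith((".", "!", "?"))
        if not sentence_initial:
            entities.append(match.group())
    return {"temporal": years, "entities": entities}


result = extract("In 1924 the Opium Act was debated in The Hague.")
print(result)
```

The rules only propose candidates; deciding whether a candidate is a person, an organisation or a location is left to the researcher or to a proper tagger.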
Start Query: Opium
Odijk D., de Rooij O., Peetz M-H., Pieters T., de Rijke M., Snelders S. (2012). "Semantic Document Selection", TPDL 2012: Theory and Practice of Digital Libraries: Springer, September.
By carefully inspecting the word counts, we found quantitative evidence for historical turning points that indicated the criminalization of the drugs debate around 1924.
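The kind of word-count inspection described here can be sketched as follows. This is an illustrative assumption, not the method used in the project: the functions `relative_frequency` and `largest_shift` are hypothetical names, and the mini-corpus is invented purely to exercise the code and is not drawn from the newspaper archive.

```python
import re
from collections import Counter


def relative_frequency(articles, term):
    """articles: iterable of (year, text) pairs.
    Returns {year: share of that year's articles containing the term}."""
    total = Counter()
    hits = Counter()
    for year, text in articles:
        total[year] += 1
        if term.lower() in re.findall(r"\w+", text.lower()):
            hits[year] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}


def largest_shift(series):
    """Return the year showing the largest absolute change
    relative to the preceding year in the series, or None."""
    years = sorted(series)
    pairs = list(zip(years, years[1:]))
    if not pairs:
        return None
    _, current = max(pairs, key=lambda p: abs(series[p[1]] - series[p[0]]))
    return current


# Invented toy data (assumption): three years, two articles each.
articles = [
    (1901, "opium as medicine"), (1901, "weather report"),
    (1902, "opium at the pharmacy"), (1902, "shipping news"),
    (1903, "police seize opium"), (1903, "police report on smuggling"),
]
series = relative_frequency(articles, "police")
print(series, largest_shift(series))
```

Normalising by the number of articles per year matters because the archive is unevenly sized over time; a jump in raw counts may only reflect more digitised pages. A flagged year is a prompt for close reading, not a finding in itself.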
Eugenics case; query overerving (inheritance), 1867: primarily associations with health-related terms/entities
Eugenics case
Eugenics case; query overerving, 1935: In 1935, however, the medical context in which the term inheritance was used made way for a legal and racial context.
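One simple way to approximate this kind of association view is document-level co-occurrence counting: for a given year, count which words appear in the same articles as the query term. The sketch below is an assumption for illustration only, not the application's actual algorithm; `cooccurring_terms` is a hypothetical name, and the Dutch snippets and stopword list are invented.

```python
import re
from collections import Counter


def cooccurring_terms(texts, query, stopwords=frozenset(), top_n=5):
    """Count words that share an article with the query term.
    Each word is counted once per article. Returns (word, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = set(re.findall(r"\w+", text.lower()))
        if query in tokens:
            counts.update(tokens - {query} - set(stopwords))
    return counts.most_common(top_n)


# Invented snippets (assumption); ziekte = disease, kinderen = children.
texts = [
    "overerving en ziekte",
    "overerving van ziekte bij kinderen",
    "weer en wind",
]
print(cooccurring_terms(texts, "overerving", stopwords={"en", "van", "bij"}))
```

Running the same function on the articles of two different years and comparing the top terms side by side gives a rough, inspectable picture of how the context of a term shifts over time.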
NEW HORIZONS in DIGITAL HUMANITIES: E-Humanity Approaches to Reference Cultures: The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990
• Challenges:
• 1. OCR repair
• 2. Improving text-mining software and data infrastructure
• 3. Developing new historical research strategies
• 4. Educating historians and other humanities researchers