Combining Statistics and Text for a View of Irish Cultural Heritage. IASSIST 2009, Tampere, Finland, May 27, 2009. Fredric C. Gey, UC Data Archive & Technical Assistance, University of California, Berkeley. http://ucdata.berkeley.edu/gey.html
Combining Statistics and Text for a View of Irish Cultural Heritage
IASSIST 2009, Tampere, Finland, May 27, 2009
• Fredric C. Gey
• UC Data Archive & Technical Assistance
• University of California, Berkeley
• http://ucdata.berkeley.edu/gey.html
• Institute for Museum and Library Services Grants:
  • Seamless search of textual and numeric databases (1999-2002)
  • Going places in the catalog: Improved Geographic Access (2002-2004)
  • What Where, When and Why – support for the learner (2004-2006)
  • Bringing Lives to Light – Biography in Context (2006-2008)
  • Context and Relationships – Ireland and Irish Studies (2007-2010)
• Colleagues: Michael Buckland, Ray Larson, Kim Carl, Jeanette Zerneke, and a host of students including Ryan Shaw and Vivien Petras
• Collaboration with Centre for Digitisation, Queens University, Belfast
  • Paul Ell, collaborating PI
GESIS – Vocabulary, Statistics, Time and Geography
HETEROGENEOUS DIGITAL INFORMATION SEARCH
Current Search Technology (multiple independent searches without search aids)
[Diagram: a single QUERY sent separately to each resource type: Patents; Bibliography; Full Text; Numeric Statistical Databases; Maps and other Geospatial data; Music and other media]
Heterogeneous Digital Information Search
Direct Mappings and Search Between Multiple Information Types
[Diagram: QUERY and QUERYplus linked through mappings labeled EVMs, EVMt, EVMp, EVMm and EVMg to the resource types: Patents; Bibliography; Full Text; Numeric Statistical Databases; Maps and other Geospatial data; Music and other media]
Context and Relationships: Ireland and Irish Studies (Goals) (2007-2010 NEH/IMLS Grant)
• Enable automatic and manual editorial markup of scanned scholarly materials for personal names and geography
• Recognition of place/person names in Middle English and Gaelic
• Combine historical statistics with external search of documents by geographic commonality
• Utilize Hogan’s Onomasticon Goedelicum locorum et tribuum Hiberniae et Scotiae: An index, with identifications, to the Gaelic names of places and tribes (Edmund Hogan, SJ, 1909), a kind of concordance of Irish documents by place
Who, What, Where When IMLS Project (2004-200 IMLS grant)
• Developed multi-genre search using common geography (data/books)
Biography Markup and Search Goals (2006-2006 IMLS grant)
• To develop tools for editors, archivists and compilers of historical papers
  • Emma Goldman papers
• To develop display in time/space to facilitate historical discovery, i.e. who lived there at the same time and what important events occurred there
• To visualize biography as an ordered sequence of 4-tuple events (activity, time, place, other-people), developing biographical markup standards
• Congressional Biography: automatic markup of place, date, time-range

<biog source="cong_dict" page_start="19" page_end="19">
  <name>ADAMS, JOHN QUINCY. </name>
  <text>Born in Braintree, now Quincy, Mass., July 11, 1767. When ten years of age, he accompanied his father to France; and when fifteen, was private secretary to the American Minister in Russia. He was graduated at Harvard University in 1787; studied law in Newburyport, and settled in Boston. From 1794 to 1801 he was American Minister to Holland, England, Sweden, and Prussia. He was a Senator in Congress from 1803 to 1808 </text>
</biog>
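The 4-tuple event model above could be sketched in Python roughly as follows. This is an illustrative sketch only, not the project's markup tooling: the class name `BioEvent`, the function names, and the regular expressions are all assumptions, and only the time element is filled in automatically (place and other-people recognition would need a gazetteer and name lists, which are not shown).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BioEvent:
    """One biographical event as an (activity, time, place, other-people) tuple."""
    activity: str
    time: Optional[Tuple[int, int]]          # (start_year, end_year), hypothetical encoding
    place: Optional[str] = None              # left empty in this sketch
    other_people: List[str] = field(default_factory=list)


# Hypothetical patterns: a "from YYYY to YYYY" range, or a single four-digit year.
RANGE = re.compile(r"[Ff]rom (\d{4}) to (\d{4})")
YEAR = re.compile(r"\b(\d{4})\b")


def extract_time(sentence: str) -> Optional[Tuple[int, int]]:
    """Return a year range if one is stated, else a single year, else None."""
    m = RANGE.search(sentence)
    if m:
        return (int(m.group(1)), int(m.group(2)))
    m = YEAR.search(sentence)
    if m:
        year = int(m.group(1))
        return (year, year)
    return None


def events_from_text(text: str) -> List[BioEvent]:
    """Split text into sentence-like units and order them by extracted time."""
    units = [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]
    events = [BioEvent(activity=u, time=extract_time(u)) for u in units]
    # Undated events sort last; dated events sort by (start, end).
    return sorted(events, key=lambda e: (e.time is None, e.time or (0, 0)))


if __name__ == "__main__":
    sample = ("Born in Braintree, now Quincy, Mass., July 11, 1767. "
              "He was a Senator in Congress from 1803 to 1808.")
    for event in events_from_text(sample):
        print(event.time, "|", event.activity)
```

Keeping time as a (start, end) pair lets single dates and ranges share one representation, which makes the "ordered sequence" a plain sort.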
Biography Markup: Emma Goldman Travels (2006-2009 IMLS grant)
• The Atom format feeds directly into Google Maps
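A feed of this kind might look roughly like the output of the sketch below. The slide does not say how locations were encoded in the Atom feed; the use of GeoRSS-Simple `georss:point` elements here is an assumption, as are the function name `atom_feed` and the dictionary keys. The stop shown in the usage lines is a placeholder, not data from the Emma Goldman papers, and the feed omits elements (such as `id`) that a fully valid Atom document requires.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"   # GeoRSS-Simple namespace (assumed encoding)
ET.register_namespace("", ATOM)
ET.register_namespace("georss", GEORSS)


def atom_feed(title, stops):
    """Build a minimal Atom feed string with one geotagged entry per stop.

    Each stop is a dict with hypothetical keys: title, when, lat, lon.
    """
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    for stop in stops:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = stop["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = stop["when"]
        # GeoRSS-Simple points are written as "latitude longitude".
        point = ET.SubElement(entry, f"{{{GEORSS}}}point")
        point.text = f"{stop['lat']} {stop['lon']}"
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Placeholder stop for illustration only.
    demo = [{"title": "Example stop", "when": "1900-01-01T00:00:00Z",
             "lat": 40.71, "lon": -74.01}]
    print(atom_feed("Example travels", demo))
```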
From Publishing Context to Building Context
Context and Relationships: Ireland and Irish Studies (2007-2009 NEH/IMLS Grant)
• Collaboration with Center for Digitization, Queens University Belfast
• Digitizing ~500,000 pages of Irish Historical and Cultural Studies
• To develop display and contextual search in time/space to facilitate scholarly discovery: http://gray.ischool.berkeley.edu/oldw4/irish/
Digital Library of Core Materials on Ireland exemplar
• £620,000 grant from JISC to digitise journals, monographs and manuscripts relating to Irish Studies and create the foundations of a digital library resource
• Initial archive of around 470,000 pages
  • 100 journals covering a 200-year period and about 400,000 pages
  • 2,500 pages of manuscript
  • 205 key monographs
• Machine-readable text for all journals and monographs and some manuscripts
• Detailed ‘object’ level metadata
Project Imperatives
• Access to rare resources without visiting Belfast
• Resource discovery: use of less common journals
• New, complex searching using detailed metadata and semantic searching
• Serendipity
• A one-stop shop for journals, and more
• Enhanced research developing from better access
Ireland and Irish Studies: Statistical Data about Ireland
• Center for Digitization, Queens University Belfast has digitized 200 years of Irish Historical Statistics
• We wish to integrate statistical data display with scholarly search and browsing by time and place
The Database of Irish Historical Statistics
• 32,934,018 data values from 1821 to 1971, and then linked to contemporary digital sources
• Mostly census data but also annual agricultural statistics, civil registration information, crime statistics . . .
• Topics include population statistics, crop and stock data, language, literacy, religion, occupations, employment, housing, emigration, industry and industrial structure, trade and commerce, wages, pauperism, etc.
• www.qub.ac.uk/cdda/iredb/dbhme.htm
Ireland and Irish Studies: Our new approach
Utilize the capabilities of Google Earth
• Obtain historic Irish sub-county boundary files (Baronies and Poor Law Union)
Ireland and Irish Studies: Our new approach
Utilize the capabilities of Google Earth (2)
• Utilize the KML markup language to integrate statistical data display with scholarly search and browsing by time and place
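As a rough illustration of how KML can carry a statistic together with place and time, the sketch below writes one Placemark with a boundary polygon, a TimeSpan, and the statistic in the description. It is a sketch under stated assumptions, not the project's actual files: the function name `placemark_kml`, the place name, the value, and the square boundary ring are all hypothetical placeholders. Only standard KML 2.2 elements are used.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)


def _el(parent, tag, text=None):
    """Append a KML-namespaced child element, optionally with text."""
    child = ET.SubElement(parent, f"{{{KML_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def placemark_kml(name, year, label, value, ring):
    """Return a KML document with one Placemark for one area and one year.

    ring is a closed list of (longitude, latitude) pairs (hypothetical boundary).
    """
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = _el(kml, "Document")
    pm = _el(doc, "Placemark")
    _el(pm, "name", name)
    _el(pm, "description", f"{label} ({year}): {value}")
    # TimeSpan lets Google Earth's time slider show or hide the area by year.
    span = _el(pm, "TimeSpan")
    _el(span, "begin", str(year))
    _el(span, "end", str(year))
    poly = _el(pm, "Polygon")
    linear = _el(_el(poly, "outerBoundaryIs"), "LinearRing")
    # KML coordinates are "lon,lat,alt" tuples separated by whitespace.
    _el(linear, "coordinates", " ".join(f"{lon},{lat},0" for lon, lat in ring))
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Placeholder square, name and value for illustration only.
    square = [(-6.0, 54.0), (-5.9, 54.0), (-5.9, 54.1), (-6.0, 54.1), (-6.0, 54.0)]
    print(placemark_kml("Example Barony", 1851, "Population", 1000, square))
```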
Ireland and Irish Studies: Google Earth (3)
• Search links added to statistical data display
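One way such a search link could be attached to a statistic is sketched below: a place name and year range are encoded into a query URL and appended to the text shown in the placemark balloon. The base URL `https://example.org/search` and the parameter names `place`, `from` and `to` are hypothetical; the slides do not describe the search service's actual interface.

```python
from html import escape
from urllib.parse import urlencode


def search_link(base_url, place, start_year, end_year, label="Search documents"):
    """Return an HTML anchor that searches documents for a place and year range.

    Parameter names in the query string are assumptions for illustration.
    """
    query = urlencode({"place": place, "from": start_year, "to": end_year})
    href = escape(f"{base_url}?{query}")   # escape "&" for use inside HTML
    return f'<a href="{href}">{escape(label)}</a>'


def description_with_link(stat_line, link_html):
    """Combine a statistic line and a search link into one balloon description."""
    return f"{stat_line}<br/>{link_html}"


if __name__ == "__main__":
    link = search_link("https://example.org/search", "Example Barony", 1841, 1851)
    print(description_with_link("Population (1851): 1000", link))
```

When HTML like this is placed in a KML description, it is normally wrapped in CDATA or entity-escaped, which an XML library does automatically when the string is set as element text.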
Ireland and Irish Studies: next steps
• Add more statistics
  • Religion (percent Catholic, Protestant, other)
  • Agriculture
• Add more resources to search
• Begin working with and geographically indexing the 500k pages of Irish journals and books
• Refine our user interfaces and develop more prototype demonstrations
References
• M Buckland and L Lancaster, 2004. “Combining Place, Time, and Topic.” D-Lib Magazine, May 2004, Volume 10, Number 5. http://www.dlib.org/dlib/may04/buckland/05buckland.html
• M Buckland, A Chen, F Gey and R Larson, 2006. “Search Across Different Media: Numeric Data Sets and Text Files.” Information Technology and Libraries, December 2006, pp 181-189.
• M Buckland, A Chen, F Gey, R Larson, R Mostern and V Petras, 2007. “Geographic Search: Catalogs, Gazetteers, and Maps.” College & Research Libraries, Sept 2007.
• F Gey, R Shaw, R Larson, M Buckland, B Pateman and D Melia, 2008. “Marking Up Cultural Materials for Time and Geography.” Proceedings of the Workshop on Information Access to Cultural Heritage, Aarhus, Denmark, Sept 28, 2008.
• F Gey, R Shaw, R Larson and B Pateman, 2008. “Biography as events in time and space.” Proceedings of ACM GIS Conference, Irvine, California, Nov 4-7, 2008.
• Emma Goldman papers (http://sunsite.berkeley.edu/Goldman/)
• Onomasticon (http://www.ucc.ie:8080/cocoon/doi/locus)
Grant home pages
• Biography project: http://ecai.org/imls2006/
• Irish project: http://ecai.org/neh2007/