Project CLiMB Computational Linguistics for Metadata Building Columbia University Funded by the Andrew W. Mellon Foundation 2002-2004 Using Computational Linguistic Techniques to Harvest Image Descriptors
Session Outline Subject Access in a Digital Age: A Sisyphean Task Angela Giral - Columbia University CLiMBing the Mountain: Automating Subject Access for Image Professionals Roberta Blitz - Columbia University A New Vista? Preview and Feedback for the CLiMB ToolKit Rebecca Passonneau - Columbia University
Subject Access in a Digital Age: A Sisyphean Task Angela Giral – Columbia University
Thank you! Any further questions? www.columbia.edu/cu/cria/climb
CLiMBing the Mountain: Automating Subject Access for Image Professionals Roberta Blitz – Columbia University
Presentation Outline • Introduction to CLiMB • Two examples with images from the North Carolina Museum of Art • CLiMB Subject Access Terms
CLiMB: Interdisciplinary Research at Columbia University • CRIA (Center for Research on Information Access) • Libraries • Computer Science Department Funded by the Andrew W. Mellon Foundation 2002-2004
CLiMB Project Members • Judith Klavans, PI • Stephen Davis • Angela Giral • Patricia Renfro • Bob Wolven • Roberta Blitz • Rebecca Passonneau • Veronika Horvath • David Elson
Problems in Image Access • Cataloging digital images • Traditional approach: manual expertise • labor intensive • expensive • Can automated techniques help?
Can we harvest image descriptors? • angled porch • V-shaped plan • sandstone boulders
CLiMB Technical Contribution • CLiMB will identify and extract • proper nouns • terms and phrases • from text related to an image: By September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes. (Edward R. Bosley. Greene & Greene. London: Phaidon, 2000. p. 127.)
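The slides do not describe how CLiMB's extractor works internally. As a rough illustration only, the sketch below swaps in a much simpler technique: stopword-delimited candidate phrase chunking, similar in spirit to RAKE. The stopword list and the function name `candidate_phrases` are assumptions for this example; a real system would add part-of-speech tagging and proper-noun recognition.

```python
import re

# Illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {
    "a", "all", "an", "and", "as", "at", "be", "been", "behind", "by", "for",
    "from", "had", "in", "is", "it", "its", "of", "on", "out", "that",
    "the", "to", "under", "was", "which", "with", "would",
}

def candidate_phrases(text: str) -> list[str]:
    """Return multiword runs of non-stopwords, split at punctuation.

    Hypothetical helper: single words are discarded, so only phrases
    such as "sandstone boulders" survive as candidate descriptors.
    """
    phrases = []
    # Punctuation always ends a phrase; hyphens stay inside words ("V-shaped").
    for chunk in re.split(r"[.,;:()!?\"]", text):
        run: list[str] = []
        for word in chunk.split():
            if word.lower() in STOPWORDS:
                # A stopword closes the current run; keep it only if multiword.
                if len(run) >= 2:
                    phrases.append(" ".join(run))
                run = []
            else:
                run.append(word)
        if len(run) >= 2:
            phrases.append(" ".join(run))
    return phrases

if __name__ == "__main__":
    passage = ("It featured a radically informal, V-shaped plan (that maintained "
               "the original angled porch). The chimneys and foundation would be "
               "constructed of the sandstone boulders that comprised the local geology.")
    for phrase in candidate_phrases(passage):
        print(phrase)
```

On the Bosley passage this yields "V-shaped plan" and "sandstone boulders" but also noise such as "radically informal", which is one reason a human still has to vet harvested terms.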
CLiMB Overall Goals The essence of CLiMB: • Use scholars themselves as “catalogers” by employing scholarly publications • Enhance existing descriptive metadata The CLiMB project: • Research: Development of richer retrieval through increased numbers of descriptors • Practice: Development of CLiMB ToolKit
Squeezing Metadata out of Scholarly Texts • Image collection • Associated text • Target object identification (TOI) • CLiMB ToolKit • Evaluation
CLiMB Collections • Greene & Greene Architectural Records, Avery Architectural and Fine Arts Library, Columbia University • Chinese Paper Gods Collection, C.V. Starr East Asian Library, Columbia University • Digital Images from the North Carolina Museum of Art
Greene & Greene Architectural Records and Papers Collection, Drawings and Archives, Avery Architectural and Fine Arts Library, Columbia University Libraries
NYDA.1960.001.00023: All Saints Episcopal Church (Pasadena, Calif.). Alterations, 1902-1903
Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.] Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957. Greene, Henry Mather, 1870-1954.
Subjects: Houses Alterations Architecture--Designs and plans--United States. Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -- floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] floor plan, part plan of basement.
Greene & Greene Bibliography (associated texts) • Bosley, Edward R. Greene & Greene. London : Phaidon, 2000. • Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974] • Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977. • Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998. • Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998. • Strand, Janann. A Greene & Greene guide [Pasadena, Calif. : G. Dahlstrom, 1974]
Chinese Paper Gods Anne S. Goodrich Collection C.V. Starr East Asian Library, Columbia University
Pan-hu chih-shen (God of tigers)
Chinese Paper Gods Catalog Record
Title: Chuang gong chuang mu [graphic].
Published: [193-]
Physical Details: 1 print : wood-engraving, color ; 34 x 30 cm.
In: Anne S. Goodrich Collection.
Location: Columbia University, C.V. Starr East Asian Library (CJK) EAX GAC 1 no. 16
Subjects: Gods, Chinese, in art. Folk art--China.
Genre Or Form: Woodcuts--Chinese.
Notes: Date according to time period Anne S. Goodrich collected prints in Beijing.
Record ID: NYCP02-F20
Chinese Paper Gods Bibliography (associated texts) • Day, Clarence Burton. Chinese peasant cults : being a study of Chinese paper gods. Taipei : Ch'eng Wen Pub. Co., 1974. • Goodrich, Anne Swann. Peking paper gods : a look at home worship. Nettetal : Steyler Verlag, 1991. • Laing, Ellen Johnston. Art and aesthetics in Chinese popular prints: selections from the Muban Foundation collection. Ann Arbor, MI : Center for Chinese Studies, University of Michigan, c2002.
Chinese gods: selection from LC Authority File
HEADING: Nezha (Chinese deity)
Used For/See From:
Daluoxian (Chinese deity)
Jinhuan Yuanshuai (Chinese deity)
Jinkang Yuanshuai (Chinese deity)
Li Nezha (Chinese deity)
Luoche Taizi (Chinese deity)
Ne Zha (Chinese deity)
Nezhataizi (Chinese deity)
No-cha (Chinese deity)
Nuozha (Chinese deity)
Tailuoxian (Chinese deity)
Taizi Yuanshuai (Chinese deity)
Taiziyeh (Chinese deity)
Yühuang Taizi (Chinese deity)
Zhongtan Yuanshuai (Chinese deity)
Search Also Under: Gods, Chinese
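An authority record like this one can be turned into a lookup table that maps every see-from variant to the authorized heading, so that a name harvested from a text (for example "No-cha") is filed under "Nezha (Chinese deity)". The normalization rules below (case-folding, stripping diacritics, dropping spaces, hyphens and the qualifier) and the function names are assumptions for illustration; aggressive normalization can merge distinct names, so collisions would need human review.

```python
import unicodedata
from typing import Optional

# Heading and variant forms copied from the LC authority record above.
AUTHORITY = {
    "Nezha (Chinese deity)": [
        "Daluoxian (Chinese deity)", "Jinhuan Yuanshuai (Chinese deity)",
        "Jinkang Yuanshuai (Chinese deity)", "Li Nezha (Chinese deity)",
        "Luoche Taizi (Chinese deity)", "Ne Zha (Chinese deity)",
        "Nezhataizi (Chinese deity)", "No-cha (Chinese deity)",
        "Nuozha (Chinese deity)", "Tailuoxian (Chinese deity)",
        "Taizi Yuanshuai (Chinese deity)", "Taiziyeh (Chinese deity)",
        "Yühuang Taizi (Chinese deity)", "Zhongtan Yuanshuai (Chinese deity)",
    ],
}

def normalize(name: str) -> str:
    """Match key (assumed rules): drop qualifier and diacritics, case-fold, keep letters/digits."""
    name = name.replace("(Chinese deity)", "")
    decomposed = unicodedata.normalize("NFKD", name)  # "ü" -> "u" + combining mark
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_marks.casefold() if c.isalnum())

def build_index(authority: dict[str, list[str]]) -> dict[str, str]:
    """Map the normalized heading and each normalized variant to the authorized heading."""
    index: dict[str, str] = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[normalize(form)] = heading
    return index

INDEX = build_index(AUTHORITY)

def authorized_form(name: str) -> Optional[str]:
    """Return the authorized heading for a harvested name, or None if it is unknown."""
    return INDEX.get(normalize(name))
```

For example, `authorized_form("Ne Zha")` and `authorized_form("yuhuang taizi")` both resolve to "Nezha (Chinese deity)", while a name outside the table returns `None` and can be queued for a cataloger.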
Target Object Identification (TOI) • Define based on institutional needs • Varies from collection to collection • Greene & Greene – Project Names • Chinese Paper Gods – God Names • North Carolina Museum – Artist / Work Names • Compile authority list
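Once a TOI list has been compiled, one simple way to connect an associated text to its target objects is to record which sentences mention each name. The sketch below is an assumption, not the CLiMB ToolKit's algorithm: it uses a naive regular-expression sentence splitter and case-insensitive exact matching, and the function names `split_sentences` and `locate_targets` are hypothetical.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def locate_targets(text: str, toi_list: list[str]) -> dict[str, list[int]]:
    """Map each target name to the indexes of the sentences that mention it."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    sentences = split_sentences(text)
    hits: dict[str, list[int]] = {toi: [] for toi in toi_list}
    for toi in toi_list:
        # Lookarounds instead of \b so names ending in punctuation still match.
        pattern = re.compile(r"(?<!\w)" + re.escape(toi) + r"(?!\w)", re.IGNORECASE)
        for i, sentence in enumerate(sentences):
            if pattern.search(sentence):
                hits[toi].append(i)
    return hits
```

The splitter will wrongly break after abbreviations such as "Calif.", and exact matching misses variant spellings; both are reasons to pair this step with an authority list and human checking.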
North Carolina Museum of Art • Museum Catalog (Associated Text) • Images (Catalog Records) • Image descriptors: village of Cebolla; adobe Church of Santo Niño; rusted tin roof; window • Source: North Carolina Museum of Art: Handbook of the Collections. Ed. Rebecca Martin Nagy. Raleigh, NC: North Carolina Museum of Art, Hudson Hills Press, 1998.
Georgia O'Keeffe (American, 1887-1986) Cebolla Church, 1945 Oil on canvas, 20 1/16 x 36 1/4 in. (51.1 x 92.0 cm.) Purchased with funds from the North Carolina Art Society (Robert F. Phifer Bequest), in honor of Joseph C. Sloane, 72.18.1 North Carolina Museum of Art <http://ncartmuseum.org/collections/highlights/20thcentury/20th/1910-1950/038_lrg.shtml>
MARC format
100 O’Keeffe, Georgia, ‡d 1887-1986.
245 Cebolla church ‡h [slide] / ‡c Georgia O’Keeffe.
260 ‡c 2003
300 1 slide : ‡b col.
500 Object date: 1945.
500 Oil on canvas.
500 20 x 36 in.
535 North Carolina Museum of Art ‡b Raleigh, N.C.
650 Painting, American ‡y 20th century.
650 Women artist ‡z United States
650 Church buildings in art.
MARC format with CLiMB subject terms
100 O’Keeffe, Georgia, ‡d 1887-1986.
245 Cebolla church ‡h [slide] / ‡c Georgia O’Keeffe.
260 ‡c 2003
300 1 slide : ‡b col.
500 Object date: 1945.
500 Oil on canvas.
500 20 x 36 in.
535 North Carolina Museum of Art ‡b Raleigh, N.C.
650 Painting, American ‡y 20th century.
650 Women artist ‡z United States
650 Church buildings in art.
CLiMB New Mexican highlands
CLiMB village of Cebolla
CLiMB adobe Church of Santo Niño
CLiMB sagging, sun-bleached walls
CLiMB rusted tin roof
CLiMB isolation
CLiMB human endurance
CLiMB window
Joseph Cornell (American, 1903-1972) Suzy's Sun (For Judy Tyler), 1957 Mixed media construction, 10 3/4 x 15 x 4 in. (27.3 x 38.1 x 10.2 cm.) Purchased with funds from the State of North Carolina, 78.1.1 North Carolina Museum of Art <http://ncartmuseum.org/collections/highlights/20thcentury/20th/1950-2000/030_lrg.shtml>
VRA Core 3.0
Record Type=work
Type=shadow box
Title=Suzy's Sun (For Judy Tyler)
Measurements.Dimensions=10 3/4 x 15 x 4 in. (27.3 x 38.1 x 10.2 cm.)
Material.Medium=mixed media
Creator.Personal name=Cornell, Joseph
Creator.Role=artist
Date.Creation=1957
Location.Current Repository=Raleigh (NC, USA), North Carolina Museum of Art
ID Number.Current Accession=78.1.1
Subject=assemblages (sculpture)
Suzy's Sun (For Judy Tyler), 1957 Mixed media construction, 10 3/4 x 15 x 4 in. (27.3 x 38.1 x 10.2 cm.) Purchased with funds from the State of North Carolina, 78.1.1
Joseph Cornell fabricated shadow boxes and filled them with objects collected both by chance and choice. With invention and insight, he teased themes from these unlikely groupings, most often revolving around time and memory. In Suzy’s Sun (for Judy Tyler), the sun (a cutout from an antipasto tin) and the sea (an implied presence) speak eloquently of life cycles and passing time. Equally potent symbols, driftwood and the infinitely spiraling seashell readily bring to mind the tides on which they ride, summoning a universal metaphor for the ebb and flow of life itself. In small details (a postage stamp showing a multi-masted schooner, the collaged word "hotel"), Cornell uses the romantic notion of travel to far-off lands as additional commentary on one’s passage through life. Cornell dedicated this box to an actress. Judy Tyler had just achieved a certain celebrity when she was killed in an automobile accident. "Suzy" probably refers to the artist’s assistant, Suzanne Miller. The sun, designated as Suzy’s, presides over the box, as a life-sustaining force counteracting the finality of death.
VRA Core 3.0 with CLiMB subject terms
Title=Suzy's Sun (For Judy Tyler)
Creator.Personal name=Cornell, Joseph
Subject=assemblages (sculpture)
CLiMB=shadow boxes
CLiMB=spiraling seashell
CLiMB=sun
CLiMB=tides
CLiMB=time
CLiMB=metaphor
CLiMB=memory
CLiMB=ebb and flow of life
CLiMB=cutout
CLiMB=postage stamp
CLiMB=antipasto tin
CLiMB=multi-masted schooner
CLiMB=sea
CLiMB=collaged word hotel
CLiMB=life cycles
CLiMB=travel
CLiMB=passing time
CLiMB=actress
CLiMB=driftwood
CLiMB=Judy Tyler
CLiMB=Suzanne Miller
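On the slide, harvested terms are displayed with a "CLiMB=" label. Since VRA Core 3.0 offers a Subject category for this kind of metadata, one option is to write each term as an additional Subject value while skipping terms that are already present. The sketch below assumes a record held as a list of (category, value) pairs; that representation and the function name `add_climb_subjects` are illustrative assumptions, and duplicates are detected only by case and whitespace, not by singular/plural or synonym matching.

```python
def _match_key(value: str) -> str:
    """Comparison key: collapse internal whitespace and case-fold."""
    return " ".join(value.split()).casefold()

def add_climb_subjects(record: list[tuple[str, str]], climb_terms: list[str]) -> list[str]:
    """Append CLiMB terms as Subject values and return Category=value lines."""
    seen = {_match_key(value) for category, value in record if category == "Subject"}
    merged = list(record)
    for term in climb_terms:
        key = _match_key(term)
        if not key or key in seen:
            continue  # skip blank strings and terms already recorded
        seen.add(key)
        merged.append(("Subject", " ".join(term.split())))
    return [f"{category}={value}" for category, value in merged]

if __name__ == "__main__":
    record = [
        ("Title", "Suzy's Sun (For Judy Tyler)"),
        ("Creator.Personal name", "Cornell, Joseph"),
        ("Subject", "assemblages (sculpture)"),
    ]
    terms = ["shadow boxes", "spiraling seashell", "Shadow  boxes", "driftwood"]
    print("\n".join(add_climb_subjects(record, terms)))
```

Keeping the existing cataloger-assigned Subject first and appending harvested terms after it preserves the original record while still enriching it.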
CLiMB Metadata Can Be Employed in Standard Catalog Formats
USMARC: MARC 21 Concise Format for Bibliographic Data (http://lcweb.loc.gov/marc/bibliographic/ecbdhome.html) • Fields for CLiMB metadata: 650 - SUBJECT ADDED ENTRY--TOPICAL TERM (R); 653 - INDEX TERM--UNCONTROLLED (R)
VRA Core Categories, Version 3.0: A project of the Visual Resources Association Data Standards Committee (http://www.vraweb.org/vracore3.htm) • Field for CLiMB metadata: SUBJECT
Dublin Core Metadata Element Set, Version 1.1: Reference Description (http://dublincore.org/documents/dces/) • Field for CLiMB metadata: SUBJECT AND KEYWORDS
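As a minimal sketch of two of these options, the code below writes CLiMB terms as repeatable MARC 653 (Index Term--Uncontrolled) display lines and as Dublin Core `subject` elements. The 653 output is a human-readable display string, not binary MARC; blank indicators are shown as `#` and the subfield as `$a`. The wrapper element `metadata` and the function names are hypothetical; the namespace URI is the standard one for the Dublin Core 1.1 element set. Field 650 would generally be reserved for terms that have first been matched to a controlled vocabulary.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize as dc:title, dc:subject, ...

def marc_653_lines(terms: list[str]) -> list[str]:
    """One display line per uncontrolled index term (653 is repeatable)."""
    return [f"653 ## $a {term}" for term in terms]

def dublin_core_xml(title: str, creator: str, terms: list[str]) -> str:
    """Build a small Dublin Core record with one dc:subject element per term."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    ET.SubElement(root, f"{{{DC_NS}}}creator").text = creator
    for term in terms:
        ET.SubElement(root, f"{{{DC_NS}}}subject").text = term
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    terms = ["village of Cebolla", "adobe Church of Santo Niño", "rusted tin roof"]
    print("\n".join(marc_653_lines(terms)))
    print(dublin_core_xml("Cebolla Church", "O'Keeffe, Georgia", terms))
```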
Possible controlled vocabularies • Getty: *AAT Getty Art & Architecture Thesaurus; TGN Getty Thesaurus of Geographic Names; ULAN Union List of Artist Names • LC: LC NAF Library of Congress Name Authority File; LCSH Library of Congress Subject Headings; TGM I Thesaurus for Graphic Materials: Subject Terms; TGM II Thesaurus for Graphic Materials: Genre and Physical Characteristic Terms
A New Vista? Preview and Feedback for the CLiMB ToolKit Rebecca Passonneau – Columbia University
The CLiMB ToolKit • Software prototype • For large image collections • Semi-automated metadata • Subject access terms • Human intervention at all steps • Iterative development cycle
Presentation Outline • Process flow in ToolKit • Prerequisites to use ToolKit • Demonstration
CLiMB ToolKit: Process Flow • 1. Load Text • 2. Load TOI List • 3. Analyze Text • 4. Select Subject Access Terms • 5. Review
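The five steps can be read as a pipeline in which software proposes terms and a person decides. The sketch below is hypothetical: the function `run_toolkit`, the naive sentence splitter, substring matching of TOI names, and the pluggable `extract` and `approve` callbacks are assumptions for illustration, not the ToolKit's actual design. Passing `approve` as a callback keeps the human decision in step 4 explicit.

```python
import re
from typing import Callable, Iterable

def run_toolkit(
    text: str,
    toi_list: Iterable[str],
    extract: Callable[[str], list[str]],
    approve: Callable[[str, str], bool],
) -> dict[str, list[str]]:
    # 1. Load text: split the associated text into sentences (naive splitter).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # 2. Load TOI list.
    targets = list(toi_list)
    # 3. Analyze text: collect candidate terms from sentences naming each target.
    candidates: dict[str, list[str]] = {}
    for toi in targets:
        terms: list[str] = []
        for sentence in sentences:
            if toi.casefold() in sentence.casefold():
                for term in extract(sentence):
                    if term not in terms:
                        terms.append(term)
        candidates[toi] = terms
    # 4. Select subject access terms: a person approves or rejects each candidate.
    selected = {toi: [t for t in found if approve(toi, t)] for toi, found in candidates.items()}
    # 5. Review: return approved terms, grouped by target, for final inspection.
    return selected

if __name__ == "__main__":
    vocabulary = ["rusted tin roof", "window", "driftwood", "seashell"]  # toy extractor
    text = ("Cebolla Church has a rusted tin roof and one window. "
            "Suzy's Sun holds driftwood and a seashell.")
    result = run_toolkit(
        text,
        ["Cebolla Church", "Suzy's Sun"],
        extract=lambda s: [t for t in vocabulary if t in s.lower()],
        approve=lambda toi, term: term != "window",  # stands in for a human decision
    )
    print(result)
```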
Prerequisites and Resources • Image collection • Minimal-level catalog records • Texts about the images (associated texts) • Scholarly Monographs • Museum Catalog • TOI list of images • NOW: CLiMB format • FUTURE: Authoritative list of images in your format • [ Controlled Vocabularies ]
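The slide does not specify the current CLiMB TOI format. Assuming, purely for illustration, a tab-separated file with hypothetical `record_id` and `target_name` columns, a loader can also check the prerequisite that every target has at least a minimal-level catalog record. The function name `load_toi_list` is likewise hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO

def load_toi_list(source: TextIO, catalog_ids: Iterable[str]) -> tuple[dict[str, str], list[str]]:
    """Read a TOI list; return {record_id: target_name} and IDs lacking a catalog record."""
    known = set(catalog_ids)
    tois: dict[str, str] = {}
    missing: list[str] = []
    # Hypothetical layout: a header row, then one tab-separated target per line.
    for row in csv.DictReader(source, delimiter="\t"):
        record_id = row["record_id"].strip()
        tois[record_id] = row["target_name"].strip()
        if record_id not in known:
            missing.append(record_id)
    return tois, missing

if __name__ == "__main__":
    sample = io.StringIO(
        "record_id\ttarget_name\n"
        "72.18.1\tCebolla Church\n"
        "78.1.1\tSuzy's Sun (For Judy Tyler)\n"
    )
    tois, missing = load_toi_list(sample, catalog_ids=["72.18.1"])
    print(tois)
    print(missing)
```

Reporting missing records rather than failing lets an image professional fix the catalog side before running the rest of the process.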