Practice and Theory in Digital Libraries: The Case of Open Video Libraries in the Digital Age (LIDA05), Dubrovnik, Croatia. Gary Marchionini, PhD, University of North Carolina at Chapel Hill, www.ils.unc.edu/~march, march@ils.unc.edu. May 30, 2005
Outline • Digital Libraries as phenomena • Multimedia and video challenge our text biases • Open Video concepts and system • Moebius • User studies • Conclusion
Pragmatics • Useful theory and practice are a Moebius strip • DL practice is informed by multiple theories related to: • Information structure • Human behavior • System design • Social-political-economic constraints and organizational behavior • History and epistemology • “We want principles, not only developed—the work of the closet—but applied, which is the work of life.” Horace Mann, Thoughts, 1867
Theories of What and Why • Digital extensions of physical libraries • Augmentations of intellect • Collaborative spaces: sharium • Cultural institutions • World Brain • Economic models • Complex information systems
Theories of How • Reuse and open source information • Levels of abstraction • Information retrieval • Information interaction • Iterative design and evaluation • Resource management
Digital Library Design Space 1999: What Has Changed in 2005? Adapted from Marchionini & Fox, IP&M, 1999
Provocation: Text no longer rules • The Net generation depends much less on reading (they are entering universities as students and soon, as professors; Oblinger & Oblinger, 2005 Educause book). In the US: • Children age 6 or younger: average of 2 hrs/day using screen media, 1.6 hrs/day playing outside, 39 min/day reading • 13-17 yr olds: average 3.1 hrs/day watching TV and 3.5 hrs/day with digital media. They multitask • >2 million US children (ages 6-17) have their own Web site. Girls are more likely to have a Web site than boys (12.2 percent versus 8.6 percent). • Ability to use nontext expression (audio, video, graphics) appears stronger in each successive cohort • Multimedia and multitasking are the trend of the 21st century • Information specialists MUST get over our text bias
Open Video DL Case • Open • Public good • Reusable • Files not streams • Chunking • Agile views user interface • Alternative representations (views) • Agile control mechanisms
Open Video Vision/Contributions • An open repository of video files that can be re-used in a variety of ways by the education and research communities • Encourages contributions • A testbed for interactive interfaces • An easy-to-use DL based upon the agile views interface design framework • Multiple, cascading, easy-to-control views (preview, overview, review, shared, peripheral) • Views based upon empirically validated surrogates • An environment for building theory of human information interaction • A set of methods and metrics that reveal how people understand digital video through surrogates
Background & Status • Begun 1995 with colleagues at UMD & BCPS • Funding: NSF, NASA, NSF/LoC • Collaborators/Contributors: I2-DSI, ibiblio, CMU, UMD, NIST, Prelinger and Internet Archives, NASA, ACM • ~2600 video segments • ~2000 different titles • ~15000 unique visitors per month • MPEG-1, MPEG-2, MPEG-4, QT • OAI provider • Ongoing user studies • New Preservation initiative
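Since the repository is an OAI provider, its catalog records can be harvested over OAI-PMH by other services. Below is a minimal harvesting sketch; the endpoint URL is a placeholder rather than the actual Open Video base URL, and only the standard ListRecords verb with the oai_dc metadata prefix is assumed.

```python
# Minimal OAI-PMH harvesting sketch (illustrative only).
# The base URL below is a placeholder, not the actual Open Video endpoint.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/open-video/oai"  # hypothetical endpoint

def harvest_titles(base_url):
    """Fetch one page of Dublin Core records and return their titles."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # Dublin Core titles appear in the dc namespace of each harvested record.
    return [t.text for t in root.iter("{http://purl.org/dc/elements/1.1/}title")]

if __name__ == "__main__":
    for title in harvest_titles(BASE_URL):
        print(title)
```

A production harvester would also follow resumptionToken paging, which OAI-PMH uses to deliver large record sets in batches.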
Agile Views Interface Research • Provide a variety of access representations (e.g., indexes) and control mechanisms • Usual search and browse capabilities • Leverage both visual and linguistic cues • Create and test surrogates for overview, preview, shared, and history views
The Surrogates • Storyboard with text keywords (20-36 per board @ 500 ms) • Storyboard with audio keywords • Slide show with text keywords (250 ms, repeated once) • Slide show with audio keywords • Fast forward (~4X) • Fast forwards (32X, 64X, 128X, 256X) • Poster frames • Real time clips • Text titles
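The fast forward surrogates are, in essence, temporally subsampled versions of the source video. The sketch below shows the basic idea of keeping every Nth frame; it assumes OpenCV and is only an illustration, not the pipeline actually used to build the Open Video surrogates.

```python
# Illustrative sketch: build an N-times fast-forward surrogate by keeping every Nth frame.
# Assumes OpenCV; this is not the actual Open Video production pipeline.
import cv2

def fast_forward(src_path, dst_path, speedup=32):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % speedup == 0:  # keep one frame out of every `speedup`
            out.write(frame)
        index += 1

    cap.release()
    out.release()

# Example: a 32X surrogate of a hypothetical source file.
# fast_forward("segment.mpg", "segment_ff32.mp4", speedup=32)
```

Playing the retained frames back at the original frame rate yields the perceived speedup; storyboard and slide show surrogates instead select a small set of representative keyframes.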
User Studies • Study 1: Qualitative comparison of surrogates (ECDL 02) • Study 2: Fast forwards (JCDL 03) • Study 3: Narrativity (CHI 02; ASIST 03 paper) • Study 4: Shared views and history views (Geisler dissertation) • Study 5: Poster frames and text (eye tracking, CIVR 03) • Study 6: TREC evaluations (03 and 04) • Study 7: Cognitive load and ISEE (Mu dissertation) • Study 8: Relevance judgments for video (Yang dissertation) • Study 9: Surrogate integration study (in analysis) • Others: several specific master's papers (Hughes, Gruss)
Study 1: Compare Surrogates • What are the strengths and weaknesses of different surrogates from the users’ perspective? • Are any of the surrogates better than the others in supporting user performance?
The Surrogates • Storyboard with text keywords (20-36 per board @ 500 ms) • Storyboard with audio keywords • Slide show with text keywords (250 ms, repeated once) • Slide show with audio keywords • Fast forward (~4X)
Method • 7 video segments (2-10 min), 5 surrogates created for each • 10 subjects with high video and computer experience • Three phases (all multi-camera videotaped) • View full video then use 3 surrogates, repeat • Participant observation and debriefing • Do NOT view full video, use 3 surrogates, repeat • Participant observation and debriefing • Complete 3 assigned tasks with surrogates of choice • Think aloud and debriefing • http://www.open-video.org/experiments/chi-2002/methods/study1.mov
Tasks • Gist determination—free text • Gist determination—multiple choice • Object recognition—textual • Object recognition—graphical • Action recognition (2-3 second clips) • Visual gist (predict which frames belong) • http://www.open-video.org/experiments/chi-2002/surrogates/index.html
Preferences • In debriefing after each phase, subjects were asked about preferences • Some preferences changed over the phases • 2 subjects preferred fast forward • 4 subjects said fast forward if audio keywords were added • 1 preferred storyboard with audio keywords • 2 preferred slide show with audio keywords • Recommendation: drop slide show with text keywords; develop fast forward further
Performance • No statistically reliable difference (SRD) on gist (both free text and multiple choice) • SRD on action recognition favoring fast forward • ‘Near’ SRD on text object recognition favoring storyboard with audio keywords • 8:1 to 29:1 compaction rates were suitable for the tasks • Psychometric and face validity support for the tasks (means and variances; relevant to real tasks) • SRD in gist and visual gist for one video • Homogeneity of frames diminishes surrogate value • Keywords help when visual variability decreases
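The compaction rates follow directly from the surrogate parameters: compaction ratio = source duration / surrogate duration. The figures in the snippet below are illustrative only (a hypothetical 2-minute segment with a 20-frame storyboard), not values taken from the study.

```python
# Illustrative arithmetic only; the segment length and board size are hypothetical examples.
def compaction_ratio(source_seconds, surrogate_seconds):
    return source_seconds / surrogate_seconds

# A 20-frame storyboard at 500 ms per frame runs 10 s; for a 2-minute (120 s) segment:
print(compaction_ratio(120, 20 * 0.5))  # -> 12.0, i.e., roughly a 12:1 compaction
```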
Qualitative Results • Subjects suggested different surrogates for different tasks (e.g., fast forward for judging whether content is kid-safe, storyboard for identifying images, fast forward for judging video styles) • Three senses of gist • Topic (T) • Narrativity (N) • T+N+visual style • Individual preferences and experiences influence surrogate effectiveness
Study 2: Fast Forward • How fast can we make fast forwards? • 4 ff conditions (32X, 64X, 128X, 256X) • Four video segments for each condition • 45 subjects (1/2 UG, 1/2 grad, 2/3 female) • 6 tasks (full text gist, multiple choice gist, word object recognition, graphical object recognition, action recognition, visual gist) • Counterbalance speed and videos • Web-driven experimental condition, 3-camera video tapes, single subject at a time in usability laboratory
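Counterbalancing the speed conditions and video assignments guards against order and carryover effects. As a rough illustration of one common scheme (not necessarily the exact assignment used in the study), the sketch below rotates the four speeds through a simple cyclic Latin square so each speed appears once in every serial position across subject groups.

```python
# Illustrative counterbalancing sketch: a simple cyclic Latin square over the four
# fast-forward speeds. Not necessarily the exact scheme used in the study.
SPEEDS = ["32X", "64X", "128X", "256X"]

def latin_square(conditions):
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

for group, ordering in enumerate(latin_square(SPEEDS), start=1):
    print(f"group {group}: {ordering}")
```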
Results • SRD on 4 of 6 tasks as speed increases; however, performance remained reasonable even at the highest rate • Video content/genre interacts with performance • Preference does not parallel performance (people can perform well under extreme conditions but do not like or enjoy them) • No user characteristic differences (age, sex) • Give users control but select appropriate defaults • Caveat: controlled, independent focus on fast forward, likely a lower bound on performance
Narrativity Study • CHI walk-up kiosk, used by 20 people • 20 one-minute clips (half b&w, no audio) selected on 2 criteria: contain characters, have cause/effect relations between scenes (5 in each category) • SRD on characters, cause/effect, and their interaction
Shared Views and History Views Studies • Evaluate the Agile Views (AV) design framework by instantiating and evaluating a design • Shared views (based on recommendations) and history views (based on logs) • Phase 1: compared OV to the AV interface (28 participants). OV better on accuracy; no SRD on time, but a learning effect; AV better on navigation/efficiency; AV better on satisfaction • Phase 2: qualitative analysis of shared and history views
Poster Frame Study • Research questions: • Given both textual and visual metadata, which surrogate will be utilized, and which will be preferred? • Does the placement of the surrogates affect how they are used? • Does the assigned task affect how surrogates are used? • Does personal preference play a role in how surrogates are used?
Study Methods / Procedures • 12 undergraduate students (paid volunteers) • Pre-Study questionnaire • Demographics • Visual vs. Verbal learning style (VVQ) • 10 search problems • Counter-balanced • Design 1 and 2 • 1 : text on left / visuals on right • 2 : visuals on left / text on right • Eyetracking • Post-study questionnaire • Follow up questions
Results • All participants over all tasks: • Mean time looking at text = 29.7 sec. • Mean time looking at pics = 6.8 sec. • 75% of fixations over text • 18% of fixations over pics • First fixations over text = 65 • First fixations over pics = 54 • Text requires and gets more user attention
Results cont’d • Design 1 vs. Design 2 • When text was placed on the left, mean time per fixation was slightly higher • VVQ • Balanced group spent more time looking at text • Tasks • Varied by task: • Time spent looking at text • Time spent per fixation over text • Frequency of fixations over text
Tasks • Please find a video that discusses the destruction earthquakes can do to buildings. These search results are from a search on the word “Earthquake”. • Please find a video that discusses nurses and their contributions to the United States Army. These search results are from a search on the word “Work”. • Please choose a video from the following list that you think would be entertaining for you and your friends to watch.
Discussion • In this restricted situation (i.e., a pre-formulated results page) participants used text as the main anchor point • Because text is a better surrogate? • Because text contains more information? • Because text is more familiar to people? • Because tasks directed users to text?
Discussion cont'd • Layout seemed to have little effect on how surrogates were used • Mean time per fixation differed by only 0.03 seconds • Participants didn't report a significant preference for layout • Some liked design 1 and some liked design 2 • VVQ • The hypothesis that visual learners would use visual surrogates and verbal learners would use verbal surrogates was not supported
Discussion cont’d • Tasks • Some tasks took more time to complete • Regardless of: • Counterbalancing order • Participant • Layout design
Text or Pictures? • Text was reported as: • Being the search anchor • Containing significant topical information • Taking longer to read than pictures • Visuals were reported as: • Being globally liked • Being used to quickly narrow down choices • Taking less time to decode than text • All participants said the results page would be weaker without them • Often lacking in reference points
Conclusion • Visual metadata was used to make (confirm???) relevance judgments • Combination of visual & verbal stronger than one or the other • Generalize with caution: • Small number of study participants • Specific set of search results pages • Ten specific search tasks.
The Integration Study • Compare old OV to the redesign? Compare to the Internet Archive? • How do multiple surrogates and agile control mechanisms affect understanding of video? • Accuracy? Time? Satisfaction? Cognitive load? Navigational overhead? • Data analysis underway
Relevance Study (Yang) • 3 task groups (illustration [10 professors], collection building [8 video librarians], video production [8 producers/editors]) • In-depth interviews • Textual, audiovisual, and implicit categories covering 39 different criteria • Topicality was mentioned most often, but far less often than in text-based studies • The production group was less varied and cited more audiovisual criteria
Theory-Practice Lessons from OV • User-centered design and user testing pay off, i.e., research informs practice • Production system operation raises new kinds of research questions • Sustainability models • Curatorial models • Preservation challenges • Upgrade paths for universal access
DL Research Directions • Incorporating people into DLs (patrons, librarians) • Leveraging contributions and implications for curatorship • Preservation strategies; how much context? • Hybrid physical-digital library operations
Observations • A Moebius strip has no end: the interplay between theory and practice goes on • Need for collaboration between working libraries and researchers
Selected Open Video Readings
• Yang, M. & Marchionini, G. (2005). "Deciphering visual gist and its implications for video retrieval and interface design." Conference on Human Factors in Computing Systems (CHI 2005), Portland, OR, Apr. 2-7, 2005.
• Yang, M. & Marchionini, G. (2004). "Exploring Users' Video Relevance Criteria -- A Pilot Study." Proceedings of the Annual Meeting of the American Society for Information Science and Technology, pp. 229-238, Nov. 12-17, 2004, Providence, RI.
• Yang, M., Wildemuth, B., & Marchionini, G. (2004). "The relative effectiveness of concept-based versus content-based video retrieval." Proceedings of the ACM Multimedia Conference, pp. 368-371.
• Mu, X. & Marchionini, G. (2003). "Enriched video semantic metadata: authorization, integration, and presentation." Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 40, 316-322.
• Wilkens, T., Hughes, A., Wildemuth, B. M., & Marchionini, G. (2003). "The role of narrative in understanding digital video: an exploratory analysis." Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 40, 323-329.
• Hughes, A., Wilkens, T., Wildemuth, B., & Marchionini, G. (2003). "Text or Pictures? An Eyetracking Study of How People View Digital Video Surrogates." Proceedings of CIVR 2003, pp. 271-280.
• Wildemuth, B. M., Marchionini, G., Yang, M., Geisler, G., Wilkens, T., Hughes, A., & Gruss, R. (2003). "How Fast Is Too Fast? Evaluating Fast Forward Surrogates for Digital Video." Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2003), pp. 221-230. (Vannevar Bush Award for best paper at JCDL 2003)
• Mu, X., Marchionini, G., & Pattee, A. (2003). "The Interactive Shared Educational Environment: user interface, system architecture and field study." Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 40, 291-300.
• Mu, X. & Marchionini, G. (2003). "Statistical Visual Features Indexes in Video Retrieval." Proceedings of SIGIR 2003, pp. 395-396.
• Marchionini, G. (2003). "Video and Learning Redux: New Capabilities for Practical Use." Educational Technology.
• Marchionini, G. & Geisler, G. (2002). "The Open Video Digital Library." D-Lib Magazine, 8(12), December 2002.
• Wildemuth, B. M., Marchionini, G., Wilkens, T., Yang, M., Geisler, G., Fowler, B., Hughes, A., & Mu, X. (2002). "Alternative Surrogates for Video Objects in a Digital Library: Users' Perspectives on Their Relative Usability." Proceedings of the 6th European Conference on Digital Libraries (ECDL 2002), September 16-18, 2002, Rome, Italy.
• Geisler, G., Marchionini, G., Wildemuth, B. M., Hughes, A., Yang, M., Wilkens, T., & Spinks, R. (2002). "Video Browsing Interfaces for the Open Video Project." Proceedings of CHI 2002, Extended Abstracts.
• Nelson, M. L., Marchionini, G., Geisler, G., & Yang, M. (2001). "A Bucket Architecture for the Open Video Project [short paper]." JCDL '01, ACM/IEEE Joint Conference on Digital Libraries, June 24-28, 2001, Roanoke, VA.
• Geisler, G. & Marchionini, G. (2000). "The Open Video Project: A Research-Oriented Digital Video Repository [short paper]." Digital Libraries '00: The Fifth ACM Conference on Digital Libraries, June 2-7, 2000, San Antonio, TX. New York: ACM, 258-259.
• Slaughter, L., Marchionini, G., & Geisler, G. (2000). "Open Video: A Framework for a Test Collection." Journal of Network and Computer Applications, 23(3). San Diego: Academic Press.