Frank Shipman
Professor, Department of Computer Science and Engineering
Associate Director, Center for the Study of Digital Libraries
Texas A&M University

Human-Centered Computing
Outline
• Short discussion of research area
• Supporting access to sign language video
  • Observations of a potential user community cause redefinition of the problem
• Multi-application user interest modeling
  • Iterative design moving from concept to a relatively complete system
Research “Area”
• Many interests
  • Multimedia
  • New Media
  • Computers and Education
  • Computers and Design
  • Software Engineering
  • Computer-Supported Cooperative Work
  • Human-Computer Interaction
  • Knowledge-Based Systems
• Best descriptions I have come up with:
  • Cooperative Problem Solving Systems
    • Systems where humans & computers cooperatively solve problems (humans are part of the overall system)
  • Intelligent User Interfaces
    • Interactive systems that process information in non-trivial ways
• (The slide's diagram places this work at the intersection of AI, IR, HCI, and MM)
What is human-centered computing?
• Developing software or computational techniques with a deep understanding of the human activities they will support
• Implications
  • Most often need to study the human activity before designing the software
  • Design may be (likely will be) a cooperative problem solving system rather than a software system
Cooperative Problem Solving System
• What is a cooperative problem solving system?
  • A system that includes human and software components to perform a task or solve a problem
• Implications
  • Take advantage of the asymmetry of the partners in system design
  • Evaluation of the overall system involves humans
First Example: Supporting Access to Sign Language Video
Sharing Sign Language Video
• Opportunity
  • Cameras in laptops and attached to computers enable easy capture of sign language video
  • Video sharing sites (e.g. YouTube) allow the publication of such expressions
• Practice
  • Pointers to the videos are passed around in other media (e.g. email, Facebook)
  • Some sites specifically support the sign language community
Sharing Sign Language Video
• Locating a sign language video on a particular topic is still difficult
• The community-specific sites have limited collections
  • People must upload to the site, or
  • Must add a pointer for each video to the site
• Locating desired videos within the large video sharing sites relies on metadata (e.g. tags)
  • Tags must be accurately applied, indicating both the language and the topic
How Good is Text-based Search?
• Searched for sign language discussions of the top 10 news queries for 2011 from Yahoo!
• Queries performed with the addition of “ASL” and “sign language”
Why Tags Are Not Enough
• Consider results from the first page of results for the query “sign language”
• Tags are ambiguous
  • In sign language vs. about sign language
  • Different meanings of “sign language”
  • “Sign language” as a song title

Duarte, Gutierrez-Osuna, and Shipman, Texas A&M University
Automatic Identification of SL Video
• Our approach is to develop a technique that can automatically identify whether a video is in sign language
• To run on a site the size of YouTube, it:
  • Should be accurate enough to be run without human verification of results
  • Should be efficient enough to be run during video upload without significant extra resources
What is Sign Language Video?
• We decided to scope the problem by focusing on the equivalent of sign language documents
  • Recorded by an individual with the intent of being watched
• What we are not trying to identify (yet)
  • Videos of sign language conversations
  • Sign language translations
Related and Prior Work
• Work on sign language recognition
  • Recognizing what is being said in sign language
  • Often assumes the video is in sign language
  • Too heavyweight for our purpose
• Detecting sign language
  • Recognizing when a person starts signing, for more efficient resource utilization
  • Not designed to work on likely false positives
Designing a SL-Video Classifier
• Our classifier
  • processes a randomly selected 1-minute segment from the middle of the video (sketched below)
  • returns a yes/no decision on whether it is a SL video
• Design method
  • Use standard video processing techniques
  • Five video features selected based on their expected relation to SL video
  • Test classifiers provided with one or more of the features
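The talk does not include implementation details, so the following is a minimal sketch of the segment-selection step, assuming OpenCV (cv2) for video access; the function name, the fallback frame rate, and the restriction to the middle half of the video are illustrative assumptions.

    import random
    import cv2

    def middle_minute_frames(path, segment_sec=60):
        """Yield frames from a randomly placed 1-minute segment in the
        middle portion of the video (hypothetical helper)."""
        cap = cv2.VideoCapture(path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS unknown
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        seg_len = int(segment_sec * fps)
        # Assumption: "middle" means the middle half of the video.
        lo = total // 4
        hi = max(lo, 3 * total // 4 - seg_len)
        cap.set(cv2.CAP_PROP_POS_FRAMES, random.randint(lo, hi))
        for _ in range(seg_len):
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
        cap.release()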
Video Processing
• Background Modeling (sketched below)
  • Convert to greyscale
  • Dynamic model (to cope with changes in signer body position and lighting)
  • BP_t = 0.96 * BP_(t-1) + 0.04 * P_t, where P_t is the current pixel value
• Foreground object detection
  • Pixels that differ from the background model by more than a threshold are foreground pixels
  • A spatial filter removes regions of foreground pixels smaller than a minimum threshold
• Face location to determine the position of the foreground relative to the face
  • Videos without a single main face are not considered as potential SL videos
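A minimal sketch of the background modeling and foreground detection steps described above, in Python with OpenCV and NumPy; the difference and minimum-region thresholds are illustrative assumptions, and connected-component analysis stands in for whatever spatial filter the original system used.

    import cv2
    import numpy as np

    def foreground_masks(frames, alpha=0.04, diff_thresh=25, min_region=50):
        """Running-average background model, BP_t = 0.96*BP_(t-1) + 0.04*P_t,
        with thresholded foreground detection and a small-region filter."""
        bg = None
        for frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
            if bg is None:                         # initialize from first frame
                bg = gray.copy()
                continue
            bg = (1 - alpha) * bg + alpha * gray   # dynamic background update
            # Pixels far from the background model are foreground.
            mask = (np.abs(gray - bg) > diff_thresh).astype(np.uint8)
            # Spatial filter: drop connected regions below a size threshold.
            n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
            for i in range(1, n):
                if stats[i, cv2.CC_STAT_AREA] < min_region:
                    mask[labels == i] = 0
            yield mask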
Five Visual Features
• VF1: overall amount of activity
• VF2: distribution of activity in the camera view
• VF3: rate of change in activity
• VF4: symmetry of motion
• VF5: non-facial movement
• An SVM classifier worked best (a classification sketch follows)
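A minimal sketch of the classification step, assuming scikit-learn; the talk says only that an SVM worked best, so the RBF kernel and default parameters are assumptions, and the five-dimensional feature vectors (VF1-VF5) are presumed to be computed per video by routines like the ones above.

    import numpy as np
    from sklearn.svm import SVC

    def train_sl_classifier(features, labels):
        """features: (n_videos, 5) array of VF1-VF5 values;
        labels: 1 = SL video, 0 = non-SL video."""
        clf = SVC(kernel="rbf")   # kernel choice is an assumption
        clf.fit(features, labels)
        return clf

    # Usage: a yes/no decision for a new video's feature vector.
    # clf = train_sl_classifier(train_X, train_y)
    # is_sl = clf.predict(np.asarray(vf).reshape(1, -1))[0] == 1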
Corpus for Evaluation
• Created a corpus of 98 SL videos and 94 likely-false-positive (non-SL) videos
• The majority of non-SL videos were likely false positives based on visual analysis
  • A person facing the camera and moving their hands and arms (e.g. a gesturing presenter, a weather forecaster)
• A small number of non-SL videos were selected because they were false positives from tag-based search
  • This number was kept small because these are likely easier to detect than the others
Evaluation Method
• Common method for testing classifiers (sketched below)
  • Each classifier tested on 1000 executions in each context
  • Training and testing sets randomly selected on each execution
• Metrics
  • Precision – % of videos classified as SL videos that really are SL videos
  • Recall – % of SL videos correctly classified as SL videos
  • F1 score – harmonic mean of precision and recall
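A minimal sketch of this evaluation protocol, assuming scikit-learn; the stratified random splits and the balanced training size follow the description above, but the exact splitting procedure used in the original experiments is not specified.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score, f1_score
    from sklearn.svm import SVC

    def evaluate(X, y, train_per_class=15, runs=1000, seed=0):
        """Average precision/recall/F1 over repeated random splits."""
        rng = np.random.RandomState(seed)
        scores = []
        for _ in range(runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=2 * train_per_class, stratify=y,
                random_state=rng.randint(2**31 - 1))
            pred = SVC(kernel="rbf").fit(X_tr, y_tr).predict(X_te)
            scores.append((precision_score(y_te, pred),
                           recall_score(y_te, pred),
                           f1_score(y_te, pred)))
        return np.mean(scores, axis=0)   # (precision, recall, F1)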
Overall Results
• All five features, varying the size of the training set
• While larger training sets improve recall, the effect is fairly small
• Later results are with 15 training videos per class
All But One Feature
• Comparing the results when one feature is removed from the classifier
• Removing VF4 (symmetry of motion) has the largest effect, meaning it has the most useful information not found in the other features
Only One Feature
• Comparing the results when only one feature is provided to the classifier
• Again, VF4 (symmetry of motion) has the most valuable information
• VF4 alone does better than the other four features combined
Discussion of Failures (False Positives)
• Our non-SL videos were chosen to be hard
  • A precision of ~80% means about one in five videos identified as sign language was really one of these hard negatives
• Performance on a typical video sharing site would be much better, because most non-SL videos there would be easy to classify
• We are happy with this performance
Discussion of Failures (False Negatives)
• Examining the SL videos not recognized by the classifier:
  • Some failures were due to signers frequently turning away from the camera
  • Others were due to the background being similar in color to the signer’s skin tone
  • Still others were due to movement in the background
• Relaxing our requirement that the signer face the camera and improving our background model would help in many of these cases
HCC Conclusions
• Examined current practice to determine the need for a system
  • Identified the new problem of locating SL videos
  • Quantified the difficulty with existing tools
• Developed a method
  • Tested with real-world data
• Future work
  • Deploy the system to test whether it meets the need
Example 2: Multi-Application User Interest Modeling
Task: Information Triage
• Many tasks involve selecting and reading more than one document at once
• Information triage places different demands on attention than single-document reading activities
• Continuum of types of reading:
  • working in overview (metadata)
  • reading at various levels of depth (skimming)
  • reading intensively
• How can we bring the user’s attention to content they will find valuable?
User Interest Modeling
• User model – a system’s representation of characteristics of its user
  • Generally used to adapt/personalize the system
  • Can cover preferences, accessibility issues, etc.
• User interest model – a representation of the user’s interests
  • Motivation: information overload
  • History: many of the concepts are found in work on information filtering (early 1990s)
Interest Modeling for Information Triage
• Prior interest models tend to assume one application
  • Example: a browser observing page views and time on page
• Multiple applications are involved in information triage (searching, reading, and organizing)
• When applications do share a user model, it is with regard to a well-known domain model
  • Example: knowledge models shared by educational applications
  • Not possible here, since triage deals with decisions about relative value among documents that are all of likely value
Acquiring the User Interest Model
• Explicit methods
  • Users tend not to provide explicit feedback
  • Long-tail assumptions are not applicable
• Implicit methods
  • Reading time has been used in many cases
  • Scrolling and mouse events have been shown to be somewhat predictive
  • Annotations have been used to identify passages of interest
• Problem: individuals vary greatly and have idiosyncratic work practices
Potential Value?: A First Study
• Study designed to look at:
  • deciding what to keep
  • expressing an initial view of relationships
• Part of a larger study:
  • 8 subjects in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher
  • Setting: top 20 search results from NSDL & top 20 search results from Google, presented in VKB 2
  • Subjects used VKB 2 to organize and a Web browser to read
• After the task, subjects were asked to identify:
  • the 5 documents they found most valuable
  • the 5 documents they found least valuable
Many User Actions Anticipate Document Assessment
• Correlated actions (p < .01), from most to least correlated (a correlation sketch follows):
  • Number of object moves
  • Scroll offset
  • Number of scrolls
  • Number of border color changes
  • Number of object resizes
  • Total number of scroll groups
  • Number of scrolling direction changes
  • Number of background color changes
  • Time spent in document
  • Number of border width changes
  • Number of object deletions
  • Number of document accesses
  • Length of document in characters
• (On the original slide, each action was color-coded by source: blue = from VKB, white = from browser)
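A sketch of how such a correlation analysis might be run, assuming SciPy; the talk reports only the significance level (p < .01), so the choice of Pearson correlation as the test is an assumption.

    from scipy.stats import pearsonr

    def correlated_actions(action_counts, assessments, alpha=0.01):
        """action_counts: dict of action name -> per-document counts;
        assessments: per-document value judgments (same order)."""
        hits = []
        for action, counts in action_counts.items():
            r, p = pearsonr(counts, assessments)
            if p < alpha:             # keep only significant correlations
                hits.append((action, r, p))
        # Most to least correlated, by correlation strength.
        return sorted(hits, key=lambda t: -abs(t[1]))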
Interest Models
• Based on the data from the first study, we developed four interest models
• Three were mathematically derived (a sketch follows):
  • Reading-Activity Model
  • Organizing-Activity Model
  • Combined Model
• One hand-tuned model included human assessment, based on observations of user activity and interviews with users
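The talk does not specify how the mathematically derived models were built, so the following is a minimal sketch under the assumption that each model is a linear combination of normalized activity counts fit by least squares; all names here are illustrative.

    import numpy as np

    def fit_interest_model(activity, value):
        """activity: (n_docs, n_features) normalized action counts;
        value: (n_docs,) user assessments. Returns a weight vector."""
        w, *_ = np.linalg.lstsq(activity, value, rcond=None)
        return w

    def predict_interest(activity, w):
        return activity @ w   # higher score = more predicted interest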
Evaluation of Models
• 16 subjects with the same:
  • Task (collecting information on ethnomathematics for a teacher), and
  • Setting (20 NSDL and 20 Google results)
• Different rating of documents:
  • Subjects rated all documents on a 5-point Likert scale (with 1 meaning “not useful” and 5 meaning “very useful”)
Predictive Power of Models
• Models limited due to data from the original study
• Used aggregated user activity and user evaluations to evaluate the models
• Lower residue indicates better predictions (one possible computation is sketched below)

Model                       Avg. Residue   Std. Dev.
Reading-activity model      0.258          0.192
Organizing-activity model   0.216          0.146
Combined model              0.175          0.138
Hand-tuned model            0.197          0.134

• The combined model is better than the reading-activity model (p=0.02) and the organizing-activity model (p=0.07)
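The residue metric is not defined in the talk; the sketch below assumes it is the absolute difference between a model's prediction and the user's rating after both are normalized to [0, 1], which is only one plausible interpretation.

    import numpy as np

    def avg_residue(predicted, ratings):
        """predicted: model scores; ratings: 1-5 Likert ratings.
        'Residue' here is an assumed interpretation, not a documented one."""
        predicted = np.asarray(predicted, dtype=float)
        span = float(np.ptp(predicted)) or 1.0     # avoid divide-by-zero
        p = (predicted - predicted.min()) / span   # normalize to [0, 1]
        r = (np.asarray(ratings, dtype=float) - 1) / 4.0   # 1-5 -> [0, 1]
        res = np.abs(p - r)
        return res.mean(), res.std()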
Architecture for Interest Modeling
• Results of the study motivated the development of an infrastructure for multi-application interest modeling
• (The slide’s diagram shows multiple reading applications, an organizing application, and a location/overview application connected to an Interest Estimation Engine and a User Interest Profile Manager that maintain the interest profile)
New Tools: VKB 3
• New document object
  • User expression via coloring a document object’s user layer is used to infer user interests
  • The system layer is used to indicate documents’ relations to inferred interests
• (The slide’s image shows the document object’s main layer and system layer)
Evaluation of the New Design
• 20 subjects organized 40 documents about “antimatter” returned by a Yahoo! search
• Subjects assessed the relevance of each document at the end of the task
• 10 subjects worked with and 10 without suggestions/thumbnails
• Measured:
  • Task switching
  • Time on documents
Results
• Task Switching
  • Fewer but longer reading sessions with the new interface
  • Average reading time:
    • 10.7 seconds with the new features
    • 4.3 seconds without
    • p < 0.0001
• Interpretation: people are doing more in-depth reading
Results
• Document Attention
  • 6 of 10 subjects with the new interface had significant correlations between reading time and document value
  • Only 2 subjects with the old interface had significant correlations
• Interpretation: new-interface users located and spent more time on documents of value to their task
HCC Conclusions
• Question simplifying assumptions
  • Recognized that users are engaged with multiple documents and multiple applications simultaneously
• Iterate between design and user studies
• Design software as an extensible environment, enabling easier redesign
• The new system resulted in more in-depth reading and more time spent on relevant documents
Broad View of Computer Science
• Many really important problems require cooperative problem solving systems
  • Solutions that assume we can vary the behavior of only one partner (the computer or the user) are less likely to succeed
• Need CPS design, development, and evaluation skills
  • Recognize whether the problem is one of computation, representation, or interaction
• You can be part of solving big problems
Contact Information
Email: shipman@cse.tamu.edu
Web: www.csdl.tamu.edu/~shipman