CADIP Research at UW-Milwaukee Ethan Munson and Yelena Tsymbalenko
Research Foci • Languages for implementing agents • MS Thesis by Preeti Seshadri • Multimedia information retrieval • Exploiting metadata to improve MM IR • Yelena Tsymbalenko’s MS research • Models of media • Usability of information visualization • future work
Using HTML Metadata to Retrieve Relevant Images from the Web Yelena Tsymbalenko University of Wisconsin-Milwaukee
Why is image search important? • The Web is a primary source of information. • Images are among the most valuable sources of information available on the Web. • Few WWW image search engines currently exist. • Using textual search engines to find images manually is laborious.
A Requirement for Web Image Search • We need an efficient method of discovering and indexing image content. • Two main sources of information about image content: • image processing • associated text • text content • markup
Related work • WebSeek (J. Smith & S. Chang, Columbia University) • performs semi-automated classification of images • uses the image file name for categorization • supports search by browsing or querying the categories • uses image features such as color content to find images of similar color
Related work • WebSeer (M. Swain et al., The University of Chicago) • uses associated text and markup to supplement information derived from analyzing image content • uses multiple kinds of metadata • decides which images are photographs
Why look for new methods for image retrieval? • The number of WWW documents is growing rapidly, and their content is constantly changing. • Image processing is complex and computationally expensive. • We need fast and efficient methods for finding images. • Extensive image processing is not necessary.
Our research • Obtain information about image content from HTML source code: • Explicit: file and HREF names • Implicit: markup structure • Determine which features of Web documents are the best clues to image content
Search Strategy Examples • Image file name • Title of HTML document • Alternate text (ALT attribute) • Text of hyperlink • Text of the same paragraph • Header text
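The clues listed on this slide can be pulled out of HTML source with an ordinary parser. The following is a minimal sketch in Python using only the standard library, not the system described in the talk. The names ImageClueParser and extract_clues, and the sample page, are hypothetical. It covers file name, document title, ALT text, hyperlink text, and the most recent header; same-paragraph text is left out for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ImageClueParser(HTMLParser):
    """Hypothetical helper: records textual clues for each <img> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []          # one dict of clues per image
        self._in_title = False
        self._in_header = False
        self._last_header = ""    # text of the most recent header
        self._link_text = None    # text of the enclosing <a>, or None
        self._link_images = []    # images seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in HEADER_TAGS:
            self._in_header = True
            self._last_header = ""
        elif tag == "a":
            self._link_text = ""
            self._link_images = []
        elif tag == "img":
            src = attrs.get("src") or ""
            clue = {
                "file_name": posixpath.basename(urlparse(src).path),
                "alt": attrs.get("alt") or "",
                "header": self._last_header.strip(),
                "link_text": "",
            }
            self.images.append(clue)
            if self._link_text is not None:
                self._link_images.append(clue)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in HEADER_TAGS:
            self._in_header = False
        elif tag == "a" and self._link_text is not None:
            # The link text is only complete once the <a> closes.
            for clue in self._link_images:
                clue["link_text"] = self._link_text.strip()
            self._link_text = None
            self._link_images = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_header:
            self._last_header += data
        if self._link_text is not None:
            self._link_text += data


def extract_clues(html):
    """Return one clue dictionary per image found in the HTML string."""
    parser = ImageClueParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    for clue in parser.images:
        clue["title"] = title
    return parser.images


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<html><head><title>Charlie Chaplin films</title></head>
<body><h2>The Gold Rush</h2>
<p><a href="big.html"><img src="/img/chaplin_goldrush.jpg"
alt="Chaplin eating a shoe"> Full-size still</a></p>
</body></html>
"""
```

Calling extract_clues(SAMPLE_HTML) yields one dictionary per image, and a query term such as "Chaplin" can then be matched against each clue separately to compare strategies.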
Analysis Plans • Will collect data about search results for a number of queries (several dozen) • Suggestions for queries are welcome! • Will test which clues are most effective • Are some redundant? • Does a combination of clues produce better recall? • Are some clues more precise? • Is search performance dependent on query type? • Proper names (Chaplin, Garvey) • Phenomena (riot, explosion)
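The questions about precision, recall, and combined clues can be made concrete with the standard definitions. The sketch below is an illustration only: the function names, the image identifiers, and the per-clue result sets are invented for the example and are not data from the study. Clues are combined by taking the union of their result sets, which is one possible combination rule among several.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one set of retrieved images."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def combine(*result_sets):
    """Combine clues by union: an image matches if any clue matches."""
    combined = set()
    for results in result_sets:
        combined |= set(results)
    return combined


# Invented toy data: images judged relevant to one query, and the
# images each clue retrieved for that query.
relevant = {"a", "b", "c", "d"}
by_clue = {
    "file_name": {"a", "b", "x"},
    "alt": {"b", "c"},
}

scores = {clue: precision_recall(found, relevant)
          for clue, found in by_clue.items()}
scores["file_name+alt"] = precision_recall(
    combine(by_clue["file_name"], by_clue["alt"]), relevant)
```

In this toy case the union raises recall over either clue alone while precision lands between the two, which is the kind of trade-off the analysis is meant to measure on real queries.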
Using HTML Metadata to Retrieve Relevant Images from the Web Yelena Tsymbalenko Department of Computer Science University of Wisconsin - Milwaukee yelena@cs.uwm.edu
Agent Implementation Languages • Preeti Seshadri’s thesis has two parts • Survey of languages • Pure scripting languages • Tcl, Perl • Scripting/general-purpose languages • Java, Python, Telescript • Resource management service for Java • Interface design • Partial implementation
Language Requirements • Good language infrastructure • OO or other good modularity features • Automated memory management • Decent performance • Byte-compilation is probably enough • Portability • Security • Mobile agents must either be trusted or controlled • Control is always better
Language Survey • Systems programming languages (C, C++) • high-performance, but non-portable and insecure • Pure scripting languages (Tcl, Perl) • low-to-medium performance, portable • limited security and communication services • Scripting/general-purpose languages (Java, Python, Telescript) • medium-performance, portable • more security and communication support
Systems Programming Languages • Native-code compilation yields very high performance • Native-code is not portable • compilation is too complex to perform at client site • Language definitions are limited • no security or coordination infrastructure • little is guaranteed about higher-level services • even exception handling is limited
Pure Scripting Languages • Tcl is a bad choice • poorly suited for larger applications • low performance • poor language infrastructure • non-OO, no threads, no exceptions • Perl is a bit better • performance is better, but not great • limited security • language complexity is high
Telescript • “Environment” for constructing agent societies • Proprietary (General Magic, Inc.) • Language, engine, communication protocols • Claimed to be fast, easy-to-use, secure • Core concepts • “places” are execution contexts and can be nested • No agent-to-agent communication • agents move to places and do things • Capability-based security (“permits”)
Python • An OO scripting language • Unusual dynamic type system • Many high-level data types • Socket-level networking support • Typical byte-compiled characteristics • portability, dynamic linking • Limited security support • “Restricted execution,” similar to sandboxing • appears poorly integrated with mobility
Java • General-purpose language widely used for scripting-style applications • Excellent language design • Medium performance • Strong security features • customizable “sandboxes” • Heavily and effectively hyped • portability is overrated • performance will probably never match C++
Security Issues in Java • Java is very secure, but problems remain • e.g. security managers are inflexible • Agent portability is a problem • A newly arrived agent must be trusted • sandboxing addresses the obvious trust issues • Denial of service attacks are still possible • deliberate and accidental • Java lacks standard resource management services
Resource Management Interface • Supports both monitoring and control • CPU time • memory • threads • Granularity is per-thread and per-threadgroup • Designed to work on bytecode, not source • can monitor “outside” agents
Class Structure • [Class diagram not recoverable from the extracted text. Classes shown: RunTimeException, UsageException, ThreadRegister, ChiefMonitor, ExceedUse. Relationship labels: has, uses, uses interface, uses exception, is.]
Interface Details • Initialization • Resource usage queries • consumption • limits • Resource usage control • set usage bounds and policy • reset usage bounds • Resource exceptions • interrupt-style control
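The four parts of this interface (initialization, usage queries, usage control, and resource exceptions) can be illustrated with a small per-thread CPU-time monitor. This is a Python analogue under stated assumptions, not the Java interface from the thesis: the names ThreadMonitor and ResourceLimitExceeded are hypothetical, only CPU time is tracked, and the monitored code must cooperate by calling check(), so the exception arrives at polling points and is not a true interrupt. It assumes Python 3.7 or later on a platform where time.thread_time is available.

```python
import threading
import time


class ResourceLimitExceeded(Exception):
    """Hypothetical exception raised when a thread exceeds its CPU bound."""


class ThreadMonitor:
    """Hypothetical per-thread CPU-time monitor (cooperative polling)."""

    def __init__(self, clock=time.thread_time):
        self._clock = clock               # injectable for testing
        self._local = threading.local()   # per-thread accounting state

    def register(self, cpu_limit=None):
        """Initialization: start accounting for the calling thread."""
        self._local.start = self._clock()
        self._local.cpu_limit = cpu_limit

    def consumption(self):
        """Usage query: CPU seconds used since register() or reset()."""
        return self._clock() - self._local.start

    def limit(self):
        """Usage query: the current bound, or None if unbounded."""
        return self._local.cpu_limit

    def set_limit(self, cpu_limit):
        """Usage control: set the bound for the calling thread."""
        self._local.cpu_limit = cpu_limit

    def reset(self):
        """Usage control: restart accounting from zero."""
        self._local.start = self._clock()

    def check(self):
        """Raise ResourceLimitExceeded if the calling thread is over its bound."""
        bound = self._local.cpu_limit
        if bound is not None and self.consumption() > bound:
            raise ResourceLimitExceeded(
                f"used {self.consumption():.3f}s of {bound:.3f}s CPU")
```

An agent's main loop would call register() once and check() periodically. A policy other than raising an exception (for example, lowering thread priority) could be substituted inside check().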
Implementation Plans • Prototype requires that agent be built with internal monitoring support • agent’s implementor must cooperate • We want to impose monitoring on arbitrary mobile agents • Solution: bytecode rewriting • All interesting operations have well-defined representation in Java bytecode • Will wrap relevant bytecodes in monitoring code • Similar to Purify/Quantify
A Theory of Media for Multimedia Authoring and Browsing Systems
The Original Problem • Develop a multimedia document system that allows easy addition of new media modules • Kernel/shell architecture • Shells support individual media • text, graphics, video • Kernel provides medium-independent services • document structure, scripting language, style sheet system
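The kernel/shell split can be pictured as a registry of media modules behind one common interface. The sketch below is only an illustration of that idea: the class names MediaShell, Kernel, TextShell, and GraphicsShell and the render method are hypothetical and are not taken from the system described in the talk.

```python
from abc import ABC, abstractmethod


class MediaShell(ABC):
    """Hypothetical interface that each medium-specific shell implements."""

    medium = ""

    @abstractmethod
    def render(self, content):
        """Return a presentation of the content for this medium."""


class TextShell(MediaShell):
    medium = "text"

    def render(self, content):
        return f"[text] {content}"


class GraphicsShell(MediaShell):
    medium = "graphics"

    def render(self, content):
        return f"[graphics] {content}"


class Kernel:
    """Hypothetical kernel: medium-independent, dispatches to shells."""

    def __init__(self):
        self._shells = {}

    def register(self, shell):
        # Adding a new medium means registering one more shell.
        self._shells[shell.medium] = shell

    def present(self, document):
        """document is a list of (medium, content) pairs."""
        return [self._shells[medium].render(content)
                for medium, content in document]


kernel = Kernel()
kernel.register(TextShell())
kernel.register(GraphicsShell())
output = kernel.present([("text", "caption"), ("graphics", "diagram")])
```

Under this arrangement the kernel never needs to change when a new medium is added, which is the "easy addition of new media modules" goal stated on the slide.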
Proteus Style Sheet System • Portable style sheet system • PSL style language adapts to application (or media module) • medium supported by application is specified with MSPEC language • Architecture designed for multiple, simultaneous presentations • Used in Ensemble document environment and in MPMosaic WWW browser