Multifaceted approach to ontologizing the ONTOLOG content
Rooted in pragmatism, tools, services, and standards, and social collaboration
E. Michael (Max) Maximilien, Almaden Services Research, http://maximilien.org
Almaden Research Center, San Jose, CA
Approach and thesis
• Two key problems
  • Lack of pragmatism in the goals of ontologies
  • Heterogeneity of usage and use cases
• Summary of approach
  • Simple tagging for human collaboration (folksonomies), plus rating systems for content parts
  • Convert audio automatically into an annotated text transcript
  • Mining tools to automate annotation of content and infer taxonomies
  • Ontology for the outline of the content
  • The secret sauce is in how we combine the semantics (i.e., the algorithm) and in the use cases we try to solve
Tagging and ratings – Human collaboration
• Tagging
  • Idiosyncratic
  • Results in a bag of tags that forms folksonomies
  • Various services available, e.g., http://del.icio.us, http://flickr.com, and so on
  • Humans need incentives to tag, e.g., easier search
  • Evolving into some form of “ontology” (see Peter Mika’s paper “Ontologies are us: A unified model of social networks and semantics” at ISWC 2005)
• Ratings
  • Enable feedback
  • Rate the ratings themselves to discourage collusion
  • Similar to http://digg.com, Amazon’s rating system, and the eBay.com reputation system (various works in the literature)
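The two ideas on this slide, a bag of tags that grows into a folksonomy and ratings that are themselves rated, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample data, the function names, and the choice to weight each rating by a helpfulness score in [0, 1] are hypothetical and do not describe any particular service.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: (user, resource, tags) triples.
TAGGINGS = [
    ("alice", "session-1", {"ontology", "upper-ontology"}),
    ("bob", "session-1", {"ontology", "podcast"}),
    ("carol", "session-2", {"ontology", "tagging"}),
]


def tag_counts(taggings):
    """Count how often each tag is used across all taggings (the 'bag of tags')."""
    counts = Counter()
    for _user, _resource, tags in taggings:
        counts.update(tags)
    return counts


def co_occurrence(taggings):
    """Count how often two tags appear together in a single tagging act."""
    pairs = Counter()
    for _user, _resource, tags in taggings:
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


def weighted_rating(ratings):
    """Average scores, weighting each by how the rating itself was rated.

    ratings: list of (score, helpfulness) pairs, helpfulness in [0, 1].
    Returns None when no rating carries any weight.
    """
    total_weight = sum(weight for _score, weight in ratings)
    if total_weight == 0:
        return None
    return sum(score * weight for score, weight in ratings) / total_weight
```

In this sketch a rating that other users judged unhelpful contributes little to the aggregate, which is one simple way to blunt collusive ratings; real reputation systems use more elaborate schemes.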
Audio content
• Automated transcript
  • Use services to convert audio into a text transcript
  • Some services, e.g., http://podzinger.com, also annotate the transcript and do more than closed captions
  • May involve human collaboration to gradually improve the content (especially resolving context errors)
• Issues
  • Some ONTOLOG audio recordings (podcasts) are low-quality MP3s
  • Static noise and “voice storms”
Mining
• Automatic annotation of content
  • Mature tool set in UIMA
  • Others (?)
• Generate initial taxonomy
• Continual process to update annotations
• Dr. David Ferrucci (IBM Research), lead architect of the UIMA project, is to present to the community on May 11, 2006
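One way to picture "generate initial taxonomy" from annotated content is a simple co-occurrence subsumption heuristic: term x is treated as broader than term y when nearly every document annotated with y is also annotated with x, but not the reverse. The sketch below is an illustration of that general technique only. It is not UIMA's API or the method proposed in these slides, and the function name, the 0.8 threshold, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def infer_broader(doc_terms, threshold=0.8):
    """Return a set of (broader, narrower) term pairs.

    doc_terms: dict mapping a document id to the set of terms annotating it.
    x is considered broader than y when P(x | y) >= threshold and P(y | x) < 1.
    """
    # Invert the mapping: term -> set of documents annotated with it.
    docs_with = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            docs_with[term].add(doc)

    pairs = set()
    for x, docs_x in docs_with.items():
        for y, docs_y in docs_with.items():
            if x == y:
                continue
            both = len(docs_x & docs_y)
            p_x_given_y = both / len(docs_y)
            p_y_given_x = both / len(docs_x)
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                pairs.add((x, y))
    return pairs


# Hypothetical annotations for four pieces of content.
DOCS = {
    "talk-1": {"ontology", "owl"},
    "talk-2": {"ontology", "folksonomy"},
    "talk-3": {"ontology", "owl"},
    "talk-4": {"ontology"},
}
```

Because annotation is described as a continual process, a function like this would be rerun as annotations change, and the resulting pairs would be a starting point for human review rather than a finished taxonomy.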
Ontology
• Outline
  • Create an initial outline of the site content with some upper ontology
  • Reuse an existing ontology
  • IMO this ontology can be specific to ONTOLOG
• What are the primary goals for this outline and ontology?
  • Cataloguing
  • Search (why not just use Google services?)
  • Statistics (why not just use Amazon’s Alexa services?)
  • Others (?)
Thank You (English) • Gracias (Spanish) • Obrigado (Brazilian Portuguese) • Danke (German) • Grazie (Italian) • Merci (French) • [also shown in Hindi, Thai, Traditional Chinese, Russian, Arabic, Simplified Chinese, Tamil, Japanese, and Korean]