1 / 23

OKKAM An Entity Name System (ENS) for the Web

OKKAM An Entity Name System (ENS) for the Web. Paolo Bouquet Dept. of Engineering and Information Science University of Trento (Italy) OKKAM Project Coordinator. Korea-EU Cooperation Forum on ICT Seoul – June 16 & 17, 2008. The OKKAM factsheet. University of Trento, IT L3S Hannover, DE

Download Presentation

OKKAM An Entity Name System (ENS) for the Web

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. OKKAM An Entity Name System (ENS) for the Web Paolo Bouquet Dept. of Engineering and Information Science University of Trento (Italy) OKKAM Project Coordinator Korea-EU Cooperation Forum on ICT Seoul – June 16 & 17, 2008

  2. The OKKAM factsheet

  3. University of Trento, IT L3S Hannover, DE SAP Research, DE Elsevier, NL Expert System, IT Europe Unlimited, BE MAC, IE EPFL, CH DERI Galway, IE University of Malaga, SP INMARK, SP ANSA, IT The OKKAM consortium 5 universities 4 SMEs 3 corporations

  4. OKKAM: the rational How many “entities” (persons, locations, organizations, events, projects, products, …) are named in: Content in your Intranet/ Enterprise Information System Contents/applications on the Web Files in your laptop 1B~1000B 100K~1M 1K~10K

  5. An invaluable asset “Entities” is what a large part of our knowledge is about: Locations Organizations Projects Persons Products Events

  6. However … How many names, descriptions or IDs (URIs) are used for the same “entity”? London 런던 ܠܘܢܕܘܢ लंडन लंदन લંડન ለንደን ロンドン লন্ডন ลอนดอน இலண்டன் ლონდონი Llundain Londain Londe Londen Londen Londen Londinium London Londona Londonas Londoni Londono Londra Londres Londrez Londyn Lontoo Loundres Luân Đôn Lunden Lundúnir Lunnainn Lunnon لندن لندن لندن لوندون לאנדאן לונדון Λονδίνο Лёндан Лондан Лондон Лондон Лондон Լոնդոն 伦敦 … The capital of UK, host city of the IV Olympic Games, host city of the XIV Olympic Games, future host of the XXX Olympic Games, the city of the Westminster Abbey, the city of the London Eye, the city described by Charles Dickens in his novels, … http://sws.geonames.org/2643743/ http://en.wikipedia.org/wiki/London http://dbpedia.org/resource/Category:London …

  7. … or … How many “entities” have the same name? • London, KY • London, Laurel, KY • London, OH • London, Madison, OH • London, AR • London, Pope, AR • London, TX • London, Kimble, TX • London, MO • London, MO • London, London, MI • London, London, Monroe, MI • London, Uninc Conecuh County, AL • London, Uninc Conecuh County, Conecuh, AL • London, Uninc Shelby County, IN • London, Uninc Shelby County, Shelby, IN • London, Deerfield, WI • London, Deerfield, Dane, WI • London, Uninc Freeborn County, MN • ... • Or • London, Jack2612 Almes DrMontgomery, AL(334) 272-7005 • London, Jack R2511 Winchester RdMontgomery, AL 36106-3327(334) 272-7005 • London, Jack1222 Whitetail TrlVan Buren, AR 72956-7368(479) 474-4136 • London, Jack7400 Vista Del Mar AveLa Jolla, CA 92037-4954(858) 456-1850 • ...

  8. Social networks in London Videos and tags for London … or … How many content types / applications provide valuable information about each of these “entities”? News about London Revyu.com reviews on hotels in London Wiki pages about the London Pictures and tags about London

  9. This is an immense loss of value • Precision and recall of keyword-based search engines is deeply affected • Semantic search still mainly relies on keyword search/matching (as very few URIs are consistently reused across RDF datasets) • Information integration (e.g. from heterogeneous databases) is hard to achieve • Mash-ups from different sources are not easy to create • Data/Text/Web mining tools must struggle with object consolidation from different data sources • Information extraction produces poorly integrated results (co-reference is supported mainly in the same document/corpus, not across large-scale distributed environments)

  10. The OKKAM vision OKKAM envisions an open, decentralized and global knowledge space in which: • Every entity (individual, instance, “thing”) is assigned a global identifier, ideally unique • The same identifier is used to name the same entity across any type of content, format, application, domain, language, culture • Contents on the Web (from text to multimedia) are consistently annotated with entity unique identifiers • Users and application are provided with easy and straightforward ways (e.g. GUIs, APIs) for retrieving the IDs of the entities they need to name in their content This will enable a new powerful generation of entity-centric applications and services

  11. Enabling the OKKAM vision: an ENS for the Web OKKAM aims at implementing an Entity Name System (ENS) for the Web, namely an infrastructure which can offer the following basic services: • ID storage and management: stores, maintains and makes available for reuse IDs (URIs) for anything which is named in a networked environment • Entity matching (and look-up): maps any arbitrary description of an entity to its global ID (or to the collection of known IDs) • APIs: provides application interfaces which can return IDs to applications which needs them in the creation of new content (e.g. word processors, ontology editors, HTML editors, web-based authoring environments, …) • Access: allow users or application to create new IDs for things which do not have one, or to modify/update an entity description in the ENS

  12. OKKAM ENS – Global and Decentralized Replicated public nodes (ENS servers) hosted in different locations Local “corporate“ nodes for non-public data (and cache)

  13. ENS-empowered tools Any application which is used to create/edit content can be extended with “plugins” which enable the application to get from the ENS the right ID for a named entity Tools we have already empowered (demos available!): Annotating texts created with MS Word with IDs automatically obtained from the ENS Reusing URIs in RDF/OWL knowledge bases built with Protégé with IDs automatically obtained from the ENS Introducing global IDs to content created through web-based authoring interfaces (e.g. FOAF content)

  14. Three OKKAM applications In the project, we have three application scenarios: • Entity-centric semantic search engine (DERI Galway) • Entitity-centric organizational knowledge management systems: entity-centric product lifecycle management within SAP • Multimedia authoring environments: supporting the creation of news (ANSA) and scientific papers (Elsevier) in an ENS-empowered environment

  15. Potential for Collaboration • OKKAM is by construction an open project: the only chance of success for it is to get involved the largest possible number of contributors/users (network externality effect) • We identified 4 main areas where we seek (and need!) collaboration: • Infrastructure • Population of the repository • Content creation and annotation • Application projects in large organizations/ companies • … but of course other ideas are welcome!

  16. Collaboration 1 - Infrastructure No muscles, no good. The ENS needs a very powerful hardware and software infrastructure. We seek help for deploying the best available solutions to difficult scalability challenges • The size of the entity repository will be huge, and so will be the indexes necessary for retrieval • WE need efficient and secure synchronization methods across different ENS nodes • The number of requests for second to the ENS will become extremely high (millions per second …), leading to network bottlenecks and overloads • Response time must be extremely fast (Google-like …) • Malicious attacks must be detected and stopped immediately to prevent misuses of the ENS • Innovative access control policies must be studied (tradeoff between protection and social contribution)

  17. Collaboration II – Population of the ENS repository No entities, no fun! When the ENS goes live, the entity repository must contain all the entities whose IDs are most likely to be searched for. • We seek any available methods for extracting entities (and their description) both from structured and unstructured data sources • In addition to the obvious ones (e.g. Wikipedia, GeoNames, UniProt, …), we desperately seek data sources which can be used to extract entities for bootstrapping the repository population • We need to creative people to design simple (but popular) applications which may lead people to create IDs for new entities and edit the description of existing entities • And we also need editors for the entity repository Potential for collaboration with the “Internet of Things”?

  18. Collaboration III – Content creation and annotation No content, no party! The advantages of the ENS will be only apparent if a large amount of annotated content will be available for ENS-enabled applications. We seek collaboration in the following areas: • Entity recognition in a variety of content types (text, images, videos, data streams, …) • Efficient and reliable entity matching methods (i.e. matching an incoming description with the available entity descriptions in the repository) • Annotation of different types of content • Third party which wants to develop a new plugin for content creation tools (e.g. for blog platforms, forums and discussion lists, content tagging systems, spreadsheets, RDBMS, …)

  19. Collaboration IV – Entity-centric applications No apps, no use. We need early adopters among large organizations/companies which can prove the advantages of the approach • Content providers that want to annotate their content with OKKAM IDs (like ANSA and Elsevier in the OKKAM consortium) • Large-scale enterprises which want to adopt an entity-centric approach to managing their information and knowledge space (like SAP in the OKKAM consortium) • Web2.0 portals (including social network platforms) which want to enrich the quality of the user provided content through entity annotation • Companies which sell information extraction tools which could be improved through the interaction with the ENS (like Expert System) • Consulting companies which may want to define a methodology for entity-centric project with their customers

  20. The timeframe Three integration prototypes: Month 11 (November 2008): basic infrastructure and services and first OKKAM-empowered tools Month 20 (August 2009): advanced infrastructure, improved service, more OKKAM-enabled tools Month 28 (April 2010): full infrastructure + services, applications

  21. Pictures and tags about London Social networks in London Videos and tags for London London? News about London Revyu.com reviews on hotels in London Wiki pages about the London

  22. Pictures and tags about London Social networks in London Ah, that London! News about London Revyu.com reviews on hotels in London Wiki pages about the London http://www.okkam.org/entity/ok5806bbcb-a934-457c-ba65-5aa1c35327a7 Videos and tags for London

  23. http://fp7. .org Thanks! And Visit us at:

More Related