The Evolving Semantic World. Barbara McGlamery Taxonomist Martha Stewart Living Omnimedia. About me. Masters in Library and Information Science Long Island University New York Public Library Branch librarian NYPL for the Performing Arts – Drama reference Entertainment Weekly Data Manager
The Evolving Semantic World • Barbara McGlamery • Taxonomist • Martha Stewart Living Omnimedia
About me • Master's in Library and Information Science • Long Island University • New York Public Library • Branch librarian • NYPL for the Performing Arts – Drama reference • Entertainment Weekly • Data Manager • Time Inc. • Senior Data Manager, Taxonomist, Metadata Architect, Ontologist • Martha Stewart Living Omnimedia • Taxonomist
Agenda • What is the Semantic Web? • Big “S” and little “s” semantics • What we used to believe • Time Inc. & the theory of overkill • What we know now • Martha Stewart and the theory that less is more • Where we’re going • Leaner and meaner (but more standards)
The Semantic Web is a web of data…. (it) provides a common framework that allows data to be shared and reused across applications, enterprise, and community boundaries. --W3C
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” --Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, 2001
The Semantic Web is about making knowledge machine and human-readable
--Amit Agarwal, http://www.labnol.org/internet/web-3-concepts-explained/8908/
Big S Semantic Web • Little s semantic web
Big S Semantic Web …big "S" web technologies provide a framework for describing data on a web page when the data on the website is published. If data is read or captured, because the data's semantic meaning has already been described, you don't have to go through the process of understanding the meaning of the data after the fact. --Sean Martin, CEO of Cambridge Semantics
Little s Semantics Little "s" web technologies capture and filter data with no description or understanding of the data provided after the capture process. The process of understanding the meaning of that data starts once data capture has happened. People have to intervene to provide the context and meaning for language on the web. --Sean Martin, CEO of Cambridge Semantics
Big S • W3C-approved standards • Little s • Looser groups of unaffiliated standards
Essentials of Big S Semantic Web • URI – Uniform Resource Identifier • RDF – Resource Description Framework • OWL – Web Ontology Language • Semantic reasoner (inference engine)
URI – Uniform Resource Identifier • Way to identify things • Images, pages of text, locations • De-referenceable • Freebase • http://www.freebase.com/view/en/will_smith • URIs are unique; no two are the same • Will Smith • http://www.freebase.com/view/en/will_smith
RDF – Resource Description Framework • Framework used to describe relationships between objects • A data model, commonly serialized as XML (RDF/XML) • Subject>Predicate>Object
RDF – Resource Description Framework • Subject>Predicate>Object • Will Smith > is the lead actor in > Bad Boys • Subject: http://ew.com/PersonsTax/Will_Smith • Predicate: http://ew.com/EntertainmentOnt/leadPerformanceIn • Object: http://ew.com/EntertainmentTax/Movies/Bad_Boys
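As a rough illustration of the Subject>Predicate>Object pattern, the Will Smith statement can be held as a plain three-part record. This is only a sketch in standard-library Python, not the tooling described in this deck: the `Triple` class and the `match` and `to_ntriples` helpers are hypothetical names introduced here, and only the three URIs come from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The three URIs below are taken from the slide.
WILL_SMITH = "http://ew.com/PersonsTax/Will_Smith"
LEAD_PERFORMANCE_IN = "http://ew.com/EntertainmentOnt/leadPerformanceIn"
BAD_BOYS = "http://ew.com/EntertainmentTax/Movies/Bad_Boys"

# A graph is just a set of statements.
graph = {Triple(WILL_SMITH, LEAD_PERFORMANCE_IN, BAD_BOYS)}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def to_ntriples(triple):
    """Write one triple as a line of N-Triples-style text (URIs only)."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# "What did Will Smith have a lead performance in?"
for hit in match(graph, subject=WILL_SMITH, predicate=LEAD_PERFORMANCE_IN):
    print(to_ntriples(hit))
```

Because every part of the statement is a URI rather than a free-text label, the same pattern lookup works no matter how the names are spelled in the content itself.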
OWL – Web Ontology Language …designed to be used by applications that need to process the content of information instead of just presenting it to humans -- W3C
OWL – Web Ontology Language • Metadata model • Extends RDF to further define properties • Ex: Equivalent relationships • [image] > is married to > [image] • [image] > is married to > [image]
Semantic reasoner • Software able to infer logical consequences from a set of asserted facts • Follows inference rules specified by OWL properties • Inverse • Transitive • Symmetric • Functional/Inverse functional • Equivalent
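To make the idea of inferring new facts from asserted ones concrete, here is a minimal forward-chaining sketch covering three of the property types listed above (inverse, symmetric, transitive). It is an illustration under stated assumptions, not how Jena or any OWL reasoner is implemented: the `infer` function, the predicate names `hasLeadPerformer`, `marriedTo`, and `narrowerThan`, and the sample facts other than the Will Smith statement are all hypothetical.

```python
def infer(facts, inverse=None, symmetric=(), transitive=()):
    """Apply simple inference rules until no new facts appear.

    facts      -- iterable of (subject, predicate, object) tuples
    inverse    -- dict mapping a predicate to its inverse predicate
                  (one direction only; list both directions if needed)
    symmetric  -- predicates where (a p b) implies (b p a)
    transitive -- predicates where (a p b) and (b p c) imply (a p c)
    """
    inverse = inverse or {}
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:  # nothing new was derived, so we are done
            return known
        known |= new


# Asserted facts (hypothetical apart from the Will Smith statement).
asserted = {
    ("Will_Smith", "leadPerformanceIn", "Bad_Boys"),
    ("PersonA", "marriedTo", "PersonB"),
    ("ActionComedies", "narrowerThan", "Comedies"),
    ("Comedies", "narrowerThan", "Movies"),
}

all_facts = infer(
    asserted,
    inverse={"leadPerformanceIn": "hasLeadPerformer"},
    symmetric={"marriedTo"},
    transitive={"narrowerThan"},
)

for fact in sorted(all_facts - asserted):
    print("inferred:", fact)
```

The loop stops once a full pass adds nothing new. With a finite set of terms that always happens, but the repeated passes hint at why reasoning over large graphs gets expensive.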
Putting it all together • Ontology • Rule set • Classes and Properties • Taxonomy • Application of Rule Set • Tags and Relationships • Everything is a statement • Subject>Predicate>Object Ex: Will Smith is lead performer in Bad Boys
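One way to picture the split between ontology (the rule set) and taxonomy (the terms the rules apply to) is a small check that a statement fits the rules. This is a deliberately simplified sketch: `ONTOLOGY`, `TAXONOMY`, and `check_statement` are hypothetical names, and treating domain and range as hard constraints is an assumption made here for clarity. An OWL reasoner would normally use them to infer class membership rather than to reject statements.

```python
# Ontology: the rule set (classes, and properties with the classes they connect).
ONTOLOGY = {
    "classes": {"Person", "Movie"},
    "properties": {
        "leadPerformanceIn": {"domain": "Person", "range": "Movie"},
    },
}

# Taxonomy: the actual terms, each assigned to a class from the ontology.
TAXONOMY = {
    "Will Smith": "Person",
    "Bad Boys": "Movie",
}


def check_statement(subject, predicate, obj):
    """Return True if the statement fits the ontology's rules."""
    rule = ONTOLOGY["properties"].get(predicate)
    if rule is None:
        return False  # unknown property
    return (
        TAXONOMY.get(subject) == rule["domain"]
        and TAXONOMY.get(obj) == rule["range"]
    )


# Everything is a statement: Subject > Predicate > Object.
print(check_statement("Will Smith", "leadPerformanceIn", "Bad Boys"))
print(check_statement("Bad Boys", "leadPerformanceIn", "Will Smith"))
```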
Benefits of RDF/OWL • Persistent URIs • Verifiable XML • Unambiguous Relationships • Polyhierarchy • Interoperability
Limitations of RDF/OWL • Difficult to propagate across web • Challenge to integrate with legacy systems • Expensive queries • No “Killer App”
RDFa – Resource Description Framework in Attributes • W3C recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents • Easy to implement • Not HTML 5 compliant
Linked Open Data 2007 “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Open Data 2010 “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Microformats • Semantic markup which seeks to re-use existing HTML/XHTML class attributes to structure data • Easy to implement • Limited formats
Microdata • A WHATWG HTML5 specification used to nest semantics within existing content on web pages • Officially supported by Bing, Yahoo, & Google • Can embed other markup languages like RDFa, microformats, and Dublin Core • Not well-known (yet)
Open Graph Protocol • Facebook-created markup language that turns any web page into an Open Graph object, allowing any page to become a Facebook page • I “Like” you • Good for targeted advertising • Limited in scope
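Open Graph markup lives in `<meta>` tags whose `property` attribute starts with `og:`, so a page's Open Graph object can be read back with very little code. The sketch below uses Python's standard `html.parser` module; the `OpenGraphParser` class, the `extract_open_graph` helper, and the sample page (including its example.com URL) are hypothetical and only meant to show the shape of the markup.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Bad Boys</title>
<meta property="og:title" content="Bad Boys" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://example.com/movies/bad-boys" />
</head><body></body></html>
"""


def extract_open_graph(html):
    """Return a dict of the Open Graph properties found in the HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


for name, value in extract_open_graph(SAMPLE_PAGE).items():
    print(name, "=", value)
```

The same handful of flat key/value pairs is what makes the protocol easy to adopt and also what keeps it limited in scope compared with RDF/OWL.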
Status report on S Semantic Web • Linked Open Data graph growing • Many countries have developed government sites with rich semantics • Development of Semantic search • More widespread adoption of lighter semantics
Where we might be going • Pharmaceutical industry identifies trends across clinical studies, and not just within them • News industry better targets content by locale • Department of Defense using it to make better decisions in the field • Utilized in advertising to drive more and more revenue
Time Inc • Largest magazine media company in U.S. • 48 websites worldwide • Websites attract more than 50M unique visitors each month • Domains include lifestyle, entertainment, style, news, sports, and business • Early adopter (2005-2006) of SW technologies
Goals • Enhance data integrity • Improve editorial efficiency • Create contextual presentation of content • Develop relationships that cannot be derived from content • Share resources among titles • Improve search and facilitate guided navigation
Challenges • Aging CMS with sites on different versions • Many different domains • Scalability to accommodate volume of data and development of complex relationships • Lack of resources, money, and time
Why we need controlled vocabularies (or why freeform keywords just don’t work) • Star Wars: Episode I -- The Phantom Menace • Episode 1 • Episode I • Phantom Menace • Star Wars Episode I The Phantom Menace • Star Wars Episode I: The Phantom Menace • Star Wars prequel • Star Wars: Episode 1 -- The Phantom Menace • Star Wars: Episode i -- the Phantom Menace • Star Wars: Episode I: The Phantom Menace • Star Wars: Episode I--The Phantom Menace • Star Wars: Episode I--The Phantom Menance • Star Wars: Episode One -- The Phantom Menace • Star Wars: The Phantom Menace • Star Wars: The Phantom Menace -- Episode I • The Phantom Menace • The Phanton Menace • Star Wars: Episode I -- The Phantom Menace
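The keyword variants above show the problem a controlled vocabulary solves: many spellings, one intended term. A minimal sketch of the idea is a lookup table that maps every known variant to a single preferred term. The `normalize` and `resolve` helpers and the choice of which variants to include are assumptions made here for illustration, not a description of the system discussed in this deck.

```python
PREFERRED = "Star Wars: Episode I -- The Phantom Menace"

# A few of the freeform variants from the slide, typos included.
VARIANTS = [
    "Episode 1",
    "Phantom Menace",
    "Star Wars prequel",
    "Star Wars: Episode I--The Phantom Menance",
    "Star Wars: The Phantom Menace -- Episode I",
    "The Phanton Menace",
]


def normalize(keyword):
    """Lower-case the keyword and collapse runs of whitespace."""
    return " ".join(keyword.casefold().split())


# Every known variant (and the preferred term itself) points to one term.
SYNONYMS = {normalize(v): PREFERRED for v in VARIANTS + [PREFERRED]}


def resolve(keyword):
    """Return the preferred term for a keyword, or None if it is unknown."""
    return SYNONYMS.get(normalize(keyword))


print(resolve("  the phanton   menace "))
print(resolve("Attack of the Clones"))
```

Unknown keywords return `None` rather than being guessed at, which leaves the decision about a new term to a person maintaining the vocabulary.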
What standard to adopt? • RDF • Flexible • Scalable • Fits business needs • New technology but industry standard • Microformats • Easy to implement • No inferencing • Solved some business needs but not all • No standards • Limited formats
Search for vendors • In 2005, few commercial RDF/OWL tools were available that fit our needs • Open source reasoners like Jena and a proprietary design seemed more cost-effective and realistic
TOPICS • Time Ontologies for Publishing, Inference, Classification and Semantics
What is TOPICS? • Librarian Tool – allows librarians to create resources and properties • Relationship Tool - generates unambiguous connections between data • Classification Tool - allows editors to add uniform, structured metadata to content • Semantic reasoner - finds new facts from existing data • Query Engine - manages logical retrieval of data
Technical Details of System • Java application • Jena semantic reasoner • Joseki query engine • Sybase database