WEB MINING Nuri Kayaoglu Humboldt University Master's Program in Economics and Management Sci. SEMANTIC WEB-MINING Web-Mining WS 01-02
Overview • Current Web • Semantic Web • Structure, components of SW • Some key concepts • Design facts • Conclusion
Current Web • The web was pretty revolutionary, right? • Before the web: systems like HyperCard • But the web was world-wide • Origin and Goals of the Web • Human communication through shared knowledge • Working together: Social efficiency, understanding and scaling • Exploitation of computing power in real life
Current Web • Anyone with a server could • publish documents for the rest of the world, • hyperlink any document to any other document. • No matter where the servers were: • if you could browse a page, you could link to it. • These early days were exciting indeed.
Current Web • Hyperlinking to everything in the universe is cool, but it has become rather boring. • Now that we have all of these documents linked together, the question is: • isn't there something more we can do with them?
Current Web • The Web is a source of resources and links. • To a user, this has become an exciting world. • Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. • To a machine, however, very little machine-readable information is available.
Semantic Web (SW) • The Web is huge but not very smart. • Computer scientists are beginning to build a “Semantic Web” that understands the meanings that underlie the tangle of information. • Idea: weave a Web that not only links documents to each other but also recognizes the meaning of the information in those documents. • This is a task people do quite well, but it is a tall order for computers; e.g.: what do “head” and “cook” mean?
Semantic Web vs. Current Web • “The Semantic Web is really data that is processable by machine.” Berners-Lee, director of the W3C (father of the Web) • Adding semantics will radically change the nature of the Web: • from a place where information is merely displayed to one where it is interpreted, exchanged and processed. • Semantic-enabled search agents will be able to collect machine-readable data from diverse resources, process it and infer new facts.
SW - Extension to Current Web • Ultimate goal of the Semantic Web: • Give users near omniscience over the vast resources of the Internet, turning the millions of existing database islands into a single gigantic database. • To a user, this will become an even more exciting world. • Realizing the full potential of the Web.
Semantic-Web: an example • Gabriel, Aicha, mom • Mom needs to see a specialist and then has to have a series of physical therapy sessions, biweekly or so. • So, set up an appointment! • Instruct the Semantic Web agent! • In a few minutes, agents* provide the plan. • *: A piece of software that runs without direct human control or constant supervision to accomplish goals provided by a user. • This is thanks not to the WWW of today but rather to the Semantic Web that it will evolve into tomorrow.
Semantic Web: Some features • The Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. • Semantic Web aims to make up for this. • Semantic Web will be as decentralized as possible (like the Internet).
Components of SW • eXtensible Markup Language (XML): • A markup language like HTML that lets individuals define and use their own tags, • has no built-in mechanism to convey the meaning of the user’s new tags to other users, • lets everyone create their own tags, hidden labels such as <zipcode> that annotate Web pages or sections of text on a page.
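A minimal sketch of the XML point above, assuming a hypothetical address record with self-defined tags (`address`, `city` and `zipcode` are invented for illustration). A parser can recover the structure, but nothing in XML itself says what `zipcode` means:

```python
import xml.etree.ElementTree as ET

# Hypothetical document using self-defined tags; the names are illustrative only.
DOC = """
<address>
  <city>Berlin</city>
  <zipcode>10178</zipcode>
</address>
"""

def read_fields(xml_text):
    """Return a dict mapping each child tag name to its text content."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}

fields = read_fields(DOC)
print(fields)
```

The program sees only the strings `city` and `zipcode`; a different author could have used other tag names for the same data, which is the gap that RDF and ontologies are meant to close.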
Components of SW • Resource Description Framework (RDF): • A scheme for defining information on the Web, • provides the technology for expressing the meaning of terms and concepts in a form that computers can readily process. • Meaning is expressed by RDF, which encodes it in sets of triples; • each triple being rather like the subject, verb and object of an elementary sentence, • these triples can be written using XML tags.
Components of SW • RDF (cont.) • In RDF a document makes assertions that particular things • people, Web pages or whatever • have properties • such as “is a sister of”, “is the author of” • with certain values • another person, another web page.
Components of SW • RDF (cont.) • Subjects and objects are each identified by a Uniform Resource Identifier (URI), just like those used in a link on a Web page. • A URI identifies an entity, not necessarily by naming its location on the Web. • URLs, Uniform Resource Locators, are the most common type of URI. • Verbs are also identified by URIs.
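The triple model described on the RDF slides can be sketched as follows. This is an illustration only: all URIs are hypothetical and use the reserved example.org domain, and the helper names (`Triple`, `objects_of`, `to_ntriples`) are assumptions, not part of any RDF library. Subject, verb (predicate) and object are each identified by a URI:

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property (the "verb")
    obj: str        # URI of the value

# All URIs below are hypothetical, using the reserved example.org domain.
EX = "http://example.org/"
TRIPLES = [
    Triple(EX + "people/anna", EX + "terms/isSisterOf", EX + "people/ben"),
    Triple(EX + "people/anna", EX + "terms/isAuthorOf", EX + "pages/report.html"),
]

def objects_of(triples, subject, predicate):
    """Return every value asserted for the given subject and property."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]

def to_ntriples(triples):
    """Serialise the triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)

print(objects_of(TRIPLES, EX + "people/anna", EX + "terms/isAuthorOf"))
print(to_ntriples(TRIPLES))
```

The line-based output follows the shape of the N-Triples notation; the same statements could equally be written with XML tags, as the slides note.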
Components of SW • RDF (cont.) • The WWW was originally built for human consumption. • Although everything on it is machine-readable, this data is not machine-understandable. • Solution: Use metadata to describe the data contained on the Web! • Metadata: Data about data (e.g.: a library catalog describing publications)
Components of SW • RDF (cont.) • RDF is a foundation for processing metadata, • provides interoperability between applications that exchange machine-understandable information on the Web, • RDF with digital signatures will be key to building the “Web of Trust” for electronic commerce, collaboration, etc.
Components of SW • Ontologies • Collections of information, • A document or file that formally defines the relations among terms, • Collection of statements written in a language such as RDF that define the relations between concepts and specify logical rules for reasoning about them, • Computers will “understand” the meaning of semantic data on a Web page by following links to specified ontologies.
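As a rough illustration of "relations between concepts" plus "logical rules for reasoning about them", the sketch below assumes a tiny hypothetical class hierarchy and a single inference rule (an instance of a class is also an instance of every superclass). The class names and the individual `anna` are invented for the example:

```python
# Hypothetical mini-ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "Professor": "FacultyMember",
    "FacultyMember": "Employee",
    "Employee": "Person",
}

# Hypothetical instance data: each individual maps to its asserted class.
INSTANCE_OF = {"anna": "Professor"}

def superclasses(cls):
    """Follow subclass links upward and return all ancestors in order."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def is_a(individual, cls):
    """Rule: an instance of a class is also an instance of every superclass."""
    asserted = INSTANCE_OF.get(individual)
    if asserted is None:
        return False
    return cls == asserted or cls in superclasses(asserted)

print(superclasses("Professor"))
print(is_a("anna", "Person"))
```

Nothing states directly that `anna` is a `Person`; the program derives it from the relations, which is the kind of inference an agent could make after following a link to a shared ontology.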
Components of SW • Ontologies (cont.) • No SW without metadata, but metadata alone won’t suffice. • The metadata in Web pages will have to be linked to special documents that define metadata terms and the relationships between these terms. • These sets of shared concepts and their interconnections are ontologies. • Example: members of faculty, condors • Problem: Political and cultural bias will creep into ontologies, e.g.: Chinese government vs. Taiwan
SW and ERM • Question: Is RDF an entity-relationship model (ERM)? • Answer: Yes and No! • It is great as a basis for ER-modelling, but because RDF is used for other things as well, RDF is more general. • RDF is a model of entities (nodes) and relationships. • If you are used to the “ER” modelling system for data, then the RDF model is basically an opening of the ER model to work on the Web.
Real Power of SW • Agents • The real power of SW will be realized when people create many programs that • collect Web content from diverse resources, • process the information, • exchange the results with other programs. • The effectiveness of such software agents will increase exponentially as more machine-readable Web-content and automated services become available.
Evolution of Knowledge • The SW is not merely a tool for conducting individual tasks. • If properly designed, the SW can assist the evolution of human knowledge as a whole. • A small group can • innovate rapidly and efficiently, • but this produces a subculture whose concepts are not understood by others. • Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication.
Evolution of Knowledge (cont.) • The world works across the spectrum between these extremes, with a tendency to start small, from the personal idea, and move toward a wider understanding over time. • An essential process is the joining together of subcultures when a wider language is needed. • The SW must allow the independent work of diverse communities to be combined effectively.
Some design facts • Inconsistency • Once one statement asserts A and another somewhere on the Web asserts not A, doesn’t the whole system fall apart? • This fear is quite valid. • Solution: Digital signatures add a notion of security to the whole process. • Key concept: Trust
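One way to picture how trust defuses a contradiction is sketched below. It rests on loud assumptions: HMAC with a shared secret stands in for a real public-key digital signature (only to stay within the standard library), and the sources, keys and claims are all hypothetical. The reader keeps only statements from sources it trusts whose signature verifies:

```python
import hashlib
import hmac

# ASSUMPTION: HMAC with a shared secret stands in for a real public-key
# digital signature, only to keep the sketch within the standard library.
KEYS = {"alice": b"alice-secret", "mallory": b"mallory-secret"}  # hypothetical
TRUSTED = {"alice"}  # sources this reader chooses to trust

def sign(source, claim, value):
    """Return a hex tag binding the source's key to the statement."""
    message = f"{claim}={value}".encode()
    return hmac.new(KEYS[source], message, hashlib.sha256).hexdigest()

def accepted(statements):
    """Keep only statements from trusted sources whose signature verifies."""
    result = {}
    for source, claim, value, tag in statements:
        if source not in TRUSTED:
            continue
        if not hmac.compare_digest(tag, sign(source, claim, value)):
            continue
        result.setdefault(claim, set()).add(value)
    return result

statements = [
    ("alice", "A", True, sign("alice", "A", True)),
    ("mallory", "A", False, sign("mallory", "A", False)),  # contradicts alice
    ("alice", "B", True, "forged-signature"),              # fails verification
]
beliefs = accepted(statements)
print(beliefs)
```

The contradicting "not A" is still on the Web, but it never enters this reader's set of beliefs, so inference over `beliefs` is not poisoned by it.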
Some design facts • Expiry • Gabriel: What is the time, Michael? • Michael: Five past ten, my friend. • [They chat for a minute] • Gabriel: What is the time, Michael? • Michael: Six minutes past ten, Mr. Gabriel. • Gabriel: But Michael, you told me just a minute ago it was five minutes past ten. How can I ever believe you again?
Some design facts • Expiry (cont.) • Problem and question: • Time-varying information is one cause of apparent contradiction. • People and documents change status. • How does one base inference on information which may be out of date?
Some design facts • Expiry (cont.) • A proposed solution: • Put explicit or implicit expiry dates on everything. • Whenever a server sends a resource to an HTTP client, it can give an expiry date. • The client can track this, and ensure that all deductions from that document are cancelled when the date arrives, unless a more recent copy can be obtained.
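The expiry proposal above can be sketched as a small client-side store. This is an illustration under assumptions: the class `FactStore` and its method names are hypothetical, and a "more recent copy" is modelled simply as registering the document again with a later expiry time:

```python
from datetime import datetime, timedelta

class FactStore:
    """Holds source documents with expiry times and deductions drawn from them."""

    def __init__(self):
        self.expires = {}     # document name -> expiry datetime
        self.deductions = []  # (conclusion, document name it depends on)

    def add_document(self, name, expires_at):
        """Register a document, or refresh it with a more recent copy."""
        self.expires[name] = expires_at

    def deduce(self, conclusion, from_document):
        """Record a conclusion together with the document it was drawn from."""
        self.deductions.append((conclusion, from_document))

    def valid_conclusions(self, now):
        """Return conclusions whose source document has not yet expired."""
        return [c for c, doc in self.deductions if now < self.expires[doc]]

t0 = datetime(2002, 1, 1, 10, 5)
store = FactStore()
store.add_document("clock", t0 + timedelta(minutes=1))
store.deduce("it is 10:05", "clock")
print(store.valid_conclusions(t0))
print(store.valid_conclusions(t0 + timedelta(minutes=2)))
```

Tying each deduction to its source document is the design choice that matters here: when the document expires, everything derived from it drops out without any further bookkeeping.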
Some design facts • Expiry (cont.) • Another technique: • Make any looseness which exists in the real system visible. Instead of saying • “Any employee of any member organization of W3C may register.” • you say formally to the registration engine • “Any person who was sometime in the last 2 months an employee of an organization which was sometime in the last 2 months a W3C member may register.” • In other words, if an organization were to drop its membership, the system doesn’t have to support propagating that information instantly.
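The loosened registration rule can be written down directly. In this sketch the function names are hypothetical, and "2 months" is approximated as 61 days, which is an assumption made only to keep the date arithmetic simple:

```python
from datetime import date, timedelta

GRACE = timedelta(days=61)  # ASSUMPTION: "2 months" approximated as 61 days

def within_grace(last_confirmed, today):
    """True if the fact was last confirmed no more than GRACE before today."""
    return timedelta(0) <= today - last_confirmed <= GRACE

def may_register(employee_last_confirmed, member_last_confirmed, today):
    """Person was an employee, and the organization a member, within the window."""
    return (within_grace(employee_last_confirmed, today)
            and within_grace(member_last_confirmed, today))

today = date(2002, 1, 15)
print(may_register(date(2001, 12, 20), date(2001, 12, 1), today))
print(may_register(date(2001, 12, 20), date(2001, 10, 1), today))
```

Because the window is part of the rule itself, a membership that lapsed recently does not have to be propagated to the registration engine instantly.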
A sampling of companies developing tools and applications for the SW
Conclusion • The WWW is an information resource with virtually unlimited potential. • However, this potential is relatively untapped because it is difficult for machines to process and integrate this information meaningfully. • Solution: Semantic Web • Human understandable content is structured in such a way as to make it machine processable. • Key components: XML, RDF, ontology, agents
Further information • World Wide Web Consortium (W3C): www.w3.org • W3C Semantic Web Activity: www.w3.org/2001/sw • An introduction to ontologies: www.SemanticWeb.org/knowmarkup.html • Simple HTML Ontology Extensions Frequently Asked Questions (SHOE FAQ): www.cs.umd.edu/projects/plus/SHOE/faq.html • DARPA Agent Markup Language (DAML) home page: www.daml.org
HAPPY NEW YEAR