Web Service Clustering Building Homogenous Service Communities Wei Liu Wilson Wong School of Computer Science and Software engineering The University of Western Australia
Outline • A brief introduction on • Web services • Text mining • Web Service Clustering • The motivation • The challenges • The process • The results
What are Web Services • It is software designed to be used by other software via Internet protocols and formats (Forrester) • Web services are self-describing components that can discover and engage other web services or applications to complete complex tasks over the Internet. (Sun Microsystems, Inc) • Web Services are loosely coupled software components delivered over the Internet via standards-based technologies like XML, and SOAP. (Gartner) • Self-describing, self-contained, modular unit of application logic that provides some business functionality to other applications through an Internet connection… (UDDI.org) • Web services are Internet-based, modular applications that perform a specific business task and conform to a particular technical format. (IBM) • A web service is application logic that is programmatically available, exposed using the Internet. (Microsoft)
The Web Service Triangle • Web services are applications accessible via the Web to be consumed by clients. • Clients of a Web service are usually referred to as service requesters. • Standardized technologies (from the W3C and OASIS) that support Web service applications are: • Web Services Description Language (WSDL) • Simple Object Access Protocol (SOAP) • Universal Description, Discovery, and Integration (UDDI)
What is Web Service Discovery • Broadly defined as “the act of locating a machine-processable description of a web service that may have been unknown and that meets certain functional criteria” • Originated from the agent match-making paradigm (middle agents and brokers), later moved on to UDDI [2] • The discovery mechanisms differ according to which language is used for describing the service (WSDL or OWL-S) [2] Garofalakis, J., Panagis, Y., Sakkopoulos, E., Tsakalidis, A.: Web service discovery mechanisms: Looking for a needle in a haystack? In: International Workshop on Web Engineering, Hypermedia Development and Web Engineering Principles and Techniques: Put them in use, in conjunction with ACM Hypertext, Santa Cruz (2004)
Ill-fated Registry-Based Structure • Static and not scalable • The registry can become a bottleneck • New services have to be added through a laborious process to ensure “correct” categorisation, which deters people from using it • Search is keyword based • Ontology-supported semantic search is only available for agent-based and semantic web services
What we propose • Make use of the WSDL files collected by Google • Automatically cluster these files into functionally similar groups using text mining methods that combine linguistic analysis and statistical techniques • The resulting clusters will help service discovery by reducing the size of the haystacks
Challenges • Traditional Information Retrieval and Document Clustering techniques cannot be borrowed directly, because of the following observations • Web service files do not usually contain a sufficiently large number of words for use as index terms or features. • Moreover, the few words present in the web service files are erratic and unreliable. • Related web pages that describe the WSDL service were also considered, using the Google API to discover web pages that refer to or cite a WSDL file. However, most of the WSDL files do not have related web pages that provide hyperlinks to them. The few that do are typically examples teaching how to program in a service-oriented paradigm. • These observations agree with [9].
Obtaining Content and Context • Content – Parse the WSDL file for service descriptions in natural language • Context – Relate documents by looking at parent/grandparent directories • Tokenising and stemming • Remove function words* • Remove programming terms*
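The content and context steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes WSDL 1.1 files whose natural-language descriptions sit in `documentation` elements, and the function names (`extract_content`, `context_directories`, `tokenise`) and the example URL are hypothetical. Stemming is left out; any standard stemmer could be applied to the tokens.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace (assumption)


def extract_content(wsdl_text):
    """Collect the text of every <documentation> element in a WSDL 1.1 file."""
    root = ET.fromstring(wsdl_text)
    docs = root.iter("{%s}documentation" % WSDL_NS)
    return " ".join("".join(d.itertext()).strip() for d in docs)


def context_directories(wsdl_url):
    """Return the parent and then the grandparent directory of a WSDL URL."""
    parts = [p for p in urlparse(wsdl_url).path.split("/") if p]
    dirs = parts[:-1]          # drop the file name itself
    return dirs[-2:][::-1]     # parent first, grandparent second


def tokenise(text):
    """Split camelCase identifiers, then return lower-case word tokens."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", spaced.lower())


SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="QuranService">
  <documentation>Returns verses of the Quran</documentation>
</definitions>"""

if __name__ == "__main__":
    print(tokenise(extract_content(SAMPLE_WSDL)))
    print(context_directories("http://example.org/services/religion/QuranService.wsdl"))
```

Function words and programming terms would then be filtered out of the resulting token lists.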
Content Words vs. Function Words • One of the properties of content words is that they tend to “clump”, that is, to re-occur once they have appeared [10]. • On the other hand, the occurrences of function words tend to be independent of one another. • Very often, this contrast shows up as the inability of the Poisson distribution to model the occurrences of content words in documents [11]. • In other words, unlike content words, function words tend to be Poisson distributed.
Remove Function Words • A segment of the output of content-word recognition performed on the word tokens in the web service context set for the service QuranService (single-parameter Poisson distribution).
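One common way to make the Poisson argument concrete is to compare how many documents a word actually appears in with how many a single-parameter Poisson model would predict. The sketch below illustrates that idea only; the slides do not state the exact test the authors used, and the function names and toy counts are hypothetical. With rate λ = (total occurrences) / (number of documents), the model predicts that a fraction 1 − e^(−λ) of documents contains the word at least once. A word that clumps into few documents appears in far fewer documents than predicted.

```python
import math


def poisson_expected_df(total_occurrences, num_docs):
    """Documents expected to contain the word under a one-parameter Poisson."""
    lam = total_occurrences / num_docs          # average occurrences per document
    return num_docs * (1.0 - math.exp(-lam))    # N * P(count >= 1)


def clumpiness(doc_counts):
    """Ratio of Poisson-predicted to observed document frequency.

    Values well above 1 mean the word is concentrated in fewer documents
    than independence would predict, which suggests a content word.
    """
    observed_df = sum(1 for c in doc_counts if c > 0)
    if observed_df == 0:
        raise ValueError("word does not occur in any document")
    expected_df = poisson_expected_df(sum(doc_counts), len(doc_counts))
    return expected_df / observed_df


if __name__ == "__main__":
    spread_out = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]   # toy counts for a function-like word
    clumped = [8, 0, 0, 0, 0, 0, 0, 0, 0, 0]      # toy counts for a content-like word
    print(clumpiness(spread_out), clumpiness(clumped))
```

Any cut-off on this ratio for deciding that a word is a content word would be a tuning choice, not something given in the slides.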
Remove Programming Terms • Programming term clusters are identified with a term clustering method based on Normalised Google Distance, using our Tree-Traversing Ants featureless term clustering [12].
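For reference, the Normalised Google Distance between two terms can be computed from search hit counts as sketched below. The formula is the standard NGD definition; the function name and the hit counts in the example are made up for illustration, and the Tree-Traversing Ants clustering itself is not shown.

```python
import math


def normalised_google_distance(fx, fy, fxy, n):
    """NGD from hit counts.

    fx, fy: number of pages containing term x and term y respectively
    fxy:    number of pages containing both terms
    n:      total number of pages indexed (a normalising constant)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are treated as maximally distant
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


if __name__ == "__main__":
    # Hypothetical hit counts, for illustration only.
    print(normalised_google_distance(1000, 1000, 1000, 10**6))  # always co-occur
    print(normalised_google_distance(1000, 1000, 10, 10**6))    # rarely co-occur
```

Terms that almost always appear together get a distance near 0, so programming terms such as those in the oracle tend to fall into the same cluster.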
Clustering Results for QuranService A small oracle: runtime, webservice, developer, module, data
The service host and the service name • The service host is the second and top-level portion of the domain name (i.e. a segment of the authority part of the URI) of the host containing the WSDL file, and • The service name is the name of the WSDL file. • As one may note, the four features are by no means the best or the only ones available for describing a web service. • However they are the most accessible and feasible ones to use in this case.
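The two definitions above can be sketched with the standard library. This is an illustration under assumptions: the example URL is hypothetical, taking the last two labels of the host name is a simplification that would mishandle multi-part suffixes such as `co.uk`, and whether the authors keep the file extension in the service name is not stated.

```python
import posixpath
from urllib.parse import urlparse


def service_host(wsdl_url):
    """Second-level and top-level labels of the host serving the WSDL file."""
    hostname = urlparse(wsdl_url).hostname or ""
    return ".".join(hostname.split(".")[-2:])


def service_name(wsdl_url):
    """Name of the WSDL file, with the extension dropped (an assumption)."""
    filename = posixpath.basename(urlparse(wsdl_url).path)
    return posixpath.splitext(filename)[0]


if __name__ == "__main__":
    url = "http://ws.example.com/services/QuranService.wsdl"  # hypothetical URL
    print(service_host(url), service_name(url))
```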
Conclusions • The paper presented techniques for the automatic discovery of web services with similar functionality. • We call such service clusters homogeneous service communities. If the crawling and clustering processes run continuously, as in a typical search engine, the approach has the potential to enable self-organisation of the Web as proposed in [3]. • The proposed web service clustering approach assumes no registries, and can automatically and effectively reduce the search space of web services. It can therefore be seen as a precursor to Web Service Discovery. • This paper gathers real service description files from the Web instead of working on hypothetical examples. • The resulting clusters not only provide a useful glimpse of what services are out there, but also an insight into the types of technologies that have proliferated in this area.
The Web Service “Hype” • Web services have become a new trend for doing business online. • U.S. – 65% of companies are working or will work on Web service projects. • 2003 – $3 billion; 2008 – $15.8 billion • Web services help in e-business and e-commerce development. “Just as the Web revolutionized how users talk to applications, XML transforms how applications talk to each other.” (Bill Gates) “Web services are expected to revolutionize our life in much the same way as the Internet has during the past decade or so.” (Gartner)
Why IBM, Microsoft and SAP stopped UBR • The UDDI Business Registry (UBR) was part of the UDDI Project announced in September 2000. • The project goals were to define a set of specifications to enable description, discovery and integration, and to prove interoperability through operational experience. • The UBR ran for 5 years, demonstrating live, industrial-strength UDDI implementations managing over 50,000 replicated entries.
Thank You • Questions?