On the Evaluation of Semantic Web Service Matchmaking Systems. Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades, Pervasive Computing Research Group, Communication Networks Laboratory, Department of Informatics and Telecommunications
On the Evaluation of Semantic Web Service Matchmaking Systems Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades Pervasive Computing Research Group Communication Networks Laboratory Department of Informatics and Telecommunications University of Athens – Greece ECOWS ’06 @ Zurich
Outline • Introduction • Problem Statement • A Generalized Fuzzy Evaluation Scheme for Service Retrieval • Experimental Results • A Pragmatic View • Conclusions
SWS Matchmaking • Matching service requests and advertisements, based on their semantic annotations (expressed through ontologies) • Numerous matchmaking approaches • Logic-, similarity-, structure-based (graph matching) • Various matched entities • Functional service parameters (e.g., IOPE attributes) • Non-functional parameters (e.g., QoS attributes) • Ultimate goal: More effective service discovery, based on the semantics and not just on the syntax of service descriptions
Degree of Match • A value that expresses how similar two entities are, with respect to some similarity metric(s) • Important feature of almost all SWS matchmaking approaches • Allows for ranking of discovered services • Example DoM set: exact, plugin, subsumes, subsumed-by, fail
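The ranking role of the DoM can be illustrated with a short sketch. It assumes that the example DoM set above is listed from strongest to weakest match; the service names and their assigned values are hypothetical.

```python
from enum import IntEnum


class DoM(IntEnum):
    """Example DoM set; the numeric order (weakest to strongest) is an assumption."""
    FAIL = 0
    SUBSUMED_BY = 1
    SUBSUMES = 2
    PLUGIN = 3
    EXACT = 4


def rank_services(matches):
    """Return service names sorted from strongest to weakest degree of match."""
    return sorted(matches, key=lambda name: matches[name], reverse=True)


# Hypothetical matchmaking output for one request R.
matches = {"S1": DoM.PLUGIN, "S2": DoM.FAIL, "S3": DoM.EXACT, "S4": DoM.SUBSUMES}
ranking = rank_services(matches)
print(ranking)
```

An ordered enumeration keeps the ranking independent of any particular numeric scale, which is all that ranking needs.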
Evaluation Basics • Most works evaluate the performance of SWS discovery (i.e., response times, scalability) • Limited contributions to the evaluation of retrieval effectiveness (i.e., the ability to discover relevant services) • Q: possible service requests • S: advertisements of published services • e: QxS→W (DoM, analogous to Retrieval Status Value in IR) • r: QxS→W (expert mappings) • [Figure: for a request R and advertisements S1, ..., Sn, the matchmaking engine produces e(R,S1), ..., e(R,Sn) and the expert produces r(R,S1), ..., r(R,Sn)] • Evaluation is the determination of how closely vector e approximates vector r
Evaluation Schemes • W is the set of values denoting DoM (for e) or degree of relevance (for r) • W defines different evaluation schemes (EVS):
Boolean Evaluation (EVS1) • W={0,1} • Information Retrieval (IR) measures can be used: Precision (PB) and Recall (RB) • PB = |RT ∩ RL| / |RT| • RB = |RT ∩ RL| / |RL| • RT: set of retrieved advertisements • RL: set of relevant advertisements
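The Boolean measures can be computed directly from the two sets. The sketch below assumes the standard IR definitions (precision = |RT ∩ RL| / |RT|, recall = |RT ∩ RL| / |RL|); the function name and the set contents are hypothetical.

```python
def boolean_precision_recall(retrieved, relevant):
    """Standard IR precision and recall over sets of advertisement names."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sets for one request.
RT = {"S1", "S2", "S3", "S7"}  # retrieved advertisements
RL = {"S1", "S3", "S6"}        # relevant advertisements
p_b, r_b = boolean_precision_recall(RT, RL)
print(p_b, r_b)
```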
Problem Statement (1/2) • Since SWS matchmaking systems have multi-valued vectors e, applying Boolean evaluation implies the introduction of a relevance threshold • Example with threshold = “B” (Si: e(R,Si) → e’(R,Si)): S1: A → 1; S2: B → 1; S3: A → 1; S4: D → 0; S5: D → 0; S6: C → 0; S7: B → 1 • Problem 1: This “Booleanization” process filters out any service semantics captured through DoM • Problem 2: An optimal threshold value is hard to find
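The “Booleanization” step can be sketched as a simple threshold over an ordered DoM scale. The sketch reuses the seven example values from this slide and assumes that the grades are ordered A > B > C > D; the function name is hypothetical.

```python
# Assumed grade order, weakest to strongest.
GRADES = ["D", "C", "B", "A"]


def booleanize(e, threshold):
    """Map each multi-valued DoM to 1 if it reaches the threshold, else 0."""
    cut = GRADES.index(threshold)
    return {service: int(GRADES.index(grade) >= cut) for service, grade in e.items()}


e = {"S1": "A", "S2": "B", "S3": "A", "S4": "D", "S5": "D", "S6": "C", "S7": "B"}
e_bool = booleanize(e, "B")
print(e_bool)
```

After thresholding, S1 (A) and S2 (B) are indistinguishable, as are S4 (D) and S6 (C), which is the loss of information that Problem 1 refers to.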
Problem Statement (2/2) • Problem 3: Boolean expert mappings are too coarse-grained and do not always reflect the intention of the domain expert. • Experiment • Manually defined multi-valued mappings between 6 requests and 135 advertisements of TC2 with W={0, 0.25, 0.5, 0.75, 1} • Calculation of deviation from existing Boolean mappings • Only ~33% of the Boolean mappings agree with the multi-valued ones • ~40% of the Boolean mappings are not even close to the multi-valued ones (deviation > 0.25)
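A comparison of this kind could be computed along the following lines. The sketch assumes that "deviation" means the absolute difference between the Boolean value and the multi-valued one for the same request/advertisement pair; the toy mappings are hypothetical and are not the experiment's data.

```python
def agreement_stats(boolean_r, graded_r, tolerance=0.25):
    """Return (share of exact agreements, share of deviations above tolerance)."""
    deviations = [abs(boolean_r[pair] - graded_r[pair]) for pair in boolean_r]
    total = len(deviations)
    agree = sum(d == 0 for d in deviations) / total
    far = sum(d > tolerance for d in deviations) / total
    return agree, far


# Hypothetical expert mappings for four request/advertisement pairs.
boolean_r = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}
graded_r = {"p1": 1.0, "p2": 0.5, "p3": 0.25, "p4": 0.0}
agree, far = agreement_stats(boolean_r, graded_r)
print(agree, far)
```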
A Generalized Fuzzy Evaluation Scheme • Such a scheme (EVS2) can provide solutions to the aforementioned problems • Main design decisions • Expert mappings are fuzzy linguistic terms • DoMs are fuzzy sets • Boolean measures are substituted by generalized ones • Why fuzzy modeling? • Relevance is an “amorphic” concept (L. Zadeh), i.e., its complexity prevents its mathematical definition • Numeric values have vague semantics • Fuzzy linguistic variables assume values from a linguistic term set, with each term being a fuzzy variable set • Warning: Fuzziness does not refer to the matchmaking process per se
Fuzzification of e and r • fr: QxS→[0,1] • fe: QxS→[0,1] • [Figure: two plots of membership value (0.0 to 1.0) against Degree of Relevance and Degree of Match (0.0 to 1.0), one membership function per term] • Degree of Relevance terms: I: Irrelevant, S: Slightly relevant, SW: Somewhat relevant, R: Relevant, V: Very relevant • Degree of Match terms: F: FAIL, SB: SUBSUMED-BY, S: SUBSUMES, P: PLUGIN, E: EXACT • If there is no one-to-one correspondence between the number of fuzzy variables in each set, fuzzy modifiers could be used (e.g., dilations, concentrations)
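A minimal sketch of what such a fuzzification could look like follows. The shapes of the membership functions are not recoverable from the slide, so the sketch assumes evenly spaced triangular functions over [0,1] for the five relevance terms, and uses Zadeh's usual modifiers (concentration as squaring, dilation as square root). All function names are hypothetical.

```python
import math

TERMS = ["I", "S", "SW", "R", "V"]  # relevance terms, weakest to strongest
STEP = 0.25                         # assumed spacing between term peaks


def triangular(x, left, peak, right):
    """Membership of x in a triangular fuzzy set defined by (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def memberships(x):
    """Membership of a crisp value x in [0,1] for every linguistic term."""
    return {
        term: triangular(x, i * STEP - STEP, i * STEP, i * STEP + STEP)
        for i, term in enumerate(TERMS)
    }


def concentrate(mu):
    """Concentration modifier ("very"): sharpens a membership value."""
    return mu ** 2


def dilate(mu):
    """Dilation modifier ("more or less"): relaxes a membership value."""
    return math.sqrt(mu)


m = memberships(0.6)
print(m)
```

Under these assumptions a value of 0.6 belongs mostly to "Somewhat relevant" and partly to "Relevant"; modifiers derive additional terms from existing ones when the two term sets differ in size.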
Generalized Evaluation Measures • Based on [Buell and Kraft, “Performance measurement in a fuzzy retrieval system”, 1981] the following measures are defined: • The cardinalities of the sets RT and RL are transformed to fuzzy set cardinalities, since the above sets are fuzzy. • Note: the evaluation measures take into account all services Si
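The formulas themselves are not preserved in this transcript. A reconstruction consistent with the bullets above, assuming the minimum as fuzzy intersection and the sum of membership values as fuzzy set cardinality, is PG = Σ min(fe, fr) / Σ fe and RG = Σ min(fe, fr) / Σ fr, with the sums running over all services Si. The sketch below implements that assumed form; the function name and the values are hypothetical.

```python
def generalized_precision_recall(fe, fr):
    """Generalized precision/recall over fuzzified engine (fe) and expert (fr) values.

    Assumes min() as fuzzy intersection and the sum of membership
    values as fuzzy set cardinality.
    """
    overlap = sum(min(fe[s], fr[s]) for s in fe)
    total_e = sum(fe.values())  # fuzzy cardinality of the retrieved set
    total_r = sum(fr.values())  # fuzzy cardinality of the relevant set
    precision = overlap / total_e if total_e else 0.0
    recall = overlap / total_r if total_r else 0.0
    return precision, recall


# Hypothetical fuzzified values for one request and three advertisements.
fe = {"S1": 1.0, "S2": 0.5, "S3": 0.0}
fr = {"S1": 0.75, "S2": 1.0, "S3": 0.25}
p_g, r_g = generalized_precision_recall(fe, fr)
print(p_g, r_g)
```

When all values are restricted to {0,1}, these expressions reduce to the Boolean precision and recall, which is what makes them a generalization rather than a different measure.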
Experimental Results (1/3) • Manual assessment of fuzzy relevance in the “Education” subset of TC v2 • Matchmaking engine: OWLS-MX Matcher • Used only logic-based matching algorithms • Threshold = FAIL • Difference between RG and RB is due to considerable deviation between Boolean and fuzzy expert mappings
Experimental Results (2/3) • Sensitivity of the proposed scheme • Only the generalized measures are affected by “stronger” false negatives/positives
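The sensitivity claim can be illustrated with a toy case: an advertisement that the expert considers irrelevant is returned once with a moderate DoM and once with the strongest one. The sketch assumes a numeric threshold of 0.5 for the Boolean measure and the min/sum form for the generalized one; all names and values are hypothetical.

```python
def boolean_precision(e, r, threshold=0.5):
    """Boolean precision after thresholding both vectors (threshold is assumed)."""
    retrieved = {s for s, v in e.items() if v >= threshold}
    relevant = {s for s, v in r.items() if v >= threshold}
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def generalized_precision(fe, fr):
    """Generalized precision, assuming min as intersection and sum as cardinality."""
    total = sum(fe.values())
    return sum(min(fe[s], fr[s]) for s in fe) / total if total else 0.0


fr = {"S1": 1.0, "S2": 0.0}      # expert: S2 is irrelevant
mild = {"S1": 1.0, "S2": 0.5}    # engine returns S2 with a moderate DoM
strong = {"S1": 1.0, "S2": 1.0}  # engine returns S2 with the strongest DoM

pb_mild, pb_strong = boolean_precision(mild, fr), boolean_precision(strong, fr)
pg_mild, pg_strong = generalized_precision(mild, fr), generalized_precision(strong, fr)
print(pb_mild, pb_strong, pg_mild, pg_strong)
```

The Boolean precision is identical in both cases, while the generalized precision drops as the false positive gets stronger.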
Experimental Results (3/3) • Similar overall behavior but better accuracy/sensitivity as already shown • [Figure legend: EVS1, EVS2, EVS1 (average), EVS2 (average)]
A Pragmatic View • A reasonable assumption • experts are not willing to provide more than Boolean mappings • Automatic fuzzification of Boolean expert mappings would be valuable • [Figure: reasoning about “relevance”: a Boolean value (e.g., “1”) is turned into an adjusted fuzzy value (e.g., “relevant”) using statistics, logic implications and other inference rules]
A First Approach • Services are represented as concepts and form a service profile ontology • Then an inference matrix is used for adjusting the Boolean r values • [Figure: service profile ontology with nodes Service, S1, Sx, S3, R, S5, S6, S7]
Experimental Results • The new scheme (EVS2’) approximates EVS2 better than EVS1 does • Under the assumption that EVS2 is more accurate, EVS2’ seems promising • [Figure legend: EVS1, EVS2, EVS1 (average), EVS2 (average), EVS2’]
Conclusions • Service retrieval evaluation should be semantics-aware • A generalization of the current evaluation measures is deemed necessary • Fuzzy Set Theory may assist towards this direction • However, many practical issues remain open
Thank You! Questions??? http://p-comp.di.uoa.gr