
On the Evaluation of Semantic Web Service Matchmaking Systems

Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades. Pervasive Computing Research Group, Communication Networks Laboratory, Department of Informatics and Telecommunications


Presentation Transcript


  1. On the Evaluation of Semantic Web Service Matchmaking Systems
     Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades
     Pervasive Computing Research Group, Communication Networks Laboratory
     Department of Informatics and Telecommunications, University of Athens, Greece
     ECOWS ’06, Zurich

  2. Outline
     • Introduction
     • Problem Statement
     • A Generalized Fuzzy Evaluation Scheme for Service Retrieval
     • Experimental Results
     • A Pragmatic View
     • Conclusions

  3. SWS Matchmaking
     • Matching service requests and advertisements, based on their semantic annotations (expressed through ontologies)
     • Numerous matchmaking approaches
       • Logic-, similarity-, structure-based (graph matching)
     • Various matched entities
       • Functional service parameters (e.g., IOPE attributes)
       • Non-functional parameters (e.g., QoS attributes)
     • Ultimate goal: more effective service discovery, based on the semantics and not just on the syntax of service descriptions

  4. Degree of Match
     • A value that expresses how similar two entities are, with respect to some similarity metric(s)
     • Important feature of almost all SWS matchmaking approaches
     • Allows for ranking of discovered services
     • Example DoM set: exact, plugin, subsumes, subsumed-by, fail
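
As a rough illustration of how a DoM set allows ranking, the sketch below encodes the example set as an ordered enumeration and sorts services by it. The numeric order (FAIL lowest, EXACT highest) and the service names are assumptions made for this example only; they are not taken from a particular matchmaker.

```python
from enum import IntEnum


class DoM(IntEnum):
    """Example degree-of-match set; the numeric order is an assumption."""
    FAIL = 0
    SUBSUMED_BY = 1
    SUBSUMES = 2
    PLUGIN = 3
    EXACT = 4


def rank(matches):
    """Sort (service, DoM) pairs from best to worst match."""
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical matchmaking output for a single request R.
matches = [
    ("S1", DoM.SUBSUMES),
    ("S2", DoM.EXACT),
    ("S3", DoM.FAIL),
    ("S4", DoM.PLUGIN),
]
ranked = rank(matches)
for service, dom in ranked:
    print(service, dom.name)
```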

  5. Evaluation Basics
     • Most works evaluate the performance of SWS discovery (i.e., response times, scalability)
     • Limited contributions to the evaluation of retrieval effectiveness (i.e., the ability to discover relevant services)
     • Notation
       • Q: possible service requests
       • S: advertisements of published services
       • e: Q×S→W (DoM, analogous to Retrieval Status Value in IR)
       • r: Q×S→W (expert mappings)
     • Evaluation is the determination of how closely vector e approximates vector r
     [Figure: for a request R and advertisements S1, S2, ..., Sn, the matchmaking engine yields e(R,S1), ..., e(R,Sn) and the expert yields r(R,S1), ..., r(R,Sn)]

  6. Evaluation Schemes
     • W is the set of values denoting DoM (for e) or degree of relevance (for r)
     • W defines different evaluation schemes (EVS):

  7. Boolean Evaluation (EVS1)
     • W={0,1}
     • Information Retrieval (IR) measures can be used: Precision (PB) and Recall (RB)
     • RT: set of retrieved advertisements
     • RL: set of relevant advertisements
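
The formulas for the Boolean measures did not survive in this transcript. In their standard IR form, precision is PB = |RT ∩ RL| / |RT| and recall is RB = |RT ∩ RL| / |RL|. The sketch below computes both; the function name, the zero-division convention and the sample sets are assumptions for illustration.

```python
def boolean_precision_recall(retrieved, relevant):
    """Standard IR precision and recall over Boolean sets RT and RL.

    Returns 0.0 for a measure whose denominator set is empty
    (a convention assumed here, not stated in the slides).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # |RT ∩ RL|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sets of advertisements for one request.
RT = {"S1", "S2", "S3", "S7"}  # retrieved
RL = {"S1", "S3", "S6"}        # relevant according to the expert
p_b, r_b = boolean_precision_recall(RT, RL)
print(p_b, r_b)
```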

  8. Problem Statement (1/2)
     • Since SWS matchmaking systems have multi-valued vectors e, application of Boolean evaluation implies the introduction of a relevance threshold
     • Problem 1: This “Booleanization” process filters out any service semantics captured through DoM
     • Problem 2: An optimal threshold value is hard to find
     • Example with threshold = “B”:

       Si   e(R,Si)   e’(R,Si)
       S1   A         1
       S2   B         1
       S3   A         1
       S4   D         0
       S5   D         0
       S6   C         0
       S7   B         1
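
The Booleanization step in the slide's example can be sketched as follows. The grade scale A to D (A being the best match) is inferred from the example values; the function name is hypothetical.

```python
GRADES = "ABCD"  # assumed ordering: A is the best match, D the worst


def booleanize(e, threshold):
    """Map multi-valued DoM grades to {0, 1}: 1 if at least as good as threshold."""
    cutoff = GRADES.index(threshold)
    return {service: int(GRADES.index(grade) <= cutoff)
            for service, grade in e.items()}


# Values of e(R, Si) from the slide's example.
e = {"S1": "A", "S2": "B", "S3": "A", "S4": "D",
     "S5": "D", "S6": "C", "S7": "B"}
e_bool = booleanize(e, "B")
print(e_bool)
```

After thresholding, S1 (grade A) and S2 (grade B) both become 1, and S6 (grade C) and S4 (grade D) both become 0. This is the loss of DoM information that Problem 1 refers to.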

  9. Problem Statement (2/2)
     • Problem 3: Boolean expert mappings are too coarse-grained and do not always reflect the intention of the domain expert
     • Experiment
       • Manually defined multi-valued mappings between 6 requests and 135 advertisements of TC2 with W={0, 0.25, 0.5, 0.75, 1}
       • Calculation of deviation from existing Boolean mappings
       • Only ~33% of the Boolean mappings agree with the multi-valued ones
       • ~40% of the Boolean mappings are not even close to the multi-valued ones (deviation > 0.25)
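
One plausible way to compute such deviation figures is sketched below, assuming the deviation is the absolute difference between the Boolean and the multi-valued relevance of each request/advertisement pair. The slides do not spell out the exact procedure, and the sample mappings are invented for illustration; they are not the experiment's data.

```python
def deviation_stats(boolean_r, multi_r, tolerance=0.25):
    """Return (share of exact agreements, share of deviations above tolerance)."""
    deviations = [abs(boolean_r[key] - multi_r[key]) for key in boolean_r]
    total = len(deviations)
    agree = sum(d == 0 for d in deviations) / total
    far_off = sum(d > tolerance for d in deviations) / total
    return agree, far_off


# Hypothetical expert mappings for one request (NOT the experiment's data).
boolean_r = {"S1": 1, "S2": 1, "S3": 0, "S4": 0}
multi_r = {"S1": 1.0, "S2": 0.5, "S3": 0.25, "S4": 0.0}
agree, far_off = deviation_stats(boolean_r, multi_r)
print(agree, far_off)
```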

  10. A Generalized Fuzzy Evaluation Scheme
      • Such a scheme (EVS2) can provide solutions to the aforementioned problems
      • Main design decisions
        • Expert mappings are fuzzy linguistic terms
        • DoM are fuzzy sets
        • Boolean measures are substituted by generalized ones
      • Why fuzzy modeling?
        • Relevance is an “amorphic” concept (L. Zadeh), i.e., its complexity prevents its mathematical definition
        • Numeric values have vague semantics
        • Fuzzy linguistic variables assume values from a linguistic term set, with each term being a fuzzy variable set
      • Warning: Fuzziness does not refer to the matchmaking process per se

  11. Fuzzification of e and r
      • fr: Q×S→[0,1]
      • fe: Q×S→[0,1]
      • Degree of Relevance terms: I: Irrelevant, S: Slightly relevant, SW: Somewhat relevant, R: Relevant, V: Very relevant
      • Degree of Match terms: F: FAIL, SB: SUBSUMED-BY, S: SUBSUMES, P: PLUGIN, E: EXACT
      • If there is no one-to-one correspondence between the fuzzy variables in each set, fuzzy modifiers could be used (e.g., dilations, concentrations)
      [Figure: two plots of membership value (0.0 to 1.0), one against Degree of Relevance with terms I, S, SW, R, V and one against Degree of Match with terms F, SB, S, P, E, both on a 0.0 to 1.0 axis]

  12. Generalized Evaluation Measures
      • Based on [Buell and Kraft, “Performance measurement in a fuzzy retrieval system”, 1981] the following measures are defined:
      • The cardinalities of the sets RT and RL are transformed to fuzzy set cardinalities, since the above sets are fuzzy
      • Note: the evaluation measures take into account all services Si
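
The formulas themselves are missing from this transcript. A minimal sketch, assuming the usual fuzzy-set generalization (intersection as the pointwise minimum and cardinality as the sum of membership values), gives PG = Σ min(fe, fr) / Σ fe and RG = Σ min(fe, fr) / Σ fr, with the sums taken over all services Si. The function name and the sample values are hypothetical, and the exact definitions used by the authors may differ.

```python
def generalized_precision_recall(fe, fr):
    """Fuzzy precision and recall for a single request.

    fe: service -> fuzzified degree of match in [0, 1]
    fr: service -> fuzzified expert relevance in [0, 1]
    Assumes min as fuzzy intersection and sigma-count as fuzzy cardinality.
    """
    services = fe.keys() | fr.keys()  # every service Si contributes
    overlap = sum(min(fe.get(s, 0.0), fr.get(s, 0.0)) for s in services)
    card_rt = sum(fe.values())  # fuzzy |RT|
    card_rl = sum(fr.values())  # fuzzy |RL|
    precision = overlap / card_rt if card_rt else 0.0
    recall = overlap / card_rl if card_rl else 0.0
    return precision, recall


# Hypothetical fuzzified values for one request.
fe = {"S1": 1.0, "S2": 0.5, "S3": 0.0}
fr = {"S1": 0.75, "S2": 0.5, "S3": 0.25}
p_g, r_g = generalized_precision_recall(fe, fr)
print(p_g, r_g)
```

When every value is 0 or 1, these expressions reduce to the Boolean precision and recall of EVS1, which is what makes them a generalization rather than a different measure.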

  13. Experimental Results (1/3)
      • Manual assessment of fuzzy relevance in the “Education” subset of TC v2
      • Matchmaking engine: OWLS-MX Matcher
      • Used only logic-based matching algorithms
      • Threshold = FAIL
      • The difference between RG and RB is due to the considerable deviation between Boolean and fuzzy expert mappings

  14. Experimental Results (2/3)
      • Sensitivity of the proposed scheme
      • Only the generalized measures are affected by “stronger” false negatives/positives

  15. Experimental Results (3/3)
      • Similar overall behavior but better accuracy/sensitivity, as already shown
      [Figure: plot with series EVS1, EVS2, EVS1 (average), EVS2 (average)]

  16. A Pragmatic View
      • A reasonable assumption
        • Experts are not willing to provide more than Boolean mappings
      • Automatic fuzzification of Boolean expert mappings would be valuable
      [Figure: a Boolean value (e.g., “1”) is turned into an adjusted fuzzy value (e.g., “relevant”) by reasoning about “relevance”, drawing on statistics, logic implications and other inference rules]

  17. A First Approach
      • Services are represented as concepts and form a service profile ontology
      • Then an inference matrix is used for adjusting the Boolean r values
      [Figure: service profile ontology with nodes Service, S1, Sx, S3, R, S5, S6, S7]

  18. Experimental Results
      • The new scheme (EVS2’) approximates EVS2 better than EVS1 does
      • Under the assumption that EVS2 is more accurate, EVS2’ seems promising
      [Figure: plot with series EVS1, EVS2, EVS1 (average), EVS2 (average), EVS2’]

  19. Conclusions
      • Service retrieval evaluation should be semantics-aware
      • A generalization of the current evaluation measures is deemed necessary
      • Fuzzy Set Theory may assist towards this direction
      • However, many practical issues remain open

  20. Thank You! Questions??? http://p-comp.di.uoa.gr
