
Similarity Evaluation Techniques for Filtering Problems

Presentation Transcript


  1. Similarity Evaluation Techniques for Filtering Problems. Vagan Terziyan, University of Jyvaskyla, vagan@it.jyu.fi

  2. Evaluating the distance between various domain objects and concepts is one of the basic abilities of an intelligent agent. “Are these two the same?” … “No! The difference is equal to 0.234.”

  3. Contents • Goal • Basic Concepts • External Similarity Evaluation • An Example • Internal Similarity Evaluation • Conclusions

  4. Reference Puuronen S., Terziyan V., A Similarity Evaluation Technique for Data Mining with an Ensemble of Classifiers, In: A.M. Tjoa, R.R. Wagner and A. Al-Zobaidie (Eds.), Proc. of the 11th Intern. Workshop on Database and Expert Systems Applications, IEEE CS Press, Los Alamitos, California, 2000, pp. 1155-1159. http://dlib.computer.org/conferen/dexa/0680/pdf/06801155.pdf

  5. Goal • The goal of this research is to develop a simple similarity evaluation technique to be used for social filtering • The result of social filtering here is a prediction of a customer’s evaluation of a certain product, based on the known opinions of other customers about this product

  6. Basic Concepts: Virtual Training Environment (VTE) • A VTE is a quadruple <D, C, S, P> • D is the set of goods D1, D2, ..., Dn in the VTE • C is the set of evaluation marks C1, C2, ..., Cm that are used to rank the products • S is the set of customers S1, S2, ..., Sr who select evaluation marks to rank the products • P is the set of semantic predicates that define relationships between D, C, and S

  7. Basic Concepts: Semantic Predicate P
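
The formula on this slide did not survive extraction. Judging from the example tables later in the deck, whose entries 1, -1, and 0 are explained as "selects", "refuses to select", and "neither", the predicate can be read as the three-valued function below. This is a reconstruction from that example, not the original slide content.

```latex
P(D_i, C_j, S_k) =
\begin{cases}
 1 & \text{if customer } S_k \text{ selects mark } C_j \text{ to evaluate product } D_i,\\
-1 & \text{if } S_k \text{ refuses to select } C_j \text{ for } D_i,\\
 0 & \text{if } S_k \text{ neither selects nor refuses } C_j \text{ for } D_i.
\end{cases}
```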

  8. Problem 1: Deriving External Similarity Values

  9. External Similarity Values • External Similarity Values (ESV) are the binary relations DC, SC, and SD between the elements of (sub)sets of D and C, S and C, and S and D, respectively • ESV are based on the total support among all the customers for voting for the appropriate connection (or refusal to vote)

  10. Problem 2: Deriving Internal Similarity Values

  11. Internal Similarity Values • Internal Similarity Values (ISV) are binary relations between two subsets of D, two subsets of C, and two subsets of S • ISV are based on the total support among all the customers for voting for the appropriate connection (or refusal to vote)

  12. Why Do We Need Similarity Values (or a Distance Measure)? • The distance between products is used to advertise a new product to customers, based on their evaluations of already known similar products • The distance between evaluations is needed to estimate the evaluation error, e.g. when adaptive filtering technologies are used • The distance between customers is useful for weighting the customers, e.g. to integrate their opinions by weighted voting

  13. Deriving External Relation DC: How well an evaluation fits the product [Figure: evaluation marks, products, customers]
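
The defining formula for DC is missing from the transcript. One definition that matches the phrase "total support among all the customers", and that reproduces the verbal results of the worked example later in the deck, is a plain sum of votes over the r customers. Treat it as an assumption; here P takes the values 1 (select), -1 (refuse), and 0 (neither).

```latex
DC_{i,j} = \sum_{k=1}^{r} P(D_i, C_j, S_k), \qquad -r \le DC_{i,j} \le r
```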

  14. Deriving External Relation SC: Measures a customer’s competence in the use of evaluation marks • The value of the relation (Sk,Cj) represents the total support that the customer Sk obtains when selecting (or refusing to select) the mark Cj to evaluate all the products.
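
The SC formula itself is not in the transcript. A sketch consistent with the description above, assuming that the group's support for a (product, mark) pair is the sum of all customers' votes, weights each of the customer's own votes by that group total and sums over the n products:

```latex
SC_{k,j} = \sum_{i=1}^{n} DC_{i,j} \, P(D_i, C_j, S_k),
\qquad \text{where } DC_{i,j} = \sum_{k'=1}^{r} P(D_i, C_j, S_{k'}) \text{ (assumed)}
```

Under this reading a vote that agrees with the majority adds to the customer's support, a vote against the majority subtracts from it, and an abstention contributes nothing.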

  15. Example of SC Relation [Figure: evaluation marks, products, customers]

  16. Deriving External Relation SD: Measures a customer’s competence in the products • The value of the relation (Sk,Di) represents the total support that the customer Sk receives when selecting (or refusing to select) each of the evaluation marks to evaluate the product Di.
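
The SD formula is also missing. Under the same assumption as for the other external relations (group support for a product-mark pair is the sum of all votes on it), a matching sketch sums the customer's support-weighted votes over the m evaluation marks of one product:

```latex
SD_{k,i} = \sum_{j=1}^{m} DC_{i,j} \, P(D_i, C_j, S_k),
\qquad \text{where } DC_{i,j} = \sum_{k'=1}^{r} P(D_i, C_j, S_{k'}) \text{ (assumed)}
```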

  17. Example of SD Relation [Figure: products, evaluation marks, customers]

  18. Normalizing External Relations to the Interval [0,1], where n is the number of products, m is the number of evaluation marks, and r is the number of customers
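
The normalization formulas were images and are lost. A min-max rescaling can be sketched under assumed definitions: DC is the sum of all votes on a product-mark pair, and SC and SD are sums of the customer's own votes weighted by DC. Each weighted term then lies between -(r-2) and r, because a non-zero vote always contributes +1 to its own product with DC, while the other r-1 customers can offset it by at most r-1. For r ≥ 2 this gives:

```latex
DC'_{i,j} = \frac{DC_{i,j} + r}{2r}, \qquad
SC'_{k,j} = \frac{SC_{k,j} + n(r-2)}{2n(r-1)}, \qquad
SD'_{k,i} = \frac{SD_{k,i} + m(r-2)}{2m(r-1)}
```

Each primed value lies in [0,1]. The bounds follow from the assumed definitions and are not quoted from the slide.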

  19. Competence of a Customer [Figure labels: evaluation marks; goods; conceptual pattern of evaluation mark definitions Cj; conceptual pattern of goods’ features Di; competence in the goods; competence in the evaluation marks; customer]

  20. Customer’s Evaluation: Competence quality in products

  21. Customer’s Evaluation: Competence quality in the use of evaluation marks

  22. Quality Balance Theorem: The evaluation of a customer’s competence (ranking, weighting, quality evaluation) does not depend on the competence area, “virtual world of products” or “conceptual world of evaluation marks”, because both competence values are always equal.

  23. Proof ... ...
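
The proof did not survive extraction, and the sketch below is not a substitute for it. It is a numerical check of the balance property under assumed definitions: DC is the sum of all votes, SD and SC are the customer's DC-weighted votes, and each competence quality is the mean of the min-max normalized values. All function and variable names are hypothetical.

```python
import random
from fractions import Fraction


def competence(P):
    """Return (quality_in_products, quality_in_marks) for every customer.

    P[i][k][j] is the vote (-1, 0, or 1) of customer k on mark j for product i.
    All formulas here are assumed definitions, not taken from the slides.
    """
    n, r, m = len(P), len(P[0]), len(P[0][0])
    # Assumed DC: total vote of all customers for each (product, mark) pair.
    DC = [[sum(P[i][k][j] for k in range(r)) for j in range(m)] for i in range(n)]
    result = []
    for k in range(r):
        # Assumed SD and SC: the customer's own votes weighted by the group total.
        SD = [sum(DC[i][j] * P[i][k][j] for j in range(m)) for i in range(n)]
        SC = [sum(DC[i][j] * P[i][k][j] for i in range(n)) for j in range(m)]
        # Min-max normalization to [0, 1], then the mean over the competence area.
        q_products = sum(Fraction(v + m * (r - 2), 2 * m * (r - 1)) for v in SD) / n
        q_marks = sum(Fraction(v + n * (r - 2), 2 * n * (r - 1)) for v in SC) / m
        result.append((q_products, q_marks))
    return result


random.seed(0)  # any seed works; the equality does not depend on the data
votes = [[[random.choice((-1, 0, 1)) for _ in range(5)] for _ in range(4)]
         for _ in range(3)]
for q_products, q_marks in competence(votes):
    print(q_products == q_marks)
```

Under these definitions both values reduce to the same expression, (T + nm(r-2)) / (2nm(r-1)), where T is the sum of DC-weighted votes of the customer over all product-mark pairs. That is why the check holds for any vote data.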

  24. An Example • Suppose that four customers have to evaluate three products from a virtual shop using five available evaluation marks. • For every product, each customer selects the marks that he or she considers appropriate. • The final goal is to obtain a cooperative evaluation of the quality of the products from all the customers.

  25. C Set (Evaluation Marks) in the Example

      | Evaluation mark | Notation |
      |---|---|
      | Nicely designed | C1 |
      | Expensive | C2 |
      | Easy to use | C3 |
      | Reliable | C4 |
      | Safe | C5 |

  26. S Set (Customers) in the Example

      | Customer ID | Notation |
      |---|---|
      | Fox | S1 |
      | Wolf | S2 |
      | Cat | S3 |
      | Hare | S4 |

  27. D Set (Products) in the Example • D1 - Ultra Cast Spinning Reel • D2 - Nokia Communicator 9110 • D3 - iGrafx Process Management Software

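  28. Evaluations Made for the Good “Reel” (D1)

      | P(D,C,S) | C1 | C2 | C3 | C4 | C5 |
      |---|---|---|---|---|---|
      | S1 | 1 | -1 | -1 | 0 | -1 |
      | S2 | 0 (d) | -1 (b) | 0 (e) | 1 (a) | -1 (c) |
      | S3 | 0 | 0 | -1 | 1 | 0 |
      | S4 | 1 | -1 | 0 | 0 | 1 |

      Customer Wolf (S2) selects the mark Reliable (a) to evaluate “Reel” and refuses to select Expensive (b) or Safe (c). Wolf neither selects nor refuses the marks Nicely designed (d) and Easy to use (e).

  29. Evaluations Made for the Good “Communicator” (D2)

      | P | C1 | C2 | C3 | C4 | C5 |
      |---|---|---|---|---|---|
      | S1 | -1 | 0 | -1 | 0 | 1 |
      | S2 | 1 | -1 | -1 | 0 | 0 |
      | S3 | 1 | -1 | 0 | 1 | 1 |
      | S4 | -1 | 0 | 0 | 1 | 0 |

  30. Evaluations Made for the Good “Software” (D3)

      | P | C1 | C2 | C3 | C4 | C5 |
      |---|---|---|---|---|---|
      | S1 | 1 | 0 | 1 | -1 | 0 |
      | S2 | 0 | 1 | 0 | -1 | 1 |
      | S3 | -1 | -1 | 1 | -1 | 1 |
      | S4 | -1 | -1 | 1 | -1 | 1 |

  31. Example: Calculating the Value DC3,4 (table for D3; the value is derived from the C4 column)

      | P | C1 | C2 | C3 | C4 | C5 |
      |---|---|---|---|---|---|
      | S1 | 1 | 0 | 1 | -1 | 0 |
      | S2 | 0 | 1 | 0 | -1 | 1 |
      | S3 | -1 | -1 | 1 | -1 | 1 |
      | S4 | -1 | -1 | 1 | -1 | 1 |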

  32. Resulting DC relation
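
The table of resulting DC values is missing from the transcript. The sketch below recomputes it from the three evaluation tables of the example, assuming that DC for a product-mark pair is the sum of the four customers' votes. The array layout and the name `dc_relation` are illustrative choices, not part of the source.

```python
# P[i][k][j]: vote of customer S(k+1) on mark C(j+1) for product D(i+1),
# copied from the example tables (1 = select, -1 = refuse, 0 = neither).
P = [
    [[1, -1, -1, 0, -1], [0, -1, 0, 1, -1], [0, 0, -1, 1, 0], [1, -1, 0, 0, 1]],   # D1 Reel
    [[-1, 0, -1, 0, 1], [1, -1, -1, 0, 0], [1, -1, 0, 1, 1], [-1, 0, 0, 1, 0]],    # D2 Communicator
    [[1, 0, 1, -1, 0], [0, 1, 0, -1, 1], [-1, -1, 1, -1, 1], [-1, -1, 1, -1, 1]],  # D3 Software
]


def dc_relation(P):
    """Sum the votes of all customers for every (product, mark) pair (assumed definition)."""
    n_marks = len(P[0][0])
    return [[sum(customer[j] for customer in product) for j in range(n_marks)]
            for product in P]


DC = dc_relation(P)
for name, row in zip(("D1", "D2", "D3"), DC):
    print(name, row)
print("DC(3,4) =", DC[2][3])  # all four customers refuse "Reliable" for Software: -4
```

With these votes, DC(3,4) is -4: every customer refuses the mark Reliable (C4) for the Software product (D3).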

  33. Normalized and “Thresholded” DC relation [Figure: normalized scale from 0 to 1 with marks at 0.25, 0.5, and 0.75; thresholded values -1, 0, and 1]

  34. Result of Cooperative Goods Evaluation Based on the DC Relation • D1 is nicely designed, reliable, not expensive, but not easy to use • D2 is reliable, safe, not expensive, but not easy to use • D3 is easy to use, safe, but not reliable
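
The verbal result above can be reproduced with a short sketch. It makes three assumptions: the raw DC values are column sums of the example tables, normalization is (DC + r) / (2r), and the cut-offs are 0.25 and 0.75 inclusive. The cut-offs are inferred from the scale marks on the thresholding slide, not stated in the text.

```python
MARKS = ["nicely designed", "expensive", "easy to use", "reliable", "safe"]

# Raw DC values: sums of the four customers' votes per (product, mark) pair,
# computed from the example tables under the assumed definition of DC.
DC = {
    "D1": [2, -3, -2, 2, -1],
    "D2": [0, -2, -2, 2, 2],
    "D3": [-1, -1, 3, -4, 3],
}
R = 4  # number of customers


def describe(dc_row, r=R, low=0.25, high=0.75):
    """Normalize DC to [0, 1] and keep only clearly supported or clearly refused marks."""
    positive, negative = [], []
    for mark, value in zip(MARKS, dc_row):
        normalized = (value + r) / (2 * r)
        if normalized >= high:
            positive.append(mark)
        elif normalized <= low:
            negative.append("not " + mark)
    return positive + negative


for product, row in DC.items():
    print(product, "is", ", ".join(describe(row)))
```

Marks whose normalized value falls strictly between the two cut-offs, such as Safe for D1 at 0.375, are treated as neutral and left out of the description.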

  35. An Example: Calculating the Value SD1,1

  36. An Example: Calculating the Value SC4,4

  37. Resulting SD and SC relations
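
The resulting SD and SC tables are not in the transcript. The sketch below recomputes them from the example votes under assumed definitions: DC is the sum of all votes, and SD and SC are sums of the customer's own votes weighted by DC. The acceptance rule at the end uses an assumed normalization and a cut-off of 0.5, which is an inferred value, not one stated in the slides.

```python
CUSTOMERS = ["Fox", "Wolf", "Cat", "Hare"]
PRODUCTS = ["Reel", "Communicator", "Software"]

# P[i][k][j]: vote of customer k on mark j for product i, from the example tables.
P = [
    [[1, -1, -1, 0, -1], [0, -1, 0, 1, -1], [0, 0, -1, 1, 0], [1, -1, 0, 0, 1]],
    [[-1, 0, -1, 0, 1], [1, -1, -1, 0, 0], [1, -1, 0, 1, 1], [-1, 0, 0, 1, 0]],
    [[1, 0, 1, -1, 0], [0, 1, 0, -1, 1], [-1, -1, 1, -1, 1], [-1, -1, 1, -1, 1]],
]


def external_relations(P):
    """Return (DC, SD, SC) under the assumed definitions described in the lead-in."""
    n, r, m = len(P), len(P[0]), len(P[0][0])
    DC = [[sum(P[i][k][j] for k in range(r)) for j in range(m)] for i in range(n)]
    SD = [[sum(DC[i][j] * P[i][k][j] for j in range(m)) for i in range(n)]
          for k in range(r)]
    SC = [[sum(DC[i][j] * P[i][k][j] for i in range(n)) for j in range(m)]
          for k in range(r)]
    return DC, SD, SC


def accepted_products(SD, n_marks, n_customers, cutoff=0.5):
    """List, per customer, the products whose normalized SD exceeds the cut-off."""
    lowest = -n_marks * (n_customers - 2)   # assumed lower bound of SD
    span = 2 * n_marks * (n_customers - 1)  # assumed width of the SD range
    return {
        CUSTOMERS[k]: [PRODUCTS[i] for i, v in enumerate(row)
                       if (v - lowest) / span > cutoff]
        for k, row in enumerate(SD)
    }


DC, SD, SC = external_relations(P)
print("SD(1,1) =", SD[0][0])  # Fox on "Reel"
print("SC(4,4) =", SC[3][3])  # Hare on "Reliable"
print(accepted_products(SD, n_marks=5, n_customers=4))
```

With these assumptions, Fox is accepted for Reel and Software but not for Communicator, Cat is the only customer accepted for Communicator, and all four customers are accepted for Software. This agrees with the statements on the thresholded SD slides that follow.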

  38. Normalized and “Thresholded” SD relation [Figure: Fox, Wolf, Cat, Hare] • Evaluations obtained from the customer Fox should be accepted if he evaluates goods similar to “Reel” or similar to “Software” • Fox’s evaluations should be rejected if they concern goods similar to “Communicator”

  39. Normalized and “Thresholded” SD relation [Figure: Fox, Wolf, Cat, Hare] • Only evaluations from the customer Cat can be accepted if they concern goods similar to “Communicator” • All four customers are expected to give acceptable evaluations of “Software”-related goods

  40. Normalized and “Thresholded” SC relation [Figure: Nicely designed, Easy to use, Expensive, Safe, Reliable; Fox, Wolf, Cat, Hare] • Evaluations obtained from the customer Fox should be accepted if they concern the usability (easy to use) of a good • Fox’s evaluations should be rejected if they concern the design or the reliability of a good

  41. Problem 2: Deriving Internal Similarity Values

  42. Internal Similarity Values • Internal Similarity Values (ISV) are binary relations between two subsets of D, two subsets of C, and two subsets of S • ISV are based on the total support among all the customers for voting for the appropriate connection (or refusal to vote)

  43. Deriving Internal Similarity Values • Via one intermediate set • Via two intermediate sets

  44. Internal Similarity for Customers: Goods-Based Similarity [Figure: goods, customers]

  45. Internal Similarity for Customers: Evaluation Marks-Based Similarity [Figure: evaluation marks, customers]

  46. Internal Similarity for Customers: Evaluation Marks-Goods-Based Similarity [Figure: goods, evaluation marks, customers]

  47. Internal Similarity for Evaluation Marks • Goods-based similarity • Customers-based similarity • Goods-customers-based similarity

  48. Internal Similarity for Goods • Evaluation marks-based similarity • Customers-based similarity • Evaluation marks-customers-based similarity
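
The formulas for internal similarity are missing from the transcript. One plausible reading of "via one intermediate set" is relational composition, that is, a matrix product of an external relation with its own transpose. The sketch below applies that reading to the evaluation marks-based similarity of goods, using raw DC values computed from the example tables as column sums of votes. Both the composition rule and the DC definition are assumptions.

```python
# Raw DC values from the example (rows D1..D3, columns C1..C5), assuming that
# DC is the sum of all customers' votes for each (product, mark) pair.
DC = [
    [2, -3, -2, 2, -1],   # D1 Reel
    [0, -2, -2, 2, 2],    # D2 Communicator
    [-1, -1, 3, -4, 3],   # D3 Software
]


def compose_with_transpose(relation):
    """Similarity of row items via the column set: relation times its transpose."""
    return [[sum(a * b for a, b in zip(row_x, row_y)) for row_y in relation]
            for row_x in relation]


DD_via_C = compose_with_transpose(DC)
for row in DD_via_C:
    print(row)
```

Under this reading, two goods score high when the customers as a group support and refuse the same marks for both of them. In the example, Reel and Communicator come out positive (12), while Software scores negative against both (-16 and -6). A similarity via two intermediate sets would chain three such matrices in the same way.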

  49. Normalized and “Thresholded” DDC relation [Figure legend: similar, neutral, different]

  50. Conclusion • We discussed methods for deriving the total support of each binary similarity relation. This can be used, for example, to derive the most supported evaluation of goods and to rank the customers according to their competence • We also discussed relations between elements taken from the same set: goods, evaluation marks, or customers. This can be used, for example, to divide customers into groups of similar competence relative to the goods evaluation environment
