
Fuzzy Set Theory

Fuzzy Set Theory. Definition: A fuzzy subset A of a universe of discourse U is characterized by a membership function that associates with each element u of U a number in the interval [0,1]. Set theory: A = {a, b, c}. Subset of A: {a, c}.

ayoka

Presentation Transcript


  1. Fuzzy Set Theory • Definition: A fuzzy subset A of a universe of discourse U is characterized by a membership function $\mu_A: U \to [0,1]$, which associates with each element u of U a number $\mu_A(u)$ in the interval [0,1]. • Set theory: A = {a, b, c}. Subset of A: {a, c}. • In classical set theory, an element is either in a set or not in a set, so $\mu_A(u)$ is either 0 or 1.

  2. Set Theory • Let U be the set of all elements (the universe). • There are three basic operations: • $A \cup B$ = {elements in A or in B}. • $A \cap B$ = {elements in both A and B}. • Not A = U − A.

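  3. Definition: Let U be the universe of discourse, A and B be two fuzzy subsets of U, and $\bar{A}$ be the complement of A relative to U. Also, let u be an element of U. Then, $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$, $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$, and $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$.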
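
The three fuzzy operations can be sketched in a few lines of Python. This is only an illustration that assumes the standard max/min/one-minus operators for union, intersection, and complement; the dictionary representation and the function names are hypothetical and do not come from the slides.

```python
# A fuzzy subset is modelled as a dict: element -> membership in [0, 1].
# Elements of the universe that are missing from the dict have membership 0.

def complement(a, universe):
    """Complement relative to the universe: 1 - membership."""
    return {u: 1.0 - a.get(u, 0.0) for u in universe}

def union(a, b):
    """Union: the larger of the two memberships (assumed max operator)."""
    return {u: max(a.get(u, 0.0), b.get(u, 0.0)) for u in a.keys() | b.keys()}

def intersection(a, b):
    """Intersection: the smaller of the two memberships (assumed min operator)."""
    return {u: min(a.get(u, 0.0), b.get(u, 0.0)) for u in a.keys() | b.keys()}

universe = {"x", "y"}
A = {"x": 0.2, "y": 0.7}
B = {"x": 0.5, "y": 0.4}
print(union(A, B), intersection(A, B), complement(A, universe))
```

With memberships restricted to 0 and 1, the same functions reduce to the classical set operations.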

  4. Fuzzy Information Retrieval: We first set up the term-term correlation matrix. For terms $k_i$ and $k_l$, $c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$, where $n_i$ is the number of documents containing $k_i$, $n_l$ is the number of documents containing $k_l$, and $n_{i,l}$ is the number of documents containing both $k_i$ and $k_l$. Note that $c_{i,i} = 1$.
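
As a rough sketch, the correlation matrix can be computed from binary document vectors as below. It assumes the usual form c[i][l] = n_il / (n_i + n_l - n_il), which is consistent with the note that c_{i,i} = 1. The function name, the sample documents, and the choice to return 0 for a term that occurs in no document are assumptions made for illustration.

```python
def correlation_matrix(docs):
    """docs: list of equal-length 0/1 tuples, one tuple per document."""
    num_terms = len(docs[0])
    # n[i]: number of documents containing term i
    n = [sum(d[i] for d in docs) for i in range(num_terms)]
    c = [[0.0] * num_terms for _ in range(num_terms)]
    for i in range(num_terms):
        for l in range(num_terms):
            # n_il: number of documents containing both term i and term l
            n_il = sum(1 for d in docs if d[i] and d[l])
            denom = n[i] + n[l] - n_il
            # Assumption: a term that appears in no document gets correlation 0.
            c[i][l] = n_il / denom if denom else 0.0
    return c

# Hypothetical collection with 3 documents and 3 terms.
sample_docs = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
c = correlation_matrix(sample_docs)
print(c)
```

Here terms 1 and 2 co-occur in one of the three documents that contain either, so their correlation is 1/3, while terms 1 and 3 never co-occur and get 0.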

  5. Fuzzy Information Retrieval: We define a fuzzy set for each term $k_i$. In the fuzzy set for $k_i$, a document $d_j$ has a degree of membership $\mu_{i,j}$ computed as $\mu_{i,j} = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$. Example: $c_{1,2} = 0.1$, $c_{1,3} = 0.21$. $d_1 = (0, 1, 1, 0)$: $\mu_{1,1} = 1 - 0.9 \times 0.79$. $d_2 = (1, 0, 0, 0)$: $\mu_{1,2} = 1 - 0$ (since $c_{1,1} = 1$). What about $d_3 = (1, 0, 1, 0)$?

  6. Fuzzy Information Retrieval: Whenever the document $d_j$ contains a term that is strongly related to $k_i$, the document $d_j$ belongs to the fuzzy set of term $k_i$, i.e., $\mu_{i,j}$ is very close to 1. Example: $c_{1,2} = 0.9$, $d_1 = (0, 1, 0, 0)$. $\mu_{1,1} = 1 - (1 - 0.9) = 0.9$.
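
The membership computation in the two preceding slides can be checked with a short Python sketch. It assumes the membership of a document in the fuzzy set of term k_i is one minus the product of (1 - c_{i,l}) over the terms k_l the document contains, which matches the worked examples. The function name is hypothetical, and the correlation values not given on the slides are set to 0.0 only as placeholders.

```python
import math

def membership(c_row, doc):
    """c_row[l] is the correlation c_{i,l}; doc is a 0/1 tuple of terms."""
    return 1.0 - math.prod(
        1.0 - c_row[l] for l, present in enumerate(doc) if present
    )

# Slide 5 example: c_{1,1} = 1, c_{1,2} = 0.1, c_{1,3} = 0.21.
# The last entry (0.0) is a placeholder, not a value from the slides.
row_slide5 = [1.0, 0.1, 0.21, 0.0]
print(membership(row_slide5, (0, 1, 1, 0)))  # 1 - 0.9 * 0.79
print(membership(row_slide5, (1, 0, 0, 0)))  # 1 - 0, because c_{1,1} = 1

# Slide 6 example: c_{1,2} = 0.9; the zeros are placeholders.
row_slide6 = [1.0, 0.9, 0.0, 0.0]
print(membership(row_slide6, (0, 1, 0, 0)))  # 1 - (1 - 0.9)
```

A single strongly correlated term drives the product toward 0, which is why the membership ends up close to 1.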

  7. Query: • A query is a Boolean formula, e.g., q = Ka and (Kb or not Kc). • In disjunctive normal form, q = (1, 1, 1) or (1, 1, 0) or (1, 0, 0). • Suppose q is

  8. Figure 1. Fuzzy document sets for the query q. Each $cc_i$ is a conjunctive component. The union of the conjunctive components is the query fuzzy set.

  9. Here $\mu_{i,j}$ is the membership of document $d_j$ in the fuzzy set associated with term $k_i$, and $\mu_{q,j}$ is the membership of document $d_j$ for query q.

  10. Exercise: Suppose there are 3 documents and 4 terms: $d_1 = (1, 0, 1, 0)$, $d_2 = (1, 1, 0, 0)$, and $d_3 = (0, 1, 1, 0)$. (1) Compute the term-term correlation matrix $c_{i,j}$. (2) Compute $\mu_{i,j}$ (the membership of document j in the fuzzy set of term i). (3) If the query is q = (1, 0, 0, 0) or (1, 1, 0, 0), compute $\mu_{q,k}$ for each document $d_k$.

  11. Some changes to the last slide: $\mu_{q,j} = \mu_{cc_1+cc_2+cc_3,j} = \max\{\mu_{cc_1,j}, \mu_{cc_2,j}, \mu_{cc_3,j}\}$, where $\mu_{cc_1,j}$, $\mu_{cc_2,j}$, and $\mu_{cc_3,j}$ are computed as before.
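
A small Python sketch of the max rule follows. The rule for a single conjunctive component is an assumption, since the slide only says it is "computed as before": here each component multiplies the term membership where the component has a 1 and one minus the term membership where it has a 0. The function names and the sample membership values are hypothetical.

```python
def component_membership(component, mu):
    """component: 0/1 tuple over the query terms; mu: term memberships of one document.

    Assumed rule: multiply mu for a 1 bit and (1 - mu) for a 0 bit.
    """
    result = 1.0
    for bit, m in zip(component, mu):
        result *= m if bit else (1.0 - m)
    return result

def query_membership(components, mu):
    """Membership for the query: the max over its conjunctive components."""
    return max(component_membership(cc, mu) for cc in components)

# q = (1, 1, 1) or (1, 1, 0) or (1, 0, 0), as in the query slide.
components = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
# Hypothetical memberships of one document in the fuzzy sets of Ka, Kb, Kc.
mu_doc = (0.8, 0.5, 0.25)
print(query_membership(components, mu_doc))
```

For these sample values the three components evaluate to 0.1, 0.3, and 0.3, so the query membership is 0.3.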

  12. String Matching Allowing Errors • Problem: Given a short pattern P of length m, a long text T of length n, and a maximum allowed number of errors k, find all the text positions where the pattern occurs with at most k errors.

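  13. Dynamic Programming • Let C[i,j] be the minimum number of errors needed to match the first i characters of the pattern against a substring of the text ending at position j; i and j are the indices for the pattern and the text. • Three kinds of error: mismatch (a, b), insertion (a, ε), and deletion (ε, a).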

  14. The matrix: The dynamic programming algorithm searches for ‘survey’ in the text ‘surgery’ with two errors. Bold entries indicate matching positions. Running time: O(nm).
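
The ‘survey’ / ‘surgery’ search can be reproduced with a minimal Python sketch of the O(nm) dynamic program. It assumes the common formulation in which row 0 is all zeros (a match may start anywhere in the text), column 0 holds 0..m, and each error costs 1. The function name and the choice to report 1-based end positions are illustrative, not taken from the slides.

```python
def search_with_errors(pattern, text, k):
    """Return the 1-based text positions where a match with <= k errors ends."""
    m = len(pattern)
    # col[i] holds C[i, j] for the current text position j; start with C[i, 0] = i.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]          # C[0, j-1], always 0
        for i in range(1, m + 1):
            left = col[i]           # C[i, j-1], before it is overwritten
            if pattern[i - 1] == ch:
                col[i] = prev_diag  # characters match: no new error
            else:
                # mismatch, insertion, or deletion: one more error
                col[i] = 1 + min(prev_diag, left, col[i - 1])
            prev_diag = left
        if col[m] <= k:
            ends.append(j)
    return ends

print(search_with_errors("survey", "surgery", 2))  # [5, 6, 7]
```

Only one column is kept at a time, so the sketch uses O(m) memory while still doing O(nm) work.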

  15. Exercise • Let ABCABCDDABEDF be the text and ABCDAB be the pattern. Find the occurrences of the pattern with at most 1 error.

  16. String Matching Allowing Errors (FAST Algorithm) • Just keep the cells with value at most k. • This reduces the running time.
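
One way to keep only the cells with value at most k is a cut-off heuristic: track the last row whose value is at most k and compute each new column only one row past it. The Python sketch below is an interpretation of the slide rather than the author's exact algorithm; the function name and the convention of treating every cell beyond the last active row as k + 1 are assumptions.

```python
def search_cutoff(pattern, text, k):
    """Return the 1-based end positions of matches with <= k errors."""
    m = len(pattern)
    col = list(range(m + 1))   # column for the empty text prefix
    last = min(k, m)           # last row whose value is <= k
    ends = []
    for j, ch in enumerate(text, start=1):
        top = min(last + 1, m)  # rows below top are certainly > k
        prev_diag = col[0]      # row 0 is always 0
        for i in range(1, top + 1):
            # Cells past the last active row are stale; treat them as k + 1.
            left = col[i] if i <= last else k + 1
            if pattern[i - 1] == ch:
                new = prev_diag
            else:
                new = 1 + min(prev_diag, left, col[i - 1])
            prev_diag = left
            col[i] = new
        last = top
        while col[last] > k:    # col[0] == 0, so this always stops
            last -= 1
        if last == m:
            ends.append(j)
    return ends

print(search_cutoff("survey", "surgery", 2))  # [5, 6, 7]
```

The values above k may differ from the full matrix, but every cell with value at most k is computed exactly, so the reported positions are the same.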

  17. Regular Expression Matching • Regular expression: • Any letter x in $\Sigma \cup \{\varepsilon\}$ is a regular expression, where $\Sigma$ is the set of all letters. • If A and B are regular expressions, then A|B, A.B, and (A)* are regular expressions.

  18. Regular Expression Matching (Not Required) • Given a regular expression E and a string T, find all the substrings of T that match E. • Let d(i) be the set of all states in the automaton that can be reached after T1T2…Ti is read. • Given d(i), d(i+1) can be computed easily. • There is a starting state and a final state in the automaton. • Whenever the final state is reached, we have found a substring of T that matches the expression.

  19. Example: • E = (A|AA).(B|AB). • T = ABBAB. • D(1) = {a, b, d, c}. • D(2) = {a, b, d, e, f, g, i}. • D(3) = {a, b, c, e, f, g, i, h, l}. • D(4) = {a, b, d, c, j}. • D(5) = {a, b, d, e, f, g, i, k}.
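
The state-set idea can be sketched in Python on the same expression and text. The automaton below is a hand-built, epsilon-free NFA for (A|AA).(B|AB) with numeric states 0 to 4; it is a hypothetical stand-in and does not reproduce the states a to l of the slide's automaton. The function name is also illustrative.

```python
def find_match_ends(transitions, start, finals, text):
    """Return the 1-based positions in text where some matching substring ends."""
    ends = []
    active = {start}                      # plays the role of d(0)
    for pos, ch in enumerate(text, start=1):
        # d(pos): every state reachable from d(pos - 1) by reading ch
        nxt = {t for s in active for t in transitions.get((s, ch), ())}
        nxt.add(start)                    # a match may begin at any position
        if nxt & finals:
            ends.append(pos)
        active = nxt
    return ends

# Hypothetical NFA accepting AB, AAB, and AAAB, i.e. (A|AA).(B|AB).
nfa = {
    (0, "A"): {1},
    (1, "A"): {2},
    (2, "A"): {3},
    (1, "B"): {4},
    (2, "B"): {4},
    (3, "B"): {4},
}
print(find_match_ends(nfa, start=0, finals={4}, text="ABBAB"))  # [2, 5]
```

Each step touches every active state once, which is where the dependence on the automaton size in the running time comes from.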

  20. Running time • $O(n^2)$, where n is the size of the automaton, since d(s, i) could contain O(n) states.
