Project4 - will be updated.
Project
Use previous results to compute $\mathrm{TFIDF}(t_i, t_j; d_j) = \mathrm{tf}(t_i, t_j; d_j) \cdot \log \frac{|Tr|}{|Tr(t_i, t_j)|}$, where $t_i$ and $t_j$ are distinct and nearby (they are 10 tokens apart).
High Dimension TFIDF
Definition 2. The notion of LSI can be extended to q-terms: $\mathrm{TFIDF}(t_{i_1}, \ldots, t_{i_q}; d_j) = \mathrm{tf}(t_{i_1}, \ldots, t_{i_q}; d_j) \cdot \log \frac{|Tr|}{|Tr(t_{i_1}, \ldots, t_{i_q})|}$, where $t_{i_1}, \ldots, t_{i_q}$ is a set of keywords that are nearby (10 tokens apart). A keyword is a token with a high TFIDF value.
High Dimension LSI
For a set of documents, we consider:
1. Keywords (1-associations)
2. Co-occurring sets of q keywords (q-associations)
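Project
1. $|Tr(t_{i_1}, \ldots, t_{i_q})|$ = the number of documents in $Tr$ in which $(t_{i_1}, \ldots, t_{i_q})$ occurs at least once.
2. $\mathrm{tf}(t_{i_1}, \ldots, t_{i_q}; d_j) = 1 + \log N(t_{i_1}, \ldots, t_{i_q}; d_j)$ if $N(t_{i_1}, \ldots, t_{i_q}; d_j) > 0$, and $0$ otherwise.
3. $N(t_{i_1}, \ldots, t_{i_q}; d_j)$ = the frequency of $(t_{i_1}, \ldots, t_{i_q})$ in $d_j$.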
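The three definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's reference solution: the slides do not say exactly how "nearby" co-occurrences are counted, so the sketch assumes that N is the number of sliding windows of `window` consecutive tokens that contain all q terms. The function names (`count_cooccurrences`, `tf`, `tfidf`) and the natural logarithm are also choices made here for illustration.

```python
import math

def count_cooccurrences(tokens, terms, window=10):
    """N(t_1..t_q; d): ASSUMED to be the number of sliding windows of
    `window` consecutive tokens in which every term appears."""
    needed = set(terms)  # the terms are meant to be distinct
    # A document shorter than the window is treated as a single window.
    n_windows = max(1, len(tokens) - window + 1)
    return sum(
        1
        for start in range(n_windows)
        if needed <= set(tokens[start:start + window])
    )

def tf(tokens, terms, window=10):
    """tf = 1 + log N if N > 0, and 0 otherwise."""
    n = count_cooccurrences(tokens, terms, window)
    return 1.0 + math.log(n) if n > 0 else 0.0

def tfidf(tokens, terms, corpus, window=10):
    """TFIDF = tf * log(|Tr| / |Tr(t_1..t_q)|), where |Tr(...)| is the
    number of documents in which the term set co-occurs at least once."""
    doc_freq = sum(
        1 for doc in corpus if count_cooccurrences(doc, terms, window) > 0
    )
    if doc_freq == 0:
        return 0.0  # the term set never co-occurs anywhere in the corpus
    return tf(tokens, terms, window) * math.log(len(corpus) / doc_freq)

# Toy corpus: each document is a list of tokens (hypothetical data).
corpus = [
    "a b c".split(),
    "a x x b".split(),
    "c d e".split(),
]
pair_score = tfidf(corpus[0], ("a", "b"), corpus)
```

With q = 2 this covers the pair case from the first project slide; passing a longer tuple of terms gives the q-term version. A different counting rule for N (for example, counting position pairs within a distance of 10) would change only `count_cooccurrences`.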