Concept Switching


  1. Concept Switching Azadeh Shakery

  2. Concept Switching: Problem Definition [Figure: C1, C2, …, Ck]

  3. Past Work: A Programming Language for Mining Fuzzy ER Graphs [Figure: example graph; node labels include fly, bee, Forager, Rover, Behavior Term, gene1, gene2, g1, g2, g3, …]

  4. Past Work: A Programming Language for Mining Fuzzy ER Graphs
     • Added Features
       • Type Definition
       • Function Definition
       • Seq. Operators: Project, Reverse, Seq2Set, Aggregate
     • Operators
       • Neighbor Finding: NBSet, WNBSet
       • Path Finding: Shortestpath, Wpath
       • Set Operators: Union, Intersect, Cardinality, topk

  5. Past Work: High Level Scripts for Entity Comparison
     • Based on intersection and union of neighbors: |NB(e1) ∩ NB(e2)| / |NB(e1) ∪ NB(e2)|
       • Tehran, Iran: 27/52
       • Baghdad, Iran: 11/52
       • Washington, Iran: 0
     • Based on the shortest path between the two entities
       • gpcr__g_protein__plc__diacylglycerol
       • bush__leader__khomeini
     • Based on the length of the shortest path to a base entity
     • Connection to a center node: |NB(e) ∩ NB(c)| / |NB(e) ∪ NB(c)|
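The two comparison scripts on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's language or implementation: it assumes an unweighted graph stored as a dictionary of neighbor sets, and the names `neighbor_overlap`, `shortest_path`, and `toy_graph` are hypothetical.

```python
from collections import deque


def neighbor_overlap(graph, e1, e2):
    """Return |NB(e1) ∩ NB(e2)| / |NB(e1) ∪ NB(e2)| for two entities."""
    nb1 = graph.get(e1, set())
    nb2 = graph.get(e2, set())
    union = nb1 | nb2
    if not union:
        return 0.0
    return len(nb1 & nb2) / len(union)


def shortest_path(graph, source, target):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in parent:
                parent[nb] = node
                if nb == target:
                    # Walk the parent links back to the source.
                    path = [nb]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nb)
    return None


# Hypothetical toy graph (symmetric adjacency), not data from the talk.
toy_graph = {
    "a": {"x", "y"},
    "b": {"y", "z"},
    "c": {"z"},
    "x": {"a"},
    "y": {"a", "b"},
    "z": {"b", "c"},
}

overlap_ab = neighbor_overlap(toy_graph, "a", "b")   # one shared neighbor out of three
path_ac = shortest_path(toy_graph, "a", "c")
print("__".join(path_ac))                            # same layout as the paths on the slide
```

A weighted variant (as the WNBSet and Wpath operators suggest) would need link strengths on the edges; that is left out here.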

  6. Current Work: Topic/Concept Map
     • Alternative way of accessing information
     • Create an index of information which resides outside that information
     • The topic map describes the information in the documents and databases

  7. Multi-Resolution Topic Maps [Figure: resolution scale from low resolution (high-level concepts) to high resolution (words, WordNet)]

  8. Multi-Resolution Topic Map
     • Static
       • Discrete navigation
       • Challenges:
         • Define resolution
         • Community finding algorithm
         • Summarize communities
         • Define distance between communities
         • Between which communities do we allow navigation?
     • Dynamic
       • Continuous navigation
       • Challenges:
         • Define resolution
         • Online community finding algorithm
         • Summarize communities

  9. Challenges
     • Resolution definition
       • Each resolution level has a set of communities {C1, C2, …, Ck}
       • One way is to define the resolution as the link strength threshold
       • Threshold → 0: all links; threshold → ∞: no links
       • Low resolution: low threshold; high resolution: high threshold
     • Community finding algorithm
     • Community distance:
       • For communities C1, C2 at the same level: Similarity(C1, C2) =? |C1 ∩ C2| / |C1 ∪ C2|
       • Works if communities are allowed to have intersection
     • Community summarization
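The threshold view of resolution can be sketched as follows. This is an illustration under stated assumptions: links are kept when their strength is at least the threshold (the slide does not say whether the comparison is strict), and connected components stand in for a real community finding algorithm. All names and the toy data are hypothetical.

```python
def links_at_resolution(weighted_edges, threshold):
    """Keep only the links whose strength is at least `threshold`."""
    return [(u, v) for u, v, w in weighted_edges if w >= threshold]


def connected_components(nodes, edges):
    """Group nodes into connected components (a stand-in for communities)."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


def communities_at(nodes, weighted_edges, threshold):
    """Communities {C1, ..., Ck} at the resolution given by `threshold`."""
    return connected_components(nodes, links_at_resolution(weighted_edges, threshold))


# Hypothetical toy data: (node, node, link strength).
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b", 0.9), ("b", "c", 0.4), ("c", "d", 0.8)]

for thr in (0.0, 0.5, 1.0):
    print(thr, communities_at(toy_nodes, toy_edges, thr))
```

Raising the threshold removes the weak `b`-`c` link first, so the single low-resolution community splits into two and finally into isolated nodes.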

  10. Community Summarization
      • Use the documents to do the summarization
      • Summarize based on the community nodes
      • Define center nodes to do the summarization:
        • Based on the average MI distance to the other nodes in the community
          • Slow on very large communities
        • Based on the degree of the nodes
          • Counts all neighbors as equally important
        • Based on a PageRank-like algorithm:
          • Each node has a centrality value
          • In each step, each node distributes its centrality to its neighbors in proportion to the strength of the link
          • Do this iteratively until the centrality values converge
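The PageRank-like option can be sketched as a simple power iteration. This is a sketch, not the presenter's implementation: the damping factor, the uniform starting values, and the convergence tolerance are assumptions added here so that the iteration is guaranteed to settle, and the function and variable names are hypothetical.

```python
def weighted_centrality(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively spread centrality along links, proportional to link strength.

    `weights` maps each node to a dict {neighbor: link strength}; every
    neighbor must also appear as a key. `damping` is an assumption borrowed
    from PageRank, not something stated on the slide.
    """
    nodes = list(weights)
    n = len(nodes)
    centrality = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        updated = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            total = sum(weights[node].values())
            if total == 0:
                # Isolated node: spread its centrality uniformly.
                for other in nodes:
                    updated[other] += damping * centrality[node] / n
                continue
            for nb, w in weights[node].items():
                updated[nb] += damping * centrality[node] * w / total
        change = sum(abs(updated[x] - centrality[x]) for x in nodes)
        centrality = updated
        if change < tol:
            break
    return centrality


# Hypothetical toy community: a hub with three neighbors of differing link strength.
toy_weights = {
    "hub": {"a": 1.0, "b": 1.0, "c": 0.5},
    "a": {"hub": 1.0},
    "b": {"hub": 1.0},
    "c": {"hub": 0.5},
}

scores = weighted_centrality(toy_weights)
center = max(scores, key=scores.get)  # candidate center node for summarization
```

Unlike the degree-based option, the weakly linked node `c` ends up with a lower score than `a` and `b`, because each node passes on its centrality in proportion to link strength.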

  11. Community Finding Algorithms: Newman’s Algorithm
      • Newman’s algorithm for detecting community structure in networks:
        • Modularity: a measure of the quality of a particular division of a network
        • Modularity is the fraction of the edges in the network that connect vertices of the same type (within a community) minus the expected value of the same quantity in the same network with random connections
        • Consider different divisions of the graph into communities and find the division that maximizes modularity
        • The number of distinct community divisions grows exponentially in the number of nodes
        • A greedy algorithm is used to solve the problem
        • The algorithm runs in O((m + n)n) time for a network with m edges and n nodes
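The modularity measure and the greedy idea can be sketched as below, assuming an unweighted, undirected edge list. The greedy loop is a deliberately naive simplification: it recomputes modularity from scratch for every candidate merge and stops at the first point where no merge improves it, so it does not reach the O((m + n)n) running time quoted on the slide. Function names and the toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Fraction of within-community edges minus its expectation under random wiring."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # edges with both ends in the community
    degree_sum = {}  # edge ends attached to the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


def greedy_communities(nodes, edges):
    """Start from singletons; repeatedly apply the merge that most increases modularity."""
    community_of = {node: node for node in nodes}
    best_q = modularity(edges, community_of)
    while True:
        labels = sorted(set(community_of.values()))
        best_merge = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                trial = {n: (a if c == b else c) for n, c in community_of.items()}
                q = modularity(edges, trial)
                if q > best_q + 1e-12:
                    best_q, best_merge = q, trial
        if best_merge is None:
            return community_of, best_q
        community_of = best_merge


# Hypothetical toy network: two triangles joined by a single edge.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

partition, q = greedy_communities(toy_nodes, toy_edges)
```

On this toy network the loop recovers the two triangles; putting every node in one community gives a modularity of zero, which is why the measure rewards the split.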

  12. Newman’s Algorithm
      • Communities are of very different sizes
        • A few very large communities and a lot of small communities
      • No overlapping communities
        • Definition of neighbor communities is hard
      • Experiments on bee data:
        • 1200 records about Apis mellifera (honey bee)
        • Thr = 0.003
      • Results

  13. Community Finding Algorithms: CPM
      • Clique Percolation Method (CPM)
        • Locates the k-clique communities of unweighted, undirected networks.
        • Observation: A typical member in a community is linked to many other members, but not necessarily to all other nodes.
        • A community can be interpreted as a union of smaller complete subgraphs that share nodes.
        • A k-clique community is defined as the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques.
        • Two k-cliques are said to be adjacent if they share k-1 nodes.
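The k-clique community definition translates directly into a brute-force sketch. This is only meant for toy graphs: it enumerates every k-node subset, which is far too slow for real networks, and it is not the implementation timed in the talk. The adjacency dictionary and all names are hypothetical.

```python
from itertools import combinations


def k_clique_communities(adjacency, k):
    """Union k-cliques that are reachable through adjacent k-cliques.

    `adjacency` maps each node to the set of its neighbors (symmetric).
    Two k-cliques are adjacent when they share exactly k - 1 nodes.
    """
    nodes = sorted(adjacency)
    cliques = [
        frozenset(group)
        for group in combinations(nodes, k)
        if all(v in adjacency[u] for u, v in combinations(group, 2))
    ]

    # Union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Hypothetical toy graph: triangles {1,2,3} and {2,3,4} share two nodes,
# while triangle {4,5,6} shares only node 4 with them.
toy_adjacency = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}

communities = k_clique_communities(toy_adjacency, 3)
```

With k = 3 the result is two communities, {1, 2, 3, 4} and {4, 5, 6}. Node 4 belongs to both, which illustrates the overlap that a partition-based method cannot express.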

  14. Properties of CPM
      • Not too restrictive (compared to cliques)
      • Based on the density of links
      • Local
      • Does not yield cut-nodes or cut-links (whose removal would disjoin the community)
      • Allows overlaps

  15. Results
      • thr = 0.05: 228 nodes, 1197 edges; CPM: 0 min 0.088 sec, Newman: 0 min 0.11 sec; 16 communities of more than one node
      • thr = 0.04: 312 nodes, 1483 edges; CPM: 0 min 0.168 sec, Newman: 0 min 0.21 sec; 20 communities of more than one node
      • thr = 0.03: 507 nodes, 2924 edges; CPM: 0 min 0.511 sec, Newman: 0 min 0.49 sec; 29 communities of more than one node
      • thr = 0.01: 4349 nodes, 28595 edges; CPM: 5 min 25.313 sec, Newman: 1 min 15.21 sec; 103 communities of more than one node

  16. Sample of Resolution Change
      • One community (term, weight): neural 0.0889141, nervous 0.0827306, coordination 0.0785593, brain 0.0585748, proboscis 0.0552424, extension 0.0537362, conditioning 0.0487368, learning 0.0470799, system 0.0457777, mushroom 0.0420242, Homeostasis 0.037191, olfactory 0.0310599, juvenile 0.0302844, hormone 0.0296593, endocrine 0.0283738, bodies 0.0270794, antennal 0.0237212, conditioned 0.0225992, chemical 0.0216736, reflex 0.021086
      • At another resolution, four communities:
        • juvenile 0.297036, hormone 0.292105, jh 0.223579, endocrine 0.18728
        • neural 0.178864, Nervous 0.178613, coordination 0.167388, brain 0.129372, system 0.116024, mushroom 0.103932, bodies 0.0665873, neurons 0.0592205
        • proboscis 0.207976, extension 0.20287, conditioning 0.180429, learning 0.143213, conditioned 0.0925473, olfactory 0.0872045, reflex 0.08576
        • homeostasis 0.404266, chemical 0.319049, coordination 0.276685

  17. Concept Switching
      • Construct a topic map for each collection separately
      • Construct one universal topic map

  18. Discussion
      • Better ideas for community summarization?
      • Dynamic via static topic maps?
      • Alternative ways of defining resolution

  19. Thank you. Questions?