Concept Space Construction Todd Littell 27SEP06
Roadmap • Last Year: • High quality concept space construction performed offline. • First version of inference engine via graph operations. • This Year: • High quality concept space construction performed online. • Second implementation of reasoning engine or graph mining application? • Future: • Online semantic net construction. • Sophisticated reasoning engine.
Semantic Net • Goal: Efficient, high-quality learning of a semantic network from text. • A Semantic Net is a kind of Knowledge Representation (KR) structure that is typically represented by a graph model. • What does a Semantic Net consist of? • Concepts • Entities • Types • Relationships: associative, categorical, functional, structural, mechanical, temporal, spatial… • A Concept Space or Association Network is a simplified SN that captures only concepts and concept-associations. Ref: www.jfsowa.com: “A cat is on the mat”.
Applications • Uses of Semantic Nets: • Knowledge Base for Reasoning and Inference: Fuzzy ER Model, Markov Net, Bayesian Net, Causal Net, etc. • Information Browsing: Navigation through the space, drill-up, drill-down, drill-across. • Domain Modeling: Communication, Information System construction, etc. • Query Formulation for Retrieval: Feedback, User Modeling, etc.
Related Knowledge Models • Other kinds of knowledge models: • Concept Graphs: see John Sowa’s site, http://www.jfsowa.com/cg/index.htm; defines rules for assertions, composition and limited inference. Also OWL/RDF. • Graphical Models: Belief Nets, Causal Diagrams, Signed Directed Graphs, Lattices. • Concept Lattices: FCA – mathematical theory for objects, attributes & mappings. • UML: formal modeling notation & semantics defined specifically for the software industry. • Express-G: formal modeling notation & semantics defined for engineering. • Moore/Mealy/Petri Nets: actionable semantics for modeling & simulation. • Aspects to compare: domain, expressiveness, ad hoc vs. well-defined semantics, discrete vs. continuous, typing, purpose/application, underlying theory, etc.
Associative Net Ref: Mark Steyvers, Joshua B. Tenenbaum, “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth”, Cognitive Science 29, 2005.
Basic Algorithm • Calculate a measure of association between each pair of terms using a similarity measure and output the best associations. Let T = set of terms, D = set of documents, f(t,d) = frequency of term t in document d. Let adj(t) := { d | f(t,d) > 0 } be the term adjacency list, and adj(d) := { t | f(t,d) > 0 } be the document adjacency list.

For each t1 in T:
  For each d in adj(t1):
    For each t2 in adj(d):
      s1[t2] += g(f(t1,d), f(t2,d))
  For each t2 with s1[t2] > 0:
    s := h(s1[t2], params)
    if (s >= thresh) output (t1, t2, s)

• Notes: • Term vectors are sparse; there is no need to iterate through all dimensions. • Many similarity/distance measures exist, as well as other kinds of measures. • What counts as an “ideal similarity metric” is tied to the application. • All per-term calculations are independent, hence easily parallelizable.
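The sparse traversal above can be sketched in runnable Python. Here g is taken to be the product of frequencies and h the identity, which yields raw dot-product scores; both are placeholders for whatever measure the application actually uses.

```python
from collections import defaultdict

def term_associations(freq, thresh=0.0):
    """Sparse term-term association scores.

    freq: dict mapping (term, doc) -> frequency.
    g = product of frequencies, h = identity (illustrative choices).
    """
    # Build the two adjacency lists from the sparse frequency table.
    adj_t = defaultdict(dict)   # term -> {doc: freq}
    adj_d = defaultdict(dict)   # doc  -> {term: freq}
    for (t, d), f in freq.items():
        if f > 0:
            adj_t[t][d] = f
            adj_d[d][t] = f

    results = []
    for t1, docs in adj_t.items():
        scores = defaultdict(float)          # s1, indexed by t2
        for d, f1 in docs.items():
            for t2, f2 in adj_d[d].items():
                if t2 != t1:
                    scores[t2] += f1 * f2    # g(f(t1,d), f(t2,d))
        for t2, s in scores.items():         # h = identity here
            if s >= thresh:
                results.append((t1, t2, s))
    return results

# Toy corpus: "bee" and "hive" co-occur only in doc 0.
freq = {("bee", 0): 2, ("hive", 0): 1, ("bee", 1): 1, ("wing", 2): 3}
print(sorted(term_associations(freq, thresh=1.0)))
```

Only documents shared by both terms contribute, so the triple loop touches each nonzero matrix entry a bounded number of times rather than iterating over all |T| × |D| cells.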
BeeSpace Variations • Only interested in calculating for representative terms with thresh1 < freq < thresh2. • In some cases, only interested in calculating for user-specified documents. This implies working with a restricted matrix F' = D_R F D_C, where D_R and D_C are 0/1 selection matrices and F is the term-by-document frequency matrix. • Only need to output the top K similar terms. • Only need to output terms with sim > thresh3.
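The top-K variation can be handled with a bounded heap per source term. A minimal sketch (the helper name and interface are illustrative; `scores` stands for one term's already-computed association list):

```python
import heapq

def top_k(scores, k=3, thresh=0.0):
    """Keep only the K highest-scoring associations above a threshold.

    scores: iterable of (term, score) pairs for one source term.
    """
    filtered = ((s, t) for t, s in scores if s > thresh)
    return [(t, s) for s, t in heapq.nlargest(k, filtered)]

scores = [("hive", 2.0), ("wing", 0.5), ("queen", 1.5), ("dance", 1.0)]
print(top_k(scores, k=2, thresh=0.4))
```

`heapq.nlargest` keeps a heap of size K, so the cost is O(n log K) per term instead of a full sort.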
Co-occurrence Metrics • Many extensions are possible by incorporating weighting functions, or features such as POS tags, context, word distance, window size, etc. • Ref: Frequency Estimates for Statistical Word Measures by Terra & Clarke. • Similarity requirements: s(x,y) >= 0; s(x,y) > s(x,z) implies y is more similar to x than z; optionally s(x,y) = s(y,x).
MI & PWMI • Assuming MLE of the probabilities, the joint document frequency is f(x,y) := |adj(x) ∩ adj(y)| = Σ_d 1[f(x,d) > 0 ∧ f(y,d) > 0], i.e. the number of documents in which x and y co-occur.
MI & PWMI Generalizations • Obvious generalizations: utilize document weights p(d), e.g. p(x) = Σ_{d ∈ adj(x)} p(d) and p(x,y) = Σ_{d ∈ adj(x) ∩ adj(y)} p(d), which reduce to the MLE counts when p(d) = 1/|D|. • Non-obvious generalizations: Ref: Barry Robson, “Clinical and Pharmacogenomic Data Mining…”, Journal of Proteome Research, 2003. Ref: Jonathan Wren, “Extending the mutual information measure to rank inferred literature relationships”, BMC Bioinformatics, 2004.
Results • Look at results in spreadsheet…
“Make a Faster Wheel” • Optimized I/O • Parallelize • Better Algorithm • Better Code • Smarter Data Structures
Optimize I/O • Only use formatted I/O for human consumption; use binary I/O for all other cases. • Use buffered I/O if reading/writing small chunks at a time. • See handout.
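A small Python illustration of the binary, buffered style suggested above (the record layout is invented for the example):

```python
import os
import struct
import tempfile

# Write term-pair scores as fixed-size packed binary records instead of
# formatted text; a large buffer amortizes syscall cost over many writes.
path = os.path.join(tempfile.mkdtemp(), "scores.bin")
record = struct.Struct("<IIf")            # (term1_id, term2_id, score)

with open(path, "wb", buffering=1 << 16) as out:   # 64 KiB buffer
    for i in range(1000):
        out.write(record.pack(i, i + 1, 0.5))

# Read everything back in one pass and decode.
with open(path, "rb", buffering=1 << 16) as inp:
    data = inp.read()
rows = [record.unpack_from(data, i * record.size) for i in range(1000)]
print(len(rows), rows[0])
```

Each record is 12 bytes regardless of the magnitudes involved, and parsing is a fixed-offset unpack rather than text tokenization and float conversion.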
Parallelize • Do the problems split naturally? • Divide-n-conquer apply? • Level of parallelization: • Very coarse grained: distributed agents. • Coarse grained: parallel jobs. • Medium grained: forked processes. • Fine grained: multi-threaded.
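Since each term's association scores are independent, coarse-grained data parallelism fits naturally. A sketch with a process pool (the per-term work here is a stand-in for the real scoring loop):

```python
from multiprocessing import Pool

def score_term(t1):
    # Stand-in for the per-term association computation; each term's
    # scores depend only on shared read-only data, so terms can be
    # processed in parallel with no coordination between workers.
    return (t1, t1 * t1)

if __name__ == "__main__":
    terms = list(range(8))
    with Pool(processes=4) as pool:
        # Coarse-grained: one task per term, no shared mutable state.
        results = pool.map(score_term, terms)
    print(results)
```

The same split works at every granularity listed above: hand each worker a partition of T, whether the worker is a thread, a forked process, a cluster job, or a distributed agent.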
Better Algorithm • How to compare algorithms? • Time complexity. • Space complexity. • Parallelizability. • Time-to-develop.
Better Code • Know your language. • Factor invariant expressions outside of loops. • Pre-compute whenever possible: cache results. • Sacrifice OO-ness. • Customized data structures. • Optimized I/O. • Avoid “long” calls (e.g. network, disk, etc.). • Tune to memory hierarchy.
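Two of the points above, factoring invariants out of loops and caching results, in a minimal Python sketch (the IDF computation is just an example workload):

```python
import math
from functools import lru_cache

# Hoisting a loop-invariant expression: the naive version recomputes
# math.log(n_docs) on every iteration of the comprehension.
def idf_naive(doc_freqs, n_docs):
    return [math.log(n_docs) - math.log(df) for df in doc_freqs]

def idf_hoisted(doc_freqs, n_docs):
    log_n = math.log(n_docs)               # invariant: computed once
    return [log_n - math.log(df) for df in doc_freqs]

# Caching a pure function so repeated arguments are computed once.
@lru_cache(maxsize=None)
def log_df(df):
    return math.log(df)

print(idf_naive([1, 10], 100) == idf_hoisted([1, 10], 100))
```

The hoisted form does the same arithmetic with one log call fewer per element; `lru_cache` trades memory for time, which pays off when term frequencies repeat heavily, as they do under Zipfian distributions.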
Smarter Data Structures • Understand your language’s built-in collections library. • Roll-your-own data structures can often outperform generic libraries. Why? • Hybrid techniques.
To-Do/Unresolved • Decide what the complete set of applications will be for this component: browsing, inference, retrieval, etc. • Evaluate the metrics using SME (subject-matter expert) judgments. • Decide what set of mined relations are significant for those applications. • Investigate more advanced methods and compare trade-offs.