COGNITIVE COMPUTATIONAL INTELLIGENCE for data mining, financial prediction, tracking, fusion, language, cognition, and cultural evolution. IASTED CI 2009, Honolulu, HI, 1:30 – 5:30 pm, Aug. 19. Leonid Perlovsky, Visiting Scholar, Harvard University; Technical Advisor, AFRL
OUTLINE 1. Cognition and Logic 2. The Knowledge Instinct - Dynamic Logic 3. Language 4. Integration of cognition and language 5. High Cognitive Functions 6. Evolution of cultures 7. Future directions
INTRODUCTION: The mind
PHYSICS AND MATHEMATICS OF THE MIND: RANGE OF CONCEPTS • Logic is sufficient to explain the mind • [Newell, “Artificial Intelligence”, 1980s] • No new specific mathematical concepts are needed • The mind is a collection of ad-hoc principles [Minsky, 1990s] • Specific mathematical constructs describe the multiplicity of mind phenomena • “first physical principles of mind” • [Grossberg, Zadeh, Perlovsky,…] • Quantum computation • [Hameroff, Penrose, Perlovsky,…] • New, as yet unknown physical phenomena • [Josephson, Penrose]
GENETIC ARGUMENTS FOR THE “FIRST PRINCIPLES” • Only about 30,000 genes in the human genome • Only about a 2% difference between humans and apes • Say, a 1% difference between human and ape minds • That is only about 300 proteins • Therefore, the mind has to utilize a few inborn principles • If we count “a protein per concept” • If we count combinations: 300^300 ~ unlimited => all concepts and languages could have been genetically hardwired (!?) • Yet languages and concepts are not genetically hardwired • Because they have to be flexible and adaptive
COGNITION • Understanding the world • Perception • Simple objects • Complex situations • Integration of real-time signals and existing (a priori) knowledge • From signals to concepts • From less knowledge to more knowledge
EXAMPLE • Example: “this is a chair, it is for sitting” • Identify objects • signals -> concepts • What in the mind helps us do this? Representations, models, ontologies? • What is the nature of representations in the mind? • There are wooden chairs in the world, but no wood in the brain
VISUAL PERCEPTION • Neural mechanisms are well studied • Projection from retina to visual cortex (geometrically accurate) • Projection of memories-models • from memory to visual cortex • Matching: sensory signals and models • The visual nerve has more feedback connections than feedforward ones • matching involves complicated adaptation of models and signals • Difficulty • Associate signals with models • A lot of models (expected objects and scenes) • Many more combinations: models <-> pixels • Association + adaptation • To adapt, signals and models should be matched • To match, they should be adapted
ALGORITHMIC DIFFICULTIES: A FUNDAMENTAL PROBLEM? • Cognition and language involve evaluating large numbers of combinations • Pixels -> objects -> scenes • Combinatorial Complexity (CC) • A general problem (since the 1950s) • Detection, recognition, tracking, fusion, situational awareness, language… • Pattern recognition, neural networks, rule systems… • Combinations of 100 elements number 100^100 • This number exceeds the size of the Universe and all the events in the Universe during its entire life
COMBINATORIAL COMPLEXITY SINCE THE 1950s • CC has been encountered for over 50 years • Statistical pattern recognition and neural networks: CC of learning requirements • Rule systems and AI, in the presence of variability: CC of rules • Minsky 1960s: Artificial Intelligence • Chomsky 1957: language mechanisms are rule systems • Model-based systems with adaptive models: CC of computations • Chomsky 1981: language mechanisms are model-based (rules and parameters) • Current ontologies and the “semantic web” are rule systems • Evolvable ontologies: the present challenge
CC AND TYPES OF LOGIC • CC is related to formal logic • Law of excluded middle (or excluded third) • every logical statement is either true or false • Gödel proved that logic is “illogical,” “inconsistent” (1930s) • CC is Gödel's “incompleteness” in a finite system • Multivalued logic eliminated the “law of excluded third” • Still the mathematics of formal logic • Excluded 3rd -> excluded (n+1) • Fuzzy logic eliminated the “law of excluded third” • How to select “the right” degree of fuzziness? • The mind fits fuzziness for every statement at every step => CC • Logic pervades all algorithms and neural networks • rule systems, fuzzy systems (degree of fuzziness), pattern recognition, neural networks (training uses logical statements)
LOGIC VS. GRADIENT ASCENT • Gradient ascent maximizes without CC • Requires continuous parameters • How to take gradients along “association”? • Data X(n) belongs (or does not belong) to object m • This is a logical statement: discrete, non-differentiable • Models / ontologies require logic => CC • Multivalued logic does not lead to gradient ascent • Fuzzy logic uses continuous association variables • A new principle is needed to specify gradient ascent along fuzzy associations: dynamic logic
DYNAMIC LOGIC • Dynamic Logic unifies formal and fuzzy logic • initial “vague or fuzzy concepts” dynamically evolve into “formal-logic or crisp concepts” • Dynamic logic • based on a similarity between models and signals • Overcomes CC • fast algorithms • Proven in neuroimaging experiments (Bar, 2006) • Initial representations-memories are vague • “close-eyes” experiment
ARISTOTLE VS. GÖDEL: logic, forms, and language • Aristotle • Logic: a supreme way of argument • Forms: representations in the mind • Form-as-potentiality evolves into form-as-actuality • Logic is valid for actualities, not for potentialities (Dynamic Logic) • Language and thinking are closely linked • Language contains the necessary uncertainty • From Boole to Russell: formalization of logic • Logicians eliminated the uncertainty of language from logic • Hilbert: formalize the rules of mathematical proofs forever • Gödel (1930s) • Logic is not consistent • Any statement can be proved true and false • Aristotle and Alexander the Great
OUTLINE • Cognition, complexity, and logic • Logic does not work, but the mind does • The Mind and Knowledge Instinct • Neural Modeling Fields and Dynamic Logic • Language • Integration of cognition and language • Higher Cognitive Functions • Future directions
STRUCTURE OF THE MIND • Concepts • Models of objects, their relations, and situations • Evolved to satisfy instincts • Instincts • Internal sensors (e.g. sugar level in blood) • Emotions • Neural signals connecting instincts and concepts • e.g. a hungry person sees food all around • Behavior • Models of goals (desires) and muscle-movement… • Hierarchy • Concept-models and behavior-models are organized in a “loose” hierarchy
THE KNOWLEDGE INSTINCT • Model-concepts always have to be adapted • lighting, surroundings, new objects and situations • even when there are no concrete “bodily” needs • Instinct for knowledge and understanding • Increase similarity between models and the world • Emotions related to the knowledge instinct • Satisfaction or dissatisfaction • change in similarity between models and world • Not related to bodily instincts • harmony or disharmony (knowledge-world): aesthetic emotion
REASONS FOR PAST LIMITATIONS • Human intelligence combines conceptual understanding with emotional evaluation • A long-standing cultural belief that emotions are opposite to thinking and intellect • “Stay cool to be smart” • Socrates, Plato, Aristotle • Reiterated by founders of Artificial Intelligence [Newell, Minsky]
Neural Modeling Fields (NMF) • A mathematical construct modeling the mind • Neural synaptic fields • A loose hierarchy • bottom-up signals, top-down signals • At every level: concepts, emotions, models, behavior • Concepts become input signals to the next level
NEURAL MODELING FIELDSbasic two-layer mechanism: from signals to concepts • Bottom-up signals • Pixels or samples (from sensor or retina) x(n), n = 1,…,N • Top-down signals (concept-models) Mm(Sm,n), parameters Sm, m = 1, …; • Models predict expected signals from objects • Goal: learn object-models and match to signals (knowledge instinct)
THE KNOWLEDGE INSTINCT • The knowledge instinct = maximize similarity between signals and models • Similarity between signals and models, L:
L = l({x}) = Π_n l(x(n))
l(x(n)) = Σ_m r(m) l(x(n) | Mm(Sm,n))
l(x(n) | Mm(Sm,n)) is a conditional similarity for x(n) given m • {n} are not independent, M(n) may depend on n’ • CC: L contains M^N items: all associations of pixels and models (LOGIC)
SIMILARITY • Similarity as likelihood:
l(x(n) | Mm(Sm,n)) = pdf(x(n) | Mm(Sm,n)), a conditional pdf for x(n) given m
e.g., Gaussian: pdf(X(n)|m) = G(X(n) | Mm, Cm) = (2π)^(-d/2) det(Cm)^(-1/2) exp(-Dmn^T Cm^(-1) Dmn / 2); Dmn = X(n) - Mm(n)
• Note, this is NOT the usual “Gaussian assumption” • deviations from the models, D, are random, not the data X • multiple models {m} can model any pdf, not one Gaussian model • Use for sets of data points
• Similarity as information:
l(x(n) | Mm(Sm,n)) = abs(x(n)) * pdf(x(n) | Mm(Sm,n)), the mutual information in model m about data x(n)
L is the mutual information in all models about all data
e.g., Gaussian: pdf(X(n)|m) = G(X(n) | Mm, Cm)
• Use for continuous data (signals, images)
DYNAMIC LOGIC (DL): non-combinatorial solution • Start with a set of signals and unknown object-models • any parameter values Sm • associate signals (n) and models (m):
(1) f(m|n) = r(m) l(n|m) / Σ_m' r(m') l(n|m')
• Improve parameter estimation:
(2) Sm = Sm + a Σ_n f(m|n) [∂ ln l(n|m)/∂Mm] [∂Mm/∂Sm]
• Continue iterations (1)-(2). Theorem: NMF is a converging system - similarity increases on each iteration - aesthetic emotion is positive during learning
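A minimal sketch of the two iterated steps (1)-(2), assuming one-dimensional data and simple Gaussian models whose only parameter is the mean Mm with a fixed width sigma; function and variable names are illustrative, not from the original:

```python
import numpy as np

def dynamic_logic_1d(x, n_models=2, n_iter=100, alpha=0.5, sigma=1.0, seed=0):
    """Sketch of the NMF/DL iterations (1)-(2) for 1-D data.

    Models: M_m is a scalar mean; l(n|m) is Gaussian with fixed width sigma.
    Step (1): fuzzy associations f(m|n); step (2): damped step along the
    gradient of ln l(n|m) with respect to the model parameter M_m.
    """
    rng = np.random.default_rng(seed)
    M = rng.uniform(x.min(), x.max(), n_models)   # vague initial models
    r = np.full(n_models, 1.0 / n_models)         # rates r(m)

    for _ in range(n_iter):
        # (1) f(m|n) = r(m) l(n|m) / sum_m' r(m') l(n|m')
        d = x[None, :] - M[:, None]               # D_mn = x(n) - M_m
        l = np.exp(-0.5 * (d / sigma) ** 2)       # conditional similarity l(n|m)
        f = r[:, None] * l
        f /= f.sum(axis=0, keepdims=True) + 1e-300
        # (2) damped gradient step along d ln l(n|m)/d M_m = D_mn / sigma^2
        Nm = f.sum(axis=1) + 1e-300
        M += alpha * (f * d).sum(axis=1) / (Nm * sigma**2)
        r = Nm / len(x)                           # re-estimate rates
    return M, f

# usage: two overlapping 1-D clusters
x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 200)])
M, f = dynamic_logic_1d(x)
print("learned model means:", np.sort(M))
```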
OUTLINE • Cognition, complexity, and logic • Logic does not work, but the mind does • The Mind and Knowledge Instinct • Neural Modeling Fields and Dynamic Logic • Application examples • Language • Integration of cognition and language • Higher Cognitive Functions • Future directions
APPLICATIONS • Many applications have been developed • Government • Medical • Commercial (about 25 companies use this technology) • Sensor signals processing and object recognition • Variety of sensors • Financial market predictions • Market crash on 9/11 predicted a week ahead • Internet search engines • Based on text understanding • Evolving ontologies for Semantic Web • Every application needs models • Future self-evolving models: integrated cognition and language
APPLICATION 1 – CLUSTERING (data mining) • Find “natural” groups or clusters in data • Use Gaussian pdf and simple models:
l(n|m) = (2π)^(-d/2) det(Cm)^(-1/2) exp(-Dmn^T Cm^(-1) Dmn / 2); Dmn = X(n) - Mm(n)
Mm(n) = Mm; each model has just one parameter, Sm = Mm
• This is clustering with a Gaussian Mixture Model • For complex l(n|m) derivatives can be taken numerically • For simple l(n|m) derivatives can be taken manually • A simplification, not essential • Simplified parameter estimation for Gaussian pdf and simple models:
∂ ln l(n|m)/∂Mm = ∂(-Dmn^T Cm^(-1) Dmn / 2)/∂Mm = Cm^(-1) Dmn (C is symmetric)
Mm = Mm + a Σ_n f(m|n) Cm^(-1) Dmn …
• In this case, even simpler closed-form equations can be derived:
samples in class m: Nm = Σ_n f(m|n); N = Σ_m Nm
rates (priors): rm = Nm / N
means: Mm = Σ_n f(m|n) X(n) / Nm
covariances: Cm = Σ_n f(m|n) Dmn Dmn^T / Nm
- simple interpretation: Nm, Mm, Cm are weighted averages; the only difference from standard mean and covariance estimation is the weights f(m|n), the probabilities of class m • These are iterative equations, f(m|n) depends on the parameters; theorem: the iterations converge
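A compact sketch of these weighted-average updates (essentially EM for a Gaussian mixture), assuming d-dimensional data X of shape (N, d); names and the small regularization term are illustrative:

```python
import numpy as np

def nmf_gmm_clustering(X, M_init, n_iter=100):
    """Clustering by iterating the closed-form weighted-average equations.

    X: data, shape (N, d).  M_init: initial means, shape (K, d).
    Returns rates r_m, means M_m, covariances C_m, and associations f(m|n).
    """
    N, d = X.shape
    M = np.array(M_init, dtype=float)
    K = len(M)
    r = np.full(K, 1.0 / K)
    C = np.stack([np.eye(d)] * K)

    for _ in range(n_iter):
        # conditional similarities l(n|m): Gaussian pdfs
        l = np.empty((K, N))
        for m in range(K):
            D = X - M[m]                              # D_mn = X(n) - M_m
            Cinv = np.linalg.inv(C[m])
            norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(C[m]) ** (-0.5)
            l[m] = norm * np.exp(-0.5 * np.einsum('nd,de,ne->n', D, Cinv, D))
        # associations f(m|n) = r_m l(n|m) / sum_m' r_m' l(n|m')
        f = r[:, None] * l
        f /= f.sum(axis=0, keepdims=True) + 1e-300
        # weighted-average updates: N_m, r_m, M_m, C_m
        Nm = f.sum(axis=1)
        r = Nm / N
        M = (f @ X) / Nm[:, None]
        for m in range(K):
            D = X - M[m]
            C[m] = (f[m, :, None] * D).T @ D / Nm[m] + 1e-6 * np.eye(d)
    return r, M, C, f

# usage: two 2-D clusters, initial means picked from the data
X = np.random.randn(300, 2); X[150:] += 4.0
r, M, C, f = nmf_gmm_clustering(X, M_init=X[np.random.choice(300, 2, replace=False)])
```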
EXAMPLE 2: GMTI TRACKING AND DETECTION BELOW CLUTTER [Figure: (a) true tracks, range vs. cross-range over 1 km; (b) model states at initialization and after 2, 5, 9, and 12 iterations, through the converged state] DL starts with uncertain knowledge and converges rapidly on the exact solution; 18 dB improvement
TRACKING AND DETECTION BELOW CLUTTER (movie, same as above) DL starts with uncertain knowledge and, like the human mind, does not sort through all possibilities, but converges rapidly on the exact solution. 3 targets, 6 scans, signal-to-clutter S/C ~ -3.0 dB
TRACKING EXAMPLE: complexity and improvement • Technical difficulty • Signal/Clutter = -3 dB, standard tracking requires 15 dB • Computations, standard hypothesis testing ~ 10^1700, unsolvable • Solved by Dynamic Logic • Computations: 2×10^7 • Improvement: 18 dB
CRAMER-RAO BOUND (CRB) • Can a particular set of models be estimated from a particular (limited) set of data? • The question is not trivial • A simple rule of thumb: N (data points) > 10 * S (parameters) • In addition, use your mind: is there enough information in the data? • CRB: minimal estimation error (best possible estimation) for any algorithm, neural network, etc. • When there are many data points, the CRB is a good measure (= ML = NMF) • When there are few data points (e.g. financial prediction) it might be difficult to assess performance • Actual errors >> CRB • Simple well-known CRB for averaging several measurements: st.dev(n) = st.dev(1)/√n • Complex CRB for tracking: • Perlovsky, L.I. (1997a). Cramer-Rao Bound for Tracking in Clutter and Tracking Multiple Objects. Pattern Recognition Letters, 18(3), pp. 283-288.
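A quick numerical check of the simple averaging bound st.dev(n) = st.dev(1)/√n; the values used here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 25, 100_000

# estimate a mean from n noisy measurements, repeated many times
estimates = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)

print("empirical st.dev of the mean:", estimates.std())     # ~ 0.4
print("CRB prediction sigma/sqrt(n):", sigma / np.sqrt(n))   # = 0.4
```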
APPLICATION 3 FINDING PATTERNS IN IMAGES
IMAGE PATTERN BELOW NOISE [Figure: object image; object image + clutter]
PRIOR STATE-OF-THE-ART: Computational complexity. Multiple Hypothesis Testing (MHT) approach: try all possible ways of fitting the model to the data. For a 100 x 100 pixel image:
Number of objects | Number of computations
1 | 10^10
2 | 10^20
3 | 10^30
NMF MODELS • Information similarity measure:
ln l(x(n) | Mm(Sm,n)) = abs(x(n)) * ln pdf(x(n) | Mm(Sm,n)), n = (nx, ny)
• Clutter concept-model (m = 1): pdf(X(n)|1) = r1
• Object concept-models (m = 2, …): pdf(x(n) | Mm(Sm,n)) = r2 G(X(n) | Mm(n,k), Cm)
Mm(n,k) = n0 + a·(k^2, k) (note: k, K require no estimation)
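A sketch of how such a parabolic object model and its information similarity over image pixels might be evaluated; the grid size, sigma, r2, and the example values of n0 and a are illustrative assumptions, not from the original:

```python
import numpy as np

def object_log_similarity(image, n0, a, sigma=2.0, K=30, r2=0.5):
    """Information similarity of an image with a parabolic object model.

    Model points: M(n0, a, k) = n0 + a*(k**2, k), k = 0..K-1 (k needs no estimation).
    pdf(x(n)|object) ~ r2 * Gaussian around the nearest model point;
    ln l = sum_n |x(n)| * ln pdf(x(n) | model).
    """
    k = np.arange(K)
    model_pts = n0 + a * np.stack([k**2, k], axis=1)      # (K, 2) points along the parabola
    ny, nx = image.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing='ij')
    pix = np.stack([xx.ravel(), yy.ravel()], axis=1)      # pixel coordinates n = (nx, ny)

    # Gaussian pdf of each pixel around its nearest model point
    d2 = ((pix[:, None, :] - model_pts[None, :, :]) ** 2).sum(-1).min(axis=1)
    pdf = r2 * np.exp(-0.5 * d2 / sigma**2) / (2 * np.pi * sigma**2)

    return float(np.sum(np.abs(image).ravel() * np.log(pdf + 1e-300)))

# usage with a random image and guessed vertex n0 and curvature a
img = np.random.rand(64, 64)
print(object_log_similarity(img, n0=np.array([5.0, 30.0]), a=np.array([0.02, 1.0])))
```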
ONE PATTERN BELOW CLUTTER [Figure: image, x vs. y; SNR = -2.0 dB]
DYNAMIC LOGIC WORKING DL starts with uncertain knowledge and, like the human mind, converges rapidly on the exact solution • Object invisible to the human eye • By integrating data with the knowledge-model, DL finds an object below noise [Figure: range y (m) vs. cross-range x (m)]
MULTIPLE PATTERNS BELOW CLUTTER Three objects in noise; signal-to-clutter ratio (SCR): object 1: -0.70 dB, object 2: -1.98 dB, object 3: -0.73 dB [Figure: 3-object image; 3-object image + clutter]
IMAGE PATTERNS BELOW CLUTTER (dynamic logic iterations; see note-text) [Figure: panels (a)-(h) show successive dynamic logic iterations] Logical complexity = M^N = 10^5000, unsolvable; DL complexity = 10^7; S/C improvement ~ 16 dB
MULTIPLE TARGET DETECTION: DL WORKING EXAMPLE DL starts with uncertain knowledge and, like the human mind, does not sort through all possibilities as an MHT would, but converges rapidly on the exact solution [Figure: image, x vs. y]
COMPUTATIONAL REQUIREMENTS COMPARED Dynamic Logic (DL) vs. classical state-of-the-art Multiple Hypothesis Testing (MHT), based on a 100 x 100 pixel image:
Number of objects | DL computations | MHT computations
1 | 10^8 | 10^10
2 | 2×10^8 | 10^20
3 | 3×10^8 | 10^30
• Previously un-computable problems (10^30) can now be computed (3×10^8) • This pertains to many complex information-finding problems
APPLICATION 4 SENSOR FUSION Concurrent fusion, navigation, and detection below clutter
SENSOR FUSION • The difficult part of sensor fusion is association of data among sensors • Which sample in one sensor corresponds to which sample in another sensor? • If objects can be detected in each sensor individually • The problem of data association still remains • Sometimes it is solved through coordinate estimation • If 3-D coordinates can be estimated reliably in each sensor • Sometimes it is solved through tracking • If objects can be reliably tracked in each sensor => 3-D coordinates • If objects cannot be detected in each sensor individually • We have to find the best possible association among multiple samples • This is the most difficult case: concurrent detection, tracking, and fusion
NMF/DL SENSOR FUSION • NMF/DL for sensor fusion requires no new conceptual development • Multiple sensor data require multiple sensor models • Data: n -> (s,n); X(n) -> X(s,n) • Models Mm(n) -> Mm(s,n) • PDF(n|m) is a product over sensors • This is a standard probabilistic procedure, another sensor is like another dimension • pdf(m|n) -> pdf(m|s,n) • Note: this solves the difficult problem of concurrent detection, tracking, and fusion
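A minimal sketch of the multi-sensor extension described above: the conditional similarity for a sample becomes a product over sensors s, after which the associations follow the same Bayes-rule form as before. Shapes and names are illustrative assumptions:

```python
import numpy as np

def multisensor_similarity(l_per_sensor):
    """Fused l(n|m): product over sensors s of l_s(x(s,n) | M_m(s,n)).

    l_per_sensor: array of shape (S, M, N) with per-sensor conditional similarities.
    Returns the fused similarities, shape (M, N).
    """
    return np.prod(l_per_sensor, axis=0)

def associations(l_fused, r):
    """f(m|n) from the fused similarities and rates r(m) (Bayes rule)."""
    f = r[:, None] * l_fused
    return f / (f.sum(axis=0, keepdims=True) + 1e-300)

# usage: 3 sensors, 2 models, 100 samples of made-up similarities
l = np.random.rand(3, 2, 100)
f = associations(multisensor_similarity(l), r=np.array([0.5, 0.5]))
```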
[Figure. Source: UAS Roadmap 2005-2030]
CONCURRENT NAVIGATION, FUSION, AND DETECTION • multiple-target detection and localization based on data from multiple micro-UAVs • A complex case • detection requires fusion (it cannot be done with one sensor) • fusion requires exact target position estimation in 3-D • target position can be estimated by triangulation from multiple views • this requires exact UAV positions • GPS is not sufficient • UAV position is found by triangulation relative to known targets • therefore target detection and localization are performed concurrently with UAV navigation and localization, and with fusion of information from multiple UAVs • Unsolvable using standard methods; dynamic logic can solve it because computational complexity scales linearly with the number of sensors and targets
GEOMETRY: MULTIPLE TARGETS, MULTIPLE UAVS [Figure: UAV 1 through UAV m observing multiple targets] UAV positions: Xm = (Xm, Ym, Zm), moving as Xm = X0m + Vm·t (e.g., X1 = X01 + V1·t)
CONDITIONAL SIMILARITIES (pdf) FOR TARGET k Data from UAV m, sample number n: w_nm, where β_nm = signature position and f_nm = classification feature vector. The conditional similarity for the data given target k, pdf(w_nm | k), combines a term for the signature position and a term for the classification features. Note: there is also a pdf for a single clutter component, pdf(w_nm | k=0), which is uniform in β_nm and Gaussian in f_nm.
Data Model and Likelihood Similarity Compute the parameters that maximize the log-likelihood. The total pdf of the data samples is a summation of conditional pdfs over targets plus clutter (a mixture model), with unknown classification-feature parameters, UAV parameters, and target parameters.
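The equations on the original slide are images; a hedged LaTeX sketch of the mixture-model likelihood described above, using the symbols from the surrounding text:

```latex
% Total pdf of a data sample w_{nm} from UAV m: a mixture over targets k = 1..K plus clutter k = 0
\mathrm{pdf}(w_{nm}) \;=\; \sum_{k=0}^{K} r_k \,\mathrm{pdf}(w_{nm}\mid k),
\qquad \sum_{k=0}^{K} r_k = 1 .

% Log-likelihood to be maximized over classification-feature, UAV, and target parameters
LL \;=\; \sum_{m}\sum_{n} \ln \mathrm{pdf}(w_{nm}) .
```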
Concurrent Parameter Estimation / Signature Association (NMF iterations) Find the solution for the set of “best” parameters by iterating between parameter estimation and association-probability estimation (Bayes rule: the probability that sample w_nm was generated by target k). Note 1: bracket notation. Note 2: proven to converge (e.g., the EM algorithm). Note 3: the minimum-MSE solution incorporates GPS measurements.
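A sketch of the Bayes-rule association probability referred to above (a reconstruction; the exact slide equation is an image):

```latex
% Probability that sample w_{nm} was generated by target k (clutter is k = 0)
f(k \mid n,m) \;=\;
\frac{r_k \,\mathrm{pdf}(w_{nm}\mid k)}
     {\sum_{k'=0}^{K} r_{k'} \,\mathrm{pdf}(w_{nm}\mid k')} .
```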
Sensor 1 (of 3): Models Evolve to Locate Target Tracks in Image Data