Pattern representation & the future of pattern recognition

Lev Goldfarb, ETS group, Faculty of Computer Science, UNB, Fredericton, Canada

Outline
• The wisdom of modern physicists (3 slides)
• The maturity of a science (1 slide)
• The currently prevailing wisdom in our field (1 slide)
• Why should this be the guiding wisdom? (4 slides)
• Are we mature enough for the task? (2 slides)
• The (social) reason for the status quo (1 slide)
• The forgotten history: syntactic pattern recognition (8 slides)
• Syntactic pattern recognition: the unrealized hopes (4 slides)
• How should we apply the wisdom of physicists? (2 slides)
• ETS formalism: its inspiration (2 slides)
• ETS formalism: temporal information (3 slides)
• ETS formalism (13 slides)
• ETS formalism: representational completeness (1 slide)
• ETS formalism: the intelligent process (1 slide)
• Learning without representation? (2 slides)
• Conclusion (3 slides)

[ The wisdom of modern physicists From: Freeman Dyson, Innovations in Physics, Scientific American, September 1958: A few months ago two of the great historical figures of European physics, Werner Heisenberg and Wolfgang Pauli, believed that they had made an essential step forward in the direction of a theory of elementary particles. Pauli happened to be passing through New York, and was prevailed upon to give a lecture explaining the new ideas to an audience that included Niels Bohr, who had been mentor to both . . . in their days of glory thirty years earlier when they made their great discoveries. Pauli spoke for an hour, and then there was a general discussion during which he was criticized sharply by the younger generation. Finally, Bohr was called on to make a speech summing up the argument. “We are all agreed,” he said, “that your theory is crazy. The question which divides us is whether it is crazy enough to have a chance of being correct. My own feeling is that it is not crazy enough.” Lev Goldfarb, ICPR, Aug. 2004

The wisdom of modern physicists (the quote continues) The objection that they are not crazy enough applies to all the attempts which have so far been launched at a radically new theory of elementary particles. It applies especially to crackpots. Most of the crackpot papers that are submitted to the Physical Review are rejected, not because it is impossible to understand them, but because it is possible. Those that are impossible to understand are usually published. When the great innovation appears, it will almost certainly be in a muddled, incomplete, and confusing form. To the discoverer himself it will be only half-understood. To everybody else it will be a mystery. For any speculation that does not at first glance look crazy, there is no hope.

The wisdom of modern physicists ] Why did two of the 20th century’s leading physicists behave so “childishly”? Their wisdom is this: judging by past experience, radical novelty in a proposed physical model (here, of a unified field theory) is a necessary prerequisite for it to be a serious “contender”.

The maturity of a science Why did I begin the talk with the above quote? I want to draw your attention to one very important informal fact: the maturity of a science is reflected in the ability of its practitioners to estimate the quality of the match between the reality and its model. This is one of the main messages I would like you to keep in mind, and I hope we will discuss it in this workshop. How is our field doing in this respect?

The currently prevalent wisdom in our field • The main (subconscious?) postulate: rely completely on statistical models (and, therefore, on the vector space formalism). • In accordance with this postulate, expect that there exist some new, statistically “profound” models/algorithms that will do a satisfactory job. • To be “innovative”, take an apparently structural representation, convert it to a numeric one (to “tame” its structural elements), and reduce the problem to the more familiar statistical setting (in the process, misleading yourself and others into believing that one is actually dealing with the structural representation).

[ Why should this be the guiding wisdom? • Indeed, why should we confine ourselves to the statistical framework? • In their 1974 book, similar (however somewhat rhetorical) doubts were expressed by two of the present leaders in the field of statistical pattern recognition, Vapnik and Chervonenkis: In those days it appeared that the pattern recognition problem carried within itself the beginnings of some new idea, which is in no way based on the system of old concepts; researchers wanted to find new formulations, not to reduce the problem to already known mathematical schemes. In this sense the reduction of the pattern recognition problem to the scheme of average risk minimization rouses some disappointment. True, there are attempts to understand the problem in a more complex formulation . . . . As yet, however, such attempts are extremely rare.

Why should this be the guiding wisdom? (the quote continues) Now, many years after the period of ‘pattern recognition romantics’, it is difficult to estimate what this problem formalization brought. It is possible that the desire to find a rigorous formulation led scientists to restrict the meaningful problem, the solution of which was attempted at the beginning of the ‘60s.

Why should this be the guiding wisdom? From the preface of Probability, Statistics and Truth (1957), by one of the 20th century pioneers of modern probability and statistics, Richard von Mises: The stated purpose of these [mentioned earlier] investigations is to create a theory of induction or ‘inductive logic’. According to the basic viewpoint of this book, the theory of probability in its application to reality is itself an inductive science; its results and formulas cannot serve to found the inductive processes as such . . . .

Why should this be the guiding wisdom? ] From: A. N. Kolmogorov (one of the founders of modern probability theory), Logical basis for information theory and probability theory, IEEE Transactions on Information Theory, IT-14 (1968): The preceding rather superficial discourse should prove two general theses: (1) Basic information theory concepts must and can be founded without recourse to the probability theory . . . . (2) Introduced in this manner, information theory concepts can form the basis of the concept random, which [would then] naturally suggest that the random is the absence of periodicity.

[Are we mature enough for the task? Why do present-day physicists feel compelled to venture (and on a big scale) into such highly speculative theories as “string theories”, while we remain infatuated with the “good old” statistics that simply cannot address the qualitative side of modeling, i.e. the structure of the model itself? Do we understand that adequate modeling of information processes cannot succeed in the same manner as, for example, the modeling of flight has succeeded (i.e. without capturing the essence of the corresponding biological processes)? In particular, do we understand that without producing reasonably good models of the information processes in nature we will not succeed in developing satisfactory information systems?

Are we mature enough for the task? ] God forbid that we are not: from the very beginning of our science, we have been faced with modeling much more abstract processes than physicists have ever been. Hence, when modeling information processes, we need even greater imagination than physicists do (who, as I mentioned above, are ahead of us in many ways). It seems quite obvious to me that without some radically new insights we are not going to get to any “promised land”.

The (social) reason for the status quo The above prevalent wisdom has not always been as popular as it is today. One of the main reasons for the status quo is that part of our history has been forgotten, owing to the emergence during the last 15-20 years of two “new” popular areas, neural networks and machine learning. Both deal with the same subject matter as pattern recognition but started, basically, all over again, and are eventually rediscovering the importance of symbolic representations. (In contrast to pattern recognition, their professional milieu is no longer engineering but psychological and computational/statistical, respectively, although both have attracted many young physicists.)

[ The forgotten history: syntactic pattern recognition • In North America, one of the few early general texts on pattern recognition, Pattern Recognition Principles (1974), by Tou and Gonzalez, had its last chapter (chapter 8) titled “Syntactic Pattern Recognition” and considered “structural” pattern representations. • Among the English-language books that came out in the ’70s and ’80s and were devoted entirely to this topic were those by Fu (Syntactic Pattern Recognition and Applications), Grenander (Lectures in Pattern Theory), Gonzalez and Thomason (Syntactic Pattern Recognition), Watanabe (Pattern Recognition) and several others.

The forgotten history: syntactic pattern recognition In the resulting excitement and during the making of so many careers in the above two new sister areas, some of the important lessons learned in syntactic/structural pattern recognition were lost, i.e. the critical role of (non-vector) pattern representations and formalisms was overlooked.

The forgotten history: syntactic pattern recognition • Pioneers: Eden, Narasimhan, and Ledley (who published their initial work in the early ’60s) and others • King-Sun Fu (of Purdue University, who was also instrumental in the founding of IAPR and was its first president) mounted a productive and influential applied scientific program to shift emphasis from the vector space based representation to other, “structural”, forms of representation, predominantly those associated with formal grammars and their various generalizations. • Fu began his career in statistical pattern recognition; later, in the ’70s and early ’80s, he was largely responsible for the creation of the burgeoning subfield of syntactic pattern recognition, and his untimely death in 1985 had a big impact on the vitality of this subfield.

The forgotten history: syntactic pattern recognition Narasimhan (1964): The aim of any adequate recognition procedure should not be merely to arrive at a “yes”, “no”, “don’t know” decision but to produce a structural description of the input picture.

The forgotten history: syntactic pattern recognition There are applications of syntactic pattern recognition to almost any field, from seismic oil exploration to speech recognition, from face recognition to fingerprint recognition.

The forgotten history: syntactic pattern recognition The main overlooked lesson from syntactic pattern recognition (already noted by its pioneers) is this: even in this incomplete form, “structural” pattern and class representations have substantial advantages over their vector space counterparts, from both applied and theoretical points of view. (However, see slides 23-26).

] Compared to the vector space representation of the digitized image, under symbolic representation, one moves immediately into a more meaningful, higher level representation, with a generative class description.

[ Syntactic pattern recognition: the unrealized hopes What is the problem, then? Why have these advantages not yet materialized in a more apparent manner? Fundamental inadequacy of the (conventional) string representation: Take a string: afdbaaccbdfaddbbcacbffacda => no temporal information is represented in it (i.e. how the string was “formed”) => there are exponentially many candidate operations to consider that could have been involved in the generation of the string

Syntactic pattern recognition: the unrealized hopes => given a training set of strings, the inductive learning process simply cannot reliably recover the set of “generative operations”, i.e. recover the class description => basic inadequacy of the underlying formal structure of the conventional syntactic (and, similarly, all computational) formalisms: the “link” between the class and object representations is too weak.

Syntactic pattern recognition: the unrealized hopes Thus, the conventional string is not an adequate/reliable form of representation: there are just too many formative object histories that are “hidden” behind this representation. (A related observation applies to graphs and various numeric representations.)
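The growth in the number of hidden formative histories can be illustrated with a small sketch. It assumes, purely for illustration, that a string is formed only by repeatedly concatenating two adjacent pieces, starting from single symbols; this operation set and the function names are assumptions of the sketch and are not taken from any formalism discussed in the talk. Under that assumption the number of distinct histories behind a string of n symbols is the Catalan number C(n-1), which grows exponentially with n.

```python
from functools import lru_cache
from math import comb


def catalan(n: int) -> int:
    """Closed form for the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def count_histories(length: int) -> int:
    """Count the ways a string of `length` symbols can be assembled by
    binary concatenation of adjacent pieces (a hypothetical operation set)."""
    if length == 1:
        return 1  # a single symbol has exactly one (trivial) history
    # The final concatenation joined a left piece of k symbols
    # with a right piece of (length - k) symbols.
    return sum(count_histories(k) * count_histories(length - k)
               for k in range(1, length))


s = "afdbaaccbdfaddbbcacbffacda"  # the example string from the slide
histories = count_histories(len(s))

# The recursive count agrees with the closed-form Catalan number.
assert histories == catalan(len(s) - 1)
print(len(s), histories)
```

Even with this single, very restricted operation the string itself does not say which of these histories produced it; a richer set of candidate operations only enlarges the count.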

] A somewhat obvious problem, which is a consequence of the above fundamental inadequacy, is the presence of the second (“spurious”) alphabet of the non-terminals.

We need to be wise About 2500 years ago Democritus wrote: “Fools can learn from their own experience; the wise learn from the experience of others.” So, let’s try to be wise and learn as much as we can from the experience of physicists, mathematicians, and biologists.

[ So how should we apply the wisdom of physicists? Going back to slide 5, since incremental wisdom has not really worked for our field, how should we interpret “a radical novelty” for our needs? I suggest that we should interpret it in two (equally important) ways, both pointing towards radically new forms of representation.

How should we apply the wisdom of mathematicians? • First: why the representation? • If we interpret correctly, from the applied point of view, the wisdom of modern mathematics, we would immediately accept that, from the representational point of view, • the data operations that are not derivatives/compositions of the basic operations (specified by the underlying axiomatic structure of the “data space”) cannot be inductively recovered/discovered.

How should we apply the wisdom of physicists? ] Second: • we should demand from the model a radical explanatory novelty: we should expect it to offer some basic insights into the nature of information processes in the Universe • we should demand radical novelty in its formal structure: we should expect it to embody a radically new formal structure

ETS formalism: its inspiration From the very beginning, the ETS framework has been inspired by the formal/aesthetic beauty and power of a dynamic (and generalized) version of the generative grammar model: to support an evolving concept of class, one needs an evolving set of transformations that captures the class description and also modifies the corresponding (evolving) mathematical structure on the representation “space”. (In that sense, if besides ETS there is another formal realization of this vision, it should definitely be investigated.)

ETS formalism: its inspiration In mathematics, so far, we have been dealing with various static (abstract) structures. For example, in group theory, which does study the subgroup lattice of a given group, there are, quite naturally, no expectations that one subgroup is obtained by modifying another one. Even in the more “continuous” setting of a topological space, there are again no expectations that the topological structure itself is evolving. In contrast, in the ETS formalism, some of the central building blocks of the formal structure, the set of transformations, are being modified on the basis of inductive experience.

[ ETS formalism: temporal information Thus, it should not come as a surprise that, when we came to the formalization stage about 5 years ago, we had to begin literally from scratch. The main difficulties have been (and will continue to be) associated with the need to introduce temporal information into a structural representation, i.e. with the concept of an object’s formative/generative structural history. And it is precisely this feature that characterizes the radical departure from all known mathematical paradigms.

ETS formalism: temporal information Event environment versus object environment: In State 1, three unbonded oxygen atoms are shown. After the first “real” event has occurred, OA and OB become bonded, and the corresponding “ideal” event (primitive p1) is depicted on the right.

ETS formalism: temporal information ] From Edward Witten, Universe on a String, Astronomy, June 2002: Note how one event (particle on the left or string on the right) is immediately followed by two events (two particles/strings).

[ ETS formalism: (class) primitive transformations (Figure labels: initial sites, time, terminal sites.) • Think of a primitive as an “elementary” process that transforms the initial “objects” into terminal ones: it is a symbolic “notation” of a typically nontrivial process (structured event). • The circle and the square denote two site types: letters {a, b} and {x, y} are names of the variables that are allowed to vary over non-overlapping sets of numeric labels. • Brackets [ ] signify that we are, in fact, dealing with a class of (original) primitives, where each original primitive carries concrete numeric labels.

ETS formalism: structs (segments of formative history) (Figure panels: number 3; representations of a more general structural object.) Each $p_i$ denotes an ETS primitive transformation (the order in which the primitive transformations are applied is captured in the representation).
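One way to picture a struct as a recorded formative history is the following sketch. The class names (`Primitive`, `Event`, `Struct`), the use of integers as site labels, and the rule that an event may attach only to currently open sites are all assumptions made for illustration; they are not the formal ETS definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Primitive:
    """A class of elementary events with named initial and terminal sites."""
    name: str
    initial: tuple   # names of the sites it attaches to, e.g. ("a", "b")
    terminal: tuple  # names of the sites it produces, e.g. ("x",)


@dataclass(frozen=True)
class Event:
    """A concrete primitive: each site name is bound to a numeric label."""
    primitive: Primitive
    inputs: tuple    # numeric labels bound to the initial sites
    outputs: tuple   # numeric labels bound to the terminal sites


@dataclass
class Struct:
    """A segment of formative history: events kept in temporal order."""
    open_sites: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def apply(self, event: Event) -> None:
        if len(event.inputs) != len(event.primitive.initial):
            raise ValueError("wrong number of initial sites")
        if len(event.outputs) != len(event.primitive.terminal):
            raise ValueError("wrong number of terminal sites")
        if not set(event.inputs) <= self.open_sites:
            raise ValueError("event attaches to a site that is not open")
        self.open_sites -= set(event.inputs)   # attached sites are consumed
        self.open_sites |= set(event.outputs)  # new sites become available
        self.history.append(event)             # temporal order is preserved


p1 = Primitive("p1", initial=("a", "b"), terminal=("x",))
p2 = Primitive("p2", initial=("x", "a"), terminal=("y",))

struct = Struct(open_sites={1, 2, 3})
struct.apply(Event(p1, inputs=(1, 2), outputs=(4,)))
struct.apply(Event(p2, inputs=(4, 3), outputs=(5,)))
print([e.primitive.name for e in struct.history], struct.open_sites)
```

Unlike a conventional string, the object here is its own history: the order of the applied primitives is part of the representation rather than something to be guessed afterwards.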

ETS formalism: extructs (contexts) • Examples of extructs: heavy lines identify the interface sites and crosses identify detached sites. • Contexts should be thought of as parts of the formative history that are necessary for the presence of the (immediately following) “important” segments of history.

ETS formalism: transformation (Figure labels: context, body. Panels: formal definition; the “assembled” transform.)

ETS formalism: a supertransformation A supertransform, $\boldsymbol{\tau}$ (bold tau), is a generalization of the concept of transformation, and it can be thought of as an abstraction of a set of several “closely related” and inductively acquired transforms. Here, all contexts have the same interface sites and all bodies have the same initial and terminal sites.

ETS formalism: class supertransform (structural class representation) The class supertransform, $[\boldsymbol{\tau}]$, is obtained from the supertransform by abstracting away the supertransform’s site labels.

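ETS formalism: (single level) class representation The class representation (associated with a class supertransform $[\boldsymbol{\tau}]$) is defined as a pair $\mathrm{CLASS}[\boldsymbol{\tau}] = ([\boldsymbol{\tau}],\, CB_{\boldsymbol{\tau}})$, where $CB_{\boldsymbol{\tau}}$ is the context-body association strength scheme, or simply class weight scheme: $CB_{\boldsymbol{\tau}} : \{\tau \mid \tau \text{ from } \boldsymbol{\tau}\} \to \mathbb{R}^{+}$. (Obviously, $[\boldsymbol{\tau}]$ is the main, “structural”, part of the representation.)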

ETS formalism: (structural) description of a single representational level A transformation system $TS$ is simply a finite set of class supertransforms: $TS = \{[\boldsymbol{\tau}_1], [\boldsymbol{\tau}_2], \ldots, [\boldsymbol{\tau}_m]\}$.
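The two definitions above (a class as a structural part paired with a weight scheme, and a transformation system as a finite set of structural parts) can be mirrored in a small data-structure sketch. Everything named here is an assumption for illustration: strings stand in for contexts and bodies, the distinction between a supertransform and its label-free class version is ignored, and weights are taken to be strictly positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """One transform as a (context, body) pair; strings are stand-ins."""
    context: str
    body: str


@dataclass
class ClassRepresentation:
    supertransform: frozenset  # structural part: a set of Transform objects
    weights: dict              # weight scheme: Transform -> positive number

    def __post_init__(self):
        # The weight scheme must be defined on exactly the transforms
        # that make up the structural part.
        if set(self.weights) != set(self.supertransform):
            raise ValueError("every transform needs exactly one weight")
        if any(w <= 0 for w in self.weights.values()):
            raise ValueError("weights are assumed strictly positive")


def make_class(weighted: dict) -> ClassRepresentation:
    """Build a class representation from a {Transform: weight} mapping."""
    return ClassRepresentation(frozenset(weighted), dict(weighted))


c1 = make_class({Transform("ctx1", "bodyA"): 0.7,
                 Transform("ctx2", "bodyA"): 0.3})
c2 = make_class({Transform("ctx3", "bodyB"): 1.0})

# A single representational level is described structurally only:
# the weights are left out of the transformation system.
transformation_system = {c1.supertransform, c2.supertransform}
print(len(transformation_system))
```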

ETS formalism: transition to the next level (a tentative form) For each class supertransform in a transformation system, we choose a canonical supertransform (shown on the left) and construct the corresponding next-level primitive (shown on the right).

ETS formalism: transition to the next level Simplified multi-level ETS representation with different time scales for each level. Two consecutive levels are shown. The time scale for the higher level is measured in coarser units: $t'_0$ corresponds to $t_0$, $t'_1$ corresponds to $t_2$, $t'_2$ corresponds to $t_5$. The contexts of the transformations are not identified.

ETS formalism: multi-level view A multi-level representational tower with a single-level sensor at level 0.

ETS formalism: multi-level view of class representation Pyramid view (partial) of a k-th level class supertransform: the pyramid is formed by all subordinate class supertransforms.

ETS model basics: the evolution of a class ] Since any class is specified by a finite set of weighted k-th level (for some k) transformations, the class evolution is readily understood via modification of the set of transformations (structural change) and/or their weights (quantitative change). And this is exactly what you will observe in the functioning of the ETS intelligent process, discussed in the next talk.
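The distinction between the two kinds of class evolution can be made concrete with a minimal sketch. It assumes a class is stored simply as a mapping from transform names to positive weights; the function names and the example values are hypothetical and do not describe the ETS intelligent process itself.

```python
def quantitative_change(weights: dict, transform: str, new_weight: float) -> dict:
    """Return a copy of the class in which only one weight has changed."""
    if transform not in weights:
        raise KeyError(transform)
    if new_weight <= 0:
        raise ValueError("weights are assumed strictly positive")
    updated = dict(weights)
    updated[transform] = new_weight
    return updated


def structural_change(weights: dict, add=None, remove=()) -> dict:
    """Return a copy of the class with transforms removed and/or added."""
    removed = set(remove)
    updated = {t: w for t, w in weights.items() if t not in removed}
    for t, w in (add or {}).items():
        if w <= 0:
            raise ValueError("weights are assumed strictly positive")
        updated[t] = w
    return updated


class_v0 = {"t1": 1.0, "t2": 0.5}
class_v1 = quantitative_change(class_v0, "t2", 0.8)        # same transforms
class_v2 = structural_change(class_v1, add={"t3": 0.3}, remove=["t1"])
print(class_v0, class_v1, class_v2)
```

Both functions return new mappings, so the successive versions of the class remain available for comparison.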

The proceedings cover page This is “Metamorphosis III” by Escher, which was chosen for the cover page of the proceedings as intimating an evolution of a class.

ETS formalism: representational completeness A most distinguishing feature of this formalism is its unprecedented representational completeness and explicitness. This representational completeness radically changes the formal side of the modeling (the corresponding “future mathematics”).
