On the futility of attempts to formalize clustering within conventional formal frameworks
Lev Goldfarb
ETS group, Faculty of Computer Science, UNB, Fredericton, Canada
About the talk
• To approach the foundations of clustering, one must rely on an adequate concept of class, which I claim is completely lacking.
• This most fundamental issue that concerns our area (understood broadly to include machine learning) has been systematically neglected, putting any progress in the area in question.
• So … what is a class?
• In particular: can the concept of class be adequately addressed within conventional math/CS formalisms?
• As the title of the talk suggests, the answer is “no”.
• (For a radically new representational formalism, ETS, not considered here, see references in the abstract of this talk.)
What is a numeric representation?
Restricting ourselves to natural numbers: classical measurement is a systematic method for representing objects by numbers. (Classes of objects do not enter into consideration at all.)
[Figure: physical objects are mapped to natural numbers (Peano representation) by the object representation map, via a fixed measurable property.]
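To make the measurement picture concrete, here is a minimal sketch in Python. It assumes the simplest possible "measurable property" (a unit count); the names `Peano`, `succ`, `measure` and `to_int` are illustrative and do not come from the talk. The point it shows is only that the map sends each object to a number built from zero by the successor operation, and that no notion of class appears anywhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peano:
    """A natural number built only from zero and the successor operation."""
    pred: Optional["Peano"] = None  # None means this value is zero


ZERO = Peano()


def succ(n: Peano) -> Peano:
    """The single basic operation of the Peano representation."""
    return Peano(n)


def measure(unit_count: int) -> Peano:
    """Hypothetical object representation map: one succ per unit counted."""
    n = ZERO
    for _ in range(unit_count):
        n = succ(n)
    return n


def to_int(n: Peano) -> int:
    """Read a Peano numeral back as an ordinary integer."""
    k = 0
    while n.pred is not None:
        n = n.pred
        k += 1
    return k
```

For example, `measure(3)` yields `succ(succ(succ(ZERO)))`: the object is represented, but nothing in the representation says which class it belongs to.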
What is a representational formalism?
[Figure: physical objects are mapped into the representational formalism by the object representation map; classes are mapped by the class representation map; the two mappings are coupled.]
I. The two representation mappings should not be decoupled.
II. We should postulate that all classes have “inductive generative structure”.
What is a representational formalism?
I. Formal implications of the tight link between the two mappings (for objects and classes):
• a) for the interpretation of basic operations in the chosen formalism: we should treat them as object operations and hence take them seriously, in contrast to present practice in applied mathematics
• b) for the general structure of class representation: a class representation must be expressed via basic operations. In modern mathematics, this is a standard structural requirement (ignored in ML)!
What is a representational formalism?
• Refinement of the general structure of class representation [ I b) ].
• Relying on our understanding of the structure of classes in nature, the above “inductive generative structure” of classes should mean that a class representation must be
  • a) of generative form: it must incorporate the mechanism by means of which the members of the class are constructed via the basic operations (also a standard structural mathematical requirement)
  • and
  • b) inductive: it must be effectively and reliably learnable from a very small training set.
Lev Goldfarb, NIPS 2005, Clustering
Inadequacy of formal grammars
• Grammars do not offer an “inductive” class representation [ II b) ].
• The main reason: a string over a finite alphabet does not carry within itself enough representational information to link it “effectively and reliably” with the corresponding grammar, i.e. to identify the class to which it belongs (see also the next slide).
• Thus, the overall deficiency of formal grammars is twofold:
  • poor object representation
  • class representation is “disconnected” from object representation (e.g. nonterminals are not derivable from the object representation)
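The claim that a string does not identify its grammar can be illustrated with a small sketch. As an assumption made purely for brevity, three regular expressions stand in for three grammars; the class names and patterns below are hypothetical and not taken from the talk. The same string is a member of all three classes, and nothing in the string says which one produced it.

```python
import re

# Three hypothetical classes, each described by a regular language.
# (Regular expressions are used here only as a compact stand-in for grammars.)
CANDIDATE_CLASSES = {
    "repetitions of ab": r"(ab)*",
    "starts with a and ends with b": r"a[ab]*b",
    "any string over {a, b}": r"[ab]*",
}


def consistent_classes(s: str) -> list:
    """Return the names of all candidate classes that contain the string s."""
    return [
        name
        for name, pattern in CANDIDATE_CLASSES.items()
        if re.fullmatch(pattern, s)
    ]
```

Calling `consistent_classes("abab")` returns all three names, whereas a string such as `"ba"` is accepted only by the most general class. Adding more training strings narrows the candidates, but the information doing the narrowing sits in the set of candidate grammars, not in the strings themselves.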
Inadequacy of the string representation: there are better choices
[Figure: two of the possible formative histories for the string abaca (figure label: contexts).]
[Figure: an ETS representation (has nothing to do with a tree; captures the temporal sequence of insertions).]
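The idea that one string admits several formative histories can be sketched as follows. This is not the ETS representation from the slide; it is a minimal illustration that assumes a history is simply a sequence of single-character insertions starting from the empty string. The function names `is_subsequence` and `histories` are illustrative only.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    remaining = iter(t)
    return all(ch in remaining for ch in s)


def histories(target: str):
    """Yield every insertion history: a list of strings from "" to target,
    where each step inserts exactly one character somewhere."""
    alphabet = sorted(set(target))

    def extend(path):
        current = path[-1]
        if current == target:
            yield path
            return
        seen = set()  # avoid repeating the same next string at this step
        for i in range(len(current) + 1):
            for ch in alphabet:
                nxt = current[:i] + ch + current[i:]
                # Keep only steps that can still grow into the target.
                if nxt not in seen and is_subsequence(nxt, target):
                    seen.add(nxt)
                    yield from extend(path + [nxt])

    yield from extend([""])
```

Among the results of `list(histories("abaca"))` are the left-to-right history `["", "a", "ab", "aba", "abac", "abaca"]` and the quite different `["", "a", "aa", "aaa", "abaa", "abaca"]`. The finished string is identical in both cases, so the string alone cannot tell them apart.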
The vector space as a representational formalism
[Figure: vector space representation; the basic operations are {+, ·}.]
From the above, the only candidate for a class/class representation is the affine subspace.
However, the overwhelming practice amounts to “take the vectors and run”, i.e. do what you want with them.
When modeling various phenomena in science, classes have not yet become the focus of attention, hence it is up to us to address these new scientific representational issues.
Inadequacy of the vector space formalism
• Obviously, it lacks a generative [ II a) ] class representation. Why?
• The absence of “sufficient” representational structure results in:
  • operations {+, ·} being too “simple”
  • linear generativity producing only very “regular” classes.
• To compensate, a class description had to be brought in from outside the algebraic formalism proper (which again violates the standard “structural” wisdom of mathematics).
• The resulting class description:
  • is structurally and representationally “alien” and “meaningless” (there is no tight link between an object and its class representation)
  • includes non-class vectors that satisfy the class description
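A small two-dimensional sketch may help with the last two points. It assumes, purely for illustration, that the class generated by {+, ·} from two training vectors is the line through them, and that the externally imposed class description is a half-space test; the function names, the weight vector and the bias are hypothetical choices, not anything specified in the talk.

```python
def affine_combination(points, weights):
    """Combine points using only + and scalar ·, with weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    dim = len(points[0])
    return tuple(
        sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)
    )


def on_affine_line(p, a, b, tol=1e-9):
    """True if the 2-D point p lies on the line through a and b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) < tol


def halfspace_description(p, w=(0.0, 1.0), bias=-0.5):
    """Hypothetical external class description: accept p if w·p + bias > 0."""
    return w[0] * p[0] + w[1] * p[1] + bias > 0


# Two training vectors; the class they generate linearly is the line y = 1.
a, b = (0.0, 1.0), (1.0, 1.0)
generated = affine_combination([a, b], [-2.0, 3.0])  # a new class member
outsider = (5.0, 40.0)                               # not on the line
```

Here `generated` lies on the line, so the generative description can only ever produce this very “regular” class. The half-space test accepts `generated`, but it also accepts `outsider`, a vector that no affine combination of the training vectors produces: the external description admits non-class vectors.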
Inadequacy of the vector space formalism
Unfortunately, the prevailing trend in machine learning is that clever distance measures or kernels should “solve the problem”.
• However, these have to be crafted manually, and, more importantly:
  • they cannot rectify the inadequacy of a vector as an object representation
  • again, they are being brought in from “outside” the representational (algebraic) formalism.
Inadequacy of the vector space formalism
Thus, because class representation is not “related” to object representation, ML practice reinforces the scientifically counterproductive view that classes are our creation rather than existing in nature.
On the other hand, once we develop a formalism in which the concept of class follows the “structural” mathematical wisdom, we will be able to offer the sciences a formal language of inestimable value, i.e. something that mathematics has traditionally provided.
Conclusion
No adequate class representation
No foundation for clustering
The golden age of classification (and “clustering”) is still ahead of us, though its arrival depends on the development of the “right” representational formalism.