Mathematics – A New Domain for Datamining?
Simon Colton
simonco@cs.york.ac.uk
http://www.dai.ed.ac.uk/~simonco
Universities of Edinburgh & York, United Kingdom
Mathematics is the new Biology
• Many databases of math information
• Massive potential for datamining
• This talk
  • Overview of mathematics databases
  • Hurdles to overcome for datamining
  • Suggested methods
  • Potential rewards
Mathematical Databases
• Mathworld encyclopedia
  • 8974 entries, 153958 cross-references, 1400 pages
• MathSciNet citation service
  • 10843 reviews, 151350 articles, 358104 authors
• Mizar library of formalised maths
  • 666 articles, 2000 concept definitions
• Mathematica CAS functions
  • Tens of thousands of computer algebra functions
Mathematical Databases
• Encyclopedia of Integer Sequences
  • 60,000 sequences with terms, definitions, etc.
• Inverse Symbolic Calculator
  • 50 million constants, 400 tables
• Gap library (CAS)
  • 6 million groups
• Ad hoc databases everywhere
  • Geometry junkyard, My favourite constants
Problems with the Data
• Highly heterogeneous
  • No agreed-upon format for concepts or conjectures
• Distributed
  • Hundreds of websites
• Dynamic
  • E.g. 50 new integer sequences daily
• Really need to impose homogeneity
Suggestions for Datamining
• Conjectures: simple relationships between concepts
  • Equivalence, implication, nonexistence, moonshine
• Need to worry about interestingness
  • Plausibility, complexity, surprisingness
• Concept formation to get correct statements
  • Composition, tweaking, monster-barring
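The simplest conjecture types on this slide can be illustrated with a minimal Python sketch. It assumes each concept is given as a predicate over a finite sample of objects; the function name `conjectures` and the string labels are hypothetical and do not come from the talk. Any relationship it reports holds only on the sample, so it is a conjecture to be checked, not a theorem.

```python
from typing import Callable, Iterable, List

def conjectures(p: Callable[[int], bool],
                q: Callable[[int], bool],
                sample: Iterable[int]) -> List[str]:
    """Return the simple relationships between p and q that hold on the sample."""
    items = list(sample)  # materialise so the sample can be scanned twice
    ps = {x for x in items if p(x)}
    qs = {x for x in items if q(x)}
    if not ps or not qs:
        return []  # no supporting examples, so nothing plausible to report
    if ps == qs:
        return ["equivalence"]
    if ps <= qs:
        return ["p implies q"]
    if qs <= ps:
        return ["q implies p"]
    if not (ps & qs):
        return ["nonexistence"]  # no sampled object satisfies both concepts
    return []

def is_even(n: int) -> bool:
    return n % 2 == 0

def is_odd(n: int) -> bool:
    return n % 2 == 1

def is_multiple_of_four(n: int) -> bool:
    return n % 4 == 0

def has_even_square(n: int) -> bool:
    return (n * n) % 2 == 0
```

For instance, `conjectures(is_even, is_multiple_of_four, range(1, 101))` reports that the second concept implies the first. Requiring both example sets to be non-empty is a crude stand-in for the plausibility measure mentioned above.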
Potential Rewards - Example
• NumbersWithNames program
  • http://machine-creativity.com/programs/nwn
  • Datamining the Encyclopedia of Integer Sequences
• Perfect numbers are pernicious
  • Perfect: sum of divisors is twice the number
  • Pernicious: prime number of 1s in binary
  • 6, 28, 496, ...
  • Found by looking for subsequences
• Many more similar examples
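The "perfect numbers are pernicious" observation can be checked on small cases with a short Python sketch. It assumes the two sequences are generated locally by brute force up to a bound of 10,000 instead of being read from the Encyclopedia of Integer Sequences, and every function name here is illustrative; this is not the NumbersWithNames implementation.

```python
def is_prime(k: int) -> bool:
    """Trial-division primality test, adequate for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def is_perfect(n: int) -> bool:
    """True if the sum of all divisors of n (including n itself) is 2n."""
    if n < 2:
        return False
    total = 1 + n  # the divisors 1 and n
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d  # the paired divisor
        d += 1
    return total == 2 * n

def is_pernicious(n: int) -> bool:
    """True if the binary expansion of n contains a prime number of 1s."""
    return is_prime(bin(n).count("1"))

def is_subsequence(a, b) -> bool:
    """True if the terms of a occur in b in the same order."""
    it = iter(b)
    return all(x in it for x in a)  # each membership test advances the iterator

BOUND = 10_000  # assumed search bound for this sketch
perfect = [n for n in range(1, BOUND) if is_perfect(n)]
pernicious = [n for n in range(1, BOUND) if is_pernicious(n)]

print(perfect)                              # [6, 28, 496, 8128]
print(is_subsequence(perfect, pernicious))  # True
```

A subsequence match on finitely many terms only suggests the conjecture. Here it can be backed up for even perfect numbers: each has the form 2^(p-1)(2^p - 1) with p prime, so its binary expansion is p ones followed by p-1 zeros.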
Potential Rewards: Money & Fame
• Money
  • EPSRC funded big project: e-science
  • E-maths initiative being discussed
• Fame
  • Monstrous Moonshine Conjectures
  • Found by accident (numbers 196883 & 196884)
  • Led to Fields Medal (see paper)
Conclusions and Future Work
• Consider mathematics as a datamining domain
• Much data available, but there are problems
• Techniques required are simple
• Possible to make important conjectures
• Cross domain/database sharing of data
• Projects like NumbersWithNames
  • http://machine-creativity.com/programs/nwn