Explore the challenges of information overload in the digital age and discover how patterns can unlock valuable knowledge from vast data sets. Delve into applications like data mining, signal processing, and information retrieval while contemplating the future of universal pattern modeling. Dive deep into approximate data solutions and the evolving landscape of query languages for patterns.
Patterns not just Data
• Information overload is escalating beyond any of our traditional beliefs.
• “The world produces between 1 and 2 exabytes of unique information per year, which is roughly 250 megabytes for every man, woman, and child on earth.” [P. Lyman and H.R. Varian, "How Much Information", 2000. Retrieved from http://www.sims.berkeley.edu/how-much-info in January 2002]
• Still, even novel DBMS architectures are insufficient to cover the gap between the exponential growth of data and the slow growth of our understanding [Gray02], due to our methodological bottlenecks and simple human limitations.
Timos Sellis
Patterns not just Data
• To compensate for these shortcomings, we reduce the available data to knowledge artifacts (i.e., clusters, rules, etc.) through data processing methods (pattern recognition, data mining, knowledge extraction).
• This reduces the number and size of the data (so that they are manageable by humans) while preserving as much as possible of their hidden/interesting/available information.
• These knowledge artifacts are patterns. Patterns can in general be distinguished with respect to how they are constructed and what they are used for.
Patterns not just Data - Applications
• Data Mining
  – Clusters, Classifications, Assoc. Rules, Time-Series
• Signal Processing
  – Music, Voice, Vision
• Information Retrieval
  – Corpus
• Mathematical applications
  – Graphs, numbers, Cryptography
• You can name more…
Patterns not just Data – The Challenge
• Can we find a universal model that allows modelling patterns in general?
• What would a query language for patterns look like?
• What would be the essential “new” system components (indexing, visualization, etc.)?
• Can such systems be built on top of an ORDBMS?
Approximate Data/Answers
• In most large, real-world applications, approximations are the only solution.
• At the same time, the user needs to know the quality of the approximations, at the information level as well as at the answer level.
• Support must be provided by the DBMS at all levels: models, query languages, indexes, physical storage, visualization of results.
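One way to picture an approximate answer that carries its own quality measure is a sample-based aggregate that returns an error bound next to the estimate. The sketch below is only an illustration, not a method from the slides: the function name `approximate_mean`, the uniform sampling strategy, and the normal-approximation 95% bound are all assumptions made for the example.

```python
import math
import random


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean from a uniform random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence bound (assumption: normal approximation, z = 1.96).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    mean = sum(sample) / sample_size
    # Unbiased sample variance.
    var = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    half_width = z * math.sqrt(var / sample_size)
    return mean, half_width


# Hypothetical data set: 100,000 values cycling through 0..99.
data = [float(i % 100) for i in range(100_000)]
estimate, error = approximate_mean(data, sample_size=1_000)
exact = sum(data) / len(data)
print(f"approximate mean = {estimate:.2f} +/- {error:.2f} (exact = {exact:.2f})")
```

The answer is reported as a pair, so the user sees the quality at the answer level; a DBMS offering this would also have to expose the bound through its query language and result visualization.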
Approximate Data/Answers – The Challenge
• Scalable approximation schemes (histograms, wavelets, etc.)
• Learning from the tolerance a user shows toward approximate answers deemed acceptable
• What is an approximation of an XML document? How much schema/ontology information is required?
• Approximation may change according to the context of a user query; how is this taken into account?
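As a concrete instance of the histogram schemes mentioned above, the following minimal sketch answers a range-count query from an equi-width histogram instead of the raw data. The helper names `build_histogram` and `estimate_range_count`, the bucket count, and the uniform-spread assumption inside each bucket are illustrative choices, not something the slides prescribe.

```python
def build_histogram(values, lo, hi, buckets):
    """Build an equi-width histogram over the half-open domain [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that a value equal to hi falls into the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range_count(counts, lo, width, a, b):
    """Estimate how many values fall in [a, b).

    Assumption: values are spread uniformly inside each bucket, so a
    bucket contributes in proportion to its overlap with [a, b).
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


# Hypothetical data: the integers 0..999, summarized in 10 buckets.
values = list(range(1000))
counts, width = build_histogram(values, 0, 1000, 10)
approx = estimate_range_count(counts, 0, width, 250, 450)
exact = sum(1 for v in values if 250 <= v < 450)
print(approx, exact)
```

The estimate matches the exact count here only because the sample data is perfectly uniform; on skewed data the error grows with bucket width, which is the trade-off between summary size and answer quality that the slide raises.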