Data Generation for Application-Specific Benchmarking

Data Generation for Application-Specific Benchmarking
Y.C. Tay
National University of Singapore

Background
benchmarks help research and development; the dominant database benchmark is TPC
SIGMOD Conference 2011
• research track: 87 papers, 17 use TPC (20%)
• industry track: 14 papers, 6 use TPC (43%)
Problem: a few TPC benchmarks but many, many applications
TPC becoming irrelevant?

Vision
a paradigm shift in database benchmark development
from top-down
• committee consensus
• domain-specific package (data generator + queries)
to bottom-up
• community collaboration
• application-specific tools (dataset scaling)
synthetically scale up/down application data; the application already has queries

Challenge
Dataset Scaling Problem: Given a set of relational tables D and a scale factor s, generate a database state D’ that is similar to D but s times its size.
E.g. What would DBLP look like in 2020?
s > 1
• why: scalability testing
• difficulty: copying doesn’t work (e.g. social network data)
s < 1
• why: application testing
• difficulty: sampling not straightforward (similar to web crawling)
s = 1
• why: privacy/proprietary reasons
• difficulty: encryption is risky
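The slide does not say why copying fails for s > 1. One plausible reading can be sketched as follows: replicating a friendship table with shifted ids gives the right row counts, but the replicas never connect, so graph-dependent query results do not grow with the data. The tables, ids, and helper names below are hypothetical, chosen only for illustration, and the sketch assumes user ids run from 0 to n-1.

```python
from collections import defaultdict

def scale_by_copying(users, friendships, s):
    """Naive scale-up: replicate both tables s times, shifting ids per copy.

    Assumes user ids are 0..n-1 so that shifting by n avoids collisions.
    """
    n = len(users)
    new_users = [u + k * n for k in range(s) for u in users]
    new_friendships = [(a + k * n, b + k * n)
                       for k in range(s) for (a, b) in friendships]
    return new_users, new_friendships

def largest_component(users, friendships):
    """Size of the largest connected component of the friendship graph."""
    adj = defaultdict(set)
    for a, b in friendships:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), 0
    for u in users:
        if u in seen:
            continue
        seen.add(u)
        stack, size = [u], 0
        while stack:  # depth-first traversal of one component
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, size)
    return best

# Hypothetical empirical data: four users in a chain of friendships.
users = [0, 1, 2, 3]
friendships = [(0, 1), (1, 2), (2, 3)]

big_users, big_friendships = scale_by_copying(users, friendships, 3)
print(len(big_users), len(big_friendships))           # 12 9
print(largest_component(users, friendships))          # 4
print(largest_component(big_users, big_friendships))  # 4
```

The table sizes triple, yet the largest connected component stays at 4 users, which a real network three times the size would be unlikely to show.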

Challenge
Dataset Scaling Problem: Given a set of relational tables D and a scale factor s, generate a database state D’ that is similar to D but s times its size.
similar: by query results
difficulty: data correlation
E.g. database = {photos, owners, comments, tags}
• inter-column correlation
  • foreign keys
  • age and gender
  • user likely to comment on own photos
  • gardener likely to tag photos of flowers
• inter-row correlation
  • photo dimensions (same camera)
  • tags used by gardener (“rose”, “bee”, “beetle”)
• inter-column + inter-row
  • 2 users comment on each other’s photos (social network)
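To make the "user likely to comment on own photos" example concrete, the following minimal sketch measures that inter-column correlation on a tiny hypothetical dataset. It then compares it with the rate expected from a generator that keeps each column's value frequencies but pairs commenters and photos independently. The data and function names are assumptions for illustration, not part of the talk.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical data: photo_id -> owner, and (commenter, photo_id) rows.
photos = {101: "alice", 102: "bob", 103: "alice"}
comments = [("alice", 101), ("bob", 101), ("alice", 103),
            ("bob", 102), ("carol", 102)]

def own_photo_rate(photos, comments):
    """Fraction of comments written on the commenter's own photo."""
    own = sum(1 for commenter, pid in comments if photos[pid] == commenter)
    return Fraction(own, len(comments))

def independent_rate(photos, comments):
    """Own-photo rate expected if the commenter column and the photo column
    kept their value frequencies but were paired independently."""
    n = len(comments)
    commenters = Counter(c for c, _ in comments)
    owners = Counter(photos[pid] for _, pid in comments)
    return sum(Fraction(commenters[u], n) * Fraction(owners[u], n)
               for u in commenters)

print(own_photo_rate(photos, comments))    # 3/5
print(independent_rate(photos, comments))  # 2/5
```

A query that counts comments on one's own photos would therefore return noticeably different results on a column-independent synthetic dataset, even though every single-column histogram matches.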

Challenge
scaling a social network:
empirical dataset D → extract (use join query) → empirical social graph G → scale by s (use graph theory: #edges? #triangles? path lengths?) → synthetic social graph G~ → inject (any database theory?) → synthetic dataset D~
E.g. how to inject into D~
* correlation from G~ indicating X and Y comment on each other’s photos
* correlation between Alice’s birthday and wall posts by her classmates
* correlation among tags used by bird watchers
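The "extract" step and the graph statistics named on the slide can be sketched as follows, under stated assumptions: a hypothetical two-table schema (photos, comments) held in an in-memory SQLite database, and an edge between X and Y whenever each has commented on a photo owned by the other. This illustrates only extraction and measurement; it does not attempt the scale or inject steps, which the slide poses as open questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos(photo_id INTEGER PRIMARY KEY, owner TEXT);
CREATE TABLE comments(commenter TEXT,
                      photo_id INTEGER REFERENCES photos(photo_id));
""")
# Hypothetical empirical dataset D.
conn.executemany("INSERT INTO photos VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [("bob", 1), ("alice", 2), ("carol", 2), ("bob", 3),
                  ("alice", 3), ("carol", 1), ("dave", 1)])

# Extract: a join query yields one row per pair who comment on each
# other's photos; the "<" keeps each undirected edge once, no self-loops.
edges = conn.execute("""
SELECT DISTINCT c1.commenter, p1.owner
FROM comments c1
JOIN photos p1 ON c1.photo_id = p1.photo_id
JOIN comments c2 ON c2.commenter = p1.owner
JOIN photos p2 ON c2.photo_id = p2.photo_id
WHERE p2.owner = c1.commenter AND c1.commenter < p1.owner
ORDER BY c1.commenter, p1.owner
""").fetchall()

# Measure: build adjacency sets, then count triangles via common neighbours.
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
# Each triangle is counted once per edge it contains, hence the division.
triangles = sum(len(adj[a] & adj[b]) for a, b in edges) // 3

print(edges)      # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'carol')]
print(len(edges)) # 3
print(triangles)  # 1
```

Dave comments on Alice's photo but receives no comment in return, so he contributes no edge under this mutual-comment definition.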

Challenge
* online social networks are here to stay
* their datasets can be huge
* their datasets have commercial value
where is the database theory?
Attribute Value Correlation Problem for Social Networks: Suppose a dataset D records data from a social network. How do the social interactions affect the correlation among attribute values in D?

Vision (for the next 25 years):
a paradigm shift from a top-down design of domain-specific benchmarks by committee consensus to a bottom-up collaborative development of tools for application-specific dataset scaling
Challenges:
• Dataset Scaling Problem
• Attribute Value Correlation Problem for Social Networks
Payoff:
• commercial value in dataset scaling tools
• new database research areas (social network data, schema design, vertical/horizontal partition, query optimization, business intelligence, …)
Start: UpSizeR (http://www.comp.nus.edu.sg/~upsizer)
• single-server version
• Hadoop version
