320 likes | 481 Views
David Chays Polytechnic University Brooklyn, NY. A Framework for Testing Database Applications. Joint work with Phyllis G. Frankl (Polytechnic) Saikat Dan (Polytechnic) Filippos Vokolos (Lucent Technologies) Elaine J. Weyuker (AT&T Labs - Research). Motivation.
E N D
David Chays Polytechnic University Brooklyn, NY A Framework for Testing Database Applications Joint work with Phyllis G. Frankl (Polytechnic) Saikat Dan (Polytechnic) Filippos Vokolos (Lucent Technologies) Elaine J. Weyuker (AT&T Labs - Research)
Motivation • Database systems play an important role in virtually every modern organization • Faults can be very costly • Programmers/testers may lack experience and/or time • Little attention has been paid to DB application program correctness
Outline of Talk • Background • Aspects of DB system correctness • Issues in testing DB application programs • Architecture of tool set • Tool for generating database states • Additional issues and approaches
DBMS and DB application DB schema, eg., Emp(ssn, name, addr, sal) Dept(id, dept-name) Database Management System DB application, eg., /* C program with embedded SQL*/ DB
ssnname 001-00-0356 Johnson 012-34-5678 Smith 036-54-5555 Jones 051-88-9911 Blake TableS Relational databases • Data is viewed as a collection of relations • relation schema • relation (relation state) • Tables, tuples, attributes, constraints • for example, create tableS(ssn char(11)primary key, • namechar(25)not null)
Aspects of Correctness • Does the DBMS perform all operations correctly? • Is concurrent access handled correctly? • Is the system fault-tolerant? • ... • Does the application program behave as intended?
function imperative nature function declarative nature Traditional vs. DB programs input DB state input output DB state output
Example of an Informal Specification Customer-feature table: customerID address features ... Billing table customerID billing plan ... Input customer ID and name of feature to which the customer wishes to subscribe. Invalid ID: return 0 feature unavailable in that area: return code 2 feature available but incompatible with existing features: return code 3 else update customer’s feature record, update billing table, return code 1
What are the Input/Output Spaces? • Naïve approach • I = {customer-IDs} X {feature-names} • 0 = {0,1,2,3} • More suitable approach: • I = {customer-IDs} X {feature-names} X {database-states} • 0 = {0,1,2,3} X {database-states} • Problem: • must control and observe the DB state
DB Application Testing Goal • Select “interesting” DB states along with user inputs that exercise “interesting” behavior • Cover wide variety of situations that could arise in practice • Do so in a way that facilitates checking of output to user and resulting DB state
Situations to Explore • Customer already subscribes to that feature • Feature not available in customer’s area • Feature available, but incompatible with other features customer already has • Feature available and compatible with existing features • Customer doesn’t yet subscribe to any features • ...
feature incompatible_feature F1 F2 ... ... ID area F1 F2 ... FN 011 11235 ... ... feature area F1 11235 F2 11235 ... ... May involve interplay between several tables • Table 1: incompatible features • Table 2: features available in various areas • Table 3: customers and features
Will Live Data Suffice? • May not reflect sufficiently wide variety of situations • May be difficult to find the situations of interest • May violate privacy or security constraints
Generating Synthetic Data • DB state is a collection of relation states, each of which is a subset of the Cartesian product of some domains • Generating domain elements and gluing them together isn’t enough, since constraints must be honored • We attempt to generate interesting data that obey integrity constraints • Use schema and user supplied info
DB schema Suggestions from tester App source State Generator DB state Input Generator State Checker Output Checker User input Results App exec Output
DB state generator • Inputs DB schema (in SQL) • Parses schema to derive info about • attributes • tables • constraints : uniqueness, not-NULL, referential integrity • inputs additional info from user • suggested attribute values, divided into groups, similar to Category-Partition Testing [Ostrand-Balcer] • additional annotations
Example Schema create table s (sno char(5), sname char(20), status decimal(3), city char(15), primary key(sno)); create table p (pno char(6) primary key, pname char(20), color char(6), weight decimal(3), city char(15)); create table sp (sno char(5), pno char(6), qty decimal(5), primary key(sno,pno), foreign key(sno) references s, foreign key(pno) references p);
Stmt Stmt Column Definition Nodetag type = T_ColumnDef colname = “sno” type name = “bpchar” Constraints = NIL Table Constraint Nodetag type = T_Constraint contype = CONSTR_PRIMARY keys Create Stmt Nodetag type = T_CreateStmt relname = “s” Create Stmt Nodetag type = T_CreateStmt relname = “s” Column Definition Nodetag type = T_ColumnDef colname = “sno” type name = “bpchar” Constraints T_IDENT name = “sno” contype = CONSTR_PRIMARY Create table s( sno char(5), primary key(sno) ); Create table s( sno char(5) primary key );
0 1 2 3 0 1 2 3 0 1 2 3 sno | F| F| F| F| F| F| F| sno | F| F| F| F| F| F| F| sno | S | char | pr | un | ~nn cp globalTablePointer sname | S | char | ~pr | ~un | ~nn sname | F| F| F| F| F| F| F| cp status | S | dec | ~pr | ~un | ~nn status | F| F| F| F| F| F| F| status | F| F| F| F| F| F| F| cp City | F| F| F| F| F| F| F| city | S | char | ~pr | ~un | ~nn City | F| F| F| F| F| F| F| cp S | 4 | 0 1 2 3 4 0 1 2 3 pno | F| F| F| F| F| F| F| pno | P | char | pr | un | ~nn cp pname | P | char | ~pr | ~un | ~nn pname | F| F| F| F| F| F| F| cp P | 5 | color | P | char | ~pr | ~un | ~nn color | F| F| F| F| F| F| F| cp weight | P | dec | ~pr | ~un | ~nn weight| F| F| F| F| F| F| F| cp city | P | char | ~pr | ~un | ~nn cp SP | 3 | 0 1 2 sno |SP | char | pr | un | ~nn | foreign cp pno |SP | char | pr | un | ~nn | foreign cp Null qty |SP | dec | ~pr | ~un | ~nn cp
Selecting Attribute Values • Initial prototype queries tester for suggested values and guidance on how to use those values • Values may be partitioned into data groups (choices) • Tester may specify probabilities for data groups
--choice_name: low 10 20 30 ---- --choice_name: medium 300 400 ---- --choice_name: high 5000 6000
cp low medium high 10 300 5000 20 400 6000 30 Each category (column) can have a list of choices pointed to by cp.
DB table generation • Tester specifies table sizes • Tool generates tuples for insertion • select data group or NULL, guided by annotations • select value from data group, obeying constraints • keep track of values used • Outputs sequence of SQL insert statements
Input files for Parts-Supplier database sno: --choice_name: sno S1 S2 S3 S4 S5 sname: --choice_name: sname Smith Jones Blake Clark Adams pname: --choice_name: interior seats airbags dashboard ---- --choice_name: exterior doors wheels bumper city: --choice_name: domestic --choice_prob: 90 Brooklyn Florham-Park Middletown ---- --choice_name: foreign --choice_prob: 10 London Bombay pno: --choice_name: pno P1 P2 P3 P4 P5 status: --choice_name: status --null_prob: 50 0 1 2 3 color: --choice_name: color blue green yellow weight: --choice_name: weight 100 300 500
city: --choice_name: domestic --choice_prob: 90 Brooklyn Florham-Park Middletown ---- --choice_name: foreign --choice_prob: 10 London Bombay status: --choice_name: status --null_prob: 50 0 1 2 3
Table s sno pno qty S1 P1 5000 S1 P2 300 S1 P3 10 S2 P1 6000 S2 P2 400 S2 P3 5000 S3 P1 20 S3 P2 300 S3 P3 30 S4 P1 6000 sno sname status city S1 NULL 0 Brooklyn S2 Smith 1 Florham-Park S3 Jones NULL London S4 Blake NULL Middletown pno pname color weight city P1 NULL blue 100 Brooklyn P2 Seats green 300 Florham-Park P3 airbags yellow 500 Middletown A database state produced by the tool Table sp Table p
Related work • Lyons-77, DB-Fill, TestBase • Like our approach, rely on user to supply attribute values • Do not handle integrity constraints as completely • Require tester to describe tables in special-purpose language (rather than SQL)
Testing Techniques in DB literature • Focus on DB system performance, rather than DB application correctness • Benchmarks • Performance of SQL processor • Generation of large number of DML statements [Slutz] • Generation of huge tables with given statistical properties [Grey et al]
Summary • Issues • Framework • Prototype
Future Work • Refinement based on feedback from DB application developers / testers • Other DB state generation heuristics • boundary values • “missing” constraints • difficult SQL features • Interplay between DB state and user inputs • Checking DB state after test execution • Checking application outputs