If you fix everything, you lose fixes for everything else
Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)
tim@menzies.us, oelrawas@mix.wvu.edu
International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007
This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.
What does this mean?
• A supposedly NP-hard task: abduction over first-order theories
• nogood/2
• Q: for what models does (a few peeks) = (many hard stares)?
A: models with “collars” (Feather & Menzies, RE’02)
• Grow
  • Monte Carlo a model, picking input settings at random
  • For each run: score each output, then add the score to each input setting
• Harvest
  • Rule generation experiments, favoring settings with better scores
• If “collars”, then small rules, learned quickly, will suffice
• “Collar” variables set the other variables
  • Narrows: Amarel in the 60s
  • Minimal environments: DeKleer ’85
  • Master variables: Crawford & Baker ’94
  • Feature subset selection: Kohavi & John ’97
  • Back doors: Williams et al ’03
  • Etc.
• Implications for uncertainty?
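The grow/harvest loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_model`, the three-valued `ranges`, and the use of a mean (rather than a raw sum) per setting are hypothetical choices, not the authors' implementation. Lower scores are assumed to be better.

```python
import random
from collections import defaultdict

def grow(model, ranges, runs=1000, seed=1):
    """Monte Carlo the model; credit each run's score to every setting it used."""
    rng = random.Random(seed)
    total = defaultdict(float)  # (variable, value) -> summed score
    count = defaultdict(int)    # (variable, value) -> number of runs using it
    for _ in range(runs):
        settings = {var: rng.choice(vals) for var, vals in ranges.items()}
        score = model(settings)
        for item in settings.items():
            total[item] += score
            count[item] += 1
    # Mean score per setting, so rarely sampled settings are not favored.
    return {item: total[item] / count[item] for item in total}

def harvest(mean_score, top=3):
    """Return the `top` settings with the best (here: lowest) mean score."""
    return sorted(mean_score, key=mean_score.get)[:top]

def toy_model(s):
    # Hypothetical model in which "a" acts as a collar: it dominates the output.
    return 10 * s["a"] + 0.1 * s["b"] + 0.1 * s["c"]

ranges = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [1, 2, 3]}
best = harvest(grow(toy_model, ranges))
print(best[0])
```

In this toy, the best-ranked setting is a value of the collar variable `a`; the settings of `b` and `c` barely separate from one another.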
For example, STAR: collars + simulated annealing on Boehm’s USC software process models
• USC software process models for effort, defects, threats
• y[i] = impact[i] * project[i] + b[i] for i ∈ {1,2,3,…}
  • project[i] (controllable) lies within a range: uncertainty in project description
  • impact[i] (uncontrollable) lies within a range: uncertainty in model calibration
• Random solution: pick project[i] and impact[i] from anywhere within their ranges
  • project[i] range set via domain knowledge; e.g. process maturity in 3 to 5
  • impact[i] range known from history
• Score solution by effort (Ef), defects (De) and threat (Th)
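A minimal sketch of drawing one random solution from the linear form y[i] = impact[i] * project[i] + b[i]. The numeric ranges, the `b` values, and the choice of a uniform distribution are assumptions made for illustration; the slide only says that values are picked from within their ranges.

```python
import random

def random_solution(project_ranges, impact_ranges, b, rng):
    """Pick project[i] and impact[i] uniformly within their ranges, then
    compute y[i] = impact[i] * project[i] + b[i]."""
    project = [rng.uniform(lo, hi) for lo, hi in project_ranges]
    impact = [rng.uniform(lo, hi) for lo, hi in impact_ranges]
    y = [m * x + c for m, x, c in zip(impact, project, b)]
    return project, impact, y

# Hypothetical ranges: the first project attribute mimics
# "process maturity in 3 to 5"; all other numbers are made up.
project_ranges = [(3.0, 5.0), (1.0, 2.0)]
impact_ranges = [(0.5, 1.5), (-1.0, 0.0)]
b = [0.0, 1.0]

rng = random.Random(1)
project, impact, y = random_solution(project_ranges, impact_ranges, b, rng)
```

Scoring the solution (Ef, De, Th) is left out, since the slide does not give those formulas.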
Two studies
• y[i] = impact[i] * project[i] + b[i]
• One: certain methods
  • Using much historical data, learn the magnitude of the impact[i] relationship
  • With fixed impact[i], Monte Carlo at random across the project[i] settings
  • E.g. regression-based tools that learn impact[i] from historical records (93 records of JPL systems)
    • SCAT: JPL’s current methods
    • 2CEE: WVU’s improvement over SCAT (currently under test)
  • Tame uncontrollables via historical records
• Two: methods with more uncertainty
  • Using no historical data
  • Monte Carlo at random across the project[i] settings and impact[i] settings
  • E.g. STAR
    • Monte Carlo a model
    • Score each output
    • Sort settings by their “C”, where “C” = cumulative score
    • Rule generation experiments, favoring settings with better “C”
Inside STAR
• 1. Sampling: simulated annealing
  • for each setting in x { value[setting] += E }
• 2. Summarizing: post-processor
  • Sort all settings by their value
  • Ignore uncontrollables impact[i]
  • Assume the top i (1 ≤ i ≤ max) project[i] settings
  • Randomly select the rest
  • Median = 50th percentile; spread = (75th - 50th) percentile
  • “Policy point”: smallest i with lowest E
• [Figure: ideas ranked from good to bad; 22 good ideas, 38 not-so-good ideas]
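The summarizing step can be sketched as follows. This is a hedged illustration, not STAR itself: `energy`, `ranked`, and `ranges` are hypothetical, the percentile is a simple index-based estimate, and "smallest i with lowest E" is read here as the smallest i whose median E equals the lowest median seen.

```python
import random

def percentile(xs, p):
    """Simple index-based percentile of a non-empty list."""
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

def summarize(energy, ranked, ranges, samples=100, seed=1):
    """For i = 1..len(ranked): fix the top-i ranked settings, pick the rest
    at random, and record (i, median E, spread of E)."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, len(ranked) + 1):
        fixed = dict(ranked[:i])
        es = []
        for _ in range(samples):
            s = {var: rng.choice(vals) for var, vals in ranges.items()}
            s.update(fixed)  # the assumed top-i settings override random picks
            es.append(energy(s))
        median = percentile(es, 50)
        rows.append((i, median, percentile(es, 75) - median))
    return rows

def policy_point(rows):
    """Smallest i whose median E equals the lowest median E."""
    lowest = min(median for _, median, _ in rows)
    return min(i for i, median, _ in rows if median == lowest)

# Hypothetical energy function and ranking (best setting first); lower E is better.
def energy(s):
    return 100 * s["a"] + 10 * s["b"] + s["c"]

ranges = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [1, 2, 3]}
ranked = [("a", 1), ("b", 1), ("c", 1)]
rows = summarize(energy, ranked, ranges)
point = policy_point(rows)
```

Once every ranked setting is fixed, the toy energy is constant, so the last row has zero spread.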
SCAT vs 2CEE vs STAR
• [Figure labels: project[i]; “Control impact[i] via historical data”; “Stagger around superset of possible impact[i]”]
• Median: 50% point; spread: (75 - 50)%
• STAR/2CEE = 400/1600 = 25%; STAR/SCAT = 400/1900 = 21%
• STAR/2CEE = 30/620 = 5%; STAR/SCAT = 30/730 = 4%
• STAR/2CEE = 180/400 = 45%; STAR/SCAT = 180/1900 = 60%
• STAR/2CEE = 50/800 = 6%; STAR/SCAT = 50/1300 = 4%
• Ignoring historical data is useful (!!!?)
• If you fix everything, you lose fixes for everything else
Luke, trust the force, I mean, collars
• “The Strangest Thing About Software”, IEEE Computer, Jan 2007
Related work
• Feather, DDP, treatment learning
  • Optimization of requirement models
• Xerox PARC, 1980s, qualitative representations (QR)
  • Not overly specific; quickly collected in a new domain
  • Used for model diagnosis and repair
  • Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors
• Abduction
  • World W = minimal set of assumptions A (w.r.t. size) such that T ∪ A ⇒ G and not (T ∪ A ⇒ error)
  • Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
  • Theoretically slow (NP-hard), but this should be practical:
    • Abduction + stochastic sampling
    • Find collars
    • Learn constraints on collars
Possible optimizations (not used here)
• STAR, an example of a general process:
  • Stochastic sampling
  • Sort settings by “value”
  • Rule generation experiments favoring highly “value”-ed settings
  • See also: elite sampling in the cross-entropy method
• If SA convergence too slow
  • Try moving back select into the SA
  • Constrain solution mutation to prefer highly “value”-ed settings
• BORE (best or rest)
  • n runs
  • Best = top 10% of scores; rest = remaining 90%
  • {a, b} = frequency of discretized range in {best, rest}
  • Sort settings by -1 * (a/n)^2 / (a/n + b/n)
• Other valuable tricks:
  • Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
  • Limited discrepancy search: Harvey & Ginsberg
  • Treatment learning: Menzies & Yu
• Ask me why, off-line
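The BORE ranking above can be sketched directly from the slide's formula. The run data below is made up, ranges are assumed to be already discretized, and "top 10%" is assumed to mean the lowest scores (as for effort or defects); none of this is taken from the authors' code.

```python
from collections import Counter

def bore(runs, best_fraction=0.1):
    """Rank settings by (a/n)^2 / (a/n + b/n), highest first, where a and b are
    a setting's frequencies in the best and rest runs and n is the run count.
    `runs` is a list of (score, settings) pairs; lower score is better."""
    n = len(runs)
    ordered = sorted(runs, key=lambda run: run[0])
    cut = max(1, int(n * best_fraction))
    a, b = Counter(), Counter()
    for _, settings in ordered[:cut]:   # best: top 10% of scores
        a.update(settings.items())
    for _, settings in ordered[cut:]:   # rest: remaining 90%
        b.update(settings.items())

    def value(item):
        pa, pb = a[item] / n, b[item] / n
        return pa ** 2 / (pa + pb)

    items = set(a) | set(b)
    # Sorting by -value matches the slide's "-1 * ..." ordering; str() breaks ties.
    return sorted(items, key=lambda item: (-value(item), str(item)))

# Hypothetical runs: the score is dominated by variable "a" (a collar).
runs = [(10 * a + b, {"a": a, "b": b}) for a in range(1, 11) for b in range(10)]
ranked = bore(runs)
```

Here all ten best runs share one setting of `a`, so that setting outranks every setting of `b`, each of which appears only once in the best set.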
At the “policy point”, STAR’s random solutions are surprisingly accurate
• LC: learn impact[i] via regression (JPL data)
• STAR: no tuning, randomly pick impact[i]
• Diff = ∑ mre(lc) / ∑ mre(star)
• Mre = abs(predicted - actual) / actual
• “same” / “diff”: same or different at {95, 99}% confidence (MWU)
• [Table: per-case results: diff, diff, diff, diff, same, diff, same, same, same, same]
• Why so little Diff (median = 75%)?
  • Most influential inputs tightly constrained
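The two formulas above amount to the following arithmetic. The prediction and actual values are invented purely to show the calculation; they are not results from the study.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of LC's MREs divided by sum of STAR's MREs."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

# Made-up numbers for illustration only.
actuals = [100, 200, 400]
lc_predictions = [110, 180, 400]    # MREs: 0.1, 0.1, 0.0
star_predictions = [120, 160, 400]  # MREs: 0.2, 0.2, 0.0
ratio = diff(lc_predictions, star_predictions, actuals)
```

A Diff below 100% means LC's summed error is smaller than STAR's; with these invented numbers the ratio is 0.2 / 0.4.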
• In many models, a few “collar” variables set the other variables
  • Narrows (Amarel in the 60s)
  • Minimal environments (DeKleer ’85)
  • Master variables (Crawford & Baker ’94)
  • Feature subset selection (Kohavi & John ’97)
  • Back doors (Williams et al ’03)
  • See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)
• Collars appear in all execution traces (by definition)
  • You don’t have to find the collars, they’ll find you
• So, to handle uncertainty
  • Write a simulator
  • Stagger over uncertainties
  • From stagger, find collars
  • Constrain collars
• (Model uncertainty = collars) << inputs
• This talk: a very simple example of this process
Comparisons
• Standard software process modeling
  • Models written more than run (PROSIM community)
  • Limited sensitivity analysis
  • Limited trade space
  • Or, expensive, error-prone, incomplete data collection programs
  • Point solutions
• Here:
  • No data collection
  • Found stable conclusions within a space of possibilities
  • Search: very simple
  • Solution, not brittle
  • With trade-off space
• [Figure: 22 good ideas, sorted]
Summary
• Living with uncertainty
  • Sometimes, simpler than you may think
  • More useful than you might think
• Simple:
  • Here, the smallest change to simulated annealing
• Useful:
  • Sometimes uncertainty can teach you more than certainty
  • If you fix everything, you lose fixes to everything else
• Collars control certainty
  • Uncertainty plus constrained collars → more certainty
  • Also, can drive model to better performance
• An example you can explain to any business user
• [Figure: 22 good ideas, sorted from good to bad]