Inferring the type systems under data processing programs
Motivation • Data processing programs • Retrieving runtime system status, recorded information, … • On specific APIs • Type systems (structure?) of data sources are necessary for inspecting and developing programs • What kinds of data, relations, how to invoke the API • Not easy to establish the type systems • Generic APIs do not reflect the data types • Sufficient and accurate documents are not always available • Reading source code is not always practical
This work • “Systematically inferring the type systems of data sources, through static analysis of data processing programs” • For inspection: detecting problems related to data usage • For programming: sample code snippets for retrieving a specific type of data • Basic idea • Recover the entire data flow of the program • Clarify the different ways of invoking the API to retrieve data • Challenges • Large scale and complex structure of source code • Complex data-retrieval logic
Data processing programs • A simple example • Retrieving memory information of JEE server • Through JMX API
Inferring the type system • Figure labels: getAttribute_Verbose, Memory, Verbose • See what data the program gets, and what other data it uses when getting them
Challenging for practical programs • Complex data flow • One instruction to retrieve different kinds of data
Source Code Analysis • Purpose: recover the data flow from source code • Source code abstraction • Object- and call-site-sensitive points-to analysis • About points-to analysis • A heap H storing all objects allocated in the source code • A points-to mapping Pt showing what objects a variable may point to • Extension to typical points-to analysis • Tracing the API invocation results: newly obtained objects • Depends on constant values: pre-calculation on constants
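The heap H, the mapping Pt, and the extension for API results can be illustrated with a minimal sketch. It is a flow- and context-insensitive simplification, not the object- and call-site-sensitive analysis the slides describe, and the statement forms (`new`, `const`, `copy`, `api`) and all variable names are assumptions made up for this example.

```python
from collections import defaultdict

def points_to(statements):
    """Toy points-to analysis over a hypothetical four-statement language.

    ("new",   var, site)            var = new object allocated at `site`
    ("const", var, value)           var = constant value
    ("copy",  dst, src)             dst = src
    ("api",   dst, recv, arg_var)   dst = recv.get(arg_var)  (data-retrieving call)
    """
    heap = set()                 # H: all abstract objects
    pts = defaultdict(set)       # Pt: variable -> objects it may point to
    consts = defaultdict(set)    # pre-calculated constants per variable

    def size():
        return (len(heap),
                sum(len(s) for s in pts.values()),
                sum(len(s) for s in consts.values()))

    while True:                  # iterate until nothing new is discovered
        before = size()
        for stmt in statements:
            kind = stmt[0]
            if kind == "new":
                obj = ("alloc", stmt[2])
                heap.add(obj)
                pts[stmt[1]].add(obj)
            elif kind == "const":
                consts[stmt[1]].add(stmt[2])
            elif kind == "copy":
                pts[stmt[1]] |= pts[stmt[2]]
                consts[stmt[1]] |= consts[stmt[2]]
            elif kind == "api":
                _, dst, recv, arg = stmt
                # Extension: an API result is a newly obtained object,
                # distinguished by the receiver and the constant argument.
                for r in list(pts[recv]):
                    for c in list(consts[arg]):
                        obj = ("api", r, c)
                        heap.add(obj)
                        pts[dst].add(obj)
        if size() == before:
            return heap, dict(pts)

# One call instruction that may retrieve two kinds of data.
program = [
    ("new", "server", "conn@1"),
    ("const", "k1", "HeapMemoryUsage"),
    ("const", "k2", "Verbose"),
    ("copy", "key", "k1"),
    ("copy", "key", "k2"),
    ("api", "result", "server", "key"),
]
heap, pts = points_to(program)
```

Here `result` may point to two distinct API-result objects, one per constant that can reach `key`, which is the "one instruction to retrieve different kinds of data" situation.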
Data type inference • Raw inference • A new calculus to clarify API invocations • Construct classes and associations accordingly • Code snippet slicing • Backward slicing along the data flow • Meta-model refinement • Remove redundant, duplicated elements • Meta-model decoration • Names, multiplicity
Raw meta-modeling • Points-to analysis result • For get, cds, s, and the anonymous return value • Calculate the source of [expression lost in extraction] • Two classes for [symbol lost] from the two clauses, and associations from the classes of [symbol lost] to the two classes
Code snippet slicing • For an association from • Source and auxiliary variables from the clause • Backtracking the invocations
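Backtracking from a retrieved value to the statements that produce its inputs can be sketched as a plain backward slice. The sketch assumes straight-line code and a hand-written (defined variable, used variables, text) encoding; both the encoding and the pseudo-statements are hypothetical, and the slides' own slicing works on the recovered data flow rather than on this simplified form.

```python
def backward_slice(statements, target):
    """Return indices of the statements that `target` transitively depends on.

    Each statement is (defined_var, used_vars, text). Straight-line code only:
    no branches, loops, or aliasing are handled.
    """
    needed = {target}
    kept = []
    for i in range(len(statements) - 1, -1, -1):   # walk backwards
        defined, used, _ = statements[i]
        if defined in needed:
            kept.append(i)
            needed.discard(defined)   # this definition satisfies the need
            needed.update(used)       # now its own inputs are needed
    return sorted(kept)

# Hypothetical program: only some statements feed the retrieved value.
snippet = [
    ("conn",  [],                       "conn = connect()"),
    ("log",   [],                       "log = open_log()"),
    ("name",  [],                       'name = "java.lang:type=Memory"'),
    ("attr",  [],                       'attr = "Verbose"'),
    ("count", ["log"],                  "count = log.size()"),
    ("value", ["conn", "name", "attr"], "value = conn.getAttribute(name, attr)"),
]
slice_indices = backward_slice(snippet, "value")
sample_code = [snippet[i][2] for i in slice_indices]
```

The statements on `log` and `count` drop out, leaving a four-line snippet that shows how to obtain `value`.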
Refinement and decoration • Refinement • Rewriting rules • Decoration • Empirical naming principles
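The slides do not list the rewriting rules, so the following is only an illustration of what one such rule could look like: classes with identical attribute sets are merged and associations are redirected to the surviving class. The rule, the dictionary-based meta-model encoding, and the class names are all assumptions.

```python
def merge_duplicate_classes(classes, associations):
    """Apply one assumed rewriting rule: merge classes with equal attribute sets.

    classes:      {class name: set of attribute names}
    associations: set of (source class, label, target class)
    """
    representative = {}   # attribute signature -> surviving class name
    rename = {}           # every class name -> surviving class name
    for name in sorted(classes):
        signature = frozenset(classes[name])
        rename[name] = representative.setdefault(signature, name)

    merged_classes = {rep: set(classes[rep]) for rep in set(rename.values())}
    # Redirect associations; the set removes the duplicates this creates.
    merged_assocs = {(rename[s], label, rename[d]) for s, label, d in associations}
    return merged_classes, merged_assocs

raw_classes = {
    "Server":   {"host"},
    "Memory_1": {"Verbose", "HeapMemoryUsage"},
    "Memory_2": {"Verbose", "HeapMemoryUsage"},
}
raw_assocs = {
    ("Server", "getAttribute", "Memory_1"),
    ("Server", "getAttribute", "Memory_2"),
}
refined_classes, refined_assocs = merge_duplicate_classes(raw_classes, raw_assocs)
```

A real refinement step would need more careful criteria, since this rule would also merge unrelated classes that happen to share an attribute set (for example, classes with no attributes at all).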
Implementation • Points-to analysis: Extend WALA • Inference: Implement the algorithms on • WALA • EMF
Experiments • To evaluate the following three aspects • Applicable to practical data sources and programs • Useful for inspecting existing programs • Useful for writing new programs • Three experiments • Inference test on typical data sources and open source programs • Result investigation, finding problems in the programs • User study, comparing programming efficiency with and without the inferred type system
Inspection with type systems • Informal but interesting findings for the selected programs • Version incompatibility • Two programs on JOnAS, CarteBlanche and jonasAdmin (4.7) • A “DeploymentPlan” type in CarteBlanche but not in jonasAdmin • Conjecture: DeploymentPlan is a feature of a later version, and CarteBlanche is not compliant with JOnAS 4.7 • Confirmed by their documents • Incomplete support • JabRef sub-function to import from the MS Bib reference source • 76 out of 77 XML elements supported, all except “RefOrder” • Indicating potential improvement • Conclusion: Assists developers in detecting wrong or insufficient use of a data source
User study • Four data sources (Exists, JOnAS, Flickr, GeoRSS) • 12 problems about retrieving data • Q1: get the ID of a query under processing • Q2: get the ID of a running job • 6 volunteers: 3 graduate students, 1 undergraduate, 2 engineers • Experiment result • Programming efficiency (time spent) • Programming processes
Findings • Process • Without type systems • Most chose to search the sample clients • Hard to find the proper keyword • Some chose to use the XML schema, but were stuck for a while when writing code • Sometimes missed the relation between problems • With type systems • Read the meta-model intuitively, choose the element, and move on • Result • Programming efficiency improved • Significant for related problems • Significant for non-expert developers
Related work • API programming assistance • Restraint: summarize and detect “bad smells” • Guidance: not formal or precise, but shows potential ways • Ours is a guidance approach, but for the data, not the API itself • Data type inference • Inferring data types from text and XML • Ours infers not from the data themselves but from the programs using them, so there is no need for a huge sample set • Points-to analysis: a new usage and a corresponding extension • Def-use analysis: not just “uses”, but the compositions of “uses” that form sufficient and independent invocations
Conclusion • A novel approach to inferring the type systems of data sources under data processing programs • Uses and extends points-to analysis • A new calculus to clarify different API usages • Experiments show that this approach • Applies to practical data sources and programs • Assists program inspection • Assists writing new data processing programs • Future work • Accuracy improvement • More experiments on different APIs