Code Recognition & CL Modeling through AST
Xingzhong Xu, Hong Man
Outline
• Introduction of AST in SSP
• AST for Code Recognition
• AST for Cognitive Linguistic Modeling
• Summary and Future Work
Semantic Signal Processing Stevens
Introduction of AST in SSP
• Most language applications use an Abstract Syntax Tree (AST) as an Intermediate Representation (IR) to help the computer semantically understand code in the programming domain.*
• Signal processing code:
    for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
  • How can we analyze it semantically?
  • How can we model it semantically?
*Terence Parr, The Definitive ANTLR Reference: Building Domain-Specific Languages, Pragmatic Programmers, 2007.
**ANTLR
Code Recognition
• To perform code re-hosting and other semantic code analysis, we first need to recognize the functionality of each code segment.
• In computer science, there are two approaches to code recognition:
  • AST-based recognition [Gabel, 2008] [Roy, 2009]
    • Generate the AST
    • Run a tree matcher
  • Random-test-based recognition [Jiang, 2009] [Bertran, 2005]
    • Segment the code
    • Test the I/O behavior
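The random-test approach can be illustrated with a minimal sketch: two candidate segments are treated as black boxes and compared on randomly generated inputs. The `Segment` signature, the three example functions, and the trial count are all assumptions made for this illustration; they are not taken from the cited papers or from the demo.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// A code segment modeled as a black box from two input arrays to one output.
// This signature is an assumption made for the sketch.
using Segment =
    std::function<long long(const std::vector<int>&, const std::vector<int>&)>;

// Hypothetical segment A: plain multiply-accumulate loop.
long long mac_loop(const std::vector<int>& taps, const std::vector<int>& input) {
    long long acc = 0;
    for (std::size_t i = 0; i < taps.size(); i++)
        acc += static_cast<long long>(taps[i]) * input[i];
    return acc;
}

// Hypothetical segment B: same computation written with two accumulators.
long long mac_loop_split(const std::vector<int>& taps, const std::vector<int>& input) {
    long long acc0 = 0, acc1 = 0;
    std::size_t i = 0;
    for (; i + 1 < taps.size(); i += 2) {
        acc0 += static_cast<long long>(taps[i]) * input[i];
        acc1 += static_cast<long long>(taps[i + 1]) * input[i + 1];
    }
    if (i < taps.size())  // leftover element when the length is odd
        acc0 += static_cast<long long>(taps[i]) * input[i];
    return acc0 + acc1;
}

// Hypothetical segment C: a loop that adds instead of multiplying.
long long sum_only(const std::vector<int>& taps, const std::vector<int>& input) {
    long long acc = 0;
    for (std::size_t i = 0; i < taps.size(); i++)
        acc += taps[i] + input[i];
    return acc;
}

// Returns true if both segments agree on every randomly generated input.
// Agreement is evidence of equivalence, not a proof.
bool same_io_behavior(const Segment& a, const Segment& b,
                      int trials = 200, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> value(-100, 100);
    std::uniform_int_distribution<int> length(1, 32);
    for (int t = 0; t < trials; t++) {
        const std::size_t n = static_cast<std::size_t>(length(rng));
        std::vector<int> taps(n), input(n);
        for (std::size_t i = 0; i < n; i++) {
            taps[i] = value(rng);
            input[i] = value(rng);
        }
        if (a(taps, input) != b(taps, input)) return false;
    }
    return true;
}
```

Integer inputs are used so that reordering the additions cannot change the result; with floating-point data a tolerance would be needed instead of exact equality.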
Code Recognition
• The AST represents the source code in the programming domain.
• Radio and computational primitives have characteristic features in the AST.
• Filter ≈ LOOP + ACCUMULATION + MULTIPLY
    for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
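The pattern "LOOP + ACCUMULATION + MULTIPLY" can be sketched as a small tree matcher. The `Node` type and the labels (`FOR`, `BLOCK`, `INDEX`, `+=`, `*`) are hypothetical stand-ins; an ANTLR-generated tree would carry its own token types, and the demo's actual matcher is not shown in the slides.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal AST node: a token label plus children (hypothetical layout).
struct Node {
    std::string label;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(const std::string& label, std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(Node{label, std::move(children)});
}

// True if the subtree rooted at n contains a node with the given label.
bool contains(const NodePtr& n, const std::string& label) {
    if (!n) return false;
    if (n->label == label) return true;
    for (const auto& c : n->children)
        if (contains(c, label)) return true;
    return false;
}

// ACCUMULATION + MULTIPLY: a "+=" whose right-hand side contains a "*".
bool is_mac(const NodePtr& n) {
    return n && n->label == "+=" && n->children.size() == 2 &&
           contains(n->children[1], "*");
}

// True if any node in the subtree is a multiply-accumulate.
bool contains_mac(const NodePtr& n) {
    if (!n) return false;
    if (is_mac(n)) return true;
    for (const auto& c : n->children)
        if (contains_mac(c)) return true;
    return false;
}

// LOOP: collect every FOR node whose body (assumed to be its last child)
// holds a multiply-accumulate.
void find_filters(const NodePtr& n, std::vector<NodePtr>& out) {
    if (!n) return;
    if (n->label == "FOR" && !n->children.empty() &&
        contains_mac(n->children.back()))
        out.push_back(n);
    for (const auto& c : n->children) find_filters(c, out);
}

// AST for: for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
NodePtr example_filter_loop() {
    auto index = [](const std::string& array) {
        return make("INDEX", {make(array), make("i")});
    };
    return make("FOR", {
        make("=", {make("i"), make("0")}),
        make("<", {make("i"), make("n")}),
        make("++", {make("i")}),
        make("BLOCK", {
            make("+=", {make("acc0"),
                        make("*", {index("d_taps"), index("input")})})
        })
    });
}
```

Because the matcher only checks structure, an outer loop that encloses a matching inner loop is reported as well; a stricter matcher would need extra rules to exclude such cases.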
Code Recognition Result
• To test the idea, I designed a code recognition demo (not fully debugged).
• Source: GNU Radio 3.2.2 (C++)
• Objective: Recognize and print the filter code.
• Platform: Ubuntu 10.04 + Java SE 1.6 + ANTLR 3.2
• Process:
  • Generate an AST for each C++ file.
  • Match the filter sub-tree pattern.
  • Print the matched code segment.
Code Recognition Result
• Result:
  • 932 C++ source files in total in GNU Radio.
  • 689 files successfully analyzed (to be continued).
  • 59 filter patterns found.

    for (i = 0; i < n; i += N_UNROLL){
      acc0 += d_taps[i + 0] * input[i + 0];
      acc1 += d_taps[i + 1] * input[i + 1];
      acc2 += d_taps[i + 2] * input[i + 2];
      acc3 += d_taps[i + 3] * input[i + 3];
    }

    for (int j = 0; j < d_len; j++) {
      if (j != 0)
        d_pn = 2.0*d_reference->next_bit()-1.0;
      sum += *in++ * d_pn;
    }

    for (i=0; i < d_ff_taps.size(); i++)
      acc += conj(d_ff_delayline[(i+d_ff_index) & ff_mask]) * d_ff_taps[i];
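To make clear why the unrolled loop above still counts as a filter, here is a self-contained sketch that wraps it in a function and compares it with the plain loop. The function names, the use of `double`, the final `acc0 + acc1 + acc2 + acc3` sum, and the requirement that the tap count be a multiple of `N_UNROLL` are assumptions for this illustration; the slide shows only the loop body.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N_UNROLL = 4;

// Plain multiply-accumulate loop over all taps.
double fir_plain(const std::vector<double>& d_taps,
                 const std::vector<double>& input) {
    double acc0 = 0.0;
    for (std::size_t i = 0; i < d_taps.size(); i++)
        acc0 += d_taps[i] * input[i];
    return acc0;
}

// 4-way unrolled variant. Assumes d_taps.size() is a multiple of N_UNROLL
// and that input holds at least d_taps.size() samples.
double fir_unrolled(const std::vector<double>& d_taps,
                    const std::vector<double>& input) {
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    const std::size_t n = d_taps.size();
    for (std::size_t i = 0; i < n; i += N_UNROLL) {
        acc0 += d_taps[i + 0] * input[i + 0];
        acc1 += d_taps[i + 1] * input[i + 1];
        acc2 += d_taps[i + 2] * input[i + 2];
        acc3 += d_taps[i + 3] * input[i + 3];
    }
    // Combining the partial sums is assumed; it is not shown on the slide.
    return acc0 + acc1 + acc2 + acc3;
}
```

Both variants share the same LOOP + ACCUMULATION + MULTIPLY structure; the unrolled one simply repeats the accumulation statement inside one loop body.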
CL Modeling
• Intermediate representation:
  • AST (programming domain)
  • CL Modeling (signal processing domain)
    k = N - i;
CL Modeling
• Rewrite and map the structure and tokens from the AST to the CL Modeling Tree.
    k = N - i;
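A minimal sketch of such a rewrite for `k = N - i;` follows. The slides do not give the CL Modeling vocabulary, so the target names (`Assign`, `Subtract`, `Add`, `Multiply`, `Variable`) and the `Tree` type are hypothetical; only the idea of mapping tokens while preserving structure comes from the slide.

```cpp
#include <map>
#include <string>
#include <vector>

// Generic labeled tree used for both the AST and the rewritten tree.
struct Tree {
    std::string label;
    std::vector<Tree> children;
};

// Hypothetical mapping from AST operator tokens to CL concept names.
const std::map<std::string, std::string>& token_map() {
    static const std::map<std::string, std::string> m = {
        {"=", "Assign"}, {"-", "Subtract"}, {"+", "Add"}, {"*", "Multiply"}};
    return m;
}

// Rewrite an AST into a CL-style tree: operator tokens are renamed through
// the map, leaves are wrapped in a "Variable" node, and the shape of the
// tree is otherwise preserved. Treating every leaf as a variable is a
// simplification; literals would need their own node kind.
Tree rewrite(const Tree& ast) {
    if (ast.children.empty())
        return Tree{"Variable", {Tree{ast.label, {}}}};
    const auto it = token_map().find(ast.label);
    Tree out{it != token_map().end() ? it->second : ast.label, {}};
    for (const auto& child : ast.children)
        out.children.push_back(rewrite(child));
    return out;
}

// AST for: k = N - i;
Tree example_ast() {
    return Tree{"=", {Tree{"k", {}},
                      Tree{"-", {Tree{"N", {}}, Tree{"i", {}}}}}};
}
```

Calling `rewrite(example_ast())` yields an `Assign` node whose children are a `Variable` wrapping `k` and a `Subtract` node over the variables `N` and `i`.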
CL Modeling Result
• To test our idea, I designed a CL Modeling demo based on the AST.*
• A tree rewriter translates and modifies the current AST into a CL Modeling Tree.
• Based on the CL Modeling Tree, the demo prints the CL Modeling XML file.
https://sites.google.com/site/stevensxingzhong/home/clmb
*Terence Parr, Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, Pragmatic Programmers, 2010.
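The last step, printing a tree as XML, might look like the sketch below. The element and attribute names are invented for illustration and do not reflect the actual CL Modeling XML schema, which the slides do not show; the sketch also assumes that each `kind` is a valid XML element name.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Node of a CL-style tree (hypothetical layout).
struct ClNode {
    std::string kind;              // element name, e.g. "Assign"
    std::string name;              // optional identifier, empty if unused
    std::vector<ClNode> children;
};

// Escape the characters that are special inside XML attribute values.
std::string xml_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Write one node and its subtree, indenting two spaces per level.
void write_xml(const ClNode& n, std::ostream& os, int depth = 0) {
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    os << indent << '<' << n.kind;
    if (!n.name.empty()) os << " name=\"" << xml_escape(n.name) << '"';
    if (n.children.empty()) {
        os << "/>\n";  // leaf: self-closing element
        return;
    }
    os << ">\n";
    for (const auto& c : n.children) write_xml(c, os, depth + 1);
    os << indent << "</" << n.kind << ">\n";
}

// Convenience wrapper that returns the XML text as a string.
std::string to_xml(const ClNode& root) {
    std::ostringstream os;
    write_xml(root, os);
    return os.str();
}
```

For the `k = N - i;` example, a root `Assign` node with a `Variable` child named `k` and a `Subtract` child over `N` and `i` is written as nested elements, one per line.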
Summary & Future Work
• The programming-domain AST is a key interface for language applications. In the SSP project it supports:
  • Code Recognition: determine the functionality of a code segment.
  • Cognitive Linguistic Modeling: an intermediate form for modeling the radio code.
• Future Work:
  • Cover more code: C++, Matlab, VHDL, etc.
  • Discover more computational and radio primitives.
  • Fully support CL Modeling.
References
• Jiang, L., and Su, Z. 2009. Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.
• Gabel, M., Jiang, L., and Su, Z. 2008. Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software Engineering.
• Roy, C. K., Cordy, J. R., and Koschke, R. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming.
• Bertran, M., Babot, F., and Climent, A. 2005. An input/output semantics for distributed program equivalence reasoning. Electron. Notes Theor. Comput. Sci. 137, 1 (Jul. 2005).
• Parr, T. 2007. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers.
• Parr, T. 2010. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages. Pragmatic Programmers.