340 likes | 507 Views
YACC no more. Integrating parsers, interpreters and compilers into your application. Sriram Srinivasan (“Ram”). This is he. Sriram Srinivasan One of the core engineers of the WebLogic app server Wrote the first commercially available EJB implementation Wrote the TP engine in the WLS
E N D
YACC no more Integrating parsers, interpreters and compilers into your application Sriram Srinivasan (“Ram”) Session # 2221
This is he • Sriram Srinivasan • One of the core engineers of the WebLogic app server • Wrote the first commercially available EJB implementation • Wrote the TP engine in the WLS • Author: “Advanced Perl Programming” (O’reilly) Session # 2221 Beginning
Why this talk? • Quest for higher level programming patterns • More productive / faster / maintainable etc… • Integrating compilers, parsers, interpreters into your application Session # 2221 Beginning
Embeddable Parsers Case Study: Configuration Data • JDK parsers for configuration data • java.util.Properties, XML, regex library • java.util.Properties • Limited to “property = value” format • Takes care of comments, multi-line values, quotes #app server properties connectionPoolName = testPool numThreads = 10 … p = new Properties().load(inputStream) Session # 2221 Middle
XML parsers • Good for structured, hierarchical data • DOM (Document Object Model) parser • Converts an entire XML document into a corresponding tree of Nodes. • SAX (Simple API for XML) • Callback class extends DefaultHandler • Supplies methods for startDocument(…), startElement(…), endElement(…) etc. Session # 2221 Middle
Adding code to data • Problem: We want to add add macros and expressions to our properties. numThreads = numProcessors # Ensure that connection pool is smaller than # thread pool. connectionPoolSize = min(numThreads – 2, 1) • This requires an expression evaluator Session # 2221 Middle
Embeddable interpreters • Plethora of free, high quality interpreters available • BeanShell (Java-like syntax) • Rhino (JavaScript) • Jython (Python in Java) • Kawa (Scheme in Java) • When embedded, flow of control easily goes from java to interpreter to back. • Command-line shell always included Session # 2221 Middle
BeanShell • Expressions identical to java • Types are inferred dynamically add( a, b ) { return a + b; } sum = add(1, 2); // 3 str = add("Web", "Logic"); // "WebLogic" Session # 2221 Middle
Embedding BeanShell • Instead of writing code to parse the properties file, just eval it! • Comments should be “// … ”, not “# … • Each property definition line should end in “;” import bsh.Interpreter; Interpreter i = new Interpreter(); i.set("foo", 5); i.eval("bar = foo*10"); System.out.println("bar = "+ i.get("bar")); i.eval(new FileReader("config.properties")); Integer n = i.get("connectionPoolSize"); Session # 2221 Middle
BeanShell features • Strict java expression syntax • no class declarations • Loose convenience syntax b = new java.awt.Button(); b.label = "Yo" // eqvt. to b.setLabel("Yo") h = new Hashtable(); h{"spud"} = "potato"; // Swing stuff b = new JButton("My Button"); f = new JFrame("My Frame"); f.getContentPane().add(b, "Center"); f.pack(); f.show(); Session # 2221 Middle
Rhino • Free ECMAScript interpreter from Mozilla • Slightly more cumbersome to embed than BeanShell • Contains bytecode compiler that can be called from within java • Closures • Regex support built-in. Good for text manipulation Session # 2221 Middle
Case study: Command pattern • Undo/Redo in an editor function insertCommand(text) { this.pos = buf.pos buf.insert(text) this.len = text.length this.undo = function () { buf.moveTo(this.pos) buf.erase(this.len); } undoStack.push(this); } new insertCommand("foo") undoStack.pop().undo() Session # 2221 Middle
Python • Python (Java implementation is "Jython") • powerful high-level language • Compiles to bytecode. • True scripting language • Can extend java classes • Static compilation and standalone execution Session # 2221 Middle
More case studies • Embedded expressions • Spreadsheet formulae • Customizable GUIs • Macro facility, keyboard mapping • Remote agents • Monitoring • Performance through partial evaluation Session # 2221 Middle
Case Study: Remote Agents • Example: Test Agents • Can upload script to each agent to launch processes, control them locally. • Jython is well-suited for this kind of task • Example: Scriptable IMAP mail server • "All messages that contain this regex, make a copy in this folder" Session # 2221 Middle
Case Study: Monitoring • SNMP model: Obtain attributes from each node over the network, do calculation • Alternatively, upload script to each node, and let it return the result • Conserves network bandwidth • Can insert any kind of probe • Study application data structures • Application-specific profiling Session # 2221 Middle
Case Study: Performance • Partial evaluation can yield substantial performance benefits • Object - RDBMS adaptors • Code generator studies class and db schema • Omits unnecessary conversions, null checks • Vector dot product dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; // But if 'a' is fixed {16,0,4} … dp = b[0] << 4 + b[2] << 2 Session # 2221 Middle
Generating java • Moving from embedded interpreters to generating java source • Example: JSP. • Convert template to java, compile and dynamically load • BEA/WebLogic's weblogic.dtdc • Converts XML DTD to a high performance SAX parser tuned to that DTD Session # 2221 Middle
Generating code with Doclets • javadoc is a general purpose parser javadoc –doclet ListClass foo.java • ListClass.start() called with a hierarchy of *Doc nodes import com.sun.javadoc.*; public class ListClass { public static boolean start(RootDoc root) { ClassDoc[] classes = root.classes(); for (int i = 0; i < classes.length; ++i) { System.out.println(classes[i]); } return true; } • Arbitrary tags can be introduced at any level Session # 2221 Middle
Case study: iContract • Pattern: doclet expressions converted to annotated java code /** * Ensure that argument is always > 0* @pre f >= 0.0** Ensure that the function produces the sqrt * within a * @post Math.abs((return * return) - f) < 0.001*/ public float sqrt(float f) { ... } Session # 2221 Middle
Case Study: EJBGen /** * @ejbgen:entity * ejb-name = AccountEJB-OneToMany * data-source-name = demoPool * table-name = Accounts */ abstract public class AccountBean implements EntityBean { /** * @ejbgen:cmp-field column = acct_id * @ejbgen:primkey-field * @ejbgen:remote-method transaction-attribute = Required */ abstract public String getAccountId(); Session # 2221 Middle
Generating bytecode • Example: WebLogic RMI adaptors • Sometimes, some facilities are available only in bytecode (goto's!) • Example: fast string matching • Given a search string, encode the state machine into bytecode • Worth it if the same pattern is going to be used many times • Virus scanners • Searching genome sequences Session # 2221 Middle
0 1 0 1 S0 S1 S2 S3 S4 S5 1 0 1 0 0 1 Example: String matching • Problem: match "10100" • Convert to a state machine • Each state encodes a succesful prefix match Session # 2221 Middle
String matching (contd.) • If only goto were allowed in java … • But, goto's are allowed in bytecode! try { //buf is the buffer to be searched int i = -1; s0: i++; if (buf[i] != '1') goto s0; s1: i++; if (buf[i] != '0') goto s1; s2: i++; if (buf[i] != '1') goto s0; s3: i++; if (buf[i] != '0') goto s1; s4: i++; if (buf[i] != '0') goto s3; s5: i++; return i-5; } catch (ArrayIndexOutOfBoundsException e) { return -1; } Session # 2221 Middle
String matching (contd.) • Using an assembler like jasmin iconst_m1 istore_1 S0: ;; i++; if a[i] != '1' goto S0; iinc 1 1 ; i++ aload_0 ; load a[i] iload_1 caload bipush 49 ; load '1' if_icmpne S0 ; if .. goto S0 S1: ;; i++; if a[i] != '0' goto S1 iinc 1 1 aload_0 iload_1 caload bipush 48 if_icmpne S1 Session # 2221 Middle
Custom languages • Craft a language that fits the context you are working in • Avoid XML ugliness: SRML (Simple Rule Markup) • Instead of "if s.purchaseAmount > 100 …" <simpleCondition className="ShoppingCart" objectVariable="s"> <binaryExp operator="gt"> <field name="purchaseAmount"/> <constant type="float" value="100"/> </binaryExp> </simpleCondition> Session # 2221 Middle
Antlr Introduction • Antlr: A recursive descent parser with configurable lookahead (LL(k) parser) • Much, much simpler than lex/yacc • Yacc error messages are cryptic, tough for non-CS types to understand • Even generated code easy to understand • Includes tree building and recognition • No such facility in yacc • Lexer, parser and tree recognizer phase have similar syntax Session # 2221 Middle
Antlr • Example: hierarchical property list • A list consists of name value pairs • Names are identifiers, values are numbers or lists ( a 200 b (c 10 d 20) ) Session # 2221 Middle
Antlr (contd.) class LispLexer extends Lexer; ID : ('a' .. 'z')+; NUM: ('0' .. '9')+; LP : '('; RP : ')'; class LispParser extends Parser; list : LP (nameValuePair)+ RP; nameValuePair : ID value ; value : NUM | list; Session # 2221 Middle
Antlr (contd.) • Adding code, arguments, return values nameValuePair returns [NVP ret=null] {Object v;} : t:ID v=value {ret = new NVP(t.getText(),v);} ; value returns [Object ret=null] : t:NUM {ret=t.getText();} | ret=list ; Session # 2221 Middle
Way out there … • Configurable hardware • New circuits on the fly • Intentional programming • Code not represented as a stream of characters Session # 2221 Middle
Summary • Run-time evaluation gives you a lot of power • Other languages add features (e.g. closures) to java • Lots of simple, free, quality parsers, interpreters • Produce custom java source or byte code for performance • Roll your own domain-specific language with ANTLR or javacc. • Yacc No More. Session # 2221 End
References • Doclets • Doclet tools: www.doclet.com • EJBGen: www.beust.com, Cedric Beust • Icontract: www.reliable-systems.com, Reto Kramer • Languages, interpreters • Beanshell: www.beanshell.org • Rhino: www.mozilla.org/rhino • Python: www.python.org, www.jython.org • ANTLR: www.antlr.org • More … flp.cs.tu-berlin.de/~tolk/vmlanguages.html • SRML: xml.coverpages.org/srml.html Session # 2221 End
References (contd.) • Bytecode manipulation: • Jasmin: mrl.nyu.edu/~meyer/jasmin/ • Jikes Bytecode toolkit: www.alphaworks.ibm.com/tech/jikesbt • BCEL: bcel.sourceforge.net • "Rapid" - Reconfigurable hardware • www.cs.washington.edu/research • "The death of computer languages, the birth of intentional programming", Charles Simonyi • research.microsoft.com/scripts/pubs/trpub.asp • Microsoft tech report MSR-TR-95-52 • Thinking in Patterns with Java, Bruce Eckel • www.mindview.net/Books/TIPatterns Session # 2221 End