Programming Language Abstractions for Semi-Structured Data
Martin Odersky, Sebastian Maneth, Burak Emir
EPFL
Martin Odersky, LAMP, EPFL
Scala and XML
• The project studies language constructs and implementation techniques for processing XML data in a general-purpose programming language.
• It is based on the recently released Scala programming language (scala.epfl.ch).
• Scala unifies functional and object-oriented programming.
• Both idioms have a lot to offer.
• New applications will require a combination of the two.
Example 1: Distributed programming and web services. Immutable data are essential for achieving robustness and efficiency of applications in the face of replication and partial failure.
Example 2: XML processing. Conciseness and safety are helped by
• pattern matching over trees
• regular expression patterns and types
• tree transformer combinators
(Design principle: fusion instead of agglutination.)
Design Aim
You should not have the impression that you are programming in either a functional or an object-oriented style.
Three examples of how this is achieved:
• Modules are objects.
• Pattern matching over class hierarchies.
• XML processing.
1. Modules are Objects
Traditional modules and objects have complementary strengths:
• Modules are good at abstraction, e.g. abstract types in signatures.
• Objects are good at composition, e.g. inheritance, recursion, and dynamic composability, because objects are first-class.
Idea: Identify
    Object = Module
    Interface = Signature
    Class = Functor
Consequence: Objects and interfaces need to contain type members. Furthermore, type members can be either abstract or concrete.
(Papers on this at FOOL 10 and ECOOP 2003.)
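The correspondence above can be sketched in present-day Scala. This is only an illustration under assumptions: the names Counter, IntCounter, Twice and countTo3 are hypothetical and do not come from the slides or the cited papers.

```scala
// Signature: an interface with an abstract type member.
trait Counter {
  type T                      // abstract type member; clients cannot see its representation
  def zero: T
  def inc(c: T): T
  def value(c: T): Int
}

// Module: an object that makes the type member concrete.
object IntCounter extends Counter {
  type T = Int
  def zero: Int = 0
  def inc(c: Int): Int = c + 1
  def value(c: Int): Int = c
}

// Functor: a class that takes a module as a parameter and yields a new module.
class Twice(val base: Counter) extends Counter {
  type T = base.T             // reuse the representation of the argument module
  def zero: T = base.zero
  def inc(c: T): T = base.inc(base.inc(c))
  def value(c: T): Int = base.value(c)
}

object ModulesDemo {
  // Client code is written against the signature only; m.T stays abstract here.
  def countTo3(m: Counter): Int = m.value(m.inc(m.inc(m.inc(m.zero))))
}
```

Because modules are ordinary objects, `ModulesDemo.countTo3` accepts both `IntCounter` and `new Twice(IntCounter)` as first-class values.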
2. Pattern Matching over Class Hierarchies
• How are data decomposed?
    OO approach: through virtual member access.
    Functional approach: through pattern matching over algebraic data types.
• The two approaches are complementary with respect to extensibility:
    OO: easy to add new kinds of data with a fixed method interface.
    Functional: easy to add new kinds of processors over a fixed data type.
• How can we get extensibility in both directions?
Case Classes and Pattern Matching
Idea: Allow pattern matching over constructors of classes in a class hierarchy.
    trait Base {
      trait Exp
      case class Num(x: Int) extends Exp
      def eval(e: Exp): Int = e match { case Num(x) => x }
    }
    trait BasePlus extends Base {
      case class Plus(l: Exp, r: Exp) extends Exp
      // Handle the new node kind; defer all other nodes to the inherited eval.
      override def eval(e: Exp): Int = e match {
        case Plus(l, r) => eval(l) + eval(r)
        case _ => super.eval(e)
      }
    }
• Full code reuse is possible; easy to set up.
• Static type safety can be achieved by refining this pattern (see FOOL 11).
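A minimal sketch of how this pattern extends in both directions, written in present-day Scala syntax. The trait BasePlusShow, its show method, and the objects Lang and ExtensibilityDemo are hypothetical additions for illustration, not part of the original slides.

```scala
trait Base {
  trait Exp
  case class Num(x: Int) extends Exp
  def eval(e: Exp): Int = e match { case Num(x) => x }
}

// New kind of data: add a Plus node and extend eval to cover it.
trait BasePlus extends Base {
  case class Plus(l: Exp, r: Exp) extends Exp
  override def eval(e: Exp): Int = e match {
    case Plus(l, r) => eval(l) + eval(r)
    case _          => super.eval(e)
  }
}

// New kind of processor: add a show operation over the same data.
trait BasePlusShow extends BasePlus {
  def show(e: Exp): String = e match {
    case Num(x)     => x.toString
    case Plus(l, r) => "(" + show(l) + " + " + show(r) + ")"
    case _          => "?"   // fallback for node kinds added later; not statically checked
  }
}

// Tie the knot: one object that provides all node kinds and operations.
object Lang extends BasePlusShow

object ExtensibilityDemo {
  import Lang._
  val expr: Exp = Plus(Num(1), Plus(Num(2), Num(3)))
  val result: Int = eval(expr)   // 1 + (2 + 3)
  val text: String = show(expr)
}
```

The wildcard fallback in `show` is where static safety is lost in this simple version, which is the gap the refinement mentioned on the slide is meant to close.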
3. Representing XML Documents
On an abstract level, an XML document is simply a tree.
• We use and extend standard software to convert between external documents and trees.
• Trees are pure data; no methods are attached to nodes.
• Trees should reflect the application domain, rather than being a generic "one size fits all" representation such as DOM.
• This means we want different kinds of data types for different kinds of tree nodes.
[Figure: example document tree with the nodes BookList, Header, Book*, Publisher, Date, Title, Author*, Abstract, Keyword*]
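One way to picture domain-specific tree types is a set of Scala case classes, one per node kind. The sketch below assumes a nesting that the slide only suggests (BookList contains a Header and any number of Books, a Header contains a Publisher and a Date, a Book contains a Title, Authors, an Abstract and Keywords). The field names and the sample values are made up for illustration.

```scala
// One data type per kind of tree node; the nesting is an assumption, see above.
case class Publisher(name: String)
case class Date(text: String)
case class Header(pub: Publisher, date: Date)
case class Title(text: String)
case class Author(name: String)
case class Abstract(text: String)
case class Keyword(word: String)
case class Book(title: Title, authors: List[Author], summary: Abstract, keywords: List[Keyword])
case class BookList(header: Header, books: List[Book])

object BookListDemo {
  // Placeholder document; every value here is invented for the example.
  val sample: BookList = BookList(
    Header(Publisher("Some Publisher"), Date("some date")),
    List(
      Book(Title("A Title"), List(Author("An Author")), Abstract("An abstract."), List(Keyword("xml")))
    )
  )

  // Trees are pure data: processing code lives outside the node classes.
  def titles(list: BookList): List[String] = list.books.map(_.title.text)
}
```

Compared with a generic DOM node, a mistake such as placing a Keyword where a Publisher is expected is rejected by the type checker.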
Parsing XML Trees in Java
• How can trees be decomposed?
• In an object-oriented language:
    • Type tests and type casts: ugly and inefficient.
        if (node instanceof Header) {
          Header header = (Header) node;
          Publisher pub = header.pub;
        } else if (node instanceof Book) ...
    • Visitors: heavyweight, hard to extend.
        node.visit(new Visitor() {
          void visitHeader() { ... }
          void visitBook() { ... }
        });
• In a functional language:
    • Recursive pattern match over trees.
    • Problem again: extensibility.
Parsing XML Trees in Scala
• In Scala, we can represent XML data as instances of case classes and use pattern matching to access their elements. E.g.:
    entry match {
      case Header(pub, date) => …
      case Book(title, info) => …
    }
• In general, we need to match against sequences of XML trees.
• This is done by extending Scala’s pattern matching to regular expressions.
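To give a feel for matching against sequences of child trees, here is a small sketch that uses ordinary list patterns in present-day Scala as a stand-in. It does not use the regular expression patterns described on the slide, and the node types and the describe function are hypothetical.

```scala
object SequenceMatchDemo {
  sealed trait Node
  case class Title(text: String) extends Node
  case class Author(name: String) extends Node
  case class Book(children: List[Node]) extends Node

  // Describe a book by matching on the shape of its child sequence.
  def describe(n: Node): String = n match {
    // A title followed by exactly one author.
    case Book(Title(t) :: Author(a) :: Nil) => t + " by " + a
    // A title, an author, then at least one more child node.
    case Book(Title(t) :: Author(a) :: _)   => t + " by " + a + " et al."
    // A title that is not followed by an author.
    case Book(Title(t) :: _)                => t + " (no author listed)"
    // Anything else, including nodes that are not books.
    case _                                  => "not a well-formed book"
  }
}
```

List patterns only express fixed prefixes followed by "the rest", which is the limitation that full regular expression patterns over sequences are meant to lift.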