TweaXML
A Language to manipulate & extract data from XML files
Kaushal Kumar (kk2457)
Srinivasa Valluripalli (sv2232)
Contents
• Overview and motivation
• Language features
• XML handling functionalities
• Architectural Design
• Tutorial (with example)
• Lessons learned
• Summary
Overview and Motivation
• TweaXML is a language to parse and extract data from XML files and create new csv/txt files in user-defined data formats.
• XML is a widely used format for passing data between heterogeneous systems.
• However, parsing an XML file programmatically is not straightforward.
• To parse an XML file:
  - First you need to learn a general-purpose language (Java, for example).
  - Then you need to learn APIs like DOM-Parser and SAX-Parser.
  - Using these APIs can be complicated.
• TweaXML provides a much simpler language to parse XML files. Moreover, it provides a way to create output files containing this data in user-defined formats.
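To make the "DOM APIs are complicated" point concrete, here is a minimal sketch of DOM-style parsing. The slide names Java; this sketch uses Python's standard `xml.dom.minidom` instead, purely for illustration. The sample XML, the `text_of` helper and the variable names are assumptions, not part of TweaXML.

```python
from xml.dom import minidom

# Hypothetical sample input, modelled on the tutorial's student data.
XML_TEXT = (
    "<students><student><name>kaushal</name>"
    "<final>90</final></student></students>"
)


def text_of(element):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


document = minidom.parseString(XML_TEXT)
finals = []
for student in document.getElementsByTagName("student"):
    # Each lookup returns a node list, and the text lives in a child text node.
    name = text_of(student.getElementsByTagName("name")[0])
    final = text_of(student.getElementsByTagName("final")[0])
    finals.append((name, int(final)))
```

Even for one value per student, the caller has to deal with node lists, node types and text nodes, which is the overhead TweaXML aims to hide.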
Language Features
• Carefully chosen set of keywords
• Multiple types (int, string, node, file, array)
• Several operators
  - Unary operators (~, !)
  - Arithmetic operators (+, -, *, /)
  - Comparison operators (<, <=, >, >=, ==, !=)
  - Logical operators (&&, ||)
  - Node operators (getchild, getvalue)
  - File operators (open, create, print, close)
  - Inbuilt functions (add, subtract, multiply, divide, length)
Language Features (cont)
• Various types of statements
  - Conditional statements (if … else)
  - Iterative statements (while)
  - Jump statements (return, continue, break)
  - I/O statements (open, create, print, close)
  - Inbuilt function calls (add, subtract, multiply, divide, length)
XML Handling functionalities
• Open an XML file to read (open)
  - returns the root node of the XML file
• Get the child nodes of a node, using the XPath of the child nodes (getchild)
  - returns an array of child nodes
• Get the length of the child-nodes array (length)
• Get the value of a node (getvalue)
  - returns the value of the node in string format
• Add the values of two nodes (add)
  - implicit checks of data types
• Subtract the values of two nodes (subtract)
• Multiply the values of two nodes (multiply)
• Divide the values of two nodes (divide)
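The operations listed above can be modelled roughly as follows. This is a Python sketch of the described behaviour, not the TweaXML implementation: the function names mirror the keywords, `open_xml` takes XML text instead of a file name to stay self-contained, and the number handling in `add` and `divide` (including how integral results are printed) is an assumption about what "implicit checks of data types" means.

```python
import xml.etree.ElementTree as ET


def open_xml(xml_text):
    """Model of `open`: parse the XML and return its root node."""
    return ET.fromstring(xml_text)


def getchild(node, path):
    """Model of `getchild`: return a list of child nodes matching the path.

    ElementTree supports only a subset of XPath, which is enough here.
    """
    return node.findall(path)


def getvalue(node):
    """Model of `getvalue`: return the node's text as a string."""
    return (node.text or "").strip()


def _to_number(text):
    """Assumed type check: accept integers or decimals, reject anything else."""
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError for non-numeric strings


def add(left, right):
    """Model of `add`: add two numeric strings and return a string."""
    return str(_to_number(left) + _to_number(right))


def divide(left, right):
    """Model of `divide`: divide two numeric strings and return a string."""
    result = _to_number(left) / _to_number(right)
    # Assumption: integral results are printed without a decimal point.
    return str(int(result)) if result.is_integer() else str(result)


root = open_xml(
    "<students><student><name>kaushal</name>"
    "<homework1>85</homework1><homework2>85</homework2>"
    "</student></students>"
)
students = getchild(root, "student")
first = students[0]
total = add(
    getvalue(getchild(first, "homework1")[0]),
    getvalue(getchild(first, "homework2")[0]),
)
```

Note that `getchild` always returns a list, so a single child is read with index `[0]`, the same pattern the TweaXML tutorial program uses.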
File Handling functionalities
• Create an output file to write (create)
  - returns the file type
• Write to the file (print)
• Close the output file once you are done (close)
Architectural Design
• Front end (TweaXMLLexer & TweaXMLParser)
• Tree walker (TweaXmlWalker & TweaXmlCodeGen)
• Back end (CodeGen.java)
• Run-time libraries (Apache’s DOM Parser)
Tutorial - Example
(A TweaXML program to extract students’ performance data and create a csv file with the average marks of each student)
Input XML file: (marks_data.xml)
    <students>
      <student>
        <name>kaushal</name>
        <homework1>85</homework1>
        <homework2>85</homework2>
        <midterm>70</midterm>
        <final>90</final>
      </student>
      <student>
        <name>Srini</name>
        <homework1>80</homework1>
        <homework2>85</homework2>
        <midterm>87</midterm>
        <final>95</final>
      </student>
      … …
    </students>
TweaXML program:
    start(){
        file output;
        node rootNode;
        output = create "AvgMarks.csv";
        rootNode = open "marks_data.xml";
        node studentNodes[];
        studentNodes = getchild rootNode "student";
        int len;
        len = length studentNodes;
        if(len > 0) {
            int j;
            j=0;
            while(j < len) {
                node nameNode[], homework1Node[], homework2Node[], midtermNode[], finalNode[];
                string name, homework1Marks, homework2Marks, midtermMarks, finalMarks;
                nameNode = getchild studentNodes[j] "name";
                homework1Node = getchild studentNodes[j] "homework1";
                homework2Node = getchild studentNodes[j] "homework2";
                midtermNode = getchild studentNodes[j] "midterm";
                finalNode = getchild studentNodes[j] "final";
                name = getvalue nameNode[0];
                homework1Marks = getvalue homework1Node[0];
                homework2Marks = getvalue homework2Node[0];
                midtermMarks = getvalue midtermNode[0];
                finalMarks = getvalue finalNode[0];
                string totalMarks;
                totalMarks = add homework1Marks homework2Marks;
                totalMarks = add totalMarks midtermMarks;
                totalMarks = add totalMarks finalMarks;
                string avgMarks;
                avgMarks = divide totalMarks "4";
                print output name;
                print output "\t";
                print output avgMarks;
                print output "\n";
                j = j + 1;
            }
        }
        close output;
    }
Output
Output file: (AvgMarks.csv)
    kaushal    82.5
    Srini      86.75
    … …
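As a cross-check of the averages shown in the output, the same computation can be sketched in Python. This is an illustrative equivalent, not code generated by TweaXML: it uses only the two students visible in the input (the elided ones are left out), writes to an in-memory buffer instead of AvgMarks.csv, and the names `MARKS_XML`, `FIELDS` and `average_marks` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Only the two students shown in the tutorial input; elided entries are omitted.
MARKS_XML = """<students>
  <student>
    <name>kaushal</name>
    <homework1>85</homework1><homework2>85</homework2>
    <midterm>70</midterm><final>90</final>
  </student>
  <student>
    <name>Srini</name>
    <homework1>80</homework1><homework2>85</homework2>
    <midterm>87</midterm><final>95</final>
  </student>
</students>"""

FIELDS = ("homework1", "homework2", "midterm", "final")


def average_marks(xml_text, out):
    """Write one tab-separated 'name, average' line per student to `out`."""
    root = ET.fromstring(xml_text)
    for student in root.findall("student"):
        name = student.findtext("name")
        total = sum(float(student.findtext(field)) for field in FIELDS)
        # The 'g' format drops trailing zeros, e.g. 82.5 rather than 82.500000.
        out.write(f"{name}\t{total / len(FIELDS):g}\n")


buffer = io.StringIO()
average_marks(MARKS_XML, buffer)
result = buffer.getvalue()
```

For kaushal the total is 85 + 85 + 70 + 90 = 330, giving 82.5; for Srini it is 80 + 85 + 87 + 95 = 347, giving 86.75, which agrees with the output file above.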
Lessons Learned
• Start early on the project
• More functionalities could have been added
• More data types could have been provided
• User-defined functions could have been added
Summary
• TweaXML provides an easier way to deal with XML files.
• Data can be extracted and written out in user-defined formats.
• No need to learn APIs like DOMParser and SAXParser.
• It’s not perfect, but it’s highly useful.
• More functionalities could have been provided, given more time.