Improving the output capabilities of Stata with Open Document Format xml. Adam Jacobs, Dianthus Medical Limited. Stata’s 3-fold capabilities: statistics, graphics, data management. But there is a 4th: text output.
Improving the output capabilities of Stata with Open Document Format xml • Adam Jacobs • Dianthus Medical Limited
Stata’s 3-fold capabilities • Statistics • Graphics • Data management
Text output • A recent clinical study: • 92 pages of raw data listings • 124 pages of descriptive data tabulations • 3 pages of statistical analysis • All from a study in 12 healthy volunteers
Problems with Stata’s text output • No pagination • No formatting (or limited formatting with SMCL) • Variable labels not always shown • No Unicode support • No tables of contents • etc.
Open Document Format • An open standard, approved by ISO • XML based • For a variety of office-type documents • Used by the popular open-source office suite OpenOffice.org • Here, we are just interested in word-processing documents
.odt files • A .odt file is the native file format of OpenOffice.org Writer • A zip file • Contains various files, the most important of which is content.xml • content.xml is simply a plain-text file • Stata is good at writing plain-text files!
The Stata code • Creates the content.xml file by writing the data with appropriate XML tags • content.xml is added to the other required files and zipped into a .odt file • The .odt file can be opened directly with Writer
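The packaging step described above can be sketched in a few lines. This is a minimal illustration in Python, not the author's Stata code: the function name `build_odt` and the sample paragraph are assumptions made for the example, and the package contains only the bare minimum (a `mimetype` entry, a manifest, and `content.xml`), with no styles or metadata.

```python
import io
import zipfile

# Media type of an OpenDocument text (.odt) file.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# The manifest lists the files in the package.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

# A deliberately tiny content.xml with one paragraph (illustrative text only).
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
      <text:p>Hello from a generated content.xml</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def build_odt(content_xml: str) -> bytes:
    """Zip content.xml with the other required files and return the .odt bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("content.xml", content_xml,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


odt_bytes = build_odt(CONTENT)
print(len(odt_bytes), "bytes in the generated package")
```

Writing `odt_bytes` to a file with a `.odt` extension should give a document that Writer can open, although a real report would also need style definitions for pagination and table formatting.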
Basics of XML
<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
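To make the element/attribute distinction concrete, the company example can be read back with a standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree` purely for illustration (the talk itself writes XML from Stata and does not parse it); the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# The example from the slide, with straight quotes around attribute values.
XML = """<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>"""

root = ET.fromstring(XML)

# Attributes (name, role) are read with .get(); child element text with .findtext().
employees = [
    (e.get("role"), e.findtext("firstname"), e.findtext("lastname"))
    for e in root.findall("employee")
]

print(root.tag, "->", root.get("name"))
for role, first, last in employees:
    print(f"{role}: {first} {last}")
```

Every opening tag has a matching closing tag, and attribute values must be quoted with straight quotes; a parser rejects typographic quotes.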
XML code for start of table
<table:table table:style-name="Table42">
  <table:table-column table:style-name="TabCol13"/>
  <table:table-column table:style-name="TabCol9"/>
  <table:table-column table:style-name="TabCol8"/>
  <table:table-column table:style-name="TabCol8"/>
XML code for table cells
<table:table-cell table:style-name="cell1211">
  <text:p text:style-name="Table_20_Contents">Mileage (mpg)</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
  <text:p text:style-name="Table_20_Contents">N</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
  <text:p text:style-name="Table_20_ContentsNumeric">52<text:s text:c="3"/></text:p>
</table:table-cell>
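The table markup above is repetitive enough to be generated from data. The following is a rough Python sketch of that idea, not the author's ado or Mata code: the helper names `cell` and `table` are hypothetical, the style names are simply borrowed from the slides, the second data row is made up to exercise escaping, and the `table:table-row` wrapper is added here because ODF tables group cells into rows.

```python
from xml.sax.saxutils import escape, quoteattr


def cell(text, cell_style="cell1111", para_style="Table_20_Contents"):
    """Return one table cell; special characters in the text are escaped."""
    return (
        f"<table:table-cell table:style-name={quoteattr(cell_style)}>"
        f"<text:p text:style-name={quoteattr(para_style)}>"
        f"{escape(str(text))}</text:p>"
        "</table:table-cell>"
    )


def table(rows, table_style="Table42", col_style="TabCol8"):
    """Return a table fragment: one column definition per column, then the rows."""
    n_cols = max(len(row) for row in rows)
    parts = [f"<table:table table:style-name={quoteattr(table_style)}>"]
    parts += [f"<table:table-column table:style-name={quoteattr(col_style)}/>"] * n_cols
    for row in rows:
        parts.append("<table:table-row>")
        parts += [cell(value) for value in row]
        parts.append("</table:table-row>")
    parts.append("</table:table>")
    return "\n".join(parts)


# First row echoes the slide; the second row is invented to show escaping.
rows = [
    ["Mileage (mpg)", "N", 52],
    ["Label with & and <", "N", 0],
]
xml_fragment = table(rows)
print(xml_fragment)
```

Escaping matters because variable labels and string data can contain `&` or `<`, which would otherwise make content.xml unreadable. The fragment uses the `table:` and `text:` prefixes, so it is only valid inside a content.xml that declares those namespaces.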
Was this a lot of work? • 123 kB of code • 21 ado files • 45 Mata functions • And not finished yet!