Documentation Costs Avoided using Python and other Open Standards Andrew Jonathan Fine Operating Systems Software Organization Engines, Systems, and Services Honeywell International
Original Core Data Flow Single Python application • set of front end translators • content inserter • post-processing formatter
Front End Translator • Selected by caller • Caller specifies input file containing corporate data • Extracts components from file: pictures, tables, and paragraphs • Saves components to a Python dictionary
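A minimal sketch of the kind of dictionary a front-end translator might hand to the inserter. The function `translate`, the key names, and the `(kind, name, payload)` record shape are illustrative assumptions, not the layout used by the actual translators.

```python
def translate(records):
    """Group extracted components by kind into a plain dictionary.

    `records` is a hypothetical list of (kind, name, payload) tuples standing
    in for whatever a real front end extracts from a corporate data file.
    """
    components = {"pictures": {}, "tables": {}, "paragraphs": {}}
    for kind, name, payload in records:
        components[kind][name] = payload
    return components


# Example: one table and one paragraph extracted from a single input file.
extracted = translate([
    ("tables", "state_variables", [("statex", "Integer"), ("statey", "Long")]),
    ("paragraphs", "overview", "Describes the state variables."),
])
```

Keying components by name would let a caller pick only the pieces a given document needs before passing them on.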
Inserter • Caller selects components from the Python dictionaries made by the front ends for the respective documents • Inserter creates a Word document • Inserter uses Python/COM to insert components into the document
Back End Formatter • Scans corporate Word document template • Scans Word document made by inserter • Makes final style corrections.
Why? The flow was designed to cope with changes in requirements! • New projects • New teams • New data source formats • New standards for existing formats
First front-end translator Takes pictures, tables, and data from a recursive property list constructed by BEACON, a visual programming tool for software used in the aerospace industry. (… actual design of translator outside the scope of this paper…)
Initial Design of Inserter • Straightforward use of principles demonstrated in Mark Hammond's book, Python Programming on Win32. • The book includes a chapter with a thorough treatment of how Python can use the Word 97 COM object model to create and manipulate a Word document.
Problems!!! • Must cope with huge amounts of corporate data, such as table cells. • The COM interface is slow when creating each new individual element. • Detailed typesetting of elements is hard to reuse.
What I wanted: • Faster conversion • Existing standard • Callable from Python What I found: • Faster conversion (OpenJade) • Existing standard (DocBook SGML)
Why Call from Python? • New scripting language to replace islands of automation (Perl, MS-DOS, internal test stand controller language). • Easier to connect islands after writing in Python. • Open source, and thus continuously peer reviewed. • Tremendous user base! Plenty of wrappers written in Python around open source libraries supporting open standards. … so I wrote a Python wrapper around some DocBook rules …
Revised Core Data Flow • Python wrapper writes DocBook SGML • OpenJade translates DocBook SGML to Word RTF
Input to OpenJade as local DocBook SGML

    <!DOCTYPE informaltable SYSTEM "C:\Local.dtd">
    <informaltable frame='all'>
      <tgroup cols='2' colsep='1' rowsep='1' align='center'>
        <colspec colname='Name' colwidth='75' align='left'></colspec>
        <colspec colname='Type' colwidth='64' align='center'></colspec>
        <thead>
          <row>
            <entry><emphasis role='bold'>Name</emphasis></entry>
            <entry><emphasis role='bold'>Type</emphasis></entry>
          </row>
        </thead>
        <tbody>
          <row>
            <entry><phrase role='xe' condition='italic'>statex</phrase></entry>
            <entry>Integer</entry>
          </row>
          <row>
            <entry><phrase role='xe' condition='italic'>statey</phrase></entry>
            <entry>Long</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>
Python code to translate data into OpenJade input in local DocBook SGML (based on Python to DocBook sample wrapper class DocBook)

    from DocBook import DocBook

    class ItalicIndexPhrase(DocBook.Rules.Phrase):
        "italic indexable text phrase"
        TITLE = DocBook.Rules.Phrase
        def __init__(self, text):
            DocBook.Rules.Phrase.__init__(self, 'xe', 'italic')
            self.data = [text]

    class NameCell(DocBook.Rules.Entry):
        "table row cell describing name of identifier (italic and indexable text!)"
        TITLE = DocBook.Rules.Entry
        def __init__(self, text):
            DocBook.Rules.Entry.__init__(self)
            self.data = [ItalicIndexPhrase(text)]

    class StorageCell(DocBook.Rules.Entry):
        "table row cell describing storage type of identifier (ordinary text)"
        TITLE = DocBook.Rules.Entry
        def __init__(self, text):
            DocBook.Rules.Entry.__init__(self)
            self.data = text

    class TRow(DocBook.Rules.Row):
        "each row in application's informal table body"
        TITLE = DocBook.Rules.Row
        def __init__(self, binding):
            # each binding is an (identifier, storage type) pair
            (identifier, storage) = binding
            DocBook.Rules.Row.__init__(self, [NameCell(identifier), StorageCell(storage)])

    class TBody(DocBook.Rules.TBody):
        "application's informal table body"
        TITLE = DocBook.Rules.TBody
        def __init__(self, items):
            # a list comprehension yields a real list on any Python version
            DocBook.Rules.TBody.__init__(self, [TRow(item) for item in items])

    class TGroup(DocBook.Rules.TGroup):
        "application's informal table group"
        COLSPECS = [DocBook.Rules.ColSpec('Name', 75, 'left'),
                    DocBook.Rules.ColSpec('Type', 64, 'center')]
        SHAPE = ['2', '1', '1', 'center']
        TBODY = TBody

    class InformalTable(DocBook.Rules.InformalTable):
        "application's informal table"
        TGROUP = TGroup

    class Example(DocBook):
        'example application of DocBook formatting class'
        SECTION = str(InformalTable)
        def __call__(self):
            self.data = [InformalTable()(self.data)]
            return DocBook.__call__(self)

    if __name__ == '__main__':
        # parenthesized print works whether print is a statement or a function
        print(Example([('statex', 'Integer'), ('statey', 'Long')])())
Using class DocBook • class DocBook from DocBook.py in Appendix F is the top-level interface callable class • Application inherits from class DocBook • Contents of application inherit from classes contained by DocBook.Rules • Use overrides to specify structure, formatting, and text.
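The override pattern described here can be illustrated with a small, self-contained stand-in. This is a sketch only: the `Element` base class and its `TAG` and `ATTRS` attributes are hypothetical names chosen for illustration and are not the interface of DocBook.py from Appendix F.

```python
from xml.sax.saxutils import escape


class Element:
    """Hypothetical stand-in for a rule class; not the DocBook.py API."""
    TAG = "phrase"   # subclasses override the tag name
    ATTRS = {}       # subclasses override the attributes

    def __init__(self, *children):
        self.children = children

    def __str__(self):
        attrs = "".join(" %s='%s'" % (k, v) for k, v in self.ATTRS.items())
        body = "".join(
            str(c) if isinstance(c, Element) else escape(str(c))
            for c in self.children
        )
        return "<%s%s>%s</%s>" % (self.TAG, attrs, body, self.TAG)


class ItalicIndexPhrase(Element):
    """Formatting is specified purely by overriding class attributes."""
    TAG = "phrase"
    ATTRS = {"role": "xe", "condition": "italic"}


class Entry(Element):
    TAG = "entry"


class Row(Element):
    TAG = "row"


def table_row(identifier, storage):
    """Structure is specified by how the application nests the elements."""
    return Row(Entry(ItalicIndexPhrase(identifier)), Entry(storage))


row_sgml = str(table_row("statex", "Integer"))
```

Under these assumptions, `row_sgml` has the same shape as one body row of the informal table shown earlier: the subclasses carry the formatting, and the application code only supplies the text.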
OpenJade • OpenJade is an open source DSSSL execution engine available from SourceForge. • DSSSL is an ISO standard for typesetting specification and document conversion. • OpenJade reads DocBook DSSSL stylesheets and our local DSSSL stylesheets if any. • The DSSSL is executed by OpenJade upon SGML source text to write a final document for later loading into a word processor.
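A minimal sketch of driving OpenJade from Python, assuming the `openjade` executable is on the PATH. The file names are hypothetical, and a real installation may also need a catalog option so that the DTD and stylesheets can be found.

```python
import subprocess


def openjade_command(sgml_path, stylesheet_path, rtf_path):
    """Build the argument list for one OpenJade run.

    -t selects the output backend (rtf), -d names the DSSSL stylesheet,
    and -o names the output file.
    """
    return ["openjade", "-t", "rtf", "-d", stylesheet_path,
            "-o", rtf_path, sgml_path]


def run_openjade(sgml_path, stylesheet_path, rtf_path):
    """Run OpenJade; raises CalledProcessError on a nonzero exit status."""
    subprocess.run(openjade_command(sgml_path, stylesheet_path, rtf_path),
                   check=True)


# Hypothetical file names, shown only to illustrate the command that is built.
command = openjade_command("table.sgml", "local.dsl", "table.rtf")
```

Keeping the command construction separate from the call that runs it makes the wrapper easy to check without OpenJade installed.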
DocBook Post-Processing using Word Automation with Python/COM • DocBook/OpenJade emits RTF whose Word style names differ from those in the corporate Word DOT (template) file. • Much faster to change an existing document using Python/COM than to create one! • Cannibalized Python code from the first draft of the inserter to create the post-processor. • Reads the RTF, changes it, and saves it as the final DOC.
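A hedged sketch of what such a post-processing pass could look like. The style names in `STYLE_MAP` are invented for illustration, and `restyle` assumes Windows, Word, and the pywin32 package, absolute file paths, and that the target styles already exist in the document (for example via the attached corporate template). It is not the author's post-processor.

```python
# Hypothetical mapping from style names emitted by the stylesheets to the
# names defined in the corporate template; real names will differ.
STYLE_MAP = {"Table Cell": "Corp Table Text", "Para": "Corp Body"}

WD_FORMAT_DOCUMENT = 0  # Word's WdSaveFormat value for the native .doc format


def corporate_style(name, style_map=STYLE_MAP):
    """Return the corporate style for `name`, or `name` itself if unmapped."""
    return style_map.get(name, name)


def restyle(rtf_path, doc_path, style_map=STYLE_MAP):
    """Open the RTF in Word, rename paragraph styles, and save as DOC.

    The import is local so that the mapping helper above stays usable on
    machines without Word or pywin32.
    """
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(rtf_path)
        for paragraph in doc.Paragraphs:
            current = paragraph.Style.NameLocal
            target = corporate_style(current, style_map)
            if target != current:
                paragraph.Style = target
        doc.SaveAs(doc_path, FileFormat=WD_FORMAT_DOCUMENT)
        doc.Close()
    finally:
        word.Quit()
```

The pass only touches paragraphs whose style needs renaming, which fits the point above that changing an existing document through COM is much cheaper than building one element by element.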
Return on Investment

5 projects ranging from 30 to 150 BEACON files, averaging about 75 files.
Each project has 2 releases per year, where each file must generate hard copy.

Previously (cut/paste by hand):
    Each project release:
        1/5 * 75 * 4 hours                      =     60 hours
        3/5 * 75 * 8 hours                      =    360 hours
        1/5 * 75 * 16 hours                     =    240 hours
                                                     ---------
                                                     660 hours
    Two releases per year:             * 2      =  1,320 hours
    Five projects needing releases:    * 5      =  6,600 hours
    Two year period (2002-2003):       * 2      = 13,200 hours
                                                    ----------
    Total effort avoided:                         13,200 hours

Automated:
    Automated releases over 2 year period:           160 hours
    My effort (12 * 140 hours per labor month):    1,680 hours
    Total investment:                              1,840 hours

Net effort avoided, 2002-3:                       11,360 hours
Net avoided by customers 2002-3 at $100/hour:     $1,136,000
Net labor years avoided 2002-3 at 1,680 hours/year: 6.76 years
Headcount avoided per year:                       3.38 people
ROI (total effort avoided / total invested), 2002-3: 7.17
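The arithmetic on this slide can be checked with a few lines of Python. The variable names are illustrative; every figure comes from the slide itself.

```python
from fractions import Fraction

AVERAGE_FILES = 75
# (fraction of files, manual hours per file), as listed on the slide
EFFORT_MIX = [(Fraction(1, 5), 4), (Fraction(3, 5), 8), (Fraction(1, 5), 16)]

hours_per_release = sum(share * AVERAGE_FILES * hours
                        for share, hours in EFFORT_MIX)

RELEASES_PER_YEAR = 2
PROJECTS = 5
YEARS = 2
effort_avoided = hours_per_release * RELEASES_PER_YEAR * PROJECTS * YEARS

automated_release_hours = 160
development_hours = 12 * 140          # 12 labor months at 140 hours each
investment = automated_release_hours + development_hours

net_hours = effort_avoided - investment
net_dollars = net_hours * 100         # at $100 per hour
labor_years = float(net_hours) / 1680  # at 1,680 hours per labor year
headcount_per_year = labor_years / YEARS
roi = float(effort_avoided) / investment

print(int(hours_per_release), int(effort_avoided), investment, int(net_hours))
print(int(net_dollars), round(labor_years, 2),
      round(headcount_per_year, 2), round(roi, 2))
```

Using `Fraction` for the file shares keeps the hour totals exact, so they match the slide's whole-number figures without rounding.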
Python and DocBook together • Python connects our department’s engineering-specific islands of automation. • Python with DocBook created Word documents from engineering data. • The combination of an open language with an open standard eliminated a real-world business process bottleneck. • The return on investment was substantial: about 7 hours of effort avoided for every hour invested over 2002-2003.