Python, XML, and PythonLabs
Fred L. Drake, Jr.
fdrake@beopen.com
BeOpen.com
Outline
• Python 1.6 and XML
  • What does Python offer XML users in release 1.6?
• PythonLabs at BeOpen.com
  • What does the formation of PythonLabs mean for Python?
Python 1.5.* and SGML, XML
• sgmllib, htmllib
  • Just enough SGML to work with HTML-as-deployed … somewhat.
  • Dispatcher model usable for small projects (SAX-like).
  • Does not process any DTD information.
• xmllib
  • Simple XML support for ASCII-only element and attribute names.
  • Namespace support, but difficult to use.
  • Shares the dispatch model of sgmllib and htmllib, so it is familiar to existing users.
  • Not XML 1.0 compliant.
  • No Unicode support.
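The "dispatcher model" means the parser calls back into your subclass as it meets each piece of markup. sgmllib itself is gone from modern Python, so this minimal sketch uses Python 3's html.parser.HTMLParser, which keeps the same callback style: where sgmllib dispatched to per-tag methods such as start_a(), HTMLParser calls one generic handle_starttag(). The class name LinkCollector and the sample markup are illustrative only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: collect href values from <a> tags via callbacks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before dispatching;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
# Mixed-case tags are typical of HTML-as-deployed.
parser.feed('<p>See <a href="http://www.python.org/">Python</a> and '
            '<A HREF="http://www.opensource.org/">OSI</A>.</p>')
parser.close()
print(parser.links)  # ['http://www.python.org/', 'http://www.opensource.org/']
```

As with sgmllib, no DTD is consulted; the handler simply reacts to whatever tags arrive.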
Python 1.6 and XML
• Existing modules remain for backward compatibility
  • But xmllib is deprecated.
• Expat interface is included in standard distributions
  • Can generate UTF-8 or UTF-16.
  • Installed by default on Windows.
  • Add-on package for Linux (RPMs, etc.); probably installed by default on common distributions.
  • Building from source requires obtaining and building Expat separately.
  • Credits: Jack Jansen, Paul Prescod, Andrew Kuchling.
• SAX 2 Interface
  • Contributed by Lars Marius Garshol.
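To make the Expat interface concrete, here is a minimal sketch written against modern Python 3, where the binding is exposed as xml.parsers.expat; the 1.6-era API may differ in details. Handlers are plain callables assigned to parser attributes, and Expat decodes the UTF-8 input so handlers receive Unicode text. The handler names and the sample document are hypothetical.

```python
import xml.parsers.expat

element_counts = {}
text_parts = []


def start_element(name, attrs):
    # Called for each start tag; attrs is a dict of attribute name -> value.
    element_counts[name] = element_counts.get(name, 0) + 1


def char_data(data):
    # Called with decoded character data (a str), possibly in several chunks.
    text_parts.append(data)


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.CharacterDataHandler = char_data

# The input is UTF-8 bytes; Expat decodes it, so handlers receive str objects.
document = (b'<?xml version="1.0" encoding="utf-8"?>'
            b'<team><member>Guido</member><member>Marc-Andr\xc3\xa9</member></team>')
parser.Parse(document, True)  # True marks this as the final chunk of input

print(element_counts)              # {'team': 1, 'member': 2}
print(ascii("".join(text_parts)))  # 'GuidoMarc-Andr\xe9'
```

Character data is joined before use because Expat is free to split one text node across several callbacks.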
PyXML Extension Package
• Validating parser
  • 100% Pure Python, by Lars Marius Garshol!
• Level 1 DOM
  • Contributed by FourThought, LLC.
• Many convenience modules
  • Build DOM documents from ESIS streams.
  • ISO 8601 date format support.
  • SAX handler classes to dump a nicely indented XML document.
• Coordinated by Andrew Kuchling
  • A product of the XML Special Interest Group at python.org.
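Building a document tree through the DOM and writing it out with indentation might look like the following. This is a sketch using the lightweight xml.dom.minidom from the modern Python 3 standard library, not PyXML's FourThought DOM, so method coverage differs; the element names and data are made up for illustration.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
# createDocument(namespaceURI, qualifiedName, doctype): no namespace, no DTD.
doc = impl.createDocument(None, "people", None)
root = doc.documentElement

for name in ["Guido van Rossum", "Tim Peters"]:
    person = doc.createElement("person")
    person.setAttribute("role", "core")
    person.appendChild(doc.createTextNode(name))
    root.appendChild(person)

# Serialize with two-space indentation.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

The same tree could be produced from a stream of parse events instead of by hand, which is the idea behind the PyXML helpers that build DOM documents from ESIS streams.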
Unicode Support
• Python 1.6 includes Unicode support in the core!
  • In source code: u'abc'
  • From data: unicode('raw data from file', 'iso-8859-5')
  • From file objects: <code sample next slide>
• Support for over 60 codecs in the standard library.
• Stores characters as 16-bit units (UCS-2) to limit memory use; characters beyond the Basic Multilingual Plane are not supported.
• Basic string type is still 8-bit characters
  • Avoids breaking legacy code.
Unicode in Files
>>> import codecs
>>> f = codecs.open('test.utf8', 'w', encoding='utf-8')
>>> f.write(u'Marc-Andr\xE9 Lemburg')
>>> f.close()
>>> open('test.utf8').readline()
'Marc-Andr\303\251 Lemburg'
>>> codecs.open('test.utf8', encoding='utf-8').readline()
u'Marc-Andr\351 Lemburg'
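The session above is Python 1.6 code. For readers on Python 3, where str is always Unicode and bytes is the 8-bit type, a rough equivalent of the slides' three entry points (decoding raw data, the 16-bit limitation, file I/O) is sketched below. The sample bytes and the temporary file location are assumptions for illustration; the built-in open() with an encoding argument takes the place of codecs.open().

```python
import os
import tempfile

# Decoding 8-bit data: the Python 3 spelling of unicode(data, 'iso-8859-5').
raw = b'\xbf\xe0\xd8\xd2\xd5\xe2'        # a Russian word encoded as ISO-8859-5
text = raw.decode('iso-8859-5')
print(len(text), len(text.encode('utf-8')))   # 6 12

# A character outside the Basic Multilingual Plane needs a surrogate pair
# (two 16-bit units) in UTF-16, so a UCS-2 design cannot hold it in one unit.
print(len('\U0001F40D'.encode('utf-16-be')))  # 4

# File round trip, mirroring the codecs.open() session.
path = os.path.join(tempfile.mkdtemp(), 'test.utf8')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Marc-Andr\xe9 Lemburg')
with open(path, 'rb') as f:
    raw_line = f.readline()              # undecoded UTF-8 bytes
with open(path, encoding='utf-8') as f:
    text_line = f.readline()             # decoded back to str
print(raw_line)                          # b'Marc-Andr\xc3\xa9 Lemburg'
print(ascii(text_line))                  # 'Marc-Andr\xe9 Lemburg'
```

The binary read shows the same two UTF-8 bytes (octal 303 251, hex c3 a9) that the 1.6 session printed for é.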
Unicode and Regular Expressions
• New regular expression matching engine
  • Supports both Unicode and 8-bit strings.
  • Matches faster than the pcre library used in Python 1.5.*.
  • Regular expression compiler is 100% Pure Python.
  • Keeps the Perl-compatible syntax for regular expressions.
  • Written by Fredrik Lundh of Secret Labs AB.
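One engine serving both string types can be seen in Python 3's re module, which is still built on this engine. The sketch below is modern Python 3 rather than 1.6: a str pattern applies Unicode rules, a bytes pattern works on 8-bit data, and mixing the two raises TypeError. The sample name is reused from the previous slide.

```python
import re

name = 'Marc-Andr\xe9 Lemburg'

# A str pattern applies Unicode rules: \w also matches accented letters.
words = re.findall(r'\w+', name)
print(ascii(words))        # ['Marc', 'Andr\xe9', 'Lemburg']

# A bytes pattern works on 8-bit data; \w then matches only ASCII word bytes.
byte_words = re.findall(rb'\w+', name.encode('latin-1'))
print(byte_words)          # [b'Marc', b'Andr', b'Lemburg']

# Applying a str pattern to bytes is an error rather than a silent conversion.
mixed_failed = False
try:
    re.findall(r'\w+', name.encode('latin-1'))
except TypeError:
    mixed_failed = True
print(mixed_failed)        # True
```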
PythonLabs at BeOpen.com
Who is PythonLabs?
• The old crew from CNRI:
  • Guido van Rossum, the creator of Python
  • Barry Warsaw, maintainer of JPython, Mailman developer
  • Fred Drake, Python’s Documentation Tzar
  • Jeremy Hylton, the pragmatic academician
• And a familiar voice from the community:
  • Tim Peters, the universal expert
Why?
• Core development team will devote full time to Python
  • Core language development & implementation.
  • Community building.
• Extend our efforts to improve development and deployment tools:
  • IDLE (Python IDE using Tk)
  • KDevelop integration?
  • CPAN/CTAN-like repository for 3rd-party packages?
• Improve integration facilities
  • Database API.
  • Web-related APIs should support the latest standards.
• Better visibility in corporate development shops
• Our development efforts will be 100% Open Source
  • All software will have a license that conforms to the Open Source Definition (www.opensource.org).
Late Breaking News