Python@Work 2007-Jul-12 PkManager.py

Howard Kapustein, Director of Technology and Architecture, Manhattan Associates.

Presentation Transcript


  1. Python@Work 2007-Jul-12: PkManager.py
     • Howard Kapustein
     • Director of Technology and Architecture
     • Manhattan Associates

  2. Background
     • Director of Technology and Architecture
     • Manhattan Associates, 7 years
     • EPCglobal Reader Protocol 1.0, Co-Chairman [RFID]
     • SMS, Platform Services (Architecture/Subsystems), 12 years
     • Open Source submitter (Jython, among others)
     • 20 years experience
     • Acronym Soup: C++, Java, Visual Basic, Windows, Unix, concurrency, i18n, security, RDBMS, GUI, web server, XML, TCP, web services, REST, AJAX, JSON(!), oodles more
     • No COBOL, No Perl
     • Python since 1997
     • Switched from AWK
     • Thompson AWK compiler created EXEs
     • Getting 'long in the tooth'
     • Tried to learn TCL – not so much
     • Stumbled over Eric Raymond's “Why Python?” essay
     • Made perfect sense: Google for “why python eric raymond”
     • Python is Beautiful (even back when it was 1.5)
     • Rich language, Richer library
     • Thank god for py2exe
     • Pywin32 is pretty handy too

  3. Application
     • Warehouse Management Open Systems (WMOS)
     • Large C++, CORBA, portable 'enterprise' application
     • >8 million lines of code
     • Borland's Visibroker for C++
     • AIX, HP-UX, Linux, Solaris, Windows
     • 24x7x365 – Near-realtime 'Execution' system
     • i.e. 1 hour outage = millions of dollars
     • Heavy RF+MHE interaction
     • 99% of activity is high volume, low latency
     • Heavy customization element
     • Routinely modified for every customer
     • Each customer = Forked codebase
     • IOW more variables + post-release
     • Performance, Scalability, Latency, Reliability, Resiliency
     • The 'not negotiable' family

  4. Problem
     • CORBA process:
     • Server = EXE: initializes, registers available factories with ORB, responds to requests
     • Client:
         Factory* f = bind("factory"); Object* o = f->newInstance(); o->DoStuff(); release o; release f; //aka delete
     • ORB knows what factories are supposed-to-be and actually-available
     • Borland: If requested factory not running, ORB asks the Object Activation Daemon (OAD) to start it [Just-In-Time Activation]
     • Problem: OAD stability is abominable
     • Runs for hours/days, then randomly hangs or crashes for no apparent reason
     • But JIT support made it popular for non-production (test, dev, …)
     • Doesn't mean we didn't regularly see support issues due to folks using the OAD
     • Homegrown replacements:
     • PkPad: Unix shell script, pre-start list of processes, polling via ps to determine premature death to restart
     • Cons: 30 second sleep between sweeps (or huge perf hit), no JIT, no management
     • PkManager.exe: NT Service, multithreaded, interrupt-driven (no polling)
     • Cons: Windows only (<20% customers), no JIT
     • Solution: PkManager.py
     • Superset: JIT + PreStart, interrupt-based (no polling), administration interface
     • And by-god-rock-solid-reliable!

  5. Basic Architecture
     • Global Variable: timeToExit = threading.Event()
     • Thread 1: Main
     • Initialize (parse command line etc)
     • Start worker threads
     • Main loop
         while not timeToExit.isSet():
             time.sleep(0.1)
     • Thread 2: Monitor (Process Manager)
         while not timeToDie.isSet():
             ProcessRequests(); StartChildren(); WaitForDeath()
         timeToExit.set()
     • Thread 3: API (Web Server)
     • JIT requests
     • Administration Console
     • Web Services
     • Thread 4: Uptime (Reporter)
         while not timeToDie.isSet():
             print 'Uptime: %s since %s' % (now-startup, startup)
             timeToDie.wait(n)
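The shutdown flow on this slide (a process-wide exit event, plus one "time to die" event per worker) can be sketched as below. This is a minimal Python 3 port, not the PkManager source: the `Worker` class, its `max_loops` parameter, and the short sleep intervals are assumptions made so the sketch terminates on its own.

```python
import threading
import time

# Process-wide flag: any thread sets it to ask the whole process to shut down.
time_to_exit = threading.Event()


class Worker(threading.Thread):
    """Hypothetical worker: loops until told to die, then flags process exit."""

    def __init__(self, name, max_loops):
        super().__init__(name=name)
        self.time_to_die = threading.Event()   # per-thread "kill thyself" flag
        self.max_loops = max_loops             # assumption: lets the demo end by itself
        self.loops = 0

    def run(self):
        try:
            while not self.time_to_die.is_set() and self.loops < self.max_loops:
                self.loops += 1
                # wait() doubles as an interruptible sleep
                self.time_to_die.wait(0.01)
        finally:
            # A worker that stops for any reason takes the process down with it.
            time_to_exit.set()

    def stop(self, timeout=1.0):
        self.time_to_die.set()
        self.join(timeout)


def main():
    workers = [Worker("monitor", 3), Worker("uptime", 1000)]
    for worker in workers:
        worker.start()
    # Main loop: idle until some thread requests process exit.
    while not time_to_exit.is_set():
        time.sleep(0.01)
    for worker in workers:
        worker.stop()
    return workers
```

Calling `main()` returns once the "monitor" worker finishes its three loops and every other worker has been stopped and joined.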

  6. Configuration (DSL)
     • Configuration file = Domain Specific Language (DSL)
     • [[wmosprod.dat]]
     • Python dictionaries are sweet!
     • Look ma, it's JSON
         symbols = {'N':'order', 'OnStart':'#prestart', 'JIT':'#ondemand', …}
         config = []
         errors = 0
         lineno = 0
         for line in open('wmosprod.dat'):
             lineno += 1
             try:
                 entry = eval(line.strip(), {}, symbols)
                 config.append(entry)
             except:
                 print 'Error line %d' % (lineno)
                 errors += 1
         if errors > 0:
             raise UserWarning('Uh-oh…')
     • Users see simple and obvious configuration
     • Code is maintainable and simple
     • Mostly to 'nicely' handle and report errors
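A minimal Python 3 sketch of the eval-based configuration loader follows. The slide does not show the real `wmosprod.dat` layout, so the sample lines in the usage note are hypothetical; skipping blank lines and `#` comments is also an assumption. Because `eval` executes arbitrary expressions, this approach only suits configuration files you trust.

```python
# Symbols the configuration lines may use as bare names (taken from the slide).
SYMBOLS = {'N': 'order', 'OnStart': '#prestart', 'JIT': '#ondemand'}


def load_config(lines, symbols=SYMBOLS):
    """Evaluate each non-empty line as a Python expression; collect all errors."""
    config = []
    errors = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith('#'):   # assumption: blanks/comments allowed
            continue
        try:
            # Empty __builtins__ keeps ordinary functions out of reach of the file.
            entry = eval(line, {'__builtins__': {}}, symbols)
        except Exception as exc:
            errors.append('line %d: %s' % (lineno, exc))
            continue
        config.append(entry)
    if errors:
        # Report every bad line at once rather than stopping at the first one.
        raise UserWarning('%d bad line(s): %s' % (len(errors), '; '.join(errors)))
    return config
```

For example, `load_config(["{N: 1, 'exe': 'a', 'start': JIT}"])` yields one dictionary whose keys and values have the symbols substituted, while a malformed or unknown-symbol line is reported with its line number.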

  7. Signals – Ouch!
     • TIP: Do this very early
         import signal
         signals = dir(signal)
         if 'SIGBREAK' in signals:
             signal.signal(signal.SIGBREAK, signal.default_int_handler)
         if 'SIGTERM' in signals:
             signal.signal(signal.SIGTERM, signal.default_int_handler)
     • Surprises
     • #1: SIGBREAK+SIGTERM not always available
     • #2: Default action is usually terminate
     • Now except KeyboardInterrupt will trip
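The same idea in Python 3, as a small sketch: route whichever of the two signals exist on the current platform to `signal.default_int_handler`, which raises `KeyboardInterrupt`. The helper name `install_interrupt_handlers` is hypothetical. `signal.signal` may only be called from the main thread, which is one more reason to do this very early.

```python
import signal


def install_interrupt_handlers(names=('SIGBREAK', 'SIGTERM')):
    """Make the named signals raise KeyboardInterrupt; skip ones that don't exist."""
    installed = []
    for name in names:
        signum = getattr(signal, name, None)
        if signum is None:
            continue   # e.g. SIGBREAK only exists on Windows
        signal.signal(signum, signal.default_int_handler)
        installed.append(name)
    return installed
```

The return value lists the signals that were actually hooked, which is handy to log at startup.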

  8. Threading
     • All threads use same basic pattern e.g.
         process_timeToExit = threading.Event()  #Global

         class Thread_Monitor(threading.Thread):
             def __init__(self, other, parms, …):
                 …initialize…
             def run(self):
                 try:
                     …setup…
                     while not self.timeToDie.isSet():
                         …do stuff…
                 except KeyboardInterrupt:
                     print 'Ctrl-Break detected; terminating…'
                 except Exception, e:
                     print FormatException()
                 process_timeToExit.set()
             def stop(self, timeout=None):
                 self.timeToDie.set()
                 threading.Thread.join(self, timeout)
     • threading.Event is your friend
     • Global Event to coordinate process termination/cleanup
     • Per-thread communication
     • “Thread, Kill Thyself” = Event.set(); “Time to die?” = Event.isSet()
     • “Thread, Art Thou Dead?” = Thread.join()
     • Alternative, pair of events:
         timeToDie = threading.Event()
         iAmDead = threading.Event()
         def KillThyself(): timeToDie.set()
         def TimeToDie(): return timeToDie.isSet()
         def IAmDead(): iAmDead.set()
         def AreYouDeadYet(): return iAmDead.isSet()
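The "pair of events" alternative can be sketched in Python 3 as follows. The class name `StoppableThread` and the `ticks` counter are hypothetical, added only so the handshake can be observed.

```python
import threading


class StoppableThread(threading.Thread):
    """Worker with a two-event handshake: 'time to die' in, 'I am dead' out."""

    def __init__(self):
        super().__init__()
        self.time_to_die = threading.Event()   # set by the controller
        self.i_am_dead = threading.Event()     # set by the worker on the way out
        self.ticks = 0

    def run(self):
        try:
            while not self.time_to_die.is_set():
                self.ticks += 1                 # stand-in for real work
                self.time_to_die.wait(0.01)     # interruptible sleep
        finally:
            self.i_am_dead.set()

    def kill_thyself(self):
        self.time_to_die.set()

    def are_you_dead_yet(self, timeout=None):
        # Event.wait returns True once the flag is set, False on timeout.
        return self.i_am_dead.wait(timeout)
```

Compared with `Thread.join()`, the second event lets any number of observers poll or wait for the worker's death without holding a reference to the thread's join machinery.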

  9. FormatException()
     • Simplify exception reporting
         import sys
         import traceback

         def __function__(nFramesUp=1):
             """Create a string naming the function n frames up on the stack."""
             co = sys._getframe(nFramesUp+1).f_code
             return "%s (%s @ %d)" % (co.co_name, co.co_filename, co.co_firstlineno)

         def FormatException(ei=None):
             if ei is None:
                 ei = sys.exc_info()
             info = traceback.format_exception(ei[0], ei[1], ei[2])
             return ''.join(info)
     • Typical usage:
         try:
             DoSomething()
         except SomeException:
             print FormatException()
     • Never catch the exception object, though you can
         try:
             DoSomething()
         except SomeException, e:
             print FormatException(e)
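A Python 3 equivalent might look like the sketch below. It is not the slide's function: Python 3 exception objects carry their own traceback in `__traceback__`, so this variant accepts an exception instance (or nothing) instead of an `exc_info` tuple. The name `format_exception` is an assumption.

```python
import sys
import traceback


def format_exception(exc=None):
    """Return the formatted traceback for exc, or for the exception being handled."""
    if exc is None:
        exc = sys.exc_info()[1]     # exception currently being handled, if any
    if exc is None:
        return ''
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return ''.join(lines)
```

Inside an `except` block, `format_exception()` needs no argument; passing a stored exception object works too, which is convenient when the error is reported later from another place.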

  10. KeyboardInterrupt
     • try block necessary per thread
     • Raised on the active thread when detected
     • Worse, KeyboardInterrupt derives from StandardError (in Python 2.4 and earlier)
     • except Exception eats everything
     • Including KeyboardInterrupt and SystemExit!
     • Probably not what you wanted…
     • This coupled with SIGBREAK fun was a bear to figure out
     • Python 3000 is supposed to 'fix' this
     • Changing the exception hierarchy!
     • Should make porting…fun…
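On current interpreters the hierarchy change has landed: since PEP 352 (Python 2.5, and all of Python 3), `KeyboardInterrupt` and `SystemExit` derive from `BaseException` rather than `Exception`. The small Python 3 check below illustrates the difference; the helper `is_caught_by` is hypothetical.

```python
def is_caught_by(exc_type, catcher=Exception):
    """Return True if 'except catcher' would intercept an exc_type instance."""
    try:
        raise exc_type()
    except catcher:
        return True
    except BaseException:
        # Fell through the first clause: catcher did not match.
        return False
```

With this hierarchy, `except Exception` in a worker's `run()` no longer swallows Ctrl-C or `sys.exit()`; only an explicit `except BaseException` (or a bare `except:`) does.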

  11. Web Server
     • PkManager predates WSGI's emergence
         class PkManagerWebServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):  #1
             pass

         class PkManagerRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):  #2
             protocol_version = 'HTTP/1.0'  #3
             server_version = 'PkManagerHTTP/' + __version__
             def do_HEAD(self):  #4
                 self.do_GET()
             def do_POST(self):  #4
                 self.ProcessRequest(self.rfile)
             def do_GET(self):  #4
                 requestbody = StringIO()
                 requestbody.seek(0)
                 self.ProcessRequest(requestbody)
                 requestbody.close()
             def ProcessRequest(self, requestbody):
                 …parse url…
                 name = 'Handler_' + path.replace('/', '_')  #5
                 handler = self.__class__.__dict__.get(name)  #6
                 if handler is None:
                     if not self.ServeStaticFile():  #7
                         self.ProcessResponse(400)  #8
                     return
                 else:
                     result = handler(self)
                 if 'Cache-Control' not in headers:
                     headers['Cache-Control'] = 'private, max-age=0'
                 self.ProcessResponse(statuscode, body, headers)  #8
     • #n = Item of interest
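The name-based dispatch (items #5 and #6) can be sketched in Python 3 with `http.server`. This is an illustration under assumptions, not the PkManager server: `handler_name`, `DemoServer`, `DemoRequestHandler`, and the `/ping` endpoint are all hypothetical, and handlers are assumed to return a `(status, body)` pair as on the request-handler slide.

```python
import socketserver
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def handler_name(url):
    """Map a request URL to a method name, e.g. '/a/b?x=1' -> 'Handler__a_b'."""
    return 'Handler_' + urlsplit(url).path.replace('/', '_')


class DemoServer(socketserver.ThreadingMixIn, HTTPServer):
    daemon_threads = True   # request threads must not block process exit


class DemoRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Look the handler up by name on the class; unknown paths get a 400.
        handler = getattr(type(self), handler_name(self.path), None)
        if handler is None:
            status, body = 400, 'No handler for this path'
        else:
            status, body = handler(self)
        data = (body or '').encode('utf-8')
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(data)))
        self.send_header('Cache-Control', 'private, max-age=0')
        self.end_headers()
        self.wfile.write(data)

    def Handler__ping(self):   # serves GET /ping
        return 200, 'pong'

    def log_message(self, format, *args):
        pass                   # keep the sketch quiet


def serve(port=8080):
    """Run the demo server until interrupted (port number is an assumption)."""
    DemoServer(('127.0.0.1', port), DemoRequestHandler).serve_forever()
```

Adding an endpoint is then just a matter of adding a method: `/process/start` would be served by a method named `Handler__process_start`.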

  12. Request Handlers
     • PkManagerRequestHandler methods, e.g.
         monitor_requests = Queue.Queue()  #Global variable

         def Handler__process_start(self):  #1
             parms = SplitToKVPairs(self.rfile)  #2
             processname = parms.get('exe')
             if processname is None:
                 return (400, 'Missing parameter (exe=<name>)')
             timeout = int(parms.get('wait', TimeoutDefault))
             iamdone = g_EventCache.get(timeout)  #3
             request = (self.effective_path, iamdone, processname)
             monitor_requests.put(request)  #4
             realtimeout = self.TimeoutMSecToRealValue(timeout)  #5
             if iamdone != None:
                 iamdone.wait(realtimeout)  #6
                 if not iamdone.isSet():  #7
                     return (408, None)
                 g_EventCache.put(iamdone)  #8
             return (200, None)  #9
     • #1: Method name = 'Handler_' + URL's path component
     • #2: Parameters are fundamentally URL query parameters
     • #5: Timeout = N or Infinite or NoWait
     • #6: Wait up to the timeout
     • #7: If timeout, HTTP status = 408 Request Timeout
     • #9: Success! HTTP status = 200 OK

  13. EventCache
     • New Event() per request = Huge Perf Pig
     • Took 3 hours to identify bottleneck
     • Only 20 minutes to solve! IPython
         class EventCache:
             def __init__(self):
                 self.cache = Queue.Queue()
             def get(self, timeout):
                 if timeout == Timeout_NoWait:
                     return None
                 try:
                     event = self.cache.get_nowait()
                     event.clear()
                     return event
                 except Queue.Empty:
                     return threading.Event()
             def put(self, event):
                 self.cache.put(event)
             def __len__(self):
                 return self.cache.qsize()

         g_EventCache = EventCache()
     • Call get(timeout) for a new Event
     • Call put(event) to return Event to cache when done
     • Only if done with the Event
     • If errors occurred (e.g. timeout), don't put()
     • Python will clean up the Event object once no longer referenced

  14. Queue.Queue
     • All inter-thread-communication via Event and Queue
     • Handler creates a tuple to queue
     • (resource, event, …parameters…)
     • Output parameters passed as empty list
         iamdone = Event(); name = []; age = []; shoesize = []
         request = (self.effective_path, iamdone, name, age, shoesize)
         queue.put(request)
         iamdone.wait()
         print name[0], age[0], shoesize[0]
     • Monitor thread pulls requests from queue
         def HandleRequests():
             try:
                 while 1:
                     request = queue.get_nowait()
                     path = request[0]
                     name = 'HandleAPIRequest_' + path.replace('/', '_')
                     handler = globals().get(name)
                     assert handler != None
                     handler(request)
             except Queue.Empty, e:
                 pass

         def HandleAPIRequest__some_service_entrypoint(request):
             name = request[2]; age = request[3]; shoesize = request[4]
             …do stuff…
             name.append(…); age.append(…); shoesize.append(…)
             iamdone = request[1]
             if iamdone != None:
                 iamdone.set()
     • So effective I ported Queue to C++
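The request/response handshake (a tuple on a queue, an Event to signal completion, an empty list as the output parameter) can be sketched in Python 3 as below. The `/echo` endpoint, the `call` helper, and the `HANDLERS` dictionary are hypothetical; the dictionary replaces the slide's `globals()` name lookup to keep the sketch explicit.

```python
import queue
import threading

requests = queue.Queue()   # handler threads -> monitor thread


def handle_api_request__echo(request):
    _path, done, text, result = request
    result.append(text.upper())     # fill the output parameter
    if done is not None:
        done.set()                  # wake the waiting caller


HANDLERS = {'/echo': handle_api_request__echo}


def monitor(stop):
    """Monitor thread: pull requests off the queue and dispatch them by path."""
    while not stop.is_set():
        try:
            request = requests.get(timeout=0.05)
        except queue.Empty:
            continue
        HANDLERS[request[0]](request)


def call(path, text, timeout=1.0):
    """Caller side: queue a request, then wait for completion or time out."""
    done, result = threading.Event(), []
    requests.put((path, done, text, result))
    if not done.wait(timeout):
        return 408, None            # nobody answered in time
    return 200, result[0]
```

With a monitor thread running, `call('/echo', 'hi')` blocks until the monitor has processed the request and then returns the filled-in result with status 200.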

  15. Internationalization (i18n)
     • Initially tried module gettext
     • Standard. Capable. Simple API. Very similar to GNU gettext API
     • But…needed simple deployment
     • “Zero-Install” – anything else is just a support call (or many…)
     • How to find the message catalogs?
     • localedir/language/LC_MESSAGES/domain.mo
     • Create a 3-level tree, with very fixed names, to drop a bunch of localized text resources?
     • And what about customization?
     • Bah. Python to the rescue!
     • [[PkManagerI18N-*.py]]
         i18n = {}
         i18nMeta = {}

         def i18nLoad(path):
             sys.path.insert(0, path)
             for root, paths, filenames in os.walk(path):
                 for filename in filenames:
                     if fnmatch.fnmatch(filename, 'PkManagerI18N-*.py'):
                         name = os.path.splitext(filename)[0]
                         pathname = os.path.join(root, filename)
                         try:
                             module = __import__(name)
                             text = getattr(module, 'Text', None)
                             if text != None:
                                 meta = getattr(module, 'Meta', None)
                                 for locale in text.iterkeys():
                                     i18n[locale] = text[locale]
                                     i18nMeta[locale] = meta[locale]
                         except (ImportError, SyntaxError), e:
                             Abort(5, 'Error loading i18n resource %s' % (filename))
             del sys.path[0]

  16. Internationalization (i18n) – Part Deux
     • Simple format
         Text = { 'es': { 'About':'Sobre', 'English' : u'Ingl\u00e9s', … } }
         Meta = { 'es': { 'Name':'Spanish', 'Display' :u'Espa\u00f1ol' } }
     • But what about complex languages? Python source files can use arbitrary encodings!
         # -*- coding: utf8 -*-
         Text = { 'zh': { 'About':u'亸乾些亖亃', … },
                  'jp': { 'About':u'ノキアについて', … },
                  'ar': { 'About':u'عن' } }
         Meta = { 'zh': { 'Name':'Chinese', 'Display':u'中国 ' },
                  'jp': { 'Name':'Japanese', 'Display':u'日本語 ' },
                  'ar': { 'Name':'Arabic', 'Display':u'العربية' } }
     • One neat trick in module gettext: _() is defined as 'lookup-text'. Nifty idea
         print _('About')

         def _(s, locale=None, language=None):
             if locale == None:
                 locale = options.locale
             textlist = i18n.get(locale)
             if textlist != None:
                 text = textlist.get(s)
                 if text != None:
                     return text
             if language != None:
                 textlist = i18n.get(language)
                 if textlist != None:
                     text = textlist.get(s)
                     if text != None:
                         return text
             if isint(s):
                 return s
             else:
                 return '[%s]' % (s)
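The lookup order in `_()` (exact locale first, then the broader language, then a visibly bracketed fallback) can be condensed into a short Python 3 sketch. The `translate` name, the `'es-MX'` catalog, and its contents are hypothetical sample data, and the slide's `isint()` special case is left out.

```python
# Hypothetical catalogs: a regional locale overriding a base language.
I18N = {
    'es': {'About': 'Sobre', 'Hello': 'Hola'},
    'es-MX': {'Hello': 'Quiubo'},
}


def translate(s, locale, language=None, catalogs=I18N):
    """Look s up in the locale catalog, then the language catalog, else mark it."""
    for key in (locale, language):
        if key is None:
            continue
        text = catalogs.get(key, {}).get(s)
        if text is not None:
            return text
    # Untranslated strings stay visible in the UI instead of failing.
    return '[%s]' % s
```

The bracketed fallback makes missing translations easy to spot during testing without ever raising at runtime.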

  17. py2exe
     • Running PkManager.py is natural on Unix
     • Not so much on Windows
     • py2exe binds source + runtime into .exe
         # setup.py
         import os
         from distutils.core import setup
         import py2exe
         setup(name='PkManager',
               version=GetVersion(),
               description="WMOS process manager, overseer, care and feederer",
               author='Manhattan Associates',
               url='http://www.manh.com',
               console=[{'script':"PkManager.py", 'icon_resources':[(1, 'PkManager.ico')]}],
               zipfile=None,  #Append to .exe / no separate .zip
               data_files=[('.', [os.path.abspath(r'wmosprod.dat')])],
               options={"py2exe":{"compressed":1, "optimize":2, "xref":0, "includes":[], "dll_excludes":[]}})
     • Create the executable
         python -OO setup.py py2exe
     • Replace console parameter to compile an NT Service
         service=[{'modules':'PkManager', 'script':"PkManager.py", 'icon_resources':[(1, 'PkManager.ico')]}],

  18. Demo

  19. Questions?
     • Blog: http://blog.kapustein.com
     • Email: hkapustein@manh.com
