Python@Work 2007-Jul-12: PkManager.py. Howard Kapustein, Director of Technology and Architecture, Manhattan Associates. Background: Director of Technology and Architecture; Manhattan Associates, 7 years; EPCglobal Reader Protocol 1.0, Co-Chairman [RFID].
Python@Work 2007-Jul-12
PkManager.py
Howard Kapustein
Director of Technology and Architecture
Manhattan Associates
Background
• Director of Technology and Architecture
  • Manhattan Associates, 7 years
  • EPCglobal Reader Protocol 1.0, Co-Chairman [RFID]
• SMS, Platform Services (Architecture/Subsystems), 12 years
• Open Source submitter (Jython, among others)
• 20 years experience
  • Acronym Soup: C++, Java, Visual Basic, Windows, Unix, concurrency, i18n, security, RDBMS, GUI, web server, XML, TCP, web services, REST, AJAX, JSON(!), oodles more
  • No COBOL, No Perl
• Python since 1997
  • Switched from AWK
    • Thompson AWK compiler created EXEs
    • Getting 'long in the tooth'
  • Tried to learn TCL – not so much
  • Stumbled over Eric Raymond's “Why Python?” essay
    • Made perfect sense
    • Google for “why python eric raymond”
• Python is Beautiful (even back when it was 1.5)
  • Rich language, Richer library
  • Thank god for py2exe
  • Pywin32 is pretty handy too
Application
• Warehouse Management Open Systems (WMOS)
• Large C++, CORBA, portable 'enterprise' application
  • >8 million lines of code
  • Borland's VisiBroker for C++
  • AIX, HP-UX, Linux, Solaris, Windows
• 24x7x365 – Near-realtime 'Execution' system
  • i.e. 1 hour outage = millions of dollars
  • Heavy RF+MHE interaction
  • 99% of activity is high volume, low latency
• Heavy customization element
  • Routinely modified for every customer
  • Each customer = Forked codebase
  • IOW more variables + post-release
• Performance, Scalability, Latency, Reliability, Resiliency
  • The 'not negotiable' family
Problem
• CORBA process:
  • Server = EXE: initializes, registers available factories with ORB, responds to requests
  • Client:
      Factory* f = bind("factory");
      Object* o = f->newInstance();
      o->DoStuff();
      release o; release f;   // aka delete
  • ORB knows which factories are supposed to be available and which actually are
  • Borland: If the requested factory is not running, the ORB asks the Object Activation Daemon (OAD) to start it [Just-In-Time Activation]
• Problem: OAD stability is abominable
  • Runs for hours/days, then randomly hangs or crashes for no apparent reason
  • But JIT support made it popular for non-production (test, dev, …)
  • Doesn't mean we didn't regularly see support issues due to folks using the OAD
• Homegrown replacements:
  • PkPad: Unix shell script, pre-starts a list of processes, polls via ps to detect premature death and restart
    • Cons: 30 second sleep between sweeps (or huge perf hit), no JIT, no management
  • PkManager.exe: NT Service, multithreaded, interrupt-driven (no polling)
    • Cons: Windows only (<20% of customers), no JIT
• Solution: PkManager.py
  • Superset: JIT + PreStart, interrupt-based (no polling), administration interface
  • And by-god-rock-solid-reliable!
Basic Architecture
• Global Variable: timeToExit = threading.Event()
• Thread 1: Main
  • Initialize (parse command line etc)
  • Start worker threads
  • Main loop
      while not timeToExit.isSet():
          time.sleep(0.1)
• Thread 2: Monitor (Process Manager)
      while not timeToDie.isSet():
          ProcessRequests()
          StartChildren()
          WaitForDeath()
      timeToExit.set()
• Thread 3: API (Web Server)
  • JIT requests
  • Administration Console
  • Web Services
• Thread 4: Uptime (Reporter)
      while not timeToDie.isSet():
          print 'Uptime: %s since %s' % (now-startup, startup)
          timeToDie.wait(n)
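The thread layout above can be sketched end to end. This is a minimal illustration written for Python 3 (the slides use Python 2), not the PkManager source: the `Worker` class, its `interval` parameter, and the `run_for` demo deadline are all assumptions made so the example terminates on its own.

```python
import threading
import time

time_to_exit = threading.Event()   # process-wide "shut down" flag


class Worker(threading.Thread):
    """Hypothetical worker: loops until told to die, then flags process exit."""

    def __init__(self, name, interval=0.01):
        super().__init__(name=name)
        self.time_to_die = threading.Event()
        self.interval = interval
        self.iterations = 0

    def run(self):
        try:
            while not self.time_to_die.is_set():
                self.iterations += 1                  # stand-in for real work
                self.time_to_die.wait(self.interval)  # sleep, but wake early on stop()
        finally:
            time_to_exit.set()                        # a finished worker ends the process

    def stop(self, timeout=None):
        self.time_to_die.set()
        self.join(timeout)


def main(run_for=0.05):
    workers = [Worker('monitor'), Worker('uptime')]
    for w in workers:
        w.start()
    deadline = time.monotonic() + run_for             # demo-only: bound the run time
    # Main loop: idle until a thread requests exit or the demo deadline passes.
    while not time_to_exit.is_set() and time.monotonic() < deadline:
        time.sleep(0.01)
    for w in workers:
        w.stop(timeout=1.0)
    return workers
```

Waiting on the per-thread event instead of calling `time.sleep()` is what lets `stop()` take effect immediately rather than after the next sleep interval.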
Configuration (DSL)
• Configuration file = Domain Specific Language (DSL)
  • [[wmosprod.dat]]
• Python dictionaries are sweet!
  • Look ma, it's JSON
      symbols = {'N':'order', 'OnStart':'#prestart', 'JIT':'#ondemand', …}
      config = []
      errors = 0
      lineno = 0
      for line in open('wmosprod.dat'):
          lineno += 1
          try:
              entry = eval(line.strip(), {}, symbols)
              config.append(entry)      # one entry per line
          except Exception:
              print 'Error line %d' % (lineno)
              errors += 1
      if errors > 0:
          raise UserWarning('Uh-oh…')
• Users see simple and obvious configuration
• Code is maintainable and simple
  • Mostly there to 'nicely' handle and report errors
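To make the eval-per-line idea concrete, here is a small Python 3 sketch. The contents of wmosprod.dat are not shown in the slides, so the entry format below (one dictionary per line with `exe` and `mode` keys) is purely an assumption, as are the function name `load_config` and the sample process names.

```python
# Names the config file may use bare, without quotes (taken from the slide).
SYMBOLS = {'N': 'order', 'OnStart': '#prestart', 'JIT': '#ondemand'}


def load_config(lines, symbols=SYMBOLS):
    """Evaluate each non-blank line as a Python expression.

    Returns (entries, errors), where errors is a list of (lineno, message).
    """
    config, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            # An empty __builtins__ discourages, but does not prevent, hostile input.
            config.append(eval(line, {'__builtins__': {}}, dict(symbols)))
        except Exception as exc:
            errors.append((lineno, str(exc)))
    return config, errors


# Hypothetical config lines; the real file format is not shown in the talk.
sample = [
    "{N: 1, 'exe': 'pkserver', 'mode': OnStart}",
    "{N: 2, 'exe': 'pkreports', 'mode': JIT}",
    "{N: 3, 'exe': oops}",   # undefined name: reported with its line number
]
config, errors = load_config(sample)
```

Collecting every error before failing, as the slide's code does with its counter, lets a user fix the whole file in one pass. Because `eval` runs arbitrary expressions, this approach only suits configuration files that are as trusted as the program itself.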
Signals – Ouch!
• TIP: Do this very early
      import signal
      signals = dir(signal)
      if 'SIGBREAK' in signals:
          signal.signal(signal.SIGBREAK, signal.default_int_handler)
      if 'SIGTERM' in signals:
          signal.signal(signal.SIGTERM, signal.default_int_handler)
• Surprises
  • #1: SIGBREAK+SIGTERM not always available
  • #2: Default action is usually terminate
• Now except KeyboardInterrupt will trip
Threading
• All threads use same basic pattern e.g.
      process_timeToExit = threading.Event()   #Global

      class Thread_Monitor(threading.Thread):
          def __init__(self, other, parms, …):
              threading.Thread.__init__(self)
              self.timeToDie = threading.Event()
              …initialize…
          def run(self):
              try:
                  …setup…
                  while not self.timeToDie.isSet():
                      …do stuff…
              except KeyboardInterrupt:
                  print 'Ctrl-Break detected; terminating…'
              except Exception, e:
                  print FormatException()
              process_timeToExit.set()
          def stop(self, timeout=None):
              self.timeToDie.set()
              threading.Thread.join(self, timeout)
• threading.Event is your friend
  • Global Event to coordinate process termination/cleanup
  • Per-thread communication
    • “Thread, Kill Thyself” = Event.set(); “Time to die?” = Event.isSet()
    • “Thread, Art Thou Dead?” = Thread.join()
• Alternative, pair of events:
      timeToDie = threading.Event()
      iAmDead = threading.Event()
      def KillThyself(): timeToDie.set()
      def TimeToDie(): return timeToDie.isSet()
      def IAmDead(): iAmDead.set()
      def AreYouDeadYet(): return iAmDead.isSet()
FormatException()
• Simplify exception reporting
      import sys
      import traceback

      def __function__(nFramesUp=1):
          """Create a string naming the function n frames up on the stack."""
          co = sys._getframe(nFramesUp+1).f_code
          return "%s (%s @ %d)" % (co.co_name, co.co_filename, co.co_firstlineno)

      def FormatException(ei=None):
          """Format an exc_info tuple (default: the exception being handled)."""
          if ei is None:
              ei = sys.exc_info()
          info = traceback.format_exception(ei[0], ei[1], ei[2])
          return ''.join(info)
• Typical usage:
      try:
          DoSomething()
      except SomeException:
          print FormatException()
• Never need to catch the exception object, though you can
      try:
          DoSomething()
      except SomeException, e:
          print FormatException(sys.exc_info())   # takes an exc_info tuple, not e
KeyboardInterrupt
• try block necessary per thread
  • Raised on the active thread when detected
• Worse, KeyboardInterrupt derives from StandardError, itself a subclass of Exception (Python 2.4 and earlier; Python 2.5 moved KeyboardInterrupt and SystemExit directly under BaseException)
  • except Exception eats everything
  • Including KeyboardInterrupt and SystemExit!
  • Probably not what you wanted…
• This coupled with SIGBREAK fun was a bear to figure out
• Python 3000 is supposed to 'fix' this
  • Changing the exception hierarchy!
  • Should make porting…fun…
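The hierarchy problem described above is easy to check on whatever interpreter is at hand. The sketch below is written for Python 3, where KeyboardInterrupt and SystemExit sit directly under BaseException; the helper name `classify` is made up for this illustration.

```python
def classify(exc_type):
    """Report which handler catches an exception of the given type."""
    try:
        raise exc_type()
    except Exception:
        return 'except Exception'
    except BaseException:
        return 'except BaseException only'


results = {t.__name__: classify(t)
           for t in (ValueError, KeyboardInterrupt, SystemExit)}
```

On an interpreter with the old hierarchy, all three would land in the first handler, which is exactly the trap the slide describes: a blanket `except Exception` silently swallows the interrupt.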
Web Server
• PkManager predates WSGI's emergence
      class PkManagerWebServer(SocketServer.ThreadingMixIn,
                               BaseHTTPServer.HTTPServer):                  #1
          pass

      class PkManagerRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler): #2
          protocol_version = 'HTTP/1.0'                                     #3
          server_version = 'PkManagerHTTP/' + __version__
          def do_HEAD(self):                                                #4
              self.do_GET()
          def do_POST(self):                                                #4
              self.ProcessRequest(self.rfile)
          def do_GET(self):                                                 #4
              requestbody = StringIO()
              requestbody.seek(0)
              self.ProcessRequest(requestbody)
              requestbody.close()
          def ProcessRequest(self, requestbody):
              …parse url…
              name = 'Handler_' + path.replace('/', '_')                    #5
              handler = self.__class__.__dict__.get(name)                   #6
              if handler is None:
                  if not self.ServeStaticFile():                            #7
                      self.ProcessResponse(400)                             #8
                  return
              else:
                  result = handler(self)
              if 'Cache-Control' not in headers:
                  headers['Cache-Control'] = 'private, max-age=0'
              self.ProcessResponse(statuscode, body, headers)               #8
• #n = Item of interest
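The name-based dispatch at items #5 and #6 can be tried without a socket. The following Python 3 sketch keeps only that step; the `Dispatcher` class, its two sample handlers, and the `params` dictionary are hypothetical, and the static-file fallback from the slide is left out.

```python
class Dispatcher:
    """Hypothetical stand-in for the request handler's name-based dispatch."""

    def Handler__process_start(self, params):
        if 'exe' not in params:
            return (400, 'Missing parameter (exe=<name>)')
        return (200, 'started %s' % params['exe'])

    def Handler__status(self, params):
        return (200, 'ok')

    def dispatch(self, path, params=None):
        # '/process/start' -> 'Handler__process_start'
        name = 'Handler_' + path.replace('/', '_')
        handler = getattr(type(self), name, None)   # look the method up on the class
        if handler is None:
            return (400, None)                       # the slide's code also answers 400
        return handler(self, params or {})


d = Dispatcher()
started = d.dispatch('/process/start', {'exe': 'pkserver'})
missing = d.dispatch('/process/start')
unknown = d.dispatch('/no/such/thing')
```

The fixed `Handler_` prefix matters: it means a URL can only ever reach methods that were deliberately named as handlers, never arbitrary attributes of the class.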
Request Handlers
• PkManagerRequestHandler methods, e.g.
      monitor_requests = Queue.Queue()   #Global variable

      def Handler__process_start(self):                             #1
          parms = SplitToKVPairs(self.rfile)                        #2
          processname = parms.get('exe')
          if processname is None:
              return (400, 'Missing parameter (exe=<name>)')
          timeout = int(parms.get('wait', TimeoutDefault))
          iamdone = g_EventCache.get(timeout)                       #3
          request = (self.effective_path, iamdone, processname)
          monitor_requests.put(request)                             #4
          realtimeout = self.TimeoutMSecToRealValue(timeout)        #5
          if iamdone is not None:
              iamdone.wait(realtimeout)                             #6
              if not iamdone.isSet():                               #7
                  return (408, None)
              g_EventCache.put(iamdone)                             #8
          return (200, None)                                        #9
• #1: Method name = 'Handler_' + URL's path component
• #2: Parameters are fundamentally URL query parameters
• #5: Timeout = N or Infinite or NoWait
• #6: Wait up to the timeout
• #7: If timeout, HTTP status = 408 Request Timeout
• #9: Success! HTTP status = 200 OK
EventCache
• New Event() per request = Huge Perf Pig
  • Took 3 hours to identify bottleneck
  • Only 20 minutes to solve! IPython
      class EventCache:
          def __init__(self):
              self.cache = Queue.Queue()
          def get(self, timeout):
              if timeout == Timeout_NoWait:
                  return None
              try:
                  event = self.cache.get_nowait()
                  event.clear()
                  return event
              except Queue.Empty:
                  return threading.Event()
          def put(self, event):
              self.cache.put(event)
          def __len__(self):
              return self.cache.qsize()

      g_EventCache = EventCache()
• Call get(timeout) for a new Event
• Call put(event) to return Event to cache when done
  • Only if done with the Event
  • If errors occurred (e.g. timeout), don't put()
  • Python will clean up the Event object once no longer referenced
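A short usage run shows the recycling behaviour the bullets describe. This is a Python 3 rendering of the same class (`queue` instead of `Queue`, `is_set()` instead of `isSet()`); the `NO_WAIT` sentinel is an assumption standing in for the slide's `Timeout_NoWait`, whose actual value is not shown.

```python
import queue
import threading

NO_WAIT = object()   # hypothetical sentinel standing in for Timeout_NoWait


class EventCache:
    """Pool of threading.Event objects, reused instead of re-created."""

    def __init__(self):
        self._cache = queue.Queue()

    def get(self, timeout=None):
        if timeout is NO_WAIT:
            return None                  # caller will not wait, so no Event needed
        try:
            event = self._cache.get_nowait()
        except queue.Empty:
            return threading.Event()     # pool is empty: make a fresh one
        event.clear()                    # recycled Events must start unset
        return event

    def put(self, event):
        self._cache.put(event)

    def __len__(self):
        return self._cache.qsize()


cache = EventCache()
first = cache.get()
first.set()            # pretend the request completed
cache.put(first)       # done with it: hand it back
second = cache.get()   # the same object comes back, cleared
```

Not returning an Event after a timeout is the safe choice: the monitor thread may still call `set()` on it later, which would corrupt whichever request had picked it up from the pool.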
Queue.Queue
• All inter-thread-communication via Event and Queue
• Handler creates a tuple to queue
  • (resource, event, …parameters…)
  • Output parameters passed as empty list
      iamdone = Event(); name = []; age = []; shoesize = []
      request = (self.effective_path, iamdone, name, age, shoesize)
      queue.put(request)
      iamdone.wait()
      print name[0], age[0], shoesize[0]
• Monitor thread pulls requests from queue
      def HandleRequests():
          try:
              while 1:
                  request = queue.get_nowait()
                  path = request[0]
                  name = 'HandleAPIRequest_' + path.replace('/', '_')
                  handler = globals().get(name)
                  assert handler != None
                  handler(request)
          except Queue.Empty, e:
              pass

      def HandleAPIRequest__some_service_entrypoint(request):
          name = request[2]; age = request[3]; shoesize = request[4]
          …do stuff…
          name.append(…); age.append(…); shoesize.append(…)
          iamdone = request[1]
          if iamdone != None:
              iamdone.set()
• So effective I ported Queue to C++
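The request/response round trip above can be run in isolation. The Python 3 sketch below makes two substitutions that should be read as assumptions: handlers are found in a plain dictionary rather than through `globals()`, and the monitor blocks on `get()` with a `None` shutdown sentinel instead of polling with `get_nowait()`. The `/lookup` path and the returned values are invented for the demo.

```python
import queue
import threading

requests = queue.Queue()


def handle_lookup(request):
    """Hypothetical service: fill the caller's output lists, then signal completion."""
    _path, done, name, age = request
    name.append('Ada')
    age.append(36)
    if done is not None:
        done.set()


HANDLERS = {'/lookup': handle_lookup}


def monitor():
    while True:
        request = requests.get()       # blocks until a request arrives
        if request is None:            # sentinel: shut down
            break
        HANDLERS[request[0]](request)


worker = threading.Thread(target=monitor)
worker.start()

# Caller side: empty lists act as output parameters, the Event as the reply signal.
done, name, age = threading.Event(), [], []
requests.put(('/lookup', done, name, age))
completed = done.wait(timeout=5)

requests.put(None)
worker.join(timeout=5)
```

The empty lists work as output parameters because both threads hold references to the same list objects; the Event provides the ordering guarantee that the lists are filled before the caller reads them.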
Internationalization (i18n)
• Initially tried module gettext
  • Standard. Capable. Simple API. Very similar to GNU gettext API
• But…needed simple deployment
  • “Zero-Install” – anything else is just a support call (or many…)
• How to find the message catalogs?
  • localedir/language/LC_MESSAGES/domain.mo
  • Create a 3-level tree, with very fixed names, to drop a bunch of localized text resources?
  • And what about customization?
• Bah. Python to the rescue!
  • [[PkManagerI18N-*.py]]
      import fnmatch, os, sys

      i18n = {}
      i18nMeta = {}

      def i18nLoad(path):
          sys.path.insert(0, path)
          for root, paths, filenames in os.walk(path):
              for filename in filenames:
                  if fnmatch.fnmatch(filename, 'PkManagerI18N-*.py'):
                      name = os.path.splitext(filename)[0]
                      pathname = os.path.join(root, filename)
                      try:
                          module = __import__(name)
                          text = getattr(module, 'Text', None)
                          if text != None:
                              meta = getattr(module, 'Meta', None)
                              for locale in text.iterkeys():
                                  i18n[locale] = text[locale]
                                  i18nMeta[locale] = meta[locale]
                      except (ImportError, SyntaxError), e:
                          Abort(5, 'Error loading i18n resource %s' % (filename))
          del sys.path[0]
Internationalization (i18n) – Part Deux
• Simple format
      Text = { 'es': { 'About':'Sobre', 'English' : u'Engl\u00e9s', … } }
      Meta = { 'es': { 'Name':'Spanish', 'Display' :u'Espa\u00f1ol' } }
• But what about complex languages? Python source files can use arbitrary encodings!
      # -*- coding: utf8 -*-
      Text = { 'zh': { 'About':u'亸乾些亖亃', … },
               'jp': { 'About':u'ノキアについて', … },
               'ar': { 'About':u'عن' } }
      Meta = { 'zh': { 'Name':'Chinese', 'Display':u'中国 ' },
               'jp': { 'Name':'Japanese', 'Display':u'日本語 ' },
               'ar': { 'Name':'Arabic', 'Display':u'العربية' } }
• One neat trick in module gettext: _() is defined as 'lookup-text'. Nifty idea
      print _('About')

      def _(s, locale=None, language=None):
          if locale == None:
              locale = options.locale
          textlist = i18n.get(locale)
          if textlist != None:
              text = textlist.get(s)
              if text != None:
                  return text
          if language != None:
              textlist = i18n.get(language)
              if textlist != None:
                  text = textlist.get(s)
                  if text != None:
                      return text
          if isint(s):
              return s
          else:
              return '[%s]' % (s)
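The lookup order in `_()` (locale first, then language, then a bracketed marker) is easiest to see with a tiny catalog. This Python 3 sketch uses an invented catalog and the name `translate`; the slide's `isint()` helper is not shown, so `isinstance(s, int)` is an assumption about what it checks.

```python
# Hypothetical catalogs: a regional locale overriding part of a base language.
I18N = {
    'es': {'About': 'Sobre', 'Help': 'Ayuda'},
    'es_MX': {'About': 'Acerca de'},
}


def translate(s, locale, language=None, catalogs=I18N):
    """Look up s in the locale catalog, then the language catalog, else mark it."""
    for key in (locale, language):
        if key is None:
            continue
        text = catalogs.get(key, {}).get(s)
        if text is not None:
            return text
    # Untranslated strings are bracketed so they stand out on screen;
    # integers pass through unchanged (assumed behaviour of isint()).
    return s if isinstance(s, int) else '[%s]' % s


regional = translate('About', 'es_MX', 'es')
fallback = translate('Help', 'es_MX', 'es')
untranslated = translate('Missing', 'es_MX', 'es')
```

Bracketing missing strings instead of raising keeps the console usable with a partial translation while making every gap visible to whoever maintains the catalog.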
py2exe
• Running PkManager.py is natural on Unix
  • Not so much on Windows
• py2exe binds source + runtime into .exe
      # setup.py
      import os
      from distutils.core import setup
      import py2exe

      setup(name='PkManager',
            version=GetVersion(),
            description="WMOS process manager, overseer, care and feederer",
            author='Manhattan Associates',
            url='http://www.manh.com',
            console=[{'script':"PkManager.py",
                      'icon_resources':[(1, 'PkManager.ico')]}],
            zipfile=None,   #Append to .exe / no separate .zip
            data_files=[('.', [os.path.abspath(r'wmosprod.dat')])],
            options={"py2exe":{"compressed":1, "optimize":2, "xref":0,
                               "includes":[], "dll_excludes":[]}})
• Create the executable
      python -OO setup.py py2exe
• Replace console parameter to compile an NT Service
      service=[{'modules':'PkManager', 'script':"PkManager.py",
                'icon_resources':[(1, 'PkManager.ico')]}],
Questions? • Blog: http://blog.kapustein.com • Email: hkapustein@manh.com