
POSH Python Object Sharing

POSH Python Object Sharing. Steffen Viken Valvåg, in collaboration with Kjetil Jacobsen & Åge Kvalnes. University of Tromsø, Norway. Sponsored by Fast Search & Transfer.


Presentation Transcript


  1. POSH Python Object Sharing
     Steffen Viken Valvåg, in collaboration with Kjetil Jacobsen & Åge Kvalnes
     University of Tromsø, Norway
     Sponsored by Fast Search & Transfer

  2. Python Execution Model
     Test.py (source):
         for x in "TEST PROGRAM":
             if x not in "FORGET IT":
                 print x,
     Byte-code compilation produces Test.pyc:
          0 SETUP_LOOP
          3 LOAD_CONST ('TEST PROGRAM')
          6 GET_ITER
          7 FOR_ITER
         10 STORE_FAST (x)
         13 LOAD_FAST (x)
         16 LOAD_CONST ('FORGET IT')
         19 COMPARE_OP (not in)
         22 JUMP_IF_FALSE (to 33)
         25 POP_TOP
         26 LOAD_FAST (x)
         29 PRINT_ITEM
         30 JUMP_FORWARD (to 34)
         33 POP_TOP
         34 JUMP_ABSOLUTE
         37 POP_BLOCK
     Interpretation of the byte code produces the output: SPAM
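
The compile-then-interpret pipeline on this slide can be inspected with the standard `dis` module. The sketch below is written for Python 3, so the opcode names and offsets it prints will differ from the Python 2 listing on the slide; the function name `spam_filter` is an illustrative assumption, not part of the presentation.

```python
import dis

def spam_filter(text="TEST PROGRAM", banned="FORGET IT"):
    """Same loop as Test.py, collecting characters instead of printing them."""
    kept = []
    for x in text:
        if x not in banned:
            kept.append(x)
    return kept

# Opcode names the running interpreter generated for the loop above.
opnames = [ins.opname for ins in dis.get_instructions(spam_filter)]

if __name__ == "__main__":
    dis.dis(spam_filter)            # human-readable byte-code listing
    print(" ".join(spam_filter()))  # the characters that survive the filter
```

Running the script prints the byte-code listing for the current interpreter followed by the filtered characters.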

  3. Python Threading Model
     [Diagram: Thread A and Thread B each execute byte codes, sharing one GIL]
     • Each thread executes a separate sequence of byte codes
     • All threads must contend for one global interpreter lock (GIL)

  4. Example: Matrix Multiplication
     • Performs a matrix multiplication A = B x C
     • The work is split between several worker threads
     • The application runs on a machine with 8 CPUs
     • Threads do not scale across multiple CPUs due to lock contention on the GIL
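
A minimal sketch of the threaded version described above, assuming matrices are plain lists of lists and that rows are dealt out round-robin to the workers (the slides do not say how the work was partitioned). The names `matmul_rows` and `threaded_matmul` are hypothetical. On a standard GIL-enabled CPython build the threads take turns executing byte codes, so this gives the right answer but is not expected to speed up with more CPUs.

```python
import threading

def matmul_rows(B, C, A, rows):
    """Compute the given rows of A = B x C, writing them into the shared list A."""
    inner, cols = len(C), len(C[0])
    for i in rows:
        A[i] = [sum(B[i][k] * C[k][j] for k in range(inner)) for j in range(cols)]

def threaded_matmul(B, C, n_threads=4):
    """Split the rows of the result between n_threads worker threads."""
    A = [None] * len(B)  # ordinary Python list, shared implicitly by all threads
    threads = [
        threading.Thread(target=matmul_rows,
                         args=(B, C, A, range(t, len(B), n_threads)))
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return A
```

Note that the workers communicate only by writing into `A`, a regular container object; no explicit message passing is needed.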

  5. Workaround: Processes
     [Diagram: Process A and Process B connected by IPC]
     • Each process has its own interpreter lock
     • Requires inter-process communication, e.g. message passing over pipes

  6. Matrix Multiplication using Processes
     • A master process distributes the input matrices to a set of worker processes
     • Each worker process computes some part of the output matrix, and returns its result to the master
     • The master process assembles the final result matrix
     • More communication, and a more complex pattern, than using threads
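
The master/worker pattern above can be sketched with the standard `multiprocessing` module and one pipe per worker. This is an illustration under stated assumptions, not the code behind the presentation: it targets Python 3, uses the Unix-only `fork` start method, and the names `worker` and `process_matmul` are hypothetical.

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # assumption: Unix; fork is unavailable on Windows

def worker(conn):
    """Receive a block of rows of B plus all of C, send back the matching rows of A."""
    b_rows, C = conn.recv()
    cols = list(zip(*C))
    conn.send([[sum(b * c for b, c in zip(row, col)) for col in cols]
               for row in b_rows])
    conn.close()

def process_matmul(B, C, n_workers=2):
    """Master: distribute row blocks, then assemble the result in order."""
    chunk = max(1, -(-len(B) // n_workers))  # ceiling division
    jobs = []
    for start in range(0, len(B), chunk):
        parent_end, child_end = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child_end,))
        proc.start()
        child_end.close()                    # the parent keeps only its own end
        parent_end.send((B[start:start + chunk], C))
        jobs.append((proc, parent_end))
    A = []
    for proc, conn in jobs:
        A.extend(conn.recv())                # results arrive as pickled messages
        proc.join()
    return A
```

Every input and output matrix is pickled and copied through a pipe, which is the extra communication cost the slide refers to.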

  7. Ways Ahead
     • Communication through standard Python container objects favors threads
     • Scalability on multiprocessor architectures favors processes
     • The GIL is here to stay, so making threads scale better is hard
     • However, there might be room for improvement of inter-process communication mechanisms

  8. Using Shared Memory for IPC
     [Diagram: Process A and Process B both attached to a Shared Memory region]
     • Processes communicate by accessing a shared memory region
     • Requires explicit synchronization and data marshalling, and imposes a flat data structure
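
To make the "flat data structure" and "data marshalling" points concrete, here is a small sketch using an anonymous shared `mmap` region and `struct`. It assumes Python 3 on Unix with the `fork` start method; the record layout `FMT` and the function names are made up for the example.

```python
import mmap
import multiprocessing as mp
import struct

ctx = mp.get_context("fork")  # assumption: Unix; the child inherits the mapping
FMT = "<3d"                   # flat layout: three little-endian doubles

def child(region):
    # Marshal Python floats into raw bytes at a fixed offset.
    struct.pack_into(FMT, region, 0, 1.5, 2.5, 4.0)

def roundtrip():
    region = mmap.mmap(-1, struct.calcsize(FMT))  # anonymous shared mapping
    proc = ctx.Process(target=child, args=(region,))
    proc.start()
    proc.join()               # explicit synchronization: wait for the writer
    values = struct.unpack_from(FMT, region, 0)   # unmarshal back to floats
    region.close()
    return values
```

Both sides must agree on `FMT` in advance, and nested or variable-sized Python objects do not fit this scheme without extra encoding work.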

  9. Using POSH for IPC
     [Diagram: Process A and Process B issue calls such as X.method1() and L.extend([X, Y]) on objects X, Y and L held in Shared Memory]
     • Allocates regular Python objects in shared memory
     • Shared objects are accessed transparently through regular method calls
     • IPC is done by modifying shared, mutable objects

  10. Complications
     • Processes must synchronize their access to shared, mutable objects (just like threads)
     • Explicit synchronization of critical regions must be possible, while implicit synchronization upon accessing shared objects is desirable
     • Python’s regular garbage collection algorithm is inadequate for shared objects, which may be referenced by multiple processes
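
The first point, that processes must synchronize access to shared mutable state just as threads do, can be illustrated with the standard library rather than POSH itself. The sketch assumes Python 3 on Unix (`fork` start method); `deposit` and `run` are hypothetical names.

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # assumption: Unix

def deposit(balance, times):
    for _ in range(times):
        # Explicit critical region: the read-modify-write must not interleave
        # with the same sequence in another process.
        with balance.get_lock():
            balance.value += 1

def run(n_workers=4, times=1000):
    balance = ctx.Value("i", 0)  # a C int in shared memory, with its own lock
    procs = [ctx.Process(target=deposit, args=(balance, times))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return balance.value
```

Without the `get_lock()` block, increments from different processes could overwrite each other and the final count could come out too low.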

  11. Proxy Objects
     [Diagram: calls X.method1() and X.method2() go to the Proxy Object, which forwards them to the Shared Object X and passes the return values back]
     • Provides transparent access to a shared object by forwarding all attribute accesses and method calls
     • Provides a single entry point to a shared object, where synchronization policies may be enforced
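
The forwarding idea can be sketched in plain Python. This is a single-process illustration of the pattern, not POSH's proxy implementation: the class name `Proxy` is hypothetical, and a `threading.RLock` stands in for whatever cross-process synchronization policy a real proxy would enforce.

```python
import threading

class Proxy:
    """Forward attribute access and method calls to a target, under one lock."""

    def __init__(self, target, lock=None):
        # Bypass our own __setattr__ so these land on the proxy itself.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_lock", lock or threading.RLock())

    def __getattr__(self, name):
        # Only reached for names not found on the proxy, i.e. the target's.
        with self._lock:
            attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            with self._lock:  # single entry point: every call is serialized
                return attr(*args, **kwargs)
        return locked_call

    def __setattr__(self, name, value):
        with self._lock:
            setattr(self._target, name, value)
```

For example, `Proxy([]).extend([1, 2])` forwards to the underlying list. One limitation of this sketch is that special methods looked up on the type, such as `len(proxy)`, bypass `__getattr__` and would need explicit forwarding.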

  12. Multi-Process Garbage Collection
     • Must account for references from all live processes
     • Must stay up-to-date when processes fork, as this may create new references to shared objects
     • Should be able to handle abnormal process termination without leaking shared objects

  13. Garbage Collection in POSH
     [Diagram: Process A and Process B hold proxy objects that refer to shared objects X, M, Y and L in Shared Memory]
     Legend:
     • Proxy Object
     • Shared Object
     • Regular Python reference
     • Reference from a process to a shared object (type I)
     • Reference from one shared object to another (type II)

  14. Garbage Collection Details
     • POSH creates at most one proxy object per process for any given shared object
     • Shared objects are always referenced through their proxy objects
     • A bitmap in each shared object records the processes that have a corresponding proxy object. This tracks references of type I (from a process)
     • A separate count in each shared object records the number of references to the object from other shared objects. This tracks references of type II
     • Shared objects are deleted when there are no references to them of either type
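
The bookkeeping described above can be modelled in a few lines. This is a toy, single-process model of the two reference kinds, written from the slide text alone; the class and method names are hypothetical and do not come from the POSH source, and process "slots" are simply small integers indexing bits in the bitmap.

```python
class SharedObjectHeader:
    """Toy model of the per-object GC state: a process bitmap plus a count."""

    def __init__(self):
        self.proxy_bitmap = 0  # bit i set: process slot i holds a proxy (type I)
        self.shared_refs = 0   # references from other shared objects (type II)

    def attach(self, slot):
        # Setting a bit is idempotent, matching "at most one proxy per process".
        self.proxy_bitmap |= 1 << slot

    def detach(self, slot):
        # Also usable for cleanup after a process in this slot dies abnormally.
        self.proxy_bitmap &= ~(1 << slot)

    def on_fork(self, parent_slot, child_slot):
        # A forked child inherits the parent's proxies, hence its references.
        if self.proxy_bitmap & (1 << parent_slot):
            self.attach(child_slot)

    def incref(self):
        self.shared_refs += 1

    def decref(self):
        self.shared_refs -= 1

    def is_garbage(self):
        return self.proxy_bitmap == 0 and self.shared_refs == 0
```

In this model an object becomes collectable only when both the bitmap and the count reach zero, which mirrors the last bullet on the slide.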

  15. Performance
     • Performing a matrix multiplication A = B x C using POSH
     • The work is split between several worker processes
     • The application runs on a machine with 8 CPUs
     • More overhead, but scales across multiple CPUs

  16. Summary
     • Python uses a global interpreter lock (GIL) to serialize execution of byte codes
     • This entails a lack of scalability on multiprocessor architectures for CPU-intensive multi-threaded apps
     • However, threads offer an attractive programming model, with implicit communication
     • Processes + shared memory reduce IPC overheads, but normally impose flat data structures and require data marshalling
     • POSH uses processes + shared memory to offer a programming model similar to threads, with the scalability of processes

  17. Availability
     • Open source, hosted at SourceForge
     • http://poshmodule.sf.net/
     • Still not very stable
     • Developers wanted

  18. Example Usage
         import posh

         class Stuff(object):
             pass

         posh.allow_sharing(Stuff, posh.generic_init)
         mystuff = posh.share(Stuff())

         def worker1():
             mystuff.money = 0

         def worker2():
             mystuff.debt = 100000

         def worker3():
             mystuff.balance = mystuff.money - mystuff.debt

         for w in worker1, worker2, worker3:
             posh.forkcall(w)
         posh.waitall()
