Optimizing User Code in Allegro CL 5.0 by Duane Rettig
Optimizing User Code in Allegro CL 5.0
• Introduction
• Optimization-related Lisp architecture
• Undocumented tools in Allegro CL
• Optimization methodology
• Speed optimizations
• Space optimizations
• Speed vs. space tradeoffs
• Lisp heap management
Optimization-related Lisp Architecture
• Static vs. dynamic: function dispatch
• Closure structure
• Foreign functions: the entry-vec struct
• Disassemble extensions
Architecture: Static vs. Dynamic
• Pure static: absolute
• Relocatable
• Shared libraries
• Dynamic shared libraries
• Dynamic functions
Static Programs
• Absolute addresses
• Fast startup
• Fast running
• Large
• Not reconfigurable
[Diagram: address space starting at 0: Code, Data, sbrk]
Relocatable Programs
• Not tied to a base address
• Slightly longer startup times
• Fast running
• Large
• Not reconfigurable
[Diagram: Code, Data, Reloc]
Programs that Use Shared Libraries
• Usually need relocation
• Smaller than programs linked with non-shared libraries
• Faster startup times
• Medium speed; may start slow and gain speed after first use
• Not reconfigurable
[Diagram: Main, Lib 1, Lib 2, Lib 3]
Programs that Use Dynamic Shared Libraries
• May be absolute or relocatable
• May be very small
• Very fast startup
• Medium speed, amortized over library loading
• Reconfigurable
[Diagram: Main, Lib 1, Lib 2, Lib 3]
Programs that Dynamically Define Functions
• May be absolute or relocatable
• May be very small
• Very fast startup
• Medium speed, amortized over function definitions
• Extremely reconfigurable
[Diagram: Main, Lisp lib, Heap, Lib 1, Lib 2, Functions]
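In portable Common Lisp terms, the reconfigurability described above means that a caller reaching a function through its name picks up a definition installed while the program runs, whereas a function object captured earlier keeps the old code. The sketch below illustrates only that language-level behavior, not Allegro CL's internals; GREET, CALL-GREET-BY-NAME and *CAPTURED-GREET* are hypothetical names.

```lisp
;; Hypothetical example: a function defined at run time, reached through
;; its name (the symbol GREET).
(defun greet () :old)

;; Capture the current function object; from now on this bypasses the name.
(defparameter *captured-greet* #'greet)

;; Calling through the symbol looks up GREET's current global definition
;; on every call.
(defun call-greet-by-name ()
  (funcall 'greet))

;; Reconfigure the running program: install a new definition under the same name.
(setf (fdefinition 'greet) (lambda () :new))

(call-greet-by-name)       ; returns :NEW
(funcall *captured-greet*) ; still returns :OLD
```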
Lisp Data Availability
[Diagram: func, pc, Glob table, nil, function, codevector]
C Data Availability
[Diagram: lib1, lib2]
Calling Sequence: Lisp
Caller:
• Store caller-saves registers
• Set up arguments and count
• Load name register
• Call trampoline *
• Restore caller-saves registers
Callee:
• Establish stack
• Save function
• Execute body
• Restore stack
• Restore caller’s function
• Return
Calling Sequence: C
Caller:
• Store caller-saves registers
• Set up args (no count)
• Store caller’s context
• Call function, function descriptor, or stub
• Restore caller’s context
• Restore caller-saves registers
Callee:
• Set up callee’s context
• Establish stack
• Store callee-saves registers
• Execute body
• Restore callee-saves registers
• Restore stack
• Return
Lisp’s Symbol Trampoline
Required:
• Get function register from name register
• Get start address from function
• Jump to start
Optional:
• Save argument registers
• Check for stack overflow
• Jump to call-count code
• Jump to single-step code
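The trampoline's required steps amount to an indirection from a name to a function, with optional work inserted before the jump. The following is a minimal Common Lisp model under that reading, not Allegro CL's actual trampoline (which is machine code). TRAMPOLINE-CALL, *CALL-COUNTS*, *COUNT-CALLS-P* and ADD2 are hypothetical names, and of the optional steps only the call-count hook is modeled.

```lisp
;; Hypothetical model of a symbol trampoline: the caller passes only a name,
;; and the trampoline finds the function behind it before transferring control.
(defvar *call-counts* (make-hash-table :test #'eq)
  "Per-name call counts, standing in for the optional call-count code.")

(defvar *count-calls-p* t
  "When true, the optional call-count step runs before the jump.")

(defun trampoline-call (name &rest args)
  ;; Optional step: record the call (models "jump to call-count code").
  (when *count-calls-p*
    (incf (gethash name *call-counts* 0)))
  ;; Required steps: get the function from the name, then jump to it.
  ;; (Fetching the start address from the function is folded into APPLY.)
  (apply (symbol-function name) args))

(defun add2 (x) (+ x 2))

(trampoline-call 'add2 40)     ; returns 42
(trampoline-call 'add2 1)      ; returns 3
(gethash 'add2 *call-counts*)  ; returns 2
```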
Architecture: Closures
[Diagram: External vec, Internal vec]
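The external/internal vector layout in the diagram is not recoverable from this text, so the sketch below only shows, in portable Common Lisp, the behavior any closure representation has to support: several closures sharing one captured, mutable binding. How Allegro CL distributes that binding across its vectors is not shown here; MAKE-COUNTER is a hypothetical name.

```lisp
;; Hypothetical example: both closures capture the same binding of COUNT,
;; so the environment must be shared rather than copied into each closure.
(defun make-counter ()
  (let ((count 0))
    (values (lambda () (incf count))   ; increments the shared binding
            (lambda () count))))       ; reads the shared binding

(multiple-value-bind (bump read) (make-counter)
  (funcall bump)
  (funcall bump)
  (funcall read))   ; returns 2
```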
Architecture: Foreign Functions
Entry-vec struct
(in-package :excl)
(defstruct (entry-vec (:type (vector excl::foreign (*)))
                      (:constructor make-entry-vec-boa ()))
  name              ; entry point name
  (address 0)       ; jump address for foreign code
  (handle 0)        ; shared-lib handle
  (flags 0)         ; ep-* flags
  (alt-address 0))  ; sometimes holds the real func addr
Architecture: Foreign Functions
Entry-vec flags
;; Entry-point constants:
(defconstant excl::ep-call-semidirect 1)       ; Real address stored in alt-address slot
(defconstant excl::ep-never-release 2)         ; Never release the heap
(defconstant excl::ep-always-release 4)        ; Always release the heap
(defconstant excl::ep-release-when-ok 8)       ; Release the heap unless without-interrupts
(defconstant excl::ep-tramp-calls #x70)        ; Make calls through special trampolines
(defconstant excl::ep-tramp-shift 4)
(defconstant excl::ep-variable-address #x100)  ; Entry-point contains address of C var
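Several ep-* values are single bits, while ep-tramp-calls (#x70) pairs with ep-tramp-shift (4), so the flags word can be read as independent bits plus a small field in bits 4 to 6. That reading is an assumption drawn from the constants alone. The sketch below decodes a flags word under it, using local +EP-...+ constants and a hypothetical DECODE-EP-FLAGS so that nothing in the EXCL package is touched.

```lisp
;; Local copies of the flag values listed above (hypothetical names,
;; so nothing in the EXCL package is redefined).
(defconstant +ep-call-semidirect+  1)
(defconstant +ep-never-release+    2)
(defconstant +ep-always-release+   4)
(defconstant +ep-release-when-ok+  8)
(defconstant +ep-tramp-calls+      #x70)  ; assumed: mask for a field in bits 4-6
(defconstant +ep-tramp-shift+      4)
(defconstant +ep-variable-address+ #x100)

(defun decode-ep-flags (flags)
  "Return a plist describing FLAGS, an entry-vec flags word (assumed layout)."
  (list :call-semidirect  (logtest flags +ep-call-semidirect+)
        :never-release    (logtest flags +ep-never-release+)
        :always-release   (logtest flags +ep-always-release+)
        :release-when-ok  (logtest flags +ep-release-when-ok+)
        ;; Extract the trampoline field: mask it, then shift it down.
        :tramp-field      (ash (logand flags +ep-tramp-calls+)
                               (- +ep-tramp-shift+))
        :variable-address (logtest flags +ep-variable-address+)))

;; #x21: the call-semidirect bit is set and the trampoline field is 2.
(decode-ep-flags #x21)
```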
Architecture: Foreign Functions
[Diagram: Entry vec for “foo”; missing_entry_point; bind_and_call; call_semidirect; foo()]
Architecture: Foreign Functions
[Diagram: the excl::.saved-entry-points. table; keys “foo”, “bar”, “bas”, each with one or more Entry vecs]
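The diagram suggests that the excl::.saved-entry-points. table maps each entry-point name to every entry vec created for it. One plausible purpose, and it is only an assumption here, is to let all entry vecs for a name be repointed together when a shared library is loaded or unloaded. A small model of that idea follows, using a hypothetical MODEL-ENTRY-VEC struct and hypothetical helpers rather than the real entry-vec machinery.

```lisp
;; Hypothetical model, not Allegro CL's implementation: a table keyed by
;; entry-point name whose values are all the entry vecs created for that name.
(defstruct model-entry-vec
  name           ; entry point name, e.g. "foo"
  (address 0))   ; jump address for the foreign code

(defvar *saved-entry-points* (make-hash-table :test #'equal)
  "Maps an entry-point name (a string) to a list of MODEL-ENTRY-VECs.")

(defun note-entry-vec (ev)
  "Record EV under its name and return it."
  (push ev (gethash (model-entry-vec-name ev) *saved-entry-points*))
  ev)

(defun update-entry-point (name address)
  "Assumed use of the table: point every entry vec for NAME at ADDRESS
in one pass. Returns the number of entry vecs updated."
  (let ((evs (gethash name *saved-entry-points*)))
    (dolist (ev evs (length evs))
      (setf (model-entry-vec-address ev) address))))

;; Assumption: two separate foreign-call definitions each hold an entry vec for "foo".
(defvar *foo-1* (note-entry-vec (make-model-entry-vec :name "foo")))
(defvar *foo-2* (note-entry-vec (make-model-entry-vec :name "foo")))
(note-entry-vec (make-model-entry-vec :name "bar"))

(update-entry-point "foo" #x40001000)   ; returns 2
```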
Architecture: Disassemble
• Extensions:
  • non-Lisp names
  • :absolute
  • :addr-list
  • :find-callee
  • :find-pc
  • :references-only
  • :recurse
  • :target-class
Disassembling Non-Lisp Names
• A string representing a C entry point
• Allows viewing non-Lisp assembler code
• Some instructions are interpreted automatically, e.g. offsets shown as symbolic names such as C_GSGC_NEWCONSLOC
(disassemble "qcons")
;; disassembly of #("qcons" 1074935746)
;; code start: #x401237c2:
   0: 8b 8f ff fd   movl ecx,[edi-513]    ; C_GSGC_NEWCONSLOC
      ff ff
   6: 3b 8f 03 fe   cmpl ecx,[edi-509]    ; C_GSGC_NEWCONSEND
      ff ff
  12: 0f 84 3c 1e   jz 7758               ; cons+0
      00 00
  18: 89 41 0f      movl [ecx+15],eax
  21: 89 c8         movl eax,ecx
  23: 89 50 13      movl [eax+19],edx
  26: 83 87 ff fd   addl [edi-513],$8     ; C_GSGC_NEWCONSLOC
      ff ff 08
  33: c3            ret
excl::*c-symbol-table* Build
• Marked dirty (excl::*rebuild-c-symbol-table-p* is non-nil):
  • at Lisp start
  • after a shared library is loaded or unloaded
• Rebuilt:
  • for disassembly of a string
  • for profiler analysis
  • for a “:zoom :all t :verbose t” invocation
(inspect excl::*c-symbol-table*)
A simple T vector (3538) @ #x2039c352
     0-> cstruct (2) = #("unidentified" 0)
     1-> cstruct (2) = #("_init" 134514576)
     2-> cstruct (2) = #("strcpy" 134514600)
     3-> cstruct (2) = #("dlerror" 134514616)
     4-> cstruct (2) = #("getenv" 134514632)
     5-> cstruct (2) = #("fgets" 134514648)
     6-> cstruct (2) = #("perror" 134514664)
     7-> cstruct (2) = #("readlink" 134514680)
     8-> cstruct (2) = #("malloc" 134514696)
     9-> cstruct (2) = #("malloc" 134514696)
    10-> cstruct (2) = #("_lxstat" 134514712)
    11-> cstruct (2) = #("isspace" 134514728)
    12-> cstruct (2) = #("_xstat" 134514744)
    13-> cstruct (2) = #("__libc_init" 134514760)
    14-> cstruct (2) = #("strrchr" 134514776)
    15-> cstruct (2) = #("fprintf" 134514792)
    16-> cstruct (2) = #("fprintf" 134514792)
    17-> cstruct (2) = #("strcat" 134514808)
    18-> cstruct (2) = #("chdir" 134514824)
    19-> cstruct (2) = #("strncmp" 134514840)
   ...
  3537-> cstruct (2) = #("__bss_start" 1075102200)
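The entries shown appear to be sorted by address, and the qcons listing annotates a jump target as cons+0. An annotation of that name+offset kind can be produced by a binary search for the last symbol at or below an address. The sketch below is one way to do it, not Allegro CL's code; ADDRESS->SYMBOLIC and *SAMPLE-TABLE* are hypothetical, and each entry is a plain (name address) list rather than a cstruct.

```lisp
;; Hypothetical sketch: map an address to "name+offset" using a vector of
;; (name address) entries sorted by address, like the entries shown above.
(defun address->symbolic (address table)
  "Return \"name+offset\" for the last entry in TABLE at or below ADDRESS,
or NIL if ADDRESS precedes every entry. TABLE must be sorted by address."
  (let ((lo 0)
        (hi (1- (length table)))
        (best nil))
    ;; Binary search for the greatest entry address <= ADDRESS.
    (loop while (<= lo hi)
          do (let* ((mid (floor (+ lo hi) 2))
                    (entry (aref table mid)))
               (if (<= (second entry) address)
                   (setf best entry
                         lo (1+ mid))
                   (setf hi (1- mid)))))
    (when best
      (format nil "~A+~D" (first best) (- address (second best))))))

;; A few entries copied from the inspector output above.
(defparameter *sample-table*
  (vector '("_init" 134514576) '("strcpy" 134514600)
          '("dlerror" 134514616) '("getenv" 134514632)))

(address->symbolic 134514610 *sample-table*)   ; returns "strcpy+10"
```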
Simple function for the next examples:
USER(1): (defun foo (x) (list (bar x)))
FOO
USER(2): (compile 'foo)
Warning: While compiling these undefined functions were referenced: BAR.
FOO
NIL
NIL
USER(3):
(disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: X
;; constant vector:
0: BAR
;; code start: #x203dcddc:
   0: 55          pushl ebp
   1: 8b ec       movl ebp,esp
   3: 56          pushl esi
   4: 83 ec 24    subl esp,$36
   7: 83 f9 01    cmpl ecx,$1
  10: 74 02       jz 14
  12: cd 61       int $97             ; trap-argerr
  14: d0 7f a3    sarb [edi-93],$1    ; C_INTERRUPT
  17: 74 02       jz 21
  19: cd 64       int $100            ; trap-signal-hit
  21: 8b 5e 32    movl ebx,[esi+50]   ; BAR
  24: b1 01       movb cl,$1
  26: ff d7       call *edi
  28: 8b d7       movl edx,edi
  30: ff 57 2b    call *[edi+43]      ; QCONS
  33: c9          leave
  34: 8b 75 fc    movl esi,[ebp-4]
  37: c3          ret
Disassembling with Absolute Addresses
• :absolute
  • Allows debugging at absolute addresses
  • Warning: addresses may no longer be valid after a gc, though each individual disassembly is internally consistent
(disassemble 'foo :absolute t)
;; disassembly of #<Function FOO>
;; formals: X
;; constant vector:
0: BAR
204cb5a4: 55          pushl ebp
204cb5a5: 8b ec       movl ebp,esp
204cb5a7: 56          pushl esi
204cb5a8: 83 ec 24    subl esp,$36
204cb5ab: 83 f9 01    cmpl ecx,$1
204cb5ae: 74 02       jz 0x204cb5b2
204cb5b0: cd 61       int $97             ; trap-argerr
204cb5b2: d0 7f a3    sarb [edi-93],$1    ; C_INTERRUPT
204cb5b5: 74 02       jz 0x204cb5b9
204cb5b7: cd 64       int $100            ; trap-signal-hit
204cb5b9: 8b 5e 32    movl ebx,[esi+50]   ; BAR
204cb5bc: b1 01       movb cl,$1
204cb5be: ff d7       call *edi
204cb5c0: 8b d7       movl edx,edi
204cb5c2: ff 57 2b    call *[edi+43]      ; QCONS
204cb5c5: c9          leave
204cb5c6: 8b 75 fc    movl esi,[ebp-4]
204cb5c9: c3          ret
Disassemble Support for the Profiler
• :addr-list
  • Marks specific instructions
  • Allows exact profiler hits to be shown against each instruction
(disassemble 'foo :addr-list -10)
;; disassembly of #<Function FOO>
;; formals: X
;; constant vector:
0: BAR
;; code start: #x204cb5a4:
                0: 55          pushl ebp
                1: 8b ec       movl ebp,esp
                3: 56          pushl esi
                4: 83 ec 24    subl esp,$36
                7: 83 f9 01    cmpl ecx,$1
stopped -->    10: 74 02       jz 14
               12: cd 61       int $97             ; trap-argerr
               14: d0 7f a3    sarb [edi-93],$1    ; C_INTERRUPT
               17: 74 02       jz 21
               19: cd 64       int $100            ; trap-signal-hit
               21: 8b 5e 32    movl ebx,[esi+50]   ; BAR
               24: b1 01       movb cl,$1
               26: ff d7       call *edi
               28: 8b d7       movl edx,edi
               30: ff 57 2b    call *[edi+43]      ; QCONS
               33: c9          leave
               34: 8b 75 fc    movl esi,[ebp-4]
               37: c3          ret
(disassemble 'foo :addr-list '(11 (#x204cb5ae . 4) (#x204cb5b9 . 4) (#x204cb5c5 . 3)))
;; disassembly of #<Function FOO>
;; formals: X
;; constant vector:
0: BAR
;; code start: #x204cb5a4:
              0: 55          pushl ebp
              1: 8b ec       movl ebp,esp
              3: 56          pushl esi
              4: 83 ec 24    subl esp,$36
              7: 83 f9 01    cmpl ecx,$1
4 (36%)      10: 74 02       jz 14
             12: cd 61       int $97             ; trap-argerr
             14: d0 7f a3    sarb [edi-93],$1    ; C_INTERRUPT
             17: 74 02       jz 21
             19: cd 64       int $100            ; trap-signal-hit
4 (36%)      21: 8b 5e 32    movl ebx,[esi+50]   ; BAR
             24: b1 01       movb cl,$1
             26: ff d7       call *edi
             28: 8b d7       movl edx,edi
             30: ff 57 2b    call *[edi+43]      ; QCONS
3 (27%)      33: c9          leave
             34: 8b 75 fc    movl esi,[ebp-4]
             37: c3          ret
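In the :addr-list form above, the leading 11 equals the sum of the hit counts (4 + 4 + 3), and each printed percentage is that count divided by the total. Reading the first element as the total is an inference from this one example. The helper below, ANNOTATE-HITS (a hypothetical name), reproduces the annotations by converting each absolute address to a pc relative to the code start #x204cb5a4 and computing the rounded percentage.

```lisp
;; Hypothetical helper mirroring the :addr-list form above:
;; (total (absolute-address . hits) ...)  ->  ((relative-pc hits percent) ...)
(defun annotate-hits (code-start addr-list)
  (destructuring-bind (total &rest hits) addr-list
    (loop for (address . count) in hits
          collect (list (- address code-start)            ; relative pc
                        count
                        (round (* 100 count) total)))))   ; percent of total

(annotate-hits #x204cb5a4
               '(11 (#x204cb5ae . 4) (#x204cb5b9 . 4) (#x204cb5c5 . 3)))
;; returns ((10 4 36) (21 4 36) (33 3 27))
```

The relative pcs 10, 21 and 33 match the annotated lines in the listing, which is also how an :absolute address maps back to a relative pc.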
Disassemble Support for the Debugger
• :find-callee
  • Returns information about the callee, given a relative pc
• :find-pc
  • Returns information about instruction sequencing, or prints an instruction
• :references-only
  • Returns the references from the function or the glob table
USER(22): (disassemble 'foo :find-callee 26)
BAR
:CONST
-1
USER(23): (disassemble 'foo :find-callee 28)
BAR
:CALL
0
USER(24): (disassemble 'foo)
;; disassembly of #<Function FOO>
...
  14: d0 7f a3    sarb [edi-93],$1    ; C_INTERRUPT
  17: 74 02       jz 21
  19: cd 64       int $100            ; trap-signal-hit
  21: 8b 5e 32    movl ebx,[esi+50]   ; BAR
  24: b1 01       movb cl,$1
  26: ff d7       call *edi
  28: 8b d7       movl edx,edi
  30: ff 57 2b    call *[edi+43]      ; QCONS
  33: c9          leave
  34: 8b 75 fc    movl esi,[ebp-4]
  37: c3          ret
USER(25):
USER(28): (disassemble 'foo :find-pc 14)
14
17
NIL
NIL
USER(29): (disassemble 'foo :find-pc 17)
17
19
21
:BCC
USER(30): (disassemble 'foo :find-pc '(:print 17))
  17: 74 02       jz 21
USER(31): (disassemble 'foo :find-pc '(:print 21))
  21: 8b 5e 32    movl ebx,[esi+50]   ; BAR
USER(32):
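The :find-pc results appear to be the pc itself, the pc of the next instruction, the branch target if any, and a branch kind (:BCC for the conditional jz at 17). That interpretation is inferred from the transcript only. The hypothetical FIND-PC-MODEL below reproduces those values from a hand-built table of FOO's instructions; unconditional transfers (call, ret) are deliberately not modeled.

```lisp
;; Hypothetical model of the :find-pc results shown above, built by hand from
;; FOO's listing: each entry is (pc length branch-target branch-kind).
(defparameter *foo-instructions*
  '((0 1 nil nil) (1 2 nil nil) (3 1 nil nil) (4 3 nil nil) (7 3 nil nil)
    (10 2 14 :bcc)   ; jz 14
    (12 2 nil nil)   ; int $97
    (14 3 nil nil)   ; sarb
    (17 2 21 :bcc)   ; jz 21
    (19 2 nil nil) (21 3 nil nil) (24 2 nil nil) (26 2 nil nil)
    (28 2 nil nil) (30 3 nil nil) (33 1 nil nil) (34 3 nil nil) (37 1 nil nil)))

(defun find-pc-model (pc)
  "Return PC, the next sequential pc, the branch target and the branch kind."
  (let ((insn (assoc pc *foo-instructions*)))
    (unless insn
      (error "No instruction starts at pc ~D" pc))
    (destructuring-bind (start len target kind) insn
      (values start (+ start len) target kind))))

(find-pc-model 14)   ; returns 14, 17, NIL, NIL
(find-pc-model 17)   ; returns 17, 19, 21, :BCC
```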
USER(26): (disassemble 'foo :references-only t)
(SYSTEM::QCONS BAR SYSTEM::C_INTERRUPT)
USER(27):
Miscellaneous Disassembler Modes
• :recurse
  • Useful to control the amount of output
• :target-class
  • Used only in cross-porting
Undocumented Tools in Allegro CL
• excl::get-objects (att #42-1)
• excl::get-references (name misprinted in the handout notes)
• excl::create-box / excl::box-value (att #42-2)
• excl::atomically
  • allows the compiler to guarantee an atomic body
• Autoloading facilities (described later)
Atomic Forms
• Generally, a form is atomic if it has:
  • no interrupt checks
  • no consing
  • no non-atomic forms or calls
• Use excl::atomically like progn; if it compiles, the body is atomic
• Atomic primcalls:
  • gsgc-setf-protect, gsgc-set-protect, fd-stack-real, qcar, qcdr
• Atomic calls:
  • error, excl::.error, excl::eq-hash-fcn, excl::eql-not-eq, excl::get_2op-atomic, excl::sxhash-if-fast, excl::symbol-hash-fcn
Optimization Methodology
• Get it right first
• Profile it
  • The time macro
  • The Allegro CL profiler
• Hit the high-cost items
  • Implementations
  • Algorithms
Speed Optimizations
• Profiling
• Efficient compilation
• Immediate compilation
• Foreign function optimizations
• Hash tables
• CLOS optimizations
• Miscellaneous optimizations
Speed Optimizations: Profiling
• Always compile top-level test functions
• Example profile run (att #48-1)
• Do not use the time macro together with the profiler
• Avoid collecting time and call-count profiles in the same run
• When using the time macro, beware of the new closures it can introduce
Time Macro: Extra Closures
This driver is not as simple as it looks!
(defun test-driver (n)
  (time (dotimes (i n)
          (test-it))))
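The slide warns that time may introduce closures: the macro can hand its body to the timing code as a closure, so the loop may run with N closed over instead of as a plain compiled loop. One common way to sidestep this, offered as a sketch rather than as the talk's prescription, is to move the loop into its own compiled function and time a single call to it. TEST-IT, TEST-LOOP and *WORK-DONE* are hypothetical names.

```lisp
;; TEST-IT is a hypothetical stand-in for the code being measured.
(defvar *work-done* 0)
(defun test-it () (incf *work-done*))

;; The loop lives in its own function, so TIME wraps a single call
;; instead of a loop body that may end up inside a closure.
(defun test-loop (n)
  (dotimes (i n)
    (test-it)))

(defun test-driver (n)
  (time (test-loop n)))

;; Per the profiling advice above, make sure the test functions are compiled.
(compile 'test-it)
(compile 'test-loop)
(compile 'test-driver)

(test-driver 1000)   ; prints timing information; *WORK-DONE* is now 1000
```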
Speed Optimizations: Efficient Compilation
• :explain
• excl::atomically
• excl:add-typep-transformer (att #50-1,2)
Speed Optimizations: Immediate Compilation
• Inlining and unboxing
• Immediate-args
• defun-immediate (att #51-1,2,3)
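Immediate-args and defun-immediate are Allegro CL extensions whose syntax these slides do not show, so they are not used here. As a portable baseline for inlining and unboxing, the sketch below uses standard inline and type declarations, which give a compiler what it needs to open-code SQUARE and keep the double-float arithmetic in the loop unboxed. Whether a particular Allegro CL release actually unboxes this code is an assumption; the :explain declaration listed earlier is the tool for checking. SQUARE and SUM-OF-SQUARES are hypothetical names.

```lisp
;; Hypothetical helper, declared inline so its body can be open-coded
;; at each call site instead of going through a full call.
(declaim (inline square))
(defun square (x)
  (declare (type double-float x))
  (* x x))

;; Type declarations tell the compiler every intermediate value is a
;; double-float, so it may keep them in registers rather than boxing them.
(defun sum-of-squares (v)
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3) (safety 1)))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (length v) acc)
      (incf acc (square (aref v i))))))

(sum-of-squares (make-array 3 :element-type 'double-float
                              :initial-contents '(1d0 2d0 3d0)))
;; returns 14.0d0
```

Only the final result needs to be boxed when it is returned; the accumulation itself involves no per-iteration allocation if the compiler honors the declarations.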
Speed Optimizations: Foreign Functions
• Call-direct (att #52-1,2)
• comp:list-call-direct-possibilities