Using Common Lisp in a high-performance search environment. Martin Cracauer – ITA Software, Cambridge, MA. Lisp at ITA Software Inc. We want all the features of dynamic programming languages ... but all of it has to be gone by runtime. ITA – the ultimate cherry-pickers.
Why we have to be fast – part 1
• Flights don't matter (much): Boston - Hamburg
Why we have to be fast – part 1
Pricing insanity 1:
- Boston to Hamburg:
- no direct flights
- but 1,574 direct fares that you can fill with flights
- going through Amsterdam and using two fares (BOS->AMS, AMS->HAM) gets you 1700 * 290 = 0.5 million fare combinations to consider, all of which you can fill with flights "freely"
- going through London and Paris, using three fares: 2226 * 261 * 199 = 115 million fare combinations, all of which you can fill with flights "freely"
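The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the counting argument: the per-leg fare counts are the ones quoted above, while the dictionary keys and the function name `fare_combinations` are made up for illustration.

```python
from math import prod

# Fare counts per leg, copied from the slide.
routes = {
    "two fares via Amsterdam": [1700, 290],
    "three fares via London and Paris": [2226, 261, 199],
}

def fare_combinations(fares_per_leg):
    """Number of ways to pick one fare for every leg of a route."""
    return prod(fares_per_leg)

for name, legs in routes.items():
    print(f"{name}: {fare_combinations(legs):,} fare combinations")
```

The exact products are 493,000 and 115,616,214, which the slide rounds to 0.5 million and 115 million.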
Why we have to be fast – part 2
• Flights don't matter (much) – Easter Island to some tiny town at the tip of Denmark. You think they publish prices for that?
Why we have to be fast – part 2
Easter Island:
- Island way out west of Chile
- Population: 3,791 humans, 887 big stone figures (they don't fly)
- Yesterday's flights: 1 in, 1 out [May 2006 data]
- Day before yesterday: 0 (zero) flights
- Prices directly published in database files: ==> 1432
- Prices after expanding various forms of autogenerated fares: ==> 160,000
- Prices published for travel from Easter Island to Aalborg, Denmark: ==> 57
- Of these 57 prices to Aalborg, actually used when flying from Easter Island to Aalborg: ==> 0 (zero)
That is because the cheapest routes always use price combinations through other cities. But we still have to look at those prices.
The slides with the raw number dump... one
Going from Boston to L.A.:
- reasonable ways (flight combinations) to get there: 10,000 (10^4)
- reasonable ways to get back: 10,000 (10^4)
- valid pricing solutions (as opposed to flights) for each of them (if you don't have complications): 10,000 to 40,000 (10^4)
==> 10^12 solutions
This is not like getting on the bus and paying at the door.
The slides with the raw number dump... two
What travel companies were doing before ITA (on mainframes):
- 10 ways out (1 per mille of the useful pool)
- 10 ways back (1 per mille of the useful pool)
- 10,000 pricing solutions
==> 10^6 solutions
What ITA is doing (on Linux PCs), simple case:
- 400 ways out
- 400 ways back
- variable number of pricing solutions
==> 10^9 solutions for “simple” search (and this isn't looking at the more complex international pricing)
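The search-space sizes on these two slides are plain products, which a short Python sketch makes explicit. The function name `search_space` is hypothetical. The last figure is not on the slides: it is only the average number of pricing solutions per outbound/return pair that the quoted 10^9 total would imply, since the slide itself says the number is variable.

```python
def search_space(ways_out, ways_back, pricings_per_pair):
    """Solutions = outbound options * return options * pricing solutions per pair."""
    return ways_out * ways_back * pricings_per_pair

# Full pool from the Boston to L.A. slide (lower bound of 10,000 pricings).
full_pool = search_space(10_000, 10_000, 10_000)

# What the mainframe systems looked at, per the slide.
mainframe = search_space(10, 10, 10_000)

# Derived, not from the slides: average pricings per pair implied by 10^9.
implied_pricings = 10**9 / (400 * 400)

print(f"full pool:  {full_pool:.0e}")
print(f"mainframe:  {mainframe:.0e}")
print(f"implied pricing solutions per pair for ITA: {implied_pricings:,.0f}")
```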
The slides with the raw number dump... three
- when things are not that “simple”
Sometimes you just don't get away with slacking off with 10^9 solutions:
- Picking the worst of 1,000 random actual customer queries from ITA's website shows one complicated itinerary with 3 adults, 2 youths and 1 child: ==> 10^28 solutions. (We saved that family a whole lot of money!)
- Manually constructing the worst we could (in one minute on a Pentium-III Xeon with 1 GHz and 2 GB RAM at the time): ==> 10^31 solutions.
“Solutions” here means verified flyable: every price in there checked to be allowed for this travel, all seats checked to be available.
Keep in mind that even in 2009 we only have 10^10 bytes of RAM to keep all this.
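To put the RAM remark in numbers, here is a rough Python sketch of the memory budget per solution if every solution had to be stored individually. The solution counts and the 10^10 bytes are from the slides; the function name `bits_per_solution` is made up, and the slides do not say how the solutions are actually represented.

```python
RAM_BYTES = 10**10  # "10^10 bytes of RAM", per the slide

def bits_per_solution(solutions, ram_bytes=RAM_BYTES):
    """Bits of RAM available per solution if each one were stored separately."""
    return ram_bytes * 8 / solutions

for solutions in (10**9, 10**28, 10**31):
    print(f"{solutions:.0e} solutions -> {bits_per_solution(solutions):.1e} bits each")
```

At 10^9 solutions there are 80 bits each to work with. At 10^28 the budget is a tiny fraction of one bit, which suggests the result set cannot be held as an explicit list of solutions.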
ITA Software
Programming language cherry-pickers, Inc.
All the features of dynamic languages – and by the time it actually runs there should be no trace left.
Classic dynamic language drawbacks (to kill one by one):
Mixed causes:
- non-native code compilation [assumed solved – yeah, right]
- run-time type checks
- GC
- producing lots of heap garbage where static languages don't (e.g. from bignum arithmetic)
- unnecessary initialization of new data structures
- too many function calls, no inlining, required proxy functions
- mandatory array bounds checking
ITA Software
Programming language cherry-pickers, Inc.
All the features of dynamic languages – and by the time it actually runs there should be no trace left.
Classic dynamic language drawbacks (to kill one by one):
Memory:
- inefficient way to represent user-defined structs in memory
- inefficient way to represent arrays in memory
- [combination of the last two]
- inability to have efficient sub-byte data types (bitfields): either the capability is lacking entirely, or the bitfields are not fast
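The memory drawbacks listed here are easy to observe in another dynamic language. The sketch below uses Python (CPython is assumed, because `sys.getsizeof` is implementation-specific) and is not taken from the talk. It compares a list of boxed integers with a packed `array`, then packs three sub-byte fields into one byte by hand. The field names `cabin`, `refundable` and `stops` and their bit widths are invented for illustration.

```python
import sys
from array import array

N = 10_000
boxed = list(range(1000, 1000 + N))  # list of pointers to separate int objects
packed = array("i", boxed)           # one contiguous block of C ints

# Boxed cost: the pointer table plus every int object it points to.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
# Packed cost: just the raw element storage.
packed_bytes = packed.itemsize * len(packed)
print(f"boxed: {boxed_bytes:,} bytes, packed: {packed_bytes:,} bytes")

def pack_flags(cabin, refundable, stops):
    """Hypothetical layout: cabin in bits 0-1, refundable in bit 2, stops in bits 3-5."""
    return (cabin & 0b11) | ((refundable & 1) << 2) | ((stops & 0b111) << 3)

def unpack_flags(byte):
    """Inverse of pack_flags: returns (cabin, refundable, stops)."""
    return byte & 0b11, (byte >> 2) & 1, (byte >> 3) & 0b111

print(unpack_flags(pack_flags(2, 1, 5)))
```

Note that the hand-written shifts and masks run as interpreted function calls here, which matches the slide's complaint: the capability exists, but it is not fast.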
ITA Software
Programming language cherry-pickers, Inc.
All the features of dynamic languages – and by the time it actually runs there should be no trace left.
Classic dynamic language drawbacks (to kill one by one):
Memory, the outside world and C functions:
- inability to access C data without proxy functions for conversion
- inability to call C functions without proxy functions to convert arguments and return value
==> leading to inability to use mmap'ed data built externally (mmap == big deal for users of mostly read-only data)
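A minimal Python sketch of the mmap pattern this slide refers to: a file of fixed-size binary records is built by one step and then mapped read-only by another, with no parsing pass in between. Everything concrete here is an assumption for illustration, including the file name `fares.bin`, the three-field record layout and the sample values. The talk does not describe ITA's actual file formats.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record: fare id (u32), origin index (u16), destination index (u16).
RECORD = struct.Struct("<IHH")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fares.bin")

    # Stand-in for the external builder that writes the data file.
    with open(path, "wb") as f:
        for rec in [(1, 10, 20), (2, 10, 30), (3, 40, 20)]:
            f.write(RECORD.pack(*rec))

    # Reader: map the file read-only and decode one record in place.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // RECORD.size
            second = RECORD.unpack_from(mm, 1 * RECORD.size)

print(count, second)
```

Python still creates a fresh tuple of boxed integers on every `unpack_from` call, which is exactly the kind of proxy conversion the slide complains about. The sketch only shows the file layout and the mapping step.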
High-performance Lisp at ITA Software – getting there
<== [whiteboard]