CS569 Selected Topics in Software Engineering, Spring 2012
Principles: Performance

Key performance principles
• Use indexes
• Minimize traffic
• Minimize locks
• Parallelize

GAE Indexes
• An index is a list of pointers
• Each pointer indicates the location of one entity
• The list is sorted according to selected entity member variables
• Multi-value queries are handled by binary search + direct scan (single-value by hashtable)
• Example: A list of pointers to Course entities could be sorted by department, then number

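The "binary search + direct scan" idea can be sketched with a toy in-memory index. This is only an illustration under stated assumptions: the class `CourseIndexSketch`, its `Row` type, and the `entityKey` field are hypothetical names, and plain Java collections stand in for the datastore. It is not how GAE implements indexes internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy in-memory model of a sorted index; NOT how GAE stores indexes internally.
class CourseIndexSketch {
    // One index row: the sorted fields plus a "pointer" (here, a hypothetical entity key).
    static final class Row {
        final String department;
        final int number;
        final long entityKey;

        Row(String department, int number, long entityKey) {
            this.department = department;
            this.number = number;
            this.entityKey = entityKey;
        }
    }

    private final List<Row> rows;

    CourseIndexSketch(List<Row> unsorted) {
        rows = new ArrayList<>(unsorted);
        // Sort by department, then number, as in the slide's example.
        rows.sort(Comparator.comparing((Row r) -> r.department)
                            .thenComparingInt(r -> r.number));
    }

    // Keys of courses in `dept` whose number is >= minNumber.
    List<Long> query(String dept, int minNumber) {
        // Binary search for the first row that is >= (dept, minNumber).
        int lo = 0, hi = rows.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            Row r = rows.get(mid);
            int cmp = r.department.compareTo(dept);
            if (cmp == 0) cmp = Integer.compare(r.number, minNumber);
            if (cmp < 0) lo = mid + 1; else hi = mid;
        }
        // Direct scan: matching rows are contiguous from this position on.
        List<Long> keys = new ArrayList<>();
        for (int i = lo; i < rows.size() && rows.get(i).department.equals(dept); i++) {
            keys.add(rows.get(i).entityKey);
        }
        return keys;
    }
}
```

Because the rows are sorted by department and then number, every match for "department equals X and number at least Y" sits in one contiguous run, so the scan never touches unrelated entities.
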
GAE Index Creation
• When entities of a certain kind are saved, GAE will (generally) create an index for each simple member variable
• When a query comes in with more than one member variable, GAE will automatically create a new index (generally)
• For exceptions to the rules, refer to GAE book pages 166-169

Example
• Many Course entities
  • PK is automatically generated
  • The entities are scattered around datastore
• Index based on department
• Index based on coursenum
• How to query for
  • Courses in the “BIO” department
  • Courses with numbers >= 700

Let’s walk through it…
How can these be supported using indexes?
• Filters combined with &&
• Filters combined with || if they operate on the same member variable
• Filters that use !=
• Filters that match == on a set of values, meaning “if any value matches”
  • Cannot use != on a set of values
• Sorting using setOrdering() on the query before you invoke execute()

A few notes about indexes in GAE
• GAE disallows joins on JDO queries
  • Instead, you have to do the join in the application
• Indexes will not be used for filters combined with || on different member variables
  • A query like this will (usually) throw an exception
• Try not to use any query that filters based on more than one multi-valued member variable
  • Because the resulting index is a big space-hog

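Doing the join in the application usually means two lookups followed by an in-memory merge. The sketch below assumes a hypothetical `profId` field on each course and uses a plain `Map` in place of a batched datastore fetch; none of these names come from the course materials.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of a join done in application code; plain collections stand in for datastore queries.
class AppSideJoinSketch {
    static final class Course {
        final String title;
        final String profId; // hypothetical foreign-key-like field

        Course(String title, String profId) {
            this.title = title;
            this.profId = profId;
        }
    }

    // Returns lines such as "Intro to SE (Prof. A)".
    static List<String> titlesWithProfessorNames(List<Course> courses,
                                                 Map<String, String> professorNameById) {
        // Step 1: collect the distinct ids we need, so the second lookup can be one batch.
        Set<String> neededIds = new LinkedHashSet<>();
        for (Course c : courses) neededIds.add(c.profId);

        // Step 2: one batched lookup (simulated here by reading from a map).
        Map<String, String> names = new HashMap<>();
        for (String id : neededIds) {
            String name = professorNameById.get(id);
            if (name != null) names.put(id, name);
        }

        // Step 3: combine the two result sets in memory.
        List<String> joined = new ArrayList<>();
        for (Course c : courses) {
            String name = names.get(c.profId);
            joined.add(c.title + " (" + (name == null ? "unknown" : name) + ")");
        }
        return joined;
    }
}
```

Collecting the distinct ids first keeps the second lookup to a single round trip instead of one fetch per course.
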
Minimize traffic
• Traffic bogs down the server and the client
  • And costs money on the server
  • And wastes battery on the client
• Keys to minimizing traffic
  • Minimal messages
  • Minimal roundtrips
  • Aggressive caching
  • Local computation

Minimal messages
• When client and server communicate…
  • Only send data needed at that moment
  • Use a concise data format (probably JSON)
• For example, suppose that an app needed to retrieve a list of courses in response to a query in order to show a list of links
  • http://www.myserver.com/info.jsp?prof=cscaffid

Option #1: 565 bytes
<?xml version="1.0"?>
<courses>
  <course><dept>CS</dept><num>361</num><prof>cscaffid</prof><title>Intro to SE</title><description>Blah blah blah blah blah blah blah blah blah</description></course>
  <course><dept>CS</dept><num>494</num><prof>cscaffid</prof><title>Web development</title><description>Blah blah blah blah blah blah blah blah blah</description></course>
  <course><dept>CS</dept><num>496</num><prof>cscaffid</prof><title>Cloud+Mobile development</title><description>Blah blah blah blah blah blah blah blah blah</description></course>
</courses>

Option #2: 108 bytes
[{n:"CS361",t:"Intro to SE"},
 {n:"CS494",t:"Web development"},
 {n:"CS496",t:"Cloud+Mobile development"}]
• Combine fields if appropriate (e.g., dept and number)
• Omit fields if not needed (e.g., description)
• Shorten field names if appropriate (e.g., n and t)
• Use JSON if feasible

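A minimal sketch of producing the compact payload on the server, assuming a hypothetical `Course` class with the same fields as the XML version. It builds the string by hand to stay dependency-free, and it quotes the field names, which strict JSON parsers require (so the output is a few bytes longer than the slide's figure).

```java
import java.util.List;

// Hand-rolled compact serializer; a real app would more likely use a JSON library.
class CompactCourseJson {
    static final class Course {
        final String dept;
        final int num;
        final String prof;
        final String title;
        final String description;

        Course(String dept, int num, String prof, String title, String description) {
            this.dept = dept;
            this.num = num;
            this.prof = prof;
            this.title = title;
            this.description = description;
        }
    }

    // Escapes only backslashes and double quotes; control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String toCompactJson(List<Course> courses) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < courses.size(); i++) {
            Course c = courses.get(i);
            if (i > 0) sb.append(',');
            // Combine dept + num into "n", keep the title as "t", omit prof and description.
            sb.append("{\"n\":\"").append(escape(c.dept)).append(c.num)
              .append("\",\"t\":\"").append(escape(c.title)).append("\"}");
        }
        return sb.append(']').toString();
    }
}
```

The professor id is dropped because the client already knows it (it sent it in the query), and the description is dropped because a list of links does not display it.
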
Minimal roundtrips
• Eliminate unnecessary messages
  • E.g., cache images on the client so that these do not need to be repeatedly downloaded
• Combine messages if feasible
  • E.g., if you need to query CS and MA courses, design server to handle both queries at once
• Defer messages if feasible
  • E.g., give the user the option to defer logging in until it’s absolutely necessary

Aggressive caching
• If a computation or transmission is expensive, then do not repeat it unnecessarily
  • Cache images on the client
  • Cache expensive computation results on server
• Options for caching on server
  • Write to the datastore
  • Write to memcache (might disappear and need recomputing)

Pseudocode for caching – an example of computing rainfall
String location = read from client    // e.g., “Albany, OR”
String rainfall = memcache[location]
if (rainfall is null) {
    latlon = convert location to latitude/longitude
    map = load weather map from datastore
    pixelcolor = color of pixel for latlon in map
    rainfall = convert pixelcolor to inches of rain
    memcache[location] = rainfall
}
return rainfall as JSON to client

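The pseudocode above follows the usual check-then-compute-then-store pattern. A runnable sketch, assuming a `HashMap` as a stand-in for memcache and a caller-supplied function for the expensive geocoding and weather-map steps (both are illustrative assumptions, not the GAE Memcache API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Check-then-compute caching sketch; a HashMap stands in for memcache and is not thread-safe.
class RainfallCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    // Hypothetical expensive step: geocode the location, load the map, read the pixel.
    private final Function<String, String> expensiveLookup;
    private int misses = 0; // counted only to show how often the expensive path runs

    RainfallCacheSketch(Function<String, String> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    String rainfallFor(String location) {
        String rainfall = cache.get(location);
        if (rainfall == null) {
            // Cache miss: do the expensive work once, then remember the result.
            misses++;
            rainfall = expensiveLookup.apply(location);
            cache.put(location, rainfall);
        }
        return rainfall;
    }

    int misses() {
        return misses;
    }
}
```

Unlike this map, a real memcache may evict entries at any time, which is why the miss branch must always be able to recompute the value from scratch.
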
Local computation
• If a computation uses a very large amount of data, then move the computation to the data, instead of the data to the computation.
• Example: Find city with maximal rainfall in US
• Option #1:
  • Server sends rainfall for 4500 cities to client
  • Client loops through cities to choose maximum
• Option #2:
  • Server loops through cities to choose maximum
  • Server sends just the maximum to the client

Local computation: another example
• Example: Exercise app
  • Every user’s cellphone logs activity during the day (every 1 minute, logs accelerometer)
  • Need to have a “winner board”
• Option #1:
  • Every client sends every minute’s data to server
  • Server computes each user’s total for the day
  • Server picks winner (person with most exercise)
• Option #2:
  • Every client computes that user’s total for day
  • Client sends that user’s total to the server
  • Server picks winner (person with most exercise)

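Option #2 splits the work so that each side computes on the data it already holds. A rough sketch, assuming activity samples are plain `double` values and that the class and method names are hypothetical:

```java
import java.util.Map;

// Option #2 sketch: aggregate where the data lives, send only the result.
class ExerciseBoardSketch {
    // Runs on the client: reduce a day of per-minute samples to one number.
    static double dailyTotal(double[] perMinuteActivity) {
        double total = 0;
        for (double sample : perMinuteActivity) total += sample;
        return total;
    }

    // Runs on the server: pick the user with the largest daily total (null if there are no users).
    static String winner(Map<String, Double> totalByUser) {
        String best = null;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : totalByUser.entrySet()) {
            if (e.getValue() > bestTotal) {
                bestTotal = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With one sample per minute, each client holds 1440 samples per day, so sending one total instead of every sample cuts the per-user message count from 1440 to 1.
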
Lock only when necessary
• If entities need to be modified by different people at the same time, then put the entities in different entity groups.
• Clean up any inconsistency problems
  • Using transactional tasks
  • Or just before reading from data

Example: An application that tracks college revenue
• Suppose that there are N Course entities, each with a “cost” and a “num_students” member.
• Suppose there is also a Projections entity with “total_num_students” and “total_revenue”.
• Should we make all of the entities be in the same entity group?

Example: An application that tracks college revenue
• Option #1:
  • Make all of the Course entities JDO children of the Projections entity
  • When a student registers for a course, lock the Course and the Projections, update both
• Option #2:
  • Put each Course in its own entity group
  • When a student registers, only update the Course
  • Schedule a transactional task to update the Projections

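Option #2 can be sketched in plain Java, with per-object `synchronized` blocks standing in for entity-group locks and an in-memory queue standing in for the task queue. These stand-ins, and the names `RevenueSketch`, `register`, and `runPendingTasks`, are assumptions for illustration; the sketch does not use GAE transactions or tasks.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Option #2 sketch: lock one Course at a time and update Projections later.
class RevenueSketch {
    static final class Course {
        final double cost;
        int numStudents;

        Course(double cost) {
            this.cost = cost;
        }
    }

    static final class Projections {
        int totalNumStudents;
        double totalRevenue;
    }

    final Projections projections = new Projections();
    // Stand-in for the task queue: one pending entry per registration.
    private final Queue<Course> pendingUpdates = new ConcurrentLinkedQueue<>();

    void register(Course course) {
        synchronized (course) { // only this course is locked, not the shared totals
            course.numStudents++;
        }
        pendingUpdates.add(course); // stand-in for scheduling a transactional task
    }

    // Runs later, for example in a background worker; brings Projections back in sync.
    void runPendingTasks() {
        Course c;
        while ((c = pendingUpdates.poll()) != null) {
            synchronized (projections) {
                projections.totalNumStudents++;
                projections.totalRevenue += c.cost;
            }
        }
    }
}
```

Between `register` and `runPendingTasks` the totals lag behind the courses. That temporary inconsistency is the price of letting registrations for different courses proceed without contending for one shared lock.
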
Parallelize work when possible
• When you need to update many, many entities, divide the work
  • Assign 1/N of the work to each of N tasks
• Also useful when you have a complex computation that can be divided (even if there is no “update” involved)

Example: Computing “best student” award
• For each student, we shall compute a score based on that student’s grades in all courses
• But it isn’t just GPA
  • We also are going to take into account the difficulty of different courses, and weight different courses differently
  • The computation will also take into account other data, such as numbers of papers published and time required to graduate.
• It is a very detailed, complicated computation

Example: Computing “best student” award
• Option #1
  • In a single task, we loop over all students, and for each student, we compute the score; then we take the maximum to select the winner.
• Option #2
  • If we have N students, we launch N tasks each to compute (and store) the score for 1 student
  • We have one additional task that periodically runs: it checks to see if N scores have been stored, and if so, it selects the winner

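The shape of Option #2 can be approximated locally with a thread pool. This sketch swaps in `ExecutorService` futures for GAE tasks and for the periodic "are all N scores stored?" check, and it uses a plain average as a placeholder for the real scoring formula; `BestStudentSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One task per student, then a final step that selects the maximum.
class BestStudentSketch {
    // Placeholder score: a plain average. The computation described in the slides is far more involved.
    static double score(double[] grades) {
        double sum = 0;
        for (double g : grades) sum += g;
        return grades.length == 0 ? 0 : sum / grades.length;
    }

    static String pickWinner(Map<String, double[]> gradesByStudent, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Launch one task per student.
            Map<String, Future<Double>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, double[]> e : gradesByStudent.entrySet()) {
                final double[] grades = e.getValue();
                Callable<Double> task = () -> score(grades);
                pending.put(e.getKey(), pool.submit(task));
            }
            // Wait for every score, then select the maximum.
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Future<Double>> e : pending.entrySet()) {
                double s = e.getValue().get();
                if (s > bestScore) {
                    bestScore = s;
                    best = e.getKey();
                }
            }
            return best;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each score depends only on one student's data, so the tasks need no coordination until the final maximum is taken.
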
Key performance principles
• Use indexes
  • Automatically created but limited
• Minimize traffic
  • Minimal messages
  • Minimal roundtrips
  • Aggressive caching
  • Local computation
• Minimize locks
  • Accept inconsistency and fix it in transactional task
• Parallelize

Extra credit opportunity (1XC)
• Find 3 ways that the PSS application could be improved by applying the performance principles on the previous slide
• For each of the 3 ways, write 3 sentences (total of 9 sentences):
  • Principle: What principle would you apply?
  • Violation: What areas of the PSS code violate the principle?
  • Modification: How would you modify the PSS code?
• Add a 10th sentence stating that you did not discuss this with any classmates, and that you worked on it alone
• Create a PDF with your 10 sentences (should be less than 1 page long) and upload to Blackboard (under XC uploads)