E-Commerce Architectures and Technologies


Presentation Transcript


  1. E-Commerce Architectures and Technologies • Rob Oshana • Southern Methodist University

  2. Modeling Contention for Software Servers

  3. Review of overhead factors • Processors • I/O devices • Routers • LAN segments • Also threads of a server • Database locks • Semaphores

  4. A Simple Example • Web server with m threads • Requests handled directly by available thread or queued • Executing threads need to use the CPU and I/O and may also be queued

  5. Example of Contention for Software Threads [Diagram: incoming requests wait in a queue for HTTP server threads 1..m; executing threads then use the CPU and the disk]

  6. Total response time for a Web request • Software contention; time spent by a request waiting for a software resource (semaphore, DB lock) • Hardware contention; time spent by a request waiting for a hardware resource (CPU, I/O device) • Use of hardware resources; time spent using hardware resources

  7. Example • HTTP server with five threads • Request requires 0.050 sec CPU • Request requires 0.065 sec I/O time • No limit on size of queue • What is the impact of contention for threads as arrival rate increases?

  8. Response time and Waiting time for Threads (Unlimited Queue)

  9. Example • For arrival rate = 12/sec, thread waiting time is 0.194 sec • Average # requests waiting for a thread (Little's Law) = 12 X 0.194 = 2.33 • Request execution time = response time – thread waiting time; for 12 requests/sec = 0.487 – 0.194 = 0.293 sec • Time spent waiting for hardware resources = execution time – service demand (0.050 + 0.065 = 0.115) = 0.293 – 0.115 = 0.178 sec
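
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the bookkeeping: the input values are the ones quoted on the slides, and the variable names are made up for illustration.

```python
# Inputs quoted in the example (arrival rate of 12 requests/sec).
arrival_rate = 12.0      # requests per second
thread_wait = 0.194      # sec spent waiting for a free thread
response_time = 0.487    # sec total response time
cpu_demand = 0.050       # sec of CPU per request
io_demand = 0.065        # sec of I/O per request

# Little's Law: average number waiting = arrival rate x average waiting time.
avg_waiting_for_thread = arrival_rate * thread_wait

# Execution time is what is left once the wait for a thread is removed.
execution_time = response_time - thread_wait

# Whatever part of the execution time is not service demand
# was spent queued for the CPU or the disk.
service_demand = cpu_demand + io_demand
hardware_wait = execution_time - service_demand

print(f"requests waiting for a thread: {avg_waiting_for_thread:.2f}")
print(f"execution time: {execution_time:.3f} sec")
print(f"time waiting for hardware: {hardware_wait:.3f} sec")
```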

  10. Contention for Server Threads with Finite Queue [Diagram: an arriving request is rejected if the current queue size k equals the maximum queue size J; otherwise it joins the queue for HTTP server threads 1..m. The accepted load is Λ (1 – Preject)]

  11. Response time and Waiting time for Threads (Limited Queue)

  12. Rejection Probability • Throughput = Λ X (1 – Preject) • Preject = probability that a request is rejected • For Λ = 12, the rejection probability decreases very fast as the maximum queue length increases

  13. Rejection Probability

  14. Contention for Software in E-Business Sites • WS is multithreaded (m threads) • AS has n threads • DS has p threads • Queue for WS limited (requests may be rejected) • Requests sent to AS and/or DS and are queued there

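  15. S/W and H/W Queues [Diagram: WS threads, AS threads, and DS threads, each tier with its own software queue for threads and its own hardware queues for CPU and disk; requests that find the WS queue full are rejected]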

  16. Contention for Software in E-Business Sites • ResponseTime = SoftwareContention + ExecutionTime • SoftwareContention = Wait(WS) + Wait(AS) + Wait(DS) • ExecutionTime = HardwareContention + TotalDemand • HardwareContention = HdwWait(WS) + HdwWait(AS) + HdwWait(DS) • TotalDemand = Demand(WS) + Demand(AS) + Demand(DS)
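
The decomposition above can be written as a small helper. This is a minimal sketch: the `TierTimes` class, the `decompose` function, and every number in the sample data are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class TierTimes:
    software_wait: float  # sec queued for a thread at this tier
    hardware_wait: float  # sec queued for CPU or disk at this tier
    demand: float         # sec of pure service demand at this tier


def decompose(tiers):
    """Apply the response-time breakdown to per-tier times."""
    software_contention = sum(t.software_wait for t in tiers.values())
    hardware_contention = sum(t.hardware_wait for t in tiers.values())
    total_demand = sum(t.demand for t in tiers.values())
    execution_time = hardware_contention + total_demand
    return {
        "SoftwareContention": software_contention,
        "HardwareContention": hardware_contention,
        "TotalDemand": total_demand,
        "ExecutionTime": execution_time,
        "ResponseTime": software_contention + execution_time,
    }


# Hypothetical per-tier values, for illustration only.
tiers = {
    "WS": TierTimes(software_wait=0.20, hardware_wait=0.05, demand=0.03),
    "AS": TierTimes(software_wait=0.10, hardware_wait=0.08, demand=0.06),
    "DS": TierTimes(software_wait=0.30, hardware_wait=0.12, demand=0.09),
}
breakdown = decompose(tiers)
for name, seconds in breakdown.items():
    print(f"{name}: {seconds:.2f} sec")
```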

  17. Example • E-business site with max queue size for WS = 50 requests • Parameters given below

  18. Example • Simulation results next page • Software contention, execution time, and hardware contention grow at the beginning with arrival rate and then saturate when queue is filled • Hardware contention is largest component of execution time

  19. Example • Nsite, average number of requests at the e-business site • Model shows that for Λ = 12, Nsite = 59.7 and response time = 5.92 sec • Nsite = Throughput X ResponseTime (Little's Law) • = Λ (1 – Preject) X ResponseTime • Preject = 1 – Nsite / (Λ X ResponseTime) • = 1 – 59.7 / (12 X 5.92) = 0.16
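
The rejection probability in this example follows directly from Little's Law, as the short sketch below shows. The function name `reject_probability` is an assumption made for illustration; the inputs are the values quoted on the slide.

```python
def reject_probability(n_site, arrival_rate, response_time):
    """Solve Nsite = arrival_rate * (1 - Preject) * response_time for Preject."""
    throughput = n_site / response_time       # Little's Law: X = N / R
    return 1.0 - throughput / arrival_rate    # share of arrivals not served


# Values quoted in the example: 12 requests/sec, Nsite = 59.7, R = 5.92 sec.
p_reject = reject_probability(n_site=59.7, arrival_rate=12.0, response_time=5.92)
print(f"Preject = {p_reject:.2f}")
```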

  20. Simultaneous Resource Contention • Simultaneous resource possession; a request needs to hold more than one resource at the same time (for example, a server thread and the CPU) • Can be modeled using hardware and software resources

  21. Simultaneous Resource Possession of S/W, H/W Resources

  22. Method of Layers • The multi-tier architecture of e-business sites makes them suitable for modeling with multiple layers • Layered Queuing Networks (LQNs) • Good for representing the hardware and software hierarchy in e-business sites • In an LQN, processes with similar behavior form a group or a class of processes

  23. Example of LQN • WS running on a machine of its own • AS and DS share another machine • AS uses disk 2, DS uses disks 3, 4 • WS threads are at level 1 of the LQN and request service from CPU 1, disk 1, and the AS threads, which are at level 2 • AS threads use disk 2 and the DS threads, which are at level 3 • DS threads use CPU 2 and disks 3 and 4, which are at level 4

  24. LQN Model for an E-Business Site [Diagram: Level 1: Web server threads. Level 2: CPU 1, disk 1, app server threads. Level 3: disk 2, DB server threads. Level 4: CPU 2, disk 3, disk 4]

  25. Analytic Techniques • Based on Mean Value Analysis • 1. Method of Layers (MOL) • Iterative technique, decompose LQN into sequence of 2 level QN submodels • 2. Stochastic Rendezvous Networks (SRN) • Iterative algorithm that begins by assuming no H/W, S/W contention

  26. Characterizing E-Business Workloads

  27. Introduction • Demonstrate how CBMGs and CVMs can be obtained from HTTP logs • Describe methods based on clustering analysis to derive small groups of CBMGs or CVMs that accurately reflect the workload • Show how parameters can be obtained from the customer behavior model

  28. Workload Characterization of Web Traffic • Suppose a web site receives 1800 requests during a 5 minute period, spread over 12 unique files • By Zipf's Law, the number of accesses to the file of popularity rank r is estimated as k/r, so 1800 = k X (1/1 + 1/2 + ... + 1/12) = k X 3.1032 • k = 1800/3.1032 = 580.05 • Estimated number of accesses to the most popular file is k/1 = 580; to the least popular file, k/12 = 580.05/12 = 48
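
The same Zipf estimate can be reproduced for any number of files. The sketch below assumes the plain 1/rank form of Zipf's Law used in this example; the function name `zipf_estimates` is illustrative.

```python
def zipf_estimates(total_requests, num_files):
    """Estimate accesses per file, assuming accesses to rank r are k / r."""
    harmonic = sum(1.0 / rank for rank in range(1, num_files + 1))
    k = total_requests / harmonic
    return k, [k / rank for rank in range(1, num_files + 1)]


# 1800 requests spread over 12 unique files, as in the example.
k, accesses = zipf_estimates(total_requests=1800, num_files=12)
print(f"k = {k:.2f}")
print(f"most popular file:  {accesses[0]:.0f} accesses")
print(f"least popular file: {accesses[-1]:.0f} accesses")
```

Because the harmonic sum is not rounded to four decimals here, k may differ from the slide's value in the second decimal place.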

  29. Example of Zipf’s Law

  30. Heavy-Tailed Distribution • A heavy-tailed distribution implies that the probability that a large value occurs is small but non-negligible • Web traffic features that are found to be heavy tailed: • Size of files requested from Web servers • Number of pages requested per site • Reading time per page

  31. Characterizing Customer Behavior • CBMG can be used to capture the navigational pattern of a customer through an e-commerce site • Transitional aspect • how a customer moves between states • Matrix of transition probabilities • Temporal aspect • The time it takes to move between states • “server perceived” think time; average time elapsed since a server completes a request for a customer until it receives the next request from the same customer during the same session

  32. Browser side and Server side think times [Timing diagram between browser and server for request i and request i+1, with server-side instants t1, t2, t3] • nt = network time • Zs = server side think time • Zb = browser side think time • Rs = server response time

  33. Characterizing Customer Behavior • Server side think time Zs = t3 – t1 • = 2 X nt + Zb (one network time for the response to reach the browser, the browser side think time, and one network time for the next request to reach the server) • A think time can be associated with each transition in the CBMG • A CBMG is described as a pair (P, Z): P = [Pi,j] is an n X n matrix of transition probabilities, Z = [Zi,j] is an n X n matrix of average think times between CBMG states

  34. Example CBMG [Diagram: CBMG with the states entry, browse, search, select, add to cart, and pay, with a transition probability labeling each arc]

  35. Example • Vadd = Vselect X 0.2 • Vbrowse = Vsearch X 0.2 + Vselect X 0.3 + Vadd X 0.25 + Vbrowse X 0.3 + Ventry X 0.5 • In general: Vj = Σ Vk X pkj (k = 1..n-1), where Vj is the average number of visits to state j per session and pkj is the probability that a customer makes a transition from state k to state j

  36. Example • AverageSessionLength = Σ Vj for j = 2..n-1 • For example, AverageSessionLength = Vbrowse + Vsearch + Vselect + Vadd + Vpay • = 2.498 + 4.413 + 1.324 + 0.265 + 0.053 = 8.552
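
The visit-ratio equations and the session length can be computed from a transition matrix as sketched below. The matrix `P` is hypothetical: it agrees with the coefficients quoted in the two equations above, but the remaining entries are made up for illustration, so the output is not expected to reproduce the slide's numbers. The solver is a simple fixed-point iteration, chosen here for brevity.

```python
# Hypothetical transition probabilities; each row sums to 1.
# "exit" is the absorbing end-of-session state.
P = {
    "entry":  {"browse": 0.5, "search": 0.5},
    "browse": {"browse": 0.3, "search": 0.3, "select": 0.3, "exit": 0.1},
    "search": {"browse": 0.2, "search": 0.4, "select": 0.3, "exit": 0.1},
    "select": {"browse": 0.3, "search": 0.3, "add": 0.2, "exit": 0.2},
    "add":    {"browse": 0.25, "search": 0.25, "select": 0.1,
               "pay": 0.2, "exit": 0.2},
    "pay":    {"exit": 1.0},
}


def visit_ratios(p, entry="entry", tol=1e-12, max_iter=10_000):
    """Solve Vj = sum_k Vk * p[k][j], with V_entry = 1, by iteration."""
    states = list(p)  # every state except the absorbing exit state
    v = {s: 0.0 for s in states}
    v[entry] = 1.0
    for _ in range(max_iter):
        new_v = {}
        for j in states:
            inflow = sum(v[k] * p[k].get(j, 0.0) for k in states)
            # Each session enters exactly once through the entry state.
            new_v[j] = inflow + (1.0 if j == entry else 0.0)
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            return new_v
        v = new_v
    raise RuntimeError("visit ratios did not converge")


V = visit_ratios(P)
# Session length counts visits to every state except entry (and exit).
average_session_length = sum(val for s, val in V.items() if s != "entry")
for state, visits in V.items():
    print(f"V_{state} = {visits:.3f}")
print(f"AverageSessionLength = {average_session_length:.3f}")
```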

  37. From HTTP logs to CBMGs • We can obtain CBMG data from HTTP logs • Similar sessions can be grouped into a small number of clusters of CBMGs to characterize behavior (stratification) • Logs from several servers can be merged and filtered, using time stamps to order the merged entries

  38. Data recorded in the log • UserID; identification of the customer (using cookies, dynamic URLs and other authentication mechanisms) • RequestType; GET on the home page, GET on another page, search request, etc • RequestTime; time request arrived at the site • ExecTime; not normally recorded, execution time of the request

  39. Customer Behavior Characterization Methodology • HTTP logs → Merge and filter → Request log → GetSessions → Session log → GetCBMGs → CBMGs

  40. GetSessions Algorithm • For a given session, there are three transitions between states s and t • Think times are 20, 45, 38 sec resp. • Cs,t = 3, Ws,t = 20 + 45 + 38 = 103 sec • Cs,t = nXn matrix of transition counts • Ws,t = nXn matrix of think times

  41. Basics of GetSessions • Sort request log by UserID in order of time • Separate into sessions using a session threshold time (30 minutes) • For each session form the C and W matrices (transitions and think times)
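
The steps above can be sketched as follows. This is an illustrative reading of the slides, not the original algorithm: the log record layout `(user_id, state, request_time, exec_time)`, the function name `get_sessions`, and the use of dictionaries keyed by `(s, t)` in place of n X n matrices are all assumptions.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

SESSION_THRESHOLD = 30 * 60  # seconds of inactivity that end a session


def get_sessions(request_log, threshold=SESSION_THRESHOLD):
    """Return one (C, W) pair per session.

    C[(s, t)] counts transitions from state s to state t and
    W[(s, t)] sums the server-side think times of those transitions.
    """
    sessions = []
    # Sort by user, then by arrival time within each user.
    ordered = sorted(request_log, key=itemgetter(0, 2))
    for _, requests in groupby(ordered, key=itemgetter(0)):
        counts, think = defaultdict(int), defaultdict(float)
        prev = None
        for user, state, arrived, exec_time in requests:
            if prev is not None:
                # Think time: from completion of the previous request
                # to the arrival of this one.
                gap = arrived - (prev[2] + prev[3])
                if gap > threshold:
                    # Too long a pause: close this session, start a new one.
                    sessions.append((dict(counts), dict(think)))
                    counts, think = defaultdict(int), defaultdict(float)
                else:
                    counts[(prev[1], state)] += 1
                    think[(prev[1], state)] += gap
            prev = (user, state, arrived, exec_time)
        sessions.append((dict(counts), dict(think)))
    return sessions


# One user alternating between states s and t, then returning much later.
log = [
    ("u1", "s", 0, 0), ("u1", "t", 20, 0), ("u1", "s", 30, 0),
    ("u1", "t", 75, 0), ("u1", "s", 80, 0), ("u1", "t", 118, 0),
    ("u1", "s", 5000, 0),
]
sessions = get_sessions(log)
C, W = sessions[0]
print(len(sessions), C[("s", "t")], W[("s", "t")])
```

The sample log reproduces the three s-to-t transitions with think times of 20, 45, and 38 sec; the final request arrives after more than 30 minutes and therefore opens a second session.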

  42. Basics of GetSessions • Timestamp precision needs to be fine enough, relative to processor speed, to order consecutive requests and measure think times • May want to clean crawler activity from the log

  43. GetCBMGs algorithm • Must perform a clustering analysis on the data • Creates a synthetic workload composed of a relatively small number of CBMGs • Centroid of the cluster determines the CBMG characteristics
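
A clustering step of this kind could look like the sketch below. The slides do not name the clustering method, so k-means is used here purely as an assumption; each session is assumed to be represented as a flat vector (for example, its flattened transition-count matrix), and the sample vectors are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, max_iter=100):
    """Plain k-means; the first k points serve as a naive initial guess."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return centroids, assignment


# Hypothetical two-dimensional session vectors, for illustration only.
session_vectors = [[1.0, 0.0], [9.0, 9.0], [1.2, 0.1], [9.5, 8.8]]
centroids, assignment = k_means(session_vectors, k=2)
print(assignment)
print(centroids)
```

Each resulting centroid plays the role of one CBMG in the synthetic workload. A production version would want a better initialization than the first k points.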

  44. Example • HTTP log run through GetSessions produces 20,000 sessions out of 340,000 lines in the request log • Six clusters identified • Buy to visit ratio (BV) represents the % of customers who buy from the store • Session length is the average # of shopper operations requested by a customer for each visit to the store • Va is the Add to Shopping Cart Visit Ratio (avg # times customer adds item to shopping cart)

  45. Example

  46. Conclusions from example • Cluster 1; represents the majority of the sessions (44.28%) • Very short average session length (5.6) • Highest % of customers that buy from the store • Cluster 6; represents a small percentage of customers • Longest session length • Smallest buying ratio

  47. Buy to Visit Ratio vs Session Length

  48. Conclusions from example • Pattern; the longer the session, the less likely it is for a customer to buy an item from the Web store • The buy to visit ratio decreases in a quadratic fashion with the session length

  49. How many clusters to choose? • How many clusters accurately represent the workload? • Examines the variation in two metrics; • Average distance between points of a cluster and its centroid (intracluster distance) • Average distance between clusters (intercluster distance) • CV; coefficient of variation

  50. How many clusters to choose? • Goal of clustering is to minimize the intracluster CV while maximizing the intercluster CV • If the # of clusters is made equal to the # of points, this will be achieved • But we want a compact representation so we need to select a small number
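
The two metrics can be computed for a given clustering as in the sketch below. The slides do not give the exact formulas, so this assumes CV = standard deviation / mean, taken over all point-to-centroid distances (intracluster) and over all pairwise centroid distances (intercluster). The function names and the sample data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def cluster_cvs(points, assignment, centroids):
    """Return (intracluster CV, intercluster CV) for one clustering."""
    intra = [math.dist(p, centroids[a]) for p, a in zip(points, assignment)]
    inter = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    return coefficient_of_variation(intra), coefficient_of_variation(inter)


# Hypothetical clustering with three clusters of two points each.
centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
points = [[1.0, 0.0], [-1.0, 0.0], [10.0, 2.0],
          [10.0, -2.0], [3.0, 10.0], [-3.0, 10.0]]
assignment = [0, 0, 1, 1, 2, 2]

intra_cv, inter_cv = cluster_cvs(points, assignment, centroids)
print(f"intracluster CV = {intra_cv:.3f}")
print(f"intercluster CV = {inter_cv:.3f}")
```

Repeating this calculation for k = 2, 3, 4, ... and watching how the two values change is one way to pick a small k; note that the intercluster CV needs at least three clusters to be non-zero under this definition.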
