Rohit Nigam 200702036

Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA
Zheng Wei and Joseph JaJa

About the Problem

Solution Approach
[Diagram not recovered. Stage labels, in order: GPU, GPU, GPU, CPU, GPU]

Problems Faced
• The paper does not mention that, of the s random sublists generated, one must begin at the head of the list. To handle this, the first sublist always starts at the list head; the remaining sublists are random, as the paper suggests.
• A second problem arose in steps 4 and 5. Because the sublists are random and unordered, computing the prefix sum over the last elements of the sublists is itself a linked-list prefix-sum problem. This requires an additional array that records, for each sublist, which sublist follows it.
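The two fixes above can be illustrated with a small sequential sketch in C++. It is not the CUDA implementation and not the paper's code: the function name `listPrefixSum`, the arrays `sublistOf`, `nextSublist`, `owner`, and the fixed random seed are all hypothetical, and the sketch assumes every node is reachable from the head and that the tail is marked with -1. Sublist 0 is forced to start at the list head, and `nextSublist` is the extra array that records which sublist follows each sublist. On the GPU, the per-sublist loop would be the part run by independent threads.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Sequential sketch of a splitter-based linked-list prefix sum.
// next[i] is the node after i, or -1 at the tail (assumed convention).
// All names are illustrative; they are not taken from the paper.
std::vector<long long> listPrefixSum(const std::vector<int>& next,
                                     const std::vector<long long>& value,
                                     int head, int numSublists,
                                     unsigned seed = 12345) {
    const int n = static_cast<int>(next.size());
    assert(n > 0 && static_cast<int>(value.size()) == n);
    assert(head >= 0 && head < n && numSublists >= 1);

    // Pick splitters: sublist 0 always starts at the list head,
    // the remaining splitters are distinct random nodes.
    std::vector<int> sublistOf(n, -1);  // sublist id if the node is a splitter
    std::vector<int> splitter{head};
    sublistOf[head] = 0;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, n - 1);
    const int target = numSublists < n ? numSublists : n;
    while (static_cast<int>(splitter.size()) < target) {
        const int c = pick(rng);
        if (sublistOf[c] == -1) {
            sublistOf[c] = static_cast<int>(splitter.size());
            splitter.push_back(c);
        }
    }
    const int s = static_cast<int>(splitter.size());

    // Walk each sublist until the next splitter or the tail, computing
    // local prefix sums and recording which sublist comes next.
    std::vector<long long> prefix(n, 0), total(s, 0);
    std::vector<int> owner(n, -1), nextSublist(s, -1);
    for (int k = 0; k < s; ++k) {
        long long sum = 0;
        int node = splitter[k];
        do {
            sum += value[node];
            prefix[node] = sum;
            owner[node] = k;
            node = next[node];
        } while (node != -1 && sublistOf[node] == -1);
        total[k] = sum;
        nextSublist[k] = (node == -1) ? -1 : sublistOf[node];
    }

    // Follow nextSublist from sublist 0 to get each sublist's offset
    // (a sequential pass over the s sublist totals, in list order).
    std::vector<long long> offset(s, 0);
    long long running = 0;
    for (int k = 0; k != -1; k = nextSublist[k]) {
        offset[k] = running;
        running += total[k];
    }

    // Add each sublist's offset to the local prefix sums of its nodes.
    for (int i = 0; i < n; ++i) {
        if (owner[i] != -1) prefix[i] += offset[owner[i]];
    }
    return prefix;
}
```

Because sublist 0 starts at the head, the offset pass has a known starting point, which is why the head constraint matters even though the other splitters are random.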

Optimizations
• The head of the list is assumed to be unknown mainly to explore the impact of significant caches: an initial step that determines the head fills the cache with some of the input data, which makes the later steps run faster on such processors.
• With high probability, each thread handles about the same number of nodes as any other thread, provided the number of sublists is at least p ln n and the number of processors p < [bound missing from the slide], where n is the total number of nodes.
• The number of sublists is chosen to balance the benefit of many sublists (fine-grained data-parallel computation and load balancing) against the splitting and merging costs.

Optimizations
• Step 4 computes the prefix sum sequentially rather than recursively, which removes a significant overhead.
• Randomizing the positions of the splitters makes the overall procedure load balanced with high probability.
• The number of sublists per thread is min(2*(size/120), 32) for size > 120. This value was found experimentally to be the optimum: beyond it, the gain from additional sublists is outweighed by the overhead of creating and joining them in the other stages of the algorithm.
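The sublists-per-thread rule can be written as a one-line helper. The sketch below is in C++; the name `sublistsPerThread` is hypothetical, and integer division for size/120 is an assumption, since the slide does not say how the quotient is rounded or precisely what `size` measures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper for the rule min(2*(size/120), 32), valid for size > 120.
// Integer division is an assumption; the slide does not specify rounding.
int sublistsPerThread(std::int64_t size) {
    assert(size > 120);
    const std::int64_t uncapped = 2 * (size / 120);
    return static_cast<int>(std::min<std::int64_t>(uncapped, 32));
}
```

Under these assumptions the cap of 32 is reached once size is 1920 or more, so `sublistsPerThread(240)` gives 4 while any large input gives 32.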

Results • For List Size 64M, stride 1001, Sublists per thread 32.
