Document Classification Syamsul Rizal 20136089
Introduction • Texts • Images • Music
Introduction • Given a document and a set of labels • Find the documents most likely to contain relevant information
Motivation • Reads a dictionary of keywords • Locates a set of text documents • Reads the documents • Generates a vector for each document • Writes the document vectors
Parallel Algorithm Design The document classification problem
Partitioning and Communication The reading and profiling of each document may occur in parallel
Agglomeration and Mapping • Agglomeration • Reduce communication • Mapping • Load balancing, task scheduling
Manager / Worker Paradigm • Manager • Keeps track of assigned and unassigned data, assigns tasks to the worker processes, and retrieves results back from them • Worker • Performs the tasks assigned by the manager and returns the results
Manager / Worker Paradigm • Balances workloads across processes • A single manager can become a bottleneck, increasing execution time and lowering speedup
Manager Process
a = array showing the document assigned to each process
d = document assigned
j = ID of worker
k = document vector length
n = number of documents
p = number of processes
s = storage array containing document vectors
t = number of terminated workers
v = individual document vector
int MPI_Abort (MPI_Comm comm, int error_code)
Makes a "best effort" attempt to abort all processes
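Using the variables above, the manager's main loop can be sketched as follows. This is a minimal illustration, not the authors' exact code: the tag names TAG_WORK and TAG_DONE are hypothetical conventions, and for simplicity it assumes at least as many documents as workers (n >= p-1).

```c
/* Manager sketch: hand out documents, collect vectors, stop workers.
 * Ranks 1..p-1 are workers. Assumes n >= p-1 for simplicity. */
#include <mpi.h>
#include <stdlib.h>

#define TAG_WORK 1   /* hypothetical tag: message carries an assignment */
#define TAG_DONE 2   /* hypothetical tag: no more work, worker may stop */

void manager(int n, int p, int k, float *s /* n*k result storage */)
{
    int d = 0;                            /* next document to assign     */
    int t = 0;                            /* terminated workers          */
    int *a = malloc(p * sizeof(int));     /* a[j] = doc at worker j      */
    float *v = malloc(k * sizeof(float)); /* one incoming document vector*/
    MPI_Status status;

    /* Seed each worker with one document. */
    for (int j = 1; j < p; j++, d++) {
        MPI_Send(&d, 1, MPI_INT, j, TAG_WORK, MPI_COMM_WORLD);
        a[j] = d;
    }
    while (t < p - 1) {
        /* Receive a finished document vector from any worker. */
        MPI_Recv(v, k, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        int j = status.MPI_SOURCE;
        for (int i = 0; i < k; i++) s[a[j] * k + i] = v[i];
        if (d < n) {          /* more work: assign next document */
            MPI_Send(&d, 1, MPI_INT, j, TAG_WORK, MPI_COMM_WORLD);
            a[j] = d++;
        } else {              /* no work left: tell worker to stop */
            MPI_Send(&d, 0, MPI_INT, j, TAG_DONE, MPI_COMM_WORLD);
            t++;
        }
    }
    free(a); free(v);
}
```

Because the manager receives from MPI_ANY_SOURCE, faster workers automatically get more documents, which is the load-balancing property claimed above.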
Worker Process • Scenario 1
Worker Process • Scenario 2 • Applies when the broadcast bandwidth inside the parallel computer is greater than the bandwidth between the file server and the parallel computer
Worker Process
f = file name
k = dictionary size
v = document vector
Worker Process • Decide which process in MPI_COMM_WORLD will be the manager
Creating a Workers-Only Communicator
int id;
MPI_Comm worker_comm;
…
if (!id) /* Manager */
MPI_Comm_split (MPI_COMM_WORLD, MPI_UNDEFINED, id, &worker_comm);
else /* Worker */
MPI_Comm_split (MPI_COMM_WORLD, 0, id, &worker_comm);
MPI_Comm_split splits a communicator into one or more new communicators; the manager passes MPI_UNDEFINED as its color, so it is excluded from the new communicator.
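The fragment above can be expanded into a small complete program. This is a sketch of the same split, with the workers printing their ranks in the new communicator to show that the manager is excluded:

```c
/* Workers-only communicator: rank 0 (the manager) opts out of the split
 * with MPI_UNDEFINED; all other ranks share color 0 and so end up
 * together in worker_comm, which they can use e.g. to broadcast the
 * dictionary among themselves (scenario 2 above). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int id;
    MPI_Comm worker_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);

    if (!id)  /* Manager: receives MPI_COMM_NULL */
        MPI_Comm_split(MPI_COMM_WORLD, MPI_UNDEFINED, id, &worker_comm);
    else      /* Workers: same color, ranked by world rank */
        MPI_Comm_split(MPI_COMM_WORLD, 0, id, &worker_comm);

    if (id) {
        int worker_id, workers;
        MPI_Comm_rank(worker_comm, &worker_id);
        MPI_Comm_size(worker_comm, &workers);
        printf("world rank %d -> worker %d of %d\n", id, worker_id, workers);
        MPI_Comm_free(&worker_comm);
    }
    MPI_Finalize();
    return 0;
}
```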
Non Blocking Communications 3 phases of Manager Process:
Non Blocking Communications • A blocking send does not return until the message has been copied into a system buffer or has been sent • A blocking receive does not return until the message has been received into the buffer • Blocking calls prevent overlap of communication and computation and can cause the system to hang
Non Blocking Communications • A non-blocking send returns as soon as the operation is initiated • The status of the operation can be queried through its request handle; once the operation completes, the handle is set to MPI_REQUEST_NULL
Function for Manager • MPI_Irecv • MPI_Wait
int MPI_Irecv (void *buffer, int cnt, MPI_Datatype dtype, int src, int tag, MPI_Comm comm, MPI_Request *handle)
We cannot access buffer until a matching call to MPI_Wait has returned. handle is a pointer to an MPI_Request object that identifies the communication operation that has been initiated.
int MPI_Wait (MPI_Request *handle, MPI_Status *status)
Blocks until the operation associated with handle completes. status is an MPI_Status object containing information about the received message.
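The pair can be used on the manager side like this. A minimal sketch, assuming the manager overlaps other work with the pending receive; the helper name receive_vector is hypothetical:

```c
/* Manager-side sketch: post a non-blocking receive for a worker's
 * document vector, do other work, then wait for completion. */
#include <mpi.h>

void receive_vector(float *v, int k)
{
    MPI_Request handle;
    MPI_Status status;

    /* Initiate the receive; returns immediately. */
    MPI_Irecv(v, k, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &handle);

    /* ... useful work that does not touch v can happen here ... */

    /* Block until the receive completes; only now is v safe to read.
     * status identifies the sending worker via status.MPI_SOURCE. */
    MPI_Wait(&handle, &status);
}
```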
Function for Worker • MPI_Isend • MPI_Probe • MPI_Get_count
int MPI_Isend (void *buffer, int cnt, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm, MPI_Request *handle)
handle points to an MPI_Request object created by the run-time system. The message buffer may not be reused until the matching call to MPI_Wait has returned.
int MPI_Probe (int src, int tag, MPI_Comm comm, MPI_Status *status)
src: the rank of the message source. tag: the incoming message's tag. comm: the communicator. Blocks until a message matching the source and tag is available to be received.
int MPI_Get_count (MPI_Status *status, MPI_Datatype dtype, int *cnt)
Returns in cnt the number of elements of type dtype in the message described by status.
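A worker-side sketch combining the three calls, under the same assumptions as before (MANAGER is rank 0, and a zero-length message is taken as the hypothetical termination signal):

```c
/* Worker-side sketch: probe the next message from the manager to learn
 * its size before receiving, then return the result without blocking. */
#include <mpi.h>
#include <stdlib.h>

#define MANAGER 0

void worker_step(float *v, int k)
{
    MPI_Status status;
    int cnt;

    /* Block until a message from the manager is available. */
    MPI_Probe(MANAGER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* How many ints does it carry? (0 could signal termination.) */
    MPI_Get_count(&status, MPI_INT, &cnt);

    int *msg = malloc((cnt > 0 ? cnt : 1) * sizeof(int));
    MPI_Recv(msg, cnt, MPI_INT, MANAGER, status.MPI_TAG,
             MPI_COMM_WORLD, &status);

    /* ... read and profile the assigned document into v ... */

    /* Send the vector back; v must not be reused until MPI_Wait returns. */
    MPI_Request handle;
    MPI_Isend(v, k, MPI_FLOAT, MANAGER, 0, MPI_COMM_WORLD, &handle);
    MPI_Wait(&handle, MPI_STATUS_IGNORE);
    free(msg);
}
```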
Summary • Parallel Algorithm Design • Partitioning • Communication • Agglomeration • Mapping • Manager • Finds the plain text files • Allocates documents to the workers • Writes the complete set of document vectors (output) • Non-Blocking Communication • Enhances system performance by overlapping communication with computation, synchronizing with the MPI_Wait function