Java Distributed Application Framework
David Noblet, Josh Gutman, and Sid Patel
Distributed Application Toolkit
• Provide a set of generic components that can be used to build distributed systems
• We base this on a SETI@home-like model
  • Compute nodes are voluntary
• Framework should take care of
  • Work distribution
    • Advertise available work to be processed
    • Distribute the work to available compute nodes
  • Reliability/fault-tolerance
JDAF Architecture
• Work unit
  • A serializable closure: a pair of some executable code and the data used as input to that code
• Producer
  • A process that generates work units
• Worker
  • A process that processes work units
• Directory
  • A process that keeps track of producers with available work units
Implementation - what we have
• The work unit
• A centralized directory service
• Basic worker process
• Producer framework
• A message passing communication infrastructure
• And a toy test application (recently!)
The Work Unit
• Uses the Java object serialization mechanism
  • Serializes both classes and input objects
  • Makes special use of the class loader
• Automatic class dependency calculation
  • Simply tag external classes with a special empty interface
  • Needs more than regular Java reflection for this
  • We use BCEL instead
Using the Work Unit
• The distributed application developer provides 3 classes:
  • WorkUnitData
  • WorkUnitProcessor
  • WorkUnitResult
• The WorkUnit object takes care of the rest
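A minimal sketch of how the three developer-supplied classes might fit together. Only the type names (WorkUnitData, WorkUnitProcessor, WorkUnitResult, WorkUnit) come from the slides; the method signatures, the toy "sum" application, and the toBytes/fromBytes helpers are assumptions for illustration. The sketch serializes object state only; it does not ship class bytecode the way the real work unit is described as doing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Developer-supplied types. The names come from the slides; the single
// process(...) method is an assumed signature.
interface WorkUnitData extends Serializable {}

interface WorkUnitResult extends Serializable {}

interface WorkUnitProcessor extends Serializable {
    WorkUnitResult process(WorkUnitData data);
}

// A work unit pairs the code (processor) with its input (data).
final class WorkUnit implements Serializable {
    private static final long serialVersionUID = 1L;
    private final WorkUnitProcessor processor;
    private final WorkUnitData data;

    WorkUnit(WorkUnitProcessor processor, WorkUnitData data) {
        this.processor = processor;
        this.data = data;
    }

    WorkUnitResult run() {
        return processor.process(data);
    }

    // Hypothetical helper: serialize this work unit with standard Java serialization.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: rebuild a work unit from its serialized form.
    static WorkUnit fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (WorkUnit) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Toy application (not from the slides): sum an array of integers.
final class SumData implements WorkUnitData {
    final int[] values;
    SumData(int... values) { this.values = values; }
}

final class SumResult implements WorkUnitResult {
    final int total;
    SumResult(int total) { this.total = total; }
}

final class SumProcessor implements WorkUnitProcessor {
    @Override
    public WorkUnitResult process(WorkUnitData data) {
        int total = 0;
        for (int v : ((SumData) data).values) {
            total += v;
        }
        return new SumResult(total);
    }
}
```

A worker would receive the bytes, call WorkUnit.fromBytes, run the unit, and send the result back to the producer.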
The Producer
• Generates work units
• Advertises the availability of work to the directory service
• Distributes work to requesting clients
• Processes incoming work unit results that are generated by processing clients
Using the Producer
• The distributed application developer provides 2 methods:
  • WorkUnit getWU()
  • void processResult(WorkUnitResult result)
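A sketch of a toy producer built around the two methods named above. The WorkUnit and WorkUnitResult classes here are simplified stand-ins (plain integers), the Producer interface name is an assumption, and returning null when no work is left is an assumed convention, not something the slides specify.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-ins for the framework types (assumption: real ones carry code and data).
final class WorkUnit {
    final int input;
    WorkUnit(int input) { this.input = input; }
}

final class WorkUnitResult {
    final int value;
    WorkUnitResult(int value) { this.value = value; }
}

// The two methods the application developer provides, per the slide.
interface Producer {
    WorkUnit getWU();                           // assumed to return null when no work is left
    void processResult(WorkUnitResult result);
}

// Toy producer: hands out the numbers 1..n and accumulates the results it gets back.
final class SquareSumProducer implements Producer {
    private final Queue<WorkUnit> pending = new ArrayDeque<>();
    private long total;

    SquareSumProducer(int n) {
        for (int i = 1; i <= n; i++) {
            pending.add(new WorkUnit(i));
        }
    }

    @Override
    public WorkUnit getWU() {
        return pending.poll();
    }

    @Override
    public void processResult(WorkUnitResult result) {
        total += result.value;
    }

    long total() {
        return total;
    }
}
```

In this toy, a worker squares each input, so a producer created with n = 3 ends up with a total of 1 + 4 + 9 = 14.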
Directory Service
• Helps work producers advertise the availability of work.
• All advertised work must be for some registered process.
• Workers can request the list of registered processes.
• Workers can request work from a filtered set of processes.
Directory Service Properties
• Loose database
  • Information stored in the database is updated by the work producers. Information is not actively maintained.
• Avoid starvation if possible
  • If no one wants to work on some process, we cannot do anything.
• Some fairness
  • Round robin distribution
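Round robin distribution can be sketched as a cursor that cycles over the producers with advertised work. The class name, the use of plain strings as producer IDs, and the removal behaviour are assumptions for illustration; the slides do not describe the actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hands out producer IDs in round-robin order so each producer gets a turn.
final class RoundRobinSelector {
    private final List<String> producers = new ArrayList<>();
    private int cursor = 0; // index of the producer to hand out next

    void add(String producerId) {
        if (!producers.contains(producerId)) {
            producers.add(producerId);
        }
    }

    void remove(String producerId) {
        int index = producers.indexOf(producerId);
        if (index < 0) {
            return;
        }
        producers.remove(index);
        if (index < cursor) {
            cursor--; // keep the cursor on the same element after the shift
        }
        if (cursor >= producers.size()) {
            cursor = 0; // wrap around if the cursor fell off the end
        }
    }

    // Returns the next producer, or null when none is registered.
    String next() {
        if (producers.isEmpty()) {
            return null;
        }
        String chosen = producers.get(cursor);
        cursor = (cursor + 1) % producers.size();
        return chosen;
    }
}
```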
Directory Service Implementation
• Database
  • Process Information Table (provides information to workers about processes)
    • Process ID: Unique id associated with the process
    • Process Description: What it is about
  • Work Information Table
    • Process ID
    • Producer ID: Who has the work. Contact for work unit.
    • Count: Quantity of available work.
Directory Service Interface
• advertiseProcess(process description)
  • Called by producers
• getAdvertisedProcesses()
  • Called by workers
• publishWork(work description)
  • Called by producers
• getAvailableWork(worker preferences)
  • Called by workers
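One way the four calls above could look as a Java interface, with a small in-memory implementation standing in for the database tables. The method names come from the slide; the parameter types, return types, the string IDs, and the InMemoryDirectory class are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Method names from the slide; all parameter and return types are assumed.
interface DirectoryService {
    void advertiseProcess(String processId, String description);       // called by producers
    Map<String, String> getAdvertisedProcesses();                       // called by workers
    void publishWork(String processId, String producerId, int count);  // called by producers
    List<String> getAvailableWork(Set<String> wantedProcessIds);       // called by workers
}

final class InMemoryDirectory implements DirectoryService {
    // Stand-in for the Process Information Table: process ID -> description.
    private final Map<String, String> processes = new LinkedHashMap<>();
    // Stand-in for the Work Information Table: process ID -> (producer ID -> count).
    private final Map<String, Map<String, Integer>> work = new LinkedHashMap<>();

    @Override
    public void advertiseProcess(String processId, String description) {
        processes.put(processId, description);
    }

    @Override
    public Map<String, String> getAdvertisedProcesses() {
        return new LinkedHashMap<>(processes);
    }

    @Override
    public void publishWork(String processId, String producerId, int count) {
        // All advertised work must be for some registered process.
        if (!processes.containsKey(processId)) {
            throw new IllegalArgumentException("unregistered process: " + processId);
        }
        work.computeIfAbsent(processId, k -> new LinkedHashMap<>()).put(producerId, count);
    }

    @Override
    public List<String> getAvailableWork(Set<String> wantedProcessIds) {
        // Return the producers to contact, filtered by the worker's chosen processes.
        List<String> producers = new ArrayList<>();
        for (String processId : wantedProcessIds) {
            Map<String, Integer> byProducer =
                    work.getOrDefault(processId, Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> entry : byProducer.entrySet()) {
                if (entry.getValue() > 0) {
                    producers.add(entry.getKey());
                }
            }
        }
        return producers;
    }
}
```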
Implementation
• The Directory Service database is MySQL
• DirectoryServiceSetup creates the database, database user, password, and tables to be used by JDAF.
• Uses MySQL Connector/J for data operations.
• Objects (like ProcessID, ProducerID) are stored as bytes in the database and can be re-instantiated. This allows implementation independence.
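Storing objects as bytes can be sketched with standard Java serialization. The ProcessID class below is a stand-in with an assumed single string field, and ByteCodec is a hypothetical helper; the slides do not show the real classes. In a JDBC setting the resulting byte array could be written to a binary column with PreparedStatement.setBytes and read back with ResultSet.getBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Stand-in for the real ProcessID; the string field is an assumption.
final class ProcessID implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String value;

    ProcessID(String value) { this.value = value; }

    String value() { return value; }

    @Override
    public boolean equals(Object other) {
        return other instanceof ProcessID && ((ProcessID) other).value.equals(value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }
}

// Hypothetical helper that converts any Serializable object to bytes and back.
final class ByteCodec {
    private ByteCodec() {}

    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the database only sees opaque bytes, the ID classes can change their internals without a schema change, which is one reading of the "implementation independence" point above.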
Worker Implementation
• Event Driven
  • One action fires the next
  • Initiated and stopped at user’s request
  • Leverages Flexible Messaging System
• Provides User Choice
  • Doesn’t join unwanted processes
  • User chooses when to work and when not to work
Worker Implementation
• Built for Easy Modification
• Three Levels:
  • Worker Interface
  • BasicWorker Class
  • WorkerUI Class
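A rough sketch of how the three levels might be layered. The names Worker, BasicWorker, and WorkerUI come from the slide; every method, the event log, and the choice to have WorkerUI wrap a Worker (rather than extend BasicWorker) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Level 1: the contract. Method names are assumed.
interface Worker {
    void start();
    void stop();
    boolean isRunning();
}

// Level 2: basic behaviour; subclasses can override onEvent to customize.
class BasicWorker implements Worker {
    private boolean running;
    private final List<String> events = new ArrayList<>();

    @Override
    public void start() {
        running = true;
        onEvent("started");
    }

    @Override
    public void stop() {
        running = false;
        onEvent("stopped");
    }

    @Override
    public boolean isRunning() {
        return running;
    }

    // Event hook: each action reports an event that may trigger the next one.
    protected void onEvent(String event) {
        events.add(event);
    }

    List<String> events() {
        return new ArrayList<>(events);
    }
}

// Level 3: user-facing layer that turns user commands into worker actions.
class WorkerUI {
    private final Worker worker;

    WorkerUI(Worker worker) {
        this.worker = worker;
    }

    void handleCommand(String command) {
        switch (command) {
            case "start":
                worker.start();
                break;
            case "stop":
                worker.stop();
                break;
            default:
                throw new IllegalArgumentException("unknown command: " + command);
        }
    }
}
```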
Where to go from here
• Write test application
  • Perhaps a basic webcrawler
• Decentralize directory service
• Add other advanced features as time permits
  • Security/authentication
  • Advanced scheduler w/ support for prioritization
  • Performance enhancements
Test Application: Web Crawler
• WU Data is a URL
• WU Process is a method to extract other URLs and word counts from the original URL
• WU Result is the packaged word counts and linking URLs
• Producer turns each WU Result into more WUs and keeps track of running word-count totals
• Plug this WU into a basic implementation and the web crawler is running in no time.
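The processing step of the crawler work unit could look roughly like this. The sketch works on an already-fetched HTML string (fetching the URL is left out), uses simple regular expressions rather than a real HTML parser, and the class names CrawlResult and CrawlProcessor are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Packaged result: linking URLs plus word counts for one page.
final class CrawlResult {
    final Set<String> links;
    final Map<String, Integer> wordCounts;

    CrawlResult(Set<String> links, Map<String, Integer> wordCounts) {
        this.links = links;
        this.wordCounts = wordCounts;
    }
}

final class CrawlProcessor {
    // Deliberately naive patterns; a real crawler would use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    private CrawlProcessor() {}

    static CrawlResult process(String html) {
        // Collect the targets of double-quoted href attributes.
        Set<String> links = new LinkedHashSet<>();
        Matcher href = HREF.matcher(html);
        while (href.find()) {
            links.add(href.group(1));
        }

        // Strip tags, then count the remaining words case-insensitively.
        String text = TAG.matcher(html).replaceAll(" ");
        Map<String, Integer> counts = new TreeMap<>();
        Matcher word = WORD.matcher(text);
        while (word.find()) {
            counts.merge(word.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return new CrawlResult(links, counts);
    }
}
```

The producer would turn each entry of links into a new work unit and add wordCounts to its running totals.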
Distributed Directory Service
• Directory services gossip with one another to maintain a complete graph and share process information.
• The list of directories is stored locally.
• A directory is removed if it has not made contact within some threshold time.
• Randomly choose one of the directories and send it information about
  • other directories
  • local processes: the indirection level is increased by 1.
• Preference is given to processes with indirection level 0.
[Figure sequence: producers P1, P2, P3 and directories D1, D2; D2 and then P3 are added in later frames]
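The gossip rules for the distributed directory can be sketched as follows. The class and method names are hypothetical, process and directory IDs are plain strings, time is passed in explicitly as milliseconds, and only process information is exchanged here (sharing the list of other directories and choosing a random peer are left out to keep the sketch short).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GossipDirectory {
    // Known peer directories -> time (ms) at which we last heard from them.
    private final Map<String, Long> peers = new HashMap<>();
    // Known processes -> indirection level (0 = advertised at this directory).
    private final Map<String, Integer> processes = new HashMap<>();

    void advertiseLocal(String processId) {
        processes.put(processId, 0);
    }

    // The process information this directory would send to a chosen peer.
    Map<String, Integer> snapshot() {
        return new HashMap<>(processes);
    }

    // Merge gossip from a peer: every process arrives one level further away,
    // and the lowest known level is kept.
    void receive(String fromPeer, Map<String, Integer> remote, long nowMillis) {
        peers.put(fromPeer, nowMillis);
        for (Map.Entry<String, Integer> entry : remote.entrySet()) {
            processes.merge(entry.getKey(), entry.getValue() + 1, Integer::min);
        }
    }

    // Drop directories that have not made contact within the threshold.
    void expirePeers(long nowMillis, long thresholdMillis) {
        peers.values().removeIf(lastSeen -> nowMillis - lastSeen > thresholdMillis);
    }

    boolean knowsPeer(String peer) {
        return peers.containsKey(peer);
    }

    // Returns the indirection level, or -1 if the process is unknown.
    int level(String processId) {
        return processes.getOrDefault(processId, -1);
    }

    // Processes ordered by preference: lowest indirection level first, ties by ID.
    List<String> processesByPreference() {
        List<String> ids = new ArrayList<>(processes.keySet());
        ids.sort((a, b) -> {
            int byLevel = Integer.compare(processes.get(a), processes.get(b));
            return byLevel != 0 ? byLevel : a.compareTo(b);
        });
        return ids;
    }
}
```

With this rule, a process advertised at D1 appears at level 1 on a directory that hears about it from D1 directly, and at level 2 on a directory that hears about it second-hand.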
Work Units
• Web Crawler: Given a web page, extract
  • Links to other web pages (for the Web Crawler)
  • Image URLs (for the Image Text Extractor)
• Image Text Extractor: Given an image URL, extract
  • Strings from the image using OCR (for words)
• Post Processor: Given a set of strings, extract
  • English words using a spell checker.
[Diagram: URL -> Web Crawler -> links (fed back to the Web Crawler) and image URLs -> Image Text Extractor -> strings -> Post Processor -> English words, associated with images and URLs]