Nova System Architecture

Nova System Architecture

Metadata Database • Modules are stateless • Keep state externally • Shared among multiple instances • No sync instead round robin policy

Watchdog • Via Zookeeper • Leader election • Detects unresponsive server instances & kill them • Start new replacements • Reconfigure load balancer’s routing table

Clients Types • Human clients via command line interface • Web interface via SOAP • NOVA is part of larger environment • Data onboarding and data serving interact with NOVA

User Interface • Provides API for registering channels and workflowettes • Binds workflowettes to channels • (de)registers triggers and insertion of new blocks to channels • Monitors workflowettes execution status -Workflowetts are written in a variant XPDL to describe a directed graph of processing steps

Process manager • Keeps track of registered & bound workflowettes • Responds to trigger events by creating executable instance of workflowettes • Pass executable instance of workflowettesto process executor & has their execution status

Data manager • Maintains list of blocks associated with each channel & task input cursor. • When process manager prepare to execute workflowette, data manager does: • Create pig latin expressions • Reserve output block positions

Process optimizer • Currently implemented: • Merging pairs of workflowette executions that read the same input data and run in the same time • Future plan: • Pipelining workflowettes that form a “chain”

Process executor • Forward execution requests to Oozie(run pig jobs) • Track execution status • Report the execution status back to process manager

Trigger manager • Runs its own tread & fires triggers • Uses metadata DB to avoid conflicts with other concurrent Nova server instances • Triggers cause a workflowette to run via requests to process manager • Triggers garbage collection events that are performed by data manager

Data & metadata replicator • Cross-data-center replication is asynchronous & one way ( stand-by instance) for: • Fail-over • Migration • In Nova complicated, cause data is stored in two places with enormous links • Avoids creating dangling references by: • keep track of references & delaying the replay of transactions until all referenced HDFS files are copied

Scan sharing • Workflowettes scheduled independently, but often multiple workflowette executions that read same data are triggered almost in the same time. • “Load” operation is used once across different workflowettes. • Merging is governed in XML workflowette by : • “ Mergeable ” annotation • “ Maximam queue time “ • Only workflowettes that use ALL consumption mode are supported.

Experiments • Incremental Processing: • Merge overhead • Different block sizes • Incremental Vs. non-incremental Join • Scan Sharing

Merge overhead • Merge function that co-groups records by a key, then applies UDF to examine each record • Pig offers 2 alg. For physical co-group: • Map-side • Reduce-side

Merge overhead “we have not been able to pin down the source of this per-block overhead, i.e. whether it lies mostly in opening an HDFS file or in Pig's le handling code.” • Question: Given a data set with fixed size, how much difference does it make to divide it into large number of small blocks or small number of large blocks?? • Depends on the size of the data size • if large data set the difference is small , • If small data set the difference is large.

Incremental Vs. non-incremental Join • (old A JOIN new B) U (new A JOIN old B) U (new A JOIN new B), which emits a delta block of the join result VS. (All A JOIN All B) • this experiment sorts the data by the join key and bulk-loads it into Zebra (a sorted and sparsely-indexed Hadoopfile format) files and then runs a map-side join.

Scan Sharing

Thank you!!Questions??

Nova System Architecture

Nova System Architecture

Presentation Transcript

VII System Architecture

SAP System Architecture

ATCA System architecture

Safeview System Architecture

Database System Architecture

System Architecture

System Architecture

SYSTEM ARCHITECTURE

Global system architecture

System Architecture

System management Architecture

Control System Architecture

DSpace System Architecture

System Architecture

System Architecture

SYSTEM ARCHITECTURE

System Architecture

P802.19.1 System Architecture

Information System Architecture

System Architecture

System Architecture