Csci 490 / Engr 596: Special Topics / Special Projects
Software Design and Scala Programming
Spring Semester 2010, Lecture Notes

Pipes and Filters Architectural Pattern
Created: 14 September 2004. Revised: 13 April 2010.
Definition
The Pipes and Filters architectural pattern provides a structure for systems that process a stream of data.
• Each processing step is encapsulated in a filter component.
• Data are passed through pipes between adjacent filters.
• Recombining filters allows you to build families of related systems.
Context
• Programs that must process streams of data
Problem
Build a system that
• must be built by several developers,
• decomposes naturally into several independent processing steps, and
• has requirements that are likely to change.
Forces
• It should be possible to substitute new filters for existing ones or to recombine steps into a different communication structure.
• Components implementing small processing steps are easier to reuse than components implementing large steps.
• Two steps share no information if they are not adjacent.
• Different sources of input data exist.
• It should be possible to display or store the final results of the computation in various ways.
• If the user stores intermediate results in files, the likelihood of errors increases and the file system becomes cluttered.
• Parallel execution of steps should be possible.
Solution
• Divide the task into a sequence of processing steps.
• Implement each step as a filter program that consumes data from its input and produces data on its output incrementally.
• Connect the output of one step to the input of the succeeding step by means of a pipe.
• Enable the filters to execute concurrently.
• Connect the input of the sequence to some data source.
• Connect the output of the sequence to some data sink.
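The solution steps above can be sketched in Scala. This is a minimal illustration, not part of the original notes: each filter is a function from an input stream (an Iterator) to an output stream, consuming and producing data incrementally, and a pipe is simply the connection of one filter's output to the next filter's input. All names (PipelineSketch, toWords, toLower) are hypothetical.

```scala
// Sketch: filters as incremental stream transformers, pipes as composition.
object PipelineSketch {
  // A filter consumes an input stream and produces an output stream lazily.
  type Filter[A, B] = Iterator[A] => Iterator[B]

  // Hypothetical filters for a small word-processing pipeline.
  val toWords: Filter[String, String] =
    lines => lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

  val toLower: Filter[String, String] =
    words => words.map(_.toLowerCase)

  // The "pipe": the output of one step feeds the input of the next.
  def pipeline(source: Iterator[String]): Iterator[String] =
    (toWords andThen toLower)(source)

  def main(args: Array[String]): Unit = {
    // Data source: an in-memory sequence of lines; data sink: a printed list.
    val sink = pipeline(Iterator("Pipes AND Filters", "in Scala")).toList
    println(sink)
  }
}
```

Because Iterators are lazy, each filter pulls data on demand, so no intermediate collection is materialized between steps.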
Structure
[Figure: Data Source → Pipe → Filter1 → Pipe → Filter2 → Pipe → Data Sink]
Structure (cont.)
• Filters
• The processing units of the pipeline. A filter may
• enrich data by computing new information from the input data and adding it to the output data stream,
• refine data by concentrating or extracting information from the input data stream and passing only that information to the output stream,
• transform the input data to a new form before passing it to the output stream, or
• do some combination of enrichment, refinement, and transformation.
• Active filter
• runs as a separate process or thread,
• pulls data from the input data stream, and
• pushes the transformed data onto the output data stream.
• Passive filter
• called as a function: a pull of output data from the filter, or
• called as a procedure: a push of input data into the filter.
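The two passive-filter styles can be contrasted in a short sketch (all names here are hypothetical, not from the notes): a pull-style filter is called as a function and pulls from upstream, while a push-style filter is called as a procedure and pushes to downstream.

```scala
// Sketch: passive filters in pull style vs. push style.
object PassiveFilters {
  // Pull: downstream calls next(), which pulls from the upstream iterator.
  class PullUpper(upstream: Iterator[String]) {
    def next(): String = upstream.next().toUpperCase // transform on demand
  }

  // Push: upstream calls accept(), which pushes to the downstream procedure.
  class PushUpper(downstream: String => Unit) {
    def accept(s: String): Unit = downstream(s.toUpperCase)
  }

  def main(args: Array[String]): Unit = {
    val pull = new PullUpper(Iterator("a", "b"))
    println(pull.next()) // the sink drives the computation

    val out = scala.collection.mutable.Buffer[String]()
    val push = new PushUpper(s => out += s)
    push.accept("c")     // the source drives the computation
    println(out)
  }
}
```

In both cases the filter itself has no thread of control; the direction of the call determines whether the pipeline is source-driven or sink-driven.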
Structure (cont.)
• Pipes
• Connectors between the data source and the first filter, between adjacent filters, and between the last filter and the data sink.
• Data source
• An entity (e.g., a file or input device) that provides input data to the system.
• It either actively pushes data down the pipeline or passively supplies data when requested.
• Data sink
• An entity that gathers data at the end of the pipeline.
• It either actively pulls data from the last filter or passively responds when requested by the last filter.
Implementation
• Divide the functionality into a sequence of processing steps.
• Each step depends only upon the outputs of the previous step and becomes a filter in the system.
• Define the type and format of the data to be passed along each pipe.
• Determine how to implement each pipe connection.
• A pipe connecting to a passive filter might be implemented as a direct call of the adjacent filter:
• a push connection as a call of the downstream filter as a procedure,
• a pull connection as a call of the upstream filter as a function.
Implementation (cont.)
• Design and implement the filters.
• An active filter needs to run with its own thread of control:
• a heavyweight operating-system process, having its own address space, or
• a lightweight thread, sharing an address space with other threads.
• A passive filter does not require a separate thread of control.
• Select the size of the pipe buffers:
• large buffers use up much of the available memory but involve less synchronization and context-switching overhead;
• small buffers conserve memory at the cost of increased overhead.
• Consider the different processing options.
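A sketch of two active filters connected by a bounded pipe may help here. Each filter runs in its own thread, and the pipe is a blocking queue whose capacity is the buffer size discussed above (a small capacity conserves memory but forces more synchronization). The names, the sentinel value, and the buffer sizes are all illustrative assumptions, not from the notes.

```scala
import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}

// Sketch: two active filters, each with its own thread, joined by a bounded pipe.
object ActiveFilters {
  val EOF = "<eof>" // sentinel marking the end of the stream

  def runPipeline(input: Seq[String]): List[String] = {
    // The pipe: a small bounded buffer between the two filters.
    val pipe: BlockingQueue[String] = new ArrayBlockingQueue(4)
    val results = new ArrayBlockingQueue[String](64) // data sink

    // Upstream active filter: pushes transformed data onto the pipe,
    // blocking when the buffer is full.
    val producer = new Thread(() => {
      input.foreach(s => pipe.put(s.toLowerCase))
      pipe.put(EOF)
    })

    // Downstream active filter: pulls from the pipe until EOF,
    // blocking when the buffer is empty.
    val consumer = new Thread(() => {
      var s = pipe.take()
      while (s != EOF) { results.put(s.reverse); s = pipe.take() }
    })

    producer.start(); consumer.start()
    producer.join(); consumer.join()
    Iterator.continually(results.poll()).takeWhile(_ != null).toList
  }

  def main(args: Array[String]): Unit =
    println(runPipeline(Seq("ABC", "DEF")))
}
```

The put/take calls of the blocking queue provide exactly the synchronization that the buffer-size trade-off above refers to.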
Implementation (cont.)
• Design for robust handling of errors.
• Example: Unix programs use the stderr channel to report errors.
• To recover from errors, either
• discard the bad input and resynchronize at some well-defined point later in the input data, or
• back up the input to some well-defined point and restart processing, using a different processing method for the bad data.
• Configure the pipes-and-filters system and initiate processing.
• Use a standardized main program to create, connect, and initiate the needed pipe and filter elements of the pipeline, or
• use an end-user tool to create, connect, and initiate the needed pipe and filter elements of the pipeline.
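The "discard and resynchronize" recovery strategy can be sketched as follows. The record format ("key=value") and all names are assumptions made for illustration: malformed records are reported on stderr and dropped, and the filter resumes at the next well-formed record instead of aborting the pipeline.

```scala
// Sketch: a filter that discards bad input and resynchronizes.
object RobustFilter {
  // Parse "key=value" records, skipping malformed lines.
  def parse(lines: Iterator[String]): Iterator[(String, String)] =
    lines.flatMap { line =>
      line.split("=", 2) match {
        case Array(k, v) if k.nonEmpty => Some(k -> v)
        case _ =>
          // Report on stderr (as Unix programs do), then resynchronize
          // by simply moving on to the next record.
          System.err.println(s"skipping bad record: $line")
          None
      }
    }

  def main(args: Array[String]): Unit =
    println(parse(Iterator("a=1", "oops", "b=2")).toList)
}
```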
Example
A retargetable compiler for a programming language:
• The source element reads the program text from a file (or a sequence of files) as a stream of characters.
• The lexical analyzer converts the stream of characters into a stream of lexical tokens for the language (keywords, identifiers, operators, etc.).
• The parser recognizes a sequence of tokens that conforms to the language grammar and translates the sequence to an abstract syntax tree (AST).
• The semantic analyzer reads the abstract syntax tree and writes an appropriately augmented abstract syntax tree.
[Figure: Source → (characters) → Lexical Analyzer → (lexical tokens) → Parser → (AST) → Semantic Analyzer → (augmented AST) → …]
Example (cont.)
• The global optimizer reads the augmented syntax tree and outputs an equivalent tree that is more efficient in its use of space and time.
• The intermediate code generator translates the augmented syntax tree into a sequence of instructions for a virtual machine.
• The local optimizer converts the sequence of intermediate-code instructions into a more efficient sequence.
[Figure: … → (augmented AST) → Global Optimizer → (optimized AST) → Intermediate Code Generator → (VM instruction sequence) → Local Optimizer → (efficient sequence) → …]
Example (cont.)
• The backend code generator translates the sequence of virtual machine instructions into a sequence for some real platform, i.e., for some hardware processor augmented by operating system and runtime library calls.
• An assembler is needed to translate the symbolic instruction sequence into a relocatable binary module if the previous step generated assembly code.
• A linker is needed to bind the separate modules with library modules into a single executable (i.e., object code) module if the previous steps generated a sequence of binary modules.
• The sink element outputs the resulting binary module into a file.
[Figure: … → Backend Code Generator → (RM instruction sequence) → Assembler → (relocatable binary module) → Linker → (single executable module) → Sink (File)]
Example (cont.)
[Figure, full pipeline: Source → Lexical Analyzer → Parser → Semantic Analyzer → Global Optimizer → Intermediate Code Generator → Local Optimizer → Backend Code Generator → Assembler → Linker → Sink (File)]
Example (cont.)
The pipeline supports different variations:
• If source-code preprocessing is to be supported, a preprocessor filter is inserted in front of the lexical analyzer.
• If the language is to be interpreted rather than translated into object code, the backend code generator (and all components after it) is replaced by an interpreter for the virtual machine.
• If the compiler is to be retargeted to a different platform, a backend code generator (and assembler and linker) for the new platform is substituted for the old one.
• If the compiler is to be modified to support a different language with the same lexical structure, the parser, semantic analyzer, global optimizer, and intermediate code generator are replaced.
• If a load-and-go compiler is desired, the file-output sink is replaced by a loader that loads the executable module into main memory and starts it executing.
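The retargeting variation above can be sketched as a toy pipeline in which each stage is a function, so substituting a backend is just passing a different function. Everything here (the trivial lexer and parser, the two backends, their instruction syntax) is a made-up illustration, not a real compiler.

```scala
// Sketch: a toy retargetable pipeline where the backend stage is swappable.
object CompilerSketch {
  type Token = String
  type Ast   = List[Token]

  // Front end: trivial lexical analysis and "parsing".
  def lex(source: String): Iterator[Token] =
    source.split("\\s+").iterator.filter(_.nonEmpty)

  def parse(tokens: Iterator[Token]): Ast = tokens.toList

  // Interchangeable backends, as in the retargeting variation.
  def backendX86(ast: Ast): List[String] = ast.map(t => s"x86:$t")
  def backendArm(ast: Ast): List[String] = ast.map(t => s"arm:$t")

  // The pipeline: connect the stages, parameterized by the backend.
  def compile(source: String, backend: Ast => List[String]): List[String] =
    backend(parse(lex(source)))

  def main(args: Array[String]): Unit = {
    println(compile("mov a b", backendX86))
    println(compile("mov a b", backendArm)) // same front end, new backend
  }
}
```

The other variations work the same way: inserting a preprocessor means composing one more function in front of lex, and replacing the backend with an interpreter means passing a function that executes the AST instead of emitting code.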
Example (cont.)
To make the system more efficient or convenient:
• The system of filters may directly share global state.
• Adjacent active filters may be combined, replacing the pipe by an upstream function call or a downstream procedure call.
• New information may be made available in a filter.
• Example: symbol table information for runtime debugging tools.
Variants
A generalization allows filters with multiple input or output pipes to be connected in any directed graph structure.
• Often restricted to directed acyclic graph structures.
• In Unix, the tee filter provides a mechanism to split a stream into two streams, named pipes provide mechanisms for constructing network connections, and filters with multiple input files/streams provide mechanisms for joining two streams.

    # create two named pipes
    mknod pipeA p
    mknod pipeB p
    # set up side-chain computation (running in the background)
    cat pipeA >pipeB &
    # set up main pipeline computation
    cat filename | tr -cs "[:alpha:]" "[\n*256]" \
      | tr "[:upper:]" "[:lower:]" | sort | tee pipeA | uniq \
      | comm -13 - pipeB | uniq
Consequences
Benefits
• Intermediate files are unnecessary, but possible.
• Flexibility by filter exchange.
• Flexibility by recombination.
• Reuse of filter elements.
• Rapid prototyping of pipelines.
• Efficiency by parallel processing.
Consequences
Liabilities
• Sharing state information is expensive or inflexible.
• Efficiency gains from parallel processing are often an illusion.
• Data transformation overhead.
• Error handling is difficult.
References
• Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, and Michael Stal. Pattern-Oriented Software Architecture: A System of Patterns. Wiley, 1996.
• Mary Shaw and David Garlan. Software Architecture: Perspectives on an Emerging Discipline. Prentice-Hall, 1996.
Acknowledgement
This work was supported by a grant from Acxiom Corporation titled "The Acxiom Laboratory for Software Architecture and Component Engineering (ALSACE)."