310 likes | 841 Views
Transformations. Transformations help to transform the source data according to the requirements of target system and it ensures the quality of the data being loaded into target. Transformations are of two types: 1) Active 2) Passive. Overview. Aggregator transformation allows you to perform aggregate functions, such as averages and sums.The aggregator transformation is unlike the Expression transformation, We can use aggregator transformation to perform calculat197
E N D
1. AGGREGATOR TRANSFORMATION Basics and Advanced Concepts
3. Overview Aggregator transformation allows you to perform aggregate functions, such as averages and sums.
The aggregator transformation is unlike the Expression transformation, We can use aggregator transformation to perform calculations on groups.
Expression transformation permits you to perform calculations on a row-by-row basis only
4. Aggregator Transformation
We can use conditional clauses to filter rows, providing more flexibility
After we create a session that includes an aggregator transformation ,we can enable the session option, Incremental aggregation. While performing incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
5. Creating an Aggregator Transformation
To use an Aggregator transformation in a mapping, you add the Aggregator transformation to the mapping, then configure the transformation with an aggregate expression and group by ports, if desired.
To create an Aggregator transformation:
In the Mapping Designer, choose Transformation-Create. Select the Aggregator transformation.
Enter a name for the Aggregator, click Create. Then click Done. The Designer creates the Aggregator transformation.
Drag the desired ports to the Aggregator transformation.
The Designer creates input/output ports for each port you include.
6. Creating an Aggregator Transformation Double-click the title bar of the transformation to open the Edit Transformations dialog box.
Select the Ports tab.
Click the group by option for each column you want the Aggregator to use in creating groups.
You can optionally enter a default value to replace null groups.
Click Add and enter a name and data type for the aggregate expression port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field to open the Expression Editor. Enter the aggregate expression, click Validate, then click OK.
Make sure the expression validates before closing the Expression Editor.
7. Continuation… Select the Properties tab.
9. Continuation… Click OK.
Choose Repository-Save to save changes to the mapping
10. Components of the Aggregator Transformation The Aggregator is an active transformation, changing the number of rows in the pipeline. The Aggregator transformation has the following components and options:
Aggregate expression. Entered in an output port. Can include non-aggregate expressions and conditional clauses.
Group by port. Indicates how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted input. Use to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
Aggregate cache. The PowerCenter Server stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache.
6
11. Aggregate Caches When you run a session that uses an Aggregator transformation, the PowerCenter Server creates index and data caches in memory to process the transformation. If the PowerCenter Server requires more space, it stores overflow values in cache files.
You can configure the index and data caches in the Aggregator transformation or in the session properties.
No need to configure cache memory for Aggregator transformations that use sorted ports
12. Aggregate Functions You can use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function.
The transformation language includes the following aggregate functions:
AVG
FIRST
COUNT
LAST
MAX
PERCENTILE
STDEV
SUM
VARIANCE
?When you use any of these functions, you must use them in an expression within an Aggregator transformation.
13. Group By Ports
The Aggregator transformation allows you to define groups for aggregations, rather than performing the aggregation across all input data. For example, rather than finding the total company sales, you can find the total sales grouped by region.
To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports, creating a new group for each unique combination of groups.
To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports, creating a new group for each unique combination of groups.
14. Group By Ports The following Aggregator transformation groups first by STORE_ID and then by ITEM:
15. Continuation… If you send the following data through this Aggregator transformation:
16. Continuation….. The PowerCenter Server performs the aggregate calculation on the following unique groups:
17. Continuation…..
The PowerCenter Server then passes the last row received, along with the results of the aggregation, as follows:
18. Using Sorted Input
?You can improve Aggregator transformation performance by using the sorted input option. When you use sorted input, the PowerCenter Server assumes all data is sorted by group. As the PowerCenter Server reads rows for a group, it performs aggregate calculations. When necessary, it stores group information in memory. To use the Sorted Input option, you must pass sorted data to the Aggregator transformation.
?When you do not use sorted input, the PowerCenter Server performs aggregate calculations as it reads. However, since data is not sorted, the PowerCenter Server stores data for each group until it reads the entire source to ensure all aggregate calculations are accurate
19. Sorted Input Conditions Do not use sorted input if either of the following conditions are true:
The aggregate expression uses nested aggregate functions.
The session uses incremental aggregation.
If you use sorted input and do not sort data correctly, the session fails.
Pre-Sorting Data
To use sorted input, you pass sorted data through the Aggregator.
Data must be sorted as follows:
By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
Using the same sort order configured for the session. If data is not in strict ascending or descending order based on the session sort order, the PowerCenter Server fails the session
20. Mapping with a Sorter transformation configured to sort the source data in descending order by ITEM_NAME
21. Optimizing Aggregator Transformations Aggregator transformations often slow performance because they must group data before processing it. Aggregator transformations need additional memory to hold intermediate group results.
You can optimize Aggregator transformations by performing the following tasks:
Group by simple columns.
Use sorted input.
Use incremental aggregation.
22. Performance Tuning Tips Group By Simple Columns
You can optimize Aggregator transformations when you group by simple columns. When possible, use numbers instead of string and dates in the columns used for the GROUP BY. You should also avoid complex expressions in the Aggregator expressions.
Use Sorted Input
You can increase session performance by sorting data and using the Aggregator Sorted Input option.
The Sorted Input decreases the use of aggregate caches. When you use the Sorted Input option, the PowerCenter Server assumes all data is sorted by group. As the PowerCenter Server reads rows for a group, it performs aggregate calculations. When necessary, it stores group information in memory.
The Sorted Input option reduces the amount of data cached during the session and improves performance. Use this option with the Source Qualifier Number of Sorted Ports option to pass sorted data to the Aggregator transformation.
23. Performance tuning tips Use Incremental Aggregation
If you can capture changes from the source that changes less than half the target, you can use Incremental Aggregation to optimize the performance of Aggregator transformations.
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. The PowerCenter Server updates your target incrementally, rather than processing the entire source and recalculate the same calculations every time you run the session.
Filter before aggregating
If you use a filter transformation in the mapping,place the transformation before the aggregator transformation to reduce unnecessary aggregation.
24.
Thank You…