210 likes | 313 Views
The Join and Union steps. We will create a simple job demonstrating the Join and Union steps. We will assume that you are familiar with the Hello World program. Start by setting up your Global Workspace and Global Connection as you did for Hello World. Create Netezza tables.
E N D
The Join and Union steps • We will create a simple job demonstrating the Join and Union steps. • We will assume that you are familiar with the Hello World program. • Start by setting up your Global Workspace and Global Connection as you did for Hello World.
Create Netezza tables Create and populate two tables as follows: create table member_names (member_idint, member_namevarchar(30)); insert into member_namesvalues (1001, 'albert'); insert into member_namesvalues (1002, 'barbara'); insert into member_names values (1003, 'calvin'); insert into member_names values (1004, 'donna'); insert into member_names values (1005, 'earl'); create table member_scores (member_idint, pluses int, minuses int); insert into member_scores values (1002, 4, 5); insert into member_scores values (1003, 6, 6); insert into member_scores values (1006, 4, 3); insert into member_scores values (1007, 8, 2);
Create two table steps Place two table step icons on the job canvas, and connect them to the two tables you created.
Create a join step Place a join step on the canvas, and connect the two tables steps to is, as shown above.
Edit the join step Select MEMBER_ID from each column, then click on INSERT to set the join condition, as shown above. Then click OK/Save.
Create output table step Place another table step on the palette, and connect the join step to it, as shown above. Note that the transformer connecting the JOIN step to the new TABLE step is black (meaning “not yet validated”).
Edit output table step This time, use the “Create Table” option. Use a table name that doesn’t yet exist in the database.
Now the transformer is red. Click on the arrow head of the (red) transformer to edit it.
Editing the transformer When we open up the transformer, we see the mapping shown above. The is the mapping we want, so just click “OK”.
Now the transformer is blue. …meaning that it is validated. We’re ready to run the job.
Create and run batch • Save your job. • Create a batch job to run it. • Run the job. • Check the database verify that the correct table was created, and contains the correct output.
Edit the join step Next, Edit the job, and change the Join Type to Left Outer. Click OK/Save.
Edit the output table step Change the options to Existing Table, Truncate, and choose the output table that you created with the Join step.Click OK. The transformer will turn red again; open it up and click OK. Then run the job.
Verify your output HELLO_DB(USER1)=> select * from join_out; MEMBER_ID | MEMBER_NAME | PLUSES | MINUSES -----------+-------------+--------+--------- 1002 | barbara | 4 | 5 1003 | calvin | 6 | 6 1004 | donna | | 1001 | albert | | 1005 | earl | | (5 rows) Is this what you got? Is this what you expected?
Add union step Create another table step, also connected to MEMBER_NAMES. Disconnect the old MEMBER_NAMES step from the Join. Drag a Union step onto the palete, and connect both MEMBER_NAMES steps to it.
Edit union step Edit the Union step and choose UNION ALL, as shown.
Connect Union to Join Connect Union to Join as shown. There’s no need to re-edit the Join step, since the input metadata hasn’t changed. Save and run.
Verify your output HELLO_DB(USER1)=> select * from join_out; MEMBER_ID | MEMBER_NAME | PLUSES | MINUSES -----------+-------------+--------+--------- 1001 | albert | | 1001 | albert | | 1002 | barbara | 4 | 5 1002 | barbara | 4 | 5 1004 | donna | | 1004 | donna | | 1005 | earl | | 1005 | earl | | 1003 | calvin | 6 | 6 1003 | calvin | 6 | 6 (10 rows) It should look something like this.
Add a de-duplication step The De-Duplicate step removes duplicate records where every field value is the same. Add a De-Duplicate step between the join and the output table. Again, since there are no metadata changes, there is no need to refresh any transformers. Rerun, and check your output.
Exercises • What would happen if you chose UNION instead of UNION ALL inside the Union step? • Create a new table with the same column definitions as MEMBER_NAMES, but different data. Connect the second MEMBER_NAMES table to this new table, and rerun the job, with various options for the Join and Union stages. • Create yet another MEMBER_NAMES table with data that overlaps with, but isn’t the same as the original MEMBER_NAMES table. Replace the Union step with the Minus step.