SEL3053: Analyzing Geordie Lecture 13. Hierarchical cluster analysis 2 - cluster tree construction.
Lecture 12 introduced hierarchical cluster analysis and observed that construction of a hierarchical cluster tree is a two-step process: creation of a vector-distance table, and construction of the tree on the basis of that table. It outlined the first of these two steps. This lecture deals with the second step.
There are two main ways of constructing a cluster tree, which the literature on the subject generally refers to as 'top-down' and 'bottom-up'. These terms are not explained here, since an explanation would take us too far afield. Suffice it to say that this module confines itself to the 'bottom-up' approach, and that nothing further is said about 'top-down'.
As noted, construction of a cluster tree for a data matrix is based on the distance table abstracted from the matrix. What follows uses the distance table constructed in the last lecture, restricted to a 6 x 6 subset of the original 30 x 30 table. This makes it possible to show the whole table rather than just a fragment, thereby making the discussion clearer.
To further simplify the presentation, note that the table is symmetrical on either side of the diagonal of zero values. This is because the distance between any pair of vectors is the same in either direction: the distance between vector 2 and vector 3 is the same as that between vector 3 and vector 2. Since the upper-right triangle simply duplicates the lower-left triangle, one of the two can be deleted without losing any information; the upper-right one is deleted.
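The symmetry argument can be checked with a short sketch. The fifteen distances below are the ones quoted in the worked example in this lecture; the names `PAIRS`, `full_table` and `lower_triangle` are illustrative and not part of the module materials.

```python
# Pairwise distances for the six vectors, collected from the values quoted
# in the worked example in this lecture (two decimal places).
PAIRS = {
    (1, 2): 2.83, (1, 3): 5.00, (1, 4): 7.07, (1, 5): 10.63, (1, 6): 46.40,
    (2, 3): 2.24, (2, 4): 4.24, (2, 5): 7.81, (2, 6): 46.87,
    (3, 4): 2.25, (3, 5): 5.66, (3, 6): 48.02,
    (4, 5): 3.61, (4, 6): 47.89,
    (5, 6): 49.66,
}


def full_table(pairs, n=6):
    """Build the full symmetric n x n table with a diagonal of zeros."""
    table = [[0.0] * n for _ in range(n)]
    for (i, j), d in pairs.items():
        table[i - 1][j - 1] = d  # vector i to vector j
        table[j - 1][i - 1] = d  # same distance in the other direction
    return table


def lower_triangle(table):
    """Keep the diagonal and the cells below it; drop the duplicates above."""
    return [row[: i + 1] for i, row in enumerate(table)]


table = full_table(PAIRS)
is_symmetric = all(table[i][j] == table[j][i] for i in range(6) for j in range(6))
triangle = lower_triangle(table)

for row in triangle:
    print("  ".join(f"{d:6.2f}" for d in row))
```

Nothing is lost by dropping the upper triangle: every distance in it can still be read from the mirrored cell below the diagonal.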
A cluster tree for the first 6 rows of the original data matrix will now be constructed step by step, showing how the distance table is used to do this. The procedure is based on the principle that a set of vectors has a cluster structure if it can be divided into two or more groups in which the members of any given group are close to one another in the data space, and far from members of other clusters in the space. At each step in tree construction, therefore, one looks for the two clusters that are closest to one another and amalgamates them into a superordinate cluster, and this continues until all the vectors have been incorporated into the tree. The following example demonstrates this.
Step 1. Initially, each vector is taken to be a cluster on its own, that is, a cluster with only one member. The distance table is now searched to find the smallest distance between clusters. This is the distance between clusters (2) and (3): 2.24. Clusters (2) and (3) are now combined into a superordinate cluster (2,3) by drawing the tree, as below, and then emending the distance table to incorporate the new cluster.
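The search for the smallest distance can be sketched in a few lines. The distances are the ones quoted in this lecture; the variable names are illustrative.

```python
# Distances between the six single-member clusters, as quoted in this lecture.
PAIRS = {
    (1, 2): 2.83, (1, 3): 5.00, (1, 4): 7.07, (1, 5): 10.63, (1, 6): 46.40,
    (2, 3): 2.24, (2, 4): 4.24, (2, 5): 7.81, (2, 6): 46.87,
    (3, 4): 2.25, (3, 5): 5.66, (3, 6): 48.02,
    (4, 5): 3.61, (4, 6): 47.89,
    (5, 6): 49.66,
}

# Scan every cell of the (lower-triangle) table and keep the smallest one.
closest_pair, smallest = min(PAIRS.items(), key=lambda item: item[1])

print(closest_pair, smallest)  # (2, 3) 2.24
```

Note how narrow the margin is: the distance between (3) and (4) is 2.25, only 0.01 more than the winning pair.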
Emendation of the distance table takes a bit of understanding, so it is described in detail. Remove the rows and columns 2 and 3 from the table, and replace them with a single blank row and column to represent the new (2,3) cluster. Note that 0 is inserted as the distance between (2,3) and itself for the self-evident reason that the distance of any object to itself is always 0.
Insert into the blank cells of the (2,3) row and column the minimum distance from (2,3) to the remaining clusters (1), (4), (5), and (6). What does this mean? Referring to the original distance table above, the distance between (2) and (1) is 2.83 and between (3) and (1) it is 5.00; the minimum here is 2.83, and it is inserted into the relevant cell.
The distance between (2) and (4) in the original distance table is 4.24 and between (3) and (4) it is 2.25; the minimum here is 2.25, and it is inserted into the relevant cell.
The distance between (2) and (5) in the original distance table is 7.81 and between (3) and (5) it is 5.66; the minimum here is 5.66, and it is inserted into the relevant cell.
The distance between (2) and (6) in the original distance table is 46.87 and between (3) and (6) it is 48.02; the minimum here is 46.87, and it is inserted into the relevant cell. Emendation of the distance table is now complete, and the result is the basis for the next step in the construction of the cluster tree. Note that the table has shrunk by one row/column. This shrinkage will continue as we proceed.
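The emendation just described can be sketched as a small function. This is a minimal illustration, assuming the table is stored as a dictionary keyed by unordered pairs of cluster labels; the function name `merge` and that storage choice are assumptions, not part of the module materials.

```python
# Distances between the six single-member clusters, as quoted in this lecture.
PAIRS = {
    (1, 2): 2.83, (1, 3): 5.00, (1, 4): 7.07, (1, 5): 10.63, (1, 6): 46.40,
    (2, 3): 2.24, (2, 4): 4.24, (2, 5): 7.81, (2, 6): 46.87,
    (3, 4): 2.25, (3, 5): 5.66, (3, 6): 48.02,
    (4, 5): 3.61, (4, 6): 47.89,
    (5, 6): 49.66,
}


def merge(dist, a, b):
    """Return a new table in which clusters a and b are replaced by (a, b).

    dist maps frozenset({x, y}) to the distance between clusters x and y.
    """
    new = (a, b)
    others = {c for pair in dist for c in pair} - {a, b}
    # Drop every cell in the rows and columns of a and b.
    out = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
    # Fill the new row and column with the smaller of the two old distances.
    for c in others:
        out[frozenset({new, c})] = min(dist[frozenset({a, c})],
                                       dist[frozenset({b, c})])
    return out


table = {frozenset(pair): d for pair, d in PAIRS.items()}
step1 = merge(table, 2, 3)

for c in (1, 4, 5, 6):
    print((2, 3), c, step1[frozenset({(2, 3), c})])
```

The emended table holds 10 distances instead of 15, which is the shrinkage by one row/column noted above.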
Step 2. The distance table created in Step 1 is searched to find the smallest distance between clusters. This is the distance between clusters (2,3) and (4): 2.25. Clusters (2,3) and (4) are now combined into a superordinate cluster ((2,3),4) by drawing the tree, as below, and then emending the distance table to incorporate the new cluster. Emendation of the distance table proceeds as in Step 1.
Remove the rows and columns (2,3) and 4 from the table, and replace them with a single blank row and column to represent the new ((2,3),4) cluster.
Insert into the blank cells of the ((2,3),4) row and column the minimum distance from ((2,3),4) to the remaining clusters (1), (5), and (6). The distance between (2,3) and (1) is 2.83 and between (4) and (1) it is 7.07; the minimum here is 2.83, and it is inserted into the relevant cell.
The distance between (2,3) and (5) is 5.66 and between (4) and (5) it is 3.61; the minimum here is 3.61, and it is inserted into the relevant cell.
The distance between (2,3) and (6) is 46.87 and between (4) and (6) it is 47.89; the minimum here is 46.87, and it is inserted into the relevant cell. Emendation of the distance table is now complete, and the result is the basis for Step 3 below. Note that the table has again shrunk by one row/column.
Step 3. The distance table created in Step 2 is searched to find the smallest distance between clusters. This is the distance between clusters ((2,3),4) and (1): 2.83. Clusters ((2,3),4) and (1) are now combined into a superordinate cluster (((2,3),4),1) by drawing the tree, as below, and then emending the distance table to incorporate the new cluster. Emendation of the distance table proceeds as in Steps 1 and 2.
Remove the rows and columns ((2,3),4) and 1 from the table, and replace them with a single blank row and column to represent the new (((2,3),4),1) cluster.
Insert into the blank cells of the (((2,3),4),1) column the minimum distance from (((2,3),4),1) to the remaining clusters (5) and (6). The distance between ((2,3),4) and (5) is 3.61 and between (1) and (5) it is 10.63; the minimum here is 3.61, and it is inserted into the relevant cell.
The distance between ((2,3),4) and (6) is 46.87 and between (1) and (6) it is 46.40; the minimum here is 46.40, and it is inserted into the relevant cell. Emendation of the distance table is now complete, and the result is the basis for Step 4 below. Note that the table has again shrunk by one row/column.
Step 4. The distance table created in Step 3 is searched to find the smallest distance between clusters. This is the distance between clusters (((2,3),4),1) and (5): 3.61. Clusters (((2,3),4),1) and (5) are now combined into a superordinate cluster ((((2,3),4),1),5) by drawing the tree and then emending the distance table to incorporate the new cluster. Emendation of the distance table proceeds as in Steps 1-3.
Remove the rows and columns (((2,3),4),1) and 5 from the table, and replace them with a single blank row and column to represent the new ((((2,3),4),1),5) cluster.
Insert into the blank cell of the ((((2,3),4),1),5) column the minimum distance from ((((2,3),4),1),5) to the remaining cluster (6). The distance between (((2,3),4),1) and (6) in Table 4 is 46.40 and between (5) and (6) it is 49.66; the minimum here is 46.40, and it is inserted into the relevant cell.
Step 5. The distance table created in Step 4 is searched to find the smallest distance between clusters. There is only one remaining value, 46.40. Clusters ((((2,3),4),1),5) and (6) are now combined into a superordinate cluster (((((2,3),4),1),5),6) by drawing the tree and then emending the distance table to incorporate the new cluster.
Remove the rows and columns ((((2,3),4),1),5) and 6 from the table, and replace them with a single blank row and column to represent the new (((((2,3),4),1),5),6) cluster. All 6 vectors have now been incorporated into the cluster tree, and tree construction stops.
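The whole procedure can be replayed as a loop. The sketch below is one possible reading of the steps above, not the module's own software: the function names `size` and `build_tree`, the dictionary storage, and the rule for ordering the two halves of a new label (larger cluster first, then lower number) are all assumptions made for illustration.

```python
# Distances between the six single-member clusters, as quoted in this lecture.
PAIRS = {
    (1, 2): 2.83, (1, 3): 5.00, (1, 4): 7.07, (1, 5): 10.63, (1, 6): 46.40,
    (2, 3): 2.24, (2, 4): 4.24, (2, 5): 7.81, (2, 6): 46.87,
    (3, 4): 2.25, (3, 5): 5.66, (3, 6): 48.02,
    (4, 5): 3.61, (4, 6): 47.89,
    (5, 6): 49.66,
}


def size(cluster):
    """Number of vectors in a cluster label (an int or a nested pair)."""
    if isinstance(cluster, int):
        return 1
    return size(cluster[0]) + size(cluster[1])


def build_tree(pairs):
    """Repeat search-and-merge until the table is empty; return the merges."""
    dist = {frozenset(pair): d for pair, d in pairs.items()}
    merges = []
    while dist:
        # Search the current table for the smallest distance.
        pair, d = min(dist.items(), key=lambda item: item[1])
        # Label convention only: larger cluster first, then lower number.
        a, b = sorted(pair, key=lambda c: (-size(c), str(c)))
        new = (a, b)
        # Emend the table: drop rows/columns a and b, add one for (a, b).
        others = {c for p in dist for c in p} - {a, b}
        emended = {p: v for p, v in dist.items() if a not in p and b not in p}
        for c in others:
            emended[frozenset({new, c})] = min(dist[frozenset({a, c})],
                                               dist[frozenset({b, c})])
        merges.append((new, d))
        dist = emended
    return merges


merges = build_tree(PAIRS)
for cluster, d in merges:
    print(f"{d:6.2f}  {cluster}")

tree = merges[-1][0]
heights = [d for _, d in merges]
```

The five merges occur at distances 2.24, 2.25, 2.83, 3.61 and 46.40, matching Steps 1 to 5. Filling each new cell with the minimum of the two old distances is the rule the clustering literature usually calls single linkage; other rules, such as taking the maximum or the average, would in general yield a different tree from the same table.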