SEL3053: Analyzing Geordie
Lecture 14. Hierarchical cluster analysis of the DECTE data

This lecture cluster analyzes M, the data matrix for the twelve selected TLS speakers created in an earlier lecture.
The preceding two lectures have explained how hierarchical cluster analysis works. In this one we'll apply it to the data matrix M for the twelve selected TLS speakers. One final observation needs to be made before doing so, however, and it's this: hierarchical cluster analysis is not a single method but a family of methods. Each member of the family uses a different definition of what constitutes a cluster, and commensurately constructs cluster trees differently; often, the solutions they come up with are also different. This raises a self-evident question: where there are different cluster solutions for a given data matrix, which one is right? That's a question we'll address in due course.
The figure below shows the three definitions of what constitutes a cluster that we'll be using.
• Single link defines the distance between two clusters as the distance between their two closest objects, one from each cluster.
• Complete link defines the distance between two clusters as the distance between their two furthest objects, one from each cluster.
• Average link defines the distance between two clusters as the average of the distances between every object in one cluster and every object in the other.
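As a minimal sketch of these three definitions, the Python below computes each distance for two toy clusters of two-dimensional points. The points and the function names are illustrative assumptions, not part of the DECTE data, and average link is taken here in its usual sense as the mean of all between-cluster pairwise distances.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def pairwise_distances(cluster_a, cluster_b):
    """All distances between one object in cluster_a and one in cluster_b."""
    return [dist(a, b) for a, b in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    """Distance between the two closest objects."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    """Distance between the two furthest objects."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    """Mean of all between-cluster pairwise distances."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical toy clusters, not taken from M.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

print("single:  ", single_link(cluster_a, cluster_b))
print("complete:", complete_link(cluster_a, cluster_b))
print("average: ", average_link(cluster_a, cluster_b))
```

For any pair of clusters the single-link distance is the smallest of the three and the complete-link distance the largest, with average link in between.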
The figure below shows the cluster trees for M for each of these varieties of hierarchical analysis.

The three trees look similar, but there are differences of detail when closely examined. The significance of these differences will be discussed in the next lecture, when the above results are analyzed. The remainder of the present lecture will look at how to do a cluster analysis using software available via the University's Common Desktop, as described in the Materials section of this module website.

The SPSS statistical software used for cluster analysis is not downloaded or otherwise obtained like the data creation software, but used directly from the University Desktop. SPSS is a long-established and very widely used statistical analysis system. It's very extensive and easy to get lost in. The following instructions chart a path through the system for the purposes of this module. Deviate at your own risk.
Starting SPSS
1. If there's no icon that looks like this on your desktop, then type the following URL into your browser: http://ras.ncl.ac.uk
2. Log in using your usual ID and password. If you don't have the RAS software installed, click the relevant link below the login box and follow the instructions.
3. Select 'Statistical Software'.
4. Select 'SPSS Statistics 19'.
5. The following may eventually appear. Cancel it.

At this point SPSS is waiting for you to do something.
For the purposes of this lecture, that 'something' is cluster analysis. Using SPSS for cluster analysis is a two-step process: the data matrix first needs to be loaded, and this is then cluster analyzed. The aim is to cluster analyze M, the data matrix abstracted from the DECTE / TLS phonetic transcriptions and transformed so as (i) to eliminate the effect of variation in interview length, and (ii) to reduce its dimensionality. To load M into SPSS, proceed as follows.
1. Put a copy of 'M.txt' into H:/My Data Sources. This is important; failure to do so will prevent SPSS seeing the matrix.
2. In the SPSS menu, select 'File' > 'Open' as below. Click 'Data', set 'Files of type' to 'All files', go to the 'H:/My Data Sources' directory, and select M.txt.
3. There now comes a series of popup windows:
3.1 Click 'Next'
3.2 Click 'Next'
3.3 Click 'Next'
3.4 Click 'Next'
3.5 Click 'Next'
3.6 Click 'Finish'

The screen should now look like this:
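Outside SPSS, the same loading step can be sketched in a few lines of Python. This is only an illustration: it assumes that M.txt is a whitespace-delimited text file with the speaker label in the first field of each line (the v1 column) and numeric values in the rest, which the lecture does not state. The sample contents and the function name `load_matrix` are invented for the example.

```python
import io

# Hypothetical stand-in for the contents of M.txt; the values are invented.
SAMPLE = """\
nectetlsg01.txt 0.12 0.40 0.05
nectetlsg02.txt 0.10 0.38 0.07
nectetlsg03.txt 0.31 0.02 0.22
"""


def load_matrix(handle):
    """Return (labels, rows) from a whitespace-delimited text matrix."""
    labels, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        labels.append(fields[0])  # first field: speaker label (v1)
        rows.append([float(x) for x in fields[1:]])  # remaining fields: variables
    return labels, rows


labels, rows = load_matrix(io.StringIO(SAMPLE))
print(labels)
print(len(rows), "speakers,", len(rows[0]), "variables")
```

To read a real file under the same layout assumption, pass an open file instead of the in-memory sample, for example `with open("M.txt") as handle: labels, rows = load_matrix(handle)`.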
Clustering
1. Click 'Analyze' > 'Classify' > 'Hierarchical Cluster'.
2. This popup appears. Click on v2.
3. Use the slider to move down to the end of the list, hold down the shift key (up arrow at the left of the keyboard), and click on the last variable.
4. Click on the right-facing arrow. This moves the variables into the right-hand text box.
5. Click on v1 and then on the right-arrow at 'Label cases by'. v1 contains the speaker labels 'nectetlsg01.txt' and so on.
6. Unclick 'Statistics' near the bottom of the popup.
7. Click 'Plots', then 'Dendrogram' and 'Horizontal', then 'Continue'.
8. Click 'Method'.
9. At 'Cluster Method' select the clustering method you want to use: 'Nearest neighbor' = single link, 'Furthest neighbor' = complete link, and 'Centroid clustering' = average link. Strictly speaking, 'Centroid clustering' measures the distance between the cluster centroids (the mean vectors of the clusters); the average of the pairwise distances between two clusters is what SPSS lists as 'Between-groups linkage'. Also, at 'Interval', select 'Euclidean distance'. Then click 'Continue'.
10. Click 'OK'.
11. The result of clicking OK is a new window containing output. Scroll to the bottom, ignoring everything else. That's where you'll find the cluster tree.
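To see what the clustering step is doing in outline, here is a naive Python sketch of agglomerative clustering with Euclidean distance and a choice of single, complete, or average link. It is not SPSS's implementation, and the four one-variable "speakers" and the names `agglomerate` and `cluster_distance` are hypothetical; average link is again taken as the mean of between-cluster pairwise distances.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+

# Linkage rules: how a list of between-cluster distances becomes one number.
LINKAGE = {
    "single": min,
    "complete": max,
    "average": lambda distances: sum(distances) / len(distances),
}


def cluster_distance(rows, a, b, method):
    """Distance between clusters a and b, given as tuples of row indices."""
    return LINKAGE[method]([dist(rows[i], rows[j]) for i, j in product(a, b)])


def agglomerate(rows, method="single"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(len(rows))]  # one cluster per row to start
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(rows, pair[0], pair[1], method),
        )
        height = cluster_distance(rows, a, b, method)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        history.append((a, b, height))
    return history


def names(cluster, labels):
    """Readable name for a cluster, e.g. 's1+s2'."""
    return "+".join(labels[i] for i in cluster)


# Hypothetical toy data: four speakers, one variable each.
labels = ["s1", "s2", "s3", "s4"]
rows = [[0.0], [1.0], [2.6], [5.0]]

for method in LINKAGE:
    print(method)
    for a, b, height in agglomerate(rows, method):
        print(f"  merge {names(a, labels)} and {names(b, labels)} at {height:.2f}")
```

On this toy data, single and average link attach s3 to the s1+s2 cluster before bringing in s4, whereas complete link pairs s3 with s4 first. That is a small instance of the point made earlier that different members of the family can produce different trees from the same matrix.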