Special Techniques
Cluster Analysis
• Classification or categorization
• Classification is mathematical and objective, while interpretation is somewhat subjective
• Goal: minimize within-group variation and maximize between-group variation
• Used for data exploration when the data structure is unknown
• Three basic clustering methods:
  • Hierarchical (n < 200)
  • K-means (n > 200)
  • 2 Step (large samples; categorical or continuous variables)
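The "minimize within, maximize between" goal can be made concrete with a sum-of-squares decomposition: for any grouping, total SS = within-group SS + between-group SS, so a better clustering moves variation out of the within term and into the between term. This is a minimal NumPy sketch assuming continuous variables; the function name `ss_decomposition` and the toy data are illustrative only.

```python
import numpy as np

def ss_decomposition(X, labels):
    """Split the total sum of squares into within-group and between-group parts."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    total = float(((X - grand_mean) ** 2).sum())
    within = between = 0.0
    for g in np.unique(labels):
        members = X[labels == g]
        centroid = members.mean(axis=0)
        # spread of cases around their own group mean
        within += float(((members - centroid) ** 2).sum())
        # spread of group means around the grand mean, weighted by group size
        between += len(members) * float(((centroid - grand_mean) ** 2).sum())
    return total, within, between

X = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
for name, labels in [("compact", [0, 0, 1, 1]), ("mixed", [0, 1, 0, 1])]:
    total, within, between = ss_decomposition(X, labels)
    print(f"{name}: total={total:.3f} within={within:.3f} between={between:.3f}")
```

The total is the same for every grouping; only the split changes, which is why the two goals are really one.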
Hierarchical
• Clusters are nested: larger clusters formed at later stages contain the smaller clusters formed at earlier stages
• Evaluate the results with a dendrogram and the agglomeration schedule
• Validate with K-means, specifying the number of clusters suggested by the hierarchical solution
• Several options for the distance measure and the clustering method
• Distance measures are available for interval or count data
  • Interval data: squared Euclidean or Euclidean distance, with between-groups linkage
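A minimal from-scratch sketch of agglomerative clustering with between-groups (average) linkage on squared Euclidean distance, printing a table in the spirit of an agglomeration schedule. The function name `average_linkage_schedule` is hypothetical, and the output layout is illustrative rather than the format of any particular statistics package.

```python
import numpy as np

def average_linkage_schedule(X, squared=True):
    """Agglomerative clustering with between-groups (average) linkage.

    Returns one (cluster_a, cluster_b, coefficient) row per stage, where the
    coefficient is the mean case-to-case distance between the merged clusters.
    """
    X = np.asarray(X, dtype=float)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean
    if not squared:
        d = np.sqrt(d)
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> member cases
    schedule = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # average of all pairwise distances between members of a and b
                coef = d[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or coef < best[2]:
                    best = (a, b, coef)
        a, b, coef = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merged cluster keeps the lower id
        schedule.append((a, b, float(coef)))
    return schedule

X = [[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]]
for stage, (a, b, coef) in enumerate(average_linkage_schedule(X), start=1):
    print(f"stage {stage}: merge {a} and {b} at {coef:.3f}")
```

A large jump in the coefficient between consecutive stages is a common cue for where to stop merging; that cluster count is what would then be handed to K-means for validation.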
K-means
• Uses Euclidean distance
• The desired number of clusters is specified in advance
• Does not require a case-by-case proximity matrix
• At each iteration, observations are assigned to the nearest cluster mean, and the cluster means are then recomputed
• Similar to ANOVA: variation is partitioned into within-cluster and between-cluster components
• Iterations stop when the cluster means are stable or when the defined iteration limit is reached
• The final decision on the number of clusters is subjective
• Examine the raw data carefully against the new cluster memberships, using several example cases
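The iteration described above is Lloyd's algorithm. Below is a minimal NumPy sketch; the random initial centers drawn from the cases (`seed`), the `max_iter=10` limit, and the `tol` threshold for "stable" means are illustrative assumptions, not defaults of any specific software. Note that it only computes distances from cases to the k centers, never a full case-by-case matrix.

```python
import numpy as np

def k_means(X, k, max_iter=10, tol=1e-6, seed=0):
    """Lloyd's K-means: assign cases to the nearest mean, then recompute the means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # assumption: start from k distinct cases chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Euclidean distance from every case to every center (n x k, not n x n)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its cases (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # cluster means are stable
            break
    # final membership with respect to the final centers
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return dist.argmin(axis=1), centers, n_iter

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers, n_iter = k_means(X, k=2)
print(labels, centers, n_iter)
```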
2 Step
• Suited to very large datasets
• Handles categorical or continuous data
• Pre-clusters are identified first and then used in a hierarchical procedure
• Randomize the case order, since the results can depend on the order of the cases
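To make the two-stage idea concrete, here is a deliberately simplified sketch for continuous data only. It is not SPSS's TwoStep procedure (which builds a cluster-feature tree and can use a log-likelihood distance); the single-pass "leader" pre-clustering, the hypothetical `threshold` parameter, and the closest-centroid merging in step 2 are assumptions chosen for brevity.

```python
import numpy as np

def two_step(X, k, threshold, seed=0):
    """Hypothetical two-step sketch for continuous data (not SPSS TwoStep).

    Step 1: one pass over the cases in random order; a case joins the nearest
    pre-cluster if that centroid is within `threshold`, else opens a new one.
    Step 2: merge pre-clusters with the closest centroids until k remain.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    sums, counts, members = [], [], []
    for i in rng.permutation(len(X)):  # randomized case order
        if sums:
            cents = np.array(sums) / np.array(counts)[:, None]
            d = np.linalg.norm(cents - X[i], axis=1)
            j = int(d.argmin())
            if d[j] <= threshold:
                sums[j] = sums[j] + X[i]
                counts[j] += 1
                members[j].append(int(i))
                continue
        sums.append(X[i].copy())
        counts.append(1)
        members.append([int(i)])
    n_pre = len(sums)  # number of pre-clusters from step 1
    while len(sums) > k:
        cents = np.array(sums) / np.array(counts)[:, None]
        d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        a, b = sorted(int(v) for v in np.unravel_index(d.argmin(), d.shape))
        sums[a] = sums[a] + sums.pop(b)
        counts[a] += counts.pop(b)
        members[a] += members.pop(b)
    labels = np.empty(len(X), dtype=int)
    for c, idx in enumerate(members):
        labels[idx] = c
    return labels, n_pre

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, n_pre = two_step(X, k=2, threshold=0.5)
print(n_pre, labels)
```

Only the small set of pre-clusters enters the hierarchical step, which is what keeps this kind of approach workable on very large files.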
Discriminant Function Analysis
• Logistic regression is now more commonly used
• Classifies cases into the values of a dichotomous (or, more generally, categorical) dependent variable
• Purposes:
  • To classify cases into groups using a discriminant prediction equation.
  • To test theory by observing whether cases are classified as predicted.
  • To investigate differences between or among groups.
  • To determine the most parsimonious way to distinguish among groups.
  • To determine the percent of variance in the dependent variable explained by the independent variables.
  • To assess the relative importance of the independent variables in classifying the dependent variable.
  • To discard variables that are weakly related to group distinctions.
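For two groups, a discriminant prediction equation can be sketched with Fisher's linear discriminant: weights w = inverse(S) (m1 - m0), where S is the pooled within-group covariance and m0, m1 are the group means, with a cutoff midway between the projected group means (equal priors assumed). The function names and toy data are illustrative; the share of cases classified correctly is one simple check on whether cases fall where the theory predicts.

```python
import numpy as np

def fisher_lda(X, y):
    """Two-group Fisher linear discriminant.

    Returns weights w and cutoff c; a case x is assigned to group 1 when
    w @ x > c, otherwise to group 0 (equal priors assumed).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-group covariance matrix
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)  # discriminant function coefficients
    c = w @ (m0 + m1) / 2            # midpoint of the projected group means
    return w, c

def classify(X, w, c):
    """Apply the discriminant equation and return predicted group labels."""
    return (np.asarray(X, dtype=float) @ w > c).astype(int)

X = [[1, 2], [2, 3], [3, 3], [2, 1], [6, 5], [7, 7], [8, 6], [7, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, c = fisher_lda(X, y)
hit_rate = (classify(X, w, c) == np.asarray(y)).mean()
print(w, c, hit_rate)
```

The relative size of the (standardized) weights is one rough way to compare how much each independent variable contributes to separating the groups.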
Time Series Analysis
• Differs from other methods in that observations are taken at equally spaced time intervals on the x-axis
• Objectives:
  • Identify the distribution pattern of the variable over time
  • Pattern vs. noise (error)
  • Trend vs. seasonality
  • Trend analysis and autocorrelation
  • Forecast future values of the variable
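Two of these objectives, trend analysis and autocorrelation, can be sketched in a few lines of NumPy. The sample autocorrelation at lag k compares the series with a copy of itself shifted k periods, and a least-squares straight line gives a crude trend forecast. The quarterly-looking series is made-up toy data, and a straight-line extrapolation ignores seasonality, so treat this as an illustration rather than a forecasting method.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of an equally spaced series at the given lag (>= 1)."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d[:-lag] * d[lag:]).sum() / (d ** 2).sum())

def linear_trend_forecast(x, steps):
    """Fit x_t = a + b*t by least squares and extend the line `steps` periods ahead."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    b, a = np.polyfit(t, x, deg=1)  # polyfit returns the slope first
    future_t = np.arange(len(x), len(x) + steps)
    return a + b * future_t, float(b)

series = [10, 14, 12, 8, 12, 16, 14, 10, 14, 18, 16, 12]
print(autocorrelation(series, 4), autocorrelation(series, 2))
forecast, slope = linear_trend_forecast(series, steps=4)
print(forecast, slope)
```

Here the lag-4 autocorrelation is positive and the lag-2 value is negative, the usual signature of a repeating four-period (seasonal) pattern sitting on top of an upward trend.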