Analysis and Comparison of the Techniques for Treating Correlativity Between Variables in Q-mode Cluster Analysis
Abstract
The Euclidean distance, the statistic most often used in Q-mode cluster analysis, does not reflect the influence of correlation between variables on the clustering result, so it cannot fully reveal how the samples cluster. Three techniques have been proposed in the literature to address this problem: (1) the oblique distance method, (2) the principal component analysis method, and (3) the Mahalanobis distance method. However, no paper has examined the characteristics of these methods or the relationships between them, and in practice they are chosen more or less arbitrarily. This paper analyses and compares the three methods in both theory and practice and reaches the following conclusions. (1) The oblique distance method is equivalent to the principal component analysis method, because $\sum_{j}\sum_{l}(X_{ij}-X_{kj})(X_{il}-X_{kl})\,r_{jl} = d'U\Lambda^{1/2}\Lambda^{1/2}U'd$, where $d$ is the vector of differences between samples $i$ and $k$, and $U$ and $\Lambda$ hold the eigenvectors and eigenvalues of the correlation matrix $R=(r_{jl})$; the left-hand side is the expression for computing the oblique distance, and the right-hand side is the Euclidean distance obtained from the principal component analysis method. (2) The Mahalanobis distance equals the Euclidean distance computed from factor scores, because, after data standardization, $(P_i-P_k)'S^{-1}(P_i-P_k) = (P_i-P_k)'R^{-1}(P_i-P_k) = (P_i-P_k)'U\Lambda^{-1/2}\Lambda^{-1/2}U'(P_i-P_k)$; the first expression is the Mahalanobis distance, and the last is the Euclidean distance computed from factor scores. (3) The Mahalanobis distance and the oblique distance treat correlation in opposite ways. Generally speaking, the Mahalanobis distance gives correlated variables smaller weights and the oblique distance gives them larger weights, while the Euclidean distance weights all variables equally. If two samples lie along the major axis of the distribution ellipse, the Mahalanobis distance between them is smaller than the Euclidean distance and the oblique distance is larger; if they lie along the minor axis, the opposite holds. These three conclusions offer guidance in choosing a clustering statistic in practical work, help avoid arbitrary choices, and aid in interpreting the computational results of cluster analysis.
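The three conclusions can be checked numerically in the simplest case. The sketch below is only an illustration, not the paper's own computation: it assumes two standardized variables with correlation r, so that R = [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2). The function names and the value r = 0.8 are hypothetical, and any constant scaling factor in the oblique distance is omitted, following the expression given in the abstract. All distances are squared.

```python
import math

def euclidean_sq(d):
    """Squared Euclidean distance d'd for a difference vector d."""
    return d[0] * d[0] + d[1] * d[1]

def oblique_sq(d, r):
    """Squared oblique distance d'Rd with R = [[1, r], [r, 1]]."""
    return d[0] * d[0] + 2 * r * d[0] * d[1] + d[1] * d[1]

def mahalanobis_sq(d, r):
    """Squared Mahalanobis distance d'R^{-1}d (standardized data, so S = R)."""
    return (d[0] * d[0] - 2 * r * d[0] * d[1] + d[1] * d[1]) / (1 - r * r)

def component_scores(d, r):
    """Return (U'd, eigenvalues) using the closed-form eigenvectors of R."""
    s = math.sqrt(2)
    scores = ((d[0] + d[1]) / s, (d[0] - d[1]) / s)
    eigenvalues = (1 + r, 1 - r)
    return scores, eigenvalues

def oblique_via_components(d, r):
    """d'U Lambda^{1/2} Lambda^{1/2} U'd: component differences scaled by sqrt(lambda)."""
    scores, eigenvalues = component_scores(d, r)
    return sum(lam * z * z for z, lam in zip(scores, eigenvalues))

def mahalanobis_via_factor_scores(d, r):
    """d'U Lambda^{-1/2} Lambda^{-1/2} U'd: component differences scaled by 1/sqrt(lambda)."""
    scores, eigenvalues = component_scores(d, r)
    return sum(z * z / lam for z, lam in zip(scores, eigenvalues))

if __name__ == "__main__":
    r = 0.8  # hypothetical positive correlation; major axis is then along (1, 1)
    for label, d in (("major axis", (1.0, 1.0)), ("minor axis", (1.0, -1.0))):
        print(label,
              "Euclidean:", euclidean_sq(d),
              "oblique:", oblique_sq(d, r),
              "Mahalanobis:", mahalanobis_sq(d, r))
```

For a positive r, the printed values along the major axis should order as Mahalanobis < Euclidean < oblique, and along the minor axis the order reverses, which matches conclusion (3). The two `*_via_*` functions agree with the direct formulas, which is the content of conclusions (1) and (2) in this two-variable setting.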