Citation: Guo Renzhong, Zhang Kequan. Analysis and Comparison of the Techniques for Treating Correlativity Between Variables in Q-mode Cluster Analysis[J]. Geomatics and Information Science of Wuhan University, 1987, 12(3): 64-78.

Analysis and Comparison of the Techniques for Treating Correlativity Between Variables in Q-mode Cluster Analysis

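  • Summary: This paper analyzes the principles of the oblique distance method, the principal component analysis method, and the Mahalanobis distance method for handling correlation among the original variables. It discusses the characteristics of, and the equivalence relations among, several data-processing methods for the similarity statistics used in Q-mode cluster analysis, and verifies the theoretical derivations with a practical numerical example.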

     

    Abstract: The Euclidean distance, the statistic most often used in Q-mode cluster analysis, cannot reflect the influence of correlation between variables on the clustering results, so it cannot fully reveal how the samples cluster. Three techniques have been proposed in the literature to address this problem: (1) the oblique distance method, (2) the principal component analysis method, and (3) the Mahalanobis distance method. However, no paper has examined the characteristics of these methods or the relationships between them, and in practice they are chosen more or less arbitrarily. This paper analyses and compares the three methods, both in theory and in practice, and reaches the following conclusions. 1. The oblique distance method is equivalent to the principal component analysis method, because $\sum_j \sum_l (X_{ij}-X_{kj})(X_{il}-X_{kl})\, r_{jl} = d'U\Lambda^{1/2}\Lambda^{1/2}U'd$, where the left-hand side is the expression for computing the oblique distance and the right-hand side is the Euclidean distance obtained from the principal component analysis method. 2. The Mahalanobis distance is equal to the Euclidean distance computed from factor scores, because, after data standardization, $(P_i-P_k)'S^{-1}(P_i-P_k) = (P_i-P_k)'R^{-1}(P_i-P_k) = (P_i-P_k)'U\Lambda^{-1/2}\Lambda^{-1/2}U'(P_i-P_k)$, where the first expression is the Mahalanobis distance and the last is the Euclidean distance computed from factor scores. 3. The Mahalanobis distance and the oblique distance treat correlation in two opposite ways. Generally speaking, the Mahalanobis distance gives correlated variables smaller weights and the oblique distance gives them larger weights, while the Euclidean distance weights all variables equally. If two samples lie along the major axis of the distribution ellipse, the Mahalanobis distance between them is smaller than the Euclidean distance, while the oblique distance is larger; if they lie along the minor axis, the opposite holds. These three conclusions provide guidance for choosing clustering statistics in practical work, help avoid arbitrary choices, and aid in interpreting the computational results of cluster analysis.
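
    The two identities and the opposite-weighting claim stated in the abstract can be checked numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: the data are synthetic, the function names (`standardize`, `squared_distances`, `axis_distances`) are hypothetical, and the oblique distance is taken as the unnormalized quadratic form d'Rd, as the expression is written in the abstract (a definition with an extra normalizing factor would rescale it). The "principal component" scores are read from the abstract's formula as Λ^{1/2}U'z and the factor scores as Λ^{-1/2}U'z.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def squared_distances(Z, i, k):
    """Squared distances between samples i and k of standardized data Z."""
    R = np.corrcoef(Z, rowvar=False)   # for standardized data, S equals R
    lam, U = np.linalg.eigh(R)         # R = U diag(lam) U', lam ascending
    d = Z[i] - Z[k]

    comp = (Z @ U) * np.sqrt(lam)      # rows: Lambda^{1/2} U' z (assumed PCA scores)
    fact = (Z @ U) / np.sqrt(lam)      # rows: Lambda^{-1/2} U' z (factor scores)

    return {
        "euclidean": d @ d,
        "oblique": d @ R @ d,
        "pca": np.sum((comp[i] - comp[k]) ** 2),
        "mahalanobis": d @ np.linalg.inv(np.cov(Z, rowvar=False)) @ d,
        "factor": np.sum((fact[i] - fact[k]) ** 2),
    }


def axis_distances(R, axis):
    """Squared (Mahalanobis, Euclidean, oblique) length of a unit step
    along the eigenvector of R selected by `axis`."""
    lam, U = np.linalg.eigh(R)
    u = U[:, axis]
    return u @ np.linalg.inv(R) @ u, u @ u, u @ R @ u


rng = np.random.default_rng(0)
# Hypothetical data: 200 samples of 3 correlated variables (not from the paper).
mix = np.array([[1.0, 0.8, 0.3],
                [0.0, 0.6, 0.5],
                [0.0, 0.0, 0.8]])
Z = standardize(rng.standard_normal((200, 3)) @ mix)

dist = squared_distances(Z, 0, 1)
R = np.corrcoef(Z, rowvar=False)
major = axis_distances(R, -1)   # eigenvector with the largest eigenvalue
minor = axis_distances(R, 0)    # eigenvector with the smallest eigenvalue

print(dist)
print("major axis (Mahalanobis, Euclidean, oblique):", major)
print("minor axis (Mahalanobis, Euclidean, oblique):", minor)
```

    Under these assumptions, `dist["oblique"]` should agree with `dist["pca"]` and `dist["mahalanobis"]` with `dist["factor"]` up to floating-point error. Along the major axis the three values should increase from Mahalanobis to Euclidean to oblique, and along the minor axis the order should reverse, since a unit step along an eigenvector with eigenvalue λ has squared lengths 1/λ, 1, and λ respectively.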

     
