• Julien Ah-Pine
ADMA 2009 - 5th International Conference on Advanced Data Mining and Applications - Beijing, China, Aug 17-19, 2009
Publication by Springer-Verlag (LNCS)
In this paper, our main goal is to introduce three clustering functions based on the central tendency deviation principle. This principle simply recommends clustering two objects together provided that their similarity is above a certain threshold. However, how should this threshold be set? How can it be made data-dependent? This paper gives some insights into these issues. Interestingly, we show that such clustering functions can be defined naturally by extending to the more general case of continuous numerical data some maximal association measures that were originally designed to cluster categorical data. We also propose a clustering algorithm that approximately solves the different clustering problems introduced. This heuristic is based on local transfer operations and has linear complexity in the number of objects to be clustered. Furthermore, it doesn't require the number of clusters to be set. A secondary purpose of this paper is to present a new experimental protocol for comparing different clustering techniques. In our approach, we use four evaluation criteria and an aggregation rule for combining the ranking scores they provide. Finally, using fifteen data sets from the UCI Machine Learning Repository and this experimental protocol, we show that the different cluster analysis methods introduced can perform better than the popular k-means algorithm.
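To make the idea in the abstract concrete, the following is a minimal sketch of a threshold-based clustering heuristic driven by local transfers. It is not the paper's method: the abstract does not give the three clustering functions, so the sketch assumes the threshold is the mean off-diagonal similarity (one possible central tendency) and that the objective is the sum of deviations from that threshold over pairs placed in the same cluster. The function name `transfer_clustering` and its parameters are hypothetical. Because it works on a full similarity matrix, the sketch is quadratic in the number of objects and does not reproduce the linear complexity claimed for the paper's heuristic.

```python
import numpy as np

def transfer_clustering(S, max_iter=20):
    """Illustrative local-transfer heuristic (assumed objective, not the paper's).

    S is a symmetric (n, n) similarity matrix. Two objects "attract" each
    other when their similarity exceeds the mean off-diagonal similarity.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    # Assumed data-dependent threshold: mean similarity over distinct pairs.
    threshold = S[~np.eye(n, dtype=bool)].mean()
    G = S - threshold          # deviation from the central tendency
    np.fill_diagonal(G, 0.0)   # an object has no gain with itself
    labels = np.arange(n)      # start with every object in its own cluster
    for _ in range(max_iter):
        moved = False
        for i in range(n):
            # Gain of placing object i in each cluster label; empty labels
            # have gain 0, which corresponds to opening a new cluster.
            gains = np.bincount(labels, weights=G[i], minlength=n)
            best = int(np.argmax(gains))
            if gains[best] > gains[labels[i]] + 1e-12:
                labels[i] = best   # transfer i only if the objective improves
                moved = True
        if not moved:
            break
    # Renumber cluster labels to 0..k-1; k is found, not set in advance.
    _, labels = np.unique(labels, return_inverse=True)
    return labels

# Toy example: two groups of three objects, similar within, dissimilar across.
S_demo = np.full((6, 6), 0.1)
S_demo[:3, :3] = 0.9
S_demo[3:, 3:] = 0.9
np.fill_diagonal(S_demo, 1.0)
labels_demo = transfer_clustering(S_demo)
```

On this toy matrix the threshold falls between the within-group and between-group similarities, so the transfers separate the two groups without the number of clusters being specified.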