What: Robust clustering on incomplete and erroneous data sets
Who: Dr. Sami Äyrämö, University of Jyväskylä
When: 13.2. at 15:15
Where: B180
Abstract:
Scalable and robust clustering algorithms are useful tools, for example, in data mining and knowledge discovery applications that often deal with large, incomplete and erroneous data sets.
Based on the well-known K-means clustering, robust clustering methods can easily be derived by replacing the sample mean with a more robust estimator (e.g., the coordinatewise or spatial median). Robust estimators are less sensitive to contaminated and outlying values than, for instance, the sample mean. On the other hand, the non-smooth nature of some robust estimates places special demands on the numerical solvers. Different formulations and techniques for solving the optimization problem underlying one particular robust estimate, the spatial median, are presented.
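To illustrate the point about robustness and non-smoothness, here is a minimal sketch that computes the spatial median (the point minimizing the sum of Euclidean distances to the data) with the classical Weiszfeld fixed-point iteration. This is only one possible solver and not necessarily one of the formulations presented in the talk; the function name `spatial_median`, the tolerance, the iteration cap, the epsilon guard and the toy data are illustrative assumptions.

```python
import math


def spatial_median(points, tol=1e-9, max_iter=1000):
    """Approximate the spatial median, argmin_m sum_i ||x_i - m||_2.

    Plain Weiszfeld iteration: m <- sum_i w_i x_i / sum_i w_i, with
    w_i = 1 / ||x_i - m||. The objective is non-smooth at the data points,
    so distances are clamped by a tiny epsilon (a naive, assumed guard).
    """
    dim = len(points[0])
    # Start from the sample mean, which outliers may have pulled away.
    m = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    eps = 1e-12
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            w = 1.0 / max(math.dist(p, m), eps)  # inverse-distance weight
            for j in range(dim):
                num[j] += w * p[j]
            den += w
        new_m = [c / den for c in num]
        if math.dist(new_m, m) < tol:  # stop when the step becomes tiny
            return new_m
        m = new_m
    return m


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]  # one gross outlier
    mean = [sum(p[j] for p in data) / len(data) for j in range(2)]
    print("sample mean:   ", mean)
    print("spatial median:", spatial_median(data))
```

On this toy data the single outlier drags the sample mean to (20.4, 20.4), while the spatial median stays inside the unit square, at about (0.79, 0.79). Note that the naive Weiszfeld iteration can stall if an iterate lands exactly on a data point that is not the minimizer; modified variants (e.g. Vardi and Zhang's) address this.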
Based on these components, a highly automated robust clustering method, one that requires only a minimal number of user-defined parameters, is presented. The method consists of a number of separately developed and tested elements, such as initialization, prototype estimation, and a missing data strategy. Furthermore, a new cluster validity index is proposed for estimating the correct number of clusters. Sample applications are also given.
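A hypothetical sketch of how prototype estimation and a missing data strategy can fit into a K-means-style loop: `k_spatial_medians` (an illustrative name) uses the spatial median as each cluster prototype and treats `None` as a missing value through an available-data rule, so distances and prototype updates use only the observed coordinates. The abstract does not describe the presented method's initialization, missing data strategy or validity index; the caller-supplied initial prototypes, the masking rule and all helper names here are assumptions, and no validity index is included.

```python
import math


def masked_dist(x, m):
    """Euclidean distance over the coordinates observed in x (None = missing)."""
    return math.sqrt(sum((xj - mj) ** 2 for xj, mj in zip(x, m) if xj is not None))


def masked_spatial_median(points, start, tol=1e-9, max_iter=500):
    """Weiszfeld-type iteration for argmin_m sum_i ||P_i (x_i - m)||,
    where P_i keeps only the coordinates observed in x_i (assumed rule)."""
    m = list(start)
    eps = 1e-12
    for _ in range(max_iter):
        num = [0.0] * len(m)
        den = [0.0] * len(m)
        for x in points:
            w = 1.0 / max(masked_dist(x, m), eps)  # inverse-distance weight
            for j, xj in enumerate(x):
                if xj is not None:  # missing values contribute nothing
                    num[j] += w * xj
                    den[j] += w
        # A coordinate observed by no member keeps its previous value.
        new_m = [num[j] / den[j] if den[j] > 0 else m[j] for j in range(len(m))]
        if math.dist(new_m, m) < tol:
            return new_m
        m = new_m
    return m


def k_spatial_medians(data, prototypes, max_rounds=100):
    """K-means-style loop with spatial-median prototypes (hypothetical sketch)."""
    prototypes = [list(p) for p in prototypes]
    labels = None
    for _ in range(max_rounds):
        # Assignment step: nearest prototype under the available-data distance.
        new_labels = [
            min(range(len(prototypes)), key=lambda k: masked_dist(x, prototypes[k]))
            for x in data
        ]
        if new_labels == labels:  # partition unchanged: stop
            break
        labels = new_labels
        # Update step: robust prototype for every non-empty cluster.
        for k in range(len(prototypes)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                prototypes[k] = masked_spatial_median(members, prototypes[k])
    return labels, prototypes


if __name__ == "__main__":
    data = [(0, 0), (0.5, None), (None, 0.2), (0.3, 0.4),
            (10, 10), (9.5, None), (10.2, 9.8), (50, 50)]  # (50, 50) is an outlier
    labels, protos = k_spatial_medians(data, [(1, 1), (8, 8)])
    print(labels, protos)
```

Because each prototype is a spatial median, the outlier at (50, 50) is assigned to the second cluster without dragging its prototype away from the neighbourhood of (10, 10); incomplete rows still take part through their observed coordinates.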