Graph clustering material

This page contains data and code used in the following paper:

S. Sieranoja and P. Fränti, "Adapting k-means for graph clustering", *Knowledge and Information Systems* (KAIS), 4:1-28, January 2022.

## Data

Graph datasets: gclu_data.zip

Directories in the zip file:
- kNN:
  - kNN graphs of numerical datasets. For each dataset DS, there is (1) the original numerical data: DS.txt, (2) the kNN graph: DS_knn30.txt, and (3) the ground truth: DS_knn30-gt.pa. Weights in the network are distances (a smaller value means the nodes are closer).
- varDeg:
  - Artificial graphs, varying average degree
- varMu:
  - Artificial graphs, varying mixing parameter mu (cluster overlap)
- varN:
  - Artificial graphs, varying number of nodes
- icd10:
  - Disease co-occurrence networks

For each dataset DS in the folders varDeg, varMu, varN and icd10, the package includes (1) the graph file: DS.txt and (2) the ground truth: DS-gt.pa. Weights in the network are similarities (a larger value means the nodes are closer).
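The naming conventions above can be summarized in a small helper. This is only a sketch: the function name `expected_files` and the returned dictionary keys are hypothetical, and it assumes the directories sit at the top level of the extracted archive, which the page does not state.

```python
from pathlib import PurePosixPath

# Directory names as listed on this page.
KNN_DIR = "kNN"
SIMILARITY_DIRS = {"varDeg", "varMu", "varN", "icd10"}


def expected_files(directory: str, dataset: str) -> dict:
    """Return the file names expected for dataset `dataset` in `directory`.

    Hypothetical helper: it only encodes the naming scheme described above
    and does not check that the files actually exist.
    """
    base = PurePosixPath(directory)
    if directory == KNN_DIR:
        return {
            "data": str(base / f"{dataset}.txt"),            # original numerical data
            "graph": str(base / f"{dataset}_knn30.txt"),     # kNN graph
            "ground_truth": str(base / f"{dataset}_knn30-gt.pa"),
            "weights": "distance",                           # smaller = closer
        }
    if directory in SIMILARITY_DIRS:
        return {
            "graph": str(base / f"{dataset}.txt"),
            "ground_truth": str(base / f"{dataset}-gt.pa"),
            "weights": "similarity",                         # larger = closer
        }
    raise ValueError(f"unknown directory: {directory!r}")
```

Keeping the weight type next to the file names is a reminder that kNN graphs store distances while the other directories store similarities, so the two kinds of graph must not be mixed without conversion.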

The data format is documented at: github.com/uef-machine-learning/gclu

File name format for artificial graphs: cN{NODES}m{mu}n{neighbors}. For example, the dataset cN5km65n44 has 5000 nodes, a mixing parameter of mu=65%, and on average 44 neighbors per node.
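As an illustration, the file name pattern can be decoded with a regular expression. This is a minimal sketch, not part of the gclu code: the function name `parse_graph_name` is hypothetical, and reading a trailing `k` in the node count as "times 1000" is an assumption inferred from the single example above.

```python
import re

# cN{NODES}m{mu}n{neighbors}; the optional "k" suffix on the node count is
# assumed to mean thousands, based on the example cN5km65n44 (5000 nodes).
_NAME_RE = re.compile(
    r"^cN(?P<nodes>\d+)(?P<kilo>k?)m(?P<mu>\d+)n(?P<neighbors>\d+)$"
)


def parse_graph_name(name: str) -> dict:
    """Decode an artificial-graph dataset name into its parameters."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an artificial-graph name: {name!r}")
    nodes = int(match["nodes"]) * (1000 if match["kilo"] else 1)
    return {
        "nodes": nodes,
        "mu_percent": int(match["mu"]),           # mixing parameter in percent
        "avg_neighbors": int(match["neighbors"]),
    }


if __name__ == "__main__":
    print(parse_graph_name("cN5km65n44"))
```

The name is passed without its file extension; strip `.txt` or `-gt.pa` first if starting from a file name.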

## Code

Code: github.com/uef-machine-learning/gclu