Graph clustering material


This page contains data and code used in the following paper:
S. Sieranoja and P. Fränti, "Adapting k-means for graph clustering", Knowledge and Information Systems (KAIS), 4:1-28, January 2022. (pdf)

Data

Graph datasets: gclu_data.zip

Directories in the zip-file:
kNN:
kNN graphs of numerical datasets. For each dataset DS, the package includes (1) original numerical data: DS.txt, (2) kNN graph: DS_knn30.txt, (3) ground truth: DS_knn30-gt.pa. Weights in the network are distances (a smaller value means the nodes are closer).
varDeg:
Artificial graphs, varying average degree
varMu:
Artificial graphs, varying mixing parameter mu (cluster overlap)
varN:
Artificial graphs, varying number of nodes
icd10:
Disease co-occurrence networks
For each dataset DS in folders varDeg, varMu, varN and icd10, the package includes (1) graph file: DS.txt, (2) ground truth: DS-gt.pa. Weights in the network are similarities (a larger value means the nodes are closer).
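
Because the kNN graphs store distances while the other directories store similarities, code that compares edge weights across directories has to know which convention applies. The following is a minimal Python sketch of that bookkeeping. The names WEIGHT_TYPE and is_closer are hypothetical helpers, not part of the gclu code; only the directory names and their weight conventions come from the descriptions above.

```python
# Weight convention per directory, as described on this page.
# WEIGHT_TYPE and is_closer are illustrative names, not part of gclu.
WEIGHT_TYPE = {
    "kNN": "distance",       # smaller weight = closer nodes
    "varDeg": "similarity",  # larger weight = closer nodes
    "varMu": "similarity",
    "varN": "similarity",
    "icd10": "similarity",
}


def is_closer(directory, w1, w2):
    """Return True if weight w1 indicates a closer node pair than w2."""
    kind = WEIGHT_TYPE[directory]  # raises KeyError for unknown directories
    if kind == "distance":
        return w1 < w2
    return w1 > w2


print(is_closer("kNN", 0.1, 0.5))    # distance: 0.1 is closer than 0.5
print(is_closer("icd10", 0.9, 0.2))  # similarity: 0.9 is closer than 0.2
```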

Data format is documented in: github.com/uef-machine-learning/gclu

File name format for artificial graphs: cN{NODES}m{mu}n{neighbors}. For example, a dataset cN5km65n44 has 5000 nodes, mixing parameter mu=65% and on average 44 neighbors for each node.
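
The naming scheme above can be decoded programmatically. The following is a minimal Python sketch, assuming that a "k" after the node count means thousands and that mu is given as a whole percentage. Both assumptions are inferred only from the single example cN5km65n44, so other file names in the package may need a looser pattern. The function name parse_graph_name is hypothetical.

```python
import re

# Pattern inferred from the documented example "cN5km65n44" only:
# cN<nodes>[k]m<mu percent>n<average neighbors>
_NAME_RE = re.compile(r"^cN(\d+)(k?)m(\d+)n(\d+)$")


def parse_graph_name(name):
    """Split an artificial-graph dataset name into its parameters."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized dataset name: {name!r}")
    nodes = int(match.group(1))
    if match.group(2):  # assumed: trailing "k" multiplies by 1000
        nodes *= 1000
    return {
        "nodes": nodes,
        "mu": int(match.group(3)) / 100,  # percentage -> fraction
        "neighbors": int(match.group(4)),
    }


print(parse_graph_name("cN5km65n44"))
# {'nodes': 5000, 'mu': 0.65, 'neighbors': 44}
```

Names that do not match the assumed pattern raise a ValueError rather than being guessed at.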

Code

Code: github.com/uef-machine-learning/gclu