Graph clustering material
This page contains data and code used in the following paper:
S. Sieranoja and P. Fränti, "Adapting k-means for graph clustering", Knowledge and Information Systems (KAIS), 4:1-28, January 2022.
(pdf)
Data
Graph datasets: gclu_data.zip
Directories in the zip-file:
- kNN:
- kNN graphs of numerical datasets. For each dataset DS, the package includes (1) the original numerical data: DS.txt, (2) the kNN graph: DS_knn30.txt, and (3) the ground truth: DS_knn30-gt.pa. Weights in the network are distances (a smaller value means the nodes are closer).
- varDeg:
- Artificial graphs, varying average degree
- varMu:
- Artificial graphs, varying mixing parameter mu (cluster overlap)
- varN:
- Artificial graphs, varying number of nodes
- icd10:
- Disease co-occurrence networks
For each dataset DS in the folders varDeg, varMu, varN and icd10, the package includes (1) the graph file: DS.txt and (2) the ground truth: DS-gt.pa. Weights in the network are similarities (a larger value means the nodes are closer).
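The naming rules above can be sketched as a small helper that builds the expected file paths for a dataset and records whether its weights are distances or similarities. This is only an illustration and is not part of the gclu code: the function name `dataset_files`, the root directory `gclu_data`, and the dataset name `DS1` are hypothetical, and the sketch assumes the files sit directly inside the listed directories after unzipping.

```python
from pathlib import Path

# Folder names as listed in the zip-file description.
KNN_DIR = "kNN"
SIMILARITY_DIRS = ("varDeg", "varMu", "varN", "icd10")


def dataset_files(root, folder, ds):
    """Return the expected file paths for dataset `ds` in `folder`.

    Hypothetical helper: assumes files are stored directly under
    root/folder, which the page does not state explicitly.
    """
    base = Path(root) / folder
    if folder == KNN_DIR:
        # kNN graphs: weights are distances (smaller means closer).
        return {
            "data": base / f"{ds}.txt",
            "graph": base / f"{ds}_knn30.txt",
            "ground_truth": base / f"{ds}_knn30-gt.pa",
            "weights": "distance",
        }
    if folder in SIMILARITY_DIRS:
        # Other folders: weights are similarities (larger means closer).
        return {
            "graph": base / f"{ds}.txt",
            "ground_truth": base / f"{ds}-gt.pa",
            "weights": "similarity",
        }
    raise ValueError(f"unknown folder: {folder!r}")


if __name__ == "__main__":
    # "gclu_data" and "DS1" are placeholder names for illustration.
    for key, value in dataset_files("gclu_data", "kNN", "DS1").items():
        print(key, value)
```

Keeping the weight type next to the paths is a reminder that kNN graphs and the other graphs use opposite conventions, which matters when the same clustering code is run on both.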
Data format is documented in: github.com/uef-machine-learning/gclu
File name format for artificial graphs: cN{NODES}m{mu}n{neighbors}. For example, the dataset cN5km65n44 has 5000 nodes, a mixing parameter of mu=65%, and an average of 44 neighbors per node.
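As an illustration of the naming scheme, the following sketch decodes such a file name into its three parameters. It is not part of the gclu code; the function name `parse_graph_name` is hypothetical. Based on the single example given, it assumes that a trailing "k" on the node count means thousands and that mu is written as an integer percentage.

```python
import re

# Matches names of the form cN{NODES}m{mu}n{neighbors}, e.g. "cN5km65n44".
# Assumption: an optional "k" after the node count means "times 1000".
_NAME_RE = re.compile(r"^cN(\d+)(k?)m(\d+)n(\d+)$")


def parse_graph_name(name):
    """Decode an artificial-graph file name (without extension)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an artificial-graph name: {name!r}")
    nodes, kilo, mu, neighbors = match.groups()
    return {
        "nodes": int(nodes) * (1000 if kilo else 1),
        "mu_percent": int(mu),            # mixing parameter in percent
        "avg_neighbors": int(neighbors),  # average neighbors per node
    }


if __name__ == "__main__":
    print(parse_graph_name("cN5km65n44"))
    # {'nodes': 5000, 'mu_percent': 65, 'avg_neighbors': 44}
```

Strip the file extension or the "-gt.pa" suffix before calling the function, since the pattern matches the dataset name only.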
Code
Code: github.com/uef-machine-learning/gclu