Image data

Bridge (256x256), N=4096, D=16
4x4 pixel blocks: ts, txt
4x4 binarized pixel blocks: ts, txt
4x4 pixel blocks, 25% randomly sampled (for training): ts, txt
4x4 pixel blocks, 75% randomly sampled (for testing): ts, txt
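Cutting a 256x256 image into non-overlapping 4x4 blocks gives (256/4)^2 = 4096 blocks of 16 values each, which matches N=4096, D=16. The sketch below shows one way to build such vectors and a random 25%/75% split. It assumes row-major order for both the blocks and the pixels inside each block, because the ordering used in the actual files is not stated. `image_to_blocks` and `random_split` are hypothetical helper names.

```python
import numpy as np

def image_to_blocks(image: np.ndarray, b: int = 4) -> np.ndarray:
    """Split a grayscale image into non-overlapping b x b blocks.

    Returns shape (num_blocks, b*b). Blocks are taken in row-major order and
    each block is flattened row by row (an assumption, not the files' spec).
    """
    h, w = image.shape
    if h % b or w % b:
        raise ValueError("image size must be a multiple of the block size")
    # (h, w) -> (h/b, b, w/b, b) -> (h/b, w/b, b, b) -> (num_blocks, b*b)
    blocks = image.reshape(h // b, b, w // b, b).swapaxes(1, 2)
    return blocks.reshape(-1, b * b)

def random_split(vectors: np.ndarray, train_fraction: float = 0.25, seed: int = 0):
    """Randomly partition the rows into a training part and a disjoint test part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(vectors))
    n_train = int(round(train_fraction * len(vectors)))
    return vectors[order[:n_train]], vectors[order[n_train:]]

# Random stand-in for the 256x256 Bridge image (not the real data).
image = np.random.default_rng(1).integers(0, 256, size=(256, 256), dtype=np.uint8)
blocks = image_to_blocks(image)
train, test = random_split(blocks)
print(blocks.shape, train.shape, test.shape)  # (4096, 16) (1024, 16) (3072, 16)
```

The same helper applied to a 288-row by 360-column frame yields 72 x 90 = 6480 blocks, the N given for Miss America.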
House (256x256), N=34112, D=3
RGB values, quantized to 5 bits per color: ts, txt
RGB values, 8 bits per color: ts, txt
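Quantizing an 8-bit color channel (0-255) to 5 bits leaves 32 levels per color. The page does not say how the House set was quantized. The sketch below assumes plain truncation, which drops the three least significant bits and is equivalent to integer division by 8. `quantize_bits` is a hypothetical name.

```python
import numpy as np

def quantize_bits(rgb, bits: int = 5) -> np.ndarray:
    """Reduce 8-bit color values to `bits` bits by discarding low-order bits.

    Assumption: truncation (right shift); the dataset may have used another rule.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in 1..8")
    rgb = np.asarray(rgb, dtype=np.uint8)
    return rgb >> (8 - bits)

pixels = np.array([[255, 128, 7], [0, 64, 200]], dtype=np.uint8)
print(quantize_bits(pixels).tolist())  # [[31, 16, 0], [0, 8, 25]]
```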
Miss America (360x288), N=6480, D=16
4x4 pixel blocks from the difference image of frames 1 and 2: ts, txt
4x4 pixel blocks from the difference image of frames 2 and 3: ts, txt
Europe (vector), N=169308, D=2
Differential coordinates of the Europe map: ts, txt, original
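The page does not define "differential coordinates." This sketch assumes the term means each map point is stored as its offset from the previous point along the outline (delta encoding). Under that assumption, the transform is a first difference and its inverse is a cumulative sum. Both function names are hypothetical. It is also unknown whether the real files keep the first absolute point, which this sketch does.

```python
import numpy as np

def to_differential(points) -> np.ndarray:
    """Replace each point after the first by its offset from the preceding point."""
    points = np.asarray(points)
    # The first row keeps its absolute position so the path can be rebuilt.
    return np.vstack([points[:1], np.diff(points, axis=0)])

def from_differential(deltas) -> np.ndarray:
    """Invert to_differential by accumulating the offsets."""
    return np.cumsum(np.asarray(deltas), axis=0)

path = np.array([[10, 20], [12, 21], [15, 21], [15, 18]])
deltas = to_differential(path)
print(deltas.tolist())  # [[10, 20], [2, 1], [3, 0], [0, -3]]
```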
Birch-sets

Birch1, Birch2, Birch3
Synthetic 2-d data with N=100,000 vectors and M=100 clusters. Zhang et al., "BIRCH: A new data clustering algorithm and its applications", Data Mining and Knowledge Discovery, 1 (2), 141-182, 1997.
Birch1 (clusters in a regular grid structure): ts, txt
Birch2 (clusters along a sine curve): ts, txt
Birch3 (random-sized clusters in random locations): ts, txt
S-sets

S1, S2, S3, S4
Synthetic 2-d data with N=5000 vectors and M=15 Gaussian clusters with different degrees of cluster overlap. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.
S1: ts, txt
S2: ts, txt
S3: ts, txt
S4: ts, txt
Ground truth centroids and partitions: zip (s3 and s4 updated 4.2.2015)
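With ground-truth centroids and partitions available, a common way to score a clustering is the sum of squared errors (SSE) between each vector and the centroid of its cluster. This is a sketch, not a reader for the zip contents. It assumes the centroids are loaded as an (M, D) array and the partition as 0-based cluster indices per vector. The indexing used in the actual files is not stated here, so subtract 1 if the labels start at 1. The function names are hypothetical.

```python
import numpy as np

def sse(data: np.ndarray, centroids: np.ndarray, labels: np.ndarray) -> float:
    """Sum of squared Euclidean distances from each vector to its assigned centroid."""
    diffs = data - centroids[labels]
    return float(np.sum(diffs * diffs))

def nearest_centroid_labels(data: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each vector to its closest centroid (useful when only centroids are given)."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

data = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
centroids = np.array([[0.5, 0.0], [10.5, 10.0]])
labels = nearest_centroid_labels(data, centroids)
print(labels.tolist(), sse(data, centroids, labels))  # [0, 0, 1, 1] 1.0
```

Where clusters overlap heavily, as in S3 and S4, nearest-centroid labels may disagree with the ground-truth partition. The two are therefore not interchangeable.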
Unbalance

Unbalance, N=6500, M=8: ts, txt
Synthetic 2-d data with N=6500 vectors and M=8 Gaussian clusters.
A-sets

A1, N=3000, M=20
A2, N=5250, M=35
A3, N=7500, M=50
Synthetic 2-d data with a varying number of vectors (N) and clusters (M). There are 150 vectors per cluster. I. Kärkkäinen and P. Fränti, "Dynamic local search algorithm for the clustering problem", Research Report A-2002-6 (pdf)
A1: ts, txt
A2: ts, txt
A3: ts, txt
Dim-sets

Dim2
Synthetic data with Gaussian clusters in multi-dimensional space, 1351-10126 vectors in 2-15 dimensional space: ts, txt
DIM-sets (high)

dim032 (32 dimensions), dim064 (64 dimensions), dim128 (128 dimensions), dim256 (256 dimensions), dim512 (512 dimensions), dim1024 (1024 dimensions)
High-dimensional data sets with N=1024 vectors and M=16 Gaussian clusters. P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph", IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006. Ground truth centroids in cb and txt format.
Data sets in TS and TXT format, ground truth partitions in PA format:
dim032: ts, txt, pa
dim064: ts, txt, pa
dim128: ts, txt, pa
dim256: ts, txt, pa
dim512: ts, txt, pa
dim1024: ts, txt, pa
KDDCUP04Bio set

KDDCUP04Bio, N=145751, M=2000, 74-dim
KDDCUP04Bio biology dataset: ts, txt
UCI datasets

Thyroid, N=215, M=2, D=5: ts, txt
Wine, N=178, M=3, D=13: ts, txt
Yeast, N=1484, M=10, D=8: txt, ts, integer
Breast, N=699, M=2, D=9: ts, txt
Iris, N=150, M=3, D=4: ts, txt, labels
Glass, N=214, M=7, D=9: ts, txt, labels
Wdbc, N=569, M=2, D=32: ts, numeric (31-d), full (32-d)
The original source of the UCI datasets is http://archive.ics.uci.edu/ml/
Breast-Cancer-Wisconsin: we have removed features 1 (sample id) and 11 (class label). All missing values are given the value 1.
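The Breast-Cancer-Wisconsin preprocessing described above can be sketched as follows. The sketch assumes the layout of the original UCI file `breast-cancer-wisconsin.data`: comma-separated rows of 11 fields (sample id, nine features, class label), with missing values written as `?`. The sample rows are illustrative lines in that format, and `preprocess_row` and `preprocess` are hypothetical names.

```python
def preprocess_row(line: str) -> list[int]:
    """Drop field 1 (sample id) and field 11 (class label); map missing '?' to 1."""
    fields = line.strip().split(",")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    features = fields[1:10]  # fields 2..10 in 1-based numbering
    return [1 if value == "?" else int(value) for value in features]

def preprocess(lines) -> list[list[int]]:
    """Apply preprocess_row to every non-empty line."""
    return [preprocess_row(line) for line in lines if line.strip()]

sample = [
    "1000025,5,1,1,1,2,1,3,1,1,2",
    "1057013,8,4,5,1,2,?,7,3,1,4",
]
rows = preprocess(sample)
print(rows)  # [[5, 1, 1, 1, 2, 1, 3, 1, 1], [8, 4, 5, 1, 2, 1, 7, 3, 1]]
```

Each cleaned row has the nine features that give D=9 for the Breast set.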
Categorical

Census, N=1000-512000, D=68: zip
Categorical attributes from Public Use Microdata Samples (PUMS) person records. Includes subsets of size 1000, 2000, 4000, ..., 512000.
Source
g2 sets

g2 (example: g2-2-30): 1024 vectors per cluster, 2 clusters, 1-1024 dimensions, variance 10-100
Gaussian clusters dataset. g2: ts files in a zip file (53 MB)
Shape sets

Third column is the label.

Aggregation, N=788, M=7, D=2
Aggregation: txt. Gionis, A., H. Mannila, and P. Tsaparas, Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 2007. 1(1): p. 1-30.

Compound, N=399, M=6, D=2
Compound: txt. Zahn, C.T., Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, 1971. 100(1): p. 68-86.

Pathbased, N=300, M=3, D=2
Pathbased: txt. Chang, H. and D.Y. Yeung, Robust path-based spectral clustering. Pattern Recognition, 2008. 41(1): p. 191-203.

Spiral, N=312, M=3, D=2
Spiral: txt. Chang, H. and D.Y. Yeung, Robust path-based spectral clustering. Pattern Recognition, 2008. 41(1): p. 191-203.

D31, N=3100, M=31, D=2
D31: txt. Veenman, C.J., M.J.T. Reinders, and E. Backer, A maximum variance cluster algorithm. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002. 24(9): p. 1273-1280.

R15, N=600, M=15, D=2
R15: txt. Veenman, C.J., M.J.T. Reinders, and E. Backer, A maximum variance cluster algorithm. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002. 24(9): p. 1273-1280.

Jain, N=373, M=2, D=2
Jain: txt. Jain, A. and M. Law, Data clustering: A user's dilemma. Lecture Notes in Computer Science, 2005. 3776: p. 1-10.

Flame, N=240, M=2, D=2
Flame: txt. Fu, L. and E. Medico, FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC bioinformatics, 2007. 8(1): p. 3.
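Because the third column of each shape-set txt file is the label, a file can be split into its D=2 coordinates and a label vector. The sketch below assumes whitespace-separated numeric columns, since `np.loadtxt` splits on any whitespace by default; the exact delimiter in the files is an assumption. The sample rows are made up, not taken from the real files, and `load_labeled` is a hypothetical name.

```python
import io
import numpy as np

def load_labeled(source):
    """Load a shape-set style file: columns 1-2 are coordinates, column 3 is the label."""
    table = np.loadtxt(source, ndmin=2)
    features = table[:, :2]
    labels = table[:, 2].astype(int)
    return features, labels

# Made-up rows in the assumed layout (tab-separated x, y, label).
sample = io.StringIO("1.0\t2.0\t1\n1.5\t2.5\t1\n8.0\t9.0\t2\n")
X, y = load_labeled(sample)
print(X.shape, sorted(set(y.tolist())))  # (3, 2) [1, 2]
```

A path to a downloaded file can be passed to `load_labeled` instead of the `StringIO` object.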
Mopsi locations

Users' locations, N=13467, D=2
Mopsi users' locations in Finland until 2012: cb, txt

Users' locations, Joensuu, N=6014, D=2
Mopsi users' locations in Joensuu, 2012: ts, txt
Miscellaneous

t4.8k, N=8000, M=6, D=2: t4.8k.txt
t4.8k: G. Karypis, E.H. Han, V. Kumar, CHAMELEON: A hierarchical clustering algorithm using dynamic modeling, IEEE Computer, 32 (8), 68-75, 1999.

ConfLongDemo, N=164,860, M=11, D=3: txt
ConfLongDemo has eight attributes, of which only three numerical attributes are included here.
MNIST, N=10000, M=10, D=784: txt
MNIST contains handwritten digits from 10 classes, with 60,000 training patterns and 10,000 test patterns of 784 dimensions.

MiniBooNE, N=130,065, D=50: txt
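MNIST's dimension D=784 comes from flattening each 28x28 grayscale digit image into one vector (28 x 28 = 784). This minimal sketch assumes row-major flattening, which matches the original MNIST distribution, where pixels are stored row by row. The images are a random stand-in, and `flatten_images` is a hypothetical name.

```python
import numpy as np

def flatten_images(images: np.ndarray) -> np.ndarray:
    """Turn a stack of (n, 28, 28) images into an (n, 784) data matrix, row by row."""
    n, h, w = images.shape
    return images.reshape(n, h * w)

images = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
X = flatten_images(images)
print(X.shape)  # (5, 784)
# Pixel (row r, column c) lands at index r * 28 + c.
assert X[0, 3 * 28 + 7] == images[0, 3, 7]
```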