### Clustering Methods (5 cp) 3621552

#### Course description

Clustering is a basic tool used in data analysis, pattern recognition and
data mining for finding groups in data. The main challenge of clustering
is to define a cost function that is then optimized by an algorithm. We
consider several cost functions and algorithms for the problem, and study how
to determine the number of clusters. Numerical, categorical, text and graph
data are considered. Practical clustering methods also need to handle
outliers and missing data.
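
As a rough illustration of the "cost function plus optimizing algorithm" idea, the sketch below uses the sum of squared errors (SSE) as the cost function and plain k-means as the optimizer. This pairing is chosen only because it is widely known; it is not necessarily the formulation used in the lectures. The function names (`sse`, `assign`, `update`, `kmeans`) and the parameters `iterations` and `seed` are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids, labels):
    """Cost function: sum of squared distances from points to their centroids."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def assign(points, centroids):
    """Partition step: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Centroid step: move each centroid to the mean of its assigned points.

    A centroid with no assigned points is left where it was.
    """
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            mean = tuple(sum(col) / len(members) for col in zip(*members))
            new_centroids.append(mean)
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, k, iterations=100, seed=0):
    """Alternate the two steps until the partition stops changing.

    Initial centroids are k distinct data points picked at random.
    Returns (centroids, labels, cost).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = assign(points, centroids)
    for _ in range(iterations):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels, sse(points, centroids, labels)
```

Calling `kmeans(points, k)` on a list of equal-length numeric tuples returns the centroids, the cluster label of each point, and the final SSE. Neither step can increase the SSE, so the loop reaches a local optimum, which is not guaranteed to be the global one.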

The course will be arranged as a series of video lectures. The recordings will
be made publicly during the scheduled lecture times. The videos are stored on
YouTube and participants can watch them at any time. Questions and discussion
will take place immediately after each recording and during the exercise sessions.

#### Intro

#### Lectures

Teacher:
Pasi Fränti

Schedule: 28 h, starting from **24.1.2017**

Tuesday 14-16 (D106 / F213)

Lectures on YouTube

Lecture 1: 24.1. Introduction (ppt)

Lecture 2: 14.3. Objective functions (ppt)

Lecture 3: 21.3. Clustering text and web pages (ppt)

Lecture 4: 25.4. Fast nearest neighbor searches in high dimensions (ppt) (pdf)

Lecture 5: 10.5. Number of clusters (ppt) (pdf)

Lecture 6: 16.5. Outliers (ppt) (pdf)

Lecture 7: 23.5. Divisive algorithms (ppt) (pdf)

Video lecture 1:
Random Swap
(ppt)
(pdf)

Video lecture 2:
Centroid Index
(ppt)
(pdf)

Video lecture 3:
K-means properties
(ppt)
(pdf)

Video lecture 4:
Fast K-means
(ppt)
(pdf)

Video lecture 5:
Agglomerative clustering (to appear)

#### Exercises

Mondays 14-16 (D106 / F213):

Exercise 1: 30.1.

Exercise 2: 20.2.

Exercise 3: 20.3.

Exercise 4: 27.3.

Exercise 5: 10.4.
Tasks selected

Exercise 6: 8.5.

Exercise 7: 22.5.

Lecture notes and material from 2014

Supplementary material from 2012

#### Preliminary knowledge

Design & Analysis of Algorithms

#### Exams

24.5. 12-16, Room OTS 100 (Joensuu), Room F 211 (Kuopio)

16.6. 12-16, Room OTS 100 (Joensuu), Room CA 101 (Kuopio)

#### Links

Clusterator

Animator

Visualization software