The data set was collected from several news websites in November and December 2015.
It contains two subsets: English and Finnish articles. The ground-truth keywords are the
terms chosen by the article editors.
English subset
Sources:
Indianexpress (330), Macworld (220), The Guardian (421), University Herald (300).
Topics:
Business, cities, entertainment, news, politics, art & culture,
sports, health & lifestyle, trending, world, technology, education, environment,
media, finance, travel, and others.
Finnish subset
Sources:
Kaksplus (200), Kotiliesi (210), Ruoka.fi (200), Taloussanomat (210), Urheilu (200), Uusi Suomi (200).
Topics:
Business, cities, entertainment, news, politics, art & culture,
sports, health & lifestyle, trending, world, technology, education, environment,
media, finance, travel, and others.
Disclaimer:
The data may contain copyrighted material and should be used only for scientific research.