Titler Data Set
N. Gali, R. Mariescu-Istodor and P. Fränti, "Using linguistic features to automatically extract web page title", Expert Systems with Applications, 79, 296-312, 2017. [pdf]

The data set was collected from Google Maps search results.

Collection details:
Websites: 1002
Language: English
When: July 2014 and April 2015
What: Pages retrieved from search engine results
How: By entering search queries: hospital, pharmacy, sport, fitness, swimming pool, bowling alley, spa, sauna, cinema, pub, bar, auto repair, hotel, restaurant, cafe, and pizza
Region: US, UK, Canada, Australia, New Zealand, Ireland
Issues: Some level of subjectivity in title tagging is unavoidable


[Dataset (archived .tar.gz)]

[Text files]

[Ground truth for titles]


Mopsi data
Speech and Image Processing Unit


Disclaimer: The data might contain copyrighted material and should be used only for scientific research.