N. Gali, R. Mariescu-Istodor and P. Fränti, "Using linguistic features to automatically extract web page title", Expert Systems with Applications, 79, 296-312, 2017
More information on the dataset is available here (including a PDF of the paper above).
These are only the names of the sites. The archive above contains the HTML files.