Temporally Aligned English-Russian Corpus

The corpus contains 2345 publicly available BBC news articles and their loose Russian translations, collected between 01/01/2001 and 10/05/2005 from news.bbc.co.uk and www.lenta.ru.

File names follow the format YYYY-MM-DD.N.{bbc,rus}. YYYY, MM, and DD are the year, month, and day the article was published. N is the number of the article for that day. The extension bbc or rus indicates that the article is in English or Russian, respectively. All files are encoded in UTF-16.
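As an illustration, the naming scheme can be parsed with a short Python sketch. The sketch assumes that complete.tar.gz has been extracted into a single directory. It also makes an unverified assumption: that an English article and its Russian counterpart share the same date and N. The helpers parse_name, read_article, and pair_articles are hypothetical and are not part of the distribution.

```python
import re
from datetime import date
from pathlib import Path
from typing import NamedTuple

# Matches names of the form YYYY-MM-DD.N.bbc or YYYY-MM-DD.N.rus
FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\.(\d+)\.(bbc|rus)$")


class ArticleName(NamedTuple):
    published: date  # publication date taken from YYYY-MM-DD
    index: int       # N: the article's number for that day
    lang: str        # "en" for .bbc files, "ru" for .rus files


def parse_name(name: str) -> ArticleName:
    """Split a corpus file name into its date, index, and language."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a corpus file name: {name!r}")
    y, mo, d, n, ext = m.groups()
    return ArticleName(date(int(y), int(mo), int(d)), int(n),
                       "en" if ext == "bbc" else "ru")


def read_article(path: Path) -> str:
    # All corpus files are UTF-16; Python's "utf-16" codec uses and
    # strips a byte-order mark when one is present.
    return path.read_text(encoding="utf-16")


def pair_articles(root: Path) -> dict[tuple[date, int], dict[str, Path]]:
    """Group files by (date, N).

    ASSUMPTION: an English article and its Russian translation share
    the same date and N. The README does not state this explicitly.
    """
    pairs: dict[tuple[date, int], dict[str, Path]] = {}
    for p in sorted(root.iterdir()):
        try:
            key = parse_name(p.name)
        except ValueError:
            continue  # skip files that do not follow the naming scheme
        pairs.setdefault((key.published, key.index), {})[key.lang] = p
    return pairs
```

Files whose names do not match the pattern are skipped rather than rejected, so stray files in the extracted directory do not stop the scan.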

Data: complete.tar.gz
Evaluation named-entity (NE) pairs: evalpairs.txt