The following is a translation of the coverage of the NTEU Action published by the Spanish government’s National Language Technology Plan on its LT website.
“NTEU, the new Automatic Translation project that addresses the challenge of translating between all European languages”
With the kick-off meeting held on August 26 and 27 in Valencia, a new project financed by the “Connecting Europe Facility” mechanism got under way. The SEAD participates in it as part of the Language Technology (LT) Plan’s strategy to implement Automatic Translation in Public Administration.
This is the Neural Translation for the EU (NTEU) project, which will receive around two million euros to develop more than 500 different machine translation engines within two years. The aim is to translate automatically between all the official languages of the European Union.
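The figure of more than 500 engines is consistent with counting translation directions. The following is a minimal sketch, assuming the 24 official EU languages and one engine per ordered (source, target) pair; the language list and the one-engine-per-direction design are assumptions for illustration, not details stated in the announcement.

```python
from itertools import permutations

# ISO 639-1 codes for the 24 official EU languages (assumed list,
# not given in the announcement).
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def directed_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


pairs = directed_pairs(EU_LANGUAGES)
# n languages give n * (n - 1) translation directions.
print(len(EU_LANGUAGES), "languages ->", len(pairs), "directed pairs")
```

Under these assumptions, 24 languages give 24 × 23 = 552 directions, which matches the announced order of magnitude.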
The machine translation engines will be built using cutting-edge artificial intelligence techniques, including training deep neural networks on large bilingual data sets. Each engine will need a training set of at least 15 million segments.
The three companies in charge of the development are Pangeanic (Spain), KantanMT (Republic of Ireland) and Tilde (Latvia). The General Technical Office of the Spanish LT Plan, which has already collaborated with them on previous projects, will coordinate the evaluation of the results, which European universities will subsequently validate.
The European Commission’s interest in this project lies in expanding the coverage of the eTranslation system, promoted by the Commission itself, which currently translates only to and from English. Translation technologies are a key tool in Europe’s strategy of creating a digital single market free of language barriers.
The NTEU project has the ambitious goal of building direct translation engines between all European languages, without the need to go through a third language (usually English) that acts as a pivot.
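The difference between pivot and direct translation can be sketched as a routing question: which engines does a sentence pass through? The functions below are hypothetical and only illustrate the idea; the figure of 24 languages is an assumption (the number of official EU languages), and real systems are not necessarily organized this way.

```python
def route(src, tgt, pivot=None):
    """Return the (source, target) engine hops needed to translate src -> tgt."""
    if src == tgt:
        raise ValueError("source and target languages must differ")
    # A direct engine is used when no pivot is given, or when the pivot
    # language is itself one end of the pair.
    if pivot is None or pivot in (src, tgt):
        return [(src, tgt)]
    # Otherwise the text is translated twice: into the pivot, then out of it.
    return [(src, pivot), (pivot, tgt)]


def engines_needed(n_languages, use_pivot):
    """Count engines required to cover every direction among n languages."""
    if use_pivot:
        # One engine into the pivot and one out of it per non-pivot language.
        return 2 * (n_languages - 1)
    # One direct engine per ordered pair of distinct languages.
    return n_languages * (n_languages - 1)


print(route("es", "lv", pivot="en"))  # two hops through English
print(route("es", "lv"))              # one direct hop
print(engines_needed(24, use_pivot=True), engines_needed(24, use_pivot=False))
```

A pivot setup needs far fewer engines, but every non-English pair is translated twice, so errors from the first hop carry into the second. Direct engines avoid that at the cost of many more models to train.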
Because this technology depends heavily on data, the main challenge will be to obtain training corpora, both bilingual and monolingual, of sufficient quality and quantity to train the different engines. For the language pairs with the least initial data, the plan is to fill the gap with automatic text generation techniques based on next-generation multilayer neural networks.