Why Neural Machine Translation for Public Administrations
The availability of high-quality neural machine translation across many language pairs, suitable for intensive use at different levels of public bodies, Member States and Public Administrations, is a key priority for the European Commission, particularly for under-resourced EU languages and in some priority domains.
The European eTranslation service has successfully offered limited machine translation (MT) services to public administration bodies and has successfully deployed multiple Neural and Statistical systems in many language combinations. However, the majority of European language combinations are still not available.
To address this gap, this proposal aims to develop peer-to-peer Neural Machine Translation systems in all combinations of European languages and to make these available via the eTranslation Service. This will assist in the development of a single digital market, a significant objective articulated in the CEF (Connecting Europe Facility) policy documents.
Peer-to-peer machine learning is one field of research in which Neural Machine Translation (NMT) and various hybrid techniques have proven highly successful. Peer-to-peer language translation systems, which avoid the need to pivot through a common language such as English, reduce the information loss associated with pivot-language approaches. This improves disambiguation in the trained neural network and leads to higher-quality translations.
The EU and National Public Administrations are significant users of translation services, and their translation needs are increasing. These needs are only heightened by a common Digital Market, pan-European trading and legal disputes, conflict resolutions, exchanges between Public Administrations, etc. However, because of capacity problems, no single company or public institution can tackle the creation of the 24 x 23 high-quality language combinations required for a true near-human quality neural translation engine farm. Whilst eTranslation mostly provides language combinations into and from English, the task of creating and serving high-quality MT engines covering all possible combinations for EU and national institutions remains out of scope because of capacity constraints on hardware (training) and hosting.
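To make the scale of "24 x 23" concrete, the following minimal Python sketch counts the directed language combinations. It assumes that the 24 languages are the official EU languages, written here as ISO 639-1 codes, and it models English-centric coverage as exactly the pairs into and from English. Both are illustrative assumptions and not a description of the actual eTranslation catalogue; the function names are hypothetical.

```python
from itertools import permutations

# Assumption: the "24" refers to the 24 official EU languages (ISO 639-1 codes).
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def direct_pairs(languages):
    """Every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def pivot_pairs(languages, pivot="en"):
    """Only the pairs into and from a single pivot language."""
    others = [lang for lang in languages if lang != pivot]
    return [(lang, pivot) for lang in others] + [(pivot, lang) for lang in others]


all_pairs = direct_pairs(EU_LANGUAGES)
english_pairs = pivot_pairs(EU_LANGUAGES)

print(len(all_pairs))                       # 552 = 24 x 23
print(len(english_pairs))                   # 46 = 23 into + 23 from English
print(len(all_pairs) - len(english_pairs))  # 506 pairs with no English side
```

Under these assumptions, 506 of the 552 directed combinations involve no English side, which illustrates why building and hosting a direct engine for every pair is a capacity problem.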
NTEU fully complements CEF’s data gathering efforts, and the lobbying for translation memory gathering from translated data generated in public contracts, by offering a tangible technological delivery and implementation. To increase the volumes of parallel data available to the European Commission for developing the CEF eTranslation platform, promote the flow of translation data (specifically, TMs) from translation companies to public administrations, organise bilingual Big Data currently not being generated, enable public administrations to fully leverage TMs, and also support the work of translators working on public sector texts, the NEC TM Data consortium proposes the following central activities:
The NTEU proposal addresses the following actions as defined within the amended Work Programme 2016:
- Leveraging the data from Language Resource Co-ordination (CEF WP2014): Whenever possible, data sets collected under the ELRC initiative will be used to customise Machine Translation engines for the purposes of extending the CEF eTranslation service.
- Expanding the usage of CEF eTranslation Service: By significantly expanding the range of languages available within the eTranslation service, usage will increase and eTranslation services will be integrated into a broader range of public services. This will help promote the emergence of a Single Digital Market.
- Deployment of eDelivery Building Block: The NTEU proposal envisages using the eDelivery Building Block for the secure handling and transmission of confidential information to and from Consortium members. The NTEU proposal also envisages using a test platform provided by the eDelivery DSI to ensure comprehensive testing and compliance with the eDelivery Building Block.
- Member State Involvement: Member State institutions will be actively involved in the implementation and deployment of the NTEU proposal (these are listed in the Consortium Members listing). Additionally, the ELRC initiative will be invited to contribute relevant in-domain training data. This will help improve the quality of the training data and foster broader usage and acceptance of machine translation services across Member States.
- Interaction with Expert Group under the ELRC (WP 2014) initiative: The Language Resources Network, set up under WP 2014, represents the DSI Expert Group. The NTEU consortium will seek representation within the Expert Group to ensure coordination and operational management during the development of the Neural engines.
The NTEU proposal addresses the following benefits and outcomes as defined within the amended Work Programme 2016:
- Facilitating a Single Digital Market: The central objective of the NTEU proposal is to lower the language barriers to fully implementing multilingual language services across Member States. This can be achieved by deploying NTEU engines to provide Machine Translation services to Public Administration bodies.
- Wider Acceptance of multilingual DSIs: The NTEU proposal will broaden the range of machine translation services delivered by the eTranslation service. This will assist in broadening acceptance of the functionality and flexibility of Machine Translation across Member States.
- Significant Savings in Translation Costs: The NTEU proposal will create a complete peer-to-peer machine translation service across EU Member States. This will reduce localisation costs, improve translation quality and broaden the usage of automated translation services across Member States.
- Provision of highly secure translation services: The NTEU proposal envisages deploying Neural Engines using the eTranslation Building Block, providing secure and flexible machine translation services for confidential information.