Developing an agenda and a roadmap for achieving full digital language equality in Europe by 2030

Roadmap

Digital Language Equality in Europe by 2030: Strategic Agenda and Roadmap


Human language technologies have the potential to overcome the linguistic divide in the digital sphere. To realise this potential, however, the actions, tools, processes and actors that need to be involved must be defined. The goal of this SRIA is to lay out a roadmap with concrete implementation steps that yield tangible and measurable outputs, and to obtain broad endorsement from the relevant stakeholders.

The main scientific goal of the ELE Programme is Deep Natural Language Understanding (DNLU) in Europe by 2030. Pursuing this goal will increase efficiency through the sharing of knowledge, infrastructures and resources, with a view to developing innovative technologies and services. The aim is to achieve the next scientific breakthrough in this area and to help reduce the technology gap between European languages through (interdisciplinary) collaboration among research centres, academic experts, enterprises and other relevant stakeholders. Crucially, such a long-term ELE Programme must involve significantly intensified coordination between European LT research and industry.

The main societal and economic goal of the ELE Programme should be Digital Language Equality in Europe by 2030. The focus is on language equality and on the provisioning of technologies, services and resources beyond the often-preferred languages, in order to achieve technological sovereignty in this crucial application area. For minority and lesser-spoken languages, we need to find a (technological) way to address DNLU within a common approach, creating synergies and increasing the efficiency of the solutions and of their design and development. To narrow the digital divide, there is a pressing need for novel techniques that bring less-resourced languages to a level comparable to state-of-the-art results for resource-rich languages. This includes leveraging multimodal and multilingual resources to support the development of applications for languages and varieties with scarce resources.

This roadmap towards Digital Language Equality in Europe by 2030 provides a path and the means to ensure that the two goals outlined above are met. To tackle this challenge, the ELE Programme combines the following six themes.


Language Modelling

This theme includes research, development and deployment activities regarding large language models, especially multilingual and multimodal large language models that cover text, speech, image, video etc. Time and resources need to be invested in experiments, new approaches, shared tasks etc. For novel research approaches, we need to combine national projects and data sets with international consortia. With regard to innovation and deployment, large language models will be applied in industrial sectors and use cases.

Data and Knowledge

The Data and Knowledge theme focuses on the collection, production, annotation, curation, quality assessment and standardisation of text data, spoken data, video data and other multimodal data, among other activities.

Machine Translation

The Machine Translation theme focuses on improving automated translation from one natural language into another (including sign languages). While Europe has a strong foundation in this field, research needs to combine novel, groundbreaking approaches with results from the Data and Knowledge and the Language Modelling themes (see above). The results need to be applied in different industrial sectors and use cases. Deployment needs to be fast, agile and driven by excellent teams.

Text Understanding

The Text Understanding theme aims to improve the identification and labelling of the linguistic information underlying any natural language text (or other modalities). This requires exploring new strands of research and building on synergies with the other themes. An equally important aspect is applicability in industry.

Speech

The Speech theme addresses one big challenge of the European LT community, i.e., the shift from broad text processing to broad speech or multimodal processing (including corresponding research towards grounding). While progress has been made in the area of speech applications in the last decade, we also need novel research paradigms. This theme will benefit from the Data and Knowledge and the Language Modelling themes. The development of relevant industry applications is another goal.

Infrastructure

The Infrastructure theme involves the extension and maintenance of platforms such as the European Language Grid (ELG). ELG has the potential to function as one of the primary platforms supporting the activities of the ELE Programme. Moreover, ELG will be the central point for sharing best practices and for developing bridges to other relevant platforms. New features and functionalities need to be implemented for higher adaptability. Other important factors are the provisioning of GPUs and standardisation.

Further details about the ELE Programme, the timeline and the budget can be found in the complete SRIA document.