A new project coordinated by Maggioli Group held its kick-off meeting in Athens on 13 and 14 October.
VOXReality is an ambitious project whose goal is to facilitate and exploit the convergence of two important technologies: natural language processing and computer vision. Vision systems drive both AR and VR, while language understanding adds a natural way for humans to interact with the back ends of XR systems or to create multimodal XR experiences combining vision and sound.
Extended reality (XR) technologies are on the verge of dominating the human-computer interaction (HCI) scene, overtaking traditional approaches. Natural language processing (NLP) and computer vision (CV) are likewise experiencing huge performance gains thanks to the emergence of data-driven methods, specifically machine learning (ML) and artificial intelligence (AI).
But while the emergence of head-mounted displays (HMDs) has lifted media consumption to the next level in terms of naturality, the same cannot be said for media interaction and content creation. Although remarkable progress has been made on gesture-based interaction with the current generation of AR/VR HMDs, the solutions are usually limited to specific high-end devices and overlook the more accessible and more easily adopted mobile AR setting. Furthermore, gestures are only one piece in the arsenal of nature-inspired HCI, with natural speech interfaces being another, equally elaborate but less exploited one.
VOXReality aspires to fuse these two parallel fields by designing and developing AI models that integrate language as a core interaction medium alongside visual understanding, focusing on entangling the spatial and semantic knowledge of XR and NLP systems. Such an endeavor could kick-start a new era of applications built around a holistic understanding of users' goals, away from devices and controllers. Integrating language- and vision-based AI models, with either unidirectional or bidirectional exchanges between the two modalities, will yield findings useful for establishing pre-trained next-generation XR models.
The above technologies will be validated through three use cases:
- Personal Assistants: an emerging type of digital technology that supports humans in their daily tasks, with core functionalities centered on human-machine interaction.
- Virtual Conferences: events hosted and run entirely online, typically on a virtual conferencing platform that sets up a shared virtual environment, allowing attendees to view or participate from anywhere in the world.
- Theatre: performances where VOXReality will combine language translation, audio-visual user associations, and AR visual effects (VFX) triggered by predetermined speech.
The project has received funding from the European Union's Horizon Europe programme under grant agreement No 101070521 and is led by Maggioli Group, in cooperation with the Centre for Research and Technology Hellas (CERTH) in Thessaloniki (GR), Maastricht University (NL), the Foundation for Dutch Scientific Research Institutes (NWO-I) (NL), Synelixis (GR), VRDays (NL), F6S NETWORK (IE), AdaptIT (GR), HOLO-INDUSTRIE 4.0 (DE) and the Athens Festival (GR).