Intel’s Masked Humanoid Controller: A Novel Approach to Physically Realistic and Directable Human Motion Generation
Researchers from Intel Labs, in collaboration with academic and industry experts, have introduced a breakthrough technique for generating realistic and directable human motions from sparse, multimodal inputs. Their work, presented at the European Conference on Computer Vision (ECCV 2024), focuses on overcoming the challenges of generating natural, physically based behavior in high-dimensional humanoid characters. This research is part of Intel Labs’ broader initiative to advance computer vision and machine learning.
Intel Labs and its partners recently presented six groundbreaking papers at ECCV 2024, a leading conference organized by the European Computer Vision Association (ECVA).
The paper, Generating Physically Realistic and Directable Human Motions from Multimodal Inputs, was presented alongside other contributions, including a new defense strategy for protecting text-to-image models against prompt-based red-teaming attacks and a large-scale dataset designed to improve spatial consistency in such models. Together, these contributions highlight Intel’s commitment to advancing generative modeling while prioritizing responsible AI practices.
Generating realistic human movements using multimodal input
Intel’s Masked Humanoid Controller (MHC) is a groundbreaking system designed to generate human-like movements in simulated physics environments. Unlike traditional methods that rely heavily on fully detailed motion capture data, the MHC is built to process sparse, incomplete, or partial input data from various sources. These sources can include VR controllers, which may only track hand or head movements; joystick inputs that provide only high-level navigation commands; video tracking, where certain body parts may be occluded; or even abstract instructions derived from text prompts.
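As an illustration of the masked-input idea, sparse observations from different devices can be packed into a dense per-joint target array plus a binary mask marking which entries are actually observed. The function name, joint indices, and joint count below are hypothetical stand-ins, not Intel's API; this is a minimal sketch of the concept, assuming 3D position targets per joint.

```python
import numpy as np

NUM_JOINTS = 24  # assumed humanoid joint count; the real model may differ

def masked_targets(observed: dict) -> tuple:
    """Pack sparse per-joint observations (e.g., from VR head/hand trackers)
    into a dense target array plus a 0/1 mask of which joints are known.
    Unobserved joints stay zero and are left for the controller to infer."""
    targets = np.zeros((NUM_JOINTS, 3))
    mask = np.zeros(NUM_JOINTS)
    for joint, pos in observed.items():
        targets[joint] = pos
        mask[joint] = 1.0
    return targets, mask

# A VR setup might only track the head and two hands (indices are illustrative):
vr_obs = {15: np.array([0.0, 1.7, 0.0]),   # head
          20: np.array([0.3, 1.2, 0.2]),   # right hand
          21: np.array([-0.3, 1.2, 0.2])}  # left hand
targets, mask = masked_targets(vr_obs)
print(int(mask.sum()))  # 3 of 24 joints observed
```

A downstream controller would then condition on both `targets` and `mask`, so that "missing" is an explicit signal rather than a corrupted zero value.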
The innovation of the technology lies in its ability to interpret inputs and fill in the gaps where data is missing or incomplete. It achieves this through what Intel calls the Catch-up, Combine, and Complete (CCC) capabilities:
- Catch-up: This capability allows the MHC to recover and resynchronize its motion when disruptions occur, such as when the simulation starts from a failure state (for example, a humanoid character that has fallen). The system can quickly correct its movements and resume natural motion without retraining or manual adjustments.
- Combine: The MHC can merge different movement sequences, such as combining the upper-body movements of one action (e.g., waving) with the lower-body movements of another (e.g., walking). This flexibility makes it possible to generate entirely new behaviors from existing motion data.
- Complete: Given sparse input, such as partial body-movement data or vague high-level directives, the MHC can intelligently infer and generate the missing parts of the movement. For example, if only arm movements are specified, the MHC can autonomously generate corresponding leg movements to maintain physical balance and realism.
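The Combine and Complete capabilities above can both be expressed through the same masking mechanism: Combine splices the directed joints of two motions, while Complete simply marks which joints are directed and leaves the rest for the controller to infer. The sketch below is illustrative only; the joint count, the upper/lower-body index split, and the function names are assumptions, not Intel's implementation, and the random arrays stand in for real motion-clip frames.

```python
import numpy as np

NUM_JOINTS = 24                  # assumed joint count
UPPER_BODY = list(range(12, 24)) # assumed index split; real skeletons differ

# Random stand-ins for one frame of per-joint targets from two motion clips.
rng = np.random.default_rng(0)
waving = rng.normal(size=(NUM_JOINTS, 3))   # stand-in for a "waving" frame
walking = rng.normal(size=(NUM_JOINTS, 3))  # stand-in for a "walking" frame

def combine(upper: np.ndarray, lower: np.ndarray) -> np.ndarray:
    """Combine: take upper-body targets from one motion and lower-body
    targets from another, yielding a fully specified target frame."""
    out = lower.copy()
    out[UPPER_BODY] = upper[UPPER_BODY]
    return out

def complete_mask(directed_joints) -> np.ndarray:
    """Complete (input side): a 0/1 mask telling the controller which joints
    are directed; unmasked joints are left for the MHC to infer on its own."""
    mask = np.zeros(NUM_JOINTS)
    mask[list(directed_joints)] = 1.0
    return mask

merged = combine(waving, walking)          # wave on top, walk below
arm_only = complete_mask([16, 17, 20, 21]) # e.g., only shoulders and hands given
```

Under this view, a physics-based policy trained against randomly masked targets learns to satisfy whatever subset of joints is directed while keeping the rest physically plausible.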
The result is a highly customizable motion generation system that can create smooth, realistic, and physically accurate movements even from incomplete or underspecified directives. This makes the MHC ideal for applications in gaming, robotics, virtual reality, and any scenario where high-quality human motion is needed but input data is limited.
The impact of MHC on generative motion models
The Masked Humanoid Controller (MHC) is part of a broader effort by Intel Labs and its collaborators to responsibly build generative models, including models that power text-to-image and 3D generation tasks. As discussed at ECCV 2024, this approach has significant implications for sectors such as robotics, virtual reality, gaming, and simulation, where generating realistic human movements is crucial. By integrating multimodal inputs and allowing the controller to switch seamlessly between movements, the MHC can handle real-world conditions where sensor data may be noisy or incomplete.
This work from Intel Labs sits alongside other cutting-edge research presented at ECCV 2024, such as their new defense for text-to-image models and the development of techniques for improving spatial consistency in image generation. Together, these developments demonstrate Intel’s leadership in computer vision, with a focus on developing secure, scalable and responsible AI technologies.
Conclusion
Developed by Intel Labs and academic collaborators, the Masked Humanoid Controller (MHC) represents a critical step forward in human motion generation. By addressing the complex control problem of generating realistic movements from multimodal inputs, the MHC paves the way for new applications in VR, gaming, robotics, and simulation. This research, presented at ECCV 2024, demonstrates Intel’s commitment to advancing responsible AI and generative modeling, contributing to safer and more adaptive technologies across domains.