
Figure’s Helix is a new Vision-Language-Action (VLA) model that lets the company’s humanoid robots pick up nearly any small household object, including items they have never encountered before, simply by following natural-language prompts. The model unifies perception, language understanding, and learned control to address several longstanding challenges in robotics.
Figure claims Helix is the first VLA model to output high-rate continuous control of the entire humanoid upper body, including the wrists, torso, head, and individual fingers. It can also run on two robots simultaneously, enabling them to solve a shared, long-horizon manipulation task involving items they have never seen before. Unlike prior models, Helix uses a single set of neural-network weights to learn all behaviors, from picking and placing items to operating drawers and refrigerators to cross-robot collaboration, without any task-specific fine-tuning.
Helix pairs two complementary systems: S2, a vision-language backbone that handles scene understanding and language comprehension at a low rate, and S1, a fast visuomotor policy that translates S2’s output into continuous robot actions. This decoupled architecture allows each system to operate at its optimal timescale. S2 can ‘think slow’ about high-level goals, while S1 can ‘think fast’ to execute and adjust actions in real time. For example, during collaborative behavior (see Video 2), S1 quickly adapts to the changing motions of a partner robot while maintaining S2’s semantic objectives, said Figure.
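The idea of a slow planner feeding a fast controller can be illustrated with a minimal two-rate control loop. Everything below is a hypothetical sketch: the class names, update rates, and goal representation are placeholders chosen for illustration, not Figure’s actual implementation.

```python
class SlowPlanner:
    """Stands in for an S2-like system: updates a high-level goal at a low rate."""

    def __init__(self):
        self.latent_goal = None

    def update(self, observation):
        # A real VLA system would run a vision-language model here;
        # we just record a tag derived from the observation.
        self.latent_goal = f"goal-for-{observation}"


class FastController:
    """Stands in for an S1-like system: turns the latest goal into low-level actions."""

    def act(self, goal, proprioception):
        # A real policy would emit continuous joint targets; we return a tuple.
        return (goal, proprioception)


def run(fast_steps=10, slow_every=5):
    """Run the fast loop every tick; refresh the slow plan only every `slow_every` ticks."""
    planner, controller = SlowPlanner(), FastController()
    actions = []
    for t in range(fast_steps):
        if t % slow_every == 0:      # slow loop, e.g. a few Hz
            planner.update(observation=t)
        # fast loop, e.g. hundreds of Hz, always uses the most recent goal
        actions.append(controller.act(planner.latent_goal, proprioception=t))
    return actions
```

The key property the sketch shows is that the fast controller never blocks on the slow planner: it always acts on the most recently published goal, so control stays responsive even while high-level planning lags behind.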
