
Figure’s Helix is a new Vision-Language-Action (VLA) model that lets the company’s humanoid robots pick up nearly any small household object, including items they have never encountered before, simply by following natural-language prompts. The model unifies perception, language understanding, and learned control to address several longstanding challenges in robotics.
Figure claims Helix is the first VLA model to output high-rate continuous control of the entire humanoid upper body, including the wrists, torso, head, and individual fingers. It can also run on two robots simultaneously, enabling them to solve a shared, long-horizon manipulation task involving items they have never seen before. Unlike prior models, Helix uses a single set of neural-network weights to learn all behaviors, from picking and placing items to operating drawers and refrigerators to cross-robot collaboration, without any task-specific fine-tuning.
Helix pairs two complementary systems: S2, a vision-language backbone that handles scene understanding and language comprehension at a low rate, and S1, a fast visuomotor policy that translates S2’s output into continuous robot actions. This decoupled architecture allows each system to operate at its optimal timescale. S2 can ‘think slow’ about high-level goals, while S1 can ‘think fast’ to execute and adjust actions in real time. For example, during collaborative behavior (see Video 2), S1 quickly adapts to the changing motions of a partner robot while maintaining S2’s semantic objectives, said Figure.
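The idea of a slow planner feeding a fast controller can be illustrated with a minimal two-rate control loop. Everything below is a hypothetical sketch: the class names, update rates, and goal representation are placeholders chosen for illustration, not Figure’s actual implementation.

```python
class SlowPlanner:
    """Stands in for an S2-like system: updates a high-level goal at a low rate."""

    def __init__(self):
        self.latent_goal = None

    def update(self, observation):
        # A real VLA system would run a vision-language model here;
        # we just record a tag derived from the observation.
        self.latent_goal = f"goal-for-{observation}"


class FastController:
    """Stands in for an S1-like system: turns the latest goal into low-level actions."""

    def act(self, goal, proprioception):
        # A real policy would emit continuous joint targets; we return a tuple.
        return (goal, proprioception)


def run(fast_steps=10, slow_every=5):
    """Run the fast loop every tick; refresh the slow plan only every `slow_every` ticks."""
    planner, controller = SlowPlanner(), FastController()
    actions = []
    for t in range(fast_steps):
        if t % slow_every == 0:      # slow loop, e.g. a few Hz
            planner.update(observation=t)
        # fast loop, e.g. hundreds of Hz, always uses the most recent goal
        actions.append(controller.act(planner.latent_goal, proprioception=t))
    return actions
```

The key property the sketch shows is that the fast controller never blocks on the slow planner: it always acts on the most recently published goal, so control stays responsive even while high-level planning lags behind.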
