Sleep & Wellness Guide

IMAGIN-4D: Image-Guided Controllable Interaction Generation

2026-06-22

Key Takeaway

A robotics research paper on IMAGIN-4D: Image-Guided Controllable Interaction Generation.

Practical Tips

Practical tips and how-to guidance will be added by our editorial team.

中文解读

中文解读待补充:本站将优先为睡眠改善、失眠治疗、助眠方法等高价值文章补充中文说明。

Article Summary

Generating human-object interactions (HOI) is central to character animation, robotics, AR/VR, and embodied AI. Recent HOI generation methods synthesize motion from text, object geometry, and sparse waypoints, controlling action semantics and object trajectories. However, these signals underspecify interaction: the same prompt and trajectory can produce different grasps, approach directions, body poses, object poses, contacts, and body-object layouts. We address this ambiguity with a reference image as a visual specification of the desired interaction snapshot. However, a single global image representation conflates distinct cues and conditions all frames on identical visual evidence. We therefore introduce IMAGIN-4D, a diffusion-based HOI generator that decomposes image conditioning spatio-temporally. For spatial conditioning, IMAGIN-4D extracts supervised interaction-state tokens for body pose, object pose, body-object contact, and spatial relationships at the depicted frame. For temporal conditioning, it computes frame-aware tokens by querying image patches per generated frame, allowing sequence segments to attend to different visual cues from the same image. To balance image, text, and waypoint cues, IMAGIN-4D uses role-aware conditioning: text, waypoints, and interaction-state tokens use separate AdaLN streams, while frame-aware visual tokens cross-attend with motion tokens. Since HOI motion datasets lack paired images, we build a synthetic motion-to-image rendering pipeline from FullBodyManipulation (FBM) and introduce an image-adherence metric to evaluate whether generated motions match the reference snapshot. Experiments on FBM and BEHAVE show that IMAGIN-4D improves fine-grained interaction control over single-token and uniformly image-conditioned baselines while preserving waypoint-following and motion quality. Code and models will be released at https://imagin4d.github.io.

5.0Practicality
7.0Scientific Evidence
4.0Effectiveness

Sources & References

Need to track a shipment?

Use our free logistics tracking tool to check real-time delivery status for USPS, FedEx, UPS, DHL, Amazon and 1000+ carriers worldwide.

Track a Package Now

Comments

No comments yet. Be the first to share your thoughts.
Login or register to leave a comment