Efficient Inference for Embodied Foundation Models

  • Duration: Nov 2025 – Present
  • Role: Research Assistant
  • Advisor: Dr. Jiachen Liu (University of Michigan)

Project Overview

My work focuses on improving the inference efficiency of embodied foundation models. Profiling COSMOS-Policy across denoising steps showed that block-level residual skipping is infeasible, whereas cross-attention outputs remain highly stable from step to step (cosine similarity > 0.999), making them natural candidates for caching.
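The stability check above can be sketched as follows. This is a minimal illustration, not the actual profiling harness: the function names and the toy activations are my own, and the real measurement would compare cross-attention outputs captured from consecutive denoising steps of the model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened activation tensors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def profile_cross_attn_stability(outputs_per_step):
    """Compare each denoising step's cross-attention output to the
    previous step's; high similarity suggests the output is cacheable."""
    return [cosine_similarity(outputs_per_step[t - 1], outputs_per_step[t])
            for t in range(1, len(outputs_per_step))]

# Toy stand-in for captured activations: nearly identical across steps.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 64))
steps = [base + 1e-4 * rng.standard_normal(base.shape) for _ in range(5)]
sims = profile_cross_attn_stability(steps)   # all values close to 1.0
```

In practice the per-step outputs would be collected with forward hooks on the cross-attention modules; the similarity threshold (here > 0.999) then decides which computations are safe to reuse.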

Key Contributions

  • Validated cross-attention KV caching on 24 RoboCasa tasks.
  • Matched the baseline task success rate (68.06%) across 3,600 rollout trials, showing the caching strategy delivers a consistent denoising speedup on all evaluated tasks without degrading accuracy.
  • Currently designing task-aware token compression that exploits the model’s latent-frame slot structure to selectively prune image tokens via self-attention reduction, targeting further denoising acceleration on top of the validated caching strategy.
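The KV-caching idea behind the first two bullets can be sketched as below. This is a hedged, single-head toy, assuming the conditioning tokens fed to cross-attention are fixed across denoising steps (which is what makes their K/V projections reusable); the class and variable names are illustrative, not COSMOS-Policy internals.

```python
import numpy as np

def cross_attention(q, k, v):
    """Scaled dot-product attention (single head, no mask)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

class CachedCrossAttention:
    """Sketch: conditioning tokens do not change across denoising steps,
    so project them to K/V once and reuse the cache on later steps."""
    def __init__(self, w_k, w_v):
        self.w_k, self.w_v = w_k, w_v
        self._cache = None

    def __call__(self, q, cond_tokens):
        if self._cache is None:   # first denoising step: build the cache
            self._cache = (cond_tokens @ self.w_k, cond_tokens @ self.w_v)
        k, v = self._cache        # later steps: skip the K/V projections
        return cross_attention(q, k, v)

# Usage: the same conditioning tokens across two denoising steps.
rng = np.random.default_rng(1)
d = 8
attn = CachedCrossAttention(rng.standard_normal((d, d)),
                            rng.standard_normal((d, d)))
cond = rng.standard_normal((5, d))                        # fixed condition
q1, q2 = rng.standard_normal((3, d)), rng.standard_normal((3, d))
out1 = attn(q1, cond)   # builds the K/V cache
out2 = attn(q2, cond)   # reuses it; output matches the uncached path
```

Because the cached K/V are bit-identical to what the uncached path would recompute, the speedup comes with no change in outputs, consistent with the identical success rate reported above.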