Efficient Inference for Embodied Foundation Models
- Duration: Nov 2025 – Present
- Role: Research Assistant
- Advisor: Dr. Jiachen Liu (University of Michigan)
Project Overview
My work focuses on improving the inference efficiency of embodied foundation models. I profiled COSMOS-Policy across denoising steps and found that block-level residual skipping is infeasible, while cross-attention outputs remain highly stable between steps (cosine similarity > 0.999).
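The stability check above can be sketched as follows. This is a minimal illustration, not the actual profiling harness: the activation tensors and shapes are hypothetical stand-ins for cross-attention outputs captured at consecutive denoising steps (e.g. via forward hooks).

```python
import numpy as np

def mean_cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Mean cosine similarity between matching token vectors of two activations."""
    a_flat = a.reshape(-1, a.shape[-1])
    b_flat = b.reshape(-1, b.shape[-1])
    num = (a_flat * b_flat).sum(axis=-1)
    denom = np.linalg.norm(a_flat, axis=-1) * np.linalg.norm(b_flat, axis=-1)
    return float((num / np.maximum(denom, 1e-12)).mean())

# Hypothetical activations: [batch, tokens, dim] outputs at steps t and t+1.
rng = np.random.default_rng(0)
out_step_t = rng.standard_normal((1, 256, 512))
out_step_t1 = out_step_t + 1e-3 * rng.standard_normal(out_step_t.shape)

# A similarity near 1.0 across steps is what motivates caching these outputs.
print(f"{mean_cosine_similarity(out_step_t, out_step_t1):.4f}")
```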
Key Contributions
- Validated cross-attention KV caching on 24 RoboCasa tasks.
- Matched the baseline task success rate exactly (68.06%) across 3,600 rollout trials, while achieving a consistent denoising speedup on all evaluated tasks.
- Currently designing task-aware token compression. This approach exploits the model’s latent-frame slot structure to selectively prune image tokens via self-attention reduction, targeting further denoising acceleration on top of the validated caching strategy.
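The caching idea underlying these results can be sketched as below. Since the conditioning tokens that feed cross-attention are fixed across denoising steps, their K/V projections can be computed once and reused. This is a simplified single-head sketch with illustrative names, not COSMOS-Policy's actual implementation.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CachedCrossAttention:
    """Single-head cross-attention that caches K/V of the fixed conditioning
    tokens, so each denoising step only recomputes the query projection."""

    def __init__(self, dim: int, rng: np.random.Generator):
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.k = self.v = None

    def __call__(self, x: np.ndarray, cond: np.ndarray) -> np.ndarray:
        if self.k is None:          # first denoising step: build the K/V cache
            self.k = cond @ self.wk
            self.v = cond @ self.wv
        q = x @ self.wq             # queries still depend on the noisy latent
        attn = softmax(q @ self.k.T / np.sqrt(q.shape[-1]))
        return attn @ self.v

rng = np.random.default_rng(0)
layer = CachedCrossAttention(64, rng)
cond = rng.standard_normal((32, 64))            # fixed conditioning tokens
out1 = layer(rng.standard_normal((8, 64)), cond)
out2 = layer(rng.standard_normal((8, 64)), cond)  # reuses cached K/V
```

The payoff is that the K/V projections, which dominate cross-attention cost when conditioning sequences are long, are amortized over all denoising steps.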
