Ghosts in the Scale
Prompt injection via image resampling & certified defenses for multimodal systems.
🎯The Problem
Most multimodal stacks downscale images before inference. Attackers can plant high-frequency patterns that become legible commands only after downscaling, triggering tool calls or data exfiltration.
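To make the channel concrete, here is a minimal sketch in Python with NumPy. It assumes a deliberately simplified downscaler that keeps the top-left pixel of every `factor` x `factor` block; real pipelines use bilinear, bicubic, or area filters with different sampling offsets, so an actual attack has to be tuned to the target. The names `nearest_downscale` and `embed_payload` are hypothetical, and the random bit pattern stands in for rendered command text.

```python
import numpy as np

def nearest_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Simplified nearest-neighbour downscale (assumption): keep every factor-th pixel."""
    return img[::factor, ::factor]

def embed_payload(cover: np.ndarray, payload: np.ndarray, factor: int) -> np.ndarray:
    """Overwrite only the pixels that the downscaler above will sample."""
    crafted = cover.copy()
    h, w = payload.shape
    crafted[: h * factor : factor, : w * factor : factor] = payload
    return crafted

rng = np.random.default_rng(0)
factor = 8

# Stand-in for rendered text: a 16x16 black/white pattern (hypothetical payload).
payload = rng.integers(0, 2, size=(16, 16), dtype=np.uint8) * 255
# Plain grey full-resolution cover image.
cover = np.full((128, 128), 128, dtype=np.uint8)

crafted = embed_payload(cover, payload, factor)

# Fraction of full-resolution pixels the attacker touched.
changed_fraction = float(np.mean(crafted != cover))
# What the model would see after downscaling.
recovered = nearest_downscale(crafted, factor)

print(f"pixels changed at full resolution: {changed_fraction:.4f}")
print("payload fully visible after downscale:", np.array_equal(recovered, payload))
```

Only one pixel in every 8 x 8 block is modified, yet the downscaled image consists entirely of payload pixels. This is why what a human sees at full resolution and what the model receives can diverge.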
💡The Solution
Three components: black-box downscaler fingerprinting with bounded probes; ScaleJail-mini, a benchmark of crafted images; and a certifying defense, Preview-of-Record + FreqGuard, that blocks the channel while preserving utility.
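The fingerprinting idea can be illustrated with a small sketch. This is not the project's method, only one plausible reading of it under stated assumptions: "bounded probes" is taken to mean a fixed budget of probe images (`n_probes`), the black box is assumed to return the downscaled pixels directly (a deployed system would usually expose them only indirectly), and only two candidate kernels are considered. All function names here are hypothetical.

```python
import numpy as np

def nearest(img: np.ndarray, f: int) -> np.ndarray:
    """Candidate 1: keep the top-left pixel of each f x f block."""
    return img[::f, ::f].astype(np.float64)

def box(img: np.ndarray, f: int) -> np.ndarray:
    """Candidate 2: average each f x f block (area-style filter)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

# Assumption: the unknown downscaler is one of these candidates.
CANDIDATES = {"nearest": nearest, "box": box}

def fingerprint(black_box, factor: int, size: int = 64, n_probes: int = 4, seed: int = 0):
    """Send a fixed number of random probes and pick the best-matching candidate."""
    rng = np.random.default_rng(seed)
    errors = {name: 0.0 for name in CANDIDATES}
    for _ in range(n_probes):
        probe = rng.integers(0, 256, size=(size, size)).astype(np.float64)
        observed = black_box(probe)
        for name, fn in CANDIDATES.items():
            # Accumulate mean absolute difference between prediction and observation.
            errors[name] += float(np.abs(fn(probe, factor) - observed).mean())
    best = min(errors, key=errors.get)
    return best, errors

# Usage: a hidden downscaler that happens to be the box filter.
hidden = lambda img: box(img, 4)
best, errors = fingerprint(hidden, factor=4)
print(best, errors)
```

Random probes separate the two candidates because averaging a block and sampling one of its pixels almost never agree on noise. A larger candidate set (bilinear, bicubic, Lanczos) would need probes chosen to distinguish those kernels from one another.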
✨Key Highlights
- Black-box downscaler fingerprinting
- ScaleJail-mini benchmark
- Certifying defense: FreqGuard