Do Image-Text Metrics Respect Semantic Invariances?

Amit Agarwal, Hitesh Laxmichand Patel, Meizhu Liu, Jyotika Singh, Karan Dua, Hansa Meghwani, Matthew Rowe, Michael Avendi, Yassi Abbasi, Tao Sheng, Sujith Ravi, Dan Roth. ACL 2026.

Abstract. This paper studies whether reference-free image-to-text evaluators respect meaning-preserving changes in images and captions. It probes popular caption-alignment metrics across spatial, object-level, and socio-linguistic perturbations, showing that non-semantic changes can shift scores and alter system rankings. The work also proposes invariance-calibrated scoring to reduce these sensitivities while preserving alignment with learned caption evaluators.
