World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models
Abstract. This paper studies how vision-language models behave when multiple cultural cues appear together in the same visual scene. It introduces a benchmark for culture-mixing scenarios and analyzes failure modes such as background sensitivity and inconsistent cultural attribution.