GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations

Jyotika Singh, Fang Tu, Aziza Mirsaidova, Amit Agarwal, Hitesh Laxmichand Patel, Sandip Ghoshal, Miguel Ballesteros, Karan Dua, Yassine Benajiba, Weiyi Sun, Tao Sheng, Graham Horwood, Sujith Ravi, Dan Roth. ACL 2026 (2026).

Abstract. GSM-SEM introduces a reusable framework for generating semantically diverse variants of math reasoning benchmarks. Rather than relying on surface paraphrases alone, it changes the entities, attributes, and relationships in a problem while preserving its answer structure and approximate difficulty. This makes it harder for a model to score well on a static benchmark by memorizing its items. The paper evaluates state-of-the-art LLMs on GSM-style and broader reasoning tasks using these semantic augmentations.
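
To make the idea of a semantic variant concrete, the following is a minimal toy sketch, not the GSM-SEM pipeline itself. It assumes a single hand-written problem template and a hypothetical `Scenario` record; the names `Scenario`, `TEMPLATE`, `render`, and `answer` are invented for illustration and do not come from the paper. It only shows that the entities and the relationship wording can change while the underlying computation stays fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical surface semantics for one word problem (illustrative only)."""
    agent: str        # who the problem is about
    container: str    # grouping entity, e.g. "baskets"
    relation: str     # how the container relates to the item, e.g. "holding"
    item: str         # counted entity, e.g. "apples"


# Assumed template: the arithmetic structure (n * k - g) is fixed,
# only the entities and relationship wording vary.
TEMPLATE = (
    "{agent} has {n} {container} {relation} {k} {item} each. "
    "{agent} gives away {g} {item}. How many {item} remain?"
)


def render(scenario: Scenario, n: int, k: int, g: int) -> str:
    """Fill the template with one scenario and one set of numbers."""
    return TEMPLATE.format(
        agent=scenario.agent,
        container=scenario.container,
        relation=scenario.relation,
        item=scenario.item,
        n=n, k=k, g=g,
    )


def answer(n: int, k: int, g: int) -> int:
    """Reference answer; depends on the numbers only, not on the scenario."""
    return n * k - g


original = Scenario("Maya", "baskets", "holding", "apples")
variant = Scenario("Omar", "crates", "containing", "bolts")

numbers = dict(n=3, k=4, g=5)
original_text = render(original, **numbers)
variant_text = render(variant, **numbers)
shared_answer = answer(**numbers)

print(original_text)
print(variant_text)
print("answer:", shared_answer)
```

In this sketch both texts share one reference answer because the answer is computed from the numbers alone. A real augmentation framework would also need to check that a changed relationship still yields a coherent problem of similar difficulty, which this toy example does not attempt.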
