GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations
Abstract. GSM-SEM introduces a reusable framework for generating semantically diverse variants of math reasoning benchmarks. Instead of relying on surface paraphrases alone, it changes entities, attributes, and relationships while preserving answer structure and approximate difficulty, so that models gain less from memorizing the items of a static benchmark. The paper evaluates state-of-the-art LLMs on GSM-style and broader reasoning tasks using these semantic augmentations.
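
To make the idea of a semantic variant concrete, the following is a minimal illustrative sketch, not the GSM-SEM pipeline itself. It assumes a single hand-written problem template whose entities (person, item, container) and attributes (quantities) are resampled while the answer structure, here `per * count - removed`, stays fixed. The names `Problem`, `PEOPLE`, `ITEMS`, and `make_variant` are hypothetical and do not come from the paper, and relationship-level changes are not covered.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    text: str
    answer: int


# Hypothetical entity pools; the real framework may draw these differently.
PEOPLE = ["Ava", "Noah", "Mia"]
ITEMS = [("apples", "baskets"), ("books", "boxes"), ("coins", "jars")]


def make_variant(rng: random.Random) -> Problem:
    """Resample entities and quantities; keep the two-step answer structure."""
    name = rng.choice(PEOPLE)
    items, containers = rng.choice(ITEMS)
    count = rng.randint(2, 9)   # number of containers
    per = rng.randint(2, 9)     # items per container
    # Remove fewer items than exist so the answer stays positive.
    removed = rng.randint(1, count * per - 1)
    text = (
        f"{name} has {count} {containers} with {per} {items} in each. "
        f"{name} gives away {removed} {items}. How many {items} are left?"
    )
    return Problem(text=text, answer=count * per - removed)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible variants
    for _ in range(3):
        variant = make_variant(rng)
        print(variant.text, "->", variant.answer)
```

Every variant shares the same solution steps (one multiplication, then one subtraction), so difficulty stays roughly constant while the surface entities and values differ from any fixed benchmark item.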