Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Hyunwoo Ko, Amit Agarwal, Sunghee Ahn, Kyong-Ha Lee, Youngjae Yu. arXiv (2026).

Abstract. This work proposes an oracle-free evaluation method for research-level mathematics. Instead of relying on expert verification for every solution, it scores candidate solutions by measuring how useful they are as in-context exemplars for solving related, verifiable problems.
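
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `consequence_score`, `rank_candidates`, and `toy_solver` are hypothetical, the solver is a stand-in for a language model prompted with the candidate as an in-context exemplar, and exact-match checking against a reference answer is an assumed form of verification.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a solver takes an optional exemplar and a problem, and returns an answer string.
Solver = Callable[[Optional[str], str], str]
# a related problem is a (question, reference_answer) pair that can be checked automatically.
Problem = Tuple[str, str]


def consequence_score(
    candidate: Optional[str], related: Sequence[Problem], solver: Solver
) -> float:
    """Fraction of related, verifiable problems solved when `candidate` is the exemplar."""
    if not related:
        raise ValueError("need at least one related verifiable problem")
    solved = sum(
        1
        for question, reference in related
        # Assumption: verification is an exact string match after trimming whitespace.
        if solver(candidate, question).strip() == reference.strip()
    )
    return solved / len(related)


def rank_candidates(
    candidates: Sequence[str], related: Sequence[Problem], solver: Solver
) -> List[Tuple[str, float]]:
    """Score every candidate and return (candidate, score) pairs, best first."""
    scored = [(c, consequence_score(c, related, solver)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_solver(exemplar: Optional[str], question: str) -> str:
    """Stand-in for a model call: it only succeeds when the exemplar mentions 'add'."""
    if exemplar and "add" in exemplar:
        return str(sum(int(term) for term in question.split("+")))
    return "?"


related_problems = [("2+2", "4"), ("3+3", "6")]
ranking = rank_candidates(
    ["multiply the terms", "add the terms"], related_problems, toy_solver
)
```

In this toy setup the candidate that helps the solver on the related problems ranks first, without anyone checking the candidate itself. Replacing `toy_solver` with a real model call and the exact-match check with a domain-appropriate verifier would be the natural next step.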
