Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
Abstract. This work proposes an oracle-free evaluation method for research-level mathematics. Instead of relying on expert verification of every solution, it scores each candidate solution by how useful it is as an in-context exemplar for solving related, verifiable problems.
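
The scoring idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`consequence_score`, `toy_solver`), the exact-match check, and the use of a plain success fraction as the score are all assumptions made for illustration. In practice `solve_with_exemplar` would wrap a language-model call that receives the candidate solution in its context; here a toy stand-in plays that role so the example runs on its own.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: a related problem is a (question, verifiable_answer) pair.
Problem = Tuple[str, str]


def consequence_score(
    candidate_solution: str,
    related_problems: Sequence[Problem],
    solve_with_exemplar: Callable[[str, str], str],
) -> float:
    """Fraction of related problems answered correctly when the candidate
    solution is supplied as an in-context exemplar (assumed scoring rule)."""
    if not related_problems:
        raise ValueError("need at least one related, verifiable problem")
    correct = 0
    for question, expected in related_problems:
        answer = solve_with_exemplar(candidate_solution, question)
        # Assumed verifier: exact string match against the known answer.
        if answer.strip() == expected.strip():
            correct += 1
    return correct / len(related_problems)


def toy_solver(exemplar: str, question: str) -> str:
    """Toy stand-in for a model call: it copies whichever formula the exemplar
    uses and applies it to a question of the form 'sum 1..n'."""
    n = int(question.split("..")[1])
    if "n*(n+1)/2" in exemplar:
        return str(n * (n + 1) // 2)
    return str(n * n)


# Related problems whose answers can be checked without an expert.
problems = [("sum 1..4", "10"), ("sum 1..10", "55"), ("sum 1..1", "1")]

good = "The sum of 1..n equals n*(n+1)/2."
bad = "The sum of 1..n equals n*n."

good_score = consequence_score(good, problems, toy_solver)
bad_score = consequence_score(bad, problems, toy_solver)
```

In this toy setting the correct candidate transfers to every related problem, while the flawed one succeeds only on the degenerate case, so the two are separated without anyone verifying either candidate directly.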