LLMThinkBench is a comprehensive framework designed to rigorously evaluate the basic math reasoning capabilities of Language Models, while identifying instances of overthinking—where models apply unnecessarily complex logic to simple problems.
| Rank | Model | Parameters | Accuracy | Efficiency Score | Instruction Following | Overthinking Ratio | Avg Tokens | Avg Words | Avg Chars |
|------|-------|------------|----------|------------------|-----------------------|--------------------|------------|-----------|-----------|
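The exact formulas behind the leaderboard's verbosity columns are not spelled out above, but a minimal sketch of how such metrics could be computed is shown below. The function name `verbosity_metrics`, the whitespace-split word count used as a token proxy, and the 2x-reference-length threshold for flagging an overthinking response are all illustrative assumptions, not LLMThinkBench's actual definitions.

```python
def verbosity_metrics(responses, reference_lengths):
    """Compute average words/chars and an overthinking ratio for model outputs.

    responses: list of model output strings.
    reference_lengths: list of word counts assumed sufficient for each
        problem (an illustrative baseline, not defined by the source).
    """
    word_counts = [len(r.split()) for r in responses]  # whitespace words as a crude token proxy
    char_counts = [len(r) for r in responses]
    n = len(responses)
    # Flag a response as "overthinking" when it exceeds 2x the reference
    # length; the 2x threshold is an assumption for illustration only.
    overthinking = sum(
        1 for w, ref in zip(word_counts, reference_lengths) if w > 2 * ref
    )
    return {
        "avg_words": sum(word_counts) / n,
        "avg_chars": sum(char_counts) / n,
        "overthinking_ratio": overthinking / n,
    }

metrics = verbosity_metrics(
    [
        "2 + 2 = 4",
        "Let us carefully reason step by step about 2 + 2 ... the answer is 4",
    ],
    [4, 4],
)
print(metrics["overthinking_ratio"])  # the second, verbose answer is flagged
```

A higher overthinking ratio here means the model more often produced answers far longer than needed for a simple problem, which is the behavior the framework is designed to surface.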