FinanceArena evaluates LLMs' capabilities against the rigorous demands of real-world financial analysis.
Model performance as of July 19, 2025. The leaderboard reports per-model accuracy in four columns: Overall, Basic Tactical, Assumption-Based, and Conceptual.
FinanceArena is a platform for evaluating LLMs on real-world financial analysis tasks. Our mission is the rigorous assessment of AI models' reasoning capabilities against the unforgiving standards of professional finance, where a single calculation error can sway million-dollar decisions.
To ensure practical relevance, we exclusively test models on tasks designed by practitioners who perform this work professionally. Unlike academic benchmarks that test simplified concepts, our evaluations mirror the exact calculations, assumptions, and reasoning required in actual investment analysis. We measure performance using exact-match accuracy—because in finance, partial credit doesn't exist.
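As a rough illustration of what no-partial-credit scoring means in practice, here is a minimal exact-match scorer; the normalization step is our own assumption, since FinanceArena's precise canonicalization rules aren't specified here.

```python
def normalize(answer: str) -> str:
    """Canonicalize an answer before comparison (assumed normalization,
    not necessarily FinanceArena's exact rules)."""
    return answer.strip().lower().rstrip(".")

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions matching the reference exactly: no partial credit."""
    assert len(predictions) == len(references), "paired lists required"
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# A near-miss scores zero, exactly like a wild miss:
print(exact_match_accuracy(["$1,203M"], ["$1,204M"]))  # 0.0
```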
Our flagship benchmark, FinanceQA, reveals that even state-of-the-art models fail approximately 50% of professional tasks. The initial leaderboard shows performance across FinanceQA's three task types: basic tactical analysis, assumption-based reasoning, and conceptual understanding.
By identifying where models fail, we provide the roadmap for developing AI that can actually be trusted with financial analysis. We've open-sourced our evaluation framework and dataset to accelerate progress toward AI that meets the industry's exacting requirements.
Because when trillions in capital allocation depend on your analysis, "pretty good" isn't good enough.
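For readers who want to reproduce scores against the open-sourced dataset, a minimal harness might look like the sketch below. The Hugging Face dataset identifier and the question/answer field names are assumptions for illustration; check the released repository for the actual paths and schema.

```python
from datasets import load_dataset  # pip install datasets

# Dataset id and field names below are assumed for illustration only.
ds = load_dataset("AfterQuery/FinanceQA", split="test")

def run_eval(model_fn, dataset) -> float:
    """Score a callable (question string -> answer string) with exact match."""
    correct = 0
    for row in dataset:
        prediction = model_fn(row["question"]).strip().lower()   # assumed field
        correct += prediction == row["answer"].strip().lower()   # assumed field
    return correct / len(dataset)

# Example: a trivial baseline that always abstains scores 0.
print(run_eval(lambda q: "", ds))
```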
FinanceQA represents a paradigm shift in how we evaluate AI for finance. Unlike existing benchmarks that test basic information retrieval, FinanceQA mirrors the actual work performed by analysts at top-tier financial institutions every day.
Each task type reflects different aspects of professional financial analysis, from tactical calculations to high-level reasoning.
**Basic Tactical.** Fundamental financial calculations that analysts perform daily, from hand-spreading metrics to calculating diluted shares outstanding. While 'basic' in name, these require precise adherence to accounting standards and the ability to work with primary source documents such as 10-Ks.
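To make "hand-spreading" concrete, below is a sketch of one such calculation, the treasury stock method for diluted shares outstanding; the figures are invented for illustration.

```python
def diluted_shares_treasury_method(basic_shares: float,
                                   options_outstanding: float,
                                   strike_price: float,
                                   current_price: float) -> float:
    """Treasury stock method: option exercise proceeds are assumed to
    repurchase shares at the current price; only the net new shares dilute."""
    if current_price <= strike_price:
        return basic_shares  # out-of-the-money options are anti-dilutive
    proceeds = options_outstanding * strike_price
    shares_repurchased = proceeds / current_price
    return basic_shares + options_outstanding - shares_repurchased

# Invented figures: 100M basic shares, 5M options struck at $20, stock at $50.
print(diluted_shares_treasury_method(100e6, 5e6, 20.0, 50.0))  # 103,000,000
```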
**Assumption-Based.** The true test of financial acumen: these tasks require models to work with incomplete information, just as real analysts do. When data is missing, can the AI make logical, defensible assumptions? Current models achieve only 8.4% accuracy on these critical tasks.
**Conceptual.** These tasks evaluate deeper financial reasoning: understanding relationships between metrics, applying valuation principles, and demonstrating mastery of accounting concepts. While models perform better here, the gap between conceptual knowledge and practical application remains stark.
Four critical insights that define the current state and future direction of AI in professional finance.
**1.** In finance, anything below near-perfect accuracy provides minimal value. Unlike other domains where 80% might be useful, financial analysis demands precision: errors compound, and verifying flawed output often takes longer than doing the work from scratch.
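A back-of-the-envelope illustration of how errors compound (the per-step figure is our assumption, not a measured number): even a model that is right 95% of the time per calculation produces a clean 20-step analysis barely a third of the time.

```python
# Illustrative: per-step reliability compounds across dependent calculations.
per_step_accuracy = 0.95   # assumed reliability of each calculation
steps = 20                 # chained calculations in one analysis
print(per_step_accuracy ** steps)  # ~0.358: roughly 1 in 3 analyses survive intact
```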
**2.** Professional analysts don't just pull numbers from Bloomberg. They reconstruct calculations from primary sources, verify management's figures, and apply industry-specific adjustments. Current AI models consistently fail at this fundamental requirement.
**3.** The most dramatic finding: models achieve less than 5% accuracy when required to handle incomplete information and generate reasonable assumptions. Yet this represents a significant portion of real analyst work.
**4.** Professional finance has rules: specific, well-defined accounting conventions for every calculation. Current AI models consistently violate these conventions, defaulting to simplified math. It's not about being "close enough"; it's about following established methodologies.
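One concrete illustration of such a convention (our example, not one drawn from FinanceQA itself): practitioners typically compute net working capital excluding cash and short-term debt, while a naive reading of the balance sheet does not, and the two answers diverge.

```python
# Invented balance-sheet figures, in $M, to show the convention gap.
current_assets, cash = 500.0, 120.0
current_liabilities, short_term_debt = 300.0, 80.0

naive_nwc = current_assets - current_liabilities  # textbook-naive: 200.0
operating_nwc = (current_assets - cash) - (current_liabilities - short_term_debt)  # convention: 160.0
print(naive_nwc, operating_nwc)  # same inputs, different answers under different conventions
```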