FinanceArena evaluates LLMs' capabilities against the rigorous demands of real-world financial analysis.
Model performance as of July 19, 2025. The leaderboard reports per-model accuracy in four columns: Overall, Basic Tactical, Assumption-Based, and Conceptual.
FinanceArena is a platform for evaluating LLMs on real-world financial analysis tasks. Our mission is the rigorous assessment of AI models' reasoning capabilities against the unforgiving standards of professional finance, where a single calculation error can sway million-dollar decisions.
To ensure practical relevance, we exclusively test models on tasks designed by practitioners who perform this work professionally. Unlike academic benchmarks that test simplified concepts, our evaluations mirror the exact calculations, assumptions, and reasoning required in actual investment analysis. We measure performance using exact-match accuracy—because in finance, partial credit doesn't exist.
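As a rough illustration of what no-partial-credit scoring means in practice, here is a minimal exact-match scorer; the normalization step is our own assumption, since FinanceArena's precise canonicalization rules aren't specified here.

```python
def normalize(answer: str) -> str:
    """Canonicalize an answer before comparison (assumed normalization,
    not necessarily FinanceArena's exact rules)."""
    return answer.strip().lower().rstrip(".")

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions matching the reference exactly: no partial credit."""
    assert len(predictions) == len(references), "paired lists required"
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# A near-miss scores zero, exactly like a wild miss:
print(exact_match_accuracy(["$1,203M"], ["$1,204M"]))  # 0.0
```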
Our flagship benchmark, FinanceQA, reveals that even state-of-the-art models fail approximately 50% of professional tasks. The initial leaderboard shows performance across FinanceQA's three task types: basic tactical analysis, assumption-based reasoning, and conceptual understanding.
By identifying where models fail, we provide the roadmap for developing AI that can actually be trusted with financial analysis. We've open-sourced our evaluation framework and dataset to accelerate progress toward AI that meets the industry's exacting requirements.
Because when trillions in capital allocation depend on your analysis, "pretty good" isn't good enough.
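For readers who want to reproduce scores against the open-sourced dataset, a minimal harness might look like the sketch below. The Hugging Face dataset identifier and the question/answer field names are assumptions for illustration; check the released repository for the actual paths and schema.

```python
from datasets import load_dataset  # pip install datasets

# Dataset id and field names below are assumed for illustration only.
ds = load_dataset("AfterQuery/FinanceQA", split="test")

def run_eval(model_fn, dataset) -> float:
    """Score a callable (question string -> answer string) with exact match."""
    correct = 0
    for row in dataset:
        prediction = model_fn(row["question"]).strip().lower()   # assumed field
        correct += prediction == row["answer"].strip().lower()   # assumed field
    return correct / len(dataset)

# Example: a trivial baseline that always abstains scores 0.
print(run_eval(lambda q: "", ds))
```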
FinanceQA represents a paradigm shift in how we evaluate AI for finance. Unlike existing benchmarks that test basic information retrieval, FinanceQA mirrors the actual work performed by analysts at top-tier financial institutions every day.
Each task type reflects different aspects of professional financial analysis, from tactical calculations to high-level reasoning.
**Basic Tactical.** Fundamental financial calculations that analysts perform daily, from hand-spreading metrics to calculating diluted shares outstanding. While 'basic' in name, these require precise adherence to accounting standards and the ability to work with primary source documents such as 10-Ks.
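To make "hand-spreading" concrete, below is a sketch of one such calculation, the treasury stock method for diluted shares outstanding; the figures are invented for illustration.

```python
def diluted_shares_treasury_method(basic_shares: float,
                                   options_outstanding: float,
                                   strike_price: float,
                                   current_price: float) -> float:
    """Treasury stock method: option exercise proceeds are assumed to
    repurchase shares at the current price; only the net new shares dilute."""
    if current_price <= strike_price:
        return basic_shares  # out-of-the-money options are anti-dilutive
    proceeds = options_outstanding * strike_price
    shares_repurchased = proceeds / current_price
    return basic_shares + options_outstanding - shares_repurchased

# Invented figures: 100M basic shares, 5M options struck at $20, stock at $50.
print(diluted_shares_treasury_method(100e6, 5e6, 20.0, 50.0))  # 103,000,000
```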
**Assumption-Based.** The true test of financial acumen: these tasks require models to work with incomplete information, just as real analysts do. When data is missing, can the AI make logical, defensible assumptions? Current models achieve only 8.4% accuracy on these critical tasks.
**Conceptual.** These tasks evaluate deeper financial reasoning: understanding relationships between metrics, applying valuation principles, and demonstrating mastery of accounting concepts. While models perform better here, the gap between conceptual knowledge and practical application remains stark.
Four critical insights that define the current state and future direction of AI in professional finance.
**1.** In finance, anything below near-perfect accuracy provides minimal value. Unlike other domains where 80% might be useful, financial analysis demands precision: errors compound, and verifying flawed output often takes longer than doing the work from scratch.
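A back-of-the-envelope illustration of how errors compound (the per-step figure is our assumption, not a measured number): even a model that is right 95% of the time per calculation produces a clean 20-step analysis barely a third of the time.

```python
# Illustrative: per-step reliability compounds across dependent calculations.
per_step_accuracy = 0.95   # assumed reliability of each calculation
steps = 20                 # chained calculations in one analysis
print(per_step_accuracy ** steps)  # ~0.358: roughly 1 in 3 analyses survive intact
```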
**2.** Professional analysts don't just pull numbers from Bloomberg. They reconstruct calculations from primary sources, verify management's figures, and apply industry-specific adjustments. Current AI models consistently fail at this fundamental requirement.
**3.** The most dramatic finding: models achieve less than 5% accuracy when required to handle incomplete information and generate reasonable assumptions. Yet this represents a significant portion of real analyst work.
**4.** Professional finance has rules: specific, well-defined accounting conventions for every calculation. Current AI models consistently violate these conventions, defaulting to simplified math. It's not about being "close enough"; it's about following established methodologies.
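One concrete illustration of such a convention (our example, not one drawn from FinanceQA itself): practitioners typically compute net working capital excluding cash and short-term debt, while a naive reading of the balance sheet does not, and the two answers diverge.

```python
# Invented balance-sheet figures, in $M, to show the convention gap.
current_assets, cash = 500.0, 120.0
current_liabilities, short_term_debt = 300.0, 80.0

naive_nwc = current_assets - current_liabilities  # textbook-naive: 200.0
operating_nwc = (current_assets - cash) - (current_liabilities - short_term_debt)  # convention: 160.0
print(naive_nwc, operating_nwc)  # same inputs, different answers under different conventions
```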