Popular repositories

  1. vader Public

    Java 10 3

  2. anvil Public

    Python 8 10

  3. FinanceQA Public

    FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities in Large Language Models

    6 1

  4. IDE-Bench Public

    Comprehensive framework for evaluating AI IDE agents on real-world, cross-stack SWE tasks

    Python 4 9

  5. swift-anvil Public

    Python 3

  6. harbor Public

    Forked from harbor-framework/harbor

    Harbor is a framework for running agent evaluations and creating and using RL environments.

    Python 1

Repositories

Showing 10 of 10 repositories

People

This organization has no public members. You must be a member to see who’s a part of this organization.
