BixBench

Prime Intellect

$800

Exclusives: Open to everyone

Description

Organization: Future House

Paper: https://arxiv.org/abs/2503.00096

Code: https://github.com/Future-House/BixBench

Difficulty: Hard

Contributor chat

May 25

/attempt I can take a scoped first pass on the BixBench bounty. Proposal: start with a reproducible low-cost integration slice rather than a full 24-48h agentic eval. I would add a custom-agent adapter template, a tiny smoke-test configuration over a public or mocked trajectory format, deterministic postprocessing checks, and a README that documents Hugging Face, Docker, API key, runtime, and cost-control requirements before a full benchmark run. That should make it easy to plug in another agent and verify the trajectory format before spending credits. If you want the bounty aimed at a different deliverable, I can adjust before building. Contact: hirethomas.ai@proton.me.

19:28

Follow-up proof for my BixBench proposal: https://gist.github.com/hirethomas-ai/373d856a3c1ddfbb9b45d8e03ef5ec51. Local pytest passed 4 tests on 2026-05-25. It avoids full agentic eval, Hugging Face pulls, Docker, external APIs, and spend.

20:46
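
To make the proposal above concrete, here is a minimal sketch of a smoke test over a mocked trajectory format with a deterministic postprocessing check. It is not taken from the linked gist or from the BixBench repository. The field names (`problem_id`, `agent_answer`, `ideal_answer`), the helper functions, and the mock records are all hypothetical, and the real BixBench trajectory schema may differ. The sketch runs without Hugging Face, Docker, API keys, or spend.

```python
from typing import Any

# Hypothetical schema: these field names are assumptions for illustration,
# not the actual BixBench trajectory format.
REQUIRED_FIELDS = {"problem_id": str, "agent_answer": str, "ideal_answer": str}


def validate_trajectory(traj: dict[str, Any]) -> list[str]:
    """Return a list of format problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in traj:
            problems.append(f"missing field: {field}")
        elif not isinstance(traj[field], expected_type):
            problems.append(f"wrong type for {field}: {type(traj[field]).__name__}")
    return problems


def normalize_answer(text: str) -> str:
    """Deterministic postprocessing: trim, lowercase, collapse whitespace."""
    return " ".join(text.strip().lower().split())


def exact_match(traj: dict[str, Any]) -> bool:
    """Compare normalized agent and reference answers."""
    return normalize_answer(traj["agent_answer"]) == normalize_answer(traj["ideal_answer"])


# Mocked records standing in for real agent output (invented for this sketch).
MOCK_TRAJECTORIES = [
    {"problem_id": "mock-1", "agent_answer": "  TP53 ", "ideal_answer": "tp53"},
    {"problem_id": "mock-2", "agent_answer": "42", "ideal_answer": "41"},
    {"problem_id": "mock-3", "agent_answer": 7},  # malformed on purpose
]

if __name__ == "__main__":
    for traj in MOCK_TRAJECTORIES:
        problems = validate_trajectory(traj)
        if problems:
            print(traj.get("problem_id"), "INVALID", problems)
        else:
            print(traj["problem_id"], "match" if exact_match(traj) else "mismatch")
```

Checking the record format before scoring means a malformed trajectory fails fast and cheaply, which is the point of running a smoke test before committing to a full benchmark run.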
