PR
Writing-Zero
Prime Intellect
$1,200

Exclusives

Open to everyone

Description

Paper: https://arxiv.org/pdf/2506.00103

Difficulty: Very Hard

Notes: A full solution should include (1) a pipeline to generate pairwise training samples from existing LLMs plus rubrics or from public sources, (2) an environment to train the GenRM, and (3) an environment to train the main model using the GenRM. This requires some creative decision-making, so discuss your proposal with Will before getting too deep into it. Sufficient compute for training experiments will be provided, conditional on implementation progress.
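
As a rough illustration of the first component only, pairwise samples could be derived from rubric-scored candidate responses along the following lines. This is a minimal sketch, not the paper's method: the names `ScoredResponse`, `PairwiseSample`, `build_pairs`, and the `min_margin` threshold are all hypothetical, and the scores are assumed to come from some external rubric-based judge.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ScoredResponse:
    """A candidate response with a rubric score (hypothetical structure)."""
    text: str
    score: float  # assumed to come from a rubric-based judge; higher is better


@dataclass(frozen=True)
class PairwiseSample:
    """One pairwise training sample: a prompt plus a chosen/rejected pair."""
    prompt: str
    chosen: str
    rejected: str


def build_pairs(prompt, responses, min_margin=1.0):
    """Turn scored responses for one prompt into pairwise samples.

    Pairs whose score gap is below `min_margin` are skipped, on the
    assumption that near-ties give noisy preference labels.
    """
    pairs = []
    for a, b in combinations(responses, 2):
        if abs(a.score - b.score) < min_margin:
            continue  # gap too small to trust as a preference
        chosen, rejected = (a, b) if a.score > b.score else (b, a)
        pairs.append(PairwiseSample(prompt, chosen.text, rejected.text))
    return pairs


if __name__ == "__main__":
    candidates = [
        ScoredResponse("draft A", 8.0),
        ScoredResponse("draft B", 5.0),
        ScoredResponse("draft C", 7.5),
    ]
    for sample in build_pairs("Write a short story about rain.", candidates):
        print(sample.chosen, ">", sample.rejected)
```

The margin filter is one possible design choice among many; the actual sampling and filtering strategy is part of what the proposal should settle.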
