/claim #2529

This pull request introduces Trajectory Accuracy, a new LLM-as-a-judge evaluation metric for scoring ReAct-style agent trajectories.

✅ What’s included:

  • New metric logic in trajectory_accuracy.py
  • Complete test suite (test_trajectory_accuracy_judge.py) with basic, edge-case, and complex trajectory tests
  • Integrates with the existing Opik LLM evaluation framework
  • Designed using scoring principles from the ReAct paper and LangChain's TrajectoryEvalChain
  • Returns a score (float, 0.0 to 1.0) and an explanation (string); a usage sketch follows this list
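
For context, here is a minimal usage sketch written in the style of the existing Opik judge metrics. The `TrajectoryAccuracy` class name, the `goal`/`trajectory`/`final_result` keyword arguments, and the example data are illustrative assumptions based on this description, not confirmed API:

```python
# Hypothetical usage sketch only: names and arguments below mirror how existing
# Opik judge metrics (e.g. Hallucination) are called, and are not confirmed API.
from opik.evaluation.metrics import TrajectoryAccuracy

metric = TrajectoryAccuracy()

result = metric.score(
    goal="Report the current temperature in Paris in Celsius.",
    trajectory=[
        {
            "thought": "I need live data, so I should call the weather tool.",
            "action": "get_weather(city='Paris', units='metric')",
            "observation": "18°C, partly cloudy",
        },
    ],
    final_result="It is currently 18°C and partly cloudy in Paris.",
)

print(result.value)   # float between 0.0 and 1.0
print(result.reason)  # explanation string from the judge
```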

🎥 Demo video: [Drive]

This metric assesses five dimensions (see the prompt sketch after the list):

  1. Reasoning Quality
  2. Action Appropriateness
  3. Observation Integration
  4. Goal Achievement
  5. Efficiency
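
To illustrate how these five dimensions can drive the judge, here is a simplified prompt template of the kind such a metric might use; the wording, output format, and any weighting in trajectory_accuracy.py may differ:

```python
# Illustrative only: one way a judge prompt can frame the five dimensions and
# request a single 0.0-1.0 score plus an explanation. The actual template in
# trajectory_accuracy.py may differ.
TRAJECTORY_JUDGE_PROMPT = """You are evaluating an agent's ReAct trajectory.

Goal:
{goal}

Trajectory (thought / action / observation steps):
{trajectory}

Final result:
{final_result}

Rate the trajectory on:
1. Reasoning quality: are the thoughts coherent and justified?
2. Action appropriateness: do the actions follow from the thoughts?
3. Observation integration: are tool outputs used in subsequent steps?
4. Goal achievement: does the final result satisfy the goal?
5. Efficiency: were redundant or unnecessary steps avoided?

Return JSON: {{"score": <float between 0.0 and 1.0>, "explanation": "<short justification>"}}
"""
```

Asking the judge for structured JSON keeps parsing deterministic and matches the float-plus-string return shape described above.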

All test cases pass, and the implementation aligns with the style and structure of existing metrics (e.g., Hallucination).

Let me know if you’d like further adjustments — happy to iterate!
