/claim #2529
This pull request introduces a new LLM-as-a-judge evaluation metric called Trajectory Accuracy, based on the ReAct agent framework.
✅ What’s included:
- `trajectory_accuracy.py`: the metric implementation
- `test_trajectory_accuracy_judge.py`: unit tests covering basic, edge-case, and complex trajectories

🎥 Demo video: [Drive]
This metric assesses how well an agent's ReAct-style trajectory (its thought, action, and observation steps) progresses toward the stated goal.
All test cases pass successfully, and the implementation aligns with the style and structure of existing metrics (e.g. Hallucination).
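For reviewers unfamiliar with the approach, here is a minimal sketch of how a trajectory-accuracy judge can structure a ReAct trajectory into a prompt for the judging LLM. The class and function names, and the prompt wording, are illustrative assumptions for this sketch, not the actual code in `trajectory_accuracy.py`:

```python
from dataclasses import dataclass

# Illustrative sketch only: names and prompt wording are assumptions,
# not the implementation submitted in this PR.

@dataclass
class TrajectoryStep:
    """One ReAct step: the agent's thought, the action it took,
    and the observation it received back."""
    thought: str
    action: str
    observation: str

def format_judge_prompt(goal: str, steps: list[TrajectoryStep]) -> str:
    """Render the full trajectory as text so a judge LLM can score
    how accurately the steps progress toward the goal."""
    lines = [f"Goal: {goal}", "Trajectory:"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"Step {i}:")
        lines.append(f"  Thought: {step.thought}")
        lines.append(f"  Action: {step.action}")
        lines.append(f"  Observation: {step.observation}")
    lines.append("Rate the trajectory's accuracy from 0.0 to 1.0.")
    return "\n".join(lines)
```

In a real metric, the rendered prompt would be sent to the judge model and the returned score parsed into a numeric result, mirroring how existing judge metrics like Hallucination are structured.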
Let me know if you’d like further adjustments — happy to iterate!
Kaan (@kaan-dogan)