/claim #2529
This pull request introduces a new LLM-as-a-judge evaluation metric called Trajectory Accuracy, based on the ReAct agent framework.
✅ What’s included:
- `trajectory_accuracy.py`: the metric implementation
- `test_trajectory_accuracy_judge.py`: unit tests covering basic, edge-case, and complex trajectories

🎥 Demo video: [Drive]
This metric assesses how well an agent's ReAct-style trajectory (its thought, action, and observation steps) progresses toward the stated goal.
All test cases pass successfully, and the implementation aligns with the style and structure of existing metrics (e.g. Hallucination).
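For reviewers unfamiliar with the approach, here is a minimal sketch of how a trajectory-accuracy judge can structure a ReAct trajectory into a prompt for the judging LLM. The class and function names, and the prompt wording, are illustrative assumptions for this sketch, not the actual code in `trajectory_accuracy.py`:

```python
from dataclasses import dataclass

# Illustrative sketch only: names and prompt wording are assumptions,
# not the implementation submitted in this PR.

@dataclass
class TrajectoryStep:
    """One ReAct step: the agent's thought, the action it took,
    and the observation it received back."""
    thought: str
    action: str
    observation: str

def format_judge_prompt(goal: str, steps: list[TrajectoryStep]) -> str:
    """Render the full trajectory as text so a judge LLM can score
    how accurately the steps progress toward the goal."""
    lines = [f"Goal: {goal}", "Trajectory:"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"Step {i}:")
        lines.append(f"  Thought: {step.thought}")
        lines.append(f"  Action: {step.action}")
        lines.append(f"  Observation: {step.observation}")
    lines.append("Rate the trajectory's accuracy from 0.0 to 1.0.")
    return "\n".join(lines)
```

In a real metric, the rendered prompt would be sent to the judge model and the returned score parsed into a numeric result, mirroring how existing judge metrics like Hallucination are structured.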
Let me know if you’d like further adjustments — happy to iterate!
Kaan (@kaan-dogan)