Implemented the “Structured Output Compliance” evaluation metric, which validates model outputs as JSON/JSON-LD and returns a boolean result plus a “reason.”
This extends the LLM-as-a-judge evaluation in both the frontend (Online Evaluation tab) and the Python SDK.
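For reviewers, here is a minimal sketch of the boolean-plus-reason contract described above. It is not the code in this PR: the actual metric is LLM-as-a-judge, while this sketch uses a deterministic `json.loads` check, and the names `ComplianceResult` and `check_structured_output` are hypothetical. It also does not check JSON-LD specifics.

```python
import json
from dataclasses import dataclass


@dataclass
class ComplianceResult:
    # Hypothetical result type for illustration; not the SDK's actual class.
    passed: bool
    reason: str


def check_structured_output(output: str) -> ComplianceResult:
    """Report whether `output` parses as a JSON object or array, with a reason."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        # Surface the parser's message and offset so the reason is actionable.
        return ComplianceResult(False, f"Invalid JSON: {exc.msg} at position {exc.pos}")
    if not isinstance(parsed, (dict, list)):
        # Bare scalars such as 42 or "text" parse, but are not structured output.
        return ComplianceResult(False, "Valid JSON, but not an object or array")
    return ComplianceResult(True, "Output is valid JSON")
```

A judge-based metric would replace the `json.loads` step with a model call but could return the same two fields.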
Closes #2558
Resolves #2528
/claim #2528
Documentation: /docs/evaluation/metrics/structure_output_compliance.mdx, with new metric details
Video: https://github.com/user-attachments/assets/47ffd3e9-6642-4678-9e72-87765c747bac
Vikas_pal8923
@Vikaspal8923
Comet
@comet-ml