Details

Implemented the “Structured Output Compliance” evaluation metric, which checks whether a model’s output is valid JSON/JSON-LD and returns a boolean result along with a “reason.”
This extends the LLM-as-a-judge evaluation in both the frontend (Online Evaluation tab) and the Python SDK.
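A minimal usage sketch of how the metric might be called from the Python SDK. The import path, the `StructuredOutputCompliance` class name, the `score()` signature, and the returned fields are assumptions modeled on the existing `opik.evaluation.metrics` pattern, not confirmed API:

```python
# Usage sketch only: class name and score() signature are assumptions
# based on the existing opik.evaluation.metrics pattern.
from opik.evaluation.metrics import StructuredOutputCompliance

metric = StructuredOutputCompliance()

# The judge checks whether the output is valid JSON/JSON-LD and returns
# a boolean-style value plus a textual reason.
result = metric.score(output='{"name": "Opik", "valid": true}')
print(result.value)   # e.g. True / 1.0 for compliant output
print(result.reason)  # e.g. "Output is valid JSON."
```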

Change checklist

  • Code follows repository coding style
  • Added/updated unit and integration tests
  • Documentation updated where needed

Issues

Closes #2558
Resolves #2528
/claim #2528

Testing

  • Verified that the metric appears and works in the Online Evaluation tab (frontend)
  • Ran SDK unit and integration tests with sample structured outputs (see the test sketch below)
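A rough test sketch of the kind of check these tests would cover. The metric name and `score()` signature are assumptions carried over from the sketch above; in practice the LLM judge call would be mocked in unit tests and exercised against a live model only in integration tests:

```python
# Hypothetical test sketch, not the actual test suite from this PR.
import pytest

from opik.evaluation.metrics import StructuredOutputCompliance


@pytest.mark.parametrize(
    "output,expected",
    [
        ('{"status": "ok", "items": [1, 2, 3]}', True),   # valid JSON
        ('{"status": "ok", "items": [1, 2, 3]', False),   # missing closing brace
    ],
)
def test_structured_output_compliance(output, expected):
    metric = StructuredOutputCompliance()
    result = metric.score(output=output)
    # The metric is described as returning a boolean result plus a reason.
    assert bool(result.value) is expected
    assert result.reason
```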

Documentation

  • Updated /docs/evaluation/metrics/structure_output_compliance.mdx with new metric details
  • Updated changelog with feature entry
  • Added usage example in SDK docs

Demo

Video: https://github.com/user-attachments/assets/47ffd3e9-6642-4678-9e72-87765c747bac

Claim

  • Total prize pool: $50
  • Total paid: $0
  • Status: Pending
  • Submitted: June 23, 2025
  • Last updated: June 23, 2025

Contributors

  • Vikas_pal8923 (@Vikaspal8923): 100%

Sponsors

  • Comet (@comet-ml): $50