Details

Implemented the “Structured Output Compliance” evaluation metric, which checks whether a model’s output is valid JSON/JSON-LD and returns a boolean result along with a “reason.”
This extends the LLM-as-a-judge evaluation in both the frontend (Online Evaluation tab) and the Python SDK.
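A minimal usage sketch of how the metric might be called from the Python SDK. The import path, the `StructuredOutputCompliance` class name, the `score()` signature, and the returned fields are assumptions modeled on the existing `opik.evaluation.metrics` pattern, not confirmed API:

```python
# Usage sketch only: class name and score() signature are assumptions
# based on the existing opik.evaluation.metrics pattern.
from opik.evaluation.metrics import StructuredOutputCompliance

metric = StructuredOutputCompliance()

# The judge checks whether the output is valid JSON/JSON-LD and returns
# a boolean-style value plus a textual reason.
result = metric.score(output='{"name": "Opik", "valid": true}')
print(result.value)   # e.g. True / 1.0 for compliant output
print(result.reason)  # e.g. "Output is valid JSON."
```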

Change checklist

  • Code follows repository coding style
  • Added/updated unit and integration tests
  • Documentation updated where needed

Issues

Closes #2558
Resolves #2528
/claim #2528

Testing

  • Verified that the metric appears and works in the Online Evaluation tab (frontend)
  • Ran SDK unit and integration tests with sample structured outputs (see the test sketch below)
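A rough test sketch of the kind of check these tests would cover. The metric name and `score()` signature are assumptions carried over from the sketch above; in practice the LLM judge call would be mocked in unit tests and exercised against a live model only in integration tests:

```python
# Hypothetical test sketch, not the actual test suite from this PR.
import pytest

from opik.evaluation.metrics import StructuredOutputCompliance


@pytest.mark.parametrize(
    "output,expected",
    [
        ('{"status": "ok", "items": [1, 2, 3]}', True),   # valid JSON
        ('{"status": "ok", "items": [1, 2, 3]', False),   # missing closing brace
    ],
)
def test_structured_output_compliance(output, expected):
    metric = StructuredOutputCompliance()
    result = metric.score(output=output)
    # The metric is described as returning a boolean result plus a reason.
    assert bool(result.value) is expected
    assert result.reason
```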

Documentation

  • Updated /docs/evaluation/metrics/structure_output_compliance.mdx with new metric details
  • Updated changelog with feature entry
  • Added usage example in SDK docs

Demo

Video: https://github.com/user-attachments/assets/47ffd3e9-6642-4678-9e72-87765c747bac

Claim

  • Total prize pool: $50
  • Total paid: $0
  • Status: Pending
  • Submitted: June 23, 2025
  • Last updated: June 23, 2025

Contributors

  • Vikas_pal8923 (@Vikaspal8923): 100%

Sponsors

  • Comet (@comet-ml): $50