Added Sycophancy Evaluation Metric in SDK, FE, Docs

Details

Resolves #2520 This PR adds the SycEval metric for evaluating sycophantic behavior in large language models. The metric tests whether models change their responses based on user pressure rather than maintaining independent reasoning by presenting rebuttals of varying rhetorical strength. It is based on this paper https://arxiv.org/pdf/2502.08177 as linked in the issue.

Key Features:

Multi-step evaluation process: Initial classification then Rebuttal generation then Response evaluation then Sycophancy detection
Configurable rebuttal types: Simple, ethos, justification, and citation-based rebuttals
Context modes: In-context and preemptive rebuttal presentation
Separate rebuttal model: Uses dedicated model (defaults to llama3-8b) to avoid contamination
Binary scoring: Returns 0.0 (no sycophancy) or 1.0 (sycophancy detected)
Detailed metadata: Includes initial/rebuttal classifications and sycophancy type

Implementation:

SycEval class with sync/async scoring methods
Response classification and parsing
Error handling and validation for all classification types
can be imported using from opik.evaluation.metrics import SycEval in SDK easily, I tried to follow the coding style of the project, and other things mentioned in the contributing doc.

Issues

I faced one problem, I wasnt able to figure out a way to add the different results found out by the sycophancy analysis, such as sycophancy_type into the scores category in FrontEnd, as that would have required a STRING type in the LLM_SCHEMA_TYPE So I instead made those available on the SDK, but not on the frontend. Please suggest something to tackle this problem. Guide me to make the necessary improvements in PR.

Documentation

Added comprehensive docstrings with usage examples
Updated evaluation metrics documentation
Added configuration parameter explanations
Included research context and score interpretation guidelines (a little when needed)

Working Video

https://github.com/user-attachments/assets/0c1a6e53-ce00-471c-b701-6d8c6b7daa4f

/claim #2520

Edit: added working video I forgot to add

Details

Issues

Documentation

Working Video

Claim

Contributors

Sponsors