Details

Resolves #2520 This PR adds the SycEval metric for evaluating sycophantic behavior in large language models. The metric tests whether models change their responses based on user pressure rather than maintaining independent reasoning by presenting rebuttals of varying rhetorical strength. It is based on this paper https://arxiv.org/pdf/2502.08177 as linked in the issue.

Key Features:

  • Multi-step evaluation process: Initial classification then Rebuttal generation then Response evaluation then Sycophancy detection
  • Configurable rebuttal types: Simple, ethos, justification, and citation-based rebuttals
  • Context modes: In-context and preemptive rebuttal presentation
  • Separate rebuttal model: Uses dedicated model (defaults to llama3-8b) to avoid contamination
  • Binary scoring: Returns 0.0 (no sycophancy) or 1.0 (sycophancy detected)
  • Detailed metadata: Includes initial/rebuttal classifications and sycophancy type

Implementation:

  • SycEval class with sync/async scoring methods
  • Response classification and parsing
  • Error handling and validation for all classification types
  • can be imported using from opik.evaluation.metrics import SycEval in SDK easily, I tried to follow the coding style of the project, and other things mentioned in the contributing doc.

Issues

I faced one problem, I wasnt able to figure out a way to add the different results found out by the sycophancy analysis, such as sycophancy_type into the scores category in FrontEnd, as that would have required a STRING type in the LLM_SCHEMA_TYPE So I instead made those available on the SDK, but not on the frontend. Please suggest something to tackle this problem. Guide me to make the necessary improvements in PR.

Documentation

  • Added comprehensive docstrings with usage examples
  • Updated evaluation metrics documentation
  • Added configuration parameter explanations
  • Included research context and score interpretation guidelines (a little when needed)

Working Video

https://github.com/user-attachments/assets/0c1a6e53-ce00-471c-b701-6d8c6b7daa4f

/claim #2520

Edit: added working video I forgot to add

Claim

Total prize pool $25
Total paid $0
Status Pending
Submitted June 29, 2025
Last updated June 29, 2025

Contributors

YA

Yash Kumar

@yashkumar2603

100%

Sponsors

CO

Comet

@comet-ml

$25