feat: add MiniMax provider support
archestra-ai/archestra#2527

[Provider] Add MiniMax Support

Closes #1855 /claim #1855

Summary

This PR adds complete support for MiniMax AI as an LLM provider, covering the backend proxy, frontend chat integration, model pricing, and E2E tests.

Changes

Backend Integration

Core Implementation:

  • Adapter (backend/src/routes/proxy/adapterV2/minimax.ts)

    • OpenAI-compatible request/response adapter with MiniMax-specific extensions
    • Streaming support with SSE parsing
    • Token estimation using tiktoken (MiniMax streaming returns usage: null)
    • Reasoning content extraction from reasoning_details array
    • Tool call handling and formatting
    • TOON compression for tool results
    • Model optimization support
    • Custom error handling for MiniMax’s unique error format
  • Type Definitions (backend/src/types/llm-providers/minimax/ - 4 files)

    • Complete Zod schemas for API types (api.ts, messages.ts, tools.ts, index.ts)
    • Message, tool, and request/response schemas
    • Extended schema for reasoning_details thinking content
    • Stream chunk type allows role: "" (empty string) - MiniMax quirk
    • OpenAI-compatible format with MiniMax extensions
  • Routes (backend/src/routes/proxy/routesv2/minimax.ts)

    • HTTP proxy endpoints with agent support
    • Chat completions (streaming and non-streaming)
    • Error handling and validation
    • Workaround for missing /v1/models endpoint - uses hardcoded model list
  • Database Migration (backend/src/database/migrations/0131_add_minimax_token_prices.sql)

    • Token pricing for all MiniMax models (cost math sketched after this list)
    • MiniMax-M2: $0.30 input / $1.20 output per million tokens
    • MiniMax-M2.1: $0.30 input / $1.20 output per million tokens
    • MiniMax-M2.1-lightning: $0.30 input / $2.40 output per million tokens
  • Models Dev Client Integration (backend/src/clients/models-dev-client.ts)

    • Added minimax provider mapping for automatic price sync
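
For reference, the cost math implied by the migration's per-million-token prices, as a minimal TypeScript sketch (the pricing table and helper here are illustrative, not the actual database schema):

```typescript
// Illustrative pricing table mirroring the migration values (USD per million tokens).
const MINIMAX_PRICES: Record<string, { input: number; output: number }> = {
  "MiniMax-M2": { input: 0.3, output: 1.2 },
  "MiniMax-M2.1": { input: 0.3, output: 1.2 },
  "MiniMax-M2.1-lightning": { input: 0.3, output: 2.4 },
};

// Cost in USD for a single interaction.
function interactionCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = MINIMAX_PRICES[model];
  if (!price) throw new Error(`Unknown MiniMax model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Example: 10k input + 2k output tokens on MiniMax-M2 costs $0.0054.
console.log(interactionCost("MiniMax-M2", 10_000, 2_000));
```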

Frontend Integration

  • Interaction Handler (frontend/src/lib/llmProviders/minimax.ts)

    • Message parsing and rendering
    • Reasoning content display (thinking)
    • Tool call display
    • Token usage calculation
    • Stop reason handling (sketched after this list)
  • UI Components

    • Model selector integration
    • API key form with MiniMax branding
    • Provider display name
    • Chat interface support
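
As an illustration of the stop-reason handling above, a minimal sketch (the UI labels and helper name are hypothetical; MiniMax follows the OpenAI finish_reason convention):

```typescript
// Hypothetical mapping from OpenAI-style finish_reason values to display labels.
type StopReason = "stop" | "length" | "tool_calls" | null;

function describeStopReason(reason: StopReason): string {
  switch (reason) {
    case "stop":
      return "Completed";
    case "length":
      return "Truncated: max tokens reached";
    case "tool_calls":
      return "Paused: waiting for tool results";
    default:
      return "Stream ended without a finish reason";
  }
}
```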

Key MiniMax Differences

1. No Models Endpoint

MiniMax doesn’t provide a /v1/models endpoint. We use a hardcoded list:

  • MiniMax-M2
  • MiniMax-M2.1
  • MiniMax-M2.1-lightning (fastest)
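
A minimal sketch of the workaround, shaped like an OpenAI /v1/models response so existing UI code can consume it (the authoritative list lives in backend/src/routes/proxy/routesv2/minimax.ts):

```typescript
// Hardcoded stand-in for the missing /v1/models endpoint.
const MINIMAX_MODELS = [
  { id: "MiniMax-M2", object: "model", owned_by: "minimax" },
  { id: "MiniMax-M2.1", object: "model", owned_by: "minimax" },
  { id: "MiniMax-M2.1-lightning", object: "model", owned_by: "minimax" },
] as const;

// Mimics the OpenAI list format: { object: "list", data: [...] }.
function listModels() {
  return { object: "list", data: MINIMAX_MODELS };
}
```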

2. Streaming Usage Data

Problem: The MiniMax streaming API returns "usage": null in every chunk, so no token counts are available.
Solution: Token estimation using tiktoken that:

  • Counts input tokens from request messages
  • Counts output tokens from accumulated text, reasoning, and tool calls
  • Runs automatically at stream end to ensure interactions are logged
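
A minimal sketch of the estimation approach with tiktoken (the cl100k_base encoding is an approximation, since MiniMax does not publish its tokenizer, and the function is illustrative rather than the adapter's exact code):

```typescript
import { get_encoding } from "tiktoken";

// Rough token estimate for a MiniMax stream, run once at stream end.
function estimateUsage(
  requestMessages: { role: string; content: string }[],
  accumulated: { text: string; reasoning: string; toolCallJson: string },
) {
  const enc = get_encoding("cl100k_base");
  try {
    // Input tokens: sum over all request messages.
    const inputTokens = requestMessages.reduce(
      (sum, m) => sum + enc.encode(m.content).length,
      0,
    );
    // Output tokens: accumulated text + reasoning + serialized tool calls.
    const outputTokens =
      enc.encode(accumulated.text).length +
      enc.encode(accumulated.reasoning).length +
      enc.encode(accumulated.toolCallJson).length;
    return { inputTokens, outputTokens };
  } finally {
    enc.free(); // the WASM encoder must be freed explicitly
  }
}
```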

3. Reasoning Details

MiniMax supports extended thinking with reasoning_details array:

  • Enable with extra_body: { reasoning_split: true }
  • Returns thinking process separate from main content
  • Adapter extracts reasoning deltas and accumulates full text
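
A minimal sketch of enabling and accumulating the reasoning stream (the reasoning_split flag comes from the MiniMax extension described above; the delta field shapes are assumptions, not the adapter's exact types):

```typescript
// Request enabling split reasoning; reasoning_split goes through extra_body
// when calling via the OpenAI SDK.
const request = {
  model: "MiniMax-M2.1",
  stream: true,
  messages: [{ role: "user", content: "Why is the sky blue?" }],
  reasoning_split: true,
};

// Accumulate reasoning deltas separately from the main content.
// The shape of reasoning_details items is assumed here.
let reasoning = "";
let content = "";

function onDelta(delta: {
  content?: string;
  reasoning_details?: { text?: string }[];
}): void {
  content += delta.content ?? "";
  for (const d of delta.reasoning_details ?? []) {
    reasoning += d.text ?? "";
  }
}
```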

4. Stream Chunk Role Field

MiniMax sends "role": "" (empty string) in some stream chunks instead of "role": "assistant". Type schema updated to allow both.
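
A minimal illustration of how a Zod schema can admit the empty-string role (the full delta schema lives in backend/src/types/llm-providers/minimax/):

```typescript
import { z } from "zod";

// MiniMax sometimes emits "role": "" in stream chunks instead of "assistant",
// so the delta role accepts both, and may be absent entirely.
const streamDeltaRole = z.union([z.literal("assistant"), z.literal("")]).optional();

const streamDelta = z.object({
  role: streamDeltaRole,
  content: z.string().optional(),
});
```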

Streaming Support

  • Non-streaming responses: Fully supported with native usage data
  • Streaming responses: Fully supported with SSE + token estimation
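
For reference, the SSE framing the streaming path has to parse, as a minimal sketch (standard OpenAI-style chat-completions streaming; the parser here is illustrative, not the adapter's exact code):

```typescript
// Each SSE event line is "data: <json chunk>"; the stream terminates with
// "data: [DONE]". Non-data lines (comments, blanks) are ignored.
function parseSseLine(line: string): Record<string, unknown> | "done" | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return "done";
  return JSON.parse(payload) as Record<string, unknown>;
}
```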

Feature Completeness

LLM Proxy ✅

| Feature | Status | Notes |
| --- | --- | --- |
| Tool invocation | ✅ Supported | Full tool call format conversion |
| Tool persistence | ✅ Supported | Tool calls maintained across the conversation |
| Token/cost limits | ✅ Supported | Enforced per-request limits |
| Model optimization | ✅ Supported | Can switch to cheaper models |
| Tool results compression | ✅ Supported | TOON compression implemented |
| Dual LLM verification | ✅ Supported | Works with optimization rules |
| Metrics and observability | ✅ Supported | Full Prometheus metrics integration |
| Token estimation | ✅ Supported | Automatic for streaming (usage is null) |

Chat ✅

| Feature | Status | Notes |
| --- | --- | --- |
| Chat conversations | ✅ Works | Full message history support |
| Model listing | ✅ Works | Hardcoded model list (no API endpoint) |
| Model selection | ✅ Works | Dropdown with all available models |
| Streaming responses | ✅ Works | Real-time SSE streaming |
| Reasoning content | ✅ Works | Displays thinking process separately |
| Error handling | ✅ Works | MiniMax-specific error codes handled |
| API key management | ✅ Works | Personal/team/org-wide key hierarchy |
| Conversation titles | ✅ Works | Auto-generation using a fast model |
| Token tracking | ✅ Works | Estimated tokens logged to interactions |

Testing Infrastructure

  • E2E Test Coverage (5 test suites)
    • Tool invocation policies (tool-invocation.spec.ts)
    • Tool persistence (tool-persistence.spec.ts)
    • Tool result compression (tool-result-compression.spec.ts)
    • Model optimization (model-optimization.spec.ts)
    • Token cost limits (token-cost-limits.spec.ts)
  • WireMock Stubs (13 mapping files in helm/e2e-tests/mappings/)
    • Model listing
    • Chat completions with various scenarios
    • Tool invocation test cases
    • Compression and optimization scenarios
    • Cost limit testing
  • CI Configuration (.github/values-ci.yaml)

API Key Instructions

Obtaining an API Key

Visit the MiniMax Platform and generate an API key (a minimum recharge of $25 is required to use the API).

Demo Video

https://github.com/user-attachments/assets/4df301c2-be8b-425c-991c-d883743c284e

Testing Done

  • ✅ Chat conversation with MiniMax-M2.1-lightning
  • ✅ Verified streaming response with curl to actual API
  • ✅ Verified non-streaming response with curl to actual API
  • ✅ Token estimation accuracy verified (input/output/reasoning tokens)
  • ✅ Conversation title generation works
  • ✅ Interactions logged with cost data
  • ✅ Model listing shows all MiniMax models
  • ✅ pnpm lint passes
  • ✅ pnpm type-check passes

Documentation

Updated platform-supported-llm-providers.md
