Summary

  • Implement load balancing between multiple LLM deployments (OpenAI, Google PaLM/Gemini, Cohere)
  • Routing strategies: least-tokens (default), simple-shuffle, latency-based
  • Timeout/retry with exponential backoff via axios interceptors
  • Streaming support for all providers
  • Token usage tracking with cost calculation
  • Sentry and Posthog logging callbacks
  • Jsonnet configuration support
  • Mock servers for testing
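The retry behavior can be sketched as a standalone helper. This is a hypothetical illustration of the timeout/retry semantics described above (the function name `withRetries` and the base delay are assumptions, not the PR's actual implementation, which hooks into axios interceptors):

```typescript
// Hypothetical sketch: retry an async call with exponential backoff.
// The delay doubles on each failed attempt: base, 2x, 4x, ...
async function withRetries<T>(
    fn: () => Promise<T>,
    numRetries = 3,
    baseDelayMs = 500
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= numRetries; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt === numRetries) break; // out of retries, give up
            const delayMs = baseDelayMs * 2 ** attempt;
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
    throw lastError;
}
```

With `numRetries: 3` a request is attempted up to four times in total before the error is surfaced to the caller.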

Features Implemented

  1. Load Balancing: Picks the deployment that is below its rate limit and has used the fewest tokens
  2. Reliability: Timeouts, retries, exponential backoff
  3. Streaming: Full streaming support
  4. Token Usage: Tracks prompt/completion/total tokens and cost
  5. Logging: Sentry + Posthog callbacks
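The least-tokens strategy from item 1 can be sketched as a pure selection function. The `Deployment` shape and per-minute counters here are assumptions for illustration; the PR's internal bookkeeping may differ:

```typescript
// Hypothetical sketch of the least-tokens routing strategy: among
// deployments still under both their RPM and TPM limits, pick the one
// that has consumed the fewest tokens in the current minute.
interface Deployment {
    apiKey: string;
    rpm: number;                // requests-per-minute limit
    tpm: number;                // tokens-per-minute limit
    requestsThisMinute: number; // usage counters, reset every minute
    tokensThisMinute: number;
}

function pickLeastTokens(deployments: Deployment[]): Deployment | null {
    const eligible = deployments.filter(
        (d) => d.requestsThisMinute < d.rpm && d.tokensThisMinute < d.tpm
    );
    if (eligible.length === 0) return null; // every deployment is rate-limited
    return eligible.reduce((best, d) =>
        d.tokensThisMinute < best.tokensThisMinute ? d : best
    );
}
```

Filtering on both limits first is what keeps a cheap-but-throttled deployment from being selected; ties simply keep the earlier entry in the list.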

Tests

  • 8 end-to-end tests passing, covering all of the features above
  • Mock servers for OpenAI, Gemini, Cohere
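A mock provider server of the kind used in these tests can be as small as a single HTTP handler. This is a minimal sketch of a mock OpenAI chat-completions endpoint (the response fields mirror OpenAI's public schema; the exact fixtures used by the PR's tests are assumptions):

```typescript
import http from "node:http";

// Hypothetical minimal mock of POST /v1/chat/completions: returns a
// canned assistant reply plus a usage block so token tracking and cost
// calculation can be exercised without hitting a real provider.
const server = http.createServer((req, res) => {
    if (req.method === "POST" && req.url === "/v1/chat/completions") {
        res.writeHead(200, { "Content-Type": "application/json" });
        res.end(
            JSON.stringify({
                id: "chatcmpl-mock",
                object: "chat.completion",
                choices: [
                    {
                        index: 0,
                        message: { role: "assistant", content: "Hello!" },
                        finish_reason: "stop",
                    },
                ],
                usage: { prompt_tokens: 5, completion_tokens: 2, total_tokens: 7 },
            })
        );
    } else {
        res.writeHead(404);
        res.end();
    }
});

// Listen on an ephemeral port so parallel test runs do not collide.
server.listen(0);
```

Pointing a deployment's base URL at this server lets the E2E tests assert on routing, retries, and usage accounting deterministically.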

Usage Example

import { Router } from "@arakoodev/edgechains.js/ai";

const router = new Router({
    modelList: [
        { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-xxx", rpm: 3000, tpm: 90000 },
        { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-yyy", rpm: 3000, tpm: 90000 },
    ],
    routingStrategy: "least-tokens",
    numRetries: 3,
    timeout: 30000,
});

const response = await router.completion({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
});

/claim #286 closes #286

Claim

  • Total prize pool: $200
  • Total paid: $0
  • Status: Pending
  • Submitted: March 16, 2026
  • Last updated: March 16, 2026

Contributors

  • Matías J. Magni (@info3) — 100%

Sponsors

  • Arakoo.ai (@arakoodev) — $200