Summary

  • Implement load balancing between multiple LLM deployments (OpenAI, Google PaLM/Gemini, Cohere)
  • Routing strategies: least-tokens (default), simple-shuffle, latency-based
  • Timeout/retry with exponential backoff via axios interceptors
  • Streaming support for all providers
  • Token usage tracking with cost calculation
  • Sentry and Posthog logging callbacks
  • Jsonnet configuration support
  • Mock servers for testing
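The retry behavior can be sketched as a standalone helper. This is a hypothetical illustration of the timeout/retry semantics described above (the function name `withRetries` and the base delay are assumptions, not the PR's actual implementation, which hooks into axios interceptors):

```typescript
// Hypothetical sketch: retry an async call with exponential backoff.
// The delay doubles on each failed attempt: base, 2x, 4x, ...
async function withRetries<T>(
    fn: () => Promise<T>,
    numRetries = 3,
    baseDelayMs = 500
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= numRetries; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt === numRetries) break; // out of retries, give up
            const delayMs = baseDelayMs * 2 ** attempt;
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
    throw lastError;
}
```

With `numRetries: 3` a request is attempted up to four times in total before the error is surfaced to the caller.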

Features Implemented

  1. Load Balancing: Picks the deployment that is below its rate limit and has used the fewest tokens
  2. Reliability: Timeouts, retries, exponential backoff
  3. Streaming: Full streaming support
  4. Token Usage: Tracks prompt/completion/total tokens and cost
  5. Logging: Sentry + Posthog callbacks
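The least-tokens strategy from item 1 can be sketched as a pure selection function. The `Deployment` shape and per-minute counters here are assumptions for illustration; the PR's internal bookkeeping may differ:

```typescript
// Hypothetical sketch of the least-tokens routing strategy: among
// deployments still under both their RPM and TPM limits, pick the one
// that has consumed the fewest tokens in the current minute.
interface Deployment {
    apiKey: string;
    rpm: number;                // requests-per-minute limit
    tpm: number;                // tokens-per-minute limit
    requestsThisMinute: number; // usage counters, reset every minute
    tokensThisMinute: number;
}

function pickLeastTokens(deployments: Deployment[]): Deployment | null {
    const eligible = deployments.filter(
        (d) => d.requestsThisMinute < d.rpm && d.tokensThisMinute < d.tpm
    );
    if (eligible.length === 0) return null; // every deployment is rate-limited
    return eligible.reduce((best, d) =>
        d.tokensThisMinute < best.tokensThisMinute ? d : best
    );
}
```

Filtering on both limits first is what keeps a cheap-but-throttled deployment from being selected; ties simply keep the earlier entry in the list.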

Tests

  • 8 end-to-end tests passing, covering all of the features above
  • Mock servers for OpenAI, Gemini, Cohere
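A mock provider server of the kind used in these tests can be as small as a single HTTP handler. This is a minimal sketch of a mock OpenAI chat-completions endpoint (the response fields mirror OpenAI's public schema; the exact fixtures used by the PR's tests are assumptions):

```typescript
import http from "node:http";

// Hypothetical minimal mock of POST /v1/chat/completions: returns a
// canned assistant reply plus a usage block so token tracking and cost
// calculation can be exercised without hitting a real provider.
const server = http.createServer((req, res) => {
    if (req.method === "POST" && req.url === "/v1/chat/completions") {
        res.writeHead(200, { "Content-Type": "application/json" });
        res.end(
            JSON.stringify({
                id: "chatcmpl-mock",
                object: "chat.completion",
                choices: [
                    {
                        index: 0,
                        message: { role: "assistant", content: "Hello!" },
                        finish_reason: "stop",
                    },
                ],
                usage: { prompt_tokens: 5, completion_tokens: 2, total_tokens: 7 },
            })
        );
    } else {
        res.writeHead(404);
        res.end();
    }
});

// Listen on an ephemeral port so parallel test runs do not collide.
server.listen(0);
```

Pointing a deployment's base URL at this server lets the E2E tests assert on routing, retries, and usage accounting deterministically.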

Usage Example

import { Router } from "@arakoodev/edgechains.js/ai";

const router = new Router({
    modelList: [
        { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-xxx", rpm: 3000, tpm: 90000 },
        { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-yyy", rpm: 3000, tpm: 90000 },
    ],
    routingStrategy: "least-tokens",
    numRetries: 3,
    timeout: 30000,
});

const response = await router.completion({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
});

/claim #286 closes #286

Claim

  • Total prize pool: $200
  • Total paid: $0
  • Status: Pending
  • Submitted: March 16, 2026
  • Last updated: March 16, 2026

Contributors

  • Matías J. Magni (@info3) — 100%

Sponsors

  • Arakoo.ai (@arakoodev) — $200