Add Support for Multiple LLM Providers (OpenAI, Anthropic, HuggingFace/Local, etc.)

Closes: #4

Description

This PR makes the semantic chunking service provider-agnostic by introducing support for multiple LLM backends. The system is no longer limited to OpenAI: it can now use cloud-based, open-source, and self-hosted models, including Anthropic's Claude, Hugging Face-hosted models, and local models such as Llama and Mistral. A configuration switch selects and configures the desired backend for semantic chunking.


Goals

  • Add support for open-source LLMs (e.g., llama.cpp-based models, Mistral)
  • Add support for Anthropic models (Claude)
  • Support Hugging Face-hosted models or local inference with transformers/vLLM
  • Provide a configuration switch to select the model backend (openai, anthropic, huggingface, etc.)

Changes

  • Modular Model Provider System:
    • Introduced an abstract base class and concrete provider implementations (a minimal sketch follows this list) for:
      • OpenAI
      • Anthropic (Claude)
      • HuggingFace (local or hosted, e.g., Llama, Mistral)
  • Configurable Backend:
    • Added modelProvider and modelConfig options to select and configure the desired backend.
  • Requirements Updated:
    • Added dependencies for anthropic, transformers, and torch.
  • Documentation & Metadata:
    • Updated README.md and setup.py to clarify the new provider options and correct misleading statements about OCR support.
  • Backward Compatibility:
    • Default behavior remains unchanged for OpenAI users.
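
For context, the provider abstraction can be pictured roughly as follows. This is a minimal sketch only: the class and function names (BaseLLMProvider, get_provider, and so on) are illustrative placeholders, not the exact identifiers introduced in this PR.

from abc import ABC, abstractmethod
from typing import Optional


class BaseLLMProvider(ABC):
    """Common interface implemented by every model backend."""

    def __init__(self, model_config: Optional[dict] = None):
        self.model_config = model_config or {}

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send a prompt to the backend and return the completion text."""


class OpenAIProvider(BaseLLMProvider):
    def complete(self, prompt: str) -> str:
        ...  # call the OpenAI API


class AnthropicProvider(BaseLLMProvider):
    def complete(self, prompt: str) -> str:
        ...  # call the Anthropic API (Claude)


class HuggingFaceProvider(BaseLLMProvider):
    def complete(self, prompt: str) -> str:
        ...  # run local or hosted inference via transformers


# Registry that resolves the modelProvider option to an implementation:
PROVIDERS = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
    "huggingface": HuggingFaceProvider,
}


def get_provider(name: str, model_config: Optional[dict] = None) -> BaseLLMProvider:
    return PROVIDERS[name](model_config)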

Usage Examples

import Unsiloed

# OpenAI (default)
result = Unsiloed.process_sync({
    "filePath": "./test.pdf",
    "credentials": {"apiKey": "your-openai-key"},
    "strategy": "semantic"
})

# Anthropic
result = Unsiloed.process_sync({
    "filePath": "./test.pdf",
    "credentials": {"apiKey": "your-anthropic-key"},
    "strategy": "semantic",
    "modelProvider": "anthropic",
    "modelConfig": {"model": "claude-3-opus-20240229"}
})

# HuggingFace (local or hosted)
result = Unsiloed.process_sync({
    "filePath": "./test.pdf",
    "strategy": "semantic",
    "modelProvider": "huggingface",
    "modelConfig": {"model_name": "mistralai/Mistral-7B-Instruct-v0.2"}
})
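
Note that the HuggingFace example passes no credentials: local inference with transformers runs on the host machine and needs no API key, though hosted Hugging Face endpoints may still require a token.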

Testing

  • Verified semantic chunking with each supported provider (OpenAI, Anthropic, HuggingFace/local).
  • Confirmed that configuration switching works as expected.
  • Ensured backward compatibility for existing OpenAI users.
  • Checked that documentation and error messages are clear and accurate.

Dependencies

  • anthropic (for Claude models)
  • transformers and torch (for HuggingFace/local models)
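
Assuming no optional extras are defined in setup.py, the new dependencies can be installed directly:

pip install anthropic transformers torch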

Documentation

  • Updated README.md with new provider options and usage instructions.
  • Clarified that the project does not perform OCR, only text extraction and chunking.
  • Added docstrings and inline comments for maintainability.

/claim #4
