Closes: #4
This PR enhances the flexibility of the semantic chunking service by introducing support for multiple LLM providers. The system is no longer limited to OpenAI and can now leverage open-source, self-hosted, and cloud-based models, including Anthropic, HuggingFace-hosted, and local models (such as Llama, Mistral, and others). This update makes it easy to select and configure the desired backend for semantic chunking.
## Changes

- Added support for multiple LLM providers (`openai`, `anthropic`, `huggingface`, etc.).
- Introduced `modelProvider` and `modelConfig` options to select and configure the desired backend.
- Added optional dependencies: `anthropic`, `transformers`, and `torch`.
- Updated `README.md` and `setup.py` to clarify the new provider options and correct any misleading information about OCR.

## Usage

```python
# OpenAI (default)
result = Unsiloed.process_sync({
"filePath": "./test.pdf",
"credentials": {"apiKey": "your-openai-key"},
"strategy": "semantic"
})
# Anthropic
result = Unsiloed.process_sync({
"filePath": "./test.pdf",
"credentials": {"apiKey": "your-anthropic-key"},
"strategy": "semantic",
"modelProvider": "anthropic",
"modelConfig": {"model": "claude-3-opus-20240229"}
})
# HuggingFace (local or hosted)
result = Unsiloed.process_sync({
"filePath": "./test.pdf",
"strategy": "semantic",
"modelProvider": "huggingface",
"modelConfig": {"model_name": "mistralai/Mistral-7B-Instruct-v0.2"}
})
```
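For context on how the examples above map to backends, here is a minimal sketch of the kind of provider dispatch this feature implies. The function name and structure are illustrative assumptions, not the actual code in this PR:

```python
# Illustrative sketch only -- name and structure are assumptions,
# not the implementation in this PR.
from typing import Optional

def get_chunker_client(model_provider: Optional[str],
                       model_config: dict,
                       credentials: dict):
    """Return a client object for the requested LLM backend."""
    provider = model_provider or "openai"  # OpenAI remains the default
    if provider == "openai":
        from openai import OpenAI
        return OpenAI(api_key=credentials.get("apiKey"))
    if provider == "anthropic":
        from anthropic import Anthropic  # optional dependency
        return Anthropic(api_key=credentials.get("apiKey"))
    if provider == "huggingface":
        from transformers import pipeline  # optional dependency (with torch)
        return pipeline("text-generation", model=model_config["model_name"])
    raise ValueError(f"Unsupported modelProvider: {provider}")
```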
## New Optional Dependencies

- `anthropic` (for Claude models)
- `transformers` and `torch` (for HuggingFace/local models)

## Documentation

- Updated `README.md` with new provider options and usage instructions.
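Since these backends are optional, users can install only what they need (e.g. `pip install anthropic`, or `pip install transformers torch`). Below is a hedged sketch of the import-guard pattern optional dependencies usually call for; `require_backend` is a hypothetical helper, not part of this PR:

```python
# Hypothetical helper -- the PR's actual error handling may differ.
import importlib

def require_backend(package_name: str):
    """Import an optional backend package, failing with an actionable message."""
    try:
        return importlib.import_module(package_name)
    except ImportError as exc:
        raise ImportError(
            f"'{package_name}' is required for the selected modelProvider. "
            f"Install it with: pip install {package_name}"
        ) from exc

# e.g. anthropic = require_backend("anthropic") when modelProvider == "anthropic"
```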
/claim #4