/claim #82
Hi, this is Enity300. I am sending this pull request from a different account due to some issues I'm currently facing with Algora-pbc. Rest assured, any changes to the approach and all related discussion will be handled promptly through this account and through @Enity300.
Offline DeepSeek Model Loader Implementation
Changes Made:
- Added model/deepseek implementation directory with:
  - Core wrapper (wrapper.py, lines 1-98)
  - Quantization helpers (helpers/quantize.py)
  - Memory mapping system (helpers/memory_map.py)
- Implemented optimized loading patterns from llama.cpp (reference: llama_cpp_wrapper.py)
- Added integration tests (test_deepseek_wrapper.py)
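A minimal sketch of the idea behind the memory mapping system in helpers/memory_map.py (function names and the flat float32 layout here are illustrative; a real GGUF file has its own header and tensor index):

```python
import os
import tempfile
import numpy as np

def save_layer(path, weights):
    # Append one layer's weights as raw float32 bytes.
    with open(path, "ab") as f:
        f.write(np.ascontiguousarray(weights, dtype=np.float32).tobytes())

def load_layer(path, layer_id, shape):
    # Map only the requested layer's byte range into memory; nothing is
    # copied into RAM until the returned array is actually indexed.
    count = int(np.prod(shape))
    offset = layer_id * count * 4  # float32 = 4 bytes
    return np.memmap(path, dtype=np.float32, mode="r",
                     offset=offset, shape=tuple(shape))

path = os.path.join(tempfile.mkdtemp(), "layers.bin")
save_layer(path, np.arange(6, dtype=np.float32).reshape(2, 3))  # layer 0
save_layer(path, np.ones((2, 3), dtype=np.float32))             # layer 1
layer1 = load_layer(path, 1, (2, 3))
print(layer1.sum())  # 6.0
```

Because `np.memmap` defers reads until a slice is touched, each layer can be materialized on demand, which is what makes layer-wise loading cheap on memory.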
Key Features:
- HuggingFace Hub Integration
  - Direct model downloads with config fallback (see wrapper.py)
  - Supports both full and quantized GGUF models
- Memory Optimizations
  - Layer-wise loading (reference: model_loader.py)
  - mmap-based tensor loading (reference: memory_map.py)
- Quantization Support
  - 4/8-bit via bitsandbytes (quantize.py)
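For context, this is the core idea behind the 8-bit mode: absmax quantization, shown here in plain NumPy. bitsandbytes performs this in fused CUDA kernels, and quantize.py wraps that library rather than this sketch:

```python
import numpy as np

def quantize_int8(x):
    # Scale so the largest magnitude maps to 127, then round to int8.
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.54, -1.27], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
print(np.max(np.abs(x - x_hat)))  # max quantization error, bounded by the scale
```

The 4-bit path follows the same scale-and-round principle with a smaller integer range (and, in bitsandbytes, block-wise scales to limit the error).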
Testing:
- Verified with DeepSeek-R1-Distill-Qwen-7B models:
  - Quantized: bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF
  - Full: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Added arithmetic and QA test cases (test_deepseek.py)
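A hypothetical shape for the arithmetic test cases in test_deepseek.py; `generate` is stubbed here so the example runs without downloading a model, whereas the real tests call the wrapper's generation method:

```python
def generate(prompt: str) -> str:
    # Stub standing in for the model wrapper's generate() call.
    answers = {"What is 17 + 25?": "42", "What is 9 * 8?": "72"}
    return "The answer is " + answers[prompt] + "."

def check_arithmetic(prompt: str, expected: str) -> bool:
    # Match the expected number anywhere in the output, since chat
    # models typically wrap answers in prose.
    return expected in generate(prompt)

assert check_arithmetic("What is 17 + 25?", "42")
assert check_arithmetic("What is 9 * 8?", "72")
print("arithmetic cases pass")
```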
Other necessary details are documented in the README under model/deepseek.