/claim #82
Hi, this is Enity300. I am sending this pull request from a different account due to some issues I'm currently facing with Algora-pbc. Rest assured, any changes to the approach and all related discussion will be handled promptly through this account and through @Enity300.
Offline DeepSeek Model Loader Implementation
Changes Made:
- Added model/deepseek implementation directory with:
  - Core wrapper (wrapper.py, lines 1-98)
  - Quantization helpers (helpers/quantize.py)
  - Memory mapping system (helpers/memory_map.py)
- Implemented optimized loading patterns from llama.cpp (reference: llama_cpp_wrapper.py)
- Added integration tests (test_deepseek_wrapper.py)
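A minimal sketch of the idea behind the memory mapping system in helpers/memory_map.py (function names and the flat float32 layout here are illustrative; a real GGUF file has its own header and tensor index):

```python
import os
import tempfile
import numpy as np

def save_layer(path, weights):
    # Append one layer's weights as raw float32 bytes.
    with open(path, "ab") as f:
        f.write(np.ascontiguousarray(weights, dtype=np.float32).tobytes())

def load_layer(path, layer_id, shape):
    # Map only the requested layer's byte range into memory; nothing is
    # copied into RAM until the returned array is actually indexed.
    count = int(np.prod(shape))
    offset = layer_id * count * 4  # float32 = 4 bytes
    return np.memmap(path, dtype=np.float32, mode="r",
                     offset=offset, shape=tuple(shape))

path = os.path.join(tempfile.mkdtemp(), "layers.bin")
save_layer(path, np.arange(6, dtype=np.float32).reshape(2, 3))  # layer 0
save_layer(path, np.ones((2, 3), dtype=np.float32))             # layer 1
layer1 = load_layer(path, 1, (2, 3))
print(layer1.sum())  # 6.0
```

Because `np.memmap` defers reads until a slice is touched, each layer can be materialized on demand, which is what makes layer-wise loading cheap on memory.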
Key Features:
- HuggingFace Hub Integration
  - Direct model downloads with config fallback (see wrapper.py)
  - Supports both full and quantized GGUF models
- Memory Optimizations
  - Layer-wise loading (reference: model_loader.py)
  - mmap-based tensor loading (reference: memory_map.py)
- Quantization Support
  - 4/8-bit via bitsandbytes (quantize.py)
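For context, this is the core idea behind the 8-bit mode: absmax quantization, shown here in plain NumPy. bitsandbytes performs this in fused CUDA kernels, and quantize.py wraps that library rather than this sketch:

```python
import numpy as np

def quantize_int8(x):
    # Scale so the largest magnitude maps to 127, then round to int8.
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.54, -1.27], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
print(np.max(np.abs(x - x_hat)))  # max quantization error, bounded by the scale
```

The 4-bit path follows the same scale-and-round principle with a smaller integer range (and, in bitsandbytes, block-wise scales to limit the error).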
Testing:
- Verified with DeepSeek-R1-Distill-Qwen-7B models:
  - Quantized: bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF
  - Full: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Added arithmetic and QA test cases (test_deepseek.py)
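A hypothetical shape for the arithmetic test cases in test_deepseek.py; `generate` is stubbed here so the example runs without downloading a model, whereas the real tests call the wrapper's generation method:

```python
def generate(prompt: str) -> str:
    # Stub standing in for the model wrapper's generate() call.
    answers = {"What is 17 + 25?": "42", "What is 9 * 8?": "72"}
    return "The answer is " + answers[prompt] + "."

def check_arithmetic(prompt: str, expected: str) -> bool:
    # Match the expected number anywhere in the output, since chat
    # models typically wrap answers in prose.
    return expected in generate(prompt)

assert check_arithmetic("What is 17 + 25?", "42")
assert check_arithmetic("What is 9 * 8?", "72")
print("arithmetic cases pass")
```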
Other necessary details are documented in the README under model/deepseek.