/claim #82

Fixes #82

This PR implements an offline DeepSeek model loader for inference as requested in the feature request.

Features

  • Downloads DeepSeek model weights directly from the Hugging Face Hub, then runs inference fully offline
  • Supports both full-precision and quantized weights
  • Implements memory optimization techniques
  • Dynamically detects model file structure
  • Supports multiple model formats (.safetensors, .bin, .pt, .ckpt)
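
The multi-format support might look something like the following sketch (the function name and format-preference order are assumptions for illustration, not the PR's actual code):

```python
from pathlib import Path

# Weight formats to look for, ordered by preference (safetensors first,
# since it is the safest and fastest to load). Illustrative only.
WEIGHT_SUFFIXES = (".safetensors", ".bin", ".pt", ".ckpt")

def discover_weight_files(model_dir):
    """Return all weight files of the first format found under model_dir."""
    model_dir = Path(model_dir)
    for suffix in WEIGHT_SUFFIXES:
        matches = sorted(model_dir.glob(f"*{suffix}"))
        if matches:
            return matches
    raise FileNotFoundError(f"no weight files found under {model_dir}")
```

Returning all files of a single format handles sharded checkpoints (e.g. `model-00001-of-00002.safetensors`) while ignoring leftover weights in other formats.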

Implementation Details

  1. Created a modular architecture with separate components:

    • DeepSeekLoader: Core loading functionality with memory optimization
    • DeepSeekTokenizer: Text encoding/decoding
    • DeepSeekWrapper: High-level interface following project patterns
  2. Implemented memory optimization techniques:

    • Chunk-based loading to reduce memory footprint
    • Int8 quantization for reduced memory usage
    • Efficient tensor management with device control
  3. Added dynamic model discovery:

    • Automatically detects model file structure
    • Supports different weight file formats
    • Handles various tokenizer configurations
  4. Created comprehensive tests:

    • Model initialization and loading tests
    • Chat interface tests
    • Code generation tests
    • Quantization tests
    • Memory efficiency tests
  5. Added example usage for easy integration
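
To make the int8 quantization step concrete, here is a minimal per-tensor symmetric quantization sketch in NumPy (the PR's actual scheme, scaling granularity, and dtype handling may differ):

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 plus one float scale (4x smaller)."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # guard all-zero tensors
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float32 tensor from int8 values and scale."""
    return q.astype(np.float32) * scale
```

Per-tensor symmetric quantization keeps the round-trip error bounded by half the scale, at the cost of one shared scale per tensor rather than per channel.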

Components

  • DeepSeekLoader: Handles model loading with memory optimization
  • DeepSeekTokenizer: Handles tokenization for input/output
  • DeepSeekWrapper: Provides a unified interface
  • Tests: Verify model loading, quantization, and memory efficiency
  • Example: Demonstrates usage
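
The chunk-based loading idea behind DeepSeekLoader can be sketched with a memory-mapped file: peak RAM stays bounded by the chunk size rather than the file size (illustrative only; the loader's real chunking logic may differ):

```python
import numpy as np

def iter_chunks(path, dtype=np.float32, chunk_elems=1_000_000):
    """Yield fixed-size chunks of a raw weight file without loading it all."""
    arr = np.memmap(path, dtype=dtype, mode="r")  # maps the file, no full read
    for start in range(0, arr.shape[0], chunk_elems):
        # Each slice is a view into the mapping; copy() materializes
        # only this chunk in memory.
        yield arr[start:start + chunk_elems].copy()
```

A caller can then quantize or move each chunk to its target device before the next one is read.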

Dependencies

  • torch
  • safetensors
  • huggingface_hub
  • numpy

This implementation avoids high-level libraries like transformers as requested.
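
Avoiding transformers is practical here because the .safetensors container itself is simple: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor buffer. A minimal header parser using only the standard library (a sketch, not the PR's code):

```python
import json
import struct

def read_safetensors_header(path):
    """Parse just the JSON header of a .safetensors file.

    Layout: 8-byte little-endian u64 header size, then that many bytes of
    JSON mapping tensor names to {"dtype", "shape", "data_offsets"}.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))
```

With the header in hand, each tensor can be read lazily by seeking to its `data_offsets` range, which is what makes chunked, low-memory loading possible without transformers.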

Claim

Total prize pool: $800
Total paid: $0
Status: Pending
Submitted: March 03, 2025
Last updated: March 03, 2025

Contributors

Kunal Darekar (@Kunal-Darekar): 100%

Sponsors

IntelliNode (@intelligentnode): $800