Summary
This PR addresses issue #1367 regarding the go-tree-sitter CGO dependency by providing comprehensive build documentation and a pure Go build option.
Root Cause
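Katana uses jsluice for JavaScript endpoint extraction. jsluice depends on github.com/smacker/go-tree-sitter, which provides CGO bindings to the tree-sitter C library. Building Katana therefore requires CGO and a C compiler for the target platform, which makes cross-compilation (especially to darwin/arm64) difficult.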
Solution
1. Added BUILDING.md
Comprehensive build guide covering:
- Standard build with CGO
- Pure Go build without jsluice
- Cross-compilation for macOS/Windows/Linux
- Build tags documentation
2. Updated README.md
- Added a pure Go installation option
- Added a note explaining the jsluice trade-off
- Linked to the detailed BUILDING.md
3. Build Tag: without_jsluice
```bash
# Pure Go build (no CGO required)
CGO_ENABLED=0 go build -tags=without_jsluice -v .
```
Trade-off: JavaScript endpoint extraction is disabled, but all other crawling features work normally.
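This PR does not show how the tag is wired into the source. A common Go pattern, assumed here purely for illustration, is a pair of files with opposite constraints: the jsluice-backed extractor under `//go:build !without_jsluice` and a no-op stub under `//go:build without_jsluice`. The file names below are hypothetical. The standard `go/build/constraint` package can evaluate those lines to show which file each build would compile (simplified: it ignores GOOS, GOARCH, and other implicit tags).

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// guardedFile pairs a hypothetical source file with its //go:build line.
// The layout is an illustrative assumption, not taken from the katana codebase.
type guardedFile struct {
	name string
	line string
}

var files = []guardedFile{
	{"jsluice_enabled.go", "//go:build !without_jsluice"}, // real extractor (pulls in go-tree-sitter, needs CGO)
	{"jsluice_disabled.go", "//go:build without_jsluice"}, // no-op stub, pure Go
}

// includedFiles reports which files would be compiled when exactly the given
// tags are set. Simplified: the real go tool also sets GOOS, GOARCH, cgo, etc.
func includedFiles(tags ...string) ([]string, error) {
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	var out []string
	for _, f := range files {
		expr, err := constraint.Parse(f.line)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", f.name, err)
		}
		if expr.Eval(func(tag string) bool { return set[tag] }) {
			out = append(out, f.name)
		}
	}
	return out, nil
}

func main() {
	for _, tags := range [][]string{nil, {"without_jsluice"}} {
		got, err := includedFiles(tags...)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("tags=%v -> %v\n", tags, got)
	}
}
```

Because exactly one of the two files is compiled in any build, both would need to declare the same identifiers so that callers compile either way.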
Benefits
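- Simplified CI/CD: No need to configure CGO or C cross-compilers
- Cross-platform builds: Cross-compile from any host (e.g. Windows → darwin/arm64) without a macOS SDK
- Static binaries: On Linux, CGO_ENABLED=0 produces a statically linked binary with no libc dependency, which simplifies deployment
- Opt-in trade-off: Users choose whether they need jsluice-based JavaScript extraction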
Usage Examples
Standard Build (with jsluice)
```bash
CGO_ENABLED=1 go build -v .
```
Pure Go Build (without jsluice), cross-compiled for darwin/arm64
```bash
CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build -tags=without_jsluice -v .
```
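The darwin/arm64 command extends naturally to a release matrix. The sketch below builds one pure Go `go build` command per target and, by default, only prints it; pass `-run` to execute the builds. The target list and the `katana-<os>-<arch>` output names are assumptions for illustration, not taken from the PR.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// target is one GOOS/GOARCH pair to cross-compile for.
type target struct{ goos, goarch string }

// Hypothetical release matrix covering macOS, Linux, and Windows.
var targets = []target{
	{"darwin", "arm64"}, {"darwin", "amd64"},
	{"linux", "amd64"}, {"linux", "arm64"},
	{"windows", "amd64"},
}

// pureGoBuild returns (but does not run) a `go build` command that compiles
// the package in the current directory with CGO disabled and without_jsluice set.
func pureGoBuild(t target) *exec.Cmd {
	out := fmt.Sprintf("katana-%s-%s", t.goos, t.goarch) // hypothetical naming scheme
	if t.goos == "windows" {
		out += ".exe"
	}
	cmd := exec.Command("go", "build", "-tags=without_jsluice", "-o", out, ".")
	// For duplicate keys os/exec uses the last value, so these override the caller's env.
	cmd.Env = append(os.Environ(), "CGO_ENABLED=0", "GOOS="+t.goos, "GOARCH="+t.goarch)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd
}

func main() {
	run := flag.Bool("run", false, "execute the builds instead of only printing them")
	flag.Parse()
	for _, t := range targets {
		cmd := pureGoBuild(t)
		overrides := cmd.Env[len(cmd.Env)-3:]
		fmt.Println(strings.Join(overrides, " "), strings.Join(cmd.Args, " "))
		if !*run {
			continue
		}
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "%s/%s: %v\n", t.goos, t.goarch, err)
			os.Exit(1)
		}
	}
}
```

No C toolchain appears anywhere in the loop, which is the point of the `without_jsluice` path: every target is built by the Go toolchain alone.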
Testing
- Standard build: All features work
- Pure Go build: Crawling works, jsluice endpoints disabled
- No breaking changes to existing functionality
Related Issue
Fixes: #1367
/claim #1367
Summary by CodeRabbit
- Documentation
  - Added BUILDING.md with comprehensive build procedures for standard builds, pure Go builds, cross-compilation guidance, and build tag configurations.
  - Updated README with two installation options: standard (CGO-enabled JavaScript parsing) and pure Go (without CGO), including feature impact notes.