feat: add github_pages connector
onyx-dot-app/onyx#5149

Description

This PR adds a GitHub Pages connector as a new connector type in the Onyx platform. The connector lets users index and search content from GitHub Pages websites by connecting to the underlying GitHub repositories and processing their files.

New Feature: GitHub Pages Connector

The GitHub Pages connector provides the following capabilities:

Core Functionality:

  • Repository Integration: Connects to GitHub repositories via the GitHub API
  • Multi-format Support: Indexes HTML, Markdown, reStructuredText, AsciiDoc, and plain text files
  • Smart Filtering: Filters by file type, directory depth, and file size
  • Incremental Updates: Supports polling based on file modification dates
  • Rate Limiting: Handles GitHub API rate limits with exponential backoff
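
As an illustration of the rate-limiting bullet above, here is a minimal sketch of retrying with exponential backoff. It is not taken from the PR's code: `RateLimitError`, `call_with_backoff`, and all parameter names and defaults are hypothetical, and the real connector may detect and handle rate limits differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,        # assumed value, not from the PR
    base_delay: float = 1.0,     # assumed value, not from the PR
    max_delay: float = 60.0,     # assumed value, not from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request`, doubling the wait after each rate-limit failure."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                # Out of retries: surface the error to the caller.
                raise
            # Delay grows as base_delay * 2**attempt, capped at max_delay.
            sleep(min(base_delay * (2 ** attempt), max_delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.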

Configuration Options:

  • Repository Owner: GitHub username or organization
  • Repository Name: Name of the repository containing GitHub Pages
  • Branch: Branch to scan (default: gh-pages)
  • Root Directory: Optional subdirectory to index
  • Max Files: Maximum number of files to index (default: 1000)
  • Max Depth: Maximum directory depth for crawling
  • Timeout: Request timeout in seconds
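
The options above could be modeled roughly as follows. This is a sketch only: the class name `GitHubPagesConfig` and the field names are hypothetical, not the PR's actual identifiers. Only the `gh-pages` and `1000` defaults come from the description; max depth and timeout are left unset because no defaults are stated for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GitHubPagesConfig:
    """Hypothetical container for the connector options listed above."""

    repo_owner: str                         # GitHub username or organization
    repo_name: str                          # repository containing GitHub Pages
    branch: str = "gh-pages"                # default stated in the description
    root_directory: Optional[str] = None    # optional subdirectory to index
    max_files: int = 1000                   # default stated in the description
    max_depth: Optional[int] = None         # no default stated; None = unset
    timeout_seconds: Optional[float] = None # no default stated; None = unset

    def __post_init__(self) -> None:
        # Basic sanity checks; the real connector's validation may differ.
        if not self.repo_owner or not self.repo_name:
            raise ValueError("repo_owner and repo_name are required")
        if self.max_files <= 0:
            raise ValueError("max_files must be positive")
        if self.max_depth is not None and self.max_depth < 0:
            raise ValueError("max_depth must be non-negative")
        if self.timeout_seconds is not None and self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
```

For example, `GitHubPagesConfig(repo_owner="onyx-dot-app", repo_name="onyx")` would scan the `gh-pages` branch with a limit of 1000 files.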

Supported File Types:

  • .html, .htm - HTML files (processed with BeautifulSoup)
  • .md, .markdown - Markdown files (converted to HTML then processed)
  • .txt - Plain text files
  • .rst - reStructuredText files
  • .asciidoc, .adoc - AsciiDoc files
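
To make the extension list and the filtering rules concrete, the following sketch classifies a repository path by extension and applies depth and size limits. The names `FILE_KINDS`, `classify`, and `should_index` are hypothetical, as is the convention that depth counts the directories above a file. The PR's actual filtering logic may differ.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension-to-kind mapping, taken from the supported file types listed above.
FILE_KINDS = {
    ".html": "html",
    ".htm": "html",
    ".md": "markdown",
    ".markdown": "markdown",
    ".txt": "text",
    ".rst": "rst",
    ".asciidoc": "asciidoc",
    ".adoc": "asciidoc",
}


def classify(path: str) -> Optional[str]:
    """Return the file kind for a repository path, or None if unsupported."""
    return FILE_KINDS.get(PurePosixPath(path).suffix.lower())


def should_index(
    path: str,
    size_bytes: int,
    max_depth: Optional[int] = None,
    max_size_bytes: Optional[int] = None,
) -> bool:
    """Apply the file type, directory depth, and file size filters."""
    if classify(path) is None:
        return False
    # Assumed convention: a file at the repository root has depth 0.
    depth = len(PurePosixPath(path).parts) - 1
    if max_depth is not None and depth > max_depth:
        return False
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        return False
    return True
```

Matching is case-insensitive here, so `index.HTML` is treated as HTML. Whether the connector does the same is an assumption.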

fixes https://github.com/onyx-dot-app/onyx/issues/2282
/claim https://github.com/onyx-dot-app/onyx/issues/2282


Summary by cubic

Added a new GitHub Pages connector that lets users index and search content from GitHub Pages sites by connecting to GitHub repositories and processing their files. This addresses the requirements in issue #2282.

  • New Features
    • Supports indexing HTML, Markdown, reStructuredText, and text files from a specified repository and branch.
    • Allows filtering by file type, directory depth, and file size.
    • Handles incremental updates using file modification dates and manages GitHub API rate limits.
    • Includes configuration options for repository owner, name, branch, root directory, max files, max depth, and timeout.
    • Added UI and type support for the new connector in the web app.

Claim

Total prize pool $250
Total paid $0
Status Pending
Submitted August 04, 2025
Last updated August 04, 2025

Contributors

Moderator

@AayushSaini101

100%

Sponsors

Onyx (YC W24)

@onyx-dot-app

$250