GitHub Pages connector
Description
This PR introduces a new GitHub Pages connector and integrates it into both the backend and frontend of Onyx.
Test
- ✅ Prettier applied to web files
- ✅ Pre-commit hooks (black, reorder-python-imports, autoflake, ruff, prettier) all passed
- ✅ mypy type checks passed on modified backend files
Related Issue / Claim
Closes #2282
Creating a GitHub PAT for the GitHub Pages connector
- Generate a fine-grained personal access token (GitHub Settings → Developer settings → Personal access tokens → Fine-grained tokens).
- Configure:
  - Token name: Onyx GitHub Pages
  - Expiration: No expiration (recommended for connectors)
  - Resource owner: the user or organization that owns the repo
  - Repository access: All repositories (or select specific repos)
  - Permissions:
    - Contents → Read-only
    - Metadata → Read-only
- Copy and store the token securely.
Using the token in Onyx
- In the GitHub Pages connector config, paste the PAT into the GitHub access token field.
- Provide:
  - repo_owner (e.g. melmathari)
  - repo_name (e.g. GitHub-pages)
- Save and validate the connector.
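For reviewers, the validation step can be pictured roughly like this. It is a minimal sketch, not the connector's actual code: it assumes validation boils down to an authenticated GET on GitHub's repository endpoint (/repos/{owner}/{repo}) and maps the common failure statuses to readable messages. The function names build_repo_request, describe_status and validate_credentials are hypothetical.

```python
import urllib.error
import urllib.request

GITHUB_API = "https://api.github.com"


def build_repo_request(token: str, repo_owner: str, repo_name: str) -> urllib.request.Request:
    """Build an authenticated GET for GitHub's repository metadata endpoint (hypothetical helper)."""
    url = f"{GITHUB_API}/repos/{repo_owner}/{repo_name}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def describe_status(status: int) -> str:
    """Map the endpoint's HTTP status to a human-readable validation result."""
    messages = {
        200: "ok",
        401: "invalid or expired token",
        403: "forbidden or rate-limited",
        # GitHub answers 404 (not 403) when the token cannot see a private repo.
        404: "repository not found or not visible to this token",
    }
    return messages.get(status, f"unexpected status {status}")


def validate_credentials(token: str, repo_owner: str, repo_name: str, timeout: float = 10.0) -> str:
    """Send the request and report the outcome (performs a real network call)."""
    request = build_repo_request(token, repo_owner, repo_name)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; .code holds the status.
        return describe_status(err.code)
```

Treating 404 as "not visible to this token" matters here: a fine-grained PAT scoped to specific repositories gets a 404, not a 403, for any repo outside its selection.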
/claim #2282
- This PR should be backported
- [Optional] Override Linear Check
Summary by cubic
Adds a GitHub Pages connector that indexes the HTML/Markdown files behind a repo’s Pages site via the GitHub API and exposes it as a load-state connector in the app. Implements the flow requested in issue #2282.
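A rough sketch of the "which files get indexed" step, assuming the connector lists a branch through GitHub's recursive Git tree endpoint (GET /repos/{owner}/{repo}/git/trees/{branch}?recursive=1) and keeps only HTML/Markdown blobs. The include_readme flag mirrors the advanced option listed under Frontend. The extension list and the title fallback (HTML title tag, else the first Markdown H1, else the file name) are assumptions, not necessarily what the PR implements.

```python
import posixpath
import re

# Extensions treated as indexable page sources (assumed list).
INDEXABLE_EXTENSIONS = {".html", ".htm", ".md", ".markdown"}

_HTML_TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Level-1 Markdown heading ("# Title"); an optional closing " ##" sequence is dropped.
_MD_H1 = re.compile(r"^#[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$", re.MULTILINE)


def select_indexable_paths(tree_response: dict, include_readme: bool = False) -> list[str]:
    """Pick HTML/Markdown blobs out of a recursive Git tree API response."""
    selected = []
    for entry in tree_response.get("tree", []):
        if entry.get("type") != "blob":
            continue  # skip directories ("tree") and submodules ("commit")
        path = entry["path"]
        stem, ext = posixpath.splitext(posixpath.basename(path))
        if ext.lower() not in INDEXABLE_EXTENSIONS:
            continue
        if stem.lower() == "readme" and not include_readme:
            continue  # READMEs are opt-in via include_readme
        selected.append(path)
    return sorted(selected)


def extract_title(path: str, text: str) -> str:
    """HTML <title>, else the first Markdown H1, else the file name."""
    is_html = posixpath.splitext(path)[1].lower() in {".html", ".htm"}
    match = (_HTML_TITLE if is_html else _MD_H1).search(text)
    if match and match.group(1).strip():
        return match.group(1).strip()
    return posixpath.basename(path)
```

Working from a single tree listing keeps the number of API calls low, which fits the rate-limit concern; note that GitHub marks very large trees as truncated, so a real implementation would need a fallback for those.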
New Features
- Backend GitHub Pages connector with checkpointing, rate-limit handling, and credential validation
- Reads from the gh-pages branch, the configured Pages branch, or the default branch; converts repository file paths to Pages URLs
- Parses HTML/Markdown with the existing file-processing utilities; extracts page titles and attaches metadata
- New enum value, factory mapping, and Slack icon for DocumentSource.GITHUB_PAGES
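How repo paths might map to Pages URLs, and how a branch might be picked, as a hedged sketch. It follows GitHub's URL scheme (a repo named owner.github.io is a user/organization site served at https://owner.github.io/; other repos are project sites under https://owner.github.io/repo/) and Jekyll's default of serving Markdown pages as .html, but it ignores custom domains and permalinks. The precedence in choose_branch and the publish_dir parameter are hypothetical, not taken from the PR.

```python
from typing import Optional


def choose_branch(pages_branch: Optional[str], branches: set[str], default_branch: str) -> str:
    """Hypothetical precedence: configured Pages branch, then gh-pages, then default."""
    if pages_branch:
        return pages_branch
    if "gh-pages" in branches:
        return "gh-pages"
    return default_branch


def pages_url(repo_owner: str, repo_name: str, path: str, publish_dir: str = "/") -> str:
    """Map a repository file path to its (assumed) public GitHub Pages URL."""
    host = f"https://{repo_owner.lower()}.github.io"  # Pages hostnames are lowercase
    # A repo named <owner>.github.io is a user/org site served from the root;
    # any other repo is a project site served under /<repo_name>/.
    if repo_name.lower() == f"{repo_owner.lower()}.github.io":
        base = f"{host}/"
    else:
        base = f"{host}/{repo_name}/"
    # Drop the publishing folder (e.g. "/docs") when the site is built from one.
    prefix = publish_dir.strip("/")
    if prefix and path.startswith(prefix + "/"):
        path = path[len(prefix) + 1 :]
    # Markdown is served as .html by default (Jekyll); custom permalinks are ignored.
    stem, dot, ext = path.rpartition(".")
    if dot and ext.lower() in {"md", "markdown"}:
        path = f"{stem}.html"
    # index.html is served as its directory URL.
    if path == "index.html":
        path = ""
    elif path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return base + path
```

With the example values from the token section, index.html in melmathari/GitHub-pages maps to https://melmathari.github.io/GitHub-pages/. If the configured Pages branch is read from GitHub's Pages API, the fine-grained token may also need Pages read permission; that is an assumption to verify against the implementation.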
Frontend
- New connector config with fields: repo_owner, repo_name; advanced option: include_readme
- Uses existing GitHub access token credential template
- Adds an icon, source metadata, and types; includes the connector in the load-state and auto-sync source lists