feat: add coda connector

Full Implementation of the Coda.io Connector

This PR adds a production-ready connector for indexing Coda workspaces into Onyx.

Architecture

Document Granularity:

Pages → One Onyx Document per page (includes title, link, and optional body content)
Tables → One Onyx Document per table, with rows as TextSections within that document
Rows → Converted to TextSections (not individual documents to avoid excessive granularity)
Docs → Not converted to documents; used as containers to traverse the workspace hierarchy

Rationale: Coda Docs are organizational containers (like folders). The actual searchable content lives in pages and tables, so we index at that level. This also keeps the number of Onyx Documents manageable.
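
To make the granularity concrete, here is a minimal sketch of the table mapping: one document per table, one section per row. The `TextSection` and `Document` dataclasses below are simplified stand-ins, not the real Onyx models, and the row shape (`values`, `link`) is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextSection:
    """Simplified stand-in for the Onyx section type (hypothetical fields)."""
    text: str
    link: Optional[str] = None


@dataclass
class Document:
    """Simplified stand-in for the Onyx document type (hypothetical fields)."""
    id: str
    title: str
    sections: list = field(default_factory=list)


def table_to_document(table: dict, rows: list) -> Document:
    """Build one Document per table; each row becomes one TextSection."""
    sections = [
        TextSection(
            # Flatten the row's cells into a single "column: value" line.
            text=", ".join(f"{col}: {val}" for col, val in row["values"].items()),
            link=row.get("link"),
        )
        for row in rows
    ]
    return Document(id=table["id"], title=table["name"], sections=sections)


doc = table_to_document(
    {"id": "grid-1", "name": "Tasks"},
    [
        {"values": {"Task": "Write docs", "Status": "Done"}},
        {"values": {"Task": "Ship", "Status": "Open"}},
    ],
)
print(len(doc.sections))  # one section per row
```

A table with thousands of rows still yields a single document under this mapping, which is the point of the design.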

Features

✅ Full workspace indexing via load_from_state()
✅ Incremental updates via poll_source() with timestamp filtering
✅ Page content extraction (configurable via index_page_content flag)
✅ Rate limiting with rl_requests wrapper
✅ Retry logic with exponential backoff (@retry(tries=3, delay=1, backoff=2))
✅ Pagination support for all list endpoints
✅ Proper error handling (401, 403, 404, 429 status codes)
✅ Hidden page filtering
✅ Rich value formatting for table cells
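
As an illustration of the pagination item above, the loop below follows a page token until the API stops returning one. It is a sketch under assumptions: the response shape (`items` plus `nextPageToken`) and the `fetch_page` callable are illustrative and are not taken from the connector code.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical signature: takes a page token (or None for the first page)
# and returns one decoded JSON page.
FetchPage = Callable[[Optional[str]], dict]


def iter_items(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield every item from a token-paginated list endpoint.

    Assumes each page looks like {"items": [...], "nextPageToken": "..."}
    and that the token is absent (or empty) on the last page.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break


# Usage with an in-memory fake instead of a real HTTP call.
_pages = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "t1"},
    "t1": {"items": [{"id": "c"}]},
}
ids = [item["id"] for item in iter_items(lambda tok: _pages[tok])]
print(ids)
```

Passing the fetch function in keeps the loop independent of rate limiting and retries, which can wrap `fetch_page` separately.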

Implementation Details

Custom API Client: Dedicated CodaApiClient class wraps HTTP logic with proper header construction and error handling

Conversion Flow:

List all docs in workspace
For each doc: list pages and tables
For each table: list all rows with values
Convert pages → Documents (with optional content)
Convert tables+rows → Documents (rows as sections)
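
The flow above can be sketched as a generator that walks docs, then pages and tables, and yields results in batches. The client method names (`list_docs`, `list_pages`, `list_tables`, `list_rows`) and the dictionary shapes are hypothetical placeholders, not the actual CodaApiClient interface.

```python
from typing import Any, Iterator


def iter_units(client: Any) -> Iterator[dict]:
    """Walk docs as containers and yield one indexable unit per page or table."""
    for doc in client.list_docs():
        for page in client.list_pages(doc["id"]):
            yield {"kind": "page", "id": page["id"]}
        for table in client.list_tables(doc["id"]):
            rows = client.list_rows(doc["id"], table["id"])
            # Rows stay inside the table's unit instead of becoming units themselves.
            yield {"kind": "table", "id": table["id"], "row_count": len(rows)}


def iter_batches(client: Any, batch_size: int) -> Iterator[list]:
    """Group units into lists of at most batch_size items."""
    batch = []
    for unit in iter_units(client):
        batch.append(unit)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


class FakeClient:
    """In-memory stand-in so the sketch runs without network access."""

    def list_docs(self):
        return [{"id": "d1"}]

    def list_pages(self, doc_id):
        return [{"id": "p1"}, {"id": "p2"}]

    def list_tables(self, doc_id):
        return [{"id": "t1"}]

    def list_rows(self, doc_id, table_id):
        return [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]


batches = list(iter_batches(FakeClient(), batch_size=2))
print([len(b) for b in batches])
```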

Poll Strategy: Fetches all pages/tables, then filters by updated_at timestamp. Also checks individual row timestamps to catch table updates.
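
A rough sketch of that filter is shown below. It assumes each item carries an ISO 8601 `updated_at` string and that the poll window is inclusive on both ends; the real payload field name and boundary handling may differ.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> float:
    """Convert an ISO 8601 timestamp to epoch seconds."""
    # Normalize a trailing "Z" so fromisoformat() accepts it on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()


def updated_in_window(item: dict, start: float, end: float) -> bool:
    """True if the item's own timestamp falls inside [start, end]."""
    ts = item.get("updated_at")
    return ts is not None and start <= parse_ts(ts) <= end


def table_changed(table: dict, rows: list, start: float, end: float) -> bool:
    """A table counts as updated if it, or any of its rows, changed in the window."""
    return updated_in_window(table, start, end) or any(
        updated_in_window(row, start, end) for row in rows
    )


start = datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp()
end = datetime(2024, 1, 2, tzinfo=timezone.utc).timestamp()
stale_table = {"updated_at": "2023-12-01T00:00:00Z"}
fresh_row = {"updated_at": "2024-01-01T12:00:00Z"}
print(table_changed(stale_table, [fresh_row], start, end))
```

Checking rows as well as the table covers the case where a row edit leaves the table's own timestamp unchanged.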

Testing

Set the CODA_BEARER_TOKEN environment variable. For run instructions, see backend/tests/daily/connectors/coda/README.md

test_coda_connector.py covers:

API method mocking and response validation
Document conversion logic
Pagination handling
Error scenarios and edge cases
Credential validation

Configuration

CodaConnector(
    batch_size=INDEX_BATCH_SIZE,        # Docs per batch
    index_page_content=True             # Whether to fetch page body content (many Coda users purely dump table data)
)

Notes

Rows as TextSections avoids creating thousands of tiny documents for large tables
Coda Docs don’t become Onyx Documents; they’re just traversal nodes (only their workspace_id/name metadata is kept)
Page content fetching is optional to support lightweight indexing (title+link only) vs full-text indexing

/claim https://github.com/onyx-dot-app/onyx/issues/2807