This PR adds a production-ready connector for indexing Coda workspaces into Onyx.
Document Granularity:
- Pages: one Document per page (includes title, link, and optional body content)
- Tables: one Document per table, with rows as TextSections within that document
- Rows: TextSections (not individual documents, to avoid excessive granularity)

Rationale: Coda Docs are organizational containers (like folders). The actual searchable content lives in pages and tables, so we index at that level rather than flooding Onyx with one document per row.
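The granularity rules above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: the `Document` and `TextSection` dataclasses are simplified stand-ins for the Onyx models, the ID prefixes are hypothetical, and the dictionary keys (`name`, `browserLink`, `values`) are assumptions about the shape of the Coda API payloads.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextSection:
    """Simplified stand-in for the Onyx TextSection model (assumed shape)."""
    text: str
    link: Optional[str] = None


@dataclass
class Document:
    """Simplified stand-in for the Onyx Document model (assumed shape)."""
    id: str
    title: str
    sections: list[TextSection] = field(default_factory=list)


def page_to_document(page: dict, body: Optional[str] = None) -> Document:
    """One Document per page: title and link always, body only when provided."""
    text = page["name"] if body is None else f"{page['name']}\n\n{body}"
    return Document(
        id=f"coda-page-{page['id']}",  # hypothetical ID scheme
        title=page["name"],
        sections=[TextSection(text=text, link=page.get("browserLink"))],
    )


def table_to_document(table: dict, rows: list[dict]) -> Document:
    """One Document per table; each row becomes a TextSection, not a Document."""
    sections = [
        TextSection(
            text=", ".join(f"{col}: {val}" for col, val in row["values"].items()),
            link=row.get("browserLink"),
        )
        for row in rows
    ]
    return Document(
        id=f"coda-table-{table['id']}",  # hypothetical ID scheme
        title=table["name"],
        sections=sections,
    )
```

With this shape, a table of 5,000 rows yields one document with 5,000 sections instead of 5,000 separate documents.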
- load_from_state()
- poll_source() with timestamp filtering
- index_page_content flag
- rl_requests wrapper
- @retry(tries=3, delay=1, backoff=2)
- Custom API Client: Dedicated CodaApiClient class wraps HTTP logic with proper header construction and error handling
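To make the retry parameters concrete: with tries=3, delay=1, backoff=2, a failing call is attempted at most three times, waiting 1 second and then 2 seconds between attempts. The sketch below is a standard-library stand-in written for illustration only; it is not the decorator the connector imports, and the `sleep` parameter and `flaky_fetch` function are hypothetical, added so the waits can be recorded instead of actually slept.

```python
import functools
import time


def retry(tries=3, delay=1, backoff=2, sleep=time.sleep):
    """Retry the wrapped call up to `tries` times with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff  # grow the wait before the next attempt
        return wrapper
    return decorator


waits = []          # records each wait instead of sleeping
calls = {"n": 0}    # counts how many times the function body ran


@retry(tries=3, delay=1, backoff=2, sleep=waits.append)
def flaky_fetch():
    """Hypothetical request that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky_fetch()
```

Here the third attempt succeeds, so no exception escapes; had it also failed, the `ConnectionError` would propagate to the caller.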
Conversion Flow:
Poll Strategy: Fetches all pages/tables, then filters by updated_at timestamp. Also checks individual row timestamps to catch table updates.
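A rough sketch of that filtering step, under stated assumptions: items are plain dictionaries with an ISO 8601 `updated_at` string, the poll window is treated as exclusive at the start and inclusive at the end, and the function names are hypothetical. The real connector may name fields and bound the window differently.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def in_window(ts: str, start: datetime, end: datetime) -> bool:
    """True if the timestamp falls in (start, end] (assumed window bounds)."""
    return start < parse_ts(ts) <= end


def pages_to_index(pages: list, start: datetime, end: datetime) -> list:
    """Fetch-all-then-filter: keep only pages updated inside the poll window."""
    return [p for p in pages if in_window(p["updated_at"], start, end)]


def table_changed(table: dict, rows: list, start: datetime, end: datetime) -> bool:
    """A table counts as updated if it, or any of its rows, changed in the window."""
    if in_window(table["updated_at"], start, end):
        return True
    return any(in_window(r["updated_at"], start, end) for r in rows)
```

Checking row timestamps matters when a row edit does not bump the table's own timestamp: without the second check, such a table would be skipped even though its content changed.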
How to Run: see backend/tests/daily/connectors/coda/README.md
test_coda_connector.py provides in-depth test coverage.
CodaConnector(
    batch_size=INDEX_BATCH_SIZE,  # Docs per batch
    index_page_content=True,  # Whether to fetch page body content (many Coda users purely dump table data)
)
Using TextSections for rows avoids creating thousands of tiny documents for large tables.
ethan
@ethanwater