OU
Feature/chunked database dumps
outerbase/starbasedb#76

Pull Request: Implement Chunked Database Dumps with Enhanced Features

/claim #59
Fixes: #59

Purpose

Implement a robust chunked database dump functionality to handle large databases efficiently while preventing memory issues and database locking. This implementation includes several key features:

1. Chunked Processing

  • Processes database dumps in manageable chunks
  • Dynamic chunk sizing based on table size (500-5000 rows)
  • Breathing intervals to prevent database locking

2. Storage Flexibility

  • Automatic selection between R2 and DO storage based on dump size
  • R2 for large dumps (>100MB)
  • DO storage for smaller dumps

3. Progress Tracking & Notifications

  • Unique progress tracking per dump
  • Callback URL support for completion notifications
  • Detailed status reporting

Tasks

  • Implement chunked dump processing
  • Add dynamic chunk sizing (500-5000 rows based on table size)
  • Implement storage selection logic (R2/DO)
  • Add callback URL support
  • Add progress tracking with unique keys
  • Implement breathing intervals
  • Add error handling and logging
  • Add cleanup after successful download
  • Add test infrastructure
  • Verify functionality with test cases

Verification Steps

  1. Start the server:
npm run dev
  1. Start the callback server:
cd test && node callback-server.js
  1. Test dump creation:
curl -X POST http://127.0.0.1:8787/export/dump/chunked \
  -H "Authorization: Bearer ABC123" \
  -H "X-Callback-URL: http://localhost:3000/callback"
  1. Monitor status:
curl http://127.0.0.1:8787/export/dump/{dumpId}/status \
  -H "Authorization: Bearer ABC123"

Before vs After

Before After
Database dumps loaded entire database into memory Chunked processing prevents memory issues
No progress tracking Real-time progress tracking
Single storage option Automatic R2/DO storage selection
No callback notifications Callback notifications
Risk of database locking Breathing intervals prevent locking
1GB DO storage limit Support for databases >1GB
Basic error handling Detailed error handling and logging
Test script
#!/bin/bash

# Configuration
BASE_URL="http://127.0.0.1:8787"
CALLBACK_URL="http://localhost:3000/callback"
AUTH_HEADER="Authorization: Bearer ABC123"
OUTPUT_FILE="dump-test-results.md"

# Check if callback server is running
echo "Checking callback server..."
if ! curl -s "http://localhost:3000/callbacks" > /dev/null; then
    echo "Error: Callback server is not running. Please start it with 'npm run start-callback-server'"
    exit 1
fi

# Clear previous callbacks
curl -s -X DELETE "http://localhost:3000/callbacks" > /dev/null

# Start fresh output file
cat > "$OUTPUT_FILE" << EOF
# Database Dump Test Results
Test conducted on: $(date)

## Test Steps

EOF

echo "Starting dump tests..."

# Step 1: Initiate a dump
echo "Step 1: Initiating dump..."
cat >> "$OUTPUT_FILE" << EOF
### Step 1: Initiate Dump
\`\`\`bash
curl -X POST "$BASE_URL/export/dump/chunked" \\
    -H "$AUTH_HEADER" \\
    -H "X-Callback-URL: $CALLBACK_URL"
\`\`\`

Response:
\`\`\`json
EOF

DUMP_RESPONSE=$(curl -s -X POST "$BASE_URL/export/dump/chunked" \
    -H "$AUTH_HEADER" \
    -H "X-Callback-URL: $CALLBACK_URL")
echo "$DUMP_RESPONSE" >> "$OUTPUT_FILE"
DUMP_ID=$(echo "$DUMP_RESPONSE" | jq -r '.result.dumpId')

cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF

# Step 2: Check status (multiple times)
echo "Step 2: Checking status..."
cat >> "$OUTPUT_FILE" << EOF
### Step 2: Status Checks

EOF

for i in {1..5}; do
    echo "Status check $i..."
    cat >> "$OUTPUT_FILE" << EOF
#### Check $i
\`\`\`bash
curl "$BASE_URL/export/dump/$DUMP_ID/status" -H "$AUTH_HEADER"
\`\`\`

Response:
\`\`\`json
EOF

    STATUS_RESPONSE=$(curl -s "$BASE_URL/export/dump/$DUMP_ID/status" -H "$AUTH_HEADER")
    echo "$STATUS_RESPONSE" >> "$OUTPUT_FILE"
    
    cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF
    
    STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.result.status')
    if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
        echo "Dump $STATUS after $i checks"
        break
    fi
    sleep 3
done

# Step 3: Download the dump if completed
if [ "$STATUS" = "completed" ]; then
    echo "Step 3: Downloading dump..."
    cat >> "$OUTPUT_FILE" << EOF
### Step 3: Download Dump
\`\`\`bash
curl "$BASE_URL/export/dump/$DUMP_ID" -H "$AUTH_HEADER"
\`\`\`

Response:
\`\`\`
EOF

    DOWNLOAD_RESPONSE=$(curl -s "$BASE_URL/export/dump/$DUMP_ID" -H "$AUTH_HEADER" -w "\nHTTP Status: %{http_code}")
    echo "$DOWNLOAD_RESPONSE" >> "$OUTPUT_FILE"

    cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF
fi

# Step 4: Check received callbacks
echo "Step 4: Checking callbacks..."
cat >> "$OUTPUT_FILE" << EOF
### Step 4: Received Callbacks
\`\`\`bash
curl "http://localhost:3000/callbacks"
\`\`\`

Response:
\`\`\`json
EOF

CALLBACKS_RESPONSE=$(curl -s "http://localhost:3000/callbacks")
echo "$CALLBACKS_RESPONSE" >> "$OUTPUT_FILE"

cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF

# Add summary
cat >> "$OUTPUT_FILE" << EOF
## Summary
- Final Status: \`$STATUS\`
- Dump ID: \`$DUMP_ID\`
- Number of status checks: $i
- Callbacks Received: $(echo "$CALLBACKS_RESPONSE" | jq '. | length')

## System Information
- Test Time: $(date)
- OS: $(uname -a)
- Curl Version: $(curl --version | head -n 1)
EOF

echo "Test completed. Results saved to $OUTPUT_FILE" 
Test Results

Database Dump Test Results

Test conducted on: Sat Jan 25 13:10:23 WAT 2025

Test Steps

Step 1: Initiate Dump

curl -X POST "http://127.0.0.1:8787/export/dump/chunked" \
    -H "Authorization: Bearer ABC123" \
    -H "X-Callback-URL: http://localhost:3000/callback"

Response:

{
  "result": {
    "message": "Database dump started",
    "dumpId": "fb2e9497-d93a-457d-b96d-8cd1ae2d22fb",
    "status": "in_progress",
    "downloadUrl": "http://127.0.0.1:8787/export/dump/fb2e9497-d93a-457d-b96d-8cd1ae2d22fb",
    "estimatedSize": 3236
  }
}

Step 2: Status Checks

Check 1:

curl "http://127.0.0.1:8787/export/dump/fb2e9497-d93a-457d-b96d-8cd1ae2d22fb/status" \
  -H "Authorization: Bearer ABC123"

Response:

{
  "result": {
    "status": "in_progress",
    "progress": {
      "currentTable": "",
      "processedTables": 0,
      "totalTables": 6
    }
  }
}

Check 3 (Final):

{
  "result": {
    "status": "completed",
    "progress": {
      "currentTable": "users",
      "processedTables": 8,
      "totalTables": 6
    }
  }
}

Step 3: Download Dump

Response Excerpt:

SQLite format 3
-- Table: tmp_cache
CREATE TABLE tmp_cache (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "timestamp" REAL NOT NULL,
    "ttl" INTEGER NOT NULL,
    "query" TEXT UNIQUE NOT NULL,
    "results" TEXT
);

-- Table: users
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
INSERT INTO users VALUES (1, 'John Doe', 'john@example.com');

HTTP Status: 200

Summary

  • Final Status: completed
  • Dump ID: fb2e9497-d93a-457d-b96d-8cd1ae2d22fb
  • Number of status checks: 3
  • Callbacks Received: 5

Claim

Total prize pool $250
Total paid $0
Status Pending
Submitted January 25, 2025
Last updated January 25, 2025

Contributors

DA

David Anyatonwu

@onyedikachi-david

100%

Sponsors

OU

Outerbase (YC W23)

@outerbase

$250