Fixes: #732 /claim #732
The PR includes three example scripts demonstrating different capabilities:
omniparser_captcha.py
This example demonstrates how the OmniParser integration can detect CAPTCHA elements on webpages:
INFO [browser_use] BrowserUse logging setup complete with level info
INFO [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
INFO [__main__] Navigating to page with CAPTCHA...
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Merging 18 OmniParser elements with DOM results
INFO [__main__] Detected elements:
INFO [__main__] Screenshot saved to captcha_detected.png
Screenshot: captcha_detected.png
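The log line "Merging 18 OmniParser elements with DOM results" indicates that visually detected elements are combined with DOM-derived ones, but the PR text does not say how. The following is a minimal sketch of one common way to do it. It assumes each element carries a text label and a pixel bounding box, drops vision detections that overlap a DOM element by intersection-over-union (IoU) at or above a threshold, and then flags CAPTCHA-like labels. `Element`, `merge_elements`, `find_captcha_like`, the keyword list, and the 0.5 threshold are all hypothetical, not browser-use or OmniParser APIs.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical UI element: a text label plus a bounding box in page pixels."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    source: str  # "dom" or "vision" (hypothetical tag)


def iou(a: Element, b: Element) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 when disjoint)."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_elements(dom, vision, threshold=0.5):
    """Keep every DOM element; add a vision element only if its IoU with
    every DOM element is below `threshold` (i.e. it is not a duplicate)."""
    merged = list(dom)
    for v in vision:
        if all(iou(v, d) < threshold for d in dom):
            merged.append(v)
    return merged


def find_captcha_like(elements, keywords=("captcha", "recaptcha", "i'm not a robot")):
    """Return elements whose lower-cased label contains a CAPTCHA-related keyword."""
    return [e for e in elements if any(k in e.label.lower() for k in keywords)]


if __name__ == "__main__":
    dom = [Element("Submit", 0, 0, 100, 40, "dom")]
    vision = [
        Element("Submit button", 2, 1, 98, 39, "vision"),               # duplicate of the DOM button
        Element("I'm not a robot checkbox", 0, 60, 300, 140, "vision"),  # only visible to the vision model
    ]
    merged = merge_elements(dom, vision)
    print([e.label for e in merged])
    print([e.label for e in find_captcha_like(merged)])
```

Keeping DOM elements as the base and adding only non-overlapping vision detections preserves the DOM's precise selectors wherever they exist, while still surfacing widgets (such as CAPTCHA images rendered in iframes or canvases) that the DOM pass misses.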
omniparser_complex_ui.py
This example shows how OmniParser helps navigate complex UI elements that traditional DOM-based methods struggle with:
INFO [browser_use] BrowserUse logging setup complete with level info
INFO [root] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
INFO [__main__] PART 1: Handling complex form interaction...
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Merging 87 OmniParser elements with DOM results
INFO [__main__] Initial state has 224 elements
INFO [__main__] Saved initial screenshot to airbnb_initial.png
INFO [__main__] Looking for the search/location input field...
INFO [__main__] Could not interact with search field using direct selectors: Page.click: Timeout 30000ms exceeded.
Call log:
- waiting for locator("text=Where")
- - locator resolved to 2 elements. Proceeding with the first one: <div class="f16sug5q atm_c8_1cw7z3g atm_g3_qslrf5 atm_cs_10d11i2 atm_l8_1mni9fk atm_sq_1l2sidv atm_vv_1q9ccgz atm_ks_15vqwwr atm_am_ggq5uc atm_jb_1xtcb10 dir dir-ltr">Anywhere</div>
- - attempting click action
- 2 × waiting for element to be visible, enabled and stable
- - element is not visible
- - retrying click action
- - waiting 20ms
- 2 × waiting for element to be visible, enabled and stable
- - element is not visible
- - retrying click action
- - waiting 100ms
- 56 × waiting for element to be visible, enabled and stable
- - element is not visible
- - retrying click action
- - waiting 500ms
INFO [__main__] Trying alternative search approach...
INFO [__main__] Found 1 potential search elements
INFO [__main__] Clicked on search element with xpath: html/body/div[5]/div/div/div/div/div[3]/div[2]/div/div/div/header/div/div[2]/div[2]/div/div/div/form/div[2]/div/div/div/label/div/input
INFO [__main__] Entered 'London' as search text
INFO [__main__] Saved fallback search screenshot to airbnb_search_fallback.png
INFO [__main__] Complex form interaction complete
Screenshots:
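Part 1 illustrates a fallback pattern: the direct `text=Where` selector resolved to a hidden element and timed out after 30000 ms, after which the script located the search input by another route and clicked it by XPath. The script's source is not reproduced here, so this is only a minimal, dependency-free sketch of the try-in-order idea. `first_successful` and the two stand-in actions are hypothetical; in a real script each action would wrap a Playwright call, whose timeout error is a subclass of `Exception`.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_successful(
    strategies: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[Optional[str], Optional[str], List[Tuple[str, str]]]:
    """Run each (name, action) in order and return the first that does not raise.

    Returns (winning_name, result, errors), where errors lists (name, message)
    for every strategy that failed. If all fail, name and result are None.
    """
    errors: List[Tuple[str, str]] = []
    for name, action in strategies:
        try:
            return name, action(), errors
        except Exception as exc:  # e.g. a Playwright timeout in a real script
            errors.append((name, str(exc)))
    return None, None, errors


def click_direct_selector() -> str:
    # Stand-in for the failure in the log: the matched element is not visible.
    raise TimeoutError("Page.click: Timeout 30000ms exceeded.")


def click_detected_input() -> str:
    # Stand-in for clicking an input found by an alternative search, via its XPath.
    return "clicked search input via xpath"


name, result, errors = first_successful([
    ("direct selector", click_direct_selector),
    ("alternative search", click_detected_input),
])
print(name, "->", result)
print("earlier failures:", errors)
```

Collecting the earlier errors instead of discarding them keeps the information needed to log why the primary selector failed, as the script does before "Trying alternative search approach...".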
INFO [__main__] PART 2: Handling dynamic UI components...
INFO [__main__] Saved initial carousel screenshot to carousel_initial.png
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Merging 39 OmniParser elements with DOM results
INFO [__main__] Found 5 potential carousel controls
INFO [__main__] Clicking carousel 'Next' button...
INFO [__main__] Saved carousel 'next' screenshot to carousel_next.png
INFO [__main__] Saved final carousel screenshot to carousel_final.png
INFO [__main__] Dynamic component interaction complete
Screenshots:
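Part 2 logs five potential carousel controls followed by a click on the 'Next' button, without showing how that button was chosen among the candidates. One plausible heuristic, sketched here under the assumption that each candidate exposes a text label (from the DOM or from a vision model's caption), is to match common "next" wording and right-pointing arrow glyphs. `pick_next_control` and its regular expression are hypothetical.

```python
import re
from typing import Optional, Sequence

# Words or right-pointing glyphs that commonly label a carousel's "next" control.
NEXT_PATTERN = re.compile(r"\b(next|forward)\b|[›»→>]", re.IGNORECASE)


def pick_next_control(labels: Sequence[str]) -> Optional[int]:
    """Return the index of the first label that looks like a 'Next' control, or None."""
    for i, label in enumerate(labels):
        if NEXT_PATTERN.search(label):
            return i
    return None


if __name__ == "__main__":
    candidates = ["Previous slide", "‹", "Pause", "Next slide", "›"]
    print(pick_next_control(candidates))
```

Matching on labels rather than positions keeps the heuristic independent of where the carousel places its arrows; the left-pointing `‹` is deliberately not in the pattern so that "previous" controls are skipped.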
INFO [__main__] PART 3: Extracting visual information...
INFO [__main__] Saved GitHub trending page screenshot to github_trending.png
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Using hosted OmniParser API (local installation not available)
INFO [omniparser] Merging 168 OmniParser elements with DOM results
INFO [__main__] Extracted information about 10 trending repositories:
INFO [__main__] 1. sponsors/vllm-project
INFO [__main__] Description: Cost-efficient and pluggable Infrastructure components for GenAI inference
INFO [__main__] 2. al1abb/invoify
INFO [__main__] Description: An invoice generator app built using Next.js, Typescript, and Shadcn
INFO [__main__] 3. sponsors/NirDiamant
INFO [__main__] Description: This repository provides tutorials and implementations for various Generative AI Agent techniques...
INFO [__main__] 4. deepseek-ai/awesome-deepseek-integration
INFO [__main__] 5. mishushakov/llm-scraper
INFO [__main__] Description: Turn any webpage into structured data using LLMs
INFO [__main__] Saved extracted repository data to trending_repos.json
INFO [__main__] Visual information extraction complete
Screenshot: github_trending.png
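Part 3 ends by saving the extracted repositories to `trending_repos.json`, and the log shows that some entries (for example `deepseek-ai/awesome-deepseek-integration`) have no description. The file's exact schema is not shown, so the following is a minimal sketch assuming one JSON object per repository with `rank`, `name`, and an optional `description` that is stored as `null` when missing. `TrendingRepo`, `save_repos`, and `load_repos` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class TrendingRepo:
    """One extracted trending entry (hypothetical schema)."""
    rank: int
    name: str                          # "owner/repo" as read from the page
    description: Optional[str] = None  # absent for some entries


def save_repos(repos: List[TrendingRepo], path: Path) -> None:
    """Write repos as a JSON array; a missing description becomes null."""
    data = [asdict(r) for r in repos]
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def load_repos(path: Path) -> List[TrendingRepo]:
    """Read the JSON array back into TrendingRepo objects."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return [TrendingRepo(**item) for item in data]
```

For example, `save_repos([TrendingRepo(5, "mishushakov/llm-scraper", "Turn any webpage into structured data using LLMs")], Path("trending_repos.json"))` writes a one-element array that `load_repos` reads back unchanged.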
The integration simplifies working with complex web UIs by automatically:
The three example scripts demonstrate functionality across different use cases:
David Anyatonwu
@onyedikachi-david
Browser Use (YC W25)
@browser-use