Agentfy

AI-solved CAPTCHAs, built into the macro DSL

Four CAPTCHA kinds out of the box. Your own LLM key drives the vision model. Embeds as a single step in any macro.

ai_solve_captcha clearing a slider challenge
Why not just buy a service

Built in, on your key, in your audit log

Per-solve pricing scales poorly; an external service can't be audited. ai_solve_captcha runs inside your tenant on inference you already pay for.

Third-party CAPTCHA service

Pay per solve, slow, brittle, audit-opaque

~$2 per 1,000 solves
  • Send the screenshot to an external server, wait 10–30 seconds for a result
  • Per-solve pricing — runaway costs once you scale to fleets
  • Adds an external dependency you can't see inside or audit
  • Specific to one or two CAPTCHA shapes; new variants break silently
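
To put the per-solve figures above in perspective, here is a rough back-of-the-envelope sketch. The price and wait range come from the list above; the fleet size and solve volume are made-up numbers for illustration only.

```python
from typing import Tuple

# Figures quoted in the comparison above.
PRICE_PER_1000_SOLVES_USD = 2.0   # about $2 per 1,000 solves
WAIT_SECONDS_RANGE = (10, 30)     # 10 to 30 seconds per result


def third_party_cost_usd(solves: int) -> float:
    """Cost of `solves` CAPTCHAs at the quoted per-solve price."""
    return solves * PRICE_PER_1000_SOLVES_USD / 1000


def total_wait_hours(solves: int) -> Tuple[float, float]:
    """Cumulative time spent waiting on results (best case, worst case)."""
    low, high = WAIT_SECONDS_RANGE
    return solves * low / 3600, solves * high / 3600


if __name__ == "__main__":
    # Hypothetical fleet: 200 devices, 50 solves per device per day, 30 days.
    monthly_solves = 200 * 50 * 30
    print("solves per month:", monthly_solves)
    print("monthly cost (USD):", third_party_cost_usd(monthly_solves))
    print("cumulative wait (hours):", total_wait_hours(monthly_solves))
```

Both totals grow linearly with solve volume, which is the scaling concern the list describes.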
ai_solve_captcha (built in)

Vision model on your own key, in your own audit log

Pay only your LLM inference
  • One macro step, runs entirely on your Anthropic / OpenAI / OpenRouter key
  • Covers 4 CAPTCHA types: image_selection, slider, click_in_order, checkbox_terms
  • Each solve lands in agent_sub_runs with model name and token usage
  • Retry budget + exclusion set — failed picks aren't tried again
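
The "retry budget + exclusion set" idea can be illustrated with a short sketch. This is not Agentfy's implementation; `propose` and `verify` are hypothetical callables standing in for the vision-model call and the on-screen success check.

```python
from typing import Callable, Hashable, Optional, Set


def solve_with_retries(
    propose: Callable[[Set[Hashable]], Optional[Hashable]],
    verify: Callable[[Hashable], bool],
    max_attempts: int = 3,
) -> dict:
    """Try candidates until one verifies or the retry budget runs out.

    `propose` receives the set of already-failed picks and returns a new
    candidate, or None when it has nothing left to offer.
    """
    excluded: Set[Hashable] = set()
    attempts = 0
    while attempts < max_attempts:
        candidate = propose(excluded)
        if candidate is None or candidate in excluded:
            break  # nothing new to try; stop rather than repeat a failure
        attempts += 1
        if verify(candidate):
            return {"solved": True, "attempts": attempts, "pick": candidate}
        excluded.add(candidate)  # remember the failed pick
    return {"solved": False, "attempts": attempts, "excluded": excluded}


if __name__ == "__main__":
    offsets = [120, 180, 150]  # hypothetical slider offsets in pixels
    result = solve_with_retries(
        propose=lambda failed: next((o for o in offsets if o not in failed), None),
        verify=lambda offset: offset == 180,
    )
    print(result)
```

Passing the exclusion set back into `propose` is what keeps a failed pick from being tried twice, and `max_attempts` caps how much inference a single solve can consume.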
Coverage

Four CAPTCHA kinds, one solver step

The auto-classifier picks the right strategy from the frame; you can also pin a kind explicitly for faster, cheaper runs.

Image selection

Pick the cells that match
ai_solve_captcha kind=image_selection · describe_screen · tap_pixel

Slider

Drag the handle to the target
ai_solve_captcha kind=slider · swipe · find_element_on_screen

Click in order

Tap A, B, C in sequence
ai_solve_captcha kind=click_in_order · find_text_on_screen · tap_by_text

Terms checkbox

Auto-find and check the agree box
ai_solve_captcha kind=checkbox_terms · tap_by_text
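
The choice between auto-classification and a pinned kind can be sketched as a small dispatch function. This is an illustration under assumptions, not the product's code: `choose_kind` and `classify` are hypothetical names, and reading the tool names on each card as "tools used by that kind" is an interpretation of the page.

```python
from typing import Callable, Dict, Optional, Tuple

# Kind -> tool names, as they appear on the four cards above.
TOOLS_BY_KIND: Dict[str, Tuple[str, ...]] = {
    "image_selection": ("describe_screen", "tap_pixel"),
    "slider": ("swipe", "find_element_on_screen"),
    "click_in_order": ("find_text_on_screen", "tap_by_text"),
    "checkbox_terms": ("tap_by_text",),
}


def choose_kind(
    frame: bytes,
    classify: Callable[[bytes], str],
    pinned: Optional[str] = None,
) -> str:
    """Return the pinned kind if one is given, otherwise ask the classifier."""
    kind = pinned if pinned is not None else classify(frame)
    if kind not in TOOLS_BY_KIND:
        raise ValueError(f"unsupported CAPTCHA kind: {kind!r}")
    return kind


if __name__ == "__main__":
    def fake_classifier(frame: bytes) -> str:
        # Stand-in for a vision-model call that inspects the frame.
        return "slider"

    print(choose_kind(b"", fake_classifier))                           # classifier runs
    print(choose_kind(b"", fake_classifier, pinned="checkbox_terms"))  # classifier skipped
```

In this sketch a pinned kind never reaches the classifier, which is one plausible reason pinning would be faster and cheaper.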
A real solve run

What ai_solve_captcha looks like at runtime

The agent narrates, calls the solver tool with a 100-second budget, and reports the result with classifier output for audit.

chat / claude-sonnet-4 (running)
User: When the security check appears, solve it and continue submission.
AI: Watching for the verification screen. When it appears I'll hand the frame to ai_solve_captcha.
  find_text_on_screen text: "Security Verification" → match at (140, 220)
  screenshot → frame captured
  ai_solve_captcha auto, timeout=100s → { solved: true, kind: "slider", attempts: 2 }
AI: Solved on second attempt (slider). Tapping Log in to submit…
  tap_by_text text: "Log in" exact
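
For readers wondering what an audit entry for a run like this might contain, here is a minimal sketch. Only the `solved`, `kind`, and `attempts` fields and the fact that model name and token usage are logged come from this page; the record shape, the field names `input_tokens` and `output_tokens`, and the token counts in the example are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SubRunRecord:
    # Fields reported by the solver in the transcript above.
    solved: bool
    kind: str
    attempts: int
    # The page says model name and token usage are logged;
    # these exact field names are assumed for illustration.
    model: str
    input_tokens: int
    output_tokens: int


def to_audit_json(result: dict, model: str, input_tokens: int, output_tokens: int) -> str:
    """Merge a solver result with model and token usage into one JSON line."""
    record = SubRunRecord(
        solved=bool(result["solved"]),
        kind=str(result["kind"]),
        attempts=int(result["attempts"]),
        model=model,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
    )
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    result = {"solved": True, "kind": "slider", "attempts": 2}
    # Token counts below are placeholders, not measured values.
    print(to_audit_json(result, "claude-sonnet-4", input_tokens=1200, output_tokens=80))
```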
Drop-in usage

Embed in any macro as a single step

Wrap the step in try/catch for a graceful fallback. In the example below, the catch block logs the failure, terminates the app, and halts the macro, so an unsolved CAPTCHA leaves a clean audit trail and no orphan sessions.

login-with-captcha.agfm
# Login flow with a CAPTCHA escape hatch
tap "Log in" exact

if visible "Security Verification" timeout=5s {
  try {
    ai_solve_captcha -> captcha timeout=100s
    tap "Log in" exact
  } catch {
    # Bail out of the macro cleanly so the runner can retry
    log "captcha unsolved — terminating"
    terminate_app "com.example.app"
    halt
  }
}


Stop paying per-solve for CAPTCHAs

Bring your own LLM key and solve as many CAPTCHAs as your inference budget allows. Drop the step into existing macros without a rewrite.

Start free trial