AI agent

Drop Claude or GPT into the driver's seat

The agent sees screenshots, reads text, decides what to tap. 40+ tools. BYOK. Per-tenant scope. Audited end to end.

Start free trial Pricing

Inside the agent loop

One sentence in, a dozen tool calls out

The agent thinks aloud, then dispatches OCR, taps, and screenshots until the goal is met — every call is audited.

chat / claude-sonnet-4 running

Open Xiaohongshu and post a photo with the hashtag #cat — make sure I'm logged in first.

Got it. I'll check the foreground app, launch Xiaohongshu, verify the session, then open the post composer.

get_foreground_app

→ com.apple.springboard

launch_app

bundle_id: "com.xingin.discover"

→ ok

find_text_on_screen

text: "For You"

→ found at (187, 142)

Session is live. Tapping the + button to open the composer…

tap_by_text

text: "+"
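The exchange above can be read as a simple loop: the model proposes a tool call, the runtime executes it, records it, and feeds the result back until the model has nothing left to do. The sketch below illustrates that shape only. It assumes a scripted planner in place of a real LLM, and stub tools that return the values shown in the transcript; `run_agent`, `scripted_planner`, and `AuditEntry` are hypothetical names, not Agentfy's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Stub tools (hypothetical): they just return the values seen in the transcript.
def get_foreground_app() -> str:
    return "com.apple.springboard"

def launch_app(bundle_id: str) -> str:
    return "ok"

def find_text_on_screen(text: str) -> dict:
    return {"found": True, "x": 187, "y": 142}

TOOLS: dict[str, Callable[..., Any]] = {
    "get_foreground_app": get_foreground_app,
    "launch_app": launch_app,
    "find_text_on_screen": find_text_on_screen,
}

@dataclass
class AuditEntry:
    tool: str
    args: dict
    result: Any

def scripted_planner(history: list) -> Optional[tuple]:
    """Stand-in for the LLM: returns the next (tool, args), or None when done."""
    plan = [
        ("get_foreground_app", {}),
        ("launch_app", {"bundle_id": "com.xingin.discover"}),
        ("find_text_on_screen", {"text": "For You"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None

def run_agent(planner, max_steps: int = 10) -> list:
    """Dispatch tool calls until the planner stops or the step cap is hit."""
    audit = []
    for _ in range(max_steps):
        step = planner(audit)
        if step is None:
            break
        name, args = step
        result = TOOLS[name](**args)
        audit.append(AuditEntry(name, args, result))  # every call is recorded
    return audit

audit_log = run_agent(scripted_planner)
for entry in audit_log:
    print(entry.tool, entry.args, "->", entry.result)
```

In a real deployment the planner would be a model call that sees the accumulated history. The step cap is one simple way to keep a confused model from looping forever.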

The same tools you'd write — but the agent picks which to call

Every tool below is also callable from your own scripts or via REST. The agent just happens to be the most general consumer.

Natural-language control

Tell Claude or GPT what you want — 'open Xiaohongshu and post a photo with the hashtag #cat' — Agentfy converts the intent into screenshot / find-text / tap / verify tool calls automatically.

40+ built-in tools

screenshot, describe_screen, find_text_on_screen, find_element_on_screen, tap_pixel, tap_by_text, swipe, launch_app, get_foreground_app, read_clipboard, list_macros, run_macro, sequence, wait_for_condition, and 30 more.

Bring your own LLM key

Plug an Anthropic, OpenAI, or OpenRouter key into the settings page. Per-tenant — your costs, your rate limits. No platform markup on inference.

Auto CAPTCHA solver

ai_solve_captcha handles image-selection, slider, checkbox-terms, and click-in-order CAPTCHAs out of the box. It is vision-driven and works within a retry budget.

Sub-agents (ai_takeover)

Macros can hand off to a bounded sub-agent for unstructured steps. The sub-agent runs within its own budget and returns control with a structured result, so the macro keeps its deterministic flow.
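One way to picture the handoff is a macro with a single open-ended step wrapped in a step budget. The following is a minimal sketch under stated assumptions: the `ai_takeover` signature, the `TakeoverResult` fields, and the "dismiss onboarding popups" goal are all illustrative inventions, and a plain callback stands in for the sub-agent's model-driven behaviour.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TakeoverResult:
    ok: bool          # did the sub-agent reach its goal?
    steps_used: int   # how much of the budget it spent
    detail: str

def ai_takeover(goal: str, try_step: Callable[[int], bool], max_steps: int) -> TakeoverResult:
    """Hypothetical bounded sub-agent: act at most max_steps times, then hand back."""
    for step in range(1, max_steps + 1):
        if try_step(step):
            return TakeoverResult(True, step, f"reached goal: {goal}")
    return TakeoverResult(False, max_steps, f"budget exhausted before: {goal}")

def run_macro(try_step: Callable[[int], bool], budget: int) -> list:
    trace = ["launch_app"]  # deterministic step
    result = ai_takeover("dismiss onboarding popups", try_step, budget)
    trace.append(f"ai_takeover ok={result.ok} steps={result.steps_used}")
    # The macro stays in charge: it branches on the structured result.
    trace.append("tap_by_text" if result.ok else "abort")
    return trace

def succeeds_on_third(step: int) -> bool:
    """Stand-in for the sub-agent's work: the goal is reached on the third action."""
    return step == 3

print(run_macro(succeeds_on_third, budget=5))
print(run_macro(succeeds_on_third, budget=2))
```

With a budget of 5 the sub-agent finishes in three steps and the macro continues. With a budget of 2 it runs out, and the macro takes its failure branch instead of hanging.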

Tenant + device scoping

The agent can only act on devices you've granted it access to. Per-tool safety tags. Every action lands in the audit log with model name and token usage.
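The scoping rules can be sketched as a single check that runs before any tool call and always writes an audit record, whether the call is allowed or not. Everything named here is an assumption made for illustration: the `read`/`write` tag values, the `Grant` and `AuditRecord` shapes, and the tenant and device identifiers do not come from Agentfy's documentation.

```python
from dataclasses import dataclass

# Hypothetical safety tags; the real tag vocabulary is not specified here.
TOOL_TAGS = {"screenshot": "read", "tap_by_text": "write"}

@dataclass(frozen=True)
class Grant:
    tenant: str
    device_ids: frozenset    # devices this tenant has granted to the agent
    allowed_tags: frozenset  # safety tags the agent may use

@dataclass
class AuditRecord:
    tenant: str
    device_id: str
    tool: str
    allowed: bool
    model: str
    tokens: int

AUDIT_LOG: list = []

def authorize(grant: Grant, device_id: str, tool: str, model: str, tokens: int) -> bool:
    """Allow a call only on a granted device with a permitted tag; log it either way."""
    allowed = (
        device_id in grant.device_ids
        and TOOL_TAGS.get(tool) in grant.allowed_tags  # unknown tools are denied
    )
    AUDIT_LOG.append(AuditRecord(grant.tenant, device_id, tool, allowed, model, tokens))
    return allowed

grant = Grant("acme", frozenset({"iphone-01"}), frozenset({"read"}))
print(authorize(grant, "iphone-01", "screenshot", "claude-sonnet-4", 412))   # granted device, read tag
print(authorize(grant, "iphone-01", "tap_by_text", "claude-sonnet-4", 388))  # write tag not permitted
print(authorize(grant, "iphone-02", "screenshot", "claude-sonnet-4", 401))   # device not granted
print(len(AUDIT_LOG), "audit records")
```

Logging denied calls alongside allowed ones is a design choice in this sketch: it keeps the audit trail complete even when the agent attempts something outside its scope.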

Plug in your own Anthropic / OpenAI key

Every plan is BYOK — you pay providers directly, with zero platform markup. We may offer bundled inference budgets on higher tiers in the future.
