Drop Claude or GPT into the driver's seat
The agent sees screenshots, reads text, and decides what to tap. 40+ tools. BYOK. Per-tenant scope. Audited end to end.
One sentence in, a dozen tool calls out
The agent thinks aloud, then dispatches OCR, taps, and screenshots until the goal is met. Every call is audited.
bundle_id: "com.xingin.discover"
text: "For You"
text: "+"
The same tools you'd write — but the agent picks which to call
Every tool below is also callable from your own scripts or via REST. The agent just happens to be the most general consumer.
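As a rough illustration of calling one tool from your own script, the sketch below builds (but does not send) an HTTP request for a single `tap_by_text` call. The route `/v1/devices/{device_id}/tools/{tool}`, the bearer-token header, the `{"args": ...}` body shape, and the `agentfy.example` host are all assumptions made for this example, not the documented API.

```python
import json
from urllib.request import Request


def build_tool_request(base_url, api_key, device_id, tool, args):
    """Build (without sending) a POST request for one tool call.

    The route, auth scheme, and body shape are hypothetical placeholders;
    check the real API reference before relying on them.
    """
    url = f"{base_url.rstrip('/')}/v1/devices/{device_id}/tools/{tool}"
    body = json.dumps({"args": args}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host, key, and device ID, for illustration only.
req = build_tool_request(
    "https://agentfy.example/",
    "sk-demo",
    "device-1",
    "tap_by_text",
    {"text": "For You"},
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The point of the sketch is only that a script and the agent would go through the same tool surface.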
Natural-language control
Tell Claude or GPT what you want, for example 'open Xiaohongshu and post a photo with the hashtag #cat', and Agentfy converts the intent into screenshot, find-text, tap, and verify tool calls automatically.
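The intent-to-tool-calls loop can be pictured roughly as follows. This is a minimal sketch, not the product's implementation: `FakeDevice` stands in for a real phone, `scripted_planner` stands in for the LLM, and `run_agent` is a hypothetical dispatcher. Only the tool names and the argument values (`com.xingin.discover`, "For You", "+") come from this page.

```python
from typing import Callable, List, Optional, Tuple

ToolCall = Tuple[str, dict]      # (tool name, keyword arguments)
Step = Tuple[str, dict, dict]    # (tool name, arguments, result)


class FakeDevice:
    """Stand-in for a real device: tracks the foreground app and visible text."""

    def __init__(self) -> None:
        self.foreground: Optional[str] = None
        self.screen_text: List[str] = []
        self.taps: List[str] = []

    def launch_app(self, bundle_id: str) -> dict:
        self.foreground = bundle_id
        self.screen_text = ["For You", "+"]  # pretend the home feed rendered
        return {"ok": True}

    def find_text_on_screen(self, text: str) -> dict:
        return {"found": text in self.screen_text}

    def tap_by_text(self, text: str) -> dict:
        if text not in self.screen_text:
            return {"ok": False}
        self.taps.append(text)
        return {"ok": True}


def scripted_planner(history: List[Step]) -> Optional[ToolCall]:
    """Stand-in for the LLM: returns the next call, or None when done."""
    plan: List[ToolCall] = [
        ("launch_app", {"bundle_id": "com.xingin.discover"}),
        ("find_text_on_screen", {"text": "For You"}),
        ("tap_by_text", {"text": "+"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(
    device: FakeDevice,
    planner: Callable[[List[Step]], Optional[ToolCall]],
    max_steps: int = 12,
) -> List[Step]:
    """Ask the planner for a call, dispatch it, record the result, repeat."""
    tools = {
        "launch_app": device.launch_app,
        "find_text_on_screen": device.find_text_on_screen,
        "tap_by_text": device.tap_by_text,
    }
    history: List[Step] = []
    for _ in range(max_steps):
        call = planner(history)
        if call is None:  # planner signals the goal is met
            break
        name, args = call
        history.append((name, args, tools[name](**args)))
    return history


device = FakeDevice()
history = run_agent(device, scripted_planner)
print([name for name, _, _ in history])
```

In a real deployment the planner would be a model call that sees the previous results, which is why the loop hands it the full history on every step.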
40+ built-in tools
screenshot, describe_screen, find_text_on_screen, find_element_on_screen, tap_pixel, tap_by_text, swipe, launch_app, get_foreground_app, read_clipboard, list_macros, run_macro, sequence, wait_for_condition, and 30 more.
Bring your own LLM key
Plug an Anthropic, OpenAI, or OpenRouter key into the settings page. Keys are per-tenant: your costs, your rate limits. No platform markup on inference.
Auto CAPTCHA solver
ai_solve_captcha handles image-selection, slider, checkbox-terms, and click-in-order CAPTCHAs out of the box. It is vision-driven and works within a retry budget.
Sub-agents (ai_takeover)
Macros can hand off to a bounded sub-agent for unstructured steps. The sub-agent runs within its own budget and returns control with a structured result, so the macro keeps its deterministic flow.
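One way to picture the hand-off is the sketch below: a deterministic macro calls a budgeted sub-agent for a single fuzzy step, then branches on the structured result. `bounded_takeover`, `TakeoverResult`, `run_macro`, and the step-count budget are hypothetical names and choices made for this example. The real `ai_takeover` signature and budget units are not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TakeoverResult:
    success: bool
    steps_used: int
    detail: str


def bounded_takeover(
    goal: str,
    step_fn: Callable[[str, int], bool],
    max_steps: int,
) -> TakeoverResult:
    """Run a sub-agent for at most max_steps steps.

    step_fn(goal, step_number) performs one step and returns True once
    the goal is met. It stands in for a model-driven step.
    """
    for used in range(1, max_steps + 1):
        if step_fn(goal, used):
            return TakeoverResult(True, used, "goal met")
    return TakeoverResult(False, max_steps, "budget exhausted")


def run_macro(step_fn: Callable[[str, int], bool]) -> Tuple[List[str], TakeoverResult]:
    """Deterministic macro with one unstructured step delegated to a sub-agent."""
    log = ["launch_app"]  # fixed step
    result = bounded_takeover("dismiss the onboarding popup", step_fn, max_steps=3)
    if not result.success:
        log.append("abort")  # the macro, not the sub-agent, decides what happens next
        return log, result
    log.append("tap_by_text")  # fixed step resumes
    return log, result


# A sub-agent that succeeds on its second step, and one that never succeeds.
ok_log, ok_result = run_macro(lambda goal, step: step == 2)
bad_log, bad_result = run_macro(lambda goal, step: False)
print(ok_log, ok_result.steps_used)
print(bad_log, bad_result.detail)
```

Because the budget lives in the hand-off rather than in the macro, a stuck sub-agent cannot stall the surrounding flow past its step limit.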
Tenant + device scoping
The agent can act only on devices you've granted it access to. Each tool carries safety tags. Every action lands in the audit log with the model name and token usage.
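A minimal sketch of how scoping and auditing could fit together, under stated assumptions: `ScopedExecutor`, `AuditEntry`, the tag names, and the rule that a blocked tag denies a call are all hypothetical. The page says only that devices are granted, tools carry safety tags, and the audit log records the model name and token usage.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class AuditEntry:
    tenant: str
    device: str
    tool: str
    model: str
    tokens: int
    allowed: bool


class ScopedExecutor:
    """Checks device grants and tool tags, and audits every attempt."""

    def __init__(
        self,
        tenant: str,
        granted_devices: Set[str],
        tool_tags: Dict[str, Set[str]],
        blocked_tags: Set[str],
    ) -> None:
        self.tenant = tenant
        self.granted_devices = granted_devices
        self.tool_tags = tool_tags        # hypothetical tag assignment per tool
        self.blocked_tags = blocked_tags  # hypothetical tenant policy
        self.audit: List[AuditEntry] = []

    def call(self, device: str, tool: str, model: str, tokens: int) -> str:
        tags = self.tool_tags.get(tool, set())
        allowed = device in self.granted_devices and not (tags & self.blocked_tags)
        # Log before enforcing, so denied attempts are recorded too.
        self.audit.append(AuditEntry(self.tenant, device, tool, model, tokens, allowed))
        if not allowed:
            raise PermissionError(f"{tool} on {device} is outside the tenant's scope")
        return f"{tool} ok"


executor = ScopedExecutor(
    tenant="acme",
    granted_devices={"device-1"},
    tool_tags={"screenshot": {"read_only"}, "read_clipboard": {"sensitive"}},
    blocked_tags={"sensitive"},
)

first = executor.call("device-1", "screenshot", model="example-model", tokens=120)

denied = []
for device, tool in [("device-2", "screenshot"), ("device-1", "read_clipboard")]:
    try:
        executor.call(device, tool, model="example-model", tokens=80)
    except PermissionError:
        denied.append((device, tool))

print(first, denied, len(executor.audit))
```

Writing the audit entry before the permission check is a design choice of this sketch: it keeps refused calls visible in the log alongside successful ones.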
Plug in your own Anthropic / OpenAI key
Every plan is BYOK: you pay providers directly, with zero platform markup. We may offer bundled inference budgets on higher tiers in the future.
Start free trial