Agentfy

The shortest path from a sentence to a real iPhone tap

Skip XCUITest, simulators, and JSON-RPC plumbing. Plug in a Claude or GPT key, hand the agent a goal, and watch it work the real device.

LLM driving an iPhone
Inside the loop

One sentence in, a dozen tool calls out

The agent narrates its plan, then dispatches OCR, taps, and screenshots until the goal is met. Every call is audited end to end.

chat / claude-sonnet-4 (running)

User: Find the latest unread DM in WeChat from the marketing channel and summarize it.

AI: Launching WeChat, then I'll scan the chat list for the marketing channel and look for the unread badge.

launch_app  bundle_id: "com.tencent.xin"  → ok
find_text_on_screen  text: "marketing"  → match at (210, 384)
tap_by_text  text: "marketing"
describe_screen  → { "unread": "Q3 plan review at 3pm…" }

AI: The latest message is a meeting reminder for the Q3 plan review at 3pm today; no action is required from you.
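
For readers who want to picture the loop in code, here is a minimal sketch of the dispatch-and-audit pattern the transcript illustrates. It is not Agentfy's implementation: the two tool functions are hypothetical stubs that return canned values, the plan is fixed up front (a real agent would pick each next call from the previous result), and the audit record shape is an assumption.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for device tools. Real calls would reach the phone;
# these return canned values that mirror the transcript above.
def launch_app(bundle_id: str) -> str:
    return "ok"

def find_text_on_screen(text: str) -> Dict[str, int]:
    return {"x": 210, "y": 384}

# Registry mapping tool names to callables.
TOOLS: Dict[str, Callable[..., Any]] = {
    "launch_app": launch_app,
    "find_text_on_screen": find_text_on_screen,
}

def run_plan(plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Dispatch each planned tool call in order and record it in an audit trail."""
    audit: List[Dict[str, Any]] = []
    for step in plan:
        tool = TOOLS[step["name"]]  # raises KeyError for an unknown tool
        result = tool(**step["arguments"])
        audit.append(
            {"name": step["name"], "arguments": step["arguments"], "result": result}
        )
    return audit

plan = [
    {"name": "launch_app", "arguments": {"bundle_id": "com.tencent.xin"}},
    {"name": "find_text_on_screen", "arguments": {"text": "marketing"}},
]
audit = run_plan(plan)
for entry in audit:
    print(entry["name"], "->", entry["result"])
```

Keeping the audit entry next to the dispatch means no call can run without leaving a record, which is the property the demo is advertising.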
Architecture

From your editor's chat box to a real iPhone

The MCP client speaks stdio, the bridge translates to authenticated HTTPS, and the dashboard fans out over a reverse tunnel, with all four hops completing in under 200ms.

architecture
   MCP client      Claude  /  GPT  /  any LLM
        │
        │   tool-call
        ▼
   agentfy-mcp-server     ← 40+ device tools
        │
        │   HTTPS  +  X-API-Key
        ▼
   app.agentfy.io         ← tenant scope, audit log
        │
        │   reverse tunnel
        ▼
   real iPhones           ← 1 device, or 100
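
To make the middle hop concrete, the sketch below shows how a bridge might wrap one tool call in an authenticated HTTPS request. Only the host and the X-API-Key header come from the diagram. The endpoint path, the JSON payload shape, and the helper name are hypothetical, so treat this as an illustration rather than the documented API. The request is built but not sent.

```python
import json
import urllib.request

# From the diagram: the bridge talks to this host over HTTPS.
BASE_URL = "https://app.agentfy.io"
# Hypothetical path; the real endpoint is not given on this page.
TOOL_PATH = "/api/tools/call"

def build_tool_request(api_key: str, name: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one tool call."""
    # Assumed payload shape: tool name plus its arguments.
    body = json.dumps({"name": name, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + TOOL_PATH,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )

req = build_tool_request("YOUR_API_KEY", "tap_by_text", {"text": "marketing"})
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, once the real path and payload are confirmed.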

        
Tools: 40+
Latency: < 200ms
Per-tenant: isolated
Setup: 60s
Tool surface

40+ tools, exposed as first-class MCP calls

The same tools your scripts can call — the agent just happens to be the most general consumer.
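
For a sense of what a "first-class MCP call" looks like on the wire, here is a small sketch of the JSON-RPC 2.0 message an MCP client sends for a tool invocation. The `tools/call` method and the `name`/`arguments` params follow the MCP specification. The tool name and argument are taken from the demo transcript on this page, and the helper function is hypothetical. This is the plumbing an MCP client normally handles for you.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC requires an id on each request.
_ids = count(1)

def tools_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, which suits the
    # newline-delimited framing used by the stdio transport.
    return json.dumps(message)

line = tools_call("launch_app", {"bundle_id": "com.tencent.xin"})
print(line)
```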

Device input

What the agent can do on the screen
tap tap_by_text swipe long_press text press_home press_lock

Perception

What the agent can see
screenshot describe_screen find_text_on_screen find_element_on_screen ocr

App control

Lifecycle and deep links
launch_app terminate_app get_foreground_app open_url list_apps

Sub-agents + AI

Hand off the messy bits
ai_takeover ai_solve_captcha ai_extract ai_classify

Network + state

Talk to the outside world
http extract jsonpath set log

Vault + clipboard

Secrets and host-device IO
${vault.X} read_clipboard write_clipboard paste_to_phone
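
The `${vault.X}` entry reads like a placeholder that is swapped for a stored secret when a step runs. The sketch below shows one way such substitution could work. The regular expression, the function name, the allowed key characters, and the in-memory vault are all assumptions made for illustration, not a description of how Agentfy resolves secrets.

```python
import re
from typing import Mapping

# Assumed syntax: ${vault.NAME}, where NAME is letters, digits, or underscores.
VAULT_REF = re.compile(r"\$\{vault\.([A-Za-z0-9_]+)\}")

def resolve_vault_refs(template: str, vault: Mapping[str, str]) -> str:
    """Replace every ${vault.NAME} reference with its value from the vault."""
    def replace(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in vault:
            # Fail loudly rather than typing a literal placeholder into the device.
            raise KeyError(f"unknown vault entry: {key}")
        return vault[key]
    return VAULT_REF.sub(replace, template)

# Hypothetical vault contents, for demonstration only.
vault = {"LOGIN_PASSWORD": "s3cret"}
resolved = resolve_vault_refs("${vault.LOGIN_PASSWORD}", vault)
```

Resolving at the last moment keeps the secret out of the script text and out of the model's prompt, so only the placeholder appears there.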

Build with…

Bring your LLM key, bring an iPhone, start agenting

BYOK on every plan. 40+ MCP tools. 60 seconds to first call.

Start free trial