Launch a headless browser bound to a local port, then drive it over a plain HTTP API — with an interactive Swagger UI console at the root.
webrudder <url> starts a local daemon: it launches a headless Chromium at that URL
and serves it on a localhost port. Agents and scripts drive it through an HTTP API;
visiting the root serves Swagger UI — an interactive list of every endpoint with a
try-it-out console. Navigation happens by interacting (clicking links and buttons); the daemon
is a state machine that tracks the current URL, DOM, and element map. Close the terminal and the
browser dies with it.
No bundled browser bloat. No per-step screenshots. No MCP layer. One static binary talking to Chromium over CDP.
Driving a browser through an LLM is slow when every step round-trips a screenshot: the model waits on inference, parses ~1500 image tokens, acts, and then repeats. Engine speed was never the bottleneck; the protocol is. webrudder cuts the loop two ways.
GET /scan returns a compact list of actionable elements
(for example, e1 button "Login"). The model acts by ref-id with no vision, spending
~50 tokens per step instead of ~1500.
Many actions in one request (POST /batch) collapse N round-trips
into one — a whole form fills and submits in a single call.
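To put rough numbers on the two savings above, here is a back-of-envelope sketch in Python. The per-step figures (~1500 image tokens, ~50 text tokens) come from the text; the ten-step task length is an assumption chosen for illustration, and the sketch counts only observation tokens and HTTP requests.

```python
# Back-of-envelope comparison. Token figures are the approximate ones quoted
# above; the step count is an illustrative assumption.
SCREENSHOT_TOKENS = 1500  # ~tokens per step when the model reads a screenshot
SCAN_TOKENS = 50          # ~tokens per step when the model reads /scan output


def observation_tokens(steps: int, tokens_per_step: int) -> int:
    """Total tokens spent observing the page across a task."""
    return steps * tokens_per_step


def round_trips(actions: int, batched: bool) -> int:
    """HTTP requests needed to send `actions` actions."""
    if actions == 0:
        return 0
    return 1 if batched else actions


steps = 10  # assumed task length
print(observation_tokens(steps, SCREENSHOT_TOKENS))  # 15000
print(observation_tokens(steps, SCAN_TOKENS))        # 500
print(round_trips(steps, batched=False))             # 10
print(round_trips(steps, batched=True))              # 1
```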
Terminal: ./webrudder https://example.com
↓
Daemon launches headless Chromium (over CDP)
and serves http://localhost:10000
↓
Humans → open localhost:10000 → Swagger UI (endpoint list + try-it-out)
Agents → GET /scan · POST /click · GET /read ...
↓
Daemon = state machine: current URL + DOM + element map.
Clicking navigates; re-scan for the new page's elements.
↓
Close terminal → daemon + Chromium die cleanly
The URL passed at launch is just the entry point. After that you move around by interacting, so you rarely need to re-feed URLs to navigate; POST /goto exists for an explicit jump.
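The re-scan rule from the flow above can be sketched as a small helper. This is a minimal illustration, not part of webrudder: `api` is a hypothetical callable that sends one request and returns the decoded JSON, and the sketch assumes element refs should be treated as stale once a click reports `navigated`.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, body) -> decoded JSON response.
Api = Callable[[str, str, Optional[dict]], dict]


def click_and_refresh(api: Api, ref: str, elements: list) -> list:
    """Click `ref`; if the page changed, fetch the new page's elements.

    Returns the element list to use for the next action.
    """
    result = api("POST", "/click", {"ref": ref})
    if result.get("navigated"):
        # Assumption: refs from the old page no longer apply after navigation.
        return api("GET", "/scan", None)["elements"]
    return elements
```

Passing the transport in as a parameter keeps the helper testable with a fake, so the loop logic can be exercised without a running daemon.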
| Surface | URL | For | What it does |
|---|---|---|---|
| Swagger UI | localhost:10000/ | humans | interactive endpoint list + try-it-out console |
| HTTP API | localhost:10000/scan, /click, … | agents / scripts | programmatic control, JSON in and out |
Both hit the same state machine — a try-it-out call and an agent's /click act on one live browser. Swagger UI is generated from the API's OpenAPI spec, the single source of truth for every endpoint.
Start it in one terminal:
$ ./webrudder https://example.com
webrudder · http://localhost:10000 · chromium pid 4821 · ctrl-c to quit
Drive it via the API — curl, an agent, or a script:
$ curl localhost:10000/scan
{"elements":[{"ref":"e1","role":"link","name":"More information","href":"..."}]}
$ curl -X POST localhost:10000/click -d '{"ref":"e1"}'
{"ok":true,"navigated":true,"url":"https://www.iana.org/help/example-domains"}
Or open http://localhost:10000/ for Swagger UI — browse every endpoint and fire requests live.
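For scripts, the same two curl calls can be made from Python's standard library. This is a hedged sketch: `BASE`, `call`, `find_ref`, and `main` are illustrative names, port 10000 is taken from the example session, and the JSON `Content-Type` header is an assumption (the curl example does not set one).

```python
import json
import urllib.request
from typing import Optional

BASE = "http://localhost:10000"  # port taken from the example session


def call(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one request to the daemon and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        # Assumption: the daemon accepts an explicit JSON content type.
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_ref(elements: list, name: str) -> Optional[str]:
    """Return the ref of the first element whose name matches, else None."""
    for el in elements:
        if el.get("name") == name:
            return el["ref"]
    return None


def main() -> None:
    """Scan, locate the link by name, and click it (needs a running daemon)."""
    elements = call("GET", "/scan")["elements"]
    ref = find_ref(elements, "More information")
    if ref is not None:
        print(call("POST", "/click", {"ref": ref}))
```

Call `main()` with the daemon running. Looking the element up by name rather than hard-coding `e1` is a design choice of this sketch, since refs come from whatever the latest scan returned.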
Base URL: http://localhost:<port>
| Method & Path | Body | Returns |
|---|---|---|
| GET /scan | — | actionable elements [{ref, role, name, kind?}] |
| GET /read | — | {url, title, text} |
| GET /snap | — | full-page PNG; ?full=false for viewport only |
| GET /status | — | {url, title, port} |
| POST /click | {ref} | {ok, navigated?, downloaded?, needs_file?} |
| POST /fill | {ref, text} | {ok} |
| POST /goto | {url} | {ok, url} |
| POST /upload | {ref, file} | clicks, intercepts the file chooser, injects the file |
| POST /download | {ref, dir?} | clicks, waits, returns the saved path |
| POST /batch | {actions:[…]} | many actions, one request |
| POST /shutdown | — | stops the daemon and browser |
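A client can lint its own requests against the table before sending them. The sketch below transcribes each method and body key from the table; `ENDPOINTS` and `check_request` are hypothetical names, and how the daemon itself treats missing or extra keys is not stated here, so this is a client-side check only.

```python
from typing import Optional

# (method, required body keys, optional body keys), transcribed from the table.
# Keys marked "?" in the table are optional. Query strings such as
# /snap?full=false are not handled; pass the bare path.
ENDPOINTS = {
    "/scan": ("GET", set(), set()),
    "/read": ("GET", set(), set()),
    "/snap": ("GET", set(), set()),
    "/status": ("GET", set(), set()),
    "/click": ("POST", {"ref"}, set()),
    "/fill": ("POST", {"ref", "text"}, set()),
    "/goto": ("POST", {"url"}, set()),
    "/upload": ("POST", {"ref", "file"}, set()),
    "/download": ("POST", {"ref"}, {"dir"}),
    "/batch": ("POST", {"actions"}, set()),
    "/shutdown": ("POST", set(), set()),
}


def check_request(method: str, path: str, body: Optional[dict] = None) -> list:
    """Return a list of problems; an empty list means the request fits the table."""
    if path not in ENDPOINTS:
        return [f"unknown path {path}"]
    expected_method, required, optional = ENDPOINTS[path]
    problems = []
    if method != expected_method:
        problems.append(f"{path} expects {expected_method}, got {method}")
    keys = set(body or {})
    missing = required - keys
    extra = keys - required - optional
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems


print(check_request("POST", "/fill", {"ref": "e1"}))  # ["missing keys: ['text']"]
```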
It navigates, interacts, and extracts. It does not assert — the caller reads back text / url / elements and decides pass/fail.
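Since pass/fail lives with the caller, a check can be an ordinary function over the `/read` response. A minimal sketch, assuming the `{url, title, text}` shape from the endpoint table; `page_passes`, its parameters, and the sample page are hypothetical.

```python
def page_passes(read_result: dict, url_prefix: str, must_contain: str) -> bool:
    """Decide pass/fail from a /read response: right page, expected text present."""
    on_expected_page = read_result.get("url", "").startswith(url_prefix)
    has_text = must_contain in read_result.get("text", "")
    return on_expected_page and has_text


# Made-up /read response, for illustration only.
sample = {
    "url": "https://example.com/orders/42",
    "title": "Order",
    "text": "Order confirmed. Thank you.",
}

print(page_passes(sample, "https://example.com/orders/", "Order confirmed"))  # True
print(page_passes(sample, "https://example.com/orders/", "Payment failed"))   # False
```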
scan + read return cheap text; screenshots only on demand. No vision tokens per step.
No bundled browser, no driver layer — a thin CDP client and a single static binary. Chromium is fetched once, not embedded.
A long-running daemon holds the live browser, so requests operate on evolving page state. Close the terminal and everything dies cleanly.