Documentation

Everything you need to run Paperclip.

Guides, references, and walkthroughs for the people running AI agents at work. Start at the quickstart, or jump anywhere below.

Agent Browser

Drive a real browser to inspect or interact with a web page or app — navigate, take screenshots, read console and network, fill simple forms — for verification tasks, not unattended automation.

Drive a real, supervised browser to verify behavior, capture evidence, or read JavaScript-rendered content that a static fetch can't reach. This is about supervised verification of a page or app, not unattended scraping.

This is an optional catalog skill — opt-in (install when you need it). For how to install, audit, update, assign, and reset catalog skills, see the Skills reference.

When to use

  • You need a screenshot of a deployed page or local dev server to confirm a UI change.
  • You need to read JavaScript-rendered content that curl/wget won't see.
  • A user reports a UI bug and you need to reproduce it to capture console errors, network requests, or layout state.
  • You need to walk a short flow (load, click, observe) to verify acceptance criteria.

When not to use

  • The page is reachable as static HTML — use an HTTP fetch instead.
  • The task is unattended large-scale scraping — that belongs to a dedicated scraper.
  • The site is behind authentication you don't own, or its terms prohibit automation.

Catalog metadata

Field Value
Catalog id paperclipai:optional:browser:agent-browser
Canonical key paperclipai/optional/browser/agent-browser
Catalog path catalog/optional/browser/agent-browser
Kind optional
Category browser
Slug agent-browser
Entrypoint SKILL.md
Trust level markdown_only
Compatibility compatible
Default install false
Recommended roles qa, engineer, researcher
Requires
Tags browser, puppeteer, playwright, verification
Files 1
Content hash sha256:eabb2c9f7b5e1a27ebb1e05a711d61433a266478154cd671a685e99e67aadea2
Package @paperclipai/[email protected]

File inventory

Path Kind Bytes
SKILL.md skill 5,133

Full skill definition

Skill frontmatter

  • name: agent-browser
  • description: Drive a real browser to inspect or interact with a web page or app — navigate, take screenshots, read console and network, fill simple forms — for verification tasks, not unattended automation.
  • key: paperclipai/optional/browser/agent-browser
  • recommendedForRoles:
    • qa
    • engineer
    • researcher
  • tags:
    • browser
    • puppeteer
    • playwright
    • verification

Skill instructions

Agent Browser

Use a controlled browser to verify behavior, capture evidence, or extract information from web pages that a static fetch cannot reach (SPAs, login-gated pages, dynamic content). This skill is about supervised verification, not unattended scraping.

When to use

  • You need a screenshot of a deployed page or a local dev server to confirm a UI change.
  • You need to read JavaScript-rendered content that curl/wget will not see.
  • A user reports a UI bug and you need to reproduce it interactively to capture console errors, network requests, or layout state.
  • You need to walk through a short flow (load page, click, observe) to verify acceptance criteria.

When not to use

  • The page is reachable as static HTML. Use curl/HTTP fetch — it is cheaper, faster, and more reliable.
  • The task is unattended large-scale scraping. That belongs to a dedicated scraper with rate limits, robots.txt handling, and a real user agent policy — not this skill.
  • The site is behind authentication you do not own credentials for, or whose terms of service prohibit automation.
  • The site involves sensitive accounts (banking, healthcare, government) where automation risks lockout or compliance issues.

Before launching the browser

  • Confirm the URL and what state should be true after navigation.
  • Decide what evidence is needed: full-page screenshot, viewport screenshot, console log, network trace, HTML snapshot, extracted text.
  • Decide the viewport size that matters for the task (mobile vs desktop). Default to a desktop size unless the task is mobile-specific.
  • For local dev servers, confirm the server is running and the port is what you expect.

Driving the browser

A typical verification session:

  1. Launch with a real-looking user agent when the target is the public internet; an unrealistic UA flags automation traffic.
  2. Set a sane viewport (e.g., 1366×768 desktop, 390×844 iPhone-ish).
  3. Navigate and wait for the right signal. Prefer waiting for a specific selector or network-idle over arbitrary sleeps.
  4. Capture evidence immediately after the wait condition succeeds, before any interaction perturbs the state.
  5. Interact deliberately. One click at a time, with a wait between actions; re-screenshot after each meaningful state change.
  6. Read the console and network panels for unexpected errors, 4xx/5xx responses, or slow requests.
  7. Close the browser cleanly when done. Long-running browser sessions leak memory and hold ports.

What evidence to record

For a verification task, deliver:

  • A full-page or viewport screenshot of each meaningful state.
  • The console log, filtered to warnings/errors.
  • Any non-2xx network response with the URL, status, and a short response body excerpt.
  • A short narration: "Navigated to X, observed Y, clicked Z, observed W."

For a UI bug repro, also record:

  • The exact reproduction steps the user can follow.
  • Viewport size and (where relevant) device pixel ratio.
  • Whether the bug reproduces on first load vs after interaction.

Login-gated pages

  • Prefer programmatic auth (API token, magic link) over UI login.
  • If UI login is the only path, the user must provide credentials explicitly for this run. Never reuse credentials outside the session.
  • Do not store credentials in the session log, screenshot, or returned output.

Performance and politeness

  • Throttle to one navigation per few seconds when touching shared infra.
  • Respect robots.txt for public sites you are inspecting at any volume.
  • Cancel navigations if a page exceeds a reasonable timeout (e.g., 30s); the page is broken or rate-limiting you.
  • Do not retry forever on failure. Retry once with a longer timeout, then escalate.

Common failure modes

  • Selector not found. Page changed, or you are waiting before render. Take a screenshot to see actual state; adjust the selector.
  • Click does nothing. The element is offscreen, covered by a modal, or in a shadow DOM. Scroll into view or pierce the shadow root.
  • Headless detection. Some sites detect headless Chrome and serve a different page. Use a non-headless mode or a fingerprint-realistic configuration only when authorized.
  • Cross-origin iframe blocking. Iframes you do not own cannot be inspected; the page must offer the data outside the iframe or the task is infeasible.

Anti-patterns

  • Long unsupervised browser sessions that drift from the original task.
  • Scraping behind authentication you do not own.
  • Captioning a screenshot with "looks good" without saying what state was loaded and what selectors confirmed it.
  • Treating a passing screenshot as proof of correctness across viewports you did not actually test.

See also

  • Optional skills — all optional catalog skills grouped by category.
  • Skills reference — file shape, install pipeline, catalog browse/install/audit/update/reset, assignment, and troubleshooting.
  • App-shipped catalog — how bundled and optional catalog skills are versioned and kept current.
  • Skills guide — the UI walkthrough and built-in catalog overview.