Researcher (Iterative Web Research)

Outcome

Turn a question into an evidence-backed answer by running a repeatable, multi-round web research loop.
Prefer up-to-date sources; surface trade-offs, uncertainties, and next research directions.

0) Online Tools

Confirm what online-capable tools exist in this session and explicitly state the plan to use them.

Non-negotiable

Do not rely on search result snippets/abstracts alone; fetch/open the original page/paper and extract the relevant parts. THIS IS CRITICAL.

If no browsing is available

Ask the user to provide links/files or explicitly enable browsing; then proceed with offline synthesis + reasoning.

Organize tools by phase

Discover: search_query
Drill-down: open / click
Fetch & Extract: open / find / screenshot
Synthesize: summarize + compare
Iterate: refine queries based on gaps + user feedback

For query patterns and source-triage heuristics, see references/query-playbook.md.

Local downloads: use a `/tmp` work directory

If you need to download files for local analysis (PDFs, datasets, repos, etc.), create a dedicated work directory under /tmp first and download there.

Preferred: workdir="$(scripts/mk_workdir.sh)"
Then download into: $workdir
Keep the repo clean; treat the /tmp directory as disposable.

PDFs: Can use `pdftotext` if you can not directly read them

If a key source is a PDF, prefer converting it to text locally so you can search/quote accurately.

If you can download the PDF: use scripts/pdf_to_text.sh (wrapper around pdftotext/pdftotxt).
If you cannot download: fall back to web.run.screenshot + manual extraction, but note the limitations.

Arxiv papers: Prefer fetching HTML over PDF when possible

Many Arxiv papers have HTML versions that are easier to read/search than PDFs. e.g. https://arxiv.org/abs/XXXX.XXXXX often has a link to HTML which is available at https://arxiv.org/html/XXXX.XXXXX.
If you can open the HTML version, prefer that over downloading the PDF.

If encountering access restrictions (like CAPTCHAs or paywalls)

Inform the user to manually access the source and provide the content or a screenshot.

1) Workflow (n-round research loop)

Step 1: Detect vagueness → request clarification

If the prompt is too vague to search effectively, ask 2–5 clarification questions before browsing, covering:

Goal: what decision/action will this inform?
Scope: which sub-area(s) matter and which don’t?
Time window: “latest” as of when? (date range)
Region/context constraints: geography, industry, stack, budget, risk tolerance
Output preference: quick overview vs deep dive; recommendations vs neutral map

If the user can’t answer, state explicit assumptions and proceed.

Step 2: Round 1 broad scan

Generate 6–12 query variants, mixing:

Chinese + English keywords (and common acronyms)
Synonyms and alternative names
“comparison / vs / benchmark / survey / tutorial / docs / RFC / issue / postmortem”
Community filters (as needed): site:reddit.com, site:news.ycombinator.com, site:stackoverflow.com, site:github.com

Run web.run.search_query and quickly open the top results to extract:

Canonical definitions / terminology
Mainstream approaches and current “best practices”
Key trade-offs / controversies
High-signal sources to read next (official docs, top repos, surveys, FAQs)

Keep lightweight notes as: claim → source → date.

Step 3: Report after Round 1 (directions + plan)

Return a short landscape map:

3–7 plausible directions (each: what it is + why it matters)
What seems stable consensus vs what’s disputed
A proposed deep-dive plan (2–4 subtopics, sources to prioritize, questions to resolve)
2–3 targeted questions for the user to choose direction and constraints

Then explicitly ask the user to comment/choose: “Which direction should we deep dive first?”

Step 4: Round 2..N deep dive loop

After the user’s feedback, pick 1–3 focused subtopics and search deeply:

Prioritize primary sources when possible: official docs/specs, standards, research papers, repos/design docs.
Include community discussion for pitfalls and edge cases: issues/PRs, postmortems, forums.
Search “enough” before concluding: multiple independent sources, and at least one primary source when available.

For each round, deliver:

Key findings with supporting links (and dates for time-sensitive claims)
Comparison table / pros-cons / decision criteria
Open questions + next search angles (what to look up next and why)

Then ask for feedback and repeat Step 4 as needed.

Quality Bar

Treat “latest” as time-sensitive: always include dates and call out what may have changed recently.
Separate facts, informed interpretation, and speculation.
If sources disagree, present both sides and explain plausible reasons (methodology, context, recency).

researcherSafety 95Repository

Package Files

Researcher (Iterative Web Research)

Outcome

0) Online Tools

Non-negotiable

If no browsing is available

Organize tools by phase

Local downloads: use a `/tmp` work directory

PDFs: Can use `pdftotext` if you can not directly read them

Arxiv papers: Prefer fetching HTML over PDF when possible

If encountering access restrictions (like CAPTCHAs or paywalls)

1) Workflow (n-round research loop)

Step 1: Detect vagueness → request clarification

Step 2: Round 1 broad scan

Step 3: Report after Round 1 (directions + plan)

Step 4: Round 2..N deep dive loop

Quality Bar

Install

AI Quality Score

Metadata

Tags

researcherSafety 95Repository ShareFavorite skill

Package Files

Researcher (Iterative Web Research)

Outcome

0) Online Tools

Non-negotiable

If no browsing is available

Organize tools by phase

Local downloads: use a /tmp work directory

PDFs: Can use pdftotext if you can not directly read them

Arxiv papers: Prefer fetching HTML over PDF when possible

If encountering access restrictions (like CAPTCHAs or paywalls)

1) Workflow (n-round research loop)

Step 1: Detect vagueness → request clarification

Step 2: Round 1 broad scan

Step 3: Report after Round 1 (directions + plan)

Step 4: Round 2..N deep dive loop

Quality Bar

Install

AI Quality Score

Metadata

Tags

researcherSafety 95Repository

Local downloads: use a `/tmp` work directory

PDFs: Can use `pdftotext` if you can not directly read them