askill
scrape-url

Safety: 88

Web crawling with Tantivy full-text search indexing. Supports crawl, search, and auto-crawl. WHEN: User wants to "scrape a website", "crawl documentation", "search crawled content", "index a site". WHEN NOT: Single page fetch (use browser_navigate), web search (use web_search).

1 star
1.2k downloads
Updated 2/15/2026

SKILL.md

scrape_url - Web Crawling with Search

Core Concept

mcp__plugin_kg_kodegen__scrape_url crawls websites, saves content as Markdown, and builds a Tantivy full-text search index. It uses an action-based interface with connection isolation (one crawl instance per crawl_id) and supports background execution.

Actions

| Action | Description | Required Parameters |
| --- | --- | --- |
| SEARCH | Search with auto-crawl (RECOMMENDED) | url, query |
| CRAWL | Explicit crawl | url |
| READ | Check crawl progress | None |
| LIST | Show all active crawls | None |
| KILL | Cancel crawl | None |
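
The required-parameter rules in the table can be checked before a payload is sent. The following is a minimal sketch, not part of the tool itself: `REQUIRED_PARAMS` and `missing_params` are hypothetical names introduced here, and the rules are copied from the table above.

```python
# Required parameters per action, copied from the Actions table.
REQUIRED_PARAMS = {
    "SEARCH": ("url", "query"),
    "CRAWL": ("url",),
    "READ": (),
    "LIST": (),
    "KILL": (),
}


def missing_params(payload: dict) -> list[str]:
    """Return the required parameters that are absent or null in payload."""
    # "CRAWL" is the documented default when no action is given.
    action = payload.get("action", "CRAWL")
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action!r}")
    return [name for name in REQUIRED_PARAMS[action] if payload.get(name) is None]
```

An empty list means the payload satisfies the table; a SEARCH payload without a query would report `["query"]`.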

Key Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| action | string | "CRAWL" | Action to perform |
| url | string | null | Target URL (required for CRAWL/SEARCH) |
| crawl_id | number | 0 | Crawl instance (0, 1, 2...) |
| query | string | null | Search query (SEARCH action) |
| max_depth | number | 3 | Maximum crawl depth |
| limit | number | null | Max pages to crawl |
| await_completion_ms | number | 600000 | Timeout (10 min default) |
| crawl_rate_rps | number | 2 | Requests per second |
| search_limit | number | 10 | Max search results |
| search_offset | number | 0 | Search pagination offset |
| search_highlight | boolean | true | Highlight matches |
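
To page through results, search_limit and search_offset can be combined. The sketch below builds a SEARCH payload for a given page. `search_page` is a hypothetical helper, and it assumes that search_offset counts individual results rather than pages, which the parameter table does not state explicitly.

```python
def search_page(url: str, query: str, page: int, page_size: int = 10) -> dict:
    """Build a SEARCH payload for a zero-based page of results."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return {
        "action": "SEARCH",
        "url": url,
        "query": query,
        "search_limit": page_size,
        # Assumption: the offset counts results, so page n starts at n * page_size.
        "search_offset": page * page_size,
    }
```

The returned dict serializes to the same JSON shape as the usage examples in this document.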

Usage Examples

One-Step Search (Recommended)

Auto-crawls if index doesn't exist:

{
  "action": "SEARCH",
  "url": "https://ratatui.rs",
  "crawl_id": 0,
  "query": "layout widgets"
}

Explicit Crawl

{
  "action": "CRAWL",
  "crawl_id": 0,
  "url": "https://docs.rs/tokio"
}

Crawl with Limits

{
  "action": "CRAWL",
  "url": "https://example.com/docs",
  "max_depth": 2,
  "limit": 50,
  "crawl_rate_rps": 1
}

Check Progress

{
  "action": "READ",
  "crawl_id": 0
}

List Active Crawls

{ "action": "LIST" }

Cancel Crawl

{
  "action": "KILL",
  "crawl_id": 0
}

Search Query Syntax

Tantivy supports advanced queries:

| Query Type | Example | Description |
| --- | --- | --- |
| Text | layout components | Search all fields |
| Phrase | "exact phrase" | Exact match |
| Boolean | layout AND widgets | Logical operators |
| Field | title:layout | Search specific field |
| Fuzzy | layot~2 | Allow 2 character differences |
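
When queries are assembled programmatically, small helpers keep the syntax consistent. This is a sketch that only reproduces the string forms listed in the table; the function names are hypothetical, and whether a given field name such as title exists in the index is not verified here.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes for an exact-phrase match."""
    if '"' in text:
        raise ValueError("phrase text must not contain double quotes")
    return f'"{text}"'


def all_of(*terms: str) -> str:
    """Join terms with the AND operator."""
    return " AND ".join(terms)


def in_field(field: str, term: str) -> str:
    """Restrict a term to one field, e.g. title:layout."""
    return f"{field}:{term}"


def fuzzy(term: str, distance: int = 2) -> str:
    """Append a fuzzy suffix, e.g. layot~2."""
    return f"{term}~{distance}"
```

The resulting string goes into the query parameter of a SEARCH payload.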

Output Directory Structure

Content is saved to .kodegen/citescrape/{domain}/:

.kodegen/citescrape/ratatui.rs/
├── manifest.json          # Crawl metadata
├── .search_index/         # Tantivy search index
├── index.md               # Homepage
├── tutorials/
│   └── hello-world.md
└── api/
    └── widgets.md
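
To locate the saved files for a crawled site, the directory can be derived from the URL. This is a sketch under one assumption: that {domain} is the URL's hostname, as the ratatui.rs example suggests. `crawl_output_dir` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path
from urllib.parse import urlparse


def crawl_output_dir(url: str, root: Path = Path(".kodegen/citescrape")) -> Path:
    """Return the directory where a crawl of url is expected to be saved."""
    # Assumption: {domain} in the documented path is the URL's hostname.
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    return root / host
```

For example, `crawl_output_dir("https://ratatui.rs")` points at the directory shown in the tree above.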

Workflows

Research Documentation

  1. SEARCH with url and query (auto-crawls if needed)
  2. Review results
  3. Follow up with more specific queries

Full Site Crawl

  1. CRAWL with url, max_depth, limit
  2. Monitor with READ
  3. Search with SEARCH action
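
The Full Site Crawl steps can be written out as an ordered list of payloads. This sketch only builds the payloads; how they are sent, and what the READ response looks like, depends on the client and is not covered by this document. `full_site_crawl_plan` is a hypothetical helper.

```python
from typing import Optional


def full_site_crawl_plan(
    url: str,
    query: str,
    max_depth: int = 3,
    limit: Optional[int] = None,
    crawl_id: int = 0,
) -> list[dict]:
    """Return the CRAWL, READ, and SEARCH payloads for one site, in order."""
    crawl = {"action": "CRAWL", "crawl_id": crawl_id, "url": url, "max_depth": max_depth}
    if limit is not None:
        # limit defaults to null (no page cap), so include it only when set.
        crawl["limit"] = limit
    return [
        crawl,
        # READ can be repeated until the crawl reports that it has finished.
        {"action": "READ", "crawl_id": crawl_id},
        {"action": "SEARCH", "crawl_id": crawl_id, "url": url, "query": query},
    ]
```

All three payloads share one crawl_id so that READ and SEARCH refer to the crawl started in step 1.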

Remember

  • SEARCH with a url auto-crawls if the index is missing; this is the simplest approach
  • Crawls are isolated by crawl_id; use different numbers for parallel crawls
  • The default rate limit is 2 requests per second; be respectful of servers
  • Content is saved as Markdown for easy reading
  • The search index enables fast full-text queries
  • Use READ to check on background crawls
  • A timeout returns partial results; the crawl continues in the background
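
As an illustration of the crawl_id point, the sketch below assigns a distinct crawl_id to each site so that several crawls can run side by side. `parallel_crawls` is a hypothetical helper; the rate defaults to the documented 2 requests per second.

```python
def parallel_crawls(urls: list[str], rate_rps: int = 2) -> list[dict]:
    """Build one CRAWL payload per URL, each with its own crawl_id."""
    return [
        {"action": "CRAWL", "crawl_id": i, "url": url, "crawl_rate_rps": rate_rps}
        for i, url in enumerate(urls)
    ]
```

Each crawl can then be checked with READ, or cancelled with KILL, using the crawl_id it was started with.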

Install

Requires askill CLI v1.0+

AI Quality Score

95/100, analyzed 2/19/2026

High-quality technical skill document for web crawling with Tantivy search. Excellent completeness with actions, parameters, examples, search syntax, and workflows. Strong actionability through clear WHEN/WHEN NOT guidance and practical JSON examples. Well-organized with proper tables and sections. Includes safety considerations like rate limiting. Not internal-only - appears to be a general-purpose tool in a public registry.

Metadata

License: unknown
Version: 0.1.0
Updated: 2/15/2026
Publisher: majiayu000

Tags

api