Web Research Skill

Use markdown.new to convert a web page URL into clean, AI-ready markdown. For a typical article this can cut token usage by roughly 80% compared with raw HTML (see Token Savings for an example).

When to Use

Researching external documentation or sources
Gathering information from web pages
Archiving web content for analysis
Building knowledge bases from web sources
Any task requiring web content in markdown format

Quick Usage

Method 1: Direct URL (Browser)

Prepend https://markdown.new/ to the full URL, including its https:// scheme:

https://markdown.new/https://example.com/article

Method 2: API (Programmatic)

from fred_macro.utils.web_to_markdown import fetch_markdown

markdown_content, metadata = fetch_markdown("https://example.com/article")
print(f"Token count: {metadata.get('tokens', 'unknown')}")

Method 3: CLI

python -m fred_macro.utils.web_to_markdown https://example.com/article

How It Works (Three-Tier Pipeline)

markdown.new tries the fastest method first, falling back automatically:

Primary: Cloudflare text/markdown content negotiation
- Returns clean markdown directly from edge-enabled sites
- Zero parsing needed
Fallback 1: Workers AI toMarkdown()
- If the site returns HTML instead of markdown, converts it with Workers AI's toMarkdown()
- Works on the HTML already fetched, so no second request is needed
Fallback 2: Browser Rendering API
- For JavaScript-heavy pages
- Full headless browser rendering
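The fallback order above can be pictured as a chain of converters tried in turn. The sketch below is a generic illustration of that pattern, not markdown.new's actual implementation; the helper name and the convention that a tier returns None (or raises) to mean "try the next one" are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A tier is (name, converter); the converter returns markdown, or None if it
# cannot handle the page. Hypothetical structure for illustration only.
Tier = Tuple[str, Callable[[str], Optional[str]]]


def convert_with_fallback(url: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try each tier in order and return (markdown, name_of_tier_used)."""
    errors: List[str] = []
    for name, convert in tiers:
        try:
            result = convert(url)
        except Exception as exc:  # a failing tier falls through to the next one
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return result, name
        errors.append(f"{name}: no result")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stub tiers mirroring the pipeline: markdown negotiation, AI conversion, browser.
    tiers = [
        ("text/markdown", lambda u: None),            # site served HTML, not markdown
        ("ai", lambda u: "# Converted from HTML"),    # conversion succeeds
        ("browser", lambda u: "# Rendered in browser"),  # never reached here
    ]
    print(convert_with_fallback("https://example.com/article", tiers))
```

Ordering the tiers from cheapest to most expensive means the headless browser is only paid for when the lighter methods produce nothing.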

Token Savings

Format	Tokens (example blog post)
Raw HTML	~16,180
markdown.new	~3,150
Savings	~80%
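The savings figure follows from the two counts in the table: 1 - 3150 / 16180 ≈ 0.805, i.e. about 80%. A small helper for computing the same ratio on your own pages (the function name is illustrative, not part of the utility):

```python
def token_savings(html_tokens: int, markdown_tokens: int) -> float:
    """Fraction of tokens saved by sending markdown instead of raw HTML."""
    if html_tokens <= 0:
        raise ValueError("html_tokens must be positive")
    return 1 - markdown_tokens / html_tokens


if __name__ == "__main__":
    # Figures from the table above
    print(f"{token_savings(16_180, 3_150):.1%}")  # 80.5%
```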

Configuration Options

Method Override

Force a specific conversion method:

fetch_markdown(url, method="browser")  # JS-heavy sites
fetch_markdown(url, method="ai")       # Workers AI

Image Retention

By default, images are removed. To keep them:

fetch_markdown(url, retain_images=True)

Or pass the same options as query parameters on the markdown.new URL:

https://markdown.new/https://example.com?method=browser&retain_images=true
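When calling markdown.new directly rather than through fetch_markdown, a helper can assemble URLs in the form shown above. This is a hypothetical sketch: it assumes the parameter names method and retain_images from the example, and it rejects target URLs that already have their own query string, because how markdown.new combines the two is not covered here.

```python
from typing import Dict, Optional
from urllib.parse import urlencode, urlsplit

# Override values listed under "Method Override"
ALLOWED_METHODS = {"ai", "browser"}


def build_markdown_new_url(target: str, method: Optional[str] = None,
                           retain_images: bool = False) -> str:
    """Build a markdown.new URL shaped like the example above (hypothetical helper)."""
    if urlsplit(target).query:
        # The example only shows a target without a query string,
        # so refuse to guess how the two would be combined.
        raise ValueError("target URL already has a query string")
    params: Dict[str, str] = {}
    if method is not None:
        if method not in ALLOWED_METHODS:
            raise ValueError(f"unknown method {method!r}")
        params["method"] = method
    if retain_images:
        params["retain_images"] = "true"
    base = f"https://markdown.new/{target}"
    # urlencode keeps insertion order: method first, then retain_images
    return f"{base}?{urlencode(params)}" if params else base
```

For example, build_markdown_new_url("https://example.com", method="browser", retain_images=True) reproduces the URL shown above.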

Error Handling

fetch_markdown handles common errors as follows:

Network errors: Retries with exponential backoff
Invalid URLs: Raises ValueError with clear message
Non-200 responses: Returns structured error with status code
Timeout: Configurable timeout (default 30s)
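The retry behaviour described above can be sketched as follows. This is not fetch_markdown's actual code: the helper name, the default retry count and base delay, and retrying only on OSError (which covers urllib's URLError and socket timeouts) are assumptions. Note that urllib's HTTPError is also an OSError subclass, so non-200 responses would be retried too unless filtered. The fetch function and sleep are passed in so the logic can be tested without a network.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fetch: Callable[[], T], retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on OSError with exponentially growing delays.

    Hypothetical helper: makes at most retries + 1 attempts, waiting
    base_delay * 2**attempt seconds between them, then re-raises the last error.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise RuntimeError("unreachable")
```

In real use, fetch would be a small function that opens the URL with urllib.request.urlopen(url, timeout=30) and reads the body, matching the 30 s default timeout above.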

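Response Format

fetch_markdown returns these two parts as a (content, metadata) tuple, as in the examples above; the structure below shows both together: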

{
    "content": "# Clean Markdown\n\nArticle content...",
    "metadata": {
        "url": "https://example.com/article",
        "method_used": "text/markdown",  # or "ai", "browser"
        "tokens": 3150,
        "content_type": "text/markdown; charset=utf-8",
        "retain_images": False
    }
}

Rate Limits and Best Practices

Be respectful: Check robots.txt before bulk fetching
Cache results: Store fetched markdown to avoid re-fetching
Rate limit: Add delays between requests (1-2s minimum)
Respect ToS: Verify external site terms of service
Fail gracefully: Always handle network failures
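Two of these practices can be enforced in code with the standard library. The sketch below checks a robots.txt body with urllib.robotparser and spaces requests at least a minimum interval apart. The function and class names, the 1.5 s default (within the 1-2 s range above), and the "*" user agent are illustrative assumptions; in practice you would download robots.txt from the target site (for example with RobotFileParser.set_url() and read()) rather than pass its text in.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class MinIntervalLimiter:
    """Block until at least min_interval seconds have passed since the last request."""

    def __init__(self, min_interval: float = 1.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Call once before each request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling limiter.wait() before each fetch_markdown call in a loop keeps bulk fetching within the suggested delay.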

Common Use Cases

Research Agent Workflow

from fred_macro.utils.web_to_markdown import fetch_markdown

urls_to_research = [
    "https://docs.python.org/3/library/asyncio.html",
    "https://fastapi.tiangolo.com/tutorial/",
]

for url in urls_to_research:
    try:
        content, meta = fetch_markdown(url)
    except Exception as e:  # e.g. network failure or invalid URL (ValueError)
        print(f"Failed to fetch {url}: {e}")
        continue
    # Save to knowledge base or analyze
    print(f"Fetched {meta.get('tokens', 'unknown')} tokens from {url}")

DataOps - Documentation Ingestion

from fred_macro.utils.web_to_markdown import fetch_markdown

# Fetch API documentation for local indexing
docs, docs_meta = fetch_markdown(
    "https://api.example.com/docs",
    retain_images=True,  # keep diagrams
)
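For local indexing it helps to store each page once and reuse it, in line with the caching advice above. A minimal file-based cache, assuming a fetch callable with the same (content, metadata) return shape as fetch_markdown; the function name, the cache directory, and naming files by the SHA-256 of the URL are illustrative choices, not part of the utility.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

FetchResult = Tuple[str, Dict[str, Any]]


def cached_fetch(url: str, fetch: Callable[[str], FetchResult],
                 cache_dir: Path) -> FetchResult:
    """Return (content, metadata) for url, calling fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()  # filesystem-safe name
    md_path = cache_dir / f"{key}.md"
    meta_path = cache_dir / f"{key}.json"
    if md_path.exists() and meta_path.exists():
        return (md_path.read_text(encoding="utf-8"),
                json.loads(meta_path.read_text(encoding="utf-8")))
    content, metadata = fetch(url)
    md_path.write_text(content, encoding="utf-8")
    meta_path.write_text(json.dumps(metadata), encoding="utf-8")
    return content, metadata
```

For the documentation example, fetch could be lambda u: fetch_markdown(u, retain_images=True) with cache_dir set to a local directory such as Path("docs_cache").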

Quick Manual Research

Just prepend markdown.new/ to any URL in your browser:

https://markdown.new/https://en.wikipedia.org/wiki/Python_(programming_language)

Validation

Before using, verify:

URL is valid and accessible
robots.txt permits crawling (for bulk operations)
Content license allows storage/analysis
Network connectivity available
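The first check (a valid URL) can be done locally before any request is made. A minimal sketch using urllib.parse, assuming only absolute http(s) URLs should be accepted; the helper name is hypothetical, and it does not test whether the URL is actually reachable.

```python
from urllib.parse import urlsplit


def validate_url(url: str) -> str:
    """Raise ValueError unless url is an absolute http(s) URL with a host."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported or missing scheme in {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    return parts.geturl()
```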

Common Mistakes to Avoid

Not checking robots.txt - Respect site crawling policies
No error handling - Network requests can fail
Fetching without caching - Re-fetching wastes tokens and bandwidth
Ignoring rate limits - Can get IP blocked
Forgetting image retention - Images are removed by default, so diagrams and screenshots (including images of code) are lost unless retain_images=True
