Feapder Dev

Quick Intake

Ask the user for the minimum information needed to produce a runnable crawler:

Target site/app constraints (login, JS rendering, anti-bot, rate limits).
Data schema (fields), uniqueness key, and storage target (file/DB/queue).
Entry points (seed URLs, keywords, categories) and pagination strategy.
Scale constraints (single machine vs distributed; incremental vs full; schedule).
Environment constraints (OS, Python version, network/proxy availability).

Workflow Decision Tree

Decide the crawler architecture before writing code:

Choose crawler type:
- Choose AirSpider for small jobs, no distributed scheduling, simple persistence.
- Choose Spider for high-volume, distributed crawling, and persistent task queues.
- Choose TaskSpider for explicit task tables/queues with retry/continuation semantics.
- Choose BatchSpider for periodic batches with batch metadata and separation.
Choose rendering:
- Use HTTP-only requests when pages are static or the API is reachable.
- Use browser rendering only when content is JS-rendered or requires complex interaction.
Choose persistence and flow:
- Use Item + pipeline for normalized storage.
- Use explicit DB access helpers only when pipeline is insufficient.
Choose anti-bot strategy:
- Add rate limits/backoff/retries first.
- Add proxy rotation/user pool only when needed and with observability.

Build Steps

Implement the smallest runnable crawler first, then iterate:

1) Set up environment (prefer `uv`)

Create a project venv with uv and pin a compatible Python version.
Install feapder and any runtime dependencies (DB drivers, playwright/selenium if used).
Verify import and basic CLI availability before writing spider code.

2) Scaffold a minimal project

Create a clean module layout: spiders/, parsers/, items/, pipelines/, settings.py.
Start with one spider, one parser, and one item type.

3) Implement a minimal spider + parser

Keep the first version minimal and end-to-end runnable:

from feapder import AirSpider
from feapder.network.request import Request

class DemoSpider(AirSpider):
    def start_requests(self):
        yield Request("https://example.com")

    def parse(self, request, response):
        # extract fields -> yield Item or dict
        yield {"url": response.url}

4) Add Item + pipeline (only after extraction works)

Define a stable unique key for de-duplication.
Add a pipeline that validates fields, normalizes types, and writes to the chosen sink.

5) Configure settings deliberately

Keep settings minimal at first.
Turn on logging suitable for debugging.
Add retries/timeouts/backoff before adding heavier defenses.

6) Run and debug iteratively

Run one seed URL first.
Add pagination/expansion only after single-page extraction is correct.
Use logs and counters (success/fail/empty) to validate progress.

7) Scale up (when required)

Switch to Spider/TaskSpider/BatchSpider only when the workload or scheduling requires it.
Add Redis/DB infrastructure as a deliberate dependency and document required services.

Debugging Checklist

Use a checklist-driven approach before changing architecture:

Confirm the request layer works: DNS/SSL/proxy, timeouts, status codes, encoding.
Confirm selectors/JSON parsing against real responses (save a sample response).
Confirm that parse() yields items/requests as expected (no silent drops).
Confirm pipeline and settings are loaded (wrong module path is a common cause).
Add observability: log key decisions, count produced items, count retries and errors.

References

Read these only when needed to keep context small:

references/checklist.md: Common tasks and quick checks for Feapder projects.
references/patterns.md: Lightweight patterns for spider types, parsing, and pipelines.
references/official-docs/: Local copy of Feapder official docs (Markdown).
references/official-docs.md: Short note on using the local docs copy.

feapder-devSafety 90Repository

Package Files

Feapder Dev

Quick Intake

Workflow Decision Tree

Build Steps

1) Set up environment (prefer `uv`)

2) Scaffold a minimal project

3) Implement a minimal spider + parser

4) Add Item + pipeline (only after extraction works)

5) Configure settings deliberately

6) Run and debug iteratively

7) Scale up (when required)

Debugging Checklist

References

Install

AI Quality Score

Metadata

Tags

feapder-devSafety 90Repository ShareFavorite skill

Package Files

Feapder Dev

Quick Intake

Workflow Decision Tree

Build Steps

1) Set up environment (prefer uv)

2) Scaffold a minimal project

3) Implement a minimal spider + parser

4) Add Item + pipeline (only after extraction works)

5) Configure settings deliberately

6) Run and debug iteratively

7) Scale up (when required)

Debugging Checklist

References

Install

AI Quality Score

Metadata

Tags

feapder-devSafety 90Repository

1) Set up environment (prefer `uv`)