Jan 30, 2023
datasette-scraper is a Datasette plugin for managing small-ish (~100K pages) crawl and extract jobs.
- Opinionated yet extensible
  - Some useful tasks are possible out of the box, or write your own pluggy hooks to go further
- Leans heavily into SQLite
  - Introspect your crawls via ops tables exposed in Datasette
- Built on robust libraries
  - Datasette as a host
  - selectolax for HTML parsing
  - httpx for HTTP requests
  - pluggy for extensibility
  - zstandard for efficiently compressing HTTP responses
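To make the "lean into SQLite" and "compress HTTP responses" points concrete, here is a minimal sketch of storing compressed response bodies in a SQLite table that Datasette could then browse. It is an illustration under stated assumptions, not datasette-scraper's implementation: the `http_responses` table, its columns, and the `store_response` / `load_response` helpers are all hypothetical. It also swaps in the standard library's `zlib` for zstandard so it runs with no extra dependencies.

```python
import sqlite3
import zlib
from typing import Optional

# Hypothetical schema for illustration only; datasette-scraper's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS http_responses (
    url TEXT PRIMARY KEY,
    status INTEGER NOT NULL,
    body_compressed BLOB NOT NULL
)
"""


def store_response(conn: sqlite3.Connection, url: str, status: int, body: bytes) -> None:
    """Compress a response body and upsert it, keyed by URL (hypothetical helper)."""
    # zlib stands in for zstandard here so the sketch needs only the stdlib.
    compressed = zlib.compress(body, level=9)
    conn.execute(
        "INSERT OR REPLACE INTO http_responses (url, status, body_compressed) VALUES (?, ?, ?)",
        (url, status, compressed),
    )


def load_response(conn: sqlite3.Connection, url: str) -> Optional[bytes]:
    """Return the decompressed body for a URL, or None if it was never fetched."""
    row = conn.execute(
        "SELECT body_compressed FROM http_responses WHERE url = ?", (url,)
    ).fetchone()
    return None if row is None else zlib.decompress(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    html = b"<html>" + b"<p>hello</p>" * 1000 + b"</html>"
    store_response(conn, "https://example.com/", 200, html)
    # Compare raw vs. stored size, the kind of query you might run in Datasette.
    stored = conn.execute("SELECT length(body_compressed) FROM http_responses").fetchone()[0]
    print(len(html), stored)
```

Keeping the compressed bytes in an ordinary table means the crawl state stays queryable with plain SQL; with the `zstandard` package you would use its compressor and decompressor objects in place of the two `zlib` calls.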
Not for adversarial crawling. Want to crawl a site that blocks bots? You're on your own.