5 releases
Uses new Rust 2024
| new 0.14.0 | Jun 8, 2026 |
|---|---|
| 0.13.4 | Jun 7, 2026 |
| 0.13.3 | Jun 7, 2026 |
| 0.13.2 | Jun 6, 2026 |
| 0.13.1 | Jun 6, 2026 |
#24 in #web-crawler
525KB
11K
SLoC
Optional self-host monitor mode for CRW (Cargo feature monitor on
crw-server, default OFF).
This crate gives self-hosters reduced-parity scheduled monitoring without forcing a database dependency on the default stateless engine. It is the self-host analogue of the SaaS control plane (§2 of the monitor plan):
- Store (
store) — a WAL SQLite store of monitors, targets, snapshots, checks and per-page results. - Schedule (
schedule) — a small UTC cron / fixed-interval parser. - Runner (
runner) — runs one check: scrapes/crawls, diffs each page against the stored snapshot via the purecrw_diffengine, computes set-levelnew/removedacross the discovered URL set (the key self-host capability, possible becauseCrawlState.datacarries the full page set), applies a site-down gate, and optionally runs the LLM judge. - Scheduler (
scheduler) — a tokio tick loop that finds due monitors and runs their checks. - Webhook (
webhook) — HMAC-SHA256 signed local webhook delivery.
Everything here is local to this crate. The SQLite/HMAC stack is behind the
crate's own optional features and crw-server only links this crate behind
its monitor feature, so the open-core boundary (cargo tree -p crw-server
shows no rusqlite/hmac) holds.
Deferred (documented TODO)
- SMTP email delivery is a stub (
webhook::EmailStub); only HMAC webhooks are wired. SMTP balloons scope (TLS, auth, MIME, bounce handling) and is deferred to a follow-up per the M6 scope-discipline note. - The
crw monitor ...CLI surface and the MCPmonitortool are deferred (§9 of the plan). The library API (Store,Scheduler,run_check) is the integration point a CLI/endpoint would call.
Dependencies
~31–47MB
~785K SLoC