How this web scraping cost estimate works
This calculator estimates one pass of a scraping run using three variable cost buckets: proxy/network fees, storage for captured data, and processing/compute for parsing/ETL. It uses the inputs below and converts them into consistent units (per 1,000 requests/pages and per GB).
Formulas
Inputs
- P = pages to scrape
- Cp = proxy cost per 1,000 requests (USD)
- S = average data per page (KB)
- Cs = storage cost per GB (USD)
- Cpr = processing cost per 1,000 pages (USD)
Component costs
- Proxy cost = (P / 1000) × Cp
- Processing cost = (P / 1000) × Cpr
- Storage cost = (P × S / 1,000,000) × Cs (KB → GB using 1 GB ≈ 1,000,000 KB)
Total cost = Proxy cost + Processing cost + Storage cost
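The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `estimate_scrape_cost`, the `CostBreakdown` class, and the parameter names are assumptions chosen to mirror the inputs P, Cp, S, Cs, and Cpr.

```python
from dataclasses import dataclass

# Decimal approximation from the formulas above: 1 GB is treated as 1,000,000 KB.
KB_PER_GB = 1_000_000


@dataclass
class CostBreakdown:
    """Hypothetical container for the three variable cost buckets (USD)."""
    proxy: float
    processing: float
    storage: float

    @property
    def total(self) -> float:
        return self.proxy + self.processing + self.storage


def estimate_scrape_cost(pages, proxy_per_1k, kb_per_page, storage_per_gb, processing_per_1k):
    """Apply the three component formulas for one pass of a scraping run.

    pages             -> P   (pages to scrape)
    proxy_per_1k      -> Cp  (USD per 1,000 requests)
    kb_per_page       -> S   (average stored KB per page)
    storage_per_gb    -> Cs  (USD per GB)
    processing_per_1k -> Cpr (USD per 1,000 pages)
    """
    thousands = pages / 1000
    storage_gb = pages * kb_per_page / KB_PER_GB
    return CostBreakdown(
        proxy=thousands * proxy_per_1k,
        processing=thousands * processing_per_1k,
        storage=storage_gb * storage_per_gb,
    )


# Illustrative inputs only (not taken from any provider's price list).
sample = estimate_scrape_cost(
    pages=10_000, proxy_per_1k=1.00, kb_per_page=100,
    storage_per_gb=0.05, processing_per_1k=0.50,
)
print(f"proxy=${sample.proxy:.2f} processing=${sample.processing:.2f} "
      f"storage=${sample.storage:.2f} total=${sample.total:.2f}")
```

Keeping the three buckets separate, rather than returning only the total, makes it easier to see which component drives the estimate.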
Interpreting the results
- If proxy cost dominates, you can usually reduce spend by lowering retries, using cheaper geos, reducing concurrency, or improving caching and deduplication.
- If storage cost dominates, your biggest levers are retention (how long you keep raw HTML), compression, and storing only extracted fields instead of full pages.
- If processing cost dominates, optimize parsing and batching, and avoid heavy headless-browser rendering unless it is required.
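One way to see which bucket dominates is to express each component as a share of the total. The sketch below assumes the component costs have already been computed; the helper names `cost_shares` and `dominant_bucket` are hypothetical, and the sample figures are the ones from the worked example on this page.

```python
from typing import Dict


def cost_shares(costs: Dict[str, float]) -> Dict[str, float]:
    """Return each bucket's fraction of the total cost (values sum to 1)."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive to compute shares")
    return {name: amount / total for name, amount in costs.items()}


def dominant_bucket(costs: Dict[str, float]) -> str:
    """Return the name of the largest cost bucket."""
    return max(costs, key=costs.get)


# Component costs in USD, taken from the worked example on this page.
costs = {"proxy": 625.00, "processing": 200.00, "storage": 900.00}

for name, share in cost_shares(costs).items():
    print(f"{name}: {share:.1%}")
print("dominant:", dominant_bucket(costs))
```

With these figures storage is the largest bucket, at a little over half of the total, so retention and compression would be the first things to review.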
Worked example
Suppose you plan to scrape 250,000 pages. Your proxy provider charges $2.50 per 1,000 requests. You expect 180 KB of stored data per page, storage is $20 per GB, and processing is $0.80 per 1,000 pages.
- Proxy cost = (250,000 / 1000) × 2.50 = 250 × 2.50 = $625.00
- Processing cost = (250,000 / 1000) × 0.80 = 250 × 0.80 = $200.00
- Storage GB = 250,000 × 180 KB = 45,000,000 KB ≈ 45 GB
- Storage cost = 45 × 20 = $900.00
- Total = 625 + 200 + 900 = $1,725.00
Typical rate ranges (sanity-check table)
| Cost component | Common billing unit | Typical range | Notes |
|---|---|---|---|
| Proxy / requests | per 1,000 requests | $0.50–$15+ | Varies by geo, residential vs. datacenter, and anti-bot difficulty. |
| Storage | per GB (often per month) | $0.02–$30+ | Cloud object storage is usually low; specialized managed stores can be higher. |
| Processing / ETL | per 1,000 pages | $0.10–$10+ | Depends on parsing complexity, JS rendering, NLP, and enrichment steps. |
Assumptions & limitations (what this estimate excludes)
- No retries: assumes 1 request per page. If you expect retries, multiply pages by (1 + retry rate). Example: 25% retry rate → effective requests = P × 1.25.
- No asset loading: counts the page payload you store, not additional images, scripts, API calls, or bandwidth from headless browsers.
- Storage is treated as a simple per-GB charge: many providers bill monthly and may charge for read/write operations and egress.
- Linear processing model: assumes processing cost scales roughly with page count; heavy rendering, OCR, NLP, or deduplication can be non-linear.
- Compliance & overhead excluded: engineering time, monitoring, QA, legal review, and vendor minimums are not included.
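Two of these assumptions are easy to relax by hand. The sketch below applies the retry multiplier described above to the proxy cost, and scales storage by the number of months retained. The month multiplier is an assumption that is not part of the calculator: it presumes the storage rate is billed per GB per month and ignores operation and egress charges. All function names are hypothetical.

```python
def effective_requests(pages: int, retry_rate: float) -> float:
    """Requests actually sent when a fraction of pages is retried once.

    Follows the rule above: effective requests = P x (1 + retry rate).
    """
    if retry_rate < 0:
        raise ValueError("retry_rate must be non-negative")
    return pages * (1 + retry_rate)


def proxy_cost_with_retries(pages: int, proxy_per_1k: float, retry_rate: float) -> float:
    """Proxy cost in USD, billed on effective requests instead of pages."""
    return effective_requests(pages, retry_rate) / 1000 * proxy_per_1k


def storage_cost_over_time(pages: int, kb_per_page: float,
                           storage_per_gb_month: float, months: float) -> float:
    """Storage cost in USD, ASSUMING a per-GB-per-month rate and flat retention.

    Uses the same 1 GB ~ 1,000,000 KB approximation as the main formula.
    """
    storage_gb = pages * kb_per_page / 1_000_000
    return storage_gb * storage_per_gb_month * months


# Worked-example inputs with a 25% retry rate and a hypothetical 3-month retention.
print(f"effective requests: {effective_requests(250_000, 0.25):,.0f}")
print(f"proxy cost with retries: ${proxy_cost_with_retries(250_000, 2.50, 0.25):,.2f}")
print(f"storage for 3 months: ${storage_cost_over_time(250_000, 180, 20, 3):,.2f}")
```

With the worked-example inputs, a 25% retry rate raises the proxy cost from $625.00 to $781.25, which shows how quickly retries can shift the estimate.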