Simple Scraper - Extract data from any website in seconds.
ScrapingBee - Web Scraping API.
Crawlab - Distributed web crawler admin platform for managing spiders regardless of language or framework.
hakrawler - Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.
JobFunnel - Tool for scraping job websites, and filtering and reviewing the job listings.
You-Get - Tiny command-line utility to download media content (video, audio, images) from the Web.
Universal Reddit Scraper - Scrape Subreddits, Redditors, and comments on posts. A command-line tool written in Python.
Gerapy - Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js.
scrapio - Simple and easy-to-use scraper and crawler in Go.
Colly - Elegant Scraper and Crawler Framework for Golang.
extract-news-api - Flask code to deploy an API that pulls structured data from online news articles.
Web Scraper - Scrape websites for text by CSS selector.
micawber - Small library for extracting rich content from URLs.
rget - Download URLs and verify the contents against a publicly recorded cryptographic log.
yarl - Yet another URL library.
Apify - Web Scraping, Data Extraction and Automation.
Gumbo - Pure-C HTML5 parser.
Dataflow Kit - Web scraping and data extraction tools.
Cognito Common Crawl - Search the Common Crawl using Lambda functions.
ScrapingAnt - All in One Scraping API. Rotating Proxies. Headless Chrome.
Django Dynamic Scraper - Creating Scrapy scrapers via the Django admin interface.
AutoScraper - Smart, Automatic, Fast and Lightweight Web Scraper for Python.
Spidey - Dead-simple crawler that focuses on ease of use and speed. Returns a list of all URLs on a web page.
ScrapeOwl - Simple and affordable web scraping API.
Pholcidae - Tiny Python web crawler.
Booking site web scraper - Downloads all of the accommodations for the chosen country and saves them in a file.
Reddit Media Downloader - Scrapes Reddit to download media of your choice.
extruct - Library for extracting embedded metadata from HTML markup.
Floki - Simple HTML parser that enables search for nodes using CSS selectors.
NYT Vote Scraper - Scrapes the NYT Votes Remaining Page JSON and commits it back to its repo. Nice use of GitHub Actions for git scraping.
Instagram Scraper - Scrapes an Instagram user's photos and videos.
Inventory Hunter - Get notified as soon as your next CPU, GPU, or game console is in stock.
news-please - Open source, easy-to-use news crawler that extracts structured information from almost any news website.
trafilatura - Manage URLs and scrape main text and metadata.
htmldate - Find the publication date of web pages.
jusText - Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages.
sumy - Module for automatic summarization of text documents and HTML pages.
Voyager - Write your own web crawler/scraper as a state machine in Rust.
Trandoshan - Fast, highly configurable, cloud native dark web crawler.
ralger - Makes it easy to scrape a website with R.
snscrape - Social networking service scraper in Python.
qwarc - Framework for rapidly archiving a large number of URLs with little overhead.
select.rs - Rust library to extract useful data from HTML documents, suitable for web scraping.
Scrapera - Provides access to a variety of scraper scripts for most commonly used machine learning and data science domains.
Headless Chrome Crawler - Distributed crawler powered by Headless Chrome.
Automatio - No-code web automation tool for extracting data from any website.
crawler-user-agents - List of HTTP user agents used by robots, crawlers, and spiders, in a single JSON file.
ant - Web crawler for Go.
SearchScraperAPI - API implementation that lets you scrape Google, Bing, Yandex, and Qwant.
Scala Scraper - Scala library for scraping content from HTML pages.