# crawling

Here are 591 public repositories matching this topic...

D4Vinci / Scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

Updated Jun 29, 2026
Python

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

python crawler framework scraping crawling web-scraping hacktoberfest web-scraping-python

Updated Jul 1, 2026
Python

codelucas / newspaper

newspaper3k: news, full-text, and article metadata extraction in Python 3.

python crawler scraper news crawling news-aggregator

Updated May 13, 2026
Python

apify / crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP, in both headful and headless modes, with proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling selenium pip web-scraping beautifulsoup web-crawling headless-chrome apify parsel playwright

Updated Jul 1, 2026
Python

ai-robots-txt / ai.robots.txt

A list of AI agents and robots to block.

privacy ai crawling crawlers

Updated Jun 26, 2026
Python

lorien / grab

Web Scraping Framework

python crawler framework spider asynchronous network python-library scraping crawling http-client python3 web-scraping pycurl urllib3

Updated Sep 19, 2025
Python

NateScarlet / holiday-cn

📅🇨🇳 Statutory holiday data for China, scraped automatically every day from State Council announcements

data natural-language-processing crawling china holiday

Updated May 11, 2026
Python

lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

html crawler machine-learning scraper scraping crawling crawler-python extraction-engine

Updated Mar 17, 2024
Python

needleworm / bhban_rpa

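Example code for the book <6개월 치 업무를 하루 만에 끝내는 업무 자동화> (생능출판사, 2020), whose title translates roughly as "Work Automation That Finishes Six Months of Work in One Day". The examples are written for people who have never learned Python and cover a range of work-automation areas, from Excel to design, macros, and crawling.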

python education design automation crawling rpa

Updated Jan 15, 2026
Python

scrapfly / scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

Updated Jun 30, 2026
Python

bluet / proxybroker2

The New (auto rotate) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

crawler privacy proxy crawling proxy-server http-proxy https-proxy socks proxies anonymity anonymous proxypool proxy-list hacktoberfest proxychains proxy-checker

Updated Jun 17, 2026
Python

clemfromspace / scrapy-selenium

Scrapy middleware to handle JavaScript pages using Selenium

crawling selenium scrapy

Updated Apr 13, 2026
Python

scrapinghub / scrapyrt

HTTP API for Scrapy spiders

python crawler scraper crawling twisted scrapy webcrawler hacktoberfest webcrawling hacktoberfest2021

Updated Jun 29, 2026
Python

iawia002 / Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

python crawler scraper downloader video scraping crawling python3

Updated Mar 29, 2021
Python

cxcscmu / Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler web-crawler crawling web-crawling pre-training pretraining large-language-models llm

Updated Feb 24, 2025
Python

essandess / isp-data-pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscation

data privacy obfuscation web crawling data-analytics privacy-enhancing-technologies

Updated Mar 20, 2023
Python

l4rm4nd / LinkedInDumper

Python 3 script to dump/scrape/extract company employees from the LinkedIn API

osint spider linkedin scraping crawling python3 employees extracting

Updated Apr 18, 2026
Python

scrapinghub / spidermon

Scrapy extension for monitoring spider execution.

testing monitoring scraping crawling spiders hacktoberfest monitoring-tool scrapinghub

Updated May 28, 2026
Python

Florents-Tselai / WarcDB

WarcDB: Web crawl data as SQLite databases.

cli database sqlite crawling warc web-archiving web-data

Updated Jul 13, 2024
Python

MarshalX / telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

parser crawler telegram crawling crawling-python telegram-org telegram-updates

Updated Jul 1, 2026
Python
