news-please is a Python tool built on Scrapy for crawling and extracting structured news data, supporting Common Crawl archives and multiple storage backends.
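news-please returns each article as a set of named fields, such as `title`, `maintext`, `authors`, `date_publish`, `language`, `url` and `source_domain`. With the library installed, `NewsPlease.from_url(url)` returns such an article object.

The sketch below does not call news-please. It shows, under stated assumptions, what that structured record looks like and how a simple file-based storage backend might persist it. `ArticleRecord`, `to_json_line` and `write_jsonl` are hypothetical names, not part of the news-please API. The field list is only a subset of what the tool extracts.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical stand-in for an extracted news article.

    Field names mirror a subset of news-please's output fields; this class
    is NOT part of the news-please API.
    """
    url: str
    source_domain: str
    title: Optional[str] = None
    maintext: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    date_publish: Optional[datetime] = None
    language: Optional[str] = None


def to_json_line(record: ArticleRecord) -> str:
    """Serialize one record as a single JSON line, with datetimes as ISO 8601."""
    data = asdict(record)
    if record.date_publish is not None:
        data["date_publish"] = record.date_publish.isoformat()
    return json.dumps(data, ensure_ascii=False)


def write_jsonl(records: Iterable[ArticleRecord], path: str) -> int:
    """Append records to a JSON Lines file (a minimal file-based 'backend')."""
    count = 0
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(to_json_line(record) + "\n")
            count += 1
    return count
```

JSON Lines is used here only because it needs nothing beyond the standard library. news-please's real backends are configured through its own settings rather than code like this.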
Scrapy is a Python framework for efficient, extensible web scraping. It provides CSS and XPath selectors for extracting data from responses and item pipelines for post-processing the extracted items.
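A Scrapy item pipeline is an ordinary Python class whose `process_item(self, item, spider)` method receives each scraped item. The method either returns the item, possibly modified, or raises `scrapy.exceptions.DropItem` to discard it.

The sketch below is a minimal, hypothetical pipeline, `CleanArticlePipeline`, that assumes items are plain dicts. It trims string fields and drops items that lack a title. A local fallback for `DropItem` lets it run even where Scrapy is not installed.

```python
from typing import Any, Dict

try:
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so this sketch runs without Scrapy installed
    class DropItem(Exception):
        """Local stand-in for scrapy.exceptions.DropItem."""


class CleanArticlePipeline:
    """Hypothetical pipeline: trims string fields, drops items without a title.

    Assumes dict items; a scrapy.Item would be converted to a plain dict here.
    """

    def process_item(self, item: Dict[str, Any], spider: Any) -> Dict[str, Any]:
        # Strip surrounding whitespace from every string value.
        cleaned = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in item.items()
        }
        # Discard items whose title is missing or empty after cleaning.
        if not cleaned.get("title"):
            raise DropItem(f"missing title in {cleaned.get('url')!r}")
        return cleaned
```

To enable such a pipeline, a project lists its import path in the `ITEM_PIPELINES` setting with an order value, for example `{"myproject.pipelines.CleanArticlePipeline": 300}`. The module path is hypothetical. Pipelines with lower values run first.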