I have 2 separate questions:
Question 1
I'm trying to scrape some tables from this website. See the attached image below.
So I wrote this code, and this is as far as I got:
from bs4 import BeautifulSoup
import requests
url = 'https://transparencia.registrocivil.org.br/registros'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
source = requests.get(url, headers=headers).text
soup = BeautifulSoup(source, 'html.parser')
table = soup.find('table')
print(table.prettify())
This code isn't working: soup.find('table') returns None, so table.prettify() fails on a NoneType. It seems that BeautifulSoup can't find the table in the HTML that requests downloads. What am I doing wrong?
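To narrow this down, here is a minimal sketch of a check I could run on the downloaded HTML before parsing it any further. It counts opening tags using only the standard library's `html.parser`, so it does not depend on BeautifulSoup at all. The `TagCounter` class, the `count_tags` helper, and the `sample` string are names I made up for illustration; `sample` is a hypothetical stand-in for a JavaScript-driven page, not the real response from the site.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each tag is opened in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, including self-closing ones
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to number of occurrences."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


# Hypothetical sample: an empty mount point plus a script, with no table in the markup
sample = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
counts = count_tags(sample)
print("tables:", counts.get("table", 0), "scripts:", counts.get("script", 0))
```

If the same check on `source` from my code reports zero `table` tags, then the table is simply not in the HTML that `requests` receives, and the parser is not the problem.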
With that covered, here is the second part of my question:
Question 2
My main idea is to use the selectors shown in the image to go through each year, month, region, and state, and scrape the city-level data for every combination.
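Some of these tables are large and are split across several pages, as you can see at the bottom of some tables on the website. How can I go through all of these pages to get all the data for every year, month, region, and state?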
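The loop I have in mind looks roughly like the sketch below. It is only an outline under assumptions: `fetch_page` is a placeholder for whatever ends up retrieving one page of one table (I don't know yet how to do that part), and I am assuming that an empty result means there are no more pages. `scrape_all`, `fake_fetch`, and the sample filter values are all hypothetical.

```python
from itertools import product


def scrape_all(fetch_page, years, months, regions, states):
    """Collect rows for every filter combination, following pages until one is empty."""
    rows = []
    for year, month, region, state in product(years, months, regions, states):
        page = 1
        while True:
            batch = fetch_page(year, month, region, state, page)
            if not batch:  # assumption: an empty page means nothing is left
                break
            rows.extend(batch)
            page += 1
    return rows


# Fake fetcher for illustration only: pretends every combination has two pages
def fake_fetch(year, month, region, state, page):
    if page <= 2:
        return [(year, month, region, state, f"city-{page}")]
    return []


data = scrape_all(fake_fetch, [2020], [1, 2], ["Sudeste"], ["SP"])
print(len(data))
```

Passing the fetcher in as an argument keeps the iteration logic separate from the actual scraping, so the same loop would work whether the pages end up being fetched with requests or with a browser driver.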
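I believe the data is loaded dynamically, so I think I should use Selenium to scrape it. I'm not sure whether BeautifulSoup can handle dynamically loaded data on this website.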