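I'm new to web scraping with BS4 and need your help.
My goal is to scrape the "Total Debt" row on Yahoo Finance from the following URL:
https://finance.yahoo.com/quote/AAPL/key-statistics/
I tried the following code without success: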
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
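import requests
from bs4 import BeautifulSoup

# Download the page and parse the returned HTML
res = requests.get("https://finance.yahoo.com/quote/AAPL")
html = res.text
soup = BeautifulSoup(html, "html.parser")
# Look up the span by its data-reactid attribute (the lookup that fails)
total_debt = soup.find("span", {"data-reactid": "591"})
print("total debt is", total_debt)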
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
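Another thing I noticed is that if I try to scrape another ticker (Facebook, FB) using the following URL,
https://finance.yahoo.com/quote/FB/key-statistics/
the data-reactid is no longer 591.
Can anyone provide some insight? Thanks, everyone, for your help!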
That reactid will always change. Either find a class you can use to identify that row (or similar rows), or use a package called yahooquery. That data can be retrieved pretty easily:
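A minimal sketch of the yahooquery route, assuming the package is installed (`pip install yahooquery`) and that the `financial_data` property of its `Ticker` class returns a dict keyed by symbol with a `totalDebt` field. The helper names `extract_total_debt` and `fetch_total_debt` are made up for illustration and are not part of the library.

```python
def extract_total_debt(financial_data, symbol):
    """Return totalDebt for one symbol from a yahooquery-style response, or None."""
    entry = financial_data.get(symbol)
    # Assumption: a failed lookup may come back as an error string, not a dict
    if not isinstance(entry, dict):
        return None
    # Assumption: the field is named "totalDebt"
    return entry.get("totalDebt")


def fetch_total_debt(symbol):
    """Query Yahoo Finance through yahooquery (requires network access)."""
    # Imported here so the helper above works without the package installed
    from yahooquery import Ticker

    ticker = Ticker(symbol)
    return extract_total_debt(ticker.financial_data, symbol)
```

Calling `fetch_total_debt("aapl")` should then give the figure without touching any data-reactid, so the same call works for other tickers. If a symbol is not found or the field is missing, the sketch returns `None` rather than raising.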