BeautifulSoup: link (URL) with special characters

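from urllib.request import urlopen
from bs4 import BeautifulSoup

# new_links: the list of URLs to scrape (defined elsewhere)
for link in new_links:
    print(link)
    html = urlopen(link)  # fails here when link contains a non-ASCII character such as ®
    bsObj_href = BeautifulSoup(html, "lxml")
    #bsObj_href = BeautifulSoup (html.decode('utf-8', 'ignore'))
    div_href = bsObj_href.find("div",{"id":"accordion"})
    href_rows = div_href.findAll("tr")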

I have a link that contains a special character, ®, like the one below: https://www.accessdata.fda.gov/scripts/drugshortages/dsp_ActiveIngredientDetails.cfm?AI=AVYCAZ®%20(ceftazidime%20and%20avibactam)%20for%20Injection,%202%20grams/0.5%20grams&st=c&tab=tabs-1

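I get the error message UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 68: ordinal not in range(128). I looked at other posts, but they only explain how to ignore special characters or how to handle one in the HTML body. I can't remove the special character because I need that exact URL to extract the data. How do I open this URL the right way so I can extract the data?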

Comments
  • snihil

    Try replacing the ® character with %C2%AE; it should work.
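
The same replacement can be done in code instead of by hand. Below is a minimal sketch, assuming the server expects ® as its UTF-8 bytes (%C2%AE), which is what urllib.parse.quote produces by default. The helper names encode_url and open_url are made up for this example, and the list of "safe" characters is a guess tailored to the URL in the question, not a general rule.

```python
from urllib.parse import quote
from urllib.request import urlopen


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in a URL.

    The characters listed in `safe` are left untouched so that the URL
    structure (scheme, path, query delimiters) survives, and "%" is kept
    so that existing escapes such as %20 are not encoded a second time.
    """
    return quote(url, safe=":/?&=%(),")


def open_url(url: str):
    """Hypothetical wrapper: encode the URL first, then open it."""
    return urlopen(encode_url(url))
```

In the loop from the question, html = open_url(link) would then take the place of html = urlopen(link); the rest of the BeautifulSoup code stays the same. If a URL contains a literal percent sign that is not part of an escape, this sketch would leave it as-is, so it may need adjusting for other inputs.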