Web scraping with bs4: the image URLs do not show up

I am new to web scraping and want to scrape all the character portraits from the LoL site. When I inspected one of the pictures in the browser, it was in an <img src="url"> tag, and I want to get that URL so I can download the picture. But when I call soup.select('img[src]') or soup.select('img'), it returns an empty list, and I don't know why.

Here is the code:

import requests
import bs4

# website holds the URL of the champions page
data = requests.get(website)
data.raise_for_status()

soup = bs4.BeautifulSoup(data.text, "lxml")
print(soup)
# soup prints the page's HTML

elems = soup.select('img[src]')
print(elems)
# elems is an empty list
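Replies
  • 笑看红尘 replied

    It may be possible to do this with requests, but it looks like your GET request does not receive the complete page source.

    You can use Selenium to get around this: let the browser load the page, then grab its content.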

    from selenium import webdriver
    import bs4
    
    driver = webdriver.Chrome()
    driver.get('https://na.leagueoflegends.com/en/game-info/champions/')
    page_source = driver.page_source
    driver.close()
    soup = bs4.BeautifulSoup(page_source, "lxml")
    print(soup)
    
    elems = soup.find_all('img')
    for elem in elems:
        print(elem.attrs['src'])
    

    Output:

    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Aatrox.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Ahri.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Akali.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Alistar.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Amumu.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Anivia.png
    ...
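
The empty list from requests is consistent with the portraits being added by JavaScript after the page loads, which the comparison above suggests but does not prove. The sketch below illustrates the difference on two hypothetical HTML strings (stand-ins for data.text and driver.page_source, not the real page). It uses the standard library's html.parser rather than bs4 only so that it runs without extra packages; the names ImgSrcCollector and img_sources are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def img_sources(html):
    """Return the src values of the <img> tags literally present in html."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical stand-in for what requests.get(...).text returns on a
# script-driven page: an empty container plus the script that fills it later.
raw_html = """
<html><body>
  <div id="champion-grid"></div>
  <script>
    var img = document.createElement("img");
    img.src = "portrait.png";
    document.getElementById("champion-grid").appendChild(img);
  </script>
</body></html>
"""

# Hypothetical stand-in for driver.page_source after the script has run.
rendered_html = """
<html><body>
  <div id="champion-grid"><img src="portrait.png"></div>
</body></html>
"""

print(img_sources(raw_html))       # the markup contains no <img> tag yet
print(img_sources(rendered_html))  # the tag is present once rendered
```

If the first call comes back empty on the real response text as well, the selector is probably fine and the images simply are not in the HTML that requests receives.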
    

  • id_sed replied

    Here is your answer

    import requests
    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('a')    # other tags work the same way, e.g. script, link
    
    
    OUTPUT:
    Out[21]: 
    [<a href="/en/game-info/get-started/">Get Started</a>,
     <a href="/en/game-info/get-started/what-is-lol/">What is League of Legends?</a>,
     <a href="https://na.leagueoflegends.com/en/site/guide/index.html">New Player Guide</a>,
     <a href="/en/game-info/get-started/chat-commands/">Chat Commands</a>,
     <a href="/en/game-info/get-started/community-interaction/">Community Interaction</a>,
     <a href="/en/featured/summoners-code">The Summoner's Code</a>,
     <a href="/en/game-info/champions/">Champions</a>,
     <a href="/en/game-info/items/">Items</a>,
     <a href="/en/game-info/summoners/">Summoners</a>,
     <a href="/en/game-info/summoners/spells/">Summoner Spells</a>,
     <a href="/en/game-info/game-modes/">Game Modes</a>,
     <a href="/en/game-info/game-modes/summoners-rift/">Summoner's Rift</a>,
     <a href="/en/game-info/game-modes/the-twisted-treeline/">The Twisted Treeline</a>,
     <a href="/en/game-info/game-modes/howling-abyss/">Howling Abyss</a>,
     <a href="//na.leagueoflegends.com/en/">Home</a>,
     <a href="/en/game-info/">Game Info</a>]
    
    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('script')
    Out[22]: 
    [<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-N98J');</script>,
     <script>window.ga = window.ga || function(){(ga.q=ga.q||[]).push(arguments)};ga.l = +new Date;</script>,
     <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/modernizr.js" type="text/javascript"></script>,
     <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>,
     <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-all.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-kit-all.js" type="text/javascript"></script>,
     <script type="text/javascript">rg_force_language = 'en_US';rg_force_manifest = 'https://ddragon.leagueoflegends.com/realms/na.js';rg_assets = 'https://lolstatic-a.akamaihd.net/game-info/1.1.9';</script>,
     <script type="text/javascript">window.riotBarConfig = {touchpoints: {activeTouchpoint: 'game'},locale: {landingUrlPattern : 'https://na.leagueoflegends.com//game-info/'},footer: {enabled: true,container: {renderFooterInto: '#footer'}}};</script>,
     <script async="" src="https://lolstatic-a.akamaihd.net/riotbar/prod/latest/en_US.js"></script>,
     <script src="https://ddragon.leagueoflegends.com/cdn/dragonhead.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-utils.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-i18n.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/external/jquery.lazy-load.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDFilterApp.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupItem.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupContainer.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridItem.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridView.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListApp.js" type="text/javascript"></script>]
    
    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('link')
    Out[23]: 
    [<link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/lol-kit.css" rel="stylesheet"/>,
     <link href="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/css/base-styles.css" rel="stylesheet"/>,
     <link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/resources/images/favicon.ico" rel="SHORTCUT ICON"/>]
    

  • 别掏心 replied

    Use the same endpoint that the page itself uses. You can find it in the browser's Network tab.

    import requests 
    
    base = 'https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/'
    r = requests.get('https://ddragon.leagueoflegends.com/cdn/9.11.1/data/en_US/champion.json').json()
    images = [base + r['data'][item]['image']['full'] for item in r['data']]
    print(images)
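
Since the original goal was to download the pictures, a rough sketch of that last step might look as follows. It assumes champion.json keeps the shape the code above indexes; portrait_urls, download_all, the portraits folder name, and the two-entry sample are all hypothetical and only there for illustration.

```python
from pathlib import Path


def portrait_urls(champion_data, base):
    """Build one image URL per champion from the parsed champion.json dict."""
    return [base + entry["image"]["full"] for entry in champion_data["data"].values()]


def download_all(urls, dest_dir="portraits"):
    """Save each URL into dest_dir, named after the last path segment."""
    import requests  # imported lazily so portrait_urls works without it

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        (dest / url.rsplit("/", 1)[-1]).write_bytes(response.content)


# Hypothetical two-entry sample with the same shape the answer's code indexes
# (r['data'][name]['image']['full']); the real file has one entry per champion.
sample = {
    "data": {
        "Aatrox": {"image": {"full": "Aatrox.png"}},
        "Ahri": {"image": {"full": "Ahri.png"}},
    }
}
base = "https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/"
urls = portrait_urls(sample, base)
print(urls)
# download_all(urls)  # uncomment to fetch the files over the network
```

With the real data, pass the dict returned by .json() in the answer above to portrait_urls instead of sample.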