Downloading images from a website by URL and sorting them into folders by description

I'm trying to download images from a website and then sort them into folders based on each image's description. In my script I've reached the point where I parse the HTML tags and extract the information I need (the URL of each image and its description). I've also added two more columns: the name of each file, and the full path made up of the download folder and file name.

I'm now stuck on the next part. I want to check whether the folder already exists and, in the same if statement, whether the file name already exists. If both are true, the script should move on to the next link. If the file doesn't exist, it should create the folder and download the file at that point. The next part would be an elif: if the folder doesn't exist, create it and download the file. I've outlined below what I want this section to do.

My problem is that I don't know how to download the files or how to run these checks. I also don't know how this would work when pulling information from multiple lists: for each link, if the file is downloaded, the full path and name have to come from another column of the CSV (i.e. another list), and I don't know how to set that up. Can anyone help?

My code (up to the part where I'm stuck) is below the following section, which outlines what I want the next part of the script to do.

for elem in full_links
        if full_path exists
                check whether the file name exists
                if file name exists = true
                        move on to the next file
                        if last file in list
                                break
                elif file name exists = false
                        download image to location with name in list

        elif full_path does not exist
                create folder, then download image with file path and name
The code I have so far:

from bs4 import BeautifulSoup
from bs4 import SoupStrainer
import requests  # don't import from pip._vendor; install with: pip install requests
import csv
import time
import urllib.request
import pandas as pd 
import wget



URL = 'https://www.baps.org/Vicharan'
content = requests.get(URL)

soup = BeautifulSoup(content.text, 'html.parser')

# create a csv (newline='' avoids blank rows on Windows; keep the handle so we can close it)
csv_file = open('crawl3.csv', 'w', newline='')
f = csv.writer(csv_file)
f.writerow(['description', 'full_link', 'name', 'full_path', 'full_path_with_jpg_name'])



# Use the 'fullview' container
panelrow = soup.find('div', {'id': 'fullview'})

main_class = panelrow.find_all('div', {'class': 'col-xl-3 col-lg-3 col-md-3 col-sm-12 col-xs-12 padding5'})

# Look for 'highslide-- img-flag' links
individual_classes = panelrow.find_all('a', {'class': 'highslide-- img-flag'})

# Get the img tags; each <a> tag contains one
images = [i.img for i in individual_classes]

for image in images:
    src = image.get('src')
    full_link = 'https://www.baps.org' + src
    description = image.get('alt')
    name = full_link.split('/')[-1]
    full_path = '/home/pi/image_downloader_test/' + description + '/'
    full_path_with_jpg_name = full_path + name
    f.writerow([description, full_link, name, full_path, full_path_with_jpg_name])

csv_file.close()  # flush the rows to disk before reading the file back

print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')
print('finished with search and csv created. Now moving onto download portion')
print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')



f = open('crawl3.csv', newline='')
csv_f = csv.reader(f)
next(csv_f)  # skip the header row so it doesn't end up in the lists

descriptions = []
full_links = []
names = []
full_path = []
full_path_with_jpg_name = []

for row in csv_f:
    descriptions.append(row[0])
    full_links.append(row[1])
    names.append(row[2])
    full_path.append(row[3])
    full_path_with_jpg_name.append(row[4])

f.close()
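On the "pulling from multiple lists" question: `csv.DictReader` pairs each row's columns by header name, so the link and its matching path travel together and there is no need to index five parallel lists by position. A minimal sketch with a made-up sample row (the paths and file names are hypothetical):

```python
import csv
import io

# Hypothetical sample of the CSV produced above (rows invented for illustration)
sample = """description,full_link,name,full_path,full_path_with_jpg_name
Delhi,https://www.baps.org/img/a.jpg,a.jpg,/home/pi/image_downloader_test/Delhi/,/home/pi/image_downloader_test/Delhi/a.jpg
"""

# Each row is a dict keyed by the header, so the link and its target
# path for the same image are read out of the same object.
pairs = [(row['full_link'], row['full_path_with_jpg_name'])
         for row in csv.DictReader(io.StringIO(sample))]
```

With the parallel lists already built above, `zip(full_links, full_path_with_jpg_name)` gives the same per-row pairing.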