I'm trying to download images from a website and then sort them into folders based on each image's description. My script already gets to the part where it parses the HTML tags and pulls out what I need for each image: the URL and the description. I also added two more columns to the CSV: the name of each file, and the full path (download folder plus file name).

I'm now stuck on the next part. I want to check whether the folder already exists and, in the same if statement, whether the file name already exists. If both are true, the script should move on to the next link. If the file does not exist, it should download the file at that point. The next part would be an elif: if the folder does not exist, create the folder and download the file. I've outlined what I want this section to do below.

My problem is that I don't know how to download the files or how to do the existence checks. I also don't know how this works when I'm pulling information from multiple lists: for each link, if the file gets downloaded, the full path and name have to come from another column of the CSV (i.e. another list), and I don't know how to set that up. Can anyone help?

My code (up to the part where I'm stuck) is below this outline of what I want the next part of the script to do.
for elem in full_links:
    if full_path exists:
        run test for if file name exists
        if file name exists == true:
            move onto the next file
            if last file in list:
                break
        elif file name exists == false:
            download image to location with name in list
    elif full_path does not exist:
        download image with file path and name
The code I have so far:
from bs4 import BeautifulSoup
from bs4 import SoupStrainer
import requests  # use the real requests package, not pip's vendored copy
import csv
import time
import urllib.request
import pandas as pd
import wget
URL = 'https://www.baps.org/Vicharan'
content = requests.get(URL)
soup = BeautifulSoup(content.text, 'html.parser')
#create a csv
f = csv.writer(open('crawl3.csv', 'w', newline=''))  # newline='' avoids blank rows on Windows
f.writerow(['description', 'full_link', 'name', 'full_path', 'full_path_with_jpg_name'])
# Use the 'fullview' class
panelrow = soup.find('div' , {'id' : 'fullview'})
main_class = panelrow.find_all('div' , {'class' : 'col-xl-3 col-lg-3 col-md-3 col-sm-12 col-xs-12 padding5'})
# Look for 'highslide-- img-flag' links
individual_classes = panelrow.find_all('a' , {'class' : 'highslide-- img-flag'})
# Get the img tags, each <a> tag contains one
images = [i.img for i in individual_classes]
for image in images:
    src = image.get('src')
    full_link = 'https://www.baps.org' + src
    description = image.get('alt')
    name = full_link.split('/')[-1]
    full_path = '/home/pi/image_downloader_test/' + description + '/'
    full_path_with_jpg_name = full_path + name
    f.writerow([description, full_link, name, full_path, full_path_with_jpg_name])
print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')
print('finished with search and csv created. Now moving onto download portion')
print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')
f = open('crawl3.csv')
csv_f = csv.reader(f)
descriptions = []
full_links = []
names = []
full_path = []
full_path_with_jpg_name = []
for row in csv_f:
    descriptions.append(row[0])
    full_links.append(row[1])
    names.append(row[2])
    full_path.append(row[3])
    full_path_with_jpg_name.append(row[4])