How do I download a zip file in Python using urllib2?

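A two-part question. I am trying to download several archived Cory Doctorow podcasts from the Internet Archive: the older episodes that are not included in my iTunes feed. I have written the script, but the downloaded files are not properly formatted.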

Q1 - What do I change to download the zip mp3 files? Q2 - What is a better way to pass the variables into the URL?

 # and the base url.

def dlfile(file_name,file_mode,base_url):
    from urllib2 import Request, urlopen, URLError, HTTPError

    #create the url and the request
    url = base_url + file_name + mid_url + file_name + end_url 
    req = Request(url)

    # Open the url
    try:
        f = urlopen(req)
        print "downloading " + url

        # Open our local file for writing
        local_file = open(file_name, "wb" + file_mode)
        #Write to our local file
        local_file.write(f.read())
        local_file.close()

    #handle errors
    except HTTPError, e:
        print "HTTP Error:",e.code , url
    except URLError, e:
        print "URL Error:",e.reason , url

# Set the range 
var_range = range(150,153)

# Iterate over image ranges
for index in var_range:

    base_url = 'http://www.archive.org/download/Cory_Doctorow_Podcast_'
    mid_url = '/Cory_Doctorow_Podcast_'
    end_url = '_64kb_mp3.zip'
    #create file name based on known pattern
    file_name =  str(index) 
    dlfile(file_name,"wb",base_url)

The script was adapted from here.

Best answer

Here's how I'd deal with the URL building and downloading. I name the local file after the basename of the URL (the part after the last slash), and I open it for writing with a with statement. That uses a context manager, which is nice because the file is closed when the block exits, even if an error occurs. In addition, I use a string template to build the URL. urlopen doesn't need a Request object; a plain URL string is enough.

import os
from urllib2 import urlopen, URLError, HTTPError


def dlfile(url):
    # Open the url
    try:
        f = urlopen(url)
        print "downloading " + url

        # Open our local file for writing
        with open(os.path.basename(url), "wb") as local_file:
            local_file.write(f.read())

    #handle errors
    except HTTPError, e:
        print "HTTP Error:", e.code, url
    except URLError, e:
        print "URL Error:", e.reason, url


def main():
    # Iterate over image ranges
    for index in range(150, 151):
        url = ("http://www.archive.org/download/"
               "Cory_Doctorow_Podcast_%d/"
               "Cory_Doctorow_Podcast_%d_64kb_mp3.zip" %
               (index, index))
        dlfile(url)

if __name__ == '__main__':
    main()
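
As a rough sketch of the same idea (not the answerer's code), the URL template can be kept in one place with a named placeholder, and the response can be streamed to disk rather than read into memory in one piece. The names URL_TEMPLATE, build_url, local_name and download are illustrative only, and the try/except import is an assumption added so the sketch also runs on Python 3, where urllib2 was folded into urllib.request.

```python
from __future__ import print_function

import os
import shutil
from contextlib import closing

try:
    from urllib2 import urlopen  # Python 2, as used in the answer
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent (assumption)

# Hypothetical constant: the same URL pattern as the answer, with one
# named placeholder so the episode number is only passed once.
URL_TEMPLATE = ("http://www.archive.org/download/"
                "Cory_Doctorow_Podcast_{n}/"
                "Cory_Doctorow_Podcast_{n}_64kb_mp3.zip")


def build_url(index):
    """Fill the episode number into the URL template."""
    return URL_TEMPLATE.format(n=index)


def local_name(url):
    """Name the local file after the last path component of the URL."""
    return os.path.basename(url)


def download(url, dest_dir="."):
    """Stream the response body to dest_dir and return the local path."""
    target = os.path.join(dest_dir, local_name(url))
    # closing() makes the response usable in a with statement on Python 2.
    with closing(urlopen(url)) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks
    return target


# No network access here: only show what would be requested and saved.
print(build_url(150))
print(local_name(build_url(150)))
```

Calling download(build_url(150)) would fetch one archive into the current directory; the sketch itself only prints the URL and the file name, so it can be run without network access. Errors from urlopen are not caught here and would need the same HTTPError and URLError handling as in the answer.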
