Python: ftplib hangs at end of transfer

hammus · Oct 30, 2013 · Viewed 7.7k times

I've been searching on this for a couple of days and haven't found an answer yet.

I am trying to download video files from an FTP server. My script checks the server, compares the nlst() listing to a list of already-downloaded files parsed from a text file, builds a new list of files to get, and then iterates over it, downloading each file and disconnecting from the server and reconnecting for the next one (I thought a server timeout might be the issue, so I quit() the connection after each file download).

This works for the first few files, but as soon as I hit a file that takes longer than 5 minutes, ftplib just hangs at the end of the transfer (I can see in Explorer that the file is the correct size, so the download has completed, but the script doesn't seem to get the message and move on to the next file).

Any help would be greatly appreciated. My code is below:

import os
from download import downloadFile   # downloadFile lives in download.py, shown below

# getFiles and validExtensions are built earlier in the script (not shown)
newPath = "Z:\\pathto\\downloads\\"

for f in getFiles:
    print("Getting " + f)

for f in getFiles:

    fil = f.rstrip()
    ext = os.path.splitext(fil)[1]
    if ext in validExtensions:
        print("Downloading new file: " + fil)
        downloadFile(fil, newPath)

Here is download.py:

from ftplib import FTP
def downloadFile(filename, folder):
    myhost = 'host'
    myuser = 'user'
    passw = 'pass'
    #login
    ftp = FTP(myhost,myuser,passw)
    localfile = open(folder + filename, 'wb')
    ftp.retrbinary("RETR " + filename, localfile.write, 1024)
    print("Downloaded " + filename)
    localfile.close()
    ftp.quit()

Answer

abarnert · Oct 30, 2013

Without more information, I can't actually debug your problem, so I can only suggest the most general answer. The full treatment below is probably more than you need, but it should be sufficient for just about anyone hitting this problem.

retrbinary will block until the entire file is done. If that's longer than 5 minutes, nothing will get sent over the control channel for the entire 5 minutes. Either your client is timing out the control channel, or the server is. So, when you try to hang up with ftp.quit(), it will either hang forever or raise an exception.

You can control your side's timeout with the timeout argument on the FTP constructor, and some servers support an IDLE command that lets you set the server-side timeout. But even if one of those turns out to be workable, how do you pick an appropriate timeout value in the first place?
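
As a minimal sketch of the client-side knob (the 600-second value and the host/user/pass placeholders are just for illustration):

from ftplib import FTP

# The timeout applies to every blocking socket operation on this connection,
# including the control-channel read that quit() does at the end.
ftp = FTP('host', 'user', 'pass', timeout=600)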

What you really want to do is prevent the control socket from timing out while a transfer is happening on the data socket. But how? If you, e.g., ftp.voidcmd('NOOP') every so often in your callback function, that'll be enough to keep the connection alive… but it'll also force you to block until the server responds to the NOOP, which many servers will not do until the data transfer is complete, which means you'll just end up blocking forever (or until a different timeout) and not getting your data.
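
For illustration only, that keepalive-in-the-callback idea would look something like the sketch below (the downloadWithKeepalive name and the 60-second interval are made up for the example); this is precisely the version that will usually block:

import time
from ftplib import FTP

def downloadWithKeepalive(ftp, filename, localfile, interval=60):
    last = time.time()
    def callback(block):
        nonlocal last
        localfile.write(block)
        if time.time() - last > interval:
            # Waits for a reply; many servers won't send one mid-transfer,
            # so this is where the whole download stalls.
            ftp.voidcmd('NOOP')
            last = time.time()
    ftp.retrbinary('RETR ' + filename, callback)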

The standard techniques for handling two sockets without one blocking on the other are a multiplexer like select.select or threads. You can do that here, but you will have to give up the simple retrbinary interface and instead use transfercmd to get the data socket explicitly.

For example:

import threading
from ftplib import FTP

def downloadFile(…):
    ftp = FTP(…)
    # Open the data connection ourselves instead of letting retrbinary drive it.
    sock = ftp.transfercmd('RETR ' + filename)
    def background():
        # Drain the data socket on a worker thread.
        f = open(…)
        while True:
            block = sock.recv(1024*1024)
            if not block:
                break
            f.write(block)
        f.close()
        sock.close()
    t = threading.Thread(target=background)
    t.start()
    # Meanwhile, keep the control channel alive from the main thread.
    while t.is_alive():
        t.join(60)
        ftp.voidcmd('NOOP')
    ftp.voidresp()    # consume the end-of-transfer reply before hanging up
    ftp.quit()
An alternative solution would be to read, say, 20MB at a time, then call ftp.abort(), and use the rest argument to resume the transfer with each new retrbinary until you reach the end of the file. However, ABOR could hang forever, just like that NOOP, so that doesn't guarantee anything—not to mention that servers don't have to respond to it.
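
A rough sketch of that chunked approach, assuming the server honors REST and that reconnecting between chunks is acceptable (the StopTransfer exception, the downloadInChunks name, and the 20 MB threshold are all invented for the example):

from ftplib import FTP

CHUNK = 20 * 1024 * 1024              # stop and resume roughly every 20 MB

class StopTransfer(Exception):
    pass

def downloadInChunks(filename, folder, host, user, passw):
    offset = 0
    while True:
        ftp = FTP(host, user, passw)
        mode = 'wb' if offset == 0 else 'ab'        # append on resumed chunks
        localfile = open(folder + filename, mode)
        written = 0
        def callback(block):
            nonlocal written
            localfile.write(block)
            written += len(block)
            if written >= CHUNK:
                raise StopTransfer    # bail out of retrbinary early
        try:
            ftp.retrbinary('RETR ' + filename, callback, rest=offset)
            ftp.quit()
            return                    # reached the end of the file
        except StopTransfer:
            offset += written
            try:
                ftp.abort()           # may hang or be ignored, as noted above
            except Exception:
                pass
            ftp.close()
        finally:
            localfile.close()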

What you could do is just close the whole connection down (not quit, but close). This is not very nice to the server, and may result in some wasted data being re-sent, and may also prevent TCP from doing its usual ramp up to full speed if you kill the sockets too quickly. But it should work.
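
As a sketch of that, applied to the downloadFile from the question (host/user/passw are placeholders):

from ftplib import FTP

def downloadFile(filename, folder, host, user, passw):
    ftp = FTP(host, user, passw)
    localfile = open(folder + filename, 'wb')
    try:
        ftp.retrbinary('RETR ' + filename, localfile.write)
    finally:
        localfile.close()
        ftp.close()    # drop both sockets without waiting for the server to say goodbye

If you want to be slightly politer, you can first attempt ftp.quit() inside a try/except (relying on a constructor timeout like the one shown earlier) and still call ftp.close() in the finally block.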

See this answer—and notice that it requires a bit of testing against your particular broken server to figure out which, if any, variation works correctly and efficiently.