How does cgi.FieldStorage store files?

Justinas picture Justinas · Jul 27, 2011 · Viewed 20.8k times · Source

So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.

At first it seemed that it just stores the whole file in memory. And I thought hm, that should be easy to test - a big file should clog up the memory!.. And it didn't. Still, when I request the file, it's a string, not an iterator, file object or anything.

I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!

Here's the code I've used:

import cgi
from wsgiref.simple_server import make_server

def app(environ,start_response):
    start_response('200 OK',[('Content-Type','text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
    <input type="file" name="failas" />
    <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'],environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    return output


if __name__ == '__main__' :
    httpd = make_server('',8000,app)
    print 'Serving'
    httpd.serve_forever()

Thanks in advance! :)

Answer

gimel picture gimel · Jul 27, 2011

Inspecting the cgi module description, there is a paragraph discussing how to handle file uploads.

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:

fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1

Regarding your example, getfirst() is just a version of getvalue(). try replacing

f = fs.getfirst('failas')

with

f = fs['failas'].file

This will return a file-like object that is readable "at leisure".