Parse a String containing multipart/form-data request body in Java

Attilio · Jan 23, 2018 · Viewed 9.7k times

Problem statement

I think the title says it all: I'm looking for a way to parse a String containing the body of a multipart/form-data HTTP request. That is, the contents of the string would look something like this:

--xyzseparator-blah
Content-Disposition: form-data; name="param1"

hello, world
--xyzseparator-blah
Content-Disposition: form-data; name="param2"

42
--xyzseparator-blah
Content-Disposition: form-data; name="param3"

blah, blah, blah
--xyzseparator-blah--

What I'm hoping to obtain is a parameters map, or something similar, like this:

parameters.get("param1");    // returns "hello, world"
parameters.get("param2");    // returns "42"
parameters.get("param3");    // returns "blah, blah, blah"
parameters.keys();           // returns ["param1", "param2", "param3"]

Further criteria

  • It would be best if I didn't have to supply the separator (i.e. xyzseparator-blah in this case), but I can live with it if I do have to.
  • I'm looking for a library-based solution, preferably from a mainstream library (like "Apache Commons" or something similar).
  • I want to avoid rolling my own solution, but at the current stage, I'm afraid I will have to. Reason: while the example above seems trivial to split/parse with some string manipulation, real multipart request bodies can have many more headers. Besides that, I do not want to re-invent (and much less re-test!) the wheel :)

Alternative solution

If there were a solution that satisfies the above criteria, but whose input is an Apache HttpRequest instead of a String, that would be acceptable too. (Basically I do receive an HttpRequest, but the in-house library I'm using is built so that it extracts the body of this request as a String and passes that to the class responsible for doing the parsing. However, if need be, I could also work directly on the HttpRequest.)

Related questions

No matter how I try to find an answer through Google, here on SO, and on other forums too, the solution always seems to be to use Commons FileUpload to go through the parts. However, the parseRequest method used in that solution expects a RequestContext, which I do not have (I only have an HttpRequest).

The other way, also mentioned in some of the above answers, is getting the parameters from the HttpServletRequest (but again, I only have HttpRequest).

EDIT: In other words: I could include Commons FileUpload (I have access to it), but that would not help me, because I have an HttpRequest, and Commons FileUpload needs a RequestContext. (Unless there is an easy way to convert from HttpRequest to RequestContext, which I have overlooked.)

Answer

roninjoe · Mar 15, 2018

You can parse your String using Commons FileUpload by wrapping it in a class implementing 'org.apache.commons.fileupload.UploadContext', like below.

I'd still recommend wrapping the HttpRequest, as in your proposed alternative solution, for a couple of reasons. First, using a String means that the whole multipart POST body, including any file contents, needs to fit into memory. Wrapping the HttpRequest would allow you to stream it, with only a small buffer in memory at any one time. Second, without the HttpRequest, you'll need to sniff out the multipart boundary, which would normally be in the 'Content-Type' header (see RFC 1867).

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUpload;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;

public class MultiPartStringParser implements org.apache.commons.fileupload.UploadContext {

    public static void main(String[] args) throws Exception {
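        // Assumes the input file is UTF-8, matching getCharacterEncoding() below.
        String s = new String(Files.readAllBytes(Paths.get(args[0])), java.nio.charset.StandardCharsets.UTF_8);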
        MultiPartStringParser p = new MultiPartStringParser(s);
        for (String key : p.parameters.keySet()) {
            System.out.println(key + "=" + p.parameters.get(key));
        }
    }

    private String postBody;
    private String boundary;
    private Map<String, String> parameters = new HashMap<String, String>();

    public MultiPartStringParser(String postBody) throws Exception {
        this.postBody = postBody;
        // Sniff out the multipart boundary: the first line of the body is "--" followed by the boundary.
        this.boundary = postBody.substring(2, postBody.indexOf('\n')).trim();
        // Parse out the parameters.
        final FileItemFactory factory = new DiskFileItemFactory();
        FileUpload upload = new FileUpload(factory);
        List<FileItem> fileItems = upload.parseRequest(this);
        for (FileItem fileItem: fileItems) {
            if (fileItem.isFormField()){
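                // Decode explicitly; getString() without a charset falls back to ISO-8859-1
                // when the part does not declare one.
                parameters.put(fileItem.getFieldName(), fileItem.getString("UTF-8"));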
            } // else it is an uploaded file
        }
    }

    public Map<String,String> getParameters() {
        return parameters;
    }

    // The methods below here are to implement the UploadContext interface.
    @Override
    public String getCharacterEncoding() {
        return "UTF-8"; // You should know the actual encoding.
    }

    // This is the deprecated method from RequestContext that unnecessarily
    // limits the length of the content to ~2GB by returning an int. 
    @Override
    public int getContentLength() {
        return -1; // Don't use this
    }

    @Override
    public String getContentType() {
        // Use the boundary that was sniffed out above. Parameters are separated by ';'.
        return "multipart/form-data; boundary=" + this.boundary;
    }

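    @Override
    public InputStream getInputStream() throws IOException {
        // Encode explicitly so the bytes match getCharacterEncoding().
        return new ByteArrayInputStream(postBody.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    @Override
    public long contentLength() {
        // Length in bytes, not chars; the two differ for non-ASCII content.
        return postBody.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    }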
}
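
On the boundary point: if you do have the 'Content-Type' header value available, it is safer to read the boundary from there and only fall back to sniffing the first line of the body. Here is a rough sketch of both, using only the JDK. BoundaryFinder, fromContentType and fromBody are hypothetical names of my own, not part of Commons FileUpload, and the regex is a simplification that does not cover every legal header form.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Commons FileUpload.
public class BoundaryFinder {

    // Matches boundary=value or boundary="value"; the parameter name is case-insensitive.
    private static final Pattern BOUNDARY_PARAM =
            Pattern.compile("(?i)\\bboundary=(?:\"([^\"]+)\"|([^;,\\s]+))");

    // Reads the boundary parameter from a Content-Type header value.
    public static Optional<String> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        Matcher m = BOUNDARY_PARAM.matcher(contentType);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }

    // Fallback: sniffs the boundary from the first line of the body,
    // which is expected to be "--" followed by the boundary.
    public static Optional<String> fromBody(String body) {
        if (body == null || !body.startsWith("--")) {
            return Optional.empty();
        }
        int eol = body.indexOf('\n');
        // trim() also drops the '\r' of a CRLF line ending.
        String firstLine = (eol < 0 ? body : body.substring(0, eol)).trim();
        String candidate = firstLine.substring(2);
        return candidate.isEmpty() ? Optional.empty() : Optional.of(candidate);
    }

    public static void main(String[] args) {
        // prints Optional[xyzseparator-blah]
        System.out.println(fromContentType("multipart/form-data; boundary=xyzseparator-blah"));
        // prints Optional[xyzseparator-blah]
        System.out.println(fromBody("--xyzseparator-blah\r\n"
                + "Content-Disposition: form-data; name=\"param1\"\r\n\r\n"
                + "hello, world\r\n--xyzseparator-blah--\r\n"));
    }
}
```

Returning an Optional rather than throwing lets the caller decide what to do when no boundary can be found, for example rejecting the request before handing it to the parser.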