What is the HTML5 File.slice method actually doing?

Ponml · Jul 18, 2014 · Viewed 14.3k times

I'm working with a custom API to allow a user to upload a file (of, hopefully, arbitrary size). If the file is too large, it will be split into chunks and handled in multiple requests to the server.

I'm writing code that uses File and FileReader (HTML5), following many examples online. In general (from what I've read online), for a chunked file transfer, people will first get a blob of data from their file object:

var file = $('input[type=file]')[0].files[0];
// start and end are byte offsets into the file; end is exclusive
var blob = file.slice(start, end);

Then use a FileReader to read the blob with readAsArrayBuffer(blob) or readAsBinaryString(blob).

And finally, in the FileReader.onload(e) handler, send the data to the server. Repeat this process for all the chunks in the file.

My questions are:

Why do I need to use a FileReader? If I don't use it, and simply send blobs with File.slice, is there any guarantee that the slicing operation will be done before I try to send the data in each request? Does the File object load the entire file when it's created (surely not)? Does File.slice seek to the position stipulated by the parameters, and then read the information in? The documentation doesn't give me any clues on how it's implemented.

Answer

Clayton Gulick · Jul 18, 2014

The important thing to keep in mind is that File inherits from Blob. File doesn't actually have its own slice method; it gets this method from Blob. File just adds a couple of metadata attributes (such as name and lastModified).
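A quick way to check the inheritance claim for yourself is to look at where slice lives on the prototype chain. This is only a minimal sketch: inspectSlice is a made-up helper name, not part of any API, and it assumes a runtime where Blob is a global.

```javascript
// Hypothetical helper: reports where a Blob-like object's slice() comes from.
function inspectSlice(blobLike) {
  const proto = Object.getPrototypeOf(blobLike);
  return {
    // A File is also a Blob, because File inherits from Blob
    isBlob: blobLike instanceof Blob,
    // true only if the object's direct prototype defines slice itself
    definesOwnSlice: Object.prototype.hasOwnProperty.call(proto, 'slice'),
    // true when the method reached through the object is Blob's slice
    sameAsBlobSlice: blobLike.slice === Blob.prototype.slice,
  };
}
```

Passing a File taken from a file input should report isBlob and sameAsBlobSlice as true and definesOwnSlice as false, which is just another way of saying that File borrows slice from Blob.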

The best way to think of a Blob (or File) is as a pointer to data, but not the actual data itself, sort of like a file handle in other languages.

You can't actually get to the data in a Blob without using a reader, which reads asynchronously to avoid blocking the UI thread.

The Blob slice() method just returns another Blob, but again, this isn't data. It's just a pointer to a range of data within the original Blob, sort of like a bounded view. To actually get the bytes out of the sliced Blob, you still need to use a reader. In the case of a sliced Blob, your reader is bounded to that range.

This is really just intended as a convenience so that you don't have to carry a bunch of relative and absolute offsets around in your code; you can just get a bounded view of the data and use the reader as if you were reading from byte 0.
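To make the bounded-view idea concrete, here is a rough sketch. The helper names sliceIntoChunks and readSlice are mine, not part of any API, and readSlice assumes a browser where FileReader exists. Slicing touches no file contents; only the reader does.

```javascript
// Hypothetical helper: splits a Blob (or File) into consecutive bounded views.
// slice() is synchronous and copies no data.
function sliceIntoChunks(blob, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < blob.size; start += chunkSize) {
    // slice() clamps the end offset to blob.size, so the last chunk may be shorter
    chunks.push(blob.slice(start, start + chunkSize));
  }
  return chunks;
}

// Hypothetical helper (browser only): reads one slice with a FileReader.
// Index 0 of the resulting ArrayBuffer is the first byte of the slice,
// not the first byte of the original file.
function readSlice(slice) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(slice);
  });
}
```

For a file larger than 2 MiB, readSlice(sliceIntoChunks(file, 1024 * 1024)[2]) would resolve with a buffer whose byte 0 is byte 2097152 of the file, so no offset bookkeeping is needed on the reading side.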

In the case of XMLHttpRequest (assuming the browser supports the newer XMLHttpRequest Level 2 interface, which can send a Blob directly), the data will be streamed on send, and constrained by the bounds of the blob. Basically, it will work the same way you'd imagine it to work if you sent a file pointer to a stream method (which is basically what's going on under the covers). See https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data#Sending_binary_data
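In other words, you can skip the FileReader and hand each slice straight to send(). Below is a minimal sketch of that, not a definitive implementation: uploadInChunks is a made-up name, and the POST method and the Content-Range request header are assumptions about what your custom API expects, so adjust them to match your server.

```javascript
// Hypothetical helper: uploads a Blob/File one slice at a time, sequentially.
// Calls onDone(null) on success or onDone(error) on the first failure.
function uploadInChunks(file, url, chunkSize, onDone) {
  let start = 0;

  function sendNext() {
    if (start >= file.size) {
      onDone(null); // every chunk was sent (or the file was empty)
      return;
    }
    const end = Math.min(start + chunkSize, file.size);
    const xhr = new XMLHttpRequest();
    xhr.open('POST', url);
    // Assumption: the server reads the byte range from this header (end is inclusive here)
    xhr.setRequestHeader('Content-Range', `bytes ${start}-${end - 1}/${file.size}`);
    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) {
        start = end;
        sendNext();
      } else {
        onDone(new Error(`chunk at byte ${start} failed with status ${xhr.status}`));
      }
    };
    xhr.onerror = () => onDone(new Error(`network error at byte ${start}`));
    // The slice is passed as-is; the browser reads the bytes when it sends them
    xhr.send(file.slice(start, end));
  }

  sendNext();
}
```

Sending one chunk at a time keeps the logic simple and lets the server append in order. Sending several slices in parallel would also work with the same slice() calls, provided the server can reassemble chunks that arrive out of order.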

Essentially, it's a lazy reader. If the blob is already loaded/read from the file system, or was created in memory, it's just going to use that. When you're using a File, though, it'll be lazily loaded and streamed asynchronously, off the main thread.

The basic logic here is that the browser devs never want a read to happen synchronously because it could block the main thread, so all of the APIs are designed around that core philosophy. Notice that Blob.slice() is synchronous; that's how you know it's not actually doing any IO. It's just setting up bounds and (possibly) file pointers.