I'm working with a custom API to allow a user to upload a file (of, hopefully, arbitrary size). If the file is too large, it will be chunked and handled in multiple requests to the server.
I'm writing code that uses File and FileReader (HTML5), as per many examples online. In general (from what I've read online), for a chunked file transfer, people will first get a blob of data from their file object:
var file = $('input[type=file]')[0].files[0];
var blob = file.slice(start, end);
Then use a FileReader to read the blob with readAsArrayBuffer(blob) or readAsBinaryString(blob), and finally, in the FileReader.onload(e) handler, send the data to the server. Repeat this process for all the chunks in the file.
My questions are:
Why do I need to use a FileReader? If I don't use it, and simply send blobs with File.slice, is there any guarantee that the slicing operation will be done before I try to send the data in each request? Does the File object load the entire file when it's created (surely not)? Does File.slice seek to the position stipulated by the parameters, and then read the information in? The documentation doesn't give me any clues on how it's implemented.
The important thing to keep in mind is that File inherits from Blob. File doesn't actually have its own slice method; it gets this method from Blob. File just adds a couple of metadata attributes (such as name and lastModified).
The best way to think of a Blob (or File) is as a pointer to data, but not the actual data itself. Sort of like a file handle in other languages.
You can't actually get to the data in a Blob without using a reader, which reads asynchronously to avoid blocking the UI thread.
The Blob slice() method just returns another Blob, but again, this isn't data. It's just a pointer to a range of data within the original Blob, sort of like a bounded view. To actually get the bytes out of the sliced Blob, you still need to use a reader. In the case of a sliced blob, your reader is bounded to that range.
This is really just intended as a convenience so that you don't have to carry a bunch of relative and absolute offsets around in your code. You can just get a bounded view of the data and use the reader as if you were reading from byte 0.
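As a rough illustration of that bounded view, here is a minimal sketch. It assumes an environment where Blob exposes the promise-based arrayBuffer() method (newer browsers, which offer it as a counterpart to FileReader.readAsArrayBuffer), and it uses a small in-memory Blob as a stand-in for a user-selected File. The helper name readSlice is made up for this example.

```javascript
// In-memory Blob standing in for a user-selected File (illustrative data only).
const whole = new Blob(['hello world']);

// slice() returns a new Blob describing bytes 6..10 of the original.
// No bytes are read at this point.
const part = whole.slice(6, 11);

// Hypothetical helper: reads a Blob and resolves with its bytes.
// Indices in the result are relative to the slice, not to `whole`.
async function readSlice(blob) {
  const buffer = await blob.arrayBuffer();
  return new Uint8Array(buffer);
}

readSlice(part).then((bytes) => {
  // bytes[0] is the first byte of the slice, i.e. byte 6 of the original.
  console.log(bytes.length, String.fromCharCode(bytes[0])); // logs: 5 w
});
```

The reading code never sees the offsets 6 and 11; it only sees a 5-byte source that starts at index 0.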
In the case of XMLHttpRequest (assuming the browser supports the newer interface), the data will be streamed on send and constrained by the bounds of the blob. Basically, it works the way you'd imagine it would if you passed a file pointer to a stream method (which is basically what's going on under the covers). See https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data#Sending_binary_data
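A minimal sketch of what that could look like for a chunked upload, with no FileReader involved. The function names (sendBlobWithXhr, uploadInChunks), the url argument, and the assumption that the server accepts each chunk as a raw POST body are all hypothetical; a real API will likely also need headers or parameters identifying the chunk.

```javascript
// Hypothetical sender: passes a Blob straight to xhr.send().
// The browser reads the bytes itself, limited to the Blob's bounds.
function sendBlobWithXhr(url, blob) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('POST', url);
    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) resolve(xhr.status);
      else reject(new Error('HTTP ' + xhr.status));
    };
    xhr.onerror = () => reject(new Error('network error'));
    xhr.send(blob);
  });
}

// Hypothetical driver: slices the file and sends one chunk at a time.
// `send` is injectable so the loop can be exercised without a server.
async function uploadInChunks(file, chunkSize, url, send = sendBlobWithXhr) {
  if (!(chunkSize > 0)) throw new RangeError('chunkSize must be positive');
  let sent = 0;
  for (let start = 0; start < file.size; start += chunkSize) {
    const end = Math.min(start + chunkSize, file.size);
    // slice() only records the range; the read happens when the chunk is sent.
    await send(url, file.slice(start, end));
    sent += 1;
  }
  return sent; // number of chunks sent
}
```

In a page this might be called as uploadInChunks(file, 1024 * 1024, '/upload'), where '/upload' is a placeholder endpoint. Waiting for each request before sending the next keeps the chunks in order, at the cost of not uploading in parallel.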
Essentially, it's a lazy reader. If the blob is already loaded/read from the file system, or was created in memory, it's just going to use that. When you're using a File though, it'll be lazily loaded and streamed asynchronously, off the main thread.
The basic logic here is that the browser devs never want a read to happen synchronously because it could block the main thread, so all of the APIs are designed around that core philosophy. Notice how Blob.slice() is synchronous: that's how you know it's not actually doing any I/O, it's just setting up bounds and (possibly) file pointers.
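To make that concrete, here is a small sketch that splits a Blob into chunk-sized Blobs in a plain synchronous loop. The helper name sliceIntoChunks and the sample data are made up for illustration; with a real File the call would look the same.

```javascript
// Hypothetical helper: splits a Blob (or File) into chunk-sized Blobs.
// Runs synchronously because slice() only records offsets.
function sliceIntoChunks(blob, chunkSize) {
  if (!(chunkSize > 0)) throw new RangeError('chunkSize must be positive');
  const chunks = [];
  for (let start = 0; start < blob.size; start += chunkSize) {
    chunks.push(blob.slice(start, Math.min(start + chunkSize, blob.size)));
  }
  return chunks;
}

// A 10-byte in-memory Blob split into 4-byte chunks.
const chunks = sliceIntoChunks(new Blob(['abcdefghij']), 4);

// Each chunk's size is known immediately, without reading any data.
console.log(chunks.map((c) => c.size)); // logs: [ 4, 4, 2 ]
```

All the slices exist as soon as the loop finishes, so there is nothing to wait for before handing them to a sender.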