using papa parse for big csv files

Keith picture Keith · Jun 29, 2016 · Viewed 7.6k times · Source

I am trying to load a file that has about 100k in lines and so far the browser has been crashing ( locally ). I looked on the internet and saw Papa Parse seems to handle large files. Now it is reduced down to about 3-4 minutes to load into the textarea. Once the file is loaded, I then want to do some more jQuery to do counts and things so the process is taking awhile. Is there a way to make the csv load faster? Am I using the program correctly?

<div id="tabs">
<ul>
  <li><a href="#tabs-4">Generate a Report</a></li>
</ul>
<div id="tabs-4">
  <h2>Generating a CSV report</h2>
  <h4>Input Data:</h4>      
  <input id="myFile" type="file" name="files" value="Load File" />
  <button onclick="loadFileAsText()">Load Selected File</button>
  <form action="./" method="post">
  <textarea id="input3" style="height:150px;"></textarea>

  <input id="run3" type="button" value="Run" />
  <input id="runSplit" type="button" value="Run Split" />
  <input id="downloadLink" type="button" value="Download" />
  </form>
</div>
</div>

$(function () {
    $("#tabs").tabs();
});

var data = $('#input3').val();

function handleFileSelect(evt) {
    var file = evt.target.files[0];

Papa.parse(file, {
    header: true,
    dynamicTyping: true,
    complete: function (results) {
        data = results;
    }
});
}

$(document).ready(function () {

    $('#myFile').change(function(handleFileSelect){

    });
});


function loadFileAsText() {
    var fileToLoad = document.getElementById("myFile").files[0];

    var fileReader = new FileReader();
    fileReader.onload = function (fileLoadedEvent) {
        var textFromFileLoaded = fileLoadedEvent.target.result;
        document.getElementById("input3").value = textFromFileLoaded;
    };
    fileReader.readAsText(fileToLoad, "UTF-8");
}

Answer

romellem picture romellem · Jun 29, 2016

You probably are using it correctly, it is just the program will take some time to parse through all 100k lines!

This is probably a good use case scenario for Web Workers.

NOTE: Per @tomBryer's answer below, Papa Parse now has support for Web Workers out of the box. This may be a better approach than rolling your own worker.

If you've never used them before, this site gives a decent rundown, but the key part is:

Web Workers mimics multithreading, allowing intensive scripts to be run in the background so they do not block other scripts from running. Ideal for keeping your UI responsive while also performing processor-intensive functions.

Browser coverage is pretty decent as well, with IE10 and below being the only semi-modern browsers that don't support it.

Mozilla has a good video that shows how web workers can speed up frame rate on a page as well.

I'll try to get a working example with web workers for you, but also note that this won't speed up the script, it'll just make it process asynchronously so your page stays responsive.

EDIT:

(NOTE: if you want to parse the CSV within the worker, you'll probably need to import the Papa Parser script within worker.js using the importScript function (which is globally defined within the worker thread). See the MDN page for more info on that.)

Here is my working example:

csv.html

<!doctype html>
<html>
<head>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0/jquery.min.js"></script>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js"></script>
</head>

<body>
  <input id="myFile" type="file" name="files" value="Load File" />
  <br>
  <button class="load-file">Load and Parse Selected CSV File</button>
  <div id="report"></div>

<script>
// initialize our parsed_csv to be used wherever we want
var parsed_csv;
var start_time, end_time;

// document.ready
$(function() {

  $('.load-file').on('click', function(e) {
    start_time = performance.now();
    $('#report').text('Processing...');

    console.log('initialize worker');

    var worker = new Worker('worker.js');
    worker.addEventListener('message', function(ev) {
      console.log('received raw CSV, now parsing...');

      // Parse our CSV raw text
      Papa.parse(ev.data, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
            // Save result in a globally accessible var
          parsed_csv = results;
          console.log('parsed CSV!');
          console.log(parsed_csv);

          $('#report').text(parsed_csv.data.length + ' rows processed');
          end_time = performance.now();
          console.log('Took ' + (end_time - start_time) + " milliseconds to load and process the CSV file.")
        }
      });

      // Terminate our worker
      worker.terminate();
    }, false);

    // Submit our file to load
    var file_to_load = document.getElementById("myFile").files[0];

    console.log('call our worker');
    worker.postMessage({file: file_to_load});
  });

});
</script>
</body>

</html>

worker.js

self.addEventListener('message', function(e) {
    console.log('worker is running');

    var file = e.data.file;
    var reader = new FileReader();

    reader.onload = function (fileLoadedEvent) {
        console.log('file loaded, posting back from worker');

        var textFromFileLoaded = fileLoadedEvent.target.result;

        // Post our text file back from the worker
        self.postMessage(textFromFileLoaded);
    };

    // Actually load the text file
    reader.readAsText(file, "UTF-8");
}, false);

GIF of it processing, takes less than a second (all running locally)

GIF of working example