PHP - Best approach to detect CSV delimiter

simon_www · Nov 3, 2014

I have seen multiple threads about the best way to auto-detect the delimiter of an incoming CSV file. Most of the solutions are functions 20 to 30 lines long: multiple loops over a predetermined list of delimiters, reading the first 5 lines and matching the field counts, etc.

Here is one example.

I have just implemented this procedure, with a few modifications, and it works brilliantly.
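The multi-line procedure described above can be sketched roughly like this. It is not the code from the example the question refers to: the function name detectDelimiterFromSample(), the candidate order and the five-row sample size are assumptions. For each candidate it rewinds the stream and reads the same rows with fgetcsv(), so quoted fields containing newlines are parsed as one record, and it only accepts a delimiter that splits every sampled row into the same number (more than one) of fields.

```php
<?php
/**
 * Hypothetical helper: guess a CSV delimiter by sampling several rows.
 * A candidate qualifies only if every sampled row yields the same field
 * count, and that count is greater than 1. Among qualifying candidates the
 * one with the most fields wins. Returns null if no candidate qualifies.
 * Assumes $handle is a seekable stream (e.g. a regular file).
 */
function detectDelimiterFromSample($handle, int $sampleRows = 5): ?string
{
    $best = null;
    $bestFields = 1; // a delimiter must produce at least 2 fields to count

    foreach ([',', ';', "\t", '|'] as $delim) {
        rewind($handle); // every candidate sees exactly the same rows
        $counts = [];
        while (count($counts) < $sampleRows
            && ($row = fgetcsv($handle, 0, $delim, '"', '\\')) !== false) {
            if ($row !== [null]) { // fgetcsv() returns [null] for a blank line
                $counts[] = count($row);
            }
        }
        if ($counts === []) {
            continue;
        }
        $fields = $counts[0];
        // Consistent across the whole sample and better than the current best?
        if ($fields > $bestFields && count(array_unique($counts)) === 1) {
            $best = $delim;
            $bestFields = $fields;
        }
    }

    rewind($handle); // leave the stream at the start for the caller
    return $best;
}
```

Returning null when no candidate is consistent leaves the fallback (for example, assuming a comma) to the caller.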

THEN I found the following code:

private function DetectDelimiter($fh)
{
    $data_1 = null;
    $data_2 = null;
    $delimiter = self::$delim_list['comma'];
    foreach(self::$delim_list as $key=>$value)
    {
        $data_1 = fgetcsv($fh, 4096, $value);
        $delimiter = sizeof($data_1) > sizeof($data_2) ? $key : $delimiter;
        $data_2 = $data_1;
    }

    $this->SetDelimiter($delimiter);
    return $delimiter;
}

To me, this looks like it achieves the SAME result, where $delim_list is an array of delimiters as follows:

static protected $delim_list = array('tab'=>"\t", 
                                     'semicolon'=>";", 
                                     'pipe'=>"|", 
                                     'comma'=>",");
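One detail worth checking in the loop above: fgetcsv() advances the file pointer, so each delimiter is tried on a different line of the file. Each count is also compared with the previous candidate's count ($data_2) rather than with the best one so far, $delimiter starts as a delimiter value (",") but is later overwritten with a key such as 'tab', and on PHP 8 the first sizeof($data_2) call throws a TypeError because $data_2 is null. A minimal sketch of the same idea that avoids these issues, written as a standalone function for illustration (the name detectDelimiterName() and the DELIM_LIST constant are hypothetical), reads one line, tests every candidate on that same line, and tracks the running maximum:

```php
<?php
// Same candidates as $delim_list in the question (hypothetical constant).
const DELIM_LIST = ['tab' => "\t", 'semicolon' => ';', 'pipe' => '|', 'comma' => ','];

/**
 * Hypothetical standalone variant of DetectDelimiter(): reads ONE line,
 * splits that same line with every candidate, and returns the name of the
 * delimiter that yields the most fields ('comma' if none yields more than one).
 * Assumes $fh is a seekable stream positioned at the start of the data.
 */
function detectDelimiterName($fh): string
{
    $start = ftell($fh);
    $line = fgets($fh);
    fseek($fh, $start); // put the pointer back so the caller still reads line 1

    if ($line === false) {
        return 'comma'; // empty input: keep the original default
    }

    $best = 'comma';
    $bestCount = 1; // a delimiter must split the line at least once to win
    foreach (DELIM_LIST as $name => $delim) {
        // Enclosure and escape passed explicitly (same values as the defaults).
        $count = count(str_getcsv($line, $delim, '"', '\\'));
        if ($count > $bestCount) { // compare with the best so far, not the previous candidate
            $best = $name;
            $bestCount = $count;
        }
    }
    return $best;
}
```

Because only the first line is inspected, this has the same blind spots as any single-line check (for example, a one-column header).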

Can anyone shed any light as to why I shouldn't do it this simpler way, and why everywhere I look the more convoluted solution seems to be the accepted answer?

Thanks!

Answer

Ahmed Al Bermawy · Jan 3, 2020

This function is elegant :)

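/**
* Guess the delimiter of a CSV file from its first line.
*
* @param string $csvFile Path to the CSV file
* @return string Delimiter (on a tie, the first candidate listed wins)
*/
public function detectDelimiter($csvFile)
{
    // Candidate delimiters; each value becomes a field count below.
    $delimiters = [";" => 0, "," => 0, "\t" => 0, "|" => 0];

    $handle = fopen($csvFile, "r");
    if ($handle === false) {
        throw new \RuntimeException("Cannot open $csvFile");
    }
    $firstLine = fgets($handle);
    fclose($handle);
    if ($firstLine === false) {
        throw new \RuntimeException("$csvFile is empty");
    }

    // Split the same line with every candidate and store the number of fields.
    foreach ($delimiters as $delimiter => &$count) {
        $count = count(str_getcsv($firstLine, $delimiter));
    }
    unset($count); // drop the reference left behind by foreach

    // The delimiter that produces the most fields wins.
    return array_search(max($delimiters), $delimiters);
}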
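A practical benefit of splitting the line with str_getcsv(), rather than counting raw delimiter characters, is that quoting is respected. A small comparison, using a hypothetical helper fieldCounts() and a made-up first line in which a quoted field contains commas:

```php
<?php
/**
 * Hypothetical helper: count the fields of $line two ways, naively (raw
 * character count) and with str_getcsv(), which understands "..." quoting.
 */
function fieldCounts(string $line, string $delim): array
{
    return [
        'naive'    => substr_count($line, $delim) + 1,
        'csvAware' => count(str_getcsv($line, $delim, '"', '\\')),
    ];
}

// Made-up first line: one quoted field containing commas, then a ';'.
$line = '"a, b, c, d";x';

print_r(fieldCounts($line, ',')); // naive 4, csvAware 1
print_r(fieldCounts($line, ';')); // naive 2, csvAware 2
```

Counting characters would pick ',' (4 fields against 2), while the str_getcsv() counts pick ';' (2 fields against 1).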