I have seen multiple threads about the best way to auto-detect the delimiter of an incoming CSV file. Most of the solutions are functions of 20 to 30 lines with multiple loops: they take a predetermined list of delimiters, read the first 5 lines, compare the field counts, and so on.
I have just implemented this procedure, with a few modifications. Works brilliantly.
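For reference, here is a minimal sketch of that kind of procedure. It is not the exact code from any of those threads: the function name `detectDelimiterBySampling`, the candidate list, and the rule "same field count on every sampled line, most fields wins" are my own assumptions for illustration.

```php
<?php
/**
 * Hypothetical sketch: pick the candidate delimiter that splits every
 * sampled line into the same number of fields, preferring the most fields.
 * Returns null if the file cannot be read or no candidate splits the lines.
 */
function detectDelimiterBySampling(string $path, int $sampleLines = 5): ?string
{
    $candidates = [",", ";", "\t", "|"]; // assumed list of delimiters

    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null;
    }

    // Read the sample once so every candidate is tested on the same lines.
    $lines = [];
    while (count($lines) < $sampleLines && ($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    fclose($handle);

    $best = null;
    $bestFields = 1; // a real delimiter must produce at least two fields
    foreach ($candidates as $candidate) {
        $counts = [];
        foreach ($lines as $line) {
            $counts[] = count(str_getcsv($line, $candidate, '"', '\\'));
        }
        // Keep the candidate only if all sampled lines agree on the field count.
        if ($counts !== [] && count(array_unique($counts)) === 1 && $counts[0] > $bestFields) {
            $bestFields = $counts[0];
            $best = $candidate;
        }
    }

    return $best;
}
```

The consistency check is what the extra lines buy: a stray comma inside one semicolon-separated row does not make the comma win.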
THEN I found the following code:
    private function DetectDelimiter($fh)
    {
        $data_1 = null;
        $data_2 = null;
        $delimiter = self::$delim_list['comma'];
        foreach(self::$delim_list as $key=>$value)
        {
            $data_1 = fgetcsv($fh, 4096, $value);
            $delimiter = sizeof($data_1) > sizeof($data_2) ? $key : $delimiter;
            $data_2 = $data_1;
        }
        $this->SetDelimiter($delimiter);
        return $delimiter;
    }
This to me looks like it's achieving the SAME results, where $delim_list is an array of delimiters as follows:
    static protected $delim_list = array('tab'=>"\t",
                                         'semicolon'=>";",
                                         'pipe'=>"|",
                                         'comma'=>",");
Can anyone shed any light as to why I shouldn't do it this simpler way, and why everywhere I look the more convoluted solution seems to be the accepted answer?
Thanks!
This function is elegant :)
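    /**
     * @param string $csvFile Path to the CSV file
     * @return string Delimiter
     * @throws \RuntimeException If the file cannot be opened or is empty
     */
    public function detectDelimiter($csvFile)
    {
        $delimiters = [";" => 0, "," => 0, "\t" => 0, "|" => 0];

        $handle = fopen($csvFile, "r");
        if ($handle === false) {
            throw new \RuntimeException("Unable to open CSV file: " . $csvFile);
        }
        $firstLine = fgets($handle);
        fclose($handle);
        if ($firstLine === false) {
            throw new \RuntimeException("CSV file is empty: " . $csvFile);
        }

        // Count the fields the first line yields for each candidate delimiter.
        foreach ($delimiters as $delimiter => &$count) {
            $count = count(str_getcsv($firstLine, $delimiter));
        }
        unset($count); // break the reference left over from the loop

        // Return the delimiter that produced the most fields.
        return array_search(max($delimiters), $delimiters);
    }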
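For comparison with the function in the question: as far as I can tell, each `fgetcsv()` call there reads a new line from the handle, so every delimiter is tried against a different row, and each count is compared with the previous one rather than with the best so far. Here is a small sketch to illustrate this, using an in-memory stream. The helper `countFieldsPerDelimiter()` is a hypothetical name of mine, not part of either snippet.

```php
<?php
/**
 * Hypothetical helper: field count of the FIRST line for each candidate
 * delimiter. The stream is rewound before every attempt.
 */
function countFieldsPerDelimiter($fh, array $delimiters): array
{
    $counts = [];
    foreach ($delimiters as $name => $delimiter) {
        rewind($fh); // always parse the first line again
        $row = fgetcsv($fh, 4096, $delimiter, '"', '\\');
        $counts[$name] = is_array($row) ? count($row) : 0;
    }
    rewind($fh); // leave the stream at the start for the caller
    return $counts;
}

// Three comma-separated lines held in memory.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "a,b,c\n1,2,3\n4,5,6\n");
rewind($fh);

// Without rewinding, each call consumes the next line of the stream.
$first  = fgetcsv($fh, 4096, "\t", '"', '\\'); // line 1 as one field: ["a,b,c"]
$second = fgetcsv($fh, 4096, ";", '"', '\\');  // line 2 as one field: ["1,2,3"]
$third  = fgetcsv($fh, 4096, ",", '"', '\\');  // line 3 split: ["4", "5", "6"]

// With rewinding, every delimiter is tested on the same first line.
$counts = countFieldsPerDelimiter($fh, [
    'tab' => "\t", 'semicolon' => ';', 'pipe' => '|', 'comma' => ',',
]);
fclose($fh);
```

Reading the first line once with `fgets()` and parsing it with `str_getcsv()`, as in the function above, avoids the problem without any rewinding.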