PHPExcel how to solve the encoding issue when reading a file

Joel Joseph picture Joel Joseph · Mar 23, 2018 · Viewed 8.2k times · Source

I was working on an Yii2 API where i need to upload a .csv or .xlsx file and read from it using PHPExcel(DEPRECATED now , but i am stuck with it as new one PhpSpreadsheet requires PHP version 5.6 or newer) and return the array of data .

This was the code used in the API function

public function actionUpload()
{
    $params = $_FILES['uploadFile'];
    if($params)
    {
        $data = array();
        $model = new UploadForm();
        $model->uploadFile = $_FILES['uploadFile'];
        $file =  UploadedFile::getInstanceByname('uploadFile');
        $inputFileName = $model->getpath($file,$data);
        //  Read your Excel workbook
        try
        {
            $inputFileType = \PHPExcel_IOFactory::identify($inputFileName['link']);
            $objReader = \PHPExcel_IOFactory::createReader($inputFileType);
            if($inputFileType == 'CSV')
            {   


                if (mb_check_encoding(file_get_contents($inputFileName['link']), 'UTF-8'))
                {
                    $objReader->setInputEncoding('UTF-8');
                }
                else
                {
                     $objReader->setInputEncoding('Windows-1255');
                     //$objReader->setInputEncoding('ISO-8859-8');
                }


            }
            $objPHPExcel = $objReader->load($inputFileName['link']);
        }
        catch(Exception $e)
        {
            die('Error loading file "'.pathinfo($inputFileName['link'],PATHINFO_BASENAME).'": '.$e->getMessage());
        }

        //  Get worksheet dimensions
        $sheet = $objPHPExcel->getSheet(0); 
        $highestRow = $sheet->getHighestRow(); 
        $highestColumn = $sheet->getHighestColumn();
        $fileData = array();
        //  Loop through each row of the worksheet in turn
        for ($row = 1; $row <= $highestRow; $row++)
        { 
            //  Read a row of data into an array
            $rowData = $sheet->rangeToArray('A' . $row . ':' . $highestColumn . $row,
                                            NULL,
                                            TRUE,
                                            FALSE);
            array_push($fileData,$rowData[0]);
            //  Insert row data array into your database of choice here
        }
        return $fileData;
    }

}

But there are encoding issues when we upload a excel file containing hebrew data in it . As you can see the code below from the above code was used to address this issue

if (mb_check_encoding(file_get_contents($inputFileName['link']), 'UTF-8'))
{
    $objReader->setInputEncoding('UTF-8');
}
else
{
        $objReader->setInputEncoding('Windows-1255');

}

Later i found that UTF-8 and Windows-1255 are not the only possible encoding for the flies that may be uploaded but other encoding like UTF-16 or other ones depending upon the Operating System of user. Is there any better way to find the encoding other than using mb_check_encoding

The common error that occur during the process of reading the data in file is :

iconv(): Detected an illegal character in input string

As you can see the above error occurs due to the inability to detect the appropriate encoding of the file. Is there any workaround ?

Answer

ofurkusi picture ofurkusi · Jun 5, 2018

You can attempt to use mb_detect_encoding to detect the file encoding but I find that results vary. You might have to manually specify a custom match order of encodings to get proper results. Here is an example substitute for the if statement in question:

if(inputFileType == 'CSV')
{
    // Try to detect file encoding
    $encoding = mb_detect_encoding(file_get_contents($inputFileName['link']),
                     // example of a manual detection order
                    'ASCII,UTF-8,ISO-8859-15');

    $objReader->setInputEncoding($encoding);
}