Which encoding opens CSV files correctly with Excel on both Mac and Windows?

Timm picture Timm · Jul 5, 2011 · Viewed 145.5k times · Source

We have a web app that exports CSV files containing foreign characters with UTF-8, no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with BOM; Excel/Win is fine with it, Excel/Mac shows gibberish. I'm using Excel 2003/Win, Excel 2011/Mac. Here's all the encodings I tried:

Encoding  BOM      Win                            Mac
--------  ---      ----------------------------   ------------
utf-8     --       scrambled                      scrambled
utf-8     BOM      WORKS                          scrambled
utf-16    --       file not recognized            file not recognized
utf-16    BOM      file not recognized            Chinese gibberish
utf-16LE  --       file not recognized            file not recognized
utf-16LE  BOM      characters OK,                 same as Win
                   row data all in first field

The best one is UTF-16LE with BOM, but the CSV is not recognized as such. The field separator is comma, but semicolon doesn't change things.

Is there any encoding that works in both worlds?

Answer

mikezter picture mikezter · Nov 25, 2013

Excel Encodings

I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since its basically Microsofts own proprietary character set, one can assume it will work on both the Mac and the Windows version of MS-Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which correctly reads the data.

Depending on your system and the tools you use, this encoding could also be named CP1252, ANSI, Windows (ANSI), MS-ANSI or just Windows, among other variations.

This encoding is a superset of ISO-8859-1 (aka LATIN1 and others), so you can fallback to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters from WINDOWS-1252 as shown here:

| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name                               | Unicode Range            |
| €    | 128  | 8364    | 0x80     | U+20AC      | €      | euro sign                                  | Currency Symbols         |
| ‚    | 130  | 8218    | 0x82     | U+201A      | ‚     | single low-9 quotation mark                | General Punctuation      |
| ƒ    | 131  | 402     | 0x83     | U+0192      | ƒ      | Latin small letter f with hook             | Latin Extended-B         |
| „    | 132  | 8222    | 0x84     | U+201E      | „     | double low-9 quotation mark                | General Punctuation      |
| …    | 133  | 8230    | 0x85     | U+2026      | …    | horizontal ellipsis                        | General Punctuation      |
| †    | 134  | 8224    | 0x86     | U+2020      | †    | dagger                                     | General Punctuation      |
| ‡    | 135  | 8225    | 0x87     | U+2021      | ‡    | double dagger                              | General Punctuation      |
| ˆ    | 136  | 710     | 0x88     | U+02C6      | ˆ      | modifier letter circumflex accent          | Spacing Modifier Letters |
| ‰    | 137  | 8240    | 0x89     | U+2030      | ‰    | per mille sign                             | General Punctuation      |
| Š    | 138  | 352     | 0x8A     | U+0160      | Š    | Latin capital letter S with caron          | Latin Extended-A         |
| ‹    | 139  | 8249    | 0x8B     | U+2039      | ‹    | single left-pointing angle quotation mark  | General Punctuation      |
| Œ    | 140  | 338     | 0x8C     | U+0152      | Œ     | Latin capital ligature OE                  | Latin Extended-A         |
| Ž    | 142  | 381     | 0x8E     | U+017D      |             | Latin capital letter Z with caron          | Latin Extended-A         |
| ‘    | 145  | 8216    | 0x91     | U+2018      | ‘     | left single quotation mark                 | General Punctuation      |
| ’    | 146  | 8217    | 0x92     | U+2019      | ’     | right single quotation mark                | General Punctuation      |
| “    | 147  | 8220    | 0x93     | U+201C      | “     | left double quotation mark                 | General Punctuation      |
| ”    | 148  | 8221    | 0x94     | U+201D      | ”     | right double quotation mark                | General Punctuation      |
| •    | 149  | 8226    | 0x95     | U+2022      | •      | bullet                                     | General Punctuation      |
| –    | 150  | 8211    | 0x96     | U+2013      | –     | en dash                                    | General Punctuation      |
| —    | 151  | 8212    | 0x97     | U+2014      | —     | em dash                                    | General Punctuation      |
| ˜    | 152  | 732     | 0x98     | U+02DC      | ˜     | small tilde                                | Spacing Modifier Letters |
| ™    | 153  | 8482    | 0x99     | U+2122      | ™     | trade mark sign                            | Letterlike Symbols       |
| š    | 154  | 353     | 0x9A     | U+0161      | š    | Latin small letter s with caron            | Latin Extended-A         |
| ›    | 155  | 8250    | 0x9B     | U+203A      | ›    | single right-pointing angle quotation mark | General Punctuation      |
| œ    | 156  | 339     | 0x9C     | U+0153      | œ     | Latin small ligature oe                    | Latin Extended-A         |
| ž    | 158  | 382     | 0x9E     | U+017E      |             | Latin small letter z with caron            | Latin Extended-A         |
| Ÿ    | 159  | 376     | 0x9F     | U+0178      | Ÿ      | Latin capital letter Y with diaeresis      | Latin Extended-A         |

Note that the euro sign is missing. This table can be found at Alan Wood.

Conversion

Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv which you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv:

iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv