So we have this web app where we support UTF-8 data. Hooray UTF-8. And we can export the user-supplied data into CSV no problem - it's still in UTF-8 at that point. The problem is that when you open a typical UTF-8 CSV in Excel, it reads it as ANSI-encoded text and accordingly treats two-byte characters like ø and ü as two separate characters each, and you end up with fail.
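To make the failure concrete, here is a minimal Ruby sketch of what happens to those two characters. It assumes Excel's "ANSI" means the Windows-1252 code page (the usual default on Western-locale Windows, but it depends on the machine), and the helper names `hex_bytes` and `misread_as_cp1252` are made up for illustration:

```ruby
# Show the raw UTF-8 bytes of a string as hex.
def hex_bytes(text)
  text.bytes.map { |b| format("%02X", b) }.join(" ")
end

# Reinterpret the same bytes as Windows-1252 (assumed "ANSI" code page),
# then convert back to UTF-8 so the misread text can be displayed.
def misread_as_cp1252(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

sample = "\u00F8\u00FC"          # "øü"
puts hex_bytes(sample)           # C3 B8 C3 BC
puts misread_as_cp1252(sample)   # Ã¸Ã¼
```

Each two-byte character turns into two unrelated one-byte characters, which is the garbage that shows up in the spreadsheet.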
So I've done a bit of digging (the Intervals folks have an interesting post about it here), and there are some limited if ridiculously annoying options out there. Among them:
It looks like no matter what, I'm probably going to want to continue offering a plain-old CSV file for the folks who aren't using it for Excel anyway, and a separate download option for Excel.
What's the simplest way of generating that Just-For-Excel file that will correctly support UTF8, my dear Stack Overflowers? If that simplest option only supports the latest version of Excel, that's still of interest.
I'm doing this on a Rails stack, but I'm curious how the .NET-ers and folks on other frameworks handle this. I work in a few different environments myself, and this is definitely an issue that will be coming up again.
Update 2010-10-22: We had been using the Ruport gem in our time-tracking system Tempo to provide the CSV exports when I first posted this question. One of my coworkers, Erik Hollensbee, threw together a quick filter for Ruport to provide us with actual Excel XLS output, and I figured I'd share that here for any other Rubyists:
    require 'rubygems'
    require 'ruport'
    require 'spreadsheet'
    require 'stringio'

    Spreadsheet.client_encoding = "UTF-8"

    include Ruport::Data

    class Ruport::Formatter::Excel < Ruport::Formatter
      renders :excel, :for => Ruport::Controller::Table

      def output
        retval = StringIO.new

        if options.workbook
          book = options.workbook
        else
          book = Spreadsheet::Workbook.new
        end

        if options.worksheet_name
          book_args = { :name => options.worksheet_name }
        else
          book_args = { }
        end

        sheet = book.create_worksheet(book_args)

        offset = 0

        if options.show_table_headers
          sheet.row(0).default_format = Spreadsheet::Format.new(
            options.format_options ||
            {
              :color => :blue,
              :weight => :bold,
              :size => 18
            }
          )
          sheet.row(0).replace data.column_names
          offset = 1
        end

        data.data.each_with_index do |row, i|
          sheet.row(i+offset).replace row.attributes.map { |x| row.data[x] }
        end

        book.write retval
        retval.seek(0)
        return retval.read
      end
    end
I found that if you set the charset encoding of the web page to UTF-8, and then Response.BinaryWrite the UTF-8 Byte Order Mark (0xEF 0xBB 0xBF) at the top of the CSV file, then Excel 2007 (not sure about other versions) will recognize it as UTF-8 and open it correctly.
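For the Rails/Ruby folks, the same trick might look roughly like the sketch below. It uses only the standard `csv` library; the method name `export_csv` and the `for_excel:` flag are hypothetical, there just to show the plain download and the Excel download side by side. I have not checked which Excel versions honor the BOM beyond what is noted above.

```ruby
require 'csv'

# U+FEFF encodes to the bytes EF BB BF in UTF-8.
UTF8_BOM = "\uFEFF"

# Build a CSV string from an array of rows. When for_excel is true the
# output starts with the UTF-8 BOM so Excel can detect the encoding;
# otherwise it is a plain UTF-8 CSV. (Hypothetical helper, not a Rails API.)
def export_csv(rows, for_excel: false)
  # CSV.generate appends to the string it is given, so seeding the
  # buffer with the BOM puts it at the very start of the output.
  buffer = for_excel ? UTF8_BOM.dup : +""
  CSV.generate(buffer) do |csv|
    rows.each { |row| csv << row }
  end
end

rows = [["name", "city"], ["J\u00F8rgen", "Z\u00FCrich"]]
plain     = export_csv(rows)
for_excel = export_csv(rows, for_excel: true)

puts plain.bytes.first(3).map { |b| format("%02X", b) }.join(" ")      # 6E 61 6D
puts for_excel.bytes.first(3).map { |b| format("%02X", b) }.join(" ")  # EF BB BF
```

In a controller you would then send whichever string was requested as the response body, with a `text/csv; charset=utf-8` content type.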