How to Convert ANSI to UTF-8 with TXMLDocument in Delphi

EProgrammerNotFound · Jul 18, 2013

Is it possible to convert the XML to UTF-8 encoding in Delphi 6?
Currently, this is what I am doing:

  • Fill TXMLDocument with AnsiString data.
  • At the end, convert the data to UTF-8 using WideStringVariable := AnsiToUtf8(Doc.XML.Text);
  • Save the value of WideStringVariable to a file using TFileStream, adding the UTF-8 BOM at the beginning of the file.

CODE:

Procedure SaveAsUTF8( const Name:String; Data: TStrings );

const
  cUTF8 = $BFBBEF;
var
  W_TXT: WideString;
  fs: TFileStream;
  wBOM: Integer;
begin
  if TRIM(Data.Text) <> '' then begin    
    W_TXT:= AnsiToUTF8(Data.Text);
    fs:= Tfilestream.create( Name, fmCreate );
    try
      wBOM := cUTF8;
      fs.WriteBUffer( wBOM, sizeof(wBOM)-1);
      fs.WriteBuffer( W_TXT[1], Length(W_TXT)*Sizeof( W_TXT[1] ));
    finally
      fs.free
    end;
  end;
end;

If I open the file in Notepad++ or another editor that detects encoding, it shows UTF-8 with BOM. However, the text does not seem to be properly encoded.

What is wrong and how can I fix it?

UPDATE: XML Properties:

XMLDoc.Version := '1.0';
XMLDoc.Encoding := 'UTF-8';
XMLDoc.StandAlone := 'yes';

Answer

Arioch 'The · Jul 18, 2013

You can save the file using the standard SaveToFile method of the TXMLDocument variable: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/XMLDoc_TXMLDocument_SaveToFile.html
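A minimal sketch of that route, assuming a Delphi 6 console program with the XMLDoc/XMLIntf units available and the default COM-based MSXML DOM vendor. The file name out.xml, the root element name and the sample text are hypothetical. Whether a BOM is written is up to the DOM vendor, so check the result as described below.

```pascal
program SaveXmlAsUtf8;

{$APPTYPE CONSOLE}

uses
  ActiveX, XMLIntf, XMLDoc;

var
  Doc: IXMLDocument;
begin
  CoInitialize(nil);           // the MSXML DOM vendor is COM-based
  try
    // Owner = nil: the document is reference counted through the interface
    Doc := TXMLDocument.Create(nil);
    Doc.Active := True;
    Doc.Version := '1.0';
    Doc.Encoding := 'UTF-8';   // declared encoding the vendor should honor on save
    Doc.StandAlone := 'yes';
    // ANSI literal, widened to DOMString via the system code page
    // (hypothetical sample; #$E9 is 'é' in Windows-1252)
    Doc.AddChild('root').Text := 'Caf' + #$E9;
    Doc.SaveToFile('out.xml');
    Doc := nil;                // release the DOM before CoUninitialize
  finally
    CoUninitialize;
  end;
end.
```

The point of this design is that the string handed to the DOM is converted to WideString once, and the DOM does the UTF-8 encoding itself, so no manual byte handling is involved.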

Whether or not the resulting file is UTF-8, you have to check with local tools such as the aforementioned Notepad++, a hex editor, or anything similar.
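The hex-editor check can also be approximated in code. The sketch below is a hypothetical helper, not part of the original answer. It reports whether the file starts with the EF BB BF BOM and whether the remaining bytes are structurally valid UTF-8. A BOM alone proves nothing: the file from the question has one, yet its body is UTF-16, which shows up as zero bytes and broken multi-byte sequences. The validator is deliberately simplified (no overlong or surrogate checks).

```pascal
program CheckUtf8File;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  TRawBytes = array of Byte;

// Simplified structural check of Buf[Start..Count-1]: verifies lead and
// continuation byte patterns and rejects NUL bytes (XML 1.0 forbids #x0).
// It does NOT reject overlong forms or encoded surrogates.
function LooksLikeUtf8(const Buf: array of Byte; Start, Count: Integer): Boolean;
var
  I, K, Need: Integer;
  B: Byte;
begin
  Result := False;
  I := Start;
  while I < Count do
  begin
    B := Buf[I];
    if B = 0 then Exit;                        // typical of UTF-16 data
    if B < $80 then Need := 0                  // ASCII
    else if (B and $E0) = $C0 then Need := 1   // 2-byte sequence
    else if (B and $F0) = $E0 then Need := 2   // 3-byte sequence
    else if (B and $F8) = $F0 then Need := 3   // 4-byte sequence
    else Exit;                                 // stray continuation or invalid byte
    for K := 1 to Need do
      if (I + K >= Count) or ((Buf[I + K] and $C0) <> $80) then Exit;
    Inc(I, Need + 1);
  end;
  Result := True;
end;

// Reads the whole file, detects the UTF-8 BOM and validates the rest.
function FileLooksLikeUtf8(const FileName: string; out HasBom: Boolean): Boolean;
var
  fs: TFileStream;
  Buf: TRawBytes;
  Start: Integer;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buf, Integer(fs.Size));
    if Length(Buf) > 0 then
      fs.ReadBuffer(Buf[0], Length(Buf));
  finally
    fs.Free;
  end;
  HasBom := (Length(Buf) >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF);
  if HasBom then Start := 3 else Start := 0;
  Result := LooksLikeUtf8(Buf, Start, Length(Buf));
end;

var
  Bom: Boolean;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: CheckUtf8File <file>');
    Halt(1);
  end;
  if FileLooksLikeUtf8(ParamStr(1), Bom) then
    Writeln('Looks like UTF-8, BOM present: ', Bom)
  else
    Writeln('NOT valid UTF-8, BOM present: ', Bom);
end.
```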


If you insist on using an intermediate string and a file stream, you should use the proper variable type. AnsiToUTF8 returns UTF8String, and that is the type to use. Compiling `WideStringVar := AnsiStringSource` would issue a compiler warning.

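It is a proper warning. Searching for "Delphi WideString", or reading the Delphi manuals on the topic, shows that WideString (aka the Microsoft OLE BSTR type) keeps its data in UTF-16 format (see http://delphi.about.com/od/beginners/l/aa071800a.htm). Thus assigning an 8-bit source to a UTF-16 string necessarily converts the data: the UTF-8 bytes returned by AnsiToUTF8 are treated as ANSI characters and widened again, so non-ASCII characters end up double-encoded, and writing Length * SizeOf(WideChar) bytes emits two bytes per character. Dumping WideString data therefore can never be dumping UTF-8 text, by the very definition of WideString.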
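A small sketch of the difference, assuming Delphi 6 (where UTF8String is an 8-bit string type) and a system ANSI code page of Windows-1252, where #$E9 is 'é'. On other code pages the byte counts will differ. It contrasts storing the AnsiToUtf8 result in a UTF8String with storing it in a WideString, as the question's code does.

```pascal
program WideVsUtf8;

{$APPTYPE CONSOLE}

// Assumption: system ANSI code page is Windows-1252.
var
  A: AnsiString;
  U: UTF8String;
  W: WideString;
begin
  A := 'Caf' + #$E9;     // 'Café' as 4 ANSI bytes
  U := AnsiToUtf8(A);    // UTF-8 bytes: 'C' 'a' 'f' $C3 $A9
  W := AnsiToUtf8(A);    // as in the question: $C3 and $A9 are re-read as ANSI
                         // characters and widened, giving 5 UTF-16 code units
  Writeln('UTF8String bytes written: ', Length(U) * SizeOf(U[1]));
  Writeln('WideString bytes written: ', Length(W) * SizeOf(W[1]));
end.
```

Only the UTF8String byte count matches real UTF-8 output; the WideString buffer holds UTF-16 data of already-encoded text, which is why the editor shows garbage after the BOM.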

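// Needs Classes (TStrings, TFileStream) and SysUtils (Trim) in the uses clause.
procedure SaveAsUTF8( const Name: String; Data: TStrings );
const
  // UTF-8 BOM as explicit bytes, instead of relying on little-endian Integer layout
  cUTF8: array [1..3] of Byte = ($EF, $BB, $BF);
var
  W_TXT: UTF8String;   // 8-bit string holding UTF-8 bytes, not UTF-16
  fs: TFileStream;
  Trimmed: AnsiString;
begin
  Trimmed := Trim(Data.Text);
  if Trimmed <> '' then begin
    W_TXT := AnsiToUTF8(Trimmed);   // ANSI code page -> UTF-8 bytes
    fs := TFileStream.Create( Name, fmCreate );
    try
      fs.WriteBuffer( cUTF8[1], SizeOf(cUTF8) );
      // SizeOf(W_TXT[1]) = 1, so exactly the UTF-8 bytes are written
      fs.WriteBuffer( W_TXT[1], Length(W_TXT) * SizeOf( W_TXT[1] ) );
    finally
      fs.Free;
    end;
  end;
end;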

BTW, this code of yours would not create even an empty file if the source data was empty. That looks rather suspicious, though it is up to you to decide whether it is an error with respect to the rest of your program.
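If the file should exist even for empty input, one option is to drop the Trim test. The sketch below is a hypothetical variant (the name SaveAsUTF8Always and the file empty.xml are illustrative, not from the original). For empty data it writes a file containing only the 3-byte BOM; whether a BOM-only file is acceptable depends on whatever reads it.

```pascal
program SaveUtf8Always;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // UTF-8 byte order mark as explicit bytes
  cUTF8Bom: array [1..3] of Byte = ($EF, $BB, $BF);

// Hypothetical variant of SaveAsUTF8 that always creates the file.
procedure SaveAsUTF8Always(const Name: string; Data: TStrings);
var
  Utf8Text: UTF8String;
  fs: TFileStream;
begin
  Utf8Text := AnsiToUtf8(Data.Text);
  fs := TFileStream.Create(Name, fmCreate);
  try
    fs.WriteBuffer(cUTF8Bom[1], SizeOf(cUTF8Bom));
    if Utf8Text <> '' then   // Utf8Text[1] is not valid for an empty string
      fs.WriteBuffer(Utf8Text[1], Length(Utf8Text));
  finally
    fs.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    SaveAsUTF8Always('empty.xml', Lines);   // creates a BOM-only file
  finally
    Lines.Free;
  end;
end.
```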


Properly "uploading" the resulting file or stream to the web is yet another issue (to be asked as a separate question on a Q&A site such as SO), related to testing conformance with HTTP. As a start, you can read some hints at "WWW server reports error after POST Request by Internet Direct components in Delphi".