VB6/VBScript change file encoding to ansi

HerbalMart picture HerbalMart · Mar 3, 2011 · Viewed 55.2k times · Source

I am looking for a way to convert a textfile with UTF8 encoding to ANSI encoding.

How can i go around and achieve this in Visual Basic (VB6) and or vbscript?

Answer

Bob77 picture Bob77 · Mar 3, 2011

If your files aren't truly enormous (e.g. even merely 40MB can be painfully slow) you can do this using the following code in VB6, VBA, or VBScript:

Option Explicit

Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0

Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText

    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "_autodetect" 'Use current ANSI codepage.
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub

UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName

Note that it will handle UTF-8 input files either with or without a BOM.

Using strong typing and early binding will improve performance a hair in VB6, and you won't need to declare those Const values. This isn't an option in script though.

For VB6 programs that need to process very large files you might be better off using VB6 native I/O against Byte arrays and use an API call to convert the data in chunks. This adds the extra messiness of finding the character boundaries though (UTF-8 uses a variable number of bytes per character). You'd need to scan each data block you read to find a safe ending point for an API translation.

I'd look at MultiByteToWideChar() and WideCharToMultiByte() to get started.

Note that UTF-8 often "arrives" with LF line delimiters instead of CRLF.