UTF-8 output from PowerShell

Paul Stovell picture Paul Stovell · Mar 12, 2014 · Viewed 117.6k times · Source

I'm trying to use Process.Start with redirected I/O to call PowerShell.exe with a string, and to get the output back, all in UTF-8. But I don't seem to be able to make this work.

What I've tried:

  • Passing the command to run via the -Command parameter
  • Writing the PowerShell script as a file to disk with UTF-8 encoding
  • Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding
  • Writing the PowerShell script as a file to disk with UTF-16
  • Setting Console.OutputEncoding in both my console application and in the PowerShell script
  • Setting $OutputEncoding in PowerShell
  • Setting Process.StartInfo.StandardOutputEncoding
  • Doing it all with Encoding.Unicode instead of Encoding.UTF8

In every case, when I inspect the bytes I'm given, I get different values to my original string. I'd really love an explanation as to why this doesn't work.

Here is my code:

static void Main(string[] args)
{
    DumpBytes("Héllo");

    ExecuteCommand("PowerShell.exe", "-Command \"$OutputEncoding = [System.Text.Encoding]::UTF8 ; Write-Output 'Héllo';\"",
        Environment.CurrentDirectory, DumpBytes, DumpBytes);

    Console.ReadLine();
}

static void DumpBytes(string text)
{
    Console.Write(text + " " + string.Join(",", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("X"))));
    Console.WriteLine();
}

static int ExecuteCommand(string executable, string arguments, string workingDirectory, Action<string> output, Action<string> error)
{
    try
    {
        using (var process = new Process())
        {
            process.StartInfo.FileName = executable;
            process.StartInfo.Arguments = arguments;
            process.StartInfo.WorkingDirectory = workingDirectory;
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.CreateNoWindow = true;
            process.StartInfo.RedirectStandardOutput = true;
            process.StartInfo.RedirectStandardError = true;
            process.StartInfo.StandardOutputEncoding = Encoding.UTF8;
            process.StartInfo.StandardErrorEncoding = Encoding.UTF8;

            using (var outputWaitHandle = new AutoResetEvent(false))
            using (var errorWaitHandle = new AutoResetEvent(false))
            {
                process.OutputDataReceived += (sender, e) =>
                {
                    if (e.Data == null)
                    {
                        outputWaitHandle.Set();
                    }
                    else
                    {
                        output(e.Data);
                    }
                };

                process.ErrorDataReceived += (sender, e) =>
                {
                    if (e.Data == null)
                    {
                        errorWaitHandle.Set();
                    }
                    else
                    {
                        error(e.Data);
                    }
                };

                process.Start();

                process.BeginOutputReadLine();
                process.BeginErrorReadLine();

                process.WaitForExit();
                outputWaitHandle.WaitOne();
                errorWaitHandle.WaitOne();

                return process.ExitCode;
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception(string.Format("Error when attempting to execute {0}: {1}", executable, ex.Message),
            ex);
    }
}

Update 1

I found that if I make this script:

[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Write-Host "Héllo!"
[Console]::WriteLine("Héllo")

Then invoke it via:

ExecuteCommand("PowerShell.exe", "-File C:\\Users\\Paul\\Desktop\\Foo.ps1",
  Environment.CurrentDirectory, DumpBytes, DumpBytes);

The first line is corrupted, but the second isn't:

H?llo! 48,EF,BF,BD,6C,6C,6F,21
Héllo 48,C3,A9,6C,6C,6F

This suggests to me that my redirection code is all working fine; when I use Console.WriteLine in PowerShell I get UTF-8 as I expect.

This means that PowerShell's Write-Output and Write-Host commands must be doing something different with the output, and not simply calling Console.WriteLine.

Update 2

I've even tried the following to force the PowerShell console code page to UTF-8, but Write-Host and Write-Output continue to produce broken results while [Console]::WriteLine works.

$sig = @'
[DllImport("kernel32.dll")]
public static extern bool SetConsoleCP(uint wCodePageID);

[DllImport("kernel32.dll")]
public static extern bool SetConsoleOutputCP(uint wCodePageID);
'@

$type = Add-Type -MemberDefinition $sig -Name Win32Utils -Namespace Foo -PassThru

$type::SetConsoleCP(65001)
$type::SetConsoleOutputCP(65001)

Write-Host "Héllo!"

& chcp    # Tells us 65001 (UTF-8) is being used

Answer

andyb picture andyb · Mar 12, 2014

Not an expert on encoding, but after reading these...

... it seems fairly clear that the $OutputEncoding variable only affects data piped to native applications.

If sending to a file from withing PowerShell, the encoding can be controlled by the -encoding parameter on the out-file cmdlet e.g.

write-output "hello" | out-file "enctest.txt" -encoding utf8

Nothing else you can do on the PowerShell front then, but the following post may well help you:.