Conversion in .net: Native Utf-8 <-> Managed String

Totonga picture Totonga · May 27, 2012 · Viewed 9.1k times · Source

I created those two methods to convert Native utf-8 strings (char*) into managed string and vice versa. The following code does the job:

public IntPtr NativeUtf8FromString(string managedString)
{
    byte[] buffer = Encoding.UTF8.GetBytes(managedString); // not null terminated
    Array.Resize(ref buffer, buffer.Length + 1);
    buffer[buffer.Length - 1] = 0; // terminating 0
    IntPtr nativeUtf8 = Marshal.AllocHGlobal(buffer.Length);
    Marshal.Copy(buffer, 0, nativeUtf8, buffer.Length);
    return nativeUtf8;
}

string StringFromNativeUtf8(IntPtr nativeUtf8)
{
    int size = 0;
    byte[] buffer = {};
    do
    {
        ++size;
        Array.Resize(ref buffer, size);
        Marshal.Copy(nativeUtf8, buffer, 0, size);
    } while (buffer[size - 1] != 0); // till 0 termination found

    if (1 == size)
    {
        return ""; // empty string
    }

    Array.Resize(ref buffer, size - 1); // remove terminating 0
    return Encoding.UTF8.GetString(buffer);
}

While NativeUtf8FromString is ok, StringFromNativeUtf8 is a mess but the only safe code I could get to run. Using unsafe code I could use an byte* but I do not want unsafe code. Is there another way someone can think of where I do not have to copy the string for every contained byte to find the 0 termination.


I just add the unsave code here:

public unsafe string StringFromNativeUtf8(IntPtr nativeUtf8)
{
    byte* bytes = (byte*)nativeUtf8.ToPointer();
    int size = 0;
    while (bytes[size] != 0)
    {
        ++size;
    }
    byte[] buffer = new byte[size];
    Marshal.Copy((IntPtr)nativeUtf8, buffer, 0, size);
    return Encoding.UTF8.GetString(buffer);
}

As you see its not ugly just needs unsafe.

Answer

Hans Passant picture Hans Passant · May 27, 2012

Just perform the exact same operation strlen() performs. Do consider keeping the buffer around, the code does generate garbage in a hurry.

    public static IntPtr NativeUtf8FromString(string managedString) {
        int len = Encoding.UTF8.GetByteCount(managedString);
        byte[] buffer = new byte[len + 1];
        Encoding.UTF8.GetBytes(managedString, 0, managedString.Length, buffer, 0);
        IntPtr nativeUtf8 = Marshal.AllocHGlobal(buffer.Length);
        Marshal.Copy(buffer, 0, nativeUtf8, buffer.Length);
        return nativeUtf8;
    }

    public static string StringFromNativeUtf8(IntPtr nativeUtf8) {
        int len = 0;
        while (Marshal.ReadByte(nativeUtf8, len) != 0) ++len;
        byte[] buffer = new byte[len];
        Marshal.Copy(nativeUtf8, buffer, 0, buffer.Length);
        return Encoding.UTF8.GetString(buffer);
    }