How to escape unicode characters in bash prompt correctly

Andy Ray picture Andy Ray · Aug 18, 2011 · Viewed 7.4k times · Source

I have a specific method for my bash prompt, let's say it looks like this:

CHAR="༇ "
my_function="
    prompt=\" \[\$CHAR\]\"
    echo -e \$prompt"

PS1="\$(${my_function}) \$ "

To explain the above, I'm builidng my bash prompt by executing a function stored in a string, which was a decision made as the result of this question. Let's pretend like it works fine, because it does, except when unicode characters get involved

I am trying to find the proper way to escape a unicode character, because right now it messes with the bash line length. An easy way to test if it's broken is to type a long command, execute it, press CTRL-R and type to find it, and then pressing CTRL-A CTRL-E to jump to the beginning / end of the line. If the text gets garbled then it's not working.

I have tried several things to properly escape the unicode character in the function string, but nothing seems to be working.

Special characters like this work:

COLOR_BLUE=$(tput sgr0 && tput setaf 6)

my_function="
    prompt="\\[\$COLOR_BLUE\\] \"
    echo -e \$prompt"

Which is the main reason I made the prompt a function string. That escape sequence does NOT mess with the line length, it's just the unicode character.

Answer

tripleee picture tripleee · Aug 19, 2011

The \[...\] sequence says to ignore this part of the string completely, which is useful when your prompt contains a zero-length sequence, such as a control sequence which changes the text color or the title bar, say. But in this case, you are printing a character, so the length of it is not zero. Perhaps you could work around this by, say, using a no-op escape sequence to fool Bash into calculating the correct line length, but it sounds like that way lies madness.

The correct solution would be for the line length calculations in Bash to correctly grok UTF-8 (or whichever Unicode encoding it is that you are using). Uhm, have you tried without the \[...\] sequence?

Edit: The following implements the solution I propose in the comments below. The cursor position is saved, then two spaces are printed, outside of \[...\], then the cursor position is restored, and the Unicode character is printed on top of the two spaces. This assumes a fixed font width, with double width for the Unicode character.

PS1='\['"`tput sc`"'\]  \['"`tput rc`"'༇ \] \$ '

At least in the OSX Terminal, Bash 3.2.17(1)-release, this passes cursory [sic] testing.

In the interest of transparency and legibility, I have ignored the requirement to have the prompt's functionality inside a function, and the color coding; this just changes the prompt to the character, space, dollar prompt, space. Adapt to suit your somewhat more complex needs.