Someone sent this to me and claimed it is a hello world in Brainfuck (and I hope so...)
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
I know the basics that it works by moving a pointer and increment and decrementing stuff...
Yet I still want to know, how does it actually work? How does it print anything on the screen in the first place? How does it encode the text? I do not understand at all...
To understand Brainfuck you must imagine infinite array of cells initialized by 0
each.
...[0][0][0][0][0]...
When brainfuck program starts, it points to any cell.
...[0][0][*0*][0][0]...
If you move pointer right >
you are moving pointer from cell X to cell X+1
...[0][0][0][*0*][0]...
If you increase cell value +
you get:
...[0][0][0][*1*][0]...
If you increase cell value again +
you get:
...[0][0][0][*2*][0]...
If you decrease cell value -
you get:
...[0][0][0][*1*][0]...
If you move pointer left <
you are moving pointer from cell X to cell X-1
...[0][0][*0*][1][0]...
To read character you use comma ,
. What it does is: Read character from standard input and write its decimal ASCII code to the actual cell.
Take a look at ASCII table. For example, decimal code of !
is 33
, while a
is 97
.
Well, lets imagine your BF program memory looks like:
...[0][0][*0*][0][0]...
Assuming standard input stands for a
, if you use comma ,
operator, what BF does is read a
decimal ASCII code 97
to memory:
...[0][0][*97*][0][0]...
You generally want to think that way, however the truth is a bit more complex. The truth is BF does not read a character but a byte (whatever that byte is). Let me show you example:
In linux
$ printf ł
prints:
ł
which is specific polish character. This character is not encoded by ASCII encoding. In this case it's UTF-8 encoding, so it used to take more than one byte in computer memory. We can prove it by making a hexadecimal dump:
$ printf ł | hd
which shows:
00000000 c5 82 |..|
Zeroes are offset. 82
is first and c5
is second byte representing ł
(in order we will read them). |..|
is graphical representation which is not possible in this case.
Well, if you pass ł
as input to your BF program that reads single byte, program memory will look like:
...[0][0][*197*][0][0]...
Why 197
? Well 197
decimal is c5
hexadecimal. Seems familiar ? Of course. It's first byte of ł
!
To print character you use dot .
What it does is: Assuming we treat actual cell value like decimal ASCII code, print corresponding character to standard output.
Well, lets imagine your BF program memory looks like:
...[0][0][*97*][0][0]...
If you use dot (.) operator now, what BF does is print:
a
Because a
decimal code in ASCII is 97
.
So for example BF program like this (97 pluses 2 dots):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++..
Will increase value of the cell it points to up to 97 and print it out 2 times.
aa
In BF loop consists of loop begin [
and loop end ]
. You can think it's like while in C/C++ where the condition is actual cell value.
Take a look BF program below:
++[]
++
increments actual cell value twice:
...[0][0][*2*][0][0]...
And []
is like while(2) {}
, so it's infinite loop.
Let's say we don't want this loop to be infinite. We can do for example:
++[-]
So each time a loop loops it decrements actual cell value. Once actual cell value is 0
loop ends:
...[0][0][*2*][0][0]... loop starts
...[0][0][*1*][0][0]... after first iteration
...[0][0][*0*][0][0]... after second iteration (loop ends)
Let's consider yet another example of finite loop:
++[>]
This example shows, we haven't to finish loop at cell that loop started on:
...[0][0][*2*][0][0]... loop starts
...[0][0][2][*0*][0]... after first iteration (loop ends)
However it is good practice to end where we started. Why ? Because if loop ends another cell it started, we can't assume where the cell pointer will be. To be honest, this practice makes brainfuck less brainfuck.