Creating the shortest Turing-complete interpreter

ilya n. picture ilya n. · Jun 28, 2009 · Viewed 7k times · Source

I've just tried to create the smallest possible language interpreter. Would you like to join and try?

Rules of the game:

  • You should specify a programming language you're interpreting. If it's a language you invented, it should come with a list of commands in the comments.
  • Your code should start with example program and data assigned to your code and data variables.
  • Your code should end with output of your result. It's preferable that there are debug statements at every intermediate step.
  • Your code should be runnable as written.
  • You can assume that data are 0 and 1s (int, string or boolean, your choice) and output is a single bit.
  • The language should be Turing-complete in the sense that for any algorithm written on a standard model, such as Turing machine, Markov chains, or similar of your choice, it's reasonably obvious (or explained) how to write a program that after being executred by your interpreter performs the algorithm.
  • The length of the code is defined as the length of the code after removal of input part, output part, debug statements and non-necessary whitespaces. Please add the resulting code and its length to the post.
  • You can't use functions that make compiler execute code for you, such as eval(), exec() or similar.

This is a Community Wiki, meaning neither the question nor answers get the reputation points from votes. But vote anyway!

Answer

balpha picture balpha · Jun 28, 2009

Python, 250 bytes.

This one is longer than ilya n.'s Python solution, but the language is a little easier to grasp. Each command in this language (I call it Kaputt, the German word for "broken") is one byte. The following 6 commands are defined:

  • 0: Push a zero onto the stack
  • 1: Push a one onto the stack
  • I: (if) Pop a value off the stack (which must be zero or one). Run the following block of code (until "i") if it's a one; skip the block if it's a zero.
  • i: (endif) Ends the if block and pushes a one if the block was not run (because the "I" popped a zero). See below for an explanation of the latter.
  • D: (def) Pops the name of the to-be-defined function off the stack (see below) and binds the following block (until "d") to that name.
  • d: (enddef) Ends the function definition.
  • Any other byte: Check if there is a function by this (one-byte; but see the edit below) name. If so, run this function; if not, push this byte onto the stack for immediate use by D.

Nested ifs are legal; nested function definitions are not. The fact that i (endif) pushes a one if and only if the corresponding if block was not run allows for the following idiom resembling an if/else/endif structure:

# [code that left a one or zero on the stack]
I    # abbreviated "(" below
# [code to be run if it was a one]
0iI  # abbreviated "/" below
# [code to be run if it was a zero]
1iIi # abbreviated ")" below
# [continuing...]

Note that comments, linebreaks, spaces etc. are not actually allowed.

Here are some examples of common functions. These make use of the abbreviations ( / ) mentioned above.

<D(/)d

defines the function < (pop) that pops a value off the stack without using it for anything.

&D((1/0)/<0)d

defines the function & (and) that pops two values of the stack and pushes a one if both values are ones, pushes a zero otherwise.

~D((11/10)/(01/00))d

defines the function ~ (swap) that swaps the top two values on the stack.

RD(R/<)d

defines the function R (remove) that recursively removes all ones from the top of the stack, and then removes two more values (the top one of which should be zero).

The following interpreter function expects the program p, which is a string (or any other iterable of bytes), and the input stack S, which is a (possibly empty) list of bytes. After the interpreter has run, this list contains the output stack.

def i(p,S,F=0):
 A=S.append
 F=F or{}
 C=0
 k=[]
 for c in p:
  h=0in k
  if c=="d":C=0
  elif C!=0:C+=[c]
  elif c=="I":k+=[int(h or S.pop())]
  elif c=="i":k.pop()or A("1")
  elif 1-h:
   if c=="D":C=F[S.pop()]=[]
   else:i(F.get(c)or A(c)or"",S,F)

Obviously, there was no room for error checking, so i() expects legal Kaputt code. Test cases:

script = "<D(/)d"                 # < = pop
script += "&D((1/0)/<0)d"         # & = and
script += "~D((11/10)/(01/00))d"  # ~ = swap
script += "RD(R/<)d"              # R = remove
script += "|D(<1/(1/0))d"         # | = or
script=script.replace("(","I")   
script=script.replace("/","0iI")
script=script.replace(")","1iIi") # ( and / and ) are no real commands of the language.

S=[]
i(script+"1111010111111111R",S)
assert S == ["1","1","1","1","0"]

S=[]
i(script+"00&",S)
assert S == ["0"]

S=[]
i(script+"01~",S)
assert S == ["1","0"]

S=[]
i(script+"00|",S)
assert S == ["0"]

S=[]
i(script+"01|",S)
assert S == ["1"]

Happy coding :-)

Edit: There is nothing inherent in the interpreter that forces tokens to be one byte only. Requiring this was more for consistency with the built-in commands (which are one-byte because that makes the interpreter shorter). If you pass a list of strings instead of a string, you can write more readable Kaputt code like this:

script = """
inc D
    (
        (
            0 0
        /
            1 0
        )
    /
        1
    )
d
""".replace("(","I").replace("/","0 i I").replace(")","1 i I i").split()

This defines a function inc that increments the two-bit number on top of the stack (LSB on top).

Test:

for n in xrange(4):
    S=[]
    i(script + [str((n & 2)/2), str(n & 1), "inc"], S)
    assert S == [str(((n+1) & 2)/2), str((n+1) & 1)]

Let's call multi-byte function names a language extension :-)