Clean, self-contained VM implemented in C with under 100-200K of compiled code?

soze · Mar 12, 2011 · Viewed 8.4k times

I'm looking for a VM with the following features:

  • Small compiled code footprint (under 200K).
  • No external dependencies.
  • Unicode (or raw) string support.
  • Clean code/well organized.
  • C(99) code, NOT C++.
  • C/Java-like syntax.
  • Bitwise operators: AND, OR, etc.
  • Threading support.
  • Generic/portable bytecode. Bytecode should work on different machines even if it was compiled on a different architecture with different endianness etc.
  • Barebones, nothing fancy necessary. Only the basic language support.
  • Lexer/parser and compiler separate from VM. I will be embedding the VM in a program and compiling the bytecode independently.
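
To make the portable-bytecode requirement concrete, here is a minimal sketch of what I mean, not taken from any of the VMs I reviewed. It assumes a hypothetical bytecode format that stores 32-bit operands in little-endian order; the helper names `write_u32_le` and `read_u32_le` are made up for illustration. Because the bytes are assembled with shifts rather than by casting a pointer, the result is the same on big-endian and little-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit operand in a fixed little-endian
   layout, independent of the byte order of the machine running this. */
static void write_u32_le(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Hypothetical helper: load a 32-bit operand written by write_u32_le.
   No pointer casts, so there are no alignment or endianness surprises. */
static uint32_t read_u32_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The compiler would emit operands with `write_u32_le` and the VM would decode them with `read_u32_le`, so a bytecode file produced on one architecture loads unchanged on another.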

So far I have reviewed Lua, Squirrel, Neko, Pawn, Io, AngelScript... The only one that comes somewhat close to the spec is Lua, but the syntax is horrible, it has no bitwise operators, and the code style generally sucks. Squirrel and Io are mostly huge. Pawn is problematic: it is small, but its bytecode is not cross-platform and the implementation has some serious issues (e.g., bytecode is not validated at all, not even the headers, AFAIK).
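
As a rough illustration of the kind of header validation I would expect before a VM executes anything, here is a sketch under stated assumptions. The 12-byte header layout (magic `VMBC`, a little-endian version word, a little-endian code size), the constants, and the function name `bc_validate_header` are all hypothetical; this is not Pawn's format or any other real VM's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: 4-byte magic, 4-byte version, 4-byte code size. */
enum { BC_HEADER_SIZE = 12, BC_VERSION = 1 };

/* Decode a little-endian 32-bit value without relying on host byte order. */
static uint32_t bc_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 if the header looks sane, or a negative code saying why not.
   Only the header is checked; the instruction stream is left untouched. */
static int bc_validate_header(const uint8_t *buf, size_t len)
{
    /* A NULL buffer with a nonzero length is a caller bug, not bad input. */
    assert(buf != NULL || len == 0);

    if (len < BC_HEADER_SIZE)
        return -1;                        /* too short to hold a header */
    if (memcmp(buf, "VMBC", 4) != 0)
        return -2;                        /* wrong magic bytes */
    if (bc_le32(buf + 4) != BC_VERSION)
        return -3;                        /* unsupported version */
    if (bc_le32(buf + 8) != len - BC_HEADER_SIZE)
        return -4;                        /* declared size does not match */
    return 0;
}
```

A real loader would go further and verify each instruction and jump target, but even a check this small rejects truncated or foreign files before the VM touches them.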

I would love to find a suitable option out there.

Thanks!

Update: JavaScript interpreters are... interpreters. This is a question about a bytecode-based VM, hence the requirement that the compiler be separate from the VM. JS is interpreted and only seldom JIT-compiled. I don't necessarily want a JIT. Also, all current ECMAScript parsers are anything but small.

Answer

ephemient · Mar 12, 2011

You say you've reviewed NekoVM, but don't mention why it's not suitable for you.

It's written in C, not C++, the VM is under 10kLOC with a compiled size of roughly 100kB, and the compiler is a separate executable producing portable bytecode. The language itself has C-like syntax, bitwise operators, and it's not thread-hostile.