Is there an alternative for flex/bison that is usable on 8-bit embedded systems?

Johan · Feb 11, 2010

I'm writing a small interpreter for a simple BASIC-like language as an exercise, on an AVR microcontroller, in C, using the avr-gcc toolchain. I'm wondering whether there are any open source tools out there that could help me write the lexer and parser.

If I were writing this to run on my Linux box, I could use flex/bison. Now that I have restricted myself to an 8-bit platform, do I have to do it all by hand?

Answer

Ira Baxter · Feb 25, 2010

If you want an easy way to code parsers, or you are tight on space, you should hand-code a recursive descent parser; these are essentially LL(1) parsers. This is especially effective for languages as "simple" as BASIC. (I did several of these back in the 70s!) The good news is that these don't contain any library code, just what you write.

They are pretty easy to code if you already have a grammar. First, you have to get rid of left-recursive rules (e.g., X = X Y). This is generally pretty easy to do, so I leave it as an exercise. (You don't have to do this for list-forming rules; see the discussion below.)
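As one worked instance of that exercise (the standard textbook transformation, not something spelled out in the original): a left-recursive rule can only ever terminate if it has some non-left-recursive alternative, called Z here. Both forms below generate one Z followed by zero or more Y's:

$$
X \to X\,Y \mid Z
\quad\Longrightarrow\quad
X \to Z\,X', \qquad X' \to Y\,X' \mid \varepsilon
$$

X' is a new nonterminal and ε is the empty string, so the routine for X' returns true without consuming anything when no Y follows. The rewritten rule accepts the same strings but leans the other way, so if Y carries something like subtraction, the semantic actions have to pass the value built so far along to keep it left-associative.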

Then, if you have a BNF rule of the form:

 X = A B C ;

create a subroutine for each item in the rule (X, A, B, C) that returns a boolean saying "I saw the corresponding syntax construct". For X, code:

bool X(void)
{
    if (!A()) return false;                  // no X here at all
    if (!B()) { error(); return false; }     // X started but is malformed
    if (!C()) { error(); return false; }
    // insert semantic action here: generate code, do the work, ....
    return true;
}

Similarly for A, B, C.
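To make the X = A B C pattern concrete, here is a minimal C sketch for one hypothetical BASIC-style rule, Assign = Var '=' Number (e.g. `X=42`). The single-letter variables, the global scan pointer `cur`, the `vars` table and the `syntax_error()` helper are illustrative assumptions, not part of the answer:

```c
#include <stdbool.h>
#include <ctype.h>

static const char *cur;    /* hypothetical scan pointer into the current line */
static bool had_error;     /* set by syntax_error(); a stand-in for real reporting */
static int vars[26];       /* BASIC-style variables A..Z */

static void syntax_error(void) { had_error = true; }

/* Var = 'A'..'Z' ;  stores the variable index in *idx */
static bool var(int *idx) {
    if (*cur < 'A' || *cur > 'Z') return false;
    *idx = *cur++ - 'A';
    return true;
}

/* The terminal '=' */
static bool equals(void) {
    if (*cur != '=') return false;
    cur++;
    return true;
}

/* Number = digit { digit } ; */
static bool number(int *val) {
    if (!isdigit((unsigned char)*cur)) return false;
    for (*val = 0; isdigit((unsigned char)*cur); cur++) *val = *val * 10 + (*cur - '0');
    return true;
}

/* Assign = Var '=' Number ;  the X = A B C pattern */
static bool assign(void) {
    int idx, val;
    if (!var(&idx)) return false;                   /* first item absent: not an Assign */
    if (!equals()) { syntax_error(); return false; } /* later item absent: syntax error */
    if (!number(&val)) { syntax_error(); return false; }
    vars[idx] = val;                                 /* semantic action: do the work */
    return true;
}

/* Parses one whole line; trailing junk counts as failure. */
bool run_line(const char *line) {
    cur = line;
    had_error = false;
    return assign() && *cur == '\0';
}
```

After `run_line("X=42")`, `vars['X' - 'A']` holds 42. `run_line("X42")` fails with `had_error` set, while `run_line("42")` fails without an error, because it simply isn't an Assign.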

If an item is a terminal (a token), write code that checks the input stream for the string of characters that makes up that terminal. E.g., for a Number, check that the input stream contains digits and advance the input stream cursor past them. This is especially easy if you are parsing out of a buffer (for BASIC, you tend to get one line at a time): you simply advance, or don't advance, a buffer scan pointer. This code is essentially the lexer part of the parser.
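A minimal sketch of such terminal matchers over a line buffer, assuming the caller owns a `const char *` scan pointer and passes its address in; `match_keyword` and `match_number` are hypothetical names. Each one moves the pointer past its terminal only on success, apart from skipping leading blanks, which is harmless:

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

/* Skip blanks between tokens. */
static void skip_spaces(const char **p) {
    while (**p == ' ' || **p == '\t') (*p)++;
}

/* Terminal: a fixed keyword such as "PRINT". Advances *p only on a match. */
bool match_keyword(const char **p, const char *kw) {
    size_t n = strlen(kw);
    skip_spaces(p);
    if (strncmp(*p, kw, n) != 0) return false;
    *p += n;
    return true;
}

/* Terminal: Number = digit { digit }. Leaves *p at the number if no digit. */
bool match_number(const char **p, int *value) {
    const char *s;
    skip_spaces(p);
    s = *p;
    if (!isdigit((unsigned char)*s)) return false;
    for (*value = 0; isdigit((unsigned char)*s); s++) *value = *value * 10 + (*s - '0');
    *p = s;
    return true;
}
```

A real lexer may also want to check that a keyword isn't just the prefix of a longer identifier; this sketch does not.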

If your BNF rule is recursive... don't worry. Just code the recursive call. This handles grammar rules like:

T  =  '('  T  ')' ;

This can be coded as:

bool T(void)
{
    if (!left_paren()) return false;
    if (!T()) { error(); return false; }     // the recursive call
    if (!right_paren()) { error(); return false; }
    // insert semantic action here: generate code, do the work, ....
    return true;
}
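As written, T = '(' T ')' has no base case, so no finite input could satisfy it. This runnable C sketch adds an assumed alternative 'x' so the recursion can bottom out, and uses the nesting depth as its semantic action; `match`, `syntax_error` and `parse_T` are hypothetical helpers:

```c
#include <stdbool.h>

static const char *cur;   /* hypothetical scan pointer */
static bool had_error;

static void syntax_error(void) { had_error = true; }

/* Terminal: a single character. Advances only on a match. */
static bool match(char c) {
    if (*cur != c) return false;
    cur++;
    return true;
}

/* T = '(' T ')' | 'x' ;   ('x' is an assumed base case)
   *depth receives the number of enclosing parenthesis pairs. */
static bool T(int *depth) {
    if (match('x')) { *depth = 0; return true; }
    if (!match('(')) return false;
    if (!T(depth)) { syntax_error(); return false; }   /* the recursive call */
    if (!match(')')) { syntax_error(); return false; }
    (*depth)++;                                         /* semantic action */
    return true;
}

bool parse_T(const char *text, int *depth) {
    cur = text;
    had_error = false;
    return T(depth) && *cur == '\0';
}
```

`parse_T("((x))", &d)` sets `d` to 2. On an AVR each nesting level costs stack space, so a real parser may want to cap the depth.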

If you have a BNF rule with an alternative:

 P = Q | R ;

then code P with alternative choices:

bool P(void)
{
    if (Q()) return true;    // first alternative matched
    if (R()) return true;    // otherwise try the second
    return false;            // neither alternative is present
}
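A concrete version of P = Q | R, sketched in C for two hypothetical BASIC statements, Q = "PRINT " Number and R = "GOTO " Number. The alternatives are told apart by their first keyword, so each one either fails without consuming input or commits to its branch; the assumed `had_error` flag keeps P from trying R after Q has already started and failed:

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

static const char *cur;   /* hypothetical scan pointer into the current line */
static bool had_error;
static char kind;         /* which alternative matched: 'P' or 'G' */
static int arg;           /* its numeric argument */

static void syntax_error(void) { had_error = true; }

static bool keyword(const char *kw) {
    size_t n = strlen(kw);
    if (strncmp(cur, kw, n) != 0) return false;   /* no match: cursor untouched */
    cur += n;
    return true;
}

static bool number(int *v) {
    if (!isdigit((unsigned char)*cur)) return false;
    for (*v = 0; isdigit((unsigned char)*cur); cur++) *v = *v * 10 + (*cur - '0');
    return true;
}

/* Q = "PRINT " Number ; */
static bool print_stmt(void) {
    if (!keyword("PRINT ")) return false;
    if (!number(&arg)) { syntax_error(); return false; }
    kind = 'P';
    return true;
}

/* R = "GOTO " Number ; */
static bool goto_stmt(void) {
    if (!keyword("GOTO ")) return false;
    if (!number(&arg)) { syntax_error(); return false; }
    kind = 'G';
    return true;
}

/* P = Q | R ; */
static bool statement(void) {
    if (print_stmt()) return true;
    if (had_error) return false;   /* Q started but was malformed: don't try R */
    return goto_stmt();
}

bool run_statement(const char *line) {
    cur = line;
    had_error = false;
    kind = 0;
    return statement() && *cur == '\0';
}
```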

Sometimes you'll encounter list-forming rules. These tend to be left-recursive, and this case is easily handled. The basic idea is to use iteration rather than recursion, which avoids the infinite recursion you would get by coding them the "obvious" way (a routine for L = L A would call L() again before consuming any input). Example:

L  =  A |  L A ;

You can code this using iteration as:

bool L(void)
{
    if (!A()) return false;      // a list needs at least one A
    while (A()) { /* loop */ }   // then consume as many more as follow
    return true;
}
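A runnable instance of L = A | L A, taking A to be a decimal digit so that L is a number (a hypothetical choice for illustration). The left-recursive reading, value(L A) = 10 * value(L) + value(A), is exactly what the loop computes from left to right:

```c
#include <stdbool.h>
#include <ctype.h>

static const char *cur;   /* hypothetical scan pointer */

/* A = '0'..'9' ;  folds the digit into the value of the list so far */
static bool digit(int *value) {
    if (!isdigit((unsigned char)*cur)) return false;
    *value = *value * 10 + (*cur++ - '0');
    return true;
}

/* L = A | L A ;  one A, then as many more as follow, via a loop */
static bool number(int *value) {
    *value = 0;
    if (!digit(value)) return false;
    while (digit(value)) { /* each iteration consumes one more digit */ }
    return true;
}

bool parse_number(const char *text, int *value) {
    cur = text;
    return number(value) && *cur == '\0';
}
```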

You can code several hundred grammar rules in a day or two this way. There are more details to fill in, but the basics here should be more than enough.

If you are really tight on space, you can build a virtual machine that implements these ideas. That's what I did back in the 70s, when 8K of 16-bit words was all you could get.
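A rough sketch of that idea, not the original 70s design: grammar rules become a table of small instructions interpreted by one loop, so adding rules costs table bytes rather than code. The opcode names (TST, CLL, R, BT, BF, BE) are borrowed loosely from Meta II, but their exact semantics and encoding here are assumptions. The table encodes T = 'x' | '(' T ')':

```c
#include <stdbool.h>

/* Opcodes loosely named after Meta II's; the semantics are a simplification. */
enum op { TST, CLL, R, BT, BF, BE };

struct insn {
    enum op op;
    char ch;                /* TST: character to match */
    unsigned char target;   /* CLL/BT/BF: instruction index */
};

/* T = 'x' | '(' T ')' ;  hand-compiled into this table. */
static const struct insn prog[] = {
    /* 0 T: */ {TST, 'x', 0},
    /* 1 */    {BT,  0,   8},   /* saw 'x': return true */
    /* 2 */    {TST, '(', 0},
    /* 3 */    {BF,  0,   8},   /* neither alternative: return false */
    /* 4 */    {CLL, 0,   0},   /* recursive call to T */
    /* 5 */    {BE,  0,   0},   /* T missing after '(': syntax error */
    /* 6 */    {TST, ')', 0},
    /* 7 */    {BE,  0,   0},   /* ')' missing: syntax error */
    /* 8 */    {R,   0,   0},   /* return with the current flag */
};

/* Interprets prog on text; true if the whole text is a T. */
bool vm_parse(const char *text)
{
    unsigned char stack[32];   /* return addresses: small fixed nesting limit */
    int sp = 0;
    unsigned char pc = 0;
    bool flag = false;         /* result of the last test or call */
    const char *cur = text;

    for (;;) {
        const struct insn *in = &prog[pc++];
        switch (in->op) {
        case TST:
            flag = (*cur == in->ch);
            if (flag) cur++;
            break;
        case CLL:
            if (sp == (int)sizeof stack) return false;   /* nested too deeply */
            stack[sp++] = pc;
            pc = in->target;
            break;
        case R:
            if (sp == 0) return flag && *cur == '\0';   /* leaving the start rule */
            pc = stack[--sp];
            break;
        case BT:
            if (flag) pc = in->target;
            break;
        case BF:
            if (!flag) pc = in->target;
            break;
        case BE:
            if (!flag) return false;                     /* syntax error: stop */
            break;
        }
    }
}
```

On an AVR you would probably keep such a table in flash (avr-libc's PROGMEM), which this portable sketch doesn't attempt.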


If you don't want to code this by hand, you can automate it with a metacompiler (such as Meta II) that produces essentially the same thing. These are mind-blowing technical fun and really take all the work out of doing this, even for big grammars.

August 2014:

I get a lot of requests for "how to build an AST with a parser". For details, which essentially elaborate on this answer, see my other SO answer: https://stackoverflow.com/a/25106688/120163
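The linked answer has the details; as a separate minimal sketch (not taken from it), here is the list rule Sum = Num { '+' Num } building tree nodes instead of computing values. Nodes come from a fixed pool, an assumption that suits small microcontrollers where malloc is often avoided; `parse_sum_tree` and the node layout are hypothetical:

```c
#include <stdbool.h>
#include <ctype.h>
#include <stddef.h>

enum kind { NUM, ADD };

struct node {
    enum kind kind;
    int value;                   /* NUM only */
    struct node *left, *right;   /* ADD only */
};

static struct node pool[16];     /* hypothetical fixed node pool */
static size_t used;
static const char *cur;          /* hypothetical scan pointer */

static struct node *new_node(enum kind k) {
    if (used == sizeof pool / sizeof pool[0]) return NULL;   /* pool exhausted */
    struct node *n = &pool[used++];
    n->kind = k;
    n->value = 0;
    n->left = n->right = NULL;
    return n;
}

/* Num = digit { digit } ;  builds a NUM leaf */
static struct node *num(void) {
    struct node *n;
    if (!isdigit((unsigned char)*cur)) return NULL;
    if ((n = new_node(NUM)) == NULL) return NULL;
    while (isdigit((unsigned char)*cur)) n->value = n->value * 10 + (*cur++ - '0');
    return n;
}

/* Sum = Num { '+' Num } ;  each '+' makes an ADD node whose left child
   is the tree built so far, giving a left-leaning tree. */
struct node *parse_sum_tree(const char *text) {
    struct node *tree, *rhs, *add;
    cur = text;
    used = 0;
    if ((tree = num()) == NULL) return NULL;
    while (*cur == '+') {
        cur++;
        if ((rhs = num()) == NULL || (add = new_node(ADD)) == NULL) return NULL;
        add->left = tree;
        add->right = rhs;
        tree = add;
    }
    return *cur == '\0' ? tree : NULL;
}
```

A NULL result covers both syntax errors and an exhausted pool; a real parser would want to distinguish them.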

July 2015:

There are lots of folks who want to write a simple expression evaluator. You can do this by doing the same kinds of things that the "AST builder" link above suggests; just do arithmetic instead of building tree nodes. Here's an expression evaluator done this way.
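The evaluator linked above isn't reproduced here. As an independent minimal sketch, this C version follows the same recursive descent recipe for a hypothetical grammar Expr = Term { ('+'|'-') Term }, Term = Factor { ('*'|'/') Factor }, Factor = Number | '(' Expr ')' | '-' Factor, coding the list-forming rules as loops and doing arithmetic in place of tree building:

```c
#include <stdbool.h>
#include <ctype.h>

static const char *cur;   /* hypothetical scan pointer */
static bool had_error;

static void report_error(void) { had_error = true; }
static void skip_spaces(void) { while (*cur == ' ') cur++; }

static bool match(char c) {
    skip_spaces();
    if (*cur != c) return false;
    cur++;
    return true;
}

static bool expr(long *v);

/* Factor = Number | '(' Expr ')' | '-' Factor ; */
static bool factor(long *v) {
    skip_spaces();
    if (isdigit((unsigned char)*cur)) {
        for (*v = 0; isdigit((unsigned char)*cur); cur++) *v = *v * 10 + (*cur - '0');
        return true;
    }
    if (match('(')) {
        if (!expr(v)) { report_error(); return false; }
        if (!match(')')) { report_error(); return false; }
        return true;
    }
    if (match('-')) {
        if (!factor(v)) { report_error(); return false; }
        *v = -*v;
        return true;
    }
    return false;
}

/* Term = Factor { ('*' | '/') Factor } ;  list rule coded as a loop */
static bool term(long *v) {
    long rhs;
    if (!factor(v)) return false;
    for (;;) {
        if (match('*')) {
            if (!factor(&rhs)) { report_error(); return false; }
            *v *= rhs;
        } else if (match('/')) {
            if (!factor(&rhs) || rhs == 0) { report_error(); return false; }
            *v /= rhs;
        } else {
            return true;
        }
    }
}

/* Expr = Term { ('+' | '-') Term } ; */
static bool expr(long *v) {
    long rhs;
    if (!term(v)) return false;
    for (;;) {
        if (match('+')) {
            if (!term(&rhs)) { report_error(); return false; }
            *v += rhs;
        } else if (match('-')) {
            if (!term(&rhs)) { report_error(); return false; }
            *v -= rhs;
        } else {
            return true;
        }
    }
}

/* Evaluates a whole line; false on a syntax error or trailing junk. */
bool evaluate(const char *line, long *value) {
    cur = line;
    had_error = false;
    if (!expr(value)) return false;
    skip_spaces();
    return *cur == '\0';
}
```

For example, `evaluate("(2 + 3) * 4", &v)` leaves 20 in `v`. Treating division by zero as an error is a design choice of this sketch, not something the grammar requires. `long` is used because `int` is only 16 bits under avr-gcc.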