How to use yylval with strings in yacc

neuromancer · Dec 5, 2009 · Viewed 26.8k times

I want to pass the actual string of a token. If I have a token called ID, then I want my yacc file to know the actual text that ID matched. I think I have to pass a string from the flex file to the yacc file using yylval. How do I do that?

Answer

JonN · Sep 23, 2012

The key to returning a string or any complex type via yylval is the YYSTYPE union that yacc writes to the y.tab.h file. YYSTYPE is a union with one member for each kind of value a token can carry. For example, to return the string associated with a SYMBOL token, declare the union with %union in the yacc source file and tag SYMBOL with the member that holds its value:

/*** Yacc's YYSTYPE Union ***/

/* The yacc parser maintains a stack (array) of token values while
   it is parsing.  This union defines all the possible values tokens
   may have.  Yacc creates a typedef of YYSTYPE for this union.  The
   value type of each token (see the declaration below) is named by
   one of this union's fields.  The global variable yylval, which lex
   uses to return token values, is declared as a YYSTYPE union.
 */

    %union {
        long int4;              /* Constant integer value */
        float fp;               /* Constant floating point value */
        char *str;              /* Ptr to constant string (strings are malloc'd) */
        exprT expr;             /* Expression -  constant or address */
        operatorT *operatorP;   /* Pointer to run-time expression operator */
    };

%token <str> SYMBOL
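
As a rough picture of what those declarations buy you, here is a plain C sketch (not yacc output, and not from the original answer). It assumes the generated header boils down to a typedef of the union plus a global yylval, trims the union to its three scalar members, and uses made-up names (value_stack, push_value, symbol_text) for a miniature of the parser's value stack.

```c
#include <assert.h>
#include <stddef.h>

/* Approximation of the typedef yacc writes to y.tab.h for the %union
   above. exprT and operatorT are left out because their definitions
   are not shown here. */
typedef union {
    long  int4;   /* constant integer value */
    float fp;     /* constant floating point value */
    char *str;    /* pointer to a heap-allocated string */
} YYSTYPE;

/* The variable the lexer fills in before returning a token code. */
YYSTYPE yylval;

/* Hypothetical miniature of the parser's value stack. */
#define STACK_MAX 16
YYSTYPE value_stack[STACK_MAX];
size_t  value_top = 0;

/* What the parser does when it shifts a token: save a copy of yylval. */
void push_value(YYSTYPE v)
{
    assert(value_top < STACK_MAX);   /* sketch only: no stack growth */
    value_stack[value_top++] = v;
}

/* For a symbol tagged <str>, a $n reference in an action amounts to
   reading the str member of the matching stack slot. */
char *symbol_text(size_t slot)
{
    assert(slot < value_top);
    return value_stack[slot].str;
}
```

Setting yylval.str in the lexer and then calling push_value(yylval) makes symbol_text(0) return that same pointer, which is the path the string takes from the lex rule to the yacc action.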

Then, in the lex source file, there is a pattern that matches the SYMBOL token. The code attached to that rule is responsible for returning the actual string that the SYMBOL represents. You can't just pass a pointer to the yytext buffer, because the scanner reuses that buffer for every token it matches. Instead, copy the matched text to the heap with _strdup() (the Microsoft C runtime's name for POSIX strdup()) and pass a pointer to the copy via yylval.str. The yacc rule that consumes the SYMBOL token is then responsible for freeing the heap-allocated string when it is done with it.

[A-Za-z_][A-Za-z0-9_]*  {{
    int i;

    /*** KEYWORDS and SYMBOLS ***/
    /*
    * The pattern matches a keyword or a SYMBOL: a letter or underscore
    * followed by zero or more letters, digits or underscores.
    *      Convert matched text to uppercase
    *      Search keyword table
    *      if found
    *          return <keyword>
    *      endif
    *
    *      set lexical value string to matched text
    *      return <SYMBOL>
    */

    /* Convert the matched input text to uppercase, in place */
    _strupr(yytext);

    /* First we search the keyword table */
    for (i = 0; i<NITEMS(keytable); i++) {
        if (strcmp(keytable[i].name, yytext)==0)
            return (keytable[i].token);
    }

    /* Return a SYMBOL since we did not match a keyword */
    yylval.str=_strdup(yytext);
    return (SYMBOL);
}}
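
To see why the copy matters, here is a small stand-alone C sketch that is not part of the original lexer. It imitates the scanner with a single reused buffer (scan_buf, a stand-in for yytext); the names fake_scan, copy_string and run_demo are made up for illustration, and copy_string is a hand-written equivalent of strdup() so the sketch relies on ISO C only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for yytext: one buffer that every token overwrites. */
char scan_buf[64];

/* Hypothetical scanner step: load the next lexeme into the shared buffer. */
void fake_scan(const char *lexeme)
{
    strncpy(scan_buf, lexeme, sizeof scan_buf - 1);
    scan_buf[sizeof scan_buf - 1] = '\0';
}

/* Does what strdup() does, using only ISO C: returns a heap copy of s,
   or NULL if malloc fails. The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Scan two tokens, keeping the first one both ways.
   Returns 1 if the alias went stale and the copy survived. */
int run_demo(void)
{
    char *alias, *copy;
    int ok;

    fake_scan("FOO");
    alias = scan_buf;               /* wrong: points into the shared buffer */
    copy  = copy_string(scan_buf);  /* right: private heap copy */
    assert(copy != NULL);

    fake_scan("BAR");               /* the next token overwrites the buffer */

    ok = strcmp(alias, "BAR") == 0  /* alias now shows the second token */
      && strcmp(copy,  "FOO") == 0; /* copy still holds the first */

    free(copy);                     /* the consumer's job, as in the yacc rule */
    return ok;
}
```

The alias is the mistake the lex rule avoids by duplicating yytext, and the free() at the end is the step the yacc action has to remember once it no longer needs the symbol's text.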