So we had this assignment for our Compiler Construction course: write a program that generates a beautified HTML file for a given C program. The stupid approach is to write a lexical analyzer from scratch, which might take ages. The intelligent (and accepted) approach is to use a lexical analyzer generator, and this is where Flex comes into the picture.

Flex is a FOSS alternative to Lex, the POSIX lexical analyzer generator. It generates a lexical analyzer (as a C program) from the vocabulary you specify for the language you’re writing a compiler for.

A Flex file has three parts, separated by %% lines: the definitions (or labels), the rules (or translations), and the user code at the end. The rules section uses labels to simplify the rules. Both definitions and rules use regex heavily (duh!).

A simple label looks like this. It gives the name oreo to the regex (cat), so that rules can refer to the pattern as {oreo}.

oreo        (cat)

A simple rule for the same looks like this. It replaces every match of the pattern labelled oreo in the input with Oreo, the cat in the output. The rules section is enclosed in two %%.

{oreo}      { printf ("Oreo, the cat"); }

The file should also have a main function that calls yylex(), the scanner function that Flex generates. A very simple flex file might look like this:

/* Filename: simple.lex */
%option noyywrap

/* I guess we do have to put labels on everything */
oreo        (cat)
adjective   (good|bad)

%%
    /* You might not like them, but you need to have rules */
{oreo}      { printf ("Oreo, the cat"); }
{adjective} { printf ("lazy"); }
%%

int main (int argc, char *argv[]) {
    yylex();
    return 0;
}

Compile and run it with:

flex simple.lex     # this will create lex.yy.c
gcc lex.yy.c        # this will create a.out
./a.out

On executing a.out and typing in some input, you’ll see the following output. Anything that matches no rule is copied to the output unchanged.

input: this is cat
output: this is Oreo, the cat

input: cat is good.
output: Oreo, the cat is lazy.

input: nothing to do
output: nothing to do
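To see what those two rules buy you, here is a rough hand-written C sketch of the same behaviour. It is only an illustration, not what Flex actually generates: Flex builds a table-driven automaton and handles full regexes, while this sketch just tries a few literal strings at each position. The names rewrite, append and struct rule are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append n bytes of src to the string in out,
   truncating if out (of capacity out_size) would overflow. */
static void append(char *out, size_t out_size, const char *src, size_t n) {
    size_t used = strlen(out);
    if (used + 1 >= out_size) return;       /* no room left */
    size_t room = out_size - used - 1;
    if (n > room) n = room;
    memcpy(out + used, src, n);
    out[used + n] = '\0';
}

/* One literal pattern and the text its action would print.
   This mirrors a single rule from simple.lex. */
struct rule {
    const char *pattern;
    const char *replacement;
};

/* Scan `in` left to right. Where a pattern matches, emit its
   replacement; otherwise copy one character through unchanged,
   like the scanner's default behaviour for unmatched input. */
void rewrite(const char *in, char *out, size_t out_size) {
    static const struct rule rules[] = {
        { "cat",  "Oreo, the cat" },   /* {oreo} */
        { "good", "lazy" },            /* {adjective}, first alternative */
        { "bad",  "lazy" },            /* {adjective}, second alternative */
    };
    const size_t nrules = sizeof rules / sizeof rules[0];

    if (out_size == 0) return;
    out[0] = '\0';

    while (*in != '\0') {
        size_t i;
        for (i = 0; i < nrules; i++) {
            size_t len = strlen(rules[i].pattern);
            if (strncmp(in, rules[i].pattern, len) == 0) {
                append(out, out_size, rules[i].replacement,
                       strlen(rules[i].replacement));
                in += len;              /* consume the matched text */
                break;
            }
        }
        if (i == nrules) {              /* nothing matched here */
            append(out, out_size, in, 1);
            in++;
        }
    }
}
```

Calling rewrite("cat is good.", buf, sizeof buf) with a large enough buf should leave the same text in buf that the Flex version prints for that input. Even for three literal words this is already more code than the whole rules section, and it would need reworking the moment a pattern became a real regex.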

You can find the code for the assignment here.