(These notes were put together following a number of discussions during C tutorials and a tutorial for the second year compilers course).


What language / platform should I use?
I would suggest writing in C because the C tools are much older, better supported and more mature than those for Java. I would suggest developing on a UNIX-style platform, as a large number of very high quality C development tools are distributed with most UNIX / Linux distributions.

Programming in C

What's this debugger you keep on ranting about?
ddd - it comes as standard with many Linux distributions. It is worth spending some time learning how to use a debugger - in the long run it will save you time and much frustration.
What other debugging tools are available?
The BUCS servers have both gdb and xxgdb - it is well worth learning how to use them; it will save you a lot of time (hint: you need to use the -g flag when you compile if you want to debug the code).
What other tools would you recommend?
Obviously - gcc! If you have time, nm and ldd are both interesting. cdecl, which is not installed on the BUCS systems but is available on many Linux distros, is a very useful tool for translating C declarations into English. Over the past few years valgrind has also saved me a lot of time and effort.
Are there any good C references online?
To start with, the UNIX manual contains pages for most of the basic C functions. I've also found this which looks very good. It covers the concepts and key parts of the language as well as the functions and has a full set of examples.
What's with the warnings and things on gcc?
Use the flag -Wall to turn on most of gcc's commonly useful warnings and -g to include the debugging information that tools such as gdb need.
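As a rough illustration of why the warnings are worth turning on, here is a small sketch. The function name count_matches and the file name example.c are made up for this example and are not part of the course material.

```c
/* example.c (hypothetical file name).
   Compile to an object file with warnings and debug information:
       gcc -Wall -g -c example.c                                   */
#include <assert.h>
#include <stddef.h>

/* Counts how many of the first n entries of values[] equal target. */
int count_matches(const int *values, int n, int target)
{
    int count = 0;
    int i;

    assert(values != NULL);
    for (i = 0; i < n; i++) {
        /* Typing "if (target = values[i])" here by mistake is still
           legal C, so it compiles. With -Wall, gcc warns about an
           assignment being used as a truth value.                   */
        if (values[i] == target)
            count++;
    }
    return count;
}
```

Because the file was compiled with -g, a debugger such as gdb or ddd can show the source lines and the values of count and i while stepping through the loop.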


What are LEX and FLEX?
Programs that generate source code (in C) for a lexer (also called a tokeniser).
What is the difference between LEX and FLEX?
FLEX is a free, open source reimplementation of LEX. It is distributed under a BSD-style licence rather than the GPL.
Where can I get them?
LEX and FLEX are both installed on the BUCS servers. FLEX can be downloaded from
Where can I get documentation for this?
Either see the man page (type 'man flex' or 'man lex' at your command prompt) or see the online manual.
Are there any introductions to LEX?
There is a HOWTO from TLDP; you can find it at
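To make the idea of a lexer more concrete, here is a hand-written sketch of the kind of job the generated code does: it splits the input into tokens, each with a type and a piece of text. This is not what LEX or FLEX actually outputs (the function they generate is called yylex). The token types and the next_token function are invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types for this sketch. */
enum token_type { TOK_END, TOK_NUMBER, TOK_IDENT, TOK_OP };

struct token {
    enum token_type type;
    const char *start;   /* first character of the token in the input */
    size_t length;       /* number of characters in the token */
};

/* Reads one token from *input and advances *input past it. */
struct token next_token(const char **input)
{
    const char *p;
    struct token tok;

    assert(input != NULL && *input != NULL);
    p = *input;

    while (isspace((unsigned char)*p))   /* skip whitespace between tokens */
        p++;

    tok.start = p;
    if (*p == '\0') {
        tok.type = TOK_END;              /* nothing left to read */
    } else if (isdigit((unsigned char)*p)) {
        tok.type = TOK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.type = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else {
        tok.type = TOK_OP;               /* any other single character */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *input = p;
    return tok;
}
```

Calling next_token repeatedly on a string such as "x1 + 42" yields an identifier, an operator, a number and then TOK_END. With LEX you describe each token as a regular expression instead of writing these loops by hand.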

Symbol Tables

Symbol tables are a tool that is used to simplify storing data in a compiler. They hold strings and return an integer that represents them. This can then be used to get the original string back - think of it as a key or index.
Tokens and nodes on the parse tree have a type (such as an if statement, a + operator or a variable) and a value (such as "x", "my_variable" or 6). By using a symbol table you can just use an integer to represent the value of a token rather than having to worry about passing a string around. It also makes comparisons simpler, as comparing two integers is cheaper than comparing two strings.
There are lots of ways of implementing a symbol table - arrays, hash tables and linked lists are probably the most common. You'll also probably need to be able to use structs in C. There are two basic functions that a symbol table will need to be able to handle.
int add_symbol (char * string);

Adds the given string to the table if it's not there already and returns an integer to represent it. If it's already in the table it returns the key that corresponds to it.

char * lookup_symbol (int key);

Returns the string that the key corresponds to.
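As one possible starting point, the two functions above could be sketched with a plain array and a linear search. The fixed capacity MAX_SYMBOLS, the -1 and NULL error returns and the decision to copy each string are assumptions made for this example. A hash table would be faster for large programs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 1024            /* assumed fixed capacity for this sketch */

static char *symbols[MAX_SYMBOLS];  /* key i maps to symbols[i] */
static int symbol_count = 0;

/* Returns the key for string, storing a private copy if it is new.
   Returns -1 if the table is full or memory runs out. */
int add_symbol(char *string)
{
    int i;
    char *copy;

    assert(string != NULL);
    for (i = 0; i < symbol_count; i++)      /* linear search of existing entries */
        if (strcmp(symbols[i], string) == 0)
            return i;                       /* already present: reuse its key */

    if (symbol_count == MAX_SYMBOLS)
        return -1;
    copy = malloc(strlen(string) + 1);      /* +1 for the terminating '\0' */
    if (copy == NULL)
        return -1;
    strcpy(copy, string);
    symbols[symbol_count] = copy;
    return symbol_count++;
}

/* Returns the string for key, or NULL if key was never handed out. */
char *lookup_symbol(int key)
{
    if (key < 0 || key >= symbol_count)
        return NULL;
    return symbols[key];
}
```

Adding "x" twice returns the same key both times, so two tokens can be checked for the same name by comparing their keys rather than calling strcmp.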

Data Structures & Algorithms

Most compilers use tree structures to represent the program they are working on. Arrays and hash tables are traditionally used to implement symbol tables. Lots of books on data structures contain examples in C. Simon recommended 'Mastering Algorithms in C', published by O'Reilly. Details of the book and a sample chapter covering implementing symbol tables can be found at See section 518.42 in the library.
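As a rough illustration of the tree structures mentioned above, here is a sketch of a node with a type and a value, plus a recursive walk that evaluates a small arithmetic expression. The node types and the function names new_node, evaluate and free_tree are invented for this example. A real compiler would have many more node types and would usually generate code rather than evaluate the tree directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node types for a tiny expression language. */
enum node_type { NODE_NUMBER, NODE_ADD, NODE_MUL };

struct node {
    enum node_type type;
    int value;            /* the number for NODE_NUMBER; for a variable
                             this could hold a symbol table key instead */
    struct node *left;    /* operands; NULL for leaf nodes */
    struct node *right;
};

/* Allocates a node; returns NULL if memory runs out. */
struct node *new_node(enum node_type type, int value,
                      struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->type = type;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Evaluates the expression by visiting the children before the parent. */
int evaluate(const struct node *n)
{
    assert(n != NULL);
    switch (n->type) {
    case NODE_NUMBER: return n->value;
    case NODE_ADD:    return evaluate(n->left) + evaluate(n->right);
    case NODE_MUL:    return evaluate(n->left) * evaluate(n->right);
    }
    return 0;             /* not reached for the node types above */
}

/* Frees a whole tree, children first. */
void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

The expression 2 + 3 * 4 would be built as a NODE_ADD whose left child is the number 2 and whose right child is a NODE_MUL of 3 and 4. The shape of the tree records the operator precedence, so no brackets need to be stored.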

Text Books

There are lots of very good text books on the subject of compiler design. Most contain examples in C and describe what each section of a compiler does, how it does it and why. Some also explain how to use LEX (including example code). Most of them are very readable and do not assume knowledge of any particular language. See section 518.42 in the library.

Anything else? Mail me and I may be able to help, or hunt for yourself.