CMPS-104A: Fundamentals of Compiler Design I. Catalog copy CMPS-104A. Fundamentals of Compiler Design I. F,W An introduction to the basic techniques used in compiler design. Topics include compiler structure, symbol tables, regular expressions and languages, finite automata, lexical analysis, context-free languages, LL(1), recursive descent, LALR(1), and LR(1) parsing; and attribute grammars as a model of syntax-directed translation. Students use compiler building tools to construct a working compiler. Prerequisite(s): course 101 and Computer Engineering 12C and 12L. D. Long, W. Mackey Explanation of prerequisites CMPS-101: Students must understand basic programming concepts including linked lists, pointers, stacks, and trees and how to implement them with appropriate algorithmic efficiency. CMPE-012C: Assembly language and low-level machine concepts are essential to code generation. Required skills to pass the course. 1. Understanding and mastery of regular expressions and finite automata. a. regular expressions and regular grammars b. deterministic and nondeterministic finite automata c. construction of finite automata from regular expressions d. ability to use scanner-generation tools 2. Understanding and mastery of context free grammars and parsers. a. context free grammars and deterministic pushdown automata b. ability to use parser-generation tools c. top-down recursive descent parsing d. bottom-up shift-reduce parsing e. LR(k), LALR(1), SLR(1) grammars, and construction of DPDAs f. acquaintance with LL(k) grammars and parsers 3. Understanding symbol tables and code generation Core topics (must be taught) 1. General overview of a compiler's structure a. command line access and file structures b. management of the string table c. top down recursive descent parsing d. hand coding a simple scanner 2. Use of a scanner-generation tool a. reserved word tables b. tokens with and without lexical information c. notation for regular expressions d. semantic actions and token structures 3. Regular grammars and finite automata a. regular expression notation b. conversion of a regular expression to a nondeterministic finite automata (NFA) c. conversion of NFA to deterministic finite automata (DFA) d. minimization of finite automata 4. Use of a parser-generation tool a. context free and LALR(1) or LL(1) parsers b. parsing actions: shift, reduce, accept, error c. abstract syntax trees vs parse trees d. semantic actions to construct abstract syntax trees 5. Context free grammars and deterministic pushdown automata a. grammars: G = b. LR(0) and SLR(1) grammars and construction of parsers 6. Symbol table management and block structure 7. Intermediate code generation a. functions and programs as a whole b. statements and constrol structures c. expressions, operators, and function calls Optional topics 1. Advanced context free grammars a. LALR(1) and LR(1) grammars and conversion to DPDAs b. LL(k) grammars and construction of predictive parsers 2. Code optimization a. basic blocks and local optimization b. global and interprocedural optimization 3. Other topics borrowed from CMPS-104B if time permits 4. flex or lex as the scanner-generation tool 5. bison or yacc as the parser-generation tool Core lab exercises The lab exercises are several pieces of one project: the implementation of a compiler for a small language. 1. Command line access and file control. String table as a hash table. 2. Scanner, written using the flex (lex) scanner generator. 3. Parser, written using the bison (yacc) parser generator. Optional lab exercises These are `optional' only in that students may pass the course without these topics, but only with a `C'. For an `A', the `optional' projects must also be completed. 1. Construction of a symbol table manager for the language. 2. Generation of intermediate code Comments on related concurrent courses CMPS-111, Operating Systems. Both compilers and operating systems are `systems' courses and enhance each other by providing a view into the operation of computers not seen by the casual user. Comments on follow-on courses CMPS-104B, Compiler Design II, covers the back end of a compiler, internal runtime structures, and code optimization. CMPS-112, Comparative Programming Languages, can be taught at a higher level, given that students in that course are familiar with the construction of a compiler before they are taught about the design of programming languages. Text Aho, Sethi, Ullman, `Compilers, Principles, Techniques, and Tools', Addison-Wesley, 1986, aka the `Dragon' book. GNU project, `flex' and `bison' reference manuals, or online documentation of other scanner- and parser- generation tools. Possible Alternative Texts Fischer, LeBlanc, `Crafting a Compiler with C', Benjamin Cummings. Prepared by Wesley Mackey, 5/03