CMPS-104B: Fundamentals of Compiler Design II. Catalog copy CMPS-104B. Fundamentals of Compiler Design II. S A detailed study of the structure and design of a compiler. Continues study begun in course 104A. Topics include compiler structure emphasizing the back end, type systems, run-time environments; static, stack and heap storage management, garbage collection; addressing, register allocation, code generation; basic blocks and data-flow analysis; local and global code optimization; interpretation versus compilation. Students generate machine code runnable on a real machine. Prerequisite(s): course 104A. D. Long, W. Mackey Explanation of prerequisites CMPS-104B: The second compiler course studies advanced topics in compiler design that could not be covered in the first course. Required skills to pass the course. 1. ability to write an program that is unusually large for an undergraduate student 2. understanding of the runtime structure of a compiler 3. understanding of peephole, local, and global optimization 4. understanding of register allocation 5. understanding of control flow problems and structures Core topics (must be taught) 1. Runtime storage organization a. text, data, bss, memory map segments b. static data, heap data, local stack frame c. function call linkage, static and dynamic links d. storage allocation, explicit freeing e. garbage collection and automatic storage regeneration f. case study: details of a hardware stack frame and register usage on some specific machine architecture (e.g., Sparc) 2. Local control flow analysis a. basic blocks b. local optimizations performed on basic blocks c. local value numbering d. three address code and directed acyclic graphs e. algebraic transformations, constant folding, strength reduction, local common subexpression elimination 3. Register allocation a. register allocation strategies b. next-use information c. 3AC to DAG to 3AC d. graph coloring and register interference graphs 4. Global control flow analysis a. dominator trees, natural loops, induction variables b. specific global data flow problems, forward and backward flow problems, any-path and all-path problems c. live variables, reaching definitions, available expressions d. a major global data flow example Optional topics 1. Interprocedural optimization 2. Special problems of code generation for different paradigms a. object-oriented languages, dynamic dispatch b. functional languages and closures, escaping variables 3. Interpretive systems a. Java byte codes as a case study b. just in time compilation 4. Linkers and loaders a. object files, executable images, external references b. class files Core lab exercises The lab exercises are in fact several pieces of one project, namely the implementation of the back end of a small compiler. 1. construction of a scanner and parser for intermediate code 2. back-end symbol table and local storage allocator, with simple register allocation 3. local control flow analysis and approximate Sparc code generation Optional lab exercises It should be noted that these are ``optional'' only in that students may pass the course without these topics, but only with a grade near a ``C''. In order to receive a grade of ``A'', both ``optional'' projects must also be completed. 1. local register allocation with detailed next use information or other reasonably good algorithm and improved code generation for output assembly language that is assemblable, linkable and runnable as an executable image 2. local code optimization, constant folding and propagation, elimination of dead variables and dead code, local common subexpression elimination, and use of delay slots in Sparc control transfer instructions. Comments on related concurrent courses CMPS-112, Comparative Programming Languages. CMPS-112 complements CMPS-104B in that some of the special problems of implementing different language paradigms can be discussed. Taken a a whole, the three course discuss the design of a language, the implementation of its front end, and the implementation of its back end. Comments on follow-on courses Not applicable. No UCSC course uses CMPS-104B as a prerequisite. Text Aho, Sethi, Ullman, ``Compilers, Principles, Techniques, and Tools'', Addison-Wesley, 1986, aka the ``Dragon'' book. A couple of chapters discuss code generation. Paul, ``Sparc Architecture and Assembly Language Programming'', Prentice-Hall, 2000. The Sparc architecture is used as a detailed case study throughout the course. Wilson, ``Uniprocessor Garbage Collection Techniques'', available online. Prepared by Wesley Mackey, 9/02