Code Generation for Data Processing

Contents

Code generation is a key technique for efficient program execution and data processing. This lecture will cover the following topics from a practical perspective with accompanying hands-on exercises:

  • Execution models of programs (interpretation, bytecode, machine code generation, etc.)
  • Program representations (source code, intermediate representations (IRs), different forms of bytecode)
  • Classical techniques of code generation
    • SSA and optimization techniques, illustrated using LLVM-IR
    • Machine code generation: instruction selection and register allocation
  • Execution of programs in virtual machines (e.g., WebAssembly, BPF, JavaScript)
    • Sandboxing and optimizations for JIT compilation
  • Execution of database queries (e.g., SQL, data frame API)
    • Execution models and code representations
  • Execution of machine code/binary translation (e.g., RISC-V)
    • Specifics when translating machine code

Organization

  • Lecture with integrated exercises: Thu 10–14 (c.t., with break) in 02.11.018
  • Exercises will include hands-on programming tasks
    • (Basic) C++ knowledge is required
  • Language: English
  • Module: CIT3230001, 6 ECTS, Bachelor/Master elective
    • Area "Databases and Information Systems" for Informatics, Wirtschaftsinformatik/Information Systems, Informatics: Games Engineering, Biomedical Engineering
    • B1.2 "Advanced Topics in Data Engineering" for Data Engineering and Analytics
  • Written exam (90 minutes); may be changed to an oral exam if the registration count is low. Exam from winter 2022/23: [exam2223.pdf]. No retake.
  • Zulip stream for this lecture; private contact via e-mail
  • Note: This might be the last time I'll offer this course at TUM.

Prerequisites

The course is aimed at bachelor/master students who have taken the following (or similar) courses:

  • IN0004 Introduction to Computer Architecture
  • IN0008 Fundamentals of Databases

Material

Material and lecture schedule will follow at the beginning of the semester. See last year's web page for an overview of topics, homework, and material, but note that this year's content is likely to differ.

Material and exercises will be regularly provided throughout the semester.

Script (updated regularly with new lecture content, includes content of slides and additional information and comments): (appears before first lecture)

Note: The schedule below is preliminary and is likely to change during the semester.

Date     Lecture Topic                                        Homework
16.10.   Overview, Motivation, Interpretation Techniques
23.10.   Compiler Front-end
30.10.   IR Concepts, Control Flow Graph, SSA Construction
06.11.   LLVM-IR
Topics for later dates are not yet finalized.

Homework

Successful completion of all but two homework assignments (n-2 of n) is required for a 0.3 grade bonus (applied to grades 1.3–4.0). Submission instructions and deadlines are noted in the homework assignments. Please carefully observe the following guidelines:

  • Include answers to theory questions as comments at the top of the source file.
  • Make sure your submission passes as many tests as possible from the test script (most homework assignments are Markdown/Bash polyglots that compile and test your solution when executed).
  • The preferred language for submissions is C++. Other languages are permitted in principle (contact the lecturer first, however), but interacting with LLVM in later homeworks is impracticable in other languages.
  • Do not use external dependencies, build systems other than Makefile, etc., unless explicitly permitted.
  • Write your solution in a single C++ file if at all possible.
    • There's often no point in using multiple files. Having all content in a single file makes it easier to grade. The homeworks don't require a large amount of code (<10kLOC in total).
    • If you really need more than just a single C++ file, combine all files such that this command sequence works:
      split-file submission.txt somedir; cd somedir; bash hwX.txt
    • split-file is an LLVM utility tool and is typically included in LLVM distributions (e.g., the llvm or llvm-tools package of common Linux distributions).
    • If you write your own Makefile, respect the flags specified in the environment variables. Never hard-code specific paths to llvm-config.
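
The guidelines on theory answers and single-file submissions suggest a layout roughly like the one below. This is only a sketch: the question placeholders, the function name count_lines, and its behavior are hypothetical and not taken from any actual assignment. A real submission additionally contains whatever entry point the assignment's test script expects.

```cpp
// Theory answers (placeholder questions, for illustration only)
//
// Q1: <question text from the assignment>
// A1: <your answer, written as a comment>
//
// Q2: ...
// A2: ...

#include <cstddef>
#include <string_view>

// Hypothetical helper standing in for the actual homework code:
// counts the newline characters in `text`.
std::size_t count_lines(std::string_view text) {
    std::size_t lines = 0;
    for (char c : text) {
        if (c == '\n')
            ++lines;
    }
    return lines;
}
```

Keeping the answers in a comment block at the very top means the grader finds them without searching, and the file still compiles unchanged.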