Compiler and Language Processing Tools
Summer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group
TU Kaiserslautern

Outline
1. Language Processing Tools
   - Application Domains
   - Tasks of Language-Processing Tools
   - Examples
2. Language Processing
   - Terminology and Requirements
   - Compiler Architecture
3. Compiler Construction

Language processing tools
- Processing of source texts in (source) languages
- Analysis of (source) texts
- Translation to target languages

Language processing tools (2)
Typical source languages
- Programming languages: C, C++, C#, Java, Scala, Haskell, ML, Smalltalk, Prolog
- Script languages: JavaScript, bash
- Languages for configuration management: make, ant
- Application and tool-specific languages: Excel, JFlex, CUPS
- Specification languages: Z, CASL, Isabelle/HOL
- Formatting and data description languages: LaTeX, HTML, XML
- Design and architecture description languages: UML, SDL, VHDL, Verilog

Language processing tools (3)
Typical target languages
- Assembly, machine, and bytecode languages
- Programming languages
- Data and layout description languages
- Languages for printer control

Language processing tools (4)
Language implementation tasks
- Tool support for language processing
- Integration into existing systems
- Connection to other systems

Application domains
- Programming environments
  - Context-sensitive editors, class browsers
  - Graphical programming tools
  - Pre-processors
  - Compilers
  - Interpreters
  - Debuggers
  - Run-time environments (loading, linking, execution, memory management)

Application domains (2)
- Generation of programs from design documents (UML)
- Program comprehension, re-engineering
- Design and implementation of domain-specific languages
  - Robot control
  - Simulation tools
  - Spreadsheets, active documents
- Web technology
  - Analysis of Web sites
  - Active Web sites (with integrated functionality)
- Abstract platforms, e.g. JVM, .NET
- Optimization of caching

Related fields
- Formal languages, language specification and design
- Programming and specification languages
- Programming, software engineering, software generation, software architecture
- System software, computer architecture

Tasks of Language-Processing Tools
[Figure: three kinds of tools.
 Analyser: Source Code → Analysis Results.
 Translation: Source Code → Target Code.
 Interpreter: Source Code and Input Data → Output Data.]
Analysis, translation and interpretation are often combined.

Tasks of Language-Processing Tools (2)
1. Translation
- Compiler implements analysis and translation
- OS and real machine implement interpretation
Pros:
- Most efficient solution
- One interpreter for different programming languages
- Prerequisite for other solutions

Tasks of Language-Processing Tools (3)
2. Direct interpretation
- Interpreter implements all tasks.
- Examples: JavaScript, command line languages (bash)
Pros:
- No translation necessary (but analysis at run-time)

Tasks of Language-Processing Tools (4)
3. Abstract and virtual machines
- Compiler implements analysis and translation to abstract machine code
- Abstract machine works as interpreter
- Examples: Java/JVM, C#, .NET
Pros:
- Platform independent (portability, mobile code)
- Self-modifying programs possible
4. Other combinations

Example: Analysis
Source text (the keyword "implement" is misspelled, which triggers the error below):

    package b1_1 ;

    class Weltklasse
      extends Superklasse
      implement BesteBohnen {
      Qualifikation studieren ( Arbeit schweiss) {
        return new Qualifikation ();
      }
    }

Further inputs: Superklasse.class, BesteBohnen.class, Qualifikation.class, Arbeit.class
Tool: javac (analyser)
Output:

    b1_1/Weltklasse.java:4: '{' expected.
      extends Superklasse
                         ^
    1 error

17.04.2007, A. Poetzsch-Heffter, TU Kaiserslautern

Example: Translation
Example 1: source text

    package b1_1;

    class Weltklasse
      extends Superklasse
      implements BesteBohnen {
      Qualifikation studieren ( Arbeit schweiss ) {
        return new Qualifikation();
      }
    }

Further inputs: Superklasse.class, BesteBohnen.class, Qualifikation.class, Arbeit.class
Tool: javac

Example: Translation (2)
Example 1 (continued): result of translation

    Compiled from Weltklasse.java
    class b1_1/Weltklasse extends b1_1.Superklasse implements b1_1.BesteBohnen {
        b1_1/Weltklasse();
        b1_1.Qualifikation studieren(b1_1.Arbeit);
    }

    Method b1_1/Weltklasse()
       0 aload_0
       1 invokespecial #6 <Method b1_1.Superklasse()>
       4 return

    Method b1_1.Qualifikation studieren(b1_1.Arbeit)
       0 new #2 <Class b1_1.Qualifikation>
       3 dup
       4 invokespecial #5 <Method b1_1.Qualifikation()>
       7 areturn

Example 2: Translation
Source text (hello_world.c):

    #include <stdio.h>  /* declares printf */

    int main() {
      printf("Willkommen zur Vorlesung!");
      return 0;
    }

Tool: gcc

Example 2: Translation (2)
Example 2 (continued): result of translation

            .file   "hello_world.c"
            .version "01.01"
    gcc2_compiled.:
    .section .rodata
    .LC0:
            .string "Willkommen zur Vorlesung!"
    .text
            .align 16
    .globl main
            .type main,@function
    main:
            pushl %ebp
            movl %esp,%ebp
            subl $8,%esp
            addl $-12,%esp
            pushl $.LC0
            call printf
            addl $16,%esp
            xorl %eax,%eax
            jmp .L2
            .p2align 4,,7
    .L2:
            movl %ebp,%esp
            popl %ebp
            ret
    .Lfe1:
            .size main,.Lfe1-main
            .ident "GCC: (GNU) 2.95.2 19991024 (release)"

Example 3: Translation
groovy.tex (104 bytes):

    \documentclass{article}
    \begin{document}
    \vspace*{7cm}
    \centerline{\huge\bf It's groovy}
    \end{document}

Tool: latex
groovy.dvi (207 bytes, binary)
Tool: dvips
groovy.ps (7136 bytes):

    %!PS-Adobe-2.0
    %%Creator: dvips(k) 5.86
    %%Title: groovy.dvi

Example: Interpretation
Excerpt from a .class file:

    14 iload_1
    15 iload_2
    16 idiv
    17 istore_3

[Figure: the .class file and the input data are fed to the Java Virtual Machine (JVM), which produces the output data.]

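The interpretation example can be made concrete with a minimal sketch of a stack machine that executes the four bytecodes shown (iload_1, iload_2, idiv, istore_3). This is not the JVM: the function name `run`, the (opcode, operand) encoding and the dictionary of local variables are assumptions made for illustration, and 32-bit overflow is ignored. Only the truncating behaviour of integer division is modelled after Java.

```python
def run(code, local_vars):
    """Interpret (opcode, operand) pairs on a tiny operand-stack machine."""
    stack = []
    for op, arg in code:
        if op == "iload":          # push a local variable onto the stack
            stack.append(local_vars[arg])
        elif op == "istore":       # pop the top of the stack into a local variable
            local_vars[arg] = stack.pop()
        elif op == "idiv":         # integer division, truncating toward zero
            divisor = stack.pop()
            dividend = stack.pop()
            if divisor == 0:
                raise ZeroDivisionError("/ by zero")
            quotient = abs(dividend) // abs(divisor)
            if (dividend < 0) != (divisor < 0):
                quotient = -quotient
            stack.append(quotient)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return local_vars


# iload_1, iload_2, idiv, istore_3 in the hypothetical encoding used here
program = [("iload", 1), ("iload", 2), ("idiv", None), ("istore", 3)]

if __name__ == "__main__":
    print(run(program, {1: 17, 2: 5, 3: 0})[3])
```

The program text is data for the interpreter: the same `program` list yields different results for different input data, which is the situation sketched in the figure.
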
Example: Combined technique
Java implementation with just-in-time (JIT) compiler
[Figure: Java source code unit → javac (analyzer, translator) → .class file (Java byte code) → JIT translator inside the JVM → machine code → real machine/hardware, which maps input data to output data.]

Language processing: The task of translation
[Figure: Source Code → Translator → Error Message or Target Code]
- Translator (in a broader sense): analysis, optimization and translation
- Source code: input (string) for the translator in the syntax of the source language (SL)
- Target code: output (string) of the translator in the syntax of the target language (TL)

Phases of language processing
- Analysis of input:
  - Program text
  - Specification
  - Diagrams
- Dependent on target of implementation:
  - Transformation (XSLT, refactoring)
  - Pretty printing, formatting
  - Semantic analysis (program comprehension)
  - Optimization
  - (Actual) translation

Compile time vs. run-time
- Compile time: during run-time of compiler/translator
- Static: all information/aspects known at compile time, e.g.:
  - Type checks
  - Evaluation of constant expressions
  - Relative addresses
- Run-time: during run-time of compiled program
- Dynamic: all information that is not statically known, e.g.:
  - Allocation of dynamic arrays
  - Bounds checks of arrays
  - Dynamic binding of methods
  - Memory management of recursive procedures
For dynamic aspects that cannot be handled at compile time, the compiler generates code that handles these aspects at run-time.

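As an illustration of the static/dynamic distinction, the following minimal sketch evaluates constant subexpressions at "compile time" and leaves everything that depends on a variable for "run-time". The tuple-based expression format and the names `fold` and `evaluate` are assumptions made for this example, not part of the lecture.

```python
import operator

# Hypothetical expression format: ("num", n), ("var", name), (op, left, right)
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold(expr):
    """Compile time: replace constant subexpressions by their value."""
    kind = expr[0]
    if kind in ("num", "var"):
        return expr
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[kind](left[1], right[1]))
    return (kind, left, right)      # depends on a variable: stays dynamic


def evaluate(expr, env):
    """Run-time: compute the value once the variables are known."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    return OPS[kind](evaluate(expr[1], env), evaluate(expr[2], env))


# x * (2 + 3 * 4): the right operand is statically known, x is not
expr = ("mul", ("var", "x"), ("add", ("num", 2), ("mul", ("num", 3), ("num", 4))))
folded = fold(expr)
```

Folding does not change the meaning of the expression; it only moves work from run-time to compile time.
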
What is a good compiler?

Requirements for translators
- Error handling (static/dynamic)
- Efficient target code
- Choice: fast translation with slow code vs. slow translation with fast code
- Semantically correct translation

Semantically correct translation
Intuitive definition: The compiled program behaves according to the language definition of the source language.
Formal definition:
- $sem_{SL}: \mathit{SL\_Program} \times \mathit{SL\_Data} \to \mathit{SL\_Data}$
- $sem_{TL}: \mathit{TL\_Program} \times \mathit{TL\_Data} \to \mathit{TL\_Data}$
- $compile: \mathit{SL\_Program} \to \mathit{TL\_Program}$
- $code: \mathit{SL\_Data} \to \mathit{TL\_Data}$
- $decode: \mathit{TL\_Data} \to \mathit{SL\_Data}$
Semantic correctness:
$sem_{SL}(p, d) = decode(sem_{TL}(compile(p), code(d)))$

Compiler Architecture
[Figure: compiler pipeline.
 Analysis: Source Code as String → Scanner → Token Stream → Parser → Syntax Tree → Name and Type Analysis → Decorated Syntax Tree (close to SL).
 Synthesis: Decorated Syntax Tree → Translator → Intermediate Language → Code Generator → Target Code as String.
 Attribution & Optimization is attached to the decorated syntax tree and to the intermediate language; Peephole Optimization is attached to the target code.]

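The semantic correctness condition can be tried out on a toy language. The sketch below is an illustration under stated assumptions: the source language is arithmetic over one input variable, the target language is code for a small stack machine, and all names (`sem_sl`, `sem_tl`, `compile_expr`, `code`, `decode`, `is_correct_on`) are chosen for this example. Checking sample inputs only tests the equation; it does not prove it.

```python
# Hypothetical source programs: ("num", n), ("x",), ("add", l, r), ("mul", l, r)

def sem_sl(p, d):
    """Source semantics: value of expression p for input d."""
    kind = p[0]
    if kind == "num":
        return p[1]
    if kind == "x":
        return d
    left, right = sem_sl(p[1], d), sem_sl(p[2], d)
    return left + right if kind == "add" else left * right


def compile_expr(p):
    """compile: translate an expression into postfix stack code."""
    kind = p[0]
    if kind == "num":
        return [("push", p[1])]
    if kind == "x":
        return [("load", 0)]
    return compile_expr(p[1]) + compile_expr(p[2]) + [(kind, None)]


def sem_tl(tp, td):
    """Target semantics: run stack code tp on memory td, return final stack."""
    stack = []
    for op, arg in tp:
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(td[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
    return stack


def code(d):
    """Encode source data: the input value is stored in memory cell 0."""
    return [d]


def decode(td):
    """Decode target data: the result is the top of the final stack."""
    return td[-1]


def is_correct_on(p, d):
    return sem_sl(p, d) == decode(sem_tl(compile_expr(p), code(d)))


program = ("add", ("mul", ("x",), ("num", 3)), ("num", 4))   # x * 3 + 4
```
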
Properties of compiler architectures
- Phases are conceptual units of translation
- Phases can be interleaved
- Design of phases depends on source language, target language and design decisions
- Phase vs. pass (a phase can comprise more than one pass)
- Separate translation of program parts (interface information must be accessible)
- Combination with other architecture decisions: common intermediate language

Common intermediate language
[Figure: Source Language 1, Source Language 2, ..., Source Language n → Intermediate Language → Target Language 1, Target Language 2, ..., Target Language m]

Dimensions of compiler construction
- Programming languages
  - Sequential procedural, imperative, OO languages
  - Functional, logical languages
  - Parallel languages/language constructs
- Target languages/machines
  - Code for abstract machines
  - Assembler
  - Machine languages (CISC, RISC, ...)
  - Multi-processor/multi-core architectures
  - Memory hierarchy
- Translation tasks: analysis, optimization, synthesis
- Construction techniques and tools: bootstrapping, generators
- Portability, specification, correctness

Compiler construction techniques
1. Stepwise construction
   - Construction with compiler for different language
   - Construction with compiler for different machine
   - Bootstrapping
2. Compiler-compiler: tools for compiler generation
   - Scanner generators (regular expressions)
   - Parser generators (context-free grammars)
   - Attribute evaluation generators (attribute grammars)
   - Code generator generators (machine specification)
   - Interpreter generators (semantics of language)
   - Other phase-specific tools
3. Special programming techniques
   - General technique: syntax-driven
   - Special technique: recursive descent

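Recursive descent, named above as a special programming technique, can be sketched as one parsing function per nonterminal. The grammar, the token format and the names `scan`, `Parser` and `parse` below are assumptions chosen for this illustration; the scanner is written by hand with a regular expression rather than produced by a scanner generator.

```python
import re

# Assumed grammar:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(?:(\d+)|(\S))")


def scan(text):
    """Scanner: turn the source string into a list of (kind, value) tokens."""
    tokens = []
    for number, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif symbol in "+-*()":
            tokens.append((symbol, None))
        else:
            raise SyntaxError(f"unexpected character {symbol!r}")
    tokens.append(("eof", None))
    return tokens


class Parser:
    """Recursive-descent parser: one method per nonterminal."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        return self.advance()

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()[0]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        return ("num", self.expect("num")[1])


def parse(text):
    """Scan and parse text, returning a syntax tree of nested tuples."""
    parser = Parser(scan(text))
    tree = parser.expr()
    parser.expect("eof")
    return tree
```

The call structure of `expr`, `term` and `factor` mirrors the grammar, which is why operator precedence falls out of the grammar without extra code.
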
Stepwise construction
Programming typically depends on an existing compiler for the implementation language. For compiler construction, this does not hold in general.
Source, target, and implementation languages of compilers can be denoted in T-diagrams.
[Figure: T-diagram with SL and TL on the crossbar and CL at the base]
The T-diagram denotes a compiler from source language SL to target language TL ($SL \to TL$ compiler) written in language CL.

Construction with compiler for different language
Given: $C \to ML$ (machine language) compiler in ML
Construct: $SL \to ML$ compiler in ML
Solution: Develop an $SL \to ML$ compiler in C, then translate that compiler from C to ML by using the existing $C \to ML$ compiler.
[Figure: T-diagrams. The $SL \to ML$ compiler in C (to be developed) is fed into the $C \to ML$ compiler in ML (existing), which yields the $SL \to ML$ compiler in ML (by translation).]

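A T-diagram can be modelled as a small record, and "translating a compiler with another compiler" as a function on such records. The sketch below replays the construction with a compiler for a different language; the names `Compiler` and `translate` are assumptions for illustration only, and the model tracks languages, not actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    """One T-diagram: a source -> target compiler written in some language."""
    source: str
    target: str
    written_in: str


def translate(program, by):
    """Run compiler `by` on the source text of compiler `program`.

    `by` must accept the language `program` is written in. The result
    translates the same languages but is now expressed in `by.target`.
    """
    if program.written_in != by.source:
        raise ValueError(
            f"a {by.source} compiler cannot read a program written in "
            f"{program.written_in}"
        )
    return Compiler(program.source, program.target, by.target)


existing = Compiler("C", "ML", "ML")     # given: C -> ML compiler in ML
developed = Compiler("SL", "ML", "C")    # to be developed: SL -> ML compiler in C
result = translate(developed, by=existing)

if __name__ == "__main__":
    print(result)  # Compiler(source='SL', target='ML', written_in='ML')
```
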
Construction with compiler for different machine
Construct: $C \to ML_1$ compiler in $ML_1$
Given:
1. $C \to ML_1$ compiler in C
2. $C \to ML_2$ compiler in $ML_2$
Method: construct a cross compiler
First step
[Figure: T-diagrams. The given $C \to ML_1$ compiler in C is translated by the given $C \to ML_2$ compiler in $ML_2$, which yields the cross compiler: a $C \to ML_1$ compiler in $ML_2$.]

Construction with compiler for different machine (2)
Second step
[Figure: T-diagrams. The given $C \to ML_1$ compiler in C is translated by the cross compiler ($C \to ML_1$ in $ML_2$), which yields the resulting compiler: a $C \to ML_1$ compiler in $ML_1$.]

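The two steps of the cross-compiler construction can be replayed on a record model of T-diagrams. This is a minimal sketch: `Compiler` and `translate` are hypothetical names introduced for illustration, and the model only tracks source, target and implementation language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    """One T-diagram: a source -> target compiler written in some language."""
    source: str
    target: str
    written_in: str


def translate(program, by):
    """Translate compiler `program` with compiler `by`.

    The result keeps source and target of `program` and is written in the
    target language of `by`.
    """
    if program.written_in != by.source:
        raise ValueError("compiler cannot read the implementation language")
    return Compiler(program.source, program.target, by.target)


given_1 = Compiler("C", "ML1", "C")      # C -> ML1 compiler in C
given_2 = Compiler("C", "ML2", "ML2")    # C -> ML2 compiler in ML2

# First step: compile given_1 on machine 2; the result runs on machine 2
# but emits code for machine 1, i.e. it is a cross compiler.
cross = translate(given_1, by=given_2)

# Second step: compile given_1 again, this time with the cross compiler.
resulting = translate(given_1, by=cross)
```

Both steps use the same source text `given_1`; only the compiler that translates it changes.
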
Bootstrapping
Construct: $SL \to ML$ compiler in ML
Suppose: no compiler exists yet
Method:
1. Construct partial languages $SL_i$ of SL such that $SL_0 \subset SL_1 \subset SL_2 \subset \dots \subset SL$
2. Implement an $SL_0$ compiler for ML in ML
3. Implement an $SL_{i+1}$ compiler for ML in $SL_i$
4. Create an $SL_{i+1}$ compiler for ML in ML

Bootstrapping (2)
[Figure: chain of T-diagrams. The $SL_0 \to ML$ compiler in ML is written manually. Each $SL_{i+1} \to ML$ compiler in $SL_i$ is obtained by extension and is translated by the $SL_i \to ML$ compiler in ML, which yields the $SL_{i+1} \to ML$ compiler in ML by translation. The chain runs through $SL_1$ and $SL_2$ up to the $SL \to ML$ compiler in ML.]

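The bootstrapping method can be sketched as a loop over the language levels. The names `Compiler`, `translate` and `bootstrap` as well as the level labels are assumptions for illustration; the model only tracks which languages each compiler reads, emits and is written in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    """One T-diagram: a source -> target compiler written in some language."""
    source: str
    target: str
    written_in: str


def translate(program, by):
    """Translate compiler `program` with compiler `by`."""
    if program.written_in != by.source:
        raise ValueError("compiler cannot read the implementation language")
    return Compiler(program.source, program.target, by.target)


def bootstrap(levels, machine="ML"):
    """Return the compilers in machine language obtained level by level."""
    native = Compiler(levels[0], machine, machine)       # step 2: written manually
    history = [native]
    for smaller, larger in zip(levels, levels[1:]):
        extended = Compiler(larger, machine, smaller)    # step 3: by extension
        native = translate(extended, by=native)          # step 4: by translation
        history.append(native)
    return history


history = bootstrap(["SL0", "SL1", "SL2", "SL"])
```

Each pass through the loop writes the next compiler in the largest language subset that can already be compiled, so only the first compiler has to be written directly in machine language.
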
Recommended reading
- Wilhelm, Maurer:
  - Chap. 1 (pp. 1-5)
  - Chap. 6, Structure of Compilers (pp. 225-238)
- Appel:
  - Chap. 1 (pp. 3-14)